Hext template:
<a href:link @text:title />

HTML input:
<a href="one.html"> Page 1</a>
<a href="two.html"> Page 2</a>
<a href="three.html">Page 3</a>

JSON output:
{"link": "one.html", "title": "Page 1"}
{"link": "two.html", "title": "Page 2"}
{"link": "three.html", "title": "Page 3"}

Hext is a domain-specific language for extracting structured data from
HTML documents.
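
To make the example at the top of the page concrete, here is a rough sketch of what the template `<a href:link @text:title />` does to the sample input. It is written in plain Python on top of the standard library's `html.parser` and does not use Hext or its API. The names `LinkExtractor`, `extract_links` and `SAMPLE` are made up for this illustration. It also assumes that anchors without an `href` are skipped and that the captured text is trimmed, which is what the sample output suggests.

```python
import json
from html.parser import HTMLParser


class LinkExtractor(HTMLParser):
    """Hypothetical helper: collects {"link", "title"} for each <a href=...>."""

    def __init__(self):
        super().__init__()
        self.results = []
        self._href = None   # href of the <a> currently open, if any
        self._text = []     # text chunks seen inside that <a>

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href is not None:  # assumption: anchors without href do not match
                self._href = href
                self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            # Trim surrounding whitespace so " Page 1" becomes "Page 1".
            title = "".join(self._text).strip()
            self.results.append({"link": self._href, "title": title})
            self._href = None


def extract_links(html):
    """Return one dict per matching <a> element, in document order."""
    parser = LinkExtractor()
    parser.feed(html)
    parser.close()
    return parser.results


SAMPLE = """
<a href="one.html"> Page 1</a>
<a href="two.html"> Page 2</a>
<a href="three.html">Page 3</a>
"""

if __name__ == "__main__":
    for row in extract_links(SAMPLE):
        print(json.dumps(row))
```

The one-line template replaces all of this parser bookkeeping: it names the element to match and the values to capture, and leaves the traversal to the tool.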
Learn how to hext in the
documentation.
There is also an editor below where you can try Hext from the comfort
of your browser.
htmlext is a command-line utility that applies Hext templates
to HTML documents and outputs JSON.
For example, to extract all links:
$ htmlext -s "<a href:x />" -i <(curl "example.com")
Hext is released under the terms of the
Apache License and is therefore suitable for inclusion in both open
and closed source software. The project is publicly available on
GitHub.
Contributions are welcome!