hext::Rule Class Reference

Extracts values from HTML.

Public Member Functions

 Rule (HtmlTag tag=HtmlTag::ANY, bool optional=false) noexcept
 Constructs a Rule.
 
 ~Rule () noexcept=default
 
 Rule (Rule &&) noexcept=default
 
 Rule (const Rule &other)
 
Rule & operator= (Rule &&) noexcept=default
 
Rule & operator= (const Rule &other)
 
const Rule * child () const noexcept
 Returns the first child or nullptr if childless.
 
const Rule * next () const noexcept
 Returns the next rule or nullptr if no following rule.
 
Rule * child () noexcept
 Returns the first child or nullptr if childless.
 
Rule * next () noexcept
 Returns the next rule or nullptr if no following rule.
 
Rule & append_child (Rule new_child)
 Appends a child.
 
Rule & append_next (Rule sibling)
 Appends a following Rule.
 
Rule & append_match (std::unique_ptr< Match > match)
 Appends a Match.
 
template<typename MatchType , typename... Args>
Rule & append_match (Args &&...arg)
 Emplaces a Match.
 
Rule & append_capture (std::unique_ptr< Capture > cap)
 Appends a Capture.
 
template<typename CaptureType , typename... Args>
Rule & append_capture (Args &&...arg)
 Emplaces a Capture.
 
HtmlTag get_tag () const noexcept
 Returns the HtmlTag this rule matches.
 
Rule & set_tag (HtmlTag tag) noexcept
 Sets the HtmlTag this rule matches.
 
bool is_optional () const noexcept
 Returns true if this rule is optional, i.e. if a match does not have to be found.
 
Rule & set_optional (bool optional) noexcept
 Sets whether this rule is optional, i.e. whether a match has to be found.
 
hext::Result extract (const Html &html) const
 Recursively extracts values from a hext::Html.
 
hext::Result extract (const GumboNode *node) const
 Recursively extracts values from a GumboNode.
 
bool matches (const GumboNode *node) const
 Returns true if this Rule matches node.
 
std::vector< ResultPair > capture (const GumboNode *node) const
 Returns the result of applying every Capture to node.
 

Detailed Description

Extracts values from HTML.

A Rule defines how to match and capture HTML nodes. It can be applied to a GumboNode tree, where it recursively tries to find matches.

Example:
// create a rule that matches anchor elements, ..
Rule anchor(HtmlTag::A);
// .. which must have an attribute called "href"
anchor.append_match<AttributeMatch>("href")
      // capture attribute href and save it as "link"
      .append_capture<AttributeCapture>("href", "link");
{
  // create a rule that matches image elements
  Rule img(HtmlTag::IMG);
  // capture attribute src and save it as "img"
  img.append_capture<AttributeCapture>("src", "img");
  // append the image-rule to the anchor-rule
  anchor.append_child(std::move(img));
}
// anchor is now equivalent to the following hext:
// <a href:link><img src:img/></a>
Html html(
  "<div><a href='/bob'>  <img src='bob.jpg'/>  </a></div>"
  "<div><a href='/alice'><img src='alice.jpg'/></a></div>"
  "<div><a href='/carol'><img src='carol.jpg'/></a></div>");
hext::Result result = anchor.extract(html);
// result will be equivalent to this:
// vector{
//   map{
//     {"link", "/bob"},
//     {"img", "bob.jpg"}
//   },
//   map{
//     {"link", "/alice"},
//     {"img", "alice.jpg"}
//   },
//   map{
//     {"link", "/carol"},
//     {"img", "carol.jpg"}
//   },
// }

Definition at line 85 of file Rule.h.
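
Consuming such a result can be sketched as follows. The sketch assumes that hext::Result behaves like a vector of string-to-string multimaps, as the comment in the example suggests. The aliases ResultMap and Result and the helper collect() are hypothetical stand-ins defined here so that the snippet compiles without hext.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Stand-ins for hext's result types (assumption: a vector of
// string-to-string multimaps, one map per matched subtree).
using ResultMap = std::multimap<std::string, std::string>;
using Result = std::vector<ResultMap>;

// Hypothetical helper: collects every value captured under `name`,
// in the order the groups appear in `result`.
std::vector<std::string> collect(const Result& result, const std::string& name)
{
  std::vector<std::string> values;
  for (const auto& group : result) {
    const auto range = group.equal_range(name);
    for (auto it = range.first; it != range.second; ++it)
      values.push_back(it->second);
  }
  return values;
}

// Usage with hand-written data shaped like the result in the example.
void collect_example()
{
  Result result;
  result.push_back(ResultMap{{"link", "/bob"}, {"img", "bob.jpg"}});
  result.push_back(ResultMap{{"link", "/alice"}, {"img", "alice.jpg"}});

  const std::vector<std::string> links = collect(result, "link");
  assert(links.size() == 2);
  assert(links[0] == "/bob");
  assert(links[1] == "/alice");
}
```

A multimap is used because one subtree may capture several values under the same name; equal_range() returns all of them.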

Constructor & Destructor Documentation

hext::Rule::Rule ( HtmlTag  tag = HtmlTag::ANY,
bool  optional = false 
)
explicit noexcept

Constructs a Rule.

Parameters
  tag       The HtmlTag that this rule matches. Default: Match any tag.
  optional  A subtree matches only if all mandatory rules were matched. Optional rules, on the other hand, are ignored if not found. Default: Rule is mandatory.
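
The distinction between mandatory and optional rules can be illustrated with a small, self-contained sketch. RuleOutcome and subtree_matches() are hypothetical names that model only the statement above; they are not part of hext and make no claim about how hext implements matching.

```cpp
#include <cassert>
#include <vector>

// Hypothetical record: whether a rule is optional and whether a
// matching node was found for it.
struct RuleOutcome {
  bool optional;
  bool found;
};

// A subtree matches only if every mandatory rule was found.
// Optional rules that were not found are ignored.
bool subtree_matches(const std::vector<RuleOutcome>& outcomes)
{
  for (const auto& outcome : outcomes)
    if (!outcome.optional && !outcome.found)
      return false;
  return true;
}

void optional_example()
{
  // Mandatory rule found, optional rule missing: still a match.
  assert(subtree_matches({{false, true}, {true, false}}));
  // Mandatory rule missing: no match, whatever the optional rule did.
  assert(!subtree_matches({{false, false}, {true, true}}));
}
```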
hext::Rule::~Rule ( )
default noexcept
hext::Rule::Rule ( Rule &&  )
default noexcept
hext::Rule::Rule ( const Rule &  other)

Member Function Documentation

Rule& hext::Rule::append_capture ( std::unique_ptr< Capture >  cap)

Appends a Capture.

Parameters
  cap  The Capture to append.
Returns
A reference for this Rule to enable method chaining.
template<typename CaptureType , typename... Args>
Rule& hext::Rule::append_capture ( Args &&...  arg)
inline

Emplaces a Capture.

Forwards arguments to std::make_unique.

Returns
A reference for this Rule to enable method chaining.

Definition at line 157 of file Rule.h.
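
How an emplacing overload of this kind can forward its arguments to std::make_unique is sketched below. ToyRule, Capture and NamedCapture are hypothetical stand-ins written for this illustration; the sketch shows the general forwarding-and-chaining pattern and is not taken from Rule.h.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-ins for a capture base class and a concrete capture.
struct Capture {
  virtual ~Capture() = default;
};

struct NamedCapture : Capture {
  NamedCapture(std::string attribute, std::string result_name)
  : attr(std::move(attribute)), name(std::move(result_name)) {}
  std::string attr;
  std::string name;
};

// Toy rule, not hext::Rule: shows only the emplace-and-chain pattern.
class ToyRule {
public:
  // Takes ownership of an already constructed capture.
  ToyRule& append_capture(std::unique_ptr<Capture> cap)
  {
    captures_.push_back(std::move(cap));
    return *this;
  }

  // Constructs a CaptureType by forwarding the arguments to
  // std::make_unique, then delegates to the overload above.
  template<typename CaptureType, typename... Args>
  ToyRule& append_capture(Args&&... arg)
  {
    return append_capture(
        std::make_unique<CaptureType>(std::forward<Args>(arg)...));
  }

  std::size_t capture_count() const noexcept { return captures_.size(); }

private:
  std::vector<std::unique_ptr<Capture>> captures_;
};

void emplace_example()
{
  ToyRule rule;
  // Both overloads return a reference to the rule, so calls chain.
  rule.append_capture<NamedCapture>("href", "link")
      .append_capture(std::make_unique<NamedCapture>("src", "img"));
  assert(rule.capture_count() == 2);
}
```

Returning a reference to the object itself is what makes the chained calls possible; the caller never has to name the intermediate std::unique_ptr.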

Rule& hext::Rule::append_child ( Rule  new_child)

Appends a child.

Parameters
  new_child  The Rule to append.
Returns
A reference for this Rule to enable method chaining.
Rule& hext::Rule::append_match ( std::unique_ptr< Match >  match)

Appends a Match.

Parameters
  match  The Match to append.
Returns
A reference for this Rule to enable method chaining.
template<typename MatchType , typename... Args>
Rule& hext::Rule::append_match ( Args &&...  arg)
inline

Emplaces a Match.

Forwards arguments to std::make_unique.

Returns
A reference for this Rule to enable method chaining.

Definition at line 140 of file Rule.h.

Rule& hext::Rule::append_next ( Rule  sibling)

Appends a following Rule.

Parameters
  sibling  The Rule to append.
Returns
A reference for this Rule to enable method chaining.
std::vector<ResultPair> hext::Rule::capture ( const GumboNode *  node) const

Returns the result of applying every Capture to node.

Parameters
  node  A GumboNode that is to be captured.
const Rule* hext::Rule::child ( ) const
noexcept

Returns the first child or nullptr if childless.

Rule* hext::Rule::child ( )
noexcept

Returns the first child or nullptr if childless.

hext::Result hext::Rule::extract ( const Html &  html) const

Recursively extracts values from a hext::Html.

Returns
A vector containing maps filled with the captured name value pairs.
hext::Result hext::Rule::extract ( const GumboNode *  node) const

Recursively extracts values from a GumboNode.

Returns
A vector containing maps filled with the captured name value pairs.
HtmlTag hext::Rule::get_tag ( ) const
noexcept

Returns the HtmlTag this rule matches.

bool hext::Rule::is_optional ( ) const
noexcept

Returns true if this rule is optional, i.e. if a match does not have to be found.

bool hext::Rule::matches ( const GumboNode *  node) const

Returns true if this Rule matches node.

Parameters
  node  A GumboNode that is to be matched.
const Rule* hext::Rule::next ( ) const
noexcept

Returns the next rule or nullptr if no following rule.

Rule* hext::Rule::next ( )
noexcept

Returns the next rule or nullptr if no following rule.
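
Because child() and next() both return nullptr at the end of a chain, a rule tree can be walked with a plain loop plus recursion. The following sketch works with any type that offers the two accessors with the semantics documented here. ToyRule and count_rules() are hypothetical and exist only so that the snippet compiles without hext.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>

// Counts a rule and everything reachable through child() and next().
// Requires that child() and next() return a pointer to the first
// child or following sibling, or nullptr if there is none.
template<typename RuleLike>
std::size_t count_rules(const RuleLike* rule)
{
  std::size_t count = 0;
  for (; rule != nullptr; rule = rule->next())
    count += 1 + count_rules(rule->child());
  return count;
}

// Hypothetical stand-in for a rule with a child and a sibling link.
struct ToyRule {
  std::unique_ptr<ToyRule> first_child;
  std::unique_ptr<ToyRule> next_sibling;
  const ToyRule* child() const noexcept { return first_child.get(); }
  const ToyRule* next() const noexcept { return next_sibling.get(); }
};

void count_example()
{
  // One rule with one child, followed by one sibling: three rules.
  ToyRule anchor;
  anchor.first_child = std::make_unique<ToyRule>();
  anchor.next_sibling = std::make_unique<ToyRule>();
  assert(count_rules(&anchor) == 3);
}
```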

Rule& hext::Rule::operator= ( Rule &&  )
default noexcept
Rule& hext::Rule::operator= ( const Rule &  other)
Rule& hext::Rule::set_optional ( bool  optional)
noexcept

Sets whether this rule is optional, i.e. whether a match has to be found.

Returns
A reference for this Rule to enable method chaining.
Rule& hext::Rule::set_tag ( HtmlTag  tag)
noexcept

Sets the HtmlTag this rule matches.

Returns
A reference for this Rule to enable method chaining.

The documentation for this class was generated from the following file: