libhext: C++ Library Documentation
1.0.12-3ea013c
Classes
class AppendPipe
Appends a given string to a string.
class AttributeCapture
Captures an HTML element's attribute.
class AttributeCountMatch
Matches HTML elements that have a certain number of HTML attributes.
class AttributeMatch
Matches HTML elements that have an HTML attribute with a certain name and, optionally, whose value is matched by a ValueTest.
class BeginsWithTest
Tests whether a string begins with a given literal.
class Capture
Abstract base for every Capture.
class CasePipe
Changes the case of a string. Changes to lower case by default.
class ChildCountMatch
Matches HTML elements that have a certain number of children of type element (excluding text nodes, document nodes and others).
class Cloneable
Curiously recurring template pattern that extends a base class to provide a virtual method Cloneable::clone().
class CollapseWsPipe
Removes whitespace from the beginning and end of a string and collapses each run of whitespace into a single space.
class ContainsTest
Tests whether a string contains a given literal.
class ContainsWordsTest
Tests whether a string contains all given words.
class EndsWithTest
Tests whether a string ends with a given literal.
class EqualsTest
Tests whether a string equals a given literal.
class FunctionCapture
Captures the result of applying a function to an HTML node.
class FunctionMatch
Matches if applying a given MatchFunction to an HTML node returns true.
class FunctionValueMatch
Matches if the result of applying a given CaptureFunction to an HTML node passes a ValueTest.
class Html
An RAII wrapper for Gumbo.
class Match
Abstract base for every Match.
class MaxSearchError
The exception that is thrown when max_searches reaches zero while calling Rule::extract.
class NegateMatch
Matches HTML nodes for which every given Match returns false.
class NegateTest
Negates the result of another ValueTest.
class NthChildMatch
Matches HTML nodes having a certain position within their parent HTML element.
class OnlyChildMatch
Matches HTML nodes that are the only child of their parent HTML element.
class PrependPipe
Prepends a given string to a string.
class RegexPipe
Filters a string according to a given regex.
class RegexReplacePipe
Replaces a string within a string according to a given regex.
class RegexTest
Tests whether a string matches a given regex.
class Rule
Extracts values from HTML.
class StringPipe
Abstract base for every StringPipe.
class SyntaxError
The exception that is thrown when parsing invalid hext.
class TrimPipe
Trims characters from the beginning and end of a string.
class TypeRegexMatch
Matches the name of an HTML element against a regular expression.
class ValueTest
Abstract base for every ValueTest.
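The pipe classes listed above post-process a captured string. The standalone sketch below chains a few such transformations to show the idea; `PipeSketch`, its `transform` member, `RunPipes` and the other names are hypothetical and do not reproduce libhext's actual StringPipe interface. Classifying whitespace with std::isspace and lower-casing with std::tolower (C locale) are assumptions, since the class descriptions do not specify them.

```cpp
#include <cassert>
#include <cctype>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical pipe interface; libhext's StringPipe API may differ.
struct PipeSketch {
  virtual ~PipeSketch() = default;
  virtual std::string transform(std::string str) const = 0;
};

// Mimics CollapseWsPipe: drops leading and trailing whitespace and
// replaces every internal run of whitespace with one space.
struct CollapseWsSketch : PipeSketch {
  std::string transform(std::string str) const override {
    std::string out;
    bool pending_space = false;
    for (unsigned char c : str) {
      if (std::isspace(c)) {
        pending_space = !out.empty();  // whitespace before any text is ignored
      } else {
        if (pending_space) out += ' ';
        pending_space = false;
        out += static_cast<char>(c);
      }
    }
    return out;  // a trailing run of whitespace never emits a space
  }
};

// Mimics CasePipe's default behaviour: converts to lower case.
struct LowerCaseSketch : PipeSketch {
  std::string transform(std::string str) const override {
    for (char& ch : str)
      ch = static_cast<char>(std::tolower(static_cast<unsigned char>(ch)));
    return str;
  }
};

// Combines what PrependPipe and AppendPipe do into one illustrative pipe.
struct AffixSketch : PipeSketch {
  AffixSketch(std::string prefix, std::string suffix)
      : prefix_(std::move(prefix)), suffix_(std::move(suffix)) {}
  std::string transform(std::string str) const override {
    return prefix_ + str + suffix_;
  }

 private:
  std::string prefix_;
  std::string suffix_;
};

// Feeds the value through each pipe in order.
std::string RunPipes(const std::vector<std::unique_ptr<PipeSketch>>& pipes,
                     std::string value) {
  for (const auto& pipe : pipes) value = pipe->transform(std::move(value));
  return value;
}
```

For example, running "  Hello \n  WORLD  " through CollapseWsSketch, LowerCaseSketch and AffixSketch("[", "]"), in that order, yields "[hello world]". Order matters: appending before collapsing whitespace would treat the affixes as part of the text.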
Typedefs
using CaptureFunction = std::function<std::string(const GumboNode *)>
A type of std::function that receives an HTML element and returns a string.
using MatchFunction = std::function<bool(const GumboNode *)>
A type of std::function that receives an HTML element and returns a bool.
using ResultPair = std::pair<std::string, std::string>
A string pair containing a name and a value.
using ResultMap = std::multimap<ResultPair::first_type, ResultPair::second_type>
A multimap containing the values produced by capturing.
using Result = std::vector<ResultMap>
A vector of ResultMap objects.
Enumerations
enum class HtmlTag : int { HTML = GUMBO_TAG_HTML, HEAD = GUMBO_TAG_HEAD, TITLE = GUMBO_TAG_TITLE, BASE = GUMBO_TAG_BASE, LINK = GUMBO_TAG_LINK, META = GUMBO_TAG_META, STYLE = GUMBO_TAG_STYLE, SCRIPT = GUMBO_TAG_SCRIPT, NOSCRIPT = GUMBO_TAG_NOSCRIPT, TEMPLATE = GUMBO_TAG_TEMPLATE, BODY = GUMBO_TAG_BODY, ARTICLE = GUMBO_TAG_ARTICLE, SECTION = GUMBO_TAG_SECTION, NAV = GUMBO_TAG_NAV, ASIDE = GUMBO_TAG_ASIDE, H1 = GUMBO_TAG_H1, H2 = GUMBO_TAG_H2, H3 = GUMBO_TAG_H3, H4 = GUMBO_TAG_H4, H5 = GUMBO_TAG_H5, H6 = GUMBO_TAG_H6, HGROUP = GUMBO_TAG_HGROUP, HEADER = GUMBO_TAG_HEADER, FOOTER = GUMBO_TAG_FOOTER, ADDRESS = GUMBO_TAG_ADDRESS, P = GUMBO_TAG_P, HR = GUMBO_TAG_HR, PRE = GUMBO_TAG_PRE, BLOCKQUOTE = GUMBO_TAG_BLOCKQUOTE, OL = GUMBO_TAG_OL, UL = GUMBO_TAG_UL, LI = GUMBO_TAG_LI, DL = GUMBO_TAG_DL, DT = GUMBO_TAG_DT, DD = GUMBO_TAG_DD, FIGURE = GUMBO_TAG_FIGURE, FIGCAPTION = GUMBO_TAG_FIGCAPTION, MAIN = GUMBO_TAG_MAIN, DIV = GUMBO_TAG_DIV, A = GUMBO_TAG_A, EM = GUMBO_TAG_EM, STRONG = GUMBO_TAG_STRONG, SMALL = GUMBO_TAG_SMALL, S = GUMBO_TAG_S, CITE = GUMBO_TAG_CITE, Q = GUMBO_TAG_Q, DFN = GUMBO_TAG_DFN, ABBR = GUMBO_TAG_ABBR, DATA = GUMBO_TAG_DATA, TIME = GUMBO_TAG_TIME, CODE = GUMBO_TAG_CODE, VAR = GUMBO_TAG_VAR, SAMP = GUMBO_TAG_SAMP, KBD = GUMBO_TAG_KBD, SUB = GUMBO_TAG_SUB, SUP = GUMBO_TAG_SUP, I = GUMBO_TAG_I, B = GUMBO_TAG_B, U = GUMBO_TAG_U, MARK = GUMBO_TAG_MARK, RUBY = GUMBO_TAG_RUBY, RT = GUMBO_TAG_RT, RP = GUMBO_TAG_RP, BDI = GUMBO_TAG_BDI, BDO = GUMBO_TAG_BDO, SPAN = GUMBO_TAG_SPAN, BR = GUMBO_TAG_BR, WBR = GUMBO_TAG_WBR, INS = GUMBO_TAG_INS, DEL = GUMBO_TAG_DEL, IMAGE = GUMBO_TAG_IMAGE, IMG = GUMBO_TAG_IMG, IFRAME = GUMBO_TAG_IFRAME, EMBED = GUMBO_TAG_EMBED, OBJECT = GUMBO_TAG_OBJECT, PARAM = GUMBO_TAG_PARAM, VIDEO = GUMBO_TAG_VIDEO, AUDIO = GUMBO_TAG_AUDIO, SOURCE = GUMBO_TAG_SOURCE, TRACK = GUMBO_TAG_TRACK, CANVAS = GUMBO_TAG_CANVAS, MAP = GUMBO_TAG_MAP, AREA = GUMBO_TAG_AREA, MATH = GUMBO_TAG_MATH, MI = GUMBO_TAG_MI, MO = GUMBO_TAG_MO, MN = GUMBO_TAG_MN, MS = GUMBO_TAG_MS, MTEXT = GUMBO_TAG_MTEXT, MGLYPH = GUMBO_TAG_MGLYPH, MALIGNMARK = GUMBO_TAG_MALIGNMARK, ANNOTATION_XML = GUMBO_TAG_ANNOTATION_XML, SVG = GUMBO_TAG_SVG, FOREIGNOBJECT = GUMBO_TAG_FOREIGNOBJECT, DESC = GUMBO_TAG_DESC, TABLE = GUMBO_TAG_TABLE, CAPTION = GUMBO_TAG_CAPTION, COLGROUP = GUMBO_TAG_COLGROUP, COL = GUMBO_TAG_COL, TBODY = GUMBO_TAG_TBODY, THEAD = GUMBO_TAG_THEAD, TFOOT = GUMBO_TAG_TFOOT, TR = GUMBO_TAG_TR, TD = GUMBO_TAG_TD, TH = GUMBO_TAG_TH, FORM = GUMBO_TAG_FORM, FIELDSET = GUMBO_TAG_FIELDSET, LEGEND = GUMBO_TAG_LEGEND, LABEL = GUMBO_TAG_LABEL, INPUT = GUMBO_TAG_INPUT, BUTTON = GUMBO_TAG_BUTTON, SELECT = GUMBO_TAG_SELECT, DATALIST = GUMBO_TAG_DATALIST, OPTGROUP = GUMBO_TAG_OPTGROUP, OPTION = GUMBO_TAG_OPTION, TEXTAREA = GUMBO_TAG_TEXTAREA, KEYGEN = GUMBO_TAG_KEYGEN, OUTPUT = GUMBO_TAG_OUTPUT, PROGRESS = GUMBO_TAG_PROGRESS, METER = GUMBO_TAG_METER, DETAILS = GUMBO_TAG_DETAILS, SUMMARY = GUMBO_TAG_SUMMARY, MENU = GUMBO_TAG_MENU, MENUITEM = GUMBO_TAG_MENUITEM, APPLET = GUMBO_TAG_APPLET, ACRONYM = GUMBO_TAG_ACRONYM, BGSOUND = GUMBO_TAG_BGSOUND, DIR = GUMBO_TAG_DIR, FRAME = GUMBO_TAG_FRAME, FRAMESET = GUMBO_TAG_FRAMESET, NOFRAMES = GUMBO_TAG_NOFRAMES, ISINDEX = GUMBO_TAG_ISINDEX, LISTING = GUMBO_TAG_LISTING, XMP = GUMBO_TAG_XMP, NEXTID = GUMBO_TAG_NEXTID, NOEMBED = GUMBO_TAG_NOEMBED, PLAINTEXT = GUMBO_TAG_PLAINTEXT, RB = GUMBO_TAG_RB, STRIKE = GUMBO_TAG_STRIKE, BASEFONT = GUMBO_TAG_BASEFONT, BIG = GUMBO_TAG_BIG, BLINK = GUMBO_TAG_BLINK, CENTER = GUMBO_TAG_CENTER, FONT = GUMBO_TAG_FONT, MARQUEE = GUMBO_TAG_MARQUEE, MULTICOL = GUMBO_TAG_MULTICOL, NOBR = GUMBO_TAG_NOBR, SPACER = GUMBO_TAG_SPACER, TT = GUMBO_TAG_TT, RTC = GUMBO_TAG_RTC, UNKNOWN = GUMBO_TAG_UNKNOWN, ANY = 512 }
An enum containing all valid HTML tags.
Functions
HEXT_PUBLIC NthChildMatch::Option operator|(NthChildMatch::Option left, NthChildMatch::Option right) noexcept
Applies Bitwise-OR to NthChildMatch::Option.
HEXT_PUBLIC NthChildMatch::Option operator&(NthChildMatch::Option left, NthChildMatch::Option right) noexcept
Applies Bitwise-AND to NthChildMatch::Option.
HEXT_PUBLIC OnlyChildMatch::Option operator|(OnlyChildMatch::Option left, OnlyChildMatch::Option right) noexcept
Applies Bitwise-OR to OnlyChildMatch::Option.
HEXT_PUBLIC OnlyChildMatch::Option operator&(OnlyChildMatch::Option left, OnlyChildMatch::Option right) noexcept
Applies Bitwise-AND to OnlyChildMatch::Option.
HEXT_PUBLIC Rule ParseHext(const char *hext)
Parses a null-terminated string containing hext rule definitions.
HEXT_PUBLIC Rule ParseHext(const char *hext, std::size_t size)
Parses a buffer containing hext rule definitions.
Variables
HEXT_PUBLIC const CaptureFunction TextBuiltin
A CaptureFunction that returns the text of an HTML element.
HEXT_PUBLIC const CaptureFunction InnerHtmlBuiltin
A CaptureFunction that returns the inner HTML of an HTML element.
HEXT_PUBLIC const CaptureFunction StripTagsBuiltin
A CaptureFunction that returns the inner HTML of an HTML element without tags.
HEXT_PUBLIC const int version_major
Major version number.
HEXT_PUBLIC const int version_minor
Minor version number.
HEXT_PUBLIC const int version_patch
Patch version number.
using hext::CaptureFunction = std::function<std::string(const GumboNode *)>
A type of std::function that receives an HTML element and returns a string.
Definition at line 31 of file CaptureFunction.h.
using hext::MatchFunction = std::function<bool(const GumboNode *)>
A type of std::function that receives an HTML element and returns a bool.
Definition at line 30 of file MatchFunction.h.
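Because both aliases are plain std::function types, any lambda with the matching signature can serve as a CaptureFunction or MatchFunction. The sketch below is standalone: `FakeNode` is a hypothetical stand-in for GumboNode (the real struct is defined by Gumbo and is far richer), so only the shape of the two aliases carries over, not the node access.

```cpp
#include <cassert>
#include <functional>
#include <string>

// Hypothetical stand-in for GumboNode, reduced to what the example needs.
struct FakeNode {
  std::string tag;
  std::string text;
};

// Same shapes as hext::CaptureFunction and hext::MatchFunction,
// but over FakeNode instead of GumboNode.
using CaptureFunctionSketch = std::function<std::string(const FakeNode*)>;
using MatchFunctionSketch = std::function<bool(const FakeNode*)>;

// Captures the node's text, returning an empty string for a null pointer.
const CaptureFunctionSketch capture_text = [](const FakeNode* node) {
  return node ? node->text : std::string();
};

// Matches nodes whose tag is "a".
const MatchFunctionSketch is_anchor = [](const FakeNode* node) {
  return node != nullptr && node->tag == "a";
};
```

Guarding against a null pointer is a defensive choice of the sketch; whether libhext ever passes nullptr to these functions is not stated here.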
using hext::Result = std::vector<ResultMap>
using hext::ResultMap = std::multimap<ResultPair::first_type, ResultPair::second_type>
using hext::ResultPair = std::pair<std::string, std::string>
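Since ResultMap is a multimap, a single capture name can hold several values, so lookups should use equal_range rather than operator[] (which std::multimap does not provide). The standalone sketch below reproduces the three aliases as documented and reads a hand-built Result; the capture names `link` and `title` and the helper `ValuesFor` are made up for illustration.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Aliases as documented for namespace hext.
using ResultPair = std::pair<std::string, std::string>;
using ResultMap = std::multimap<ResultPair::first_type, ResultPair::second_type>;
using Result = std::vector<ResultMap>;

// Hypothetical helper: collects every value stored under `name`,
// in insertion order (std::multimap keeps equal keys in that order).
std::vector<std::string> ValuesFor(const ResultMap& map, const std::string& name) {
  std::vector<std::string> values;
  auto range = map.equal_range(name);
  for (auto it = range.first; it != range.second; ++it)
    values.push_back(it->second);
  return values;
}
```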
enum class hext::HtmlTag : int
An enum containing all valid HTML tags.
With the exception of HtmlTag::ANY, every HtmlTag can be cast to its GumboTag counterpart (same int value).
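The cast rule can be wrapped in a helper that refuses to convert HtmlTag::ANY. In this standalone sketch both enums are hypothetical miniatures with illustrative values (the real GumboTag comes from gumbo.h and the real HtmlTag from libhext); only the ANY = 512 value and the shared-int-value rule are taken from this page.

```cpp
#include <cassert>
#include <optional>

// Miniature stand-in for Gumbo's GumboTag; values are illustrative only.
enum GumboTagSketch { SKETCH_TAG_HTML = 0, SKETCH_TAG_A = 1, SKETCH_TAG_UNKNOWN = 2 };

// Miniature stand-in for hext::HtmlTag: same int values, plus ANY.
enum class HtmlTagSketch : int {
  HTML = SKETCH_TAG_HTML,
  A = SKETCH_TAG_A,
  UNKNOWN = SKETCH_TAG_UNKNOWN,
  ANY = 512
};

// Converts through the shared int value; ANY has no Gumbo counterpart.
std::optional<GumboTagSketch> ToGumbo(HtmlTagSketch tag) {
  if (tag == HtmlTagSketch::ANY) return std::nullopt;
  return static_cast<GumboTagSketch>(static_cast<int>(tag));
}
```

Returning std::optional makes the ANY case explicit at the call site instead of silently producing an out-of-range GumboTag value.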
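HEXT_PUBLIC NthChildMatch::Option hext::operator&(NthChildMatch::Option left, NthChildMatch::Option right) [inline, noexcept]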
Applies Bitwise-AND to NthChildMatch::Option.
Definition at line 169 of file NthChildMatch.h.
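HEXT_PUBLIC OnlyChildMatch::Option hext::operator&(OnlyChildMatch::Option left, OnlyChildMatch::Option right) [inline, noexcept]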
Applies Bitwise-AND to OnlyChildMatch::Option.
Definition at line 76 of file OnlyChildMatch.h.
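HEXT_PUBLIC NthChildMatch::Option hext::operator|(NthChildMatch::Option left, NthChildMatch::Option right) [inline, noexcept]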
Applies Bitwise-OR to NthChildMatch::Option.
Definition at line 160 of file NthChildMatch.h.
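HEXT_PUBLIC OnlyChildMatch::Option hext::operator|(OnlyChildMatch::Option left, OnlyChildMatch::Option right) [inline, noexcept]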
Applies Bitwise-OR to OnlyChildMatch::Option.
Definition at line 67 of file OnlyChildMatch.h.
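These operators let Option values be combined and tested as bit flags. Below is a minimal standalone sketch of the usual pattern for a scoped enum; `OptionSketch`, its enumerator names and the `HasFlag` helper are hypothetical and are not the actual members of NthChildMatch::Option or OnlyChildMatch::Option.

```cpp
#include <cassert>
#include <type_traits>

// Hypothetical flag enum; enumerator names are illustrative only.
enum class OptionSketch : unsigned {
  None = 0,
  First = 1u << 0,
  Last = 1u << 1,
  OfType = 1u << 2
};

// Bitwise OR on the underlying integer, converted back to the enum.
inline OptionSketch operator|(OptionSketch left, OptionSketch right) noexcept {
  using U = std::underlying_type_t<OptionSketch>;
  return static_cast<OptionSketch>(static_cast<U>(left) | static_cast<U>(right));
}

// Bitwise AND on the underlying integer, converted back to the enum.
inline OptionSketch operator&(OptionSketch left, OptionSketch right) noexcept {
  using U = std::underlying_type_t<OptionSketch>;
  return static_cast<OptionSketch>(static_cast<U>(left) & static_cast<U>(right));
}

// True if every bit of `flag` is set in `options`.
inline bool HasFlag(OptionSketch options, OptionSketch flag) noexcept {
  return (options & flag) == flag;
}
```

Returning the enum type (rather than the raw integer) keeps the combined value type-safe, which is presumably why the documented operators return Option.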
HEXT_PUBLIC Rule hext::ParseHext(const char *hext)
Parses a null-terminated string containing hext rule definitions.
Throws SyntaxError with a detailed error message on invalid input.
Exceptions: SyntaxError
Parameters:
hext: A null-terminated string containing hext rule definitions.
HEXT_PUBLIC Rule hext::ParseHext(const char *hext, std::size_t size)
Parses a buffer containing hext rule definitions.
Throws SyntaxError with a detailed error message on invalid input.
Exceptions: SyntaxError
Parameters:
hext: A string containing hext rule definitions.
size: The length of the string.
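The sized overload matters when the rule text sits in a buffer that is not null-terminated at the end of the rules, such as a std::string_view slice. The sketch below uses a hypothetical `ParseSketch` pair that merely copies the bytes it receives, so the example can show which bytes each overload sees; having the null-terminated overload forward to the sized one via std::strlen is one common design, and whether ParseHext does this internally is an assumption.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>
#include <string_view>

// Hypothetical stand-in for a parser: returns exactly the bytes it
// was asked to parse.
std::string ParseSketch(const char* buffer, std::size_t size) {
  return std::string(buffer, size);
}

// Null-terminated overload expressed through the sized one (assumed design).
std::string ParseSketch(const char* cstr) {
  return ParseSketch(cstr, std::strlen(cstr));
}
```

With a slice such as `source.substr(0, 15)` of a longer literal, passing only `data()` makes the null-terminated overload read to the end of the whole literal, while passing `data()` together with `size()` limits it to the slice.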
HEXT_PUBLIC const CaptureFunction hext::InnerHtmlBuiltin [extern]
A CaptureFunction that returns the inner HTML of an HTML element.
The intent is to mimic the DOM's innerHTML property.
Parameters:
node: A pointer to a GumboNode.
HEXT_PUBLIC const CaptureFunction hext::StripTagsBuiltin [extern]
A CaptureFunction that returns the inner HTML of an HTML element without tags.
Parameters:
node: A pointer to a GumboNode.
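As a rough, string-level illustration of "inner HTML without tags", the sketch below drops everything between '<' and '>' from an HTML string. This is a deliberate simplification and not libhext's implementation, which operates on GumboNode trees; `StripTagsSketch` is a hypothetical name, and the function does not handle comments, CDATA sections or '>' inside quoted attribute values.

```cpp
#include <cassert>
#include <string>

// Naive tag stripper: keeps characters outside '<' ... '>' and drops
// the rest. A '>' that is not inside a tag is kept as ordinary text.
std::string StripTagsSketch(const std::string& html) {
  std::string out;
  bool in_tag = false;
  for (char c : html) {
    if (c == '<') {
      in_tag = true;
    } else if (c == '>' && in_tag) {
      in_tag = false;
    } else if (!in_tag) {
      out += c;
    }
  }
  return out;
}
```

Working on the parsed tree, as the documented builtin does, avoids the edge cases listed above because Gumbo has already separated tags from text.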
HEXT_PUBLIC const CaptureFunction hext::TextBuiltin [extern]
A CaptureFunction that returns the text of an HTML element.
The intent is to mimic jQuery's text() or the DOM properties innerText and textContent.
Parameters:
node: A pointer to a GumboNode.
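A textContent-style extraction walks the element's subtree and concatenates its text nodes in document order. The standalone sketch below does this over a hypothetical `NodeSketch` tree instead of GumboNode; it adds and collapses no whitespace, and how TextBuiltin itself treats whitespace or block-level elements is not stated on this page.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical node type standing in for GumboNode: either a text node
// (is_text == true) or an element with children.
struct NodeSketch {
  bool is_text = false;
  std::string text;
  std::vector<NodeSketch> children;
};

// Appends the text of every descendant text node, depth first.
void AppendText(const NodeSketch& node, std::string& out) {
  if (node.is_text) {
    out += node.text;
    return;
  }
  for (const auto& child : node.children) AppendText(child, out);
}

// textContent-style result for one element.
std::string TextContentSketch(const NodeSketch& node) {
  std::string out;
  AppendText(node, out);
  return out;
}
```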
HEXT_PUBLIC const int hext::version_major [extern]
Major version number.
HEXT_PUBLIC const int hext::version_minor [extern]
Minor version number.
HEXT_PUBLIC const int hext::version_patch [extern]
Patch version number.
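The three version variables can be combined into a version string or a minimum-version check. So that the sketch compiles without libhext, it uses local stand-in constants whose values mirror the "1.0.12" shown at the top of this page; with libhext linked, read hext::version_major, hext::version_minor and hext::version_patch instead. The helper names are hypothetical.

```cpp
#include <cassert>
#include <string>
#include <tuple>

// Stand-ins for hext::version_major/minor/patch (values from "1.0.12").
constexpr int kVersionMajor = 1;
constexpr int kVersionMinor = 0;
constexpr int kVersionPatch = 12;

// Formats a version triple as "major.minor.patch".
std::string VersionString(int v_major, int v_minor, int v_patch) {
  return std::to_string(v_major) + "." + std::to_string(v_minor) + "." +
         std::to_string(v_patch);
}

// True if (have_*) >= (need_*), compared lexicographically via std::tie.
bool VersionAtLeast(int have_major, int have_minor, int have_patch,
                    int need_major, int need_minor, int need_patch) {
  return std::tie(have_major, have_minor, have_patch) >=
         std::tie(need_major, need_minor, need_patch);
}
```

Comparing the tuples lexicographically avoids the common mistake of checking each component independently (which would wrongly reject 2.0.0 when 1.5.0 is required).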