
Parsing with Boost.Spirit



Presentation Transcript


  1. Parsing with Boost.Spirit Rob Stewart robert.stewart@sig.com

  2. Overview • Introduction to Boost.Spirit • Parsing with Qi • Parsing ping command output • Problems using Qi

  3. Introduction to Boost.Spirit

  4. Introduction to Boost.Spirit • Three sub-libraries • Lex: Lexical analysis • Qi: Parsing • Karma: Generating output • DSELs • Clear, readable because targeted to domain • Use within your C++ code • No external tools required

  5. Boost.Spirit.Lex • Tokenizes input • Parses character sequence • Produces tokens • Applies your grammar • Separates tokenization from analysis • Reduces complexity of parser • Not covered in this presentation

  6. Boost.Spirit.Qi • Converts sequence of tokens or characters • Implements a recursive descent parser • Parsing Expression Grammar (PEG) based • Similar to Extended Backus-Naur Form (EBNF) • Not ambiguous • Well-suited to computer languages • Ill-suited to natural languages • Replaces uses of scanf(), regular expressions, and tokenizers • Much more powerful and flexible than common tools

  7. Boost.Spirit.Karma • Produces character sequence from data • Can replace uses of printf(), std::ostream, boost::format(), etc. • Much more powerful and flexible than common output tools • Inverse of Qi • Not covered in this presentation

  8. Parsing with Qi

  9. Parsing Basics • Iterate input sequence • Optionally tokenize • Apply grammar • Indicate a match • Produce side effects • Save text • Convert text to another type • Call a function

  10. Parsers like Function Objects • Arguments: Inherited Attributes • Return value: Synthesized Attribute • State

  11. Parser Concept bool parse(FwdIt, FwdIt, Context, Skipper, Attribute); info what(Context);

  12. Kinds of Parsers • Primitive • char_, float_, int_, lit, etc. • Rule • Placeholder for one or more parsers • Reusable • Support recursion • Have a name (empty by default) • Grammar: • Encapsulates a set of rules, parsers, and nested grammars • High level abstraction • Offers modularization and composition

  13. Parsers for doubles • To parse one double: boost::spirit::qi::double_ • To parse two whitespace-delimited doubles: double_ >> double_ • Parsing zero or more doubles: *double_ • Parsing a comma-delimited list of doubles: double_ >> *(lit(',') >> double_)

  14. Parsing a Comma-delimited List of doubles double_ >> *(lit(',') >> double_)

  15. Parsing a Comma-delimited List of doubles double_ >> *(lit(',') >> double_) Matches sign, mantissa, and exponent

  16. Parsing a Comma-delimited List of doubles double_ >> *(lit(',') >> double_) Left side might be followed by right side

  17. Parsing a Comma-delimited List of doubles double_ >> *(lit(',') >> double_) Kleene star: zero or more

  18. Parsing a Comma-delimited List of doubles double_ >> *(lit(',') >> double_) Matches a comma which won’t be added to the synthesized attribute


  21. Parsing a Comma-delimited List of doubles • double_ >> *(lit(',') >> double_) can be written more compactly as double_ % ',' • Qi extends the PEG operators for convenience

  22. Parsing Functions • boost::spirit::qi::parse() • Parses exactly what’s described by the supplied parser • Provides complete control over where whitespace may occur • Appropriate when parsing token sequences from Lex • boost::spirit::qi::phrase_parse() • Applies a skip parser between parsers comprising the main parser • Simplifies delimiter handling • Can disable for specific parts of the main parser

  23. Using parse() template <class It> bool matches(It _first, It _last) { return parse(_first, _last, double_ % ','); }

  24. Using phrase_parse() template <class It> bool matches(It _first, It _last) { return phrase_parse(_first, _last, double_ % ',', space); }

  25. Reality Isn’t Quite So Pretty #include <boost/spirit/include/qi.hpp> template <class It> bool matches(It _first, It _last) { using boost::spirit::qi::double_; using boost::spirit::qi::lit; using boost::spirit::qi::phrase_parse; using boost::spirit::ascii::space; return phrase_parse(_first, _last, double_ % ',', space); }

  26. Reality Isn’t Quite So Pretty #include <boost/spirit/include/qi.hpp> namespace qi = boost::spirit::qi; template <class It> bool matches(It _first, It _last) { using boost::spirit::ascii::space; return qi::phrase_parse(_first, _last, qi::double_ % ',', space); }

  27. Deconstructing phrase_parse() Calls template <class It> bool matches(It _first, It _last) { return phrase_parse( _first, _last, double_ % ',', space) && _first == _last; }

  28. Deconstructing phrase_parse() Calls template <class It> bool matches(It _first, It _last) { return phrase_parse( _first, _last, double_ % ',', space) && _first == _last; } Half open input range of characters

  29. Deconstructing phrase_parse() Calls template <class It> bool matches(It _first, It _last) { return phrase_parse( _first, _last, double_ % ',', space) && _first == _last; } The parser to apply

  30. Deconstructing phrase_parse() Calls template <class It> bool matches(It _first, It _last) { return phrase_parse( _first, _last, double_ % ',', space) && _first == _last; } The skip parser

  31. Deconstructing phrase_parse() Calls template <class It> bool matches(It _first, It _last) { return phrase_parse( _first, _last, double_ % ',', space) && _first == _last; } Check that the entire input range was consumed

  32. Example: Parsing ping Command Output

  33. ping Command Output PING www.google.com (74.125.131.147) 56(84) bytes of data. 64 bytes from vc-in-f147.1e100.net (74.125.131.147): icmp_seq=1 ttl=39 time=24.6 ms 64 bytes from vc-in-f147.1e100.net (74.125.131.147): icmp_seq=2 ttl=39 time=20.5 ms 64 bytes from vc-in-f147.1e100.net (74.125.131.147): icmp_seq=3 ttl=39 time=18.9 ms --- www.google.com ping statistics --- 3 packets transmitted, 3 received, 0% packet loss, time 2003ms rtt min/avg/max/mdev = 18.984/21.411/24.697/2.410 ms

  34. Creating the ping Parser template <class It, class Skipper> class ping::parser : public qi::grammar<It, Skipper> { public: parser() { // grammar here } private: // rules here };


  38. Creating the ping Parser public: parser() : parser::base_type(start, "ping parser") { } private: qi::rule<It,Skipper> start;


  41. start Rule PING www.google.com (74.125.131.147) 56(84) bytes of data. start = lit("PING") …

  42. start Rule PING www.google.com (74.125.131.147) 56(84) bytes of data. start = lit("PING") > host …

  43. start Rule PING www.google.com (74.125.131.147) 56(84) bytes of data. start = lit("PING") > host > ip_address …

  44. start Rule PING www.google.com (74.125.131.147) 56(84) bytes of data. start = lit("PING") > host > ip_address > +(char_ - '.') > '.' …

  45. start Rule PING www.google.com (74.125.131.147) 56(84) bytes of data. start = lit("PING") > host > ip_address > +(omit[char_] - '.') > '.' …

  46. start Rule PING www.google.com (74.125.131.147) 56(84) bytes of data. start = lit("PING") > host > ip_address > +(omit[char_] - '.') > '.' > eol …


  48. start Rule PING www.google.com (74.125.131.147) 56(84) bytes of data. 64 bytes from vc-in-f147.1e100.net (74.125.131.147): icmp_seq=1 ttl=39 time=24.6 ms 64 bytes from vc-in-f147.1e100.net (74.125.131.147): icmp_seq=2 ttl=39 time=20.5 ms 64 bytes from vc-in-f147.1e100.net (74.125.131.147): icmp_seq=3 ttl=39 time=18.9 ms start = lit("PING") > host > ip_address > +(omit[char_] - '.') > '.' > eol >> *(reply > eol) …

  49. start Rule PING www.google.com (74.125.131.147) 56(84) bytes of data. 64 bytes from vc-in-f147.1e100.net (74.125.131.147): icmp_seq=1 ttl=39 time=24.6 ms 64 bytes from vc-in-f147.1e100.net (74.125.131.147): icmp_seq=2 ttl=39 time=20.5 ms 64 bytes from vc-in-f147.1e100.net (74.125.131.147): icmp_seq=3 ttl=39 time=18.9 ms --- www.google.com ping statistics --- start = lit("PING") … >> *(reply > eol) > eol > +(omit[char_("A-Za-z0-9.-")]) > eol …

  50. start Rule PING www.google.com (74.125.131.147) 56(84) bytes of data. 64 bytes from vc-in-f147.1e100.net (74.125.131.147): icmp_seq=1 ttl=39 time=24.6 ms 64 bytes from vc-in-f147.1e100.net (74.125.131.147): icmp_seq=2 ttl=39 time=20.5 ms 64 bytes from vc-in-f147.1e100.net (74.125.131.147): icmp_seq=3 ttl=39 time=18.9 ms --- www.google.com ping statistics --- start = lit("PING") … >> *(reply > eol) > eol > +(omit[char_] - eol) > eol …
