Parsing with Boost.Spirit

Presentation Transcript

  1. Parsing with Boost.Spirit Rob Stewart robert.stewart@sig.com

  2. Overview • Introduction to Boost.Spirit • Parsing with Qi • Parsing ping command output • Problems using Qi

  3. Introduction to Boost.Spirit

  4. Introduction to Boost.Spirit • Three sub-libraries • Lex: Lexical analysis • Qi: Parsing • Karma: Generating output • DSELs • Clear, readable because targeted to domain • Use within your C++ code • No external tools required

  5. Boost.Spirit.Lex • Tokenizes input • Parses character sequence • Produces tokens • Applies your grammar • Separates tokenization from analysis • Reduces complexity of parser • Not covered in this presentation

  6. Boost.Spirit.Qi • Converts sequence of tokens or characters • Implements a recursive descent parser • Parsing Expression Grammar (PEG) based • Similar to Extended Backus-Naur Form (EBNF) • Not ambiguous • Well-suited to computer languages • Ill-suited to natural languages • Replaces uses of scanf(), regular expressions, and tokenizers • Much more powerful and flexible than common tools

  7. Boost.Spirit.Karma • Produces character sequence from data • Can replace uses of printf(), std::ostream, boost::format(), etc. • Much more powerful and flexible than common output tools • Inverse of Qi • Not covered in this presentation

  8. Parsing with Qi

  9. Parsing Basics • Iterate input sequence • Optionally tokenize • Apply grammar • Indicate a match • Produce side effects • Save text • Convert text to another type • Call a function

  10. Parsers like Function Objects • Arguments: Inherited Attributes • Return value: Synthesized Attribute • State

  11. Parser Concept bool parse(FwdIt, FwdIt, Context, Skipper, Attribute); info what(Context);

  12. Kinds of Parsers • Primitive • char_, float_, int_, lit, etc. • Rule • Placeholder for one or more parsers • Reusable • Support recursion • Have a name (empty by default) • Grammar: • Encapsulates a set of rules, parsers, and nested grammars • High level abstraction • Offers modularization and composition

  13. Parsers for doubles • To parse one double: boost::spirit::qi::double_ • To parse two whitespace-delimited doubles: double_ >> double_ • Parsing zero or more doubles: *double_ • Parsing a comma-delimited list of doubles: double_ >> *(lit(',') >> double_)

  14. Parsing a Comma-delimited List of doubles double_ >> *(lit(',') >> double_)

  15. Parsing a Comma-delimited List of doubles double_ >> *(lit(',') >> double_) Matches sign, mantissa, and exponent

  16. Parsing a Comma-delimited List of doubles double_ >> *(lit(',') >> double_) Left side might be followed by right side

  17. Parsing a Comma-delimited List of doubles double_ >> *(lit(',') >> double_) Kleene star: zero or more

  18. Parsing a Comma-delimited List of doubles double_ >> *(lit(',') >> double_) Matches a comma which won’t be added to the synthesized attribute

  21. Parsing a Comma-delimited List of doubles double_ >> *(lit(',') >> double_) is equivalent to double_ % ',' • Qi extends the PEG operators for convenience

  22. Parsing Functions • boost::spirit::qi::parse() • Parses exactly what’s described by the supplied parser • Provides complete control over where whitespace may occur • Appropriate when parsing token sequences from Lex • boost::spirit::qi::phrase_parse() • Applies a skip parser between parsers comprising the main parser • Simplifies delimiter handling • Can disable for specific parts of the main parser

  23. Using parse() template <class It> bool matches(It _first, It _last) { return parse(_first, _last, double_ % ','); }

  24. Using phrase_parse() template <class It> bool matches(It _first, It _last) { return phrase_parse(_first, _last, double_ % ',', space); }

  25. Reality Isn’t Quite So Pretty #include <boost/spirit/include/qi.hpp> template <class It> bool matches(It _first, It _last) { using boost::spirit::qi::double_; using boost::spirit::qi::lit; using boost::spirit::qi::phrase_parse; using boost::spirit::ascii::space; return phrase_parse(_first, _last, double_ % ',', space); }

  26. Reality Isn’t Quite So Pretty #include <boost/spirit/include/qi.hpp> namespace qi = boost::spirit::qi; template <class It> bool matches(It _first, It _last) { using boost::spirit::ascii::space; return qi::phrase_parse(_first, _last, qi::double_ % ',', space); }

  27. Deconstructing phrase_parse() Calls template <class It> bool matches(It _first, It _last) { return phrase_parse( _first, _last, double_ % ',', space) && _first == _last; }

  28. Deconstructing phrase_parse() Calls template <class It> bool matches(It _first, It _last) { return phrase_parse( _first, _last, double_ % ',', space) && _first == _last; } Half open input range of characters

  29. Deconstructing phrase_parse() Calls template <class It> bool matches(It _first, It _last) { return phrase_parse( _first, _last, double_ % ',', space) && _first == _last; } The parser to apply

  30. Deconstructing phrase_parse() Calls template <class It> bool matches(It _first, It _last) { return phrase_parse( _first, _last, double_ % ',', space) && _first == _last; } The skip parser

  31. Deconstructing phrase_parse() Calls template <class It> bool matches(It _first, It _last) { return phrase_parse( _first, _last, double_ % ',', space) && _first == _last; } Check that the entire input range was consumed

  32. Example: Parsing ping Command Output

  33. ping Command Output PING www.google.com (74.125.131.147) 56(84) bytes of data. 64 bytes from vc-in-f147.1e100.net (74.125.131.147): icmp_seq=1 ttl=39 time=24.6 ms 64 bytes from vc-in-f147.1e100.net (74.125.131.147): icmp_seq=2 ttl=39 time=20.5 ms 64 bytes from vc-in-f147.1e100.net (74.125.131.147): icmp_seq=3 ttl=39 time=18.9 ms --- www.google.com ping statistics --- 3 packets transmitted, 3 received, 0% packet loss, time 2003ms rtt min/avg/max/mdev = 18.984/21.411/24.697/2.410 ms

  34. Creating the ping Parser template <class It, class Skipper> class ping::parser : public qi::grammar<It, Skipper> { public: parser() { // grammar here } private: // rules here };

  38. Creating the ping Parser public: parser() : parser::base_type(start, "ping parser") { } private: qi::rule<It,Skipper> start;

  41. start Rule PING www.google.com (74.125.131.147) 56(84) bytes of data. start = lit("PING") …

  42. start Rule PING www.google.com (74.125.131.147) 56(84) bytes of data. start = lit("PING") > host …

  43. start Rule PING www.google.com (74.125.131.147) 56(84) bytes of data. start = lit("PING") > host > ip_address …

  44. start Rule PING www.google.com (74.125.131.147) 56(84) bytes of data. start = lit("PING") > host > ip_address > +(char_ - '.') > '.' …

  45. start Rule PING www.google.com (74.125.131.147) 56(84) bytes of data. start = lit("PING") > host > ip_address > +(omit[char_] - '.') > '.' …

  46. start Rule PING www.google.com (74.125.131.147) 56(84) bytes of data. start = lit("PING") > host > ip_address > +(omit[char_] - '.') > '.' > eol …

  47. start Rule PING www.google.com (74.125.131.147) 56(84) bytes of data. start = lit("PING") > host > ip_address > +(omit[char_] - '.') > '.' > eol …

  48. start Rule PING www.google.com (74.125.131.147) 56(84) bytes of data. 64 bytes from vc-in-f147.1e100.net (74.125.131.147): icmp_seq=1 ttl=39 time=24.6 ms 64 bytes from vc-in-f147.1e100.net (74.125.131.147): icmp_seq=2 ttl=39 time=20.5 ms 64 bytes from vc-in-f147.1e100.net (74.125.131.147): icmp_seq=3 ttl=39 time=18.9 ms start = lit("PING") > host > ip_address > +(omit[char_] - '.') > '.' > eol >> *(reply > eol) …

  49. start Rule PING www.google.com (74.125.131.147) 56(84) bytes of data. 64 bytes from vc-in-f147.1e100.net (74.125.131.147): icmp_seq=1 ttl=39 time=24.6 ms 64 bytes from vc-in-f147.1e100.net (74.125.131.147): icmp_seq=2 ttl=39 time=20.5 ms 64 bytes from vc-in-f147.1e100.net (74.125.131.147): icmp_seq=3 ttl=39 time=18.9 ms --- www.google.com ping statistics --- start = lit("PING") … >> *(reply > eol) > eol > +(omit[char_("A-Za-z0-9.-")]) > eol …

  50. start Rule PING www.google.com (74.125.131.147) 56(84) bytes of data. 64 bytes from vc-in-f147.1e100.net (74.125.131.147): icmp_seq=1 ttl=39 time=24.6 ms 64 bytes from vc-in-f147.1e100.net (74.125.131.147): icmp_seq=2 ttl=39 time=20.5 ms 64 bytes from vc-in-f147.1e100.net (74.125.131.147): icmp_seq=3 ttl=39 time=18.9 ms --- www.google.com ping statistics --- start = lit("PING") … >> *(reply > eol) > eol > +(omit[char_] - eol) > eol …