Perl 6 Update - PGE and Pugs

Perl 6 Update - PGE and Pugs. Dr. Patrick R. Michaud April 26, 2005. Rules and Grammars. Perl 6 completely redesigns the regular expression syntax Regular expressions are now "rules" Rules can call/embed other rules Groups of rules can be combined into Grammars. Current events in Perl 6.

Perl 6 Update - PGE and Pugs

Dr. Patrick R. Michaud

April 26, 2005


Rules and Grammars

  • Perl 6 completely redesigns the regular expression syntax

  • Regular expressions are now "rules"

  • Rules can call/embed other rules

  • Groups of rules can be combined into Grammars


Current events in Perl 6

  • Parrot 0.1.2 released

  • The Perl Foundation receives $25,000 for completion of Parrot milestones

  • New Parrot pumpking - Chip Salzenberg

  • New version of Parrot Grammar Engine (PGE / Perl 6 rules) to be released this week

  • Pugs - Autrijus Tang

    • Perl 6 test suite


Pugs

  • Perl 6 compiler written in Haskell

  • Started by Autrijus Tang

  • Compiles directly to Haskell or to Parrot AST

  • Being used to develop Perl 6 tests and experiment with Perl 6 design

  • Available at http://pugscode.org

  • Discussion on perl6-compiler@perl.org mailing list


Perl 6 rules / Parrot Grammar Engine

  • The heart of the Perl 6 compiler is the Perl/Parrot Grammar Engine (PGE)

  • Implements the Perl 6 rules syntax, compiles to Parrot code

  • Perl 6 rules compiler currently written in C

  • Bootstrap to Perl 6


Steps to Perl 6 compiler

  • Finish PGE bootstrap in C

    • Parse p6 "rule" statements and grammars

  • Use p6 rules to define the Perl 6 grammar

  • P6 grammar can be used to generate Parrot abstract syntax trees from Perl 6 programs

  • Compile, (optimize), execute the abstract syntax tree to get working Perl 6 program

  • Use Perl 6 to rewrite the grammar engine in Perl 6 (faster)


Current state of PGE

  • Handles concatenation, alternation, quantifiers, captures*, subpatterns, subrules

  • Capture semantics redefined in Dec 2004, still not final

  • To be added next

    • Character classes (note: Unicode)

    • Patterns containing scalars, arrays, hashes


P6 rule syntax

  • Changes from Perl 5

    • No more trailing /e, /x, /s options

    • [...] denotes non-capturing groups

    • ^ and $ are beginning/end of string

    • ^^ and $$ are beginning/end of line

    • . matches any character, including newline

    • \n and \N match newline/non-newline

    • # marks a comment (to end of line)

    • Quantifiers are *, +, ?, and **{m..n}
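
For readers coming from Perl 5-style engines, the anchor and dot changes listed above can be approximated with Python's `re` module. This is only a rough analogue for comparison, not PGE behavior, and the sample string is made up.

```python
import re

text = "first line\nsecond line"

# Perl 6 `^` / `$` always mean start/end of string: like \A and \Z in `re`.
whole = re.findall(r"\A\w+", text)

# Perl 6 `^^` / `$$` mean start/end of line: like ^ and $ under re.MULTILINE.
per_line = re.findall(r"^\w+", text, flags=re.MULTILINE)

# Perl 6 `.` matches newline too: like `.` under re.DOTALL.
dot_all = re.match(r".*", text, flags=re.DOTALL).group(0)

# Perl 6 `\N` (non-newline) behaves like the default `.` in `re`.
dot_default = re.match(r".*", text).group(0)
```

In other words, the behavior Perl 5 hid behind trailing /m and /s flags becomes the meaning of distinct metacharacters.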


Character classes

  • [aeiou] changed to <[aeiou]>

  • [^0-9] now <-[0..9]>

  • Properties defined as

    • <alpha>

    • <digit>

    • <alnum>

  • Combine classes using +/- syntax:

    • <+<alpha>-[aeiou]>
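
The set arithmetic in <+<alpha>-[aeiou]> has no direct counterpart in most Perl 5-style engines. A minimal Python sketch of the same idea, restricted to ASCII letters as a simplifying assumption (the slide notes that real classes are Unicode-aware), uses a negative lookahead to subtract the vowels. The helper name `consonants` is hypothetical.

```python
import re

# Rough equivalent of <+<alpha>-[aeiou]> for ASCII letters only:
# the negative lookahead removes vowels from the letter class.
consonant = re.compile(r"(?![aeiouAEIOU])[A-Za-z]")

def consonants(word: str) -> str:
    """Return only the ASCII consonants of `word`, in order."""
    return "".join(consonant.findall(word))
```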


Subrules

  • Patterns are now called "rules"

  • Analogous to subroutines and closures

  • Just as {...} compiles into a closure, /.../ compiles into a "rule" subroutine

  • P6 rule statement allows named rules:

    rule ident / [<alpha>|_] \w* /;

  • Named rules can be easily used in other rules:

    m / <ident> \:= (.*) /;

    rule expr / <term> [ <[+-]> <term> ]* /;
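
The composition shown above can be imitated in Python by splicing named pattern fragments into larger ones. This is a sketch under assumptions: the slides do not define `term`, so the definition below (identifier or integer) is invented for illustration, and whitespace handling is added by hand. Unlike real subrules, spliced strings cannot recurse or capture under their own names.

```python
import re

# Fragment standing in for `rule ident / [<alpha>|_] \w* /`.
IDENT = r"(?:[A-Za-z_]\w*)"

# Assumption: a term is an identifier or an integer (not from the slides).
TERM = rf"(?:{IDENT}|\d+)"

# rule expr / <term> [ <[+-]> <term> ]* /
EXPR = rf"{TERM}(?:\s*[+-]\s*{TERM})*"

# m / <ident> \:= (.*) /
ASSIGN = re.compile(rf"\s*({IDENT})\s*:=\s*(.*)")

def is_expr(s: str) -> bool:
    """True if the whole string is a +/- chain of terms."""
    return re.fullmatch(EXPR, s) is not None
```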


Interpolation

  • Variables no longer interpolate directly, thus

    / $var /

    matches the contents of $var literally, even if it contains rule metacharacters. (No \Q and \E)

  • To treat $var as a rule, use

    / <$var> /

  • Interpolated arrays match as an alternation:

    / @cmds /

    is equivalent to

    / [ @cmds[0] | @cmds[1] | @cmds[2] | ... ] /
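
In engines without these interpolation rules, the same two behaviors have to be spelled out by hand. A minimal Python sketch, with made-up sample values for `var` and `cmds`:

```python
import re

var = "a.b*c"                      # contains regex metacharacters
cmds = ["list", "load", "quit"]    # hypothetical command names

# / $var / : match the contents literally, so escape before splicing in.
literal = re.compile(re.escape(var))

# / @cmds / : match any one element, i.e. an alternation of the elements.
any_cmd = re.compile("|".join(re.escape(c) for c in cmds))
```

Here `literal` matches the text "a.b*c" but not "aXbbc", which the unescaped pattern would have accepted.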


Interpolation, cont'd

  • An interpolated hash matches one of its keys; the corresponding value is then

    • Executed if it is a closure

    • Treated as a subrule if it's a string or rule object

    • Succeeds if value is 1

    • Fails for any other value

  • Useful for parsed languages

    rule expr / <term> [ %infixop <expr> ]? /


< metasyntax >

  • The < ... > introduce various forms of metasyntax

  • A leading alphabetic character indicates a subrule or grammatical assertion

    <alpha>

    <expr>

    <before pattern>

    <after pattern>

  • A leading ! negates the match

    <!before pattern>


< metasyntax >

  • Leading ' matches a literal string

    <'match this exactly (whitespace matters)'>

  • Leading " matches an interpolated string

    <"match $THIS exactly (whitespace matters)">

  • Leading '+' or '-' are character classes

    /<-[a..z]> <-<alpha>>/


< metacharacters >

  • Leading '(' indicates code assertion

    /(\d**{1..3}) <( $1 < 256 )>/

    # (fail if $1 is not less than 256)

  • A leading $, @, or % indicates a variable subrule, where each value (or key) is a subrule to be matched

    <$myrule>

    <@cmds>

    <%commands>
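
The code assertion in the first bullet can be emulated outside the pattern. The Python sketch below assumes that a failed assertion triggers ordinary backtracking, so a greedy three-digit match that fails the test is retried with fewer digits. The function name `match_octet` is hypothetical.

```python
import re

def match_octet(s: str):
    """Emulate /(\\d**{1..3}) <( $1 < 256 )>/ anchored at the start of `s`."""
    for n in (3, 2, 1):            # greedy first, then backtrack to shorter
        m = re.match(r"\d{%d}" % n, s)
        if m and int(m.group(0)) < 256:
            return m.group(0)
    return None
```

Under that assumption, the input "999" still matches, but only its first two digits.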


A cool and somewhat scary example

%cmd{'^\d+'} = { say "You entered a number" };

%cmd{'^hello'} = { say "world" };

%cmd{'^print \s (.*)'} = { say $1; };

%cmd{'^exit'} = { exit() };

while =$*IN {

    /<%cmd>/ || say "Unrecognized command";

}
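
For comparison, here is a minimal Python sketch of the same dispatch idea: a mapping from patterns to handlers. It is an analogue, not a translation. Handlers return text instead of printing, the exit command is left out so the sketch stays testable, and trying keys in insertion order is an assumption (the slides do not say how the engine chooses among keys).

```python
import re

# Pattern -> handler, mirroring the %cmd hash.
CMDS = {
    r"^\d+": lambda m: "You entered a number",
    r"^hello": lambda m: "world",
    r"^print\s(.*)": lambda m: m.group(1),
}

def dispatch(line: str) -> str:
    """Try each key as a pattern; run the handler of the first that matches."""
    for pattern, handler in CMDS.items():
        m = re.search(pattern, line)
        if m:
            return handler(m)
    return "Unrecognized command"
```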


Backtracking control

  • A single colon prevents backtracking into the previous atom

    m/ \( <expr> [ , <expr> ]* : \) /

    (if we don't find closing paren, no point in trying to match fewer <expr>s)

  • Two colons commit to the current branch of an alternation:

    m:w/ [ if :: <expr> <block>

    | for :: <list> <block>

    | loop :: <loop_controls>? <block>

    ] /

    (once we've found "if", "for", or "loop", no point in trying the other branches of the alternation)


Backtracking control

  • Three colons (:::) fail the current rule if backtracked over

  • The <commit> assertion, if backtracked over, fails the entire match (including any rules that called the current rule)

  • The <cut> assertion matches successfully, removes the matched portion of the string up to the <cut>, and if backtracked over fails the match entirely

    • Useful for throwing away successfully processed input when matching from an input stream

    • Like, say, when writing a compiler :-)


Backslash

  • \L, \U, \Q, \E, \A, \z gone from rules

  • \n and \N match newline/not newline

  • \s matches any Unicode space

  • backreferences are gone, use $1, $2, $3 (non-interpolated)

  • Perl 6 allows defining custom backslash sequences for use in rules


Closures

  • Anything in curlies is executed as a Perl 6 closure

    / (\w+) { say "Got $1"; } /


Capture semantics

  • Captures are different in Perl 6

  • The result of a match is a "match object"

  • If a match succeeds, the match object has:

    • Boolean value true

    • Numeric value 1 (except for global matches)

    • String value the matched substring

    • Array component is matched subpatterns

    • Hash component is matched subrules
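
One way to picture a value that is a boolean, a number, a string, an array, and a hash all at once is a small class that answers each of those questions. The Python class below is a hypothetical illustration of the list above, not PGE's actual data structure.

```python
class Match:
    """Toy stand-in for a successful match object."""

    def __init__(self, text, positional=(), named=None):
        self.text = text                    # the matched substring
        self.positional = list(positional)  # subpattern captures
        self.named = dict(named or {})      # subrule captures

    def __bool__(self):
        return True                         # a successful match is true

    def __int__(self):
        return 1                            # numeric value (non-global match)

    def __str__(self):
        return self.text                    # string value

    def __getitem__(self, key):
        # Integer keys index subpatterns; string keys look up subrules.
        if isinstance(key, int):
            return self.positional[key]
        return self.named[key]
```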


Subpattern captures

  • Part of a rule in parentheses is a subpattern

  • Each subpattern produces its own match object

    /Scooby (dooby) (doo)!/
            $1      $2

  • Quantified subpatterns produce arrays of match objects:

    /Scooby (\w+ \s+)* (doo)!/
            $1         $2

    $1 is a (possibly empty) array of matches
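
By contrast, a Perl 5-style engine keeps only the last repetition of a quantified group. The Python sketch below recovers the full list by capturing the whole repeated span and splitting it afterwards. The explicit `\s+` after "Scooby" and the function name `scooby` are adaptations for this example.

```python
import re

def scooby(text: str):
    """Return (list of repeated words, final 'doo') or None."""
    # Capture the entire repeated region as one group...
    m = re.match(r"Scooby\s+((?:\w+\s+)*)(doo)!", text)
    if not m:
        return None
    # ...then split it into the individual repetitions.
    repeats = re.findall(r"\w+\s+", m.group(1))
    return repeats, m.group(2)
```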


Non-capturing groups

  • Brackets do not capture, thus they don't result in a match object

    /Scooby [ (\w+ \s+)* (doo) ]!/
              $1         $2

  • Quantified brackets replace nested subpatterns with the last component matched:

    /Scooby [ (\w+ \s+)* (doo) ]+ !/
              $1         $2


Nested capturing subpatterns

  • Each capturing subpattern introduces a new lexical scope, with nested captures inside the new match object:

    /Scooby ( (\w+ \s+)* (doo) ) !/
              $1[0]      $1[1]
            <------- $1 ------->


Alternations

  • Alternations introduce a new lexical scope, thus subpattern numbering restarts for each alternative branch (unlike Perl 5):

                  $1       $2
    m/ Scooby (dooby)* (doo)!
     | Yabba (dabba)* (doo) /
             $1       $2

    This avoids lots of empty subpatterns when an alternation doesn't match.
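
The "empty subpatterns" problem is easy to reproduce in an engine that numbers groups across the whole pattern, as Perl 5 does. A minimal Python illustration, with the pattern lightly adapted so whitespace is matched explicitly:

```python
import re

# Groups are numbered 1..4 across both branches, Perl 5 style.
pattern = r"Scooby (dooby )*(doo)!|Yabba (dabba )*(doo)"
m = re.match(pattern, "Yabba dabba doo")

# The first branch did not take part, so its two groups come back empty.
groups = m.groups()
```

Here the captures of interest land in groups 3 and 4, with groups 1 and 2 left as None.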


Subrules

  • Subrules capture into a hash keyed by the name of the subrule:

    rule ident / [<alpha>|_] \w* /;

    rule num / \d+ /;

    m/ <ident> \:= <num> /;

    places match objects into $<ident> and $<num>


Quantified subrules

  • Like subpatterns, quantified subrules produce arrays of matches

    m:w / dir <file>* /

    produces matches in $<file>[0], $<file>[1], etc.

  • Nested parens in a subrule capture to the subrule's match object


Named captures

  • Portions of a match can be captured directly into a match object without a subrule:

    m:w/ $<name> := \w+ , $<val> := \d+ /

    captures the first sequence of alphanumerics into $<name>, and digits following the comma into $<val>.
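
The closest counterpart in Python's `re` module is the named group. The sketch below mirrors the rule above, with explicit `\s*` standing in for the :w modifier and a made-up input string.

```python
import re

# (?P<name>...) plays the role of $<name> := ...
pair = re.compile(r"\s*(?P<name>\w+)\s*,\s*(?P<val>\d+)")

m = pair.match("width, 80")
captures = m.groupdict()   # named captures as a dictionary
```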


Grammars

  • Rules can be packaged together into separate name spaces to form Grammars

    grammar Perl6 {

    rule ident { ... };

    rule term { ... };

    rule expr { ... };

    }
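
A rough analogue of a grammar as a namespace of rules is a class whose methods each recognize one construct and call one another. The rule bodies are elided on the slide, so the ones in this Python sketch are assumptions borrowed from the earlier `ident` and `expr` examples. The class name `ToyGrammar`, the `term` definition, and the tuple-shaped tree are all hypothetical.

```python
import re

class ToyGrammar:
    """Each public method stands in for one named rule of a grammar."""

    IDENT = re.compile(r"[A-Za-z_]\w*")
    NUM = re.compile(r"\d+")
    OP = re.compile(r"[+-]")

    def __init__(self, text):
        self.text = text
        self.pos = 0

    def _token(self, regex):
        # Skip whitespace, then try the regex at that position.
        start = self.pos
        while start < len(self.text) and self.text[start].isspace():
            start += 1
        m = regex.match(self.text, start)
        if m is None:
            return None            # position is left unchanged on failure
        self.pos = m.end()
        return m.group(0)

    def ident(self):
        return self._token(self.IDENT)

    def term(self):
        # Assumption: a term is an identifier or an integer.
        return self.ident() or self._token(self.NUM)

    def expr(self):
        # expr: a term followed by any number of (+ or -) term pairs.
        tree = self.term()
        if tree is None:
            return None
        while True:
            saved = self.pos
            op = self._token(self.OP)
            right = self.term() if op else None
            if right is None:
                self.pos = saved   # undo a dangling operator
                return tree
            tree = (op, tree, right)
```

Calling `ToyGrammar("a + 2 - b").expr()` builds a left-nested tuple tree, a much-simplified stand-in for the syntax tree a real grammar would produce.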


:parsetree

  • The :parsetree flag to a rule causes the grammar engine to keep all information about a match.

  • Thus, one can do something like

    $parse = ($source ~~ Perl6::program);

    to get the entire parsetree for a program (including comments)


Questions?

