

HaskLex - The Haskell Lexer: Senior Seminar Final Project

Chris Lattner

April 2000



Outline

  • Goals

  • Design

  • Example internal code

  • Demonstration!

Chris Lattner - Senior Seminar - April 2000



HaskLex Goals

  • Write a lexer in Haskell!

    • Provide a usable regex dialect

      • *, +, ?, |, (), [] are required. ! and - are bonuses.

    • Provide “adequate” performance

      • Matching must be very fast; compiling a lexer may be slow, though we would prefer it not to be

    • Provide enough functionality to be useful

      • Metric: Can an assembler be written with it?




HaskLex Design

  • Four modules layered on top of each other

  • Each may be used independently

    • DFA provides ADT and match operations

    • NFA provides higher level abstraction

    • RegEx parses regular expression constructs

    • Lexer ties it all together



DFA Data Structure

  • Previously presented:

    • FA is a list of nodes

    • Each node contains:

      • “Finality” number

      • List of transitions

    • Ugly to read

    • Easy to implement

  • Structures:

    type FANode = (Int, [Int])

    newtype FA = F [FANode]

  • Example functions:

    • addNode, matchDFA, emptyFA, nukeNode, addFATransition, removeDeadStates
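
As a minimal sketch of this representation, the following builds a two-node DFA for the regex a+. It assumes (inferred from the internal match code, not stated on this slide) that each transition list has 256 entries indexed by character code, that -1 marks a missing transition, and that a finality of -1 marks a non-accepting node. The names mkNode, aPlus, and step are hypothetical and are not HaskLex functions.

```haskell
import Data.Char (ord)

-- Structures as given on the slide.
type FANode = (Int, [Int])   -- (finality number, transition list)
newtype FA = F [FANode]

-- Assumptions: -1 means "no transition" and, as a finality, "not accepting".
noTransition, notFinal :: Int
noTransition = -1
notFinal     = -1

-- Hypothetical helper: build a node from a finality number and a sparse
-- list of (character, target node index) pairs. Assumes 256 slots.
mkNode :: Int -> [(Char, Int)] -> FANode
mkNode final edges = (final, [ target c | c <- [0 .. 255] ])
  where
    byCode   = [ (ord ch, t) | (ch, t) <- edges ]
    target c = case lookup c byCode of
                 Just t  -> t
                 Nothing -> noTransition

-- A two-node DFA for "a+": node 0 is the start node, node 1 is
-- accepting with finality number 0.
aPlus :: FA
aPlus = F [ mkNode notFinal [('a', 1)]
          , mkNode 0        [('a', 1)]
          ]

-- Follow a single transition from the node at index n.
step :: FA -> Int -> Char -> Int
step (F nodes) n c = snd (nodes !! n) !! ord c
```

With these definitions, step aPlus 0 'a' moves to node 1, while step aPlus 0 'b' yields -1 because no transition exists.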




NFA Data Structure

  • Identical definition to DFA

  • Extra transitions on transition lists are considered to be “lambda” transitions (λ)

  • An unlimited number of λ transitions may come from any given node

  • May be converted to a DFA with the buildDFA function
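
The slides do not show how buildDFA works. A standard subset construction relies on computing the set of nodes reachable through λ transitions alone, and the sketch below shows that one step on the list-of-nodes representation. It assumes the first 256 entries of a transition list are the per-character transitions and everything after them is a λ transition; charSlots, lambdaTargets, lambdaClosure, and sample are hypothetical names.

```haskell
type FANode = (Int, [Int])
newtype FA = F [FANode]

-- Assumption: the first 256 entries of a transition list are character
-- transitions; any entries after them are lambda transitions.
charSlots :: Int
charSlots = 256

-- Targets of the lambda transitions leaving node n.
lambdaTargets :: FA -> Int -> [Int]
lambdaTargets (F nodes) n = drop charSlots (snd (nodes !! n))

-- All nodes reachable from the given nodes using only lambda transitions,
-- including the starting nodes themselves. Uses a simple worklist and
-- returns nodes in the order they were first visited.
lambdaClosure :: FA -> [Int] -> [Int]
lambdaClosure fa = go []
  where
    go seen []       = reverse seen
    go seen (n:rest)
      | n `elem` seen = go seen rest
      | otherwise     = go (n : seen) (lambdaTargets fa n ++ rest)

-- Hypothetical three-node NFA with lambda edges 0 -> 1, 0 -> 2, and
-- 2 -> 0 (a cycle), and no character transitions at all.
sample :: FA
sample = F [ (-1, blank ++ [1, 2])
           , ( 0, blank)
           , (-1, blank ++ [0])
           ]
  where
    blank = replicate charSlots (-1)
```

Here lambdaClosure sample [0] visits all three nodes, and the cycle through node 2 does not cause a loop because visited nodes are skipped.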



RegEx Data Structure

  • A simple string!

  • Primitives recognized:

    • “x” - Literal chars

    • “.” - Any character

    • “[a-z]” - Char classes

    • “(aa)*” - Grouping

  • Postfix operators:

    • “ab” - Juxtaposition

    • “x*” - Kleene closure

    • “x+” - Repetition

    • “x?” - Optionalization

    • “x!” - Inversion

  • Infix operators:

    • “a|b” - Alternation

    • “a-b” - Subtraction

  • Char classes provide the escaping mechanism (e.g. “[+]” for a literal +)

  • Use buildNFA to convert a regex to an NFA
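
To make the operator semantics concrete, here is a naive reference matcher over a small syntax tree. It is not how HaskLex works (HaskLex keeps regexes as strings and compiles them with buildNFA); the Re type and every function below are hypothetical. The sketch also assumes that inversion means "any string the operand does not match" and that subtraction binds as loosely as alternation, which the slides do not state.

```haskell
-- Hypothetical syntax tree for the operators listed on the slide.
data Re
  = Eps               -- the empty string
  | Lit Char          -- "x"
  | AnyChar           -- "."
  | Class [Char]      -- "[a-z]", expanded to its member characters
  | Seq Re Re         -- "ab"
  | Star Re           -- "x*"
  | Alt Re Re         -- "a|b"
  | Not Re            -- "x!"  (assumed: everything the operand rejects)
  | Minus Re Re       -- "a-b" (assumed: matches a but not b)

-- "x+" and "x?" expressed through the constructors above.
plus, opt :: Re -> Re
plus r = Seq r (Star r)
opt r  = Alt r Eps

-- Every way to cut a string into a prefix and a suffix.
splits :: [a] -> [([a], [a])]
splits s = [ splitAt n s | n <- [0 .. length s] ]

-- Does the regex match the whole string? Exponential, but easy to check.
matches :: Re -> String -> Bool
matches Eps         s = null s
matches (Lit c)     s = s == [c]
matches AnyChar     s = length s == 1
matches (Class cs)  s = case s of
                          [c] -> c `elem` cs
                          _   -> False
matches (Seq a b)   s = or [ matches a x && matches b y | (x, y) <- splits s ]
matches (Star r)    s = null s
                     || or [ matches r x && matches (Star r) y
                           | (x, y) <- splits s, not (null x) ]
matches (Alt a b)   s = matches a s || matches b s
matches (Not r)     s = not (matches r s)
matches (Minus a b) s = matches a s && not (matches b s)

-- "[-]?[0-9]*[.][0-9]*-[.]": a float pattern minus the lone ".".
float :: Re
float = Minus (Seq (opt (Lit '-'))
                   (Seq (Star digit) (Seq (Lit '.') (Star digit))))
              (Lit '.')
  where
    digit = Class ['0' .. '9']
```

Under these assumptions, float accepts "3.14" and "-.5" but rejects "." and "12", which is presumably the point of subtracting “[.]”.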



Lexer Data Structure

  • Contains:

    • Composite DFA

    • Mapping of “finality” states back to user-defined tokens

      • First entry of map (list) is error token

  • Uses all other modules

  • Structures:

    type Token a = (String, a)

    newtype Lexer a = Lexr (FA, [a])

  • Example functions:

    • compileLexer, lexFirstToken, lexIntoList, measureLexer, serializeLexer
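
One way the finality-to-token mapping could work is sketched below. The indexing scheme is an assumption, since the slides only say that the error token comes first: a finality of -1 is taken to mean "no rule matched", and finality k is taken to refer to the k-th rule, stored at index k + 1. tokenFor and mkToken are hypothetical names.

```haskell
type Token a = (String, a)

-- Assumed scheme: the token list stores the error token at index 0 and
-- the token for rule k at index k + 1; a negative finality means that
-- no rule matched.
tokenFor :: [a] -> Int -> a
tokenFor tokens finality
  | finality < 0 = head tokens
  | otherwise    = tokens !! (finality + 1)

-- Pair a matched lexeme with the token its finality number maps to.
mkToken :: [a] -> String -> Int -> Token a
mkToken tokens lexeme finality = (lexeme, tokenFor tokens finality)
```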




Example Lexer

    testLexer =
      compileLexer [
        ("[ \n\t]+", TokIgnore),                  -- Ignore whitespace
        (";[^\n]*", TokIgnore),                   -- Ignore comments...
        ("if?", TokIf),                           -- Recognize keywords
        ("t(hen)?", TokThen),
        ("e(lse)?", TokElse),
        ("w(hile)?", TokWhile),
        ("do?", TokDo),
        ("[+]", TokPlus),                         -- Recognize operators
        ("[-]?[0-9]+", TokInt),                   -- Recognize numbers
        ("[-]?[0-9]*[.][0-9]*-[.]", TokFloat),
        ("[a-zA-Z_][a-zA-Z0-9_]+", TokVar)        -- Recognize variables
      ] TokError                                  -- Error token
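
The example relies on a user-defined token type that the slide does not show. A plausible declaration, given purely as an assumption, is below. Eq and Show are derived because lexIntoList is declared with an Eq constraint and serializeLexer with a Show constraint.

```haskell
-- Hypothetical token type for the example lexer; the constructor names
-- are taken from the example, the declaration itself is assumed.
data Tok
  = TokIgnore
  | TokIf | TokThen | TokElse | TokWhile | TokDo
  | TokPlus
  | TokInt | TokFloat
  | TokVar
  | TokError
  deriving (Eq, Show)
```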




Lexer Usage

  • Several functions are available for using a lexer:

    lexIntoList :: Eq a => Lexer a -> String -> [Token a]

    • Lex a string into a list of tokens

    lexFirstToken :: Lexer a -> String -> Token a

    • Lex only the first token from the string

    lexFile :: Eq a => Lexer a -> FilePath -> IO [Token a]

    • Lex an entire file into a list of tokens
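
The relationship between these functions can be sketched as follows: a whole-string lexer can be layered on a first-token function by repeatedly taking one token and continuing after its lexeme. This is only a guess at the structure. In particular, dropping "ignore" tokens and stopping at the error token is just one plausible reason for the Eq constraint. firstToken is a hand-written stand-in for lexFirstToken applied to a compiled lexer, so that the sketch runs without HaskLex; lexAll and the small Tok type are hypothetical.

```haskell
import Data.Char (isDigit, isSpace)

type Token a = (String, a)

data Tok = TokIgnore | TokInt | TokPlus | TokError
  deriving (Eq, Show)

-- Stand-in for `lexFirstToken lexer` on a toy language of integers,
-- "+" and whitespace. Anything else becomes a one-character error token.
firstToken :: String -> Token Tok
firstToken s@(c:_)
  | isSpace c = (takeWhile isSpace s, TokIgnore)
  | isDigit c = (takeWhile isDigit s, TokInt)
  | c == '+'  = ("+", TokPlus)
firstToken s  = (take 1 s, TokError)

-- Repeatedly take the first token and continue after its lexeme.
-- Assumed policy: tokens equal to `ignore` are dropped, and lexing
-- stops after the first token equal to `err`.
lexAll :: Eq a => a -> a -> (String -> Token a) -> String -> [Token a]
lexAll ignore err first = go
  where
    go "" = []
    go s
      | tok == err    = [t]
      | tok == ignore = go rest
      | otherwise     = t : go rest
      where
        t@(lexeme, tok) = first s
        rest            = drop (length lexeme) s
```

For instance, lexAll TokIgnore TokError firstToken "12 + 7" evaluates to [("12",TokInt),("+",TokPlus),("7",TokInt)].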




But what about performance?

  • Problem: Lexer is slow to compile

    • Must parse the regexes, build the NFA, and reduce it to a DFA

  • Why not save the finished product?

    serializeLexer :: Show a => Lexer a -> String -> IO ()

    • Writes the compiled lexer to a file, allowing you to “import” it for later use

  • Result: “Compile” is very fast!
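
The slides do not show the file format. Since the saved lexer is meant to be “imported”, one plausible reading is that the file is a Haskell module defining the compiled lexer as a constant. The sketch below renders such a module using derived Show instances. renderLexerModule, saveLexerModule, savedLexer, and the imported module name Lexer are all hypothetical.

```haskell
type FANode = (Int, [Int])
newtype FA = F [FANode] deriving Show
newtype Lexer a = Lexr (FA, [a]) deriving Show

-- Render a compiled lexer as the source text of a Haskell module that
-- defines it as a constant. The import line names an assumed module
-- that would export the Lexr and F constructors.
renderLexerModule :: Show a => String -> Lexer a -> String
renderLexerModule modName lexer = unlines
  [ "module " ++ modName ++ " where"
  , "import Lexer"
  , "savedLexer = " ++ show lexer
  ]

-- Write the rendered module to "<modName>.hs".
saveLexerModule :: Show a => String -> Lexer a -> IO ()
saveLexerModule modName lexer =
  writeFile (modName ++ ".hs") (renderLexerModule modName lexer)
```

The appeal of this design is that loading the saved lexer needs no parsing of regexes and no NFA or DFA construction; the automaton is already a literal in the generated source.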




Internal DFA Match Code

    import Data.Char (ord)

    -- Returns (length of the longest match, finality number where it ends)
    matchDFALengthState :: FA -> String -> (Int, Int)
    matchDFALengthState (F dfa) = matchDFAh (dfa!!0) dfa  -- Node 0 is the start node

    matchDFAh :: FANode -> [FANode] -> String -> (Int, Int)
    matchDFAh (final, _) _ "" = (0, final)  -- Empty string...
    matchDFAh (final, transitions) dfa (s:str)
      | blocked   = (0, final)
      | otherwise = (sLen+1, sFin)
      where -- trans is the transition for the 's' character
        trans = transitions!!(ord s)
        (sLen, sFin) = matchDFAh (dfa!!trans) dfa str
        -- blocked: no transition on 's', or no accepting node further along
        blocked = trans == (-1) || sFin == (-1)
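
The recursion above decides the result on the way back out, so it needs stack space proportional to the length of the match. As an alternative sketch (not the author's code), the same longest-match result can be computed with an accumulator that remembers the last accepting position. matchLongest, mkNode, and sample are hypothetical names, and the 256-slot transition lists are an assumption.

```haskell
import Data.Char (ord)

type FANode = (Int, [Int])
newtype FA = F [FANode]

-- Longest-match scan with an accumulator. Returns (match length,
-- finality), falling back to (0, finality of the start node) when no
-- accepting node is reached.
matchLongest :: FA -> String -> (Int, Int)
matchLongest (F dfa) = go 0 (0, startFinal) start
  where
    start@(startFinal, _) = dfa !! 0

    go :: Int -> (Int, Int) -> FANode -> String -> (Int, Int)
    go _   best _                []     = best
    go pos best (_, transitions) (c:cs)
      | next == (-1) = best
      | otherwise    = pos' `seq` best' `seq` go pos' best' node cs
      where
        next          = transitions !! ord c
        node@(fin, _) = dfa !! next
        pos'          = pos + 1
        -- Remember this position only if the new node is accepting.
        best'         = if fin /= (-1) then (pos', fin) else best

-- Hypothetical helper: a node with 256 slots, -1 where no edge is given.
mkNode :: Int -> [(Char, Int)] -> FANode
mkNode final edges =
  (final, [ maybe (-1) id (lookup c byCode) | c <- [0 .. 255] ])
  where
    byCode = [ (ord ch, t) | (ch, t) <- edges ]

-- DFA for "a|abc": nodes 1 and 3 are accepting with finality 0.
sample :: FA
sample = F [ mkNode (-1) [('a', 1)]
           , mkNode 0    [('b', 2)]
           , mkNode (-1) [('c', 3)]
           , mkNode 0    []
           ]
```

On the input "abx" the scan passes through the accepting node after "a", fails to extend the match, and reports (1, 0), which is what the recursive version returns as well.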




Conclusions

  • Finite automata can be useful [but let someone else implement them!]

  • Writing a library isn’t as cool as writing an application

  • It is possible to write complex programs in Haskell, though painful at times [stack overflows]

  • Compiled Haskell [GHC] would probably solve many problems


