
Understanding the Computation Involved in Scanner Implementation

This presentation explains why the scanner and the parser are kept separate when processing programming language syntax. It examines the computation a scanner performs, works through an ad hoc scanner, and then covers the steps for generating a scanner from finite automata and minimizing the result.

athigpen


Presentation Transcript


  1. Programming Language Syntax 3 http://flic.kr/p/zCyMp

  2. Why separate scanner and parser? ANTLR generates for you … • The parser is much more computationally intensive than the scanner • The scanner considerably reduces the number of items that the parser must inspect. But what computation is involved in the scanner? That’s today’s topic…

  3. Consider these “calculator language” tokens

  4. Complete this ad hoc scanner Tokens

  5. Here’s a solution Tokens
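The token table and the solution from these slides are not reproduced in the transcript, so the following is only a minimal sketch of what an ad hoc (hand-written) scanner can look like. The token set (`:=`, the four arithmetic operators, parentheses, identifiers, unsigned integers), the function name `scan`, and the token labels are all assumptions made for illustration.

```python
def scan(text):
    """Hand-written scanner: return a list of (kind, lexeme) pairs.

    The token set is an assumed calculator-style language, not the slide's.
    """
    single = {"+": "plus", "-": "minus", "*": "times", "/": "div",
              "(": "lparen", ")": "rparen"}
    tokens = []
    i = 0
    while i < len(text):
        c = text[i]
        if c.isspace():                      # skip whitespace between tokens
            i += 1
        elif c in single:                    # one-character operators
            tokens.append((single[c], c))
            i += 1
        elif text.startswith(":=", i):       # two-character assignment
            tokens.append(("assign", ":="))
            i += 2
        elif c.isalpha():                    # identifier: letter (letter | digit)*
            j = i + 1
            while j < len(text) and text[j].isalnum():
                j += 1
            tokens.append(("id", text[i:j]))
            i = j
        elif c.isdigit():                    # number: digit+
            j = i + 1
            while j < len(text) and text[j].isdigit():
                j += 1
            tokens.append(("number", text[i:j]))
            i = j
        else:
            raise ValueError(f"unexpected character {c!r} at position {i}")
    return tokens
```

For example, `scan("x := 3 + y")` yields an `id`, an `assign`, a `number`, a `plus`, and another `id`. Each branch inspects the current character directly, which is what makes this style fast and compact but tedious to change when the token set changes.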

  6. Ad hoc scanner implementations common for production languages • Fast, compact code • But… • Finite automata can be generated automatically from a set of regular expressions • Good for developing languages • Easy to regenerate scanner

  7. Calculator language scanner automaton • Note that this is a deterministic finite automaton (DFA) • Only ever one possible transition for an input character
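Because a DFA has at most one transition per state and input character, running it needs only a single current state and one lookup per character. The sketch below shows this for a small assumed language (decimal numbers with exactly one point and at least one digit); the state numbers, `DELTA`, and `accepts` are hypothetical names, not the automaton from the slide.

```python
# Transition function of a small DFA: (state, symbol) -> next state.
# "d" stands for any decimal digit. States are numbered arbitrarily.
DELTA = {
    (0, "d"): 1, (0, "."): 2,   # start: digits first, or a leading point
    (1, "d"): 1, (1, "."): 3,   # digits seen, no point yet
    (2, "d"): 3,                # point seen, still need a digit
    (3, "d"): 3,                # point and at least one digit seen
}
START, ACCEPTING = 0, {3}


def accepts(text):
    """Return True if the DFA ends in an accepting state after reading text."""
    state = START
    for ch in text:
        symbol = "d" if ch in "0123456789" else ch
        if (state, symbol) not in DELTA:
            return False          # no transition defined: reject
        state = DELTA[(state, symbol)]
    return state in ACCEPTING
```

Under these assumptions `accepts("3.14")`, `accepts(".5")`, and `accepts("7.")` hold, while `accepts("42")` and `accepts("1.2.3")` do not. Since the dictionary maps each key to one value, the "only one possible transition" property is enforced by the data structure itself.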

  8. Three steps to generate a scanner • Generate a nondeterministic finite automaton (NFA) • Multiple transitions out of a state for the same character • Epsilon transitions (ε) • Convert the NFA to a DFA • No need to search all paths in a DFA • Optimize by minimizing the states in the DFA

  9. NFA building blocks

  10. Construct an NFA for this regex. Solve in this order: . d d* .d d. .d|d.

  11. A solution…
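The diagrams for the building blocks and the solution are not in the transcript. As a rough sketch of the idea, the code below composes NFA fragments for a single symbol, concatenation, alternation, and Kleene star (a Thompson-style construction), then simulates the result. The regex `d* ( .d | d. ) d*` is an assumption suggested by the sub-expressions listed on the slide, and all function names are hypothetical.

```python
from itertools import count

EPS = None            # label used for epsilon transitions
_fresh = count()      # supplies fresh state numbers


def symbol(label):
    """Fragment matching one symbol: (start, accept, edges)."""
    s, a = next(_fresh), next(_fresh)
    return s, a, [(s, label, a)]


def concat(x, y):
    (s1, a1, e1), (s2, a2, e2) = x, y
    return s1, a2, e1 + e2 + [(a1, EPS, s2)]


def alternate(x, y):
    (s1, a1, e1), (s2, a2, e2) = x, y
    s, a = next(_fresh), next(_fresh)
    return s, a, e1 + e2 + [(s, EPS, s1), (s, EPS, s2), (a1, EPS, a), (a2, EPS, a)]


def star(x):
    s1, a1, e1 = x
    s, a = next(_fresh), next(_fresh)
    # skip the body entirely, or loop back to repeat it
    return s, a, e1 + [(s, EPS, s1), (a1, EPS, a), (s, EPS, a), (a1, EPS, s1)]


def eps_closure(states, edges):
    """All states reachable from `states` using only epsilon edges."""
    stack, seen = list(states), set(states)
    while stack:
        q = stack.pop()
        for src, label, dst in edges:
            if src == q and label is EPS and dst not in seen:
                seen.add(dst)
                stack.append(dst)
    return seen


def nfa_accepts(nfa, text):
    """Simulate the NFA by tracking every state it might be in."""
    start, accept, edges = nfa
    current = eps_closure({start}, edges)
    for ch in text:
        label = "d" if ch in "0123456789" else ch
        moved = {dst for src, lab, dst in edges if src in current and lab == label}
        current = eps_closure(moved, edges)
    return accept in current


def decimal_nfa():
    """NFA for the assumed regex d* ( .d | d. ) d*, where d is any digit."""
    d = lambda: symbol("d")
    dot = lambda: symbol(".")
    middle = alternate(concat(dot(), d()), concat(d(), dot()))
    return concat(concat(star(d()), middle), star(d()))
```

Calling `nfa_accepts(decimal_nfa(), "3.14")` returns `True` and `nfa_accepts(decimal_nfa(), "42")` returns `False`. Note that the simulation has to carry a set of states, which is the cost the NFA-to-DFA conversion removes.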

  12. NFA to DFA conversion • “Set of subsets” construction • The state of the DFA after reading a given input represents the set of states the NFA might have reached • Example: the start state is {1, 2, 4}; on d, it moves to {2, 3, 4}
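The construction can be sketched in a few lines. The NFA below is an assumption: its states 1 to 4 form a `d*` fragment numbered so that the first two DFA states come out as the slide's example, {1, 2, 4} and {2, 3, 4}, and the remaining states (a point followed by one digit) are invented to make the example complete. The names `NFA`, `eps_closure`, and `subset_construction` are hypothetical.

```python
from collections import deque

EPS = None  # label for epsilon transitions

# Assumed NFA for d* . d : (state, label) -> set of next states
NFA = {
    (1, EPS): {2, 4},
    (2, "d"): {3},
    (3, EPS): {2, 4},
    (4, "."): {5},
    (5, "d"): {6},
}
NFA_START, NFA_ACCEPT = 1, 6


def eps_closure(states):
    """States reachable from `states` through epsilon transitions only."""
    stack, closure = list(states), set(states)
    while stack:
        q = stack.pop()
        for r in NFA.get((q, EPS), ()):
            if r not in closure:
                closure.add(r)
                stack.append(r)
    return frozenset(closure)


def subset_construction(alphabet=("d", ".")):
    """Build a DFA whose states are sets of NFA states."""
    start = eps_closure({NFA_START})
    delta, accepting = {}, set()
    queue, seen = deque([start]), {start}
    while queue:
        current = queue.popleft()
        if NFA_ACCEPT in current:
            accepting.add(current)
        for a in alphabet:
            moved = set()
            for q in current:
                moved |= NFA.get((q, a), set())
            if not moved:
                continue                      # no transition on this symbol
            target = eps_closure(moved)
            delta[(current, a)] = target
            if target not in seen:
                seen.add(target)
                queue.append(target)
    return start, delta, accepting
```

With this NFA, `subset_construction()` returns the start state `frozenset({1, 2, 4})`, and the transition on `"d"` from it leads to `frozenset({2, 3, 4})`. Only subsets that are actually reachable are created, so the DFA is usually far smaller than the full power set.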

  13. Given this NFA…

  14. Fill in the blanks

  15. The solution…

  16. Minimizing the DFA Steps: • Merge all non-final states into a single state and merge all final states into a single state (expect ambiguity) • For each ambiguous input, split states back to their original division until input is no longer ambiguous

  17. Minimizing the DFA, Step 1: Merge states into non-final and final

  18. Minimizing the DFA, Step 2 (repeated): Disambiguate by splitting. First, we disambiguate d. Now, how to disambiguate “.”?

  19. Minimizing the DFA, Step 2 (repeated): Disambiguate by splitting
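One way to read the merge-then-split steps is as partition refinement: start with two blocks (non-final and final) and keep splitting any block whose members disagree about which block an input leads to. The sketch below follows that reading; the four-state DFA with states `A` to `D` is an assumed example, not the one from the slides, and `minimize` is a hypothetical name.

```python
def minimize(states, alphabet, delta, accepting):
    """Return a list of blocks; each block is a set of equivalent states."""
    # Step 1: merge everything into a non-final block and a final block.
    partition = [b for b in (set(states) - set(accepting), set(accepting)) if b]
    changed = True
    while changed:
        changed = False
        for block in partition:
            for a in alphabet:
                # Group the block's states by the block their a-transition enters
                # (None means "no transition on a").
                groups = {}
                for q in block:
                    target = delta.get((q, a))
                    index = next(
                        (i for i, b in enumerate(partition) if target in b), None
                    )
                    groups.setdefault(index, set()).add(q)
                if len(groups) > 1:
                    # Step 2: the input is ambiguous for this block, so split it.
                    partition.remove(block)
                    partition.extend(groups.values())
                    changed = True
                    break
            if changed:
                break                         # restart with the new partition
    return partition


# Assumed DFA for d* . d in which A and B behave identically.
STATES = ["A", "B", "C", "D"]
DELTA = {
    ("A", "d"): "B", ("A", "."): "C",
    ("B", "d"): "B", ("B", "."): "C",
    ("C", "d"): "D",
}
blocks = minimize(STATES, ("d", "."), DELTA, {"D"})
```

Here the first pass splits `C` away from `A` and `B` because only `C` reaches the final state on `d`; after that no input distinguishes `A` from `B`, so they stay merged and the minimized DFA has three states.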

  20. How to implement a scanner based on an automaton? Two common approaches: • Nested case statements • Tend to be ad hoc • Table and driver • Tend to be generated

  21. Nested-case approach • Outer cases handle states • Inner cases handle transitions (set a new state) • Note: Look-ahead may be necessary to accept the longest possible token
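A minimal sketch of the nested-case shape follows, with Python `if`/`elif` chains standing in for case statements. The automaton is an assumption chosen to show the look-ahead issue: it recognizes integers (`d+`) and decimals (`d+ . d+`), so after reading `12.` the scanner does not yet know whether the point belongs to the token. The function name `next_token` and the state names are hypothetical.

```python
def next_token(text, pos):
    """Scan one token starting at pos; return (kind, lexeme, new_pos)."""
    state = "start"
    last_accept = None            # (kind, end) of the longest token seen so far
    i = pos
    while i < len(text):
        c = text[i]
        is_digit = c in "0123456789"
        if state == "start":                  # outer case: current state
            if is_digit:                      # inner case: input character
                state = "int"
            else:
                break
        elif state == "int":
            if is_digit:
                state = "int"
            elif c == ".":
                state = "dot"                 # not accepting: needs a digit next
            else:
                break
        elif state == "dot":
            if is_digit:
                state = "dec"
            else:
                break
        elif state == "dec":
            if is_digit:
                state = "dec"
            else:
                break
        i += 1
        if state in ("int", "dec"):           # remember the latest accepting point
            last_accept = (state, i)
    if last_accept is None:
        raise ValueError(f"no token at position {pos}")
    kind, end = last_accept
    return kind, text[pos:end], end           # back up to the longest token
```

For the input `"12.x"` the scanner reads past the point, fails to find a digit, and backs up to return the integer `12` with the new position at the point. Recording the last accepting position is one simple way to get the longest-possible-token behavior.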

  22. Table and driver approach. Good news! We’ll let ANTLR handle the implementation!
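To give a feel for what a generator produces, here is a hand-written sketch of the table-and-driver shape. It is not ANTLR output: the table, the character classes, and the names `TABLE`, `TOKEN_KIND`, and `tokenize` are assumptions, and the automaton is a small illustrative one for integers and decimals.

```python
# Transition table: state -> {character class -> next state}
TABLE = {
    "start": {"digit": "int"},
    "int":   {"digit": "int", "dot": "dot"},
    "dot":   {"digit": "dec"},
    "dec":   {"digit": "dec"},
}
# Accepting states and the token kind each one produces
TOKEN_KIND = {"int": "int", "dec": "dec"}


def char_class(c):
    """Map a character to the column it selects in the table."""
    if c in "0123456789":
        return "digit"
    if c == ".":
        return "dot"
    return "other"


def tokenize(text):
    """Generic driver: all language-specific knowledge lives in the tables."""
    tokens, pos = [], 0
    while pos < len(text):
        if text[pos].isspace():
            pos += 1
            continue
        state, i, last = "start", pos, None
        while i < len(text):
            nxt = TABLE[state].get(char_class(text[i]))
            if nxt is None:
                break                          # no transition: stop scanning
            state, i = nxt, i + 1
            if state in TOKEN_KIND:
                last = (TOKEN_KIND[state], i)  # longest accepting prefix so far
        if last is None:
            raise ValueError(f"no token at position {pos}")
        kind, end = last
        tokens.append((kind, text[pos:end]))
        pos = end
    return tokens
```

Under these assumptions `tokenize("12 3.5 7")` returns an `int`, a `dec`, and an `int` token. The driver loop never changes when the language does; only the tables are regenerated, which is why this form suits automatic generation.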

  23. What’s next? • Homework 1 due next class!
