
Lexical Analysis


Presentation Transcript


  1. Lexical Analysis • Uses formalism of Regular Languages • Regular Expressions • Deterministic Finite Automata (DFA) • Non-deterministic Finite Automata (NDFA) • RE → NDFA → DFA → minimal DFA • (F)Lex uses RE as input, builds lexer

  2. Regular Expressions • Regular expression (over Σ) • ∅ • ε • a where a ∈ Σ • r+r’ • r r’ • r* • where r, r’ regular (over Σ) • Notational shorthand: • r^0 = ε, r^i = r r^(i-1) • r^+ = r r*

  3. DFAs: Formal Definition • DFA M = (Q, Σ, δ, q0, F) • Q = states, a finite set • Σ = alphabet, a finite set • δ = transition function, a function in Q × Σ → Q • q0 = initial/starting state, q0 ∈ Q • F = final states, F ⊆ Q

  4. DFAs: Example • Strings over {a,b} with next-to-last symbol = a • [State diagram: states ε, a, b, …aa, …ab, …ba, …bb, each recording the last symbols read, with transitions on a and b]

  5. Nondeterministic Finite Automata • “Nondeterminism” implies having a choice. • Multiple possible transitions from a state on a symbol. • δ(q,a) is a set of states: δ : Q × Σ → Pow(Q) • Can be empty, so no need for error/nonsense state. • Acceptance: does a path to a final state exist? I.e., try all choices. • Also allow transitions on no input: δ : Q × (Σ ∪ {ε}) → Pow(Q)

  6. NFAs: Example • Strings over {a,b} with next-to-last symbol = a • [State diagram: the start state … loops on Σ, moves on a to state …a, which moves on Σ to state …aΣ] • Loop until we “guess” which is the next-to-last a.
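"Try all choices" can be implemented by tracking the set of states the NFA could be in after each symbol. The sketch below assumes three states with the made-up names `loop`, `seen_a` and `done`; ε-transitions are left out because this example does not need them.

```python
# Hypothetical state names: "loop" (still guessing), "seen_a" (guessed that
# the 'a' just read is next-to-last), "done" (one more symbol was read).
DELTA = {
    ("loop", "a"): {"loop", "seen_a"},  # the nondeterministic choice
    ("loop", "b"): {"loop"},
    ("seen_a", "a"): {"done"},
    ("seen_a", "b"): {"done"},
    # No entries for "done": delta yields the empty set, so no error state.
}
START = "loop"
FINAL = {"done"}

def accepts(text):
    """Accept if some sequence of choices ends in a final state."""
    current = {START}
    for symbol in text:
        current = set().union(*(DELTA.get((q, symbol), set()) for q in current))
    return bool(current & FINAL)

print(accepts("aab"))  # True
print(accepts("ba"))   # False
```

Tracking state sets this way is the same idea that the NDFA → DFA conversion applies ahead of time.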

  7. CFGs: Formal Definition • G = (V, Σ, P, S) • V = variables, a finite set • Σ = alphabet or terminals, a finite set • P = productions, a finite set • S = start variable, S ∈ V • Productions’ form, where A ∈ V, α ∈ (V ∪ Σ)*: • A → α

  8. CFGs: Derivations • Derivations in one step: βAγ ⇒_G βαγ if A → α ∈ P, where x ∈ Σ*, α,β,γ ∈ (V ∪ Σ)* • Can choose any variable for use for derivation step. • Derivations in zero-or-more steps: ⇒_G* is the reflexive and transitive closure of ⇒_G. • Language of a grammar: L(G) = {x ∈ Σ* | S ⇒_G* x}

  9. Parse Trees • S → A | A B • A → ε | a | A b | A A • B → b | bc | B c | b B • [Parse tree for aabb: S over A B; A over A A; B over b B; leaves a a b b] • Root label = start variable. • Each interior label = variable. • Each parent/child relation = derivation step. • Each leaf label = terminal or ε. • All leaf labels together = derived string = yield. • Sample derivations: • S ⇒ AB ⇒ AAB ⇒ aAB ⇒ aaB ⇒ aabB ⇒ aabb • S ⇒ AB ⇒ AbB ⇒ Abb ⇒ AAbb ⇒ Aabb ⇒ aabb • These two derivations use the same productions, but in different orders.

  10. Left- & Rightmost Derivations • S → A | A B • A → ε | a | A b | A A • B → b | bc | B c | b B • [Parse tree for aabb: S over A B; A over A A; B over b B; leaves a a b b] • Sample derivations: • S ⇒ AB ⇒ AAB ⇒ aAB ⇒ aaB ⇒ aabB ⇒ aabb • S ⇒ AB ⇒ AbB ⇒ Abb ⇒ AAbb ⇒ Aabb ⇒ aabb • These two derivations are special. • 1st derivation is leftmost. • Always picks leftmost variable. • 2nd derivation is rightmost. • Always picks rightmost variable.

  11. Disambiguation Example • Exp → n | Exp + Exp | Exp × Exp • What is an equivalent unambiguous grammar? • Exp → Term | Term + Exp • Term → n | n × Term • Uses • operator precedence • right-associativity (both productions recurse on the right)
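A small recursive-descent parser makes the effect of the unambiguous grammar visible. This is a sketch under assumptions: tokens are separated by spaces, the multiplication operator is written `*`, and trees are nested tuples; none of these choices come from the slide.

```python
def parse_exp(tokens, pos=0):
    """Exp -> Term | Term + Exp. Returns (tree, next position)."""
    left, pos = parse_term(tokens, pos)
    if pos < len(tokens) and tokens[pos] == "+":
        right, pos = parse_exp(tokens, pos + 1)
        return ("+", left, right), pos
    return left, pos

def parse_term(tokens, pos):
    """Term -> n | n * Term. Returns (tree, next position)."""
    if pos >= len(tokens) or tokens[pos] != "n":
        raise SyntaxError(f"expected n at position {pos}")
    pos += 1
    if pos < len(tokens) and tokens[pos] == "*":
        right, pos = parse_term(tokens, pos + 1)
        return ("*", "n", right), pos
    return "n", pos

def parse(text):
    """Parse a whole space-separated token string into a tuple tree."""
    tokens = text.split()
    tree, pos = parse_exp(tokens)
    if pos != len(tokens):
        raise SyntaxError(f"unexpected token at position {pos}")
    return tree

print(parse("n * n + n"))  # ('+', ('*', 'n', 'n'), 'n')
print(parse("n + n + n"))  # ('+', 'n', ('+', 'n', 'n'))
```

Each input has exactly one tree: multiplication always ends up below addition, and repeated operators nest to the right because the productions recurse on their right-hand operand.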

  12. Parsing Designations • Major parsing algorithm classes are LL and LR • The first letter indicates the order in which the input is read – L means left to right • The second letter is the kind of derivation built: L = leftmost derivation (top down), R = rightmost derivation in reverse (bottom up) • The k of LL(k) or LR(k) is the number of input symbols of lookahead used during parsing • Power of parsing techniques • LL(k) < LR(k) • LL(n) < LL(n+1), LR(n) < LR(n+1) • Choice of LL or LR largely religious

  13. Items and Itemsets • An itemset is merely a set of items • In LR parsing terminology an item • Looks like a production with a ‘.’ in it • The ‘.’ indicates how far the parse has gone in recognizing a string that matches this production • e.g. A -> aAb.BcC suggests that we’ve “seen” input that could replace aAb. If, by following the rules, we reach A -> aAbBcC. (with the ‘.’ at the end), we can reduce by A -> aAbBcC

  14. Building LR(0) Itemsets • Start with an augmented grammar; if S is the grammar start symbol add S’ -> S • The first set of items is the closure of {S’ -> .S} • Itemset construction requires two functions • Closure • Goto

  15. Closure of LR(0) Itemset • If J is a set of items for Grammar G, then closure(J) is the set of items constructed from G by two rules • 1) Each item in J is added to closure(J) • 2) If A → α.Bβ is in closure(J) and B → φ is a production, add B → .φ to closure(J); repeat until nothing new can be added

  16. Closure Example • Grammar: A → aBC, A → aA, B → bB, B → bC, C → cC, C → λ • J = {A → a.BC, A → a.A} • Closure(J) = {A → a.BC, A → a.A, A → .aBC, A → .aA, B → .bB, B → .bC}
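The closure example can be reproduced with a short worklist loop. The item encoding used here, a tuple `(lhs, rhs, dot)` where `dot` is the position of the ‘.’, is an assumption made for this sketch.

```python
# The example grammar; the empty tuple stands for the λ right-hand side.
GRAMMAR = [
    ("A", ("a", "B", "C")),
    ("A", ("a", "A")),
    ("B", ("b", "B")),
    ("B", ("b", "C")),
    ("C", ("c", "C")),
    ("C", ()),
]

def closure(items):
    """LR(0) closure: keep adding B -> .phi while some item has '.' before B."""
    result = set(items)
    work = list(items)
    while work:
        lhs, rhs, dot = work.pop()
        if dot < len(rhs):
            symbol = rhs[dot]
            for head, body in GRAMMAR:
                new_item = (head, body, 0)
                if head == symbol and new_item not in result:
                    result.add(new_item)
                    work.append(new_item)
    return result

def show(item):
    """Render an item in the slide's dotted notation."""
    lhs, rhs, dot = item
    return f"{lhs} -> {''.join(rhs[:dot])}.{''.join(rhs[dot:])}"

J = {("A", ("a", "B", "C"), 1), ("A", ("a", "A"), 1)}
for item in sorted(closure(J)):
    print(show(item))
```

No C items appear in the result because no item in the closure has its ‘.’ directly in front of C.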

  17. GoTo • Goto(J,X), where J is a set of items and X is a grammar symbol (either terminal or non-terminal), is defined to be the closure of the set of items A → αX.β for A → α.Xβ in J • So, in English, Goto(J,X) takes all items in J which have a ‘.’ immediately preceding X, moves the ‘.’ past X, and forms the closure of the result

  18. Set of Items Construction • Procedure items(G’) • Begin • C = {closure({[S’ → .S]})} • repeat • for each set of items J in C and each grammar symbol X such that GoTo(J,X) is not empty and not in C do • add GoTo(J,X) to C • until no more sets of items can be added to C
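The construction procedure can be sketched directly from closure and goto. The example below uses the augmented grammar S’ → S, S → (S), S → λ; the item encoding `(lhs, rhs, dot)` and the function names are assumptions made for illustration.

```python
GRAMMAR = [
    ("S'", ("S",)),          # augmenting production
    ("S", ("(", "S", ")")),
    ("S", ()),               # S -> λ
]
SYMBOLS = ["S", "(", ")"]

def closure(items):
    """LR(0) closure of a set of (lhs, rhs, dot) items."""
    result = set(items)
    work = list(items)
    while work:
        lhs, rhs, dot = work.pop()
        if dot < len(rhs):
            for head, body in GRAMMAR:
                new_item = (head, body, 0)
                if head == rhs[dot] and new_item not in result:
                    result.add(new_item)
                    work.append(new_item)
    return frozenset(result)

def goto(items, symbol):
    """Move the '.' past symbol wherever possible, then close the result."""
    moved = {(lhs, rhs, dot + 1)
             for lhs, rhs, dot in items
             if dot < len(rhs) and rhs[dot] == symbol}
    return closure(moved)

def build_itemsets():
    """Collect itemsets until no new non-empty goto target appears."""
    collection = [closure({("S'", ("S",), 0)})]
    for itemset in collection:  # the list grows while it is being scanned
        for symbol in SYMBOLS:
            target = goto(itemset, symbol)
            if target and target not in collection:
                collection.append(target)
    return collection

print(len(build_itemsets()))  # 5 itemsets for this grammar
```

Scanning the growing list plays the role of the repeat/until loop: the scan ends exactly when a full pass adds nothing new.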

  19. Build LR(0) Itemsets for: • {S → (S), S → λ} • {S → (S), S → SS, S → λ}

  20. Building LR(0) Table from Itemsets • One row for each Itemset • One column for each terminal or non-terminal symbol, and one for $ • Table [J][X] is: • Rn if J includes A → rhs., A → rhs is rule number n, and X is a terminal • Sn if Goto(J,X) is itemset n

  21. LR(0) Parse Table for: • {S → (S), S → λ} • {S → (S), S → SS, S → λ}

  22. Building SLR Table from Itemsets • One row for each Itemset • One column for each terminal or non-terminal symbol, and one for $ • Table [J][X] is: • Rn if J includes A → rhs., A → rhs is rule number n, X is a terminal, AND X is in Follow(A) • Sn if Goto(J,X) is itemset n

  23. LR(0) and LR(1) Items • LR(0) item “is” a production with a ‘.’ in it. • LR(1) item has a “kernel” that looks like LR(0), but also has a “lookahead” – e.g. A → α.Xβ, {terminals} • A → α.Xβ, a/b/c ≠ A → α.Xβ, a/b/d

  24. Closure of LR(1) Itemset • If J is a set of LR(1) items for Grammar G, then closure(J) includes • 1) Each LR(1) item in J • 2) If A → α.Bβ, a is in closure(J) and B → φ is a production, add B → .φ, First(β,a) to closure(J)
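The LR(1) closure rule can be sketched as follows, reading First(β,a) as the FIRST set of β followed by a. The sample grammar is S’ → S, S → CC, C → cC, C → d. Two simplifications are assumptions of this sketch: each item carries a single lookahead terminal instead of a set, and FIRST is computed only for grammars without λ-productions.

```python
GRAMMAR = [
    ("S'", ("S",)),
    ("S", ("C", "C")),
    ("C", ("c", "C")),
    ("C", ("d",)),
]
NONTERMINALS = {head for head, _ in GRAMMAR}

def first(symbol, seen=frozenset()):
    """FIRST of one symbol; valid only because no production derives λ."""
    if symbol not in NONTERMINALS:
        return {symbol}
    if symbol in seen:  # guard against left recursion
        return set()
    result = set()
    for head, body in GRAMMAR:
        if head == symbol:
            result |= first(body[0], seen | {symbol})
    return result

def closure(items):
    """LR(1) closure of a set of (lhs, rhs, dot, lookahead) items."""
    result = set(items)
    work = list(items)
    while work:
        lhs, rhs, dot, lookahead = work.pop()
        if dot < len(rhs) and rhs[dot] in NONTERMINALS:
            rest = rhs[dot + 1:] + (lookahead,)  # the string "beta a"
            for terminal in first(rest[0]):
                for head, body in GRAMMAR:
                    new_item = (head, body, 0, terminal)
                    if head == rhs[dot] and new_item not in result:
                        result.add(new_item)
                        work.append(new_item)
    return result

start = closure({("S'", ("S",), 0, "$")})
for item in sorted(start):
    print(item)
```

For this grammar the start itemset holds S’ → .S with lookahead $, S → .CC with lookahead $, and both C productions with lookaheads c and d, written C → .cC, c/d and C → .d, c/d in the grouped notation.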

  25. LR(1) Itemset Construction • Procedure items(G’) • Begin • C = {closure({[S’ → .S, $]})} • repeat • for each set of items J in C and each grammar symbol X such that GoTo(J,X) is not empty and not in C do • add GoTo(J,X) to C • until no more sets of items can be added to C

  26. Build LR(1) Itemsets for: • {S → (S), S → SS, S → λ}

  27. {S → CC, C → cC, C → d} • Is this grammar • LR(0)? • SLR? • LR(1)? • How can we tell?

  28. LR(1) Table from LR(1) Itemsets • One row for each Itemset • One column for each terminal or non-terminal symbol, and one for $ • Table [J][X] is: • Rn if J includes A → rhs., a; A → rhs is rule number n; X = a • Sn if Goto(J,X) is LR(1) itemset n

  29. LALR(1) Parsing • LookAhead LR(1) • Start with LR(1) items • LALR(1) items: combine LR(1) items with the same kernel but different lookahead sets • Build table just as LR(1) table but use LALR(1) items • Same number of states (rows) as LR(0)

  30. Code Generation • Pick three registers to be used throughout • Assuming a statement of the form dest = s1 op s2 • Generate code by: • Load source 1 into r5 • Load source 2 into r6 • r7 = r5 op r6 • Store r7 into destination
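The fixed-register scheme amounts to a four-line template per statement. In the sketch below the `load` and `store` mnemonics and the function name `gen_statement` are placeholders invented for illustration; only the register numbers come from the slide.

```python
def gen_statement(dest, s1, op, s2):
    """Emit pseudo-assembly for dest = s1 op s2 using r5, r6 and r7 only."""
    return [
        f"load r5, {s1}",     # load source 1 into r5
        f"load r6, {s2}",     # load source 2 into r6
        f"r7 = r5 {op} r6",   # compute the result in r7
        f"store r7, {dest}",  # store r7 into the destination
    ]

for line in gen_statement("t1", "c", "-", "1"):
    print(line)
```

Every statement reloads its operands, so the scheme needs no register allocation at all; the price is redundant loads and stores.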

  31. Three-Address Code (section 6.2.1 (new), pp 467 (old)) • Assembler for generic computer • Types of 3-address statements (Dragon) • Assignment statement x = y op z • Unconditional jump br label • Conditional jump if( cond ) goto label • Parameter x • Call statement call f

  32. Example “Source” a = ((c-1) * b) + (-c * b)

  33. Example 3-Address • t1 = c - 1 • t2 = b * t1 • t3 = -c • t4 = t3 * b • t5 = t2 + t4 • a = t5
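Three-address code like the example can be produced by a post-order walk of the expression tree, creating one temporary per operator. The sketch below borrows Python's own `ast` module as the parser, which is a convenience assumed here and not something the slides use; `three_address` is a made-up name.

```python
import ast

OPS = {ast.Add: "+", ast.Sub: "-", ast.Mult: "*", ast.Div: "/"}

def three_address(source):
    """Translate one assignment 'name = expr' into three-address lines."""
    code = []
    counter = 0

    def new_temp():
        nonlocal counter
        counter += 1
        return f"t{counter}"

    def walk(node):
        """Emit code for node's operands first, then for node itself."""
        if isinstance(node, ast.Name):
            return node.id
        if isinstance(node, ast.Constant):
            return str(node.value)
        if isinstance(node, ast.BinOp):
            left = walk(node.left)
            right = walk(node.right)
            temp = new_temp()
            code.append(f"{temp} = {left} {OPS[type(node.op)]} {right}")
            return temp
        if isinstance(node, ast.UnaryOp) and isinstance(node.op, ast.USub):
            operand = walk(node.operand)
            temp = new_temp()
            code.append(f"{temp} = -{operand}")
            return temp
        raise ValueError("unsupported expression")

    stmt = ast.parse(source).body[0]
    result = walk(stmt.value)
    code.append(f"{stmt.targets[0].id} = {result}")
    return code

for line in three_address("a = ((c-1) * b) + (-c * b)"):
    print(line)
```

The output matches the example except that the second line reads `t2 = t1 * b`, because the walk keeps operands in source order.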

  34. Three-Address Implementation (Quadruples, sec 6.2.2; pp 470-2)

  35. Three-Address Implementation (Triples, section 6.2.3)

  36. Three-Address Implementation • N-tuples (my choice – and yours ??) • Lhs = oper(op1, op2, …, opn) • Lhs = call(func, arg1, arg2, … argn) • If condOper(op1, op2, Label) • br Label

  37. Three-Address Code • 3-address operands • Variable • Constant • Array • Pointer

  38. Variable Storage • Memory Locations (Logical) • Stack • Heap • Program Code • Register • Variable Classes • Automatic (locals) • Parameters • Globals

  39. Variable Types • Scalars • Arrays • Structs • Unions • Objects ?

  40. Row Major Array Storage char A[20][15][10];

  41. Column Major Array Storage char A[20][15][10];

  42. OR (Row Major) char A[20][15][10];
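The difference between the two layouts for `char A[20][15][10]` is only which index varies fastest in memory. The sketch below computes byte offsets for both, assuming zero-based indices and one byte per element; the function names are made up for illustration.

```python
DIMS = (20, 15, 10)  # char A[20][15][10], one byte per element

def row_major_offset(index, dims=DIMS):
    """Last index varies fastest: ((i * 15) + j) * 10 + k."""
    offset = 0
    for i, n in zip(index, dims):
        offset = offset * n + i
    return offset

def column_major_offset(index, dims=DIMS):
    """First index varies fastest: i + 20 * (j + 15 * k)."""
    offset = 0
    for i, n in zip(reversed(index), reversed(dims)):
        offset = offset * n + i
    return offset

print(row_major_offset((1, 2, 3)))     # 173
print(column_major_offset((1, 2, 3)))  # 941
```

In row-major order `A[0][0][1]` sits right after `A[0][0][0]`; in column-major order that neighbour is `A[1][0][0]`.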

  43. Array Declaration Algorithm Dimension Node { int min; int max; int size; }

  44. Declaration Algorithm (2) • Doubly linked list of dimension nodes • Pass 1 – while parsing • Build linked list from left to right • Insert min, max • Size = size of an element (e.g. 4 for int) • Append node to end of list • min = max = size = 1

  45. Declaration Algorithm (3) Pass 2 • Traverse list from tail to head • For each node, n, going “right” to “left” • factor = n.max – n.min + 1 • For each node, m, right to left starting with n • m.size = m.size * factor • For each node, n, going right to left • n.max = n->left->max; n.min = n->left->min • Save size of first node as size of entire array • Delete first element of list • Set tail->size = size of an element (e.g. 4 for int)
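Both passes of the declaration algorithm can be traced in a short sketch. A Python list stands in for the doubly linked list, which is a simplification assumed here; the field names follow the Dimension Node slide, and `declare` is a made-up function name.

```python
from dataclasses import dataclass

@dataclass
class DimensionNode:
    min: int
    max: int
    size: int

def declare(bounds, element_size):
    """Return (total array size, dimension nodes) for the given bounds."""
    # Pass 1: one node per dimension, left to right, then the extra tail node.
    nodes = [DimensionNode(lo, hi, element_size) for lo, hi in bounds]
    nodes.append(DimensionNode(1, 1, 1))

    # Pass 2: walking tail to head, multiply the size of every node at or
    # to the left of n by the extent of n.
    for n in range(len(nodes) - 1, -1, -1):
        factor = nodes[n].max - nodes[n].min + 1
        for m in range(n, -1, -1):
            nodes[m].size *= factor

    # Shift each min/max one node to the right, going right to left.
    for n in range(len(nodes) - 1, 0, -1):
        nodes[n].min, nodes[n].max = nodes[n - 1].min, nodes[n - 1].max

    total_size = nodes[0].size      # size of the entire array
    del nodes[0]                    # delete the first element of the list
    nodes[-1].size = element_size   # tail holds the size of one element
    return total_size, nodes

total, dims = declare([(2000, 2005), (1, 12), (1, 31)], 4)
print(total)                      # 8928 bytes = 6 * 12 * 31 * 4
print([d.size for d in dims])     # [1488, 124, 4]
```

After the shift, each remaining node pairs the bounds of one dimension with the number of bytes spanned by one step along that dimension.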

  46. Array Declaration (Row Major) • int weight[2000..2005][1..12][1..31]; • List of “dimension” nodes, each with int min, max, size • size = size of an element of this dimension • Sizes: 1488, 124, 4

  47. Array Offset (Row Major) • Traverse list summing (index - min) * size • int weight[2000..2005][1..12][1..31]; with sizes 1488, 124, 4 • x = weight[2002][5][31] • Offset = (2002-2000) * 1488 + (5-1) * 124 + (31-1) * 4 = 3592

  48. Array Offset (Row Major) • Traverse list summing (index - min) * size • int weight[2000..2005][1..12][1..31]; with sizes 1488, 124, 4 • x = weight[i][j][k] • Offset = (i - 2000) * 1488 + (j-1) * 124 + (k-1) * 4
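The offset computation is a single traversal of the dimension list. The sketch below stores each dimension as a `(min, max, size)` tuple, an encoding assumed for illustration, with sizes 1488 = 12 × 124, 124 = 31 × 4, and 4 for one int.

```python
# (min, max, size) per dimension of int weight[2000..2005][1..12][1..31]
WEIGHT_DIMS = [(2000, 2005, 1488), (1, 12, 124), (1, 31, 4)]

def array_offset(index, dims):
    """Sum (index - min) * size over the dimension list."""
    offset = 0
    for i, (lo, hi, size) in zip(index, dims):
        if not lo <= i <= hi:
            raise IndexError(f"index {i} outside {lo}..{hi}")
        offset += (i - lo) * size
    return offset

print(array_offset((2002, 5, 31), WEIGHT_DIMS))  # 2976 + 496 + 120 = 3592
```

When the indices are variables, as in weight[i][j][k], the same sum has to be emitted as generated code instead of being evaluated at compile time; the bounds check here is an extra that the slides do not mention.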

  49. Your Turn • Assume • int A[10][20][30]; • Row major order • “Show” A’s dimension list • Show hypothetical 3-addr code for • X = A[2][3][4] ; • A[3][4][5] = 9

  50. My “Assembly” code • X = A[2][3][4]; • T1 = 2 * 2400 • T2 = 3 * 120 • T3 = T1 + T2 • T4 = 4 * 4 • T5 = T3 + T4 • T6 = T5 + 64 # 64 is A’s offset • %eax = T6 • %eax = %ebp + %eax • %eax = 0(%eax) • 16(%ebp) = %eax # 16 is X’s offset
