
Chapter 2 Language & Syntax Description Section 1 Alphabet & String


afia

Presentation Transcript


  1. Section 1: Alphabet & String
     1. Alphabet: a non-empty set of symbols, usually denoted by Σ, V, or another upper-case Greek letter.
     2. Symbol (character): an element of the alphabet; the smallest unit of a language.
     3. String: a finite sequence of symbols from the alphabet.
     Note: the null string (empty string) is the string that contains no symbols, written ε.

  2. 4. Sentence: a string of symbols from the alphabet, formed according to certain construction rules.
     5. Language: a set of sentences over the alphabet.
     Note: by convention, a symbol is written a, b, c, …; a string is written α, β, γ, …; a set of strings is written A, B, C, ….

  3. 6. Operations on sets of strings
     1) Concatenation (product): let A = {α1, α2, …} and B = {β1, β2, …} be sets of strings. Their product AB is defined as AB = {αβ | α ∈ A and β ∈ B}.
     Notes:
     1) The product of a string set with itself is called a power of the set.
     2) A^0 = {ε}.
     3) The n-th power A^n of an alphabet A is the set of all strings of length n over A.
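The product and power operations can be sketched in a few lines of Python. This is only an illustration: the function names `product` and `power` are made up here, strings stand in for sequences of symbols, and the empty Python string `""` plays the role of ε.

```python
def product(A, B):
    """Concatenation (product) of two string sets: every a followed by every b."""
    return {a + b for a in A for b in B}


def power(A, n):
    """n-th power of a string set; the 0-th power is the set holding only the empty string."""
    result = {""}
    for _ in range(n):
        result = product(result, A)
    return result


if __name__ == "__main__":
    print(sorted(product({"a", "b"}, {"c"})))
    print(sorted(power({"a", "b"}, 2)))
```

For an alphabet such as {a, b}, `power(A, n)` contains exactly the strings of length n, which matches note 3).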

  4. 6. Operations on sets of strings
     2) Closure and positive closure
     a) Closure: A* = A^0 ∪ A^1 ∪ A^2 ∪ …, the set of all strings over the alphabet A, including the null string ε.
     b) Positive closure: A+ = A^1 ∪ A^2 ∪ … = A* − {ε}.
     Note: a language over the alphabet A is a subset of A* (a subset of A+ if it does not contain ε).
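Because A* is infinite, code can only enumerate it up to some bound. The sketch below, with hypothetical helper names, collects A^0 through A^k for a chosen k and derives the bounded positive closure by removing the empty string, assuming (as for any alphabet) that ε is not itself a member of A.

```python
def closure_up_to(A, max_power):
    """Union of A^0, A^1, ..., A^max_power (a finite slice of the closure A*)."""
    result = set()
    current = {""}  # A^0 holds only the empty string
    for _ in range(max_power + 1):
        result |= current
        current = {x + a for x in current for a in A}  # next power of A
    return result


def positive_closure_up_to(A, max_power):
    """Same slice without the empty string (a finite slice of A+)."""
    return closure_up_to(A, max_power) - {""}


if __name__ == "__main__":
    print(sorted(closure_up_to({"0", "1"}, 2), key=lambda s: (len(s), s)))
```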

  5. Section 2: Grammar & Language
     1. Basic concepts
     a. Grammar: a grammar is a set of formal production rules that describe how syntax elements are constructed.
     Notes:
     1) Syntax elements include sentences and the words within sentences; a language is composed of sentences.
     2) A production rule has the form left-side → right-side, read as "left-side is defined as right-side", "left-side derives right-side", or "left-side produces right-side". It expresses the relation between the two sides.

  6. b. Non-terminal symbol
     • A symbol that appears on the left side of a rule, is enclosed in angle brackets < >, and denotes a syntactic concept.
     • The set of non-terminal symbols is denoted VN.
     c. Terminal symbol
     • A basic unit of the language that cannot be decomposed further (possibly a single character). The set of terminal symbols is denoted VT.
     Note: terminal symbols are the basic elements of a sentence.

  7. d. Start symbol
     • A distinguished non-terminal symbol that denotes the top-level syntactic concept the grammar defines.
     Note: the start symbol is also called the "identified symbol".
     e. Production
     • A rule that defines a relation between strings. Form: A → α (A produces α).
     E.g. <Sentence> → <Subject><Predicate>

  8. f. Derivation
     • The process that starts from the start symbol and derives a sentence by repeatedly replacing the left side of a production rule with its right side.
     • Leftmost (rightmost) derivation: each step applies exactly one production rule and replaces the leftmost (rightmost) non-terminal symbol with the right side of that rule.
     Note: leftmost and rightmost derivations are the two standard derivation orders; the rightmost derivation is commonly called the canonical derivation.

  9. g. Reduction
     • Reduction is the inverse of derivation: starting from a given sentence of the language, the right sides of production rules are repeatedly replaced by their left sides until the start symbol is reached.
     • Leftmost (rightmost) reduction is the inverse of rightmost (leftmost) derivation.
     Note: the leftmost reduction, being the inverse of the rightmost derivation, is commonly called the canonical reduction.

  10. h. Sentential form, sentence and language
     • Sentential form: a string α obtained from the start symbol S by zero or more derivation steps, written S ⇒* α, where α ∈ (VN ∪ VT)*.
     • Sentence: a sentential form that contains only terminal symbols.
     • Language: the set of sentences derivable from S in one or more steps, written L(G) = {α | S ⇒+ α and α ∈ VT*}.

  11. i. Recursive definition of grammar rules
     • A non-terminal symbol appears within its own definition.
     Note: take care when defining a grammar recursively. The recursion must have an exit (a non-recursive, special-case alternative); otherwise no sentence can ever be derived.

  12. j. Extended notations for grammar rules, using extended BNF (Backus-Naur Form):
     • ( ): factoring out a common part. E.g. U → ax | ay | az can be rewritten as U → a(x | y | z).
     • { }: repetition, optionally with bounds on the number of repeats. E.g. <Identifier> → <Letter>{<Letter> | <Digit>}_0^5 (the braced part is repeated 0 to 5 times).
     • [ ]: optional part. E.g. <Integer> → [+ | -]<Digit>{<Digit>}
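As a small illustration of how the optional and repetition notations read in practice, the <Integer> rule can be transcribed into a regular expression using Python's standard `re` module. The names `INTEGER` and `is_integer` are invented for this sketch, and <Digit> is assumed to mean the characters 0 to 9.

```python
import re

# [+|-] becomes an optional sign, <Digit> one digit, {<Digit>} zero or more further digits.
INTEGER = re.compile(r"[+-]?[0-9][0-9]*")


def is_integer(text):
    """True if the whole text matches <Integer> -> [+|-]<Digit>{<Digit>}."""
    return INTEGER.fullmatch(text) is not None


if __name__ == "__main__":
    for sample in ["42", "+7", "-103", "+", "1a", ""]:
        print(repr(sample), is_integer(sample))
```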

  13. k. Meta-language symbols
     Symbols used to describe the relations between grammar symbols, such as "→" and "|", are called meta-language symbols.

  14. 2. Formal definition
     a. Grammar definition: a grammar G is defined as a quadruple (VN, VT, P, S), where VN is the set of non-terminal symbols, VT the set of terminal symbols, P the set of productions, and S the start symbol.
     b. Classification of grammars: according to the restrictions placed on the production rules, grammars fall into four types: type-0, type-1, type-2 and type-3.

  15. b. Classification of grammars
     (1) Type-0 grammar (phrase-structure grammar, or unrestricted grammar)
     • Every production in P has the form α → β, where α ∈ V+, β ∈ V* (V = VN ∪ VT), and α contains at least one non-terminal symbol.
     Notes: the automaton that recognizes a type-0 language is the Turing machine. Type-0 grammars place the fewest restrictions on their productions; the other types are obtained by restricting the form of the productions of a type-0 grammar.

  16. b. Classification of grammars
     (2) Type-1 grammar (context-sensitive grammar, or non-contracting grammar)
     • Every production α → β in P satisfies |β| >= |α|, except for S → ε. If S → ε is present, S must not appear on the right side of any production.
     • Equivalently, every production in P has the form αAβ → αγβ (where α, β ∈ V*, A ∈ VN, γ ∈ V+), except for S → ε.
     Notes: the automaton that recognizes a type-1 language is the linear bounded automaton (LBA). In a type-1 grammar, the context of a non-terminal symbol must be taken into account when the symbol is replaced, and no non-terminal symbol may be replaced by ε, except that the start symbol may produce ε.

  17. b. Classification of grammars
     (3) Type-2 grammar (context-free grammar)
     • Every production in P has the form A → β, where A ∈ VN and β ∈ V*.
     Notes: the left side of each production must be a single non-terminal symbol; the right side may be any string of non-terminal and terminal symbols, or ε. The automaton that recognizes a type-2 language is the push-down automaton (PDA).

  18. b. Classification of grammars
     (4) Type-3 grammar (regular grammar: right-linear or left-linear grammar)
     • Every production in P has the form A → αB or A → α (right-linear), or else A → Bα or A → α (left-linear), where A, B ∈ VN and α ∈ VT*.
     Notes: the productions of a type-3 grammar are either all right-linear or all left-linear; the two kinds must not be mixed in one grammar. If all productions are left-linear, the grammar is called a left-linear grammar; if all productions are right-linear, it is called a right-linear grammar.

  19. b. Classification of grammars
     (4) Type-3 grammar (continued)
     Notes: the automaton that recognizes a type-3 language is the finite-state automaton. Type-2 grammars = self-embedding grammars (with productions of the form S → aSb) + regular grammars; that is, any type-2 grammar without the self-embedding property is equivalent to a regular grammar.
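The four sets of restrictions can be checked mechanically. The following is a rough sketch rather than a complete classifier: it assumes every grammar symbol is a single character, that productions are given as (left side, right side) string pairs with "" standing for ε, and it reports the highest type number whose production form is satisfied. The function name `grammar_type` is invented for the example.

```python
def grammar_type(productions, nonterminals, start):
    """Return 3, 2, 1 or 0 according to the forms of the productions."""

    def is_right_linear(rhs):
        # A -> aB or A -> a: only the last symbol may be a non-terminal.
        return all(s not in nonterminals for s in rhs[:-1])

    def is_left_linear(rhs):
        # A -> Ba or A -> a: only the first symbol may be a non-terminal.
        return all(s not in nonterminals for s in rhs[1:])

    single_lhs = all(len(l) == 1 and l in nonterminals for l, _ in productions)
    if single_lhs:
        all_right = all(is_right_linear(r) for _, r in productions)
        all_left = all(is_left_linear(r) for _, r in productions)
        return 3 if (all_right or all_left) else 2

    # Type 1: no production shrinks the string, apart from start -> epsilon,
    # and in that case the start symbol may not occur on any right side.
    start_eps = (start, "") in productions
    start_on_rhs = any(start in r for _, r in productions)
    non_contracting = all(
        len(r) >= len(l) for l, r in productions if (l, r) != (start, "")
    )
    if non_contracting and not (start_eps and start_on_rhs):
        return 1
    return 0


if __name__ == "__main__":
    print(grammar_type([("S", "aS"), ("S", "a"), ("S", "b")], {"S"}, "S"))
    print(grammar_type([("S", "aSb"), ("S", "ab")], {"S"}, "S"))
```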

  20. b. Classification of grammars (continued)

  21. c. Type-i language
     • A language generated by a type-i grammar G, written L(G): L(G) = {α | α ∈ VT* and S ⇒+ α}.

  22. Example: let G1 = ({S}, {a, b}, P, S), where P contains:
     (0) S → aS
     (1) S → a
     (2) S → b
     Then L(G1) = {a^i (a | b) | i >= 0}.
     Example: let G2 = ({S}, {a, b}, P, S), where P contains:
     (0) S → aSb
     (1) S → ab
     Then L(G2) = {a^n b^n | n >= 1}.
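One way to check such a claim about L(G) for short sentences is to enumerate leftmost derivations breadth-first. The sketch below makes several simplifying assumptions that are not in the slides: symbols are single characters, productions are (left, right) string pairs, and the grammar has no ε-productions, so any sentential form longer than the bound can be discarded safely. The name `sentences` is hypothetical.

```python
from collections import deque


def sentences(productions, start, nonterminals, max_len):
    """All sentences of length <= max_len, found by leftmost derivation steps."""
    seen = {start}
    queue = deque([start])
    found = set()
    while queue:
        form = queue.popleft()
        # Position of the leftmost non-terminal, or None if the form is a sentence.
        idx = next((i for i, s in enumerate(form) if s in nonterminals), None)
        if idx is None:
            found.add(form)
            continue
        for lhs, rhs in productions:
            if lhs == form[idx]:
                new_form = form[:idx] + rhs + form[idx + 1:]
                # Without epsilon-productions a form never gets shorter, so prune.
                if len(new_form) <= max_len and new_form not in seen:
                    seen.add(new_form)
                    queue.append(new_form)
    return found


if __name__ == "__main__":
    g2 = [("S", "aSb"), ("S", "ab")]
    print(sorted(sentences(g2, "S", {"S"}, 6), key=len))
```

With a bound of 6, G2 yields ab, aabb and aaabbb, in line with L(G2) = {a^n b^n | n >= 1}.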

  23. 2. Formal definition
     Notes: grammars used for lexical analysis and syntax analysis must satisfy the following restrictions on their productions:
     • There is no production of the form P → P; such a production is useless and only introduces ambiguity.
     • Every non-terminal symbol P must be reachable and must be able to derive a terminal string:
     • starting from the start symbol S, there exists a derivation S ⇒* αPβ;
     • P can derive a terminal string γ, that is, P ⇒+ γ with γ ∈ VT*.

  24. Section 3: Grammar construction and simplification
     1. Constructing a grammar from a language
     Example 1: let L1 = {a^(2n) b^n | n >= 1 and a, b ∈ VT}. Construct a grammar G1 for L1.
     n = 1 gives aab; n = 2 gives aaaabb; n = 3 gives aaaaaabbb; ……
     So we have: S → aaSb, S → aab

  25. 1. Constructing a grammar from a language
     Example 2: let L2 = {a^i b^j c^k | i, j, k >= 1 and a, b, c ∈ VT}. Construct a grammar G2 for L2.
     S → aS, S → aB, B → bB, B → bC, C → cC | c

  26. 1. Constructing a grammar from a language
     Example 3: let L3 = {ω | ω ∈ {a, b}* and ω contains as many a's as b's}. Construct a grammar G3 for L3.
     One grammar: S → ε, S → bB, S → aA, A → bS | b, A → aAA, B → aS | a | bBB
     A shorter grammar for the same language: S → ε, S → aSbS, S → bSaS

  27. 1. Constructing a grammar from a language
     Example 4: let L4 = {ω | ω ∈ {0, 1}* and the number of 1s in ω is even}. Construct a grammar G4 for L4.
     S → ε, S → 0S, S → 1A, A → 0A, A → 1S
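A constructed grammar is worth checking against the language it is meant to describe. Because G4 is right-linear, a derivation can be followed by tracking only the current non-terminal while reading the word. The sketch below does that and compares the result with a direct parity count for all short words; the names `STEP`, `derives` and `agrees_up_to` are invented for the illustration.

```python
from itertools import product

# Productions S -> 0S, S -> 1A, A -> 0A, A -> 1S as (non-terminal, terminal) -> next non-terminal.
STEP = {("S", "0"): "S", ("S", "1"): "A", ("A", "0"): "A", ("A", "1"): "S"}
CAN_END = {"S"}  # only S has the production S -> epsilon


def derives(word):
    """True if G4 derives the word (a string over the characters 0 and 1)."""
    current = "S"
    for ch in word:
        current = STEP[(current, ch)]
    return current in CAN_END


def agrees_up_to(max_len):
    """Compare the grammar with the definition of L4 on every word up to max_len."""
    return all(
        derives("".join(w)) == (w.count("1") % 2 == 0)
        for n in range(max_len + 1)
        for w in product("01", repeat=n)
    )


if __name__ == "__main__":
    print(derives("0110"), derives("010"), agrees_up_to(8))
```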

  28. 2. Grammar simplification
     a. Because a language can be described by different grammars, we should select the grammar that has the fewest productions and best suits the properties of the language.
     b. A grammar may contain redundant productions that are useless for derivation. These productions should be deleted:
     • productions of the form P → P;
     • productions that can never derive a terminal string;
     • productions whose left-side non-terminal symbol (other than the start symbol) does not appear on the right side of any production.

  29. 2. Grammar simplification
     c. Steps of simplification:
     • Find the productions of the form P → P and delete them;
     • If a production can never be used in any derivation, delete it;
     • If a production cannot derive a terminal string, delete it;
     • Renumber the remaining productions.

  30. 2. Grammar simplification
     Example: simplify the following grammar.
     (0) S → Be (1) S → Ec (2) A → Ae (3) A → e (4) A → A (5) B → Ce (6) B → Af (7) C → Cf (8) D → f
     Result:
     (0) S → Be (1) A → Ae (2) A → e (3) B → Af
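The simplification steps can be sketched as a small program and run on the example above. This is one possible ordering of the steps, not necessarily the one intended by the slides: it first drops P → P, then removes productions that cannot lead to a terminal string, then removes productions that are unreachable from the start symbol. Symbols are assumed to be single characters and the function name `simplify` is made up.

```python
def simplify(productions, nonterminals, start):
    """Remove P -> P, non-productive and unreachable productions, keeping the order."""
    # Step 1: delete productions of the form P -> P.
    prods = [(l, r) for l, r in productions if r != l]

    # Step 2: find non-terminals that can derive a terminal string.
    productive = set()
    changed = True
    while changed:
        changed = False
        for l, r in prods:
            if l not in productive and all(
                s not in nonterminals or s in productive for s in r
            ):
                productive.add(l)
                changed = True
    prods = [
        (l, r)
        for l, r in prods
        if all(s not in nonterminals or s in productive for s in r)
    ]

    # Step 3: keep only productions whose left side is reachable from the start symbol.
    reachable = {start}
    pending = [start]
    while pending:
        current = pending.pop()
        for l, r in prods:
            if l == current:
                for s in r:
                    if s in nonterminals and s not in reachable:
                        reachable.add(s)
                        pending.append(s)
    return [(l, r) for l, r in prods if l in reachable]


if __name__ == "__main__":
    grammar = [("S", "Be"), ("S", "Ec"), ("A", "Ae"), ("A", "e"), ("A", "A"),
               ("B", "Ce"), ("B", "Af"), ("C", "Cf"), ("D", "f")]
    print(simplify(grammar, set("SABCDE"), "S"))
```

Checking productivity before reachability matters: deleting a non-productive production (here B → Ce) can make further symbols unreachable.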

  31. 3. Constructing a context-free grammar without ε-productions
     a. A context-free grammar without ε-productions satisfies the following conditions:
     • If P contains the production S → ε, where S is the start symbol of the grammar, then S does not appear on the right side of any production;
     • P contains no other ε-productions.
     b. Algorithm for constructing a context-free grammar without ε-productions:
     • G = (VN, VT, P, S) is transformed into G' = (V'N, V'T, P', S').
     (1) Find all non-terminal symbols that can derive ε in some number of steps, and put them into the set V0;

  32. b. Algorithm for constructing a context-free grammar without ε-productions (continued):
     (2) Construct the production set P' of G' as follows:
     (A) If a symbol from V0 appears on the right side of a production, replace that production by two productions: one in which the symbol is replaced by ε (that is, omitted) and one in which it is kept. Repeat this for every occurrence of a V0 symbol and put the resulting productions into P'.
     (B) Otherwise, put the production into P' unchanged, unless it is an ε-production.
     (C) If P contains the production S → ε, replace it by S' → ε | S and put these into P'; let S' be the start symbol of G', and let V'N = VN ∪ {S'}.

  33. Example: let G1 = ({S}, {a, b}, P, S), where P is:
     (0) S → ε (1) S → aSbS (2) S → bSaS
     (1) V0 = {S}
     (2) P':
     (1) S → abS | aSbS | aSb | ab
     (2) S → baS | bSaS | bSa | ba
     (0) S' → ε | S
     So G1' = ({S', S}, {a, b}, P', S'), where P' is:
     (0) S' → ε | S
     (1) S → abS | aSbS | aSb | ab
     (2) S → baS | bSaS | bSa | ba
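The construction can be sketched in Python and checked against this example. The sketch is a simplified reading of the algorithm: it computes V0 (called `nullable` below), expands every production by keeping or omitting each occurrence of a V0 symbol, discards empty right sides, and adds S' → ε | S when the start symbol is in V0. The function name, the `new_start` parameter and the single-character symbol encoding are assumptions of the example.

```python
from itertools import product


def remove_epsilon(productions, start, new_start="S'"):
    """Return (set of new productions, new start symbol) without epsilon-productions."""
    # Step (1): V0, the non-terminals that can derive the empty string.
    nullable = set()
    changed = True
    while changed:
        changed = False
        for lhs, rhs in productions:
            if lhs not in nullable and all(s in nullable for s in rhs):
                nullable.add(lhs)
                changed = True

    # Step (2): keep or omit each occurrence of a V0 symbol on every right side.
    new_productions = set()
    for lhs, rhs in productions:
        choices = [(s, "") if s in nullable else (s,) for s in rhs]
        for combination in product(*choices):
            new_rhs = "".join(combination)
            if new_rhs:  # drop the epsilon-productions themselves
                new_productions.add((lhs, new_rhs))

    # Step (C): a fresh start symbol keeps the empty string in the language.
    if start in nullable:
        new_productions.add((new_start, ""))
        new_productions.add((new_start, start))
        return new_productions, new_start
    return new_productions, start


if __name__ == "__main__":
    g1 = [("S", ""), ("S", "aSbS"), ("S", "bSaS")]
    rules, s = remove_epsilon(g1, "S")
    print(s, sorted(rules))
```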

  34. Section 4: Ambiguity of a grammar
     a. Ambiguity of a sentence: if a sentence of a grammar has two or more distinct syntax trees, the sentence is ambiguous.
     b. Ambiguity of a grammar: if the language of a grammar contains an ambiguous sentence, the grammar is ambiguous.

  35. Example: G = ({E}, {+, *, (, ), i}, P, E), where P is: E → E+E | E*E | (E) | i
     The sentence (i*i+i) has two leftmost derivations, and therefore two syntax trees:
     (1) E ⇒ (E) ⇒ (E+E) ⇒ (E*E+E) ⇒ (i*E+E) ⇒ (i*i+E) ⇒ (i*i+i)
     (2) E ⇒ (E) ⇒ (E*E) ⇒ (i*E) ⇒ (i*E+E) ⇒ (i*i+E) ⇒ (i*i+i)

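  36. The two syntax trees of (i*i+i), written in bracket form:
     Tree 1: [E ( [E [E [E i] * [E i]] + [E i]] ) ]
     Tree 2: [E ( [E [E i] * [E [E i] + [E i]]] ) ]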
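For this particular grammar the number of syntax trees of a sentence can be counted directly. The sketch below is specific to E → E+E | E*E | (E) | i and is not a general ambiguity test; `count_trees` is a hypothetical name. It counts, for every substring, the trees whose root production is i, (E), or a binary operator at some split position.

```python
from functools import lru_cache


def count_trees(sentence):
    """Number of distinct syntax trees of the sentence under E -> E+E | E*E | (E) | i."""

    @lru_cache(maxsize=None)
    def count(i, j):
        # Trees deriving the substring sentence[i:j] from E.
        total = 0
        if j - i == 1 and sentence[i] == "i":
            total += 1  # E -> i
        if j - i >= 3 and sentence[i] == "(" and sentence[j - 1] == ")":
            total += count(i + 1, j - 1)  # E -> (E)
        for k in range(i + 1, j - 1):
            if sentence[k] in "+*":
                # E -> E+E or E -> E*E with the operator at position k.
                total += count(i, k) * count(k + 1, j)
        return total

    return count(0, len(sentence))


if __name__ == "__main__":
    print(count_trees("(i*i+i)"))
```

A count greater than 1 means the sentence is ambiguous; for (i*i+i) the function finds the two trees given above.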

  37. Notes on ambiguity:
     (1) Ambiguity makes syntax analysis nondeterministic.
     (2) Ambiguity of a grammar is undecidable: there is no algorithm that determines, in a finite number of steps, whether an arbitrary grammar is ambiguous.
     (3) To prove that a grammar is ambiguous, it is enough to exhibit one sentence that has two distinct syntax trees.
     (4) If the ambiguity of a grammar can be controlled with additional conditions, its presence is not necessarily harmful.
