So far ...
• A language is a set of strings over an alphabet.
• Languages serve two purposes in computing:
  (a) communicating instructions or information
  (b) defining valid communications
• We have defined languages by:
  (i) regular expressions
  (ii) finite state automata
Both (i) and (ii) give us exactly the same class of languages. What about languages outwith this class?
Specifying Non-Regular Languages
• We have already seen a number of languages that are not regular. In particular, the following are not regular:
  • $\{a^n b^n : n \ge 0\}$
  • the language of matched round brackets
  • arithmetic expressions
  • standard programming languages
However, these languages are all systematic constructions, and can be clearly and explicitly defined.
• Consider $L = \{a^n b^n : n \ge 0\}$:
  (i) $\lambda \in L$
  (ii) if $x \in L$, then $axb \in L$
  (iii) nothing else is in L
• This is a clear and concise specification of L.
• Can we use it to generate members of L?
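Clauses (i) to (iii) can also be read as a recursive membership test. The Python sketch below is one way to do this (the function name `in_L` is our own, not from the notes): clause (ii) is applied backwards by stripping an outer a...b pair, and clause (iii) says any string that cannot be reduced this way is rejected.

```python
def in_L(s: str) -> bool:
    """Membership in L = {a^n b^n : n >= 0}, following clauses (i)-(iii)."""
    if s == "":                      # clause (i): the empty string is in L
        return True
    # clause (ii) read backwards: s = a x b with x in L
    if len(s) >= 2 and s[0] == "a" and s[-1] == "b":
        return in_L(s[1:-1])
    return False                     # clause (iii): nothing else is in L


for w in ["", "ab", "aabb", "aab", "ba", "abab"]:
    print(repr(w), in_L(w))
```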
Generating Languages
• Using the previous definition of L and the notion of string substitution, we can give a generative definition of L. Let X be a new symbol.
  1) X -> λ
  2) X -> aXb
This definition says that if we have a symbol X, we can replace it by the empty string λ or by aXb. We now define L to be all strings over {a, b} formed by starting with X and applying rules 1) and 2) until we get a string with no X's.
Examples:
X => aXb => aaXbb => aabb
X => λ
X => aXb => aaXbb => aaaXbbb => aaabbb
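To make the substitution process concrete, here is a small Python sketch. It assumes we only want strings up to a length bound `max_len` (our choice, so that the search terminates): starting from "X", it replaces the X by each right-hand side in turn, breadth first, and collects the strings with no X left.

```python
from collections import deque

RULES = ["", "aXb"]  # right-hand sides: rule 1) X -> λ, rule 2) X -> aXb


def generate(max_len: int) -> list[str]:
    """All X-free strings of length <= max_len reachable from 'X', shortest first."""
    results = []
    queue = deque(["X"])
    seen = {"X"}
    while queue:
        form = queue.popleft()
        i = form.find("X")
        if i == -1:                  # no X left: this string is in L
            results.append(form)
            continue
        for rhs in RULES:
            new = form[:i] + rhs + form[i + 1:]
            # terminals are never removed, so forms that are already too long are dropped
            if len(new.replace("X", "")) <= max_len and new not in seen:
                seen.add(new)
                queue.append(new)
    return sorted(results, key=len)


print(generate(6))  # ['', 'ab', 'aabb', 'aaabbb']
```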
Grammar
Formalising the previous notion of a generative definition based on string substitution, we get:
A grammar is a 4-tuple, G = (N, T, S, P), where
• N is a finite alphabet, called the non-terminals;
• T is a finite alphabet, called the terminals;
• $N \cap T = \emptyset$;
• $S \in N$ is the start symbol; and
• P is a finite set of productions of the form α -> β, where $\alpha \in (N \cup T)^+$, α has at least one member from N, and $\beta \in (N \cup T)^*$.
Thus the previous example is a grammar where
• N = {X}
• T = {a, b}
• S = X
• P = {X -> λ, X -> aXb}
so G = ({X}, {a, b}, X, {X -> λ, X -> aXb}).
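As an illustration of the 4-tuple definition, the hypothetical `Grammar` class below stores (N, T, S, P), using one-character symbols and "" for λ (both assumptions for brevity), and lists every condition of the definition that a given tuple violates.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Grammar:
    N: frozenset[str]                   # non-terminals
    T: frozenset[str]                   # terminals
    S: str                              # start symbol
    P: tuple[tuple[str, str], ...]      # productions (alpha, beta); "" stands for λ

    def problems(self) -> list[str]:
        """Return the conditions of the definition that this tuple violates."""
        errs = []
        if self.N & self.T:
            errs.append("N and T are not disjoint")
        if self.S not in self.N:
            errs.append("S is not a member of N")
        symbols = self.N | self.T
        for alpha, beta in self.P:
            if not alpha or not set(alpha) <= symbols:
                errs.append(f"left side {alpha!r} is not in (N ∪ T)+")
            elif not any(c in self.N for c in alpha):
                errs.append(f"left side {alpha!r} has no non-terminal")
            if not set(beta) <= symbols:
                errs.append(f"right side {beta!r} is not in (N ∪ T)*")
        return errs


G = Grammar(N=frozenset("X"), T=frozenset("ab"), S="X", P=(("X", ""), ("X", "aXb")))
print(G.problems())  # []
```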
Definitions and Notation
Let G = (N, T, S, P) be a grammar.
• If s, t, x, y, u and v are strings such that s = xuy, t = xvy and $(u \to v) \in P$, then s directly derives t, written s => t.
• If there is a sequence of strings $s_0, s_1, \ldots, s_n$ such that $s_0 \Rightarrow s_1 \Rightarrow \cdots \Rightarrow s_{n-1} \Rightarrow s_n$, then $s_0$ derives $s_n$, written $s_0 \Rightarrow^* s_n$.
• A sentential form of G is a string $w \in (N \cup T)^*$ such that $S \Rightarrow^* w$.
• A sentence of G is a sentential form $w \in T^*$, i.e. one with no non-terminals.
• The language defined by G is the set of all sentences of G, denoted L(G).
For example, take the earlier grammar with S in place of X, i.e. with productions S -> λ and S -> aSb. Then:
• aaaSbbb => aaaaSbbbb, and S =>* aaaabbbb.
• aaaSbbb is a sentential form of G; aaaabbbb is a sentence of G.
• L(G) = {λ, ab, aabb, aaabbb, ...}, which is $\{a^n b^n : n \ge 0\}$.
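The definitions of => and =>* translate fairly directly into code. This Python sketch (function names are illustrative) computes every t with s => t by trying each production at each occurrence of its left-hand side. Note that `derives` is only a bounded search over sentential forms of length at most `max_len`: enough for this small grammar, but not a general decision procedure for =>*.

```python
def successors(s, P):
    """Every t such that s => t: choose a production (u, v) in P and one
    occurrence of u in s, and replace that occurrence by v."""
    out = set()
    for u, v in P:
        k = s.find(u)
        while k != -1:
            out.add(s[:k] + v + s[k + len(u):])
            k = s.find(u, k + 1)
    return out


def directly_derives(s, t, P):
    return t in successors(s, P)


def derives(s, t, P, max_len):
    """Bounded search for s =>* t over sentential forms of length <= max_len."""
    stack, seen = [s], {s}
    while stack:
        form = stack.pop()
        if form == t:
            return True
        for nxt in successors(form, P):
            if len(nxt) <= max_len and nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return False


def is_sentence(w, N):
    """A sentence is a sentential form containing no non-terminals."""
    return not any(c in N for c in w)


P = [("S", ""), ("S", "aSb")]  # S -> λ and S -> aSb
print(directly_derives("aaaSbbb", "aaaaSbbbb", P))  # True
print(derives("S", "aaaabbbb", P, max_len=9))       # True
```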
Definitions and Notation (cont.)
• $\alpha \to \beta_1 \mid \beta_2 \mid \beta_3 \mid \cdots \mid \beta_n$ is shorthand for
  $\alpha \to \beta_1$
  $\alpha \to \beta_2$
  ...
  $\alpha \to \beta_n$
Notation: we normally order the set of productions and assign them numbers. If x => y by using rule number i, then we write $x \Rightarrow_i y$.
In general, non-terminals will be upper case, while terminals will be lower case.
A context-free grammar (CFG) is one in which all productions are of the form α -> β, where $\alpha \in N$, i.e. the left-hand side is a single non-terminal.
A context-free language (CFL) is one that can be defined by a context-free grammar.
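A short Python sketch of the two notational points above: expanding the | shorthand into separate productions, and testing whether every left-hand side is a single non-terminal. The textual rule format (space-separated symbols, λ for the empty string) is an assumption made for this example.

```python
def expand(rules: str) -> list[tuple[tuple[str, ...], tuple[str, ...]]]:
    """Expand lines 'alpha -> beta1 | beta2 | ...' into single productions."""
    prods = []
    for line in rules.strip().splitlines():
        lhs, rhs = line.split("->")
        for alt in rhs.split("|"):
            body = tuple(sym for sym in alt.split() if sym != "λ")  # λ = empty
            prods.append((tuple(lhs.split()), body))
    return prods


def is_context_free(prods, N) -> bool:
    """Context-free: every left-hand side is exactly one non-terminal."""
    return all(len(lhs) == 1 and lhs[0] in N for lhs, _ in prods)


prods = expand("X -> λ | a X b")
print(prods)                          # [(('X',), ()), (('X',), ('a', 'X', 'b'))]
print(is_context_free(prods, {"X"}))  # True
```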
Context-Free Grammars
A CFG is called context-free because the left-hand side of every production is a single non-terminal, so a production can be applied to that symbol without needing to consider the symbol's context.
We only consider context-free grammars in this course.
Some languages are not context-free. Example: $\{a^n b^n c^n : n \ge 0\}$.
Some languages cannot be defined by any grammar. It is believed that these are the same languages that cannot be defined by any algorithm or effective procedure.
Example CFG
G = ({S}, {a, +, *, (, )}, S, {S -> S+S | S*S | (S) | a})
This is a grammar of algebraic expressions. The productions are:
1) S -> S + S
2) S -> S * S
3) S -> (S)
4) S -> a
Example derivation:
S => S * S => a * S => a * (S) => a * (S + S) => a * (a + S) => a * (a + a)
Note that there are many other ways of deriving the same string.
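The example derivation always rewrites the leftmost S, so it can be replayed from its rule numbers 2, 4, 3, 1, 4, 4. A minimal Python sketch, assuming that leftmost strategy (other orders give other derivations of the same string) and omitting spaces from the strings:

```python
RULES = {1: "S+S", 2: "S*S", 3: "(S)", 4: "a"}  # right-hand sides of the S -> ... rules


def replay(rule_numbers):
    """Apply each numbered rule to the leftmost S and return every sentential form."""
    form = "S"
    steps = [form]
    for i in rule_numbers:
        k = form.index("S")                    # position of the leftmost S
        form = form[:k] + RULES[i] + form[k + 1:]
        steps.append(form)
    return steps


print(" => ".join(replay([2, 4, 3, 1, 4, 4])))
# S => S*S => a*S => a*(S) => a*(S+S) => a*(a+S) => a*(a+a)
```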
Why Grammar?
• In English, the grammar is the set of conventions defining the structure of sentences, e.g.
  • a sentence must have a subject and a verb
  • verbs must agree with their subjects, e.g. "John walks" and "John and Mary walk"
  • adjectives come before nouns, e.g. "the red car" and not "the car red"
We have shown a formalisation of this notion: we can now write explicit, clear statements of which sentences are in a language.
Grammars can be used in the processing of natural language by computer (4th year option), in formalising design, in pattern recognition, and in many other areas.
A grammar for a small part of English
S -> NP VP
NP -> Det NP1 | PN
NP1 -> Adj NP1 | N
Det -> a | the
PN -> peter | paul | mary
Adj -> large | black
N -> dog | cat | horse
VP -> V NP
V -> is | likes | hates
Can you derive: peter is a large black cat?
A grammar for a small part of English (cont.)
Example derivations:
S => NP VP => PN VP => mary VP => mary V NP => mary hates NP => mary hates Det NP1 => mary hates the NP1 => mary hates the N => mary hates the dog
S => NP VP => NP V NP => NP V Det NP1 => NP V a NP1 => NP V a Adj NP1 => NP is a Adj NP1 => NP is a Adj Adj NP1 => NP is a large Adj NP1 => NP is a large Adj N => NP is a large black N => NP is a large black cat => PN is a large black cat => peter is a large black cat
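The question "can you derive ...?" can also be answered mechanically. Below is a small backtracking recogniser in Python for exactly this grammar, written as a sketch: `ends` returns every word position at which a derivation from a given symbol could stop. It relies on the grammar having no left-recursive rules; a rule such as NP1 -> NP1 Adj would make it recurse forever.

```python
GRAMMAR = {
    "S":   [["NP", "VP"]],
    "NP":  [["Det", "NP1"], ["PN"]],
    "NP1": [["Adj", "NP1"], ["N"]],
    "Det": [["a"], ["the"]],
    "PN":  [["peter"], ["paul"], ["mary"]],
    "Adj": [["large"], ["black"]],
    "N":   [["dog"], ["cat"], ["horse"]],
    "VP":  [["V", "NP"]],
    "V":   [["is"], ["likes"], ["hates"]],
}


def ends(symbol, words, i):
    """Set of positions j such that symbol =>* words[i:j]."""
    if symbol not in GRAMMAR:                    # terminal: must match the next word
        return {i + 1} if i < len(words) and words[i] == symbol else set()
    result = set()
    for rhs in GRAMMAR[symbol]:
        positions = {i}
        for sym in rhs:                          # thread every possible split through rhs
            positions = {j for p in positions for j in ends(sym, words, p)}
        result |= positions
    return result


def recognises(sentence):
    words = sentence.split()
    return len(words) in ends("S", words, 0)


print(recognises("peter is a large black cat"))  # True
print(recognises("the dog peter"))               # False
```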
Regular Grammars
• A grammar is regular if each production is of the form:
  (i) A -> t, or
  (ii) A -> tB, or
  (iii) A -> λ
  where $A, B \in N$ and $t \in T$.
Example:
S -> aA | bB
A -> aS | a
B -> bS | b
Is aaaabb a sentence of the language?
Regular Grammars (cont.)
Yes, aaaabb is a sentence of the example grammar:
S => aA => aaS => aaaA => aaaaS => aaaabB => aaaabb
The language generated by this grammar is the language denoted by the regular expression (aa + bb)+.
Regular Grammars and Regular Languages
Theorem (stated here without proof): a language is regular iff it can be defined by a regular grammar.
• Thus we now have three different definitions of the one class of languages:
  • regular expressions
  • finite state automata
  • regular grammars
All three are useful in Computing Science.
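One direction of the theorem can be illustrated with the usual construction: treat each non-terminal as a state, turn A -> tB into a transition from A to B on t, turn A -> t into a transition to an extra accepting state, and make A accepting if A -> λ is a production. The Python sketch below applies this to the example grammar S -> aA | bB, A -> aS | a, B -> bS | b (names such as `FINAL` are our own) and compares the result with (aa + bb)+ on all short strings; it is an illustration, not a proof.

```python
import itertools
import re

# Each production is (lhs, terminal, next non-terminal); None encodes A -> t.
PRODUCTIONS = [
    ("S", "a", "A"), ("S", "b", "B"),
    ("A", "a", "S"), ("A", "a", None),
    ("B", "b", "S"), ("B", "b", None),
]
LAMBDA_LHS = set()  # non-terminals A with a production A -> λ (none in this grammar)
FINAL = "F"         # extra accepting state, reached by productions A -> t


def nfa_accepts(word, start="S"):
    """Simulate the NFA built from the grammar, tracking all current states."""
    states = {start}
    for ch in word:
        states = {FINAL if nxt is None else nxt
                  for lhs, t, nxt in PRODUCTIONS
                  if lhs in states and t == ch}
    return FINAL in states or bool(states & LAMBDA_LHS)


print(nfa_accepts("aaaabb"))  # True
words = ("".join(p) for n in range(9) for p in itertools.product("ab", repeat=n))
print(all(nfa_accepts(w) == bool(re.fullmatch(r"(aa|bb)+", w)) for w in words))  # True
```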
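Example CFG (2)
1) S -> XaaX
2) X -> aX
3) X -> bX
4) X -> λ
$$S \Rightarrow_1 XaaX \Rightarrow_3 bXaaX \Rightarrow_2 baXaaX \Rightarrow_3 babXaaX \Rightarrow_4 babaaX \Rightarrow_2 babaaaX \Rightarrow_3 babaaabX \Rightarrow_4 babaaab$$
This grammar defines the language (a + b)*aa(a + b)*, i.e. all strings over {a, b} that contain aa.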
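...as a Regular Grammar
1) S -> aS
2) S -> bS
3) S -> aM
4) M -> aB
5) B -> aB
6) B -> bB
7) B -> λ
$$S \Rightarrow_2 bS \Rightarrow_1 baS \Rightarrow_2 babS \Rightarrow_3 babaM \Rightarrow_4 babaaB \Rightarrow_5 babaaaB \Rightarrow_6 babaaabB \Rightarrow_7 babaaab$$
$$S \Rightarrow_2 bS \Rightarrow_3 baM \Rightarrow_4 baaB \Rightarrow_7 baa$$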
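A quick sanity check that the CFG (S -> XaaX, X -> aX | bX | λ) and the regular grammar define the same language: the sketch below enumerates, by leftmost expansion, every sentence of each grammar with at most six terminals and compares both sets with the strings matching (a + b)*aa(a + b)*. It assumes upper-case letters are non-terminals and that every recursive rule adds a terminal (true here), so the bounded search terminates; agreement up to a length bound is evidence, not a proof.

```python
import itertools
import re

CFG2 = {"S": ["XaaX"], "X": ["aX", "bX", ""]}                  # "" stands for λ
REG = {"S": ["aS", "bS", "aM"], "M": ["aB"], "B": ["aB", "bB", ""]}


def sentences(grammar, max_len, start="S"):
    """Sentences with at most max_len terminals, found by leftmost expansion."""
    found, stack, seen = set(), [start], {start}
    while stack:
        form = stack.pop()
        k = next((i for i, c in enumerate(form) if c.isupper()), None)
        if k is None:                       # no non-terminals left: a sentence
            found.add(form)
            continue
        for rhs in grammar[form[k]]:
            new = form[:k] + rhs + form[k + 1:]
            if sum(not c.isupper() for c in new) <= max_len and new not in seen:
                seen.add(new)
                stack.append(new)
    return found


expected = {"".join(p) for n in range(7) for p in itertools.product("ab", repeat=n)
            if re.fullmatch(r"(a|b)*aa(a|b)*", "".join(p))}
print(sentences(CFG2, 6) == expected == sentences(REG, 6))  # True
```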
Backus-Naur Form
A notation devised for defining the language Algol 60. Pascal syntax rules are often presented in this form. This formalism is equivalent to CFGs: names enclosed in <...> are non-terminals, the remaining names (here real, integer, boolean and identifier) are terminals, conventionally written in bold, and ::= is the same as the -> notation.
Example:
<simple decl> ::= <type> <id list>
<type> ::= real | integer | boolean
<id list> ::= identifier | <id list> identifier
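The left-recursive rule for <id list> amounts to "one or more identifiers", which suggests a loop rather than recursion (a naive recursive-descent parser would recurse forever on a left-recursive rule). The recogniser below is a sketch under assumptions of our own: tokens are separated by whitespace, an identifier is a letter followed by letters or digits, and the type names may not be used as identifiers.

```python
import re

TYPES = {"real", "integer", "boolean"}
IDENT = re.compile(r"[A-Za-z][A-Za-z0-9]*")  # assumed shape of an identifier token


def is_identifier(tok: str) -> bool:
    return bool(IDENT.fullmatch(tok)) and tok not in TYPES


def is_simple_decl(text: str) -> bool:
    """<simple decl> ::= <type> <id list>, where <id list> is one or more identifiers."""
    tokens = text.split()
    if not tokens or tokens[0] not in TYPES:      # <type>
        return False
    ids = tokens[1:]                              # <id list>, checked iteratively
    return len(ids) >= 1 and all(is_identifier(t) for t in ids)


print(is_simple_decl("integer i j k"))  # True
print(is_simple_decl("real"))           # False: <id list> needs at least one identifier
```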
Constructing Grammars
Suppose we wanted to construct a grammar for the language of all strings of the form
$$a\,cc\cdots c\,b \quad\text{or}\quad \underbrace{ab\,ab\cdots ab}_{n\text{ times}}\;cc\cdots c\;\underbrace{ab\,ab\cdots ab}_{n\text{ times}}$$
• We need to find rules to create:
  (i) sequences of strings, e.g. ccc...c
  (ii) bracketed strings, e.g. accc...cb, and
  (iii) nested strings, e.g. abab...ab<...>abab...ab
• Sequencing
  A -> aA | λ   or   A -> Aa | λ
  e.g. A => aA => aaA => ... => aaaaaA => aaaaa
• Bracketing
  A -> aBb with B -> xB | λ   or   A -> Bb with B -> ax | Bx
  e.g. A => aBb => axBb => axxBb => ... => axxxxxb
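The sequencing and bracketing patterns can be compared by listing their short sentences. The enumerator below is a sketch (upper-case letters are non-terminals, "" stands for λ, and the length bound is our choice). It suggests that the two sequencing grammars generate the same strings, while the two bracketing grammars differ slightly: the second one requires at least one x.

```python
def sentences(grammar, max_len, start="A"):
    """Sentences with at most max_len terminals, found by leftmost expansion.
    Assumes every recursive rule adds a terminal, so the bounded search terminates."""
    found, stack, seen = set(), [start], {start}
    while stack:
        form = stack.pop()
        k = next((i for i, c in enumerate(form) if c.isupper()), None)
        if k is None:                       # no non-terminals left: a sentence
            found.add(form)
            continue
        for rhs in grammar[form[k]]:
            new = form[:k] + rhs + form[k + 1:]
            if sum(not c.isupper() for c in new) <= max_len and new not in seen:
                seen.add(new)
                stack.append(new)
    return found


seq_right = {"A": ["aA", ""]}                 # A -> aA | λ
seq_left = {"A": ["Aa", ""]}                  # A -> Aa | λ
br_1 = {"A": ["aBb"], "B": ["xB", ""]}        # A -> aBb, B -> xB | λ
br_2 = {"A": ["Bb"], "B": ["ax", "Bx"]}       # A -> Bb,  B -> ax | Bx

print(sorted(sentences(seq_right, 3)))                    # ['', 'a', 'aa', 'aaa']
print(sentences(seq_right, 5) == sentences(seq_left, 5))  # True
print(sorted(sentences(br_1, 4), key=len))                # ['ab', 'axb', 'axxb']
print(sorted(sentences(br_2, 4), key=len))                # ['axb', 'axxb']
```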
Constructing Grammars (cont.)
• Nesting
  A -> aAb | B
  B -> xB | λ
  e.g. A => aAb => aaAbb => aaaAbbb => ... => aaaaaAbbbbb => aaaaaBbbbbb => ... => aaaaaxxxBbbbbb => aaaaaxxxbbbbb
Example:
S -> abSab | abBab
B -> cB | c
What language does this generate? (Say it precisely.)
Constructing Grammars (cont.)
For S -> abSab | abBab, B -> cB | c, the language generated is $\{(ab)^n c^m (ab)^n : n \ge 1, m \ge 1\}$.
Constructing Grammars (cont.)
Example derivations for S -> abSab | abBab, B -> cB | c:
S => abBab => abcBab => ... => abccccab
S => abSab => ababSabab => abababBababab => abababcBababab => ... => abababccccababab
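The answer can be checked by comparing two recognisers: one that follows the productions directly (S peels off an outer ab...ab pair, B accepts one or more c's) and one that tests the closed form $(ab)^n c^m (ab)^n$ with n, m ≥ 1. A sketch in Python, checked exhaustively over all strings on {a, b, c} up to length 10 (a bound we chose):

```python
import itertools
import re


def derives_S(w):
    """S -> abSab | abBab: strip an outer 'ab'...'ab' and recurse on the middle."""
    if len(w) >= 4 and w.startswith("ab") and w.endswith("ab"):
        inner = w[2:-2]
        return derives_S(inner) or derives_B(inner)
    return False


def derives_B(w):
    """B -> cB | c: one or more c's."""
    return len(w) >= 1 and set(w) == {"c"}


def closed_form(w):
    """(ab)^n c^m (ab)^n with n >= 1 and m >= 1."""
    m = re.fullmatch(r"((?:ab)+)(c+)((?:ab)+)", w)
    return m is not None and len(m.group(1)) == len(m.group(3))


all_words = ("".join(p) for n in range(11) for p in itertools.product("abc", repeat=n))
print(all(derives_S(w) == closed_form(w) for w in all_words))  # True
```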