# Languages & Strings - PowerPoint PPT Presentation

##### Presentation Transcript

1. Languages & Strings: String Operations, Language Definitions

2. Strings
   • A string x (over alphabet A) is a finite sequence x = x1x2...xn where each xi ∈ A.
   • Length: the length of x is the number of characters, n, in the sequence.
   • Empty string: λ denotes the empty string, of length 0.
   • Recursive definition of the set of strings A* over alphabet A:
     • Basis: the empty string λ ∈ A*.
     • Recursive step: if x ∈ A* and a ∈ A, then xa ∈ A*.
     • Closure: A* contains no other strings.
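The recursive definition of A* can be followed literally to list strings by length. The sketch below is only an illustration: the function name `strings_up_to` and the length cap are assumptions (A* itself is infinite whenever A is nonempty), and Python's `""` stands in for λ.

```python
def strings_up_to(alphabet, max_len):
    """Return all strings over `alphabet` of length <= max_len, shortest first."""
    level = [""]              # basis: the empty string λ is in A*
    result = list(level)
    for _ in range(max_len):
        # recursive step: if x is in A* and a is in A, then xa is in A*
        level = [x + a for x in level for a in alphabet]
        result.extend(level)
    return result             # closure: nothing else is ever added


print(strings_up_to("ab", 2))  # ['', 'a', 'b', 'aa', 'ab', 'ba', 'bb']
```

Each pass of the loop applies the recursive step once to every string of the previous length, so a string of length n appears after exactly n applications.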

3. Languages and String Operations
   • Language: a language L over alphabet A is any subset of A*.
   • Concatenation of strings: the concatenation of two strings x and y is xy, a string whose length is the length of x plus the length of y.
   • Concatenation of languages: the concatenation of two languages L and M is LM = { z | z = xy where x ∈ L, y ∈ M }.
   • Example: let T = D* and O = {"+", "-"}, where D = {0, 1, ..., 9}. Then TOT is the language {"1+1", "12+24", ...}.
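For finite languages, the definition of LM translates directly into a set comprehension. This is a minimal sketch: the name `concat` is an assumption, and because T = D* is infinite, the example uses a small hand-picked finite sample of T.

```python
def concat(L, M):
    """Concatenation of two finite languages: { xy | x in L, y in M }."""
    return {x + y for x in L for y in M}


print(sorted(concat({"a", "ab"}, {"b", ""})))  # ['a', 'ab', 'abb']

T = {"1", "12", "24"}   # a finite sample of D*, chosen for illustration
O = {"+", "-"}
TOT = concat(concat(T, O), T)
print("12+24" in TOT)   # True
```

Note that two different pairs (x, y) can give the same string xy, as with "a" + "b" and "ab" + "" in the first example, so LM can have fewer elements than there are pairs.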

4. Recursive Definition of Regular Sets
   • Let A be an alphabet. The regular sets over A are defined as follows:
     • Basis: ∅, {λ} and {a} for each a ∈ A are regular sets.
     • Recursive step: if X and Y are regular sets, so are
       • X ∪ Y
       • XY
       • X*
     • Closure: X is a regular set over A iff it can be obtained from the basis sets by a finite number of applications of the recursive step.

5. Regular Set Examples
   • Signed and unsigned integers
   • Unparenthesized expressions with variable operands and binary operators, where a variable is a letter l followed by a string of letters l and digits d
   • English sentences with structure <noun phrase><verb phrase><noun phrase>, with
     • Lexical categories: d = determiner, a = adjective, n = noun, x = adverb, v = verb

6. Regular Set Examples
   • Signed and unsigned integers
     • ({λ} ∪ {+} ∪ {-}){d}{d}*
   • Expressions without parentheses
     • ({l}({l} ∪ {d})*)(({+} ∪ {*})({l}({l} ∪ {d})*))*
   • Sentences
     • ({d}{a}*{n})(({λ} ∪ {x}){v})({d}{a}*{n})

7. Regular Expressions
   • The set of strings that begin with an "a" and end with a "b" is a regular set over {a,b}, since it equals {a}({a} ∪ {b})*{b}.
   • Regular expressions represent regular sets as follows:
     • ∅, λ and a represent ∅, {λ} and {a}.
     • If u and v are regular expressions (representing regular sets), then (u ∪ v), (uv) and (u*) are regular expressions representing their union, concatenation and Kleene closure.
   • Dropping superfluous parentheses, a(a ∪ b)*b represents the regular set of all strings starting with a and ending with b.
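These expressions can be tried out with Python's `re` module, as a rough sketch. Union is written `|`, an optional λ alternative is written `?`, and `fullmatch` tests the whole string. The concrete character classes are assumptions for illustration: d is taken to be an ASCII digit and l a lowercase ASCII letter. Python patterns also offer features beyond regular sets, none of which are used here.

```python
import re

# Signed and unsigned integers: an optional sign, then one or more digits.
SIGNED_INT = re.compile(r"[+-]?[0-9][0-9]*")

# A variable: a letter followed by any string of letters and digits.
IDENT = r"[a-z][a-z0-9]*"

# Unparenthesized expressions: variables separated by + or *.
EXPRESSION = re.compile(rf"{IDENT}(?:[+*]{IDENT})*")

# Strings over {a, b} that start with a and end with b.
A_TO_B = re.compile(r"a(a|b)*b")

print(bool(SIGNED_INT.fullmatch("-42")))       # True
print(bool(EXPRESSION.fullmatch("x1+y*z2")))   # True
print(bool(A_TO_B.fullmatch("ba")))            # False
```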

8. Grammars
   A context-free grammar G is a 4-tuple G = (V, Σ, P, S) where
   1. V is a set of nonterminals (or string variables), each representing a sublanguage from which the variable takes its values. Examples are <noun phrase>, which can take on values such as "the big box", and T, which can take on string values used to represent products in an algebraic expression.
   2. Σ is a finite alphabet. Examples are the English vocabulary (consisting of over a hundred thousand words, each treated as an atomic symbol), the printable ASCII character set, and the binary alphabet {0,1}. The alphabet contains the symbols from which language strings are formed.

9. Grammars Continued
   3. P is a finite set of productions, or rules, used to define the sublanguages represented by the nonterminals. In a context-free grammar, a rule has the format A → X where A ∈ V and X ∈ (V ∪ Σ)*. The interpretation is that the strings in the sublanguage represented by A can be constructed according to the format indicated by X: each terminal character in X appears unchanged in the A string, and each variable in X is replaced by a string from that variable's sublanguage. Examples are <noun phrase> → <determiner> <adj-list> <noun> and T → a * T.
   4. S is a designated variable (referred to as the start symbol or the head of the language). It represents the language being defined by the grammar G.
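One possible way to hold the 4-tuple in a program is sketched below. The class name `Grammar`, its field names, and the method `is_well_formed` are all assumptions made for illustration. The example instance uses the signed-integer rules I → SD, S → + | - | λ, D → dD | d, with an empty tuple standing for a λ right-hand side.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Grammar:
    nonterminals: frozenset   # V
    alphabet: frozenset       # Σ
    productions: tuple        # P: pairs (A, X), X a tuple of symbols
    start: str                # S

    def is_well_formed(self):
        """Check S ∈ V and that every rule A → X has A ∈ V and X ∈ (V ∪ Σ)*."""
        symbols = self.nonterminals | self.alphabet
        return (
            self.nonterminals.isdisjoint(self.alphabet)
            and self.start in self.nonterminals
            and all(
                lhs in self.nonterminals and all(s in symbols for s in rhs)
                for lhs, rhs in self.productions
            )
        )


G = Grammar(
    nonterminals=frozenset({"I", "S", "D"}),
    alphabet=frozenset({"+", "-", "d"}),
    productions=(
        ("I", ("S", "D")),
        ("S", ("+",)), ("S", ("-",)), ("S", ()),   # () is the λ rule
        ("D", ("d", "D")), ("D", ("d",)),
    ),
    start="I",
)
print(G.is_well_formed())  # True
```

Requiring a single nonterminal on the left of every rule is what makes the grammar context free: a rule can be applied wherever that nonterminal occurs, regardless of the surrounding symbols.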

10. Grammar Examples
   • Signed and unsigned integers
   • Unparenthesized expressions with variable operands and binary operators, where a variable is a letter l followed by a string of letters l and digits d
   • English sentences with structure <noun phrase><verb phrase><noun phrase>, with
     • Lexical categories: d = determiner, a = adjective, n = noun, x = adverb, v = verb

11. Grammar Examples
   • Signed and unsigned integers
     • I → SD, S → + | - | λ, D → dD, D → d
   • Unparenthesized expressions with variable operands and binary operators, where a variable is a letter l followed by a string of letters l and digits d
     • E → VOE, E → V, O → + | *, V → lU, U → lU, U → dU, U → λ
   • English sentences with structure <noun phrase><verb phrase><noun phrase>, with
     • Lexical categories: d = determiner, a = adjective, n = noun, x = adverb, v = verb

12. Grammars and Derivations
   Derivations: if u, v are strings in (V ∪ Σ)*, A is in V and A → X is in P, then uAv ⇒ uXv, read as uAv "derives" uXv by application of the rule A → X. For repeated applications of 0 or more rules, the symbol ⇒* is used.
   Language Definition: the language L(G) defined by G is { x | x ∈ Σ*, S ⇒* x }.
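A single derivation step and a bounded search for members of L(G) can be sketched as follows. The names `RULES`, `derive_step` and `language_within` are assumptions, every symbol is assumed to be one character, and the rules are those of the signed-integer grammar (I → SD, S → + | - | λ, D → dD | d) with `""` for λ. Because L(G) is usually infinite, the search only explores sentential forms up to a given length, so in general it finds only part of the language.

```python
from collections import deque

RULES = {"I": ["SD"], "S": ["+", "-", ""], "D": ["dD", "d"]}


def derive_step(form, rules):
    """All forms uXv obtainable from form = uAv by one rule A -> X."""
    return [
        form[:i] + rhs + form[i + 1:]
        for i, sym in enumerate(form)
        for rhs in rules.get(sym, [])
    ]


def language_within(start, rules, max_form_len):
    """Terminal strings derivable from `start` through forms of bounded length."""
    seen, queue, words = {start}, deque([start]), set()
    while queue:
        form = queue.popleft()
        if not any(sym in rules for sym in form):
            words.add(form)          # no nonterminals left: a string over Σ
        for nxt in derive_step(form, rules):
            if len(nxt) <= max_form_len and nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return words


print(derive_step("SD", RULES))   # ['+D', '-D', 'D', 'SdD', 'Sd']
print(sorted(language_within("I", RULES, 3)))
# ['+d', '+dd', '-d', '-dd', 'd', 'dd', 'ddd']
```

The breadth-first search is one way to read ⇒*: a string is reported exactly when some chain of zero or more single steps leads to it from the start symbol.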

13. Language Definition
   • Language definition is a means of specifying which strings belong to the language. Two approaches to language definition are:
     • Acceptive: given a string, a device specifies whether or not it belongs to the language.
       • An automaton A that processes a language string x accepts x as belonging to the language if its final state belongs to the set of legal final states.
       • A parser constructed from the grammar defining the language accepts the string if it can parse it.
     • Generative: given an alphabet, a generative device tells how strings in the language are formed.
       • A language manual that tells how strings are formed can be used to generate language strings.
       • A grammar is a generative means of specification. Any string that can be derived from the start symbol by applying grammar rules is in the language.

14. Grammars and Derivations
   • Derivations: if u, v are strings in (V ∪ Σ)*,
     • A is in V and
     • A → X is in P,
     • then uAv ⇒ uXv, read as uAv "derives" uXv by application of the rule A → X.
     • For repeated applications of 0 or more rules, the symbol ⇒* is used.
   • Language Definition: the language L(G) defined by G is
     • { x | x ∈ Σ*, S ⇒* x }

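15. Finite state automata and language recognition
   [State diagram: states S, I, F, D; edges labeled d and •]
   The finite state automaton has Σ = {d, •}, start state S and legal final states I and D. The transition function is represented by the diagram above or the table below (a dash means no transition is defined):

   | State | d | • |
   |-------|---|---|
   | S     | I | F |
   | I     | I | D |
   | F     | D | - |
   | D     | D | - |

   Accepts: ddd, d.dd, .ddd. Rejects: d.dd.d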

16. Automata as Acceptors
   [State diagram of the automaton: states S, I, F, D; edges labeled d and •]
   • The string ddd.d produces the state sequence SIIIDD and is accepted as belonging to L because the last state, D, is a legal final state.
   • The string .dd produces the state sequence SFDD and is accepted because D is legal.
   • The string ddd produces the state sequence SIII and is accepted because I is legal.
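The automaton can be simulated with a dictionary for the transition function. This is a minimal sketch: the names `DELTA`, `FINAL`, `run` and `accepts` are assumptions, and the decimal point is written as an ASCII `.` to match the example strings. A missing dictionary entry plays the role of an undefined transition.

```python
DELTA = {
    ("S", "d"): "I", ("S", "."): "F",
    ("I", "d"): "I", ("I", "."): "D",
    ("F", "d"): "D",
    ("D", "d"): "D",
}
FINAL = {"I", "D"}   # legal final states


def run(string, start="S"):
    """Return the list of states visited, or None if a transition is undefined."""
    states = [start]
    for ch in string:
        nxt = DELTA.get((states[-1], ch))
        if nxt is None:
            return None
        states.append(nxt)
    return states


def accepts(string):
    """Accept iff the run exists and ends in a legal final state."""
    states = run(string)
    return states is not None and states[-1] in FINAL


print("".join(run("ddd.d")))   # SIIIDD
print(accepts("ddd"))          # True
print(accepts("d.dd.d"))       # False
```

A string of n symbols always produces a sequence of n + 1 states, since the start state is recorded before any input is read.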

17. Parsing
   • Given a grammar G with distinguished nonterminal S and a string X over the alphabet, does S ⇒* X?
   • Parsing attempts to find a sequence of rules by which S ⇒* X.

18. Parse tree for dd.ddd
   [Parse tree, written in nested form: I(d, I(d, I(•, D(d, D(d, D(d))))))]
   Grammar for decimal numbers: I → d I, I → d, I → • D, D → d D, D → d
   A parse tree has an intermediate node for each nonterminal, a child node for each RHS character in the production used to replace the nonterminal, and a leaf node for each character in the language string produced by the derivation. The language is the set of strings for which there exist parse trees.
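For this small grammar a parse tree can be built by one function per nonterminal, in the style of recursive descent. This is only a sketch under stated assumptions: the names `parse_I`, `parse_D` and `leaves` are hypothetical, trees are represented as nested tuples whose first element is the nonterminal, and the decimal point is written as an ASCII `.`. Each function must account for the entire rest of the string, which is enough to pick the right rule here without backtracking.

```python
def parse_D(s, i):
    """Parse s[i:] as D using D -> d D | d; return a tree or None."""
    if i < len(s) and s[i] == "d":
        if i + 1 == len(s):
            return ("D", "d")                      # D -> d
        sub = parse_D(s, i + 1)
        return None if sub is None else ("D", "d", sub)   # D -> d D
    return None


def parse_I(s, i=0):
    """Parse s[i:] as I using I -> d I | d | . D; return a tree or None."""
    if i >= len(s):
        return None
    if s[i] == ".":
        sub = parse_D(s, i + 1)
        return None if sub is None else ("I", ".", sub)   # I -> . D
    if s[i] == "d":
        if i + 1 == len(s):
            return ("I", "d")                      # I -> d
        sub = parse_I(s, i + 1)
        return None if sub is None else ("I", "d", sub)   # I -> d I
    return None


def leaves(tree):
    """Read the leaf characters of a tree from left to right."""
    return "".join(c if isinstance(c, str) else leaves(c) for c in tree[1:])


tree = parse_I("dd.ddd")
print(tree)
print(leaves(tree))   # dd.ddd
```

Reading the leaves from left to right recovers the parsed string, and a result of `None` means no parse tree exists, so the string is not in the language of this grammar.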