1 / 30

CSCI 3130: Automata theory and formal languages

Fall 2010. The Chinese University of Hong Kong. CSCI 3130: Automata theory and formal languages. Text search and closure properties. Andrej Bogdanov http://www.cse.cuhk.edu.hk/~andrejb/csc3130. Text search. The program grep. grep -E regexp file.

Download Presentation

CSCI 3130: Automata theory and formal languages

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Fall 2010 The Chinese University of Hong Kong CSCI 3130: Automata theory and formal languages Text search and closure properties Andrej Bogdanov http://www.cse.cuhk.edu.hk/~andrejb/csc3130

  2. Text search

  3. The program grep grep -E regexpfile Searches for the occurrence of patterns matching a regular expression cat|12 {cat, 12} union [abc] {a, b, c} shorthand for a|b|c [ab][12] {a1, a2, b1, b2} concatenation (ab)* {e, ab, abab, ...} star [ab]? {e, a, b} zero or one (cat)+{cat, catcat, ...}one or more [ab]{2}{aa, ab, ba, bb}{n} copies

  4. Searching with grep Words containing savor or savour Words with 5 consecutive a or b grep –E `[ab]{5}` words grep –E `savou?r` words grabbable savorsome savory savour unsavored unsavoredly unsavoredness unsavorily unsavoriness unsavory outsavor savor savored savorer savorily savoriness savoringly savorless savorous grep –E `zo+zo+` words zoozoo

  5. More grep commands . any symbol 5 a [a-z] anything in a range n f a 1 \< beginning of line s u f f e r 2 $ end of line l r e g u l a r 3 1 If a DFA is too hard, I do an ... e n 2CSCI 3130 homework makes me ... s t a r 4 3If a DFA likes it, it is ... 4$10000000 =$10? 5I study 3130 hard because it will make me ... grep –E `\<.ff.u..t` words

  6. how do you look for... Words that start in cat and have another cat grep –E `\<cat.*cat` words Words with at least ten vowels? grep –E `([aeiouy].*){10}` words Words without any vowels? grep –E `\<[^AEIOUYaeiouy]*$` words Words with exactly ten vowels? [^R] does not contain R grep –E `\<[^AEIOUYaeiouy]* ([aeiouy][^AEIOUYaeiouy]*){10}$` words

  7. How grep (could) work regularexpression NFA NFAwithout e DFA input text file differences in class in grep not allowed allowed [ab]?, a+, (cat){3} matches whole looks for pattern input handling accept/reject finds pattern output

  8. Implementation of grep • How do you handle expressions like: [ab]? zero or one → ()|[ab] R? → e|R (cat)+ one or more → (cat)(cat)* R+ → RR* a{3}{n} copies → aaa ? R{n} → RR...R [^aeiouy] not containing any n times

  9. Closure properties

  10. Example • The language L of strings that end in 101 is regular • How about the language L of strings that do not end in 101? (0+1)*101

  11. Example • Hint:w does not end in 101if and only if it ends in:or it has length 0, 1, or 2 • So L can be described by the regular expression 000, 001, 010, 011, 100, 110 or 111 (0+1)*(000+001+010+010+100+110+111) + e + (0 + 1) + (0 + 1)(0 + 1)

  12. Complement • The complementL of a language L is the set of all strings that are not in L • Examples (S = {0, 1}) • L1 = all strings that end in 101 • L1= all strings that do not end in 101= all strings end in 000, …, 111 or have length 0, 1, or 2 • L2 = 1* = {e, 1, 11, 111, …} • L2 = all strings that contain at least one 0 = (0 + 1)*0(0 + 1)*

  13. Example • The language L of strings that contain101 is regular • How about the language L of strings that do not contain 101? (0+1)*101(0+1)* You can write a regular expression, but it is a lot of work!

  14. Closure under complement • To argue this, we can use any of the equivalent definitions for regular languages: • The DFA definition will be most convenient • We assume L has a DFA, and show L also has a DFA If L is a regular language, so is L. regularexpression NFA DFA

  15. Arguing closure under complement • Suppose L is regular, then it has a DFA M • Now consider the DFA M’ with the accepting and rejecting states of Mreversed accepts L accepts strings not in L this is exactly L

  16. Food for thought NO! • Can we do the same thing with an NFA? 0, 1 1 0 (0+1)*10 q0 q1 q2 0, 1 (0+1)* 1 0 q0 q1 q2

  17. Intersection • The intersectionL L’ is the set of strings that are in both L and L’ • Examples: • If L, L’ are regular, is L L’ also regular? L L’ = L = (0 + 1)*11 L’ = 1* 1*11 L L’ = ∅ L = (0 + 1)*10 L’ = 1*

  18. Closure under intersection If L and L’ are regular languages, so is L L’. • To argue this, we can use any of the equivalent definitions for regular languages: regularexpression NFA DFA Suppose L and L’ have DFAs, call them M and M’ Goal: Construct a DFA (or NFA) for L L’

  19. An example M 1 1 0 r1 r0 0 0 0 L = even number of 0s L’ = odd number of 1s M’ 1 s1 s0 1 1 r0, s0 r0, s1 1 0 0 0 0 1 r1, s0 r1, s1 1 L  L’ = even number of 0s and odd number of 1s

  20. Closure under intersection M and M’ DFA for L  L’ states Q = {r1, ..., rn} F for M Q × Q’ = {(r1, s1), (r1, s2), ..., (r2, s1), ..., (rn, sn’)} F’ for M’ Q’ = {s1, ..., sn’} start state ri for M sj for M’ (ri,sj) accepting states F × F’ = {(ri, sj): ri ∈F, sj ∈F’} Whenever M is in state ri and M’ is in state sj, the DFA for L  L’ will be in state (ri,sj)

  21. Closure under intersection M and M’ DFA for L  L’ a transitions in M ri,si’ rj,sj’ in M’ a a si’ ri sj’ rj

  22. Reversal • The reversalwR of a string w is w written backwards • The reversalLRof a languageL is the language obtained by reversing all its strings w = cave wR = evac L = {cat, dog} LR = {tac, god}

  23. Reversal of regular languages • L = all strings that end in 101 is regular • How about LR? • This is the language of all strings beginning in 101 • Yes, because it is represented by (0+1)*101 101(0+1)*

  24. Closure under reversal • How do we argue? If L is a regular language, so is LR. regularexpression NFA DFA

  25. Arguing closure under reversal • Take a regular expression E for L • We will show how to reverse E • A regular expression can be of the following types: • The special symbols  and e • Alphabet symbols like a, b • The union, concatenation, or star of simpler expressions

  26. Proof of closure under reversal regular expression E reversal ER Æ Æ e e a (alphabet symbol) a E1 + E2 E1R + E2R E1E2 E2RE1R E1* (E1R)*

  27. A question L = {cat, dog} LDUP = {ww: w L} Ex. LDUP = {catcat, dogdog} If L is regular, is LDUP also regular? ? regularexpression NFA DFA

  28. A question • Let’s try with regular expression: • Let’s try with NFA: L = {a, b}    q0 q1 NFA for L NFA for L LDUP = LL LDUP = {aa, bb} LL = {aa, ab, ba, bb}

  29. An example • Let’s try to design an NFA for LDUP L = 0*1 is regular LDUP = {1, 01, 001, 0001, ...} LDUP = {11, 0101, 001001, 00010001, ...} = {0n10n1: n ≥ 0}

  30. An example 0 0 0 0 1 1 1 1 01 1 001 0001 LDUP = {11, 0101, 001001, 00010001, ...} = {0n10n1: n ≥ 0}

More Related