
Strings

Strings. A string over a set A is a finite sequence of elements from A. The set of elements from which the strings are built is called an alphabet. Definition. An alphabet is a nonempty, finite set of indivisible symbols. We denote it by Σ.


Presentation Transcript


  1. Strings
     • A string over a set A is a finite sequence of elements from A.
     • The set of elements from which the strings are built is called an alphabet. Definition. An alphabet is a nonempty, finite set of indivisible symbols. We denote it by Σ.
     • Any program is a string of keywords, variable names, and permissible symbols.
     • A programming language must satisfy general rules (a grammar) to be understood by a computer (compiler). These rules are studied by the formal theory of programming languages.

  2. Definition. A string w over an alphabet Σ is a sequence of symbols, w = a₁a₂…aₙ, where each aᵢ ∈ Σ, 1 ≤ i ≤ n.
     • The number of symbols is called the length of the string, |w| = n.
     • There is one special string that has zero length (contains no symbols). It is called the empty string and is denoted by ε. |w| = 0 ⇔ w = ε.
     • ε is not an element of any alphabet: ε ∉ Σ.
     Example. Let Σ = {a, bb, c}, where bb is a single indivisible symbol. Find all possible strings of length less than or equal to 3 built from Σ.
     Length 0: ε
     Length 1: a bb c
     Length 2: aa abb ac bba bbbb bbc ca cbb cc
     Length 3: aaa aabb aac abba abbbb abbc aca acbb …
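The enumeration in this example can be reproduced with a short sketch. It assumes each string is represented as a tuple of symbols, so that bb counts as one symbol and the length is the number of symbols rather than the number of characters. The helper name `strings_up_to` is hypothetical and not part of the slides.

```python
from itertools import product

def strings_up_to(alphabet, max_len):
    """Return all strings over `alphabet` with at most `max_len` symbols.

    Each string is a tuple of symbols, so len() gives the number of
    symbols. The empty tuple () plays the role of the empty string.
    """
    result = []
    for n in range(max_len + 1):
        # product(..., repeat=0) yields exactly one empty tuple.
        result.extend(product(alphabet, repeat=n))
    return result

sigma = ("a", "bb", "c")  # alphabet from the example; "bb" is one symbol
words = strings_up_to(sigma, 3)

# Group the strings by length and print them in written form.
by_length = {n: ["".join(w) for w in words if len(w) == n] for n in range(4)}
for n, group in by_length.items():
    print(f"Length {n}: {len(group)} strings, e.g. {group[:9]}")
```

With an alphabet of 3 symbols there are 3ⁿ strings of length n, so the list for length 3 has 27 entries.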

  3. • Two strings u and v can be concatenated to form a single string uv that consists of the symbols of string u followed by the symbols of string v. The length |uv| = |u| + |v|.
     • uε = εu = u
     • Concatenation is associative: (uv)w = u(vw), but not commutative: in general uv ≠ vu (the order is important!).
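These properties can be spot-checked on concrete strings. The sketch below assumes single-character symbols, so that Python's built-in `str` and the `+` operator model strings and concatenation; the sample values of `u`, `v`, `w` are arbitrary choices for illustration.

```python
EPSILON = ""  # the empty string

u, v, w = "ab", "c", "ba"

# |uv| = |u| + |v|
length_adds = len(u + v) == len(u) + len(v)

# The empty string is the identity for concatenation.
identity = (u + EPSILON == u) and (EPSILON + u == u)

# Concatenation is associative...
associative = (u + v) + w == u + (v + w)

# ...but not commutative: here "abc" differs from "cab".
commutes = (u + v == v + u)

print(length_adds, identity, associative, commutes)
```

A check on a few values is only an illustration, not a proof of the general statements.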

  4. Definition. Any set of strings over some alphabet is called a language.
     Examples: The set of all executable computer programs is a language. The alphabet itself is a language as well (the language of all one-symbol words).
     Since languages are sets, we can apply all set operations to languages: union, intersection and set difference.
     There is one operation specific to languages, the concatenation of two languages: L1L2 = {uv | u ∈ L1 and v ∈ L2}
     Q: Is L1L2 = L2L1?
     A: In general, L1L2 ≠ L2L1.

  5. Example. Take the alphabet Σ = {a, b, c}. Consider two languages over the alphabet Σ: L1 = {a, ab} and L2 = {b, bc, c}. Find L1L2 and L2L1.
     We need to take every string from L1 and concatenate it with every string from L2. In this way we get |L1|·|L2| strings: ab, abc, ac, abb, abbc, abc. Note that not all of these strings are distinct: abc occurs twice.
     L1L2 = {ab, abc, ac, abb, abbc}.
     In the same way: L2L1 = {ba, bab, bca, bcab, ca, cab}.
     The cardinality |L1L2| is the number of distinct strings resulting from the concatenation. In general, |L1L2| ≤ |L1|·|L2|, and |L1L2| need not equal |L2L1|. In the example, |L1L2| = 5 < |L1|·|L2| = 6, while |L2L1| = 6.
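The example can be checked mechanically. This is a minimal sketch that assumes languages are modeled as Python sets of `str`; the function name `concat` is a hypothetical helper, not notation from the slides.

```python
def concat(l1, l2):
    """Concatenation of two languages: {uv | u in l1 and v in l2}."""
    return {u + v for u in l1 for v in l2}

L1 = {"a", "ab"}
L2 = {"b", "bc", "c"}

L1L2 = concat(L1, L2)
L2L1 = concat(L2, L1)

print(sorted(L1L2), len(L1L2))
print(sorted(L2L1), len(L2L1))
```

Because the result is a set, the duplicate abc (from a·bc and ab·c) is stored once, which is why `len(L1L2)` comes out smaller than `len(L1) * len(L2)`.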

  6. In particular, we can consider the concatenation of an alphabet Σ with itself: ΣΣ is the language of all two-symbol words. Notation: ΣΣ = Σ².
     Example: Σ = {a, b}, ΣΣ = Σ² = {aa, ab, ba, bb}.
     Similarly, Σ³ = ΣΣ² is the language that consists of all 3-symbol words: Σ³ = {aaa, aba, baa, bba, aab, abb, bab, bbb}.
     So we can define recursively for any n > 1: Σⁿ = ΣΣⁿ⁻¹.
     To make this recursive definition agree with the basis case n = 1, Σ = ΣΣ⁰, the zero power Σ⁰ is defined as Σ⁰ = {ε} (no matter what Σ is). Then Σ{ε} = {xε | x ∈ Σ} = {x | x ∈ Σ} = Σ.
     What is Σ ∪ Σ²? What is Σ ∪ Σ² ∪ Σ³ ∪ … ∪ Σⁿ?
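The recursive definition translates directly into a loop. The sketch below assumes single-character symbols and models a language as a set of `str`, with the empty string `""` standing for the empty word; `power` is a hypothetical helper name.

```python
def power(language, n):
    """n-th power of a language: L^0 = {empty string}, L^n = L L^(n-1)."""
    result = {""}  # zero power: the language containing only the empty string
    for _ in range(n):
        # One more factor of L on the left: L^(k+1) = L L^k
        result = {x + w for x in language for w in result}
    return result

sigma = {"a", "b"}

print(sorted(power(sigma, 0)))  # zero power
print(sorted(power(sigma, 2)))  # all two-symbol words
print(len(power(sigma, 3)))     # number of three-symbol words
```

Starting the loop from the zero power is what makes `power(sigma, 1)` come out equal to `sigma` itself, matching the basis case.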

  7. ‘Kleene star’ notation: Σ* = Σ⁰ ∪ Σ¹ ∪ Σ² ∪ …
     So Σ* is the (infinite) set of all possible words over the alphabet Σ, including the empty string ε.
     Example. Σ = {0, 1}. Σ* is the infinite set of all possible bit strings (or all binary numbers, including numbers with leading 0’s and the empty string).
     Any language L over the alphabet Σ is a subset of Σ*: L ⊆ Σ*.
     Note that {ε} ≠ ∅, because ∅ = {} ≠ {ε}: |{ε}| = 1, |∅| = |{}| = 0.
     A language L may or may not contain ε.

  8. Example. Consider two languages over the alphabet Σ = {a}: L1 = {aa}, L2 = {ε, aa, aaaa}.
     What is L1*? By the definition of the Kleene star,
     L1* = L1⁰ ∪ L1¹ ∪ L1² ∪ … = {ε} ∪ {aa} ∪ {aaaa} ∪ {aaaaaa} ∪ … = {ε, aa, aaaa, aaaaaa, …},
     the infinite set of strings of even length built from the symbol a.
     What is L2*?
     L2* = L2⁰ ∪ L2¹ ∪ L2² ∪ … = {ε} ∪ {ε, aa, aaaa} ∪ {ε, aa, aaaa, aaaaaa, aaaaaaaa} ∪ … = {ε, aa, aaaa, aaaaaa, …} = L1*
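Since a Kleene star is infinite, a program can only compare finite portions of it. The following sketch computes all strings of a star up to a chosen length bound and compares the two languages of the example on that portion. The helper name `star_up_to` and the bound 8 are assumptions made for illustration, and agreement up to a bound illustrates the equality but does not prove it.

```python
def star_up_to(language, max_len):
    """Return the strings of the Kleene star of `language` whose length
    (in characters) is at most `max_len`."""
    result = {""}    # the zero power contributes the empty string
    frontier = {""}  # strings found in the previous round
    while frontier:
        # Extend each recent string by one more word of the language,
        # keeping only strings that are new and within the bound.
        frontier = {u + v for u in frontier for v in language
                    if len(u + v) <= max_len} - result
        result |= frontier
    return result

L1 = {"aa"}
L2 = {"", "aa", "aaaa"}

print(sorted(star_up_to(L1, 8), key=len))
print(star_up_to(L1, 8) == star_up_to(L2, 8))
```

The loop stops once a round produces no new strings, which always happens because only finitely many strings fit under the length bound.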

  9. Definition. A string u is called a substring of v if there exist two strings x, y ∈ Σ* such that v = xuy.
     Definition. A string u is called a prefix of v if there exists a string x ∈ Σ* such that v = ux. Similarly, a string u is called a suffix of v if there exists a string y ∈ Σ* such that v = yu.
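The three definitions can be turned into brute-force checks that search for the witnesses x and y directly. This sketch assumes single-character symbols and Python `str` values; the function names are hypothetical. In practice the built-ins `u in v`, `v.startswith(u)` and `v.endswith(u)` give the same answers.

```python
def is_substring(u, v):
    """True if v = x + u + y for some strings x and y."""
    n = len(v)
    return any(v[:i] + u + v[j:] == v
               for i in range(n + 1) for j in range(i, n + 1))

def is_prefix(u, v):
    """True if v = u + x for some string x."""
    return any(u + v[i:] == v for i in range(len(v) + 1))

def is_suffix(u, v):
    """True if v = y + u for some string y."""
    return any(v[:i] + u == v for i in range(len(v) + 1))

print(is_substring("bc", "abcd"), is_prefix("ab", "abcd"), is_suffix("cd", "abcd"))
print(is_substring("ca", "abcd"), is_prefix("bc", "abcd"), is_suffix("ab", "abcd"))
```

Because x and y may be empty, every prefix and every suffix of v is also a substring of v, and the empty string is a prefix, suffix and substring of every string.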

  10. Theorem 1. Let A, B and C be sets of strings. Then (A ∪ B)C = AC ∪ BC.
     Proof. We need to prove the equality of two sets of strings. We can do it by double inclusion, i.e. by showing that
     i) (A ∪ B)C ⊆ AC ∪ BC and
     ii) AC ∪ BC ⊆ (A ∪ B)C.

  11. i) To prove (A ∪ B)C ⊆ AC ∪ BC, it suffices to show that for any string w, w ∈ (A ∪ B)C ⇒ w ∈ AC ∪ BC.
     Take any w ∈ (A ∪ B)C
     ⇒ ∃ x, y such that w = xy and x ∈ (A ∪ B) and y ∈ C (dfn of concat)
     ⇒ (x ∈ A or x ∈ B) and y ∈ C (dfn of ∪)
     ⇒ (x ∈ A and y ∈ C) or (x ∈ B and y ∈ C) (distributive property)
     ⇒ w ∈ AC or w ∈ BC (dfn of concat)
     ⇒ w ∈ AC ∪ BC (dfn of ∪)

  12. ii) To prove that AC ∪ BC ⊆ (A ∪ B)C, we need to show that for any string w, w ∈ AC ∪ BC ⇒ w ∈ (A ∪ B)C.
     Take any w ∈ AC ∪ BC
     ⇒ w ∈ AC or w ∈ BC (dfn of ∪)
     ⇒ ∃ x, y such that w = xy and ((x ∈ A and y ∈ C) or (x ∈ B and y ∈ C)) (dfn of concat)
     So we have two cases. In the first case, (x ∈ A and y ∈ C) implies (x ∈ A ∪ B and y ∈ C) because A ⊆ (A ∪ B). In the second case, (x ∈ B and y ∈ C) implies (x ∈ A ∪ B and y ∈ C) because B ⊆ (A ∪ B). So in either case we have
     ⇒ w ∈ (A ∪ B)C (dfn of concat)
     So we proved AC ∪ BC ⊆ (A ∪ B)C and (A ∪ B)C ⊆ AC ∪ BC, which means (A ∪ B)C = AC ∪ BC.
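As a sanity check on Theorem 1, the identity can be tested exhaustively on a small pool of strings. The sketch below is an illustration under assumptions, not a replacement for the proof: the pool of four strings and the helper names `concat` and `subsets` are chosen here for the example.

```python
from itertools import combinations

def concat(l1, l2):
    """Concatenation of two languages: {uv | u in l1 and v in l2}."""
    return {u + v for u in l1 for v in l2}

def subsets(pool):
    """All subsets of `pool`, each returned as a set."""
    return [set(c) for r in range(len(pool) + 1) for c in combinations(pool, r)]

pool = ["", "a", "b", "ab"]  # small sample; every subset is used as a language
languages = subsets(pool)

# Check (A ∪ B)C = AC ∪ BC for every triple of sample languages.
holds = all(concat(A | B, C) == concat(A, C) | concat(B, C)
            for A in languages for B in languages for C in languages)
print(holds)
```

The pool yields 16 languages and therefore 4096 triples, including cases with the empty language and with the empty string.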

  13. Theorem 2. Let A, B and C be sets of strings. Then (A ∩ B)C ⊆ AC ∩ BC.
     Proof. To prove the subset relation we need to show that for any string w, w ∈ (A ∩ B)C ⇒ w ∈ AC ∩ BC.
     Why not prove AC ∩ BC ⊆ (A ∩ B)C as well? Let’s try.
     Take an arbitrary w ∈ AC ∩ BC
     ⇒ w ∈ AC and w ∈ BC
     ⇒ (∃ x, y: w = xy, x ∈ A and y ∈ C) and (∃ u, v: w = uv, u ∈ B and v ∈ C)
     Can we conclude that xy = uv ⇒ x = u? No, because the same string abc may come from a·bc and from ab·c.
     Example. A = {a}, B = {ab}, C = {c, bc}. Then A ∩ B = {}, (A ∩ B)C = {}. AC = {ac, abc}, BC = {abc, abbc}. So abc ∈ AC ∩ BC, but abc ∉ (A ∩ B)C = {}.
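The counterexample can be verified with a few set operations. This sketch models languages as Python sets of `str`; `concat` is a hypothetical helper name.

```python
def concat(l1, l2):
    """Concatenation of two languages: {uv | u in l1 and v in l2}."""
    return {u + v for u in l1 for v in l2}

A = {"a"}
B = {"ab"}
C = {"c", "bc"}

lhs = concat(A & B, C)             # (A ∩ B)C
rhs = concat(A, C) & concat(B, C)  # AC ∩ BC

print(lhs, rhs)
print(lhs <= rhs, lhs == rhs)      # inclusion holds, equality fails
```

The string abc lands in both AC and BC through two different splits (a·bc and ab·c), so it belongs to the intersection even though A and B share no string.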
