Strings

Strings • A string over a set A is a finite sequence of elements from A. • The set of elements from which the strings are built is called • an alphabet. Definition. An alphabet is a nonempty, finite set of indivisible symbols. We are going to denote it by . • Any program is a string of keywords, variable names, and • permissible symbols. • A programming language should satisfy general rules (grammar) • to be understood by computer (compiler). These rules are studied • by the formal theory of programming languages.

Definition. A string w over an alphabet is a sequence of symbols, w = a1a2… an, where each ai , 1 i  n. • The number of symbols is called thelengthof the string, |w|=n. There is one special string that has zero length (contains no symbols). It is called the empty string and has special notation , . |w|=0  w = .  is not an element of any alphabet,  . Example. Let  = {a, bb, c}. Find all possible strings with length less or equal 3 built from . Length 0:  Length 1: a bb c Length 2: aaabb ac bba bbbb bbc ca cbb cc Length 3: aaa aabb aac abba abbbb abbc aca acbb …

Two strings u and v can be concatenated to form a single • string uv, that consists of the symbols of string u, followed by • symbols of of string v. The length | uv | = | u|+| v |. • u = u = u • Concatenation is associative: (uv)w = u(vw), • but not commutative: uv  vu (the order is important!).

? Q : L1L2 = L2L1 Definition. Any set of strings over some alphabet is called a language. Examples: Set of all executable computer programs is a language. Alphabet itself is a language as well (the language of all one-symbol words). Since languages are sets, we can apply all set operations to languages: union, intersection and set difference. There is one operation specific for languages: concatenation of two languages : L1L2 ={uv | u  L1 and v  L2 } A: L1L2L2L1

Example. Take the alphabet = {a, b, c}. Consider two languages over alphabet : L1 ={a, ab} and L2 ={b, bc, c}. Find L1L2 and L2L1. We need to take every string from L1 and concatenate with every string from L2 . In this way we get |L1||L2 | strings: ab, abc, ac, abb, abbc, abc. Note, that not all strings are distinct, like abc. L1L2 = {ab, abc, ac, abb, abbc} . In the same way: L2L1 = {ba, bab, bca, bcab, ca, cab}. The cardinality | L1L2 | is the number of distinct strings, resulting from concatenation . In general, | L1L2 |  | L1|  |L2 | and | L1L2 |  | L2L1 | In the example | L1L2 | = 5< | L1|  |L2 |=6.

In particular, we can consider the concatenation of an alphabet  with itself:  is the language of all two-symbol words. Notation:  = 2 Example: ={a, b}, = 2 = {aa, ab, ba, bb} Similarly, 3 = 2, the language that consists of all 3-symbol words: 3 ={aaa, aba, baa, bba, aab, abb, bab, bbb}. So, we can define recursively for any n>1: n = n-1 To make this recursive definition agree with the basis case n =1, = 0, zero power 0 is defined as 0 = {}, (no matter what is ). Then {} ={ x | x  } = { x | x  }=  What is  2? What is  2  3  …  n?

‘Kleene star’ notation: * = 0  1  2  … So, * is the (infinite) set of all possible words over alphabet , including empty string . Example. = {0, 1}. * is an infinite set of all possible bit strings. (or all binary numbers including numbers with leading 0’s and empty string). Any language L over alphabet is a subset of *, L  *. Note that {} , because  ={}  {} | {}|=1, ||=|{}|=0. A language L may contain  , or may not.

Example. Consider two languages over alphabet  = {a}: L1={aa}, L2={, aa, aaaa}. What is L1*? By definition of Kleene star L1* = L10 L11 L12 … ={}{aa} {aaaa} {aaaaaa} … = {, aa, aaaa, aaaaaa, …} infinite set of strings of even length build from symbol a. What is L2*? L2* = L20 L21 L22 … ={}{, aa, aaaa} {, aa, aaaa, aaaaaa, aaaaaaaa}… ={, aa, aaaa, aaaaaa, …} = L1*

Definition. A string u is called a substring of v if there exist two strings x and y, such that v = xuy, and x, y  * Definition. A string u is called a prefix of v if there exists a string x  *, such that v = ux. Similarly, a string u is called a suffix of v if there exists a string y  *, such that v = yu.

Theorem 1. Let A, B and C be sets of strings. Then (AB)C = ACBC Proof. a) We need to prove the equality of two sets of strings. We can do it by double-inclusion, i. e. to show that i) (AB)C  ACBC and ii) ACBC  (AB)C

i) To prove (AB)C  ACBC, it’s suffices to show that for any string w, w(AB)C  wAC BC Take any w (AB)C x, y, such that w = xy and x(AB) and yC (dfn of concat) … (xA or xB) and y C (dfn of  ) … (xA and yC)or (xB and yC) (distributive property)  w AC or wBC (dfn of concat)  w AC BC (dfn of )

ii) To prove that ACBC (AB)C, we need to show that for any string w, w ACBC  w (AB)C Take any w  AC BC  w AC or w BC (dfnof ) x, y, such that w = xy and (x A and y C) or (x B and y C) (dfnof concat) So we can have two cases. In the first case, (x A and y C) implies that (x AB and y C) because A (AB). In the second case, (x B and y C) implies that (x AB and y C) because B (AB). So, in either case we have  w (AB)C(dfn of concat) So, we proved ACBC (AB)C and (AB)CACBC, that means (AB)C = ACBC

Theorem 2. Let A, B and C be sets of strings. Then (AB)C  ACBC Proof. To prove subset relation we need to show that for any string w, w(AB)C  wACBC. Why not to prove ACBC  (AB)C as well? Let’s try. Take arbitrary wACBC  wAC and wBC . (x, y, w=xy, xA and yC)and (u,v, w=uv, uB and vC) Can we imply xy=uv  x = u ? No, because the same string abc may come from abc and abc Example.A ={a}, B ={ab}, C ={c, bc}. Then AB={}, (AB)C={}. AC={ac, abc} BC={abc, abbc} abc ACBC, but we can not imply that abc (AB)C={}

Strings

Strings

Presentation Transcript

Strings

Strings

Strings

Strings

Strings

Strings

Strings

Strings

Strings

Strings

Strings

Strings

Strings

Strings

Strings

STRINGS

Strings

Strings

Strings