12. Regular Expressions

anka
Presentation Transcript


  1. 12. Regular Expressions

  2. Motto: I don't play accurately - any one can play accurately - but I play with wonderful expression. As far as the piano is concerned, sentiment is my forte. I keep science for life. - Oscar Wilde

  3. Concepts • Regular Expressions • allow searching for a pattern within a text string • the patterns can be rather complex • same idea as "wildcard" characters – compare SQL – but much more expressive • often abbreviated, e.g. as RegExp • by default, RegExps match as much as possible • they are greedy • Theoretical underpinnings • nondeterministic finite automata (NFA) • regular grammars • but some constructs extend the functionality further • even beyond CFG (context-free grammars)
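A minimal sketch of what "greedy" means in practice, written in JavaScript. The sample string is made up for illustration, and the lazy quantifier `.*?` is not covered on the slides; it is shown only for contrast.

```javascript
// Greedy matching: .* takes as many characters as it can,
// so the match runs to the LAST "z" in the string.
const text = "a1z a2z";
const greedy = text.match(/a.*z/)[0];

// For contrast (not on the slides): the lazy form .*? stops at the FIRST "z".
const lazy = text.match(/a.*?z/)[0];

console.log(greedy); // "a1z a2z"
console.log(lazy);   // "a1z"
```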

  4. Support • Popular, widely supported • Directly in scripting languages • JavaScript • special syntax • PHP • functions • Ruby • Perl • As libraries • Java's java.util.regex package

  5. JavaScript RegExp • Directly as argument of methods of the String object • string.match(regexp) • returns an array of the substrings that matched the regexp pattern, null if there is no match • string.replace(regexp,by) • returns a new string where the first (or, with the g modifier, all) matched patterns were replaced with the by string • string.search(regexp) • returns the index of the first substring that matched the regexp pattern, -1 if there is no match • string.split(regexp) • returns an array of the substrings of string separated by regexp • regexp argument • enclosed in / • e.g., /ex/ matches the first occurrence of "ex" • optional modifiers placed as suffix • g (global); used in match() and replace() • e.g., /ex/g matches all occurrences of "ex" • i (ignore case) • e.g., /ex/i matches the first occurrence of "ex", "EX", "Ex" or "eX" • m (multiline) • ^ and $ also match at the beginning and end of each line
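The four String methods and the g and i modifiers can be tried out as follows. This is a small sketch; the sample strings and variable names are invented for illustration.

```javascript
const s = "Next exit: Exeter";

// match(): without g only the first match is returned; with g all of them.
const firstMatch = s.match(/ex/)[0];      // "ex"
const allMatches = s.match(/ex/g);        // ["ex", "ex"]
const allAnyCase = s.match(/ex/gi);       // ["ex", "ex", "Ex"]

// replace(): first occurrence only, unless the g modifier is given.
const replacedFirst = s.replace(/ex/, "EX");   // "NEXt exit: Exeter"
const replacedAll = s.replace(/ex/g, "EX");    // "NEXt EXit: Exeter"

// search(): index of the first match, or -1.
const found = s.search(/Ex/);             // 11
const missing = s.search(/qq/);           // -1

// split(): the separator is itself a pattern.
const parts = "a, b;c".split(/[,;]\s*/);  // ["a", "b", "c"]

console.log(allAnyCase, replacedAll, found, parts);
```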

  6. PHP RegExp • functions with $regexp and $string arguments • ereg($regexp,$string [,&$matches]) • returns the length of the matched string, false if there is no match • array reference &$matches, if given, will be filled with the whole matched string in $matches[0] and the captured substrings in subsequent elements • ereg_replace($regexp,$by,$string) • returns a string where the matched patterns were replaced with the $by string • split($regexp,$string [,$limit]) • returns an array of substrings of $string that were separated by patterns matching $regexp • optional $limit determines how many substrings to return (the last one contains the remainder) • eregi(), eregi_replace(), spliti() • same as ereg(), ereg_replace() and split(), but ignore case • preg_match($regexp,$string) • similar to ereg(), see PHP documentation • if a global search for all matches is to be performed, ereg() must be called in a loop • note: the ereg family and split() were deprecated in PHP 5.3 and removed in PHP 7; use the preg_ functions instead

  7. Syntax in JavaScript • by "element" we mean a character or a group • . any character • ? zero or one occurrence of the preceding element • * any number of occurrences of the preceding element, incl. none • e.g., a.*z matches the largest substring that starts with a and ends with z, incl. "az" • + any number of occurrences of the preceding element, but at least one • e.g., a.+z matches the largest substring that starts with a and ends with z, not including "az" • note that "azz" and "aaz" are matched • {n} exactly n occurrences of the preceding element • {m,n} between m and n occurrences of the preceding element • ^ beginning of the string • $ end of the string • a sequence of elements means that the whole sequence must be matched • e.g., a.z matches "axz", "a5z", "aQz", etc. • [] alternative elements • e.g., [ab] means a or b • [^ ] none of the alternative elements • e.g., [^ab] means neither a nor b • - range • e.g., [a-zA-Z] means a through z or A through Z, i.e. all lower-case and upper-case letters • | or • e.g., ab|yz matches "ab" or "yz"
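The syntax elements above can be checked one by one with `RegExp.prototype.test()`. The table below is an illustrative sketch; the inputs (and the `checks` and `results` names) are invented, and each pattern is anchored with ^ and $ so the whole input has to match.

```javascript
// Each row: [pattern, input, expected result of pattern.test(input)]
const checks = [
  [/^colou?r$/,     "color",  true],   // ? : the "u" may be absent
  [/^colou?r$/,     "colour", true],   //     or present once
  [/^a.*z$/,        "az",     true],   // * : zero characters in between is fine
  [/^a.+z$/,        "az",     false],  // + : at least one character is required
  [/^a.+z$/,        "azz",    true],
  [/^a.+z$/,        "aaz",    true],
  [/^[0-9]{3}$/,    "123",    true],   // {n}   : exactly three digits
  [/^[0-9]{2,4}$/,  "12345",  false],  // {m,n} : two to four digits
  [/^a.z$/,         "a5z",    true],   // . : any single character
  [/^[^ab]$/,       "c",      true],   // [^ ] : anything except a or b
  [/^[^ab]$/,       "a",      false],
  [/^(ab|yz)$/,     "yz",     true],   // | : either alternative
];

const results = checks.map(([pattern, input]) => pattern.test(input));
checks.forEach(([pattern, input, expected], i) => {
  console.log(String(pattern), JSON.stringify(input), results[i], "expected", expected);
});
```

The parentheses in `(ab|yz)` matter here: `|` has the lowest precedence, so `^ab|yz$` would mean "starts with ab" or "ends with yz".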

  8. Special Characters • Denoted by \ • \/: / • \b: word boundary (inside [ ] it means backspace) • \t: tab character • \n: line feed • \r: carriage return • \f: form feed • \s: whitespace character, e.g. [ \t\r\n\f] • \d: digit, i.e. [0-9] • \w: word character, i.e. [a-zA-Z0-9_] • \S: not a whitespace character, i.e. [^\s] • \D: not a digit, i.e. [^\d] • \W: not a word character, i.e. [^\w] • any other character preceded by \ means the character itself • the "meta-characters" need to be escaped: • \\, \/, \[, \], \., \?, \|, \+, \*, \(, \), \^, \$, \-, \{, \}
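A short sketch of the character classes, the word boundary and escaping in JavaScript. The sample string and variable names are invented for illustration.

```javascript
// The string contains a real tab character between "66:" and "cat_7".
const line = "Order 66:\tcat_7 concat";

const digits = line.match(/\d+/g);      // runs of digits
const words = line.match(/\w+/g);       // runs of word characters
const fields = line.split(/\s+/);       // split on any whitespace, incl. the tab

// \b is a word boundary: "cat" at the start of "cat_7" counts,
// the "cat" inside "concat" does not.
const catAtBoundary = line.match(/\bcat/g).length;
const catAnywhere = line.match(/cat/g).length;

// An unescaped . matches any character; \. matches only a literal dot.
const looseDot = /^3.14$/.test("3x14");
const literalDot = /^3\.14$/.test("3x14");

console.log(digits, words, fields, catAtBoundary, catAnywhere, looseDot, literalDot);
```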

  9. RegExp Capturing • If you enclose subpattern(s) in ( and ) within a RegExp, the text they match will be captured, i.e. returned or available for further use • e.g., \b(.*)@ will capture everything before the @, i.e. the first part of an email address
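In JavaScript, captured groups show up as extra elements of the array returned by `match()` and as `$1`, `$2`, ... in the replacement string of `replace()`. The sketch below uses made-up addresses and names; note that because `.*` is greedy, the slide's pattern captures only the local part when the string starts with the address.

```javascript
// Element 0 is the whole match, element 1 the first captured group.
const m = "jane.doe@example.org".match(/\b(.*)@/);
const whole = m[0];       // "jane.doe@"
const localPart = m[1];   // "jane.doe"

// With leading text, the greedy .* captures that text as well.
const tooMuch = "Contact: jane.doe@example.org".match(/\b(.*)@/)[1];

// Captured groups can be reused in a replacement via $1, $2, ...
const swapped = "Doe, Jane".replace(/^(\w+), (\w+)$/, "$2 $1");

console.log(whole, localPart, tooMuch, swapped);
```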

  10. Sample RegExp • hex digit: • [0-9a-fA-F] • identifier: • [a-zA-Z_][a-zA-Z_0-9]* • email address: • \b[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,4}\b
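The sample patterns can be exercised in JavaScript as sketched below. The ^ and $ anchors on the first two patterns are an addition (so that the whole input must match), and the test inputs are invented for illustration.

```javascript
const hexDigit = /^[0-9a-fA-F]$/;
const identifier = /^[a-zA-Z_][a-zA-Z_0-9]*$/;
const email = /\b[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,4}\b/;

const sampleResults = {
  hexOk: hexDigit.test("c"),                 // true
  hexBad: hexDigit.test("g"),                // false: g is not a hex digit
  idOk: identifier.test("_tmp1"),            // true
  idBad: identifier.test("1abc"),            // false: must not start with a digit
  emailOk: email.test("anka@example.com"),   // true
  emailNoDot: email.test("anka@example"),    // false: no ".tld" part
  emailLongTld: email.test("a@b.museum"),    // false: {2,4} rejects longer endings
};

console.log(sampleResults);
```

The last case shows a limitation of the email pattern as written: the `{2,4}` bound rejects addresses whose final part is longer than four letters.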