Regular expression

What are Regular Expressions

  • Regular expressions are a syntax to match text.

  • They date back to mathematical notation made in the 1950s.

  • They became embedded in Unix systems through tools like ed and grep.


What are RE

  • Perl in particular promoted the use of very complex regular expressions.

  • They are now available in all popular programming languages.

  • They allow much more complex matching than strpos()


Why use RE

  • You can use RE to enforce rules on formats like phone numbers, email addresses or URLs.

  • You can use them to find key data within logs, configuration files or webpages.


Why use RE

  • They can quickly make complex replacements, such as finding every email address in a page and rewriting it as address [AT] site [dot] com.

  • You can make your code really hard to understand


Data Manipulation & Regex


What..?

  • Often in PHP we have to get data from files, or maybe through forms from a user.

  • Before acting on the data, we:

    • Need to put it in the format we require.

    • Check that the data is actually valid.


What..?

  • To achieve this, we need to learn about PHP functions that check values, and manipulate data.

    • Input PHP functions.

    • Regular Expressions (Regex).


PHP Functions

  • There are a lot of useful PHP functions to manipulate data.

  • We’re not going to look at them all – we’re not even going to look at most of them…

    http://php.net/manual/en/ref.strings.php

    http://php.net/manual/en/ref.ctype.php

    http://php.net/manual/en/ref.datetime.php


Useful Functions: splitting

  • Often we need to split data into multiple pieces based on a particular character.

  • Use explode().

    // expand user supplied date..
    $input = '1/12/2007';
    $bits = explode('/', $input);
    // array(0=>'1', 1=>'12', 2=>'2007')


Useful functions: trimming

  • Removing excess whitespace..

  • Use trim()

    // a user supplied name..
    $input = ' Rob ';
    $name = trim($input);
    // 'Rob'


Useful functions: string replace

  • To replace all occurrences of a string in another string use str_replace()

    // allow the user to use a number of date separators
    $input = '01.12-2007';
    $clean = str_replace(array('.', '-'), '/', $input);
    // 01/12/2007


Useful functions: cAsE

  • To make a string all uppercase use strtoupper().

  • To make a string all lowercase use strtolower().

  • To make just the first letter upper case use ucfirst().

  • To make the first letter of each word in a string uppercase use ucwords().
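
As a minimal sketch of the four case functions side by side (the input value is a made-up example, not from the slides):

```php
<?php
// Hypothetical input: a name typed in inconsistent case.
$name = 'rOB smith';

$upper = strtoupper($name);   // 'ROB SMITH'
$lower = strtolower($name);   // 'rob smith'
$first = ucfirst($lower);     // 'Rob smith'
$words = ucwords($lower);     // 'Rob Smith'

echo $upper, "\n", $lower, "\n", $first, "\n", $words, "\n";
```

Note that ucfirst() and ucwords() only raise letters, they never lower the rest, which is why the sketch lowercases the input first.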


Useful functions: html sanitise

  • To make a string “safe” to output as html use htmlentities()

    // user entered comment
    $input = 'The <a> tag & ..';
    $clean = htmlentities($input);
    // 'The &lt;a&gt; tag &amp; ..'


More complicated checks..

  • It is usually possible to use a combination of various built-in PHP functions to achieve what you want.

  • However, sometimes things get more complicated. When this happens, we turn to Regular Expressions.


Regular Expressions

  • Regular expressions are a concise (but obtuse!) way of pattern matching within a string.

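  • PHP has offered two flavours of regular expression: Perl-compatible (PCRE, the preg_ functions) and POSIX (the ereg functions, deprecated in PHP 5.3 and removed in PHP 7). We will just look at the faster and more powerful Perl-compatible version.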


Some definitions

  • Data: the actual data that we are going to work upon (e.g. an email address string):

    [email protected]

  • Regular expression: the definition of the string pattern:

    '/^[a-z\d\._-]+@([a-z\d-]+\.)+[a-z]{2,6}$/i'

  • Functions: PHP functions to do something with the data and the regular expression:

    preg_match(), preg_replace()


Regular Expressions

'/^[a-z\d\._-]+@([a-z\d-]+\.)+[a-z]{2,6}$/i'

  • Are complicated!

  • They are a definition of a pattern. Usually used to validate or extract data from a string.


Regex: Delimiters

  • The regex definition is always bracketed by delimiters, usually a ‘/’:

    $regex = '/php/';
    Matches: 'php', 'I love php'
    Doesn't match: 'PHP', 'I love ph'


Regex: First impressions

  • Note how the regular expression matches anywhere in the string: the whole regular expression has to be matched, but the whole data string doesn’t have to be used.

  • It is a case-sensitive comparison.


Regex: Case insensitive

  • Extra switches can be added after the last delimiter. The only switch we will use is the ‘i’ switch to make comparison case insensitive:

    $regex = '/php/i';
    Matches: 'php', 'I love pHp', 'PHP'
    Doesn't match: 'I love ph'
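
The two patterns above can be checked with preg_match(), which is covered in more detail later. This is only a sketch; the $results array is an illustrative name, not part of the slides:

```php
<?php
$subjects = ['php', 'I love pHp', 'PHP', 'I love ph'];
$results  = [];

foreach ($subjects as $subject) {
    // preg_match() returns 1 when the pattern is found anywhere in the subject.
    $results[$subject] = [
        'sensitive'   => preg_match('/php/', $subject) === 1,
        'insensitive' => preg_match('/php/i', $subject) === 1,
    ];
}

foreach ($results as $subject => $r) {
    printf("%-12s /php/: %-3s /php/i: %s\n",
        $subject,
        $r['sensitive'] ? 'yes' : 'no',
        $r['insensitive'] ? 'yes' : 'no');
}
```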


Regex: Character groups

  • A regex is matched character-by-character. You can specify multiple options for a character using square brackets:

    $regex = '/p[hu]p/';
    Matches: 'php', 'pup'
    Doesn't match: 'phup', 'pop', 'PHP'


Regex: Character groups

  • You can also specify a digit or alphabetical range in square brackets:

    $regex = '/p[a-z1-3]p/';
    Matches: 'php', 'pup', 'pap', 'pop', 'p3p'
    Doesn't match: 'PHP', 'p5p'
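
A quick way to convince yourself of the two character-group slides is to run both patterns over the same list of strings. A minimal sketch ($results is just an illustrative name):

```php
<?php
$subjects = ['php', 'pup', 'pap', 'pop', 'p3p', 'p5p', 'phup', 'PHP'];
$patterns = ['/p[hu]p/', '/p[a-z1-3]p/'];
$results  = [];

foreach ($patterns as $regex) {
    $hits = [];
    foreach ($subjects as $subject) {
        if (preg_match($regex, $subject) === 1) {
            $hits[] = $subject;
        }
    }
    $results[$regex] = $hits;
    echo $regex, ' matches: ', implode(', ', $hits), "\n";
}
```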


Regex: Predefined Classes

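  • There are a number of pre-defined classes available, including:

    • \d matches any digit (0-9).

    • \w matches any "word" character (a letter, a digit or an underscore).

    • \s matches any whitespace character (space, tab, line break).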


Regex: Predefined classes

$regex = '/p\dp/';
Matches: 'p3p', 'p7p'
Doesn't match: 'p10p', 'P7p'

$regex = '/p\wp/';
Matches: 'p3p', 'pHp', 'pop'
Doesn't match: 'phhp'
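
The same kind of check works for the pre-defined classes. This sketch adds \s (used later in these slides) and a few extra test strings that are not from the original deck:

```php
<?php
$subjects = ['p3p', 'p7p', 'pHp', 'pop', 'p_p', 'p p', 'p10p', 'P7p', 'phhp'];
$patterns = ['/p\dp/', '/p\wp/', '/p\sp/'];
$results  = [];

foreach ($patterns as $regex) {
    $hits = [];
    foreach ($subjects as $subject) {
        if (preg_match($regex, $subject) === 1) {
            $hits[] = $subject;
        }
    }
    $results[$regex] = $hits;
    echo $regex, ' matches: ', implode(', ', $hits), "\n";
}
```

Note that \w accepts the underscore in 'p_p' as well as letters and digits.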


Regex: the Dot

  • The special dot character matches anything apart from line breaks:

    $regex = '/p.p/';
    Matches: 'php', 'p&p', 'p(p', 'p3p', 'p$p'
    Doesn't match: 'PHP', 'phhp'


Regex: Repetition

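  • There are a number of special characters that indicate the preceding character or character group may be repeated:

    • ? means zero or one time (optional).

    • * means zero or more times.

    • + means one or more times.

    • {n,m} means between n and m times.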


Regex: Repetition

$regex = '/ph?p/';
Matches: 'pp', 'php'
Doesn't match: 'phhp', 'pap'

$regex = '/ph*p/';
Matches: 'pp', 'php', 'phhhhp'
Doesn't match: 'pop', 'phhohp'


Regex: Repetition

$regex = '/ph+p/';
Matches: 'php', 'phhhhp'
Doesn't match: 'pp', 'phyhp'

$regex = '/ph{1,3}p/';
Matches: 'php', 'phhhp'
Doesn't match: 'pp', 'phhhhp'
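
To compare the four repetition operators on one set of strings, something like the following sketch can help ($results is an illustrative name):

```php
<?php
$subjects = ['pp', 'php', 'phhp', 'phhhp', 'phhhhp', 'pap'];
$patterns = ['/ph?p/', '/ph*p/', '/ph+p/', '/ph{1,3}p/'];
$results  = [];

foreach ($patterns as $regex) {
    $hits = [];
    foreach ($subjects as $subject) {
        if (preg_match($regex, $subject) === 1) {
            $hits[] = $subject;
        }
    }
    $results[$regex] = $hits;
    echo $regex, ' matches: ', implode(', ', $hits), "\n";
}
```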


Regex: Bracketed repetition

  • The repetition operators can be used on bracketed expressions to repeat multiple characters:

    $regex = '/(php)+/';
    Matches: 'php', 'phpphp', 'phpphpphp'
    Doesn't match: 'ph', 'popph'
    Will it match 'phpph'?
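
One way to answer the question is simply to try it. The sketch below suggests that 'phpph' does match, because the pattern only has to be found somewhere in the string; the anchored variant at the end uses ^ and $, which are introduced on the next slides:

```php
<?php
$regex = '/(php)+/';

// Unanchored: the leading 'php' is enough for a match.
$found = preg_match($regex, 'phpph', $m);
echo $found, ' ', $m[0], "\n";      // the matched part is 'php'

// The repetition takes as many whole copies of 'php' as it can.
preg_match($regex, 'phpphpph', $m2);
echo $m2[0], "\n";                  // 'phpphp'

// Anchored: the whole string must consist of repeats of 'php'.
$wholeBad  = preg_match('/^(php)+$/', 'phpph');
$wholeGood = preg_match('/^(php)+$/', 'phpphp');
var_dump($wholeBad, $wholeGood);
```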


Regex: Anchors

  • So far, we have matched anywhere within a string (either the entire data string or part of it). We can change this behaviour by using anchors:

    • ^ matches the start of the string.

    • $ matches the end of the string.


Regex: Anchors

  • With NO anchors:

    $regex = '/php/';
    Matches: 'php', 'php is great', 'in php we..'
    Doesn't match: 'pop'


Regex: Anchors

  • With start and end anchors:

    $regex = '/^php$/';
    Matches: 'php'
    Doesn't match: 'php is great', 'in php we..', 'pop'
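
The effect of each anchor can be seen by trying all four combinations. The last two lines go slightly beyond the slides: in PCRE, $ also matches just before a trailing line break, and the D modifier (not covered in this deck) turns that off, which can matter when validating input:

```php
<?php
$subjects = ['php', 'php is great', 'in php we..', 'pop'];
$patterns = ['/php/', '/^php/', '/php$/', '/^php$/'];
$results  = [];

foreach ($patterns as $regex) {
    $hits = [];
    foreach ($subjects as $subject) {
        if (preg_match($regex, $subject) === 1) {
            $hits[] = $subject;
        }
    }
    $results[$regex] = $hits;
    echo $regex, ' matches: ', implode(', ', $hits), "\n";
}

// A trailing "\n" still satisfies $ unless the D modifier is used.
$loose  = preg_match('/^php$/', "php\n");
$strict = preg_match('/^php$/D', "php\n");
var_dump($loose, $strict);
```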


Regex: Escape special characters

  • We have seen that characters such as ?,.,$,*,+ have a special meaning. If we want to actually use them as a literal, we need to escape them with a backslash.

    $regex = '/p\.p/';
    Matches: 'p.p'
    Doesn't match: 'php', 'p1p'
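
When the text to be matched literally comes from a variable, escaping by hand is error-prone. PHP's preg_quote() does the escaping for you; a small sketch, where $literal is a made-up example value:

```php
<?php
// Hypothetical text that should be searched for literally.
$literal = 'p.p';

// The second argument names the delimiter so that it gets escaped too.
$regex = '/' . preg_quote($literal, '/') . '/';
echo $regex, "\n";                      // /p\.p/

$dot    = preg_match($regex, 'p.p');    // the dot is now a literal dot
$letter = preg_match($regex, 'php');
var_dump($dot, $letter);
```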


PHP regex functions

  • So we now know how to define regular expressions. Further explanation can be found at:

    http://www.regular-expressions.info/

  • We still need to know how to use them!


Boolean Matching

  • We can use the function preg_match() to test whether a string matches or not.

    // match an email ($emailRegex is the email pattern defined in the example later on)
    $input = '[email protected]';
    if (preg_match($emailRegex, $input)) {
        echo 'Is a valid email';
    } else {
        echo 'NOT a valid email';
    }
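
preg_match() returns 1 for a match, 0 for no match and false if something went wrong (for example, a malformed pattern), so a plain if treats an error the same as "no match". One possible way to keep the cases apart is sketched below; isMatch() is a hypothetical helper, not a PHP built-in:

```php
<?php
// Hypothetical wrapper that separates "no match" from "broken pattern".
function isMatch(string $regex, string $subject): bool
{
    // @ silences the warning PHP raises for a pattern that will not compile.
    $result = @preg_match($regex, $subject);
    if ($result === false) {
        throw new InvalidArgumentException('Invalid regular expression: ' . $regex);
    }
    return $result === 1;
}

var_dump(isMatch('/php/', 'I love php'));   // bool(true)
var_dump(isMatch('/php/', 'pop'));          // bool(false)

try {
    isMatch('/(php/', 'php');               // unbalanced bracket
} catch (InvalidArgumentException $e) {
    echo $e->getMessage(), "\n";
}
```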


Pattern replacement

  • We can use the function preg_replace() to replace any matching strings.

    // strip any multiple spaces
    $input = 'Some   comment    string';
    $regex = '/\s\s+/';
    $clean = preg_replace($regex, ' ', $input);
    // 'Some comment string'


Sub-references

  • We’re not quite finished: we need to master the concept of sub-references.

  • Any bracketed expression in a regular expression is regarded as a sub-reference. You use it to extract the bits of data you want from a regular expression.

  • Easiest with an example..


Sub-reference example:

  • I start with a date string in a particular format:

    $str = '10, April 2007';

  • The regex that matches this is:

    $regex = '/\d+,\s\w+\s\d+/';

  • If I want to extract the bits of data I bracket the relevant bits:

    $regex = '/(\d+),\s(\w+)\s(\d+)/';


Extracting data..

  • I then pass in an extra argument to the function preg_match():

    $str = 'The date is 10, April 2007';
    $regex = '/(\d+),\s(\w+)\s(\d+)/';
    preg_match($regex, $str, $matches);
    // $matches[0] = '10, April 2007'
    // $matches[1] = '10'
    // $matches[2] = 'April'
    // $matches[3] = '2007'
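
In real code it is worth checking that the match succeeded before reading $matches. A minimal sketch that wraps the extraction above; extractDate() and its array keys are hypothetical names chosen for illustration:

```php
<?php
// Hypothetical helper: pull day, month and year out of free text.
function extractDate(string $text): ?array
{
    $regex = '/(\d+),\s(\w+)\s(\d+)/';
    if (preg_match($regex, $text, $matches) !== 1) {
        return null;                    // no date found
    }
    // $matches[0] is the whole match; 1 to 3 are the bracketed parts.
    return [
        'day'   => (int) $matches[1],
        'month' => $matches[2],
        'year'  => (int) $matches[3],
    ];
}

print_r(extractDate('The date is 10, April 2007'));
var_dump(extractDate('no date here'));  // NULL
```

The captured values always arrive as strings, hence the (int) casts for the day and year.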


Back-references

  • This technique can also be used to reference the original text during replacements with $1,$2,etc. in the replacement string:

    $str = 'The date is 10, April 2007';
    $regex = '/(\d+),\s(\w+)\s(\d+)/';
    $str = preg_replace($regex, '$1-$2-$3', $str);
    // $str = 'The date is 10-April-2007'
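
Back-references do not have to be used in their original order. The sketch below reorders the date, and also shows the ${1} form, which is useful when a reference is followed directly by a digit:

```php
<?php
$str   = 'The date is 10, April 2007';
$regex = '/(\d+),\s(\w+)\s(\d+)/';

// Swap day and month: $2 is the month, $1 the day, $3 the year.
$us = preg_replace($regex, '$2 $1, $3', $str);
echo $us, "\n";       // The date is April 10, 2007

// ${1}0 means "group 1, then a literal 0"; $10 would be read as group 10.
$padded = preg_replace('/(\d+),/', '${1}0,', $str);
echo $padded, "\n";   // The date is 100, April 2007
```

The replacement strings are single-quoted so that PHP itself does not try to interpret $1 or ${1} as variables.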


Phew!

  • We now know how to define regular expressions.

  • We now also know how to use them: matching, replacement, data extraction.


Syntax tricks

  • The entire regular expression is a sequence of characters between two delimiters, usually forward slashes (/).

  • abc - most characters are normal character matches. This looks for the exact character sequence a, b and then c.

  • . - a period will match any character (except a newline, unless the s modifier is added after the closing delimiter).

  • [abc] - square brackets will match any one of the characters inside. Here: a, b or c.


Syntax tricks 2

  • ? - marks the previous item as optional, so a? means there might be an a.

  • (abc)* - parentheses group patterns and the asterisk marks zero or more repetitions of the previous item, here the whole group. So this would match an empty string or abcabcabcabc.

  • \.+ - the backslash is an all-purpose escape character and the + marks one or more of the previous character. So this would match ......


More syntax tricks

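  • [0-4] - match any single digit from 0 to 4.

  • [^0-4] - match any single character that is not a digit from 0 to 4.

  • \sword\s - match word where there is white space before and after.

  • \bword\b - \b marks a word boundary: a position between a word character (letter, digit or underscore) and anything else, such as white space, punctuation, a new line or the start or end of the string.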
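
The difference between \s and \b is easiest to see on a string where the word sits next to punctuation or at the very end. A small sketch with a made-up sentence:

```php
<?php
$text = 'A word, then another word';

// \sword\s needs a whitespace character on both sides, so neither
// "word," nor the final "word" qualifies.
$spaced = preg_match('/\sword\s/', $text);

// \bword\b accepts the comma and the end of the string as boundaries.
$bounded = preg_match_all('/\bword\b/', $text);

// There is no boundary in the middle of a longer word.
$inside = preg_match('/\bword\b/', 'swordfish');

var_dump($spaced, $bounded, $inside);
```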


More syntax tricks

  • \d{3,12} - \d matches any digit ([0-9]) while the braces mark the min and max count of the previous character. In this case, 3 to 12 digits.

  • [a-z]{8,} - at least 8 lowercase letters in a row, with no upper limit.


Matching Text

  • Simple check: preg_match('/^[a-z0-9]+@([a-z0-9]+\.)*[a-z0-9]+$/i', $email_address) > 0

  • Finding: preg_match('/\bcolou?r:\s+([a-zA-Z]+)\b/', $text, $matches); echo $matches[1];

  • Find all: preg_match_all('/<([^>]+)>/', $html, $tags); echo $tags[1][1]; // the pattern has one bracketed group, so its captures are in $tags[1]; this prints the contents of the second tag found
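
The layout of the array filled in by preg_match_all() is easy to get wrong, so here is a minimal sketch on a made-up snippet of markup:

```php
<?php
// Hypothetical markup to search through.
$html = '<p>Some <b>bold</b> text</p>';

$count = preg_match_all('/<([^>]+)>/', $html, $tags);

// Default ordering (PREG_PATTERN_ORDER):
//   $tags[0] holds every full match, $tags[1] every capture of group 1.
print_r($tags[0]);        // <p>, <b>, </b>, </p>
print_r($tags[1]);        // p, b, /b, /p
echo $tags[1][1], "\n";   // b
```

The return value is the number of full matches, which is 4 for this input.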


Matching Lines

  • This is more for looking through files but could be for any array of text.

  • $new_lines = preg_grep('/Jan[a-z]*[\s\/\-](20)?07/', $old_lines);

  • To get the lines that do not match, add a third parameter of PREG_GREP_INVERT rather than complicating your regular expression into something like /^[^\/]|(\/[^p])|(\/p[^r]) etc...
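
A minimal sketch of both forms of preg_grep(), using invented log lines:

```php
<?php
// Hypothetical lines, e.g. read from a file with file().
$old_lines = [
    '12 Jan 07 backup ok',
    '03 Feb 2007 backup failed',
    'January-2007 report',
    '15/Mar/07 backup ok',
];
$regex = '/Jan[a-z]*[\s\/\-](20)?07/';

$january = preg_grep($regex, $old_lines);                    // matching lines
$others  = preg_grep($regex, $old_lines, PREG_GREP_INVERT);  // everything else

print_r($january);
print_r($others);
```

preg_grep() keeps the original array keys, so $january has keys 0 and 2 here; wrap the call in array_values() if you need a re-indexed list.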


Replacing text

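// Capture the text around the @ and the first dot, then rebuild the address.
// (A single pattern needs a single replacement string, not an array.)
$post = preg_replace(
    '/\b([^@\s]+)@([a-zA-Z\d_-]+)\.([a-zA-Z\d_.-]+)\b/',
    '$1 [AT] $2 [dot] $3',
    $post);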


Splitting text

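  • $date_parts = preg_split('/[-.,\/\\\\\s]+/', $date_string); (in a single-quoted PHP string, \\\\ gives the regex a literal backslash to split on and \s stays as the whitespace class)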
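
Backslashes are the tricky part of this pattern, because PHP processes the string before the regex engine sees it. A sketch with a few made-up date strings ($parts is an illustrative name):

```php
<?php
// Single-quoted PHP string: \\\\ reaches the regex engine as \\ (a literal
// backslash) and \s is passed through as the whitespace class.
$regex = '/[-.,\/\\\\\s]+/';

$parts = [];
foreach (['01.12-2007', '1/12/2007', '1 12, 2007', '1\\12\\2007'] as $date_string) {
    $parts[$date_string] = preg_split($regex, $date_string);
    echo $date_string, ' => ', implode(' | ', $parts[$date_string]), "\n";
}
```

The + after the class means a run of separators, such as the ", " in the third string, counts as a single split point.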


So.. An example

  • Let's define a regex that matches an email:

    $emailRegex = '/^[a-z\d\._-]+@([a-z\d-]+\.)+[a-z]{2,6}$/i';

    Matches: '[email protected]', '[email protected]', '[email protected]'

    Doesn't match: '[email protected]@ple.com', 'not.an.email.com'


So.. An example

  • /^ - Starting delimiter, and start-of-string anchor.

  • [a-z\d\._-]+ - User name: allow any length of letters, numbers, dots, underscores or dashes.

  • @ - The @ separator.

  • ([a-z\d-]+\.)+ - Domain (letters, digits or dash only). Repetition to include subdomains.

  • [a-z]{2,6} - com, uk, info, etc.

  • $/i - End anchor, end delimiter, case insensitive.
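
To see each part of the pattern at work, it can be run against a handful of addresses. The addresses below are invented for this sketch and are not the ones from the slide:

```php
<?php
$emailRegex = '/^[a-z\d\._-]+@([a-z\d-]+\.)+[a-z]{2,6}$/i';

// Hypothetical addresses mapped to the result we expect from the pattern.
$candidates = [
    'rob@example.com'              => true,
    'Rob.Smith@mail.example.co.uk' => true,   // subdomains, mixed case
    'rob@exam@ple.com'             => false,  // two @ separators
    'not.an.email.com'             => false,  // no @ separator
    'rob@example.photography'      => false,  // ending longer than 6 letters
];

$outcomes = [];
foreach ($candidates as $address => $expected) {
    $outcomes[$address] = preg_match($emailRegex, $address) === 1;
    printf("%-30s %s\n", $address, $outcomes[$address] ? 'match' : 'no match');
}
```

The last case shows a limitation of the {2,6} part: any address whose final label is longer than six letters is rejected, so the pattern is a teaching example rather than a complete validator.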


Tips

  • Comment what your regular expression is doing.

  • Test your regular expression for speed. Some can cause a noticeable slowdown.

  • There are plenty of simple uses like /Width: (\d+)/

  • Watch out for greedy expressions. E.g. /(<(.+)>)/ will not pull out "b" and "/b" from "<b>test</b>" but will instead pull "b>test</b". An easy way to change this behaviour is to make the quantifier lazy by adding a ?, like this: /(<(.+?)>)/
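
The greedy tip can be checked directly, and the "comment your regular expression" tip can be followed with PCRE's x modifier, which is not covered in these slides: it makes the engine ignore whitespace in the pattern and treat # to the end of the line as a comment. A sketch of both, under those assumptions:

```php
<?php
$html = '<b>test</b>';

preg_match_all('/<(.+)>/', $html, $greedy);      // .+ grabs as much as it can
preg_match_all('/<(.+?)>/', $html, $lazy);       // .+? stops at the first >
preg_match_all('/<([^>]+)>/', $html, $negated);  // alternative: exclude > itself

print_r($greedy[1]);    // b>test</b
print_r($lazy[1]);      // b, /b
print_r($negated[1]);   // b, /b

// The date pattern from earlier, commented with the x modifier.
// The comments must not contain the delimiter (/) or a single quote.
$dateRegex = '/
    (\d+) ,        # day, then a comma
    \s (\w+)       # whitespace, then the month name
    \s (\d+)       # whitespace, then the year
/x';
preg_match($dateRegex, 'The date is 10, April 2007', $parts);
print_r($parts);
```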


References

  • 15 php regex uses:

    • http://www.catswhocode.com/blog/15-php-regular-expressions-for-web-developers

  • Regex cheat sheet:

    • http://www.addedbytes.com/cheat-sheets/regular-expressions-cheat-sheet/

  • PHP regex email verification:

    • http://fightingforalostcause.net/misc/2006/compare-email-regex.php

  • http://en.wikipedia.org/wiki/Regular_expressions

  • http://php.net/manual/en/ref.pcre.php

  • Geoffrey Dunn ([email protected])

  • PHP.net – PHP workshop

