Perl: Practical Extraction and Reporting Language

Presentation Transcript

  1. Perl: Practical Extraction and Reporting Language. An Introduction by Shwen Ho

  2. What is Perl good for?
     • Designed for text manipulation
     • Very fast to implement
     • Allows many different ways to solve the same problem
     • Runs on many different platforms: Windows, Mac, Unix, Linux, DOS, etc.

  3. Running Perl
     • Perl scripts do not need to be compiled.
     • They are interpreted at the point of execution.
     • They do not require a particular file extension, although the .pl file extension is commonly used.

  4. Running Perl
     • Execute it from the command line:
         command line> perl script.pl arg1 arg2 ...
     • Or, on Unix/Linux, add the line "#!/usr/bin/perl" to the start of the script.
     • Remember to set the correct file execution permissions before running it:
         chmod +x perlscript.pl
         ./perlscript.pl
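As a minimal sketch of a script that can be run either way, the following reads its command-line arguments from Perl's built-in @ARGV array. The helper name describe_args is made up for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Build a one-line summary of the arguments the script received.
# (describe_args is a hypothetical helper, not a Perl built-in.)
sub describe_args {
    my @args = @_;
    return "Got " . scalar(@args) . " argument(s): @args";
}

# @ARGV is Perl's built-in array of command-line arguments (arg1, arg2, ...).
print describe_args(@ARGV), "\n";
```

Saved as perlscript.pl and run as "perl perlscript.pl arg1 arg2", it should report two arguments.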

  5. Beginning Perl
     • Every statement ends with a semicolon ";".
     • Comments start with a hash "#" and run to the end of the line.
     • Variables are assigned a value using the character "=".
     • Variables are not statically typed, i.e., you do not have to declare what kind of data you want to hold in them.
     • Variables come into existence the first time you assign to them, and this can be anywhere in the program.

  6. Scalar Variables
     • Contains a single piece of data.
     • The '$' character shows that a variable is a scalar.
     • Scalar variables can store either a number or a string.
     • A string is a chunk of text surrounded by quotes.
         $name = "paul";
         $year = 1980;
         print "$name is born in $year";
       output: paul is born in 1980

  7. Array Variables (Lists)
     • An ordered list of data, separated by commas.
     • The '@' character shows that a variable is an array.
       Array of numbers:
         @year_of_birth = (1980, 1975, 1999);
       Array of strings:
         @name = ("Paul", "Jake", "Tom");
       Array of both strings and numbers:
         @paul_address = (14,"Cleveland St","NSW",2030);

  8. Retrieving data from Arrays
     • Printing arrays
         @name = ("Paul", "Jake", "Tom");
         print "@name";
     • Accessing individual elements in an array
         @name = ("Paul", "Jake", "Tom");
         print "$name[1]";
     • What has changed? @name became $name.
     • To access individual elements, use the syntax $array[index].
     • Why did $name[1] print the second element?
     • Perl, like Java and C, uses index 0 to represent the first element.

  9. Interesting things you can do with Arrays
         @name = ("Paul", "Jake", "Tom");
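The transcript does not preserve which array operations this slide listed. As a rough sketch only, here are a few common ones (counting, negative indexing, sort, reverse), which may or may not be the ones the slide had in mind. The sub name summarize is hypothetical.

```perl
use strict;
use warnings;

# Summarize a list using a few common array operations.
# (summarize is a hypothetical helper written for this example.)
sub summarize {
    my @list      = @_;
    my $count     = scalar(@list);    # number of elements
    my $last      = $list[-1];        # a negative index counts from the end
    my @sorted    = sort @list;       # default sort is by string order
    my @backwards = reverse @list;    # same elements, opposite order
    return "$count items; last: $last; sorted: @sorted; reversed: @backwards";
}

my @name = ("Paul", "Jake", "Tom");
print summarize(@name), "\n";
# prints: 3 items; last: Tom; sorted: Jake Paul Tom; reversed: Tom Jake Paul
```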

  10. Basic Arithmetic Operators
         +          addition
         -          subtraction
         *          multiplication
         /          division
         ++         adding one to the variable
         --         subtracting one from the variable
         $a += 2    incrementing the variable by 2
         $b *= 3    tripling the value of the variable
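A small sketch of the compound operators in action. The sub name demo_arithmetic and the starting values are chosen only for illustration.

```perl
use strict;
use warnings;

# Apply each operator once and return the results.
# The names $a and $b are avoided here because Perl's sort uses them.
sub demo_arithmetic {
    my ($first, $second) = @_;
    $first  += 2;    # increment by 2
    $second *= 3;    # triple the value
    $first++;        # add one
    $second--;       # subtract one
    return ($first, $second);
}

my ($x, $y) = demo_arithmetic(5, 4);
print "$x $y\n";     # 5 + 2 + 1 = 8 and 4 * 3 - 1 = 11, so this prints "8 11"
```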

  11. Relational Operators
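The body of this slide is missing from the transcript. For orientation, Perl has two families of relational operators: ==, !=, <, >, <=, >= compare values as numbers, while eq, ne, lt, gt, le, ge compare them as strings. The sketch below (with a made-up helper, compare_both_ways) shows why the distinction matters.

```perl
use strict;
use warnings;

# Compare two values numerically and as strings.
# (compare_both_ways is a hypothetical helper for this example.)
sub compare_both_ways {
    my ($left, $right) = @_;
    my $numeric = ($left == $right) ? "equal" : "different";
    my $string  = ($left eq $right) ? "equal" : "different";
    return "numeric: $numeric, string: $string";
}

print compare_both_ways("1.0", "1"), "\n";   # numeric: equal, string: different
print compare_both_ways(7, 7), "\n";         # numeric: equal, string: equal

# Ordering differs too: 10 is not less than 9, but "10" sorts before "9" as text.
print "string order differs from numeric order\n" if !(10 < 9) && ("10" lt "9");
```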

  12. Control Operators - If
         if (expression 1) {
             ...
         } elsif (expression 2) {
             ...
         } else {
             ...
         }
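A concrete instance of the if / elsif / else skeleton, as a sketch. The sub name classify_year and the cut-off year are arbitrary choices for the example.

```perl
use strict;
use warnings;

# Return a short description of where a year falls relative to 1980.
# (classify_year is a hypothetical helper for this example.)
sub classify_year {
    my ($year) = @_;
    if ($year < 1980) {
        return "before 1980";
    } elsif ($year == 1980) {
        return "in 1980";
    } else {
        return "after 1980";
    }
}

print classify_year(1975), "\n";   # before 1980
print classify_year(1980), "\n";   # in 1980
print classify_year(1999), "\n";   # after 1980
```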

  13. Iteration Structures
      • while (CONDITION) { BLOCK }
      • until (CONDITION) { BLOCK }
      • do { BLOCK } while (CONDITION)
      • for (INITIALIZATION ; CONDITION ; Re-INITIALIZATION) { BLOCK }
      • for VAR (LIST) { BLOCK }
      • foreach VAR (LIST) { BLOCK }

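  14. Iteration Structures
         $i = 1;
         while ($i <= 5) {
             print "$i\n";
             $i++;
         }

         for ($x = 1; $x <= 5; $x++) {
             print "$x\n";
         }

         # Parentheses build a list; square brackets would create a
         # single array reference instead of five elements.
         @array = (1,2,3,4,5);
         foreach $number (@array) {
             print "$number\n";
         }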

  15. String Operations
      • Strings can be concatenated with the dot operator, or by interpolating them into a double-quoted string.
          $lastname = "Harrison";
          $firstname = "Paul";
          $name = $firstname . $lastname;
          $name = "$firstname$lastname";
      • String comparison is done with the string relational operators, such as eq.
          $string1 = "hello";
          $string2 = "hello";
          if ($string1 eq $string2) {
              print "they are equal";
          } else {
              print "they are different";
          }

  16. String comparison using patterns
      • The =~ operator returns true if the pattern between the / delimiters is found.
          $string1 = "HELLO";
          $string2 = "Hi there";
          # test if the string contains the pattern EL
          if ($string1 =~ /EL/) {
              print "This string contains the pattern";
          } else {
              print "No pattern found";
          }

  17. Functions in Perl
      • No strict variable type restriction during a function call.
      • Java example, for contrast: variable_type function (variable_type variable_name)
          public int function1 (int var1, char var2) { … }
      • Perl provides lots of useful built-in functions to get you started.
        • chop - remove the last character of a string
        • chomp - remove the trailing newline from the end of a string, if there is one
        • push - append one or more elements to the end of an array
        • pop - remove the last element of an array and return it
        • shift - remove the first element of an array and return it
        • s/// - the substitution operator, which replaces a pattern with a string
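A short sketch exercising each of these built-ins once. Note that chop removes the last character and chomp removes a trailing newline. The wrapper sub demo_builtins and the sample values are invented for the example.

```perl
use strict;
use warnings;

# Run each built-in once and return the results for inspection.
# (demo_builtins is a hypothetical wrapper, not part of Perl.)
sub demo_builtins {
    my $line = "hello\n";
    chomp($line);                 # drops the trailing newline: "hello"
    chop($line);                  # drops the last character:   "hell"

    my @queue = ("Paul", "Jake");
    push(@queue, "Tom");          # ("Paul", "Jake", "Tom")
    my $last  = pop(@queue);      # "Tom";  @queue is now ("Paul", "Jake")
    my $first = shift(@queue);    # "Paul"; @queue is now ("Jake")

    my $greeting = "Hi there";
    $greeting =~ s/Hi/Hello/;     # "Hello there"

    return ($line, $last, $first, "@queue", $greeting);
}

print join(" | ", demo_builtins()), "\n";
# prints: hell | Tom | Paul | Jake | Hello there
```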

  18. Functions in Perl
      • The "split" function breaks a given string into individual segments given a delimiter.
      • split(/pattern/, string) returns a list
          @output = split (/\s/, $string);  # breaks the sentence into words
          @output = split (//, $string);    # breaks the sentence into single characters
          @output = split (/,/, $string);   # breaks the sentence into chunks separated by a comma
      • join(delimiter, list) returns a string; the delimiter is a plain string, not a pattern
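To see split and join working together, here is a small sketch that splits a sentence into words and joins them back in reverse order. The sub name reverse_words is made up for the example.

```perl
use strict;
use warnings;

# Split a sentence on runs of whitespace, then join the words in reverse order.
# (reverse_words is a hypothetical helper for this example.)
sub reverse_words {
    my ($sentence) = @_;
    my @words = split(/\s+/, $sentence);
    return join(" ", reverse @words);
}

print reverse_words("Perl is fun"), "\n";      # fun is Perl

# join takes a plain string as its delimiter, not a /pattern/.
my @fields = split(/,/, "a,b,c");
print join("-", @fields), "\n";                # a-b-c
```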

  19. Functions in Perl
      A simple Perl function:
          sub sayHello {
              print "Hello!!\n";
          }
          sayHello();

  20. Executing functions in Perl
      • Function arguments are stored automatically in a temporary array called @_ .
          sub sayHelloto {
              @name = @_;     # copy of the arguments
              $count = @_;    # an array in scalar context gives its length
              foreach $person (@name) {
                  print "Hello $person\n";
              }
              return $count;
          }
          @array = ("Paul", "Jake", "Tom");
          sayHelloto(@array);
          sayHelloto("Mary", "Jane", "Tylor", 1,2,3);

  21. Input / Output
      • Perl allows you to read any input that is sent to your program via standard input by using the handle <STDIN>.
      • One way of handling input via <STDIN> is to use a loop to process every line of input.

  22. Input / Output
      • Number the lines from standard input and print each line number together with the first word of that line.
          $count = 1;
          foreach $line (<STDIN>) {
              @array = split(/\s/, $line);
              print "$count $array[0]\n";
              $count++;
          }
      • Other I/O topics include reading and writing to files, Standard Error (STDERR) and Standard Output (STDOUT).

  23. Regular Expression
      • A regular expression is a set of characters that specify a pattern.
      • Used for locating pieces of text in a file.
      • Regular expression syntax allows the user to do a "wildcard" type search without necessarily specifying every character literally.
      • Available across OS platforms and programming languages.

  24. Simple Regular Expression
      • A simple regular expression contains the exact string to match.
          $string = "aaaabbbbccc";
          if ($string =~ /bc/) {
              print "found pattern\n";
          }
        output: found pattern

  25. Simple Regular Expression
      • The variable $& is automatically set to the matched text.
          $string = "aaaabbbbccc";
          if ($string =~ /bc/) {
              print "found pattern : $&\n";
          }
        output: found pattern : bc

  26. Simple Regular Expression
      • What happens when you want to match a generalised pattern, like an "a" followed by some "b"s and a single "c"? A literal pattern only matches that exact run of characters:
          $string = "aaaabbbbccc";
          if ($string =~ /abbc/) {
              print "found pattern : $&\n";
          } else {
              print "nothing found\n";
          }
        output: nothing found

  27. Regular Expression - Quantifiers
      • We can specify the number of times we want to see a specific character in a regular expression by adding a quantifier after the character.
      • * (asterisk) matches zero or more copies of a specific character
      • + (plus) matches one or more copies of a specific character

  28. Regular Expression - Quantifiers
          @array = ("ac", "abc", "abbc", "abbbc", "abb", "bbc", "bcf", "abbb", "c");
          foreach $string (@array) {
              if ($string =~ /ab*c/) {
                  print "$string ";
              }
          }
        output: ac abc abbc abbbc

  29. Regular Expression - Quantifiers
          @array = ("ac", "abc", "abbc", "abbbc", "abb", "bbc", "bcf", "abbb", "c");
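The rest of this slide is not in the transcript. A plausible continuation, offered only as a guess, is the same list filtered with + instead of *, which drops "ac" because + requires at least one "b". The helper name matches_of is hypothetical.

```perl
use strict;
use warnings;

# Return the candidates that match the given pattern.
# (matches_of is a hypothetical helper for this example.)
sub matches_of {
    my ($pattern, @candidates) = @_;
    return grep { $_ =~ $pattern } @candidates;
}

my @array = ("ac", "abc", "abbc", "abbbc", "abb", "bbc", "bcf", "abbb", "c");

my @with_plus = matches_of(qr/ab+c/, @array);
print "@with_plus\n";   # abc abbc abbbc
```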

  30. Regular Expression - Anchors
      • You can place anchors before and after the pattern to restrict where in the string it may match.
      • ^ restricts the match to the beginning of the line
      • $ restricts the match to the end of the line

  31. Regular Expression - Anchors
          @array = ("ac", "abc", "abbc", "abbbc", "abb", "bbc", "bcf", "abbb", "c");
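The patterns this slide applied to the list are missing from the transcript. As an illustrative sketch, with patterns chosen here rather than taken from the slide, ^bc and bc$ select quite different strings. The helper name anchored_matches is hypothetical.

```perl
use strict;
use warnings;

# Return the candidates that match the given (anchored) pattern.
# (anchored_matches is a hypothetical helper for this example.)
sub anchored_matches {
    my ($pattern, @candidates) = @_;
    return grep { $_ =~ $pattern } @candidates;
}

my @array = ("ac", "abc", "abbc", "abbbc", "abb", "bbc", "bcf", "abbb", "c");

my @starts = anchored_matches(qr/^bc/, @array);   # "bc" at the very start
my @ends   = anchored_matches(qr/bc$/, @array);   # "bc" at the very end

print "starts with bc: @starts\n";   # bcf
print "ends with bc: @ends\n";       # abc abbc abbbc bbc
```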

  32. Regular Expression - Range
      • […] lists the set of characters that are allowed at one position.
      • [0123456789] will match a single numeric character.
      • [0-9] will also match a single numeric character.
      • [A-Za-z] will match a single letter of either case.

  33. Regular Expression - Range
      • Search for a word that
        • starts with the uppercase T
        • has a lowercase letter as its second letter
        • has a lowercase vowel as its third letter
        • is 3 letters long, followed by a space
      • Regular expression: "^T[a-z][aeiou] " (the trailing space is part of the pattern, and ^ ties the word to the start of the line)
      • Note: [z-a] is backwards and does not work.
      • Note: [A-z] does match upper and lowercase letters, but also the 6 additional characters between them in the ASCII chart: [ \ ] ^ _ `
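A quick way to check such a pattern is to run it against a few strings that should and should not match. In this sketch the pattern includes a trailing space to enforce "followed by a space". The sub name starts_with_t_word and the sample strings are invented.

```perl
use strict;
use warnings;

# True (1) if the text begins with T, a lowercase letter, a lowercase vowel, then a space.
# (starts_with_t_word is a hypothetical helper for this example.)
sub starts_with_t_word {
    my ($text) = @_;
    return $text =~ /^T[a-z][aeiou] / ? 1 : 0;
}

foreach my $sample ("The end", "Tea time", "Toy box", "Tree house", "the end") {
    print "$sample: ", starts_with_t_word($sample), "\n";
}
# "The end" and "Tea time" give 1; the other three give 0.
```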

  34. Regular Expression - Others
      • Match any single character with "." (dot)
          a.c = matches any string containing "a", followed by any one character, followed by "c"
      • Specify a number of repetitions with \{ and \} (written { and } without backslashes in Perl)
          [a-z]\{4,6\} = matches four, five or six lowercase letters (in Perl: [a-z]{4,6})
      • Remember patterns with \( , \) and \1 (in Perl the parentheses are written without backslashes)
          Regular expressions allow you to remember a matched pattern and recall it later in the same expression
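The backslashed braces and parentheses are the basic (grep/sed style) notation; Perl writes them without backslashes. A minimal sketch of all three features in Perl syntax follows, using a made-up helper named check.

```perl
use strict;
use warnings;

# Report whether the text matches the pattern.
# (check is a hypothetical helper for this example.)
sub check {
    my ($text, $pattern) = @_;
    return $text =~ $pattern ? "match" : "no match";
}

# "." stands for any single character.
print check("abc", qr/a.c/), "\n";                    # match
print check("ac",  qr/a.c/), "\n";                    # no match

# {4,6} means four to six repetitions of the preceding item.
print check("perl", qr/^[a-z]{4,6}$/), "\n";          # match
print check("abc",  qr/^[a-z]{4,6}$/), "\n";          # no match

# ( ) remembers what it matched; \1 recalls it.
print check("hello hello", qr/\b(\w+) \1\b/), "\n";   # match
print check("hello world", qr/\b(\w+) \1\b/), "\n";   # no match
```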

  35. RegExp problems and strategies
      • You tend to match more than desired: A.*B matches AAB as well as AAAAAAACCCAABBBBAABBB.
      • Know what you want to match.
      • Know what you don't want to match.
      • Write out a pattern that describes what you want to match.
      • Test the pattern.
      • More info: type "man re_syntax" in a Unix shell (or "perldoc perlre" for Perl's own regular expression syntax).
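The over-matching comes from .* being greedy: it takes as much text as it can. One common remedy in Perl, not mentioned on the slide, is the non-greedy form .*?, which stops at the first opportunity. A sketch, with a hypothetical helper first_match:

```perl
use strict;
use warnings;

# Return the text matched by the pattern, or undef if there is no match.
# (first_match is a hypothetical helper for this example.)
sub first_match {
    my ($text, $pattern) = @_;
    return $text =~ /($pattern)/ ? $1 : undef;
}

my $text = "AAAAAAACCCAABBBBAABBB";

print first_match($text, qr/A.*B/),  "\n";   # greedy: the whole string
print first_match($text, qr/A.*?B/), "\n";   # non-greedy: AAAAAAACCCAAB
```

Both matches start at the first "A"; only where they stop differs.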

  36. Example problem - Background
      • Biologists are interested in analysing proteins that are from a particular biochemical enzyme class: "CDK1, CDK2 or CDK3".
      • In addition, biologists would like to extract those protein sequences that contain the amino acid pattern (motif) that represents a particular virus binding site:
          Serine, Glutamic Acid, (multiple occurrences of) Alanine, Glycine
          Serine = S, Glutamic Acid = E, Alanine = A, Glycine = G

  37. Example Problem - Dataset
      • The dataset was downloaded from an online phosphorylation protein database.
      • Contains 16472 protein entries in one file.
      • One entry per line, terminated with a carriage return character.
      • Comma-delimited entries
          field1, field2, field3, field4, …..

  38. Example Problem - Dataset fields
      • acc - unique database ID
      • sequence - amino acid sequence for the protein
      • position - position along the sequence that is phosphorylated
      • code - amino acid that is phosphorylated
      • pmid - unique protein ID linked to an international protein database
      • kinase - enzyme class of this protein
      • source - where this protein was found
      • entry_date - date entered into the database

  39. Example Problem - Dataset fields
      • acc - unique database ID
      • sequence - amino acid sequence for the protein
      • position - position along the sequence that is phosphorylated
      • code - amino acid that is phosphorylated
      • pmid - unique protein ID linked to an international protein database
      • kinase - enzyme class of this protein
      • source - where this protein was found
      • entry_date - date entered into the database

  40. The task
      • Extract those entries that have the string CDK1, CDK2 or CDK3 in the enzyme column.
      • Within our extracted entries, search and match those sequences that contain the virus binding pattern.
      • Print out the database ID of the positively matched entries.

  41. Problem: Divide and conquer
      • Enzyme class CDK1, CDK2 or CDK3
      • Extract those proteins with the pattern
          Serine, Glutamic Acid, (multiple occurrences of) Alanine, Glycine
          Serine = S, Glutamic Acid = E, Alanine = A, Glycine = G
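Putting the two sub-problems together, one possible solution sketch follows. It assumes the fields appear on each line in the order listed on the dataset slide (acc, sequence, position, code, pmid, kinase, ...), reads "multiple occurrences of Alanine" as "one or more" (A+), and uses three invented rows in place of the real file. The sub name matching_acc is hypothetical.

```perl
use strict;
use warnings;

# Return the database ID (acc) if the entry is in enzyme class CDK1-3
# and its sequence contains the motif S, E, one or more A, G.
# Returns undef otherwise. Field order is an assumption.
sub matching_acc {
    my ($line) = @_;
    chomp($line);
    my ($acc, $sequence, $position, $code, $pmid, $kinase) = split(/,/, $line);

    return unless defined $kinase && $kinase =~ /CDK[123]/;   # step 1: enzyme class
    return unless $sequence =~ /SEA+G/;                       # step 2: binding motif
    return $acc;
}

# Hypothetical rows, made up for illustration only.
my @rows = (
    "ID001,MKSEAAGLT,3,S,11111,CDK1,human,2004-01-01",   # right class, has motif
    "ID002,MKSEGLT,3,S,22222,CDK2,mouse,2004-01-02",     # right class, no Alanine
    "ID003,MKSEAGLT,3,S,33333,PKA,human,2004-01-03",     # has motif, wrong class
);

foreach my $row (@rows) {
    my $acc = matching_acc($row);
    print "$acc\n" if defined $acc;    # prints only ID001
}
```

For the real dataset, the foreach loop would iterate over the lines of the file or of standard input instead of the sample array.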

  42. Interesting parts of Perl not covered in this lecture
      • Hashes
        • One unique value (the key) that is linked to another value
        • "Lecture 1002" ---> "Thur 3pm"
        • "Lecture 1002" ---> 25
        • "Lecture 1002" ---> [name1, name2, … ]
        • "Lecture 1002" ---> [{name1},{name2}.. ]
            {name1} ---> student ID
            {name2} ---> student ID
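For a first taste of hashes, here is a small sketch that stores a time and a student list under the key "Lecture 1002". The inner field names (time, students) and the helper describe_lecture are invented for the example.

```perl
use strict;
use warnings;

# Describe one lecture from a hash of lecture records.
# (describe_lecture and the time/students field names are hypothetical.)
sub describe_lecture {
    my ($info, $lecture) = @_;
    my $entry = $info->{$lecture};                # look up the record by key
    my $count = scalar @{ $entry->{students} };   # length of the student list
    return "$lecture: $entry->{time}, $count students";
}

# A hash maps each unique key to a value; here the value is itself a small record.
my %lectures = (
    "Lecture 1002" => { time => "Thur 3pm", students => ["name1", "name2"] },
);

print describe_lecture(\%lectures, "Lecture 1002"), "\n";
# prints: Lecture 1002: Thur 3pm, 2 students
```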

  43. Interesting parts of Perl not covered in this lecture
      • CGI (Common Gateway Interface)
        • Creation of dynamic web pages using Perl
        • CGI, PHP, JavaScript, Java Applet, etc.
      • Object Oriented Perl
      • Perl books & references to explore at your own curiosity
        • http://perldoc.perl.org/
        • http://www.oreilly.com/pub/topic/perl
        • Book: O'Reilly - Perl Cookbook - This will save you someday
        • Book: O'Reilly - Mastering Regular Expressions