
Perl (2) Hongkang Mei, Ph.D. March 10, 2002



  1. Perl (2) Hongkang Mei, Ph.D. March 10, 2002

  2. Review of Perl (1) • More on I/O • Regular Expression basics • More on regular expression • Using regular expressions • File and directory handles

  3. Scalar data: a single thing, either a number or a string; the two are interchangeable and are acted upon with operators (a scalar variable holds a scalar value) • List data: a list of scalars (an array is a variable that contains a list; a hash, or associative array, is a variable that contains a list of scalars associated with each other in pairs)

  4. Scalar variables • '$' followed by a Perl identifier • Names should be descriptive • Perl built-in scalar variables: $ARGV, $_, …… • *A Perl identifier consists of letters, '_', and digits, and must not begin with a digit

  5. Numeric operators • ++ incrementing the value • $counter++; • $v = $counter++; • is different from • $v = ++$counter; • -- decrementing the value
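
The difference between the postfix and prefix forms can be seen in a minimal sketch like the one below. The variable names $post and $pre are introduced here for illustration only.

```perl
use strict;
use warnings;

my $counter = 5;
my $post = $counter++;   # $post gets the old value 5; $counter becomes 6
my $pre  = ++$counter;   # $counter becomes 7 first; $pre gets 7

print "post=$post pre=$pre counter=$counter\n";
```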

  6. List data • list literals: scalars separated by ',' in () • (1, 2, 3, 4, 5) • ("dnaA", "argC", "rnpA") • qw/ dnaA argC rnpA / • range operator .. • (1..5) # (1, 2, 3, 4, 5) • (1.2..5.7) # same: the endpoints are truncated to integers • (5..1) # empty: the range only counts up • ($a..$b) # depends on the current values • *The qw shortcut treats its words like single-quoted strings and accepts any punctuation pair as delimiters: / /, "", {}, [], (), <>, ##, !!
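
As a rough sketch of the list literals above, the following builds each kind of list and prints it. The array names are illustrative, and the reverse call is only one possible way to obtain a descending list.

```perl
use strict;
use warnings;

my @genes = qw/dnaA argC rnpA/;   # same as ("dnaA", "argC", "rnpA")
my @up    = (1..5);               # (1, 2, 3, 4, 5)
my @frac  = (1.2..5.7);           # endpoints are truncated: (1, 2, 3, 4, 5)
my @down  = (5..1);               # empty list: the range only counts up
my @fixed = reverse 1..5;         # one way to count down: (5, 4, 3, 2, 1)

print "genes: @genes\n";
print "up: @up; frac: @frac; down has ", scalar(@down), " elements\n";
print "reversed: @fixed\n";
```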

  7. Array variables • @ + identifier • no unnecessary limit on size • Array elements: scalar variables, each with an index (0, 1, 2, 3, 4, ...)

  8. Accessing array elements • achieved by calling the scalar variables: • $seq[0] • print $seq[3]; • $seq[1] = ‘acg’; • $seq is a different thing! • @ and $ have different namespaces

  9. Hashes • A hash is a variable containing a list of paired scalar values (keys and values) associated with each other • % + identifier • no unnecessary limit on size • Each key and each value is a scalar

  10. Hash element access • $hash{$key} • $seq{“dnaA”} = “CAGACTCGAT”; • foreach $gene (qw/dnaA argC rnt/) { • print “The sequence for $gene is $seq{$gene}.\n”; • } • $key can be expr. • $seq{“unknown”} # undef

  11. Interpolation of variables into strings • Scalar: • print $aa_seq; • print "The sequence is $aa_seq.\n"; • print "The file contains $count ${type}s.\n"; • Array: • print "The list contains @array\n"; • print @array; • print 3 * @array; • Hash: • print "The AC# for 'dnaA' is $g_ac{'dnaA'}"; • NO interpolation for the whole hash!! • printf "The %s has %d AAs.\n", $prot, $len;

  12. SCALAR and LIST CONTEXT • Using the same variable in different contexts • means different things • depending on what Perl is expecting • 5 + @aa; # scalar • sort @aa; # list • @list = @aa; # list: copies the elements • @list = $aa; # list with a single element • $aa[0] = @list; # scalar: the number of elements • print "The full aa list is @aa.\n"; • print "The number of aa is " . @aa . ".\n"; • print @aa;
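
The context rules above might be exercised with a small sketch like this one. The contents of @aa are made up for illustration.

```perl
use strict;
use warnings;

my @aa = qw/ala arg asn asp/;

my $count   = @aa;        # scalar context: the number of elements
my ($first) = @aa;        # list context: the first element
my $sum     = 5 + @aa;    # a numeric operator expects a scalar
my @sorted  = sort @aa;   # sort expects a list

print "count=$count first=$first sum=$sum\n";
print "The number of aa is " . @aa . ".\n";   # concatenation is scalar context
print "The full aa list is @aa.\n";           # interpolation joins with spaces
```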

  13. Control structures • if (true) {...} elsif (true) {…} else {…} • while (true) {...} • foreach $line (list) {...} • for ($i=1; $i<11; $i++) {…} • unless (true) {…} # if (false) {…} • until (true) {…} # while (false) {…}

  14. Control structures • autoincrement autodecrement • $n++; $n--; • ++$n; --$n; • $m = $n++; • $m = ++$n; • $m = $n; $n++; • logical operators • &&, ||, ! • and, not, or

  15. Control structures • expression modifier • print "Acidic\n" if $pH < 7; • print " ", ($n += 2) while $n < 10; • print "$aa{$_}\t" foreach (keys %codon); • short-circuit operator • my $n_aa = $aa{$codon} || "not in the list"; • the ternary operator ?: • $charge = ($pI{$aa} < 7) ? "acidic" : • ($pI{$aa} == 7) ? "neutral" : • "basic";
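
A chained ternary needs a final "else" value, and the middle test must compare with == rather than assign with =. The sketch below shows one way to write it; the classify subroutine and the one-entry %aa table are hypothetical helpers for illustration.

```perl
use strict;
use warnings;

# Classify a pH-like value with a chained ternary operator.
sub classify {
    my ($value) = @_;
    return ($value < 7)  ? "acidic"
         : ($value == 7) ? "neutral"
         :                 "basic";
}

# Hypothetical codon table with a single entry.
my %aa = (AUG => "Met");

# || returns the left side if it is true, otherwise the right side.
my $known   = $aa{"AUG"} || "not in the list";
my $unknown = $aa{"XXX"} || "not in the list";

print classify(3), " ", classify(7), " ", classify(9.5), "\n";
print "$known / $unknown\n";
```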

  16. Subroutines • functions or subroutines • define: • sub my_funct { • $dna_length = 3 * length($aa_seq); • print “DNA is $dna_length basepairs\n”; • } • Invoke: • &my_funct;

  17. Built-in functions • print • chomp • defined • chop • reverse, sort • pop, push, shift, unshift • return • length • scalar # a fake one • …… • perlfunc manpage

  18. Review of Perl (1) • More on I/O • Regular Expression basics • More on regular expression • Using regular expressions • File and directory handles

  19. <STDIN>: get user input • from commandline: • chomp ($a = <STDIN>); print $a; • # reads one line of input, which ends at the newline • file redirection: • %>myprog.pl < my_input.txt • …… • $line_n = 1; • while (<STDIN>){ • print "$line_n\t$_"; • $line_n++; • }

  20. <>: get user input from commandline • %>myprog.pl input1 - input2 • …… • $line_n = 1; • while (<>){ • print "$line_n\t$_"; • $line_n++; • } • The difference between <> and <STDIN> • <> works from @ARGV: it reads the files named on the command line ('-' means STDIN), or STDIN if none are given

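  21. more on print • output is buffered • print <>; # <> in list context returns all remaining lines • # works like cat on the command line • print() as a function: parentheses right after print enclose its whole argument list • print (3+4)*5; # prints 7; the *5 is applied to print's return value • print "The result is: ", (3+4)*5; # prints 35 after the text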

  22. printf • printf “The mutation is at %s position.\n”, • $count_mut; • %s, %f, %d, %g…… • %2d • %-12s (left justified) • %12.3f (right justified) • %: does not interpolate whole hash • %% to print ‘%’
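
The format codes above can be tried out with sprintf, which takes the same format string as printf but returns the result instead of printing it. This is only a sketch with made-up values.

```perl
use strict;
use warnings;

my $left  = sprintf "%-12s|", "dnaA";     # left-justified in 12 columns
my $right = sprintf "%12.3f|", 3.14159;   # right-justified, 3 decimal places
my $int   = sprintf "%2d", 7;             # padded to a width of 2
my $pct   = sprintf "%d%%", 50;           # %% gives a literal percent sign

print "$left\n$right\n$int\n$pct\n";
```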

  23. Review of Perl (1) • More on I/O • Regular Expression basics • More on regular expression • Using regular expressions • File and directory handles

  24. regular expression or pattern • a mini-program • either matches or doesn't match a given string • one pattern can match any number of different strings • it doesn't matter how many times it matches within a string • works like grep • $_ = "ADCSFTSCGNYEQ"; # a bare /.../ is tested against $_ • if (/SFT/) { • print "It has the motif \"SFT\".\n"; • }

  25. metacharacters • . matches any character but "\n" • \ escape (/3\.14/) • () grouping

  26. simple quantifiers • the following quantifiers repeat the preceding item • * 0 or more times • + 1 or more times • ? 0 or 1 times

  27. the ‘|’alternative pattern • /T|S/ • /protein(and|or)DNA/ • /arg(ser|cys)lys/

  28. Review of Perl (1) • More on I/O • Regular Expression basics • More on regular expression • Using regular expressions • File and directory handles

  29. character classes • [] matches any single character inside • [AGCT] # any deoxynucleotides • [a-zA-Z0-9]+ # 1 or more of letters or digits • [;\-,] # ‘-’ needs to be escaped

  30. character classes shortcuts • \d [0-9] • \w [A-Za-z0-9_] # only a char, \w+ a word • \s [\f\t\n\r ] # whitespace • negating the shortcuts • \D [^\d] • \W [^\w] • \S [^\s] • can be part of a larger class • [\dA-F] • [\d\D] (any char) • [^\d\D] (nothing)

  31. general quantifiers • * 0 or more repetitions • + 1 or more • ? 0 or 1 • {3,5} 3 to 5 (no space inside the braces) • {3,} 3 or more • {3} exactly 3 repetitions • /U{5,8}/ • /\w{8}/ • /A{15,100}/ • /(arg){2,}/ • * {0,} • how about + and ?
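
As a sketch of the counted quantifiers, each test below stores 1 if the pattern matches and 0 otherwise. It also answers the closing question: + behaves like {1,} and ? behaves like {0,1}. The test strings are made up.

```perl
use strict;
use warnings;

my $run_of_u   = "GGUUUUUUACG" =~ /U{5,8}/      ? 1 : 0;   # six U's in a row
my $exactly_3  = "AAA"         =~ /^A{3}$/      ? 1 : 0;
my $four_not_3 = "AAAA"        =~ /^A{3}$/      ? 1 : 0;   # too many A's
my $two_args   = "argarg"      =~ /^(arg){2,}$/ ? 1 : 0;
my $one_arg    = "arg"         =~ /^(arg){2,}$/ ? 1 : 0;   # needs at least two
my $optional   = "AT"          =~ /^AC{0,1}T$/  ? 1 : 0;   # same as /^AC?T$/

print "$run_of_u $exactly_3 $four_not_3 $two_args $one_arg $optional\n";
```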

  32. anchors • ^ marks the beginning of the string • /^ATG/ # initiation codon • [^AGCT] # ? (inside [], a leading ^ negates the class; it is not an anchor) • $ marks the end • /(UA[AG]|UGA)$/ # stop codons • /^\s*$/ # a blank line

  33. word anchors • \b word boundary anchor • matches at either end of a word • /\barg/ # arg, arginine, arginyl, argue…… • /\barg\b/ # arg • \B nonword boundary anchor • matches at any point where \b would not • /\barg\B/ # arginine, arginyl, argue…… (but not arg alone)
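
The three anchored patterns can be compared side by side by filtering a word list with grep. This is a minimal sketch; the word list, including the non-matching "target", is chosen for illustration.

```perl
use strict;
use warnings;

my @words = qw/arg arginine argue target/;

my @starts = grep { /\barg/ }   @words;   # "arg" at the start of a word
my @whole  = grep { /\barg\b/ } @words;   # the whole word "arg" only
my @prefix = grep { /\barg\B/ } @words;   # "arg" followed by more word characters

print "starts: @starts\nwhole: @whole\nprefix: @prefix\n";
```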

  34. memory () • () grouping • the matched part is kept in memory • /A(ACGT)T/ # ACGT in memory • backreferences • \1 \2 • /(GAATTC).*\1/ # can EcoRI cut the insert out? • /(.)\1/ # two identical characters; NOT the same as /../ (any two characters) • memory variables • $1
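
A backreference must match exactly what its group captured, which is what separates /(.)\1/ from /../. The sketch below uses made-up strings to show the difference.

```perl
use strict;
use warnings;

my $doubled = "ACCGT" =~ /(.)\1/ ? $1 : "";   # finds the doubled character
my $none    = "ACGT"  =~ /(.)\1/ ? 1 : 0;     # no character appears twice in a row
my $any_two = "ACGT"  =~ /../    ? 1 : 0;     # any two characters will do

# A motif that occurs twice: \1 must equal what the first group captured.
my $twice = "ttACGTggggACGTtt" =~ /(ACGT).*\1/ ? 1 : 0;

print "doubled=$doubled none=$none any_two=$any_two twice=$twice\n";
```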

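  35. precedence • which parts of the pattern stick together more tightly, from highest to lowest: • () grouping and memory • * + ? {} quantifiers • ^ $ \b \B anchors, and sequence • | alternation • atoms: chars, classes, backreferences • examples • /^fred|barney$/ # means ^fred OR barney$ • /^(\w+)\s+(\w+)$/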

  36. Review of Perl (1) • More on I/O • Regular Expression basics • More on regular expression • Using regular expressions • File and directory handles

  37. m// • a more general pattern match operator • can use any pairs of delimiters • // • m,, m!! m^^ m## • m<> m{} m[] m() • example: • m%^http://% is better than /^http:\/\//

  38. option modifiers • /i case insensitive • matches both cases for all letters • /\byes\b/i # Yes yes YES • /s lets . match any character, including "\n" • without /s, use a class such as [\d\D] to match any character • $_ = "ACGTTTGCG\nAACACGT"; • /^(ACG).*(CGT)$/s • do not confuse with the \s shortcut

  39. combining option modifiers • /si # both /s and /i • $_ = "aCGTTTGCG\nAACAcGT"; • if (/^(ACG).*(CGT)$/si) { • print "That sequence begins with ACG ", • "and ends with CGT.\n"; • } • other options
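
To see why both modifiers are needed for the two-line, mixed-case string on this slide, the same pattern can be tried with no modifier, with /i alone, and with /si. This is a sketch; the result variable names are illustrative.

```perl
use strict;
use warnings;

my $seq = "aCGTTTGCG\nAACAcGT";   # two lines, mixed case

my $plain = $seq =~ /^(ACG).*(CGT)$/   ? 1 : 0;   # fails: the case differs
my $i     = $seq =~ /^(ACG).*(CGT)$/i  ? 1 : 0;   # fails: . cannot cross "\n"
my $si    = $seq =~ /^(ACG).*(CGT)$/si ? 1 : 0;   # succeeds

print "plain=$plain i=$i si=$si\n";
```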

  40. the binding operator =~ • if (/\w+/i){……} # only works on $_ • if ($seq =~ /^(ACG).*(CGT)$/si){ • print "That sequence begins with ACG ", • "and ends with CGT.\n"; • } • $prot_seq = <STDIN> =~ /[^ACGT\s]/i; # true if the line has a non-DNA character (the trailing newline is ignored) • if ($prot_seq) {blastp;}

  41. interpolating into patterns • my $p = "arg"; • if ($seq =~ /($p)$/si){ • print "That sequence ends with $p.\n"; • } • $profile = shift @ARGV; # get commandline args • if ($prot_seq =~ /$profile/si) { • print "$prot_seq has motif $profile"; • }

  42. the match variables • /(A)\1/ # use \1 inside pattern • $1 # hold memory value in Perl code • if ($seq =~ /(g.)\1/si){ • print “That sequence has a $1 repeat.\n” • } • if ($prot_seq =~ /([stavli]{3,}).*([Deq]{3,})/si) { • print “$prot_seq has hydrophobic region $1 ”, • “followed by hydrophilic region $2.\n”; • }
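
The two-group pattern on this slide can be traced with a short made-up sequence. Note that the greedy .* leaves the second group only the minimum it needs, so $2 may be shorter than the full hydrophilic run. The sequence and the variable names below are illustrative assumptions.

```perl
use strict;
use warnings;

my $prot_seq = "mkvlaivsgDEEQrk";   # made-up sequence

my ($hydrophobic, $hydrophilic) = ("", "");
if ($prot_seq =~ /([stavli]{3,}).*([deq]{3,})/si) {
    # Copy $1 and $2 right away; the next successful match overwrites them.
    ($hydrophobic, $hydrophilic) = ($1, $2);
}

print "hydrophobic=$hydrophobic hydrophilic=$hydrophilic\n";
```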

  43. the persistence of match • next successful match will overwrite the earlier one • store your $1 away! • if ($prot_seq =~ /([cstv]+)/si) { • my $motif = $1; • } • test your match before using $1 • it could be a leftover • $prot_seq =~ /([cstv]+)/si; • print “I found the motif $1, correct?\n”

  44. automatic matched variables (PP121) • $& matched part of string • $` part before the match • $’ part after the match • $`$&$’ the whole string • …… • print “The matched string is the following,”, • “the part matched is in <>:\n”, • “$`<$&>$’\n”;
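
A minimal sketch of the three automatic variables, reusing the protein string from the earlier pattern slide. The variable $marked is introduced here for illustration.

```perl
use strict;
use warnings;

my $seq = "ADCSFTSCGNYEQ";
my $marked = "";
if ($seq =~ /SFT/) {
    # Text before the match, the match itself in <>, then the text after it.
    $marked = "$`<$&>$'";
}

print "$marked\n";
```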

  45. substitutions with s/// • m// search • s/// search and replace • s/match_pattern/replacement_string/ • returns true if successful, false if not • replacement string: • $1 • empty • $& • words • whitespaces • ……

  46. examples of s/// • if (s/([a-z]{3})([cstyleu]{3})/$2/){ • print "The mutant protein has a $1 deletion ", • "before $2.\n"; • } • s/(arg)/$1$1/; # arg insertion • s/arg/cys/; # cys substitution • s/\s+//g; # get rid of all whitespace • s/\s+/ /g; # single space delimiters • s/[^acgt]//gi; # clean up DNA sequence • s/[tT]/U/g; # transcribe to RNA sequence • s/_END_.*//s; # chop off after END mark
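
Several of these substitutions can be checked on small made-up strings. The sketch below copies each string before changing it, using the (my $copy = $original) =~ s/// idiom, and also shows the return value of s///.

```perl
use strict;
use warnings;

my $raw = "ACG TTX\nGCA";               # made-up input with junk characters
(my $dna = $raw) =~ s/[^acgt]//gi;      # keep only A, C, G, T
(my $rna = $dna) =~ s/[tT]/U/g;         # replace every T with U

my $text = "arg   lys \t cys";
(my $single = $text) =~ s/\s+/ /g;      # collapse runs of whitespace

my $pep = "metarglys";
my $changed = ($pep =~ s/arg/cys/);           # number of substitutions made
my $missed  = ($pep =~ s/arg/cys/) ? 1 : 0;   # nothing left to replace

print "$dna $rna\n$single\n$pep $changed $missed\n";
```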

  47. s/// different delimiters • just like: • m// • qw// • can use unpaired or paired delimiters • ,, “” {} [] %% ## • s#^https://#http://# • s{T}{U} • s<T>#U#

  48. binding operator for s/// • works for non-default variables • $dna_seq =~ s/[^acgt]/n/gis;

  49. case shifting • $dna =~ s/(.+)/\U$1/; • $prot =~ s/(.+)/\L$1/; • $prot =~ s/(\w+)/\u\L$1/gi;
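
The case-shifting escapes can be tried on short made-up strings, as in this sketch: \U uppercases, \L lowercases, and \u uppercases only the next character.

```perl
use strict;
use warnings;

my $dna = "acgtn";
$dna =~ s/(.+)/\U$1/;                     # whole string to upper case

my $prot = "ARG-lys CYS";
(my $lower = $prot) =~ s/(.+)/\L$1/;      # whole string to lower case
(my $cap   = $prot) =~ s/(\w+)/\u\L$1/g;  # each word: lower case, then capitalize

print "$dna\n$lower\n$cap\n";
```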

  50. split • split /separator/, $string; • @aa = split / /, $aa; • split /separator/; • @aa = split /:/; # uses $_, e.g. "a:b:c:d" • split //; • @data = split //; # still $_, splits into single characters • split; # like split /\s+/, $_ but leading whitespace is skipped • @data = split; # split $_ at whitespace
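
The forms of split above might be compared with a sketch like the following. The input strings are made up; the last two lines show how a bare split differs from an explicit /\s+/ when the string starts with whitespace.

```perl
use strict;
use warnings;

my @fields = split /:/, "a:b:c:d";   # four fields
my @chars  = split //, "ACGT";       # one element per character

$_ = "  arg  lys\tcys\n";
my @words = split;                   # splits $_ on whitespace, skipping leading blanks

my @with_empty = split /\s+/, "  arg  lys";   # leading whitespace yields an empty first field

print scalar(@fields), " fields; chars: @chars; words: @words\n";
print "explicit pattern gives ", scalar(@with_empty), " fields\n";
```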
