Perl (2) Hongkang Mei, Ph.D. March 10, 2002

slide1

Perl (2)

Hongkang Mei, Ph.D.

March 10, 2002

slide2

Review of Perl (1)

  • More on I/O
  • Regular Expression basics
  • More on regular expression
  • Using regular expressions
  • File and directory handles
slide3

Scalar data

something single or just one

number or string, interchangeable

acted upon with operators

(a scalar variable holds a scalar value)

  • List data

list of scalars

(an array is a variable that contains a list)

(a hash, or associative array, is a variable that contains a list of paired scalars, each key associated with a value)

slide4

Scalar variables

‘$’ followed by Perl identifier

  • Should be descriptive
  • Perl built-in scalar variables
  • $ARGV, $_, ”……

*Perl identifier

letters, '_', digits; must not begin with a digit

slide5

Numeric operators

  • ++ incrementing the value
  • $counter++;
  • $v = $counter++;
  • is different from
  • $v = ++$counter;
  • -- decrementing the value
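
The difference between the postfix and prefix forms can be checked with a minimal sketch like the one below. The starting value 5 is arbitrary and only for illustration.

```perl
use strict;
use warnings;

my $counter = 5;

my $post = $counter++;   # $post gets the old value (5), then $counter becomes 6
my $pre  = ++$counter;   # $counter becomes 7 first, then $pre gets 7

print "post=$post pre=$pre counter=$counter\n";
```

The postfix form hands back the value before the increment; the prefix form hands back the value after it.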
slide6

List data

list literals

scalars separated by ‘,’ in ()

(1, 2, 3, 4, 5)

(“dnaA”, “argC”, “rnpA”)

qw/ dnaA argC rnpA/

range operator ..

(1..5) # (1, 2, 3, 4, 5)

(1.2..5.7) # same

(5..1) # empty

($a..$b) # depend on current values

* The qw shortcut

treated like ‘’ string

uses any punctuation pairs

/ /, “”, {}, [], (), <>, ##, !!
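
A short sketch of the list literals above, assuming nothing beyond core Perl. The variable names are made up for the example.

```perl
use strict;
use warnings;

my @genes = qw/ dnaA argC rnpA /;   # same as ("dnaA", "argC", "rnpA")
my @up    = (1 .. 5);               # 1, 2, 3, 4, 5
my @trunc = (1.2 .. 5.7);           # operands are truncated to integers: also 1 to 5
my @empty = (5 .. 1);               # the range only counts upward, so this is empty

print "genes: @genes\n";
print "up: @up; truncated: @trunc; empty has ", scalar(@empty), " elements\n";
```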

slide7

Array variables

  • @ + identifier
  • no unnecessary limit
  • Array elements: scalar variables

[Figure: an array drawn as a row of scalar variables (the elements), numbered by indices 0 to 4]


slide8

Accessing array elements

  • achieved by calling the scalar variables:
  • $seq[0]
  • print $seq[3];
  • $seq[1] = ‘acg’;
  • $seq is a different thing!
  • @ and $ have different namespaces
slide9

Hashes

  • A hash is a variable containing a list of
  • paired scalar values (keys and values) associated with each other
  • % + identifier
  • no unnecessary limit

[Figure: a hash drawn as keys paired with values; each value is a scalar variable]


slide10

Hash element access

  • $hash{$key}
  • $seq{“dnaA”} = “CAGACTCGAT”;
  • foreach $gene (qw/dnaA argC rnt/) {
  • print “The sequence for $gene is $seq{$gene}.\n”;
  • }
  • $key can be expr.
  • $seq{“unknown”} # undef
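
A minimal sketch of hash element access, including a key that is not in the hash. The sequences are hypothetical, and the use of `exists` is an addition that the slide does not show.

```perl
use strict;
use warnings;

# Hypothetical sequences, for illustration only.
my %seq = (dnaA => "CAGACTCGAT", argC => "TTGACCGA");

my @report;
foreach my $gene (qw/dnaA argC rnt/) {
    if (exists $seq{$gene}) {
        push @report, "The sequence for $gene is $seq{$gene}.";
    } else {
        # Reading $seq{rnt} directly would give undef.
        push @report, "No sequence stored for $gene.";
    }
}
print "$_\n" for @report;
```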
slide11

Interpolation of variables into strings

  • Scalar:
  • print $aa_seq;
  • print “The sequence is $aa_seq.\n”;
  • print “The file contains $count ${type}s.\n”;
  • Array:
  • print “The list contains @array\n”;
  • print @array;
  • print 3 * @array;
  • Hash:
  • print "The AC# for 'dnaA' is $g_ac{'dnaA'}\n";
  • NO interpolation for the whole hash!!
  • printf "The %s has %d AAs.\n", $prot, $len;
slide12

SCALAR and LIST CONTEXT

  • Using the same variable in different context
  • means different things
  • depending on what Perl is expecting
  • 5 + @aa; # scalar
  • sort @aa; # list
  • @list = @aa;
  • @list = $aa;
  • $aa[0] = @list;
  • print “The full aa list is @aa.\n”;
  • print "The number of aa is " . @aa . ".\n";
  • print @aa;
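
The same array gives different results depending on context. A small sketch, with a made-up three-element array:

```perl
use strict;
use warnings;

my @aa = qw/ala cys asp/;

my $count   = @aa;               # scalar context: the number of elements
my $sum     = 5 + @aa;           # a numeric operator forces scalar context
my @copy    = @aa;               # list context: copies the elements
my ($first) = @aa;               # list assignment: takes the first element
my $in_str  = "list: @aa";       # interpolation joins the elements with spaces
my $concat  = "count: " . @aa;   # concatenation is scalar context

print "$count $sum $first\n$in_str\n$concat\n";
```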
slide13

Control structures

  • if (true) {...} elsif (true) {…} else {…}
  • while (true) {...}
  • foreach $line (list){...}
  • for($i=1; $i<11; $i++) {…}
  • unless (cond) {…} # like if (!cond) {…}: runs when cond is false
  • until (cond) {…} # like while (!cond) {…}: loops while cond is false
slide14

Control structures

  • autoincrement autodecrement
  • $n++; $n--;
  • ++$n; --$n;
  • $m = $n++;
  • $m = ++$n;
  • $m = $n; $n++;
  • logical operators
  • &&, ||, !
  • and, not, or
slide15

Control structures

  • expression modifier
  • print “Acidic\n” if $pH < 7;
  • print “ “, ($n += 2) while $n < 10;
  • print "$aa{$_}\t" foreach (keys %codon);
  • short-circuit operator
  • my $n_aa = $aa{$codon} || “not in the list”;
  • the ternary operator ?:
  • $aa = ($pI{$aa} < 7) ? "acidic" :
  • ($pI{$aa} == 7) ? "neutral" :
  • "basic";
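
A chained ternary needs a final value after the last colon, and the comparison uses `==`, not `=`. The sketch below assumes a hypothetical helper named `classify`, and the pI numbers passed to it are illustrative only.

```perl
use strict;
use warnings;

# classify is a made-up helper name, not from the slides.
sub classify {
    my ($pI) = @_;
    return ($pI <  7) ? "acidic"
         : ($pI == 7) ? "neutral"
         :              "basic";      # fallback when neither test is true
}

# Short-circuit ||: use the right-hand value when the lookup is false or undef.
my %aa   = (GCU => "ala");
my $n_aa = $aa{UUU} || "not in the list";

print classify(2.8), " ", classify(7), " ", classify(9.7), "\n$n_aa\n";
```

Note that `||` also falls through for a stored value of `0` or the empty string, which is harmless for amino acid names but matters for numeric data.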
slide16

Subroutines

  • functions or subroutines
  • define:
  • sub my_funct {
  • $dna_length = 3 * length($aa_seq);
  • print “DNA is $dna_length basepairs\n”;
  • }
  • Invoke:
  • &my_funct;
slide17

Built-in functions

  • print
  • chomp
  • defined
  • chop
  • reverse, sort
  • pop, push, shift, unshift
  • return
  • length
  • scalar # a fake one
  • ……
  • perlfunc manpage
slide18

Review of Perl (1)

  • More on I/O
  • Regular Expression basics
  • More on regular expression
  • Using regular expressions
  • File and directory handles
slide19

<STDIN>: get user input

  • from commandline:
  • chomp ($a = <STDIN>); print $a;
  • # input ends at the newline
  • file redirection:
  • %>myprog.pl < my_input.txt
  • ……
  • $line_n = 1;
  • while (<STDIN>){
  • print "$line_n\t$_";
  • $line_n++;
  • }
slide20

<>: get user input from commandline

  • %>myprog.pl input1 - input2
  • ……
  • $line_n = 1;
  • while (<>){
  • print "$line_n\t$_";
  • $line_n++;
  • }
  • The difference between <> and <STDIN>
  • <> works from @ARGV
slide21

more on print

  • buffer
  • print <>; # <> in list context returns every line
  • # works like cat on the command line
  • print () function
  • print (3+4)*5; # prints 7: parsed as (print(3+4))*5
  • print "The result is: ", (3+4)*5; # prints 35
slide22

printf

  • printf “The mutation is at %s position.\n”,
  • $count_mut;
  • %s, %f, %d, %g……
  • %2d
  • %-12s (left justified)
  • %12.3f (right justified)
  • %: does not interpolate whole hash
  • %% to print ‘%’
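
The format codes above can be tried with `sprintf`, which takes the same format strings as `printf` but returns the string instead of printing it. The protein name and numbers are hypothetical.

```perl
use strict;
use warnings;

my ($prot, $len, $mass) = ("DnaA", 467, 52.5512);   # made-up values

my $line    = sprintf "The %s has %d AAs.", $prot, $len;
my $left    = sprintf "[%-12s]", $prot;     # left justified in 12 columns
my $right   = sprintf "[%12.3f]", $mass;    # right justified, 3 decimal places
my $percent = sprintf "%d%% GC", 51;        # %% gives a literal percent sign

print "$_\n" for $line, $left, $right, $percent;
```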
slide23

Review of Perl (1)

  • More on I/O
  • Regular Expression basics
  • More on regular expression
  • Using regular expressions
  • File and directory handles
slide24

regular expression or pattern

  • mini-program
  • match or doesn’t match a given string
  • match any number of strings
  • doesn’t matter how many times
  • it matches to a string
  • works like grep
  • $_ = "ADCSFTSCGNYEQ"; # a bare /…/ matches against $_
  • if(/SFT/){
  • print "It has the motif \"SFT\".\n";
  • }
slide25

metacharacters

  • . Matches anything but “\n”
  • \ escape (/3\.14/)
  • () grouping
slide26

simple quantifiers

  • the following quantifiers repeat the preceding pattern
  • * 0 or more times
  • + 1 or more times
  • ? 0 or 1 times
slide27

the '|' alternative pattern

  • /T|S/
  • /protein(and|or)DNA/
  • /arg(ser|cys)lys/
slide28

Review of Perl (1)

  • More on I/O
  • Regular Expression basics
  • More on regular expression
  • Using regular expressions
  • File and directory handles
slide29

character classes

  • [] matches any single character inside
  • [AGCT] # any deoxynucleotides
  • [a-zA-Z0-9]+ # 1 or more of letters or digits
  • [;\-,] # ‘-’ needs to be escaped
slide30

character classes shortcuts

  • \d [0-9]
  • \w [A-Za-z0-9_] # only a char, \w+ a word
  • \s [\f\t\n\r ] # whitespace
  • negating the shortcuts
  • \D [^\d]
  • \W [^\w]
  • \S [^\s]
  • can be part of a larger class
  • [\dA-F]
  • [\d\D] (any char)
  • [^\d\D] (nothing)
slide31

general quantifiers

    • * 0 or more repetitions
    • + 1 or more
    • ? 0 or 1
  • {3,5} 3 to 5 (no space after the comma)
  • {3,} 3 or more
  • {3} exactly 3 repetitions
  • /U{5,8}/
  • /\w{8}/
  • /A{15,100}/
  • /(arg){2,}/
  • * is the same as {0,}
  • how about + and ?
slide32

anchors

  • ^ marks beginning of the string
  • /^ATG/ # initiation codon
  • [^AGCT] # inside [], ^ negates the class instead
  • $ marks the end
  • /(UA[AG]|UGA)$/ # stop codons
  • /^\s*$/ # a blank line
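
The quantifiers and anchors can be combined as in the rough sketch below. The test strings are made up, and the equivalences shown (`+` as `{1,}`, `?` as `{0,1}`) are standard Perl behaviour.

```perl
use strict;
use warnings;

# Keep only strings made of 5 to 8 U's; ^ and $ force the whole string to match.
my @runs = grep { /^U{5,8}$/ } ("UUUU", "UUUUU", "UUUUUUUUU");

# + behaves like {1,} and ? behaves like {0,1}.
my $plus_ok = ("argarg" =~ /^(arg)+$/) && ("argarg" =~ /^(arg){1,}$/);
my $opt_ok  = ("arg" =~ /^args?$/)     && ("arg" =~ /^args{0,1}$/);

# /^\s*$/ recognises a blank line (only whitespace, or nothing at all).
my @blank = grep { /^\s*$/ } ("", "  \t", "ATG");

print "runs: @runs; blank lines found: ", scalar(@blank), "\n";
```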
slide33

word anchors

  • \b word boundary anchor
  • matches either end of a word
  • /\barg/ # arg, arginine, arginyl, argue……
  • /\barg\b/ # arg
  • \B nonword boundary anchor
  • matches any point that \b would not
  • /\barg\B/ # arginine, arginyl, argue……
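
A small sketch of the three word-anchor patterns, run against a made-up word list:

```perl
use strict;
use warnings;

my @words = qw/arg arginine target/;

my @starts = grep { /\barg/ }    @words;   # "arg" at the start of a word
my @whole  = grep { /\barg\b/ }  @words;   # "arg" as a complete word
my @prefix = grep { /\barg\B/ }  @words;   # "arg" starting a longer word

print "starts: @starts\nwhole: @whole\nprefix: @prefix\n";
```

"target" never matches because its "arg" does not begin at a word boundary.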
slide34

memory ()

  • () grouping
  • matched part kept in memory
  • /A(ACGT)T/ # ACGT in memory
  • backreferences
  • \1 \2
  • /(GAATTC).*\1/ # EcoRI site twice: can EcoRI cut the insert out?
  • /(.)\1/ NOT /../ # two identical chars, not just any two chars
  • memory variables
  • $1
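
A minimal sketch of backreferences and `$1`. The insert sequence is made up, and GAATTC is the EcoRI recognition site.

```perl
use strict;
use warnings;

# Made-up insert flanked by two EcoRI sites (GAATTC).
my $insert = "TTGAATTCACGTACGGAATTCAA";

# \1 must match the same text that group 1 captured.
my $two_sites = ($insert =~ /(GAATTC).*\1/) ? 1 : 0;

# /(.)\1/ needs two identical characters in a row; /../ accepts any two.
my $doubled = ("ACGGT" =~ /(.)\1/) ? $1 : "";
my $no_pair = ("ACGT"  =~ /(.)\1/) ? 1 : 0;

print "two sites: $two_sites; doubled char: $doubled; pair in ACGT: $no_pair\n";
```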
slide35

precedence

  • which parts of the pattern stick together
  • more tightly
  • ()
  • *+?{}
  • ^ $ \b \B and sequence
  • |
  • atoms: chars, classes, backreferences
  • examples
  • /^fred|barney$/
  • /^(\w+)\s+(\w+)$/
slide36

Review of Perl (1)

  • More on I/O
  • Regular Expression basics
  • More on regular expression
  • Using regular expressions
  • File and directory handles
slide37

m//

  • a more general pattern match operator
  • can use any pairs of delimiters
  • //
  • m,, m!! m^^ m##
  • m<> m{} m[] m()
  • example:
  • m%^http://% is better than /^http:\/\//
slide38

option modifiers

  • /i case insensitive
  • matches both cases for all letters
  • /\byes\b/i # Yes yes YES
  • /s makes . match any character
  • including "\n", which . normally skips
  • . then behaves like [\d\D]
  • $_ = “ACGTTTGCG\nAACACGT”;
  • /^(ACG).*(CGT)$/s
  • do not confuse with the \s shortcut
slide39

combining option modifiers

  • /si # both /s and /i
  • $_ = “aCGTTTGCG\nAACAcGT”;
  • if(/^(ACG).*(CGT)$/si){
  • print "That sequence begins with ACG ",
  • "and ends with CGT.\n";
  • }
  • other options
slide40

the binding operator =~

  • if (/\w+/i){……} # only works on $_
  • if ($seq =~ /^(ACG).*(CGT)$/si){
  • print "That sequence begins with ACG ",
  • "and ends with CGT.\n";
  • }
  • $prot_seq = <STDIN> =~ /[^ACGT]/i;
  • if ($prot_seq) {blastp;}
slide41

interpolating into patterns

  • my $p = “arg”;
  • if ($seq =~ /($p)$/si){
  • print “That sequence ends with $p.\n”
  • }
  • $profile = shift @ARGV; # get commandline args
  • if ($prot_seq =~ /$profile/si) {
  • print "$prot_seq has motif $profile\n";
  • }
slide42

the match variables

  • /(A)\1/ # use \1 inside pattern
  • $1 # hold memory value in Perl code
  • if ($seq =~ /(g.)\1/si){
  • print “That sequence has a $1 repeat.\n”
  • }
  • if ($prot_seq =~ /([stavli]{3,}).*([Deq]{3,})/si) {
  • print “$prot_seq has hydrophobic region $1 ”,
  • “followed by hydrophilic region $2.\n”;
  • }
slide43

the persistence of match

  • next successful match will overwrite the earlier one
  • store your $1 away!
  • if ($prot_seq =~ /([cstv]+)/si) {
  • my $motif = $1;
  • }
  • test your match before using $1
  • it could be a leftover
  • $prot_seq =~ /([cstv]+)/si;
  • print “I found the motif $1, correct?\n”
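
One way to avoid a leftover `$1` is to copy it only inside a successful match. This is a sketch; `find_motif` is a hypothetical helper name and the sequences are made up.

```perl
use strict;
use warnings;

# find_motif is a made-up helper, not from the slides.
sub find_motif {
    my ($seq) = @_;
    if ($seq =~ /([cstv]{2,})/i) {
        return $1;     # copy $1 out right away, before another match overwrites it
    }
    return;            # no match: return nothing rather than a stale $1
}

my $found  = find_motif("AAGCSTVKK");
my $missed = find_motif("AAGGKK");

print "found: $found\n";
print "second sequence has no motif\n" unless defined $missed;
```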
slide44

automatic matched variables (PP121)

  • $& matched part of string
  • $` part before the match
  • $’ part after the match
  • $`$&$’ the whole string
  • ……
  • print “The matched string is the following,”,
  • “the part matched is in <>:\n”,
  • “$`<$&>$’\n”;
slide45

substitutions with s///

  • m// search
  • s/// search and replace
  • s/match_pattern/replacement_string/
  • returns true if successful, false if not
  • replacement string:
  • $1
  • empty
  • $&
  • words
  • whitespaces
  • ……
slide46

examples of s///

  • if (s/([a-z]{3})([cstyleu]{3})/$2/){
  • print "The mutant protein has a $1 deletion ",
  • "before $2.\n";
  • }
  • s/(arg)/$1$1/; # arg insertion
  • s/arg/cys/; # cys substitution
  • s/\s+//g; # get rid of all whitespace
  • s/\s+/ /g; # single space delimiters
  • s/[^acgt]//gi; # clean up DNA sequence
  • s/[tT]/U/g; # translate to RNA sequence
  • s/_END_.*//s; # chop off after END mark
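
A few of the substitutions above, applied to made-up input and written with `=~` so the originals stay untouched:

```perl
use strict;
use warnings;

my $raw = "acg tNN\nGGt a-c";          # made-up, messy input

(my $dna = $raw) =~ s/[^acgt]//gi;     # keep only the four bases
(my $rna = $dna) =~ s/[tT]/U/g;        # translate to RNA

my $text  = "met   ala \t gly";
my $count = ($text =~ s/\s+/ /g);      # with /g, s/// returns the number of replacements

print "$dna\n$rna\n$text ($count runs of whitespace squeezed)\n";
```

The `(my $copy = $original) =~ s/.../.../` idiom copies first and then edits the copy.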
slide47

s/// different delimiters

  • just like:
  • m//
  • qw//
  • can use unpaired or paired delimiters
  • ,, “” {} [] %% ##
  • s#^https://#http://#
  • s{T}{U}
  • s<T>#U#
slide48

binding operator for s///

  • works for non-default variables
  • $dna_seq =~ s/[^acgt]/n/gis;
slide49

case shifting

  • $dna =~ s/(.+)/\U$1/;
  • $prot =~ s/(.+)/\L$1/;
  • $prot =~ s/(\w+)/\u\L$1/gi;
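
The case-shifting escapes in action, as a short sketch with made-up strings:

```perl
use strict;
use warnings;

my $dna = "acgt";
$dna =~ s/(.+)/\U$1/;            # \U upper-cases everything that follows

my $lower = "ARG CYS";
$lower =~ s/(.+)/\L$1/;          # \L lower-cases everything that follows

my $prot = "ARG cys LyS";
$prot =~ s/(\w+)/\u\L$1/g;       # \u\L: first letter upper case, the rest lower case

print "$dna\n$lower\n$prot\n";
```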
slide50

split

  • split /separator/, $string;
  • @aa = split / /, $aa;
  • split /separator/;
  • @aa = split /:/; # used $_ eg. “a:b:c:d”
  • split //;
  • @data = split //; # still $_, split each char
  • split; # like split /\s+/, $_; but leading whitespace is skipped
  • @data = split; # split $_ at whitespaces
slide51

join

  • join glue, list of pieces;
  • $full_name = join ‘ ‘, $first, $middle, $last;
  • $x = join ‘y’, @empty; # empty string
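
A sketch of `split` and `join` together, using explicit strings instead of `$_`. The inputs are made up.

```perl
use strict;
use warnings;

my @fields = split /:/, "a:b:c:d";             # four fields
my @chars  = split //, "ACGT";                 # one element per character
my @words  = split ' ', "  met  ala\tgly\n";   # what a bare split does to $_
my @raw    = split /\s+/, "  met ala";         # keeps a leading empty field

my $joined = join '-', @words;
my $empty  = join 'y', ();                     # joining nothing gives an empty string

print "@fields | @chars | $joined | ", scalar(@raw), " raw fields\n";
```

The `' '` separator is the special case that skips leading whitespace, which is why `@words` has no empty first element while `@raw` does.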
slide52

Review of Perl (1)

  • More on I/O
  • Regular Expression basics
  • More on regular expression
  • Using regular expressions
  • File and directory handles
slide53

File handle

  • a name for I/O connection, not file name
  • usually named with uppercase letters, '_' and digits
  • Perl’s special file handles
  • do not reuse these 6 names for your own handles
  • STDIN
  • STDOUT
  • %>myprog.pl <input >output
  • STDERR
  • another stream
  • DATA ARGV ARGVOUT
slide54

opening filehandles

  • STDIN, STDOUT, STDERR automatically opened
  • open SEQ, "e_coli_dna";
  • open INPUT, "<e_coli_dna";
  • open OUT1, ">intergene_seq";
  • open LOG, ">>genome__Update_log";
  • chomp(my $out_file = <STDIN>); # remove the trailing newline from the name
  • open OUT2, ">$out_file";
slide55

closing a filehandle

  • release memory
  • automatic closing on reopen or exit
  • close LOG;
slide56

return value of opening filehandle

  • open returns true or false
  • reasons fail to open:
  • permission, spelling, not created (for input)
  • consequence of fail:
  • EOF (undef), no data input
  • output discarded
  • turn on -w
  • die, warn
slide57

die when having fatal error

  • $b != 0 or die "Can't calculate: divisor is 0\n";
  • $length = $a/$b; # Perl itself dies on division by 0, and that does not set $!
  • open LOG, “>>log_file”
  • or die “Cannot create logfile: $!”;
  • die "Not enough arguments\n" if @ARGV < 2;
  • * $! is the system error message
slide58

warn when not fatal

  • just like die except not quitting the program
  • warn “Input sequence is too short.\n”
  • if $seq_len < 30;
slide59

using filehandles

  • while (<SEQ>) {
  • if (/^AUG/)
  • {
  • print INITIAL_SEQ $_;
  • print OUT1 (“>$accession\n$_\n”);
  • print LOG “The sequence is updated\n”;
  • }
  • }
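
The open, read, and close cycle can be sketched as below. Two assumptions differ from the slides: it uses lexical filehandles with three-argument `open` (a later, commonly recommended style) instead of bareword handles, and it writes its own scratch file through the core `File::Temp` module so the example does not depend on an existing file. The sequence lines are made up.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Create a scratch input file for the demonstration.
my ($tmp_fh, $path) = tempfile(UNLINK => 1);
print $tmp_fh "AUGGCC\nCCGAUG\nAUGUAA\n";
close $tmp_fh or die "Cannot close $path: $!";

# Open for reading and stop with the system error message on failure.
open my $seq, '<', $path or die "Cannot open $path: $!";

my @initial;
while (my $line = <$seq>) {
    chomp $line;
    push @initial, $line if $line =~ /^AUG/;   # keep lines that start with AUG
}
close $seq;

print "Sequences starting with AUG: @initial\n";
```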
slide60

file tests

  • warn “$filename is not updated.\n”
  • if -M INPUT > 14;
  • die “File named $filename already exists.\n”
  • if -e $filename;
  • if (-s $filename) {
  • print “File made successfully.\n”;
  • }
slide61

chdir

  • similar to UNIX cd
  • chdir “/fasta” or die “Can’t chdir to fasta: $!”;
  • chdir; # Perl finds your home, not using $_
slide62

glob and <something>

  • similar to UNIX ls, returns a list
  • my @all_files = glob “.* *”;
  • my @seq_files = glob “*.seq”;
  • my @dir_files = <$dir/.* $dir/*>;
  • my @files = <FASTA/*>;
  • my @lines = <FASTA>;
  • my @files = <$name/*>;
  • my @lines = readline FASTA;
slide63

directory handles

  • opendir FASTA, $dir or die “Can’t open: $!”;
  • @files = readdir FASTA;
  • closedir FASTA;
  • while ($name = readdir FASTA){
  • if ($name =~ /\.seq$/){
  • # do something here……
  • }
  • }
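
A self-contained sketch of reading a directory and keeping only the .seq files. It assumes a lexical directory handle instead of the bareword FASTA, and builds a throwaway directory with the core `File::Temp` module. The file names are made up.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# Build a scratch directory holding three empty files.
my $dir = tempdir(CLEANUP => 1);
for my $name (qw/a.seq b.seq notes.txt/) {
    open my $fh, '>', "$dir/$name" or die "Cannot create $dir/$name: $!";
    close $fh;
}

opendir my $dh, $dir or die "Can't open $dir: $!";
# readdir also returns "." and ".."; the pattern filters them out.
my @seq_files = sort grep { /\.seq$/ } readdir $dh;
closedir $dh;

print "Sequence files: @seq_files\n";
```

readdir returns bare names without the directory part, so prepend `$dir/` before opening any of them.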
slide64

unlink files

  • similar to UNIX rm
  • unlink “seq1”, “seq2”, “seq3”;
  • unlink glob “*.seq”;
  • rename files
  • similar to UNIX mv
  • rename “old”, “new”;
  • rename “/bin/somewhere/e_coli”, “e_coli”;
slide65

mkdir

  • similar to UNIX mkdir
  • mkdir “fasta”, 0755;
  • mkdir $name, oct($permission);
  • rmdir
  • similar to UNIX rmdir
  • rmdir $dir or warn “Can’t remove $dir: $!”;
  • rmdir $_ foreach glob "$dir/*"; # rmdir takes one (empty) directory at a time
slide66

change permissions with chmod

  • similar to UNIX chmod
  • chmod 0640, “seq1”, “seq2”, “seq3”;
  • change ownership with chown
  • similar to UNIX chown
  • chown $user, $group, glob "*.seq"; # $user and $group must be numeric IDs
slide67

Example 1: Expression

# Take in what a user types, turn .com web sites into .orgs, and change
# the "@" in their email address to something else
while (<STDIN>) {
    if (/^quit$/i) {   # Leave the program if the user types "quit"
        last;
    }
    else {
        # Replace .com in URLs with .org. Only do it
        # for the "first match" in the string
        s/(http:\/\/[\w\d\.]+)\.com/$1\.org/i;
        # Replace the @ in email addresses with the ^ symbol. Do it for
        # ALL occurrences in the string
        s/([\w\d]+)\@([\w\d\.]+)/$1\^$2/ig;
        # Print out the modified string
        print;
    }
}
