Perl (2)

Hongkang Mei, Ph.D., March 10, 2002



  • Review of Perl (1)

  • More on I/O

  • Regular Expression basics

  • More on regular expression

  • Using regular expressions

  • File and directory handles



  • Scalar data

    something single or just one

    number or string, interchangeable

    acted upon with operators

    (a scalar variable holds a single scalar value)

  • List data

    list of scalars

    (an array is a variable that contains a list)

    (a hash, or associative array, is a variable that contains a list of paired scalars, each key associated with a value)



  • Scalar variables

    ‘$’ followed by Perl identifier

  • Should be descriptive

  • Perl built-in scalar variables

  • $ARGV, $_, ”……

    * Perl identifier:

    letters, digits, and ‘_’; must not begin with a digit



  • Numeric operators

  • ++ incrementing the value

  • $counter++;

  • $v = $counter++;

  • is different from

  • $v = ++$counter;

  • -- decrementing the value
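The difference between the postfix and prefix forms can be checked with a minimal sketch like the one below. The variable names $post and $pre are made up for this illustration and do not appear on the slide.

```perl
use strict;
use warnings;

my $counter = 5;
my $post = $counter++;   # $post receives the old value (5), then $counter becomes 6
my $pre  = ++$counter;   # $counter becomes 7 first, then $pre receives 7

print "post=$post pre=$pre counter=$counter\n";
```

The same rule applies to -- in both positions.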



  • List data

    list literals

    scalars separated by ‘,’ in ()

    (1, 2, 3, 4, 5)

    ("dnaA", "argC", "rnpA")

    qw/ dnaA argC rnpA /

    range operator ..

    (1..5)       # (1, 2, 3, 4, 5)

    (1.2..5.7)   # same: both ends are truncated to integers

    (5..1)       # empty: the range only counts upward

    ($a..$b)     # depends on the current values

    * The qw shortcut

    each word is treated like a single-quoted ‘’ string (no interpolation)

    accepts any pair of punctuation delimiters

    / /, "", {}, [], (), <>, ##, !!
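As a rough illustration of the list literals above, the following sketch builds each kind of list and prints it. The array names are invented for the example.

```perl
use strict;
use warnings;

my @genes = qw/dnaA argC rnpA/;   # same list as ("dnaA", "argC", "rnpA")
my @up    = (1..5);               # (1, 2, 3, 4, 5)
my @trunc = (1.2..5.7);           # fractional ends are truncated, so also 1 to 5
my @empty = (5..1);               # the range operator never counts down

print "genes: @genes\n";
print "up:    @up\n";
print "trunc: @trunc\n";
print "empty has ", scalar(@empty), " elements\n";
```

To count down, one common idiom is reverse(1..5).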



  • Array variables

  • @ + identifier

  • no unnecessary limit

  • Array elements: scalar variables

  • [Figure: an array drawn as a row of scalar variables, with indices 0 to 4]



  • Accessing array elements

  • each element is a scalar, so it is written with a $:

  • $seq[0]

  • print $seq[3];

  • $seq[1] = 'acg';

  • $seq (a plain scalar) is a different thing!

  • @ and $ have different namespaces



  • Hashes

  • A hash is a variable containing a list of

  • paired scalar values (keys and values) associated with each other

  • % + identifier

  • no unnecessary limit

  • [Figure: a hash drawn as keys paired with values, each one a scalar variable]



  • Hash element access

  • $hash{$key}

  • $seq{"dnaA"} = "CAGACTCGAT";

  • foreach $gene (qw/dnaA argC rnt/) {

  •     print "The sequence for $gene is $seq{$gene}.\n";

  • }

  • $key can be any expression

  • $seq{"unknown"}   # undef
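Because a missing key yields undef, it can help to test for the key before using the value. The sketch below assumes a small %seq hash with made-up sequences and uses the built-in exists; the @report array is only there to collect the messages.

```perl
use strict;
use warnings;

# Hypothetical data: two genes with invented sequences.
my %seq = (dnaA => "CAGACTCGAT", argC => "TTGACCA");

my @report;
foreach my $gene (qw/dnaA argC rnt/) {
    if (exists $seq{$gene}) {
        push @report, "The sequence for $gene is $seq{$gene}.";
    } else {
        # $seq{rnt} would be undef, so report it instead of interpolating it
        push @report, "No sequence stored for $gene.";
    }
}
print "$_\n" foreach @report;
```

Testing with exists does not add the missing key to the hash.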



  • Interpolation of variables into strings

  • Scalar:

  • print $aa_seq;

  • print "The sequence is $aa_seq.\n";

  • print "The file contains $count ${type}s.\n";   # braces delimit the variable name

  • Array:

  • print "The list contains @array\n";   # elements separated by spaces

  • print @array;                         # elements printed with no separator

  • print 3 * @array;                     # scalar context: 3 times the number of elements

  • Hash:

  • print "The AC# for 'dnaA' is $g_ac{'dnaA'}\n";

  • NO interpolation for the whole hash!!

  • printf "The %s has %d AAs.\n", $prot, $len;



  • SCALAR and LIST CONTEXT

  • Using the same variable in different context

  • means different things

  • depending on what Perl is expecting

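  • 5 + @aa;          # scalar context: @aa gives its number of elements

  • sort @aa;         # list context

  • @list = @aa;      # list context: copies all elements

  • @list = $aa;      # a one-element list

  • $aa[0] = @list;   # scalar context: stores the number of elements

  • print "The full aa list is @aa.\n";

  • print "The number of aa is " . @aa . ".\n";

  • print @aa;        # list context: prints every element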
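The effect of context can be seen by using one array in several places. This is a small sketch with an invented @aa list; the result variable names are assumptions made for the illustration.

```perl
use strict;
use warnings;

my @aa = qw/ala arg asn asp/;

my $count   = @aa;          # scalar context: the number of elements (4)
my ($first) = @aa;          # list context: the first element ("ala")
my $sum     = 5 + @aa;      # scalar context again: 5 + 4
my $joined  = "@aa";        # interpolation: elements joined with spaces
my $concat  = "n=" . @aa;   # concatenation forces scalar context: "n=4"

print "$count $first $sum\n$joined\n$concat\n";
```

The parentheses around $first are what make that assignment a list assignment.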



  • Control structures

  • if (condition) {...} elsif (condition) {...} else {...}

  • while (condition) {...}

  • foreach $line (list) {...}

  • for ($i = 1; $i < 11; $i++) {...}

  • unless (condition) {...}   # same as if (!condition) {...}

  • until (condition) {...}    # same as while (!condition) {...}



  • Control structures

  • autoincrement and autodecrement

  • $n++; $n--;

  • ++$n; --$n;

  • $m = $n++;       # $m gets the old value of $n

  • $m = ++$n;       # $m gets the new value of $n

  • $m = $n; $n++;   # same as $m = $n++;

  • logical operators

  • &&, ||, !

  • and, not, or



  • Control structures

  • expression modifier

  • print "Acidic\n" if $pH < 7;

  • print " ", ($n += 2) while $n < 10;

  • print "$aa{$_}\t" foreach (keys %codon);

  • short-circuit operator

  • my $n_aa = $aa{$codon} || "not in the list";

  • the ternary operator ?:

  • $aa = ($pI{$aa} < 7)  ? "acidic"  :

  •       ($pI{$aa} == 7) ? "neutral" :

  •                         "basic";
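A chained ternary needs a final else branch, and a numeric comparison needs == rather than =. The following sketch wraps the chain in a hypothetical classify subroutine and shows the || default next to it; the codon table is a one-entry stand-in.

```perl
use strict;
use warnings;

# classify() is a made-up helper: it maps a pI value to a label.
sub classify {
    my ($pI) = @_;
    return $pI < 7  ? "acidic"
         : $pI == 7 ? "neutral"
         :            "basic";
}

# Short-circuit default: || returns the right side when the left is false or undef.
my %aa   = (AUG => "Met");
my $n_aa = $aa{UUU} || "not in the list";

print classify(2.77), " ", classify(7), " ", classify(9.74), "\n";
print "$n_aa\n";
```

Note that || also replaces legitimate false values such as 0 or the empty string.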



  • Subroutines

  • functions or subroutines

  • define:

  • sub my_funct {

  •     $dna_length = 3 * length($aa_seq);

  •     print "DNA is $dna_length basepairs\n";

  • }

  • Invoke:

  • &my_funct;



  • Built-in functions

  • print

  • chomp

  • defined

  • chop

  • reverse, sort

  • pop, push, shift, unshift

  • return

  • length

  • scalar   # a pseudo-function: it only forces scalar context

  • ……

  • perlfunc manpage


  • More on I/O



  • <STDIN>: get user input

  • from the terminal:

  • chomp ($a = <STDIN>); print $a;

  • # each read returns one line, up to and including the newline

  • file redirection:

  • %>myprog.pl < my_input.txt

  • ……

  • $line_n = 1;

  • while (<STDIN>) {

  •     print "$line_n\t$_";

  •     $line_n++;

  • }



  • <>: read input from the files named on the command line

  • %>myprog.pl input1 - input2     # "-" stands for standard input

  • ……

  • $line_n = 1;

  • while (<>) {

  •     print "$line_n\t$_";

  •     $line_n++;

  • }

  • The difference between <> and <STDIN>

  • <> reads the files listed in @ARGV (or STDIN if @ARGV is empty)



  • more on print

  • buffer

  • print <>;   # <> in list context returns all lines

  • # works like cat on the command line

  • print () function

  • print (3+4)*5;   # prints 7: parsed as (print(3+4)) * 5

  • print "The result is: ", (3+4)*5;   # prints 35 after the text



  • printf

  • printf "The mutation is at %s position.\n",

  •     $count_mut;

  • %s, %f, %d, %g……

  • %2d

  • %-12s (left justified)

  • %12.3f (right justified)

  • %: does not interpolate whole hash

  • %% to print ‘%’
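The format codes above can be tried out with sprintf, which builds the same string that printf would print. This is a sketch only; the protein name, length, and number are example values chosen for the illustration.

```perl
use strict;
use warnings;

my $line  = sprintf "The %s has %d AAs.", "insulin", 51;   # example values
my $left  = sprintf "[%-12s]", "dnaA";      # left justified in a 12-character field
my $right = sprintf "[%12.3f]", 3.14159;    # right justified, 3 decimal places
my $pct   = sprintf "%d%%", 95;             # %% produces a literal percent sign

print "$line\n$left\n$right\n$pct\n";
```

The square brackets are only there to make the field width visible.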


  • Regular Expression basics



  • regular expression or pattern

  • mini-program

  • either matches or doesn’t match a given string

  • one pattern can match any number of strings

  • it doesn’t matter how many times

  • it matches within a string

  • works like grep

  • $_ = "ADCSFTSCGNYEQ";   # a bare /.../ is tested against $_

  • if (/SFT/) {

  •     print "It has the motif \"SFT\".\n";

  • }



  • metacharacters

  • .    matches any single character except "\n"

  • \    escapes the next character (/3\.14/)

  • ()   grouping



  • simple quantifiers

  • the following quantifiers repeat the preceding item

  • *   0 or more times

  • +   1 or more times

  • ?   0 or 1 time



  • the ‘|’ alternative pattern

  • /T|S/                   # T or S

  • /protein(and|or)DNA/    # proteinandDNA or proteinorDNA

  • /arg(ser|cys)lys/       # argserlys or argcyslys


  • More on regular expression



  • character classes

  • []   matches any single character listed inside

  • [AGCT]         # any deoxynucleotide

  • [a-zA-Z0-9]+   # 1 or more letters or digits

  • [;\-,]         # ‘-’ needs to be escaped (or placed first or last)



  • character class shortcuts

  • \d   [0-9]

  • \w   [A-Za-z0-9_]   # a single word character; \w+ matches a word

  • \s   [\f\t\n\r ]    # whitespace

  • negating the shortcuts

  • \D   [^\d]

  • \W   [^\w]

  • \S   [^\s]

  • can be part of a larger class

  • [\dA-F]

  • [\d\D] (any char)

  • [^\d\D] (nothing)



  • general quantifiers

    • *   0 or more repetitions

    • +   1 or more

    • ?   0 or 1

  • {3,5}   3 to 5 (no space inside the braces)

  • {3,}    3 or more

  • {3}     exactly 3 repetitions

  • /U{5,8}/

  • /\w{8}/

  • /A{15,100}/

  • /(arg){2,}/

  • * is the same as {0,}

  • how about + and ?



  • anchors

  • ^   marks the beginning of the string

  • /^ATG/    # initiation codon

  • [^AGCT]   # ? (inside a class, ^ negates instead: any character except A, G, C, T)

  • $   marks the end

  • /(UA[AG]|UGA)$/   # stop codons

  • /^\s*$/           # a blank line



  • word anchors

  • \b   word boundary anchor

  • matches at either end of a word

  • /\barg/     # arg, arginine, arginyl, argue……

  • /\barg\b/   # arg

  • \B   nonword boundary anchor

  • matches at any point where \b would not

  • /\barg\B/   # arginine, arginyl, argue……
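Character classes, quantifiers, and anchors are easiest to check against a few test strings. The helper subroutines below are hypothetical names invented for this sketch; each wraps one pattern of the kind shown on the preceding slides.

```perl
use strict;
use warnings;

# True if the string consists only of the four DNA letters (uppercase).
sub is_dna     { return $_[0] =~ /^[ACGT]+$/ ? 1 : 0 }
# True if the string contains at least 15 A's in a row.
sub has_poly_a { return $_[0] =~ /A{15,100}/ ? 1 : 0 }
# True if "arg" starts a longer word (arginine) rather than standing alone.
sub arg_prefix { return $_[0] =~ /\barg\B/ ? 1 : 0 }
# True if the line is empty or holds only whitespace.
sub is_blank   { return $_[0] =~ /^\s*$/ ? 1 : 0 }

foreach my $s ("ACGT", "ACGU") {
    print "$s: is_dna=", is_dna($s), "\n";
}
print "arginine: ", arg_prefix("arginine"), "  arg: ", arg_prefix("arg"), "\n";
```

Without the ^ and $ anchors, is_dna would accept any string that merely contains one DNA letter.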



  • memory ()

  • ()   grouping

  • matched part kept in memory

  • /A(ACGT)T/   # ACGT in memory

  • backreferences

  • \1 \2

  • /(GAATTC).*\1/   # can EcoRI cut the insert out? (needs two GAATTC sites)

  • /(.)\1/ is NOT /../   # two identical chars vs. any two chars

  • memory variables

  • $1
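A short sketch of memory and backreferences follows. It assumes the EcoRI recognition site GAATTC and an invented insert sequence; a match in list context returns the captured groups directly.

```perl
use strict;
use warnings;

# Made-up sequence containing two GAATTC sites.
my $insert  = "TTGAATTCAAACCCGGGTTTGAATTCAA";
my $flanked = ($insert =~ /(GAATTC).*\1/) ? 1 : 0;   # 1 if the site occurs twice

# (.)\1 finds the first character that is immediately repeated.
my ($pair) = "ACGGTA" =~ /(.)\1/;

# In list context a match returns its captures ($1, $2, ...).
my ($first_word, $second_word) = "dnaA argC" =~ /^(\w+)\s+(\w+)$/;

print "flanked=$flanked pair=$pair words=$first_word,$second_word\n";
```

Inside the pattern the memory is written \1; after the match, in ordinary Perl code, it is $1.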



  • precedence

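  • which parts of the pattern stick together

  • more tightly, from highest to lowest:

  • ()           parentheses

  • * + ? {}     quantifiers

  • ^ $ \b \B    anchors and sequence

  • |            alternation

  • atoms: chars, classes, backreferences

  • examples

  • /^fred|barney$/      # fred at the start OR barney at the end

  • /^(\w+)\s+(\w+)$/    # two words separated by whitespace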


  • Using regular expressions



  • m//

  • a more general pattern match operator

  • can use any pairs of delimiters

  • //

  • m,, m!! m^^ m##

  • m<> m{} m[] m()

  • example:

  • m%^http://% is better than /^http:\/\//



  • option modifiers

  • /i   case insensitive

  • matches both cases for all letters

  • /\byes\b/i   # Yes yes YES

  • /s   lets . match any character

  • including "\n", which . alone does not match

  • . then behaves like [\d\D]

  • $_ = "ACGTTTGCG\nAACACGT";

  • /^(ACG).*(CGT)$/s

  • do not confuse with the \s shortcut



  • combining option modifiers

  • /si   # both /s and /i

  • $_ = "aCGTTTGCG\nAACAcGT";

  • if (/^(ACG).*(CGT)$/si) {

  •     print "That sequence begins with ACG ",

  •           "and ends with CGT.\n";

  • }

  • other options



  • the binding operator =~

  • if (/\w+/i){……} # only works on $_

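  • if ($seq =~ /^(ACG).*(CGT)$/si) {

  •     print "That sequence begins with ACG ",

  •           "and ends with CGT.\n";

  • }

  • $prot_seq = <STDIN> =~ /[^ACGT\s]/i;   # true if the line has a non-ACGT letter (\s skips the newline)

  • if ($prot_seq) { blastp(); }           # blastp() stands for your own subroutine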



  • interpolating into patterns

  • my $p = "arg";

  • if ($seq =~ /($p)$/si) {

  •     print "That sequence ends with $p.\n";

  • }

  • $profile = shift @ARGV;   # get the pattern from the command-line arguments

  • if ($prot_seq =~ /$profile/si) {

  •     print "$prot_seq has motif $profile\n";

  • }



  • the match variables

  • /(A)\1/   # use \1 inside the pattern

  • $1        # holds the memory value in Perl code

  • if ($seq =~ /(g.)\1/si) {

  •     print "That sequence has a $1 repeat.\n";

  • }

  • if ($prot_seq =~ /([stavli]{3,}).*([Deq]{3,})/si) {

  •     print "$prot_seq has hydrophobic region $1 ",

  •           "followed by hydrophilic region $2.\n";

  • }



  • the persistence of match

  • next successful match will overwrite the earlier one

  • store your $1 away!

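  • my $motif;

  • if ($prot_seq =~ /([cstv]+)/si) {

  •     $motif = $1;   # saved in a variable that outlives the if block

  • }

  • test your match before using $1

  • it could be a leftover

  • $prot_seq =~ /([cstv]+)/si;                  # risky: the result is not tested

  • print "I found the motif $1, correct?\n";    # $1 may come from an earlier match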
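One way to follow both pieces of advice is to wrap the match in a subroutine that returns $1 only when the match succeeded. The name find_motif is invented for this sketch.

```perl
use strict;
use warnings;

# Returns the first run of C, S, T, or V (any case), or nothing if there is none.
sub find_motif {
    my ($prot_seq) = @_;
    if ($prot_seq =~ /([cstv]+)/i) {
        return $1;   # safe: this $1 belongs to the match that just succeeded
    }
    return;          # no match: $1 is never consulted
}

my $found   = find_motif("AAGCSTVKK");
my $missing = find_motif("AAGG");

print "found: $found\n";
print "nothing found in AAGG\n" unless defined $missing;
```

Copying $1 into a return value right away also protects it from being overwritten by the next successful match.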



  • automatic matched variables (PP121)

  • $&   matched part of the string

  • $`   part before the match

  • $'   part after the match

  • $`$&$'   the whole string

  • ……

  • print "The matched string is the following, ",

  •       "the part matched is in <>:\n",

  •       "$`<$&>$'\n";



  • substitutions with s///

  • m//    search

  • s///   search and replace

  • s/match_pattern/replacement_string/

  • returns true if successful, false if not

  • replacement string:

  • $1

  • empty

  • $&

  • words

  • whitespaces

  • ……



  • examples of s///

  • if (s/([a-z]{3})([cstyleu]{3})/$2/) {

  •     print "The mutant protein has a $1 deletion ",

  •           "before $2.\n";

  • }

  • s/(arg)/$1$1/;    # arg insertion

  • s/arg/cys/;       # cys substitution

  • s/\s+//g;         # get rid of all whitespace

  • s/\s+/ /g;        # single-space delimiters

  • s/[^acgt]//gi;    # clean up a DNA sequence

  • s/[tT]/U/g;       # convert to an RNA sequence (T becomes U)

  • s/_END_.*//s;     # chop off everything after the END mark
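Several of these substitutions can be chained to clean up a sequence. The sketch below uses an invented messy input string; with /g, s/// returns the number of replacements it made, which is captured in $changed.

```perl
use strict;
use warnings;

my $raw = "acg t\nTTG nca 123 GTA";   # made-up input with spaces, digits, and an n

(my $dna = $raw) =~ s/[^acgt]//gi;    # copy, then delete everything that is not a, c, g, t

my $rna     = $dna;
my $changed = ($rna =~ s/[tT]/U/g);   # number of T's replaced by U

(my $upper = $rna) =~ s/(.+)/\U$1/;   # \U uppercases the captured text

print "$dna\n$rna ($changed changed)\n$upper\n";
```

The (my $copy = $original) =~ s/// form leaves the original string untouched.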



  • s/// different delimiters

  • just like:

  • m//

  • qw//

  • can use unpaired or paired delimiters

  • ,, "" {} [] %% ##

  • s#^https://#http://#

  • s{T}{U}

  • s<T>#U#   # a paired delimiter followed by an unpaired one



  • binding operator for s///

  • works for non-default variables

  • $dna_seq =~ s/[^acgt]/n/gis;



  • case shifting

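  • $dna =~ s/(.+)/\U$1/;          # \U: uppercase what follows

  • $prot =~ s/(.+)/\L$1/;         # \L: lowercase what follows

  • $prot =~ s/(\w+)/\u\L$1/gi;    # \u\L: capitalize each word, lowercase the rest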



  • split

  • split /separator/, $string;

  • @aa = split / /, $aa;

  • split /separator/;

  • @aa = split /:/;     # uses $_, e.g. "a:b:c:d"

  • split //;

  • @data = split //;    # still $_, splits into single characters

  • split;               # like split /\s+/, $_ but leading whitespace is skipped

  • @data = split;       # splits $_ at whitespace



  • join

  • join glue, list of pieces;

  • $full_name = join ' ', $first, $middle, $last;

  • $x = join 'y', @empty;   # empty string
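split and join are inverse operations, which the following sketch illustrates with invented records. The pattern ' ' (a single space in quotes) is the special form that behaves like the bare split: it splits at runs of whitespace and skips leading whitespace.

```perl
use strict;
use warnings;

my @fields = split /:/, "dnaA:argC::rnpA";       # the empty middle field is kept
my @chars  = split //, "ACGT";                   # one element per character
my @words  = split ' ', "  met  ala\tgly\n";     # whitespace mode, as with a bare split

my $glued = join "-", @words;
my $none  = join "y", ();                        # joining nothing gives ""

print scalar(@fields), " fields; chars: @chars; glued: $glued\n";
```

Trailing empty fields, unlike the one in the middle, are dropped by split by default.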


  • File and directory handles



  • File handle

  • a name for an I/O connection, not a file name

  • usually named with uppercase letters, _ and digits

  • Perl’s special file handles

  • do not reuse these 6 names for your own handles

  • STDIN

  • STDOUT

  • %>myprog.pl <input >output

  • STDERR

  • another stream

  • DATA ARGV ARGVOUT



  • opening filehandles

  • STDIN, STDOUT, STDERR automatically opened

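  • open SEQ, "e_coli_dna";              # read (the default)

  • open INPUT, "<e_coli_dna";           # read

  • open OUT1, ">intergene_seq";         # write, overwriting

  • open LOG, ">>genome__Update_log";    # append

  • chomp(my $out_file = <STDIN>);       # remove the newline from the file name

  • open OUT2, ">$out_file";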



  • closing a filehandle

  • release memory

  • automatic closing on reopen or exit

  • close LOG;



  • return value of opening filehandle

  • open returns true or false

  • reasons open can fail:

  • permission, spelling, not created (for input)

  • consequences of a failed open:

  • reading returns EOF (undef), no data input

  • output discarded

  • turn on -w

  • die, warn



  • die when having a fatal error

  • die "Can't calculate: divisor is 0\n" if $b == 0;

  • $length = $a/$b;   # dividing by 0 would otherwise abort with Perl's own message

  • open LOG, ">>log_file"

  •     or die "Cannot create logfile: $!";

  • die "Not enough arguments\n" if @ARGV < 2;

  • * $! is the system error message
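The open-or-die pattern can be exercised end to end with a scratch file. This sketch departs from the slides in two labeled ways: it uses the three-argument form of open with lexical filehandles (a later idiom than the bareword handles shown above), and it uses the core File::Temp module so that no real log file is touched.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Assumption: a temporary file stands in for "log_file".
my ($tmp_fh, $log_file) = tempfile();
close $tmp_fh;

# Append one line, dying with the system error message ($!) on failure.
open my $log, '>>', $log_file or die "Cannot open $log_file for appending: $!";
print $log "The sequence is updated\n";
close $log or die "Cannot close $log_file: $!";

# Read the file back to confirm what was written.
open my $in, '<', $log_file or die "Cannot read $log_file: $!";
my @lines = <$in>;
close $in;
unlink $log_file;

print "log holds ", scalar(@lines), " line(s)\n";
```

Using or rather than || matters here: || would bind to the file name instead of to the result of open.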



  • warn when not fatal

  • just like die, except the program keeps running

  • warn "Input sequence is too short.\n"

  •     if $seq_len < 30;



  • using filehandles

  • while (<SEQ>) {

  •     if (/^AUG/) {

  •         print INITIAL_SEQ $_;

  •         print OUT1 (">$accession\n$_\n");

  •         print LOG "The sequence is updated\n";

  •     }

  • }



  • file tests

  • warn "$filename is not updated.\n"

  •     if -M INPUT > 14;    # -M: days since last modification

  • die "File named $filename already exists.\n"

  •     if -e $filename;     # -e: the file exists

  • if (-s $filename) {      # -s: the file has nonzero size

  •     print "File made successfully.\n";

  • }



  • chdir

  • similar to UNIX cd

  • chdir "/fasta" or die "Can't chdir to fasta: $!";

  • chdir;   # with no argument Perl goes to your home directory; it does not use $_



  • glob and <something>

  • similar to UNIX ls, returns a list

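  • my @all_files = glob ".* *";

  • my @seq_files = glob "*.seq";

  • my @dir_files = <$dir/.* $dir/*>;

  • my @files = <FASTA/*>;         # a glob: file names in directory FASTA

  • my @lines = <FASTA>;           # a read: lines from filehandle FASTA

  • my @files = <$name/*>;         # a glob

  • my @lines = readline FASTA;    # an explicit read from filehandle FASTA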



  • directory handles

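  • opendir FASTA, $dir or die "Can't open $dir: $!";

  • @files = readdir FASTA;   # all names at once, including . and ..

  • closedir FASTA;

  • or read one name at a time (before closedir):

  • while ($name = readdir FASTA) {

  •     if ($name =~ /\.seq$/) {

  •         # do something here……

  •     }

  • }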
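Reading a directory and filtering the names can be tried safely in a scratch directory. The sketch below is an illustration under stated assumptions: it uses the core File::Temp module to create a temporary directory, a lexical directory handle instead of the bareword FASTA, and three invented file names.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# Assumption: a temporary directory stands in for the FASTA directory.
my $dir = tempdir(CLEANUP => 1);
foreach my $name (qw/dnaA.seq argC.seq notes.txt/) {
    open my $fh, '>', "$dir/$name" or die "Cannot create $dir/$name: $!";
    close $fh;
}

opendir my $dh, $dir or die "Can't open $dir: $!";
my @seq_files = sort grep { /\.seq$/ } readdir $dh;   # keep only names ending in .seq
closedir $dh;

print "sequence files: @seq_files\n";
```

readdir returns bare names without the directory part, so a path such as "$dir/$name" has to be rebuilt before opening a file.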



  • unlink files

  • similar to UNIX rm

  • unlink "seq1", "seq2", "seq3";

  • unlink glob "*.seq";

  • rename files

  • similar to UNIX mv

  • rename "old", "new";

  • rename "/bin/somewhere/e_coli", "e_coli";



  • mkdir

  • similar to UNIX mkdir

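  • mkdir "fasta", 0755;

  • mkdir $name, oct($permission);   # oct() turns a string such as "0755" into a number

  • rmdir

  • similar to UNIX rmdir

  • rmdir $dir or warn "Can't remove $dir: $!";

  • rmdir $_ foreach glob "$dir/*";   # rmdir removes one empty directory per call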



  • change permissions with chmod

  • similar to UNIX chmod

  • chmod 0640, "seq1", "seq2", "seq3";

  • change ownership with chown

  • similar to UNIX chown

  • chown $user, $group, glob "*.seq";   # $user and $group must be numeric IDs



Example 1: Expression

# Take in what a user types, turn .com web sites into .orgs, and change
# the "@" in their email address to something else
while (<STDIN>) {
    if (/^quit$/i) {    # Leave the program if the user types "quit"
        last;
    }
    else {
        # Replace .com in URLs with .org. Only do it
        # for the first match in the string
        s/(http:\/\/[\w\d\.]+)\.com/$1\.org/i;

        # Replace the @ in email addresses with the ^ symbol. Do it for
        # ALL occurrences in the string
        s/([\w\d]+)\@([\w\d\.]+)/$1\^$2/ig;

        # Print out the modified string
        print;
    }
}

