Perl (2)

Hongkang Mei, Ph.D.

March 10, 2002



  • Review of Perl (1)

  • More on I/O

  • Regular Expression basics

  • More on regular expression

  • Using regular expressions

  • File and directory handles



  • Scalar data

    something single or just one

    number or string, interchangeable

    acted upon with operators

    (a scalar variable holds a single scalar value)

  • List data

    list of scalars

    (an array is a variable that contains a list)

    (a hash, or associative array, is a variable that contains a list of paired scalars, each key associated with a value)



  • Scalar variables

    '$' followed by a Perl identifier

  • Should be descriptive

  • Perl built-in scalar variables

  • $ARGV, $_, ”……

    * Perl identifier:

    letters, '_' and digits; must not begin with a digit



  • Numeric operators

  • ++ incrementing the value

  • $counter++;

  • $v = $counter++;

  • is different from

  • $v = ++$counter;

  • -- decrementing the value
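
The difference between the two assignments can be seen in a short sketch. The starting value 5 and the names $post and $pre are arbitrary choices for illustration:

```perl
use strict;
use warnings;

my $counter = 5;

# Postfix: the old value is assigned first, then $counter is incremented.
my $post = $counter++;    # $post is 5, $counter is now 6

# Prefix: $counter is incremented first, then the new value is assigned.
my $pre = ++$counter;     # $counter is now 7, $pre is 7

print "post=$post pre=$pre counter=$counter\n";
```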



  • List data

    list literals

    scalars separated by ',' in ()

    (1, 2, 3, 4, 5)

    ("dnaA", "argC", "rnpA")

    qw/ dnaA argC rnpA /

    range operator ..

    (1..5)        # (1, 2, 3, 4, 5)

    (1.2..5.7)    # same; both ends are truncated to integers

    (5..1)        # empty; the range only counts up

    ($a..$b)      # depends on the current values

    * The qw shortcut

    each word is treated like a single-quoted ('') string

    any punctuation character or bracket pair can be the delimiter

    //, "", {}, [], (), <>, ##, !!
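
As a minimal sketch of these list forms (the variable names are made up for illustration), the following builds the same kinds of lists and prints them:

```perl
use strict;
use warnings;

my @genes = qw/ dnaA argC rnpA /;   # same as ('dnaA', 'argC', 'rnpA')
my @up    = (1 .. 5);               # (1, 2, 3, 4, 5)
my @trunc = (1.2 .. 5.7);           # both ends truncated: (1, 2, 3, 4, 5)
my @empty = (5 .. 1);               # the range only counts up, so nothing here

print "genes: @genes\n";
print "up:    @up\n";
print "trunc: @trunc\n";
print "empty has ", scalar(@empty), " elements\n";
```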



  • Array variables

  • @ + identifier

  • no unnecessary limit

  • Array elements: scalar variables

  • [Figure: an array drawn as a row of scalar variables (the elements), with indices 0 to 4]



  • Accessing array elements

  • done by referring to an element as a scalar variable:

  • $seq[0]

  • print $seq[3];

  • $seq[1] = 'acg';

  • $seq is a different thing! (a separate scalar, unrelated to @seq)

  • @ and $ have different namespaces



  • Hashes

  • A hash is a variable containing a list of

  • paired scalar values (keys and values) associated with each other

  • % + identifier

  • no unnecessary limit

  • [Figure: a hash drawn as keys paired with values, each one a scalar variable]



  • Hash element access

  • $hash{$key}

  • $seq{"dnaA"} = "CAGACTCGAT";

  • foreach $gene (qw/dnaA argC rnt/) {

  • print "The sequence for $gene is $seq{$gene}.\n";

  • }

  • $key can be any expression

  • $seq{"unknown"} # undef (no such key)
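
A minimal sketch of hash access, assuming a small %seq hash with made-up sequences. The exists test is an addition, not part of the slide; it distinguishes a missing key from a stored value:

```perl
use strict;
use warnings;

# Made-up sequences, for illustration only.
my %seq = (
    dnaA => 'CAGACTCGAT',
    argC => 'ATGTTGAATA',
);

foreach my $gene (qw/dnaA argC rnt/) {
    if (exists $seq{$gene}) {
        print "The sequence for $gene is $seq{$gene}.\n";
    }
    else {
        print "No sequence stored for $gene.\n";
    }
}

# Reading a missing key returns undef and does not create the key.
my $missing = $seq{unknown};
```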



  • Interpolation of variables into strings

  • Scalar:

  • print $aa_seq;

  • print "The sequence is $aa_seq.\n";

  • print "The file contains $count ${type}s.\n";

  • Array:

  • print "The list contains @array\n";   # elements separated by spaces

  • print @array;                          # elements with nothing between them

  • print 3 * @array;                      # scalar context: 3 times the number of elements

  • Hash:

  • print "The AC# for 'dnaA' is $g_ac{'dnaA'}\n";

  • NO interpolation for the whole hash!!

  • printf "The %s has %d AAs.\n", $prot, $len;



  • SCALAR and LIST CONTEXT

  • Using the same variable in different context

  • means different things

  • depending on what Perl is expecting

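  • 5 + @aa;          # scalar context: @aa gives its number of elements

  • sort @aa;         # list context

  • @list = @aa;      # list context: copies the elements

  • @list = $aa;      # a one-element list

  • $aa[0] = @list;   # scalar context: stores the number of elements

  • print "The full aa list is @aa.\n";

  • print "The number of aa is " . @aa . ".\n";

  • print @aa;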
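
A short sketch of the same array used in both contexts. The array contents and the names $count, $first and $message are assumptions for illustration:

```perl
use strict;
use warnings;

my @aa = qw/ Ala Cys Asp Glu /;

my $count   = @aa;        # scalar context: the number of elements
my ($first) = @aa;        # list context: the first element
my $sum     = 5 + @aa;    # numeric operators impose scalar context
my @copy    = @aa;        # list context: copies every element

# String concatenation also imposes scalar context on @aa.
my $message = "The number of aa is " . @aa . ".";

print "$message\n";
print "The full aa list is @aa.\n";
```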



  • Control structures

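  • if (true) {...} elsif (true) {...} else {...}

  • while (true) {...}

  • foreach $line (list) {...}

  • for ($i = 1; $i < 11; $i++) {...}

  • unless (true) {...}   # runs when the condition is false, like if (!cond)

  • until (true) {...}    # loops while the condition is false, like while (!cond)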



  • Control structures

  • autoincrement and autodecrement

  • $n++; $n--;

  • ++$n; --$n;

  • $m = $n++;        # $m gets the old value of $n

  • $m = ++$n;        # $m gets the new value of $n

  • $m = $n; $n++;    # same as $m = $n++

  • logical operators

  • &&, ||, !

  • and, not, or



  • Control structures

  • expression modifier

  • print "Acidic\n" if $pH < 7;

  • print " ", ($n += 2) while $n < 10;

  • print "$codon{$_}\t" foreach (keys %codon);

  • short-circuit operator

  • my $n_aa = $aa{$codon} || "not in the list";

  • the ternary operator ?:

  • $aa = ($pI{$aa} < 7)  ? "acidic"  :

  • ($pI{$aa} == 7) ? "neutral" :

  • "basic";
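
A chained ternary needs == for the comparison (a single = would assign) and a final value after the last colon. The sketch below wraps such a chain in a hypothetical helper, classify_pI, and also shows the || default; the codon table is a two-entry stand-in:

```perl
use strict;
use warnings;

# Hypothetical helper: classify an isoelectric point with a ternary chain.
sub classify_pI {
    my ($pI) = @_;
    return $pI < 7  ? 'acidic'
         : $pI == 7 ? 'neutral'
         :            'basic';
}

# Short-circuit default: use the right-hand value when the lookup is false.
my %aa   = (AUG => 'Met', UGG => 'Trp');
my $n_aa = $aa{GGG} || 'not in the list';

print classify_pI(3.2), " ", classify_pI(7), " ", classify_pI(9.7), "\n";
print "GGG: $n_aa\n";
```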



  • Subroutines

  • functions or subroutines

  • define:

  • sub my_funct {

  • $dna_length = 3 * length($aa_seq);

  • print "DNA is $dna_length basepairs\n";

  • }

  • Invoke:

  • &my_funct;
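
The subroutine on this slide works on a global variable. As a variation (not what the slide shows), a subroutine can take its input through @_ and hand back a result with return. The name dna_length and the peptide string are made up for illustration:

```perl
use strict;
use warnings;

# Takes an amino-acid sequence, returns the length of the coding DNA.
sub dna_length {
    my ($aa_seq) = @_;              # arguments arrive in @_
    return 3 * length($aa_seq);     # three bases per amino acid
}

my $len = dna_length('MKTAYIAK');   # 8 residues
print "DNA is $len basepairs\n";
```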



  • Built-in functions

  • print

  • chomp

  • defined

  • chop

  • reverse, sort

  • pop, push, shift, unshift

  • return

  • length

  • scalar # a pseudo-function: forces scalar context

  • ……

  • see the perlfunc manpage



  • Review of Perl (1)

  • More on I/O

  • Regular Expression basics

  • More on regular expression

  • Using regular expressions

  • File and directory handles



  • <STDIN>: get user input

  • from the command line:

  • chomp ($a = <STDIN>); print $a;

  • # reads one line, up to and including the newline; chomp removes the newline

  • file redirection:

  • %> myprog.pl < my_input.txt

  • ……

  • $line_n = 1;

  • while (<STDIN>) {

  • print "$line_n\t$_";

  • $line_n++;

  • }



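  • <>: get input from the files named on the command line

  • %> myprog.pl input1 - input2

  • ……

  • $line_n = 1;

  • while (<>) {

  • print "$line_n\t$_";

  • $line_n++;

  • }

  • The difference between <> and <STDIN>

  • <> works from @ARGV: it reads each file named there in turn ('-' means standard input) and falls back to STDIN when @ARGV is empty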
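
To see that <> takes its file names from @ARGV, the sketch below sets @ARGV by hand instead of relying on the command line. The temporary file (created with the core module File::Temp) and its two lines are assumptions made only so the example is self-contained:

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Create a small input file to stand in for a file named on the command line.
my ($fh, $path) = tempfile(UNLINK => 1);
print {$fh} "ATGAAA\nCCCGGG\n";
close $fh or die "Cannot close $path: $!";

my @numbered;
{
    local @ARGV = ($path);     # <> reads the files listed in @ARGV
    my $line_n = 1;
    while (<>) {
        push @numbered, "$line_n\t$_";
        $line_n++;
    }
}

print @numbered;
```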



  • more on print

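  • output is buffered

  • print <>;   # <> in list context returns all the lines

  • # works like cat on the command line

  • print() used as a function

  • print (3+4)*5;   # prints 7: the parentheses are taken as print's argument list

  • print "The result is: ", (3+4)*5;   # prints 35 after the text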



  • printf

  • printf "The mutation is at %s position.\n",

  • $count_mut;

  • %s, %f, %d, %g……

  • %2d

  • %-12s (left justified)

  • %12.3f (right justified)

  • %: a whole hash is not interpolated in a string

  • %% to print '%'
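
The format codes can be tried out with sprintf, which takes the same format string as printf but returns the text instead of printing it. The values below are arbitrary, and the '|' characters are only there to make the field widths visible:

```perl
use strict;
use warnings;

# %-12s  left-justified string in 12 columns
# %12.3f right-justified number in 12 columns, 3 decimals
# %2d    integer in at least 2 columns
# %%     a literal percent sign
my $line = sprintf "%-12s|%12.3f|%2d|%d%%", 'dnaA', 3.14159, 7, 50;

print "$line\n";
```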



  • Review of Perl (1)

  • More on I/O

  • Regular Expression basics

  • More on regular expression

  • Using regular expressions

  • File and directory handles



  • regular expression or pattern

  • a mini-program

  • it either matches or doesn't match a given string

  • one pattern can match any number of different strings

  • it doesn't matter how many times

  • it matches within a string

  • works like grep

  • $_ = "ADCSFTSCGNYEQ";   # a bare /pattern/ is tested against $_

  • if (/SFT/) {

  • print "It has the motif \"SFT\".\n";

  • }



  • metacharacters

  • .    matches any single character except "\n"

  • \    escapes a metacharacter (/3\.14/)

  • ()   grouping



  • simple quantifiers

  • the following quantifiers repeat the preceding item

  • *   0 or more times

  • +   1 or more times

  • ?   0 or 1 time



  • the '|' alternative pattern

  • /T|S/

  • /protein(and|or)DNA/

  • /arg(ser|cys)lys/



  • Review of Perl (1)

  • More on I/O

  • Regular Expression basics

  • More on regular expression

  • Using regular expressions

  • File and directory handles



  • character classes

  • []   matches any single character listed inside

  • [AGCT]          # any deoxynucleotide

  • [a-zA-Z0-9]+    # 1 or more letters or digits

  • [;\-,]          # '-' must be escaped unless it comes first or last



  • character classes shortcuts

  • \d   [0-9]

  • \w   [A-Za-z0-9_]   # only one character; \w+ matches a word

  • \s   [\f\t\n\r ]    # whitespace

  • negating the shortcuts

  • \D   [^\d]

  • \W   [^\w]

  • \S   [^\s]

  • can be part of a larger class

  • [\dA-F]

  • [\d\D]    (any character)

  • [^\d\D]   (nothing)



  • general quantifiers

    • *   0 or more repetitions

    • +   1 or more

    • ?   0 or 1

  • {3,5}   3 to 5

  • {3,}    3 or more

  • {3}     exactly 3 repetitions

  • /U{5,8}/

  • /\w{8}/

  • /A{15,100}/

  • /(arg){2,}/

  • * is the same as {0,}

  • how about + and ? (they are {1,} and {0,1})
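
A quick way to check a quantifier is to run it against strings with a known number of repeats. The test strings below are invented for illustration:

```perl
use strict;
use warnings;

my $poly_u  = 'ACGUUUUUUACG';   # six U's in a row
my $short_u = 'ACGUUUACG';      # only three U's

my $has_run = $poly_u  =~ /U{5,8}/ ? 1 : 0;   # matches: 5 to 8 U's present
my $no_run  = $short_u =~ /U{5,8}/ ? 1 : 0;   # fails: too few U's

# In list context a match returns what the parentheses captured.
my ($run) = $poly_u =~ /(U{5,8})/;

# A quantifier after a group repeats the whole group.
my $arg_repeat = 'lysargargarglys' =~ /(arg){2,}/ ? 1 : 0;

print "run of U: $run (length ", length($run), ")\n";
```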



  • anchors

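  • ^   marks the beginning of the string

  • /^ATG/              # initiation codon

  • [^AGCT]             # different meaning: inside a class, ^ negates (any character except A, G, C, T)

  • $   marks the end of the string

  • /(UA[AG]|UGA)$/     # stop codons

  • /^\s*$/             # a blank line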



  • word anchors

  • \b   word boundary anchor

  • matches at either end of a word

  • /\barg/       # arg, arginine, arginyl, argue……

  • /\barg\b/     # arg

  • \B   nonword boundary anchor

  • matches at any point where \b would not

  • /\barg\B/     # arginine, arginyl, argue……
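
The three word-anchor patterns can be compared by filtering one word list with each of them. The word list is made up, and grep is a Perl built-in that the slides have not introduced; it keeps the elements for which the block is true:

```perl
use strict;
use warnings;

my @words = qw/ arg arginine argue darg /;

my @starts = grep { /\barg/ }   @words;   # words beginning with arg
my @whole  = grep { /\barg\b/ } @words;   # exactly the word arg
my @longer = grep { /\barg\B/ } @words;   # arg followed by more word characters

print "starts: @starts\n";
print "whole:  @whole\n";
print "longer: @longer\n";
```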



  • memory ()

  • ()   grouping

  • the matched part is kept in memory

  • /A(ACGT)T/          # ACGT in memory

  • backreferences

  • \1 \2

  • /(GAATTC).*\1/      # can EcoRI cut the insert out?

  • /(.)\1/ is NOT /../ # two identical characters vs. any two characters

  • memory variables

  • $1
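
A backreference matches the same text the group captured, not just the same pattern. The sketch below uses GAATTC, the EcoRI recognition site, in two invented sequences, and contrasts /(.)\1/ with /../:

```perl
use strict;
use warnings;

# Invented sequences: one with two EcoRI sites, one with a single site.
my $two_sites = 'ccGAATTCatgaaatagGAATTCgg';
my $one_site  = 'ccGAATTCatgaaatag';

my $can_excise    = $two_sites =~ /(GAATTC).*\1/ ? 1 : 0;
my $cannot_excise = $one_site  =~ /(GAATTC).*\1/ ? 1 : 0;

# /(.)\1/ needs two identical characters; /../ accepts any two.
my ($doubled) = 'ACGGTA' =~ /(.)\1/;   # the repeated character
my ($any_two) = 'ACGGTA' =~ /(..)/;    # the first two characters

print "doubled: $doubled, any two: $any_two\n";
```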



  • precedence

  • which parts of the pattern stick together

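  • more tightly (listed from tightest to loosest)

  • ()                     parentheses

  • * + ? {}               quantifiers

  • ^ $ \b \B, sequence    anchors and sequence

  • |                      alternation

  • atoms: characters, classes, backreferences

  • examples

  • /^fred|barney$/        # means ^fred OR barney$, not ^(fred|barney)$

  • /^(\w+)\s+(\w+)$/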
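
Because alternation binds most loosely, anchors on either side of a '|' apply to only one alternative unless parentheses group them. A small sketch with invented test strings:

```perl
use strict;
use warnings;

# Without parentheses: "starts with fred" OR "ends with barney".
my $loose = 'frederick' =~ /^fred|barney$/   ? 1 : 0;

# With parentheses: the whole string must be fred or barney.
my $tight = 'frederick' =~ /^(fred|barney)$/ ? 1 : 0;

# Two words separated by whitespace, each captured by its own group.
my ($first, $second) = 'dnaA argC' =~ /^(\w+)\s+(\w+)$/;

print "loose=$loose tight=$tight first=$first second=$second\n";
```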



  • Review of Perl (1)

  • More on I/O

  • Regular Expression basics

  • More on regular expression

  • Using regular expressions

  • File and directory handles



  • m//

  • a more general pattern match operator

  • can use any pairs of delimiters

  • //

  • m,, m!! m^^ m##

  • m<> m{} m[] m()

  • example:

  • m%^http://% is better than /^http:\/\//



  • option modifiers

  • /i   case insensitive

  • matches both cases for all letters

  • /\byes\b/i    # Yes yes YES

  • /s   lets . match any character

  • including "\n", which . alone does not match

  • without /s, [\d\D] matches any character

  • $_ = "ACGTTTGCG\nAACACGT";

  • /^(ACG).*(CGT)$/s

  • do not confuse with the \s shortcut



  • combining option modifiers

  • /si   # both /s and /i

  • $_ = "aCGTTTGCG\nAACAcGT";

  • if (/^(ACG).*(CGT)$/si) {

  • print "That sequence begins with ACG ",

  • "and ends with CGT.\n";

  • }

  • other options
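
Running the same pattern with and without the modifiers shows what each one contributes. The two-line, mixed-case sequence below is the kind of input that needs both:

```perl
use strict;
use warnings;

my $seq = "aCGTTTGCG\nAACAcGT";

my $plain  = $seq =~ /^(ACG).*(CGT)$/   ? 1 : 0;   # fails: wrong case
my $i_only = $seq =~ /^(ACG).*(CGT)$/i  ? 1 : 0;   # fails: . cannot cross "\n"
my $both   = $seq =~ /^(ACG).*(CGT)$/si ? 1 : 0;   # matches

print "plain=$plain i_only=$i_only both=$both\n";
```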



  • the binding operator =~

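  • if (/\w+/i) {……}   # only works on $_

  • if ($seq =~ /^(ACG).*(CGT)$/si) {

  • print "That sequence begins with ACG ",

  • "and ends with CGT.\n";

  • }

  • $prot_seq = <STDIN> =~ /[^ACGT\s]/i;   # \s excluded so the trailing newline does not count

  • if ($prot_seq) { blastp(); }            # blastp() stands for a subroutine defined elsewhere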



  • interpolating into patterns

  • my $p = "arg";

  • if ($seq =~ /($p)$/si) {

  • print "That sequence ends with $p.\n";

  • }

  • $profile = shift @ARGV;   # get the first command-line argument

  • if ($prot_seq =~ /$profile/si) {

  • print "$prot_seq has motif $profile\n";

  • }



  • the match variables

  • /(A)\1/   # use \1 inside the pattern

  • $1        # holds the memory value in Perl code

  • if ($seq =~ /(g.)\1/si) {

  • print "That sequence has a $1 repeat.\n";

  • }

  • if ($prot_seq =~ /([stavli]{3,}).*([Deq]{3,})/si) {

  • print "$prot_seq has hydrophobic region $1 ",

  • "followed by hydrophilic region $2.\n";

  • }



  • the persistence of match variables

  • the next successful match will overwrite the earlier one

  • store your $1 away!

  • if ($prot_seq =~ /([cstv]+)/si) {

  • my $motif = $1;

  • }

  • test your match before using $1

  • it could be a leftover

  • $prot_seq =~ /([cstv]+)/si;                  # result not tested

  • print "I found the motif $1, correct?\n";    # $1 may be left over from an earlier match
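
The leftover problem is easy to reproduce: a failed match leaves $1 holding the value from the last successful match. The two sequences here are invented so that the first matches and the second does not:

```perl
use strict;
use warnings;

# First match succeeds and sets $1.
my $ok_first = 'ADCSFTS' =~ /([cstv]+)/i;
my $motif    = $ok_first ? $1 : undef;     # stored away safely

# Second match fails, but $1 is NOT reset.
my $ok_second = 'GGGAAA' =~ /([cstv]+)/i;
my $leftover  = $1;                        # still the value from the first match

# Testing the result first avoids reporting the stale value.
my $safe = $ok_second ? $1 : undef;

print "motif=$motif leftover=$leftover\n";
print "second match failed, nothing to report\n" unless defined $safe;
```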



  • automatic match variables (PP121)

  • $&   the matched part of the string

  • $`   the part before the match

  • $'   the part after the match

  • $`$&$'   the whole string

  • ……

  • print "The matched string is the following, ",

  • "the part matched is in <>:\n",

  • "$`<$&>$'\n";
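
A minimal sketch of the three automatic variables, using an invented protein sequence and the SFT motif:

```perl
use strict;
use warnings;

my $seq   = 'ADCSFTSCGNYEQ';
my $shown = '';

if ($seq =~ /SFT/) {
    # $` is the text before the match, $& the match, $' the text after it.
    $shown = "$`<$&>$'";
}

print "$shown\n";
```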



  • substitutions with s///

  • m//    search

  • s///   search and replace

  • s/match_pattern/replacement_string/

  • returns true if successful, false if not

  • replacement string:

  • $1

  • empty

  • $&

  • words

  • whitespace

  • ……



  • examples of s///

  • if (s/([a-z]{3})([cstyleu]{3})/$2/) {

  • print "The mutant protein has a $1 deletion ",

  • "before $2.\n";

  • }

  • s/(arg)/$1$1/;      # arg insertion

  • s/arg/cys/;         # cys substitution

  • s/\s+//g;           # get rid of all whitespace

  • s/\s+/ /g;          # single-space delimiters

  • s/[^acgt]//gi;      # clean up DNA sequence

  • s/[tT]/U/g;         # translate to RNA sequence

  • s/_END_.*//s;       # chop off after END mark
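
Several of these substitutions can be tried on small invented strings. The sketch copies each string before changing it, using the (my $copy = $original) =~ s/// idiom, so the inputs stay intact:

```perl
use strict;
use warnings;

# Clean up a messy DNA string, then turn it into RNA.
my $dna = " acg tnn-ACG\tt\n";
(my $clean = $dna)   =~ s/[^acgt]//gi;   # keep only a, c, g, t (either case)
(my $rna   = $clean) =~ s/[tT]/U/g;      # every t or T becomes U

# Collapse runs of whitespace into single spaces.
my $text = "ATG  CCC\t\tGGG";
(my $single = $text) =~ s/\s+/ /g;

# Remove everything from the _END_ mark on; /s lets . cross newlines.
my $record = "ATGCCC\n_END_\nnotes";
(my $trimmed = $record) =~ s/_END_.*//s;

print "$clean $rna\n$single\n$trimmed";
```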



  • s/// different delimiters

  • just like:

  • m//

  • qw//

  • can use unpaired or paired delimiters

  • ,, "" {} [] %% ##

  • s#^https://#http://#

  • s{T}{U}

  • s<T>#U#



  • binding operator for s///

  • works for non-default variables

  • $dna_seq =~ s/[^acgt]/n/gis;



  • case shifting

  • $dna =~ s/(.+)/\U$1/;           # \U: all uppercase

  • $prot =~ s/(.+)/\L$1/;          # \L: all lowercase

  • $prot =~ s/(\w+)/\u\L$1/gi;     # \u\L: capitalize each word



  • split

  • split /separator/, $string;

  • @aa = split / /, $aa;

  • split /separator/;

  • @aa = split /:/;      # uses $_, e.g. "a:b:c:d"

  • split //;

  • @data = split //;     # still $_; splits into single characters

  • split;                # like split /\s+/, $_ but leading whitespace is skipped

  • @data = split;        # split $_ at whitespace



  • join

  • join glue, list of pieces;

  • $full_name = join ' ', $first, $middle, $last;

  • $x = join 'y', @empty;   # empty string
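
split and join are opposites, which a short round trip makes visible. The input strings are invented; note that the glue given to join is a plain string, while the separator given to split is a pattern:

```perl
use strict;
use warnings;

my @fields = split /:/, 'a:b:c:d';                  # four fields
my @chars  = split //,  'ACGT';                     # one element per character
my @words  = split ' ', "  dnaA  argC\trnpA\n";     # whitespace, leading blanks skipped

my $rejoined = join '-', @fields;                   # glue the fields back together
my $nothing  = join 'y', ();                        # empty list gives an empty string

print "$rejoined | @chars | @words\n";
```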



  • Review of Perl (1)

  • More on I/O

  • Regular Expression basics

  • More on regular expression

  • Using regular expressions

  • File and directory handles



  • File handle

  • a name for an I/O connection, not a file name

  • usually named with uppercase letters, '_' and digits

  • Perl's special file handles

  • do not give your own file handles any of these 6 names

  • STDIN

  • STDOUT

  • %> myprog.pl <input >output

  • STDERR

  • another output stream (for errors and warnings)

  • DATA ARGV ARGVOUT



  • opening filehandles

  • STDIN, STDOUT, STDERR automatically opened

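  • open SEQ, "e_coli_dna";              # open for reading (the default)

  • open INPUT, "<e_coli_dna";           # open for reading

  • open OUT1, ">intergene_seq";         # create or overwrite

  • open LOG, ">>genome__Update_log";    # append

  • chomp(my $out_file = <STDIN>);       # remove the newline from the typed name

  • open OUT2, ">$out_file";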



  • closing a filehandle

  • releases the resources tied to the file handle

  • a file handle is closed automatically when it is reopened or when the program exits

  • close LOG;



  • return value of opening filehandle

  • open returns true or false

  • reasons open can fail:

  • permission, spelling, file not created (for input)

  • consequences of a failed open:

  • reading gives EOF (undef), no data input

  • output is discarded

  • turn on -w

  • die, warn



  • die when having fatal error

  • $b != 0 or die "Can't calculate: divisor is 0\n";

  • $length = $a/$b;   # dividing by 0 is itself a fatal error, so check $b first

  • open LOG, ">>log_file"

  • or die "Cannot create logfile: $!";

  • die "Not enough arguments\n" if @ARGV < 2;

  • * $! is the system error message
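
The open ... or die pattern can be exercised without touching real data by working in a temporary directory. This sketch makes two choices the slides do not: it uses File::Temp (a core module) for the directory, and the three-argument form of open with lexical file handles instead of bareword handles. The file names are invented:

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

my $dir      = tempdir(CLEANUP => 1);   # removed automatically at exit
my $log_file = "$dir/log_file";

# Append to the log, stopping the program if the open fails.
open my $log, '>>', $log_file or die "Cannot create logfile: $!";
print {$log} "The sequence is updated\n";
close $log or die "Cannot close logfile: $!";

# Read the log back.
open my $in, '<', $log_file or die "Cannot read logfile: $!";
my @lines = <$in>;
close $in;

# A failed open returns false and puts the system error message in $!.
my $opened = open my $missing, '<', "$dir/no_such_file";
my $error  = $opened ? '' : "$!";

print "log has ", scalar(@lines), " line(s)\n";
print "open failed: $error\n" unless $opened;
```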



  • warn when not fatal

  • just like die, except that the program keeps running

  • warn “Input sequence is too short.\n”

  • if $seq_len < 30;



  • using filehandles

  • while (<SEQ>) {

  • if (/^AUG/)

  • {

  • print INITIAL_SEQ $_;

  • print OUT1 (">$accession\n$_\n");

  • print LOG "The sequence is updated\n";

  • }

  • }



  • file tests

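  • warn "$filename is not updated.\n"

  • if -M INPUT > 14;        # -M: days since the last modification

  • die "File named $filename already exists.\n"

  • if -e $filename;         # -e: the file exists

  • if (-s $filename) {      # -s: the file exists and is not empty

  • print "File made successfully.\n";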

  • }
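
The file tests can be watched changing their answers as a file is created. This sketch assumes a temporary directory from File::Temp and an invented file name, so nothing outside it is touched:

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

my $dir      = tempdir(CLEANUP => 1);
my $filename = "$dir/new.seq";

my $existed_before = -e $filename ? 1 : 0;    # nothing there yet

open my $out, '>', $filename or die "Cannot create $filename: $!";
print {$out} ">seq1\nATGAAATAG\n";
close $out or die "Cannot close $filename: $!";

my $exists_now = -e $filename ? 1 : 0;    # the file exists
my $size       = -s $filename;            # size in bytes, false if empty
my $age_days   = -M $filename;            # age in days, relative to program start

print "File made successfully ($size bytes).\n" if $size;
warn "$filename is not updated.\n" if $age_days > 14;
```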



  • chdir

  • similar to UNIX cd

  • chdir "/fasta" or die "Can't chdir to fasta: $!";

  • chdir;   # goes to your home directory; does not use $_



  • glob and <something>

  • similar to UNIX ls, returns a list

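  • my @all_files = glob ".* *";

  • my @seq_files = glob "*.seq";

  • my @dir_files = <$dir/.* $dir/*>;

  • my @files = <FASTA/*>;         # glob: the files in directory FASTA

  • my @lines = <FASTA>;           # reads lines from the file handle FASTA

  • my @files = <$name/*>;         # glob

  • my @lines = readline FASTA;    # same as <FASTA>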



  • directory handles

  • opendir FASTA, $dir or die "Can't open: $!";

  • @files = readdir FASTA;        # all the names at once

  • closedir FASTA;

  • while ($name = readdir FASTA) {    # or one name at a time, before closedir

  • if ($name =~ /\.seq$/) {

  • # do something here……

  • }

  • }
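
A self-contained sketch of reading a directory one name at a time. It departs from the slide in using a lexical directory handle and a temporary directory from File::Temp; the three file names are invented. readdir returns bare names (including . and ..), not full paths:

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# Set up a directory with two .seq files and one other file.
my $dir = tempdir(CLEANUP => 1);
foreach my $name (qw/ dnaA.seq argC.seq notes.txt /) {
    open my $fh, '>', "$dir/$name" or die "Cannot create $dir/$name: $!";
    close $fh or die "Cannot close $dir/$name: $!";
}

opendir my $dh, $dir or die "Can't open $dir: $!";
my @seq_files;
while (defined(my $name = readdir $dh)) {
    push @seq_files, $name if $name =~ /\.seq$/;   # keep only the .seq names
}
closedir $dh;

@seq_files = sort @seq_files;    # readdir order is not guaranteed
print "Found: @seq_files\n";
```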



  • unlink files

  • similar to UNIX rm

  • unlink "seq1", "seq2", "seq3";

  • unlink glob "*.seq";

  • rename files

  • similar to UNIX mv

  • rename "old", "new";

  • rename "/bin/somewhere/e_coli", "e_coli";



  • mkdir

  • similar to UNIX mkdir

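  • mkdir "fasta", 0755;

  • mkdir $name, oct($permission);   # oct() converts a string such as "0755"

  • rmdir

  • similar to UNIX rmdir

  • rmdir $dir or warn "Can't remove $dir: $!";   # only removes an empty directory

  • rmdir $_ foreach glob "$dir/*";               # rmdir removes one directory per call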



  • change permissions with chmod

  • similar to UNIX chmod

  • chmod 0640, "seq1", "seq2", "seq3";

  • change ownership with chown

  • similar to UNIX chown

  • chown $user, $group, glob "*.seq";   # $user and $group must be numeric IDs



Example 1: Expression

# Take in what a user types, turn .com web sites into .orgs, and change
# the "@" in their email address to something else
while (<STDIN>) {
    if (/^quit$/i) {    # Leave the program if the user types "quit"
        last;
    }
    else {
        # Replace .com in URLs with .org. Only do it
        # for the first match in the string
        s/(http:\/\/[\w\d\.]+)\.com/$1\.org/i;

        # Replace the @ in email addresses with the ^ symbol. Do it for
        # ALL occurrences in the string
        s/([\w\d]+)\@([\w\d\.]+)/$1\^$2/ig;

        # Print out the modified string
        print;
    }
}

