Perl II

arin


  1. Perl II Part III: Motifs and Loops

  2. Objectives
      • Search for motifs in DNA or proteins
      • Interact with users at the keyboard
      • Write data to files
      • Use loops
      • Use basic regular expressions
      • Respond to conditional tests
      • Examine sequence data in detail

  3. Conditional Tests
      • if (1 == 1) { print "1 equals 1\n"; }                  # true: prints
      • if (1) { print "What does this evaluate to?\n"; }      # 1 is true: prints
      • if (1 == 0) { print "1 equals 0\n"; }                  # false: prints nothing
      • if (0) { print "0 evaluates to true\n"; }              # 0 is false: prints nothing

  4. Conditional if/else
      if (1 == 1) {
          print "1 equals 1\n\n";
      } else {
          print "1 does not equal 1\n\n";
      }
      unless (1 == 0) {
          print "1 does not equal 0\n\n";
      } else {
          print "1 does equal 0?\n\n";
      }
      • Numeric conditionals also use: ==, !=, >=, <=, >, <, <=>
      • For text: "" and '' (the empty string) evaluate to false, as does "0"; any other string evaluates to true
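
A minimal sketch of how Perl decides whether a value is true in a conditional, and of the difference between numeric and string comparison. The helper `is_true` is a hypothetical name introduced only for this illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: returns the word "true" or "false" for any value.
sub is_true {
    my ($value) = @_;
    return $value ? "true" : "false";
}

# Numbers: 0 is false, any other number is true.
print "0 is ", is_true(0), "\n";
print "1 is ", is_true(1), "\n";

# Text: the empty string and the string "0" are false.
print "'' is ",     is_true(''),     "\n";
print "'0' is ",    is_true('0'),    "\n";
print "'0.0' is ",  is_true('0.0'),  "\n";   # non-empty and not exactly "0"
print "'ACGT' is ", is_true('ACGT'), "\n";

# == compares numbers, eq and ne compare strings.
print "10 == 10.0 numerically\n"    if 10 == 10.0;
print "'10' ne '10.0' as strings\n" if '10' ne '10.0';

# <=> returns -1, 0 or 1 and is mostly used when sorting numbers.
print "2 <=> 10 gives ", (2 <=> 10), "\n";
```

The string operators `eq`, `ne`, `lt`, `gt`, `le` and `ge` are the text counterparts of the numeric operators, which is why the sequence comparisons in this deck use `eq`.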

  5. More conditionals …
      #!/usr/bin/perl -w
      #if-elsif-else
      $word = "MNIDDKL";
      if ($word eq 'QSTLV') {
          print "QSTLV\n";
      } elsif ($word eq 'MSRQQNKISDH') {
          print "MSRQQNKISDH\n";
      } else {
          print "What is \"$word\"?\n";
      }
      exit;

  6. Using Loops to Open and Read Files
      #!/usr/bin/perl -w
      $proteinFilename = "NM_012345.pep";
      #open the file and catch the error
      unless (open(MUPPETFILE, $proteinFilename)) {
          print "Could not open file $proteinFilename!\n";
          exit;
      }
      #read data using a while loop, and print
      while ($protein = <MUPPETFILE>) {
          print "##### Here is the next line of the file:\t";
          print $protein, "\n";
      }
      close MUPPETFILE;
      exit;

  7. Motif finding: www.expasy.ch/prosite/
      • Something genuinely useful
      • Program flow:
          • Reads in protein sequence from file
          • Puts all sequence data into one string for easy searching
          • Looks for motifs that the user types in at the keyboard

  8. #!/usr/bin/perl -w
      #searching for motifs
      #Ask the user for the filename of the data file
      print "Please type the filename of the data file: ";
      $proteinFilename = <STDIN>;
      chomp $proteinFilename;
      #Open the file or exit
      open (PROTEINFILE, $proteinFilename) or die ("Error: $!");
      #Read file into an array and close
      @protein = <PROTEINFILE>;
      close PROTEINFILE;
      #Put data into a single string to make it easier to search
      $protein = join('', @protein);
      #\s already covers spaces, tabs, form feeds, returns and newlines
      $protein =~ s/\s//g;
      Notes:
      • <STDIN> reads data until it reaches the input record separator $/, which is set to \n by default
      • File modes for open:
          • Reading: "<filename"
          • Writing: ">filename", discards the current contents if the file already exists
          • Append: ">>filename", opens or creates the file for writing at the end of the file
          • Update: "+<filename", opens an existing file for update (reading and writing)
          • New update: "+>filename", creates the file for update, discarding the current contents if it already exists

  9. #Ask the user for a motif, search for it, and report
      #if it was found. Exit if no motif was entered.
      do {
          print "Enter a motif to search for: ";
          $motif = <STDIN>;
          chomp $motif;
          if ($protein =~ m/$motif/) {
              print "I found it!\n\n";
          } else {
              print "I couldn't find it!\n\n";
          }
      #exit on user prompt
      } until ($motif =~ /^\s*$/);
      exit;

  10. Regular Expressions
      • Very powerful methods for matching patterns, including wildcards, against strings
      • Very cryptic
      • Perl reads =~ /n/ as =~ m/n/
      • The delimiter is flexible: with an explicit m, it accepts any non-alphanumeric, non-whitespace character (e.g. # ( { [ , . ')
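
A short sketch of equivalent spellings of the match operator with different delimiters. The helper name `count_matches` and the sample sequence are assumptions made for illustration only.

```perl
use strict;
use warnings;

my $protein = "MNIDDKLQSTLV";

# Hypothetical helper: counts how many of the equivalent spellings match.
sub count_matches {
    my ($seq) = @_;
    my $hits = 0;
    $hits++ if $seq =~ /DDK/;     # default delimiters, the m is optional
    $hits++ if $seq =~ m/DDK/;    # explicit m
    $hits++ if $seq =~ m#DDK#;    # another delimiter character
    $hits++ if $seq =~ m{DDK};    # bracketing delimiters come in pairs
    $hits++ if $seq =~ m,DDK,;
    return $hits;
}

print "Spellings that matched: ", count_matches($protein), "\n";

# A different delimiter avoids escaping slashes inside the pattern.
my $url = "www.expasy.ch/prosite/";
print "Found the path\n" if $url =~ m{/prosite/};
```

Choosing a delimiter that does not occur in the pattern keeps the expression readable, which helps given how cryptic regular expressions can get.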

  11. Metasymbols

  12. Look-behind assertion
      • (?<=value1)value2 : look behind
      • $string = "English goodly spoken here";
      • $string =~ s/(?<=English )goodly/well/;
      • value1(?=value2) : look ahead
      • value1(?!value2) : negative look ahead
      • (?<!value1)value2 : negative look behind
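
A small sketch of look-behind, look-ahead and negative look-ahead in use. The protein string and the "N followed by any residue, then S or T" motif are illustrative assumptions, not taken from the slides.

```perl
use strict;
use warnings;

# Look-behind: replace "goodly" only when it follows "English ".
my $string = "English goodly spoken here";
(my $fixed = $string) =~ s/(?<=English )goodly/well/;
print "$fixed\n";

# Look-ahead: match an N only when it is followed by any residue and
# then S or T. The look-ahead itself consumes no characters.
my $protein = "MNISDHNKT";
my @sites;
while ($protein =~ /N(?=.[ST])/g) {
    push @sites, pos($protein) - 1;    # 0-based position of the N
}
print "Matching N residues at positions: @sites\n";

# Negative look-ahead: count each N that is NOT followed by P.
my $count = () = "NPNANP" =~ /N(?!P)/g;
print "N not followed by P: $count\n";
```

Because the assertions match a position and not text, only the part outside the parentheses is replaced or reported.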

  13. Backreferences
      • $string = "2y 4 2 22y2y";
      • $string =~ /(\d\w)\s+(\d)\s+(\d)\s\3\1\1/;
      • Backreferences (\1, \2, \3) refer to the parenthesized groups, numbered from left to right by their opening parenthesis
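
A runnable sketch of the backreference pattern, showing what each group captures. The helper `has_tandem_repeat` is a hypothetical addition that applies the same idea to a sequence.

```perl
use strict;
use warnings;

my $string = "2y 4 2 22y2y";

if ($string =~ /(\d\w)\s+(\d)\s+(\d)\s\3\1\1/) {
    # After a successful match the groups are available as $1, $2, $3.
    print "Group 1: $1\n";
    print "Group 2: $2\n";
    print "Group 3: $3\n";
}

# Hypothetical helper: returns 1 if a sequence contains the same three
# characters twice in a row (such as ACGACG), otherwise 0.
sub has_tandem_repeat {
    my ($seq) = @_;
    return $seq =~ /(\w{3})\1/ ? 1 : 0;
}

print "TTACGACGTT: ", has_tandem_repeat("TTACGACGTT"), "\n";
print "ACGTTGCA:   ", has_tandem_repeat("ACGTTGCA"),   "\n";
```

Inside the pattern the groups are written \1, \2, \3; after the match the same text is available in $1, $2, $3.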

  14. #!/usr/bin/perl -w
      #determining the frequency of nucleotides
      #Ask the user for the filename of the data file
      print "Please type the filename of the data file: ";
      $dnaFilename = <STDIN>;
      chomp $dnaFilename;
      #Open the file or exit
      open (DNA, $dnaFilename) or die ("Error: $!");
      #Read file into an array and close
      @dna = <DNA>;
      close DNA;
      #Put data into a single string to make it easier to search
      $dna = join('', @dna);
      $dna =~ s/\s//g;

  15. #Explode the $dna string into an array where it will be
      #easier to iterate through the bases and count them
      @dna = split('', $dna);
      #Initialize the counts
      $A_Number = 0;
      $C_Number = 0;
      $G_Number = 0;
      $T_Number = 0;
      $Errors = 0;

  16. #Loop through the bases, examine each to determine what
      #each nucleotide is and increment the appropriate number
      foreach $base (@dna) {
          if ($base eq 'A') { ++$A_Number; }
          elsif ($base eq 'C') { ++$C_Number; }
          elsif ($base eq 'G') { ++$G_Number; }
          elsif ($base eq 'T') { ++$T_Number; }
          else {
              print "Error: I don't recognize the base\n";
              ++$Errors;
          }
      }
      print "Base\tNumber\nA=\t$A_Number\nC=\t$C_Number\n";
      print "G=\t$G_Number\nT=\t$T_Number\n\n";

  17. foreach $base (@dna) {
          if ($base eq 'A') { ++$A_Number; }
          elsif ($base eq 'C') { ++$C_Number; }
          elsif ($base eq 'G') { ++$G_Number; }
          elsif ($base eq 'T') { ++$T_Number; }
          else {
              print "Error: I don't recognize the base\n";
              ++$Errors;
          }
      }
      #The same loop using the default variable $_ and pattern matches
      foreach (@dna) {
          if (/A/) { ++$A_Number; }
          elsif (/C/) { ++$C_Number; }
          elsif (/G/) { ++$G_Number; }
          elsif (/T/) { ++$T_Number; }
          else {
              print "Error when reading base\n";
              ++$Errors;
          }
      }

  18. Tricky little ifs
      if ($string =~ /\d{3,4}/) { print "the string contains at least 3 digits in a row\n"; }
      =
      print "the string contains at least 3 digits in a row\n" if ($string =~ /\d{3,4}/);
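
A brief sketch contrasting the block form of if with the statement-modifier form, and an unanchored pattern with an anchored one. The helper `describe` is a hypothetical name used only here.

```perl
use strict;
use warnings;

# Hypothetical helper: describes a string using both styles of if.
sub describe {
    my ($string) = @_;
    my @notes;

    # Block form: braces are required even for a single statement.
    if ($string =~ /\d{3,4}/) {
        push @notes, "has a run of at least 3 digits";
    }

    # Modifier form: the statement comes first and takes no braces.
    push @notes, "is only 3 or 4 digits" if $string =~ /^\d{3,4}$/;
    push @notes, "has no digits" unless $string =~ /\d/;

    return join("; ", @notes);
}

print "12345: ", describe("12345"), "\n";
print "1234: ",  describe("1234"),  "\n";
print "ACGT: ",  describe("ACGT"),  "\n";
```

Without the ^ and $ anchors, /\d{3,4}/ succeeds on any string that contains three digits in a row, however long the string is.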

  19. Let's do the same thing but save on some memory by not exploding the string into an array of single bases
      #!/usr/bin/perl -w
      #determining the frequency of nucleotides
      #Ask the user for the filename of the data file
      print "Please type the filename of the data file: ";
      $dnaFilename = <STDIN>;
      chomp $dnaFilename;
      #See if the file exists then open it
      unless (-e $dnaFilename) {
          print "\"$dnaFilename\" does not exist\n";
          exit;
      }
      open (DNA, $dnaFilename) or die ("File Error");
      @dna = <DNA>;
      close DNA;
      #Put data into a single string to make it easier to search
      $dna = join('', @dna);
      $dna =~ s/\s//g;

  20. #Initialize the counts
      $A_Number = 0;
      $C_Number = 0;
      $G_Number = 0;
      $T_Number = 0;
      $Errors = 0;

  21. #Loop through the bases, examine each to determine what
      #each nucleotide is and increment the appropriate number
      for ($position = 0; $position < length($dna); ++$position) {
          $base = substr($dna, $position, 1);
          if ($base eq 'A') { ++$A_Number; }
          elsif ($base eq 'C') { ++$C_Number; }
          elsif ($base eq 'G') { ++$G_Number; }
          elsif ($base eq 'T') { ++$T_Number; }
          else {
              print "Error: I don't recognize the base\n";
              ++$Errors;
          }
      }
      #Alternative to the for loop above (use one or the other, not both):
      #count with case-insensitive global matches on the whole string
      while ($dna =~ /a/ig) { $A_Number++ }
      while ($dna =~ /c/ig) { $C_Number++ }
      while ($dna =~ /g/ig) { $G_Number++ }
      while ($dna =~ /t/ig) { $T_Number++ }
      while ($dna =~ /[^acgt]/ig) { $Errors++ }
      print "Base\tNumber\nA=\t$A_Number\nC=\t$C_Number\n";
      print "G=\t$G_Number\nT=\t$T_Number\n\n";
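
As a further alternative that is not the slides' method, the transliteration operator tr/// can count characters in a string without any explicit loop. This is a minimal sketch; the helper `count_bases` and the sample sequence are assumptions for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: returns a hash of base counts for a DNA string.
sub count_bases {
    my ($dna) = @_;
    my %count;
    # With an empty replacement list, tr/// leaves the string unchanged
    # and returns the number of characters it matched.
    $count{A} = ($dna =~ tr/Aa//);
    $count{C} = ($dna =~ tr/Cc//);
    $count{G} = ($dna =~ tr/Gg//);
    $count{T} = ($dna =~ tr/Tt//);
    # Everything that is not A, C, G or T counts as an error.
    $count{Errors} = length($dna)
                   - $count{A} - $count{C} - $count{G} - $count{T};
    return %count;
}

my %count = count_bases("ACGTTGCANNa");
print "Base\tNumber\n";
print "$_=\t$count{$_}\n" for qw(A C G T Errors);
```

Like the substr loop, this works directly on the string, so no per-base array is created.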

  22. Writing to files
      #All text data can be written to files
      $outputfile = "results.txt";
      open(RESULTS, ">$outputfile") or die ("Error: $!");
      print RESULTS "These results are overwriting everything that existed in the file results.txt\n";
      close RESULTS;
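
A minimal sketch of writing, appending and reading back, assuming a scratch file name (`results_demo.txt`) chosen for this example. It uses the three-argument form of open with lexical filehandles, a common modern style, in place of the bareword handles used on the slides.

```perl
use strict;
use warnings;

# The file name is an illustrative assumption.
my $outputfile = "results_demo.txt";

# ">" creates the file, or discards its contents if it already exists.
open(my $out, '>', $outputfile) or die "Error: $!";
print $out "A=\t3\n";
close $out;

# ">>" appends to the end of the file.
open($out, '>>', $outputfile) or die "Error: $!";
print $out "C=\t2\n";
close $out;

# "<" reads the file back.
open(my $in, '<', $outputfile) or die "Error: $!";
my @lines = <$in>;
close $in;

print "The file now holds ", scalar(@lines), " lines:\n", @lines;
unlink $outputfile;    # remove the demo file again
```

Opening the same file a second time with ">" in place of ">>" would have discarded the first line.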

  23. Command line arguments and subroutines
      #!/usr/bin/perl -w
      use strict;
      #Arguments collected on the command line go into a special var
      # called @ARGV and the program name resides in the var $0
      my($title) = "$0 DNA\n\n";
      unless (@ARGV) {
          print $title;
          exit;
      }
      my($input) = $ARGV[0];
      print $input, "\n\n";
      exit;

  24. Command line arguments and subroutines
      #!/usr/bin/perl -w
      use strict;
      #Arguments collected on the command line go into a special var
      # called @ARGV and the program name resides in the var $0
      my($title) = "$0 DNA\n\n";
      unless (@ARGV) {
          print $title;
          exit;
      }
      my($input) = $ARGV[0];
      my($subRoutineResults) = Find_Length($input);
      print "the length of your input is $subRoutineResults\n";
      exit;
      sub Find_Length {
          my($tmp) = @_;
          my $results = length($tmp);
          return $results;
      }

  25. Passing by value vs reference
      • Simple routines that copy their arguments out of @_ work on copies (pass by value)
      • However, because all arguments arrive in the single subroutine array @_, arrays, hashes and scalars get flattened into one list
      • Ex.
      my @i = (1..10);
      my @j = (1..23);
      reference_sub(@i, @j);
      sub reference_sub {
          my (@i, @j) = @_;
          #@i now holds all 33 values and @j is empty
          print "@i\n@j\n";
      }
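
A runnable sketch of the flattening problem, with the sub renamed to the hypothetical `flattened_sub` and changed to return the sizes it sees so that the effect is easy to inspect.

```perl
use strict;
use warnings;

# Hypothetical sub: tries to receive two arrays by value.
sub flattened_sub {
    my (@i, @j) = @_;    # @i swallows the whole argument list
    return (scalar(@i), scalar(@j));
}

my @i = (1..10);
my @j = (1..23);
my ($size_i, $size_j) = flattened_sub(@i, @j);
print "Inside the sub: \@i has $size_i elements, \@j has $size_j\n";
```

The first array in the my list takes every argument, so the sub cannot tell where one caller array ended and the next began.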

  26. my @i = (1..10);
      my @j = (1..23);
      reference_sub(\@i, \@j);
      #changes made through a reference affect the caller's array,
      #which is still used as @i here
      print "@i\n";
      sub reference_sub {
          my ($i, $j) = @_;
          print $$j[2];
          push(@$i, '4');
      }
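
A runnable sketch of passing array references, adapted so that the sub returns what it sees instead of printing it. The return values are an addition made for illustration.

```perl
use strict;
use warnings;

# Sub that receives two array references and keeps the arrays apart.
sub reference_sub {
    my ($i, $j) = @_;
    push(@$i, 4);    # changes the caller's array through the reference
    return (scalar(@$i), scalar(@$j), $$j[2]);
}

my @i = (1..10);
my @j = (1..23);
my ($size_i, $size_j, $third_j) = reference_sub(\@i, \@j);
print "Sizes seen inside the sub: $size_i and $size_j\n";
print "Third element of \@j: $third_j\n";
print "Caller's \@i is now: @i\n";
```

Each reference is a single scalar, so the two arrays stay separate inside the sub, and any change made through a reference is visible to the caller.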
