1 / 36

Topics

Topics. Logical expression string functions: substr and index random numbers and mutation hashes Transcription, translation, genetic code Quiz 2. Any expression in Perl can be interpreted as a logical value true or false Scalar context: false if 0, "", or undefined otherwise true

favian
Download Presentation

Topics

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Topics • Logical expression • string functions: substr and index • random numbers and mutation • hashes • Transcription, translation, genetic code • Quiz 2 BINF634 FALL13 - LECTURE 4

  2. Any expression in Perl can be interpreted as a logical value true or false Scalar context: false if 0, "", or undefined otherwise true List context: false if () or undefined true otherwise my $x; # or: my $x = 0; if ($x) { $x++ } else { $x = 17 } print "$x\n"; 17 my $x = 2; while ($x) { print $x--, "\n"; } 2 1 my @a = ("A", "T"); while (@a) { print shift @a, "\n"; } A T Logical Value of Expressions BINF634 FALL13 - LECTURE 4

  3. Logical Operators Logical operator "short-circuit": only evaluate second argument if necessary Example: open(GRADES, "grades") or die "Can't open file grades\n"; See Wall (p 109-110) for discussion of C-style &&, || and ! operators BINF634 FALL13 - LECTURE 4

  4. String Functions: substr substr EXPR, OFFSET, LENGTH, REPLACEMENT Return substring of string EXPR at position OFFSET and length LENGTH $s = "hello, world!"; $x = substr $s, 1, 1; # $x = "e" $x = substr $s, 0, 3; # $x = "hel" $x = substr $s, 7; # $x = "world!" # The substring is replaced by REPLACEMENT if used: substr($s,0,5,"goodbye"); # $s = "goodbye, world!"; # This does the same thing: $s = "hello, world!"; substr($s,0,5) = "goodbye"; Note: Predefined Perl functions may be used with or without parentheses around their arguments BINF634 FALL13 - LECTURE 4

  5. # From example7-4.pl # matching_percentage # # Subroutine to calculate the percentage of identical bases in two # equal length DNA sequences sub matching_percentage { my($string1, $string2) = @_; # we assume that the strings have the same length my $length = length($string1); my $count = 0; for (my $position=0; $position < $length ; ++$position) { if(substr($string1,$position,1) eq substr($string2,$position,1)) { ++$count; } } return $count / $length; } BINF634 FALL13 - LECTURE 4

  6. #!/usr/bin/perl use strict; use warnings; my $dna = "AAAAATTTTTGGGGGTTTTT"; print_sequence(dna,8); exit; # A subroutine to format and print sequence data sub print_sequence { my($sequence, $length) = @_; # Print sequence in lines of $length for (my $pos = 0 ; $pos < length($sequence); $pos += $length ) { print substr($sequence, $pos, $length), "\n"; } } AAAAATTT TTGGGGGC CCCC BINF634 FALL13 - LECTURE 4

  7. String Functions: index index STR, SUBSTR, OFFSET Return the position of the first occurrence of SUBSTR in STR; If OFFSET is given, skip this many letters before looking Returns -1 if SUBSTR not found $dna = "GATGCCATGAAATGC"; $pos = index $dna, "ATG"; print "ATG found at position $pos\n"; # answer: 1 pos = -1; while (($pos = index($dna, "ATG", $pos)) > -1) { print "ATG found at position $pos\n"; $pos++; } OUTPUT: ATG found at position 1 ATG found at position 6 ATG found at position 11 BINF634 FALL13 - LECTURE 4

  8. Random Number Generation $r = rand; # $r is random between 0 and 1 (0<=$r< 1.0) $r = rand(100); # random between 0 and 100 $d = int rand(101); # random integer from 0 to 100 srand($seed); # seeds the random number generator If a program doesn't call srand(), then it generates different random numbers each time it is run. If program does call srand($seed), we get the same sequence of random number each time the program is run with the same value for $seed. BINF634 FALL13 - LECTURE 4

  9. An Example to Generate Random Stories - I #!/usr/bin/perl -w # Example 7-1 Children's game, demonstrating primitive artificial intelligence, # using a random number generator to randomly select parts of sentences. use strict; use warnings; # Declare the variables my $count; my $input; my $number; my $sentence; my $story; # Here are the arrays of parts of sentences: my @nouns = ( 'Dad', 'TV', 'Mom', 'Groucho', 'Rebecca', 'Harpo', 'Robin Hood', 'Joe and Moe', ); BINF634 FALL13 - LECTURE 4

  10. my @verbs = ( 'ran to', 'giggled with', 'put hot sauce into the orange juice of', 'exploded', 'dissolved', 'sang stupid songs with', 'jumped with', ); my @prepositions = ( 'at the store', 'over the rainbow', 'just for the fun of it', 'at the beach', 'before dinner', 'in New York City', 'in a dream', 'around the world', ); # Seed the random number generator. # time|$$ combines the current time with the current process id # in a somewhat weak attempt to come up with a random seed. srand(time|$$); # This do-until loop composes six-sentence "stories". # until the user types "quit". do { # (Re)set $story to the empty string each time through the loop $story = ''; # Make 6 sentences per story. for ($count = 0; $count < 6; $count++) { An Example to Generate Random Stories - II BINF634 FALL13 - LECTURE 4

  11. # Notes on the following statements: # 1) scalar @array gives the number of elements in the array. # 2) rand returns a random number greater than 0 and # less than scalar(@array). # 3) int removes the fractional part of a number. # 4) . joins two strings together. $sentence = $nouns[int(rand(scalar @nouns))] . " " . $verbs[int(rand(scalar @verbs))] . " " . $nouns[int(rand(scalar @nouns))] . " " . $prepositions[int(rand(scalar @prepositions))] . '. '; $story .= $sentence; } # Print the story. print "\n",$story,"\n"; # Get user input. print "\nType \"quit\" to quit, or press Enter to continue: "; $input = <STDIN>; # Exit loop at user's request } until($input =~ /^\s*q/i); exit; An Example to Generate Random Stories - III BINF634 FALL13 - LECTURE 4

  12. Randomly Selecting and Element of an Array • $verbs[int(rand(scalar @verbs)))] • verbs[int(rand(7))] #Why?? • What does rand(7) do? • How about int(rand(7))? BINF634 FALL13 - LECTURE 4

  13. An Alternative Way to Randomly Select the Elements in an Array • $verbs[int rand scalar @verbs] • This will actually work • $verbs[rand @verbs] #How does this work? BINF634 FALL13 - LECTURE 4

  14. # From example7-2.pl # A subroutine to perform a mutation in a string of DNA # sub mutate { my($dna) = @_; my(@nucleotides) = ('A', 'C', 'G', 'T'); # Pick a random position in the DNA my($position) = randomposition($dna); # Pick a random nucleotide my($newbase) = randomnucleotide(@nucleotides); # Insert the random nucleotide into the random position in the DNA substr($dna,$position,1,$newbase); return $dna; } BINF634 FALL13 - LECTURE 4

  15. # A subroutine to randomly select a position in a string. sub randomposition { my($string) = @_; # rand returns a decimal number between 0 and its argument. # int returns the integer portion of a decimal number. # # The whole expression returns a random number between 0 and # length-1 return int rand length $string; } # Select at random one of the four nucleotides sub randomnucleotide { my(@nucleotides) = ('A', 'C', 'G', 'T'); return randomelement(@nucleotides); } # randomly select an element from an array sub randomelement { my(@array) = @_; # return $array[int(rand(scalar @array)]; return $array[rand @array]; } BINF634 FALL13 - LECTURE 4

  16. # Subroutine to perform a mutation in a string of DNA-version 2, in # which it is guaranteed that one base will change on each call. sub mutate_better { my($dna) = @_; my(@nucleotides) = ('A', 'C', 'G', 'T'); # Pick a random position in the DNA my($position) = randomposition($dna); # Pick a random nucleotide my($newbase); do { $newbase = randomnucleotide(@nucleotides); # Make sure it's different than the nucleotide we're mutating }until ( $newbase ne substr($dna, $position,1) ); # Insert the random nucleotide into the random position in the DNA substr($dna,$position,1,$newbase); return $dna; } BINF634 FALL13 - LECTURE 4

  17. Hashes(Associative Arrays) • A Hash is a collection of zero or more pairs of scalar values, called keys and values • Hash variable names begin with a percent sign (%) %genes = ( "gene1", "ATTCGT", "gene2", "CTGCCATGA"); • The values are indexed by the keys • Given a key, the hash returns the corresponding value $seq = $genes{"gene2"}; # $seq = "CTGCCATGA" • Note that $genes{"gene2"} is a scalar, so it starts with $ BINF634 FALL13 - LECTURE 4

  18. Hashes • Hashes can be assigned values use key=>value notation: %genes = ( "gene1", "ATTCGT", "gene2", "CTGCCATGA"); %genes = ( gene1=>"ATTCGT", gene2=>"CTGCCATGA"); • Hash elements can be created/altered by assignment statements: $genes{"gene1"} = "ATTCGT"; $genes{gene2} = "CTGCCATGA"; # note: no quotes in key BINF634 FALL13 - LECTURE 4

  19. Hashes (Associative Arrays) %genomes = ( ); # creates an empty hash # two ways to do the same thing: %genomes = ( "virus", 31, "bacteria", 89, "plants", 5 ); %genomes = ( virus => 31, bacteria => 89, plants => 5 ); $genomes{mammals} = 2; # adds a new pair to the hash @genome_list = keys %genomes; # @genome list is now ("plants" , "mammals", "bacteria", "virus") @genome_counts = values %genomes; # @genome_counts is now (5, 2, 89, 31) # keys and values are not guaranteed to return the data is same order # as it was entered, but they are guaranteed to return the data in the # same order as each other. BINF634 FALL13 - LECTURE 4

  20. Hashes The keys function returns a list of all keys in a hash (in some random order) %genes = (gene2=>"CTGCCATGA", gene1=>"ATTCGT"); @key_list = keys(%genes); print "@key_list\n"; # prints: gene1 gene2 # often used to loop through a hash: foreach $key (@key_list) { print "The value of $key is $genes{$key}\n"; } Output: The value of gene1 is ATTCGT The value of gene2 is CTGCCATGA BINF634 FALL13 - LECTURE 4

  21. Hashes # exists($H{$key}) returns TRUE if $key occurs in hash %H # print all the words in a file with their counts my ($file) = @ARGV; open FH, $file; my @lines = <FH>; close FH; my %count = (); for my $line (@lines) { for my $word (split " ", $line) { if (not exists $count{$word}) { # initialize the count for a new word $count{$word} = 1; } else { # update the count for an existing word $count{$word}++; } } } for my $key (sort keys %count) { print "$key $count{$key}\n"; } What does each line in this program do? BINF634 FALL13 - LECTURE 4

  22. Hashes % cat testfile2 a text file with lots of words some words occur once and some words occur more than once % wordcount.pl testfile2 a 1 and 1 file 1 lots 1 more 1 occur 2 of 1 once 2 some 2 text 1 than 1 with 1 words 3 BINF634 FALL13 - LECTURE 4

  23. Hashes (Associative Arrays) • The each function returns a two-element list containing one key from the hash and its associated value • Subsequent calls to each will return another pair, until all pairs have been returned (at which point an empty array is returned) while ( ($genome, $count) = each %genomes ) ) { print “$genome $count\n”; } OUTPUT: (possibly not in this order) plants 5 virus 31 bacteria 89 mammals 2 BINF634 FALL13 - LECTURE 4

  24. More on Hashes (Associative Arrays) • Assigning the return value from values or keys to a scalar gives the number of pairs in a hash: $genome_count = keys %genomes; # $genome_count is now 4 $genome_count = values %genomes; # $genome_count is now 4 • The delete function removes a pair from a hash delete $genomes{bacteria}; $genome_count = keys %genomes; # $genome_count is now 3 BINF634 FALL13 - LECTURE 4

  25. Transcription and Translation • DNA is transcribed to mRNA, mRNA is translated to protein • Start with double stranded DNA ATTCGAGCATGACATCATCGGTA (sense strand) TAAGCTCGTACTGTAGTAGCCAT (complement strand) • DNA double helix separates, allowing polymerase to transcribe one strand: ATTCGAGCATGACATCATCGGTA (sense strand) AUUCGAGCAUGACAUCAUCGGUA (mRNA) | | | | | | | TAAGCTCGTACTGTAGTAGCCAT (complement strand) • mRNA codons translated into protein: AUU CGA GCA UGA CAU CAU CGG … (mRNA) BINF634 FALL13 - LECTURE 4

  26. Codons BINF634 FALL13 - LECTURE 4

  27. Reading Frames • A genomic sequence has 6 reading frames, corresponding to the six possible ways of translating the sequence into three-letter codons. • Frame 1 treats each group of three bases as a codon, starting from the first base • Frame 2 starts at the second base • Frame 3 starts at the third base • Start with the sense strand of DNA: ATTCGAGCATGACATCATCGGTA • Reading frame 1: ATT CGA GCA TGA CAT CAT CGG TA • Reading Frame 2: TTC GAG CAT GAC ATC ATC GGT A • Reading Frame 3: TCG AGC ATG ACA TCA TCG GTA • Each reading frame can be translated into a different protein sequence • Frames 4, 5 and 6 are defined in a similar way, but refer to the opposite strand, which is the reverse complement of the first strand. BINF634 FALL13 - LECTURE 4

  28. Reading Frames • Perl code to process first three reading frames: $dna = ...; for (my $r = 1; $r <= 3; $r++) { my $frame = substr($dna, ?, ?); # fill in the blanks! # process reading frame $r here } # exercise: write a subroutine to return the nth reading frame for a DNA string BINF634 FALL13 - LECTURE 4

  29. BINF634 FALL13 - LECTURE 4

  30. Using the Genetic Code We’ll look at four versions of translating DNA using the genetic code: • Look up the codon using if-then-else • Same as above, but use patterns to reflect redundancy of genetic code • Use a hash to look up each codon • Same as above, but more efficiently • See BeginPerlBioinfo.pm BINF634 FALL13 - LECTURE 4

  31. Version 1 # codon2aa # A subroutine to translate a DNA 3-character codon to an amino acid sub codon2aa_v1 { my($codon) = @_; if ( $codon =~ /TCA/i ) { return "S" } # Serine elsif ( $codon =~ /TCC/i ) { return "S" } # Serine elsif ( $codon =~ /TCG/i ) { return "S" } # Serine elsif ( $codon =~ /TCT/i ) { return "S" } # Serine elsif ( $codon =~ /TTC/i ) { return "F" } # Phenylalanine elsif ( $codon =~ /TTT/i ) { return "F" } # Phenylalanine elsif ( $codon =~ /TTA/i ) { return "L" } # Leucine elsif ( $codon =~ /TTG/i ) { return "L" } # Leucine elsif ( $codon =~ /TAC/i ) { return "Y" } # Tyrosine elsif ( $codon =~ /TAT/i ) { return "Y" } # Tyrosine elsif ( $codon =~ /TAA/i ) { return "_" } # Stop elsif ( $codon =~ /TAG/i ) { return "_" } # Stop elsif ( $codon =~ /TGC/i ) { return "C" } # Cysteine elsif ( $codon =~ /TGT/i ) { return "C" } # Cysteine elsif ( $codon =~ /TGA/i ) { return "_" } # Stop elsif ( $codon =~ /TGG/i ) { return "W" } # Tryptophan . . elsif ( $codon =~ /GGT/i ) { return "G" } # Glycine else { print STDERR "Bad codon \"$codon\"!!\n"; exit; } } Remember in relating to the translation table on slide 29 that the T’s become U’s BINF634 FALL13 - LECTURE 4 Problem: takes longer to look up codons near end of list.

  32. Version 2 # codon2aa # A subroutine to translate a DNA 3-character codon to an amino acid sub codon2aa_v2 { my($codon) = @_; if ( $codon =~ /GC./i) { return "A" } # Alanine elsif ( $codon =~ /TG[TC]/i) { return "C" } # Cysteine elsif ( $codon =~ /GA[TC]/i) { return "D" } # Aspartic Acid elsif ( $codon =~ /GA[AG]/i) { return "E" } # Glutamic Acid elsif ( $codon =~ /TT[TC]/i) { return "F" } # Phenylalanine elsif ( $codon =~ /GG./i) { return "G" } # Glycine elsif ( $codon =~ /CA[TC]/i) { return "H" } # Histidine elsif ( $codon =~ /AT[TCA]/i) { return "I" } # Isoleucine elsif ( $codon =~ /AA[AG]/i) { return "K" } # Lysine elsif ( $codon =~ /TT[AG]|CT./i) { return "L" } # Leucine elsif ( $codon =~ /ATG/i) { return "M" } # Methionine elsif ( $codon =~ /AA[TC]/i) { return "N" } # Asparagine elsif ( $codon =~ /CC./i) { return "P" } # Proline elsif ( $codon =~ /CA[AG]/i) { return "Q" } # Glutamine elsif ( $codon =~ /CG.|AG[AG]/i) { return "R" } # Arginine elsif ( $codon =~ /TC.|AG[TC]/i) { return "S" } # Serine elsif ( $codon =~ /AC./i) { return "T" } # Threonine elsif ( $codon =~ /GT./i) { return "V" } # Valine elsif ( $codon =~ /TGG/i) { return "W" } # Tryptophan elsif ( $codon =~ /TA[TC]/i) { return "Y" } # Tyrosine elsif ( $codon =~ /TA[AG]|TGA/i) { return "_" } # Stop else { print STDERR "Bad codon \"$codon\"!!\n"; exit; } } BINF634 FALL13 - LECTURE 4 Still takes longer to look up codons near end of list.

  33. # codon2aa # Translate codon using hash lookup (pp 160-162 and BeginPerlBioinfo.pm) sub codon2aa_v3 { my($codon) = @_; $codon = uc $codon; my(%genetic_code) = ( "TCA" => "S", # Serine "TCC" => "S", # Serine "TCG" => "S", # Serine "TCT" => "S", # Serine "TTC" => "F", # Phenylalanine "TTT" => "F", # Phenylalanine "TTA" => "L", # Leucine "TTG" => "L", # Leucine "TAC" => "Y", # Tyrosine "TAT" => "Y", # Tyrosine "TAA" => "_", # Stop "TAG" => "_", # Stop . . . "GGT" => "G", # Glycine ); if(exists $genetic_code{$codon}) { return $genetic_code{$codon}; } else{ die "Bad codon: $codon";} } Version 3 Efficiency Problem: the hash is redefined for every codon lookup. Robustness Problem: should the program die if the codon is unknown? BINF634 FALL13 - LECTURE 4

  34. More Efficient Solution • Define the genetic code hash just once (outside subroutine) • Use subroutine to perform lookup my(%genetic_code) = ( "TCA" => "S", # Serine "TCC" => "S", # Serine "TCG" => "S", # Serine "TCT" => "S", # Serine "TTC" => "F", # Phenylalanine "TTT" => "F", # Phenylalanine "TTA" => "L", # Leucine "TTG" => "L", # Leucine "TAC" => "Y", # Tyrosine "TAT" => "Y", # Tyrosine "TAA" => "_", # Stop "TAG" => "_", # Stop . . . "GGA" => "G", # Glycine "GGC" => "G", # Glycine "GGG" => "G", # Glycine "GGT" => "G", # Glycine ); BINF634 FALL13 - LECTURE 4

  35. # codon2aa # Translate a DNA 3-character codon to an amino acid # using hash lookup passed as argument sub codon2aa { my($codon) = @_; $codon = uc $codon; if (exists $genetic_code{$codon}) { return $genetic_code{$codon}; } else { return "X"; # alternative error response: # die "Bad codon: $codon"; } } Version 4 The exists function return true if an element with the given key occurs in the hash. BINF634 FALL13 - LECTURE 4

  36. Wrap up • Program #1 was due tonight. Please talk to me privately either via email or in person if you did not turn it in. • Program #2 is posted on course website • hint: Read about BeginPerlBioinfo.pm in chapter 8 • Read the code in BeginPerlBioinfo.pm • Start NOW! • Read: • Tisdall Ch. 9 • Wall Ch. 5 BINF634 FALL13 - LECTURE 4

More Related