Programming and Perl for Bioinformatics Part I
This guide serves as an introduction to Perl programming for bioinformatics, focusing on basic syntax and essential data types. Learn how to print messages, handle scalar values, and manipulate strings, including DNA sequences. Understand variable assignments, operators, comments, and data types like scalars, arrays, and hashes. Explore mathematical functions, string manipulation techniques, and conditional statements to control the flow of your Perl programs. Ideal for beginners interested in applying Perl to bioinformatics tasks.
A Taste of Perl: print a message
• perltaste.pl: Greet the entire world.
    #!/usr/bin/perl
    #greet the entire world
    $x = 6e9;
    print "Hello world!\n";
    print "All $x of you!\n";
• #!/usr/bin/perl - command interpretation header
• #greet the entire world - a comment
• $x = 6e9; - variable assignment statement
• the two print lines - function calls (output statements)
Basic Syntax and Data Types
• Whitespace outside of quoted strings doesn't matter to Perl. One can write all statements on one line
• Perl statements end in a semicolon ; just like C (it is optional only after the last statement in a block)
• Comments begin with '#' and Perl ignores everything after the # until the end of the line.
• Example: #this is a comment
• Perl has three basic data types:
  • scalar
  • array (list)
  • associative array (hash)
Scalars
• Scalar variables begin with '$' followed by an identifier
• Example: $this_is_a_scalar;
• An identifier is composed of upper or lower case letters, digits, and underscore '_'. A user-defined name must start with a letter or underscore. Identifiers are case sensitive (like all of Perl)
    $progname = "first_perl";
    $numOfStudents = 4;
• = sets the content of $progname to be the string "first_perl" and $numOfStudents to be the integer 4
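Scalar Values
• Numerical values
  • integer: 5, 0, -307 (a numeric string such as "3" is converted to a number when used in arithmetic)
  • floating point: 6.2e9, -4022.33
  • hexadecimal/octal: 0xd4f, 0477
  • binary: 0b011011
• NOTE: Perl stores numbers internally as native integers or as double-precision floating-point values and converts between them automatically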
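As a minimal sketch of the literal notations listed above, the following script prints each value in decimal. The variable names are illustrative and not part of the original slides.

```perl
use strict;
use warnings;

# The same kind of value (a plain number) written in different notations.
my $hex    = 0xd4f;      # hexadecimal literal
my $octal  = 0477;       # octal literal (leading zero)
my $binary = 0b011011;   # binary literal
my $float  = 6.2e9;      # scientific notation

# A numeric string is converted to a number when used in arithmetic.
my $sum = "3" + 4;

print "0xd4f    is $hex\n";
print "0477     is $octal\n";
print "0b011011 is $binary\n";
print "6.2e9    is $float\n";
print "\"3\" + 4  is $sum\n";
```

Perl prints all of these in decimal, whichever notation was used to write them in the source.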
Do the Math
• Arithmetic operators work pretty much as you would expect:
    4+7   6*4   43-27   256/12   2/(3-5)
• Example
    #!/usr/bin/perl
    print "4+5\n";
    print 4+5 , "\n";
    print "4+5=" , 4+5 , "\n";
    $myNumber = 88;
• Note: use commas to separate multiple items in a print statement
• What will be the output?
    4+5
    9
    4+5=9
Scalar Values
• String values
• Example:
    $day = "Monday";
    print "Happy Monday!\n";
    print "Happy $day!\n";
    print 'Happy Monday!\n';
    print 'Happy $day!\n';
• Double-quoted: interpolates (replaces a variable name or control character with its value)
• Single-quoted: no interpolation done (as-is)
• What will be the output? (the single-quoted \n is printed literally, so the last two strings share one line)
    Happy Monday!<newline>
    Happy Monday!<newline>
    Happy Monday!\nHappy $day!\n
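One way to see the difference between the two quoting styles is to compare string lengths. This is a small sketch with illustrative variable names; in the single-quoted string, $day and \n are kept as literal characters.

```perl
use strict;
use warnings;

my $day = "Monday";

my $double = "Happy $day!\n";   # $day and \n are interpolated
my $single = 'Happy $day!\n';   # kept exactly as typed

print $double;
print $single, "\n";            # add a real newline after the literal text

# The interpolated string ends in one newline character;
# the single-quoted one ends in a backslash followed by the letter n.
print "double-quoted length: ", length($double), "\n";
print "single-quoted length: ", length($single), "\n";
```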
String Manipulation
Concatenation
    $dna1 = "ACTGCGTAGC";
    $dna2 = "CTTGCTAT";
• juxtapose in a string assignment or print statement
    $new_dna = "$dna1$dna2";
• Use the concatenation operator '.'
    $new_dna = $dna1 . $dna2;
Substring
    $dna = "ACTGCGTAGC";
    $exon1 = substr($dna,2,5); # TGCGT
• substr arguments: the string, the starting offset (2, counting from 0), and the length of the substring (5)
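The concatenation and substring operations can be tried together in a short script. This is a sketch using the sequences from the slide; the negative-offset call is an extra illustration of substr, not something the slide covers.

```perl
use strict;
use warnings;

my $dna1 = "ACTGCGTAGC";
my $dna2 = "CTTGCTAT";

# Two equivalent ways to join the strings.
my $by_interpolation = "$dna1$dna2";
my $by_operator      = $dna1 . $dna2;

# substr(STRING, OFFSET, LENGTH): offsets start at 0.
my $exon1 = substr($dna1, 2, 5);

# A negative offset counts from the end of the string.
my $tail = substr($dna1, -3);

print "joined: $by_operator\n";
print "exon1:  $exon1\n";
print "tail:   $tail\n";
```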
Substitution
• DNA transcription: T → U
• Substitution operator s/// :
    $dna = "GATTACATACACTGTTCA";
    $rna = $dna;
    $rna =~ s/T/U/g; # "GAUUACAUACACUGUUCA"
• =~ is a binding operator: it tells Perl to examine the contents of $rna for a match to the pattern
• Ex: Start with $dna = "gaTtACataCACTgttca"; and do the same as above. What will be the output?
Example
• transcribe.pl:
    $dna = "gaTtACataCACTgttca";
    $rna = $dna;
    $rna =~ s/T/U/g;
    print "DNA: $dna\n";
    print "RNA: $rna\n";
• Does it do what you expect? If not, why not?
• Patterns in substitution are case-sensitive! What can we do?
  • Convert all letters to upper/lower case (preferred when possible)
  • If we want to retain mixed case, use the transliteration/translation operator tr///
    $rna =~ tr/tT/uU/; #replace all t by u, all T by U
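The two remedies for the case-sensitivity problem can be compared side by side. In this sketch, transcribe and transcribe_keep_case are hypothetical helper names chosen for illustration; they are not part of transcribe.pl.

```perl
use strict;
use warnings;

# Remedy 1: normalise to upper case first, then replace T by U.
sub transcribe {
    my ($dna) = @_;
    my $rna = uc $dna;
    $rna =~ tr/T/U/;
    return $rna;
}

# Remedy 2: keep the mixed case and map t->u, T->U in one pass.
sub transcribe_keep_case {
    my ($dna) = @_;
    my $rna = $dna;
    $rna =~ tr/tT/uU/;
    return $rna;
}

my $dna = "gaTtACataCACTgttca";
print "DNA:        $dna\n";
print "upper case: ", transcribe($dna), "\n";
print "mixed case: ", transcribe_keep_case($dna), "\n";
```

Both helpers copy their argument before modifying it, so the original $dna is left unchanged.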
Case conversion
    $string = "acCGtGcaTGc";
• Upper case:
    $dna = uc($string); # "ACCGTGCATGC"
    or $dna = uc $string;
    or $dna = "\U$string";
• Lower case:
    $dna = lc($string); # "accgtgcatgc"
    or $dna = "\L$string";
• Sentence case:
    $dna = ucfirst(lc($string)); # "Accgtgcatgc"
    or $dna = "\u\L$string";
• Note: ucfirst alone changes only the first letter, so ucfirst($string) gives "AcCGtGcaTGc"
Reverse Complement
    5'-A C G T C T A G C . . . . G C A T-3'
    3'-T G C A G A T C G . . . . C G T A-5'
• Reverse: reverses a string
    $string = "ACGTCTAGC";
    $string = reverse($string); # "CGATCTGCA"
• Complementation: use transliteration operator
    $string =~ tr/ACGT/TGCA/;
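The two steps combine naturally into one subroutine. The following is a minimal sketch; reverse_complement is an illustrative name, and handling lower-case bases in the tr/// lists is an added assumption beyond the slide.

```perl
use strict;
use warnings;

# Returns the reverse complement of a DNA string.
sub reverse_complement {
    my ($dna) = @_;
    my $revcomp = scalar reverse $dna;      # force scalar context: reverse the characters
    $revcomp =~ tr/ACGTacgt/TGCAtgca/;      # complement each base, preserving case
    return $revcomp;
}

my $strand = "ACGTCTAGC";
print "strand:             5'-$strand-3'\n";
print "reverse complement: 5'-", reverse_complement($strand), "-3'\n";
```

The explicit scalar matters because reverse in list context (for example, directly inside a print argument list) reverses the order of the list elements rather than the characters of a string.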
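More on String Manipulation (optional)
• String length: length($dna)
• Index: index STR,SUBSTR,POSITION returns the position of the first occurrence of SUBSTR in STR at or after POSITION, or -1 if it is not found
    index($strand, $primer, 2)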
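A short sketch of length and index in use. The strand and primer values here are made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical example data.
my $strand = "ACTGCGTAGCTGCA";
my $primer = "TGC";

my $len = length($strand);

# First occurrence, searching from the start of the string.
my $first = index($strand, $primer);

# Next occurrence, searching from just past the first hit.
my $second = index($strand, $primer, $first + 1);

# index returns -1 when there is no further match.
my $third = index($strand, $primer, $second + 1);

print "length: $len\n";
print "first hit at position $first\n";
print "second hit at position $second\n";
print "third search returned $third\n";
```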
Flow Control
Conditional Statements
• parts of code executed depending on truth value of a logical statement
• "truth" (logical) values in Perl:
    false = {0, 0.0, 0e0, "0", "", undef}, default "" (what a false comparison returns)
    true = anything else, default 1 (what a true comparison returns)
    ($a, $b) = (75, 83);
    if ( $a < $b ) {
        $a = $b;
        print "Now a = b!\n";
    }
    if ( $a > $b ) { print "Yes, a > b!\n" } # Compact
if/else/elsif
• allows for multiple branching/outcomes
    $a = rand();
    if ( $a < 0.25 ) {
        print "A";
    } elsif ( $a < 0.50 ) {
        print "C";
    } elsif ( $a < 0.75 ) {
        print "G";
    } else {
        print "T";
    }
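The same branching can be wrapped in a subroutine and called repeatedly to build a random sequence. This is a sketch only: random_base and the length of 20 are illustrative choices, and the output differs on every run.

```perl
use strict;
use warnings;

# Hypothetical helper: returns one of A, C, G, T with equal probability.
sub random_base {
    my $r = rand();                 # random number in [0, 1)
    if    ($r < 0.25) { return "A"; }
    elsif ($r < 0.50) { return "C"; }
    elsif ($r < 0.75) { return "G"; }
    else              { return "T"; }
}

# Build a 20-base sequence one base at a time.
my $sequence = "";
for (my $i = 0; $i < 20; $i++) {
    $sequence = $sequence . random_base();
}

print "Random sequence: $sequence\n";
```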
Conditional Loops
    while ( statement ) { commands … }
• repeats commands until statement is no longer true
    do { commands } while ( statement );
• same as while, except commands executed at least once
• NOTE the ';' after the while statement!!
Short-circuiting commands: next and last
• next; #skips the rest of the current iteration and starts the next one
• last; #jumps out of the loop completely
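To illustrate next and last inside a while loop, the sketch below walks along a sequence three bases at a time. The sequence, the use of "NNN" for an unreadable codon, and "TAG" as the only stop codon checked are all simplifying assumptions.

```perl
use strict;
use warnings;

# Hypothetical example sequence; "NNN" marks an unreadable codon.
my $dna    = "ATGGCCNNNAAATAGGGC";
my $pos    = 0;
my $codons = 0;

while ($pos + 3 <= length($dna)) {
    my $codon = substr($dna, $pos, 3);
    $pos += 3;                      # advance before any next/last

    if ($codon eq "NNN") {
        next;                       # skip this codon, continue with the following one
    }
    if ($codon eq "TAG") {
        last;                       # stop codon: leave the loop completely
    }

    $codons++;
    print "codon: $codon\n";
}

print "Readable codons before the stop: $codons\n";
```

Note that $pos is advanced before the next statement can run. If the increment came after it, the loop would keep re-reading the same unreadable codon forever.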
while Example:
    while ($alive) {
        if ($needs_nutrients) {
            print "Cell needs nutrients\n";
        }
    }
• Any problem?
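Nothing inside the loop body above ever changes $alive, so once entered the loop cannot end. One possible repair, sketched here with a made-up $nutrients counter, is to update the loop condition from inside the loop.

```perl
use strict;
use warnings;

my $alive     = 1;
my $nutrients = 3;   # hypothetical supply, in arbitrary units
my $cycles    = 0;

while ($alive) {
    if ($nutrients > 0) {
        $nutrients--;
        $cycles++;
        print "Cell consumed a nutrient, $nutrients left\n";
    } else {
        print "No nutrients left\n";
        $alive = 0;                 # the condition changes, so the loop can end
    }
}

print "Cell survived $cycles cycles\n";
```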
for and foreach loops
• Execute a code loop a specified number of times, or for a specified list of values
• for and foreach are identical: use whichever you want
• Incremental loop ("C style"):
    for ( $i=0 ; $i < 50 ; $i++ ) {
        $x = $i*$i;
        print "$i squared is $x.\n";
    }
• Loop over list ("foreach" loop):
    foreach $name ( "Billy", "Bob", "Edwina" ) {
        print "$name is my friend.\n";
    }
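Both loop styles can be applied to sequence data. As a sketch (variable names are illustrative), a C-style loop can count G and C bases position by position, and a foreach loop can visit a list of sequences.

```perl
use strict;
use warnings;

my $dna = "ACTGCGTAGC";
my $gc  = 0;

# C-style loop: visit every position in the string.
for (my $i = 0; $i < length($dna); $i++) {
    my $base = substr($dna, $i, 1);
    if ($base eq "G" or $base eq "C") {
        $gc++;
    }
}

my $percent = 100 * $gc / length($dna);
print "GC content of $dna: $gc bases ($percent%)\n";

# foreach loop: visit every item in a list.
foreach my $seq ("ACTGCGTAGC", "CTTGCTAT") {
    print "$seq has ", length($seq), " bases.\n";
}
```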