This comprehensive guide covers essential Perl programming concepts for bioinformatics. Learn effective debugging strategies, utilize variable interpolation, and manage input/output operations with text files. Explore decision-making with conditional statements such as if, else, and more. The course also delves into loops to automate processes. Designed for biologists and researchers, this resource enhances programming skills crucial for biological data analysis.
Perl for Bioinformatics, Part 2
Stuart Brown, NYU School of Medicine
Sources
• Beginning Perl for Bioinformatics
  • James Tisdall, O’Reilly Press, 2000
• Using Perl to Facilitate Biological Analysis, in Bioinformatics: A Practical Guide (2nd Ed.)
  • Lincoln Stein, Wiley-Interscience, 2001
• Introduction to Programming and Perl
  • Alan M. Durham, Computer Science Dept., Univ. of São Paulo, Brazil
Debugging
• Hopefully you were lucky enough to have some bugs in your programs from the first Perl exercise.
• Test each line as you write it
• Insert extra print statements to check the values of your variables
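As a small illustration of the print-statement approach, here is a sketch (the loop and variable names are made up for the example) that prints a debugging line on each pass. Sending it to STDERR keeps the debug output separate from the program's real output on STDOUT:

```perl
use strict;
use warnings;

my $total = 0;
my $count = 1;
while ($count <= 3) {
    $total = $total + $count;
    # Debugging print: show the variables on every pass through the loop
    print STDERR "DEBUG: count=$count total=$total\n";
    $count = $count + 1;
}
print "Total: $total\n";    # the real result
```

Once the program works, delete the DEBUG lines or comment them out.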
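Perl Debugging Help
• Add -w to the first line of your programs (use the path to perl on your system, e.g. /usr/local/bin/perl):
    #!/usr/bin/perl -w
  • turns on ‘warnings’ about suspicious code
• Add use strict; as the 2nd line of your programs
  • every variable must be declared (with my) before it is used, so misspelled variable names are caught
  • giving each variable a starting value (such as 0 or an empty string) avoids ‘uninitialized value’ warnings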
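A minimal program skeleton following this advice might look like the sketch below. It also adds use warnings, which does the same job as -w for the current file; the shebang path is an assumption and differs between systems.

```perl
#!/usr/bin/perl -w
use strict;     # every variable must be declared with 'my' before use
use warnings;   # same effect as -w, limited to this file

my $count    = 0;    # declared and given a starting value
my $sequence = '';   # empty string, ready for concatenation

$count    = $count + 1;
$sequence = $sequence . 'ACGT';

print "count=$count sequence=$sequence\n";

# Without the 'my' declarations, 'use strict' stops the program before it
# runs, with an error that begins:
#   Global symbol "$count" requires explicit package name
```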
Variable “Interpolation”
• A variable holds a value:
    $value = 6;
• When you print the variable, Perl gives its value rather than its name:
    print $value;                       # prints 6
• If you put a variable inside double quotes, Perl substitutes its value (this is called variable interpolation):
    print "The result is $value\n";     # prints: The result is 6
• If you use single quotes, the variable name is printed as-is (no interpolation, and \n is not turned into a newline either):
    print 'The result is $value\n';     # prints: The result is $value\n
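The two quoting styles side by side, as a small sketch that stores the strings in variables so you can compare them:

```perl
use strict;
use warnings;

my $value = 6;

# Double quotes: $value is replaced by its value and \n becomes a newline
my $double = "The result is $value\n";

# Single quotes: everything is literal, including '$value' and '\n'
my $single = 'The result is $value\n';

print $double;          # The result is 6
print $single, "\n";    # The result is $value\n
```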
Input
• A Perl program can take input from the keyboard
• The angle bracket operator (<>) reads one line of input (from the keyboard, when no file names are given on the command line)
• Usually the line is assigned to a variable:
    print "Please type a number: ";
    $num = <>;
    print "Your number is $num\n";
chomp
• When data is entered from the keyboard, Perl waits for the Enter key to be pressed
• But the captured string includes the newline character (\n) at its end
• Perl uses the function chomp to remove that trailing newline:
    print "Enter your name: ";
    $name = <>;
    print "Hello $name, happy to meet you!\n";   # the newline splits this greeting over two lines
    chomp $name;
    print "Hello $name, happy to meet you!\n";   # now printed on one line
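To see exactly what chomp removes, this sketch fakes the keyboard input by putting a newline at the end of a string (the name is only an example); the square brackets make the newline visible:

```perl
use strict;
use warnings;

# What <> would return if the user typed "Stuart" and pressed Enter
my $name = "Stuart\n";

print "Before chomp: [$name]\n";   # the closing ] lands on the next line
chomp $name;                        # removes the trailing newline
print "After chomp:  [$name]\n";   # [Stuart]

# chomp removes only a trailing newline; calling it again changes nothing.
# It returns the number of characters it removed.
my $removed = chomp $name;
print "Removed on second call: $removed\n";   # 0
```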
Working with Text Files
• To do real work, Perl has to read data from text files and write results to output files
• This is done in two steps: first open the file, then read from it or write to it
• When you open a file, you give it a name within the script, known as a filehandle
• Use the open command:
    open FILE1, '/u/schmoj01/Seqs/protein1.seq';
Read From the File
• Once the file is open, you can read from it using the <> operator
  • (put the filehandle between the angle brackets)
• Perl reads a file one line at a time: each time you read from the filehandle, you get the next line
    open FILE1, '/u/prot1.seq';
    $line1 = <FILE1>;
    chomp $line1;
    $line2 = <FILE1>;
    …etc
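A sketch of the same reading steps that runs without a real sequence file: it opens a string as if it were a file (Perl allows this by passing a reference to a scalar). It uses the three-argument form of open plus or die, a common way to stop the script and report the system's reason (in $!) when a real file cannot be opened. The protein fragment is made-up data.

```perl
use strict;
use warnings;

# With a real file you would write, e.g.:
#   open(my $in, '<', '/u/prot1.seq') or die "Cannot open /u/prot1.seq: $!";
my $fake_file = "MKTAYIAKQR\nQISFVKSHFS\n";
open(my $in, '<', \$fake_file) or die "Cannot open in-memory file: $!";

my $line1 = <$in>;    # first line
chomp $line1;
my $line2 = <$in>;    # each read returns the next line
chomp $line2;
my $line3 = <$in>;    # no lines left: returns undef at end of file
close $in;

print "Line 1: $line1\n";
print "Line 2: $line2\n";
print "End of file reached\n" unless defined $line3;
```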
Write to a File
• Writing to a file is similar to reading from it
• Put > in front of the file name to open the file for writing:
    open FILE1, '>/u/prot1.seq';
• This creates a new file with that name, or overwrites an existing file
• Use >> instead to append text to the end of an existing file
• Print to the file using the filehandle (no comma after the filehandle):
    print FILE1 $data1;
• Close the file when you are done:
    close FILE1;
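A sketch of > versus >>: it writes one line, appends a second, and reads the file back. To stay self-contained it uses the core module File::Temp to get a scratch file name instead of a real path like /u/prot1.seq; the lines written are placeholders.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);   # core module; used only to get a scratch file

# Scratch file for the demo, deleted automatically when the program exits
my (undef, $outfile) = tempfile(UNLINK => 1);

# '>' creates the file, or empties it if it already exists
open(my $out, '>', $outfile) or die "Cannot write $outfile: $!";
print $out "first run\n";      # no comma between filehandle and data
close $out;

# '>>' adds to the end of the existing file
open($out, '>>', $outfile) or die "Cannot append to $outfile: $!";
print $out "second run\n";
close $out;

# Read the file back to check what it holds
open(my $in, '<', $outfile) or die "Cannot read $outfile: $!";
my $first  = <$in>;
my $second = <$in>;
close $in;
print $first, $second;         # first run / second run
```

Opening the file with > again at this point would throw away both lines.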
Making Decisions
• Useful programs must be able to make some decisions on their own
• The if statement is very powerful
• It is generally used together with numerical or string comparison operators:
    numerical: ==, !=, >, <, >=, <=
    strings:   eq, ne, gt, lt, ge, le
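Choosing the wrong family of operators is a common source of bugs. This sketch shows how the same values compare differently as numbers and as strings:

```perl
use strict;
use warnings;

my $x = "10";
my $y = "9";

# Numeric comparison: 10 is greater than 9
my $num_gt = ($x > $y) ? 1 : 0;            # 1

# String comparison goes character by character, and "1" sorts before "9"
my $str_gt = ($x gt $y) ? 1 : 0;           # 0

# == compares numbers, eq compares text
my $same_number = (1.0 == 1) ? 1 : 0;      # 1
my $same_string = ("1.0" eq "1") ? 1 : 0;  # 0

print "10 > 9: $num_gt;  '10' gt '9': $str_gt\n";
print "1.0 == 1: $same_number;  '1.0' eq '1': $same_string\n";
```

For DNA letters and other text, use the string operators, e.g. ($base eq 'G').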
True/False
• Perl relies on the concept of True/False decisions.
• A comparison such as ($x < 0) is true or false depending on the values being compared.
• The not operator ! reverses it:
    print "not a negative number\n" if !($x < 0);
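Which values count as false? In Perl they are undef, the empty string, the number 0 and the string '0'; everything else is true. The helper truth below is hypothetical, written only to make this visible:

```perl
use strict;
use warnings;

# Hypothetical helper: report how Perl treats a value in a true/false test
sub truth {
    my ($v) = @_;
    return $v ? 'true' : 'false';
}

print "undef is ", truth(undef), "\n";   # false
print "0     is ", truth(0),     "\n";   # false
print "'0'   is ", truth('0'),   "\n";   # false
print "''    is ", truth(''),    "\n";   # false
print "'0.0' is ", truth('0.0'), "\n";   # true: only the exact string '0' is false
print "-3    is ", truth(-3),    "\n";   # true

# A comparison produces a true/false value that if can test
my $x = 5;
print "x is not a negative number\n" if !($x < 0);
```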
Conditional Blocks
• An if test can control multiple lines of commands:
    print "Enter your age: ";
    $age = <>;
    chomp $age;
    if ($age < 21) {
        print "You are too young for this kind of work!\n";
        die "too young";
    }
    print "You are old enough to know better!\n";
• If the test is true, Perl executes all the command lines inside the { } brackets. If not, it skips past the closing } to the statements below.
• if evaluates the condition in parentheses, which must be true or false
• Note: the conditional block is indented
  • Perl doesn’t care about indentation, but it makes your code easier for humans to read
• die is a special function: it stops your script and prints its message
  • Often used to test whether keyboard input is valid or whether an input file exists
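A sketch that moves the age test into a subroutine and uses die for impossible input. The helper name check_age and the 0 to 150 range are assumptions made for the example. Ending a die message with \n stops Perl from appending the script name and line number to it.

```perl
use strict;
use warnings;

# Hypothetical helper: stop the program on unbelievable input,
# otherwise return the message for the age given
sub check_age {
    my ($age) = @_;
    if ($age < 0 or $age > 150) {
        die "Age $age is not believable\n";
    }
    if ($age < 21) {
        return "You are too young for this kind of work!";
    }
    return "You are old enough to know better!";
}

print check_age(30), "\n";
```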
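Else & Elsif
• Instead of just letting the script go on when it fails the if test, you can designate a second block of code for the “or else” condition
• You can also perform multiple tests using elsif (note the spelling: not elseif)
    if ($A == 10) {
        print "yadda yadda\n";          # do some stuff
    } elsif ($A > 10) {
        print "yowsa yowsa\n";          # do different stuff
    } elsif ($A < 10) {
        print "do this other stuff\n";
    } else {
        print "if it ain't =, >, or <, then I'm stumped\n";
        die "not a number";
    }
• Each condition goes in parentheses, and == tests equality (a single = would assign 10 to $A)
• Caution: in a numeric comparison Perl treats a non-numeric string as 0 (with a warning under -w), so the else block above is rarely reached; check that input is a number before comparing it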
Loops
• OK, we’ve got variables, input & output, and decisions. Now we need loops.
• Loops test a condition and repeat a block of code based on the result
• while loops repeat as long as the condition is true:
    $count = 1;
    while ($count <= 10) {
        print "$count bottles of pop\n";
        $count = $count + 1;
    }
    print "POP!\n";
[Try this program yourself]
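Read a File: line by line
    open FILE1, '/u/doej01/prot1.seq';
    $my_sequence = '';                 # start with an empty string
    while ($line = <FILE1>) {
        chomp($line);
        $my_sequence = $my_sequence . $line;
    }
    close FILE1;
• The while loop ends when <FILE1> reaches the end of the file
• Each line is appended without its newline, so the whole file ends up in the single variable $my_sequence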
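The same loop as a runnable sketch, reading from a made-up three-line sequence held in a string instead of /u/doej01/prot1.seq. It uses .=, Perl's shorthand for appending to a string:

```perl
use strict;
use warnings;

# Stand-in for a sequence file split over several lines
my $file_contents = "ATGGCC\nTTAGCA\nGGT\n";
open(my $in, '<', \$file_contents) or die "Cannot open: $!";

my $my_sequence = '';          # start empty so concatenation raises no warning
while (my $line = <$in>) {     # stops when <$in> returns undef at end of file
    chomp $line;
    $my_sequence .= $line;     # same as $my_sequence = $my_sequence . $line
}
close $in;

print "$my_sequence\n";            # ATGGCCTTAGCAGGT
print length($my_sequence), "\n";  # 15
```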
Arrays
• It is awkward to store a large DNA sequence in one variable, or to create many variables for a list of numbers
• Perl has a type of variable called an “array” that can store a list of data:
  • multiple lines of a text file
  • a list of numbers
  • a list of words
• Array variable names begin with an “@” symbol:
    @numbers = (1, 2, 45, 234, 11);
Bioinformatics Uses Arrays
• Bioinformatics data often comes in the form of lists:
  • tab-delimited lists
  • multi-line text files
• Arrays are handy because the entries are indexed
• You can grab any entry directly by its index. Note that the index starts at zero, so $numbers[3] is the fourth number:
    @numbers = (1, 2, 45, 234, 11);
    print "$numbers[3]\n";     # prints 234
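A few more ways to index the same array, as a short sketch:

```perl
use strict;
use warnings;

my @numbers = (1, 2, 45, 234, 11);

print "$numbers[0]\n";     # 1   (first element: index 0)
print "$numbers[3]\n";     # 234 (fourth element: index 3)
print "$numbers[-1]\n";    # 11  (negative indexes count back from the end)

my $size = scalar(@numbers);   # number of elements: 5
my $last = $#numbers;          # index of the last element: 4
print "$size elements, last index $last\n";
```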
Read a File into an Array
• Rather than reading a file one line at a time into a scalar variable, it is often helpful to read the entire file into an array, one line per element:
    open FILE1, '/u/doej01/prot1.seq';
    @DNA = <FILE1>;
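Reading into an array keeps each line's newline, and chomp can remove them from every element at once. A sketch, again using a made-up sequence held in a string instead of a real file:

```perl
use strict;
use warnings;

my $file_contents = "ATGGCC\nTTAGCA\nGGT\n";
open(my $in, '<', \$file_contents) or die "Cannot open: $!";
my @DNA = <$in>;     # in list context, <> returns every remaining line
close $in;

print scalar(@DNA), " lines read\n";    # 3 lines read
print "First line: $DNA[0]";            # still ends with a newline
chomp @DNA;                             # chomp every element at once
print "First line after chomp: $DNA[0]\n";
```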
join & substr
• join combines the elements of an array into a single scalar variable (a string):
    $DNA = join('', @DNA);
  • '' is the spacer placed between elements (empty here); @DNA is the array to join
• substr takes characters out of a string:
    $letter = substr($DNA, $position, 1);
  • $DNA is the string, $position is where in the string to start (counting from 0), and 1 is how many letters to take
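A sketch combining both functions on made-up sequence lines; it also shows a non-empty spacer for comparison:

```perl
use strict;
use warnings;

# e.g. lines read from a file into an array, already chomped
my @DNA = ("ATGGCC", "TTAGCA", "GGT");

my $DNA = join('', @DNA);           # '' = no spacer between elements
print "$DNA\n";                     # ATGGCCTTAGCAGGT

my $first  = substr($DNA, 0, 1);    # position 0 is the first letter: A
my $codon2 = substr($DNA, 3, 3);    # three letters starting at position 3: GCC
my $with_dashes = join('-', @DNA);  # non-empty spacer: ATGGCC-TTAGCA-GGT
print "$first $codon2 $with_dashes\n";
```

The scalar $DNA and the array @DNA are separate variables; Perl tells them apart by the $ or @ in front.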
Exercise
• Read a DNA sequence from a text file
• Calculate the %GC content
• What about non-DNA characters in the file?
  • carriage returns and blank spaces
  • N’s or X’s or unexpected letters
• Write the output to the screen and to a file
  • use append so that the file grows as you run this program on additional sequences
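One possible approach to the GC calculation, as a sketch rather than the intended solution. The subroutine name gc_percent is hypothetical, and it assumes that only A, C, G and T count toward the total, so newlines, spaces, N’s and X’s are simply skipped; lower-case letters are converted to upper case first.

```perl
use strict;
use warnings;

# Hypothetical helper. Assumption: only A, C, G and T are counted;
# every other character is ignored.
sub gc_percent {
    my ($sequence) = @_;
    my $dna = uc($sequence);        # treat lower case like upper case
    my $gc    = 0;
    my $total = 0;
    my $position = 0;
    while ($position < length($dna)) {
        my $base = substr($dna, $position, 1);
        if ($base eq 'G' or $base eq 'C') {
            $gc++;                  # ++ adds 1
            $total++;
        } elsif ($base eq 'A' or $base eq 'T') {
            $total++;
        }                           # anything else is skipped
        $position++;
    }
    return 0 if $total == 0;        # avoid dividing by zero
    return 100 * $gc / $total;
}

my $percent = gc_percent("ATGGCC\nttagca NNGGT");
printf "GC content: %.1f%%\n", $percent;   # 8 G/C out of 15 bases
```

For the file part of the exercise, open the output file with >> as on the Write to a File slide and print the result to that filehandle as well.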