MBG 8680 Perl Workshop

MBG 8680 Perl Workshop. http://www.bioinformatics.wayne.edu Presented by: Daniel Liu (danliu@genetics.wayne.edu). What is Perl? Perl is a versatile programming language. Perl has no cost… Runs on Unix, Linux, Mac, Windows and more.


Presentation Transcript


  1. MBG 8680 Perl Workshop http://www.bioinformatics.wayne.edu Presented By: Daniel Liu ( danliu@genetics.wayne.edu )

  2. What is Perl? • Perl is a versatile programming language • Perl has no cost… • Runs on Unix, Linux, Mac, Windows and more • Perl binaries and source are downloadable from http://www.perl.com • “An installation demo”

  3. A very simple example

     #!/usr/local/bin/perl
     # The line above tells a Unix system where to find perl
     # Example of a comment line
     # Scalar variables begin with $
     $string = "This is a string";
     $number = 42;
     print "What is your name? ";
     $name = <STDIN>;
     chomp($name);    # remove the trailing newline from the input
     if ($name eq "Dan") {
         print "Hello Dan, your secret number is $number\n";
     } else {
         print "Hello $name, you don't have a secret number\n";
     }

  4. Strings • Strings can use single or double quotes • Double quotes allow variables to be interpolated; single quotes do not • We can easily represent DNA as strings • $DNA = 'ACGGGCATGCAAGCTAT'; • print $DNA; • It is easy to concatenate, translate, search, replace, and generate random strings. • “A transcription example”
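
The string operations named on this slide can be sketched as follows. This is a minimal illustration and not the workshop's own "transcription example": it reuses the sequence from the slide, while the variable names and the appended fragment 'GGCAA' are chosen here for illustration.

```perl
use strict;
use warnings;

# DNA stored as an ordinary string (the sequence from the slide).
my $dna = 'ACGGGCATGCAAGCTAT';

# Double quotes interpolate variables; single quotes do not.
my $interpolated = "DNA: $dna";
my $literal      = 'DNA: $dna';

# Transcription: copy the DNA, then translate every T to U.
(my $rna = $dna) =~ tr/T/U/;

# Concatenation uses the dot operator.
my $extended = $dna . 'GGCAA';

# Reverse complement: reverse the string, then swap complementary bases.
my $revcom = scalar reverse $dna;
$revcom =~ tr/ACGT/TGCA/;

print "$interpolated\n";
print "$literal\n";
print "RNA: $rna\n";
print "Extended: $extended\n";
print "Reverse complement: $revcom\n";
```

The `tr///` operator replaces characters one for one, which is why it suits both transcription and complementing.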

  5. Numbers • Numbers can be: • Integers like 0, 4, -5 • Floating point (decimal) like 3.14 • Scientific (exponential) like 6.02E23 • Different bases like hex, octal and binary
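
As a short sketch of the number formats listed above, the literals below show how each is written in Perl source. The variable names are illustrative only.

```perl
use strict;
use warnings;

my $integer    = -5;
my $float      = 3.14;
my $scientific = 6.02E23;       # 6.02 x 10^23
my $hex        = 0xFF;          # hexadecimal literal
my $octal      = 0377;          # octal literal (leading zero)
my $binary     = 0b11111111;    # binary literal

# hex() and oct() convert strings, for example values read from a file.
my $from_hex_string = hex('ff');
my $from_bin_string = oct('0b1010');

printf "%d %d %d\n", $hex, $octal, $binary;
printf "%d %d\n", $from_hex_string, $from_bin_string;
printf "%.2e\n", $scientific;
```

The three literals `0xFF`, `0377`, and `0b11111111` all denote the same value, 255; only the notation differs.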

  6. Arrays • Arrays are ordered collections of values indexed by position starting with 0 • Arrays are indicated by the ‘at sign’ @ • @myarray = ('ACGT', $fragment2, 'GGCAA'); • In this case, $myarray[1] is $fragment2 • Arrays can be copied • @newarray = @myarray; • Array elements can be counted • $elements = @myarray; • Arrays can be sorted • @sorted = sort(@myarray);
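
The array operations on this slide can be put together in one runnable sketch. The value given to `$fragment2` is a made-up placeholder, since the slide does not define it.

```perl
use strict;
use warnings;

my $fragment2 = 'TTGACA';    # hypothetical fragment, not from the slides
my @myarray   = ('ACGT', $fragment2, 'GGCAA');

my $second   = $myarray[1];    # indexing starts at 0, so this is $fragment2
my @newarray = @myarray;       # copies the elements
my $elements = @myarray;       # an array in scalar context gives its length
my @sorted   = sort @myarray;  # the default sort uses string order

push @newarray, 'CCC';         # changing the copy leaves @myarray untouched

print "Second element: $second\n";
print "Count: $elements\n";
print "Sorted: @sorted\n";
print "Copy now has ", scalar(@newarray), " elements\n";
```

Note that a single element is written with `$` (`$myarray[1]`), while the whole array is written with `@`.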

  7. Regular Expressions • Regular expressions are used for pattern matching • $DNA = 'ACGTGTCGATCGT'; • if ($DNA =~ /GATC/) { • print $&; • } • We call “=~” the binding operator: it applies the pattern on the right to the string on the left • Will print ‘GATC’ if the substring is found • Can also be used for filtering • # Remove white space in $string • $string =~ s/\s//g;
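
A slightly fuller sketch of the matching and filtering shown above. It uses a capture group and `$1` as a common alternative to `$&`; the counting step and the sample whitespace string are additions for illustration.

```perl
use strict;
use warnings;

my $dna = 'ACGTGTCGATCGT';

# Match with a capture group; $1 holds the captured text and
# $-[0] holds the offset where the whole match starts.
my ($site, $position) = ('', -1);
if ($dna =~ /(GATC)/) {
    $site     = $1;
    $position = $-[0];
    print "Found $site at offset $position\n";
}

# Count every occurrence of a pattern using the /g modifier in list context.
my $count = () = $dna =~ /CG/g;
print "CG occurs $count times\n";

# Filtering: remove all whitespace from a string.
my $string = " AC GT\tTT\n";
$string =~ s/\s//g;
print "Cleaned: $string\n";
```

The `g` modifier on `s///` makes the substitution apply to every match instead of only the first.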

  8. Programming Loops • if (condition) { } else { } • if (condition) { } elsif (condition) { } else { } • while (condition) { } • for ( init; test; incr ) { } • example: for ($i=0; $i < 10; $i++) { } • foreach $i (@myarray) { }
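
The control structures above can be seen working together in the sketch below. The sequences, the length thresholds, and the labels are made up for illustration.

```perl
use strict;
use warnings;

my @myarray = ('ACGT', 'GGCAATT', 'AT');

# foreach visits each element in turn.
my $total_length = 0;
foreach my $seq (@myarray) {
    $total_length += length $seq;
}

# C-style for loop with an index, combined with if / elsif / else.
my @labels;
for (my $i = 0; $i < @myarray; $i++) {
    my $len = length $myarray[$i];
    my $label;
    if    ($len < 3) { $label = 'short'  }
    elsif ($len < 6) { $label = 'medium' }
    else             { $label = 'long'   }
    push @labels, "$i:$label";
}

# while repeats for as long as its condition is true.
my $dna     = 'ACGTACGT';
my $trimmed = 0;
while (length($dna) > 3) {
    chop $dna;       # remove the last character
    $trimmed++;
}

print "Total length: $total_length\n";
print "Labels: @labels\n";
print "Trimmed $trimmed bases, leaving $dna\n";
```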

  9. Programming Functions (Subroutines) • Best to think of as a program within a program • sub addACGT { • my ($dna) = @_; • $dna .= 'ACGT'; • return $dna; • } • Variables declared with ‘my’ are localized and cannot be accessed outside that function • Pass by value vs. Pass by reference
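
To make the "pass by value vs. pass by reference" point concrete, here is a minimal sketch. `addACGT` follows the slide; `addACGT_in_place` and the sample sequence are hypothetical additions for comparison.

```perl
use strict;
use warnings;

# Pass by value: the sub works on its own copy of the argument.
sub addACGT {
    my ($dna) = @_;          # list assignment copies the first argument
    $dna .= 'ACGT';
    return $dna;
}

# Pass by reference: the sub receives a reference and changes the caller's data.
sub addACGT_in_place {
    my ($dna_ref) = @_;
    ${$dna_ref} .= 'ACGT';
    return;
}

my $original       = 'GGCAA';
my $longer         = addACGT($original);
my $after_by_value = $original;      # still 'GGCAA'

addACGT_in_place(\$original);        # $original itself is modified

print "Returned copy: $longer\n";
print "After pass by value: $after_by_value\n";
print "After pass by reference: $original\n";
```

Writing `my ($dna) = @_;` with parentheses matters: without them, `$dna = @_;` would store the number of arguments instead of the first argument.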

  10. Debugging • To start in debug mode, use perl -d • To list the help menu, type ‘h’, or see ‘man perldebug’ • Useful commands: • Printing a variable: p $dna • Next line: n • Single step (enters subroutines): s • Display a window of code: w (v in Perl 5.8 and later) • To set a breakpoint: b • To continue: c • To watch a variable: W (w in Perl 5.8 and later)

  11. Modules and BioPerl • Why reinvent the wheel when someone has written a module that does what you want? • http://www.cpan.org • We’ll see examples of some later… • BioPerl is a toolkit of Perl modules useful in building bioinformatics solutions in Perl • http://www.bioperl.org • Useful for parsing sequences and reports, handling annotations, running external programs and more!

  12. Beginning Perl for Bioinformatics Functions • On the genetics server, the BeginPerlBioinfo.pm module has been installed and made available • use BeginPerlBioinfo; • This module includes all the functions defined in the book “Beginning Perl for Bioinformatics” by James Tisdall • You can view this file on genetics.wayne.edu • /usr/local/lib/perl5/5.6.1/BeginPerlBioinfo.pm

  13. Design Philosophy • See if someone else already solved the problem • Start with ‘pseudocode’ • Identify inputs, outputs, and design • Edit – Run – Revise (and Save) • Comment your code • When in doubt, use a debugger • Use test cases for correctness, logic and boundaries

  14. Automating Tasks (automate) • Problem • Want to automate a repetitive task • Input • Sequences in FASTA format • Output • Protein Sequences • Design • Identify input files • Open and evaluate input • If input criteria not met, skip to the next input • Otherwise translate into protein
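
The design on this slide can be sketched as below. This is not the workshop's "automate" script: the FASTA text is made up and read from an in-memory string so the example is self-contained, and the input criteria (only A, C, G, T and a length divisible by three) are assumptions chosen for illustration. A real script would open the input files instead.

```perl
use strict;
use warnings;

# Standard genetic code, ordered by T, C, A, G at each codon position.
my @bases = qw(T C A G);
my $amino = 'FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG';
my %codon_table;
my $k = 0;
for my $b1 (@bases) {
    for my $b2 (@bases) {
        for my $b3 (@bases) {
            $codon_table{"$b1$b2$b3"} = substr($amino, $k++, 1);
        }
    }
}

# Translate a DNA string codon by codon ('*' marks a stop codon).
sub translate {
    my ($dna) = @_;
    my $protein = '';
    for (my $i = 0; $i + 3 <= length($dna); $i += 3) {
        $protein .= $codon_table{ substr($dna, $i, 3) };
    }
    return $protein;
}

# Hypothetical input: three made-up FASTA records.
my $fasta = <<'END';
>seq1 meets the criteria
ATGGCC
TGGTAA
>seq2 contains ambiguous bases
ATGNNNTAA
>seq3 length is not a multiple of three
ATGGC
END

# Open and read the input, collecting each sequence under its identifier.
open my $fh, '<', \$fasta or die "Cannot read FASTA text: $!";
my (%seq, @order, $id);
while (my $line = <$fh>) {
    chomp $line;
    if ($line =~ /^>(\S+)/) { $id = $1; push @order, $id; $seq{$id} = ''; }
    elsif (defined $id)     { $seq{$id} .= uc $line; }
}
close $fh;

# If the input criteria are not met, skip to the next input; otherwise translate.
my %protein;
for my $name (@order) {
    my $dna = $seq{$name};
    next unless $dna =~ /^[ACGT]+$/ && length($dna) % 3 == 0;
    $protein{$name} = translate($dna);
    print ">$name\n$protein{$name}\n";
}
```

Only `seq1` passes both checks here; the other two records are skipped without stopping the run, which is the behavior the design calls for.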

  15. Mouse %GC Example (mouse) • Problem • Want to summarize data from a specific text format • Input • Map density table from Ensembl • Output • Summations of selected criteria (%gc) • Design • Open file and split on a delimiter • Keep a running total for %gc • When there is no more input, print the results in a human readable format
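
The split-and-total design above might look like the following sketch. The table layout (tab-delimited columns named chr, start, end, percent_gc) and its values are hypothetical; an actual Ensembl export would need the column positions adjusted.

```perl
use strict;
use warnings;

# Hypothetical tab-delimited table standing in for the real input file.
my $table = join "\n",
    "chr\tstart\tend\tpercent_gc",
    "1\t1\t1000000\t41.5",
    "1\t1000001\t2000000\t43.0",
    "2\t1\t1000000\t39.5",
    "";

open my $fh, '<', \$table or die "Cannot read table: $!";
my $header = <$fh>;                      # skip the column names

my ($total_gc, $rows) = (0, 0);
while (my $line = <$fh>) {
    chomp $line;
    next if $line eq '';
    # Split on the delimiter and keep a running total for %gc.
    my ($chr, $start, $end, $gc) = split /\t/, $line;
    $total_gc += $gc;
    $rows++;
}
close $fh;

# When there is no more input, print the results in a readable form.
my $mean_gc = $rows ? $total_gc / $rows : 0;
printf "Rows: %d\n", $rows;
printf "Mean %%GC: %.2f\n", $mean_gc;
```

Guarding the division with `$rows ?` avoids a divide-by-zero error when the input has no data rows.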

  16. Automating NCBI Example (ncbi) • Problem • Want to download a large number of NCBI records • Input • Search term (just as you would use at Entrez) • Output • GenBank, FASTA, etc. • Design • eutils - http://www.ncbi.nlm.nih.gov/entrez/eutils • Customize example script to fit your query • Use the ‘at’ command to schedule the job to run during off-peak hours, otherwise you may be blocked!
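
As a hedged sketch of how a query for eutils is put together, the code below only builds ESearch and EFetch request URLs; it is not the workshop's example script and it does not contact NCBI. The base URL shown is the current E-utilities address and is an assumption relative to the slide's older link. The helper names, the search term, and the record IDs are placeholders, and the escaping helper assumes ASCII input. Fetching the URLs (for example with LWP) is deliberately left out.

```perl
use strict;
use warnings;

# Assumed current E-utilities base URL (the slide lists an older address).
my $base = 'https://eutils.ncbi.nlm.nih.gov/entrez/eutils';

# Percent-encode everything except unreserved characters (ASCII input assumed).
sub url_escape {
    my ($text) = @_;
    $text =~ s/([^A-Za-z0-9\-_.~])/sprintf('%%%02X', ord $1)/ge;
    return $text;
}

# ESearch: turn a search term into a list of record IDs.
sub esearch_url {
    my ($db, $term, $retmax) = @_;
    return sprintf '%s/esearch.fcgi?db=%s&term=%s&retmax=%d',
        $base, url_escape($db), url_escape($term), $retmax;
}

# EFetch: download the records for a list of IDs in a given format.
sub efetch_url {
    my ($db, $ids, $rettype) = @_;
    return sprintf '%s/efetch.fcgi?db=%s&id=%s&rettype=%s&retmode=text',
        $base, url_escape($db), join(',', @$ids), url_escape($rettype);
}

# Hypothetical query and placeholder IDs, for illustration only.
my $search = esearch_url('nucleotide', 'Mus musculus[Organism] AND p53', 20);
my $fetch  = efetch_url('nucleotide', ['123456', '234567'], 'fasta');

print "$search\n";
print "$fetch\n";
```

The search term is written exactly as it would be typed at Entrez; only the URL encoding is added.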

  17. Parsing GenBank Annotations (genbank) • Problem • Want to find which annotations contain a keyword • Input • GenBank records delimited by ‘//’ in one file • Output • List of byte offsets for the records with a hit • Design • Load search terms and annotations (per record) • For each record, see if any keywords are present • If found, print the offset of the record • If not found, process the next entry until finished
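
The record-by-record search above can be sketched with Perl's input record separator and `tell`. The three records are made up and heavily abbreviated, the keyword is a placeholder, and the data is read from an in-memory string so the sketch is self-contained; this is not the workshop's "genbank" script.

```perl
use strict;
use warnings;

# Three made-up, heavily abbreviated records, each ending in "//".
my $library = <<'END';
LOCUS       FAKE0001
DEFINITION  hypothetical kinase mRNA.
//
LOCUS       FAKE0002
DEFINITION  hypothetical transporter mRNA.
//
LOCUS       FAKE0003
DEFINITION  putative kinase-like protein.
//
END

my @keywords = ('kinase');    # placeholder search terms

open my $fh, '<', \$library or die "Cannot read records: $!";
my @hits;
{
    local $/ = "//\n";              # read one whole record per <$fh>
    my $offset = tell($fh);         # byte offset where the next record starts
    while (my $record = <$fh>) {
        # Record the offset if any keyword appears in this record.
        if (grep { $record =~ /\Q$_\E/i } @keywords) {
            push @hits, $offset;
        }
        $offset = tell($fh);
    }
}
close $fh;

print "Record with a hit starts at byte $_\n" for @hits;
```

A saved offset can later be passed to `seek` to jump straight to a record without rereading the whole file.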

  18. A Quick BioPerl Example • Problem • Want to parse UniGene clusters into a database • Input • UniGene records delimited by ‘//’ in one file • Output • Insert statements to be used with the database • Design • Can use Bio::Cluster::UniGene and Bio::ClusterIO • A while loop can parse each record • Use BioPerl functions to parse out the data • Write out each insert statement before processing the next record

  19. More Uses • Web Automation (LWP) • This is a module that contains numerous subroutines for automating web tasks • Examples include automatic processing of web forms, web spidering, and web mirroring • See ‘Perl & LWP’ by Sean Burke • Database Interaction (DBI/DBD) • These are modules used for interaction with a database such as Oracle or MySQL • See ‘Programming the Perl DBI’ by Tim Bunce • If you can imagine it, it is probably out there!

  20. More Resources • ‘Learning Perl’ by Randal L. Schwartz • ‘Perl By Example’ by Ellie Quigley • ‘Perl Cookbook’ by Christiansen & Torkington • ‘Beginning Perl for Bioinformatics’ by James Tisdall • http://www.perl.com for source and binaries • http://www.perldoc.com for Perl Documentation • http://www.bioperl.org for BioPerl • http://www.cpan.org for modules • http://cs.camden.rutgers.edu/perl/ for a tutorial • http://paul.rutgers.edu/~mcgrew/perltutor/ for a tutorial

  21. Questions? • Dan Liu danliu@wayne.edu • Dan’s Top 5 Tips! • http://bioinformatics.org • http://www.bio-itworld.com/ • http://www.bio-mirror.net/ • http://workbench.sdsc.edu/ • http://gchelpdesk.ualberta.ca/news/news.php • Have a great summer!!!
