
Perl Programming for Biologists, Part 2: Tue Aug 28th 2007



  1. Perl Programming for Biologists, Part 2: Tue Aug 28th 2007
     Yannick Pouliot, PhD
     Bioresearch Informationist
     Lane Medical Library & Knowledge Management Center

  2. To Dos
     • Close all programs other than IE on your laptop
     • Log into virtual room
     • YP: log into Safari

  3. To Do - 2
     • Please download all class materials from http://lane.stanford.edu/howto/index.html?id=_2593

  4. Class Focus for Session #2
     • Converting file contents
     • Introducing BioPerl
     • Perl and relational databases
     And remember: Ask LOTS OF QUESTIONS

  5. Cautions - Reminder
     • All examples pertain to MS Office 2003
        • Unclear what is to be expected for MS Office 2007
     • All contents pertain to Perl 5.x, not 6.x
        • Versions 5 and 6 are NOT compatible
        • Version 5 is far more common, so this is not much of an issue

  6. Questions from last session?

  7. Part 1: Converting file contents

  8. Converting Data Stored in Flatfiles
     • Input: ExampleOutputExcel3.csv
        • File generated last week by Excel3.pl
     • Let’s look at and run Convert1.pl → Convert5.pl
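
The contents of Convert1.pl through Convert5.pl are not reproduced in this transcript. As a minimal sketch of the kind of conversion discussed, assuming a simple comma-separated file with no quoted or embedded commas, the following turns each CSV line into a tab-delimited line. The sub name csv_line_to_tsv and the sample rows are hypothetical placeholders, not taken from the class files.

```perl
use strict;
use warnings;

# Hypothetical helper: convert one simple CSV line to tab-delimited text.
# Assumes fields contain no quoted or embedded commas.
sub csv_line_to_tsv {
    my ($line) = @_;
    $line =~ s/\r?\n\z//;                # strip trailing newline (Unix or Windows)
    my @fields = split /,/, $line, -1;   # -1 keeps trailing empty fields
    s/^\s+|\s+$//g for @fields;          # trim whitespace around each field
    return join "\t", @fields;
}

# Made-up sample rows stand in for the contents of a real file.
my @rows = ("gene,accession,length\n", "geneA, ACC0001 ,1200\n");
print csv_line_to_tsv($_), "\n" for @rows;
```

For real files that may contain quoted fields, a dedicated parser such as the Text::CSV module from CPAN is the safer choice than a plain split.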

  9. Part 2: BioPerl

  10. BioPerl: Overview
      • BioPerl comprises more than 1,000 modules divided into 7 packages
         • Not all in 1.4
         • 1.4 = stable release

  11. Other, Non-BioPerl Modules

  12. BioPerl: You Have A Friend In High Places
      The big deal:
      • BioPerl provides “objects” for various types of sequence data and their associated features and annotations.
      • These objects provide interfaces for analysis of these sequences with a wide variety of
         • external programs (BLAST, FASTA, clustalw and EMBOSS, to name just a few)
         • various types of databases for storage and retrieval of sequences
            • remote (GenBank, EMBL, etc.)
            • local (MySQL, Flat_databases flat files, GFF, etc.)
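
As a rough illustration of what a sequence object offers, the sketch below builds a Bio::Seq object by hand and calls a few of its documented methods. It assumes BioPerl is installed; the identifier and sequence are made-up placeholders, not data from the class.

```perl
use strict;
use warnings;
use Bio::Seq;

# Build a sequence object from a literal string (placeholder data).
my $seq = Bio::Seq->new(
    -id       => 'example_seq',
    -seq      => 'ATGGCCAAGTAA',
    -alphabet => 'dna',
);

print "id      = ", $seq->id, "\n";
print "length  = ", $seq->length, "\n";
print "revcom  = ", $seq->revcom->seq, "\n";      # reverse complement, itself a sequence object
print "protein = ", $seq->translate->seq, "\n";   # translation with the default genetic code
```

The point of the object is that the same method calls work whether the sequence was typed in, read from a file, or fetched from a remote database.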

  13. So What Is This Object Business?

  14. What A Biology-Related Program Looks Like When Coded According To The Object Paradigm
      [Diagram: objects of type (t:) Gene, Protein, DNA, Organism, Species, RNA, LivingObject, Sequence]

  15. Objects Inherit From A Class Or Prior Object
      [Diagram labels:]
      • Class = prototype for all objects of this type
      • Create an object (“new”): Object 1 (ancestor)
      • Derive an object from an existing object: Object 2
      • Example types shown: Sequence, DNA, RNA, Protein
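
The inheritance idea on this slide can be sketched in plain Perl. This is an illustrative toy, not BioPerl's actual class design: the package names Sequence and DNA echo the slide, while the methods seq_length and gc_count and the sample sequence are hypothetical.

```perl
use strict;
use warnings;

# Class = prototype for all objects of this type.
package Sequence;

sub new {
    my ($class, %args) = @_;
    my $self = { seq => $args{seq} };
    return bless $self, $class;
}

sub seq_length {
    my ($self) = @_;
    return length $self->{seq};
}

# DNA derives from Sequence and adds one method of its own.
package DNA;
our @ISA = ('Sequence');

sub gc_count {
    my ($self) = @_;
    my $count = () = $self->{seq} =~ /[GC]/gi;   # count G and C characters
    return $count;
}

package main;

my $dna = DNA->new(seq => 'ATGGCC');   # "new" is inherited from Sequence
print "length   = ", $dna->seq_length, "\n";
print "GC count = ", $dna->gc_count,   "\n";
```

A DNA object can do everything a Sequence can, plus whatever DNA adds; that is the sense in which objects inherit from a class.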

  16. An example: Class inheritance for shape concepts

  17. Key BioPerl Links
      • BioPerl 1.4 installed as part of Perl 5.8.8.822 (what you downloaded)
      • BioPerl home: http://www.bioperl.org/wiki/Main_Page
      • http://www.bioperl.org/wiki/Getting_Started
         • Lots of examples

  18. BioPerl Example: Querying GenBank To Retrieve Sequence Properties
      • Seq7.pl
      • Seq8.pl
      • Seq9.pl → after exercise (next slide)
      • Seq11.pl → after exercise (next slide)
      • Related docs:
         • GenBank search: http://doc.bioperl.org/releases/bioperl-current/bioperl-live/Bio/DB/GenBank.html
         • SeqIO: http://doc.bioperl.org/releases/bioperl-current/bioperl-live/Bio/SeqIO/genbank.html
            • See also http://www.bioperl.org/wiki/HOWTO:SeqIO
         • And most importantly: http://doc.bioperl.org/releases/bioperl-current/bioperl-live/Bio/Seq.html

  19. Exercise: Print An Additional Sequence Feature
      • Add an additional sequence feature to Seq8.pl
      • What to print: see Methods for Seq object at http://doc.bioperl.org/releases/bioperl-current/bioperl-live/Bio/Seq.html

  20. Quiz
      Questions based on Seq11.pl

      use warnings;
      use strict;
      use Bio::DB::GenBank;
      use Bio::DB::Query::GenBank;   # query class used below

      # ---------------------------------------------------------------------------
      # main
      $| = 1;    # Force unbuffered output on STDOUT.

      # Put in some restrictions as to what is retrieved and stored into GenBank object ...
      my $gb = Bio::DB::GenBank->new(
          -format     => 'GenBank',
          -seq_start  => 0,
          -seq_end    => 1000,
          -strand     => 1,
          -complexity => 0,
      );

      # Get a stream via a query string.
      my $query = Bio::DB::Query::GenBank->new(
          -query => 'Homo sapiens[Organism] AND M-cadherin',
          -db    => 'nucleotide',
      );
      my $seqio = $gb->get_Stream_by_query($query);

      my $i = 0;    # count total number of sequences
      while (my $seq = $seqio->next_seq) {
          print "seq id =", $seq->id,
                "\t version = ", $seq->version,
                "\t seq acc number = ", $seq->accession_number,
                "\t seq length = ", $seq->length, "\n";
          $i++;
      }
      print "retrieved $i sequences from GenBank \n";
      # --------------------------------------------------------------------------

  21. More Quizzing: Seq10.pl
      • Run Seq10.pl
         • Why the warning messages?
      • Specifying strands
         • 1 for plus
         • 2 for minus
      • Complexity: A GenBank nucleotide entry is often a part of a larger biological blob that contains other GI numbers (e.g., translated protein)
      • Complexity regulates the display:
         0 - get the whole blob
         1 - get the bioseq for gi of interest (default in Entrez)
         2 - get the minimal bioseq-set containing the gi of interest
         3 - get the minimal nuc-prot containing the gi of interest
         4 - get the minimal pub-set containing the gi of interest

  22. Some Cautions
      • Be careful when querying databases
         • → have an idea of how many sequences you may be downloading/processing
      • Know that Perl might eat up all of your CPU cycles
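
One way to act on the first caution is to cap how many records a loop will process. The sketch below is plain Perl and uses a hypothetical helper, process_up_to, that accepts any iterator (a code reference returning the next item, or undef when exhausted). In a BioPerl script the iterator could wrap $seqio->next_seq; here a made-up in-memory list stands in for the database.

```perl
use strict;
use warnings;

# Hypothetical helper: call $handler on at most $max items from $next_item.
# $next_item is a code reference returning the next item, or undef when done.
sub process_up_to {
    my ($next_item, $max, $handler) = @_;
    my $count = 0;
    while ($count < $max) {
        my $item = $next_item->();
        last unless defined $item;
        $handler->($item);
        $count++;
    }
    return $count;
}

# Stand-in data source; with BioPerl this could be sub { $seqio->next_seq }.
my @fake_ids = map { "seq$_" } 1 .. 10;
my $iter     = sub { return shift @fake_ids };

my $n = process_up_to($iter, 3, sub { print "processing $_[0]\n" });
print "processed $n records\n";
```

The Bio::DB::Query::GenBank documentation also describes a count method on the query object, which may be a more direct way to check the number of hits before streaming them.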

  23. Part 3: Interacting With A Database

  24. Preliminaries: Updating ODBC Manager
      • First we need to point the ODBC Manager to the “GenesToEvaluate” DB
      • More at http://lane.stanford.edu/howto/index.html?id=_1751

  25. Example Perl Programs That Interact With A Database
      Ancillary files:
      • ExampleOutputExcel3.csv needed as input to Access1.pl
         • Access2.pl and Access3.pl don’t need this file
      • All programs rely on GenesToEvaluate.mdb (Access DB)
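
Access1.pl through Access3.pl are not shown in this transcript and may use a different database module. As a hedged sketch of how a Perl program could write rows to a database through an ODBC data source, the example below uses the DBI module with the DBD::ODBC driver. The helper names build_insert_sql and load_rows, the table name Genes, and the column names Symbol and Accession are all assumptions for illustration; only the data source name GenesToEvaluate comes from the slides.

```perl
use strict;
use warnings;

# Hypothetical helper: build a parameterized INSERT for a table and its columns.
sub build_insert_sql {
    my ($table, @columns) = @_;
    my $cols  = join ', ', @columns;
    my $marks = join ', ', ('?') x @columns;   # one placeholder per column
    return "INSERT INTO $table ($cols) VALUES ($marks)";
}

# Hypothetical loader: insert rows through an ODBC data source.
# Requires the DBI and DBD::ODBC modules and a configured data source name.
sub load_rows {
    my ($dsn_name, $table, $columns, $rows) = @_;
    require DBI;
    my $dbh = DBI->connect("dbi:ODBC:$dsn_name", '', '',
                           { RaiseError => 1, AutoCommit => 1 });
    my $sth = $dbh->prepare(build_insert_sql($table, @$columns));
    $sth->execute(@$_) for @$rows;
    $dbh->disconnect;
    return scalar @$rows;
}

print build_insert_sql('Genes', 'Symbol', 'Accession'), "\n";

# To run against the class database (table and column names are assumptions):
# load_rows('GenesToEvaluate', 'Genes', ['Symbol', 'Accession'],
#           [ ['geneA', 'ACC0001'] ]);
```

Placeholders (the ? marks) let the driver handle quoting, which avoids broken SQL when a value contains an apostrophe.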

  26. In Closing: Suggestions
      • Modify the programs provided here
         • Baby steps…
      • Save often
      • Keep lots of prior versions so you can recover from your mistakes
      • SU provides lots of documentation → use it!
      • Google is invaluable
