Nimble Perl Programming Using Scriptome

Yannick Pouliot, PhD, Bioresearch Informationist, Lane Medical Library & Knowledge Management Center. 1/22/2009.

Presentation Transcript


  1. Nimble Perl Programming Using Scriptome
     Yannick Pouliot, PhD
     Bioresearch Informationist
     Lane Medical Library & Knowledge Management Center
     1/22/2009

  2. Objectives
     Determining whether Scriptome can …
     • Enable you to perform operations that are otherwise difficult, time-consuming, or error-prone?
     • Help you learn Perl?
     Also, we’ll be using anonymous polling to determine whether you’re happy with the material and the speed of delivery …
     And don’t worry: this experiment won’t hurt a bit!

  3. So What Is Scriptome?
     Scriptome is a resident Perl program that performs various data manipulation tasks useful to biologists
     • Originally developed by Harvard’s FAS Center for Systems Biology
     • Maintained and extended by many more volunteers not associated with Harvard

  4. Why Bother With Scriptome?
     • Code is visible, so you can learn how to do things in Perl … or not
     • Can handle arbitrarily large files
       • No size limitations, unlike e.g. Excel
     • Free; runs on everything: PC, Mac, Linux
     • It’s programmatic!
       • Much faster than manual operations
       • You can string operations together and save them in, e.g., a .bat file

  5. How Do You Use Scriptome?
     • You tell Scriptome which function you want it to perform (more later)
     • You can also string Scriptome functions into a protocol
     • Input: Scriptome operates on text files
       • No binary files, but you could add that capability yourself
       • E.g., process Excel files in native form using Perl modules such as Spreadsheet::ParseExcel
     • Output: printed to the command line or written to another file

  6. Scriptome: Pick Your Flavor
     • http://lane.stanford.edu/howto/index.html?id=_1257
     • http://sysbio.harvard.edu/csb/resources/computational/scriptome/

  7. Installing Scriptome - Windows
     • Download Scriptome_exe.tar.gz using this link:
       http://sysbio.harvard.edu/csb/resources/computational/scriptome/bin/Scriptome_exe.tar.gz
       → Final location: I suggest C:/Program Files/Scriptome
     • Create a directory named “Scriptome”
     • Decompress Scriptome_exe.tar.gz by double-clicking
       → Notice the four files inside
     • Update the PATH variable: add this string at the END of the contents of the PATH variable:
       ;C:\Program Files\Scriptome\Scriptome;C:\Program Files\Scriptome\ScriptPack;C:\Program Files\Scriptome\Scriptome.bat;C:\Program Files\Scriptome\ScriptPack.bat

  8. Scriptome Usage
     1. Using a specific tool:
        Scriptome flags toolname [input_filenames] [> output_filename]
        Example
        • Scriptome -t change_fasta_to_tab LONGhmcad.fst
     2. Finding a tool by type:
        Scriptome -t tooltype
        where tooltype is one of:
        • Calc
        • Choose
        • Sort
        • Fetch
        • Merge
        • Change
        Example
        • Scriptome -t Calc
     Let’s examine each area briefly before going over specifics…

  9. Polling Time: How’s the speed?
     1. Too fast
     2. Too slow
     3. More or less OK
     4. I feel nauseous

  10. Examples and noteworthy tools

  11. Calc Tool Examples - 1
      Compute column sums:
      • Scriptome -t calc_col_sum SubjectData1.tab
        → select columns to add
        IMPORTANT: column numbers start at 0, not 1
      • Note the visible Perl code → easy to modify and expand:
        perl -e " $col=1; while(<>) { s/\r?\n//; @F=split /\t/, $_; $sum += $F[$col]; } warn qq~\nSum of column $col for $. lines\n\n~; print qq~$sum\n~ " file.tab
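The one-liner on this slide can also be read as an ordinary script. The following is a minimal sketch of the same idea under `use strict`, not Scriptome's own code; the subroutine name `column_sum` and the sample rows are hypothetical and chosen only for illustration. As in the one-liner, the column number is zero-based.

```perl
use strict;
use warnings;

# Sum one column of tab-delimited lines.
# $col is zero-based, matching the $F[$col] indexing in the one-liner.
sub column_sum {
    my ($col, @lines) = @_;
    my $sum = 0;
    for my $line (@lines) {
        (my $row = $line) =~ s/\r?\n\z//;    # strip a Unix or Windows line ending
        my @fields = split /\t/, $row, -1;   # -1 keeps trailing empty fields
        # Skip rows where the requested cell is missing or empty.
        next unless defined $fields[$col] && length $fields[$col];
        $sum += $fields[$col];
    }
    return $sum;
}

# Hypothetical sample data; the header row is sliced off before summing.
my @sample = ("id\tweight\n", "s1\t2.5\n", "s2\t4\r\n", "s3\t\n");
my $total  = column_sum(1, @sample[1 .. $#sample]);
print "$total\n";    # prints 6.5
```

Skipping empty cells is a choice made for this sketch; the one-liner on the slide simply treats them as zero.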

  12. Calc Tool Examples - 2
      Compute row sums:
      • Scriptome -t calc_row_sum SubjectData1.tab
        → enter 1 for column 1, 2 for column 2, etc.
        perl -e " @cols=(1, 2, 3); while(<>) { s/\r?\n//; @F=split /\t/, $_; $sum = 0; foreach $col (@cols) { $sum += $F[$col] }; print qq~$_\t$sum\n~; } warn qq~\nSum of columns @cols for each line ($. lines)\n\n~ " in.tab

  13. Change Tool Examples - 1
      Create tab-delimited file from FASTA file:
      • Scriptome -t change_fasta_to_tab LONGhmcad.fst > LONGhmcad.fst.tab
        → change_fasta_to_tab is an important tool because many Scriptome tools use tab-delimited files
        perl -e " $count=0; $len=0; while(<>) { s/\r?\n//; s/\t/ /g; if (s/^>//) { if ($. != 1) { print qq~\n~ } s/ |$/\t/; $count++; $_ .= qq~\t~; } else { s/ //g; $len += length($_) } print $_; } print qq~\n~; warn qq~\nConverted $count FASTA records in $. lines to tabular format\nTotal sequence length: $len\n\n~; " seqs.fna
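To make the FASTA-to-tab conversion easier to follow, here is a minimal sketch of the same transformation as a subroutine. It is an illustration written for this transcript, not Scriptome's code; the name `fasta_to_rows` and the two sample records are hypothetical. Each record becomes one row with three tab-separated columns: identifier, description, sequence.

```perl
use strict;
use warnings;

# Convert FASTA lines to one tab-delimited row per record:
# identifier, description, sequence.
sub fasta_to_rows {
    my (@lines) = @_;
    my @records;
    for my $line (@lines) {
        (my $text = $line) =~ s/\r?\n\z//;   # strip a Unix or Windows line ending
        $text =~ tr/\t/ /;                   # tabs in the input would break the output columns
        if ($text =~ /^>(\S*)\s*(.*)$/) {
            # Header line: start a new record with an empty sequence.
            push @records, [$1, $2, ''];
        }
        elsif (@records) {
            # Sequence line: drop whitespace and append to the current record.
            $text =~ s/\s+//g;
            $records[-1][2] .= $text;
        }
    }
    return map { join "\t", @$_ } @records;
}

# Hypothetical two-record input; the second record has no description.
my @fasta = (">seq1 first example\n", "ACGT\n", "AC GT\n", ">seq2\n", "TTTT\n");
print "$_\n" for fasta_to_rows(@fasta);
```

Unlike the streaming one-liner, this sketch collects all records in memory before returning them, which keeps the logic readable but would not suit very large files.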

  14. Change Tool Examples - 2
      Change rows to columns or vice versa:
      • Scriptome -t change_transpose_table SubjectData1.tab
      • Note: change_transpose_table operates on tab-delimited files
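This slide shows no Perl for the transpose step, so the following is only a sketch of how a tab-delimited table could be transposed in Perl. It is not the code behind change_transpose_table; the name `transpose_rows`, the padding of short rows, and the sample table are all assumptions made for illustration.

```perl
use strict;
use warnings;

# Transpose a tab-delimited table given as a list of lines.
# Short rows are padded with empty cells so the result stays rectangular.
sub transpose_rows {
    my (@lines) = @_;
    my @table;
    my $width = 0;
    for my $line (@lines) {
        (my $row = $line) =~ s/\r?\n\z//;    # strip a Unix or Windows line ending
        my @fields = split /\t/, $row, -1;   # -1 keeps trailing empty fields
        $width = @fields if @fields > $width;
        push @table, \@fields;
    }
    my @out;
    for my $c (0 .. $width - 1) {
        # Column $c of the input becomes row $c of the output.
        push @out, join "\t", map { defined $_->[$c] ? $_->[$c] : '' } @table;
    }
    return @out;
}

# Hypothetical three-row table with a header.
print "$_\n" for transpose_rows("id\tage\n", "s1\t30\n", "s2\t41\n");
```

A transpose needs the whole table before it can emit the first output row, so this approach is limited by available memory.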

  15. Change Tool Examples - 3
      Convert a sequence file from one format to another:
      • Scriptome -t change_bio_format_to_bio_format LONGhmcad.fst
        → enter ‘fasta’ as input format (no quotes)
        → enter ‘genbank’ as output format (no quotes)
      • change_bio_format_to_bio_format addresses the common problem of converting formats
      • Important: requires BioPerl to be installed
      * Notice anything interesting? *
        perl -MBio::SeqIO -e " $informat= qq~genbank~; $outformat= qq~fasta~; $count = 0; for $infile (@ARGV) { $in = Bio::SeqIO->newFh(-file => $infile , -format => $informat); $out = Bio::SeqIO->newFh(-format => $outformat); while (<$in>) { print $out $_; $count++; } } warn qq~Translated $count sequences from $informat to $outformat format\n~ " myseqs.genbank > myseqs.fasta

  16. Conclusions
      Scriptome is …
      • A good solution for manipulating medium to large data files quickly and reliably
      • A way to learn Perl in a “real” context (no toy problems)
      • Able to perform a wide range of tasks, from simple, generic file manipulations to complex bio-specific tasks

  17. Resources
      • For Perl help, see the resources in the workshop description for Lane’s Perl Programming for Biologists
      • Some recommended titles:

  18. Polling Time: Do you think Scriptome will be useful to your research?
      1. Definitely
      2. Likely
      3. Not likely
      4. No way
      5. What’s the question again?
