slide1 n.
Skip this Video
Download Presentation
אנא מלאו את סקר ההוראה

Loading in 2 Seconds...

play fullscreen
1 / 24

אנא מלאו את סקר ההוראה - PowerPoint PPT Presentation

  • Uploaded on

אנא מלאו את סקר ההוראה. הסקר ייפתח ב-06.01.13 ויהיה ניתן למלאו במשך שלושה שבועות הסקר יהיה זמין ב מערכת המידע האישי. print “It is your chance to give us feedback\n”;. use strict;. Bio Perl. Bio Perl. Bio Perl is a collection of Perl modules for bioinformatics applications.

I am the owner, or an agent authorized to act on behalf of the owner, of the copyrighted work described.
Download Presentation

PowerPoint Slideshow about 'אנא מלאו את סקר ההוראה' - zwi

Download Now An Image/Link below is provided (as is) to download presentation

Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author.While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server.

- - - - - - - - - - - - - - - - - - - - - - - - - - E N D - - - - - - - - - - - - - - - - - - - - - - - - - -
Presentation Transcript
אנא מלאו את סקר ההוראה

הסקר ייפתח ב-06.01.13 ויהיה ניתן למלאו במשך שלושה שבועות

הסקר יהיה זמין במערכת המידע האישי

print “It is your chance to give us feedback\n”;

use strict;

bio perl
  • BioPerl is a collection of Perl modules for bioinformatics applications.
  • It is an active open source software project founded in 1996.
  • BioPerl provides software modules for many of the typical bioinformatic tasks.
  • Among these are:
  • Format conversions
  • Manipulating biological sequences
  • Searching for similar sequences
  • Creating sequence alignments
  • Searching for ORFs on genomic DNA
  • And more…
bio perl1

BioPerl modules are called Bio::XXX

You can use the BioPerl wiki:

with documentation and examples for how to use them – which is the best way to learn this. We recommend beginning with the "How-tos": To a more hard-core inspection of BioPerl modules:

BioPerl 1.6.1 Module Documentation

object oriented use of packages
Object-oriented use of packages



Many packages are meant to be used as objects.

In Perl, an object is a data structure that can use subroutines that are associated with it.

We will not learn object oriented programming,but we will learn how to create and use objects defined by BioPerl packages.

bio perl the seqio module
BioPerl: the SeqIO module



BioPerl modules are named Bio::xxxx

The Bio::SeqIO module deals with Sequences Input and Output:

We will pass arguments to the new argument of the file name and format

use Bio::SeqIO;

my $in = Bio::SeqIO->new("-file"=> "<","-format"=> "GenBank");

File argument(filename as would be in open)

Format argument

A list of all the sequence formats BioPerl can read is in:

bio perl the seqio module1
BioPerl: the SeqIO module


next_seq() write_seq()

use Bio::SeqIO;

my $in = Bio::SeqIO->new("-file" => "<", "-format" => "GenBank");

my $seqObj = $in->next_seq();

Perform next_seq()subroutine on $in You could think of it as:SeqIO::next_seq($in)

next_seq() returns the next sequence in the file as a Bio::Seq object (we will talk about them soon)

bio perl the seqio module2
BioPerl: the SeqIO module

use Bio::SeqIO;

my $in = Bio::SeqIO->new("-file" => "<", "-format" => "GenBank");

my $out = Bio::SeqIO->new("-file" => ">adeno12.out.fas", "-format" => "Fasta");

my $seqObj = $in->next_seq();

while ( defined($seqObj) ){


$seqObj = $in->next_seq();


write_seq()write a Bio::Seq object to $out according to its format

bio perl the seq module
BioPerl: the Seq module

use Bio::SeqIO;

my $in = Bio::SeqIO->new( "-file" => "<Ecoli.prot.fasta",

"-format" => "Fasta");

my $seqObj = $in->next_seq();

while (defined($seqObj)) {

print "ID:".$seqObj->id()."\n"; #1st word in header

print "Desc:".$seqObj->desc()."\n"; #rest of header

print "Sequence:".$seqObj->seq()."\n"; #seq string

print "Length:".$seqObj->length()."\n"; #seq length

$seqObj = $in->next_seq()


You can read more about the Bio::Seq subroutines in:

print last 30aa of each sequence no bioperl
Print last 30aa of each sequence (no BioPerl)

open (my $in, "<","seq.fasta") or die "Cannot open seq.fasta...";

my $fastaLine = <$in>;

while (defined $fastaLine) {

chomp $fastaLine;

my $header="";

# Read first word of header

if (fastaLine =~ m/^>(\S*)/) {

$header = substr($fastaLine,1);

$fastaLine = <$in>;}

# Read seq until next header

my $seq = "";

while ((defined $fastaLine) and(substr($fastaLine,0,1) ne ">" )) {

chomp $fastaLine;

$seq = $seq.$fastaLine;

$fastaLine = <$in>;


# print last 30aa

my $subseq = substr($seq,-30);

print "$header\n“."$subseq\n";


now using bio perl
Now using BioPerl

use Bio::SeqIO;

my $in = Bio::SeqIO->new("-file"=>"<seq.fasta","-format"=>"Fasta");

my $seqObj = $in->next_seq();

while (defined($seqObj)) {

# Read first word of header

my $header = $seqObj->id();

# print last 30aa

my $seq = $seqObj->seq();

my $subseq = substr($seq,-30);

print "$header\n";

print "$subseq\n";

$seqObj = $in->next_seq();


Note: BioPerl warnings about:Subroutine ... redefined at ... Should not trouble you, it is a known issue – it is not your fault and won't effect your script's performances.

class exercise 12a
Class exercise 12a
  • Use Bio::SeqIO to read a FASTA file and print to an output FASTA file only sequences shorter than 3,000 bases.(use the EHD nucleotide FASTA from the webpage)
  • Use Bio::SeqIO to read a FASTA file, and print (to the screen) header lines that contain the words "Mus musculus".
  • 3. Write a script that uses Bio::SeqIO to read a GenPept file and convert it to FASTA.(use from the course’s webpage)
  • 4*. Same as Q1, but print to the FASTA the reverse complement of each sequence. (Do not use the reverse or tr// functions! BioPerl can do it for you - read the BioPerl documentation).
bioperl downloading files from the web
BioPerl: downloading files from the web

The Bio::DB::Genbank module allows us to downloada specific record from the NCBI website:

use Bio::DB::GenBank;my $gb = Bio::DB::GenBank->new;my $seqObj = $gb->get_Seq_by_acc("J00522");

print $seqObj->seq();

see more options in:


Congrats, you just sequenced yourself some DNA.


And you want to see if it exists in any other organism


BLAST - Basic Local Alignment and Search Tool

BLAST helps you find similarity between your sequence and other sequences


BLAST - Basic Local Alignment and Search Tool

BLAST helps you find similarity between your sequence and other sequences


BLAST helps you find similarity between your sequence and other sequences

bioperl reading blast output
BioPerl: reading BLAST output

First we need to have the BLAST results in a text file BioPerl can read.Here is one way to achieve this (using NCBI BLAST):



An alternative is to use BLASTALL on your computer

bioperl reading blast output1
BioPerl: reading BLAST output


Query= gi|52840257|ref|YP_094056.1| chromosomal replication initiator

protein DnaA [Legionella pneumophila subsp. pneumophila str.

Philadelphia 1]

(452 letters)

Database: Coxiella.faa

1818 sequences; 516,956 total letters


Score E

Sequences producing significant alignments: (bits) Value

gi|29653365|ref|NP_819057.1| chromosomal replication initiator p... 633 0.0

gi|29655022|ref|NP_820714.1| DnaA-related protein [Coxiella burn... 72 4e-14

gi|29654861|ref|NP_820553.1| Holliday junction DNA helicase B [C... 32 0.033

gi|29654871|ref|NP_820563.1| ATPase, AFG1 family [Coxiella burne... 27 1.4

gi|29654481|ref|NP_820173.1| hypothetical protein CBU_1178 [Coxi... 25 3.1

gi|29654004|ref|NP_819696.1| succinyl-diaminopimelate desuccinyl... 25 3.1

Results info

bioperl reading blast output2
BioPerl: reading BLAST output

gi|215919162|ref|NP_820316.2| threonyl-tRNA synthetase [Coxiella... 25 5.3

gi|29655364|ref|NP_821056.1| transcription termination factor rh... 24 9.0

gi|215919324|ref|NP_821004.2| adenosylhomocysteinase [Coxiella b... 24 9.0

gi|29653813|ref|NP_819505.1| putative phosphoribosyl transferase... 24 9.0

>gi|29653365|ref|NP_819057.1| chromosomal replication initiator

protein [Coxiella burnetii RSA 493]

Length = 451

Score = 633 bits (1632), Expect = 0.0

Identities = 316/452 (69%), Positives = 371/452 (82%), Gaps = 5/452 (1%)





EL+ + S D I +++GS+ E + + AP + + +++N





Result header

 high scoring pair (HSP) data

HSP Alignment

Note: There could be more than one HSP for each result, in case of homology in different parts of the protein

bio searchio reading blast output
Bio::SearchIO : reading BLAST output

The Bio::SearchIO module can read and parse BLAST output:

use Bio::SearchIO;

my $blast_report =

Bio::SearchIO->new("-file" => "<LegCox.blastp",

"-format" => "blast" );

my ($resultObj, $hitObj, $hspObj);

while( defined($resultObj = $blast_report->next_result()) ){

print "Checking query ".$resultObj->query_name()."\n";

while( defined($hitObj = $resultObj->next_hit()) ) {

print "Checking hit ". $hitObj->name()."\n";

$hspObj = $hitObj->next_hsp();

print "Best score: ".$hspObj->score()."\n";



(See the BLAST output example in the course’s website)

bioperl reading blast output3
BioPerl: reading BLAST output

You can send parameters to the subroutines of the objects:

# Get length of HSP (including gaps)


# Get length of hit part of alignment (without gaps)


# Get length of query part of alignment (without gaps)


More about what you can do with query, hit and hsp see in:

class exercise 12b
Class exercise 12b
  • Uses Bio::SearchIO to parse the BLAST results:(LegCox.blastp provided in the course’s website)
    • For each query print out its name and the name of its first hit.
    • b*) Print the percent identity of each HSP of the first hit of each query.
    • c*) Print the e-value of each HSP of the first hit of each query.