Bioinformatics: Theory and Practice (生物信息学理论和实践), Jijun Tang (唐继军), jtang@cse.sc.edu, 13928761660



#!/usr/bin/perl

$DNA = 'ACGT';

# Next, we print the DNA onto the screen

print $DNA, "\n";

print '$DNA\n';

print "$DNA\n";

exit;
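The three print statements above differ only in how the text is quoted. The following is a minimal sketch of that difference; the variable names $interpolated and $literal are illustrative and not part of the original slides.

```perl
use strict;
use warnings;

my $DNA = 'ACGT';

# Double quotes interpolate $DNA and turn \n into a newline character.
my $interpolated = "$DNA\n";

# Single quotes keep every character literally, including $ and \n.
my $literal = '$DNA\n';

print $interpolated;     # prints ACGT followed by a newline
print $literal, "\n";    # prints the six literal characters $DNA\n
```

This is why the second print in the slide shows the variable name rather than the sequence.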


Do the Math (your 2nd Perl program)

#!/usr/bin/perl

print "4+5\n";

print 4+5 , "\n";

print "4+5=" , 4+5 , "\n";

[Note: use commas to separate multiple items in a print statement; whitespace between the items is ignored.]


String Operations

  • Strings (text) in variables can be used for some math-like operations

  • To concatenate (join) strings, use the dot (.) operator

    $seq1= "ACTG";

    $seq2= "GGCTA";

    $seq3= $seq1 . $seq2;

    print $seq3;

    ACTGGGCTA
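As a small extension of the concatenation slide, the sketch below shows two related operators, `.=` (append in place) and `x` (repetition), together with join. The variable names are illustrative only.

```perl
use strict;
use warnings;

my $seq = 'ACTG';
$seq .= 'GGCTA';             # append in place; $seq is now 'ACTGGGCTA'

my $poly_a = 'A' x 10;       # repetition operator builds 'AAAAAAAAAA'

# join glues any number of pieces together with the given separator (here none).
my $joined = join('', 'ACTG', 'GGCTA', $poly_a);

print "$seq\n$poly_a\n", length($joined), "\n";
```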


#!/usr/bin/perl -w

$DNA = 'ACGGGAGGACGGGAAAATTACTACGGCATTAGC';

print "Here is the starting DNA:\n\n";

print "$DNA\n\n";

# Transcribe the DNA to RNA by substituting all T's with U's.

$RNA = $DNA;

$RNA =~ s/T/U/g;

# Print the RNA onto the screen

print "Here is the result of transcribing the DNA to RNA:\n\n";

print "$RNA\n";

# Exit the program.

exit;
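The transcription step can also be wrapped in a reusable subroutine. This is only a sketch: the name transcribe is hypothetical, and like the script above it handles uppercase T only.

```perl
use strict;
use warnings;

# Hypothetical helper: return the RNA version of a DNA string.
sub transcribe {
    my ($dna) = @_;
    # Copy the argument first, then substitute on the copy,
    # so the caller's DNA string is left unchanged.
    (my $rna = $dna) =~ s/T/U/g;
    return $rna;
}

my $DNA = 'ACGGGAGGACGGGAAAATTACTACGGCATTAGC';
my $RNA = transcribe($DNA);
print "$RNA\n";
```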


#!/usr/bin/perl -w

$DNA = 'ACGGGAGGACGGGAAAATTACTACGGCATTAGC';

print "$DNA\n\n";

$revcom = reverse $DNA;

# See the text for a discussion of tr///

$revcom =~ tr/ACGTacgt/TGCAtgca/;

# Print the reverse complement DNA onto the screen

print "Here is the reverse complement DNA:\n\n";

print "$revcom\n";

exit;
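A sketch of the same reverse-complement steps as a subroutine follows; the name revcom_of is hypothetical. One detail worth noting is that reverse only reverses the characters of a string in scalar context, so the sketch asks for that context explicitly.

```perl
use strict;
use warnings;

# Hypothetical helper: return the reverse complement of a DNA string.
sub revcom_of {
    my ($dna) = @_;
    # scalar forces string reversal; in list context reverse would
    # reverse a list of arguments instead of the characters.
    my $rc = scalar reverse $dna;
    # tr/// maps each base to its complement, character by character.
    $rc =~ tr/ACGTacgt/TGCAtgca/;
    return $rc;
}

print revcom_of('AAGC'), "\n";   # prints GCTT
```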


#!/usr/bin/perl -w

$proteinfilename = 'NM_021964fragment.pep';

open(PROTEINFILE, $proteinfilename);

# First line

$protein = <PROTEINFILE>;

print "\nHere is the first line of the protein file:\n\n";

print $protein;

# Second line

$protein = <PROTEINFILE>;

print "\nHere is the second line of the protein file:\n\n";

print $protein;

# Third line

$protein = <PROTEINFILE>;

print "\nHere is the third line of the protein file:\n\n";

print $protein;

close PROTEINFILE;

exit;


#!/usr/bin/perl -w

# The filename of the file containing the protein sequence data

$proteinfilename = 'NM_021964fragment.pep';

# First we have to "open" the file

open(PROTEINFILE, $proteinfilename);

# Read the protein sequence data from the file, and store it

# into the array variable @protein

@protein = <PROTEINFILE>;

# Print the protein onto the screen

print @protein;

# Close the file.

close PROTEINFILE;

exit;
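The script above does not check whether open succeeded. A minimal sketch of the same read using the three-argument form of open with a lexical filehandle is shown below. To keep it runnable on its own it first writes a small temporary file; the two lines of content are made up for illustration and are not the contents of NM_021964fragment.pep.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Create a small stand-in file so the sketch runs without external data.
my ($tmp_fh, $filename) = tempfile(UNLINK => 1);
print {$tmp_fh} "MNIDDKLQ\n", "STVSGEAA\n";
close $tmp_fh or die "Cannot close $filename: $!";

# Three-argument open; die reports the system error held in $!.
open(my $fh, '<', $filename) or die "Cannot open $filename: $!";
my @protein = <$fh>;     # list context: read all remaining lines
close $fh;

print "Read ", scalar(@protein), " lines\n";
print @protein;
```

The lexical filehandle $fh is closed automatically when it goes out of scope, although closing it explicitly, as here, keeps the intent clear.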


#!/usr/bin/perl -w

# "scalar context" and "list context"

@bases = ('A', 'C', 'G', 'T');

print "@bases\n";

$a = @bases;

print $a, "\n";

($a) = @bases;

print $a, "\n";

exit;
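The script above prints 4 and then A, because the same array gives different values in scalar and list context. The sketch below spells this out with illustrative variable names and adds a negative index as a related convenience.

```perl
use strict;
use warnings;

my @bases = ('A', 'C', 'G', 'T');

my $count        = @bases;       # scalar context: the number of elements (4)
my ($first)      = @bases;       # list context: the first element ('A')
my ($one, $two)  = @bases;       # list context: 'A' and 'C'; the rest is dropped
my $last         = $bases[-1];   # a negative index counts from the end ('T')

print "count=$count first=$first last=$last\n";   # prints count=4 first=A last=T
```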


#!/usr/bin/perl -w

# array indexing

@bases = ('A', 'C', 'G', 'T');

print "@bases\n";

print $bases[0], "\n";

print $bases[1], "\n";

print $bases[2], "\n";

print $bases[3], "\n";

exit;


String functions

  • Chomp

  • Length of a string

  • Substring
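The three functions listed above can be sketched as follows. The input string is an illustrative stand-in for a line read from a file or from STDIN.

```perl
use strict;
use warnings;

my $line = "ACGGGAGGAC\n";            # a line as it would arrive from a file

chomp $line;                          # remove the trailing newline, if there is one
my $len    = length $line;            # number of characters: 10
my $first3 = substr($line, 0, 3);     # offset 0, length 3: 'ACG'
my $tail   = substr($line, 7);        # from offset 7 to the end: 'GAC'

print "$line has $len bases; starts with $first3, ends with $tail\n";
```

Offsets passed to substr start at 0, so the first base of a sequence is at position 0.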


#!/usr/bin/perl -w

$proteinfilename = 'NM_021964fragment.pep';

unless ( open(PROTEINFILE, $proteinfilename) ) {

print "Could not open file $proteinfilename!\n";

exit;

}

while( $protein = <PROTEINFILE> ) {

print " #####Here is the next line of the file:\n";

print $protein;

}

# Close the file.

close PROTEINFILE;

exit;


Comparison

  • String comparison (are they the same, > or <)

    • eq (equal)

    • ne (not equal)

    • ge (greater than or equal)

    • gt (greater than)

    • lt (less than)

    • le (less than or equal)
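A short sketch of these operators on sequence strings follows. Each comparison is stored as 1 or 0 so that the results are easy to inspect; the variable names are illustrative.

```perl
use strict;
use warnings;

my $same   = ('ACGT' eq 'ACGT') ? 1 : 0;   # 1: identical strings
my $differ = ('ACGT' ne 'acgt') ? 1 : 0;   # 1: comparison is case-sensitive
my $before = ('ACG'  lt 'ACGT') ? 1 : 0;   # 1: a prefix sorts before the longer string
my $after  = ('T'    gt 'ACGT') ? 1 : 0;   # 1: 'T' comes after 'A'

# sort uses string comparison by default.
my @sorted = sort qw(TTG ACG GCA);

print "$same $differ $before $after\n";    # prints 1 1 1 1
print "@sorted\n";                         # prints ACG GCA TTG
```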


Conditions

  • if () {}

  • elsif() {}

  • else {}


#!/usr/bin/perl -w

$word = 'MNIDDKL';

if($word eq 'QSTVSGE') {

print "QSTVSGE\n";

}

elsif($word eq 'MRQQDMISHDEL') {

print "MRQQDMISHDEL\n";

}

elsif ( $word eq 'MNIDDKL' ) {

print "MNIDDKL-the magic word!\n";

}

else {

print "Is \"$word\" a peptide?\n";

}

exit;


More Conditions


$x = 10;

$y = -20;

if ($x <= 10) { print "1st true\n";}

if ($x > 10) {print "2nd true\n";}

if ($x <= 10 || $y > -21) {print "3rd true\n";}

if ($x > 5 && $y < 0) {print "4th true\n";}

if (($x > 5 && $y < 0) || $y > 5) {print "5th true\n";}


But

  • Use ==, <, <=, >, >=, != to compare numbers

  • Use eq, lt, le, gt, ge, ne to compare strings

  • The logical operators || and && (and the lower-precedence words or, and) combine tests of either kind


$x = 10;

$y = -20;

if ($x le 10) { print "1st true\n";}

if ($x gt 5) {print "2nd true\n";}

if ($x le 10 || $y gt -21) {print "3rd true\n";}

if ($x gt 5 && $y lt 0) {print "4th true\n";}

if (($x gt 5 && $y lt 0) || $y gt 5) {print "5th true\n";}
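Using string operators on numbers changes the answers, because the values are then compared character by character. With $x = 10 and $y = -20, the string version above should print only "1st true" and "3rd true". The sketch below isolates the effect with a single pair of values; the variable names are illustrative.

```perl
use strict;
use warnings;

my $x = 10;

# Numeric comparison: 10 is greater than 9.
my $numeric = ($x > 9) ? 'true' : 'false';

# String comparison: '10' sorts before '9' because '1' comes before '9'.
my $string = ($x gt 9) ? 'true' : 'false';

print "10 >  9 is $numeric\n";   # prints 10 >  9 is true
print "10 gt 9 is $string\n";    # prints 10 gt 9 is false
```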


#!/usr/bin/perl -w

$num = 1234;

$str = '1234';

print $num, " ", $str, "\n";

$num_or_str = $num + $str;

print $num_or_str, "\n";

$num_or_str = $num . $str;

print $num_or_str, "\n";

exit;


More Arithmetic

  • +, -, *, **, /, %

  • +=, -=, *=, **=, /=, %=

  • ++, --


$x = 10;

$x = $x*1.5;

print $x*=3, "\n";

print $x++, "\n";

print $x, "\n";

print ++$x, "\n";

print $x, "\n";

print $x % 3, "\n";

print $x**2, "\n";


Motif finding

  • Read a sequence from a fasta file

  • Ask the user to input a motif

  • Check if the sequence has the motif


#!/usr/bin/perl -w

print "Please type the filename: ";

$fname = <STDIN>;

chomp $fname;

open(PROTEINFILE, $fname);

$name = <PROTEINFILE>;

@protein = <PROTEINFILE>;

close PROTEINFILE;

$protein = join( '', @protein);

$protein =~ s/\s//g;

print "Enter a motif to search for: ";

$motif = <STDIN>;

chomp $motif;

if ( $protein =~ /$motif/ ) {

print "I found it!\n\n";

}

else {

print "I couldn't find it.\n\n";

}
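The script above treats whatever the user types as a regular expression and reports only whether it matches. As a sketch of a different technique, the hypothetical helper below treats the motif as a literal string and uses index to report where it occurs (0-based positions, overlapping hits included).

```perl
use strict;
use warnings;

# Hypothetical helper: return the 0-based start positions of a literal motif.
sub motif_positions {
    my ($seq, $motif) = @_;
    return () if $motif eq '';       # an empty motif would match everywhere
    my @positions;
    my $start = 0;
    # index returns -1 when the motif does not occur at or after $start.
    while ((my $hit = index($seq, $motif, $start)) != -1) {
        push @positions, $hit;
        $start = $hit + 1;           # advance by one so overlapping hits are found
    }
    return @positions;
}

my @hits = motif_positions('MNIDDKLMNID', 'NID');
print "Found at: @hits\n";           # prints Found at: 1 8
```

If regular-expression motifs are wanted, as in the original script, the match operator remains the right tool; index is simply a plain-text alternative.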


#!/usr/bin/perl -w

print "Please type the filename of the DNA sequence data: ";

$dna_filename = <STDIN>;

chomp $dna_filename;

open(DNAFILE, $dna_filename);

$name = <DNAFILE>;

@DNA = <DNAFILE>;

close DNAFILE;

$DNA = join('', @DNA);

$DNA =~ s/\s//g;

$count_of_CG = 0;

$position = 0;

while ( $position < length $DNA) {

$base = substr($DNA, $position, 1);

if ( $base eq 'C' or $base eq 'G') {

++$count_of_CG;

}

$position++;

}

print "CG content is ", $count_of_CG/(length $DNA)*100, "%\n";


#!/usr/bin/perl -w

print "Please type the filename of the DNA sequence data: ";

$dna_filename = <STDIN>;

chomp $dna_filename;

open(DNAFILE, $dna_filename);

$name = <DNAFILE>;

@DNA = <DNAFILE>;

close DNAFILE;

$DNA = join('', @DNA);

$DNA =~ s/\s//g;

$count_of_CG = 0;

for ( $position = 0 ; $position < length $DNA ; ++$position ) {

$base = substr($DNA, $position, 1);

if ( $base eq 'C' or $base eq 'G') {

++$count_of_CG;

}

}

print "CG content is ", $count_of_CG/(length $DNA)*100, "%\n";


#!/usr/bin/perl -w

print "Please type the filename of the DNA sequence data: ";

$dna_filename = <STDIN>;

chomp $dna_filename;

open(DNAFILE, $dna_filename);

$name = <DNAFILE>;

@DNA = <DNAFILE>;

close DNAFILE;

$DNA = join('', @DNA);

$DNA =~ s/\s//g;

$count_of_CG = 0;

while($DNA =~ /c/ig) {$count_of_CG++;}

while($DNA =~ /g/ig) {$count_of_CG++;}

print "CG content is ", $count_of_CG/(length $DNA)*100, "%\n";
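Another common way to count bases, not used in the slides, is the tr/// operator, which returns the number of characters it matched. The sketch below assumes a hypothetical helper name, gc_percent, and counts both upper- and lowercase C and G.

```perl
use strict;
use warnings;

# Hypothetical helper: percentage of C and G bases in a DNA string.
sub gc_percent {
    my ($dna) = @_;
    return 0 unless length($dna);      # avoid dividing by zero on empty input
    # With an empty replacement list, tr/// changes nothing and just counts.
    my $gc = ($dna =~ tr/CGcg//);
    return 100 * $gc / length($dna);
}

print "CG content is ", gc_percent('ACGGGAGGAC'), "%\n";   # prints CG content is 70%
```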


#!/usr/bin/perl -w

print "Please type the filename of the DNA sequence data: ";

$dna_filename = <STDIN>;

chomp $dna_filename;

open(DNAFILE, $dna_filename);

$name = <DNAFILE>;

@DNA = <DNAFILE>;

close DNAFILE;

$DNA = join('', @DNA);

$DNA =~ s/\s//g;

$count_of_CG = 0;

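# Caution: without the g modifier, every test restarts the search at the

# beginning of $DNA, so this loop never ends if $DNA contains a C.

while($DNA =~ /c/i) {$count_of_CG++;}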

while($DNA =~ /g/ig) {$count_of_CG++;}

print "CG content is ", $count_of_CG/(length $DNA)*100, "%\n";


Exercise 1

  • Ask for a protein file in fasta format

  • Ask for an amino acid

  • Count the frequency of that amino acid

  • TKFHSNAHFYDCWRMLQYQLDMRCMRAISTFSPHCGMEHMPDQTHNQGEMCKPRMWQVSMNQSCNHTPPFRKTYVEWDYMAKALIAPYTLGWLASTCFIW
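One possible starting point for the counting step is sketched below; it is not the only approach. The helper name count_residue is hypothetical, the file reading and prompting are left out, and the test string is just the first ten residues of the sample sequence above.

```perl
use strict;
use warnings;

# Hypothetical helper: count how often one amino acid occurs in a protein.
sub count_residue {
    my ($protein, $aa) = @_;
    return 0 if $aa eq '';
    # \Q...\E treats the input literally. Assigning the matches to an
    # empty list and then to a scalar yields the number of matches.
    my $count = () = $protein =~ /\Q$aa\E/gi;
    return $count;
}

my $protein   = 'TKFHSNAHFY';
my $count     = count_residue($protein, 'H');
my $frequency = $count / length($protein);
print "H occurs $count times (frequency $frequency)\n";
```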


Exercise 2

  • Ask for a DNA file in fasta format

  • Ask for a codon

  • Count the frequency of that codon

  • TCGTACTTAGAAATGAGGGTCCGCTTTTGCCCACGCACCTGATCGCTCCTCGTTTGCTTTTAAGAACCGGACGAACCACAGAGCATAAGGAGAACCTCTAGCTGCTTTACAAAGTACTGGTTCCCTTTCCAGCGGGATGCTTTATCTAAACGCAATGAGAGAGGTATTCCTCAGGCCACATCGCTTCCTAGTTCCGCTGGGATCCATCGTTGGCGGCCGAAGCCGCCATTCCATAGTGAGTTCTTCGTCTGTGTCATTCTGTGCCAGATCGTCTGGCAAATAGCCGATCCAGTTTATCTCTCGAAACTATAGTCGTACAGATCGAAATCTTAAGTCAAATCACGCGACTAGACTCAGCTCTATTTTAGTGGTCATGGGTTTTGGTCCCCCCGAGCGGTGCAACCGATTAGGACCATGTAGAACATTAGTTATAAGTCTTCTTTTAAACACAATCTTCCTGCTCAGTGGTACATGGTTATCGTTATTGCTAGCCAGCCTGATAAGTAACACCACCACTGCGACCCTAATGCGCCCTTTCCACGAACACAGGGCTGTCCGATCCTATATTACGACTCCGGGAAGGGGTTCGCAAGTCGCACCCTAAACGATGTTGAAGGCTCAGGATGTACACGCACTAGTACAATACATACGTGTTCCGGCTCTTATCCTGCATCGGAAGCTCAATCATGCATCGCACCAGCGTGTTCGTGTCATCTAGGAGGGGCGCGTAGGATAAATAATTCAATTAAGATATCGTTATGCTAGTATACGCCTACCCGTCACCGGCCAACAGTGTGCAGATGGCGCCACGAGTTACTGGCCCTGATTTCTCCGCTTCTAATACCGCACACTGGGCAATACGAGCTCAAGCCAGTCTCGCAGTAACGCTCATCAGCTAACGAAAGAGTTAGAGGCTCGCTAAATCGCACTGTCGGGGTCCCTTGGGTATTTTACACTAGCGTCAGGTAGGCTAGCATGTGTCTTTCCTTCCAGGGGTATG
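A sketch of the counting step follows, under one stated assumption: codons are read in frame from the first base, in non-overlapping steps of three. The exercise does not say which reading frame is intended, so treat this as one interpretation. The helper name count_codon is hypothetical, and the test string is the first twelve bases of the sample sequence above.

```perl
use strict;
use warnings;

# Hypothetical helper: count a codon in reading frame 0 (assumption).
sub count_codon {
    my ($dna, $codon) = @_;
    my $count = 0;
    # Step through the sequence three bases at a time; stop before a
    # trailing fragment shorter than a full codon.
    for (my $pos = 0; $pos + 3 <= length($dna); $pos += 3) {
        $count++ if uc(substr($dna, $pos, 3)) eq uc($codon);
    }
    return $count;
}

my $dna = 'TCGTACTTAGAA';                 # codons: TCG TAC TTA GAA
print count_codon($dna, 'TAC'), "\n";     # prints 1
print count_codon($dna, 'GTA'), "\n";     # prints 0: GTA occurs only out of frame
```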

