DNA Sequence Analysis David Shiuan, Department of Life Science, Institute of Biotechnology, and Interdisciplinary Program of Bioinformatics, National Dong Hwa University
DNA Sequence Analysis (I) • BLAST comparison • ORF (open reading frame) Finder • Promoter Search - Promoter Prediction (BCM) - EPD (Eukaryote Promoter Database) - NNPP prokaryote promoter prediction (BCM) - ProtScan (BIMAS)
DNA Sequence Analysis (II) • Sequence Alignment (ClustalW) • Tree Analysis (MEGA, PAUP, UPGMA) • Motif Prediction • Restriction Analysis (TCGA) • RNAFOLD (GCG)
Basic Local Alignment Search Tool • A sequence comparison algorithm optimized for speed used to search sequence databases for optimal local alignments to a query. • Algorithm : A fixed procedure embodied in a computer program.
Basic Local Alignment Search Tool • The initial search is done for a word of length "W" that scores at least "T" when compared to the query using a substitution matrix. Word hits are then extended in either direction in an attempt to generate an alignment with a score exceeding the threshold of "S". The "T" parameter dictates the speed and sensitivity of the search.
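The seed-and-extend idea on this slide can be sketched in a few lines of Python. This is a simplified illustration, not the NCBI implementation: it seeds on exact word matches only (instead of all words scoring at least T under a substitution matrix), scores with an assumed match/mismatch scheme (+1/-3), extends without gaps, and stops extending once the score falls more than an assumed `x_drop` below the best seen. The function names and parameter defaults are hypothetical.

```python
def find_word_hits(query, subject, w):
    """Return (query_pos, subject_pos) pairs where a length-w word matches exactly."""
    index = {}
    for i in range(len(query) - w + 1):
        index.setdefault(query[i:i + w], []).append(i)
    hits = []
    for j in range(len(subject) - w + 1):
        for i in index.get(subject[j:j + w], []):
            hits.append((i, j))
    return hits


def score_pair(a, b, match=1, mismatch=-3):
    """Assumed nucleotide scoring: +1 for a match, -3 for a mismatch."""
    return match if a == b else mismatch


def extend_hit(query, subject, qi, sj, w, x_drop=5):
    """Ungapped extension of one word hit in both directions.

    Returns (score, query_start, subject_start, length) of the best segment.
    """
    best = cur = sum(score_pair(query[qi + k], subject[sj + k]) for k in range(w))

    # Extend to the right of the word.
    best_right = 0
    k = 0
    while qi + w + k < len(query) and sj + w + k < len(subject):
        cur += score_pair(query[qi + w + k], subject[sj + w + k])
        k += 1
        if cur > best:
            best, best_right = cur, k
        elif best - cur > x_drop:
            break

    # Extend to the left, starting again from the best score so far.
    cur = best
    best_left = 0
    k = 0
    while qi - k - 1 >= 0 and sj - k - 1 >= 0:
        cur += score_pair(query[qi - k - 1], subject[sj - k - 1])
        k += 1
        if cur > best:
            best, best_left = cur, k
        elif best - cur > x_drop:
            break

    return best, qi - best_left, sj - best_left, w + best_left + best_right


def search(query, subject, w=4, s=6):
    """Keep only extended segments whose score reaches the threshold S."""
    hsps = set()
    for qi, sj in find_word_hits(query, subject, w):
        hsp = extend_hit(query, subject, qi, sj, w)
        if hsp[0] >= s:
            hsps.add(hsp)
    return sorted(hsps, reverse=True)
```

For example, `search("ACGTACGTGG", "TTACGTACGTCC", w=4, s=6)` reports a single segment of length 8 that starts at query position 0 and subject position 2. Several different word hits on the same diagonal extend to that same segment, which is why the results are collected in a set.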
BLOSUM62 Substitution Scoring Matrix • The BLOSUM62 matrix shown here is a 20 x 20 matrix, in which every possible identity and substitution is assigned a score based on the observed frequencies of such occurrences in alignments of related proteins. • Identities are assigned the most positive scores.
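To show how such a matrix is used, the sketch below scores an ungapped alignment by summing one matrix entry per aligned residue pair. Only a handful of BLOSUM62 entries are included, so `BLOSUM62_SUBSET` is deliberately incomplete, and the function name is hypothetical. A real analysis would load the full 20 x 20 matrix.

```python
# A small subset of BLOSUM62 entries; the full matrix has 20 x 20 values.
BLOSUM62_SUBSET = {
    ("A", "A"): 4, ("R", "R"): 5, ("K", "K"): 5, ("K", "R"): 2,
    ("L", "L"): 4, ("I", "I"): 4, ("I", "L"): 2, ("W", "W"): 11,
    ("D", "D"): 6, ("E", "E"): 5, ("D", "E"): 2,
}


def pair_score(a, b, matrix=BLOSUM62_SUBSET):
    """Look up a symmetric substitution score; raises KeyError if the pair is not in the subset."""
    if (a, b) in matrix:
        return matrix[(a, b)]
    return matrix[(b, a)]


def score_ungapped(seq1, seq2, matrix=BLOSUM62_SUBSET):
    """Sum the substitution scores of two equal-length, already aligned peptides."""
    if len(seq1) != len(seq2):
        raise ValueError("sequences must have the same length")
    return sum(pair_score(a, b, matrix) for a, b in zip(seq1, seq2))
```

Scoring `KLWD` against `RIWE` adds the conservative substitutions K/R, L/I and D/E (2 each) to the W/W identity (11). Identities such as W/W carry the largest values, as the slide notes.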
The NCBI BLAST family of programs • blastp compares an amino acid query sequence against a protein sequence database • blastn compares a nucleotide query sequence against a nucleotide sequence database • blastx compares a nucleotide query sequence translated in all reading frames against a protein sequence database • tblastn compares a protein query sequence against a nucleotide sequence database dynamically translated in all reading frames • tblastx compares the six-frame translations of a nucleotide query sequence against the six-frame translations of a nucleotide sequence database.
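blastx, tblastn and tblastx all depend on translating nucleotide sequence in every reading frame. A minimal sketch of a six-frame translation follows, assuming the standard genetic code and an unambiguous uppercase A/C/G/T input. The function names are hypothetical; stop codons are written as `*` and trailing bases that do not fill a codon are dropped.

```python
from itertools import product

# Standard genetic code, with codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def reverse_complement(seq):
    """Return the reverse complement of an uppercase A/C/G/T sequence."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))


def translate(seq):
    """Translate complete codons from the first base; '*' marks a stop codon."""
    return "".join(
        CODON_TABLE[seq[i:i + 3]] for i in range(0, len(seq) - 2, 3)
    )


def six_frame_translations(seq):
    """Map frame labels (+1..+3, -1..-3) to the translated peptide strings."""
    frames = {}
    rc = reverse_complement(seq)
    for offset in range(3):
        frames[f"+{offset + 1}"] = translate(seq[offset:])
        frames[f"-{offset + 1}"] = translate(rc[offset:])
    return frames
```

For the sequence `ATGGCC`, frame +1 reads `ATG GCC` and frame -1 reads the reverse complement `GGC CAT`. The same translation is also the first step of an ORF search, which then looks for runs between a start codon and the next `*`.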
Peptide Sequence Databases for BLAST search • nr • All non-redundant GenBank CDS translations+PDB+SwissProt+PIR+PRF • month • All new or revised GenBank CDS translations+PDB+SwissProt+PIR+PRF released in the last 30 days. • swissprot • Last major release of the SWISS-PROT protein sequence database (no updates)
E-value for the score S • The expected number of HSPs with score at least S is given by the formula $E = K m n e^{-\lambda S}$ • HSP: high-scoring segment pair • m and n: sequence lengths • K and $\lambda$: parameters
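The formula can be evaluated directly. The sketch below is a plain transcription of E = K m n e^(-λS); the function name is hypothetical, and the default values of `k` and `lam` are illustrative assumptions only, since the real values depend on the scoring system in use.

```python
import math


def expected_hsps(score, m, n, k=0.134, lam=0.318):
    """E = K * m * n * exp(-lambda * S).

    k and lam are assumed, illustrative values; in practice they are
    determined by the substitution matrix and gap costs.
    """
    return k * m * n * math.exp(-lam * score)


if __name__ == "__main__":
    # A 250-residue query against 50 million database residues (made-up sizes).
    for s in (30, 60, 90):
        print(s, expected_hsps(s, m=250, n=50_000_000))
```

Two properties of the formula are easy to see in this form: E falls exponentially as the score S rises, and it scales linearly with m and n, so doubling the size of the database doubles the number of HSPs expected by chance at a given score.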
Promoter Search • ProtScan (at BIMAS) • EPD (Eukaryote Promoter Database) • Promoter Prediction (BCM) • NNPP (Prokaryote Promoter Prediction at BCM)