Heuristic approaches & scoring matrices

M.Prasad Naidu

MSc Medical Biochemistry, Ph.D.

Introduction
  • Two algorithms are widely used in heuristic sequence searching:
    • BLAST
    • FASTA
  • FASTA is an algorithm developed by Pearson and Lipman. It is often described as more sensitive than BLAST.
  • BLAST is an algorithm developed by Altschul et al. in 1990. It finds high-scoring local alignments between two sequences. Gapped versions are now available.
BLASTP algorithm
  • The BLAST algorithm involves the following steps:
    • Breaking the query sequence into words of a defined size.
    • Finding word matches (hits) in the target sequence, which seed a high-scoring segment pair (HSP).
    • Extending the alignment outward from the matched word.
Breaking of the sequence into defined word size

Query : AILDTGATGDA

Word size : 4

AILDTGATGDA

AILD

ILDT

LDTG

DTGA

TGAT

GATG

ATGD

TGDA
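
The word list above can be reproduced with a sliding window. This is a minimal sketch; the function name `split_into_words` is only illustrative and is not part of any BLAST package.

```python
def split_into_words(sequence, word_size):
    """Return every overlapping word of length word_size, in order."""
    return [sequence[i:i + word_size]
            for i in range(len(sequence) - word_size + 1)]


query = "AILDTGATGDA"
words = split_into_words(query, 4)
print(words)
```

A sequence of length L gives L - w + 1 words of size w, so the 11-residue query yields 8 words.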

Finding a high-scoring segment pair (HSP)

MQVWGWAILDTVATDAAMLL

AILD
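
As a rough illustration of this step, the sketch below looks for exact occurrences of each query word in the target sequence. It is a simplification: protein BLAST also accepts similar ("neighbourhood") words that score above a threshold, which is not modelled here. The function name `find_word_hits` is hypothetical.

```python
def find_word_hits(words, subject):
    """Return (query_offset, subject_offset, word) for every exact occurrence.

    Offsets are 0-based. Exact matching only; this is an assumption made
    to keep the example short.
    """
    hits = []
    for q_pos, word in enumerate(words):
        s_pos = subject.find(word)
        while s_pos != -1:
            hits.append((q_pos, s_pos, word))
            s_pos = subject.find(word, s_pos + 1)
    return hits


query = "AILDTGATGDA"
subject = "MQVWGWAILDTVATDAAMLL"
word_size = 4
words = [query[i:i + word_size] for i in range(len(query) - word_size + 1)]
hits = find_word_hits(words, subject)
print(hits)
```

For the sequences on this slide, the words AILD and ILDT are both found, and both lie on the same diagonal (target offset minus query offset is 6).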

Extending the alignment

MQVWGWAILDTVATDAAMLL

......AILDTGATGDA...
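
The extension step can be sketched as an ungapped walk in both directions from the matched word, keeping the best-scoring span. The sketch below assumes simple +1/-1 identity scoring and an illustrative drop-off limit (`x_drop`); real BLAST scores residue pairs with a substitution matrix, so treat the numbers as a toy example. The function name `extend_hit` is hypothetical.

```python
def extend_hit(query, subject, q_start, s_start, word_size,
               match=1, mismatch=-1, x_drop=3):
    """Extend a word hit without gaps and return (score, query_part, subject_part)."""

    def score_pair(a, b):
        return match if a == b else mismatch

    # Score of the seed word itself.
    seed_score = sum(score_pair(query[q_start + k], subject[s_start + k])
                     for k in range(word_size))

    def extend(q, s, step):
        """Walk from (q, s) in direction step; return (best gain, length at best)."""
        best_gain = gain = 0
        best_len = length = 0
        while 0 <= q < len(query) and 0 <= s < len(subject):
            gain += score_pair(query[q], subject[s])
            length += 1
            if gain > best_gain:
                best_gain, best_len = gain, length
            elif best_gain - gain > x_drop:
                break  # score fell too far below the best seen so far
            q += step
            s += step
        return best_gain, best_len

    right_gain, right_len = extend(q_start + word_size, s_start + word_size, +1)
    left_gain, left_len = extend(q_start - 1, s_start - 1, -1)

    q_from = q_start - left_len
    q_to = q_start + word_size + right_len  # exclusive end
    s_from = s_start - left_len
    score = seed_score + left_gain + right_gain
    return score, query[q_from:q_to], subject[s_from:s_from + (q_to - q_from)]


query = "AILDTGATGDA"
subject = "MQVWGWAILDTVATDAAMLL"
result = extend_hit(query, subject, q_start=0, s_start=6, word_size=4)
print(result)
```

With these assumed settings, the seed AILD grows to the pair AILDTGAT / AILDTVAT (7 matches, 1 mismatch, score 6); the trailing residues are dropped because they do not raise the score further.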

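Parameters in BLAST result

Percentage identity (often loosely reported as percentage homology)

Alignment score

Number of residues aligned

E-value (the number of alignments with an equal or better score expected by chance)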
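
Two of these values can be illustrated with a short sketch. Percentage identity is computed here for the full-length ungapped pairing of the example query against the target. The E-value uses the Karlin-Altschul form E = K * m * n * exp(-lambda * S); the values of `lam` and `k` below are placeholders chosen for illustration, since the real ones depend on the scoring system in use.

```python
import math


def percent_identity(aligned_a, aligned_b):
    """Percentage of aligned positions that carry identical residues."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    matches = sum(a == b for a, b in zip(aligned_a, aligned_b))
    return 100.0 * matches / len(aligned_a)


def expect_value(raw_score, query_len, db_len, lam, k):
    """E = K * m * n * exp(-lambda * S) for an ungapped local alignment."""
    return k * query_len * db_len * math.exp(-lam * raw_score)


pid = percent_identity("AILDTGATGDA", "AILDTVATDAA")
print(round(pid, 1))

# lam and k are hypothetical values, used only to show the trend.
e_low_score = expect_value(30, query_len=11, db_len=1_000_000, lam=0.3, k=0.1)
e_high_score = expect_value(60, query_len=11, db_len=1_000_000, lam=0.3, k=0.1)
print(e_low_score, e_high_score)
```

The E-value falls exponentially as the score rises and grows linearly with the size of the search space, which is why the same alignment is less significant in a larger database.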

FastA algorithm
  • In the FASTA algorithm the word size is called the k-tuple (ktup).
  • The k-tuple is generally 1 or 2 for protein sequences and larger for nucleotide sequences (values in the range of 3 to 6 are common).
  • The FASTA algorithm follows steps similar to those of BLAST, but the procedure used to generate the alignment is different.
Breaking of the sequence into defined k-tuple

Residue:   F  A  M  L  G  F  I  K  Y  L  P  G  C  M
Position:  1  2  3  4  5  6  7  8  9 10 11 12 13 14
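
The numbering above amounts to a lookup table from each k-tuple to the positions where it occurs. A minimal sketch with k-tuple = 1 follows; the function name `build_ktuple_table` is illustrative only.

```python
from collections import defaultdict


def build_ktuple_table(sequence, ktup):
    """Map each k-tuple to the list of 1-based positions where it starts."""
    table = defaultdict(list)
    for i in range(len(sequence) - ktup + 1):
        table[sequence[i:i + ktup]].append(i + 1)
    return dict(table)


table = build_ktuple_table("FAMLGFIKYLPGCM", 1)
for residue, positions in table.items():
    print(residue, positions)
```

Residues that occur more than once, such as F (positions 1 and 6), keep every position, so all possible matches against a second sequence can be looked up directly.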

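In the lookup table of position differences between matching residues of the two sequences, the most frequently occurring offset is 3, so the second sequence is aligned starting after the first three residues of the first sequence.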

Alignment of the sequences

F A M L G F I K Y L P G C M
      T G F I K Y L P G A C T
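
The offset of 3 used for this alignment can be checked by tallying position differences for every matching residue. This is a sketch of the idea only, with k-tuple = 1 and a hypothetical function name; the FASTA program itself goes on to rescore and join the best regions.

```python
from collections import Counter


def most_common_offset(seq1, seq2, ktup=1):
    """Return (offset, count) for the most frequent position difference.

    The offset is the position in seq1 minus the position in seq2 for
    each pair of identical k-tuples. Assumes at least one k-tuple is shared.
    """
    positions = {}
    for i in range(len(seq1) - ktup + 1):
        positions.setdefault(seq1[i:i + ktup], []).append(i)

    offsets = Counter()
    for j in range(len(seq2) - ktup + 1):
        for i in positions.get(seq2[j:j + ktup], []):
            offsets[i - j] += 1
    return offsets.most_common(1)[0]


seq1 = "FAMLGFIKYLPGCM"
seq2 = "TGFIKYLPGACT"
offset, count = most_common_offset(seq1, seq2)
print(offset, count)

# Show the second sequence shifted by the winning offset.
print(seq1)
print(" " * offset + seq2)
```

For the two sequences on this slide, offset 3 is supported by eight matching residues (G F I K Y L P G), far more than any other offset.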

Parameters in FASTA result

Percentage identity (often loosely reported as percentage homology)

Alignment score

Number of residues aligned

P-Score

Scoring schemes

Identity scoring matrix

  • Residue-to-residue scores are based on identity alone: a pair of residues either matches or does not.
  • A 4 × 4 matrix is built for nucleotides and a 20 × 20 matrix for amino acids.
  • A match scores +1 and a mismatch scores -1.
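
An identity matrix of this kind is simple to build. The sketch below uses nested dictionaries and the +1/-1 values from the slide; the helper names are illustrative.

```python
def identity_matrix(alphabet, match=1, mismatch=-1):
    """Build an identity scoring matrix as a nested dictionary."""
    return {a: {b: (match if a == b else mismatch) for b in alphabet}
            for a in alphabet}


NUCLEOTIDES = "ACGT"
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

dna_matrix = identity_matrix(NUCLEOTIDES)        # 4 x 4
protein_matrix = identity_matrix(AMINO_ACIDS)    # 20 x 20


def score_ungapped(seq_a, seq_b, matrix):
    """Sum the pair scores of two equal-length, ungapped sequences."""
    return sum(matrix[a][b] for a, b in zip(seq_a, seq_b))


total = score_ungapped("AILDTGATGDA", "AILDTVATDAA", protein_matrix)
print(total)
```

Because every mismatch costs the same, an identity matrix cannot distinguish a conservative substitution from a drastic one, which is the motivation for the PAM and BLOSUM matrices.
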
PAM Matrices
  • These were first developed by Margaret Dayhoff and co-workers in 1978.
  • The model assumes that evolutionary change follows a Markov model, i.e. each residue change is independent of the previous mutations at that site. One PAM (point accepted mutation) is the unit of evolutionary divergence at which 1% of the amino acids have changed. This does not imply that after 100 PAMs every amino acid is different, because a site can change more than once or revert.
  • Dayhoff and co-workers calculated the frequencies of accepted mutations for 1 PAM by analyzing families of closely related sequences.
  • The scores are expressed as log-odds ratios.
  • The PAM1 matrix can be extrapolated to any number of PAMs: the matrix for N PAMs is obtained by multiplying the PAM1 mutation probability matrix by itself N times.
  • Lower-numbered PAM matrices are used for closely related protein sequences and higher-numbered PAM matrices for more divergent proteins.
  • PAM 30 is used for closely related proteins and PAM 250 for divergent ones.
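
The extrapolation from 1 PAM to N PAMs can be sketched with a toy example. The 3-residue alphabet and its probabilities below are invented for illustration (they are not Dayhoff's data); the point is only to show matrix powers and the conversion to log-odds scores, here scaled as 10 × log10.

```python
import math


def mat_mul(a, b):
    """Multiply two square matrices given as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def mat_power(m, n):
    """Raise a square matrix to the n-th power by repeated multiplication."""
    size = len(m)
    result = [[1.0 if i == j else 0.0 for j in range(size)] for i in range(size)]
    for _ in range(n):
        result = mat_mul(result, m)
    return result


def log_odds(prob_matrix, background):
    """Score = 10 * log10(P(i -> j) / background frequency of j)."""
    return [[10 * math.log10(prob_matrix[i][j] / background[j])
             for j in range(len(background))]
            for i in range(len(prob_matrix))]


# Hypothetical "1 PAM" matrix: each residue has a 1% chance of changing.
toy_pam1 = [[0.990, 0.005, 0.005],
            [0.005, 0.990, 0.005],
            [0.005, 0.005, 0.990]]
background = [1 / 3, 1 / 3, 1 / 3]

toy_pam100 = mat_power(toy_pam1, 100)
scores = log_odds(toy_pam100, background)
print(round(toy_pam100[0][0], 3))
print([[round(s, 1) for s in row] for row in scores])
```

In this toy model roughly 48% of residues are still unchanged after 100 PAMs, which illustrates why 100 PAMs does not mean that every amino acid differs.
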
BLOSUM Matrices
  • These matrices were developed by Henikoff and Henikoff in 1992.
  • Like the PAM matrices, they are constructed as log-odds scores.
  • The data were derived from ungapped local alignments of conserved regions of related proteins, including distantly related ones, deposited in the BLOCKS database.
  • BLOSUM 30 is used for comparing highly divergent sequences and BLOSUM 90 for closely related proteins.
  • The most commonly used matrix is BLOSUM 62, which is built from blocks in which sequences sharing 62% identity or more are clustered together.
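
The log-odds calculation behind a BLOSUM-style matrix can be sketched on a tiny, made-up block. The block below is hypothetical, and the clustering step that gives each BLOSUM matrix its number is left out, so this shows only how observed pair frequencies are compared with the frequencies expected by chance (scores in half-bit units, rounded).

```python
import math
from collections import Counter
from itertools import combinations


def blosum_style_scores(block):
    """Compute rounded 2 * log2(observed / expected) scores from an ungapped block."""
    # Count every residue pair within each column of the block.
    pair_counts = Counter()
    for column in zip(*block):
        for a, b in combinations(column, 2):
            pair_counts[tuple(sorted((a, b)))] += 1

    total = sum(pair_counts.values())
    q = {pair: n / total for pair, n in pair_counts.items()}

    # Residue frequencies implied by the observed pairs.
    p = Counter()
    for (a, b), freq in q.items():
        if a == b:
            p[a] += freq
        else:
            p[a] += freq / 2
            p[b] += freq / 2

    scores = {}
    for (a, b), freq in q.items():
        expected = p[a] * p[b] if a == b else 2 * p[a] * p[b]
        scores[(a, b)] = round(2 * math.log2(freq / expected))
    return scores


toy_block = ["AC", "AC", "AC", "CC"]  # four aligned sequences, two columns
toy_scores = blosum_style_scores(toy_block)
print(toy_scores)
```

Pairs seen more often than chance predicts get positive scores and pairs seen less often get negative scores; pairs never observed in the block are simply absent from the result in this sketch.
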