Heuristic approaches & scoring matrices

M.Prasad Naidu

MSc Medical Biochemistry, Ph.D.


Introduction

  • The two most widely used heuristic algorithms for sequence searching are:

    • BLAST

    • FASTA

  • FASTA is an algorithm developed by Pearson and Lipman. It is generally considered more sensitive than BLAST.

  • BLAST is an algorithm developed by Altschul et al. in 1990. It finds high-scoring local alignments between two sequences. Gapped versions are now available.


BLASTP algorithm

  • The BLAST algorithm involves the following steps.

    • Breaking the query sequence into words of a defined size.

    • Finding a match for a word in the target sequence, which seeds a high-scoring segment pair (HSP).

    • Aligning the matched word and extending the alignment in both directions.


Breaking of the sequence into defined word size

Query : AILDTGATGDA

Word size : 4

AILDTGATGDA

AILD

ILDT

LDTG

DTGA

TGAT

GATG

ATGD

TGDA
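The word list above can be reproduced with a short sketch. The function name `split_into_words` is an illustrative choice, not part of any BLAST implementation.

```python
def split_into_words(query: str, word_size: int) -> list[str]:
    """Return every overlapping substring of length word_size."""
    return [query[i:i + word_size]
            for i in range(len(query) - word_size + 1)]


words = split_into_words("AILDTGATGDA", 4)
print(words)
```

A query of length 11 with word size 4 gives 11 - 4 + 1 = 8 overlapping words, matching the list on the slide.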


Finding a high-scoring pair

MQVWGWAILDTVATDAAMLL

AILD
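A minimal sketch of the word lookup, assuming exact matches only. Real BLAST also accepts similar "neighbourhood" words that score above a threshold, which this simplification leaves out. The helper name `find_word_hits` is hypothetical.

```python
def find_word_hits(query: str, subject: str, word_size: int):
    """Return (query_offset, subject_offset) for every exact word match."""
    hits = []
    for q_pos in range(len(query) - word_size + 1):
        word = query[q_pos:q_pos + word_size]
        s_pos = subject.find(word)
        while s_pos != -1:
            hits.append((q_pos, s_pos))
            s_pos = subject.find(word, s_pos + 1)  # look for further copies
    return hits


hits = find_word_hits("AILDTGATGDA", "MQVWGWAILDTVATDAAMLL", 4)
print(hits)
```

Offsets are 0-based. The word AILD from the start of the query is found at offset 6 of the target sequence, and the next word ILDT is found one position further along the same diagonal.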


Extending the alignment

MQVWGWAILDTVATDAAMLL

......AILDTGATGDA...
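The extension step can be sketched as an ungapped extension from the matched word. This is an illustration under stated assumptions, not the BLAST source: scoring is +1 for a match and -1 for a mismatch, and the extension stops once the running score falls more than `x_drop` below the best score seen (the value 3 is an arbitrary choice for the example). The function names are hypothetical.

```python
def _extend(pairs, match, mismatch, x_drop):
    """Walk along residue pairs; return (best score gain, length at that best)."""
    running = best = best_len = 0
    for n, (a, b) in enumerate(pairs, start=1):
        running += match if a == b else mismatch
        if running > best:
            best, best_len = running, n
        elif best - running > x_drop:
            break  # score fell too far below the best seen so far
    return best, best_len


def extend_seed(query, subject, q_pos, s_pos, word_size,
                match=1, mismatch=-1, x_drop=3):
    """Extend an exact word hit in both directions without gaps.

    Returns (score, q_start, q_end, s_start, s_end), end-exclusive.
    """
    seed_score = word_size * match  # the seed itself is an exact match
    right, right_len = _extend(
        zip(query[q_pos + word_size:], subject[s_pos + word_size:]),
        match, mismatch, x_drop)
    left, left_len = _extend(
        zip(reversed(query[:q_pos]), reversed(subject[:s_pos])),
        match, mismatch, x_drop)
    return (seed_score + left + right,
            q_pos - left_len, q_pos + word_size + right_len,
            s_pos - left_len, s_pos + word_size + right_len)


query = "AILDTGATGDA"
subject = "MQVWGWAILDTVATDAAMLL"
hsp = extend_seed(query, subject, q_pos=0, s_pos=6, word_size=4)
score, q_start, q_end, s_start, s_end = hsp
print(score, query[q_start:q_end], subject[s_start:s_end])
```

With these settings the word AILD is extended to the segment pair AILDTGAT / AILDTVAT, which scores 6. Extending to the end of the query would add more mismatches than matches, so the best-scoring segment stops earlier.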

Parameters in BLAST result

Percentage identity (often loosely reported as percentage homology)

Score of the alignment

Number of residues aligned

E-value
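As a rough illustration of how these parameters relate to an alignment, the sketch below computes them for the ungapped overlap of the example sequences. The +1/-1 scoring is an assumption for the example. The E-value function uses the Karlin-Altschul form E = K·m·n·e^(-λS); the values of K and λ depend on the scoring system and must be supplied by the caller, so none are hard-coded here.

```python
import math


def alignment_stats(a: str, b: str, match: int = 1, mismatch: int = -1) -> dict:
    """Summarise an ungapped alignment of two equal-length segments."""
    if len(a) != len(b):
        raise ValueError("segments must have the same length")
    identities = sum(x == y for x, y in zip(a, b))
    length = len(a)
    return {
        "aligned_residues": length,
        "identities": identities,
        "percent_identity": 100.0 * identities / length,
        "raw_score": identities * match + (length - identities) * mismatch,
    }


def expected_value(raw_score, query_len, db_len, k, lam):
    """Karlin-Altschul E-value; k and lam are scoring-system parameters."""
    return k * query_len * db_len * math.exp(-lam * raw_score)


stats = alignment_stats("AILDTGATGDA", "AILDTVATDAA")
print(stats)
```

Here 8 of the 11 aligned residues are identical, about 72.7% identity. A higher score gives a smaller E-value, meaning fewer alignments of that quality are expected by chance.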


FASTA algorithm

  • The word size in the FASTA algorithm is called the k-tuple (ktup).

  • Typically, the k-tuple is 1 or 2 for protein sequences and between 3 and 6 for nucleotide sequences.

  • The FASTA algorithm follows steps similar to those of BLAST, but the procedure for generating the alignment is different.


Breaking of the sequence into defined k-tuples

F A M L G F I K Y L P G C M

1 2 3 4 5 6 7 8 9 10 11 12 13 14


The most frequently occurring offset between matching residues in the two sequences is 3, so the alignment starts after skipping the first three residues of the first sequence.


Alignment of the sequences

F A M L G F I K Y L P G C M

T G F I K Y L P G A C T
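The offset of 3 can be checked with a small sketch of the lookup-table idea: record where each k-tuple occurs in the first sequence, then for every k-tuple of the second sequence count the position differences. This is a simplified illustration of the first FASTA stage only, and `best_offset` is a hypothetical name.

```python
from collections import Counter, defaultdict


def best_offset(seq1: str, seq2: str, ktup: int = 1):
    """Return (offset, count) for the most frequent k-tuple offset, or None."""
    positions = defaultdict(list)
    for i in range(len(seq1) - ktup + 1):
        positions[seq1[i:i + ktup]].append(i)

    offsets = Counter()
    for j in range(len(seq2) - ktup + 1):
        for i in positions.get(seq2[j:j + ktup], []):
            offsets[i - j] += 1  # same offset means same diagonal

    return offsets.most_common(1)[0] if offsets else None


result = best_offset("FAMLGFIKYLPGCM", "TGFIKYLPGACT", ktup=1)
print(result)
```

With a k-tuple of 1, eight residue matches (G, F, I, K, Y, L, P, G) share the offset 3, so the second sequence is placed three residues into the first one.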

Parameters in FASTA result

Percentage identity (often loosely reported as percentage homology)

Score of the alignment

Number of residues aligned

P-Score


Scoring schemes

Identity scoring matrix

  • Residue-to-residue scores depend only on whether the two residues are identical.

  • A 4 × 4 matrix is built for nucleotides and a 20 × 20 matrix for amino acids.

  • A match scores +1 and a mismatch scores -1.
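A minimal sketch of how such an identity matrix could be built, stored here as a nested dictionary. The representation and the function name are illustrative choices.

```python
def identity_matrix(alphabet: str, match: int = 1, mismatch: int = -1) -> dict:
    """Build a scoring matrix that rewards identical residues only."""
    return {a: {b: (match if a == b else mismatch) for b in alphabet}
            for a in alphabet}


nucleotide_matrix = identity_matrix("ACGT")
protein_matrix = identity_matrix("ARNDCQEGHILKMFPSTWYV")

print(nucleotide_matrix["A"]["A"], nucleotide_matrix["A"]["C"])
```

The nucleotide matrix has 4 × 4 entries and the protein matrix 20 × 20, with +1 on the diagonal and -1 everywhere else.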


PAM matrices

  • These were first developed by Margaret Dayhoff and co-workers in 1978.

  • The model assumes that evolutionary changes follow a Markov model, i.e. each residue change occurs independently of previous mutations. One PAM is a unit of evolutionary divergence in which 1% of the amino acids have changed. This does not imply that after 100 PAMs every amino acid is different, because some positions change more than once or revert.

  • Dayhoff and co-workers calculated the frequencies of accepted mutations for 1 PAM by analyzing closely related families of sequences.

  • The scores are represented as log-odds ratios.

  • The 1 PAM matrix can be extrapolated to any number of PAMs. The N PAM matrix is obtained by multiplying the 1 PAM matrix by itself N times.

  • For closely related protein sequences a lower-distance PAM matrix is used, and a higher PAM matrix is used for more divergent proteins.

  • PAM 30 is used for closely related proteins and PAM 250 for divergent ones.
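The extrapolation and the log-odds scoring can be illustrated on a toy example. Everything numeric below is hypothetical: a three-letter alphabet, a made-up 1 PAM mutation probability matrix in which each residue stays unchanged with probability 0.99, and uniform background frequencies. Real PAM matrices use 20 amino acids and the frequencies measured by Dayhoff and co-workers. The convention assumed here is that rows are the original residue and columns the replacement.

```python
import math


def mat_mul(a, b):
    """Multiply two square matrices given as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def mat_pow(m, n):
    """Multiply m by itself so that it appears n times (n >= 1)."""
    result = m
    for _ in range(n - 1):
        result = mat_mul(result, m)
    return result


def log_odds(m, freqs):
    """Score = 10 * log10(observed probability / background frequency)."""
    return [[10 * math.log10(m[i][j] / freqs[j]) for j in range(len(m))]
            for i in range(len(m))]


# Hypothetical 1 PAM matrix: 1% total change per residue.
pam1 = [[0.990, 0.005, 0.005],
        [0.005, 0.990, 0.005],
        [0.005, 0.005, 0.990]]
background = [1 / 3, 1 / 3, 1 / 3]  # assumed uniform frequencies

pam250 = mat_pow(pam1, 250)
scores250 = log_odds(pam250, background)
print(round(pam250[0][0], 3), round(pam250[0][1], 3))
```

In this toy model a residue still has a higher probability of being unchanged than of becoming any particular other residue after 250 PAMs, so identities keep a positive log-odds score and substitutions a negative one.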


PAM 250 scoring matrix


BLOSUM matrices

  • These matrices were developed by Henikoff and Henikoff in 1992.

  • Like the PAM matrices, they are log-odds scoring matrices, but they are derived directly from observed substitutions rather than extrapolated from closely related sequences.

  • The data were derived from ungapped local alignments (blocks) of related proteins deposited in the BLOCKS database.

  • BLOSUM 30 is used for comparing highly divergent sequences and BLOSUM 90 is used for closely related proteins.

  • The most commonly used BLOSUM matrix is BLOSUM 62, which is built from blocks in which sequences sharing 62% identity or more are clustered together.
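To show how a substitution matrix changes the score compared with simple identity scoring, the sketch below scores the ungapped overlap from the BLAST example using a small excerpt of BLOSUM 62. Only the residue pairs needed for this example are included. A real program would load the full 20 × 20 matrix, and the function names are illustrative.

```python
# Excerpt of BLOSUM 62 scores, keyed by alphabetically sorted residue pair.
BLOSUM62_EXCERPT = {
    ("A", "A"): 4, ("I", "I"): 4, ("L", "L"): 4,
    ("D", "D"): 6, ("T", "T"): 5,
    ("G", "V"): -3, ("D", "G"): -1, ("A", "D"): -2,
}


def pair_score(a: str, b: str) -> int:
    """Look up a symmetric substitution score for two residues."""
    return BLOSUM62_EXCERPT[tuple(sorted((a, b)))]


def score_ungapped(seq1: str, seq2: str) -> int:
    """Sum substitution scores over an ungapped, equal-length alignment."""
    if len(seq1) != len(seq2):
        raise ValueError("segments must have the same length")
    return sum(pair_score(a, b) for a, b in zip(seq1, seq2))


total = score_ungapped("AILDTGATGDA", "AILDTVATDAA")
print(total)
```

The three mismatches (G/V, G/D and D/A) are penalised by different amounts, and identical residues earn different rewards (for example 6 for D but 4 for A), which identity scoring cannot express.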


BLOSUM 62 matrix


Thank you

