# Chapter 6 - Profiles - PowerPoint PPT Presentation

## PowerPoint Slideshow about ' Chapter 6 - Profiles' - otis

### Chapter 6 - Profiles

Assume we have a family of sequences. To search for other sequences in the family we can:

• Search with a single sequence from the family
• Search with several sequences from the family together, represented as one of:
  • Consensus sequences (regular expressions), e.g. A-[FR]-X(2,3)-M, which matches within GARCCMH and LCAFARLMLMA
  • Weight matrices (position-specific scoring matrices), which do not consider gaps
  • Profiles
  • Profiles as hidden Markov models
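To make the consensus-sequence example concrete, the sketch below translates the pattern A-[FR]-X(2,3)-M into a Python regular expression and applies it to the two example sequences. The helper name `prosite_to_regex` is hypothetical, and the converter only handles the element types assumed here: single letters, bracketed alternatives, braces for exclusions, the wildcard X, and repeat counts in parentheses.

```python
import re

def prosite_to_regex(pattern: str) -> str:
    """Translate a PROSITE-style pattern into a Python regex (illustrative helper)."""
    element_re = re.compile(
        r"(\[[A-Z]+\]|\{[A-Z]+\}|[A-Za-z])(?:\((\d+)(?:,(\d+))?\))?"
    )
    parts = []
    for element in pattern.split("-"):
        m = element_re.fullmatch(element)
        if m is None:
            raise ValueError(f"unsupported pattern element: {element!r}")
        core, low, high = m.groups()
        if core in ("x", "X"):
            core = "."                         # wildcard: any residue
        elif core.startswith("{"):
            core = "[^" + core[1:-1] + "]"     # exclusion set
        if low is not None:                    # repeat count, e.g. (2,3)
            core += "{" + low + ("," + high if high else "") + "}"
        parts.append(core)
    return "".join(parts)

regex = prosite_to_regex("A-[FR]-X(2,3)-M")
for seq in ("GARCCMH", "LCAFARLMLMA"):
    match = re.search(regex, seq)
    print(seq, match.group(0) if match else None)
```

A pattern like this either matches or it does not, so it cannot rank near misses. That limitation is what weight matrices and profiles address.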

Search with a family of sequences
• Align the sequences (multiple)
• Make a profile from part of the alignment
• Search in the database with the profile
• As an option, revise the profile, and search again (iteratively)
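The loop described above can be sketched in a few lines. This is a toy version under strong assumptions, not the procedure used by any particular program: the seed segments are taken to be already aligned, ungapped and of equal length (so the multiple-alignment step is skipped), the background distribution is uniform, sequences contain only the 20 standard residues, and the names `build_profile`, `best_window` and `iterative_search` are made up for the illustration.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
BACKGROUND = 1 / len(AMINO_ACIDS)  # assumption: uniform background frequencies

def build_profile(segments, pseudocount=1.0):
    """Log-odds scores per column from equal-length, ungapped segments."""
    profile = []
    for column in zip(*segments):
        counts = Counter(column)
        total = len(column) + pseudocount
        profile.append({
            a: math.log2((counts[a] + pseudocount * BACKGROUND) / total / BACKGROUND)
            for a in AMINO_ACIDS
        })
    return profile

def best_window(profile, sequence):
    """Return (score, segment) for the highest-scoring ungapped window."""
    width = len(profile)
    best = (float("-inf"), "")
    for start in range(len(sequence) - width + 1):
        window = sequence[start:start + width]
        score = sum(col[res] for col, res in zip(profile, window))
        best = max(best, (score, window))
    return best

def iterative_search(seed_segments, database, threshold, max_rounds=5):
    """Search, add the hits to the profile, and repeat until the hit set is stable."""
    segments = list(seed_segments)
    hits = {}
    for _ in range(max_rounds):
        profile = build_profile(segments)
        new_hits = {}
        for name, seq in database.items():
            score, window = best_window(profile, seq)
            if score >= threshold:
                new_hits[name] = window
        if new_hits == hits:      # nothing changed: stop iterating
            break
        hits = new_hits
        segments = list(seed_segments) + list(hits.values())
    return hits

seeds = ["PQVAQLEL", "PQVGALEL", "LQVGQVEL"]   # hypothetical seed segments
database = {"rat": "MMPQVPQLELGG", "junk": "WWWWWWWWWWWW"}
print(iterative_search(seeds, database, threshold=10.0))
```

The threshold controls how easily new sequences enter the profile. Setting it too low lets unrelated hits in, and each later round then drifts further from the original family.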

Multiple alignments and profiles

What weight should amino acid a have in position r of the profile?

Example

Clustal X (1.64b) multiple sequence alignment

XENLA1 ALVSGPQD------NELDG--MQL

XENLA2 AQVNGPQD------NELDG--MQF

MOUSE1 PQVEQLEL------GGSP---GDL

RAT1 PQVPQLEL------GGGPEA-GDL

MOUSE2 PQVAQLEL------GGGPGA-GDL

RAT2 PQVAQLEL------GGGPGA-GDL Removed

CRILO PQVAQLEL------GGGPGA-DDL

RABIT LQVGQAEL------GGGPGA-GGL

BOVIN PQVGALEL------AGGPG-----

SHEEP PQVGALEL------AGGPG----- Removed

PIG PQAGAVEL------GGGLGG---L

CANFA LQVRDVEL------AGAPGE-GGL

HUMAN LQVGQVEL------GGGPGA-GSL

CHICK P-LVSSPL------RGEAGV-LPF

ORENI LLGFLPPKAGGAVVQGGEN---EV

VERMO LLGFLPAKSGGAAAGG-ENEVAEF

12345678******567890*234 (column numbers, last digit only; * marks a column removed from the profile)

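| Pos | Cons | A | B | C | D | E | F | G | H | I | K | L | M | N | P | Q | R | S | T | V | W | X | Y | Z | Gap | Le |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | P | 1 | 0 | -18 | -17 | -12 | -14 | -21 | -13 | -3 | -10 | 1 | -2 | -15 | 26 | -6 | -12 | -3 | -2 | -1 | -32 | 0 | -18 | 0 | 100 | 100 |
| 2 | q | -4 | 0 | -18 | -5 | 2 | -10 | -17 | 2 | -3 | 3 | 0 | 1 | -3 | -7 | 11 | 3 | -4 | -3 | -4 | -17 | 0 | -10 | 0 | 50 | 100 |
| 3 | V | 1 | 0 | -5 | -23 | -17 | -6 | -15 | -17 | 15 | -15 | 9 | 7 | -17 | -16 | -13 | -17 | -7 | -3 | 18 | -26 | 0 | -14 | 0 | 100 | 100 |
| 4 | G | 0 | 0 | -12 | -8 | -7 | -14 | 0 | -5 | -13 | -6 | -14 | -10 | -2 | -9 | -5 | -6 | -1 | -3 | -8 | -22 | 0 | -11 | 0 | 100 | 100 |
| 5 | Q | 2 | 0 | -15 | 1 | 1 | -25 | 4 | -3 | -17 | -1 | -15 | -11 | 1 | -7 | 3 | -2 | 3 | -1 | -12 | -30 | 0 | -20 | 0 | 100 | 100 |
| 6 | P | 1 | 0 | -13 | -17 | -11 | -14 | -21 | -13 | 0 | -10 | 0 | -1 | -13 | 18 | -7 | -13 | -1 | 0 | 3 | -32 | 0 | -17 | 0 | 100 | 100 |
| 7 | E | 0 | 0 | -29 | 12 | 19 | -36 | -10 | 0 | -25 | 7 | -24 | -19 | 3 | 20 | 13 | 2 | 2 | 0 | -17 | -41 | 0 | -26 | 0 | 100 | 100 |
| 8 | L | -8 | 0 | -20 | -15 | -10 | -1 | -29 | -10 | 7 | -7 | 14 | 9 | -13 | -17 | -6 | -10 | -12 | -8 | 3 | -20 | 0 | -8 | 0 | 100 | 100 |
| 5 | g | 3 | 0 | -16 | 5 | 2 | -36 | 21 | 0 | -28 | 3 | -28 | -21 | 10 | -8 | 4 | 5 | 4 | -2 | -20 | -32 | 0 | -25 | 0 | 34 | 34 |
| 6 | G | 4 | 0 | -21 | 6 | 0 | -49 | 51 | -10 | -41 | -6 | -40 | -32 | 4 | -13 | -4 | -7 | 3 | -9 | -30 | -40 | 0 | -37 | 0 | 100 | 100 |
| 7 | G | 3 | 0 | -16 | -3 | -4 | -31 | 23 | -11 | -22 | -8 | -20 | -16 | -2 | -12 | -5 | -9 | 0 | -6 | -16 | -33 | 0 | -27 | 0 | 100 | 100 |
| 8 | P | 3 | 0 | -24 | 7 | 6 | -32 | -10 | -5 | -21 | -1 | -20 | -17 | 0 | 27 | 2 | -6 | 2 | 0 | -14 | -43 | 0 | -25 | 0 | 100 | 100 |
| 9 | g | 3 | 0 | -19 | 5 | -2 | -45 | 49 | -8 | -39 | -6 | -38 | -30 | 9 | -13 | -5 | -6 | 4 | -7 | -28 | -37 | 0 | -33 | 0 | 50 | 78 |
| 0 | a | 5 | 0 | -3 | -2 | 0 | -12 | 0 | -5 | -3 | -3 | -6 | -3 | -2 | -3 | -1 | -4 | 1 | 0 | 0 | -19 | 0 | -12 | 0 | 50 | 78 |
| 2 | g | -1 | 0 | -11 | -9 | -9 | -12 | 7 | -9 | -6 | -9 | -4 | 0 | -6 | -13 | -7 | -10 | -4 | -6 | -6 | -18 | 0 | -14 | 0 | 50 | 78 |
| 3 | q | 0 | 0 | -22 | 13 | 11 | -33 | 4 | 0 | -26 | 3 | -25 | -19 | 6 | 6 | 7 | 0 | 3 | 0 | -19 | -36 | 0 | -23 | 0 | 50 | 78 |
| 4 | L | -12 | 0 | -10 | -37 | -28 | 28 | -42 | -13 | 22 | -22 | 29 | 21 | -27 | -24 | -17 | -23 | -20 | -12 | 15 | 1 | 0 | 10 | 0 | 100 | 100 |
| * | | 17 | 0 | 0 | 10 | 17 | 3 | 52 | 0 | 0 | 1 | 36 | 2 | 4 | 22 | 21 | 2 | 5 | 0 | 16 | 0 | 0 | 0 | 0 | | |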

What to take into account when creating a profile?

1. The observed amino acids in position r of the alignment.

2. The number of independent ‘observations’ that have been used for constructing position r of the alignment (for example, the number of different amino acids in the column).

3. The similarity of a to the amino acids observed in column r, to allow for amino acids not yet observed. Amino acid a is more likely to occur in unknown family members if the known sequences contain many amino acids similar to a. Thus a ‘background’ scoring matrix should be used.

4. The background (a priori) distribution of the amino acids.

5. The diversity and similarity of the sequences, which determine the importance (or weight) of each sequence. The known sequences are normally not uniformly distributed in the ‘family space’, and should therefore have different weights in the calculation.

6. The number of gaps in column r and in the neighbouring columns.

These points are not independent. How they are treated varies between methods for profile construction.
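As a minimal illustration of how some of these points can enter a score, the sketch below computes a log-odds weight for every amino acid a in a single column r. It covers only points 1 and 4 (observed residues and background distribution) plus a crude pseudocount so that unobserved amino acids get a finite score. It ignores the substitution-matrix similarity of point 3, the sequence weights of point 5 and the gaps of point 6. The uniform background, the pseudocount value and the function name `column_log_odds` are assumptions made for the example.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def column_log_odds(column, background=None, pseudocount=1.0):
    """Log-odds score of each amino acid for one alignment column."""
    if background is None:
        # Assumption: uniform background; real methods use database frequencies.
        background = {a: 1 / len(AMINO_ACIDS) for a in AMINO_ACIDS}
    # Count observed residues, ignoring gaps and unknown symbols.
    counts = Counter(c for c in column if c in background)
    total = sum(counts.values()) + pseudocount
    scores = {}
    for a in AMINO_ACIDS:
        # Pseudocount mass is spread according to the background distribution.
        freq = (counts[a] + pseudocount * background[a]) / total
        scores[a] = math.log2(freq / background[a])
    return scores

# Column 3 of the example alignment, with RAT2 and SHEEP left out.
scores = column_log_odds("VVVVVVVVAVVLGG")
for a in sorted(scores, key=scores.get, reverse=True)[:4]:
    print(a, round(scores[a], 2))
```

With this scheme every amino acid that was never observed gets the same negative score. A method that applies point 3 would instead give isoleucine a better score than tryptophan in a column dominated by valine.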

Database search with a profile

Notations

Position weight

No sequence weights are considered for now.

• All amino acids in the column count equally
• Amino acids occurring many times are favored
• Amino acids occurring many times are ‘punished’
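The exact formulas for these three options are not given here, so the sketch below is only one possible reading, and all three schemes are assumptions for illustration. The "equal" scheme gives every distinct observed amino acid the same weight. The "proportional" scheme weights by raw count, which favors frequent residues. The "damped" scheme uses ln(1 + count), so that extra occurrences of an already frequent residue add less and less. The function name `position_weights` and the scheme names are hypothetical.

```python
import math
from collections import Counter

def position_weights(column, scheme):
    """Normalized weight of each observed amino acid in one column (gaps ignored)."""
    counts = Counter(c for c in column if c != "-")
    if scheme == "equal":
        raw = {a: 1.0 for a in counts}                  # every observed residue counts once
    elif scheme == "proportional":
        raw = {a: float(n) for a, n in counts.items()}  # frequent residues are favored
    elif scheme == "damped":
        raw = {a: math.log(1 + n) for a, n in counts.items()}  # frequent residues are damped
    else:
        raise ValueError(f"unknown scheme: {scheme!r}")
    total = sum(raw.values())
    return {a: value / total for a, value in raw.items()}

column = "VVVVVVVVAVVLGG"   # column 3 of the example alignment
for scheme in ("equal", "proportional", "damped"):
    weights = position_weights(column, scheme)
    print(scheme, {a: round(w, 3) for a, w in sorted(weights.items())})
```

For the dominant residue V, the damped weight falls between the equal and the proportional weight, which is the trade-off the three options describe.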

PSI-BLAST

Hidden Markov Model
