Modules
Another example of the helix-loop-helix motif is seen within several DNA binding domains, including the homeobox proteins, which are the master regulators of development.


HMMs, Profiles, Motifs, and Multiple Alignments used to define modules

(Figures from Branden & Tooze)

  • Several motifs (β-sheet, β-α-β, helix-loop-helix) combine to form a compact globular structure termed a domain or tertiary structure

  • A domain is defined as a polypeptide chain or part of a chain that can independently fold into a stable tertiary structure

  • Domains are also units of function (DNA binding domain, antigen binding domain, ATPase domain, etc.)


COG 272, BRCT family

P. Bork et al.


Five Principal Fold Classes

  • All α folds

  • All β folds

  • α + β folds

  • α / β folds

  • Small irregular folds


SCOP - Protein Fold Hierarchy

  • Class - 5

  • Fold - ~500

  • Superfamily - ~700

  • Family - ~1000

Family - domains with common evolutionary origin


Sequence Similarity May Miss Functional Homologies Which Can Be Detected by 3D Structural Analysis

[Figure: % Sequence Identity plotted against Residues Aligned, separating Homologous 3D Structure from Non-homologous 3D Structure, with a region labeled “Twilight zone”. Adapted from Chris Sander.]


Structural Validation of Homology

Adenylate Kinase vs. Guanylate Kinase: 19% Seq ID, Z = 12.2


[Figure: structures of Asp tRNA Synthetase, Staphylococcal Nuclease, CspA, Gene 5 ssDNA Binding Protein, Topoisomerase I, and CspB]


What is Protein Geometry?

  • Coordinates (X, Y, Z’s)

  • Dihedral Angles

    • Assumes standard bond lengths and bond angles
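The relationship between the two representations can be illustrated with a minimal sketch that recovers one dihedral angle from four sets of X, Y, Z coordinates. The function name `dihedral` and the helper names are assumptions made for this example, and the sign follows the common convention in which cis is 0 degrees and trans is 180 degrees.

```python
import math

def _sub(u, v):
    return (u[0] - v[0], u[1] - v[1], u[2] - v[2])

def _dot(u, v):
    return u[0] * v[0] + u[1] * v[1] + u[2] * v[2]

def _cross(u, v):
    return (u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0])

def dihedral(p0, p1, p2, p3):
    """Signed dihedral angle (degrees) about the p1-p2 bond."""
    b0, b1, b2 = _sub(p1, p0), _sub(p2, p1), _sub(p3, p2)
    n1 = _cross(b0, b1)          # normal of the plane p0, p1, p2
    n2 = _cross(b1, b2)          # normal of the plane p1, p2, p3
    x = _dot(n1, n2)
    y = math.sqrt(_dot(b1, b1)) * _dot(b0, n2)
    return math.degrees(math.atan2(y, x))

# Four atoms with the outer two on the same side of the central bond (cis).
cis_angle = dihedral((1, 0, 0), (0, 0, 0), (0, 0, 1), (1, 0, 1))
```

Going the other way, from dihedral angles back to coordinates, is where the standard bond lengths and bond angles are needed.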


Other Aspects of Structure, Besides Just Comparing Atom Positions

  • Atom Position, XYZ triplets

  • Lines, Axes, Angles

  • Surfaces, Volumes


Depicting Protein Structure: Sperm Whale Myoglobin


Sperm Whale Myoglobin


Structural Alignment of Two Globins


Automatic Alignment to Build Fold Library

Alignment of Individual Structures

Fusing into a Single Fold “Template”

Hb VLSPADKTNVKAAWGKVGAHAGEYGAEALERMFLSFPTTKTYFPHF-DLS-----HGSAQVKGHGKKVADALTNAV
Mb VLSEGEWQLVLHVWAKVEADVAGHGQDILIRLFKSHPETLEKFDRFKHLKTEAEMKASEDLKKHGVTVLTALGAIL

Hb AHVD-DMPNALSALSDLHAHKLRVDPVNFKLLSHCLLVTLAAHLPAEFTPAVHASLDKFLASVSTVLTSKYR------
Mb KK-KGHHEAELKPLAQSHATKHKIPIKYLEFISEAIIHVLHSRHPGDFGADAQGAMNKALELFRKDIAAKYKELGYQG

Elements: Domain definitions; Aligned structures, collecting together Non-homologous Sequences; Core annotation

Previous work: Remington, Matthews ‘80; Taylor, Orengo ‘89, ‘94; Artymiuk, Rice, Willett ‘89; Sali, Blundell, ‘90; Vriend, Sander ‘91; Russell, Barton ‘92; Holm, Sander ‘93; Godzik, Skolnick ‘94; Gibrat, Madej, Bryant ‘96; Falicov, F Cohen, ‘96; Feng, Sippl ‘96; G Cohen ‘97; Singh & Brutlag, ‘98


Explain Concept of Distance Matrix on Blackboard

N x N distance matrix

N dimensional space

Metric matrix

$M_{ij} = (D_{i0}^2 + D_{j0}^2 - D_{ij}^2) / 2$, where $D_{i0}$ is the distance from point i to the centroid

Eigenvectors of metric matrix

Principal component analysis
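The path from a distance matrix to principal axes can be sketched in a few lines. This is a minimal illustration, assuming the law-of-cosines form of the metric matrix, $M_{ij} = (D_{i0}^2 + D_{j0}^2 - D_{ij}^2) / 2$, with the centroid distances computed from the distance matrix itself. The function names `metric_matrix` and `leading_axis` are hypothetical, and power iteration is used here only as a simple stand-in for a full eigen-decomposition.

```python
import math

def metric_matrix(D):
    """Gram matrix M_ij = (D_i0^2 + D_j0^2 - D_ij^2) / 2 from an N x N distance matrix."""
    n = len(D)
    sq = [[d * d for d in row] for row in D]
    # (1/N^2) * sum over pairs j<k of D_jk^2; the full double sum counts each pair twice.
    pair_term = sum(sum(row) for row in sq) / (2.0 * n * n)
    # Squared distance of each point to the centroid of all points.
    d0_sq = [sum(sq[i]) / n - pair_term for i in range(n)]
    return [[0.5 * (d0_sq[i] + d0_sq[j] - sq[i][j]) for j in range(n)]
            for i in range(n)]

def leading_axis(M, iterations=200):
    """Largest eigenvalue of M and the point coordinates along that axis."""
    n = len(M)
    v = [1.0 + i for i in range(n)]          # arbitrary starting vector
    for _ in range(iterations):
        w = [sum(M[i][j] * v[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    lam = sum(v[i] * sum(M[i][j] * v[j] for j in range(n)) for i in range(n))
    return lam, [math.sqrt(lam) * x for x in v]

# Three points on a line at positions 0, 1 and 3, given only by their distances.
D = [[0.0, 1.0, 3.0],
     [1.0, 0.0, 2.0],
     [3.0, 2.0, 0.0]]
eigenvalue, coords = leading_axis(metric_matrix(D))
```

The recovered coordinates are centered on the centroid and reproduce the input distances up to an overall sign. The sketch assumes a non-zero metric matrix with a positive leading eigenvalue.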


Automatically Comparing Protein Structures

Given 2 Structures (A & B), 2 Basic Comparison Operations

1 Given an alignment, optimally SUPERIMPOSE A onto B

Find Best R & T to move A onto B

2 Find an Alignment between A and B based on their 3D coordinates


RMS Superposition (1)

[Figure: structures A and B]


RMS Superposition (2): Distance Between an Atom in 2 Structures


RMS Superposition (3): RMS Distance Between Aligned Atoms in 2 Structures
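As a small illustration of the quantity named on this slide, the RMS distance over N aligned atom pairs is the square root of the mean squared per-atom distance. The function name `rms_distance` is an assumption made for this sketch, and the two coordinate lists are taken to be already aligned pair by pair.

```python
import math

def rms_distance(coords_a, coords_b):
    """RMS distance between equivalent atoms of two structures (no fitting)."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("need two non-empty coordinate lists of equal length")
    total = 0.0
    for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b):
        total += (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
    return math.sqrt(total / len(coords_a))

# Structure B is structure A shifted by 1 unit along z.
shifted = rms_distance([(0, 0, 0), (1, 0, 0)], [(0, 0, 1), (1, 0, 1)])
```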


RMS Superposition (4): Rigid-Body Rotation and Translation of One Structure (B)


RMS Superposition (5): Optimal Movement of One Structure to Minimize the RMS

Methods of Solution:

springs (F ~ kx)

SVD

Kabsch
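Two of the listed methods, SVD and Kabsch, are commonly combined into one procedure. The following is a minimal sketch of that SVD-based Kabsch fit, assuming NumPy is available and that the two coordinate arrays are already paired row by row. The function name `kabsch_superpose` and its return convention are assumptions made for illustration.

```python
import numpy as np

def kabsch_superpose(A, B):
    """Find rotation R and translation t that move A onto B with minimal RMS.

    Returns (R, t, rms) such that A @ R.T + t is the moved copy of A.
    """
    A = np.asarray(A, dtype=float)
    B = np.asarray(B, dtype=float)
    centroid_a, centroid_b = A.mean(axis=0), B.mean(axis=0)
    # 3 x 3 covariance matrix of the centered coordinates.
    H = (A - centroid_a).T @ (B - centroid_b)
    U, _, Vt = np.linalg.svd(H)
    # Guard against an improper rotation (a reflection).
    d = np.sign(np.linalg.det(Vt.T @ U.T))
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    t = centroid_b - R @ centroid_a
    moved = A @ R.T + t
    rms = float(np.sqrt(((moved - B) ** 2).sum(axis=1).mean()))
    return R, t, rms

# B is A rotated by 90 degrees about z and then translated.
A = np.array([[0, 0, 0], [1, 0, 0], [0, 2, 0], [0, 0, 3], [1, 1, 1]], dtype=float)
Rz = np.array([[0.0, -1.0, 0.0], [1.0, 0.0, 0.0], [0.0, 0.0, 1.0]])
B = A @ Rz.T + np.array([1.0, 2.0, 3.0])
R, t, rms = kabsch_superpose(A, B)
```

The sketch assumes at least three non-collinear atom pairs; with fewer, the rotation is not uniquely determined.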


Alignment (1): Make a Similarity Matrix (Like Dot Plot)


Structural Alignment (1b): Make a Similarity Matrix (Generalized Similarity Matrix)

  • PAM(A,V) = 0.5

    • Applies at every position

  • S(aa @ i, aa @ J)

    • Specific Matrix for each pair of residues i in protein 1 and J in protein 2

    • Example is Y near N-term. matches any C-term. residue (Y at J=2)

  • S(i,J)

    • Doesn’t need to depend on a.a. identities at all!

    • Just need to make up a score for matching residue i in protein 1 with residue J in protein 2

[Figure: similarity matrix with axes i (residues of protein 1) and J (residues of protein 2)]
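The progression from a fixed substitution score to a fully general S(i,J) can be sketched as follows. Only PAM(A,V) = 0.5 comes from the slide; the toy sequences, the identity score of 1.0, the default of 0.0, and the function names are all assumptions made for this illustration.

```python
def build_similarity_matrix(n_rows, n_cols, score):
    """Tabulate score(i, J) for every residue i of protein 1 and J of protein 2."""
    return [[score(i, j) for j in range(n_cols)] for i in range(n_rows)]

seq_1, seq_2 = "AVY", "VYA"   # toy sequences, not from the slides

# Position-independent scoring: only the two residue types matter.
TOY_PAM = {frozenset("AV"): 0.5}   # PAM(A,V) = 0.5; everything else is assumed

def substitution_score(i, j):
    a, b = seq_1[i], seq_2[j]
    return 1.0 if a == b else TOY_PAM.get(frozenset((a, b)), 0.0)

# Fully general scoring: any made-up function of the two positions will do.
def arbitrary_score(i, j):
    return 1.0 / (1 + abs(i - j))

by_residue = build_similarity_matrix(len(seq_1), len(seq_2), substitution_score)
by_position = build_similarity_matrix(len(seq_1), len(seq_2), arbitrary_score)
```

The dynamic programming step only ever sees the filled matrix, which is why the same alignment machinery can serve sequence alignment, structural alignment, and threading.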


Seq. Alignment, Struc. Alignment, Threading


Structural Alignment (1c*): Similarity Matrix for Structural Alignment

  • Structural Alignment

    • Similarity Matrix S(i,J) depends on the 3D coordinates of residues i and J

    • Distance d between CA of i and J

    • $M(i,J) = 100 / (5 + d^2)$

  • Threading

    • S(i,J) depends on how well the amino acid at position i in protein 1 fits into the 3D structural environment at position J of protein 2
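The structural similarity score above can be tabulated directly from CA coordinates. This is a minimal sketch, assuming both structures are already in a common coordinate frame; the function name `structural_similarity` is hypothetical.

```python
def structural_similarity(ca_a, ca_b):
    """S(i, J) = 100 / (5 + d^2), with d the CA-CA distance between i and J."""
    matrix = []
    for (xa, ya, za) in ca_a:
        row = []
        for (xb, yb, zb) in ca_b:
            d_squared = (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
            row.append(100.0 / (5.0 + d_squared))
        matrix.append(row)
    return matrix

# Two toy CA traces: the first atoms coincide, the second atoms are 5 units apart.
S = structural_similarity([(0, 0, 0), (3.8, 0, 0)], [(0, 0, 0), (3.8, 5.0, 0)])
```

A perfect overlap scores 100 / 5 = 20, and the score decays smoothly with distance, so distant pairs contribute almost nothing.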


Alignment (2): Dynamic Programming, Start Computing the Sum Matrix

new_value_cell(R,C) <=
    cell(R,C)                        { Old value, either 1 or 0 }
    + Max[
        cell(R+1, C+1),              { Diagonally down, no gaps }
        cells(R+1, C+2 to C_max),    { Down a row, making col. gap }
        cells(R+2 to R_max, C+1)     { Down a col., making row gap }
    ]
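The recurrence above can be turned into a short runnable sketch. It assumes a 1/0 identity matrix as the starting point and no gap penalty, and it takes the row-gap term from column C+1, which is the form that keeps every step moving exactly one row or one column forward. The function name `sum_matrix` is an assumption; the two test sequences are the ones used in the traceback slide.

```python
def sum_matrix(seq_a, seq_b):
    """Fill the sum matrix from the bottom-right corner, without gap penalties."""
    n, m = len(seq_a), len(seq_b)
    # Start from the 1/0 identity matrix (a dot plot).
    s = [[1 if a == b else 0 for b in seq_b] for a in seq_a]
    # The last row and the last column keep their original 1/0 values.
    for r in range(n - 2, -1, -1):
        for c in range(m - 2, -1, -1):
            candidates = [s[r + 1][c + 1]]                        # diagonal, no gap
            candidates += s[r + 1][c + 2:]                        # down a row, column gap
            candidates += [s[k][c + 1] for k in range(r + 2, n)]  # down a column, row gap
            s[r][c] += max(candidates)
    return s

matrix = sum_matrix("ABCNYRQCLCRPM", "AYCYNRCKCRBP")
best_score = max(max(row) for row in matrix)
```

Each cell ends up holding the best score obtainable by an alignment that starts at that cell and runs toward the ends of both sequences.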


Alignment (3): Dynamic Programming, Keep Going


Alignment (4): Dynamic Programming, Sum Matrix All Done


Alignment (5): Traceback

Find Best Score (8) and Trace Back

A B C N Y - R Q C L C R - P M
A Y C - Y N R - C K C R B P -
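A traceback over the finished sum matrix can be sketched as below. The block repeats a compact sum-matrix fill so that it runs on its own; all function names are assumptions. Ties between equally good paths are broken arbitrarily, so the gaps may be placed differently from the alignment shown on the slide while still reaching the same score.

```python
def successors(s, r, c):
    """Cells reachable from (r, c): diagonal, same next row, or same next column."""
    n, m = len(s), len(s[0])
    cells = [(r + 1, c + 1)]
    cells += [(r + 1, k) for k in range(c + 2, m)]
    cells += [(k, c + 1) for k in range(r + 2, n)]
    return cells

def sum_matrix(a, b):
    s = [[int(x == y) for y in b] for x in a]
    for r in range(len(a) - 2, -1, -1):
        for c in range(len(b) - 2, -1, -1):
            s[r][c] += max(s[i][j] for i, j in successors(s, r, c))
    return s

def traceback(a, b):
    """Return the two gapped strings of one best-scoring alignment."""
    s = sum_matrix(a, b)
    n, m = len(a), len(b)
    cells = [(i, j) for i in range(n) for j in range(m)]
    r, c = max(cells, key=lambda p: s[p[0]][p[1]])    # best-scoring start cell
    top = a[:r] + "-" * c                             # unaligned leading residues
    bot = "-" * r + b[:c]
    while True:
        top += a[r]
        bot += b[c]
        if r == n - 1 or c == m - 1:
            break
        nr, nc = max(successors(s, r, c), key=lambda p: s[p[0]][p[1]])
        top += a[r + 1:nr] + "-" * (nc - c - 1)       # skipped residues face gaps
        bot += "-" * (nr - r - 1) + b[c + 1:nc]
        r, c = nr, nc
    top += a[r + 1:] + "-" * (m - c - 1)              # unaligned trailing residues
    bot += "-" * (n - r - 1) + b[c + 1:]
    return top, bot

top, bot = traceback("ABCNYRQCLCRPM", "AYCYNRCKCRBP")
identities = sum(x == y for x, y in zip(top, bot))
```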


In Structural Alignment, Not Yet Done (Step 6*)

[Figure: alignment fragment: ACSQRP--LRV-SH -R S / ENCVA-SNKPQLVKLMTH VK DFCV-]

  • Use Alignment to LSQ Fit Structure B onto Structure A

    • However, movement of B will now change the Similarity Matrix

  • This Violates Fundamental Premise of Dynamic Programming

    • The way the residue at i is aligned can now affect the previously optimal alignment of residues (from 1 to i-1)



Structural Alignment (7*): Iterate Until Convergence

1 Compute Sim. Matrix

2 Align via Dyn. Prog.

3 RMS Fit Based on Alignment

4 Move Structure B

5 Re-compute Sim. Matrix

6 If changed from #1, GOTO #2
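The loop above can be sketched end to end under several stated assumptions: NumPy is available; the similarity score is 100 / (5 + d^2) on CA coordinates; the dynamic programming step is a Needleman-Wunsch-style global alignment with a linear gap penalty (a substitute chosen for brevity, with an arbitrary penalty of 2.0); the RMS fit is an SVD-based Kabsch fit; and convergence is judged by an unchanged alignment rather than an unchanged similarity matrix. All function names are hypothetical.

```python
import numpy as np

def similarity(A, B):
    """S(i, J) = 100 / (5 + d^2) for every CA pair."""
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=2)
    return 100.0 / (5.0 + d2)

def align(S, gap=2.0):
    """Global alignment with a linear gap penalty; returns aligned (i, j) pairs."""
    n, m = S.shape
    F = np.zeros((n + 1, m + 1))
    F[:, 0] = -gap * np.arange(n + 1)
    F[0, :] = -gap * np.arange(m + 1)
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            F[i, j] = max(F[i - 1, j - 1] + S[i - 1, j - 1],
                          F[i - 1, j] - gap, F[i, j - 1] - gap)
    pairs, i, j = [], n, m
    while i > 0 and j > 0:                      # trace back from the corner
        if F[i, j] == F[i - 1, j - 1] + S[i - 1, j - 1]:
            pairs.append((i - 1, j - 1)); i -= 1; j -= 1
        elif F[i, j] == F[i - 1, j] - gap:
            i -= 1
        else:
            j -= 1
    return pairs[::-1]

def fit(mobile, target):
    """Kabsch fit: rotation R and translation t moving mobile onto target."""
    cm, ct = mobile.mean(axis=0), target.mean(axis=0)
    U, _, Vt = np.linalg.svd((mobile - cm).T @ (target - ct))
    d = np.sign(np.linalg.det(Vt.T @ U.T))
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    return R, ct - R @ cm

def iterate_alignment(A, B, max_cycles=20):
    A = np.asarray(A, dtype=float)
    B = np.asarray(B, dtype=float).copy()
    pairs = align(similarity(A, B))             # steps 1 and 2
    for _ in range(max_cycles):
        ia, jb = [p[0] for p in pairs], [p[1] for p in pairs]
        R, t = fit(B[jb], A[ia])                # step 3: RMS fit on the alignment
        B = B @ R.T + t                         # step 4: move structure B
        new_pairs = align(similarity(A, B))     # step 5, then step 2 again
        if new_pairs == pairs:                  # step 6: stop when nothing changes
            break
        pairs = new_pairs
    ia, jb = [p[0] for p in pairs], [p[1] for p in pairs]
    rms = float(np.sqrt(((A[ia] - B[jb]) ** 2).sum(axis=1).mean()))
    return pairs, B, rms
```

The sketch assumes every intermediate alignment contains at least three non-collinear pairs. Like the procedure on the slide, it is a heuristic: it can settle on a locally consistent alignment that is not the global optimum.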


Some Similarities are Readily Apparent, Others are More Subtle

  • Easy: Globins, 125 res., ~1.5 Å

  • Tricky: Ig C & V, 85 res., ~3 Å

  • Very Subtle: G3P-dehydrogenase, C-term. Domain, >5 Å



DALI: Protein Structure Comparison by Alignment of Distance Matrices

L. Holm and C. Sander, J. Mol. Biol. 233: 123 (1993)

  • Generate Cα-Cα distance matrix for each protein A and B

  • Decompose into elementary contact patterns; e.g. hexapeptide-hexapeptide submatrices

  • Systematic comparisons of all elementary contact patterns in the 2 distance matrices; similar contact patterns are stored in a “pair list”

  • Assemble pairs of contact patterns into larger consistent sets of pairs (alignments), maximizing the similarity score between these local structures

  • A Monte-Carlo algorithm is used to deal with the combinatorial complexity of building up alignments from contact patterns

  • Dali Z score - number of standard deviations away from mean pairwise similarity value
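Three of the ingredients listed above are simple enough to sketch: the Cα-Cα distance matrix, a hexapeptide-hexapeptide submatrix, and a Z score as defined in the last bullet. This is not the DALI implementation; the function names are assumptions, the Z score uses the population standard deviation of a supplied background as a simplification, and the Monte-Carlo assembly step is not shown.

```python
import math
from statistics import mean, pstdev

def ca_distance_matrix(ca_coords):
    """All-against-all distances between the C-alpha atoms of one protein."""
    return [[math.dist(p, q) for q in ca_coords] for p in ca_coords]

def contact_pattern(D, i, j, size=6):
    """Submatrix of distances between segment i..i+size-1 and segment j..j+size-1."""
    return [row[j:j + size] for row in D[i:i + size]]

def z_score(score, background_scores):
    """Number of standard deviations by which score exceeds the background mean."""
    return (score - mean(background_scores)) / pstdev(background_scores)

# Toy trace: eight C-alpha atoms spaced 1 unit apart on a line.
D = ca_distance_matrix([(float(k), 0.0, 0.0) for k in range(8)])
pattern = contact_pattern(D, 0, 2)
z = z_score(12.0, [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0])
```

Because a distance matrix does not change under rotation or translation, contact patterns from two proteins can be compared without superimposing the structures first.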


Structural Validation of Homology

Adenylate Kinase vs. Guanylate Kinase: 19% Seq ID, Z = 12.2


Dali Domain Dictionary

Dietmann, Park, Notredame, Heger, Lappe, and Holm, Nucleic Acids Res. 29: 55-57 (2001)

  • Dali Domain Dictionary is a numerical taxonomy of all known domain structures in the PDB

  • Evolves from Dali / FSSP Database

    Holm & Sander, Nucl. Acid Res. 25: 231-234 (1997)

  • Dali Domain Dictionary Sept 2000

    • 10,532 PDB entries

    • 17,101 protein chains

    • 5 supersecondary structure motifs (attractors)

    • 1375 fold types

    • 2582 functional families

    • 3724 domain sequence families



Most proteins in biology have been produced by the duplication, divergence and recombination of the members of a small number of protein families.

courtesy of C. Chothia






Cadherins

courtesy of C. Chothia




A Global Representation of Protein Fold Space

Hou, Sims, Zhang, Kim, PNAS 100: 2386-2390 (2003)

Database of 498 SCOP “Folds” or “Superfamilies”

The pairwise comparison of all 498 folds leads to a 498 x 498 matrix of similarity scores $S_{ij}$, where $S_{ij}$ is the alignment score between the ith and jth folds.

An appropriate method for handling such data matrices as a whole is metric matrix distance geometry. We first convert the similarity score matrix $[S_{ij}]$ to a distance matrix $[D_{ij}]$ by using $D_{ij} = S_{max} - S_{ij}$, where $S_{max}$ is the maximum similarity score among all pairs of folds.

We then transform the distance matrix to a metric (or Gram) matrix $[M_{ij}]$ by using $M_{ij} = (D_{i0}^2 + D_{j0}^2 - D_{ij}^2) / 2$, where $D_{i0}$ is the distance between the ith fold and the geometric centroid of all N = 498 folds. The eigenvectors of the metric matrix define an orthogonal system of axes, called factors. These axes pass through the geometric centroid of the points representing all observed folds and are ordered by decreasing amount of information each factor represents.
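The first conversion step, from similarity scores to distances, can be sketched as below. Two choices here are assumptions rather than details from the paper: the maximum is taken over off-diagonal pairs only, and each fold is given a distance of zero to itself. The function name `similarity_to_distance` is hypothetical.

```python
def similarity_to_distance(S):
    """Convert a symmetric similarity matrix to distances D_ij = S_max - S_ij."""
    n = len(S)
    # Maximum similarity among pairs of distinct folds (assumption: self-scores excluded).
    s_max = max(S[i][j] for i in range(n) for j in range(n) if i != j)
    return [[0.0 if i == j else float(s_max - S[i][j]) for j in range(n)]
            for i in range(n)]

# Toy scores for three folds; the diagonal holds self-comparison scores.
S = [[99, 4, 7],
     [4, 99, 2],
     [7, 2, 99]]
D = similarity_to_distance(S)
```

With this convention the most similar pair of folds ends up at distance zero, and every other pair is measured relative to it.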

