Guidelines for sequence reports

Outline


  • Summary

  • Results & Discussion

    • Sequence identification

    • Function assignment

    • Fold assignment

    • Identification of functional residues

  • Methods

    • Web tools: list which ones you used

  • References

    • E.g. functional characterization

  • Maximum length: ten pages

Sequence identification

  • What is the source of the query sequence?

  • Database search tools

    • Blast, PSI-Blast, PFAM-search

    • Superfamily, GTG, PFAM-squared

    • Dali (structures only)

  • All tools give a list of similar sequences.

    • Nearest neighbour indicates species or taxon.

    • Some sequences are classified in families.
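
As a rough illustration of reading a hit list, the sketch below parses BLAST tabular output and picks the nearest neighbour by bit score. It assumes the standard 12-column tabular layout (query id, subject id, percent identity, ..., E-value, bit score); other tools format their hit lists differently. The names Hit and parse_tabular are made up for this example. The ids, identities, E-values and bit scores are taken from the sequence_9A example slides; all other columns are placeholders.

```python
from typing import NamedTuple


class Hit(NamedTuple):
    subject: str      # subject (database) sequence id
    identity: float   # percent identity
    evalue: float     # expectation value
    bitscore: float   # alignment score in bits


def parse_tabular(text: str) -> list[Hit]:
    """Parse 12-column BLAST tabular output and sort hits by bit score."""
    hits = []
    for line in text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comment lines
        fields = line.split("\t")
        hits.append(Hit(fields[1], float(fields[2]),
                        float(fields[10]), float(fields[11])))
    return sorted(hits, key=lambda h: h.bitscore, reverse=True)


# Illustrative rows only: the "0" columns are placeholders, not real values.
EXAMPLE = "\n".join([
    "\t".join(["sequence_9A", "AAN69757", "61.0",
               "0", "0", "0", "0", "0", "0", "0", "1e-119", "429"]),
    "\t".join(["sequence_9A", "AAG04994", "62.0",
               "0", "0", "0", "0", "0", "0", "0", "1e-123", "441"]),
])

nearest = parse_tabular(EXAMPLE)[0]
print(nearest.subject, nearest.identity, nearest.bitscore)
```

The species or taxon still has to be looked up from the database entry of the nearest neighbour; the tabular output alone does not carry it.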

Function assignment

  • Many proteins are annotated only as hypothetical; look further down the hit list for informative functional annotations.

  • If the sequence neighbour list shows many different functions at similar distances, build a sequence tree to see whether the query sequence groups with one particular function.
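
The first step above (skipping uninformative annotations) can be sketched as follows. This is a minimal illustration, not a complete annotation filter: the keyword list UNINFORMATIVE and the function names are assumptions made for this example, and the description lines only reuse wording from the sequence_9A slides with made-up counts.

```python
from collections import Counter

# Assumed keywords that mark an annotation as uninformative.
UNINFORMATIVE = ("hypothetical", "uncharacterized", "unknown function")


def informative(descriptions):
    """Keep only description lines that carry a functional annotation."""
    return [d for d in descriptions
            if not any(word in d.lower() for word in UNINFORMATIVE)]


def function_counts(descriptions):
    """Count how often each informative description occurs in the hit list."""
    return Counter(informative(descriptions))


# Illustrative hit list (order and counts are hypothetical).
EXAMPLE = [
    "HYPOTHETICAL 39.5 KDA PROTEIN",
    "5-oxo-L-prolinase, putative",
    "amidohydrolase_2",
    "amidohydrolase_2",
]

for description, count in function_counts(EXAMPLE).most_common():
    print(count, description)
```

If several different functions remain with similar counts and similar distances, the counts alone do not settle the assignment, which is where the sequence tree comes in.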

Fold assignment

  • Structures are conserved within families. E.g. PFAM family identification allows you to transfer fold (if no direct hit to PDB).

  • Sometimes the link may be at clan level (still homologous, conserved fold).

  • Not all homologous relationships are classified as such in databases. Evidence for remote homology: common fold, common conserved residues, similarity of function

Identification of functional residues

  • Multiple sequence alignment (MSA) of sufficiently diverse sequences highlights functional residues. Use a structural model to identify the sites.

  • It is often difficult to make a good MSA between distant sequences.

    • Structural alignments show sharp signatures & functional sites

    • Compare a well-aligned set and its secondary structure prediction to the structural alignment. If you find conserved secondary structure elements (SSEs) and conserved residues in proper succession, it strengthens the hypothesis of homology.
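
To make the idea of conserved columns concrete, here is a minimal sketch that scores each MSA column by the fraction of sequences sharing its most common residue. This simple score is an assumption chosen for illustration; dedicated conservation measures weight sequences and substitutions more carefully. The toy alignment is hypothetical.

```python
from collections import Counter


def column_conservation(alignment):
    """Return, per column, the fraction of sequences with the top residue."""
    if len({len(seq) for seq in alignment}) != 1:
        raise ValueError("all aligned sequences must have the same length")
    scores = []
    for column in zip(*alignment):
        residues = [c for c in column if c != "-"]  # gaps never count
        if not residues:
            scores.append(0.0)
            continue
        top_count = Counter(residues).most_common(1)[0][1]
        scores.append(top_count / len(column))
    return scores


def conserved_columns(alignment, threshold=1.0):
    """Return 0-based indices of columns scoring at or above the threshold."""
    return [i for i, score in enumerate(column_conservation(alignment))
            if score >= threshold]


# Hypothetical toy alignment of three sequences.
TOY_MSA = [
    "MHKDLE",
    "MHRD-E",
    "LHKDAE",
]

print(conserved_columns(TOY_MSA))
```

With closely related sequences almost every column passes the threshold, which is why the slide asks for a sufficiently diverse set.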

Example: sequence_9A

  • Nearest neighbours

    • GTG server’s Blast search (old database):

      • 601959|AAG04994|AAG04994 HYPOTHETICAL 39.5 KDA PROTEIN at 62 % identity, alignment score 441 bits, E-value e-123

      • 1270496|AAN69757|AAN69757 5-oxo-L-prolinase, putative at 61 % identity, alignment score 429 bits, E-value e-119

    • NCBI’s Blast gives a closer match at 66 % identity to a protein from Pseudomonas mendocina ymp.

    • Conclusion: no perfect match, bacterial sequence, related to Pseudomonas (the query sequence was actually taken from the global ocean sampling survey).

Family membership

  • GTG matched PFAM families:

    • PF04909 (best score >22000)

    • PF02126, PF01026, PF07969 (scores in the range 500-1000)

    • PF01979, PF0962 (scores below 500)

  • PF04909 is the Amidohydro_2 family which belongs to the Amidohydrolase (CL0034) clan. The other families found above are also members of this clan. Two PFAM families which are members of the clan but were not found by GTG are PF01244 and PF02811.

    • The clan was first described by Holm & Sander (1997).
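
The bookkeeping on this slide (which clan members were found, which were missed) can be sketched with plain set operations. The family lists below are typed in by hand from the slide, so CLAN_CL0034 holds only the families named here and not the full clan; one accession is kept exactly as it is written on the slide. The function name clan_coverage is an assumption for this example.

```python
# Families named on the slide as members of the Amidohydrolase clan (CL0034).
# This is not the complete clan membership.
CLAN_CL0034 = {
    "PF04909", "PF02126", "PF01026", "PF07969",
    "PF01979", "PF0962",   # accession kept as written on the slide
    "PF01244", "PF02811",
}

# Families matched by GTG for sequence_9A, as listed on the slide.
GTG_HITS = {"PF04909", "PF02126", "PF01026", "PF07969", "PF01979", "PF0962"}


def clan_coverage(hits, clan_members):
    """Split families into found in clan, missed clan members, and outsiders."""
    found = sorted(hits & clan_members)
    missed = sorted(clan_members - hits)
    outside = sorted(hits - clan_members)
    return found, missed, outside


found, missed, outside = clan_coverage(GTG_HITS, CLAN_CL0034)
print("found in clan:", found)
print("clan members not found:", missed)
print("hits outside the clan:", outside)
```

Hits that all fall inside one clan support a single superfamily assignment; hits outside the clan would need a separate explanation.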

Function assignment

  • The neighbour list from NCBI’s Blast has many sequences annotated as amidohydrolase_2 (a very general description) and some annotated as 5-oxo-prolinase.

  • A phylogenetic tree will tell whether the query sequence groups with the 5-oxo-prolinases or with some other amidohydrolase function(s).

Fold assignment

  • There are known structures in PF04909 (e.g. 2ffi, PUTATIVE 2-PYRONE-4,6-DICARBOXYLIC ACID HYDROLASE), so homology modeling is possible.

  • Quality of model

    • Partial alignment (bad!)

    • Manual alignment is difficult; check conservation of SSEs and conserved residues against the superfamily

  • Conservation mapping: many structures!

Checkpoint (Day 13)

  • Fill in the following fields in the Excel sheet:

    • Sequence id (e.g. Sequence_1A)

    • Protein identification

      • (best match in protein database, description line)

    • Protein family

      • (e.g. PFAM family name)

    • Superfamily

      • (e.g. PFAM clan name)

    • PDB template found in family / superfamily / not found

    • Function assignment strategy

      • (e.g. analysing MSA, or phylogenomic approach)

    • 3D modelling strategy

      • (e.g. Swissmodel, manual MSA refinement, or threading)
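
For keeping track of these fields before entering them in the sheet, one option is a small CSV file, which Excel can open. The sketch below is only a convenience under that assumption: the column names follow the field list above, the function name checkpoint_csv is made up, and the example row is filled with illustrative values drawn from the sequence_9A slides rather than a prescribed answer.

```python
import csv
import io

# Column names follow the checkpoint field list.
FIELDS = [
    "Sequence id",
    "Protein identification",
    "Protein family",
    "Superfamily",
    "PDB template",
    "Function assignment strategy",
    "3D modelling strategy",
]


def checkpoint_csv(rows):
    """Render checkpoint rows (dicts keyed by FIELDS) as CSV text."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDS, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


# Illustrative row only, based on the sequence_9A example.
EXAMPLE_ROW = {
    "Sequence id": "Sequence_9A",
    "Protein identification": "HYPOTHETICAL 39.5 KDA PROTEIN",
    "Protein family": "Amidohydro_2 (PF04909)",
    "Superfamily": "Amidohydrolase (CL0034)",
    "PDB template": "found in family",
    "Function assignment strategy": "phylogenetic tree",
    "3D modelling strategy": "homology modelling",
}

print(checkpoint_csv([EXAMPLE_ROW]))
```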

Returning the reports

  • Reports must be printed on paper

  • Send/deliver to

    • L. Holm, P.O. Box 56 (Viikinkaari 5)

      • Pigeonhole on floor 4 in Biocenter 2 (wing D)

    • Deadline: Monday 14 December, 2009