Using (and abusing) sequence analysis to make biological discoveries


Presentation Transcript
Slide 1

Using (and abusing) sequence analysis to make biological discoveries

Nothing in (computational) biology makes sense except in the light of evolution

after Theodosius Dobzhansky (1970)


Slide 2

Significant sequence similarity is evidence of homology

Only a small fraction of amino acid residues is directly involved in protein function (including enzymatic); the rest of the protein serves largely as structural scaffold

Conserved sequence motifs are determinants of conserved ancestral functions


Slide 3

The evolving roles of computational analysis in biology

Pre-sequencing era (before 1978-80): Study biological function

Pre-genomic era (1980-1996): Study biological function → Clone/sequence gene → Analyze/interpret sequence

Post-genomic era (1996- ): Sequence genome → Analyze/interpret sequences of all genes → Prioritize targets → Study biological function


Slide 5

Sequence complexity

Measure of the randomness of a sequence

Random sequence - highest complexity (entropy) - globular protein domains

Homopolymer - lowest complexity (entropy) - non-globular structures

Algorithmic complexity

QQQQQQQQQQQQQ = (Q)n

KRKRKRKRKRKR = (KR)n

ASDFGHKLCVNM - random sequence - no algorithm can derive it from a simpler one
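As a rough illustration of the entropy idea on this slide, the sketch below computes the Shannon entropy of the residue composition of a sequence, and of every window along it. This is only a compositional measure chosen for illustration; it is not the algorithm used by SEG, and the function names and the window size of 12 are assumptions made here.

```python
import math
from collections import Counter


def shannon_entropy(seq: str) -> float:
    """Shannon entropy, in bits per residue, of the composition of seq."""
    if not seq:
        raise ValueError("sequence must not be empty")
    n = len(seq)
    # Sum of p * log2(1/p) over the residue types present in seq.
    return sum((c / n) * math.log2(n / c) for c in Counter(seq).values())


def window_entropies(seq: str, window: int = 12) -> list[float]:
    """Entropy of every full-length window along seq (empty if seq is shorter)."""
    return [shannon_entropy(seq[i:i + window])
            for i in range(len(seq) - window + 1)]


if __name__ == "__main__":
    # Toy 12-residue strings modelled on the slide's three cases.
    for s in ("QQQQQQQQQQQQ", "KRKRKRKRKRKR", "ASDFGHKLCVNM"):
        print(s, round(shannon_entropy(s), 3))
```

A homopolymer scores 0 bits and twelve distinct residues score log2(12), about 3.58 bits. The (KR)n repeat scores 1 bit even though it is generated by a trivial rule, which is why compositional entropy is only a proxy for the algorithmic complexity the slide describes.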


Slide 6

seg BRCA1 45 3.4 3.7 > BRCA1.seg

>gi|728984|sp|P38398|BRC1_HUMAN Breast cancer type 1 susceptibility protein

1-388 MDLSALRVEEVQNVINAMQKILECPICLEL

IKEPVSTKCDHIFCKFCMLKLLNQKKGPSQ

CPLCKNDITKRSLQESTRFSQLVEELLKII

CAFQLDTGLEYANSYNFAKKENNSPEHLKD

EVSIIQSMGYRNRAKRLLQSEPENPSLQET

SLSVQLSNLGTVRTLRTKQRIQPQKTSVYI

ELGSDSSEDTVNKATYCSVGDQELLQITPQ

GTRDEISLDSAKKAACEFSETDVTNTEHHQ

PSNNDLNTTEKRAAERHPEKYQGSSVSNLH

VEPCGTNTHASSLQHENSSLLLTKDRMNVE

KAEFCNKSKQPGLARSQHNRWAGSKETCND

RRTPSTEKKVDLNADPLCERKEWNKQKLPC

SENPRDTEDVPWITLNSSIQKVNEWFSR

sdellgsddshdgesesnakvadvldvlne 389-458

vdeysgssekidllasdphealickservh

sksvesnied

459-526 KIFGKTYRKKASLPNLSHVTENLIIGAFVT

EPQIIQERPLTNKLKRKRRPTSGLHPEDFI

KKADLAVQ

ktpeminqgtnqteqngqvmnitnsghenk 527-635

tkgdsiqneknpnpieslekesafktkaep

isssisnmelelnihnskapkknrlrrkss

trhihalelvvsrnlsppn

636-995 CTELQIDSCSSSEEIKKKKYNQMPVRHSRN

LQLMEGKEPATGAKKSNKPNEQTSKRHDSD

TFPELKLTNAPGSFTKCSNTSELKEFVNPS

LPREEKEEKLETVKVSNNAEDPKDLMLSGE

RVLQTERSVESSSISLVPGTDYGTQESISL

LEVSTLGKAKTEPNKCVSQCAAFENPKGLI

HGCSKDNRNDTEGFKYPLGHEVNHSRETSI

EMEESELDAQYLQNTFKVSKRQSFAPFSNP

GNAEEECATFSAHSGSLKKQSPKVTFECEQ

KEENQGKNESNIKPVQTVNITAGFPVVGQK

DKPVDNAKCSIKGGSRFCLSSQFRGNETGL

ITPNKHGLLQNPYRIPPLFPIKSFVKTKCK

knlleenfeehsmsperemgnenipstvst 996-1089

isrnnirenvfkeasssninevgsstnevg

ssineigssdeniqaelgrnrgpklnamlr

lgvl

1090-1238 QPEVYKQSLPGSNCKHPEIKKQEYEEVVQT

VNTDFSPYLISDNLEQPMGSSHASQVCSET

PDDLLDDGEIKEDTSFAENDIKESSAVFSK

SVQKGELSRSPSPFTHTHLAQGYRRGAKKL

ESSEENLSSEDEELPCFQHLLFGKVNNIP

sqstrhstvateclsknteenllslknsln 1239-1312

dcsnqvilakasqehhlseetkcsaslfss

qcseledltantnt

1313-1316 QDPF

Non-globular regions (low complexity, lowercase)

Globular domains (high complexity, uppercase)


Slide 7

1422-1513 GSQPSNSYPSIISDSSALEDLRNPEQSTSE

KAVLTSQKSSEYPISQNPEGLSADKFEVSA

DSSTSKNKEPGVERSSPSKCPSLDDRWYMH

SC

sgslqnrnypsqeelikvvdveeqqleesg 1514-1616

phdltetsylprqdlegtpylesgislfsd

dpesdpsedrapesarvgnipsstsalkvp

qlkvaesaqspaa

1617-1863 AHTTDTAGYNAMEESVSREKPELTASTERV

NKRMSMVVSGLTPEEFMLVYKFARKHHITL

TNLITEETTHVVMKTDAEFVCERTLKYFLG

IAGGKWVVSYFWVTQSIKERKMLNEHDFEV

RGDVVNGRNHQGPKRARESQDRKIFRGLEI

CCYGPFTNMPTDQLEWMVQLCGASVVKELS

SFTLGTGVHPIVVVQPDAWTEDNGFHAIGQ

MCEAPVVTREWVLDSVALYQCQELDTYLIP

QIPHSHY


Slide 16

Paradigm shift in database searching

Traditional (PSI-BLAST): Query sequence → Sequence database → Set of homologs, with a PSSM built from the hits

New (RPS-BLAST): Query sequence → PSSM database → Domain architecture
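To make the PSSM (position-specific scoring matrix) in this slide concrete, here is a minimal sketch of building a log-odds matrix from an ungapped alignment and sliding it along a query. It is a toy under stated assumptions (uniform background frequencies, a flat pseudocount, no gaps, no sequence weighting) and not how PSI-BLAST or RPS-BLAST compute their matrices; the function names and the three aligned strings are made up for illustration.

```python
import math

ALPHABET = "ACDEFGHIKLMNPQRSTVWY"


def build_pssm(aligned, pseudocount=1.0):
    """Log-odds PSSM (one dict per column) from equal-length, ungapped sequences."""
    length = len(aligned[0])
    if any(len(s) != length for s in aligned):
        raise ValueError("sequences must have equal length")
    background = 1.0 / len(ALPHABET)  # assumption: uniform background
    total = len(aligned) + pseudocount * len(ALPHABET)
    pssm = []
    for col in range(length):
        column = [s[col] for s in aligned]
        pssm.append({
            aa: math.log2((column.count(aa) + pseudocount) / total / background)
            for aa in ALPHABET
        })
    return pssm


def best_hit(pssm, query):
    """Return (score, offset) of the best-scoring ungapped window in query."""
    width = len(pssm)
    if len(query) < width:
        raise ValueError("query is shorter than the PSSM")
    # Residues outside ALPHABET raise KeyError; this sketch does not handle them.
    return max(
        (sum(pssm[i][query[start + i]] for i in range(width)), start)
        for start in range(len(query) - width + 1)
    )


if __name__ == "__main__":
    toy_alignment = ["CPICLE", "CPLCKN", "CSICLE"]  # hypothetical motif instances
    score, offset = best_hit(build_pssm(toy_alignment), "MDLSALCPICLELIK")
    print(f"best window starts at offset {offset} (score {score:.2f})")
```

In the "new" mode of the slide, the query would be scored against a whole library of such matrices, one per domain, and the set of matching domains read off as the domain architecture.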


Slide 25

DOMAIN ARCHITECTURE OF SELECTED BRCT PROTEINS

[Figure. Proteins shown: BRCA1; BARD1; BRCA1/BARD homolog (plant); REV1 (yeast); DPB11 (yeast); PARP (vertebrates); DNA ligase III (human); TdT (eukaryotes); RFC1 (eukaryotes); DNA ligase (bacteria). Domain labels: BRCT; RING; PHD-l; CMP-trans; AZF; PARP; ATP-dep ligase; HhH; polX; ATP and PCNA-binding; NAD-dep ligase.]


Slide 41

Use of profile libraries to examine domain representation in individual proteomes

Proteomes: yeast (6,200), worm (~20,000)

Profile library → Detect domains using PSI-BLAST, IMPALA → Compare domain distributions

Chervitz SA, Aravind L, Sherlock G, Ball CA, Koonin EV, Dwight SS, Harris MA, Dolinski K, Mohr S, Smith T, Weng S, Cherry JM, Botstein D. 1998. Comparison of the complete protein sets of worm and yeast: orthology and divergence. Science 282: 2022-8


Slide 42

Normalized domain counts in worm and yeast

1. Hormone receptor; 2. POZ; 3. EGF; 4. MATH; 5. PTPase; 6. Cation Channels; 7. PDZ; 8. SH2; 9. FNIII; 10. Homeodomain; 11. LRR; 12. EF hands; 13. Ankyrin; 14. RING finger; 15. C2H2 finger; 16. small GTPase; 17. RRM; 18. AAA+; 19. C6 finger
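One plausible reading of "normalized" here is domain occurrences scaled by the size of each protein set, so that the larger worm proteome does not dominate the comparison. The sketch below assumes that reading. The protein-set sizes are the ones quoted earlier in the deck (6,200 for yeast, about 20,000 for worm), but the per-domain counts are invented placeholders, not values from the figure, and the function names are hypothetical.

```python
def normalized_counts(domain_counts, proteome_size, per=1000):
    """Domain occurrences per `per` proteins in the proteome."""
    return {d: per * n / proteome_size for d, n in domain_counts.items()}


def compare(counts_a, size_a, counts_b, size_b):
    """Map each domain to its (normalized in a, normalized in b) pair."""
    norm_a = normalized_counts(counts_a, size_a)
    norm_b = normalized_counts(counts_b, size_b)
    return {
        d: (norm_a.get(d, 0.0), norm_b.get(d, 0.0))
        for d in sorted(set(norm_a) | set(norm_b))
    }


if __name__ == "__main__":
    # Placeholder counts for illustration only; NOT data from the slide.
    worm = {"EGF": 100, "RING finger": 120}
    yeast = {"RING finger": 31, "C6 finger": 62}
    for domain, (w, y) in compare(worm, 20000, yeast, 6200).items():
        print(f"{domain:12s} worm {w:5.1f}  yeast {y:5.1f}  (per 1000 proteins)")
```

Domains missing from one protein set are reported as 0.0 rather than dropped, since presence in only one organism is itself the kind of difference this comparison is meant to surface.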


Slide 43

  • Searching a domain library is often easier and more informative than searching the entire sequence database. However, the latter yields complementary information and should not be skipped if details are of interest.

  • Varying the search parameters, e.g. switching composition-based statistics on and off, can make a difference.

  • Using subsequences, preferably chosen according to objective criteria, e.g. separation from the rest of the protein by a low-complexity linker, may improve search performance.

  • Trying different queries is a must when analyzing protein (super)families.

  • Even hits below the threshold of statistical significance often are worth analyzing, albeit with extreme care.

  • Transferring functional information between homologs on the basis of a database description alone is dangerous. Conservation of domain architectures, active sites and other features needs to be analyzed (hence automated identification of protein families is difficult and automated prediction of functions is extremely error-prone).

  • Always do a reality check!
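The tip about choosing subsequences by objective criteria can be sketched as follows: given a sequence in which low-complexity residues are lowercase (the convention of the SEG output shown earlier), cut out each uppercase stretch as a separate query. The single-string input format, the function name and the minimum length of 30 residues are assumptions made for this illustration.

```python
import re


def globular_segments(masked: str, min_length: int = 30):
    """Return (start, end, sequence) for each uppercase run in masked.

    Coordinates are 1-based and inclusive; runs shorter than min_length
    are skipped because they make poor stand-alone search queries.
    """
    segments = []
    for match in re.finditer(r"[A-Z]+", masked):
        if match.end() - match.start() >= min_length:
            segments.append((match.start() + 1, match.end(), match.group()))
    return segments


if __name__ == "__main__":
    # Toy string, not a real protein: two uppercase blocks and a lowercase linker.
    toy = "MKTAYIAKQR" + "ssssssssss" + "DEVLKQWERT"
    for start, end, seq in globular_segments(toy, min_length=5):
        print(f"{start}-{end}\t{seq}")
```

Each returned segment can then be used as its own database query, and the coordinates allow any hit to be mapped back onto the full-length protein.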

