Prosite and UCSC Genome Browser Exercise 3
PowerPoint Slideshow about 'Prosite and UCSC Genome Browser Exercise 3' - benjamin


Presentation Transcript
Turning information into knowledge
  • The outcome of a sequencing project is masses of raw data
  • The challenge is to turn this raw data into biological knowledge
  • A valuable tool for this challenge is an automated diagnostic pipeline through which newly determined sequences can be processed
From sequence to function
  • Nature tends to innovate rather than invent
  • Proteins are composed of functional elements: domains and motifs
    • Domains are structural units that carry out a certain function
    • The same domains are shared between different proteins
    • Motifs are shorter sequences with a certain biological activity

What is a motif?
  • A sequence motif is a sequence that is widespread and conjectured to have biological significance
  • Examples:
    • KDEL: ER-lumen retention signal
    • PKKKRKV: an NLS (nuclear localization signal)
More loosely defined motifs
  • KDEL (usually) + HDEL (rarely) = [HK]-D-E-L: H or K at the first position
  • This is called a pattern (in biology), or a regular expression (in computer science)
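The pattern [HK]-D-E-L translates directly into the regular expression `[HK]DEL`. A minimal sketch using Python's `re` module; the three sequences are made-up toy strings, not real proteins:

```python
import re

# [HK]-D-E-L in PROSITE notation becomes [HK]DEL as a Python regex:
# the bracketed class allows either H or K at the first position.
ER_RETENTION = re.compile(r"[HK]DEL")

# Made-up toy sequences, not real proteins.
seqs = {
    "toy1": "MAGLLVSSAKDEL",  # ends in KDEL (the usual form)
    "toy2": "MAGLLVSSAHDEL",  # ends in HDEL (the rarer form)
    "toy3": "MAGLLVSSARDEL",  # RDEL: R is not allowed at the first position
}

for name, seq in seqs.items():
    m = ER_RETENTION.search(seq)
    print(name, m.group() if m else None)
# toy1 KDEL
# toy2 HDEL
# toy3 None
```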
Syntax of a pattern
  • Example: W-x(9,11)-[FYV]-[FYW]-x(6,7)-[GSTNE]
Patterns
  • W-x(9,11)-[FYV]-[FYW]-x(6,7)-[GSTNE]
    • x(9,11): any amino acid, between 9 and 11 times
    • [FYV]: F or Y or V
  • Example sequence: WOPLASDFGYVWPPPLAWSROPLASDFGYVWPPPLAWSWOPLASDFGYVWPPPLSQQQ
Patterns - syntax
  • Amino acids are written with the standard IUPAC one-letter codes.
  • ‘x’ : any amino acid.
  • ‘[]’ : residues allowed at the position.
  • ‘{}’ : residues forbidden at the position.
  • ‘()’ : repetition of a pattern element is indicated in parentheses: x(n) for exactly n repeats, x(n,m) for a range of n to m repeats.
  • ‘-’ : separates each pattern element.
  • ‘<’ : indicates an N-terminal restriction of the pattern.
  • ‘>’ : indicates a C-terminal restriction of the pattern.
  • ‘.’ : the period ends the pattern.
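These syntax rules map almost one-to-one onto Python regular expressions (PROSITE writes the terminal restrictions as `<` and `>`). The function below is a minimal sketch of that translation; `prosite_to_regex` is a hypothetical helper name, and it deliberately ignores rarer constructs such as `>` inside square brackets:

```python
import re

def prosite_to_regex(pattern: str) -> str:
    """Translate a PROSITE pattern into a Python regular expression string.

    Hypothetical helper covering one-letter codes, x, [...], {...},
    (n) / (n,m) repetition, '-' separators, '<' / '>' terminal
    restrictions and the closing '.'.
    """
    pattern = pattern.strip()
    if pattern.endswith("."):          # the period only terminates the pattern
        pattern = pattern[:-1]

    parts = []
    for element in pattern.split("-"):
        prefix = suffix = ""
        if element.startswith("<"):    # N-terminal restriction -> ^
            prefix, element = "^", element[1:]
        if element.endswith(">"):      # C-terminal restriction -> $
            suffix, element = "$", element[:-1]

        # Pull off a repetition suffix such as x(2) or x(2,4).
        repeat = ""
        m = re.fullmatch(r"(.+?)\((\d+)(?:,(\d+))?\)", element)
        if m:
            element, low, high = m.group(1), m.group(2), m.group(3)
            repeat = "{" + low + ("," + high if high else "") + "}"

        if element == "x":             # any amino acid
            core = "."
        elif element.startswith("{"):  # forbidden residues -> negated class
            core = "[^" + element[1:-1] + "]"
        else:                          # single residue or [...] allowed set
            core = element
        parts.append(prefix + core + repeat + suffix)
    return "".join(parts)


print(prosite_to_regex("W-x(9,11)-[FYV]-[FYW]-x(6,7)-[GSTNE]."))
# W.{9,11}[FYV][FYW].{6,7}[GSTNE]
print(prosite_to_regex("[HK]-D-E-L>"))
# [HK]DEL$
```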
Profile-pattern-consensus
[Figure: a multiple alignment summarized as a consensus sequence, as a pattern ([AC]-A-[GC]-T-[TC]-[GC]) and as a profile]
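To make the three summaries concrete, the sketch below derives each one from a small alignment. The four aligned sequences are hypothetical, invented so that the column-by-column pattern reproduces [AC]-A-[GC]-T-[TC]-[GC]. The consensus keeps only the most common residue per column, the pattern keeps every residue seen, and the profile keeps their frequencies:

```python
from collections import Counter

# Hypothetical aligned sequences, invented to reproduce the slide's pattern.
alignment = ["AAGTTG", "CAGTCG", "AACTTC", "AAGTTG"]
columns = list(zip(*alignment))

# Consensus: the most frequent residue in each column.
consensus = "".join(Counter(col).most_common(1)[0][0] for col in columns)

def column_element(col):
    """Pattern element: every residue seen in the column, in first-seen order."""
    residues = "".join(dict.fromkeys(col))
    return residues if len(residues) == 1 else f"[{residues}]"

pattern = "-".join(column_element(col) for col in columns)

# Profile: the frequency of each residue at each position.
profile = [{res: n / len(col) for res, n in Counter(col).items()} for col in columns]

print(consensus)   # AAGTTG
print(pattern)     # [AC]-A-[GC]-T-[TC]-[GC]
print(profile[0])  # {'A': 0.75, 'C': 0.25}
```

The pattern treats the single C in the first column exactly like the three A's, whereas the profile records that C is the minority choice.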

Prosite
  • A method for determining the function of uncharacterized translated protein sequences
  • Database of annotated protein families and functional sites as well as associated patterns and profiles to identify them
Prosite
  • Entries are represented with patterns or profiles

  • Profiles are used in Prosite when the motif is relatively divergent and therefore difficult to represent as a pattern
[Figure: an alignment represented as a pattern, [AC]-A-[GC]-T-[TC]-[GC], and as a profile]
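A profile can score a sequence even where a pattern would simply say no. The sketch below uses a simple ungapped log-odds matrix (a position-specific scoring matrix) as a stand-in; Prosite's own generalized profiles are richer, since they also score insertions and deletions, so treat this only as an illustration of the idea. The alignment, the uniform background and the pseudocount of 1 are all assumptions:

```python
import math
from collections import Counter

# Hypothetical DNA alignment (a toy motif, not a real Prosite entry).
alignment = ["AAGTTG", "CAGTCG", "AACTTC", "AAGTTG"]
ALPHABET = "ACGT"
BACKGROUND = 0.25   # assumed uniform background frequency
PSEUDOCOUNT = 1.0   # avoids log(0) for residues never seen in a column

def build_pssm(seqs):
    """Log2-odds score of each residue at each motif position."""
    pssm = []
    for col in zip(*seqs):
        counts = Counter(col)
        total = len(col) + PSEUDOCOUNT * len(ALPHABET)
        pssm.append({
            res: math.log2((counts[res] + PSEUDOCOUNT) / total / BACKGROUND)
            for res in ALPHABET
        })
    return pssm

def best_window(pssm, seq):
    """Slide the profile along seq; return (best score, 0-based start)."""
    width = len(pssm)
    scores = [
        (sum(pssm[i][seq[start + i]] for i in range(width)), start)
        for start in range(len(seq) - width + 1)
    ]
    return max(scores)

pssm = build_pssm(alignment)
best = best_window(pssm, "TTAGGTTGTT")
print(best[1], round(best[0], 2))  # 2 4.32
```

For the query TTAGGTTGTT the best window starts at index 2 (AGGTTG). The pattern [AC]-A-[GC]-T-[TC]-[GC] rejects that window outright because of the G at the second position, while the profile still ranks it well above every other window.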

Scanning Prosite
  • Query: a pattern. Result: all sequences that match this pattern
  • Query: a sequence. Result: all patterns found in the sequence
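The two query modes can be mimicked locally with regular expressions. This is only a toy illustration, not the ScanProsite service: the sequences are invented, the function names are hypothetical, and each pattern's regex was translated by hand. N-{P}-[ST]-{P} is the PROSITE pattern for N-glycosylation sites:

```python
import re

# Hypothetical mini "database" of toy sequences (not real entries).
SEQUENCES = {
    "toy1": "MKTNGSAWLLKDEL",
    "toy2": "MSSPAHDEL",
    "toy3": "MAAAAAAA",
}

# Patterns in PROSITE notation, each paired with a hand-translated regex.
PATTERNS = {
    "[HK]-D-E-L": re.compile(r"[HK]DEL"),
    "N-{P}-[ST]-{P}": re.compile(r"N[^P][ST][^P]"),
}

def scan_pattern(pattern_name):
    """Mode 1: query = a pattern, result = every sequence that matches it."""
    regex = PATTERNS[pattern_name]
    return [name for name, seq in SEQUENCES.items() if regex.search(seq)]

def scan_sequence(seq):
    """Mode 2: query = a sequence, result = every pattern found in it."""
    return [name for name, regex in PATTERNS.items() if regex.search(seq)]

print(scan_pattern("[HK]-D-E-L"))        # ['toy1', 'toy2']
print(scan_sequence(SEQUENCES["toy1"]))  # ['[HK]-D-E-L', 'N-{P}-[ST]-{P}']
```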

WebLogo

http://weblogo.berkeley.edu/logo.cgi
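A sequence logo such as those WebLogo draws stacks the residues of each alignment column. The height of the whole stack is the column's information content, R = log2(N) - H, where N is the alphabet size and H is the Shannon entropy of the column, and each letter's height is proportional to its frequency. A minimal sketch of those numbers on a hypothetical DNA alignment, without the small-sample correction WebLogo can apply:

```python
import math
from collections import Counter

# Hypothetical DNA alignment (invented for illustration).
alignment = ["AAGTTG", "CAGTCG", "AACTTC", "AAGTTG"]
MAX_BITS = math.log2(4)  # 2 bits for a 4-letter DNA alphabet (log2(20) for protein)

def column_information(col):
    """Information content R = log2(N) - H of one alignment column, in bits."""
    n = len(col)
    entropy = -sum((c / n) * math.log2(c / n) for c in Counter(col).values())
    return MAX_BITS - entropy

def letter_heights(col):
    """Height of each letter in the stack: its frequency times the column's R."""
    r = column_information(col)
    n = len(col)
    return {res: (c / n) * r for res, c in Counter(col).items()}

columns = list(zip(*alignment))
info = [column_information(col) for col in columns]
print([round(x, 3) for x in info])  # [1.189, 2.0, 1.189, 2.0, 1.189, 1.189]
```

Fully conserved columns reach the 2-bit maximum; columns with a minority residue are shorter.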

Patterns with a high probability of occurrence
  • Entries describing commonly found post-translational modifications or compositionally biased regions.
  • Found in the majority of known protein sequences
  • High probability of occurrence
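Why are some patterns found almost everywhere? A back-of-the-envelope estimate, using the N-glycosylation site pattern N-{P}-[ST]-{P} as an example of a short, permissive pattern and assuming (as a simplification) that all 20 amino acids are equally frequent and positions are independent:

```python
# Assumption: every amino acid has frequency 1/20 and positions are independent.
P_AA = 1 / 20

# Probability that one position satisfies each element of N-{P}-[ST]-{P}.
element_probs = [
    P_AA,        # N    : exactly one allowed residue
    1 - P_AA,    # {P}  : any residue except proline (19 of 20)
    2 * P_AA,    # [ST] : serine or threonine (2 of 20)
    1 - P_AA,    # {P}  : any residue except proline again
]

# Probability that a match starts at a given position.
p_match = 1.0
for p in element_probs:
    p_match *= p

def expected_matches(protein_length: int) -> float:
    """Expected number of pattern matches in a random protein of this length."""
    start_positions = protein_length - len(element_probs) + 1
    return start_positions * p_match

print(f"{p_match:.5f}")                # 0.00451
print(f"{expected_matches(400):.2f}")  # 1.79
```

Under these assumptions a random 400-residue protein is expected to contain about 1.8 matches, so a hit to such a pattern says little on its own.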
UCSC Genome Browser annotation tracks
[Figure: Genome Browser view with labelled tracks: base position, UCSC Genes, RefSeq, mRNA (GenBank), SNPs, repeats, vertebrate conservation and single species compared; gene models show UTR, CDS, intron and the direction of transcription (<)]
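Genes drawn with arrows pointing left (<) are transcribed from the minus strand, while the browser displays the plus-strand sequence. Reading such a gene 5' to 3' therefore means taking the reverse complement. A minimal sketch (the helper name is hypothetical):

```python
# Complement table for DNA bases; N (unknown base) maps to itself.
COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")

def reverse_complement(seq: str) -> str:
    """Return the opposite strand of seq, read 5' to 3'."""
    return seq.translate(COMPLEMENT)[::-1]

print(reverse_complement("ATGAAC"))  # GTTCAT
```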

Annotation track options
  • Display modes: dense, squish, pack, full

Annotation track options
  • Another option to toggle between ‘pack’ and ‘dense’ view is to click on the track title
[Figure: example tracks for sickle-cell anemia distribution and malaria distribution]

BLAT
  • BLAT = Blast-Like Alignment Tool
  • BLAT is designed to find sequences with 95% or greater similarity at the DNA level, and 80% or greater at the protein level
  • It searches rapidly because it indexes the entire genome

Good for:

  • Finding genomic coordinates of cDNA
  • Determining exons/introns
  • Finding human (or chimp, dog, cow…) homologs of another vertebrate sequence
  • Finding upstream regulatory regions
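BLAT's speed comes from its genome index. The sketch below shows the idea at toy scale, following the scheme described for BLAT: index the genome by non-overlapping k-mers, then look up every overlapping k-mer of the query and group the hits by diagonal. The function names, the toy genome and k = 4 are assumptions for illustration; real BLAT uses longer k-mers (11 bases by default for DNA) and then extends and stitches hits into full alignments, which is omitted here:

```python
from collections import defaultdict

K = 4  # toy k-mer size

def build_index(genome: str, k: int = K):
    """Map each non-overlapping genome k-mer to the positions where it starts."""
    index = defaultdict(list)
    for pos in range(0, len(genome) - k + 1, k):
        index[genome[pos:pos + k]].append(pos)
    return index

def find_hits(query: str, index, k: int = K):
    """Look up every overlapping query k-mer; count hits per diagonal.

    The diagonal is genome_pos - query_pos; several hits on the same
    diagonal suggest where the query aligns in the genome.
    """
    diagonals = defaultdict(int)
    for qpos in range(len(query) - k + 1):
        for gpos in index.get(query[qpos:qpos + k], []):
            diagonals[gpos - qpos] += 1
    return dict(diagonals)

genome = "AAAACCCCGGGGTTTTACGTTGCA"
print(find_hits("CCGGGGTTTTAC", build_index(genome)))  # {6: 2}
```

Indexing only non-overlapping genome k-mers keeps the index about k times smaller, while querying every overlapping query k-mer still guarantees a hit for any exact match of at least 2k - 1 bases.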
BLAT Results
[Figure: BLAT alignment view; legend: match, non-match (mismatch/indel), indel boundaries]
