Prosite and UCSC Genome Browser Exercise 3


Protein motifs and Prosite


Turning information into knowledge

  • The outcome of a sequencing project is masses of raw data

  • The challenge is to turn this raw data into biological knowledge

  • A valuable tool for this challenge is an automated diagnostic pipeline through which newly determined sequences can be passed


From sequence to function

  • Nature tends to innovate rather than invent

  • Proteins are composed of functional elements: domains and motifs

    • Domains are structural units that carry out a certain function

    • The same domains are shared between different proteins

    • Motifs are shorter sequences with certain biological activity


What is a motif?

  • A sequence motif = a certain sequence that is widespread and conjectured to have biological significance

  • Examples:

    • KDEL – ER-lumen retention signal

    • PKKKRKV – an NLS (nuclear localization signal)


More loosely defined motifs

  • KDEL (usually) + HDEL (rarely) = [HK]-D-E-L

  • [HK]-D-E-L: H or K at the first position

  • This is called a pattern (in Biology), or a regular expression (in computer science)
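The link between a biological pattern and a regular expression can be sketched in Python. The regular expression `[HK]DEL` below is a hand translation of `[HK]-D-E-L`; the three sequences are invented for illustration and do not come from the slides.

```python
import re

# Hand translation of the pattern [HK]-D-E-L into a regular expression.
kdel_like = re.compile(r"[HK]DEL")

# Hypothetical sequences, invented for this example.
sequences = {
    "seq_kdel": "MAAGTLLSEKDEL",
    "seq_hdel": "MSPQRTVVHDEL",
    "seq_none": "MSPQRTVVADEL",
}

# True where the motif occurs anywhere in the sequence.
matches = {name: bool(kdel_like.search(seq)) for name, seq in sequences.items()}
print(matches)
```

The first two sequences carry K or H before DEL and match; the third has A in that position and does not.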


Syntax of a pattern

  • Example: W-x(9,11)-[FYV]-[FYW]-x(6,7)-[GSTNE]


WOPLASDFGYVWPPPLAWS

ROPLASDFGYVWPPPLAWS

WOPLASDFGYVWPPPLSQQQ

Patterns

  • W-x(9,11)-[FYV]-[FYW]-x(6,7)-[GSTNE]

    • x(9,11): any amino acid, repeated 9 to 11 times

    • [FYV]: F or Y or V
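As a rough check, the pattern can be tested against the example sequences with a hand-written regular expression. This sketch assumes that the example string on the slide is three separate sequences run together; the split below is that assumption, not something the slide states.

```python
import re

# Hand translation of W-x(9,11)-[FYV]-[FYW]-x(6,7)-[GSTNE]:
# x(9,11) becomes .{9,11}; bracketed sets carry over unchanged.
pattern = re.compile(r"W.{9,11}[FYV][FYW].{6,7}[GSTNE]")

# Assumption: the slide's example string is three sequences run together.
candidates = [
    "WOPLASDFGYVWPPPLAWS",
    "ROPLASDFGYVWPPPLAWS",
    "WOPLASDFGYVWPPPLSQQQ",
]

# One boolean per candidate: does the pattern occur in it?
results = [bool(pattern.search(seq)) for seq in candidates]
print(results)
```

Only the first candidate matches. The second has no W far enough from its end to start a match, and the third has Q where one of G, S, T, N or E is required.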


Patterns - syntax

  • The standard IUPAC one-letter codes.

  • ‘x’ : any amino acid.

  • ‘[]’ : residues allowed at the position.

  • ‘{}’ : residues forbidden at the position.

  • ‘()’ : repetition of a pattern element is indicated in parentheses: x(n) gives an exact number of repetitions, x(n,m) a range.

  • ‘-’ : separates each pattern element.

  • ‘<’ : indicates an N-terminal restriction of the pattern.

  • ‘>’ : indicates a C-terminal restriction of the pattern.

  • ‘.’ : the period ends the pattern.
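The syntax rules above map fairly directly onto regular-expression syntax. The following is a minimal sketch of such a translation, not Prosite's own implementation. The function name `prosite_to_regex` is hypothetical, and the sketch only covers the elements listed here (it does not handle, for example, anchors placed inside brackets).

```python
import re

# One pattern element: x, a residue, [allowed] or {forbidden},
# optionally followed by (n) or (n,m).
ELEMENT = re.compile(r"(x|[A-Z]|\[[A-Z]+\]|\{[A-Z]+\})(?:\((\d+)(?:,(\d+))?\))?")

def prosite_to_regex(pattern: str) -> str:
    """Translate a Prosite-style pattern into a Python regular expression."""
    pattern = pattern.strip().rstrip(".")          # '.' ends the pattern
    start = end = ""
    if pattern.startswith("<"):                    # N-terminal restriction
        start, pattern = "^", pattern[1:]
    if pattern.endswith(">"):                      # C-terminal restriction
        end, pattern = "$", pattern[:-1]
    parts = []
    for element in pattern.split("-"):             # '-' separates elements
        m = ELEMENT.fullmatch(element)
        if m is None:
            raise ValueError(f"unsupported pattern element: {element!r}")
        core, low, high = m.groups()
        if core == "x":
            core = "."                             # any amino acid
        elif core.startswith("{"):
            core = "[^" + core[1:-1] + "]"         # forbidden residues
        if low is not None:                        # (n) or (n,m) repetition
            core += "{" + low + ("," + high if high else "") + "}"
        parts.append(core)
    return start + "".join(parts) + end

print(prosite_to_regex("W-x(9,11)-[FYV]-[FYW]-x(6,7)-[GSTNE]."))
```

The resulting string can be passed to `re.search` to scan a protein sequence written in one-letter codes.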


Profile-pattern-consensus

consensus

multiple alignment

pattern

[AC]-A-[GC]-T-[TC]-[GC]

profile
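How a consensus, a pattern and a profile relate to the same multiple alignment can be sketched as follows. The alignment is hypothetical, chosen so that its columns reproduce the pattern on the slide; it is not the alignment shown in the presentation. The profile here is a plain count matrix, which is a simplification of a real weighted profile.

```python
from collections import Counter

# Hypothetical alignment, invented so that its columns reproduce the
# slide's pattern; it is not the alignment shown in the presentation.
alignment = ["AAGTTG", "AACTTG", "CAGTCC", "AAGTTG"]

columns = list(zip(*alignment))

# Profile (simplified): residue counts for every column.
profile = [Counter(col) for col in columns]

# Consensus: the most frequent residue in every column.
consensus = "".join(counts.most_common(1)[0][0] for counts in profile)

def column_to_element(counts):
    """Pattern element: every residue seen in the column, sorted."""
    residues = "".join(sorted(counts))
    return residues if len(residues) == 1 else f"[{residues}]"

pattern = "-".join(column_to_element(counts) for counts in profile)

print(consensus)
print(pattern)
```

The consensus keeps one residue per column, the pattern keeps every residue observed, and the profile additionally keeps how often each was observed. Residues inside brackets are sorted here, so `[CG]` is the same element as the slide's `[GC]`.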


http://www.expasy.ch/prosite/


Prosite

  • A method for determining the function of uncharacterized translated protein sequences

  • Database of annotated protein families and functional sites as well as associated patterns and profiles to identify them


Prosite

  • Entries are represented with patterns or profiles

profile

pattern

[AC]-A-[GC]-T-[TC]-[GC]

Profiles are used in Prosite when the motif is relatively divergent and difficult to represent as a pattern


Scanning Prosite

  • Query: pattern. Result: all sequences which adhere to this pattern

  • Query: sequence. Result: all patterns found in the sequence
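The two query directions can be illustrated with a toy in-memory example. Everything here is hypothetical: the pattern names, the regular expressions standing in for patterns, and the sequences are invented, and the code does not query the real Prosite service.

```python
import re

# Hypothetical mini-databases; names, regexes and sequences are invented.
PATTERNS = {
    "ER_retention": r"[HK]DEL",
    "toy_motif": r"W.{2}[FYV]",
}
SEQUENCES = {
    "protA": "MKTWAAFLLSKDEL",
    "protB": "MSSWGGYTTR",
    "protC": "MAAAGGGTTT",
}

def scan_sequence(sequence):
    """Query: sequence. Result: names of all patterns found in it."""
    return [name for name, rx in PATTERNS.items() if re.search(rx, sequence)]

def scan_pattern(regex):
    """Query: pattern. Result: names of all sequences that match it."""
    return [name for name, seq in SEQUENCES.items() if re.search(regex, seq)]

print(scan_sequence(SEQUENCES["protA"]))
print(scan_pattern(PATTERNS["toy_motif"]))
```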


prosite sequence query


Prosite profile


Prosite profile  sequence logo


Sequence logo


WebLogo

http://weblogo.berkeley.edu/logo.cgi


Searching Prosite with a sequence


Patterns with a high probability of occurrence

  • Entries describing commonly found post-translational modifications or compositionally biased regions.

  • Found in the majority of known protein sequences

  • High probability of occurrence
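A back-of-the-envelope calculation shows why short, permissive patterns turn up in most proteins. The example uses the N-glycosylation site pattern N-{P}-[ST]-{P}, which is not named in the slides. It also assumes that all 20 amino acids are equally likely and independent, which real proteins are not, and the protein length of 400 is hypothetical.

```python
from math import prod

AMINO_ACIDS = 20  # assumption: all 20 residues equally likely

def match_probability(allowed_per_position):
    """Chance that a random window matches, given allowed residue counts."""
    return prod(n / AMINO_ACIDS for n in allowed_per_position)

# N-{P}-[ST]-{P}: 1 allowed, 19 allowed, 2 allowed, 19 allowed
p_window = match_probability([1, 19, 2, 19])

protein_length = 400                  # hypothetical protein length
windows = protein_length - 4 + 1      # number of 4-residue windows
expected_hits = p_window * windows

print(p_window, expected_hits)
```

Under these assumptions a protein of this size is expected to contain between one and two matches by chance alone, so a hit on such a pattern says little on its own.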


Searching Prosite with a pattern


prosite pattern query


Searching Prosite with a Prosite AC


UCSC Genome Browser




UCSC Genome Browser - Gateway

Reset all settings of previous user


UCSC Genome Browser query results


UCSC Genome Browser Annotation tracks

  • Vertebrate conservation

  • Single species compared

  • Base position

  • UCSC Genes

  • UTR

  • RefSeq

  • mRNA (GenBank)

  • Intron

  • CDS

  • Direction of transcription (<)

  • SNPs

  • Repeats


UCSC Gene


UCSC Genome Browser - movement

Zoom x3 + Center


UCSC Genome Browser – Base view


Annotation track options

  • dense

  • squish

  • pack

  • full

Another option to toggle between ‘pack’ and ‘dense’ view is to click on the track title

Sickle-cell anemia distr.

Malaria distr.


BLAT

  • BLAT = BLAST-Like Alignment Tool

  • BLAT is designed to find similarity of >95% on DNA, >80% for protein

  • Rapid search by indexing the entire genome

  • Good for:

    • Finding genomic coordinates of cDNA

    • Determining exons/introns

    • Finding human (or chimp, dog, cow…) homologs of another vertebrate sequence

    • Finding upstream regulatory regions
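The idea of searching rapidly by indexing the genome can be sketched with a toy k-mer index. This is only an illustration of the indexing principle, not BLAT's actual implementation. The genome string, the query, and k = 4 are invented for the example.

```python
from collections import Counter, defaultdict

def build_index(genome, k):
    """Index the start positions of non-overlapping k-mers in the genome."""
    index = defaultdict(list)
    for pos in range(0, len(genome) - k + 1, k):
        index[genome[pos:pos + k]].append(pos)
    return index

def seed_hits(query, index, k):
    """Look up every overlapping k-mer of the query in the index."""
    hits = []
    for qpos in range(len(query) - k + 1):
        for gpos in index.get(query[qpos:qpos + k], []):
            hits.append((qpos, gpos))
    return hits

# Hypothetical toy data.
genome = "ACGTACCTGAGGTCAATCGGATTC"
query = "CTGAGGTCAATC"        # copied from genome positions 6..17
k = 4

index = build_index(genome, k)
hits = seed_hits(query, index, k)

# Hits on the same diagonal (genome position minus query position)
# support the same candidate alignment.
best_diagonal, support = Counter(g - q for q, g in hits).most_common(1)[0]
print(hits, best_diagonal, support)
```

The index is built once, and each query then costs only a handful of dictionary lookups. A real tool would extend the clustered seed hits into full alignments, which this sketch leaves out.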


BLAT on UCSC Genome Browser




BLAT Results

  • Match

  • Non-Match (mismatch/indel)

  • Indel boundaries


BLAT Results on the browser


Getting DNA sequence of region

