a field guide part 2 n.
Download
Skip this Video
Loading SlideShow in 5 Seconds..
A Field Guide part 2 PowerPoint Presentation
Download Presentation
A Field Guide part 2

Loading in 2 Seconds...

play fullscreen
1 / 98

A Field Guide part 2 - PowerPoint PPT Presentation


  • 107 Views
  • Uploaded on

A Field Guide part 2. University of Colorado Health Sciences Center. August 30, 2005. Part 2. Entrez: text searching a GenBank record preview/index. BLAST: sequence searching pre-computed searches algorithms what’s new?. VAST: structure searching.

loader
I am the owner, or an agent authorized to act on behalf of the owner, of the copyrighted work described.
capcha
Download Presentation

PowerPoint Slideshow about 'A Field Guide part 2' - huy


An Image/Link below is provided (as is) to download presentation

Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author.While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server.


- - - - - - - - - - - - - - - - - - - - - - - - - - E N D - - - - - - - - - - - - - - - - - - - - - - - - - -
Presentation Transcript
a field guide part 2

A Field Guidepart 2

University of Colorado Health Sciences Center

August 30, 2005

part 2
Part 2

Entrez: text searching

  • a GenBank record
  • preview/index

BLAST: sequence searching

  • pre-computed searches
  • algorithms
  • what’s new?

VAST: structure searching

  • Example: mapping oligos to a genome
genbank records

Header

Feature Table

Sequence

GenBank Records

The Flatfile Format

a typical genbank record

LOCUS NM_019570 4279 bp mRNA linear INV 28-OCT-2004

DEFINITION Mus musculus REV1-like(S. cerevisiae)(Rev1l),mRNA

ACCESSION NM_019570

VERSION NM_019570.3 GI:50811869

KEYWORDS .

= Title

A Typical GenBank Record
indexing for nucleotide uid 59958365

[accn]

[orgn]

[mdat]

[prop]

Indexing for Nucleotide UID 59958365

FieldIndexed Terms

[primary accession] NM_001012399

[title] Bos taurus hemochromatosis (hfe), mRNA.

[organism] Bos taurus

[sequence length] 1168

[modification date] 2005/02/19

[properties] biomol mrna

gbdiv mam

srcdb refseq

smarter query

42 records

Curated HFE

splice variants

(11 total)

Smarter Query

hfe[title]

AND human[orgn]

preview index properties srcdb1
Preview/Index: Properties, srcdb

…AND srcdb refseq[Properties]

preview index properties srcdb2
Preview/Index: Properties, srcdb

…AND srcdb ddbj/embl/genbank[Properties]

database queries

Primate division gbdiv pri[prop]

EST division gbdiv est[prop]

Database Queries

#1hfe 137

#2 hfe[title]AND human[orgn] 42

#3 #2 AND srcdb refseq[prop] 11

#4 #2 AND srcdb ddbj/embl/genbank[prop] 31

#5 #4 AND gbdiv pri[prop] 29

#4 #4 AND gbdiv est[prop] 2

molecule queries

Genomic DNA biomol genomic[prop]

cDNA biomol mrna[prop]

Molecule Queries

#1hfe 116

#2 hfe[title]AND human[orgn] 42

#3 #2 AND biomol mrna[prop] 29

#4 #2 AND biomol genomic[prop] 13

more queries

Entrez Nucleotide

Reviewed RefSeqs with transcript variants:

srcdb refseq reviewed[prop]ANDtranscript[title] AND variant[title]

Entrez Gene

Topoisomerase genes from Archaea:

topoisomerase[gene name]ANDarchaea[organism]

Genes on human chromosome 2 with OMIM links

2[chromosome] ANDhuman[organism]AND“gene omim”[filter]

Membrane proteins linked to cancer:

“integral to plasma membrane”[gene ontology]ANDcancer[dis]

More Queries…

Fields are database-specific

other entrez databases
Other Entrez Databases

UniGene:rat clusters that have at least one mRNA

rat[organism] NOT0[mrna count]

SNP:uniquely mapped microsatellites on human chr2

microsat[SNP Class] AND 1[Map Weight] AND 2[Chromosome]) AND human[orgn]

UniSTS:markers on the Genethon map of human chromosome 12

Genethon[Map Name] ANDhuman[organism] AND12[chromosome]

Structure:structures of bacterial kinases with resolutions below 2 Å

bacteria[organism]ANDkinaseAND000.00:002.00[resolution]

precomputed blast services

Nucleotide or protein:Related Sequences

  • BLAST link:BLink

Precomputed BLAST Services

  • Transcript clusters:UniGene
  • Protein homologs: HomoloGene
related sequences
Related Sequences

Most similar

Least similar

blink output

Best hits

3D structures

CDD-Search

BLink Output
global vs local alignment

Seq 1

Seq 1

Seq 2

Seq 2

Global alignment

Local alignment

Global vs Local Alignment
global vs local alignment1

Global

Seq1: 1 W--HEREISWALTERNOW 16

W HERE

Seq2: 1 HEWASHEREBUTNOWISHERE 21

Local

Seq1: 1 W--HERE 5 Seq1: 1 W--HERE 5

W HERE W HERE

Seq2: 3 WASHERE 9 Seq2: 15 WISHERE 21

Global vs Local Alignment

Seq1: WHEREISWALTERNOW (16aa)

Seq2: HEWASHEREBUTNOWISHERE (21aa)

the flavors of blast
The Flavors of BLAST
  • Standard BLAST
    • nucleotide, protein and translations (blastn, blastp, blastx, tblastn, tblastx)
    • traditional “contiguous” word hit
  • Megablast
    • optimized for large batch searches
    • can use discontiguous words
  • PSI-BLAST
    • constructs PSSMs automatically; uses as query
    • very sensitive protein search
  • RPS BLAST
    • searches a database of PSSMs
    • tool for conserved domain searches

“contiguous”

discontiguous

why is blast so popular
Why Is BLAST So Popular?
  • Fast

- heuristic approach based on Smith Waterman

  • Localalignments
  • Statisticalsignificance

-Expect value

  • Versatile

- blastn, blastp, blastx, tblastn, tblastx, rps-blast, psi-blast

-www, standalone, and network clients

how blast works
How BLAST Works
  • Make lookup table of “words” for query
  • Scan database for hits
  • Ungapped extensions of hits (initial HSPs)
  • Gapped extensions (no traceback)
  • Gapped extensions (traceback; alignment details)
nucleotide words

GTACTGGACATGGACCCTACAGGAA

Query:

11-mer

Nucleotide Words

GTACTGGACAT

TACTGGACATG

ACTGGACATGG

CTGGACATGGA

TGGACATGGAC

GGACATGGACC

GACATGGACCC

ACATGGACCCT

Make a lookup

table of words

. . .

protein words

GTQITVEDLFYNIATRRKALKN

Query:

Word size = 3 (default)

Word size can only be 2 or 3

Neighborhood Words

LTV, MTV, ISV, LSV, etc.

Protein Words

GTQ

TQI

QIT

ITV

TVE

VED

EDL

DLF

...

Make a lookup

table of words

[ -f 11 = blastp default ]

minimum requirements for a hit
Minimum Requirements for a Hit

ATCGCCATGCTTAATTGGGCTT

CATGCTTAATT

one exact match

  • Nucleotide BLAST requires one exact match
  • Protein BLAST requires two neighboring matches within 40 aa

GTQITVEDLFYNI

SEI YYN

neighborhood words

[ -A 40 = blastp default ]

two matches

blastp summary

example query words

Query: IETVYAAYLPKNTHPFLYLSLEISPQNVDVNVHPTKHEVHFLHEESILEV…

HFL 18

HFV 15

HFS 14

HWL 13

NFL 13

DFL 12

HWV 10

etc …

YLS 15

YLT 12

YVS 12

YIT 10

etc …

Neighborhood words

Neighborhood score threshold

T (-f) =11

Query 1 IETVYAAYLPKNTHPFLYLSLEISPQNVDVNVHPTKHEVHFLHEESI 47

YLSHFL

Sbjct 287 LEETYAKYLHKGASYFVYLSLNMSPEQLDVNVHPSKRIVHFLYDQEI 333

+E YA YL K F+L +SP+ +DVNVHP+K V+++ I

Gapped extension with trace back

Query 1 IETVYAAYLPKNTHPFLYLSLEISPQNVDVNVHPTKHEVHFLHEESI-LEV… 50

+E YA YL K F+YLSL +SP+ +DVNVHP+K VHFL+++ I + +

Sbjct 287 LEETYAKYLHKGASYFVYLSLNMSPEQLDVNVHPSKRIVHFLYDQEIATSI… 337

Final HSP

BLASTP Summary

High-scoring pair (HSP)

scoring systems nucleotides
Scoring Systems - Nucleotides

Identity matrix

A G C T

A +1 –3 –3 -3

G –3 +1 –3 -3

C –3 –3 +1 -3

T –3 –3 –3 +1

[ -r 1 -q -3 ]

CAGGTAGCAAGCTTGCATGTCA

|| |||||||||||| ||||| raw score = 19-9 = 10

CACGTAGCAAGCTTG-GTGTCA

scoring systems proteins
Scoring Systems - Proteins
  • Position Independent Matrices
    • PAM Matrices (Percent Accepted Mutation)
      • Derived from observation; small dataset of alignments
      • Implicit model of evolution
      • All calculated from PAM1
      • PAM250 widely used
    • BLOSUM Matrices (BLOck SUbstitution Matrices)
      • Derived from observation; large dataset of highly conserved blocks
      • Each matrix derived separately from blocks with a defined percent identity cutoff
      • BLOSUM62 - default matrix for BLAST
  • Position Specific Score Matrices (PSSMs)
  • PSI- and RPS-BLAST
blosum62

F

Negative for less likely substitutions

Y

Positive for more likely substitutions

D

D

F

BLOSUM62

A 4

R -1 5

N -2 0 6

D -2 -2 1 6

C 0 -3 -3 -3 9

Q -1 1 0 0 -3 5

E -1 0 0 2 -4 2 5

G 0 -2 0 -1 -3 -2 -2 6

H -2 0 1 -1 -3 0 0 -2 8

I -1 -3 -3 -3 -1 -3 -3 -4 -3 4

L -1 -2 -3 -4 -1 -2 -3 -4 -3 2 4

K -1 2 0 -1 -3 1 1 -2 -1 -3 -2 5

M -1 -1 -2 -3 -1 0 -2 -3 -2 1 2 -1 5

F -2 -3 -3 -3 -2 -3 -3 -3 -1 0 0 -3 0 6

P -1 -2 -2 -1 -3 -1 -1 -2 -2 -3 -3 -1 -2 -4 7

S 1 -1 1 0 -1 0 0 0 -1 -2 -2 0 -1 -2 -1 4

T 0 -1 0 -1 -1 -1 -1 -2 -2 -1 -1 -1 -1 -2 -1 1 5

W -3 -3 -4 -4 -2 -2 -3 -2 -2 -3 -2 -3 -1 1 -4 -3 -2 11

Y -2 -2 -2 -3 -2 -1 -2 -3 2 -1 -1 -2 -1 3 -3 -2 -2 2 7

V 0 -3 -3 -3 -1 -2 -2 -3 -3 3 1 -2 1 -1 -2 -2 0 -3 -1 4

X 0 -1 -1 -1 -2 -1 -1 -1 -1 -1 -1 -1 -1 -1 -2 0 0 -2 -1 -1 -1

A R N D C Q E G H I L K M F P S T W Y V X

position specific score matrix

PSSM scores

1

5

7

4

4

Position-Specific Score Matrix

Serine/Threonine protein kinases

catalytic loop

DAF-1

position specific score matrix1
Position-Specific Score Matrix

A R N D C Q E G H I L K M F P S T W Y V

435 K -1 0 0 -1 -2 3 0 3 0 -2 -2 1 -1 -1 -1 -1 -1 -1 -1 -2

436 E 0 1 0 2 -1 0 2 -1 0 -1 -1 0 0 0 -1 0 0 -1 -1 -1

437 S 0 0 -1 0 1 1 0 1 1 0 -1 0 0 0 2 0 -1 -1 0 -1

438 N -1 0 -1 -1 1 0 -1 3 3 -1 -1 1 -1 0 0 -1 -1 1 1 -1

439 K -2 1 1 -1 -2 0 -1 -2 -2 -1 -2 5 1 -2 -2 -1 -1 -2 -2 -1

440 P -2 -2 -2 -2 -3 -2 -2 -2 -2 -1 -2 -1 0 -3 7 -1 -2 -3 -1 -1

441 A 3 -2 1 -2 0 -1 0 1 -2 -2 -2 0 -1 -2 3 1 0 -3 -3 0

442 M -3 -4 -4 -4 -3 -4 -4 -5 -4 7 0 -4 1 0 -4 -4 -2 -4 -1 2

443 A 4 -4 -4 -4 0 -4 -4 -3 -4 4 -1 -4 -2 -3 -4 -1 -2 -4 -3 4

444 H -4 -2 -1 -3 -5 -2 -2 -4 10 -6 -5 -3 -4 -3 -2 -3 -4 -5 0 -5

445 R -4 8 -3 -4 0 -1 -2 -3 -2 -5 -4 0 -3 -2 -4 -3 -3 0 -4 -5

446 D -4 -4 -1 8 -6 -2 0 -3 -3 -5 -6 -3 -5 -6 -4 -2 -3 -7 -5 -5

447 I -4 -5 -6 -6 -3 -4 -5 -6 -5 3 5 -5 1 1 -5 -5 -3 -4 -3 1

448 K 0 0 1 -3 -5 -1 -1 -3 -3 -5 -5 7 -4 -5 -3 -1 -2 -5 -4 -4

449 S 0 -3 -2 -3 0 -2 -2 -3 -3 -4 -4 -2 -4 -5 2 6 2 -5 -4 -4

450 K 0 3 0 1 -5 0 0 -4 -1 -4 -3 4 -3 -2 2 1 -1 -5 -4 -4

451 N -4 -3 8 -1 -5 -2 -2 -3 -1 -6 -6 -2 -4 -5 -4 -1 -2 -6 -4 -5

452 I -3 -5 -5 -6 0 -5 -5 -6 -5 6 2 -5 2 -2 -5 -4 -3 -5 -3 3

453 M -4 -4 -6 -6 -3 -4 -5 -6 -5 0 6 -5 1 0 -5 -4 -3 -4 -3 0

454 V -3 -3 -5 -6 -3 -4 -5 -6 -5 3 3 -4 2 -2 -5 -4 -3 -5 -3 5

455 K -2 1 1 4 -5 0 -1 -2 1 -4 -2 4 -3 -2 -3 0 -1 -5 -2 -3

456 N 1 1 3 0 -4 -1 1 0 -3 -4 -4 3 -2 -5 -2 2 -2 -5 -4 -4

457 D -3 -2 5 5 -1 -1 1 -1 0 -5 -4 0 -2 -5 -1 0 -2 -6 -4 -5

458 L -3 -1 0 -3 0 -3 -2 3 -4 -2 3 0 1 1 -2 -2 -3 5 -1 -3

catalytic loop

local alignment statistics

E = Kmne-S or E = mn2-S’

K = scale for search space

 = scale for scoring system

S’ = bitscore = (S - lnK)/ln2

(applies to ungapped alignments)

Local Alignment Statistics

High scores of local alignments between two random sequences

follow the Extreme Value Distribution

Expect Value

E = number of database hits you expect to find by chance, ≥ S

your score

Alignments

expected number of random hits

Score (S)

More info:www.ncbi.nlm.nih.gov/BLAST/tutorial/Altschul-1.html

gapped alignments
Gapped Alignments
  • Gapping provides more biologically realistic alignments
  • Gapped BLAST parameters are simulated for each scoring matrix
  • Affine gap costs = -(a+bk)
    • a = gap open penalty b = gap extend penalty
    • A gap of length 1 receives the score -(a+b)
an alignment blast cannot make
An Alignment BLAST Cannot Make

1 GAATATATGAAGACCAAGATTGCAGTCCTGCTGGCCTGAACCACGCTATTCTTGCTGTTG

|| | || || || | || || || || | ||| |||||| | | || | ||| |

1 GAGTGTACGATGAGCCCGAGTGTAGCAGTGAAGATCTGGACCACGGTGTACTCGTTGTCG

61 GTTACGGAACCGAGAATGGTAAAGACTACTGGATCATTAAGAACTCCTGGGGAGCCAGTT

| || || || ||| || | |||||| || | |||||| ||||| | |

61 GCTATGGTGTTAAGGGTGGGAAGAAGTACTGGCTCGTCAAGAACAGCTGGGCTGAATCCT

121 GGGGTGAACAAGGTTATTTCAGGCTTGCTCGTGGTAAAAAC

|||| || ||||| || || | | |||| || |||

121 GGGGAGACCAAGGCTACATCCTTATGTCCCGTGACAACAAC

Reason:

no contiguous exact match of 7 bp.

an alignment blast can make

Score = 290 bits (741), Expect = 7e-77Identities = 147/331 (44%), Positives = 206/331 (61%), Gaps = 8/331 (2%)Frame = +3

BLAST 2 Sequences (blastx) output:

An Alignment BLAST Can Make

Solution: compare protein sequences; BLASTX

other blast algorithms
Other BLAST Algorithms
  • Megablast
  • Discontiguous Megablast
  • PSI-BLAST
  • PHI-BLAST
megablast ncbi s genome annotator
Megablast: NCBI’s Genome Annotator
  • Long alignments of similar DNA sequences
  • Greedy algorithm
  • Concatenation of query sequences
  • Faster than blastn; less sensitive
megablast word size

Too fast for

you?

MegaBLAST & Word Size

Trade-off: sensitivity vs speed

megablast word size1

blastp

3

2

WORD SIZE

default

minimum

blastn

11

7

megablast

28

8

MegaBLAST & Word Size

Trade-off: sensitivity vs speed

discontiguous megablast
Discontiguous Megablast
  • Uses discontiguous word matches
  • Better for cross-species comparisons
templates for discontiguous words
Templates for Discontiguous Words

W = 11, t = 16, coding: 1101101101101101

W = 11, t = 16, non-coding: 1110010110110111

W = 12, t = 16, coding: 1111101101101101

W = 12, t = 16, non-coding: 1110110110110111

W = 11, t = 18, coding: 101101100101101101

W = 11, t = 18, non-coding: 111010010110010111

W = 12, t = 18, coding: 101101101101101101

W = 12, t = 18, non-coding: 111010110010110111

W = 11, t = 21, coding: 100101100101100101101

W = 11, t = 21, non-coding: 111010010100010010111

W = 12, t = 21, coding: 100101101101100101101

W = 12, t = 21, non-coding: 111010010110010010111

W = word size; # matches in template

t = template length

Reference: Ma, B, Tromp, J, Li, M. PatternHunter: faster and more sensitive homology search. Bioinformatics March, 2002; 18(3):440-5

megablast vs discontiguous megablast
MegaBLAST vs Discontiguous MegaBLAST

Homo sapiens cytochrome P450, family 3, subfamily A, polypeptide 4 (CYP3A4), transcript variant 1, mRNA (2768 letters)

NM_017460

vs Drosophila

megablast vs discontiguous megablast1
MegaBLAST vs Discontiguous MegaBLAST
  • MegaBLAST = “No significant similarity found.”
  • Discontiguous megaBLAST =
another example
Another Example . . .

Query: NM_078651

Drosophila melanogaster CG18582-PA (mbt) mRNA, (3244 bp)

/note= mushroom bodies tiny; synonyms: Pak2, STE20, dPAK2

Database: nr (nt), Mammalia[orgn]

  • MegaBLAST = “No significant similarity found.”
  • Discontiguous megaBLAST = numerous hits . . .
psi blast

PSI-BLAST

Position-specific Iterated BLAST

Example: Confirming relationships of purine

nucleotide metabolism proteins

psi blast1

>gi|113340|sp|P03958|ADA_MOUSE ADENOSINE DEAMINASE (ADENOSINE

MAQTPAFNKPKVELHVHLDGAIKPETILYFGKKRGIALPADTVEELRNIIGMDKPLSLPGF

VIAGCREAIKRIAYEFVEMKAKEGVVYVEVRYSPHLLANSKVDPMPWNQTEGDVTPDDVVD

EQAFGIKVRSILCCMRHQPSWSLEVLELCKKYNQKTVVAMDLAGDETIEGSSLFPGHVEAY

RTVHAGEVGSPEVVREAVDILKTERVGHGYHTIEDEALYNRLLKENMHFEVCPWSSYLTGA

VRFKNDKANYSLNTDDPLIFKSTLDTDYQMTKKDMGFTEEEFKRLNINAAKSSFLPEEEKK

0.005

PSI-BLAST

E value cutoff for PSSM

results initial blastp
RESULTS: Initial BLASTP

Same results as protein-protein BLAST; different format

results of first pssm search
Results of First PSSM Search

Other purine nucleotide metabolizing enzymes not found by ordinary BLAST

tenth pssm search convergence

Just below threshold, another

nucleotide metabolism enzyme

Check to add to PSSM

Tenth PSSM Search: Convergence
phi blast

>gi|231729|sp|P30429|CED4_CAEEL CELL DEATH PROTEIN 4

MLCEIECRALSTAHTRLIHDFEPRDALTYLEGKNIFTEDHSELISKMSTRLERIANFLRIYRRQASE

LIDFFNYNNQSHLADFLEDYIDFAINEPDLLRPVVIAPQFSRQMLDRKLLLGNVPKQMTCYIREYHV

IKKLDEMCDLDSFFLFLHGRAGSGKSVIASQALSKSDQLIGINYDSIVWLKDSGTAPKSTFDLFTDI

LKSEDDLLNFPSVEHVTSVVLKRMICNALIDRPNTLFVFDDVVQEETIRWAQELRLRCLVTTRDVEI

ASQTCEFIEVTSLEIDECYDFLEAYGMPMPVGEKEEDVLNKTIELSSGNPATLMMFFKSCEPKTFEK

[GA]xxxxGK[ST]

PHI-BLAST
example search pathways hemochromatosis

Genome

BLAST

sequence search

Map Viewer

OMIM

Gene

SNP

text search

Protein

Domains

Example Search Pathways: Hemochromatosis

Gene

nucleotide sequence

“hemochromatosis”

HFE

example human genome blast

Human EST

TGCCTCCTTTGGTGAAGGTGACACATCATGTGACCTCTTCAGTGACCACTCTACGGTGTCGGGCCTTGAACTACTACCCCCAGAAC

ATCACCATGAAGTGGCTGAAGGATAAGCAGCCAATGGATGCCAAGGAGTTCGAACCTAAAGACGTATTGCCCAATGGGGATGGGAC

CTACCAGGGCTGGATAACCTTGGCTGTACCCCCTGGGGAAGAGC

Example: Human Genome BLAST
blast databases
BLAST Databases

Nucleotide

  • refseq_rna = NM_*, XM_*
  • refseq_genomic = NC_*, NG_*
  • env_nt
    • environmental sample[filter], e.g., 16S rRNA

Protein

  • refseq = NP_*, XP_*
  • env_nr
new formatter
New Formatter

Select lower case

Select red

new formatter1
New Formatter
  • gray line = same database hit
  • hsp’s color-coded independently
blast output alignments filter
BLAST Output: Alignments & Filter

low complexity sequence filtered

advanced options
Advanced Options

Limit to Organism

all[filter] NOT ma

Example Entrez Queries

all[Filter] NOT mammalia[Organism]

ray finned fishes[Organism]

srcdb refseq[Properties]

Nucleotide only:

biomol mrna[Properties]

biomol genomic[Properties]

OtherAdvanced

–e 10000 expect value

-v 2000 descriptions

-b 2000 alignments

-e 10000 -v 2000

searching by structure
Searching by Structure
  • Why search for similar structures?
  • Find homologs with low sequence similarity
  • Explore protein evolution: similar protein folds can support different functions
  • Identify conserved core elements to model related proteins ofunknown structure
indexing into mmdb

Import only experimentally determined structures

  • Convert to ASN.1
  • Verify sequences
  • Create “backbone” model (Cα, P only)
  • Create single-conformer model

Add secondary structure

Add chemical bonds

id 1 ,

name "helix 1" ,

type helix ,

location

subgraph

residues

interval {

{ molecule-id 1 ,

from 49 ,

to 61 } } } ,

inter-residue-bonds {

{

atom-id-1 {

molecule-id 1 ,

residue-id 1 ,

atom-id 1 } ,

atom-id-2 {

molecule-id 1 ,

residue-id 2 ,

atom-id 9 } } ,

Indexing into MMDB

MMDBMolecular Modeling Data Base

Structure

structure summary
Structure Summary

Structure Neighbors

3D Domain Neighbors

Conserved Domains

vast alignment

IL-4 &

Leptin

VAST: Alignment

4

For each protein chain,

2

locate SSEs (secondary

structure elements),

5

6

represent SSEs as

individual vectors,

1

3

align the vectors.

Human IL-4

slide86
VAST

Taq DNA polymerase

Structure neighbors

slide88
VAST

Vector Alignment Search Tool

3D Domain structure neighbors

vast results for domain 1
VAST Results for Domain 1

Not found with Chain query!

submit file to pdb
submit file to PDB

Best way to convert PDB files to MMDB format

for viewing with Cn3D!

example mapping oligos onto a genome
Example: Mapping Oligos Onto a Genome

?

>forward

CCATGGCGACCCTGGAAAAGC

>reverse

CAGCAGCGGCTGTGCCTGCGG

?

?

map oligos onto genome

forward primer

reverse primer

Map Oligos Onto Genome

>CCATGGCGACCCTGGAAAAGCNNNNNNNNNNCAGCAGCGGCTGTGCCTGCGG

-W 7 –e 1000

primer alignments
Primer Alignments

reverse primer

forward primer

sequence view sv
Sequence View (sv)

forward

reverse

service addresses
Service Addresses
  • BLASTblast-help@ncbi.nlm.nih.gov
  • General Helpinfo@ncbi.nlm.nih.gov
  • Wayne Mattenmatten@ncbi.nlm.nih.gov