PowerPoint Slideshow about 'Sources Page & Holmes' - nitesh


Presentation Transcript
slide2
Sources
  • Page & Holmes
  • Vladimir Likic presentation: http://science.marshall.edu/murraye/Clearer%20Matrix%20slide%20show.pdf
  • Wikipedia
  • Lecture at: http://cs.njit.edu/usman/courses/bnfo601_fall08/AffineGap.pdf
slide3
Homoplasy – structural or DNA resemblance due to parallelism or convergent evolution rather than to common ancestry
slide5
Problem: which base positions share common descent?

agtggtcttgctacattgctagctaaatcgatcatgatcgatgattcagg

tagctaaatcgatcatgatcgatgattcaggcgatgtcatgactgatcag

tacattgctagctaaatcgatcatgatcgatgattcaggcgatgtcatga

gatcatgatcgatgattcaggcgatgtcatgactgatcagggatgatgat

Alignment – residue-to-residue correspondence between 2 or more sequences such that the order of residues in each sequence is preserved.

agtggtcttgctacattgctagctaaatcgatcatgatcgatgattcagg
                   tagctaaatcgatcatgatcgatgattcaggcgatgtcatgactgatcag
           tacattgctagctaaatcgatcatgatcgatgattcaggcgatgtcatga
                             gatcatgatcgatgattcaggcgatgtcatgactgatcagggatgatgat

Indels (insertions and deletions) make alignment trickier

agtggtcttgctacattgctagctaaatcgatcatgatcgatgattcagg
                   tagctaaatcgatcatgatcgatgattcaggcgatgtcatgactgatcag
           tacattgctagctaaa----tcatgatcgatgattcaggcgatgtcatga
                             gatcatgatcgatgattcaggcgat------actgatcagggatgatgat

slide6
Alignment problems (examples)

1) different sequence reads of the same allele from the same locus within the same individual

2) sequences of different alleles from the same locus within the same individual

3) same locus from different individuals

Assembly – (from Ensembl) - When the genome of a species is to be sequenced, the chromosomes from many cells are broken at random positions into small fragments, which are sequenced and then reassembled into long sequences (contigs). Contigs may be assembled into longer sequences called scaffolds and sometimes, if the depth of sequencing is high enough, there may be enough information to assemble most of the scaffolds into chromosomes. The resulting collection of sequences after assembly is called a genome assembly.
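A toy version of the reassembly step: a greedy merger that repeatedly joins the two fragments with the longest suffix-prefix overlap, applied to the four fragments shown earlier. This is an illustrative sketch only; the names `overlap` and `greedy_assemble` and the `min_len` cutoff are hypothetical, and it ignores sequencing errors, repeats and reverse complements, all of which real assemblers must handle.

```python
from itertools import permutations


def overlap(a: str, b: str, min_len: int) -> int:
    """Length of the longest proper suffix of `a` that equals a prefix of `b`."""
    for k in range(min(len(a), len(b)) - 1, min_len - 1, -1):
        if a[-k:] == b[:k]:
            return k
    return 0


def greedy_assemble(reads: list[str], min_len: int = 5) -> list[str]:
    """Repeatedly merge the pair of fragments with the longest overlap.

    Returns the resulting contigs (a single string if everything joined up).
    """
    reads = list(dict.fromkeys(reads))  # drop exact duplicates, keep order
    # a fragment lying entirely inside another adds no new sequence
    reads = [r for r in reads if not any(r != o and r in o for o in reads)]
    while len(reads) > 1:
        best_k, best_pair = 0, None
        for i, j in permutations(range(len(reads)), 2):
            k = overlap(reads[i], reads[j], min_len)
            if k > best_k:
                best_k, best_pair = k, (i, j)
        if best_pair is None:
            break  # no overlaps left: remaining fragments stay separate contigs
        i, j = best_pair
        merged = reads[i] + reads[j][best_k:]
        reads = [r for idx, r in enumerate(reads) if idx not in (i, j)] + [merged]
    return reads


if __name__ == "__main__":
    fragments = [
        "agtggtcttgctacattgctagctaaatcgatcatgatcgatgattcagg",
        "tagctaaatcgatcatgatcgatgattcaggcgatgtcatgactgatcag",
        "tacattgctagctaaatcgatcatgatcgatgattcaggcgatgtcatga",
        "gatcatgatcgatgattcaggcgatgtcatgactgatcagggatgatgat",
    ]
    print(greedy_assemble(fragments))
```

Greedy merging is the simplest overlap strategy to follow by hand; it can make wrong joins in repetitive sequence, which is one reason production assemblers use graph-based methods instead.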

slide7
Alignment Methods
  • Dot plot – qualitative
  • Sequence alignment – quantitative; constructing the best alignment using a scoring scheme
  • Types of Alignment
    • Global – best alignment over the entire length
    • Local – best alignment of a subregion; useful when the sequences differ in length or share only a region of similarity
    • Multiple – aligning more than two sequences at once

cagcacttggattctgg & cagcgtgg

Local

cagca-cttggattctgg
---cagcgtgg-------

Global (best depending on gap penalties)

cagcacttggattctgg
cagc----g-t----gg
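A dot plot puts one sequence along each axis and marks every cell where the residues match, so similar regions show up as diagonal runs. A minimal sketch, assuming a simple sliding-window filter: the `window` and `threshold` parameters and the helper names are illustrative, not from the slides. A cell is marked when at least `threshold` of the `window` consecutive residue pairs match, which suppresses isolated single-base hits.

```python
def dot_plot(s1: str, s2: str, window: int = 1, threshold: int = 1) -> list[list[bool]]:
    """Return a grid: rows follow s2, columns follow s1.

    Cell (i, j) is True when at least `threshold` of the `window` residue
    pairs starting at s2[i] and s1[j] are identical.
    """
    grid = []
    for i in range(len(s2) - window + 1):
        row = []
        for j in range(len(s1) - window + 1):
            matches = sum(s1[j + k] == s2[i + k] for k in range(window))
            row.append(matches >= threshold)
        grid.append(row)
    return grid


def render(s1: str, s2: str, grid: list[list[bool]]) -> str:
    """Draw the grid as text: '*' for a dot, '.' otherwise."""
    lines = ["  " + s1]
    for ch, row in zip(s2, grid):
        lines.append(ch + " " + "".join("*" if hit else "." for hit in row))
    return "\n".join(lines)


if __name__ == "__main__":
    a, b = "cagcacttggattctgg", "cagcgtgg"
    print(render(a, b, dot_plot(a, b)))                          # every identical pair
    print(render(a, b, dot_plot(a, b, window=3, threshold=3)))   # 3-base runs only
```

The plot is read by eye, which is why it counts as a qualitative method: it shows where similarity lies but does not produce a score.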

slide8
Gaps
  • A gap is a residue-to-nothing match; gaps can be inserted in either sequence
  • Gaps are not part of the DNA sequence, only a construct for alignment
  • A gap-to-gap match is meaningless and not allowed
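The gap rules above can be turned into a quick consistency check for a pairwise alignment. A minimal sketch, assuming '-' is the gap symbol; `is_valid_alignment` is a hypothetical helper name.

```python
def is_valid_alignment(row1: str, row2: str, seq1: str, seq2: str, gap: str = "-") -> bool:
    """Check a pairwise alignment (two gapped rows) against the original sequences."""
    if len(row1) != len(row2):
        return False  # every column pairs one symbol from each row
    # gaps are not part of the sequences: deleting them must give back the originals
    if row1.replace(gap, "") != seq1 or row2.replace(gap, "") != seq2:
        return False
    # a gap-to-gap column is meaningless and not allowed
    return not any(a == gap and b == gap for a, b in zip(row1, row2))


if __name__ == "__main__":
    print(is_valid_alignment("atggcgt", "atg-agt", "atggcgt", "atgagt"))    # True
    print(is_valid_alignment("atg-gcgt", "atg--agt", "atggcgt", "atgagt"))  # False: gap-gap column
```

Recovering each original sequence after removing gaps also guarantees that residue order is preserved, as the definition of an alignment requires.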
slide10
Alignment with scoring schemes
  • A score is used to select the best possible alignment under a given scoring scheme
  • Scoring scheme
    • A set of rules that assigns a score to a particular alignment between two sequences
    • The goal is to maximize the score
    • The score is the sum of residue substitution scores and gap penalties
slide11
+1 for match

-1 for mismatch

No gap penalty

atggcgt   +1+1+1-1+1+1 = 4
atg-agt

atggcgt   +1-1+1-1+1+1 = 2
a-tgagt

Substitution matrix:

     c    t    a    g
c    1   -1   -1   -1
t   -1    1   -1   -1
a   -1   -1    1   -1
g   -1   -1   -1    1
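The column-by-column arithmetic above can be reproduced with a short function. A minimal sketch, assuming '-' marks a gap and, as on this slide, gap columns score 0; the `MATRIX` dictionary mirrors the table above and `score_alignment` is a hypothetical name.

```python
BASES = "ctag"
# +1 on the diagonal (match), -1 everywhere else (mismatch), as in the table above
MATRIX = {(x, y): (1 if x == y else -1) for x in BASES for y in BASES}


def score_alignment(top: str, bottom: str, matrix: dict = MATRIX, gap: int = 0) -> int:
    """Sum matrix scores over residue columns; add `gap` for each gap column."""
    if len(top) != len(bottom):
        raise ValueError("aligned rows must have equal length")
    total = 0
    for x, y in zip(top, bottom):
        total += gap if "-" in (x, y) else matrix[(x, y)]
    return total


if __name__ == "__main__":
    print(score_alignment("atggcgt", "atg-agt"))  # 4
    print(score_alignment("atggcgt", "a-tgagt"))  # 2
```

Passing a different dictionary or a nonzero `gap` changes the scoring scheme without touching the summing logic.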

slide12
What if we want to penalize transitions (a↔g, c↔t: purine to purine or pyrimidine to pyrimidine) less than transversions (purine↔pyrimidine)?

Substitution matrix:

     c    t    a    g
c    2    1   -1   -1
t    1    2   -1   -1
a   -1   -1    2    1
g   -1   -1    1    2

slide13
Protein substitution matrices
  • More complex than DNA scoring matrices.
    • Proteins are composed of twenty amino acids, and physical-chemical properties of individual amino acids vary considerably.
    • Can be based on any property of amino acids: size, polarity, charge, hydrophobicity.
    • Evolutionary substitution matrices – empirically derived from the observed frequencies of substitutions at particular levels of divergence
slide14
Evolutionary substitution matrices
  • PAM ("point accepted mutation") family: PAM250, PAM120, etc.
  • BLOSUM ("Blocks substitution matrix") family: BLOSUM62, BLOSUM50, etc.
  • The BLOSUM matrices were developed more recently and are generally considered to perform better.
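Both families are log-odds matrices: each entry compares how often two residues are observed aligned in trusted alignments with how often they would pair by chance given their background frequencies. A minimal sketch of that calculation, assuming a hypothetical list of aligned residue pairs and the half-bit units of BLOSUM62 (scale = 2); it omits BLOSUM's clustering of similar sequences and PAM's extrapolation to larger evolutionary distances, and `log_odds_matrix` is a hypothetical name.

```python
import math
from collections import Counter


def log_odds_matrix(aligned_pairs, scale: float = 2.0) -> dict:
    """Score s(x, y) = round(scale * log2(q_xy / (p_x * p_y))).

    q_xy: observed frequency of x aligned with y (counted in both orders);
    p_x:  background frequency of residue x in the same data.
    Pairs never observed are left out (their log-odds would be -infinity).
    """
    counts = Counter()
    for x, y in aligned_pairs:
        counts[(x, y)] += 1
        counts[(y, x)] += 1  # an aligned pair has no direction
    total = sum(counts.values())
    q = {pair: n / total for pair, n in counts.items()}
    p = Counter()
    for (x, _), freq in q.items():
        p[x] += freq  # marginal (background) frequency of each residue
    return {(x, y): round(scale * math.log2(f / (p[x] * p[y]))) for (x, y), f in q.items()}


if __name__ == "__main__":
    # Hypothetical toy data: columns observed in some trusted alignments.
    pairs = ([("L", "L")] * 4 + [("I", "I")] * 4 + [("D", "D")] * 4
             + [("L", "I")] * 2 + [("L", "D")])
    matrix = log_odds_matrix(pairs)
    for pair in [("L", "L"), ("L", "I"), ("L", "D")]:
        print(pair, matrix[pair])
```

Positive scores mean a pair is seen more often than chance predicts, negative scores less often; real matrices are built from many thousands of aligned columns.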
slide15
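BLOSUM62

BLOSUM80 is used for less divergent sequences

BLOSUM45 is used for more divergent sequences

Etc. (the number is the percent-identity cutoff used to cluster the sequence blocks the matrix was built from, so higher numbers suit more closely related sequences)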

slide16
Gaps
  • Because indels often result in radical protein changes (frameshifts, premature stop codons), the penalty for a gap is usually several times greater than the penalty for a substitution.
  • Once a gap has been opened, extending it by more residues is usually cheaper than opening a completely new gap; in other words, gap opening penalties and gap extension penalties are often defined separately
slide17
Affine gap penalty function W(i)

W(i) = g + h·i

(for i ≥ 1, where i is the gap length)

  • g: gap opening penalty
  • h: gap extension penalty
  • The ratio between g and h determines the relative weight for opening versus extension
    • Small g, large h: gap length more important
    • Large g, small h: gap length less important
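Because g is charged once per gap and h once per gap position, the total penalty for k separate gaps of lengths i_1, …, i_k depends only on the number of gaps and their combined length L (k and L are notation introduced here):

```latex
\sum_{j=1}^{k} W(i_j) \;=\; \sum_{j=1}^{k} \left(g + h\, i_j\right) \;=\; k\,g + h\,L,
\qquad L = \sum_{j=1}^{k} i_j
```

For a fixed total gap length L, every additional gap costs one more g, while h sets the price of total gap length. For example, with g = -3 and h = -1, one gap of length 7 contributes -3 - 7 = -10, whereas three gaps totalling 7 positions contribute 3(-3) - 7 = -16.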

slide18
Substitution matrix:

     c    t    a    g
c    2    1   -1   -1
t    1    2   -1   -1
a   -1   -1    2    1
g   -1   -1    1    2

W(i) = g + h·i

g = -3

h = -1

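ATGTAGTGTATAGTACATGCA
ATGTAG-------TACATGCA

ATGTAGTGTATAGTACATGCA
ATGTA--G--TA---CATGCA

28 - 3 - 1(7) = 18 (14 matches × 2 = 28; one gap of length 7)

28 - 3(3) - 1(7) = 12 (14 matches × 2 = 28; three gaps, 7 gap positions in total)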
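The two totals can be checked with a short scorer. A minimal sketch, assuming uppercase DNA, the transition/transversion matrix above written as a rule (2 for identity, 1 for a transition, -1 for a transversion), and W(i) = g + h·i charged as g + h on the first column of a gap and h on each further column of the same gap; `subst` and `affine_score` are hypothetical names.

```python
PURINES, PYRIMIDINES = set("AG"), set("CT")


def subst(x: str, y: str) -> int:
    """Transition/transversion matrix: 2 identity, 1 transition, -1 transversion."""
    if x == y:
        return 2
    if {x, y} <= PURINES or {x, y} <= PYRIMIDINES:
        return 1  # transition: A<->G or C<->T
    return -1     # transversion: purine <-> pyrimidine


def affine_score(top: str, bottom: str, g: int = -3, h: int = -1) -> int:
    """Sum substitution scores plus W(i) = g + h*i for every gap of length i."""
    if len(top) != len(bottom):
        raise ValueError("aligned rows must have equal length")
    score = 0
    open_gap = None  # row holding the current gap ("top" or "bottom"), or None
    for x, y in zip(top.upper(), bottom.upper()):
        if x == "-" or y == "-":
            row = "top" if x == "-" else "bottom"
            # first column of a new gap pays g + h; each further column pays h
            score += h if open_gap == row else g + h
            open_gap = row
        else:
            score += subst(x, y)
            open_gap = None
    return score


if __name__ == "__main__":
    ref = "ATGTAGTGTATAGTACATGCA"
    print(affine_score(ref, "ATGTAG-------TACATGCA"))  # 18: one gap of length 7
    print(affine_score(ref, "ATGTA--G--TA---CATGCA"))  # 12: three gaps, 7 positions
```

Tracking which row holds the open gap means a gap in one sequence followed directly by a gap in the other is treated as two separate gaps, each paying the opening penalty.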

slide19
How do we find the best alignment?

Brute-force approach:

Generate a list of all possible alignments between two sequences, score each one, and select the alignment with the best score

The number of possible global alignments between two sequences of length N is C(2N, N) = (2N)! / (N!)² ≈ 2^(2N) / √(πN)

For two sequences of 250 residues this is ~10^149
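The figure can be reproduced under one common counting convention: an alignment is identified by which residue pairs it places in the same column (the order of adjacent gaps is not distinguished). Choosing k residues from each sequence to pair up, in order, gives the sum over k of C(N, k)², which equals C(2N, N). A small check, assuming Python 3.8+ for `math.comb`; `count_alignments` is a hypothetical helper.

```python
import math


def count_alignments(n: int, m: int) -> int:
    """Alignments of lengths n and m, counted by their set of aligned residue pairs.

    Pick k positions in each sequence to align with each other; because an
    alignment preserves residue order, each choice gives exactly one alignment.
    """
    return sum(math.comb(n, k) * math.comb(m, k) for k in range(min(n, m) + 1))


if __name__ == "__main__":
    for n in (1, 2, 3, 10):
        assert count_alignments(n, n) == math.comb(2 * n, n)
    big = math.comb(500, 250)
    print(f"N = 250: about 10^{len(str(big)) - 1} alignments")
```

Even at a billion alignments scored per second, enumerating this many is hopeless, which motivates the dynamic-programming methods that follow.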

slide20
Needleman-Wunsch (global alignment) and Smith-Waterman (local alignment) are both algorithms that find the best alignment by breaking the problem down into subproblems using dynamic programming

…however, it is only the best based on the scoring matrix and the gap opening and extension penalties

These methods are computationally expensive: filling the dynamic-programming table takes time proportional to the product of the two sequence lengths
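A compact sketch of both algorithms in one function, assuming a linear gap penalty (a fixed cost per gap position rather than the affine scheme above) to keep it short; `align` and its default scores are illustrative choices, not from the slides. Global alignment penalizes leading gaps and traces back from the bottom-right cell; local alignment clamps scores at zero and traces back from the highest-scoring cell.

```python
def align(s1: str, s2: str, match: int = 1, mismatch: int = -1,
          gap: int = -2, local: bool = False) -> tuple[int, str, str]:
    """Needleman-Wunsch (local=False) or Smith-Waterman (local=True).

    Uses a linear gap penalty (`gap` per gap position) and returns
    (score, aligned_s1, aligned_s2) for one optimal alignment.
    """
    def sub(a: str, b: str) -> int:
        return match if a == b else mismatch

    n, m = len(s1), len(s2)
    # F[i][j] = best score aligning s1[:i] with s2[:j]
    # (local: best score of an alignment ending exactly at i, j)
    F = [[0] * (m + 1) for _ in range(n + 1)]
    if not local:  # global: leading gaps are penalized
        for i in range(1, n + 1):
            F[i][0] = i * gap
        for j in range(1, m + 1):
            F[0][j] = j * gap

    best, end = 0, (0, 0)
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            F[i][j] = max(F[i - 1][j - 1] + sub(s1[i - 1], s2[j - 1]),
                          F[i - 1][j] + gap,   # s1[i-1] against a gap
                          F[i][j - 1] + gap)   # s2[j-1] against a gap
            if local:
                F[i][j] = max(F[i][j], 0)      # a local alignment may start anywhere
                if F[i][j] > best:
                    best, end = F[i][j], (i, j)

    i, j = end if local else (n, m)
    score = F[i][j]
    out1, out2 = [], []
    # trace back through the table to recover one optimal alignment
    while i > 0 or j > 0:
        if local and F[i][j] == 0:
            break
        if i > 0 and j > 0 and F[i][j] == F[i - 1][j - 1] + sub(s1[i - 1], s2[j - 1]):
            out1.append(s1[i - 1])
            out2.append(s2[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and F[i][j] == F[i - 1][j] + gap:
            out1.append(s1[i - 1])
            out2.append("-")
            i -= 1
        else:
            out1.append("-")
            out2.append(s2[j - 1])
            j -= 1
    return score, "".join(reversed(out1)), "".join(reversed(out2))


if __name__ == "__main__":
    print(align("cagcacttggattctgg", "cagcgtgg"))              # global
    print(align("cagcacttggattctgg", "cagcgtgg", local=True))  # local
```

Storing the whole table is what makes traceback simple; it also means memory, like time, grows with len(s1) × len(s2).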

slide21
BLAST – Basic Local Alignment Search Tool
  • Tries to find the highest-scoring ungapped local alignment between a query and a database
  • Breaks the query into words of length w and scans the database for words that score at least a threshold T when aligned with a query word
  • Each such hit is then extended in both directions until the score drops more than a set amount below the best score reached so far.
  • Many types of BLAST can be found at
  • http://blast.ncbi.nlm.nih.gov/Blast.cgi
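The seed-and-extend idea can be illustrated with a toy, ungapped version. This sketch assumes DNA with +1/-1 scores and scans every word pair directly (real BLAST uses precomputed lookup tables and, for proteins, a neighbourhood of similar words); it stops extending in a direction once the running score drops more than `xdrop` below the best seen. The names `blast_like_hits` and `extend` and the default values of `w`, `T` and `xdrop` are hypothetical.

```python
def extend(query: str, subject: str, qi: int, si: int, step: int,
           xdrop: int, match: int, mismatch: int) -> tuple[int, int]:
    """Walk from (qi, si) in direction `step` (+1 right, -1 left) without gaps.

    Returns (best score gained, number of residues giving that score).
    """
    run = best = best_len = length = 0
    while 0 <= qi < len(query) and 0 <= si < len(subject):
        run += match if query[qi] == subject[si] else mismatch
        length += 1
        if run > best:
            best, best_len = run, length
        elif best - run > xdrop:
            break  # score fell too far below the best seen so far
        qi += step
        si += step
    return best, best_len


def blast_like_hits(query: str, subject: str, w: int = 4, T: int = 4,
                    xdrop: int = 2, match: int = 1, mismatch: int = -1) -> list:
    """Return (score, query_start, subject_start, length) hits, best first."""
    hits = set()
    for qi in range(len(query) - w + 1):
        for si in range(len(subject) - w + 1):
            word = sum(match if query[qi + k] == subject[si + k] else mismatch
                       for k in range(w))
            if word < T:
                continue  # not a seed
            right, rlen = extend(query, subject, qi + w, si + w, +1, xdrop, match, mismatch)
            left, llen = extend(query, subject, qi - 1, si - 1, -1, xdrop, match, mismatch)
            hits.add((word + left + right, qi - llen, si - llen, w + llen + rlen))
    return sorted(hits, reverse=True)


if __name__ == "__main__":
    print(blast_like_hits("gattcaggcgatg", "ttttgattcaggcgatgcccc")[:3])
```

Seeds on the same diagonal usually extend to the same segment, so hits are collected in a set to drop duplicates.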