
Bioinformatics PhD. Course

Bioinformatics PhD. Course. Summary (approximate). 1. Biological introduction. 2. Comparison of short sequences (<10,000 bp). 3. Comparison of large sequences (up to 250,000,000 bp). 4. Sequence assembly. 5. Efficient data search structures and algorithms. 6. Proteins.

ross-barton

Presentation Transcript


  1. Bioinformatics PhD. Course. Summary (approximate) • 1. Biological introduction • 2. Comparison of short sequences (<10,000 bp) • 3. Comparison of large sequences (up to 250,000,000 bp) • 4. Sequence assembly • 5. Efficient data search structures and algorithms • 6. Proteins...

  2. 2. Comparison of short sequences (<10,000 bp). Summary (more or less) • 2.1 Dot matrix • 2.2 Pairwise alignment • 2.3 Hash algorithms • 2.4 Multiple alignment

  3. 2.1 Dot matrix. Given two sequences, S1 (indexed by x) and S2 (indexed by y), how can we analyse their degree of identity? By searching for the parts that match: cell (x, y) of the matrix holds 1 if both characters coincide and 0 otherwise.

  4. 2.1 Dot matrix. [Figure: dot matrices with S1 along the x axis and S2 along the y axis; a dot marks each cell where both characters coincide (1), all other cells are 0.] Given two sequences, how can we analyse their degree of identity? By searching for the parts that match.
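
The 1/0 rule on this slide can be sketched in a few lines of Python. This is only an illustration, not the course's own code: the function names `dot_matrix` and `render` and the short test strings are assumptions made for the example.

```python
def dot_matrix(s1: str, s2: str) -> list[list[int]]:
    """Return m with m[y][x] = 1 if s2[y] == s1[x], else 0.

    Rows follow S2 (the y axis) and columns follow S1 (the x axis),
    matching the layout described on the slide.
    """
    return [[1 if a == b else 0 for a in s1] for b in s2]


def render(s1: str, s2: str) -> str:
    """Draw the matrix as text: S1 along the top, S2 down the side."""
    rows = ["  " + s1]
    for ch, row in zip(s2, dot_matrix(s1, s2)):
        rows.append(ch + " " + "".join("." if v else " " for v in row))
    return "\n".join(rows)


# Hypothetical short inputs, chosen only to keep the picture small.
print(render("accacc", "acct"))
```

Runs of dots along a diagonal correspond to stretches where the two sequences agree position by position.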

  5. 2.1 Dot matrix. Example: accaccacaccacaacgagcata… against acctgagcgatat. With L = window length: • m(i,j) = 1 iff S1(i..i+L) = S2(j..j+L): exact matching • m(i,j) = 1 iff k out of the L characters coincide: approximate matching • m(i,j) = k iff k out of the L characters coincide: approximate matching. What is the cost of the algorithm? When are the matches relevant?
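
A minimal sketch of the three window rules above, written directly from their definitions. Several details are assumptions made for the example rather than taken from the slides: the window is taken to be exactly L characters (positions i..i+L-1), the threshold rule is read as "at least k matches", and the names `window_matches`, `windowed_dot_matrix` and the `mode` parameter are hypothetical.

```python
def window_matches(s1: str, s2: str, i: int, j: int, L: int) -> int:
    """Count offsets t in 0..L-1 where s1[i+t] == s2[j+t]."""
    return sum(1 for t in range(L) if s1[i + t] == s2[j + t])


def windowed_dot_matrix(s1: str, s2: str, L: int,
                        mode: str = "exact", k: int = 0) -> list[list[int]]:
    """Build m[i][j] for every pair of window starts (i in S1, j in S2).

    mode="exact":     1 iff all L characters coincide
    mode="threshold": 1 iff at least k of the L characters coincide (assumed reading)
    mode="count":     the number of coinciding characters itself
    """
    rows = len(s1) - L + 1
    cols = len(s2) - L + 1
    matrix = []
    for i in range(max(rows, 0)):
        row = []
        for j in range(max(cols, 0)):
            c = window_matches(s1, s2, i, j, L)  # O(L) work per cell
            if mode == "exact":
                row.append(1 if c == L else 0)
            elif mode == "threshold":
                row.append(1 if c >= k else 0)
            elif mode == "count":
                row.append(c)
            else:
                raise ValueError(f"unknown mode: {mode}")
        matrix.append(row)
    return matrix
```

Because each cell scans a full window, this direct version does work proportional to the product of the two sequence lengths and L.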

  6. 2.1. Dot matrix: algorithm cost. Example: accaccacaccacaacgagcata… against acctgagcgatat. • length(S1) · length(S2) · L, in other words O(n² L) • Is a cost of length(S1) · length(S2) possible? • Can we also say that the cost is O(n²), independent of L?
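
One possible answer to the questions on this slide, offered as a sketch rather than as the course's intended solution: moving one step down a diagonal, the window loses one character pair and gains one, so the match count can be updated in constant time instead of being recomputed. The function name `windowed_counts_fast` is hypothetical, and the window is again assumed to be exactly L characters.

```python
def windowed_counts_fast(s1: str, s2: str, L: int) -> list[list[int]]:
    """Match counts for all windows, using a sliding update along diagonals.

    counts[i][j] = number of offsets t in 0..L-1 with s1[i+t] == s2[j+t].
    """
    rows = len(s1) - L + 1
    cols = len(s2) - L + 1
    if rows <= 0 or cols <= 0:
        return []
    counts = [[0] * cols for _ in range(rows)]
    # Every diagonal starts on the top row or in the left column.
    starts = [(0, j) for j in range(cols)] + [(i, 0) for i in range(1, rows)]
    for i, j in starts:
        # Full O(L) scan only for the first window of the diagonal.
        c = sum(s1[i + t] == s2[j + t] for t in range(L))
        counts[i][j] = c
        while i + 1 < rows and j + 1 < cols:
            c -= s1[i] == s2[j]          # pair leaving the window
            c += s1[i + L] == s2[j + L]  # pair entering the window
            i += 1
            j += 1
            counts[i][j] = c
    return counts
```

With n and m the two lengths, the first window of each diagonal costs O(L) and there are about n + m diagonals, while every other cell costs O(1). Since L cannot exceed the shorter sequence, the total stays within O(n · m), so under these assumptions the cost no longer grows with L.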

  7. 2.1. Dot matrix: signals. [Figure panels: A: transposons; B: S1 = S2; C: random.] When are signals statistically significant?

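  8. 2.1. Dot matrix: statistical significance. [Figure: dot matrix of S1 (x axis) against S2 (y axis).] Given L = window length, we need to define a random model against which to compare the signals. We define the random variable X = number of characters that coincide within a window; then Prob(X = k) = comb(L, k) p^k (1 - p)^(L - k), where p is the probability that two characters coincide. What is its expected value?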
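
The binomial model on this slide can be checked numerically. For a binomial variable the expected value is E[X] = L · p, and the sketch below confirms it by summing k · Prob(X = k). The function names are hypothetical, and the values L = 10 and p = 0.25 (four equally likely, independent nucleotides) are assumptions chosen for illustration, not figures from the course.

```python
from math import comb


def window_pmf(L: int, p: float) -> list[float]:
    """Prob(X = k) = comb(L, k) * p^k * (1 - p)^(L - k) for k = 0..L."""
    return [comb(L, k) * p**k * (1 - p) ** (L - k) for k in range(L + 1)]


def expected_matches(L: int, p: float) -> float:
    """E[X], computed from the distribution; equals L * p for a binomial."""
    return sum(k * q for k, q in enumerate(window_pmf(L, p)))


def tail_probability(L: int, p: float, k: int) -> float:
    """Prob(X >= k): chance that a random window shows at least k matches."""
    return sum(window_pmf(L, p)[k:])


# Assumed illustration values: window of 10, uniform independent nucleotides.
L, p = 10, 0.25
print(expected_matches(L, p))      # agrees with L * p up to rounding
print(tail_probability(L, p, 8))   # how rare 8 or more matches are by chance
```

A signal in the dot matrix is a candidate for significance when its match count lies far in the tail of this distribution, that is, when `tail_probability` for that count is small.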
