
String Matching

Explore the definition of string matching and its various aspects, including exact matching, approximate matching, dynamic programming, sequence alignment, and probabilistic search in bioinformatics. Discover the data structures and algorithms used for efficient string matching.

marshl

Presentation Transcript


  1. String Matching
     String matching: how the problem (text, pattern) is approached depends on what is available in advance, the text or the patterns (|p| = pattern length, |Σ| = alphabet size).
     • Exact matching:
       • The patterns ---> data structures for the patterns
         • 1 pattern ---> the algorithm depends on |p| and |Σ|
         • k patterns ---> the algorithm depends on k, |p| and |Σ|
         • Extensions
         • Regular expressions
       • The text ---> data structure for the text (suffix tree, ...)
     • Approximate matching:
       • Dynamic programming
       • Sequence alignment (pairwise and multiple)
       • Sequence assembly: hash algorithm
       • Probabilistic search: Hidden Markov Models
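For the "1 pattern" case of exact matching, one classical choice is the Knuth-Morris-Pratt (KMP) algorithm. The slides do not name a specific algorithm, so this is only an illustrative sketch: KMP preprocesses the pattern in O(|p|) time and scans the text in O(|text|) time, independently of |Σ| (alphabet-dependent methods such as Boyer-Moore are not shown). The function names `prefix_function` and `kmp_search` are hypothetical.

```python
def prefix_function(p: str) -> list[int]:
    """pi[i] = length of the longest proper prefix of p[:i+1] that is also its suffix."""
    pi = [0] * len(p)
    k = 0
    for i in range(1, len(p)):
        while k > 0 and p[i] != p[k]:
            k = pi[k - 1]  # fall back to the next shorter border
        if p[i] == p[k]:
            k += 1
        pi[i] = k
    return pi


def kmp_search(text: str, p: str) -> list[int]:
    """Return the 0-based start positions of every occurrence of p in text."""
    if not p:
        return list(range(len(text) + 1))
    pi = prefix_function(p)
    hits = []
    k = 0  # number of pattern characters currently matched
    for i, c in enumerate(text):
        while k > 0 and c != p[k]:
            k = pi[k - 1]
        if c == p[k]:
            k += 1
        if k == len(p):
            hits.append(i - len(p) + 1)
            k = pi[k - 1]  # keep going so overlapping occurrences are found
    return hits


print(kmp_search("CTACTACT", "ACT"))  # → [2, 5]
```

Because the text pointer never moves backwards, each text character is examined a bounded number of times (amortized), which is what makes the search linear.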

  2. Bioinformatics: pairwise and multiple alignment

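  3. Pairwise alignment
     The score of two prefixes is computed from three smaller subproblems: the last characters are aligned with each other (here C against C, a match), or the last character of either string is aligned against a gap (an indel).
     Similarity (match = 1, mismatch = -1, indel = -2), maximized:
     $$s(\mathtt{AC}, \mathtt{CTAC}) = \max\{\, s(\mathtt{A}, \mathtt{CTAC}) - 2,\; s(\mathtt{A}, \mathtt{CTA}) + 1,\; s(\mathtt{AC}, \mathtt{CTA}) - 2 \,\}$$
     Edit distance (match = 0, mismatch = 1, indel = 1), minimized:
     $$d(\mathtt{AC}, \mathtt{CTAC}) = \min\{\, d(\mathtt{A}, \mathtt{CTAC}) + 1,\; d(\mathtt{A}, \mathtt{CTA}) + 0,\; d(\mathtt{AC}, \mathtt{CTA}) + 1 \,\}$$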
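The recurrences above fill a table indexed by prefix lengths. A minimal sketch, assuming plain global alignment with a linear indel cost and the two scoring schemes from this slide; `align_score`, `edit_distance` and `similarity` are illustrative names, not taken from the slides.

```python
def align_score(a: str, b: str, match: int, mismatch: int, indel: int, best=max) -> int:
    """Global alignment DP: table[i][j] is the optimal score of a[:i] against b[:j]."""
    n, m = len(a), len(b)
    table = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        table[i][0] = i * indel  # a[:i] aligned against gaps only
    for j in range(1, m + 1):
        table[0][j] = j * indel  # b[:j] aligned against gaps only
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            diag = match if a[i - 1] == b[j - 1] else mismatch
            table[i][j] = best(
                table[i - 1][j] + indel,     # a[i-1] against a gap
                table[i - 1][j - 1] + diag,  # a[i-1] against b[j-1]
                table[i][j - 1] + indel,     # b[j-1] against a gap
            )
    return table[n][m]


def edit_distance(a: str, b: str) -> int:
    # match = 0, mismatch = 1, indel = 1, minimized
    return align_score(a, b, match=0, mismatch=1, indel=1, best=min)


def similarity(a: str, b: str) -> int:
    # match = 1, mismatch = -1, indel = -2, maximized
    return align_score(a, b, match=1, mismatch=-1, indel=-2, best=max)


print(edit_distance("AC", "CTAC"))  # → 2
print(similarity("AC", "CTAC"))     # → -2
```

For the example, the three terms of the similarity recurrence evaluate to -5 - 2, -3 + 1 and -4 - 2, so the maximum is -2 (the alignment --AC / CTAC). Both measures share one routine; only the costs and the choice of `min` or `max` differ.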

  4. Pairwise alignment
     Connect to http://alggen.lsi.upc.es and follow the links TEACHING, EMBER and LePA.

  5. Multiple alignment

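  6. Pairwise to multiple alignment
     [Figure: dynamic-programming table indexed by the strings S1, S2 and S3]
     What happens with three strings? Let n be their length; then the table has $O(n^3)$ cells, and each cell combines “$O(2^3)$” predecessor cells (each string either contributes its next character or a gap) with “$O(3^2)$” pairwise scores for the new column.
     And with k strings? $O(n^k\, 2^k\, k^2)$.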
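To make the counting concrete, here is a hedged sketch of exact three-string alignment by dynamic programming. It assumes a sum-of-pairs column score built from the edit-distance costs of slide 3, with a gap aligned to a gap costing 0 (our assumption); `msa3_cost`, `sp_column` and `pair_cost` are hypothetical names. The loop visits O(n^3) cells, tries 2^3 - 1 = 7 moves per cell, and scores each candidate column with 3 pairwise comparisons.

```python
from itertools import product


def pair_cost(x: str, y: str) -> int:
    """Edit-distance costs; gap against gap costs 0 (an assumption of this sketch)."""
    if x == "-" and y == "-":
        return 0
    if x == "-" or y == "-":
        return 1  # indel
    return 0 if x == y else 1  # match / mismatch


def sp_column(col: list[str]) -> int:
    """Sum-of-pairs cost of one alignment column: O(k^2) pairs for k strings."""
    return sum(pair_cost(col[p], col[q])
               for p in range(len(col)) for q in range(p + 1, len(col)))


def msa3_cost(s1: str, s2: str, s3: str) -> int:
    """Minimum sum-of-pairs cost over a (|s1|+1) x (|s2|+1) x (|s3|+1) table."""
    seqs = (s1, s2, s3)
    dims = [len(s) + 1 for s in seqs]
    D = {}
    # 2^3 - 1 = 7 moves: 1 = the string advances one character, 0 = it gets a gap
    moves = [m for m in product((0, 1), repeat=3) if any(m)]
    # product() yields cells in lexicographic order, so every predecessor is already filled
    for cell in product(*(range(d) for d in dims)):
        if cell == (0, 0, 0):
            D[cell] = 0
            continue
        best = float("inf")
        for m in moves:
            prev = tuple(c - d for c, d in zip(cell, m))
            if min(prev) < 0:
                continue
            col = [seqs[t][cell[t] - 1] if m[t] else "-" for t in range(3)]
            best = min(best, D[prev] + sp_column(col))
        D[cell] = best
    return D[tuple(d - 1 for d in dims)]


print(msa3_cost("ACA", "ACA", "AA"))  # → 2
```

Generalizing the move set to `product((0, 1), repeat=k)` gives the $O(n^k\, 2^k\, k^2)$ behaviour stated on the slide, which is why practical multiple-alignment programs fall back on heuristics.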

  7. Multiple alignment
     Multiple-alignment programs use different heuristics:
     • Clustal (progressive alignment) http://www.ebi.ac.uk/clustalw
     • T-Coffee (progressive alignment + databases) http://igs-server.cnrs-mrs.fr/Tcoffee_cgi/index.cgi
     • HMM (Hidden Markov Models)

  8. Multiple alignment Connect to http://alggen.lsi.upc.es/ and follow the links TEACHING EMBER.
