Compares H to C. They are different. The distance for this comparison is 1.

Presentation Transcript


Compares H to C. They are different. The distance for this comparison is 1.

Looking at the neighboring cells, the best distance so far is 0 (as seen in the top-left cell).

So, if we add this distance of 1 to the best previous distance, this cell gets a value of 1 (which is 0 + 1).
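The rule applied to this cell is the standard dynamic-programming recurrence for Levenshtein (edit) distance: each cell takes the cheapest of the top-left cell plus the comparison cost, or the cell above or to the left plus 1. Below is a minimal Python sketch, assuming unit costs for insertion, deletion, and substitution. The function name `levenshtein` and the example strings are our own and are not taken from the slides. Only the previous row is kept, because each cell depends only on its top-left, top, and left neighbours.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions and
    substitutions needed to turn a into b (all costs assumed to be 1)."""
    # Row 0 of the table: turning "" into b[:j] takes j insertions.
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        # Column 0: turning a[:i] into "" takes i deletions.
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1  # distance for this single comparison
            curr.append(min(
                prev[j - 1] + cost,  # top-left cell: match or substitute
                prev[j] + 1,         # cell above: delete ca
                curr[j - 1] + 1,     # cell to the left: insert cb
            ))
        prev = curr
    return prev[-1]


print(levenshtein("kitten", "sitting"))  # 3
```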


Final distance is 3.


The key idea in Boyer-Moore is that the comparison is done from right to left, starting with the last character in the pattern.
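To make the right-to-left order concrete, here is a small Python helper that checks one alignment of the pattern against the text, starting with the last pattern character. The name `compare_from_right` is hypothetical, and the sketch assumes the whole pattern fits inside the text at position `pos`.

```python
def compare_from_right(pattern: str, text: str, pos: int) -> int:
    """Compare pattern with text[pos:pos + len(pattern)], right to left.

    Returns -1 if all characters match, otherwise the pattern index of the
    first mismatch found (the rightmost mismatching position).
    Assumes pos + len(pattern) <= len(text).
    """
    j = len(pattern) - 1
    while j >= 0 and pattern[j] == text[pos + j]:
        j -= 1
    return j


print(compare_from_right("ABC", "XBC", 0))  # 0: C and B match, then A vs X fails
print(compare_from_right("ABC", "ABC", 0))  # -1: full match
```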


The first comparison is between X and C, which do not match.

But since X does not appear anywhere in the search pattern, we can now rule out a match anywhere in the first 3 characters. So the skip value for X will be initialized to 3, the length of the search pattern.


Again we start from the right by comparing B and C, which again do not match.

However, this time B does occur within the search pattern. The skip value for B will be 1 in order to line up with the last B in the search pattern.


Traditionally, implementations of this algorithm have created a 256-entry table (one entry per possible byte value) to hold the skip value for every possible character.
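The skip values worked out above (the full pattern length for X, 1 for B) match the bad-character table used by Horspool's simplification of Boyer-Moore, the BMH algorithm mentioned later. Below is a minimal Python sketch that fills a 256-entry table, assuming the text and pattern are byte strings. The name `build_skip_table` is hypothetical. The pattern "ABC" is only an assumption consistent with the slides (length 3, ending in C, with B immediately before it), because the full pattern is not shown.

```python
def build_skip_table(pattern: bytes) -> list[int]:
    """Horspool-style skip table: one entry per possible byte value."""
    m = len(pattern)
    # A byte that does not occur in the pattern lets the pattern slide by its full length.
    table = [m] * 256
    # Leave out the last position: otherwise its own skip would be 0 and the search would stall.
    for i in range(m - 1):
        # Later (further right) occurrences overwrite earlier ones, so each entry
        # is the distance from the rightmost occurrence to the end of the pattern.
        table[pattern[i]] = m - 1 - i
    return table


skip = build_skip_table(b"ABC")
print(skip[ord("X")], skip[ord("B")])  # 3 1
```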


Example

pattern = "STING"
string = "A STRING SEARCHING EXAMPLE CONSISTING OF TEXT"

Try to match the first m characters:

STING
A STRING SEARCHING EXAMPLE CONSISTING OF TEXT

This fails. Slide the pattern right to look for other matches. Note that R does not occur in the pattern, so we can slide the pattern past R.


Example

pattern = "STING"
string = "A STRING SEARCHING EXAMPLE CONSISTING OF TEXT"

     STING
A STRING SEARCHING EXAMPLE CONSISTING OF TEXT

Fails again. The rightmost character, S, occurs in the pattern exactly once, so slide until the two S's line up.


Example

pattern = "STING"
string = "A STRING SEARCHING EXAMPLE CONSISTING OF TEXT"

         STING
A STRING SEARCHING EXAMPLE CONSISTING OF TEXT

No C in the pattern. Slide past it.


Example

pattern = "STING"
string = "A STRING SEARCHING EXAMPLE CONSISTING OF TEXT"

              STING
A STRING SEARCHING EXAMPLE CONSISTING OF TEXT

No space in the pattern. Slide past it.


Example

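pattern = "STING"
string = "A STRING SEARCHING EXAMPLE CONSISTING OF TEXT"

                        STING
A STRING SEARCHING EXAMPLE CONSISTING OF TEXT

No O in the pattern. Slide past it. (The step before this one, not pictured, slid past a P in the same way, since P does not occur in the pattern either.)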


Example

pattern = "STING"
string = "A STRING SEARCHING EXAMPLE CONSISTING OF TEXT"

                             STING
A STRING SEARCHING EXAMPLE CONSISTING OF TEXT

The rightmost character is T. There is exactly one T in the pattern, so slide to align them.


Example

pattern = "STING"
string = "A STRING SEARCHING EXAMPLE CONSISTING OF TEXT"

                                STING
A STRING SEARCHING EXAMPLE CONSISTING OF TEXT

Match.
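The walkthrough above can be reproduced with a short Boyer-Moore-Horspool search that records every alignment it tries. This sketch covers only the skip-table (bad-character) rule, not the full Boyer-Moore algorithm with its good-suffix rule. The name `horspool_search` is hypothetical.

```python
def horspool_search(pattern: str, text: str) -> tuple[list[int], int]:
    """Return (alignments tried, index of first match or -1)."""
    m, n = len(pattern), len(text)
    # Skip value for each character in pattern[:-1]: distance from its
    # rightmost occurrence to the end of the pattern. All others skip m.
    skip = {c: m - 1 - i for i, c in enumerate(pattern[:-1])}
    tried = []
    pos = 0
    while pos <= n - m:
        tried.append(pos)
        # Compare right to left, starting with the last pattern character.
        j = m - 1
        while j >= 0 and pattern[j] == text[pos + j]:
            j -= 1
        if j < 0:
            return tried, pos
        # Slide according to the text character under the pattern's last position.
        pos += skip.get(text[pos + m - 1], m)
    return tried, -1


tried, found = horspool_search("STING", "A STRING SEARCHING EXAMPLE CONSISTING OF TEXT")
print(tried, found)  # [0, 5, 9, 14, 19, 24, 29, 32] 32
```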


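  • Complexity: the original algorithm, with both its bad-character and good-suffix rules, needs O(m) preprocessing and then runs in O(n) time in the worst case when the pattern does not occur in the text. The simplified variant that uses only the skip table can take O(nm) time in the worst case.

  • The execution time can actually be sub-linear: it doesn't need to check every character of the string being searched but rather skips over some of them (check the rightmost character of the block of m first; if it is not found in the pattern, the entire rest of the block can be skipped).

  • Best-case performance is O(n/m). In the best case, only one in m characters needs to be checked.

  • It actually works better (on average) with a longer pattern m!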
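The best case above is easy to check by counting character comparisons. The sketch below assumes the same Horspool skip rule as in the worked example and a text made entirely of a character that never occurs in the pattern. The name `count_comparisons` is hypothetical. Here, doubling the pattern length halves the work, which is the sense in which longer patterns help.

```python
def count_comparisons(pattern: str, text: str) -> int:
    """Count character comparisons made by a Horspool scan over the whole text."""
    m, n = len(pattern), len(text)
    skip = {c: m - 1 - i for i, c in enumerate(pattern[:-1])}
    comparisons = 0
    pos = 0
    while pos <= n - m:
        # Right-to-left comparison, counting every character test.
        j = m - 1
        while j >= 0:
            comparisons += 1
            if pattern[j] != text[pos + j]:
                break
            j -= 1
        pos += skip.get(text[pos + m - 1], m)
    return comparisons


text = "x" * 1000
print(count_comparisons("abcde", text))       # 200, i.e. n/m with m = 5
print(count_comparisons("abcdefghij", text))  # 100, i.e. n/m with m = 10
```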


Text Editor, Digital Library and Search Engine:

Every user of a text editor, digital library, or search engine needs to find patterns in text.

The Boyer-Moore algorithm is implemented directly in the search command of many text editors. The longest common subsequence dynamic programming algorithm is implemented in system commands, such as diff, that report differences between files.
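The LCS idea behind file-comparison commands can be sketched in Python as follows. Lines that belong to a longest common subsequence are treated as unchanged, and everything else is reported as deleted or added. This is a minimal O(nm) illustration. The name `lcs` is hypothetical, and production diff tools typically use faster refinements of this idea.

```python
def lcs(a: list[str], b: list[str]) -> list[str]:
    """Return one longest common subsequence of two sequences (e.g. lists of lines)."""
    n, m = len(a), len(b)
    # L[i][j] = length of an LCS of the suffixes a[i:] and b[j:].
    L = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n - 1, -1, -1):
        for j in range(m - 1, -1, -1):
            if a[i] == b[j]:
                L[i][j] = L[i + 1][j + 1] + 1
            else:
                L[i][j] = max(L[i + 1][j], L[i][j + 1])
    # Walk forward through the table to recover one LCS.
    result = []
    i = j = 0
    while i < n and j < m:
        if a[i] == b[j]:
            result.append(a[i])
            i += 1
            j += 1
        elif L[i + 1][j] >= L[i][j + 1]:
            i += 1  # skipping a[i] keeps the LCS length: a[i] is a deletion
        else:
            j += 1  # b[j] is an addition
    return result


old = ["a", "b", "c", "d"]
new = ["a", "c", "d", "e"]
print(lcs(old, new))  # ['a', 'c', 'd']: "b" was deleted, "e" was added
```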


Multimedia and Computational Biology:

Computational biology: in finding a close mutation,

Communications: to adjust for transmission noise,

Texts: to detect common typing errors.

Multimedia: to adjust for lossy compression, occlusions, scaling, affine transformations, or dimension loss.

DNA sequencing: The largest overlap heuristic for finding the shortest common superstring.
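The largest-overlap (greedy) heuristic mentioned above repeatedly merges the two fragments with the longest suffix-prefix overlap until one string remains. Below is a minimal Python sketch with hypothetical function names. Because it is a heuristic, the result is not guaranteed to be the shortest possible superstring. Ties are broken by input order.

```python
def overlap(a: str, b: str) -> int:
    """Length of the longest suffix of a that is also a prefix of b."""
    for k in range(min(len(a), len(b)), 0, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def greedy_superstring(reads: list[str]) -> str:
    # Remove duplicates, then any read contained in another read.
    reads = list(dict.fromkeys(reads))
    reads = [r for r in reads if not any(r != s and r in s for s in reads)]
    while len(reads) > 1:
        # Find the ordered pair (a, b) with the largest overlap.
        best_k, best_i, best_j = -1, 0, 1
        for i, a in enumerate(reads):
            for j, b in enumerate(reads):
                if i != j:
                    k = overlap(a, b)
                    if k > best_k:
                        best_k, best_i, best_j = k, i, j
        merged = reads[best_i] + reads[best_j][best_k:]
        reads = [r for idx, r in enumerate(reads) if idx not in (best_i, best_j)]
        reads.append(merged)
    return reads[0] if reads else ""


fragments = ["ATTAGACCTG", "CCTGCCGGAA", "AGACCTGCCG", "GCCGGAATAC"]
print(greedy_superstring(fragments))  # ATTAGACCTGCCGGAATAC
```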


Medical Tests:

The BMH (Boyer-Moore-Horspool) algorithm achieves the best overall results when used with medical tests. This algorithm usually performs at least twice as fast as the other algorithms tested. The time performance of exact string pattern matching can be greatly improved if an efficient algorithm is used. Considering the growing amount of text handled in the electronic patient record, it is worth implementing this efficient algorithm.


Retrieving Music Pattern from Musical Database:

Retrieving musical notes from a musical database requires string matching. Four similarity techniques can be used for this: edit distance, Dice similarity, Jaccard similarity, and cosine similarity. The musical notes are retrieved with a QBE (query by example) approach. The best scheme for this problem is Levenshtein distance combined with Jaccard similarity, an approximate music search technique, because Jaccard similarity performs excellently on queries when a pitch-change scenario is selected.
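Jaccard and Dice similarity both compare sets of features. The transcript does not say how notes are turned into sets, so the sketch below assumes each melody is a list of note names and compares the sets of consecutive note pairs (bigrams). The function names are hypothetical, and the Levenshtein part of the combined scheme is not shown.

```python
def ngrams(notes: list[str], n: int = 2) -> set[tuple[str, ...]]:
    """Set of consecutive n-note groups in a melody."""
    return {tuple(notes[i:i + n]) for i in range(len(notes) - n + 1)}


def jaccard(a: set, b: set) -> float:
    # |A & B| / |A | B|; two empty sets are treated as identical.
    return len(a & b) / len(a | b) if a or b else 1.0


def dice(a: set, b: set) -> float:
    # 2|A & B| / (|A| + |B|); two empty sets are treated as identical.
    return 2 * len(a & b) / (len(a) + len(b)) if a or b else 1.0


query = ngrams(["C", "D", "E", "F", "G"])
candidate = ngrams(["C", "D", "E", "G", "A"])
print(jaccard(query, candidate))  # about 0.333: 2 shared bigrams out of 6 distinct
print(dice(query, candidate))     # 0.5
```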


Intrusion Detection:

Intrusion detection systems fall into two basic categories:

signature-based intrusion detection systems and

anomaly detection systems.


Intrusion Detection:

Signature-based intrusion detection systems:

Intruders have signatures, like computer viruses.

Find data packets that contain any known intrusion-related signatures or anomalies related to Internet protocols.

Based upon a set of signatures and rules, the detection system is able to find and log suspicious activity and generate alerts.


Intrusion Detection:

Anomaly-based intrusion detection systems:

Anomaly-based intrusion detection usually depends on packet anomalies present in protocol header parts.


Intrusion Detection:

String matching may become the performance bottleneck in deep packet inspection.


Detecting Plagiarism:

Consists of structural and syntactic phases:

In the structural phase, documents are decomposed into components by their syntax and compared at the coarse level.


Detecting Plagiarism:

The structural mapping processes the decomposed documents based on their syntax without actually mapping at the word level.

The structural mapping can be applied in a hierarchical way based on the structural organization of a document.


Detecting Plagiarism:

Secondly, the syntactic matching algorithm uses a heuristic look-ahead algorithm for matching consecutive tokens with a verification patch.


Bioinformatics:

Approximate matching of a search pattern to a target (called the “text” in string algorithms) is a fundamental tool in molecular biology.

The pattern is often called the “query” and the text is called a “sequence database”, but we will use “pattern” and “text” to be consistent.
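One standard way to match a pattern approximately against a longer text is Sellers' algorithm. It uses the edit-distance table from the first slide, except that row 0 is all zeros, so a match may start at any text position. Below is a minimal sketch. The name `approximate_matches` and the threshold `k` are our own, and real sequence-search tools use scoring schemes and heuristics beyond plain edit distance.

```python
def approximate_matches(pattern: str, text: str, k: int) -> list[tuple[int, int]]:
    """Return (end_index, distance) for every text position where some substring
    ending there is within edit distance k of the pattern."""
    m = len(pattern)
    # col[i] = best edit distance between pattern[:i] and a substring ending at
    # the current text position; before any text is read, that costs i.
    col = list(range(m + 1))
    hits = []
    for j, t in enumerate(text):
        new = [0]  # the empty pattern prefix matches for free anywhere
        for i in range(1, m + 1):
            cost = 0 if pattern[i - 1] == t else 1
            new.append(min(
                col[i - 1] + cost,  # match or substitute t
                col[i] + 1,         # t is an extra character in the text
                new[i - 1] + 1,     # pattern[i - 1] is missing from the text
            ))
        col = new
        if col[m] <= k:
            hits.append((j, col[m]))
    return hits


print(approximate_matches("ACGT", "TTACCTTT", 1))
```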


Bioinformatics:

The importance of approximate matching is that biological sequences change and evolve.

Related genes in different organisms, or even similar genes within the same organism, most commonly have similar, but not identical sequences.

Determining which sequences of known function are most similar to a new gene of unknown function is often the first step in finding out what the new gene does.


Digital Forensics:

Digital forensic text string searches are designed to search every byte of the digital evidence, at the physical level, to locate specific text strings of interest to the investigation.

Given the nature of the data sets typically encountered, text string search results are extremely noisy, which results in inordinately high levels of information retrieval (IR) overhead and information overload.


Text Mining:

Information extraction,

topic tracking,

content summarization,

information visualization,

question answering,

concept linkage,

text categorization/ classification, and

text clustering


Video Retrieval:

The string-based video retrieval method first converts the unstructured video into a curve and marks its feature string.

Approximate string matching is then used to retrieve video quickly.

The characteristic curve of the key frame sequence is first extracted followed by marking the feature string and then approximate string matching is used on the feature string to get fast video retrieval.




Questions?


THANK YOU!

