
# Compares H to C. They are different. The distance for this comparison is 1. - PowerPoint PPT Presentation



## PowerPoint Slideshow about 'Compares H to C. They are different. The distance for this comparison is 1.' - ronia


Presentation Transcript

Compares H to C. They are different. The distance for this comparison is 1.

Looking at the neighboring cells, the best distance so far is 0 (as seen in the top-left cell).

So, if we add this distance of 1 to the best previous distance, this cell gets a value of 1 (which is 0+1).
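The cell-by-cell reasoning above can be sketched in code. This is a minimal illustration of the standard Levenshtein recurrence, not necessarily the exact procedure the slides follow: the slide adds the comparison cost to the best neighboring cell, while the standard form below adds 1 for the upper and left neighbors and the comparison cost (0 or 1) for the top-left neighbor. The function name `levenshtein` and the sample words are assumptions chosen for illustration.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance between a and b using the standard DP table."""
    rows, cols = len(a) + 1, len(b) + 1
    # d[i][j] holds the distance between a[:i] and b[:j].
    d = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        d[i][0] = i  # delete all i characters of a
    for j in range(cols):
        d[0][j] = j  # insert all j characters of b
    for i in range(1, rows):
        for j in range(1, cols):
            # Comparison cost: 0 if the characters match, 1 if they differ.
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,         # deletion (cell above)
                d[i][j - 1] + 1,         # insertion (cell to the left)
                d[i - 1][j - 1] + cost,  # match or substitution (top-left cell)
            )
    return d[rows - 1][cols - 1]


# Comparing H to C alone: the top-left cell is 0 and the cost is 1, so 0 + 1.
print(levenshtein("H", "C"))
# Hypothetical sample words, not taken from the slides.
print(levenshtein("kitten", "sitting"))
```

For the single-character case, the top-left neighbor holds 0 and the characters differ, so the cell receives 0+1, matching the slide.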

In Boyer-Moore, the comparison is done from right to left, starting with the last character in the pattern.

The first comparison is between X and C, which do not match.

But since X does not appear anywhere in the search pattern, we can now rule out a match anywhere in the first 3 characters. So the skip value for X will be initialized to 3, the length of the search pattern.

Again we start from the right by comparing B and C, which again do not match.

However, this time B does occur within the search pattern. The skip value for B will be 1 in order to line up with the last B in the search pattern.

Traditionally, implementations of this algorithm have created a 256-byte table to hold the skip value for all possible characters.
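A minimal sketch of such a skip table, assuming the simplified (Horspool-style) rule described above: every entry defaults to the pattern length, and each pattern character except the last gets the distance needed to line up its rightmost occurrence. The function name `build_skip_table` and the three-character pattern `ABC` are hypothetical; the slides do not name the pattern used in the X and B comparisons.

```python
def build_skip_table(pattern: bytes) -> list[int]:
    """Return a 256-entry table mapping each byte value to its skip distance."""
    m = len(pattern)
    # Characters that never occur in the pattern skip the full pattern length.
    table = [m] * 256
    # The last pattern character is excluded so that every skip is at least 1.
    for i in range(m - 1):
        table[pattern[i]] = m - 1 - i
    return table


# Hypothetical 3-character pattern ending in C and containing B.
abc = build_skip_table(b"ABC")
print("X:", abc[ord("X")])  # X is absent, so it skips the pattern length
print("B:", abc[ord("B")])  # B lines up with the B in the pattern

# The pattern from the example that follows in the slides.
sting = build_skip_table(b"STING")
for ch in "STINR":
    print(ch, sting[ord(ch)])
```

A list of 256 integers stands in for the 256-byte array here; a byte array would only work for patterns shorter than 256 characters.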

Example

pattern = "STING" string = "A STRING SEARCHING EXAMPLE CONSISTING OF TEXT"

Try to match the first m characters:

STING A STRING SEARCHING EXAMPLE CONSISTING OF TEXT

This fails. Slide the pattern right to look for other matches. Note that R does not occur in the pattern, so the pattern can slide past R.

Example

pattern = "STING" string = "A STRING SEARCHING EXAMPLE CONSISTING OF TEXT"

STING A STRING SEARCHING EXAMPLE CONSISTING OF TEXT

Fails again. The rightmost character, S, occurs in the pattern exactly once, so slide until the two S's line up.

Example

pattern = "STING" string = "A STRING SEARCHING EXAMPLE CONSISTING OF TEXT"

STING A STRING SEARCHING EXAMPLE CONSISTING OF TEXT

No C in pattern. Slide past it.

Example

pattern = "STING" string = "A STRING SEARCHING EXAMPLE CONSISTING OF TEXT"

STING A STRING SEARCHING EXAMPLE CONSISTING OF TEXT

No space in pattern. Slide past it.

Example

pattern = "STING" string = "A STRING SEARCHING EXAMPLE CONSISTING OF TEXT"

STING A STRING SEARCHING EXAMPLE CONSISTING OF TEXT

No O in pattern. Slide past it.

Example

pattern = "STING" string = "A STRING SEARCHING EXAMPLE CONSISTING OF TEXT"

STING A STRING SEARCHING EXAMPLE CONSISTING OF TEXT

Rightmost char T. Exactly one T in pattern. Slide to align them.

Example

pattern = "STING" string = "A STRING SEARCHING EXAMPLE CONSISTING OF TEXT"

STING A STRING SEARCHING EXAMPLE CONSISTING OF TEXT

match
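The walk-through above can be reproduced with a short trace. This is a sketch under the assumption that the slides use the simplified skip rule (shift by the skip value of the text character under the pattern's last position); the function name `horspool_trace` is hypothetical. The trace records every alignment tried, so it may include alignments the slides do not show.

```python
def horspool_trace(pattern: str, text: str) -> tuple[int, list[int]]:
    """Return (match index or -1, list of alignment offsets tried)."""
    m, n = len(pattern), len(text)
    # Skip distance for each pattern character except the last one.
    skip = {c: m - 1 - i for i, c in enumerate(pattern[:-1])}
    offsets = []
    pos = 0
    while pos + m <= n:
        offsets.append(pos)
        # Compare right to left, starting with the last pattern character.
        j = m - 1
        while j >= 0 and text[pos + j] == pattern[j]:
            j -= 1
        if j < 0:
            return pos, offsets
        # Characters absent from the pattern skip the full pattern length.
        pos += skip.get(text[pos + m - 1], m)
    return -1, offsets


pattern = "STING"
text = "A STRING SEARCHING EXAMPLE CONSISTING OF TEXT"
index, offsets = horspool_trace(pattern, text)
for off in offsets:
    # Print the pattern shifted to its current alignment above the text.
    print(" " * off + pattern)
    print(text)
print("match at index", index)
```

For this text the search reports a match at index 32, the same position `text.find(pattern)` returns.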

• Complexity is O(n).

• The execution time can actually be sub-linear: the algorithm doesn't need to check every character of the string to be searched, but rather skips over some of them (check the right-most character of the block of m first; if it is not found in the pattern, the entire rest of the block can be skipped).

• Best-case performance is O(n/m). In the best case, only one in m characters needs to be checked.

• It actually works better (on average) with longer m!
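The sub-linear behavior can be checked by counting character comparisons. The sketch below assumes the simplified skip rule and uses a hypothetical function name, `horspool_count`; it counts every character comparison, matches and mismatches alike.

```python
def horspool_count(pattern: str, text: str) -> tuple[int, int]:
    """Return (match index or -1, number of character comparisons made)."""
    m, n = len(pattern), len(text)
    skip = {c: m - 1 - i for i, c in enumerate(pattern[:-1])}
    comparisons = 0
    pos = 0
    while pos + m <= n:
        j = m - 1
        while j >= 0:
            comparisons += 1
            if text[pos + j] != pattern[j]:
                break
            j -= 1
        if j < 0:
            return pos, comparisons
        pos += skip.get(text[pos + m - 1], m)
    return -1, comparisons


text = "A STRING SEARCHING EXAMPLE CONSISTING OF TEXT"
index, count = horspool_count("STING", text)
print(f"n = {len(text)}, comparisons = {count}, match at {index}")

# Best case: no text character occurs in the pattern, so one check per block of m.
print(horspool_count("STING", "x" * 45))
```

On the 45-character example text this makes 12 comparisons, and on a text with no pattern characters it makes 9, that is n/m for n = 45 and m = 5.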

Text Editor, Digital Library and Search Engine:

Everyone who uses a text editor, a digital library, or a search engine needs to find patterns in text.

The Boyer-Moore algorithm is implemented directly in the search command of practically all text editors. The longest common subsequence dynamic programming algorithm is implemented in system commands that test differences between files.
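As an illustration of the longest common subsequence computation mentioned above, here is a minimal dynamic programming sketch. It is not the implementation of any particular file-comparison command; the function name `lcs_length` and the sample inputs are assumptions. It works on any two sequences, so files can be compared as lists of lines.

```python
def lcs_length(a, b) -> int:
    """Length of the longest common subsequence of sequences a and b."""
    # prev[j] holds the LCS length of the items of a seen so far and b[:j].
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            if x == y:
                # Extend the common subsequence found before both items.
                curr.append(prev[j - 1] + 1)
            else:
                # Drop one item from either sequence and keep the better result.
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


# Hypothetical file contents, compared line by line.
old = ["alpha", "beta", "gamma", "delta"]
new = ["alpha", "gamma", "delta", "epsilon"]
common = lcs_length(old, new)
print("unchanged lines:", common)
print("removed:", len(old) - common, "added:", len(new) - common)
```

Lines outside the common subsequence are the ones a difference report would show as removed or added.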

Multimedia and Computational Biology:

Computational biology: in finding a close mutation,

Communications: to adjust for transmission noise,

Texts: to detect common typing errors.

Multimedia: to adjust for lossy compression, occlusions, scaling, affine transformations or dimension loss.

DNA sequencing: The largest overlap heuristic for finding the shortest common superstring.

Medical Texts:

The BMH (Boyer-Moore-Horspool) algorithm achieves the best overall results when used with medical texts. This algorithm usually performs at least twice as fast as the other algorithms tested. The time performance of exact string pattern matching can be greatly improved if an efficient algorithm is used. Considering the growing amount of text handled in the electronic patient record, it is worth implementing this efficient algorithm.

Retrieving Music Patterns from a Musical Database:

When musical notes are to be retrieved from a musical database, string matching is needed. There are four similarity techniques for this: edit distance, Dice similarity, Jaccard similarity and cosine similarity. The musical notes are retrieved by the QBE (query by example) approach. The best scheme for this problem is Levenshtein distance combined with Jaccard similarity, which is an approximate music search technique, because Jaccard similarity performs excellently at matching a query when a pitch change scenario is selected.
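Two of the set-based measures named above can be sketched as follows. The slides do not say how note sequences are turned into sets, so this sketch assumes each melody is a string of note letters broken into overlapping two-note pairs; the function names and the sample melodies are hypothetical.

```python
def bigrams(notes: str) -> set[str]:
    """Set of overlapping two-note pairs in a note string (an assumed encoding)."""
    return {notes[i:i + 2] for i in range(len(notes) - 1)}


def jaccard(a: str, b: str) -> float:
    """Jaccard similarity: shared pairs divided by all distinct pairs."""
    x, y = bigrams(a), bigrams(b)
    union = x | y
    return len(x & y) / len(union) if union else 1.0


def dice(a: str, b: str) -> float:
    """Dice similarity: twice the shared pairs divided by the total pair count."""
    x, y = bigrams(a), bigrams(b)
    total = len(x) + len(y)
    return 2 * len(x & y) / total if total else 1.0


# Hypothetical query and stored melody, differing in the last note.
query, stored = "CDEC", "CDEG"
print("Jaccard:", jaccard(query, stored))
print("Dice:", dice(query, stored))
```

Both measures range from 0 (no shared pairs) to 1 (identical pair sets), so they can be used to rank database entries against a query example.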

Intrusion Detection:

Intrusion detection systems fall into two basic categories:

signature-based intrusion detection systems and

anomaly detection systems.

Intrusion Detection:

Signature-based intrusion detection systems:

Intruders have signatures, like computer viruses.

Find data packets that contain any known intrusion-related signatures or anomalies related to Internet protocols.

Based upon a set of signatures and rules, the detection system is able to find and log suspicious

Intrusion Detection:

Anomaly-based intrusion detection systems:

Anomaly-based intrusion detection usually depends on packet anomalies present in protocol header parts.

Intrusion Detection:

String matching may become the performance bottleneck in deep packet inspection.

Detecting Plagiarism:

Plagiarism detection is composed of structural and syntactic phases:

In the structural phase, documents are decomposed into components by their syntax and compared at the coarse level.

Detecting Plagiarism:

The structural mapping processes the decomposed documents based on their syntax without actually mapping at the word level.

The structural mapping can be applied in a hierarchical way based on the structural organization of a document.

Detecting Plagiarism:

Secondly, the syntactic matching algorithm uses a heuristic look-ahead algorithm for matching consecutive tokens with a verification patch.

Bioinformatics:

Approximate matching of a search pattern to a target (called the “text” in string algorithms) is a fundamental tool in molecular biology.

The pattern is often called the “query” and the text is called a “sequence database”, but we will use “pattern” and “text” to be consistent.

Bioinformatics:

The importance of approximate matching is that biological sequences change and evolve.

Related genes in different organisms, or even similar genes within the same organism, most commonly have similar, but not identical sequences.

Determining which sequences of known function are most similar to a new gene of unknown function is often the first step in finding out what the new gene does.

Digital Forensics:

Digital forensic text string searches are designed to search every byte of the digital evidence, at the physical level, to locate specific text strings of interest to the investigation.

Given the nature of the data sets typically encountered, text string search results are extremely noisy, which results in inordinately high levels of information retrieval (IR) overhead and information overload.

Text Mining:

Information extraction,

topic tracking,

content summarization,

information visualization,

text categorization/ classification, and

text clustering

Video Retrieval:

The string-based video retrieval method first converts the unstructured video into a curve and marks its feature string.

Approximate string matching is then used to retrieve video quickly.

The characteristic curve of the key frame sequence is extracted first, the feature string is then marked, and approximate string matching is finally applied to the feature string to get fast video retrieval.

Introduction to Algorithms, by Thomas H. Cormen, Charles E. Leiserson, Ronald L. Rivest, and Clifford Stein.

http://shaunwagner.com/writings_computer_levenshtein.html

A fast string searching algorithm, R. S. Boyer and J. S. Moore, Communications of the ACM, vol. 20 (10), pp. 762-772, 1977.

Questions?

THANK YOU!