## HARDCODING FINITE AUTOMATA

Ernest Ketcha Ngassam

Prof. Bruce W. Watson

Prof. Derrick G. Kourie

Department of Computer Science

University of Pretoria

Fastar Research Group

http://fastar.cs.up.ac.za

FA Definition: (Σ, S, F, δ, s0)

Finite set of Alphabet symbols (Σ)

Finite set of states (S)

Finite set of accepting states (F)

Transition function (δ)

Starting state (s0)
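
As a rough illustration of the five components listed above, the tuple could be held in a C++ structure like the one below. The type name `FiniteAutomaton`, the helper `makeExample`, and the choice of `char` symbols and `int` states are assumptions made for this sketch, not part of the presentation.

```cpp
#include <map>
#include <set>
#include <utility>

// Hypothetical representation of the 5-tuple (Σ, S, F, δ, s0).
struct FiniteAutomaton {
    std::set<char> alphabet;                      // Σ: alphabet symbols
    std::set<int> states;                         // S: states
    std::set<int> accepting;                      // F: accepting states
    std::map<std::pair<int, char>, int> delta;    // δ: (state, symbol) -> state, partial
    int start;                                    // s0: starting state
};

// Illustrative automaton over {a, b} that accepts exactly the string "ab".
FiniteAutomaton makeExample() {
    FiniteAutomaton fa;
    fa.alphabet = {'a', 'b'};
    fa.states = {0, 1, 2};
    fa.accepting = {2};
    fa.delta = {{{0, 'a'}, 1}, {{1, 'b'}, 2}};
    fa.start = 0;
    return fa;
}
```

A missing key in `delta` stands for "no transition", which keeps the sketch close to the partial transition functions used in the experiments.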

FA context

Chomsky hierarchy

Right linear grammar

Many FA applications

Pattern matching in text

Text indexing

Computational genetics

Network intrusion detection

Computer and natural virus scanning

Natural language translation

Spell checking

Etc.

FAs are therefore performance-sensitive

[Figure: Chomsky hierarchy with the language classes RLL, CFL, CSL and UL]

Introductory Remarks

1986, Pennello in “Very fast LR Parsing”

System that produces hardcoded parsers in Assembly Language

1988, Horspool and Whitney in “Even Faster LR Parsing”

Used Pennello’s idea

Additional optimization strategies to reduce the code size

Some fine tuning

1995, Bhamidipaty and Proebsting in “Very fast YACC-Compatible Parsers (For Very Little Effort)”

YACC produces table-driven Parsers

The method produced directly executable hardcoded parsers in C

2002, Kimmel in “Programming with Regular Expressions in C#”

Suggests implementation of regular expressions in Assembler

Related work

Objective: determine whether a string is in the language represented by an FA

Key issue:

Transition table that embeds

Alphabet

States

Entries

Uses function / “controller” recognize(str, transition): boolean

Checks for acceptance symbol by symbol from str

Traverses the transition table

Returns true or false
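
A minimal sketch of such a table-driven controller, under stated assumptions: the alphabet is the contiguous range starting at `'a'`, state 0 is the start state, `-1` marks a missing entry, and the extra `accepting` parameter (not in the signature on the slide) carries the set F. The alias `TransitionTable` is a name chosen for this sketch.

```cpp
#include <string>
#include <vector>

// Hypothetical table layout: transition[state][symbol index]; -1 means "no transition".
using TransitionTable = std::vector<std::vector<int>>;

// Table-driven recognizer: the same loop works for any transition table.
bool recognize(const std::string& str,
               const TransitionTable& transition,
               const std::vector<bool>& accepting) {
    int state = 0;                                   // assumed start state
    for (char c : str) {
        int sym = c - 'a';                           // assumed symbol encoding
        if (sym < 0 || sym >= static_cast<int>(transition[state].size()))
            return false;                            // symbol outside the alphabet
        state = transition[state][sym];              // one table lookup per symbol
        if (state < 0) return false;                 // no transition: reject
    }
    return accepting[state];
}
```

For example, with the table `{{1, -1}, {-1, 2}, {-1, -1}}` and state 2 accepting, the call accepts "ab" and rejects "aa".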

Conventional FA Implementation

(i,chk)

No table as data structure

Only Primitive data types used

Data embedded into algorithm

Data are part of the instructions

Uses function recognize(str): boolean

Checks for acceptance symbol by symbol from str

Returns true or false
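
To make "data are part of the instructions" concrete, here is a small hedged sketch in C++ of a hardcoded recognizer. The automaton is hypothetical (it accepts the language (ab)*, chosen only for illustration); each state becomes a label and each transition a jump, so no table is consulted.

```cpp
#include <cstddef>
#include <string>

// Hardcoded recognizer for a hypothetical automaton accepting (ab)*.
// The transition function is embedded in the control flow itself.
bool recognize(const std::string& str) {
    std::size_t i = 0;

state_0:                                   // start state, accepting
    if (i == str.size()) return true;
    if (str[i++] == 'a') goto state_1;
    return false;                          // no other transition from state 0

state_1:                                   // non-accepting
    if (i == str.size()) return false;
    if (str[i++] == 'b') goto state_0;
    return false;                          // no other transition from state 1
}
```

Changing the automaton means regenerating this code, which is the trade-off against the table-driven controller.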

What is a Hardcoded algorithm?

read(str[0]);

goto label_0;

label_0:

action_0;

read(str[1]);

goto label_1;

label_1:

action_1;

read(str[2]);

goto label_2;

…

…

label_{n-1}:

action_{n-1};

goto decision;

…

Instructions

Table-driven heavily depends on data

Hardcoded heavily depends on instructions

Computationally equivalent: both are O(len)

Need to perform empirical evaluation!

Table-driven vs. Hardcoded Algorithms

Hardcoded

Table-driven

Based on single symbol recognition

Easy to implement

Problem domain restricted

Various implementation strategies for the hardcoded algorithm

High-level language (2 variations)

Low-level language (3 variations)

Baseline for string recognition

Table-driven Algorithm reflects work for any transition function

Hardcoded Algorithm reflects work for specific transition function

[Figure: state s0 with transitions labelled a, e, d]

Preliminary Experiments

Transition array

Generate random transition array

Measure clock cycles using

The control program for Table-driven (C++)

Hardcoded program (5 variations)

Switch statement (C++)

Nested conditionals (C++)

Linear search (ASM)

Jump table (ASM)

Direct jump (ASM)
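
The two high-level variations can be sketched for a single state as follows. The state, its symbols and the target state numbers are invented for illustration, and the function names `nextStateSwitch` and `nextStateNested` are hypothetical; the assembly variations are not shown.

```cpp
// Hypothetical single state with transitions on 'a', 'd' and 'e'.
// Target state numbers are made up; -1 means "no transition".

// Variation 1: switch statement.
int nextStateSwitch(char c) {
    switch (c) {
        case 'a': return 1;
        case 'd': return 2;
        case 'e': return 3;
        default:  return -1;
    }
}

// Variation 2: nested conditionals testing one symbol after another.
int nextStateNested(char c) {
    if (c == 'a') {
        return 1;
    } else {
        if (c == 'd') {
            return 2;
        } else {
            if (c == 'e') return 3;
        }
    }
    return -1;
}
```

Both functions compute the same single-symbol transition; only the shape of the generated code differs, which is what the timing comparison is meant to expose.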

The Experiment & Data Collection

Just an indication of how to continue with the experiments

Hardcoded outperforms table-driven (in low-level language)

Conclusion:

Rely on jump table version for further experiments

Use it to explore cache effects

Preliminary Results

Language based on:

Accepting symbol (a)

Rejecting symbol (b)

In each of the n-1 states

a: triggers a transition to the next state

b: does not trigger a transition

Only string accepted: aaa…aaa (n-1 times)

Represents worst case scenario

Not concerned about reducing the FA

Use Jump table and table-driven versions
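
The test language described above can be sketched in table-driven form as below. This assumes n ≥ 1 states numbered 0 to n-1, column 0 for a and column 1 for b, and `-1` for a missing transition; `makeChain` and `acceptsChain` are hypothetical names, and the jump-table assembly version is not reproduced.

```cpp
#include <string>
#include <vector>

// Build the n-state chain: state i goes to state i+1 on 'a'; 'b' triggers nothing.
std::vector<std::vector<int>> makeChain(int n) {
    std::vector<std::vector<int>> table(n, std::vector<int>(2, -1));
    for (int i = 0; i + 1 < n; ++i) table[i][0] = i + 1;
    return table;
}

// Accept only if the last state is reached exactly at the end of the input,
// i.e. only for the string of n-1 consecutive a's.
bool acceptsChain(const std::string& str,
                  const std::vector<std::vector<int>>& table) {
    int state = 0;
    for (char c : str) {
        if (c != 'a' && c != 'b') return false;   // outside the 2-symbol alphabet
        state = table[state][c - 'a'];
        if (state < 0) return false;              // no transition: reject
    }
    return state == static_cast<int>(table.size()) - 1;
}
```

With `makeChain(5)`, the only accepted input is "aaaa", so every accepted run visits all states once.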

[Figure: chain automaton with states 1, 2, 3, …, n, each joined to the next by a transition on a]

A Simple String Test Experiment

[Plot legend: Table-driven (2-symbol alphabet); Hardcoded (2-symbol alphabet); Hardcoded (single state); Table-driven (single state)]

Performance based on 2-symbol alphabet

- Remark:
- Caching effect on the hardcoded version
- L1 cache (Hits) between 10 states and about 110 states
- L1 cache (Misses) between 160 states and about 360 states
- L2 cache (Hits) between 460 states and 1700 states
- L2 cache misses (slow) from 1800 states onward, when main memory is needed

Two ways of implementing a string recognizer:

Implementation based on direct indexing

[Figure: string symbol str_k and states 1, 2, 3, …, n; table entry (i, val(str_k))]

Implementation based on symbol searching

Binary search

Linear search

We used linear search

[Figure: string symbol str_k and states 1, 2, 3, …, n; table entry (i, pos(str_k)); array of alphabet symbols a, b, c, d, e]

The String Recognition Experiment

- Language based on:
- 10-symbol alphabet
- Number of states between 10 and 4000
- Randomly generated accepting string of length n-1 (n is the automaton size)
- Filling density of each automaton set to 41%
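
The two lookup strategies compared in this experiment, direct indexing and symbol searching with linear search, differ only in how a symbol is mapped to a table column. A hedged sketch, assuming for direct indexing that the alphabet is a contiguous range starting at `'a'`; the function names `columnDirect` and `columnSearch` are hypothetical.

```cpp
#include <cstddef>
#include <string>

// Direct indexing: the symbol's value selects the column, i.e. entry (i, val(c)).
int columnDirect(char c) {
    return c - 'a';                       // assumed encoding of val(c)
}

// Symbol searching: a linear search in the array of alphabet symbols
// yields the column, i.e. entry (i, pos(c)). Returns -1 if c is not found.
int columnSearch(char c, const std::string& alphabet) {
    for (std::size_t k = 0; k < alphabet.size(); ++k) {
        if (alphabet[k] == c) return static_cast<int>(k);
    }
    return -1;
}
```

Direct indexing costs one subtraction per symbol, while the search costs up to one comparison per alphabet symbol, which is one plausible reason for timing the two table-driven variants separately.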

[Plot legend: Table-driven searching; Hardcoded direct index; Table-driven direct index]

The String Recognition Experiment

- Remarks and findings:
- Caching effect on the hardcoded version
- Noise due to the Branch Prediction Buffer
- Wrong guesses in the Branch History Buffer
- Hardcoding outperforms table-driven up to a thousand states

Dynamic Implementation of Finite Automata for Performance (DIFAP) using:

Table-driven

Linked list

Hardcode

Fine tuning,

Constraints,

Etc.

An Adaptive method for DIFAP (A-DIFAP)

Adapts to system’s/platform’s constraints at run-time

Programming Language specific toolkit for DIFAP / A-DIFAP

Exploits programming language’s features

Future Work

Preliminary Experiments on Hardcoding Finite Automata. CIAA 2003

Hardcoding Finite State Automata Processing. SAICSIT 2003

Hardcoding Finite State Automata Processing. (Submitted to SACJ)

On Hardcoding Finite State Automata Processing. Technical Report T/UE 2003.

The Effect of Cache Memory on Hardcoded Finite Automata (To be submitted to SP&E)

Publications