
Notes on Sequence Binary Decision Diagrams: Relationship to Acyclic Automata and Complexities of Binary Set Operations



Shuhei Denzumi (1), Ryo Yoshinaka (2, 1), Shin-ichi Minato (1, 2), and Hiroki Arimura (1)

1) Hokkaido University
2) JST ERATO Minato Discrete Structure Manipulation System Project

- Research on string processing has become active.
- Massive online data: the Internet and sensor networks.
- String matching and string mining problems.

- Data mining
- Input data should be represented in a compact form
- Computation on the compressed structure is needed

[Figure: inputs are compressed into a data structure, and operations on that structure produce the result.]

Notes on Sequence Binary Decision Diagrams: Relationship to Acyclic Automata and Complexities of Binary Set Operations, by Shuhei Denzumi, Ryo Yoshinaka, Shin-ichi Minato, and Hiroki Arimura, 2011-08-30 (Tue), Prague Stringology Conference 2011

- Manipulable compact data structures
- Represent data in compressed form
- Provide operations that manipulate the data in its compact form
- Have received much attention in recent years

- Binary Decision Diagram (BDD)
- LSI area

- Deterministic Finite Automata (DFA)
- Natural Language Processing area

[Figure: data sets D1, D2, D3; inputs are compacted into a data structure on which operations are applied.]


- Sequence Binary Decision Diagram (SeqBDD, SDD).
- Loekito, Bailey, and Pei (2009)
- Graph structure
- Represents finite sets of strings of finite length

- SDD’s basic properties are unknown
- Minimization
- Size complexity
- Operation time

- Application
- Data mining
- Graph mining
- Human genome sequencing

[Figure: several texts are stored in one Sequence Binary Decision Diagram.]


- Compact representation for discrete structure
- With rich algebraic operations

- BDD [Bryant 1986]: Boolean functions
- e.g. xy ∨ yz ∨ zx, ¬xyz ∨ x¬yz ∨ xy¬z

- ZDD [Minato 1993]: sets of combinations
- e.g. {{a}, {b}, {a, b}}, {{a}, {b}, {c}, {a, b, c}}

- SDD [Loekito et al. 2009]: sets of strings
- e.g. {abc, acb, bac, bca}, {a, b, ab, bab, abbab}

- Relationship to Acyclic Deterministic Finite Automata (ADFA)
- Translation from an SDD to an ADFA and vice versa
- An SDD is never larger than an ADFA
- An SDD can be |Σ| times smaller than an ADFA

- Computational complexity of binary set operations
- Generalize eight set operations
- Tight analysis of the time complexity of the binary set operation algorithm

- Experimental results
- SDDs can be smaller than ADFAs
- Binary operation time

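- Σ: alphabet (totally ordered by ≺)
- Internal node: labelled with a letter of Σ (a, b, …, z)
- 1-terminal node and 0-terminal node: labelled 1 and 0
- 1-edge / 0-edge: the edge from a node to its 1-child / 0-child
- SDD: directed acyclic graph
- Internal node S: τ(S) = 〈S.lab, S.1, S.0〉
- S.lab: label
- S.1: 1-child
- S.0: 0-child

- Ordering rule
- N.lab ≺ (N.0).lab whenever N.0 is an internal node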

[Figure: a node S with label S.lab, 1-child S.1, and 0-child S.0; labels increase with respect to ≺ along 0-edges.]

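- L(N): the set of strings that N represents
- L(1-terminal node) = {ε}
- L(0-terminal node) = {}
- L(N) = N.lab・L(N.1) ∪ L(N.0) for an internal node N
- A path from the root to the 1-terminal node represents a string: the labels of the nodes that the path leaves through their 1-edges, in order.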
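The recursive definition of L(N) can be written out directly. The following is a minimal sketch, not the authors' implementation: the names `Node` and `language` are hypothetical, the 1-terminal and 0-terminal nodes are modelled as `True` and `False`, and ≺ is assumed to be ordinary character order.

```python
from dataclasses import dataclass
from typing import Set, Union


@dataclass(frozen=True)
class Node:
    lab: str      # S.lab: label (a single letter)
    one: "SDD"    # S.1: 1-child
    zero: "SDD"   # S.0: 0-child


# Assumption: True models the 1-terminal node, False the 0-terminal node.
SDD = Union[Node, bool]


def language(n: SDD) -> Set[str]:
    """Return L(n) by unfolding L(N) = N.lab . L(N.1) U L(N.0)."""
    if n is True:
        return {""}      # L(1-terminal) = {epsilon}
    if n is False:
        return set()     # L(0-terminal) = {}
    return {n.lab + w for w in language(n.one)} | language(n.zero)


# An SDD for {aa, ab, bb}, built bottom-up.
b_node = Node("b", True, False)      # {b}
ab_node = Node("a", True, b_node)    # {a, b}
bb_node = Node("b", b_node, False)   # {bb}
root = Node("a", ab_node, bb_node)   # {aa, ab, bb}

print(sorted(language(root)))  # ['aa', 'ab', 'bb']
```

Enumerating L(N) this way is only useful for checking small examples, since the represented set can be much larger than the diagram.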

[Figure: example SDD for {aa, ab, bb}. Its nodes represent {aa, ab, bb}, {a, b}, {bb}, {b}, {ε} (1-terminal node), and {} (0-terminal node).]

- 1-terminal node: accept state
- 0-terminal node: reject state

[Figure: an ADFA and an SDD for {aa, ab, bb}, with the sub-languages {a, b}, {bb}, and {b}.]

- Suppression
- N.1 ≠ 0-terminal node: a node whose 1-child is the 0-terminal node is replaced by its 0-child, since a・{} ∪ L(N.0) = L(N.0)
- In an ADFA, this corresponds to removing the edges that point to the dead state

- Merging
- τ(N) = τ(N’) ⇒ N = N’
- In an ADFA, this corresponds to sharing all equivalent states

- Under these rules, the SDD for a given set of strings is unique and minimal
- This is analogous to the unique canonical form of ADFAs

- Almost isomorphic to Acyclic Deterministic Finite Automata
- BDD/ZDD techniques are applicable
- Binary form
- Simple recursive algorithm
- Easy to implement

- Rich collections of operations
- Use of hash tables
- To share equivalent nodes
- To share intermediate computations

[Figure: SDD positioned between BDD/ZDD and ADFA.]

- An SDD node corresponds to an ADFA edge
- The description size is proportional to:
- |N|: the number of internal nodes in an SDD N
- |A|: the number of edges in an ADFA A

- For an equivalent SDD and ADFA
- From an ADFA A to an SDD N: the SDD is never larger than the ADFA
- From an SDD N to an ADFA A: the ADFA can be larger than the SDD
- An SDD can be |Σ| times smaller than an ADFA
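One way to see why the SDD is never larger is to translate each ADFA state into a chain of SDD nodes, one node per outgoing edge. The sketch below illustrates that idea under stated assumptions and is not taken from the slides: `adfa_to_sdd`, `delta`, and `accepting` are hypothetical names, an SDD node is encoded as a tuple (label, 1-child, 0-child), and `True` / `False` stand for the 1-terminal and 0-terminal nodes.

```python
def adfa_to_sdd(delta, accepting, start):
    """Translate an acyclic DFA into a reduced SDD (illustrative sketch).

    delta:     dict mapping state -> {letter: next state}
    accepting: set of accepting states
    An SDD node is the tuple (lab, one_child, zero_child);
    True and False stand for the 1-terminal and the 0-terminal.
    """
    unique = {}  # shares nodes with equal triples (merging rule)
    memo = {}    # state -> SDD already built for it

    def build(q):
        if q in memo:
            return memo[q]
        # The 0-chain of q ends in the 1-terminal iff q accepts the empty string.
        node = q in accepting
        # Visit letters from largest to smallest so labels increase along 0-edges.
        for letter in sorted(delta.get(q, {}), reverse=True):
            child = build(delta[q][letter])
            if child is False:  # suppression rule: the edge leads to a dead state
                continue
            key = (letter, child, node)
            node = unique.setdefault(key, key)
        memo[q] = node
        return node

    return build(start), unique


# ADFA for {aa, ab, bb}: 5 edges, state 3 is the only accepting state.
delta = {0: {"a": 1, "b": 2}, 1: {"a": 3, "b": 3}, 2: {"b": 3}}
root, unique = adfa_to_sdd(delta, {3}, 0)
print(len(unique))  # 4 internal SDD nodes for 5 ADFA edges
```

Each ADFA edge produces at most one SDD node, and the two edges into state 3 labelled b end up sharing a single node, which is where the saving comes from in this small case.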


[Figure: ADFA A and SDD S for {a^n b^i c^j | n = 0, …, 4; i, j = 0, 1}, with |S| = 6 and |A| = 14.]

- Input: Canterbury corpus
- BibleAll: bible.txt, BibleBi: all bigrams from bible.txt, Ecoli: E.coli.txt
- Fac: the set of all factors (substrings) of the input data is stored

- A binary set operation ♢ ∈ {∪, ∩, ＼, …}
- Input: two SDDs P and Q
- Output: an SDD R such that L(R) = L(P) ♢ L(Q)

[Figure: the binary set operation takes P and Q and produces P ♢ Q.]

- Originally for BDD [Bryant 1986], applied to SDD
- Based on the definition L(N) = N.lab ・ L(N.1) ∪ L(N.0)
- In the operation (when P.lab = Q.lab): L(P) ♢ L(Q) = P.lab ・ (L(P.1) ♢ L(Q.1)) ∪ (L(P.0) ♢ L(Q.0))

[Figure: with a common root label a, P = 〈a, P1, P0〉 and Q = 〈a, Q1, Q0〉 are combined into P ♢ Q = 〈a, P1 ♢ Q1, P0 ♢ Q0〉.]

- Key-value hash tables
- Uniquetable
- Key: 〈letter x, SDD node N1, SDD node N0〉
- Value: SDD node N with τ(N) = 〈x, N1, N0〉

- Opcache
- Key: 〈operation id ♢, SDD node P, SDD node Q〉
- Value: SDD node R with R = P ♢ Q


- Every SDD node needed during the computation is created via this process
- Once an internal node is registered in the Uniquetable, no equivalent node is ever created again.

Check the Uniquetable for key 〈x, N1, N0〉.

- Exists: return it.
- Does not exist: create a new node and return it.
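The node-creation step can be sketched as a single function that applies the suppression rule and then consults the Uniquetable. This is a minimal illustration, not the authors' code: `Node`, `getnode`, and `uniquetable` are hypothetical names, and the terminals are modelled as `True` and `False`.

```python
class Node:
    """Internal SDD node; identity-based hashing lets nodes serve as dict keys."""
    __slots__ = ("lab", "one", "zero")

    def __init__(self, lab, one, zero):
        self.lab, self.one, self.zero = lab, one, zero


# Uniquetable: key <x, N1, N0> -> the one node N with tau(N) = <x, N1, N0>.
uniquetable = {}


def getnode(x, n1, n0):
    """Return the canonical node <x, n1, n0>, creating it only if needed."""
    if n1 is False:              # suppression rule: 1-child is the 0-terminal
        return n0
    key = (x, n1, n0)
    node = uniquetable.get(key)
    if node is None:             # not registered yet: create and register
        node = Node(x, n1, n0)
        uniquetable[key] = node
    return node                  # merging rule: equal triples share one node


first = getnode("b", True, False)
second = getnode("b", True, False)
print(first is second)  # True
```

Because equal triples always map to the same object, two SDDs built through `getnode` represent the same set exactly when their roots are the same object.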

- When P ♢ Q is executed
- Every operation uses the Opcache
- At most |P| × |Q| distinct recursive calls are invoked
- (Assuming that hash table access takes constant time)

- Naïve method
- Prepare a table of size |P| × |Q|

- This method
- Creates no useless or redundant nodes

- Theorem
- Worst-case time is O(|P| |Q|)
- There exist examples that need Ω(|P| |Q|) time
- Hence the upper and lower bounds match

Check the Opcache for key 〈♢, P, Q〉.

- Exists: P ♢ Q has already been computed; return it.
- Does not exist: continue the computation on the 0-side and the 1-side.
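Putting the recursion, the Uniquetable, and the Opcache together gives a short recursive procedure. The sketch below is an illustration under assumptions rather than the algorithm as published: all names (`apply_op`, `split`, `OPS`, `getnode`) are hypothetical, only ∪, ∩, and ＼ are covered, terminals are `True` / `False`, and ≺ is ordinary character order. When the two root labels differ, the node with the larger label is treated as having an empty 1-side for the smaller label.

```python
class Node:
    __slots__ = ("lab", "one", "zero")

    def __init__(self, lab, one, zero):
        self.lab, self.one, self.zero = lab, one, zero


uniquetable, opcache = {}, {}


def getnode(x, n1, n0):
    """Canonical node <x, n1, n0> (suppression rule, then Uniquetable lookup)."""
    if n1 is False:
        return n0
    return uniquetable.setdefault((x, n1, n0), Node(x, n1, n0))


# Each operation is given by its effect on membership of the empty string.
OPS = {
    "union": lambda p, q: p or q,
    "intersection": lambda p, q: p and q,
    "difference": lambda p, q: p and not q,
}


def split(n, x):
    """View n as <x, n1, n0>; a node not labelled x has an empty 1-side."""
    if isinstance(n, Node) and n.lab == x:
        return n.one, n.zero
    return False, n


def apply_op(op, p, q):
    if isinstance(p, bool) and isinstance(q, bool):
        return OPS[op](p, q)          # both terminal: decide directly
    key = (op, p, q)
    if key in opcache:                # P op Q already computed
        return opcache[key]
    x = min(n.lab for n in (p, q) if isinstance(n, Node))
    p1, p0 = split(p, x)
    q1, q0 = split(q, x)
    r = getnode(x, apply_op(op, p1, q1), apply_op(op, p0, q0))
    opcache[key] = r
    return r


def language(n):
    """Enumerate L(n); only for checking small examples."""
    if isinstance(n, bool):
        return {""} if n else set()
    return {n.lab + w for w in language(n.one)} | language(n.zero)


b = getnode("b", True, False)
P = getnode("a", getnode("a", True, b), getnode("b", b, False))  # {aa, ab, bb}
Q = getnode("a", b, b)                                           # {ab, b}
print(sorted(language(apply_op("intersection", P, Q))))  # ['ab']
```

Since each key 〈♢, P, Q〉 is computed at most once, the number of distinct recursive calls is bounded by the number of node pairs, which is the source of the O(|P| |Q|) bound stated above.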

- Operation time
- Prepare two SDDs that store all factors of random texts of length n
- Measure the time to compute the operation

- Relationship to Acyclic Automata
- An SDD can be |Σ| times smaller than an ADFA
- For real data, SDDs are 10 to 20% more compact than ADFAs

- Computational complexity of binary set operations
- Worst-case time complexity is quadratic
- A tight time bound is given
- In our experiments, operation time is almost linear

- Future work
- Efficient implementation of various operations
- A substring index on SDDs
- Factor SDD construction algorithm

Thank you!