Notes on Sequence Binary Decision Diagrams: Relationship to Acyclic Automata and Complexities of Binary Set Operations

Shuhei Denzumi (1), Ryo Yoshinaka (2, 1), Shin-ichi Minato (1, 2), and Hiroki Arimura (1)

(1) Hokkaido University
(2) JST ERATO Minato Discrete Structure Manipulation System Project

- Research on string processing has become active.
- Massive online data: the Internet and sensing networks.
- String matching and string mining problems.

- Data mining
- Input data should be represented in a compact form
- Computation directly on the compressed structure is needed

[Figure: inputs are compressed into a data structure; an operation on the structure produces the result.]

Notes on Sequence Binary Decision Diagrams: Relationship to Acyclic Automata and Complexities of Binary Set Operations, by Shuhei Denzumi, Ryo Yoshinaka, Shin-ichi Minato, and Hiroki Arimura, 2011-08-30 (Tue), Prague Stringology Conference 2011

- Manipulable compact data structures
- Represent data in compressed form
- Provide operations that manipulate the data while it stays compressed
- Have attracted much attention in recent years

- Binary Decision Diagram (BDD)
- LSI area

- Deterministic Finite Automata (DFA)
- Natural Language Processing area

[Figure: compaction of inputs into data structures D1, D2, D3 and operations on them.]

- Sequence Binary Decision Diagram (SeqBDD, SDD).
- Loekito, Bailey, and Pei (2009)
- Graph structure
- Represents finite sets of finite-length strings

- SDD's basic properties are unknown
- Minimization
- Size complexity
- Operation time

- Application
- Data mining
- Graph mining
- Human genome sequencing

[Figure: several texts are represented by one Sequence Binary Decision Diagram.]

- Compact representation for discrete structure
- With rich algebraic operations

- BDD [Bryant 1986]
- Represents Boolean functions
- Examples: xy ∨ yz ∨ zx; ¬xyz ∨ x¬yz ∨ xy¬z

- ZDD [Minato 1993]
- Represents sets of combinations
- Examples: {{a}, {b}, {a, b}}; {{a}, {b}, {c}, {a, b, c}}

- SDD [Loekito et al. 2009]
- Represents sets of strings
- Examples: {abc, acb, bac, bca}; {a, b, ab, bab, abbab}

- Relationship to Acyclic Deterministic Finite Automata (ADFA)
- Translation from an SDD to an ADFA and vice versa
- An SDD is never larger than an ADFA
- An SDD can be |Σ| times smaller than an ADFA

- Computational complexity of binary set operations
- Generalize eight set operations
- Tight analysis of the time complexity of the binary set operation algorithm

- Experimental results
- SDDs can be smaller than ADFAs
- Binary operation time

- Σ: alphabet (totally ordered by ≺)
- Internal node: labeled with a letter of Σ (a, b, …, z)
- 1/0-terminal node: the two terminal nodes, labeled 1 and 0
- 1/0-edge: the edge from an internal node to its 1-child / 0-child
- SDD: directed acyclic graph
- Internal node S: τ(S) = ⟨S.lab, S.1, S.0⟩
- S.lab: label
- S.1: 1-child
- S.0: 0-child

- Ordering rule
- N.lab ≺ (N.0).lab
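The node triple and the ordering rule can be sketched in Python as follows. This is only an illustration: the class names `Terminal` and `Node`, the helper `obeys_ordering`, and the use of Python's character order for ≺ are assumptions, not part of the talk.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class Terminal:
    value: int  # 1 for the 1-terminal, 0 for the 0-terminal


@dataclass(frozen=True)
class Node:
    lab: str     # S.lab, a letter of the alphabet
    one: "SDD"   # S.1, the 1-child
    zero: "SDD"  # S.0, the 0-child


SDD = Union[Terminal, Node]
ONE, ZERO = Terminal(1), Terminal(0)


def obeys_ordering(n: SDD) -> bool:
    """Check N.lab < (N.0).lab at every internal node reachable from n.

    Assumption: the alphabet order is Python's built-in character order.
    """
    if isinstance(n, Terminal):
        return True
    if isinstance(n.zero, Node) and not (n.lab < n.zero.lab):
        return False
    return obeys_ordering(n.one) and obeys_ordering(n.zero)


good = Node("a", ONE, Node("b", ONE, ZERO))    # labels grow along the 0-edge
bad = Node("b", ONE, Node("a", ONE, ZERO))     # violates the ordering rule
free = Node("b", Node("a", ONE, ZERO), ZERO)   # 1-children are unconstrained
```

The rule only constrains 0-edges, which is why `free` is still a valid diagram.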

[Figure: a node S with label S.lab, 1-child S.1 and 0-child S.0; labels increase along 0-edges, e.g. a ≺ b ≺ c.]

- L(N): the set of strings that N represents
- L(1-terminal node) = {ε}
- L(0-terminal node) = {}
- L(N) = N.lab · L(N.1) ∪ L(N.0)
- A path from the root to the 1-terminal node represents a string.
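The recursive definition of L(N) translates almost line by line into code. The sketch below assumes a hypothetical encoding in which the terminals are the strings `"1"` and `"0"` and an internal node is a tuple `(lab, one_child, zero_child)`; the example diagram is the one for {aa, ab, bb} used in the talk.

```python
ONE, ZERO = "1", "0"  # terminal nodes (hypothetical encoding)


def language(n):
    """Return L(n) following L(N) = N.lab . L(N.1) ∪ L(N.0)."""
    if n == ONE:
        return {""}    # L(1-terminal) = {ε}
    if n == ZERO:
        return set()   # L(0-terminal) = {}
    lab, one, zero = n
    return {lab + w for w in language(one)} | language(zero)


# Nodes of an SDD for {aa, ab, bb}, built bottom-up.
n_b = ("b", ONE, ZERO)     # represents {b}
n_ab = ("a", ONE, n_b)     # represents {a, b}
n_bb = ("b", n_b, ZERO)    # represents {bb}
root = ("a", n_ab, n_bb)   # represents {aa, ab, bb}
```

Enumerating the language is exponential in general, so this is meant for checking small examples only.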

[Figure: an SDD for {aa, ab, bb}; its nodes are annotated with the sets they represent: {aa, ab, bb}, {a, b}, {bb}, {b}, {ε} at the 1-terminal and {} at the 0-terminal.]

- 1-terminal node ⇔ accept state
- 0-terminal node ⇔ reject state

[Figure: diagrams for {aa, ab, bb} with sub-sets {a, b}, {bb} and {b}; node and edge labels a, b, c.]

- Suppression
- N.1 ≠ 0-terminal node, because a · {} ∪ L(N.0) = L(N.0)
- In an ADFA, this corresponds to removing edges that point to the dead state

- Merging
- τ(N) = τ(N') ⇒ N = N'
- In an ADFA, this corresponds to sharing all equivalent states

- Under these rules, the SDD is unique and minimal
- Likewise, an ADFA has a unique canonical form

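[Figure: suppression replaces a node N whose 1-child is the 0-terminal by N.0; merging shares nodes N and N' that have the same label x and the same children N.1 and N.0.]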

- Almost isomorphic to Acyclic Deterministic Finite Automata
- BDD/ZDD techniques are applicable
- Binary form
- Simple recursive algorithm
- Easy to implement

- Rich collections of operations
- Use of hash tables
- To share equivalent nodes
- To share intermediate computations

[Figure: SDD in relation to BDD/ZDD and ADFA.]

- An SDD node corresponds to an ADFA edge
- The description size is proportional to
- |N|: the number of internal nodes in SDD N
- |A|: the number of edges in ADFA A

[Figure: ADFA edges labeled a, b, c and the corresponding SDD nodes a, b, c.]

- For an equivalent SDD and ADFA
- From an ADFA A to an SDD N
- From an SDD N to an ADFA A
- An SDD can be |Σ| times smaller than an ADFA
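One way to picture the ADFA-to-SDD direction is to turn each outgoing edge of a state into one SDD node and chain the nodes of a state along 0-edges in alphabet order. The sketch below is an illustration under assumptions, not the construction from the paper: the names `adfa_to_sdd` and `internal_nodes`, the dictionary encoding of the automaton, and the tuple encoding of SDD nodes are all hypothetical.

```python
ONE, ZERO = "1", "0"  # terminals; internal nodes are (lab, one_child, zero_child)


def adfa_to_sdd(delta, accepting, start):
    """Translate an ADFA given as {state: {letter: next_state}} into an SDD."""
    memo = {}

    def build(q):
        if q not in memo:
            # The 0-chain of a state ends in 1 if the state accepts, else in 0.
            node = ONE if q in accepting else ZERO
            # Visit letters from largest to smallest so labels grow along 0-edges.
            for letter in sorted(delta.get(q, {}), reverse=True):
                child = build(delta[q][letter])
                if child != ZERO:  # suppression: skip edges into a dead state
                    node = (letter, child, node)
            memo[q] = node
        return memo[q]

    return build(start)


def internal_nodes(n, seen=None):
    """Collect the distinct internal nodes (equal tuples count once)."""
    seen = set() if seen is None else seen
    if isinstance(n, tuple) and n not in seen:
        seen.add(n)
        internal_nodes(n[1], seen)
        internal_nodes(n[2], seen)
    return seen


# An ADFA for {aa, ab, bb}: five edges, state 3 accepts.
delta = {0: {"a": 1, "b": 2}, 1: {"a": 3, "b": 3}, 2: {"b": 3}}
sdd = adfa_to_sdd(delta, accepting={3}, start=0)
num_edges = sum(len(out) for out in delta.values())
num_nodes = len(internal_nodes(sdd))
```

In this small example the five edges become four distinct nodes, because the nodes for the b-edges of states 1 and 2 coincide; structural equality of tuples plays the role of the merging rule here.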

[Figure: ADFA A and SDD S for {a^n b^i c^j : n = 0, …, 4; i, j = 0, 1}, with |S| = 6 and |A| = 14.]

- Input: Canterbury corpus
- BibleAll: bible.txt, BibleBi: all bigrams from bible.txt, Ecoli: E.coli.txt
- Fac means store all fanctors of input data

- A binary set operation ♢ ∈ {∪, ∩, \, …}
- Input: two SDDs P, Q
- Output: SDD R such that L(R) = L(P) ♢ L(Q)

[Figure: SDDs P and Q enter the binary set operation, which outputs P ♢ Q.]

- Originally for BDD [Bryant 1986], applied to SDD
- Based on the definition L(N) = N.lab · L(N.1) ∪ L(N.0)
- In an operation (when P.lab = Q.lab): L(P) ♢ L(Q) = P.lab · (L(P.1) ♢ L(Q.1)) ∪ (L(P.0) ♢ L(Q.0))
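The recursion can be sketched without any caching as follows. The slide states only the case P.lab = Q.lab; the other cases here are an assumption derived from the same definition, treating the side that lacks the smaller label as contributing the empty set. The sketch also assumes that ♢ is given as a Boolean function `op` on membership with `op(False, False) == False`, and it uses a hypothetical tuple encoding of nodes.

```python
ONE, ZERO = "1", "0"  # terminals; internal nodes are (lab, one_child, zero_child)


def make(lab, one, zero):
    """Build a node, applying suppression (the 1-child must not be the 0-terminal)."""
    return zero if one == ZERO else (lab, one, zero)


def apply(op, p, q):
    """Return an SDD for L(p) op L(q); op maps two membership bits to one."""
    p_term, q_term = isinstance(p, str), isinstance(q, str)
    if p_term and q_term:
        return ONE if op(p == ONE, q == ONE) else ZERO
    if q_term or (not p_term and p[0] < q[0]):
        # Only p has strings starting with p.lab; q contributes {} on the 1-side.
        lab, one, zero = p
        return make(lab, apply(op, one, ZERO), apply(op, zero, q))
    if p_term or q[0] < p[0]:
        # Symmetric case: only q has strings starting with q.lab.
        lab, one, zero = q
        return make(lab, apply(op, ZERO, one), apply(op, p, zero))
    # Same label: recurse on both sides, as in the formula above.
    return make(p[0], apply(op, p[1], q[1]), apply(op, p[2], q[2]))


def language(n):
    """Enumerate L(n); for checking small examples only."""
    if isinstance(n, str):
        return {""} if n == ONE else set()
    return {n[0] + w for w in language(n[1])} | language(n[2])


n_b = ("b", ONE, ZERO)
p = ("a", ("a", ONE, n_b), ("b", n_b, ZERO))   # {aa, ab, bb}
q = ("a", n_b, n_b)                            # {ab, b}
union = apply(lambda x, y: x or y, p, q)
inter = apply(lambda x, y: x and y, p, q)
diff = apply(lambda x, y: x and not y, p, q)
```

Without the hash tables discussed next, shared sub-diagrams are recomputed every time they are reached, so this version is for illustration only.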

[Figure: nodes P = ⟨a, P1, P0⟩ and Q = ⟨a, Q1, Q0⟩ are combined into P ♢ Q = ⟨a, P1 ♢ Q1, P0 ♢ Q0⟩.]

- Key-value hash tables
- Uniquetable
- Key: ⟨letter x, SDD node N1, SDD node N0⟩
- Value: SDD node N with τ(N) = ⟨x, N1, N0⟩

- Opcache
- Key: ⟨operation id ♢, SDD node P, SDD node Q⟩
- Value: SDD node R such that R = P ♢ Q

[Figure: the Uniquetable maps a key triple ⟨x, N1, N0⟩ to the node N; the Opcache maps a key triple ⟨♢, P, Q⟩ to the result node R.]

- Any SDD node needed during computation is created via this process
- Once an internal node is registered in the Uniquetable, an equivalent node is never created again

- Check the Uniquetable for key ⟨x, N1, N0⟩
- If it exists: return the registered node
- If it does not exist: create a new node, register it, and return it
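The lookup-or-create step can be sketched with an ordinary dictionary. The names `Node`, `uniquetable` and `get_node` are hypothetical, and the suppression check is deliberately left out so that the sketch mirrors only the flow described above.

```python
class Node:
    """Internal SDD node; hashed and compared by identity."""
    __slots__ = ("lab", "one", "zero")

    def __init__(self, lab, one, zero):
        self.lab, self.one, self.zero = lab, one, zero


ONE, ZERO = object(), object()  # the two terminal nodes
uniquetable = {}                # key (x, N1, N0) -> registered node N


def get_node(x, n1, n0):
    """Return the unique node N with tau(N) = (x, N1, N0)."""
    key = (x, n1, n0)
    node = uniquetable.get(key)
    if node is None:            # not registered yet: create and register it
        node = Node(x, n1, n0)
        uniquetable[key] = node
    return node                 # otherwise the existing node is reused


first = get_node("b", ONE, ZERO)
second = get_node("b", ONE, ZERO)   # same triple, so the same object comes back
parent = get_node("a", ONE, first)
```

Because children are themselves unique, comparing them by identity in the key is enough to detect equivalent nodes, which keeps each lookup at expected constant time.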

- When P ♢ Q is executed
- Every operation uses the Opcache
- At most |P| × |Q| distinct recursive calls are invoked
- (Assume that the access time to hash tables is constant)

- Naïve method
- Prepares a table of size |P| × |Q|

- This method
- Creates no useless or redundant nodes

- Theorem
- Worst case: O(|P| |Q|) time
- There exist examples that need Ω(|P| |Q|) time
- So the upper and lower bounds match

- Check the Opcache for key ⟨♢, P, Q⟩
- If it exists: P ♢ Q has already been computed, so return it
- If it does not exist: continue the computation on the 0-side and the 1-side
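Putting both tables together gives a memoized version of the operation. The sketch below is an illustration under assumptions: all names are hypothetical, terminals are the strings `"1"` and `"0"`, operations are given as Boolean functions on membership, and the cases with different labels are derived from the definition of L(N) rather than taken from the slides.

```python
class Node:
    """Internal SDD node; hashed and compared by identity."""
    __slots__ = ("lab", "one", "zero")

    def __init__(self, lab, one, zero):
        self.lab, self.one, self.zero = lab, one, zero


ONE, ZERO = "1", "0"            # terminal nodes
uniquetable, opcache = {}, {}
OPS = {
    "union": lambda x, y: x or y,
    "intersection": lambda x, y: x and y,
    "difference": lambda x, y: x and not y,
}


def get_node(lab, one, zero):
    """Suppression plus Uniquetable lookup."""
    if one == ZERO:
        return zero
    key = (lab, one, zero)
    if key not in uniquetable:
        uniquetable[key] = Node(lab, one, zero)
    return uniquetable[key]


def apply(op_id, p, q):
    key = (op_id, p, q)
    if key in opcache:          # P op Q is already done: return it
        return opcache[key]
    p_term, q_term = isinstance(p, str), isinstance(q, str)
    if p_term and q_term:
        r = ONE if OPS[op_id](p == ONE, q == ONE) else ZERO
    elif q_term or (not p_term and p.lab < q.lab):
        r = get_node(p.lab, apply(op_id, p.one, ZERO), apply(op_id, p.zero, q))
    elif p_term or q.lab < p.lab:
        r = get_node(q.lab, apply(op_id, ZERO, q.one), apply(op_id, p, q.zero))
    else:                       # same label: continue on the 1-side and the 0-side
        r = get_node(p.lab, apply(op_id, p.one, q.one), apply(op_id, p.zero, q.zero))
    opcache[key] = r
    return r


def all_strings(k):
    """SDD for all strings of length k over {a, b}, using 2k shared nodes."""
    n = ONE
    for _ in range(k):
        n = get_node("a", n, get_node("b", n, ZERO))
    return n


p = all_strings(30)             # 2**30 strings, but only 60 internal nodes
r = apply("union", p, p)
```

In this example each pair of nodes is processed once, so the number of Opcache entries stays far below |P| × |Q|; without the cache the same call would revisit the shared sub-diagrams an exponential number of times.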

- Operation time
- Prepare two SDDs for all factors of random texts of length n
- Measure the time to compute the operation

- Relationship to Acyclic Automata
- An SDD can be |Σ| times smaller than an ADFA
- For real data, SDDs are 10–20% more compact than ADFAs

- Computational complexity of binary set operations
- Worst-case time complexity is quadratic
- A tight time bound is shown
- In our experiment, operation time is almost linear

- Future work
- Efficient implementation of various operations
- A substring index based on SDDs
- Factor SDD construction algorithm

Thank you!