Symmetry Detection in Constraint Satisfaction Problems & Its Application in Databases

Symmetry Detection in Constraint Satisfaction Problems & Its Application in Databases Berthe Y. Choueiry Constraint Systems Laboratory Department of Computer Science & Engineering University of Nebraska-Lincoln Joint work with Amy Beckwith-Davis, Anagh Lal, and Eugene C. Freuder

Symmetry Detection in Constraint Satisfaction Problems & Its Application in Databases

Berthe Y. Choueiry

Constraint Systems Laboratory

Department of Computer Science & Engineering

University of Nebraska-Lincoln

Joint work with Amy Beckwith-Davis, Anagh Lal, and Eugene C. Freuder

Supported by NSF CAREER award #0133568

Outline
  • Definitions
    • CSP
    • Interchangeability
    • Bundling
  • Bundling in CSPs
  • Bundling for join query computation
  • Conclusions
Constraint Satisfaction Problem (CSP)

[Figure: example CSP with four variables and their domains: V1 = {d}, V2 = {c, d, e, f}, V3 = {a, b, d}, V4 = {a, b, c}.]

  • Given P = (V, D, C)
    • V : set of variables
    • D : set of their domains
    • C : set of constraints (relations) restricting the acceptable combinations of values for variables
    • Solution is a consistent assignment of values to variables
  • Query: find 1 solution, all solutions, etc.
  • Examples: SAT, scheduling, product configuration
  • NP-Complete in general
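
As a minimal sketch of the definition above, the four-variable example can be written down and solved by brute force. The domains come from the slide figure; the constraint graph is an assumption, since the transcript does not preserve it: every constraint is taken to be "not equal" on the edges V1-V3, V2-V3, V2-V4, and V3-V4. The names `DOMAINS`, `EDGES`, `is_consistent`, and `all_solutions` are illustrative.

```python
from itertools import product

# Domains as shown in the slide figure.
DOMAINS = {
    "V1": {"d"},
    "V2": {"c", "d", "e", "f"},
    "V3": {"a", "b", "d"},
    "V4": {"a", "b", "c"},
}
# ASSUMPTION: all constraints are "not equal" on these edges.
EDGES = [("V1", "V3"), ("V2", "V3"), ("V2", "V4"), ("V3", "V4")]


def is_consistent(assignment):
    """True if a complete assignment satisfies every constraint."""
    return all(assignment[x] != assignment[y] for x, y in EDGES)


def all_solutions():
    """Enumerate every consistent complete assignment (exponential time)."""
    names = sorted(DOMAINS)
    solutions = []
    for values in product(*(sorted(DOMAINS[n]) for n in names)):
        assignment = dict(zip(names, values))
        if is_consistent(assignment):
            solutions.append(assignment)
    return solutions
```

Under these assumed constraints the example has 14 solutions, one of which is V1 ← d, V2 ← e, V3 ← a, V4 ← c.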
Backtrack search

[Figure: search tree rooted at S for the example CSP (V1 = {d}, V2 = {c, d, e, f}, V3 = {a, b, d}, V4 = {a, b, c}). Solution shown: V1 ← d, V2 ← e, V3 ← a, V4 ← c.]

  • DFS + backtracking (linear space)
    • Variable being instantiated: current variable
    • Un-instantiated variables: future variables
    • Instantiated variables: past variables
  • + Constraint propagation
    • Backtrack search with forward checking (FC)
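
Depth-first search with forward checking can be sketched as below. This is only an illustration, not the presenter's code: the constraints are assumed to be "not equal" on the edges V1-V3, V2-V3, V2-V4, V3-V4, and the names `NEIGHBORS` and `fc_search` are hypothetical.

```python
DOMAINS = {
    "V1": {"d"},
    "V2": {"c", "d", "e", "f"},
    "V3": {"a", "b", "d"},
    "V4": {"a", "b", "c"},
}
# ASSUMPTION: "not equal" constraints between each variable and its neighbors.
NEIGHBORS = {
    "V1": ["V3"],
    "V2": ["V3", "V4"],
    "V3": ["V1", "V2", "V4"],
    "V4": ["V2", "V3"],
}


def fc_search(domains, neighbors, order, depth=0, assignment=None):
    """Yield all solutions; domains of future variables are filtered eagerly."""
    assignment = {} if assignment is None else assignment
    if depth == len(order):
        yield dict(assignment)
        return
    var = order[depth]  # current variable
    for value in sorted(domains[var]):
        new_domains = dict(domains)
        wiped_out = False
        for other in neighbors[var]:
            if other in assignment:  # past variable: already accounted for
                continue
            # Forward checking: drop values of future neighbors that conflict.
            new_domains[other] = {x for x in domains[other] if x != value}
            if not new_domains[other]:
                wiped_out = True
                break
        if wiped_out:
            continue  # dead end detected before descending
        assignment[var] = value
        yield from fc_search(new_domains, neighbors, order, depth + 1, assignment)
        del assignment[var]  # backtrack
```

Because the values of past variables have already filtered the current domain, every value tried for the current variable is consistent with the past, so no backward checks are needed.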


Interchangeability [Freuder, 91]

[Figure: the example CSP (V1 = {d}, V2 = {c, d, e, f}, V3 = {a, b, d}, V4 = {a, b, c}) with its solutions, annotated "In every solution". Legible entries: V1 ← d; V2 ← c or V2 ← {d, e, f}; V3 ← a or b; V4 ← b or a.]

  • Captures the idea of symmetry between solutions
  • Functional interchangeability
    • Any mapping between two solutions
    • Including permutation of values across variables, equivalent to graph isomorphism
  • Full interchangeability (FI)
    • Restricted to values of a single variable
    • Also, likely intractable
Value interchangeability [Freuder, 91]
  • Full Interchangeability (FI):
    • d, e, f interchangeable for V2 in any solution
  • Neighborhood Interchangeability (NI):
    • Considers only the neighborhood of the variable
    • Finds e, f but misses d
    • Efficiently approximates FI
    • Discrimination tree DT(V2)
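
Neighborhood interchangeability can be illustrated with a small sketch. Instead of building a discrimination tree, it groups values by a hashed "signature" (the set of neighbor values each value is consistent with), which yields the same partition. The constraints are an assumption ("not equal" on edges V1-V3, V2-V3, V2-V4, V3-V4), and `ni_sets` is a hypothetical name.

```python
from collections import defaultdict

DOMAINS = {
    "V1": {"d"},
    "V2": {"c", "d", "e", "f"},
    "V3": {"a", "b", "d"},
    "V4": {"a", "b", "c"},
}
# ASSUMPTION: "not equal" constraints between each variable and its neighbors.
NEIGHBORS = {
    "V1": ["V3"],
    "V2": ["V3", "V4"],
    "V3": ["V1", "V2", "V4"],
    "V4": ["V2", "V3"],
}


def ni_sets(var, domains, neighbors):
    """Partition the domain of `var` into neighborhood-interchangeable sets."""
    groups = defaultdict(set)
    for value in domains[var]:
        # Signature: every (neighbor, value) pair consistent with `value`.
        signature = frozenset(
            (other, x)
            for other in neighbors[var]
            for x in domains[other]
            if x != value
        )
        groups[signature].add(value)
    return sorted(sorted(group) for group in groups.values())


print(ni_sets("V2", DOMAINS, NEIGHBORS))
```

With the assumed constraints, the partition for V2 is {c}, {d}, {e, f}: d is kept apart from e and f because V3 still contains d.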

[Figure: the example CSP with V1 = {d}, V2 = {c, d, e, f}, V3 = {a, b, d}, V4 = {a, b, c}.]

Outline
  • Definitions
  • Bundling in CSPs
    • Static bundling
    • Dynamic bundling
    • Dynamic bundling for non-binary CSPs
  • Bundling for join query computation
  • Conclusions
Bundling: using NI in search

[Figure: static bundling on the example CSP. From the root S, V1 takes d and V2 branches into c, {e, f}, and d. Bundled solution shown: V1 ← d, V2 ← {e, f}, V3 ← a, V4 ← {b, c}.]

  • Static bundling [Haselböck, 93]
    • Before search: compute & store NI sets
    • During search:
      • Future variables: remove bundle of equivalent values
      • Current variable: assign a bundle of equivalent values
  • Advantages
    • Reduces search space
    • Creates bundled solutions
Dynamic bundling (DynBndl) [2001]

[Figure: search trees for static and dynamic bundling on the example CSP. After V1 ← d, static bundling branches V2 into c, {e, f}, and d; dynamic bundling branches V2 into c and {d, e, f}.]

  • Dynamically identifies NI
  • Using discrimination tree for forward checking:
    • is never less efficient than BT & static bundling
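
The effect of recomputing NI at each node can be sketched as follows. This is a simplified illustration, not the DynBndl implementation: it assumes "not equal" constraints on edges V1-V3, V2-V3, V2-V4, V3-V4, computes interchangeability against future neighbors only, and groups values by hashed signatures rather than a discrimination tree. The names `ni_partition` and `dyn_bundle_search` are hypothetical.

```python
from collections import defaultdict

DOMAINS = {
    "V1": {"d"},
    "V2": {"c", "d", "e", "f"},
    "V3": {"a", "b", "d"},
    "V4": {"a", "b", "c"},
}
# ASSUMPTION: "not equal" constraints between each variable and its neighbors.
NEIGHBORS = {
    "V1": ["V3"],
    "V2": ["V3", "V4"],
    "V3": ["V1", "V2", "V4"],
    "V4": ["V2", "V3"],
}


def ni_partition(var, domains, future_neighbors):
    """Group current values of `var` that prune future domains identically."""
    groups = defaultdict(set)
    for value in domains[var]:
        signature = frozenset(
            (other, x)
            for other in future_neighbors
            for x in domains[other]
            if x != value
        )
        groups[signature].add(value)
    return sorted((frozenset(g) for g in groups.values()), key=sorted)


def dyn_bundle_search(domains, neighbors, order, depth=0, bundles=None):
    """Yield solution bundles: one set of values per variable."""
    bundles = {} if bundles is None else bundles
    if depth == len(order):
        yield dict(bundles)
        return
    var = order[depth]
    future = [n for n in neighbors[var] if n not in bundles]
    for bundle in ni_partition(var, domains, future):
        # All values in a bundle prune the same values, so one suffices.
        representative = next(iter(bundle))
        new_domains = dict(domains)
        for other in future:
            new_domains[other] = {x for x in domains[other] if x != representative}
        if any(not new_domains[other] for other in future):
            continue  # domain wipe-out: the whole bundle is a no-good
        bundles[var] = bundle
        yield from dyn_bundle_search(new_domains, neighbors, order, depth + 1, bundles)
        del bundles[var]
```

With the ordering V1, V2, V3, V4 this sketch returns four bundles covering all 14 solutions, including V1 ← {d}, V2 ← {d, e, f}, V3 ← {a}, V4 ← {b, c}: once V1 ← d has removed d from V3, the value d of V2 joins e and f.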

[Figure: discrimination trees for V2 under static and dynamic bundling, with nodes such as <V3,a>, <V3,b>, <V3,d>, <V4,a>, <V4,b>, <V4,c> and leaves V2,{c} and V2,{d,e,f}.]

Non-binary CSPs

[Figure: constraint network with variables V, V1, V2, V3, V4 and constraints C1, C2, C3, C4. One domain is {1, 2, 3, 4, 5, 6}; the other four are {1, 2, 3}.]

  • Scope(Cx): the set of variables involved in Cx
  • Arity(Cx): size of scope

Computing NI for non-binary CSPs is not a trivial extension from binary CSPs

NI for non-binary CSPs [2003,2005]

[Figure: the same constraint network. The domain {1, 2, 3, 4, 5, 6} of V is partitioned into the NI sets {1, 2}, {3, 4}, {5}, {6}.]

  • Building an nb-DT for each constraint
    • Determines the NI sets of a variable given a constraint
  • Intersecting partitions from nb-DTs
    • Yields NI sets of V (a partition of D_V, the domain of V)
  • Processing paths in nb-DTs
    • Gives, for free, updates necessary for forward checking
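
The two steps above (one partition per constraint, then intersection) can be sketched as follows. The sketch groups values by hashed support sets rather than building nb-DTs, and the constraint tuples used in the usage example are hypothetical, chosen only so that the partitions match the sets in the figure. The function names are illustrative.

```python
from collections import defaultdict


def partition_for_constraint(var, scope, tuples, domain):
    """Partition `domain` of `var` by the tuples of one constraint.

    Two values are equivalent if they appear with exactly the same
    combinations of values for the other variables in the scope.
    """
    idx = scope.index(var)
    supports = {value: set() for value in domain}
    for t in tuples:
        if t[idx] in supports:
            supports[t[idx]].add(t[:idx] + t[idx + 1:])
    groups = defaultdict(set)
    for value, support in supports.items():
        groups[frozenset(support)].add(value)
    return [frozenset(g) for g in groups.values()]


def intersect_partitions(p, q):
    """Coarsest partition that refines both p and q."""
    return [a & b for a in p for b in q if a & b]


# Hypothetical constraints over V with domain {1, ..., 6}.
DOMAIN_V = {1, 2, 3, 4, 5, 6}
C1 = (("V", "W"), {(1, "p"), (2, "p"), (3, "q"), (4, "q"), (5, "r"), (6, "r")})
C2 = (("V", "X", "Y"), {(1, 0, 0), (2, 0, 0), (3, 0, 1), (4, 0, 1), (5, 1, 0), (6, 1, 1)})

p1 = partition_for_constraint("V", C1[0], C1[1], DOMAIN_V)
p2 = partition_for_constraint("V", C2[0], C2[1], DOMAIN_V)
ni_of_v = sorted(sorted(block) for block in intersect_partitions(p1, p2))
```

Here `p1` is {1, 2}, {3, 4}, {5, 6}, `p2` is {1, 2}, {3, 4}, {5}, {6}, and their intersection gives the NI sets {1, 2}, {3, 4}, {5}, {6}.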

[Figure: nb-DT(V, C1) and nb-DT(V, C2), each drawn from its root. One tree has leaves {1, 2}, {3, 4}, {5, 6}; the other has leaves {1, 2}, {3, 4}, {5}, {6}.]

Robust solutions

Single solution: V1 ← d, V2 ← e, V3 ← a, V4 ← c
Static bundling: V1 ← d, V2 ← {e, f}, V3 ← a, V4 ← {b, c}

  • Solution bundle
    • Cartesian product of domain bundles
    • Compact representation
    • Robust solutions
  • Dynamic bundling finds larger bundles
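
A solution bundle stands for the Cartesian product of its per-variable bundles. A minimal sketch of expanding one, using the dynamic bundle from the slides (`expand_bundle` is a hypothetical name):

```python
from itertools import product


def expand_bundle(bundle, order):
    """List every individual solution represented by a solution bundle."""
    choices = [sorted(bundle[var]) for var in order]
    return [dict(zip(order, values)) for values in product(*choices)]


# Dynamic bundle from the slides.
BUNDLE = {"V1": {"d"}, "V2": {"d", "e", "f"}, "V3": {"a"}, "V4": {"b", "c"}}
solutions = expand_bundle(BUNDLE, ["V1", "V2", "V3", "V4"])
```

The bundle stores 7 values yet represents 1 x 3 x 1 x 2 = 6 solutions; if one value (say e for V2) becomes unavailable, another solution from the same bundle can be substituted without further search.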

Dynamic bundling: V1 ← d, V2 ← {d, e, f}, V3 ← a, V4 ← {b, c}

DynBndl: worth the effort?
  • Finds larger bundles
  • Enables forward checking at no extra cost
  • Does not cost more than BT or static bundling
    • Cost model:
      • # nodes visited by search
      • # constraint checks made
    • Theoretical guarantee holds
      • for finding all solutions
      • under same variable ordering
  • Finding the first solution?
    • Experiments uncover an unexpected benefit
Bundling of no-goods…
  • … is particularly effective

[Figure: search tree over the non-binary example, contrasting a no-good bundle with a solution bundle.]

Experimental set-up

[Figure: cost of solving versus order parameter; a critical value separates mostly solvable instances from mostly unsolvable instances.]

  • CSP parameters:
    • n: number of variables {20,30}
    • a: domain size {10,15}
    • t: constraint tightness [25%, 75%]
    • CR: constraint ratio (arity: 2, 3, 4)
    • 1,000 instances per tightness value
  • Phase transition
  • Performance measures
    • Nodes visited (NV)
    • Constraint checks (CC)
    • CPU time
    • First Bundle Size (FBS)
Empirical evaluations
  • DynBndl versus FC (BT + forward checking)
  • Randomly generated problems, Model B
  • Experiments
    • Effect of varying tightness
    • In the phase-transition region
      • Effect of varying domain size
      • Effect of varying constraint ratio (CR)
  • ANOVA to statistically compare performance of DynBndl and FC with varying t
  • t-distribution for confidence intervals
Analysis: Varying tightness
  • Low tightness
    • Large FBS
      • 33 at t=0.35
      • 2254 (Dataset #13, t=0.35)
    • Small additional cost
  • Phase transition
    • Multiple solutions present
    • Maximum no-good bundling causes max savings in CPU time, NV, & CC
  • High tightness
    • Problems mostly unsolvable
    • Overhead of bundling minimal

[Figure: CPU time [sec] and NV [hundreds] versus tightness (0.325 to 0.6) for FC and DynBndl; n=20, a=15, CR=CR3.]

t      FBS
0.350  33.44
0.400  10.91
0.425  7.13
0.437  6.38
0.450  5.62
0.462  2.37
0.475  0.66
0.500  0.03
0.550  0.00

Analysis: Varying domain size
  • Increasing a in phase-transition
    • FBS increases: More chances for symmetry
    • CPU time decreases: more bundling of no-goods

Increasing a (n=30)

Because the benefits of DynBndl increase with increasing domain size, DynBndl is particularly interesting for database applications where large domains are typical

Outline
  • Definitions
  • Bundling in CSPs
  • Bundling for join query computation
    • Idea
    • A CSP model for the query join
    • Sorting-based bundling algorithm
    • Dynamic-bundling-based join algorithm
  • Conclusions
The join query

Join query:

  • SELECT R2.A,R2.B,R2.C
  • FROM R1,R2
  • WHERE R1.A=R2.A
  • AND R1.B=R2.B
  • AND R1.C=R2.C

Result (compacted): 10 tuples in 3 nested tuples

A        B              C
{1, 5}   {12, 13, 14}   {23}
{2, 4}   {10}           {25}
{6}      {13, 14}       {27}

[Figure: input relations R1 and R2; their contents are not preserved.]


Databases & CSPs
  • Same computational problems, different cost models
    • Databases: minimize # I/O operations
    • CSP community: # CPU operations
  • Challenges for using CSP techniques in DB
    • Use of lighter data structures to minimize memory usage
    • Fit in the iterator model of database engines
Modeling join query as a CSP

[Figure: constraint network over the variables R1.A, R1.B, R1.C, R2.A, R2.B, R2.C, with the relations R1 and R2 as constraints.]

  • Attributes of relations → CSP variables
  • Attribute values → variable domains
  • Relations → relational constraints
  • Join conditions → join-condition constraints
  • SELECT R1.A,R1.B,R1.C
  • FROM R1,R2
  • WHERE R1.A=R2.A
  • AND R1.B=R2.B
  • AND R1.C=R2.C
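
Read as a CSP, the query asks for all assignments to the six attribute variables that satisfy both relational constraints and the three equality constraints. A deliberately naive sketch (equivalent to a nested-loop join) makes the mapping explicit; the relation contents and the name `join_as_csp` are hypothetical.

```python
def join_as_csp(r1, r2):
    """Solve the join CSP by enumeration.

    Variables: R1.A, R1.B, R1.C, R2.A, R2.B, R2.C.
    Relational constraints: (R1.A, R1.B, R1.C) in r1 and (R2.A, R2.B, R2.C) in r2.
    Join-condition constraints: R1.A = R2.A, R1.B = R2.B, R1.C = R2.C.
    """
    solutions = []
    for t1 in r1:          # t1 satisfies the relational constraint of R1
        for t2 in r2:      # t2 satisfies the relational constraint of R2
            if t1[0] == t2[0] and t1[1] == t2[1] and t1[2] == t2[2]:
                solutions.append(t1)  # SELECT R1.A, R1.B, R1.C
    return solutions


# Hypothetical relation contents.
R1 = [(1, 12, 23), (2, 10, 25), (6, 13, 27)]
R2 = [(1, 12, 23), (6, 13, 27), (4, 10, 25)]
result = join_as_csp(R1, R2)
```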
Join operator
  • R1 ⋈_{xθy} R2 (join of R1 and R2 on the condition x θ y)
    • Most expensive operator in terms of I/O
    • θ is “=” → Equi-Join
      • x is same as y → Natural Join
  • Join algorithms
    • Nested Loop
    • Sorting-based
      • Sort-Merge, Progressive Merge-Join (PMJ)
      • Partitions relations by sorting, minimizes # scans of relations
    • Hashing-based
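
For comparison with the nested-loop approach, here is a minimal in-memory sketch of a sort-merge join for the case where all attributes are equated, as in the example query. It ignores the external-sorting and I/O aspects that matter in a database engine; `sort_merge_join` is a hypothetical name.

```python
def sort_merge_join(r1, r2):
    """Join two lists of tuples on equality of all attributes."""
    r1, r2 = sorted(r1), sorted(r2)
    i = j = 0
    out = []
    while i < len(r1) and j < len(r2):
        if r1[i] < r2[j]:
            i += 1
        elif r1[i] > r2[j]:
            j += 1
        else:
            key = r1[i]
            # Find the run of tuples equal to `key` on each side.
            i_end = i
            while i_end < len(r1) and r1[i_end] == key:
                i_end += 1
            j_end = j
            while j_end < len(r2) and r2[j_end] == key:
                j_end += 1
            # Each pair of matching tuples contributes one result tuple.
            out.extend([key] * ((i_end - i) * (j_end - j)))
            i, j = i_end, j_end
    return out
```

After sorting, each relation is scanned once, which is the property the sorting-based algorithms exploit to limit the number of scans.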
Progressive Merge Join
  • PMJ: a sort-merge algorithm [Dittrich et al. 03]
  • Two phases
    • Sorting phase: sorts sub-sets of relations
    • Merging phase: merges sorted sub-sets
  • PMJ produces early results
  • We use the framework of the PMJ
New join algorithm
  • Sorting & merging phases
    • Load sub-sets of relations in memory
    • Compute in-memory join using dynamic bundling
      • Uses sorting-based bundling (shown next)
      • Computes join of in-memory relations using dynamically computed bundles
Sorting-based bundling

  • Heuristic for variable ordering
    • Place variables linked by join conditions as close to each other as possible
  • Sort relations using above ordering
  • Next: Compute bundles of variable ahead in variable ordering (R1.A)

[Figure: variables listed as R1.A, R2.A, R1.B, R2.B, R1.C, R2.C, grouped under the relations R1 and R2.]
Computing a bundle of R1.A
  • Partition of a constraint
    • Tuples of the relation having the same value of R1.A
  • Compare projected tuples of first partition with those of another partition
  • Compare with every other partition to get complete bundle
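
The partition comparison described above can be sketched on the R1 tuples from this slide. For brevity the sketch hashes the projected partitions instead of comparing sorted partitions pairwise, so it is a stand-in for the sorting-based procedure rather than a reproduction of it; `bundles_of_first_attribute` is a hypothetical name.

```python
from collections import defaultdict


def bundles_of_first_attribute(relation):
    """Group values of the first attribute whose partitions are symmetric."""
    # Partition: tuples sharing a value of the first attribute,
    # projected on the remaining attributes.
    partitions = defaultdict(set)
    for first, *rest in relation:
        partitions[first].add(tuple(rest))
    # Values with identical projected partitions form one bundle.
    groups = defaultdict(set)
    for value, projected in partitions.items():
        groups[frozenset(projected)].add(value)
    return sorted(sorted(group) for group in groups.values())


# R1 as shown on the slide (attributes A, B, C).
R1 = [
    (1, 12, 23), (1, 13, 23), (1, 14, 23),
    (2, 10, 25),
    (5, 12, 23), (5, 13, 23), (5, 14, 23),
]
bundles = bundles_of_first_attribute(R1)
```

The partitions for A = 1 and A = 5 project to the same (B, C) tuples, so the values 1 and 5 are bundled; A = 2 stays alone.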

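R1:

A    B    C
1    12   23
1    13   23
1    14   23
2    10   25
5    12   23
5    13   23
5    14   23

The tuples with A = 1 form one partition. Projected on (B, C), the partitions for A = 1 and A = 2 are unequal, while those for A = 1 and A = 5 are symmetric. Bundle: {1, 5}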

Finding the valid bundle

Common values: {1, 5}

  • Compute a bundle for the attribute
  • Check bundle validity with future constraints
  • If there is no common value: ‘backtrack’

→ Assign variable with the surviving values in the bundle

[Figure: the bundle is checked against the sets {1, 5, x} and {1, 5, y, z}.]
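
One plausible reading of the validity check is a set intersection: keep only the values that every involved constraint also accepts, and backtrack when nothing survives. The sketch below assumes that reading; `valid_bundle` is a hypothetical name and the sets are the ones from the figure.

```python
def valid_bundle(bundle, *other_bundles):
    """Return the values of `bundle` that survive in all other bundles.

    An empty result signals that the search should backtrack.
    """
    common = set(bundle)
    for other in other_bundles:
        common &= set(other)
    return common


surviving = valid_bundle({1, 5, "x"}, {1, 5, "y", "z"})
```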

Experiments
  • XXL library for implementation & evaluation
  • Data sets
    • Random: 2 relations R1, R2 with same schema as example
        • Each relation: 10,000 tuples
        • Memory size: 4,000 tuples
        • Page size: 200 tuples
    • Real-world problem: 3 relations, 4 attributes
  • Compaction rate achieved
    • Random problem: 1.48
        • Savings even with (very) preliminary implementation
    • Real-world problem: 2.26 (69 tuples in 32 nested tuples)
Outline
  • Definitions
  • Bundling in CSPs
  • Bundling for join query computation
  • Conclusions
    • Summary
    • Future research
Summary
  • Dynamic bundling in finite CSPs
    • Binary and non-binary constraints
    • Produces multiple robust solutions
    • Significantly reduces cost of search at phase transition
  • Application to join-query computation

Constraint Processing inspires innovative solutions to fundamental difficult problems in Databases

Future research
  • CSPs
    • Only scratched the surface:
    • interchangeability + decomposition [ECAI 1996],
    • partial interchangeability [AAAI 1998],
    • tractable structures
  • Databases
    • Investigate benefit of bundling
      • Sampling operator
      • Main-memory databases
      • Automatic categorization of query results
  • Constraint databases
    • Design bundling mechanisms for gap & linear constraints over intervals (spatial databases)