PXML: A Probabilistic Semistructured Data Model and Algebra • Edward Hung, Lise Getoor, V.S. Subrahmanian • University of Maryland, College Park • ICDE, Bangalore, India, Mar 2003
Outline • Motivating example • Semistructured data model • PXML data model • Semantics • Algebra • Probabilistic point query • Related work
Motivating Example • Surveillance applications monitoring a region of a battlefield • An image processing system identifies vehicles in convoys appearing in the region at different times • Convoys • timestamp • tanks, trucks, etc. • Uncertainty • number of vehicles • category and identity of a vehicle, e.g., is it a tank? A T-72?
Motivating Example • Semistructured data model • General hierarchical structure is known. • The schema is not fixed • Number of vehicles • Properties of vehicles • Our work: store uncertain information in probabilistic environments.
Semistructured Data Model • Example
PXML Data Model • Uncertainty in • the existence of sub-objects • the number of sub-objects • the identity of the sub-objects
PXML Data Model (Cardinality) • Example of cardinality: card(convoy2, ts)=[1,1], card(convoy2, truck)=[1,2], i.e., convoy2 has exactly one ts child and one or two truck children (convoy2 timestamp: Time = 15) • Weak instance W = semistructured instance + card
PXML Data Model (Weak Instance) • Example of a weak instance W: card(S1,convoy)=[2,2], card(convoy1,ts)=[1,1], card(convoy1,truck)=[1,1], card(convoy1,tank)=[1,1], card(convoy2,ts)=[1,1], card(convoy2,truck)=[1,2]
PXML Data Model • Example of an instance compatible with W, under the same constraints: card(S1,convoy)=[2,2], card(convoy1,ts)=[1,1], card(convoy1,truck)=[1,1], card(convoy1,tank)=[1,1], card(convoy2,ts)=[1,1], card(convoy2,truck)=[1,2]
D(W)= the set of all semistructured instances compatible with the weak instance W
card(convoy2, ts)=[1,1], card(convoy2, truck)=[1,2] (convoy2 timestamp: Time = 15) • Potential child set of convoy2: PC(convoy2) = {{ts2, truck3, truck4}, {ts2, truck3}, {ts2, truck4}}
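The enumeration of PC(convoy2) from the cardinality constraints can be sketched as follows. This is a minimal illustration, assuming the candidate children for each label are known in advance (ts2 for ts; truck3 and truck4 for truck); the function name `potential_child_sets` and its input format are hypothetical. For each label it takes every subset whose size lies in the card interval, then unions one subset per label.

```python
from itertools import combinations, product


def potential_child_sets(candidates, card):
    """Enumerate potential child sets from per-label candidates and card intervals.

    candidates: label -> set of candidate child objects (hypothetical input format)
    card:       label -> (min, max) cardinality interval
    """
    per_label = []
    for label, kids in candidates.items():
        lo, hi = card[label]
        # every subset of this label's candidates whose size lies in [lo, hi]
        per_label.append([frozenset(c)
                          for k in range(lo, hi + 1)
                          for c in combinations(sorted(kids), k)])
    # pick one allowed subset per label and take their union
    return {frozenset().union(*choice) for choice in product(*per_label)}


pc = potential_child_sets(
    {"ts": {"ts2"}, "truck": {"truck3", "truck4"}},
    {"ts": (1, 1), "truck": (1, 2)},
)
print(sorted(sorted(s) for s in pc))
# [['truck3', 'truck4', 'ts2'], ['truck3', 'ts2'], ['truck4', 'ts2']]
```

The result matches the three sets listed for PC(convoy2).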
Object probability function (OPF) for convoy2 w.r.t. W: a mapping w_convoy2: PC(convoy2) → [0,1] whose values sum to 1, e.g., w_convoy2({ts2, truck3, truck4}) = 0.2, w_convoy2({ts2, truck3}) = 0.5, w_convoy2({ts2, truck4}) = 0.3
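The two conditions on an OPF (it is defined on exactly the potential child sets, and its values form a probability distribution) can be checked mechanically. A small sketch, where `is_opf` is a hypothetical helper and child sets are represented as Python frozensets:

```python
import math


def is_opf(opf, pc):
    """True if opf maps exactly the sets in pc into [0, 1] and its values sum to 1."""
    return (set(opf) == set(pc)
            and all(0.0 <= p <= 1.0 for p in opf.values())
            and math.isclose(sum(opf.values()), 1.0))


pc_convoy2 = {frozenset({"ts2", "truck3", "truck4"}),
              frozenset({"ts2", "truck3"}),
              frozenset({"ts2", "truck4"})}
w_convoy2 = {frozenset({"ts2", "truck3", "truck4"}): 0.2,
             frozenset({"ts2", "truck3"}): 0.5,
             frozenset({"ts2", "truck4"}): 0.3}
print(is_opf(w_convoy2, pc_convoy2))  # True
```

`math.isclose` is used for the sum because the entries are floating-point values.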
Semantics (Local Interpretation) • Interpretation • Local interpretation, p • a mapping that assigns to each non-leaf object o an OPF for o • Example • p(convoy2) = w_convoy2
Semantics (Local Interpretation) • Here the OPF assigns a probability to each possible set of children. • Additional independence assumptions can make the representation more compact, e.g.: • independence between trucks and tanks • all trucks being indistinguishable
Semantics (Global Interpretation) • So far, probabilities were assigned locally, to the actual children of each non-leaf object. • We now assign a probability to each compatible instance globally.
Semantics (Global Interpretation) • Interpretation • Global interpretation, P • a mapping from D(W) (the set of semistructured instances compatible with W) to [0,1] s.t. Σ_{S ∈ D(W)} P(S) = 1
Example: six compatible instances S1a to S1f (figures) with P(S1a) = 0.12, P(S1b) = 0.08, P(S1c) = 0.2, P(S1d) = 0.18, P(S1e) = 0.12, P(S1f) = 0.3, summing to 1
Semantics (Local ↔ Global) • We have defined operators to convert between local and global interpretations. • Theorems (Reversibility) • The conversions from local to global and from global to local interpretations are correct. • The conversion between local and global interpretations is reversible.
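The slides do not spell out the local-to-global conversion. As a hedged sketch, assume P(S) is the product, over the non-leaf objects of S, of the OPF entry for the children each one has in S; this reproduces the arithmetic used later for I7 (e.g., 0.3 * 0.3 * 0.4 = 0.036). The function name `local_to_global` and the instance encoding (a frozenset of object/child-set choices) are hypothetical; the I7 OPFs come from the selection and point-query slides.

```python
def local_to_global(root, local):
    """Enumerate compatible instances with P(S) = product of the OPF entries chosen in S.

    local: non-leaf object -> {frozenset(children): probability}
    Objects missing from `local` are treated as leaves.
    Each instance is returned as a frozenset of (object, chosen child set) pairs.
    """
    worlds = {}

    def expand(pending, chosen, prob):
        # non-leaf objects reached so far that still need a child set
        todo = sorted(o for o in pending if o not in chosen)
        if not todo:
            key = frozenset(chosen.items())
            worlds[key] = worlds.get(key, 0.0) + prob
            return
        obj, rest = todo[0], set(todo[1:])
        for kids, p in local[obj].items():
            if p == 0:
                continue  # zero-probability child sets contribute nothing
            new_pending = rest | {k for k in kids if k in local}
            expand(new_pending, {**chosen, obj: kids}, prob * p)

    expand({root} if root in local else set(), {}, 1.0)
    return worlds


fs = frozenset
local_I7 = {
    "I7": {fs({"convoy1"}): 0.2, fs({"convoy2"}): 0.5, fs({"convoy1", "convoy2"}): 0.3},
    "convoy1": {fs({"tank1"}): 0.3, fs({"tank2"}): 0.7},
    "convoy2": {fs({"tank2"}): 0.4, fs({"tank3"}): 0.6},
}
P = local_to_global("I7", local_I7)
print(len(P), round(sum(P.values()), 10))  # 8 1.0
```

Because each non-leaf object chooses its child set only once, a child shared by two parents (tank2 here) does not cause double counting.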
Algebra • Operators • Projection • Selection • Cross product • Path expression: o.l1.l2…ln, e.g., S1.convoy.truck
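A path expression o.l1.l2…ln can be evaluated on a semistructured instance by following one edge label at a time. A minimal sketch, assuming the instance is stored as a list of (label, child) edges per object; `eval_path` is a hypothetical name, and the object names under convoy1 are hypothetical, chosen only to satisfy the weak-instance cardinalities.

```python
def eval_path(instance, start, labels):
    """Return the objects reached from `start` by following the edge labels in order."""
    frontier = {start}
    for label in labels:
        frontier = {child
                    for obj in frontier
                    for edge_label, child in instance.get(obj, ())
                    if edge_label == label}
    return frontier


# Hypothetical instance: labeled edges (label, child) for each non-leaf object.
S1 = {
    "S1": [("convoy", "convoy1"), ("convoy", "convoy2")],
    "convoy1": [("ts", "ts1"), ("truck", "truck1"), ("tank", "tank1")],
    "convoy2": [("ts", "ts2"), ("truck", "truck3"), ("truck", "truck4")],
}
print(sorted(eval_path(S1, "S1", ["convoy", "truck"])))  # ['truck1', 'truck3', 'truck4']
```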
Algebra (Projection) • Ancestor projection • Descendant projection • Single projection
Algebra (Projection) • Semistructured instance • Ancestor projection
Algebra (Projection) • Globally • Ancestor projection
Algebra (Projection) • Probabilistic instance • Ancestor projection • Before: card(I2,convoy)=[1,1], p(I2)({convoy1})=0.8; card(convoy1,ts)=[1,1], card(convoy1,truck)=[1,1], card(convoy1,tank)=[1,1]; over PC(convoy1): p(convoy1)({ts1,truck1,tank1})=0, p(convoy1)({ts1,truck1,tank2})=0.1, p(convoy1)({ts1,truck2,tank1})=0.3, p(convoy1)({ts1,truck2,tank2})=0.6 • After: card(I2,convoy)=[1,1], card(convoy1,truck)=[1,1]; after normalization, p(I2)({convoy1})=1 • Children of convoy1 before: C_I2(convoy1)={ts1, truck1, truck2, tank1, tank2} • Children of convoy1 after: C'_I2(convoy1)={truck1, truck2} • PC'(convoy1)={{truck1},{truck2}}
Algebra (Projection) • Probabilistic instance • Ancestor projection (continued) • Before projection: p(convoy1)({ts1,truck1,tank1})=0, p(convoy1)({ts1,truck1,tank2})=0.1, p(convoy1)({ts1,truck2,tank1})=0.3, p(convoy1)({ts1,truck2,tank2})=0.6; after normalization, p(I2)({convoy1})=1 • Each new child set collects the probability of every old child set that projects onto it: • For {truck1}, p(convoy1)({truck1}) = 0 + 0.1 = 0.1 • For {truck2}, p(convoy1)({truck2}) = 0.3 + 0.6 = 0.9 • After normalization, p(convoy1)({truck1}) = 0.1, p(convoy1)({truck2}) = 0.9
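The OPF update for ancestor projection (intersect each old child set with the kept children, add up the probabilities that land on the same new set, renormalize) can be sketched as below. `project_opf` is a hypothetical name, and `drop_empty=True` is an assumption: discarding child sets that the projection leaves empty is one way to obtain the slide's renormalization of p(I2)({convoy1}) from 0.8 to 1.

```python
from collections import defaultdict


def project_opf(opf, kept, drop_empty=True):
    """Restrict an OPF to the children kept by a projection.

    Old child sets that shrink to the same set have their probabilities summed,
    and the result is renormalized. drop_empty=True (an assumption) discards
    child sets left empty by the projection. Assumes some kept set has mass > 0.
    """
    merged = defaultdict(float)
    for kids, p in opf.items():
        new_kids = frozenset(kids) & kept
        if new_kids or not drop_empty:
            merged[new_kids] += p
    total = sum(merged.values())
    return {kids: p / total for kids, p in merged.items()}


fs = frozenset
p_convoy1 = {
    fs({"ts1", "truck1", "tank1"}): 0.0,
    fs({"ts1", "truck1", "tank2"}): 0.1,
    fs({"ts1", "truck2", "tank1"}): 0.3,
    fs({"ts1", "truck2", "tank2"}): 0.6,
}
projected = project_opf(p_convoy1, fs({"truck1", "truck2"}))
print({tuple(k): round(p, 3) for k, p in projected.items()})
# {('truck1',): 0.1, ('truck2',): 0.9}
```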
Ancestor Projection • Experiments • Running time is linear in the number of objects (selected objects and their ancestors) • Time to update the OPF entries of an object o is sub-quadratic in the number of OPF entries
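Algebra (Selection) • Probabilistic instance I7: card(I7, convoy)=[1,2], w_I7({convoy1})=0.2, w_I7({convoy2})=0.5, w_I7({convoy1,convoy2})=0.3; card(convoy1, tank)=[1,1], w_convoy1({tank1})=0.3, w_convoy1({tank2})=0.7; card(convoy2, tank)=[1,1], w_convoy2({tank2})=0.4, w_convoy2({tank3})=0.6 • D(I7) contains eight compatible instances, with probabilities 0.06, 0.14, 0.2, 0.3, 0.036, 0.054, 0.084 and 0.126 • The selection keeps the instances in which tank2 appears under I7.convoy.tank: 0.14 + 0.2 + 0.036 + 0.084 + 0.126 = 0.586 • The instances without tank2 (0.06, 0.054, 0.3) are dropped, and each kept instance is renormalized: 0.14/0.586, 0.2/0.586, 0.036/0.586, 0.084/0.586, 0.126/0.586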
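Selection over the global interpretation keeps the compatible instances that satisfy the condition and rescales them by the kept mass. A minimal sketch over an explicit enumeration of D(I7), assuming the condition is "tank2 appears under I7.convoy.tank" (inferred from which probabilities are kept); `instances_of_I7` and `select` are hypothetical names, and the enumeration is hard-wired to I7's two-level structure.

```python
from itertools import product

# OPFs of I7 as given on the slide
w_I7 = {("convoy1",): 0.2, ("convoy2",): 0.5, ("convoy1", "convoy2"): 0.3}
w_tank = {"convoy1": {"tank1": 0.3, "tank2": 0.7},
          "convoy2": {"tank2": 0.4, "tank3": 0.6}}


def instances_of_I7():
    """Yield (convoy -> tank, probability) for every instance in D(I7)."""
    for convoys, p in w_I7.items():
        # card(convoy, tank) = [1,1]: each present convoy picks exactly one tank
        for picks in product(*(w_tank[c].items() for c in convoys)):
            prob = p
            for _, q in picks:
                prob *= q
            yield dict(zip(convoys, (tank for tank, _ in picks))), prob


def select(instances, condition):
    """Keep instances satisfying `condition`; renormalize by the kept mass."""
    kept = [(s, p) for s, p in instances if condition(s)]
    mass = sum(p for _, p in kept)
    return [(s, p / mass) for s, p in kept], mass


selected, mass = select(instances_of_I7(), lambda s: "tank2" in s.values())
print(len(selected), round(mass, 3))  # 5 0.586
```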
Algebra (Cross product (x)) • I4: card(I4, truck)=[1,1], p(I4)({truck1})=0.2, p(I4)({truck2})=0.8 • I5: card(I5, tank)=[1,1], p(I5)({tank1})=0.1, p(I5)({tank2})=0.9 • I6 = I4 x I5: card(I6, truck)=[1,1], card(I6, tank)=[1,1] • p(I6)({truck1, tank1}) = 0.2*0.1 = 0.02 • p(I6)({truck1, tank2}) = 0.2*0.9 = 0.18 • p(I6)({truck2, tank1}) = 0.8*0.1 = 0.08 • p(I6)({truck2, tank2}) = 0.8*0.9 = 0.72
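At the level of the new root, the cross product pairs every child set of I4's root with every child set of I5's root and multiplies their probabilities. A minimal sketch of that root-OPF computation, assuming the two roots have disjoint children; `cross_product_opf` is a hypothetical name.

```python
def cross_product_opf(opf1, opf2):
    """OPF for the root of I1 x I2: union each pair of child sets, multiply probabilities.

    Assumes the two roots have disjoint children, so every pair yields a distinct set.
    """
    return {k1 | k2: p1 * p2 for k1, p1 in opf1.items() for k2, p2 in opf2.items()}


fs = frozenset
p_I4 = {fs({"truck1"}): 0.2, fs({"truck2"}): 0.8}
p_I5 = {fs({"tank1"}): 0.1, fs({"tank2"}): 0.9}
p_I6 = cross_product_opf(p_I4, p_I5)
print(round(p_I6[fs({"truck2", "tank2"})], 2))  # 0.72
```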
Probabilistic Point Query • Instance I7: card(I7, convoy)=[1,2], w_I7({convoy1})=0.2, w_I7({convoy2})=0.5, w_I7({convoy1,convoy2})=0.3; card(convoy1, tank)=[1,1], w_convoy1({tank1})=0.3, w_convoy1({tank2})=0.7; card(convoy2, tank)=[1,1], w_convoy2({tank2})=0.4, w_convoy2({tank3})=0.6 • Probabilities of the instances in D(I7): 0.036, 0.06, 0.054, 0.14, 0.084, 0.2, 0.3, 0.126 • Summing over the instances in which tank2 is reached via I7.convoy.tank: 0.14 + 0.2 + 0.036 + 0.084 + 0.126 = 0.586
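Probabilistic Point Query • Instance I7: card(I7, convoy)=[1,2], w_I7({convoy1})=0.2, w_I7({convoy2})=0.5, w_I7({convoy1,convoy2})=0.3; card(convoy1, tank)=[1,1], w_convoy1({tank1})=0.3, w_convoy1({tank2})=0.7; card(convoy2, tank)=[1,1], w_convoy2({tank2})=0.4, w_convoy2({tank3})=0.6 • Computed directly from the OPFs: 0.2*0.7 + 0.5*0.4 + 0.3*(1-(1-0.7)*(1-0.4)) = 0.14 + 0.2 + 0.246 = 0.586 • The third term covers the case where both convoys are present: tank2 is reached unless neither convoy has it.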
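The closed-form point query above can be written as a short function. This sketch assumes a two-level query (root, then one child per convoy), with the convoys choosing their tanks independently as in the local interpretation; `point_query` and `p_tank2` are hypothetical names, and `math.prod` requires Python 3.8 or later.

```python
import math

# OPF of I7 and, for each convoy, the probability that its tank child is tank2
w_I7 = {frozenset({"convoy1"}): 0.2,
        frozenset({"convoy2"}): 0.5,
        frozenset({"convoy1", "convoy2"}): 0.3}
p_tank2 = {"convoy1": 0.7, "convoy2": 0.4}


def point_query(w_root, p_hit):
    """Sum over child sets C of w(C) * (1 - prod over c in C of (1 - p_hit[c]))."""
    return sum(w * (1 - math.prod(1 - p_hit[c] for c in kids))
               for kids, w in w_root.items())


print(round(point_query(w_I7, p_tank2), 3))  # 0.586
```

Unlike summing over D(I7), this touches each OPF entry once instead of enumerating every compatible instance.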
Related Work • A companion paper on the interval-probability version appears in ICDT 2003: • Semantics • Interpretations • Satisfaction • Consistency • Query and r-answer (objects satisfying the query with a minimum probability of at least r)
Related Work • Semistructured Probabilistic Objects (SPOs) (Dekhtyar, Goldsmith, Hawkes, SSDBM 2001) • SPOs express contexts (not random variables) in a semistructured manner • The PXML data model stores both XML data and probabilistic information.
Related Work • ProTDB (Nierman, Jagadish, VLDB 2002) • ProTDB assigns independent probabilities to each child vs. our arbitrary distributions over sets of children • ProTDB is tree-structured vs. our arbitrary acyclic structure • Our model theory provides two formal semantics • We propose a set of algebraic operators and a probabilistic point query
Questions and Answers Thank you very much!
Future Work • System implementation • Query optimization
Related Work • ProTDB (Nierman, Jagadish, VLDB 2002) • Point probabilities vs. interval probabilities • Independent probabilities assigned to each child vs. arbitrary distributions over sets of children • Tree-structured vs. arbitrary acyclic • Our model theory provides two formal semantics • Their queries differ from our algebra and query.
Summary • PXML data model • Semistructured instance • Weak instance (add cardinality) • Probabilistic instance (add OPF) • Semantics • Local and global • Interpretation • Satisfaction • Algebra • Projections, selection, cross product
Algebra (Projection) • Equivalence of projection expressions • e1 and e2 each denote a sequence of zero or more edges. Thus, I.e1.lm can match I.lm, I.l1.lm, I.l2.l3.lm, etc.
Algebra (Cross product) • Equivalence • (I1 x I2) x I3, I1 x (I2 x I3) and (I1 x I3) x I2 are equivalent
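The stated equivalences can be checked numerically for the new root's OPF, since set union and multiplication are both associative and commutative. This sketch reuses the OPFs of I4 and I5 from the cross-product slide as the first two operands and adds a hypothetical third instance with disjoint children; `cross_product_opf` and `same_opf` are hypothetical helpers, and the check covers only the root OPF, not the full instances.

```python
def cross_product_opf(opf1, opf2):
    """Root OPF of a cross product: union of child sets, product of probabilities."""
    return {k1 | k2: p1 * p2 for k1, p1 in opf1.items() for k2, p2 in opf2.items()}


def same_opf(a, b, tol=1e-12):
    """Equal keys and probabilities equal up to floating-point rounding."""
    return set(a) == set(b) and all(abs(a[k] - b[k]) <= tol for k in a)


fs = frozenset
p_A = {fs({"truck1"}): 0.2, fs({"truck2"}): 0.8}  # I4 from the cross-product slide
p_B = {fs({"tank1"}): 0.1, fs({"tank2"}): 0.9}    # I5 from the cross-product slide
p_C = {fs({"ts1"}): 0.5, fs({"ts2"}): 0.5}        # hypothetical third instance

left = cross_product_opf(cross_product_opf(p_A, p_B), p_C)
right = cross_product_opf(p_A, cross_product_opf(p_B, p_C))
swapped = cross_product_opf(cross_product_opf(p_A, p_C), p_B)
print(same_opf(left, right), same_opf(left, swapped))  # True True
```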
Related Work • Bayesian networks (Pearl, 1988) • model random variables (probabilities of events) • ours: the existence of a child requires the existence of its parent