Probabilistic Similarity Query on Dimension Incomplete Data
Wei Cheng1, Xiaoming Jin1, and Jian-Tao Sun2
Intelligent Data Engineering Group, School of Software, Tsinghua University1
Microsoft Research Asia2
ICDM 2009, Miami
Outline: Motivation & Problem; Our Solution; Experiments
To process a similarity query, imputation is necessary, i.e., the missing data must be “completed” by filling in specific values.
For an m-dimensional data object with n elements missing, there are $C_m^n = \binom{m}{n}$ possible ways to recover it, one for each choice of which n dimensions are missing.
Example: a 3-dimensional object that has lost one dimension
has $\binom{3}{1} = 3$ possible results after data recovery.
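The count above can be checked by brute-force enumeration. The following is a minimal sketch. It assumes that an observed part is an ordered tuple, that the observed values keep their relative order in every recovery version, and that `None` marks a missing dimension. The function name `recovery_versions` and the sample values `(4, 9)` are hypothetical.

```python
from itertools import combinations
from math import comb
from typing import Iterator, Optional, Sequence, Tuple


def recovery_versions(x_obs: Sequence[float], m: int) -> Iterator[Tuple[Optional[float], ...]]:
    """Yield every m-dimensional recovery version of the observed part x_obs.

    Observed values keep their relative order; None marks a missing dimension.
    """
    n = m - len(x_obs)  # number of missing dimensions
    if n < 0:
        raise ValueError("x_obs has more values than m dimensions")
    # One recovery version per choice of the n missing positions.
    for missing in combinations(range(m), n):
        missing_set = set(missing)
        values = iter(x_obs)
        yield tuple(None if d in missing_set else next(values) for d in range(m))


# A 3-dimensional object that lost one dimension: C(3, 1) = 3 versions.
print(list(recovery_versions((4, 9), 3)))  # [(None, 4, 9), (4, None, 9), (4, 9, None)]

# A 5-dimensional object with observed part (2, 8, 7): C(5, 2) = 10 versions.
versions = list(recovery_versions((2, 8, 7), 5))
print(len(versions), comb(5, 2))         # 10 10
print((2, None, None, 8, 7) in versions)  # True
```

Enumeration grows as $\binom{m}{n}$, so it is only practical for small n. It serves here to make the count concrete.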
Lower/upper bounds of the observed part, denoted by $\delta_{LB}^{obs}$ and $\delta_{UB}^{obs}$.
Lower/upper bounds of the missing part, denoted by $\delta_{LB}^{mis}$ and $\delta_{UB}^{mis}$.
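One way to compute these four bounds is sketched below. This is an illustration under stated assumptions, not the authors' procedure. It assumes squared Euclidean distance to a complete query Q of length m, and that the observed values keep their order in every recovery version. It also assumes that each missing value lies in a known range [lo_val, hi_val]; this range and the names `observed_bounds` and `missing_bounds` are hypothetical.

The observed-part bounds come from a dynamic program over order-preserving placements of $X_{obs}$ into the positions of Q. The missing-part bounds take the n cheapest or most expensive query positions, since any n positions may be the missing ones.

```python
from typing import Sequence, Tuple


def observed_bounds(x_obs: Sequence[float], q: Sequence[float]) -> Tuple[float, float]:
    """Min/max squared distance between x_obs and q over all order-preserving placements."""
    k, m = len(x_obs), len(q)
    if k > m:
        raise ValueError("x_obs cannot have more values than q")
    inf = float("inf")
    # lo[i][j] / hi[i][j]: min / max cost of placing x_obs[:i], in order, into q[:j].
    lo = [[inf] * (m + 1) for _ in range(k + 1)]
    hi = [[-inf] * (m + 1) for _ in range(k + 1)]
    lo[0] = [0.0] * (m + 1)  # placing no values costs nothing
    hi[0] = [0.0] * (m + 1)
    for i in range(1, k + 1):
        for j in range(i, m + 1):
            cost = (x_obs[i - 1] - q[j - 1]) ** 2
            # Either dimension j-1 is missing, or x_obs[i-1] sits at dimension j-1.
            lo[i][j] = min(lo[i][j - 1], lo[i - 1][j - 1] + cost)
            hi[i][j] = max(hi[i][j - 1], hi[i - 1][j - 1] + cost)
    return lo[k][m], hi[k][m]


def missing_bounds(q: Sequence[float], n: int, lo_val: float, hi_val: float) -> Tuple[float, float]:
    """Min/max squared distance contributed by n missing values assumed to lie in [lo_val, hi_val]."""
    # Cheapest and most expensive contribution of one missing value at each dimension.
    d_min = [0.0 if lo_val <= v <= hi_val else min((v - lo_val) ** 2, (v - hi_val) ** 2) for v in q]
    d_max = [max((v - lo_val) ** 2, (v - hi_val) ** 2) for v in q]
    # Any n dimensions may be missing, so take the n smallest / n largest contributions.
    return float(sum(sorted(d_min)[:n])), float(sum(sorted(d_max, reverse=True)[:n]))


q = (2, 5, 6, 8, 7)
print(observed_bounds((2, 8, 7), q))  # (0.0, 16.0)
print(missing_bounds(q, 2, 0, 10))    # (0.0, 128.0)
```

For every recovery version, the observed part is at least $\delta_{LB}^{obs}$ and the missing part is at least $\delta_{LB}^{mis}$. The sums $\delta_{LB}^{obs} + \delta_{LB}^{mis}$ and $\delta_{UB}^{obs} + \delta_{UB}^{mis}$ therefore bound the total distance. Because each part is bounded independently, these bounds may be loose.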
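Example recovery version: $X_{rv} = (2, ?, ?, 8, 7)$, where the observed values 2, 8, 7 keep their order and the 2nd and 3rd dimensions are missing.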
Calculated in advance and stored in the database: $O(|X_{obs}|\,(|Q|-|X_{obs}|)^2)$
Calculated during query processing: $O(|Q|)$
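However the bounds are obtained, one natural use is to filter candidates before any exact probability computation. The sketch below assumes squared distances and a query radius r; the names `filter_candidate` and `Verdict` are hypothetical. An object whose combined lower bound exceeds r² cannot be similar under any recovery version. An object whose combined upper bound is at most r² is similar under all of them.

```python
from enum import Enum


class Verdict(Enum):
    PRUNE = "prune"    # even the lower bound exceeds the radius
    ACCEPT = "accept"  # even the upper bound is within the radius
    VERIFY = "verify"  # bounds are inconclusive; an exact check is still needed


def filter_candidate(lb_obs: float, lb_mis: float, ub_obs: float, ub_mis: float,
                     radius: float) -> Verdict:
    """Classify one incomplete object using squared-distance bounds."""
    lb = lb_obs + lb_mis  # no recovery version can be closer than this
    ub = ub_obs + ub_mis  # no recovery version can be farther than this
    threshold = radius ** 2  # the bounds are on squared distance
    if lb > threshold:
        return Verdict.PRUNE
    if ub <= threshold:
        return Verdict.ACCEPT
    return Verdict.VERIFY


print(filter_candidate(0.0, 0.0, 16.0, 128.0, radius=3.0))  # Verdict.VERIFY
```

Only objects labeled VERIFY would need the more expensive similarity-probability computation. How that probability is defined and compared against a confidence threshold is not modeled here.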
Query precision on S&P500 data set
Query recall on S&P500 data set
Query precision on IMAGE data set
Query recall on IMAGE data set
Confidence threshold vs precision-recall
Pruning power of probability triangle inequality
Pruning power of four pruners
Comparison of query quality
Many thanks!