
Part-2 Qualifying Exam

Part-2 Qualifying Exam. Jiaqi Ge, Department of Computer and Information Science, Indiana University Purdue University Indianapolis, June 20, 2011.


Presentation Transcript


  1. Part-2 Qualifying Exam Jiaqi Ge Department of Computer and Information Science Indiana University Purdue University Indianapolis June 20, 2011

  2. Roadmap • D. Heckerman, “A Tutorial on Learning with Bayesian Networks.” • P. Sen, A. Deshpande, “Representing and Querying Correlated Tuples in Probabilistic Databases.” • H. Kriegel, M. Pfeifle, “Density-Based Clustering of Uncertain Data.” • W. DuMouchel, D. Pregibon, “Empirical Bayes Screening for Multi-Item Associations.” • C. C. Aggarwal, P. Yu, “A Survey of Uncertain Data Algorithms and Applications.”

  3. A Tutorial on Learning with Bayesian Networks David Heckerman Technical Report MSR-TR-95-06

  4. Bayesian Networks • A joint probability distribution over a set of variables • Probabilistic inference • Learning parameters p(θs | D, Sh) from data • No missing data • Incomplete data
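
For the "no missing data" case on the slide above, the parameter posterior for a single node and a single parent configuration has a closed form under a Dirichlet prior. The following is a minimal sketch under that assumption; the function name `posterior_mean`, the symmetric prior, and the toy observations are illustrative and do not come from the tutorial.

```python
from collections import Counter

def posterior_mean(observations, states, prior_count=1.0):
    """Posterior mean of a multinomial parameter vector.

    Assumes a symmetric Dirichlet prior that adds `prior_count`
    pseudo-observations to every state, and complete data.
    """
    counts = Counter(observations)
    total = len(observations) + prior_count * len(states)
    return {s: (counts[s] + prior_count) / total for s in states}

# Toy data: four complete observations of a binary variable.
theta = posterior_mean(["yes", "no", "yes", "yes"], states=["yes", "no"])
```

With incomplete data the counts are no longer observed directly, which is why that case needs approximate methods rather than a simple count update.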

  5. Bayesian Networks (cont.) • Learning structure from data • The number of candidate structures Sh grows more than exponentially in the number of variables n • Criteria for model selection • Search methods • Structure and causal graph
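
To give a feel for how quickly the space of candidate structures grows, the sketch below counts labeled directed acyclic graphs on n nodes with Robinson's recurrence. This recurrence is a standard combinatorial result rather than something taken from the slides, and the function name `count_dags` is illustrative.

```python
from functools import lru_cache
from math import comb

@lru_cache(maxsize=None)
def count_dags(n):
    """Number of labeled DAGs on n nodes (Robinson's recurrence)."""
    if n == 0:
        return 1
    return sum(
        (-1) ** (k + 1) * comb(n, k) * 2 ** (k * (n - k)) * count_dags(n - k)
        for k in range(1, n + 1)
    )

# Structure counts for 1 to 5 variables.
sizes = [count_dags(n) for n in range(1, 6)]
```

Even at five variables there are tens of thousands of structures, which is why exhaustive enumeration gives way to scoring criteria combined with heuristic search.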

  6. Bayesian Networks (cont.) • Advantage • Complete analysis of correlations between variables • Combination of domain knowledge and data • Disadvantage • Complexity of Learning Bayesian Networks purely from data is exponential

  7. Representing and Querying Correlated Tuples in Probabilistic Databases Prithviraj Sen, Amol Deshpande ICDE, 2007

  8. A model to capture tuple correlations • Existential probabilistic database • A probability distribution Pr(X) over all possible worlds • Query evaluation • Intermediate tuples • Inference in a probabilistic graphical model
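
The possible-worlds view on the slide above can be illustrated by listing the worlds explicitly. The sketch below is a toy under stated assumptions: the tuple names `t1` and `t2`, the probabilities, and the helper `query_probability` are hypothetical, and the paper itself evaluates queries by inference in a graphical model rather than by enumeration.

```python
# Joint distribution over possible worlds. Each world is the set of
# tuples that exist in it. Here t1 and t2 are mutually exclusive.
worlds = {
    frozenset({"t1"}): 0.6,
    frozenset({"t2"}): 0.3,
    frozenset(): 0.1,
}

def query_probability(worlds, predicate):
    """Total probability of the worlds in which the predicate holds."""
    return sum(p for world, p in worlds.items() if predicate(world))

# Probability that at least one tuple exists.
p_any = query_probability(worlds, lambda w: len(w) > 0)

# Probability that both tuples exist together.
p_both = query_probability(worlds, lambda w: {"t1", "t2"} <= w)

# What an independence assumption would give from the marginals alone.
p_t1 = query_probability(worlds, lambda w: "t1" in w)
p_t2 = query_probability(worlds, lambda w: "t2" in w)
p_both_if_independent = p_t1 * p_t2
```

The gap between `p_both` and `p_both_if_independent` is the kind of error that a model with tuple correlations is meant to avoid.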

  9. Advantage • Models tuple correlations in probabilistic databases • Casts query evaluation as probabilistic inference • Issues • Inference in a general probabilistic graphical model is NP-hard • Refine the graph model beyond directly combining the operators • Use an approximate approach to inference

  10. Density-Based Clustering of Uncertain Data Hans-Peter Kriegel, Martin Pfeifle SIGKDD 2005

  11. FDBSCAN • Integrate distance probability distribution in clustering • Distance pdf: Pd(o1,o2) • Core Object Probability • Reachability Probability • Add p to cluster, if Preach(p,o) > 0.5
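
The first two quantities on the slide above can be sketched by representing each uncertain object as a small set of equally likely sample points and enumerating all combinations. This is a simplified illustration, not the paper's procedure: the function names, the convention that MinPts counts the object itself, and the toy coordinates are assumptions, and the reachability probability is left out.

```python
from itertools import product
from math import dist

def distance_probability(samples_a, samples_b, eps):
    """P(d(a, b) <= eps) when each object is a list of equally likely points."""
    pairs = [(a, b) for a in samples_a for b in samples_b]
    return sum(dist(a, b) <= eps for a, b in pairs) / len(pairs)

def core_object_probability(obj, others, eps, min_pts):
    """P(at least min_pts objects, counting obj itself, lie within eps of obj).

    Enumerates every joint choice of one sample point per object, so it is
    only practical for a handful of objects with few samples each.
    """
    hits = 0
    total = 0
    for combo in product(obj, *others):
        center, rest = combo[0], combo[1:]
        neighbors = 1 + sum(dist(center, q) <= eps for q in rest)
        hits += neighbors >= min_pts
        total += 1
    return hits / total

# Toy uncertain objects in the plane (hypothetical coordinates).
o = [(0.0, 0.0), (1.0, 0.0)]
p = [(0.5, 0.0), (3.0, 0.0)]
q = [(0.2, 0.0)]

p_close = distance_probability(o, p, eps=1.0)
p_core = core_object_probability(o, [p, q], eps=1.0, min_pts=3)
```

Exact enumeration grows exponentially with the number of objects, which is one reason a sampling-based approximation is attractive in practice.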

  12. FDBSCAN

  13. FDBSCAN • Advantage • Integrates the distance pdf into clustering of uncertain data • Experiments show that FDBSCAN outperforms other algorithms, with both high recall and precision • Issues • The distance pdf is approximated by sampling • No upper bound on the error of this approximation is stated • The global threshold in Preach(p,o) > 0.5 is arbitrary

  14. Empirical Bayes Screening for Multi-Item Associations William DuMouchel, Daryl Pregibon SIGKDD 2001

  15. A criterion to assert association • A smoothed criterion to analyze correlation • R (lift): sufficient for itemsets with large support • Empirical Bayes estimation: λ • Works at lower support • Reduces the effect of noise • Given pairs (n, e) with n > n*, and n ~ Poisson(λe)
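
The contrast between the raw ratio R = n/e and a shrunken estimate of λ can be sketched with a conjugate model. This is a deliberate simplification and not the paper's estimator: it assumes a single Gamma(alpha, beta) prior with fixed, hand-picked hyperparameters, whereas an empirical Bayes procedure would estimate the prior from all (n, e) pairs. The function names are illustrative.

```python
def lift(n, e):
    """Raw ratio of observed count n to baseline expected count e."""
    return n / e

def shrunken_lambda(n, e, alpha=1.0, beta=1.0):
    """Posterior mean of lambda when n ~ Poisson(lambda * e).

    Assumes lambda ~ Gamma(alpha, beta) with beta a rate parameter,
    so the posterior is Gamma(alpha + n, beta + e).
    """
    return (alpha + n) / (beta + e)

# Two itemsets with the same raw ratio but very different support.
rare = (2, 0.1)       # observed 2, expected 0.1
common = (200, 10.0)  # observed 200, expected 10

rare_lift, common_lift = lift(*rare), lift(*common)
rare_eb, common_eb = shrunken_lambda(*rare), shrunken_lambda(*common)
```

Both itemsets have a raw ratio of about 20, but the low-support one is pulled strongly toward the prior while the well-supported one barely moves, which is the noise-reduction effect the slide refers to.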

  16. EXCESS2 • Finds multi-item associations that cannot be explained by pairwise associations • New baseline expectation • eAll2F = frequency predicted by the all-two-factor model, based on the two-way distributions

  17. Advantage • Measures association in itemsets that are not very frequent • Robust to noise

  18. A Survey of Uncertain Data Algorithms and Applications Charu C. Aggarwal, Philip S. Yu TKDE, 2007.

  19. Uncertain Model • Possible worlds model • Probabilistic ?-table (tuple-level uncertainty) • Independence assumption (inconsistency) • Probabilistic or-set table (attribute-level uncertainty) • Attribute modeled by its pdf

  20. Query Processing • Two semantics • Intensional semantics • Complex, accurate • Extensional semantics • Efficient, approximate • Queries with correlations

  21. Indexing Uncertain Data • Nearest neighbor query • Probabilistic threshold query • Uncertain categorical data • Probabilistic equality query • Probabilistic equality threshold query • Distributional similarity threshold query • Join Processing • Probabilistic join query • Probabilistic similarity join
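
A probabilistic threshold query from the slide above can be sketched for attribute-level uncertainty as follows: return the objects whose attribute falls in a range with probability at least τ. The sketch assumes each attribute is Gaussian and scans all objects linearly, so it shows the query semantics only, not an index structure; the names `threshold_range_query` and `sensors` and the toy values are hypothetical.

```python
from math import erf, sqrt

def normal_cdf(x, mean, std):
    """Cumulative distribution function of a normal distribution."""
    return 0.5 * (1.0 + erf((x - mean) / (std * sqrt(2.0))))

def threshold_range_query(objects, low, high, tau):
    """Names of objects with P(low <= attribute <= high) >= tau.

    `objects` maps a name to the (mean, std) of its Gaussian attribute.
    """
    result = []
    for name, (mean, std) in objects.items():
        p = normal_cdf(high, mean, std) - normal_cdf(low, mean, std)
        if p >= tau:
            result.append(name)
    return result

# Toy uncertain readings as (mean, standard deviation).
sensors = {"s1": (20.0, 1.0), "s2": (25.0, 1.0), "s3": (22.0, 5.0)}
matches = threshold_range_query(sensors, low=18.0, high=22.0, tau=0.5)
```

Only `s1` qualifies here: `s2` is centered well outside the range, and `s3` is too spread out to reach the threshold. An index for such queries aims to prune objects like these without computing the probability for each one.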

  22. Data Mining Applications • Clustering • FDBSCAN • UK-Means • Classification • SVM • Frequent Pattern Mining • U-Apriori • A general density-based approach

  23. Thanks! • Questions?
