
“Lost in the Middle of Nowhere” Graduate Student Presentation

“Lost in the Middle of Nowhere” Graduate Student Presentation. M. J. Gravier. Learning Bayesian Network Structure from Distributed Data. R. Chen, K. Sivakumar, H. Kargupta SIAM International Conference on Data Mining 2003. Overview. What is a Bayesian network? What problem is addressed?


Presentation Transcript


  1. “Lost in the Middle of Nowhere” Graduate Student Presentation. M. J. Gravier

  2. Learning Bayesian Network Structure from Distributed Data. R. Chen, K. Sivakumar, H. Kargupta. SIAM International Conference on Data Mining, 2003.

  3. Overview • What is a Bayesian network? • What problem is addressed? • What is the contribution?

  4. Bayesian Networks • “...state-of-the-art representation of probabilistic knowledge.” • Graphical diagrams • Probabilistic degrees of dependency • Efficient representation of a joint probability distribution Sun-Me Lee and Patricia Abbott, “Bayesian networks for knowledge discovery in large datasets: basics for nurse researchers,” Journal of Biomedical Informatics, 36 (2003):389-399.

  5. Simple Bayesian Network • Nodes: Day after rock concert (X1), Poor exam grade (X2), Mega headache (X3) • “Structure Learning”: discovering relationships by (a) a dependence analysis method (a constraint satisfaction problem, often based on hypothesis testing) or (b) a search-and-score method (basically an optimization problem)
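The three-node network on this slide can be sketched in code to show what "efficient representation of a joint probability distribution" means in practice. The transcript does not preserve the arrows of the diagram, so the structure below (X1 → X2 and X1 → X3) is an assumption, and every probability value is made up for illustration only.

```python
from itertools import product

# Assumed structure (the slide's arrows are not in the transcript):
#   X1 (day after rock concert) -> X2 (poor exam grade)
#   X1 (day after rock concert) -> X3 (mega headache)
# All numbers below are hypothetical illustrative values.
P_X1 = {True: 0.2, False: 0.8}
P_X2_GIVEN_X1 = {True: {True: 0.6, False: 0.4},
                 False: {True: 0.1, False: 0.9}}
P_X3_GIVEN_X1 = {True: {True: 0.7, False: 0.3},
                 False: {True: 0.05, False: 0.95}}


def joint(x1, x2, x3):
    """Joint probability as a product of one factor per node given its parents."""
    return P_X1[x1] * P_X2_GIVEN_X1[x1][x2] * P_X3_GIVEN_X1[x1][x3]


def total_probability():
    """Sum the joint over all 8 assignments; a valid factorization sums to 1."""
    return sum(joint(*values) for values in product([True, False], repeat=3))
```

Under this assumed structure the network needs 1 + 2 + 2 = 5 free parameters, whereas an unstructured joint table over three binary variables needs 2^3 - 1 = 7. The gap widens quickly as the number of variables grows.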

  6. Advantages of BN • Domain expert knowledge • Simple to understand • Captures interactions • Flexible re: missing information • Less influenced by sample size. Disadvantages of BN • Need conditional probabilities • Lack of software • Computational complexity

  7. Typical Centralized Data [Diagram: Site 1 through Site 5 and a central Database]

  8. What if it's Decentralized? • Different data at each site • How do you create your Bayesian network model in this environment? • Issues: (a) a variable's data can all be at one site; (b) a variable's data may be spread over two or more sites; (c) bandwidth [Diagram: Site 1 through Site 5]

  9. Collective Learning • Local Learning • Sample selection • Cross learning • Combination of the results

  10. 1. Local Learning • Local variable: since all the information is available locally, the normal local scoring method works • But what about non-local variables?

  11. Cross Variables • Some local and some non-local parents • Local links can be found • Problem with cross links: local learning finds U(local) → Y(local) instead of U(local) → Z(non-local) → Y(local) [Diagram: nodes U, Z, Y across Site 1 and Site 2]

  12. 2. Sample Selection • Rank-based local models • Low probabilities are evidence of cross relationships • Send “keys” for models ranked below threshold ρ from each site to a central site
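A minimal sketch of the selection step at one site, under stated assumptions: each record is identified by a key, the local model assigns each record a probability-like score, and the keys whose score falls below the threshold ρ are the ones sent to the central site. The function name `select_keys` and the score dictionary are hypothetical; the slide does not say how scores are computed or how ρ is chosen.

```python
def select_keys(scores, rho):
    """Return the keys whose local-model score is below the threshold rho.

    scores: dict mapping a record key to its score under the local model
            (hypothetical input; how the score is computed is not covered here).
    rho:    selection threshold; lower-scoring records are treated as
            evidence of cross relationships.
    """
    return sorted(key for key, score in scores.items() if score < rho)


# Hypothetical scores for four records at one site.
site_scores = {101: 0.30, 102: 0.02, 103: 0.001, 104: 0.15}
keys_to_send = select_keys(site_scores, rho=0.05)
```

Only the keys are returned in this sketch, in line with the slide's point that keys, not full records, are sent at this stage.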

  13. 3. Cross Learning • Keys from step 2 used to create a BN of cross relationships • Selection of ρ is critical • Try two different levels of ρ and retain the common cross links as a noise-reduction method • Cross learning eliminates hidden variables
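The noise-reduction idea on this slide (run cross learning at two threshold levels and keep only the links found both times) amounts to a set intersection. The sketch below assumes each run yields a collection of directed links written as `(parent, child)` pairs; the link names and the function `common_cross_links` are hypothetical.

```python
def common_cross_links(links_at_rho_a, links_at_rho_b):
    """Keep only the cross links detected at both threshold levels."""
    return set(links_at_rho_a) & set(links_at_rho_b)


# Hypothetical results of cross learning at two different levels of rho.
links_low = {("U", "Z"), ("Z", "Y"), ("A", "B")}
links_high = {("U", "Z"), ("Z", "Y"), ("C", "D")}
stable_links = common_cross_links(links_low, links_high)
```

Links that appear at only one threshold, here `("A", "B")` and `("C", "D")`, are treated as noise and dropped.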

  14. 4. Combination • Combine local & cross load BNs • All BNlocal assembled, then cross links added with cross load BN • Finds missing cross links for cross variables • Eliminates extra local links (hidden variable problem)
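The combination step can be sketched as set operations on edge lists. This is an illustration under stated assumptions, not the authors' procedure: it assumes each site's local BN and the cross BN are given as sets of directed `(parent, child)` edges, and that the extra local links to remove have already been identified. The slide does not say how those extra links are detected, so they are passed in as an input here.

```python
def combine(local_bns, cross_links, extra_local_links):
    """Assemble the local BNs, add the cross links, drop extra local links.

    local_bns:         iterable of edge sets, one per site
    cross_links:       edges found by cross learning
    extra_local_links: local edges judged spurious (hidden variable problem);
                       assumed to be known already in this sketch
    """
    edges = set()
    for site_edges in local_bns:
        edges |= set(site_edges)       # assemble all local BNs
    edges |= set(cross_links)          # add the missing cross links
    edges -= set(extra_local_links)    # remove spurious local links
    return edges


# Hypothetical example mirroring the U, Z, Y case: site 1 learned U -> Y
# because Z, which sits at site 2, was hidden from it.
local_bns = [{("U", "Y"), ("A", "U")}, {("Z", "B")}]
cross_links = {("U", "Z"), ("Z", "Y")}
extra = {("U", "Y")}
combined = combine(local_bns, cross_links, extra)
```

In this example the spurious link U → Y is replaced by the path U → Z → Y, while the other local links are kept unchanged.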

  15. Experimental Validation • ALARM network: models on-line monitoring of ICU patients; widely used BN benchmark • Characteristics: 37 nodes, 5 cross variables, 15,000 samples

  16. Experimental Results • Learned correct structure • All cross links detected • ~10% of all samples transmitted

  17. Conclusion • Collective learning method learned same BN as centralized method • Small data transmission requirement • First approach to learn BN structure from heterogeneous data

  18. Questions?
