
PhD Research Proficiency Exam

Social Network Analysis using Link Mining. Jing Xia, Laboratory for Knowledge Discovery in Databases, Department of Computing and Information Sciences, Kansas State University. http://www.kddresearch.org http://www.cis.ksu.edu/~xiajing


Presentation Transcript


  1. PhD Research Proficiency Exam • Social Network Analysis using Link Mining • Jing Xia • Laboratory for Knowledge Discovery in Databases • Department of Computing and Information Sciences • Kansas State University • http://www.kddresearch.org • http://www.cis.ksu.edu/~xiajing

  2. Outline • Social Network Introduction • Networks in Biological Systems • Mining on Social Networks • Link Mining • Multi-Relational Mining • Problem Specification • Proposed Approach

  3. Social Network Introduction • What is a Social Network? • A social network is a heterogeneous and multi-relational data set represented by a graph. • Characteristics of Social Networks • “Natural” Networks and Universality • Quantitative Measures • Mining Social Networks • Link Mining: Tasks and Challenges

  4. Society • Nodes: individuals • Links: social relationships (family, work, friendship, etc.) • S. Milgram (1967): Six Degrees of Separation, a property of “natural” networks that appears to be universal • Society networks: many individuals with diverse social interactions between them. Tuesday, October 21, 2014. Data Mining: Concepts and Techniques

  5. Communication • The Earth is developing an electronic system, a network with diverse nodes and links: computers, routers, satellites, phone lines, TV cables, EM waves • Communication networks: many non-identical components with diverse connections between them.

  6. Epidemiology • Nodes: doctors, patients, geographic locations • Links: contact relationships (direct or indirect infectiousness)

  7. Characteristics of Social Networks • Consider many kinds of networks: social, technological, business, economic, content, … • These networks tend to share certain informal properties: • Multi-relational interaction • Temporal (time-evolving) • Large scale; continual growth • Distributed, organic growth: vertices “decide” who to link to • Mixture of local and long-distance connections • Abstract notions of distance: geographical, content, social, …

  8. Social Network Theory • Do natural networks share more quantitative universals? • What would these “universals” be? • How can we make them precise and measure them? • How can we explain their universality? • This is the domain of social network theory • Sometimes also referred to as link analysis

  9. Quantitative Measures • Connected components: • how many, and how large? • Network diameter: • maximum (worst-case) or average? • exclude infinite distances? (disconnected components) • the small-world phenomenon

  10. Quantitative Measures • Clustering: • to what extent do links tend to cluster “locally”? • what is the balance between local and long-distance connections? • what roles do the two types of links play? • Degree distribution: • what is the typical degree in the network? • what is the overall distribution?
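
The measures on the two slides above can be made concrete with a small sketch. This is an illustration only, not code from the presentation: the toy edge list and all function names are assumptions, the graph is treated as undirected, and the diameter is computed over reachable pairs only (one possible answer to the "exclude infinite distances?" question).

```python
from collections import Counter, deque


def connected_components(adj):
    """Return a list of components, each a set of nodes."""
    seen, components = set(), []
    for start in adj:
        if start in seen:
            continue
        component, queue = {start}, deque([start])
        while queue:
            node = queue.popleft()
            for nbr in adj[node]:
                if nbr not in component:
                    component.add(nbr)
                    queue.append(nbr)
        seen |= component
        components.append(component)
    return components


def bfs_distances(adj, source):
    """Shortest-path length (in hops) from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for nbr in adj[node]:
            if nbr not in dist:
                dist[nbr] = dist[node] + 1
                queue.append(nbr)
    return dist


def diameter_and_average(adj):
    """Maximum and mean shortest-path length, ignoring unreachable pairs."""
    lengths = [d for s in adj
               for t, d in bfs_distances(adj, s).items() if t != s]
    if not lengths:
        return 0, 0.0
    return max(lengths), sum(lengths) / len(lengths)


def local_clustering(adj, node):
    """Fraction of a node's neighbor pairs that are themselves linked."""
    nbrs = list(adj[node])
    k = len(nbrs)
    if k < 2:
        return 0.0
    links = sum(1 for i in range(k) for j in range(i + 1, k)
                if nbrs[j] in adj[nbrs[i]])
    return 2.0 * links / (k * (k - 1))


def degree_distribution(adj):
    """Map each degree value to the number of nodes having it."""
    return Counter(len(nbrs) for nbrs in adj.values())


# Hypothetical toy network: a triangle with a tail, plus a separate pair.
edges = [("a", "b"), ("b", "c"), ("a", "c"), ("c", "d"), ("e", "f")]
adj = {}
for u, v in edges:
    adj.setdefault(u, set()).add(v)
    adj.setdefault(v, set()).add(u)

print("component sizes:", sorted(len(c) for c in connected_components(adj)))
print("diameter, average path length:", diameter_and_average(adj))
print("clustering of c:", local_clustering(adj, "c"))
print("degree distribution:", dict(degree_distribution(adj)))
```

Breadth-first search is enough here because the edges are unweighted; a weighted network would need a different shortest-path routine.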

  11. Outline • Social Network Introduction • Networks in Biological Systems • Problem Specification • Mining on Social Networks • Link Mining • Multi-Relational Mining

  12. Bio-Map • GENOME: protein-gene interactions • PROTEOME: protein-protein interactions • METABOLISM: biochemical reactions (citrate cycle)

  13. Protein-Protein Interaction Network • PROTEOME: protein-protein interactions

  14. Protein-Protein Interaction Network • Nodes: proteins • Links: multi-relational • physical interactions (binding) • complex membership • pathway • P. Uetz et al., Nature 403, 623-7 (2000).

  15. Outline • Social Network Introduction • Networks in Biological Systems • Mining on Social Networks • Link Mining • Multi-Relational Mining • Problem Specification • Proposed Approach

  16. Link Mining • Traditional machine learning and data mining approaches assume that data is flat • In typical real data sets, instances form linked networks • Link mining: a newly emerging research area at the intersection of research in social networks and link analysis, hypertext and web mining, graph mining, relational learning, and inductive logic programming

  17. Link Mining Tasks • Object-Related Tasks • Link-based object ranking • Link-based object classification • Object clustering (group detection) • Object identification (entity resolution) • Link-Related Tasks • Link prediction • Graph-Related Tasks • Subgraph discovery • Graph classification

  18. Multi-relational Link Mining • Traditional link mining assumes there is only one kind of relation in the network: links are flat • In practice there exist multiple, heterogeneous social networks, each representing a particular kind of relationship • Multi-relational & heterogeneous

  19. Multi-relational Network • Multi-relational & heterogeneous network • Multiple object and link types • Example networks • Medical network: patients, doctors, diseases, contacts, treatments • Bibliographic network: years, publications, authors, venues • Epidemic transmission network (involves temporal data; multi-relational: airborne transmission, patients’ contacts)

  20. Outline • Social Network Introduction • Networks in Biological Systems • Mining on Social Networks • Link Mining • Multi-Relational Mining • Problem Specification • Proposed Approach

  21. Problem Specification • Phenomenon: heterogeneity and multiple relationships exist in many real networks • Rationale: they might be useful for link mining • Problem • Can we utilize multiple relationships to help link analysis? • How to extract relations as relation networks (RN)? • How to identify relationships among relation networks (co-related, independent, etc.)? • Is an RN time-evolving? Which relation plays an important role?
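
One way to picture "extracting relations as relation networks" is sketched below. It is an assumption-laden illustration, not the method proposed in the talk: edges are stored as hypothetical `(node, node, relation)` triples, each relation label yields one undirected network, and Jaccard overlap of edge sets stands in as a crude indicator of whether two relations are co-related.

```python
from collections import defaultdict


def split_by_relation(typed_edges):
    """Group undirected edges by relation label: one edge set per relation."""
    networks = defaultdict(set)
    for u, v, relation in typed_edges:
        networks[relation].add(frozenset((u, v)))
    return dict(networks)


def edge_overlap(edges_a, edges_b):
    """Jaccard overlap of two edge sets, a rough co-relation score in [0, 1]."""
    union = edges_a | edges_b
    return len(edges_a & edges_b) / len(union) if union else 0.0


# Hypothetical multi-relational data; names and relation labels are made up.
typed_edges = [
    ("ann", "bob", "coauthor_2004"),
    ("bob", "cat", "coauthor_2004"),
    ("ann", "bob", "coauthor_2005"),
    ("cat", "dan", "coauthor_2005"),
    ("ann", "bob", "same_venue"),
    ("bob", "cat", "same_venue"),
]

networks = split_by_relation(typed_edges)
labels = sorted(networks)
for i, a in enumerate(labels):
    for b in labels[i + 1:]:
        print(a, b, round(edge_overlap(networks[a], networks[b]), 2))
```

An overlap near 1 suggests two relations carry much the same links, while an overlap near 0 suggests they are close to independent. Comparing the same relation across years, as with the two co-author labels here, gives a first look at whether a relation network is time-evolving.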

  22. Problem Example 1 • Application domain: epidemic disease • Pre-condition 1: given multiple relations, namely patients’ contact networks over a timeline • Pre-condition 2: sequential relationship among relations • Pre-condition 3: another medium of disease transmission • Problem: can we predict whether a given person will be infected, based on mining these multi-relational networks?

  23. Problem Example 2 • Application domain: bibliographic network • Pre-condition 1: given multiple relations, namely the co-author relation networks of a conference in some years • Problem 1: what is the relationship among these relation networks? • Problem 2: how can we utilize the relationship to meet the user’s query? • Reference: Deng Cai, Zheng Shao, Xiaofei He, Xifeng Yan, and Jiawei Han, “Mining Hidden Community in Heterogeneous Social Networks,” March, Report No. UIUCDCS-R-2005-2538, UILU-ENG-2005-1731

  24. Problem Example 3 • Application domain: bibliographic network • Pre-condition 1: given multiple relations, namely the co-author networks of a conference in some years • Pre-condition 2: topics of publications • Problem: can we predict whether two researchers will be co-authors in the future, based on these two types of networks?
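
To make the co-author prediction problem above concrete, here is a minimal baseline sketch. It is not the approach proposed in the talk: it swaps in two simple, well-known scores (common co-authors and Jaccard similarity of topic sets) and mixes them with a hypothetical weight `alpha`. All names and data are invented for illustration.

```python
def common_neighbors(coauthors, a, b):
    """Number of co-authors that a and b share."""
    return len(coauthors.get(a, set()) & coauthors.get(b, set()))


def topic_similarity(topics, a, b):
    """Jaccard similarity of the two authors' topic sets."""
    ta, tb = topics.get(a, set()), topics.get(b, set())
    union = ta | tb
    return len(ta & tb) / len(union) if union else 0.0


def link_score(coauthors, topics, a, b, alpha=0.5):
    """Weighted mix of the two signals; alpha is an assumed tuning knob."""
    return (alpha * common_neighbors(coauthors, a, b)
            + (1 - alpha) * topic_similarity(topics, a, b))


def rank_candidates(coauthors, topics, alpha=0.5):
    """Rank author pairs that are not yet co-authors, best score first."""
    authors = sorted(set(coauthors) | set(topics))
    pairs = [(a, b) for i, a in enumerate(authors) for b in authors[i + 1:]
             if b not in coauthors.get(a, set())]
    return sorted(
        pairs,
        key=lambda p: link_score(coauthors, topics, p[0], p[1], alpha),
        reverse=True,
    )


# Hypothetical co-author network and per-author publication topics.
coauthors = {
    "ann": {"bob", "cat"},
    "bob": {"ann", "dan"},
    "cat": {"ann"},
    "dan": {"bob"},
}
topics = {
    "ann": {"graphs"},
    "bob": {"graphs", "text"},
    "cat": {"graphs", "text"},
    "dan": {"vision"},
}

for a, b in rank_candidates(coauthors, topics):
    print(a, b, link_score(coauthors, topics, a, b))
```

The two signals are on different scales (a count versus a ratio), so a real system would likely normalize them or learn the weighting from past years of data.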

  25. Outline • Social Network Introduction • Networks in Biological Systems • Mining on Social Networks • Link Mining • Multi-Relational Mining • Problem Specification • Proposed Approach

  26. Proposed approach • Random walk with restart • Nearby nodes, higher scores • More red, more relevant • [Figure: example graph with nodes numbered 1 to 12, annotated with RWR scores ranging from 0.02 to 0.13]

  27. Proposed approach • Basic idea • Random walk with restart (RWR) serves as a measure of proximity between two nodes in a network • Model the relationships among multiple relations using RWR • Purpose • Facilitate mining more interesting patterns • Increase prediction accuracy
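
The slides do not spell out the RWR computation, so the following is a sketch of one common formulation rather than the presenter's implementation. A walker starts at a seed node; at each step it follows a random edge with probability 1 - c and jumps back to the seed with probability c, and the stationary visit probabilities r satisfy r = (1 - c) W r + c e, where W is the column-normalized adjacency matrix and e is the indicator vector of the seed. The restart value 0.15, the handling of isolated nodes, and the toy graph are all assumptions.

```python
def random_walk_with_restart(adj, seed, restart=0.15, tol=1e-10, max_iter=1000):
    """Return {node: score} by power iteration on r = (1 - c) W r + c e."""
    nodes = list(adj)
    scores = {n: 0.0 for n in nodes}
    scores[seed] = 1.0
    for _ in range(max_iter):
        nxt = {n: 0.0 for n in nodes}
        nxt[seed] = restart  # mass that restarts at the seed
        for node, score in scores.items():
            nbrs = adj[node]
            if not nbrs:
                # Assumption: a node with no edges sends its mass back to the seed.
                nxt[seed] += (1 - restart) * score
                continue
            share = (1 - restart) * score / len(nbrs)
            for nbr in nbrs:
                nxt[nbr] += share
        delta = sum(abs(nxt[n] - scores[n]) for n in nodes)
        scores = nxt
        if delta < tol:
            break
    return scores


# Hypothetical toy network: a simple path a - b - c - d, queried from "a".
adj = {
    "a": {"b"},
    "b": {"a", "c"},
    "c": {"b", "d"},
    "d": {"c"},
}

rwr = random_walk_with_restart(adj, seed="a")
for node in sorted(rwr, key=rwr.get, reverse=True):
    print(node, round(rwr[node], 4))
```

Among the non-seed nodes, scores fall off with distance from the seed, which is the "nearby nodes, higher scores" behavior. Note that some papers use c for the restart probability and others for the continuation probability, so the parameter should be checked against whichever source is being followed.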

  28. Measure Relationship • Q: which conference is most related to ICDM? • A: RWR! • Neighborhood Formulation [Sun ICDM2005]

  29. Multi-Relational Model • Relation network built from: KDD author network, ICDM author network, ICML author network, PKDD author network

  30. Other Applications • Content-based Image Retrieval [He] • Personalized PageRank [Jeh], [Widom], [Haveliwala] • Anomaly Detection (for node; link) [Sun] • Link Prediction [Getoor], [Jensen] • Semi-supervised Learning [Zhu], [Zhou] • …

  31. Summary • Social Network Analysis • Link mining • Problem: multi-relational • Proposed approach

  32. Thank you
