Small World Networks: Applications in Document Clustering and Healthcare

Brant Chee

Bruce Schatz

University of Illinois

http://www.beespace.uiuc.edu


Small World Graph

Clauset et al., 2004


Small World Graph

  • Characteristic Path Length L

    • The typical separation between two nodes in a graph.

    • L_rand ~ ln(N)/ln(z), where N is the number of nodes and z is the average degree

  • Clustering Coefficient C

    • Average fraction of pairs of neighbors of a node which are also neighbors of each other.

    • Informally, how close each node's neighborhood is to being a clique.

    • C_rand ~ z/N

  • Small World Graph

    • C >> C_rand

    • L ≳ L_rand (L stays close to L_rand)

    • N >> z >> ln(N)

Newman, 2000
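
As a rough illustration of the two statistics on this slide, the sketch below computes the characteristic path length L and the clustering coefficient C of a small undirected graph in plain Python, then compares them with the random-graph estimates L_rand ~ ln(N)/ln(z) and C_rand ~ z/N. The ring-lattice example graph and all function names are assumptions made for illustration; they are not taken from the presentation.

```python
import math
from collections import deque
from itertools import combinations


def characteristic_path_length(adj):
    """Mean shortest-path length over all connected node pairs (BFS from each node)."""
    total, pairs = 0, 0
    for source in adj:
        dist = {source: 0}
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        total += sum(dist.values())
        pairs += len(dist) - 1  # reachable nodes other than the source
    return total / pairs


def clustering_coefficient(adj):
    """Average, over nodes, of the fraction of neighbor pairs that are themselves linked."""
    fractions = []
    for nbrs in adj.values():
        if len(nbrs) < 2:
            fractions.append(0.0)
            continue
        linked = sum(1 for a, b in combinations(nbrs, 2) if b in adj[a])
        possible = len(nbrs) * (len(nbrs) - 1) / 2
        fractions.append(linked / possible)
    return sum(fractions) / len(fractions)


def ring_lattice(n, k):
    """Hypothetical example graph: n nodes on a ring, each linked to its k nearest neighbors per side."""
    adj = {i: set() for i in range(n)}
    for i in range(n):
        for step in range(1, k + 1):
            j = (i + step) % n
            adj[i].add(j)
            adj[j].add(i)
    return adj


graph = ring_lattice(n=20, k=2)
N = len(graph)
z = sum(len(nbrs) for nbrs in graph.values()) / N  # average degree
L = characteristic_path_length(graph)
C = clustering_coefficient(graph)
L_rand = math.log(N) / math.log(z)
C_rand = z / N
print(f"L={L:.3f} L_rand={L_rand:.3f} C={C:.3f} C_rand={C_rand:.3f}")
```

A regular lattice like this one has C well above C_rand but also L noticeably above L_rand. A small-world graph keeps the high C while L stays close to L_rand.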


SW MI Graph

Sole et al., 2003


Purpose…So What?

  • Facilitate Exploratory Process

    • Search result clustering

    • Information discovery

  • Develop Middle Ground Algorithms

    • Interactive responses AND

    • Useful clusters

  • Language as a Small World Network

    • Make use of underlying structure of language


System Overview


Graph Construction

  • A node is a term in the index

    • Terms bounded by frequency cutoff.

    • Terms occurring in fewer than 5 documents or in more than 25% of documents are removed.

  • Edges between nodes are determined by Mutual Information

    • $I(x, y) = \log_2 \frac{P(x, y)}{P(x)\,P(y)}$

    • P(x, y) is calculated in a window the size of the abstract

Church and Hanks, 1989
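
The graph construction on this slide might be sketched as follows: terms are filtered by document frequency, and an edge is kept when the Church and Hanks mutual information I(x, y) = log2(P(x, y) / (P(x) P(y))) exceeds a threshold. The function name, the parameter names, the threshold value and the toy corpus are assumptions for illustration. Treating each whole document as the co-occurrence window follows the slide's "window the size of the abstract".

```python
import math
from collections import Counter
from itertools import combinations


def mutual_information_edges(docs, min_df=5, max_df_ratio=0.25, threshold=0.0):
    """Return {(x, y): MI} for term pairs whose mutual information exceeds `threshold`.

    Each document is a list of terms and serves as one co-occurrence window,
    so every probability is a document frequency divided by the number of documents.
    """
    n_docs = len(docs)
    doc_sets = [set(doc) for doc in docs]

    # Document frequency of every term.
    df = Counter(term for terms in doc_sets for term in terms)

    # Frequency cutoff: keep terms in at least min_df documents
    # and in at most max_df_ratio of all documents.
    vocab = {t for t, c in df.items() if c >= min_df and c / n_docs <= max_df_ratio}

    # Number of documents in which both terms of a pair occur.
    pair_df = Counter()
    for terms in doc_sets:
        pair_df.update(combinations(sorted(terms & vocab), 2))

    edges = {}
    for (x, y), c_xy in pair_df.items():
        p_xy = c_xy / n_docs
        mi = math.log2(p_xy / ((df[x] / n_docs) * (df[y] / n_docs)))
        if mi > threshold:
            edges[(x, y)] = mi
    return edges


# Toy corpus (hypothetical); cutoffs are loosened because it has only four documents.
toy_docs = [
    ["gene", "protein", "cell"],
    ["gene", "protein"],
    ["bee", "hive", "cell"],
    ["bee", "hive"],
]
toy_edges = mutual_information_edges(toy_docs, min_df=2, max_df_ratio=0.75)
print(toy_edges)
```

Pairs that co-occur no more often than chance (MI of 0 or below) produce no edge, so raising `threshold` makes the graph sparser.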


What threshold?


Where to cut?


Clustering Algorithm

  • Clauset, Newman and Moore, 2004

  • Generalization for nodes based upon Newman’s algorithm.

  • Based upon modularity Q: the fraction of edges that fall within communities minus the fraction expected if edges fell at random in the same network. Q is near 0 if there is little community structure; values of about 0.3 or more indicate significant structure.

  • If we looked only at the fraction of edges within communities, the maximum would always occur when all nodes are in one cluster; subtracting the random expectation removes this trivial solution.

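$Q = \frac{1}{2m} \sum_{ij} \left[ A_{ij} - \frac{k_i k_j}{2m} \right] \delta(c_i, c_j)$

δ(c_i, c_j) = 1 if nodes i and j are in the same community (c_i = c_j), 0 otherwise

A_ij = 1 if nodes i and j are connected, 0 otherwise; k_i = degree of node i

m = # of edges in graph, so 2m is the sum of all node degrees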
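
A minimal sketch of the modularity score, assuming an undirected, unweighted graph given as an edge list. It evaluates Q for a fixed partition only and does not implement the greedy agglomeration of Clauset, Newman and Moore. The function name and the toy graph (two triangles joined by one edge) are illustrative assumptions.

```python
def modularity(edges, community):
    """Modularity Q of a partition of an undirected, unweighted graph.

    Uses the per-community form Q = sum_c [ e_c / m - (d_c / (2m))^2 ],
    where e_c is the number of edges inside community c and d_c is the
    total degree of its nodes. This is equivalent to
    Q = (1/2m) * sum_ij [A_ij - k_i * k_j / (2m)] * delta(c_i, c_j).
    """
    m = len(edges)
    intra_edges = {}
    total_degree = {}
    for u, v in edges:
        cu, cv = community[u], community[v]
        # Each edge adds one to the degree of both endpoints.
        total_degree[cu] = total_degree.get(cu, 0) + 1
        total_degree[cv] = total_degree.get(cv, 0) + 1
        if cu == cv:
            intra_edges[cu] = intra_edges.get(cu, 0) + 1
    return sum(
        intra_edges.get(c, 0) / m - (d / (2 * m)) ** 2
        for c, d in total_degree.items()
    )


# Toy graph (hypothetical): two triangles joined by the single edge (2, 3).
toy_edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
two_groups = {0: "a", 1: "a", 2: "a", 3: "b", 4: "b", 5: "b"}
one_group = {node: "all" for node in range(6)}

q_two = modularity(toy_edges, two_groups)
q_one = modularity(toy_edges, one_group)
print(q_two, q_one)
```

Putting every node in one cluster makes all edges internal, yet Q drops to 0, which is why modularity rather than the raw fraction of internal edges is maximized.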


Experiments

  • 3 clustering algorithms

    • Complete Link (Cluto)

    • K means (Cluto)

    • Small World


Test Collections


Experimental Setup

  • Parameters left at package defaults

  • Clustered with n = 50, 100, 150, and 200.

  • Clusters with fewer than 4 elements or more than 50 elements were eliminated, and the clustering which resulted in fewer than 40 clusters was chosen to be evaluated.
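
The selection step above could be sketched as below. The slide does not say how a choice was made if several values of n satisfied the criterion, so this sketch simply returns every qualifying clustering. The function names and toy clusterings are assumptions for illustration.

```python
def filter_clusters(clusters, min_size=4, max_size=50):
    """Drop clusters with fewer than min_size or more than max_size elements."""
    return [c for c in clusters if min_size <= len(c) <= max_size]


def qualifying_clusterings(clusterings_by_n, max_clusters=40):
    """Map each n to its size-filtered clustering if fewer than max_clusters clusters remain."""
    result = {}
    for n, clusters in clusterings_by_n.items():
        kept = filter_clusters(clusters)
        if len(kept) < max_clusters:
            result[n] = kept
    return result


# Toy input (hypothetical): n -> list of clusters, each cluster a list of document ids.
toy_clusterings = {
    50: [list(range(3)), list(range(10)), list(range(60))],  # only the 10-element cluster survives
    100: [list(range(5)) for _ in range(45)],                # 45 clusters survive, so not < 40
}
chosen = qualifying_clusterings(toy_clusterings)
print(sorted(chosen))
```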


Quantitative Results


Conclusions

  • Developed Balanced Clustering System

    • Fast running time

    • Good clustering results

  • Modified Small World Algorithm

    • Clustered text based on language model

    • Produced many similar sized clusters


Social Networks as Small World Networks

  • Social Network

    • Network demonstrating who interacts with whom

      • Threaded messages in a Newsgroup

      • Create a network based on various characteristics

  • Homophily

    • Similar people tend to interact more than those who are dissimilar

      • Race, Age, Gender, Social Class


Social Networks Inform Healthcare

  • You do what your peers do

  • Framingham Study

    • 20 years of data

    • Manually constructed networks

      • Smoking Cessation

      • Obesity

      • Happiness

  • Can we construct Social Networks automatically?


Social Network Construction and Evaluation

  • We have lots of text available

    • 30K message groups from Yahoo! Health

  • Utilize threaded messaging to establish network

  • Our cognitive model is evident in what we write

    • Differentiate Schizophrenic from non-Schizophrenic

    • LIWC (Linguistic Inquiry and Word Count)

      • Poets who commit suicide vs those that do not

      • Differentiate depressed vs non depressed college students

  • Sentiment – positive or negative polarity

    • Score – evaluation metric
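
One way to establish a network from threaded messages is to link each reply's author to the author of the message it answers. The sketch below assumes each message is a dict with the hypothetical keys `id`, `author` and `parent_id`; the real Yahoo! Health data layout is not described in the presentation.

```python
from collections import Counter


def build_reply_network(messages):
    """Build an undirected, weighted who-replies-to-whom network.

    `messages` is a list of dicts with hypothetical keys 'id', 'author'
    and 'parent_id' (None for a message that starts a thread).
    Returns a Counter mapping frozenset({author_a, author_b}) to the
    number of replies exchanged between the two authors.
    """
    author_of = {m["id"]: m["author"] for m in messages}
    edges = Counter()
    for m in messages:
        parent = m.get("parent_id")
        if parent is None or parent not in author_of:
            continue  # thread starter, or the parent message is missing
        a, b = m["author"], author_of[parent]
        if a != b:  # ignore people replying to themselves
            edges[frozenset((a, b))] += 1
    return edges


# Toy thread (hypothetical data).
toy_messages = [
    {"id": 1, "author": "ann", "parent_id": None},
    {"id": 2, "author": "bob", "parent_id": 1},
    {"id": 3, "author": "ann", "parent_id": 2},
    {"id": 4, "author": "cat", "parent_id": 1},
    {"id": 5, "author": "cat", "parent_id": 4},
]
network = build_reply_network(toy_messages)
print(network)
```

Edge weights count replies, so a later step can keep only strongly connected pairs by thresholding on the weight.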


Example Message

  • Hi All, I need your input. I'm having about 27,000 extra pre-ventricular beats in a 24 hour period, per a Holter monitor test. My electrophysiologist and cardiologist agree that I should go on <Link>sotalol</Link>/Betapace. They are putting me in the hospital on February 26 to titrate me up on it. I've refused the drug in the past because it is such a dangerous drug. Is there anyone out there who could give me an idea of how you've done on this drug? I'd sure appreciate hearing about your experiences. Thanks so much.


Sentiment

Figure: Sentiment of messages mentioning Tysabri versus those that do not, for two MS groups. Vertical bars indicate the dates of FDA approval of Tysabri, voluntary withdrawal, and remarketing.


Results

  • Sentiment over all messages

    • Proxy for mental model – how happy they are

  • Difference in average sentiment between two people

    • Higher between random people in a network

    • Lower for pairs that are closely connected

  • Test methodology

    • Compare means of differences between highly connected nodes vs random pairs of nodes

  • t-test for statistical significance

    • p-value < 0.0001 for 10 randomly selected groups
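
The comparison described above can be sketched with the standard library alone. The slide does not state which t-test variant was used, so the choice of Welch's unequal-variance form is an assumption. The sentiment scores, the pair lists and the function names are hypothetical toy values, with `random_pairs` standing in for pairs drawn at random.

```python
import math
import statistics


def pair_differences(pairs, sentiment):
    """Absolute difference in average sentiment for each (person_a, person_b) pair."""
    return [abs(sentiment[a] - sentiment[b]) for a, b in pairs]


def welch_t(sample_a, sample_b):
    """Welch's t statistic and approximate degrees of freedom for two independent samples."""
    mean_a, mean_b = statistics.mean(sample_a), statistics.mean(sample_b)
    se_a = statistics.variance(sample_a) / len(sample_a)  # squared standard error of mean_a
    se_b = statistics.variance(sample_b) / len(sample_b)  # squared standard error of mean_b
    t = (mean_a - mean_b) / math.sqrt(se_a + se_b)
    dof = (se_a + se_b) ** 2 / (
        se_a ** 2 / (len(sample_a) - 1) + se_b ** 2 / (len(sample_b) - 1)
    )
    return t, dof


# Toy data (hypothetical): average message sentiment per person.
sentiment = {"ann": 0.6, "bob": 0.5, "cat": 0.55, "dan": -0.4, "eve": -0.3, "fay": -0.35}
connected_pairs = [("ann", "bob"), ("bob", "cat"), ("ann", "cat"),
                   ("dan", "eve"), ("eve", "fay"), ("dan", "fay")]
random_pairs = [("ann", "dan"), ("bob", "eve"), ("cat", "fay"),
                ("ann", "fay"), ("bob", "dan"), ("cat", "eve")]

connected_diffs = pair_differences(connected_pairs, sentiment)
random_diffs = pair_differences(random_pairs, sentiment)
t_stat, dof = welch_t(connected_diffs, random_diffs)
print(t_stat, dof)
```

A strongly negative t statistic means closely connected pairs differ less in sentiment than random pairs. If SciPy is available, `scipy.stats.ttest_ind(connected_diffs, random_diffs, equal_var=False)` returns the same statistic together with a p-value.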


Acknowledgements

  • Nyla Ismail for evaluating results

  • Todd Littell for the MI code


Questions?

  • Live demonstration available at:

    • http://www.beespace.uiuc.edu


References

Church, K. W. and Hanks, P. (1989). Word association norms, mutual information, and lexicography. In Proc. of the 27th Annual Meeting of the Association for Computational Linguistics (Vancouver, B.C.), ACM Press, 76-83.

Clauset, A., Newman, M. E. J., and Moore, C. (2004). Finding community structure in very large networks. Phys. Rev. E, 70(6), 066111.

Kuhlthau, C. C. (1989). Information search process: A summary of research and implications for school library media programs. SLMQ, 18(1).

Newman, M. E. J. (2000). Models of the small world. J. Stat. Phys., 101, 819-841.

Solé, R., Ferrer-Cancho, R., Montoya, J. M., and Valverde, S. (2003). Selection, tinkering, and emergence in complex networks. Complexity, 8(1), 20-33.

