Easier than Excel: Social Network Analysis of DocGraph with Gephi

  • Janos G. Hajagos

  • Stony Brook School of Medicine

  • Fred Trotter

  • fredtrotter.com

DocGraph

  • Based on FOIA request to CMS by Fred Trotter

  • Pre-released at Strata RX 2012

  • Medicare providers (more than doctors)

  • CY 2011 dates of service

  • Edges link providers who share 11 or more patients within a 30-day forward window

  • Initial access restricted to MedStartr funders

DocGraph by the numbers

  • Directed graph

  • Average degree 52.8 (edges per node)

  • 940,492 providers (graph nodes/vertices)

  • 49,685,810 shared edges
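
The degree figure can be checked from the node and edge counts on this slide. A minimal sketch, assuming the 52.8 value is simply edges divided by nodes; in a directed graph that ratio equals the average out-degree (and the average in-degree), while in-degree plus out-degree per node is twice as large.

```python
# Figures quoted on the slide.
nodes = 940_492
edges = 49_685_810

# Each directed edge adds one out-degree and one in-degree.
avg_out_degree = edges / nodes        # identical to the average in-degree
avg_in_plus_out = 2 * edges / nodes   # in-degree and out-degree combined

print(round(avg_out_degree, 1))
print(round(avg_in_plus_out, 1))
```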

Geographic visualization


DocGraph data


  • National Plan and Provider Enumeration System

  • Source of NPI (National Provider Identifier)

  • No cost download 

  • Information is entered and updated by provider

    • Data quality ranges from good to poor

  • CSV file with 314 columns 

  • A custom MySQL load script is used to normalize the database

  • Bloom.api is an open source project that makes the data easier to access

    • http://www.bloomapi.com/
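
With 314 columns, it often helps to pull out only the handful of fields needed before loading anything into a database. The sketch below uses Python's standard csv module on a tiny in-memory stand-in for the file. The column names and NPI values are made up for illustration and are not taken from the slides.

```python
import csv
import io

# Tiny stand-in for the wide provider file. Column names and NPIs
# are hypothetical placeholders, not real records.
sample = io.StringIO(
    "NPI,Entity Type Code,Provider Last Name,Provider First Name,Other Column\n"
    "1234567890,1,DOE,JANE,x\n"
    "1098765432,2,,,y\n"
)

wanted = ["NPI", "Entity Type Code", "Provider Last Name"]


def select_columns(handle, columns):
    """Yield one dict per row holding only the requested columns."""
    for row in csv.DictReader(handle):
        yield {name: row[name] for name in columns}


rows = list(select_columns(sample, wanted))
for row in rows:
    print(row)
```

Because the reader streams row by row, the same function should work on a large file opened with open(path, newline="") without holding every column in memory.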

Tabular data


Graph data

Relation between authors and MeSH terms from PubMed



Graph types

  • Undirected graph

    • Facebook friendships

  • Directed graph

    • Twitter: follow and be followed

  • Bipartite graph

  • Multipartite

    • RDF graph model

    • Property graph model

  • Allow parallel edges

    • RDF graph model
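
The differences between these graph types show up in how the edges are stored. A rough sketch using plain Python containers; the names (alice, bob, the "mentions" and "follows" labels) are invented for illustration.

```python
from collections import defaultdict

# Undirected graph: a friendship is recorded in both directions.
friends = defaultdict(set)


def add_friendship(a, b):
    friends[a].add(b)
    friends[b].add(a)


# Directed graph: following someone does not imply being followed back.
follows = defaultdict(set)


def add_follow(follower, followed):
    follows[follower].add(followed)


add_friendship("alice", "bob")
add_follow("alice", "bob")

# Parallel edges: a list of (source, label, target) triples lets the same
# pair of nodes be linked more than once, as in an RDF-style model.
triples = [
    ("alice", "follows", "bob"),
    ("alice", "mentions", "bob"),
]

print("alice" in friends["bob"])   # friendship is symmetric
print("alice" in follows["bob"])   # follow is one-way
print(len([t for t in triples if t[0] == "alice" and t[2] == "bob"]))
```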


Graphs in healthcare

  • Prescriber and patient (bipartite)

    • NCPDP data with NPI

  • Referral data sets

  • Shared patients

    • DocGraph

  • Social networks

    • Tweeting about a disease

  • Limited by imagination


Generating GraphML

  • XML based file format for graphs

  • Readable by a large number of tools

    • Gephi

    • Mathematica

    • igraph (R)

  • NetworkX is a Python library for graphs that can export to GraphML

  • GraphML is not well suited to very large graphs

  • GraphML is not readable by d3.js
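
To show what a GraphML file contains, here is a minimal writer built on Python's standard xml.etree.ElementTree. It is only a sketch: the node ids, labels, and weights are placeholders, and in practice NetworkX's write_graphml function handles this for you.

```python
import xml.etree.ElementTree as ET

GRAPHML_NS = "http://graphml.graphdrawing.org/xmlns"


def to_graphml(nodes, edges):
    """nodes: {node_id: label}; edges: [(source_id, target_id, weight)]."""
    root = ET.Element("graphml", xmlns=GRAPHML_NS)
    # Declare the attributes that nodes and edges will carry.
    ET.SubElement(root, "key", {"id": "label", "for": "node",
                                "attr.name": "label", "attr.type": "string"})
    ET.SubElement(root, "key", {"id": "weight", "for": "edge",
                                "attr.name": "weight", "attr.type": "double"})
    graph = ET.SubElement(root, "graph", id="G", edgedefault="directed")
    for node_id, label in nodes.items():
        node = ET.SubElement(graph, "node", id=node_id)
        ET.SubElement(node, "data", key="label").text = label
    for source, target, weight in edges:
        edge = ET.SubElement(graph, "edge", source=source, target=target)
        ET.SubElement(edge, "data", key="weight").text = str(weight)
    return ET.tostring(root, encoding="unicode")


# Placeholder providers and a single shared-patient edge.
xml_text = to_graphml(
    {"n1": "Provider One", "n2": "Provider Two"},
    [("n1", "n2", 25)],
)
print(xml_text)
```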

Gephi

  • Java based open source tool

  • Focused on interactivity

    • Fast graphics

    • Multi-threaded

    • Visual updates

  • Strong graph analytics

  • Graphs stored in memory

    • Upper limit is about 100,000 nodes

  • Netbeans plugin architecture

    • Integration with Neo4J

    • Additional layout algorithms

Downloading Gephi



Downloading sample files



Subsets are generated using a Python script

python extract_providers_to_graphml.py "npi='1750499653'" sterrence Leaf-edges

Opening connection referral


Selection criteria for subset graph: npi='1750499653'

Referral table name: referral.referral2011

NPI detail table name: referral.npi_summary_primary_taxonomy

Nodes will be labeled by: provider_name

Leaf-to-leaf edges will be exported? False

Imported 1 nodes

Imported 986 nodes

Imported 1724 edges

Edge types imported

{'core-to-leaf': 866, 'leaf-to-core': 856, None: 2}

Leaf-to-leaf edges were not selected for export

Writing GraphML file

Generating a subset: some concepts

Core nodes

Connecting core nodes

Adding leaf nodes

Connecting to leaf nodes

Connecting leaf nodes
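
The core and leaf concepts above can be sketched in a few lines of Python. This is not the extraction script itself, only an illustration under simple assumptions: edges are (source, target) pairs, the node names are placeholders, and the "core-to-core" label is a name chosen here for edges between two core nodes.

```python
from collections import Counter

# Placeholder directed edges; A and B play the role of core providers.
edges = [("A", "B"), ("B", "A"), ("A", "X"), ("Y", "A"), ("X", "Y"), ("B", "X")]
core = {"A", "B"}


def classify(source, target, core):
    """Label an edge by whether each endpoint is a core or a leaf node."""
    if source in core and target in core:
        return "core-to-core"
    if source in core:
        return "core-to-leaf"
    if target in core:
        return "leaf-to-core"
    return "leaf-to-leaf"


def extract_subset(edges, core, include_leaf_edges=False):
    # Keep every edge that touches a core node.
    kept = [(s, t) for s, t in edges if s in core or t in core]
    # Leaf nodes are the non-core endpoints of those edges.
    leaves = {n for edge in kept for n in edge} - core
    if include_leaf_edges:
        kept += [(s, t) for s, t in edges if s in leaves and t in leaves]
    counts = Counter(classify(s, t, core) for s, t in kept)
    return leaves, kept, counts


leaves, kept, counts = extract_subset(edges, core)
print(sorted(leaves), len(kept), dict(counts))
```

Leaving leaf-to-leaf edges out, as in the run shown earlier, keeps the subset focused on the core providers and avoids pulling in dense connections among their neighbors.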

Sample files

  • jamestown_core_provider_graph.graphml

    • Providers selected with practice addresses in Jamestown, NY

    • Small city in far western New York (approximately 30,000 residents)

    • 179 nodes with 5,560 edges

  • jamestown_core_and_leaf_provider_graph.graphml

    • Includes providers above and those who are linked to them

    • 1,322 nodes with 12,457 edges

  • albany_core_provider_graph.graphml

    • Providers selected with practice addresses in Albany, NY

    • A small city in New York (approximately 100,000 residents)

    • 1,368 nodes with 44,711 edges

Sample files (continued)

  • bronx_core_provider_graph.graphml

    • Providers selected with practice addresses in Bronx, NY

    • Urban community (1.4 million residents)

    • 3,268 nodes and 53,828 edges

Import report


Navigating the graph

  • Best experience with a three button mouse with a scroll wheel

    • Right click and hold to pan

    • Scroll wheel to zoom in and out

    • Left click to select

    • Right click for context menus

  • MacBook users

    • Hold the Command key, then click and hold on the trackpad to pan

    • Two fingers to zoom on trackpad

    • Click on trackpad to select

    • Control click for context menus


Varying node size based on importance

  • Step 1: Select a measure of node importance

    • Degree

    • PageRank

    • Eigenvector centrality

  • Step 2: Run the measure against the graph

  • Step 3: Open the Ranking tab and choose “Size/Weight”

  • Step 4: Set size range
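
Outside of Gephi, the same four steps can be approximated in Python. The sketch below picks degree as the measure, computes it, and maps it linearly onto a size range. The edge list and the 10 to 50 size range are arbitrary choices for illustration, and Gephi's own ranking may interpolate differently.

```python
from collections import Counter

# Placeholder directed edges.
edges = [("a", "b"), ("c", "b"), ("d", "b"), ("b", "a")]


def total_degree(edges):
    """Count in-degree plus out-degree for every node."""
    degree = Counter()
    for source, target in edges:
        degree[source] += 1
        degree[target] += 1
    return degree


def scale_sizes(scores, min_size=10.0, max_size=50.0):
    """Map scores linearly onto [min_size, max_size]."""
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {node: min_size for node in scores}
    span = hi - lo
    return {
        node: min_size + (value - lo) / span * (max_size - min_size)
        for node, value in scores.items()
    }


degree = total_degree(edges)
sizes = scale_sizes(degree)
print(sizes)
```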


Graph measures

  • Degree

    • In-degree

    • Out-degree

  • Graph structure measures

    • Clustering (global and local)

    • Network diameter

  • Centrality Measures

    • Eigenvector centrality

    • PageRank (Google search)

  • Community measures

  • And more . . .
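
As an illustration of a centrality measure, here is a small power-iteration version of PageRank in plain Python. It is a teaching sketch, not Gephi's implementation; the damping factor of 0.85 and the fixed 50 iterations are conventional choices assumed here, and the edge list is a placeholder.

```python
def pagerank(edges, damping=0.85, iterations=50):
    """Power-iteration PageRank on a list of directed (source, target) edges."""
    nodes = sorted({n for edge in edges for n in edge})
    count = len(nodes)
    out_links = {n: [] for n in nodes}
    for source, target in edges:
        out_links[source].append(target)

    rank = {n: 1.0 / count for n in nodes}
    for _ in range(iterations):
        # Rank held by nodes without outgoing edges is spread evenly.
        dangling = sum(rank[n] for n in nodes if not out_links[n])
        new_rank = {
            n: (1.0 - damping) / count + damping * dangling / count
            for n in nodes
        }
        for source in nodes:
            targets = out_links[source]
            for target in targets:
                new_rank[target] += damping * rank[source] / len(targets)
        rank = new_rank
    return rank


# Three nodes point at "hub", which points back at "a".
ranks = pagerank([("a", "hub"), ("b", "hub"), ("c", "hub"), ("hub", "a")])
print(max(ranks, key=ranks.get))
```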


Interactively viewing node attributes

Click the “T” icon on the bottom to turn on node labeling


Data Laboratory


Saving your graph

  • Save your graph in .gephi format

    • XML-based format

    • Preserves layout, size, and color

  • Save in GraphML format for use with outside programs


Hints for filtering nodes

  • Drag field filter “is_physician” from the top pane to the lower pane

  • Set the value to filter on

    • Value should equal 1

    • 1 is equivalent to true

  • Click “Filter” to apply
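
The same attribute filter can be expressed in code: keep the nodes whose is_physician value equals 1, then keep only the edges whose two endpoints both survive. A minimal sketch with placeholder node ids; whether Gephi's filter treats edges exactly this way is an assumption.

```python
# Placeholder nodes with the is_physician flag (1 means true).
nodes = {
    "n1": {"is_physician": 1},
    "n2": {"is_physician": 0},
    "n3": {"is_physician": 1},
}
edges = [("n1", "n2"), ("n1", "n3"), ("n3", "n1")]


def filter_nodes(nodes, edges, attribute, value):
    """Return the nodes matching attribute == value and the edges among them."""
    kept_nodes = {
        node_id: attrs
        for node_id, attrs in nodes.items()
        if attrs.get(attribute) == value
    }
    kept_edges = [
        (source, target)
        for source, target in edges
        if source in kept_nodes and target in kept_nodes
    ]
    return kept_nodes, kept_edges


physicians, physician_edges = filter_nodes(nodes, edges, "is_physician", 1)
print(sorted(physicians), physician_edges)
```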


Producing a final graph

We need to rescale the edge weights in the graph
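
The slide does not say which rescaling is applied. One plausible approach, sketched here as an assumption, is to take the logarithm of each weight and map the result onto a small range so that a few very heavy edges do not dominate the drawing. The 1 to 10 range and the sample weights are illustrative only.

```python
import math


def rescale_weights(weights, low=1.0, high=10.0):
    """Log-scale positive edge weights, then map them onto [low, high]."""
    logs = {edge: math.log(weight) for edge, weight in weights.items()}
    lo, hi = min(logs.values()), max(logs.values())
    if hi == lo:
        return {edge: low for edge in weights}
    return {
        edge: low + (value - lo) / (hi - lo) * (high - low)
        for edge, value in logs.items()
    }


# Placeholder shared-patient counts; every weight must be positive.
weights = {("a", "b"): 11, ("b", "c"): 110, ("c", "a"): 1100}
scaled = rescale_weights(weights)
print(scaled)
```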


Challenge questions

  • Which institution is the most “important” provider for the Bronx?

    • Hint: try a centrality measure

  • Can you determine if geography plays a role in patient sharing in the Bronx?

    • Which parameter could be used to partition the graph?

  • Can you filter the graph to show only radiologists?

  • Which radiologist has the highest “authority” in the graph?


Other tools for graph analysis

  • NetworkX

    • Python

    • Lots of algorithms

  • igraph

    • R and Python

  • Gremlin – graph traversal and manipulation

    • Groovy shell

    • Gremlin interface is implemented for Neo4J

  • And more . . .


Scaling the analysis to the entire DocGraph

  • Most healthcare graphs will be big (millions of nodes)

  • What we learn at the local level can be applied at the global level

    • Importance of geography

    • Supernodes (radiologist, ER docs, pathologist, transportation, …)

  • Many graph measures don’t scale well

    • Maximal cliques

  • Currently exploring how to use Faunus to scale the analysis with Hadoop


Links

http://strata.oreilly.com/2012/11/docgraph-open-social-doctor-data.html (information)

https://github.com/jhajagos/DocGraph (code)

http://notonlydev.com/docgraph-data/ (open source; $1 covers bandwidth fees)

https://groups.google.com/forum/#!forum/docgraph (mailing list)

Questions

Try to publish your own healthcare dataset as a graph!