
Network Mapping of Large Data Sets Al Ozonoff, Ph.D. Joel Bernanke, M.Sc. Boston University




  1. Network Mapping of Large Data Sets Al Ozonoff, Ph.D. Joel Bernanke, M.Sc. Boston University School of Public Health

  2. Network Analyses of Linked Data Sets
     ─ Yook et al. (2002) developed network generators that captured the Internet's topology and postulated preferential attachment and linear distance dependence.
       Yook, S.-H., Jeong, H., & Barabási, A.-L. 2002. Modeling the Internet's large-scale topology. PNAS, 99, 13382-13386.
     ─ Schwikowski et al. (2000) built a protein-protein interaction network in yeast to predict protein function.
       Schwikowski, B., Uetz, P., & Fields, S. 2000. A network of protein-protein interactions in yeast. Nature Biotechnology, 18(12), 1257-1261.
     Interface - RISK : Reality

  3. Networks in Public Health
     ─ Jones and Handcock (2003) reported on power-law scaling in sexual contact networks, relating the scaling coefficient to the rate of disease transmission and the threat of epidemic.
       Jones, J. H., & Handcock, M. S. 2003. An assessment of preferential attachment as a mechanism for human sexual network formation. Proc. R. Soc. Lond. B, 270, 1123-1128.
     ─ De et al. (2004) used network centrality measures to identify key individuals in a gonorrhea outbreak.
       De, P., Singh, A. E., Wong, T., Yacoub, W., & Jolly, A. M. 2004. Sexual network analysis of gonorrhea outbreak. Sex Transm Infect, 80, 280-285.

  4. Natural Mapping of a Data Set
     When linkages are not predefined, suitable criteria for identifying linkages must be developed. We propose a natural mapping of a data set onto a network: variables map to nodes, and the associations among variables map to edges.

  5. The NHANES Data Set
     The National Health and Nutrition Examination Survey (NHANES) assesses the health and nutritional status of adults and children in the United States through interviews and physical examinations. The NHANES data set includes:
     ─ Demographics
     ─ Laboratory test results
     ─ Dietary records
     ─ Physiological measurements
     ─ General health information

  6. Selecting Data to Map
     A selected subset of continuous measures from all four of the NHANES modules was included in the analysis. Continuous measures with fewer than 20 observations were excluded. Examples:
     ─ Age (years)
     ─ Blood titers
     ─ Number of green vegetables eaten per month
     ─ Cardiovascular stress test measurements

  7. Generating a Correlation Matrix
     We generated a correlation matrix containing the Spearman correlation between every pair of variables. All correlations were converted to their absolute values. We included correlations in the matrix regardless of their significance.
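The matrix described on this slide can be sketched in a few lines of Python. This is only an illustration of the idea, not the SAS procedure the authors used: the function names are hypothetical, and the sketch assumes complete cases (every variable has a value for every subject), which real NHANES modules would not guarantee.

```python
from itertools import combinations
from math import sqrt


def average_ranks(values):
    """Rank values from 1..n; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # sorted positions i..j share this rank
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation; assumes neither input is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def abs_spearman_matrix(data):
    """data: dict of variable name -> list of values (equal lengths).

    Returns a dict keyed by (name_a, name_b), in both orders, holding the
    absolute Spearman correlation. No significance filtering is applied.
    """
    ranked = {name: average_ranks(vals) for name, vals in data.items()}
    matrix = {}
    for a, b in combinations(ranked, 2):
        rho = abs(pearson(ranked[a], ranked[b]))  # Spearman = Pearson on ranks
        matrix[(a, b)] = matrix[(b, a)] = rho
    return matrix
```

Taking absolute values means a strong negative association produces the same entry as a strong positive one.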

  8. Mapping the NHANES Data Set
     Variables were mapped to nodes. Spearman correlations among the variables were mapped to edges. The exact correlation was either retained as a measure of the strength of an association or dichotomized (0, 1) based on a cutoff.
     [Figure: Age (years) and Body Mass Index joined by an edge labeled 0.6; the same two nodes shown again with Cutoff = 0.7.]
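A minimal sketch of the two edge encodings follows. The function name is hypothetical, and treating a correlation exactly equal to the cutoff as an edge (>=) is an assumption; the slide does not say which way ties go.

```python
def build_edges(abs_corr, cutoff=None):
    """Turn absolute correlations into network edges.

    abs_corr: dict {(var_a, var_b): |rho|}, each pair listed once.
    cutoff=None keeps every pair with its correlation as the edge weight.
    With a cutoff, pairs at or above it become edges of weight 1 and the
    rest are dropped (the "0" case of the dichotomy).
    """
    if cutoff is None:
        return dict(abs_corr)
    return {pair: 1 for pair, rho in abs_corr.items() if rho >= cutoff}
```

With the slide's example value, `build_edges({("Age (years)", "Body Mass Index"): 0.6}, cutoff=0.7)` returns an empty dict, so the two nodes are left unlinked at that cutoff.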

  9. Software
     SAS 9.1 – Integrate NHANES data modules and generate correlation matrix.
     UCINET – Convert correlation data to network data.
     NetDraw – Visualize and analyze network data.
     KeyPlayer – Identify key players.

  10. Networks by Cutoff
      [Figure: network diagrams at Cutoff = 0.2, Cutoff = 0.5, and Cutoff = 0.8.]

  11. Distribution of Connections by Cutoff
      [Figure: distribution of connections at Cutoff = 0.2, Cutoff = 0.5, and Cutoff = 0.8.]

  12. Degrees and Unlinked Nodes
      [Figure: two plots against cutoff: mean number of connections per node (degree), and percentage of unlinked nodes (isolates).]
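The two summary quantities on this slide could be computed along the following lines. The function name and the toy correlations in the usage note are made up for illustration; they are not NHANES results. Pairs absent from the input are treated as having no edge.

```python
def degree_summary(nodes, abs_corr, cutoff):
    """Return (mean degree, percentage of isolates) for one cutoff.

    nodes: list of variable names.
    abs_corr: dict {(var_a, var_b): |rho|}, each pair listed once.
    """
    degree = {n: 0 for n in nodes}
    for (a, b), rho in abs_corr.items():
        if rho >= cutoff:  # assumption: ties with the cutoff count as edges
            degree[a] += 1
            degree[b] += 1
    mean_degree = sum(degree.values()) / len(nodes)
    isolates = sum(1 for d in degree.values() if d == 0)
    return mean_degree, 100 * isolates / len(nodes)
```

For four nodes A, B, C, D with toy correlations A-B 0.9, A-C 0.6, B-C 0.3, and A-D 0.1, raising the cutoff from 0.2 to 0.8 lowers the mean degree from 1.5 to 0.5 and raises the share of isolates from 25% to 50%.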

  13. Hubs and Key Players
      Hubs – Nodes with many connections (edges).
      Key Players – A set of N nodes that, in this case, is maximally correlated with the rest of the network.
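To see why hubs and key players can differ, here is a small sketch. It is not the KeyPlayer program's algorithm: it uses a simple greedy heuristic on an unweighted network, reading "key players" as the set of nodes that together touch the most distinct nodes. The function names and the toy network are hypothetical.

```python
def hubs(adjacency, top):
    """The `top` nodes with the most edges (ties keep input order)."""
    return sorted(adjacency, key=lambda n: len(adjacency[n]), reverse=True)[:top]


def greedy_key_players(adjacency, k):
    """Greedily pick k nodes, each time adding the node that brings the
    most not-yet-covered nodes (itself plus its neighbors) into the set."""
    chosen, covered = [], set()
    for _ in range(k):
        best = max(
            (n for n in adjacency if n not in chosen),
            key=lambda n: len((adjacency[n] | {n}) - covered),
        )
        chosen.append(best)
        covered |= adjacency[best] | {best}
    return chosen


# Toy undirected network: one large cluster around H, one small one around P.
toy = {
    "H": {"a", "b", "c", "d"},
    "a": {"H", "b"},
    "b": {"H", "a"},
    "c": {"H"},
    "d": {"H"},
    "P": {"x", "y"},
    "x": {"P"},
    "y": {"P"},
}
```

On this toy network the two highest-degree nodes are H and a, but the greedy pair is H and P: once H is chosen, a adds nothing new, while P brings in a separate cluster.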

  14. 10 Key Players
      For the entire weighted network:
      ─ Age (years)
      ─ CD4 count (cells/mm³)
      ─ Urine creatinine (mg/dl)
      ─ CD8 count (cells/mm³)
      ─ Upper arm length (cm)
      ─ Alcohol fasting time (min)
      ─ Antacid / laxative fasting time (min)
      ─ Number of years taking insulin
      ─ How often wore hearing aid in the past year (number)
      ─ Lipid adjusted dioxin (pg/g)

  15. Hubs and Key Players - Creatinine
      Nodes with higher degrees are larger. The purple squares are the 10 key players. Notice that the key players are not necessarily the largest hubs.

  16. Urine Creatinine Ego Network
      [Figure: ego network of Urine Creatinine, with neighbor groups labeled Urine Elements (e.g. Molybdenum), Urine Phthalates, and Urine Phosphates.]

  17. Hubs and Key Players – CD4, CD8
      Nodes with higher degrees are larger. The blue squares are the 10 key players. Notice that the key players are not necessarily the largest hubs.

  18. CD4, CD8, and Immunotoxins
      [Figure: network with nodes labeled Isoflavones, CD-4 counts, CD-8 counts, PCBs, and TCDDs.]

  19. Conclusion
      Future directions:
      ─ Further exploration of scale-free (power law) properties of the NHANES data network.
      ─ Extend methodology to binary outcomes.
      ─ Account for negative correlations.
      ─ Investigate confounding.
      ─ Analyze additional data sets.

  20. Network Terms
      Node – a junction point.
      Edge – a line connecting two nodes.
      Degree – the number of edges a node has.
      Hub – a node with many connections (edges).
      Key players – a group of nodes that together are connected to the maximum number of distinct nodes.
      Power-law distribution – $f(x) \sim x^{-\gamma}$
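One way to read the power-law entry above: taking logarithms turns the relation into a straight line, so a power-law degree distribution appears on a log-log plot as a line with slope $-\gamma$. The constant $C$ below is a normalizing factor introduced here for illustration; it does not appear on the slide.

```latex
f(x) = C\,x^{-\gamma}
\quad\Longrightarrow\quad
\log f(x) = \log C - \gamma \log x
```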

  21. A Basic Undirected Network
      Isolate – a node that is not connected to the rest of the network.
      Pendant – a node that is connected to the rest of the network by only one edge.
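Both terms reduce to a degree check, as the short sketch below shows. The function name and the adjacency-dict representation (node name mapped to its set of neighbors) are assumptions made for illustration.

```python
def isolates_and_pendants(adjacency):
    """Split out nodes with no edges (isolates) and exactly one edge (pendants).

    adjacency: dict of node -> set of neighboring nodes (undirected network).
    """
    isolates = [n for n, nbrs in adjacency.items() if len(nbrs) == 0]
    pendants = [n for n, nbrs in adjacency.items() if len(nbrs) == 1]
    return isolates, pendants
```

For a network where A is linked to B and C, and D has no links, D is the only isolate and B and C are pendants.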
