
Yang Xiang

K-neighborhood Decentralization: A Comprehensive Solution to Index the UMLS for Large Scale Knowledge Discovery. Yang Xiang. Joint work with Kewei Lu, Stephen L. James, Tara B. Borlawsky, Kun Huang, and Philip R.O. Payne. Journal of Biomedical Informatics, In Press.


Presentation Transcript


  1. K-neighborhood Decentralization: A Comprehensive Solution to Index the UMLS for Large Scale Knowledge Discovery Yang Xiang Joint work with Kewei Lu, Stephen L. James, Tara B. Borlawsky, Kun Huang, and Philip R.O. Payne Journal of Biomedical Informatics, In Press

  2. Unified Medical Language System (UMLS) • A compendium of controlled vocabularies in the biomedical sciences (since 1986). It contains: • Metathesaurus • Semantic Network • SPECIALIST Lexicon • The UMLS contains more data than ontologies alone • Maintained by the US National Library of Medicine • Website: http://www.nlm.nih.gov/research/umls/

  3. UMLS - Metathesaurus • More than 1 million biomedical concepts • Drawn from over 100 incorporated controlled source vocabularies, including: • ICD (International Statistical Classification of Diseases and Related Health Problems) • MeSH (Medical Subject Headings) • SNOMED CT (Systematized Nomenclature of Medicine – Clinical Terms) • LOINC (Logical Observation Identifiers Names and Codes) • Gene Ontology • OMIM (Online Mendelian Inheritance in Man) … • Source list: http://www.nlm.nih.gov/research/umls/knowledge_sources/metathesaurus/release/source_vocabularies.html

  4. Detailed Data of UMLS

  5. UMLS - Semantic Network • Semantic types (categories): 133 in 2011AA • Entity • Physical Object • Organism … • Event • Activity • Behavior … • Semantic relationships (connecting two concepts): 591 in 2011AA • isa • associated_with • physically_related_to • part_of … • spatially_related_to • location_of … • [Diagram: Drug A and Disease B linked by treats / treated_by; Disease B and Gene A linked by disease_is_marked_by_gene] • Sources: http://www.nlm.nih.gov/research/umls/META3_current_semantic_types.html and http://www.clres.com/semrels/umls_relation_list.html
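
A minimal sketch of how relationships like the ones in the slide's diagram could be stored as a labeled directed graph. The triples and the helper name `build_labeled_graph` are illustrative assumptions, as is the direction of each edge; this is not the UMLS data format or the authors' schema.

```python
from collections import defaultdict

# Hypothetical (subject, relation, object) triples modeled on the slide's
# example; edge directions are an assumption.
triples = [
    ("Drug A", "treats", "Disease B"),
    ("Disease B", "treated_by", "Drug A"),
    ("Disease B", "disease_is_marked_by_gene", "Gene A"),
]

def build_labeled_graph(triples):
    """Map each subject to a list of (relation, object) pairs."""
    graph = defaultdict(list)
    for subject, relation, obj in triples:
        graph[subject].append((relation, obj))
    return graph

graph = build_labeled_graph(triples)
for relation, obj in graph["Disease B"]:
    print("Disease B", relation, obj)
```

Keeping the relation label on each edge lets later queries filter paths by relationship type.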

  6. UMLS Graph

  7. Knowledge Discovery in the UMLS Graph • Reachability • Distance • Path • Conceptual Knowledge Construct, CKC (Subject Matter Expert) • Depth First Search for CKC, limited to 4 hops and to a small number of data sources (CITIH) • Finding all paths is computationally intractable
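
The hop-limited depth-first search mentioned on this slide can be sketched as follows. This is a generic illustration, not the authors' CKC implementation; the function name `paths_within` and the toy graph are assumptions. The number of simple paths can grow exponentially with the hop limit, which is why a small bound such as 4 is used.

```python
def paths_within(adj, u, v, max_hops):
    """Yield every simple path from u to v that uses at most max_hops edges."""
    path = [u]
    on_path = {u}

    def dfs(x):
        if x == v:
            yield list(path)
            return
        if len(path) - 1 == max_hops:
            return  # hop budget exhausted, prune this branch
        for y in adj.get(x, ()):
            if y not in on_path:  # keep paths simple (no repeated vertex)
                path.append(y)
                on_path.add(y)
                yield from dfs(y)
                path.pop()
                on_path.remove(y)

    yield from dfs(u)

# Hypothetical toy graph as an adjacency dict.
toy = {1: [2, 3], 2: [4], 3: [4, 5], 4: [5]}
print(list(paths_within(toy, 1, 5, 4)))  # [[1, 2, 4, 5], [1, 3, 4, 5], [1, 3, 5]]
print(list(paths_within(toy, 1, 5, 2)))  # [[1, 3, 5]]
```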

  8. Reachability • The problem: Given two vertices u and v in a directed graph G, is there a path from u to v? • Example queries: Query(1, 11): Yes; Query(3, 9): No • [Figure: example directed graph on vertices 1 to 15]
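
As a baseline for the reachability problem, a plain breadth-first search answers a single query without any index. The sketch below assumes an adjacency-dict representation and a toy graph of its own; the edges of the slide's figure are not recoverable from the transcript, and this is not the indexing scheme the talk proposes.

```python
from collections import deque

def reachable(adj, u, v):
    """Return True if a directed path from u to v exists (BFS)."""
    if u == v:
        return True
    seen = {u}
    queue = deque([u])
    while queue:
        x = queue.popleft()
        for y in adj.get(x, ()):
            if y == v:
                return True
            if y not in seen:
                seen.add(y)
                queue.append(y)
    return False

# Hypothetical toy graph (not the graph from the slide's figure).
toy = {1: [2, 3], 2: [4], 3: [4], 4: [5], 6: [1]}
print(reachable(toy, 1, 5))  # True
print(reachable(toy, 5, 1))  # False
```

Each query costs time linear in the size of the graph in the worst case, which is the cost an index aims to avoid on large graphs.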

  9. Distance • The problem: Given two vertices u and v in a (directed) graph G, what is the distance from u to v? • Example query: d_G(1, 11) = 3 • [Figure: example directed graph on vertices 1 to 15]
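
For an unweighted graph, the distance query can be answered without an index by breadth-first search, counting hops. The sketch below is a generic baseline under that assumption; the function name and toy graph are illustrative and do not reproduce the slide's figure.

```python
from collections import deque

def distance(adj, u, v):
    """Hop count of a shortest directed path from u to v, or None if unreachable."""
    dist = {u: 0}
    queue = deque([u])
    while queue:
        x = queue.popleft()
        if x == v:
            return dist[x]
        for y in adj.get(x, ()):
            if y not in dist:
                dist[y] = dist[x] + 1
                queue.append(y)
    return None

# Hypothetical toy graph (not the graph from the slide's figure).
toy = {1: [2, 3], 2: [4], 3: [4], 4: [5], 6: [1]}
print(distance(toy, 1, 5))  # 3
print(distance(toy, 5, 1))  # None
```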

  10. Path • The problem: Given two vertices u and v in a (directed) graph G, what is a path (or what are the paths) connecting u to v? • Example: find a path from 1 to 11 • [Figure: example directed graph on vertices 1 to 15]
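
One way to return a single path is to record a parent pointer for each vertex during breadth-first search and walk the pointers back from v. This is a generic sketch that returns one shortest path; the function name and toy graph are assumptions, and it is not the path-construction method evaluated later in the talk.

```python
from collections import deque

def shortest_path(adj, u, v):
    """Return one shortest directed path from u to v as a list, or None."""
    parent = {u: None}
    queue = deque([u])
    while queue:
        x = queue.popleft()
        if x == v:
            path = []
            while x is not None:  # walk parent pointers back to u
                path.append(x)
                x = parent[x]
            return path[::-1]
        for y in adj.get(x, ()):
            if y not in parent:
                parent[y] = x
                queue.append(y)
    return None

# Hypothetical toy graph (not the graph from the slide's figure).
toy = {1: [2, 3], 2: [4], 3: [4], 4: [5], 6: [1]}
print(shortest_path(toy, 1, 5))  # [1, 2, 4, 5]
```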

  11. Estimated difficulty of building very efficient graph database indexing schemes (based on current research) • References: • R. Jin, Y. Xiang, N. Ruan, H. Wang, "Efficiently Answering Reachability Queries on Very Large Directed Graphs", Proc. of ACM SIGMOD Conference, Vancouver, June 9-12, 2008, pp. 595-608. • R. Jin, Y. Xiang, N. Ruan, D. Fuhry, "3-HOP: A High-Compression Indexing Scheme for Reachability Query", Proc. of ACM SIGMOD Conference, Providence, Rhode Island, June 29-July 2, 2009, pp. 813-826.

  12. Degree Distribution in the UMLS Graph

  13. Average distance between two vertices in the UMLS

  14. Average Neighborhood coverage in the UMLS
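
The slide's chart is not preserved in the transcript, so the exact definition of neighborhood coverage used by the authors is unknown. As a sketch under a stated assumption, coverage is taken here to mean the fraction of all vertices lying within k hops of a source vertex, averaged over every source; the function names and toy graph are hypothetical.

```python
from collections import deque

def k_neighborhood(adj, source, k):
    """Set of vertices within k hops of source (including source itself)."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        x = queue.popleft()
        if dist[x] == k:
            continue  # do not expand beyond k hops
        for y in adj.get(x, ()):
            if y not in dist:
                dist[y] = dist[x] + 1
                queue.append(y)
    return set(dist)

def average_coverage(adj, k):
    """Assumed definition: mean over sources of |k-neighborhood| / |V|."""
    vertices = set(adj)
    for targets in adj.values():
        vertices.update(targets)
    total = sum(len(k_neighborhood(adj, s, k)) for s in vertices)
    return total / (len(vertices) ** 2)

# Hypothetical toy graph: a directed chain 1 -> 2 -> 3 -> 4.
toy = {1: [2], 2: [3], 3: [4], 4: []}
print(average_coverage(toy, 1))  # 0.4375
print(average_coverage(toy, 3))  # 0.625
```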

  15. Decentralization

  16. Query

  17. Workflow of Index construction

  18. Distance, Reachability and Path Queries supported by kDLS

  19. Distance Query Time

  20. Path estimation and Path construction

  21. Per path estimation and per path construction

  22. Memory cost

  23. Application: Disease Gene Prioritization • 8,134 disease concepts from OMIM (Online Mendelian Inheritance in Man), selected by semantic type “Disease or Syndrome” or “Neoplastic Process” • 29,333 genes from HUGO (Human Genome Organisation)

  24. Closeness measure and fold enrichment

  25. Recall

  26. Chronic Lymphocytic Leukemia (CLL)

  27. Breast Carcinoma

  28. IDS gene to CLL • The IDS gene is associated with inflammation and enlargement of the liver, as well as enlargement of the spleen, which is a lymphocytic organ • GSE2466: IDS expression levels show a significant decrease in CLL patients compared with the normal control (t-test p-value < 10^-11, mean fold change = 1.63)

  29. MIR1-1 gene to Breast Cancer • The paths lead to breast carcinoma via three drugs (Cyclophosphamide, Methotrexate, and Fluorouracil), the three components of the NCI-recommended CMF regimen for breast cancer.

  30. Thanks! Questions?
