The Use of Semantic Graphs for Modeling Biomedical Text

Laura Plaza, NIL (Natural Interaction based on Language), Universidad Complutense de Madrid. Topics: text summarization, semantic graph-based representation, automatic indexing, information retrieval.


Presentation Transcript


  1. The Use of Semantic Graphs for Modeling Biomedical Text. Laura Plaza, NIL (Natural Interaction based on Language), Universidad Complutense de Madrid

  2. Text summarization • Semantic graph-based representation • Automatic indexing • Information retrieval

  3. Why semantic? • Synonymy: "Cerebrovascular diseases during pregnancy may result from hemorrhage" = "Brain vascular disorders during gestation may result from hemorrhage" • Polysemy: "The common cold is more common in cold weather than in summer"

  4. Why graphs? "Pneumococcal infection is a lung infection caused by Streptococcus pneumoniae." "Mycoplasma pneumonia is another type of atypical pneumonia." "The patient referred feeling short of breath and was diagnosed with pneumonia." [Figure: graph centred on Pneumonia; labels include Symptom, Co-occurs with, Pneumococcal pneumonia, Influenza]

  5. Our Proposal • Using concepts and relations from external knowledge sources to represent the text as a graph • Exploiting the topology of the network to identify groups of semantically related concepts that represent different topics

  6. Representation Process • Document pre-processing • Concept identification • Document representation • Concept clustering and topic recognition

  7. Document pre-processing

  8. Concept Identification. Example: "The goal of the trial was to assess cardiovascular mortality for stroke"
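
The slide does not say which tool maps text to concepts, so the following is only a minimal sketch of the general idea: a longest-match lookup of phrases in a concept lexicon. The lexicon `TOY_CONCEPTS` and the function `identify_concepts` are hypothetical names introduced for illustration; a real system would query a full biomedical vocabulary.

```python
# Toy lexicon: surface phrase -> concept label (hypothetical, illustrative only).
TOY_CONCEPTS = {
    "trial": "Clinical Trials",
    "stroke": "Cerebrovascular Accident",
    "coronary heart disease": "Coronary Heart Disease",
    "congestive heart failure": "Congestive Heart Failure",
}


def identify_concepts(sentence, lexicon, max_len=4):
    """Greedy longest-match lookup of lexicon phrases in a sentence."""
    tokens = sentence.lower().replace(",", " ").replace(".", " ").split()
    found = []
    i = 0
    while i < len(tokens):
        matched = 0
        # Try the longest candidate phrase first, then shorter ones.
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            phrase = " ".join(tokens[i:i + n])
            if phrase in lexicon:
                found.append(lexicon[phrase])
                matched = n
                break
        i += matched if matched else 1
    return found


sentence = "The goal of the trial was to assess cardiovascular mortality for stroke"
print(identify_concepts(sentence, TOY_CONCEPTS))
```

Longest-match is chosen here so that a multi-word term such as "coronary heart disease" is not split into smaller, less specific concepts.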

  9. Concept Identification - Ambiguity. Example: "Tissues are often cold". WSD methods: • Personalized PageRank (PPR) • Journal Descriptor Indexing (JDI) • Machine Readable Dictionary (MRD) • Automatic Extracted Corpus (AEC)

  10. Document Representation. Example sentence: "The goal of the trial was to assess cardiovascular mortality and morbidity for stroke, coronary heart disease and congestive heart failure, as an evidence-based guide for clinicians who treat hypertension." [Figure: sentence graph over the concept hierarchy. Node labels include Clinical Trials, Clinical Study, Research Activity, Clinicians, Professional Personnel, Cardiovascular System, Organ System, Blood Pressure Finding, Hypertensive Disease, Cerebrovascular Accident, Cerebrovascular Disorder, Coronary Heart Disease, Congestive Heart Failure, Heart Disorder, Non-Neoplastic Heart Disorder, Non-Neoplastic Vascular Disorder, Non-Neoplastic Cardiovascular Disorder, Thoracic Disorder, Respiratory and Thoracic Disorder, Non-Neoplastic Disorder by Site, Finding by Site or System, Disease or Disorder.]

  11. Document Representation • All the sentence graphs are merged into a single Document Graph • The graph is extended with more semantic relations • Each edge is assigned a weight in [0, 1] • Different relations may be assigned different weights • The more specific the concepts are, the more weight is assigned to the edge
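
The slide does not give the weighting formula, so the sketch below assumes one scheme that satisfies the stated properties (weights in [0, 1], higher for more specific concepts): the Jaccard overlap of the two concepts' ancestor sets. The hierarchy `parent_of` and all function names are hypothetical.

```python
def ancestors(concept, parent_of):
    """Return the concept together with all its ancestors in an is-a hierarchy."""
    chain = {concept}
    while concept in parent_of:
        concept = parent_of[concept]
        chain.add(concept)
    return chain


def edge_weight(a, b, parent_of):
    """Assumed weighting: shared ancestors / all ancestors (always in [0, 1])."""
    anc_a, anc_b = ancestors(a, parent_of), ancestors(b, parent_of)
    return len(anc_a & anc_b) / len(anc_a | anc_b)


def merge_sentence_graphs(sentence_graphs, parent_of):
    """Union the edges of every sentence graph into one weighted document graph."""
    document_graph = {}
    for edges in sentence_graphs:
        for a, b in edges:
            document_graph[frozenset((a, b))] = edge_weight(a, b, parent_of)
    return document_graph


# Toy is-a hierarchy (child -> parent), for illustration only.
parent_of = {
    "Cardiovascular Disorder": "Disease or Disorder",
    "Heart Disorder": "Cardiovascular Disorder",
    "Congestive Heart Failure": "Heart Disorder",
}
sentence_1 = [("Disease or Disorder", "Cardiovascular Disorder")]
sentence_2 = [("Cardiovascular Disorder", "Heart Disorder"),
              ("Heart Disorder", "Congestive Heart Failure")]
for edge, weight in merge_sentence_graphs([sentence_1, sentence_2], parent_of).items():
    print(sorted(edge), round(weight, 2))
```

Under this assumption an edge near the root of the hierarchy weighs 1/2, and edges further down weigh 2/3, 3/4 and so on, so deeper (more specific) links count for more.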

  12. Example sentences: "The goal of the trial was to assess cardiovascular mortality and morbidity for stroke, coronary heart disease and congestive heart failure, as an evidence-based guide for clinicians who treat hypertension. While event rates for fatal cardiovascular disease were similar, there was a disturbing tendency for stroke to occur more often in the doxazosin group than in the group taking chlorthalidone." [Figure: document graph merging both sentence graphs. Besides the disorder, study and personnel concepts of the previous slide, node labels include Pharmaceutical Adjuvant, Cardiovascular Drug, Diuretic, Thiazide Diuretics, Alpha-Adrenergic Blocking Agent, Doxazosin and Chlorthalidone. Edge weights shown: 1/2, 2/3, 3/4, 1. Legend: "Is a" relations, "Other related" relations, "Associated with" relations.]

  13. Concept Clustering & Topic Recognition: hubs . . .

  14. Concept Clustering & Topic Recognition • Concepts are ranked by salience • The n vertices with the highest salience are called hub vertices
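
The slide does not define salience. The sketch below assumes a common choice, the sum of the weights of the edges incident to a vertex, and then takes the n top-ranked vertices as hubs. The function names and the toy graph are hypothetical.

```python
from collections import defaultdict


def rank_by_salience(weighted_edges):
    """Rank vertices by assumed salience: sum of weights of incident edges."""
    salience = defaultdict(float)
    for (a, b), weight in weighted_edges.items():
        salience[a] += weight
        salience[b] += weight
    # Highest salience first; ties broken alphabetically for determinism.
    return sorted(salience.items(), key=lambda item: (-item[1], item[0]))


def hub_vertices(weighted_edges, n):
    """Return the n vertices with the highest salience."""
    return [vertex for vertex, _ in rank_by_salience(weighted_edges)[:n]]


# Toy weighted graph, for illustration only.
edges = {("A", "B"): 0.5, ("A", "C"): 0.75, ("B", "C"): 0.25}
print(rank_by_salience(edges))
print(hub_vertices(edges, 2))
```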

  15. Concept Clustering & Topic Recognition • The hub vertices are grouped into Hub Vertex Sets (HVSs) • The remaining vertices are assigned to the cluster to which they are most connected • The number and properties of the clusters strongly depend on the parameters' values
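
The assignment step can be sketched as follows, assuming the Hub Vertex Sets are already known and that "most connected" means the largest total edge weight towards a set's hubs. That measure, the tie-breaking rule and the names used are assumptions, not details given on the slide.

```python
def assign_to_clusters(hub_sets, weighted_edges):
    """Attach every non-hub vertex to the hub set it shares the most edge weight with."""
    if not hub_sets:
        raise ValueError("at least one hub vertex set is required")
    clusters = [set(hub_set) for hub_set in hub_sets]
    hubs = set().union(*hub_sets)
    vertices = {v for edge in weighted_edges for v in edge}
    for vertex in sorted(vertices - hubs):
        strength = [0.0] * len(hub_sets)
        for (a, b), weight in weighted_edges.items():
            if vertex not in (a, b):
                continue
            neighbour = b if a == vertex else a
            for i, hub_set in enumerate(hub_sets):
                if neighbour in hub_set:
                    strength[i] += weight
        # Ties (including "no connection at all") go to the first cluster.
        best = max(range(len(hub_sets)), key=lambda i: strength[i])
        clusters[best].add(vertex)
    return clusters


# Toy example: two hub sets and two remaining vertices.
hub_sets = [{"A"}, {"B"}]
edges = {("A", "C"): 0.9, ("B", "C"): 0.2, ("B", "D"): 0.5}
print(assign_to_clusters(hub_sets, edges))
```

Connectivity is measured against the original hub sets only, so the result does not depend on the order in which the remaining vertices are processed.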

  16. Concept Clustering & Topic Recognition. [Figure: example clusters. Concept labels include Congestive heart failure, Adverse reactions, Amlodipine, Chlorthalidone, Drug pseudoallergen by function, Blood pressure finding, Cerebrovascular accident, Hepatic . . . and Health personnel, Elderly, Organism, Population group, Persons, Clinicians, Patients]

  17. Text summarization • Semantic graph-based representation • Automatic indexing • Information retrieval

  18. Text Summarization: creating a compacted version of one or various documents. Motivation: • Summaries as an indication of what a document is about • Improving indexing, categorization, and IR. Types: • Extracts vs. abstracts • Single vs. multi-document • Generic vs. application-oriented

  19. Text Summarization. [Figure: each sentence (Sentence 1 . . . Sentence n) is compared with each cluster (Cluster 1 . . . Cluster m); example similarity values shown: 35.0, 4.0, 86.0, 12.0]

  20. Text Summarization. Sentence selection: • H.1: Selecting the top n ranked sentences from the biggest cluster • H.2: Selecting n_i sentences from each cluster • H.3: Weighting the sentence-to-cluster similarity by the clusters' sizes. Plus other traditional criteria: frequency, position, similarity with the title, etc.
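
Two of the heuristics can be sketched from a sentence-by-cluster similarity matrix. The slide does not give the exact formulas, so the H.3 scoring below (similarity weighted by relative cluster size and summed over clusters) is an assumption, and the numbers are toy values.

```python
def select_h1(similarity, cluster_sizes, n):
    """H.1: the n sentences most similar to the biggest cluster, in document order."""
    biggest = max(range(len(cluster_sizes)), key=lambda j: cluster_sizes[j])
    ranked = sorted(range(len(similarity)), key=lambda i: -similarity[i][biggest])
    return sorted(ranked[:n])


def select_h3(similarity, cluster_sizes, n):
    """H.3 (assumed form): rank by similarity weighted by relative cluster size."""
    total = sum(cluster_sizes)
    scores = [
        sum(sim * size / total for sim, size in zip(row, cluster_sizes))
        for row in similarity
    ]
    ranked = sorted(range(len(scores)), key=lambda i: -scores[i])
    return sorted(ranked[:n])


# Toy data: rows are sentences, columns are clusters.
similarity = [[35.0, 4.0], [86.0, 12.0], [10.0, 90.0]]
cluster_sizes = [10, 2]
print(select_h1(similarity, cluster_sizes, 2))
print(select_h3(similarity, cluster_sizes, 1))
```

Returning the indices in document order keeps the extracted sentences in their original sequence, which is the usual convention for extractive summaries.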

  21. Text Summarization • Evaluation: How well is the important content preserved in the summary? • ROUGE automatic evaluation metrics • Comparison with the abstracts of the articles
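
As a reminder of what ROUGE measures, here is a minimal sketch of ROUGE-N recall: the fraction of the reference's n-grams that also occur in the candidate summary, with clipped counts. It uses plain whitespace tokenization and is not the official ROUGE toolkit; the example strings are made up.

```python
from collections import Counter


def ngrams(tokens, n):
    """Multiset of n-grams in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def rouge_n_recall(candidate, reference, n=1):
    """Clipped n-gram overlap divided by the number of reference n-grams."""
    cand = ngrams(candidate.lower().split(), n)
    ref = ngrams(reference.lower().split(), n)
    total = sum(ref.values())
    if total == 0:
        return 0.0
    overlap = sum(min(count, cand[gram]) for gram, count in ref.items())
    return overlap / total


summary = "stroke occurred more often"
abstract = "stroke occurred often in one group"
print(rouge_n_recall(summary, abstract, n=1))
print(rouge_n_recall(summary, abstract, n=2))
```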

  22. Text Summarization • Evaluation: How does ambiguity affect summarization?

  23. Summarization of Biological Entity-related Information • Given a list of genes (or proteins): • Retrieving documents related to the genes • Building a semantic graph-based representation of the corpus • Identifying groups of genes/proteins • Generating a summary for each group that describes the functionality of the entities. Multi-document, application-oriented summarization

  24. Automatic Indexing of Biomedical Literature using Summaries. [Diagram labels: Title + Abstract; Full text; MTI; Ordered list of MeSH main headings; Refined list of MeSH headings]

  25. Automatic Indexing of Biomedical Literature using Summaries. What about using the full texts? • Recall increases but precision decreases. What about using automatic summaries of different lengths? • As the length increases, recall improves but precision worsens • There is a summary length which maximizes F-measure
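
The trade-off can be made concrete with the standard F-measure, the weighted harmonic mean of precision and recall. In the sketch below the precision/recall pairs per summary length are invented toy numbers, not results from the talk, and `best_length` is a hypothetical helper.

```python
def f_measure(precision, recall, beta=1.0):
    """F-beta score; beta=1 gives the usual harmonic mean of precision and recall."""
    if precision == 0 and recall == 0:
        return 0.0
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


def best_length(results):
    """Return the summary length whose (precision, recall) pair maximizes F1."""
    return max(results, key=lambda length: f_measure(*results[length]))


# Toy numbers only: summary length -> (precision, recall).
toy_results = {10: (0.60, 0.30), 20: (0.50, 0.50), 30: (0.35, 0.60)}
for length, (p, r) in toy_results.items():
    print(length, round(f_measure(p, r), 3))
print(best_length(toy_results))
```

Because F1 penalizes an imbalance between the two measures, the maximum tends to fall at an intermediate length where neither precision nor recall has collapsed.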

  26. Text summarization • Semantic graph-based representation • Automatic indexing • Information retrieval

  27. Retrieval of Similar Patient Cases Motivation: Facilitating the access to previous cases Problem: Given a reference patient record, to retrieve others from the clinical database that are similar to the reference one

  28. Retrieval of Similar Patient Cases. When can we consider that two patient records are similar? • Same symptom or sign (e.g., fever) • Same diagnosis (e.g., bacterial pneumonia) • Same test or procedure (e.g., endoscopy biopsy) • Same medication (e.g., clopidogrel) • But absent criteria are not relevant!

  29. Retrieval of Similar Patient Cases • The records are represented using UMLS graphs • Concepts are filtered by semantic types • Negated concepts are ignored

  30. Retrieval of Similar Patient Cases • We compute the similarity between the reference record and all records in the database. [Figure: Graph A and Graph B, two concept graphs descending from Clinical finding through Finding by site, Disease, Disorder by body site, Infectious disease and Respiratory finding to concepts such as Bacterial pneumonia, Pneumococcal pneumonia, Mycoplasma pneumonia, Pneumonia due to anaerobic bacteria, Pneumonia due to Streptococcus, Virus Diseases and Coughing. Level weights shown run from 1/11 to 11/11 in one graph and from 3/5 to 5/5 in the other.]
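
The slide does not state which similarity measure is used. Purely as an illustration, the sketch below represents each record as a mapping from concept to weight and ranks the database by a weighted Jaccard overlap with the reference record. The measure, the function names and the toy records are all assumptions.

```python
def record_similarity(record_a, record_b):
    """Weighted Jaccard overlap between two {concept: weight} records."""
    concepts = set(record_a) | set(record_b)
    shared = sum(min(record_a.get(c, 0.0), record_b.get(c, 0.0)) for c in concepts)
    total = sum(max(record_a.get(c, 0.0), record_b.get(c, 0.0)) for c in concepts)
    return shared / total if total else 0.0


def rank_similar(reference, database):
    """Record ids ordered from most to least similar to the reference record."""
    return sorted(
        database,
        key=lambda rid: (-record_similarity(reference, database[rid]), rid),
    )


# Toy records: concept -> weight (e.g., a specificity-based weight).
reference = {"Bacterial pneumonia": 1.0, "Coughing": 0.8}
database = {
    "r1": {"Bacterial pneumonia": 1.0},
    "r2": {"Virus Diseases": 1.0},
    "r3": {"Bacterial pneumonia": 1.0, "Coughing": 0.8},
}
print(rank_similar(reference, database))
```

Only concepts present in both records add to the numerator, so a criterion that is absent from both records has no effect on the score.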

  31. Text summarization • Semantic graph-based representation • Automatic indexing • Information retrieval

  32. Automatic Indexing of EHR • Discovering relevant SNOMED-CT concepts in health records. 4 steps: • Spell checking • Acronym expansion and WSD • Negation detection • Concept identification

  33. Automatic Indexing of EHR • Spell checking • Hunspell + Levenshtein + keyboard + phonetic distance
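
Of the components listed, the Levenshtein distance is simple to show in full. The sketch below implements the standard dynamic-programming edit distance and uses it to filter spelling suggestions from a small vocabulary. The `suggest` helper, its threshold and the word list are hypothetical, and the Hunspell, keyboard and phonetic parts are not covered.

```python
def levenshtein(a, b):
    """Minimum number of insertions, deletions and substitutions turning a into b."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # deletion
                current[j - 1] + 1,      # insertion
                previous[j - 1] + cost,  # substitution or match
            ))
        previous = current
    return previous[-1]


def suggest(word, vocabulary, max_distance=2):
    """Vocabulary entries within max_distance edits, closest first."""
    scored = sorted((levenshtein(word, entry), entry) for entry in vocabulary)
    return [entry for distance, entry in scored if distance <= max_distance]


print(levenshtein("kitten", "sitting"))
print(suggest("pnemonia", ["pneumonia", "fever", "anemia"]))
```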

  34. Automatic Indexing of EHR • Acronym expansion and WSD • A list of abbreviations + Machine Learning + expert rules

  35. Automatic Indexing of EHR • Negation detection • Spanish adaptation of the NegEx algorithm • Negation cue + negation scope
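
The cue-plus-scope idea can be sketched as follows. This is a heavily simplified, NegEx-style toy, not the adaptation described in the talk: the cue and terminator lists are tiny illustrative samples, and the scope is assumed to be a fixed window of tokens after the cue that ends early at a terminator.

```python
# Tiny illustrative lists; a real system would use curated trigger and termination terms.
NEGATION_CUES = {"no", "sin", "niega"}
SCOPE_TERMINATORS = {"pero", "aunque", ",", "."}


def negated_positions(tokens, window=5):
    """Indices of tokens that fall inside the scope of a negation cue."""
    negated = set()
    for i, token in enumerate(tokens):
        if token.lower() not in NEGATION_CUES:
            continue
        # Scope: up to `window` tokens after the cue, cut short by a terminator.
        for j in range(i + 1, min(i + 1 + window, len(tokens))):
            if tokens[j].lower() in SCOPE_TERMINATORS:
                break
            negated.add(j)
    return negated


# "paciente sin fiebre pero con tos" = "patient without fever but with cough"
tokens = "paciente sin fiebre pero con tos".split()
print([tokens[i] for i in sorted(negated_positions(tokens))])
```

Concepts found at the returned positions would then be dropped before indexing, in line with the rule that negated concepts are ignored.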

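  36. Automatic Indexing of EHR • Concept identification. Query: "El recién nacido fue ingresado" ("The newborn was admitted"). The query is matched against the SNOMED-CT concept descriptions. • Candidate mappings: • Recién nacido (newborn) • Recién nacido prematuro (premature newborn) • Ingreso del paciente (patient admission). A scoring function then selects the final mappings. • Final mappings: • Recién nacido • Ingreso del paciente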

  37. Automatic Indexing of EHR

  38. Automatic Indexing of EHR • Future work • Representing the EHR as a graph using different relations from SNOMED-CT • Computing the salience of the concepts to obtain the most representative ones • Using such a representation in different NLP tasks (e.g., categorization, IR, etc.)

  39. Further Readings. Summarization: Plaza, L., Díaz, A., Gervás, P. (2011). A semantic graph-based approach to biomedical summarization. Artificial Intelligence in Medicine, 53. Plaza, L. (2012). Evaluating the importance of sentence position for automatic summarization of biomedical literature. Submitted to Bioinformatics. Word Sense Disambiguation: Plaza, L., Stevenson, M., Díaz, A. (2012). Resolving Ambiguity in Biomedical Text to Improve Summarization. Information Processing & Management, 48(4). Plaza, L., Jimeno-Yepes, A., Díaz, A., Aronson, A. (2011). Studying correlation between different word sense disambiguation methods and summarization effectiveness in biomedical texts. BMC Bioinformatics, 12. Automatic Indexing: Jimeno-Yepes, A., Plaza, L., Mork, J., Díaz, A., Aronson, A. (2012). Using automatic summaries to improve automatic indexing. To appear in BMC Bioinformatics. Retrieval of Similar Cases: Plaza, L., Díaz, A. (2010). Retrieval of Similar Electronic Health Records using UMLS Concept Graphs. 15th International Conf. on Applications of Natural Language to Information Systems.
