
Medical Digital Library to Support Scenario Specific Information Retrieval

Medical Digital Library to Support Scenario Specific Information Retrieval. Wesley W. Chu wwc@cs.ucla.edu Computer Science Department University of California Los Angeles, California. Wesley W. Chu, PhD Hooshang Kangarloo, MD Usha Sinha, PhD David B. Johnson, PhD. Bernard Churchill, MD


Presentation Transcript


  1. Medical Digital Library to Support Scenario Specific Information Retrieval Wesley W. Chu wwc@cs.ucla.edu Computer Science Department University of California Los Angeles, California

  2. Wesley W. Chu, PhD; Hooshang Kangarloo, MD; Usha Sinha, PhD; David B. Johnson, PhD; Bernard Churchill, MD; John D. N. Dionisio, PhD; Richard Johnson, MD; Osman Ratib, MD, PhD
     A Project of the NIH Grant at UCLA
     A Digital File Room for Patient Care, Education, and Research

  3. Outline: • Background • Hypothesis • Specific Aims • Significance • Approach and Innovations • Research Progress
     Background
     • Current file rooms managing patient records have limited functionality
     • Main goal of mapping patient ID to patient records
     • PACS implementations are an electronic version of the traditional file room

  4. Background
     Lack of structure makes...
     • Finding relevant information for a particular user is time consuming and labor intensive
     • Results are poorly structured and incomplete, which may affect patient management
     • Current search tools are limited to general use and are not tailored to specific users or tasks

  5. Digital File Room Requirements
     A navigable information space providing:
     • Relevant and reputable information
     • Access to similar patient records
     • Content-based cross referencing
     • Dynamically updated data repository
     • Tailored access for specific users and devices

  6. Hypotheses
     • A digital file room (digital library) that delivers relevant and structured answers to specific queries can be developed from existing medical databases
     • Such a digital file room will increase user satisfaction and improve patient management

  7. Specific Aims
     SA1 Develop a system that identifies and provides access to reputable information sources
     SA2 Provide users with greater query capability (e.g. similar-to, approximate)
     SA3 Extract knowledge from patient data, medical literature and radiology teaching files to support content-based cross-referencing
     SA4 Provide access to dynamically updated collections based on patient data
     SA5 Adapt information retrieval to user and device characteristics

  8. Approach and Innovations
     • Intelligent information registration
       • Provide access to multiple, related data sources through a single access point
     • Content-based navigation and matching
       • Develop similarity matching based on medical concepts & patterns
       • Content correlation
     • User and device modeling
       • Adaptive information retrieval based on user and device models
     • Scenario-based information web (proxies)
       • Develop information web linking clustered data sources for a given set of related tasks (i.e., scenario)

  9. Intelligent Information Registration
     Registers multiple information sources to provide transparent access through a single point (proxy object).
     • Information requests are routed to appropriate data sources based on query characteristics
     • Data sources are hierarchically clustered according to a four-layer data model
     [Figure labels, in extraction order: proxy-object (access point); Patient; meta-data; Procedures; Labs; summarization; Ortho Incontinence; Neurological Incontinence; Ortho; Procedure; database; data: billing, cpt; Laboratory databases]

  10. Content-Based Navigation & Matching
      Two types of navigation:
      • Navigation of the information space using proxies and content correlation
      • Pattern/similarity navigation using type abstraction hierarchies (TAHs)

  11. Pattern-Based Type Abstraction Hierarchies
      • Scalable, hierarchical knowledge structures that facilitate similarity matching
      [Figure: type abstraction hierarchy rooted at Incontinence, with branches Poor holding and Adequate holding. Leaf types: Type II (adequate holding, adequate storage, poor emptying); Type III (poor holding, adequate storage, poor emptying); Type IV (poor holding, poor storage, poor emptying); Type V (adequate holding, poor storage, poor emptying). Example patients: 6 day M, 7 mo F, 12 yr M, 25 yr F, 28 day M, 24 mo F, 15 yr M, 20 yr F]
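As a rough illustration of how a type abstraction hierarchy can support similarity matching, the sketch below encodes the incontinence hierarchy named on this slide as child-to-parent links. The function names (`path_to_root`, `common_abstraction`, `siblings`) and the idea of treating "shares a parent" as "similar" are assumptions made for this example, not the project's actual implementation.

```python
# Hypothetical encoding of the slide's hierarchy as child -> parent links.
PARENT = {
    "Type II": "Adequate holding",
    "Type V": "Adequate holding",
    "Type III": "Poor holding",
    "Type IV": "Poor holding",
    "Adequate holding": "Incontinence",
    "Poor holding": "Incontinence",
}


def path_to_root(node):
    """Return the node followed by each of its ancestors, up to the root."""
    path = [node]
    while node in PARENT:
        node = PARENT[node]
        path.append(node)
    return path


def common_abstraction(a, b):
    """Return the most specific node that generalizes both a and b."""
    ancestors_of_a = set(path_to_root(a))
    for node in path_to_root(b):
        if node in ancestors_of_a:
            return node
    return None


def siblings(node):
    """Relax a node one level: every node that shares its parent (itself included)."""
    parent = PARENT.get(node)
    return sorted(child for child, p in PARENT.items() if p == parent)


print(common_abstraction("Type II", "Type V"))
print(siblings("Type III"))
```

Under this encoding, a query for Type III patients that returns too few records could be relaxed to its siblings under "Poor holding" before climbing to the root.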

  12. Adaptive Information Retrieval
      • Tailors query processing and query results according to:
        • Particular user
        • Characteristics of their device
      • Examples:
        • Doctors prefer JAMA or Lancet while patients prefer Time or CNN.
        • High resolution workstations support large, detailed imaging studies while portable devices need lower-bandwidth data.
      • Allows the system to retrieve appropriate data for a particular query, user, and device

  13. Scenario-Based Proxy
      A framework that defines, for a particular domain and set of tasks, the access methods to and the relationships between information sources.
      • intelligent information registration
      • pattern-based similarity matching
      • adaptive information retrieval
      • information web
      [Figure labels: Adequate holding; Inadequate holding; Type II; Type V; Type III; Type IV; Patient; Procedures; Labs; UCLA; HFC; MD Office; HFC Blood; UCLA Blood]

  14. Scenario-Based Information Web
      A directed graph that defines access paths for navigation among proxy objects
      [Figure: graph over the nodes Patient, Literature, and Teaching File, with edges labeled similar-to and correlated-to]

  15. Scenario-Based Information Web
      • Similar-to links relate objects based on their similarity
        • patients similar by age, sex, and disease
      • Correlated-to links relate objects based on related content
        • disease can be correlated to relevant literature
      Extends the scope of the digital file room into a digital medical library
      [Figure: graph over the nodes Patient, Literature, and Teaching File, with edges labeled similar-to and correlated-to]
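The information web described on these two slides is a directed graph with two kinds of labeled links. A minimal sketch of such a structure follows; the class name `InformationWeb`, its methods, and the specific edges added in the example are assumptions for illustration, since the figure's exact edges are not recoverable from the transcript.

```python
from collections import defaultdict


class InformationWeb:
    """Directed graph whose edges carry a relation label."""

    def __init__(self):
        self._edges = defaultdict(list)

    def link(self, source, relation, target):
        """Add a directed, labeled edge from source to target."""
        self._edges[source].append((relation, target))

    def follow(self, source, relation=None):
        """Targets reachable from source, optionally filtered by relation."""
        return [target for rel, target in self._edges.get(source, [])
                if relation is None or rel == relation]


web = InformationWeb()
# Assumed edges: the node and relation names come from the slide,
# but which node links to which is a guess made for this example.
web.link("Patient", "similar-to", "Patient")
web.link("Patient", "correlated-to", "Literature")
web.link("Patient", "correlated-to", "Teaching File")
web.link("Teaching File", "similar-to", "Teaching File")

print(web.follow("Patient", "correlated-to"))
```

Keeping the relation on the edge rather than in separate graphs lets one navigation step ask either "what is similar to this object" or "what is correlated to it" through the same call.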

  16. Research Progress
      • Phrase Indexing
        Phrases are generated from an n-word combination in a sentence.
        • Domain Specific Retrieval
        • Document Summarization
      • Content Correlation
        • Linking of relevant documents via patterns

  17. Domain Specific Retrieval
      • Documents are grouped into domain-specific collections
        • Medical patient reports
        • Web sites are often tailored to specific subject areas
      • Phrases can capture content better than single words, and thus improve retrieval performance

  18. Problem With Longer Phrases
      • Large combinatorial problem
      • To process longer phrases it is necessary to partition documents into smaller segments
      [Figure: log-scale chart with vertical ticks from 1.00E+00 to 1.00E+12 and horizontal ticks 1 to 6; series: 100 word document, 125 word document, 150 word document, 100^n, 14-word sentence]
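To make the combinatorial growth concrete, the sketch below counts unordered n-word combinations for a 14-word sentence and a 100-word document. It assumes that a candidate phrase is an unordered choice of n distinct word positions, so the count is the binomial coefficient C(w, n); the slide's chart may have used a different counting rule, and the function name is hypothetical.

```python
from math import comb


def candidate_phrase_count(num_words, phrase_length):
    """Number of unordered word combinations of the given length (assumed model)."""
    return comb(num_words, phrase_length)


for label, size in [("14-word sentence", 14), ("100-word document", 100)]:
    counts = [candidate_phrase_count(size, n) for n in range(1, 7)]
    print(label, counts)
```

Under this model a 14-word sentence yields 91 two-word and 1001 four-word candidates, while a 100-word document yields 4950 and 3921225, which is why restricting phrases to sentence-sized segments keeps the problem tractable.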

  19. Phrase Analysis
      • A phrase is defined as any 2, 3 or 4 words co-occurring in a sentence (word combination)
      • Very large number of possible phrases
      • Use a stoplist to remove “useless” words
      • Normalize words to a common stem
      Example:
        sentence:            The right upper lobe mass is seen again.
        case normalization:  the right upper lobe mass is seen again
        stop word removal:   right upper lobe mass seen again
        stemming:            right upp lob mass seen again
        sorting:             again lob mass right seen upp
        candidate 2-word combinations: again lob, again mass, again right, again seen, again upp, lob mass, lob right, lob seen, lob upp, mass right, mass seen, mass upp, right seen, right upp, seen upp
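The steps on this slide (case normalization, stop word removal, stemming, sorting, combination) can be sketched as a short pipeline. The stoplist and the stem table below are tiny hypothetical stand-ins chosen only so the example reproduces the slide's stems ("upp", "lob"); a real system would use a full stoplist and a proper stemming algorithm.

```python
import re
from itertools import combinations

STOPLIST = {"the", "is"}                 # hypothetical, minimal stoplist
STEMS = {"upper": "upp", "lobe": "lob"}  # stand-in for a real stemmer


def candidate_phrases(sentence, sizes=(2, 3, 4)):
    """Return {n: sorted n-word combinations} for one sentence."""
    words = re.findall(r"[a-z]+", sentence.lower())       # case normalization
    words = [w for w in words if w not in STOPLIST]        # stop word removal
    stems = sorted({STEMS.get(w, w) for w in words})       # stemming, then sorting
    return {n: list(combinations(stems, n)) for n in sizes}


phrases = candidate_phrases("The right upper lobe mass is seen again.")
print(len(phrases[2]), phrases[2][:3])
```

Sorting the stems before combining means each phrase has one canonical word order, so "lob mass" and "mass lob" are counted as the same candidate.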

  20. Document Retrieval Evaluation
      • Preliminary evaluation
        • A domain specific collection of documents
        • Can phrase analysis limited to sentences improve retrieval effectiveness?
        • SMART system (single word terms) used as baseline
      • Data
        • Thoracic radiology patient reports
        • Dictated reports
        • Describe anatomy and abnormal findings such as enlarged lymph nodes and cancer masses

  21. Domain Specific Document Retrieval
      • Query: “right upper lobe mass”

  22. Frequent N-Words
      (a) Frequent 1-word table (total 34): heart, aspirin, patient, doct, study, they, risk, prevent, take, diseas, stafford, use, too, may, thi, we, attack, ther, intern, bia, gener, peopl, problem, call, know, not, pain, some, reduc, medicat, very, becaus, data, regul
      (b) Frequent 2-word table (total 52): aspirin patient, heart aspirin, aspirin use, aspirin take, aspirin risk, aspirin study, patient take, patient study, heart diseas, heart patient, diseas peopl, prevent too, they not, they ther, they take, doct data, doct some, doct too, doct use, doct stafford, aspirin regul, aspirin becaus, aspirin reduc, aspirin some, aspirin pain, aspirin not, aspirin attack, aspirin too, aspirin diseas, use regul, aspirin they, aspirin doct, stafford intern, take not, risk reduc, study take, patient becaus, patient some, patient not, patient too, patient use, patient they, patient doct, heart regul, heart peopl, heart attack, heart too, heart use, heart stafford, use some, heart study, heart doct
      (c) Frequent 3-word table (total 39): aspirin patient take, aspirin patient study, heart aspirin patient, aspirin doct some, aspirin patient some, heart aspirin use, doct use some, aspirin take not, aspirin they not, aspirin patient not, aspirin they take, aspirin study take, patient doct use, heart aspirin diseas, heart use regul, heart aspirin regul, aspirin patient too, heart aspirin attack, aspirin risk reduc, patient take not, patient they not, heart patient too, heart aspirin too, patient use some, patient doct some, patient they take, patient study take, aspirin doct use, heart doct stafford, aspirin patient use, heart diseas peopl, aspirin use regul, aspirin patient they, heart patient study, heart aspirin study, aspirin patient becaus, aspirin patient doct, aspirin use some, they take not
      (d) Frequent 4-word table (total 14): heart aspirin use regul, aspirin they take not, aspirin patient take not, patient doct use some, aspirin patient study take, patient they take not, aspirin patient use some, aspirin doct use some, aspirin patient they not, aspirin patient they take, aspirin patient doct some, heart aspirin patient too, aspirin patient doct use, heart aspirin patient study
      (e) Frequent 5-word table (total 2): aspirin patient they take not, aspirin patient doct use some

  23. Phrase length distribution

  24. Automatic Text Summarization
      • Salton Method
        • Given a text file with n paragraphs
        • A paragraph can be represented by $D_i = (d_{i1}, d_{i2}, \ldots, d_{im})$
        • $d_{ik}$ is the weight representing the importance of term $T_k$ (word or phrase)
        • The pair-wise similarity of two paragraphs: $\mathrm{Sim}(D_i, D_j) = \sum_{k=1}^{m} d_{ik}\, d_{jk}$
      • Text relationship map:
        • Nodes = paragraphs
        • Links = pair-wise similarity of the connected nodes
        • Links are created if $\mathrm{Sim}(D_i, D_j) > \text{threshold}$
      • Bushiness of a node = number of links of that node
      • The text summarization is derived from the bushy nodes.
      [Figure: text relationship map with paragraph nodes P1, P2, P3, P4, P5, ..., Pn]
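The text relationship map on this slide can be sketched in a few lines: compute the pairwise inner products of the paragraph weight vectors, link pairs whose similarity exceeds the threshold, and count links per node. The weight vectors below are made-up values, and selecting the top-k bushiest paragraphs in original text order is an assumption about how the summary is "derived from the bushy nodes".

```python
def similarity(di, dj):
    """Inner product of two paragraph weight vectors."""
    return sum(a * b for a, b in zip(di, dj))


def bushiness(vectors, threshold):
    """Number of links per paragraph in the text relationship map."""
    degree = [0] * len(vectors)
    for i in range(len(vectors)):
        for j in range(i + 1, len(vectors)):
            if similarity(vectors[i], vectors[j]) > threshold:
                degree[i] += 1
                degree[j] += 1
    return degree


def summarize(vectors, threshold, k):
    """Indices of the k bushiest paragraphs, returned in text order (assumed rule)."""
    degree = bushiness(vectors, threshold)
    top = sorted(range(len(vectors)), key=lambda i: -degree[i])[:k]
    return sorted(top)


# Hypothetical term weights for four paragraphs over three terms.
paragraphs = [
    [1.0, 0.0, 0.5],  # P1
    [0.8, 0.2, 0.0],  # P2
    [0.0, 1.0, 0.0],  # P3
    [0.6, 0.0, 0.9],  # P4
]
print(bushiness(paragraphs, threshold=0.5))
print(summarize(paragraphs, threshold=0.5, k=2))
```

With these weights only the pairs (P1, P2) and (P1, P4) exceed the threshold of 0.5, so P1 is the bushiest node. Swapping the single-word weights for phrase weights changes only the vectors, not the procedure.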

  25. Performance Comparison of Salton’s Summarization Method Based on Phrase and Single Word
      Summarization based on phrases is less sensitive to the threshold setting than summarization based on single words.

  26. Comparison between Salton & FBI
      Document (sentences) | Row | Salton, Threshold 0.1 | Salton, Threshold 0.2 | Salton, Threshold 0.3 | FBI | df
      Apirin01 (13 sent) | 1 | 12,9,3,2,7 | 9,2,3,7,1 | 1,2,3,7,9 | 2,9,3,12,7 | 0
      Apirin01 (13 sent) | 2 | 3,2,9,1,7 | 3,2,9,1,7 | 3,2,9,1,7 | 2,3,12,9,4 | 2
      Apirin01 (13 sent) | 3 | 2,3,12,4,9 | 2,3,12,4,9 | 2,3,12,4,9 | 2,12,3,9,4 | 0
      Apirin02 (68 sent) | 1 | 12,14,22,61,66 | 12,14,1,15,20 | 1,12,14,15,20 | 14,12,22,66,20 | 0
      Apirin02 (68 sent) | 2 | 22,14,12,15,36 | 22,12,15,36,66 | 15,36,66,20,22 | 14,12,66,22,36 | 0
      Apirin02 (68 sent) | 3 | 12,14,66,22,36 | 12,14,22,36,66 | 12,14,22,36,66 | 14,12,66,22,36 | 0
      Elian04 (92 sent) | 1 | 26,76,33,59,2 | 26,76,33,2,24 | 76,26,2,44,7 | 26,76,2,7,44 | 1
      Elian04 (92 sent) | 2 | 26,7,76,33,2 | 26,7,76,29,82 | 26,7,76,2,29 | 26,76,2,7,6 | 1
      Elian04 (92 sent) | 3 | 6,26,27,7,2 | 6,27,7,26,2 | 6,27,26,2,7 | 26,2,6,27,59 | 1
      LAPD06 (27 sent) | 1 | 7,6,19,25,20 | 6,19,7,20,25 | 6,19,7,14,25 | 6,7,19,20,25 | 0
      LAPD06 (27 sent) | 2 | 18,6,19,20,9 | 18,6,19,20,9 | 6,19,18,20,9 | 18,19,6,20,7 | 1
      LAPD06 (27 sent) | 3 | 5,12,14,17,18 | 5,12,14,17,18 | 5,12,14,17,18 | 5,20,12,14,17 | 1
      CNNbush (14 sent) | 1 | 12,5,6,8,11 | 12,5,6,11,7 | 12,5,6,11,3 | 5,12,8,11,6 | 0
      CNNbush (14 sent) | 2 | 8,12,5,6,7 | 8,12,5,11,3 | 8,12,5,3,10 | 5,12,8,3,10 | 0
      CNNbush (14 sent) | 3 | 5,8,12,10,3 | 5,8,12,10,3 | 12,5,8,3,10 | 12,5,8,9,10 | 1
      Florida (49 sent) | 1 | 29,11,41,2,26 | 29,41,11,26,2 | 29,41,26,11,14 | 29,11,17,48,41 | 1
      Florida (49 sent) | 2 | 20,40,17,11,22 | 20,40,17,11,22 | 20,17,40,22,25 | 17,40,20,11,48 | 1
      Florida (49 sent) | 3 | 17,20,40,6,22 | 17,20,40,6,22 | 17,20,6,22,25 | 17,20,25,40,11 | 1

  27. Content Correlation
      • Given a document in one collection, content correlation links relevant documents in another document collection
      [Figure labels: Time; CNN; Patient Records; New England Journal of Medicine]

  28. Document Cluster By Pattern
      • A pattern is a set of unique terms that characterize some features in the data set
      • Patterns can be found in a collection of documents by data mining
      • Documents are grouped into clusters based on patterns using a clustering technique

  29. Cluster Signature
      • Every cluster can be classified according to the occurrence frequency of the patterns
      • Looking to answer:
        • Which set of patterns summarizes a given cluster?
        • How are the patterns related among the clusters?
      [Figure labels: Literature; Patient Records]

  30. Deriving Cluster Signature
      • Metrics
        • Local Cluster Certainty (LCC) measures the coverage of a pattern in a given cluster (Popularity)
        • The Global Cluster Certainty (GCC) measures the coverage of a pattern among clusters (Exclusiveness)
      • The Cluster Signature is the set of those patterns that have both high LCC and GCC
      • Documents from one collection (source) can be linked to relevant clusters in another collection (target)
      [Figure labels: Literature; Patient Records]
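The slide names LCC and GCC but does not give formulas, so the sketch below uses assumed definitions that fit the words "popularity" and "exclusiveness": LCC is the fraction of a cluster's documents that contain the pattern, and GCC is the fraction of all pattern-bearing documents that fall in that cluster. These formulas, the function names, the thresholds, and the toy documents are all assumptions for illustration, not the project's actual metrics.

```python
def lcc(pattern, cluster):
    """Assumed LCC: share of the cluster's documents containing the pattern."""
    return sum(pattern <= doc for doc in cluster) / len(cluster)


def gcc(pattern, cluster, clusters):
    """Assumed GCC: share of all documents containing the pattern that lie in this cluster."""
    total = sum(pattern <= doc for c in clusters for doc in c)
    inside = sum(pattern <= doc for doc in cluster)
    return inside / total if total else 0.0


def signature(cluster, clusters, patterns, min_lcc, min_gcc):
    """Patterns whose LCC and GCC both reach the given (hypothetical) thresholds."""
    return [p for p in patterns
            if lcc(p, cluster) >= min_lcc and gcc(p, cluster, clusters) >= min_gcc]


# Toy documents represented as sets of stemmed terms (made-up data).
cluster_a = [{"laparoscop", "pediatr", "urolog"}, {"laparoscop", "urolog"}]
cluster_b = [{"reflux", "pediatr"}, {"reflux", "urolog", "pediatr"}]
clusters = [cluster_a, cluster_b]
patterns = [frozenset({"laparoscop"}), frozenset({"urolog"}), frozenset({"pediatr"})]

print(signature(cluster_a, clusters, patterns, min_lcc=0.8, min_gcc=0.8))
```

In this toy data "urolog" is popular in the first cluster but not exclusive to it, so only "laparoscop" passes both thresholds, which matches the slide's requirement that signature patterns score high on both measures.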

  31. Preliminary Results
      • A collection of 69 pediatric urology literature abstracts taken from Medline was clustered using the complete link clustering algorithm
      • 3 large clusters, each with 2 or more sub-clusters
      • GCC and LCC were calculated for patterns found in several sub-clusters
      • Data from one sub-cluster is reported here
      Document #  Title
      1           Complications in pediatric urological laparoscopy: results of a survey
      2           Laparoscopic surgery in pediatric urology
      3           [Laparoscopic interventions in pediatric urology]
      4           Role of laparoscopic surgery in pediatric urology
      5           [Laparoscopic interventions in urology]
      6           Laparoscopic heminephroureterectomy in pediatric patients

  32. [Figure panels: LCC; GCC]

  33. Project Summary
      A system that provides:
      • relevant and reputable information,
      • access to similar patient records,
      • content-based cross referencing,
      • a dynamically updated data repository, and
      • tailored access for specific users and devices
      will:
      • augment the patient record to provide tailored and timely access to a broader array of reputable information and
      • extend the digital file room into a digital medical library.

  34. Research Results
      • Phrase Indexing
        • Developed an efficient algorithm for extracting n-word features from textual documents
        • Phrase indexing provides better results than single-word indexing in document retrieval and summarization
      • Content Correlation via Cluster Signature (LCC & GCC)
        • Preliminary results reveal the feasibility of using cluster signatures for linking relevant documents
      • Work begun on proxy for information navigation

  35. Future Work
      • Develop Ontology for Intelligent Information Registration
      • User Model for Information Retrieval
