
Enhancing Quality of Retrieval Through Concept Edit History -- EVS Update



Presentation Transcript


  1. Enhancing Quality of Retrieval Through Concept Edit History -- EVS Update • Frank Hartel • Sherri De Coronado • Gilberto Fragoso • Iris Guo • Kim Ong NCICB Jamboree

  2. Outline • Terminology development -- concept creation, modification, split, merge, retirement • Edit history usage • TDE Ontylog editor extension • Next steps • Summary

  3. Elementary Edit Actions In Terminology Development (Create, Modify, Split, Merge, Retire) [Figure: Versions 1 through 4 of the terminology, each shaped by Create, Split, Modify, Retire, and Merge actions. Caption: Evolution of versions/baseline over time.]

  4. Scientific Reasons for Concept Splits • The oncogene ras was discovered based on sequence homology (hybridization) to the v-onc gene of the Harvey strain of murine sarcoma virus. • Subsequently, it was discovered that there were multiple related ras genes, Ha-ras and Ki-ras. Later on, a new ras gene, N-ras, was found.

  5. Scientific Reasons for Concept Merges • The BCL1 gene was discovered in the vicinity of a t(11;14) translocation involved in the malignant transformation of B cells. • The PRAD1 gene was found in parathyroid adenomas bearing chromosomal abnormalities. • CCND1 codes for one of a set of proteins, cyclins, that regulate cell cycle progression. • BCL1, PRAD1, and CCND1 were later recognized as the same gene, so concepts created separately for each name need to be merged.

  6. Concept Based Retrieval [Diagram: a user passes the concepts used for retrieval (C1, C2) to a search engine. Each document carries indexing terms, for example D1<C1, C2> and D2<C1, C3, C4>. The search engine returns the relevant documents.]
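The retrieval flow on this slide can be sketched as an inverted index from concepts to documents. This is only an illustration: the document and concept identifiers come from the slide, while the function names and the "match any query concept" rule are assumptions, not something the presentation specifies.

```python
from collections import defaultdict

# Documents and their indexing concepts, taken from the slide's example.
DOC_INDEX = {
    "D1": {"C1", "C2"},
    "D2": {"C1", "C3", "C4"},
}


def build_inverted_index(doc_index):
    """Map each concept to the set of documents indexed with it."""
    inverted = defaultdict(set)
    for doc_id, concepts in doc_index.items():
        for concept in concepts:
            inverted[concept].add(doc_id)
    return inverted


def retrieve(query_concepts, inverted):
    """Return documents indexed with at least one query concept (assumed rule)."""
    hits = set()
    for concept in query_concepts:
        hits |= inverted.get(concept, set())
    return sorted(hits)


INVERTED = build_inverted_index(DOC_INDEX)
```

With this sketch, a query for C1 and C2 reaches both documents, while a query for a concept that no document was indexed with returns nothing, which is the gap that edit history is meant to close.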

  7. Edit History Usage • Documents are often indexed using different versions of a terminology. • Re-indexing documents to keep pace with changes made to the terminology is impractical and can be very costly. • Edit history can greatly enhance precision and recall, because it relates the concepts used for retrieval to the concepts that were in use when each document was indexed. [Diagram: repositories of pre-indexed documents R1 to R4 are indexed with thesaurus Versions 1 to 4. The edit history (new, modify, retire, merge, split) links the versions and is consulted by the search engine together with the concepts used for retrieval.]
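One way a search engine might use edit history is to expand a query concept with the concepts it was split from or merged with, so that documents indexed under older versions are still reached. The sketch below is an assumption about how that could work, not the EVS implementation: the record layout, the function names, and the use of gene names as concept identifiers are all hypothetical.

```python
from collections import defaultdict, deque

# Hypothetical edit-history records: (action, source_concept, target_concept).
# For a split, source is the original concept and target the newly created one.
# For a merge, source is the retiring concept and target the surviving one.
EDIT_HISTORY = [
    ("split", "ras", "Ha-ras"),
    ("split", "ras", "Ki-ras"),
    ("merge", "BCL1", "CCND1"),
    ("merge", "PRAD1", "CCND1"),
]


def build_links(history):
    """Link concepts related by split or merge events, in both directions."""
    links = defaultdict(set)
    for action, source, target in history:
        if action in ("split", "merge"):
            links[source].add(target)
            links[target].add(source)
    return links


def expand_query(concept, links):
    """Collect every concept reachable from `concept` through split/merge links."""
    seen = {concept}
    queue = deque([concept])
    while queue:
        current = queue.popleft()
        for neighbor in links.get(current, ()):
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append(neighbor)
    return seen


LINKS = build_links(EDIT_HISTORY)
```

Expanding CCND1 this way also picks up BCL1 and PRAD1, which helps recall. Following split links in both directions is a deliberately broad choice: expanding Ha-ras also reaches Ki-ras through their common origin, which may cost precision, so a real system would likely restrict the direction or depth of expansion.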

  8. Edit History Storage

  9. Terminology Development Environment

  10. Terminology Development Environment • Previously, only three types of edit action were logged: add, modify, and delete. • Concepts created through split actions could not be distinguished from newly created concepts. • Concepts merged into other concepts were indistinguishable from retired concepts. • Failure to explicitly track merge and split edit actions may result in a low recall rate in information retrieval. * Recall is the number of relevant documents retrieved as a fraction of all relevant documents.
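The recall definition in the footnote can be written out directly. The function below is a minimal sketch; the document identifiers in the usage note are made up purely for illustration and are not results from the presentation.

```python
def recall(retrieved, relevant):
    """Fraction of the relevant documents that were actually retrieved."""
    relevant = set(relevant)
    if not relevant:
        raise ValueError("recall is undefined when there are no relevant documents")
    return len(set(retrieved) & relevant) / len(relevant)
```

For example, if four documents are relevant but two of them were indexed under a concept that has since been merged away, a search that ignores the merge retrieves only two of the four and `recall({"D1", "D2"}, {"D1", "D2", "D3", "D4"})` evaluates to 0.5.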

  11. Approach Taken to Extend TDE • Create a reusable concept edit tree Java bean • Develop a user interface for processing split, merge, and retirement edit actions • Log edit events in the TDE history database with clarity and precision

  12. Extend Ontylog Editor With Plug-Ins • Use the Concept Edit Tree widget to build plug-ins

  13. TDE Extension - Split Panel • Roles and properties may be transferred from one concept to another using drag & drop. • A concept is created as a result of a split. • The edit action is explicitly logged in the TDE History database as a split event.

  14. TDE Extension - Merge Panel [Panel labels: Concept to stay, Concept to retire] • Non-redundant roles and properties are transferred from the retiring concept to the resultant merged concept. • The edit action is explicitly logged in the TDE History database as a merge event.
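The merge behavior described on this slide can be sketched with a small data model. Everything here is hypothetical (the `Concept` class, its fields, and the log record layout); it is not the TDE or Ontylog API, only an illustration of transferring non-redundant roles and properties and logging the action explicitly as a merge.

```python
from dataclasses import dataclass, field


@dataclass
class Concept:
    """Hypothetical concept record, not the actual TDE data model."""
    name: str
    properties: set = field(default_factory=set)  # (property_name, value) pairs
    roles: set = field(default_factory=set)       # (role_name, target_concept) pairs
    retired: bool = False


def merge_concepts(surviving, retiring, history_log):
    """Merge `retiring` into `surviving` and log an explicit merge event."""
    # Set union keeps only non-redundant entries: anything the surviving
    # concept already has is not duplicated.
    surviving.properties |= retiring.properties
    surviving.roles |= retiring.roles
    retiring.retired = True
    # Record a merge, not a plain retire, so the history stays unambiguous.
    history_log.append(("merge", retiring.name, surviving.name))
    return surviving
```

Logging the event as `("merge", retiring, surviving)` rather than as a plain retirement is the point of the extension: it preserves where the retired concept went, which a retrieval system can later follow.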

  15. TDE Extension - Preretirement [Panel label: Concept to retire] • Sub-concepts are re-treed. • Role relationships targeted (i.e., pointing) to the retiring concept are either removed or re-targeted. • A concept can be retired only if all preconditions are met.
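The preconditions on this slide amount to a check that nothing still depends on the concept being retired. The following sketch assumes a simple representation (a child-to-parents mapping and a list of role triples) and hypothetical function names; the real TDE checks may differ.

```python
def retirement_blockers(concept, parent_of, role_links):
    """List the reasons `concept` cannot be retired yet.

    parent_of:  dict mapping each child concept to the set of its parents.
    role_links: list of (source_concept, role_name, target_concept) triples.
    Both structures are assumptions made for this sketch.
    """
    blockers = []
    # Sub-concepts must be re-treed under another parent first.
    children = sorted(c for c, parents in parent_of.items() if concept in parents)
    for child in children:
        blockers.append(f"sub-concept {child} still has parent {concept}")
    # Roles pointing at the concept must be removed or re-targeted first.
    for source, role, target in role_links:
        if target == concept:
            blockers.append(f"role {role} from {source} still targets {concept}")
    return blockers


def can_retire(concept, parent_of, role_links):
    """A concept can be retired only when no blockers remain."""
    return not retirement_blockers(concept, parent_of, role_links)
```

Returning the list of blockers rather than a bare boolean lets an editor show the modeler exactly which sub-concepts and roles still need attention.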

  16. TDE Extension - Retire Panel • A non-editable tree shows concept definition information pertinent to the retiring concept. • The edit action is explicitly logged in the TDE History database as a retire event.

  17. Next Steps • Consolidate the edit history logged by individual modelers in the Terminology Development Environment (TDE) into concept history data useful to Distributed Terminology System (DTS) users

  18. Next Steps • Extend caBIO and DTS Server capability to facilitate high-quality information retrieval [Diagram labels: EVS, DTS Server, DTS Extension, DTS History API, edit history database, XMLRPC Server, XMLRPC Client, caBIO.jar, external databases, repositories of indexed documents, end user applications, concepts used for retrieval. Part of the diagram is marked "to be developed".]

  19. Summary • Explicitly tracking edit actions in TDE is essential to terminology- and concept-based information retrieval. • We have successfully extended the TDE Ontylog editor to explicitly track split, merge, and retirement edit events. • Concept history data and supporting APIs will soon become available to DTS users and developers through caBIO (Cancer Bioinformatics Infrastructure Objects).

  20. EVS Team • Frank Hartel • Sherri De Coronado • Gilberto Fragoso • Margaret Haber • Larry Wright • Jim Oberthaler • Northrop Grumman, Inc. • Kevric Corporation • Aspen Inc. • Apelon, Inc. • Kim Ong • Iris Guo • Bob Dione

  21. Contact Dr. Francis W. Hartel Center for Bioinformatics National Cancer Institute 6116 Executive Blvd. Rockville, MD 20892-8335 Phone: (301) 435-3869 Fax: (301) 480-4222 Email: hartel@mail.nih.gov
