
caTIES 2.0 APIII 2006


Presentation Transcript


  1. caTIES 2.0 APIII 2006 Rebecca Crowley Kevin Mitchell

  2. Presentation Overview • caTIES – Goals • Tissue Banking Collaboration • Grid Trust Fabric • Concept coding and recoding • Data stewardship, data sharing and honest brokering • Interoperability within a grid community University of Pittsburgh

  3. caTIES – Goals • The Cancer Text Information Extraction System (caTIES) pilot project will focus on two important challenges of bioinformatics: (1) information extraction from free text and (2) access to tissue. Specifically, caTIES has four primary goals: • Extract coded information from free-text Surgical Pathology Reports (SPRs) using controlled terminologies to populate caBIG-compliant data structures. • Provide researchers with the ability to query, browse, and acquire annotated tissue data and physical material across a network of federated sources. • Provide a collaboration space in which researchers may construct and manage retrospective tissue distribution protocols. • Pioneer research for distributed text information extraction within the context of caBIG. caTIES modules will be developed as generalized components available in caBIG, to encourage reuse by other caBIG projects that require tissue information extraction.

  4. Tissue Banking Collaboration • Administrator initiation of a Research Protocol – The IT system's administrator is responsible for supporting the electronic capture of research information. The Administrator works with Researchers, Health Care Professionals, IRBs and others to establish repositories of electronic data, often categorized by study. • Researcher case discovery and order generation – In conducting retrospective research studies based on tissue samples, Researchers examine free-text descriptions of those tissues or delegate the responsibility of gathering a tissue collection to Honest Brokers. • Honest Broker order facilitation – Honest Brokers work with Tissue Bank personnel to acquire tissue and tissue-related materials, and with the courier system to deliver orders to researchers. These orders often need to maintain a degree of atomicity.
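The Honest Broker workflow above, together with the later slides on verifying physical material and relaying order status, implies an order that is released to the Researcher only as a unit. The following is a minimal sketch of that all-or-nothing rule. The class and state names (`TissueOrder`, `OrderStatus`) are hypothetical, because the slides do not describe caTIES' actual order model.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical order states an Honest Broker might relay back to the Researcher.
enum OrderStatus { SUBMITTED, PARTIALLY_VERIFIED, READY_TO_SHIP, SHIPPED }

// Hypothetical multi-item tissue order that is released all-or-nothing.
class TissueOrder {
    private final Map<String, Boolean> verified = new LinkedHashMap<>();
    private OrderStatus status = OrderStatus.SUBMITTED;

    TissueOrder(List<String> specimenIds) {
        for (String id : specimenIds) {
            verified.put(id, false);
        }
    }

    // The Honest Broker confirms that one physical specimen is available.
    void verify(String specimenId) {
        if (status == OrderStatus.SHIPPED) {
            throw new IllegalStateException("Order already shipped");
        }
        if (!verified.containsKey(specimenId)) {
            throw new IllegalArgumentException("Not in this order: " + specimenId);
        }
        verified.put(specimenId, true);
        status = verified.containsValue(false)
                ? OrderStatus.PARTIALLY_VERIFIED
                : OrderStatus.READY_TO_SHIP;
    }

    // Ships only when every item is verified; otherwise nothing is released.
    boolean ship() {
        if (status != OrderStatus.READY_TO_SHIP) {
            return false;
        }
        status = OrderStatus.SHIPPED;
        return true;
    }

    OrderStatus status() {
        return status;
    }

    // Specimens still awaiting verification, in the order they were requested.
    List<String> unverified() {
        List<String> pending = new ArrayList<>();
        for (Map.Entry<String, Boolean> e : verified.entrySet()) {
            if (!e.getValue()) {
                pending.add(e.getKey());
            }
        }
        return pending;
    }
}
```

The design keeps a partially verified order from shipping at all. This is one reading of "a degree of atomicity"; a real system might also allow an explicit partial release.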

  5. Administrator – Create New Study

  6. Administrator – Assign Organization Role

  7. Administrator – Add User to Study as Role

  8. Researcher Perspective

  9. Researcher - Graphical Search Specification

  10. Honest Broker – Verifies Physical Material

  11. Honest Broker – Relays Order Status back to Researcher

  12. Grid Trust Fabric • Electronic Components (4 Pillars of security): • Identity (DN or public key): Isolation, Traceability • Authentication (TLS handshake): Prevent Identity Theft • Authorization (gridmapfile, or Globus + OGSA-AuthZ services): Access Control, Resource Control • Audit (logfiles): Troubleshooting, Forensics, Accounting
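The "gridmapfile" named under Authorization is the Globus grid-mapfile. It maps a certificate's distinguished name (DN) to one or more local accounts. The sketch below reads that line format: a quoted DN, followed by comma-separated account names, with `#` lines treated as comments. It is a simplified illustration, not the Globus parser. The class name `GridMapFile` is hypothetical, and the sketch ignores escape sequences inside DNs.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Simplified grid-mapfile reader for lines of the form: "DN" account[,account...]
class GridMapFile {
    private final Map<String, List<String>> dnToAccounts = new HashMap<>();

    GridMapFile(List<String> lines) {
        for (String raw : lines) {
            String line = raw.trim();
            if (line.isEmpty() || line.startsWith("#")) continue; // blank line or comment
            if (!line.startsWith("\"")) continue;                 // this sketch only accepts quoted DNs
            int close = line.indexOf('"', 1);
            if (close < 0) continue;                              // malformed: unterminated DN
            String dn = line.substring(1, close);
            String rest = line.substring(close + 1).trim();
            if (rest.isEmpty()) continue;                         // no local account given
            dnToAccounts.put(dn, Arrays.asList(rest.split("\\s*,\\s*")));
        }
    }

    // Local accounts mapped to this DN (empty if the DN is unknown).
    List<String> accountsFor(String dn) {
        return dnToAccounts.getOrDefault(dn, Collections.<String>emptyList());
    }

    // Coarse authorization decision: a DN not listed in the file is refused.
    boolean isAuthorized(String dn) {
        return dnToAccounts.containsKey(dn);
    }
}
```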

  13. Grid Trust Fabric (cont) • Social Fabric • Narrative de-identification, defined by the level or kind of de-identification: • Narrative redactors • Concept coders • Information extraction to synoptic structures • The IRB must endorse the federated environment • Individuals must maintain a level of integrity

  14. Current caTIES Security • Summary of caTIES' current security solution: • User Registration with IMS – GUMS • User Registration with caTIES System – CTRM • Authentication and Authorization – GUMS + CTRM • User Access to caTIES Resources – caTIESClient

  15. User Authentication - GUMS • Scenario: Users log into the caTIES client with their GUMS username and password. The caTIES client securely connects to GUMS with the user's GUMS X.509 certificate and retrieves the GUMS user proxy. The caTIES client then uses the user proxy to securely connect to the EVS service exposed by caTIES. This is essentially a connectivity check; any caTIES secured service could be used.
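The login scenario above has two steps. First, the client exchanges the username and password for a GUMS user proxy. Second, it proves the proxy works by calling a secured service. The sketch below models only that control flow. `UserProxy`, `ProxyIssuer`, `SecuredService` and `ClientLogin` are hypothetical stand-ins for GUMS, the EVS service and the caTIES client, and no real Globus or GUMS API is used.

```java
import java.util.Optional;

// Hypothetical stand-in for the proxy credential returned by GUMS.
final class UserProxy {
    final String distinguishedName;

    UserProxy(String distinguishedName) {
        this.distinguishedName = distinguishedName;
    }
}

// Hypothetical stand-in for GUMS: exchanges a username/password for a user proxy.
interface ProxyIssuer {
    Optional<UserProxy> issueProxy(String username, char[] password);
}

// Hypothetical stand-in for any caTIES secured service (the slide uses EVS).
interface SecuredService {
    boolean ping(UserProxy proxy);
}

class ClientLogin {
    private final ProxyIssuer gums;
    private final SecuredService connectivityCheck;

    ClientLogin(ProxyIssuer gums, SecuredService connectivityCheck) {
        this.gums = gums;
        this.connectivityCheck = connectivityCheck;
    }

    // Step 1: obtain a proxy from GUMS. Step 2: confirm it is accepted by a secured service.
    Optional<UserProxy> login(String username, char[] password) {
        Optional<UserProxy> proxy = gums.issueProxy(username, password);
        if (proxy.isPresent() && connectivityCheck.ping(proxy.get())) {
            return proxy;
        }
        return Optional.empty();
    }
}
```

Keeping both collaborators behind interfaces mirrors the slide's point that any secured service could serve as the connectivity check.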

  16. User Authorization - CTRM • The CTRM (central tissue resource manager) contains user authorization information, namely how users are related to organizations. It classifies these user-organization relationships by the following roles: Researcher, Institution Honest Broker or Local Administrator. • The CTRM service is responsible for issuing queries to the CTRM. When a user is authenticated, the caTIES Client sends the user proxy's distinguished name (DN) to the CTRM service as a query parameter. • The CTRM service in turn fetches the user's role from the CTRM and sends the role information back to the client.
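The authorization step above is a lookup keyed by the proxy's DN that returns the user's roles per organization. The following is a minimal in-memory sketch of that shape. `CtrmDirectory` and `CtrmRole` are hypothetical names, and the real CTRM is a database behind a grid service rather than a `HashMap`.

```java
import java.util.Collections;
import java.util.EnumSet;
import java.util.HashMap;
import java.util.Map;
import java.util.Set;

// The three roles named on the slide.
enum CtrmRole { RESEARCHER, INSTITUTION_HONEST_BROKER, LOCAL_ADMINISTRATOR }

// Hypothetical in-memory stand-in for the CTRM user-organization table.
class CtrmDirectory {
    // DN -> (organization -> roles held in that organization)
    private final Map<String, Map<String, Set<CtrmRole>>> assignments = new HashMap<>();

    void assign(String dn, String organization, CtrmRole role) {
        assignments
            .computeIfAbsent(dn, k -> new HashMap<>())
            .computeIfAbsent(organization, k -> EnumSet.noneOf(CtrmRole.class))
            .add(role);
    }

    // What the CTRM service would return for the DN sent by the client.
    Map<String, Set<CtrmRole>> rolesFor(String proxyDn) {
        Map<String, Set<CtrmRole>> roles = assignments.get(proxyDn);
        return roles == null
                ? Collections.<String, Set<CtrmRole>>emptyMap()
                : Collections.unmodifiableMap(roles);
    }

    boolean hasRole(String proxyDn, String organization, CtrmRole role) {
        Set<CtrmRole> roles = rolesFor(proxyDn).get(organization);
        return roles != null && roles.contains(role);
    }
}
```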

  17. De-Identification • The caTIES De-Identification service scrubs pathology reports, creates de-identified identifiers, and loads the 'De-Identified' caTIES datastore • The caTIES de-identification service wraps the de-ID™ software, so the underlying scrubber is easy to switch • The Safe Harbor method removes the HIPAA-mandated identifiers • Creates tokens for names and preserves temporal relationships • De-ID will work with adopters as each site comes online • Currently evaluating the open-source Harvard Scrubber as an option
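The slide says names are replaced by tokens while temporal relationships are preserved. One common way to preserve time intervals is to shift every date in a patient's reports by the same number of days. Intervals then survive even though the absolute dates do not.

This sketch rests on several assumptions:

* The names to redact are already known; a real scrubber has to detect them.
* Dates are written as MM/dd/yyyy.
* `ReportScrubber` is a hypothetical class.

It is not how de-ID™ works internally, and it does not cover the full Safe Harbor identifier list.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeParseException;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical scrubber: consistent name tokens plus a fixed per-patient date shift.
class ReportScrubber {
    private static final DateTimeFormatter US_DATE = DateTimeFormatter.ofPattern("MM/dd/yyyy");
    private static final Pattern DATE = Pattern.compile("\\b\\d{2}/\\d{2}/\\d{4}\\b");

    private final long dayOffset;                          // same shift for all of one patient's reports
    private final Map<String, String> nameTokens = new HashMap<>();

    ReportScrubber(long dayOffset) {
        this.dayOffset = dayOffset;
    }

    String scrub(String text, List<String> knownNames) {
        String out = text;
        for (String name : knownNames) {
            String token = nameTokens.get(name);
            if (token == null) {                           // first sighting: assign the next token
                token = "[NAME_" + (nameTokens.size() + 1) + "]";
                nameTokens.put(name, token);
            }
            out = out.replace(name, token);
        }
        Matcher m = DATE.matcher(out);
        StringBuffer shifted = new StringBuffer();
        while (m.find()) {
            String replacement;
            try {
                replacement = LocalDate.parse(m.group(), US_DATE).plusDays(dayOffset).format(US_DATE);
            } catch (DateTimeParseException e) {
                replacement = m.group();                   // unparseable (e.g. month 13): leave as-is
            }
            m.appendReplacement(shifted, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(shifted);
        return shifted.toString();
    }
}
```

For example, with an offset of -10 days, "biopsy 03/15/2006, resection 03/20/2006" becomes "biopsy 03/05/2006, resection 03/10/2006". The 5-day interval between the two procedures is preserved.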

  18. Concept Coding and Recoding • Changing dimensions necessitate recoding: • Vocabulary revisions • Algorithmic enhancements and bug fixes • De-identification redactor errors • What level of auditing is necessary for recoding?
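The slide leaves the auditing question open. One possible approach is to store provenance alongside each coded report: the vocabulary version, coder (pipeline) version and redactor version that produced it. A change in any of the dimensions listed above then identifies exactly which reports need recoding.

This is a sketch under assumptions, not the caTIES design. `CodingProvenance`, `RecodingPlanner` and the version strings are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Objects;

// Hypothetical provenance stored with each concept-coded report.
final class CodingProvenance {
    final String reportId;
    final String vocabularyVersion; // terminology release used for coding
    final String pipelineVersion;   // coder algorithm / bug-fix level
    final String redactorVersion;   // de-identification redactor used

    CodingProvenance(String reportId, String vocabularyVersion,
                     String pipelineVersion, String redactorVersion) {
        this.reportId = reportId;
        this.vocabularyVersion = vocabularyVersion;
        this.pipelineVersion = pipelineVersion;
        this.redactorVersion = redactorVersion;
    }
}

class RecodingPlanner {
    // Report ids whose provenance differs from the current versions in any dimension.
    static List<String> staleReports(List<CodingProvenance> coded,
                                     String vocabulary, String pipeline, String redactor) {
        List<String> stale = new ArrayList<>();
        for (CodingProvenance p : coded) {
            if (!Objects.equals(p.vocabularyVersion, vocabulary)
                    || !Objects.equals(p.pipelineVersion, pipeline)
                    || !Objects.equals(p.redactorVersion, redactor)) {
                stale.add(p.reportId);
            }
        }
        return stale;
    }
}
```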

  19. Tokenization

  20. Sectioning
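Slides 19 and 20 name tokenization and sectioning as the first pipeline steps, but they show only screenshots. As a rough illustration, a report can be split into sections and each section into word tokens. The sketch assumes that section headers appear as an all-capitals label ending in a colon (for example `FINAL DIAGNOSIS:`), a convention that varies between institutions. `ReportSectioner` is a hypothetical class, not the caTIES implementation.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ReportSectioner {
    // Assumed header convention: capitals/spaces followed by ':' and optional text.
    private static final Pattern HEADER = Pattern.compile("^([A-Z][A-Z ]*):\\s*(.*)$");
    // Word tokens; numbers keep an optional decimal part (e.g. "2.5").
    private static final Pattern TOKEN = Pattern.compile("[A-Za-z]+|\\d+(?:\\.\\d+)?");

    // Section name -> section text, in report order.
    static Map<String, String> sections(String report) {
        Map<String, String> out = new LinkedHashMap<>();
        String current = "UNSECTIONED";
        StringBuilder body = new StringBuilder();
        for (String line : report.split("\\r?\\n")) {
            Matcher m = HEADER.matcher(line.trim());
            if (m.matches()) {
                flush(out, current, body);
                current = m.group(1).trim();
                body.setLength(0);
                body.append(m.group(2));
            } else {
                if (body.length() > 0) body.append(' ');
                body.append(line.trim());
            }
        }
        flush(out, current, body);
        return out;
    }

    private static void flush(Map<String, String> out, String name, StringBuilder body) {
        String text = body.toString().trim();
        if (!text.isEmpty()) out.put(name, text);
    }

    static List<String> tokens(String text) {
        List<String> out = new ArrayList<>();
        Matcher m = TOKEN.matcher(text);
        while (m.find()) out.add(m.group());
        return out;
    }
}
```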

  21. Concept Mapping with MMTx

  22. Negation and Semantic Type Categorization
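Slide 22 mentions negation detection without describing the method used. A widely used technique for this task is NegEx-style trigger matching. A concept counts as negated if a cue such as "no" or "negative for" occurs within a few words before it.

The sketch below is a simplified version of that idea, not the caTIES implementation. It assumes:

* a tiny hypothetical trigger list;
* a five-word window;
* that the caller passes a single sentence.

Semantic type categorization, which depends on the MMTx/UMLS output, is not shown.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

class SimpleNegation {
    // Small illustrative trigger list; real NegEx trigger lists are much larger.
    private static final List<String> TRIGGERS = Arrays.asList("no", "not", "without", "negative for");
    private static final int WINDOW = 5; // words to look back from the concept (assumed)

    // True if `concept` occurs in `sentence` with a trigger among the WINDOW preceding words.
    static boolean isNegated(String sentence, String concept) {
        String s = sentence.toLowerCase(Locale.ROOT);
        int at = s.indexOf(concept.toLowerCase(Locale.ROOT));
        if (at < 0) return false;
        String[] before = s.substring(0, at).trim().split("\\s+");
        int start = Math.max(0, before.length - WINDOW);
        // Pad with spaces so triggers match whole words only.
        String window = " " + String.join(" ", Arrays.copyOfRange(before, start, before.length)) + " ";
        for (String t : TRIGGERS) {
            if (window.contains(" " + t + " ")) return true;
        }
        return false;
    }
}
```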

  23. RegEx Finding Attribute Value
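Slide 23 shows regular expressions being used to pull attribute values out of report text. The following is a minimal sketch of the idea for one hypothetical attribute: a tumor size written as a number with a `cm` or `mm` unit, normalized to centimetres. The attribute, the pattern and the class name `TumorSizeExtractor` are illustrative and are not taken from caTIES.

```java
import java.util.OptionalDouble;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class TumorSizeExtractor {
    // Matches e.g. "Tumor size: 2.5 cm" or "tumour size 12 mm" (case-insensitive).
    private static final Pattern SIZE = Pattern.compile(
        "tumou?r\\s+size\\s*:?\\s*(\\d+(?:\\.\\d+)?)\\s*(cm|mm)\\b",
        Pattern.CASE_INSENSITIVE);

    // Returns the first size found, normalized to centimetres, or empty if none.
    static OptionalDouble sizeInCm(String text) {
        Matcher m = SIZE.matcher(text);
        if (!m.find()) return OptionalDouble.empty();
        double value = Double.parseDouble(m.group(1));
        return OptionalDouble.of(m.group(2).equalsIgnoreCase("mm") ? value / 10.0 : value);
    }
}
```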

  24. Concept Coded Structured Data

  25. Data stewardship, data sharing and honest brokering • caTIES maintains data in three databases that are schematically equivalent but differ in their deployment location, security configuration, and the data they hold. Each role has access to only a subset of these data sources: • public datastore – (Researcher) • private datastore – (Honest Broker) • central tissue resource manager datastore – (Administrator, Researcher, Honest Broker)
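The role-to-datastore mapping above can be written as a small access matrix. The sketch encodes only the mapping listed on the slide. For example, the slide does not say whether an Honest Broker can also read the public datastore, so this sketch does not grant that access. The enum and class names are hypothetical.

```java
import java.util.EnumMap;
import java.util.EnumSet;
import java.util.Map;
import java.util.Set;

enum Datastore { PUBLIC, PRIVATE, CTRM }
enum StewardRole { RESEARCHER, HONEST_BROKER, ADMINISTRATOR }

class DatastoreAccess {
    private static final Map<StewardRole, Set<Datastore>> ALLOWED = new EnumMap<>(StewardRole.class);

    static {
        // Mapping exactly as listed on the slide.
        ALLOWED.put(StewardRole.RESEARCHER, EnumSet.of(Datastore.PUBLIC, Datastore.CTRM));
        ALLOWED.put(StewardRole.HONEST_BROKER, EnumSet.of(Datastore.PRIVATE, Datastore.CTRM));
        ALLOWED.put(StewardRole.ADMINISTRATOR, EnumSet.of(Datastore.CTRM));
    }

    static boolean canAccess(StewardRole role, Datastore store) {
        return ALLOWED.get(role).contains(store);
    }
}
```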

  26. caTIES Model Three points for Data Access:

  27. Interoperability within a grid community • MDA - caBIG uses Model Driven Architecture (MDA) to automatically generate Object Relational Mapping (ORM) middleware. • Following caBIG's semi-automated guidelines for application development guarantees grid-compliant data services. • caBIG annotates data and service interfaces with a conceptual ontology, which enables intelligent discovery and automatic data transformation.

  28. caTIES Development Process • Design UML Model in Enterprise Architect • Metadata annotation using NCIT (public model only) • CDEs are registered in the caDSR in the 'caBIG' context • Run Model through caCORE SDK to generate API and caTIES Silver Application • Implement the API generated by the SDK for the caTIES client's functions • Utilize caGrid SDK to generate Gold front-end to the caTIES Silver Application

  29. caTIES Phase 2 Grid-Enabled [Public] Model

  30. Development Process Summary

  31. Access to caTIES Public Resources • Dual access to caTIES: • Via the caTIES Client • Via the caGrid Gold API. The caTIES Gold Service provides programmatic access to caTIES' resources; the caGrid Browser uses this API to query resources.

  32. Sample Query
  Silver Format
    import java.util.List;
    import org.hibernate.criterion.DetachedCriteria;
    import org.hibernate.criterion.Restrictions;

    // appService: ApplicationService from the caCORE SDK-generated API (setup not shown).
    DetachedCriteria p = DetachedCriteria.forClass(PathologyReport.class);
    p.add(Restrictions.like("uuid", "e44ddc0f-c589-11da-bbee-5103a71c2a47")); // no wildcards, so an exact match
    List resultList = appService.query(p, PathologyReport.class.getName());
    for (int i = 0; i < resultList.size(); i++) {
        PathologyReport pr = (PathologyReport) resultList.get(i);
        String documentText = pr.getDocumentText(); // free text of the matched report
    }
  Gold Format
    <caBIGXMLQuery name="MyQueryTest3">
      <Target name="edu.upmc.opi.caBIG.caTIES.database.domain.PathologyReport">
        <Objects name="edu.upmc.opi.caBIG.caTIES.database.domain.PathologyReport">
          <Property name="uuid" predicate="equal" value="e44ddc0f-c589-11da-bbee-5103a71c2a47"/>
        </Objects>
      </Target>
    </caBIGXMLQuery>
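The Gold query on this slide is plain XML, so it can be generated rather than written by hand. The following is a minimal sketch that relies only on the element and attribute names visible in the example: `caBIGXMLQuery`, `Target`, `Objects`, and `Property` with `predicate="equal"`. `GoldQueryBuilder` is a hypothetical helper, not part of the caGrid API. Other predicates and nested associations are not covered.

```java
import java.util.Map;

class GoldQueryBuilder {
    // Minimal escaping for XML attribute values.
    private static String esc(String s) {
        return s.replace("&", "&amp;").replace("\"", "&quot;")
                .replace("<", "&lt;").replace(">", "&gt;");
    }

    // Builds a caBIGXMLQuery with equality predicates on a single target class.
    static String build(String queryName, String targetClass, Map<String, String> equalsPredicates) {
        StringBuilder xml = new StringBuilder();
        xml.append("<caBIGXMLQuery name=\"").append(esc(queryName)).append("\">\n");
        xml.append("  <Target name=\"").append(esc(targetClass)).append("\">\n");
        xml.append("    <Objects name=\"").append(esc(targetClass)).append("\">\n");
        for (Map.Entry<String, String> e : equalsPredicates.entrySet()) {
            xml.append("      <Property name=\"").append(esc(e.getKey()))
               .append("\" predicate=\"equal\" value=\"").append(esc(e.getValue())).append("\"/>\n");
        }
        xml.append("    </Objects>\n");
        xml.append("  </Target>\n");
        xml.append("</caBIGXMLQuery>\n");
        return xml.toString();
    }
}
```

Calling `build("MyQueryTest3", "edu.upmc.opi.caBIG.caTIES.database.domain.PathologyReport", props)`, where `props` maps `uuid` to the report's UUID, reproduces the Gold query shown on the slide.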

  33. Query run by caTIES Client

  34. Query run through caGrid Browser

  35. Query run through caGrid Browser

  36. Query run through caGrid Browser

  37. Equivalent Results • Both methods (caGrid Browser and caTIES Client) return the same Pathology Report

  38. caDSR CDEs CAP Protocols

  39. Shallow Structure Derivation based on conceptual matching.

  40. Deep Structure Inference Based on Discourse Reasoning
