Challenges In Transforming Observational Data
Challenges In Transforming Observational Data For Analysis, OR How To Call Into Question Your Observational Data Without Even Trying. Don Griffin, Health Informatics Technology Director, Computer Sciences Corporation. May 20, 2009.



Challenges In Transforming Observational Data For Analysis

OR

How To Call Into Question Your Observational Data Without Even Trying

Don Griffin

Health Informatics Technology Director

Computer Sciences Corporation

May 20, 2009


Objectives

Lofty Objective:

Present a complete Health Informatics solution:

  • that is flexible enough to accommodate all of the types of source data that end users will require—even if they do not know what those data will be—and

  • that is rich enough in functionality to support all of the data transformations and manipulations that end users will require to convert those source data into research-oriented knowledge on which they may confidently rely.

    More Practical Objective:

    Leave those in the audience with an appreciation for the things that must be done ahead of time to make multifarious, disparate observational source data sets useful for analysis.


Definitions

Observational Data

  • “... the outcomes of acts of measurement using particular protocols within the context of any objective scientific measurement activity.”

  • “… the basic or atomic notion of an observation represents:

    • the outcome of some measurement taken of a defined attribute or characteristic of some ‘entity’ (e.g., an organism ‘in the field,’ a specimen, a sample, an experimental treatment, etc.),

    • within some context (possibly given by other observations).”

  • “Every observation entails the measurement of one or more properties of some real-world entity or phenomenon.”

    Biodiversity Information Standards – TDWG

    For Our Purposes:

  • we are most interested in observational data on drug exposures and medical conditions (but other data may interest us, too), and

  • chief sources will be Medical Claims and Electronic Health Records (EHRs).


Definitions

Data Transformation

  • “... the operation of changing (as by rotation or mapping) one configuration or expression into another in accordance with a mathematical rule; especially: a change of variables or coordinates in which a function of new variables or coordinates is substituted for each original variable or coordinate…”

  • “… an operation that converts (as by insertion, deletion, or permutation) one grammatical string (as a sentence) into another…”

    Merriam-Webster’s Dictionary

  • One of the three pillars of data governance (along with compliance and integration). “… transformation is a goal unto itself, as well as an enabler for the goals of compliance and integration.”

    The Data Warehousing Institute

  • For Our Purposes:

    • we are most interested in reformatting data into a Common Data Model that allows portability of analysis methods across disparate source data sets, and

    • in standardizing data representations to make analysis results from disparate source data sets readily comparable.
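The two purposes above (reformatting into a Common Data Model and standardizing representations) can be sketched in a few lines. This is only an illustration under stated assumptions: the source names, the field names, the `DrugExposure` record, and the code values in `STANDARD_DRUG_CODE` are all hypothetical and are not taken from the presentation or from any real vocabulary.

```python
from dataclasses import dataclass
from datetime import date

# Hypothetical lookup from (source, source-specific drug code) to one
# standard code. Real mappings would come from controlled vocabularies.
STANDARD_DRUG_CODE = {
    ("claims_a", "RX-0042"): "STD-1001",
    ("ehr_b", "med:aspirin-81"): "STD-1001",
}


@dataclass(frozen=True)
class DrugExposure:
    """One row of a hypothetical common data model table."""
    person_id: int
    standard_code: str   # standardized representation, comparable across sources
    start_date: date
    source: str          # provenance of the row
    source_code: str     # native value, preserved alongside the standard one


def to_common_model(source: str, record: dict) -> DrugExposure:
    """Reformat one flat source record into the common layout."""
    return DrugExposure(
        person_id=int(record["person"]),
        standard_code=STANDARD_DRUG_CODE[(source, record["drug"])],
        start_date=date.fromisoformat(record["date"]),
        source=source,
        source_code=record["drug"],
    )


claims_row = {"person": "17", "drug": "RX-0042", "date": "2008-03-01"}
ehr_row = {"person": "17", "drug": "med:aspirin-81", "date": "2008-04-15"}
exposures = [
    to_common_model("claims_a", claims_row),
    to_common_model("ehr_b", ehr_row),
]
```

Once both rows share `standard_code`, the same analysis query can run against either source, while `source_code` keeps the native value available for audit.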


    Transforming Observational Data

    Again, for our purposes, the process is rather simple. Doing it correctly, however, presents some challenges.



    The IT View of the End User’s Goal

    Skillful use of Common Data Model content to communicate “complex ideas… with clarity, precision, and efficiency” (and, ideally, unimpeachability)

    • Show the data

    • Induce the viewer to think about the substance rather than about methodology, graphic design, the technology of graphic production, or something else

    • Avoid distorting what the data have to say

    • Present many numbers in a small space

    • Make large data sets coherent

    • Encourage the eye to compare different pieces of data

    • Reveal the data at several levels of detail, from a broad overview to the fine structure

    • Serve a reasonably clear purpose: description, exploration, tabulation, or decoration

    • Be closely integrated with the statistical and verbal descriptions of a data set

    Edward Tufte, The Visual Display of Quantitative Information


    The IT View of IT’s Goals

    Provide services necessary to populate the Common Data Model

    • Data Architecture

    • Data Collection

    • Data Extraction, Transformation, and Loading (ETL)

    • Data Management

      Help (or do not hinder) end users in pursuit of their own goals

    • Preserve the data (i.e., their native values, formats, etc.)

    • Avoid distorting the data

    • Maintain data detail

      Foster the widespread understanding of the data

    • What the data are and are not

    • What the data can and cannot do


    IT Issues/Challenges

    [Diagram: Source and Target (CDM). Labeled areas: Data Collection, ETL, Data Management, Data Architecture, and Data Understanding. Labeled issue types: Technical, Design, and Philosophical.]


    IT Issues/Challenges

    Data Collection

    • Batch vs. Stream

    • Reception and Profiling

    • Verification to Specification

    • Culling and Cleansing

    • Staging
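As a rough illustration of two of the steps listed above, profiling and verification to specification, the following sketch counts missing and distinct values per column and then checks each value against a rule. The sample file, the column names, and the rules in `SPEC` are hypothetical; a real specification would come from the data provider's documentation.

```python
import csv
import io

# Hypothetical delimited extract with a header row.
SAMPLE = """person_id,sex,birth_year
1,F,1950
2,M,1962
3,,1975
4,X,19x1
"""

# Hypothetical specification: column name -> predicate a valid value satisfies.
SPEC = {
    "person_id": lambda v: v.isdigit(),
    "sex": lambda v: v in {"F", "M"},
    "birth_year": lambda v: v.isdigit() and 1850 <= int(v) <= 2009,
}


def profile(rows, columns):
    """Describe what was received: missing and distinct counts per column."""
    stats = {}
    for col in columns:
        values = [row[col] for row in rows]
        stats[col] = {
            "missing": sum(1 for v in values if v == ""),
            "distinct": len(set(values)),
        }
    return stats


def verify(rows, spec):
    """Return (file line number, column, value) for every rule violation."""
    failures = []
    for line_no, row in enumerate(rows, start=2):  # the header is line 1
        for col, is_valid in spec.items():
            if not is_valid(row[col]):
                failures.append((line_no, col, row[col]))
    return failures


rows = list(csv.DictReader(io.StringIO(SAMPLE)))
report = profile(rows, list(SPEC))
failures = verify(rows, SPEC)
```

Profiling only describes the data; verification compares it with what was promised. Keeping the two separate lets the profile be reviewed even when no specification exists yet.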


    Profiling and Verification to Specification

    [Figures: five image-only slides, three titled "Profiling" and two titled "Verification to Specification".]

    IT Issues/Challenges

    Data Management

    • Inventory and Tracking

    • Privacy, Security, and Compliance

    • Master/Reference Data Management

    • Logging and Auditing


    Privacy

    Protected Health Information

    • Any information (not just textual data) in the medical record or designated data set that can be used to identify an individual, and

    • That was created, used, or disclosed in the course of providing a health care service (e.g., diagnosis, treatment, etc.)

      HIPAA regulations allow researchers to access and use PHI when necessary to conduct research. However, HIPAA only affects research that uses, creates, or discloses PHI that will be entered into the medical record or that will be used for the provision of health care services (e.g., treatment).

    • Research studies involving review of existing medical records for research information, such as retrospective chart review, are subject to HIPAA regulations.

    • Research studies that enter new PHI into the medical record (e.g., because the research includes rendering a health care service, such as diagnosing a health condition or prescribing a new drug or device for treating a health condition) are also subject to HIPAA regulations.

    • If in doubt, stay away from the 18 “identifiers.”


    Privacy

    18 Identifiers

    1. Names;

    2. All geographical subdivisions smaller than a State, including street address, city, county, precinct, zip code, and their equivalent geocodes, except for the initial three digits of a zip code, if according to the current publicly available data from the Bureau of the Census: (1) The geographic unit formed by combining all zip codes with the same three initial digits contains more than 20,000 people; and (2) The initial three digits of a zip code for all such geographic units containing 20,000 or fewer people is changed to 000.

    3. All elements of dates (except year) for dates directly related to an individual, including birth date, admission date, discharge date, date of death; and all ages over 89 and all elements of dates (including year) indicative of such age, except that such ages and elements may be aggregated into a single category of age 90 or older;

    4. Phone numbers;

    5. Fax numbers;

    6. Electronic mail addresses;

    7. Social Security numbers;


    Privacy

    18 Identifiers

    8. Medical record numbers;

    9. Health plan beneficiary numbers;

    10. Account numbers;

    11. Certificate/license numbers;

    12. Vehicle identifiers and serial numbers, including license plate numbers;

    13. Device identifiers and serial numbers;

    14. Web Universal Resource Locators (URLs);

    15. Internet Protocol (IP) address numbers;

    16. Biometric identifiers, including finger and voice prints;

    17. Full face photographic images and any comparable images; and

    18. Any other unique identifying number, characteristic, or code (note this does not mean the unique code assigned by the investigator to code the data)
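Items 2 and 3 of the list are the ones that call for generalizing values rather than simply dropping a field. The sketch below shows one way that generalization might look. It covers only those two items, and the contents of `SMALL_ZIP3`, the field names, and the `generalize_record` helper are hypothetical; a real list of low-population ZIP prefixes has to come from current Census Bureau data.

```python
from datetime import date

# Hypothetical set of three-digit ZIP prefixes whose combined population is
# 20,000 or fewer. Placeholder values only, not an authoritative list.
SMALL_ZIP3 = {"036", "059"}


def generalize_zip(zip_code: str) -> str:
    """Keep the first three digits, or 000 for low-population prefixes."""
    prefix = zip_code[:3]
    return "000" if prefix in SMALL_ZIP3 else prefix


def age_on(birth_date: date, as_of: date) -> int:
    """Age in completed years on the as_of date."""
    had_birthday = (as_of.month, as_of.day) >= (birth_date.month, birth_date.day)
    return as_of.year - birth_date.year - (0 if had_birthday else 1)


def generalize_record(zip_code: str, birth_date: date, as_of: date) -> dict:
    """Reduce ZIP code and birth date to the generalized forms in items 2 and 3."""
    age = age_on(birth_date, as_of)
    over_89 = age > 89
    return {
        "zip3": generalize_zip(zip_code),
        # A birth year would itself reveal an age over 89, so it is withheld.
        "birth_year": None if over_89 else birth_date.year,
        "age": "90+" if over_89 else str(age),
    }


older = generalize_record("03601", date(1915, 6, 1), date(2009, 5, 20))
younger = generalize_record("94720", date(1960, 1, 2), date(2009, 5, 20))
```

The sketch does nothing about the other sixteen identifiers, which still have to be handled separately.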


    Privacy

    De-identification is a possible solution. However, additional standards and criteria apply.

    • Any code used to replace the identifiers in datasets cannot be derived from any information related to the individual and the master codes, nor can the method to derive the codes be disclosed. For example, a subject's initials cannot be used to code his data because the initials are derived from his name.

    • The researcher must not have actual knowledge that the subject could be re-identified from the remaining identifiers in the PHI used in the research study. That is, the information would still be considered identifiable if there were a way to identify the individual even though all of the 18 identifiers were removed.
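The first criterion above rules out codes derived from the subject's own information, such as initials or a hash of the medical record number. One way to satisfy it, shown here only as a sketch, is to draw each code at random and keep the crosswalk apart from the research data set. The function name and the `MRN-...` keys are hypothetical.

```python
import secrets


def assign_study_codes(person_keys):
    """Map each source identifier to a random code unrelated to the person.

    The codes come from a random source, not from names, dates, or record
    numbers, so nothing about the individual can be recovered from them.
    """
    codes = {}
    used = set()
    for key in person_keys:
        if key in codes:
            continue  # the same person keeps the same code
        code = secrets.token_hex(8)
        while code in used:  # regenerate in the unlikely event of a collision
            code = secrets.token_hex(8)
        used.add(code)
        codes[key] = code
    return codes


# The crosswalk is the "master code" list: it must be stored and controlled
# separately from the de-identified data.
crosswalk = assign_study_codes(["MRN-001", "MRN-002", "MRN-003", "MRN-001"])
```

Random codes address only this one criterion. Whether the remaining fields could still re-identify someone is a separate question that the code cannot answer.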


    Privacy

    The following is NOT considered PHI, and therefore is not subject to HIPAA regulations.

    • Health information absent the 18 identifiers.

    • Data that would ordinarily be considered PHI, but which are not associated with or derived from a healthcare service event (treatment, payment, operations, medical records), not entered into the medical record, and not disclosed to the subject. Research health information that is kept only in the researcher’s records is not subject to HIPAA, but is regulated by other human subjects protection regulations.

      Examples of research health information not subject to HIPAA include such studies as the use of aggregate data, diagnostic tests that do not go into the medical record because they are part of a basic research study and the results will not be disclosed to the subject, and testing done without the PHI identifiers.

    • Some genetic basic research can fall into this category such as the search for potential genetic markers, promoter control elements, and other exploratory genetic research.

    • In contrast, genetic testing for a known disease that is considered to be part of diagnosis, treatment and health care would be considered to use PHI and therefore subject to HIPAA regulations.

    University of California, Berkeley

    Committee for Protection of Human Subjects


    IT Issues/Challenges

    Data Extraction

    • Form (e.g., ASCII vs. EBCDIC)

    • Format (e.g., delimited, fixed-length, ragged right, etc.)
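To make the form and format distinction concrete, the sketch below reads the same fixed-length record once from EBCDIC bytes and once from ASCII bytes. The record layout and field names are hypothetical, and `cp037` is only one of several EBCDIC code pages; which one a given source uses is something to confirm with the data provider. Packed-decimal fields, common in mainframe extracts, are not handled here.

```python
# Hypothetical fixed-length layout: (field name, start offset, length).
LAYOUT = [("person_id", 0, 6), ("drug_code", 6, 8), ("fill_date", 14, 8)]
RECORD_LENGTH = 22


def parse_fixed(data: bytes, encoding: str):
    """Split a byte stream into fixed-length records and decode each field."""
    records = []
    for offset in range(0, len(data), RECORD_LENGTH):
        chunk = data[offset:offset + RECORD_LENGTH]
        if len(chunk) < RECORD_LENGTH:
            raise ValueError(f"short record at byte {offset}")
        # Both encodings used here are single-byte, so byte offsets and
        # character offsets coincide after decoding.
        text = chunk.decode(encoding)
        records.append(
            {name: text[start:start + length].strip()
             for name, start, length in LAYOUT}
        )
    return records


line = "000017RX-0042 20080301"
ebcdic_bytes = line.encode("cp037")   # form: EBCDIC
ascii_bytes = line.encode("ascii")    # form: ASCII

from_ebcdic = parse_fixed(ebcdic_bytes, "cp037")
from_ascii = parse_fixed(ascii_bytes, "ascii")
```

The bytes differ between the two forms, but the parsed records are identical, which is the point of settling the form before interpreting the format.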

      Data Transformation

    • Reformatting (usually from flat to relational)

    • Probabilistic Matching

    • Augmentation (excluding Standardization)

    • Master <fill in the blank> Indexing

    • Standardization

      Data Loading
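Probabilistic matching, listed above under Data Transformation, is commonly done by scoring field-by-field agreement between two records. The sketch below uses log-likelihood weights in the style of the Fellegi-Sunter record linkage model; the presentation does not say which method was intended, so treat this as one possible approach. The field names and the m and u probabilities are hypothetical values chosen for illustration.

```python
import math

# Hypothetical per-field probabilities:
#   m = P(field agrees | records belong to the same person)
#   u = P(field agrees | records belong to different people)
FIELD_WEIGHTS = {
    "last_name": (0.95, 0.01),
    "birth_year": (0.98, 0.02),
    "zip3": (0.90, 0.05),
}


def match_score(a: dict, b: dict) -> float:
    """Sum of log2 likelihood ratios over the compared fields."""
    score = 0.0
    for field, (m, u) in FIELD_WEIGHTS.items():
        if a[field] == b[field]:
            score += math.log2(m / u)              # agreement adds evidence
        else:
            score += math.log2((1 - m) / (1 - u))  # disagreement subtracts it
    return score


claim = {"last_name": "RIVERA", "birth_year": "1950", "zip3": "947"}
ehr_same = {"last_name": "RIVERA", "birth_year": "1950", "zip3": "946"}
ehr_other = {"last_name": "CHEN", "birth_year": "1971", "zip3": "100"}

likely = match_score(claim, ehr_same)
unlikely = match_score(claim, ehr_other)
```

In practice the score would be compared against two thresholds, one above which records are linked and one below which they are not, with the range in between sent for review. Those thresholds are a policy choice and are not shown.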


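    Augmentation

    [Figure: two Person Timeline panels. In the first, Drug A exposures A1 to A4 and Drug B exposures B1 and B2 are shown with persistence windows and grouped into DrugEra1, DrugEra2, and DrugEra3. In the second, Condition A occurrences A1 to A4 and Condition B occurrences B1 and B2 are shown with a persistence window and grouped into ConditionEra1, ConditionEra2, and ConditionEra3.]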
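The era construction shown on the Augmentation timelines can be sketched as an interval merge: consecutive exposures to the same drug join one era when the gap between them is no longer than the persistence window. The 30-day window, the function name, and the sample dates are assumptions made for illustration; the slide does not state the window length or the exact merge rule.

```python
from datetime import date, timedelta


def build_eras(exposures, persistence_days=30):
    """Merge (start, end) exposure intervals for one person and one drug.

    An exposure extends the current era when it starts no later than
    persistence_days after that era's end; otherwise it opens a new era.
    """
    window = timedelta(days=persistence_days)
    eras = []
    for start, end in sorted(exposures):
        if eras and start <= eras[-1][1] + window:
            era_start, era_end = eras[-1]
            eras[-1] = (era_start, max(era_end, end))
        else:
            eras.append((start, end))
    return eras


drug_a = [
    (date(2008, 1, 1), date(2008, 1, 30)),
    (date(2008, 2, 10), date(2008, 3, 10)),  # 11-day gap: same era
    (date(2008, 3, 20), date(2008, 4, 18)),  # 10-day gap: same era
    (date(2008, 8, 1), date(2008, 8, 30)),   # gap longer than 30 days: new era
]
eras = build_eras(drug_a)
```

The same function applies to condition occurrences. The choice of window changes how many eras result, so it belongs in the documentation delivered with the transformed data.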


    Standardization


    IT Issues/Challenges

    Data Architecture

    • Common Data Model Design Paradigms

    • “All models are wrong, but some are useful” George Box, Statistician

    • Flexibility vs. Intuitiveness “Compromise”


    OMOP Common Data Model (conceptual)


    OMOP Common Data Model (logical)


    Solution Framework

    CORE BUSINESS INTELLIGENCE SERVICES

    • Statistical Analysis and Validation

    • Reports/Dashboards

    • Process Models

    • OLAP, ROLAP, MOLAP, HOLAP

    • Business Rules/Predictive Models

    • Queries

    • Optimization

    FOUNDATIONAL DATA SERVICES

    • Data Architecture: Database Management System; Data Models; Metadata

    • Data Collection: Reception and Profiling; Verification to Specification; Culling and Cleansing; Staging for Integration

    • Data Integration: Probabilistic Matching; Augmentation; Master Person Indexing; Controlled Medical Vocabularies

    • Data Management: Inventory and Tracking; Privacy, Security, and Compliance; Master/Reference Data Maintenance; Logging and Auditing

    SUPPORTING SERVICES

    • Business Integration Services

    • Presentation and Portal Services

    • Systems Management Services


    Solution Context

    OVERALL SOLUTION STEWARDSHIP

    • Strategy

    • Process Intelligence

    • Governance

    LIFE SCIENCES SOLUTIONS

    • Scientific Applications

    • Operational Reporting

    • Marketing

    • Study Recruitment

    • Drug Safety Monitoring

    • Site Management

    • Clinical Data Management

    • Market Intelligence

    • Exploratory Data Analysis

    • Study Management

    • Protocol Feasibility

    • Health Outcomes & Economics

    • Licensing Intelligence

    • Drug Safety Management

    • Executive Dashboards

    • Closed Loop Marketing

    [Diagram: these two layers are shown together with the Core Business Intelligence Services, Foundational Data Services, and Supporting Services from the Solution Framework slide.]


    Thank You

    Don Griffin ([email protected])

    Health Informatics Technology Director

    Computer Sciences Corporation

    May 20, 2009

