Aggregating Data From Your Contract PartnersData Warehousing & Business IntelligenceMass Biotech CouncilJuly 11, 2008 John Postle Vice President Enterprise Life Sciences
Highlights • Life After Data Migration • Concepts of Data Warehousing • Architecting the Data Warehouse • Components to Successful Data Warehousing • Turning Data into Information • Business Intelligence Tools
CDMS Safety Migration & ETL Process Data Warehouse or Datamart EDC Analytics Preclin Life After Data Migration Data
Master Meta Data Repository CDMS Safety Homogenous Data Migration & ETL Process EDC Preclin Life After Data Migration KEY OUTPUTS OF ETL Single Source of The TRUTH
Concepts of Data Warehousing • A data warehouse is a repository of an organization's electronically stored data. Data warehouses are designed to facilitate reporting and analysis(1) • The data warehouse provides a common view of data regardless of the data’s source • Provides decision makers with intuitive access to a single, clean, consistent source of information • Replaces fragmented decision-making process with a single integrated process • Integrates and consolidates information across different contract partners (CRO’s , CMO’s, etc.) • All Data is retained, no data is overwritten. (1) Inmon, W.H. Tech Topic: What is a Data Warehouse? Prism Solutions. Volume 1. 1995
Architecting the Data Warehouse • Data Warehouse vs. Transaction Systems • Transaction Systems are designed to process a lot of individual transactions one at a time at a very high rate of speed. • Data Warehouses are built to provide access to large amounts of information across the database in a single query.
Architecting the Data Warehouse • Dimension and Fact tables- • Fact Tables contain the core transactional information (number, units, metrics, etc.) • Dimension Tables provide the contextual information that breathes life into the “Fact” data.
Architecting the Data Warehouse • Top Down vs Bottom Up Approach for designing the Data Warehouse • Top Down – Bill Inmon one of the pioneers of Data Warehouse was in favor of the “Top Down” approach. This approach can be considered the “Over Engineered Approach” requiring for the designing of all components prior to doing anything. • Bottom Up – Ralph Kimball who is considered the father of Data Warehouse is an advocate for the “Bottom Up” approach. In this approach data marts (mini data warehouses) are created to support individual business processes and then linked together. • Top Down is very costly and time consuming while Bottoms Up can start delivering a return in a short amount of time.
Architecting the Data Warehouse • Design Terminology • Star Schema vs. Snowflake Schema • Star Schema consists of a “Fact” table containing the numeric information and multiple “Dimension” tables with all of the identifying information • Snowflake Schema is closely related to star schema and consist of the same “Fact” table but the difference is that there are multiple layers of dimension tables.
Application Packages ODS Warehouse Admin. Tools OLTP Tools ETL Tool Safety CDMS Central Data Warehouse Mid- Tier Local Metadata Extract, Transform and Load EDC Data Mart Mid- Tier PreClin Central Local Metadata RDBMS Metadata Exchange Metadata Data Modeling Tool Data Cleansing Tool ERP MOLAP Local Metadata Data Mart RDBMS Source Databases Data Extraction, Transformation, Load Data Access and Analysis Architecting the Data Warehouse Central Data Warehouse Architected Data Marts The Final Picture
Components to a Successful Data Warehouse • Business Driven – ensure that the data warehousing application is driven by a REAL business need, not technology • Know your Data – Identifying all interrelationships and common data definitions across heterogeneous systems • NO DIRTY DATA – “Dirty Data” is source data that contains missing, inconsistent, or erroneous data • Single Source of the TRUTH – Do not alter, update, correct or change data in the data warehouse. All changes in data occur in the source transaction system. • Central Metadata – have only one instance of Meta Data Across the entire data warehouse and associated data marts. • Data Transformation Tool – Select the correct data transformation tool (ETL) and only select ONE
Components to a Successful Data Warehouse • Source Data at the Atomic Level- capture source data at the lowest level of granularity • Store Summarized Data in Data Marts – to expedite end user data queries store summarized and aggregated data into multiple data marts. • Obtain Early Wins – to continue to gain support deliver some early wins • Easy End User Tools – make sure the end user access tools are intuitive and easy to use.
Components to a Successful Data Warehouse • Hardware Rules – Do not underestimate the amount of computing power needed for a data warehouse. • Scalability and Performance – address in the design stage how scalability will be handled (Data & Users) as well as performance benchmarks. • Outsource Your Data Warehouse – Small and Medium size organizations find it cost effective to outsource data warehouse development and ongoing maintenance.
Turning Data Into Information • Data vs Information • Data is just RAW data elements from the source systems that gives no insight into the meanings of the data. Where As • Information is the packaging of the data elements from the source systems into a framework that can answer questions as well as generate new ones that can be answered.
Turning Data Into Information • Presentation Layer – In a Data Warehouse this information is provided to end user via the “Presentation Layer” • Dashboards & Portals – pre-tailored screens that present KEY metrics • “Push Reporting” – pre-configured reports that are distributed automatically across the organization • “Pull Reporting” – Intuitive end user access tools that allow them to access their own questions and “Drill Down to the base data” • Slicing, Dicing and Offline – the capability for users to slice and dice data, create new views of the information and even take it offline to play with.
Business Intelligence • Business Intelligence (BI) – • refers to technologies, applications and practices for the collection, integration, analysis, and presentation of business information to support better business decision making • OLAP, MOLAP, ROLAP – These are all variations on the same theme to support BI • OLAP or MOLAP (Online Analytical Processing) – is the compilation of the data into Multi Dimensional Cube that enables results to complex queries to be returned in less than 1% of the time of a normal Query • ROLAP (Relational Online Analytical Processing) – does not use cubes but indexes the database directly to expedite the return of data queries. Slower but more Flexible
Business Intelligence • Goal of BI – • Easy Access for all to information needed to make business decisions • Most users of BI are Passive (Dashboards / Reports) • Power User need to understand the data elements and how to use the tools to venture down new paths • Goal is only being met by a select few • Data Mining – Process for sorting through large amounts of data and picking out relevant information to a question or to other data. • Biostatistics and Clinical Trials – areas where Data Mining is beneficial and cost effective
Business Intelligence BI Players • The list of BI Players are extensive but with much Consolidation of late • Hyperion Solutions (Essbase) – acquired by Oracle • Cognos & Applix – acquired by IBM • Business Objects – acquired by SAP • Microsoft • MicroStrategy
Business Intelligence What is the Future of BI • BI 2.0 – refers to a new generation of BI software that is • more pro-active than reactive • search and querying abilities based on the relationship among objects (Semantic Data Model) • Google on Steroids
Questions?? Thank you… John Postle Vice President Enterprise Life Science email: Postle@courtsquaregroup.comwebsite: www.courtsquaregroup.com