Lecture 5
This presentation is the property of its rightful owner.
Sponsored Links
1 / 25

Lecture 5 PowerPoint PPT Presentation


  • 93 Views
  • Uploaded on
  • Presentation posted in: General

Lecture 5. Themes in this session Building and managing the data warehouse Data extraction and transformation Technical issues. Building the data warehouse. Basic guidelines for the creation of a DW. Create corporate sponsors and plan thoroughly

Download Presentation

Lecture 5

An Image/Link below is provided (as is) to download presentation

Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author.While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server.


- - - - - - - - - - - - - - - - - - - - - - - - - - E N D - - - - - - - - - - - - - - - - - - - - - - - - - -

Presentation Transcript


Lecture 5

Lecture 5

Themes in this session

  • Building and managing the data warehouse

  • Data extraction and transformation

  • Technical issues


Building the data warehouse

Building the data warehouse


Basic guidelines for the creation of a dw

Basic guidelines for the creation of a DW

  • Create corporate sponsors and plan thoroughly

  • Determine a scalable architectural framework for the DW

  • Identify and document all assumptions, conflicts and issues at the start of the project

  • Choose methodology and tools which are compatible with the organisation

  • Take the continuous nature of the DW life cycle into account

  • Ensure a thorough data analysis and resolve all data conflicts

  • Learn from your experience and above all, learn from your mistakes


The data warehouse life cycle

The data warehouse life cycle

Investigation

On-going data

administration

Analysis of current

environment

Implementation

Identify requirements

Development

Identify architecture

Data warehouse design


Creating a strategy for a data warehousing

Creating a strategy for a data warehousing

  • Identification of current information strategy (usually an implicit strategy)

  • Internal appraisal of the strategy

  • External appraisal of the strategy

  • Perform a strategic gap analysis

  • Enumerate strategic alternatives which can be achieved through the application of a DW

  • Choose among alternatives on the basis of the organisations mission, objectives and resources

  • Formulate a strategic statement which specifies the organisation’s utilisation of its information resources and the role the DW is to play in this utilisation


Building a business case for a data warehouse

Building a business case for a data warehouse

  • Describe the AS-IS situation

    • Identify business goals that the DW will help to fulfil

    • Identify business problems that the DW can be used to solve

  • Describe the TO-BE situation

  • Explain how the data warehouse will be used to evolve the organisation from the AS-IS to the TO-BE situation

  • Calculate the expected ROI for the data warehouse

    • identify all costs associated with the data warehouse

    • quantify all the benefits between the AS-IS and TO-BE situation


  • Creating a project plan

    Creating a project plan

    • What is the project scope?

      • determine the scope of the data

      • determine the role of technology

      • determine any temporal considerations

  • What is the business reason?

    • identify the essential purpose of the project

    • identify business drivers

  • What are the critical success factors for the project?

    • identify critical objectives

    • identify critical tasks and activities

  • What are the resource constraints on the project?

    • Determine the need for resources

    • determine the resource availability


  • Output from the planning phase

    Output from the planning phase

    • A set of activities to be performed and a set of requirements on these activities (performance metrics)

    • Documented business drivers

    • Definition of scope of data

    • Defined temporal scope

    • Business reasons

    • The overall approach

    • Participants and their roles

    • Assumptions and constraints

    • Project management strategy


    Data warehouse development activities

    Data warehouse development activities

    Architecture

    definition

    Decision maker

    needs

    Source system

    analysis

    Transform

    design

    Physical

    database

    design

    Warehouse

    development

    Data

    modelling

    End-user

    access

    development

    End-user

    access

    design

    Subject area

    analysis

    Planning and

    project initiation

    Warehouse

    populate and

    implement

    End-user

    access

    definition


    Project delivery tactics

    Project delivery tactics

    • First things first

    • Market the project

    • Adopt a customer-focused orientation

    • Deliver everything well

      Note: the project must always be able to show progress and have the ability to deliver business value


    Focus areas for the management of a data warehouse

    Focus areas for the management of a data warehouse

    • Monitor and manage data warehouse activity

    • Monitor and manage data warehouse data

    • Monitor and manage security in the data warehouse

    • Monitor and manage the data warehouse data model

    • Monitoring and managing data warehouse metadata

    • Monitoring and managing the integration and transformation interface

    • Monitoring and managing the demands of the data warehouse’s business environment


    Staffing requirements for initial data warehouse development and subsequent dw management

    Staffing requirements for initial data warehouse development and subsequent DW management

    • Data warehouse management and maintenance

      • Data warehouse administrator

      • Data warehouse organisational change manager

      • Database administrator

      • Data warehouse maintenance developers

      • Metadata Manager

    • Analysis and design

      • Business requirements analysts

      • User groups

      • Data warehouse architect

    • Data procurement

      • Data quality analyst

      • Data acquisition developer

      • Data access developer

    • IS executive sponsor


    Data warehouse end user roles and responsibilities

    Data warehouse end-user roles and responsibilities

    • Support roles

      • Iteration sponsors

      • Subject matter experts

      • User support technician

  • User types

    • Unlimited ad hoc user access users

    • Limited ad hoc access users

    • Predefined application users

  • Data warehouse initial and ongoing end-user staffing

    • Subject matter experts

    • User support technician


  • Data extraction and transformation

    Data extraction and transformation


    Data quality 1

    Data Quality (1)

    • Data should be accurate

    • Data should be stored according to data type

    • Data should have high integrity

    • Data should be consistent

    • Databases should be well designed

    • Data should not be redundant

    • Data should follow business rules

    • Data should correspond to established domains


    Data quality 2

    Data quality (2)

    • Data should be timely

    • Data should be well understood

    • Data should be integrated

    • Data should satisfy the needs of the business

    • Users should be satisfied with the data and the information derived from the data

    • Data should be complete

    • There should be no duplicate records

    • There should be no data anomalies


    A four phase process to achieve high data quality

    A four-phase process to achieve high data quality

    • Data investigation

      • parsing

      • lexical analysis

      • pattern investigation

      • data typing

    • Data conditioning and standardisation

    • Data integration

    • Data Survivorship and formatting


    Fundamental types of data transformation

    Fundamental types of data transformation

    • Simple transformation

      • data type conversion

      • date/time format conversions

      • field decoding

  • Cleansing and scrubbing

    • valid values

    • complex reformatting

  • Integration

    • simple field level mappings

    • complex integration

  • Aggregation and summarisation


  • Other transformations

    Other transformations

    • Operating system conversions

    • Hardware architecture conversions

      • affects the structure of data

      • affects the structure of programs running against the data

      • affects the computer operations needed for each environment

      • the available software which makes the different environments run

  • Application conversions


  • Types of source system extracts

    Types of source system extracts

    • Point-in-time snapshots

      • scheduled at specific points in time

      • efficient method for users to pinpoint specific points in time or ranges of time

      • unfortunately requires nearly a complete read of all operational sources of data

  • Significant business events

    • non-predetermined events drive the capture of data

    • captured as a snapshot of relevant data entities

    • triggered when a completion event is performed

  • Delta data

    • see next slide...


  • Types of source system extracts delta data

    Types of source system extracts - Delta data

    Delta data is both new and changed data, it represents changes from one point in time to the next

    • Delta data can be captured in a number of ways:

      • Operational events

      • Changed data capture

      • date last modified

      • Point-in-time comparisons


    Types of data warehouse updates

    Types of data warehouse updates

    • Insert

    • Full replace

    • Partial replace

    • Update

    • Update plus insert

    • Insert with update

    • Replace and insert


    Copy management

    Copy management

    • What to extract?

    • When to extract?

    • How to extract?

    • Transformation requirements?

    • What to transform?

    • How to transform?

    • What to update?

    • When to update?

    • How to update?

    • How to generate the necessary metadata?


    Technical issues

    Technical issues


    Major questions affecting the choice of the technical solution

    Major questions affecting the choice of the technical solution

    • Connectivity and interoperability?

    • Need for parallel processing ability?

    • Scalability?

    • Standards?

    • Single vendor/multi-vendor

    • Vendor stability and service?

    • In house competency?

    • Compatibility with existing systems architecture?

    • Compatability with IT strategy?

    • Functionality vs Cost?


  • Login