
Data Warehouses




Presentation Transcript


  1. Data Warehouses. Advanced Data Technologies. Genči

  2. Contents • Literature • The concept of INFORMATION • Motivation for DWH • A closer look at DWH • DWH structure • Metadata • DWH components • Tools

  3. Literature [1] Lacko L.: Datové sklady, analýza OLAP a dolování dát s příklady …, Computer Press, Brno, 2003. [2] Ponniah P.: Data Warehousing Fundamentals: A Comprehensive Guide for IT Professionals. John Wiley & Sons, Inc., 2001. ISBN 0-471-41254-6 (hardback); 0-471-22162-7 (electronic).

  4. Literature (cont.) [3] Kimball R., Ross M.: The Data Warehouse Toolkit, Second Edition. Wiley Computer Publishing, 2002. [4] Inmon W. H.: Building the Data Warehouse, Third Edition. John Wiley & Sons, Inc., 2002.

  5. Literature (cont.) [5] Inmon W., Strauss D., Neushloss G.: DW 2.0: The Architecture for the Next Generation of Data Warehousing. Morgan Kaufmann. Paperback, 400 pages. ISBN-13: 978-0-12-374319-0.

  6. The concept of INFORMATION • According to Oracle's corporate literature, data becomes information if • we have the data; • we know that we have the data; • we know where the data is; • we have access to it; • we can trust the source of the data.

  7. Hierarchy of information levels: Wisdom, Knowledge, Information, Data

  8. Motivation for DWH • Executives need information, for example to decide: • where to build the next warehouse; • which product line to expand; • which market segment should be strengthened • i.e., they need to make strategic decisions, and for those they need strategic information

  9. Strategic information • Cannot be provided by OLTP systems • Is not used for the day-to-day running of the company • Is important for the healthy development and survival of the company • Critical decisions depend on proper (correct, relevant) strategic information

  10. Required properties of strategic information

  11. Data “input”

  12. Information “output”

  13. The resulting contradiction • Organizations have huge amounts of data, but • IT resources and systems cannot efficiently turn that data into strategic information

  14. The information crisis • It is not caused by a lack of data, but by the fact that the data cannot be used for strategic decision making • Reasons: • Data in companies is spread across many types of incompatible structures and systems • Data in companies is stored in various incompatible systems, on multiple platforms, and in diverse structures

  15. These operational systems (order processing, inventory control, claims processing, outpatient billing, ...) are not designed or intended to provide strategic information. • If we need the ability to provide strategic information, we must get the information from altogether different types of systems. • Only specially designed decision support systems or informational systems can provide strategic information.

  16. Differences

  17. Processing Requirements in the New Environment Most of the processing in the new environment for strategic information will have to be analytical. There are four levels of analytical processing requirements: • Running simple queries and reports against current and historical data • Ability to perform “what-if” analysis in many different ways • Ability to query, step back, analyze, and then continue the process to any desired length • Ability to spot historical trends and apply them to future results
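To make the first two levels concrete, here is a minimal Python sketch over a tiny, made-up sales table: a simple query against historical data, followed by one what-if scenario that rescales the price and volume of a single product. The data, the function names (`revenue_by_year`, `what_if`) and the 10% / 5% factors are hypothetical illustrations, not part of any DWH product.

```python
from collections import defaultdict

# Hypothetical historical fact rows: (year, product, units_sold, unit_price)
SALES = [
    (2022, "A", 100, 10.0),
    (2022, "B", 50, 20.0),
    (2023, "A", 120, 10.0),
    (2023, "B", 40, 22.0),
]


def revenue_by_year(rows):
    """Simple query: total revenue per year over current and historical rows."""
    totals = defaultdict(float)
    for year, _product, units, price in rows:
        totals[year] += units * price
    return dict(totals)


def what_if(rows, product, price_factor=1.0, volume_factor=1.0):
    """What-if scenario: return a modified copy of the rows; the history itself is untouched."""
    scenario = []
    for year, prod, units, price in rows:
        if prod == product:
            units, price = units * volume_factor, price * price_factor
        scenario.append((year, prod, units, price))
    return scenario


baseline = revenue_by_year(SALES)
# Scenario: raise the price of product A by 10% and assume its volume drops by 5%.
scenario = revenue_by_year(what_if(SALES, "A", price_factor=1.10, volume_factor=0.95))
```

With these numbers `baseline` is `{2022: 2000.0, 2023: 2080.0}`, while the scenario gives roughly 2045 for 2022 and 2134 for 2023. Analyzing the question "in many different ways" then amounts to further calls to `what_if` with other products or factors.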

  18. Data warehousing concept • Take all the data you already have in the organization, clean and transform it, and then provide useful strategic information.

  19. Data warehousing concept One of the most important approaches to integrating data sources is based on a data warehouse architecture. In this architecture, data coming from multiple external data sources (EDSs) are extracted, filtered, merged, and stored in a central repository called a data warehouse (DW). The data are also enriched with historical and summary information. From a technological point of view, a data warehouse is a very large database, ranging from several hundred GB to several dozen TB. Thanks to this architecture, users work with a local, homogeneous, and centralized data repository, which reduces data access time. Moreover, a data warehouse is independent of the EDSs, which may be temporarily unavailable. However, the data warehouse has to be kept up to date with the content of the EDSs by being refreshed periodically.
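The data flow described above (extract from several EDSs, filter, merge into one central store, enrich with summaries, refresh periodically) can be sketched in a few lines of Python. This is a toy in-memory model, not a real warehouse engine: the record layout, the validity rule (positive amounts only), the monthly summary and all names (`Sale`, `extract`, `filter_valid`, `Warehouse`, `refresh`) are assumptions made for illustration. It also shows the independence property: if an EDS is unavailable during a refresh, the warehouse keeps serving the copy it loaded earlier.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Sale:
    source: str   # name of the external data source (EDS) the row came from
    customer: str
    amount: float
    day: date


def extract(eds_name, raw_rows):
    """Extract: convert raw rows of one EDS into uniform Sale records."""
    return [Sale(eds_name, r["customer"], float(r["amount"]), r["day"]) for r in raw_rows]


def filter_valid(rows):
    """Filter: drop rows unusable for analysis (assumed rule: amount must be positive)."""
    return [r for r in rows if r.amount > 0]


class Warehouse:
    """Central repository holding merged detail rows and a stored monthly summary."""

    def __init__(self):
        self._rows_by_source = {}  # EDS name -> cleaned rows from its last successful load
        self.last_loaded = {}      # EDS name -> refresh date of that load
        self.monthly_summary = {}  # (year, month) -> total amount

    def refresh(self, sources, as_of):
        """Periodic refresh. `sources` maps EDS name -> raw rows, or None if the EDS is down."""
        for name, raw_rows in sources.items():
            if raw_rows is None:
                continue  # EDS temporarily unavailable: keep the previously loaded copy
            # Simplification: each available source is fully re-extracted and replaced.
            self._rows_by_source[name] = filter_valid(extract(name, raw_rows))
            self.last_loaded[name] = as_of
        self._rebuild_summary()

    def detail(self):
        """Merged, homogeneous view over all sources."""
        return [row for rows in self._rows_by_source.values() for row in rows]

    def _rebuild_summary(self):
        """Enrichment: precompute totals per month from the merged detail rows."""
        summary = {}
        for row in self.detail():
            key = (row.day.year, row.day.month)
            summary[key] = summary.get(key, 0.0) + row.amount
        self.monthly_summary = summary
```

Each refresh here replaces an available source's copy with a full re-extraction; incremental loading that preserves the full history is deliberately not modelled, to keep the sketch short.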

  20. A closer look at DWH

  21. Functional definition of the data warehouse The data warehouse is an informational environment that: • Provides an integrated and total view of the enterprise • Makes the enterprise’s current and historical information easily available for decision making • Makes decision-support transactions possible without hindering operational systems • Renders the organization’s information consistent • Presents a flexible and interactive source of strategic information

  22. DWH: a blend of technologies

  23. Bill Inmon’s definition Bill Inmon, considered the father of data warehousing, provides the following definition: • “A data warehouse is a subject-oriented, integrated, nonvolatile, and time-variant collection of data in support of management’s decisions.”

  24. The data in the data warehouse is • Separate • Available • Integrated • Time-stamped • Subject-oriented • Nonvolatile • Accessible

  25. Subject-oriented

  26. Integrated Data

  27. Integrated Data (2) Before data from various disparate sources can be usefully stored in a data warehouse, you have to: • remove the inconsistencies; • standardize the various data elements; • make sure of the meaning of the data names in each source application.

  28. Integrated Data (3) • Before moving the data into the data warehouse, you have to go through a process of transformation, consolidation, and integration of the source data. • Here are some of the items that would need standardization: • Naming conventions • Codes • Data attributes • Measurements
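The four kinds of standardization listed above can be illustrated with a small Python sketch that maps records from two source applications onto one warehouse format: field names are renamed (naming conventions), status codes are translated (codes), the key is forced to one data type (data attributes), and weights are converted to kilograms (measurements). The source names, field names, codes and target layout are all hypothetical.

```python
# Hypothetical per-source conventions; every name and code here is an assumption.
LB_TO_KG = 0.45359237  # exact definition of the international avoirdupois pound

SOURCE_RULES = {
    "orders_app": {
        "names": {"prod_no": "product_id", "status": "status", "weight_lb": "weight"},
        "status_codes": {"A": "active", "I": "inactive"},
        "weight_to_kg": LB_TO_KG,  # this source stores weights in pounds
    },
    "billing_app": {
        "names": {"PRODUCT_ID": "product_id", "STATUS": "status", "WEIGHT_KG": "weight"},
        "status_codes": {"1": "active", "0": "inactive"},
        "weight_to_kg": 1.0,  # this source already stores kilograms
    },
}


def standardize(source, record):
    """Map one source record onto the common warehouse layout."""
    rules = SOURCE_RULES[source]
    # Naming conventions: rename source fields to the warehouse names.
    renamed = {rules["names"][field]: value for field, value in record.items()}
    return {
        # Data attributes: the key is always a trimmed string, whatever the source type.
        "product_id": str(renamed["product_id"]).strip(),
        # Codes: translate source-specific codes into one shared vocabulary.
        "status": rules["status_codes"][str(renamed["status"])],
        # Measurements: convert to kilograms, rounded to grams.
        "weight_kg": round(float(renamed["weight"]) * rules["weight_to_kg"], 3),
    }
```

For example, `standardize("orders_app", {"prod_no": " 42 ", "status": "A", "weight_lb": "2"})` and `standardize("billing_app", {"PRODUCT_ID": 42, "STATUS": 1, "WEIGHT_KG": "0.907"})` both return `{"product_id": "42", "status": "active", "weight_kg": 0.907}`, so the warehouse sees one consistent record instead of two conflicting ones.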

  29. Nonvolatile Data

  30. Time-Variant Data • For an operational system, the stored data contains the current values. • The data in the data warehouse is meant for analysis and decision making. • A data warehouse, because of the very nature of its purpose, has to contain historical data, not just current values. Data is stored as snapshots over past and current periods. • Every data structure in the data warehouse contains the time element.

  31. Time-Variant Data (2) The time-variant nature of the data in a data warehouse • Allows for analysis of the past • Relates information to the present • Enables forecasts for the future
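One way to give a structure the time element described above is to store dated snapshots instead of overwriting values. The Python sketch below keeps the snapshot history of a single measure and supports the three uses listed: looking up a past value, relating it to the present, and a deliberately naive one-step forecast that repeats the last observed change. The class and method names are hypothetical, and the forecast is only a placeholder for real forecasting methods.

```python
from bisect import bisect_right
from datetime import date


class SnapshotHistory:
    """Time-variant storage: each value is kept together with its snapshot date."""

    def __init__(self):
        self._dates = []   # snapshot dates, strictly increasing
        self._values = []  # value captured at each snapshot date

    def add_snapshot(self, as_of, value):
        """Append a new snapshot; earlier snapshots are never overwritten."""
        if self._dates and as_of <= self._dates[-1]:
            raise ValueError("snapshots must be added in increasing date order")
        self._dates.append(as_of)
        self._values.append(value)

    def value_as_of(self, day):
        """Analysis of the past: the value from the latest snapshot on or before `day`."""
        i = bisect_right(self._dates, day)
        if i == 0:
            raise KeyError(f"no snapshot on or before {day}")
        return self._values[i - 1]

    def change_since(self, day):
        """Relating to the present: current value minus the value as of `day`."""
        return self._values[-1] - self.value_as_of(day)

    def naive_forecast(self):
        """Forecast for the next period by repeating the last observed change (illustration only)."""
        if not self._values:
            raise ValueError("no snapshots yet")
        if len(self._values) == 1:
            return self._values[0]
        return self._values[-1] + (self._values[-1] - self._values[-2])
```

With monthly snapshots of 100, 120 and 150, `value_as_of` for mid-February returns 100, `change_since` the January snapshot returns 50, and `naive_forecast` returns 180.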

  32. Data Granularity

  33. DATA WAREHOUSES AND DATA MARTS

  34. OVERVIEW OF THE COMPONENTS

  35. DWH structure

  36. Source data component • Production systems • Internal data (spreadsheets) • Archived data (tapes) • External data (stocks, interest rates, …)

  37. Data Staging Component • Data Extraction. • Data Transformation. • Data Loading.

  38. Data Movement to the Data Warehouse

  39. Information Delivery Component

  40. METADATA IN THE DATA WAREHOUSE

  41. WHY METADATA IS IMPORTANT When composing and running queries, users may have several important questions: • Are there any predefined queries I can look at? • What are the various elements of data in the warehouse? • Is there information about unit sales and unit costs by product? • How can I browse and see what is available? • Where did the data for the warehouse come from? From which source systems? • How was the data from the telephone orders system merged with the data from the mail orders system? • How old is the data in the warehouse? • When was fresh data last brought in? • Are there any summaries by month and product?

  42. Metadata in a data warehouse contains the answers to questions about the data in the data warehouse.
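As a rough illustration of that statement, the Python sketch below models a tiny metadata catalog whose entries answer several of the questions from slide 41: what is available and what the columns mean, which source systems feed a table, how old its data is, and which summaries exist at a given grain. The `TableMetadata` fields and `MetadataCatalog` methods are hypothetical; a real metadata repository would hold much more.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta


@dataclass
class TableMetadata:
    name: str
    description: str           # business description for end users
    source_systems: list[str]  # operational systems that feed the table
    last_loaded: datetime      # when fresh data was last brought in
    grain: str                 # level of detail, e.g. "month x product"
    columns: dict[str, str] = field(default_factory=dict)  # column -> business meaning


class MetadataCatalog:
    def __init__(self):
        self._tables = {}

    def register(self, meta: TableMetadata) -> None:
        self._tables[meta.name] = meta

    def find(self, keyword: str) -> list[str]:
        """'What is available?': tables whose description or column meanings mention keyword."""
        keyword = keyword.lower()
        return [
            t.name
            for t in self._tables.values()
            if keyword in t.description.lower()
            or any(keyword in meaning.lower() for meaning in t.columns.values())
        ]

    def sources_of(self, table: str) -> list[str]:
        """'From which source systems?'"""
        return list(self._tables[table].source_systems)

    def age(self, table: str, now: datetime) -> timedelta:
        """'How old is the data in the warehouse?'"""
        return now - self._tables[table].last_loaded

    def tables_at_grain(self, grain: str) -> list[str]:
        """'Are there any summaries by month and product?'"""
        return [t.name for t in self._tables.values() if t.grain == grain]
```

Registering, say, a hypothetical `sales_by_month_product` table fed by `telephone_orders` and `mail_orders` lets `find("unit sales")`, `sources_of(...)` and `tables_at_grain("month x product")` answer the corresponding user questions directly.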

  43. Different definitions for metadata • Data about the data • Table of contents for the data • Catalog for the data • Data warehouse atlas • Data warehouse roadmap • Data warehouse directory • Glue that holds the data warehouse contents together • Tongs to handle the data • The nerve center

  44. Metadata in OLTP • In operational systems we do not really have any easy and flexible methods for knowing the nature of the contents of the database. • There is no great need for user-friendly interfaces to the database contents. • The data dictionary or catalog is meant for IT uses only.

  45. Metadata in DWH • Users need sophisticated methods for browsing and examining the contents of the data warehouse. • Users need to know the meanings of the data items. • Users must be prevented from drawing wrong conclusions from their analysis out of ignorance of the exact meanings. • Without adequate metadata support, users of large data warehouses are totally handicapped.

  46. Types of Metadata • Metadata in a data warehouse fall into three major categories: • Operational Metadata • Extraction and Transformation Metadata • End-User Metadata

  47. Operational Metadata • Data for the data warehouse comes from several operational systems of the enterprise. • These source systems contain different data structures. • The data elements selected for the data warehouse have various field lengths and data types. • In selecting data from the source systems for the data warehouse, you split records, combine parts of records from different source files, and deal with multiple coding schemes and field lengths. • When you deliver information to the end-users, you must be able to tie that back to the original source data sets. • Operational metadata contain all of this information about the operational data sources.
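The last point, tying delivered information back to the original source data sets, can be sketched as a small source-to-target mapping in Python. Each warehouse column records the source system, file, field, data type and length it was built from, including a column assembled from parts of one source record and a column fed by two systems with different coding schemes. The systems, files, field names and the `trace_back` helper are hypothetical examples, not a standard metadata format.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SourceField:
    system: str     # operational source system
    file: str       # source file or table
    field: str      # field name in that file
    data_type: str  # type as defined in the source
    length: int     # field length in the source


@dataclass(frozen=True)
class ColumnMapping:
    target: str                       # warehouse column
    sources: tuple[SourceField, ...]  # one or more source fields feeding it
    rule: str                         # transformation applied, described in words


# Hypothetical operational metadata for two warehouse columns.
MAPPINGS = [
    ColumnMapping(
        target="dim_customer.full_name",
        sources=(
            SourceField("CRM", "CUSTMAST", "FNAME", "CHAR", 20),
            SourceField("CRM", "CUSTMAST", "LNAME", "CHAR", 30),
        ),
        rule="trim both parts and join them with a single space",
    ),
    ColumnMapping(
        target="dim_customer.gender",
        sources=(
            SourceField("CRM", "CUSTMAST", "SEX", "CHAR", 1),
            SourceField("BILLING", "ACCOUNTS", "GENDER_CD", "NUMBER", 1),
        ),
        rule="map the source coding schemes to the warehouse codes M/F",
    ),
]


def trace_back(target, mappings=MAPPINGS):
    """Return the original source fields a warehouse column was derived from."""
    for m in mappings:
        if m.target == target:
            return list(m.sources)
    raise KeyError(f"no operational metadata for {target}")
```

For example, `trace_back("dim_customer.full_name")` lists the `FNAME` and `LNAME` fields of `CUSTMAST`, so an analyst can see that the name column combines two source fields of different lengths.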
