1 / 29

Database Management System

College Of Science & Technology- Khan younis Information Technology & Computer Science Dep. Database Management System. Part 8 Data Warehousing. Prepared by Dr. Ahmed El-Ragal. Reviewed & Presented By Mr. Mahmoud Rafeek Alfarra. ERP. HR. CRM and eCRM. Financial/ Accounting.

Download Presentation

Database Management System

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. College Of Science & Technology- Khan younis Information Technology & Computer Science Dep. Database Management System Part 8 Data Warehousing Prepared by Dr. Ahmed El-Ragal Reviewed & Presented By Mr. Mahmoud Rafeek Alfarra

  2. ERP HR CRM and eCRM Financial/ Accounting Islands of information Internet Inventory Call Center Procurement Info sources for organizations Data Warehouse

  3. Ad-hoc Parameterized Reports Standard Reports Applications reporting systems? Data Warehouse

  4. ERP HR CRM and eCRM Financial/ Accounting Enterprise Data Warehouse Internet Inventory Call Center Procurement Data Warehousing Data Warehouse

  5. Data Warehouse

  6. Data Warehouse

  7. Data Warehouse-- Defined Collection of Data in Support of Management Reporting Needs and Decision-Making Processes. • Organized by subject areas (known as data marts) and structured for query. • Integrated across subject areas. • Associated with a point in time, such as pay period, fiscal year, semester. • Query only, that is, the data does not change. Data Warehouse

  8. E M I T M ARKET P R O D U C T A Data Warehouse Designer’s View of the Business The points inside the cube store the measurements of the business –a combination of Product, Market, and Time. These points are the business facts. The three key words become the dimensions of the cube. Data Warehouse

  9. Goals of the Data Warehouse Provide Access to various scattered Data. Include Business Descriptions of Data. Promote Consistency Across Subject Areas. Assure Data Quality. Support User Directed Analysis. Data Warehouse

  10. Runs every night The Data Loading Process OperationalData Extract Transform Load ExternalData Data Warehouse

  11. DataWarehouse BusinessDescriptions Ready for Accessand Query The Data Loading Process Extract Transform Load OperationalData ExternalData Data Warehouse

  12. Characteristics of a Data Warehouse • Subject oriented – organized based on use (on subject not on application). • Integrated – inconsistencies removed. • Nonvolatile – stored in read-only format (historical ). • Time variant – data are normally time series. • Summarized – in decision-usable format. • Large volume – data sets are quite large. • Non normalized – often redundant. • Metadata – data about data are stored. • Data sources – comes from nonintegrated sources. Data Warehouse

  13. Data Warehousing Architecture OLAP servers Analysis Query/ Reporting Data Mining Extract Transform Load External Sources Serve Operational db systems Data Marts Data Warehouse

  14. Data Warehouse vs. Data Mart Source: adapted from Strange (1997). Data Warehouse

  15. Data Extraction • Often performed by COBOL routines(not recommended because of high program maintenance and no automatically generated meta data). • Sometimes source data is copied to the target database using the replication capabilities of standard RDMS (not recommended because of “dirty data” in the source systems). • Increasing performed by specialized ETL software. Data Warehouse

  16. Reasons for “Dirty” Data • Dummy Values. • Absence of Data. • Multipurpose Fields. • Cryptic Data. • Contradicting Data. • Inappropriate Use of Address Lines. • Violation of Business Rules. • Reused Primary Keys. • Non-Unique Identifiers. • Data Integration Problems. Data Warehouse

  17. Data Cleansing • Source systems contain “dirty data” that must be cleansed. • ETL software contains rudimentary data cleansing capabilities. • Specialized data cleansing software is often used. • Leading data cleansing vendors include Vality (Integrity), Harte-Hanks (Trillium), and Firstlogic (i.d.Centric) Data Warehouse

  18. Steps in Data Cleansing • Parsing • Correcting • Standardizing • Matching • Consolidating Data Warehouse

  19. Parsing • Parsing locates and identifies individual data elements in the source files and then isolates these data elements in the target files. • Examples include parsing the first, middle, and last name; street number and street name; and city and state. Data Warehouse

  20. Correcting • Corrects parsed individual data components using sophisticated data algorithms and secondary data sources. • Example include replacing an incomplete address and adding a zip code. Data Warehouse

  21. Standardizing • Standardizing applies conversion routines to transform data into its preferred (and consistent) format using both standard and custom business rules. • Examples include adding a pre name, replacing a nickname, and using a preferred telephone. Data Warehouse

  22. Matching • Searching and matching records within and across the parsed, corrected and standardized data based on predefined business rules to eliminate duplications. • Examples include identifying similar names and addresses. Data Warehouse

  23. Consolidating • Analyzing and identifying relationships between matched records and consolidating/merging them into ONE representation. Data Warehouse

  24. Data Staging • Often used as an interim step between data extraction and later steps. • Accumulates data from asynchronous sources using flat files, FTP sessions, or other processes. • At a predefined cutoff time, data in the staging file is transformed and loaded to the warehouse. • There is usually no end user access to the staging file. • An operational data store may be used for data staging. Data Warehouse

  25. Data Transformation • Transforms the data in accordance with the business rules and standards that have been established. • Example include: format changes, deduplication, splitting up fields, replacement of codes, derived values, and aggregates. Data Warehouse

  26. Building The WarehouseTransforming Data Transform Change buyer_name reg_id total_sales buyer_name reg_id total_sales Barr, Adam II 17.60 Barr, Adam 2 17.60 Chai, Sean IV 52.80 Chai, Sean 4 52.80 O’Melia, Erin VI 8.82 O’Melia, Erin 6 8.82 ... ... ... ... ... ... Combine buyer_first buyer_last reg_id total_sales buyer_name reg_id total_sales Adam Barr 2 17.60 Barr, Adam 2 17.60 Sean Chai 4 52.80 Chai, Sean 4 52.80 Erin O’Melia 6 8.82 O’Melia, Erin 6 8.82 ... ... ... ... ... ... ... Calculate buyer_name price qty buyer_name price qty total_sales Barr, Adam .55 32 Barr, Adam .55 32 17.60 Chai, Sean 1.10 48 Chai, Sean 1.10 48 52.80 Data Warehouse O’Melia, Erin .99 9 O’Melia, Erin .99 9 8.82 ... ... ... ... ... ... ...

  27. Data Loading • Data are physically moved to the data warehouse. • The loading takes place within a “load window”. • The trend is to near real time updates of the data warehouse as the warehouse is increasingly used for decisional activities that affect operations. Data Warehouse

  28. Meta Data • Data about data. • Needed by both information technology personnel and users. • IT personnel need to know data sources and targets; database, table and column names; refresh schedules; data usage measures; etc. • Users need to know entity/attribute definitions; reports/query tools available; report distribution information; help desk contact information, etc. Data Warehouse

  29. Reviewed By Mr. Mahmoud Rafeek Alfarra

More Related