
Cash Registers & Satellites

Cash Registers & Satellites. Briefing to the 2006 NOAATech Conference, November 2, 2005. Stan Cutler, Stanley.cutler@noaa.gov, 301-457-5210 x 163, Mitretek Systems/NESDIS/OSD. Purpose: Improve communication between NOAA’s developers and the wider community of data management professionals.


Presentation Transcript


  1. Cash Registers & Satellites Briefing to the 2006 NOAATech Conference November 2, 2005 Stan Cutler Stanley.cutler@noaa.gov 301-457-5210 x 163 Mitretek Systems/NESDIS/OSD

  2. Purpose Improve communication between NOAA’s developers and the wider community of data management professionals • Introduce vocabulary • Identify NOAA applications that can be described using common vocabulary

  3. Agenda • Universal Data Management Challenges • Notional Data Warehouse Architecture • Data Modeling Approaches • Relational • Dimensional

  4. I. Universal Data Management Challenges

  5. Data Mining Example: “Market Basket Analysis” • Decisions: • Move beer display closer to the diaper display • On Thursdays, sell beer & diapers at full price • Rationale: • 1) When men bought diapers on Thursdays and Saturdays, they also tended to buy beer • 2) Men typically did their weekly grocery shopping on Saturdays • 3) On Thursdays, they only bought a few items
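
The beer-and-diapers rationale above can be checked numerically. The following is a minimal sketch, assuming a handful of made-up baskets; the sample data and the function name `pair_stats` are hypothetical, and support, confidence, and lift are the usual association-rule measures rather than anything named on the slide.

```python
# Hypothetical transactions: each set is one shopper's basket.
baskets = [
    {"diapers", "beer"},
    {"diapers", "beer", "chips"},
    {"diapers", "milk"},
    {"beer", "chips"},
    {"milk", "bread"},
]

def pair_stats(baskets, a, b):
    """Return (support, confidence, lift) for the rule 'a implies b'."""
    n = len(baskets)
    count_a = sum(1 for t in baskets if a in t)
    count_b = sum(1 for t in baskets if b in t)
    count_ab = sum(1 for t in baskets if a in t and b in t)
    support = count_ab / n                               # share of baskets with both items
    confidence = count_ab / count_a if count_a else 0.0  # share of a-baskets that also hold b
    lift = confidence / (count_b / n) if count_b else 0.0  # above 1 means bought together more than chance
    return support, confidence, lift

support, confidence, lift = pair_stats(baskets, "diapers", "beer")
print(f"support={support:.2f} confidence={confidence:.2f} lift={lift:.2f}")
```

A lift above 1 for a pair of products is the kind of signal that would prompt a decision such as moving two displays closer together.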

  6. Many Disciplines Mine Their Data • Law Enforcement - Optimal Deployment • Health Care – Coverage Risks • E-Commerce – Pop-up/Link Selection • Medicine – Gene/Disease Associations • Etc. • Data Management Goal: Develop systems in which the data and procedures are configured to answer questions that are important to the enterprise

  7. NOAA’s Future We are not unique. Any enterprise that collects large amounts of data has the same kind of challenges and goals • Integrating Global (Environmental Observations) and Data Management • Ensuring Sound, State-of-the-Art (Research) • Developing, Valuing, and Sustaining a World-Class Workforce

  8. We can find valuable expertise outside the NOAA community • Ask the same kinds of questions as those challenged with similar problems • Understand the constructs and vocabulary • Architectures • Data Modeling

  9. II. Notional Data Warehouse Architecture

  10. “Hub and Spoke Architecture” • Inputs: Internal Data, External Data • Data Staging Area: Transform & “Cleanse” • Data Warehouse (Application Neutral) • Application Specific “Data Marts” use “OLAP” Technologies • “ETL” = Extract, Transform and Load • “OLAP” = Online Analytical Processing
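
The extract, transform and load flow on this slide can be sketched in a few lines. This is an illustration only, assuming two small in-memory sources and an in-memory SQLite database standing in for the warehouse; the table name `warehouse_sales`, the store labels, and the cleansing rules are all hypothetical.

```python
import sqlite3

# Hypothetical source data: one internal feed and one external feed.
internal_rows = [
    {"store": " store a ", "product": "Beer", "dollars": "12.50"},
    {"store": "Store A", "product": "diapers", "dollars": "19.99"},
    {"store": "store b", "product": "beer", "dollars": None},  # missing fact
]
external_rows = [
    {"store": "STORE B", "product": "Diapers", "dollars": "18.00"},
]

def extract():
    """Gather rows from every source into the staging area."""
    return internal_rows + external_rows

def transform(rows):
    """Cleanse: drop rows with no fact, normalize names, convert types."""
    cleaned = []
    for row in rows:
        if row["dollars"] is None:
            continue
        cleaned.append((
            row["store"].strip().title(),
            row["product"].strip().lower(),
            float(row["dollars"]),
        ))
    return cleaned

def load(conn, rows):
    """Write cleansed rows into the application-neutral warehouse table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS warehouse_sales "
        "(store TEXT, product TEXT, dollars REAL)"
    )
    conn.executemany("INSERT INTO warehouse_sales VALUES (?, ?, ?)", rows)
    conn.commit()

conn = sqlite3.connect(":memory:")
load(conn, transform(extract()))
totals = dict(conn.execute(
    "SELECT store, SUM(dollars) FROM warehouse_sales GROUP BY store"
))
print(totals)
```

In a real warehouse the cleansing rules are usually the expensive part; the sketch only shows where they sit in the flow.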

  11. Retail Application Hub and Spoke Architecture • Inputs: Sales Data, External Customer Lists • Data Staging Area: Transform & Cleanse • Data Warehouse (Application Neutral) • OLAP Data Marts (Application Specific): Marketing, Floor Management, Human Resources, Real Estate, Accounting

  12. Notional NOAA Hub and Spoke Architecture • Inputs: CLASS, ESPC, Other Satellite Archives, Data Centers • Data Staging Area (Rich Inventory?): Transform & Cleanse • Data Warehouse (Application Neutral) • NOAA Applications (Data Marts using OLAP): Climate Prediction, Weather Forecast, Ecosystems Management, Commerce & Transportation, External Customers

  13. III. Data Modeling Approaches

  14. “Relational” Vocabulary • “Relational” technologies • Relational Data Base Management Systems (RDBMS) • COTS Products (INFORMIX, DB2, ORACLE, MS/SS, etc.) • Proprietary data management/manipulation software • RDBMS Extensions (Most COTS products built on an RDBMS) • GUIs, CASE Tools, COOP, Application Generators, Security, etc. • “Relational” Data Models - Evolutionary approach to data base design • Conceptual Entity Relationship Diagrams (ERD) used to identify data requirements, relationships, rules • Diagrams • Data Dictionaries • Logical ERDs used to normalize (eliminate redundancies) • Physical models are the Table Schema entered into the RDBMS • Online Transaction Processing (OLTP) • e.g., CLASS

  15. Entity Relationship Diagram (ERD) • The foundation of all OLTP systems, such as CLASS • Diagram labels: Entity, Entity Class, Relationship, Attributes, Cardinality (1, Many, or 0) • Attributes, entities, and relationships are described in the data dictionary

  16. Object Models “inherit” ERD constructs • Diagram labels: Object Class (key, attributes), Behavior

  17. Pros & Cons of systems based on Relational models • Strengths • Referential integrity • Data locking • Fast Look-up and Retrieval • GUIs • Weaknesses • Entity proliferation • Users don’t understand them • Complex code must be written to accumulate multiple instances (Hard to use for Data Mining)
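
Referential integrity, the first strength listed above, can be demonstrated with a small sketch. It assumes SQLite as the RDBMS and two hypothetical tables, `customer` and `customer_order`; none of these names come from the presentation.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite enforces foreign keys only when this pragma is switched on.
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
    CREATE TABLE customer (
        customer_id INTEGER PRIMARY KEY,
        name TEXT NOT NULL
    );
    CREATE TABLE customer_order (
        order_id INTEGER PRIMARY KEY,
        customer_id INTEGER NOT NULL REFERENCES customer(customer_id)
    );
    INSERT INTO customer VALUES (1, 'Example Customer');
""")

# An order for an existing customer is accepted.
conn.execute("INSERT INTO customer_order VALUES (100, 1)")

# An order pointing at a customer that does not exist is rejected.
try:
    conn.execute("INSERT INTO customer_order VALUES (101, 999)")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

order_count = conn.execute("SELECT COUNT(*) FROM customer_order").fetchone()[0]
print(rejected, order_count)
```

The database, not the application code, refuses the orphaned row, which is what makes the guarantee dependable in transaction processing.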

  18. Dimensional Data Models • Fact • An instance of numeric data • Dimension • Foreign key • Fact Table • Key is a concatenation of foreign keys (dimensions) • An instance can have dozens of foreign keys • Millions of instances (rows) often required • Programmers’ revenge on Data Base Administrators • Break many relational “rules” • Re-invented often

  19. A “Dimensional” Data Model for Retailing • Who (buys, sells) • Customer (age, gender, marital status, occupation, etc.) • Sales person ( “ , “ , training, etc.) • Cash Register • What (products) • Brand, color, size, type, etc • When • Time of day, day of week, season • Where • Store (location, size, type), Shelf • Why • Promotions, advertising, discounts, economic trends • How much (was spent) • Per product, per total sale

  20. Classical Star Schema: Point of Sale • FACT: Time_key, Customer_key, Store_key, Clerk_key, Promo_key, Product_key, Register_key, Dollars Sold, Units Sold, Dollars Cost • Time Dimension: Time_key, DayofWeek, Fiscal period • Product Dimension: Product_key, Description, Brand, Sub Category, Category, Dept, Flavor, Package Type • Customer Dimension: Customer_key, CustomerName, Purchase Profile, Etc. • Store Dimension: Store_key, StoreName, Address, FloorType, Etc. • Register Dimension: Register_key, Location, Type, Etc. • Promo Dimension: Promo_key, PromoName, PriceType, AdType, Etc. • Clerk Dimension: Clerk_key, ClerkName, JobGrade, Etc.
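
A cut-down version of this point-of-sale star schema can be built and queried as follows. It is a sketch under stated assumptions: only the time and product dimensions are kept, SQLite stands in for the warehouse, and the lower-case table names, column names, and sample rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE time_dim (time_key INTEGER PRIMARY KEY, day_of_week TEXT);
    CREATE TABLE product_dim (
        product_key INTEGER PRIMARY KEY, description TEXT, category TEXT
    );
    -- The fact table key is the concatenation of its foreign keys.
    CREATE TABLE sales_fact (
        time_key INTEGER REFERENCES time_dim(time_key),
        product_key INTEGER REFERENCES product_dim(product_key),
        dollars_sold REAL,
        units_sold INTEGER,
        PRIMARY KEY (time_key, product_key)
    );
    INSERT INTO time_dim VALUES (1, 'Thursday'), (2, 'Saturday');
    INSERT INTO product_dim VALUES
        (1, 'Beer', 'Beverages'), (2, 'Diapers', 'Baby');
    INSERT INTO sales_fact VALUES
        (1, 1, 20.0, 2), (1, 2, 15.0, 1),
        (2, 1, 60.0, 6), (2, 2, 45.0, 3);
""")

# Join the fact table to its dimensions and total the facts per group.
rows = conn.execute("""
    SELECT t.day_of_week, p.description,
           SUM(f.dollars_sold), SUM(f.units_sold)
    FROM sales_fact AS f
    JOIN time_dim AS t ON f.time_key = t.time_key
    JOIN product_dim AS p ON f.product_key = p.product_key
    GROUP BY t.day_of_week, p.description
    ORDER BY t.day_of_week, p.description
""").fetchall()

for row in rows:
    print(row)
```

Every analytical question has the same shape: join the fact table to the dimensions of interest, then group and aggregate.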

  21. Snowflake Schema: Point of Sale • FACT: Time_key, Customer_key, Store_key, Clerk_key, Promo_key, Product_key, Register_key, Dollars Sold, Units Sold, Dollars Cost • Time Dimension: Time_key, DayofWeek, Fiscal period • Customer Dimension: Customer_key, CustomerName, Purchase Profile, Etc. • Store Dimension: Store_key, StoreName, Address, FloorType, Etc. • Register Dimension: Register_key, Location, Type, Etc. • Promo Dimension: Promo_key, PromoName, PriceType, AdType, Etc. • Clerk Dimension: Clerk_key, ClerkName, JobGrade, Etc. • Product Dimension (split into linked tables): Product_Type_PK, Product_Type_Desc; Sub-Type_PK, Sub-Type-Desc; Brand-ID_PK, Maker-Desc; Model-Num_PK, Model-Desc

  22. Metadata in Dimensional Modeling • NOAA usage: • If it’s not a fact • If it’s not a key • It’s metadata • Conventional Dimensional usage: • If it’s not a fact • If it’s not a key • It’s documentation BUT • If it’s a key • It’s metadata (because it describes the fact)

  23. Dimensional Models for NOAA • Which: Satellite, Instrument • When: Orbit, UTC, Season, decade, epoch, etc. • Where: Geospatial coordinates • Who: User affiliation, Developer affiliation • FACT (How much?): Temperature, moisture, radiance, color, etc.

  24. A NOAA Star Schema? • FACT TABLE: Time_key (fk), Location-key (fk), Altitude key (fk), Product_key (fk), Satellite_key (fk), Instrument_key (fk), Temperature • Time Dimension: Time_key, UTC of Obs’n, UTC of receipt, LocalT of Obs’n, Orbit_Id, Etc. • Product Dimension: Product_key, Product Name, Description, System, Sub System, Etc. • Altitude Dimension: Altitude_key, Distance above SL, Etc. • Location Dimension: Location key, Geo-Coordinates of Obs’n, Etc. • Satellite Dimension: Satellite_key, Name, Position • Instrument Dimension: Instrument_key, Name, Description
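
The notional NOAA star schema lends itself to queries where the user simply picks a dimension to group by. The sketch below assumes SQLite, keeps only three of the slide's dimensions, and uses invented names and values throughout (`obs_fact`, `SAT-A`, `imager`, the temperatures); `mean_temperature_by` is a hypothetical helper, not part of any NOAA system.

```python
import sqlite3

# Hypothetical subset of the slide's dimensions:
# logical name -> (table, key column, label column).
DIMENSIONS = {
    "satellite": ("satellite_dim", "satellite_key", "name"),
    "instrument": ("instrument_dim", "instrument_key", "name"),
}

def build_warehouse():
    """Create a tiny in-memory star schema with made-up observations."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE satellite_dim (satellite_key INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE instrument_dim (instrument_key INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE time_dim (time_key INTEGER PRIMARY KEY, utc_of_obs TEXT);
        CREATE TABLE obs_fact (
            time_key INTEGER REFERENCES time_dim(time_key),
            satellite_key INTEGER REFERENCES satellite_dim(satellite_key),
            instrument_key INTEGER REFERENCES instrument_dim(instrument_key),
            temperature REAL,
            PRIMARY KEY (time_key, satellite_key, instrument_key)
        );
        INSERT INTO satellite_dim VALUES (1, 'SAT-A'), (2, 'SAT-B');
        INSERT INTO instrument_dim VALUES (1, 'imager'), (2, 'sounder');
        INSERT INTO time_dim VALUES
            (1, '2005-11-02T00:00Z'), (2, '2005-11-02T06:00Z');
        INSERT INTO obs_fact VALUES
            (1, 1, 1, 280.0), (2, 1, 1, 284.0),
            (1, 2, 2, 270.0), (2, 2, 2, 274.0);
    """)
    return conn

def mean_temperature_by(conn, dimension):
    """Average the temperature fact, grouped by one user-selected dimension."""
    # Looking the name up in a fixed dictionary keeps user input out of the SQL text.
    table, key, label = DIMENSIONS[dimension]
    sql = (
        f"SELECT d.{label}, AVG(f.temperature) "
        f"FROM obs_fact AS f JOIN {table} AS d ON f.{key} = d.{key} "
        f"GROUP BY d.{label} ORDER BY d.{label}"
    )
    return conn.execute(sql).fetchall()

conn = build_warehouse()
by_satellite = mean_temperature_by(conn, "satellite")
print(by_satellite)
```

Supporting another dimension means adding one entry to `DIMENSIONS`; the fact table and the query shape stay the same.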

  25. Pros & Cons of systems based on dimensional models • Strengths • Very few “entity types” needed • Decision Support Systems (DSS) • End-Users construct complex queries by selecting dimensions from a GUI • Statistical analysis of very large data bases • Artificial Intelligence (AI) • Automated scheduling of continuous executions • System identifies (“discovers”) new relationships • Discoveries shape successive execution • Weaknesses • Development Cost • Storage • Operational Cost - Requires much “care and feeding”

  26. False Dichotomy: Relational “vs.” Dimensional • Relational and dimensional systems are not mutually exclusive • Data warehouses usually extract fact tables from relational data bases • Data warehouse capabilities are extensions in RDBMSs • Depends on the business • Feasibility: Is the application data good enough for ETL? • ROI: Does the business benefit outweigh the cost?

  27. SUMMARY: • NOAA’s data mining challenge is similar to that of other enterprises • A world-wide community of IT professionals uses a particular vocabulary to address the challenge • Relational technologies & models are the essential first step • Dimensional technologies & models come next

  28. Questions Stan Cutler Mitretek Systems/NESDIS/OSD Stanley.Cutler@noaa.gov 301-457-5210 x 163
