1 / 49

August 11, 2003

31 Schema Flavors: Build a Warehouse Session 37018 OracleWorld San Francisco 2003 Robert Corfman Boeing Commercial Airplanes Information Systems. August 11, 2003. What is this Session?. Background on the Boeing Tool Management Project COTS Package Enhancements COTS Limitations

manny
Download Presentation

August 11, 2003

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. 31 Schema Flavors: Build a Warehouse Session 37018 OracleWorld San Francisco 2003 Robert Corfman Boeing Commercial Airplanes Information Systems August 11, 2003

  2. What is this Session? • Background on the Boeing Tool Management Project • COTS Package Enhancements • COTS Limitations • Warehouse Limitations / Overview Design • Specific Code Examples / Techniques • Replication Usage • Detecting Schema Differences • Error Capture, Propagation, Reporting • Performance Issues

  3. Presentation Perspective • Application Architect • Responsible for technical application decisions • Enterprise project and Puget Sound responsibilities • Programmer • Multiple technical facets • COTS modification specs, delivery, testing, deployment • Radio Frequency Implementation • Data Conversion • Legacy application interfaces • Warehouse Not a Warehousing Specialist . . .

  4. 31 Schema Flavors Project Background And Warehouse Design Decisions How did we get where we are?

  5. Boeing Tool Standardization • Perishable / Portable Tools and Shop Supplies • Drill Motors, Safety Glasses, Reamers … “Non-Production” • Retirement of multiple legacy applications • Mergers and acquisitions • Robust home-grown applications • Enterprise Commodity Standardization • Internal customer base already communicating • Enterprise application • ‘Loose federation’ of independent site projects

  6. Everett, Renton, Auburn, Seattle Portland Philly Salt Lake City St Louis Wichita Oak Ridge Tulsa Macon Cecil Field Corinth Mesa San Antonio Site Distribution

  7. COTS Package • Site vs. Enterprise – daily use vs. high leverage • Picked best identified product for‘site-level’ usability • No Enterprise features in selected product • Cross-site visibility, item linkage, or transfers • Supplier to build-in needed features • No customizations Be Nimble, Keep the Product Standard

  8. Data Warehouse Design Decisions • Don’t break the site-level features/usability • Main reason for selecting this product… • Integrate warehouse features into the product • Allow continued site operations when warehouse is unavailable • Drove decision not to integrate all sites into a single Corporate-level database • Create separate roll-up database of all participating sites Site Operations  Company Asset

  9. What Data Goes in the Warehouse? • Data Subsets? Data Summarizations? • How do we decide what data to store? Summarize? • Volatile decision, different for different Companies • Drove decision to replicate all data into the warehouse and not pre-define summarizations • Application runs on multiple databases • Expected warehouse structure defined by COTS • Decision on exactly how to get data into designed warehouse left outside of COTS package • Additional tables and summaries could be added,but without direct integration in COTS package Be Nimble, Keep the Product Standard

  10. Application Look and Feel Site Data View

  11. Application Look and Feel Warehouse Data View

  12. Application Look and Feel Inventory Reports View

  13. Warehouse Design Considerations • 7 x 24 Operations • Allowed to be non-real-time • Includes Oracle and SQL Server sites • COTS package allows customization of site • User-defined fields in standard tables • Modification of field lengths from standard • Warehouse needs to handle‘greatest common denominator’ • Desire to leverage replication

  14. Warehouse Design Options • Updateable Snapshots • Warehouse is the Master, each site an updatable snapshot • Doesn’t work if sites have schema variants • Master control of data no longer at the sites • Replication directly into Warehouse • Same schema variant problem • Multiple “replicants” • Custom procedures to consolidate into Master warehouse • Allows Replication with Master control at site • Accommodates variation in site schemas ETL software now considered a possibility . . .

  15. Site 1 Snapshot Site 2 Snapshot Site 3 Snapshot Site N Snapshot Proposed Warehouse Design Warehouse Site Company Data Warehouse Site 1 (Unix) Site 2 (NT) Site 3 (NT) Site N Oracle Oracle SQL Server Site Data

  16. Initial Issues with Design • Could perform Roll-up via Views • Create or replace view as • Select 1 siteno,item,desc,null,null from site1.table • Union Select 2,item,desc,cust1,null from site2.table • Union Select 3,item,desc,null,cust2 from site3.table; • Dynamically shows replicated data changes • Significant difference in query times • Oracle 9.2 showed an Order of magnitude difference in query times • Oracle 8.1.7 was significantly worse • this was the version we were using when the decision was made not to use views

  17. Initial Issues with Design • Strong desire to have 7x24 data availability • Complexity of rebuilding in place • Difficult to update existing data . . . • Leverage transparency of synonyms • Maintain two copies of warehouse • Rebuild, then re-direct • If rebuild fails, don’t re-direct • Built-in contingency . . . Be Prepared for Success . . . (or Failure)

  18. Site 1 Snapshot Site 2 Snapshot Site 3 Snapshot Site N Snapshot Final Warehouse Design Warehouse Site Company Data Warehouse Warehouse Data 1 Warehouse Data 2 Site 1 (Unix) Site 2 (NT) Site 3 (NT) Site N Oracle Oracle SQL Server Site Data

  19. Replication of Oracle Data • Standard Oracle fast refresh functionality • Used to move data to the master warehouse site • Flexible schedule, avoid replication during rollup • Dbms_snapshot.refresh vs. Refresh Groups • Groups provided cross-table read consistency, required single commit • Perceived difficulties in schedule variations • Difficulty in automating snapshot validity prior to refresh • Custom column additions/deletions, column size changes • Snapshot refresh chosen, acceptance of potential for minor data inconsistencies • Managed replication via packages • One package set on snapshot-site, companion package on site • Designed validate and rebuild snapshot logs and snapshots as required

  20. Oracle Snapshot Verification Verify_snapshots_sql := 'select distinct upper(site.table_name)' || ' from user_tab_columns snap, ' || ' user_tab_columns@'|| db_link || ' site' || ' where upper(site.table_name) = upper(snap.table_name(+))' || ' and upper(site.column_name) = upper(snap.column_name(+))' || ' and upper(site.table_name) not like ''MLOG$%''' || ' and upper(site.table_name) not like ''RUPD$%''' || ' and upper(site.table_name) not in ' || ' (select upper(tablename) from exclude_whse@' || db_link || ')' || ' and (' || ' snap.column_name is null' || ' or snap.data_type != site.data_type' || ' or nvl(snap.data_precision,38)!=nvl(site.data_precision,38)' || -- default to 38 as replicated number have precision of 38... ' or nvl(snap.data_length,-1) != nvl(site.data_length,-1)' || ' or nvl(snap.data_scale,-1) != nvl(site.data_scale,-1)' ' )'; Never Apologize . . .

  21. Replication of SQL Server Data • Expected to have full refresh • Site administrators has strong preference against… • Apparent Options • Oracle Transparent Gateway • Two-part Replication • Microsoft Incremental replication to central LAN • Full replication to warehouse • Final Decision… • Site researched push incremental replication from SQL Server to Oracle … It works. • Site schema changes are manually handled

  22. Build a Warehouse More Code For Rolling the Schemas Together How did we get there from here?

  23. Automation with Configuration • Packaged functionality run with Oracle Jobs • Table-driven configuration • Which schemas are included • Snapshot and Warehouse Tables build parameters • with defaults and individual table overrides • Primary Keys and Indexes • Includes infrastructure features • Logging • Full error trapping • Continued operation after many failures • Statistical performance information • Positive response email results Keep the DBAs Happy . . . What goes around . . .

  24. Warehouse Build Steps • Drop sub-warehouse objects • Cycle through snapshots and build / modifysub-warehouse tables • Bulk insert each snapshot table into sub-warehouse • Create primary keys and indexes • Compute statistics • Switch master warehouse synonyms to point at“new” sub-warehouse (upon successful completion) • Email rebuild results

  25. Drop Sub-Warehouse Objects (Simple) PROCEDURE DROP_WAREHOUSE_TABLES IS cursor warehouse_tables is select table_name from user_tables order by 1; mysql varchar2(255); BEGIN FOR tables_rec IN warehouse_tables LOOP mysql := 'DROP TABLE '|| tables_rec.table_name; EXECUTE IMMEDIATE mysql; END LOOP; END DROP_WAREHOUSE_TABLES;

  26. Drop Warehouse Objects-Add Logging PROCEDURE DROP_WAREHOUSE_TABLES IS cursor warehouse_tables is select table_name from user_tables order by 1; mysql varchar2(255); me ERR_TYPE; BEGIN Me.my_name := 'DROP_WAREHOUSE_TABLES'; log_activity(me,'Drop Warehouse tables started','S'); FOR tables_rec IN warehouse_tables LOOP log_activity(me,'Dropping '|| tables_rec.table_name); mysql := 'DROP TABLE ' || tables_rec.table_name; EXECUTE IMMEDIATE mysql; END LOOP; log_activity(me,'Drop Warehouse tables completed','C'); END DROP_WAREHOUSE_TABLES;

  27. Drop Warehouse Objects-Add Error Traps PROCEDURE DROP_WAREHOUSE_TABLES IS cursor warehouse_tables is select table_name from user_tables order by 1; mysql varchar2(255); me ERR_TYPE; BEGIN Me.my_name := 'DROP_WAREHOUSE_TABLES'; log_activity(me,'Drop Warehouse tables started','S'); FOR tables_rec IN warehouse_tables LOOP log_activity(me,'Dropping '|| tables_rec.table_name); BEGIN mysql := 'DROP TABLE ' || tables_rec.table_name; EXECUTE IMMEDIATE mysql; EXCEPTION WHEN OTHERS THEN INTERNAL_PROC_ERROR(me, SQLERRM, SQLCODE, mysql); END; END LOOP; log_activity(me,'Drop Warehouse tables completed','C'); IF me.cnt_errors > 0 THEN raise_application_error(-20000, APP_ERR_MSG(me)); END IF; END DROP_WAREHOUSE_TABLES;

  28. Warehouse Build Steps • Drop sub-warehouse objects • Cycle through snapshots and build / modifysub-warehouse tables • Bulk insert each snapshot table into sub-warehouse • Create primary keys and indexes • Compute statistics • Switch master warehouse synonyms to point at“new” sub-warehouse (upon successful completion) • Email rebuild results

  29. Table Driven Configuration . . . 6 TULSA 7 EVERETT 8 RENTON 9 AUBURN . . . Warehouse_Sites SiteNo Schema_Name • Allows for Adding and Removing Sites as needed PROCEDURE VERIFY_TABLES IS BEGIN Loop Through Each Schema (With Error Trapping and Logging) Verify_All_Schema_Tables(Schema_Name); End Loop; END VERIFY_TABLES;

  30. Build Warehouse Tables • Verify Schema Tables performs the major warehouse structure build-up work • 31 Flavors . . . more similar than different • First Schema builds majority of tables / columns • Follow-on schemas modify structure as needed • Greatest common denominator • Loops through all table columns usingdata dictionary information • Very similar to snapshot verification routines • Column names, data type, length, precision, scale

  31. More Table Driven Configuration Storage_Parms Table_Size Table_Storage Index_Storage Table_Sizes Table_Name Table_Size STANDARD TABLESPACE WH128K_DATA STORAGE(INITIAL 128K NEXT 128K) NOLOGGING TABLESPACE WH128K_INDX STORAGE(INITIAL 128K NEXT 128K) NOLOGGING LARGE TABLESPACE WH4M_DATA STORAGE(INITIAL 4M NEXT 4M) NOLOGGING TABLESPACE WH4M_INDX STORAGE(INITIAL 4M NEXT 4M) NOLOGGING Select ' ' || Table_Storage into BUILD_PARMS from Storage_Parms Where Table_Size = 'LARGE'; My_SQL := Create_Table_Statement || BUILD_PARMS; Execute Immediate My_SQL; Used when Adding a New Table

  32. Warehouse Build Steps • Drop sub-warehouse objects • Cycle through snapshots and build / modifysub-warehouse tables • Bulk insert each snapshot table into sub-warehouse • Create primary keys and indexes • Compute statistics • Switch master warehouse synonyms to point at“new” sub-warehouse (upon successful completion) • Email rebuild results

  33. Insert Mechanics • Data Warehouse schema is the sum of every site • Similar technique to identifying schema • Loop through each schema and table • Dynamically build Insert Statement • Based on site columns • Include unique ID for each schema • Same logging techniques built-in • Includes actual SQL statement • Error trapping and continuation built-in

  34. Minimizing Archive Logs with NoLogging • Archive logging in the production warehouse instance • Some configuration data should be archived • ‘Normal’ Boeing Operation to have Archive Logs in Production • Warehouse data is recoverable from sites • Nologging specified on tablespaces and tables • Oracle still logs changes and inserts • Daily warehouse rebuild generates Significant Archive Log activity • Big Surprise with initial production use Ignorance is Bliss ???

  35. Using Bulk Inserts • Bulk load can be specified on inserts • Direct Load with sql*loader • Append hint with SQL • Only affects tables specified with NoLogging Insert /*+ APPEND */ into warehouse.inventory (siteno, item, description)Select 1,item,description from site1.inventory; • Invalidates session table use till commit ORA-12838:cannot read/modify an object after modifying it in parallel

  36. Warehouse Operations Log Warehouse_Op_Log ID Log_Date Proc Log_Type Log_User Message Error Log_Rows 1617893 08/08/03 02:08:02 INSERT_SITE_TABLE_DATA I CMPWH Inserting From WICHITA.STATION 0 0 1617894 08/08/03 02:08:03 INSERT_SITE_TABLE_DATA I CMPWH Data Inserted from WICHITA.STATION (5031 Rows) 0 5031

  37. Warehouse Build Steps • Drop sub-warehouse objects • Cycle through snapshots and build / modifysub-warehouse tables • Bulk insert each snapshot table into sub-warehouse • Create primary keys and indexes • Compute statistics • Switch master warehouse synonyms to point at“new” sub-warehouse (upon successful completion) • Email rebuild results

  38. Indexing Mechanics • Table-driven primary key and index list • Uses table storage size information for tablespace and other storage clause information • Dynamically build Alter table and Create Index Statements • Same logging techniques built-in • Includes actual SQL statement • Error trapping and continuation built-in

  39. Error Logging - Continuation select id, log_date, proc, log_type, message from warehouse_op_log where id between 1576388 and 1576390 order by id; 1576388 08/01/03 02:12:11 CREATE_PK_CONSTRAINTS I Create Primary Key STATIONHIST.PK_STATIONHIST(CRIBBIN,SITENO) 1576389 08/01/03 02:12:23 CREATE_PK_CONSTRAINTS E ORA-02437: cannot validate (CMPWH1.PK_STATIONHIST) - primary key violated (alter table CMPWH1.STATIONHIST ADD CONSTRAINT PK_STATIONHIST PRIMARY KEY (CRIBBIN,SITENO) USING INDEX TABLESPACE WH128K_INDX STORAGE(INITIAL 128K NEXT 128K) NOLOGGING) 1576390 08/01/03 02:12:23 CREATE_PK_CONSTRAINTS I Create Primary Key TASK.PK_TASK(TASKNO,SITENO)

  40. Error Logging - Propagation 1570901 08/01/03 02:00:02 REBUILD_WAREHOUSE S REBUILDING WAREHOUSE - CMPWH1 1576389 08/01/03 02:12:23 CREATE_PK_CONSTRAINTSE ORA-02437: cannot validate (CMPWH1.PK_STATIONHIST) - primary key violated (alter table CMPWH1.STATIONHIST ADD CONSTRAINT PK_STATIONHIST PRIMARY KEY (CRIBBIN,SITENO) USING INDEX TABLESPACE WH128K_INDX STORAGE(INITIAL 128K NEXT 128K) NOLOGGING) 1576418 08/01/03 02:12:49 CREATE_INDEXESE ORA-20000:1 Error Occurred in CREATE_PK_CONSTRAINTS (see log for details). Last was: ORA-02437: cannot validate (CMPWH1.PK_STATIONHIST) - primary key violated 1576471 08/01/03 02:20:31 REBUILD_WAREHOUSEE ORA-20000: 1 Error Occurred in CREATE_INDEXES (see log for details). Last was: ORA-20000: 1 Error Occurred in CREATE_PK_CONSTRAINTS (see log for details). Last was: ORA-02437: cannot validate (CMPWH1.PK_STATIONHIST) - primary key violated 1576728 08/01/03 02:41:58 REBUILD_WAREHOUSE C REBUILT WAREHOUSE - CMPWH1 (1 error encountered)

  41. Warehouse Build Steps • Drop sub-warehouse objects • Cycle through snapshots and build / modifysub-warehouse tables • Bulk insert each snapshot table into sub-warehouse • Create primary keys and indexes • Compute statistics • Switch master warehouse synonyms to point at“new” sub-warehouse (upon successful completion) • Email rebuild results

  42. Warehouse Build Steps • Drop sub-warehouse objects • Cycle through snapshots and build / modifysub-warehouse tables • Bulk insert each snapshot table into sub-warehouse • Create primary keys and indexes • Compute statistics • Switch master warehouse synonyms to point at“new” sub-warehouse (upon successful completion) • Email rebuild results

  43. Positive Email Response • Email sent upon success and failure • Catastrophic failure would result in no email • Subject clearly states warehouse status • Table-driven recipient list • Uses dbms_smtp package • Build higher-level packaged functionality • Used for multiple purposes (such as interface failures) • Email includes statistics regarding • Last snapshot refresh times (system tables) • Roll-up timing statistics (warehouse operation log) • Warehouse table row counts (warehouse operation log)

  44. Status Email Subject and Body WARNING: Snapshots NOT Healthy, but Warehouse Redirected - CMTWH1 SUCCESS: Production Warehouse Redirected - CMPWH2 The CribMaster Oracle Data Warehouse was successfully rebuild and activated Last Refresh of Participating Sites: MESA 07/31/03 01:48:17 KELLY 07/30/03 23:34:45 CECIL 07/30/03 23:14:42 . . . Attachment shows statistics

  45. Email Statistics Attachment • Timing statistics for all major warehouse build steps 8/11/2003 2:00:06 REBUILDING WAREHOUSE - CMPWH2 8/11/2003 2:00:16 Dropping all tables in Warehouse started 8/11/2003 2:00:31 Verify Tables for all Schemas started 8/11/2003 2:02:11 Insert Table Data for all Schemas started 8/11/2003 2:08:02 Create all Primary key constraints started 8/11/2003 2:12:35 Create all indexes started 8/11/2003 2:20:12 Counting Rows in CMPWH2 started 8/11/2003 2:20:13 Analyze all tables in Warehouse started 8/11/2003 2:41:52 Starting with REVOKE-CMPWH2 8/11/2003 2:42:01 Completed REVOKE-CMPWH2

  46. Email Statistics Attachment • Row count statistics for all warehouse tables TOTAL TABLE ROWS: 15288750 Row Counts for Significant Tables (rows over 100,000) 8/11/2003 SERIALSTATUSHISTORY 7235395 8/11/2003 TRANS 2945618 8/11/2003 STATIONHISTORY 669932 8/11/2003 STATION 658313 8/11/2003 ALTVENDOR 549365 8/11/2003 ITEMSERIAL 546443 8/11/2003 EMPLOYEE 486319 8/11/2003 INVENTRY 359461 . . .

  47. Failure Email ERROR: Production Warehouse Rebuild Failed - CMPWH2(All Snapshots ARE Healthy) There was a problem rebuilding the warehouse. Cutover did NOT occur.Please review the warehouse_op_log for problems, correct them, and establish the correct new warehouse.Statement Executed was (BEGIN CMPWH2.WH_UTIL_MASTER.REBUILD_WAREHOUSE('CMPWH2'); END;)Error was (ORA-20000: 2 Error(s) Occurred in REBUILD_WAREHOUSE (see log for details).Last was: ORA-20000: 6 Error(s) Occurred in INSERT_ALL_DATA (see log for details).Last was: ORA-20000: 105 Error(s) Occurred in INSERT_SCHEMA_DATA (see log for details).Last was: ORA-20000: 1 Error(s) Occurred in INSERT_SITE_TABLE_DATA (see log for details).Last was: ORA-00942: table or view does not exist)

  48. What to Take Away • Work with COTS Supplier, Avoid Customizations • Data needs in a warehouse • Don’t Over-Analyze, disk is cheap (now) • Plan in failure contingency in Build / Rebuild process • Table-drive the implementation • Decisions and Structures change with time • Provide Visibility of the Activity • Ongoing logging and positive results notification • Bulk Insert • Communications key • DBA, Developers, COTS Supplier, and Users

  49. Forever New Frontiers Robert Corfman TSMS System Architect Robert.a.corfman@Boeing.com Please turn in your evaluation form

More Related