1 / 35

Chapter 17 Data Warehousing, Archival, and Repositories

Chapter 17 Data Warehousing, Archival, and Repositories. 데이터베이스연구실 석사 3 학기 박보영. Introduction. Common applications of XML in the enterprise Large-scale transaction system Content management system Persistence of information using XML

Download Presentation

Chapter 17 Data Warehousing, Archival, and Repositories

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Chapter 17Data Warehousing, Archival, and Repositories 데이터베이스연구실 석사3학기 박보영

  2. Introduction • Common applications of XML in the enterprise • Large-scale transaction system • Content management system • Persistence of information using XML • Traditional strategies for these processes • How adding XML to the mix XML improve

  3. In this chapter • Data warehousing • How data warehousing work • How XML may be used to streamline the transaction and warehousing process • Data archival For most system • Data loses its relevance over time • Moved off the system when some threshold is reached • Typically want to set this data aside in some form so that, should we need to refer to it later • Can retrieve the data with the minimum of effort • How XML may be used to make this process easier • Data repositories • How XML may be used to persist detail-only information in a way that makes the relational database perform better, while taking advantage of XML technologies(such as XSLT) to present the detail information when necessary “All the examples in this chapter are designed to work with SQL Server 6.x+, ADO 2.5+, VBScript(IE 4.0+)”

  4. Data Warehousing • One of the problems often faced by data architects is to design a system • Detail information or transactionsquick retrieval • Query and summarize that dataeasily and efficiently In this section • Define data warehousing • Discuss the concepts that drive data warehouse design • Take a look XML can facilitate the data warehousing process • Look at examples of XML in use in a data warehouse

  5. The Two Roles of Data In an enterprise-level data solution, the data in the database plays two roles: • Information gathering • Querying and summarization of data • Detailed information gathering • The first use of the database is to gather data from external sources(such as other databases, XML, or simple delimited text files) • There are some things that remain consistent across all implementations that gather data: • Detail oriented • Write-heavy • Transacted • Space-conscious • Heavily normalized

  6. Infromation Querying and Summarization • The other use of database is to provide the ability to query the data and summarize it to extrapolate trends, volumes, and other useful information from the details. • The specific mechanisms will vary from implementation to implementation, but there are some constants: • Summary-oriented • Read-only • Results-conscious • Less normalized

  7. The Traditional Solution (ch17_ex01.sql) • The traditional approach to designing a relational database to support a platform is to design one database to perform both data acquisition function and querying/summarization functions CREATE TABLE Customer ( CustomerKey integer PRIMARY KEY, Name varchar(50), Address varchar(50), City varchar(30), State char(2), PostalCode varchar(10)) CREATE TABLE shipMethod ( shipMethodKey integer PRIMARY KEY, shipMethod varchar(5)) INSERT shipMethod (shipMethodKey, shipMethod) VALUES (1, 'FedEx') INSERT shipMethod (shipMethodKey, shipMethod) VALUES (2, 'USPS') INSERT shipMethod (shipMethodKey, shipMethod) VALUES (3, 'UPS') CREATE TABLE Invoice ( InvoiceKey integer PRIMARY KEY, invoiceDate datetime, shipDate datetime, shipMethodKey integer CONSTRAINT FK_Invoice_shipMethodKey FOREIGN KEY (shipMethodKey) REFERENCES shipMethod (shipMethodKey), CustomerKey integer CONSTRAINT FK_Invoice_Customer FOREIGN KEY (CustomerKey) REFERENCES Customer (CustomerKey)) CREATE INDEX ix_Invoice_invoiceDate ON Invoice (invoiceDate) CREATE INDEX ix_Invoice_shipDate ON Invoice (shipDate) CREATE INDEX ix_Invoice_CustomerKey ON Invoice (CustomerKey) CREATE TABLE Part ( PartKey integer PRIMARY KEY, name varchar(20), size varchar(10) NULL, color varchar(10) NULL) CREATE TABLE LineItem ( LineItemKey integer PRIMARY KEY, InvoiceKey integer CONSTRAINT FK_LineItem_Invoice FOREIGN KEY (InvoiceKey) REFERENCES Invoice (InvoiceKey), PartKey integer CONSTRAINT FK_LineItem_Part FOREIGN KEY (PartKey) REFERENCES Part (PartKey), Quantity integer, Price float) CREATE INDEX ix_LineItem_PartKey ON LineItem (PartKey) CREATE INDEX ix_LineItem_InvoiceKey ON LineItem (InvoiceKey)

  8. Customer Part LineItem Invoice ShipMethod - When run, this script creates the following table structure: • This set of tables supports the addition of invoice • Space is conserved for these tables where possible by normalizing out Part and Customer • This normalization also facilitates querying and reporting on Part and Customer • however, there are a couple of issues with using this type of dual-purpose structure • the system is well tuned to either data entry or reporting • serious danger of wait states for reporting and querying due to the locking being performed by the data entry

  9. Many databases try to get around the problem by adding structures that will be used to drive reporting • In this table • separate from the transaction tables • have all the appropriate indices to allow data to be queried and summarized from those tables in any way the business rules might call for • for example, add a MonthlyPartTotal table (ch17_ex02.sql): • this is a step on the right direction • another problems CREATE TABLE MonthlyPartTotal ( summaryMonth tinyint, summaryYear smallint, PartKey integer CONSTRAINT FK_MPT_PartKey FOREIGN KEY (PartKey) REFERENCES Part (PartKey), Quantity integer)

  10. CREATE TRIGGER UpdateMPT ON LineItem FOR INSERT, UPDATE AS BEGIN IF (SELECT COUNT(*) FROM MonthlyPartTotal, inserted, Invoice WHERE summaryMonth = DATEPART(mm, Invoice.invoiceDate) AND summaryYear = DATEPART(yyyy, Invoice.invoiceDate) AND inserted.InvoiceKey = Invoice.InvoiceKey AND MonthlyPartTotal.PartKey = inserted.PartKey) > 0 UPDATE MonthlyPartTotal SET Quantity = MonthlyPartTotal.Quantity + inserted.Quantity FROM inserted, Invoice WHERE summaryMonth = DATEPART(mm, Invoice.invoiceDate) AND summaryYear = DATEPART(yyyy, Invoice.invoiceDate) AND inserted.InvoiceKey = Invoice.InvoiceKey AND MonthlyPartTotal.PartKey = inserted.PartKey ELSE INSERT MonthlyPartTotal (summaryMonth, summaryYear, PartKey, Quantity) SELECT DATEPART(mm, Invoice.invoiceDate), DATEPART(yyyy, Invoice.invoiceDate), inserted.PartKey, inserted.Quantity FROM inserted, Invoice WHERE inserted.InvoiceKey = Invoice.InvoiceKey IF (SELECT COUNT(*) FROM deleted) > 0 UPDATE MonthlyPartTotal SET Quantity = MonthlyPartTotal.Quantity - deleted.Quantity FROM deleted, Invoice WHERE summaryMonth = DATEPART(mm, Invoice.invoiceDate) AND summaryYear = DATEPART(yyyy, Invoice.invoiceDate) AND deleted.InvoiceKey = Invoice.InvoiceKey AND MonthlyPartTotal.PartKey = deleted.PartKey END (Ch17_03.sql)

  11. The Data Warehousing Solution • On-Line Transaction Processing(OLTP) databases – the information gatherers • On Line Analytical Processing(OLAP) databases – the query and summarization handlers • Parts that make up an OLAP database • The role of XML in improving the function of OLAP databases

  12. On-Line Transaction Processing(OLTP) • The gathering of detail information is often referred to as OLTP • Handle all data gathering processes • Design tables to support the acquisition of transactional data • The database is an normalized ad possible, with the spedific table or tables being kept as small as possible to reduce insert time and disk consumption • Data archival strategies - In place to make OLTP database does not grow too large

  13. Customer Part LineItem Invoice ShipMethod CONSTRAINT FK_Invoice_shipMethodKey FOREIGN KEY (shipMethodKey) REFERENCES shipMethod (shipMethodKey), CustomerKey integer CONSTRAINT FK_Invoice_Customer FOREIGN KEY (CustomerKey) REFERENCES Customer (CustomerKey)) CREATE TABLE Part ( PartKey integer PRIMARY KEY, name varchar(20), size varchar(10) NULL, color varchar(10) NULL) CREATE TABLE LineItem ( LineItemKey integer PRIMARY KEY, InvoiceKey integer CONSTRAINT FK_LineItem_Invoice FOREIGN KEY (InvoiceKey) REFERENCES Invoice (InvoiceKey), PartKey integer CONSTRAINT FK_LineItem_Part FOREIGN KEY (PartKey) REFERENCES Part (PartKey), Quantity integer, Price float) CREATE TABLE Customer ( CustomerKey integer PRIMARY KEY, Name varchar(50), Address varchar(50), City varchar(30), State char(2), PostalCode varchar(10)) CREATE TABLE shipMethod ( shipMethodKey integer PRIMARY KEY, shipMethod varchar(5)) INSERT shipMethod (shipMethodKey, shipMethod) VALUES (1, 'FedEx') INSERT shipMethod (shipMethodKey, shipMethod) VALUES (2, 'USPS') INSERT shipMethod (shipMethodKey, shipMethod) VALUES (3, 'UPS') CREATE TABLE Invoice ( InvoiceKey integer PRIMARY KEY, invoiceDate datetime, shipDate datetime, shipMethodKey integer • OLTP database structure (ch17_ex04.sql):

  14. On-Line Analytical processing(OLAP) • Database intended to support querying and summarization • Designed with one goal in mind – the querying and summarization of detail data by any number of specifics defined in the business rules for our system • Create our tables to support the querying and retrieval of this information, with the ability to rapidly insert information(lower priority) • Indexing technologies designed specifically for OLAP querying  optimize(query and report)

  15. Parts of an OLAP Database • Fact Tables-where the information we wish to report on is stored • Measure Tables-where you store the measures used to do the reporting • Schema-where the two types of table above interact to give you your reports • Fact Tables • Contain the data that we are planning to report on, at the lowest level of granularity we will need to access • Denormalizationall facts together into oe table Fact table creation script for the inventory control team (ch17-ex05.sql) Fact table creation script for the executive team (ch17-ex06.sql) CREATE TABLE factInvoicePart ( InvoiceKey integer PRIMARY KEY IDENTITY, CustomerKey integer, ShipDateKey integer, ShipMethodKey integer, PartKey integer, Quantity integer, Price float) CREATE TABLE factDailyTotal ( DailyTotalKey integer PRIMARY KEY IDENTITY, InvoiceDate integer, partKey integer, partCount integer, partUnitPrice float)

  16. Measure/Dimension Tables • Parameters(use in the WHERE clause in query) (Ch17_ex07.sql) CREATE TABLE measureCustomer ( CustomerKey integer PRIMARY KEY, Name varchar(50), Address varchar(50), City varchar(30), State char(2), PostalCode varchar(10)) CREATE TABLE measureShipDate ( ShipDateKey integer PRIMARY KEY, Month tinyint, Day tinyint, Year smallint) CREATE TABLE measureShipMethod ( shipMethodKey integer PRIMARY KEY, shipMethod varchar(5)) CREATE TABLE measurePart ( PartKey integer PRIMARY KEY, name varchar(20), size varchar(10) NULL, color varchar(10) NULL)

  17. Schema • Composed of the tables in our database • Joined together by the foreign keys that relate the individual tables together • Two types of schema that are normally used when designing an OLAP database: • Star schema • Snowflake schema

  18. measureCustomer factInvoicePart measureShipDate measureShipMethod This script creates the structure seen below: measurePart Star schema(ch17_ex08.sql) CREATE TABLE measureCustomer ( CustomerKey integer PRIMARY KEY, Name varchar(50), Address varchar(50), City varchar(30), State char(2), PostalCode varchar(10)) CREATE TABLE measureShipDate ( ShipDateKey integer PRIMARY KEY, ShipMonth tinyint, ShipDay tinyint, ShipYear smallint) CREATE TABLE measureShipMethod ( shipMethodKey integer PRIMARY KEY, shipMethod varchar(5)) CREATE TABLE measurePart ( PartKey integer PRIMARY KEY, name varchar(20), size varchar(10) NULL, color varchar(10) NULL) CREATE TABLE factInvoicePart ( InvoiceKey integer PRIMARY KEY IDENTITY, CustomerKey integer CONSTRAINT fk_fact_Customer FOREIGN KEY (CustomerKey) REFERENCES measureCustomer (CustomerKey), ShipDateKey integer CONSTRAINT fk_fact_ShipDate FOREIGN KEY (ShipDateKey) REFERENCES measureShipDate (ShipDateKey), ShipMethodKey integer, CONSTRAINT fk_fact_ShipMethod FOREIGN KEY (ShipMethodKey) REFERENCES measureShipMethod (shipMethodKey), PartKey integer, CONSTRAINT fk_fact_Part FOREIGN KEY (PartKey) REFERENCES measurePart (PartKey), Quantity integer, Price float) CREATE INDEX ix_fact_Customer ON factInvoicePart (CustomerKey) CREATE INDEX ix_fact_ShipDate ON factInvoicePart (ShipDateKey) CREATE INDEX ix_fact_ShipMethod ON factInvoicePart (shipMethodKey) CREATE INDEX ix_fact_Part ON factInvoicePart (PartKey)

  19. CREATE TABLE factInvoicePart ( InvoiceKey integer PRIMARY KEY IDENTITY, CustomerKey integer CONSTRAINT fk_fact_Customer FOREIGN KEY (CustomerKey) REFERENCES measureCustomer (CustomerKey), ShipDateKey integer CONSTRAINT fk_fact_ShipDate FOREIGN KEY (ShipDateKey) REFERENCES measureShipDate (ShipDateKey), ShipMethodKey integer, CONSTRAINT fk_fact_ShipMethod FOREIGN KEY (ShipMethodKey) REFERENCES measureShipMethod (shipMethodKey), PartKey integer, CONSTRAINT fk_fact_Part FOREIGN KEY (PartKey) REFERENCES measurePart (PartKey), Quantity integer, Price float) CREATE INDEX ix_fact_Customer ON factInvoicePart (CustomerKey) CREATE INDEX ix_fact_ShipDate ON factInvoicePart (ShipDateKey) CREATE INDEX ix_fact_ShipMethod ON factInvoicePart (shipMethodKey) CREATE INDEX ix_fact_Part ON factInvoicePart (PartKey) CREATE INDEX ix_measure_Month ON measureShipDate (shipMonthKey) CREATE INDEX ix_measure_Year ON measureShipMonth (shipYearKey) Snowflake schema(ch17_ex09.sql) CREATE TABLE measureCustomer ( CustomerKey integer PRIMARY KEY, Name varchar(50), Address varchar(50), City varchar(30), State char(2), PostalCode varchar(10)) CREATE TABLE measureShipYear ( ShipYearKey integer PRIMARY KEY, ShipYear smallint) CREATE TABLE measureShipMonth ( ShipMonthKey integer PRIMARY KEY, ShipMonth tinyint, ShipYearKey integer CONSTRAINT fk_measure_ShipYear FOREIGN KEY (ShipYearKey) REFERENCES measureShipYear (ShipYearKey)) CREATE TABLE measureShipDate ( ShipDateKey integer PRIMARY KEY, ShipDay tinyint, ShipMonthKey integer CONSTRAINT fk_measure_ShipMonth FOREIGN KEY (ShipMonthKey) REFERENCES measureShipMonth (ShipMonthKey)) CREATE TABLE measureShipMethod ( shipMethodKey integer PRIMARY KEY, shipMethod varchar(5)) CREATE TABLE measurePart ( PartKey integer PRIMARY KEY, name varchar(20), size varchar(10) NULL, color varchar(10) NULL)

  20. measureShipMethod measurePart This SQL script creates the following set of tables: measureCustomer factInvoicePart measureShipDate measureShipMonth measureShipYear

  21. measure Customer factInvoicePart measureShipDate measureShipMethod measurePart Cubes • To better facilitate the query and retrieval of OLAP data – each star schema or snowflake schema creates a cube(the detail information is aggregated along each of the measures) • Example-star schema factInvoicePart table-100,000 invoice(shipping method) Add another dimension-customer

  22. Building your Fact Tables Two step: • Pulling the data out of the OLTP database • updating the OLAP database with that data • In order to perform rapidly update • Native export forms are often used (such as BCP files for SQL Server) • Loaded into the OLAP database • Some problems • The most notable of which is the relative fragility of the export and import code • If the OLTP or OLAP database change, this code will need to be revisited

  23. The Role of XML • Make it easier to update the OLAP database from the OLTP database • Effective way to handle OLTP data • Using XML for OLAP Update Data • Using XML for OLTP Data • Using XML for OLAP Update Data • rather than building specialized import routines for all the different formats, if you build one XML importer and than building leverage the data provider’s ability to export in XML • Can save a lot of aggravation and coding time

  24. (Ch17_ex09.xml) (Ch17_ex09.dtd) <!ELEMENT InvoiceBulk (Invoice+)> <!ELEMENT Invoice (LineItem+)> <!ATTLIST Invoice CustomerKey CDATA #REQUIRED ShipDate CDATA #REQUIRED ShipMethodKey CDATA #REQUIRED> <!ELEMENT Part EMPTY> <!ATTLIST Part PartKey CDATA #REQUIRED Quantity CDATA #REQUIRED Price CDATA #REQUIRED> <?xml version="1.0"?> <!DOCTYPE listing SYSTEM "ch27_ex09.dtd" > <InvoiceBulk> <Invoice CustomerKey="17" ShipDate="10/17/2000" ShipMethodKey="1"> <Part PartKey="33" Quantity="17" Price="0.20" /> <Part PartKey="29" Quantity="13" Price="0.15" /> </Invoice> <Invoice CustomerKey="12" ShipDate="10/11/2000" ShipMethodKey="2"> <Part PartKey="31" Quantity="19" Price="0.10" /> <Part PartKey="29" Quantity="17" Price="0.15" /> </Invoice> </InvoiceBulk>

  25. Function using XSLT(ch17_ex10.xsl): <?xml version="1.0"?> <xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0"> <xsl:output method="text" /> <xsl:template match="/"> <xsl:for-each select="InvoiceBulk/Invoice/Part"> <xsl:text>INSERT factInvoicePart ( CustomerKey, ShipDateKey, ShipMethodKey, PartKey, Quantity, Price) SELECT </xsl:text> <xsl:value-of select="../@CustomerKey" /><xsl:text>, </xsl:text> <xsl:text>measureShipDate.ShipDateKey, </xsl:text> <xsl:value-of select="../@ShipMethodKey" /><xsl:text>, </xsl:text> <xsl:value-of select="@PartKey" /><xsl:text>, </xsl:text> <xsl:value-of select="@Quantity" /><xsl:text>, </xsl:text> <xsl:value-of select="@Price" /> <xsl:text> FROM measureShipDate</xsl:text> <xsl:text> WHERE DATEPART(mm, '</xsl:text> <xsl:value-of select="../@ShipDate" /> <xsl:text>')=measureShipDate.shipMonth AND DATEPART(dd, '</xsl:text> <xsl:value-of select="../@ShipDate" /> <xsl:text>')=measureShipDate.shipDay AND DATEPART(yyyy, '</xsl:text> <xsl:value-of select="../@ShipDate" /> <xsl:text>')=measureShipDate.shipYear</xsl:text> <xsl:text>&#x0d;&#x0a;GO&#x0d;&#x0a;&#x0d;&#x0a;</xsl:text> </xsl:for-each> </xsl:template> </xsl:stylesheet>

  26. ch17_ex10.xsl -> Ch17_ex09.xml => ch17_10a.sql *참고< http://www.jclark.com/xml/xt.html > INSERT factInvoicePart ( CustomerKey, ShipDateKey, ShipMethodKey, PartKey, Quantity, Price) SELECT 17, measureShipDate.ShipDateKey, 1, 33, 17, 0.20 FROM measureShipDate WHERE DATEPART(mm, '10/17/2000')=measureShipDate.shipMonth AND DATEPART(dd, '10/17/2000')=measureShipDate.shipDay AND DATEPART(yyyy, '10/17/2000')=measureShipDate.shipYear GO INSERT factInvoicePart ( CustomerKey, ShipDateKey, ShipMethodKey, PartKey, Quantity, Price) SELECT 17, measureShipDate.ShipDateKey, 1, 29, 13, 0.15 FROM measureShipDate WHERE DATEPART(mm, '10/17/2000')=measureShipDate.shipMonth AND DATEPART(dd, '10/17/2000')=measureShipDate.shipDay AND DATEPART(yyyy, '10/17/2000')=measureShipDate.shipYear GO INSERT factInvoicePart ( CustomerKey, ShipDateKey, ShipMethodKey, PartKey, Quantity, Price) SELECT 12, measureShipDate.ShipDateKey, 2, 31, 19, 0.10 FROM measureShipDate WHERE DATEPART(mm, '10/11/2000')=measureShipDate.shipMonth AND DATEPART(dd, '10/11/2000')=measureShipDate.shipDay AND DATEPART(yyyy, '10/11/2000')=measureShipDate.shipYear GO INSERT factInvoicePart ( CustomerKey, ShipDateKey, ShipMethodKey, PartKey, Quantity, Price) SELECT 12, measureShipDate.ShipDateKey, 2, 29, 17, 0.15 FROM measureShipDate WHERE DATEPART(mm, '10/11/2000')=measureShipDate.shipMonth AND DATEPART(dd, '10/11/2000')=measureShipDate.shipDay AND DATEPART(yyyy, '10/11/2000')=measureShipDate.shipYear GO

  27. (Ch17_ex10b.sql): INSERT measureCustomer ( CustomerKey, [Name], Address, City, State, PostalCode) VALUES ( 12, 'Homer J. Simpson', '742 Evergreen Terrace', 'Springfield', 'KY', '12345') GO INSERT measureCustomer ( CustomerKey, [Name], Address, City, State, PostalCode) VALUES ( 17, 'Kevin B. Williams', '744 Evergreen Terrace', 'Springfield', 'KY', '12345') GO INSERT measureShipMethod ( shipMethodKey, shipMethod) VALUES ( 1, 'Fedex') GO INSERT measureShipMethod ( shipMethodKey, shipMethod) VALUES ( 2, 'USPS') GO INSERT measureShipMethod ( shipMethodKey, shipMethod) VALUES ( 3, 'UPS') GO INSERT measurePart ( PartKey, [name], [size], color) VALUES ( 31, 'grommets', '3 in.', 'blue') GO INSERT measurePart ( PartKey, [name], [size], color) VALUES ( 29, 'sprockets', '2 in.', 'silver') GO INSERT measurePart ( PartKey, [name], [size], color) VALUES ( 33, 'brackets', '1 in.', 'red') GO INSERT measureShipDate ( ShipDateKey, ShipMonth, ShipDay, ShipYear) VALUES ( 1, 10, 11, 2000 ) GO INSERT measureShipDate ( ShipDateKey, ShipMonth, ShipDay, ShipYear) VALUES ( 2, 10, 17, 2000 ) GO

  28. measure Customer factInvoicePart measureShipDate measureShipMethod measurePart Result table structure • Run ch17_ex08.sql against a sample database to create the tables in the firdt place • Transformed ch17_ex10.xml with ch17_ex10.xsl to produce the output script ch17_ex10a.sql • Run ch17_ex10b.sql to populate tables with initial data • Run ch17_ex10a.sql to add data to populate table factInvoicePart with data

  29. Using XML for OLAP Data • OLTP systems typically access one discrete transaction at a time • Locking can be avoid • XML technology can be easily leveraged

  30. (ch17_ex12.dtd): <!ELEMENT Invoice (LineItem+)> <!ATTLIST Invoice CustomerKey CDATA #REQUIRED ShipDate CDATA #REQUIRED ShipMethodKey CDATA #REQUIRED> <!ELEMENT Part EMPTY> <!ATTLIST Part PartKey CDATA #REQUIRED Quantity CDATA #REQUIRED Price CDATA #REQUIRED> RDBMS structure (ch17_ex11.sql): CREATE TABLE Customer ( CustomerKey integer PRIMARY KEY, Name varchar(50), Address varchar(50), City varchar(30), State char(2), PostalCode varchar(10)) CREATE TABLE shipMethod ( shipMethodKey integer PRIMARY KEY, shipMethod varchar(5)) INSERT shipMethod (shipMethodKey, shipMethod) VALUES (1, 'FedEx') INSERT shipMethod (shipMethodKey, shipMethod) VALUES (2, 'USPS') INSERT shipMethod (shipMethodKey, shipMethod) VALUES (3, 'UPS') CREATE TABLE Part ( PartKey integer PRIMARY KEY, name varchar(20), size varchar(10) NULL, color varchar(10) NULL) (ch17_ex12.xml): <?xml version="1.0"?> <!DOCTYPE listing SYSTEM "ch27_ex12.dtd" > <InvoiceBulk> <Invoice CustomerKey="17" ShipDate="10/17/2000" ShipMethodKey="1"> <Part PartKey="33" Quantity="17" Price="0.20" /> <Part PartKey="29" Quantity="13" Price="0.15" /> </Invoice> </InvoiceBulk>

  31. CONSTRAINT FK_Invoice_shipMethodKey FOREIGN KEY (shipMethodKey) REFERENCES shipMethod (shipMethodKey), CustomerKey integer CONSTRAINT FK_Invoice_Customer FOREIGN KEY (CustomerKey) REFERENCES Customer (CustomerKey)) CREATE TABLE Part ( PartKey integer PRIMARY KEY, name varchar(20), size varchar(10) NULL, color varchar(10) NULL) CREATE TABLE LineItem ( LineItemKey integer PRIMARY KEY, InvoiceKey integer CONSTRAINT FK_LineItem_Invoice FOREIGN KEY (InvoiceKey) REFERENCES Invoice (InvoiceKey), PartKey integer CONSTRAINT FK_LineItem_Part FOREIGN KEY (PartKey) REFERENCES Part (PartKey), Quantity integer, Price float) CREATE TABLE Customer ( CustomerKey integer PRIMARY KEY, Name varchar(50), Address varchar(50), City varchar(30), State char(2), PostalCode varchar(10)) CREATE TABLE shipMethod ( shipMethodKey integer PRIMARY KEY, shipMethod varchar(5)) INSERT shipMethod (shipMethodKey, shipMethod) VALUES (1, 'FedEx') INSERT shipMethod (shipMethodKey, shipMethod) VALUES (2, 'USPS') INSERT shipMethod (shipMethodKey, shipMethod) VALUES (3, 'UPS') CREATE TABLE Invoice ( InvoiceKey integer PRIMARY KEY, invoiceDate datetime, shipDate datetime, shipMethodKey integer Data Archival • Classical Approches • What about the LineItem data? • What about the other tables in the database? • What about human readability? • OLTP database structure (ch17_ex04.sql):

  32. Using XML for Data Archival (Ch17_ex13.dtd) (Ch17_ex13.xml) <!ELEMENT Invoice (Customer, LineItem+)> <!ATTLIST Invoice invoiceDate CDATA #REQUIRED shipDate CDATA #REQUIRED shipMethod (USPS | UPS | FedEx) #REQUIRED> <!ELEMENT Customer EMPTY> <!ATTLIST Customer Name CDATA #REQUIRED Address CDATA #REQUIRED City CDATA #REQUIRED State CDATA #REQUIRED PostalCode CDATA #REQUIRED> <!ELEMENT LineItem (Part)> <!ATTLIST LineItem Quantity CDATA #REQUIRED Price CDATA #REQUIRED> <!ELEMENT Part EMPTY> <!ATTLIST Part Name CDATA #REQUIRED Size CDATA #REQUIRED Color CDATA #REQUIRED> <?xml version="1.0"?> <!DOCTYPE listing SYSTEM "ch27_ex13.dtd" > <Invoice invoiceDate="10/17/2000" shipDate="10/20/2000" shipMethod="USPS"> <Customer Name="Homer J. Simpson" Address="742 Evergreen Terrace" City="Springfield" State="KY" postalCode="12345" /> <LineItem Quantity="12" Price="0.10"> <Part Color="Blue" Size="3-inch" Name="Grommets" /> </LineItem> <LineItem Quantity="12" Price="0.10"> <Part Color="Blue" Size="3-inch" Name="Grommets" /> </LineItem> </Invoice>

  33. Data Repositories • Classical Approaches (Ch17_ex14a.sql) (Ch17_ex14b.sql) CREATE TABLE Property ( PropertyKey integer PRIMARY KEY IDENTITY, NumberOfBedrooms tinyint, HasSwimmingPool bit, Address varchar(50), City varchar(30), State char(2), PostalCode varchar(10), SellerName varchar(50), SellerAgent varchar(50)) CREATE TABLE Property ( PropertyKey integer PRIMARY KEY IDENTITY, NumberOfBedrooms tinyint, HasSwimmingPool bit) CREATE TABLE PropertyDetail ( PropertyKey integer PRIMARY KEY, Address varchar(50), City varchar(30), State char(2), PostalCode varchar(10), SellerName varchar(50), SellerAgent varchar(50))

  34. Using XML for Data Repositories (Ch17_ex15.dtd) (Ch17_ex15.sql) <!ELEMENT Property EMPTY> <!ATTLIST Property NumberOfBedrooms CDATA #REQUIRED HasSwimmingPool CDATA #REQUIRED Address CDATA #REQUIRED City CDATA #REQUIRED State CDATA #REQUIRED PostalCode CDATA #REQUIRED SellerName CDATA #REQUIRED SellerAgent CDATA #REQUIRED> CREATE TABLE Property ( PropertyKey integer PRIMARY KEY IDENTITY, NumberOfBedrooms tinyint, HasSwimmingPool tinyint, DocumentFile varchar(50)) There are a number of advantages to using XML for data repositories: • Greater flexibility in providers • Faster querying and summarization • More presentation options • Fewer locking concerns

  35. Summary • How XML may be used to help create a data warehouse • The benefits you can realize by using XML as your archival strategy • How XML can improve the functionality of your data repository

More Related