1 / 58

XML Salman Azhar

XML Salman Azhar. Semi-structured Data XML (Extensible Markup Language) Well-formed and Valid XML Document Type Definitions IDs and IDREFs.

Download Presentation

XML Salman Azhar

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. XMLSalman Azhar Semi-structured Data XML (Extensible Markup Language) Well-formed and Valid XML Document Type Definitions IDs and IDREFs These slides use some figures, definitions, and explanations from Elmasri-Navathe’s Fundamentals of Database Systemsand Molina-Ullman-Widom’s Database Systems Salman Azhar: Database Systems

  2. Framework • Information Integration : • Making databases from various places work as one. • Semi-structured Data : • A new data model designed to cope with problems of information integration. • XML : • A standard language for describing semi-structured data schemas and representing data. Salman Azhar: Database Systems

  3. 1. Information Integration • Generally databases in an enterprises have: • Several underlying database management systems • Oracle, MS SQL Server, DB2, • Informix, Sybase (SQL Server), MS Access, etc. • Several underlying database schemas • Information in an employee table can contain • Employee Name, SSN, DOB, title, hrsPerWeek.modifiedTime, modifiedBy • Employee Name, SSN, DOB, title, degree, createTime, createBy • Employee Name, SSN, DOB, title, salary,modifiedTime, modifiedBy, createTime, createBy Salman Azhar: Database Systems

  4. 2. Semi-structured Data • A new data model designed to cope with problems of information integration • Accommodates of different DBMS • Oracle, MS SQL Server, DB2, • Informix, Sybase (SQL Server), MS Access, etc. • Integrates different schemas • Employee Name, SSN, DOB, title, hrsPerWeek,modifiedTime, modifiedBy • Employee Name, SSN, DOB, title,degree, createTime, createBy • Employee Name, SSN, DOB, title,salary,createTime, createBy, modifiedTime, modifiedBy Salman Azhar: Database Systems

  5. 3. XML • A standard language for describing semi-structured data schemas and representing data. Salman Azhar: Database Systems

  6. The Information-Integration Problem • Major bottleneck in enterprise application integration • For example… • Hewlett Packard split into HP and Agilent • Need to separate data into different destinations • HP bought Compaq • Need to integrate data from different sources Salman Azhar: Database Systems

  7. The Information-Integration Problem • Related data exists in many places and could, in principle, work together. • But different databases differ in: • Model • relational, object-oriented? • Schema • normalized/denormalized? • Terminology • are consultants employees? Retirees? Subcontractors? • Conventions • meters versus feet? Salman Azhar: Database Systems

  8. Example • Consider merger of two stores in a Mall • may be some overlap in the products sold • but the databases are different Salman Azhar: Database Systems

  9. Example • Each company has a database • One may use a relational DBMS • the other keeps the data in an MS-Word document • One stores the phones of distributors, • the other does not • One distinguishes products in one department • the other doesn’t • One counts inventory by number of items, • the other by cases Salman Azhar: Database Systems

  10. Two Approaches to Integration • Warehousing • Makes a copy of the data • More developed of the two • Mediation • Creates a view of the data • Newer and less developed Salman Azhar: Database Systems

  11. Result User query Warehouse Diagram Warehouse Wrapper Wrapper Source 1 Source 2 Salman Azhar: Database Systems

  12. Result User query Query Result Result Query Query Result Query Result A Mediator Mediator Wrapper Wrapper Source 1 Source 2 Salman Azhar: Database Systems

  13. Warehousing • Make copies of the data sources at a central site and transform it to a common schema • Reconstruct data daily/weekly • Do not try to keep it more up-to-date than that. • Pro: • very well-developed • several commercial tools are available • Con: • data can be old since updates are expensive • 24-hour availability threatened by large data updates Salman Azhar: Database Systems

  14. Mediation • Create a view of all sources, as if they were integrated • Answers a view query by translating it to terminology of the sources and querying them • Pro: • Current data • Con: • Can be slow as it requires real time merger of different data sources • Lack of tools available Salman Azhar: Database Systems

  15. Result User query Warehouse Diagram Warehouse Wrapper Wrapper Source 1 Source 2 Salman Azhar: Database Systems

  16. Result User query Query Result Result Query Query Result Query Result A Mediator Mediator Wrapper Wrapper Source 1 Source 2 Salman Azhar: Database Systems

  17. Semi-structured: Motivation • Most effective approach to Information Integration: • Semi-structured Data Model • or Semi-structured Objects Salman Azhar: Database Systems

  18. Semi-structured: Motivation • Main limitation of Object-Oriented Models: • Object Models are Strongly Typed • Objects of a class have one structure only • Semi-structured approach solves this problem Salman Azhar: Database Systems

  19. Semi-structured Data • Purpose: • Represent data from independent sources more flexibly than • either relational • or object-oriented models Salman Azhar: Database Systems

  20. Semi-structured Data • Each object has a class of their own and properties are defined whatever labels are attached to that object • Properties mean • attributes, • relationships, • methods, etc. Salman Azhar: Database Systems

  21. Semi-structured Data • Think of objects • but with the type of each object is the objects its own business • not that of its “class” • Labels to indicate meaning of substructures Salman Azhar: Database Systems

  22. Semi-structured Graphs • Easy to think of Semi-structured data as Graphs • Nodes = objects • Labels on arcs = • attributes leading to a leaf node • relationships leading to another node Salman Azhar: Database Systems

  23. Semi-structured Graphs • Atomic values at leaf nodes • nodes with no arcs out • Flexibility: no restriction on… • labels out of a node • number of successors with a given label Salman Azhar: Database Systems

  24. The soda object for Pepsi (arc-in called soda; arc-out called name to Pepsi) Example: Data Graph Root object represents the entire DB. Often look like trees, but are not. root Notice a new kind of data. soda soda rest The restaurant object for KFC (arc-in called rest; arc-out labeled name to KFC) manf manf prize name PepsiCo name year award sellsAt Pepsi Sobe 2003 BestSeller name addr KFC Main St Salman Azhar: Database Systems

  25. Stage is Now Set for XML • A technology has application to different situations • foundations remain the same • applications changes Salman Azhar: Database Systems

  26. Extensible Markup Language (XML) • XML • uses tags for semantics (e.g., “this is an address”) • HTML • uses tags for formatting (e.g., “italic”), • Key idea: • create tag sets for a domain (e.g., genomics) • translate all data into properly tagged XML docs Salman Azhar: Database Systems

  27. Well-Formed and Valid XML • Well-Formed XML • allows you to invent your own tags • similar to labels in semi-structured data graph • Valid XML • involves a DTD (Document Type Definition) • DTD gives • a grammar for the use of labels • limits the set of labels our of node • the order and number of times a label occurs Salman Azhar: Database Systems

  28. Well-Formed XML • All XML documents have • Header • Body • Header defines • version • specifies that the document is in well-formed XML • Body can include • root tag • several properly matching tags Salman Azhar: Database Systems

  29. Well-Formed XML: Header • Start the document with a declaration • surrounded by <? … ?> . • Normal declaration for Well-Formed XML is: <? XML VERSION = “1.0” STANDALONE = “yes” ?> • Version indicates version number • Standalone = “yes” means no DTD • no DTD means well-formed XML Salman Azhar: Database Systems

  30. Well-Formed XML: Body • Body of document is a root tag surrounding nested tags. • Body can include: • several properly matching tags • (as in html structure) • special tag called root tag • can have a special meaning such as document type • or can be generic Salman Azhar: Database Systems

  31. Tags • Tags, as in HTML • are normally matched pairs, as • <BLAH> … </BLAH> • may be nested arbitrarily • some tags requiring no matching ending • such as <P> in HTML, are also permitted • however, we will not use these in examples Salman Azhar: Database Systems

  32. Example: Well-Formed XML Root tag RESTS surrounds the entire document <? XML VERSION = “1.0” STANDALONE = “yes” ?> <RESTS> <REST> <NAME>Taco Bell</NAME> <SODA><NAME>Pepsi</NAME> <PRICE>1.00</PRICE></ SODA> <SODA><NAME>Sobe</NAME> <PRICE>2.00</PRICE></SODA> </REST > <REST> … </REST > … </RESTS> One of several nested REST tags representing information about a single REST <NAME> tag specifies the REST name <SODA> tags have names and price for each Soda nested in <NAME> and <PRICE> tags Literal Data items are contained at the atomic level Salman Azhar: Database Systems

  33. XML and Semi-structured Data • Consider this… • Is Well-Formed XML documents with nested tags is exactly the same idea as trees of semi-structured data? • Tags • are the labels on edges • Nodes • represent data between matching tags • Parent-child relationship • is immediate nesting in XML Salman Azhar: Database Systems

  34. XML and Semi-structured Data • Semi-structured approach allows for non-tree structures • We shall see that XML also enables non-tree structures • mimics the semi-structured data model Salman Azhar: Database Systems

  35. Group Exercise • Convert the following into a Semi-structured representation <? XML VERSION = “1.0” STANDALONE = “yes” ?> <RESTS> <REST> <NAME>Taco Bell</NAME> <SODA><NAME>Pepsi</NAME> <PRICE>1.00</PRICE></ SODA> <SODA><NAME>Sobe</NAME> <PRICE>2.00</PRICE></SODA> </REST > <REST> … </REST > … </RESTS> Note: Do not turn over to the next page before attempting this exercise yourself! Salman Azhar: Database Systems

  36. Solution:The semi-structured representation <? XML VERSION = “1.0” STANDALONE = “yes” ?> <RESTS> <REST> <NAME>Taco Bell</NAME> <SODA><NAME>Pepsi</NAME> <PRICE>1.00</PRICE></ SODA> <SODA><NAME>Sobe</NAME> <PRICE>2.00</PRICE></SODA> </REST > <REST> … </REST > … </RESTS> RESTS REST REST REST NAME . . . SODA SODA Note: Data is stored in leaf nodes and structure (tags) in internal nodes Taco Bell PRICE NAME PRICE NAME Pepsi 1.00 Sobe 2.00 Salman Azhar: Database Systems

  37. Valid XML • Switching gears: Well-formed to Valid XML • Valid XML is the most interesting use of XML • Essentially a context-free grammar for describing XML tags and their nesting • Specified by DTD • Each domain of interest creates one DTD that describes all the documents this group will share • For example, electronic components, travel industry, etc., will have their own DTDs Salman Azhar: Database Systems

  38. DTD Structure <!DOCTYPE<root tag> [ <!ELEMENT<name> ( <components> ) <more elements> ]> Note: !DOCTYPE is key word with <root tag> being the name of DOCTYPE Between [ … ] list of ELEMENT definition Each !ELEMENT has a <name> with the allowed list of <components> usually in the order listed Salman Azhar: Database Systems

  39. DTD Elements • Element definition consists • of its name (tag) • and a parenthesized description of any nested tags • includes order of subtags • and their multiplicity (0, 1, or many times) • Leaves (text elements) • have #PCDATA in place of nested tags Salman Azhar: Database Systems

  40. Example: DTD <!DOCTYPE RESTS [ <!ELEMENT RESTS (REST*)> <!ELEMENT REST (RNAME, SODA+)> <!ELEMENT NAME (#PCDATA)> ]> RESTS can have * (0 or more) REST REST has NAME and then + (1 or more) SODA… Order matters! NAME and PRICE are data (#PCDATA): No more tags just text SODA has NAME followed PRICE SODA’s NAME and PRICE are data (#PCDATA) GROUP EXERCISE: COMPLETE THE DTD Note: Do not turn over to the next page before attempting this exercise yourself! Salman Azhar: Database Systems

  41. Example: DTD <!DOCTYPE RESTS [ <!ELEMENT RESTS (REST*)> <!ELEMENT REST (RNAME, SODA+)> <!ELEMENT NAME (#PCDATA)> <!ELEMENT SODA (NAME, PRICE)> <!ELEMENT NAME (#PCDATA)> <!ELEMENT PRICE (#PCDATA)> ]> RESTS can have * (0 or more) REST REST has NAME and then + (1 or more) SODA… Order matters! NAME and PRICE are data (#PCDATA): No more tags just text SODA has NAME followed PRICE Salman Azhar: Database Systems

  42. Element Descriptions Rules • Subtags must appear in order shown • A tag may be followed by a symbol to indicate its multiplicity: • Identical to UNIX regular expressions. • * = zero or more. • + = one or more. • ? = zero or one. • Alternative sequences of tags can be connected by • the symbol | Salman Azhar: Database Systems

  43. Example: Element Description • A name is • Either an optional title (e.g., “Dr.”), a first name, and a last name, in that order, • or it is an IP address <!ELEMENT NAME ( (TITLE?, FIRST, LAST) | IPADDR )> Alternative symbol Salman Azhar: Database Systems

  44. Use of DTDs • In order to specify a document follows a particular DTD • Set STANDALONE = “no” • Either include the DTD as a preamble of the XML document • Follow DOCTYPE and the <root tag> by SYSTEM and a path to the file where the DTD is stored Salman Azhar: Database Systems

  45. Example (a) <? XML VERSION = “1.0” STANDALONE = “no” ?> <!DOCTYPE RESTS [ <!ELEMENT RESTS (REST*)> <!ELEMENT REST (NAME, SODA+)> <!ELEMENT NAME (#PCDATA)> <!ELEMENT SODA (NAME, PRICE)> <!ELEMENT NAME (#PCDATA)> <!ELEMENT PRICE (#PCDATA)> ]> <RESTS> <REST> <NAME>Taco Bell</NAME> <SODA><NAME>Pepsi</NAME> <PRICE>1.00</PRICE></ SODA> <SODA><NAME>Sobe</NAME> <PRICE>2.00</PRICE></SODA> </REST > <REST> … </REST > … </RESTS> DTD Document Same as earlier but this time it conforms to the above DTD Salman Azhar: Database Systems

  46. Example (b) • Assume the RESTS DTD is in file rest.dtd <? XML VERSION = “1.0” STANDALONE = “no” ?> <!DOCTYPE Rests SYSTEM “rest.dtd”> <RESTS> <REST> <NAME>Taco Bell</NAME> <SODA><NAME>Pepsi</NAME> <PRICE>1.00</PRICE></ SODA> <SODA><NAME>Sobe</NAME> <PRICE>2.00</PRICE></SODA> </REST > <REST> … </REST > … </RESTS> Get the DTD from the file rest.dtd DocumentSame as earlier but this time it conforms to the DTD in rest.dtd Salman Azhar: Database Systems

  47. Attributes • Attributes are another important component of DTD and XML docs • Opening tags in XML can have attributes • like <A HREF = “…”> in HTML • In DTD <!ATTLIST <elementname>… > • gives a list of attributes and their data types for this element Salman Azhar: Database Systems

  48. Example: Attributes • Rests can have an attribute kind • which is either qsr, family, or other. • The element definition is unchanged • However, we add an ATTLIST. <!ELEMENT REST (NAME SODA*)> <!ATTLIST REST kind “qsr” | “family” | “other”> Salman Azhar: Database Systems

  49. Example: Attribute Use • In a document that allows REST tags, we might see: <REST kind = “qsr”> <NAME>KFC</NAME> <SODA><NAME>Pepsi</NAME> <PRICE>1.00</PRICE></SODA> ... </REST> New info: kind = “qsr” Salman Azhar: Database Systems

  50. IDs and IDREFs • Introduce links from one object to another • Allows the structure of an XML document to be a general graph • rather than just a tree. • These are pointers from one object to another • in analogy to HTML’s NAME = “blah” and HREF = “#blah” Salman Azhar: Database Systems

More Related