
Tamino – a DBMS Designed for XML



Presentation Transcript


  1. Tamino – a DBMS Designed for XML • Dr. Harald Schoning • Presenter: Wenhui Li, University of Ottawa • Instructed by: Dr. Mengchi Liu, Carleton University

  2. Abstract • Who? Software AG • What? An XML database management system • When? First unveiled in 1999; Tamino XML Server 4.2 in June 2004 • Why? Management and transfer of structured and unstructured data; designed entirely for XML

  3. Industry Background • XML is becoming the prevailing format for data processing on the Internet. • Early goal of Tamino: easy data exchange • Evolution trend: storing, managing, publishing, and exchanging XML documents; business modeling

  4. Industry Background (cont.): XML support in databases • Oracle XML Developer's Kit • SQL Server 2000 • DB2 XML Extender

  5. Limitations of XML support via a traditional RDBMS or ORDB • XML is not as rigidly structured as data in an RDB, ORDB, or OODB • Storing and querying XML in these database systems is possible but not practical

  6. Two modeling approaches • Data-centric documents: regular structure; order does not matter; no mixed content • Document-centric documents: less regular structure; order is significant; mixed content
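One of the distinguishing criteria on this slide, mixed content, can be checked mechanically. The following is a minimal sketch, assuming Python's standard `xml.etree.ElementTree`; the helper name `has_mixed_content` and the two sample documents are illustrative and not part of the original presentation.

```python
import xml.etree.ElementTree as ET


def has_mixed_content(elem):
    """Return True if any element holds both child elements and non-whitespace text."""
    for node in elem.iter():
        children = list(node)
        if not children:
            continue  # leaf elements hold only text, which is not mixed content
        own_text = (node.text or "").strip()
        tail_text = "".join((child.tail or "").strip() for child in children)
        if own_text or tail_text:
            return True
    return False


# Data-centric: regular structure, text only in leaf elements.
data_centric = ET.fromstring(
    "<City><Name>Darmstadt</Name><Inhabitants>138000</Inhabitants></City>"
)
# Document-centric: text and markup are interleaved.
document_centric = ET.fromstring(
    "<text>XML and XSL are <stressed>very</stressed> important</text>"
)

print(has_mixed_content(data_centric))      # False
print(has_mixed_content(document_centric))  # True
```

Mixed content is only one of the three criteria; regularity of structure and the significance of order would need separate checks.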

  7. Why not use a relational DB? • XML documents can carry schema information (a DTD), but they are not required to. • Classical database techniques, which handle objects of a predefined type, cannot be applied directly to XML

  8. Why not use XML by itself? • XML is just a markup language; it has no processing facilities of its own • Querying a set of XML documents is outside the scope of the XML recommendation • Hence Tamino!

  9. What does Tamino do? • What Tamino is (see the first slide) • Stores XML documents, HTML files, GIF images, etc. • Retrieves them in a set-oriented manner, with sophisticated query facilities

  10. Tamino’s architecture

  11. The schema of XML documents • XML supports schema information, but in a different way from classical databases • DTDs have several deficiencies (e.g. no data types) • A W3C working group is developing an XML schema description language • For now, however, the DTD is the only standard schema language

  12. XML schema vs. RDB and OODB schema • In an RDB or OODB, the schema is created before instances can be stored • Instances must conform to the declared schema • In an XML database, each instance declares a schema of its own • For XML documents, grouping objects of homogeneous structure into (predefined) tables or classes does not work

  13. Queries and indexes with XML schemas • Queries operate on sets • Indexes are defined on the basis of a common schema • For the purpose of querying, arbitrary objects could be grouped into sets • Index definition, however, requires at least a common subset of the structure

  14. Schema handling in Tamino • Documents are grouped by an open content model plus user-directed document grouping • Documents are grouped into collections • Within a collection, several document types can be declared • For each document type, a common schema is defined (open content model) • Tamino assigns each document to one of the document types

  15. Type assignment • Assignment is based on the root element type • A document must match the schema of the document type it is assigned to, but may have additional elements/attributes • Within a document type, documents might differ considerably • If there is no appropriate document type, the document is stored without any schema checking
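The assignment rule on slides 14 and 15 can be sketched roughly as follows. This is only an illustration, not Tamino's actual implementation or API: the `DOCUMENT_TYPES` table, the function name, and the reduction of a "schema" to a set of required child elements are all simplifying assumptions.

```python
import xml.etree.ElementTree as ET

# Hypothetical collection: one document type per root element name,
# each with the child elements its (simplified) schema requires.
DOCUMENT_TYPES = {
    "City": {"Name", "Monument"},
}


def assign_document_type(xml_text, document_types=DOCUMENT_TYPES):
    """Return (doctype, accepted).

    doctype is None when no document type matches the root element;
    such a document is accepted without any schema checking.
    """
    root = ET.fromstring(xml_text)
    required = document_types.get(root.tag)
    if required is None:
        return None, True
    present = {child.tag for child in root}
    # Open content model: extra children are fine, required ones must exist.
    return root.tag, required <= present


city = (
    "<City><Name>Darmstadt</Name><Addition>extra element</Addition>"
    "<Monument><Name>Langer Ludwig</Name></Monument></City>"
)
print(assign_document_type(city))                                  # ('City', True)
print(assign_document_type("<City><Name>Darmstadt</Name></City>"))  # ('City', False)
print(assign_document_type("<Note>free-form</Note>"))               # (None, True)
```

The first document is accepted even though `Addition` is not declared, which is the point of the open content model.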

  16. Tamino schema example

  17. Document accepted by Tamino
      <City Inhabitants="138000">
        <Name>Darmstadt</Name>
        <Addition>The city of art nouveau</Addition>
        <Monument Height="39m">
          <Name>Langer Ludwig</Name>
          <Location>
            <Name>Luisenplatz</Name>
            <MapIndex>M5</MapIndex>
          </Location>
        </Monument>
      </City>

  18. When should an element/attribute be modeled? • An index will be defined on the element/attribute • The element/attribute is to be mapped to an external data source or to a server extension • Dedicated access rights will be defined on the element/attribute • The presence or multiplicity of the element is to be enforced • One of the above conditions holds for a child of the element

  19. Indexing in Tamino • Value-based indexes • well known from traditional database systems • used to accelerate searches • must address the data object exactly, since names need not be unique within a DTD

  20. Example of value-based index • value-based indexes • data-centric view
      <!ELEMENT City (Name, Inhabitants, Monument+)>
      <!ELEMENT Monument (Name, Description)>
      <!ELEMENT Inhabitants (#PCDATA)>
      <!ELEMENT Name (#PCDATA)>
      <!ELEMENT Description (#PCDATA)>
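In the DTD above, `Name` appears under both `City` and `Monument`, so a value index has to be tied to a path rather than to an element name alone. Here is a minimal sketch of that idea, assuming Python's `xml.etree.ElementTree`; the function name, the second sample document, and the dictionary-based index are illustrative assumptions, not how Tamino stores its indexes.

```python
from collections import defaultdict
import xml.etree.ElementTree as ET


def build_value_index(docs, path):
    """Map each value found at `path` (relative to the root) to the ids of documents holding it."""
    index = defaultdict(set)
    for doc_id, xml_text in docs.items():
        root = ET.fromstring(xml_text)
        for node in root.findall(path):
            index[(node.text or "").strip()].add(doc_id)
    return index


# Documents shaped like the DTD on this slide; document 2 is made-up sample data.
docs = {
    1: "<City><Name>Darmstadt</Name><Inhabitants>138000</Inhabitants>"
       "<Monument><Name>Langer Ludwig</Name><Description>Column</Description></Monument></City>",
    2: "<City><Name>Sampletown</Name><Inhabitants>500</Inhabitants>"
       "<Monument><Name>Sample Tower</Name><Description>Tower</Description></Monument></City>",
}

city_names = build_value_index(docs, "Name")               # City/Name only
monument_names = build_value_index(docs, "Monument/Name")  # City/Monument/Name only

print(sorted(city_names))      # ['Darmstadt', 'Sampletown']
print(sorted(monument_names))  # ['Langer Ludwig', 'Sample Tower']
```

A lookup for a city called "Langer Ludwig" correctly finds nothing, because the monument names live in a separate index.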

  21. Indexing in Tamino (cont.) • Text indexing • document-centric view • the scope can be limited to a specific part of the document • the scope might span element content, including nested markup

  22. Example of text index • text indexing • document-centric view
      <statement>
        <author>
          <firstname>Harald</firstname>
          <lastname>Schoning</lastname>
        </author>
        <text>
          X<italic>M</italic>L and X<italic>S</italic>L are <stressed>very</stressed> important
        </text>
      </statement>
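The example above shows why a text index has to look through markup: the words "XML" and "XSL" are split across `italic` elements. A minimal sketch of scoped tokenization, assuming Python's standard library (the function name `text_tokens` and the simple `\w+` tokenizer are illustrative choices):

```python
import re
import xml.etree.ElementTree as ET


def text_tokens(xml_text, scope):
    """Tokenize the text inside the element at `scope`, ignoring nested markup."""
    node = ET.fromstring(xml_text).find(scope)
    if node is None:
        return []
    flat = "".join(node.itertext())  # concatenates text across child elements
    return re.findall(r"\w+", flat.lower())


statement = (
    "<statement><author><firstname>Harald</firstname>"
    "<lastname>Schoning</lastname></author>"
    "<text>X<italic>M</italic>L and X<italic>S</italic>L are "
    "<stressed>very</stressed> important</text></statement>"
)

tokens = text_tokens(statement, "text")
print(tokens)  # ['xml', 'and', 'xsl', 'are', 'very', 'important']
```

Because the scope is limited to `text`, the author's name does not end up among the indexed words.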

  23. Indexing in Tamino (cont.) • Structural index • useful if multiplicity permits the omission of elements • or if no DTD is known • Example: in a database of all European cities, find all cities that have an element called "beach"
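A structural index answers "which documents contain element X at all?" without scanning every document. The sketch below assumes Python's `xml.etree.ElementTree`; the two city documents are made-up sample data and the in-memory dictionary merely stands in for a real index structure.

```python
from collections import defaultdict
import xml.etree.ElementTree as ET


def build_structural_index(docs):
    """Map each element name to the ids of the documents in which it occurs."""
    index = defaultdict(set)
    for doc_id, xml_text in docs.items():
        for node in ET.fromstring(xml_text).iter():
            index[node.tag].add(doc_id)
    return index


# Made-up sample documents: only city "a" has a beach element.
cities = {
    "a": "<City><Name>Coastal Town</Name><beach>North Shore</beach></City>",
    "b": "<City><Name>Inland Town</Name></City>",
}

structure = build_structural_index(cities)
print(structure.get("beach", set()))  # {'a'}
```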

  24. Querying XML documents • Currently, there is no standardized query language for XML • XPath allows positioning within a single document • XPath fits the needs of retrieval in data-centric environments well • Document-centric environments need a more content-based retrieval facility • Tamino therefore also supports full-text search
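To give a feel for XPath-style positioning, the sketch below runs a few path expressions over a well-formed copy of the City document from slide 17. It uses the limited XPath subset built into Python's `xml.etree.ElementTree`, which is an assumption of this example; it is not Tamino's query interface.

```python
import xml.etree.ElementTree as ET

city = ET.fromstring(
    '<City Inhabitants="138000">'
    "<Name>Darmstadt</Name>"
    "<Addition>The city of art nouveau</Addition>"
    '<Monument Height="39m"><Name>Langer Ludwig</Name>'
    "<Location><Name>Luisenplatz</Name><MapIndex>M5</MapIndex></Location>"
    "</Monument></City>"
)

# Attribute predicate: names of monuments that are 39m high.
tall = [m.findtext("Name") for m in city.findall("./Monument[@Height='39m']")]

# Descendant axis: every Name element, in document order.
all_names = [n.text for n in city.findall(".//Name")]

# Child-value predicate: the location whose MapIndex is M5.
at_m5 = [n.text for n in city.findall(".//Location[MapIndex='M5']/Name")]

print(tall)       # ['Langer Ludwig']
print(all_names)  # ['Darmstadt', 'Langer Ludwig', 'Luisenplatz']
print(at_m5)      # ['Luisenplatz']
```

All three queries navigate by structure; none of them can express "documents that talk about art nouveau", which is where content-based retrieval comes in.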

  25. Expectations for an XML processor • W3C: the XML recommendation specifies the handling of entities, comments, and processing instructions. • Users expect Tamino to leave comments intact, evaluate no processing instructions, and leave entity references unresolved. • Users also expect the output of a Tamino query to match the specification of an XML processor.

  26. Why not simply leave entities unresolved? • A query result may be a set of (parts of) matching documents • The DTD of the result must then include all the different entity declarations of the original documents • The definition of an entity might differ from document to document • So, where the same entity name is defined differently, the entities are renamed and the entity references are changed accordingly.
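The renaming step can be sketched as follows. This is a rough illustration under several assumptions: entity declarations are already available as name-to-value dictionaries, references are found with a simple regular expression rather than a real parser, and the `name_N` renaming scheme is hypothetical.

```python
import re


def merge_entities(fragments):
    """Combine result fragments whose entity names may clash.

    `fragments` is a list of (declarations, text) pairs, where declarations
    maps entity names to replacement text. Returns the merged declarations
    and the fragment texts with references rewritten where a name clashed.
    """
    merged = {}
    rewritten = []
    for i, (declarations, text) in enumerate(fragments):
        rename = {}
        for name, value in declarations.items():
            new_name = name
            if name in merged and merged[name] != value:
                new_name = f"{name}_{i}"  # hypothetical renaming scheme
            merged[new_name] = value
            rename[name] = new_name
        rewritten.append(
            re.sub(
                r"&(\w+);",
                lambda m: "&" + rename.get(m.group(1), m.group(1)) + ";",
                text,
            )
        )
    return merged, rewritten


fragments = [
    ({"org": "Software AG"}, "<p>&org; news</p>"),
    ({"org": "Carleton University"}, "<p>&org; notes</p>"),
]
declarations, texts = merge_entities(fragments)
print(declarations)  # {'org': 'Software AG', 'org_1': 'Carleton University'}
print(texts)         # ['<p>&org; news</p>', '<p>&org_1; notes</p>']
```

Entities with identical definitions keep their shared name; only genuinely conflicting ones are renamed.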

  27. Problems of external entities • These entities can change without the database system knowing about it • Thus, the values of external entities must not be included in indexes • Example:
      <!ENTITY mysubject SYSTEM "http://www.softwareag.com/hottopic.xml">
      ...
      <ticker>Today's hot topic: &mysubject;</ticker>
      • Checking the current contents of the external entity would lead to unacceptable response times.

  28. Relational databases and XML • Major (object-)relational database systems include some form of XML support • The simplest form is to generate XML documents from existing relational data • But real database handling of XML requires that XML data can be stored and retrieved • Two approaches

  29. XML support approach (1) • Map the XML document to relational tables and their columns • Markup is ignored on storage and reconstructed on retrieval • Advantage of this approach: the contents of an XML document can be handled with traditional SQL

  30. XML support approach (1), cont. • Shortcoming: the sequence information is lost
      The order as stored:
      <Order CustomerId="567" Date="12-12-2000">
        <Item ProductID="17" Quantity="2"/>
        <Item ProductID="16" Quantity="9"/>
        <Item ProductID="19" Quantity="8"/>
      </Order>
      The order as retrieved:
      <Order CustomerId="567" Date="12-12-2000">
        <Item ProductID="16" Quantity="9"/>
        <Item ProductID="17" Quantity="2"/>
        <Item ProductID="19" Quantity="8"/>
      </Order>
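The effect on this slide is easy to reproduce, because a relational table has no inherent row order. The sketch below uses Python's built-in `sqlite3` and `xml.etree.ElementTree`; the table layout is an assumption, and the extra `position` column is a common workaround added here for contrast, not something the presentation proposes.

```python
import sqlite3
import xml.etree.ElementTree as ET

order_xml = (
    '<Order CustomerId="567" Date="12-12-2000">'
    '<Item ProductID="17" Quantity="2"/>'
    '<Item ProductID="16" Quantity="9"/>'
    '<Item ProductID="19" Quantity="8"/>'
    "</Order>"
)

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE item (product_id INTEGER, quantity INTEGER, position INTEGER)"
)
# Shred the document into rows; `position` records the original sequence.
for pos, item in enumerate(ET.fromstring(order_xml).findall("Item")):
    conn.execute(
        "INSERT INTO item VALUES (?, ?, ?)",
        (int(item.get("ProductID")), int(item.get("Quantity")), pos),
    )

# Reconstructing from the key alone yields the reordered document of the slide.
by_key = [r[0] for r in conn.execute("SELECT product_id FROM item ORDER BY product_id")]
# Only an explicitly stored position restores the document order.
by_position = [r[0] for r in conn.execute("SELECT product_id FROM item ORDER BY position")]

print(by_key)       # [16, 17, 19]
print(by_position)  # [17, 16, 19]
```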

  31. XML support approach (1), cont. • For data-centric documents the sequence might not matter, but for document-centric documents it does • This approach loses all comments and processing instructions • Mixed content cannot be stored easily in this model

  32. XML support approach (2) • Leave the XML document intact and store it in a large text field ("BLOB") • Or even outside the database • Text search is possible • A text-based condition can be limited to part of the document (e.g. the contents of a certain element)

  33. XML support approach (2), cont. • Limitations: • No structure-aware combinations are possible • Value-based search is not supported on these text fields • IBM solution: side tables • But direct manipulation of side tables destroys the consistency of the database • Security can be defined at document level only, not on elements or attributes
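The side-table idea and its consistency risk can be illustrated with a small sketch using Python's built-in `sqlite3`. The table names, the helper functions, and the choice of `Inhabitants` as the extracted value are assumptions for illustration only; this is not the DB2 XML Extender interface.

```python
import sqlite3
import xml.etree.ElementTree as ET

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE doc (id INTEGER PRIMARY KEY, body TEXT)")
conn.execute("CREATE TABLE city_side (doc_id INTEGER, inhabitants INTEGER)")


def store(doc_id, xml_text):
    """Store the document intact and copy one value into the side table."""
    conn.execute("INSERT INTO doc VALUES (?, ?)", (doc_id, xml_text))
    inhabitants = int(ET.fromstring(xml_text).get("Inhabitants"))
    conn.execute("INSERT INTO city_side VALUES (?, ?)", (doc_id, inhabitants))


def is_consistent(doc_id):
    """Check that the side table still agrees with the stored document."""
    (body,) = conn.execute("SELECT body FROM doc WHERE id = ?", (doc_id,)).fetchone()
    (side,) = conn.execute(
        "SELECT inhabitants FROM city_side WHERE doc_id = ?", (doc_id,)
    ).fetchone()
    return int(ET.fromstring(body).get("Inhabitants")) == side


store(1, '<City Inhabitants="138000"><Name>Darmstadt</Name></City>')

# The side table makes a value-based search possible.
big = [r[0] for r in conn.execute("SELECT doc_id FROM city_side WHERE inhabitants > 100000")]
before = is_consistent(1)

# Updating the side table directly leaves the stored document untouched.
conn.execute("UPDATE city_side SET inhabitants = 1 WHERE doc_id = 1")
after = is_consistent(1)

print(big, before, after)  # [1] True False
```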

  34. Summary • Tamino was designed specifically for XML • Schema handling for XML differs from schema handling in relational databases • In schema handling, external entities cause conceptual problems • Value-based indexes are useful for XML, as are text indexes and structural indexes • Comments and processing instructions should be preserved when documents are stored • The result of a query against an XML database should be XML

  35. Q&A Thanks!
