
Natix

Natix. Done by Asmaa Hassanain, CSC 5370, Dr. Hachim Haddoutti, 12/8/2003.


Presentation Transcript


  1. Natix. Done by Asmaa Hassanain, CSC 5370, Dr. Hachim Haddoutti, 12/8/2003

  2. Contents
     - XML data management techniques
     - What is Natix
     - Natix Architecture
     - Storage Layer: Logical Data Model
     - Mapping between XML and the Logical Model
     - XML Page Interpreter Storage Formatter
     - XML segment mapping for large trees
     - Index Structures
     - Natix Physical Algebra
     - Example Plans
     - To do...

     CSC 5370 XML and Data Management

  3. XML data management techniques
     - Map the data to a relational database. But:
       - Unnormalized relations
       - Data-centric view: large number of tables
       - Document-centric view: all information in a single data item (e.g. a CLOB)
     - Store the data as a plain text file. But:
       - The entire file must be parsed to process every query
     - Store the data as objects. But:
       - Object-oriented database systems are not yet developed enough to provide efficient querying capabilities
     - Design native XML database systems from scratch

  4. Natix

  5. What is Natix?
     - Natix is a native XML repository
     - Proposed by Kanne and Moerkotte at the University of Mannheim (Germany)
     - Natix requires Linux to run (kernel 2.2.16 or later, or 2.4.*), with CODA support enabled in the kernel
     - Still under development

  6. Natix Architecture

  7. Natix Architecture. Binding Layer: maps between the Natix Engine Interface and different application interfaces

  8. Natix Architecture, e.g. NatixFS:
     - File system interface: Natix can be mounted like an ordinary file system
     - Allows an XML tree to be viewed as a file system tree
     - Importing a document: just copy it to a directory, e.g. cp bib.xml /natix
     - Exporting a document: just open it, e.g. more /natix/bib.xml
     - Removing a document: just delete the file, e.g. rm /natix/bib.xml
     - XPath expressions: just use one as a file name, e.g. more /natix/{%%title}

  9. Natix Architecture. Service Layer: provides all DBMS functionality required in addition to simple storage and retrieval
     - Natix Engine Interface
     - Query execution engine
     - Query compiler
     - Transaction manager
     - Object manager

  10. Natix Architecture. Natix Engine Interface: the interface through which the database services communicate with each other and with applications. It provides a unified facade for specifying requests to the database system.

  11. Natix Architecture. Query compiler: translates queries expressed in XML query languages into optimized query execution plans

  12. Natix Architecture. Query execution engine: evaluates queries
     - Interprets the plan passed by the query compiler
     - Able to execute all queries expressible in a typical XML query language such as XQuery

  13. Natix Architecture. Transaction management: contains classes that provide ACID-style transactions, plus components for recovery
     - Recovery adapts the ARIES protocol
     - For synchronization, an S2PL-based scheduler is introduced

  14. Natix Architecture. Storage Layer: manages all persistent data structures and their transfer between main and secondary memory
     - Contains classes for efficient XML storage, indexes and metadata storage
     - Manages the storage of the recovery log and controls the transfer of data between main and secondary storage
     - Accesses raw disks or file system files and provides a memory space divided into segments, each of which is a linear collection of equal-sized pages
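
The segment idea on this slide (a linear collection of equal-sized pages) can be sketched as follows. This is a minimal in-memory illustration only: the `Segment` class, its method names, and the use of `io.BytesIO` in place of a raw disk or file are assumptions made for this example, not Natix's storage classes.

```python
import io


class Segment:
    """A linear collection of equal-sized pages (hypothetical sketch)."""

    def __init__(self, page_size: int, num_pages: int) -> None:
        self.page_size = page_size
        self.num_pages = num_pages
        # Zero-filled buffer standing in for a raw disk or a file.
        self._storage = io.BytesIO(bytes(page_size * num_pages))

    def _offset(self, page_no: int) -> int:
        # Equal-sized pages make the byte offset a simple multiplication.
        if not 0 <= page_no < self.num_pages:
            raise IndexError(f"page {page_no} is outside the segment")
        return page_no * self.page_size

    def read_page(self, page_no: int) -> bytes:
        self._storage.seek(self._offset(page_no))
        return self._storage.read(self.page_size)

    def write_page(self, page_no: int, data: bytes) -> None:
        if len(data) > self.page_size:
            raise ValueError("data does not fit on one page")
        self._storage.seek(self._offset(page_no))
        # Pad short payloads so every page keeps the same size.
        self._storage.write(data.ljust(self.page_size, b"\x00"))


seg = Segment(page_size=8, num_pages=4)
seg.write_page(2, b"natix")
```

Because all pages have the same size, a page number alone is enough to locate a page, which is what makes a segment addressable as a flat array.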

  15. Storage Layer: Logical Data Model. The logical data model is a logical tree:
     - New nodes can be inserted as children or siblings of existing nodes
     - Any node can be removed
     - Individual documents are represented as ordered trees

  16. Mapping between XML and the Logical Model. A small wrapper class is used to map the XML model, with its node types and attributes, to a simple tree model and vice versa:
     - Elements are mapped one-to-one to tree nodes of the logical data model
     - Attributes are mapped to child nodes of an additional attribute container child node
     - The names of referenced entities are retained in special internal nodes
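
As a rough illustration of the mapping described on this slide, the sketch below converts a parsed XML document into a simple ordered tree in which attributes hang off one extra container child. The `Node` class, the `to_logical_tree` function, and the `@attributes` and `#text` labels are hypothetical names chosen for this example; they are not Natix's wrapper class. Entity-reference nodes and mixed-content tail text are left out for brevity.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node:
    """One node of the simple ordered tree model (hypothetical layout)."""
    label: str
    text: Optional[str] = None
    children: List["Node"] = field(default_factory=list)


def to_logical_tree(elem: ET.Element) -> Node:
    """Map an XML element one-to-one to a tree node."""
    node = Node(label=elem.tag)
    if elem.attrib:
        # Attributes become children of one additional container child.
        container = Node(label="@attributes")
        for name, value in elem.attrib.items():
            container.children.append(Node(label=name, text=value))
        node.children.append(container)
    if elem.text and elem.text.strip():
        node.children.append(Node(label="#text", text=elem.text.strip()))
    for child in elem:
        # Children are appended in document order, so the tree stays ordered.
        node.children.append(to_logical_tree(child))
    return node


doc = ET.fromstring('<book year="2003"><title>Natix</title></book>')
tree = to_logical_tree(doc)
```

Keeping attributes under a single container child means the tree model needs only one kind of parent/child edge, at the cost of one extra node per element that has attributes.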

  17. XML Page Interpreter Storage Formatter
     - The logical data tree is partitioned into subtrees
     - Each subtree is stored in a single record of variable length
     - Each record contains a pointer to the record containing the parent node, and the document identifier

  18. XML Page Interpreter Storage Formatter
     - Subtrees of the original XML document are stored together in a single physical record
     - Connected subtrees of the document tree are clustered into large records, and intra-record references are represented differently from inter-record references
     - The inner structure of the subtrees is retained

  19. XML segment mapping for large trees
     - Proxy nodes refer to connected subtrees not stored in the same record
     - Helper aggregate nodes group together a subset of the children of a node
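
The record and proxy-node ideas from the last three slides can be sketched as below. This is an illustrative greedy split, not Natix's actual split algorithm. The names `TreeNode`, `Record`, `size` and `partition` are hypothetical, record capacity is counted in nodes rather than bytes, and helper aggregate nodes are not modelled.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class TreeNode:
    label: str
    children: List["TreeNode"] = field(default_factory=list)
    proxy_for: Optional[int] = None  # record id, set only on proxy nodes


@dataclass
class Record:
    rid: int
    doc_id: str
    parent_rid: Optional[int]  # record that holds the parent node
    root: TreeNode


def size(node: TreeNode) -> int:
    """Nodes stored inline in this subtree (a proxy counts as one node)."""
    return 1 + sum(size(c) for c in node.children)


def partition(root: TreeNode, capacity: int, doc_id: str) -> Dict[int, Record]:
    records: Dict[int, Record] = {}

    def store(subtree: TreeNode, parent_rid: Optional[int]) -> int:
        rid = len(records)
        records[rid] = Record(rid, doc_id, parent_rid, subtree)
        while size(subtree) > capacity:
            # Only children with their own subtree shrink the record when moved.
            movable = [i for i, c in enumerate(subtree.children) if size(c) > 1]
            if not movable:
                break  # would need helper aggregate nodes, not modelled here
            i = max(movable, key=lambda j: size(subtree.children[j]))
            child_rid = store(subtree.children[i], rid)
            # Inter-record reference: a proxy node replaces the moved subtree.
            subtree.children[i] = TreeNode("proxy", proxy_for=child_rid)
        return rid

    store(root, None)
    return records


a = TreeNode("a", [TreeNode("a1"), TreeNode("a2"), TreeNode("a3")])
b = TreeNode("b", [TreeNode("b1")])
records = partition(TreeNode("root", [a, b]), capacity=4, doc_id="bib.xml")
```

In this sketch, references inside a record are ordinary child links, while references across records go through a proxy node carrying a record id. That mirrors the distinction the slides draw between intra-record and inter-record references.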

  20. Index Structures. Natix uses two index structures:
     - Full text index framework (inverted files): stores lists of document references to indicate in which documents search terms appear
       - Index: maps search terms to list identifiers and stores these mappings persistently; provides the main interface for the user to work with inverted files
       - ListManager: maps the list identifiers to the actual lists (managing the directory of the inverted file)
       - FragmentedList: lists are divided into fragments that fit on a page, are linked together, and can be traversed sequentially; it manages all the fragments of one list and controls insertions and deletions on this list
       - ContextDescription: establishes the actual representation in which data is stored in a list
     - eXtended Access Support Relation (XASR): preserves the parent/child, ancestor/descendant, and preceding/following relationships between nodes. The XASR combined with a full text index provides a powerful method to search on the contents of nodes
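
To make the inverted-file components more concrete, here is a minimal in-memory sketch. The class names `Index`, `ListManager` and `FragmentedList` follow the slide, but the division of work between them is one reading of the slide, and every method name, the `Fragment` helper, and the fragment capacity are assumptions for illustration. Persistence and the ContextDescription are not modelled.

```python
from typing import Dict, Iterator, List, Optional


class Fragment:
    """One page-sized chunk of a list, linked to the next chunk."""

    def __init__(self) -> None:
        self.entries: List[int] = []
        self.next: Optional["Fragment"] = None


class FragmentedList:
    """A list of document references split into linked fragments."""

    def __init__(self, fragment_capacity: int) -> None:
        self.capacity = fragment_capacity
        self.head = Fragment()
        self.tail = self.head

    def append(self, doc_id: int) -> None:
        if len(self.tail.entries) == self.capacity:
            # Current fragment is full: link a new one behind it.
            self.tail.next = Fragment()
            self.tail = self.tail.next
        self.tail.entries.append(doc_id)

    def __iter__(self) -> Iterator[int]:
        frag: Optional[Fragment] = self.head
        while frag is not None:  # sequential traversal over the links
            yield from frag.entries
            frag = frag.next


class ListManager:
    """Directory of the inverted file: list identifier -> list."""

    def __init__(self, fragment_capacity: int) -> None:
        self.fragment_capacity = fragment_capacity
        self.lists: Dict[int, FragmentedList] = {}

    def create(self) -> int:
        list_id = len(self.lists)
        self.lists[list_id] = FragmentedList(self.fragment_capacity)
        return list_id

    def get(self, list_id: int) -> FragmentedList:
        return self.lists[list_id]


class Index:
    """User-facing interface: search term -> list identifier."""

    def __init__(self, fragment_capacity: int = 2) -> None:
        self.term_to_list_id: Dict[str, int] = {}
        self.manager = ListManager(fragment_capacity)

    def insert(self, term: str, doc_id: int) -> None:
        if term not in self.term_to_list_id:
            self.term_to_list_id[term] = self.manager.create()
        self.manager.get(self.term_to_list_id[term]).append(doc_id)

    def search(self, term: str) -> List[int]:
        list_id = self.term_to_list_id.get(term)
        return [] if list_id is None else list(self.manager.get(list_id))


index = Index(fragment_capacity=2)
for term, doc in [("xml", 1), ("xml", 2), ("xml", 3), ("natix", 2)]:
    index.insert(term, doc)
```

With a fragment capacity of two, the three references for "xml" spill into a second linked fragment, while a search still returns them as one sequence.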

  21. Natix Physical Algebra
     - The XQuery clauses ‘let’, ‘for’, ‘where’ and ‘return’ are supported
     - ‘Select’, ‘map’, ‘join’, ‘grouping’ and ‘sort’ operations are performed by standard algebraic operators borrowed from the relational context
     - ‘D-join’ and ‘unary and binary grouping’ are borrowed from the object-oriented context

  22. Natix Physical Algebra
     - Scan operations, e.g. ExpressionScan: generates a tuple containing the root of the document identified by its name, by evaluating a given expression
     - UnnestMap is used to generate variable bindings for XPath expressions, e.g. /a//b/c becomes UnnestMap$4=child($3,c)( UnnestMap$3=desc($2,b)( UnnestMap$2=child($1,a)([$1])))
     - ‘BA-Map’, ‘FL-Map’, ‘Groupify-GroupApply’ and ‘NGroupify-NGroupApply’ are used to construct the XML result
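
The nested UnnestMap expression for /a//b/c on this slide can be mimicked with Python generators. This is a sketch of the idea only: tuples are modelled as dictionaries from attribute names to nodes, and `unnest_map`, `child`, `desc` and the synthetic `#document` node are names invented for this example rather than Natix operators or APIs.

```python
import xml.etree.ElementTree as ET
from typing import Callable, Dict, Iterable, Iterator

Binding = Dict[str, ET.Element]
Step = Callable[[Binding], Iterator[ET.Element]]


def child(attr: str, tag: str) -> Step:
    """Children of the node bound to `attr` that carry the given tag."""
    return lambda t: (c for c in t[attr] if c.tag == tag)


def desc(attr: str, tag: str) -> Step:
    """Proper descendants of the node bound to `attr` with the given tag."""
    return lambda t: (d for d in t[attr].iter(tag) if d is not t[attr])


def unnest_map(new_attr: str, step: Step,
               source: Iterable[Binding]) -> Iterator[Binding]:
    """Emit one extended tuple per node the step yields for an input tuple."""
    for t in source:
        for node in step(t):
            yield {**t, new_attr: node}


# A synthetic document node, so that /a can be expressed as a child step.
document = ET.Element("#document")
document.append(
    ET.fromstring("<a><x><b><c>1</c></b></x><b><c>2</c><d/></b></a>"))

# Same nesting as on the slide: child($1,a), then desc($2,b), then child($3,c).
plan = unnest_map("$4", child("$3", "c"),
       unnest_map("$3", desc("$2", "b"),
       unnest_map("$2", child("$1", "a"), [{"$1": document}])))

results = [t["$4"].text for t in plan]
```

Each UnnestMap adds one attribute to the tuple stream, so the innermost operator binds $2 and the outermost binds $4, exactly as in the slide's expression read from the inside out.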

  23. Example Plans (1): This query retrieves the title and the year for all recent books

  24. Example Plans (2)

  25. To do...
     - Support for functions inside XPath expressions
     - DTD import (not possible as of now)
     - Support for different character encodings
     - Support for XML namespaces
     - Preparing for the launch of the first full commercial end-user release of Natix, which may support all these features

  26. Questions?

  27. References
     - Natix: A Technology Overview: http://pi3.informatik.uni-mannheim.de/publications.html#79
     - Efficient Storage of XML Data: http://pi3.informatik.uni-mannheim.de/publications.html#79
     - Anatomy of a Native XML Base Management System: http://pi3.informatik.uni-mannheim.de/publications.html#79
     - Algebraic XML Construction and its Optimization in Natix: http://pi3.informatik.uni-mannheim.de/publications.html#79
     - Data ex machina: www.dataexmachina.de/natix.html

  28. Thank You
