XML, distributed databases, and OLAP/warehousing

isolde

Presentation Transcript


  1. XML, distributed databases, and OLAP/warehousing The semantic web and a lot more

  2. What is XML? • A framework for declarative languages • A syntax and two major constructs: elements & attributes • Elements: • Have begin and end tags • Can be embedded • Can be put in lists (homogeneous or heterogeneous) • Attributes: • Are assigned to elements • Are strings • Are put in quotes
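
As a concrete illustration of the two constructs, here is a small sketch using Python's standard `xml.etree.ElementTree` module. The `<library>` document and its contents are hypothetical. It shows embedded elements, a heterogeneous list of child elements, and attributes that come back as plain strings.

```python
import xml.etree.ElementTree as ET

# Hypothetical document: <library> embeds a heterogeneous list of children,
# and each child carries an "id" attribute written in quotes.
doc = """
<library>
  <book id="b1"><title>Intro to Databases</title></book>
  <book id="b2"><title>Query Processing</title></book>
  <magazine id="m1"><title>Data Weekly</title></magazine>
</library>
"""

root = ET.fromstring(doc.strip())
# Children are returned in document order; attribute values are plain strings.
rows = [(child.tag, child.get("id"), child.find("title").text) for child in root]
for row in rows:
    print(row)
```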

  3. What is XML for? • Initially, as a cornerstone of the semantic web • Automatic searching of the web (versus interactive) • Self-describing data • Has been adapted to a wide variety of application domains • As a means for specifying the structure of data • As a catch-all for nontraditional data

  4. XML documents • An instance of XML is a language • An instance of an XML language is a document • Documents are hierarchical & list-oriented • XML documents can be parsed in a single, linear pass • There is no notion of a fixed schema • Does not leverage metadata for set-oriented queries • Order matters in a set of documents • Order matters in a series of elements in a document
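
The single-pass property, and the fact that element order is preserved, can be illustrated with `ElementTree.iterparse`, which streams through the input once and reports each element as it closes. The log document below is a hypothetical example, not taken from the slides.

```python
import io
import xml.etree.ElementTree as ET

# Hypothetical log document; entries appear in a meaningful order.
doc = b"<log><entry level='info'>start</entry><entry level='error'>disk full</entry><entry level='info'>stop</entry></log>"

order = []
errors = []
# iterparse reads the input once, front to back, emitting an "end" event
# as each element closes; nothing earlier is revisited.
for event, elem in ET.iterparse(io.BytesIO(doc), events=("end",)):
    if elem.tag == "entry":
        order.append(elem.text)
        if elem.get("level") == "error":
            errors.append(elem.text)
        elem.clear()  # release the element's content once it has been processed

print(order)   # entries in document order
print(errors)
```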

  5. Is it a generalized HTML? • Sort of, but perhaps more of a meta alternative to HTML • The real point is to allow HTML pages to be located and searched automatically • This is done by allowing language developers to create their own names for documents, elements, & attributes

  6. What else is part of the XML philosophy? • Namespaces • Associated with URLs • Can be referenced in a nested fashion in an XML document • Widely distributed sharing of data, XML languages, and namespaces
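
A minimal sketch of namespaces with `ElementTree`, assuming two hypothetical vocabularies identified by `http://example.org/...` URIs (placeholders, not real schemas). Both vocabularies define a `title` element, and the namespace is what keeps them distinct, even when one is declared inside the other.

```python
import xml.etree.ElementTree as ET

# Default namespace for book terms, plus an "hr" prefix for people terms.
# Both URIs are hypothetical placeholders.
doc = """<catalog xmlns="http://example.org/books"
         xmlns:hr="http://example.org/people">
  <title>Spring Catalog</title>
  <hr:person><hr:title>Editor</hr:title></hr:person>
</catalog>"""

root = ET.fromstring(doc)
# ElementTree expands each qualified name to "{namespace-URI}localname".
print(root.tag)

# Map short local prefixes to the URIs for use in queries.
ns = {"b": "http://example.org/books", "p": "http://example.org/people"}
book_title = root.find("b:title", ns).text
person_title = root.find("p:person/p:title", ns).text
print(book_title, "|", person_title)
```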

  7. What’s missing, from the database user’s and programmer’s perspective? • No innate notion of a query language • No Objects • Very limited data structuring capabilities • Yet another impedance mismatch problem • No way to store XML documents in a relational database, at least not natively • No way to make a database out of a set of documents

  8. So, in response to the database community’s desires… • A hierarchical query language – XPath • A specification format for schemas – DTDs • But uses a non-XML syntax • Does not accommodate namespaces

  9. So, in response to the database community’s desires, phase 2… • XML schema • More atomic or “basic” types • Like DTDs, but with an XML syntax • Supports namespaces • Adds primary keys and foreign keys • Adds more constructs for structuring data • Simple types: primitive types, list and union, & restriction • Attributes can be of simple types • Complex types: compositors • all (unordered) and sequence (ordered), and choice • Extension and restriction • Integrity constraints

  10. Query language 1: XPath • Follows the hierarchy of XML documents • Uses syntax borrowed from the Unix file system • / for root • . for current node • @ to select an attribute • [1], [2], etc., for position among siblings • // for self-or-descendant • .//x for all descendant elements named x • Augmented with URLs to create XPointer • Many relational database systems now have an XML data type
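
Python's `ElementTree` supports only a subset of XPath. It handles `.//`, positional predicates, and attribute tests, but it does not return attribute nodes or accept absolute paths from an element, so the sketch below only approximates full XPath. A library such as lxml offers a complete XPath 1.0 implementation. The course catalog is hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical course catalog with <title> elements at two depths.
doc = """<dept>
  <course code="CS101"><title>Intro</title></course>
  <course code="CS340"><title>Databases</title>
    <section><title>Lab</title></section>
  </course>
</dept>"""

root = ET.fromstring(doc)

# .//title: every <title> at any depth below the current node, in document order
all_titles = [t.text for t in root.findall(".//title")]
# course[2]: the second <course> child (positions start at 1)
second = root.find("course[2]")
# [@code='CS101']: filter on an attribute value, then step to its <title>
intro = root.find("course[@code='CS101']/title").text

# ElementTree cannot return an attribute node, so read it with .get()
print(all_titles, second.get("code"), intro)
```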

  11. Distributed Databases & Distributed TXs – homogeneous and heterogeneous • See page 689: multiple DBs vs. a distributed DB • Homogeneous distributed DBs • Single unified schema • Designed top down • Distribution by row, by column, by table, or by table selection • Issues of distribution • Redundancy: availability vs. keeping copies up to date • Hidden joins with column distribution • Hidden unions with table selection distribution
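
Splitting a table by row or by column, and the hidden union or join needed to put it back together, can be sketched with plain Python lists. The `employees` table and the site names are hypothetical; a real distributed DBMS would perform these reconstructions inside its query processor.

```python
# Hypothetical Employee table; "id" is the key.
employees = [
    {"id": 1, "name": "Ada", "site": "Boston", "salary": 120},
    {"id": 2, "name": "Lin", "site": "Denver", "salary": 95},
    {"id": 3, "name": "Sam", "site": "Boston", "salary": 88},
]

# Row distribution: each node stores the rows selected for its site.
boston = [r for r in employees if r["site"] == "Boston"]
denver = [r for r in employees if r["site"] == "Denver"]

# Column distribution: each node stores some columns, plus the key.
node_a = [{"id": r["id"], "name": r["name"], "site": r["site"]} for r in employees]
node_b = [{"id": r["id"], "salary": r["salary"]} for r in employees]

# Rebuilding from row fragments requires a hidden UNION ...
rebuilt_rows = sorted(boston + denver, key=lambda r: r["id"])
# ... and rebuilding from column fragments requires a hidden JOIN on the key.
salary_by_id = {r["id"]: r["salary"] for r in node_b}
rebuilt_cols = [{**r, "salary": salary_by_id[r["id"]]} for r in node_a]

print(rebuilt_rows == employees, rebuilt_cols == employees)
```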

  12. Executing distributed transactions • Each node has a master and a client module • Masters are all identical and contain distributed data info • Clients are like single-site databases, with an added prepare-to-commit step • 3 basic strategies for query fragment execution • Bring data to procedure • Send procedure to data • Meet in a 3rd place • Estimating costs • Data shipping • Result shipping • Wait times on nodes • Integrity constraint enforcement
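
The prepare-to-commit step is the first phase of the standard two-phase commit protocol. Below is a minimal in-memory sketch. It assumes hypothetical `Participant` objects standing in for the client modules and a hypothetical `two_phase_commit` function playing the master's coordinating role. It omits logging, timeouts, and crash recovery, all of which a real implementation needs.

```python
class Participant:
    """Hypothetical client module that votes during the prepare phase."""

    def __init__(self, name, can_commit):
        self.name = name
        self.can_commit = can_commit
        self.state = "active"

    def prepare(self):
        # Phase 1: a real node would force its log to disk here;
        # this sketch only records its vote.
        self.state = "prepared" if self.can_commit else "aborted"
        return self.can_commit

    def finish(self, commit):
        # Phase 2: apply the coordinator's global decision.
        self.state = "committed" if commit else "aborted"


def two_phase_commit(participants):
    """Coordinator logic: commit only if every participant votes yes."""
    votes = [p.prepare() for p in participants]
    decision = all(votes)
    for p in participants:
        p.finish(decision)
    return decision


nodes = [Participant("A", True), Participant("B", True)]
ok = two_phase_commit(nodes)

bad = [Participant("A", True), Participant("C", False)]
failed = two_phase_commit(bad)

print(ok, [p.state for p in nodes])
print(failed, [p.state for p in bad])
```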

  13. Heterogeneous distributed databases • Forms of heterogeneity • Model • Schema • Database product • Namespace • Table structure (implications for object identities) • Keys and Foreign keys • Units • SQL dialect • Semantic issues relating to varying interpretations of data

  14. Integrating heterogeneous databases • After the fact • Stability is never achieved • Mappings are complex • Data may have conflicts, redundancy, and gaps • Closed world vs. open world

  15. Engineering for nonstop change • Mediators around databases • Gateways connecting old apps and new databases • Gateways connecting new apps and old databases • A stability of instability
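
A mediator hides the forms of heterogeneity listed earlier (schema, key format, units) behind a single global view. This sketch assumes two hypothetical parts sources, one using pounds with string keys and one using kilograms with integer keys. The mapping functions and the `E-` key prefix are invented for illustration.

```python
# Two hypothetical sources with different column names, key formats and units.
source_us = [{"part_no": "A-1", "weight_lb": 2.0}, {"part_no": "B-7", "weight_lb": 11.0}]
source_eu = [{"id": 1, "mass_kg": 0.5}, {"id": 7, "mass_kg": 3.0}]

LB_TO_KG = 0.45359237  # exact definition of the avoirdupois pound

def from_us(row):
    # Map the US schema onto the global one: keep string keys, convert pounds to kg.
    return {"part": row["part_no"], "kg": round(row["weight_lb"] * LB_TO_KG, 3)}

def from_eu(row):
    # EU keys are bare integers; this sketch (hypothetically) prefixes them with "E-".
    return {"part": f"E-{row['id']}", "kg": row["mass_kg"]}

def mediator(max_kg):
    """Answer one global query by translating each source's rows on the fly."""
    rows = [from_us(r) for r in source_us] + [from_eu(r) for r in source_eu]
    return [r for r in rows if r["kg"] <= max_kg]

print(mediator(1.0))
```

Because translation happens at query time, a source can change its schema and only its mapping function needs updating, which fits the "nonstop change" view of integration.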

  16. OLAP • Standard model • N dimension tables • 1 fact table (PK is the combination of the dimension tables’ keys) • Hypercube visualization • Multidimensional table result visualizations • Star and constellation schemas • Terminology • Drilling down – stepping down nested attributes • Rolling up – moving up nested attributes • Pivot – group by
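
Rolling up and drilling down amount to re-aggregating the fact table at a coarser or finer level of a dimension hierarchy. The store dimension, its city/state hierarchy, and the sales figures below are hypothetical.

```python
from collections import defaultdict

# Hypothetical store dimension: store_id -> (city, state) hierarchy.
store_dim = {1: ("Boston", "MA"), 2: ("Salem", "MA"), 3: ("Austin", "TX")}

# Fact table keyed by (store_id, product_id); product_id would reference a
# product dimension table, omitted here. The measure is sales.
facts = {(1, 10): 50, (1, 20): 30, (2, 10): 20, (3, 10): 40, (3, 20): 10}

def sales_by(level):
    """Total sales grouped at the 'city' or 'state' level of the store hierarchy."""
    totals = defaultdict(int)
    for (store_id, _product_id), amount in facts.items():
        city, state = store_dim[store_id]
        totals[city if level == "city" else state] += amount
    return dict(totals)

print(sales_by("city"))   # finer groups (drilling down)
print(sales_by("state"))  # coarser groups (rolling up)
```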

  17. Specialized operators • Cube operator and 4 equivalent queries • Viewing results • See page 722 • Equivalent – see 723
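
With two dimensions, a cube is the union of four GROUP BY queries, one per subset of the dimensions (including the empty subset). That may be what the slide's "4 equivalent queries" refers to. The sketch below computes those groupings directly. It uses the string `"ALL"` as a hypothetical marker for an aggregated-away dimension (SQL's `GROUP BY CUBE` uses NULL instead), so it assumes no real value is literally `"ALL"`. The rows are hypothetical.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical fact rows: (region, product, sales).
rows = [("East", "Books", 5), ("East", "Music", 3), ("West", "Books", 4)]
dims = ("region", "product")

def cube(rows, dims):
    """Emulate GROUP BY CUBE: one GROUP BY per subset of the dimensions."""
    result = {}
    for k in range(len(dims) + 1):
        for subset in combinations(range(len(dims)), k):
            totals = defaultdict(int)
            for row in rows:
                # Dimensions left out of this grouping are shown as "ALL".
                key = tuple(row[i] if i in subset else "ALL" for i in range(len(dims)))
                totals[key] += row[-1]
            result.update(totals)
    return result

for key, total in sorted(cube(rows, dims).items()):
    print(key, total)
```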

  18. Populating the warehouse • Transformation • Integration • Cleaning
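
The three steps can be sketched as a tiny load routine. The `crm` and `billing` sources, the canonical name/country format, and the policy of dropping rows that lack a country are all hypothetical choices; production warehouses use far more careful matching and repair rules.

```python
# Two hypothetical operational sources feeding one warehouse table.
crm = [{"cust": " alice ", "country": "us"}, {"cust": "Bob", "country": None}]
billing = [{"customer_name": "ALICE", "country_code": "US"},
           {"customer_name": "Carol", "country_code": "ca"}]

def transform(name, country):
    # Transformation: bring values into one canonical format.
    return {"name": name.strip().title(), "country": country.upper() if country else None}

def populate():
    staged = [transform(r["cust"], r["country"]) for r in crm]
    staged += [transform(r["customer_name"], r["country_code"]) for r in billing]
    # Cleaning: drop rows that are missing a required field.
    cleaned = [r for r in staged if r["country"] is not None]
    # Integration: keep one row per customer name, merging duplicates across sources.
    merged = {}
    for r in cleaned:
        merged.setdefault(r["name"], r)
    return sorted(merged.values(), key=lambda r: r["name"])

print(populate())
```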

  19. Data mining • Effectively an open world application • Association, classification, clustering – page 730 • Association – confidence and support – page 731
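
Support and confidence for an association rule such as bread ⇒ milk can be computed directly: support(X) is the fraction of transactions containing every item of X, and confidence(X ⇒ Y) is support(X ∪ Y) / support(X). The baskets below are hypothetical.

```python
# Hypothetical market-basket transactions.
baskets = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]

def support(itemset):
    """Fraction of all transactions that contain every item in itemset."""
    return sum(itemset <= b for b in baskets) / len(baskets)

def confidence(lhs, rhs):
    """Estimated P(rhs | lhs) = support(lhs ∪ rhs) / support(lhs)."""
    return support(lhs | rhs) / support(lhs)

print(support({"bread", "milk"}))        # 2 of 4 baskets
print(confidence({"bread"}, {"milk"}))   # 2 of the 3 bread baskets
```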

More Related