
The NoSQL movement or the dawn of the post-relational age


magda

Presentation Transcript


  1. The NoSQL movement or the dawn of the post-relational age

  2. What is the buzz? Job Trends Search Trends Twitter search

  3. Something for your CV

  4. NoSQL: "Not only SQL" or "No SQL" (no SQL support). Support for the full SQL language imposes constraints on datastores. So does ACID compliance. So does the need for a fixed database schema. Many applications need more specialised datastores. NoSQL is a movement for choice in database architecture. • CouchBase survey • Mike Loukides at O'Reilly - an excellent overview • Polyglot Persistence by Martin Fowler • Wikipedia comparison • nosql-databases.org - a rather terrifying set of resources • Tim Anglade's compilation of interviews

  5. NoSQL is not new • Despite the widespread adoption of the relational data model for business applications, there has always been a wide variety of specialised databases: • Geographic Information Systems - complex spatial relationships - ArcGIS, e.g. BCC KnowYourPlace • OLAP - OnLine Analytical Processing - for analysis of transaction data • Free-text databases, e.g. LexisNexis for legal documents • Multi-dimensional sparse arrays - Pick and MUMPS • Object-oriented databases, e.g. Zope for the Plone CMS • These databases were directed at the need for complex and flexible data structures.

  6. Forces for change • Volume of data - Facebook has over 30 petabytes (30,000 terabytes, or 30 million gigabytes) • Volume of transactions - of the order of 1 million writes/sec • Changeability/flexibility of schema - constant beta • Complexity of data - UK Legislation

  7. Use case: Terabytes of data need to be stored reliably with no schema requirements • Reliability is a big problem when volumes are large. In a farm of, say, 1000 servers, each with 8 spindles, there is a high probability that at least one disk will be down at any time. • Random-access update is too slow - append new data and merge in batch • BigTable from Google • HBase from Apache • Dynamo from Amazon • Doug Cutting on Apache's Hadoop
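The reliability claim on this slide can be checked with a little arithmetic. The sketch below assumes disks fail independently and that any single disk is down 0.1% of the time; that figure is purely illustrative and does not come from the slides.

```python
def prob_at_least_one_down(servers: int, disks_per_server: int, p_disk_down: float) -> float:
    """Probability that at least one disk in the farm is down right now.

    Assumes every disk is down independently with probability p_disk_down.
    """
    total_disks = servers * disks_per_server
    # P(no disk down) = (1 - p) ** n, so P(at least one down) is the complement.
    return 1.0 - (1.0 - p_disk_down) ** total_disks


# 1000 servers with 8 spindles each, as on the slide; 0.001 is an assumed rate.
farm = prob_at_least_one_down(servers=1000, disks_per_server=8, p_disk_down=0.001)
print(f"Chance that at least one of 8000 disks is down: {farm:.2%}")
```

Even with a small per-disk rate, 8000 disks make a failure somewhere in the farm close to certain, which is why these stores replicate data rather than rely on any one disk.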

  8. Use case: Batch data analysis • Where very large transaction datasets need to be filtered and summarised, for example to analyse log files by IP location. In the past these could have been overnight jobs; now they need to be done in minutes at most. • MapReduce is an architecture for large-scale distributed computation. MapReduce should really be called MapMergeReduce. Each MapReduce task is written in Java (or a high-level language like Pig). The framework (such as Hadoop) coordinates the distribution of the map, merge and reduce jobs and the dataflows. • The input is a database of key/value pairs which are split ('sharded') over many spindles on many servers. • The user's map operation runs on every server hosting the shards and transforms each key/value input into zero, one or more key/value outputs. • Merge (shuffle) merges all pairs for the same key and distributes them (e.g. by hashing the keys) to multiple Reduce servers. This too can be user-configurable. • The user's reduce takes each group of values for the same key and produces zero, one or more key/values for each group. • Successive MapMergeReduce operations can be chained together in a pipeline.
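The map, merge and reduce steps above can be sketched on a single machine. This is a minimal illustration in Python, not Hadoop's API: the function names, the log format (IP address as the first field) and the sample lines are all assumptions made for the example, and it counts hits per IP address rather than per location.

```python
from collections import defaultdict


def map_log_line(line_no, line):
    """Map: emit (ip, 1) for each log line; blank lines emit nothing."""
    fields = line.split()
    if fields:
        yield fields[0], 1


def merge(pairs):
    """Merge (shuffle): group all values that share a key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_count(ip, counts):
    """Reduce: collapse each group of values into one total per key."""
    yield ip, sum(counts)


def run_map_merge_reduce(records, mapper, reducer):
    """Run the three phases in sequence over (key, value) input records."""
    mapped = [out for key, value in records for out in mapper(key, value)]
    grouped = merge(mapped)
    return [out for key in sorted(grouped) for out in reducer(key, grouped[key])]


# Hypothetical input: (line number, log line) pairs.
log = [
    (1, "10.0.0.1 GET /index.html"),
    (2, "10.0.0.2 GET /about.html"),
    (3, ""),
    (4, "10.0.0.1 GET /contact.html"),
]
hits_per_ip = run_map_merge_reduce(log, map_log_line, reduce_count)
print(hits_per_ip)
```

In a real cluster the mapper runs next to each shard and the merge step moves data between servers; here all three phases share one process so that the data flow is easy to follow.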

  9. Use case: Document storage and retrieval • Document store • Complex hierarchical documents present problems for storing in a relational database. Every repeated part of the document would be stored in its own table ('shredding'); each repeated part would need to be linked to its parent with a key; reconstructing the document would require multiple joins over data distributed all over the file system. • Platforms: • eXist - open source XML store - query with XQuery • MarkLogic - commercial XML store • CouchDB - JSON store - query with JavaScript • MongoDB - JSON store - Telemetric data processing
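Shredding can be made concrete with a small sketch. The order document, the field names and the two "tables" (plain Python dicts and lists) are invented for illustration; a real relational schema would need one table per repeated part at every level of nesting.

```python
# A hypothetical hierarchical document, as a document store would keep it.
order = {
    "id": 1,
    "customer": "Ann",
    "items": [
        {"sku": "A1", "qty": 2},
        {"sku": "B7", "qty": 1},
    ],
}


def shred(doc):
    """Split the document into a parent row and keyed child rows."""
    parent_row = {"id": doc["id"], "customer": doc["customer"]}
    item_rows = [
        # Each repeated part carries a foreign key back to its parent.
        {"order_id": doc["id"], "position": pos, **item}
        for pos, item in enumerate(doc["items"])
    ]
    return parent_row, item_rows


def reconstruct(parent_row, item_rows):
    """Rebuild the document: the equivalent of a join on order_id."""
    children = sorted(
        (row for row in item_rows if row["order_id"] == parent_row["id"]),
        key=lambda row: row["position"],
    )
    items = [
        {k: v for k, v in row.items() if k not in ("order_id", "position")}
        for row in children
    ]
    return {**parent_row, "items": items}


parent, items = shred(order)
assert reconstruct(parent, items) == order
```

One level of nesting already needs a foreign key, an ordering column and a join; a document store avoids this by storing and retrieving the document as a single unit.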

  10. Use case: Fast put/get of keyed data • Key-value store • Where complex data is to be stored but the database is not interested in its internal structure, for example session data, user profiles or shopping carts. • The only operations are: value = store.get(key); store.put(key, value); store.delete(key) • Platforms: Project Voldemort, Rhino
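The three operations on this slide are enough to define the whole interface. Below is a minimal in-memory sketch; the class name and the JSON-encoded session value are assumptions for illustration and do not reflect the API of any particular platform.

```python
import json


class KeyValueStore:
    """In-memory key-value store: values are opaque bytes to the store."""

    def __init__(self):
        self._data = {}

    def put(self, key, value):
        """Store value under key, replacing any previous value."""
        self._data[key] = value

    def get(self, key):
        """Return the value for key, or None if the key is absent."""
        return self._data.get(key)

    def delete(self, key):
        """Remove key if present; deleting a missing key is not an error."""
        self._data.pop(key, None)


store = KeyValueStore()
# The application, not the store, decides how the value is structured.
session = {"user": "ann", "cart": ["A1", "B7"]}
store.put("session:42", json.dumps(session).encode("utf-8"))

value = store.get("session:42")
restored = json.loads(value.decode("utf-8"))
store.delete("session:42")
```

Because the store never looks inside the value, it cannot query by user or cart contents; that trade-off is what makes put/get by key fast and easy to distribute.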

  11. Use case: Page Caching • Key-value cache • Where the generation of a page takes a significant time, it is better to cache the pages as key/value pairs where the key is a URI and the value is the HTML page. • As much of the cache as possible is kept in RAM for rapid access. • Issues: cache flushing • For example, this site views summarised data from an eXist document store: AidView • Platforms: Memcached
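A page cache keyed by URI can be sketched as follows. This is a single-process illustration with least-recently-used eviction, not Memcached's API; the class name, the `render` callback and the eviction policy are assumptions chosen to show the caching and flushing issues in miniature.

```python
from collections import OrderedDict


class PageCache:
    """Cache rendered HTML by URI, evicting the least recently used page."""

    def __init__(self, max_pages, render):
        self._max_pages = max_pages
        self._render = render          # slow function: URI -> HTML
        self._pages = OrderedDict()    # URI -> HTML, oldest first

    def get(self, uri):
        if uri in self._pages:
            self._pages.move_to_end(uri)      # mark as recently used
            return self._pages[uri]
        html = self._render(uri)              # cache miss: generate the page
        self._pages[uri] = html
        if len(self._pages) > self._max_pages:
            self._pages.popitem(last=False)   # evict least recently used
        return html

    def flush(self, uri):
        """Drop a page whose underlying data has changed."""
        self._pages.pop(uri, None)


render_calls = []


def slow_render(uri):
    """Stand-in for an expensive page generation step."""
    render_calls.append(uri)
    return f"<html><body>Report for {uri}</body></html>"


cache = PageCache(max_pages=2, render=slow_render)
cache.get("/report/a")
cache.get("/report/a")   # served from the cache, no second render
cache.flush("/report/a")
cache.get("/report/a")   # rendered again after the flush
```

Deciding when to call `flush` is the hard part in practice: the cache has to be told whenever the data behind a page changes.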

  12. Use case: Linked data • Graph Database • Where data is composed of simple, highly interrelated facts. For example, there is an RDF version of Wikipedia called DBpedia. • Some use generally available databases such as MySQL, but the specific form of the data and of the queries on it suggests native stores: • Triple (usually quad) stores to support RDF - Jena, Sesame, Virtuoso - query with SPARQL. RDF has a rigid data model, [graph] subject - predicate - object, and is widely used for linked data • Custom graph stores - Neo4j - non-standard interfaces
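The subject - predicate - object model is simple enough to sketch directly. The example below keeps triples in a Python set and matches them against a pattern in which `None` acts as a wildcard, loosely in the spirit of a SPARQL triple pattern; the facts and the `match` function are invented for illustration and are not the API of Jena, Sesame or Virtuoso.

```python
# Each fact is one (subject, predicate, object) triple; the data is illustrative.
triples = {
    ("Bristol", "locatedIn", "England"),
    ("Bath", "locatedIn", "England"),
    ("England", "partOf", "UnitedKingdom"),
    ("Bristol", "population", "large"),
}


def match(store, subject=None, predicate=None, obj=None):
    """Return all triples matching the pattern; None matches anything."""
    return sorted(
        t for t in store
        if (subject is None or t[0] == subject)
        and (predicate is None or t[1] == predicate)
        and (obj is None or t[2] == obj)
    )


# "Which things are located in England?"
in_england = [s for s, _, _ in match(triples, predicate="locatedIn", obj="England")]

# "What do we know about Bristol?"
about_bristol = match(triples, subject="Bristol")
print(in_england, about_bristol)
```

A native triple store indexes the three positions so that such patterns do not require a full scan, and a quad store adds a fourth position naming the graph each triple belongs to.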

  13. XML/XQuery for graphs • tutorial for using Neo4j to compute relationships in a graph • Friends relationship • Some friends as XML • a bit of XQuery • The knows relationship expanded • Permissions • People • Roles • a bit of XQuery • People and permissions • Shortest Path is difficult - Dijkstra's algorithm is tricky to implement in functional languages
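For comparison with the functional case, here is a minimal imperative sketch of Dijkstra's algorithm in Python using a priority queue. The "knows" graph and its edge weights are invented for illustration; with all weights equal to 1 the result is simply the path with the fewest hops.

```python
import heapq


def shortest_path(graph, start, goal):
    """Dijkstra's algorithm: return (cost, path) or None if goal is unreachable.

    graph maps each node to a dict of {neighbour: non-negative edge weight}.
    """
    queue = [(0, start, [start])]   # (cost so far, node, path taken)
    visited = set()
    while queue:
        cost, node, path = heapq.heappop(queue)
        if node == goal:
            return cost, path
        if node in visited:
            continue                # already settled via a cheaper route
        visited.add(node)
        for neighbour, weight in graph.get(node, {}).items():
            if neighbour not in visited:
                heapq.heappush(queue, (cost + weight, neighbour, path + [neighbour]))
    return None


# Hypothetical weighted "knows" relationships (lower weight = closer tie).
knows = {
    "alice": {"bob": 1, "carol": 4},
    "bob": {"carol": 1, "dave": 5},
    "carol": {"dave": 1},
    "dave": {},
}
result = shortest_path(knows, "alice", "dave")
print(result)
```

The algorithm leans on a mutable priority queue and a mutable visited set, which is part of why it is awkward to express in a functional language such as XQuery.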

  14. Dan McCreary's Overview • The CIO's Guide to NoSQL

  15. Risks • Lack of standardisation • New technology • Design cul-de-sac - requirements change • Lack of available developer skills • RDBMSs like Oracle and SQL Server are changing too - but are just getting more complex. • A dissenting view - warning - NSFW
