The NoSQL movement or the dawn of the post-relational age

What is the buzz?
• Job Trends
• Search Trends
• Twitter search

Something for your CV

NoSQL
• Not only SQL, or No SQL - no SQL support
• Support for the full SQL language imposes constraints on datastores. So does ACID compliance. So does the need for a fixed database schema.
• Many applications need more specialised datastores.
• A movement for choice in database architecture
• CouchBase survey
• Mike Loukides at O'Reilly - an excellent overview
• Polyglot Persistence by Martin Fowler
• Wikipedia Comparison
• nosql-databases.org - a rather terrifying set of resources
• Tim Anglade's compilation of interviews

NoSQL is not new
• Despite the widespread adoption of the relational data model for business applications, there have always been a wide variety of specialised databases:
• Geographic Information Systems - complex spatial relationships - ArcGIS, e.g. BCC KnowYourPlace
• OLAP - OnLine Analytic Processing - for analysis of transaction data
• Free-text databases, e.g. LexisNexis for legal documents
• Multi-dimensional sparse arrays - Pick and MUMPS
• Object-oriented databases, e.g. ZOPE for the Plone CMS
• These databases were directed at the need for complex and flexible data structures.

Forces for change
• Volume of data - Facebook has over 30 petabytes - 30,000 terabytes or 30 million gigabytes
• Volume of transactions - of the order of 1 million writes/sec
• Changeability/flexibility of schema - constant beta
• Complexity of data - UK Legislation

Use case: Terabytes of data need to be stored reliably with no schema requirements
• Reliability is a big problem when volumes are large. In a farm of, say, 1000 servers, each with 8 spindles, there is a high probability that at least one disk will be down at any time.
• Random-access update is too slow - append new data and merge in batch
• BigTable from Google
• HBase from Apache
• Dynamo from Amazon
• Doug Cutting on Apache's Hadoop
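The reliability claim for a large farm can be checked with a little arithmetic. The sketch below assumes that disks fail independently and that each disk is down with the same probability at any moment. The figure of 0.001 is purely illustrative and does not come from the slides, and the function name is hypothetical.

```python
def prob_at_least_one_down(servers, spindles_per_server, p_disk_down):
    """Probability that at least one disk in the farm is down.

    Assumes independent failures with the same per-disk probability,
    which is a simplification of real hardware behaviour.
    """
    disks = servers * spindles_per_server
    # P(no disk down) = (1 - p)^disks, so take the complement.
    return 1.0 - (1.0 - p_disk_down) ** disks


# 1000 servers with 8 spindles each, as in the slide.
# p_disk_down = 0.001 is an assumed, illustrative value.
p = prob_at_least_one_down(1000, 8, 0.001)
print(f"P(at least one disk down) = {p:.4f}")
```

Even with a small per-disk probability, the 8000 disks make a failure somewhere in the farm close to certain, which is why these stores are designed around replication.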

Use case: Batch data analysis
• Where very large transaction datasets need to be filtered and summarised, for example to analyse log files by IP location. In the past these could have been overnight jobs; now they need to be done in minutes at most.
• Map-Reduce is an architecture for large-scale distributed computation. MapReduce should really be called MapMergeReduce. Each MapReduce task is written in Java (or a high-level language like Pig). The framework (such as Hadoop) coordinates the distribution of the map, merge and reduce jobs and the dataflows.
• The input is a database of key/value pairs which are split ('sharded') over many spindles on many servers.
• The user's map operation runs on every server hosting the shards and transforms each key/value input into zero, one or more key/value outputs.
• Merge (shuffle) merges all pairs for the same key and distributes them (e.g. by hashing the keys) to multiple Reduce servers. This too can be user-configurable.
• The user's reduce takes each group of values for the same key and produces zero, one or more key/values for each group.
• Successive MapMergeReduce operations can be chained together in a pipeline.
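The map, merge and reduce steps can be sketched in a single process. This is an illustration only, not Hadoop's API: all function names are hypothetical, the log format (client IP as the first field) is an assumption, and the lookup from IP to location is left out, so the job simply counts requests per IP address.

```python
from collections import defaultdict


def map_log_line(key, line):
    """Map: turn one (offset, line) input into zero or one (ip, 1) outputs."""
    fields = line.split()
    if fields:  # blank lines produce no output
        yield fields[0], 1


def merge(pairs):
    """Merge (shuffle): group all values that share the same key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_counts(key, values):
    """Reduce: collapse each group of values into one (key, total) output."""
    yield key, sum(values)


def map_merge_reduce(records, mapper, reducer):
    """Run the three phases in sequence on an in-memory list of records."""
    mapped = [out for key, value in records for out in mapper(key, value)]
    grouped = merge(mapped)
    return dict(out for key, values in grouped.items()
                for out in reducer(key, values))


log = [
    (0, "10.0.0.1 GET /a"),
    (1, "10.0.0.2 GET /b"),
    (2, "10.0.0.1 GET /c"),
    (3, ""),
]
hits = map_merge_reduce(log, map_log_line, reduce_counts)
print(hits)
```

In a real cluster the map calls run on the servers holding the shards and the merge step moves data between machines. Here all three phases run in one loop so that the dataflow is visible.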

Use case: Document storage and retrieval
• Document store
• Complex hierarchical documents present problems for storage in a relational database. Every repeated part of the document would be stored in its own table (shredding); each repeated part would need to be linked to its parent with a key; reconstructing the document would require multiple joins over data distributed all over the file system.
• Platforms:
• eXist - open source XML store - query with XQuery
• MarkLogic - commercial XML store
• CouchDb - JSON store - query with JavaScript
• MongoDb - JSON store - Telemetric data processing
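The cost of shredding can be seen in a small sketch that uses plain Python dictionaries in place of tables. The order document, its field names and both functions are hypothetical, and the "join" is a simple filter over the child rows rather than a real SQL join.

```python
# A hierarchical document with one repeated part ("lines").
order = {
    "id": 1,
    "customer": "Ann",
    "lines": [{"sku": "A1", "qty": 2}, {"sku": "B7", "qty": 1}],
}


def shred(doc):
    """Split the document into a parent row and keyed child rows."""
    parent = {"id": doc["id"], "customer": doc["customer"]}
    children = [
        # Each repeated part carries a key back to its parent
        # and a position so that the original order can be restored.
        {"order_id": doc["id"], "pos": pos, **line}
        for pos, line in enumerate(doc["lines"])
    ]
    return parent, children


def reconstruct(parent, children):
    """Rebuild the document by joining child rows to the parent key."""
    lines = [
        {"sku": child["sku"], "qty": child["qty"]}
        for child in sorted(children, key=lambda c: c["pos"])
        if child["order_id"] == parent["id"]
    ]
    return {**parent, "lines": lines}


parent_row, line_rows = shred(order)
print(reconstruct(parent_row, line_rows) == order)
```

A document store avoids both steps by storing and returning the document whole. With deeper nesting, each further level of repetition would add another table and another join.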

Use case: Fast put/get of keyed data
• Key-value store
• Where complex data is to be stored but the database is not interested in the internal structure, for example session data, user profiles, shopping carts
• The only operations are:
  value = store.get(key)
  store.put(key, value)
  store.delete(key)
• Platforms: Project Voldemort, Rhino
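The three operations above can be sketched as an in-memory store. The class name is hypothetical and this is not the API of any of the platforms listed. Values are serialised to bytes with JSON as one possible way of showing that the store treats them as opaque.

```python
import json


class KeyValueStore:
    """Minimal in-memory key-value store; values are opaque byte strings."""

    def __init__(self):
        self._data = {}

    def put(self, key, value):
        # Serialise the value; the store never looks inside it again.
        self._data[key] = json.dumps(value).encode("utf-8")

    def get(self, key):
        blob = self._data.get(key)
        return None if blob is None else json.loads(blob.decode("utf-8"))

    def delete(self, key):
        # Deleting a missing key is treated as a no-op.
        self._data.pop(key, None)


store = KeyValueStore()
store.put("session:42", {"user": "ann", "cart": ["A1", "B7"]})
print(store.get("session:42"))
store.delete("session:42")
print(store.get("session:42"))
```

Because the store cannot query on the contents of a value, any lookup other than by key has to be done by the application.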

Use case: Page Caching
• Key-value cache
• Where the generation of a page takes a significant time, it is better to cache the pages as key/value pairs where the key is a URI and the value is the HTML page.
• As much of the cache as possible is kept in RAM for rapid access
• Issues: cache flushing
• For example, this site views summarised data from an eXist document store: AidView
• Platforms: Memcached
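A page cache keyed by URI can be sketched as follows. The class and the render function are hypothetical, and the eviction policy (least recently used, with a fixed page limit) is an assumption chosen for illustration rather than a description of how Memcached manages memory.

```python
from collections import OrderedDict


class PageCache:
    """In-memory cache mapping a URI to its rendered HTML page."""

    def __init__(self, max_pages, render):
        self.max_pages = max_pages
        self.render = render          # slow function: uri -> html
        self.pages = OrderedDict()    # oldest entry first

    def get(self, uri):
        if uri in self.pages:
            self.pages.move_to_end(uri)      # mark as recently used
            return self.pages[uri]
        html = self.render(uri)              # cache miss: generate the page
        self.pages[uri] = html
        if len(self.pages) > self.max_pages:
            self.pages.popitem(last=False)   # evict least recently used
        return html

    def flush(self, uri=None):
        """Drop one page, or the whole cache when no URI is given."""
        if uri is None:
            self.pages.clear()
        else:
            self.pages.pop(uri, None)


rendered = []


def slow_render(uri):
    rendered.append(uri)  # record each real page generation
    return f"<html><body>{uri}</body></html>"


cache = PageCache(max_pages=2, render=slow_render)
cache.get("/report")
cache.get("/report")      # served from the cache, no second render
print(rendered)
```

The flush method is where the cache-flushing issue shows up: when the underlying data changes, something has to decide which URIs are now stale.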

Use case: Linked data
• Graph database
• Where data is composed of simple, highly interrelated facts. For example, there is an RDF version of Wikipedia called DBpedia.
• Some use available databases such as MySQL, but the specific form of the data and of the queries on the data suggests native stores:
• Triple (usually quad) stores to support RDF - Jena, Sesame, Virtuoso - query with SPARQL. RDF has a rigid data model: [graph] subject - predicate - object, and is widely used for linked data
• Custom graph stores - Neo4J - non-standard interfaces
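The subject - predicate - object model can be illustrated with a tiny in-memory triple store. This is a sketch only: the class is hypothetical, the sample facts are made up for the example, and pattern matching with None as a wildcard stands in for a SPARQL query rather than implementing one.

```python
class TripleStore:
    """Holds (subject, predicate, object) facts and matches simple patterns."""

    def __init__(self):
        self.triples = set()

    def add(self, subject, predicate, obj):
        self.triples.add((subject, predicate, obj))

    def match(self, subject=None, predicate=None, obj=None):
        """Return all triples that fit the pattern; None matches anything."""
        return sorted(
            t for t in self.triples
            if (subject is None or t[0] == subject)
            and (predicate is None or t[1] == predicate)
            and (obj is None or t[2] == obj)
        )


facts = TripleStore()
facts.add("Bristol", "locatedIn", "England")
facts.add("Bath", "locatedIn", "England")
facts.add("Bristol", "type", "City")

# Everything located in England.
print(facts.match(predicate="locatedIn", obj="England"))
```

A quad store adds a fourth element, the graph, to each fact so that statements from different sources can be kept apart.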

XML/XQuery for graphs
• Tutorial for using Neo4j to compute relationships in a graph
• Friends relationship
• Some friends as XML
• A bit of XQuery
• The knows relationship expanded
• Permissions
• People
• Roles
• A bit of XQuery
• People and permissions
• Shortest path is difficult - Dijkstra's algorithm is tricky to implement in functional languages
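For comparison with the functional setting, here is a minimal imperative sketch of Dijkstra's algorithm over a "knows" graph. It is written in Python, not XQuery, and is not taken from the tutorial. The people, the edge weights and the function name are all hypothetical, and weights are assumed to be non-negative.

```python
import heapq


def shortest_path(graph, start, goal):
    """Dijkstra's algorithm.

    graph maps each node to a dict of neighbour -> non-negative weight.
    Returns (cost, path) for the cheapest route, or None if unreachable.
    """
    queue = [(0, start, [start])]   # priority queue ordered by cost so far
    settled = set()                 # nodes whose shortest cost is final
    while queue:
        cost, node, path = heapq.heappop(queue)
        if node == goal:
            return cost, path
        if node in settled:
            continue                # a cheaper route was already processed
        settled.add(node)
        for neighbour, weight in graph.get(node, {}).items():
            if neighbour not in settled:
                heapq.heappush(
                    queue, (cost + weight, neighbour, path + [neighbour])
                )
    return None


knows = {
    "alice": {"bob": 1, "carol": 1},
    "bob": {"dave": 1},
    "carol": {"dave": 1, "erin": 1},
    "dave": {"erin": 1},
    "erin": {},
}
print(shortest_path(knows, "alice", "erin"))
```

The algorithm leans on a mutable priority queue and a mutable set of settled nodes, which is the state that has to be threaded through recursive calls in a functional language.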

Dan McCreary's Overview
• The CIO's Guide to NoSQL

Risks
• Lack of standardisation
• New technology
• Design cul-de-sac - requirements change
• Lack of available developer skills
• RDBMSs like Oracle and SQL Server are changing too - but just get more complex
• A dissenting view - warning - NSFW
