Schema-less databases

Really? In actuality, there is no such thing as a schema-less database.
In a relational database, the schema is explicit and is created separately, in advance.
In a column-based database, we create a fresh schema for each row, and in fact we often reuse schema fragments from rows that are grouped together. The same is true for document databases.
In column-based and document databases, users query the data directly, based on the schema.
In graph-based databases, we are in essence building the schema as we build the data.
Perhaps we could say that a key-value database has no schema, but in truth the application must be coded to look for and interpret schematic information.
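The key-value point can be made concrete with a minimal sketch. It assumes a plain Python dict standing in for a key-value store, and the function and field names (put_customer, get_customer_email, name, email) are hypothetical. The store only sees opaque bytes; the schema lives entirely in the application code that encodes and decodes them.

```python
import json

# A plain dict stands in for a key-value store: string keys, opaque byte values.
# (Assumption for illustration; a real store would be a separate service.)
store = {}

def put_customer(key, name, email):
    # The "schema" (field names and their meaning) exists only in this code.
    store[key] = json.dumps({"name": name, "email": email}).encode("utf-8")

def get_customer_email(key):
    # The store hands back bytes; the application decodes and interprets them.
    record = json.loads(store[key].decode("utf-8"))
    return record["email"]

put_customer("customer:1", "Ada", "ada@example.com")
email = get_customer_email("customer:1")
```

If the field name changes in put_customer but not in get_customer_email, the store will not complain; only the application will fail.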

Schema updates
In a relational database, it is almost always a big deal to change a schema.
In “schema-less” databases, the idea is to make it as easy as possible, so that we can:
- dynamically keep structural information up to date, because today this sort of information changes frequently;
- keep the database online, although this does not always work, or we at least have to pull part of it offline;
- count on the structural information of other objects to remain current, because we can surgically control exactly which objects have their schemas changed.

The schema-less approach & consequences
The general idea with schema-less databases is:
- to treat metadata like data, as much as possible;
- to allow much more individuality for each object.
Interesting side effects of this idea:
- The database can hold much more varied forms of data.
- Data from a schema-less database could be extracted, interpreted by the application, and then structured and stored in a relational database when necessary.
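The last side effect, extracting varied data and giving it a fixed structure, might look like the sketch below. It is only an illustration under stated assumptions: the sample documents and the customer table are invented, and Python's built-in sqlite3 module stands in for whatever relational database is actually used.

```python
import sqlite3

# Documents as they might come out of a schema-less store: fields vary per object.
documents = [
    {"id": 1, "name": "Ada", "email": "ada@example.com"},
    {"id": 2, "name": "Grace", "phone": "555-0100"},
]

# In-memory SQLite stands in for the relational target (assumption).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE customer ("
    "id INTEGER PRIMARY KEY, name TEXT NOT NULL, email TEXT, phone TEXT)"
)

for doc in documents:
    # The application interprets each document; missing fields become NULL.
    conn.execute(
        "INSERT INTO customer (id, name, email, phone) VALUES (?, ?, ?, ?)",
        (doc["id"], doc["name"], doc.get("email"), doc.get("phone")),
    )
conn.commit()

rows = conn.execute(
    "SELECT id, name, email, phone FROM customer ORDER BY id"
).fetchall()
```

The individuality of each object does not disappear; it is flattened into nullable columns, which is the price of the fixed schema.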

Language-related factors
1. In a schema-less database, the boundary between the database and the application is lower, as much of the query/update code is written in a conventional language.
2. Or perhaps we could say that the boundary is higher, because much more complex and rich things can be done to the data directly in the database.
But perhaps the deciding factor is that in a schema-less database we don’t have many of the amenities, such as full ACID transactions, that a relational database would have, and so 1 above is closer to the truth.

Problems with schema-less approach
If there is no explicit schema, it can be difficult to know what to change in the application when some of the data changes format, as code in many places will be doing its own data interpretation.
If updates and queries are written in a general-purpose language, it can be harder to isolate the code that needs to be changed within the database-level code.
By contrast, in a relational database, queries are fairly declarative.
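A small sketch of the first problem, with invented record formats and function names: two pieces of application code each interpret an order record on their own, the data format changes, and only one of them gets updated.

```python
# Old records store the price as dollars; newer records (a hypothetical format
# change) store an integer number of cents under a different field name.
old_order = {"item": "pen", "price": 1.50, "qty": 2}
new_order = {"item": "pen", "price_cents": 150, "qty": 2}

def order_total(order):
    # This function was updated and understands both formats.
    if "price_cents" in order:
        return order["price_cents"] * order["qty"] / 100
    return order["price"] * order["qty"]

def is_expensive(order):
    # Written earlier and never updated: it still assumes the old format.
    return order["price"] > 100

total_old = order_total(old_order)
total_new = order_total(new_order)

try:
    is_expensive(new_order)
    stale_code_failed = False
except KeyError:
    # Nothing in the database flagged the mismatch; it surfaces only at run time.
    stale_code_failed = True
```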

The term “migrations”
This refers to the evolution of schema information during the life cycle of the applications that use it.
In a relational database this is a big deal, but it is explicit.
In a schema-less database, we can better support incremental change.
The term is also used in MVC-based web development environments to refer to the indirect creation of schema components during the development of a web app.
Perhaps the best way to look at this term is philosophically: we want to migrate schemas, not operate in an endless offline-online loop.
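One way to picture incremental change is to upgrade each object the first time it is read, rather than taking the database offline to convert everything at once. The sketch below is an assumption-laden illustration, not a method prescribed by these slides: the schema_version field, the upgrade and read functions, and the name-splitting rule are all hypothetical, and a dict stands in for the store.

```python
def upgrade(doc):
    """Return a copy of doc upgraded to (hypothetical) schema version 2."""
    doc = dict(doc)
    version = doc.get("schema_version", 1)
    if version == 1:
        # Version 1 stored a single "name"; version 2 splits it into two fields.
        first, _, last = doc.pop("name").partition(" ")
        doc["first_name"] = first
        doc["last_name"] = last
        doc["schema_version"] = 2
    return doc

# A dict stands in for the document store; objects of both versions coexist.
store = {
    "c1": {"name": "Ada Lovelace"},
    "c2": {"schema_version": 2, "first_name": "Grace", "last_name": "Hopper"},
}

def read(key):
    doc = upgrade(store[key])
    store[key] = doc  # write back so each object is upgraded only once
    return doc

c1 = read("c1")
```

Objects that are never read stay in the old format, so the reading code has to keep the upgrade path around for as long as old objects may exist.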

Maintaining backward compatibility
We could create new objects, or new versions of objects, in order to be assured that applications can use the database as it was.
In a graph database, we could add new edges but not delete old ones.
In fact, we could view both data and metadata this way, and have an ever-growing database.
This is not as absurd as it might sound: for legal and business reasons, we often need to keep old data.
We can push old data off to faraway clusters.
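The "new versions, never delete" idea can be sketched as an append-only store. This is a minimal illustration with hypothetical names (versions, put, get) and an in-memory structure; it shows only that older applications can keep reading the version they were written against.

```python
from collections import defaultdict

# Each key maps to the full list of versions ever written; nothing is overwritten.
versions = defaultdict(list)

def put(key, value):
    versions[key].append(value)
    return len(versions[key])  # 1-based version number of the new entry

def get(key, version=None):
    history = versions[key]
    # Newer applications read the latest version; older ones pin a version number.
    return history[-1] if version is None else history[version - 1]

v1 = put("item:7", {"price": 10})
v2 = put("item:7", {"price": 12, "currency": "USD"})

latest = get("item:7")
original = get("item:7", version=1)
```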

Reasons for using a schema
Encapsulation gives us a structure that can serve as the scope of an operation.
We rely on structure as a differentiator so we can reuse data and retarget data:
- No structure: bits
- Minimal structure: textual documents
- Modest structure: relational tables
- Medium structure: business objects
- High structure: CAD
- Extreme structure: photos, video, audio, language

Assignment 4
You will build an application using PostgreSQL and Cassandra.
The application will consist of a handful of operations that you will perform on each database; you can run your operations manually and have no app.
PostgreSQL will hold your schema-based, tabular data.
Cassandra will hold your schema-variable data.
There will be two tables in PostgreSQL:
- The first holds customers who are buying items: key for customer, customer name, and item purchased for each row (a foreign key referencing the primary key of the second table).
- The second holds the items for purchase: key for item, price for item.
Cassandra will hold the buying history of each customer:
- what items were purchased;
- how many of each item;
- the price paid for all of the instances of a given item (prices can change over time).
This is due at the beginning of class on Feb. 25.
