.NET Database Technologies: Using NoSQL databases
NoSQL – “Not only SQL” • Alternatives to the ubiquitous relational database which may be superior in specific application scenarios • Object-oriented databases (ODBMS) • They came, they saw, they.... • ...didn’t conquer, but they are still around • NoSQL databases • The new kids on the block • General term applied to a range of different non-relational database systems • Largely emerging to meet the needs of large-scale Web 2.0 applications
Object-oriented databases • ODBMSs use the same data model as object-oriented programming languages • no object-relational impedance mismatch due to a uniform model • An object database combines the features of an object-oriented language and a DBMS (language binding) • treat data as objects • object identity • attributes and methods • relationships between objects • extensible type hierarchy • inheritance, overloading and overriding as well as customised types
ODBMS history • Object Database Manifesto • Paper published in 1989 (Atkinson et al.) • Some ODBMS products • Early 1990s: Gemstone, Objectivity • Late 1990s: Versant, ObjectStore, Poet, Matisse • 2000s: db4o, Caché • ODMG (Object Data Management Group) • 1993: ODMG 1.0 standard • 1997: ODMG 2.0 • 1999: ODMG 3.0, then ODMG disbanded • 2005: ODMG reformed, working towards new standard
ODMG • Object Database Management Group (ODMG) founded in 1991 • standardisation body including all major ODBMS vendors • Define a standard to increase the portability across different ODBMS products • Mirroring the SQL standard for RDBMS • Object Model • Object Definition Language (ODL) • Object Query Language (OQL) • language bindings • C++, Smalltalk and Java bindings
Characteristics of ODBMS • Support complex data models with no mapping issues • Tight integration with an object-oriented programming language (persistent programming language) • High performance in suitable application scenarios • Different products scale from small-footprint embedded databases (e.g. db4o) to large-scale highly-concurrent systems (e.g. Versant V/OD)
Persistence patterns and ODBMS • Some of Fowler’s patterns are specific to the use of a relational database, e.g. • Data Mapper • Foreign Key Mapping • Metadata Mapping • Single Table Inheritance, etc. • Some are not specific to the data storage model and are relevant when using an ODBMS, e.g. • Identity Map • Unit of Work • Repository • Lazy Load
db4o • Open-source object-database engine • Now owned by Versant • Complements their own V/OD product • Can be used in embedded or client-server modes • Embed in application simply by including DLLs • Native object database • Stores .NET (or Java) objects directly with no special requirements on classes • Other ODBMSs (e.g. V/OD) require classes to be marked as persistent through bytecode manipulation and also store class definitions • Tight integration with application, but trade-off in limited ad-hoc querying and reporting • Can replicate data to relational database if required
IObjectContainer • IObjectContainer interface is implemented by objects which provide access to the database • IObjectContainer is roughly equivalent to EF ObjectContext • Unit of Work pattern if transparent persistence is enabled (see later) • Can access DB in embedded mode (direct file access) or client-server mode (local or remote) • IObjectServer instance required in client-server mode • IObjectContainer instances created by factory classes, e.g. Db4oEmbedded • Queries on IObjectContainer return IObjectSet (except LINQ queries)
Viewing data and ad-hoc querying • ObjectManager Enterprise • Visual Studio plug-in • Browsing and drag-and-drop queries • LINQPad • Need to include db4o DLLs and namespaces for stored classes • Executes LINQ queries and visualises results
db4o query APIs • Query-by-example (QBE) • Very limited - no comparisons, ranges, etc. • Simple Object Data Access (SODA) • Build query by navigating graph and adding constraints to nodes • Native Queries • Expressed completely in programming language • Type-safe • Optimised to SODA query at runtime if possible • LINQ • .NET version, not in Java (obviously)
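To make the limits of query-by-example concrete, here is a minimal sketch of the matching rule in plain Python. It does not use the db4o API: the Employee class, the query_by_example helper and the in-memory store are hypothetical names introduced for illustration, and None stands in for a field left at its default value in the template.

```python
from dataclasses import dataclass, fields
from typing import Optional


@dataclass
class Employee:
    # None plays the role of an "unset" field in a template object.
    name: Optional[str] = None
    department: Optional[str] = None
    salary: Optional[int] = None


def query_by_example(store, template):
    """Return stored objects that equal the template on every field it sets."""
    constraints = {
        f.name: getattr(template, f.name)
        for f in fields(template)
        if getattr(template, f.name) is not None
    }
    return [
        obj
        for obj in store
        if type(obj) is type(template)
        and all(getattr(obj, key) == value for key, value in constraints.items())
    ]


store = [
    Employee("Fernando", "Sales", 30000),
    Employee("Aisha", "Sales", 42000),
    Employee("Lars", "IT", 42000),
]

# Only equality on set fields can be expressed: there is no way to ask for
# "salary greater than 40000" with a template object.
sales = query_by_example(store, Employee(department="Sales"))
everyone = query_by_example(store, Employee())
```

Because the template can only say "this field equals that value", comparisons and ranges need one of the other query APIs listed on the slide.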
Activation • Objects are stored in DB as an object graph • If db4o is configured to cascade-on-activate (eager loading) then retrieving one object could potentially load a large number of related objects • Fixed activation depth limits depth of traversal of graph when retrieving objects • Default value is 5 • Can then explicitly activate related objects when needed • Lazy loading can be configured with transparent activation • Classes need to be “instrumented” at build time by running Db4oTool.exe • Code injected into assembly so that classes implement IActivatable interface
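The effect of a fixed activation depth can be sketched as a depth-limited walk over the object graph. This is an illustration in plain Python rather than db4o code: the Node class, its activated flag and the activate function are assumptions made for the example.

```python
class Node:
    """A stored object; `activated` stays False until its fields are loaded."""

    def __init__(self, name):
        self.name = name
        self.children = []
        self.activated = False


def activate(obj, depth):
    """Load obj and its related objects, following references `depth` levels deep."""
    if depth <= 0:
        return
    obj.activated = True
    for child in obj.children:
        activate(child, depth - 1)


# A chain of references: a -> b -> c -> d
a, b, c, d = (Node(n) for n in "abcd")
a.children.append(b)
b.children.append(c)
c.children.append(d)

# Retrieving `a` with activation depth 2 loads a and b only.
activate(a, 2)
after_query = [n.activated for n in (a, b, c, d)]

# c is activated explicitly when the application needs it; d is still unloaded.
activate(c, 1)
after_explicit = [n.activated for n in (a, b, c, d)]
```

The depth counter also guarantees that the walk terminates on cyclic graphs, which is one reason a fixed depth is a safer default than unrestricted eager loading.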
Update depth • Similar considerations apply to updates • Storing an updated object could cause unnecessary updates to related objects • Fixed update depth limits depth of traversal of graph when storing objects • Default value is 1 • Can configure transparent persistence which allows changes to be tracked • Only changed objects are updated in database • Behaves like change tracking in, for example, Entity Framework • Unit of Work
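The change-tracking behaviour described here follows the Unit of Work pattern, which can be sketched as below. This is a simplified model in plain Python, not how db4o or Entity Framework implement it: the UnitOfWork class, the dictionary used as a database and the snapshot comparison are all assumptions chosen to keep the example short.

```python
class UnitOfWork:
    """Tracks loaded objects and writes back only the ones that changed."""

    def __init__(self, database):
        self._database = database  # dict: key -> stored record (a dict)
        self._tracked = {}         # key -> (live object, snapshot taken at load)

    def load(self, key):
        # Identity Map: the same key always yields the same live object.
        if key not in self._tracked:
            obj = dict(self._database[key])
            self._tracked[key] = (obj, dict(obj))
        return self._tracked[key][0]

    def commit(self):
        """Write changed objects to the database and return their keys."""
        written = []
        for key, (obj, snapshot) in list(self._tracked.items()):
            if obj != snapshot:
                self._database[key] = dict(obj)
                self._tracked[key] = (obj, dict(obj))
                written.append(key)
        return written


database = {
    1: {"name": "Fernando", "salary": 30000},
    2: {"name": "Aisha", "salary": 42000},
}
uow = UnitOfWork(database)
fernando = uow.load(1)
aisha = uow.load(2)

fernando["salary"] = 32000       # only this object is modified
first_commit = uow.commit()      # writes key 1, leaves key 2 untouched
second_commit = uow.commit()     # nothing changed since the last commit
```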
Persistence ignorance (PI)? • Stores POCOs without any need for mapping, so yes • Transparent Activation requires that classes implement a specific interface • But this is done at build time so domain classes don’t need any specific code • Has parallels with dynamic proxies in ORMs: • Objects are instances of domain classes, which have been modified ‘under the hood’ at build-time • Compare with dynamic proxy classes, which derive from domain classes and are created ‘under the hood’ at run-time
Further reading • www.odbms.org • Resource portal • Db4o Tutorial • included in product download • The Definitive Guide to db4o (Apress)
NoSQL databases • New breed of databases that are appearing largely in response to the limitations of existing relational databases • Typically: • Support massive data storage (petabyte+) • Distribute storage and processing across multiple servers • Contrast in architecture and priorities compared to relational databases • Hence term NoSQL • “Not only SQL” – absence of SQL is not a requirement
NoSQL features • Wide variety of implementations, but some features are common to many of them: • Schema-less • Shared-nothing architecture • Elasticity • Sharding and asynchronous replication • BASE, not ACID • Basically Available • Soft state • Eventually consistent
MapReduce • Algorithm for dividing a work load into units suitable for parallel processing • Useful for queries against large sets of data: the query can be distributed to hundreds or thousands of nodes, each of which works on a subset of the target data • The results are then merged together, ultimately yielding a single “answer” to the original query • Example: get total word count of a large number of documents • Map: calculate word count of each document • Each node works on a subset of the overall data set • Results emitted to intermediate storage • Reduce: calculate total of intermediate results
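The word-count example can be sketched in a few lines of Python. The sketch runs in a single process, so the "nodes" are just sub-lists of documents; the sample documents, the partitioning and the function names are made up for illustration, and a real system would run the map step on separate machines.

```python
from functools import reduce


def map_word_count(document):
    """Map step: word count of a single document."""
    return len(document.split())


def reduce_total(left, right):
    """Reduce step: combine two intermediate counts."""
    return left + right


documents = [
    "the quick brown fox",
    "jumps over",
    "the lazy dog",
]

# Each "node" is given a subset of the overall data set.
partitions = [documents[:2], documents[2:]]

# Map: every node emits one intermediate count per document it holds.
intermediate = [
    map_word_count(doc)
    for node_documents in partitions
    for doc in node_documents
]

# Reduce: merge the intermediate results into a single answer.
total = reduce(reduce_total, intermediate, 0)
```

Because addition is associative, the intermediate counts can be combined in any order, which is what allows the reduce step to merge results arriving from many nodes.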
Brewer’s CAP theorem • Can optimize for only two of three priorities in a distributed database: • Consistency • All clients have same view of the data • Requires atomicity, transaction isolation • Availability • Every request received by a non-failing node must result in a response • Partition Tolerance • Partitions happen if certain nodes can’t communicate • No set of failures less than total network failure is allowed to cause the system to respond incorrectly
Implications of CAP theorem • Any two properties can be achieved • CP • If messages between nodes are lost then system waits • Possible that no response returned at all • No inconsistent data returned to client • CA • No partitions, system will always respond and data is consistent • AP • Response always returned even if some messages between nodes are lost • Different nodes may have different views of the data
Implications of CAP theorem • Choose a database whose priorities match the application http://blog.nahurst.com/visual-guide-to-nosql-systems
Using a NoSQL database in a .NET application • Application typically makes connection to remote cluster • Some (but not many) NoSQL databases are supported by native .NET clients • Handle “mapping” from .NET objects to data model • Many NoSQL databases are accessed through a REST interface • Application must construct request and handle response format, e.g. JSON • Application can be written in any suitable language • Azure Table Storage is Microsoft’s NoSQL storage for cloud-based applications • However the data is accessed, you need to understand the data model, which will be significantly different from a typical relational database or object model
NoSQL database types and examples • Key/value Databases • These manage a simple value or row, indexed by a key • e.g. Voldemort, Vertica • Big table Databases • “a sparse, distributed, persistent multidimensional sorted map” • e.g. Google BigTable, Azure Table Storage, Amazon SimpleDB • Document Databases • Multi-field documents (or objects) with JSON access • e.g. MongoDB, RavenDB (.NET specific), CouchDB • Graph Databases • Manage nodes, edges, and properties • e.g. Neo4j, sones
MongoDB • Scalable, high-performance, open source, document-oriented database • Stores JSON-style (actually BSON) documents with dynamic schema • Replication, high-availability and auto-sharding • Supports document-based queries and map/reduce • Command line tools: • mongod – starts server as a service or daemon • mongo – client shell • Store documents defined as JSON • Retrieved documents from query displayed as JSON
MongoDB and HTTP • Admin console at http://<server name>:28017 • REST interface on the same port, http://<server name>:28017 • Enabled by starting server with mongod --rest • Server responds to RESTful HTTP requests, e.g. • http://127.0.0.1:28017/company/Employee/?filter_Name=Fernando • Response is in JSON format • Could be consumed by client-side code in Ajax application
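A client of this interface mostly has to assemble the request URL and parse the JSON that comes back. The sketch below covers only the first half, assuming the /database/collection/?filter_<field>=<value> layout shown in the example URL; build_query_url is a hypothetical helper, and no request is actually sent.

```python
from urllib.parse import urlencode


def build_query_url(host, database, collection, filters, port=28017):
    """Build a query URL of the form used in the slide's example."""
    query = urlencode(
        {f"filter_{field}": value for field, value in filters.items()}
    )
    return f"http://{host}:{port}/{database}/{collection}/?{query}"


url = build_query_url("127.0.0.1", "company", "Employee", {"Name": "Fernando"})
```

urlencode also percent-encodes values that contain spaces or other reserved characters, which a hand-built string would miss.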
MongoDB .NET driver • Can access documents as instances of Document class • Represents document as key-value pairs • Or, can serialize POCOs to database format (BSON) • Deserialize database documents to POCOs • Supports LINQ queries • MapReduce queries can be expressed as LINQ queries
MongoDB schema design • Collections are essentially named groupings of documents • Roughly equivalent to relational database tables • Less "normalization" than a relational schema because there are no server-side joins • Generally, you will want one database collection for each of your top level objects • Don’t want a collection for every "class" - instead, embed objects within their parent document
Document example • Save: • Query: http://www.10gen.com/video/mongosv2010/schemadesign
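As an illustration of the embedding advice, here is a hypothetical document modelled as plain Python data. It is not the example from the original slide or the linked talk, and it does not use a MongoDB driver: the post fields, the posts list standing in for a collection and the find helper are all invented for the sketch.

```python
import json

# One top-level object (a blog post) with its comments embedded,
# instead of a separate "comments" collection joined at query time.
post = {
    "_id": 1,
    "title": "Using NoSQL databases",
    "author": "alice",
    "tags": ["nosql", "dotnet"],
    "comments": [
        {"author": "bob", "text": "Nice overview"},
        {"author": "carol", "text": "What about graph databases?"},
    ],
}

posts = []  # stands in for a collection


def save(collection, document):
    """Store a copy of the document after a round trip through JSON."""
    collection.append(json.loads(json.dumps(document)))


def find(collection, **criteria):
    """Return documents whose top-level fields equal all given criteria."""
    return [
        doc
        for doc in collection
        if all(doc.get(field) == value for field, value in criteria.items())
    ]


save(posts, post)
by_author = find(posts, author="alice")
```

A single lookup returns the post together with its comments, which is the trade-off the schema design slide describes: fewer collections and no joins, at the cost of designing the object model around the document shape.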
MongoDB in C# applications - PI? • Up to a point • Collection class needs Id property of a specific type (MongoDB.Oid) • Object model needs to be designed with document schema in mind
Further reading • http://nosql-database.org/ • http://www.nosqlpedia.com/ • http://www.mongodb.org/ • http://www.codeproject.com/KB/database/MongoDBCS.aspx • Nice code example for C# and MongoDB