1 / 25

NoSQL Databases : MongoDB vs Cassandra

NoSQL Databases : MongoDB vs Cassandra . Introduction. What is a Database? “… a repository with organized and structured data, … “ ( Abramova & Bernardino, 2013-07) Data can be accessed using DBMS ( DataBase Management System) What is DBMS?

china
Download Presentation

NoSQL Databases : MongoDB vs Cassandra

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. NoSQL Databases:MongoDB vs Cassandra

  2. Introduction • What is a Database? • “… a repository with organized and structured data, … “ (Abramova & Bernardino, 2013-07) • Data can be accessed using DBMS (DataBase Management System) • What is DBMS? • “DBMS can be defined as a collection of mechanisms that enables storage, edit and extraction of data” (Abramova & Bernardino, 2013-07)

  3. SQL • SQL: Structured Query Language • Became standard for: • Data interaction • Data manipulation • Data Stored as set of tables • Accessing data from different tables at the same time is possible.

  4. NoSQL • Carlo Strozzi presented NoSQL in 1980, back then, it refers to an open source database that didn’t use SQL interface. • Carlo Strozzi preferred to call it “noseequel” or “NoRel” • Principle Difference • Popular after San Francisco conference held 2009 • Why do we need NoSQL? • In SQL ,efficiency in information extraction is affected by the growth of data stored & used

  5. CAP theorem • Based from CAP theorem, the following guarantees can be defined: • Consistency • Availability • Partition tolerance • CAP theorem derives Relational and NoSQLprinciples

  6. ACID • “ACID is a principle based on CAP theorem and used as set of rules for relational database transactions.“ (Abramova & Bernardino, 2013-07) • ACID guarantees: • Atomic • Consistent • Isolated • Durable • What if the amount of data is large? • ACID may be hard to accomplish!

  7. BASE Principle & NoSQL • BASE principle: • Basically Available • Soft state • Eventually consistent • BASE still follows CAP theorem. • Two of the three guarantees should be selected if the system is distributed.

  8. Types of NoSQLDatabases • More than 150 different NoSQLdatabases • Based on same principles • Has some different characteristics. • Categories: • Key-value Store • Document Store • Column-family • Graph database

  9. Key-value store • Data is stored as a group of key and value • All keys are unique • Data Access is done by relating those keys to values • Hash contains all keys in order to provide information when needed

  10. Document Store • Databases are defined as set of Key-value stores that gets transformed into documents. • Each document is identified by unique key • Data access can be done using: • key • specific value

  11. Column Family • Similar to relational database model • Structure: • Column • Super-Column • Column family • Structure of database is defined by super-columns and column families. • Data access is accomplished by specifying column family, key and column in order to get value, using following structure: • <columnFamily>.<key>.<column> = <value>

  12. Graph database • Those databases are used when data can be represented as graph, for example, social networks.

  13. MONGODB • “MongoDB is an open source NoSQL database developed in C++” (Abramova & Bernardino, 2013-07). • MongoDBis a document store database • Documents are gathered into groups according to their structure • CAP theorem • Consistency • Partition tolerance

  14. MONGODB (Cont.) • Description • Data is sent to disc every 60 seconds. • Everything is flushed to disc once new files are created • Each document is identified by “id” field • An index for the “id” field is created • Characteristics • Durability • Concurrency

  15. MongoDB Characteristics • Durability • Durability of data is accomplished by the creation of replicas. • Master-Slave technique • Master: read & write • Slave: read • Slave with recent data becomes Master if the Master goes down • Replicas are asynchronous • Concurrency • Locks

  16. CASSANDRA • “Cassandra is a NoSQL database developed by Apache Software Foundation; written in Java” (Abramova & Bernardino, 2013-07) • Similar to the usual relational model • Difference is that stored data can be: • semi structured • unstructured. • CAP theorem • Partition tolerance • High Availability • Designed to save large amount of data and deal with huge volumes in an efficient way.

  17. CASSANDRA (Cont.) • Peer-to-peer architecture (NO MASTER) • High availability • High scalability • Replicates data over multiple nodes in a cluster. • Replication Factor: Total number of replicas. • RF(1): 1 copy of each row on 1 node • RF(2): 2 copies of same records on 2 nodes • Fail nodes are replaced with no downtime, and they are detected using “gossip” protocols

  18. CASSANDRA (Cont.) • Replication Strategy: • Simple: single data center • Network Topology: multiple data centers • Cassandra Characteristics: • Durability: • Two replication types: • Synchronous • Asynchronous • All writes & redundancies are known using a commit log. • Indexing: • “Each node maintains the indexes of the table it manages” • Data is manipulated using CQL

  19. YCSB • “The YCSB – Yahoo! Cloud Serving Benchmark is one of the most used benchmarks to test NoSQLdatabases” (Abramova & Bernardino, 2013-07). • YCSB has a client that consists of two parts: • Workload generator • Set of workloads. • Workloads are combinations of: • read • Write • update operations are done on randomly chosen records.

  20. Workload A: 50%reads & 50% updates Abramova, V., & Bernardino, J. (2013-07). NoSQL Databases: MongoDB vs Cassandra. 19

  21. Workload b: 95% Reads & 5%updates Abramova, V., & Bernardino, J. (2013-07). NoSQL Databases: MongoDB vs Cassandra. 20

  22. Workload C: 100% reads Abramova, V., & Bernardino, J. (2013-07). NoSQL Databases: MongoDB vs Cassandra. 20

  23. Workload f: Read-Modify-Write Abramova, V., & Bernardino, J. (2013-07). NoSQL Databases: MongoDB vs Cassandra. 20

  24. Workload G: 5% reads 95% updates Abramova, V., & Bernardino, J. (2013-07). NoSQL Databases: MongoDB vs Cassandra. 20

  25. Workload H: 100% updates Abramova, V., & Bernardino, J. (2013-07). NoSQL Databases: MongoDB vs Cassandra. 21

More Related