
Marlon Louise Parco | Emmanuel Alarillo | Alex Rabago | Ezekiel Contreras | Bayani Sagisag


Presentation Transcript


  1. Marlon Louise Parco | Emmanuel Alarillo | Alex Rabago | Ezekiel Contreras | Bayani Sagisag

  2. SQL has ruled for two decades: store persistent data, application integration, mostly standard, concurrency control, reporting.

  3. but SQL’s dominance is cracking. Relational databases are designed to run on a single machine, so to scale, you need to buy a bigger machine. But it’s cheaper and more effective to scale horizontally by buying lots of machines.
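Horizontal scaling needs a way to decide which machine owns which key. A common technique (not named on the slides) is consistent hashing; the sketch below is illustrative, with hypothetical node names, and shows keys spreading across a cluster that can grow node by node.

```python
import hashlib
from bisect import bisect

class HashRing:
    """Minimal consistent-hashing sketch: each node gets many points
    on a ring, and a key belongs to the first point at or after its hash."""

    def __init__(self, nodes, replicas=100):
        self.replicas = replicas
        self.ring = []  # sorted list of (hash, node) points
        for node in nodes:
            self.add(node)

    def _hash(self, key):
        return int(hashlib.md5(key.encode()).hexdigest(), 16)

    def add(self, node):
        # Adding a node only remaps the keys that fall on its new points,
        # so the cluster can grow without reshuffling everything.
        for i in range(self.replicas):
            self.ring.append((self._hash(f"{node}:{i}"), node))
        self.ring.sort()

    def node_for(self, key):
        h = self._hash(key)
        idx = bisect(self.ring, (h,)) % len(self.ring)
        return self.ring[idx][1]

ring = HashRing(["db-1", "db-2", "db-3"])
placement = {k: ring.node_for(k) for k in ("user:1", "user:2", "order:9")}
```

Each key deterministically lands on one of the nodes, which is why "buying lots of machines" can substitute for buying a bigger one.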

  4. but SQL’s dominance is cracking. The machines in these large clusters are individually unreliable, but the cluster as a whole keeps working even as machines die, so the overall system is reliable. The “cloud” is exactly this kind of cluster, which means relational databases don’t play well with the cloud. The rise of web services also provides an effective alternative to shared databases for application integration, making it easier for different applications to choose their own data storage.

  5. Big Data: so now we have NoSQL databases. There is no formal definition of NoSQL, but there are some common characteristics: schemaless, programmer-friendly, highly available, highly scalable, low-latency.
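"Schemaless" means each record carries its own fields, with no table definition declared up front. The toy in-memory store below is purely illustrative (not any real product): note how the second document adds a field without any schema change.

```python
import itertools

class DocumentStore:
    """Toy schemaless document store: documents are plain dicts,
    and no field layout is declared in advance."""

    def __init__(self):
        self._docs = {}
        self._ids = itertools.count(1)

    def insert(self, doc):
        doc_id = next(self._ids)
        self._docs[doc_id] = dict(doc)
        return doc_id

    def find(self, **criteria):
        # Return documents whose fields equal every given criterion.
        return [d for d in self._docs.values()
                if all(d.get(k) == v for k, v in criteria.items())]

db = DocumentStore()
db.insert({"name": "Ada", "role": "engineer"})
# Extra field "tags" on this document only -- no ALTER TABLE needed:
db.insert({"name": "Li", "role": "engineer", "tags": ["nosql", "web"]})
engineers = db.find(role="engineer")
```

This flexibility is what makes NoSQL stores programmer-friendly for data whose shape varies from record to record.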

  6. Examples include… object stores, document stores, XML DBs, key-value stores, and BigTable-style stores.

  7. Let’s compare!

The popular ones

MongoDB
Written in: C++. Main point: Retains some friendly properties of SQL (query, index). License: AGPL (drivers: Apache). Protocol: Custom, binary (BSON). Best used: If you need dynamic queries. If you prefer to define indexes, not map/reduce functions. If you need good performance on a big DB. If you wanted CouchDB, but your data changes too much, filling up disks. For example: Most things that you would do with MySQL or PostgreSQL, but where having predefined columns really holds you back.

Riak
Written in: Erlang & C, some JavaScript. Main point: Fault tolerance. License: Apache. Protocol: HTTP/REST or custom binary. Best used: If you want Dynamo-like data storage, but there’s no way you’re going to deal with the bloat and complexity. If you need very good single-site scalability, availability and fault tolerance, but you’re ready to pay for multi-site replication. For example: Point-of-sales data collection. Factory control systems. Places where even seconds of downtime hurt. Could be used as a well-updateable web server.

CouchDB
Written in: Erlang. Main point: DB consistency, ease of use. License: Apache. Protocol: HTTP/REST. Best used: For accumulating, occasionally changing data, on which pre-defined queries are to be run. Places where versioning is important. For example: CRM, CMS systems. Master-master replication is an especially interesting feature, allowing easy multi-site deployments.

Redis
Written in: C/C++. Main point: Blazing fast. License: BSD. Protocol: Telnet-like. Best used: For rapidly changing data with a foreseeable database size (should fit mostly in memory). For example: Stock prices. Analytics. Real-time data collection. Real-time communication. And wherever you used memcached before.

Clones of Google’s Bigtable

HBase
Written in: Java. Main point: Billions of rows x millions of columns. License: Apache. Protocol: HTTP/REST (also Thrift). Best used: Hadoop is probably still the best way to run Map/Reduce jobs on huge datasets; best if you use the Hadoop/HDFS stack already. For example: Search engines. Analysing log data. Any place where scanning huge, two-dimensional join-less tables is a requirement.

Cassandra
Written in: Java. Main point: Best of BigTable and Dynamo. License: Apache. Protocol: Thrift & custom binary CQL3. Best used: When you write more than you read (logging). If every component of the system must be in Java. (“No one gets fired for choosing Apache’s stuff.”) For example: Banking, financial industry (though not necessarily for financial transactions; these industries are much bigger than that). Writes are faster than reads, so one natural niche is data analysis.

Hypertable
Written in: C++. Main point: A faster, smaller HBase. License: GPL 2.0. Protocol: Thrift, C++ library, or HQL shell. Best used: If you need a better HBase. For example: Same as HBase, since it’s basically a replacement: search engines, analysing log data, any place where scanning huge, two-dimensional join-less tables is a requirement.

Accumulo
Written in: Java and C++. Main point: A BigTable with cell-level security. License: Apache. Protocol: Thrift. Best used: If you need a different HBase. For example: Same as HBase, since it’s basically a replacement: search engines, analysing log data, any place where scanning huge, two-dimensional join-less tables is a requirement.

Special-purpose

Neo4j
Written in: Java. Main point: Graph database - connected data. License: GPL, some features AGPL/commercial. Protocol: HTTP/REST (or embedding in Java). Best used: For graph-style, rich or complex, interconnected data; Neo4j is quite different from the others in this sense. For example: Searching routes in social relations, public transport links, road maps, or network topologies.

ElasticSearch
Written in: Java. Main point: Advanced search. License: Apache. Protocol: JSON over HTTP (plugins: Thrift, memcached). Best used: When you have objects with (flexible) fields and you need “advanced search” functionality. For example: A dating service that handles age difference, geographic location, tastes and dislikes, etc. Or a leaderboard system that depends on many variables.

The “long tail” (not widely known, but definitely worthy ones)

Couchbase (ex-Membase)
Written in: Erlang & C. Main point: Memcache compatible, but with persistence and clustering. License: Apache. Protocol: memcached + extensions. Best used: Any application where low-latency data access, high concurrency support and high availability are requirements. For example: Low-latency use cases like ad targeting, or highly concurrent web apps like online gaming (e.g. Zynga).

VoltDB
Written in: Java. Main point: Fast transactions and rapidly changing data. License: GPL 3. Protocol: Proprietary. Best used: Where you need to act fast on massive amounts of incoming data. For example: Point-of-sales data analysis. Factory control systems.

Kyoto Tycoon
Written in: C++. Main point: A lightweight network DBM. License: GPL. Protocol: HTTP (TSV-RPC or REST). Best used: When you want to choose the backend storage engine very precisely. When speed is of the essence. For example: Caching server. Stock prices. Analytics. Real-time data collection. Real-time communication. And wherever you used memcached before.

Scalaris
Written in: Erlang. Main point: Distributed P2P key-value store. License: Apache. Protocol: Proprietary & JSON-RPC. Best used: If you like Erlang and wanted to use Mnesia or DETS or ETS, but you need something that is accessible from more languages (and scales much better than ETS or DETS). For example: In an Erlang-based system when you want to give access to the DB to Python, Ruby or Java programmers.
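A recurring use case above is "wherever you used memcached before": a cache whose entries expire after a time-to-live. The sketch below is a toy, illustrative version of that idea (not Redis or Kyoto Tycoon themselves); the injected clock is an assumption added to make the behaviour easy to demonstrate.

```python
import time

class TTLCache:
    """Toy cache with expiry: each entry is kept only for a fixed
    time-to-live, then lazily evicted on the next read."""

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock        # injectable clock, for testing/demos
        self._data = {}           # key -> (value, expires_at)

    def set(self, key, value):
        self._data[key] = (value, self.clock() + self.ttl)

    def get(self, key, default=None):
        entry = self._data.get(key)
        if entry is None:
            return default
        value, expires_at = entry
        if self.clock() >= expires_at:
            del self._data[key]   # stale: evict and report a miss
            return default
        return value

# Demo with a fake clock so expiry is deterministic:
now = [0.0]
cache = TTLCache(ttl_seconds=5.0, clock=lambda: now[0])
cache.set("price:AAPL", 180)
fresh = cache.get("price:AAPL")
now[0] = 6.0                      # advance past the TTL
stale = cache.get("price:AAPL")
```

Real caching servers add networking, memory limits and eviction policies on top, but the set/get-with-expiry contract is the core of the "stock prices, analytics, real-time data" examples cited for Redis and Kyoto Tycoon.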

  8. So this means we can… reduce development drag and embrace large scale. But this does not mean relational is dead: • The relational model is still relevant • ACID transactions • Tools • Familiarity

  9. this leads us to a world of Polyglot Persistence • Using multiple data storage technologies, chosen based upon the way data is being used by individual applications. • Polyglot persistence will occur over the enterprise as different applications use different data storage technologies. • It will also occur within a single application as different parts of an application’s data store have different access characteristics.
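Polyglot persistence within a single application can be sketched as routing each kind of data to the store whose access characteristics fit it. Everything below is an in-memory stand-in with hypothetical names; a real deployment might use, say, a key-value store for sessions, a document store for the catalog, and a relational database for orders.

```python
class App:
    """Sketch of one application using three different storage styles,
    each chosen for how that data is accessed."""

    def __init__(self):
        self.session_store = {}  # stand-in for a key-value store: fast, ephemeral
        self.catalog_store = []  # stand-in for a document store: flexible fields
        self.order_log = []      # stand-in for a relational store: transactional records

    def login(self, user, token):
        self.session_store[token] = user

    def add_product(self, doc):
        self.catalog_store.append(doc)

    def place_order(self, user, product):
        self.order_log.append({"user": user, "product": product})

app = App()
app.login("ada", "tok-1")
app.add_product({"sku": "N-1", "name": "widget", "colors": ["red"]})
app.place_order("ada", "N-1")
```

The point is the routing, not the stand-ins: sessions need low latency and can be lost, catalog entries vary in shape, and orders need durable, consistent records.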

  10. What might Polyglot look like? Polyglot Persistence provides lots of new opportunities for enterprises

  11. The Future is: NoSQL Databases Polyglot Persistence Thank you!
