
Column-based dbs


kapono




Presentation Transcript


  1. Column-based dbs BigTable, HBase, SimpleDB, and Cassandra

  2. But first, the third assignment • This is due on Monday, the 18th, by the beginning of class • As with the first assignment, contact the grader when you are done • Build a Neo4j database with the Neo4j web GUI (localhost:7474) and Cypher and/or Gremlin • Note that the Console tab gives access to the documentation • The Console tab also gives access to Gremlin • You can use either Cypher or Gremlin (or both) to do your assignment

  3. 3rd assignment, continued • Your Neo4j databases • Model customer sites and service personnel • Use at least 15 sites and 6 personnel • Each site is a node • Each service person is a node • As calls come in • a property is created for the given site that describes the nature of the problem • a person is assigned to the site (and a relationship is made) • Each site node has a property that specifies the nature of its problem • Each person has a property that specifies the sorts of problems he/she can solve

  4. 3rd assignment, continued • Support the following operations • Creating a site • Creating a service person • Assigning a problem property to a site; a site can have many of these • Assigning a specialty to a service person; a person can have many of these • Assigning a person to a site • Removing a problem property and the relationship that corresponds to it • Removing a site • Removing a service person • Anything you want to add…

  5. Column-based DBs • BigTable • First notable column-based DB • No schema • Sparse tables, i.e., empty columns are not stored • Groups (or families) of columns stored together

  6. Basic concepts • First column is a key • Column structure is next • Group of columns • We can select all or a given column • Idea is that the group is often accessed together • Generally, new columns can be added to a row at run time, but new families might require going offline
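
The basic concepts above can be sketched as a nested map: row key, then column family, then column. This is only a toy model in Python, not BigTable's or HBase's actual storage format; the class `SparseTable` and the family and column names are made up for illustration.

```python
class SparseTable:
    """Toy model: row key -> column family -> column name -> value."""

    def __init__(self, families):
        # Assumption for this sketch: families are fixed when the table is made.
        self.families = set(families)
        self.rows = {}

    def put(self, row_key, family, column, value):
        if family not in self.families:
            raise KeyError(f"unknown column family: {family}")
        # New columns can be added to any row at run time.
        # Nothing at all is stored for columns a row does not have.
        row = self.rows.setdefault(row_key, {})
        row.setdefault(family, {})[column] = value

    def get_family(self, row_key, family):
        """Select all columns of one family for a row."""
        return dict(self.rows.get(row_key, {}).get(family, {}))

    def get(self, row_key, family, column):
        """Select one given column; None if the row does not have it."""
        return self.rows.get(row_key, {}).get(family, {}).get(column)


table = SparseTable(families=["contact", "billing"])
table.put("site-1", "contact", "phone", "555-0100")
table.put("site-1", "contact", "email", "ops@example.com")
table.put("site-2", "contact", "phone", "555-0199")  # no email column here

print(table.get_family("site-1", "contact"))
print(table.get("site-2", "contact", "email"))
```

Keeping a family's columns in one inner map mirrors the idea that the group is usually read together.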

  7. Cassandra: columns and rows • Basic unit of data • A column is a name-value pair, the value is atomic • The name is a key • Each pair has a timestamp • Used to manage update conflicts and old data • A row • Is a collection of columns associated with a row key • This is a larger grained key – for a row, not a column • A collection of similar rows is a column family
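
As a rough illustration of why each name-value pair carries a timestamp, the sketch below keeps whichever write to a column has the newer timestamp. It is a simplified last-write-wins rule in plain Python, not Cassandra code; `Column` and `apply_write` are hypothetical names, and ties are simply resolved in favor of the value already stored.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Column:
    name: str       # the column key
    value: str      # atomic value
    timestamp: int  # used to order conflicting updates


def apply_write(row, column):
    """Store the column in the row unless a newer version is already there."""
    current = row.get(column.name)
    if current is None or column.timestamp > current.timestamp:
        row[column.name] = column
    return row


# A row is a collection of columns associated with a row key.
row = {}
apply_write(row, Column("status", "open", 100))
apply_write(row, Column("status", "closed", 200))
apply_write(row, Column("status", "pending", 150))  # older than 200, ignored

print(row["status"].value)
```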

  8. Cassandra: standard and super columns, and keyspaces • If the columns in a family are simple, it is a standard column family • The rows in a column family do not have to have the same structure • You can add columns to rows without having to do it to other rows in the family • A super column is a pair consisting of a name and a value, where the value is another map of columns • Standard and super column families are kept in keyspaces, essentially, this is a database
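
One way to picture the difference between standard and super column families is as nested Python dictionaries. The keyspace, family, and column names below are invented for illustration, and timestamps are left out to keep the shape visible.

```python
# Standard column family: row key -> {column name: value}.
# Rows in the same family do not need the same columns.
standard_family = {
    "site-1": {"city": "Boulder", "phone": "555-0100"},
    "site-2": {"city": "Denver"},
}

# Super column family: row key -> {super column name: {column name: value}}.
# The value of each super column is itself a map of columns.
super_family = {
    "site-1": {
        "address": {"street": "1 Main St", "city": "Boulder"},
        "contact": {"name": "Pat", "phone": "555-0100"},
    },
}

# A keyspace holds the column families, much like a database holds tables.
keyspace = {"Sites": standard_family, "SiteDetails": super_family}


def get_subcolumn(keyspace, family, row_key, super_column, name):
    """Walk keyspace -> family -> row -> super column -> column."""
    return keyspace[family][row_key][super_column][name]


print(get_subcolumn(keyspace, "SiteDetails", "site-1", "contact", "name"))
```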

  9. Cassandra: updates and reads • Updates • The update is first written to the commit log • It then goes to an in-memory store called the memtable • At this point the write has succeeded • Writes are batched in memory and later flushed to on-disk structures called SSTables • Variable consistency • Consistency level ONE is the default for reads: we get the answer of the first replica, even if it is stale • A read repair then updates stale replicas, so subsequent reads get the newest value • Good for high read throughput
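
The update path on this slide (commit log, then memtable, then SSTable) can be sketched for a single node as follows. This is a toy in-memory model, not how Cassandra is implemented; the class `Node` and the `flush_threshold` parameter are assumptions made for the example, and the "SSTables" here are just Python dictionaries rather than files on disk.

```python
class Node:
    def __init__(self, flush_threshold=3):
        self.commit_log = []   # append-only record of every write
        self.memtable = {}     # in-memory store for recent writes
        self.sstables = []     # flushed tables, oldest first
        self.flush_threshold = flush_threshold  # hypothetical tuning knob

    def write(self, key, value):
        self.commit_log.append((key, value))  # 1. log the update
        self.memtable[key] = value            # 2. apply it in memory
        if len(self.memtable) >= self.flush_threshold:
            self.flush()                      # 3. batch out to an SSTable
        return True                           # the write counts as successful

    def flush(self):
        # Write the memtable out sorted by key and start a fresh one.
        # The flushed table is never modified again in this sketch.
        self.sstables.append(dict(sorted(self.memtable.items())))
        self.memtable = {}

    def read(self, key):
        # Newest data first: memtable, then SSTables from newest to oldest.
        if key in self.memtable:
            return self.memtable[key]
        for table in reversed(self.sstables):
            if key in table:
                return table[key]
        return None


node = Node(flush_threshold=2)
node.write("a", 1)
node.write("b", 2)  # memtable reaches the threshold and is flushed
node.write("a", 3)  # newer value for "a" lives in the new memtable

print(node.read("a"), node.read("b"), len(node.sstables))
```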

  10. Cassandra: writes • Consistency level ONE for a write means • The write goes to one node's commit log and is confirmed to the user • Some writes might be lost if they are not propagated to other replicas • Quorum consistency • For a read, a majority of replicas must respond • The value with the newest timestamp is returned • Replicas without the most recent version are brought up to date by a read repair • A write has to be propagated to a majority of replicas before it succeeds and the client is notified
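
A quorum read with read repair might look roughly like the sketch below. It is an illustration in plain Python, not Cassandra's implementation: each replica is modeled as a dictionary of `key -> (timestamp, value)`, and the function assumes the first majority of replicas in the list are the ones that respond.

```python
def quorum_size(replica_count):
    """Smallest number of replicas that forms a majority."""
    return replica_count // 2 + 1


def quorum_read(replicas, key):
    """Return the newest value seen by a majority and repair stale responders."""
    needed = quorum_size(len(replicas))
    responders = replicas[:needed]  # assumption: these replicas answer first
    answers = [r[key] for r in responders if key in r]
    if not answers:
        return None
    newest = max(answers, key=lambda pair: pair[0])  # highest timestamp wins
    for replica in responders:
        if replica.get(key) != newest:
            replica[key] = newest  # read repair of a stale responder
    return newest[1]


replicas = [
    {"status": (200, "closed")},  # up to date
    {"status": (100, "open")},    # stale, will respond and be repaired
    {"status": (100, "open")},    # stale, not contacted in this read
]
value = quorum_read(replicas, "status")

print(value)
print(replicas[1]["status"])
```

In this sketch only the replicas that took part in the read are repaired, so the third replica stays stale until some later read or write reaches it.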

  11. Cassandra writes, continued • The consistency level All • All nodes must respond to a read or write • This is very sensitive to nodes being down • Notes • A single application can use varying levels of consistency • Uses a distributed cluster model • No node in a cluster is a master
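
To see why the All level is sensitive to nodes being down, it helps to count how many replicas each level needs. The helper names below are made up for this example, and the level names are written as the strings "ONE", "QUORUM", and "ALL".

```python
def required_responses(level, replica_count):
    """Number of replicas that must respond at a given consistency level."""
    if level == "ONE":
        return 1
    if level == "QUORUM":
        return replica_count // 2 + 1  # a majority
    if level == "ALL":
        return replica_count
    raise ValueError(f"unknown consistency level: {level}")


def can_proceed(level, replica_count, nodes_up):
    """True if enough replicas are reachable for the operation to succeed."""
    return nodes_up >= required_responses(level, replica_count)


# Three replicas, one of them down.
for level in ("ONE", "QUORUM", "ALL"):
    print(level, can_proceed(level, replica_count=3, nodes_up=2))
```

With one of three replicas down, ONE and QUORUM operations still go through while ALL fails, which is one reason an application might pick different levels for different operations.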

  12. Cassandra: transactions • Transactions • Cannot perform a series of reads and writes and then decide whether to abort • But there are apparently third-party libraries that can be used to create true atomic transactions • Writes are atomic at the row level • So a column insertion or update is a single write that succeeds or fails • These transaction libraries can be used to coordinate reads and writes

  13. Cassandra: query language • First, set your keyspace • Query language • Basic Get, Set, Delete operations • Create a column family • Set column value • Get a column value or values • Delete column family • Delete column • There are SQL-like commands • SQL like set queries • We can create indices on both row keys and column keys

  14. Applications of Cassandra • Content management systems • Blogging systems

  15. Installing Cassandra • Go to: http://cassandra.apache.org/download/ • Download and un-compress • Look at: http://wiki.apache.org/cassandra/GettingStarted • Go to the cassandra folder • Run bin/cassandra -f • On my Mac, I needed to use sudo • I also had to create the cassandra folders listed in the GettingStarted instructions • Try running bin/cassandra-cli (command line interface)

  16. Or to get it with a GUI • Go to: http://blog.shelan.org/2012/06/cassandra-gui-20-making-things-little.html • Run wso2server.sh (or bat) • Go to https://localhost:9443 • Log in at https://your-ip-address:9443/services (NOT localhost)

  17. Another choice • Go to: http://www.datastax.com/resources/articles/getting-started-with-apache-cassandra • Install • Run it • Go to: http://localhost:8888/opscenter/index.html • To explore example db: http://localhost:8888/opscenter/online_help/docs/explorer/index.html

  18. Note on Windows 7 • You might have to set your JAVA_HOME variable • Usually c:\Progra~1\Java\jdk1.7.0 (or similar)

  19. PostgreSQL: install • Go to: http://bitnami.org/stacks • Install WAPP (Windows) or MAPP (Mac) • Start up the web server • Start up PostgreSQL • Go to: http://127.0.0.1/phppgadmin/
