CS 440 Database Management Systems

CS 440 Database Management Systems Key/Value Stores

Key/Value Store
• Stores and retrieves data in the form of key/value pairs.
  • Person: (Key, Value)
• Keys are (generally) unique.
• Does not define any schema (schema-less).
  • We build the schema on top of it.
  • Our program must manage the types and semantics of keys and values.
  • Person: Key = SSN, Value = (Name, Address)
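
Building a schema on top of a schema-less store can be sketched as follows. This is only an illustration under stated assumptions: a HashMap stands in for the key/value store, the class name PersonStore and the "name|address" value layout are hypothetical, and the layout assumes that names never contain the '|' character.

```java
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration: a HashMap stands in for the key/value store.
class PersonStore {
    // The store only sees a key and an opaque byte string; no schema is enforced.
    private final Map<String, byte[]> store = new HashMap<>();

    // Our program decides the "schema": key = SSN, value = "name|address".
    void putPerson(String ssn, String name, String address) {
        String value = name + "|" + address;
        store.put(ssn, value.getBytes(StandardCharsets.UTF_8));
    }

    // Returns {name, address}, or null if the SSN is unknown.
    String[] getPerson(String ssn) {
        byte[] raw = store.get(ssn);
        if (raw == null) {
            return null;
        }
        // Split at the first '|' only, so the address may contain '|'.
        return new String(raw, StandardCharsets.UTF_8).split("\\|", 2);
    }

    public static void main(String[] args) {
        PersonStore people = new PersonStore();
        people.putPerson("123-45-6789", "Alice", "1 Main St");
        String[] p = people.getPerson("123-45-6789");
        System.out.println(p[0] + " lives at " + p[1]);
    }
}
```

The store never checks what the bytes mean; every reader and writer has to agree on the layout, which is exactly the responsibility the slide assigns to "our program".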

Key/Value Stores
• No (or little) SQL support
  • SQL belongs to higher levels.
• Storage engine
  • Can replace the storage engine of an RDBMS.
  • Example of a storage engine: InnoDB in MySQL

Why Key/Value Stores?
• No need for complex schemas or queries
  • An advantage when the information needs are simple
  • Example: look up a user account by user-id.
• None of the RDBMS overheads, so faster
  • We will talk more about these overheads later.
• Real-life applications
  • Google Account was using Berkeley DB until recently.
  • Amazon customer preferences and shopping carts

Why Key/Value Stores?
• They give a good idea of what is going on inside a database management system.
• They have their own limitations:
  • We may need more structure.
  • We may need high-quality data.
  • We may need complex query models.
  • …

Implementations
• Local key/value store
  • Berkeley DB (BDB)
• Distributed key/value store
  • Amazon Dynamo
• We’ll talk more about them later.

Berkeley DB
• A key/value store library
  • Used through an API
  • Linked to your program
• Supports various access paths
• Open source

A Bit of History
• Started as a hashing library in 1991
• Released with BSD 4.4 in 1992
  • Hash table and B-tree
• Seltzer and Bostic started Sleepycat Software in 1996
  • Open source with a dual license
• Oracle acquired Sleepycat in 2006
  • Kept the dual license

Berkeley DB Product Family
• Berkeley DB: the original library, written in C
  • APIs in various languages: Java, C, C++
• Berkeley DB Java Edition
  • Pure Java implementation of the library
  • Java API
• Berkeley DB XML
  • Based on the original library
  • Persistent API
  • More complex query model


Key and Value
• Uninterpreted byte strings
  • Whatever you like them to be
• Can store multiple types of objects in the same table
  • Employee (name, birth-date, position)
  • Student (name, birth-date, GPA)
  • Different tables in an RDBMS, same table in BDB
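
One way to keep several object types in the same table is to put a type tag at the front of each value. The sketch below uses only the Java standard library; the class name MixedRecords, the tag values, and the field layout are assumptions made for illustration and are not part of Berkeley DB.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Hypothetical illustration: two record types encoded as tagged byte strings.
class MixedRecords {
    static final byte EMPLOYEE = 1;  // assumed tag values
    static final byte STUDENT = 2;

    static byte[] encodeEmployee(String name, String birthDate, String position) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(bytes);
            out.writeByte(EMPLOYEE);
            out.writeUTF(name);
            out.writeUTF(birthDate);
            out.writeUTF(position);
            return bytes.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    static byte[] encodeStudent(String name, String birthDate, double gpa) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(bytes);
            out.writeByte(STUDENT);
            out.writeUTF(name);
            out.writeUTF(birthDate);
            out.writeDouble(gpa);
            return bytes.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // The store cannot tell the two apart; the reader interprets the tag.
    static String describe(byte[] value) {
        try {
            DataInputStream in = new DataInputStream(new ByteArrayInputStream(value));
            byte tag = in.readByte();
            String name = in.readUTF();
            in.readUTF();  // birth date, not used in the description
            if (tag == EMPLOYEE) {
                return "Employee " + name + " (" + in.readUTF() + ")";
            }
            if (tag == STUDENT) {
                return "Student " + name + " (GPA " + in.readDouble() + ")";
            }
            throw new IllegalArgumentException("unknown record tag " + tag);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(describe(encodeEmployee("Ann", "1980-01-01", "Manager")));
        System.out.println(describe(encodeStudent("Bob", "2001-05-17", 3.5)));
    }
}
```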

Keys
• You can define their sort order.
  • Enables sequential access
• Duplicate keys!
  • Possible but discouraged
  • You will lose a lot of functionality.
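
Because keys are byte strings, the sort order is whatever comparison the program supplies. The sketch below is a standard-library illustration, not Berkeley DB code: a TreeMap stands in for an ordered table, and SortedKeys, BYTEWISE, intKey, and scan are hypothetical names. It assumes non-negative integer keys, for which big-endian encoding makes byte-wise order agree with numeric order.

```java
import java.nio.ByteBuffer;
import java.util.Comparator;
import java.util.TreeMap;

// Hypothetical illustration: a TreeMap ordered by a user-defined key comparator.
class SortedKeys {
    // Compare byte strings lexicographically, treating each byte as unsigned.
    static final Comparator<byte[]> BYTEWISE = (a, b) -> {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int diff = (a[i] & 0xFF) - (b[i] & 0xFF);
            if (diff != 0) {
                return diff;
            }
        }
        return a.length - b.length;
    };

    // ByteBuffer writes big-endian by default, so for non-negative ints
    // the byte-wise order of the keys matches their numeric order.
    static byte[] intKey(int k) {
        return ByteBuffer.allocate(4).putInt(k).array();
    }

    static int keyToInt(byte[] key) {
        return ByteBuffer.wrap(key).getInt();
    }

    // Insert the ids in the given order, then read them back sequentially.
    static String scan(int... ids) {
        TreeMap<byte[], String> table = new TreeMap<>(BYTEWISE);
        for (int id : ids) {
            table.put(intKey(id), "student-" + id);
        }
        StringBuilder sb = new StringBuilder();
        for (byte[] key : table.keySet()) {
            sb.append(keyToInt(key)).append(' ');
        }
        return sb.toString().trim();
    }

    public static void main(String[] args) {
        System.out.println(scan(300, 2, 70000, 1));
    }
}
```

With a different comparator (or a different key encoding) the same records would be scanned in a different order, which is why the choice of sort order and the choice of key encoding go together.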

Environment: the counterpart of a database in an RDBMS
• A directory that contains related databases (tables)

    import java.io.File;
    import com.sleepycat.db.*;

    EnvironmentConfig envConfig_ = new EnvironmentConfig();
    envConfig_.setAllowCreate(true);         // create the environment if it does not exist
    envConfig_.setInitializeCache(true);     // initialize the shared cache
    envConfig_.setTransactional(false);      // no transaction support
    envConfig_.setInitializeLocking(false);  // no locking subsystem
    envConfig_.setPrivate(true);             // accessed by a single process only
    envConfig_.setCacheSize(1024 * 1024);    // 1 MB cache
    File envHome_ = new File("/home/schoolDB/");
    Environment env_ = new Environment(envHome_, envConfig_);

Database: the counterpart of a table in an RDBMS
• A collection of tuples (key/value pairs)
• A database is referred to by a database handle.
  • All method calls use this handle.
• A file may contain one or more databases.

Opening/Creating a Database

    // database settings
    DatabaseConfig dbConfig_ = new DatabaseConfig();
    dbConfig_.setAllowCreate(true);  // create the database if it does not exist
    // primary access path
    dbConfig_.setType(DatabaseType.HASH);
    dbConfig_.setCacheSize(4 * 1024 * 1024);
    // file name: student.db, database name: student
    Database db = env_.openDatabase(null, "student.db", "student", dbConfig_);

Storing Tuples

    public class Student implements Cloneable {
        private String name;
        …
        public String getName() { return name; }
        public void setName(String name) { this.name = name; }
        …
    }

Values to Byte Strings
• Read from / write to a byte stream

    public class StudentTupleBinding extends TupleBinding {
        // Write the fields of a Student to the output byte stream.
        public void objectToEntry(Object o, TupleOutput out) {
            Student std = (Student) o;
            out.writeString(std.getName());
            …
        }
        // Read the fields back in the same order they were written.
        public Object entryToObject(TupleInput in) {
            Student std = new Student();
            std.setName(in.readString());
            …
        }
    }

Inserting Tuples

    DatabaseEntry key = new DatabaseEntry();
    DatabaseEntry data = new DatabaseEntry();
    int keyvalue = 1;
    // Convert the key to a byte string
    IntegerBinding.intToEntry(keyvalue, key);
    // Convert the value to a byte string (entry is the Student object to store)
    StudentTupleBinding binding = new StudentTupleBinding();
    binding.objectToEntry(entry, data);
    db.put(null, key, data);

Retrieving Tuples

    int start = 1;
    DatabaseEntry key = new DatabaseEntry();
    IntegerBinding.intToEntry(start, key);
    DatabaseEntry data = new DatabaseEntry();
    int next = start;
    // duplicate keys!
    while (db.get(null, key, data, null) == OperationStatus.SUCCESS) {
        // Convert from byte string to object
        Student std = (Student) binding.entryToObject(data);
        …
    }

Access Paths
• B-tree
  • Fast access
• Hash table
  • Fast access for read-only data
• Heap
  • Efficient use of disk space
• …
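
The choice of access path determines which lookups are cheap. As a rough analogy only, the sketch below uses java.util.TreeMap as a stand-in for an ordered (B-tree-like) access path and java.util.HashMap as a stand-in for a hash access path; AccessPaths and its methods are hypothetical names, and none of this is Berkeley DB code.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical stand-ins: TreeMap for an ordered access path,
// HashMap for a hash access path.
class AccessPaths {
    // Fill a table with the ids 10, 20, 30, 40.
    static void load(Map<Integer, String> table) {
        for (int id = 10; id <= 40; id += 10) {
            table.put(id, "student-" + id);
        }
    }

    // Point lookup: available on both structures.
    static String lookup(Map<Integer, String> table, int id) {
        return table.get(id);
    }

    // Range lookup (lo and hi inclusive): the ordered structure answers it
    // directly, because the keys are kept in sorted order.
    static String range(TreeMap<Integer, String> table, int lo, int hi) {
        return table.subMap(lo, true, hi, true).values().toString();
    }

    public static void main(String[] args) {
        TreeMap<Integer, String> ordered = new TreeMap<>();
        HashMap<Integer, String> hashed = new HashMap<>();
        load(ordered);
        load(hashed);
        System.out.println(lookup(hashed, 30));
        System.out.println(range(ordered, 15, 35));
    }
}
```

A hash structure has no notion of neighboring keys, so a range query over it would have to examine every entry.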

Cursors
• Represent positions in a database
• Iterative (forward and backward) scan

    // configuration info
    Cursor cursor = db.openCursor(null, null);
    DatabaseEntry key = new DatabaseEntry();
    DatabaseEntry data = new DatabaseEntry();
    while (cursor.getNext(key, data, null) == OperationStatus.SUCCESS) {
        // do something
    }
    cursor.close();

Secondary Index
• Stored in another BDB database
• No duplicate (primary) key!

    class sKeyCreator implements SecondaryKeyCreator {
        public boolean createSecondaryKey(SecondaryDatabase secDb,
                                          DatabaseEntry keyEntry,
                                          DatabaseEntry dataEntry,
                                          DatabaseEntry resultEntry) {
            // set resultEntry to the secondary key value
            return true;  // true: index this record under resultEntry
        }
    }

Secondary Indexes

    // new database
    SecondaryConfig sIndexConfig = new SecondaryConfig();
    sIndexConfig.setType(DatabaseType.HASH);
    sIndexConfig.setTransactional(false);
    // Duplicates are frequently required for secondary databases.
    sIndexConfig.setSortedDuplicates(true);
    sKeyCreator keyCreator = new sKeyCreator();
    sIndexConfig.setKeyCreator(keyCreator);
    // Perform the actual open
    SecondaryDatabase sIndex =
        env_.openSecondaryDatabase(null, "senindex.db", null, db, sIndexConfig);
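
Conceptually, a secondary index is a second table that maps a secondary key back to primary keys, and it has to be updated whenever the primary table changes. The sketch below shows that bookkeeping with standard-library maps; IndexedStudents and its methods are hypothetical, and Berkeley DB performs the equivalent maintenance itself once the secondary database is opened with a key creator.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;
import java.util.TreeSet;

// Hypothetical illustration of a primary table plus a secondary index on name.
class IndexedStudents {
    // Primary table: student id -> name (unique primary keys).
    private final Map<Integer, String> primary = new HashMap<>();
    // Secondary index: name -> ids of all students with that name.
    // Several students may share a name, so duplicates must be allowed here.
    private final Map<String, TreeSet<Integer>> byName = new TreeMap<>();

    void put(int id, String name) {
        String old = primary.put(id, name);
        if (old != null) {
            // Overwriting a record: remove the stale index entry first.
            TreeSet<Integer> ids = byName.get(old);
            ids.remove(id);
            if (ids.isEmpty()) {
                byName.remove(old);
            }
        }
        byName.computeIfAbsent(name, k -> new TreeSet<>()).add(id);
    }

    // Look up primary keys through the secondary key.
    String idsForName(String name) {
        TreeSet<Integer> ids = byName.get(name);
        return ids == null ? "[]" : ids.toString();
    }

    public static void main(String[] args) {
        IndexedStudents students = new IndexedStudents();
        students.put(1, "Ann");
        students.put(2, "Bob");
        students.put(3, "Ann");
        System.out.println(students.idsForName("Ann"));
    }
}
```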

Closing Database & Environment
• Releasing resources

    // Close the secondary index first, then the primary database,
    // and the environment last.
    sIndex.close();
    db.close();
    env_.close();
