Key-value stores are a type of database management system that stores and retrieves data as key/value pairs. Unlike relational databases, they are schema-less: keys and values are uninterpreted byte strings, so the application decides their types and structure. This simplicity pays off when information needs are simple, such as looking up user accounts and preferences, because it avoids much of the overhead of a relational DBMS. Well-known implementations include Berkeley DB, a local key/value store library, and Amazon Dynamo, a distributed key/value store. Key-value stores also have limitations when applications need more structure, high-quality data, or complex queries. This guide explores the functionality, benefits, and specific use cases of key-value stores.
CS 440 Database Management Systems Key/Value Stores
Key/Value Store • Stores and retrieves data in the form of key/value pairs • Person: (Key, Value) • Keys are (generally) unique • Does not define any schema (schema-less) • We build the schema on top of it • Our program must manage the types and semantics of keys and values • Person: Key = SSN, Value = (Name, Address)
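The sketch below shows how the work is split between the store and the program. It is a minimal example in plain Java with no Berkeley DB dependency. A `TreeMap` with a byte-wise comparator stands in for the store. The class name `PersonStore` and the NUL-separated value layout are assumptions made for illustration. The point is that only the application knows the value bytes hold a name and an address.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.TreeMap;

public class PersonStore {
    // The store sees only opaque byte strings; it has no idea what a Person is.
    // A TreeMap ordered byte-wise stands in for the key/value store here.
    private final TreeMap<byte[], byte[]> store =
            new TreeMap<byte[], byte[]>((a, b) -> Arrays.compare(a, b));

    // The application owns the value layout. Assumed format for this sketch:
    // name and address joined by a NUL character.
    static byte[] encodePerson(String name, String address) {
        return (name + "\0" + address).getBytes(StandardCharsets.UTF_8);
    }

    static String[] decodePerson(byte[] value) {
        return new String(value, StandardCharsets.UTF_8).split("\0", 2);
    }

    void put(String ssn, String name, String address) {
        store.put(ssn.getBytes(StandardCharsets.UTF_8), encodePerson(name, address));
    }

    // Returns {name, address}, or null if the SSN is not stored.
    String[] get(String ssn) {
        byte[] value = store.get(ssn.getBytes(StandardCharsets.UTF_8));
        return value == null ? null : decodePerson(value);
    }

    public static void main(String[] args) {
        PersonStore people = new PersonStore();
        people.put("123-45-6789", "Alice", "12 Main St");
        String[] alice = people.get("123-45-6789");
        System.out.println(alice[0] + " / " + alice[1]); // prints "Alice / 12 Main St"
    }
}
```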
Key/Value Stores • No (or little) SQL support • SQL belongs to higher levels • Storage engine • Can replace the storage engine of an RDBMS • Example: InnoDB, a storage engine for MySQL
Why key/value stores? • No need for complex schemas or queries • An advantage when the information needs are simple • e.g., look up a user account by user ID • None of the RDBMS overheads, so faster • We talk more about these overheads later. • Real-life applications • Google Accounts used Berkeley DB until recently. • Amazon customer preferences and shopping carts.
Why key/value stores? • Give a good idea of what goes on inside a database management system • They have their own limitations • We may need more structure • We may need high-quality data • We may need complex query models • …
Implementations • Local key/value store • Berkeley DB (BDB) • Distributed key/value store • Amazon Dynamo • We'll talk more about them later.
Berkeley DB • A key/value store library • Used through an API • Linked into your program • Supports various access paths • Open source
A Bit of History • Started as a hashing library in 1991 • Released with BSD 4.4 in 1992 • Hash table and B-tree • Seltzer and Bostic started Sleepycat Software in 1996 • Open source with a dual license • Oracle acquired Sleepycat in 2006 • Kept the dual license
Berkeley DB Product Family • The original library, written in C • APIs in various languages: C, C++, Java • Berkeley DB Java Edition • Pure Java implementation of the library • Java API • Berkeley DB XML • Based on the original library • Persistent API • More complex query model
Key and value • Uninterpreted byte strings • Whatever you like them to be • Can store multiple types of objects in the same table • Employee (name, birth-date, position) • Student (name, birth-date, GPA) • Different tables in an RDBMS, the same table in BDB
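A common way to keep different record types in one table is to prefix each value with a type tag. The application reads the tag first and then decodes the rest of the value. The sketch below is hypothetical: the tag values and method names are made up, and it uses `java.io.DataOutputStream` rather than any Berkeley DB API. A `HashMap` plays the role of the table.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.HashMap;
import java.util.Map;

public class MixedTable {
    // Hypothetical one-byte type tags written at the start of every value.
    static final byte EMPLOYEE = 1;
    static final byte STUDENT = 2;

    static byte[] employee(String name, String birthDate, String position) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(bytes);
            out.writeByte(EMPLOYEE);
            out.writeUTF(name);
            out.writeUTF(birthDate);
            out.writeUTF(position);
            return bytes.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    static byte[] student(String name, String birthDate, double gpa) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(bytes);
            out.writeByte(STUDENT);
            out.writeUTF(name);
            out.writeUTF(birthDate);
            out.writeDouble(gpa);
            return bytes.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Reads the tag first, then decodes the remaining fields for that type.
    static String describe(byte[] value) {
        try {
            DataInputStream in = new DataInputStream(new ByteArrayInputStream(value));
            byte tag = in.readByte();
            String name = in.readUTF();
            String birthDate = in.readUTF();
            if (tag == EMPLOYEE) {
                return "Employee " + name + " (" + birthDate + "): " + in.readUTF();
            }
            if (tag == STUDENT) {
                return "Student " + name + " (" + birthDate + "): GPA " + in.readDouble();
            }
            throw new IllegalArgumentException("unknown type tag " + tag);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // One "table" holds both kinds of records.
        Map<String, byte[]> table = new HashMap<>();
        table.put("e1", employee("Ann", "1980-01-02", "Manager"));
        table.put("s1", student("Ben", "2001-03-04", 3.5));
        System.out.println(describe(table.get("e1"))); // Employee Ann (1980-01-02): Manager
        System.out.println(describe(table.get("s1"))); // Student Ben (2001-03-04): GPA 3.5
    }
}
```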
Keys • You can define their sort order • Sequential access • Duplicate keys! • Possible but discouraged • You lose a lot of functionality
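Keys are byte strings, so their sort order depends on two things: the comparator and the way the keys are encoded. The sketch below assumes unsigned byte-by-byte comparison. It encodes integers big-endian with the sign bit flipped, an encoding chosen for this example. With that encoding, a sequential scan returns integer keys in numeric order, negatives included. A `TreeMap` stands in for a sorted store.

```java
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.TreeMap;

public class SortedIntKeys {
    // Assumed key encoding: big-endian with the sign bit flipped, so that
    // comparing the bytes as unsigned values gives numeric order.
    static byte[] encode(int k) {
        return ByteBuffer.allocate(4).putInt(k ^ Integer.MIN_VALUE).array();
    }

    static int decode(byte[] b) {
        return ByteBuffer.wrap(b).getInt() ^ Integer.MIN_VALUE;
    }

    // Inserts the keys in any order, then reads them back sequentially.
    static List<Integer> scanInOrder(int... keys) {
        // The comparator defines the sort order: unsigned, byte by byte.
        TreeMap<byte[], String> db =
                new TreeMap<byte[], String>((a, b) -> Arrays.compareUnsigned(a, b));
        for (int k : keys) {
            db.put(encode(k), "value-" + k);
        }
        List<Integer> order = new ArrayList<>();
        for (byte[] key : db.keySet()) {
            order.add(decode(key));
        }
        return order;
    }

    public static void main(String[] args) {
        System.out.println(scanInOrder(10, -5, 3, 1)); // [-5, 1, 3, 10]
    }
}
```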
Environment: database in RDBMS • A directory that contains related databases (tables)
```java
import com.sleepycat.db.*;
import java.io.File;

EnvironmentConfig envConfig_ = new EnvironmentConfig();
envConfig_.setAllowCreate(true);
envConfig_.setInitializeCache(true);
envConfig_.setTransactional(false);
envConfig_.setInitializeLocking(false);
envConfig_.setPrivate(true);
envConfig_.setCacheSize(1024 * 1024);   // 1 MB cache
File envHome_ = new File("/home/schoolDB/");
Environment env_ = new Environment(envHome_, envConfig_);
```
Database: table in RDBMS • A collection of tuples (key/value pairs) • A database is referred to through a database handle • All method calls use this handle • A file may contain one or more databases
Opening/Creating a Database
```java
// database settings
DatabaseConfig dbConfig_ = new DatabaseConfig();
dbConfig_.setAllowCreate(true);            // create the database if it does not exist
// primary access path
dbConfig_.setType(DatabaseType.HASH);
dbConfig_.setCacheSize(4 * 1024 * 1024);
// file name: student.db, database name: student
Database db = env_.openDatabase(null, "student.db", "student", dbConfig_);
```
Storing Tuples
```java
public class Student implements Cloneable {
    private String name;
    …
    public String getName() { return name; }
    public void setName(String name) { this.name = name; }
    …
}
```
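Values to Byte Strings • Read from / write to a byte stream
```java
import com.sleepycat.bind.tuple.TupleBinding;
import com.sleepycat.bind.tuple.TupleInput;
import com.sleepycat.bind.tuple.TupleOutput;

public class StudentTupleBinding extends TupleBinding {
    // Object -> byte string: write the fields in a fixed order
    public void objectToEntry(Object o, TupleOutput out) {
        Student std = (Student) o;
        out.writeString(std.getName());
        …
    }

    // Byte string -> object: read the fields back in the same order
    public Object entryToObject(TupleInput in) {
        Student std = new Student();
        std.setName(in.readString());
        …
        return std;
    }
}
```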
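The binding pattern is to write the fields in a fixed order and read them back in the same order. The same pattern can be shown with plain `java.io` streams, independent of Berkeley DB. This is a sketch only: the `gpa` field and the method names are hypothetical, and the byte layout produced by `DataOutputStream` need not match what `TupleOutput` produces.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

public class StudentBindingSketch {
    // Minimal stand-in for the Student class; gpa is a hypothetical extra field.
    static class Student {
        final String name;
        final double gpa;

        Student(String name, double gpa) {
            this.name = name;
            this.gpa = gpa;
        }
    }

    // Analogue of objectToEntry: write the fields in a fixed order.
    static byte[] objectToBytes(Student s) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(bytes);
            out.writeUTF(s.name);     // 2-byte length prefix + modified UTF-8
            out.writeDouble(s.gpa);   // 8 bytes
            return bytes.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Analogue of entryToObject: read the fields back in the same order.
    static Student bytesToObject(byte[] b) {
        try {
            DataInputStream in = new DataInputStream(new ByteArrayInputStream(b));
            String name = in.readUTF();
            double gpa = in.readDouble();
            return new Student(name, gpa);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        byte[] bytes = objectToBytes(new Student("Ann", 3.5));
        System.out.println(bytes.length);              // 13
        Student copy = bytesToObject(bytes);
        System.out.println(copy.name + " " + copy.gpa); // Ann 3.5
    }
}
```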
Inserting Tuples
```java
DatabaseEntry key = new DatabaseEntry();
DatabaseEntry data = new DatabaseEntry();
int keyvalue = 1;
// Convert the key to a byte string
IntegerBinding.intToEntry(keyvalue, key);
// Convert the value (entry: the Student object to store) to a byte string
StudentTupleBinding binding = new StudentTupleBinding();
binding.objectToEntry(entry, data);
db.put(null, key, data);
```
Retrieving Tuples
```java
int start = 1;
DatabaseEntry key = new DatabaseEntry();
IntegerBinding.intToEntry(start, key);
DatabaseEntry data = new DatabaseEntry();
int next = start;
// Duplicate keys! db.get() returns only the first record stored under a key.
while (db.get(null, key, data, null) == OperationStatus.SUCCESS) {
    // Convert from byte string to object
    Student std = (Student) binding.entryToObject(data);
    // The elided part must move key to another record, or get() re-reads the same one.
    ….
}
```
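As written, the loop terminates only if its body moves `key` to a different record. The sketch below shows that control flow in plain Java, assuming the goal is to read consecutive integer keys until the first gap. A `HashMap` stands in for the database, and a `null` result stands in for a non-SUCCESS status.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class ConsecutiveLookup {
    // Reads keys start, start + 1, ... and stops at the first missing key.
    // A null from Map.get() plays the role of a non-SUCCESS status.
    static List<String> readFrom(Map<Integer, String> db, int start) {
        List<String> found = new ArrayList<>();
        int next = start;
        String value;
        while ((value = db.get(next)) != null) {
            found.add(value);
            next++; // advance the key; without this the loop never ends
        }
        return found;
    }

    public static void main(String[] args) {
        Map<Integer, String> db = new HashMap<>();
        db.put(1, "Ann");
        db.put(2, "Ben");
        db.put(4, "Dan"); // key 3 is missing, so the scan stops before 4
        System.out.println(readFrom(db, 1)); // [Ann, Ben]
    }
}
```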
Access Paths • B-tree • Fast access • Hash table • Fast access for read-only data • Heap • Efficient use of disk space • …
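A hash access path and a sorted (B-tree) access path both handle point lookups, but they differ on range queries. In the sketch below, `HashMap` and `TreeMap` stand in for the two. A `TreeMap` is a red-black tree, not a B-tree, but it also keeps keys sorted. The student IDs are made up.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class AccessPathSketch {
    // Range query [lo, hi): efficient only on a structure that keeps keys ordered.
    static List<Integer> idsInRange(TreeMap<Integer, String> tree, int lo, int hi) {
        return new ArrayList<>(tree.subMap(lo, true, hi, false).keySet());
    }

    public static void main(String[] args) {
        Map<Integer, String> hash = new HashMap<>();      // hash-table-like: no usable key order
        TreeMap<Integer, String> tree = new TreeMap<>();  // B-tree-like: keys kept sorted
        for (int id : new int[] {42, 7, 19, 3}) {
            hash.put(id, "student-" + id);
            tree.put(id, "student-" + id);
        }
        // Point lookups work on both structures.
        System.out.println(hash.get(19) + " " + tree.get(19)); // student-19 student-19
        // A range scan walks the sorted keys; a hash table would have to check every entry.
        System.out.println(idsInRange(tree, 5, 20)); // [7, 19]
    }
}
```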
Cursors • Represent positions in a database • Iterative (forward and backward) scan
```java
// Configuration info: no transaction, default cursor configuration
Cursor cursor = db.openCursor(null, null);
DatabaseEntry key = new DatabaseEntry();
DatabaseEntry data = new DatabaseEntry();
while (cursor.getNext(key, data, null) == OperationStatus.SUCCESS) {
    // do something
}
cursor.close();
```
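A `NavigableMap` gives a compact in-memory picture of cursor movement: forward, backward, and positioning at the first key at or after a target. The comments map each method to the Berkeley DB `Cursor` methods `getNext`, `getPrev` and `getSearchKeyRange`. That mapping is only approximate, and this is not the library's API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NavigableMap;
import java.util.TreeMap;

public class CursorSketch {
    // Forward scan: roughly what repeated getNext() calls visit.
    static List<String> forward(NavigableMap<String, Integer> db) {
        return new ArrayList<>(db.navigableKeySet());
    }

    // Backward scan: roughly what repeated getPrev() calls visit.
    static List<String> backward(NavigableMap<String, Integer> db) {
        return new ArrayList<>(db.descendingKeySet());
    }

    // Position at the first key >= target, then continue forward
    // (comparable to getSearchKeyRange() followed by getNext()).
    static List<String> from(NavigableMap<String, Integer> db, String target) {
        return new ArrayList<>(db.tailMap(target, true).navigableKeySet());
    }

    public static void main(String[] args) {
        TreeMap<String, Integer> db = new TreeMap<>();
        db.put("carol", 3);
        db.put("alice", 1);
        db.put("dave", 4);
        db.put("bob", 2);
        System.out.println(forward(db));   // [alice, bob, carol, dave]
        System.out.println(backward(db));  // [dave, carol, bob, alice]
        System.out.println(from(db, "c")); // [carol, dave]
    }
}
```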
Secondary Index • Stored in another BDB database • No duplicate (primary) keys!
```java
class sKeyCreator implements SecondaryKeyCreator {
    public boolean createSecondaryKey(SecondaryDatabase secDb,
                                      DatabaseEntry keyEntry,
                                      DatabaseEntry dataEntry,
                                      DatabaseEntry resultEntry) {
        // set resultEntry to the secondary key value
        return true;   // true: index this record; false: no secondary key for it
    }
}
```
Secondary Indexes
```java
// new database
SecondaryConfig sIndexConfig = new SecondaryConfig();
sIndexConfig.setAllowCreate(true);
sIndexConfig.setType(DatabaseType.HASH);
sIndexConfig.setTransactional(false);
// Duplicates are frequently required for secondary databases.
sIndexConfig.setSortedDuplicates(true);
sKeyCreator keyCreator = new sKeyCreator();
sIndexConfig.setKeyCreator(keyCreator);
// Perform the actual open
SecondaryDatabase sIndex =
    env_.openSecondaryDatabase(null, "senindex.db", null, db, sIndexConfig);
```
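Conceptually, a secondary index is a second key/value mapping from secondary key to primary key, and the store keeps it in sync on every write. The sketch below simulates this with two in-memory maps. The `Student` fields, the choice of `major` as the secondary key, and all method names are hypothetical. A `TreeSet` per secondary key models sorted duplicates.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeSet;
import java.util.function.Function;

public class SecondaryIndexSketch {
    static final class Student {
        final String name;
        final String major;

        Student(String name, String major) {
            this.name = name;
            this.major = major;
        }
    }

    // Primary "database": unique primary key -> record.
    private final Map<Integer, Student> primary = new HashMap<>();
    // Secondary "database": secondary key -> sorted primary keys.
    // Several students can share a major, hence the duplicates.
    private final Map<String, TreeSet<Integer>> secondary = new HashMap<>();
    // Plays the role of a key creator; indexing by major is a hypothetical choice.
    private final Function<Student, String> keyCreator = s -> s.major;

    void put(int id, Student s) {
        Student old = primary.put(id, s);
        if (old != null) {
            // Drop the index entry that pointed at the old version of the record.
            String oldKey = keyCreator.apply(old);
            TreeSet<Integer> ids = secondary.get(oldKey);
            ids.remove(id);
            if (ids.isEmpty()) {
                secondary.remove(oldKey);
            }
        }
        secondary.computeIfAbsent(keyCreator.apply(s), k -> new TreeSet<>()).add(id);
    }

    // Look up through the secondary index, then fetch each record by primary key.
    List<String> namesByMajor(String major) {
        List<String> names = new ArrayList<>();
        for (int id : secondary.getOrDefault(major, new TreeSet<>())) {
            names.add(primary.get(id).name);
        }
        return names;
    }

    public static void main(String[] args) {
        SecondaryIndexSketch students = new SecondaryIndexSketch();
        students.put(1, new Student("Ann", "CS"));
        students.put(2, new Student("Ben", "Math"));
        students.put(3, new Student("Cal", "CS"));
        System.out.println(students.namesByMajor("CS"));   // [Ann, Cal]
        students.put(3, new Student("Cal", "Math"));       // Cal changes major
        System.out.println(students.namesByMajor("CS"));   // [Ann]
        System.out.println(students.namesByMajor("Math")); // [Ben, Cal]
    }
}
```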
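Closing Database & Environment • Releasing resources • Close the secondary index before its primary database, and the databases before the environment
```java
sIndex.close();
db.close();
env_.close();
```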
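Java's try-with-resources closes resources in the reverse order of their declaration. Declaring the environment first, then the primary database, then the secondary index therefore reproduces the closing order above. The sketch uses a hypothetical `Handle` class rather than real Berkeley DB handles, because whether those handles implement `AutoCloseable` depends on the library version.

```java
import java.util.ArrayList;
import java.util.List;

public class CloseOrderSketch {
    // Records the order in which handles are closed.
    static final List<String> closed = new ArrayList<>();

    // Hypothetical stand-in for an environment or database handle.
    static final class Handle implements AutoCloseable {
        private final String name;

        Handle(String name) {
            this.name = name;
        }

        @Override
        public void close() {
            closed.add(name);
        }
    }

    static List<String> openUseAndClose() {
        closed.clear();
        // Declared in opening order; closed automatically in reverse order.
        try (Handle env = new Handle("environment");
             Handle db = new Handle("primary");
             Handle sIndex = new Handle("secondary")) {
            // ... work with env, db and sIndex here ...
        }
        return new ArrayList<>(closed);
    }

    public static void main(String[] args) {
        System.out.println(openUseAndClose()); // [secondary, primary, environment]
    }
}
```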