
Sadna Project Distributed databases: Access Control Security vs. Performance



  1. Sadna Project: Distributed Databases: Access Control Security vs. Performance. Dr. Alexandra Shulman-Peleg, Storage Research, Cloud Platforms Dept., IBM Haifa Research Lab

  2. Project Overview • Analyze and compare the security holes in the access control offered by two popular distributed databases developed for Cloud Computing applications. • The two table stores manage access rights (permissions) at different resolutions. • The goal of the project is to analyze the security holes in the access rights configuration, improve the security of one of the stores, and measure the performance penalty.

  3. Agenda • What is Cloud Computing? • Introduction and Motivation • Cloud Storage challenges • New data consistency models: • eventual consistency • Access Control basics (ACLs) and access permissions • Distributed databases overview • Google’s BigTable example • Cassandra • Accumulo • Project presentation and specification

  4. What is Cloud Computing? A user experience and a business model • Cloud computing is an emerging style of IT delivery in which applications, data, and IT resources are rapidly provisioned and provided as standardized offerings to users over the web in a flexible pricing model. An infrastructure management and services delivery methodology • Cloud computing is a way of managing large numbers of highly virtualized resources such that, from a management perspective, they resemble a single large resource. This can then be used to deliver services with elastic scaling.

  5. Cloud Computing: What’s Driving It? • Cost Reduction: • Efficiency: virtual resources for hardware utilization (memory, disk, machines) • Sharing of hardware/maintenance: multitenancy for cost reduction • Automation: automate mundane tasks • Commodity hardware for most public clouds • Cloud: highly virtualized, with many users sharing the same hardware • Payment model: pay per use to reduce the bar of adoption • Pay up front for all required capital • Finance terms (deferred financial cost) • Pay per use (for public cloud) • Cloud: pay per use with immediate provisioning • Technology Maturity Cycle: focus higher in the solution stack • Cloud: companies moving to the cloud are focusing on their business, not technology.

  6. Consistency Models [Figure: two data centers, DC A and DC B; one user (Moshe) puts Object X to DC A while another user (David) gets Object X from DC B.]

  7. CAP Theorem • The CAP theorem, also known as Brewer's theorem, states that it is impossible for a distributed computer system to simultaneously provide all three of the following guarantees: • Consistency (all nodes see the same data at the same time) • Availability (node failures do not prevent survivors from continuing to operate) • Partition Tolerance (the system continues to operate despite arbitrary message loss) • According to the theorem, a distributed system can satisfy any two of these guarantees at the same time, but not all three. • The theorem began as a conjecture made by University of California, Berkeley computer scientist Eric Brewer at the 2000 Symposium on Principles of Distributed Computing (PODC). In 2002, Seth Gilbert and Nancy Lynch of MIT published a formal proof of Brewer's conjecture, establishing it as a theorem.

  8. CAP Theorem [Figure: where various systems fall on the CAP trade-off, including 2PC; 3PC; Paxos; state machine replication with quorum; CATOCS with primary partition (e.g., ISIS); and CVS, SVN, DNS, Lotus Notes, IceCube, Bayou, eBay, Amazon Dynamo, and many others. Graphics borrowed from Jeff Chase's (Duke) presentation on distributed consensus.]

  9. Consistency Models: Client and Server • There are two ways of looking at consistency: • developer/client point of view: how they observe data updates • server side: how updates flow through the system and what guarantees systems can give with respect to updates. Client-side consistency: • Strong consistency: after the update completes, any subsequent access will return the updated value. • Weak consistency: the system does not guarantee that subsequent accesses will return the updated value. The period between the update and the moment when it is guaranteed that any observer will always see the updated value is dubbed the inconsistency window. • Eventual consistency: a specific form of weak consistency; the storage system guarantees that if no new updates are made to the object, eventually all accesses will return the last updated value. The most popular system that implements eventual consistency is DNS (the Domain Name System). Updates to a name are distributed according to a configured pattern and in combination with time-controlled caches; eventually, all clients will see the update.

  10. Server Side Consistency Definitions: • N = the number of nodes that store replicas of the data • W = the number of replicas that need to acknowledge the receipt of the update before the update completes • R = the number of replicas that are contacted during a read operation • If W+R > N, then the write set and the read set always overlap and one can guarantee strong consistency. • In the primary-backup RDBMS scenario, which implements synchronous replication, N=2, W=2, and R=1. No matter from which replica the client reads, it will always get a consistent answer. • In asynchronous replication with reading from the backup enabled, N=2, W=1, and R=1. In this case R+W=N, and consistency cannot be guaranteed.
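
The N/W/R arithmetic above fits in a few lines of code. The sketch below is illustrative only (the function name is ours, not part of any store's API); it reproduces the two replication scenarios from this slide.

```python
def is_strongly_consistent(n: int, w: int, r: int) -> bool:
    """Strong consistency holds when every read quorum overlaps every
    write quorum, i.e. when W + R > N."""
    return w + r > n

# Primary-backup RDBMS with synchronous replication: N=2, W=2, R=1
print(is_strongly_consistent(2, 2, 1))  # True: every read sees the latest write
# Asynchronous replication with reads from the backup: N=2, W=1, R=1
print(is_strongly_consistent(2, 1, 1))  # False: W + R = N, no overlap guarantee
```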

  11. Server Side Consistency – Cont. • In distributed-storage systems that need to provide high performance and high availability, the number of replicas is in general higher than two. • Systems that focus solely on fault tolerance often use N=3, W=2, R=2 configurations. • Systems that need to serve very high read loads often replicate their data beyond what is required for fault tolerance; N can be tens or even hundreds of nodes, with R=1 such that a single read will return a result. • Systems that are concerned with consistency are set to W=N for updates, which may decrease the probability of the write succeeding (when the system cannot write to W nodes because of failures, the write operation has to fail, marking the unavailability of the system). • A common configuration for systems that are concerned about fault tolerance but not consistency is to run with W=1 to get minimal durability of the update and then rely on a lazy (epidemic) technique to update the other replicas.

  12. Server Side Consistency – Cont. • Examples: What do we optimize for with R=1, W=N? What do we optimize for with W=1, R=N? • When optimizing for writes, durability is not guaranteed in the presence of failures, and if W < (N+1)/2, there is the possibility of conflicting writes when the write sets do not overlap. • Weak/eventual consistency arises when W+R <= N, meaning that there is a possibility that the read and write sets will not overlap. • The period until all replicas have been updated is the inconsistency window discussed before. If W+R <= N, then the system is vulnerable to reading from nodes that have not yet received the updates.
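
As a worked illustration of these trade-offs, the sketch below classifies an (N, W, R) configuration using only the rules stated on the last three slides; the function and its output strings are ours, not part of any system.

```python
def describe_configuration(n: int, w: int, r: int) -> list:
    """Classify an (N, W, R) replica configuration using the quorum rules above."""
    notes = []
    if w + r > n:
        notes.append("read and write quorums overlap: strong consistency")
    else:
        notes.append("quorums may not overlap: weak/eventual consistency "
                     "(inconsistency window until all replicas are updated)")
    if w < (n + 1) / 2:
        notes.append("write quorums may not overlap: conflicting writes possible")
    if r == 1 and w == n:
        notes.append("optimized for reads: any single replica can serve a read")
    if w == 1 and r == n:
        notes.append("optimized for write latency, with only minimal durability")
    return notes

print(describe_configuration(3, 3, 1))  # read-optimized, strong consistency
print(describe_configuration(3, 1, 1))  # lazy updates, eventual consistency
```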

  13. Access Control Lists (ACLs)

  14. Access Control Components • Authentication: “Who is this user?” Systems aiming to provide decentralized access control cannot rely on local identification and must employ decentralized or indirect authentication. • Authorization: “Is user X allowed to access resource R?” This is answered by looking up the access control matrix, which can be implemented as ACLs or capabilities (Lampson's access control matrix, 1971).

  15. Access Control Lists • ACLs are the columns of the access control matrix:

              file1    file2    file3
    Alice     rx       r        rwo
    Bob       rwxo     r        -
    Charlie   rx       rwo      w

  ACLs: • file1: { (Alice, rx) (Bob, rwxo) (Charlie, rx) } • file2: { (Alice, r) (Bob, r) (Charlie, rwo) } • file3: { (Alice, rwo) (Charlie, w) }

  16. Access Control Lists • An ACL is associated with every resource, that is, every object in the file system, and lists all users authorized to access the object along with their access rights. • The identity of a user must be known before access rights can be looked up in the ACL. • Thus, authorization depends on prior authentication.
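
To make the lookup concrete, here is a minimal sketch of an ACL check over the file1/file2/file3 matrix from the previous slide; the dictionary representation and the helper name are ours, chosen only for illustration.

```python
# ACLs are the columns of the access control matrix: one entry list per resource.
acls = {
    "file1": {"Alice": "rx",  "Bob": "rwxo", "Charlie": "rx"},
    "file2": {"Alice": "r",   "Bob": "r",    "Charlie": "rwo"},
    "file3": {"Alice": "rwo",                "Charlie": "w"},
}

def is_authorized(user: str, resource: str, right: str) -> bool:
    """Look up the (already authenticated) user's rights on a resource."""
    return right in acls.get(resource, {}).get(user, "")

print(is_authorized("Bob", "file1", "w"))  # True
print(is_authorized("Bob", "file3", "r"))  # False: Bob has no entry for file3
```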

  17. Unix Layout Reminder Simplified structure of the UNIX file system (from [Farmer and Venema 2004]).

  18. File Mode Permission Bits • File types: "-" is an ordinary file, "d" is a directory, "l" is a link. • Permissions: r = read, w = write, x = execute, s = setuid.
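
As a quick illustration of how these bits are read, the sketch below splits an ls-style mode string into its file-type character and the owner/group/other permission triples; the helper function is ours and the sample string is made up.

```python
def describe_mode(mode: str) -> dict:
    """Split an ls-style mode string, e.g. '-rwxr-x---', into its parts."""
    file_types = {"-": "ordinary file", "d": "directory", "l": "link"}
    return {
        "type":  file_types.get(mode[0], "other"),
        "owner": mode[1:4],   # r/w/x (or s) for the owning user
        "group": mode[4:7],   # r/w/x (or s) for the owning group
        "other": mode[7:10],  # r/w/x for everyone else
    }

print(describe_mode("-rwxr-x---"))
# {'type': 'ordinary file', 'owner': 'rwx', 'group': 'r-x', 'other': '---'}
```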

  19. Distributed Databases: Google’s BigTable, Cassandra, Accumulo

  20. Google’s BigTable Example • Use URLs as row keys • Various aspects of a web page as column names • Store the contents of web pages in the contents: column under the timestamps at which they were fetched. The anchor column family contains the text of any anchors that reference the page. • Column keys are grouped into sets called column families, which form the basic unit of access control. • All data stored in a column family is usually of the same type (and can be compressed together).
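
The sketch below mocks up one row of this "webtable" example as nested dictionaries, purely to visualize the row key / column family / timestamp nesting; it is a conceptual layout, not BigTable's API, and the URL and timestamps are invented.

```python
# One conceptual "webtable" row: row key -> column family -> column -> versions.
webtable = {
    "com.example.www": {                         # row key (URL)
        "contents:": {                           # column family: page contents
            "": {
                1331000000: "<html>...</html>",  # newest fetch
                1330000000: "<html>...</html>",  # older fetch, kept by timestamp
            },
        },
        "anchor:": {                             # column family: anchor text,
            "com.referrer.www": {                #   one column per referencing site
                1331000000: "Example site",
            },
        },
    },
}

# Access control is granted per column family, e.g. read access to "anchor:" only.
```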

  21. Why not just use commercial DB? • Scale is too large for most commercial databases • Even if it weren’t, cost would be very high • Building internally means system can be applied across many projects for low incremental cost • Low-level storage optimizations help performance significantly • Much harder to do when running on top of a database layer

  22. Comparison of BigTable to databases • Similarity: • Implementation strategies similar to databases • Differences: • Scalability and high performance • Does not support full relational data model – uses a simple data model that supports dynamic control over data layout and format • Client can control the data locality • Schema parameters let clients control whether to serve data from memory or from disk • Different interface

  23. Apache Cassandra • The Apache Cassandra project develops a highly scalable second-generation distributed database, bringing together: • Amazon’s Dynamo fully distributed design • Google’s Bigtable ColumnFamily-based data model. Features: decentralized, elastic, fault-tolerant, tunable consistency. http://cassandra.apache.org/ • Cassandra was developed at Facebook to power their Inbox Search feature by Avinash Lakshman (one of the authors of Amazon's Dynamo) and Prashant Malik. • It was released as an open source project on Google Code in July 2008. In March 2009 it became an Apache Incubator project, and on February 17, 2010 it graduated to a top-level project.

  24. Accumulo • Apache Accumulo is a sorted, distributed key/value store based on Google's BigTable design. • It is built on top of Apache Hadoop, ZooKeeper, and Thrift. • It features a few novel improvements on the BigTable design in the form of cell-level access labels and a server-side programming mechanism that can modify key/value pairs at various points in the data management process.
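
To illustrate what cell-level access labels look like in practice, the sketch below evaluates a simplified visibility label against a user's set of authorizations. Real Accumulo labels also support parentheses and nesting; this flat '&'/'|' evaluator and its names are a simplified illustration of the concept, not the Accumulo API.

```python
def visible(label: str, authorizations: set) -> bool:
    """Evaluate a simplified cell-visibility label: '&' requires every token,
    '|' requires at least one. (Real labels may also nest with parentheses.)"""
    if not label:
        return True  # unlabeled cells are visible to everyone
    if "&" in label:
        return all(tok in authorizations for tok in label.split("&"))
    return any(tok in authorizations for tok in label.split("|"))

print(visible("admin&audit", {"admin"}))            # False: 'audit' is missing
print(visible("admin|audit", {"audit", "public"}))  # True
```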

  25. References • Eventual Consistency: http://www.allthingsdistributed.com/2008/12/eventually_consistent.html • Dynamo: Amazon’s Highly Available Key-value Store. Giuseppe DeCandia, Deniz Hastorun, Madan Jampani, Gunavardhan Kakulapati, Avinash Lakshman, Alex Pilchin, Swaminathan Sivasubramanian, Peter Vosshall and Werner Vogels. http://www.allthingsdistributed.com/2007/10/amazons_dynamo.html • Bigtable: A Distributed Storage System for Structured Data. Fay Chang, Jeffrey Dean, Sanjay Ghemawat, Wilson C. Hsieh, Deborah A. Wallach, Mike Burrows, Tushar Chandra, Andrew Fikes, Robert E. Gruber. labs.google.com/papers/bigtable-osdi06.pdf

  26. Project Presentation

  27. Overview and Goals • This project focuses on two popular table stores, Cassandra and Accumulo. While the access control of Cassandra is at the level of a column family, Accumulo offers a higher level of security and allows defining cell-level access control. • The main goals of this project are to: • Add support for cell-level ACLs (Access Control Lists) to Cassandra • Compare the resulting system to Accumulo, evaluating the performance and quantifying the security holes. • The project will also attempt to improve the security of both systems by increasing the consistency, while measuring the performance penalty.

  28. Stage 1: System Set-up • Install the two table stores, Apache Cassandra [1] and Accumulo [2]. • Install the YCSB++ testing framework [3] for benchmarking and performance measurements. • Accumulo has built-in ACLs at the cell level. This project will implement ACL support in Cassandra by storing the ACLs as additional attributes (one possible layout is sketched below).
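
The sketch below shows one possible layout for these additional attributes: each data column gets a companion "<column>#acl" column in the same row, checked by a client-side wrapper before the value is returned. This is our own illustration of the idea; the column-naming convention and the wrapper are assumptions, not existing Cassandra functionality.

```python
# Hypothetical per-cell ACL layout: the ACL travels in the same row as the data.
row = {
    "contents":     "<html>...</html>",
    "contents#acl": {"alice": "rw", "bob": "r"},   # companion ACL attribute
}

def read_cell(row: dict, column: str, user: str):
    """Client-side check of the companion ACL before returning the cell value."""
    acl = row.get(column + "#acl", {})
    if "r" not in acl.get(user, ""):
        raise PermissionError(f"{user} may not read {column}")
    return row[column]

print(read_cell(row, "contents", "bob"))   # allowed: bob has read access
# read_cell(row, "contents", "eve") would raise PermissionError
```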

  29. Stage 2: ACL performance comparison • Compare the performance of Cassandra with the added implementation of ACLs vs. Accumulo (see the throughput measurements with YCSB++ in [3]). Accumulo measurements example (figure taken from [3]): insert throughput (measured as the number of rows inserted per second) decreases with an increasing number of ACL clauses when the CPU is the limiting resource.

  30. Stage 3: Analysis of the Security Holes [Figure: two data centers, DC A and DC B; Moshe puts Object X to DC A while David gets Object X from DC B.] • Measure the security holes that may exist due to inconsistency of the ACL configuration. This can occur, for example, when a user changes the permissions to deny access to a certain file, but the restriction has not yet propagated to all nodes, so other users can still access the file during the inconsistency window. YCSB++ allows measuring this inconsistency as a read-after-write latency (a measurement sketch follows below).
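
The sketch below shows, schematically, how such a read-after-write (here, read-after-revoke) latency could be measured: revoke a permission through one node, then poll a second node until the revocation becomes visible. The writer/reader clients and their revoke/is_readable methods are hypothetical placeholders, not YCSB++ or Cassandra APIs.

```python
import time

def acl_propagation_delay(writer, reader, key, user,
                          poll_interval=0.01, timeout=10.0):
    """Time from revoking 'user' on one node until another node stops serving
    the cell: an estimate of the ACL inconsistency window (hypothetical clients)."""
    start = time.monotonic()
    writer.revoke(key, user)                    # deny access via node A
    while time.monotonic() - start < timeout:
        if not reader.is_readable(key, user):   # does node B still allow access?
            return time.monotonic() - start     # the window has closed
        time.sleep(poll_interval)
    return None                                 # revocation never observed
```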

  31. Stage 4: Improving the security through stronger consistency • Improve the security of ACLs in Cassandra by providing a solution with higher consistency guarantees. • Measure the performance penalty (e.g. as a decrease in throughput).

  32. Project References • Cassandra: http://cassandra.apache.org/ • Accumulo: http://incubator.apache.org/accumulo/ • YCSB++: http://www.pdl.cmu.edu/ycsb++/index.shtml • YCSB++: Benchmarking and Performance Debugging Advanced Features in Scalable Table Stores. Swapnil Patil, Milo Polte, Kai Ren, Wittawat Tantisiriroj, Lin Xiao, Julio Lopez, Garth Gibson, Adam Fuchs, Billie Rinaldi. Proc. of the 2nd ACM Symposium on Cloud Computing (SOCC '11), October 27–28, 2011, Cascais, Portugal. Supersedes Carnegie Mellon University Parallel Data Laboratory Technical Report CMU-PDL-11-111, August 2011. http://www.pdl.cmu.edu/PDL-FTP/Storage/socc2011.pdf

  33. Thank You

  34. NFSv4 • NFSv4 ACL support is similar to the Windows NT model. • The NFSv4 ACL attribute is an array of access control entries (ACEs), with the following fields: • type: ALLOW, DENY, AUDIT, ALARM • who: who does the entry pertain to • flags: Inheritance, etc. • masks: Which permissions are covered by this ACE • NFSv4 uses character strings instead of integers to represent user and group identifiers. Uniqueness can be guaranteed by using a format of user@domain or group@domain and leveraging the global domain name registry. • File-access rights as specified in ACLs are checked on the server, not the client. Thus, while the server administrator still exports file systems rather than individual files, object access granularity is at the file level.
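
A minimal sketch of the ACE fields listed above as a data structure; the class name and the sample values are illustrative (the mask value is borrowed from the CDMI example on the next slide).

```python
from dataclasses import dataclass

@dataclass
class Nfsv4Ace:
    """One NFSv4 access control entry, mirroring the fields listed above."""
    ace_type: str     # ALLOW, DENY, AUDIT, or ALARM
    who: str          # principal as user@domain / group@domain, or e.g. EVERYONE@
    flags: int        # inheritance and related flags
    access_mask: int  # which permissions this ACE covers

acl = [
    Nfsv4Ace(ace_type="ALLOW", who="alice@example.com", flags=0x00,
             access_mask=0x00020089),
    Nfsv4Ace(ace_type="DENY",  who="EVERYONE@",         flags=0x00,
             access_mask=0x00000002),
]
```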

  35. CDMI - Cloud Data Management Interface

    HTTP/1.1 200 OK
    Content-Type: application/cdmi-object
    X-CDMI-Specification-Version: 1.0
    {
      "metadata" : {
        "cdmi_acl" : [
          {
            "acetype" : "0x00",
            "identifier" : "EVERYONE@",
            "aceflags" : "0x00",
            "acemask" : "0x00020089"
          }
        ]
      },
      ...
