Scaleable Structured Datastorage for Web 2.0 Michael Armbrust, David Patterson October, 2007
RAD Lab 5-year Mission • Today’s Internet systems are complex, fragile, manually managed, and rapidly evolving • To scale eBay, must build an eBay-sized company • “Moon shot” mission statement: Enable a single person to Develop, Assess, Deploy, and Operate the next-generation IT service • “The Fortune 1 Million” by enabling rapid innovation • Create core technology to enable vision via synergy across systems, networking, and Statistical Machine Learning • Making the datacenter easier to manage enables vision of a single person to analyze, deploy, and operate a scalable IT service
If Datacenter is the computer… • What is the programming language? • What are the libraries? • How do we trace/monitor programs? • What is the simulator? • What is Computer Aided Design? • What is the Operating System? • What is the Database System?
Storage Status Quo • Current status of data storage for Web 2.0 apps • Large relational databases running on expensive hardware • Manual horizontal and vertical partitioning of data • Problem: Requires redesign at each scaling milestone • Goal: Scaleable structured data storage for Web 2.0
Web 2.0 App Characteristics • Need to scale to YouTube or MySpace sizes • Require geographic replication • Short transactions • No ad-hoc queries • Willing to relax consistency in exchange for scalability and availability • Photos, not financials
Relaxed Consistency • Some things can be updated lazily • Eventual consistency is often acceptable • However users should see their own writes immediately • Need to provide simple choices to developers
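A minimal sketch of the "lazy updates, but users see their own writes" idea, written in Python for illustration. Everything here is an assumption made for the example and not part of the proposed system: the class `LazyReplicatedStore`, its single primary and single replica, and the per-session tracking of unreplicated keys are all hypothetical.

```python
from collections import deque


class LazyReplicatedStore:
    """Toy store with one primary and one lazily updated replica (hypothetical)."""

    def __init__(self):
        self.primary = {}       # always holds the latest value
        self.replica = {}       # catches up only when replicate() runs
        self.pending = deque()  # writes not yet applied to the replica
        self.dirty = {}         # session -> keys it wrote that are not yet replicated

    def write(self, session, key, value):
        # Apply to the primary immediately and queue the write for the replica.
        self.primary[key] = value
        self.pending.append((session, key, value))
        self.dirty.setdefault(session, set()).add(key)

    def read(self, session, key):
        # A session reads its own unreplicated writes from the primary;
        # every other read goes to the (possibly stale) replica.
        if key in self.dirty.get(session, ()):
            return self.primary.get(key)
        return self.replica.get(key)

    def replicate(self):
        # Lazy propagation: drain the queue in write order.
        while self.pending:
            session, key, value = self.pending.popleft()
            self.replica[key] = value
            self.dirty[session].discard(key)


store = LazyReplicatedStore()
store.write("alice", "photo:1:caption", "Sunset")
alice_view = store.read("alice", "photo:1:caption")  # writer sees the new value
bob_view = store.read("bob", "photo:1:caption")      # other users may see stale data
store.replicate()
bob_after = store.read("bob", "photo:1:caption")     # replica has caught up
```

The developer-facing choice stays simple in this sketch: reads need only a session identifier, and the store decides whether the stale replica is safe to use.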
Our Idea • Large-scale distributed database underneath • Runs on 1000+ shared-nothing commodity servers • ActiveRecord-like layer in Ruby on Rails instead of SQL • Provides simple relationships and consistency guarantees between models • has_many • belongs_to • searchable_by (for full-text search) • Pre-compute joins for quick reads
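The relationship and pre-computed join bullets can be illustrated with a small sketch. It is written in Python, not as the Rails layer itself, and all names in it are hypothetical: `KeyValueStore`, `Model`, the key layout, and the `has_many` helper are assumptions for the example. The point it shows is that a child record declared with `belongs_to` updates a materialized list under its parent at write time, so a `has_many` read is a single key lookup and not a join.

```python
class KeyValueStore:
    """Stand-in for the distributed store: a plain in-memory dict (assumption)."""

    def __init__(self):
        self.data = {}

    def get(self, key, default=None):
        return self.data.get(key, default)

    def put(self, key, value):
        self.data[key] = value


class Model:
    """Hypothetical model layer; belongs_to names the parent model, if any."""

    def __init__(self, store, name, belongs_to=None):
        self.store = store
        self.name = name
        self.belongs_to = belongs_to

    def create(self, record_id, attrs, parent_id=None):
        # Store the record itself under "<model>:<id>".
        self.store.put(f"{self.name}:{record_id}", dict(attrs))
        if self.belongs_to is not None and parent_id is not None:
            # Pre-compute the join: append a copy of the child to the list
            # kept under "<parent>:<parent_id>:<model>".
            join_key = f"{self.belongs_to}:{parent_id}:{self.name}"
            children = self.store.get(join_key, [])
            self.store.put(join_key, children + [dict(attrs)])

    def find(self, record_id):
        return self.store.get(f"{self.name}:{record_id}")


def has_many(store, parent_name, parent_id, child_name):
    # Read side of the relationship: one lookup, no join at read time.
    return store.get(f"{parent_name}:{parent_id}:{child_name}", [])


store = KeyValueStore()
users = Model(store, "user")
photos = Model(store, "photo", belongs_to="user")

users.create(1, {"name": "alice"})
photos.create(10, {"title": "Sunset"}, parent_id=1)
photos.create(11, {"title": "Harbor"}, parent_id=1)

alice_photos = has_many(store, "user", 1, "photo")
```

The cost moves to the write path: each child write also rewrites the parent's list, which fits a workload of short transactions with no ad-hoc queries.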
Related Work (we know of) • G. DeCandia, D. Hastorun, et al. Dynamo: Amazon’s highly available key-value store. In SOSP. 2007. • M. Stonebraker and U. Cetintemel. One size fits all: an idea whose time has come and gone. pp. 211. 2005. • M. Stonebraker, S. R. Madden, et al. The end of an architectural era (it’s time for a complete rewrite). In VLDB. Vienna, Austria, 2007. • D. J. Abadi, A. Marcus, S. R. Madden, and K. Hollenbach. Scalable semantic web data management using vertical partitioning. In VLDB, Vienna, Austria, 2007. • F. Chang, J. Dean, S. Ghemawat, W. C. Hsieh, D. A. Wallach, M. Burrows, T. Chandra, A. Fikes, and R. E. Gruber. Bigtable: A distributed storage system for structured data. In OSDI ’06: Seventh Symposium on Operating System Design and Implementation, November 2006.