DROSS Distributed & Resilient Open Source Software


Presentation Transcript


  1. DROSS Distributed & Resilient Open Source Software
     Andrew Hardie
     http://ashardie.com
     ECPRD WGICT, 17-21 November 2010, Chamber of Deputies, Bucharest

  2. Topics
     • Distributed, not virtualized or ‘cloud’
     • DRBD
     • Gluster
     • Heartbeat
     • Nginx
     • Trends:
       • NoSQL
       • Map / Reduce
       • Cassandra, Hadoop & family
     • Other stuff ‘out there’
     • Predictions…

  3. DRBD
     • Block-level disk replicator (effectively, net RAID-1)

  4. DRBD – Good/bad points
     • Good for HA clusters (e.g. LAMP servers)
     • Ideal for block-level apps, e.g. MySQL
     • Sync/async operation
     • Auto recovery from disk, net or node failure
     • In Linux kernels from 2.6.33 (Ubuntu 10.10 is 2.6.35)
     • Supports Infiniband, LVM, Xen, dual-primary config
     • Hard to extend beyond two systems; three is the maximum
     • Remote offsite replication really needs DRBD Proxy (commercial)
     • Requires a dedicated disk/partition
     • Moderately difficult to configure
     • Documentation could be better
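
For a sense of the configuration effort, a minimal two-node resource definition looks roughly like this; host names, devices and addresses are placeholders, not taken from the presentation:

```
resource r0 {
    protocol C;                  # synchronous replication; A and B give async variants
    on node1 {
        device    /dev/drbd0;    # the replicated block device that applications use
        disk      /dev/sdb1;     # dedicated backing partition (required by DRBD)
        address   10.0.0.1:7788;
        meta-disk internal;
    }
    on node2 {
        device    /dev/drbd0;
        disk      /dev/sdb1;
        address   10.0.0.2:7788;
        meta-disk internal;
    }
}
```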

  5. Gluster
     • Filesystem-level replicator
     • More like NAS than RAID
     • Claims to scale to petabytes
     • Nodes can be servers, clients or both
     • On-the-fly reconfiguration of disks & nodes
     • Scripting interface
     • ‘Cloud compliant’ (isn’t everything?)

  6. Gluster – Use case: Dublin
     Real-time mirroring of digital audio

  7. Gluster – Good/bad points
     • Moving towards a “turnkey system” (black box)
     • N-way replication is easy
     • Easier than DRBD to configure
     • Dedicated partitions or disks not required
     • Supports Infiniband
     • Background self-healing (pull rather than push)
     • Aggregate and/or replicate volumes
     • POSIX support
     • Native support for NFS, CIFS, HTTP & FTP
     • No specific features for slow-link replication
     • Similar tension between documentation and revenue earning (as with DRBD)
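
For comparison with DRBD, creating and mounting a two-way replicated volume is only a handful of commands with the GlusterFS 3.1-era CLI; server names and paths are placeholders:

```
# Create a two-way replicated volume from bricks on two servers
gluster volume create audio-vol replica 2 transport tcp \
    server1:/export/audio server2:/export/audio
gluster volume start audio-vol

# Mount it on any client using the native FUSE client
mount -t glusterfs server1:/audio-vol /mnt/audio
```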

  8. Heartbeat
     • HA cluster infrastructure (“cluster glue”)
     • Needs a Cluster Resource Manager (CRM), e.g. Pacemaker, to be useful
     • Part of the Linux-HA project
     • Provides:
       • Hot-swap of a synthetic IP address between nodes (the synthetic IP is in addition to each node’s own IPs)
       • Node failure/restore detection
       • Start/stop of the services to be managed, via init scripts

  9. Heartbeat/DRBD – Use case: HA LAMP server pair
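
A sketch of how a pair like this can be wired together with the older v1-style Heartbeat configuration; node names, the floating IP and the resource list are placeholders. A Pacemaker/CRM setup expresses the same resources in the CRM’s own configuration instead of haresources:

```
# /etc/ha.d/ha.cf – cluster membership and failure detection
keepalive 2          # heartbeat interval (seconds)
deadtime 30          # declare the peer dead after 30s of silence
bcast eth0           # interface carrying the heartbeats
auto_failback on
node alpha beta      # the two cluster nodes (must match uname -n)

# /etc/ha.d/haresources – preferred node, floating (synthetic) IP, services to start/stop
alpha 192.168.1.100/24 drbddisk::r0 Filesystem::/dev/drbd0::/data::ext4 mysql apache2
```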

  10. Heartbeat – Good/bad points
     • Lots of resource agents available
       • e.g. Apache, Squid, Sphinx search, VMware, DB2, WebSphere, Oracle, JBoss, Tomcat, Postfix, Informix, SAP, iSCSI, DRBD, …
     • Beyond a simple 2-way hot-swap, configuration can get very complicated
     • Good for stateless services (e.g. HTTP); not so good for file shares (e.g. Samba)
     • Documentation out of date in some areas, e.g. Ubuntu ‘upstart’ scripts (boot-time startup of services to be managed by Heartbeat has to be disabled)

  11. NGINX
     • Fast, simple Russian HTTP server
     • Reverse proxy server
     • Mail proxy server
     • Fast static content serving
     • Very low memory footprint
     • Load balancing and fault tolerance
     • Name- and IP-based virtual servers
     • Embedded Perl
     • FLV streaming
     • Non-threaded, event-driven architecture
     • Modular architecture
     • Can front-end Apache (instead of mod_proxy)
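
A sketch of the “front-end Apache” pattern from the last bullet; server names, ports and paths are placeholders:

```
# Fragment of nginx.conf (inside the http { } context):
# Nginx serves static files itself and proxies everything else to Apache
upstream apache_backend {
    server 10.0.0.11:8080;
    server 10.0.0.12:8080;    # simple round-robin load balancing across back-ends
}

server {
    listen 80;
    server_name www.example.org;

    location /static/ {
        root /var/www;                     # fast static content served by Nginx directly
    }

    location / {
        proxy_pass http://apache_backend;  # dynamic requests handed to Apache
        proxy_set_header Host $host;
    }
}
```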

  12. Trends – NoSQL, etc.
     • NoSQL
       • Or is it really NoACID (atomicity, consistency, isolation, durability)?
       • It’s really the ACID that’s hard to scale, especially in very large, very active data stores (e.g. social networks)
       • Some NoSQL stores now offer SQL for querying only
       • Ways of solving ACID scalability are being discussed
     • The problems:
       • Huge numbers of simultaneous updates
       • Large JOINs across very large tables (= big SQL queries)
       • Lots of updates & searches on small data elements in vast data sets
     • The alternative:
       • Key/value stores
       • De-normalized data

  13. Consequences of de-normalizing
     • Order(s) of magnitude increase in storage requirements
     • Difficulty of updating numerous “key equivalents” in many places – can’t be done synchronously
     • Breaking relationship links allows parallel processing:
       • Helps with the bottleneck of storage read speed (storage capacity is growing much faster than transfer rates)
     • No JOINs or transactions
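
A toy Python illustration of the trade-off (all names invented): the normalized form needs a JOIN-style lookup across two tables, while the de-normalized form repeats the member’s details inside every speech record, so each record can be processed on its own (and in parallel), at the cost of duplicated storage and harder updates.

```python
# Normalized: facts stored once, reassembled with a join at query time
members  = {17: {"name": "A. Member", "party": "Independent"}}
speeches = [{"member_id": 17, "text": "Point of order..."}]

joined = [dict(s, **members[s["member_id"]]) for s in speeches]   # the "JOIN"

# De-normalized: each record is self-contained -- no join, easy to shard,
# but the member's details are duplicated and must be updated everywhere
speeches_denorm = [
    {"member": {"name": "A. Member", "party": "Independent"},
     "text": "Point of order..."}
]
```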

  14. Name/Value Models
     • Just name/value pairs, e.g. memcachedb, Dynamo
     • Name/value pairs plus associated data, e.g. CouchDB, MongoDB – think document stores with metadata
     • Name/value pairs with nesting, e.g. Cassandra
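
Sketched as Python literals (keys and values invented), the three flavours differ mainly in how much structure the store itself understands:

```python
# 1. Plain name/value pairs (memcachedb, Dynamo style): the value is opaque to the store
kv_store = {"session:42": "<serialized blob>"}

# 2. Name/value plus associated data (CouchDB, MongoDB style): the value is a document
#    the store can index and query
doc_store = {"speech:42": {"member": "A. Member", "date": "2010-11-18", "text": "..."}}

# 3. Name/value pairs with nesting (Cassandra style): values are themselves
#    collections of name/value pairs
nested_store = {"member:17": {"profile": {"name": "A. Member"},
                              "contact": {"email": "a.member@example.org"}}}
```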

  15. Cassandra
     • Distributed, fault-tolerant database, based on ideas in Dynamo (Amazon) & BigTable (Google)
     • Developed by Facebook, open-sourced in 2008
     • Now an Apache project
     • Key/value pairs, in a column-oriented format
       • Standard column: name, value, timestamp
       • Super-column: name, plus a map of columns, each with name, value, timestamp (think array of hashes)
     • Grouped by column family, also either standard or super
       • A column family contains ‘rows’, roughly like a DB table
     • Column families then go in keyspaces
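
The hierarchy is easiest to see written out as nested Python literals; this shows only the shape of the model (all names and timestamps invented), not client code:

```python
# keyspace -> column family -> row key -> columns (name -> (value, timestamp))
keyspace = {
    "Demo": {                                      # keyspace
        "Users": {                                 # standard column family
            "user-17": {                           # row key (roughly a DB 'row')
                "name":  ("A. User", 1290074400),  # standard columns
                "email": ("a.user@example.org", 1290074400),
            }
        },
        "Posts": {                                 # super column family
            "user-17": {
                # super column: name -> map of sub-columns
                "post-0001": {"title": ("Hello", 1290074460),
                              "body":  ("...",   1290074460)},
            }
        },
    }
}
```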

  16. Cassandra – NoACID
     • Cassandra and others, e.g. Voldemort (LinkedIn), trade consistency and atomicity for speed, distribution and availability
     • No single point of failure
     • “Eventually consistent” model
     • Tunable levels of consistency
     • Atomicity only guaranteed within a column family
     • Accessed using Thrift (also developed by Facebook)
     • Used by:
       • Facebook
       • Digg
       • Twitter
       • Reddit

  17. NoSQL for Parliaments?
     • Much parliamentary material is naturally unstructured and suited to the name/value model (think XML)
     • Remember the old discussions about how to map such parliamentary material into relational databases?
     • Think of every MP’s contribution (speech) in chamber or committee as a key/value pair, i.e. a column
     • Think of every PQ & answer as a super-column of name/value pairs for question, answer, holding, supplementary, pursuant, referral …
     • Hansard becomes a super-column family!
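
Continuing the sketch style above, one sitting day of Hansard as a row in a super-column family (all names and keys invented):

```python
# Each contribution or PQ is a super column holding its own name/value pairs
hansard = {
    "Hansard": {                                    # super column family
        "2010-11-18": {                             # row key: sitting date
            "speech-0001": {"member": "A. Member",  # a speech as a super column
                            "text": "Point of order..."},
            "PQ-1234": {                            # a PQ as a super column
                "question":      "Will the Minister...",
                "answer":        "The Minister is aware...",
                "supplementary": "Pursuant to that answer...",
            },
        }
    }
}
```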

  18. Map / Reduce
     • Column- (or record-) oriented design & de-normalized data power the parallel “map/reduce” model (think “sharding on speed”)
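
The canonical toy example, sketched in plain Python: the data is split into shards, a map function counts words in each shard independently (and so can run in parallel), and a reduce step merges the partial counts.

```python
from collections import Counter
from functools import reduce
from multiprocessing import Pool

def map_shard(lines):
    """Map: count words in one shard, independently of all other shards."""
    return Counter(word for line in lines for word in line.split())

def reduce_counts(a, b):
    """Reduce: merge two partial word counts into one."""
    a.update(b)
    return a

if __name__ == "__main__":
    corpus = ["to be or not to be", "the question is the question"] * 1000
    shards = [corpus[i::4] for i in range(4)]        # naive 4-way sharding
    with Pool(4) as pool:
        partials = pool.map(map_shard, shards)       # parallel map phase
    totals = reduce(reduce_counts, partials, Counter())
    print(totals.most_common(3))
```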

  19. Hadoop
     • Nothing to do with NoSQL
     • Hadoop is an infrastructure – and now a family of tools – for managing distributed systems and immense datasets
     • How immense? Hundreds of GB and a 10-node cluster is ‘entry-level’ in Hadoop terms
     • Developed by Yahoo for their cloud, now an Apache project
     • Supports Map/Reduce by pre-dividing & distributing data
       • “Moves computation to the data instead of data to the computation”
     • The HDFS file system is particularly interesting – distributed, resilient (far more advanced than DRBD or Gluster), but not real-time (more eventually consistent…)
     • Hive data warehouse front end – has SQL-like queries
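
Hadoop itself is Java, but its Streaming interface lets any executable act as mapper or reducer, so the word-count example above can be expressed as two small Python scripts; the file names, paths and the job invocation in the comment are illustrative only.

```python
# mapper.py -- emit "word<TAB>1" for every word read from stdin
# (submitted with something like:
#   hadoop jar hadoop-streaming.jar -mapper mapper.py -reducer reducer.py \
#       -input /data/hansard -output /data/wordcounts)
import sys

for line in sys.stdin:
    for word in line.split():
        print("%s\t1" % word)
```

```python
# reducer.py -- sum the counts per word; Hadoop sorts mapper output by key,
# so all lines for a given word arrive consecutively
import sys

current, total = None, 0
for line in sys.stdin:
    word, count = line.rstrip("\n").split("\t", 1)
    if word != current:
        if current is not None:
            print("%s\t%d" % (current, total))
        current, total = word, 0
    total += int(count)
if current is not None:
    print("%s\t%d" % (current, total))
```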

  20. Who uses Hadoop?
     • Twitter
     • AOL
     • IBM
     • Last.fm
     • LinkedIn
     • eBay
     • Yahoo
       • 36,000 machines with > 100,000 cores running Hadoop
       • Largest single cluster is only 4,000 nodes
     • Largest known cluster is Facebook!
       • 2,000 machines with 22,400 cores
       • 21 petabytes in a single HDFS store

  21. Hadoop for Parliaments?
     • Hadoop may seem overkill for parliaments now…
     • But when you start your legacy collection digitization and digital preservation projects, its model – managing large datasets which essentially do not change and don’t need real-time commit – is a very good fit!
     • Other interesting Hadoop projects:
       • ZooKeeper (distributed apps co-ordination)
       • Hive (data warehouse infrastructure)
       • Pig (high-level data flow language)
       • Mahout (scalable machine learning library)
       • Scribe (for aggregating streaming log data) [not strictly a Hadoop project, but can be integrated with it, using an interesting work-around for the non-real-time & NameNode single point of failure]

  22. Other things ‘out there’
     • Drizzle
       • A database “optimized for Cloud infrastructure and Web applications”
       • “Design for massive concurrency on modern multi-cpu architecture”
       • But it doesn’t actually explain how to use it for these…
       • It’s SQL and ACID
       • Mostly seems to be a reaction against what’s happening at MySQL…
       • Has to be compiled from source – no distro packages available for it yet
     • CouchDB
       • Distributed, fault-tolerant, schema-free, document-oriented database
       • RESTful JSON API (i.e. a Web front end)
       • Incremental replication with bi-directional conflict detection
       • Written in Erlang (a highly reliable language developed by Ericsson)
       • Supports ‘map/reduce’-like querying and indexing
       • Interesting model, different from most other offerings
       • Also now an Apache project
       • Still too immature for anything beyond experimentation
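
CouchDB’s “RESTful JSON API” really is plain HTTP plus JSON, so a sketch needs only the Python standard library; the database name, document and local URL are assumptions:

```python
import json
import urllib.request

BASE = "http://localhost:5984"          # default CouchDB port

def request(method, path, body=None):
    """Send one JSON request to CouchDB and decode the JSON reply."""
    data = json.dumps(body).encode() if body is not None else None
    req = urllib.request.Request(BASE + path, data=data, method=method,
                                 headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())

print(request("PUT", "/speeches"))                 # create a database (errors if it exists)
print(request("PUT", "/speeches/speech-0001",      # create a document
              {"member": "A. Member", "text": "Point of order..."}))
print(request("GET", "/speeches/speech-0001"))     # read it back
```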

  23. Also ‘out there’
     • Voldemort
       • Another distributed key/value storage system
       • Used at LinkedIn
       • Doesn’t seem to have much of a future – Cassandra is similar, better & more widely used
     • MonetDB
       • “Database system for high-performance applications in data mining, OLAP, GIS, XML Query, text and multimedia retrieval”
       • SQL and XQuery front ends
       • Also hard to see where it’s going…
     • MongoDB
       • Tries to bridge the gap between RDBMS and map/reduce
       • JSON document storage (like CouchDB)
       • No JOINs, no multi-document transactions – atomic operations only on single documents
       • Interesting, but may ‘fall between two stools’
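
A sketch of the single-document atomicity point using the pymongo driver; the connection URL, database, collection and field names are invented:

```python
from pymongo import MongoClient

client = MongoClient("mongodb://localhost:27017")
speeches = client.parliament.speeches        # database 'parliament', collection 'speeches'

# Insert one JSON-like document (no schema, no JOINs)
speeches.insert_one({"_id": "speech-0001",
                     "member": "A. Member",
                     "text": "Point of order...",
                     "tags": []})

# This update is atomic because it touches a single document...
speeches.update_one({"_id": "speech-0001"}, {"$push": {"tags": "procedure"}})

# ...but there is no multi-document transaction tying several such updates together.
```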

  24. Predictions
     • Hadoop and Cassandra are the ones to watch
     • There will likely be some sort of re-convergence between NoSQL and query languages of some kind – you can’t do everything with map/reduce (especially not ad hoc queries)
     • SQL may be destined to become like COBOL – still around and running things, but not something to use for new projects
     • Distributed storage models (with or without map/reduce) have a good future
     • Datasets will only get bigger – compliance, audit, digital preservation, the shift to visuals, etc.
     • Information management models (“strategy”) and access speed will remain the key problems

  25. Questions
     “What’s it all about?”
     http://ashardie.com
