
Etcetera!

CMSC 491/691 Hadoop-Based Distributed Computing, Spring 2014, Adam Shook



  1. Etcetera! CMSC 491/691 Hadoop-Based Distributed Computing Spring 2014 Adam Shook

  2. Agenda • Cassandra • Redis • Advanced HDFS Configuration • Cluster Planning

  3. Advanced HDFS Features

  4. Highly Available NameNode • The Highly Available NameNode feature eliminates the NameNode as a single point of failure (SPOF) • Requires two NameNodes and some extra configuration • Active/Passive or Active/Active • Clients only contact the active NameNode • DataNodes report in and heartbeat with both NameNodes • Active NameNode writes metadata to a quorum of JournalNodes • Standby NameNode reads the JournalNodes to stay in sync • There is no CheckpointNode (SecondaryNameNode) • The passive NameNode performs checkpoint operations
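The quorum write to the JournalNodes can be sketched as below. This is a toy model rather than Hadoop code: `JournalNode`, `write_edit`, and the `up` flag are hypothetical names used only for illustration. The assumption is that an edit counts as durable once a strict majority of JournalNodes acknowledge it.

```python
from dataclasses import dataclass, field


@dataclass
class JournalNode:
    """Hypothetical stand-in for a JournalNode; not a Hadoop API."""
    name: str
    up: bool = True
    edits: list = field(default_factory=list)

    def append(self, txid, op):
        # A node that is down cannot acknowledge the write.
        if not self.up:
            return False
        self.edits.append((txid, op))
        return True


def write_edit(journals, txid, op):
    """Return True if a strict majority of journals acknowledged the edit."""
    acks = sum(1 for jn in journals if jn.append(txid, op))
    return acks > len(journals) // 2


journals = [JournalNode("jn1"), JournalNode("jn2"), JournalNode("jn3")]
ok_all_up = write_edit(journals, 1, "mkdir /data")

journals[2].up = False   # one of three down: 2 acks, still a majority
ok_one_down = write_edit(journals, 2, "create /data/a")

journals[1].up = False   # two of three down: 1 ack, no majority
ok_two_down = write_edit(journals, 3, "delete /data/a")
```

With three JournalNodes the active NameNode tolerates one failure; losing two leaves it unable to commit edits.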

  5. HA NameNode Failover • There are two failover scenarios • Graceful: performed by an administrator for maintenance • Automated: the active NameNode fails • The failed NameNode must be fenced • Fencing eliminates the 'split brain syndrome' • Two fencing methods are available • sshfence: kills the NameNode daemon • shell script: for example, disables access to the NameNode, shuts down the network switch port, or sends a power-off to the failed NameNode • There is no 'default' fencing method

  6. [Diagram: automatic NameNode failover. Labels: ZooKeeper (create lock, lock created, release lock, lock released); two ZKFCs; shared NN state on NFS or QJM; active NN; standby NN (fence, become active, "I'm the Boss"); three DataNodes]

  7. HDFS Federation • Useful for: • Isolation/multi-tenancy • Horizontal scalability of HDFS namespace • Performance • Allows for multiple independent NameNodes using the same collection of DataNodes • DataNodes store blocks from all NameNode pools

  8. Federated NameNodes • File-system namespace scalable beyond heap size • NameNode performance no longer a bottleneck • NameNode failure/degradation is isolated • Only data managed by the failed NameNode is unavailable • Each NameNode can be made Highly Available

  9. Hadoop Security • Hadoop's original design – web crawler and indexing • Not designed for processing of confidential data • Small number of trusted users • Access to cluster controlled by providing user accounts • Little / no control on what a user could do once logged in • HDFS permissions were added in the Hadoop 0.16 release • Similar to basic UNIX file permissions • HDFS permissions can be disabled via dfs.permissions • Basically for protection against user-induced accidents • Did not protect from attacks • Authentication is accomplished on the client side • Easily subverted via a simple configuration parameter
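A minimal sketch of a UNIX-style permission check, of the kind the HDFS permission model resembles. The function name, users, and groups are hypothetical. Note that the check trusts whatever user name the caller supplies, which is the weakness the slide describes when authentication happens on the client side.

```python
READ, WRITE, EXECUTE = 4, 2, 1


def is_allowed(mode, owner, group, user, user_groups, want):
    """Check `want` bits against an octal mode such as 0o640."""
    if user == owner:
        bits = (mode >> 6) & 7      # owner class
    elif group in user_groups:
        bits = (mode >> 3) & 7      # group class
    else:
        bits = mode & 7             # everyone else
    return bits & want == want


# A file owned by alice:staff with mode rw-r----- (0o640).
owner_can_write = is_allowed(0o640, "alice", "staff", "alice", {"staff"}, WRITE)
group_can_read = is_allowed(0o640, "alice", "staff", "bob", {"staff"}, READ)
group_can_write = is_allowed(0o640, "alice", "staff", "bob", {"staff"}, WRITE)
other_can_read = is_allowed(0o640, "alice", "staff", "carol", {"guests"}, READ)

# Nothing verifies the claimed identity: a client that simply says it is
# "alice" gets alice's access.
impostor_can_write = is_allowed(0o640, "alice", "staff", "alice", set(), WRITE)
```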

  10. Kerberos • Kerberos support introduced in the Hadoop 0.22.2 release • Developed at MIT / freely available • Not a Hadoop-specific feature • Not included in Hadoop releases • Works on the basis of 'tickets' • Allow communicating nodes to securely identify each other across unsecure networks • Primarily a client/server model implementing mutual authentication • The user and the server verify each other's identity

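  11. How Kerberos Works • Client forwards the username to the KDC • (A) KDC sends the Client/TGS Session Key, encrypted with a key derived from the user's password • (B) KDC issues a TGT, encrypted with the TGS's key • (C) Client sends B and the service ID to the TGS • (D) Client sends an authenticator encrypted with the session key from A • (E) TGS issues the CTS ticket, encrypted with the Service Server (SS) key • (F) TGS issues the CSS, encrypted with the session key from A • (G) Client sends the SS a new authenticator encrypted with the session key from F • (H) SS returns the timestamp found in G, plus 1 • Abbreviations: KDC - Key Distribution Center • TGS - Ticket Granting Service • TGT - Ticket Granting Ticket • CTS - Client-to-Server Ticket • CSS - Client/Server Session Key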
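The exchange above can be walked through with a toy model. Nothing here is real cryptography or a Kerberos library: `seal` and `unseal` are hypothetical helpers that only remember which key "locked" a payload, and all keys, names, and timestamps are made-up values. The point is to show who can open which message.

```python
def seal(key, payload):
    """Toy 'encryption': records which key locked the payload."""
    return {"locked_with": key, "payload": payload}


def unseal(key, box):
    if box["locked_with"] != key:
        raise PermissionError("wrong key")
    return box["payload"]


# Long-term secrets (hypothetical values) known to the KDC.
user_key, tgs_key, ss_key = "alice-password-key", "tgs-secret", "ss-secret"

# KDC reply: session key for the client, plus a TGT only the TGS can open.
msg_a = seal(user_key, "client-tgs-session")
msg_b = seal(tgs_key, {"client": "alice", "session": "client-tgs-session"})

# Client recovers the session key, but cannot read the TGT.
session = unseal(user_key, msg_a)
try:
    unseal(user_key, msg_b)
    client_can_read_tgt = True
except PermissionError:
    client_can_read_tgt = False

# Client to TGS: the TGT, the service ID, and an authenticator.
msg_c = {"tgt": msg_b, "service": "hdfs"}
msg_d = seal(session, {"client": "alice", "timestamp": 1000})

# TGS opens the TGT, checks the authenticator, issues ticket and session key.
tgt = unseal(tgs_key, msg_c["tgt"])
tgs_accepts = unseal(tgt["session"], msg_d)["client"] == tgt["client"]
msg_e = seal(ss_key, {"client": "alice", "session": "client-server-session"})
msg_f = seal(tgt["session"], "client-server-session")

# Client to Service Server: the ticket plus a new authenticator.
css = unseal(session, msg_f)
msg_g = seal(css, {"client": "alice", "timestamp": 2000})

# Service Server opens the ticket, checks the authenticator, answers with ts + 1.
ticket = unseal(ss_key, msg_e)
auth = unseal(ticket["session"], msg_g)
msg_h = seal(ticket["session"], auth["timestamp"] + 1)

# Client verifies the server: only the real SS could produce timestamp + 1.
server_proved_identity = unseal(css, msg_h) == 2000 + 1
```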

  12. Kerberos Services • Authentication Server • Authenticates client • Gives client enough information to authenticate with Service Server • Service Server • Authenticates client • Authenticates itself to client • Provides services to client

  13. Kerberos Limitations • Single point of failure • Must use multiple servers • Implement fallback authentication mechanisms • Strict time requirements: 'tickets' are time stamped • Clocks on all hosts must be carefully synchronized • All authentication is controlled by the KDC • Compromise of this infrastructure will allow attackers to impersonate any user • Each network service requiring a different host name must have its own set of Kerberos keys • Complicates virtual hosting of clusters
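The strict time requirement can be illustrated with a small check. This is a sketch, not Kerberos code: `within_skew` is a hypothetical helper, and the five-minute tolerance is an assumption (a commonly used default) rather than something stated in the slides.

```python
from datetime import datetime, timedelta, timezone

MAX_SKEW = timedelta(minutes=5)   # assumed tolerance, configurable in practice


def within_skew(client_ts, server_now, max_skew=MAX_SKEW):
    """Accept a timestamp only if the two clocks differ by at most max_skew."""
    return abs(server_now - client_ts) <= max_skew


server_now = datetime(2014, 4, 1, 12, 0, 0, tzinfo=timezone.utc)
fresh = within_skew(server_now - timedelta(minutes=4), server_now)
drifted = within_skew(server_now + timedelta(minutes=6), server_now)
```

A host whose clock drifts past the tolerance has its requests rejected even with valid credentials, which is why clock synchronization matters across the cluster.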

  14. Apache Cassandra

  15. In a couple dozen words... • Apache Cassandra is an open source, distributed, decentralized, elastically scalable, highly available, fault-tolerant, tunably consistent, column-oriented database with a lot of adjectives

  16. Overview • Originally created by Facebook and open sourced in 2008 • Based on Google Bigtable & Amazon Dynamo • Massively Scalable • Easy to use • No relation to Hadoop • Specifically, data is not stored on HDFS

  17. Distributed and Decentralized • Distributed • Can run on multiple machines • Decentralized • No single point of failure • No master or slave issues by using a peer-to-peer architecture (gossip, specifically) • Can run across geographic datacenters

  18. Elastic Scalability • Scales horizontally • Adding nodes linearly increases performance • Decreasing and increasing node counts happens seamlessly

  19. Highly Available and Fault Tolerant • Multiple networked computers in a cluster • Facility for recognizing node failures • Failing over by forwarding requests to another part of the system

  20. Tunable Consistency • Choice between strong and eventual consistency • Adjustable for read and write operations separately • Conflicts are resolved during reads
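One common way to reason about the strong-versus-eventual choice is the overlap rule: if the replicas written plus the replicas read exceed the replication factor, every read touches at least one replica holding the latest write. The sketch below applies that rule; the function names are hypothetical and this is not a Cassandra API.

```python
def quorum(replication_factor):
    """Smallest strict majority of the replicas."""
    return replication_factor // 2 + 1


def is_strongly_consistent(replication_factor, write_replicas, read_replicas):
    """True when the read set and the write set must overlap."""
    return write_replicas + read_replicas > replication_factor


rf = 3
quorum_both = is_strongly_consistent(rf, quorum(rf), quorum(rf))   # 2 + 2 > 3
one_both = is_strongly_consistent(rf, 1, 1)                        # 1 + 1 <= 3
write_all_read_one = is_strongly_consistent(rf, rf, 1)             # 3 + 1 > 3
```

Reading and writing a single replica is fastest but only eventually consistent; quorum reads and writes trade some latency for overlap.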

  21. Column-Oriented • Stored in sparse multi-dimensional hash tables • A row can have multiple columns, and not necessarily the same number of columns as other rows • Each row has a unique key used for partitioning
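A rough picture of the data model: a hash table of rows, each row a hash table of columns, with the row key deciding which node owns the row. This sketch is a deliberate simplification. It uses a plain hash-modulo placement, whereas Cassandra itself assigns rows through a token ring; `node_for` and the sample rows are hypothetical.

```python
import hashlib

# Sparse rows: each row stores only the columns it actually has.
rows = {
    "song-1": {"title": "La Grange", "artist": "ZZ Top", "album": "Tres Hombres"},
    "song-2": {"title": "Untitled demo"},
}


def node_for(row_key, node_count):
    """Map a row key to a node index with a stable hash (simplified placement)."""
    digest = hashlib.md5(row_key.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % node_count


placement = {key: node_for(key, 3) for key in rows}
```

Because the hash is stable, any node can compute where a row lives from its key alone, without asking a master.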

  22. Query with CQL • Familiar SQL-like syntax that maps to Cassandra's storage engine and simplifies data modeling

          SELECT * FROM songs WHERE id = 'a3e648f...';

          INSERT INTO songs (id, title, artist, album, tags)
          VALUES ('a3e648f...', 'La Grange', 'ZZ Top', 'Tres Hombres', {'cool', 'hot'});

          CREATE TABLE songs (
              id uuid PRIMARY KEY,
              title text,
              album text,
              artist text,
              data blob,
              tags set<text>
          );

  23. When should I use this? • Key features to complement a Hadoop system: • Geographical distribution • Large deployments of structured data

  24. Redis

  25. Introduction • ANSI C open-source advanced key-value store • Commonly referred to as a data structure server, since keys can contain strings, hashes, lists, sets, and sorted sets • Operations are atomic and there are a bunch of them • All data is stored in-memory, and can be persisted using snapshots or transaction logs • Trivial master-slave replication

  26. Clients • Redis itself is ANSI C, but the protocol is open-source and developers have created support in many languages: C, C#, C++, Clojure, Common Lisp, D, Dart, Emacs Lisp, Erlang, Fancy, GNU Prolog, Go, Haskell, haXe, Java, Lua, Node.js, Objective-C, Perl, PHP, Pure Data, Python, Ruby, Rust, Scala, Scheme, Smalltalk, Tcl

  27. Data Types • Redis keys can be anything from a string to a byte array of a JPEG • Keys have associated data types, and we should talk about them • Strings • Lists • Hashes • Sets • Sorted Sets

  28. Strings! • The simplest type • Supports a number of operations, including sets, gets, and incremental operations for values

          > SET mkey "my binary safe value"
          OK
          > GET mkey
          "my binary safe value"

  29. Lists! • Linked Lists, actually, i.e. O(1) for inserts into the head or tail of the list • Accessing an element by index... O(N)

          > RPUSH messages "Hello how are you?"
          (integer) 1
          > RPUSH messages "Fine thanks. I'm having fun with Redis"
          (integer) 2
          > RPUSH messages "I should look into this NOSQL thing ASAP"
          (integer) 3
          > LRANGE messages 0 2
          1) "Hello how are you?"
          2) "Fine thanks. I'm having fun with Redis"
          3) "I should look into this NOSQL thing ASAP"

  30. Hashes! • Maps between string fields and string values

          > HMSET user:1000 username antirez password P1pp0 age 34
          OK
          > HGETALL user:1000
          1) "username"
          2) "antirez"
          3) "password"
          4) "P1pp0"
          5) "age"
          6) "34"
          > HSET user:1000 password 12345
          (integer) 0
          > HGETALL user:1000
          1) "username"
          2) "antirez"
          3) "password"
          4) "12345"
          5) "age"
          6) "34"

  31. Sets! • Unordered collection of strings • Supports adds, gets, is-member checks, intersections, unions, sorting...

          > SADD myset 1
          (integer) 1
          > SADD myset 2
          (integer) 1
          > SADD myset 3
          (integer) 1
          > SMEMBERS myset
          1) "1"
          2) "2"
          3) "3"
          > SADD myotherset 2
          (integer) 1
          > SINTER myset myotherset
          1) "2"
          > SUNION myset myotherset
          1) "1"
          2) "2"
          3) "3"

  32. Sorted Sets! • Similar to sets, but each member has an associated score and items can be returned in order • Elements are kept sorted on insert via an O(log(n)) operation, so returning them in order is easy

          > ZADD hackers 1940 "Alan Kay"
          (integer) 1
          > ZADD hackers 1953 "Richard Stallman"
          (integer) 1
          > ZADD hackers 1965 "Yukihiro Matsumoto"
          (integer) 1
          > ZADD hackers 1916 "Claude Shannon"
          (integer) 1
          > ZADD hackers 1969 "Linus Torvalds"
          (integer) 1
          > ZADD hackers 1912 "Alan Turing"
          (integer) 1
          > ZRANGE hackers 0 -1
          1) "Alan Turing"
          2) "Claude Shannon"
          3) "Alan Kay"
          4) "Richard Stallman"
          5) "Yukihiro Matsumoto"
          6) "Linus Torvalds"
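The idea of keeping members ordered at insert time can be sketched in plain Python. This is a toy model, not Redis or a Redis client: `ToySortedSet` is a hypothetical class, it handles only non-negative start indexes, and although `bisect` finds the insert position in O(log n), inserting into a Python list is O(n), so it illustrates the behaviour rather than the cost.

```python
import bisect


class ToySortedSet:
    def __init__(self):
        self._scores = {}     # member -> score
        self._ordered = []    # (score, member) pairs, kept sorted

    def zadd(self, score, member):
        """Return 1 if the member is new, 0 if only its score was updated."""
        is_new = member not in self._scores
        if not is_new:
            self._ordered.remove((self._scores[member], member))
        self._scores[member] = score
        bisect.insort(self._ordered, (score, member))
        return 1 if is_new else 0

    def zrange(self, start, stop):
        """Members by rank, inclusive; a negative stop counts from the end."""
        members = [member for _, member in self._ordered]
        if stop < 0:
            stop = len(members) + stop
        return members[start:stop + 1]


hackers = ToySortedSet()
for year, name in [(1940, "Alan Kay"), (1953, "Richard Stallman"),
                   (1965, "Yukihiro Matsumoto"), (1916, "Claude Shannon"),
                   (1969, "Linus Torvalds"), (1912, "Alan Turing")]:
    hackers.zadd(year, name)

oldest_first = hackers.zrange(0, -1)
```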

  33. Features • Transactions • Pub/Sub • Lua Scripting • Key Expiration • Redis Clustering

  34. Transactions • Guarantees no client requests are served in the middle of a transaction • Either all commands or none are processed, so they are atomic • MULTI begins a transaction, and EXEC commits it • Redis will queue commands and process them upon EXEC • All commands in the queue are processed, even if one fails

          > MULTI
          OK
          > INCR foo
          QUEUED
          > INCR bar
          QUEUED
          > EXEC
          1) (integer) 1
          2) (integer) 1
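The queue-then-execute behaviour, including the point that a failing command does not stop the rest, can be modelled in a few lines. This is a toy in-memory sketch, not Redis or a Redis client; `ToyRedis` and its methods are hypothetical and only `INCR` is modelled.

```python
class ToyRedis:
    def __init__(self):
        self.data = {}
        self._queue = None            # None means "not inside MULTI"

    def multi(self):
        self._queue = []
        return "OK"

    def incr(self, key):
        if self._queue is not None:
            self._queue.append(key)   # queued, not run yet
            return "QUEUED"
        return self._do_incr(key)

    def _do_incr(self, key):
        value = int(self.data.get(key, 0)) + 1   # ValueError on non-integers
        self.data[key] = value
        return value

    def execute(self):
        """Run every queued command; a failure is reported, not rolled back."""
        queued, self._queue = self._queue, None
        results = []
        for key in queued:
            try:
                results.append(self._do_incr(key))
            except ValueError as err:
                results.append(err)
        return results


r = ToyRedis()
r.data["bar"] = "not-a-number"
r.multi()
replies = [r.incr("foo"), r.incr("bar"), r.incr("foo")]
results = r.execute()
```

Here the second command fails, yet both increments of `foo` still take effect, so `foo` ends at 2.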

  35. Pub/Sub • Messaging paradigm where publishers send messages to subscribers (if any) via channels • Subscribers express interest in channels, and receive messages from publishers (if any) • SUBSCRIBE test • Clients can subscribe to channels and messages from publishers will be pushed to them by Redis • PUBLISH test Hello • Can do pattern-based subscriptions to channels • PSUBSCRIBE news.*
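A minimal in-memory sketch of the channel and pattern subscriptions described above. It is not Redis: `ToyBroker` is hypothetical, callbacks stand in for connected clients, and glob matching via `fnmatch` is assumed to be close enough to the pattern style shown (`news.*`).

```python
from collections import defaultdict
from fnmatch import fnmatchcase


class ToyBroker:
    def __init__(self):
        self.channels = defaultdict(list)   # channel name -> callbacks
        self.patterns = defaultdict(list)   # glob pattern -> callbacks

    def subscribe(self, channel, callback):
        self.channels[channel].append(callback)

    def psubscribe(self, pattern, callback):
        self.patterns[pattern].append(callback)

    def publish(self, channel, message):
        """Push the message to every match; return how many received it."""
        receivers = list(self.channels.get(channel, []))
        for pattern, callbacks in self.patterns.items():
            if fnmatchcase(channel, pattern):
                receivers.extend(callbacks)
        for callback in receivers:
            callback(channel, message)
        return len(receivers)


test_inbox, news_inbox = [], []
broker = ToyBroker()
broker.subscribe("test", lambda ch, msg: test_inbox.append((ch, msg)))
broker.psubscribe("news.*", lambda ch, msg: news_inbox.append((ch, msg)))

delivered_test = broker.publish("test", "Hello")
delivered_news = broker.publish("news.sports", "Kickoff at 7")
delivered_nobody = broker.publish("weather", "Sunny")   # no subscribers: dropped
```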

  36. Lua Scripting • You can run Lua scripts to manipulate Redis

          > eval "return redis.call('set','foo','bar')" 0
          OK

  37. Expire Keys after time • Set a timeout on a key, having Redis automatically delete it after the set time • Use case: keep a user's page views from the last 60 seconds as session information for recommending related products

          MULTI
          RPUSH pageviews.user:<userid> http://.....
          EXPIRE pageviews.user:<userid> 60
          EXEC
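Key expiration can be sketched with a deadline per key and a clock that is passed in, so the example runs without sleeping. This is a toy model, not Redis: `ToyExpiringStore`, the fake clock, and the user id `42` are all hypothetical.

```python
class ToyExpiringStore:
    def __init__(self, clock):
        self._clock = clock       # callable returning the current time in seconds
        self._values = {}
        self._deadlines = {}

    def _purge(self, key):
        deadline = self._deadlines.get(key)
        if deadline is not None and self._clock() >= deadline:
            self._values.pop(key, None)
            self._deadlines.pop(key, None)

    def rpush(self, key, value):
        self._purge(key)
        self._values.setdefault(key, []).append(value)
        return len(self._values[key])

    def expire(self, key, seconds):
        """Set or reset the timeout; return 0 if the key does not exist."""
        self._purge(key)
        if key not in self._values:
            return 0
        self._deadlines[key] = self._clock() + seconds
        return 1

    def get(self, key):
        self._purge(key)
        return self._values.get(key)


now = [0]                                        # fake clock
store = ToyExpiringStore(clock=lambda: now[0])
key = "pageviews.user:42"

store.rpush(key, "/products/1")
store.expire(key, 60)
now[0] = 59
before_deadline = store.get(key)
now[0] = 60
after_deadline = store.get(key)
```

Calling `expire` after every push resets the deadline, so the key disappears 60 seconds after the user's last page view.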

  38. Redis Cluster • Redis Cluster is not production ready, but can be used to partition your data across multiple Redis instances • A few abstractions exist today to partition among multiple instances, but they are not out-of-the-box with a Redis download

  39. Use Cases • Session Cache • Ranking lists • Auto Complete • Twitter/GitHub/Pinterest/Snapchat/Craigslist/StackOverflow/Flickr

  40. Cluster Planning

  41. Workload Considerations • Balanced workloads • Jobs are distributed across various job types • CPU bound • Disk I/O bound • Network I/O bound • Compute intensive workloads - Data Analytics • CPU bound workloads require: • Large numbers of CPUs • Large amounts of memory to store in-process data • I/O intensive workloads - Sorting • I/O bound workloads require: • Larger number of spindles (disks) per node • Not sure? Go with the balanced workload configuration

  42. Hardware Topology • Hadoop uses a master / slave topology • Master Nodes include: • NameNode - maintains system metadata • Backup NN - performs checkpoint operations and serves as the standby • ResourceManager - manages task assignment • Slave Nodes include: • DataNode - stores HDFS files / manages read and write requests • Preferably co-located with TaskTracker • NodeManager - performs map / reduce tasks

  43. Sizing The Cluster • Remember... Scaling is a relatively simple task • Start with a moderate sized cluster • Grow the cluster as requirements dictate • Develop a scaling strategy • As simple as scaling is…adding new nodes takes time and resources • Don't want to be adding new nodes each week • Amount of data typically defines initial cluster size • rate at which the volume of data increases • Drivers for determining when to grow your cluster • Storage requirements • Processing requirements • Memory requirements

  44. Storage Reqs Drive Cluster Growth • Data volume increases at a rate of 1TB / week • 3TB of storage are required to store the data alone • Remember block replication • Consider additional overhead - typically 30% • Remember files that are stored on a node's local disk • If DataNodes incorporate four 1TB drives • 1 new node per week is required • 2 years of data - roughly 100TB • Will require roughly 100 new nodes
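The sizing arithmetic above can be written out as a small function. This is one reading of the slide, with the 30% overhead applied on top of the replicated data; `nodes_needed` and its parameter names are hypothetical, and the defaults (replication 3, 4TB per node) come from the slide's example.

```python
import math


def nodes_needed(raw_tb, replication=3, overhead=0.30, tb_per_node=4.0):
    """DataNodes required to store raw_tb of new data."""
    stored_tb = raw_tb * replication * (1 + overhead)
    return math.ceil(stored_tb / tb_per_node)


per_week = nodes_needed(1)       # 1TB raw is about 3.9TB on disk: one 4TB node
two_years = nodes_needed(104)    # 104 weeks at 1TB per week
```

Under these assumptions two years of growth comes to 102 nodes, in line with the slide's "roughly 100".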

  45. Things Break • Things are going to break • This assumption is a core premise of Hadoop • If a disk fails, the infrastructure must accommodate • If a DataNode fails, the NameNode must manage this • If a task fails, the ApplicationMaster must manage this failure • Master nodes are typically a SPOF unless using a Highly Available configuration • NameNode goes down, HDFS is inaccessible • Use NameNode HA • ResourceManager goes down, can't run any jobs • Use RM HA (in development)

  46. Cluster Nodes • Cluster nodes should be commodity hardware • Buy more nodes... Not more expensive nodes • Workload patterns and cluster size drive CPU choice • Small cluster - 50 nodes or less • Quad core / medium clock speed is usually sufficient • Large cluster • Dual 8-core CPUs with a medium clock speed is sufficient • Compute intensive workloads might require higher clock speeds • General guideline is to buy more hardware instead of faster hardware • Lots of memory - 48GB / 64GB / 128GB / 256GB • Each map / reduce task consumes 1GB to 3GB of memory • OS / Daemons consume memory as well
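The memory guidance above translates into a rough ceiling on concurrent tasks per node. This is a back-of-the-envelope sketch: `max_concurrent_tasks` is a hypothetical helper, and the 8GB reserved for the OS and daemons is an assumed figure, not one given in the slides.

```python
def max_concurrent_tasks(node_ram_gb, gb_per_task, reserved_gb=8):
    """Tasks that fit in memory after setting aside RAM for the OS and daemons."""
    usable_gb = node_ram_gb - reserved_gb
    return max(usable_gb // gb_per_task, 0)


light_tasks = max_concurrent_tasks(64, 2)    # (64 - 8) // 2
heavy_tasks = max_concurrent_tasks(48, 3)    # (48 - 8) // 3
too_small = max_concurrent_tasks(4, 1)       # less RAM than the reserve
```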

  47. Cluster Storage • 4 to 12 drives of 1TB / 2TB capacity - up to 24TB / node • 3TB drives work • Network performance penalty if a node fails • 7200 rpm SATA drives are sufficient • Slightly above average MTBF is advantageous • JBOD configuration • RAID is slow • RAID is not required due to block replication • More, smaller disks are preferred over fewer, larger disks • Increased parallelism for DataNodes • Slaves should never use virtual memory

  48. Master Nodes • Still commodity hardware, but... better • Redundant everything • Power supplies • Dual Ethernet cards • 16 to 24 CPU cores on NameNodes • NameNodes and their clients are very chatty and need more cores to handle messaging traffic • Medium clock speeds should be sufficient

  49. Master Nodes • HDFS namespace is limited to the amount of memory on the NameNode • RAID and NFS storage on NameNode • Typically RAID5 with hot spare • Second remote directory such as NFS • Quorum Journal Manager for HA

  50. Network Considerations • Hadoop is bandwidth intensive • This can be a significant bottleneck • Use dedicated switches • 10Gb Ethernet is pretty good for large clusters
