ZHT
Presentation Transcript

  1. ZHT: A Fast, Reliable and Scalable Zero-hop Distributed Hash Table. Tonglin Li, Xiaobing Zhou, Kevin Brandstatter, Dongfang Zhao, Ke Wang, Zhao Zhang, Ioan Raicu. Illinois Institute of Technology, Chicago, U.S.A.

  2. "A supercomputer is a device for turning compute-bound problems into I/O-bound problems." (Ken Batcher)

  3. Big problem: file system scalability • Parallel file systems (GPFS, PVFS, Lustre) • Separate compute resources from storage • Centralized metadata management • Distributed file systems (GFS, HDFS) • Special-purpose designs (MapReduce, etc.) • Centralized metadata management

  4. The bottleneck of file systems • Metadata: concurrent file creates

  5. Proposed work • A distributed hash table (DHT) for high-end computing (HEC) • A building block for high-performance distributed systems • Performance • Latency • Throughput • Scalability • Reliability

  6. Related work: Distributed Hash Tables • Many DHTs: Chord, Kademlia, Pastry, Cassandra, C-MPI, Memcached, Dynamo ... • Why another?

  7. Zero-hop hash mapping

  8. 2-layer hashing

  9. Architecture and terms • Name space: 2^64 • Physical node • Manager • ZHT instance • Partitions: n (fixed) • n = max(k)
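The zero-hop, two-layer scheme on the preceding slides can be sketched as follows (a minimal illustration under stated assumptions, not ZHT's actual code; the names `partition_of` and `node_of` and the node count are hypothetical): a key hashes into the fixed 2^64 name space and then into one of n fixed partitions, and a locally cached membership table maps each partition to its current node, so any client resolves the destination directly, with zero routing hops.

```python
import hashlib

NUM_PARTITIONS = 1024  # fixed at bootstrap: n = max(k), the largest expected node count

def partition_of(key: str) -> int:
    """Layer 1: hash the key into a 64-bit name space, then into a fixed partition."""
    h = int.from_bytes(hashlib.sha256(key.encode()).digest()[:8], "big")  # 64-bit hash
    return h % NUM_PARTITIONS

def node_of(key: str, membership: list) -> str:
    """Layer 2: the cached membership table maps partition -> node.
    No per-request routing messages: the client reaches the owner in zero hops."""
    return membership[partition_of(key)]

# Example: 4 nodes, each managing a contiguous range of partitions.
nodes = [f"node{i}" for i in range(4)]
membership = [nodes[p * 4 // NUM_PARTITIONS] for p in range(NUM_PARTITIONS)]
owner = node_of("some/file/path", membership)
```

Because the partition count is fixed, a key's partition never changes over the system's lifetime; only the partition-to-node layer is updated as nodes come and go.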

  10. How many partitions per node can we run?

  11. Membership management • Static: Memcached, ZHT • Dynamic • Logarithmic routing: most DHTs • Constant routing: ZHT

  12. Membership management • Updating membership • Incremental broadcasting • Remapping k-v pairs • Traditional DHTs: rehash all affected pairs • ZHT: move the whole partition • HEC has fast local networks!
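The difference between rehashing and whole-partition migration can be illustrated with a toy membership update (hypothetical helper and data-structure names; ZHT's real protocol also broadcasts the change incrementally): when a node joins, an existing partition is reassigned wholesale, so only one membership-table entry changes and the k-v pairs inside the partition move together, never individually rehashed.

```python
def add_node(membership: list, partitions: dict, new_node: str, partition_id: int):
    """Reassign one whole partition to the joining node.

    Traditional DHTs would rehash every affected key; here the keys'
    partition assignment never changes -- only the partition -> node
    entry is updated, and the partition's data is shipped as one unit
    over the fast local network.
    """
    old_node = membership[partition_id]
    membership[partition_id] = new_node            # update routing table
    moved = partitions.pop((old_node, partition_id))  # whole partition moves
    partitions[(new_node, partition_id)] = moved
    return old_node

# Toy state: 4 partitions spread over 2 nodes.
membership = ["node0", "node0", "node1", "node1"]
partitions = {
    ("node0", 0): {"a": 1}, ("node0", 1): {"b": 2},
    ("node1", 2): {"c": 3}, ("node1", 3): {"d": 4},
}
add_node(membership, partitions, "node2", 1)  # node2 joins, takes partition 1
```

Note that the keys "a" through "d" keep their partition IDs throughout; only ownership of partition 1 moved.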

  13. Consistency • Updating membership tables • Planned node joins and leaves: strong consistency • Node failures: eventual consistency • Updating replicas • Configurable • Strong consistency: consistent, reliable • Eventual consistency: fast, highly available

  14. Persistence: NoVoHT • A persistent in-memory hash map • Append operation • Live migration
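A persistent in-memory hash map of this kind can be sketched as a dict backed by an append-only log (a simplified illustration of the idea, not NoVoHT's implementation; the class and file names are assumptions): writes hit memory and are appended to the log, and replaying the log restores the map after a restart.

```python
import json
import os
import tempfile

class AppendLogMap:
    """In-memory dict persisted via an append-only log (NoVoHT-style sketch)."""

    def __init__(self, path: str):
        self.path = path
        self.data = {}
        if os.path.exists(path):              # replay the log to recover state
            with open(path) as f:
                for line in f:
                    rec = json.loads(line)
                    if rec["op"] == "put":
                        self.data[rec["k"]] = rec["v"]
                    elif rec["op"] == "remove":
                        self.data.pop(rec["k"], None)

    def put(self, k, v):
        with open(self.path, "a") as f:       # append only, never rewrite in place
            f.write(json.dumps({"op": "put", "k": k, "v": v}) + "\n")
        self.data[k] = v

    def get(self, k):
        return self.data.get(k)

path = os.path.join(tempfile.mkdtemp(), "novoht.log")
m = AppendLogMap(path)
m.put("key1", "value1")
recovered = AppendLogMap(path)  # simulated restart: state rebuilt from the log
```

Appending rather than rewriting keeps each write sequential and cheap, which is why an append operation is called out on the slide.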

  15. Failure handling • Insert and append • Send the record to the next replica • Mark this record as the primary copy • Lookup • Get from the next available replica • Remove • Mark the record on all replicas
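The failure-handling rules above can be sketched as a client-side loop over the replica list (hypothetical function and field names, a sketch rather than ZHT's wire protocol): each operation tries the primary first and falls through to the next replica on failure, and a record written past a dead primary is marked as the new primary copy.

```python
def insert_with_failover(key, value, replicas, stores, alive):
    """Try the primary first; on failure, send to the next replica and
    mark that copy as the record's primary copy."""
    for i, node in enumerate(replicas):
        if alive[node]:
            stores[node][key] = {"value": value, "primary": i > 0}
            return node
    raise RuntimeError("all replicas failed")

def lookup_with_failover(key, replicas, stores, alive):
    """Read from the first available replica that holds the record."""
    for node in replicas:
        if alive[node] and key in stores[node]:
            return stores[node][key]["value"]
    return None

# Toy scenario: three replicas, the primary (r0) is down.
replicas = ["r0", "r1", "r2"]
stores = {n: {} for n in replicas}
alive = {"r0": False, "r1": True, "r2": True}
served_by = insert_with_failover("k", "v", replicas, stores, alive)
```

Removes are handled differently on the slide (marking the record on all replicas rather than deleting it on one), so a later recovered replica can learn the record is gone.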

  16. Evaluation: test beds • IBM Blue Gene/P supercomputer • Up to 8192 nodes • 32768 instances deployed • Commodity cluster • Up to 64 nodes • Amazon EC2 • m1.medium and cc2.8xlarge • 96 VMs, 768 ZHT instances deployed

  17. Latency on BG/P

  18. Latency distribution

  19. Throughput on BG/P

  20. Aggregated throughput on BG/P

  21. Latency on commodity cluster

  22. ZHT on cloud: latency

  23. ZHT on cloud: latency distribution [figure: latency distributions for ZHT on cc2.8xlarge instances (8 server-client pairs per instance, 4 to 64 nodes) vs. DynamoDB reads and writes (8 clients per instance)]

  24. ZHT on cloud: throughput

  25. Amortized cost

  26. Applications • FusionFS • A distributed file system • Metadata: ZHT • IStore • An information dispersal storage system • Metadata: ZHT • MATRIX • A distributed many-task computing execution framework • ZHT is used to submit tasks and monitor task execution status

  27. FusionFS result: Concurrent File Creates

  28. IStore results

  29. MATRIX results

  30. Future work • Larger scales • Active failure detection and notification • Spanning-tree communication • Network topology-aware routing • Fully synchronized replicas and membership: Paxos protocol • Support for more protocols (UDT, MPI, …) • Many optimizations

  31. Conclusion • ZHT: a distributed key-value store • Lightweight • High performance • Scalable • Dynamic • Fault tolerant • Versatile: works on clusters, clouds, and supercomputers

  32. Questions? Tonglin Li tli13@hawk.iit.edu http://datasys.cs.iit.edu/projects/ZHT/