
Data Currency in Replicated DHTs

Data Currency in Replicated DHTs. Reza Akbarinia, Esther Pacitti and Patrick Valduriez, University of Nantes, France, INRIA. ACM SIGMOD 2007. Presenter: Jerry Wu. Motivation: P2P data sharing systems enable a large number of users to share a massive number of files.


Presentation Transcript


  1. Data Currency in Replicated DHTs • Reza Akbarinia, Esther Pacitti and Patrick Valduriez • University of Nantes, France, INRIA • ACM SIGMOD 2007 • Presenter: Jerry Wu

  2. Motivation • P2P data sharing systems • Enable a large number of users to share a massive number of files • Query → Reply → Send request → Download • Message forwarding on these systems • Flooding: KaZaA, Gnutella • DHT: CAN, Chord, Pastry, etc.

  3. Distributed Hash Table (DHT) • Use hash functions to locate files • h(meta data) = k (for identification) • g(k) = k1 (for routing) • [Figure: peers A to F and a user U; the metadata for FreeLoop.mp3 maps to g(k) = k1, held by peer A]
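The two-level hashing on this slide can be sketched as follows. This is a minimal illustration, not the scheme of any particular DHT: the choice of SHA-1 and SHA-256, the 16-bit ring, the peer positions, and the successor-style rule in `responsible_peer` are all assumptions made for the example.

```python
import hashlib
from bisect import bisect_left


def h(metadata: str) -> int:
    """Identification hash: metadata -> key k (a 160-bit integer)."""
    return int.from_bytes(hashlib.sha1(metadata.encode()).digest(), "big")


def g(k: int, ring_bits: int = 16) -> int:
    """Routing hash: key k -> position k1 on a small identifier ring."""
    digest = hashlib.sha256(k.to_bytes(20, "big")).digest()
    return int.from_bytes(digest, "big") % (1 << ring_bits)


def responsible_peer(k1: int, peers: dict[int, str]) -> str:
    """Return the peer at the first ring position >= k1, wrapping around.

    Assumption: a successor-style assignment; real DHTs differ in this rule.
    """
    positions = sorted(peers)
    idx = bisect_left(positions, k1) % len(positions)
    return peers[positions[idx]]


# Hypothetical ring with six peers A..F at evenly spaced positions.
PEERS = {i * 10000 + 5000: name for i, name in enumerate("ABCDEF")}

k = h("FreeLoop.mp3")   # identification key
k1 = g(k)               # routing position
owner = responsible_peer(k1, PEERS)
```

Which peer ends up owning the key depends entirely on the assumed hash functions and positions; the point is only that lookups need no flooding, since any peer can recompute `g(h(metadata))`.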

  4. Data Replication • What if node A fails? • Store several copies • [Figure: replicas of FreeLoop.mp3 placed at g(h(FreeLoop.mp3)) = k1 (peer A), g2(h(FreeLoop.mp3)) = k2 (peer D) and g3(h(FreeLoop.mp3)) = k3 (peer E)]

  5. Basic Operations • putH(meta key k, File D) • Insert a file into the DHT • getH(meta key k) • Retrieve the file from the DHT • H: { g | g is used as a hash function to place (k, D) } • |H|: the replication level of the system • Each file is stored at |H| peers
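A minimal in-memory sketch of these operations, assuming each peer is a plain dictionary and each replication hash function is a salted SHA-256. The class name `ReplicatedDHT`, the helper `make_hash`, and the method `put_all` are hypothetical names chosen for the example.

```python
import hashlib


def make_hash(salt: str, n_peers: int):
    """Build one replication hash function g: key -> peer index."""
    def g(k: str) -> int:
        digest = hashlib.sha256(f"{salt}:{k}".encode()).digest()
        return int.from_bytes(digest, "big") % n_peers
    return g


class ReplicatedDHT:
    def __init__(self, n_peers: int, replication_level: int):
        self.peers = [dict() for _ in range(n_peers)]  # one local store per peer
        # H: the set of hash functions; |H| is the replication level.
        self.H = [make_hash(f"g{i}", n_peers) for i in range(replication_level)]

    def put(self, g, k: str, data):
        """put_g(k, D): store D at the peer responsible for k under g."""
        self.peers[g(k)][k] = data

    def get(self, g, k: str):
        """get_g(k): fetch the copy held by the peer responsible under g."""
        return self.peers[g(k)].get(k)

    def put_all(self, k: str, data):
        """Store one replica per hash function in H."""
        for g in self.H:
            self.put(g, k, data)


dht = ReplicatedDHT(n_peers=64, replication_level=3)
dht.put_all("FreeLoop.mp3", b"file contents")
copies = [dht.get(g, "FreeLoop.mp3") for g in dht.H]
```

In this toy version two hash functions can map a key to the same peer, so the number of distinct peers holding a replica may be smaller than |H|.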

  6. Additional Problems • What if the owner can modify the data? • The nature of P2P systems • Peers can join and leave dynamically • What if an update happens while some peers are gone and they rejoin later? • Concurrent updates?

  7. Solution • What if each insert/update transaction carries a timestamp? • The currency of the file is judged by its timestamp • FileX = File + timestamp • Put (k, FileX) instead of (k, File) into the DHT • Then we know the freshness of the file • Only the latest update can succeed

  8. How Can We Get A Timestamp? • KTS (Key-based Timestamp Service) • Issues a timestamp for each transaction • gen_ts(key k) • Generate a timestamp w.r.t. key k • last_ts(key k) • Return the last timestamp issued for key k
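The interface on this slide can be sketched with one counter per key. This is a single-process stand-in that ignores distribution and failures; the class name `KeyTimestampService` and the convention of returning 0 for a key that has no timestamp yet are assumptions.

```python
class KeyTimestampService:
    """Toy KTS: one monotonically increasing counter per key."""

    def __init__(self):
        self._counters = {}

    def gen_ts(self, k: str) -> int:
        """Issue a new timestamp for key k, larger than all earlier ones."""
        self._counters[k] = self._counters.get(k, 0) + 1
        return self._counters[k]

    def last_ts(self, k: str) -> int:
        """Return the last timestamp issued for k (0 if none, by assumption)."""
        return self._counters.get(k, 0)


kts = KeyTimestampService()
t1 = kts.gen_ts("P.avi")
t2 = kts.gen_ts("P.avi")
```

Because timestamps are generated per key, two keys never contend for the same counter, and timestamps for one key are totally ordered.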

  9. The New DHT Functions • Based on the KTS service • Insert(key k, FileX D, Hash function set Hr) • Insert or update a file with identity key k into the DHT • Retrieve(k, Hr) • Retrieve the latest copy of the file with identity key k

  10. Insert A File • [Figure: user U inserts P.avi • h(P.avi) = k • the KTS timestamp service returns gen_ts(k) = tA • putg(k, (tA, P.avi)) stores a replica at g(k) = k1 (peer A) • putg2(k, (tA, P.avi)) stores a replica at g2(k) = k2 (peer C)]

  11. Retrieve A File • [Figure: user U gets P.avi • h(P.avi) = k • the KTS timestamp service returns last_ts(k) = tA • getg(k) and getg2(k) query g(k) = k1 (peer A) and g2(k) = k2 (peer C) • the figure shows replicas (t0, P.avi) and (tA, P.avi)]

  12. Update A File • If tsx > ts0 (ts0 being the timestamp of the stored copy), then update File D: putg(k, (tsx, File D))
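Putting slides 9 to 12 together, insert, update and retrieve could look roughly like this. It is a sketch under stated assumptions: peers are in-memory dictionaries, the timestamp service is a local counter table, and `retrieve` stops at the first replica whose timestamp equals last_ts(k), falling back to the newest copy it saw. All names other than gen_ts, last_ts and Hr are hypothetical.

```python
import hashlib

N_PEERS = 16
peers = [dict() for _ in range(N_PEERS)]  # each peer's local store
counters = {}                             # local stand-in for the KTS


def make_hash(salt):
    def g(k):
        digest = hashlib.sha256(f"{salt}:{k}".encode()).digest()
        return int.from_bytes(digest, "big") % N_PEERS
    return g


Hr = [make_hash(f"g{i}") for i in range(3)]  # replication hash functions


def gen_ts(k):
    counters[k] = counters.get(k, 0) + 1
    return counters[k]


def last_ts(k):
    return counters.get(k, 0)


def put(g, k, ts, data):
    """Update rule from slide 12: overwrite only if ts is newer."""
    store = peers[g(k)]
    stored = store.get(k)
    if stored is None or ts > stored[0]:
        store[k] = (ts, data)


def insert(k, data):
    """Insert or update: stamp the data, then write every replica."""
    ts = gen_ts(k)
    for g in Hr:
        put(g, k, ts, data)
    return ts


def retrieve(k):
    """Return a (ts, data) copy, preferring one that matches last_ts(k)."""
    newest, best = last_ts(k), None
    for g in Hr:
        copy = peers[g(k)].get(k)
        if copy is None:
            continue
        if copy[0] == newest:
            return copy                   # current replica found, stop early
        if best is None or copy[0] > best[0]:
            best = copy
    return best                           # newest available copy, or None


insert("P.avi", "version 1")
t_latest = insert("P.avi", "version 2")
put(Hr[0], "P.avi", 1, "late stale write")  # ignored: older timestamp
current = retrieve("P.avi")
```

The early return is what makes the timestamp worthwhile: a reader can stop at the first current replica instead of fetching all of them to compare versions.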

  13. Retrieval Cost Analysis • C = Ckts + N * Cret • Ckts = Cret = O(log n), n = number of peers • N: number of retries needed to get the latest copy • Let X be the random variable for N • pt: the probability of finding a fresh copy • Prob(X = i) = pt * (1 - pt)^(i-1) • |Hr| = number of replicas in the system
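As a worked step, the expected number of retries follows from the geometric distribution above. The derivation below treats the number of attempts as unbounded, which is an assumption; since at most |Hr| replicas can be tried, the true expectation is no larger than this value.

```latex
\begin{align*}
E[X] &= \sum_{i=1}^{\infty} i \, p_t (1 - p_t)^{i-1}
      = p_t \cdot \frac{1}{\bigl(1 - (1 - p_t)\bigr)^{2}}
      = \frac{1}{p_t}, \\
E[C] &= C_{kts} + E[X] \cdot C_{ret}
      \le C_{kts} + \frac{C_{ret}}{p_t}.
\end{align*}
```

The first line uses the identity \sum_{i \ge 1} i x^{i-1} = 1/(1-x)^2 with x = 1 - p_t. For example, with p_t = 0.3 the bound is about 3.3 retrievals, each costing O(log n) messages.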

  14. Retrieval Cost Analysis • Then, how can we get a timestamp? • Key-based Timestamp Service (KTS)

  15. The KTS Service • Use the same DHT but with a different hash function hts • [Figure: a timestamp request for key k is routed through the peers' hash tables in steps 1 to 4 until it reaches peer p, where Req(k, hts) = p]

  16. The KTS Service • How can node p generate timestamps w.r.t. key k? • Direct initialization • Receive the counters from a leaving peer • The DHT distributes the load of the leaving peer to its neighbors • Indirect initialization • Send a file request w.r.t. key k to obtain the latest timestamp • Takes place if the leaving peer fails

  17. The KTS Service • Indirect initialization • The probability of failure is pf • pf = (1 - pt)^|H| • If pt = 30% and |H| = 13, then pf < 1% • After initialization, increment the timestamp on every timestamp request
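The figure quoted on this slide can be checked numerically. The sketch below assumes replicas are fresh independently with probability pt, which is what the formula pf = (1 - pt)^|H| implies; the function names are hypothetical.

```python
import math


def indirect_init_failure_prob(pt: float, replicas: int) -> float:
    """pf = (1 - pt)^|H|: no replica carries the latest timestamp."""
    return (1.0 - pt) ** replicas


def replicas_needed(pt: float, max_pf: float) -> int:
    """Smallest |H| such that (1 - pt)^|H| <= max_pf, for 0 < pt < 1."""
    return math.ceil(math.log(max_pf) / math.log(1.0 - pt))


pf = indirect_init_failure_prob(0.30, 13)  # about 0.0097, below 1%
needed = replicas_needed(0.30, 0.01)       # 13
```

With pt = 30%, twelve replicas would still leave pf above 1% (about 1.4%), so 13 is the smallest replication level that meets the bound on the slide.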

  18. Experiments And Simulations • Environments • 64-node cluster • 10,000 nodes on the SimJava platform • Metrics • Response time: time to return a current replica in response to a query • Communication cost: number of messages sent to answer a query

  19. The Competitor - BRICKS • Uses a function to map key k to multiple keys (k1, k2, k3, k4, …) • Each replica has a version number • Concurrent update problems • Must retrieve all replicas to find the newest one

  20. Response Time VS DHT Size

  21. Communication Cost VS DHT Size

  22. Response Time VS Number of Replicas

  23. Failure Rate VS Response Time

  24. Conclusion • Pros • Using the DHT itself to provide the timestamp service is smart! • Considers the concurrent update problem • Easy to apply to existing DHTs • Cons • The KTS service can incur additional communication overhead

  25. Thank You
