
An Architecture for Internet Data Transfer

This presentation explains why innovation in Internet data transfer is hard and proposes the Data Oriented Transfer (DOT) service. DOT decouples content negotiation from data transfer, so applications can use new transfer techniques without modification. Its plugins provide an application-independent cache, multipath transfer, and non-networked transfers. The evaluation covers standard file transfer, portable storage, the multipath plugin, and a Postfix email case study.


Presentation Transcript


  1. An Architecture for Internet Data Transfer • Niraj Tolia, Michael Kaminsky*, David G. Andersen, and Swapnil Patil • Carnegie Mellon University and *Intel Research Pittsburgh

  2. Innovation in Data Transfer is Hard • Imagine: You have a novel data transfer technique • How do you deploy? • Update HTTP. Talk to IETF. Modify Apache, IIS, Firefox, Netscape, Opera, IE, Lynx, Wget, … • Update SMTP. Talk to IETF. Modify Sendmail, Postfix, Outlook… • Give up in frustration

  3. Barriers to Innovation in Data Transfer • Applications bundle: • Content Negotiation: What data to send • Naming (URLs, directories, …) • Languages • Identification • … • Data Transfer: Getting the bits across • Both are tightly coupled (e.g., HTTP, SMTP) • Hinders innovation and evolution of new services

  4. Solution: A Data Transfer Service • Decouple content negotiation from data transfer • Applications perform negotiation as before • But hand data objects to the Transfer Service • The Transfer Service is shared by applications [Figure: Sender and Receiver, each with an Xfer Service; labels: Application Protocol and Data, Data]

  5. Extensible Transfer Architecture • Plugins • Application-independent cache • New network features • Non-networked transfers [Figure: Sender and Receiver linked by the Application Protocol; each side's Xfer Service has plugins labeled Bittorrent, USB Keychain, and Local Cache]

  6. Transfer Service Benefits • Apps. can reuse available transfer techniques • No reimplementation needed • Easier deployment of new technologies • Applications need no modification • Provides for cross-application sharing • Can interpose on all data transfers • Handles transient disconnections

  7. Outline • Motivation • Data Oriented Transfer (DOT) service • Evaluation • Open Issues and Future Work • Conclusion

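  8. 10,000 Foot View of Transfers using DOT • How does the transfer service name data? • How does the transfer service locate data? [Figure: Receiver sends "Request File X" to Sender; Sender calls put(X) on its Xfer Service; Receiver calls read() on its Xfer Service and gets data; the link between the two Xfer Services is marked "?"]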

  9. DOT: Object Naming • Application-defined names are not portable • Use content-naming for globally unique names • Objects represented by an OID • Objects are further sub-divided into “chunks” • Each OID corresponds to a list of descriptors • Descriptor lists allow for partial transfers [Figure: File Foo.txt, Cryptographic Hash, OID; File divided into Desc1, Desc2, Desc3]
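The naming scheme on this slide can be sketched as follows. This is a minimal illustration, not DOT's actual format: the choice of SHA-1, the fixed 4-byte chunk size, the `(hash, offset, length)` descriptor layout, and the function names `make_descriptors` and `make_oid` are all assumptions made for the example.

```python
import hashlib

CHUNK_SIZE = 4  # bytes; deliberately tiny for the example (assumed, not DOT's value)


def make_descriptors(data, chunk_size=CHUNK_SIZE):
    """Split data into fixed-size chunks and describe each one.

    Each descriptor is (chunk hash, offset, length). The layout is hypothetical.
    """
    descriptors = []
    for offset in range(0, len(data), chunk_size):
        chunk = data[offset:offset + chunk_size]
        descriptors.append((hashlib.sha1(chunk).hexdigest(), offset, len(chunk)))
    return descriptors


def make_oid(data):
    """Name the whole object by a cryptographic hash of its content."""
    return hashlib.sha1(data).hexdigest()


data = b"abcdabcdxy"
oid = make_oid(data)
descriptors = make_descriptors(data)
```

Because names derive from content, two applications holding the same bytes compute the same OID, and the two identical `abcd` chunks above share one chunk hash, which is what makes partial transfers and caching possible.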

  10. DOT: Object Location • Data transfers in DOT are receiver driven • Receiver has better idea of available resources • Senders specify ‘hints’ - potential data locations • dot://sender.example.com:12000/ • dht://opendht.org/ • …

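  11. A Transfer using DOT [Figure: Receiver sends "Request File X" to Sender; Sender calls put(X) on its Xfer Service and gets back OID, Hints; Sender passes OID, Hints to Receiver; Receiver calls get(OID, Hints) on its Xfer Service, which fetches the data through Transfer Plugins; Receiver calls read() to obtain the data]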
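The put/get/read sequence in this figure can be sketched with two in-memory services. Everything below is a toy under stated assumptions: the class name `XferService`, the class-level `registry` dictionary that stands in for the network, the SHA-1 OID, and the receiver URL are hypothetical and are not DOT's real interfaces.

```python
import hashlib


class XferService:
    """Toy transfer service: stores whole objects keyed by OID."""

    registry = {}  # hint URL -> service; a stand-in for the network (assumption)

    def __init__(self, url):
        self.url = url
        self.store = {}  # OID -> bytes; doubles as a local cache
        XferService.registry[url] = self

    def put(self, data):
        """Sender side: hand the object over, get back (OID, hints)."""
        oid = hashlib.sha1(data).hexdigest()
        self.store[oid] = data
        return oid, [self.url]

    def get(self, oid, hints):
        """Receiver side: fetch the object unless it is already cached."""
        if oid not in self.store:
            for hint in hints:
                source = XferService.registry.get(hint)
                if source is None or oid not in source.store:
                    continue
                data = source.store[oid]
                # Content naming lets the receiver verify what it fetched.
                if hashlib.sha1(data).hexdigest() != oid:
                    continue
                self.store[oid] = data
                break
            else:
                raise LookupError("no hint could supply " + oid)
        return oid

    def read(self, oid):
        return self.store[oid]


sender = XferService("dot://sender.example.com:12000/")
receiver = XferService("dot://receiver.example.com:12000/")

oid, hints = sender.put(b"contents of file X")   # sender application
handle = receiver.get(oid, hints)                # receiver application
data = receiver.read(handle)
```

The application protocol only has to carry the small `(oid, hints)` pair; how the bytes actually move is left to the receiver's service, which is the decoupling the talk argues for.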

  12. DOT’s Modular Architecture • (1) Application API: between the Application and DOT • (2) Transfer Plugin API: between DOT and the Transfer Plugin, which uses the Network • (3) Storage Plugin API: between DOT and the Storage Plugin, which uses Local Storage

  13. DOT Transfer Plugin API • Simple API • get_descriptor_list( OID, hints ) • get_chunks( descriptor_list, hints ) • cancel_chunks( chunk_list ) • Transfer plugin chaining is easy • e.g., multipath plugin [Figure: DOT, MultiPath Plugin, several Transfer Plugins, Network]
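Plugin chaining can be sketched by having one plugin implement the three calls above in terms of other plugins. The method names come from the slide; the class names, the return types, and the round-robin split are assumptions made for this example. In particular, round-robin is only a stand-in and is not claimed to be how the real multipath plugin balances load.

```python
class TransferPlugin:
    """The three calls named on the slide; return types are assumed."""

    def get_descriptor_list(self, oid, hints):
        raise NotImplementedError

    def get_chunks(self, descriptor_list, hints):
        raise NotImplementedError

    def cancel_chunks(self, chunk_list):
        raise NotImplementedError


class MemoryPlugin(TransferPlugin):
    """Toy leaf plugin that serves chunks from dictionaries."""

    def __init__(self, objects, chunks):
        self.objects = objects  # OID -> list of descriptors
        self.chunks = chunks    # descriptor -> bytes
        self.served = []        # descriptors requested from this plugin

    def get_descriptor_list(self, oid, hints):
        return list(self.objects[oid])

    def get_chunks(self, descriptor_list, hints):
        self.served.extend(descriptor_list)
        return {d: self.chunks[d] for d in descriptor_list}

    def cancel_chunks(self, chunk_list):
        pass


class MultipathPlugin(TransferPlugin):
    """Chains to sub-plugins and splits chunk requests among them."""

    def __init__(self, sub_plugins):
        self.sub_plugins = list(sub_plugins)

    def get_descriptor_list(self, oid, hints):
        return self.sub_plugins[0].get_descriptor_list(oid, hints)

    def get_chunks(self, descriptor_list, hints):
        result = {}
        n = len(self.sub_plugins)
        for i, plugin in enumerate(self.sub_plugins):
            share = descriptor_list[i::n]  # round-robin share (assumed policy)
            if share:
                result.update(plugin.get_chunks(share, hints))
        return result

    def cancel_chunks(self, chunk_list):
        for plugin in self.sub_plugins:
            plugin.cancel_chunks(chunk_list)


objects = {"oid1": ["d1", "d2", "d3", "d4"]}
chunks = {"d1": b"a", "d2": b"b", "d3": b"c", "d4": b"d"}
link1 = MemoryPlugin(objects, chunks)
link2 = MemoryPlugin(objects, chunks)
multipath = MultipathPlugin([link1, link2])

descriptors = multipath.get_descriptor_list("oid1", [])
fetched = multipath.get_chunks(descriptors, [])
payload = b"".join(fetched[d] for d in descriptors)
```

Since `MultipathPlugin` exposes the same interface it consumes, the caller cannot tell whether it is talking to a single link or a chain of plugins.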

  14. Implementation • In C++ using libasync event-driven library • One storage plugin: In-memory hash tables, disk backed • Three transfer plugins: • Default Xfer-Xfer plugin • Portable Storage plugin • Multipath plugin • Applications • gcp, an scp-like tool for file transfers • A DOT-enabled Postfix email server • Included a socket-like adapter library

  15. Current DOT Prototype • Plugins • Application-independent cache • Multipath and Mirror support • Non-networked transfers [Figure: SENDER and RECEIVER Xfer services; labels: NET (DSL) across the Internet, NET (wireless), MIRROR, USB, Multipath, cache]

  16. Outline • Motivation • Data Oriented Transfer (DOT) service • Evaluation • Open Issues and Future Work • Conclusion

  17. Evaluation • Standard file transfer • Portable Storage • Multi-Path • Case Study: Postfix Email Server • Capture and analysis of email trace • Evaluation of DOT-enabled SMTP server • Integration effort

  18. Standard File Transfer Setup • Two DOT-enabled machines • Network Emulator • Evaluate various bandwidth + delay combinations • Use gcp for the file transfers • Used 40MB, 4MB, 400KB, 40KB, 4KB files • Presenting 40MB here

  19. Standard File Transfer • Overhead: hashing, extra RTT • No noticeable overheads with latency

  20. Portable Storage Experiment • 255 MB transfer over emulated DSL • Based on Virtual Machine transfers at Carnegie Mellon • DOT preemptively copies data onto Flash drive • Wait 5 minutes, plug flash drive into receiver • Two drive speeds • 8MB/s - 1GB • 20MB/s - 2GB [Figure label: 2 Mbit/s]

  21. Portable Storage Results [Figure: transfer results with a "Device Inserted" marker; labeled value 1126 s (~19 min)]

  22. Multipath Plugin: Load Balancing • Varied capacity + delay of experimental links • Compare fastest link alone with multipath plugin on both links; what speedup? • Transferred 40MB file • 128 KB socket buffer sizes [Figure: Network Emulator setup; labels: Gigabit, Experimental links]

  23. Multipath Plugin is Effective • 40 MB @ 100Mbit/s ideal: 3.2 seconds • Multipath plugin nearly doubles throughput • TCP effects dominate. Pipe not full. • Multipath plugin doubles by adding second stream. Actual capacity irrelevant. [Figure labels: Link 1, Link 2, Gigabit]
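The ideal figure on this slide follows from simple arithmetic, assuming decimal units (1 MB = 10^6 bytes), which is the assumption that reproduces the quoted 3.2 seconds:

\[
t_{\text{ideal}} = \frac{40\ \text{MB} \times 8\ \text{bit/byte}}{100\ \text{Mbit/s}} = \frac{320\ \text{Mbit}}{100\ \text{Mbit/s}} = 3.2\ \text{s}
\]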

  24. Postfix Email Trace Replay • Generated 10,000 email messages from trace • Random data matched to chunk hash data • Preserves some similarity between messages • Replayed through Postfix to a single local server • Postfix disk bound… DOT CPU overhead negligible • Savings due to duplication within emails

  25. Postfix Integration • Integrated DOT with the Postfix mail server • 1 part-time week, 1 student new to Postfix • Includes time to write generic adapter library

  26. Discussion on Deployment • Application Resilience • DOT is a service - it’s outside the control of the application. • Our Postfix falls back to normal SMTP if • No Transfer Service contact • Transfer keeps failing • In the short term, a simple fallback is encouraged. However, this could interfere with some functions • DOT-based virus scanner… • In the long term, DOT would be a part of a system’s core infrastructure

  27. Future Work • Security • Application encrypts before DOT • No block-based caching, reuse, mirroring, … • No encryption • Resembles the status quo • In progress: Convergent encryption • Requires integration with DOT chunking • Application Preferences • Encryption, QoS, priorities, … • DOT might benefit from application input • Need an extensible way to express these

  28. Conclusion • DOT separates app. logic from data transfer • Makes it easier to extend both • Architecture works well • Overhead low (especially in wide-area) • Major benefits • Caching • Flexibility to implement new transfer techniques

  29. Backup Slides

  30. Normal SMTP [Figure: exchange between SMTP Client and Server: EHLO, 250 Hello, MAIL FROM: user …, DATA, 250 OK]

  31. DOT-Enabled SMTP [Figure: exchange between SMTP Client and Server: EHLO, 250 Hello, MAIL FROM: user …, X-DOT-DATA (OID+Hints), 250 OK; an Xfer Service is attached]

  32. Convergent Encryption • Chunk_i is encrypted using Hash_i • All identical cleartext blocks will map to the same encrypted block • Hash_i is further encrypted using a private key [Figure: File divided into chunks with Hash1, Hash2, Hash3]
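The key property, that identical cleartext chunks yield identical ciphertext, can be shown with a toy cipher. This sketch is an assumption-heavy illustration: the SHA-256 based XOR keystream is a stand-in chosen so the example needs only the standard library, it is not secure, and it is not the cipher DOT uses. The function names are hypothetical, and the final step on the slide (encrypting Hash_i with a private key) is not shown.

```python
import hashlib


def _keystream(key, length):
    """Toy keystream: SHA-256 over key plus a counter. NOT secure."""
    out = bytearray()
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(key + counter.to_bytes(8, "big")).digest()
        counter += 1
    return bytes(out[:length])


def convergent_encrypt(chunk):
    """Encrypt a chunk under a key derived from the chunk's own hash."""
    key = hashlib.sha256(chunk).digest()
    stream = _keystream(key, len(chunk))
    ciphertext = bytes(a ^ b for a, b in zip(chunk, stream))
    return key, ciphertext


def convergent_decrypt(key, ciphertext):
    """Reverse the XOR using the same content-derived key."""
    stream = _keystream(key, len(ciphertext))
    return bytes(a ^ b for a, b in zip(ciphertext, stream))


key_a, cipher_a = convergent_encrypt(b"identical chunk")
key_b, cipher_b = convergent_encrypt(b"identical chunk")
same_ciphertext = cipher_a == cipher_b
roundtrip = convergent_decrypt(key_a, cipher_a)
```

Because the key depends only on the content, a cache or mirror can still deduplicate encrypted chunks without being able to read them.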

  33. Standard File Transfer

  34. Mail Server Evaluation • Trace: 159 days at low volume academic mail server • 458,861 messages • hash, size of: message, headers, body • Message chunks • hash and size of each chunk • Static chunking and Rabin fingerprinting (Content-based block division)

  35. DOT chunk caching benefits email

  36. Default GTC-GTC Transfer Protocol • GTC-GTC protocol mirrors transfer plugins • Implemented as RPC calls • (Fetches are actually pipelined) [Figure: message sequence between Sender and Receiver: GET_DESCRIPTORS(OID), Desc list 1,2,…, GET_CHUNKS(…), Chunk 1, Chunk 2, GET_CHUNKS(…)]

  37. Rabin Fingerprinting [Figure: File Data with Rabin Fingerprints 4, 7, 8, 2, 8 and Given Value 8; each position marked 8 is a Natural Boundary, and the data between boundaries is hashed as Hash 1, Hash 2]

  38. Rabin Fingerprinting: Examples of Edits 1. Original File 2. Addition in chunk • Changes only one hash 3. Addition creating a new breakpoint 4. Deletion changing size of chunk Figure from “A Low-bandwidth Network File System”
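The effect of content-based boundaries on edits can be sketched with a rolling hash. Note the substitution: this example uses a simple polynomial rolling hash over a sliding window rather than a true Rabin fingerprint, and the window size, divisor, target value, and function names are all assumptions chosen to keep the example small.

```python
def chunk_boundaries(data, window=4, divisor=8, target=7,
                     base=257, mod=(1 << 31) - 1):
    """Return chunk end offsets chosen by the content of a sliding window.

    A boundary is placed after byte i when the rolling hash of the last
    `window` bytes satisfies hash % divisor == target.
    """
    if not data:
        return []
    boundaries = []
    top = pow(base, window - 1, mod)  # weight of the byte leaving the window
    h = 0
    for i, byte in enumerate(data):
        if i >= window:
            h = (h - data[i - window] * top) % mod  # drop the oldest byte
        h = (h * base + byte) % mod                 # add the newest byte
        if i >= window - 1 and h % divisor == target:
            boundaries.append(i + 1)
    if not boundaries or boundaries[-1] != len(data):
        boundaries.append(len(data))  # the final chunk always ends at EOF
    return boundaries


def split_chunks(data):
    """Cut data at the boundaries found by chunk_boundaries."""
    chunks, start = [], 0
    for end in chunk_boundaries(data):
        chunks.append(data[start:end])
        start = end
    return chunks


original = bytes((i * i * 7 + i * 13) % 251 for i in range(500))
edited = b"XYZ" + original  # an insertion at the front of the file

original_chunks = split_chunks(original)
edited_chunks = split_chunks(edited)
# Chunks after the first are unaffected by the insertion.
unchanged = original_chunks[1:]
```

With fixed-size chunking, the three inserted bytes would shift every later chunk and change every hash. Here the boundaries depend only on nearby content, so only the chunk containing the edit changes.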

  39. DOT Objects Naming • Objects represented by an OID • Divided into “chunks”, each with a descriptor • Each OID corresponds to a list of descriptors • Data is fetched using descriptor lists • Supports partial transfers

  40. Innovation in Data Transfer is Hard • Imagine: You have a novel data transfer technique • Say… Bittorrent, a P2P protocol for sharing large files • How do you deploy? • Update HTTP. Talk to IETF. Modify Apache, IIS, Firefox, Netscape, Opera, IE, Lynx, Wget, … • Update SMTP. Talk to IETF. Modify Sendmail, Postfix, Exchange, Mail.app, Eudora, … • Give up in frustration

  41. DOT’s Modular Architecture • (1) Application API: between the Application and DOT • (2) Transfer Plugin API: between DOT and the Transfer Plugins, which use the Network • (3) Storage Plugin API: between DOT and the Storage Plugin, which uses Local Storage

  42. DOT Plugins • Multipath plugin • List of sub-plugins • Balances load • Portable storage plugin • Sender: Copies new data onto USB flash device • Receiver: Scans USB flash device for blocks • Naïve filesystem layout, unoptimized but effective

  43. Portable Storage Results [Figure: transfer results with a "Device Inserted" marker; labeled value 1126 s (~19 min)]

  44. DOT email chunk caching evaluation • Step 1: Caching analysis (infinite cache) • SMTP default: What was really sent • DOT body: Whole-body only caching, headers sent separately • Rabin body: Headers sent separately, Rabin fingerprint chunking of body • Rabin whole: Headers+body chunked together • Easiest to implement for application. Just send data… • Step 2: Trace replay through Postfix
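The infinite-cache analysis in Step 1 can be sketched as a single pass over the messages that counts a chunk's bytes only the first time its hash is seen. This is an illustrative sketch: the function name, the use of SHA-1, and the sample messages are assumptions, and it hashes chunk contents directly, whereas the trace described in the talk recorded only hashes and sizes.

```python
import hashlib


def bytes_sent_with_cache(messages):
    """Return (total bytes, bytes actually sent) with an infinite chunk cache.

    `messages` is a list of messages, each a list of chunk byte strings.
    """
    seen = set()
    total = 0
    sent = 0
    for chunks in messages:
        for chunk in chunks:
            total += len(chunk)
            digest = hashlib.sha1(chunk).digest()
            if digest not in seen:  # first sighting: it must be transferred
                seen.add(digest)
                sent += len(chunk)
    return total, sent


# Two made-up messages that carry the same attachment chunk.
messages = [
    [b"hdr1", b"attachment"],
    [b"hdr2", b"attachment"],
]
total, sent = bytes_sent_with_cache(messages)  # total == 28, sent == 18
```

The different rows of the analysis (whole body, Rabin body, Rabin whole) would differ only in how each message is cut into chunks before this count is applied.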

  45. Related Work • BEEP • Proxy-based data interposition approaches • RON, X-Bone, OCALA • Other Content-Addressable Systems • Bittorrent, DHTs, DTNs, EMC’s Centera • Using Content-Addressability to save on data transfers • CASPER, LBFS, Rhea et al., Spring et al. • Portable Storage • Lookaside Caching, BlueFS • Other transfer protocols • GridFTP, IBP, HTTP, etc.

  46. Transfer plugin API • get_descriptors( OID, hints ) • get_chunks( descriptor, hints ) • cancel_chunks( chunk_list ) • Hints specify a plugin + data • gtc://sender.example.com:12000/ • dht://opendht.org/ • … • Transfer plugin chaining is easy • e.g., multipath plugin
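Since a hint names a plugin plus plugin-specific data, a URL-style hint can be split with the standard library. This is a minimal sketch: the function name `parse_hint` and the reading of the URL scheme as the plugin selector are assumptions drawn from the example hints on the slide.

```python
from urllib.parse import urlsplit


def parse_hint(hint):
    """Split a hint into (plugin scheme, host, port)."""
    parts = urlsplit(hint)
    return parts.scheme, parts.hostname, parts.port


gtc_hint = parse_hint("gtc://sender.example.com:12000/")
# → ("gtc", "sender.example.com", 12000)
dht_hint = parse_hint("dht://opendht.org/")
# → ("dht", "opendht.org", None)
```

A receiver could use the first element to pick a transfer plugin and hand the remaining fields to that plugin unchanged.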
