This paper discusses the challenges in innovation of internet data transfer techniques and proposes a solution called Data Oriented Transfer (DOT) service. DOT decouples content negotiation from data transfer, allowing applications to utilize available transfer techniques without modification. It also introduces plugins for application-independent cache, multipath transfer, and non-networked transfers. The evaluation includes standard file transfer, portable storage, and multipath plugin experiments.
An Architecture for Internet Data Transfer • Niraj Tolia, Michael Kaminsky*, David G. Andersen, and Swapnil Patil • Carnegie Mellon University and *Intel Research Pittsburgh
Innovation in Data Transfer is Hard • Imagine: You have a novel data transfer technique • How do you deploy? • Update HTTP. Talk to IETF. Modify Apache, IIS, Firefox, Netscape, Opera, IE, Lynx, Wget, … • Update SMTP. Talk to IETF. Modify Sendmail, Postfix, Outlook… • Give up in frustration
Barriers to Innovation in Data Transfer • Applications bundle: • Content Negotiation: What data to send • Naming (URLs, directories, …) • Languages • Identification • … • Data Transfer: Getting the bits across • Both are tightly coupled (e.g., HTTP, SMTP) • Hinders innovation and evolution of new services
Solution: A Data Transfer Service • Decouple content negotiation from data transfer • Applications perform negotiation as before • But hand data objects to the Transfer Service • The Transfer Service is shared by applications [Diagram labels: Sender; Receiver; Application Protocol and Data; Data; Xfer Service (one on each side)]
Extensible Transfer Architecture • Plugins • Application-independent cache • New network features • Non-networked transfers [Diagram labels: Sender; Receiver; Application Protocol; Xfer Service (one on each side); Bittorrent (both sides); USB Keychain (both sides); Local Cache]
Transfer Service Benefits • Apps. can reuse available transfer techniques • No reimplementation needed • Easier deployment of new technologies • Applications need no modification • Provides for cross-application sharing • Can interpose on all data transfers • Handles transient disconnections
Outline • Motivation • Data Oriented Transfer (DOT) service • Evaluation • Open Issues and Future Work • Conclusion
10,000 Foot View of Transfers using DOT • How does the transfer service name data? • How does the transfer service locate data? [Diagram labels: Request File X; Sender; Receiver; put(X); read(); data; Xfer Service (one on each side); two "?" markers]
DOT: Object Naming • Application-defined names are not portable • Use content-naming for globally unique names • Objects represented by an OID • Objects are further sub-divided into “chunks” • Each OID corresponds to a list of descriptors • Descriptor lists allow for partial transfers [Diagram labels: Foo.txt; File; Cryptographic Hash; OID; Desc1; Desc2; Desc3]
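Sketch (not from the slides): one way content naming and descriptor lists could fit together. SHA-256, the 4-byte chunk size, fixed-size chunking, and the helper names `make_oid` and `make_descriptors` are all assumptions made for illustration; the slides only say that an OID comes from a cryptographic hash and maps to a list of chunk descriptors.

```python
import hashlib
from typing import List

CHUNK_SIZE = 4  # bytes; deliberately tiny so the example is easy to follow (assumed)


def split_into_chunks(data: bytes, size: int = CHUNK_SIZE) -> List[bytes]:
    """Fixed-size chunking, chosen here only for simplicity."""
    return [data[i:i + size] for i in range(0, len(data), size)]


def make_descriptors(data: bytes) -> List[str]:
    """One descriptor per chunk; here a descriptor is just the chunk's hash."""
    return [hashlib.sha256(c).hexdigest() for c in split_into_chunks(data)]


def make_oid(data: bytes) -> str:
    """OID derived from the object's content, so it does not depend on a file name."""
    return hashlib.sha256(data).hexdigest()


old = b"AAAABBBBCCCC"   # version the receiver already holds
new = b"AAAAXXXXCCCC"   # version the sender wants to deliver

# Partial transfer: only descriptors the receiver has never seen must be fetched.
have = set(make_descriptors(old))
missing = [d for d in make_descriptors(new) if d not in have]
print(len(missing), "of", len(make_descriptors(new)), "chunks need to be fetched")
```

Because names come from content, two applications holding the same bytes compute the same OID without coordinating on file names.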
DOT: Object Location • Data transfers in DOT are receiver driven • Receiver has better idea of available resources • Senders specify ‘hints’ - potential data locations • dot://sender.example.com:12000/ • dht://opendht.org/ • …
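Sketch (not from the slides): how a receiver might read the hint URLs above and keep only those it has a plugin for. The function names and the idea of filtering by URL scheme are assumptions; the slides only show the hint strings themselves.

```python
from urllib.parse import urlsplit


def parse_hint(hint):
    """Split a hint URL into (scheme, host, port); port is None when absent."""
    parts = urlsplit(hint)
    return parts.scheme, parts.hostname, parts.port


def usable_hints(hints, supported_schemes):
    """Receiver-driven choice: keep hints whose scheme a local plugin understands."""
    return [h for h in hints if parse_hint(h)[0] in supported_schemes]


hints = ["dot://sender.example.com:12000/", "dht://opendht.org/"]
print(parse_hint(hints[0]))
print(usable_hints(hints, {"dot"}))
```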
A Transfer using DOT [Diagram labels: Request File X; Sender; Receiver; put(X); OID, Hints (returned to the sender and passed to the receiver); get(OID, Hints); read(); data; Xfer Service (one on each side); Transfer Plugins]
DOT’s Modular Architecture [Diagram labels: Application; (1) Application API; DOT; (2) Transfer Plugin API; Transfer Plugin; Network; (3) Storage Plugin API; Storage Plugin; Local Storage]
DOT Transfer Plugin API • Simple API • get_descriptor_list( OID, hints ) • get_chunks( descriptor_list, hints ) • cancel_chunks( chunk_list ) • Transfer plugin chaining is easy • e.g., multipath plugin [Diagram labels: MultiPathPlugin; Transfer Plugin (three); Network]
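Sketch (not from the slides): the three calls above written as a Python interface, with a multipath plugin that chains to sub-plugins. The real implementation is in C++ and event-driven; the synchronous calls, the `InMemoryPlugin` stand-in, and the round-robin split are assumptions for illustration only.

```python
class TransferPlugin:
    """Interface mirroring the three calls listed on the slide."""

    def get_descriptor_list(self, oid, hints):
        raise NotImplementedError

    def get_chunks(self, descriptor_list, hints):
        raise NotImplementedError

    def cancel_chunks(self, chunk_list):
        raise NotImplementedError


class InMemoryPlugin(TransferPlugin):
    """Hypothetical stand-in for a network plugin; serves data from dictionaries."""

    def __init__(self, objects, chunks):
        self.objects = objects   # OID -> list of descriptors
        self.chunks = chunks     # descriptor -> chunk bytes
        self.served = []         # descriptors this plugin was asked to fetch

    def get_descriptor_list(self, oid, hints):
        return list(self.objects[oid])

    def get_chunks(self, descriptor_list, hints):
        self.served.extend(descriptor_list)
        return {d: self.chunks[d] for d in descriptor_list}

    def cancel_chunks(self, chunk_list):
        pass  # nothing is in flight in this synchronous sketch


class MultipathPlugin(TransferPlugin):
    """Chains to sub-plugins and spreads chunk requests across them."""

    def __init__(self, sub_plugins):
        self.sub_plugins = sub_plugins

    def get_descriptor_list(self, oid, hints):
        return self.sub_plugins[0].get_descriptor_list(oid, hints)

    def get_chunks(self, descriptor_list, hints):
        result = {}
        n = len(self.sub_plugins)
        for i, plugin in enumerate(self.sub_plugins):
            share = descriptor_list[i::n]  # round-robin split (assumed policy)
            if share:
                result.update(plugin.get_chunks(share, hints))
        return result

    def cancel_chunks(self, chunk_list):
        for plugin in self.sub_plugins:
            plugin.cancel_chunks(chunk_list)


objects = {"oid-1": ["d1", "d2", "d3", "d4"]}
chunks = {"d1": b"he", "d2": b"ll", "d3": b"o ", "d4": b"!!"}
link_a = InMemoryPlugin(objects, chunks)
link_b = InMemoryPlugin(objects, chunks)
multipath = MultipathPlugin([link_a, link_b])

descs = multipath.get_descriptor_list("oid-1", [])
fetched = multipath.get_chunks(descs, [])
data = b"".join(fetched[d] for d in descs)
print(data, link_a.served, link_b.served)
```

Because `MultipathPlugin` exposes the same three calls it consumes, it can sit in front of any other plugin, which is the sense in which chaining is easy.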
Implementation • In C++ using libasync event-driven library • One storage plugin: in-memory hash tables, disk backed • Three transfer plugins • Default Xfer-Xfer plugin • Portable Storage plugin • Multipath plugin • Applications • gcp, an scp-like tool for file transfers • A DOT-enabled Postfix email server • Included a socket-like adapter library
Current DOT Prototype • Plugins • Application-independent cache • Multipath and Mirror support • Non-networked transfers [Diagram labels: SENDER; RECEIVER; Xfer (three); cache; Multipath; MIRROR; USB (two); NET (wireless); NET (DSL); NET (two more); Internet]
Outline • Motivation • Data Oriented Transfer (DOT) service • Evaluation • Open Issues and Future Work • Conclusion
Evaluation • Standard file transfer • Portable Storage • Multi-Path • Case Study: Postfix Email Server • Capture and analysis of email trace • Evaluation of DOT-enabled SMTP server • Integration effort
Standard File Transfer Setup • Two DOT-enabled machines • Network Emulator • Evaluate various bandwidth + delay combinations • Use gcp for the file transfers • Used 40MB, 4MB, 400KB, 40KB, 4KB files • Presenting 40MB here
Standard File Transfer • Overhead: hashing, extra RTT • No noticeable overheads with latency
Portable Storage Experiment • 255 MB transfer over emulated DSL • Based on Virtual Machine transfers at Carnegie Mellon • DOT preemptively copies data onto Flash drive • Wait 5 minutes, plug flash drive into receiver • Two drive speeds • 8MB/s - 1GB • 20MB/s - 2GB [Diagram label: 2 Mbit/s]
Portable Storage Results [Figure labels: “Device Inserted” marker; 1126 s (~19 min)]
Multipath Plugin: Load Balancing • Varied capacity + delay of experimental links • Compare fastest link alone with multipath plugin on both links; what speedup? • Transferred 40MB file • 128 KB socket buffer sizes [Diagram labels: Network Emulator; Gigabit; Experimental links]
Multipath Plugin is Effective • 40 MB @ 100Mbit/s ideal: 3.2 seconds • Multipath plugin nearly doubles throughput • TCP effects dominate. Pipe not full. • Multipath plugin doubles by adding second stream. Actual capacity irrelevant. [Diagram labels: Link 1; Link 2; Gigabit]
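Worked arithmetic (not from the slides) for the 3.2-second figure above. It assumes decimal units (1 MB = 8 Mbit) and ignores protocol overhead; the function name is made up for this note.

```python
def ideal_transfer_seconds(size_mb, link_mbit_per_s):
    """Time to move size_mb megabytes over a link, with no overhead at all."""
    size_mbit = size_mb * 8          # 1 byte = 8 bits
    return size_mbit / link_mbit_per_s


# 40 MB * 8 = 320 Mbit; 320 Mbit / 100 Mbit/s = 3.2 s
print(ideal_transfer_seconds(40, 100))
```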
Postfix Email Trace Replay • Generated 10,000 email messages from trace • Random data matched to chunk hash data • Preserves some similarity between messages • Replayed through Postfix to a single local server • Postfix disk bound… DOT CPU overhead negligible • Savings due to duplication within emails
Postfix Integration • Integrated DOT with the Postfix mail server • 1 part-time week, 1 student new to Postfix • Includes time to write generic adapter library
Discussion on Deployment • Application Resilience • DOT is a service - it’s outside the control of the application. • Our Postfix falls back to normal SMTP if • No Transfer Service contact • Transfer keeps failing • In the short term, a simple fallback is encouraged. However, this could interfere with some functions • DOT-based virus scanner… • In the long term, DOT would be a part of a system’s core infrastructure
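Sketch (not from the slides): the fallback rule described above, written as a small function. The exception names, the `max_attempts` limit of 3, and the callables `send_via_dot` and `send_via_smtp` are hypothetical; the slides only state that Postfix falls back to normal SMTP when the Transfer Service cannot be contacted or the transfer keeps failing.

```python
class TransferServiceUnavailable(Exception):
    """Raised when the local Transfer Service cannot be contacted (assumed name)."""


class TransferFailed(Exception):
    """Raised when one transfer attempt fails (assumed name)."""


def send_message(message, send_via_dot, send_via_smtp, max_attempts=3):
    """Try DOT first; fall back to plain SMTP. Returns the path that was used."""
    for _ in range(max_attempts):
        try:
            send_via_dot(message)
            return "dot"
        except TransferServiceUnavailable:
            break       # no Transfer Service contact: retrying will not help
        except TransferFailed:
            continue    # transfer failed: try again until attempts run out
    send_via_smtp(message)
    return "smtp"


delivered = []


def service_down(message):
    raise TransferServiceUnavailable()


def plain_smtp(message):
    delivered.append(message)


print(send_message("hello", service_down, plain_smtp))
```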
Future Work • Security • Application encrypts before DOT • No block-based caching, reuse, mirroring, … • No encryption • Resembles the status quo • In progress: Convergent encryption • Requires integration with DOT chunking • Application Preferences • Encryption, QoS, priorities, … • DOT might benefit from application input • Need an extensible way to express these
Conclusion • DOT separates app. logic from data transfer • Makes it easier to extend both • Architecture works well • Overhead low (especially in wide-area) • Major benefits • Caching • Flexibility to implement new transfer techniques
Normal SMTP [Diagram labels: SMTP Client; Server; EHLO; 250 Hello; MAIL FROM: user; …; DATA; 250 OK]
DOT-Enabled SMTP [Diagram labels: SMTP Client; Server; Xfer Service; EHLO; 250 Hello; MAIL FROM: user; …; X-DOT-DATA (OID+Hints); 250 OK]
Convergent Encryption • Chunk_i is encrypted using Hash_i • All identical cleartext blocks will map to the same encrypted block • Hash_i is further encrypted using a private key [Diagram labels: File; Hash1; Hash2; Hash3]
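Sketch (not from the slides): the core property of convergent encryption, that a chunk's key is its own hash and so identical cleartext yields identical ciphertext. The cipher here is a toy XOR keystream built from SHA-256, used only because it needs nothing beyond the standard library. It is not secure and is not the cipher the authors use. The final step on the slide, encrypting Hash_i with a private key, is not modeled.

```python
import hashlib


def keystream(key, length):
    """Toy keystream: SHA-256 over key + counter. Illustration only, not secure."""
    out = bytearray()
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(key + counter.to_bytes(8, "big")).digest()
        counter += 1
    return bytes(out[:length])


def xor_bytes(a, b):
    return bytes(x ^ y for x, y in zip(a, b))


def convergent_encrypt(chunk):
    """Return (key, ciphertext), where the key is the hash of the cleartext chunk."""
    key = hashlib.sha256(chunk).digest()
    return key, xor_bytes(chunk, keystream(key, len(chunk)))


def convergent_decrypt(key, ciphertext):
    return xor_bytes(ciphertext, keystream(key, len(ciphertext)))


key_a, cipher_a = convergent_encrypt(b"same cleartext block")
key_b, cipher_b = convergent_encrypt(b"same cleartext block")
print(cipher_a == cipher_b)                      # identical blocks stay identical
print(convergent_decrypt(key_a, cipher_a))
```

Since equal chunks still produce equal ciphertext, a cache that works on chunk hashes can keep deduplicating encrypted data.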
Mail Server Evaluation • Trace: 159 days at low volume academic mail server • 458,861 messages • hash, size of: message, headers, body • Message chunks • hash and size of each chunk • Static chunking and Rabin fingerprinting (Content-based block division)
Default GTC-GTC Transfer Protocol • GTC-GTC protocol mirrors transfer plugins • Implemented as RPC calls • (Fetches are actually pipelined) [Diagram labels, in order: Sender; Receiver; GET_DESCRIPTORS(OID); Desc list 1,2,…; GET_CHUNKS(…); Chunk 1; Chunk 2; GET_CHUNKS(…)]
Rabin Fingerprinting [Diagram labels: File Data; Rabin Fingerprints 4 7 8 2 8; Given Value - 8; Natural Boundary (two); Hash 1; Hash 2]
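Sketch (not from the slides): content-based chunking in the spirit of the diagram, where a chunk ends wherever a fingerprint of the last few bytes equals a given value. As a stand-in for true Rabin fingerprints this uses `zlib.crc32` over a sliding window, recomputed at every position rather than rolled incrementally. The window size, divisor, and target value are assumptions.

```python
import random
import zlib

WINDOW = 16       # bytes fingerprinted at each position (assumed)
DIVISOR = 32      # gives a boundary roughly every 32 bytes on random data (assumed)
GIVEN_VALUE = 8   # fingerprint value that marks a boundary


def chunk_ends(data):
    """Offsets (exclusive) where chunks end, chosen by content rather than position."""
    ends = []
    for i in range(len(data)):
        window = data[max(0, i - WINDOW + 1):i + 1]
        if zlib.crc32(window) % DIVISOR == GIVEN_VALUE:
            ends.append(i + 1)
    if data and (not ends or ends[-1] != len(data)):
        ends.append(len(data))   # the final chunk runs to the end of the data
    return ends


def split_chunks(data):
    chunks, start = [], 0
    for end in chunk_ends(data):
        chunks.append(data[start:end])
        start = end
    return chunks


rng = random.Random(0)
original = bytes(rng.randrange(256) for _ in range(4096))
edited = original[:2000] + b"!" + original[2000:]   # one byte inserted mid-file

old_chunks = split_chunks(original)
new_chunks = split_chunks(edited)
shared = set(old_chunks) & set(new_chunks)
print(len(old_chunks), len(new_chunks), len(shared))
```

Boundaries depend only on nearby bytes, so an insertion disturbs the chunks around the edit while later chunks keep the same content and hash. Fixed-size chunking would shift every later chunk.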
Rabin Fingerprinting: Examples of Edits 1. Original File 2. Addition in chunk • Changes only one hash 3. Addition creating a new breakpoint 4. Deletion changing size of chunk Figure from “A Low-bandwidth Network File System”
DOT Object Naming • Objects represented by an OID • Divided into “chunks”, each with a descriptor • Each OID corresponds to a list of descriptors • Data is fetched using descriptor lists • Supports partial transfers
Innovation in Data Transfer is Hard • Imagine: You have a novel data transfer technique • Say… Bittorrent, a P2P protocol for sharing large files • How do you deploy? • Update HTTP. Talk to IETF. Modify Apache, IIS, Firefox, Netscape, Opera, IE, Lynx, Wget, … • Update SMTP. Talk to IETF. Modify Sendmail, Postfix, Exchange, Mail.app, Eudora, … • Give up in frustration
DOT’s Modular Architecture [Diagram labels: Application; (1) Application API; DOT; (2) Transfer Plugin API; Transfer Plugin (three); Network; (3) Storage Plugin API; Storage Plugin; Local Storage]
DOT Plugins • Multipath plugin • List of sub-plugins • Balances load • Portable storage plugin • Sender: Copies new data onto USB flash device • Receiver: Scans USB flash device for blocks • Naïve filesystem layout, unoptimized but effective
DOT email chunk caching evaluation • Step 1: Caching analysis (infinite cache) • SMTP default: What was really sent • DOT body: Whole-body only caching, headers sent separately • Rabin body: Headers sent separately, Rabin fingerprint chunking of body • Rabin whole: Headers+body chunked together • Easiest to implement for application. Just send data… • Step 2: Trace replay through Postfix
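Sketch (not from the slides): what an infinite-cache analysis like Step 1 could compute. Every chunk is sent the first time its hash is seen and never again. The function names, SHA-256, and the three sample messages are invented for illustration and are not trace data.

```python
import hashlib


def bytes_with_infinite_cache(messages, chunker):
    """Return (total_bytes, bytes_sent) when no cached chunk is ever evicted."""
    seen = set()
    total = sent = 0
    for message in messages:
        for chunk in chunker(message):
            total += len(chunk)
            digest = hashlib.sha256(chunk).digest()
            if digest not in seen:   # first sighting: must go over the wire
                seen.add(digest)
                sent += len(chunk)
    return total, sent


def whole_body(message):
    """Whole-message caching: the entire message is a single chunk."""
    return [message]


sample = [b"hello", b"world", b"hello"]   # made-up messages, third repeats first
total, sent = bytes_with_infinite_cache(sample, whole_body)
print(total, sent)
```

Passing a different `chunker` (for example one that splits on content-defined boundaries) lets the same loop compare the caching strategies listed on the slide.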
Related Work • BEEP • Proxy-based data interposition approaches • RON, X-Bone, OCALA • Other Content-Addressable Systems • Bittorrent, DHTs, DTNs, EMC’s Centera • Using Content-Addressability to save on data transfers • CASPER, LBFS, Rhea et al., Spring et al. • Portable Storage • Lookaside Caching, BlueFS • Other transfer protocols • GridFTP, IBP, HTTP, etc.
Transfer plugin API • get_descriptors( OID, hints ) • get_chunks( descriptor, hints ) • cancel_chunks( chunk_list ) • Hints specify a plugin + data • gtc://sender.example.com:12000/ • dht://opendht.org/ • … • Transfer plugin chaining is easy • e.g., multipath plugin