
Data Staging on Untrusted Surrogates

Jason Flinn, Shafeeq Sinnamohideen, Niraj Tolia, Mahadev Satyanarayanan. Intel Research Pittsburgh, University of Michigan, Carnegie Mellon University.


Presentation Transcript


  1. Data Staging on Untrusted Surrogates • Jason Flinn, Shafeeq Sinnamohideen, Niraj Tolia, Mahadev Satyanarayanan • Intel Research Pittsburgh, University of Michigan, Carnegie Mellon University

  2. Mobile Data Access: Expectation vs. Reality • Mobile computers increasingly connected • Expectation of ubiquitous data access • Distributed file systems can help • Does reality match expectations? • Size, weight, energy constraints • Less storage, processing power, etc. • How to match reality and expectations? • Use untrusted, unmanaged infrastructure!

  3. Problem: Limited Storage • Latency often the real performance-killer • File systems: many sequential RPCs • Network latency not improving (much)! • What if one can’t cache all files of interest? • Borrow storage from nearby surrogate • Use as an “L2 file cache” [Diagram: client, surrogate, file server]

  4. Problem: Limited Battery Energy • File system consumes a lot of energy: • Network communication • Storage (disk spin-ups, reads, writes) • Surrogate helps preserve client battery • Use surrogate cache to avoid disk spin-ups • Prefetch updates to surrogate, not client

  5. Problem: Limited Bandwidth • How to fetch large updates in a short window? • Example: passing through airport gate • 11 Mbps (or more) local wireless bandwidth • Wide-area Internet bandwidth often less • InfoStation (Wu, Badrinath, et al.) • Cache updates before mobile user arrives • Blast data as user passes through cell • Surrogate: mechanism for caching file data.
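
The bandwidth argument on this slide comes down to simple arithmetic. The sketch below compares how long one update takes over the 11 Mbps local wireless link and over a slower wide-area link. The wide-area rate and the update size are hypothetical values chosen only for illustration, and the calculation ignores protocol overhead and latency.

```python
def transfer_seconds(size_mb: float, link_mbps: float) -> float:
    """Seconds to move size_mb megabytes over a link of link_mbps megabits/s.

    Ignores protocol overhead and latency, so real transfers take longer.
    """
    if link_mbps <= 0:
        raise ValueError("link_mbps must be positive")
    return size_mb * 8 / link_mbps


LOCAL_WIRELESS_MBPS = 11.0  # figure quoted on the slide
WIDE_AREA_MBPS = 1.0        # hypothetical value, for illustration only
UPDATE_MB = 50.0            # hypothetical update size, for illustration only

local = transfer_seconds(UPDATE_MB, LOCAL_WIRELESS_MBPS)
wide = transfer_seconds(UPDATE_MB, WIDE_AREA_MBPS)
print(f"local wireless: {local:.1f} s, wide area: {wide:.1f} s")
```

If the update is already cached on the surrogate when the user arrives, only the local transfer time has to fit inside the short connectivity window.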

  6. Location, Location, Location • Requirement: surrogate located near the client! • Must be opportunistic (use what’s there) • Vision: surrogates ubiquitously deployed • Computers getting ever cheaper • Already 802.11b wireless networks in cafes • Can’t trust or assume good behavior!

  7. Outline • Motivation • Architecture and design • Implementation • Evaluation • Related work and conclusions

  8. Data Staging Architecture [Diagram. Components: wimpy client (file client, Coda, proxy); server (file server); desktop (file client, Coda, data pump); surrogate (staging server). Labeled flows: file system traffic; modifications & unstaged reads; staged reads; Coda files; encrypted files; file keys and hashes (via secure channel). Link label: high latency.]

  9. Trust (or Lack Thereof) • Trusted: client, file server, desktop, file system • Untrusted: surrogate, network • How to deal with untrusted surrogate? • End-to-end encryption (privacy) • Cryptographic hashes (authenticity) • Read-only data (can’t “lose” updates) • Monitor performance (mitigate DoS)
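
The authenticity check can be sketched as follows: the client holds a hash obtained over a trusted channel and rejects any bytes from the surrogate that do not match it. The slides do not name the hash function, so SHA-256 is an assumption here, and the function names are hypothetical.

```python
import hashlib
import hmac


def content_hash(data: bytes) -> bytes:
    """Hash of a staged blob. SHA-256 is an assumed choice, not from the slides."""
    return hashlib.sha256(data).digest()


def accept_from_surrogate(data: bytes, trusted_hash: bytes) -> bool:
    """Return True only if the surrogate's bytes match the trusted hash."""
    return hmac.compare_digest(content_hash(data), trusted_hash)


original = b"encrypted file contents"
trusted = content_hash(original)       # would arrive via the secure channel
tampered = b"Encrypted file contents"  # one byte changed by the surrogate

print(accept_from_surrogate(original, trusted))
print(accept_from_surrogate(tampered, trusted))
```

A failed check costs only performance: because staged data is read-only, the client can still fetch the file from the file server.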

  10. Ease of Management • Can’t require a system administrator! • Build on commodity software • Apache with Perl scripts (643 LoC) • No long-term state • OK to trip over power cord! • Allow file system diversity • Minimalist API • Currently support Coda and NFS

  11. Surrogate API • Register() Get lease, quota for surrogate • Renew() Renew a lease • Deregister() Explicitly stop using surrogate • Stage() Put data on the surrogate • Unstage() Remove data from surrogate • Get() Retrieve data from surrogate
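
To make the six calls concrete, here is a minimal in-memory sketch. The slide gives only the call names and one-line descriptions, so every parameter, return value, and exception below is an assumption. The actual surrogate is described as Apache with Perl scripts; this Python class is a stand-in model, not that implementation.

```python
import itertools
import time
from dataclasses import dataclass, field
from typing import Callable, Dict, Optional, Tuple


class LeaseError(Exception):
    """Raised when a lease is unknown or has expired (hypothetical)."""


class QuotaExceeded(Exception):
    """Raised when staging would exceed the lease quota (hypothetical)."""


@dataclass
class _Lease:
    expires_at: float
    quota_bytes: int
    files: Dict[str, bytes] = field(default_factory=dict)

    def used(self) -> int:
        return sum(len(blob) for blob in self.files.values())


class Surrogate:
    def __init__(self, lease_seconds: float = 60.0, quota_bytes: int = 1 << 20,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._lease_seconds = lease_seconds
        self._quota_bytes = quota_bytes
        self._clock = clock
        self._ids = itertools.count(1)
        self._leases: Dict[int, _Lease] = {}  # no long-term state: memory only

    def _live(self, lease_id: int) -> _Lease:
        lease = self._leases.get(lease_id)
        if lease is None or lease.expires_at <= self._clock():
            self._leases.pop(lease_id, None)  # expired data is simply dropped
            raise LeaseError("unknown or expired lease")
        return lease

    def register(self) -> Tuple[int, int]:
        """Get a lease and quota; returns (lease_id, quota_bytes)."""
        lease_id = next(self._ids)
        expires = self._clock() + self._lease_seconds
        self._leases[lease_id] = _Lease(expires, self._quota_bytes)
        return lease_id, self._quota_bytes

    def renew(self, lease_id: int) -> None:
        self._live(lease_id).expires_at = self._clock() + self._lease_seconds

    def deregister(self, lease_id: int) -> None:
        self._leases.pop(lease_id, None)

    def stage(self, lease_id: int, name: str, data: bytes) -> None:
        lease = self._live(lease_id)
        new_used = lease.used() - len(lease.files.get(name, b"")) + len(data)
        if new_used > lease.quota_bytes:
            raise QuotaExceeded(name)
        lease.files[name] = data

    def unstage(self, lease_id: int, name: str) -> None:
        self._live(lease_id).files.pop(name, None)

    def get(self, lease_id: int, name: str) -> Optional[bytes]:
        return self._live(lease_id).files.get(name)
```

Nothing in this interface mentions file system semantics: the surrogate stores opaque named blobs, which is one way to read the "minimalist API" and "allow file system diversity" goals.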

  12. Which Files to Stage? • Must predict the files most likely to be accessed • Prediction orthogonal to data staging • Client proxy has hooks for prediction code • Hoarding: user manually specifies files, dirs • Clustering: per-activity LRU caching [Figure: prediction approaches on an axis from less transparent to more transparent; labels: User-Driven Clustering, Manual Copy, Coda Hoarding, SEER]

  13. Client Proxy Data Structures • Client proxy final arbiter of validity • For each staged file, maintains: • Valid bit • Data length • Encryption key and secure hash
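
The per-file record listed above maps naturally onto a small table keyed by path. This is a sketch only: the field and method names are hypothetical, and clearing the valid bit when a file is modified is an assumption about how the proxy acts as arbiter of validity.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class StagedEntry:
    valid: bool    # valid bit
    length: int    # data length in bytes
    key: bytes     # encryption key for the staged copy
    digest: bytes  # secure hash of the staged copy


class StagedFileTable:
    """Client-proxy view of what is staged (hypothetical structure)."""

    def __init__(self) -> None:
        self._entries: Dict[str, StagedEntry] = {}

    def record(self, path: str, length: int, key: bytes, digest: bytes) -> None:
        """Called when key, hash, and length arrive for a newly staged file."""
        self._entries[path] = StagedEntry(True, length, key, digest)

    def invalidate(self, path: str) -> None:
        """Assumed policy: a modified file's staged copy is no longer used."""
        entry = self._entries.get(path)
        if entry is not None:
            entry.valid = False

    def usable(self, path: str) -> Optional[StagedEntry]:
        """Return the entry only if the staged copy may serve reads."""
        entry = self._entries.get(path)
        return entry if entry is not None and entry.valid else None
```

Keeping this table on the client means the surrogate never has to be told, or trusted to remember, which copies are stale.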

  14. Staging Data • Client proxy sends list of files to data pump • For each file, data pump: • Reads file and attributes from file system • Encrypts file, generates hash over data • Sends encrypted data to surrogate • Sends key, hash, length to client • Staging asynchronous with client file accesses • If file staged, client gets it from surrogate • Otherwise, gets it from file server

  15. Outline • Motivation • Architecture and design • Implementation • Evaluation • Related work and conclusions

  16. Experimental Setup • Client: iPAQ 3850, 64 MB Coda cache • Cold cache: no data on client or surrogate • Warm cache: data initially on client and surrogate [Diagram labels: client, 802.11b wireless access point, surrogate, Ethernet, 30 ms delay, Coda file server]

  17. Benchmark: Image Trace • Record accesses to digital photo library in Coda • Take the first 10,148 accesses • 150 MB unique data, 401 MB total data read • Replay trace as fast as possible (DFSTrace) • Variables: • Wastage ratio: extra data prefetched • Miss ratio: amount of data never prefetched • Assume wastage ratio 33%, miss ratio 0% • Then do sensitivity analysis
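
A short calculation shows what these parameters imply for the amount of data staged. The slide does not give exact formulas, so this sketch assumes both ratios are expressed relative to the unique data the trace accesses; under a different definition (for example, wastage relative to total data staged) the numbers would change.

```python
def staged_volume_mb(unique_mb: float, wastage_ratio: float, miss_ratio: float) -> float:
    """Data staged on the surrogate, under the assumed ratio definitions.

    useful = accessed data that was staged; wasted = staged data never accessed.
    """
    useful = unique_mb * (1.0 - miss_ratio)
    wasted = unique_mb * wastage_ratio
    return useful + wasted


def missed_mb(unique_mb: float, miss_ratio: float) -> float:
    """Accessed data that was never staged and must come from the file server."""
    return unique_mb * miss_ratio


UNIQUE_MB = 150.0  # from the slide
TOTAL_READ_MB = 401.0  # from the slide

print(f"staged: {staged_volume_mb(UNIQUE_MB, 0.33, 0.0):.1f} MB")
print(f"missed: {missed_mb(UNIQUE_MB, 0.0):.1f} MB")
print(f"average reads per unique byte: {TOTAL_READ_MB / UNIQUE_MB:.2f}")
```

The last line reflects a fact stated on the slide: 401 MB read against 150 MB of unique data means the trace rereads data, so each staged byte can serve more than one access.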

  18. Baseline Image Results • Staging reduces execution time 45-48%!

  19. Sensitivity Analysis • Higher miss ratio has relatively greater effect

  20. Longer-Duration File Traces • Used Mummert’s Coda file system traces • Traces of client activity (open, mkdir, etc.) • Duration: 16-55 hours • Working set size: 57-254 MB • Methodology: • Keep inter-request delays when prefetching • Eliminate delays afterwards

  21. File Trace Results • Up to 48% reduction in cumulative file access delay

  22. Request Latency Breakdown

  23. Related Work • Web Caching (Akamai, Squid) • Different data access patterns, consistency • Fluid Replication (Kim02) • Assume more trust and management • OceanStore (Kubiatowicz02) • Staging minimalist, file-system agnostic • Builds on work in file prefetching, InfoStations

  24. Conclusion • Possible to significantly improve distributed file system performance with untrusted, unmanaged infrastructure! • Future work: • Grow set of supported file systems • Surrogate discovery and migration • Support for energy-awareness • http://info.pittsburgh.intel-research.net
