EMFS: Email-based Personal Cloud Storage NAS 2011

Presentation Transcript


  1. EMFS: Email-based Personal Cloud Storage
     NAS 2011
     Jagan Srinivasan, Wei Wei, Xiaosong Ma, Ting Yu

  2. Agenda
     • Introduction
     • Data Organization and Access
     • Email-based File System Design
     • Performance Evaluation
     • Related Work
     • Conclusion

  3. Motivation
     Existing personal cloud storage services
     • Tie storage to internal data formats and processing applications
     • General-purpose storage is not free and not widely used
     Existing email services
     • The capacity of a single email account has increased dramatically
     • Provided by many reliable and reputable online service providers
     Leveraging existing email services
     • Benefits service providers, as it extends their access to valuable customer data

  4. EMFS Overview
     Target Workload and Assumptions
     • Typical personal workload
     • Reading, editing, and backing up documents such as Word, PDF, etc.
     • Targets file sizes ranging from several KBs to tens of MBs
     • Users will not share storage with others or allow concurrent access to their data
     Design Goals
     • Usability (generic file system interface)
     • Scalability (extensible personal storage space)
     • Reliability (access despite single email failure)

  5. EMFS System Architecture
     [Figure: architecture diagram. Components: Email File System Interface through FUSE, Memory Cache, Email Mapping Service, Local Cache, Email Cloud Storage Interface; data is striped and replicated.]

  7. Data Organization and Access
     File Organization
     • Metadata
     • File data stored as attachments or in the body of emails

  8. Data Organization and Access cont’d
     Metadata and Data Access
     • Client cache management
     • Metadata update
     • Data access operations
     Consistency and Failure Recovery
     • Adopt a mechanism to ensure the atomicity of updates
     [Figure: failure cases. (a) Lost metadata update; (b) Lost part of data update.]

  10. Email Protocol Selection
      Simple Mail Transfer Protocol (SMTP)
      • Only used for transferring emails to the server
      • Restriction on the number of messages sent through SMTP
      Internet Message Access Protocol (IMAP)
      • Supports both sending and retrieving messages
      • Allows users to “append” a message to their own mailbox
      • Not limited by traffic restrictions
      Post Office Protocol (POP)
      • Primarily used for retrieving emails
      • Supports a simple download-and-delete access pattern
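      The IMAP “append” operation mentioned above can be sketched with Python's standard `imaplib`. This is only an illustration, not the EMFS code: the function name `append_block`, the mailbox name, the subject string, and the placeholder body text are all assumptions made for the example.

```python
import imaplib
import time
from email.message import EmailMessage


def append_block(conn, mailbox, subject, payload):
    """Store one payload as a message in the user's own mailbox via IMAP APPEND.

    `conn` is any object exposing imaplib.IMAP4.append(mailbox, flags,
    date_time, message), for example a logged-in imaplib.IMAP4_SSL instance.
    """
    msg = EmailMessage()
    msg["Subject"] = subject                # hypothetical identifier
    msg.set_content("EMFS block")           # placeholder body (assumption)
    msg.add_attachment(
        payload,
        maintype="application",
        subtype="octet-stream",
        filename="block.bin",               # hypothetical file name
    )
    # No SMTP delivery is involved: the message goes straight into the mailbox.
    return conn.append(
        mailbox, "", imaplib.Time2Internaldate(time.time()), msg.as_bytes()
    )


# Usage against a real server (host and credentials are placeholders):
# conn = imaplib.IMAP4_SSL("imap.example.com")
# conn.login("user@example.com", "password")
# append_block(conn, "INBOX", "emfs-block-0", b"some bytes")
```

      Because the message is written directly into the mailbox, this path avoids the per-message sending limits the slide attributes to SMTP.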

  11. Email Protocol Selection cont’d
      Email sending and appending performance
      • IMAP is faster than SMTP in almost all cases, by 5.5% on average and up to 42.64%

  12. Data Placement Within Emails
      Multiple places can be used to store data in an email
      • Headers
      • Subject line
      • Body
      • Attachment
      In EMFS
      • Metadata is stored in the body section
      • The unique identifiers are stored in the subject line
      • Data can be stored either as attachments or in the body
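      The placement rules above can be illustrated with Python's standard `email` package. This is a minimal sketch under stated assumptions: the identifier strings, the JSON encoding of metadata, and all function names are hypothetical and are not taken from the EMFS prototype.

```python
import json
from email import message_from_bytes, policy
from email.message import EmailMessage


def build_metadata_email(file_id, metadata):
    """Metadata email: identifier in the subject line, metadata in the body."""
    msg = EmailMessage()
    msg["Subject"] = file_id
    msg.set_content(json.dumps(metadata))   # JSON is an assumption
    return msg


def build_data_email(block_id, data):
    """Data email: identifier in the subject line, payload as an attachment."""
    msg = EmailMessage()
    msg["Subject"] = block_id
    msg.add_attachment(
        data, maintype="application", subtype="octet-stream", filename=block_id
    )
    return msg


def read_data_email(raw):
    """Recover (identifier, payload) from the raw bytes of a data email."""
    msg = message_from_bytes(raw, policy=policy.default)
    attachment = next(msg.iter_attachments())
    return str(msg["Subject"]), attachment.get_content()


def read_metadata_email(raw):
    """Recover (identifier, metadata dict) from the raw bytes of a metadata email."""
    msg = message_from_bytes(raw, policy=policy.default)
    return str(msg["Subject"]), json.loads(msg.get_content())
```

      Keeping the identifier in the subject line means a block can be located by a subject search without downloading the payload.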

  13. Data Placement Within Emails cont’d
      Single email sending/retrieving performance
      • Similar performance regardless of whether the payload is placed in the body or the attachment
      • Attachment payload slightly outperforms the body payload with Gmail

  14. Block Size and File Striping
      Organize email accounts as a RAID
      • Each account is identified by a “RAID Index” from 0 to n-1
      • Data blocks are striped across email accounts
      • Blocks are stored on randomly chosen disks, instead of having a fixed array of email disks and striping data in a round-robin manner
      • Metadata emails are usually small, so they are not striped
      EMFS uses 512KB as its default block size and 8 as the default stripe width
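      A rough sketch of the block splitting and random placement described above. The slide does not say how the stripe width interacts with random placement, so this example simply treats the default stripe width as the default number of accounts; that choice, and all names, are assumptions.

```python
import random

BLOCK_SIZE = 512 * 1024   # default block size from the slide (512KB)
STRIPE_WIDTH = 8          # default stripe width from the slide


def split_blocks(data, block_size=BLOCK_SIZE):
    """Cut file contents into fixed-size blocks; the last block may be shorter."""
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]


def place_blocks(num_blocks, num_accounts=STRIPE_WIDTH, rng=None):
    """Pick a random RAID index (0 to n-1) for each block, not round-robin."""
    rng = rng or random.Random()
    return [rng.randrange(num_accounts) for _ in range(num_blocks)]


# A 4MB file splits into 8 blocks of 512KB each.
blocks = split_blocks(bytes(4 * 1024 * 1024))
placement = place_blocks(len(blocks))
```

      Each entry of `placement` is the RAID index of the account that would hold the corresponding block.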

  15. Block Size and File Striping cont’d
      Figure 5 measures a 4MB file’s read/write latency
      • File access latency steadily decreases as the file block (attachment) size increases, for both Gmail and GawabMail

  16. Block Size and File Striping cont’d
      • Figures 6 and 7 show the effect of striping with different block sizes
      • Striping provides a significant performance improvement
      • Increasing the stripe width beyond 8 or the block size beyond 1MB does not help performance
      • Block sizes smaller than 256KB degrade performance in almost all cases

  17. Data Replication
      Replication group
      • Consists of two or more disks mirroring the same data
      • Updates are written to one of the email disks within the group
      • Email disks (accounts) can be added to or removed from a group
      Replication Strategies
      • Read-one and Write-one
      • All reads and writes from EMFS go to the same email account
      • Read-fast and Write-fast
      • Reads and writes go to different accounts based on their uploading and downloading performance
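      The two replication strategies can be contrasted in a short sketch. The slide does not specify how EMFS measures account performance, so the per-account throughput fields below, the account names, and the function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Account:
    name: str
    upload_kbps: float     # assumed measured upload throughput
    download_kbps: float   # assumed measured download throughput


def choose_one(group):
    """Read-one and Write-one: all reads and writes go to the same account."""
    primary = group[0]
    return primary, primary          # (read target, write target)


def choose_fast(group):
    """Read-fast and Write-fast: pick the best account per direction."""
    reader = max(group, key=lambda a: a.download_kbps)
    writer = max(group, key=lambda a: a.upload_kbps)
    return reader, writer


group = [
    Account("account-a", upload_kbps=300.0, download_kbps=900.0),
    Account("account-b", upload_kbps=450.0, download_kbps=600.0),
]
reader, writer = choose_fast(group)
```

      With these made-up numbers, reads go to `account-a` and writes go to `account-b`, while `choose_one` would send both to `account-a`.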

  19. EMFS Evaluation
      System Implementation
      • Prototype is based on FUSE
      • Implemented in around 3000 lines of Python code
      • Two replication strategies implemented for comparison
      What we do
      • Compare EMFS with three existing distributed file systems
      • Use Postmark, IOZone, and a synthetic file access benchmark
      Experiment Setup
      • Dual-core desktop (2.66 GHz) with 3 GB of RAM running Ubuntu 8.10
      • Both NFS and AFS servers were configured on dedicated machines inside the campus network
      • Jungle Disk was configured such that background or asynchronous transfers were disabled
      • EMFS was configured using accounts from Gmail and GawabMail

  20. Performance Results – Postmark
      • Postmark measures performance for network-based systems by simulating access to short-lived small files
      • Generates different workloads (equal bias, read heavy, append heavy, and create heavy) by varying the operation bias
      Settings
      • 200 files
      • File sizes range from 4KB to 16MB
      • 200 transactions
      Results
      • AFS and NFS perform better than EMFS and Jungle Disk
      • EMFS offers comparable performance to Jungle Disk
      • EMFS-Fast does offer better performance than EMFS-One

  21. Performance Results – IOZone
      • Unlike Postmark, IOZone mainly focuses on file data access
      Settings
      • 16 MB file
      • Request sizes range from 128 KB to 4 MB
      Results
      • AFS and Jungle Disk achieve a transfer rate between 25 and 50 MB/s for sequential read
      • EMFS reports very high transfer rates
      • Jungle Disk reports very low throughput (about 550-600 KB/s) for random reads

  22. Performance Results – IOZone cont’d
      Settings
      • 16 MB file
      • Request sizes range from 128 KB to 4 MB
      Results
      • EMFS is slightly better than Jungle Disk in terms of write throughput
      • NFS and AFS are faster due to their high file transfer performance and low overhead

  23. Performance Results – Editing Workload
      • A synthetic benchmark that simulates a document editing task
      Settings
      • 100 files, 14 directories (with a maximum depth of 3)
      • File sizes range from 8KB to 4MB
      Results
      • Lookup operations for AFS are very fast
      • EMFS-Prefetch helps reduce the total lookup time by 17.4%
      • All systems perform nearly the same for editing operations
      • EMFS-Fast brings an improvement of 31% for the file save operation, which is quite close to Jungle Disk

  25. Related Work
      Email-based file systems
      • GmailFS [http://sr71.net/projects/gmailfs/]
      • YaFS [Lu, et al., IPDPS 2009]
      • Free email accounts for data backup [Traeger, et al., StorageSS 2006]
      • EMFS systematically examines email-based file system design issues
      Other existing client-server systems
      • LftpFS [http://lftpfs.sourceforge.net/]
      • ExpanDrive [http://en.wikipedia.org/wiki/ExpanDrive]
      • EMFS enables users to take advantage of widely available and increasingly powerful web-based email services
      Distributed file systems
      • NFS [Pawlowski, et al., USENIX 1994], AFS [Howard, et al., ACM Trans 1998], LBFS [Muthitacharoen, et al., SOSP 2001], GFS [Ghemawat, et al., SOSP 2003], and Ceph [Weil, et al., OSDI 2006]
      • EMFS complements existing studies on distributed file/storage systems

  26. Conclusion
      • To the best of our knowledge, our work is the first that systematically examines email-based file system design issues, and thoroughly
      Contributions
      • Provides a personal cloud storage solution on top of multiple web-based free email accounts
      • Implements a prototype based on FUSE
      • Evaluates the effectiveness of features such as multi-account space aggregation, file striping, and data replication

  27. Thank you. Questions?
