I/O TAGGING SMALL FILES vs BIG FILES

I/O TAGGING: SMALL FILES vs BIG FILES. Sivaraman Sivaraman, Ming Chen. MOTIVATION: DISK CACHING. Universal challenges in the industry: keeping the right data cached. Conventional approaches: evict cold data (LRU commonly used). How I/O classification can help.


Presentation Transcript


  1. I/O TAGGING: SMALL FILES vs BIG FILES. Sivaraman Sivaraman, Ming Chen

  2. MOTIVATION: DISK CACHING • Universal challenges in the industry • Keeping the right data cached • Conventional approaches • Evict cold data (LRU commonly used) • How I/O classification can help • Identify cacheable I/O classes • Assign relative caching priorities (e.g., large files shall be evicted before metadata and small files)

  3. BACKGROUND: Differentiated Storage Services • Classify each I/O in-band [Figure: block diagram with the labels: FS classification, FS policy assignment, FS policy enforcement; Computer system (Applications or DB, File system, Operating system, I/O Classification); Storage system (Storage controller, Management firmware, QoS Policies, QoS Mechanisms, Disk, SSD)]

  4. Classifier and Classes • 5 bits → 32 classes
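A 5-bit field can hold 2^5 = 32 distinct class ids. The following is a minimal sketch of packing such an id into the low bits of a one-byte tag. The class names, their numeric values, and the choice of the low five bits are assumptions made for illustration; the slide does not list the actual assignments.

```c
#include <assert.h>
#include <stdint.h>

#define CLASS_BITS  5u
#define NUM_CLASSES (1u << CLASS_BITS)   /* 2^5 = 32 */
#define CLASS_MASK  (NUM_CLASSES - 1u)   /* 0x1F */

/* Hypothetical class ids, not taken from the slides. */
enum io_class {
    CLASS_UNCLASSIFIED = 0,
    CLASS_METADATA     = 1,
    CLASS_SMALL_FILE   = 2,
    CLASS_BIG_FILE     = 3
};

/* Store a class id in the low 5 bits of a tag byte,
 * leaving the upper 3 bits unchanged. */
static uint8_t set_class(uint8_t tag, unsigned cls)
{
    assert(cls < NUM_CLASSES);
    return (uint8_t)((tag & ~CLASS_MASK) | cls);
}

/* Extract the class id from the low 5 bits of a tag byte. */
static unsigned get_class(uint8_t tag)
{
    return tag & CLASS_MASK;
}
```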

  5. Outline • Overview • Algorithm and Implementation • Methodology • Verification • Impression benchmarks • Results • Conclusion and Future work

  6. Overview • Started with no modification to the OS • Implemented a pseudo block device driver to track I/O requests • Created our own benchmarks to test the implementation • Devised an algorithm to identify whether an I/O request belongs to a small file or a big file, without any knowledge of the file system • Used information from debugfs to produce an accurate classification for comparison • Compared our predictions against that accurate model • Ran realistic benchmarks such as Impressions to find problems with our implementation • Changed a few lines in the OS to improve prediction accuracy • No modification in OS → slight modification in OS

  7. Algorithm and Implementation [Figure: architecture diagram with the labels: User Space (Verification), IOCTL syscall, Kernel Space (Ext3 FS, Pseudo Block Device, RAM Disk)]

  8. Data Structure of I/O Request • Write or Read • Address • Data block content within that request

  9. Methodology • Ext3 with ordered journal • One block group • Each block is 4 KB

  10. Algorithm and Implementation • Set a threshold to differentiate small files from big files • Bitmap • buf_queue structure

  11. Algorithm and Implementation

      /* One tracked I/O request (a single block). */
      struct buf_req {
          int offset;
          int length;
          char req_data[BLOCKSIZE];
      };

      struct buf_queue {
          struct buf_req req[THRESHOLD];  /* THRESHOLD: the small/big threshold value */
          int cp;       /* current pointer for the queue */
          int p_addr;   /* address of last request */
          int p_state;  /* state of last request: small or big */
      };

  12. Algorithm and Implementation
      • If read:
          • handle req in queue (tag as small)
          • tag the new req according to bitmap
          • p_state = small
      • If write:
          • if not contiguous:
              • handle req in queue (tag as small)
              • push it to queue
              • p_state = small
          • else:
              • if p_state == big:
                  • tag new req as big
              • else:
                  • push it into queue
                  • if queue is full:
                      • handle req in queue (tag as big)
                      • p_state = big
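The read and write rules on this slide can be sketched in C roughly as follows. This is an illustration under stated assumptions, not the authors' driver code: every request is taken to cover exactly one block, addresses are block numbers, "contiguous" means the address is the previous address plus one, and the bitmap is kept as one byte per block for readability. The names `classifier`, `on_read`, `on_write` and `flush_queue` are hypothetical. `THRESHOLD` and `NBLOCKS` use the values reported on the results slides (threshold 10, 36 MB file system with 4 KB blocks).

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define THRESHOLD 10     /* queue capacity in blocks */
#define NBLOCKS   9216   /* 36 MB / 4 KB */

enum tag { TAG_SMALL = 0, TAG_BIG = 1 };

struct classifier {
    int queue[THRESHOLD];            /* block addresses of undecided writes */
    int cp;                          /* number of queued requests */
    int p_addr;                      /* address of last request */
    enum tag p_state;                /* state of last request */
    unsigned char bitmap[NBLOCKS];   /* assumed layout: one tag per block */
};

static void classifier_init(struct classifier *c)
{
    memset(c, 0, sizeof *c);
    c->p_addr = -2;                  /* so that block 0 is never "contiguous" */
    c->p_state = TAG_SMALL;
}

/* "Handle req in queue": record tag t for every queued block, then empty it. */
static void flush_queue(struct classifier *c, enum tag t)
{
    for (int i = 0; i < c->cp; i++)
        c->bitmap[c->queue[i]] = (unsigned char)t;
    c->cp = 0;
}

/* Read: pending writes become small; the read is tagged from the bitmap. */
static enum tag on_read(struct classifier *c, int addr)
{
    assert(addr >= 0 && addr < NBLOCKS);
    flush_queue(c, TAG_SMALL);
    c->p_state = TAG_SMALL;
    c->p_addr = addr;
    return (enum tag)c->bitmap[addr];
}

/* Write: a run of THRESHOLD contiguous blocks turns the run into "big". */
static void on_write(struct classifier *c, int addr)
{
    assert(addr >= 0 && addr < NBLOCKS);
    bool contiguous = (addr == c->p_addr + 1);

    if (!contiguous) {
        flush_queue(c, TAG_SMALL);
        c->queue[c->cp++] = addr;
        c->p_state = TAG_SMALL;
    } else if (c->p_state == TAG_BIG) {
        c->bitmap[addr] = TAG_BIG;
    } else {
        c->queue[c->cp++] = addr;
        if (c->cp == THRESHOLD) {
            flush_queue(c, TAG_BIG);
            c->p_state = TAG_BIG;
        }
    }
    c->p_addr = addr;
}
```

In this sketch, calling `on_write` for blocks 100 through 111 in order tags all twelve blocks as big, because the queue fills on the tenth contiguous write. A shorter run stays queued until a read or a non-contiguous write flushes it as small.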

  13. Verification • Scan all the files. • Run debugfs "stat filename" on each file. • Tag all data blocks according to their inode information.
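Once every data block has a reference tag derived from its inode, the prediction can be scored block by block. The sketch below is one way to do that comparison. The helper names are hypothetical, and the rule that a file counts as big when it has at least `threshold` blocks is an assumption chosen to match the queue-full condition of the algorithm.

```c
#include <assert.h>
#include <stddef.h>

enum tag { TAG_SMALL = 0, TAG_BIG = 1 };

/* Reference tag for a file of file_blocks blocks (assumed rule:
 * big when the file has at least `threshold` blocks). */
static enum tag truth_tag(size_t file_blocks, size_t threshold)
{
    return file_blocks >= threshold ? TAG_BIG : TAG_SMALL;
}

/* Fraction of blocks whose predicted tag matches the reference tag. */
static double tag_accuracy(const enum tag *predicted,
                           const enum tag *truth, size_t nblocks)
{
    assert(nblocks > 0);
    size_t match = 0;
    for (size_t i = 0; i < nblocks; i++)
        if (predicted[i] == truth[i])
            match++;
    return (double)match / (double)nblocks;
}
```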

  14. IMPRESSIONS BENCHMARKS • Impressions is a tool for generating realistic, reproducible file system images for testing and performance evaluation. • Impressions incorporates various statistical techniques to produce realistic file system images. • Impressions gives the user flexibility to specify one or more parameters from a detailed list of file system parameters (file-system size, number of files, distribution of file sizes, etc.) • Impressions is deterministic: given the same set of starting parameters and random seeds, it will generate the same file system image. Nitin Agrawal, Andrea C. Arpaci-Dusseau, Remzi H. Arpaci-Dusseau. Generating Realistic Impressions for File-System Benchmarking. Proceedings of the 7th Conference on File and Storage Technologies, pp. 125-138, February 24-27, 2009, San Francisco, California

  15. KERNEL MODIFICATIONS • The Impressions benchmark does not call fsync • Our algorithm requires an fsync after every write, which makes it unsuitable for realistic benchmarks • Fixed the problem by adding 5 lines of code to the Linux ext3 file system in the kernel • No OS changes → a few changes in the OS • Solution: allocate one free block between the data blocks of different files, which lets us differentiate between the data blocks of different files • Implementation: before data blocks are allocated, the inode of the requesting file is looked up. When a file requests its first data block, 2 blocks are allocated and the data is stored only in the second block.
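The effect of the separator block can be shown with a toy sequential allocator. This is not the ext3 patch, whose five lines are not reproduced in the slides. It is a user-space illustration in which `toy_alloc` and `toy_alloc_block` are hypothetical names and blocks are handed out in increasing order.

```c
#include <assert.h>

/* Toy allocator state: the next block number that has not been handed out. */
struct toy_alloc {
    int next_free;
};

/* Return the block where the file's data is stored.
 * For the first data block of a file, two blocks are consumed and only the
 * second one holds data, so the first stays empty as a separator. */
static int toy_alloc_block(struct toy_alloc *a, int is_first_block_of_file)
{
    assert(a->next_free >= 0);
    if (is_first_block_of_file)
        a->next_free++;              /* skip one block: the separator */
    return a->next_free++;
}
```

With this scheme, two files written back to back never occupy adjacent block numbers, so a contiguity-based classifier sees a break at every file boundary even when no fsync separates the writes.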

  16. RESULTS File System Size = 36 MB

  17. RESULTS Threshold =10

  18. CONCLUSION AND FUTURE WORK • I/O classification can also be done in user space with minor modifications to the OS block layer • Overall prediction accuracy is greater than 95% • Knowing the file system and its layout allows more accurate classification, but the classifier is then limited to that particular file system. • I/O tagging can be used to make effective use of caches. Future Work: • Problems with indirect pointers can be fixed to improve accuracy. • Simpler algorithms can be designed to classify I/O into various other classes. • Intelligent caching is just the beginning. Other types of differentiation, such as security and reliability, can be explored to make use of I/O tagging.

  19. THANK YOU QUESTIONS????

  20. References • Michael Mesnier, Jason Akers, Feng Chen, Tian Luo. Differentiated Storage Services. 23rd ACM Symposium on Operating Systems Principles (SOSP), October 2011. • Nitin Agrawal, Andrea C. Arpaci-Dusseau, Remzi H. Arpaci-Dusseau. Generating Realistic Impressions for File-System Benchmarking. Proceedings of the 7th Conference on File and Storage Technologies, pp. 125-138, February 24-27, 2009, San Francisco, California. • http://blog.superpat.com/2010/05/04/a-simple-block-driver-for-linux-kernel-2-6-31/
