I/O TAGGING SMALL FILES vs BIG FILES

Sivaraman Sivaraman Ming Chen

MOTIVATION: DISK CACHING • A universal challenge in the industry: keeping the right data cached • Conventional approach: evict cold data (LRU is commonly used) • How I/O classification can help: identify cacheable I/O classes, and assign relative caching priorities (e.g., large files are evicted before metadata and small files)

BACKGROUND: Differentiated Storage Services • Classify each I/O in-band • Stages: FS classification, FS policy assignment, FS policy enforcement • [Figure labels: Computer system (Applications or DB, File system, Operating system, I/O Classification, QoS Policies); Storage system (Storage controller, Management firmware, QoS Mechanisms, Disk, SSD)]

Classifier and Classes • 5 bits → 32 classes
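The 5-bit figure follows from 2^5 = 32. A minimal sketch of how a class ID could be packed into a 5-bit field is shown here; the field position (low bits of a one-byte tag) and the class names are assumptions for illustration, not taken from the slides.

```c
#include <assert.h>
#include <stdint.h>

#define CLASS_BITS  5u
#define NUM_CLASSES (1u << CLASS_BITS)  /* 2^5 = 32 distinct classes */
#define CLASS_MASK  (NUM_CLASSES - 1u)  /* 0x1f */

/* Hypothetical class IDs; the slides only distinguish small from big files. */
enum io_class { CLASS_UNKNOWN = 0, CLASS_SMALL_FILE = 1, CLASS_BIG_FILE = 2 };

/* Store a class ID in the low 5 bits of a tag byte, keeping the upper 3 bits. */
static uint8_t set_class(uint8_t tag, unsigned cls)
{
    assert(cls < NUM_CLASSES);
    return (uint8_t)((tag & ~CLASS_MASK) | cls);
}

/* Read the class ID back out of a tag byte. */
static unsigned get_class(uint8_t tag)
{
    return tag & CLASS_MASK;
}
```

Any ID from 0 to 31 fits in the field; a 33rd class would need a sixth bit.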

Outline • Overview • Algorithm and Implementation • Methodology • Verification • Impressions benchmarks • Results • Conclusion and Future Work

Overview • Initial goal: no modification to the OS • Implemented a pseudo block device driver to track I/O requests • Created our own benchmarks to test the implementation • Devised an algorithm that decides whether an I/O request belongs to a small file or a big file, without any knowledge of the file system • Used information from debugfs to build an accurate reference model • Compared the output of our prediction against that reference model • Ran a realistic benchmark (Impressions) to find problems with our implementation • Changed a few lines in the OS to improve prediction accuracy • Outcome: no modification to the OS ----> slight modification to the OS

Algorithm and Implementation • [Figure labels: Verification, User Space, IOCTL syscall, Ext3 FS, Pseudo Block Device, RAM Disk, Kernel Space]

Data Structure of I/O Request • Write or Read • Address • Data block content within that request

Methodology • Ext3 with ordered journal • One block group • Each block is 4 KB

Algorithm and Implementation • Set a threshold to differentiate small files from big files • Bitmap • buf_queue structure

Algorithm and Implementation
struct buf_req {
    int offset;
    int length;
    char req_data[BLOCKSIZE];
};
struct buf_queue {
    struct buf_req req[THRESHOLD];  /* THRESHOLD: the threshold value */
    int cp;       /* current pointer for the queue */
    int p_addr;   /* addr of last request */
    int p_state;  /* state of last request, small or big */
};

Algorithm and Implementation
• If read:
    • handle req in queue (tag as small)
    • tag the new req according to bitmap
    • p_state = small
• If write:
    • if not contiguous:
        • handle req in queue (tag as small)
        • push it to queue
        • p_state = small
    • else:
        • if p_state == big:
            • tag new req as big
        • else:
            • push it into queue
            • if queue is full:
                • handle req in queue (tag as big)
                • p_state = big
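The decision steps above can be sketched in C as a small state machine. This is a simplified illustration, not the authors' driver code: it assumes one block per request, keeps only block addresses in the queue (no request data), updates p_addr on every request, and uses hypothetical names (struct classifier, classify, flush_queue) and sizes (NBLOCKS, THRESHOLD).

```c
#include <assert.h>
#include <string.h>

#define NBLOCKS   1024  /* assumed device size in blocks */
#define THRESHOLD 10    /* assumed: THRESHOLD contiguous writes mark a big file */

enum tag { TAG_NONE = 0, TAG_SMALL, TAG_BIG };

struct classifier {
    unsigned char bitmap[NBLOCKS]; /* last tag assigned to each block */
    int queue[THRESHOLD];          /* addresses of writes not yet tagged */
    int cp;                        /* number of requests currently queued */
    int p_addr;                    /* address of the last request, -1 if none */
    enum tag p_state;              /* state of the last request */
};

static void classifier_init(struct classifier *c)
{
    memset(c, 0, sizeof *c);
    c->p_addr = -1;
    c->p_state = TAG_SMALL;
}

/* "Handle req in queue": tag every queued block with t and empty the queue. */
static void flush_queue(struct classifier *c, enum tag t)
{
    for (int i = 0; i < c->cp; i++)
        c->bitmap[c->queue[i]] = (unsigned char)t;
    c->cp = 0;
}

/* Process one single-block request. Returns the tag known so far, or
 * TAG_NONE while a write is still waiting in the queue. */
static enum tag classify(struct classifier *c, int is_write, int addr)
{
    assert(addr >= 0 && addr < NBLOCKS);
    int contiguous = (c->p_addr >= 0 && addr == c->p_addr + 1);
    c->p_addr = addr;

    if (!is_write) {                    /* read */
        flush_queue(c, TAG_SMALL);
        c->p_state = TAG_SMALL;
        return (enum tag)c->bitmap[addr];
    }
    if (!contiguous) {                  /* write starting a new run */
        flush_queue(c, TAG_SMALL);
        c->queue[c->cp++] = addr;
        c->p_state = TAG_SMALL;
        return TAG_NONE;
    }
    if (c->p_state == TAG_BIG) {        /* run already known to be big */
        c->bitmap[addr] = TAG_BIG;
        return TAG_BIG;
    }
    c->queue[c->cp++] = addr;           /* contiguous, still undecided */
    if (c->cp == THRESHOLD) {           /* queue is full: the run is big */
        flush_queue(c, TAG_BIG);
        c->p_state = TAG_BIG;
        return TAG_BIG;
    }
    return TAG_NONE;
}
```

With these assumptions, ten contiguous writes fill the queue and the whole run is tagged big; a shorter run stays queued until a read or a non-contiguous write flushes it as small.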

Verification • Scan all the files • Run debugfs “stat filename” on each file • Tag all data blocks of a file according to its inode information
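Once every block has a reference tag from the inode information, the prediction can be scored block by block. The sketch below assumes one plausible metric (fraction of file data blocks whose predicted tag matches the reference tag); the slides do not define the metric, and the function name and tag encoding (0 = not file data) are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Fraction of data blocks whose predicted tag equals the reference tag.
 * Blocks whose reference tag is 0 (assumed: not file data) are skipped.
 * Returns 1.0 when there is nothing to compare. */
static double tag_accuracy(const unsigned char *pred,
                           const unsigned char *truth, size_t n)
{
    assert(pred != NULL && truth != NULL);
    size_t total = 0, hit = 0;
    for (size_t i = 0; i < n; i++) {
        if (truth[i] == 0)
            continue;
        total++;
        if (pred[i] == truth[i])
            hit++;
    }
    return total ? (double)hit / (double)total : 1.0;
}
```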

IMPRESSIONS BENCHMARKS • Impressions is a tool for generating realistic, reproducible file system images for testing and performance evaluation. • Impressions incorporates various statistical techniques to produce realistic file system images. • Impressions lets the user specify one or more parameters from a detailed list of file system parameters (file-system size, number of files, distribution of file sizes, etc.). • Impressions is deterministic: given the same set of starting parameters and random seeds, it generates the same file system image. • Source: Nitin Agrawal, Andrea C. Arpaci-Dusseau, Remzi H. Arpaci-Dusseau. Generating realistic impressions for file-system benchmarking. Proceedings of the 7th Conference on File and Storage Technologies, p. 125-138, February 24-27, 2009, San Francisco, California.

KERNEL MODIFICATIONS • The Impressions benchmark does not call fsync • Our algorithm requires an fsync after every write, which is not practical for realistic benchmarks • Fixed the problem by adding 5 lines of code to the Linux Ext3 file system code in the kernel • No OS changes ---> a few changes in the OS • Solution: leave one free block between the data blocks of different files, which lets us tell the data blocks of different files apart • Implementation: before data blocks are allocated, the inode of the requesting file is looked up. When a file requests its first data block, 2 blocks are allocated and the data is stored only in the second block.
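The effect of the extra block can be illustrated with a toy sequential allocator in user space. This is not the Ext3 patch; the structure, function name, and device size are hypothetical and only show how skipping one block on a file's first allocation breaks contiguity between files.

```c
#include <assert.h>

#define NBLOCKS 64  /* assumed size of the toy device */

struct toy_alloc {
    int next_free;  /* next block that has not been handed out */
};

/* Hand out one data block. For a file's first data block, two blocks are
 * consumed and the second is returned, leaving a free gap before the file.
 * Returns -1 when the device is full. */
static int alloc_block(struct toy_alloc *a, int first_block_of_file)
{
    assert(a != 0);
    int need = first_block_of_file ? 2 : 1;
    if (a->next_free + need > NBLOCKS)
        return -1;
    int blk = a->next_free + need - 1;  /* data goes in the last block taken */
    a->next_free += need;
    return blk;
}
```

If file A receives blocks 1, 2, 3 and file B starts next, B's first data block is 5, so a block-level observer sees a non-contiguous write at the file boundary even without an fsync in between.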

RESULTS File System Size = 36 MB

RESULTS Threshold = 10

CONCLUSION AND FUTURE WORK • I/O classification can also be done in user space with minor modifications to the OS block layer • Overall prediction accuracy is greater than 95% • Knowing the file system and its layout allows more accurate classification, but the classifier is then limited to that particular file system • I/O tagging can be used to make effective use of caches. Future Work: • Problems with indirect pointers can be fixed to improve accuracy • Simpler algorithms can be designed to classify I/O into various other classes • Intelligent caching is just the beginning: other kinds of differentiation, such as security and reliability, can be explored using I/O tagging

THANK YOU • QUESTIONS?

References • Michael Mesnier, Jason Akers, Feng Chen, Tian Luo. Differentiated Storage Services. 23rd ACM Symposium on Operating Systems Principles (SOSP), October 2011. • Nitin Agrawal, Andrea C. Arpaci-Dusseau, Remzi H. Arpaci-Dusseau. Generating realistic impressions for file-system benchmarking. Proceedings of the 7th Conference on File and Storage Technologies, p. 125-138, February 24-27, 2009, San Francisco, California. • http://blog.superpat.com/2010/05/04/a-simple-block-driver-for-linux-kernel-2-6-31/
