SaddleHill: A SCSI I/O Generator
By Owen Parry
Project Motivation
• Create an application that exercises new controller firmware and hardware.
• Provide the ability to rapidly add features.
• Provide more tolerance for hardware/firmware failures.
• Develop a mechanism that allows multiple hosts to randomly access and efficiently share SATA affiliations.
• Current methods avoid the Serial ATA limitation by using a single initiator in a SAS domain, which limits communication with the disk drive to one initiator at a time.
• Need a simple, decentralized strategy for an embedded environment.
• Achieve long-term max-min fairness.
• Avoid violating I/O time limits.
Background
• SCSI: targeted at the enterprise storage market; used primarily to attach hard disk drives.
  • High performance
    • RPM: 10K, 15K
    • Seek time: 3.2 – 7.4 ms
  • Greater reliability
    • MTBF: 1.2 M hours
  • Capacity: 18 – 300 GB
  • Expensive: $160 – $1400
  • Multiple host support
• ATA: targeted at the desktop market.
  • Medium performance
    • RPM: 5400, 7200
    • Seek time: 8.9 – 9.5 ms
  • Mediocre reliability
    • MTBF: 500 K hours
  • Capacity: 40 GB – 1 TB
  • Cheap: $75 – $300
  • Single host
Background
• Serial Attached SCSI (SAS) is the new transport protocol replacing parallel SCSI.
• SAS advantages:
  • Faster data rates
    • SAS-1: 300 MB/s
    • SAS-2: 600 MB/s
  • Larger drive counts
    • Typical domain size: 128
    • 16K addresses using fan-out expanders
  • Increased data integrity
  • Configuration flexibility
  • Supports Serial ATA drives
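The SAS-1 and SAS-2 data rates above follow from the 3 Gb/s and 6 Gb/s physical line rates combined with 8b/10b encoding, which sends each 8-bit data byte as a 10-bit symbol. A quick sketch of the arithmetic, per link and per direction, ignoring protocol overhead:

```python
# Physical line rates (gigabits per second) for the SAS generations listed above.
LINE_RATE_GBPS = {"SAS-1": 3.0, "SAS-2": 6.0}

# With 8b/10b encoding, every data byte occupies 10 bits on the wire.
LINE_BITS_PER_DATA_BYTE = 10


def payload_mb_per_s(line_rate_gbps: float) -> float:
    """Maximum data rate in MB/s (10**6 bytes/s) for one link in one direction."""
    line_bits_per_s = line_rate_gbps * 1e9
    return line_bits_per_s / LINE_BITS_PER_DATA_BYTE / 1e6


for generation, rate in LINE_RATE_GBPS.items():
    print(f"{generation}: {rate:.0f} Gb/s line rate -> {payload_mb_per_s(rate):.0f} MB/s")
```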
Background
• Problem with SATA in a SAS topology:
  • The SATA architecture only allows a single host.
  • SAS uses a mutual-exclusion mechanism called an “affiliation”: the first initiator to open a connection may own the affiliation indefinitely.
  • Vendors want to issue commands simultaneously to SATA disks from multiple initiators.
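The ownership rule described above behaves like a lock that the first caller takes and that no other caller can break. A toy model, for illustration only (the class and method names are hypothetical and do not correspond to any SAS API), makes the consequence for a second host concrete:

```python
from typing import Optional


class StpAffiliation:
    """Toy model of the affiliation held by one SATA drive (hypothetical names).

    The first initiator to open a connection becomes the owner and keeps
    ownership until it releases it; connections from any other initiator
    are refused in the meantime.
    """

    def __init__(self) -> None:
        self.owner: Optional[str] = None

    def open_connection(self, initiator: str) -> bool:
        if self.owner is None:
            self.owner = initiator  # first opener takes the affiliation
        return self.owner == initiator

    def release(self, initiator: str) -> None:
        if self.owner == initiator:  # only the owner can give it up
            self.owner = None


drive = StpAffiliation()
print(drive.open_connection("host-A"))  # True: host-A now owns the drive
print(drive.open_connection("host-B"))  # False: refused while host-A holds it
drive.release("host-A")
print(drive.open_connection("host-B"))  # True
```

If the owner never releases, the second host is locked out for good, which is why a sharing scheme has to decide when each initiator gives the affiliation up.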
Related Work
• No other work in the storage area could be located.
• Closely related research: wireless LANs
  • Bandwidth-sharing schemes:
    • Maxmin Fair Scheduling in Wireless Networks, Leandros Tassiulas and Saswati Sarkar.
  • Channel time-sharing schemes:
    • Proportional Fairness in Wireless LANs and Ad Hoc Networks, Li Bin Jiang and Soung Chang Liew.
    • Time-based Fairness Improves Performance in Multi-rate WLANs, Godfrey Tan and John Guttag.
SaddleHill Design / Implementation
• Built using Trolltech’s Qt 4.2.3.
• Compiled for 64-bit x86 (x86_64) systems.
• Composed of four logical blocks: MainWindow, Management Unit, IO Engine, and Hardware Abstraction Layer.
SaddleHill Design / Implementation
• MainWindow
  • Lists PCI SAS initiator devices.
  • Lists SAS target devices.
  • Displays live test statistics.
  • Displays application messages.
  • Accepts user input.
SaddleHill Design / Implementation
• Management Unit
  • Manages SaddleHill’s physical I/O data buffers and initiator operational buffers.
  • Performs address conversion: virtual to physical and physical to virtual.
  • Maintains a list of SAS initiators and targets.
  • Maintains the application message log.
  • Maintains the model objects (system device, message, statistics) that the GUI uses to gather and display information to the user.
  • Distributes device configurations.
  • Starts/stops I/O tests.
  • Calculates I/O and throughput rates.
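Address conversion could be sketched as below, assuming the driver hands back its physical memory as a few physically contiguous chunks, each mapped into user space, and reports each chunk's physical and virtual base. The chunk layout, the addresses, and the names `Chunk` and `AddressMap` are all hypothetical; a sorted table plus binary search keeps each lookup O(log n) in the number of chunks.

```python
from bisect import bisect_right
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Chunk:
    virt_base: int  # user-space address where the chunk is mapped
    phys_base: int  # physical (bus) address of the chunk
    length: int     # chunk size in bytes


class AddressMap:
    """Virtual <-> physical translation over physically contiguous chunks (hypothetical)."""

    def __init__(self, chunks: List[Chunk]) -> None:
        self._by_virt = sorted(chunks, key=lambda c: c.virt_base)
        self._by_phys = sorted(chunks, key=lambda c: c.phys_base)
        self._virt_keys = [c.virt_base for c in self._by_virt]
        self._phys_keys = [c.phys_base for c in self._by_phys]

    @staticmethod
    def _find(keys: List[int], chunks: List[Chunk], addr: int) -> Tuple[Chunk, int]:
        # Binary search for the last chunk whose base is <= addr, then bounds-check.
        i = bisect_right(keys, addr) - 1
        if i < 0 or addr - keys[i] >= chunks[i].length:
            raise ValueError(f"address {addr:#x} is outside the managed buffers")
        return chunks[i], addr - keys[i]

    def virt_to_phys(self, va: int) -> int:
        chunk, offset = self._find(self._virt_keys, self._by_virt, va)
        return chunk.phys_base + offset

    def phys_to_virt(self, pa: int) -> int:
        chunk, offset = self._find(self._phys_keys, self._by_phys, pa)
        return chunk.virt_base + offset


# Two hypothetical 8 MB chunks (16 MB total) mapped back to back in user space.
MB = 1 << 20
amap = AddressMap([
    Chunk(virt_base=0x7F00_0000_0000, phys_base=0x1_0000_0000, length=8 * MB),
    Chunk(virt_base=0x7F00_0080_0000, phys_base=0x2_4000_0000, length=8 * MB),
])
print(hex(amap.virt_to_phys(0x7F00_0080_0010)))  # 0x240000010
```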
SaddleHill Design / Implementation
• IO Engine
  • Initializes SAS targets.
  • Maintains disk SAS addresses and target IDs.
  • Generates, issues, and completes SCSI commands, e.g. Read10, Write10, Write And Verify10, Inquiry, Read Capacity.
  • Uses three threads, one each to generate, issue, and complete commands.
  • Maintains statistics:
    • Number of I/Os issued
    • Number of I/Os completed
    • Error count
    • Amount of data transferred
    • I/O response times
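The commands the IO Engine generates are standard SCSI command descriptor blocks (CDBs). As a sketch following the SBC/SPC layouts (big-endian multi-byte fields), the 10-byte read/write CDBs and the INQUIRY and READ CAPACITY(10) CDBs can be built as follows. The helper names are hypothetical, flag bytes are left at zero, and wrapping the CDB into an MPI request is not shown.

```python
import struct

# Operation codes defined in SBC (block commands) and SPC (primary commands).
INQUIRY = 0x12
READ_CAPACITY_10 = 0x25
READ_10 = 0x28
WRITE_10 = 0x2A
WRITE_AND_VERIFY_10 = 0x2E


def rw10_cdb(opcode: int, lba: int, blocks: int) -> bytes:
    """10-byte CDB for READ(10), WRITE(10) or WRITE AND VERIFY(10).

    Layout: opcode, flags, 4-byte LBA, group number, 2-byte transfer length
    (in logical blocks), control. Multi-byte fields are big-endian; flags,
    group number and control are left at zero.
    """
    if not 0 <= lba <= 0xFFFF_FFFF:
        raise ValueError("LBA does not fit in a 10-byte CDB")
    if not 0 <= blocks <= 0xFFFF:
        raise ValueError("transfer length does not fit in 16 bits")
    return struct.pack(">BBIBHB", opcode, 0, lba, 0, blocks, 0)


def inquiry_cdb(allocation_length: int = 96) -> bytes:
    """6-byte standard INQUIRY CDB (EVPD = 0, page code 0, 2-byte allocation length)."""
    return struct.pack(">BBBHB", INQUIRY, 0, 0, allocation_length, 0)


def read_capacity10_cdb() -> bytes:
    """10-byte READ CAPACITY(10) CDB with all fields after the opcode zeroed."""
    return bytes([READ_CAPACITY_10]) + bytes(9)


# A single-block read at a fixed LBA, as in a same-LBA stress test.
print(rw10_cdb(READ_10, lba=0x1234, blocks=1).hex())  # 28000000123400000100
```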
SaddleHill Design / Implementation
• Hardware Abstraction Layer
  • SaddleHillDriver
    • Registers with the Linux kernel as a character device.
    • Registers with the PCI core.
    • Allocates blocks of physical memory (currently 16 MB).
    • Reserves the physical memory to prevent swapping.
    • Provides facilities to map the PCI SAS I/O control registers to user space.
    • Provides facilities to map the physical memory to user space.
    • Provides PCI device configuration information to the user-space application.
  • HAL (user level)
    • Implements the MPI specification.
    • Initializes the SAS adapter.
    • Converts requests from the IO Engine to the MPI-specific format.
    • Sends requests to, and receives replies from, the initiator via the PCI control registers.
    • Processes MPI replies and completes requests to the IO Engine.
    • Manages STP affiliations.
    • Maintains test statistics:
      • Number of I/Os issued
      • Number of I/Os completed
      • Error count
      • Amount of data transferred
      • I/O response times
      • Affiliation ownership times
      • Affiliation synchronization count
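The counters listed above are enough to derive I/O and throughput rates over a measurement interval. A minimal sketch, with hypothetical field names, the assumption that an errored I/O still counts as completed but transfers no data, and 512-byte blocks in the example:

```python
from dataclasses import dataclass, field
from statistics import mean
from typing import Dict, List


@dataclass
class IoStats:
    """Counters mirroring the statistics listed above (field names hypothetical)."""
    issued: int = 0
    completed: int = 0
    errors: int = 0
    bytes_transferred: int = 0
    response_times_s: List[float] = field(default_factory=list)
    ownership_times_s: List[float] = field(default_factory=list)
    sync_count: int = 0

    def record_issue(self) -> None:
        self.issued += 1

    def record_completion(self, nbytes: int, response_s: float, ok: bool = True) -> None:
        # Assumption: a failed I/O still completes, but moves no data.
        self.completed += 1
        self.response_times_s.append(response_s)
        if ok:
            self.bytes_transferred += nbytes
        else:
            self.errors += 1

    def rates(self, elapsed_s: float) -> Dict[str, float]:
        """I/O rate, throughput, and mean response time over an interval."""
        avg = mean(self.response_times_s) if self.response_times_s else 0.0
        return {
            "iops": self.completed / elapsed_s,
            "mb_per_s": self.bytes_transferred / elapsed_s / 1e6,
            "avg_response_ms": avg * 1e3,
        }


# 2000 single-block (512-byte) I/Os completing in one second.
stats = IoStats()
for _ in range(2000):
    stats.record_issue()
    stats.record_completion(nbytes=512, response_s=0.0004)
print(stats.rates(elapsed_s=1.0))
```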
Affiliation Synchronization
• Uses the idea put forward in “Proportional Fairness in Wireless LANs and Ad Hoc Networks”:
  • Fix the maximum transmission time.
  • Contend fairly among the initiators for the mutex.
• Implementation
  • Affiliation acquisition
    • Acquisition is started by the reception of a new I/O.
    • Calculate the back-off: use a uniform-distribution random number generator to choose a back-off time within the contention window.
    • Generate a SCSI Inquiry command.
    • Sleep for the length of the back-off.
    • Issue the Inquiry.
    • A failed synchronization attempt doubles the contention window size.
    • Start a timer on successful acquisition.
  • Affiliation release
    • The resource is released if no I/Os are waiting to be sent.
    • The resource is released after the ownership timer expires.
    • There are no preemptions.
  • I/Os are placed into a waiting state during the release and acquisition process.
  • I/Os outstanding at the time of release are allowed to complete.
  • The truncated binary exponential back-off (TBEB) strategy is used to calculate the back-off times.
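The acquisition and release rules above can be sketched as a small state machine. This is a minimal illustration, not the SaddleHill implementation: `try_inquiry` is a hypothetical stand-in for issuing the SCSI Inquiry and reporting whether the connection was accepted; the slot length, window bounds (2 to 1024 slots), and 50 ms ownership limit are assumptions; and resetting the window after a successful acquisition is assumed rather than stated above. The sleep, clock, and random source are injectable so the logic can be exercised without hardware.

```python
import random
import time
from typing import Callable, Optional


class AffiliationSynchronizer:
    """Sketch of TBEB-based STP affiliation acquisition and release (hypothetical API)."""

    def __init__(
        self,
        try_inquiry: Callable[[], bool],  # True if the Inquiry's connection was accepted
        slot_s: float = 1e-4,             # assumed length of one back-off slot
        cw_min: int = 2,                  # assumed initial contention window (slots)
        cw_max: int = 1024,               # assumed truncation limit (slots)
        hold_s: float = 0.05,             # assumed maximum ownership time
        rng: Optional[random.Random] = None,
        sleep: Callable[[float], None] = time.sleep,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.try_inquiry = try_inquiry
        self.slot_s, self.cw_min, self.cw_max, self.hold_s = slot_s, cw_min, cw_max, hold_s
        self.rng = rng if rng is not None else random.Random()
        self.sleep, self.clock = sleep, clock
        self.cw = cw_min
        self.owned_since: Optional[float] = None
        self.sync_attempts = 0

    def acquire(self) -> int:
        """Back off and retry until the affiliation is won; return the attempts used."""
        attempts = 0
        while True:
            backoff_slots = self.rng.randrange(self.cw)  # uniform over [0, cw)
            self.sleep(backoff_slots * self.slot_s)
            attempts += 1
            if self.try_inquiry():
                self.sync_attempts += attempts
                self.cw = self.cw_min            # assumed reset after success
                self.owned_since = self.clock()  # start the ownership timer
                return attempts
            self.cw = min(2 * self.cw, self.cw_max)  # double, truncated at cw_max

    def should_release(self, waiting_ios: int) -> bool:
        """Release when no I/Os are waiting or the ownership timer has expired."""
        if self.owned_since is None:
            return False
        return waiting_ios == 0 or self.clock() - self.owned_since >= self.hold_s

    def release(self) -> None:
        # Bookkeeping only: clearing the affiliation on the wire, letting outstanding
        # I/Os finish, and holding new I/Os in a waiting state are not modeled here.
        self.owned_since = None


# Usage: the first two Inquiries are rejected, the third is accepted.
outcomes = iter([False, False, True])
sync = AffiliationSynchronizer(
    lambda: next(outcomes), rng=random.Random(1), sleep=lambda s: None, clock=lambda: 0.0
)
print(sync.acquire())  # 3
```

Because there is no preemption, fairness comes only from the bounded ownership time and the randomized back-off: a holder with queued work still gives up the affiliation once `hold_s` elapses.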
Finding the Back-Off Strategy
• Back-off strategies considered:
  • No back-off
  • Fixed window
  • Binary exponential back-off (BEB)
  • Truncated binary exponential back-off (TBEB)
  • Logarithmic
• Test strategy:
  • Read10 and Write10 commands
  • Single-block transfers
  • Same LBA
  • Drive caching enabled
  • NCQ enabled
  • Drive queue depth = 8
  • 3 Gb/s SATA disk
  • Multiple initiators
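The candidate strategies differ mainly in how the contention window grows after consecutive failed synchronization attempts. A sketch of the window rules, assuming an initial window of 2 slots and, for TBEB, a cap of 10 doublings (the limit classic Ethernet uses); the logarithmic rule shown is one plausible form and not necessarily the one that was tested:

```python
def contention_window(strategy: str, failures: int, cw0: int = 2, max_doublings: int = 10) -> int:
    """Contention window (in slots) after `failures` consecutive failed attempts.

    The back-off itself is then drawn uniformly from [0, window).
    """
    if strategy == "none":
        return 1  # back-off is always 0 slots: retry immediately
    if strategy == "fixed":
        return cw0
    if strategy == "beb":
        return cw0 * 2 ** failures  # unbounded doubling
    if strategy == "tbeb":
        return cw0 * 2 ** min(failures, max_doublings)  # doubling stops at the cap
    if strategy == "log":
        # Assumed rule: cw0 * (1 + floor(log2(failures + 1))), via int.bit_length().
        return cw0 * (failures + 1).bit_length()
    raise ValueError(f"unknown strategy: {strategy!r}")


for name in ("none", "fixed", "beb", "tbeb", "log"):
    print(name, [contention_window(name, k) for k in (0, 3, 6, 9, 12)])
```

With these assumed parameters, BEB and TBEB coincide until the cap: after 12 failures the BEB window is 8192 slots while TBEB stops at 2048, which illustrates how unbounded doubling can push waits past an I/O time limit.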
Finding the Back-Off Strategy
[Figure: STP Ownership Times]
[Figure: Synchronization Requests]
[Figure: Average I/O Response]
Finding the Back-Off Strategy
• No back-off
  • Too many synchronization attempts.
  • Depending on the topology configuration, favors some initiators.
• Fixed window
  • There is no way to choose the appropriate window size.
• BEB
  • Violates the I/O time limits in long test runs.
• Logarithmic
  • Achieves near-perfect max-min fairness in resource ownership in both the short and long term.
  • Generates a large number of synchronization requests, which is unacceptable in large topologies.
• The truncated binary exponential back-off (TBEB) strategy was chosen for the implementation of the synchronization algorithm:
  • Closely achieves long-term max-min fairness.
  • Low number of synchronization attempts.
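Max-min fairness, the property used to compare the strategies, has a simple constructive definition: no initiator's share can be increased without decreasing the share of another initiator whose share is equal or smaller. For reference, progressive filling computes that allocation for a shared resource (here, drive ownership time) given each initiator's demand; the demands and the 100-unit capacity in the example are made up for illustration:

```python
from typing import List


def max_min_fair_shares(capacity: float, demands: List[float]) -> List[float]:
    """Max-min fair split of `capacity` among consumers with the given demands.

    Progressive filling: serve demands from smallest to largest, giving each
    the smaller of its demand and an equal split of what is still left.
    """
    shares = [0.0] * len(demands)
    remaining = float(capacity)
    order = sorted(range(len(demands)), key=lambda i: demands[i])
    for served, i in enumerate(order):
        equal_split = remaining / (len(order) - served)
        shares[i] = min(float(demands[i]), equal_split)
        remaining -= shares[i]
    return shares


# Four initiators competing for 100 units of drive ownership time.
print(max_min_fair_shares(100, [50, 10, 50, 50]))  # [30.0, 10.0, 30.0, 30.0]
```

When every initiator always has I/O waiting, the max-min fair allocation reduces to equal ownership time for each initiator.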
Performance
• A transaction-processing profile was used:
  • Small block transfers (1-16 blocks)
  • Concerned with I/O rates rather than throughput
• Single initiator
  • Same I/O size: ~2250 IOPS
  • Random I/O sizes: ~902 IOPS
• Dual initiators
  • Same I/O size: ~2075 IOPS (8% performance decrease)
  • Random I/O sizes: ~786 IOPS (12% performance decrease)
• Quad initiators
  • Same I/O size: ~1975 IOPS (14% performance decrease)
  • Random I/O sizes: ~745 IOPS (17% performance decrease)
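The percentage decreases appear to be relative to the single-initiator rate for the same workload. For example, for dual initiators with same-size I/Os:

```python
def percent_decrease(baseline_iops: float, iops: float) -> float:
    """Relative drop in I/O rate, in percent, versus a baseline rate."""
    return 100.0 * (baseline_iops - iops) / baseline_iops


# Single initiator vs. dual initiators, same I/O size.
print(f"{percent_decrease(2250, 2075):.1f}%")  # 7.8%
```

This rounds to the 8% reported for that case.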
Future Directions
• Because of the challenges of SATA in enterprise storage environments, vendors are employing varying strategies to deal with the SATA problem. These include:
  • Completely removing SATA from topologies.
  • Building special hardware that increases the affiliation resources.
• The STP resource-sharing algorithm will be moved to the SAS initiator port.
  • Requires a change in the mechanism that acquires and releases affiliations.
  • Utilize the SAS CLOSE(CLEAR AFFILIATION) primitive when tearing down connections.
  • Simply convert and issue host I/O.
• SaddleHill
  • Short term
    • Support SAS-2 initiators.
    • Support additional SBC and SPC commands.
    • Support the SSC and MMC SCSI command sets.
    • Firmware upgrade support.
    • Initiator configuration modification.
  • Long term
    • Build into an automated firmware unit test system.
Conclusion
• All project goals achieved:
  • User-level SCSI I/O generator.
  • Synchronization algorithm that meets the simplicity, fairness, and decentralization objectives.