Storage Systems, CSE 598D, Spring 2007
Lecture 1: Introduction and Overview
January 25, 2007
How this course will work
• Class meetings twice a week
  • Tue, Thu: 5:30-6:45 pm, 223B
• Lectures by me in most classes
• Some student presentations
  • To be determined as the course progresses
• Everyone should participate in discussions
  • Participation is part of your grade!
• Scribe notes to record lectures and discussions
• 2-3 assignments
  • May involve some simple system building: details to be decided
• Some written homeworks
• Online resources
  • Class URL via my Web page
  • Slides/scribe notes/assignments on Angel
  • Please make sure your Angel email works
How this course will work
• Background expected
  • Operating systems (411-level)
    • Basic knowledge of file systems, I/O subsystem, DMA, device drivers, …
  • Distributed systems
    • Consistency semantics, replication, caching, synchronization, …
  • Algorithms and data structures (undergraduate-level)
    • Analysis of algorithms, basic data structures
• Will cover background material whenever needed
• Your feedback is important in deciding what to cover
How this course will work
• No textbook
  • I will use some chapters from a set of books
  • If needed, photocopies of these will be made available to you
• Syllabus consists of material presented in class
  • Most of it based on research papers made available on the course page
    • Not up yet but will be soon
  • Additional reading material: for background or to delve deeper
• What you need to do
  • Read assigned papers BEFORE each class
  • During the class
    • Ask questions, express your opinions, argue!
• Goal: Learn about storage systems
• Also learn
  • How to read a research paper
  • How to write a good systems paper
  • What separates good (systems) research from bad
How this course will work
• Grading
  • Scribe notes: 10%
    • Detailed notes that one can go back to and find everything that was presented and discussed in the class
    • And that you can use for revision before the exam!
  • Participation in class: 10%
  • Mid-term exam: 20%
  • Presentation: 10%
  • Assignments (2-3): 20%
  • Survey or Project: 30%
How this course will work
• Survey
  • A comprehensive 10-15 page exploration/synthesis of an area related to storage systems, due at the end of the semester
• Project
  • Groups of up to 2 students
  • Identify a problem and motivate the need to solve it
  • Show convincingly where existing research falls short
  • Develop and evaluate your solution
  • Present it in a paper-style write-up at the end of the semester
Today
• Some background/history on storage systems
• Overview of course content
  • A superset of topics we will study
Why Applications Need Storage
• Memory is
  • Volatile: durability is needed
  • Not enough: high capacity is needed
  • Not easy to share/move: portability is needed
  • Expensive
• Non-volatile, cheap, long-lasting, reliable, abundant storage is needed for numerous applications
  • Personal/individual applications
  • Scientific applications
  • Enterprise applications
  • Internet-scale applications
  • Emerging sensor networks and highly distributed systems such as some P2P systems
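The durability point can be made concrete: data a program writes first sits in volatile memory (the runtime's buffer, then the OS page cache) and is durable only once pushed to the storage device. Below is a minimal Python sketch, assuming a POSIX-style file system; `durable_write` is a hypothetical helper, and the extra fsync of the parent directory that some file systems need to make the rename itself durable is omitted for brevity. Even `os.fsync` relies on the drive honoring cache-flush requests.

```python
import os
import tempfile


def durable_write(path: str, data: bytes) -> None:
    """Hypothetical helper: replace `path` with `data` so that, once this
    returns, the new contents should survive a crash or power loss."""
    directory = os.path.dirname(os.path.abspath(path))
    # Write to a temporary file in the same directory so the final rename
    # stays within one file system.
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(data)
            f.flush()             # move Python's user-space buffer into the OS
            os.fsync(f.fileno())  # ask the OS to push its cached pages to the device
        os.replace(tmp_path, path)  # atomic rename on POSIX: readers see old or new, never half
    except BaseException:
        os.unlink(tmp_path)  # clean up the temporary file on any failure
        raise
```

Writing to a temporary file and renaming means a crash mid-write leaves the old contents intact instead of a truncated file.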
Personal Applications
• Email, contacts, schedules, …
• Financial data, personal files, …
• Media files
• Gaming
Scientific Applications
• Manipulate large data sets: either explicitly (files) or implicitly (virtual memory)
• Examples pictured: Sanger Institute sequencing facility (adding 100 TB each year), CERN particle collider, NASA EOSDIS
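The difference between explicit (file) and implicit (virtual memory) access can be sketched in Python. Both functions below are hypothetical and compute the same simple byte-sum checksum over a file: the first issues read() calls into its own buffer, the second maps the file with the standard `mmap` module and lets the virtual memory system fault pages in from disk on demand. The checksum and function names are illustrative assumptions, not taken from any of the applications pictured.

```python
import mmap
import os


def checksum_explicit(path: str, chunk_size: int = 1 << 20) -> int:
    """Explicit access: the program issues read() calls into its own buffer."""
    total = 0
    with open(path, "rb") as f:
        while chunk := f.read(chunk_size):
            total += sum(chunk)  # iterating bytes yields ints
    return total


def checksum_mapped(path: str) -> int:
    """Implicit access: the file is mapped into the address space and the
    virtual memory system pages data in from disk as it is touched."""
    if os.path.getsize(path) == 0:
        return 0  # mmap cannot map an empty file
    with open(path, "rb") as f:
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as m:
            # A memoryview iterates as ints without copying the mapping;
            # it is released before the mmap is closed.
            with memoryview(m) as view:
                return sum(view)
```

In the mapped version the program never calls read(); the I/O happens as page faults, which is why VM-based access to very large data sets still ends up as disk traffic.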
Enterprise Applications
• File and email servers
• OLTP (online transaction processing)
• OLAP (online analytical processing)
• Other database applications
• SAP
• Financial workloads
• …
Internet-Scale Applications
• Data grids
IBM 305 RAMAC (1956): Random Access Method of Accounting and Control
• 5 MB capacity, 50 disks each 24” in diameter, 2000 bits/sq-inch areal density
• First computer with a magnetic hard disk
  • Replaced the “magnetic drum”
• Could store roughly 2000 pages of text!
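A quick back-of-envelope check of the RAMAC figures, assuming "5 MB" means 5,000,000 characters stored one per byte (an assumption for this sketch), shows what the "2000 pages" claim implies per page and how much each disk held.

```python
# Back-of-envelope arithmetic for the IBM 305 RAMAC figures above.
# Assumption: "5 MB" is read as 5,000,000 characters, one byte each.
CAPACITY_BYTES = 5_000_000
NUM_DISKS = 50
PAGES = 2000

bytes_per_disk = CAPACITY_BYTES // NUM_DISKS  # capacity of each 24" disk
bytes_per_page = CAPACITY_BYTES // PAGES      # characters per page of text

print(f"{bytes_per_disk:,} bytes per disk, {bytes_per_page:,} characters per page")
```

This gives 100,000 bytes per disk and 2,500 characters per page, a plausible amount for a page of typed text.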
Seagate Savvio 10K.1 (2004)
• 10K RPM, 73.4 GB
• Can read and write the complete works of Shakespeare 15 times each second!
Seagate Savvio 15K (2007)
• 15K RPM, 73.4 GB
• World’s fastest disk?
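Both Savvio drives hold 73.4 GB, so much of the speed difference comes from spindle speed. A standard first-order estimate is average rotational latency: on average the target sector is half a revolution away. This sketch computes it from the RPM figures above and deliberately ignores seek time and transfer time, which also contribute to access time.

```python
def rotation_time_ms(rpm: float) -> float:
    """Time for one full platter revolution, in milliseconds."""
    return 60_000.0 / rpm


def avg_rotational_latency_ms(rpm: float) -> float:
    """On average the wanted sector is half a revolution away from the head."""
    return rotation_time_ms(rpm) / 2


for name, rpm in [("Savvio 10K.1", 10_000), ("Savvio 15K", 15_000)]:
    print(f"{name}: {rotation_time_ms(rpm):.1f} ms/rev, "
          f"avg rotational latency {avg_rotational_latency_ms(rpm):.1f} ms")
```

Running it gives 6.0 ms per revolution (3.0 ms average latency) at 10K RPM and 4.0 ms (2.0 ms) at 15K RPM, a one-third reduction for random accesses.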
Storage Devices/Hardware
• Storage Area Networks
• RAID Arrays
• Tape Archives
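RAID arrays raise bandwidth largely by striping: consecutive blocks of a logical volume are spread round-robin across the disks, so one large request keeps several disks busy at once. A minimal sketch of the RAID-0 address mapping follows, assuming fixed-size blocks and a stripe unit of `stripe_unit` blocks; `stripe_map` is a hypothetical name, and real arrays typically add parity or mirroring on top of this layout.

```python
def stripe_map(logical_block: int, num_disks: int, stripe_unit: int = 1) -> tuple[int, int]:
    """Map a logical block number to (disk index, block offset on that disk)
    for RAID-0 style striping with `stripe_unit` blocks per stripe unit."""
    if logical_block < 0 or num_disks < 1 or stripe_unit < 1:
        raise ValueError("invalid arguments")
    # Which stripe unit the block falls in, and its position inside that unit.
    unit, within = divmod(logical_block, stripe_unit)
    disk = unit % num_disks  # stripe units rotate across the disks
    # Each full pass over all disks advances every disk by one stripe unit.
    offset = (unit // num_disks) * stripe_unit + within
    return disk, offset
```

With four disks and a one-block stripe unit, logical blocks 0 to 7 land on disks 0, 1, 2, 3, 0, 1, 2, 3 at offsets 0, 0, 0, 0, 1, 1, 1, 1, so a sequential read of eight blocks touches each disk only twice.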
Overview of Course
• What goes on inside a disk?
  • Hardware
  • Modeling the disk
• Performance optimizations
  • Disk scheduling
  • Rearranging data blocks
• How do you improve bandwidth to/from disks?
  • RAID arrays
  • Reduce data transferred from disks (Active Disks)
  • Storage Area Networks to allow concurrent transfers to/from several hosts
    • Shared storage model
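Disk scheduling reorders pending requests to cut seek distance. The sketch below implements the LOOK variant of the SCAN (elevator) algorithm, which sweeps the arm in one direction and reverses at the last pending request rather than at the edge of the disk. Requests are given as cylinder numbers; the function names are hypothetical, and real schedulers also weigh rotational position and fairness.

```python
def look_order(requests: list[int], head: int, moving_up: bool = True) -> list[int]:
    """Order in which LOOK services pending cylinder requests: sweep in the
    current direction, then reverse and serve the rest."""
    lower = sorted(c for c in requests if c < head)
    upper = sorted(c for c in requests if c >= head)
    if moving_up:
        return upper + lower[::-1]
    return lower[::-1] + upper


def total_seek_distance(order: list[int], head: int) -> int:
    """Total number of cylinders the arm travels to serve `order`."""
    distance = 0
    for cylinder in order:
        distance += abs(cylinder - head)
        head = cylinder
    return distance
```

On the request queue 98, 183, 37, 122, 14, 124, 65, 67 with the head at cylinder 53, serving requests in arrival order moves the arm 640 cylinders, while the LOOK order (moving up first) moves it 299.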
How can software take advantage of these enhancements?
• Review of the OS I/O subsystem
• How sys-admins manage storage
• File systems for NAS/SAN
• Caching and prefetching
• Theory of storage
  • Which problems are hard?
  • Important data structures
• With shared storage and a very complicated storage system, how do we manage this hierarchy?
  • Storage provisioning
  • QoS control/virtualization
  • Security
• Case studies of enterprise storage systems (e.g., EMC, Veritas)
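Caching and prefetching can be illustrated with a toy block cache: LRU eviction plus a naive sequential read-ahead rule (if block n immediately follows block n-1, also fetch n+1). This is a hypothetical sketch built on the standard `collections.OrderedDict`, not how any particular OS buffer cache works; real read-ahead uses adaptive windows and asynchronous I/O. It assumes a capacity of at least 2 blocks.

```python
from collections import OrderedDict
from typing import Callable, Optional


class PrefetchingLRUCache:
    """Hypothetical block cache with LRU eviction and one-block read-ahead."""

    def __init__(self, capacity: int, read_block: Callable[[int], bytes]):
        self.capacity = capacity
        self.read_block = read_block  # fetches one block from the "disk"
        self.cache: OrderedDict = OrderedDict()  # block number -> data, LRU first
        self.last_block: Optional[int] = None
        self.hits = 0
        self.misses = 0

    def _insert(self, block: int, data: bytes) -> None:
        self.cache[block] = data
        self.cache.move_to_end(block)  # newest entry is most recently used
        if len(self.cache) > self.capacity:
            self.cache.popitem(last=False)  # evict the least recently used block

    def get(self, block: int) -> bytes:
        if block in self.cache:
            self.hits += 1
            self.cache.move_to_end(block)
            data = self.cache[block]
        else:
            self.misses += 1
            data = self.read_block(block)
            self._insert(block, data)
        # Sequential pattern detected: prefetch the next block.
        if (self.last_block is not None and block == self.last_block + 1
                and block + 1 not in self.cache):
            self._insert(block + 1, self.read_block(block + 1))
        self.last_block = block
        return data
```

Reading blocks 0 through 7 in order with a capacity of 4 gives 2 misses and 6 hits, because from block 2 onward each block has already been prefetched.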
Requirements are becoming more stringent: we need to guarantee availability and store data for a long time (archival storage). How do we achieve this?
• Dependability/availability issues
  • Disaster management
  • Data lifetime
• Power and thermal management of storage systems
• Storage in highly distributed systems
  • Storage in P2P systems
  • Sensor storage
  • Grid-like infrastructure-based storage: e.g., OceanStore
• Storage in search and information retrieval
  • Google File System
• Are disks going to be the norm in the future?
  • Future of magnetic storage
  • MEMS
  • Flash storage
    • Windows Vista for laptops
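One basic dependability technique, used by parity-based RAID levels such as RAID-5, stores the XOR of a stripe's data blocks as a parity block; if any single disk fails, its block equals the XOR of the parity and the surviving blocks. A minimal sketch, assuming equal-size blocks and at most one failure per stripe; the function names are hypothetical.

```python
from functools import reduce


def xor_blocks(blocks: list[bytes]) -> bytes:
    """Bytewise XOR of equal-length blocks (this is how the parity is formed)."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def reconstruct(surviving: list[bytes], parity: bytes) -> bytes:
    """Recover the single lost data block of a stripe: XOR the parity with
    every surviving data block, since x ^ x = 0 cancels them out."""
    return xor_blocks(surviving + [parity])
```

For example, with data blocks `b"disk0-A"`, `b"disk1-B"`, `b"disk2-C"` and parity `xor_blocks` of all three, `reconstruct` applied to the first and third blocks and the parity returns `b"disk1-B"`.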
Some of the material will come from these books
• “Storage Networks Explained” (Wiley) by Troppens, Erkens, and Muller
• “The Holy Grail of Storage Management” by Toigo
• “Storage Area Network Essentials” (Wiley) by Barker and Massiglia
Next time
• Hard disk
• Certain aspects of the I/O subsystem
  • Spanning hardware and OS
I/O System View [figure]: CPU with L1 instruction/data caches (iL1, dL1) and L2 cache; memory bus (e.g., PC133) to main memory; software stack (application, file system, buffer manager, device driver); I/O bus (e.g., PCI) to the disk controller (e.g., SCSI); disk device with controller (ASIC), firmware, cache, DMA engine, platters, actuator, motors, and electronics.
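The software stack in the I/O system view (application, file system, buffer manager, device driver) can be read as a chain of layers, each translating a request for the one below. Below is a toy Python sketch, assuming a simulated disk held in a byte string and a hypothetical per-file block map. The class names mirror the figure's labels, but their interfaces are invented for illustration; a real driver would program the controller and DMA engine rather than slice a buffer. Reads past the end of a file are not handled.

```python
BLOCK_SIZE = 4  # bytes per block; tiny so the example is easy to follow


class DeviceDriver:
    """Bottom layer: issues block reads to the (simulated) disk."""

    def __init__(self, disk: bytes):
        self.disk = disk
        self.reads = 0  # number of device accesses, to show caching at work

    def read_block(self, n: int) -> bytes:
        self.reads += 1
        return self.disk[n * BLOCK_SIZE:(n + 1) * BLOCK_SIZE]


class BufferManager:
    """Middle layer: caches blocks so repeated reads skip the device."""

    def __init__(self, driver: DeviceDriver):
        self.driver = driver
        self.buffers: dict[int, bytes] = {}

    def get_block(self, n: int) -> bytes:
        if n not in self.buffers:
            self.buffers[n] = self.driver.read_block(n)
        return self.buffers[n]


class FileSystem:
    """Top layer: turns (file, offset, length) into block requests."""

    def __init__(self, buffers: BufferManager, files: dict[str, list[int]]):
        self.buffers = buffers
        self.files = files  # file name -> ordered list of its disk blocks

    def read(self, name: str, offset: int, length: int) -> bytes:
        out = bytearray()
        blocks = self.files[name]
        pos, end = offset, offset + length
        while pos < end:
            index, within = divmod(pos, BLOCK_SIZE)  # file block and offset in it
            data = self.buffers.get_block(blocks[index])
            take = min(BLOCK_SIZE - within, end - pos)
            out += data[within:within + take]
            pos += take
        return bytes(out)
```

With the disk `b"AAAABBBBCCCCDDDD"` and a file stored in disk blocks 2 then 0, reading 4 bytes at offset 2 returns `b"CCAA"` after two device reads; repeating the read is served entirely by the buffer manager.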