
Design and Performance Evaluation of Networked Storage Architectures


Presentation Transcript


  1. Design and Performance Evaluation of Networked Storage Architectures Xubin He (Hexb@ele.uri.edu) July 25, 2002 Dept. of Electrical and Computer Engineering University of Rhode Island

  2. Outline • Introduction • STICS: SCSI-To-IP Cache for Storage Area Networks • DRALIC: Distributed RAID & Location Independence Cache • vcRAID: Large Virtual NVRAM Cache for Software RAID • Performance Eval. on Distributed Web Server Architectures • Conclusions High Performance Computing Lab(HPCL),URI

  3. Background • Data storage plays an essential role in today’s fast-growing data-intensive network services. • Online data storage doubles every 9 months. • Storage already accounts for more than 50% of IT spending, and storage cost is expected to reach 75% of total IT cost by 2003. High Performance Computing Lab(HPCL),URI

  4. A Server-to-Storage Bottleneck Source: Brocade

  5. Motivations • How to deploy data over the network efficiently and reliably? • Disparities between SCSI & IP; SCSI remote handshaking over IP (motivating STICS) • Growing processor-disk gap; high-speed networks; large client memories (motivating DRALIC) • Cheap disks & RAM but expensive NVRAM; RAID5 is reliable but has low performance (motivating vcRAID) • E-commerce over the Internet and distributed web servers (motivating the performance evaluation) High Performance Computing Lab(HPCL),URI

  6. Introduction STICS: SCSI-To-IP Cache for Storage Area Networks DRALIC: Distributed RAID & Location Independence Cache vcRAID: Large Virtual NVRAM Cache for Software RAID Performance Eval. on Distributed Web Server Architectures Conclusions High Performance Computing Lab(HPCL),URI

  7. Introducing a New Device: STICS • Wherever there is a disparity, a cache helps • Features of STICS: • Smooth out disparities between SCSI and IP • Localize the SCSI protocol and filter out unnecessary traffic, reducing the bandwidth requirement • Nonvolatile data caching • Improve performance, reliability, manageability, and scalability over current iSCSI systems. High Performance Computing Lab(HPCL),URI

  8. System Overview (figure) System overview: each host or storage device (disks, SAN, or NAS) attaches to a STICS via a SCSI interface, and the STICS units connect to one another and to NAS over TCP/IP across the Internet.

  9. STICS Architecture (figure) A STICS comprises a SCSI interface, a network interface, a processor, RAM, a log disk, and the attached storage device.

  10. Internal Cache Structure (figure) Memory holds the metadata and the cached data; the log disk serves as the disk cache. High Performance Computing Lab(HPCL),URI

  11. Basic Operations • Write • Write requests from the host via SCSI • Write requests from another STICS via NIC • Read • Read requests from the host via SCSI • Read requests from another STICS via NIC • Destage • RAM —> log disk • Log disk —> storage device • Prefetch • Storage device —> RAM High Performance Computing Lab(HPCL),URI
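
The flow of these operations can be made concrete with a small model. The sketch below is a hypothetical Python illustration of the two-level cache (RAM plus log disk) and its write, read, destage, and prefetch paths; it is not the actual Linux kernel module, and all names, capacities, and data structures are assumed for illustration.

    # Hypothetical sketch of the STICS two-level cache: writes land in RAM and
    # are logged sequentially to a log disk; data is later destaged to the
    # final storage device, and read misses prefetch from storage into RAM.
    class SticsCache:
        def __init__(self, ram_capacity=1024):
            self.ram = {}            # block address -> data (RAM cache)
            self.log_disk = []       # sequential log of (addr, data) records
            self.storage = {}        # stands in for the final storage device
            self.ram_capacity = ram_capacity

        def write(self, addr, data):
            """Write from the host via SCSI or from another STICS via the NIC."""
            self.ram[addr] = data
            self.log_disk.append((addr, data))        # persist to the log disk
            if len(self.ram) > self.ram_capacity:
                self.destage_ram_to_log()

        def read(self, addr):
            """Read path: RAM first, then the log disk, then the storage device."""
            if addr in self.ram:
                return self.ram[addr]
            for a, d in reversed(self.log_disk):      # newest log record wins
                if a == addr:
                    return d
            return self.prefetch(addr)

        def destage_ram_to_log(self):
            """RAM -> log disk: evict the oldest RAM entries (already logged)."""
            while len(self.ram) > self.ram_capacity // 2:
                self.ram.pop(next(iter(self.ram)))

        def destage_log_to_storage(self):
            """Log disk -> storage device, typically when the system is idle."""
            for addr, data in self.log_disk:
                self.storage[addr] = data
            self.log_disk.clear()

        def prefetch(self, addr):
            """Storage device -> RAM on a read miss."""
            data = self.storage.get(addr)
            if data is not None:
                self.ram[addr] = data
            return data

A real STICS additionally persists the log to an actual disk and handles concurrency, but the hierarchy and the four operations follow the same pattern.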

  12. Web-based Network Management (figure) A web browser-based manager communicates over HTTP with a servlet-based management application, which in turn talks over TCP/IP to the local management application on each STICS. High Performance Computing Lab(HPCL),URI

  13. Implementation Platform • A STICS block is a PC running Linux • OS: Linux with kernel 2.4.2 • Compiler: gcc • Interfaces: SCSI (host side) and IP (network side) High Performance Computing Lab(HPCL),URI

  14. Performance Evaluations • Methodology • iSCSI implementation on Linux by Intel (iSCSI) • Initial STICS implementation on Linux • Two modes: • Immediate report (STICS-Imm) • Report after complete (STICS) • Workloads • PostMark from Network Appliance: throughput • Two configurations • Small: 1000/50k/436MB • Large: 20k/100k/740MB • EMC trace: response time • More than 230,000 I/O requests • Data set size: >900MB High Performance Computing Lab(HPCL),URI
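
For reference, the small PostMark configuration above (1000 initial files, 50,000 transactions) could be driven by a script along the following lines. The postmark binary location, its command names, and the mount point are assumptions and may differ across PostMark versions; this is only an illustrative sketch, not the exact settings used in the evaluation.

    # Hypothetical driver for the "small" PostMark configuration
    # (1000 initial files, 50,000 transactions). The postmark binary,
    # its command names, and the mount point are assumed.
    import subprocess

    commands = "\n".join([
        "set location /mnt/stics",   # assumed mount point exported through STICS
        "set number 1000",           # initial file pool
        "set transactions 50000",    # transactions to execute
        "run",
        "quit",
    ]) + "\n"

    result = subprocess.run(["postmark"], input=commands,
                            capture_output=True, text=True)
    print(result.stdout)             # PostMark reports transaction and data rates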

  15. Experimental Settings (figures) iSCSI configuration: the host (Trout) establishes a connection to the target (Squid); the target responds, connects, and exports its hard drives, so Trout sees the disks as local. STICS configuration: STICS units are inserted between the machines and the switch and cache data from both the SCSI side and the network side.

  16. PostMark Results: Throughput

  17. Where does the benefit come from? (figure) Network traffic analysis: number of packets at different packet sizes (bytes).

  18. EMC Trace Results: Response Time (figure) Histograms of I/O response times for trace EMC-tel: a) STICS with immediate report (2.7 ms); b) STICS with report after complete (5.71 ms); c) iSCSI (16.73 ms). High Performance Computing Lab(HPCL),URI

  19. Summary • A novel cache storage device that adds a new dimension to networked storage • Significantly improves the performance of iSCSI • A cost-effective solution for building efficient SANs over IP • Allows easy manageability, maintainability, and scalability High Performance Computing Lab(HPCL),URI

  20. Introduction STICS: SCSI-To-IP Cache for Storage Area Networks DRALIC: Distributed RAID and Location Independence Cache vcRAID: Large Virtual NVRAM Cache for Software RAID Performance Eval. on Distributed Web Server Architectures Conclusions High Performance Computing Lab(HPCL),URI

  21. Web Servers • Overhead caused by the file system is high • Enterprise web servers are expensive • A Fujitsu server: more than $5 million • PCs are cheap: $1000 • Disks: $160/120GB (IBM Deskstar@CompUSA) • DRAM: $100/256MB (@Crucial.com) High Performance Computing Lab(HPCL),URI

  22. My Solution • Combine or bridge the disk controller and network controller of existing PCs interconnected by a high-speed switch. • Share memory and storage among peers High Performance Computing Lab(HPCL),URI
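
As a rough illustration of this idea (a sketch only, not the DRALIC implementation), a block request can be served from a three-level hierarchy: local memory, a peer's memory reached across the switch, and finally the distributed RAID. Node names, latencies, and the in-process stand-in for peer communication below are all assumptions.

    # Hypothetical sketch of the DRALIC lookup hierarchy:
    # local memory -> remote (peer) memory -> distributed RAID.
    LOCAL_MEM_US, REMOTE_MEM_US, RAID_US = 1, 100, 8000   # assumed costs (microseconds)

    class DralicNode:
        def __init__(self, name):
            self.name = name
            self.memory = {}          # local memory cache: block id -> data
            self.peers = []           # other nodes reachable via the switch

        def read_block(self, block_id):
            """Return (data, assumed cost) for a block request."""
            if block_id in self.memory:                    # 1. local memory hit
                return self.memory[block_id], LOCAL_MEM_US
            for peer in self.peers:                        # 2. remote memory hit
                if block_id in peer.memory:                #    (an RPC in practice)
                    data = peer.memory[block_id]
                    self.memory[block_id] = data           # cache locally
                    return data, REMOTE_MEM_US
            data = f"block-{block_id}"                     # 3. distributed RAID read
            self.memory[block_id] = data
            return data, RAID_US

    # Example: a block cached only on a remote peer is served at remote-memory cost.
    a, b = DralicNode("node-a"), DralicNode("node-b")
    a.peers, b.peers = [b], [a]
    b.memory[42] = "block-42"
    print(a.read_block(42))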

  23. Performance analysis • B: data block size (8KB) • N: number of nodes • Hlm: local memory hit ratio • Hrm: remote memory hit ratio • Tlm: local memory access time • Trm: remote memory access time • Traid: access time from the distributed RAID • Tdralic: average response time of the DRALIC system High Performance Computing Lab(HPCL),URI
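
The symbols are listed on the slide but the resulting expression does not survive in this transcript. One plausible form of the model, consistent with the definitions above (a reconstruction, not taken verbatim from the slide), is the hit-ratio-weighted average response time:

    T_{dralic} = H_{lm} T_{lm} + H_{rm} T_{rm} + (1 - H_{lm} - H_{rm}) T_{raid}

That is, requests that miss both the local and remote memory caches fall through to the distributed RAID; B and N would enter through Trm and Traid (transfer of a B-byte block and striping across N nodes).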

  24. Preliminary Performance Analysis

  25. Simulation Results • DRALICSim: a simulator based on socket communication • Benchmark: • PostMark, provided by Network Appliance Inc., measures performance in terms of transaction rates • Configurations: 1000 initial files and 50,000 transactions (small), 20,000/50,000 (medium), and 20,000/100,000 (large) • 4 nodes running Windows NT High Performance Computing Lab(HPCL),URI
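
Since DRALICSim is built on socket communication, the peer interaction can be sketched as a minimal request/reply exchange. The code below is a hypothetical toy, with the address, port, and the one-request server all assumed for illustration; it is not part of the actual simulator.

    # Minimal sketch of socket-based peer communication in the spirit of
    # DRALICSim: one peer serves a block from its memory, the other
    # requests it by id.
    import socket
    import threading

    HOST, PORT = "127.0.0.1", 5001            # assumed local test endpoint
    peer_memory = {42: b"block-42-data"}      # blocks held by the serving peer
    ready = threading.Event()

    def serve_one_request():
        with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as srv:
            srv.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
            srv.bind((HOST, PORT))
            srv.listen(1)
            ready.set()                       # now listening; client may connect
            conn, _ = srv.accept()
            with conn:
                block_id = int(conn.recv(64).decode())        # request: block id
                conn.sendall(peer_memory.get(block_id, b""))  # reply: block data

    server = threading.Thread(target=serve_one_request)
    server.start()
    ready.wait()

    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as cli:
        cli.connect((HOST, PORT))
        cli.sendall(b"42")                    # ask the peer for block 42
        print(cli.recv(4096))                 # data served from the peer's memory
    server.join()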

  26. Simulation Results High Performance Computing Lab(HPCL),URI

  27. Summary • Combining HBAs and NICs reduces the overhead • Shares memory and storage among peers • Makes use of existing resources • Our simulations show a performance gain of up to 4.2 with 4 nodes High Performance Computing Lab(HPCL),URI

  28. Introduction STICS: SCSI-To-IP Cache for Storage Area Networks DRALIC: Distributed RAID & Location Independence Cache vcRAID: Large Virtual NVRAM Cache for Software RAID Performance Eval. on Distributed Web Server Architectures Conclusions High Performance Computing Lab(HPCL),URI

  29. VC-RAID • Hides the small-write penalty of RAID5 by buffering small writes and destaging data back to the RAID, with parity computation, when disk activity is low • Combines a small portion of the system RAM with a log disk to form a hierarchical cache • This hierarchical cache appears to the host as a large nonvolatile RAM High Performance Computing Lab(HPCL),URI
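
To make the small-write penalty concrete: updating a single block of a RAID5 stripe normally costs two reads and two writes (old data, old parity, new data, new parity), whereas buffering small writes and destaging a full stripe lets parity be computed once per stripe. The sketch below only illustrates the parity arithmetic; it is a hypothetical model, not the VC-RAID code.

    # Hypothetical illustration of the RAID5 small-write penalty that VC-RAID
    # hides. Parity is the XOR of the data blocks in a stripe.
    def xor_blocks(a: bytes, b: bytes) -> bytes:
        return bytes(x ^ y for x, y in zip(a, b))

    def small_write_rmw(old_data: bytes, old_parity: bytes, new_data: bytes):
        """Read-modify-write of one block: read old data and parity, write both back."""
        new_parity = xor_blocks(xor_blocks(old_parity, old_data), new_data)
        return new_data, new_parity       # 2 reads + 2 writes per small write

    def destage_full_stripe(data_blocks):
        """VC-RAID-style destage: with a full stripe buffered, compute parity once
        and write the whole stripe as one large sequential write."""
        parity = data_blocks[0]
        for blk in data_blocks[1:]:
            parity = xor_blocks(parity, blk)
        return data_blocks + [parity]

    # Example: four 4-byte data blocks; the last element is the parity block.
    stripe = [bytes([i] * 4) for i in range(1, 5)]
    print(destage_full_stripe(stripe)[-1])    # b'\x04\x04\x04\x04' (1^2^3^4 = 4)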

  30. Architecture (figure) The buffer cache in main memory, managed by the OS kernel, sits above the cache disk, which in turn sits above the RAID5 array. High Performance Computing Lab(HPCL),URI

  31. Approaches High Performance Computing Lab(HPCL),URI

  32. Performance Results • Test environment: Gateway G6-400, 64MB RAM, 4MB RAM buffer, 200MB cache disk, 4 SCSI disks forming a disk array • Benchmarks • PostMark by Network Appliance • Untar/copy/remove • Compared to the built-in RAID0 and RAID5 High Performance Computing Lab(HPCL),URI

  33. Throughput High Performance Computing Lab(HPCL),URI

  34. Response time (seconds)

  35. Summary • Reliable: • based on RAID5 • a hard drive is more reliable than RAM • Cost-effective: • hard drives are much cheaper than RAM • software-only, no extra hardware needed • Fast: achieved by increasing the effective cache size High Performance Computing Lab(HPCL),URI

  36. Introduction STICS: SCSI-To-IP Cache for Storage Area Networks DRALIC: Distributed RAID & Location Independence Cache vcRAID: Large Virtual NVRAM Cache for Software RAID Performance Eval. on Distributed Web Server Architectures Conclusions High Performance Computing Lab(HPCL),URI

  37. Observations • E-commerce has grown explosively • Static web pages stored as files are no longer the dominant web accesses • About 70% of accesses invoke CGI, ASP, or servlet calls to generate dynamic pages • Web server behaviors and the interaction between web servers and database servers need to be studied High Performance Computing Lab(HPCL),URI

  38. Benchmark and workloads • Workloads • Static pages • Light CGI: 20% / 80% • Heavy CGI: 90% / 10% • Heavy servlet: 90% / 10% • Heavy database access: 90% / 10% • Mixed workload: 7% / 8% / 30% / 55% • WebBench 3.5 (6010 static pages, 300 CGI scripts, 300 simple servlets, 400 DB servlets using JDBC, 2 databases with 15 and 18 tables) High Performance Computing Lab(HPCL),URI

  39. Introduction STICS: SCSI-To-IP Cache for Storage Area Networks DRALIC: Distributed RAID & Location Independence Cache vcRAID: Large Virtual NVRAM Cache for Software RAID Performance Eval. on Distributed Web Server Architectures Conclusions High Performance Computing Lab(HPCL),URI

  40. Summary • STICS couples reliable, high-speed data caching with low-overhead conversion between SCSI and IP. • DRALIC boosts web server performance by combining the disk controller and the NIC to reduce file-system overhead. • vcRAID presents a reliable and inexpensive solution for data storage. • We carried out an extensive performance study of distributed web server architectures under realistic workloads. High Performance Computing Lab(HPCL),URI

  41. Patents (with Dr. Yang) • STICS: SCSI-To-IP Cache Storage, filing pending, Serial Number 60/312,471, August 2001 • DRALIC: Distributed RAID and Location Independence Cache, filing pending, May 2001 High Performance Computing Lab(HPCL),URI

  42. Publications (Journal) • Xubin He, Qing Yang, and Ming Zhang, “STICS: SCSI-To-IP Cache for Storage Area Networks,” Submitted to IEEE Transactions on Parallel and Distributed Systems. • Xubin He, Qing Yang, “Performance Evaluation of Distributed Web Server Architectures under E-Commerce Workloads,” Submitted to Journal of Parallel and Distributed Computing. • Xubin He, Qing Yang, “On Design and Implementation of a Large Virtual NVRAM Cache for Software RAID,” Special Issue of Journal on Parallel I/O for Cluster Computing, 2002. High Performance Computing Lab(HPCL),URI

  43. Publications (Conference) • Xubin He, Qing Yang, and Ming Zhang, “ A Caching Strategy to Improve iSCSI Performance,” To appear in IEEE Annual Conference on Local Computer Networks, Nov. 6-8, 2002. • Xubin He, Qing Yang, and Ming Zhang, “Introducing SCSI-To-IP Cache for Storage Area Networks,” ICPP’2002, Vancouver, Canada, August 2002. • Xubin He, Ming Zhang, Qing Yang, “DRALIC: A Peer-to-Peer Storage Architecture”, Proc. of the International Conference on Parallel and Distributed Processing Techniques and Applications (PDPTA'2001), 2001. • Xubin He, Qing Yang, “Characterizing the Home Pages”, Proc. of the 2nd International Conference on Internet Computing (IC’2001), 2001. • Xubin He, Qing Yang, “VC-RAID: A Large Virtual NVRAM Cache for Software Do-it-yourself RAID”, Proc. of the International Symposium on Information Systems and Engineering (ISE'2001), 2001. • Xubin He, Qing Yang, “Performance Evaluation of Distributed Web Server Architectures under E-Commerce Workloads”, Proc. of the 1st International Conference on Internet Computing (IC’2000), 2000. High Performance Computing Lab(HPCL),URI

  44. Thank You! Dr. Qing Yang @ELE Dr. Jien-Chung Lo @ELE Dr. Joan Peckham @CS Dr. Peter Swaszek @ELE Dr. Lisa DiPippo @CS And more…

  45. Special thanks to my daughter, Rachel!
