
Performance Analysis of Cluster File System on Linux


Presentation Transcript


  1. Performance Analysis of Cluster File System on Linux Yaodong CHENG IHEP, CAS chyd@ihep.ac.cn

  2. Outline • Introduction • Review of cluster file system • Data access model • Performance analysis formula • Performance test • Some useful methods

  3. Introduction • Cluster systems built from commodity PCs are becoming more and more popular • Driven by improvements in commodity hardware and software • CPU, memory, hard disk, network • Linux software technology • The question: how to use our existing hardware and software more efficiently

  4. [Figure] Architecture of a cluster system: compute nodes 1..N (each running jobs, with local disks) are connected over a high-speed network to I/O nodes 1..N, which are backed by disks and tape storage.

  5. Cluster file system review • One of the most important ways to share data within a cluster system • General characteristics: • Single-system image • Transparency • Good scalability • High performance • Structure • Client/server, shared-disk, virtual shared-disk

  6. [Figure] Data access model: clients 1..N access, over the network, a set of I/O servers (I/O nodes 1..N, each with its own disk) and a manager node acting as metadata server.

  7. Some assumptions • Data is processed only on the clients • Storage nodes only provide storage capacity and handle file operations • The traffic between clients and the manager nodes is very small • The time spent handling client requests is far smaller than the time spent transferring data

  8. Performance analysis formula T = max( D*c/N, D/(N*I), D/(M*I), D/(P*R) ) S = D/T = min( N/c, N*I, M*I, P*R ) • c: CPU time needed to process one byte; • D: the total amount of data; • I: network speed (per node); • M: the number of I/O nodes; • N: the number of clients; • P: the number of disks working in parallel; • R: disk speed • T: the minimum time to access all the data • S: the maximum aggregate bandwidth • Constraint: P/M >= 1
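
  For illustration only (the numbers below are assumptions of this sketch, not measurements from the talk): suppose each client needs c = 0.1 microseconds of CPU time per byte (so one client can process 10 MB/s), each network link gives I = 11.8 MB/s, and N = 1, M = 1, P = 1 with R = 40 MB/s. Then
    S = min( N/c, N*I, M*I, P*R ) = min( 10, 11.8, 11.8, 40 ) MB/s = 10 MB/s
  i.e. the system is compute-bound; faster disks or more I/O nodes would not raise S, while adding clients raises it until the M*I term becomes the new bottleneck.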

  9. In the formula above, if c is very small (the CPU cost per byte is negligible), it becomes: T = max( D/(N*I), D/(M*I), D/(P*R) ) S = D/T = min( N*I, M*I, P*R ) This simplified formula is the basis of the performance analysis in this work.
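
  A worked instance of the simplified formula, taking I from the roughly 94 Mbit/s (about 11.8 MB/s) measured later for the 100 Mbit Ethernet, and an assumed (not measured) local disk speed R = 40 MB/s: with N = 4 clients, M = 2 I/O nodes and P = 2 disks,
    S = min( N*I, M*I, P*R ) = min( 47.2, 23.6, 80 ) MB/s = 23.6 MB/s
  so the two I/O-node network links are the bottleneck; adding clients or disks does not help, but adding I/O nodes does, until N*I or P*R takes over.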

  10. Some cases • N=1, M>=1 (or N>=1 and M=1), R>I → S depends on I • N=1, M>=1 (or N>=1 and M=1), R<I → S depends on I and P*R • N>1, M>1, R>I → S depends on the number of clients and I/O nodes

  11. Test environment • Twelve PCs • Used as I/O nodes, manager nodes and clients • P4 2.8 GHz / 512 MB RAM / WD 80 GB disk (8 MB cache, 7200 RPM) • OS • CERN Linux 7.3.3 • Kernel: 2.4.20-18.7.cernsmp • Local file system: ext3 • Network: 100 Mbit Ethernet • Cluster file systems • OpenAFS 1.2.9, NFS v3, PVFS, CASTOR 1.6.1.2

  12. Pre-test • Test tools • Netperf 2.2pl3 • IOzone 3.217 • Local area network bandwidth (I): • 100 Mbit Ethernet: about 94.11 Mbits/sec • Local file system measurement (R): • ./iozone -Rab local.xls -g 2048M • IOzone recompiled and linked against the CASTOR RFIO library
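
  A minimal sketch of how such a baseline could be collected with these tools; the host name is a placeholder and every option except the iozone command quoted on the slide is an assumption, not taken from the talk:

    # On an I/O node: start the netperf server
    netserver
    # On a client: measure raw TCP bandwidth to that I/O node (gives I)
    netperf -H io-node-1 -t TCP_STREAM -l 60
    # On an I/O node: measure local ext3 throughput (gives R), as on the slide
    ./iozone -Rab local.xls -g 2048M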

  13. One client, one server • Only one client accesses files • Only one I/O node in the server configuration • Write performance measurement • File size: 512 MB • Record size: 64 KB to 16 MB • Output unit: KB/sec
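
  One possible IOzone invocation for this test; only the 512 MB file size and the 64 KB to 16 MB record-size range come from the slide, while the mount point /mnt/cfs and the remaining options are assumptions:

    # Write test only (-i 0) on the mounted cluster file system,
    # fixed 512 MB file, record sizes swept from 64 KB to 16 MB, Excel report
    ./iozone -Ra -b one2one.xls -i 0 -n 512m -g 512m -y 64k -q 16m -f /mnt/cfs/testfile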

  14. Results

  15. Multi-process test • Only one client and one I/O node • Many processes access the I/O node simultaneously • Write performance measurement • File size: 100 MB • Record size: 512 KB • Number of processes: 1 to 10 • Output unit: KB/sec
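
  IOzone's throughput mode is one way to run this kind of test; in the sketch below the mount point and the choice of 4 processes are assumptions (the talk varied the count from 1 to 10):

    # Four concurrent writer processes (-t 4), each writing its own 100 MB file
    # with 512 KB records (-s 100m -r 512k) on the cluster file system mount
    ./iozone -i 0 -s 100m -r 512k -t 4 -F /mnt/cfs/f1 /mnt/cfs/f2 /mnt/cfs/f3 /mnt/cfs/f4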

  16. Results

  17. Multi-client to multi-server • Multiple clients read/write files • Multiple I/O nodes provide file storage • The output is the aggregate bandwidth • Only CASTOR and PVFS were measured • Write performance • Size of each file: 200 MB • Record size: 2 MB • Output unit: MB/sec
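
  One simple way to drive the multi-client case is to start the per-client write test on every client at roughly the same time, e.g. over ssh, and sum the reported bandwidths; the host names, mount point and options below are placeholders/assumptions:

    # Launch the 200 MB / 2 MB-record write test on each client in parallel
    for client in client1 client2 client3 client4; do
        ssh $client "iozone -i 0 -s 200m -r 2m -f /mnt/cfs/${client}.dat" &
    done
    wait    # aggregate bandwidth = sum of the per-client results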

  18. Results

  19. Some useful methods • In theory, a good cluster file system ensures that: • the data is physically balanced among the I/O devices • the data requirements are balanced among the application's tasks • the network has enough aggregate bandwidth to move the data between the two without saturating • In practice, the following methods are useful

  20. Use a high-speed network, for example Gigabit Ethernet or Myrinet • Use or develop a high-performance network file transfer protocol • Use multiple servers to improve the aggregate bandwidth • Improve the read/write speed of the disks • File striping and parallel I/O • Good file system design • Improve the processing capability of the manager nodes

  21. Summary • Cluster file system review • Performance analysis formula • Performance test • Some methods to improve the performance

  22. Thank you!! CHEP'04 Sep 27 - Oct 1, 2004 Congress Zentrum Interlaken, Switzerland
