
Performance Analysis of Cluster File System on Linux



Yaodong CHENG
IHEP, CAS
[email protected]


Outline

  • Introduction

  • Review of cluster file system

  • Data access model

  • Performance analysis formula

  • Performance test

  • Some useful methods

CHEP'04 Sep 27 - Oct 1, 2004 Congress Zentrum Interlaken, Switzerland


Introduction

  • Cluster systems built from PCs are increasingly popular

  • Commodity hardware and software keep improving

    • CPU, memory, hard disk, network

    • Linux software technology

  • How can we use our existing hardware and software more efficiently?



Architecture of a cluster system

[Figure: architecture of a cluster system. Labels: job; compute node 1 ... compute node N, each with a disk; high-speed network; I/O node 1 ... I/O node N, each with a disk; disk and tape storage.]


Cluster file system review

  • One of the most important ways to share information across a cluster system

  • General characteristics:

    • Single-system image

    • Transparency

    • Good scalability

    • High performance

  • Structure

    • Client/server (C/S), shared-disk, virtual shared-disk



Data access model

[Figure: data access model. Labels: client 1, client 2 ... client N; manager node (metadata server); network; I/O servers (I/O node 1, I/O node 2 ... I/O node N), each with a disk.]


Some assumptions

  • Data is processed only on the clients

  • Storage nodes only provide storage capacity and handle file operations

  • Traffic between clients and manager nodes is very small

  • The time to handle client requests is far smaller than the time spent transferring data



Performance analysis formula

T = max (D*c/N, D/(N*I), D/(M*I), D/(P*R) )

S = D/T = min (N/c, N*I, M*I, P*R)

  • c: the CPU time to compute each byte

  • D: the total amount of data

  • I: network speed

  • M: the number of I/O nodes

  • N: the number of clients

  • P: the number of disks in parallel

  • R: disk speed

  • T: the minimum time to access all the data

  • S: the maximum aggregate bandwidth

  • Constraint: P/M >= 1 (at least one disk per I/O node)
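As a minimal sketch of the formula above, the following Python function evaluates the four terms inside the max and reports which one limits throughput. The function name `access_time_and_bandwidth`, the term labels, and the sample values are illustrative assumptions, not part of the original talk; the units only need to be consistent (for example MB and seconds).

```python
def access_time_and_bandwidth(D, c, N, I, M, P, R):
    """Return (T, S, bottleneck) for T = max(D*c/N, D/(N*I), D/(M*I), D/(P*R))."""
    if P < M:
        raise ValueError("the model assumes P/M >= 1")
    # One entry per term of the max(); the labels are illustrative.
    terms = {
        "cpu": D * c / N,
        "client network": D / (N * I),
        "I/O node network": D / (M * I),
        "disk": D / (P * R),
    }
    bottleneck = max(terms, key=terms.get)
    T = terms[bottleneck]
    S = D / T  # maximum aggregate bandwidth
    return T, S, bottleneck


# Hypothetical values: 1000 MB of data, 4 clients, 2 I/O nodes with one disk each.
T, S, limit = access_time_and_bandwidth(
    D=1000.0, c=0.001, N=4, I=10.0, M=2, P=2, R=40.0
)
print(T, S, limit)  # 50.0 20.0 I/O node network
```

With these made-up numbers the I/O node network term is the largest, so adding clients would not raise S; adding I/O nodes would.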



Dropping the CPU term D*c/N from the formula leaves only the network and disk terms:

T = max (D/(N*I), D/(M*I), D/(P*R) )

S = D/T = min (N*I, M*I, P*R)

This simplified formula is the basis of the performance analysis in this work.



Some cases

  • N=1, M>=1 (or N>=1 and M=1), R>I

     S depends on I

  • N=1, M>=1 (or N>=1 and M=1), R<I

     S depends on I and P*R

  • N>1, M>1, R>I

     S depends on the number of clients and I/O nodes
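The three cases can be checked numerically against S = min (N*I, M*I, P*R). The sketch below uses hypothetical speeds (I = 10, R = 40 or 6, in arbitrary but consistent units) chosen only to satisfy each case's conditions; the function name `max_bandwidth` is likewise an assumption.

```python
def max_bandwidth(N, M, P, I, R):
    """Simplified model: S = min(N*I, M*I, P*R)."""
    return min(N * I, M * I, P * R)


I = 10.0  # hypothetical network speed per node

# Case 1: N=1, M>=1, R>I. The single client's link is the limit, so S = I.
case1 = max_bandwidth(N=1, M=4, P=4, I=I, R=40.0)

# Case 2: N=1, R<I. S is the smaller of I and P*R.
case2_one_disk = max_bandwidth(N=1, M=1, P=1, I=I, R=6.0)   # disk-limited
case2_two_disks = max_bandwidth(N=1, M=1, P=2, I=I, R=6.0)  # network-limited

# Case 3: N>1, M>1, R>I. S = min(N, M) * I.
case3 = max_bandwidth(N=3, M=2, P=2, I=I, R=40.0)

print(case1, case2_one_disk, case2_two_disks, case3)  # 10.0 6.0 10.0 20.0
```

In case 3 the disks never limit throughput because P >= M and R > I imply P*R > M*I, which is why only the client and I/O node counts matter.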



Test environment

  • Twelve PCs

    • Used as I/O nodes, manager nodes and clients

    • Pentium 4 2.8 GHz, 512 MB RAM, WD 80 GB disk (8 MB cache, 7200 RPM)

  • OS

    • CERN Linux 7.3.3

    • Kernel: 2.4.20-18.7.cernsmp

    • Local file system: ext3

  • Network: 100 Mbit/s Ethernet

  • Cluster file systems

    • OpenAFS 1.2.9, NFS v3, PVFS, CASTOR 1.6.1.2



Pre-test

  • Test tools

    • Netperf 2.2pl3

    • IOzone 3.217

  • Local area network bandwidth (I):

    • 100 Mbit/s Ethernet: about 94.11 Mbit/s

  • Local file system measurement (R):

    • ./iozone -Rab local.xls -g 2048M

  • IOzone was recompiled and linked with the CASTOR RFIO library
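To compare the measured network bandwidth I with file system results reported in KB/sec or MB/sec, the value has to be converted from bits to bytes. The short sketch below does that arithmetic. It assumes that Netperf's figure is in units of 10^6 bits per second and that one KB in the throughput results means 1024 bytes; both are assumptions about the tools' reporting units, and the function names are illustrative.

```python
MEASURED_MBIT_PER_S = 94.11  # Netperf result quoted in the pre-test


def mbit_to_mbyte_per_s(mbit_per_s):
    """Convert 10^6 bits/s to 10^6 bytes/s."""
    return mbit_per_s / 8


def mbit_to_kb_per_s(mbit_per_s):
    """Convert 10^6 bits/s to KB/s, assuming 1 KB = 1024 bytes."""
    return mbit_per_s * 1_000_000 / 8 / 1024


print(round(mbit_to_mbyte_per_s(MEASURED_MBIT_PER_S), 2))  # 11.76
print(round(mbit_to_kb_per_s(MEASURED_MBIT_PER_S)))        # 11488
```

Under these assumptions, a single 100 Mbit/s link caps any one client or one I/O node at roughly 11.76 MB/s, which is the value of I to plug into the formula.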



One client one server

  • Only one client accesses files

  • Only one I/O node in the server configuration

  • Write performance measurement

    • file size: 512MB

    • record size: 64KB-16MB

    • output unit: KB/sec
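The talk uses IOzone for this measurement. As a rough stand-in that shows what a write test with a given file size and record size measures, here is a minimal Python sketch. It is not how IOzone works internally; the function name, the use of `fsync`, and the small demo file size are assumptions made so the example runs quickly.

```python
import os
import tempfile
import time

# Record sizes from 64 KB to 16 MB, doubling at each step.
RECORD_SIZES = [64 * 1024 * 2**i for i in range(9)]


def measure_write_kb_per_s(path, file_size, record_size):
    """Write file_size bytes in record_size chunks; return throughput in KB/sec."""
    record = b"\0" * record_size
    written = 0
    start = time.perf_counter()
    with open(path, "wb") as f:
        while written < file_size:
            n = min(record_size, file_size - written)
            f.write(record[:n])
            written += n
        f.flush()
        os.fsync(f.fileno())  # include the time to push data out of the page cache
    elapsed = time.perf_counter() - start
    return (file_size / 1024) / elapsed


# Demo with a 4 MB file instead of 512 MB, and only the three smallest record sizes.
with tempfile.TemporaryDirectory() as tmp:
    target = os.path.join(tmp, "write_test.bin")
    for record_size in RECORD_SIZES[:3]:
        rate = measure_write_kb_per_s(target, 4 * 1024 * 1024, record_size)
        print(f"record {record_size // 1024} KB: {rate:.0f} KB/sec")
```

To exercise a cluster file system rather than the local disk, `path` would point at a file on the mounted file system under test.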



Results


Multi-process test

  • Only one client and one I/O node

  • Many processes access one I/O node simultaneously.

  • Write performance measurement

    • File size: 100MB

    • Record size: 512KB

    • Process number: 1 to 10

    • Output unit: KB/sec




Multi-client to multi-server

  • Multiple clients read/write files

  • Multiple I/O nodes provide file storage

  • The output is aggregate bandwidth

  • Only measure CASTOR and PVFS

  • Write performance

    • The size of each file: 200 MB

    • Record size: 2 MB

    • Output unit: MB/sec




Some useful methods

  • In theory, a good cluster file system needs:

    • data that is physically balanced among the I/O devices

    • data requirements that are balanced among the application’s tasks

    • a network with enough aggregate bandwidth to pass the data between the two without saturating

  • In practice, the following methods are useful



  • Use a high-speed network, for example Gigabit Ethernet or Myrinet

  • Use or develop a high-performance network file transfer protocol

  • Use multiple servers to improve the aggregate bandwidth

  • Improve the read/write speed of disks

  • File striping and parallel I/O

  • Good file system design

  • Improve the processing ability of manager nodes
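File striping is what lets several I/O nodes serve one file in parallel. The sketch below shows a generic round-robin layout that maps a byte range of a file onto I/O nodes. It is an illustration under assumed parameters (`stripe_size`, `num_io_nodes`) and is not the actual layout used by PVFS, CASTOR, or any other system named in the talk.

```python
def locate(offset, stripe_size, num_io_nodes):
    """Map a file offset to (I/O node index, offset within that node's data)."""
    stripe_index = offset // stripe_size
    node = stripe_index % num_io_nodes
    local_offset = (stripe_index // num_io_nodes) * stripe_size + offset % stripe_size
    return node, local_offset


def split_request(offset, length, stripe_size, num_io_nodes):
    """Split a byte range into per-node pieces: (node, local_offset, length)."""
    pieces = []
    end = offset + length
    while offset < end:
        node, local_offset = locate(offset, stripe_size, num_io_nodes)
        # Stop at the end of the current stripe or the end of the request.
        n = min(stripe_size - offset % stripe_size, end - offset)
        pieces.append((node, local_offset, n))
        offset += n
    return pieces


# A 256 KB read with 64 KB stripes over 4 I/O nodes touches every node once.
for piece in split_request(0, 256 * 1024, stripe_size=64 * 1024, num_io_nodes=4):
    print(piece)
```

Because each piece goes to a different node, the four transfers can proceed in parallel, which is how striping raises the M*I and P*R terms of the bandwidth formula for a single large file.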



Summary

  • Cluster file system review

  • Performance analysis formula

  • Performance test

  • Some methods to improve the performance



Thank you!!
