Performance Analysis of Cluster File System on Linux

Yaodong CHENG, IHEP, CAS


Outline

  • Introduction

  • Review of cluster file system

  • Data access model

  • Performance analysis formula

  • Performance test

  • Some useful methods

CHEP'04 Sep 27 - Oct 1, 2004 Congress Zentrum Interlaken, Switzerland


Introduction

  • Clusters built from PCs are increasingly popular

  • Commodity hardware and software keep improving

    • CPU, memory, hard disk, network

    • Linux software technology

  • How can we use our existing hardware and software more efficiently?



Architecture of a cluster system

[Diagram not preserved. Labels: jobs running on compute nodes 1 to N, each with a disk; a high-speed network; I/O nodes 1 to N with disks and tape.]



Cluster file system review

  • One of the most important ways to share information across a cluster system

  • General characteristics:

    • Single-system image

    • Transparency

    • Good scalability

    • High performance

  • Structure

    • Client/server (C/S), shared-disk, virtual shared-disk



Data access model

[Diagram not preserved. Labels: clients 1 to N; network; manager node (metadata server); I/O servers (I/O nodes 1 to N, each with a disk).]



Some assumptions

  • Data is processed only on the clients

  • Storage nodes only provide storage capacity and handle file operations

  • The traffic between clients and management nodes is very small

  • The time spent handling client requests is far smaller than the time spent transferring data



Performance analysis formula

T = max (D*c/N, D/(N*I), D/(M*I), D/(P*R) )

S = D/T = min (N/c, N*I, M*I, P*R)

  • c: CPU time needed to process each byte

  • D: total amount of data

  • I: network speed

  • M: number of I/O nodes

  • N: number of clients

  • P: number of disks operating in parallel

  • R: disk speed

  • T: minimum time to access all of the data

  • S: maximum aggregate bandwidth

  • Constraint: P/M >= 1 (at least as many parallel disks as I/O nodes)
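The formula can be evaluated directly. The following is a minimal Python sketch, not code from the presentation; the function names and the sample values are illustrative assumptions, and all speeds are taken to be in bytes per second.

```python
def access_time(D, c, N, I, M, P, R):
    """Minimum time T to access D bytes: the slowest of the four stages."""
    if P < M:
        raise ValueError("the model assumes P/M >= 1")
    return max(D * c / N, D / (N * I), D / (M * I), D / (P * R))


def aggregate_bandwidth(c, N, I, M, P, R):
    """Maximum aggregate bandwidth S = D/T; D cancels out."""
    if P < M:
        raise ValueError("the model assumes P/M >= 1")
    return min(N / c, N * I, M * I, P * R)


# Illustrative values only (assumptions, not measurements from the slides).
D = 1_000_000_000   # total data, bytes
c = 1e-9            # CPU seconds per byte
N, M, P = 4, 2, 2   # clients, I/O nodes, parallel disks
I = 12_000_000      # network speed, bytes/s
R = 40_000_000      # disk speed, bytes/s

T = access_time(D, c, N, I, M, P, R)
S = aggregate_bandwidth(c, N, I, M, P, R)
print(f"T = {T:.2f} s, S = {S / 1e6:.1f} MB/s")
```

With these sample values the I/O node network term M*I is the smallest of the four, so it sets S.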



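Dropping the CPU term, that is, assuming computation on the clients is never the bottleneck, leaves:

T = max (D/(N*I), D/(M*I), D/(P*R) )

S = D/T = min (N*I, M*I, P*R)

This formula is the basis of the performance analysis in this work.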



Some cases

  • N=1, M>=1 (or N>=1 and M=1), R>I

     S depends only on I: S = I

  • N=1, M>=1 (or N>=1 and M=1), R<I

     S depends on I and P*R: S = min(I, P*R)

  • N>1, M>1, R>I

     S depends on the number of clients and I/O nodes: S = min(N, M)*I
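The three cases can be checked numerically. This is a small Python sketch under assumed values; the helper `bottleneck` and the numbers are hypothetical, chosen only so that R>I or R<I holds as stated, and units are arbitrary as long as I and R share them.

```python
def bottleneck(N, M, P, I, R):
    """Return (S, limiting terms) for S = min(N*I, M*I, P*R)."""
    terms = {
        "clients (N*I)": N * I,
        "I/O nodes (M*I)": M * I,
        "disks (P*R)": P * R,
    }
    S = min(terms.values())
    limiting = [name for name, value in terms.items() if value == S]
    return S, limiting


# Case 1: N=1, R>I. The single client's network link limits S.
case1 = bottleneck(N=1, M=2, P=2, I=10, R=30)

# Case 2: N=1, R<I. Either the link or the disks limit S, depending on P*R.
case2_few_disks = bottleneck(N=1, M=2, P=2, I=10, R=4)
case2_more_disks = bottleneck(N=1, M=2, P=4, I=10, R=4)

# Case 3: N>1, M>1, R>I. The smaller of N and M sets S.
case3 = bottleneck(N=3, M=2, P=2, I=10, R=30)

for label, result in [("case 1", case1), ("case 2, P=2", case2_few_disks),
                      ("case 2, P=4", case2_more_disks), ("case 3", case3)]:
    print(label, result)
```

In case 2, raising P from 2 to 4 moves the limit from the disks back to the client's network link, which is why S there depends on both I and P*R.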



Test environment

  • Twelve PCs

    • I/O nodes, Manager nodes and clients

    • Pentium 4 2.8 GHz, 512 MB RAM, WD 80 GB disk (8 MB cache, 7200 RPM)

  • OS

    • CERN Linux 7.3.3

    • Kernel: 2.4.20-18.7.cernsmp

    • Local file system: ext3

  • Network: 100 Mbit/s Ethernet

  • Cluster file system

    • OpenAFS 1.2.9, NFS v3, PVFS, CASTOR 1.6.1.2



Pre-test

  • Test tools

    • Netperf 2.2pl3

    • IOzone 3.217

  • Local area network bandwidth (I):

    • 100 Mbit/s Ethernet: about 94.11 Mbit/s

  • Local file system measurement (R)

    • ./iozone -Rab local.xls -g 2048M

  • IOzone was recompiled and linked against the CASTOR RFIO library
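To compare the measured network bandwidth I with file system throughput reported in KB/sec or MB/sec, the figure has to be converted from bits to bytes. A minimal Python sketch follows; it assumes 1 Mbit = 10^6 bits and 1 KB = 1024 bytes, and neither convention is stated on the slides.

```python
NETPERF_MBIT_PER_S = 94.11  # measured LAN bandwidth quoted above


def mbit_to_bytes_per_s(mbit_per_s):
    """Convert Mbit/s to bytes/s (assumption: 1 Mbit = 10**6 bits)."""
    return mbit_per_s * 1_000_000 / 8


i_bytes = mbit_to_bytes_per_s(NETPERF_MBIT_PER_S)
i_kb = i_bytes / 1024        # assumption: 1 KB = 1024 bytes
i_mb = i_bytes / 1_000_000   # decimal megabytes

print(f"I = {i_kb:.0f} KB/sec = {i_mb:.2f} MB/sec")
# prints: I = 11488 KB/sec = 11.76 MB/sec
```

Under the model above, this value is the ceiling for any single client reading or writing over the 100 Mbit/s network.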



One client one server

  • Only one client accesses files

  • Only one I/O node in the server configuration

  • Write performance measurement

    • file size: 512MB

    • record size: 64KB-16MB

    • output unit: KB/sec



Results


Multi-process test

  • Only one client and one I/O node

  • Many processes access one I/O node simultaneously.

  • Write performance measurement

    • File size: 100MB

    • Record size: 512KB

    • Process number: 1 to 10

    • Output unit: KB/sec




Multi-client to multi-server

  • Multiple clients read/write files

  • Multiple I/O nodes provide file storage

  • The output is aggregate bandwidth

  • Only CASTOR and PVFS are measured

  • Write performance

    • The size of each file: 200 MB

    • Record size: 2 MB

    • Output unit: MB/sec




Some useful methods

  • In theory, a good cluster file system requires that

    • the data is physically balanced among the I/O devices

    • the data requirements are balanced among the application’s tasks

    • the network has enough aggregate bandwidth to pass the data between the two without saturating

  • In practice, the following methods are useful:



  • Use a high-speed network, for example Gigabit Ethernet or Myrinet

  • Use or develop a high-performance network file transfer protocol

  • Use multiple servers to increase the aggregate bandwidth

  • Improve the read/write speed of disks

  • File striping and parallel I/O

  • Good file system design

  • Improve the processing ability of manager nodes
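File striping spreads one file over several I/O nodes so that a large request can be served by several nodes in parallel. The Python sketch below shows a generic round-robin layout; it is an assumption for illustration, not the layout used by any file system named in these slides, and the function names and stripe size are hypothetical.

```python
def locate(offset, stripe_size, num_io_nodes):
    """Map a file byte offset to (I/O node index, offset within that node's data)."""
    stripe_index = offset // stripe_size
    node = stripe_index % num_io_nodes
    # Each node stores every num_io_nodes-th stripe back to back.
    local_offset = (stripe_index // num_io_nodes) * stripe_size + offset % stripe_size
    return node, local_offset


def split_request(offset, length, stripe_size, num_io_nodes):
    """Split a contiguous request into (node, local_offset, size) chunks."""
    chunks = []
    end = offset + length
    while offset < end:
        node, local_offset = locate(offset, stripe_size, num_io_nodes)
        # A chunk never crosses a stripe boundary.
        size = min(stripe_size - offset % stripe_size, end - offset)
        chunks.append((node, local_offset, size))
        offset += size
    return chunks


STRIPE = 64 * 1024  # hypothetical stripe size, bytes
NODES = 4           # hypothetical number of I/O nodes

# A 2 MB record starting at offset 0 touches all four nodes.
chunks = split_request(0, 2 * 1024 * 1024, STRIPE, NODES)
nodes_used = sorted({node for node, _, _ in chunks})
print(len(chunks), "chunks on nodes", nodes_used)
```

Because consecutive stripes land on different nodes, a request spanning at least as many stripes as there are I/O nodes keeps every node busy, which is what lets the M*I term of the model be reached by a single large transfer.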



Summary

  • Cluster file system review

  • Performance analysis formula

  • Performance test

  • Some methods to improve the performance



Thank you!!
