System installation & updates

A.Manabe (KEK)

LSCCW A.Manabe


Installation & update

  • System (SW) installation and update is boring, hard work for me.

  • Question: How do you install or update the system on a cluster of more than 100 nodes?

  • Question: Have you ever postponed a system upgrade because it was too much work?

Installation & Update methods

  • Pre-installed, pre-configured system

    • You can postpone your work, but sooner or later ...

  • Manual installation, one PC at a time.

    • Many operators work in parallel with many duplicated installation CDs.

    • It requires many CRTs, days, and cost (to hire operators).

  • Network installation

    • With an NFS/FTP server and automated ‘batch’ installation.

    • ‘Server too busy’ when installing to many nodes.

    • A lot of work still remains (utility SW installation...).

Installation & update methods

  • Duplicate disk image

    • Attach many disks to one PC, duplicate the installed disk, then distribute the duplicated disks to the nodes.

    • Hardware work is hard (attach/detach easy disk unit).

  • Diskless PC

    • Use local disks only for swap and the /var directory; other directories come from an NFS server.

    • A powerful server is necessary.

    • A node can do nothing alone (troubleshooting may become difficult).

An Idea

  • Make one installed host, then clone its disk image to the nodes via the network.

  • 100-PC installation in 10 min. (target value)

  • As little operator intervention as possible.

Our planning method (1)

  • Network Disk Cloning Software

    • dolly+

    • For cloning disk image.

  • Network Booting

    • PXE (Preboot Execution Environment) with Intel NIC

    • For starting an Installer.

  • Batch Installer

    • Modified RedHat kickstart

    • For disk formatting, network setup, and starting the cloning software.

    • Makes private (per-node) /etc/fstab, /etc/sysconfig/network..

Our method (2)

  • Remote Power Controller

    • Network-controlled power tap (hardware)

    • For remote system reset (replaces pushing reset buttons one by one).

  • Console server with the serial console feature of Linux.

    • For checking that everything went well.

Dolly+: 100-PC installation in 10 min.

  • Software to copy/clone files and/or disk images among many PCs through a network.

  • Runs on Linux as a user program.

  • Free software

  • Dolly was developed by the CoPs project at ETH (Switzerland).

Dolly+

  • Sequential file & Block file transfer.

  • RING network connection topology.

  • Pipeline mechanism.

  • Fail recovery mechanism.

Config file

  • Needed only on the server host.

Server = host having the original images or files

iofiles 3

/data/image_hda1 > /dev/hda1

/data/image_hda5 > /dev/hda5

/dev/hda6 > /dev/hda6

server dcpcf001

clients 10

n001

n002

(listing of all nodes)

endconfig
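
When the node list is long, a file in this layout can be generated rather than typed by hand. The following is only a sketch: the function name build_config is hypothetical, and the layout (a count after "iofiles" and "clients", "source > destination" pairs, a closing "endconfig") is inferred solely from the example on this slide. The real dolly+ parser may expect more or different fields.

```python
def build_config(server, clients, iofiles):
    """Return config text in the layout shown on the slide.

    server  -- name of the host holding the original images
    clients -- list of node names, in ring order
    iofiles -- list of (source, destination) pairs
    """
    lines = [f"iofiles {len(iofiles)}"]
    lines += [f"{src} > {dst}" for src, dst in iofiles]
    lines.append(f"server {server}")
    lines.append(f"clients {len(clients)}")
    lines += clients
    lines.append("endconfig")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    nodes = [f"n{i:03d}" for i in range(1, 11)]  # n001 .. n010
    images = [
        ("/data/image_hda1", "/dev/hda1"),
        ("/data/image_hda5", "/dev/hda5"),
        ("/dev/hda6", "/dev/hda6"),
    ]
    print(build_config("dcpcf001", nodes, images), end="")
```

Deriving both counts from the lists avoids a mismatch between, for example, "clients 10" and the number of names that follow.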

Ring Topology

[Figure: ring topology. S = server, the host having the original image.]

  • Utilizes the maximum performance of switches with full-duplex ports.

  • Good for networks built from a complex of switches (because connections are needed only between adjacent nodes).


Server bottleneck in one-server, many-clients topology

[Figure: one-server, many-clients topology. S = server, the host having the original image.]

  • Server bottleneck both in the network and in the server itself.

  • Broadcast or multicast

  • UDP

  • Difficult to make transfers reliable over multicast.
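
A rough way to see the server bottleneck is to compare how much data must leave the server in each topology. This is an idealized sketch, not a measurement: it assumes unicast transfers, a server link of fixed speed shared equally by all clients, and no disk or protocol overhead. The function names are hypothetical.

```python
def star_seconds(image_mb, link_mb_per_s, n_clients):
    """One server sends a separate unicast copy to every client,
    so the server link carries n_clients full images."""
    return n_clients * image_mb / link_mb_per_s


def ring_seconds(image_mb, link_mb_per_s, n_clients, chunk_mb=4):
    """Each host sends only to its neighbour, so the server link
    carries one image; each further hop adds one chunk of delay."""
    return (image_mb + (n_clients - 1) * chunk_mb) / link_mb_per_s


if __name__ == "__main__":
    for n in (1, 10, 100):
        star = star_seconds(4096, 10, n)
        ring = ring_seconds(4096, 10, n)
        print(f"{n:4d} clients: star {star:9.0f} s, ring {ring:7.0f} s")
```

Under these assumptions the star time grows in proportion to the number of clients, while the ring time grows only by one chunk per extra node.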


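PIPELINING & multi threading

[Figure: a file, from BOF to EOF, is split into chunks numbered 1 2 3 4 5 6 7 8 9 ... (file chunk = 4MB). The chunks pass over the network from the Server to Node 1, to Node 2, and on to the next node, with each host working on different chunks at the same time.]

  • 3 threads in parallel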
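
The pipelining idea can be sketched with ordinary threads and queues. This is an illustration only and not how dolly+ is implemented: each "node" here is a single thread, the "network" is an in-memory queue, the chunk size is a few bytes instead of 4MB, and the names clone, node and split_chunks are hypothetical.

```python
import queue
import threading

CHUNK_SIZE = 4  # bytes in this demo; the slide uses 4MB chunks


def split_chunks(data, size):
    """Cut the image into consecutive chunks of at most `size` bytes."""
    return [data[i:i + size] for i in range(0, len(data), size)]


def node(inbox, outbox, store):
    """Receive chunks, keep a local copy, and forward each chunk
    to the next node before the whole image has arrived."""
    while True:
        chunk = inbox.get()
        if chunk is None:          # end-of-file marker
            if outbox is not None:
                outbox.put(None)   # pass the marker down the chain
            return
        store.append(chunk)        # stands in for the local disk write
        if outbox is not None:
            outbox.put(chunk)


def clone(data, n_nodes, chunk_size=CHUNK_SIZE):
    """Send `data` from the server through a chain of n_nodes nodes
    and return the copy each node ended up with."""
    if n_nodes < 1:
        raise ValueError("need at least one node")
    # Small bounded queues: a slow node holds back its upstream neighbour.
    links = [queue.Queue(maxsize=2) for _ in range(n_nodes)]
    stores = [[] for _ in range(n_nodes)]
    threads = []
    for i in range(n_nodes):
        outbox = links[i + 1] if i + 1 < n_nodes else None
        t = threading.Thread(target=node, args=(links[i], outbox, stores[i]))
        t.start()
        threads.append(t)
    for chunk in split_chunks(data, chunk_size):  # the server side
        links[0].put(chunk)
    links[0].put(None)
    for t in threads:
        t.join()
    return [b"".join(s) for s in stores]


if __name__ == "__main__":
    image = bytes(range(50))
    copies = clone(image, n_nodes=5)
    print(all(c == image for c in copies))
```

Because every node forwards a chunk as soon as it has stored it, all links are busy at once, and the last node finishes only a few chunk times after the first.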


Performance (measured)

  • 1 server - 1 node (Pentium III 500MHz)

    • IDE disk/100BaseT network ~ 4MB/s

    • SCSI U2W/100BaseT network ~ 9MB/s

    • 4GB image copy -> 17 min. (IDE), 8 min. (SCSI)

  • 1 server - 7 nodes

    • IDE/100BaseT

    • 4GB image copy -> 17 min. (IDE) (+8 sec.)

  • + time for the booting process.
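
The copy times follow directly from the measured transfer rates. A small check, assuming 1GB = 1024MB (the slides do not say which convention is used) and a hypothetical helper name copy_minutes:

```python
def copy_minutes(image_mb, rate_mb_per_s):
    """Time to stream an image once at a constant rate, in minutes."""
    return image_mb / rate_mb_per_s / 60


if __name__ == "__main__":
    image_mb = 4 * 1024  # 4GB image
    print(f"IDE  (~4MB/s): {copy_minutes(image_mb, 4):.1f} min")
    print(f"SCSI (~9MB/s): {copy_minutes(image_mb, 9):.1f} min")
```

Rounded to whole minutes, this gives the 17 min. (IDE) and 8 min. (SCSI) quoted above.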

Expected performance

  • 1 server - 100 nodes

    • IDE/100BaseT ~ 19 min. (+2 min. overhead)

    • SCSI/100BaseT ~ 9 min. (+1 min. overhead)
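
One way to arrive at estimates of this size is to assume that every additional node in the pipeline delays the end of the transfer by one chunk time. That is an assumption made for this sketch, not something the slides state. The helper name pipeline_minutes is hypothetical, and 1GB is taken as 1024MB.

```python
def pipeline_minutes(image_mb, rate_mb_per_s, n_nodes, chunk_mb=4):
    """Return (total, overhead) in minutes for a pipelined copy.

    total    -- time until the last node has the whole image
    overhead -- part of the total caused by filling the pipeline,
                modelled as one chunk time per additional node
    """
    stream = image_mb / rate_mb_per_s
    fill = (n_nodes - 1) * chunk_mb / rate_mb_per_s
    return (stream + fill) / 60, fill / 60


if __name__ == "__main__":
    for name, rate in (("IDE", 4), ("SCSI", 9)):
        total, overhead = pipeline_minutes(4 * 1024, rate, n_nodes=100)
        print(f"{name}: {total:.1f} min total, {overhead:.1f} min overhead")
```

With these inputs the model gives roughly 18.7 min (1.7 min overhead) for IDE and 8.3 min (0.7 min overhead) for SCSI, in the same range as the rounded figures on the slide.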

How many min. to install to 1000 nodes?

[Figure: chart for this estimate; the only surviving labels are +50% and +100%.]


Fail recovery mechanism

  • In my experience, ~2% of nodes have initial HW problems.

  • Dolly+ provides an automatic ‘short cut’ mechanism for node problems.

    • The RING topology makes its implementation easy.

[Figure: ring with server S; after a time out, the failed node is bypassed (short cutting).]
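
The short cut can be pictured as each host connecting to the next host in the list that still responds. The sketch below shows only that idea and is not dolly+'s code: detection by time out is replaced by a given set of failed hosts, the list is treated as a chain whose last node has no successor, and the names plan_links and next_alive are hypothetical.

```python
def next_alive(ring, index, failed):
    """Return the first host after position `index` that has not failed,
    or None if there is none."""
    for candidate in ring[index + 1:]:
        if candidate not in failed:
            return candidate
    return None


def plan_links(ring, failed):
    """Map every working host to the host it should send to,
    skipping over failed hosts."""
    links = {}
    for i, host in enumerate(ring):
        if host in failed:
            continue
        links[host] = next_alive(ring, i, failed)
    return links


if __name__ == "__main__":
    ring = ["dcpcf001", "n001", "n002", "n003"]
    for sender, receiver in plan_links(ring, failed={"n002"}).items():
        print(sender, "->", receiver)
```

Only the neighbour of a failed node has to change its connection, which is why a ring makes the mechanism easy to implement.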

Cascade Topology

  • The server bottleneck could be overcome.

  • Weak against node failure. A failure also spreads in a cascade and is difficult to recover from.


  • A beta version will be available from corvus.kek.jp/~manabe/pcf/dolly after this workshop.
