System installation & updates

A.Manabe (KEK)

LSCCW A.Manabe

Installation & update
  • System (software) installation & update is boring and hard work for me.
  • Question: How do you install or update the system on a cluster of more than 100 nodes?
  • Question: Have you ever postponed a system upgrade because the work was too much?

Installation & Update methods
  • Pre-installed, Pre-configured System
    • You can postpone your work, but sooner or later ...
  • Manual installation, one PC at a time.
    • Many operators work in parallel with many duplicated installation CDs.
    • It requires many CRTs, days of work and money (to hire operators).
  • Network installation
    • With an NFS/FTP server and automated ‘batch’ installation.
    • ‘Server too busy’ when installing to many nodes.
    • A lot of work still remains (utility software installation...).

Installation & update methods
  • Duplicate disk image
    • Attach many disks to one PC and duplicate the installed disk, then distribute the duplicated disks to the nodes.
    • The hardware work is hard (an easy attach/detach disk unit is needed).
  • Diskless PC
    • Uses local disks only for swap and the /var directory; other directories come from an NFS server.
    • A powerful server is necessary.
    • A node can do nothing alone (troubleshooting may become difficult).

An Idea
  • Make one installed host, then clone its disk image to the nodes via the network.
  • Install 100 PCs in 10 min. (target value)
  • Keep the necessary operator intervention as small as possible.

Our planned method (1)
  • Network Disk Cloning Software
    • dolly+
    • For cloning disk image.
  • Network Booting
    • PXE (Preboot Execution Environment) with Intel NIC
    • For starting an Installer.
  • Batch Installer
    • Modified RedHat kickstart
    • For disk formatting, network setup and starting the cloning software.
    • Creates each node's private /etc/fstab, /etc/sysconfig/network, ...

Our method (2)
  • Remote Power Controller
    • Network-controlled power tap (hardware)
    • For remote system reset (replaces ‘pushing the reset button’ on the nodes one by one).
  • Console server, using the serial console feature of Linux.
    • For checking that everything went well.

Dolly+: 100-PC installation in 10 min.
  • Software to copy/clone files and/or disk images among many PCs through a network.
  • Runs on Linux as a user program.
  • Free software
  • Dolly was developed by the CoPs project at ETH (Switzerland).

Dolly+
  • Sequential file & Block file transfer.
  • RING network connection topology.
  • Pipeline mechanism.
  • Failure recovery mechanism.

Config file
  • Needed only on the server host.
  • Server = host having the original images or files.

    iofiles 3
    /data/image_hda1 > /dev/hda1
    /data/image_hda5 > /dev/hda5
    /dev/hda6 > /dev/hda6
    server dcpcf001
    clients 10
    n001
    n002
    (listing of all nodes)
    endconfig
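
To make the layout of the config file concrete, here is a minimal Python sketch that reads a file of the shape shown on the slide. The grammar is inferred from this one example only (a count after `iofiles` and `clients`, `source > destination` pairs, a closing `endconfig`); the function name `parse_dolly_config` and the shortened `SAMPLE` with two clients are hypothetical and are not part of Dolly+.

```python
# Sketch only: the grammar below is inferred from the single example on the
# slide, not from Dolly+ documentation.
SAMPLE = """
iofiles 3
/data/image_hda1 > /dev/hda1
/data/image_hda5 > /dev/hda5
/dev/hda6 > /dev/hda6
server dcpcf001
clients 2
n001
n002
endconfig
"""


def parse_dolly_config(text):
    """Return a dict with the iofiles pairs, the server name and the clients."""
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
    config = {"iofiles": [], "server": None, "clients": []}
    i = 0
    while i < len(lines):
        words = lines[i].split()
        if words[0] == "iofiles":
            count = int(words[1])
            # Each of the next `count` lines is "source > destination".
            for entry in lines[i + 1:i + 1 + count]:
                src, dst = (part.strip() for part in entry.split(">"))
                config["iofiles"].append((src, dst))
            i += 1 + count
        elif words[0] == "server":
            config["server"] = words[1]
            i += 1
        elif words[0] == "clients":
            count = int(words[1])
            # The next `count` lines are node names, in ring order.
            config["clients"] = lines[i + 1:i + 1 + count]
            i += 1 + count
        elif words[0] == "endconfig":
            break
        else:
            raise ValueError("unexpected line: " + lines[i])
    return config


if __name__ == "__main__":
    print(parse_dolly_config(SAMPLE))
```

The order of the client list matters in this reading, because it would also define the order of the nodes in the ring.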


Ring Topology
  • Makes full use of the performance of switches with full-duplex ports.
  • Works well on networks built from several switches (because connections are only needed between adjacent nodes).

[Figure: nodes connected in a ring starting from the server S. Server = host having the original image.]
Server bottleneck in a one-server, many-clients topology
  • The server is a bottleneck, both in the network and in the server itself.
  • Broadcast or multicast
    • Uses UDP.
    • It is difficult to make transfers reliable over multicast.

[Figure: one server S connected directly to many clients. Server = host having the original image.]

Pipelining & multi-threading
  • File chunk = 4 MB
  • 3 threads in parallel

[Figure: a file is split from BOF to EOF into chunks 1, 2, 3, ...; chunks pass from the server over the network to Node 1, Node 2 and the next node, each node working on a different chunk at the same time.]
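
The pipelining idea can be tried out in a few lines of Python. This is a toy sketch, not Dolly+ code: the queues stand in for network links, the chunk size is 4 bytes instead of 4 MB, and each node is modelled with a single relay thread rather than the 3 threads mentioned on the slide. The names `clone`, `relay` and `split` are hypothetical.

```python
import queue
import threading

CHUNK = 4  # demo chunk size in bytes; the slide uses 4 MB chunks


def split(data, size):
    """Cut the data into consecutive chunks of at most `size` bytes."""
    return [data[i:i + size] for i in range(0, len(data), size)]


def relay(inbox, outbox, store):
    """One node: forward every chunk downstream and keep a local copy."""
    while True:
        chunk = inbox.get()
        if outbox is not None:
            outbox.put(chunk)      # pass it on so the next node can start early
        if chunk is None:          # None marks the end of the file
            break
        store.append(chunk)        # stands in for writing to the local disk


def clone(data, n_nodes):
    """Send `data` through a chain of n_nodes and return each node's copy."""
    links = [queue.Queue(maxsize=2) for _ in range(n_nodes)]
    stores = [[] for _ in range(n_nodes)]
    threads = []
    for i in range(n_nodes):
        outbox = links[i + 1] if i + 1 < n_nodes else None
        t = threading.Thread(target=relay, args=(links[i], outbox, stores[i]))
        t.start()
        threads.append(t)
    for chunk in split(data, CHUNK):   # the server feeds only the first node
        links[0].put(chunk)
    links[0].put(None)
    for t in threads:
        t.join()
    return [b"".join(s) for s in stores]


if __name__ == "__main__":
    copies = clone(b"0123456789abcdef!", 3)
    print(all(c == b"0123456789abcdef!" for c in copies))
```

Because the server only ever talks to the first node, its load does not grow with the number of nodes; each extra node just adds one more hop that a chunk has to travel.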

Performance (measured)
  • 1 server - 1 node (Pentium III 500 MHz)
    • IDE disk/100BaseT network ~ 4 MB/s
    • SCSI U2W/100BaseT network ~ 9 MB/s
    • 4 GB image copy -> 17 min. (IDE), 8 min. (SCSI)
  • 1 server - 7 nodes
    • IDE/100BaseT
    • 4 GB image copy -> 17 min. (IDE) (+8 sec.)
  • Plus the time for the boot process.

Expected performance
  • 1 server - 100 nodes
    • IDE/100BaseT ~ 19 min. (+2 min. overhead)
    • SCSI/100BaseT ~ 9 min. (+1 min. overhead)
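
One way to see why the overhead stays small is a simple pipeline model. It is an assumption made here for illustration, not a calculation given on the slides: the image is taken as 4096 MB, every link is assumed to sustain the measured 1-to-1 rate, and the last node is assumed to finish one chunk-time later for each extra hop. The function name `pipeline_copy_seconds` is hypothetical.

```python
import math


def pipeline_copy_seconds(image_mb, chunk_mb, rate_mb_s, nodes):
    """Estimated time until the last node in the chain holds the full image.

    Assumed model: the first node needs one chunk-time per chunk, and every
    further node finishes one chunk-time after the node before it.
    """
    chunks = math.ceil(image_mb / chunk_mb)
    chunk_seconds = chunk_mb / rate_mb_s
    return (chunks + nodes - 1) * chunk_seconds


if __name__ == "__main__":
    for label, rate in (("IDE", 4.0), ("SCSI", 9.0)):
        one = pipeline_copy_seconds(4096, 4, rate, 1)
        hundred = pipeline_copy_seconds(4096, 4, rate, 100)
        print(f"{label}: 1 node {one / 60:.1f} min, 100 nodes {hundred / 60:.1f} min")
```

Under these assumptions the IDE case comes out at roughly 17 minutes for one node and roughly 19 minutes for 100 nodes, which is in line with the figures above; the model ignores boot time and any per-node connection setup.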


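Failure recovery mechanism
  • In my experience, about 2% of PCs have an initial hardware problem.
  • Dolly+ provides an automatic ‘short cut’ mechanism for when a node has a problem.
    • The ring topology makes this easy to implement.

[Figure: ring starting from the server S; after a time-out, the failed node is short-cut (bypassed).]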
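
The short-cut idea can be sketched as follows. This illustrates the principle only and is not how Dolly+ implements it: the function `short_cut_links` and the `responds` callback are hypothetical, and in a real system `responds` would be a connection attempt that gives up after a time-out.

```python
def short_cut_links(ring, responds):
    """Return the (sender, receiver) links actually used in the ring.

    `ring` lists the hosts in ring order, starting with the server, which is
    assumed to be working. A host for which `responds` returns False is
    skipped, and its predecessor connects to the next responding host.
    """
    links = []
    sender = ring[0]
    for candidate in ring[1:]:
        if responds(candidate):
            links.append((sender, candidate))
            sender = candidate
    return links


if __name__ == "__main__":
    ring = ["dcpcf001", "n001", "n002", "n003"]
    broken = {"n002"}  # pretend this node has a hardware problem
    print(short_cut_links(ring, lambda host: host not in broken))
```

In the ring each sender only needs to know its successors in the list, so skipping one node is a local decision and the other links are unaffected.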

Cascade Topology
  • The server bottleneck can be overcome.
  • Weak against node failure: a failure also spreads down the cascade and is difficult to recover from.
A beta version will be available from corvus.kek.jp/~manabe/pcf/dolly after this workshop.
