
NA60 weekly meetings

Status of the new NA60 “cluster”

Objectives, implementation and utilization

Pedro Martins

03/03/2005

Why do we need another PC farm?
  • Several monitoring/DAQ PCs used during the 2004 Proton Run are now switched off until the next run, in 2006 at the earliest. Leaving them idle is a waste of money.
  • Our “offliners” have already asked for more computing power. With two farms, we always have a “backup” system. We have plenty of 2003 and 2004 data to analyse, and all the help we can get is welcome.
Technical Objectives
  • The user should see only one machine, as with a supercomputer.
  • Maintenance should be kept to a minimum.
  • The system should be flexible, so that nodes (PCs) can easily be added and removed.

Implementation: OpenMosix, Gentoo, remote boot with diskless support.


Remote boot and diskless support, Gentoo, OpenMosix:

  • Scientific Linux does not allow non-IT users to create clusters.
  • Simple maintenance, using Portage (Gentoo's package manager).
  • Easy implementation.
  • Node discovery tool.
  • Migration of processes.
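The node discovery tool, together with the goal of adding and removing nodes easily, can be pictured as a registry that nodes join by announcing themselves and leave by falling silent. The following is a minimal sketch under that assumption only: the `NodeRegistry` class, the node names and the 10-second timeout are hypothetical, and this is not the protocol actually used by the OpenMosix discovery tool.

```python
import time
from typing import Dict, List, Optional


class NodeRegistry:
    """Toy registry of cluster nodes kept alive by periodic announcements.

    Hypothetical illustration: not the OpenMosix discovery protocol, only
    the idea that nodes join by announcing and leave by going silent.
    """

    def __init__(self, timeout_s: float = 10.0) -> None:
        self.timeout_s = timeout_s
        self._last_seen: Dict[str, float] = {}

    def announce(self, node: str, now: Optional[float] = None) -> None:
        # A node joins (or stays registered) simply by sending an announcement.
        self._last_seen[node] = time.monotonic() if now is None else now

    def alive(self, now: Optional[float] = None) -> List[str]:
        # Nodes not heard from within the timeout are dropped automatically,
        # so removing a PC needs no manual reconfiguration.
        t = time.monotonic() if now is None else now
        self._last_seen = {
            n: seen for n, seen in self._last_seen.items()
            if t - seen <= self.timeout_s
        }
        return sorted(self._last_seen)
```

For example, if "node01" last announced at t = 0 s and "node10" at t = 5 s, then at t = 12 s only "node10" is still listed. The explicit `now` argument only exists to make the behaviour easy to check; in normal use the monotonic clock is read.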

na60pc08 is our cluster's disk server. It:

  • provides network information to all nodes,
  • controls process migration,
  • acts as the interface to the user.
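To give a feel for what "controls process migration" aims at, here is a deliberately simplified load-balancing sketch: each new process goes to the node whose relative load would stay lowest. It is an assumption-laden illustration, not the algorithm OpenMosix actually uses (which also weighs memory, communication and migration costs); the function name and node names are hypothetical, and using clock speed in GHz as a capacity measure is only a crude proxy.

```python
from typing import Dict


def place_processes(n_procs: int, cpu_speed: Dict[str, float]) -> Dict[str, int]:
    """Greedily assign processes to the node with the lowest relative load.

    Load is counted as processes per unit of CPU speed; memory, I/O and
    migration cost are ignored. Ties are broken by node name.
    """
    counts = {node: 0 for node in cpu_speed}
    for _ in range(n_procs):
        # Relative load each node would have if it received one more process.
        best = min(counts, key=lambda n: ((counts[n] + 1) / cpu_speed[n], n))
        counts[best] += 1
    return counts
```

With one 2.4 GHz node and one 1 GHz node, `place_processes(4, {"p4_node": 2.4, "amd_node": 1.0})` gives three processes to the faster node and one to the slower one.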

A single process, such as a macro, is NOT shared among all computers: each process runs on one CPU only.

CERN/IT does not support Gentoo-based systems, so our own IT experts will be the only ones able to solve problems.

The data previously stored on the machines is kept, so we always have a fallback option.

In principle, we can run as many simultaneous processes as we have CPUs.
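Because a single process is never split across CPUs, using the whole cluster means dividing the work into several independent processes, ideally no more than the number of CPUs. A minimal sketch, assuming the analysis can be run separately on different input files: `split_into_jobs`, `launch` and the file names are hypothetical, and whether each process actually moves to another node is left to OpenMosix.

```python
import subprocess
from typing import List, Sequence


def split_into_jobs(inputs: Sequence[str], n_cpus: int) -> List[List[str]]:
    """Distribute input files round-robin over at most n_cpus independent jobs."""
    if n_cpus < 1:
        raise ValueError("n_cpus must be at least 1")
    n_jobs = min(n_cpus, len(inputs))
    return [list(inputs[i::n_jobs]) for i in range(n_jobs)]


def launch(jobs: List[List[str]], command: Sequence[str]) -> List[subprocess.Popen]:
    # Each job becomes a separate OS process started on the head node;
    # the cluster may migrate each one elsewhere, but never splits it.
    return [subprocess.Popen([*command, *files]) for files in jobs]
```

For example, `split_into_jobs(["a.dat", "b.dat", "c.dat"], 2)` yields `[["a.dat", "c.dat"], ["b.dat"]]`, and `launch(jobs, ["./analyse"])` would start one process per job, where `./analyse` stands for a hypothetical executable that takes input files as arguments.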

Advantage?


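Diagram (cluster layout): na60pc08 runs the OpenMosix migrator and provides, over nfs, boot and network information and OS data to Node 1 (diskless), and network information to Node 10 (OS on local disk); CASTOR, na60tera1, na60tera2... are reached through NAT; the user connects via vnc and ssh.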

Present status
  • The main node (na60pc08) is operational.
  • One node with local disk is fully working.
  • A second, diskless node is still being debugged.
  • Our “cluster” is still small, but it already has considerable power (P4 2.4 GHz, AMD 1 GHz and 1.5 GB RAM).
  • Since the Memory Migration feature does not deal properly with the X server, we are using VNC. In fact, VNC is much better than a remote X display: it compresses the data transferred over the network, allows remote connections from Windows and Linux machines, and is accessible from outside CERN (IT allows this).
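Why compression pays off for remote desktops can be seen on a synthetic screen: large uniform areas of pixels shrink dramatically under a general-purpose compressor. This sketch only uses Python's standard `zlib` module on a made-up 640x480 framebuffer (the screen layout and both function names are hypothetical); real VNC encodings such as ZRLE or Tight, which build on zlib, are more elaborate and work on changed screen regions only.

```python
import zlib


def synthetic_screen(width: int = 640, height: int = 480) -> bytes:
    """Grey desktop (4 bytes per pixel) with a white 200x100 "window"."""
    bg = b"\x80\x80\x80\x00"
    fg = b"\xff\xff\xff\x00"
    rows = []
    for y in range(height):
        if 100 <= y < 200:
            # Row crossing the window: background, window, background.
            rows.append(bg * 100 + fg * 200 + bg * (width - 300))
        else:
            rows.append(bg * width)
    return b"".join(rows)


def compression_ratio(pixels: bytes, level: int = 6) -> float:
    """Raw size divided by zlib-compressed size (higher means less traffic)."""
    return len(pixels) / len(zlib.compress(pixels, level))
```

The raw screen is 1,228,800 bytes; its compression ratio comes out far above 10, whereas random (incompressible) data would give a ratio close to 1.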
To do list
  • Queuing (FIFO) tool, like auson:
    • It will use the “cpujob” and “iojob” commands, which send processes to the appropriate nodes.
  • Debug the diskless implementation, testing it with a new diskless machine.
  • Connect as many machines as we can.
  • Write a Howto/FAQ for the offliners.
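The planned FIFO queuing tool could look roughly like the following minimal sketch. It assumes a fixed number of running slots and prefixes each command with “cpujob” or “iojob” as described in the list; the exact way those commands are invoked, the `FifoQueue` class and the example commands are assumptions. It only builds command lines, which could then be handed to `subprocess.Popen`.

```python
from collections import deque
from dataclasses import dataclass
from typing import Deque, List, Sequence


@dataclass
class Job:
    argv: List[str]
    kind: str  # "cpu" or "io"


class FifoQueue:
    """Minimal FIFO job queue with a fixed number of running slots.

    Sketch only: prefixing commands with "cpujob"/"iojob" follows the plan
    in the to-do list; the exact invocation of those commands is assumed.
    """

    PREFIX = {"cpu": "cpujob", "io": "iojob"}

    def __init__(self, slots: int) -> None:
        self.slots = slots
        self.running = 0
        self._pending: Deque[Job] = deque()

    def submit(self, argv: Sequence[str], kind: str = "cpu") -> None:
        if kind not in self.PREFIX:
            raise ValueError(f"unknown job kind: {kind!r}")
        self._pending.append(Job(list(argv), kind))

    def next_commands(self) -> List[List[str]]:
        """Pop jobs in submission order while free slots remain."""
        out = []
        while self._pending and self.running < self.slots:
            job = self._pending.popleft()
            self.running += 1
            out.append([self.PREFIX[job.kind], *job.argv])
        return out

    def job_finished(self) -> None:
        # Called when a running job exits, freeing one slot.
        self.running = max(0, self.running - 1)
```

With two slots and three submitted jobs, the first call to `next_commands()` releases the first two jobs in order; the third waits until `job_finished()` frees a slot.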