Computing cluster at NCG

Presentation Transcript


  1. Computing cluster at NCG
     • Introduction
     • Past upgrades
     • Current state of the cluster
     • Problems with the cluster
     • Where to find information about the cluster
     • Conclusion
     Andrey Shevel@bnl.gov

  2. Introduction
     • The cluster appeared at the end of 1999.
     • The people who first set up and tuned the cluster:
     • Jerome Lauret and Andrey Shevel
     • Initially there were 33 machines, each with a 500 MHz CPU and 256 MB of main memory, plus around 1 TB of disk space (all disks were connected to one RAID controller).
     • The main machine was a Digital Alpha server.
     • About 25 users were registered in the first year of operation (2000).

  3. Past upgrades
     • Over time, disk storage has increased fivefold.
     • Computing power has at least tripled.
     • The Alpha server has been retired; the main computer is now an Intel-based server (ram11).
     • All file systems are now on separate disk controllers.
     • Many other improvements were made.
     • All of the above allowed us to operate for many years with almost no support, a fact I am proud to report.

  4. Current state of the cluster

  5. Computing cluster problems
     • Liquid leaking from the floor above
     • The batteries in both UPSs had expired.
     • The UPS automatic shutdown procedure is not working.
     • No redundancy (backup) for the central machine (this machine was affected by water several times in past years)
     • The cluster needs to be checked almost every day (power, water, temperature, etc.)
     • No remote access to the machines' consoles
     • No remote control of electrical power
     • No policies (rules) on how to use the resources on the cluster

  7. Upcoming upgrades
     • First, move the cluster physically to another place in the same room: DONE
     • Install all the new machines (9 machines): in progress
     • Prepare an automatic software installation procedure: in progress
     • Upgrade the SL (Scientific Linux) version to match RACF (BNL): in progress

  8. Where to find information about the cluster
     • General info about the cluster: http://ram3.chem.sunysb.edu/ramdata/news.shtml
     • User mailing list archive: https://ram3.chem.sunysb.edu/ramdata-news
     • System mailing list archive: https://ram3.chem.sunysb.edu/ramdata-system

  9. The role of the cluster
     • I think the role of the cluster is now even greater than at the beginning (more people are interested in using the cluster).
     • For those who need a relatively small amount of computing power, the cluster is sufficient. For those who need huge computing power on the largest remote clusters, the local cluster is a good gateway to those remote clusters.

  10. Conclusion
     • Several steps must be taken to improve the situation:
     • Find one or two volunteers to watch over the cluster;
     • Find a funding agency to which a new request for financial support for a cluster upgrade can be submitted.
     • Perhaps we need to discuss how to use the cluster as the department's computing facility.

More Related