Construction of the CIA-Card
Cluster Interface Agent

This is a joint project with Siemens in Mannheim.

Motivation

Problem: Current experiments such as ALICE at CERN will produce huge amounts of data at high rates. At present, the computing power of about 3000 ordinary computers is thought to be needed for each experiment. Scheduling and handling such a cluster imposes special requirements.

Motivation

Idea: Total control of each box via the network: administration, bootstrap, hardware monitoring, etc.
Important: total control, even if the system does not boot or needs to be power-cycled.
Implementation of a second network with a small controller card in each box.
[Diagram labels: Alice, master controller]

Stand-alone, simple monitoring computer with its own RAM, ROM, network and FPGA logic.
[Diagram labels: Host, CIA Card, PCI bus, 10BASE-T network, controlling, monitoring]

Components of CIA Card

[Block diagram. Labels: Host; PCI devices; PCI bus; CIA card; FPGA (PCI core, RAM MUX, internal register bank, IRQ masks, Port80, FC, SC, MTC, IRQ); external SRAM; uCsimm (CPU, UC, RAM, Flash ROM, eth); VGA; floppy; sensors; mouse; keyboard; USB; reset; repower; power-good; accu; NFS server; master controller (Java code)]

Implementation of the software

[Block diagram. Labels: uClinux; NFS server; file import; master controller; signal handler; controlling: register, RAM, PCI bus; janus; crypt; GUI; IRQ; FPGA; image creation; memory for floppy images etc.; screen I/O]

Summary of the CIA-Card

Components of CIA-Card:
• PCI controller: data exchange via host bus, scanning, monitoring
• Network controller: data exchange with master server
• CPU with RAM, ROM, etc.: runs an embedded Linux system (uClinux)
• Battery (accu): additional power supply; the card works even if the host is powered off, e.g. to turn the host on via the network
• FPGA: emulates VGA presence and holds all the needed glue logic
• Further connectors: floppy, reset/power switches, etc.

Summary of the CIA-Card

Monitored devices of host:
• Battery (accu) state
• Microphone: e.g. to detect defective hard drives early
• Power supply of the host
• Temperature sensors: CPU, chipset, etc.
• Fan operation
• POST logging: status messages on system boot-up, to detect faulty hardware

Network:
• Via TCP/IP: Telnet or HTTP (web front end), encrypted

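The monitoring list above could be turned into a simple range check on the card or on the master controller. The following is only a minimal sketch under stated assumptions: the reading names (`cpu_temp_c`, `fan_rpm`, `supply_5v`, `accu_v`) and all limit values are hypothetical and do not come from the CIA card itself.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Limit:
    """Allowed range for one monitored quantity (inclusive)."""
    low: float
    high: float


# Hypothetical limits, chosen only for illustration; the real card's
# sensors and thresholds are not given in the slides.
LIMITS = {
    "cpu_temp_c": Limit(5.0, 75.0),
    "fan_rpm": Limit(1000.0, 10000.0),
    "supply_5v": Limit(4.75, 5.25),
    "accu_v": Limit(3.3, 4.3),
}


def check_readings(readings):
    """Return a sorted list of alarm strings for missing or out-of-range readings."""
    alarms = []
    for name, limit in LIMITS.items():
        value = readings.get(name)
        if value is None:
            # A sensor that does not report is treated as an alarm as well.
            alarms.append(f"{name}: no reading")
        elif not (limit.low <= value <= limit.high):
            alarms.append(f"{name}: {value} outside [{limit.low}, {limit.high}]")
    return sorted(alarms)
```

Calling `check_readings` with a dictionary of current values returns an empty list for a healthy host; any entries in the list could then be reported to the master server over the network.
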
Summary of the CIA-Card

Interaction with the host system:
• Power switch: power-cycle the host if needed
• Reset switch
• PCI bus: get hardware information with PCI bus scans, or interact with other components
• Floppy emulation: makes the host bootable via the network, depending on its actual hardware
• VGA emulation: exports the display over the network to make the host configurable, e.g. BIOS setup, even with scripts
• Keyboard, mouse: virtual keyboard/mouse to send commands to the host
• USB: further devices

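To illustrate what a PCI bus scan yields, here is a small sketch of how device identification works at the register level. It relies on the standard PCI configuration header layout (vendor ID in the low 16 bits and device ID in the high 16 bits of the register at offset 0x00, with vendor ID 0xFFFF meaning "no device"). The accessor `read_config_dword` is a hypothetical stand-in for whatever the card's FPGA PCI core provides; it is not an API from the project.

```python
def decode_pci_id(dword):
    """Split the 32-bit register at config offset 0x00 into (vendor_id, device_id).

    Returns None when the vendor ID is 0xFFFF, which means no device responded.
    """
    vendor = dword & 0xFFFF
    device = (dword >> 16) & 0xFFFF
    if vendor == 0xFFFF:
        return None
    return vendor, device


def scan_bus(read_config_dword, slots=32):
    """Probe each device slot on one bus and collect the IDs that are present.

    read_config_dword(slot, offset) is a hypothetical accessor (an assumption
    for this sketch) returning the 32-bit config register of a slot.
    """
    found = {}
    for slot in range(slots):
        ids = decode_pci_id(read_config_dword(slot, 0x00))
        if ids is not None:
            found[slot] = ids
    return found
```

A master server could compare the resulting slot-to-ID map against a table of known hardware, for example to pick a boot image that matches the host.
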
Advantages of the CIA-Card compared to ordinary Linux clusters

• Fast implementation: just put one card in each ordinary computer and have permanent control, all managed from one central place, down to BIOS setup etc.
• Power off/on via network
• Possibility to change the OS automatically if new hardware has been installed
• Monitor the whole cluster: every host regardless of its current state, with additional monitoring possibilities independent of the OS
• Hosts that are far away can also be used
• Save hardware: no need for a real VGA card, floppy, etc.; perhaps even the hard disk can be omitted, depending on the function of the cluster (makes the card cheaper)

L3/Cluster Slow Control
Stefan Philipp, Martin Kirsch, Heidelberg

Features:
• Battery backed: completely independent of host
• Power controller: remote powering of host
• Reset controller: remote physical RESET
• PCI bus: perform PCI bus scans, identify devices
• Floppy/flash emulator: create remotely defined boot image
• Keyboard driver: remote keyboard emulation
• Mouse driver: remote mouse emulation
• VGA: replace graphics card
• Price: very low cost

Functionality:
• Complete remote control of the PC, like a terminal server but already at BIOS level
• Intercept port 80 messages (remotely diagnose even a dead computer)
• Interoperate with remote server, providing status/error information
• Watchdog functionality
• Identify host and receive boot image for host
• RESET/power maintenance

This is a joint project with Siemens in Mannheim.
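
The watchdog functionality listed above can be pictured as a timer that the host must reset regularly; if it fails to do so, the card acts, for example by triggering the remote RESET. The sketch below is an assumption about how such logic might look, not the card's actual firmware: the class name `Watchdog`, the `on_expire` callback and the timeout value are all hypothetical.

```python
import time


class Watchdog:
    """Fire a callback once if the host stops sending heartbeats.

    All names here are illustrative assumptions; the slides do not
    describe the real implementation.
    """

    def __init__(self, timeout_s, on_expire, clock=time.monotonic):
        self.timeout_s = timeout_s
        self.on_expire = on_expire      # e.g. a function that pulses RESET
        self.clock = clock              # injectable clock, useful for testing
        self.last_kick = clock()
        self.expired = False

    def kick(self):
        """Record a heartbeat from the host and re-arm the watchdog."""
        self.last_kick = self.clock()
        self.expired = False

    def poll(self):
        """Check the deadline; call on_expire only once per missed deadline."""
        if not self.expired and self.clock() - self.last_kick > self.timeout_s:
            self.expired = True
            self.on_expire()
        return self.expired
```

In use, the card would call `poll()` periodically and the host (or its driver) would call `kick()` while it is alive; because the card is battery backed, the check keeps running even when the host hangs.
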