MAP Project T. Bowcock, A. Kinvig, I. Last, M. McCubbin, A. Moreton, C. Parkes, G. Patel University of Liverpool
Introduction • Monte Carlo Array Processor • justification • Status • Hardware • Software • COMPASS • Summary
Monte Carlo • At LHCb about 1 interaction per 25 ns! • 4×10^14/year • if you want to do physics you need to know the backgrounds • generating just the signals doesn’t work • need to generate large MC samples • O(10^7) to O(10^8) events • LHCb needs to do this now!
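The yearly figure on this slide can be reproduced with a short calculation. The sketch below assumes a running time of 10^7 seconds per year, which is a common convention and not something stated on the slide; the function and constant names are illustrative only.

```python
# Reproduce the "4×10^14 interactions per year" estimate from the slide.
# ASSUMPTION: 1e7 live seconds per year (a conventional accelerator year);
# the slide does not state which figure was used.

BUNCH_SPACING_S = 25e-9        # one interaction every 25 ns (from the slide)
LIVE_SECONDS_PER_YEAR = 1e7    # assumed, not from the slide


def interactions_per_year(spacing_s=BUNCH_SPACING_S,
                          live_seconds=LIVE_SECONDS_PER_YEAR):
    """Number of interactions in one year of running."""
    return live_seconds / spacing_s


def sample_fraction(n_mc_events, spacing_s=BUNCH_SPACING_S,
                    live_seconds=LIVE_SECONDS_PER_YEAR):
    """Fraction of one year's interactions covered by an MC sample."""
    return n_mc_events / interactions_per_year(spacing_s, live_seconds)


if __name__ == "__main__":
    print(f"interactions/year: {interactions_per_year():.1e}")
    print(f"fraction covered by 1e8 MC events: {sample_fraction(1e8):.1e}")
```

Under that assumption, even an O(10^8) sample covers only a tiny fraction of a year of interactions, which is why large samples are needed.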
Philosophy • Cheapest possible that works • No Gbit ethernet until price falls • Don’t buy top of range processors • No SMP boards • No tapes • obsolete? • Develop architecture with future in mind
MAP hardware • 300 processors • 400 MHz PII • 128 Mbytes memory • 3 Gbytes disk • D-Link 100BaseT ethernet + hubs • commercial units BUT • custom boxes for packing and cooling
MAP
MAP cont’d
MAP cont’d
MAP Architecture • [Diagram; labels: MAP slaves, master, external Ethernet, 100BaseT hubs]
MAP software • Overview • Linux • based on RedHat 5.2 • stripped down version • Batch System • Network • Control at the UDP level • Robust packet handling • Overloading of the master ethernet interfaces (300 replies at once) implied the need for total control of data flow • Broadcast of control required phased replies
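One way to picture a phased reply is to give each slave a reply slot derived from its node number, so that a broadcast from the master is not answered by all 300 machines at once. The sketch below only illustrates that idea; the slot length, group size and function names are assumptions, and the slides do not describe how MAP actually schedules its replies.

```python
# Illustrative phased-reply schedule (hypothetical, not the MAP implementation).
# Each slave waits for its own time slot before answering a broadcast, so the
# master never receives more than `group_size` replies in the same slot.
from collections import Counter


def reply_slot(node_id: int, group_size: int = 10) -> int:
    """Index of the time slot in which this node replies."""
    if node_id < 0 or group_size < 1:
        raise ValueError("node_id must be >= 0 and group_size >= 1")
    return node_id // group_size


def reply_delay(node_id: int, slot_s: float = 0.002, group_size: int = 10) -> float:
    """Seconds a node waits after the broadcast before sending its reply."""
    return reply_slot(node_id, group_size) * slot_s


def max_simultaneous(n_nodes: int, group_size: int = 10) -> int:
    """Largest number of nodes that share a single reply slot."""
    slots = Counter(reply_slot(i, group_size) for i in range(n_nodes))
    return max(slots.values())


if __name__ == "__main__":
    print("max replies per slot:", max_simultaneous(300))
    print("last node waits (s):", reply_delay(299))
```

With these assumed values, 300 slaves are spread over 30 slots, trading a short delay for a bounded load on the master's interface.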
MAP user • Prepare a job • Submit to Batch Queue • Histograms/Ntuples transmitted back at end of job/DSTs • Random Numbers handled automatically
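Automatic random-number handling usually means that every node gets a different, reproducible seed without the user choosing one. The slides do not say how MAP does this; the sketch below is one hypothetical scheme that derives a seed from a job identifier and a node number. Hash-derived seeds give distinct, repeatable streams, but they are not a formal guarantee of statistical independence between streams.

```python
# Hypothetical per-node seeding scheme (not taken from the MAP software).
# A seed is derived from the job name and node number, so each node's
# generator is different from the others and reproducible on re-running.
import hashlib
import random


def node_seed(job_id: str, node_id: int) -> int:
    """Deterministic 64-bit seed for one node of one job."""
    digest = hashlib.sha256(f"{job_id}:{node_id}".encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


def node_generator(job_id: str, node_id: int) -> random.Random:
    """Random generator seeded for one node of one job."""
    return random.Random(node_seed(job_id, node_id))


if __name__ == "__main__":
    for node in range(3):
        rng = node_generator("example_job", node)
        print(node, node_seed("example_job", node), rng.random())
```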
MAP Status • In production for about 6 weeks • 300 Processors • produced about 240,000 LHCb events per 24 hours • 5 million events produced to date • Also produced DELPHI DSTs (500,000 per 24 hours) • All Processors tested • Further Air-Conditioning installed • fully commissioned 22/11/99
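The production figures above imply a per-processor rate that is easy to work out. The sketch below uses only the numbers on the slide and assumes all 300 processors contribute equally; the function names are illustrative.

```python
# Back-of-envelope throughput from the status slide.
# ASSUMPTION: the load is spread evenly over all processors.

SECONDS_PER_DAY = 86_400


def seconds_per_event(events_per_day: int, n_processors: int) -> float:
    """Average CPU time one processor spends on one event."""
    return SECONDS_PER_DAY * n_processors / events_per_day


def days_at_full_rate(total_events: int, events_per_day: int) -> float:
    """Days of continuous running needed to produce total_events."""
    return total_events / events_per_day


if __name__ == "__main__":
    print("s/event/processor:", seconds_per_event(240_000, 300))
    print("days for 5M events:", round(days_at_full_rate(5_000_000, 240_000), 1))
```

At the quoted rate, 5 million events correspond to roughly three weeks of continuous running, which fits within the six weeks of production mentioned on the slide.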
MAP Issues • Packet Loss • At the UDP (or frame) level, losses have to be handled in code. Now not a problem(!) • Higher performance with shielded cables? • no • Power • Infrastructure for cooling • Power up/down
Emergency Power Down • Unplanned power interruption • Exploding substation! • About 4% of PCs need manual intervention
MAP capabilities • Can be used in “throwaway” mode • Can also write events as generated • MAP possesses 1 Tbyte internal storage • 3 Gbytes/machine • events stored locally (1 million events) • repeatedly analyse QUICKLY • MAP can handle interprocess communication
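The 1 Tbyte figure follows from the per-machine disk size. The sketch below checks that arithmetic and, as a rough bound only, the largest average event size that would let one million events fit; the slides do not give an event size, so the second number is an upper limit under the assumption that the whole disk is available for events.

```python
# Aggregate storage from the hardware numbers on the slides.
# ASSUMPTION for the bound: all disk space is available for event data.


def total_storage_gb(n_machines: int, gb_per_machine: float) -> float:
    """Total local disk across the farm, in Gbytes."""
    return n_machines * gb_per_machine


def max_event_size_mb(total_gb: float, n_events: int) -> float:
    """Upper bound on average event size (Mbytes) if n_events fill total_gb."""
    return total_gb * 1000 / n_events


if __name__ == "__main__":
    total = total_storage_gb(300, 3)
    print("total storage (Gbytes):", total)
    print("max Mbytes/event for 1e6 events:", max_event_size_mb(total, 1_000_000))
```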
MAP++
COMPASS Computerized Analysis and Storage Server
COMPASS • Purpose • Will show this in place and working with MAP • Model for LHC analysis • store events on disks (cheap!) • move JOB to the DATA • NO HSM
Outline • Hardware • Linux Device Drivers • Linux Installation and Limits • Benchmarking Tests • Results • Future
Trial Hardware • Dell PowerEdge Server, 450 MHz Pentium III, 256 MB RAM, with 4 internal SCSI disks • 4 PowerVault 1200 Disk Servers, each with 8 Ultra Wide SCSI LVD disks (7200 rpm spindles) • Total > 1 TB disk space • Adaptec Ultra Wide SCSI cards
ITS • Purchased rack-mounted • 1 Tbyte based on 50 Gbyte 7200 rpm disks • Redundant Power Supplies • 15k GBP/Tbyte, including two 500 MHz PIII • More storage underway
Linux Device Drivers • Devices are accessed through special files in the /dev directory, each specifying a block or character device and a major/minor number pair. • The major number refers to a device driver, e.g. 8 is a SCSI disk (see /usr/src/linux/include/linux/major.h) • For disks, the minor number refers to the disk or a partition on the disk, e.g. /dev/sda (major 8, minor 0): first SCSI disk found on system; /dev/sda1 (major 8, minor 1): first partition; /dev/sda15 (major 8, minor 15): last partition on first disk; /dev/sdb (major 8, minor 16): second SCSI disk found on system • Minor numbers are 8-bit, i.e. only have values in the range 0-255, and each disk takes 16 of them (the whole disk plus 15 partitions), so there are only 16 disks per SCSI disk major number.
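The numbering rule on this slide can be written down directly. The sketch below computes the minor number for a given disk and partition under SCSI disk major 8 and shows where the limit of 16 disks comes from; the function names are illustrative.

```python
# Minor-number layout for SCSI disks under a single major number,
# following the scheme described on the slide.

MINOR_BITS = 8                 # minor numbers are 8-bit: 0..255
MINORS_PER_DISK = 16           # whole disk (0) plus partitions 1..15


def scsi_minor(disk_index: int, partition: int = 0) -> int:
    """Minor number of a disk (partition 0) or of one of its partitions."""
    if not 0 <= partition < MINORS_PER_DISK:
        raise ValueError("partition must be in 0..15")
    minor = disk_index * MINORS_PER_DISK + partition
    if not 0 <= minor < 2 ** MINOR_BITS:
        raise ValueError("disk does not fit under one major number")
    return minor


def disks_per_major() -> int:
    """How many disks one major number can address."""
    return 2 ** MINOR_BITS // MINORS_PER_DISK


if __name__ == "__main__":
    print("/dev/sda   ->", scsi_minor(0))
    print("/dev/sda15 ->", scsi_minor(0, 15))
    print("/dev/sdb   ->", scsi_minor(1))
    print("disks per major:", disks_per_major())
```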
Linux Installation & Limits • RedHat Linux 5.2: Kernel 2.0.x • Used at Liverpool and CERN. Problem: only one SCSI major number is defined, so a maximum of 16 SCSI disks is allowed. • Kernel “hacking” necessary to register a new SCSI major number with the system. • RedHat Linux 6.0: Kernel 2.2.x • Defines 8 SCSI major numbers (8 and 65-71), giving a maximum of 128 SCSI disks. • Have to create some special files in /dev by hand, which is relatively trivial with mknod • Physical limit of only 4 PCI slots for SCSI cards on the motherboard
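Creating the missing special files by hand amounts to running mknod with the right name, major and minor for each disk. The sketch below only generates the command strings (it does not run them) for the eight SCSI disk majors listed on the slide; the helper names are illustrative, and the device naming beyond /dev/sdz (sdaa, sdab, ...) follows the usual Linux convention.

```python
# Generate mknod commands for SCSI disks beyond the first 16.
# This prints commands only; running them requires root on a real system.
import string

SCSI_MAJORS = [8, 65, 66, 67, 68, 69, 70, 71]   # from the slide
DISKS_PER_MAJOR = 16
MINORS_PER_DISK = 16


def disk_name(index: int) -> str:
    """Device name for the index-th SCSI disk: sda..sdz, then sdaa, sdab, ..."""
    letters = ""
    n = index
    while True:
        letters = string.ascii_lowercase[n % 26] + letters
        n = n // 26 - 1
        if n < 0:
            break
    return "sd" + letters


def mknod_command(index: int, partition: int = 0) -> str:
    """mknod command line for a disk (partition 0) or one of its partitions."""
    if not 0 <= index < len(SCSI_MAJORS) * DISKS_PER_MAJOR:
        raise ValueError("only 128 SCSI disks are addressable")
    if not 0 <= partition < MINORS_PER_DISK:
        raise ValueError("partition must be in 0..15")
    major = SCSI_MAJORS[index // DISKS_PER_MAJOR]
    minor = (index % DISKS_PER_MAJOR) * MINORS_PER_DISK + partition
    name = disk_name(index) + (str(partition) if partition else "")
    return f"mknod /dev/{name} b {major} {minor}"


if __name__ == "__main__":
    for i in (16, 17, 127):
        print(mknod_command(i))
```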
Benchmarking Tests • Use CERN sequential IO tests for read / write / calibration. • Block sizes from 1024 bytes to 0.5 Mbytes • Write test: calculates the average write rate over the previous 10 writes • Read ... • Calibration: comment out the write statement and run the write tests again. • A modified version of the above calculates averages over the whole file.
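The write test described above can be sketched as a loop that times each block write and reports the rate over the most recent ten writes. This is not the CERN test program, only a minimal illustration of the same measurement; the function name, file name and parameters are assumptions.

```python
# Minimal sequential-write benchmark with a sliding average over the last
# `window` writes. Illustrative only; not the CERN sequential IO test.
import os
import tempfile
import time
from collections import deque


def sequential_write_rates(path, block_size, n_blocks, window=10):
    """Write n_blocks blocks to path; return the rate (Mbytes/s) after each
    write, averaged over the previous `window` writes."""
    block = b"\0" * block_size
    recent = deque(maxlen=window)          # durations of the latest writes
    rates = []
    with open(path, "wb", buffering=0) as f:
        for _ in range(n_blocks):
            start = time.perf_counter()
            f.write(block)
            recent.append(time.perf_counter() - start)
            elapsed = max(sum(recent), 1e-9)   # guard against a zero timer reading
            rates.append(len(recent) * block_size / elapsed / 1e6)
        os.fsync(f.fileno())               # flush to disk before closing
    return rates


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        target = os.path.join(tmp, "bench.dat")
        for size in (1024, 64 * 1024, 512 * 1024):
            rates = sequential_write_rates(target, size, 100)
            print(f"block {size:>7} bytes: last-10 average {rates[-1]:.1f} Mbytes/s")
```

On a modern system the operating system's page cache will dominate these numbers, so results are only comparable when the file is much larger than memory or caching is bypassed.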
Results • All disks accessible • Performance uniform • writing at about 20 Mbytes/s • reading at 50 Mbytes/s (or better) • large block sizes faster
Future • Can we find funding for large(r) scale prototype? • Applications outside of Physics • Interdisciplinary funding
Summary • MAP yields high performance at low cost • Storage can be cheap • R&D to enhance performance • Production for LHCb vertex detector