
DAQ

DAQ. Andrea Petrucci 6 May 2008 – CMS-UCSD meeting. OUTLINE Introduction SCX Setup Run Control Current Status of the Tests Summary. Introduction. Started commissioning the Readout Builder at its full size Many people working together to get this done


Presentation Transcript


  1. DAQ
     Andrea Petrucci, 6 May 2008 – CMS-UCSD meeting
     OUTLINE
     • Introduction
     • SCX Setup
     • Run Control
     • Current Status of the Tests
     • Summary

  2. Introduction
     • Started commissioning the Readout Builder at its full size
     • Many people are working together to get this done
     • For the first time we have almost two full DAQ slices to test
     • For now, tests are limited to two slices of ~640 PCs (rows A, B, E and F)
     • We are still gaining experience with the installation and maintenance of a cluster of O(1000) PCs
     • For the XDAQ software and Run Control, too, this is the first time we work with ~1000 PCs communicating with each other
     Andrea Petrucci - UC San Diego

  3. SCX layout
     • RU, 320 PCs:
       • Row A with 2 rails (ru-c2a[1-4]-[1-20])
       • Row B with 2 rails (ru-c2b[1-4]-[1-20])
       • Row E with 2 rails (ru-c2e[1-4]-[1-20])
       • Row F with 4 rails (ru-c2f[1-4]-[1-20])
     • BU-FU, 320 PCs:
       • Row A with 2 rails (ru-c2a[5-8]-[1-20])
       • Row B with 2 rails (ru-c2b[5-8]-[1-20])
       • Row E with 2 rails (ru-c2e[5-8]-[1-20])
       • Row F with 2 rails (ru-c2f[5-8]-[1-20])
     • Rows A and B are connected to one Force10 switch, rows E and F to the other.
     [Diagram: rows A, B, E and F and the 2 Force10 switches]
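The bracketed host-name patterns on this slide can be expanded mechanically. The following is a minimal sketch, assuming each `[m-n]` stands for an inclusive integer range; the helper name `expand` is hypothetical and not part of any DAQ tool. It reproduces the 320 RU hosts quoted on the slide (4 rows x 4 racks x 20 PCs).

```python
import re
from itertools import product


def expand(pattern):
    """Expand inclusive ranges such as 'ru-c2a[1-4]-[1-20]' into host names."""
    # Splitting on a capturing group keeps the bracket expressions in the result.
    parts = re.split(r"(\[\d+-\d+\])", pattern)
    choices = []
    for part in parts:
        match = re.fullmatch(r"\[(\d+)-(\d+)\]", part)
        if match:
            low, high = int(match.group(1)), int(match.group(2))
            choices.append([str(i) for i in range(low, high + 1)])
        else:
            choices.append([part])  # literal text, possibly empty
    return ["".join(combo) for combo in product(*choices)]


# RU patterns for rows A, B, E and F as listed on the slide.
ru_patterns = [f"ru-c2{row}[1-4]-[1-20]" for row in "abef"]
ru_hosts = [host for pattern in ru_patterns for host in expand(pattern)]

print(len(ru_hosts))                # 320
print(ru_hosts[0], ru_hosts[-1])    # ru-c2a1-1 ru-c2f4-20
```

A list like this is the kind of input one would need when issuing the same command to every PC in a row.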

  4. SCX Setup
     • Used DummyRUs and ~200 FRLs (Tracker) for testing
     • Different types of trapezoidal configurations:
       • 1 slice with 4 rails (68 DummyRUs x 224 BUs)
       • 1 slice with 4 rails (200 FRLs x 68 RUs x 224 BUs x 672 FUs), close to the final slice
       • 4 slices with 2 rails (per slice: 32 RUs x 47 BUs x 147 FUs)
       • 8 slices with 2 rails (per slice: 32 RUs x 47 BUs x 147 FUs)
     • Many different activities are going on in parallel:
       • System and software installation/update
       • System monitoring optimization
       • …
       • Testing the first slice
     • The XDAQ installation is XDAQ build 6
     • Monitoring system (slp, sentinel, …) is enabled
     • During the last months the system was down many times, and it takes some time to set up.

  5. RU Builder Slices

  6. DAQ Software Installation
     • All DAQ software installation is managed by a central Quattor server. Quattor is a system administration toolkit providing a powerful, portable and modular tool suite for the automated installation, configuration and management of clusters and farms running Linux.
     • Quattor can reinstall a PC in a few minutes.
     • There are different Quattor templates for each type of PC:
       • RU and BUFU PCs
       • Run Control PCs
       • FRL and FMM PCs
       • Etc.
     • All the DAQ software developers have put a lot of effort into Quattorizing their software (RPM).

  7. DAQ Configurator
     • Central DAQ System
       • Currently O(1000) hosts
       • ~10% controlling custom hardware
       • O(10000) XDAQ applications
     • A DAQ Configuration contains
       • One XML configuration file per XDAQ executive
       • Including Myrinet FED-Builder configuration
       • Including O(100000) I2O connections
       • Up to several 100 MB of XML
     • Control structure
       • Hierarchy of function managers
       • Executives and Applications to be controlled
     [Diagram labels: 40 MHz; 2 107 electronics channels; 100 Hz]

  8. DAQ Configurator Data Flow
     [Diagram: CMS DAQ Configurator data flow. Components: Configurator GUI, JAVA Fillers, SWTemplate GUI, FBSet, EQSet, DPSet, Software TemplateDB, RS3, HWCfg Database. Interfaces: Configurator API, RS API, Hardware Configuration API, Software Template API.]
     Steps labelled in the diagram:
     1. Create FEDBuilderSets & DAQPartitionSets
     2. Manage/create Software Templates
     3. Select DAQPartition (Hardware Structure) & Software Template
     4. Fill DB
     5. Load configuration and configure the system

  9. Run Control and Monitor System
     RCMS is integrated in the general CMS DAQ system, providing control and monitoring of the two other components:
     • the DAQ components, whose task is to manage the main data flow. They include the Front End Drivers (FED), the Readout Units (RU), the Builder Units (BU), the Filter Units (FU), and the trigger and data flow control system.
     • the “Detector Control System” (DCS), managing the slow controls of the whole experiment.
     The XML data format and the W3C standard SOAP protocol have been adopted as the main means of communication.
     XDAQ is a C++ framework for distributed data acquisition systems. It implements:
     • configuration (parameterization)
     • communication over multiple network technologies concurrently
     • high-level provision of system services (memory management, tasks, ...)

  10. RCMS Services
     • SECURITY SERVICE
       • Login and user account management
     • RESOURCE SERVICE (RS)
       • Information about DAQ resources and partitions
     • INFORMATION AND MONITOR SERVICE (IMS)
       • Collects messages and monitor data; distributes them to the subscribers
     • JOB CONTROL
       • Starts, monitors and stops the software elements of RCMS, including the DAQ components

  11. Logging System
     • Collects log information from log4j-compliant applications (i.e. on-line processes).
     • Sends log information directly to a display system (Chainsaw).
     • Stores log information in a database and visualizes it (LogDBViewer).
     [Diagram labels: RCMS applications and XDAQ applications; Access via TCP; Log Collector; Publish Subscriber System; Access via JDBC; Storage System; Relational DB (Oracle, MySQL)]

  12. Function Managers Control Structure
     • The user interacts through a web browser (GUI) connected to the Level 0 FM.
     • The Level 0 FM (TOP) is the entry point to the Run Control System.
     • Level 1 FMs interface to the Level 0 FM and have to implement a standard set of inputs and states.
     • Level 2 FMs are sub-system specific custom implementations.
     • Resources are on-line system components.
     [Diagram labels: Level 1 FM: ECAL, LTC, RPC, DT, CSC, DAQ, TRK, HCAL; Level 2 FM: FB, RB, FF, FEC, FED; Resources]
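The hierarchy described on this slide can be illustrated with a small tree in which a command travels from the Level 0 FM down to the leaves. This is only a sketch under stated assumptions: the class `FunctionManager`, the state names and the grouping of FB, RB and FF under DAQ are hypothetical and are not taken from the RCMS API or from the slide.

```python
class FunctionManager:
    """Hypothetical function manager node; not the RCMS API."""

    def __init__(self, name, children=()):
        self.name = name
        self.children = list(children)
        self.state = "Initial"  # assumed state name

    def handle(self, command, target_state):
        """Forward a command down the tree, then update this node's state."""
        for child in self.children:
            child.handle(command, target_state)
        # A parent reaches the target state only if every child did.
        # A leaf has no children, so it always reaches the target here;
        # a real leaf would talk to its resources instead.
        if all(child.state == target_state for child in self.children):
            self.state = target_state
        else:
            self.state = "Error"
        return self.state


# Level 2 FMs grouped under DAQ (grouping assumed), Level 1 FMs under Level 0.
daq = FunctionManager("DAQ", [FunctionManager(n) for n in ("FB", "RB", "FF")])
trk = FunctionManager("TRK")
level0 = FunctionManager("Level 0", [daq, trk])

print(level0.handle("Configure", "Configured"))
```

Requiring every child to reach the target state before the parent does is one way to give each level the same standard set of inputs and states.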

  13. Run Control GUIs
     1) RCMS GUI
     2) Function Manager Level Zero GUI
     3) FED and TTS GUI

  14. Tests & Measurements DAQ System
     GOALS
     • Understand the problems of running a big DAQ system:
       • Reliability, scalability and monitoring system.
     • Measurements:
       • Determine whether the performance of the system is acceptable.
     TESTED CONFIGURATIONS
     • Different configurations have been tested:
       • A: 68 dummy RUs x 224 BUs, 4 rails from the RUs and 2 rails to the BUs.
       • B: 68 dummy RUs x 224 BUs x 672 FUs, 4 rails from the RUs and 2 rails to the BUs.
       • C: 8 slices with GTPe and ~200 FRLs, per slice: 32 RUs x 47 BUs x 147 FUs (CMSSW locally).
       • D: 4 slices with GTPe and ~100 FRLs, per slice: 32 RUs x 47 BUs x 147 FUs (CMSSW NFS).
     • Test B should perform almost the same as the final slice configuration (72 RUs x 288 BUs x 864 FUs).
     For these tests I created a stand-alone Java application. It controls the Level Zero FM through the following commands:
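The command list itself is not preserved in this transcript. As a rough sketch, assuming the commands match the measurement loop named on the test slides (Create, Initialize, Connect, Configure, Get Ready, Start, Stop, Destroy), a driver that times each transition and counts failures could look like the following. It is written in Python rather than Java for brevity; `send_command` is a hypothetical callback standing in for whatever interface reaches the Level Zero FM, and the clean-up on failure is an assumption.

```python
import time

# Loop steps as named on the test slides.
COMMANDS = ["Create", "Initialize", "Connect", "Configure",
            "Get Ready", "Start", "Stop", "Destroy"]


def run_measurements(send_command, iterations):
    """Run the loop `iterations` times; return durations and failure counts."""
    durations = {command: [] for command in COMMANDS}
    failures = {command: 0 for command in COMMANDS}
    for _ in range(iterations):
        for command in COMMANDS:
            started = time.perf_counter()
            ok = send_command(command)  # True if the transition succeeded
            durations[command].append(time.perf_counter() - started)
            if not ok:
                failures[command] += 1
                if command != "Destroy":
                    send_command("Destroy")  # best-effort clean-up (assumed)
                break  # abandon this iteration and start the next one
    return durations, failures


def fake_send(command):
    """Stand-in for the real control channel; always succeeds."""
    return True


durations, failures = run_measurements(fake_send, iterations=3)
print({command: len(times) for command, times in durations.items()})
print(failures)
```

Dividing each failure count by the number of iterations gives per-transition failure rates of the kind quoted in the problems slide.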

  15. Test A: Only EVB
     • Setup parameters:
       • Dummy events are created in the BUs in generation mode.
       • 1 slice with 1x1 FED Builders; events are dropped at the BUs.
       • 68 dummy RUs x 224 BUs, 4 rails from the RUs and 2 rails to the BUs.
       • Used rows E and F (~320 PCs).
       • Controlled 293 XDAQ executives and 585 XDAQ applications (ATCPs, EVM, RUs and BUs).
       • XDAQ Monitor Application enabled.
       • 50 iterations of the measurement loop (Create, Initialize, Connect, Configure, Get Ready, Start, Stop and Destroy).
     • Results:
       • RU throughput at 16 and 32 kByte fragment size: ~480 MB/s.

  16. Test B: EVB & Filter Farms
     • Setup parameters:
       • Dummy events are created in the BUs in generation mode.
       • 1 slice with 1x1 FED Builders; events are dropped at the FUs.
       • 68 dummy RUs x 224 BUs, 4 rails from the RUs and 2 rails to the BUs.
       • 3 FUs per BU and 1 Storage Manager.
       • Used rows E and F (~320 PCs).
       • Controlled 965 XDAQ executives and 1539 XDAQ applications (ATCPs, EVM, RUs, BUs, FUResourceBrokers and FUEventProcessors).
       • All libraries were loaded from local disk.
       • XDAQ Monitor Application enabled.
       • 100 iterations of the measurement loop (Create, Initialize, Connect, Configure, Get Ready, Start and Destroy).
     • Results:
       • Could not reach the running state because the Filter Farm applications crashed.

  17. Test C: all system with 8 Slices
     • Setup parameters:
       • Events are generated in ~200 FRLs, using the GTPe.
       • 8 slices with 8x8 FED Builders; events are sent to the Storage Manager.
       • 2 rails from the RUs and the BUs.
       • Per slice: 32 RUs x 47 BUs x 147 FUs.
       • Used rows A, B, E and F (~640 PCs) for Event Builder and Filter Farm.
       • Controlled 1976 XDAQ executives and 3202 XDAQ applications (ATCPs, FRLs, EVM, RUs, BUs, FUResourceBrokers, FUEventProcessors and Storage Managers).
       • XDAQ Monitor Application enabled; all libraries were loaded from local disk.
       • 83 iterations of the measurement loop (Create, Initialize, Connect, Configure, Get Ready, Start, Stop and Destroy).
     • Results:
       • 240 MB/s throughput all the way to the Storage Manager disk (event size 480k)
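The quoted result implies an event rate, since rate = throughput / event size. The arithmetic below is a sketch that reads "480k" as 480 kB and uses decimal units (1 MB = 10^6 bytes); both readings are assumptions, and the slide does not say whether the 240 MB/s is per slice or summed over the 8 slices.

```python
throughput_bytes_per_s = 240_000_000  # 240 MB/s, decimal megabytes assumed
event_size_bytes = 480_000            # "480k" read as 480 kB (assumption)

# Events written per second at the Storage Manager under these assumptions.
event_rate_hz = throughput_bytes_per_s / event_size_bytes
print(event_rate_hz)  # 500.0
```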

  18. Test D: all System 4 Slices
     • Setup parameters:
       • Events are generated in ~100 FRLs, using the GTPe.
       • 4 slices with 4x4 FED Builders; events are sent to the Storage Manager.
       • 2 rails from the RUs and the BUs.
       • Per slice: 32 RUs x 47 BUs x 147 FUs.
       • Used rows E and F (~320 PCs) for Event Builder and Filter Farm.
       • Controlled 988 XDAQ executives and 1601 XDAQ applications (ATCPs, FRLs, EVM, RUs, BUs, FUResourceBrokers, FUEventProcessors and Storage Managers).
       • XDAQ Monitor Application enabled; Filter Farm libraries were loaded from NFS.
       • 100 iterations of the measurement loop (Create, Initialize, Connect, Configure, Get Ready, Start, Stop and Destroy).
     • Results:
       • The system gets slower and less reliable when libraries are loaded from NFS.

  19. Tests summary
     • Performance:
       • Configuration B (close to final slice):
         • Reasonable time to initialize, connect and configure.
       • Configuration C:
         • The system scales well.
       • Configuration D:
         • The system loses performance if it loads libraries from NFS disk (~2 times slower).

  20. Problems during the tests
     • Problems observed during the tests:
       • ~15% of the time the system failed to initialize. The XDAQ executive could not start because the HTTP address was already in use. The ATCP application had the same problem.
         • FIXED: Setting the XDAQ HTTP port outside the UNIX ephemeral port range was enough to solve the problem.
       • The system could not reach the running state because of a segmentation fault in the communication between BU and FUResourceBroker.
         • FIXED: A bug was found and is fixed in CMSSW version 2.0.4.
       • The system gets stuck in the configuring state ~5% of the time. This is reproducible only with a big system (8 slices and all rows A, B, E and F).
         • Work in progress: the problem seems to be in the RunControl Framework.
       • The system fails to start (~5% of the time) and to stop (~40% of the time).
         • Work in progress: DAQ function managers need to be improved.
       • The XDAQ monitor system has a latency of 2 to 3 minutes.
         • Work in progress: XDAQ developers are working to improve it.
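The first fix relies on the fact that the operating system hands out local ports for outgoing connections from the ephemeral range, so a server port inside that range may already be taken when the server tries to bind. The sketch below checks a candidate port against that range. It assumes a Linux host where the range is exposed in `/proc/sys/net/ipv4/ip_local_port_range`; the fallback range (32768 to 60999, a common default on recent Linux kernels) and the function names are assumptions for illustration, not XDAQ settings.

```python
from pathlib import Path

# Fallback if the kernel setting cannot be read (assumed typical default).
DEFAULT_RANGE = (32768, 60999)


def ephemeral_range(path="/proc/sys/net/ipv4/ip_local_port_range"):
    """Return (low, high) of the ephemeral port range, or the fallback."""
    try:
        low, high = Path(path).read_text().split()
        return int(low), int(high)
    except (OSError, ValueError):
        return DEFAULT_RANGE


def is_outside_ephemeral(port, port_range):
    """True if `port` cannot collide with an OS-assigned ephemeral port."""
    low, high = port_range
    return not (low <= port <= high)


# Hypothetical candidate ports, checked against an explicit range.
print(is_outside_ephemeral(40000, (32768, 60999)))  # False
print(is_outside_ephemeral(20000, (32768, 60999)))  # True
print(ephemeral_range())                            # depends on the host
```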

  21. ATCP application
     • Reasonable time to connect all the sockets (max. 15 sec. for 1 slice)
     • Solved the “address already in use” problem when starting the listening socket.
     • Created a new HyperDAQ interface:
       • Added “Standard configuration” parameters.
       • Added “debug” page.
       • Integrated into the XDAQ monitor system.

  22. Summary
     • RU Builder Commissioning
       • For the first time, used an RU Builder configuration almost the same as the final slice
       • It seems to work fine at 20 kHz per slice, with a maximum throughput on the RUs of ~480 MB/s
       • FUs and monitor system applications are included
       • Reasonable time to initialize and start the system
       • Some things are not yet understood (e.g. failures to start and stop)
     • Main worries are system instabilities
       • Cooling and its monitoring
       • Power cuts
       • Quattor installation
       • System configuration
       • Difficulties issuing commands on many PCs at the same time
