MONARC 2 - distributed systems simulation -

Presentation Transcript


  1. MONARC 2 - distributed systems simulation -

  2. The Goals of the Project
  • To perform realistic simulation and modelling of large scale distributed computing systems, customised for specific large scale HEP applications.
  • To provide a design framework to evaluate the performance of a range of possible computer systems, as measured by their ability to provide the physicists with the requested data in the required time, and to optimise the cost.
  • To narrow down a region in this parameter space in which viable models can be chosen by any of the future LHC-era experiments.
  • To offer a dynamic and flexible simulation environment.

  3. LHC Computing: Different from Previous Experiment Generations
  In one of the four LHC detectors (CMS), the online system's multi-level trigger filters out background and reduces the data volume:
  • 40 MHz (40 TB/sec) - level 1: special hardware
  • 75 KHz (75 GB/sec) - level 2: embedded processors
  • 5 KHz (5 GB/sec) - level 3: PCs
  • 100 Hz (100-1000 MB/sec) - data processing: offline analysis and selection
  Raw recording rate: 0.1 - 1 GB/sec; 3 - 8 PetaBytes / year.

  4. Off-Line LHC Computing: Data Analysis
  • Geographical dispersion of people and resources
  • Complexity: the detector and the LHC environment
  • Scale: ~100 times more processing power; Petabytes per year of data
  CMS alone: 1800 physicists, 150 institutes, 32 countries.
  This is a VERY LARGE SCALE DISTRIBUTED SYSTEM, AND IT HAS TO PROVIDE (NEAR) REAL-TIME DATA ACCESS FOR ALL THE PARTICIPANTS.

  5. Regional Center Hierarchy (Worldwide Data Grid)
  • Experiment / Online System: ~PBytes/sec; one bunch crossing per 25 nsec; each event is ~1 MByte in size
  • Tier 0+1 - Offline Farm, CERN Computer Center (HPSS mass storage): fed at 100-1000 MBytes/sec from the online system
  • Tier 1 - regional centers (FNAL, Italy, UK, France; each with HPSS): ~0.6 - 2.5 Gbits/sec links to CERN
  • Tier 2 - Tier2 Centers: ~2.4 Gbits/sec links to Tier 1
  • Tier 3 - Institutes (~0.25 TIPS each): ~622 Mbits/sec links to Tier 2; physicists work on analysis "channels"
  • Tier 4 - Workstations with a physics data cache: 100 - 1000 Mbits/sec
  Total processing power: ~200,000 of today's fastest PCs.

  6. Simulation Models
  The simulation model abstracts the components of the real system and their interactions; it must be equivalent to the simulated system.
  Simulation models:
  • continuous time - the system is described by a set of differential equations
  • discrete time - the state changes only at certain moments in time
  In MONARC: a discrete time model (Discrete Event Simulation - DES); the events represent important activities in the system and are managed with the aid of an internal clock.
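
  The DES idea can be illustrated with a minimal, self-contained event loop in Java. This is a sketch, not MONARC code: the names (SimpleDES, Event, schedule) are assumptions for illustration; only the principle - an internal clock that jumps from one scheduled event to the next - follows the slide.

    import java.util.PriorityQueue;

    public class SimpleDES {
        // An event carries a timestamp and an action to run at that simulated time.
        record Event(double time, Runnable action) {}

        private final PriorityQueue<Event> queue =
                new PriorityQueue<>((a, b) -> Double.compare(a.time(), b.time()));
        private double clock = 0.0;   // internal simulation clock

        public void schedule(double time, Runnable action) {
            queue.add(new Event(time, action));
        }

        // Advance the clock from event to event; nothing happens between events.
        public void run() {
            while (!queue.isEmpty()) {
                Event e = queue.poll();
                clock = e.time();     // jump directly to the next state change
                e.action().run();
            }
        }

        public double now() { return clock; }

        public static void main(String[] args) {
            SimpleDES sim = new SimpleDES();
            sim.schedule(10.0, () -> System.out.println("job arrives at t=" + sim.now()));
            sim.schedule(25.0, () -> System.out.println("job finishes at t=" + sim.now()));
            sim.run();
        }
    }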

  7. A Global View for Modelling
  The framework is layered: the Simulation Engine at the base; Basic Components (CPU, LAN, WAN, DB, Jobs, Scheduler) built on top of it; Specific Components (Analysis Job, MetaData Catalog, Distributed Scheduler) above those; and the Computing Models at the top. MONITORING of REAL systems and testbeds complements the models.

  8. Regional Center Model
  A REGIONAL CENTER contains: a FARM of CPUs executing Jobs; a Job Scheduler that assigns the jobs (AJobs) to the CPUs; DB Servers with a DB Index holding the data; Activities that generate the jobs; and LinkPorts connecting the local LAN to the WAN.

  9. The Simulation Engine
  • Provides the multithreading mechanism for the simulation
  • The entities with time-dependent behavior are mapped onto "active objects"
  • The simulation engine manages the active objects and the events
  • Threads are reused via a thread pool
  Main classes: Engine, EventQueue, Event, Task, WorkerThread, Pool, Farm, CPUUnit, Job, AJob, JobScheduler, Activity, Scheduler.
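
  The thread-reuse idea can be sketched with a standard Java executor. This is not the MONARC engine and the names are illustrative; the point is that "active objects" are tasks handed to a fixed pool of worker threads, so each simulated entity does not need its own OS thread.

    import java.util.concurrent.ExecutorService;
    import java.util.concurrent.Executors;

    public class ActiveObjectPool {
        // A fixed pool: worker threads are reused as active objects run and finish.
        private final ExecutorService pool = Executors.newFixedThreadPool(8);

        // An active object is any entity with time-dependent behaviour (a job, a transfer).
        public void activate(Runnable activeObject) {
            pool.submit(activeObject);
        }

        public void shutdown() { pool.shutdown(); }

        public static void main(String[] args) {
            ActiveObjectPool engine = new ActiveObjectPool();
            for (int i = 0; i < 100; i++) {
                final int id = i;
                engine.activate(() ->
                    System.out.println("active object " + id + " on " +
                                       Thread.currentThread().getName()));
            }
            engine.shutdown();
        }
    }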

  10. Multitasking Processing Model
  Concurrent running tasks share resources (CPU, memory, I/O) under an "interrupt" driven scheme: for each new task, or when one task finishes, an interrupt is generated and all "processing times" are recomputed.
  It provides:
  • An efficient mechanism to simulate multitask processing
  • Handling of concurrent jobs with different priorities
  • An easy way to apply different load balancing schemes
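
  A sketch of the interrupt-driven recomputation, under the simplifying assumption that all concurrent tasks get an equal CPU share (the priorities mentioned above are omitted); class and method names are hypothetical, not MONARC's:

    import java.util.ArrayList;
    import java.util.List;

    public class CpuShareModel {
        static class Task {
            double remainingWork;          // in CPU-seconds at full speed
            Task(double work) { this.remainingWork = work; }
        }

        private final double cpuPower = 1.0;      // CPU-seconds delivered per second
        private final List<Task> running = new ArrayList<>();
        private double lastUpdate = 0.0;

        // Called on every "interrupt" (task arrival or completion) at time now:
        // settle the progress made under the old CPU share.
        private void accrueWork(double now) {
            if (!running.isEmpty()) {
                double perTaskSpeed = cpuPower / running.size();  // equal share
                double done = perTaskSpeed * (now - lastUpdate);
                running.forEach(t -> t.remainingWork -= done);
            }
            lastUpdate = now;
        }

        public void addTask(double now, Task t) {
            accrueWork(now);               // interrupt: recompute all processing times
            running.add(t);                // from now on every task runs slower
        }

        // Earliest completion time, assuming no further interrupts arrive.
        public double nextCompletion() {
            double min = Double.POSITIVE_INFINITY;
            for (Task t : running)
                min = Math.min(min, t.remainingWork * running.size() / cpuPower);
            return lastUpdate + min;
        }

        public static void main(String[] args) {
            CpuShareModel cpu = new CpuShareModel();
            cpu.addTask(0.0, new Task(10.0));
            cpu.addTask(2.0, new Task(4.0));   // interrupt: both now run at half speed
            System.out.println("next completion at t=" + cpu.nextCompletion());
        }
    }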

  11. Engine Tests
  Processing a TOTAL of 100,000 simple jobs on 1, 10, 100, 1,000, 2,000, 4,000 and 10,000 CPUs (number of CPUs = number of parallel threads).
  More tests: http://monalisa.cacr.caltech.edu/MONARC/

  12. Job Scheduling
  • Dynamically loadable scheduler modules for each regional center
  • Basic job scheduler: assigns the jobs to CPUs from the local farm
  • More complex schedulers: allow job migration between regional centers
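
  A basic scheduler of this kind reduces to picking the least-loaded CPU of the local farm. The sketch below uses hypothetical names, and queue length as the load metric is an assumption for illustration:

    import java.util.List;

    public class BasicJobScheduler {
        static class Cpu {
            final String name;
            int queuedJobs = 0;
            Cpu(String name) { this.name = name; }
        }

        // Assign a job to the least-loaded CPU of the local farm.
        static Cpu assign(List<Cpu> farm) {
            Cpu best = farm.get(0);
            for (Cpu c : farm)
                if (c.queuedJobs < best.queuedJobs) best = c;
            best.queuedJobs++;
            return best;
        }

        public static void main(String[] args) {
            List<Cpu> farm = List.of(new Cpu("cpu-0"), new Cpu("cpu-1"), new Cpu("cpu-2"));
            for (int i = 0; i < 5; i++)
                System.out.println("job " + i + " -> " + assign(farm).name);
        }
    }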

  13. Centralized Scheduling
  A GLOBAL Job Scheduler dispatches jobs to the JobSchedulers of the CPU farms at Site A and Site B.

  14. Distributed Scheduling - market model -
  Each site's JobScheduler sends a COST Request to the schedulers of the other sites, and the DECISION on where the job runs is taken from the returned bids.
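
  The market model can be sketched as a two-phase exchange: a COST request to every site, then a decision for the cheapest bid. Everything below (Site, quoteCost, the toy cost formula) is an illustrative assumption, not MONARC's actual interface:

    import java.util.List;

    public class MarketScheduler {
        interface Site {
            String name();
            double quoteCost(double jobSize);   // a site's bid for this job
        }

        static Site decide(List<Site> sites, double jobSize) {
            Site best = null;
            double bestCost = Double.POSITIVE_INFINITY;
            for (Site s : sites) {
                double cost = s.quoteCost(jobSize);   // COST request phase
                if (cost < bestCost) { bestCost = cost; best = s; }
            }
            return best;                              // DECISION phase
        }

        public static void main(String[] args) {
            // Toy sites whose cost grows with their current load.
            record ToySite(String name, double load) implements Site {
                public double quoteCost(double jobSize) { return load + jobSize; }
            }
            Site chosen = decide(List.of(new ToySite("A", 5), new ToySite("B", 2)), 1.0);
            System.out.println("job goes to site " + chosen.name());
        }
    }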

  15. Example: Simple Distributed Scheduling
  • A very simple scheduling algorithm, based on searching for the center with the minimum load
  • We simulated the activity of 4 regional centers
  • When all the centers are heavily loaded, the number of job transfers grows unnecessarily
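
  Below is a sketch of this minimum-load search, plus one possible mitigation of the observed problem (our addition, not from the slide): a migration threshold, so that when all centers are similarly loaded the job stays local and no transfer is generated. All names are hypothetical:

    import java.util.List;

    public class MinLoadScheduler {
        static class Centre {
            final String name;
            double load;                 // e.g. fraction of busy CPUs
            Centre(String name, double load) { this.name = name; this.load = load; }
        }

        static Centre choose(Centre local, List<Centre> remote, double threshold) {
            Centre best = local;
            for (Centre c : remote)
                if (c.load < best.load) best = c;
            // Keep the job local unless the gain clearly exceeds the threshold.
            return (local.load - best.load > threshold) ? best : local;
        }

        public static void main(String[] args) {
            Centre local = new Centre("local", 0.95);
            List<Centre> remote = List.of(new Centre("A", 0.93), new Centre("B", 0.97));
            // All centres heavily loaded: the 2% gain is below the threshold,
            // so the job stays local and no transfer is triggered.
            System.out.println(choose(local, remote, 0.10).name);
        }
    }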

  16. Network Model
  Simulated network components: each Farm has a LAN, connected through a LinkPort to the WAN. The simulation covers both the local traffic inside each LAN and the inter-regional traffic across the WAN.

  17. LAN/WAN Simulation Model
  Nodes on LANs are connected through Links and ROUTERs to the Internet connections between centers.
  "Interrupt" driven simulation: for each new message an interrupt is created, and for all the active transfers the speed and the estimated time to complete the transfer are recalculated. Between events the flow is continuous. This is an efficient and realistic way to simulate concurrent transfers having different sizes / protocols.
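
  The same interrupt scheme as in the multitasking model applies to links. A minimal sketch with illustrative names, assuming equal bandwidth sharing among active transfers (real protocols would share differently):

    import java.util.ArrayList;
    import java.util.List;

    public class SharedLink {
        static class Transfer {
            double remainingBytes;
            Transfer(double bytes) { this.remainingBytes = bytes; }
        }

        private final double bandwidth;        // bytes per second
        private final List<Transfer> active = new ArrayList<>();
        private double lastUpdate = 0.0;

        SharedLink(double bandwidth) { this.bandwidth = bandwidth; }

        // Settle the bytes moved since the last interrupt under the old share.
        private void accrue(double now) {
            if (!active.isEmpty()) {
                double perTransfer = bandwidth / active.size() * (now - lastUpdate);
                active.forEach(t -> t.remainingBytes -= perTransfer);
            }
            lastUpdate = now;
        }

        // A new message is an interrupt: the speeds of all transfers change.
        public void startTransfer(double now, Transfer t) {
            accrue(now);
            active.add(t);
        }

        // Estimated completion time of a transfer, barring further interrupts.
        public double eta(Transfer t) {
            return lastUpdate + t.remainingBytes * active.size() / bandwidth;
        }

        public static void main(String[] args) {
            SharedLink link = new SharedLink(100.0);        // 100 B/s
            Transfer big = new Transfer(1000.0);
            link.startTransfer(0.0, big);
            link.startTransfer(5.0, new Transfer(200.0));   // interrupt: both now at 50 B/s
            System.out.println("big transfer ETA: t=" + link.eta(big));
        }
    }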

  18. Network Model
  The TCP/IP layers are closely followed: the simulated classes - Job, Protocol (TCPProtocol, UDPProtocol), Message, and LinkPort / LAN / WAN - map onto the Application, Transport, Internet and Network Access layers.
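
  One way to picture this layering is a message object carrying one field per layer; the record below is purely illustrative, not a MONARC type:

    public class LayeredMessage {
        enum Protocol { TCP, UDP }

        record Message(String applicationData,      // Application layer payload
                       Protocol transport,          // Transport layer: TCP or UDP
                       String sourceIp,             // Internet layer addressing
                       String destIp,
                       String linkPort) {}          // Network access layer (LinkPort)

        public static void main(String[] args) {
            Message m = new Message("event range 1-1000", Protocol.TCP,
                                    "10.0.0.1", "10.0.0.2", "lan-0/port-3");
            System.out.println("sending " + m);
        }
    }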

  19. Data Model
  Clients reach the data through LinkPorts; a Database Index provides the mapping from requests to Databases; each Database holds DContainers and is managed by a Database Server backed by Mass Storage; Database Tasks operate on Database Entities.

  20. Data Model: Generic Data Container
  • Size
  • Event Type
  • Event Range
  • Access Count
  • Instance
  A META DATA Catalog and a Replication Catalog locate the containers, which may reside in a Data Base, a Custom Data Server, an FTP Server, an NFS Server, or a plain FILE on a Node, and can be exported / imported over the Network.
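
  The container's attributes map naturally onto a small record; the sketch below is an assumption for illustration (field names follow the slide, types are guesses):

    public class DataContainerDemo {
        record DContainer(double sizeMB,
                          String eventType,      // e.g. "TAG", "ESD"
                          long firstEvent,       // event range covered
                          long lastEvent,
                          int accessCount,       // how often it has been read
                          String instance) {}    // which replica / server holds it

        public static void main(String[] args) {
            DContainer c = new DContainer(500.0, "TAG", 1, 100_000, 0, "db-server-CERN");
            System.out.println(c);
        }
    }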

  21. Data Model: Data Processing
  A JOB issues a Data Request; the META DATA Catalog and the Replication Catalog resolve it to the available Data Containers; the job selects from the options and builds its List Of IO Transactions.
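
  The request flow can be sketched as two catalog lookups followed by a selection. The catalogs are plain maps here, and the lowest-cost replica rule is an assumed selection policy, not necessarily MONARC's:

    import java.util.List;
    import java.util.Map;

    public class DataRequestDemo {
        record Replica(String site, double costEstimate) {}

        public static void main(String[] args) {
            // Metadata catalog: event range -> container name.
            Map<String, String> metaCatalog = Map.of("TAG:1-1000", "container-42");
            // Replication catalog: container name -> replicas with an access cost.
            Map<String, List<Replica>> replCatalog = Map.of(
                "container-42",
                List.of(new Replica("CERN", 3.0), new Replica("FNAL", 1.5)));

            String container = metaCatalog.get("TAG:1-1000");
            // Select from the options: here, simply the lowest-cost replica.
            Replica best = replCatalog.get(container).stream()
                    .min((a, b) -> Double.compare(a.costEstimate(), b.costEstimate()))
                    .orElseThrow();
            System.out.println("IO transaction: read " + container + " from " + best.site());
        }
    }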

  22. Activities: Arrival Patterns
  • A flexible mechanism to define the stochastic process of how users perform data processing tasks
  • Dynamic loading of "Activity" tasks, which are threaded objects controlled by the simulation scheduling mechanism
  • Each "Activity" thread generates data processing jobs and submits them to the Regional Centre Farm; these dynamic objects are used to model the users' behavior
  An Activity injecting jobs:
    for (int k = 0; k < jobs_per_group; k++) {
        Job job = new Job(this, Job.ANALYSIS, "TAG", 1, events_to_process);
        farm.addJob(job);   // submit the job to the regional centre farm
        sim_hold(1000);     // wait 1000 s before the next submission
    }

  23. Output of the Simulation
  Any component in the system (a Node, a Router, a DB, the Simulation Engine itself) can generate generic result objects. Any client - GRAPHICS panels, Log Files, EXCEL, a user application - can subscribe through an Output Listener with a filter, and will receive only the results it is interested in. The structure is VERY SIMILAR to the one in MonALISA, and we will soon integrate the output of the simulation framework into MonALISA.
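
  This publish/subscribe structure can be sketched with listeners that carry a filter predicate; all names below are illustrative, not the MONARC or MonALISA API:

    import java.util.ArrayList;
    import java.util.List;
    import java.util.function.Consumer;
    import java.util.function.Predicate;

    public class OutputBus {
        record Result(String source, String name, double value) {}
        record Listener(Predicate<Result> filter, Consumer<Result> sink) {}

        private final List<Listener> listeners = new ArrayList<>();

        public void subscribe(Predicate<Result> filter, Consumer<Result> sink) {
            listeners.add(new Listener(filter, sink));
        }

        // Called by any simulated component that produces a result.
        public void publish(Result r) {
            for (Listener l : listeners)
                if (l.filter().test(r)) l.sink().accept(r);
        }

        public static void main(String[] args) {
            OutputBus bus = new OutputBus();
            // A client interested only in router output (e.g. a graphics panel).
            bus.subscribe(r -> r.source().equals("router"),
                          r -> System.out.println("plot: " + r));
            bus.publish(new Result("router", "throughput", 92.5));   // delivered
            bus.publish(new Result("cpu-0", "load", 0.7));           // filtered out
        }
    }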

  24. Conclusions
  http://monalisa.cacr.caltech.edu/MONARC
  • Modelling and understanding current systems, their performance and limitations, is essential for the design of large scale distributed processing systems. This will require continuous iterations between modelling and monitoring.
  • Simulation and modelling tools must provide the functionality to help in designing complex systems and to evaluate different strategies and algorithms for the decision-making units and the data flow management.
  • Future development: efficient distributed scheduling algorithms, data replication, more complex examples.
