
Farm Batch System (FBS) and Fermi Inter-Process Communication and Synchronization Toolkit (FIPC)

M. Breitung, J. Fromm, T. Levshina, I. Mandrichenko, M. Schweitzer, Fermi National Accelerator Laboratory. CHEP 2000 presentation.





Presentation Transcript


1. Farm Batch System (FBS) and Fermi Inter-Process Communication and Synchronization Toolkit (FIPC)
M. Breitung, J. Fromm, T. Levshina, I. Mandrichenko, M. Schweitzer
Fermi National Accelerator Laboratory
CHEP 2000

2. CHEP 2000 Presentation
• Off-Line Data Processing for Run II
• FBS
  • Requirements
  • Design and features
• FIPC
  • Why FIPC?
  • Design and features
• FBS and FIPC

3. Off-line Data Processing for Run II
• CDF and D0 off-line processing power estimate:
  • 100 – 250 thousand MIPS
  • 350 – 900 Pentium 500 MHz CPUs
• Linux PC farms will be used for off-line processing
  • 170 – 450 dual-CPU PCs
• Number of processes
  • 350 – 900 concurrent processes
• Typical farm job is a parallel job
  • 10 parallel processes per job
  • Duration 10 hours
• Number of jobs
  • 35 – 90 concurrent jobs
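
A back-of-envelope check of how these figures relate (illustrative only; the per-CPU MIPS number below is an assumption chosen to match the Pentium 500 MHz estimate quoted above):

    # Rough consistency check of the Run II sizing figures.
    # Assumption: one 500 MHz Pentium is taken as roughly 280 MIPS,
    # so 350-900 CPUs cover the 100-250 thousand MIPS estimate.
    mips_needed = (100_000, 250_000)
    mips_per_cpu = 280                                    # assumed, not from the slides
    cpus = tuple(m // mips_per_cpu for m in mips_needed)  # ~(357, 892) CPUs
    pcs = tuple(c // 2 for c in cpus)                     # dual-CPU PCs: ~(178, 446)
    jobs = tuple(c // 10 for c in cpus)                   # 10 processes per job: ~(35, 89)
    print(cpus, pcs, jobs)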

4. Farm Batch System (FBS)
• Requirements and Design Features
• Status and Future

5. FBS Requirements
• Scalability: FBS should scale up to:
  • 2000 processors
  • 2000 concurrent user processes
  • 200 simultaneously running jobs
  • 200 jobs per hour started
• Unit of operation: the parallel job
  • Typical job size is 10 processes

6. FBS Requirements
• Cost and Reliability
  • Low maintenance and support cost
  • Low cost per node
  • Robust with respect to node shutdowns
  • Should not require 24/7 support of all farm nodes
  • Should recover after failure of FBS components
• Portability
  • Linux and other UNIX OS flavors

7. FBS Design: Load Balancing
• Traditional solution: load measuring
• FBS solution: resource allocation
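
A conceptual sketch of the resource-allocation idea (toy Python, not FBS code): rather than measuring current load, the scheduler counts how much of each declared resource has already been handed out on a node and starts a process only if the remainder covers the request.

    # Toy resource-allocation scheduler: resources are counted, not measured.
    # Node names, resource names and numbers are illustrative, not taken from FBS.
    nodes = {"node1": {"slots": 2, "disk_gb": 20},
             "node2": {"slots": 2, "disk_gb": 10}}

    def place(request):
        """Return a node whose remaining resources cover the request, or None."""
        for name, free in nodes.items():
            if all(free[r] >= need for r, need in request.items()):
                for r, need in request.items():
                    free[r] -= need      # held for the lifetime of the process
                return name
        return None                      # request waits in its queue

    print(place({"slots": 1, "disk_gb": 10}))   # -> node1
    print(place({"slots": 1, "disk_gb": 15}))   # -> None until resources are freed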

8. FBS Design: Farm Model

9. FBS Design: Job and Sections
• An FBS job consists of sections.
• Each section is an array of “identical” processes of a certain type.
• Sections are identified by name.
• Sections can depend on one or more other sections of the job.
• Dependency types:
  • Done successfully
  • Failed
  • Finished
  • Started
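
A small sketch of how such dependencies could be evaluated (an illustrative model, not the FBS implementation): a section becomes eligible to start once the named condition holds for every section it depends on.

    # Toy model of FBS-style section dependencies (not FBS code).
    # A section's state is "running", "done" or "failed"; absent = not started yet.
    states = {"Init": "done", "Process": "running"}

    # The four dependency types named on the slide; "exited" is read here as
    # "finished either way", matching the DEPEND = exited(...) syntax of the next slide.
    conditions = {
        "done":    lambda s: states.get(s) == "done",
        "failed":  lambda s: states.get(s) == "failed",
        "exited":  lambda s: states.get(s) in ("done", "failed"),
        "started": lambda s: s in states,
    }

    def ready(depends):
        """depends: list of (condition_name, section_name) pairs."""
        return all(conditions[c](s) for c, s in depends)

    print(ready([("done", "Init")]))        # True  -> Process may start
    print(ready([("exited", "Process")]))   # False -> CleanUp must wait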

10. FBS: Sample Job Description File

SECTION Init
  QUEUE = IO_QUEUE
  EXEC = my_bin/dump_tape.sh XYZ1234 /mnt/stage/XYZ1234
  NUMPROC = 1
  STDERR = /dev/null
  STDOUT = logs/%j.%n.out
  DISK = 3

SECTION Process
  QUEUE = CPU_QUEUE
  EXEC = my_bin/do_processing.sh /mnt/stage/XYZ1234
  NUMPROC = 5
  STDERR = logs/%j.%n.errors
  STDOUT = logs/proc_%j.%n.log
  DISK = 10
  NEED = 1
  DEPEND = done(Init)

SECTION CleanUp
  QUEUE = FAST_QUEUE
  EXEC = my_bin/std_cleanup.sh /mnt/stage/XYZ1234
  NUMPROC = 1
  DEPEND = exited(Process)
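
A minimal sketch of reading this kind of job description into per-section dictionaries (a hypothetical helper, not part of FBS; it assumes only the SECTION / KEY = VALUE layout shown above):

    # Hypothetical parser for the SECTION / KEY = VALUE job description layout above.
    def parse_jdf(text):
        sections, current = {}, None
        for line in text.splitlines():
            line = line.strip()
            if not line:
                continue
            if line.upper().startswith("SECTION "):
                current = line.split(None, 1)[1]       # section name, e.g. "Init"
                sections[current] = {}
            elif "=" in line and current is not None:
                key, value = (p.strip() for p in line.split("=", 1))
                sections[current][key] = value
        return sections

    # Usage with a hypothetical file name:
    # for name, attrs in parse_jdf(open("my_job.jdf").read()).items():
    #     print(name, attrs.get("NUMPROC", "1"), attrs.get("DEPEND", "-"))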

11. FBS Design: Components

12. FBS Status
• In production since fall 1998
  • Fixed target experiments (15 nodes, Linux + OSF1)
  • Prototype farm for CDF and D0 (18 nodes, Linux)
• Currently, 2 fixed target farms (37 and 21 nodes, Linux, IRIX)
• CDF and D0 are setting up 50-node farms (Linux, IRIX)
• Successfully used for more than a year for off-line data processing

13. FBS Re-design Project (FBSNG)
• Goals:
  • Stop using LSF as scheduler and job storage
  • Reduce support and maintenance cost
  • Make room for new features
  • Abstract resources
  • Customizable scheduler
  • Make FBS more farm-friendly and farm-aware
  • Avoid possible scalability problems
• Status:
  • We plan to release the first version in April-May 2000

14. Fermi Inter-Process Communication and Synchronization Toolkit (FIPC)
• Why FIPC?
• Design and Features
• FBS and FIPC

15. Why FIPC?
• FBS: long-term resource allocation
  • The batch system provides long-term control; resources are allocated for the lifetime of the job.
• FIPC: short-term resource allocation
  • Some resources are used only for short intervals during job execution:
    • Transferring data over the network: watch for network overload
    • Accessing shared disk areas when uploading output data
• ... and non-resource-related synchronization and communication

16. FIPC Objects
• Gate: counted semaphore
  • Has room for a certain number of clients
  • A client can wait at the gate, enter the gate, exit the gate
• Lock: binary semaphore
  • Equivalent to a Gate with room for 1 client
  • A client can lock and unlock the lock
• Client queue
  • A client enters the queue, waits in the queue, exits the queue
• Integer flag
  • A client can wait for the value to reach a threshold, and optionally increment, decrement, or set a new value
• List of strings (double-ended queue)
  • A client can append or insert a string at the list’s tail or head
  • Remove the first or last item of the list
• String variable
  • A client can perform “set” or “match-and-set” operations using Regular Expression notation
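
As an analogy only (standard Python threading primitives on one host, not the FIPC API, which coordinates clients across the farm), a Gate behaves like a counted semaphore and a Lock like a binary one:

    # Local-host analogy for FIPC Gate/Lock semantics; FIPC itself is farm-wide.
    import threading

    gate = threading.Semaphore(3)      # "gate" with room for 3 clients
    lock = threading.Lock()            # "lock" = gate with room for exactly 1 client

    def client(i):
        with gate:                     # wait at the gate, then enter
            print(f"client {i} is inside the gate")
            with lock:                 # exclusive access while holding the lock
                print(f"client {i} holds the lock")
        # leaving the 'with' blocks exits the gate / unlocks the lock

    threads = [threading.Thread(target=client, args=(i,)) for i in range(5)]
    for t in threads: t.start()
    for t in threads: t.join()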

17. FIPC Design
• FIPC Servers run on some farm nodes (server nodes).
• A server node can run one or more FIPC Servers.
• Servers communicate via a Ring Protocol.
• Servers are redundant: they all hold the same information about all FIPC objects.
• FIPC objects are truly distributed.
• Servers can go down and then re-join the Ring at any time.
• A client communicates with a randomly selected server.
• All operations on FIPC objects are atomic.
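
A sketch of what this buys the client side, under assumed details (the host names, port number, and plain TCP probe below are hypothetical, and FIPC's actual wire protocol is not shown): because every server holds the same object state, a client can shuffle the server list and use the first one that answers.

    # Hypothetical client-side server selection; not FIPC code.
    import random, socket

    SERVERS = ["fnsrv01", "fnsrv05", "fnsrv09"]   # assumed server-node names
    PORT = 7800                                   # assumed port

    def pick_server():
        """Return a connection to any live server, trying them in random order."""
        for host in random.sample(SERVERS, len(SERVERS)):
            try:
                return socket.create_connection((host, PORT), timeout=2)
            except OSError:
                continue           # that server may be down or re-joining the ring
        raise RuntimeError("no FIPC server reachable")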

18. Using FIPC
• FIPC is written in Python
  • Portability
• Command-line user interface
  • Shell-level commands
• GUI
  • Monitoring, simple operations
• API
  • Python binding
  • Plans for C/C++ bindings

19. FIPC: Example, readers/writers problem

#
# writer.csh
#
fipc create flag /test/writing_f 1
fipc create queue /test/writer_q
while (1)
  fipc append /test/writer_q
  while (fipc qwait -t 100 /test/writer_q)
    fipc clean queue /test/writer_q
  end
  fipc fwait /test/writing_f \> 0
  write_file
  fipc fset /test/writing_f = 0
  fipc remove /test/writer_q
end

#
# reader.csh
#
fipc create flag /test/writing_f 1
fipc create queue /test/reader_q
while (1)
  fipc append /test/reader_q
  while (fipc qwait -t 100 /test/reader_q)
    fipc clean queue /test/reader_q
  end
  fipc fwait /test/writing_f \< 1
  read_file
  fipc fset /test/writing_f = 1
  fipc remove /test/reader_q
end

20. FIPC and FBS
• FIPC was designed as a complementary product for FBS users.
• However, FBS and FIPC are completely independent.
• FIPC can be used in batch or non-batch distributed environments.
• FBS and FIPC form a suite of farm batch data processing tools that have been successfully used by fixed target experiments and will be used for Run II data processing.
