
D0RACE: Testbed Session


Presentation Transcript


  1. D0RACE: Testbed Session. Lee Lueking, D0 Remote Analysis Workshop, February 12, 2002

  2. Overview
  • Network reliability and performance are of great importance to the D0 Data Model. D0 would like to be involved in the DataGrid/WP7 and DataTAG studies to monitor and improve network performance. DataGrid/WP7 covers the network within Europe, whereas DataTAG concentrates on the Europe-US intercontinental link. An initiative to enhance the performance of the connecting network in the US could grow out of this work.
  • Until recently SAM used ftp and bbftp to transport files between the data storage location and the cache area of the computer used to process them. Tests have now started with GridFTP together with the Grid Security Infrastructure (GSI). This is particularly interesting because it will require a marriage between this security layer and the Kerberos security in operation at Fermilab (a minimal transfer sketch follows this slide).
  • SAM can be used to select files and run production or analysis jobs on them. An initiative has recently started to use Condor as a workload scheduler within SAM, in order to make maximal use of the compute resources available at Fermilab and the participating institutes.
  • D0 has a request system for Monte Carlo generation of specific data channels. At the moment these requests are sent to the (mostly external) Monte Carlo production sites by email and submitted through human intervention. The goal is to evolve to an automatic system where user requests are submitted to the full D0 Monte Carlo compute fabric, consisting of all CPU resources within the collaboration. Most likely these services will be integrated within SAM.
  • At the moment only limited use is made of the data storage capacity of institutes other than Fermilab. One of the difficulties has been the multitude of storage systems. Monte Carlo data is currently stored at SARA in Amsterdam and at the computer centre CCIN2P3 in Lyon, but an effort will be made to integrate all available storage locations within the collaboration as fully as possible, for Monte Carlo data as well as analysis data.
  • Planning
  • Most of the above projects have at least been discussed, or have even been started at some level, as part of the D0 PPDG and wider D0 Grid efforts. We estimate that the International Grid Testbed initiative will boost most if not all of these activities. The network monitoring and GridFTP tests have started on the European side but could be taken to a similar level on the US side still this year.
  • First contacts have been made with the Condor team and some initial tests have been done, but more work is needed. Using Condor as a workload scheduler within SAM will still take several months, and using Condor to reach grid CPU resources at the participating institutes in Europe will have about the same timescale.
  • Distributed data storage can only proceed at the speed with which new storage locations become available, but the present situation can still be improved considerably. Within half a year it should be possible to store all externally produced data locally at the producing institutes. One year from now it should be possible to store any D0 data at the location where the grid (or SAM, in the D0 case) decides the data can be stored best.
  • Network performance has to be increased to the point where the user does not notice whether a file is stored locally or not. Distributed compute resources within the collaboration should become more integrated, just as the data storage systems should. A workload scheduler should be able to make optimal use of all these resources.
  • Management
  • D0 is preparing a more detailed proposal for an International D0 Grid Testbed, and the management will be described there. It will have a small managerial board with people from the participating institutes in the US and Europe, and it will have a technical board, which will address architectural issues and practical problems. The managerial board will have representatives in the International DataGrid Coordination meeting as well as in the other appropriate bodies such as the PPDG. Lee Lueking - D0 RACE
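
  Since the GridFTP-with-GSI transfers mentioned in the overview are one of the most concrete steps here, the following is a minimal sketch (Python, with hypothetical host names and paths) of a single GSI-authenticated GridFTP copy driven through the standard globus-url-copy client. It assumes a valid grid proxy has already been created with grid-proxy-init; it is not the actual SAM transfer code.

      # Minimal sketch: one GSI-authenticated GridFTP copy via globus-url-copy.
      # Assumes a valid proxy from grid-proxy-init and globus-url-copy on the PATH.
      # Host names and paths below are hypothetical, not real D0 locations.
      import subprocess

      def gridftp_copy(src_url, dest_url):
          """Copy a single file with globus-url-copy; return True on success."""
          result = subprocess.run(["globus-url-copy", src_url, dest_url])
          return result.returncode == 0

      if __name__ == "__main__":
          ok = gridftp_copy(
              "gsiftp://d0-storage.example.org/data/run12345.raw",  # hypothetical source
              "file:///scratch/sam-cache/run12345.raw")             # hypothetical local cache
          print("transfer", "succeeded" if ok else "failed")

  In practice a SAM station would invoke whichever transport it is configured for (bbftp, GridFTP, ...); the sketch only shows the shape of the call.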

  3. Who is interested
  • Oklahoma U., IN2P3, Wuppertal, NIKHEF, UTA, Lancaster, Imperial, Prague, Michigan.
  • Standard network testing procedure (see the sketch after this slide):
  • Netperf, iperf performance (Horst Severini, Shawn McKee).
  • Transmission rates from station logs as a function of time.
  • Throughput numbers.
  • Measurements of error rates, at the packet and higher levels. Lee Lueking - D0 RACE
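
  As an illustration of the testing procedure listed above, the sketch below (Python, placeholder host name) drives a single iperf client run and timestamps the report. It assumes an iperf server (iperf -s) is already listening at the remote site; real testbed scripts would accumulate, log, and plot many such runs.

      # Sketch of one throughput probe in the style of the iperf tests above.
      # The remote host is a placeholder and must already be running "iperf -s".
      import subprocess
      import time

      def iperf_probe(server, seconds=10):
          """Run an iperf client against `server` and return its text report."""
          result = subprocess.run(
              ["iperf", "-c", server, "-t", str(seconds), "-f", "m"],  # report in Mbit/s
              capture_output=True, text=True)
          return result.stdout

      if __name__ == "__main__":
          stamp = time.strftime("%Y-%m-%d %H:%M:%S")
          report = iperf_probe("testbed-node.example.edu")  # placeholder test node
          print(stamp)
          print(report)  # transmission rates vs. time come from accumulating these reports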

  4. What are we trying to achieve?
  • Find bottlenecks; network understanding and debugging at various sites.
  • Understand scalability issues, operation of multiple sites.
  • End-to-end test
  • Transfers not only from FNAL but among all sites, or configured locations (an illustrative all-pairs sketch follows this slide).
  • Break into specific tests, isolate components. Unit testing.
  • How to optimize caching.
  • How can we do real work? Run reco or reco analyze at multiple sites, simultaneously. Lee Lueking - D0 RACE
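
  To make the "transfers among all sites" idea concrete, here is an illustrative all-pairs sketch in Python. The site list is a placeholder subset, and copy_file is a stub standing in for whatever transfer tool (bbftp, GridFTP, ...) each site actually uses.

      # Illustrative all-pairs end-to-end transfer test (placeholder sites, stub copy).
      import itertools
      import time

      SITES = ["FNAL", "NIKHEF", "IN2P3", "Lancaster", "UTA"]  # placeholder subset

      def copy_file(src_site, dest_site, mbytes=100.0):
          """Stub transfer of a fixed-size test file; swap in a real transfer call."""
          time.sleep(0.01)  # stands in for the actual network transfer
          return mbytes

      def transfer_matrix():
          """Time a copy for every ordered pair of sites; return MB/s per pair."""
          rates = {}
          for src, dst in itertools.permutations(SITES, 2):
              start = time.time()
              moved = copy_file(src, dst)
              rates[(src, dst)] = moved / (time.time() - start)
          return rates

      if __name__ == "__main__":
          for (src, dst), rate in sorted(transfer_matrix().items()):
              print(f"{src:>10} -> {dst:<10} {rate:10.1f} MB/s (stub numbers)")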

  5. By March 15
  • Iperf test.
  • Single file, from cache, from Enstore, from central-analysis cache.
  • Project package running; Iain will set up.
  • Begin more complex tests and monitoring.
  • Touch base biweekly, at least in e-mail, maybe in d0grid meetings or other. Lee Lueking - D0 RACE

  6. Linux clusters: Clued0/SAM combination (Roger).
  • How do we judge when a release is useful? (Gordon)
  • Remote tasks: software shifts, moderating FAQ pages, web master.
  • Suggestion: maintain a standard, operational SAM reference station for people to look at and see how things are configured.
  • Pass the torch: encourage and help each other get things set up and running.
  • Documentation needs to be kept up to date (Meena). Lee Lueking - D0 RACE

  7. All of these projects are working towards the common goal of providing transparent access to the massively distributed computing infrastructure that is needed to meet the challenges of modern experiments … (From the EU DataTAG proposal) Lee Lueking - D0 RACE

  8. Grid Projects Timeline
  Q3 00   GriPhyN: $11.9M + $1.6M
  Q4 00   EU DataGrid: $9.3M
  Q1 01
  Q2 01   PPDG: $9.5M
  Q3 01   EU DataTAG: 4M Euros; iVDGL: $13.65M
  Q4 01   GridPP
  Q1 02
  Lee Lueking - D0 RACE
