
DØ MC and Data Processing on the Grid

Brad Abbott

University of Oklahoma

D0SAR Sept 21, 2006

Computing at DØ

  • Provide the necessary resources for primary processing of data, reprocessing, fixing, skimming, data analysis, MC production, data handling, data verification…

  • Provide this in a timely manner so researchers can analyze data efficiently.


  • Collecting data at ~ 50 events/sec.

  • Processing time is ~70 GHz-sec/event

  • ~900 CPUs on the DØ farm running 24/7 to keep up with data

  • Need Millions of Monte Carlo events

  • Store data to tape and allow easy access (SAM)

  • Have the ability to reprocess and fix data in a timely manner.

  • Provide computing resources to analyzers
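The throughput numbers above can be sanity-checked with a little arithmetic. The script below is an illustrative back-of-envelope calculation, not from the slides: it treats GHz-sec as a simple clock-rate × time unit, and takes 50 events/sec as a peak rate.

```python
# Back-of-envelope sizing of the DO reconstruction farm from the
# numbers quoted above (illustrative; GHz-sec = clock rate * CPU time).

event_rate = 50        # events/sec collected (presumably a peak rate)
cost_per_event = 70    # GHz-sec of CPU per event
farm_cpus = 900        # CPUs running 24/7

# Sustained compute needed just to keep up with data taking:
needed_thz = event_rate * cost_per_event / 1000.0
print(f"sustained need: {needed_thz} THz")           # 3.5 THz

# Effective speed each of the ~900 CPUs must deliver at that peak:
per_cpu_ghz = event_rate * cost_per_event / farm_cpus
print(f"per-CPU: {per_cpu_ghz:.1f} GHz")             # ~3.9 GHz
```

The ~3.9 GHz/CPU figure exceeds typical 2006 clock speeds, which fits 50 events/sec being a peak rather than an average: averaged over the accelerator duty cycle, the 900-CPU farm keeps up.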

Local Facilities

  • 70 TB of project disk (CluedØ/CAB)

  • CAB

    • 2.2 THz of CPU (comparable to the FNAL production farm)

    • 235TB of SAM Cache

    • More CPU/Disk on order

  • CluedØ

    • An incredible resource by the people for the people!

    • 1+ THz

    • SAM Cache

    • 70 TB (nodes) + 160 TB (servers)

Monday Report

August 14, 2006

Typical week

What does a typical week look like?

  Station        Data analyzed   Events   Projects
  clued0         15.09 TB         402 M    646
  fnal-cabsrv2   115.51 TB       2685 M   1611
  fnal-cabsrv1   85.56 TB        2358 M    985
  D0 TOTAL       216.16 TB       5446 M   3242
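As a quick cross-check (not part of the original slides), the per-station rows can be summed and compared against the quoted D0 TOTAL line:

```python
# Cross-check of the weekly report table (week of August 14, 2006).
stations = {
    # station: (TB analyzed, millions of events, projects)
    "clued0":       (15.09,   402,  646),
    "fnal-cabsrv2": (115.51, 2685, 1611),
    "fnal-cabsrv1": (85.56,  2358,  985),
}

tb       = sum(v[0] for v in stations.values())
events   = sum(v[1] for v in stations.values())
projects = sum(v[2] for v in stations.values())

print(round(tb, 2), events, projects)  # 216.16 5445 3242
```

The data and project totals match exactly; the event sum (5445 M) differs from the quoted 5446 M by one unit in the last place, i.e. rounding of the per-station values.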

Analysis over time

  • Events consumed by station since “the beginning of SAM time”

  • Integrates to 300B events consumed

[Plot: cumulative events consumed per station; cabsrv stations shown in blue and red]


Current Computing Status

  • Overall very good.

  • Reconstruction keeping up with data taking.

  • Data handling working well

  • Remote sites for MC, reprocessing, processing, fixing

  • Significant analysis CPU

Future challenges

  • Larger data sets

    • Luminosities > 200E30 cm⁻²s⁻¹

  • Increased sharing of manpower with LHC

    • Reduced manpower for DØ

  • Tight budgets

    • Need to use shared resources

Higher-luminosity events take significantly longer to process; computing resources need to keep pace with this.

Need to plan for luminosities of 400E30 cm⁻²s⁻¹.

DØ computing model

  • Distributed computing, moving toward automated use of common tools on grid

  • Scalable

  • Work with the LHC, not against it, to gain access to increased resources

  • Need to conform to standards

  • DØ is a running experiment and is taking data; need to take a prudent approach to computing

SAMGrid


  • SAM: Data Handling

    • Over 7 PB consumed last year

    • Up to 1 PB/month

  • SAMGrid:

    • JIM: Job submission and monitoring

    • SAM+JIM: SAMGrid

    • 20 native execution sites

    • Automated submission to other grids
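The yearly and monthly SAM consumption figures above relate as follows (a trivial check, added for illustration):

```python
# Relation between the yearly and monthly SAM consumption figures.
consumed_pb_per_year = 7.0   # "over 7 PB consumed last year"
peak_pb_per_month = 1.0      # "up to 1 PB/month"

avg_pb_per_month = consumed_pb_per_year / 12
print(f"average: {avg_pb_per_month:.2f} PB/month")                   # 0.58
print(f"peak/average: {peak_pb_per_month / avg_pb_per_month:.1f}x")  # ~1.7x
```

So the peak month runs at roughly 1.7× the yearly average, a plausible duty-cycle spread.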

Progression on Remote Farms

  • MC → data reprocessing → processing → skimming* → analysis*

  • Facilities: Dedicated farms → shared farms → OSG/LCG

  • Automation: Expert → regional farmer → any user*

*Not yet implemented

Data Reprocessing on Grid

  • Reprocessing of data: 1 billion events (250 TB from raw)

    • SAMGrid as default, using shared resources

    • 3.5 THz for 6 months – Largest such effort in HEP

  • Refixing: 1.4 B events in 6 weeks

    • Used SAMGrid, automated use of LCG, OSG

  • Finished on time. Very successful
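Assuming the ~70 GHz-sec/event cost quoted earlier in the talk applies, the 3.5 THz / 6-month campaign can be checked for rough consistency (a sketch, not from the slides):

```python
# Rough consistency check of the 1B-event reprocessing campaign.
events = 1.0e9             # 1 billion events reprocessed
cost = 70.0                # GHz-sec/event (assumed from the earlier slide)
capacity_ghz = 3.5e3       # 3.5 THz of shared resources
window_s = 6 * 30 * 86400  # ~6 months of wall clock

work_needed = events * cost               # GHz-sec required
work_available = capacity_ghz * window_s  # GHz-sec on hand
ratio = work_needed / work_available
print(f"needed/available: {ratio:.2f}")   # ~1.29
```

A ratio somewhat above 1 is still plausible: reprocessing from raw may cost less per event than the primary-reconstruction figure used here, and the 3.5 THz was presumably not flat over all six months.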

Processing on Grid

  • Prefer not to do primary processing on Grid.

  • Can do processing at a few select, well-certified sites (demonstrated: cable-swap data processed at OU)

  • Certification of Grid is problematic

  • Do not need to worry about fair-share scheduling, availability of nodes, etc.

Cable swap data at OU

  • First time that primary processing performed at a remote site for DØ

  • Processed 9463 files

  • Total of 3421.6 GB

  • Events: 18,391,876

  • Took ~3 months, partly because only ~70 of the available 270 CPUs were used
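The cable-swap numbers above imply the following averages (a rough derivation, assuming ~3 months of wall clock on ~70 CPUs):

```python
# Averages implied by the OU cable-swap processing figures.
files = 9463
size_gb = 3421.6
events = 18_391_876
cpus = 70                 # of the 270 available
wall_s = 3 * 30 * 86400   # ~3 months (assumed)

print(f"avg file size: {size_gb / files * 1000:.0f} MB")  # ~362 MB
print(f"events per file: {events / files:.0f}")           # ~1944
cpu_sec_per_event = cpus * wall_s / events
print(f"CPU time per event: {cpu_sec_per_event:.0f} s")   # ~30 s
```

At the ~70 GHz-sec/event cost quoted earlier, ~30 CPU-sec/event corresponds to an effective ~2.4 GHz per CPU, reasonable for the era; with all 270 CPUs the same pass would have taken under a month.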

MC Production resources

  • All produced offsite

  • MC requirements are less stringent, i.e. one can always make more

  • Native SAMGrid producers: CMS-FNAL, GridKa, LTU, LU, MSU, OU (2), SPRACE, TATA, Westgrid, Wuppertal, FZU

  • Non-SAMGrid: Lyon and Nikhef

  • LCG: 21 CEs (10 UK, 6 FR, 3 NL, 1 CZ, 1 DE)


Monte Carlo

  • More than 250 Million events produced

  • Up to 10 million events/week

  • LCG and OSG

  • 59% SAMGrid

  • 80.4% Europe

  • 15.7% N. America

  • 3.5% S. America

  • 0.3% Asia
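As a sanity check (not on the slide), the regional shares should sum to 100%, and the totals fix a minimum production time:

```python
# Check the regional MC production shares and the implied timescale.
shares = {"Europe": 80.4, "N. America": 15.7, "S. America": 3.5, "Asia": 0.3}
total = sum(shares.values())
print(round(total, 1))        # 99.9 -- i.e. 100% up to rounding

produced = 250e6   # more than 250 million events produced
peak_rate = 10e6   # up to 10 million events/week
print(produced / peak_rate)   # 25.0 -> at least 25 weeks even at peak rate
```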

Current plans

  • Reprocessing of Run IIB data needed

  • 300 million events

  • Takes ~80 GHz-sec/event to process

  • Expect to need ~ 2000 CPUs for 4 months to reprocess data

  • Utilize OSG sites much more extensively

  • SAM v7 (One version of SAM)

  • Plan on beginning in November
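The ~2000-CPU, 4-month estimate above follows from the same kind of arithmetic (a sketch; the effective per-CPU speed folds in scheduling and efficiency losses):

```python
# Sustained capacity implied by the Run IIB reprocessing estimate.
events = 300e6             # 300 million events
cost = 80.0                # GHz-sec/event
window_s = 4 * 30 * 86400  # 4 months

sustained_ghz = events * cost / window_s
print(f"sustained: {sustained_ghz / 1000:.1f} THz")  # ~2.3 THz

per_cpu = sustained_ghz / 2000
print(f"effective per CPU: {per_cpu:.1f} GHz")       # ~1.2 GHz
```

An effective ~1.2 GHz per CPU is well below 2006 clock speeds, i.e. the ~2000-CPU figure already allows generous headroom for inefficiency and contention on shared OSG sites.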

Current plans (cont)

  • Overall priority is to reduce manpower needs in the mid and long term by ensuring additional functionality is developed quickly: first in SAMGrid mode, with rapid transfer to automated forwarding nodes.

  • CAB running as part of Fermigrid

  • Moving full functionality to the forwarding mechanisms

  • Automated production of MC with OSG

  • SAM shifters take over responsibility for submitting jobs

  • Automated submission to use full power of interoperability/grid resources


  • DØ computing model very successful

  • MC and data are continuing to move more toward using Grid resources

  • LCG has been used more heavily in the past, but soon OSG will be more heavily utilized

  • Remote computing critical for continued success of DØ