
DØ MC and Data Processing on the Grid

Brad Abbott

University of Oklahoma

D0SAR Sept 21, 2006

Computing at DØ
  • Provide the necessary resources for primary processing of data, reprocessing, fixing, skimming, data analysis, MC production, data handling, data verification…
  • Provide this in a timely manner to allow researchers to analyze data efficiently.
Challenges
  • Collecting data at ~50 events/sec
  • Processing time is ~70 GHz-sec/event (a rough capacity sketch follows this list)
  • ~900 CPUs on the DØ farm running 24/7 to keep up with the data
  • Need millions of Monte Carlo events
  • Store data to tape and allow easy access (SAM)
  • Have the ability to reprocess and fix data in a timely manner
  • Provide computing resources to analyzers
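
As a rough check of these numbers, the sketch below turns the quoted event rate and per-event cost into a required farm capacity. Only the 50 events/sec and ~70 GHz-sec/event figures come from the slides; the duty factor and effective per-CPU speed are illustrative assumptions.

# Back-of-the-envelope check of the farm sizing quoted above. The event
# rate (50 Hz) and per-event cost (70 GHz-sec) come from the slide; the
# duty factor and effective per-CPU speed are illustrative assumptions.

event_rate_hz = 50.0     # events recorded per second (from slide)
cost_ghz_sec = 70.0      # processing cost per event (from slide)
duty_factor = 0.5        # assumed fraction of wall time spent taking data
cpu_speed_ghz = 3.0      # assumed effective speed of one farm CPU

required_thz = event_rate_hz * cost_ghz_sec * duty_factor / 1000.0
cpus_needed = required_thz * 1000.0 / cpu_speed_ghz

print(f"sustained capacity needed: {required_thz:.2f} THz")
print(f"CPUs needed at {cpu_speed_ghz} GHz each: {cpus_needed:.0f}")
# ~1.75 THz and ~580 CPUs under these assumptions; a ~900-CPU farm must
# also absorb downtime, inefficiency and catch-up processing.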

Local Facilities

  • 70 TB of project disk (CluedØ/CAB)
  • CAB
    • 2.2 THz of CPU (comparable to the FNAL production farm)
    • 235 TB of SAM cache
    • More CPU/disk on order
  • CluedØ
    • An incredible resource by the people, for the people!
    • 1+ THz of CPU
    • SAM cache: 70 TB (nodes) + 160 TB (servers)

Monday Report

Usage for a typical week (August 14, 2006). What does a typical week look like?

ANALYSIS STATIONS

  Station        Data analyzed   Events   Projects
  clued0         15.09 TB        402M     646
  fnal-cabsrv2   115.51 TB       2685M    1611
  fnal-cabsrv1   85.56 TB        2358M    985
  D0 TOTAL       216.16 TB       5446M    3242


Analysis over time

  • Events consumed by station since “the beginning of SAM time”
  • Integrates to 300B events consumed

[Plot: events consumed per station over time; cabsrv stations in blue and red, clued0 in grey]

Current Computing Status
  • Overall very good.
  • Reconstruction keeping up with data taking.
  • Data handling working well
  • Remote sites for MC, reprocessing, processing, fixing
  • Significant analysis CPU
Future challenges
  • Larger data sets
    • Luminosities > 200E30 cm⁻²s⁻¹
  • Increased sharing of manpower with the LHC
    • Reduced manpower for DØ
  • Tight budgets
    • Need to use shared resources

  • Events at higher luminosity take significantly longer to process than previously; computing resources need to deal with this
  • Need to plan on luminosities of 400E30 cm⁻²s⁻¹ (see the illustrative estimate after this list)
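
To make that planning concrete, here is a small, purely illustrative capacity estimate. The 50 events/sec rate and ~70 GHz-sec/event baseline come from earlier slides; the growth factor for the per-event cost at 400E30 is a hypothetical assumption, not a number from this talk.

# Illustrative capacity estimate for higher-luminosity running.
# Event rate and baseline cost are from earlier slides; the cost growth
# factor at 400E30 is a hypothetical assumption for illustration only.

event_rate_hz = 50.0      # events recorded per second (from slides)
baseline_cost = 70.0      # GHz-sec/event at current luminosities (from slides)
cost_growth = 1.5         # assumed relative growth of per-event cost at 400E30

needed_thz = event_rate_hz * baseline_cost * cost_growth / 1000.0
print(f"sustained reconstruction capacity needed: ~{needed_thz:.1f} THz")
# ~5.3 THz under these assumptions, versus ~3.5 THz at the current
# per-event cost -- hence the need to plan resources around 400E30.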

DØ computing model
  • Distributed computing, moving toward automated use of common tools on the grid
  • Scalable
  • Work with the LHC, not against it, for increased resources
  • Need to conform to standards
  • DØ is a running experiment taking data; need to take a prudent approach to computing
  • SAMGrid
SAMGrid
  • SAM: data handling
    • Over 7 PB consumed last year
    • Up to 1 PB/month
  • SAMGrid:
    • JIM: job submission and monitoring
    • SAM + JIM = SAMGrid
    • 20 native execution sites
    • Automated submission to other grids (a toy sketch of the forwarding idea follows this list)
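
The toy sketch below only illustrates the submission flow described above, where a job either runs at a native SAMGrid execution site or is forwarded to LCG/OSG. The class, the site names and the selection policy are hypothetical and do not reflect the real JIM interfaces.

from dataclasses import dataclass

@dataclass
class Site:
    name: str
    kind: str          # "samgrid", "lcg" or "osg"
    free_slots: int

def choose_site(sites, prefer_native=True):
    # Rank native SAMGrid sites first (when preferred), then by free slots.
    return sorted(sites, key=lambda s: (prefer_native and s.kind != "samgrid",
                                        -s.free_slots))[0]

def submit(job_name, sites):
    site = choose_site(sites)
    route = "native" if site.kind == "samgrid" else f"forwarded to {site.kind.upper()}"
    print(f"{job_name}: {route} via {site.name} ({site.free_slots} free slots)")

# Hypothetical sites, loosely inspired by the ones named in this talk.
sites = [Site("OU", "samgrid", 40), Site("UK-CE", "lcg", 300), Site("UNL", "osg", 120)]
submit("mc_production_0001", sites)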
Progression on Remote Farms
  • MC → data reprocessing → processing → skimming* → analysis*
  • Facilities: dedicated farms → shared farms (OSG/LCG)
  • Automation: expert → regional farmer → any user*

*Not yet implemented

Data Reprocessing on Grid
  • Reprocessing of data: 1 billion events (250 TB from raw)
    • SAMGrid as default, using shared resources
    • 3.5 THz for 6 months – largest such effort in HEP (see the rough check after this list)
  • Refixing: 1.4B events in 6 weeks
    • Used SAMGrid, with automated use of LCG and OSG
  • Finished on time. Very successful
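
As a rough consistency check of those reprocessing numbers, the sketch below works out what per-event cost 3.5 THz over 6 months implies for 1 billion events. The month length and the assumption of near-full utilisation are mine, not from the slides.

# Rough consistency check on the reprocessing numbers quoted above
# (1 billion events with 3.5 THz for 6 months). The month length and the
# assumption of near-full utilisation are illustrative assumptions.

events = 1.0e9
capacity_ghz = 3500.0                    # 3.5 THz expressed in GHz
wall_seconds = 6 * 30 * 86400            # ~6 months of wall time

implied_cost = capacity_ghz * wall_seconds / events   # GHz-sec per event
print(f"implied cost: ~{implied_cost:.0f} GHz-sec/event")
# ~54 GHz-sec/event at full utilisation -- the same order as the
# per-event processing times quoted elsewhere in this talk.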
Processing on Grid
  • Prefer not to do primary processing on the Grid
  • Can do processing at a few select sites that have been well certified (this has been demonstrated: the cable-swap data were processed at OU)
  • Certification of the Grid is problematic
  • With dedicated resources one does not need to worry about fair share, availability of nodes, etc.
Cable swap data at OU
  • First time that primary processing was performed at a remote site for DØ
  • Processed 9463 files
  • Total of 3421.6 GB
  • 18,391,876 events
  • Took ~3 months, partly because only ~70 of the available 270 CPUs were used (see the rough numbers after this list)
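
The quick arithmetic below simply restates those numbers as per-file and per-event figures. The file count, data volume, event count, ~70 CPUs and ~3 months come from the slide; treating "~3 months" as 90 days of continuous running is an assumption.

# Rough look at the OU cable-swap processing numbers quoted above.
# File count, data volume, event count, ~3 months and ~70 CPUs are from
# the slide; 90 days of continuous running is an assumption.

files, data_gb, events = 9463, 3421.6, 18_391_876
cpus, wall_days = 70, 90

print(f"average file size : {data_gb / files:.2f} GB")
print(f"events per file   : {events / files:.0f}")

cpu_sec_per_event = cpus * wall_days * 86400 / events
print(f"CPU-seconds/event : {cpu_sec_per_event:.0f}")
# ~30 CPU-sec/event; on few-GHz nodes this is the same order as the
# ~70 GHz-sec/event processing cost quoted earlier in the talk.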
MC Production resources
  • All MC is produced offsite
  • MC requirements are less stringent, i.e. one can always make more
  • Native SAMGrid producers: CMS-FNAL, GridKa, LTU, LU, MSU, OU (2), SPRACE, TATA, Westgrid, Wuppertal, FZU
  • Non-SAMGrid: Lyon and Nikhef
  • LCG: 21 CEs (10 UK, 6 FR, 3 NL, 1 CZ, 1 DE)
  • OSG: 8 CEs (UNL, IU, Purdue, SPGRID, OCHEP, TOPDAWG, UWM, CMS-FNAL)
Monte Carlo
  • More than 250 Million events produced
  • Up to 10 million events/week
  • LCG and OSG
  • 59% SAMGrid
  • 80.4% Europe
  • 15.7% N. America
  • 3.5% S. America
  • 0.3% Asia
Current plans
  • Reprocessing of Run IIb data needed
  • 300 million events
  • Takes ~80 GHz-sec/event to process
  • Expect to need ~2000 CPUs for 4 months to reprocess the data (see the sizing check after this list)
  • Utilize OSG sites much more extensively
  • SAM v7 (one version of SAM)
  • Plan to begin in November
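
A small sanity check on that estimate: the event count, per-event cost, CPU count and 4-month window come from the slide; the month length, per-CPU speed and overall efficiency factor below are illustrative assumptions.

# Sanity check on the Run IIb reprocessing estimate above (300M events,
# ~80 GHz-sec/event, ~2000 CPUs for 4 months). Month length, per-CPU speed
# and the efficiency factor are illustrative assumptions, not slide numbers.

events = 300e6
cost_ghz_sec = 80.0
wall_seconds = 4 * 30 * 86400

required_ghz = events * cost_ghz_sec / wall_seconds   # sustained capacity needed

cpu_speed_ghz = 2.0      # assumed effective clock of a grid worker node
efficiency = 0.6         # assumed overall grid/job efficiency

cpus = required_ghz / (cpu_speed_ghz * efficiency)
print(f"sustained capacity: ~{required_ghz / 1000:.1f} THz")
print(f"CPUs needed       : ~{cpus:.0f}")
# ~1900 CPUs under these assumptions, in line with the ~2000 quoted above.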

Current plans (cont)

  • Overall priority is to reduce manpower needs for the midterm and long term by ensuring that additional functionality is developed quickly, first in SAMGrid mode and then rapidly transferred to automated forwarding nodes
  • CAB running as part of FermiGrid
  • Moving full functionality to the forwarding mechanisms
  • Automated production of MC with OSG
  • SAM shifters take over responsibility for submitting jobs
  • Automated submission to use the full power of interoperability/grid resources
Conclusions
  • DØ computing model very successful
  • MC production and data processing continue to move toward Grid resources
  • LCG has been used more heavily in the past, but OSG will soon be more heavily utilized
  • Remote computing critical for continued success of DØ