Simulation Production at UTD

Shuwei YE, UT-Dallas. DOE Visit, Nov. 14, 2003. Outline: upgrade of the computing farm (see Xinchou’s talk); SP4 → SP5 migration (installation, troubleshooting, fine tuning); operation challenges; UTD production rate.

Presentation Transcript


  1. Simulation Production at UTD • Shuwei YE, UT-Dallas • DOE Visit, Nov. 14, 2003

  2. Outline • Upgrade of computing farm (see Xinchou’s talk) • SP4 → SP5 migration • Installation • Troubleshooting • Fine tuning • Operation challenges • UTD production rate

  3. Computing Farm Upgrade • Original system: 16 dual-CPU nodes (P-III 1.0 GHz), 512 MB-1 GB memory/node; 500 GB RAID + 1.5 TB soft RAID; capacity: 5-6 million events/month • Current system: + 64 dual-CPU nodes (P4 2.66 GHz), 1 GB memory/node; + 1.8 TB RAID; designed capacity: 30 million events/month
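To put the capacity figures on this slide side by side, the short sketch below converts them to a per-CPU rate. The node counts and capacities come from the slide; whether the 30 million events/month design figure counts only the 64 new nodes or all 80 nodes is not stated, so both readings are computed as assumptions. The function name `per_cpu_rate` is hypothetical.

```python
# Figures taken from the slide (million events per month).
ORIGINAL_NODES = 16
ADDED_NODES = 64
CPUS_PER_NODE = 2          # every node is dual-CPU
ORIGINAL_CAPACITY = (5.0, 6.0)
DESIGNED_CAPACITY = 30.0


def per_cpu_rate(capacity_mevents, nodes, cpus_per_node=CPUS_PER_NODE):
    """Return million events per month per CPU for a given node count."""
    return capacity_mevents / (nodes * cpus_per_node)


# Original farm: 32 CPUs producing 5-6 million events/month.
old_low = per_cpu_rate(ORIGINAL_CAPACITY[0], ORIGINAL_NODES)
old_high = per_cpu_rate(ORIGINAL_CAPACITY[1], ORIGINAL_NODES)

# Assumption A: the design figure covers old and new nodes together (160 CPUs).
new_all_nodes = per_cpu_rate(DESIGNED_CAPACITY, ORIGINAL_NODES + ADDED_NODES)

# Assumption B: the design figure covers only the 64 added nodes (128 CPUs).
new_added_only = per_cpu_rate(DESIGNED_CAPACITY, ADDED_NODES)

print(f"original farm:      {old_low:.3f}-{old_high:.3f} M events/month/CPU")
print(f"design, all nodes:  {new_all_nodes:.3f} M events/month/CPU")
print(f"design, new nodes:  {new_added_only:.3f} M events/month/CPU")
```

Under either reading the designed per-CPU rate is at or above the original farm's rate, so most of the capacity gain comes from the larger CPU count.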

  4. SPSP5 Migration: Installation • OS : RedHat-6.2  7.2 • PBS installation • Objy-6.2  7.1 • AFS, CERNLIb, Perl, tcl, CVS, ROOT … • SP5 installation and validation

  5. SPSP5 Migration: Trouble Shooting • AFS behind campus firewall • Non-standard software in SP: • Downgrade compiler gcc, libtcl, perl modules • Unrecognizable NFS in Objy: – insecure export • bbftp behind firewall: –passive mode, bug fixed

  6. SPSP5 Migration: Fine Tuning • Hyper-thread testing • Condition and bkg dbs in Objy set read-only • high RPC failure rate – improved by release upgrade • Automatic job submission improvement – more smart, benefit to all sites • Weekend monitor (benefit of laptop)

  7. Hardware Problems • A/C problem (addressed in Xinchou’s talk) • Old RAID problem (spare disks available) • Rare unexpected power outages (could damage databases)

  8. Production Rate: Before Upgrade • Total by UTD: 66.5M

  9. Production Rate: After Upgrade • Total by UTD: 111.7M

  10. Summary • Million events/day after upgrade • Average: 30-40 million events/month
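The summary quotes a monthly average while the plot is labelled per day, so a quick conversion may help when comparing the two. The sketch below assumes a 30-day month (not stated in the slides) and reuses the 5-6 million events/month capacity of the original system from slide 3; the function name `per_day` is hypothetical.

```python
DAYS_PER_MONTH = 30               # assumption: the slides do not define a month

AVERAGE_AFTER = (30.0, 40.0)      # million events/month, from the summary slide
ORIGINAL_CAPACITY = (5.0, 6.0)    # million events/month, from slide 3


def per_day(monthly_mevents, days=DAYS_PER_MONTH):
    """Convert a monthly rate (million events) into a daily rate."""
    return monthly_mevents / days


daily_low = per_day(AVERAGE_AFTER[0])
daily_high = per_day(AVERAGE_AFTER[1])

# Most conservative and most optimistic gain over the original capacity.
gain_low = AVERAGE_AFTER[0] / ORIGINAL_CAPACITY[1]
gain_high = AVERAGE_AFTER[1] / ORIGINAL_CAPACITY[0]

print(f"daily rate after upgrade: {daily_low:.2f}-{daily_high:.2f} M events/day")
print(f"gain over original capacity: {gain_low:.1f}x-{gain_high:.1f}x")
```

On these assumptions the farm runs at roughly 1.0-1.3 million events/day, which is at or above the 30 million events/month design capacity.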
