This document outlines the key points from the D0 Grid Data Production Initiative Coordination Meeting held on November 6, 2008. It includes deployment planning updates, task status, and action items related to data production processes. The meeting discussed deployment features, current configurations, and progress on the integration of Forwarding Nodes and Queuing nodes. Notable advancements like upcoming tests and installations are highlighted, along with noted challenges in post-installation testing. Continuous updates on critical tasks were provided to ensure project timelines are met.
D0 Grid Data Production Initiative: Coordination Mtg 9
Rob Kennedy and Adam Lyon
Version 1.0 (meeting edition), 06 November 2008
Attending: …
Outline
• Summary and News
• Open Action Items: none to call out
• Deployment “Feature List”: drives what is critical
  • No change since last week
• Task Status (4 slides): most of our time
• Deployment 1 Planning: start today
Summary and News
• Summary
  • Umbrella Packages, Installation Manuals: good
  • Post-Install Tests: not good yet; seems more like “add user” issues than “FWD Node” issues
  • A day or so behind; deployment next week.
• News
  • A job did run via FWD4 in the first test!
  • …
Open Action Items (Green = effectively done, Yellow = added notes, Blue = coming week)
• <none to call out>
Current Deployment “Feature” Lists
• Deployment 1: Split Data/MC Production Services (NO CHANGE)
  • Time frame: November 13-17, with 1 week+ observation before holidays
  1. Config: Basic splitting of Fwd, Que services between Data and MC Production, with 2 Fwd nodes assigned to each, plus 1 Fwd dedicated to all Merging
  2. Fwd4 deployed (w/o virtualization)
  3. Fwd5 deployed
  4. Que2 deployed, with client software to enable parallel use of 2 QUE nodes
  5. New SAM Station (moved off of FWD1)
  6. Condor 7 via “new” 1.10.1m official release from UWisc
  7. FileMax increase on all Fwd nodes to handle large nJob actions
  8. D0Runjob Upgrade for Data Production: prerequisite for deploying new SAM-Grid release
• Deployment 2: Optimize Data and MC Production Configurations (NO CHANGE)
  • Time frame: December 8-10, with 1 week+ observation before holidays
  1. Config: Optimize configurations separately for Data and MC Production, especially to increase Data Production “queue” length
  2. New SAM-Grid Release with support for new Job status value at Queuing node
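The Deployment 1 split in item 1 (2 Fwd nodes each for Data and MC Production, plus 1 Fwd for Merging) can be written down as a role map and checked mechanically. The sketch below is illustrative only: the slides give the counts but not which node serves which role, so the node-to-role assignment and the names `FWD_ROLES`, `EXPECTED`, and `check_split` are hypothetical.

```python
from collections import Counter

# Hypothetical assignment: the slides state only the counts per service,
# not which Fwd node is given to which service.
FWD_ROLES = {
    "fwd1": "data",
    "fwd2": "data",
    "fwd3": "mc",
    "fwd4": "mc",
    "fwd5": "merge",
}

# Counts taken from Deployment 1, item 1.
EXPECTED = {"data": 2, "mc": 2, "merge": 1}


def check_split(roles, expected=EXPECTED):
    """Return a list of problems; an empty list means the split matches."""
    counts = Counter(roles.values())
    problems = []
    for role in sorted(set(expected) | set(counts)):
        want = expected.get(role, 0)
        have = counts.get(role, 0)
        if have != want:
            problems.append(f"{role}: expected {want} node(s), found {have}")
    return problems


if __name__ == "__main__":
    print(check_split(FWD_ROLES) or "split matches Deployment 1 counts")
```

A check like this could be run against the planned configuration before deployment, so that a node accidentally left unassigned shows up as a count mismatch.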
Task Status (1 of 4) (Red = critical tasks, Green = done, Blue = in progress, Yellow = added notes)
• 1.1.1 Forwarding Node 4 (Fwd4)
  • <Snip some completed tasks>
  • 1.1.1.13 Fwd4: Install with Version-Based FWD Umbrella Product AL JB Thu 10/30/08 Thu 10/30/08 1d
  • 1.1.1.9 Fwd4: Few Jobs FileMax=As-Is Test AL JB Mon 11/3/08 Wed 11/5/08 3d
  • 1.1.1.10 Fwd4: Pre-Deployment FileMax=16k Test AL JS Thu 11/6/08 Mon 11/10/08 3d
  • 1.1.1.11 Milestone: Fwd4 Ready to Deploy AL Mon 11/10/08 Mon 11/10/08 0d
• 1.1.2 Forwarding Node 5 (Fwd5)
  • <Snip some completed tasks>
  • 1.1.2.10 Fwd5: Install with Version-Based FWD Umbrella Product AL JB Thu 10/30/08 Thu 10/30/08 1d
  • 1.1.2.7 Fwd5: Few Jobs FileMax=As-Is Test AL JB Mon 11/3/08 Wed 11/5/08 3d
  • 1.1.2.8 Fwd5: Pre-Deployment FileMax=16k Test AL JS Thu 11/6/08 Mon 11/10/08 3d
  • 1.1.2.9 Milestone: Fwd5 Ready to Deploy AL Mon 11/10/08 Mon 11/10/08 0d
• 1.1.8 FWD and QUE Packaging with Version-Based Umbrella Product
  • <Snip some completed tasks>
  • Milestone: FWD Umbrella Product ready to use "GG,AL" Wed 10/29/08 Wed 10/29/08 0d
  • 1.1.8.6 Umbrella Product: Update FWD Installation Procedure AL JB Fri 11/7/08 Mon 11/10/08 2d
• Change in scheme: Red = ALL critical tasks for deployment 1 completion.
• Notes…
Task Status (2 of 4) (Red = critical tasks, Green = done, Blue = in progress, Yellow = added notes)
• 1.1.8 FWD and QUE Packaging with Version-Based Umbrella Product
  • 1.1.8.7 Umbrella Product: Initial QUE Umbrella Product Release GG PM Thu 10/30/08 Thu 10/30/08 1d
  • 1.1.8.8 Umbrella Product: Rework QUE Installation Procedure AL PM Fri 10/31/08 Fri 10/31/08 1d
  • 1.1.8.9 Milestone: QUE Umbrella Products ready to use GG PM Fri 10/31/08 Fri 10/31/08 0d
  • 1.1.8.12 Umbrella Product: Update QUE Umbrella… AL PM Mon 11/3/08 Mon 11/3/08 0.5d
  • 1.1.8.10 Umbrella Product: Update QUE Installation Procedure AL JB Mon 11/10/08 Tue 11/11/08 2d
  • 1.1.8.13 Umbrella Product: FWD, QUE Installation Procedures archive AL REX Wed 11/12/08 Thu 11/13/08 2d
  • 1.1.8.11 Milestone: FWD and QUE Packaging … done "GG,AL" Thu 11/13/08 Thu 11/13/08 0d
• 1.1.3 Queuing Node 2 (Que2)
  • <Snip some completed tasks>
  • 1.1.3.12 Que2: Install with Version-Based FWD Umbrella Product AL JB Tue 11/4/08 Tue 11/4/08 1d
  • 1.1.3.10 Que2: Jim_Client 2-QUE Support: Client Deployment AL REX Wed 11/5/08 Wed 11/5/08 1d
  • 1.1.3.8 Que2: Regression Test w/1-QUE Client (skipped by ABa) AL JB Thu 11/6/08 Fri 11/7/08 2d
  • 1.1.3.9 Que2: Integration Test w/2-QUE Client AL JB Mon 11/10/08 Mon 11/10/08 1d
  • 1.1.3.11 Milestone: Que2 Ready to Deploy AL Mon 11/10/08 Mon 11/10/08 0d
• Notes…
Task Status (3 of 4) (Red = critical tasks, Green = done, Blue = in progress, Yellow = added notes)
• 1.1.5 New Distinct SAM Station
  • <Snip some completed tasks>
  • 1.1.5.4 SAM Station: Install and Setup Station AL RI? Thu 11/6/08 Fri 11/7/08 2d
  • 1.1.5.5 SAM Station: Pre-Deployment Test AL RI? Mon 11/10/08 Mon 11/10/08 1d
  • 1.1.5.6 SAM Station: Deployment Plan (Deactivate old/Activate new) AL AL Tue 11/11/08 Tue 11/11/08 1d
  • 1.1.5.7 Milestone: SAM Station Ready to Deploy AL Tue 11/11/08 Tue 11/11/08 0d
  • 1.1.5.8 SAM Station: Setup Context Server AL AL Thu 11/13/08 Fri 11/14/08 2d
  • Not done, original resource busy. Now, this is late and at risk
  • Notes…
• 1.1.6 Deployment Stage 1
  • 1.1.6.1 Dep 1: Plan: Split Data/MC Prod Services AL ALL Mon 11/10/08 Wed 11/12/08 3d
  • 1.1.6.2 Deployment 1: Execute AL REX Thu 11/13/08 Mon 11/17/08 3d
  • 1.1.6.3 Deployment 1: Monitor AL REX Tue 11/18/08 Mon 11/24/08 5d
  • 1.1.6.4 Deployment 1: Sign-off AL REX Tue 11/25/08 Tue 11/25/08 1d
  • 1.1.6.5 MILE 1: Deployment 1 Completed AL Tue 11/25/08 Tue 11/25/08 0d
  • Bootstrap this today to work ahead: rough work list and known order/priorities
  • Meeting on Monday (RDK to arrange, I propose 9-10:30am) to work out the details
  • 17 November 2008 is the drop-dead date to be deployed, what we run for one week before sign-off.
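The listed durations for Deployment Stage 1 appear to count working days rather than calendar days (for example, Thu 11/13/08 to Mon 11/17/08 is listed as 3d). A small sketch to cross-check that reading is given below; it assumes Monday to Friday working days with no holidays, and the names `working_days` and `DEPLOYMENT_1` are illustrative, not part of any project tooling.

```python
from datetime import date, timedelta


def working_days(start, finish):
    """Count Monday-Friday days from start to finish, inclusive (holidays ignored)."""
    days = 0
    current = start
    while current <= finish:
        if current.weekday() < 5:  # 0 = Monday ... 4 = Friday
            days += 1
        current += timedelta(days=1)
    return days


# (task, start, finish, duration in days as listed on the slide)
DEPLOYMENT_1 = [
    ("1.1.6.1 Plan", date(2008, 11, 10), date(2008, 11, 12), 3),
    ("1.1.6.2 Execute", date(2008, 11, 13), date(2008, 11, 17), 3),
    ("1.1.6.3 Monitor", date(2008, 11, 18), date(2008, 11, 24), 5),
    ("1.1.6.4 Sign-off", date(2008, 11, 25), date(2008, 11, 25), 1),
]

if __name__ == "__main__":
    for task, start, finish, listed in DEPLOYMENT_1:
        computed = working_days(start, finish)
        status = "ok" if computed == listed else "MISMATCH"
        print(f"{task}: listed {listed}d, computed {computed}d ({status})")
```

Under this assumption all four listed durations agree with the dates, and the one-week Monitor task starts the day after the 17 November deadline.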
Task Status (4 of 4) (Red = critical tasks, Green = done, Blue = in progress, Yellow = added notes)
• 1.3.1 SAM-Grid Job Status Info
  • <snip some tasks>
  • 1.3.1.4 Upgrade D0Runjob version used by Data Production AL "MD,AL" Thu 10/23/08 Fri 10/24/08 2d
• 1.3.2 Slow Fwd-CAB Job Transition
  • Note: FileMax change requires a schedd restart (ST). Work into deployment plans.
• 1.3.3 Improved H/w Uptime
• 1.4 Metrics
  • nSubmissions plot for Sep ’08 Mike?
  • Ganglia-based D0Farm plot from Keith
• Notes…
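Since the FileMax change has to be verified on each Fwd node before deployment, a pre-deployment check could be scripted. The sketch below rests on two assumptions not stated in the slides: that "FileMax" refers to the Linux kernel limit exposed at `/proc/sys/fs/file-max`, and that "16k" means 16384. The function name `file_max_ok` is hypothetical. It only reads the value; it does not change it or restart the schedd.

```python
from pathlib import Path

# Assumption: "16k" on the slides means 16 * 1024 = 16384.
TARGET_FILE_MAX = 16 * 1024


def file_max_ok(path="/proc/sys/fs/file-max", target=TARGET_FILE_MAX):
    """Return True if the integer stored at `path` is at least `target`.

    Assumption: FileMax corresponds to the Linux fs.file-max setting,
    which the kernel exposes as a single integer in this file.
    """
    value = int(Path(path).read_text().split()[0])
    return value >= target


if __name__ == "__main__":
    print("FileMax at or above target:", file_max_ok())
```

The path is a parameter so the check can be pointed at a test file or at a different setting if the assumption about FileMax turns out to be wrong.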
Deployment 1 Work
• Rough Work List
  • Verify FWD4-5, QUE2 installed; FWD4-5 FileMax increased
  • FWD1-3 install/upgrade via umbrella package; increase FileMax
  • QUE1 install/upgrade via umbrella package
  • Deactivate SAM station on FWD1
  • Activate new SAM station
  • Configure FWD1-5
  • Configure QUE1-2
  • Test system: Data Production, MC Production, Reco/MC Merge Jobs
  • …
• Work on Client Side: adapt to use new jim client package
  • …
• Post-Deployment Work: move context server?