CMS Monitoring tools Farida Fassi November 28th, 2008
Goal • Review some CMS monitoring tools based on the ARDA Dashboard • Useful dashboard features for remote monitoring • Service status for your site • SAM tests for basic diagnostics • Job activity status for your site • PhEDEx monitoring tool for transfer activities: http://cmsweb.cern.ch/phedex/
Starting point • http://arda-dashboard.cern.ch/cms/ • Jobs • SAM
SAM visualization • 4 clickable buttons: Latest Results, Historical View, Feedback (Savannah), Help (Twiki) • Every page you reach has its own URL
Latest results: CE view • The one that ‘comes easy’ • Screenshot callouts: Click to reset to menus; Click to see log; From GOCDB; Click to see 48h history
Last 48h • This view is not clickable! • But it shows when the tests ran
Select service types menu • Great instructions from the Facility Operation team: https://twiki.cern.ch/twiki/bin/view/CMS/SAMChecklist • Your favorite site will look like this (SRMv2 and CE tests)
SAM availability browsing • You can browse and click down to a single test to get its log • Whenever a cell of the color matrix has a blue border, it is clickable
SAM visualization (2) • Click to see the log of this test
Job processing on the Grid • To follow job processing and analysis on the Grid, use the main CMS Dashboard page: http://dashboard.cern.ch/cms • Click on “Interactive view”
Job Dashboard • Direct link: http://lxarda09.cern.ch/dashboard/request.py/jobsummary • You have a choice: 1) See all jobs submitted in the selected time window (this is the default, and the window defaults to the last 24 hours) 2) See all jobs that terminated in the last 24 hours or are pending or running at the current moment; for this, select the ‘all jobs regardless submission time’ option
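The two selection modes above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the job records, the field names (`id`, `submitted`, `finished`) and both function names are hypothetical and are not the Dashboard's actual schema or API. A job with `finished` set to `None` is assumed to be pending or running.

```python
from datetime import datetime, timedelta


def submitted_in_window(jobs, start, end):
    """Mode 1: jobs whose submission time falls inside [start, end]."""
    return [job for job in jobs if start <= job["submitted"] <= end]


def finished_recently_or_active(jobs, now, window=timedelta(hours=24)):
    """Mode 2: jobs still pending/running, or terminated within the window,
    regardless of when they were submitted."""
    selected = []
    for job in jobs:
        finished = job["finished"]  # None means pending or running (assumption)
        if finished is None or now - finished <= window:
            selected.append(job)
    return selected


# Hypothetical records, for illustration only.
now = datetime(2008, 11, 28, 12, 0)
jobs = [
    {"id": "a", "submitted": now - timedelta(days=3), "finished": now - timedelta(hours=2)},
    {"id": "b", "submitted": now - timedelta(hours=5), "finished": None},
    {"id": "c", "submitted": now - timedelta(days=4), "finished": now - timedelta(days=3)},
    {"id": "d", "submitted": now - timedelta(hours=30), "finished": None},
]

mode1_ids = [job["id"] for job in submitted_in_window(jobs, now - timedelta(hours=24), now)]
mode2_ids = [job["id"] for job in finished_recently_or_active(jobs, now)]
print(mode1_ids)  # jobs submitted in the last 24 hours
print(mode2_ids)  # jobs active now or terminated in the last 24 hours
```

Job "a" shows why the second mode is useful: it was submitted three days ago, so the default view misses it, but it terminated two hours ago and appears once submission time is ignored.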
Running time (wall clock, from job wrapper) • One random day: http://tinyurl.com/2l6s4s
Waiting time (from submission to start of job) • One random day: http://tinyurl.com/22vknn
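As a rough illustration of the two quantities plotted on these slides, the sketch below derives them from three timestamps. The timestamps and function names are hypothetical. On the Dashboard the running time is reported by the job wrapper, so computing it from start and end times here is an assumption made only for the example.

```python
from datetime import datetime


def waiting_time_seconds(submitted, started):
    """Time from submission to start of the job."""
    return (started - submitted).total_seconds()


def running_time_seconds(started, finished):
    """Wall-clock time between job start and job end."""
    return (finished - started).total_seconds()


# Hypothetical timestamps for a single job.
submitted = datetime(2008, 11, 28, 10, 0, 0)
started = datetime(2008, 11, 28, 10, 12, 30)
finished = datetime(2008, 11, 28, 11, 42, 30)

waiting = waiting_time_seconds(submitted, started)
running = running_time_seconds(started, finished)
print(waiting, running)
```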
Interactive view: what info can it provide? • All my jobs at a given site have failed: does the site have a problem? • Suppose you are having problems at FZK. Let’s check whether you are the only one who is. • Sort by site. Sites showing a lot of light green or red are the ones that might be in trouble. • FZK looks suspicious in this respect, so let’s investigate further.
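The reasoning on this slide (sort by site, look for sites dominated by failures, check whether other users are hit too) can be sketched as below. This is not how the Dashboard computes anything: the records, user names, status labels and the function name are all hypothetical, chosen only to show the idea.

```python
from collections import Counter, defaultdict


def rank_sites_by_failures(jobs):
    """Return (site, failure_fraction, users_with_failures) rows,
    worst site first. `jobs` is an iterable of (site, user, status)."""
    totals = Counter()
    failed = Counter()
    failing_users = defaultdict(set)
    for site, user, status in jobs:
        totals[site] += 1
        if status == "failed":  # hypothetical status label
            failed[site] += 1
            failing_users[site].add(user)
    rows = [
        (site, failed[site] / totals[site], len(failing_users[site]))
        for site in totals
    ]
    return sorted(rows, key=lambda row: row[1], reverse=True)


# Hypothetical job records, for illustration only.
jobs = [
    ("FZK", "alice", "failed"),
    ("FZK", "bob", "failed"),
    ("FZK", "carol", "succeeded"),
    ("FZK", "alice", "failed"),
    ("CERN", "alice", "succeeded"),
    ("CERN", "bob", "succeeded"),
    ("CERN", "dave", "failed"),
    ("CERN", "dave", "succeeded"),
]

ranked = rank_sites_by_failures(jobs)
for site, fraction, users in ranked:
    print(site, fraction, users)
```

A high failure fraction spread over several users points toward a site problem, while failures confined to a single user point toward that user's jobs.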
Expand using bars • Left-click on the color bars to get a menu for expanding by… • Keep doing it • Note: this menu has more items than the left menu, in particular by task, by submission type (CRAB server/direct), etc.
Interactive view: what info can it provide? (1) • Each column can be used for sorting • Each blue number is clickable: it gives the list of jobs with Grid/CRAB IDs, times, exit codes, and worker node name (or IP)
Interactive view: what info can it provide? (2) • You can get the full list of job failure codes by clicking on ExitCode • Example: jobs are failing with code 50115, “cmsRun did not produce a valid/readable job report at runtime” • Codes that indicate a site problem are all marked there
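Once you have the job list with exit codes, a quick tally shows which failure dominates. The sketch below is a minimal illustration: the only code-to-meaning entry comes from this slide (50115), the sample list of exit codes is made up, and treating 0 as success is an assumption. Any other code should be looked up in the full list behind the ExitCode link.

```python
from collections import Counter

# Only 50115 is taken from the slide; extend from the ExitCode list as needed.
EXIT_CODE_MEANING = {
    50115: "cmsRun did not produce a valid/readable job report at runtime",
}


def summarize_failures(exit_codes):
    """Count non-zero exit codes, most frequent first.
    Assumes exit code 0 means the job succeeded."""
    counts = Counter(code for code in exit_codes if code != 0)
    return [
        (code, n, EXIT_CODE_MEANING.get(code, "see the ExitCode list"))
        for code, n in counts.most_common()
    ]


# Hypothetical exit codes collected from a job list.
sample_codes = [0, 50115, 50115, 0, 1, 50115]
summary = summarize_failures(sample_codes)
for code, n, meaning in summary:
    print(code, n, meaning)
```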
Feedback! • Link to Savannah
Useful links • Commissioning Twiki: https://twiki.cern.ch/twiki/bin/view/CMS/ComputingCommissioning • Dashboard: http://arda-dashboard.cern.ch/cms • SAM: http://lxarda16.cern.ch/dashboard/request.py/samvisualization • Squid monitoring: http://belforte.home.cern.ch/belforte/misc/Squid-Hit-Summary.html • Squid details for your site: http://frontier.cern.ch/squidstats/indexcms.html • PhEDEx: http://cmsweb.cern.ch/phedex/