tsm monitoring @ cern
Download
Skip this Video
Download Presentation
TSM Monitoring @ CERN

Loading in 2 Seconds...

play fullscreen
1 / 16

TSM Monitoring @ CERN - PowerPoint PPT Presentation


  • 340 Views
  • Uploaded on

TSM Monitoring @ CERN. Daniele Francesco Kruse CERN IT/DSS Presented by Giuseppe Lo Presti. 20th HEPiX - Vancouver - October 2011. Outline. TSM at CERN TSM Management Station Overview Main features TSMMSv2 Motivations Design New ideas. TSM at CERN (1/3). We back up:

loader
I am the owner, or an agent authorized to act on behalf of the owner, of the copyrighted work described.
capcha
Download Presentation

PowerPoint Slideshow about ' TSM Monitoring @ CERN' - nakia


An Image/Link below is provided (as is) to download presentation

Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author.While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server.


- - - - - - - - - - - - - - - - - - - - - - - - - - E N D - - - - - - - - - - - - - - - - - - - - - - - - - -
Presentation Transcript
tsm monitoring @ cern

TSM Monitoring @ CERN

Daniele Francesco Kruse

CERN IT/DSS

Presented by Giuseppe Lo Presti

20th HEPiX - Vancouver - October 2011

outline
Outline
  • TSM at CERN
  • TSM Management Station
    • Overview
    • Main features
  • TSMMSv2
    • Motivations
    • Design
    • New ideas

20th HEPiX - Vancouver - October 2011

slide3

TSM at CERN (1/3)

  • We back up:
    • Network filesystems (60’000 AFS, 1’500 DFS volumes)
    • Email (18’000 mailboxes)
    • Web sites (12’000 websites)
    • Databases (120 DB servers)
    • Servers (1’000 Linux and Windows servers)
    • Virtual Machines (120 hypervisors)
  • We don’t back up:
    • Physics data (using CASTOR for this)
    • User PCs (already backing up home AFS/DFS directories)

20th HEPiX - Vancouver - October 2011

slide4

TSM at CERN (2/3)

  • We currently have around 3.8 PB of backup data and 0.6 PB of archived data
    • … and growing superlinearly (last year 1 PB)
  • Average daily traffic is 50 TB also growing steadily
  • Around 1,200 nodes are backed up, for a total 1,500 million files

20th HEPiX - Vancouver - October 2011

slide5

TSM at CERN (3/3)

17 TSM Servers

in production

on RHEL4/5

80 TB of disk

storage

  • 2 IBM TS3500 libraries
  • 48 IBM drives
  • 4’500 IBM 3952 cartridges

20th HEPiX - Vancouver - October 2011

slide6

TSM Management Station

  • TSM monitoring tool developed in-house
  • Gathers data from the TSM servers
  • Generates graphs and reports with various statistics
  • Sends e-mails to users and administrators to inform them about potential issues
  • Very useful to manage the increasing number of TSM servers

20th HEPiX - Vancouver - October 2011

slide7

TSM Management Station

20th HEPiX - Vancouver - October 2011

slide8

TSM Management Station

TSMMS daily report example:

TSMMS also sends an email for each error in each TSM server

20th HEPiX - Vancouver - October 2011

slide9

TSM Management Station

  • Allows management of groups of nodes (by department and division) and generates graphs and stats for each group
  • Sends alerts to nodes whenever an operation fails or whenever they miss their periodic backup
  • Features options to suspend or stop the alerting system
  • Gives information of each node about file spaces, backup history performance and stats, associated schedules, etc.
  • … and many other stats and graphs

20th HEPiX - Vancouver - October 2011

slide10

Motivations for a new TSMMS

  • TSMMS provides 90% of all the information that is needed
  • However:
    • not use-case oriented
    • not compatible with TSM v6.x(heavily depending on the TSM 5 database schema)
  • The choice was then to start from scratch with a cleandesign and architecture
  • Change in philosophy: the focus is now on how to convey the relevant information for each use-case

20th HEPiX - Vancouver - October 2011

slide11

Splunk

  • TSMMS takes care of the monitoring and the alerting system
  • TSMMSv2 will be only responsible for the monitoring while the alerting tasks will be moved to Splunk
  • Splunk is a commercially available tool (with a free trial):
    • Log aggregator/mining
    • Search engine
    • New features: alerting and reporting
  • TSMMSv2 and Splunk will work together to provide the TSM admin with proper information and alerts

20th HEPiX - Vancouver - October 2011

slide12

Splunk

20th HEPiX - Vancouver - October 2011

slide13

TSMMSv2 modeled on a typical TSM admin day

Add nodes to TSM

Need to find a suitable server ...

Check DB space and Tape pools

Need to have a clear view of DB and pools ...

TSM Admin

Handle user support tickets

Check quickly for any anomaly in the system

Spot issues and solve them

Scope reduced: Splunk does the rest!

20th HEPiX - Vancouver - October 2011

slide14

Structure of TSMMSv2

View Layer (HTML and Javascript Templates)

Controller Layer (Display Logic)

Model Layer

TSMMS

DB

TSM

Server 3

TSM

Server 1

TSM

Server 2

TSM

Server 4

TSM

Server N

20th HEPiX - Vancouver - October 2011

slide15

TSMMSv2 New Ideas

  • TSMMSv2 will focus on helping TSM admins with daily tasks
  • Display only relevant information (not everything else) for the most important issues that may arise
  • Not only monitoring → also GUI for selected common administrative tasks
      • Add new nodes to approriate server
  • Automation of certain tasks, such as:
    • Add new storage space where needed (ex. DB)
    • Automatically deal with faulty tapes or drives

20th HEPiX - Vancouver - October 2011

slide16

Thank you, Questions ?

20th HEPiX - Vancouver - October 2011

ad