Presentation Transcript

Tier1

Andrew Sansum

GRIDPP 10

June 2004


Production Service for HEP (PPARC)

  • GRIDPP (2001-2004).

    • “GridPP will enable testing of a prototype Grid of significant scale, providing resources for the LHC experiments ALICE, ATLAS, CMS and LHCb, the US-based experiments BaBar, CDF and D0, and lattice theorists from UKQCD”

    • Tier1 provides access to large scale compute resources for experiments

    • Tier A service for BaBar physics analysis

    • LHC data challenges

    • Support wide range of prototype PP GRID software (e.g. Certificate Auth)

    • Close involvement in the European DataGrid project EDG (many testbeds)

  • GRIDPP2 (2004-2007)

    • “From Prototype to Production”

    • Close engagement in LCG project, preparing for LHC startup

    • Continue to provide Tier A centre for BaBar

    • EGEE Resource and member of EGEE Testbed

    • Ramp-up to a production quality GRID service

    • Gradually move to GRID-only access

Tier1A


Tier1 in GRIDPP2 (2004-2007)

  • The Tier-1 Centre will provide GRIDPP2 with a large computing resource of a scale and quality that can be categorised as an LCG Regional Computing Centre

  • January 2004 – GRIDPP2 confirm RAL to host Tier1 Service

    • GRIDPP2 to commence September 2004

  • Tier1 Hardware budget:

    • £2.3M over 3 years

  • Staff

    • Increase from 12.1 to 13.5 (+3 CCLRC) by September

So What Exactly is a Tier1?

The Tier1 will differentiate itself from the Tier2s by:

  • Providing data management at high QoS able to host Primary/Master copies of data

  • Providing state-of-the-art network bandwidth

  • Contributing to collaborative services/core infrastructure

  • Providing high quality technical support

  • Responding rapidly to service faults

  • Being able to make long term service commitments

Tier1 Staffing

  • Deploy: Tier1 and UK GRID interfaces
  • Support: experiments and their services
  • Disks: servers/filesystems
  • Tapes: robot and interfaces
  • Hardware: fix systems/hardware support
  • CPU Farms: farm systems
  • Core: critical systems (Oracle/MySQL/AFS/home/monitoring …)
  • Operations: machine rooms/tape ops/interventions …
  • Manage: project/planning/policy/finance …
  • Network: site infrastructure/Tier1 LAN

[Chart: staff effort per area, with allocations of 1, 1.5, 2 or 2.5 FTE.]

Current Tier1 Hardware

  • CPU

    • 350 dual-processor Intel servers (PIII and Xeon), mainly rack-mounted

    • About 400 KSI2K

  • Disk Service – mainly “standard” configuration

    • Dual Processor Server

    • Dual channel SCSI interconnect

    • External IDE/SCSI RAID arrays (Accusys and Infortrend)

    • ATA drives (mainly Maxtor)

    • About 80 TB of disk

    • Cheap and (fairly) cheerful

  • Tape Service

    • STK Powderhorn 9310 silo with 8 9940B drives
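The quoted totals imply a rough per-node figure, worked out below (the average is derived here by simple arithmetic, not stated on the slide):

```python
# Rough per-node arithmetic from the slide's totals. The per-node average
# is derived here for illustration, not quoted in the presentation.
cpu_nodes = 350        # dual-processor PIII/Xeon servers
total_ksi2k = 400      # quoted aggregate CPU capacity

ksi2k_per_node = total_ksi2k / cpu_nodes   # about 1.14 KSI2K per dual-CPU server
```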

Network

[Diagram: production VLAN (site routable network) and test VLAN; uplinks to SuperJanet and a test network (e.g. MBNG); firewall and site router separating the rest of the site; production and test subnets each containing servers and workers.]


Network

[Diagram: Tier1 network detail – production and test VLANs; uplinks to SuperJanet and a test network (e.g. MBNG); firewall and site router separating the rest of the site; servers and workers on each VLAN.]


UKlight

  • Connection to RAL in September

  • Funded to end 2005, after which it probably merges with SuperJanet 5

  • 2.5 Gb/s now, 10 Gb/s from 2006

  • Effectively dedicated lightpath to CERN

  • Probably not for Tier1 production, but suitable for LCG data challenges etc., building experience for the SuperJanet upgrade.
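To put the lightpath figures in context, a back-of-envelope transfer-time calculation (the 10 TB dataset size and 50% achievable-throughput factor are illustrative assumptions, not from the slides):

```python
# Back-of-envelope transfer times over a dedicated lightpath.
# Dataset size and efficiency factor are assumptions for illustration.
def transfer_hours(dataset_tb, link_gbps, efficiency=0.5):
    """Hours to move dataset_tb terabytes over a link_gbps link."""
    bits = dataset_tb * 8e12                       # TB -> bits
    seconds = bits / (link_gbps * 1e9 * efficiency)
    return seconds / 3600

hours_now = transfer_hours(10, 2.5)    # 2.5 Gb/s path: ~17.8 hours
hours_2006 = transfer_hours(10, 10)    # 10 Gb/s upgrade: ~4.4 hours
```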

New Hardware Arrives 7th June

  • CPU Capacity (500 KSI2K)

    • 256 dual processor 2.8GHz Xeons

    • 2/4GB Memory

    • 120GB HDA

  • Disk Capacity (140TB)

    • Infortrend Eonstore SATA/SCSI RAID Arrays

    • 16*250GB Western Digital SATA per array

    • Two arrays per server
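The disk numbers above can be sanity-checked with simple arithmetic (the server count below is inferred from the array layout, not stated on the slide, and the capacities are raw, before RAID overhead):

```python
# Sanity check of the quoted ~140 TB disk capacity (raw, before RAID
# overhead; the server count is inferred, not stated on the slide).
drives_per_array = 16
drive_tb = 0.25            # 250 GB WD SATA drives
arrays_per_server = 2

tb_per_server = drives_per_array * drive_tb * arrays_per_server   # 8.0 TB raw
servers_needed = 140 / tb_per_server                              # 17.5, so ~18 servers
```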

Next Delivery

  • Need in production by end of year

    • Original schedule of December delivery seems late

    • Will have to start very soon

    • Less chance for testing / new technology

  • Exact proportions not agreed, but …

    • 400 KSI2K (300-400 CPUs)

    • 160TB disk

    • 120TB tape??

    • Network infrastructure?

    • Core servers (H/A??)

    • Red Hat?

  • Long range plan needs reviewing – also need long range experiment requirements

CPU Capacity

Forthcoming Challenges

  • Simplify service – less “duplication”

  • Improve storage management

  • Deploy new Fabric Management

  • Red Hat Enterprise 3 upgrade

  • Network upgrade/reconfigure????

  • Another procurement/install

  • Meet challenge of LCG – professionalism

  • LCG Data Challenges

Clean up Spaghetti Diagram

  • Simplify interfaces: fewer GRIDs (“more is not always better”)

  • How to phase out the “Classic” service …



Storage: Plus and Minus

  • ATA and SATA drives → 2.5% failure per annum: OK
  • External RAID arrays → good architecture, choose well
  • SCSI interconnect → surprisingly unreliable: change
  • Ext2 filesystem → OK, but need journal: XFS?
  • Linux O/S → move to Enterprise 3
  • NFS/Xrootd/http/gridftp/bbftp/srb/… → must have SRM
  • No SAN → need SAN (Fibre or iSCSI …)
  • No management layer → need virtualisation/dCache …
  • No HSM → ????


Fabric Management

  • Currently run:

    • Kickstart – cascading config files

    • SURE exception monitoring

    • Automate – automatic interventions

  • Running out of steam with old systems …

    • “Only” 800 systems – but many, many flavours

    • Evaluating Quattor – no obvious alternatives – probably deploy

    • Less convinced by Lemon – a bit early – running Nagios in parallel
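The exception-based style of monitoring used by SURE can be sketched as follows (the metric names, thresholds and hosts are hypothetical, for illustration only; this is not the actual SURE configuration or code):

```python
# Minimal sketch of exception-style monitoring: report only the hosts
# whose metrics breach a threshold. Metric names and limits are
# hypothetical, chosen for illustration.
THRESHOLDS = {"load": 20.0, "disk_used_pct": 95.0, "swap_used_pct": 80.0}

def exceptions(host_metrics):
    """Return (host, metric, value) for every threshold breached."""
    alerts = []
    for host, metrics in host_metrics.items():
        for name, value in metrics.items():
            limit = THRESHOLDS.get(name)
            if limit is not None and value > limit:
                alerts.append((host, name, value))
    return alerts

sample = {
    "lcg0123": {"load": 35.2, "disk_used_pct": 60.0},           # busy worker
    "disk007": {"disk_used_pct": 97.5, "swap_used_pct": 10.0},  # nearly full
}
alerts = exceptions(sample)   # two exceptions raised
```

Only the two breaching hosts appear in the output; quiet hosts generate no traffic, which is what makes exception monitoring scale to hundreds of machines.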

Conclusions

  • After several years of relative stability must start re-engineering many Tier1 components.

  • Must start to rationalise – support a limited set of interfaces, operating systems, testbeds … simplify so we can do less, better

  • LCG becoming a big driver

    • Service commitments

    • Increase resilience and availability

    • Data challenges and move to steady state

  • Major reality check in 2007!
