Site Report

Roberto Gomezel


Outline of Presentation

  • Computing Environment

  • Security

  • Services

  • Network

  • AFS

  • BBS

  • INFN Farms

  • Tier 1 at CNAF

Computing Environment and security

  • 95% of boxes are PCs running Linux or Windows

  • Mac OS boxes keep on living

  • Just a few commercial Unix boxes, used only for specific tasks or needs

  • VPNs available in many sites

    • Cisco boxes using IPsec

    • NetScreen boxes using IPsec

    • SSL VPNs are under evaluation

      • Using SSL eliminates the need to install client software

      • It gives users instant access from just a Web browser

  • Network Security

    • Dedicated firewall machines at just a few sites

    • Implemented with access lists on the router connected to the WAN

INFN Site Report – R.Gomezel

Desktop

  • PCs running Linux and Windows

  • Automatic installation using Kickstart for Linux and RIS for Windows

  • Citrix MetaFrame or VMware used to reduce the need to install Windows on every PC for desktop applications

  • A few sites chose to outsource support for the desktop environment due to lack of personnel


Backup

  • Tape libraries used:

    • AIT2 – a few sites

    • IBM Magstar – used only at LNF

    • DLT, LTO – widespread

  • Backup tools:

    • IBM Tivoli – fairly widely used

    • HP Omniback – fairly widely used

    • Atempo Time Navigator – just a few sites

    • Home-grown tools – widespread

Wireless LAN

  • Access points running standard 802.11b/g

  • All sites use wireless connections while meetings or conferences are running

  • Most of them use it to provide connectivity to laptop computers

  • Security issues:

    • Permission based on Secure Port filtering (MAC Address) – poor security

    • No encryption used

    • Some sites are using 802.1X


E-mail

  • Mail Transfer Agent

    • Sendmail – the most widely used (86%)

    • Postfix – a few sites (14%)

      • But an increasing number of sites are planning to move from Sendmail to Postfix

  • Hardware and OS

E-mail user agent

  • All INFN sites provide an HTTP mail user agent

    • One-third use IMP

    • One-third use SquirrelMail

    • Others:

      • IMHO, Open WebMail, Cyrus+Roxen…

  • Other mail user agents

    • Pine, Internet Explorer, Mozilla…

E-mail antivirus

E-mail antispam

  • 75% of INFN sites use SpamAssassin to reduce junk e-mail

  • Some sites use RAV or Sophos

  • Just a few sites (5%) use nothing

  • An ACL filter was set on port 25 to prevent unauthorized hosts from acting as mail relays

  • Only authorized mail relays are allowed to send and receive mail for a specific site

Security issues

Monitored by GARR-CERT

Incidents coming from INFN hosts (percentage)

  • Goal by the end of 2004:

    • Define a new policy for ACL setting

    • Input filter: default deny

      • Services allowed only on very strictly checked hosts

    • Output filter:

      • Port 25
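The filtering policy above can be sketched as a first-match rule list with a default-deny fallback. This is only an illustration under stated assumptions: the `Rule` type, the `evaluate` function, the addresses (taken from the 192.0.2.0/24 documentation range) and the "permit everything else outbound" rule are hypothetical and do not come from the slides.

```python
from dataclasses import dataclass
from ipaddress import ip_address, ip_network
from typing import Optional


@dataclass(frozen=True)
class Rule:
    action: str            # "permit" or "deny"
    network: str           # internal host or network, CIDR notation
    port: Optional[int]    # TCP port, or None for any port


def evaluate(rules, host_ip, port):
    """Return the action of the first matching rule; deny if none matches."""
    addr = ip_address(host_ip)
    for rule in rules:
        if addr in ip_network(rule.network) and rule.port in (None, port):
            return rule.action
    return "deny"  # default deny


# Hypothetical input filter: services only on strictly checked hosts.
INPUT_ACL = [
    Rule("permit", "192.0.2.25/32", 25),   # authorized mail relay
    Rule("permit", "192.0.2.80/32", 80),   # a checked web server
]

# Hypothetical output filter: only the relay may open SMTP connections.
OUTPUT_ACL = [
    Rule("permit", "192.0.2.25/32", 25),
    Rule("deny", "0.0.0.0/0", 25),
    Rule("permit", "0.0.0.0/0", None),     # assumption: other traffic allowed
]
```

With these lists, SMTP to or from any host other than the relay is refused, while the relay itself is allowed in both directions.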

INFN network

  • LAN backbone network mainly based on Gigabit Ethernet

    • Layer 2 and 3 switching

    • No layer 4 switching

  • The INFN WAN is fully integrated into GARR, the nation-wide infrastructure, which provides backbone connectivity at 2.5 Gbps

    • Typical PoP access bandwidth for INFN sites: 34 Mbps, 155 Mbps, Gigabit Ethernet

    • The trend is toward Gigabit Ethernet access at every site, with bandwidth managed through a rate-limiting mechanism (CAR) according to the needs of each site
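Rate limiting of this kind is commonly modelled as a token bucket. The sketch below is a generic illustration of that idea, not the router's implementation; the class name, the parameters and the numbers in the usage note are assumptions.

```python
class TokenBucket:
    """Police traffic to `rate_bps`, allowing bursts of up to `burst_bytes`."""

    def __init__(self, rate_bps, burst_bytes):
        self.rate_bytes = rate_bps / 8.0    # refill rate in bytes per second
        self.capacity = float(burst_bytes)  # bucket depth
        self.tokens = float(burst_bytes)    # start with a full bucket
        self.last = 0.0                     # time of the previous packet

    def conforms(self, now, packet_bytes):
        """Return True if the packet fits the committed rate, else False."""
        elapsed = now - self.last
        self.last = now
        # Refill for the elapsed time, never beyond the bucket depth.
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate_bytes)
        if packet_bytes <= self.tokens:
            self.tokens -= packet_bytes
            return True
        return False
```

For example, `TokenBucket(rate_bps=8000, burst_bytes=1500)` accepts one 1500-byte packet at time 0, rejects a second one sent at the same instant, and accepts a 1000-byte packet one second later. A site with a Gigabit Ethernet access link could be held to a lower committed rate in the same way.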


AFS

  • INFN sites keep using AFS services to share data and software across sites

  • Most local cells have completely moved server functionality to Linux boxes running OpenAFS software

  • Authentication and file server functionalities of the nation-wide cell INFN.IT are running on Linux boxes with OpenAFS

  • The migration of INFN.IT authentication servers from Kerberos IV to Kerberos V is expected to be accomplished by the end of the year


BBS - Bologna Batch System

The Bologna Batch System (BBS) is a software tool that allows users from INFN Bologna to submit batch jobs to a set of well-defined machines, from any INFN Bologna machine with Condor installed.

Collaboration between the C. S. Dept., Univ. of Wisconsin-Madison and the INFN Bologna.

Main features of BBS:

Any executable can be submitted to the system (scripts, compiled and linked programs, etc.).

Two different 'queues', short and long. Short and long jobs have different priorities (nice) when running on the same machine.

Short jobs may run for no longer than an hour, but run at a higher priority.

BBS tries to balance the load of the BBS CPUs. 

  • P.Mazzanti


At present the system consists of 16 dual-CPU servers running Red Hat Linux 9, plus one single-CPU machine. 7 machines come from the ALICE experiment.

BBS machines belong to the large INFN WAN Pool; they may be accessed from outside when no BBS job is running, while becoming IMMEDIATELY available when a BBS job asks to be run.

The 7 ALICE machines accept only short jobs when the submitter is not an ALICE group user.
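The queue and placement rules described above can be sketched as follows. This is not the BBS code (which builds on Condor); the types, the group name `"alice"`, the nice values and the least-loaded choice are assumptions made for illustration.

```python
from dataclasses import dataclass

# Hypothetical nice values: a lower value means a higher priority.
NICE = {"short": 5, "long": 15}


@dataclass
class Machine:
    name: str
    load: float               # current load; lower is better
    alice_owned: bool = False


@dataclass
class Job:
    user_group: str
    queue: str                # "short" or "long"


def eligible(machine, job):
    """ALICE machines take only short jobs from non-ALICE users."""
    if machine.alice_owned and job.user_group != "alice":
        return job.queue == "short"
    return True


def pick_machine(machines, job):
    """Pick the least-loaded eligible machine, or None if there is none."""
    candidates = [m for m in machines if eligible(m, job)]
    return min(candidates, key=lambda m: m.load, default=None)
```

A long job from a non-ALICE user is therefore never placed on an ALICE machine, even if that machine is idle, while short jobs and ALICE jobs may use any machine.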

[Plots: aggregate jobs (daily and weekly); load (daily and weekly)]

INFN Site Farm: a new challenge

  • Some sites are planning to reconfigure and integrate computing facilities and local experiment-specific farms into a single computing farm

    • Reason: to avoid the growing deployment of many small private farms, one per experiment, alongside the general-purpose computing facility

  • Introduction of SAN infrastructure to connect storage systems and computing units

    • The GFS file system is under evaluation as an efficient way of providing a cluster file system and volume manager

    • Interesting because it is part of the SL3 distribution

  • Much work is needed to design a mechanism that provides computing resources to different experiments dynamically, according to their needs

    • We can learn from the experience coming from CNAF Tier1 and other Labs

Hardware solutions for the Tier1 at CNAF

Luca dell’Agnello

Stefano Zani

(INFN – CNAF, Italy)

  • Luca dell’Agnello -Stefano Zani

Tier1

  • INFN computing facility for the HEP community

    • Prototype phase ended last year; now fully operational

    • Location: INFN-CNAF, Bologna (Italy)

      • One of the main nodes on the GARR network

    • Personnel: ~ 10 FTEs

      • ~ 3 FTEs dedicated to experiments

  • Multi-experiment

    • LHC experiments (ALICE, ATLAS, CMS, LHCb), Virgo, CDF, BABAR, AMS, MAGIC, ...

    • Resources dynamically assigned to experiments according to their needs

  • 50% of the Italian resources for LCG

    • Participation in the experiments' data challenges

    • Integrated with the Italian Grid

    • Resources also accessible in the traditional way


Logistics

  • Moved to a new location (last January)

    • Hall in the basement (-2nd floor)

    • ~ 1000 m² of total space

      • Computing Nodes

      • Storage Devices

      • Electric Power System (UPS)

      • Cooling and Air conditioning system

      • GARR GPop

    • Easily accessible by lorry from the road

    • Not suitable for office use (remote control needed)


Electric Power

  • Electric Power Generator

    • 1250 kVA (~ 1000 kW): up to 160 racks

  • Uninterruptible Power Supply (UPS)

    • Located in a separate room (air-conditioned and ventilated)

    • 800 kVA (~ 640 kW)

  • 380 V three-phase distributed to all racks (Blindo)

    • Rack power controls output 3 independent 220 V lines for computers

    • Rack power controls sustain loads of up to 16 or 32 A

      • 32 A power controls needed for racks of 36 dual-processor Xeon servers

    • 3 APC power distribution modules (24 outlets each)
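The figures above are mutually consistent if a power factor of 0.8 is assumed (1000/1250 and 640/800 both give 0.8). The short calculation below checks that and derives two numbers not stated on the slide: the average generator power per rack and the apparent power available to one rack. The function names are illustrative.

```python
def kva_to_kw(kva, power_factor=0.8):
    """Convert apparent power (kVA) to real power (kW)."""
    return kva * power_factor


def rack_capacity_kva(amps, lines=3, volts=220):
    """Apparent power available to a rack fed by `lines` independent lines."""
    return lines * volts * amps / 1000.0


generator_kw = kva_to_kw(1250)        # about 1000 kW
ups_kw = kva_to_kw(800)               # about 640 kW
kw_per_rack = generator_kw / 160      # about 6.25 kW per rack on average

rack_16a = rack_capacity_kva(16)      # about 10.56 kVA
rack_32a = rack_capacity_kva(32)      # about 21.12 kVA
```

The 6.25 kW per-rack average is only a derived estimate for a fully populated hall of 160 racks.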


Cooling & Air Conditioning

  • RLS (Airwell) on the roof

    • ~ 700 kW

    • Water cooling

    • Needs a “booster pump” (20 m from T1 up to the roof)

    • Noise insulation

  • 1 Air Conditioning Unit (uses 20% of the RLS cooling power and controls humidity)

  • 12 Local Cooling Systems (Hiross) in the computing room

WN typical Rack Composition

  • Power Controls (3U)

  • 1 network switch (1-2U)

    • 48 FE copper interfaces

    • 2 GE fiber uplinks

  • 34-36 1U WNs

    • Connected to network switch via FE

    • Connected to KVM system
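The rack composition above implies a space budget and an uplink oversubscription ratio that are easy to compute. The helper below is a small sketch; it takes FE as 100 Mbps and GE as 1000 Mbps, ignores any space used by KVM equipment, and its name and parameters are illustrative.

```python
def rack_budget(wn_count, switch_units, power_units=3,
                fe_mbps=100, uplinks=2, uplink_mbps=1000):
    """Return (rack units used, uplink oversubscription ratio) for one rack."""
    units_used = power_units + switch_units + wn_count   # each WN is 1U
    oversubscription = (wn_count * fe_mbps) / (uplinks * uplink_mbps)
    return units_used, oversubscription


smallest = rack_budget(wn_count=34, switch_units=1)   # 38U, ratio about 1.7
largest = rack_budget(wn_count=36, switch_units=2)    # 41U, ratio about 1.8
```

Even the fullest rack leaves 12 of the 48 FE ports free, and the worker nodes can together offer at most about 1.8 times what the two GE uplinks carry.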


Remote console control

  • Paragon UTM8 (Raritan)

    • 8 Analog (UTP/Fiber) output connections

    • Supports up to 32 daisy chains of 40 nodes (UKVMSPD modules needed)

    • Cost: 6 kEuro + 125 Euro/server (UKVMSPD module)

    • IP-reach (expansion to support IP transport) evaluated but not used

  • Autoview 2000R (Avocent)

    • 1 Analog + 2 Digital (IP transport) output connections

    • Supports connections to up to 16 nodes

      • Optional expansion to 16x8 nodes

    • Compatible with Paragon (“gateway” to IP)

Networking (1)

  • Main network infrastructure based on optical fibres (~ 20 km)

    • To ease adoption of new (high-performance) transmission technologies

    • To ensure better electrical insulation over long distances

    • Local (rack-wide) links with UTP (copper) cables

  • LAN has a “classical” star topology

    • GE core switch (Enterasys ER16)

    • New core switch (BlackDiamond 10808) is in pre-production

      • 120 Gigabit fibre ports (scalable up to 480 ports)

      • 12 10-Gigabit Ethernet ports (scalable up to 48 ports)

    • Farms up-link via GE trunk (Channel) to core switch

    • Disk Servers directly connected to GE switch (mainly fibre)

Networking (2)

  • WNs connected via FE to the rack switch (1 switch per rack)

    • No single brand for switches (as for WNs)

      • 3 Extreme Summit 48 FE + 2 GE ports

      • 3 Cisco 3550, 48 FE + 2 GE ports

      • 8 Enterasys, 48 FE + 2 GE ports

      • 10 Summit400, 48 GE copper + 2 GE ports (2x10 Gb ready)

    • Homogeneous characteristics

      • 48 Copper Ethernet ports

      • Support of main standards (e.g. 802.1q)

      • 2 Gigabit up-links (optical fibers) to core switch

  • CNAF interconnected to GARR-G backbone at 1 Gbps.

Network Configuration

[Network diagram; recoverable labels: Babar SW, Internal services, 1 Gb/s, 1st Floor, Disk Servers]

L2 Configuration

  • Each Experiment has its own VLAN

  • Solution adopted for complete granularity

    • Port-based VLANs

    • VLAN identifiers are propagated across switches (802.1q)

    • Avoids recabling (or physically moving) machines to change farm topology

  • Level 2 isolation of farms

  • Possibility to define multi-tag (Trunk) ports (for servers)
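The effect of port-based VLANs can be sketched with a small model of a rack switch: moving a machine to another farm is a change to a table entry, not to a cable. The class, the port numbers and the VLAN identifiers below are hypothetical and only illustrate the idea.

```python
class Switch:
    """Minimal model of a switch with access and trunk (multi-tag) ports."""

    def __init__(self, name):
        self.name = name
        self.access = {}   # port -> single VLAN id (untagged access port)
        self.trunks = {}   # port -> set of VLAN ids carried with 802.1q tags

    def set_access(self, port, vlan):
        """Assign a port to exactly one VLAN."""
        self.trunks.pop(port, None)
        self.access[port] = vlan

    def set_trunk(self, port, vlans):
        """Make a port carry several tagged VLANs, e.g. for a server."""
        self.access.pop(port, None)
        self.trunks[port] = set(vlans)

    def ports_in_vlan(self, vlan):
        """List every port that can reach the given VLAN."""
        members = [p for p, v in self.access.items() if v == vlan]
        members += [p for p, vs in self.trunks.items() if vlan in vs]
        return sorted(members)
```

Calling `set_access(2, 102)` on a port that was in VLAN 101 moves the attached worker node from one experiment's farm to another, while a trunk port set with `set_trunk(48, [101, 102])` stays visible to both.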


Power Switches

  • 2 models used at Tier1:

    • “Old” APC MasterSwitch Control Unit AP9224, controlling 3x8 outlets (9222 PDU) from 1 Ethernet port

    • “New” APC PDU Control Unit AP7951, controlling 24 outlets from 1 Ethernet port

  • “Zero” rack units (vertical mount)

  • Access to the configuration/control menu via serial/telnet/web/SNMP

  • 1 dedicated machine running APC InfraStruXure Manager software (in progress)


Remote Power Distribution Unit

[Screenshot of the APC InfraStruXure Manager software showing the status of all Tier1 PDUs]

Computing units

  • ~ 800 1U rack-mountable Intel dual-processor servers

    • 800 MHz – 3.06 GHz

    • ~ 700 WNs (~ 1400 CPUs) available for LCG

  • Tendering:

    • HPC farm with MPI

      • Servers interconnected via Infiniband

    • Opteron farm (near future)

Storage Resources

~200 TB of raw disk space online.

  • NAS

    • NAS1+NAS4 (3Ware, low cost): total 4.2 TB

    • NAS2+NAS3 (Procom): total 13.2 TB

  • SAN

    • Dell Powervault 660f: total 7 TB

    • Axus (Brownie): total 2 TB

    • STK Bladestore: total 9 TB

    • Infortrend ES A16F-R: total 12 TB

    • IBM Fast-T 900: total 150 TB
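As a quick check, the listed systems add up to roughly the quoted ~200 TB. The dictionary names below are illustrative; the capacities are the ones given on the slide.

```python
# Raw capacities in TB, as listed on the slide.
NAS_TB = {"NAS1+NAS4 (3Ware)": 4.2, "NAS2+NAS3 (Procom)": 13.2}
SAN_TB = {
    "Dell Powervault 660f": 7,
    "Axus (Brownie)": 2,
    "STK Bladestore": 9,
    "Infortrend ES A16F-R": 12,
    "IBM Fast-T 900": 150,
}

nas_total = sum(NAS_TB.values())    # about 17.4 TB
san_total = sum(SAN_TB.values())    # 180 TB
total_tb = nas_total + san_total    # about 197.4 TB, i.e. ~200 TB
```

The SAN accounts for about 91% of the total, most of it on the IBM Fast-T 900.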

STORAGE resource

[Storage diagram; recoverable labels: STK180 with 100 LTO (10 TB native); STK L5500 robot (max 5000), 6 LTO-2; Gadzoox Slingshot FC switch, 18 ports; Fileserver CMS; Fileserver Fcds2; alias diskserv-ams-1, diskserv-atlas-1; STK BladeStore, circa 10000 GB, 4 FC interfaces. Further capacity labels whose devices cannot be recovered: 1800 GB (2 SCSI interfaces), 1800+2000 GB, 12 TB, 8100 GB, 4700 GB, circa 2200 GB (2 FC interfaces), 7100 GB (2 FC interfaces)]

Storage management and access (1)

  • Tier1 storage resources accessible as classical storage or via grid

  • Non grid disk storage accessible via NFS

  • Generic WNs also have an AFS client

  • NFS volumes are mounted via autofs, configured through LDAP

    • A single configuration repository eases maintenance

    • In progress: integration of the LDAP configuration with Tier1 db data

  • Scalability issues with NFS

    • Experienced stalled mount points


Storage management and access (2)

  • Part of disk storage used as front-end to CASTOR

    • Balance between disk and CASTOR according to the experiments' needs

  • 1 stager for each experiment (installation in progress)

  • CASTOR accessible either directly or via grid

    • CASTOR SE available

  • ALICE Data Challenge used CASTOR architecture

    • Feedback to CASTOR team

    • Need optimization for file restaging


Tier1 Database

  • Resource database and management interface

    • Postgres database as back end

    • Web interface (apache+mod_ssl+php)

    • Server hardware characteristics

    • Server software configuration

    • Server allocation

  • Possible direct access to db for some applications

    • Monitoring system

    • Nagios

  • Interface to configure switches and interoperate with installation system.


Installation issues

  • Centralized installation system

    • LCFG (EDG WP4)

    • Integration with a central Tier1 db

    • Moving a machine from one farm to another requires only a change of IP address (not of name)

    • A single DHCP server for all VLANs

    • Support for DDNS (

  • Investigating Quattor for future needs


Our Desired Solution for Resource Access

  • SHARED RESOURCES among all experiments

    • Priorities and reservations managed by the scheduler

  • Most of Tier1 computing machines installed as LCG Worker Nodes, with light modifications to support more VOs

  • Application Software not directly installed on WNs but accessed from outside (NFS, AFS, …)

  • One or more Resource Managers to manage all the WNs in a centralized way

  • Standard way to access Storage for each application
