Site Report

Roberto Gomezel


Outline of Presentation
  • Computing Environment
  • Security
  • Services
  • Network
  • AFS
  • BBS
  • INFN Farms
  • Tier 1 at CNAF
Computing Environment and security
  • 95% of boxes are PCs running Linux or Windows
  • Mac OS boxes are still in use
  • Only a few commercial Unix boxes, used for specific tasks or needs
  • VPNs available at many sites
    • Cisco boxes using IPsec
    • NetScreen boxes using IPsec
    • SSL VPNs under evaluation
      • SSL eliminates the need to install client software
      • It gives users instant access with just a Web browser
  • Network security
    • Dedicated firewall machines at only a few sites
    • Otherwise implemented with access lists on the router connected to the WAN

INFN Site Report – R.Gomezel

  • PCs running Linux and Windows
  • Automatic installation using Kickstart for Linux and RIS for Windows
  • Citrix MetaFrame or VMware used to reduce the need to install Windows on every PC for desktop applications
  • A few sites chose to outsource desktop support because of a lack of personnel


  • Tape Libraries used:
    • AIT2 – a few sites
    • IBM Magstar – used only at LNF
    • DLT, LTO – widespread
  • Backup tools:
    • IBM Tivoli – widely used
    • HP Omniback – widely used
    • Atempo Time Navigator – only a few sites
    • In-house tool – widespread

Wireless LAN
  • Access points running 802.11b/g
  • All sites provide wireless connectivity during meetings and conferences
  • Most also use it to connect laptop computers
  • Security issues:
    • Access control based on MAC address filtering (Secure Port) – poor security
    • No encryption used
    • Some sites use 802.1X


E-mail
  • Mail Transfer Agent
    • Sendmail – most widely used (86%)
    • Postfix – a few sites (14%)
      • An increasing number of sites plan to move from Sendmail to Postfix
  • Hardware and OS

E-mail user agent
  • All INFN sites provide a web-based (HTTP) mail user agent
    • One-third use IMP
    • One-third use SquirrelMail
    • Others:
      • IMHO, Open WebMail, Cyrus+Roxen…
  • Other mail user agents
    • Pine, Internet Explorer, Mozilla…

E-mail antivirus

E-mail antispam
  • 75% of INFN sites use SpamAssassin to reduce junk e-mail
  • Some sites use RAV or Sophos
  • Only a few sites (5%) use no antispam tool
  • An ACL filter on port 25 prevents unauthorized hosts from acting as mail relays
  • Only authorized mail relays are allowed to send and receive mail for a given site
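The port-25 relay rule can be modelled as a small predicate. This is a minimal Python sketch, assuming a single site prefix and one authorized relay; the addresses are hypothetical documentation addresses (RFC 5737), since the report gives none, and the real filter is a router ACL, not code.

```python
from ipaddress import ip_address, ip_network

# Hypothetical site values: the report names no prefixes or relay hosts.
SITE_NET = ip_network("192.0.2.0/24")
AUTHORIZED_RELAYS = {ip_address("192.0.2.25")}
SMTP_PORT = 25


def smtp_allowed(src: str, dst: str, dst_port: int) -> bool:
    """Return True if a TCP connection passes the port-25 relay ACL.

    Ports other than 25 are not judged by this rule. SMTP crossing the
    site border is allowed only when the site-side host is an authorized relay.
    """
    if dst_port != SMTP_PORT:
        return True
    s, d = ip_address(src), ip_address(dst)
    src_inside, dst_inside = s in SITE_NET, d in SITE_NET
    if src_inside and dst_inside:
        return True  # internal traffic never crosses the WAN router
    if src_inside:   # outbound: only relays may speak SMTP to the outside
        return s in AUTHORIZED_RELAYS
    if dst_inside:   # inbound: outside MTAs may deliver only to relays
        return d in AUTHORIZED_RELAYS
    return False     # transit traffic is not carried
```

A compromised desktop at 192.0.2.40 trying to send spam directly to an outside server is therefore blocked, while the relay itself can still deliver.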

Security issues

Chart: incidents originating from INFN hosts (percentage), as monitored by GARR-CERT

  • Goals by the end of 2004:
    • Define a new policy for ACL settings
    • Input filter: default deny
      • Services allowed only on very strictly checked hosts
    • Output filter:
      • Port 25
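A default-deny input filter is an ordered rule list where the first match wins and anything unmatched is dropped. A minimal Python sketch of that evaluation order; the hosts and ports in `INBOUND` are hypothetical, and real router ACLs also match source addresses, protocols and session state, which are omitted here.

```python
from dataclasses import dataclass
from ipaddress import ip_address, ip_network
from typing import Optional


@dataclass(frozen=True)
class Rule:
    action: str          # "permit" or "deny"
    dst: str             # destination prefix, e.g. "192.0.2.10/32"
    port: Optional[int]  # None matches any destination port


def evaluate(rules, dst_host: str, dst_port: int) -> str:
    """First matching rule wins; anything unmatched is denied."""
    addr = ip_address(dst_host)
    for rule in rules:
        if addr in ip_network(rule.dst) and rule.port in (None, dst_port):
            return rule.action
    return "deny"  # implicit default deny at the end of the list


# Hypothetical inbound list: only strictly checked hosts expose services.
INBOUND = [
    Rule("permit", "192.0.2.10/32", 22),   # e.g. a checked SSH gateway
    Rule("permit", "192.0.2.25/32", 25),   # e.g. the site mail relay
    Rule("permit", "192.0.2.80/32", 443),  # e.g. a checked web server
]
```

The design choice is that a new service stays unreachable until someone adds an explicit permit line for it, which is what forces the strict per-host check.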

INFN network
  • LAN backbone network mainly based on Gigabit Ethernet
    • Layer 2 and 3 switching
    • No layer 4 switching
  • The INFN WAN is fully integrated into GARR, the nation-wide research network infrastructure, which provides 2.5 Gbit/s backbone connectivity
    • Typical PoP access bandwidth for INFN sites: 34 Mbit/s, 155 Mbit/s, Gigabit Ethernet
    • The trend is to give every site Gigabit Ethernet access, with bandwidth managed by a rate-limiting mechanism (CAR, Committed Access Rate) according to each site's needs
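CAR polices traffic with a token bucket: packets within the configured rate and burst conform, and the excess is dropped or re-marked. The Python below is a generic token bucket with hypothetical parameters, not Cisco's implementation; CAR's normal/extended burst semantics are not modelled.

```python
class TokenBucket:
    """Generic token-bucket policer: each packet either conforms or exceeds."""

    def __init__(self, rate_bps: float, burst_bytes: float):
        self.rate = rate_bps / 8.0   # refill rate in bytes per second
        self.capacity = burst_bytes  # bucket depth (allowed burst)
        self.tokens = burst_bytes    # start with a full bucket
        self.last = 0.0              # time of the previous update, in seconds

    def conforms(self, size_bytes: int, now: float) -> bool:
        # Refill for the time elapsed since the previous packet, up to the depth.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if size_bytes <= self.tokens:
            self.tokens -= size_bytes
            return True   # conform action, e.g. transmit
        return False      # exceed action, e.g. drop
```

For example, `TokenBucket(155e6, 1_500_000)` would limit a Gigabit Ethernet access to 155 Mbit/s with a 1.5 MB burst (both values hypothetical), so a site can be upgraded physically while its contracted bandwidth stays the same.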


  • INFN sites continue to use AFS to share data and software across sites
  • Most local cells have moved all server functionality to Linux boxes running OpenAFS
  • The authentication and file servers of the nation-wide cell INFN.IT run on Linux boxes with OpenAFS
  • The migration of the INFN.IT authentication servers from Kerberos IV to Kerberos V is expected to be completed by the end of the year


BBS - Bologna Batch System

The Bologna Batch System (BBS) is a software tool that allows INFN Bologna users to submit batch jobs to a set of well-defined machines from any INFN Bologna machine with Condor installed.

BBS is a collaboration between the Computer Sciences Department of the University of Wisconsin-Madison and INFN Bologna.

Main features of BBS:

Any executable can be submitted to the system (scripts, compiled and linked programs, etc.).

Two different 'queues', short and long. Short and long jobs run with different priorities (nice values) when they share a machine.

Short jobs may run for no longer than an hour, but run at a higher priority.

BBS tries to balance the load across the BBS CPUs.
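The two-queue policy and load balancing can be sketched as a plain-Python model; this is not Condor's matchmaking. The nice values (5 and 19), the machine names and the least-loaded placement rule are assumptions: the report states only that short jobs run at higher priority and for at most one hour.

```python
from dataclasses import dataclass, field

# Hypothetical values: only "short = higher priority, max one hour" is given.
QUEUES = {
    "short": {"nice": 5, "max_seconds": 3600},
    "long": {"nice": 19, "max_seconds": None},  # no limit stated for long jobs
}


@dataclass
class Machine:
    name: str
    cpus: int
    running: list = field(default_factory=list)

    def load(self) -> float:
        """Running jobs per CPU."""
        return len(self.running) / self.cpus


def dispatch(job_id: str, queue: str, machines: list):
    """Place a job on the least-loaded machine; return (machine, nice value)."""
    if queue not in QUEUES:
        raise ValueError(f"unknown queue {queue!r}")
    target = min(machines, key=lambda m: m.load())  # first machine wins ties
    target.running.append((job_id, queue))
    return target.name, QUEUES[queue]["nice"]


def over_limit(queue: str, elapsed_seconds: float) -> bool:
    """True if a job has exceeded its queue's wall-clock limit."""
    limit = QUEUES[queue]["max_seconds"]
    return limit is not None and elapsed_seconds > limit
```

A lower nice value means a higher scheduling priority, so when a short and a long job share a dual-CPU box the kernel favours the short one.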

  • P.Mazzanti

At present the system consists of 16 dual-CPU servers running Red Hat Linux 9, plus a single-CPU machine. Seven of the machines belong to the ALICE experiment.

BBS machines belong to the large INFN WAN Pool: they can be used from outside when no BBS job is running, but become IMMEDIATELY available as soon as a BBS job needs to run.

The 7 ALICE machines accept only short jobs when these are submitted by users outside the ALICE group.
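The ALICE-machine restriction and the WAN-pool sharing can be modelled the same way. In Condor such rules are written as machine policy expressions (for example START and PREEMPT); the Python below only mimics their effect, and the group names, node name and job identifier are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    name: str
    owner_group: Optional[str] = None   # "alice" for the ALICE-owned machines
    external_job: Optional[str] = None  # WAN-pool job using the node while idle


def may_accept(node: Node, queue: str, user_group: str) -> bool:
    """ALICE-owned nodes take only short jobs from users outside ALICE."""
    if node.owner_group == "alice" and user_group != "alice":
        return queue == "short"
    return True


def start_bbs_job(node: Node, queue: str, user_group: str) -> Optional[str]:
    """Evict any external WAN-pool job so the BBS job can start at once.

    Returns the evicted job's identifier (None if the node was free) and
    raises PermissionError if the node's policy rejects the job.
    """
    if not may_accept(node, queue, user_group):
        raise PermissionError(f"{node.name} rejects {queue} jobs from {user_group}")
    evicted, node.external_job = node.external_job, None
    return evicted
```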


Aggregate jobs, daily

Aggregate jobs, weekly


Daily load

Weekly load

INFN Site Farm: a new challenge
  • Some sites plan to reconfigure their computing facilities and merge local experiment-specific farms into a single computing farm
    • Reason: to avoid a growing number of small, private farms, one per experiment, alongside the general-purpose computing facility
  • Introduction of a SAN infrastructure to connect storage systems and computing units
    • The GFS file system is under evaluation as an efficient cluster file system and volume manager
    • Attractive because it is part of the SL3 distribution
  • Much work is needed to design a mechanism that assigns computing resources to different experiments dynamically, according to their needs
    • We can learn from the experience of the CNAF Tier1 and other labs
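One common way to share a farm according to need is max-min fairness: experiments asking for less than an equal share receive their full demand, and the remainder is divided among the others. The report does not name an algorithm, so this Python sketch is only one possible policy; experiment names and slot counts are hypothetical.

```python
def max_min_share(capacity: int, demands: dict) -> dict:
    """Split `capacity` CPU slots among experiments by max-min fairness.

    Experiments below an equal share get everything they ask for; the
    leftover is split evenly among the rest. Slots are integers; when fewer
    slots than contenders remain, they go one each in name order.
    """
    alloc = {name: 0 for name in demands}
    remaining = capacity
    unsatisfied = sorted(name for name, d in demands.items() if d > 0)
    while remaining > 0 and unsatisfied:
        share = remaining // len(unsatisfied)
        if share == 0:
            for name in unsatisfied[:remaining]:
                alloc[name] += 1
            break
        for name in list(unsatisfied):  # iterate over a copy while removing
            grant = min(share, demands[name] - alloc[name])
            alloc[name] += grant
            remaining -= grant
            if alloc[name] == demands[name]:
                unsatisfied.remove(name)
    return alloc
```

With 100 slots and demands of 10 (alice), 60 (cms) and 60 (atlas), alice receives its 10 and cms and atlas 45 each; when demands drop, the next call simply redistributes.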

Hardware solutions for the Tier1 at CNAF

Luca dell’Agnello

Stefano Zani

(INFN – CNAF, Italy)

  • Luca dell’Agnello -Stefano Zani
  • INFN computing facility for the HEP community
    • Prototype phase ended last year; now fully operational
    • Location: INFN-CNAF, Bologna (Italy)
      • One of the main nodes of the GARR network
    • Personnel: ~10 FTEs
      • ~3 FTEs dedicated to experiments
  • Multi-experiment
    • LHC experiments (ALICE, ATLAS, CMS, LHCb), Virgo, CDF, BaBar, AMS, MAGIC, ...
    • Resources dynamically assigned to experiments according to their needs
  • 50% of the Italian resources for LCG
    • Participation in the experiments' data challenges
    • Integrated with the Italian Grid
    • Resources also accessible in the traditional way
  • Moved to a new location (last January)
    • Hall in the basement (second underground floor)
    • ~1000 m² of total space, hosting:
      • Computing nodes
      • Storage devices
      • Electric power system (UPS)
      • Cooling and air-conditioning system
      • GARR GPoP
    • Easily accessible by lorry from the road
    • Not suitable for office use (remote control needed)

Electric Power

  • Electric Power Generator
    • 1250 kVA (~1000 kW)

 up to 160 racks

  • Uninterruptible Power Supply (UPS)
    • Located in a separate room (air-conditioned and ventilated)
    • 800 kVA (~640 kW)
  • 380 V three-phase power distributed to all racks (Blindo)
    • Rack power controls output 3 independent 220 V lines for the computers
    • Rack power controls carry loads of up to 16 or 32 A
      • 32 A power controls needed for racks of 36 dual-processor Xeon servers
    • 3 APC power distribution modules (24 outlets each)
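The kVA and kW figures on this slide are consistent with a power factor of 0.8 (1250 × 0.8 = 1000, 800 × 0.8 = 640). The Python below reproduces that conversion and adds a rough rack-sizing check. The per-server wattage, unity power factor at the servers, and reading the 16/32 A rating as the current of a single 220 V feed are assumptions, not figures from the slide.

```python
POWER_FACTOR = 0.8  # matches 1250 kVA ~ 1000 kW and 800 kVA ~ 640 kW


def kva_to_kw(kva: float, pf: float = POWER_FACTOR) -> float:
    """Real power P (kW) from apparent power S (kVA): P = S * PF."""
    return kva * pf


def rating_needed(servers: int, watts_per_server: float,
                  volts: float = 220.0, ratings=(16, 32)):
    """Smallest power-control rating (A) that covers the rack load, or None.

    Simplification (assumption): the whole rack is treated as one 220 V
    single-phase load with unity power factor.
    """
    amps = servers * watts_per_server / volts
    for rating in sorted(ratings):
        if amps <= rating:
            return rating
    return None
```

With a hypothetical 150 W per server, 36 servers draw about 24.5 A at 220 V, more than 16 A, so the 32 A power control is needed, in line with the slide.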

Cooling & Air Conditioning

  • RLS (Airwell) on the roof
    • ~700 kW
    • Water cooling
    • Needs a booster pump (20 m from the Tier1 hall up to the roof)
    • Noise insulation
  • 1 air-conditioning unit (uses 20% of the RLS cooling power and controls humidity)
  • 12 Local Cooling Systems (Hiross) in the computing room
Typical WN Rack Composition
  • Power Controls (3U)
  • 1 network switch (1-2U)
    • 48 FE copper interfaces
    • 2 GE fiber uplinks
  • 34-36 1U WNs
    • Connected to network switch via FE
    • Connected to KVM system
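This rack layout implies an uplink oversubscription that is easy to compute: 36 WNs × 100 Mbit/s = 3.6 Gbit/s of possible demand against 2 × 1 Gbit/s of uplink. A short Python helper; it assumes every WN transmits at full Fast Ethernet line rate at once, which is the worst case rather than typical load.

```python
def oversubscription(nodes: int, node_mbps: float,
                     uplinks: int, uplink_mbps: float) -> float:
    """Worst-case downstream demand divided by total uplink capacity."""
    return (nodes * node_mbps) / (uplinks * uplink_mbps)


# Full rack as described: 36 WNs on Fast Ethernet, 2 Gigabit uplinks.
print(f"{oversubscription(36, 100, 2, 1000):.1f}:1")  # 1.8:1
```

A 34-node rack gives 1.7:1, so in either case the uplinks can saturate only if most WNs stream data at full speed simultaneously.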

Remote console control

  • Paragon UTM8 (Raritan)
    • 8 Analog (UTP/Fiber) output connections
    • Supports up to 32 daisy chains of 40 nodes (UKVMSPD modules needed)
    • Cost: 6 kEuro + 125 Euro per server (UKVMSPD module)
    • IP-Reach (expansion to support IP transport) evaluated but not used
  • Autoview 2000R (Avocent)
    • 1 analog + 2 digital (IP transport) output connections
    • Supports up to 16 nodes
      • Optional expansion to 16×8 nodes
    • Compatible with Paragon (“gateway” to IP)
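The Paragon figures support a quick capacity and cost estimate. The Python below assumes the 6 kEuro base cost is paid once for the whole installation and that each server needs exactly one UKVMSPD module; the slide does not say whether the base cost grows with the number of chains.

```python
PARAGON_BASE_EUR = 6000       # "6 KEuro" base cost (assumed paid once)
UKVMSPD_EUR_PER_SERVER = 125  # one UKVMSPD module per server
MAX_CHAINS, NODES_PER_CHAIN = 32, 40


def paragon_capacity() -> int:
    """Maximum number of nodes: 32 daisy chains of 40 nodes each."""
    return MAX_CHAINS * NODES_PER_CHAIN


def paragon_cost(servers: int) -> int:
    """Estimated cost in Euro of attaching `servers` nodes to one Paragon system."""
    if servers > paragon_capacity():
        raise ValueError(f"{servers} servers exceed the {paragon_capacity()}-node limit")
    return PARAGON_BASE_EUR + UKVMSPD_EUR_PER_SERVER * servers


print(paragon_capacity(), paragon_cost(800))  # 1280 106000
```

For the ~800 Tier1 servers the per-server modules dominate the cost, while capacity still leaves room for about 480 more nodes.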
Networking (1)
  • Main network infrastructure based on optical fibres (~20 km)
    • To ease adoption of new high-performance transmission technologies
    • To ensure better electrical insulation over long distances
    • Local (rack-wide) links use UTP (copper) cables
  • LAN has a “classical” star topology
    • GE core switch (Enterasys ER16)
    • New core switch (Black Diamond 10808) in pre-production
      • 120 Gigabit Ethernet fibre ports (scalable up to 480)
      • 12 10-Gigabit Ethernet ports (scalable up to 48)
    • Farms uplinked to the core switch via GE trunks (Channel)
    • Disk servers directly connected to the GE switch (mainly fibre)
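A GE trunk spreads traffic over its member links by hashing header fields, so each conversation stays on one link and frames are not reordered. The Python below uses CRC-32 of the MAC pair as a stand-in: the hash actually used by the Enterasys and Extreme switches is vendor-specific and not described in the report, and the MAC addresses in the usage are hypothetical.

```python
import zlib


def member_link(src_mac: str, dst_mac: str, n_links: int) -> int:
    """Pick a trunk member link (0..n_links-1) for a frame from its MAC pair.

    Every frame of the same src/dst pair maps to the same link, so a
    single conversation never exceeds the speed of one member link.
    """
    key = f"{src_mac.lower()}-{dst_mac.lower()}".encode()
    return zlib.crc32(key) % n_links
```

A consequence worth noting: a single WN-to-disk-server transfer stays capped at 1 Gbit/s however many links the trunk has; only the aggregate of many flows benefits.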
Networking (2)
  • WNs connected via FE to a rack switch (1 switch per rack)
    • Switches are not all of one brand (nor are the WNs)
      • 3 Extreme Summit: 48 FE + 2 GE ports
      • 3 Cisco 3550: 48 FE + 2 GE ports
      • 8 Enterasys: 48 FE + 2 GE ports
      • 10 Extreme Summit400: 48 GE copper + 2 GE ports (2×10 Gb ready)
    • Common characteristics
      • 48 copper Ethernet ports
      • Support of the main standards (e.g. 802.1q)
      • 2 Gigabit uplinks (optical fibres) to the core switch
  • CNAF interconnected to the GARR-G backbone at 1 Gbps

Network Configuration (diagram): internal services, BaBar SW, disk servers, 1 Gb/s links, 1st floor

L2 Configuration
  • Each experiment has its own VLAN
  • Solution adopted for full granularity
    • Port-based VLANs
    • VLAN identifiers propagated across switches (802.1q)
    • No recabling (or physical moving) of machines needed to change farm topology
  • Layer 2 isolation of farms
  • Possibility to define multi-tag (Trunk) ports (for servers)
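802.1q carries the VLAN identifier in a 4-byte tag inserted after the source MAC address: TPID 0x8100 followed by the Tag Control Information (3-bit priority, 1-bit DEI, formerly CFI, and a 12-bit VLAN ID). A minimal Python sketch that builds and parses such a tag; the VLAN numbers and the test frame are hypothetical, since the report does not list per-experiment VLAN IDs.

```python
import struct

TPID_8021Q = 0x8100  # EtherType value that marks an 802.1Q tag


def tag_frame(frame: bytes, vid: int, pcp: int = 0, dei: int = 0) -> bytes:
    """Insert an 802.1Q tag after the destination and source MACs (12 bytes)."""
    if not 1 <= vid <= 4094:
        raise ValueError("VLAN ID must be 1..4094 (0 is priority-only, 4095 reserved)")
    if not (0 <= pcp <= 7 and dei in (0, 1)):
        raise ValueError("PCP must be 0..7 and DEI 0 or 1")
    tci = (pcp << 13) | (dei << 12) | vid  # Tag Control Information
    return frame[:12] + struct.pack("!HH", TPID_8021Q, tci) + frame[12:]


def vlan_of(frame: bytes):
    """Return the VLAN ID of a tagged frame, or None for an untagged one."""
    tpid, tci = struct.unpack("!HH", frame[12:16])
    return tci & 0x0FFF if tpid == TPID_8021Q else None
```

With port-based VLANs, access ports toward WNs carry untagged frames and the switch adds the tag on ingress; a multi-tag (trunk) port toward a server shared by several experiments carries frames tagged with each of their VLAN IDs.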

Power Switches

  • 2 models used at the Tier1:
    • “Old” APC MasterSwitch Control Unit AP9224, controlling 3×8 outlets (AP9222 PDUs) from a single Ethernet port
    • “New” APC PDU Control Unit AP7951, controlling 24 outlets from a single Ethernet port
  • Zero rack units (vertical mount)
  • Configuration/control menu accessible via serial, telnet, web and SNMP
  • 1 dedicated machine running APC InfraStruXure Manager software (in progress)

Remote Power Distribution Unit

Screenshot of APC InfraStruXure Manager software showing the status of all Tier1 PDUs

Computing units
  • ~800 1U rack-mountable Intel dual-processor servers
    • 800 MHz – 3.06 GHz
    • ~700 WNs (~1400 CPUs) available for LCG
  • Under tender:
    • HPC farm with MPI
      • Servers interconnected via InfiniBand
    • Opteron farm (near future)
Storage Resources

~200 TB of raw disk space online

  • NAS
    • NAS1+NAS4 (3Ware, low cost): total 4.2 TB
    • NAS2+NAS3 (Procom): total 13.2 TB
  • SAN
    • Dell PowerVault 660F: total 7 TB
    • Axus (Brownie): total 2 TB
    • STK BladeStore: total 9 TB
    • Infortrend ES A16F-R: total 12 TB
    • IBM FAStT 900: total 150 TB
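As a consistency check, the capacities listed above add up to roughly the ~200 TB quoted. A few lines of Python summing the slide's own figures:

```python
# Capacities as listed on the slide, in TB.
nas = {"NAS1+NAS4 (3Ware)": 4.2, "NAS2+NAS3 (Procom)": 13.2}
san = {
    "Dell PowerVault 660F": 7,
    "Axus (Brownie)": 2,
    "STK BladeStore": 9,
    "Infortrend ES A16F-R": 12,
    "IBM FAStT 900": 150,
}
total = sum(nas.values()) + sum(san.values())
print(f"NAS {sum(nas.values()):.1f} TB + SAN {sum(san.values())} TB = {total:.1f} TB")
# NAS 17.4 TB + SAN 180 TB = 197.4 TB
```

The single IBM system accounts for about three quarters of the raw space.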

Storage Resources (diagram): an STK180 library with 100 LTO tapes (10 TB native); an STK L5500 robot (max 5000) with 6 LTO-2; a Gadzoox Slingshot 18-port FC switch; the CMS fileserver; fileserver Fcds2 (aliases diskserv-ams-1, diskserv-atlas-1); an STK BladeStore (circa 10000 GB, 4 FC interfaces); further disk systems labelled from 1800 GB to 12 TB, attached via 2 SCSI or 2 FC interfaces.


Storage management and access (1)

  • Tier1 storage resources accessible as classical storage or via the grid
  • Non-grid disk storage accessible via NFS
  • Generic WNs also have an AFS client
  • NFS volumes mounted via autofs, with the maps stored in LDAP
    • A single configuration repository eases maintenance
    • In progress: integration of the LDAP configuration with the Tier1 DB data
  • Scalability issues with NFS
    • Stalled mount points experienced
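An autofs map kept in LDAP is a set of entries, one per mount key. The Python below renders such a map as LDIF, assuming the RFC 2307bis-style automount schema (automountMap and automount object classes) that autofs can read; the map name, base DN, server names, export paths and mount options are all hypothetical. In the Tier1 setting the `mounts` dictionary would presumably be filled from the Tier1 DB, the integration described as in progress.

```python
def autofs_ldif(map_name: str, base_dn: str, mounts: dict) -> str:
    """Render an autofs map and its entries as LDIF (RFC 2307bis-style schema)."""
    map_dn = f"automountMapName={map_name},{base_dn}"
    lines = [
        f"dn: {map_dn}",
        "objectClass: top",
        "objectClass: automountMap",
        f"automountMapName: {map_name}",
        "",
    ]
    for key, location in sorted(mounts.items()):
        lines += [
            f"dn: automountKey={key},{map_dn}",
            "objectClass: top",
            "objectClass: automount",
            f"automountKey: {key}",
            # Hypothetical NFS options; "location" is server:/export/path.
            f"automountInformation: -fstype=nfs,rw,hard,intr {location}",
            "",
        ]
    return "\n".join(lines)


print(autofs_ldif("auto.exp", "ou=autofs,dc=example,dc=org",
                  {"ams": "diskserv-ams-1:/export/ams"}))
```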

Storage management and access (2)

  • Part of the disk storage used as a front-end to CASTOR
    • Balance between disk and CASTOR according to the experiments' needs
  • 1 stager for each experiment (installation in progress)
  • CASTOR accessible both directly and via the grid
    • CASTOR SE available
  • The ALICE Data Challenge used the CASTOR architecture
    • Feedback given to the CASTOR team
    • File restaging needs optimization
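Using disk as a front-end to CASTOR means the disk behaves as a cache over tape, and every miss is a restage from tape. The Python below models that with least-recently-used eviction; it is not CASTOR's actual stager policy, and it assumes every file already has a copy on tape, so evicting it from disk loses no data. File names and sizes are hypothetical.

```python
from collections import OrderedDict


class DiskCache:
    """LRU disk pool in front of a tape archive (files already on tape)."""

    def __init__(self, capacity_gb: float):
        self.capacity = capacity_gb
        self.used = 0.0
        self.files = OrderedDict()  # name -> size in GB, least recent first
        self.restages = 0           # reads that had to go back to tape

    def read(self, name: str, size_gb: float) -> str:
        if name in self.files:
            self.files.move_to_end(name)  # mark as most recently used
            return "disk hit"
        if size_gb > self.capacity:
            raise ValueError("file larger than the whole disk pool")
        # Evict least-recently-used files until the restaged file fits.
        while self.used + size_gb > self.capacity:
            _, old_size = self.files.popitem(last=False)
            self.used -= old_size
        self.files[name] = size_gb
        self.used += size_gb
        self.restages += 1
        return "restaged from tape"
```

Counting `restages` for a given access pattern is one way to gauge the cost that restaging optimization targets: a pool too small for the working set turns almost every read into a tape recall.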

Tier1 Database

  • Resource database and management interface
    • PostgreSQL database as back end
    • Web interface (Apache + mod_ssl + PHP)
    • Hardware characteristics of servers
    • Software configuration of servers
    • Server allocation
  • Direct DB access possible for some applications
    • Monitoring system
    • Nagios
  • Interface to configure switches and interoperate with the installation system
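The resource database can be sketched as two tables, server characteristics and allocation. The Python below uses the standard-library sqlite3 module as a stand-in for PostgreSQL (the SQL shown is portable); table names, columns and rows are hypothetical, not the Tier1 schema.

```python
import sqlite3

SCHEMA = """
CREATE TABLE server (
    name       TEXT PRIMARY KEY,
    cpu_mhz    INTEGER NOT NULL,  -- hardware characteristics
    cpus       INTEGER NOT NULL,
    os_profile TEXT NOT NULL      -- software configuration
);
CREATE TABLE allocation (
    server     TEXT PRIMARY KEY REFERENCES server(name),
    experiment TEXT NOT NULL,
    vlan_id    INTEGER NOT NULL
);
"""


def cpus_per_experiment(conn) -> dict:
    """Total CPUs currently allocated to each experiment."""
    rows = conn.execute(
        """SELECT a.experiment, SUM(s.cpus)
           FROM allocation a JOIN server s ON s.name = a.server
           GROUP BY a.experiment ORDER BY a.experiment"""
    )
    return dict(rows.fetchall())


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
conn.executemany("INSERT INTO server VALUES (?, ?, ?, ?)", [
    ("wn001", 3060, 2, "lcg-wn"),
    ("wn002", 3060, 2, "lcg-wn"),
    ("wn003", 800, 2, "lcg-wn"),
])
conn.executemany("INSERT INTO allocation VALUES (?, ?, ?)", [
    ("wn001", "cms", 10), ("wn002", "cms", 10), ("wn003", "alice", 20),
])
print(cpus_per_experiment(conn))  # {'alice': 2, 'cms': 4}
```

Keeping the VLAN ID in the allocation row is one way the same data could drive both switch configuration and the installation system.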

Installation issues

  • Centralized installation system
    • LCFG (EDG WP4)
    • Integration with a central Tier1 db
    • Moving a machine from one farm to another changes only its IP address (not its name)
    • A single DHCP server for all VLANs
    • Support for DDNS (
  • Investigating Quattor for future needs
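Keeping host names fixed while addresses follow the farm fits naturally with one DHCP server for all VLANs: each host entry carries the MAC address and an address from its current farm's subnet. The Python below renders host declarations assuming ISC dhcpd syntax; the subnets, host name and MAC address are hypothetical, and LCFG's own mechanism is not modelled.

```python
from ipaddress import ip_network

# Hypothetical per-farm VLAN subnets; the real addressing plan is not given.
FARM_SUBNETS = {
    "cms": ip_network("10.10.0.0/24"),
    "alice": ip_network("10.20.0.0/24"),
}


def host_stanza(name: str, mac: str, farm: str, index: int) -> str:
    """ISC dhcpd host declaration: the name is fixed, the address follows the farm."""
    ip = list(FARM_SUBNETS[farm].hosts())[index]  # index-th usable address
    return (f"host {name} {{\n"
            f"  hardware ethernet {mac};\n"
            f"  fixed-address {ip};\n"
            f"  option host-name \"{name}\";\n"
            f"}}")


print(host_stanza("wn042", "00:11:22:33:44:55", "cms", 41))
```

Moving wn042 from the cms farm to the alice farm changes only the `fixed-address` line; with dynamic DNS updates the unchanged name then resolves to the new address.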

Our Desired Solution for Resource Access

  • SHARED RESOURCES among all experiments
    • Priorities and reservations managed by the scheduler
  • Most Tier1 computing machines installed as LCG Worker Nodes, with light modifications to support more VOs
  • Application software not installed directly on WNs but accessed from outside (NFS, AFS, …)
  • One or more resource managers to manage all WNs centrally
  • A standard way for each application to access storage