Site Report

Roberto Gomezel

INFN

Outline of Presentation
  • Computing Environment
  • Security
  • Services
  • Network
  • AFS
  • BBS
  • INFN Farms
  • Tier 1 at CNAF
Computing Environment and security
  • 95% of boxes are PCs running Linux or Windows
  • Mac OS boxes keep on living
  • Just a few commercial unix boxes only used for specific tasks or needs
  • VPNs available in many sites
    • Cisco boxes using IPsec
    • NetScreen boxes using IPsec
    • SSL VPNs are under evaluation
      • The use of SSL eliminates the need to install client software
      • It enables instant access for users with just a Web browser
  • Network Security
    • Dedicated firewall machines at just a few sites
    • Implemented with access lists on the router connected to the WAN

INFN Site Report – R.Gomezel

Desktop
  • PCs running Linux and Windows
  • Automatic installation using Kickstart for Linux and RIS for Windows
  • Citrix MetaFrame or VMware used to reduce the need to install Windows on every PC for desktop applications
  • A few sites chose to outsource desktop support due to lack of personnel

Backup
  • Tape Libraries used:
    • AIT2 – a few sites
    • IBM Magstar – just used at LNF
    • DLT, LTO – widespread
  • Backup tools:
    • IBM Tivoli – widely used
    • HP Omniback – widely used
    • Atempo Time Navigator – just a few sites
    • In-house tools – widespread

Wireless LAN
  • Access points running standard 802.11b/g
  • All sites use wireless connections during meetings and conferences
  • Most of them use it to connect laptop computers
  • Security issues:
    • Permission based on secure port filtering (MAC address) – poor security
    • No encryption used
    • Some sites are using 802.1X

E-mail
  • Mail Transfer Agent
    • Sendmail – the most widely used (86%)
    • Postfix – a few sites (14%)
      • But an increasing number of sites are planning to move from Sendmail to Postfix
  • Hardware and OS

E-mail user agent
  • All INFN sites provide an HTTP mail user agent
    • One third use IMP
    • One third use SquirrelMail
    • Others:
      • IMHO, Open WebMail, Cyrus+Roxen…
  • Other mail user agents
    • Pine, Internet Explorer, Mozilla…

E-mail antivirus

E-mail antispam
  • 75% of INFN sites use SpamAssassin to reduce junk e-mail
  • Some sites use RAV or Sophos
  • Just a few sites (5%) use nothing
  • An ACL filter is set on port 25 to prevent unauthorized hosts from acting as mail relays
  • Only authorized mail relays are allowed to send and receive mail for a specific site

Security issues

Monitored by GARR-CERT

Incidents coming from INFN hosts (percentage)

  • Goal by the end of 2004:
    • Define a new policy for ACL settings
    • Input filter: default deny
      • Services allowed only on strictly checked hosts
    • Output filter:
      • Port 25
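
The filter policy above can be illustrated with a minimal sketch. This is not a router configuration: the host addresses, the `ALLOWED_SERVICES` table and the function names are hypothetical, and the rule that all non-SMTP outbound traffic passes is an assumption, since the slide only names port 25 for the output filter.

```python
import ipaddress

# Hypothetical allow-list of (host, TCP port) pairs for strictly checked services.
ALLOWED_SERVICES = {
    (ipaddress.ip_address("192.0.2.10"), 25),  # authorized mail relay
    (ipaddress.ip_address("192.0.2.20"), 80),  # vetted web server
}


def input_permitted(dst_host: str, dst_port: int) -> bool:
    """Input filter, default deny: pass only explicitly listed services."""
    return (ipaddress.ip_address(dst_host), dst_port) in ALLOWED_SERVICES


def output_permitted(src_host: str, dst_port: int) -> bool:
    """Output filter: outbound SMTP only from authorized relays.

    Assumption: any other outbound traffic is allowed.
    """
    if dst_port != 25:
        return True
    return (ipaddress.ip_address(src_host), 25) in ALLOWED_SERVICES
```

With this table, an inbound SSH connection to an unlisted host is dropped, and a desktop machine cannot open outbound connections to port 25.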

INFN network
  • LAN backbone network mainly based on Gigabit Ethernet
    • Layer 2 and 3 switching
    • No layer 4 switching
  • The INFN WAN is completely integrated into GARR, the nation-wide infrastructure, which provides backbone connectivity at 2.5 Gbps
    • Typical PoP access bandwidth for INFN sites: 34 Mbps, 155 Mbps, Gigabit Ethernet
    • The trend is towards Gigabit Ethernet access at every site, with bandwidth managed through a rate-limiting mechanism (CAR) according to the needs of the specific site
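
Rate limiting of the kind mentioned above is usually described as a token-bucket policer. The sketch below is a simplified single-bucket model, not Cisco's implementation; the class name, the rate and the burst size are illustrative assumptions.

```python
class TokenBucket:
    """Simplified token-bucket policer (illustrative, single bucket)."""

    def __init__(self, rate_bps: float, burst_bytes: float):
        self.rate = rate_bps / 8.0   # refill rate in bytes per second
        self.burst = burst_bytes     # bucket depth in bytes
        self.tokens = burst_bytes    # start with a full bucket
        self.last = 0.0              # time of the previous packet, in seconds

    def conforms(self, now: float, packet_bytes: int) -> bool:
        """Return True if the packet fits the committed rate, else False."""
        # Refill for the elapsed time, never beyond the bucket depth.
        elapsed = now - self.last
        self.tokens = min(self.burst, self.tokens + elapsed * self.rate)
        self.last = now
        if packet_bytes <= self.tokens:
            self.tokens -= packet_bytes
            return True
        return False
```

A site limited to 34 Mbps on a Gigabit Ethernet access link would, in this model, use `TokenBucket(rate_bps=34e6, burst_bytes=...)` and drop or re-mark packets for which `conforms` returns False.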

AFS
  • INFN sites continue to use AFS services to share data and software across sites
  • Most local cells have completely moved server functionality to Linux boxes running OpenAFS software
  • Authentication and file server functionalities of the nation-wide cell INFN.IT are running on Linux boxes with OpenAFS
  • The migration of INFN.IT authentication servers from Kerberos IV to Kerberos V is expected to be accomplished by the end of the year


BBS - Bologna Batch System

The Bologna Batch System (BBS) is a software tool that allows users from INFN Bologna to submit batch jobs to a set of well-defined machines from any INFN Bologna machine with Condor installed.

Collaboration between the C. S. Dept., Univ. of Wisconsin-Madison, and INFN Bologna.

Main features of BBS:

Any executable can be submitted to the system (scripts, compiled and linked programs, etc.).

Two different 'queues', short and long. Short and long jobs have a different priority (nice) when running on the same machine.

Short jobs may run for no longer than an hour, but run at a higher priority.

BBS tries to balance the load of the BBS CPUs. 

  • P.Mazzanti
BBS

Presently the system consists of 16 dual-CPU servers (Linux Red Hat 9) and a single-CPU machine. 7 machines are from the ALICE experiment.

BBS machines belong to the large INFN WAN Pool; they may be accessed from outside when no BBS job is running, but become IMMEDIATELY available when a BBS job asks to be run.

Only short jobs will be accepted by the 7 ALICE machines if submitted by a non-ALICE group user.
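
The queue rules described for BBS can be sketched as follows. This is not BBS or Condor code: the `Job` and `Machine` types, the group names and the nice values are hypothetical; the slides only state that short jobs run at higher priority, are limited to one hour, and are the only jobs the ALICE machines accept from non-ALICE users.

```python
from dataclasses import dataclass
from typing import Optional

SHORT_LIMIT_SECONDS = 3600  # short jobs may run for no longer than an hour


@dataclass
class Job:
    user_group: str
    queue: str  # "short" or "long"


@dataclass
class Machine:
    name: str
    owner_group: Optional[str] = None  # "ALICE" for the 7 ALICE machines


def accepts(machine: Machine, job: Job) -> bool:
    """ALICE machines take only short jobs from non-ALICE users."""
    if machine.owner_group == "ALICE" and job.user_group != "ALICE":
        return job.queue == "short"
    return True


def nice_value(job: Job) -> int:
    """Hypothetical nice values: a lower value means a higher priority."""
    return 0 if job.queue == "short" else 10
```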


[Figures: aggregate BBS jobs, daily and weekly]


[Figures: load on boi1.bo.infn.it, daily and weekly]

INFN Site Farm: a new challenge
  • Some sites are planning to reconfigure and integrate computing facilities and local experiment-specific farms into a single computing farm
    • Reason: to avoid the growing deployment of many small, private farms for each experiment alongside the general-purpose computing facility
  • Introduction of SAN infrastructure to connect storage systems and computing units
    • The GFS file system is under evaluation as an efficient way of providing a cluster file system and volume manager
    • Interesting because it is part of the SL3 distribution
  • A lot of work is needed to design a mechanism that provides computing resources to different experiments dynamically, according to their needs
    • We can learn from the experience of the CNAF Tier1 and other labs

Hardware solutions for the Tier1 at CNAF

Luca dell’Agnello

Stefano Zani

(INFN – CNAF, Italy)

  • Luca dell’Agnello -Stefano Zani
Tier1
  • INFN computing facility for HEP community
    • Prototype phase ended last year; now fully operational
    • Location: INFN-CNAF, Bologna (Italy)
      • One of the main nodes on the GARR network
    • Personnel: ~ 10 FTEs
      • ~ 3 FTEs dedicated to experiments
  • Multi-experiment
    • LHC experiments (ALICE, ATLAS, CMS, LHCb), Virgo, CDF, BABAR, AMS, MAGIC, ...
    • Resources dynamically assigned to experiments according to their needs
  • 50% of the Italian resources for LCG
    • Participation in the experiments' data challenges
    • Integrated with the Italian Grid
    • Resources also accessible in the traditional way
Logistics
  • Moved to a new location (last January)
    • Hall in the basement (-2nd floor)
    • ~ 1000 m² of total space
      • Computing Nodes
      • Storage Devices
      • Electric Power System (UPS)
      • Cooling and Air conditioning system
      • Garr GPop
    • Easily accessible with lorries from the road
    • Not suitable for office use (remote control needed)

Electric Power

  • Electric Power Generator
    • 1250 kVA (~ 1000 kW) → up to 160 racks
  • Uninterruptible Power Supply (UPS)
    • Located in a separate room (conditioned and ventilated)
    • 800 kVA (~ 640 kW)
  • 380 V three-phase distributed to all racks (Blindo)
    • Rack power controls output 3 independent 220 V lines for computers
    • Rack power controls sustain loads up to 16 or 32 A
      • 32 A power controls needed for racks of 36 dual-processor Xeon servers
    • 3 APC power distribution modules (24 outlets each)
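
The power figures above can be cross-checked with some simple arithmetic. The sketch assumes that the generator capacity is shared evenly across 160 racks and that the 16 A and 32 A ratings apply to each of the three 220 V lines; neither assumption is stated on the slide.

```python
# Figures quoted on the slide.
GENERATOR_KVA, GENERATOR_KW = 1250, 1000
UPS_KVA, UPS_KW = 800, 640
MAX_RACKS = 160

# Implied power factor (real power / apparent power).
generator_pf = GENERATOR_KW / GENERATOR_KVA   # 0.8
ups_pf = UPS_KW / UPS_KVA                     # 0.8

# Assumption: generator power divided evenly over the maximum rack count.
kw_per_rack = GENERATOR_KW / MAX_RACKS        # 6.25 kW


def rack_feed_va(amps: int, lines: int = 3, volts: int = 220) -> int:
    """Apparent power available to one rack, assuming `amps` per line."""
    return lines * volts * amps


feed_16a = rack_feed_va(16)   # 10560 VA
feed_32a = rack_feed_va(32)   # 21120 VA
```

Under these assumptions both the generator and the UPS ratings imply a power factor of 0.8, and the average budget of 6.25 kW per rack sits well below what a 16 A rack feed can deliver.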

Cooling & Air Conditioning

  • RLS (Airwell) on the roof
    • ~ 700 kW
    • Water cooling
    • Needs a “booster pump” (20 m, T1 → roof)
    • Noise insulation
  • 1 Air Conditioning Unit (uses 20% of RLS cooling power and controls humidity)
  • 12 Local Cooling Systems (Hiross) in the computing room
WN typical Rack Composition
  • Power Controls (3U)
  • 1 network switch (1-2U)
    • 48 FE copper interfaces
    • 2 GE fiber uplinks
  • 34-36 1U WNs
    • Connected to network switch via FE
    • Connected to KVM system

Remote console control

  • Paragon UTM8 (Raritan)
    • 8 Analog (UTP/Fiber) output connections
    • Supports up to 32 daisy chains of 40 nodes (UKVMSPD modules needed)
    • Costs: 6 kEuro + 125 Euro/server (UKVMSPD module)
    • IP-Reach (expansion to support IP transport) evaluated but not used
  • Autoview 2000R (Avocent)
    • 1 Analog + 2 Digital (IP transport) output connections
    • Supports connections up to 16 nodes
      • Optional expansion to 16x8 nodes
    • Compatible with Paragon (“gateway” to IP)
Networking (1)
  • Main network infrastructure based on optical fibres (~ 20 km)
    • To ease adoption of new (high-performance) transmission technologies
    • To ensure better electrical insulation over long distances
    • Local (rack-wide) links with UTP (copper) cables
  • LAN has a “classical” star topology
    • GE core switch (Enterasys ER16)
    • New core switch (Black Diamond 10808) is in pre-production
      • 120 Gigabit fibre ports (scales up to 480 ports)
      • 12 10-Gigabit Ethernet ports (scales up to max 48 ports)
    • Farm up-links via GE trunk (channel) to core switch
    • Disk servers directly connected to GE switch (mainly fibre)
Networking (2)
  • WNs connected via FE to rack switch (1 switch per rack)
    • Not a single brand for switches (as for WNs)
      • 3 Extreme Summit 48 FE + 2 GE ports
      • 3 Cisco 3550 48 FE + 2 GE ports
      • 8 Enterasys 48 FE + 2 GE ports
      • 10 Summit400 switches 48 GE copper + 2 GE ports (2x10Gb ready)
    • Homogeneous characteristics
      • 48 copper Ethernet ports
      • Support of main standards (e.g. 802.1q)
      • 2 Gigabit up-links (optical fibres) to core switch
  • CNAF interconnected to GARR-G backbone at 1 Gbps.
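
As a rough illustration of what the rack design above implies, the sketch below computes the up-link oversubscription of a rack switch: the ratio between the total bandwidth of the worker-node ports and the bandwidth of the two Gigabit up-links. The function name is hypothetical, and the figure of 36 nodes is taken from the typical rack composition.

```python
FE_MBPS = 100    # Fast Ethernet port speed
GE_MBPS = 1000   # Gigabit Ethernet port speed


def oversubscription(n_hosts: int, host_mbps: int = FE_MBPS,
                     n_uplinks: int = 2, uplink_mbps: int = GE_MBPS) -> float:
    """Ratio of aggregate host bandwidth to aggregate up-link bandwidth."""
    return (n_hosts * host_mbps) / (n_uplinks * uplink_mbps)


print(oversubscription(36))   # 1.8 for a rack of 36 worker nodes
print(oversubscription(48))   # 2.4 if all 48 FE ports are in use
```

A ratio below about 2:1 means that even with every node transmitting at once, each still gets more than half of its Fast Ethernet line rate towards the core switch.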

Network Configuration

[Figure (S. Zani): diagram of the Tier1 network configuration. Labels in the diagram: GARR; T1; 1st Floor; Internal services; SSR8600; Catalyst3550; 1 Gb/s; farm switches FarmSW1, FarmSW2 (Dell), FarmSW3 (IBM), FarmSW4 (IBM3), FarmSW5 (3Com), FarmSW6 to FarmSW12, FarmSWG1, FarmSWG2; LHCBSW1; Babar SW; Disk Servers; NAS1 to NAS4; SAN; F.C. links; storage systems IBM FasT900, DELL, AXUS, Infortrend, STK; 131.154.99.121]

L2 Configuration
  • Each Experiment has its own VLAN
  • Solution adopted for complete granularity
    • Port based VLAN
    • VLAN identifiers are propagated across switches (802.1q)
    • Avoid recabling (or physical moving) of machines to change farm topology
  • Level 2 isolation of farms
  • Possibility to define multi-tag (Trunk) ports (for servers)
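
The port-based VLAN scheme above can be modelled in a few lines. This is a conceptual sketch, not switch configuration: the port numbers and VLAN names are hypothetical, and the switch names are borrowed from the network diagram purely for illustration.

```python
# Hypothetical port table: (switch, port) -> set of VLANs carried on that port.
port_vlans = {
    ("FarmSW1", 1): {"cms"},
    ("FarmSW1", 2): {"atlas"},
    ("FarmSW2", 1): {"cms"},
    ("FarmSWG1", 1): {"cms", "atlas"},  # multi-tag (trunk) port for a server
}


def same_l2_domain(port_a, port_b) -> bool:
    """Two ports can exchange frames only if they share at least one VLAN."""
    return bool(port_vlans[port_a] & port_vlans[port_b])


def move_to_farm(port, vlan: str) -> None:
    """Reassign an access port to another farm without touching any cable."""
    port_vlans[port] = {vlan}
```

Because VLAN identifiers are propagated across switches, the two `cms` ports on different switches are in the same layer 2 domain, while the `atlas` port on the same switch as a `cms` port is isolated from it until `move_to_farm` changes its membership.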

Power Switches

  • 2 models used at Tier1:
    • “Old” APC MasterSwitch Control Unit AP9224, controlling 3x8 outlets (9222 PDU) from 1 Ethernet port
    • “New” APC PDU Control Unit AP7951, controlling 24 outlets from 1 Ethernet port
  • “zero” Rack Unit (vertical mount)
  • Access to the configuration/control menu via serial/telnet/web/snmp
  • 1 Dedicated machine running APC Infrastruxure Manager Software (in progress)

Remote Power Distribution Unit

[Figure: screenshot of the APC Infrastruxure Manager software showing the status of all Tier1 PDUs]

Computing units
  • ~ 800 1U rack-mountable Intel dual processor servers
    • 800 MHz – 3.06 GHz
    • ~ 700 WNs (~ 1400 CPUs) available for LCG
  • Tendering:
    • HPC farm with MPI
      • Servers interconnected via Infiniband
    • Opteron farm (near future)
Storage Resources

~200 TB RAW Disk Space ON LINE.

  • NAS
    • NAS1+NAS4 (3Ware low cost) Tot 4.2 TB
    • NAS2+NAS3 (Procom) Tot 13.2 TB
  • SAN
    • Dell Powervault 660f Tot 7 TB
    • Axus (Brownie) Tot 2 TB
    • STK Bladestore Tot 9 TB
    • Infortrend ES A16F-R Tot 12 TB
    • IBM Fast-T 900 Tot 150 TB
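
As a quick consistency check, the per-system capacities listed above can be summed and compared with the quoted total of about 200 TB of raw disk space. The dictionary below simply restates the figures from the slide.

```python
# Raw capacities in TB, as listed on the slide.
raw_tb = {
    "NAS1+NAS4 (3Ware)": 4.2,
    "NAS2+NAS3 (Procom)": 13.2,
    "Dell Powervault 660f": 7,
    "Axus (Brownie)": 2,
    "STK Bladestore": 9,
    "Infortrend ES A16F-R": 12,
    "IBM Fast-T 900": 150,
}

nas_total = raw_tb["NAS1+NAS4 (3Ware)"] + raw_tb["NAS2+NAS3 (Procom)"]
total = sum(raw_tb.values())
san_total = total - nas_total

print(f"NAS: {nas_total:.1f} TB, SAN: {san_total:.1f} TB, total: {total:.1f} TB")
# NAS: 17.4 TB, SAN: 180.0 TB, total: 197.4 TB
```

The listed systems add up to 197.4 TB, consistent with the approximate 200 TB figure, with the IBM Fast-T 900 alone providing about three quarters of the total.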

STORAGE resource

[Figure: diagram of the Tier1 storage resources and their connection to the client side (WAN or TIER1 LAN). Components labelled in the diagram:]

  • STK180 with 100 LTO (10 TB native)
  • CASTOR server + staging
  • STK L5500 robot (max 5000), 6 LTO-2
  • RAIDTEC: 1800 GB, 2 SCSI interfaces
  • IDE NAS1, NAS4 (nas4.cnaf.infn.it): 1800 + 2000 GB; CDF, LHCb
  • PROCOM NAS2 (nas2.cnaf.infn.it): 8100 GB; VIRGO, ATLAS
  • PROCOM NAS3 (nas3.cnaf.infn.it): 4700 GB; ALICE, ATLAS
  • Gadzoox Slingshot FC switch, 18 ports
  • Fileserver CMS: diskserv-cms-1
  • Fileserver Fcds2: aliases diskserv-ams-1, diskserv-atlas-1
  • Infortrend ES A16F-R: 12 TB
  • AXUS Brownie: about 2200 GB, 2 FC interfaces
  • DELL PowerVault: 7100 GB, 2 FC interfaces
  • STK BladeStore: about 10000 GB, 4 FC interfaces
  • FAIL-OVER support

Storage management and access (1)

  • Tier1 storage resources accessible as classical storage or via grid
  • Non grid disk storage accessible via NFS
  • Generic WN’s also have AFS client
  • NFS volume mounts configured via autofs and LDAP
    • A single configuration repository eases maintenance
    • In progress: integration of the LDAP configuration with Tier1 db data
  • Scalability issues with NFS
    • Experienced stalled mount points
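
To illustrate how NFS mounts can be driven from a single LDAP repository, the sketch below renders automount entries into autofs indirect-map lines. The entry data, export paths and mount options are hypothetical. The attribute names `automountKey` and `automountInformation` follow one commonly used LDAP automount schema; the slides do not say which schema the Tier1 uses.

```python
# Hypothetical entries, as they might be read from an LDAP automount map.
entries = [
    {"automountKey": "virgo",
     "automountInformation": "-rw,hard,intr nas2.cnaf.infn.it:/export/virgo"},
    {"automountKey": "cms",
     "automountInformation": "-rw,hard,intr diskserv-cms-1:/export/cms"},
]


def render_map(entries):
    """Return autofs indirect-map lines of the form 'key options location'."""
    ordered = sorted(entries, key=lambda e: e["automountKey"])
    return [f"{e['automountKey']} {e['automountInformation']}" for e in ordered]


for line in render_map(entries):
    print(line)
```

Keeping the map in one repository means that adding or moving a volume is a single directory update, rather than an edit on every worker node.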

Storage management and access (2)

  • Part of disk storage used as front-end to CASTOR
    • Balance between disk and CASTOR according to the experiments' needs
  • 1 stager for each experiment (installation in progress)
  • CASTOR accessible either directly or via grid
    • CASTOR SE available
  • ALICE Data Challenge used CASTOR architecture
    • Feedback to CASTOR team
    • File restaging needs optimization

Tier1 Database

  • Resource database and management interface
    • Postgres database as back end
    • Web interface (apache+mod_ssl+php)
    • Hw servers characteristics
    • Sw servers configuration
    • Servers allocation
  • Possible direct access to db for some applications
    • Monitoring system
    • Nagios
  • Interface to configure switches and interoperate with installation system.

Installation issues

  • Centralized installation system
    • LCFG (EDG WP4)
    • Integration with a central Tier1 db
    • Moving a node from one farm to another implies only a change of IP address (not name)
    • Unique dhcp server for all VLANs
    • Support for DDNS (cr.cnaf.infn.it)
  • Investigating Quattor for future needs
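
The idea that a node keeps its name while its address follows the farm (VLAN) it is assigned to can be sketched as below. This is not LCFG, DHCP or DDNS code: the `Registry` class, the subnets and the host name are hypothetical and only model the bookkeeping a single DHCP server with dynamic DNS would perform.

```python
import ipaddress

# Hypothetical per-farm subnets, one per VLAN.
VLAN_SUBNETS = {
    "cms": ipaddress.ip_network("10.0.1.0/24"),
    "atlas": ipaddress.ip_network("10.0.2.0/24"),
}


class Registry:
    """Toy model of one address registry serving all VLANs."""

    def __init__(self):
        self.records = {}  # host name -> (vlan, ip address)

    def _next_free(self, vlan):
        used = {ip for v, ip in self.records.values() if v == vlan}
        for ip in VLAN_SUBNETS[vlan].hosts():
            if ip not in used:
                return ip
        raise RuntimeError(f"no free address in VLAN {vlan}")

    def place(self, name, vlan):
        """Assign the host to a farm; the name stays, the address changes."""
        ip = self._next_free(vlan)
        self.records[name] = (vlan, ip)
        return ip
```

Calling `place("wn-001", "cms")` and later `place("wn-001", "atlas")` leaves a single record for `wn-001`, now pointing at an address in the second subnet.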

Our Desired Solution for Resource Access

  • SHARED RESOURCES among all experiments
    • Priorities and reservations managed by the scheduler
  • Most of Tier1 computing machines installed as LCG Worker Nodes, with light modifications to support more VOs
  • Application Software not directly installed on WNs but accessed from outside (NFS, AFS, …)
  • One or more Resource Managers to manage all the WNs in a centralized way
  • Standard way to access Storage for each application