Technical Details of Setting up a RAC and associated IAC's

SAR Workshop, April 18-19, 2003, Arlington, Texas

Lee Lueking, Fermilab Computing Division, CEPA Dept., DØ Liaison to PPDG, Batavia, Illinois

RAC Prototype: GridKa
  • Overview: Aachen, Bonn, Freiburg, Mainz, Munich, Wuppertal
    • Location: Forschungszentrum Karlsruhe (FZK)
    • Regional Grid development, data and computing center. Established: 2002
    • Serves 8 HEP experiments: Alice, Atlas, BaBar, CDF, CMS, Compass, DØ, and LHCb
  • Political Structure: Peter Mattig (Wuppertal) is the FNAL rep. to the Overview Board; C. Zeitnitz (Mainz) and D. Wicke (Wuppertal) are the Technical Advisory Board reps.
  • Status: Auto-caching Thumbnails since August
    • Certified w/ physics samples
    • Physics results for Winter conferences
    • Some MC production done there
    • Very effectively used by DØ in Jan and Feb.

[Placeholder: physics result plot from GridKa; the speaker ran out of time to include it]

  • Resource Overview: (summarized on next page)
    • Compute: 95 x dual PIII 1.2 GHz, 68 x dual Xeon 2.2 GHz; DØ requested 6%. (updates in April)
    • Storage: DØ has 5.2 TB of cache, plus use of a percentage of the ~100 TB MSS. (updates in April)
    • Network: 100 Mb connection available to users.
    • Configuration: SAM w/ shared disk cache, private network, firewall restrictions, OpenPBS, Red Hat 7.2, kernel 2.4.18, DØ software installed (a batch-submission sketch follows below).
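
Since the GridKa configuration above schedules worker nodes through OpenPBS, a DØ job would be submitted via a PBS script along the lines of this minimal sketch; the queue name, resource request, and wrapper script are hypothetical, not taken from the slides.

    #!/bin/bash
    # Hypothetical OpenPBS job script for a DØ job at a RAC; the queue name,
    # resource request, and run_d0_job.sh wrapper are illustrative placeholders.
    #PBS -N d0_reco_job
    #PBS -q dzero
    #PBS -l nodes=1:ppn=2,walltime=12:00:00
    #PBS -j oe

    cd "$PBS_O_WORKDIR"   # start in the directory the job was submitted from
    ./run_d0_job.sh       # wrapper that sets up the DØ software and SAM environment

The script would be submitted with "qsub d0_reco_job.pbs".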

Required Server Infrastructure

[Slide diagram: clients → CORBA interface → middle-tier DB server proxy / DAN server on Linux → SQL*Net interface → high-availability Oracle DB server with the central DB and a RAID array]
  • SAM-Grid (SAM + JIM) Gateway
  • Oracle database access servers (DAN)
  • Accommodate realities like:
    • Policies and culture for each center
    • Sharing with other organizations
    • Firewalls, private networks, et cetera
SAM Station: Dzero Distributed Cache Reconstruction Farm
  • Network
    • Each Stager Node accesses Enstore (MSS) directly
    • Worker nodes get data from stagers.
    • Intra-station data transfers are “cheap”
  • Job Dispatch
    • Fermi Batch System
    • A job runs on many nodes.
    • Goal is to distribute files evenly among workers
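
To make the "distribute files evenly" goal concrete, the sketch below assigns a file list round-robin across worker nodes. It is only an illustration of the idea, not SAM's actual placement logic; the worker names and files.txt are hypothetical.

    #!/bin/bash
    # Illustration only: round-robin assignment of files to worker-node caches
    # so the load stays roughly even. Not SAM's real algorithm.
    WORKERS=(worker1 worker2 worker3 worker4)       # hypothetical node names
    i=0
    while read -r f; do
      node=${WORKERS[$(( i % ${#WORKERS[@]} ))]}
      echo "$f -> $node"    # in practice the station would stage $f to $node's cache
      i=$(( i + 1 ))
    done < files.txt        # hypothetical list of files in the job's dataset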

[Slide diagram: Enstore mass storage feeding ~10 stager nodes, each running a SAM stager; the master node "D0bbin" runs the SAM station servers; a high-speed switch connects the stagers to worker nodes 1..N, each of which also runs a SAM stager]

SAM Stations: Dzero Central Analysis and Central Analysis Backend
  • Network
    • Access to Enstore is through D0mino
    • Intra-station file transfers “cheap” through a switch
  • Job Dispatch
    • LSF is used for the Central Analysis station (submission sketch below)
    • PBS is used for the Central Analysis Backend station
  • DØ Analysis Server "D0mino": SGI Origin 2000, 176 processors, 6 Gigabit NICs, 45 GB memory, 27 TB disk
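
For comparison with the PBS sketch earlier, a job on the LSF-managed Central Analysis station could be submitted roughly as below; the queue name and script are hypothetical, not from the slides.

    # Hypothetical LSF submission; the "d0_analysis" queue and run_analysis.sh are placeholders.
    bsub -q d0_analysis -o analysis.%J.log ./run_analysis.sh   # %J expands to the LSF job ID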

[Slide diagram: Enstore mass storage and a 12 TB SAM cache; a high-speed switch connects the SAM station servers and SAM stagers to compute servers 1..N, each running a SAM stager]

SAM Station: Shared Cache Configuration w/ PN (used at GridKa and U. Michigan NPACI)


  • Network
    • Gateway node has access to the internet
    • Worker nodes are on VPN
  • Job Dispatch
    • PBS or other local Batch System
    • Appropriate adapter for SAM
  • Software and Data Access
    • Common disk server is NFS mounted to Gateway and Worker nodes
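
A minimal sketch of the NFS arrangement described above, assuming hypothetical host names, paths, and subnet (the real GridKa/NPACI layout may differ):

    # On the RAID/disk server: export the shared area to the private network only.
    # (hypothetical path and subnet; add this line to /etc/exports, then re-export)
    #   /export/sam-cache  192.168.1.0/255.255.255.0(rw,sync,no_root_squash)
    exportfs -ra

    # On the gateway node and every worker node: mount the shared area.
    mkdir -p /sam-cache
    mount -t nfs raidserver:/export/sam-cache /sam-cache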

[Slide diagram: WAN → firewall → gateway node hosting the calibration DB servers, a local naming service (may be optional), the SAM station servers and stagers, and a RAID server; worker nodes 1..N sit behind the gateway on a virtual private network]

More Details
  • SAM is distributed to clients via the FNAL ups/upd product distribution and versioning tools (see the sketch after this list).
  • The gateway runs the SAM servers; it needs a special setup and a "sam" user account.
  • It also runs a GridFTP daemon for parallel transfers, which needs service certificates (KCA for FNAL transfers).
  • Experience with the SAM shared cache configuration:
    • It works well in environments where nodes are shared, but…
    • NFS and RAID server bottlenecks can appear, and it doesn't scale easily.
  • Calibration DB servers are caching proxies connected through the primary servers at FNAL to the central database. They are needed for RAW reconstruction.
  • The interface to the tape storage system is still customized for each site (GridKa had to do this).
  • A new "network file access" station feature, which allows access to files not in the local cache, is being tested at Lyon (CCIN2P3; e.g., IN2P3 uses rfio).
  • Making the D0 code distribution work caused some delays at GridKa
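
As a rough illustration of the ups/upd distribution mentioned above, installing and setting up the SAM product might look like the sketch below; the version string is a placeholder and exact upd options can vary by site (the last command is taken from the firewall slide).

    # Hypothetical example of pulling SAM with the FNAL ups/upd tools.
    upd install sam v5_1_0             # fetch the product (version is a placeholder)
    setup sam                          # put the SAM client/servers in the environment
    ups inquire sam_config -q ns_prd   # check which naming service this config points at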
Specific Ports for Firewalls
  • If the gateway node is behind a firewall, ports > 1024 need to be unrestricted to d0ora1.fnal.gov, d0ora3.fnal.gov, and d0mino.fnal.gov.
  • This is what is used for the nameservice:
    • <d0ora3> ups inquire sam_config -q ns_dev
      • SAM_NAMING_SERVICE=d0db-dev.fnal.gov:9000
    • <d0ora1> ups inquire sam_config -q ns_prd
      • SAM_NAMING_SERVICE=d0db.fnal.gov:9010
    • <d0ora1> ups inquire sam_config -q ns_int
      • SAM_NAMING_SERVICE=d0db-int.fnal.gov:9005
  • GridFTP uses port 4567 to all station hosts.
  • We do not force the optimizers to run on any particular port.
  • Additional ports are needed for JIM (Globus & Condor-G). Example firewall rules are sketched below.
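
A minimal sketch of firewall rules matching the requirements above, written as iptables commands; the exact policy, host list, and interfaces are site-specific, so treat this only as an illustration.

    #!/bin/sh
    # Illustrative iptables rules for a station behind a firewall (not a complete policy).
    # Unrestricted ports > 1024 to/from the named FNAL hosts:
    for h in d0ora1.fnal.gov d0ora3.fnal.gov d0mino.fnal.gov; do
      iptables -A OUTPUT -d "$h" -p tcp --dport 1025:65535 -j ACCEPT
      iptables -A INPUT  -s "$h" -p tcp --sport 1025:65535 -j ACCEPT
    done
    # Allow inbound GridFTP on port 4567 on the station hosts:
    iptables -A INPUT -p tcp --dport 4567 -j ACCEPT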
Data to and from Remote Sites: Data Forwarding and Routing
  • Station Configuration
  • Replica location
    • Stations can be configured to prefer or avoid particular locations
  • Forwarding
    • File stores can be forwarded through other stations
  • Routing
    • Routes for file transfers are configurable (a hypothetical configuration sketch follows)
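
The slides do not show the actual SAM station configuration syntax, so the fragment below is purely hypothetical; it only illustrates the kind of prefer/avoid/forward/route choices described above.

    # Purely hypothetical station_routing.cfg fragment; keywords and values are
    # invented to illustrate the prefer/avoid/forward/route ideas, not real SAM syntax.
    prefer   location=central-analysis                 # favor replicas cached at this station
    avoid    location=remote-mss                       # avoid pulling straight from a remote MSS
    route    destination=fnal-enstore  via=station-2   # forward stores through station 2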

[Slide diagram: SAM stations 1-4 and several remote SAM stations exchanging files; one remote station fronts an MSS, and transfers can be forwarded through intermediate stations]

Extra-domain transfers use bbftp or GridFTP (parallel transfer protocols).
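
For the GridFTP case, a parallel transfer between domains could be invoked roughly as below with globus-url-copy; the host name and paths are placeholders, and only port 4567 comes from the firewall slide.

    # Hypothetical parallel GridFTP pull into the local cache (4 parallel streams).
    globus-url-copy -p 4 \
      gsiftp://remote-station.example.org:4567/sam/cache/datafile.root \
      file:///sam-cache/datafile.root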