MonALISA capabilities for the LHCOPN. MonALISA Team: Iosif Legrand, Harvey Newman, Ramiro Voicu, Costin Grigoras, Ciprian Dobre, Alexandru Costan. USLHCNet Team: Harvey Newman, Artur Barczyk, Ramiro Voicu, Azher Mughal, Sandor Rozsa. LHCOPN meeting, March 2010, London.


MonALISA capabilities for the LHCOPN

MonALISA Team

Iosif Legrand, Harvey Newman, Ramiro Voicu, Costin Grigoras, Ciprian Dobre, Alexandru Costan

USLHCNet Team

Harvey Newman, Artur Barczyk, Ramiro Voicu, Azher Mughal, Sandor Rozsa

LHCOPN meeting March 2010 London


Outline

  • MonALISA Framework
    • Architecture
    • Data handling
    • Automatic actions
  • USLHCNet
    • Network topology
    • Monitoring modules
    • Reliable monitoring & accounting
    • Alarms & triggers
  • Conclusions


Ramiro Voicu LHCOPN London March 2010

The MonALISA Architecture

  • Regional or global high-level (HL) services, repositories and clients
  • Proxies: secure and reliable communication, dynamic load balancing, scalability and replication, AAA for clients
  • MonALISA services (agents): distributed system for gathering and analyzing information based on mobile agents, with customized aggregation, triggers and actions
  • Network of JINI lookup services (secure and public): distributed dynamic registration and discovery, based on a lease mechanism and remote events

Fully distributed system with no single point of failure
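
The lease mechanism behind registration and discovery can be illustrated with a toy registry in which a service stays discoverable only while it keeps renewing its lease. This is a minimal sketch, not the Jini API: `LeaseRegistry`, its methods and the lease duration are hypothetical names and values chosen for illustration.

```python
import time


class LeaseRegistry:
    """Toy lookup service: entries expire unless their lease is renewed."""

    def __init__(self, lease_seconds=30.0, clock=time.monotonic):
        self.lease_seconds = lease_seconds  # hypothetical default duration
        self.clock = clock                  # injectable so tests can fake time
        self._expiry = {}                   # service name -> expiry time

    def register(self, name):
        """Register a service (or renew its lease) and return the expiry time."""
        expiry = self.clock() + self.lease_seconds
        self._expiry[name] = expiry
        return expiry

    renew = register  # renewing is the same operation as registering

    def discover(self):
        """Return the names of services whose lease has not expired."""
        now = self.clock()
        # Drop expired entries so dead services disappear automatically.
        self._expiry = {n: t for n, t in self._expiry.items() if t > now}
        return sorted(self._expiry)
```

A service that stops renewing simply drops out of `discover()` once its lease runs out, so no explicit deregistration step is needed in this sketch.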

MonALISA Service & Data Handling

Diagram components:

  • Lookup Services: registration and discovery
  • Data Cache Service & DB, with a Postgres data store
  • Web Service (WSDL, SOAP) for WS clients and services
  • Data (via ML Proxy), predicates and agents for clients or higher-level services
  • Configuration control (SSL)
  • Agents and filters/triggers, with dynamic (re)loading
  • Monitoring modules: collect any type of information, push and pull
  • Applications

Local and Global Decision Framework

Two levels of decisions:
  • local (autonomous),
  • global (correlations).

Actions triggered by:
  • values above/below given thresholds,
  • absence/presence of values,
  • correlations between any values.

Action types:
  • alerts (emails, instant messages, Atom feeds),
  • running an external command,
  • automatic chart annotations in the repository,
  • running custom code, such as securely ordering an ML service to (re)start a site service.

Local decisions: an ML Service takes actions based on local information (traffic, jobs, hosts, apps; sensors for temperature, humidity, A/C power).

Global decisions: global ML Services take actions based on global information.
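
The threshold and absence conditions on this slide can be sketched as a small evaluator. This is an illustration only, not MonALISA's actions framework: the function name, its parameters and the sample numbers are assumptions.

```python
def evaluate_trigger(samples, now, low=None, high=None, max_age=None):
    """Return the list of reasons an action should fire, empty if none.

    samples: list of (timestamp, value) pairs, oldest first.
    low/high: optional thresholds.
    max_age: seconds after which the last value counts as absent.
    """
    reasons = []
    if not samples or (max_age is not None and now - samples[-1][0] > max_age):
        # Absence of values: nothing received, or the last value is too old.
        reasons.append("absent")
        return reasons
    value = samples[-1][1]
    if high is not None and value > high:
        reasons.append("above")
    if low is not None and value < low:
        reasons.append("below")
    return reasons
```

For example, a temperature sensor reading of 35.5 against a hypothetical limit of 30.0 yields `["above"]`, which a local ML Service could map to an alert or an external command.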

USLHCNet
  • USLHCNet provides transatlantic connections between the Tier1 computing facilities at Fermilab and Brookhaven and the Tier0 and Tier1 facilities at CERN, as well as Tier1s elsewhere in Europe and Asia.
  • Together with ESnet, Internet2 and GEANT, USLHCNet supports connections between the Tier2 centers.
  • The USLHCNet core infrastructure uses Ciena Core Director devices, which provide time-division multiplexing and packet-forwarding protocols that support virtual circuits with bandwidth guarantees. The virtual circuits make it possible to develop efficient data transfer services with support for QoS and priorities.
  • Hybrid network: uses both Ciena CD and Force10 routers
  • 6 transatlantic 10G links at the moment

USLHCnet ML weather map

Monitoring modules

We developed a set of monitoring modules for USLHCNet network devices:

  • Force10 (SNMP & sFlow)
    • Traffic per interface
    • sFlow traffic
    • Link status monitoring
  • Ciena Core Director (TL1, Transaction Language 1)
    • ETTP (Ethernet Termination Point) traffic
    • EFLOW (Ethernet Flow) traffic
    • OSRP (routing protocol) topology
    • VCG Provisioned / Available Bandwidth
    • Dynamic circuits inside the optical core of the network
  • Ping module/MLPing trigger which sends alarms in case of packet loss
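
Per-interface traffic is commonly derived from SNMP octet counters by differencing two polls. The arithmetic can be sketched as below, assuming counters of a known width that may wrap once between polls; the function and parameter names are hypothetical and this is not the actual MonALISA module code.

```python
def rate_bps(prev_octets, curr_octets, interval_s, counter_bits=64):
    """Average bit rate between two polls of an octet counter."""
    if interval_s <= 0:
        raise ValueError("interval_s must be positive")
    modulus = 1 << counter_bits
    # Modular subtraction handles a single counter wrap between polls.
    delta = (curr_octets - prev_octets) % modulus
    return delta * 8 / interval_s
```

With 32-bit counters on a 10G link a wrap can occur within seconds, which is one reason to prefer 64-bit counters where the device exposes them.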

USLHCnet monitoring

MonALISA services at GVA, AMS, NYC and CHI monitor the network devices using SNMP and TL1.

USLHCnet redundant monitoring

MonALISA services run at GVA, AMS, NYC and CHI.

Each circuit is monitored at both ends by at least two MonALISA services; the monitored data is aggregated by global filters in the repository.

Local and global filters
  • Based on the MonALISA actions framework, a set of triggers has been deployed inside the service to notify the USLHCNet network engineers by email, SMS and IM in case of problems.
  • The filters developed for the USLHCNet repository aggregate the redundant monitoring data (traffic and link status) collected from all the MonALISA services.
    • The link status is computed as a logical “AND” between both end points of a link. This also cross-checks the status reported by the hardware equipment.
  • We collect data in two repository instances, each with replicated database back-ends. These instances are load-balanced dynamically through DNS.
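
The link-status rule can be sketched as follows. The AND between the two end points comes from the slide; how the redundant reports for a single end point are combined is not stated, so this sketch assumes an end point counts as up if any MonALISA service reports it up. All names are hypothetical.

```python
def endpoint_up(reports):
    """Combine redundant reports for one end point.

    Assumption (not from the slides): the end point is up if at least
    one monitoring service reports it up; no reports means down.
    """
    return any(reports)


def link_up(reports_a, reports_b):
    """Link status as the logical AND of both end points."""
    return endpoint_up(reports_a) and endpoint_up(reports_b)
```

Under this assumption a single failed monitoring service does not mark a link as down, while a real failure seen at either end does.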

USLHCnet: Precise measurements for the Operational Status on the WAN Link
  • Operations & management assisted by agent-based software
  • Deployed on the new Ciena equipment used for network management

USLHCnet: ALL EFLOW traffic - last 2 months

USLHCnet: Accounting for Integrated Traffic

USLHCnet: Ciena alarms monitoring


Topology monitoring and discovery

Real Time Topology Discovery & Display: networks, routers, AS


Storage discovery in Alice

  • distance(IP, IP)
    • Same IP-class network
    • Common domain name
    • Same AS
    • Same country (+ function of RTT between the respective AS-es if known)
    • If distance between the AS-es is known, use it
    • Same continent
    • Far away
  • distance(IP, Set<IP>): Client's public IP to all known IPs for the storage
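
The tiered distance above can be sketched as an ordered series of checks. This is a simplified illustration, not the ALICE implementation: the `Host` record and the numeric tier values are hypothetical, "same IP-class network" is assumed to mean the same /24, the RTT and AS-distance refinements are left out, and `distance(IP, Set<IP>)` is assumed to take the minimum over the storage's known addresses.

```python
import ipaddress
from dataclasses import dataclass


@dataclass(frozen=True)
class Host:
    ip: str
    domain: str
    asn: int
    country: str
    continent: str


def distance(a, b):
    """Tiered distance between two hosts; smaller means closer.

    The order of the tiers follows the slide; the numeric values are
    arbitrary placeholders.
    """
    net_a = ipaddress.ip_network(f"{a.ip}/24", strict=False)
    if ipaddress.ip_address(b.ip) in net_a:
        return 0  # same network (assumed /24)
    if a.domain == b.domain:
        return 1  # common domain name
    if a.asn == b.asn:
        return 2  # same AS
    if a.country == b.country:
        return 3  # same country
    if a.continent == b.continent:
        return 4  # same continent
    return 5      # far away


def distance_to_storage(client, replicas):
    """distance(IP, Set<IP>): assumed here to be the minimum over replicas."""
    return min(distance(client, r) for r in replicas)
```

A client would then sort the available storage elements by `distance_to_storage` and try the closest one first.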

Regions shown: France, Nordic Countries, Italy, Russia, USA

C. Grigoras (Alice) – ACAT 2010


FDT Bandwidth tests in Alice (E2E avbw)

http://monalisa.cern.ch/FDT/

Plot annotations:

  • Newer kernel, tuned TCP buffers
  • 1 Gbps network card
  • Default kernels, default TCP buffers
  • Different trends = different kernels
  • 100 Mbps network card

Conclusions

http://monalisa.caltech.edu

http://repository.uslhcnet.org

  • The MonALISA framework provides a flexible and reliable monitoring infrastructure
    • 350+ installed services, 1.5M+ unique parameters, 25 kHz value updates
    • Truly distributed architecture with no single point of failure
    • Highly modular platform
    • Automatic decision-making capability at both local and global levels
  • USLHCNet provides a hybrid network with support for circuit-oriented network services
    • Monitoring this infrastructure proved to be a challenging task, but we are running with 99.5+% monitoring uptime (100% in the last 6 months)
    • We are investigating dynamic provisioning of circuits from collaborating agents

Monitoring Optical Switches

Dynamic restoration of the lightpath if a segment has problems

Controlling Optical Planes: Automatic Path Recovery

  • Sites and networks shown: CERN (Geneva), USLHCnet, Internet2, Starlight, Manlan, CALTECH (Pasadena)
  • FDT transfer (CERN – CALTECH): 200+ MBytes/sec from a 1U node
  • “Fiber cut” simulations: the traffic moves from one transatlantic line to the other one, and the FDT transfer continues uninterrupted
  • TCP fully recovers in ~20 s
  • 4 fiber cut emulations, marked 1 to 4
