
Network Measurements Working Group

Network Measurements Working Group. Mark Leese, m.j.leese@dl.ac.uk. GGF17, Tokyo, May 11th 2006, Session 1: 13:45-15:15.

Presentation Transcript


  1. Network Measurements Working Group Mark Leese, m.j.leese@dl.ac.uk GGF17, Tokyo, May 11th 2006, Session 1: 13:45-15:15

  2. GGF IP Policy: Note Well • All statements related to the activities of the GGF and addressed to the GGF are subject to all provisions of Section 17 of GFD-C.1 which grants to the GGF and its participants certain licenses and rights in such statements. Such statements include verbal statements in GGF meetings, as well as written and electronic communications made at any time or place, which are addressed to: • the GGF plenary session, • any GGF working group, research group, or portion thereof, • the GFSG, or any member thereof on behalf of the GFSG, • the GFAC, or any member thereof on behalf of the GFAC, • any GGF mailing list, including any working group or research group list, or any other list functioning under GGF auspices, • the GFD Editor or the GWD process • Statements made outside of a GGF meeting, mailing list or other function, that are clearly not intended to be input to a GGF activity, group or function, are not subject to these provisions.

  3. Agenda • 13:45-15:15, Session 1: • NM-WG introduction and history (5’) • Motivation: Why do this? (15-20’) • V1 Schemas: where we started from (30’), inc. details of EGEE-JRA4 (v1 user) • V2 Schemas: where we are now (30’), inc. details of perfSONAR (v2 user) • Questions (5-10’) • 15:15-15:45, Coffee Break (very important :-) • 15:45-16:30, Session 2: • Discussion session: the NM-WG schemas are gaining popularity in the research and educational network domain within Europe and the US (e.g. with EGEE, DANTE and Internet2). What are the barriers to their adoption (or trials) in Asia? Richard failed to make it, so it’s 2.25 hours of me. How lucky you are :)

  4. NM-WG Introduction and History

  5. Introduction: Charter The performance of most grid applications is dependent on the performance of the networks forming the grid. The Network Measurements Working Group (NMWG) identifies network metrics (aka characteristics) useful to grid applications and middleware, and develops standard mechanisms to describe and publish these characteristics to the Grid. The NMWG focuses on characteristics of interest to grid applications and works in collaboration with other standards groups such as the IETF IPPM WG and the Internet2 End-to-end initiative. The NMWG will determine which of the network characteristics are relevant to Grid applications, and pursue standardization of the attributes required to describe these characteristics. The first product of the NMWG will be a document categorizing the characteristics in use by network monitoring tools. This document will establish a dictionary that can be used by tools to publish their results. The second product will be a document recommending the XML Schema to be used to publish the detailed attributes for these characteristics. In plain English: NM-WG will describe Grid relevant network metrics, and how these can be made available to middleware, network operators etc. NM-WG will not look at how measurements should be made.

  6. Introduction: Contact Details • Chairs: • Eric Boyd (Internet2), eboyd@internet2.edu • Richard Hughes-Jones (University of Manchester), R.Hughes-Jones@manchester.ac.uk • Mark Leese (Daresbury Laboratory), m.j.leese@dl.ac.uk • Website under re-construction: http://nmwg.internet2.edu • Mailing list: nm-wg@gridforum.org • To subscribe, send email to majordomo@gridforum.org, with a message of “subscribe nm-wg” • For general networking info + performance issues you can try: • Materials from Mark’s two Networks For Non-Networkers (NFNN) workshops: http://gridmon.dl.ac.uk/nfnn/

  7. History: Previous Work • “A Hierarchy of Network Performance Characteristics for Grid Applications and Services” published June 2004 (Recommendation, GFD.23, http://www.ggf.org/documents/GFD.23.pdf) • describes a set of network characteristics and their classification hierarchy (the slide illustrates an example characteristic, path.delay.oneWay, with its description and its place in the hierarchy) • used to create common schemas for describing network monitoring data • using a standard classification maximises data portability: your TCP achievable bandwidth is the same as someone else’s, even if the measurements were made differently (e.g. with different tools)

  8. Motivation: Why do this?

  9. Justification: Network Operations Network performance monitoring has traditionally been important to the operation of networks of any significant size: • fault detection • determining expected performance, e.g. (the slide shows a week-long throughput graph, with an upper bound ≈ 85 Mbps, a lower bound ≈ 35 Mbps, and Saturday marked)
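
A minimal sketch of how such expected-performance bounds could be derived from historical throughput samples; the percentile choice and the sample figures are illustrative assumptions, not taken from the slide:

    # Derive rough "expected performance" bounds from historical throughput
    # samples (Mbps). The 10th/90th percentile choice is illustrative.
    def expected_bounds(samples_mbps, low_pct=10, high_pct=90):
        ordered = sorted(samples_mbps)
        def pct(p):
            idx = min(len(ordered) - 1, int(round(p / 100.0 * (len(ordered) - 1))))
            return ordered[idx]
        return pct(low_pct), pct(high_pct)

    history = [38, 42, 55, 61, 70, 72, 77, 80, 83, 85]   # e.g. daily iperf results
    lower, upper = expected_bounds(history)
    print("expected range: %d-%d Mbps" % (lower, upper))  # flag new results outside this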

  10. Justification: Network Operations • Fault Detection (the slide shows a topology diagram with the dl, liv, ncl and man sites, the gw-nnw.core.netnw.net.uk and manchester-bar.ja.net routers and the backbone, plus throughput graphs for dl to man and dl to ncl): • 2002: DL’s performance to most sites is poor • But not man.ac.uk • Topology tells us likely location of fault • Compare different paths to divide and conquer • In this case, all paths to NCL not using Manchester BAR are ok • = router misconfiguration • Trivial example, IF you own all the data • Use cases exist for GOCs applying these principles to investigate multi-site Grid performance. It’s not about shifting responsibility, just adding to the GOC’s toolkit.
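
The divide-and-conquer idea sketched in Python, assuming you hold a measurement for every path and know which routers each path traverses; the path names, router names and threshold here are invented for illustration:

    # Toy illustration: paths sharing the faulty element perform badly,
    # paths avoiding it are fine; intersect the bad paths, subtract the good.
    paths = {                       # hypothetical paths: routers traversed + measured rate
        "A->B": {"via": {"r1", "r2"},       "mbps": 80},
        "A->C": {"via": {"r1", "r2", "r3"}, "mbps": 3},
        "A->D": {"via": {"r1", "r4"},       "mbps": 75},
    }
    HEALTHY_MBPS = 30
    bad  = [p["via"] for p in paths.values() if p["mbps"] <  HEALTHY_MBPS]
    good = [p["via"] for p in paths.values() if p["mbps"] >= HEALTHY_MBPS]
    suspects = set.intersection(*bad) if bad else set()
    for g in good:
        suspects -= g               # anything on a healthy path is exonerated
    print("likely fault location(s):", suspects or "inconclusive")   # -> {'r3'}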

  11. Justification: One Grid Use Case • Replica Selection (simplified example; the slide diagram shows a Grid App, Replica Selection Service, Replica Catalogue and Net Mon Service: 1. LFN, 2. multiple locations (PFNs), 3. get performance data/predictions, 4. selected replica (PFN), 5. GridFTP commands) • File replication = proven technique for improving data access • Distribute multiple copies of the same file across a Grid • A file has a Logical File Name (LFN) which maps to ≥ 1 Physical File Names (PFNs) • Replica(tion) Manager responsible for replication issues, such as: • maintaining the mapping, LFN → PFNs • deciding which replicas should exist, and where (e.g. based on recent use) • Replica Selection Service uses network performance data (from somewhere) to find the “best” replica • Similar principle can be applied to selecting CEs and SEs for jobs
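
A minimal sketch of the selection step, assuming the Replica Selection Service already holds an achievable-bandwidth figure (obtained from some network monitoring service) for each candidate PFN; the hostnames and figures are invented:

    def select_replica(pfns, bandwidth_mbps):
        """Pick the physical replica with the highest achievable bandwidth
        towards the requesting site; fall back to the first PFN if no data."""
        return max(pfns, key=lambda p: bandwidth_mbps.get(p, 0))

    pfns = ["gsiftp://se1.example.org/data/f1", "gsiftp://se2.example.org/data/f1"]
    measured = {"gsiftp://se1.example.org/data/f1": 220,    # Mbps, from monitoring
                "gsiftp://se2.example.org/data/f1": 640}
    print(select_replica(pfns, measured))   # -> the se2 copy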

  12. Grid Net Perf Mon: Common Myths (1) Network performance monitoring has been around for a long time (as long as networks themselves). It is well understood, so there is no need for this fuss over the Grid. The Grid is a special case: • As the last slide showed, Grid middleware and applications could use network data to optimise their performance, adapting to changing network conditions, i.e. the data now also has non-human (software) consumers • We’re talking about moving and sharing datasets, the sizes of which we haven’t seen before. Data intensive applications (e.g. Large Hadron Collider @ CERN (HEP) in the PetaByte range) need networks debugged for efficiency • The Grid as “utility computing” needs measurable SLAs

  13. Grid Net Perf Mon: Common Myths (2) Q: Okay, so why don’t we just throw some more bandwidth at the problem? Upgrade the links. A: Bandwidth is bad for you. It’s like a narcotic… • It’s very addictive. You start off with a little, but that’s not really doing it for you; it’s not enough. You increase the dose, but it’s never as good as you thought it would be. • By analogy you can keep buying more and more bandwidth to make your network faster but it's never quite as good as you thought it would be • Why? Because simple over-provisioning is not sufficient • Doesn’t address the key issue of end-to-end performance: • Network backbone in most cases is genuinely not the source of the problem • Last mile (campus network → end-user system → your application) often the cause of the problem: firewall, network wiring, hard disc, application and many more potential culprits

  14. Grid Net Perf Mon: Common Myths (3) Q: Okay, so why don’t we use dedicated optical fibre everywhere? A: Costs are still prohibitive. LHC will have 19 Tier-2 sites in the UK. Q: Kamai. I will win this argument :-) What if we share existing fibre, and use circuit-switched lightpaths? That is dedicated bandwidth, but without the cost of dedicated fibre. A: Good idea in theory, and we can see the benefits of a fibre infrastructure like UKLight via the ESLEA* project, but this still doesn’t address the end-to-end issue. Take a real-life ESLEA example (thanks to ESLEA for the figures)… • UCL (London) wanted to transfer data from FermiLab (Chicago) for analysis, before returning the results • datasets were 1-50TB • 50TB would take > 6 months on the production network, or one week at 700Mbps • 1 Gbps circuit-switched light path provisioned as a result • Result = disc-to-disc transfers @ 250Mbps, just 1/4 of theoretical network maximum • Tests revealed an end-site problem (* Exploitation of Switched Lightpaths for e-Science Applications)
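
The slide's figures are easy to check; a small sketch of the arithmetic (decimal terabytes, protocol overhead ignored):

    def transfer_days(size_tb, rate_mbps):
        bits = size_tb * 1e12 * 8                 # decimal terabytes -> bits
        return bits / (rate_mbps * 1e6) / 86400   # seconds -> days

    print(round(transfer_days(50, 700), 1))   # ~6.6 days: "one week at 700 Mbps"
    print(round(transfer_days(50, 250), 1))   # ~18.5 days at the observed 250 Mbps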

  15. Questions?

  16. V1 Schemas: Where we started from Don’t intend to say too much about this, just introduce the work and talk about a user of the V1 schemas. It’s what we can do with the schemas that’s the important aspect, not the schemas themselves.

  17. V1 Schemas: Where We Started From (the slide diagram shows a client sending a measurement data request (request schema) to a Network Monitoring Service, which returns results (response/publication schema)) • After the Hierarchy Document work, the logical next step was to produce a standardised means of publishing and sharing our standardised network metrics. Similarly, an XML schema seemed the logical choice. • A schema for publishing network measurements first appeared in June 2003… • …but that schema was lonely, so in October 2003 we agreed to give it a friend: a unified way of requesting network measurement data (whether this refers to historic data or requires the running of new tests) • Work began after GGF9 with input from: DANTE, Daresbury, GATech, Internet2, LBL, NCSA and UCL • An internal draft (1a) appeared in January 2004 • The schemas have evolved a lot since then, but are being superseded by V2 schemas (more later) • Yet they have served a valuable purpose: selling the idea of accessing network performance data from multiple administrative network domains using the same method

  18. Why Do This? Quite simply to give software access to measurement information: • Grid middleware and applications need access, as we’ve seen • And so do network operators… • When gathering performance information from equipment or test boxes within your network, there are many options • SNMP • have a machine publish into a database etc. etc. • You can build yourself an efficient, robust and scalable architecture without going anywhere near XML or Web Services • But what happens when you want to share your data with other operators? You know they will have different databases/tools/ways of working. • Sharing data and requesting tests in multiple domains would allow you to perform previously impossible tasks like partial path analysis in the international networks that are often used for science and research

  19. V1 Schemas: What They Supported • Request schema had four parts defining the what, where, when, how of the data you wanted or the measurement you wanted made • Response schema could similarly express the what, where, when, how of the results • Two schemas, to do everything • less to maintain • less to worry about coding for validating etc. • Summary of the basics…. What: • request particular network metric or statistical data with a specified sample interval, e.g. daily averages for one-way delay over the last month Where: • specify source and destination as IPv4|6, hostnames, or textual names such as “core router” When: • Two ways to specify times: start and end, target time with +/- time tolerance How: • supply values to act as parameters for tests, or filters for querying past data, e.g. request 20 second iperf test, or only interested in historic ping data for tests using 1000 byte packets • request the reporting of the actual parameter values used in tests (if known) • We take forward as much as possible in V2
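
The four parts are easier to see against a concrete example. This is not the actual V1 XML, just the same what/where/when/how information sketched as a Python structure with invented field names:

    # Hypothetical illustration of the what/where/when/how of a V1-style request;
    # the field names are made up for readability, not taken from the schema.
    request = {
        "what":  {"characteristic": "path.delay.roundTrip", "statistic": "mean",
                  "interval": "daily"},
        "where": {"source": "host-a.example.org", "destination": "host-b.example.org"},
        "when":  {"start": "2006-04-01T00:00:00Z", "end": "2006-05-01T00:00:00Z"},
        "how":   {"packetSize": 1000, "reportActualParameters": True},
    }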

  20. V1 Schemas: Lessons Learned To be documented by Richard Hughes-Jones in a GGF Experimental Doc, and taken forward into V2: • The one size fits all solution by its very nature allowed you to do invalid things, e.g. specify TCP Buffer Size for a ping (non-TCP) test – although these could be ignored. • Business logic is important. The overall aim of the request schema is for compliant requestors to drive predictable behaviour from compliant responders. The requirements must therefore define a set of rules following: “IF you receive X, THEN do Y”. This is business logic, just like rules can be associated with a database to encode business policies (e.g. auto-generate reminder letter for unpaid gas bill). It’s important as it often encompasses “rules” that cannot be explicitly expressed within the schema. • No standardised way of returning faults, without which faults can't be passed between and understood by different NM-WG compliant Web Services • No support for discovery (where are the monitoring points, where are the monitoring infrastructures?) and capability discovery (what tests can you run, what type of data do you have?) is a problem • The request schema allows you to specify multiple requests in each instance document (each message based on the schema that you send). Since the requests:responses relationship is M:N, it becomes unclear which report belongs to which request. Either force just one request to be sent at a time or provide message tagging. • Units as strings is a bad idea because it's too flexible. People can specify whatever they want (Mbps, Mbit/s etc.) making it difficult to interpret the results and unreliable to convert between them (e.g. bps to Mbps). We planned to use an appropriate units standard but it didn't happen.

  21. EGEE JRA4 • European Grid project, the successor to European Data Grid • EGEE completed on 31st March 2006. EGEE-II has now begun (not covered here) • EGEE JRA4 (Joint Research Activity) = group responsible for “Development of Network Services”, inc. Network Performance Monitoring (NPM) • Some work continues in EGEE-II in Service Activity SA1 (European Grid Operations, Support and Management) • Various monitoring tools and frameworks have been produced: • End-to-end: e.g. EDG::WP7 (now e2emonit) • Backbone: e.g. Internet2 and DANTE’s perfSONAR collaboration • JRA4 have not built another one! The work was about standardising access to NPM data across multiple domains and using it. • NM-WG v1 schemas (available at the time) were the selected basis for standardisation • So what has been produced?: • Mediator (standardise access to NPM data) • Diagnostic tool (present the data to NOCs and GOCs) • Publisher (present the data to Grid middleware)

  22. NPM Architecture (1) (the slide diagram shows a GOC/NOC Diagnostic Client and other clients speaking NM-WG directly to each monitoring framework: end-site frameworks such as a home grown one and e2emonit, and backbone frameworks such as perfmonit and piPEs)

  23. NPM Architecture (2) (the same diagram, but with the JRA4 NPM Mediator now sitting between the NM-WG clients and the monitoring frameworks)

  24. NPM Architecture (3): “Mediator” (the slide diagram shows a Client Application talking to the NPM Mediator Web Service, which contains a Discoverer, an Aggregator and a Response Cache, and which in turn talks to the Web Services of Network Monitoring Infrastructures and Network Monitoring Points) • So the Mediator provides uniform access to data from heterogeneous monitoring frameworks • Human & machine users interact with the Mediator via client applications which “speak” NM-WG • Discoverer locates MP(s) or infrastructures that can answer the client’s query • Aggregator • obtains query results from MP(s) • aggregates results (if necessary) – this functionality was not implemented, although basic feasibility was demonstrated • To improve performance and reduce loading, results of recent requests will be cached
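
A rough sketch of the Mediator's request path as described above (check the cache, discover who can answer, query, aggregate); the class and method names are invented for illustration and are not the JRA4 code:

    # Hypothetical outline of the Mediator flow: cache -> discover -> query -> aggregate.
    class Mediator:
        def __init__(self, discoverer, cache):
            self.discoverer = discoverer   # knows which MPs/infrastructures hold what
            self.cache = cache             # dict of recent responses, to reduce load

        def handle(self, nmwg_request):
            key = repr(sorted(nmwg_request.items()))   # request assumed dict-like here
            if key in self.cache:
                return self.cache[key]
            points = self.discoverer.locate(nmwg_request)      # which MPs can answer?
            results = [mp.query(nmwg_request) for mp in points]
            response = self.aggregate(results)                 # basic merge only
            self.cache[key] = response
            return response

        def aggregate(self, results):
            merged = []
            for r in results:
                merged.extend(r)
            return merged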

  25. Discovery • When interfacing to monitoring infrastructures, you need to know what paths are tested and for what metrics (knowing the tools and parameters is a bonus) • JRA4 took some steps in this direction: • The Mediator is statically configured with the list of available test paths (so the Mediator knows the monitored routes). This is requested by the DT when it displays the list to users, so from the DT’s viewpoint, its list is dynamic. BUT JRA4 had to define a custom "discovery" schema for this. • The DT makes Capability Discovery requests to the Mediator, which passes the request through to the PerfSONAR TL or NMWG4RGMA CapDiscovery interfaces. • PerfSONAR TL and NMWG4RGMA are statically configured with their capability information, so they know their own capabilities (not a problem for infrastructures with a fixed set of tools testing a fixed set of paths). Again, as far as the DT is concerned, the information it gets is dynamic BUT another custom schema was required.

  26. Mediator & Diagnostic Tool (DT) • 1st Mediator prototype (deliverable DJRA4.2) was produced in PM9 (Dec ‘04): • Proved we could harness (multi-domain) backbone and end-site tools together • Improved in subsequent iterations • For more info see: https://edms.cern.ch/file/695235/1/EGEE-DJRA4.7-695235-v1-1.pdf • DT provides Web interface access to any network data accessible via the Mediator, i.e. any data that the Mediator can access via the unified NM-WG data – a lot :) • Aimed at NOCs and GOCs • JSPs (Java Server Pages) obtaining data via the Mediator Web Service • Must have a valid X.509 certificate to gain access • https://edms.cern.ch/file/653967/1/EGEE-JRA4-TEC-653967-DTUserGuide-v1-3.pdf • Demonstrated at GGF15 and 4th EGEE conference (October 2005) graphing data from Abilene, ESNet, GÉANT2 and e2emonit (JRA4 end-to-end monitoring infrastructure). • DT can access lots of data (EGEE, DANTE etc.) but must do so through a Web Services interface – not very efficient for graphing. • The Gridmon Web interface can access data more natively using a simple TCP connection, or a DB interface such as PerlDBI, but is Gridmon/UK only • Different approaches to deployment & dissemination throughout the World. People not necessarily recreating the wheel. Need to see what’s best for different cases. Mark Leese - Daresbury Laboratory

  27. Diagnostic Tool

  28. Publisher (1) • gLite = lightweight middleware coming out of the EGEE project, which includes Resource Brokering Middleware (RBMW)… • Workload Management System (WMS) - performs job scheduling • Discovers CEs that match a user’s requirements, and according to those requirements identifies the most appropriate candidates for running the user’s job. Could do the same for SEs in the future. • Data Scheduler (DS) - manages data transfers between SEs • User specifies LFN to be copied over the network • LFNs translated into Storage URLs (SURLs) • DS schedules the movement of the files via an underlying mechanism like GridFTP • The key metric (unsurprisingly) is achievable bandwidth, applied to three key scenarios… • Data fetch: • As in our earlier Grid use case, a dataset is fully/partially replicated in multiple locations. • In simple cases (fetching from a single source) the best location from which to fetch data is ordinarily the source with the highest source → sink achievable bandwidth. • It may however be possible to download parts of the file in parallel from multiple sources. If the size of each source’s data is identical, the required transfer time = the time required to download data from the slowest source (source with the least achievable bandwidth).
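
A small sketch of the two fetch decisions described here: picking the single best source, and estimating an equal-split parallel fetch, where the slowest source dominates; hostnames and figures are invented:

    def best_single_source(sources_mbps):
        return max(sources_mbps, key=sources_mbps.get)

    def parallel_fetch_hours(size_gb, sources_mbps):
        """Equal share per source: total time is set by the slowest source."""
        share_bits = size_gb * 8e9 / len(sources_mbps)
        return max(share_bits / (bw * 1e6) for bw in sources_mbps.values()) / 3600

    sources = {"se1.example.org": 400, "se2.example.org": 90}  # achievable bandwidth, Mbps
    print(best_single_source(sources))                  # -> se1.example.org
    print(round(parallel_fetch_hours(100, sources), 2))  # slowest (90 Mbps) dominates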

  29. Publisher (2) • Point-to-point data transfer: • A data source (e.g. a known SE) is instructed to transfer data to a particular sink. • The nature of how inter-networks are established and maintained (cost, reliability, politics...) can cause a route other than the default to offer better throughput and hence a shorter transfer time • But we can apply graph theory: a Minimax Path algorithm will find us the path between two points in a set that has the least cost. • Representing the network as a graph and using achievable bandwidth as the cost of the edges, Minimax Path can find us the path between the src and dest with the highest bandwidth, and hence the least cost in transferring the data. • Distribute single file or dataset to a set of nodes: • Represents a 1:N transfer, or reliable multicast. • A Minimum Spanning Tree (MST) algorithm finds the collection of edges that join together all points in a set with the minimum possible sum of edge values • In this case it finds the collection of network paths that join together all the destinations in a multicast, with the maximum total achievable bandwidth (thus giving the minimum transfer time) • For more info: Bridging Network Monitoring and the Grid, EGEE paper presented at CESNET 2006
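
A sketch of the Minimax/widest-path idea using a Dijkstra variant: treat achievable bandwidth as the edge value and maximise the bottleneck (minimum edge) along the path; the toy graph and figures are invented:

    import heapq

    def widest_path(graph, src, dst):
        """graph: {node: {neighbour: achievable_bandwidth_mbps}}.
        Returns (bottleneck_mbps, path) maximising the minimum edge bandwidth."""
        best, prev = {src: float("inf")}, {}
        heap = [(-float("inf"), src)]                 # max-heap on the bottleneck so far
        while heap:
            bw, node = heapq.heappop(heap)
            bw = -bw
            if node == dst:
                break
            if bw < best.get(node, 0):
                continue                              # stale entry
            for nbr, edge_bw in graph.get(node, {}).items():
                bottleneck = min(bw, edge_bw)
                if bottleneck > best.get(nbr, 0):
                    best[nbr], prev[nbr] = bottleneck, node
                    heapq.heappush(heap, (-bottleneck, nbr))
        if dst not in best:
            return 0, []
        path = [dst]
        while path[-1] != src:
            path.append(prev[path[-1]])
        return best[dst], path[::-1]

    net = {"src": {"r1": 600, "r2": 300}, "r1": {"dst": 200}, "r2": {"dst": 280}}
    print(widest_path(net, "src", "dst"))   # -> (280, ['src', 'r2', 'dst'])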

  30. Publisher (3) (the slide diagram shows, at Site X and Site Y, the RBMW (WMS/DS) reading a local Measurement DB with SQL queries; an NPM Publisher at each site fills that DB with SQL inserts, polling the NM-WG v1 NPM Mediator regularly, which in turn issues on-demand NM-WG queries to NM-WG compliant monitoring infrastructures) • RBMW requested a response within 0.2s of receiving a NPM request containing up to 100 element pairs (e.g. CE-SE) • Web Services preferred, if possible in relation to above • Mediator satisfies one request at a time, with one path per request • And unless data is already cached, each request requires communication with a monitoring infrastructure • So WS is not going to do 100 pairs in ≤ 0.2s • Speed/efficiency issue solved by caching data in relational DB • RBMW reads directly from DB • Only last 24 hours data of interest • Publisher performs regular polling for data, cleaning of DB etc.
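
A sketch of the caching idea using sqlite3; the table layout, column names and 24-hour retention query are illustrative assumptions, not the JRA4 schema:

    import sqlite3, time

    db = sqlite3.connect(":memory:")
    db.execute("""CREATE TABLE measurement (
        src TEXT, dst TEXT, metric TEXT, value REAL, ts INTEGER)""")

    # Publisher side: poll the Mediator and insert fresh results (values invented here).
    now = int(time.time())
    db.execute("INSERT INTO measurement VALUES (?,?,?,?,?)",
               ("ce01.example.org", "se02.example.org", "achievable.bandwidth", 610.0, now))

    # RBMW side: fast local read; only the last 24 hours are of interest.
    day_ago = now - 24 * 3600
    rows = db.execute("""SELECT value FROM measurement
                         WHERE src=? AND dst=? AND metric=? AND ts>=?""",
                      ("ce01.example.org", "se02.example.org",
                       "achievable.bandwidth", day_ago)).fetchall()
    print(rows)

    # Publisher housekeeping: drop anything older than 24 hours.
    db.execute("DELETE FROM measurement WHERE ts < ?", (day_ago,))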

  31. Publisher (4) (as before, the slide diagram shows the per-site RBMW (WMS/DS), Measurement DB and NPM Publisher polling the NM-WG v1 NPM Mediator, which talks to NM-WG compliant monitoring infrastructures; a Network Cost Function takes the file src & dest and file size and returns an estimated transfer time) • Each publisher sends regular requests for just two metrics on pre-configured paths as a proof of concept • achievable bandwidth, path loss • The RBMW identifies the subject of a query (a network path) in terms of the CE/SEs of interest. A DB table maps these to the Monitoring Points which make the relevant measurements, e.g. ce01.cern.ch → nmp.cern.ch • The Publisher creates the table from a static config file • RBMW can access any DB. We illustrated one per site since a local DB provides convenience and fast access • The existence of many Publishers is not a problem since the Mediator caches most recent queries, reducing load on itself and underlying monitoring infrastructures • A RBMW group wanted raw data (no statistics), mainly to experiment with different network cost functions…
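
A sketch of the two pieces described here, the static CE/SE-to-monitoring-point mapping and a naive network cost function returning an estimated transfer time; the hostnames, figures and the bandwidth lookup are placeholders:

    # Static mapping from CE/SEs to the monitoring points that measure for them
    # (in JRA4 this came from a config file); hostnames here are invented.
    MP_FOR = {"ce01.example.org": "nmp1.example.org",
              "se02.example.org": "nmp2.example.org"}

    def achievable_bandwidth_mbps(src_mp, dst_mp):
        # Placeholder: in reality this would be a query against the
        # measurement DB kept up to date by the Publisher.
        return 610.0

    def estimated_transfer_seconds(src_host, dst_host, file_size_gb):
        """Naive network cost function: size / most recent achievable bandwidth."""
        bw = achievable_bandwidth_mbps(MP_FOR[src_host], MP_FOR[dst_host])
        return file_size_gb * 8e9 / (bw * 1e6)

    print(round(estimated_transfer_seconds("ce01.example.org", "se02.example.org", 20), 1))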

  32. Why Web Services? • Why not? • Can create a common, implementation independent way for systems to communicate. • Speed can be an issue, e.g. JRA4 Publisher • Could the JRA4 Mediator be threaded? • What happens as Web Services become more mature? • Perhaps if you are building an integrated Grid aware network fabric you use something simpler to link your modules (e.g. performance prediction service interrogates BAR services). • So far for us, the advantages outweigh the speed issue.

  33. Questions?

  34. V2 Schemas: Where we are now Again, more on what people are doing with it than on the schemas themselves. Martin Swany, swany@cis.udel.edu Jason Zurawski, zurawski@eecis.udel.edu Dan Gunter, dkgunter@lbl.gov Have put a lot of work into this!

  35. Why a V2? • The main goal is to separate… • the rapidly changing material – the data – measurement results: actual values + associated timestamps, from • the relatively constant material – the metadata – src and dst of tests, test parameters etc. • Why? Efficiency – metadata can be transmitted, processed, stored separately from its evil twin, the constantly changing data • Single request and response schemas can’t facilitate this • Secondary goal is to do more than the request-response model, via reusability • e.g. subscribe to event notifications and report those events • The V1 schemas are only good for what they were made for: request-response. • While the V1 schemas share “modules” of common XML, they are not particularly extensible, nor do they offer re-usable definitions that can be easily imported into or plagiarised by other schemas • With a fixed definition as a request and response they cannot be used as the basic data in higher level (more intelligent) NM-WG XML protocols such as notification systems • In response, all V2 schemas will define one of two, simple, generic types: • Message • Store

  36. So What Changed? • All schema instances are either: • a store: stationary entity for storing measurement data • a message: transient request or response that is to be exchanged • Both consist of either or both of: • Data: • Datum: an item of factual information derived from measurement or research; in our case it is associated with a particular point in time • Time • Metadata: • Subject: the measured/tested entity • Event Type: the type of measurement, value or event which occurred (already you can see that this extends beyond the request-response model) • Parameters: how, or under what conditions, did this event occur?

  37. Instance Structure (the slide diagram shows a Metadata block with <id>MarksID</id>, a Data block with <metadataIdRef>MarksID</metadataIdRef>, and a second Metadata block with <id>JasonsID</id> and <metadataIdRef>MarksID</metadataIdRef>) • An instance of Data refers to an instance of Metadata • An instance of Metadata can also refer to another instance of Metadata (useful when chaining operations, e.g. give me raw data, and then the mean of that set)

  38. Your Flexible, Extensible Friend • Our previous representations of measurements have been broken down into their atomic parts • The base schema is just that, a base. It is really just a framework that must be extended to be of any use. • The atomic parts can be “plugged into” the framework (and new ones added) in varying combinations to produce schemas tailored to their exact purpose (no longer a one size fits all solution) • Because of the flexible schema framework, the fine granularity of atomic elements, and the readily re-usable components (e.g. definitions of time and topology), schemas can be easily defined for particular tools or the metrics which they measure. The results of a ping test for example could be recorded in an instance of a ping schema, or instances of RTT and packet loss schemas. • Providing both increases the potential usage of the data, e.g. if your software understands TCP achievable bandwidth, it can accept and process data from iperf, pathchar etc. providing it is expressed as an instance of the TCP achievable bandwidth schema

  39. Namespaces • In the example (two slides ahead) you’ll see that the elements don’t all share the same namespace • The specific structures of the Data and Metadata elements are dependent on the measurement they record, the event they represent etc. • Because of this, we can change the namespace without upsetting anything, encoding the event type in the namespace of the elements. • Why would we want to do this? • Some software components could pass-through Data and Metadata elements without understanding their specific structure, leaving validation to software which understands those elements (because it recognises the namespace) • This also allows an implementation to decide whether it supports specific varieties of element. In the example, does the receiving system support the http://ggf.org/ns/nmwg/characteristic/delay/roundTrip variety of “parameters” or “packetType”? • Individuals or organisations can create their own independent extensions to the schemas without central co-ordination or vetting by simply placing their schema in a unique namespace, e.g. http://yourorg.org/ns/measurements/yourmetric
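
A small sketch of namespace-based dispatch: look at each element's namespace URI and decide whether it is understood or should just be passed through. The round-trip delay namespace is taken from the example two slides ahead; the handler table is illustrative:

    import xml.etree.ElementTree as ET

    HANDLERS = {
        # namespace URI -> function that understands that variety of element
        "http://ggf.org/ns/nmwg/characteristic/delay/roundTrip/": lambda el: "delay handler",
    }

    def dispatch(element):
        """ElementTree tags look like '{namespace}localname'."""
        ns = element.tag[1:].split("}")[0] if element.tag.startswith("{") else ""
        handler = HANDLERS.get(ns)
        return handler(element) if handler else "pass through unvalidated"

    el = ET.Element("{http://ggf.org/ns/nmwg/characteristic/delay/roundTrip/}parameters")
    print(dispatch(el))   # -> delay handler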

  40. Relax NG • Both V2 and V1 schemas were implemented using the “Relax NG compact syntax” schema language • Standardised by OASIS • Better for our needs than the usual WXS (W3C XML Schema) • Simpler, intuitive and highly readable (perfect for networkers :) • More focused on validating the structure of documents (what we want) rather than classifying the relationship between the nodes of an XML tree to help translate it into an object system (e.g. for an OO programming language) • There are tools such as Trang to convert Relax to WXS anyway • Simple example: Data = element nmwg:data { Identifier & MetaIdentifierRef & ( CommonTime? & Datum* ) } So “Data” consists of: • an identifier • a Metadata identifier reference • zero or one time values • zero or more values (datums) ? is “zero or one” * is “zero or more” | is “or” & means element allowed in any order

  41. Example Instance (Round Trip Delay)
    <?xml version="1.0" encoding="UTF-8"?>
    <nmwg:message type="store"
        xmlns="http://ggf.org/ns/nmwg/"
        xmlns:nmwg="http://ggf.org/ns/nmwg/"
        xmlns:delay="http://ggf.org/ns/nmwg/characteristic/delay/roundTrip/"
        xmlns:nmwgt="http://ggf.org/ns/nmwg/topology/"
        xmlns:nmtm="http://ggf.org/ns/nmwg/time/">
      <nmwg:metadata id="34534">
        <delay:subject id="243">
          <nmwgt:endPointPair>
            <nmwgt:src type="hostname" value="blackseal.pc.cis.udel.edu" />
            <nmwgt:dst type="hostname" value="ellis.internet2.edu" />
          </nmwgt:endPointPair>
        </delay:subject>
        <delay:parameters id="8374">
          <delay:packetType>TCP</delay:packetType>
          <delay:numPackets>2</delay:numPackets>
          <delay:packetSpacing>poisson</delay:packetSpacing>
          <delay:packetGap>5</delay:packetGap>
          <delay:protocolID>ipV4</delay:protocolID>
          <delay:typeOfService>FIFO</delay:typeOfService>
          <delay:valueUnits>ms</delay:valueUnits>
          <delay:numBytes>64</delay:numBytes>
          <delay:numBytesUnits>bytes</delay:numBytesUnits>
        </delay:parameters>
      </nmwg:metadata>
      <nmwg:data id="4454" metadataIdRef="34534">
        <nmwg:commonTime type="unix" value="1107492095">
          <delay:datum value="14.1"/>
          <delay:datum value="12.8"/>
        </nmwg:commonTime>
      </nmwg:data>
    </nmwg:message>
  From latest Schema Developers Guide: http://stout.pc.cis.udel.edu/NMWG/devguide.pdf
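
The instance above can be processed with any namespace-aware XML library; a sketch using Python's ElementTree that pairs each data block with its metadata via metadataIdRef and pulls out the delay values (assumes the document has been saved as message.xml):

    import xml.etree.ElementTree as ET

    NS = {"nmwg":  "http://ggf.org/ns/nmwg/",
          "nmwgt": "http://ggf.org/ns/nmwg/topology/",
          "delay": "http://ggf.org/ns/nmwg/characteristic/delay/roundTrip/"}

    root = ET.parse("message.xml").getroot()                  # the store shown above
    metadata = {m.get("id"): m for m in root.findall("nmwg:metadata", NS)}

    for data in root.findall("nmwg:data", NS):
        meta = metadata[data.get("metadataIdRef")]            # pair data with its metadata
        src = meta.find(".//nmwgt:src", NS).get("value")
        dst = meta.find(".//nmwgt:dst", NS).get("value")
        values = [float(d.get("value")) for d in data.findall(".//delay:datum", NS)]
        print(src, "->", dst, "round-trip delays (ms):", values,
              "mean:", sum(values) / len(values))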

  42. perfSONAR Slides by Eric L. Boyd, eboyd@internet2.edu and Jeff Boote, boote@internet2.edu Hacked about by Mark

  43. The Big Picture: NPM Covers… • Analysis and Visualization • Performance Data Sharing • Performance Data Generation • Clean APIs between each layer • Widespread deployment of measurement infrastructure • Widespread deployment of common performance measurement tools

  44. perfSONAR Overview • What: Measurement infrastructure (under development) for exchanging performance data • How: Web services network performance framework • Network measurement tools • Network measurement archives • Distributed scheduling/authorization † • Multi-domain policy • Common language (GGF NMWG Schema) • Where: Deployed / to be deployed across: • Network Backbones (Abilene, ESNet, GÉANT) • Regional Networks (NRENs, RONs, Gigapops) • Universities • When: First product release early summer ’06 • † In large networks it is important for individual services to be at least partially autonomous. In the case of network monitoring, centralised control over the scheduling of active measurement tests makes it difficult to support on-demand tests from other domains (one of the main targets of perfSONAR and its piPEs forebear). As a result the principle of distributed scheduling is promoted, with each component scheduling itself based on the requests it gets, but within the safety of a Resource Protector (later slide).

  45. perfSONAR Credits • perfSONAR is a joint effort: ESnet, Fermilab, GÉANT2 JRA1, Internet2, RNP (Brazil) • Internet2 includes: University of Delaware, Georgia Tech, Internet2 staff • GÉANT2 JRA1 includes: Arnes, Belnet, Carnet, Cesnet, DANTE, DFN, FCCN, GRNet, ISTF, PSNC, Nordunet (Uninett), Renater, RedIRIS, Surfnet, SWITCH

  46. How can you use it? • perfSONAR Link Utilization and Capacity data available from Abilene, ESnet, and GÉANT (prototype) • Build your own components to integrate into the open source framework • The focus of Internet2, DANTE and ESNet is on the framework of Web Services. • Some sample tools have been and will be deployed to make sure everything works - these will naturally be the ones of most interest to the “backbone” - that is their area of expertise, and more importantly their responsibility • Innovative diagnostic and analysis tools are still the domain of the Universities and Research Labs, but perfSONAR will certainly provide a new, exciting and more accessible environment in which to operate. • There would be nothing to stop end-to-end tools being plugged into perfSONAR, just bear in mind that this is not a backbone operator’s area of interest

  47. perfSONAR: System Description • Domains represented by a set of services • Each domain can deploy the services important to that domain • Analysis clients interact with services across multiple domains

  48. perfSONAR: Services (1) • Lookup Service • Allows the client to discover the existing services and other LS services. • Dynamic: services register themselves with the LS and mention their capabilities. • They can also leave, or be removed if a service goes down. • LS will eventually talk across network domains so users in one domain can find out about services in another • AuthN/Z Service (not yet implemented) • Provides authorization functionality for the framework • Internet2 MAT, GN2-JRA5 (eduGAIN) • Users can have several roles; the authorisation is done based on the “user” role. • Trust relationships defined between users affiliated with different administrative domains. • If a trust relationship exists between two networks (for example they are part of the same AA federation such as eduGAIN), network2 will have some way of deciding what privileges they are willing to give to user A of network1. Like Shibboleth, network2 will not be told the identity of user A, but will rely on network1 to provide information about them - perhaps their role or privilege level (network support staff, superuser, student...).

  49. perfSONAR Services (2) • Transformation Service • Transforms the data in some way (aggregation, concatenation, correlation, translation, etc). • Topology Service • Makes the network topology information available to the framework and visualisation tools • Find the closest MP to X • I’m guessing that X could be the src and dst of a path on which a user wants to run a network test, whether they named X explicitly or picked it from a display in a visualisation tool • Either way, depending on exactly where the closest MPs are in relation to the requested src and dst, they will either give an approximation of the performance of the requested test path or the performance of a sub-set of it. • Resource Protector • Arbitrates the consumption of limited resources between multiple services. • Only used when multiple services are trying to use the same resource • For example, may prevent too many services running tests on the same link if it would cause congestion or saturate that link

  50. Inter-domain perfSONAR example interaction • Note: things have changed, now that a new AA method will be adopted, but this still shows the general principle. • (the slide diagram shows a client wanting link utilisation along a path a,b,c,d,e,f spanning Network A and Network B: the client asks LS A “Where can I get the link utilisation along path a,b,c,d,e,f?” and is told that a,b,c are served by Network A (MA A, AA A) and c,d,e,f by Network B (MA B, AA B); the client presents “Here is who I am, I’d like to access MA A” to AA A and receives a token, then asks MA A “Get link utilisation a,b,c” and gets the data; it does the same with AA B and MA B for c,d,e,f, and combines the responses into a useful graph)
