
The RIPE NCC Internet Measurement Data Repository

Shane Alcock


Presentation Transcript


  1. The RIPE NCC Internet Measurement Data Repository
     - Shane Alcock

  2. Introductions
     - Research Programmer with WAND
     - NOT affiliated with RIPE NCC, just speaking on their behalf
     - Passive measurement
     - Organise packet trace captures
     - Maintainer of the WITS website
     - Experienced in dealing with measurement data sets

  3. Outline
     - Sharing Internet datasets
       - Challenges
       - Case studies
     - The RIPE NCC repository
       - Available datasets
       - Other RIPE datasets that may be added

  4. Sharing Measurement Data
     - Internet measurement research requires data
     - Often it is difficult to collect suitable data
       - Privacy
       - Security
       - Cost of infrastructure
       - Selecting appropriate times and locations

  5. Sharing Measurement Data
     - Sharing data with the community is an awesome idea
       - Saves time and effort
       - Promotes collaboration
       - Enables validation of previous results
       - Encourages others to share their data as well

  6. Sharing Measurement Data
     - WITS – Waikato Internet Traffic Storage: http://www.wand.net.nz/wits
     - CAIDA: http://www.caida.org/data/
     - PREDICT: https://www.predict.org/
     - CRAWDAD: http://crawdad.cs.dartmouth.edu/data.php
     - NLANR: no longer exists :(

  7. Challenges
     - Community awareness
       - Datasets are scattered amongst multiple hosts
       - Lack of publicity and detailed information about datasets
       - Meta-data
     - DatCat (CAIDA): http://www.datcat.org
       - Catalogue of publicly available datasets
       - Not an actual repository – data is hosted externally
       - Not a comprehensive resource

  8. Challenges
     - Repositories often maintained by research groups
     - Limited funding, therefore limited resources
       - People
       - Expertise
       - Disk space
       - Bandwidth

  9. Case Study: WITS
     - Maintenance is intermittent
       - Maintainer has many other responsibilities
     - Disk space is a huge limitation
       - No room on the FTP server to put new data sets
       - Adding new disks costs both money and time
       - Sanitizing datasets requires even more space, as we must retain the original version as well
     - Bandwidth
       - Cost of commercial bandwidth hinders availability of data
       - Access is provided via KAREN (NZ national research network) only
       - Fortunately, KAREN peers with many international NRENs

  10. Challenges
     - Permanence
       - Research groups typically depend on competitive funding
       - Funding runs out – repository vanishes
       - Loss of data is a major issue
       - No longer able to replicate and validate previous studies

  11. Case Study: NLANR
     - Large public archive of measurement data
       - Auckland, Abilene traces (PMA)
       - AMP
     - US government ceased funding
       - Repository no longer maintained
       - Domain eventually expired
     - CAIDA and WAND salvaged the data
       - Traces now available on WITS
     - Without intervention, the data could easily have been lost permanently

  12. Challenges
     - Avoiding inappropriate disclosure
       - Anonymisation of sensitive information, e.g. IP addresses
       - Developing policy to cover user access and agreements
     - Many datasets have unique restrictions or policies
       - Policy that is appropriate for one dataset is not for another
     - Personal contact information
     - IP addresses
     - User payload in packet traces
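Anonymising IP addresses can be illustrated with a keyed hash. This is a minimal sketch, not the method used by RIPE NCC or WITS: the helper name `anonymise_ipv4` and the choice of HMAC-SHA256 are assumptions. Each address maps consistently to the same pseudonym within a dataset, so flows remain correlatable, but unlike prefix-preserving schemes such as Crypto-PAn it does not keep subnet structure.

```python
import hashlib
import hmac
import ipaddress


def anonymise_ipv4(addr: str, key: bytes) -> str:
    """Replace an IPv4 address with a pseudonymous IPv4 address (hypothetical helper).

    The mapping is deterministic for a given key, so the same host maps to the
    same pseudonym everywhere in a dataset. It is NOT prefix-preserving, and
    distinct inputs can (rarely) collide because the output is only 32 bits.
    """
    packed = ipaddress.IPv4Address(addr).packed  # raises ValueError on bad input
    digest = hmac.new(key, packed, hashlib.sha256).digest()
    # Use the first 4 bytes of the MAC as the replacement address.
    return str(ipaddress.IPv4Address(digest[:4]))
```

The key must be kept secret by the data collector; anyone holding it can rebuild the mapping by hashing candidate addresses.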

  13. Challenges
     - Communication with users
       - Data sharing is often not top priority for collectors
       - Collection designed to suit their purposes
       - Small changes to the collection process can often make the data more useful to a wider audience
       - Encourage users to engage with collectors

  14. Challenges
     - Support
       - Measurement data is complicated to deal with
       - Steep learning curve
         - Formats, e.g. PCAP vs ERF vs legacy DAG formats for traces
         - Tools / processing libraries
         - Timezones
       - Documentation of shared datasets is often poor
       - User support is intermittent, due to lack of resources again
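The trace-format issue can be made concrete with the PCAP global header, whose 4-byte magic number also encodes the file's byte order and timestamp resolution. The sketch below uses a hypothetical helper `read_pcap_global_header` and assumes the classic libpcap file format, not pcapng; ERF and legacy DAG traces do not begin with these magic numbers, so they are simply rejected. The `thiszone` field is the header's timezone correction, which in practice is almost always 0, meaning timestamps are UTC.

```python
import struct

# Magic number bytes as they appear on disk -> (struct byte-order prefix, timestamp resolution).
PCAP_MAGICS = {
    b"\xd4\xc3\xb2\xa1": ("<", "microsecond"),
    b"\xa1\xb2\xc3\xd4": (">", "microsecond"),
    b"\x4d\x3c\xb2\xa1": ("<", "nanosecond"),
    b"\xa1\xb2\x3c\x4d": (">", "nanosecond"),
}


def read_pcap_global_header(data: bytes) -> dict:
    """Parse the 24-byte global header of a classic (non-pcapng) PCAP file."""
    if len(data) < 24:
        raise ValueError("too short for a PCAP global header")
    magic = data[:4]
    if magic not in PCAP_MAGICS:
        # ERF and legacy DAG traces do not begin with a PCAP magic number.
        raise ValueError("not a classic PCAP file")
    order, resolution = PCAP_MAGICS[magic]
    major, minor, thiszone, sigfigs, snaplen, linktype = struct.unpack(
        order + "HHiIII", data[4:24]
    )
    return {
        "byte_order": "little" if order == "<" else "big",
        "resolution": resolution,
        "version_major": major,
        "version_minor": minor,
        "thiszone": thiszone,  # GMT-to-local correction; usually 0 (timestamps in UTC)
        "sigfigs": sigfigs,
        "snaplen": snaplen,
        "linktype": linktype,  # e.g. 1 = Ethernet
    }
```

Usage, with a hypothetical file name: `with open("trace.pcap", "rb") as f: print(read_pcap_global_header(f.read(24)))`.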

  15. Challenges
     - Size
       - Internet measurement datasets are huge
       - Push modern storage technologies to the limit
       - Server hosting and maintenance

  16. The RIPE NCC Repository
     - RIPE NCC collects a lot of measurement data already
       - They want to share this data with the community
       - Most is already available through various repositories
     - Develop a single common and consistent platform
       - Hosting
       - Browsing
       - Accessing and downloading data
     - Open to other collectors who wish to share data
     - Still under development

  17. Hardware
     - 2 servers – master and back-up
     - Size: 9U
     - Disk: 48x 2TB on 2 controllers – 2 cold spares
     - CPU: 2x quad-core Xeon L5420 2.5GHz
     - Memory: 32GB
     - Chassis: Chenbro RM91250

  18. Hardware

  19. Features of the RIPE NCC Repository
     - Longevity
       - RIPE NCC does not depend on competitive research funding
       - Generating and keeping Internet measurement data for ~20 years
         - Long time-series data
       - Much less likely that the repository will disappear
     - Emphasis on mirroring rather than replacing other repositories
       - Host anonymized versions of data

  20. Features of the RIPE NCC Repository
     - Resources
       - RIPE NCC manages servers, infrastructure
       - Larger repository can justify a dedicated support staff
       - Experience and expertise are important
     - Diversity
       - Variety of datasets from different collectors
       - Increased awareness of new datasets
       - One user account can access many different datasets
       - Self sign-up for “basic access”

  21. Features of the RIPE NCC Repository
     - Communication
       - Bridge the gap between data collectors and users
       - Raise awareness of existing data
       - Gather feedback from the user community
       - Develop relationships with other data collectors
       - Links to useful tools and libraries for processing data
       - Share expertise as well as data

  22. Available Datasets
     - Data collected by RIPE NCC
       - RIS routing database
       - Reverse DNS delegations made by RIRs
     - Data from external sources
       - WITS
       - Ex-NLANR data

  23. Routing Information Service (RIS)
     - 16 route collectors peering with 600 BGP routers
       - Mostly within the RIPE region
       - ~100 peers provide complete routing tables
     - Routes are collected and published in MRT format
       - Updates every 5 minutes
       - Full table dump every 8 hours
     - All data collected since 2000 has been retained
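MRT files are a sequence of records, each starting with a 12-byte common header (timestamp, type, subtype, length) defined in RFC 6396. The sketch below walks those headers in an already-decompressed byte stream; RIS archives are typically gzip-compressed, and Python's `gzip.open` can be used to read them first. The helper names `iter_mrt_records` and `mrt_type_name` are hypothetical, only a subset of MRT type codes is named, and record bodies are returned undecoded.

```python
import struct
from typing import Iterator, NamedTuple

# MRT type codes from RFC 6396 (a subset relevant to BGP table dumps and updates).
MRT_TYPE_NAMES = {
    12: "TABLE_DUMP",
    13: "TABLE_DUMP_V2",
    16: "BGP4MP",
    17: "BGP4MP_ET",
}

# Common header: timestamp, type, subtype, length, all in network byte order.
MRT_HEADER = struct.Struct("!IHHI")


class MRTRecord(NamedTuple):
    timestamp: int  # seconds since the Unix epoch (UTC)
    mrt_type: int
    subtype: int
    body: bytes  # for *_ET types, the first 4 bytes are a microsecond timestamp


def mrt_type_name(mrt_type: int) -> str:
    return MRT_TYPE_NAMES.get(mrt_type, f"UNKNOWN({mrt_type})")


def iter_mrt_records(data: bytes) -> Iterator[MRTRecord]:
    """Yield each record in an uncompressed MRT byte stream."""
    offset = 0
    while offset < len(data):
        if len(data) - offset < MRT_HEADER.size:
            raise ValueError("truncated MRT common header")
        timestamp, mrt_type, subtype, length = MRT_HEADER.unpack_from(data, offset)
        offset += MRT_HEADER.size
        end = offset + length
        if end > len(data):
            raise ValueError("truncated MRT record body")
        yield MRTRecord(timestamp, mrt_type, subtype, data[offset:end])
        offset = end
```

A generator keeps memory use flat, which matters when a full table dump is processed record by record.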

  24. Routing Information Service (RIS)
     - Other methods of access
       - Last 3 months of data exported to MySQL database
       - Weekly statistical reports
       - Looking Glass queries
       - Tools to query and visualise RIS data

  25. Reverse DNS Zones (Partial)
     - Reverse DNS delegations made by RIRs
     - Generated using RIPE DB reverse DNS objects
     - ~410,000 reverse DNS objects

  26. Auckland
     - Passive traces taken at the University of Auckland
     - Auckland II – VII were previously available through NLANR
       - Frequently feature in measurement literature
       - Currently available from WITS archive

  27. Waikato
     - Passive traces taken at the University of Waikato
     - Long duration continuous traces
     - Waikato I is available
     - Other Waikato sets will be included at a later date

  28. NLANR
     - Other NLANR datasets that were preserved by WAND
       - IPLS (also known as Abilene)
       - Leipzig
       - Active Measurement Project (AMP)
     - Much of this data is also currently available from WITS

  29. Other Datasets Collected by RIPE NCC
     - Not currently in the repository but may be added later
       - K-root and reverse DNS server statistics and traces
       - Hostcount
       - TTM
       - DNSMON
       - AS112
       - Other parts of RIPE DB
     - These are covered in more detail in the paper

  30. K-root
     - Internet root name service operated by RIPE NCC
     - PCAP traces of incoming port 53 traffic (DNS queries)
       - 50 hours of traces included in CAIDA's DITL project
     - DNS Statistics Collector (DSC)
       - Summarises DNS traffic into 1-minute bins
       - Generates graphs shown on the K-root website
       - Raw data exported to DNS-OARC
     - SNMP statistics
       - Originate from RIPE NCC in Amsterdam
       - Summarised and exported to an RRD
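The 1-minute binning that DSC performs can be illustrated generically. This sketch is not DSC's implementation or data format: the helpers `bin_per_minute` and `bin_label` are hypothetical, and the input is assumed to be a list of per-query Unix timestamps in seconds. Labelling bins in UTC sidesteps the timezone confusion mentioned under the support challenges.

```python
from collections import Counter
from datetime import datetime, timezone
from typing import Iterable, List, Tuple


def bin_per_minute(timestamps: Iterable[float]) -> List[Tuple[int, int]]:
    """Count events per 1-minute bin; returns sorted (bin_start_epoch, count) pairs."""
    # Floor each timestamp to the start of its minute.
    counts = Counter(int(ts // 60) * 60 for ts in timestamps)
    return sorted(counts.items())


def bin_label(bin_start: int) -> str:
    """Render a bin start as an ISO 8601 UTC timestamp."""
    return datetime.fromtimestamp(bin_start, tz=timezone.utc).isoformat()
```

Only non-empty minutes appear in the output; a plotting tool would need to fill gaps with zeros explicitly.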

  31. Reverse DNS
     - 4 reverse DNS servers operated by RIPE NCC
     - 50,000 queries per second (3x the load of K-root)
       - High query rate means regular trace collection is infeasible
     - DSC used on each of the rDNS servers
       - Raw data and graphs only available within RIPE NCC
       - Could be made available if there was a need
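A back-of-the-envelope calculation shows why continuous tracing at this rate is impractical. The sketch treats the 50,000 queries per second as the aggregate load across the servers and assumes roughly 100 captured bytes per query; that per-query size is a hypothetical figure for illustration, and capturing responses as well would roughly double the total.

```python
SECONDS_PER_DAY = 86_400

RDNS_QPS = 50_000            # figure from the slide, taken here as the aggregate rate
K_ROOT_QPS = RDNS_QPS / 3    # implied by "3x the load of K-root", about 16,667 qps
ASSUMED_BYTES_PER_QUERY = 100  # hypothetical captured size per query packet


def queries_per_day(qps: int) -> int:
    """Total queries over one day at a constant rate."""
    return qps * SECONDS_PER_DAY


def estimate_trace_bytes(qps: int, bytes_per_query: int, days: int = 1) -> int:
    """Rough trace volume, ignoring capture-file overhead and responses."""
    return queries_per_day(qps) * bytes_per_query * days


if __name__ == "__main__":
    print(f"{queries_per_day(RDNS_QPS):,} queries/day")
    print(f"{estimate_trace_bytes(RDNS_QPS, ASSUMED_BYTES_PER_QUERY) / 1e9:.0f} GB/day")
```

Under these assumptions the servers see about 4.3 billion queries and roughly 432 GB of query traffic per day.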

  32. AS112
     - AS number used by the servers that answer reverse DNS queries for RFC 1918 private address space
       - http://public.as112.net/
     - Dynamic DNS update and rDNS server for AS112
       - Hosted by RIPE NCC
       - Goal is to measure and analyse DNS updates for invalid addresses
       - PCAP trace collected annually and contributed to DITL
       - More frequent captures could be scheduled if needed
     - DSC data also collected
       - Graphs publicly available from RIPE NCC AS112 site

  33. Hostcount
     - Monthly DNS scan of ~100 TLDs within the RIPE region
       - Count A and PTR records for both forward and reverse IPv4
       - Also count forward AAAA records for IPv6 addresses
       - Not exhaustive, because public zone transfers are disabled
     - Statistics published via Hostcount website
     - Raw data from 1990-2007 is archived off-line
     - Current policy is to discard raw data after statistic extraction
       - But this could be reversed if there is a need
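The counting step can be sketched as a tally over resource records. This is not the Hostcount implementation: the helper `count_host_records` is hypothetical, and the input is assumed to be `(owner_name, record_type, value)` tuples already collected from a scan. Record types other than A, AAAA and PTR are ignored.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple

COUNTED_TYPES = ("A", "AAAA", "PTR")


def count_host_records(records: Iterable[Tuple[str, str, str]]) -> Dict[str, int]:
    """Tally A, AAAA and PTR records from (name, type, value) tuples."""
    counts = Counter(rtype.upper() for _name, rtype, _value in records)
    # Report every counted type, including those with zero occurrences.
    return {rtype: counts[rtype] for rtype in COUNTED_TYPES}
```

Duplicate records are counted each time they appear; deduplicating the input first (for example with a `set`) would give per-record rather than per-observation counts.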

  34. Test Traffic Measurements (TTM)
     - Active measurement system of ~100 probes
       - Most probes located at ISPs and universities within Europe
       - Not all are included in public measurements
     - Regular series of active tests
       - UDP one-way delay, traceroute, DNSMON, IPv6 PMTU
     - Also supports ad-hoc measurements by authorised users
       - Ping, HTTP page fetch
       - Can also develop and run arbitrary tests
       - Results not released outside of RIPE NCC

  35. Test Traffic Measurements (TTM)
     - Bulk data published using CERN ROOT
     - Performance graphs on the TTM website

  36. DNSMON
     - Measures the reachability and latency of DNS servers
       - Collected using 60 TTM probes
       - Root domain, .com, .net, .org, e164.arpa and 24 ccTLDs measured
       - IPv4 and IPv6 performance measured
     - Summary statistics and graphs are publicly available
       - Only paying subscribers can access the most recent graphs
       - Raw data also available upon request
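Reachability and latency summaries of the kind DNSMON publishes can be sketched from a list of probe results. This is not DNSMON's method: the helper `summarise_probes` is hypothetical, and it assumes each query outcome is recorded as a round-trip time in milliseconds, or `None` when no answer arrived. The median is used because latency distributions are usually skewed by occasional slow responses.

```python
import statistics
from typing import Optional, Sequence


def summarise_probes(rtts_ms: Sequence[Optional[float]]) -> dict:
    """Summarise one server's probe results: fraction answered and median RTT."""
    answered = [rtt for rtt in rtts_ms if rtt is not None]
    total = len(rtts_ms)
    return {
        # Fraction of queries that received an answer (None if there were no probes).
        "reachability": len(answered) / total if total else None,
        # Median over answered queries only; unanswered ones have no latency.
        "median_rtt_ms": statistics.median(answered) if answered else None,
    }
```

For example, `summarise_probes([10.0, None, 30.0, 20.0])` gives a reachability of 0.75 and a median RTT of 20.0 ms.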

  37. RIPE DB
     - Internet number registration objects for the RIPE region
       - IP addresses and AS numbers
     - Reverse DNS objects
       - Used to create zone files for the reverse DNS service
     - Route registry objects
       - Used to provide an Internet Routing Registry
       - Conforms to RPSL and RFC 2650

  38. RIPE DB
     - Public queries supported via command-line and web
       - Daily limit imposed on queries that include personal info
     - Bulk data is available via FTP
       - Personal details are not included
     - Can subscribe to a near real-time mirror of the database
     - Restrictions on personal data are very broad
       - Can result in inappropriate limitations
       - Better access policies and mechanisms should resolve this

  39. Links
     - RIS: http://www.ripe.net/ris
     - RIPE DB: http://www.ripe.net/db
     - K-root: http://k.root-servers.org
     - TTM: http://www.ripe.net/ttm
     - Hostcount: http://www.ripe.net/is/hostcount/stats
     - DNSMON: http://dnsmon.ripe.net/dns-servmon
     - AS112: http://www.ripe.net/as112
     - WITS: http://www.wand.net.nz/wits

  40. Conclusion
     - Repository is a 'beta'
       - Server exists and some datasets are available for download
       - Interested users can be given access
     - Looking for feedback and ideas
       - Development of policy, particularly for access
       - Data collection
       - Improving the RIPE datasets to be more useful to researchers
       - Acquiring more external datasets
       - Contributions of data, analysis tools

  41. Contact
     - http://data-repository.ripe.net
     - data-repository-info@ripe.net
     - salcock@cs.waikato.ac.nz
