
An information services algorithm to heuristically summarize IP addresses for a distributed, hierarchical directory service

Marcos Portnoi*, Martin Swany*, Jason Zurawski†
*Computer and Information Sciences Dept., University of Delaware (mportnoi@ieee.org, swany@udel.edu)
†Internet2


Presentation Transcript


  1. An information services algorithm to heuristically summarize IP addresses for a distributed, hierarchical directory service Marcos Portnoi*, Martin Swany*, Jason Zurawski†, *Computer and Information Sciences Dept. University of Delaware mportnoi@ieee.org, swany@udel.edu †Internet2 zurawski@internet2.edu

  2. What is perfSONAR?
     • Infrastructure for network performance monitoring, making it easier to solve end-to-end performance problems on paths crossing several networks.
     • Service Oriented
        • Each service has a well-defined function
        • Construct your own framework of arbitrary size and complexity
     • Open Protocols (developed through the OGF’s NM-WG/NMC-WG/NML-WG)
     • Consortium of Organizations
        • ESnet
        • GÉANT2/GÉANT3
        • Internet2
        • RNP
     • Software Releases/Products
        • perfSONAR-MDM – Java based
        • perfSONAR-PS – Perl based

  3. The architecture of perfSONAR

  4. The architecture of perfSONAR
     • Data Services
        • Close to the network: performing, storing, and exchanging measurements
        • Vary in type and capability
        • Interoperable via the aforementioned protocols
     • Information Services
        • “Glue” that holds the infrastructure together
        • Locating information and services wherever they may be
        • Controlling access to services or altering the view of the available data
     • Presentation (Analysis/Visualization)
        • Doing “useful” work, e.g. visualizing performance in a graph
        • Transforming data into other formats
        • Alarming based on prior baselines or when certain conditions are met

  5. perfSONAR Information Services
     • Each service and client is “aware” of the Information Services Plane
        • May be statically configured
        • Could dynamically locate where it is (or find the closest instance)
     • Services: must register their location and capabilities on a regular basis
        • Services have a name, an associated domain, and contact information
        • Services more than likely have measurement data (e.g. interfaces they are monitoring, or a set of host pairs that perform active tests), consisting of:
           • A data type
           • Hostname/IP information
           • Other “metadata”
        • Regular “push” of data to the Information Services Plane (this also acts as a “heartbeat” establishing that the service is operational)
     • Clients: interested in locating services
        • Act on behalf of the end user to do something useful
        • Interested in data of specific types, but may not know the address of a service
        • Consult the Information Services with specific questions (e.g. “I want metric A for domain X”)

  6. perfSONAR Information Services
     • The idea is similar to DNS for measurement services, and is not new (e.g. Globus MDS, gLite BDII)
     • Home Lookup Service (hLS)
        • “Local” service that ideally lives in a domain
        • Accepts registrations directly from measurement services
        • Automatically “finds” the upper layer
     • Global Lookup Service (gLS)
        • “Global” cloud of information services, similar to the root DNS system
        • Instances peer with each other to exchange information
        • Accepts registrations from hLS instances only
        • Several instances are currently maintained by partners in the perfSONAR project

  7. Lookup Service = distributed directory for services
     • The Lookup Service (LS) is a distributed directory composed of levels:
        • Local directories (hLS): knowledge of local services (measurement tools, archives) that register directly.
        • Global directories (gLS): directories of local directories (hLSs).

  8. Summarization in the Information Service
     • Trade-off between information size and locality to the service
        • More information is lost the further it travels from the service
           • A client in the same domain may want to know specifics on an IP-routed interface and its links
           • A client across the country will simply want to know whether there are metrics related to interface utilization in the domain
        • Information size decreases as we shed unnecessary information
     • Service: origin of the data set; should be the largest
     • hLS:
        • Contains a copy of the service set and is able to answer any and all queries a service could (drawing activity away from the service)
        • Contains a reduced data set of “everything” in the hLS
     • gLS:
        • Contains a copy of the data sets for all registered hLSs
        • Contains a further reduced data set built from these sets

  9. Service discovery using the Lookup Service
     (1) Services register to the hLS.
     (2) The hLS finds a gLS.
     (3) The hLS “summarizes” its internal data set.
     (4) A client interested in finding data locates a gLS to speak with.
     (5) The client queries the gLS to locate services; the response is the address of an hLS.
     (6) The client sends a similar query to the hLS; the response is a service to ask.
     (7) The service and client transfer data.
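The client side of this flow (query the gLS, then confirm with the hLS) can be sketched with a toy in-memory model. Everything here is hypothetical. The dictionaries `GLS_SUMMARIES` and `HLS_REGISTRATIONS`, the hLS names, and the service labels stand in for real LS registrations, and no perfSONAR API is used. The sketch also counts confirming queries that come back empty: an hLS whose summary claims an address it does not actually hold.

```python
import ipaddress
from typing import Dict, List, Optional, Tuple

# Hypothetical toy data, not a real perfSONAR API or schema.
# Summaries each hLS has registered with the gLS:
GLS_SUMMARIES: Dict[str, List[str]] = {
    "hls-a": ["10.10.0.0/29"],
    "hls-b": ["10.10.0.0/24"],  # overlaps hls-a's claim
}
# Exact host registrations each hLS holds for its own domain:
HLS_REGISTRATIONS: Dict[str, Dict[str, str]] = {
    "hls-a": {"10.10.0.1": "service-1", "10.10.0.4": "service-2"},
    "hls-b": {"10.10.0.200": "service-3"},
}


def query_gls(ip: str) -> List[str]:
    """Ask the gLS which hLS instances claim an address via their summaries."""
    addr = ipaddress.ip_address(ip)
    return [hls for hls, nets in GLS_SUMMARIES.items()
            if any(addr in ipaddress.ip_network(n) for n in nets)]


def discover(ip: str) -> Tuple[Optional[str], int]:
    """Confirm with each candidate hLS; return (service, number of misses)."""
    misses = 0
    for hls in query_gls(ip):
        service = HLS_REGISTRATIONS[hls].get(ip)
        if service is not None:
            return service, misses
        misses += 1  # the summary claimed the address, but this hLS does not hold it
    return None, misses


print(discover("10.10.0.1"))  # ('service-1', 0)
print(discover("10.10.0.2"))  # (None, 2): both hLSs claim it, neither holds it
```

The overlapping summaries are deliberate. They show how an over-broad claim turns into wasted confirming queries.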

  10. Summarization: What’s Important?
     • Hostnames
        • Measurement points (MPs) may have hundreds of measurements for a domain
        • Hostnames are not important; domains (and subdomains) are
        • E.g. damsl.cis.udel.edu has three pieces of information we care about as we move away from the MP: cis.udel.edu, udel.edu, edu
     • Metrics (e.g. eventTypes)
        • Interface utilization, interface drops, and Layer 4 achievable bandwidth are all distinct metrics
        • Enumerate all; don’t summarize
        • Note that with better adoption of the OGF Hierarchy of Characteristics, we could summarize the metrics as well (Lowekamp et al., Grid2003)
     • IP Addresses (v4 and v6)
        • Observation: natural structure, divided into CIDR ranges operated by a given administrative domain
        • Wish to find common CIDR descriptions for a given set of (unrelated) addresses
        • May not know a priori which domain/operator owns an address (and may not look this up)
        • Crux of this work
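The hostname rule above keeps only the enclosing domains, and it is simple to express. This is a minimal sketch, assuming plain DNS names; it handles nothing beyond stripping a final dot. The function name `domain_suffixes` is hypothetical.

```python
from typing import List


def domain_suffixes(hostname: str) -> List[str]:
    """Drop the host label and return each enclosing domain, most specific first."""
    labels = hostname.rstrip(".").split(".")
    # labels[0] is the host itself; every shorter suffix is a domain worth keeping.
    return [".".join(labels[i:]) for i in range(1, len(labels))]


print(domain_suffixes("damsl.cis.udel.edu"))
# ['cis.udel.edu', 'udel.edu', 'edu']
```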

  11. The Problem of Balancing Compression and Miss Rate
     • A key aspect of information summarization is dealing with IP addresses (v4 and v6)
     • IP summarization must fulfill two goals:
        • Reduce the original set of IP addresses by a reasonable amount, i.e., achieve a good compression rate.
        • But not summarize too much: over-summarizing claims many more IP addresses than the original set, which means less precision.

  12. The Problem of Balancing Compression and Miss Rate
     • Current mode of operation:
        • The hLS has some set of addresses from measurement tools
        • Simplistic approach: find natural cut points, determine a CIDR range
        • The hLS may advertise something like a /20 (or larger).
           • It is claiming to have (in its directory) all 2^12 = 4,096 hosts in the advertised /20 subnet!
           • Even if the hLS truly holds only a small portion of this range.
     • Claiming a large subnet for a comparatively small number of hosts within that subnet places an extra burden on the search process:
        • The client will believe the advertising hLS possesses all hosts in the subnet.
        • It must query the hLS to confirm.
        • Multiple hLSs may overlap in what they claim.
     • If the desired IP address is not in the hLS:
        • Penalty: wasted time and resources to perform the confirming query.
        • This is analogous to a cache miss, and the penalty to a miss penalty.
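The cost of advertising a /20 can be made concrete. The sketch below counts the addresses a summary claims. It then estimates the chance that a confirming query to the hLS is a miss, under a simplifying assumption that is ours and not the slides': client queries are spread uniformly over the claimed range. The subnet 134.55.208.0/20 and the 40 held hosts are illustrative values only, and `claim_stats` is a hypothetical helper.

```python
import ipaddress
from typing import Tuple


def claim_stats(advertised: str, held: int) -> Tuple[int, float]:
    """Return (addresses claimed, miss probability under uniform queries)."""
    net = ipaddress.ip_network(advertised)
    claimed = net.num_addresses          # 2 ** (32 - prefixlen) for IPv4
    if not 0 <= held <= claimed:
        raise ValueError("held must be between 0 and the subnet size")
    return claimed, 1 - held / claimed   # assumption: uniform query distribution


claimed, p_miss = claim_stats("134.55.208.0/20", held=40)
print(claimed, round(p_miss, 3))   # 4096 0.99
```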

  13. The Problem of Balancing Compression and Miss Rate
     • Less compression → more precision → more “space” consumed
     • More compression → less precision → more misses, each paying the miss penalty
     • IP summarization must balance compression and miss rate.
     • The optimum balance between compression and miss rate is open to administrator interpretation.
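Both sides of this trade-off can be measured for any candidate summary. This minimal sketch uses two metrics of our own choosing, since the slides define neither formula. Compression is 1 - |summary| / |original|. Over-claim is the fraction of claimed addresses that are not original hosts. `evaluate` is a hypothetical helper.

```python
import ipaddress
from typing import Iterable, Tuple


def evaluate(original: Iterable[str], summary: Iterable[str]) -> Tuple[float, float]:
    """Return (compression, over_claim) for an IPv4 summary of host addresses."""
    hosts = {ipaddress.ip_address(a) for a in original}
    nets = [ipaddress.ip_network(n) for n in summary]
    if not all(any(h in n for n in nets) for h in hosts):
        raise ValueError("summary does not cover every original host")
    compression = 1 - len(nets) / len(hosts)
    # Collapse first so overlapping summary subnets are not double-counted.
    claimed = sum(n.num_addresses for n in ipaddress.collapse_addresses(nets))
    over_claim = 1 - len(hosts) / claimed
    return compression, over_claim


hosts = ["10.10.0.1", "10.10.0.2", "10.10.0.3", "10.10.0.4"]
print(evaluate(hosts, ["10.10.0.0/29"]))                  # (0.75, 0.5): coarse
print(evaluate(hosts, ["10.10.0.0/30", "10.10.0.4/32"]))  # finer: less compression, less over-claim
```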

  14. Our heuristic for IPv4 summarization
     • Our heuristic summarizes a list of IP addresses by employing IP subnet addresses to represent the actual host IP addresses controlled by an hLS.
     • [Figure: IP summarization heuristic engine]
        • Input: 198.129.248.121, 134.55.217.89, 134.55.219.9, 134.55.209.41, 134.55.218.5, 134.55.213.205, 134.55.213.74, 198.124.194.9, 134.55.42.10, 134.55.208.126, 198.124.216.157, 134.55.217.82, 134.55.42.18, 198.124.238.1, 134.55.217.6, 134.55.200.74, 192.168.201.5, 192.107.175.3, 134.55.222.62, 134.55.221.42, 134.55.218.70, 134.55.217.113, …
        • Output: 134.55.0.0/16, 134.167.160.49/32, 138.18.155.22/32, 192.0.0.0/9, 192.150.29.210/32, 192.150.31.78/32, 192.168.201.0/26, 192.188.106.140/32, 198.0.0.0/8
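For contrast, Python's standard library offers lossless aggregation through `ipaddress.collapse_addresses`. This is a different technique from the heuristic. It merges only addresses that exactly fill a CIDR block, so it never claims an address outside the input. The cost is that scattered hosts, like most of the example input, barely compress. The wrapper name `lossless_summary` is ours.

```python
import ipaddress
from typing import Iterable, List


def lossless_summary(addrs: Iterable[str]) -> List[str]:
    """Exact CIDR aggregation: claims no address that is not in the input."""
    hosts = (ipaddress.ip_network(a) for a in addrs)   # each host becomes a /32
    return [str(n) for n in ipaddress.collapse_addresses(hosts)]


print(lossless_summary(["10.10.0.1", "10.10.0.2", "10.10.0.3", "10.10.0.4"]))
# ['10.10.0.1/32', '10.10.0.2/31', '10.10.0.4/32']
print(len(lossless_summary(["134.55.217.89", "134.55.217.82", "134.55.217.6", "134.55.217.113"])))
# 4: no two of these hosts form a complete block
```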

  15. How does it do it?
     • The heuristic constructs a special data structure, a PATRICIA trie,
        • in which the inner nodes are placeholders and the leaves contain the data.
     • Example:

  16. How does it do it?
     • For our needs:
        • The inner nodes are the subnet addresses;
        • The leaves are the actual host IP addresses.
     • Data set we will manipulate:
        • 10.10.0.1
        • 10.10.0.2
        • 10.10.0.3
        • 10.10.0.4

  17. (a) First IP address inserted • Data set: • 10.10.0.1 • 10.10.0.2 • 10.10.0.3 • 10.10.0.4

  18. (b) Second IP address inserted • Data set: • 10.10.0.1 • 10.10.0.2 • 10.10.0.3 • 10.10.0.4

  19. (c) Third IP address inserted • Data set: • 10.10.0.1 • 10.10.0.2 • 10.10.0.3 • 10.10.0.4

  20. (d) Last IP address inserted • Data set: • 10.10.0.1 • 10.10.0.2 • 10.10.0.3 • 10.10.0.4
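The trie produced by these four insertions can be reproduced in a short sketch. It builds a path-compressed binary (PATRICIA-style) trie over 32-bit IPv4 addresses. Every inner node is the longest common prefix of the hosts beneath it, written as a subnet, and every leaf is a /32 host. Such a trie is uniquely determined by its set of keys, so the sketch builds it from the sorted set rather than one insertion at a time. The `Node` class and the function names are hypothetical, not taken from the authors' code.

```python
import ipaddress
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    network: ipaddress.IPv4Network            # subnet (inner node) or /32 host (leaf)
    children: List["Node"] = field(default_factory=list)


def build(addrs: List[int]) -> Node:
    """Build the trie over a sorted list of unique 32-bit integer addresses."""
    if len(addrs) == 1:
        return Node(ipaddress.IPv4Network((addrs[0], 32)))
    # For sorted keys, the common prefix of first and last is shared by all.
    plen = 32 - (addrs[0] ^ addrs[-1]).bit_length()
    net = ipaddress.IPv4Network((addrs[0], plen), strict=False)
    bit = 1 << (31 - plen)                     # first bit where the keys diverge
    left = [a for a in addrs if not a & bit]
    right = [a for a in addrs if a & bit]
    return Node(net, [build(left), build(right)])


def dump(node: Node, depth: int = 0) -> List[str]:
    """Render the trie as indented lines, parent before children."""
    lines = ["  " * depth + str(node.network)]
    for child in node.children:
        lines += dump(child, depth + 1)
    return lines


data = ["10.10.0.1", "10.10.0.2", "10.10.0.3", "10.10.0.4"]
root = build(sorted({int(ipaddress.IPv4Address(a)) for a in data}))
print("\n".join(dump(root)))
# 10.10.0.0/29
#   10.10.0.0/30
#     10.10.0.1/32
#     10.10.0.2/31
#       10.10.0.2/32
#       10.10.0.3/32
#   10.10.0.4/32
```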

  21. How does it do it?
     • Uses three metrics to decide which inner nodes to pick:
        • Distance: a notion of how many IPs a subnet claims that do not actually exist in the network;
        • Density: number of actual IP addresses over the total number of possible IPs in a subnet;
        • Minimum Subnet Mask: avoids overly large subnets.
     • User-controllable via two parameters.

  22. Metric: Distance
     • Distance: a notion of how many IPs a subnet claims that do not actually exist in the network.
     • Difference, in bits, between a child node’s mask and its parent’s mask.
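Each extra bit of mask halves the address range. A parent that sits d bits above its child therefore claims 2^d times as many addresses as the child. The sketch below implements the distance metric as defined above; the helper name is hypothetical.

```python
import ipaddress


def distance(parent: str, child: str) -> int:
    """Difference, in bits, between a child node's mask and its parent's mask."""
    p, c = ipaddress.ip_network(parent), ipaddress.ip_network(child)
    if not c.subnet_of(p):
        raise ValueError(f"{child} is not inside {parent}")
    return c.prefixlen - p.prefixlen


d = distance("10.10.0.0/29", "10.10.0.4/32")
print(d, 2 ** d)   # 3 8: the /29 claims 8 addresses where the child holds 1
```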

  23. Metric: Density
     • Density: number of actual IP addresses over the total number of possible IPs in a subnet.
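Density can be computed directly from a subnet and the host list. The sketch below uses a hypothetical helper and applies it to subnets built from the 10.10.0.1 to 10.10.0.4 data set. Density rises as the subnet narrows, which pulls in the opposite direction from compression.

```python
import ipaddress
from typing import Iterable


def density(subnet: str, hosts: Iterable[str]) -> float:
    """Actual host addresses inside the subnet over all addresses it can hold."""
    net = ipaddress.ip_network(subnet)
    present = {ipaddress.ip_address(h) for h in hosts}
    return sum(1 for h in present if h in net) / net.num_addresses


hosts = ["10.10.0.1", "10.10.0.2", "10.10.0.3", "10.10.0.4"]
for s in ["10.10.0.0/29", "10.10.0.0/30", "10.10.0.2/31"]:
    print(s, density(s, hosts))
# 10.10.0.0/29 0.5
# 10.10.0.0/30 0.75
# 10.10.0.2/31 1.0
```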

  24. Metric: Minimum Subnet Mask
     • MinMask: avoids overly large subnets;
        • Ensures that no node with mask < minMask will be selected as a summarizing node.
     • This metric takes precedence over the previous ones.
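Because minMask takes precedence, it sets a hard floor on how coarse any summary can be. No host can be represented by anything larger than its enclosing /minMask network. The sketch below checks a candidate node against that floor and computes the coarsest covering it permits. It does not reproduce how the heuristic combines minMask with distance and density, and both helper names are hypothetical.

```python
import ipaddress
from typing import Iterable, List


def allowed(subnet: str, min_mask: int = 8) -> bool:
    """A node may summarize only if its mask is at least min_mask."""
    return ipaddress.ip_network(subnet).prefixlen >= min_mask


def coarsest_allowed(hosts: Iterable[str], min_mask: int = 8) -> List[str]:
    """Largest subnets permitted by min_mask that still cover every host."""
    nets = {ipaddress.ip_network(f"{h}/{min_mask}", strict=False) for h in hosts}
    return [str(n) for n in sorted(nets)]


print(allowed("192.0.0.0/9"), allowed("128.0.0.0/1"))   # True False
print(coarsest_allowed(["192.150.29.210", "192.188.106.140", "198.124.194.9"]))
# ['192.0.0.0/8', '198.0.0.0/8']
print(coarsest_allowed(["192.150.29.210", "192.188.106.140"], min_mask=16))
# ['192.150.0.0/16', '192.188.0.0/16']
```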

  25. How it integrates with the perfSONAR LS
     • Two parameters control the summarization algorithm:
        • summarization_granularity: controls the granularity (coarseness) of the summarization.
        • summarization_minMask: controls the minimum mask that a summarizing node must have. Accepts values from 0 to 32 (IPv4); default = 8.
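The two LS parameters can be validated in one place. The class name and the `from_config` mapping below are hypothetical; this is not the actual perfSONAR configuration format. Only the following come from the slides: the key names `summarization_granularity` and `summarization_minMask`, the 0 to 3 and 0 to 32 ranges, and the minMask default of 8. The slides give no default for granularity, so the sketch requires it.

```python
from dataclasses import dataclass
from typing import Any, Mapping


@dataclass(frozen=True)
class SummarizationParams:
    granularity: int   # 0 = finer, less compressed ... 3 = coarser, more compressed
    min_mask: int = 8  # summarization_minMask; IPv4 range 0..32

    def __post_init__(self) -> None:
        if not 0 <= self.granularity <= 3:
            raise ValueError("summarization_granularity must be in 0..3")
        if not 0 <= self.min_mask <= 32:
            raise ValueError("summarization_minMask must be in 0..32")

    @classmethod
    def from_config(cls, cfg: Mapping[str, Any]) -> "SummarizationParams":
        """Read the two parameters from a generic key/value mapping."""
        return cls(granularity=int(cfg["summarization_granularity"]),
                   min_mask=int(cfg.get("summarization_minMask", 8)))


print(SummarizationParams.from_config({"summarization_granularity": "2"}))
# SummarizationParams(granularity=2, min_mask=8)
```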

  26. Granularity
     • Granularity: from 0 to 3
        • 0: finer, less compressed summarization (more IP addresses).
        • 3: coarser, more compressed, less precise summarization (fewer IP addresses).
     • To compose this parameter: mapping of threshold values from distance and density.
        • Empirical.

  27. Granularity
     • To compose this parameter: mapping of threshold values from distance and density.
        • Empirical.
     • Internally, the granularity is converted to distance and density thresholds by means of equations.

  28. Some IP summarization techniques
     • Route aggregation algorithms:
        • Degermark, M., Brodnik, A., Carlsson, S., & Pink, S. (1997). Small forwarding tables for fast routing lookups. SIGCOMM Computer Communication Review, 27, 3–14.
        • Draves, R., King, C., Venkatachary, S., & Zill, B. (1999). Constructing optimal IP routing tables. In INFOCOM '99: Eighteenth Annual Joint Conference of the IEEE Computer and Communications Societies, Proceedings (Vol. 1, pp. 88–97). IEEE.
        • Nilsson, S., & Karlsson, G. (1998). Fast address look-up for internet routers (pp. 11–22). Chapman & Hall, Ltd.
        • Srinivasan, V., & Varghese, G. (1998). Faster IP lookups using controlled prefix expansion. SIGMETRICS Performance Evaluation Review, 26, 1–10.
        • Waldvogel, M., Varghese, G., Turner, J., & Plattner, B. (1997). Scalable high speed IP routing lookups. SIGCOMM Computer Communication Review, 27, 25–36.

  29. Some IP summarization techniques
     • The main objective of those efforts: improving IP lookup performance for routing.
        • They use “next hop” information to make decisions.
     • Our algorithm is not primarily intended for routing.
        • “Next hop” information is not available.

  30. Summarization algorithm currently in perfSONAR
     • Relies on a voting scheme to identify subnets that represent most of the original IP addresses.
        • For each original address, the algorithm expands all of its enclosing subnets and stores them in a list.
        • If a subnet was already expanded by a previous address, its vote counter is incremented.
        • Candidates for summarizing addresses are selected by picking subnets that have at least one original /32 IP address as a child.
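The expansion-and-vote step can be sketched as follows. This rests on two assumptions. First, "all subnets" is read as every enclosing prefix from /0 to /31 of each address. Second, a subnet's vote is the number of original addresses that expanded to it. The candidate-selection step is not reproduced, and `expand_votes` is a hypothetical name, not the perfSONAR code. The coarsest prefixes always collect every vote, which illustrates why votes alone do nothing to discourage very large subnets.

```python
import ipaddress
from collections import Counter
from typing import Iterable


def expand_votes(addrs: Iterable[str]) -> Counter:
    """Count, for every enclosing subnet (/0 to /31), how many addresses fall in it."""
    votes: Counter = Counter()
    for a in set(addrs):  # ignore duplicate addresses
        for plen in range(0, 32):
            votes[ipaddress.ip_network(f"{a}/{plen}", strict=False)] += 1
    return votes


votes = expand_votes(["10.10.0.1", "10.10.0.2", "10.10.0.3", "10.10.0.4"])
print(votes[ipaddress.ip_network("10.0.0.0/8")])     # 4
print(votes[ipaddress.ip_network("10.10.0.2/31")])   # 2
print(len(votes))                                    # 35 distinct subnets
```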

  31. Summarization algorithm currently in perfSONAR
     • The user cannot influence address selection.
     • Final summarizing subnets might be of any size.
        • Very large subnets (/8s) were common, although nowhere near that amount of data was available.
     • By contrast, our heuristic allows control over the compression level of the summarization.
        • It also implements mechanisms to avoid selecting summarizing subnets that might be considered too large.

  32. Conclusion
     • Algorithm being evaluated as a replacement in perfSONAR hLS and gLS instances
     • Experimental results being collected on PlanetLab to evaluate efficiency and accuracy
     • Additional enhancements to the heuristic are being evaluated
     • Questions?
     • Thanks!
        • Marcos Portnoi (mportnoi@ieee.org)
        • Martin Swany (swany@udel.edu)
        • Jason Zurawski (zurawski@internet2.edu)
