On the Accuracy of Embeddings for Internet Coordinate Systems

Eng Keong Lua, Tim Griffin, Marcelo Pias, Han Zheng, Jon Crowcroft. University of Cambridge, Computer Laboratory.

RTT Estimation: Is this a good one? Depends on the APPLICATIONS! [Figure: estimated vs. measured RTT (ms) from planetlab1.comet.columbia.edu; x-axis: PlanetLab sites, from closest to farthest using measured RTT]

RTT Estimation: Is this a good one? [Figure: estimated vs. measured RTT (ms) from planetlab1.pop-mg.rnp.br; x-axis: PlanetLab sites, from closest to farthest using measured RTT]

Internet Coordinates: How accurate are they? Both of the previous examples were generated using the same Internet coordinate technique on the same data set. Outline: • What are Internet Coordinates? • A Close Look at the Lipschitz Embedding • New Sets of Accuracy Metrics • Experimental Methodology - PlanetLab Experiments • Using Other Embeddings • Revisiting Previous Work • Conclusion

What are Internet Coordinates? • Internet Coordinate System • Embed Round-Trip-Times (RTTs) into geometric spaces • Unmeasured RTTs are estimated using geometric distance • Why Internet Coordinate Systems? • Extensive measurement of network delays is time consuming and adds to network load • Construction of overlay topologies through scalable distance estimation • If accurate, embedding techniques allow us to predict Internet RTTs without extensive measurements.
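The idea of estimating an unmeasured RTT from geometric distance can be sketched as follows. This is only an illustration: the node names and 2-D coordinates are hypothetical, and Euclidean distance is assumed as the geometric distance.

```python
import math

# Hypothetical 2-D coordinates (in ms units) that some embedding has assigned.
coords = {
    "a": (0.0, 0.0),
    "b": (30.0, 40.0),
    "c": (60.0, 80.0),
}

def estimate_rtt(u, v, coords=coords):
    """Estimate the RTT between u and v as the Euclidean distance of their points."""
    return math.dist(coords[u], coords[v])

print(estimate_rtt("a", "b"))  # 50.0
```

No measurement between "a" and "b" is needed once both nodes have coordinates; whether the estimate is any good is the question the rest of the talk addresses.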

How embeddings work • L = Landmarks, H = Hosts, N = Nodes = L + H • Embed = associate a point in a metric space with each node in N • Compute the “distance” matrix of the embedded points to obtain the estimated RTT matrix • The host-to-host (H x H) part of the measured RTT matrix is not used in embedding (but is needed for judging accuracy!), which is why we don’t use Skitter data

Full Embedding: L = N • Every node is a landmark, so the full measured RTT matrix (L x L) is embedded into a metric space • Compute the “distance” matrix of the embedded points to obtain the estimated RTT matrix • In general, some accuracy is lost even when the “full mesh” of data is used

Two Basic Approaches: Method I • Embed the measured RTT matrix (|L| = m landmarks) into a space of n dimensions (n < m) using optimization algorithms w.r.t. an accuracy metric • Predicting Internet Network Distance with Coordinates-based Approaches (GNP) [Ng, Zhang. INFOCOM 2002] • Big Bang Simulation (BBS) [Shavitt, Tankel. INFOCOM 2003, 2004] • Vivaldi [Dabek, Cox, Kaashoek, Morris. SIGCOMM 2004] • PIC [Costa, Castro, Rowstron, Key. ICDCS 2004]

Two Basic Approaches: Method II • Lipschitz embedding of the measured RTT matrix (|L| = m landmarks) into a Euclidean space of m dimensions • Accuracy may be lost: we will look at the “inherent” loss of accuracy of this step • Dimensionality reduction to a Euclidean space of n dimensions (n < m) • May attempt to optimize this using a specific accuracy metric w.r.t. the measured RTTs and/or the m-dimensional distances • Virtual Landmarks [Tang, Crovella, IMC 2003] • Constructing Internet Coordinate Systems based on Delay Measurements [Lim, Hou, Choi, IMC 2003] • Lighthouses for Scalable Distributed Location [Pias, Crowcroft, Wilbur, Harris, Bhatti, IPTPS 2003]
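A minimal sketch of the Lipschitz step with m = 3 landmarks is given below. The RTT values and node names are made up for illustration, and the dimensionality-reduction step is deliberately left out.

```python
import math

def lipschitz_embed(rtt_to_landmarks):
    """Map each node to the vector of its measured RTTs to the m landmarks."""
    return {node: tuple(rtts) for node, rtts in rtt_to_landmarks.items()}

# Hypothetical measured RTTs (ms) from every node to landmarks L1, L2, L3.
measured = {
    "L1": [0.0, 20.0, 50.0],
    "L2": [20.0, 0.0, 40.0],
    "L3": [50.0, 40.0, 0.0],
    "H1": [10.0, 15.0, 45.0],
    "H2": [48.0, 35.0, 8.0],
}

phi = lipschitz_embed(measured)

# The H1-H2 RTT was never measured; it is estimated in the m-dimensional space.
estimated_h1_h2 = math.dist(phi["H1"], phi["H2"])
print(estimated_h1_h2)
```

Each host only probes the landmarks, so the measurement cost grows with m rather than with the number of host pairs.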

Lipschitz Embedding – Example using binary trees • Full Lipschitz embedding into R^7: each row of the distance matrix is read as the 7-dimensional coordinate of that node • E.g. the coordinate of Node 1 is Φ(1) = [0, 1, 2, 2, 1, 2, 2]
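The 7-node example can be reproduced with a short sketch. The node labelling below (root = 1, then preorder) is an assumption, chosen because it yields the Φ(1) quoted on the slide; hop count is used as the tree distance.

```python
from collections import deque

# 7-node binary tree; preorder labelling is an assumption (root = 1).
EDGES = [(1, 2), (2, 3), (2, 4), (1, 5), (5, 6), (5, 7)]

def hop_distances(edges, source):
    """Breadth-first search: hop count from source to every node."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, []).append(v)
        adj.setdefault(v, []).append(u)
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    return dist

def full_lipschitz(edges):
    """Every node is a landmark: the coordinate of x is its row of the distance matrix."""
    nodes = sorted({n for e in edges for n in e})
    phi = {}
    for x in nodes:
        d = hop_distances(edges, x)
        phi[x] = [d[y] for y in nodes]
    return phi

phi = full_lipschitz(EDGES)
print(phi[1])  # [0, 1, 2, 2, 1, 2, 2]
```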

View from a leaf in a binary tree, depth 4 Full 32-D Lipschitz

View from root in a binary tree, depth 4

What should Accuracy Mean? • There are several ways to capture accuracy formally • The right notion depends on the needs of an application • Some applications require that distances in the embedding accurately reflect the original distances • In the earlier example, Φ(7) = [2, 3, 4, 4, 1, 2, 0], so δ(1,7) ≈ 4.47, but the distance is only 2 in the original metric space
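The value of δ(1,7) can be checked directly from the two coordinate vectors, assuming δ is the Euclidean distance in R^7:

```python
import math

phi_1 = [0, 1, 2, 2, 1, 2, 2]
phi_7 = [2, 3, 4, 4, 1, 2, 0]

# Coordinate differences are (2, 2, 2, 2, 0, 0, 2), so the squared distance is 20.
embedded = math.dist(phi_1, phi_7)
original = 2  # hop distance between nodes 1 and 7 in the tree

print(round(embedded, 2))  # 4.47
print(embedded / original)  # stretch factor of this pair
```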

Relative distance of other nodes: Relative Rank Loss (rrl) • Is Node A closer than Node B? • Relative ranking of distances is not lost • We define Relative Rank Loss (rrl) • From Node z, if sign(R) ≠ sign(R’), the order has changed!

Formal Definition - rrl rrl is a type of “swap distance” Define:

Formal Definition - rrl • Define the Local rrl at Node z as rrl(Φ,z) • Note that 0 (0%) ≤ rrl(Φ,z) ≤ 1 (100%) • Maximal Local rrl = MAX of rrl(Φ,z) over all nodes z • Average Local rrl = mean of rrl(Φ,z) over all nodes z
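A sketch of how a local rrl could be computed is shown below. The exact formula is not preserved in this transcript, so the normalisation used here (swapped pairs divided by the number of unordered pairs of other nodes) is an assumption, as are the toy distance tables.

```python
from itertools import combinations

def sign(v):
    return (v > 0) - (v < 0)

def local_rrl(dist, emb, z):
    """Fraction of pairs (x, y) whose order, as seen from z, differs after embedding.

    Assumed normalisation: all unordered pairs of nodes other than z.
    A tie that becomes a strict order (or vice versa) counts as a swap.
    """
    others = [n for n in dist if n != z]
    pairs = list(combinations(others, 2))
    swapped = sum(
        1 for x, y in pairs
        if sign(dist[z][x] - dist[z][y]) != sign(emb[z][x] - emb[z][y])
    )
    return swapped / len(pairs)

# Toy data: original distances on a path a-b-c-d, and hypothetical embedded distances.
dist = {
    "a": {"a": 0, "b": 1, "c": 2, "d": 3},
    "b": {"a": 1, "b": 0, "c": 1, "d": 2},
    "c": {"a": 2, "b": 1, "c": 0, "d": 1},
    "d": {"a": 3, "b": 2, "c": 1, "d": 0},
}
emb = {
    "a": {"a": 0, "b": 2.5, "c": 2.0, "d": 3.0},
    "b": {"a": 2.5, "b": 0, "c": 1.0, "d": 2.0},
    "c": {"a": 2.0, "b": 1.0, "c": 0, "d": 1.5},
    "d": {"a": 3.0, "b": 2.0, "c": 1.5, "d": 0},
}

local = {z: local_rrl(dist, emb, z) for z in dist}
maximal_rrl = max(local.values())
average_rrl = sum(local.values()) / len(local)
print(local, maximal_rrl, average_rrl)
```

From node a, only the pair (b, c) changes order, so one of its three pairs is swapped.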

Closest Neighbor Loss (cnl) • Some applications are interested only in determining which nodes are closest • The embedding should accurately preserve the set of closest nodes • For a Node x ∈ X, its Closest Neighbor Loss cnl(Φ,x) is 0 if any of the nodes closest to x are mapped to the nodes closest to Φ(x); otherwise, cnl(Φ,x) is 1 • The Global Average of cnl(Φ,x) over all nodes x is denoted cnl(Φ)
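The cnl definition translates into a few lines of code. This is a sketch on made-up distance tables; the tie tolerance `tol` is an added assumption for comparing floating-point distances.

```python
def closest_set(row, x, tol=1e-9):
    """Set of nodes at minimal distance from x (ties included)."""
    others = {n: d for n, d in row.items() if n != x}
    best = min(others.values())
    return {n for n, d in others.items() if d <= best + tol}

def cnl(dist, emb, x):
    """0 if some original closest neighbor of x is still among the closest after embedding, else 1."""
    return 0 if closest_set(dist[x], x) & closest_set(emb[x], x) else 1

def global_cnl(dist, emb):
    """Average of cnl over all nodes."""
    return sum(cnl(dist, emb, x) for x in dist) / len(dist)

# Toy data: original distances on a path a-b-c-d, and hypothetical embedded distances.
dist = {
    "a": {"a": 0, "b": 1, "c": 2, "d": 3},
    "b": {"a": 1, "b": 0, "c": 1, "d": 2},
    "c": {"a": 2, "b": 1, "c": 0, "d": 1},
    "d": {"a": 3, "b": 2, "c": 1, "d": 0},
}
emb = {
    "a": {"a": 0, "b": 2.5, "c": 2.0, "d": 3.0},
    "b": {"a": 2.5, "b": 0, "c": 1.0, "d": 2.0},
    "c": {"a": 2.0, "b": 1.0, "c": 0, "d": 1.5},
    "d": {"a": 3.0, "b": 2.0, "c": 1.5, "d": 0},
}

print(global_cnl(dist, emb))  # 0.25
```

Only node a loses its closest neighbor here (b is replaced by c), giving one loss out of four nodes.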

Relative error for Lipschitz embedding on binary trees, depth 1 (3 nodes) to 8 (511 nodes) • It is not obvious or intuitive how to interpret relative error

Scalar independent measures for Lipschitz embedding on binary trees, depth 1 to 8 • cnl tells us that about 96% of the 511 nodes in a tree of depth 8 have a different closest neighbor • Maximal Local rrl tells us that at least 1 node sees over 30% of its relative distance relationships swapped • rrl shows that on average nodes see over 20% of their relative distance relationships swapped

View from a leaf in a hub with 30 spokes Root node is PUSHED away to a distance of 3.3

Hub and Spoke Accuracy: n spokes and 1 root, where n ranges from 1 to 30 • Rising cnl and falling rrl after n=6
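Why the hub is pushed away as n grows can be seen from a small sketch of the full Lipschitz embedding of a star. It assumes hop-count distances and plain, unscaled Euclidean distance between coordinates, so absolute values need not match the plotted figures; squared distances are used to keep the comparison exact.

```python
def hub_lipschitz(n):
    """Full Lipschitz coordinates for a hub (node 0) with n spokes (nodes 1..n)."""
    def d(u, v):
        if u == v:
            return 0
        return 1 if 0 in (u, v) else 2  # hub-leaf = 1 hop, leaf-leaf = 2 hops
    nodes = range(n + 1)
    return {u: [d(u, v) for v in nodes] for u in nodes}

def sq_dist(p, q):
    """Squared Euclidean distance (integer arithmetic, no rounding issues)."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def root_strictly_closest(n):
    """From leaf 1, is the hub still strictly closer than every other leaf?"""
    phi = hub_lipschitz(n)
    to_root = sq_dist(phi[1], phi[0])  # equals n + 1
    to_leaves = [sq_dist(phi[1], phi[k]) for k in range(2, n + 1)]  # each equals 8
    return all(to_root < t for t in to_leaves)

for n in (6, 7, 30):
    print(n, root_strictly_closest(n))
```

Under these assumptions a leaf sees the hub at squared distance n + 1 and every other leaf at squared distance 8, so the hub stops being the strictly closest node once n exceeds 6.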

Why PlanetLab? • The Skitter project makes RTT data available from a small number n of monitoring nodes to m target nodes, where m is on the order of hundreds of thousands • This yields an asymmetric n x m matrix • Embedded distances between target nodes cannot be verified • PlanetLab – testbed for Internet planetary-scale mesh topology

Methodology • RTT measurement data collected between PlanetLab nodes from March 22-28, 2004 • Minimum RTT value between each pair of nodes over consecutive 15-min periods • Each day has 96 matrices of pair-wise RTTs, each of size 325 x 325 • Over the 7-day period, we have 672 matrices
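The bookkeeping behind these numbers can be sketched as below. The sample format `(timestamp_s, src, dst, rtt_ms)` is hypothetical; the sketch only illustrates taking the minimum RTT per node pair within each 15-minute period.

```python
from collections import defaultdict

PERIOD_S = 15 * 60
PERIODS_PER_DAY = 24 * 60 * 60 // PERIOD_S  # 96 periods, hence 96 matrices per day

def min_rtt_per_period(samples):
    """samples: iterable of (timestamp_s, src, dst, rtt_ms).

    Returns {period_index: {(src, dst): minimum RTT seen in that period}}.
    """
    out = defaultdict(dict)
    for ts, src, dst, rtt in samples:
        period = ts // PERIOD_S
        key = (src, dst)
        out[period][key] = min(rtt, out[period].get(key, rtt))
    return dict(out)

# Hypothetical probes between two nodes.
samples = [(0, "a", "b", 30.0), (100, "a", "b", 25.0), (900, "a", "b", 40.0)]
matrices = min_rtt_per_period(samples)
print(matrices)
print(PERIODS_PER_DAY * 7)  # 672
```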

Methodology • A representative node is selected in each site to build a site-by-site matrix, which is then cleaned up for missing entries • Finally, we have a 69 x 69 RTT site-by-site matrix • We further classify sites by geographical location: • North America (NA-PL): 44 x 44 RTT site matrix, the majority of sites obtain connectivity through Abilene • Outside North America (ONA-PL): 25 x 25 RTT site matrix between research and commercial sites, includes Australia, Europe, Latin America and Asia • ALL (ALL-PL): 69 x 69 RTT site matrix, consists of NA-PL & ONA-PL

Results and Observations – ALL-PL • Apply full Lipschitz Embedding • Minimum, Mean and Maximum rrl • Difference between Max and Min rrl is high (57.71%): flipping a coin is better! • Global cnl measure is 84.06%, so only about 16% of the sites retain their closest neighbors in their embedding

Scalability (Meta-) Metric: Can embeddings scale? • Suppose applications are only interested in a subset of nodes, e.g. North America • Would it be better to use an Internet Coordinate System from ALL-PL or from NA-PL? • The answer to this question will determine whether embedding services can scale • If Y ⊆ X, we could first use the full Lipschitz embedding to obtain Φ(X), then restrict this to the nodes in Y; we call this the Superspace embedding of Y • Φ(Y) and the Superspace embedding may be very different embeddings, with different accuracy for the metric space spanned by Y
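The difference between the two embeddings of Y can be sketched with the full Lipschitz embedding on a toy metric. The node names and distances are hypothetical; "Subspace" here means embedding Y on its own, i.e. Φ(Y).

```python
import math

def full_lipschitz(dist, nodes):
    """Coordinate of x = its distances to every node in `nodes`."""
    return {x: [dist[x][y] for y in nodes] for x in nodes}

def superspace(dist, X, Y):
    """Embed all of X, then keep only the points of the nodes in Y."""
    phi_x = full_lipschitz(dist, X)
    return {y: phi_x[y] for y in Y}

def subspace(dist, Y):
    """Embed Y on its own, ignoring nodes outside Y."""
    return full_lipschitz(dist, Y)

# Toy metric: hop distances on a path a-b-c-d.
dist = {
    "a": {"a": 0, "b": 1, "c": 2, "d": 3},
    "b": {"a": 1, "b": 0, "c": 1, "d": 2},
    "c": {"a": 2, "b": 1, "c": 0, "d": 1},
    "d": {"a": 3, "b": 2, "c": 1, "d": 0},
}
X = ["a", "b", "c", "d"]
Y = ["a", "b"]

sup = superspace(dist, X, Y)
sub = subspace(dist, Y)
print(len(sup["a"]), math.dist(sup["a"], sup["b"]))
print(len(sub["a"]), math.dist(sub["a"], sub["b"]))
```

The same pair of nodes ends up at different embedded distances depending on which landmarks outside Y contribute coordinates, which is why the two embeddings can differ in accuracy.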

Superspace and Subspace Embeddings: Looking at NA-PL

Superspace and Subspace Results • We used NA-PL as a Subspace of ALL-PL: • Φ(NA-PL) = Subspace Embedding of NA-PL • Φ(ALL-PL) restricted to NA-PL = Superspace Embedding of NA-PL • The Lipschitz Subspace embedding in Euclidean space is much better than the Superspace embedding

North America (Superspace Embedding): PlanetLab site with Maximum rrl – planetlab1.flux.utah.edu

North America (Subspace Embedding): PlanetLab site with Maximum rrl – planetlab1.enel.ucalgary.ca

CDFs of rrl for Subspace and Superspace Embeddings

Using Other Embeddings with our PlanetLab ALL-PL sites using our Accuracy Metrics • Both BBS (Euclidean) and Vivaldi embeddings in Euclidean space have the same cnl measure of 75.36% • BBS (Hyperbolic) LRN has the lowest cnl • Vivaldi has a higher maximum rrl than BBS (Euclidean) • BBS (Euclidean) has the lowest maximum rrl • The BBS (Hyperbolic) TP embedding has a much higher maximum rrl than the BBS (Hyperbolic) LRN embedding • BBS (Hyperbolic) TP has the largest maximum rrl • Its minimum rrl is lower than that of BBS (Hyperbolic) LRN

Signature plots: BBS (Hyperbolic) TP Lists of close neighbors are being pushed away in embedded geometric space

Signature plots: Vivaldi Lists of close neighbors are being pushed away in embedded geometric space

Scalability (Meta-) Metric – Superspace and Subspace embeddings • Vivaldi and BBS embeddings in Euclidean space have the same behavior as the Lipschitz embedding • The Subspace embedding has better rrl accuracy than the Superspace embedding in Euclidean space • BBS embeddings in Hyperbolic space • The Superspace embedding tends to have close or better rrl accuracy than the Subspace embedding in Hyperbolic space

Revisiting Previous Work with their data sets using our Accuracy Metrics • BBS (Hyperbolic) TP in Hyperbolic space shows similar inaccuracy in rrl to the Lipschitz embedding in Euclidean space for tree-like network topologies • All experiments show the list of closest nodes being pushed away, with sharp bi-modal errors • With BBS (Hyperbolic) LRN, the list of close neighbors is pushed away much further, and the maximum rrl is higher

BBS (Euclidean) using Jan 2000 AS Hierarchical Tree Network Topology of 150 nodes

BBS (Hyperbolic) TP using Jan 2000 AS Hierarchical Tree Network Topology of 150 nodes

BBS (Hyperbolic) TP using BA Network Topology of 150 nodes

BBS (Hyperbolic) TP using Mar 2001 AS Network Topology of 200 nodes

BBS (Hyperbolic) LRN using Mar 2001 AS Network Topology of 200 nodes

Conclusion • The goal of this work is to apply our new accuracy metrics to study the accuracy of embeddings for Internet Coordinate systems • The results of this attempt are not encouraging • It is worthwhile to develop a collection of accuracy metrics that can quantify different aspects of user-oriented quality • Can we characterize the network topologies that have good embeddings with respect to an accuracy metric? • Embeddable Overlay Network (EON) • Routing nodes are selected to avoid violations of the triangle inequality (for overlay forwarding) • Overlay topology is selected to embed with high accuracy with respect to multiple useful accuracy metrics

Discussion • Strengths • Extensive study of the accuracy of diverse embedding techniques • New metrics for measuring the accuracy of diverse embeddings • Weaknesses • Lack of verification studies of cnl and rrl • rrl and cnl can be biased • Topology-dependent metrics • Will RTT-based mechanisms work? • Speed of light for measuring the distance to the planets vs. RTT for measuring the distance to destinations • Measuring distances by time in error-prone environments

Thank you. Questions?
