The Internet’s Physical Topology (or, Will the Internet ever measure itself?)
Scott Kirkpatrick
School of Engineering, Hebrew University of Jerusalem
EVERGROW and OneLab2
Collaborators (thanks, not blame…)
Yuval Shavitt, Eran Shir, Udi Weinsberg,
Shai Carmi, Shlomo Havlin, Avishalom Shalit, Daqing Li
Federated initially from military and commercial networks, some of which involved highly proprietary and gratuitously different platforms.
Arpanet, DECnet, PC-based systems, IBM’s SNA, BitNet, Euronet…
As a result, there are two distinct layers: BGP and above (inter-AS), and intra-AS (OSPF shortest path, MPLS, ATM, …)
BGP information is exchanged by sharing recommended routes, exposing only those for which an AS will be properly compensated.
Engineering of the Internet has always been distributed, following a formal process (IETF RFCs, etc.) similar to international standards formation, yet with less commercial involvement than typical ISO practice and a US center of gravity for the deliberations.
Layering of communications protocols has permitted a high degree of refinement, but now seems to be in stasis.
There are no global databases, only many local databases, and data quality is poor.
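The compensation rule in the BGP bullet is often modeled with customer, peer, and provider relationships (the Gao-Rexford export conditions). The sketch below assumes that model. `Rel`, `should_export`, `exported_routes`, and the prefixes are hypothetical. Real routers express such policy in their configuration languages, not in code like this.

```python
from enum import Enum


class Rel(Enum):
    """Business relationship of a neighbor, seen from this AS."""
    CUSTOMER = "customer"  # the neighbor pays us for transit
    PEER = "peer"          # settlement-free exchange of traffic
    PROVIDER = "provider"  # we pay the neighbor for transit


def should_export(learned_from: Rel, export_to: Rel) -> bool:
    """Gao-Rexford style export rule (an assumed model, not a BGP feature).

    Routes learned from a customer are advertised to every neighbor, since
    traffic they attract is paid for. Routes learned from a peer or a
    provider are advertised only to customers, so the AS never carries
    unpaid transit traffic between two non-customers.
    """
    if learned_from is Rel.CUSTOMER:
        return True
    return export_to is Rel.CUSTOMER


def exported_routes(routes: dict, neighbor: Rel) -> list:
    """Prefixes (from a prefix -> learned-from table) announced to `neighbor`."""
    return sorted(p for p, rel in routes.items() if should_export(rel, neighbor))


# Hypothetical routing table: prefix -> relationship it was learned from
routes = {
    "10.1.0.0/16": Rel.CUSTOMER,
    "10.2.0.0/16": Rel.PEER,
    "10.3.0.0/16": Rel.PROVIDER,
}
print(exported_routes(routes, Rel.PEER))      # ['10.1.0.0/16']
print(exported_routes(routes, Rel.CUSTOMER))  # all three prefixes
```

Under this model, an AS reveals to a peer only the routes that bring it paying traffic. As a result, much of the topology is never exposed to outside observers.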
Internet measurement has undergone a revolution
Traceroute – an old hack, now a basic tool in wide use
Active monitors – hardware intensive
Distributed software – DIMES is one example, though no longer the only one
Many enhancements are under consideration, as the problems with traceroute become evident
Ultimately, we expect that every router (or whatever routers become in the future Internet) will participate in distributed active monitoring.
The payoff comes with interactive and distributed services that can achieve greater performance at greatly decreased overhead
Jacobson, “traceroute” from LBL, February 1989
Traceroute can also be rewritten for special situations, such as cellphones
Single machine traces to many destinations – Lucent, 1990s (Burch and Cheswick)
Great pictures, but their interpretation is unclear; they demonstrate the need for more analytic visualization techniques
But excellent for magazine covers, t-shirts…
First attempt to determine the time evolution of the Internet
First experience in operating under the “network radar”
Skitter and subsequent projects at CAIDA (SDSC)
15-50 machines (typically <25) at academic sites around the world
RIPE and NLANR: 1-200 machines in commercial networks and telco backbones; the information is proprietary
DIMES (>10,000 software agents) represents the next step
19,597 agents registered (in 115 countries)
29,404 ASes and 204,204 AS-AS links
6.6 B measurements saved since 9/2004
A flood of feigned suicide packets (with TTL values t = 1 to about 30 hops), each sent more than once.
In the ideal situation, each packet dies at step t and the router returns an ICMP “time exceeded” message: “so sorry, your packet died at IP address I, time T”
Non-ideal situations must be filtered to avoid data corruption:
Errors – the router inserts the destination address as I
Non-response is common
Multiple interfaces for a single (complex) router
Route flaps and load balancing create false links
Route instabilities can be reduced with careful header management (requires guessing router tricks)
Resulting links must be resolved – to ASes, to routers, to POPs
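Here is a minimal sketch of the filtering and link-resolution steps listed above. It makes several simplifying assumptions:

- Each TTL step is given as the replies to its repeated probes, with None for no reply.
- A step is trusted only when all replying probes agree.
- The destination address is accepted only at the final step.
- Addresses are mapped to ASes by longest-prefix match against a small hypothetical table.

The function names, addresses, and ASNs are illustrative only (documentation ranges). Alias resolution for multi-interface routers is not attempted.

```python
import ipaddress
from typing import Optional


def resolve_hops(trace, destination):
    """Reduce each TTL step to one trusted address, or None if unusable."""
    resolved = []
    last = len(trace) - 1
    for t, replies in enumerate(trace):
        seen = {r for r in replies if r is not None}
        # Error case: a router reporting the destination address mid-path
        if t != last:
            seen.discard(destination)
        # No reply, or probes disagree (load balancing or a route flap)
        resolved.append(seen.pop() if len(seen) == 1 else None)
    return resolved


def extract_links(trace, destination):
    """IP-level links between consecutive TTL steps that both resolved."""
    hops = resolve_hops(trace, destination)
    return {
        (a, b)
        for a, b in zip(hops, hops[1:])
        # A gap (None) hides an unknown router, so no link is inferred across it
        if a is not None and b is not None and a != b
    }


def ip_to_as(ip, table) -> Optional[int]:
    """Longest-prefix match of `ip` against (network, ASN) pairs."""
    addr = ipaddress.ip_address(ip)
    best = None
    for net, asn in table:
        if addr in net and (best is None or net.prefixlen > best[0].prefixlen):
            best = (net, asn)
    return best[1] if best else None


def as_links(links, table):
    """Collapse IP-level links to inter-AS links; intra-AS hops are dropped."""
    result = set()
    for a, b in links:
        as_a, as_b = ip_to_as(a, table), ip_to_as(b, table)
        if as_a is not None and as_b is not None and as_a != as_b:
            result.add((as_a, as_b))
    return result


# Hypothetical prefix-to-AS table (documentation prefixes and ASNs)
table = [(ipaddress.ip_network(p), asn) for p, asn in [
    ("192.0.2.0/24", 64500),
    ("198.51.100.0/24", 64501),
    ("198.51.100.0/29", 64503),  # more specific prefix wins
    ("203.0.113.0/24", 64502),
]]
dest = "203.0.113.9"
trace = [
    ["192.0.2.1", "192.0.2.1"],
    ["192.0.2.2", None],
    [None, None],                      # non-response
    ["198.51.100.1", "198.51.100.1"],
    ["198.51.100.7", "198.51.100.8"],  # probes disagree
    ["198.51.100.2", dest],            # bogus destination reply mid-path
    ["198.51.100.3", "198.51.100.3"],
    [dest, dest],
]
links = extract_links(trace, dest)
print(sorted(links))
print(as_links(links, table))  # {(64503, 64502)}
```

The rule used here is conservative: discarding ambiguous steps loses some true links, including the AS boundary hidden behind the non-responding step in this trace. In exchange, it avoids creating false ones.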
Prune by grouping sites into “shells” of common connectivity, working further into the Internet: all sites with connectivity 1 are removed (recursively) and placed in the “1-shell,” leaving the “2-core”; removing the 2-shell then leaves the 3-core, and so forth.
The union of shells 1 through k is called the “k-crust”
At some value of k, called kmax, pruning runs to completion.
Identify the nucleus as the kmax-core
This is a natural, robust definition, and should apply to other large networks of interest in economics and biology.
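The pruning procedure is the k-core (k-shell) decomposition. Below is a dependency-free sketch that assumes the graph arrives as an undirected edge list of AS pairs. The function names are hypothetical. It follows the definitions on this slide: the nucleus is the kmax-core, and the k-crust is the union of shells 1 through k.

```python
from collections import defaultdict


def k_shell_decomposition(edges):
    """Return {node: shell index} by recursive pruning.

    At step k, every node whose remaining degree is <= k is removed,
    repeatedly, until none is left; those nodes form the k-shell and what
    survives is the (k+1)-core.
    """
    adj = defaultdict(set)
    for u, v in edges:
        if u != v:  # ignore self-loops
            adj[u].add(v)
            adj[v].add(u)
    degree = {n: len(nbrs) for n, nbrs in adj.items()}
    remaining = set(adj)
    shell = {}
    k = 1
    while remaining:
        stack = [n for n in remaining if degree[n] <= k]
        while stack:
            n = stack.pop()
            if n not in remaining:
                continue  # already pruned via another neighbor
            remaining.remove(n)
            shell[n] = k
            for m in adj[n]:
                if m in remaining:
                    degree[m] -= 1
                    if degree[m] <= k:
                        stack.append(m)  # pruning cascades within this shell
        k += 1
    return shell


def k_core(shell, k):
    """Nodes in shells k and above."""
    return {n for n, s in shell.items() if s >= k}


def k_crust(shell, k):
    """Union of shells 1 through k."""
    return {n for n, s in shell.items() if s <= k}


# Toy graph: a 4-clique with a two-node tail hanging off it
edges = [(1, 2), (1, 3), (1, 4), (2, 3), (2, 4), (3, 4), (5, 1), (5, 2), (6, 5)]
shell = k_shell_decomposition(edges)
kmax = max(shell.values())
print(kmax, sorted(k_core(shell, kmax)), sorted(k_crust(shell, kmax - 1)))
# 3 [1, 2, 3, 4] [5, 6]
```

On real AS data, a graph library's core-number routine would be the usual choice. The explicit loop is shown only to mirror the recursive removal described above.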
Cluster analysis finds interesting structure in the k-crusts
These are the hanging tentacles of our (Red Sea) jellyfish
For subsequent analysis, we distinguish three components:
Core, Connected, Isolated
Largest cluster in each shell
Data from 01.04.2005
This picture has been stable from January 2005 (kmax = 30) to the present day, with little change in the nucleus composition. Precisely defined, the tendrils are those sites and clusters isolated from the largest cluster in all the crusts; they connect only through the core.
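Given shell indices (for example, from a k-shell decomposition), the three components can be labeled by checking whether each crust node ever belongs to the largest cluster of a k-crust. This sketch assumes precomputed shells and an undirected edge list. `classify` and `components` are hypothetical names, and ties for the largest cluster are broken arbitrarily.

```python
from collections import defaultdict


def components(adj, nodes):
    """Connected components of the subgraph induced on `nodes`."""
    seen, comps = set(), []
    for start in sorted(nodes):
        if start in seen:
            continue
        comp, stack = set(), [start]
        seen.add(start)
        while stack:
            n = stack.pop()
            comp.add(n)
            for m in adj[n]:
                if m in nodes and m not in seen:
                    seen.add(m)
                    stack.append(m)
        comps.append(comp)
    return comps


def classify(edges, shell):
    """Label nodes 'core', 'connected', or 'isolated' (tendrils).

    `shell` maps node -> k-shell index; the nucleus is the kmax-shell.
    A crust node is 'connected' if it lies in the largest cluster of at
    least one k-crust (k < kmax), and 'isolated' if it never does.
    """
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    kmax = max(shell.values())
    in_largest = set()
    for k in range(1, kmax):
        crust = {n for n, s in shell.items() if s <= k}
        comps = components(adj, crust)
        if comps:
            in_largest |= max(comps, key=len)  # ties: first found wins
    labels = {}
    for n, s in shell.items():
        if s == kmax:
            labels[n] = "core"
        else:
            labels[n] = "connected" if n in in_largest else "isolated"
    return labels


edges = [(1, 2), (1, 3), (1, 4), (2, 3), (2, 4), (3, 4),   # nucleus
         (5, 1), (5, 6), (6, 7), (7, 2),                    # chain across the core
         (8, 3),                                            # dangling site
         (9, 4), (9, 10)]                                   # small pendant cluster
shell = {1: 3, 2: 3, 3: 3, 4: 3, 5: 2, 6: 2, 7: 2, 8: 1, 9: 1, 10: 1}
labels = classify(edges, shell)
print(labels[8], labels[5], labels[1])  # isolated connected core
```

In this tiny graph, nodes 9 and 10 count as connected because they form the largest cluster of the 1-crust. Node 8 never joins a largest cluster, so it is a tendril that reaches the rest of the graph only through the core.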
Need to address the specifics of the “network discoveries”
How frequently observed?
How sensitive are the observations to the number of observers?
How do the measurements depend on the time of observation?
The extensive literature on the subject consists mostly of straw-man counterexamples, showing that bias from this class of observation can be serious in graphs of known structure, but not addressing how to estimate structure from actual measurements.
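One way to probe how many observers are needed is subsampling: pick random subsets of agents of increasing size and count the distinct links they discover together. This is a sketch under assumptions: per-agent link sets as input, uniform random sampling, and hypothetical names. It is not necessarily the procedure used in these studies.

```python
import random


def discovery_curve(observations, sizes, trials=20, seed=0):
    """Mean number of distinct links found by random subsets of agents.

    observations: mapping agent -> set of links that agent reported
    sizes: subset sizes (numbers of agents) to evaluate
    """
    rng = random.Random(seed)
    agents = sorted(observations)
    curve = {}
    for n in sizes:
        total = 0
        for _ in range(trials):
            subset = rng.sample(agents, n)
            found = set().union(*(observations[a] for a in subset))
            total += len(found)
        curve[n] = total / trials
    return curve


# Hypothetical per-agent link sets
observations = {
    "agent-a": {(1, 2), (2, 3)},
    "agent-b": {(1, 2), (3, 4)},
    "agent-c": {(1, 2)},
}
print(discovery_curve(observations, sizes=[1, 2, 3]))
```

If the curve is still rising at the full agent count, many links probably remain unseen. A flat tail suggests saturation, though only for the kinds of links this set of vantage points can observe at all.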
Current efforts (me, Weinsberg, Carmi) are studying how the Meduza model and other observations are affected by removal of the less-reliable data:
Infrequently seen links
Fewer than three days’ presence in a week
Some things seen only once
Stuff seen by rogue agents
Is it intentional? Probably not.
So far all the basic observations are proving robust.
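The reliability cuts listed above can be expressed as a single filter over one week of observations. The record format `(link, day, agent)`, the set of rogue agents, and the default thresholds are assumptions for illustration. `reliable_links` is a hypothetical name.

```python
from collections import defaultdict


def reliable_links(records, rogue_agents=frozenset(), min_days=3, min_count=2):
    """Links that survive the reliability cuts.

    records: iterable of (link, day, agent) for one week, day in 0..6
    rogue_agents: agents whose reports are discarded outright
    min_days: minimum number of distinct days a link must be seen
    min_count: minimum number of observations (2 drops links seen only once;
               implied whenever min_days >= 2, so it matters when min_days is relaxed)
    """
    days = defaultdict(set)
    counts = defaultdict(int)
    for link, day, agent in records:
        if agent in rogue_agents:
            continue  # stuff seen by rogue agents
        days[link].add(day)
        counts[link] += 1
    return {link for link in days
            if len(days[link]) >= min_days and counts[link] >= min_count}


# Hypothetical week of observations
records = [
    ((1, 2), 0, "a"), ((1, 2), 1, "a"), ((1, 2), 2, "b"),  # three days
    ((2, 3), 0, "a"), ((2, 3), 0, "b"),                    # one day, twice
    ((3, 4), 0, "x"), ((3, 4), 1, "x"), ((3, 4), 2, "x"),  # rogue agent only
    ((4, 5), 5, "a"),                                      # seen only once
]
print(reliable_links(records, {"x"}))                # {(1, 2)}
print(reliable_links(records, {"x"}, min_days=1))    # (1, 2) and (2, 3)
```

Running the analysis once per cut, rather than with all cuts combined, shows which class of unreliable data a given observation depends on.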
The peer-connected component (PCC) is capable of long-range as well as local communications
We've used “betweenness” to test alternate routings which ignore “Tier One” links.
Betweenness is essentially a traffic model.
Each node in a set sends one packet to each other node in the set (for example, all 1- and 2-shell nodes).
Compare maximum betweenness with and without the nucleus ASes.
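The traffic model described here is subset betweenness: the shortest-path load that arises when only the chosen nodes exchange packets. The sketch below makes three assumptions:

- Packets are split equally across equal-length shortest paths, as in Brandes's betweenness algorithm.
- The AS graph is undirected and given as an edge list.
- The names are hypothetical.

It is not the authors' code. Real inter-AS routing is also policy-driven rather than strictly shortest-path.

```python
from collections import defaultdict, deque


def build_adj(edges, removed=frozenset()):
    """Undirected adjacency sets, leaving out any node in `removed`."""
    adj = defaultdict(set)
    for u, v in edges:
        if u in removed or v in removed:
            continue
        adj[u].add(v)
        adj[v].add(u)
    return adj


def subset_betweenness(adj, senders):
    """Load on each node when every sender sends one packet to every other sender.

    Packets follow shortest paths; if several exist, each packet is split
    equally among them (Brandes-style accumulation, restricted so that only
    nodes in `senders` count as destinations). Endpoints are not charged.
    """
    senders = set(senders)
    load = defaultdict(float)
    for s in senders:
        if s not in adj:
            continue  # sender was removed or is disconnected
        dist, sigma = {s: 0}, defaultdict(int)
        sigma[s] = 1
        preds, order, queue = defaultdict(list), [], deque([s])
        while queue:  # BFS counting shortest paths
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        delta = defaultdict(float)
        for w in reversed(order):  # back-propagate packet fractions
            is_target = 1.0 if (w in senders and w != s) else 0.0
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (is_target + delta[w])
            if w != s:
                load[w] += delta[w]
    return dict(load)


def max_load(edges, senders, removed=frozenset()):
    """Peak node load, optionally with some nodes (e.g. the nucleus) removed."""
    load = subset_betweenness(build_adj(edges, removed), senders)
    return max(load.values(), default=0.0)


# Toy example: hub "N" stands in for the nucleus; a, b, c are crust ASes
edges = [("N", "a"), ("N", "b"), ("N", "c"), ("a", "b"), ("b", "c")]
senders = {"a", "b", "c"}
print(max_load(edges, senders))                 # 1.0
print(max_load(edges, senders, removed={"N"}))  # 2.0
```

In the toy graph, removing the hub doubles the peak load, because all traffic between a and c must then pass through b. On the real AS graph, the comparison described above is between these two maxima.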