1 / 55

Geolocation by IP address

Geolocation by IP address. Locating Internet hosts. Sándor Laki lakis@inf.elte.hu http://lakis.web.elte.hu. Outline. RADAR a wireless solution IP2Geo on the Internet Constraint-Based Geolocation (GeoLim) OCTANT framework Topology-Based Geolocation. RADAR. RADAR, a wireless approach.

dillan
Download Presentation

Geolocation by IP address

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Sándor Laki (C) Geolocation by IP address Geolocation by IP address Locating Internet hosts Sándor Laki lakis@inf.elte.hu http://lakis.web.elte.hu

  2. Sándor Laki (C) Geolocation by IP address Outline • RADAR a wireless solution • IP2Geo on the Internet • Constraint-Based Geolocation • (GeoLim) • OCTANT framework • Topology-Based Geolocation

  3. Sándor Laki (C) Geolocation by IP address RADAR

  4. Sándor Laki (C) Geolocation by IP address RADAR, a wireless approach • Focus on the indoor environment • GPS does not work indoors • Dedicated technologies • Goals • Leverage existing infrastucture • Use wireless LAN • Software solution • Better scalability and lower cost than dedicated technology

  5. Sándor Laki (C) Geolocation by IP address RADAR • Key idea: Signal strength matching • Offline calibration • Construct radio map (<location, Sstr>) • Real-time location and tracking • Extract SStr from beacons • Find table entry that best matches the measured SStr

  6. Sándor Laki (C) Geolocation by IP address RADAR: Determine location • Find nearest neighbor in signal space (NNSS) • 1st solution • Physical position of NNSS gives the user location • 2nd solution • K-NNSS • Average the coordinates of k nearest neighbor gives the wanted position

  7. Sándor Laki (C) Geolocation by IP address Correlation between physical location and signal strength • Base system: • INFOCOM 2000 paper • Enhanced system: • Microsoft Technical Report MSR-TR-2000-12

  8. Sándor Laki (C) Geolocation by IP address IP2Geo Single-point localization

  9. Sándor Laki (C) Geolocation by IP address IP2Geo - Motivation • Much focus on location-aware services in wireless and mobile contexts • Such services are relevant in the Internet context too • targeted advertising • event notification • territorial rights management • network diagnostics • It is a challenging problem • IP address does not inherently contain an indication of location

  10. Sándor Laki (C) Geolocation by IP address IP2Geo • Multi-pronged approach that exploits various “properties” of the Internet • DNS names of router interfaces often indicate location • network delay tends to correlate with geographic distance • hosts that are aggregated for the purposes of Internet routing also tend to be clustered geographically • GeoTrack • determine location of closest router with a recognizable DNS name • GeoPing • use delay measurements to estimate location • GeoCluster • extrapolate partial (and possibly inaccurate) IP-to-location mapping information using BGP prefix clusters

  11. Sándor Laki (C) Geolocation by IP address GeoTrack – main idea • Extract geographical information from DNS names of routers on the path • Localizes the target to the last router whose position is known • Example • ngcore1-serial8-0-0-0.Seattle.cw.net => Seattle • 184.atm6-0.xr2.ewr1.alter.net => New York • dnvr-scrm.abilene.ucaid.edu => Denver

  12. Sándor Laki (C) Geolocation by IP address GeoTrack • GeoTrack operation • do a traceroute to the target IP address • determine location of last recognizable router along the path • Key ideas in GeoTrack • partitioned city code database to minimize chance of false match • ISP-specific parsing rules • delay-based correction • Limitations • routers may not respond to traceroute • DNS name may not contain location information or lookup may fail • target host may be behind a proxy or a firewall

  13. Sándor Laki (C) Geolocation by IP address GeoPing - Delay based localization • Delay-based triangulation is conceptually simple • delay to distance • distance from 3 or more non-colinear points => target location • But there are practical difficulties • network path may be circuitous • transmission & queuing delays may corrupt delay estimate • OWD is hard to measure • OWD ≠ RTT/2 because of routing asymmetry

  14. Sándor Laki (C) Geolocation by IP address GeoPing - details • Measure the network delay to the target host from several geographically distributed probes • typically more than 3 probes are used • round-trip delay measured using ping utility • small-sized packets => transmission delay is negligible • pick minimum among several delay samples • Nearest Neighbor in Delay Space (NNDS) • akin to Nearest Neighbor in Signal Space (NNSS) in RADAR • construct a delay map containing (delay vector,location) tuples • given a vector of delay measurements, search through the delay map for the NNDS • location of the NNDS is our estimate for the location of the target host • More robust that directly trying to map from delay to distance

  15. Sándor Laki (C) Geolocation by IP address GeoPing – Delay tends to increase with geographic distance

  16. Sándor Laki (C) Geolocation by IP address GeoPing – Estimation error

  17. Sándor Laki (C) Geolocation by IP address GeoCluster A passive method

  18. Sándor Laki (C) Geolocation by IP address GeoCluster • A passive technique unlike GeoTrack and GeoPing • Basic idea: • breaks the IP address space into clusters • assign a geographical location to each cluster based on IP-to-location third party databases • given a target IP address, first find the matching cluster using longest-prefix match. • location of matching cluster is our estimate of host location

  19. Sándor Laki (C) Geolocation by IP address GeoCluster • Example: • consider the cluster 128.95.0.0/16 (containing 65536 IP addresses) • suppose we know that the location corresponding to a few IP addresses in this cluster is Seattle • then given a new address, say 128.95.4.5, we deduce that it is likely to be in Seattle too

  20. Sándor Laki (C) Geolocation by IP address GeoCluster – Clustering IP addresses • Exploit the hierarchical nature of Internet routing • inter-domain routing in the Internet uses the Border Gateway Protocol (BGP) • BGP operates on address aggregates • we treat these aggregates as clusters • in all we had about 100,000 clusters of different sizes

  21. Sándor Laki (C) Geolocation by IP address IP-to-location mapping • Data sources • e-mail service, business web-hosting companies, etc. • requires a large, fine-grain and fresh database! • Information • partial information (i.e., only for a small subset of addresses) • possibly inaccurate (e.g., manual input from user)

  22. Sándor Laki (C) Geolocation by IP address Extrapolating IP-to-location mapping • Determine location most likely to correspond to a cluster • majority polling • “average” location • dispersion is an indicator of our confidence in the location estimate • What if there is a large geographic spread in locations? • some clusters correspond to large ISPs and the internal subdivisions are not visible at the BGP level • sub-clustering algorithm: keep sub-dividing clusters until there is sufficient consensus in the individual sub-clusters • some clients connect via proxies or firewalls (e.g., AOL clients) • sub-clustering may help if there are local or regional proxies • otherwise large dispersion => no location estimate made • many tools fail in this regard

  23. Sándor Laki (C) Geolocation by IP address Performance of GeoCluster Median errors: GeoCluster ~30km GeoPing ~300km GeoTrack ~100km

  24. Sándor Laki (C) Geolocation by IP address Other database-oriented applications • NetGeo and IP2LL • based on WHOIS DB • not closely regulated • the address information often indicates the head office of the owner which may be far from the actual target • Quova • Commercial service with thier own database • Gtrace • using DNS LOC entries

  25. Sándor Laki (C) Geolocation by IP address Octant framework A very impressive solution

  26. Sándor Laki (C) Geolocation by IP address Octant overview • Combine very different techniques • Active and passive • Constraint-based • Weighted positive and negative constraints • Constraint => region • Using Bézier-regions • Efficient implementations of clipping and union operations are available

  27. Sándor Laki (C) Geolocation by IP address Octant - Notations • bi : the region in which the target node is located • gj : a constraint • It is a region where the node might be reside associated with weight • Set of nodes • Landmarks: physical locations are at least partially known (Lj) • Every Lj has an „estimated” location bLj

  28. Sándor Laki (C) Geolocation by IP address Octant – Landmarks and constraints • Primary landmark • GPS, street address • Low error • Secondary landmark • Position computed by Octant itself • Positive constraints ( set: ) • Node A is within d miles of Lk • g = (x,y) in bkc(x,y,d), where c(x,y,d) is a disc. • Negative constraints( set:  ) • Node A is further than d miles from Lk • g = (x,y) in bkc(x,y,d)

  29. Sándor Laki (C) Geolocation by IP address Estimated location • bi = XiXi \ XiXi

  30. Sándor Laki (C) Geolocation by IP address Mapping latencies to distances • Latency between a target and a landmark • bounds thier maximum distance • Calculate with speed of light • delay*2/3*c • Low precision • Octant’s way • Dynamic calibration • For each L landmark compute two bounds RL(d) and rL(d) • where d is the ping time of node i • rL(d)  || loc(L) – loc(i) ||  RL(d) • When queuing delays are dominant then rL(d) = 0.

  31. Sándor Laki (C) Geolocation by IP address Mapping latencies to distances • Each landmark periodically pings all other landmarks => creating a correlation table • Determines the convex hull around the points => R(d) and r(d) • It is sufficient when the target has a direct and congestion-free path to the landmark • Octant introduce a cut off at latency p • a tunable percentage of landmark lie to the left of p • discard the others • (z is a fictitious datapoint, • placed far away)

  32. Sándor Laki (C) Geolocation by IP address Mapping latencies to distances

  33. Sándor Laki (C) Geolocation by IP address Last hop delays • Mapping is further complicated by queuing and transmission delays associated with the last hop • Cable and DSL connections • Overloaded PlanetLAB nodes • Goal: isolate the delay components which artificially inflate latencies • Detailed maps of the underlying physical network, as in network tomography (not in Octant) • Octant introduce a simple metric called height

  34. Sándor Laki (C) Geolocation by IP address Last hop delays in Octant • Based on pair-wise latency measurements between landmarks • Primary landmarks: a, b, c • Measure thier latencies(RTT): [a,b], [a,c], [b,c] • The positions of primary landmarks are known -> we can estimate the transmission delays: (a,b), (a,c), (b,c) • Lasthop delay(a,b) = [a,b] - (a,b) • Landmark coordinates (alon, alat),…

  35. Sándor Laki (C) Geolocation by IP address Last hop delays in Octant • How much of the delays can be attributed to each landmark? • Denoted by a’, b’ and c’ // height • Similarly, for a target t, we can compute t’, as an estimation… • We can solve for • t’, tlon, tlat

  36. Sándor Laki (C) Geolocation by IP address Last hop delays • tlon and tlat has relatively high error • not used in the later stages • Given the target and landmark heights • Each landmark can shift its • RL up if t’ < heights of the other landmarks • rL down if t’ > heights of the other landmarks

  37. Sándor Laki (C) Geolocation by IP address Indirect routes • The preceding assumption • Route lengths are proportional to great circle distances • not the case in practise, due to policy routing • Example: a subscriber Ithaca, NY -> Cornell Univ. (Ithaca) • Syracuse, NY -> Brockport, IL -> New York City -> Cornell Univ. • ~1 mile physical distance VS. ~800 miles length path

  38. Sándor Laki (C) Geolocation by IP address Indirect routes discovery • Landmark’s heigth can indicate… • Localizing routers on the network path • Secondary landmarks • Localization by latencies • Extract location from router names • Reverse DNS lookup – undns tool • Using ZIP code to determine geographical location

  39. Sándor Laki (C) Geolocation by IP address Handling uncertainty • Filter out errorneous constraint • Latency based constraints • Weight system that decreases exponentially with increasing latency • Weight threshold

  40. Sándor Laki (C) Geolocation by IP address Iterative refinement • Two phase: • First, we use accurate and mostly conservative constraints • Second, less acurate and more aggressive constraints to obtain a better estimation (inside the initial estimated region) • …

  41. Sándor Laki (C) Geolocation by IP address Results

  42. Sándor Laki (C) Geolocation by IP address Results

  43. Sándor Laki (C) Geolocation by IP address Topology-based Geolocation

  44. Sándor Laki (C) Geolocation by IP address Motivations • Problems with CBG • Use constraints that are less than speed of light • Risk of underestimates • When an underestimate occurs, the final region does not contain the true location • Topology based geolocation • using the speed of light to generate constraints • inspired by Sensor Network Localization

  45. Sándor Laki (C) Geolocation by IP address Summary of techniques • Traceroute from landmarks • Map topology • Estimate hop latency • Improve accuracy • Cluster network interfaces • Increase structuring • Validate location hints • Incorporate location hints • Constraint optimization • Geolocate targets

  46. Sándor Laki (C) Geolocation by IP address Estimate hop latencies • Using traceroute tool to infer link latency • Estimate hop latency from the difference in RTT to adjacent routers • Accurate only if the link is traversed both directions (symmetric routing) • How can we discover this property? • Three different techniques

  47. Sándor Laki (C) Geolocation by IP address Estimate hop latencies • First, observing the reverse TTL values • Most routers initialize the TTL values for thier packets from a small set. • 30,32,64,128,150,255 • If TTL values changes significantly from one node to the next => discard the link estimate • Second, measuring paths in both direction between pairs of landmarks • If both paths traverse a particular link => taking the differences of measurements to the two endpoints • This estimation has high confidence • Third, increasing vantage points from which we probe a certain link • For every link on a path from a landmark we probes to both endpoints from all other landmarks… • If these probes pass over the link => estimate for the link

  48. Sándor Laki (C) Geolocation by IP address Clustering interfaces • Clustering interfaces that belong to the same router (IP aliases)

  49. Sándor Laki (C) Geolocation by IP address Clustering interfaces • Two IP-aliases techniques • Mercator • UDP probes are send to high-numbered ports on a set of interfaces • Routers send back a port-unreachable ICMP message with the source address • If two diff. interfaces replie with the same source address => aliases • Ally • Used on pairs of interfaces • Sends probes to the two if. • Examines the IP-ID • Most routers generate the IP-ID using a single counter that has incremented after each packet has been created

  50. Sándor Laki (C) Geolocation by IP address Validating location hints • DNS names -> locations • Some names are incorrect… • Missnamed, reconfig, reassignment of IP addresses • Topology constraints can be used to verify location hints • RTT measurements -> upper bounds… • Clustering -> aliases • Hop latencies

More Related