Towards Street-Level Client-Independent IP Geolocation

Towards Street-Level Client-Independent IP Geolocation. Yong Wang (UESTC/Northwestern), Daniel Burgener (Northwestern), Marcel Flores (Northwestern), Aleksandar Kuzmanovic (Northwestern), Cheng Huang (Microsoft Research).


Presentation Transcript


  1. Towards Street-Level Client-Independent IP Geolocation • Yong Wang, UESTC/Northwestern • Daniel Burgener, Northwestern • Marcel Flores, Northwestern • Aleksandar Kuzmanovic, Northwestern • Cheng Huang, Microsoft Research • http://networks.cs.northwestern.edu

  2. Problem and Motivation • How to accurately locate IP addresses on the Internet? • Host-dependent solutions: • GPS • WiFi (e.g., Google My Location, Skyhook) • Host-independent solutions: • Server cannot always expect clients’ cooperation • Security / access restrictions • Online service access analytics • Location-based online advertising

  3. A Scenario of Street-Level Online Advertising User’s location Local Businesses

  4. Prior Work • Constraint-Based Geolocation (CBG) [ToN 06]: median error distance = 228 km • Measures delays from active vantage points • Topology-Based Geolocation (TBG) [IMC 06]: median error distance = 67 km • CBG + network topology information • Octant [NSDI 07]: median error distance = 35.2 km • CBG + router locations, geographic and demographic information
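The "median error distance" quoted for each system is the median, over all targets, of the great-circle distance between the estimated and the true location. A minimal sketch of that metric follows, assuming locations are given as (latitude, longitude) pairs in degrees; the function names are hypothetical and are not taken from the talk.

```python
import math
from statistics import median

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def median_error_km(estimates, truths):
    """Median error distance over paired (lat, lon) estimates and ground truths."""
    errors = [haversine_km(*est, *true) for est, true in zip(estimates, truths)]
    return median(errors)
```

Calling `median_error_km(estimated_points, true_points)` on two equally long lists of coordinates yields a single figure that can be compared across methods.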

  5. Methodology Highlights • Our methodology is based on two insights • Websites often provide the actual geographical location of associated entities • E.g., universities, businesses, government offices, etc. • We develop methods to determine whether the web or e-mail servers reside at the corresponding locations • Relative network delays correlate highly with geographical distances • Absolute network delay measurements are fundamentally limited in their ability to achieve fine-grained geolocation results

  6. Institutional Network Example [Figure labels: Web cloud-sourcing; mail server; web server; router; IP subnet; to external network; postal address: 550 South Hill Street, Suite 890, Los Angeles, CA 90013]

  7. The Role of Relative Network Delays [Figure: measured delays, shown in increasing order]

  8. A Case Study • Target IP address: 38.100.25.196 • Target postal address: 1850 K Street NW, Washington, DC 20006

  9. Three-Tier Geolocation System: Tier 1 • Goal: Find the coarse-grained region for the targeted IP • Convert measured delays to geographical distances • Create the intersection
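The Tier 1 step can be sketched as follows: each vantage point's round-trip time gives an upper bound on its distance to the target, and the target must lie inside every resulting circle. This is only an illustration under stated assumptions. The conversion factor (two thirds of the speed of light, a common figure for signal propagation in fiber) is an assumption and is not given on the slide, and all function names are hypothetical.

```python
import math

C_KM_PER_MS = 299.792458   # speed of light in vacuum, km per millisecond
SPEED_FACTOR = 2.0 / 3.0   # ASSUMPTION: propagation speed as a fraction of c
EARTH_RADIUS_KM = 6371.0


def max_distance_km(rtt_ms, factor=SPEED_FACTOR):
    """Upper bound on distance implied by a round-trip time (one-way = rtt / 2)."""
    return (rtt_ms / 2.0) * C_KM_PER_MS * factor


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def in_intersection(point, measurements):
    """True if `point` (lat, lon) lies within every vantage point's distance bound.

    `measurements` is a list of ((lat, lon), rtt_ms) pairs, one per vantage point.
    """
    return all(
        haversine_km(*point, *vantage) <= max_distance_km(rtt_ms)
        for vantage, rtt_ms in measurements
    )
```

Testing candidate points against `in_intersection` gives a membership check for the coarse region; a real system would compute or approximate the region's geometry rather than probe individual points.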

  10. Three-Tier Geolocation System: Tier 2 • Goal: Use passive landmarks to determine a finer-grained region for the targeted IP • Populate the intersection with landmarks • Estimate the delay between landmarks and the target • D1 + D2 < D3 + D4 • Create a new intersection
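One way to read the D1 + D2 comparison is sketched below. It assumes that each vantage point has traceroute-style paths to both the landmark and the target, that the two paths share at least one router, and that the landmark-to-target delay is estimated as the sum of the delays from the last shared router to each endpoint, keeping the smallest sum over all vantage points. The path representation, hop names, and function names are hypothetical and are not taken from the slides.

```python
def last_common_hop(path_a, path_b):
    """Return (i, j), the positions of the last router on path_a that also
    appears on path_b, or None if the paths share no router.

    Each path is a list of (hop_id, rtt_ms) pairs ending at the destination.
    """
    position_in_b = {hop: j for j, (hop, _) in enumerate(path_b)}
    best = None
    for i, (hop, _) in enumerate(path_a):
        if hop in position_in_b:
            best = (i, position_in_b[hop])
    return best


def relative_delay_ms(path_to_landmark, path_to_target):
    """Estimate D1 + D2 via the last router shared by both paths."""
    common = last_common_hop(path_to_landmark, path_to_target)
    if common is None:
        return None
    i, j = common
    d1 = path_to_landmark[-1][1] - path_to_landmark[i][1]  # router -> landmark
    d2 = path_to_target[-1][1] - path_to_target[j][1]      # router -> target
    # Noisy measurements can make a difference negative; clamp at zero.
    return max(d1, 0.0) + max(d2, 0.0)


def best_estimate_ms(path_pairs):
    """Smallest D1 + D2 over all vantage points, or None if none share a router."""
    estimates = [relative_delay_ms(lm, tg) for lm, tg in path_pairs]
    estimates = [e for e in estimates if e is not None]
    return min(estimates) if estimates else None
```

Taking the minimum reflects the inequality on the slide: a vantage point whose shared router is closer to both endpoints yields a smaller, tighter sum.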

  11. Three-Tier Geolocation System: Tier 3 • Goal: Geolocate the target IP using passive landmarks • Select the landmark with the minimum delay to the target, and associate the target's location with it • Measured distance ∝ Geographical distance • 10.6 km vs. 0.103 km
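The final selection step amounts to taking the landmark with the smallest estimated delay and reporting its location for the target. A minimal sketch, assuming each landmark already carries a delay estimate from the previous tier; the `Landmark` record and its fields are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Landmark:
    address: str      # postal address mined from the Web
    lat: float
    lon: float
    delay_ms: float   # estimated relative delay to the target


def closest_landmark(landmarks):
    """Return the landmark with the minimum estimated delay to the target."""
    if not landmarks:
        raise ValueError("no landmarks available")
    return min(landmarks, key=lambda lm: lm.delay_ms)


def geolocate(landmarks):
    """Associate the target with the location of its closest landmark."""
    best = closest_landmark(landmarks)
    return (best.lat, best.lon)
```

Only the ordering of the delays matters here, which is why relative rather than absolute delay values are sufficient for this step.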

  12. Remaining Issues • Verifying landmarks • Sweep out most of the erroneous landmarks • Errors are still possible! • Resilience to errors • The larger the error, the more resilient our method is • We prove that the likelihood that an erroneous landmark will affect the accuracy is small

  13. Evaluation • Three datasets • PlanetLab dataset (Academic) • Collected dataset (Residential) • Online Maps dataset (In the wild) • Factors that impact accuracy • Landmark density • Population density • Access networks

  14. Dataset Characteristics [Figure panels: Urban areas; Rural areas] • The three datasets cover both urban and rural areas.

  15. Baseline Results

  16. Landmark Density • Density sequence: PlanetLab > Residential > Online Maps • The more landmarks we can discover in the vicinity of a target, the more likely we are to geolocate the targeted IP accurately.

  17. The Role of Population Density • The error distance is smallest in densely populated areas • The error grows as the population density decreases • [Figure annotation: Middle of "nowhere"]

  18. The Role of Access Networks [Figure annotations: 2 km; 700 meters] • Cable access networks (Comcast) have a much larger latency variance than DSL networks (AT&T and Verizon)

  19. Conclusions • A geolocation system able to geolocate IP addresses with more than an order of magnitude better precision than the best previous method • Our methodology consists of two components • Mining landmarks from the Web and using Web or E-mail servers as landmarks • Using relative network distances as opposed to absolute network distances

  20. Thank You http://networks.cs.northwestern.edu
