Understand the workings and limitations of IP geolocation technology. The presentation covers how IP geolocation providers use various publicly available data to identify the location of an IP address. It also covers various limitations of the data sources and comparisons.
IP Geolocation Demystified: Understanding IP Geolocation Technology
Let's start with.. What is an IP Address?

An Internet Protocol address (IP address) is a numerical label assigned to each device connected to a computer network. This numerical label is used to identify these devices, allowing for direct communication.

The public internet operates on the same principles. When a device connects to the internet, it utilises a globally unique IP address to ensure both inbound and outbound communication is delivered correctly.

In this context, the IP address acts in a similar way to a postal address used to deliver conventional mail. However, unlike a postal address, an IP address does not have an intrinsic location and does not expose any geographical properties. This is why you cannot determine the location of a device by its IP address alone.
So.. What is IP Geolocation?

IP Geolocation is an essential technology that overcomes this limitation to help organisations identify the location of their customers based on their IP addresses.

Organisations such as online service operators, financial institutions, search engines, ad agencies and any business offering an online shopping/e-commerce experience are able to provide their customers with the best products and services available in their region with IP geolocation technology.

IP Geolocation is also crucial for preventing online fraud, managing digital rights, and serving targeted marketing material and pricing.
But.. How accurate is IP Geolocation?

If you wonder where your online customers are coming from, or wish to customise your clients' online experience based on their location, you are likely familiar with various commercial IP Geolocation services, ranging from free to highly priced to enterprise-only. Most of these providers declare superior accuracy, yet show little transparency about their methodology and present scarce evidence to support their claims.

In general, validating the accuracy of an IP Geolocation service is challenging and requires a large pool of ground-truth data (i.e. vast numbers of IP addresses from known locations). This data would have to be collected from all active ISPs/ASs, be random, and be spread over various geographical regions. In reality, such data is generally not available, in which case any claimed IP Geolocation accuracy without full transparency is questionable.

For in-depth understanding, check our blog post: How accurate can IP Geolocation get?
WHAT IS THE ULTIMATE DATA SOURCE? For IP geolocation technology
Let's understand.. How IP addresses are distributed.

The IPv4 protocol uses 32-bit addresses, which limits the maximum theoretical address space to 4,294,967,296 (2^32) IP addresses. IPv6, the next-generation protocol, utilises 128-bit addresses, which makes the pool considerably larger, but still limited.

Due to the global uniqueness requirement of IP addresses across both protocols, the global IP address space allocation is heavily regulated.

IANA: 'The Internet Assigned Numbers Authority is a function of ICANN, a nonprofit private American corporation that oversees global IP address allocation, autonomous system number allocation, root zone management in the Domain Name System, media types, and other Internet Protocol-related symbols and internet numbers.' (source: Wikipedia).

Link: https://www.iana.org/
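The address-space arithmetic above can be checked with a minimal sketch using Python's standard `ipaddress` module. The variable names are illustrative only.

```python
import ipaddress

# The whole IPv4 space is the network 0.0.0.0/0; the whole IPv6 space is ::/0.
ipv4_total = ipaddress.ip_network("0.0.0.0/0").num_addresses
ipv6_total = ipaddress.ip_network("::/0").num_addresses

print(f"IPv4: {ipv4_total:,} addresses (2^32)")
print(f"IPv6: {ipv6_total:,} addresses (2^128)")
```

Both pools are finite, which is why allocation has to be coordinated centrally.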
More about.. IANA (Internet Assigned Numbers Authority)

IANA is responsible for the allocation of large IP address space blocks to the Regional Internet Registries (RIRs):

- AFRINIC for the Africa Region
- APNIC for the Asia/Pacific Region
- ARIN for Canada, USA, and some Caribbean Islands
- LACNIC for Latin America and some Caribbean Islands
- RIPE NCC for Europe, the Middle East, and Central Asia

RIRs, in turn, delegate a portion of their allocated address space to Local Internet Registries (LIRs), e.g. APNIC delegates to the Japan Network Information Center (JPNIC).

All registries, both regional and local, allocate their remaining available address space to organisations seeking to utilise it on the public internet.
Illustration of IP address distribution
Let's talk about.. Autonomous Systems (AS)

Business entities (or autonomous networks) that are assigned IP address space for their own use are called Autonomous Systems (AS). They must first register as an AS, receiving a globally unique Autonomous System Number (ASN) which can then be used to identify them.

The Internet Service Provider (ISP) is the most typical example of an AS operator, but it is not the only one. Virtually any organisation seeking to use its own IP addresses on the internet qualifies as an AS.

AS entities commonly use their allocated IP space in any manner they wish and, more importantly, in any geographical location they like. They can allocate it to any AS entity/network within the same enterprise regardless of global location, or even sublease it to a completely unrelated, geographically remote entity. Despite existing regulations, there is no way to restrict allocated IP address space geographically.
The Ultimate Data Source?

Therefore, the only ultimately accurate IP Geolocation data is that which is made available by AS operators, who are the only ones who confidently know how and where their IP addresses are utilised. AS operators, however, are not obliged to share their internal data with any other entity, except for law enforcement agencies within the determined jurisdiction boundaries.

Existing commercial IP Geolocation service providers do not have access to AS internal data. Some of these service providers claim they have integrated services with ISPs or receive data directly from ISPs. Considering there are more than 80,000 registered ASs, of which more than 60,000 are active at any one time (see the active ASNs ranked list), it is largely impractical to form commercial relationships with all of them.

Receiving data from a small number of local ISPs may improve regional geolocation accuracy to a minor extent but is not sufficient on a global scale.
WHERE DO IP GEOLOCATION SERVICE PROVIDERS GET THEIR DATA?

Assuming that the existing IP Geolocation services do not have access to the Autonomous Systems' internal data, they cannot be confident regarding the actual geographical location of the routable IP addresses. So, where are they getting their geolocation data from?
IP Geolocation Data Sources

1. WhoIs Data: The WhoIs database is populated by Regional and Local Internet Registry organisations (RIR/LIR) that are obligated to keep their registration records public.
2. BGP Data: The Border Gateway Protocol (BGP) is a global internet address routing directory.
3. Field evidence: There are many additional data sources that can be utilised for IP geolocation which qualify as field evidence data, e.g. data received from a user using a GPS-enabled device.
4. Scientific data: This is scientifically derived data from calculations such as time-delay to distance conversions and others.
5. Reverse DNS: The method is based on DNS records (textual names of public internet addresses).
WhoIs Data: What is WhoIs Data?

WhoIs is by far the most common source of geolocation data. The WhoIs database is populated by Regional and Local Internet Registry organisations (RIR/LIR) that are obligated to keep their registration records public.

This information discloses all IP addresses registered for each entity they belong to, including independent organisations or ISPs. IP Geolocation service vendors can obtain this registry data using RIR websites and APIs or can request bulk access to the data.

Example site:
WhoIs Data: What data is available?

This data is usually updated on a daily basis and includes a set of registration data. This registration data contains the IP address block records and the organisations they are registered under.

It may additionally contain a street address or the network location coordinates, although none of the geographical properties is mandatory.

Furthermore, these records are maintained by the registered party and are not validated by any external body. This means the accuracy of the data is questionable even when it is made available.

[Screenshot of example WhoIs data from ARIN's website.]
WhoIs Data: How accurate is WhoIs Data?

There are around 10 million records in the global WhoIs database for IPv4 alone, some of which can serve as a very accurate IP Geolocation source. Consider, for example, a small internet café with a static IP address (or a small range of addresses) used on-premises and recorded in the RIR database together with its physical address. This scenario exposes accurate geolocation information with a precision down to a street address.

In most cases, however, an organisation reports incorrect or outdated information, or outsources the registered address blocks to another party, and the records will not reveal the IP usage location.

Therefore, IP Geolocation based on the WhoIs database alone is largely inaccurate as a whole.
BGP Data: What is BGP Data?

The Border Gateway Protocol (BGP) is a global internet address routing directory. It is a standardised exterior gateway protocol used to exchange routing information amongst active Autonomous Systems (AS) on the internet. BGP involves the announcement of preferred pathways and direction of internet address blocks (prefixes).

For instance, suppose I need to send a packet to destination 'A', but I only know host 'C' and can forward traffic to it. The packet will still reach the desired destination 'A' if 'C' knows 'A' either directly or via other intermediate peers. In a nutshell, this is how global internet connectivity works.

When an AS entity wishes to use an IP address range on the public internet, it has to 'announce' it to its closest peers. In simple words, it sends an announcement that means: "I'm responsible for that range (prefix), so whoever wishes to communicate with a device in that range, direct the communication through me".

This announcement eventually propagates across all other peers worldwide to inform them how to send traffic to that IP address range if required.
BGP Data: How is BGP Data used for IP geolocation?

Now, how can this be helpful for IP Geolocation? Firstly, unlike the WhoIs data, which shows the organisation registered against a particular IP address block, BGP data can reveal who is actually using it. This is not always the same enterprise entity, as we discussed above.

If, for example, we witness a block registered with ARIN for an American company with a US street address, but it is being used by an AS registered with RIPE in Turkey, this suggests that the IP block is likely being used in Turkey, which improves geolocation.

Secondly, the BGP data can also reveal which addresses are not used at all: unannounced space, for which a geolocation process should not even be attempted.
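The two uses of BGP data described above can be sketched as follows. This is a toy illustration, not any provider's actual method: the announcement table, the AS numbers (taken from the range reserved for documentation), the country codes and the function names are all hypothetical. When announcements overlap, the sketch picks the most specific one, which mirrors how routers choose among overlapping prefixes.

```python
import ipaddress

# Hypothetical announced prefixes -> (origin ASN, country where the AS is registered).
ANNOUNCED = {
    "198.51.100.0/24": (64496, "US"),
    "198.51.100.128/25": (64511, "TR"),
}

def origin_for(address):
    """Return (asn, country) of the most specific announcement covering address, or None."""
    ip = ipaddress.ip_address(address)
    matches = [
        (ipaddress.ip_network(prefix), info)
        for prefix, info in ANNOUNCED.items()
        if ip in ipaddress.ip_network(prefix)
    ]
    if not matches:
        return None  # unannounced space: geolocation should not be attempted
    return max(matches, key=lambda m: m[0].prefixlen)[1]

def country_hint(address, whois_country):
    """Prefer the origin AS country and flag disagreement with the WhoIs record."""
    origin = origin_for(address)
    if origin is None:
        return None
    _, as_country = origin
    return {"country": as_country, "whois_mismatch": as_country != whois_country}

print(country_hint("198.51.100.200", "US"))  # announced by the AS registered in TR
print(country_hint("203.0.113.9", "US"))     # not announced at all
```

The hint is only a suggestion: an AS registered in one country may still deploy the addresses elsewhere.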
BGP Data: Coverage of an IP geolocation service?

The IP address is not a physical object in a physical location. It is simply a numerical label that can be allocated to and unallocated from individual devices or networks. There is no way we can geolocate a label that is not in use (allocated). Therefore, when your IP Geolocation service provider states it can geolocate 100% of the address space, please interpret this with caution, as it can only geolocate the announced (routable) space at most. The routable space for IPv4 can be monitored in the IPv4 Address Space Report.

Some other usages of BGP data rely on the assumption that IP addresses belonging to certain prefixes are meant to share geographical proximity. This, however, does not always hold true. Prefixes tend to aggregate along the way and may include a cluster of several smaller prefixes that originate from different regions.
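The aggregation effect mentioned above can be illustrated with a small sketch using Python's standard `ipaddress` module. The four /24 blocks (taken from private address space) and their regions are made up for illustration.

```python
import ipaddress

# Hypothetical /24 blocks and the regions they are actually used in.
blocks = {
    "10.20.4.0/24": "New York",
    "10.20.5.0/24": "New York",
    "10.20.6.0/24": "Toronto",
    "10.20.7.0/24": "Austin",
}

# Contiguous blocks collapse into a single, larger prefix.
aggregated = list(
    ipaddress.collapse_addresses(ipaddress.ip_network(b) for b in blocks)
)
regions = sorted(set(blocks.values()))

print([str(n) for n in aggregated])  # the single aggregated prefix
print(regions)                       # the regions hidden inside it
```

Anyone who only sees the aggregated prefix cannot tell that it spans several regions.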
Field evidence data: What is Field Evidence Data?

There are many additional data sources that can be utilised for IP geolocation which qualify as field evidence data. The best example is data received directly from users or submitted using GPS-enabled devices, such as mobile phones or tablets. This data can reveal the alleged geographical coordinates of a device using a public IP address and can serve as empirical evidence or ground-truth data for that particular IP address at that particular moment in time.

Other sources include:

- eCommerce-originated data sources/feeds, such as the billing/shipping address of the customer when combined with the IP address used for the transaction;
- IoT devices with known locations and IP addresses, and device pools, either publicly available or proprietary, for example the RIPE ATLAS project; and
- voluntarily or commercially obtained geolocation data feeds, such as Self-published IP Geolocation Data.
Field evidence data: Limitations of Field Evidence Data.

There are 2 important principles associated with field evidence IP Geolocation data:

- The data is always limited, as it is impractical for one entity to access all internet-connected devices around the world.
- This method identifies IP location at a specific point in time only, and is prone to errors. Not everything can be trusted as pure and reliable evidence. Device misconfiguration or faults, and network redirections along the way such as VPNs or proxies, are some of the many scenarios that can introduce inaccuracy during the data collection process.
Scientific data: What is Scientific Data?

Over the years, many attempts have been made to introduce an additional active measurement approach to IP Geolocation solutions. Most of these approaches come from research on time-delay to distance conversions, such as triangulation, down to the closest point of presence (POP) of network interfaces (routers).

However, global network traffic interfaces (public routers) are complex, and the assumption that the time delay between two consecutive interfaces is proportional to the physical distance between them is incorrect.
Scientific data: Limitations of Scientific Data.

- Some large ISPs make their internal subnets hidden. Therefore, many intermediate nodes are not publicly visible and cannot be accounted for.
- Practical network considerations are based on 'least cost' routing, which is different from the common academic assumption of the shortest route.
- Due to Quality of Service (QoS) considerations, some network interfaces can also be programmed to artificially delay non-productive traffic.

Therefore, the relation between time delay and distance is inconsistent and cannot lay the foundation for overarching principles. To date, none of the methods based on time-delay triangulation theory has been put into service, and none is likely to emerge for global commercial implementation.
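To see why time delay gives so little to work with, here is a minimal sketch of the basic time-delay to distance conversion. It assumes a signal speed of roughly 200,000 km/s in optical fibre (about two thirds of the speed of light in a vacuum); that figure and the function name are assumptions of this sketch, not part of the presentation.

```python
# Assumed propagation speed in optical fibre, in kilometres per second.
FIBRE_SPEED_KM_PER_S = 200_000.0

def max_distance_km(rtt_ms):
    """Upper bound on the one-way distance implied by a round-trip time in ms."""
    one_way_seconds = (rtt_ms / 2.0) / 1000.0
    return one_way_seconds * FIBRE_SPEED_KM_PER_S

for rtt in (1, 10, 50):
    print(f"RTT {rtt} ms -> target is at most about {max_distance_km(rtt):.0f} km away")
```

The result is only an upper bound. Queuing, indirect 'least cost' routes and deliberate QoS delays all add time without adding distance, so the true location can be anywhere inside that radius.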
Reverse DNS data: What is Reverse DNS data?

The Domain Name System (DNS) is the phonebook of the internet. Usually, DNS is used to translate a domain name to an IP address, so that browsers can load internet resources. However, it can also work in reverse order: you can query DNS about what domain name record is attached to an IP address.

This textual record associated with an IP address is not mandatory. It is hardly of any utility when the address is not involved in publishing internet services or consumable material. However, some ISPs may use this textual tagging opportunity to mark their IP addresses for some internal purposes.

Some of the DNS entries can potentially be used to reveal geographical properties. For example, if the target address or the last router along the way is listed in DNS as the entry p1-0-0.sanjose1.br2.bbnplanet.net, it suggests that the IP address is likely located in San Jose, California. This method shows an add-on benefit for locating areas with interpretable DNS names.

The only known commercially utilised scientific approach has been introduced by Digital Envoy, Inc, protected by US patent (6,757,740) granted in 2004. The method is based on DNS records (textual names of public internet addresses) and crawling (tracert) to the closest router in an attempt to identify the city and country of the host.
Reverse DNS data: Limitations of Reverse DNS data?

Unfortunately, the reverse DNS-based approach suffers from several limitations:

1. Many interfaces do not have an assigned DNS name;
2. The misnaming of an interface results in an incorrect location;
3. City names can often be repeated across different countries or territories, e.g. a city named San Jose can be found both in Costa Rica and in California, US;
4. The lack of universally accepted rules and naming regulations means records require manual processing, which is time-consuming and prone to errors.
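The hostname-reading idea and its ambiguity problem can be sketched in a few lines. The hint table and the function name are hypothetical; a real system would need a far larger dictionary and per-operator naming rules. The hostname is the example used earlier in this presentation (for a live address, one could be obtained with Python's `socket.gethostbyaddr`).

```python
import re

# Hypothetical lookup table from hostname tokens to candidate places.
CITY_HINTS = {
    "sanjose": ["San Jose, California, US", "San Jose, Costa Rica"],
    "austin": ["Austin, Texas, US"],
}

def city_candidates(hostname):
    """Return every place whose token appears as a label of the hostname."""
    candidates = []
    for label in hostname.lower().split("."):
        token = re.sub(r"[^a-z]", "", label)  # "sanjose1" -> "sanjose"
        candidates.extend(CITY_HINTS.get(token, []))
    return candidates

print(city_candidates("p1-0-0.sanjose1.br2.bbnplanet.net"))
print(city_candidates("host.example.net"))
```

The first lookup returns two candidates, which is limitation 3 in practice; the second returns nothing, which corresponds to names that carry no interpretable location.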
THE ART OF GUESSING

IP Geolocation service providers can obtain their data from multiple sources, although none can serve as an ultimate and indisputable source of truth. When the data is mutually supportive, i.e. multiple sources indicate the very same location for an IP address, it is a no-brainer. Often, however, the data received is contradictory, and this is where the tricky part lies.

We frequently hear people say that IP Geolocation is part science, part art. Well, here is the art part. The art of guessing! Let's try to see what your average IP Geolocation service provider is dealing with.
Challenges with.. IP Geolocation

1. Imagine we've got field evidence, such as a user-submitted data sample, suggesting that the IP address X.X.X.5 was used today somewhere in Manhattan, in the centre of New York City, NY, US.
2. The WhoIs data for that address reveals that the block X.X.X.0 - X.X.X.255 (to which the above-mentioned address belongs) is registered to a business 'Y' located in Ontario, California, US.
3. The BGP data suggests that the address has been announced by an AS entity 'D', registered as operating from Austin, Texas, US, with a prefix size of /22 (1,024 addresses).

So, where is the actual location?

Can one say that the X.X.X.0 - X.X.X.255 block is located in NY? Or maybe even the entire /22 prefix is in NY too? Maybe X.X.X.5 is the only address in NY and the others are not even close? Or maybe the sample data we've got is wrong and the actual location for all of them is in Ontario, California, or even Texas?

The final conclusion depends on which data source can be trusted the most. Considering there are limited tools to prioritise data sources, existing IP Geolocation service providers often end up guessing.

Their motto: Any guess is a good guess!
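The dilemma above can be made concrete with a small sketch. Everything in it is hypothetical: the private 10.20.4.0/22 range stands in for the X.X.X.* placeholders, and the trust scores are invented to show that the answer depends entirely on which source is trusted most. It is not how any particular provider resolves conflicts.

```python
import ipaddress

# Each piece of evidence applies to a scope (a single address, a block, a prefix).
EVIDENCE = [
    {"source": "field sample", "scope": "10.20.4.5/32", "location": "New York, NY"},
    {"source": "WhoIs", "scope": "10.20.4.0/24", "location": "Ontario, CA"},
    {"source": "BGP origin AS", "scope": "10.20.4.0/22", "location": "Austin, TX"},
]

def best_guess(address, trust):
    """Pick the location from the most trusted source whose scope covers address."""
    ip = ipaddress.ip_address(address)
    applicable = [e for e in EVIDENCE if ip in ipaddress.ip_network(e["scope"])]
    if not applicable:
        return None
    return max(applicable, key=lambda e: trust[e["source"]])["location"]

trust_field_first = {"field sample": 3, "WhoIs": 2, "BGP origin AS": 1}
trust_bgp_first = {"field sample": 1, "WhoIs": 2, "BGP origin AS": 3}

print(ipaddress.ip_network("10.20.4.0/22").num_addresses)  # size of a /22
print(best_guess("10.20.4.5", trust_field_first))
print(best_guess("10.20.4.5", trust_bgp_first))
```

The same address gets a different answer under each trust ordering, and neighbouring addresses such as 10.20.4.200 or 10.20.6.1 fall back to whichever coarser source still covers them.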
Further analysis: Challenges with data sources

If we happen to obtain more evidence data points from nearby address entries, it would likely improve our confidence, but only if the data supports one of the leading guess options.

However, if the data is contradictory, it can make geolocation estimation extremely challenging. What if we have further evidence for address X.X.X.128 from Toronto, Canada, dated just a couple of days before? Has this address moved from Canada to the US recently, or just a part of the block, or are we facing an error somewhere?

This is another complex issue: data granularity. IP addresses are usually deployed in blocks. Larger blocks are better for global routing. If blocks are too small, the world's routing table expands substantially and routers can eventually face memory overflow errors. Therefore, IP Geolocation services can logically assume that some consecutive sequences of IP addresses are likely to share reasonable geographical proximity.
Further analysis: Complexity of reducing errors

However, defining the actual IP address block boundaries can be tricky and often involves a series of educated guesses, which may require intervention from human operators.

For example, one can find similarities in the reverse DNS entries for the block member addresses that possibly suggest the same network. Also, IP addresses can be tracerouted while looking for correlations between the host IP addresses that participate in the packet delivery.

Whichever way is chosen, it is commonly prone to errors:

- DNS entries are not always available or can be wrong.
- Traceroute does not always reveal all the hosts in the delivery path, as some simply do not respond to ICMP requests.
- Perfect host correlation is not always possible, as network routers often use several IP addresses across the interfaces of the same device. They may appear different on a traceroute but in reality are the same device, which may also lead to an error.
SO HOW DO IP GEOLOCATION SERVICE PROVIDERS OPERATE?

Let's understand how various IP geolocation service providers work.
Comparison

Entry Level

Entry-level IP geolocation providers are likely to use fewer data sources, largely using WhoIs data only, which limits their decision scope to far fewer options. This makes the process easier and maybe faster but, as a trade-off, much less accurate.

Advanced Level

The more advanced IP Geolocation providers presumably work hard to organise and improve their results by delegating many of the final decisions to human personnel, in addition to some low-level automated processing. Unfortunately, manual work does not guarantee better results, as humans are also prone to errors, and it definitely makes the process slower. As a result, we often see commercial IP geolocation databases updated on a monthly basis only, or weekly at best.
Conclusion

The IP address space is a very dynamic area. Millions of IP addresses change hands or are reallocated continuously, every hour. Therefore, monthly or weekly updates are certainly not suitable for most IP geolocation applications.

In summary, none of the currently existing methods is sufficiently accurate. Even though a combination of methods allows for more precise estimation of IP location, this does not solve the problem of accuracy on a global scale.

Moreover, the lack of a fully automated and deterministic methodology prevents existing IP geolocation databases from being updated frequently enough to cope with the highly dynamic nature of the internet IP address space.

To find out how BigDataCloud's IP Geolocation service differs from existing providers, check out our detailed blog post: The Next Generation IP Geolocation Service.
Contact us

Reach out if you have any questions or clarifications.

Email address: support@bigdatacloud.com
Website: www.bigdatacloud.com
For more content related to IP geolocation, visit our website.

BigDataCloud Pty Ltd is a highly innovative start-up company founded in 2018 and operated internationally from our headquarters in Adelaide, South Australia. After years of previous experience in e-commerce, fraud protection and targeted international marketing, the BigDataCloud founders identified an immense lack of high-quality, fast and affordable APIs within this and other technical industries.

For more info, visit: www.bigdatacloud.com