Explore the potential of internet robots for official statistics data collection, applications in airline tickets, housing market trends, and clothing prices. Discover the benefits, challenges, and implications of using internet data alongside administrative sources.
On the use of internet robots for official statistics
Olav ten Bosch
MSIS, Dublin, 14-16 April 2014
Overview
• Why internet as a data source (IAD)?
• Internet robots, how do they work?
• Applications:
  • Airline tickets
  • Housing market
  • Clothing
• “Robot assisted data collection”
• Conclusion
Why IAD? (1)
[Diagram labels: Internet sources; Faster, better, more efficient; New indicators; Less!!!; Administrative sources; Tax, social security services; Municipalities/Provinces; Supermarkets; Surveys]
Why IAD? (2)
Internet sources: which content is original, reliable, stable, representative and accessible?
• Internet prices for CPI?
• Real estate sites for housing statistics?
• Internet vacancies for job statistics?
• Social media sentiment for consumer confidence?
• Trade in second-hand goods as economic indicators?
• Travel activity for tourism statistics?
Robots / crawlers / bots / spiders / scrapers: how do they work? (1)
[Diagram labels: You; Commands; Browser; Requests; Internet; Website (code, images, style, data, etc.); Graphical markup]
Robots / crawlers / bots / spiders / scrapers: how do they work? (2)
[Diagram labels: You; Robot/spider/crawler; Navigation; Requests; Internet; Website (code, images, style, data, etc.); Data]
Robots / crawlers / bots / spiders / scrapers: how do they work? (3)
Generic software for:
- site navigation
- product details
- monitoring
[Diagram labels: Agile; Navigation; Requests; Internet; Website (code, images, style, data, etc.); Robot/spider/crawler; Monitor actively; Data]
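The request-and-extract loop these slides describe could be sketched roughly as follows. This is only an illustration using the Python standard library, not the software used in the presentation: the `class="price"` markup, the euro price format, the user-agent string and the function names are all assumptions made for the example.

```python
from html.parser import HTMLParser
from urllib.request import Request, urlopen


class PriceParser(HTMLParser):
    """Collects prices from elements whose class is exactly 'price' (hypothetical markup)."""

    def __init__(self):
        super().__init__()
        self._in_price = False
        self.prices = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs for the opening tag.
        if ("class", "price") in attrs:
            self._in_price = True

    def handle_endtag(self, tag):
        # Any closing tag ends the current price element.
        self._in_price = False

    def handle_data(self, data):
        text = data.strip()
        if self._in_price and text:
            # Assumed format: "€ 12,50" with a decimal comma.
            self.prices.append(float(text.lstrip("€").replace(",", ".")))


def fetch(url, user_agent="example-statistics-robot"):
    """Request a page the way a browser would, but without rendering it."""
    request = Request(url, headers={"User-Agent": user_agent})
    with urlopen(request, timeout=30) as response:
        return response.read().decode("utf-8", errors="replace")


def extract_prices(html):
    """Turn the returned page source into data."""
    parser = PriceParser()
    parser.feed(html)
    return parser.prices


# Offline usage with a hand-written page instead of a live site:
sample_page = (
    '<ul><li><span class="price">€ 12,50</span></li>'
    '<li><span class="price">€ 9,95</span></li></ul>'
)
print(extract_prices(sample_page))
```

In a real robot, `fetch` would supply the page source and the navigation step would decide which URLs to request next; both depend on the site being observed.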
Airline tickets (1): Robot collection versus manual collection
Housing market (2): Dynamics of the ‘database behind’ become visible
Clothing (2): 2 sites: very volatile data
• Challenges:
  • from volatile data to stable statistics
  • how to classify multiple less structured data sources
Seasonal pattern
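One simple way to illustrate the step from volatile data to a more stable series is a trailing rolling median. The slides do not say which method was used; the technique, the function name `rolling_median` and the window length below are assumptions chosen purely for illustration.

```python
from statistics import median


def rolling_median(values, window=7):
    """Median over a trailing window; early positions use the values available so far."""
    smoothed = []
    for i in range(len(values)):
        start = max(0, i - window + 1)
        smoothed.append(median(values[start:i + 1]))
    return smoothed


# Made-up daily prices with a one-day spike:
daily_prices = [10, 10, 50, 10, 10]
print(rolling_median(daily_prices, window=3))
```

A median is less sensitive to short-lived spikes than a mean, but it also smooths away genuine sudden changes, so the window length is a trade-off.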
Robot-assisted data collection (1)
• Use case: few price observations on many sites
• Example: price of a cinema ticket
• “Robot tool” to automatically check if prices have changed
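The core check of such a tool might look like the sketch below: compare the price last recorded for each site with the price the robot observes now, and report only the differences for a human to review. The site names, prices and the function name `changed_prices` are hypothetical.

```python
def changed_prices(previous, current):
    """Return {site: (old_price, new_price)} for sites whose price differs from the stored one."""
    changes = {}
    for site, new_price in current.items():
        old_price = previous.get(site)
        # Sites without a stored price are new observations, not changes.
        if old_price is not None and new_price != old_price:
            changes[site] = (old_price, new_price)
    return changes


# Hypothetical cinema ticket prices from the last round and from today's run:
stored = {"cinema-a": 9.5, "cinema-b": 10.0}
observed = {"cinema-a": 9.5, "cinema-b": 11.0, "cinema-c": 8.0}
print(changed_prices(stored, observed))
```

With this design the robot only flags candidates; the decision to accept a new price stays with the data collector.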
Conclusion
• Using the internet as a data source, we can measure statistical phenomena in a completely different way
• It is powerful to combine fast internet data with reliable (but slower) administrative data
• We should redesign statistics with the possibilities of internet data in mind
Challenges:
• Legal framework
• The internet changes continuously: how to turn volatile data sources into reliable statistics?
• We need advanced statistical methods, processes and IT