Big Data vs. Official Statistics

Directors General of the National Statistical Institutes Meeting
25~27 September 2013 / The Hague, Netherlands

Yugyung Kang
Director, Statistical Information Portal Division
Statistics Korea


Contents
  • Technology Assessment (TA) in Korea
  • Big Data Use in Private Sector
      • Market Analysis
      • Suicide Warning System
  • On-going Projects by KOSTAT
      • Pilot Project for Mining and Manufacturing Survey
      • E-household Account System
      • Pilot Project for Price Statistics
  • Future Challenges

Technology Assessment (1)

…Conducted by MSIP of Korea in 2012, under Article 14 of the Framework Act on Science and Technology

  • What is big data?
    • Data with 3Vs characteristics + Data Management Technology
    • * Gartner’s 3Vs : Volume, Variety and Velocity

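[Figure: the 3Vs of big data. Volume: GB/TB, PB, EB, ZB. Velocity: low speed (hours to weeks) to high speed (minutes to seconds). Variety: structured data and unstructured data. Example data types shown: messages, video, music, customer data, sales data, stock data, finance data, BBS, SNS, GPS, …]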

Technology Assessment (2)
  • Expected Impact


Technology Assessment (3)

Policy Recommendations

  • Localize Core Technologies related to big data through gov’t-led R&D
  • Establish Legal and Institutional Basis for standardization of managing, sharing and trading big data
  • Foster pool of Big Data Analysts and Experts through interdisciplinary undergraduate and graduate programs
  • Take a Step-By-Step Approach by Setting Priorities in the sectors where benefits to the public will be visible
  • Make Strategies to Protect Privacy


Big Data Use in Private Sector

Case 1 : Market Analysis by

Which Business would you like to open?

Big Data Use in Private Sector

Case 1 : Market Analysis by

[Diagram labels: Real Estate 411, Business Cycle, Real Estate, Sales Information, Consumer Type, Korean Statistical Information Service, Floating Population]

Big Data Use in Private Sector

Case 2 : Suicide Warning System

Why not Suicide forecast?

Weather Forecast

  • social factors
  • weather factors
  • Werther Effect
  • personal emotion

OECD (2012), OECD Health Statistics

Big Data Use in Private Sector

Case 2 : Suicide Warning System

  • Training Set (2008-2009) & Test Set (2010)
    • Total number of suicide incidents
    • Economic and weather data
      • CPI, unemployment rate, KOSPI (Korea Composite Stock Price Index), daylight hours and temperature
    • 150 million posts from about 5 million blogs on NAVER (incl. SNS posts)
      • Var1 (# of posts including “suicide”)
      • Var2 (# of posts including “dysphoria”, “be tired”, “be painful”, or “be exhausted”)
  • Model
    • Dependent Variable : Number of suicides in a given period (3 days)
    • Independent Variables
      • CPI, unemployment rate, KOSPI, daylight hours, temperature
      • Two variables obtained from the posts
      • Celebrity suicide (control variable)
      • Number of suicides in the previous period
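
The model structure listed above (a count per period explained by covariates plus the previous period's count) can be sketched as an ordinary least-squares regression with a lagged dependent variable. This is only an illustration: the slides do not state the functional form or estimator actually used, so the linear form is an assumption. The feature set is reduced to two covariates (a post count and a celebrity-event flag), all numbers are synthetic, and the helper names `build_design`, `fit_ols` and `solve_linear_system` are hypothetical.

```python
def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Work on an augmented copy so the inputs are not modified.
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def fit_ols(rows, y):
    """Ordinary least squares via the normal equations (X'X) beta = X'y."""
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * t for r, t in zip(rows, y)) for i in range(k)]
    return solve_linear_system(xtx, xty)


def build_design(counts, features):
    """Build rows [1, covariates of period t..., count of period t-1]."""
    rows, targets = [], []
    for t in range(1, len(counts)):
        rows.append([1.0] + [float(v) for v in features[t]] + [float(counts[t - 1])])
        targets.append(float(counts[t]))
    return rows, targets


# Synthetic covariates per 3-day period: (posts mentioning the keyword, celebrity flag).
features = [(100, 0), (120, 0), (90, 1), (150, 0),
            (110, 0), (130, 1), (95, 0), (140, 0)]

# Assumed "true" coefficients used only to generate noise-free synthetic data:
# intercept, post count, celebrity flag, previous-period value.
true_beta = [10.0, 0.1, 8.0, 0.5]
counts = [40.0]
for t in range(1, len(features)):
    var1, flag = features[t]
    counts.append(true_beta[0] + true_beta[1] * var1
                  + true_beta[2] * flag + true_beta[3] * counts[-1])

rows, targets = build_design(counts, features)
beta = fit_ols(rows, targets)
print([round(b, 4) for b in beta])
```

With real data the coefficients would be estimated on the 2008-2009 training periods and the fitted model evaluated on the 2010 test periods.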

What should NSOs do?

Scientifically collected data vs. huge amount of data

Sample Surveys

  • Established theoretical basis
  • Representativeness of target population
  • Relatively slow
  • Expensive data collection

Big Data

  • Quantity beats quality
  • Lack of representativeness of target population
  • More timely
  • Data already there

Challenge!
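
The trade-off between a large but non-representative data source and a small random sample can be illustrated with a short simulation. Everything here is made up for illustration: the population, the 30% share of units that leave online traces, and the group means are assumptions, not figures from the presentation.

```python
import random
from statistics import mean

rng = random.Random(2013)  # fixed seed so the run is repeatable

# Hypothetical population: about 30% of units leave online traces,
# and those units differ systematically from the rest.
population = []
for _ in range(50_000):
    online = rng.random() < 0.3
    value = rng.gauss(60.0 if online else 40.0, 10.0)
    population.append((online, value))

true_mean = mean(v for _, v in population)

# "Big data": every unit with an online trace, and nobody else.
big_data_values = [v for online, v in population if online]
big_data_estimate = mean(big_data_values)

# Sample survey: a simple random sample of 1,000 units.
survey_sample = rng.sample(population, 1_000)
survey_estimate = mean(v for _, v in survey_sample)

big_data_error = abs(big_data_estimate - true_mean)
survey_error = abs(survey_estimate - true_mean)

print(f"true mean          : {true_mean:.2f}")
print(f"big data (n={len(big_data_values)}) : error {big_data_error:.2f}")
print(f"survey   (n={len(survey_sample)})  : error {survey_error:.2f}")
```

In this setup the larger source has the larger error, because its error comes from who is covered rather than from how many units are observed.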

KOSTAT tried…

Seminars

  • October 2012~March 2013
  • Organizes seminars once or twice a month inviting outside big data experts
  • Aims to raise awareness of big data and its impact on producing official statistics

Pilot Project

  • December 2012~April 2013
  • A pilot project on the use of big data in the process of editing existing national statistics
  • Using media data for examining outliers when producing the Index of Industrial Production (IIP)

KOSTAT is doing…

1. E-Diary System (Household Account System)

  • Currently, about 48.5% of sample households have adopted the e-Diary system
  • Respondents can import their expenditure information through online transactions from banks, credit card companies and major retail stores.

Using big data for the convenience of respondents

KOSTAT is doing…

2. Pilot Project of Price Index

“Please select specific domains (or items) that can clearly show the difference between big data and existing statistics, e.g. TVs or electronic products.” (Prof. Roberto Rigobon)

KOSTAT is currently preparing a pilot project on compiling a price index using big data for a specific manufactured product.
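
As a rough idea of what compiling a price index from collected prices involves, the sketch below computes a Jevons index (the geometric mean of price relatives) over items priced in both periods. The slides do not say which index formula or data source KOSTAT would use, so the formula choice, the function name `jevons_index`, and the TV prices are all assumptions made for illustration.

```python
import math


def jevons_index(base_prices, current_prices):
    """Geometric mean of price relatives for items priced in both periods, times 100."""
    common = sorted(set(base_prices) & set(current_prices))
    if not common:
        raise ValueError("no items are priced in both periods")
    log_sum = sum(math.log(current_prices[k] / base_prices[k]) for k in common)
    return 100.0 * math.exp(log_sum / len(common))


# Made-up prices for hypothetical TV models in two periods.
base = {"tv_a": 1000.0, "tv_b": 800.0, "tv_c": 500.0}
current = {"tv_a": 900.0, "tv_b": 800.0, "tv_d": 650.0}

# Only tv_a and tv_b appear in both periods, so only they enter the index.
index = jevons_index(base, current)
print(round(index, 2))
```

Matching items across periods is the simplest way to handle product turnover; models that appear or disappear (tv_c and tv_d here) are ignored, which is one of the limitations a pilot project would need to examine for fast-changing products.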

Future Challenges

Can we ignore big data just because of its representativeness issue, in spite of strengths such as timeliness?

Can KOSTAT prevent over 380 statistical agencies from producing official statistics with big data?

Maybe Not!

We will have to make use of big data in producing statistics at some point in the future, as was the case with the transition from survey data to administrative data.

We need to identify the limitations of big data through pilot projects, and to learn the techniques and know-how for refining big-data-based statistics into official statistics.
