Two paradigms for official statistics production

Boris Lorenc, Jakob Engdahl and Klas Blomqvist

Statistics Sweden


Preliminaries

  • The talk concerns data and knowledge about the external world, not data and knowledge about producing statistics (though it may have consequences for the latter)

  • Inspired by various discussions of ongoing developments and initiatives within (official) statistics

  • May have certain relevance for editing

  • Naturally, the views presented herein are those of the authors, not necessarily reflecting policies of Statistics Sweden


Preliminaries (cont’d)

  • Transition from (many) Stovepipes to (few) Integrated System(s)

  • Among intended goals

    • better integration of administrative data and survey data,

    • better/faster response to new or changing user needs

  • What an integrated system should look like in order to satisfy these requirements

    • answer sought in the field of knowledge systems/cognitive systems


Agenda

  • Preliminaries

  • On some distinctions and results regarding knowledge/cognitive systems

  • Consequences for representing data in Integrated systems for statistics production

  • Further considerations for statistics methodology, including some thoughts regarding editing


Knowledge/Cognitive Systems

  • Computational

    • symbolic

      • first-order predicate logic

      • other formal logic

      • etc

    • subsymbolic

      • artificial neural networks (ANNs)

      • etc

  • Other (noncomputational)

    • embodied cognition

    • situated cognition

    • socially distributed cognition

    • etc

Good for restricted domains with clear rules (e.g. chess), less good for open-world problems


Database developments

  • Relational Model

    • RDBMS (Relational Database Management System)

      • implements first-order predicate logic

      • database schema: theory in predicate calculus

  • NoSQL

    • schema-less (theory-less)

    • examples

      • Google’s Bigtable

      • solutions underlying some functions on Amazon, Twitter, and Facebook

  • Perhaps related: Semantic Web

    • how to structure documents into a “web of data”

    • “a web of data that can be processed directly and indirectly by machines”

      • uses Resource Description Framework (rather than RDBMS)
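
The contrast between a schema-bound store and a schema-less one can be illustrated with a minimal sketch. Everything named here is hypothetical and not taken from the talk: the `enterprise` table, its columns, and the sample records. Python's built-in `sqlite3` stands in for an RDBMS, and a plain list of dictionaries stands in for a NoSQL document store.

```python
import sqlite3

# Relational flavour: the schema is fixed up front (hypothetical table and columns).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE enterprise (id INTEGER PRIMARY KEY, name TEXT NOT NULL, turnover REAL)"
)
conn.execute("INSERT INTO enterprise (id, name, turnover) VALUES (1, 'A Ltd', 120.0)")

# A record carrying an attribute the schema does not know about is rejected.
try:
    conn.execute(
        "INSERT INTO enterprise (id, name, employees) VALUES (2, 'B Ltd', 15)"
    )
    schema_rejected = False
except sqlite3.OperationalError:
    schema_rejected = True

# Schema-less flavour: each record may carry its own set of attributes.
documents = [
    {"id": 1, "name": "A Ltd", "turnover": 120.0},
    {"id": 2, "name": "B Ltd", "employees": 15},  # new attribute, no schema change
]

# Queries then have to cope with attributes that are absent from some records.
with_turnover = [d["name"] for d in documents if "turnover" in d]
row_count = conn.execute("SELECT COUNT(*) FROM enterprise").fetchone()[0]
```

The relational store refuses the unforeseen attribute until the schema is changed, whereas the schema-less store accepts it but shifts the burden of handling missing attributes to query time.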


Consequences

  • Paradigm I: Stovepipe + RDBMS

    • ‘manual’ management of a fairly restricted domain

    • single-purpose use

    • likely requires expert assistance to users in search and requirements specification

  • Paradigm II: Integrated system + NoSQL

    • automatic building of world knowledge pertaining to the domain

    • multi-purpose use

    • likely empowers users to explore the available data themselves and to consider the merits of requiring new data


Sampling theory considerations

  • In the context of Paradigm II:

    • use of weights

      • what should they then reflect:

        • inclusion probabilities (if known)?

        • nonresponse information (including an assumed model)?

        • auxiliary information pertaining to specific variables to be estimated?

    • use of models

    • memorylessness vs. Bayesian statistics
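
To make the question about weights concrete, the following is a minimal sketch of the first two options listed above: a weighted total in which each weight reflects the inclusion probability and, optionally, an estimated response probability from an assumed nonresponse model. The function name `weighted_total`, the argument names, and the sample figures are hypothetical. The third option (auxiliary information, for example calibration) is not shown.

```python
def weighted_total(values, inclusion_probs, response_probs=None):
    """Estimate a population total from observed sample values.

    Each unit gets the design weight 1 / inclusion probability
    (the Horvitz-Thompson form). If estimated response probabilities
    are supplied, the weight is additionally divided by them.
    """
    if response_probs is None:
        response_probs = [1.0] * len(values)
    total = 0.0
    for y, pi, rho in zip(values, inclusion_probs, response_probs):
        weight = 1.0 / (pi * rho)
        total += weight * y
    return total


# Hypothetical respondents: observed values and known inclusion probabilities.
y = [10.0, 20.0, 30.0]
pi = [0.5, 0.25, 0.5]

# Weights reflect the sampling design only.
design_only = weighted_total(y, pi)

# Weights also reflect an assumed nonresponse model (hypothetical propensities).
adjusted = weighted_total(y, pi, response_probs=[0.8, 0.5, 1.0])
```

With these figures the design-only estimate is about 160 and the nonresponse-adjusted estimate about 245, which shows how strongly the choice of what the weights reflect can move the estimate.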


Editing

  • Editing for a purpose vs. editing “without a purpose”

    • adherence to general specifications (‘concept validity’)

    • self-learning (unsupervised) tools from computer science/ANN

    • model congruence (especially building automatic models using methods from the KDD (Knowledge Discovery and Data Mining) field)

    • more?
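
As a rough illustration of editing "without a purpose", the sketch below flags values that deviate strongly from the bulk of the data without reference to any particular estimate. It is not an ANN or a KDD method; a simple robust z-score based on the median and the median absolute deviation (MAD) is swapped in as a stand-in. The function name `flag_outliers`, the threshold of 3.5 (a commonly used convention), and the sample values are assumptions made for illustration.

```python
from statistics import median


def flag_outliers(values, threshold=3.5):
    """Return indices of values whose robust z-score exceeds the threshold.

    Uses the median and the median absolute deviation (MAD); the factor
    0.6745 scales the MAD to be comparable with a standard deviation
    under normality.
    """
    med = median(values)
    mad = median(abs(v - med) for v in values)
    if mad == 0:
        return []  # no spread to measure against, so flag nothing
    return [
        i for i, v in enumerate(values)
        if abs(0.6745 * (v - med) / mad) > threshold
    ]


# Hypothetical reported values; the fifth looks like a unit error.
reported = [102.0, 98.0, 105.0, 99.0, 101000.0, 97.0, 103.0]
suspicious = flag_outliers(reported)
```

Here only the fifth value (index 4) is flagged. The rule needs no knowledge of which statistic the data will later feed, which is the sense in which it is general-purpose.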


Conclusions

  • The distinction is likely not as clear-cut as presented here; however, a trend is discernible:

    • transition from “manual” to automatic processing

    • potentially increased need to use models

  • In building representations of “world knowledge”, in addition to RDBMS, pay attention to developments in NoSQL, Big Data, and similar

  • Perhaps strengthen work on

    • general-purpose data editing

    • automated data editing

    • model use

    • ...

      (as already advanced in several contributions to the workshop)


Thank you

[email protected]

