
Tetherless World

Linked Open Government Data: What’s Next?

Li Ding, James A. Hendler, and Deborah L. McGuinness

With thanks to the entire RPI Tetherless World LOGD team: logd.tw.rpi.edu

particularly John Erickson, Tim Lebo, Dominic DiFranzo, Alvaro Graves, Gregory Williams, Xian Li, James Michaelis, Jin Zheng, Zhenning Shangguan, Johanna Flores, and Evan Patton

Tetherless World Constellation, Rensselaer Polytechnic Institute

SemTech 2011, San Francisco, June 7, 2011



Outline

  • Open Government Data

  • Linked Open Government Data

  • Challenges and Opportunities

  • Future Directions


Open Government Data:

Government data is already available and open on the Web and is growing.

Let’s create mashups to expose more value.



Opening Government Data

“Openness will strengthen our democracy and promote efficiency and effectiveness in Government.”

--- President Obama (Jan 2009)

“if people put data onto the web -- government data, scientific data, community data, whatever it is -- it will be used by other people to do wonderful things, in ways that they never could have imagined.”

-- Tim Berners-Lee (Feb 2010)

Linked Data and Semantic Tech are key enablers!

Source: http://www.whitehouse.gov/open, http://www.ted.com/talks/lang/eng/tim_berners_lee_the_year_open_data_went_worldwide.html


International Open Government Data: A Great Opportunity

  • 13 Other nations establishing open data

  • 24 States now offering data sites

  • 11 Cities in America with open data

  • 236 New applications from Data.gov datasets

  • 258 Data contacts in Federal Agencies

  • 308,650 Datasets available on Data.gov

  • Open Government Data (OGD)

    • A public asset (collected by government) with a large amount of high value data and wide domain coverage

    • An international mandate for government transparency, business applications, citizen participation, etc.

Deployment Status (source: Data.gov)

Source: http://www.data.gov/


Challenges from Raw Open Government Data

  • Data in proprietary formats

  • Independent curators

  • Distributed and unlinked data

  • Limited participation

[Figure: smoke rate (Impacteen.org) and policy coverage (NCI)]


Linked Open Government Data

TWC: Tetherless World Constellation at Rensselaer Polytechnic Institute logd.tw.rpi.edu

LOGD: Linked Open Government Data



The Tetherless World Constellation Linked Open Govt Data Portal

[Diagram: TWC LOGD portal]

  • Workflow steps shown: Convert, Query/Access, Create, Enhance

  • Components: LOGD SPARQL Endpoint, Community Portal, Data.gov deployment

  • Formats: RDF, RSS, JSON, XML, HTML, CSV


Linked Open Government Data

A Linked Open Government Data (LOGD) ecosystem is a Linked Data-based system where stakeholders of different sizes and roles find, manage, archive, publish, reuse, integrate, mash up, and consume open government data in connection with online tools, services and societies.


Moving data.gov to linked data (US)

  • Third parties (like RPI) translate the government datasets into linked data formats

  • US Data.gov hosts 6.4B RDF triples (5/21/2010)

    • Acknowledges the Semantic Web as a key technology for open government data
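
As a rough illustration of what translating a government dataset into a linked data format involves, the sketch below turns the rows of a CSV table into N-Triples lines using only the Python standard library. The base URI `http://example.org/dataset/demo`, the helper names, and the one-subject-per-row layout are assumptions made for this example; it is not the converter used by TWC LOGD.

```python
import csv
import io
import re

# Hypothetical base URI; a real deployment would use its own namespace.
BASE = "http://example.org/dataset/demo"


def escape_literal(value):
    """Escape a string for use inside an N-Triples literal."""
    return (value.replace("\\", "\\\\")
                 .replace('"', '\\"')
                 .replace("\n", "\\n")
                 .replace("\r", "\\r"))


def column_to_predicate(column):
    """Build a predicate URI from a column header (assumed naming rule)."""
    slug = re.sub(r"[^a-z0-9]+", "_", column.strip().lower()).strip("_")
    return f"<{BASE}/vocab/{slug}>"


def row_to_triples(row_number, row):
    """Emit one triple per cell, with the row itself as the subject."""
    subject = f"<{BASE}/row/{row_number}>"
    return [
        f'{subject} {column_to_predicate(column)} "{escape_literal(value)}" .'
        for column, value in row.items()
    ]


def csv_to_ntriples(text):
    """Convert CSV text (with a header row) into a list of N-Triples lines."""
    reader = csv.DictReader(io.StringIO(text))
    lines = []
    for number, row in enumerate(reader, start=1):
        lines.extend(row_to_triples(number, row))
    return lines
```

Calling `csv_to_ntriples` on a two-column table yields two triples per data row. Every value is emitted as a plain string literal here; adding datatypes and links to other datasets would correspond to the "Enhance" step of the workflow.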


Government Data within the LD Cloud

Government data currently makes up over half of the cloud by size (~17B triples), with tens of thousands of links to other data (internal and external)

http://linkeddata.org/


TWC LOGD: 50+ Demos in Many Domains using Various Technologies

Technology

  • Semantic Web

  • Semantic CMS

  • Semantic Search

  • Social network

  • NLP

  • Mobile

  • Visualization

  • Provenance

Domain

  • Health

  • Finance

  • Politics

  • Society

  • Economy

Web n-grams


Selected TWC Mashups


PopSciGrid with NIH/NCI & Northwestern

Trends in Smoking Prevalence, Tobacco Policy Coverage and Tobacco Prices (1991-2007)

Aimed at conveying complex health-related information to consumers and health decision makers

  • Diverse datasets from NIH

  • Uses lightweight semantic technologies to produce mashups that make accessible data that would otherwise be difficult to view in perspective

  • Maintains provenance about data and manipulations

  • Two-way communication: feed users’ comments back to gov contacts (e.g. %)


PopSciGrid Workflow

[Diagram: PopSciGrid workflow. Steps shown: Publish, Integrate, Convert, Visualize, Enhance. Arrow labels: create, derive. Example data: ban coverage.]


The Abstract LOGD Workflow

[Diagram: mashup workflow (conventional OGD) and LOGD]

  • Actors: Gov Agency, Developer, End User

  • Data: Raw OGD, LOGD, Mashed Data

  • Steps shown: Publish, Convert, Enhance, Integrate, Mashup, Visualize

  • Usability of LOGD

  • Interoperability

  • Scalability

  • Provenance


Challenge: Interoperability

TBL’s 5-star Deployment Scheme for Linked Data

Syntactic

  • Extract entities from HTML tables

  • Parse Excel tables

Semantic

  • Does “Georgia” refer to a US state or a country?

  • Is “2000” calendar year, fiscal year or dollar amount?
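
One way to picture how such semantic ambiguity can be reduced is to attach explicit metadata to each column, so that the same raw string is published with a different datatype or identifier depending on its context. The sketch below is only an illustration: the `example.org` URIs, the column names, and the fiscal-year datatype are hypothetical, while `xsd:gYear` and `xsd:decimal` are standard XML Schema datatypes.

```python
from dataclasses import dataclass

XSD = "http://www.w3.org/2001/XMLSchema#"


@dataclass(frozen=True)
class ColumnSpec:
    """Metadata a curator supplies for one column of a table."""
    datatype: str  # datatype IRI used when publishing values
    meaning: str   # human-readable note


# Hypothetical column metadata for a table that contains the value "2000".
COLUMNS = {
    "calendar_year": ColumnSpec(XSD + "gYear", "calendar year"),
    # No XSD type exists for fiscal years; this IRI is made up.
    "fiscal_year": ColumnSpec("http://example.org/datatype/fiscalYear",
                              "fiscal year"),
    "amount_usd": ColumnSpec(XSD + "decimal", "dollar amount"),
}

# Hypothetical identifiers: the same name resolves differently by context.
PLACES = {
    ("Georgia", "us_state"): "http://example.org/id/us-state/georgia",
    ("Georgia", "country"): "http://example.org/id/country/georgia",
}


def typed_literal(column, raw):
    """Render a raw cell value as a typed literal using its column metadata."""
    return f'"{raw}"^^<{COLUMNS[column].datatype}>'


def resolve_place(name, kind):
    """Return the URI for a place name given the kind of column it came from."""
    return PLACES.get((name, kind))
```

With this metadata, `typed_literal("calendar_year", "2000")` and `typed_literal("amount_usd", "2000")` produce different literals from the same string, and `resolve_place("Georgia", "us_state")` differs from `resolve_place("Georgia", "country")`.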


Mashing up data from different countries

http://data-gov.tw.rpi.edu/demo/USForeignAid/demo-1554.html


Even if not “rationalized” together

Build ontology mapping based on shared terms, e.g. “Economic”
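
A minimal sketch of mapping on shared terms, assuming each catalog exposes a flat list of category labels: labels from two catalogs are linked whenever they share a word such as "economic". The function names, the stopword list, and the sample labels in the usage note are invented for illustration and do not describe the mapping method used in the demo.

```python
import re
from collections import defaultdict

# Small assumed stopword list; a real mapping would need a better one.
STOPWORDS = frozenset({"and", "of", "the", "for"})


def terms(label):
    """Split a label into lowercase words, dropping stopwords."""
    return set(re.findall(r"[a-z]+", label.lower())) - STOPWORDS


def map_by_shared_terms(left_labels, right_labels):
    """Return (left label, shared term, right label) candidate mappings."""
    index = defaultdict(set)
    for label in right_labels:
        for term in terms(label):
            index[term].add(label)

    candidates = []
    for label in left_labels:
        for term in sorted(terms(label)):
            for match in sorted(index.get(term, ())):
                candidates.append((label, term, match))
    return candidates
```

For example, `map_by_shared_terms(["Economic Development"], ["Economic Indicators", "Health"])` links the first two labels through the term "economic". The output is a list of candidates only; a curator would still have to confirm each mapping.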


Enhance interoperability using Linked Data: drill down contextual knowledge

  • Identity : URI

  • Context

    • Description: metadata, esp. type & datatype

    • Mapping (linking identities)

      • Syntactic

        • Common string name

        • Common URI

      • Semantic

        • Complex Object: attributes + context (siblings)

        • Ontological Mapping: e.g., owl:sameAs

        • Rule-based Mapping: e.g. mapping “Liter” to “Gallon”
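
The last two kinds of mapping can be sketched in a few lines. Below, `owl:sameAs` statements are treated as an equivalence relation and merged with a union-find structure, and a single conversion rule maps liters to US gallons (1 US gallon is defined as exactly 3.785411784 liters). The class and function names are assumptions for this example, and real `owl:sameAs` reasoning covers more than this.

```python
LITERS_PER_US_GALLON = 3.785411784  # exact, by definition of the US gallon


class SameAsIndex:
    """Groups identifiers connected by owl:sameAs links (union-find)."""

    def __init__(self):
        self._parent = {}

    def _find(self, uri):
        # Register unseen URIs as their own representative.
        self._parent.setdefault(uri, uri)
        while self._parent[uri] != uri:
            # Path halving keeps later lookups short.
            self._parent[uri] = self._parent[self._parent[uri]]
            uri = self._parent[uri]
        return uri

    def add_same_as(self, a, b):
        """Record one owl:sameAs statement between two URIs."""
        root_a, root_b = self._find(a), self._find(b)
        if root_a != root_b:
            self._parent[root_b] = root_a

    def same(self, a, b):
        """True if the two URIs are connected by a chain of sameAs links."""
        return self._find(a) == self._find(b)


def liters_to_us_gallons(liters):
    """Rule-based mapping of a quantity in liters to US gallons."""
    return liters / LITERS_PER_US_GALLON
```

After `add_same_as("a", "b")` and `add_same_as("b", "c")`, `same("a", "c")` is true even though no direct link was stated, which is the behavior that lets identifiers from independently curated datasets be joined.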


Scalability factors in LOGD deployment

  • Large number of OGD datasets

    • 6k+ Data.gov.uk

    • 200k+ Data.gov

    • 323k+ International OGD datasets

  • Non-trivial human workload: clean-up syntax, enhance semantics, integrate datasets, visualize resulting data …

  • Substantial computing workload: running time of complex tasks, memory and disk space, maintenance costs …


International catalog


Scalability issues in the International Open Government Dataset Catalog

Social Aspect

  • Crawled 40+ different dataset catalogs from 19 countries

  • “Non-trivial customized programming workload”

Computing Aspect

  • Searching 323,304 datasets

  • “Complex SPARQL query got timeout”
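
One common mitigation for timeouts on large result sets is to split a query into pages with `LIMIT` and `OFFSET`. The sketch below only builds the query strings and contacts no endpoint. It assumes, purely for illustration, that catalog entries are typed as `dcat:Dataset` and titled with `dcterms:title`, and the page size is arbitrary. Paging does not help when the query pattern itself is expensive, and the `ORDER BY` needed for stable pages has a cost of its own.

```python
QUERY_TEMPLATE = """\
PREFIX dcat: <http://www.w3.org/ns/dcat#>
PREFIX dcterms: <http://purl.org/dc/terms/>
SELECT ?dataset ?title WHERE {{
  ?dataset a dcat:Dataset ;
           dcterms:title ?title .
}}
ORDER BY ?dataset
LIMIT {limit} OFFSET {offset}
"""


def paged_queries(total, page_size):
    """Yield one SPARQL query string per page of at most page_size rows."""
    if page_size <= 0:
        raise ValueError("page_size must be positive")
    for offset in range(0, total, page_size):
        yield QUERY_TEMPLATE.format(limit=page_size, offset=offset)


if __name__ == "__main__":
    # 323,304 is the catalog size quoted on the slide.
    for query in paged_queries(323304, 100000):
        print(query)
```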


Social Aspect: Distribute human workload to the right developers

  • Decompose workload to fine-granular jobs

  • Leverage a wider range of developers

[Diagram: developer groups arranged by application development expertise and domain expertise]

  • Groups: Software Engineers, Genus, Students, Knowledge Engineers, Scientists, Experts, Layman, End Users

  • Tasks: Visualize, Convert, Combine, Enhance, Publish

Joint work with Alvaro Graves, PhD student at RPI


Computing Aspect: fit computing power to LOGD deployment

  • Scale up for more government data

    • Support collective incremental data processing

    • Support large scale data analysis: graph connectivity, complex pattern/hypotheses discovery

    • Map repetitive developers’ workload to automated tools

    • Reduce service maintenance costs

  • Scale down for wider range of end user apps

    • Limited computing power, e.g., mobile devices

    • End users’ cognitive constraints, e.g., screen-size, executive summary
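
Incremental processing can be pictured as skipping datasets whose content has not changed since the last run. The sketch below assumes each dataset is available as bytes and is identified by name; the fingerprinting scheme (SHA-256 of the raw content) and the function names are choices made for this example, not a description of the TWC LOGD pipeline.

```python
import hashlib


def fingerprint(content):
    """Return a stable fingerprint for a dataset's raw bytes."""
    return hashlib.sha256(content).hexdigest()


def plan_incremental(previous, current):
    """Decide which datasets need reconversion.

    previous: dict mapping dataset name -> fingerprint from the last run
    current:  dict mapping dataset name -> raw bytes seen in this run
    Returns (names to process, fingerprints to store for the next run).
    """
    to_process = []
    new_state = {}
    for name in sorted(current):
        digest = fingerprint(current[name])
        new_state[name] = digest
        if previous.get(name) != digest:
            to_process.append(name)
    return to_process, new_state
```

On a first run with an empty `previous`, every dataset is processed. On a later run, only datasets whose bytes changed are returned, so the conversion workload grows with the amount of change rather than with the size of the catalog.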


Provenance

  • Provenance-aware frameworks are needed to support transparency, appropriate attribution, and ultimately trust of any kind of open data.

  • Versioning and persistence are important factors for sustainable applications

  • Workflow provenance can help increase understanding and trust since it can be used to explain behavior and dependencies of intelligent systems


Attribution in PopSciGrid

  • Example scenarios

  • List direct/indirect contributors

  • End users send feedback to curators

  • Curators learn usage of datasets

  • List demos by technology

[Diagram: attribution graph]

  • Node types: demo, technology, dataset, version, conversion, person, agency

  • Properties: logd:uses_technology, logd:uses_dataset, dcterms:contributor, dcterms:publisher, void:subset

  • Example dataset: State-wise Tobacco Policy coverage stats
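
The "list direct/indirect contributors" scenario can be sketched as a graph walk over the properties named on this slide. The example below stores triples as plain Python tuples and follows `logd:uses_dataset` and `void:subset` links outward from a demo, collecting anyone reached through `dcterms:contributor` or `dcterms:publisher`. The direction of each link and the `ex:` identifiers in the usage note are assumptions, since the transcript does not preserve the diagram's arrows.

```python
from collections import deque

# Properties to traverse, and properties that name a contributing party.
FOLLOW = {"logd:uses_dataset", "void:subset"}
CREDIT = {"dcterms:contributor", "dcterms:publisher"}


def contributors(triples, start):
    """Map each contributor reachable from start to its distance in hops.

    Distance 0 means a direct contributor of the start node; larger values
    mean the party contributed to something the start node depends on.
    """
    outgoing = {}
    for subject, predicate, obj in triples:
        outgoing.setdefault(subject, []).append((predicate, obj))

    found = {}
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        node, depth = queue.popleft()
        for predicate, obj in outgoing.get(node, []):
            if predicate in CREDIT:
                # Breadth-first order means the first hit is the closest.
                found.setdefault(obj, depth)
            elif predicate in FOLLOW and obj not in seen:
                seen.add(obj)
                queue.append((obj, depth + 1))
    return found
```

Given a demo that uses a dataset published by an agency, where a subset of that dataset was converted by another person, `contributors` returns the demo's own contributor at distance 0, the agency at distance 1, and the converter further out. The same walk in the reverse direction would let curators learn which demos use their datasets.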


TWC Semantic Water Quality Portal

Aimed at helping people investigate local water quality

  • Diverse datasets, regulations, datatypes

  • Uses lightweight semantic technologies to produce mashups that make accessible data that would otherwise be difficult to view in perspective

  • Maintains provenance about data and manipulations

  • Exposes unexpected uses of data (and thus unexpected usage patterns)




Challenges Revisited

  • Interoperability

    • Syntactic: Linked Data, RDF

    • Semantic: ontology, evolving

  • Scalability (9.9 billion triples on the TWC LOGD)

    • Effective social platform for task dispatching

    • More automation, e.g., data cleaning and link detection

    • Scalable tools, esp. SPARQL endpoint

  • Provenance

    • Accountability: Privacy, licensing, trust

    • Credit / Blame

    • Replicate applications and transfer system building knowledge

  • More issues

    • Persistent data access for changing data


Summary

  • Open government data is a key resource

    • Many governments releasing data, growing number in structured form

  • Government (and general) data transparency comes from “mashing up” data from many sites while maintaining (and exposing) provenance

    • Key to linked data

  • While there has been tremendous progress, many challenges remain

    • Trust, Provenance, Scaling, Interoperability, Archiving, Curation, …

  • The research agenda for linked government data is an important driving area for semantic technologies


Questions?

The work presented in this talk was primarily conducted at the Tetherless World Constellation at Rensselaer Polytechnic Institute.

Comments / Questions:

[ dingl | dlm ] @ cs.rpi.edu.

Events:

Open Linked Govt. Data Symposium: submission deadline June 15 http://tw.rpi.edu/web/event/AAAI/2011/Fall_Symposium_OGK

TWC / Elsevier Hackathon: June 27-28

http://tw.rpi.edu/web/event/TWCElsevierHackathonJune2011

Reference: Li Ding, Timothy Lebo, John S. Erickson, Dominic DiFranzo, Gregory Todd Williams, Xian Li, James Michaelis, Alvaro Graves, Jin Guang Zheng, Zhenning Shangguan, Johanna Flores, Deborah L. McGuinness, and Jim Hendler. TWC LOGD: A Portal for Linked Open Government Data Ecosystems. Submitted to JWS, special issue on Semantic Web Challenge ’10.

