Data Sets, Vocabularies and Tools

Pablo N. Mendes

Freie Universität Berlin

1st year review

Luxembourg, December 2011


Work Plan View WP4

[Gantt chart; timeline axis: 0, 6, 12, 18, 24, 30, 36, 42, 48]

  • Tasks

    • Task 4.1 (FUB): Assembly and maintenance of the PlanetData data set catalogue

    • Task 4.2 (KIT): Community-driven creation and maintenance of vocabularies

    • Task 4.3 (KIT): Development of best practices for providing self-describing data

    • Task 4.4 (UPM): Assembly and maintenance of a catalogue of data provisioning tools

  • Deliverables

    • D4.1 Assembly and maintenance of the PlanetData data set catalogue

    • D4.2 Best practices on how to provide self-describing data

    • D4.3 PlanetData data sets, vocabularies and provisioning tools catalogue and access portal

    • D4.4 Data quality benchmark dataset

    • D4.5 PlanetData data sets, vocabularies and provisioning tools catalogue and access portal


Work Plan View WP5

[Gantt chart; timeline axis: 0, 6, 12, 18, 24, 30, 36, 42, 48]

  • Tasks

    • Task 5.1 (EPFL): Assembly and maintenance of PlanetData technology catalogue

    • Task 5.2 (KIT): Development of best practices of large-scale data management infrastructures

  • Deliverables

    • D5.1 PlanetData data management tools catalogue and access portal

    • D5.2 Best practices on how to deploy tools on large-scale infrastructures

    • D5.3 PlanetData data management tools catalogue and access portal


Summary

  • WP4

    • Assembly and maintenance of the PlanetData data sets, vocabularies and tools catalogue

    • Community-driven creation and maintenance of vocabularies

    • Development of best practices

  • WP5

    • Assembly and maintenance of the PlanetData technology catalogue

    • Best practices for large-scale data management infrastructure


Deliverables in Year 1

  • D4.1

    • Data Sets Catalog

    • Vocabularies Catalog

  • D5.1

    • Data Management Tools Catalog


Data Sets Catalog

  • Where to maintain the catalog?

  • How to catalog?

  • What to catalog?

  • How to provide access for humans and machines?

  • How to organize a community around the catalog?


Repository: TheDataHub.org

  • Maintained by the Open Knowledge Foundation (OKF) and the world-wide open data community

  • Widely used catalog

    • As of Dec 1st, 2011: 2418 datasets, 314 of them LOD

  • Features of the portal:

    • Tagging, Rating, Feedback, Discussions, Groups


Cataloguing Process

  • PlanetData Editor

    • Collected a list of new datasets → 49 new entries

    • Updated existing entries (537 edits)

  • Crowdsourcing: data providers and third parties

    • Public call for action to mailing lists, OKFN blog

    • Supported the community contributions

  • Quality Assurance

    • Tools to support cataloguing (validator, auto-complete)

  • Joint work with LATC


Catalog Metadata QuickRef

  • What?

    • package name, title, url

    • tag:lod

    • topic

    • shortname

    • format-*

  • Who?

    • author || maintainer

    • published by producer

    • provenance metadata

    • license

  • When?

    • version

    • last updated

  • How much?

    • triples

    • links:* (outlinks)

    • namespace (inlinks)

    • vocab mappings

  • Where to find?

    • example URI

    • downloads/dumps

    • SPARQL endpoint

  • Why?

    • package description


Catalog Metadata

  • How are datasets described?

  • Resources:

  • example URIs

  • SPARQL endpoint

  • RDF Dumps

  • Sitemaps, VoID files

http://www.w3.org/wiki/TaskForces/CommunityProjects/LinkingOpenData/DataSets/CKANmetainformation
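
The resources listed above (example URIs, SPARQL endpoint, RDF dumps) map naturally onto a VoID description. The following Python sketch is illustrative only: `void_description` is a hypothetical helper, not part of any PlanetData tool, and all URIs and counts in the usage lines are made-up placeholders. It only relies on standard VoID terms (`void:Dataset`, `void:sparqlEndpoint`, `void:dataDump`, `void:exampleResource`, `void:triples`).

```python
def void_description(dataset_uri, title, sparql_endpoint=None,
                     dumps=(), examples=(), triples=None):
    """Return a Turtle string describing one dataset with VoID terms.

    Simplification: the title is not escaped, so it must not contain quotes.
    """
    props = [f'dcterms:title "{title}"']
    if sparql_endpoint:
        props.append(f"void:sparqlEndpoint <{sparql_endpoint}>")
    props += [f"void:dataDump <{d}>" for d in dumps]
    props += [f"void:exampleResource <{e}>" for e in examples]
    if triples is not None:
        props.append(f"void:triples {int(triples)}")

    lines = [
        "@prefix void: <http://rdfs.org/ns/void#> .",
        "@prefix dcterms: <http://purl.org/dc/terms/> .",
        "",
        f"<{dataset_uri}> a void:Dataset ;",
        "    " + " ;\n    ".join(props) + " .",
    ]
    return "\n".join(lines)


# Placeholder values, for illustration only.
print(void_description(
    "http://example.org/dataset/demo",
    "Demo dataset",
    sparql_endpoint="http://example.org/sparql",
    dumps=["http://example.org/dumps/demo.nt.gz"],
    examples=["http://example.org/resource/Berlin"],
    triples=1000,
))
```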


Cataloguing process overview


Catalog Entry Validator

http://www4.wiwiss.fu-berlin.de/lodcloud/ckan/validator/validate.php

  • Checks levels of metadata completeness

  • Step-by-step annotation instructions

  • Already checks some quality indicators, e.g. availability, provenance, access methods
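
A check for levels of metadata completeness could look roughly like the Python sketch below. It is not the validator's actual code: the level numbers, the field names (`sparql_endpoint`, `dump_url`, `example_uri`, and so on) and the grouping of checks are assumptions made for illustration, loosely following the QuickRef categories.

```python
def has_any(*keys):
    """Build a check that passes when at least one key has a non-empty value."""
    return lambda entry: any(entry.get(k) for k in keys)


# (level, label, check) -- hypothetical levels, not the real validator's rules.
CHECKS = [
    (1, "name, title, url", lambda e: all(e.get(k) for k in ("name", "title", "url"))),
    (1, "tag 'lod'", lambda e: "lod" in e.get("tags", [])),
    (2, "author or maintainer", has_any("author", "maintainer")),
    (2, "license", has_any("license")),
    (3, "access method", has_any("sparql_endpoint", "dump_url", "example_uri")),
    (4, "triple count", has_any("triples")),
]


def validate(entry):
    """Return (highest level fully satisfied, labels of all missing items)."""
    missing = [(level, label) for level, label, check in CHECKS if not check(entry)]
    failed_levels = {level for level, _ in missing}
    reached = 0
    for level in sorted({lvl for lvl, _, _ in CHECKS}):
        if level in failed_levels:
            break  # a lower level is incomplete, so higher ones do not count
        reached = level
    return reached, [label for _, label in missing]


# Made-up catalogue entry.
entry = {
    "name": "demo-dataset",
    "title": "Demo dataset",
    "url": "http://example.org/",
    "tags": ["lod"],
    "maintainer": "Jane Doe",
    "sparql_endpoint": "http://example.org/sparql",
}
print(validate(entry))
```

Returning the list of missing items alongside the level is what would let a tool turn the result into step-by-step annotation instructions.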


CKAN Entry Validator (2)


Auto-completion scripts

  • For the entries that pass the validator, we can auto-complete metadata with information such as:

    • Number of triples

    • Links to other sources

    • Vocabularies used

    • Quality indicators
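
As a rough illustration of how such metadata can be derived from the data itself, the Python sketch below computes a triple count, outgoing links per target host, and the vocabularies used. It is an assumption-laden sketch, not the project's scripts: it expects already-parsed `(subject, predicate, object)` tuples of plain strings, treats any object starting with `http` as a URI, and approximates a vocabulary by cutting a term URI after its last `#` or `/`.

```python
from collections import Counter
from urllib.parse import urlparse

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


def vocabulary_of(uri):
    """Approximate a term's vocabulary namespace (cut after last '#' or '/')."""
    cut = max(uri.rfind("#"), uri.rfind("/"))
    return uri[: cut + 1]


def summarize(triples, namespace):
    """Derive catalogue metadata from (subject, predicate, object) triples."""
    outlinks = Counter()
    vocabularies = set()
    count = 0
    for s, p, o in triples:
        count += 1
        vocabularies.add(vocabulary_of(p))
        if p == RDF_TYPE:
            # Classes count towards vocabulary usage, not towards outlinks.
            vocabularies.add(vocabulary_of(o))
        elif o.startswith("http") and not o.startswith(namespace):
            outlinks[urlparse(o).netloc] += 1
    return {
        "triples": count,
        "links": dict(outlinks),
        "vocabularies": sorted(vocabularies),
    }


# Made-up sample data.
sample = [
    ("http://example.org/r/1", RDF_TYPE, "http://xmlns.com/foaf/0.1/Person"),
    ("http://example.org/r/1", "http://www.w3.org/2002/07/owl#sameAs",
     "http://dbpedia.org/resource/Berlin"),
    ("http://example.org/r/1", "http://xmlns.com/foaf/0.1/knows",
     "http://example.org/r/2"),
]
print(summarize(sample, "http://example.org/"))
```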


Catalog Access Portal

  • For machines

    • CKAN API (continuously improved by OKFN)

    • VoID descriptions for the LOD group (will be continuously improved in cooperation with LATC)

  • For humans

    • LOD Cloud Diagram

    • State of the LOD Report
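
For the machine-access route, a client might read a catalogue entry along the lines of the Python sketch below. It assumes a CKAN instance exposing the current action API (`/api/3/action/package_show`); the portal at the time of this talk may have exposed a different API version. The base URL and the canned response are made-up placeholders, and no network request is made.

```python
import json
from urllib.parse import urlencode


def package_show_url(base_url, dataset_id):
    """Build a request URL for CKAN's package_show action."""
    query = urlencode({"id": dataset_id})
    return f"{base_url.rstrip('/')}/api/3/action/package_show?{query}"


def summarize_package(response_text):
    """Extract tags, extras and resources from a package_show JSON reply."""
    payload = json.loads(response_text)
    if not payload.get("success"):
        raise ValueError("CKAN reported a failed request")
    pkg = payload["result"]
    return {
        "name": pkg["name"],
        "tags": [t["name"] for t in pkg.get("tags", [])],
        "extras": {e["key"]: e["value"] for e in pkg.get("extras", [])},
        "resources": [(r.get("format"), r.get("url"))
                      for r in pkg.get("resources", [])],
    }


# Canned reply with made-up values, shaped like a package_show response.
SAMPLE_RESPONSE = json.dumps({
    "success": True,
    "result": {
        "name": "demo-dataset",
        "tags": [{"name": "lod"}, {"name": "format-rdf"}],
        "extras": [{"key": "triples", "value": "1000"}],
        "resources": [{"format": "api/sparql",
                       "url": "http://example.org/sparql"}],
    },
})

print(package_show_url("https://ckan.example.org", "demo-dataset"))
print(summarize_package(SAMPLE_RESPONSE))
```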


LOD Cloud Diagram


LOD Cloud Diagram (zoom in)


State of the LOD Cloud

Triples by domain

Links by domain

http://www4.wiwiss.fu-berlin.de/lodcloud/state/


State of the LOD Cloud (2)

  • SPARQL Endpoint: 68.14%

  • RDF Dumps: 39.66%

  • Provide provenance: 36.63%

  • Provide licensing: 17.84%

[Chart: vocabulary use]


Vocabularies Catalog

  • Based on BTC Dataset (2.1 billion triples)

  • Shows vocabulary usage in practice

  • Executed on a 54 node Hadoop cluster

  • Access portal:

    • Searchable

    • URI Lookup

    • Top usage statistics

Hosted at http://vocab.cc
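
The kind of usage statistics described above (top classes and top properties per dataset) can be sketched in map/reduce style. The Python below runs in memory and is only an illustration of the counting idea, not the Hadoop job that was actually run; it assumes input quads of the form `(dataset, subject, predicate, object)`, and the function names and sample data are hypothetical.

```python
from collections import Counter
from itertools import chain

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


def map_quad(dataset, s, p, o):
    """Map step: emit ((dataset, kind, term), 1) pairs for one quad."""
    yield (dataset, "property", p), 1
    if p == RDF_TYPE:
        yield (dataset, "class", o), 1


def reduce_counts(pairs):
    """Reduce step: sum the emitted counts per key."""
    totals = Counter()
    for key, n in pairs:
        totals[key] += n
    return totals


def top_terms(totals, dataset, kind, k=3):
    """Return the k most used classes or properties of one dataset."""
    ranked = Counter({term: n for (d, knd, term), n in totals.items()
                      if d == dataset and knd == kind})
    return ranked.most_common(k)


# Made-up sample quads.
quads = [
    ("ds1", "ex:a", RDF_TYPE, "foaf:Person"),
    ("ds1", "ex:b", RDF_TYPE, "foaf:Person"),
    ("ds1", "ex:a", "foaf:name", "Alice"),
    ("ds2", "ex:c", RDF_TYPE, "foaf:Document"),
]
totals = reduce_counts(chain.from_iterable(map_quad(*q) for q in quads))
print(top_terms(totals, "ds1", "class"))
print(top_terms(totals, "ds1", "property"))
```

Keying the counts by dataset as well as by term is what makes per-dataset rankings possible from a single pass over the data.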


Top Classes per Dataset


Top Properties per Dataset


Vocabularies Catalog

vocab.cc search query results

vocab.cc URI Lookup Results


Tools Catalog

  • Initial focus on tools from the consortium

  • Currently 15 tools

Entry for Global Sensor Networks (GSN)

Available from planet-data.eu


Tools Description

  • Textual description

    • What is it?

    • Documentation

    • Publications

    • Requirements

    • License

    • Contact person/mailing list

    • Organization

    • Events

  • Tags

    • Produce

    • Publish

    • Consume

    • Provisioning


Names of Tools in the Catalog

  • CumulusRDF

  • D2R

  • DBpedia Spotlight

  • GSN (Global Sensor Networks)

  • Geometry2RDF

  • LDIF

  • LDSpider (Linked Data Spider)

  • LarKC (Large Knowledge Collider)

  • MonetDB

  • NOR2O

  • R2O&ODEMapster

  • OKKAM

  • Pubby

  • R2R

  • S2O

  • Silk


Tools Catalog

  • Related: LATC Tools Catalog

    • 11 tools

    • 5 tools in both, 10 new tools in PlanetData

  • Proposal for next year:

    • Join catalogs at linkeddata.org

    • Jointly maintain catalog until LATC finishes

    • Build a community → people can add their own tools

    • Afterwards PlanetData takes over and maintains the catalog for another 2 years
