Google Refine for Data Quality / Integrity

gema

Presentation Transcript


  1. Google Refine for Data Quality / Integrity

  2. Context BioVeL Data Refinement Workflow • Synonym Expansion / Occurrence Retrieval • Data Selection • Data Quality / Integrity

  3. Context BioVeL Data Refinement Workflow • Synonym Expansion / Occurrence Retrieval • Data Selection • Data Quality / Integrity

  4. In Google’s Own Words “Google Refine is a power tool for - working with messy data, - cleaning it up, - transforming it from one format into another, - extending it with web services, - and linking it to databases”

  5. In Google’s Own Words “Google Refine is a power tool for - working with messy data, - cleaning it up, - transforming it from one format into another, - extending it with web services, - and linking it to databases” … and can be run in isolation

  6. Installation • Download zip file from http://code.google.com/p/google-refine/wiki/Downloads • Extract file • Run google-refine.exe

  7. Features Clustering / Grouping use case : group taxon names and merge similar groups
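The grouping idea on this slide can be sketched outside the tool. The snippet below is a minimal illustration of key-collision clustering with a fingerprint key, which is one of the clustering methods Refine offers; it is not Refine's own code. The function names and the sample taxon names are assumptions made for the example.

```python
import re
import unicodedata
from collections import defaultdict


def fingerprint(name: str) -> str:
    """Build a normalised key so that near-identical names collide."""
    # Fold accented characters to plain ASCII.
    ascii_name = (
        unicodedata.normalize("NFKD", name)
        .encode("ascii", "ignore")
        .decode("ascii")
    )
    # Trim, lowercase and drop punctuation.
    cleaned = re.sub(r"[^\w\s]", "", ascii_name.strip().lower())
    # Split on whitespace, de-duplicate and sort the tokens.
    tokens = sorted(set(cleaned.split()))
    return " ".join(tokens)


def cluster(names):
    """Group names sharing a fingerprint; keep groups with 2+ members."""
    groups = defaultdict(list)
    for name in names:
        groups[fingerprint(name)].append(name)
    return [group for group in groups.values() if len(group) > 1]


# Hypothetical sample data, not taken from the presentation.
names = [
    "Mytilus edulis",
    "mytilus  edulis",
    "Edulis, Mytilus",
    "Mytilus galloprovincialis",
]
clusters = cluster(names)
```

Each resulting cluster is a candidate for merging into a single spelling, which in Refine is a choice the user confirms per group.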

  8. Features Filtering use case : filter out records which do not have ‘museum’ / ‘university’ / ‘marine’ in the data provider name
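As a rough sketch of the filtering use case, the snippet below keeps only records whose data provider name contains one of the three keywords and sets the rest aside. The field name `dataProvider` and the sample records are assumptions for illustration; in Refine the same result would come from a text filter or facet rather than from code.

```python
KEYWORDS = ("museum", "university", "marine")


def has_keyword(record: dict) -> bool:
    """True if the provider name contains any keyword (case-insensitive)."""
    provider = record.get("dataProvider", "").lower()
    return any(keyword in provider for keyword in KEYWORDS)


# Hypothetical occurrence records, not taken from the presentation.
records = [
    {"id": 1, "dataProvider": "Natural History Museum"},
    {"id": 2, "dataProvider": "Citizen Science Portal"},
    {"id": 3, "dataProvider": "Flanders Marine Institute"},
    {"id": 4, "dataProvider": "University of Example"},
]

kept = [r for r in records if has_keyword(r)]
excluded = [r for r in records if not has_keyword(r)]
```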

  9. Features Data Exclusion use case : exclude records that have been faceted / filtered

  10. Features Extending Data use case : add ISO country code column use case : add column(s) by parsing taxon name
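Both use cases on this slide amount to deriving new columns from existing ones. The sketch below assumes records with hypothetical `country` and `scientificName` fields, uses a deliberately tiny lookup table of ISO 3166-1 alpha-2 codes, and splits the taxon name naively on whitespace. A real name parser handles far more cases (subgenera, hybrids, infraspecific ranks), so this only shows the shape of the operation.

```python
# Small illustrative lookup; a real table would cover all countries.
ISO_CODES = {
    "belgium": "BE",
    "germany": "DE",
    "united kingdom": "GB",
}


def parse_taxon_name(name: str) -> dict:
    """Naive split: first token is genus, second is epithet, rest is authorship."""
    parts = name.split()
    return {
        "genus": parts[0] if parts else "",
        "specificEpithet": parts[1] if len(parts) > 1 else "",
        "authorship": " ".join(parts[2:]),
    }


def extend(record: dict) -> dict:
    """Return a copy of the record with the derived columns added."""
    extended = dict(record)
    country = record.get("country", "").strip().lower()
    # Unknown countries get an empty code instead of a guess.
    extended["countryCode"] = ISO_CODES.get(country, "")
    extended.update(parse_taxon_name(record.get("scientificName", "")))
    return extended


# Hypothetical record, not taken from the presentation.
row = extend({"scientificName": "Mytilus edulis Linnaeus, 1758", "country": "Belgium"})
```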

  11. Features Reconciling Data use case : retrieve associated names from ‘WoRMS’

  12. Features Save / Replay User Actions use case : extract scientific names from name labels

  13. Features Build Extensions use case : BioVeL Extension - interaction with Taverna - add functionality specific to the BioVeL context (e.g. ECAT Name Parser)

  14. Future Possibilities remote server : could be deployed as a remote server with the possibility of using shared resources (extensions, data, action history)

  15. Future Possibilities integration with existing applications, either as a module or using REST API calls

  16. Future Possibilities central application which can be used to run scripts, call web services and even interact with software applications

  17. Thanks Questions / Suggestions / Comments
