Explore how Google Refine can transform messy data, improve data quality, and integrate with web services and databases. Learn about clustering, filtering, extending, and reconciling data for better insights. Discover future possibilities for remote servers, integrations, and script running.
Context: BioVeL Data Refinement Workflow
• Synonym Expansion / Occurrence Retrieval
• Data Selection
• Data Quality / Integrity
In Google’s Own Words
“Google Refine is a power tool for
- working with messy data,
- cleaning it up,
- transforming it from one format into another,
- extending it with web services,
- and linking it to databases”
… and it can be run in isolation
Installation
• Download the zip file from http://code.google.com/p/google-refine/wiki/Downloads
• Extract the file
• Run google-refine.exe
Features: Clustering / Grouping
Use case: group taxon names and merge similar groups
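A minimal sketch of how clustering by key collision can work, assuming a simplified "fingerprint" key (accents folded, lowercased, punctuation removed, tokens sorted and de-duplicated). The function names `fingerprint` and `cluster` and the sample names are illustrative, not Google Refine's own code, and Refine offers other clustering methods as well.

```python
import string
import unicodedata
from collections import defaultdict


def fingerprint(value: str) -> str:
    """Build a normalized key so that near-identical names collide."""
    # Fold accented characters to their closest ASCII form.
    folded = (
        unicodedata.normalize("NFKD", value)
        .encode("ascii", "ignore")
        .decode("ascii")
    )
    # Trim, lowercase, and drop ASCII punctuation.
    cleaned = folded.strip().lower().translate(
        str.maketrans("", "", string.punctuation)
    )
    # Sort and de-duplicate tokens so word order does not matter.
    return " ".join(sorted(set(cleaned.split())))


def cluster(names):
    """Group names by fingerprint; keep groups with more than one spelling."""
    groups = defaultdict(list)
    for name in names:
        groups[fingerprint(name)].append(name)
    return [members for members in groups.values() if len(set(members)) > 1]
```

Each returned group is a candidate for merging into a single taxon name; a reviewer would still confirm the merge, as in Refine's clustering dialog.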
Features: Filtering
Use case: keep only records whose data provider name contains ‘museum’, ‘university’, or ‘marine’
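The filtering use case can be sketched outside Refine as a simple keyword test. This assumes records are dictionaries and that the provider name lives in a field called `dataProvider`; both are hypothetical choices for illustration.

```python
KEYWORDS = ("museum", "university", "marine")


def keep_record(record, field="dataProvider"):
    """Return True if the provider name contains any keyword (case-insensitive)."""
    # Treat a missing or empty provider name as a non-match.
    provider = (record.get(field) or "").lower()
    return any(keyword in provider for keyword in KEYWORDS)


def filter_records(records, field="dataProvider"):
    """Keep only the records whose provider name matches a keyword."""
    return [record for record in records if keep_record(record, field)]
```

In Refine itself the same selection would typically be made interactively with a text filter or facet on the provider column.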
Features: Data Exclusion
Use case: exclude records that have been selected by a facet or filter
Features: Extending Data
Use case: add an ISO country code column
Use case: add column(s) by parsing the taxon name
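Both use cases amount to deriving new columns from existing ones. The sketch below assumes dictionary records with hypothetical field names (`country`, `countryCode`, `genus`, `specificEpithet`, `authorship`), a deliberately tiny lookup table of ISO 3166-1 alpha-2 codes, and a naive whitespace split of the taxon name. A dedicated name parser handles far more cases than this.

```python
# Tiny illustrative lookup; a real table would cover all ISO 3166-1 countries.
ISO_CODES = {
    "denmark": "DK",
    "germany": "DE",
    "sweden": "SE",
    "united kingdom": "GB",
}


def add_country_code(record):
    """Return a copy of the record with a 'countryCode' column added."""
    country = (record.get("country") or "").strip().lower()
    # Unknown countries get an empty code rather than a guess.
    return {**record, "countryCode": ISO_CODES.get(country, "")}


def parse_taxon_name(name):
    """Naively split 'Genus epithet Authorship' into separate columns."""
    parts = name.split()
    genus = parts[0] if parts else ""
    # Assume a lowercase second token is the specific epithet.
    has_epithet = len(parts) > 1 and parts[1].islower()
    epithet = parts[1] if has_epithet else ""
    rest = parts[2:] if has_epithet else parts[1:]
    return {
        "genus": genus,
        "specificEpithet": epithet,
        "authorship": " ".join(rest),
    }
```

For example, `parse_taxon_name("Gadus morhua Linnaeus, 1758")` yields the genus `Gadus`, the epithet `morhua`, and the authorship `Linnaeus, 1758`.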
Features: Reconciling Data
Use case: retrieve associated names from WoRMS (World Register of Marine Species)
Features: Save / Replay User Actions
Use case: extract scientific names from name labels
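The idea of saving and replaying actions can be illustrated with a recorded list of steps that is stored as JSON and applied to a fresh dataset. The step format below (`transform`, `source`, `target`) is a simplified, hypothetical one and is not the JSON that Refine exports. The name extraction is equally naive and assumes the label starts with a two-word scientific name.

```python
import json


def extract_scientific_name(label):
    """Assumption: the first two tokens of the label are the scientific name."""
    return " ".join(label.split()[:2])


# Registry of the transforms a saved history is allowed to reference.
TRANSFORMS = {
    "trim": str.strip,
    "extract_scientific_name": extract_scientific_name,
}


def replay(history_json, records):
    """Apply each saved step, in order, to copies of the records."""
    history = json.loads(history_json)
    result = [dict(record) for record in records]  # leave the input untouched
    for step in history:
        transform = TRANSFORMS[step["transform"]]
        for record in result:
            record[step["target"]] = transform(record[step["source"]])
    return result
```

Because the history is plain JSON, the same steps can be stored alongside the data and rerun on a later export of the same source.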
Features: Build Extensions
Use case: BioVeL Extension
- interaction with Taverna
- additional functionality specific to the BioVeL context (e.g. ECAT Name Parser)
Future Possibilities: Remote Server
Could be deployed as a remote server, with the possibility of sharing resources (extensions, data, action histories)
Future Possibilities: Integration
Integration with existing applications, either as a module or through REST API calls
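As a rough sketch of the REST-style route, the snippet below only builds (and does not send) an HTTP request asking a running Refine instance to apply a list of operations to a project. The base address `http://127.0.0.1:3333`, the `/command/core/apply-operations` path, and the `project` and `operations` parameters are assumptions about Refine's internal HTTP commands and should be checked against the version in use.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request

# Assumption: Refine is running locally on its usual port.
REFINE_BASE = "http://127.0.0.1:3333"


def build_apply_operations_request(project_id, operations):
    """Build a POST request that would ask Refine to apply saved operations."""
    query = urlencode({"project": project_id})
    # The operations list is sent as a JSON string in a form field.
    body = urlencode({"operations": json.dumps(operations)}).encode("utf-8")
    url = f"{REFINE_BASE}/command/core/apply-operations?{query}"
    return Request(url, data=body, method="POST")
```

An integrating application would pass the returned request to `urllib.request.urlopen` and handle the response and any errors.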
Future Possibilities: Central Application
A central application that can run scripts, call web services, and even interact with other software applications
Thanks Questions / Suggestions / Comments