
Data cleansing for Dummies: Google to the rescue!!



Presentation Transcript


  1. Data cleansing for Dummies: Google to the rescue!! Dave Smith, Petrology Collections Manager

  2. The Natural History Museum, London

  3. Architectural wonders • Waterhouse building opened in 1881 • Steel frame and terracotta • Purpose built for natural history collections

  4. The Museum • 1000 staff • 350 science staff • 72 million specimens (estimated) • Life Sciences • Plants, animals, birds, insects • Earth Sciences • Minerals & gems, rocks, fossils, meteorites

  5. My role • Geologist by training • Collections Manager for rock collections • 125,000 rocks • 10,000 decorative stones • 37,000 ocean sediments • 16,000 ore specimens • Departmental EMu administrator • Registry management • Report writing • Training & documentation • EMu support & upgrade testing • Communication

  6. ‘Fingers in lots of pies’ • Have been involved in cross-museum initiatives involving EMu.

  7. Data cleansing for Dummies: Google to the rescue!! Dave Smith, Petrology Collections Manager

  8. The problem

  9. Core Information • 89,000 Records (73%) • Identification = 52,100 • Provenance = 64,215 • Acquisition = 38,700 • Storage = 14,300

  10. Numbers

  11. The Problem • Data sits outside EMu – how to get it in? • Not as easy as it sounds – many barriers… • Notes field used to hold data whose proper field was uncertain. • Sites data atomised to varying levels, depending on the experience of the digitiser.
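As an illustration of the atomisation problem, the sketch below splits a free-text site string into separate fields. It is a minimal Python example, not the method shown in the talk: the field names (`locality`, `region`, `country`), the comma-separated input format, and the sample place names are all assumptions made for the example.

```python
def atomise_site(raw, levels=("locality", "region", "country")):
    """Split a comma-separated site string into named fields.

    Assumes (hypothetically) that parts run from most specific to broadest,
    e.g. "Shap, Cumbria, England".
    """
    parts = [p.strip() for p in raw.split(",") if p.strip()]
    fields = {level: "" for level in levels}

    # Align from the right so the last part always lands in the broadest level.
    for level, part in zip(reversed(levels), reversed(parts)):
        fields[level] = part

    # If there are more parts than levels, keep the extra detail in the
    # most specific field rather than discarding it.
    if len(parts) > len(levels):
        extra = parts[: len(parts) - len(levels) + 1]
        fields[levels[0]] = ", ".join(extra)

    return fields


if __name__ == "__main__":
    print(atomise_site("Shap, Cumbria, England"))
    print(atomise_site("England"))
```

Aligning from the right means a record entered only to country level still ends up in the same field as a fully atomised one, which is the kind of inconsistency between digitisers that the slide describes.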

  12. Acquisition Lot entry

  13. The Problem • Data sits outside EMu – how to get it in? • Not as easy as it sounds – many barriers… • Notes field used to hold data whose proper field was uncertain. • Sites data atomised to varying levels, depending on the experience of the digitiser. • Approx. 95% of specimens have a record in EMu with, at minimum, a registration number. • Once cleaned, how to update records without overwriting enhanced data? • Unfamiliarity with Access • Short time periods for data cleansing.
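One way to think about updating records without overwriting enhanced data is a "fill blanks only" merge keyed on registration number. The sketch below is a minimal Python illustration under stated assumptions: records are plain dictionaries, the key field is called `registration_number`, and the other field names and values are hypothetical. It does not use or represent any EMu import interface.

```python
def fill_blanks(existing, cleaned):
    """Return a copy of `existing` with only its empty fields filled from `cleaned`."""
    merged = dict(existing)
    for field, value in cleaned.items():
        current = merged.get(field)
        is_blank = current is None or str(current).strip() == ""
        # Only write when the existing value is blank and the cleaned value is not.
        if is_blank and value not in (None, ""):
            merged[field] = value
    return merged


def update_records(existing_records, cleaned_records, key="registration_number"):
    """Match cleaned rows to existing records by key and fill blank fields only."""
    cleaned_by_key = {row[key]: row for row in cleaned_records}
    return [
        fill_blanks(record, cleaned_by_key.get(record[key], {}))
        for record in existing_records
    ]


if __name__ == "__main__":
    # Hypothetical sample data: the existing record already has an enhanced rock name.
    existing = [{"registration_number": "R1001", "rock_name": "Granite", "country": ""}]
    cleaned = [{"registration_number": "R1001", "rock_name": "granite?", "country": "England"}]
    print(update_records(existing, cleaned))
```

Here the existing `rock_name` is kept and only the empty `country` field is filled, so cleaned legacy data never replaces a value that someone has already improved.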

  14. The Solution • Google Refine • OpenRefine (GitHub) • Personal web service • Runs in your browser

  15. The demo

  16. Benefits • Intuitive user interface • Powerful editing / data manipulation functions • Can’t make mistakes: endless undo! • Pick up where you left off: history is maintained • Link to open-data sources to validate your data • Augment your data with free open-data sources.
