Discovering & Dealing with Data

Discovering & Dealing with Data. Presented by Kimberly Silk, MLS, Data Librarian, Martin Prosperity Institute, University of Toronto. Agenda. The MPI information environment Common data sources & authority Data management, discovery and access What is Open Data? Big Data?

Presentation Transcript


  1. Discovering & Dealing with Data Presented by Kimberly Silk, MLS, Data Librarian, Martin Prosperity Institute, University of Toronto

  2. Agenda • The MPI information environment • Common data sources & authority • Data management, discovery and access • What is Open Data? Big Data? • Fun with data visualization • Q & A

  3. About the MPI • The Martin Prosperity Institute is an economic think tank; we are part of the Rotman School within the University of Toronto • My client group consists of grad students, post-docs, visiting faculty and researchers who use social-science data to support their research • To support their research process, I procure, curate and preserve data sets and make them discoverable. • The MPI has its own data repository, which has grown to 4 TB in size.

  4. Data Sources • Common & very authoritative sources • StatsCan via the Data Liberation Initiative • Bureau of Labor Statistics, Bureau of Economic Analysis, American FactFinder (Census) • OECD iLibrary • World Bank • International sources such as UK Data Archive, Swedish National Data Service, etc. • Pew Research Center • Gallup

  5. More data sources • Less authoritative?? • Chinese Data Center • Rolling Stone • MySpace • CrunchBase

  6. Data Challenge: Discovery • Lots of research data being collected and added, but no method to manage it, catalogue it, or make it findable • Demands from various clients: faculty, students, researchers, staff, administration • The shared network drive was no longer effective

  7. Show & Share… • We want the world to see our data catalogue • But, we don’t want the world to be able to copy or change what’s in the catalogue, or the catalogue itself • We need to manage access to our data; who are you? Where are you from? Why do you want the data? What are you going to do with it? Will you share your results?

  8. Data Discovery Platforms • I reviewed several platforms that would work in an academic environment: • Nesstar – developed in Norway by Norwegian Social Science Data Services, used by StatsCan, UK Data Archive, NORC at UChicago • Islandora – Open source system based on Fedora developed at UPEI • ODESI – proprietary system developed and used by Scholars Portal • Dataverse – Open source system developed by the Institute for Quantitative Social Science at Harvard, used by NBER, and many academic think tanks.

  9. Dataverse • Dataverse was a good choice since we could install an instance at UToronto, in the UToronto cloud, and I could manage it myself • It was free, and my colleagues at Scholars Portal were interested in installing it, so I was the perfect guinea pig • Slowly, I am cataloguing my data collection; I have set up a lending agreement, and it’s working very well. • Demo: http://dataverse.scholarsportal.info/dvn/dv/mpi

  10. Open Data • Open data is the idea that certain data should be freely available to everyone to use, reuse, and redistribute without restriction. • Governments around the world have begun to “open up” some of their data: US, UK, New Zealand, Norway, Russia, Australia, Morocco, Netherlands, Chile, Spain, Uruguay, France, Brazil, Estonia, Portugal, etc. • State and municipal levels of government have also created open data sites.

  11. Open Data Opportunities… • Governments open up their data to foster better citizenship and improve transparency • Open Data can spur grass-roots innovation: citizens use open data in software programs that solve problems, such as finding a local daycare, knowing when the next bus will come, reporting crime on the fly, or watching congressional proceedings in real time.

  12. … and Challenges • Open Data takes commitment. Successful implementations have a dedicated team of people who decide what data to release according to usefulness and demand • The data must be anonymized, cleansed and in a non-proprietary format • Organizations must be prepared to listen to the citizens, be responsive, and trouble-shoot. • Open data is a public service.
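A minimal sketch of the "anonymized, cleansed and in a non-proprietary format" step, offered as an illustration rather than any particular organization's release process. The field names (resident_id, name, ward, visits), the salt value and both helper functions are hypothetical. Replacing an identifier with a salted hash is strictly pseudonymization; a real release would usually need further review for re-identification risk.

```python
import csv
import hashlib
import io

def anonymize(rows, id_field, drop_fields, salt):
    """Hash one identifier field, drop other identifying fields, trim whitespace."""
    cleaned = []
    for row in rows:
        # Cleansing: strip stray whitespace and leave out directly identifying fields.
        out = {k: v.strip() if isinstance(v, str) else v
               for k, v in row.items() if k not in drop_fields}
        # Pseudonymize: a salted SHA-256 digest stands in for the original identifier.
        digest = hashlib.sha256((salt + str(row[id_field])).encode("utf-8")).hexdigest()
        out[id_field] = digest[:12]
        cleaned.append(out)
    return cleaned

def to_csv(rows):
    """Serialize rows as CSV text, a plain, non-proprietary format."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=list(rows[0].keys()), lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()

# Hypothetical sample records, invented for illustration only.
raw = [
    {"resident_id": "A-1001", "name": " Jane Doe ", "ward": " 12 ", "visits": 3},
    {"resident_id": "A-1002", "name": "John Roe", "ward": "7", "visits": 1},
]
released = anonymize(raw, id_field="resident_id", drop_fields={"name"}, salt="example-salt")
print(to_csv(released))
```

The same salt maps the same identifier to the same hash, so records can still be linked across releases without exposing the original ID.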

  13. Big Data • Big Data is a collection of data sets that is too large for everyday data tools (Access and Excel, for instance) to handle. • Examples come from meteorology, genomics and physics. At MPI we wrestle with large GIS data sets (maps and satellite data), and deal with data at the terabyte (1 trillion bytes) level. • Larger data sets run to petabytes (1 quadrillion bytes) and exabytes (1 quintillion bytes).
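To make the scale of these units concrete, here is a small sketch that formats byte counts using the decimal definitions given on the slide (1 terabyte = 1 trillion bytes). The function name human_size is an assumption made for this example.

```python
# Decimal (power-of-1000) units, matching "1 terabyte = 1 trillion bytes".
UNITS = ["B", "KB", "MB", "GB", "TB", "PB", "EB"]

def human_size(num_bytes):
    """Format a byte count in the largest decimal unit that keeps the value below 1000."""
    size = float(num_bytes)
    for unit in UNITS:
        # Stop at the first unit that fits, or at the largest unit available.
        if size < 1000 or unit == UNITS[-1]:
            return f"{size:g} {unit}"
        size /= 1000

print(human_size(4 * 10**12))  # the 4 TB repository size mentioned on slide 3
print(human_size(10**15))      # one petabyte
print(human_size(10**18))      # one exabyte
```

Each step up the list is a factor of 1000, so an exabyte is a million times larger than a terabyte.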

  14. Data Visualizations • The visual representation of data: literally, a picture can say a thousand [numbers] • Edward Tufte is a key pioneer: http://www.edwardtufte.com/tufte/ • Fantastic examples at Flowing Data: http://flowingdata.com/ • RSA Animate: http://www.thersa.org/

  15. Q & A (and, Thank You!) Kimberly Silk, MLS, Data Librarian, Martin Prosperity Institute, University of Toronto • kimberly.silk@martinprosperity.org
