
Surviving the Data Deluge: Challenges and Opportunities for Academic Research

Kelvin K. Droegemeier, Vice President for Research, Regents’ Professor of Meteorology, University of Oklahoma


Presentation Transcript


  1. Surviving the Data Deluge: Challenges and Opportunities for Academic Research • Kelvin K. Droegemeier, Vice President for Research, Regents’ Professor of Meteorology, University of Oklahoma

  2. A HUGE Spectrum • Zillions of small data: ubiquitous, streaming, unexpected, highly perishable, non-reproducible, geo-referenced, mobile • Huge structured data: anticipated, well defined, streaming, fixed in space, reproducible

  3. The Easy Way Out: Everything is Important so Keep it All!

  4. The Notion of Sharing • Humans are naturally selfish, even as infants, and not sharing is an innate behavior we exhibit at a very early age!

  5. Sharing RESULTS is Foundational to Scholarship and is Getting Easier! (Image: “School of Athens,” Raphael, 1509-1511)

  6. But Sharing DATA? • Today, sharing data typically means providing access to everything that led to the published article • PLOS journals require authors to make all data underlying the findings described in their manuscript fully available without restriction, with rare exceptions • “Data are any and all of the digital materials that are collected and analyzed in the pursuit of scientific advances” (PLOS) • For Science magazine, “all data necessary to understand, assess, and extend the conclusions of the manuscript must be available to any reader, and all codes involved in the creation or analysis of data must also be available.” • The new OSTP directive extends similar expectations to federally funded research

  7. Key Questions We Think About! • Definition of DATA • Who owns it and when, and who decides? • What should be kept and who decides? • How and where should it be kept and who decides? • How is quality assured and who determines it? • Who provides access and who pays? • How is access provided and who decides? • Who is given access and when, and who decides? • When should access be denied and who decides? • Who provides technical assistance in using data? • How should credit be given for generating/maintaining data, and who decides? • Who ensures and pays for compliance?

  8. The Notion of Sharing • The academy encourages collaboration but still tends to evaluate and reward individual achievement, thus reinforcing at least some caution about sharing

  9. So Why Would We Share? • Because it is required • To enhance our own scholarship • To enhance our discipline • To enhance our academic program • To enhance our institution • To bring benefits to our nation • To improve the world

  10. The Research Spectrum • Disciplinary • Multi-Disciplinary • Interdisciplinary • Transdisciplinary (figure: D. Lightfoot, NSF)

  11. Research at the Boundaries (figure domains: Social and Behavioral Sciences; Physical Science; Technology and Engineering; Policy; Economics) • Open Data will help (force) us to more quickly and effectively develop common vocabularies and build trust among researchers across disciplines

  12. The Value Proposition of Open Data Access • In Research: • Replication/reproducibility of previous results • New interpretation and analysis, at scale, of heterogeneous data sets (an open market approach) • Opportunities for collaboration, likely numerous and multi-disciplinary • Follow-on studies and thus faster progress • Identification of previously unknown problems • Greater accountability and responsible conduct of research • Research experience in K-16 education • Credit for gathering, generating, and making data available • In Teaching: • Online and engaged learning • In Economic Development: • Use of data in corporate R&D analytics

  13. Question: Will Open Data Access Make a Difference?

  14. Question: Will Open Data Access Make a Difference? (Image annotation: 550 deaths)

  15. Open Data May Facilitate Understanding! • Numerical simulation: 24 hours of CPU time = 1 hour of real time, producing 20 TB of output, and we are still trying to understand it • Mother Nature: real time! And we are still trying to understand it

  16. And Provide a Pathway for Saving Lives • Social and behavioral science research, especially mining of social media data, is critical for understanding why people die in tornadoes • Data from those studies, coupled with output from computer weather models, are being used to re-envision the entire severe weather warning system • More eyes on the problem = better outcome

  17. Google Earth and Apple iPhone as Possible Role Models • Provide the substrate (base code and APIs) and open it up to the world • In the context of open access, that substrate is data and possibly code • May draw in young people and those from underrepresented groups • Availability of tools is essential, and open access (OA) data may lead to a vast array of new resources for analyzing data, especially using cloud services • Today no real market for such resources exists, but OA could create one!

  18. The Challenges of Open Data Access in Research • Misuse of data owing to misunderstanding, leading to battles about results • Burden on researchers to explain data and deal with access problems • Applying analytics to mine vast amounts of information and make sense of it • Misappropriation of credit

  19. Broader Challenges of Open Data Access • Privacy: Σ (tiny bits) = life story or valuable intelligence, i.e., individually harmless bits of data can add up to a person’s life story • Workforce is a HUGE challenge that cuts across all disciplines. How do we educate people to work this way? • Tools for data analytics that are extensible and format/domain-agnostic

  20. View from a Vice President for Research • Key Point #1: Facilitating Research With Data • A means to tackle some of the most compelling intellectual challenges at the intersections of multiple disciplines • It’s not only about providing access but also about helping ensure EFFECTIVE use of data, which is not automatic! • The institution must help: bring people together, stimulate conversations, bridge language barriers, build trust, guide thinking, provide support • The library has a unique role to play: a renaissance as the intellectual commons of our campuses • Just as IT has opened new doors and brought people and disciplines together, so can the “data challenge” if we handle it properly! • Especially critical for engagement of the social sciences and the humanities

  21. View from a Vice President for Research • Key Point #2: Providing Credit and Support • A system (but, importantly, also a philosophy) for giving faculty credit for generating, maintaining, and provisioning data (similar to how intellectual property (IP) has been added to the portfolio) • Provost, Deans, Tenure Committees, Senior Faculty • Building data stewardship into research metrics such as citations, impact factors, etc. • Creation of persistent identifiers/tags (EZID, DataCite consortium), but also defining WHAT credit means • Creation of an indirect cost component for data and compliance

  22. View from a Vice President for Research • Key Point #3: Logistics and Cost • Coordination and consolidation of data management approaches across the institution: Provost, Library Dean, CIO, VPR • Appropriate cyberinfrastructure, security, systems-level approach • Integration into the broader academic ecosystem • The strategies in which we invest today may be quite different in a short time – shifting sands • Division of responsibilities (local and national) • When will the dust settle regarding policies? • Unity versus diversity in approaches

  23. View from a Vice President for Research • Key Point #4: Changing Times • Place-based education, especially at the undergraduate level, means changing how we educate students and bringing experiential learning to every student (i.e., undergraduate research and creative activity). Open Access offers HUGE possibilities! • Improved access to historical data may slow the pace of new data creation yet spur support for sustainability • Cost models may be impacted by disruptive technologies and ideas (e.g., a MOOC analog in the IT sector) • Workforce creation is key, especially for dealing with the nuances of statistics and blending data from highly disparate sources, noting that curation and analytics are intertwined

  24. Assessing Impact • Assessing impact is critical if we are to understand the real benefits (and possible drawbacks) of open data access • We must link assessment to value proposition points described earlier and be open to new ideas of measuring value (e.g., drawing in underrepresented groups to STEM fields, creation of new domains such as digital humanities) • Assessment also is critical for modifying policies to create even greater benefits and address drawbacks

  25. Blue Sky Thoughts: Caution – A Forecast from a Meteorologist! • Open data will generate completely new models for collaborative interaction: a new “social media” in research centered on sharing, not simply communicating. This will lead to new tools and approaches (e.g., Tableau) • New disciplines likely will arise when researchers from one discipline start “playing around” with data from others, and these new perspectives will reveal new insights, solve longstanding problems, and raise new questions. This is the “Fourth Paradigm” of data-intensive science.

  26. Blue Sky Thoughts: Caution – A Forecast from a Meteorologist! • The entire notion of a “publication” will change, leading to open-ended online research release events in which new analyses and data will continually be added to previously reported outcomes, creating branches and forks in time, with other disciplines linked in a virtual intellectual commons • Access to data, and tools for analysis, will enhance interest and exploration by children and offer greater opportunity to increase participation by traditionally underrepresented groups

  27. The End Game for Research • Open Data will enable NEW and better ways of doing old things + lots of NEW things • Reproducibility (repeat/verify) • Creatibility (frontier) • Extensibility (extend/expand) • Collaborability (across)
