Large Scientific Databases

Large scientific datasets are those which are systematically collected and organized and which stretch the technical capabilities of our species to store, manipulate, and distribute data for scientific investigation, thereby limiting that investigation.

mciotti

Presentation Transcript


  1. Large Scientific Databases

  2. Large scientific datasets are those which are systematically collected and organized and which stretch the technical capabilities of our species to store, manipulate, and distribute data for scientific investigation, thereby limiting that investigation.

  3. What is a “small” dataset? “Only a few hundred gigabytes.” • Alex Szalay

  4. What about non-scientific databases? • Why not Google?

  5. Fields producing these datasets • Observational data • Earth and space sciences • Astronomy and Astrophysics • Space Physics • Atmospheric Science • Geoscience • Ocean Science • Experimental Laboratory Data • CERN • [From Preserving Scientific Data on Our Physical Universe (Washington, National Academy Press: 1995)]

  6. Observations • The datasets these fields are collecting are huge and will grow. • These datasets stretch the technical capabilities of what our species can do with computer applications and hardware, thus limiting what we can learn. • There are bottlenecks in storage, manipulation, and distribution. • There is not enough bandwidth for scientific use of datasets at the sizes that now exist.

  7. More observations • Other disciplines may already have solutions to some of the problems that scientists working with large datasets are wrestling with. • Library & Information Science • Graphics • Hardware and software vendors • Scientists shouldn't all have to reinvent everything separately

  8. Is there a field? • Connections between scientists working on large datasets appear to be informal • Assembling scientists working with large datasets will be useful because different ones may have solved different problems already or may have useful insights to share • There is an extensive literature but it is technical and largely not self-aware

  9. Is there a field? (continued) • On a broader scale, if in 10 years these datasets can be put on a desktop computer, scientists will be out gathering even bigger datasets. It is what humans do. • Can principles be derived from current experience that will help deal with those future, larger limits? • Can we focus on this aspect of science?

  10. Ancillary issues • Policy • Characteristics of the data • etc.

  11. What next? • Conference? • Gather • The scientists • Vendors • Disciplines that might help the scientists • Literature review
