Data Curation Issues and Challenges
ARL/CNI Fall Forum 2008
Sayeed Choudhury, sayeed@jhu.edu

Data Flow (Levels of Data)
• Pixel data collected by telescope
• Sent to Fermilab for processing
• Beowulf Cluster produces catalog
• Loaded in a SQL database
Courtesy of Alex Szalay

Key Considerations
• Work with existing scientific systems
• Consider gateways for these systems as part of infrastructure development
• Focus on both human and technical components of infrastructure
• Human interoperability is more difficult than technical interoperability
• Trust

Questions (1)
• How do we transfer principles into new practices, especially given scale and complexity?
• What are the fundamental differences between data and collections? Human readable vs. machine readable?
• What about the “cloud” or the “crowd”?
• Can flickr help us with data curation?

Questions (2)
• How does a partnership audit data (and associated services) distributed across the network?
• Are audits about “completeness” or perhaps about transparency and reliability?
• Where are the existing data curators? Maybe we shouldn’t use the terms data librarian or data scientist or humanist.

Questions (3)
• What are the requirements? Are there common requirements, which may be the most appropriate area for libraries?
• Are there unifying concepts or themes? “One scientist’s noise is another scientist’s signal…”
• What are we trying to sustain? Data? Scholarship? Our organizations?