Dataset Gap Analysis/Prioritization Plan - PowerPoint Presentation

Presentation Transcript

  1. Dataset Gap Analysis/Prioritization Plan
     Michelle Gierach, PO.DAAC Project Scientist
     2012 PO.DAAC User Working Group (UWG) Meeting, March 7-8, 2012

  2. Dataset Gap Analysis
     Rec. 6: Do a dataset gap analysis and create a report.
     Rec. 19: PO.DAAC to provide climatologies, anomalies, indices, and various dataset statistics for selected datasets.
     Status:
     • A dataset gap analysis document was created that details the datasets currently available, and those that will soon be available, in the ocean community.
     • Available climatologies, anomalies, and other value-added products (e.g., fluxes, frontal gradients) were included in the document.
     • Approx. 100 datasets were listed based on input from the PO.DAAC User Working Group (UWG), Project Science Team (PST), and Data Engineers (DEs).
     Future Plans:
     • Send requests for information twice a year to the UWG, PST, DEs, and NASA science teams regarding additional datasets.
     • After this initial phase of acquiring available datasets, the next step in FY13 will be to identify where gaps still exist and either work with the community to create additional climatologies, anomalies, and indices or create them within PO.DAAC.
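Rec. 19 refers to climatologies and anomalies. As a rough illustration of what those value-added products are, here is a minimal sketch, assuming a single-location time series stored as (year, month, value) tuples: a monthly climatology is the mean for each calendar month across all years, and an anomaly is an observation minus its month's climatological mean. The function names and sample values are hypothetical; gridded PO.DAAC products would normally be processed with array libraries rather than plain lists.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical helpers: each record is (year, month, value) for one location.


def monthly_climatology(records):
    """Mean value for each calendar month, averaged across all years."""
    by_month = defaultdict(list)
    for _year, month, value in records:
        by_month[month].append(value)
    return {month: mean(values) for month, values in sorted(by_month.items())}


def monthly_anomalies(records):
    """Departure of each observation from its month's climatological mean."""
    records = list(records)  # allow one-shot iterators; the data is traversed twice
    clim = monthly_climatology(records)
    return [(year, month, value - clim[month]) for year, month, value in records]


# Made-up sea surface temperatures (deg C) for two Januaries and two Julys.
sst = [(2010, 1, 20.0), (2011, 1, 22.0), (2010, 7, 26.0), (2011, 7, 25.0)]
print(monthly_climatology(sst))  # {1: 21.0, 7: 25.5}
print(monthly_anomalies(sst))    # [(2010, 1, -1.0), (2011, 1, 1.0), (2010, 7, 0.5), (2011, 7, -0.5)]
```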

  3. Dataset Gap Analysis Prioritization
     Now that we have a document listing ~100 datasets that would benefit our users, how do we prioritize them? Past prioritization has been subjective and ad hoc. We need a system that is unbiased and provides quantitative measures of a dataset's significance.

  4. Dataset Lifecycle Phases
     • Identify a Dataset of Interest
     • Green-Light the Dataset
     • Tailor the Dataset Policy
     • Ingest the Dataset
     • Archive the Dataset
     • Register/Catalog the Dataset
     • Distribute the Dataset
     • Verify the Dataset
     • Roll Out the Dataset
     • Maintain the Dataset
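The phases read as a linear sequence. A minimal sketch, assuming a dataset moves through them strictly in the listed order (the slide does not say whether phases can be skipped or revisited), models them as an ordered enumeration; `LifecyclePhase` and `next_phase` are hypothetical names.

```python
from enum import IntEnum
from typing import Optional


class LifecyclePhase(IntEnum):
    """Dataset lifecycle phases, numbered in the order listed on the slide."""
    IDENTIFY = 1          # Identify a Dataset of Interest
    GREEN_LIGHT = 2       # Green-Light the Dataset
    TAILOR_POLICY = 3     # Tailor the Dataset Policy
    INGEST = 4
    ARCHIVE = 5
    REGISTER_CATALOG = 6  # Register/Catalog the Dataset
    DISTRIBUTE = 7
    VERIFY = 8
    ROLLOUT = 9
    MAINTAIN = 10


def next_phase(phase: LifecyclePhase) -> Optional[LifecyclePhase]:
    """Phase that follows `phase`, or None once the dataset is in maintenance."""
    if phase is LifecyclePhase.MAINTAIN:
        return None
    return LifecyclePhase(phase + 1)


print(next_phase(LifecyclePhase.INGEST).name)  # ARCHIVE
```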

  5. Prioritization Plan Flow Chart: NASA Process

  6. Prioritization Plan Flow Chart
     [Flow chart boxes: Identify Datasets; Dataset Classification; Obligated Dataset List; Non-Obligated Dataset List; Significance; Ranked Dataset List; Cost Analysis; Recommendation; outcomes: Archive, Remote Link, Reject Dataset]
     Comments/Thoughts/Questions?
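The extracted slide text lists the flow chart's boxes but not its arrows, so any code here is a guess. One plausible reading, sketched below, is that obligated datasets go straight to archiving, while non-obligated datasets are scored for significance, costed, and then archived, offered as a remote link, or rejected. The cutoff value, the pass/fail cost flag, and all names (`Candidate`, `recommend`, `SIGNIFICANCE_CUTOFF`) are hypothetical.

```python
from dataclasses import dataclass
from enum import Enum


class Outcome(Enum):
    ARCHIVE = "Archive"
    REMOTE_LINK = "Remote Link"
    REJECT = "Reject Dataset"


@dataclass
class Candidate:
    name: str
    obligated: bool        # Step 1 classification
    significance: float    # Step 2 score (0-100); unused for obligations
    cost_acceptable: bool  # Step 3 result, simplified to pass/fail (assumption)


# Hypothetical threshold: the slides do not state one.
SIGNIFICANCE_CUTOFF = 50.0


def recommend(c: Candidate) -> Outcome:
    """Step 4 recommendation under assumed routing rules."""
    if c.obligated:
        return Outcome.ARCHIVE  # assumed: obligations bypass Steps 2-3
    if c.significance < SIGNIFICANCE_CUTOFF:
        return Outcome.REJECT
    if c.cost_acceptable:
        return Outcome.ARCHIVE
    return Outcome.REMOTE_LINK  # worth offering, but too costly to host locally


print(recommend(Candidate("example-dataset", False, 80.0, False)).value)  # Remote Link
```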

  7. Prioritization Plan Flow Chart
     [Flow chart from slide 6 with Step 1 highlighted: Step 1a, Identify Datasets; Step 1b, Dataset Classification]
     Comments/Thoughts/Questions?

  8. Prioritization Plan Flow Chart
     [Flow chart from slide 6 with Step 2 highlighted: Significance]
     Comments/Thoughts/Questions?

  9. Prioritization Plan Flow Chart
     [Flow chart from slide 6 with Step 3 highlighted: Cost Analysis]
     Comments/Thoughts/Questions?

  10. Prioritization Plan Flow Chart
     [Flow chart from slide 6 with Step 4 highlighted: Recommendation]
     Comments/Thoughts/Questions?

  11. Prioritization Plan Flow Chart
     [Flow chart from slide 6 with Step 1 highlighted: Step 1a, Identify Datasets; Step 1b, Dataset Classification]

  12. Step 1: Dataset Identification/Classification
     Approx. 100 datasets were identified within the oceanographic community. Seven of these were classified as PO.DAAC obligations:
     • L2B reprocessed QuikSCAT data (JPL)
     • L2C QuikSCAT data (JPL)
     • MEaSUREs CCMP-like product (Bourassa)
     • GHRSST Pathfinder 5.2 SST
     • GHRSST Global Ocean Sea Surface Temperature Multi Product Ensemble (GMPE)
     • GHRSST Global Ocean OSTIA Sea Surface Temperature Anomaly
     • GHRSST Global Ocean OSTIA Sea Surface Temperature Anomaly Reanalysis
     First priority is given to datasets labeled as PO.DAAC obligations.
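Step 1 amounts to splitting the identified datasets into two lists and serving obligations first. A minimal sketch, assuming each dataset carries a boolean obligation flag; the `Dataset` type, the `classify` function, and the non-obligated entry are hypothetical, while the two obligated names come from the list above.

```python
from typing import List, NamedTuple, Tuple


class Dataset(NamedTuple):
    name: str
    obligated: bool  # Step 1b: classified as a PO.DAAC obligation?


def classify(datasets: List[Dataset]) -> Tuple[List[Dataset], List[Dataset]]:
    """Split identified datasets into obligated and non-obligated lists,
    preserving the original order within each list."""
    obligated = [d for d in datasets if d.obligated]
    non_obligated = [d for d in datasets if not d.obligated]
    return obligated, non_obligated


identified = [
    Dataset("Example non-obligated dataset", False),  # hypothetical entry
    Dataset("GHRSST Pathfinder 5.2 SST", True),
    Dataset("L2C QuikSCAT data (JPL)", True),
]
obligated, non_obligated = classify(identified)
work_queue = obligated + non_obligated  # obligations take first priority
print([d.name for d in work_queue])
```

In the full plan the non-obligated list would be ranked in Step 2 before being appended, rather than kept in its original order.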

  13. Prioritization Plan Flow Chart
     [Flow chart from slide 6 with Step 2 highlighted: Significance]

  14. Step 2: Significance of Non-Obligated Datasets
     Decisional Criteria
     • Access: Readily available? Foreign repository? Behind firewalls or on open FTP?
     • Toolkits: Data visualization routine? Data reader? Verified reader/subroutine?
     • Relationships: Sibling/child datasets identified? Motivation/justification identified?
     • Rarity: Hard-to-find data? Atypical sensor, resolution, etc.?
     • Specification: Spatial/temporal resolution; spatial coverage; start time; end time; data format? Exotic data structure? Expected size/volume?
     • Community Assessment: Papers written/number of citations; number of likes; number of downloads/views
     • Technical Quality: QQC + latency/gappiness; accuracy; sampling issues? Caveats/known issues identified?
     • Processing: Has it been manipulated? Cal/Val state? Verification state?
     • Provenance: Maturity of platform/instrument/sensor; maturity of program; parent datasets identified (if applicable); Is the sensor fully described? Is the context of the reading(s) fully described? State-of-the-art technology?
     • Documentation: What is the state of the documentation? Is the documentation captured (archived)?
     • Adherence to Process Guidelines: Did it get fast-tracked? Many waivers? Were all exit criteria met satisfactorily? Consistent use of units?
     Comments/Thoughts/Questions?

  15. Step 2: Significance of Non-Obligated Datasets
     Prioritization criteria used to assess a non-obligated dataset's significance (scores in parentheses):
     • Source: the mission or project the dataset is associated with.
        • PO.DAAC-centric NASA mission/project (1)
        • Non-PO.DAAC-centric NASA mission/project (0.75)
        • Domestic (non-NASA) mission/project (0.5)
        • International mission/project (0.25)
     • Uniqueness: Would this be a new and/or one-of-a-kind dataset for PO.DAAC? Yes/No (1/0)
     • Desirability: Is there a need or demand for this dataset in the community? High/Medium/Low (1/0.5/0)
     • Maturity (1st order): Community recognition? Technical quality? Dataset specifics? High/Medium/Low (1/0.5/0)
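The criteria above translate categorical judgments into numbers. A minimal sketch of that lookup, assuming the answer labels used as dictionary keys; the labels and `criterion_scores` are hypothetical, while the numeric values are the ones listed on the slide.

```python
# Hypothetical labels for the answer options; the numeric values are from the slide.
SOURCE_SCORES = {
    "PO.DAAC-centric NASA": 1.0,
    "Non-PO.DAAC-centric NASA": 0.75,
    "Domestic (non-NASA)": 0.5,
    "International": 0.25,
}
LEVEL_SCORES = {"High": 1.0, "Medium": 0.5, "Low": 0.0}  # Desirability and Maturity


def criterion_scores(source: str, unique: bool, desirability: str, maturity: str) -> dict:
    """Map one dataset's categorical assessment onto the slide's 0-1 scores.
    Raises KeyError for an answer that is not one of the listed options."""
    return {
        "source": SOURCE_SCORES[source],
        "uniqueness": 1.0 if unique else 0.0,
        "desirability": LEVEL_SCORES[desirability],
        "maturity": LEVEL_SCORES[maturity],
    }


print(criterion_scores("International", False, "Medium", "High"))
# {'source': 0.25, 'uniqueness': 0.0, 'desirability': 0.5, 'maturity': 1.0}
```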

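  16. Step 2: Example
     Score = (Source_Score*25) + (Unique_Score*20) + (Desirability_Score*30) + (Maturity_Score*25)
     Because the weights sum to 100 and each criterion score is at most 1, the maximum possible score is 100.
     Datasets are sorted into 4 prioritization groups: 1st tier (green); 2nd tier (yellow); 3rd tier (orange); 4th tier (pink).
     Comments/Thoughts/Questions?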
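As a worked example with a hypothetical dataset: a non-PO.DAAC-centric NASA product (0.75) that is unique (1), with medium desirability (0.5) and high maturity (1), scores 0.75*25 + 1*20 + 0.5*30 + 1*25 = 18.75 + 20 + 15 + 25 = 78.75. The sketch below computes that score and sorts datasets into the four tiers. The weights are from the slide; the tier cutoffs (75, 50, 25) are assumptions, since the slide names the tiers but not their boundaries, and all function names are hypothetical.

```python
# Weights from the slide; they sum to 100, so the maximum score is 100.
WEIGHTS = {"source": 25, "uniqueness": 20, "desirability": 30, "maturity": 25}

# Hypothetical tier boundaries: the slide names four tiers but not their cutoffs.
TIER_CUTOFFS = [
    (75.0, "1st tier (green)"),
    (50.0, "2nd tier (yellow)"),
    (25.0, "3rd tier (orange)"),
]


def significance_score(scores: dict) -> float:
    """Weighted sum of the four 0-1 criterion scores."""
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS)


def tier(score: float) -> str:
    """Assign a prioritization group using the hypothetical cutoffs above."""
    for cutoff, label in TIER_CUTOFFS:
        if score >= cutoff:
            return label
    return "4th tier (pink)"


def rank(candidates: dict) -> list:
    """Return (name, score, tier) tuples, highest score first."""
    scored = [(name, significance_score(s)) for name, s in candidates.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return [(name, score, tier(score)) for name, score in scored]


candidates = {  # hypothetical datasets
    "dataset B": {"source": 0.25, "uniqueness": 0.0, "desirability": 0.5, "maturity": 0.5},
    "dataset A": {"source": 0.75, "uniqueness": 1.0, "desirability": 0.5, "maturity": 1.0},
}
for row in rank(candidates):
    print(row)
```

Running it prints dataset A (78.75, 1st tier) ahead of dataset B (33.75, 3rd tier).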

  17. Prioritization Plan Flow Chart
     [Flow chart from slide 6 with Step 3 highlighted: Cost Analysis]

  18. Step 3: Cost Analysis
     Cost Decisional Criteria
     • Access: Readily available? Foreign repository? Behind firewalls or on open FTP?
     • Toolkits: Data visualization routine? Data reader? Verified reader/subroutine?
     • Relationships: Sibling/child datasets identified? Motivation/justification identified?
     • Rarity: Hard-to-find data? Atypical sensor, resolution, etc.?
     • Specification: Spatial/temporal resolution; spatial coverage; start time; end time; data format? Exotic structure? Expected size/volume?
     • Community Assessment: Papers written/number of citations; number of likes; number of downloads/views
     • Technical Quality: QQC + latency/gappiness; accuracy; sampling issues? Caveats/known issues identified?
     • Processing: Has it been manipulated? Cal/Val state? Verification state?
     • Provenance: Maturity of platform/instrument/sensor; maturity of program; parent datasets identified (if applicable); Is the sensor fully described? Is the context of the reading(s) fully described? State-of-the-art technology?
     • Documentation: What is the state of the documentation? Is the documentation captured (archived)?
     • Adherence to Process Guidelines: Did it get fast-tracked? Many waivers? Were all exit criteria met satisfactorily? Consistent use of units?
     Comments/Thoughts/Questions?

  19. Prioritization Plan Flow Chart
     [Flow chart from slide 6 with Step 4 highlighted: Recommendation]