Managing and Curating Data

zenobia

Presentation Transcript

  1. Managing and Curating Data Chapter 8

  2. Introduction • Data organization • Data management • Data curation • Raw data is required to repeat a scientific study • Data collected with public funding must, in many cases, be made available to other scientists and the public

  3. Step 1: Managing Raw Data • Various sources of data • Data loggers • Handwritten notes • This data must be transferred to an organized format, checked and analyzed

  4. Spreadsheets • Row: single observation • Column: single measured or observed variable • Enter data ASAP! • Mistakes are detected sooner • Memory of the fieldwork doesn’t last long • Keep 2 copies • Allows timely analysis • Proofread the data • Check it • (Figure: 2006 Garden Yield)
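
The row-per-observation, column-per-variable layout can be sketched with Python's standard `csv` module. The column names and values below are invented for illustration and do not come from the presentation.

```python
import csv
import io

# Hypothetical garden-yield records: each row is one observation,
# each column is one measured or observed variable. Values are made up.
raw = """plot,crop,date,biomass_g
1,tomato,2006-07-14,412.5
2,tomato,2006-07-14,388.0
3,squash,2006-07-15,951.2
"""

rows = list(csv.DictReader(io.StringIO(raw)))
columns = list(rows[0].keys())

# Proofreading aid: every row should carry a value for every column.
complete = all(all(r[c] not in (None, "") for c in columns) for r in rows)

print(columns)
print(len(rows), "observations; all cells filled:", complete)
```

Keeping one variable per column means each column can later be summarized and range-checked on its own.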

  5. Metadata: Data about data • “Must have” metadata: • Name and contact info of collector • Location of data collection • Name of study • Source of funding • Description of the organization of the data file • Methods used to collect the data • Types of experimental units • Description of abbreviations • Explicit description of data in columns and rows • May be created before data collection in some cases • Very important to assemble promptly because the details are easily forgotten
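
One possible way to keep the “must have” items together is a small metadata record stored next to the data file. This is a sketch only: the key names are assumptions chosen to mirror the list above, and every value is a placeholder.

```python
import json

# Placeholder values; keys are hypothetical names mirroring the "must have" list.
metadata = {
    "collector": {"name": "A. Researcher", "contact": "a.researcher@example.org"},
    "location": "Example field station",
    "study_name": "Example garden yield study",
    "funding_source": "Example funding agency",
    "file_organization": "One row per plot per sampling date",
    "methods": "Above-ground biomass harvested and weighed",
    "experimental_units": "1 m x 1 m garden plots",
    "abbreviations": {"NA": "not available"},
    "columns": {"plot": "plot identifier", "biomass_g": "fresh biomass in grams"},
}

REQUIRED = [
    "collector", "location", "study_name", "funding_source",
    "file_organization", "methods", "experimental_units",
    "abbreviations", "columns",
]

# Flag any required item that is absent or empty before the record is saved.
missing = [key for key in REQUIRED if not metadata.get(key)]
text = json.dumps(metadata, indent=2)

print("missing items:", missing)
```

Checking for missing items at save time helps catch details before they are forgotten.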

  6. Step 3: Checking the Data • Outliers: values of measurements or observations that are outside the range of the bulk of the data • Values beyond the upper or lower deciles (above the 90th percentile or below the 10th percentile) • Outliers increase the variance in the data and increase the chance of a Type II error
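
The decile rule can be sketched with `statistics.quantiles` from the Python standard library, using the vegetable biomass values from the stem-and-leaf slide of this presentation. The cut points depend on the quantile method; the library's default ("exclusive") method is an assumption here, not something the slides specify.

```python
import statistics

# Vegetable biomass values from the stem-and-leaf example in this presentation.
biomass = [7, 15, 35, 36, 37, 23, 27, 21, 42, 55]

# n=10 returns the nine decile cut points; the first and last
# are the 10th and 90th percentiles.
deciles = statistics.quantiles(biomass, n=10)
low, high = deciles[0], deciles[-1]

# Values beyond either decile are candidates for a closer look.
flagged = sorted(v for v in biomass if v < low or v > high)

print("10th percentile:", low)
print("90th percentile:", high)
print("flagged:", flagged)
```

A decile rule flags the extremes of any data set by construction, so flagged values are prompts for inspection rather than proof of error.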

  7. How to deal with outliers • Do not delete them; this could be considered fraud • Only delete a value if it is an error or the data are no longer valid • Think about them • They may suggest interesting hypotheses • A large body of science is devoted to outliers • What type of distribution does your data have?

  8. Errors and Missing Data • Errors are often outliers and can be identified • Sources: mistyping (decimal points), instrument error, field entry • Checking data can reduce errors • Never leave blank cells in spreadsheets; enter a zero for a true zero value or NA (not available) for a missing value
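
A minimal sketch of how a zero, an NA, and a blank cell might be handled differently when a column is read. The function name `parse_cell` and the sample column are hypothetical.

```python
def parse_cell(cell):
    """Return a float, or None for a cell marked NA. Blank cells are rejected."""
    text = cell.strip()
    if text == "":
        # A blank is ambiguous: it could mean zero, missing, or not yet entered.
        raise ValueError("blank cell: enter 0 or NA explicitly")
    if text.upper() == "NA":
        return None
    return float(text)

# Made-up column: "0" is a real measurement, "NA" is a missing one.
column = ["12.5", "0", "NA", "7.25"]
values = [parse_cell(c) for c in column]

# Missing values are excluded from the mean; true zeros are kept.
present = [v for v in values if v is not None]
mean = sum(present) / len(present)

print(values)
print("n =", len(present), "mean =", mean)
```

Treating NA as zero would pull the mean down, which is why the two are kept distinct.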

  9. Detecting Outliers and Errors • Three techniques • Calculating column statistics • Checking ranges and precision of column values • Graphical exploratory data analysis

  10. Detecting Outliers and Errors cont. • Column stats: • Mean, median, standard deviation, variance • Logical functions to check your columns • Range checking your data
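
Column statistics and a range check might look like the following sketch. The values and the plausible range are invented; 395.0 imitates a misplaced decimal point.

```python
import statistics

# Hypothetical biomass column (grams); 395.0 mimics a decimal-point typo.
values = [41.2, 38.8, 40.1, 395.0, 39.6]
LOW, HIGH = 0.0, 100.0  # assumed plausible range for this variable

summary = {
    "mean": statistics.mean(values),
    "median": statistics.median(values),
    "stdev": statistics.stdev(values),        # sample standard deviation
    "variance": statistics.variance(values),  # sample variance
}

# Logical check: list (row index, value) pairs that fall outside the range.
out_of_range = [(i, v) for i, v in enumerate(values) if not (LOW <= v <= HIGH)]

for name, value in summary.items():
    print(name, round(value, 2))
print("out of range:", out_of_range)
```

A mean far above the median is itself a hint that a single large value is distorting the column.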

  11. Graphical Exploratory Data Analysis • Box plots (univariate) • Stem-and-leaf plots (univariate) • Scatterplots (bivariate or multivariate)

  12. Stem-and-leaf plots • Example: Vegetable biomass: 7, 15, 35, 36, 37, 23, 27, 21, 42, 55

      Stem | Leaf
      0    | 7
      1    | 5
      2    | 1, 3, 7
      3    | 5, 6, 7
      4    | 2
      5    | 5
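
The same stem-and-leaf plot can be produced with a few lines of Python. This is a sketch that assumes non-negative integers, with the tens digit as the stem and the units digit as the leaf; the function name is hypothetical.

```python
from collections import defaultdict

def stem_and_leaf(values):
    """Return stem-and-leaf lines for non-negative integers (stem = tens, leaf = units)."""
    leaves = defaultdict(list)
    for v in sorted(values):
        stem, leaf = divmod(v, 10)
        leaves[stem].append(leaf)
    lines = []
    # Include empty stems so gaps in the distribution stay visible.
    for stem in range(min(leaves), max(leaves) + 1):
        leaf_text = " ".join(str(leaf) for leaf in leaves[stem])
        lines.append(f"{stem} | {leaf_text}".rstrip())
    return lines

biomass = [7, 15, 35, 36, 37, 23, 27, 21, 42, 55]
plot = stem_and_leaf(biomass)
print("\n".join(plot))
```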

  13. Scatter plots • Use to see how traits relate to one another

  14. Creating an Audit Trail • Examining data for outliers and errors is a QA/QC for research • Document how you perform QA/QC in your metadata • Your audit trail allows others to reanalyze and recreate your results • May be required for legal documentation
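
One simple form an audit trail could take is a timestamped log of each QA/QC action, saved alongside the metadata. The `record` helper, the field names, and the logged entries are all hypothetical.

```python
import json
from datetime import datetime, timezone

audit_log = []

def record(action, detail):
    """Append one QA/QC step, with a UTC timestamp, to the audit log."""
    audit_log.append({
        "time": datetime.now(timezone.utc).isoformat(),
        "action": action,
        "detail": detail,
    })

# Made-up entries showing the kind of decisions worth documenting.
record("range_check", "biomass_g checked against assumed range 0-100 g")
record("outlier_review", "row 4 value 395.0 flagged; compared with field notebook")

# One JSON object per line keeps the log easy to append to and to read back.
log_text = "\n".join(json.dumps(entry) for entry in audit_log)
print(log_text)
```

Recording what was checked and why, rather than only the final values, is what lets someone else retrace the analysis.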