
Lecture 3. Statistical Vocabulary & data management

Zihaohan Sang, Sept 10, 2019

Presentation Transcript


  1. Lecture 3. Statistical Vocabulary & data management Zihaohan Sang Sept 10, 2019

  2. Week 2! • Basic statistical vocab + data management • Exploratory graphics

  3. Statistical vocab

  4. Research topic: drought resistance in Trembling Aspen

  5. Here is the distribution of lodgepole pine. Do these samples (marked by stars) represent the population?

  6. Take-home message: • Make sure the samples fully represent the population you want to study. • To reduce uncertainty caused by random chance, the more general the sampling, the better.

  7. Data types in R • Numeric: Discrete: integer (1, 5, 100); Continuous: numbers with decimal digits (1.1, 5.0, 100.3) • Categorical: Nominal: character or factor (species, locations); Ordinal: ordered factor (‘Good’, ‘Med’, ‘Poor’), levels: Poor < Med < Good • Logical: TRUE/FALSE
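The types listed on this slide can be inspected directly in R. The following is a minimal sketch; all variable names and values are hypothetical and are not taken from the lecture data.

```r
# Hypothetical example values; none of these come from the lecture data.
n_trees  <- 5L                                   # discrete numeric (integer)
height_m <- 5.3                                  # continuous numeric (double)
species  <- factor(c("aspen", "pine", "aspen"))  # nominal (factor)
vigour   <- factor(c("Good", "Poor", "Med"),
                   levels  = c("Poor", "Med", "Good"),
                   ordered = TRUE)               # ordinal (ordered factor)
is_alive <- TRUE                                 # logical

class(n_trees)          # "integer"
class(height_m)         # "numeric"
class(species)          # "factor"
class(vigour)           # "ordered" "factor"
levels(vigour)          # "Poor" "Med" "Good"
vigour[1] > vigour[2]   # TRUE, because Good > Poor in the declared order
```

Declaring the levels explicitly matters for ordinal data: without `levels = ...`, R would sort them alphabetically (Good, Med, Poor), which is not the intended order.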

  8. Notes: • Use as.factor() or as.numeric() to force a variable into the type you want. • In R versions before 4.0.0, read.csv() automatically reads character columns as factors (levels in alphabetical order); since R 4.0.0 the default is stringsAsFactors = FALSE, so they are read as character. • If a column contains even one entry with letters, R classifies the whole column as character or factor.
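A short sketch of these notes in practice. The values and column names below are made up for illustration; the pitfall shown at the end (calling as.numeric() directly on a factor) is not on the slide but is a common source of errors when forcing types.

```r
# Hypothetical column: numbers with one stray letter entry.
raw <- c("10", "2.5", "x", "7")
class(raw)                    # "character": one letter makes the whole column text

# as.numeric() turns entries it cannot parse into NA (with a warning).
clean <- suppressWarnings(as.numeric(raw))
clean                         # 10.0  2.5   NA  7.0

# as.factor() sorts levels alphabetically by default.
site <- as.factor(c("north", "east", "south"))
levels(site)                  # "east" "north" "south"

# Reading a CSV: stringsAsFactors = TRUE reproduces the pre-4.0.0 default.
csv_text <- "tree,dbh\nA1,10\nA2,x\n"
d <- read.csv(text = csv_text, stringsAsFactors = TRUE)
class(d$dbh)                  # "factor", because of the "x" entry

# Pitfall: as.numeric() on a factor returns the internal level codes,
# not the printed labels. Convert via as.character() first.
f <- as.factor(c("10", "2.5", "7"))
as.numeric(as.character(f))   # 10.0  2.5  7.0
```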

  9. Data Tables

  10. Golden rules for data tables 1. A row represents a unit • All measurements of a unit should normally be in the same row. • Different units must be in different rows. • It is important to think about what your units are.

  11. Golden rules for data tables 2. If in doubt, add more rows • If possible, use categorical (character) variables to indicate the independent effects (treatments, environments). • Repeated measurements are normally added as rows, with two independent variables “Time” and “Individual”. • It is always easy to convert a long table to a wide table (Excel Pivot), but not vice versa.
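As an illustration of the long layout, here is a minimal sketch with a made-up repeated-measures table using the “Time” and “Individual” columns named on the slide. The slide mentions Excel Pivot for the conversion; base R's reshape() is used here as one possible alternative. The Height column and all values are hypothetical.

```r
# Hypothetical long table: one row per individual per measurement time.
long <- data.frame(
  Individual = c("T1", "T1", "T2", "T2"),
  Time       = c(1, 2, 1, 2),
  Height     = c(1.2, 1.5, 0.9, 1.1)
)

# Long to wide: one row per individual, one Height column per time point.
wide <- reshape(long, idvar = "Individual", timevar = "Time",
                direction = "wide")
wide   # columns: Individual, Height.1, Height.2
```

Storing the data long means a new measurement date only adds rows, not columns, so scripts written against the table keep working.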

  12. Golden rules for data tables 3. Use strong IDs

  13. Weak IDs

  14. Strong IDs

  15. Golden rules for data tables 4. Modify your raw data entries with R scripts • It is easy to change something and re-run the analysis (e.g. with or without outliers). • Hunting down and fixing errors is efficient, because the script leaves a complete trail of what you did. • You save yourself from repetitive tasks (which are likely to introduce errors).
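A minimal sketch of what such a script might look like, assuming a small made-up table. In practice the raw data would be read from a file (for example with read.csv()); the column names, the outlier threshold, and the drop_outliers switch are all hypothetical.

```r
# Hypothetical raw data; in practice this would be read from a file.
raw <- data.frame(
  id     = c("T1", "T2", "T3", "T4"),
  height = c(1.2, 1.4, 14.0, 1.1)   # 14.0 stands in for a data-entry error
)

drop_outliers <- TRUE   # flip to FALSE and re-run to compare results
max_height    <- 5      # assumed plausibility threshold, for illustration only

dat <- raw              # the raw table itself is never edited
if (drop_outliers) {
  dat <- dat[dat$height <= max_height, ]
}

mean(dat$height)        # summary computed on the cleaned copy
```

Because every change is a line in the script, the raw file stays untouched and the analysis with and without outliers differs by a single switch.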

  16. File Management

  17. Golden Rules - File Management • Keep all files you need for a particular analysis in one folder (.RData-shortcut, data.xls, data.csv, script.r, script.sas, documentation.txt). • Create new folders for new tasks or analyses (numbered and descriptive folder names are useful). • Use many folders but a shallow folder hierarchy (2-4 subdirectories deep). • Zip previous folders (analysis steps) for backup.
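A numbered, shallow folder layout of this kind could be set up from R itself. This is only a sketch: the project and folder names are hypothetical, and everything is created under a temporary directory so that no real files are touched.

```r
# Hypothetical project layout, created under a temporary directory
# so that the sketch does not touch real files.
project <- file.path(tempdir(), "aspen_project")
steps   <- c("01_data_cleaning", "02_exploratory_graphics", "03_models")

# One numbered, descriptive folder per analysis step.
for (s in steps) {
  dir.create(file.path(project, s), recursive = TRUE, showWarnings = FALSE)
}

list.files(project)   # lists the step folders
```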
