SAA 2023 COMPUTATIONAL TECHNIQUE FOR BIOSTATISTICS Introduction & Descriptive Statistics
Introduction Statistics - technology used to describe and measure aspects of nature from samples Statistics lets us quantify the uncertainty of these measures
Introduction Statistics is also about good scientific practice The history of statistics has its roots in biology
Sir Francis Galton Pioneer of fingerprint identification, study of heredity of quantitative traits Regression & correlation Also: efficacy of prayer, attractiveness as function of distance from London
Karl Pearson Polymath - studied genetics Correlation coefficient χ² test Standard deviation
Sir Ronald Fisher The Genetical Theory of Natural Selection Founder of population genetics Analysis of variance Likelihood P-value Randomized experiments Multiple regression etc., etc., etc.
Statistical quotations • There are three kinds of lies: lies, damn lies, and statistics. • Benjamin Disraeli / Mark Twain • It is easy to lie with statistics, but easier to lie without them. • Frederick Mosteller
Goals of statistics • Estimation • Infer an unknown quantity of a population using sample data • Hypothesis testing • Differences among groups • Relationships among variables
Introduction Introduction to the basic concepts of statistics as applied to problems in biological science. • Goal of the course • Understand statistical concepts (population, sample, slope, significant, etc.); • Identify appropriate methods for your data (e.g., one-sample, two-sample, paired t-test or independent t-test, one-way or two-way ANOVA); • Select correct MINITAB procedures to analyze data; • Scientific reading and interpretation.
Biostatistics • Why study Biostatistics? • Statistical methods are widely used in biological field; • Examples are from biological field, practical and useful; • Focus on application instead of mathematical derivation; • Help to evaluate the paper in an intelligent manner. Statistics - the science and art of obtaining reliable results and conclusions from data that is subject to variation. Biostatistics (Biometry)- the application of statistics to the biological sciences.
Biostatistics • Why Computer Applications? • Statistical methods are mostly difficult and complicated (ANOVA, regression etc); • Advances in computer technology and statistical software development make the application of statistical method much easier today than before; • Software such as MINITAB needs time to learn.
Is Biostatistics hard to study? • Factors that make it hard for some students to learn statistics: • The terminology is deceptive. To understand statistics, you have to understand that the statistical meanings of terms such as significant, error, and hypothesis are distinct from the ordinary uses of these words.
Is Biostatistics hard to study? • Statistics requires mastering abstract concepts. It is not easy to think about theoretical concepts such as populations, probability distributions, and null hypotheses. • Statistics is at the interface of mathematics and science. To really grasp the concepts of statistics, you need to be able to think about it from both angles.
Is Biostatistics hard to study? • The derivation of many statistical tests involves difficult math. However, you can learn to use statistical tests and interpret the results even if you do not fully understand how they work. You only need to know enough about how the tools work so that you can avoid using them in inappropriate situations.
Is Biostatistics hard to study? • Basically, you can calculate statistical tests and interpret results even if you don’t understand how the equations were derived, as long as you know enough to use the statistical tests appropriately.
Questions about this course • Is this course going to be hard? • No. The concepts are easy and the procedures are clear. • Why do we spend time on theoretical stuff? • It helps in understanding the applications • Do we need to know all the stuff? • You may not need all of it, but be prepared
Role of statistics in Biological Science • Science: 1. Idea or question 2. Collect data / make observations 3. Describe data / observations 4. Assess the strength of evidence for / against the hypothesis • Statistics: 1. Mathematical model / hypothesis 2. Study design 3. Descriptive statistics 4. Inferential statistics
Contents of the course • Descriptive statistics • Graph, table, mean and standard deviation • Inferential statistics • Probability and distribution • Hypothesis test • Analysis of Variance • Correlation and regression analysis • Other special topics
Basic Concept • Data • numerical facts, measurements, or observations obtained from an investigation or experiment aimed at answering a question • Statistical analyses deal with numbers
Basic Concept • Quantitative • Usual type of measurement, such as height or weight - measurements of quantitative variables carry information about 'amount' - can calculate means, etc., and can use in calculations
Basic Concept • Qualitative • Carry information about category or classification, such as medical diagnosis, ethnic group, gender - cannot calculate means as such, but can tabulate counts or frequencies and analyze frequencies
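A minimal sketch of tabulating counts and proportions for a qualitative variable, using Python's standard library. The diagnosis labels below are made-up values for illustration only, not data from the course.

```python
from collections import Counter

# Hypothetical diagnoses for ten patients (made-up data for illustration).
diagnoses = ["asthma", "diabetes", "asthma", "hypertension", "asthma",
             "diabetes", "hypertension", "asthma", "diabetes", "asthma"]

counts = Counter(diagnoses)   # frequency of each category
n = len(diagnoses)
proportions = {category: count / n for category, count in counts.items()}

# A mean of the labels is meaningless, but the counts can be tabulated.
for category, count in counts.most_common():
    print(f"{category:<13}{count:>3}{proportions[category]:>7.0%}")
```

The same tabulation is what a frequency table in statistical software would report for a categorical column.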
Basic Concept • Variable • a characteristic that can take on different values for different persons, places or things • Statistical analyses need variability; otherwise there is nothing to study • Examples: • Concentration of a substance, pH values obtained from atmospheric precipitation, birth weight of babies whose mothers are smokers, etc.
Basic Concept • A variable is a characteristic measured on individuals drawn from a population under study. • Data are measurements of one or more variables made on a collection of individuals.
Basic Concept • Type of Variable • Continuous variable • Between any two values of a variable, there is another possible value • Examples: height, weight, concentration • Discrete variable • Values can only be integers • Examples: number of people, number of plants, etc.
Basic Concept • Continuous variables • Can take any value to any degree of precision in a certain range - height, weight, temperature (?)
Basic Concept • Discrete variables: • Can take only certain values or can only be measured to a certain degree of accuracy - e.g., # of children that a woman has delivered, # of teeth with fillings, blood pressure (?) - may be handled differently in analysis
Basic Concept • Independent Variable • Dependent Variable We try to predict or explain a response variable from an explanatory variable.
Basic Concept • Populations <-> Parameters; Samples <-> Estimates
Basic Concept • Nomenclature
Basic Concept • Population • Population parameters are constants whereas estimates are random variables, changing from one random sample to the next from the same population.
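The contrast between a constant parameter and a varying estimate can be sketched with a small simulation. The population here (10,000 simulated plant heights with mean 50 cm and standard deviation 10 cm) is an assumption made purely for illustration.

```python
import random
import statistics

rng = random.Random(1)  # fixed seed so the run is repeatable

# Hypothetical population: 10,000 simulated plant heights in cm.
population = [rng.gauss(50, 10) for _ in range(10_000)]
parameter = statistics.fmean(population)  # population mean: one fixed number

# Draw five random samples of 25 individuals; each yields its own estimate.
estimates = [statistics.fmean(rng.sample(population, 25)) for _ in range(5)]

print(f"parameter: {parameter:.2f}")
print("estimates:", [round(e, 2) for e in estimates])
```

The parameter is computed once and never changes, while the five estimates differ from one another because each comes from a different random sample.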
Basic Concept • Population and Sample • Sample <-> Population; Statistic <-> Parameter • [Diagram: the population (parameter) predicts properties of the sample; the sample (statistic) is used to generalize to the population.]
Basic Concept • Population • Population: a set or collection of objects we are interested in (finite or infinite). Examples: plant height under warming conditions; graduates of USIM; smokers in the world. • Parameter: a descriptive measure associated with a variable of an entire population, usually unknown because the whole population cannot be enumerated.
Basic Concept • Population and Sample • Population - largest collection of values of a random variable for which we have an interest at a particular time - school children in Negeri Sembilan. • Sample - selected part of a population – Form Three girls, Form Five boys, etc.
Basic Concept • A sample of convenience is a collection of individuals that happen to be available at the time.
Basic Concept • Sampling • essence of statistical inference – why? • Why sample? We cannot afford the time or money to record measurements on the entire population, and new members of the population may be entering all the time - We use statistical analysis of a sample to answer questions about a population - cancer patients, teenage boys, women after childbirth, etc.
Basic Concept • Sampling
Basic Concept • Bias is a systematic discrepancy between estimates and the true population characteristic.
Basic Concept • Sampling error - The difference between an estimate and the true population characteristic that arises by chance; unlike bias, it is not a systematic discrepancy.
Basic Concept • Larger samples on average will have smaller sampling error.
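A rough simulation of this claim, assuming a hypothetical population of 10,000 values drawn from a normal distribution (mean 50, standard deviation 10). The helper `mean_abs_error` is an illustrative name, not a function from any statistics package.

```python
import random
import statistics

rng = random.Random(42)  # fixed seed for a repeatable run

# Hypothetical population, simulated for illustration only.
population = [rng.gauss(50, 10) for _ in range(10_000)]
mu = statistics.fmean(population)  # true population mean


def mean_abs_error(n, reps=500):
    """Average |sample mean - population mean| over `reps` random samples of size n."""
    errors = [abs(statistics.fmean(rng.sample(population, n)) - mu)
              for _ in range(reps)]
    return statistics.fmean(errors)


for n in (5, 50, 500):
    print(f"n = {n:>3}: average sampling error = {mean_abs_error(n):.2f}")
```

Any single small sample can still land close to the truth by luck; the statement is about the average error over many samples.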
Basic Concept • Properties of a good sample • Independent selection of individuals • Random selection of individuals • Sufficiently large
Basic Concept • Sampling • So how do 'intervention studies' fit into this? Studies select a sample of the population (e.g., cancer patients) to study the effects of a new therapy and then make inferences about how the rest of the cancer patient population would react to the new therapy.
Basic Concept • Sample • Sample: a small number of subjects drawn from a population and used to make inferences about the population; • Random sample: a sample of size n drawn from a population of size N in such a way that every possible sample of size n has the same chance of being selected. • Statistic: a descriptive measure associated with a random variable of a sample.
Basic Concept • Random • Variables whose values arise by chance factors which cannot be predicted in advance, such as height or weight • race or age are 'fixed' variables; i.e., not random
Basic Concept • Random • In a random sample, each member of a population has an equal and independent chance of being selected.
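One way to draw a simple random sample in practice is sketched below. The sampling frame (ID numbers 1 to 200) and the sample size of 10 are assumptions for illustration. `random.sample` draws without replacement, so no individual is picked twice.

```python
import random

rng = random.Random(7)  # fixed seed for a repeatable illustration

# Hypothetical sampling frame: ID numbers for a population of N = 200 individuals.
population_ids = list(range(1, 201))

# Draw n = 10 IDs without replacement; every subset of size 10 is equally likely.
sample_ids = rng.sample(population_ids, 10)
print(sorted(sample_ids))
```

In a real study the seed would normally not be fixed; it is set here only so the example prints the same IDs each time.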
Descriptive Statistics • Graphical Summaries • Frequency distribution • Histogram • Stem and Leaf plot • Boxplot • Numerical Summaries • Location – mean, median, mode. • Spread – range, variance, standard deviation • Shape – skewness, kurtosis
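The location and spread summaries listed above can be computed with Python's standard `statistics` module, as in this small sketch. The ten values are the first ten quadrat counts from the grass example in this lecture, used only as a small illustrative data set. Skewness and kurtosis are left out because the standard library has no function for them.

```python
import statistics

# First ten quadrat counts from the grass example (illustrative subset).
counts = [1, 4, 1, 0, 0, 1, 0, 0, 2, 3]

# Location
mean = statistics.mean(counts)
median = statistics.median(counts)
mode = statistics.mode(counts)

# Spread
value_range = max(counts) - min(counts)
variance = statistics.variance(counts)  # sample variance (n - 1 denominator)
std_dev = statistics.stdev(counts)      # square root of the sample variance

print(f"mean={mean}, median={median}, mode={mode}")
print(f"range={value_range}, variance={variance:.3f}, sd={std_dev:.3f}")
```

For these ten values the mean is 1.2, the median is 1.0, the mode is 0 and the range is 4.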
Frequency Distribution - Discrete variables • Example: Number of grass plants, Mytilus edulis, found in 800 sample quadrats (1 m²) in an ecological study of grasses: 1, 4, 1, 0, 0, 1, 0, 0, 2, 3, 1, 2, 3, 1, 0, 2, 0, 1, 2, … 1, 2, 3, 2, 1, 1, 0, 5, 0, 0, 1, 0, 1, 0, 2, 4, 7, 2, 1, 0 How is the plant number in a quadrat distributed?
Frequency Distribution - Discrete variables • Table 1. The frequency, relative frequency, and cumulative relative frequency of sedge plants in a quadrat. • frequency - number of times a value occurs in the data (probability for a population). • relative frequency - the percentage of the time that the value occurs (frequency/n). • cumulative relative frequency - the percentage of the sample that is equal to or smaller than the value (cumulative frequency/n).
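The three columns defined above can be computed as in the sketch below. Only the first 19 counts listed in the example are used, because the full set of 800 is elided in the slides, so the output will not match Table 1.

```python
from collections import Counter
from itertools import accumulate

# First 19 quadrat counts listed in the example (the full 800 are not shown).
counts = [1, 4, 1, 0, 0, 1, 0, 0, 2, 3, 1, 2, 3, 1, 0, 2, 0, 1, 2]

freq = Counter(counts)   # frequency of each observed value
n = len(counts)
values = sorted(freq)
cumulative = list(accumulate(freq[v] for v in values))  # running totals

# Each row: (value, frequency, relative frequency, cumulative relative frequency)
table = [(v, freq[v], freq[v] / n, c / n) for v, c in zip(values, cumulative)]

print("value  freq  rel.freq  cum.rel.freq")
for v, f, rel, cum in table:
    print(f"{v:>5}{f:>6}{rel:>10.3f}{cum:>14.3f}")
```

The last cumulative relative frequency is always 1, which is a quick check that no observation was dropped.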
Histogram (Bar graph) and polygon • Histogram - graph of frequencies • Can be used to visually compare frequencies • Easier to assess magnitude of differences rather than trying to judge numbers • Frequency polygon - similar to histogram Fig. 1. Frequency distribution of plants in a quadrat.
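A text-only sketch of the same idea, with one bar per count value. It uses the first 19 counts listed in the quadrat example as an illustrative subset, so it does not reproduce Fig. 1. A plotting library would normally be used instead; plain text keeps the example free of dependencies.

```python
from collections import Counter

# First 19 quadrat counts listed in the example (illustrative subset).
counts = [1, 4, 1, 0, 0, 1, 0, 0, 2, 3, 1, 2, 3, 1, 0, 2, 0, 1, 2]

freq = Counter(counts)

# One bar per possible value; a Counter returns 0 for values never observed.
bars = {value: "#" * freq[value]
        for value in range(min(counts), max(counts) + 1)}

for value, bar in bars.items():
    print(f"{value:>2} | {bar} ({freq[value]})")
```

Bar lengths make it easy to see at a glance that quadrats with zero or one plant are the most common in this subset.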