Applied statistics for testing and evaluation – MED4 Introduction Lecturer: Smilen Dimitrov
About the course and communication • Course: Applied statistics for testing and evaluation – MED4 • Course teacher – Smilen Dimitrov, teaching assistant (TA) Kristina Daniliauskaite • Contact by e-mail is welcome: firstname.lastname@example.org (email@example.com); TA: firstname.lastname@example.org • When writing, you are welcome to use your group e-mail • However, if you write from your own individual address instead of the group e-mail, please include the other group members’ addresses (or the group e-mail) in Carbon Copy (CC) • Course website: http://www.smilen.net/stat/ • Course requirements – PE course • Most of you will develop a software application as a product in your MED3 project • You are expected to perform user testing and evaluation of your product, and provide statistical analysis of the results in your project report. These results and the analysis are expected to be discussed as part of your group project exam. In addition, for the individual part of the group exam, you are expected to discuss and answer PE questions.
Statistics - starting notes • Statistical analysis means doing something useful with data (setting its meaning free). • Statistics is neither really a science nor a branch of mathematics. It is perhaps best considered as a meta-science (or meta-language) • One of the hardest things is choosing the right kind of statistical analysis - the choice depends on the nature of your data and on the particular question you're trying to answer. • Statistics is intimately related to the scientific method. • Statistics covers the 'back end' of the process - analyzing data and stating conclusions; the 'front end' requires expertise in the specific subject matter, such as economics, biology, ecology, or medialogy. • There is no substitute for experience; the way to know what to do is to have done it properly lots of times before
Introduction • Scope of application of statistics is enormous - unavoidable in research about Human Senses - Digital Perception (semester theme) • In media technology, used for testing and evaluation in two main areas: • Technical - signal processing algorithms, systems • Psychological/social – user response to products and interfaces, ratings • Statistics – finding unknown parameters and relationships through organization and study of collected data • This course – based on Statistics - An Introduction using R by M. Crawley • Introduction to basic concepts in statistical analysis • Usage of the free statistical programming language R • Heavy use of Internet resources • 5 modules: 1hr40m lectures, 1hr40m exercises • In industry: Microsoft - Statistical Media Processing research project
Experiments and statistics problems • Introduce terminology • Experiment (general) – asking the Universe a question • Answer – perform measurements or observations => collect data • Data can be misinterpreted – avoiding this requires a proper understanding of statistical analysis and experimental design • Descriptive statistics • methods and tools for collecting data • models to describe and interpret data • Statistics • the study of data • problem-solving process that seeks answers to questions through data • Inferential statistics • systems and techniques for making good decisions and accurate predictions based on data
Experiments and statistics problems • For us • ‘asking the Universe a question’ -> statistics problem • Process of recording measurements -> data collection • Experiment -> method of data collection • Components of a statistics problem • Ask a Question • Collect Appropriate Data • Analyze the Data • Interpret the Results
Examples of statistics problems • Video – Room-Measurement Activity(link) • Video – bias and measurement error (link)
Examples of statistics problems
Suppose you were curious about the relative heights and arm spans of men and women. 1. Ask a Question: Are men typically taller than women? Do men typically have longer arm spans than women? 2. Collect Appropriate Data: Using a meter stick, measure the heights (without shoes) and arm spans (fingertip to fingertip) of three men and three women. Record your measurements to the nearest centimeter.
1. Ask a Question: How much does a penny weigh? 2. Collect Appropriate Data: Use a metric scale to weigh 32 pennies to the nearest centigram (1/100 of a gram). Based on the data, how much would you expect the 33rd penny to weigh?
1. Ask a Question: Should nuclear power be developed as an energy source? 2. Collect Appropriate Data: Twenty-five people completed the following questionnaire:
Examples of statistics problems • Media technology example - The Optimal Thumbnail experiment (http://www.otal.umd.edu/SHORE/bs21/experiment.html) 1. Ask a Question - Given several image thumbnail sizes, which size is optimal for accurate and quick recognition? 2. Collect Appropriate data • Devise an experiment (and test a hypothesis) in which the subject is asked to recognize images, and two dependent variables are measured: time to recognition and accuracy of identification.
Variables and variability • If you measure the same thing twice, you will generally get two different answers. This is due to: • the changing nature of things (heterogeneity), • association with something else that is changing, or • errors • Variables - characteristics that may differ from one observation to the next • things that we measure, control, or manipulate in research • a symbol (A, B, x, y, etc.) that can take on any of a specified set of values • When we measure these characteristics, we assign a value to each variable. The set of values for a given variable is known as data • Measurement errors • Random error - nonsystematic measurement error that is beyond our control; its effects average out to zero over a series of measurements. • Measurement bias (systematic error) - favors a particular result. A measurement process is biased if it systematically overstates or understates the true value of the measurement.
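The difference between random error and measurement bias can be illustrated with a small simulation in R. This is only a sketch: the true value, the size of the noise and the size of the bias are all made-up numbers, not data from the course.

```r
# Simulated measurements: all numbers below are hypothetical.
set.seed(42)                      # make the random draws reproducible
true_value <- 170                 # assumed true height in cm
n <- 1000                         # number of repeated measurements

# Random error only: noise centred on zero
random_only <- true_value + rnorm(n, mean = 0, sd = 0.5)

# Random error plus a systematic bias of +2 cm (e.g. measuring with shoes on)
biased <- true_value + 2 + rnorm(n, mean = 0, sd = 0.5)

mean(random_only) - true_value    # close to 0: random error averages out
mean(biased) - true_value         # close to 2: bias does not average out
```

Taking more measurements reduces the effect of random error, but no number of repetitions removes a bias; that requires fixing the measurement procedure.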
Qualitative and quantitative variables • Some questions are answered with a number, some are not • Qualitative (categorical) data/variables - measurement expressed by means of a natural language description (not in terms of numbers) • Nominal: there is no natural ordering of the categories. Examples might be gender, race, religion, or sport. • Ordinal: the categories can be ordered. Categorical variables that judge size (small, medium, large, etc.) are ordinal variables. • Quantitative data/variables - measurement expressed in terms of numbers • Ratio scale: has a meaningful zero value and equidistant measure, so the doubling principle holds (10 years is twice as old as 5 years) • Interval scale: has equidistant measure, but no meaningful zero, so the doubling principle breaks down (50° Celsius is not half as hot as 100° Celsius) • Measurement scales: nominal, ordinal, interval, ratio
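One possible way to represent the four measurement scales in R is sketched below. The variable names and example values are illustrative assumptions; the conversion to Kelvin is used only to show why ratios of Celsius values are misleading.

```r
# Nominal: categories without a natural order
sport <- factor(c("football", "tennis", "football", "swimming"))

# Ordinal: categories with a natural order
size <- factor(c("small", "large", "medium"),
               levels = c("small", "medium", "large"),
               ordered = TRUE)
size[1] < size[2]              # TRUE: "small" comes before "large"

# Ratio scale: a true zero, so ratios are meaningful
age_years <- c(5, 10)
age_years[2] / age_years[1]    # 2: ten years is twice five years

# Interval scale: Celsius has an arbitrary zero, so ratios mislead
celsius <- c(50, 100)
kelvin <- celsius + 273.15     # Kelvin is a ratio scale for temperature
kelvin[1] / kelvin[2]          # about 0.87, not 0.5
```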
Methods of data collection • Data collection - integral part of statistics • Methods of data collection - methods to gather information about the world • Experiments - the only way to determine causal relationships between variables • Independent variable (IV) - manipulated by the experimenter to exist in at least two levels • Dependent variable (DV) - the second variable, which the experimenter measures • Example: 'if you read a Wiki, then you will have enhanced knowledge.' • Sample surveys - the selection and study of a sample of items from a population. A sample is a set of members chosen from a population, but not the whole population. A survey of a whole population is called a census. • Example: phoning the fifth person on every page of the local phonebook and asking them how long they have lived in the area. • Observational studies - the most primitive method of understanding the laws of nature. Basically, a researcher goes out into the world and looks for variables that are associated with one another. • Observations have the equivalent of two dependent variables • Two kinds of research: experimental research, correlational research
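The phonebook example of a sample survey can be sketched in R as follows. The phonebook itself is hypothetical: the number of pages and entries per page are invented purely to show the difference between a census and a sample.

```r
# Hypothetical phonebook: 20 pages with 30 entries each (made-up structure)
phonebook <- expand.grid(position = 1:30, page = 1:20)
phonebook$years_in_area <- NA     # unknown until the person is asked

# Census: every entry in the population would be surveyed
nrow(phonebook)                   # 600

# Sample survey: only the fifth person on every page is phoned
survey_sample <- phonebook[phonebook$position == 5, ]
nrow(survey_sample)               # 20, one per page
```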
Descriptive and inferential statistics • Descriptive statistics - methods used to summarize or describe a collection of data • Analysis by bringing out the information the data contains • Steps: • Collect data • Classify data • Summarize data • Present data • Proceed to inferential statistics if there is enough data to draw a conclusion • Inferential statistics - modeling patterns of data, to draw inferences about the thing being studied • Analysis by testing or retesting a hypothesis.
Descriptive statistics • Descriptive statistics - a branch of statistics that denotes any of the many techniques used to summarize a set of data. • Allows us to describe groups of many numbers. One way to do this is by reducing them to a few numbers that are typical of the groups, or describe their characteristics. • Techniques • Graphical description • Tabular description • Summary statistics • Two objectives for summary statistics • choose a statistic that shows how different units seem similar - a measure of central tendency (typical value – location): • the arithmetic mean • the median • the mode • choose another statistic that shows how they differ - a measure of statistical variability (spread): • the range • the inter-quartile range • specific values from the quantiles • the variance • the standard deviation (the square root of the variance) • the absolute deviation • Normal distribution • Skewed distribution
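As a sketch of how these summary statistics might be computed in R, the block below uses the 17 raisin counts from the lecture's raisin example. The choice of functions is an assumption about what the course uses; in particular, base R has no built-in function for the statistical mode, so it is derived from a frequency table, and "absolute deviation" is taken here to mean the mean absolute deviation around the mean.

```r
# Raisin counts for the 17 boxes in the lecture's raisin example
raisins <- c(29, 27, 27, 28, 31, 26, 28, 28, 30,
             29, 26, 27, 29, 29, 25, 28, 28)

# Central tendency (location)
mean(raisins)                                  # arithmetic mean
median(raisins)                                # middle value: 28
# Base R has no function for the statistical mode, so count the values
as.numeric(names(which.max(table(raisins))))   # most frequent value: 28

# Variability (spread)
range(raisins)                                 # smallest and largest: 25 31
var(raisins)                                   # sample variance
sd(raisins)                                    # standard deviation = sqrt(variance)
IQR(raisins)                                   # inter-quartile range: 2
quantile(raisins)                              # minimum, quartiles, maximum
mean(abs(raisins - mean(raisins)))             # mean absolute deviation
```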
Inferential statistics • Inferential statistics (statistical induction) does not just describe numbers; it draws conclusions that go beyond the collected data • comprises the use of statistics to make inferences (informed guesses) concerning some unknown aspect (usually a parameter) of a population - to draw inferences about situations where we have only gathered part of the information that exists • Dealing with probability – as we’re dealing with guesses/predictions. Two schools differ: • Frequency probability, using maximum likelihood estimation - frequentists understand probability in the common sense, i.e. if an event has probability 1/6, then over many trials the event will happen about 1/6 of the time – requires well-defined experiments. • Bayesian inference - Bayesians, on the other hand, hold that probability is a measure of our belief (or confidence) in some event happening. Bayesians update their belief in the light of new data using Bayes' theorem - and can apply probabilities to arbitrary statements. • Generally, when we have a research question, we can form from it a research hypothesis or a set of hypotheses – however, these are usually not directly testable using inferential statistics. • Statistical tools • t-test • ANOVA • Correlation • Factorial • Regression • Chi-squared • Probability • Distributions
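The frequentist reading of "probability 1/6" can be illustrated by simulating many rolls of a fair die in R. This is a minimal sketch; the number of rolls and the random seed are arbitrary choices.

```r
set.seed(1)                                   # reproducible random rolls
n_rolls <- 60000                              # number of trials (arbitrary choice)
rolls <- sample(1:6, n_rolls, replace = TRUE) # simulate a fair six-sided die

relative_frequency <- mean(rolls == 6)        # share of rolls showing a six
relative_frequency                            # close to 1/6
1 / 6                                         # theoretical probability
```

With only a handful of rolls the relative frequency can be far from 1/6; it settles near that value as the number of trials grows.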
Average and the arithmetic mean • In mathematics, an average or central tendency of a set (list) of data refers to a measure of the 'middle' of the data set. • There are many different descriptive statistics that can be chosen as a measurement of the central tendency. The most common method, and the one generally referred to simply as the average, is the arithmetic mean • In statistics, mean has two related meanings: • the average in ordinary English, which is also called the arithmetic mean (and is distinguished from the geometric mean or harmonic mean). The average is also called sample mean. • the expected value of a random variable, which is also called the population mean.
Average and the arithmetic mean • A set of data • Arithmetic mean • (Sum notation) • Example 1. Ask a Question How many raisins are in a half-ounce box of raisins? 2. Collect Appropriate data We counted the number of raisins in 17 half-ounce boxes:
Box:                1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17
Number of raisins: 29 27 27 28 31 26 28 28 30 29 26 27 29 29 25 28 28
And the arithmetic mean is …
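The formulas on this slide are not present in the text. Assuming the standard definition, and writing the data set as $x_1, \dots, x_n$ (these symbols are an assumption, not taken from the slide), the arithmetic mean in sum notation and its value for the 17 raisin counts would be:

```latex
\bar{x} = \frac{x_1 + x_2 + \dots + x_n}{n} = \frac{1}{n}\sum_{i=1}^{n} x_i
\qquad\Rightarrow\qquad
\bar{x} = \frac{29 + 27 + \dots + 28}{17} = \frac{475}{17} \approx 27.94
```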
Average and the arithmetic mean • Many kinds of averages • Arithmetic mean - the sum of all measurements divided by the number of observations in the data set • Median - the middle value that separates the higher half from the lower half of the data set • Mode - the most frequent value in the data set • Geometric mean - the nth root of the product of n data values • Harmonic mean - the reciprocal of the arithmetic mean of the reciprocals of the data values • Quadratic mean or root mean square (RMS) - the square root of the arithmetic mean of the squares of the data values • Generalized mean - generalizing the above, the pth root of the arithmetic mean of the pth powers of the data values • Weighted mean - an arithmetic mean that assigns weights to certain data elements • Truncated mean - the arithmetic mean of data values after a certain number or proportion of the highest and lowest data values have been discarded • Interquartile mean - a special case of the truncated mean • Midrange - the arithmetic mean of the highest and lowest values of the data or distribution • When is the arithmetic mean improper? Example: the average rate of return of an investment – the yearly returns multiply rather than add, so the geometric mean must be used.
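The investment case can be sketched in R with two hypothetical yearly returns. The figures are invented for illustration, as are the small data set and the weights used for the other averages.

```r
# Hypothetical yearly returns: +10 % in year one, -10 % in year two
growth_factors <- c(1.10, 0.90)

mean(growth_factors)                 # arithmetic mean: 1, suggests no change
geometric_mean <- exp(mean(log(growth_factors)))   # nth root of the product
geometric_mean                       # just below 1: a small average loss per year
prod(growth_factors)                 # 0.99: the investment actually lost 1 %

# Other averages from the list, on a small made-up data set
x <- c(1, 2, 4)
1 / mean(1 / x)                      # harmonic mean
sqrt(mean(x^2))                      # quadratic mean (RMS)
weighted.mean(x, w = c(3, 1, 1))     # weighted mean, weights are arbitrary
mean(c(x, 100), trim = 0.25)         # truncated mean: drops lowest and highest value
(min(x) + max(x)) / 2                # midrange
```

The geometric mean is computed through logarithms here, which is equivalent to taking the nth root of the product for positive values.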
Introduction to R • R – a programming language for statistical computing • Powerful tool for statistical modelling: • Data exploration, tabulating and sorting data, drawing plots of data • Sophisticated calculator to evaluate complex arithmetic expressions, and a flexible object-oriented language • Installation • In-class example - finding the average number of raisins in a box • Data collection in Excel • Using R to find the average and plot bar graphs.
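A minimal sketch of how the in-class raisin example might look in R is given below. It assumes the Excel sheet is saved as a CSV file before import; a temporary file stands in for that export so the block runs on its own, and the column names are illustrative.

```r
# Raisin counts from the lecture's example; in class the data were typed into Excel
raisin_data <- data.frame(
  box = 1:17,
  raisins = c(29, 27, 27, 28, 31, 26, 28, 28, 30,
              29, 26, 27, 29, 29, 25, 28, 28)
)

# Stand-in for an Excel sheet saved as CSV (the temporary file is an assumption)
csv_file <- tempfile(fileext = ".csv")
write.csv(raisin_data, csv_file, row.names = FALSE)

# Import the data into R and compute the average
imported <- read.csv(csv_file)
average_raisins <- mean(imported$raisins)
average_raisins                    # 475 / 17, about 27.94

# Bar graph with one bar per box
barplot(imported$raisins,
        names.arg = imported$box,
        xlab = "Box",
        ylab = "Number of raisins",
        main = "Raisins per half-ounce box")
```

With a real export, `csv_file` would simply be replaced by the path of the saved CSV file.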
Exercise for mini-module 1 – STAT01 Exercise A 1. Collect the following data about the members of your group in an Excel sheet: a) Name b) Age c) Previous education 2. Import the data into R, and find the average age of the group members. 3. Using R, plot the individual ages of the group members as a bar graph. Exercise B Repeat exercise A for all students of the MED3 class. Compare the average age of the entire class with the average age of your group. Delivery: Deliver the collected data (in tabular format), the computed age averages and the bar graphs in an electronic document.