Math Stat Course: Making Incremental Changes Mary Parker University of Texas at Austin
Intro to Mathematical Statistics (M378K at University of Texas) • Prerequisite: Probability course (which is required for all math majors) • Students: Math majors, actuarial students, other science majors • Previous statistics courses? Some took an applied stat course, either a freshman course or one taken after the probability course. Some didn’t.
Math Stat Topics • Sampling distributions of statistics • Estimation of parameters: confidence intervals, method of moments estimation, maximum likelihood estimation, comparison of estimators using mean square error and efficiency, sufficient statistics • Hypothesis tests: p-values, power, likelihood ratio tests • Distributions used include normal, binomial, Poisson, uniform, gamma, beta, t, F, chi-squared, and other standard distributions. • Other topics as time permits.
Some students took this course: M358K Applied Statistics • New course in the last five years • Prerequisite: Probability course • Taken by math majors with a concentration in secondary school teaching, statistics, and some others • If they take both M358K and M378K, they are encouraged to take M358K first. • Introduction of this course has not decreased enrollment in M378K, and some students who take this new course and didn’t plan to take more statistics do go on to M378K.
Questions MAIN: How can a teacher who doesn't have the time/inclination to completely revamp her course make incremental changes that will better prepare students to understand and use contemporary statistics techniques? Preliminary: • What aspects of the reform of the first course are also appropriate for the math stat course? • What should we preserve in the current math stat course so that it continues to give mathematically sophisticated students a strong foundation in statistics? • What additional tools and techniques of theoretical statistics should be introduced at this level? • Within twenty years, when all students will be using the equivalent of a Mathematica-level program, what can/should we be teaching in theoretical statistics courses?
Incrementally changing Math Stat • Focus on assumptions throughout. • Check assumptions. • Mention alternative techniques if assumptions not met. • Discuss robustness of methods. • Briefly introduce nonparametric statistics and Bayesian inference to illustrate different assumptions / framework. • Have students do explorations.
What explorations? Main idea: Simulate and explore sampling distributions of various statistics. Use to illustrate theoretical ideas and to check on robustness of procedures. Preliminary idea 1: Create a complete sampling distribution themselves and check its properties to see that they agree with the theoretical results. Preliminary idea 2: Think of some interesting estimators to investigate. (See that there are more possible estimators for a parameter than the sample mean.)
Why explorations? • Explorations help make the theory concrete • Robustness of statistical techniques: The concept seems strange to math students and they appreciate tools to explore it on their own.
Simulate and explore a sampling distribution • The population is the numbers of potatoes in a 5-lb sack of potatoes from a certain company. Assume the counts are distributed as discrete uniform, from 12 potatoes to 18 potatoes. Choose a reasonable sampling method and construct the sampling distribution of the sample mean for samples of size 2. • Find the mean and variance of the population and then find the mean and variance of the sampling distribution. • Comment on the results, based on your theoretical understanding from the formulas we proved about the mean and variance of a sample mean. • Discuss what would be different for samples of size 9. • Investigate the sampling distribution of the sample range.
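One way to carry out this exploration by exact enumeration is sketched below. It assumes the "reasonable sampling method" is ordered sampling with replacement, so that (13,14) and (14,13) are distinct, equally likely outcomes; that choice is an assumption of this sketch, not the only defensible answer. Variances are computed with the population divisor, since each object here is a complete distribution rather than a sample from one.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

population = range(12, 19)  # discrete uniform on 12..18 potatoes per sack

def mean_var(dist):
    """Mean and variance of a discrete distribution given as {value: probability}."""
    mu = sum(v * p for v, p in dist.items())
    var = sum((v - mu) ** 2 * p for v, p in dist.items())
    return mu, var

def distribution_of(statistic, samples):
    """Exact distribution of a statistic over equally likely samples."""
    counts = Counter(statistic(s) for s in samples)
    return {value: Fraction(c, len(samples)) for value, c in counts.items()}

# Population distribution: each count is equally likely.
pop_dist = {x: Fraction(1, len(population)) for x in population}

# Assumption: ordered samples of size 2, drawn with replacement (7 * 7 = 49 outcomes).
samples = list(product(population, repeat=2))

xbar_dist = distribution_of(lambda s: Fraction(sum(s), len(s)), samples)
range_dist = distribution_of(lambda s: max(s) - min(s), samples)

pop_mean, pop_var = mean_var(pop_dist)
xbar_mean, xbar_var = mean_var(xbar_dist)
range_mean, range_var = mean_var(range_dist)

print("population:  mean", pop_mean, "variance", pop_var)
print("sample mean: mean", xbar_mean, "variance", xbar_var)
print("range:       mean", range_mean, "variance", range_var)
```

Changing `repeat=2` to `repeat=9` enumerates the 7^9 ordered samples of size 9, which is still feasible but slow; at that point simulation becomes the more natural tool.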
Strategy • Given very early in the semester. • Student groups of 2-3. • Grading and instructions encourage students to think about it over a couple of weeks without spending much time on it at first, BECAUSE • This assignment is not as well-defined as it looks for many students.
Difficulties often encountered • Should (13,14) be a different element of the sample space from (14,13)? • Should I sample with replacement or without replacement? Why? • When computing the standard deviations here, is the denominator n or n-1?
Extensions • Sampling without replacement: what changes? What does that tell us about the language/formulas of our text? (independence of samples) • Where could we find the equivalent formulas to those in our text for sampling without replacement? What’s different?
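The effect of sampling without replacement can be checked by the same kind of enumeration. The sketch below treats the seven values 12..18 as a finite population of size N = 7 (an assumption made only for illustration) and compares the enumerated variance of the sample mean with the standard finite population correction, Var(sample mean) = (sigma^2 / n) * (N - n) / (N - 1).

```python
from fractions import Fraction
from itertools import permutations

# Assumption: the seven values 12..18 form a finite population of size N = 7.
population = list(range(12, 19))
N, n = len(population), 2

mu = Fraction(sum(population), N)
sigma2 = sum((x - mu) ** 2 for x in population) / N  # population variance (divisor N)

# Ordered samples of size n drawn without replacement: 7 * 6 = 42 outcomes.
samples = list(permutations(population, n))
means = [Fraction(sum(s), n) for s in samples]

mean_of_means = sum(means) / len(means)
var_of_means = sum((m - mean_of_means) ** 2 for m in means) / len(means)

# Variance predicted by the finite population correction.
fpc_var = sigma2 / n * Fraction(N - n, N - 1)

print("with replacement:   ", sigma2 / n)
print("without replacement:", var_of_means, "(formula gives", fpc_var, ")")
```

The mean of the sample mean is unchanged, but its variance is smaller than sigma^2 / n because the two draws are no longer independent.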
Constructing various estimators “German Tank Problem” Assume German tanks had consecutive ID numbers from 001 to ???. We need to estimate the size of the population of German tanks (the maximum ID in the population), based on the IDs from the sample of tanks we have captured. In groups, think of at least three different reasonable estimators. Then draw a sample of size 5 from my “population of German tank IDs” in the envelope. Give your three estimates. Use a computer to simulate the three sampling distributions.
Strategy • Done in class before beginning to talk about estimation. • Usually students will use (1) two times the mean, (2) the maximum, and then, after a bit of time, will come up with something else. • Students will need help simulating the sampling distributions. Again, arrange the timing/grading to encourage them to think about it and discuss it before spending a lot of time doing it.
Difficulties in simulating sampling distributions • How do you describe the original population to the computer? (Discrete uniform on 1 to 600, maybe) • Is it fairly easy to obtain a random sample from that distribution in your software? (If not, find other software!) • Distinguish between the sample size and the number of points from the sampling distribution. • What should you do with the sampling distribution?
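A minimal simulation sketch for the German tank problem follows. It assumes the population is the IDs 1 to 600, as suggested above, sampled without replacement. The first two estimators are the ones students usually propose; `adjusted_max` is a hypothetical stand-in for the "something else" and is not taken from the course materials. The constants `SAMPLE_SIZE` and `REPS` are kept separate to make the distinction between the sample size and the number of points from the sampling distribution explicit.

```python
import random
import statistics

N_TRUE = 600      # assumed true maximum ID ("1 to 600, maybe")
SAMPLE_SIZE = 5   # number of tanks captured in each sample
REPS = 10_000     # number of points drawn from each sampling distribution

def twice_mean(sample):
    return 2 * statistics.mean(sample)

def sample_max(sample):
    return max(sample)

def adjusted_max(sample):
    # Hypothetical third estimator: stretch the maximum by (n + 1) / n, then subtract 1.
    n = len(sample)
    return max(sample) * (n + 1) / n - 1

def simulate(estimator, seed=0):
    """Return REPS values of the estimator, each from a fresh sample of SAMPLE_SIZE IDs."""
    rng = random.Random(seed)  # same seed for every estimator, so all see the same samples
    ids = range(1, N_TRUE + 1)
    return [estimator(rng.sample(ids, SAMPLE_SIZE)) for _ in range(REPS)]

results = {}
for est in (twice_mean, sample_max, adjusted_max):
    draws = simulate(est)
    results[est.__name__] = {
        "mean": statistics.mean(draws),
        "sd": statistics.stdev(draws),
        "mse": statistics.mean((d - N_TRUE) ** 2 for d in draws),
    }

for name, summary in results.items():
    print(name, {k: round(v, 1) for k, v in summary.items()})
```

Comparing the `mean` column with 600 shows bias, and the `mse` column gives one basis for comparing estimators. A histogram of each `draws` list would be the natural next step.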
Looking at sampling distributions • What should you look at to summarize a sampling dist’n? (histogram, summary statistics) • Is it close to normally distributed? (Discuss normal scores plots.) • (More advanced) Is it close to a __ dist’n? (Make available information about probability plots in more generality.) • If the statistic is unbiased, what characteristic will the sampling dist’n have? (If yours doesn’t have the mean exactly what it’s supposed to, is that because you made an error? Why or why not?)
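For readers without a plotting package at hand, the idea behind a normal scores plot can be sketched numerically: sort the simulated values, pair them with approximate expected normal order statistics, and see how close the pairing is to a straight line. The plotting positions (i - 0.375) / (k + 0.25) used here are one common approximation, chosen as an assumption of this sketch; statistical packages may use slightly different ones. The population (IDs 1 to 600) and sample size 5 mirror the German tank exploration.

```python
import random
from statistics import NormalDist, fmean

def normal_scores_correlation(values):
    """Correlation between the sorted values and their approximate normal scores."""
    xs = sorted(values)
    k = len(xs)
    # Assumed plotting positions: (i - 0.375) / (k + 0.25) for i = 1..k.
    scores = [NormalDist().inv_cdf((i - 0.375) / (k + 0.25)) for i in range(1, k + 1)]
    mx, ms = fmean(xs), fmean(scores)
    sxy = sum((x - mx) * (s - ms) for x, s in zip(xs, scores))
    sxx = sum((x - mx) ** 2 for x in xs)
    sss = sum((s - ms) ** 2 for s in scores)
    return sxy / (sxx * sss) ** 0.5

rng = random.Random(0)
ids = range(1, 601)  # assumed population of tank IDs
samples = [rng.sample(ids, 5) for _ in range(1000)]

r_twice_mean = normal_scores_correlation([2 * fmean(s) for s in samples])
r_max = normal_scores_correlation([max(s) for s in samples])

print("twice the mean:", round(r_twice_mean, 4))
print("maximum:       ", round(r_max, 4))
```

A correlation near 1 corresponds to a nearly straight normal scores plot. The sampling distribution of the maximum is skewed, so its correlation should come out noticeably lower than that of twice the mean.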
Focus on Assumptions • Checking assumptions for typical normal-theory techniques • Already discussed normal probability plots • Discuss what types of deviations from assumptions cause problems for a particular technique and why • In two-sample t procedures, help them see exactly why the equal variance assumption is more popular among theorists than among those working in applications. • Robustness • Central Limit Theorem. Explorations of various types of distributions: how large must n be?
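The "how large must n be?" question can be explored with a short simulation. The sketch below draws sample means from an exponential population (chosen here only as an example of a strongly skewed distribution) and reports the skewness of the simulated sampling distribution for several n. Using skewness as the yardstick for "close to normal" is an assumption of this sketch; a normal scores plot or interval coverage would be equally reasonable.

```python
import random
import statistics

def skewness(values):
    """Moment-based skewness: average cubed standardized deviation."""
    m = statistics.fmean(values)
    s = statistics.pstdev(values)
    return statistics.fmean(((v - m) / s) ** 3 for v in values)

def sample_mean_skewness(n, reps=4000, seed=1):
    """Skewness of the simulated sampling distribution of the mean of n exponentials."""
    rng = random.Random(seed)
    means = [
        statistics.fmean(rng.expovariate(1.0) for _ in range(n))
        for _ in range(reps)
    ]
    return skewness(means)

skew_by_n = {n: sample_mean_skewness(n) for n in (2, 10, 50)}

for n, sk in skew_by_n.items():
    print(f"n = {n:3d}: skewness of sample mean is about {sk:.2f}")
```

For an exponential population the theoretical skewness of the sample mean is 2 / sqrt(n), so the simulated values should shrink toward 0 as n grows. Swapping `rng.expovariate(1.0)` for another generator lets students repeat the comparison for other population shapes.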
Focus on Assumptions II • Nonparametric techniques • Sign test, signed rank test, and rank-sum test • Compare results with those from t-test for some examples to further illustrate conditions for robustness of t-tests • Bayesian statistics • Very brief introduction, contrasting assumptions of frequentist and Bayesian approaches • Do examples from binomial or normal with conjugate priors and indicate that choosing the prior mean and variance gives quite a lot of flexibility • Mention that using more general, non-conjugate priors leads to the need for more computationally-intensive methods
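The binomial example with a conjugate prior can be sketched in a few lines. A Beta(a, b) prior combined with x successes in n binomial trials gives a Beta(a + x, b + n - x) posterior. The helper `beta_from_mean_var` converts a chosen prior mean and variance into (a, b); the prior values and the data (7 successes in 10 trials) are hypothetical numbers chosen for illustration.

```python
from fractions import Fraction

def beta_from_mean_var(m, v):
    """Beta(a, b) parameters with mean m and variance v (requires v < m * (1 - m))."""
    total = m * (1 - m) / v - 1  # a + b
    if total <= 0:
        raise ValueError("variance too large for a Beta distribution with this mean")
    return m * total, (1 - m) * total

def update(a, b, successes, trials):
    """Posterior Beta parameters after observing binomial data."""
    return a + successes, b + trials - successes

def beta_mean(a, b):
    return a / (a + b)

# Hypothetical prior: mean 1/2, variance 1/20, which works out to Beta(2, 2).
a0, b0 = beta_from_mean_var(Fraction(1, 2), Fraction(1, 20))

# Hypothetical data: 7 successes in 10 trials.
a1, b1 = update(a0, b0, successes=7, trials=10)

print("prior:     Beta(%s, %s), mean %s" % (a0, b0, beta_mean(a0, b0)))
print("posterior: Beta(%s, %s), mean %s" % (a1, b1, beta_mean(a1, b1)))
```

Rerunning with a smaller prior variance (a larger a + b) pulls the posterior mean closer to the prior mean, which illustrates the flexibility that comes from choosing the prior mean and variance.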
Actual assignments • Construct a sampling distribution • German tank problem • Simulating sampling distributions in MINITAB • Find the actual assignments and supporting material at the website listed on the handout for this session: http://www.ma.utexas.edu/users/parker/jsm04/