
Tools of the trade incl. MLE



Presentation Transcript


    1. Tools of the trade incl. MLE

    2. Tools of the Modeling Trade Set theory, Probability theory, Algebra, Combinatorics, Calculus, Decision theory, Game theory, Monte Carlo simulation, Automata theory, MLE

    3. Set theory A∪B Union, A or B; A∩B Intersection, A and B; A⊂B Subset, A within B; a∈A Membership, a an element of A; ¬A Negation, Not A

    4. Probability theory Adding math to set theory (Examples…) P(A∩B) = joint probability = P(A)·P(B) if A and B are independent. P(B|A) = conditional probability of B given A. Bayes' theorem: P(B|A) from P(A|B) and P(B). Posterior probability from prior probability and evidence.
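A minimal sketch of the two examples on this slide, the joint probability of independent events and Bayes' theorem. The function names and the disease-test numbers are illustrative assumptions, not from the slides.

```python
# Bayes' theorem: posterior P(B|A) from prior P(B), likelihood P(A|B),
# and the false-positive likelihood P(A|~B). (Illustrative helper names.)
def bayes_posterior(prior_b, p_a_given_b, p_a_given_not_b):
    """P(B|A) = P(A|B)P(B) / [P(A|B)P(B) + P(A|~B)P(~B)]."""
    evidence = p_a_given_b * prior_b + p_a_given_not_b * (1 - prior_b)
    return p_a_given_b * prior_b / evidence

# Joint probability under independence: P(A ∩ B) = P(A) * P(B)
def joint_independent(p_a, p_b):
    return p_a * p_b

# Hypothetical example: 1% prevalence, 95% sensitivity, 5% false-positive rate.
posterior = bayes_posterior(0.01, 0.95, 0.05)  # roughly 0.16, not 0.95
```

Note how the posterior (about 0.16) differs sharply from the likelihood (0.95) because the prior is small, which is exactly the prior-to-posterior updating the slide describes.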

    5. Algebra The basics are crucial! Also, matrix algebra Dot products Matrix/vector multiplication Determinants Eigenvectors/values

    6. Combinatorics How many possible sets of 2 can you create from 6 objects? (Note: each object selected only once) 6C2, or C(6,2). nCm = n!/((n−m)!·m!) What if order matters? nPm = n!/(n−m)! More complicated when objects aren't unique or if each can be selected multiple times.
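The two factorial formulas on this slide can be checked directly against Python's built-in counting functions (`math.comb` and `math.perm`, available since Python 3.8):

```python
import math

# nCm and nPm from the slide's factorial formulas.
def n_choose_m(n, m):
    """Number of m-element subsets of n objects (order ignored)."""
    return math.factorial(n) // (math.factorial(n - m) * math.factorial(m))

def n_perm_m(n, m):
    """Number of ordered selections of m from n objects."""
    return math.factorial(n) // math.factorial(n - m)

# Sets of 2 from 6 objects: 15 if order is ignored, 30 if order matters.
assert n_choose_m(6, 2) == math.comb(6, 2) == 15
assert n_perm_m(6, 2) == math.perm(6, 2) == 30
```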

    7. Calculus The mathematics of continuous phenomena. Slope/change of a function at a particular point on that function. Derivative: dy/dx. Area under a curve. Integral: ∫ Must specify the limits of the integral. Usually used with PDFs to create CDFs in statistical theory.
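A numerical sketch of both ideas on this slide: the derivative as the slope at a point, and the integral (with explicit limits) as the area under a PDF, which is how a CDF is built. The standard normal PDF and the trapezoid rule here are my illustrative choices, not from the slides.

```python
import math

def normal_pdf(x, mu=0.0, sigma=1.0):
    """Standard normal density, used here as the curve to differentiate/integrate."""
    return math.exp(-((x - mu) ** 2) / (2 * sigma ** 2)) / (sigma * math.sqrt(2 * math.pi))

def derivative(f, x, h=1e-6):
    """Slope of f at point x via a central difference."""
    return (f(x + h) - f(x - h)) / (2 * h)

def area_under_pdf(pdf, lo, hi, n=10_000):
    """Trapezoid-rule integral of pdf over [lo, hi]; the limits must be specified."""
    step = (hi - lo) / n
    ys = [pdf(lo + i * step) for i in range(n + 1)]
    return step * (sum(ys) - 0.5 * (ys[0] + ys[-1]))

# CDF value at 0 for the standard normal: area from (effectively) -inf to 0 is 0.5.
half = area_under_pdf(normal_pdf, -5, 0)
```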

    8. Decision theory Signal detection theory: Swets, Dawes, & Monahan (2000) for an excellent overview; Macmillan & Creelman (1991) for an excellent, thorough treatment. Luce's decision rule Turns quantitative values into discrete probabilities of various decisions. We'll talk about this more later.

    9. Game theory Usually used to describe multiple interacting agents. For example, to model the prisoner’s dilemma, bargaining, auctions, strategy games, and other group behaviors. Common in behavioral economics. Nash equilibrium. Lots of resources….
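The prisoner's dilemma and Nash equilibrium mentioned on this slide can be made concrete in a few lines. The payoff numbers below (years in prison, lower is better) are a standard illustrative choice, not taken from the slides; the check confirms mutual defection is the unique Nash equilibrium.

```python
from itertools import product

C, D = "cooperate", "defect"
# payoff[(my_move, their_move)] = (my_years, their_years); lower is better.
payoff = {
    (C, C): (1, 1),
    (C, D): (5, 0),
    (D, C): (0, 5),
    (D, D): (3, 3),
}

def is_nash(a, b):
    """Nash equilibrium: neither player can do better by unilaterally deviating."""
    my_years, their_years = payoff[(a, b)]
    better_for_me = any(payoff[(alt, b)][0] < my_years for alt in (C, D))
    better_for_them = any(payoff[(a, alt)][1] < their_years for alt in (C, D))
    return not better_for_me and not better_for_them

equilibria = [cell for cell in product((C, D), repeat=2) if is_nash(*cell)]
# Only (defect, defect) survives, even though (cooperate, cooperate) is jointly better.
```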

    10. Monte Carlo simulation This method involves creating behaviors that conform to known rules. Differs from typical approach of using tools to assess unknown structure given observed behavior. Instead, assumes known structure and looks at type of behavior produced. Often used to assess statistical and other models’ ability to uncover known relations. Examines validity of modeling approaches/distinctions. Is also used to play out the implications of a model.
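A minimal sketch of the Monte Carlo logic this slide describes: assume a known structure (here, a coin with a known bias of 0.3, my illustrative assumption), simulate the behavior it produces, and check that an estimation procedure recovers the known parameter.

```python
import random

random.seed(42)        # fixed seed so the simulation is reproducible
TRUE_P = 0.3           # the "known rule": a coin that comes up 1 with probability 0.3

def simulate_flips(p, n):
    """Generate n Bernoulli(p) outcomes from the assumed known structure."""
    return [1 if random.random() < p else 0 for _ in range(n)]

flips = simulate_flips(TRUE_P, 100_000)
estimate = sum(flips) / len(flips)  # does the estimator recover the known 0.3?
```

The same pattern scales up: simulate data from a model with known parameters, fit your statistical method to the simulated data, and see whether it uncovers the relations you built in.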

    11. Automata theory Based in computing theory. Assumes that system has discrete states and known probabilities of moving from one state to another. Common in language (to specify grammatical vs. nongrammatical utterances) and robotics.

    12. Example of a finite state automaton
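The slide's diagram is not reproduced in this transcript, so here is a hypothetical stand-in: a deterministic finite automaton with discrete states and known transitions (as slide 11 describes) that accepts binary strings containing an even number of 1s.

```python
# Deterministic finite automaton: states "even"/"odd" track the parity of 1s seen.
TRANSITIONS = {
    ("even", "0"): "even",
    ("even", "1"): "odd",
    ("odd", "0"): "odd",
    ("odd", "1"): "even",
}

def accepts(string, start="even", accepting=("even",)):
    """Run the automaton over the input; accept if it halts in an accepting state."""
    state = start
    for symbol in string:
        state = TRANSITIONS[(state, symbol)]
    return state in accepting
```

A probabilistic version would replace each transition with a distribution over next states, which is the form common in language and robotics applications.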

    13. Maximum Likelihood Estimation (MLE) Most statistical methods are designed to minimize error: choose the parameter values that minimize predictive error, y − y′. Maximum likelihood estimation instead seeks the parameter values that are most likely to have produced the observed distribution.

    14. Likelihood and PDFs For a continuous variable, the likelihood of a particular value is obtained from the PDF (probability density function).

    15. Likelihood ≠ Probability (for continuous distributions)

    16. Maximum likelihood estimates of parameters For MLE, the goal is to determine the most likely value of the population parameter (e.g., µ, σ, β, ρ, …) given an observed sample value (e.g., x̄, s, b, r, …). Any model's parameters (e.g., a, b, c in nonlinear models, weights in backprop) can be estimated using MLE.

    17. Likelihood is based on the shape of the d.v.'s distribution! ANOVA, Pearson's r, t-test, regression… all assume that the d.v. is normally distributed. Under those conditions, the LSE (least squares estimate) is the MLE. If the d.v. is not normally distributed, the LSE is not the MLE. So, the first step is to determine the shape of the distribution of your d.v.

    18. Step 1: Identify the distribution Normal, lognormal, beta, gamma, binomial, multinomial, Weibull, Poisson, exponential…. AAAHHH! Precision isn't critical unless the sample size is huge. Most stats packages can fit a d.v. distribution using various distribution classes.

    19. Step 2: Choose analysis If only looking at linear models, use GLM. GLM (at least in R) allows you to specify the dv distribution type. (You’ll know you have an MLE method if the output includes likelihoods). Otherwise, you need to modify your fitting method to use a different loss function.

    20. Step 3: Loss functions LSE uses (y − y′)² as the loss function and tries to minimize the sum of this quantity (across rows) → SSE. MLE loss functions depend on the assumed distribution of the d.v.
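The LSE loss is simple enough to state in one line: square each row's error and sum. A minimal sketch with made-up observed and predicted values:

```python
# LSE loss: squared error per row, summed across rows -> SSE.
def sse(observed, predicted):
    return sum((y - yhat) ** 2 for y, yhat in zip(observed, predicted))

# Illustrative data: errors of 0.5, 0, and -1 give SSE = 0.25 + 0 + 1 = 1.25.
loss = sse([1.0, 2.0, 3.0], [1.5, 2.0, 2.0])
```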

    21. MLE Loss functions The likelihood function is the joint probability of all of the data. For example, P(µ=2) for row 1 and P(µ=2) for row 2 and P(µ=2) for row 3… Which equals: L(µ=2) = ∏i f(xi | µ=2) It's mathematically easier to deal with sums, so we'll take the log of that quantity: log L(µ=2) = ∑i log f(xi | µ=2)

    22. MLE Loss functions, cont. Now, we have something that can be computed for each row and summed… …but, we want the maximum of that last equation, whereas loss functions should be minimized. Easy! We'll just negate it. Negative log likelihood becomes our loss function.
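The negative log likelihood loss described on this slide, sketched for a normal d.v. with σ fixed at 1 (an illustrative assumption). Each row contributes one term, the terms are summed, and a better parameter value gives a smaller loss:

```python
import math

def normal_pdf(x, mu, sigma=1.0):
    """Density f(x | mu) for one row of data."""
    return math.exp(-((x - mu) ** 2) / (2 * sigma ** 2)) / (sigma * math.sqrt(2 * math.pi))

def neg_log_likelihood(data, mu):
    """-log L(mu): per-row log densities, summed, then negated -> a minimizable loss."""
    return -sum(math.log(normal_pdf(x, mu)) for x in data)

# Illustrative data near 2: mu = 2 yields a lower loss than a poor guess like mu = 5.
data = [1.8, 2.1, 2.4]
assert neg_log_likelihood(data, 2.0) < neg_log_likelihood(data, 5.0)
```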

    23. So, once you know the PDF… …take the log of the function and negate it. This doesn’t change the point of the maximum/minimum of the PDF…
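Since negating the log doesn't move the location of the optimum, minimizing the negative log likelihood finds the same parameter value that maximizes the likelihood. A grid-search sketch (normal PDF, σ fixed at 1, made-up data) shows the minimizer landing on the sample mean, the known MLE of µ for a normal:

```python
import math

def neg_log_likelihood(data, mu, sigma=1.0):
    """-log L(mu) for a normal, written out: per-row terms, summed."""
    return sum(
        0.5 * math.log(2 * math.pi * sigma ** 2) + (x - mu) ** 2 / (2 * sigma ** 2)
        for x in data
    )

data = [4.9, 5.2, 5.0, 4.7, 5.3]
grid = [i / 100 for i in range(300, 701)]  # candidate mu values 3.00 .. 7.00
mle_mu = min(grid, key=lambda mu: neg_log_likelihood(data, mu))
sample_mean = sum(data) / len(data)        # 5.02; the grid minimizer matches it
```

In practice a numerical optimizer replaces the grid, but the logic is the same: the loss surface's minimum is the MLE.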

    24. Normal and gamma distributions

    25. Model comparison in MLE Computation of the likelihood ratio (LR): LR = L(model 1) / L(model 2)

    26. Finding LRs from output Note: if you specified your own loss function and had the stat package minimize it, it will give you the sum of negative log likelihoods (−LL). To obtain likelihood for LR computation, you must negate the −LL and then "delog" it by using 10^LL or e^LL depending on whether you used log or ln.
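A sketch of that recipe with made-up −LL values (assuming natural logs were used in the fit). Negate each −LL to get LL, exponentiate with e, and take the ratio; algebraically the ratio collapses to e raised to the difference in log likelihoods:

```python
import math

# Hypothetical -LL values reported by a fitting routine (natural-log based).
neg_ll_model1 = 120.0
neg_ll_model2 = 123.0

# Negate, then "delog" with e**LL; the ratio of the two likelihoods
# simplifies to exp(LL1 - LL2). Model 1's smaller -LL means a larger likelihood.
likelihood_ratio = math.exp((-neg_ll_model1) - (-neg_ll_model2))
```

Here the ratio is e³ ≈ 20, so model 1 makes the data about 20 times more likely than model 2. Had the package used base-10 logs, `10 ** LL` would replace `math.exp`.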

    27. Bottom line There are lots of modeling tools in the tool chest. Choose the right tool for the job; don't use the same tool for every job. Optimally, you should use MLE parameter estimates. If the d.v. is normally distributed, then LSE = MLE. Otherwise, use GLM or MLE with a loss function specified. Note, nonparametric methods like backprop can use various error/loss functions, too. In R, nnet can use entropy or softmax parameters to minimize conditional likelihood.
