
Causal Data Mining



Presentation Transcript


  1. Causal Data Mining. Richard Scheines, Dept. of Philosophy, Machine Learning, & Human-Computer Interaction, Carnegie Mellon

  2. Causal Graphs. Causal Graph G = {V, E}. Each edge X → Y represents a direct causal claim: X is a direct cause of Y relative to V. [Figure: example causal graph, Chicken Pox]
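A graph in this sense is just a variable set plus a set of directed edges, and the direct causes of a variable are the sources of its incoming edges. The minimal Python sketch below encodes that reading. It borrows the variables S, YF, and LC from slide 3, with edges S → YF and S → LC as implied by the factorization there; the names `V`, `E`, and `direct_causes` are hypothetical illustrations, not part of Tetrad or any library.

```python
# Hypothetical encoding of a causal graph G = {V, E}: each edge (x, y) in E
# stands for the claim "x is a direct cause of y relative to V".
V = {"S", "YF", "LC"}
E = {("S", "YF"), ("S", "LC")}


def direct_causes(y, edges):
    """Return the set of variables with an edge pointing into y."""
    return {x for (x, target) in edges if target == y}


for v in sorted(V):
    print(v, sorted(direct_causes(v, E)))
```

Adding a variable to V can change which edges are direct, which is why the slide says "relative to V".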

  3. Causal Bayes Networks. The joint distribution factors according to the causal graph, i.e., $P(V) = \prod_{X \in V} P(X \mid \text{Immediate Causes of}(X))$.
     P(S = 0) = .7, P(S = 1) = .3
     P(YF = 0 | S = 0) = .99, P(YF = 1 | S = 0) = .01
     P(YF = 0 | S = 1) = .20, P(YF = 1 | S = 1) = .80
     P(LC = 0 | S = 0) = .95, P(LC = 1 | S = 0) = .05
     P(LC = 0 | S = 1) = .80, P(LC = 1 | S = 1) = .20
     $P(S, YF, LC) = P(S)\,P(YF \mid S)\,P(LC \mid S)$
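The factorization can be checked numerically. The following is a minimal Python sketch that uses only the probabilities on slide 3 (the dictionary and function names are illustrative). It builds the full joint table from P(S) P(YF | S) P(LC | S), confirms that it sums to 1, and compares P(LC = 1) with P(LC = 1 | YF = 1).

```python
from itertools import product

# Conditional probability tables copied from the slide.
P_S = {0: 0.7, 1: 0.3}
P_YF_given_S = {0: {0: 0.99, 1: 0.01}, 1: {0: 0.20, 1: 0.80}}  # indexed [s][yf]
P_LC_given_S = {0: {0: 0.95, 1: 0.05}, 1: {0: 0.80, 1: 0.20}}  # indexed [s][lc]


def joint(s, yf, lc):
    """P(S, YF, LC) = P(S) P(YF | S) P(LC | S)."""
    return P_S[s] * P_YF_given_S[s][yf] * P_LC_given_S[s][lc]


# Full joint table over all 8 assignments of (S, YF, LC).
table = {(s, yf, lc): joint(s, yf, lc) for s, yf, lc in product((0, 1), repeat=3)}

total = sum(table.values())
p_lc1 = sum(p for (s, yf, lc), p in table.items() if lc == 1)
p_yf1 = sum(p for (s, yf, lc), p in table.items() if yf == 1)
p_lc1_and_yf1 = sum(p for (s, yf, lc), p in table.items() if lc == 1 and yf == 1)
p_lc1_given_yf1 = p_lc1_and_yf1 / p_yf1

print(round(total, 10))           # 1.0
print(round(p_lc1, 4))            # 0.095
print(round(p_lc1_given_yf1, 4))  # 0.1957
```

Observing YF = 1 roughly doubles the probability of LC = 1, even though neither variable causes the other. Under this factorization the association comes entirely from their common cause S, so conditioning on S removes it.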

  4. Structural Equation Models • Structural Equations: one equation for each variable V in the graph, $V = f(\text{parents}(V), \varepsilon_V)$; for an SEM (linear regression), f is a linear function • Statistical Constraints: a joint distribution over the error terms [Figure: Causal Graph]

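  5. Structural Equation Models [Figure panels: Causal Graph; SEM Graph (path diagram)] Equations: $\text{Education} = \varepsilon_{ed}$, $\text{Income} = \beta_1\,\text{Education} + \varepsilon_{Income}$, $\text{Longevity} = \beta_2\,\text{Education} + \varepsilon_{Longevity}$. Statistical Constraints: $(\varepsilon_{ed}, \varepsilon_{Income}, \varepsilon_{Longevity}) \sim N(0, \Sigma^2)$, with $\Sigma^2$ diagonal and no variance equal to zero.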
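To see what the equations and the diagonal error covariance mean in practice, here is a minimal simulation sketch in Python. It assumes hypothetical values for the coefficients (`BETA_1`, `BETA_2`) and the error standard deviations, since the slide gives none. Each case is generated one structural equation at a time with independent normal errors. Because Education is exogenous, an ordinary least-squares slope should recover each coefficient up to sampling noise.

```python
import random

# Hypothetical parameter values; the slide does not specify them.
BETA_1 = 0.5  # Education -> Income
BETA_2 = 0.3  # Education -> Longevity
SD = {"ed": 1.0, "income": 1.0, "longevity": 1.0}


def simulate(n, seed=0):
    """Draw n cases from the linear SEM, evaluating one equation per variable.
    Errors are independent normals (diagonal covariance, nonzero variances)."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        education = rng.gauss(0, SD["ed"])
        income = BETA_1 * education + rng.gauss(0, SD["income"])
        longevity = BETA_2 * education + rng.gauss(0, SD["longevity"])
        rows.append((education, income, longevity))
    return rows


def ols_slope(xs, ys):
    """Least-squares slope of ys regressed on xs (with an intercept)."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / sxx


data = simulate(20_000)
edu = [r[0] for r in data]
inc = [r[1] for r in data]
lon = [r[2] for r in data]
print(round(ols_slope(edu, inc), 2), round(ols_slope(edu, lon), 2))
```

Income and Longevity come out correlated in the simulated data even though neither appears in the other's equation. Their only link is the common cause Education.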

  6. Tetrad 4: Demo www.phil.cmu.edu/projects/tetrad

  7. Causal Data Mining in Ed. Research • Collect Raw Data • Build Meaningful Variables • Constrain Model Space with Background Knowledge • Search for Models • Estimate and Test • Interpret

  8. CSR Online Are online students learning as much? What features of online behavior matter?

  9. CSR Online Are online students learning as much? Raw Data: Pitt 2001, 87 students. For everyone: pre-test, recitation attendance, final exam. For online students (logged): voluntary question attempts, online quizzes, requests to print modules.

  10. CSR Online Build Meaningful Variables: • Online [0,1] • Pre-test [%] • Recitation Attendance [%] • Final Exam [%]

  11. CSR Online Data: Correlation Matrix (corrs.dat, N=83)
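Constraint-based searches such as PC and FCI proceed by testing conditional independence. For roughly Gaussian data, these tests can be computed from a correlation matrix and the sample size alone, which is why a file like corrs.dat with N = 83 can be enough input. The Python sketch below shows one such test: a first-order partial correlation followed by a Fisher z test. The correlations used are hypothetical, not the values in corrs.dat, and Tetrad's own test implementation may differ in detail.

```python
import math


def partial_corr(r_xz, r_xy, r_yz):
    """First-order partial correlation of X and Z given Y, from pairwise correlations."""
    return (r_xz - r_xy * r_yz) / math.sqrt((1 - r_xy ** 2) * (1 - r_yz ** 2))


def fisher_z_pvalue(r, n, k):
    """Two-sided p-value for H0: (partial) correlation = 0, using Fisher's z
    transform with sample size n and k conditioning variables."""
    z = 0.5 * math.log((1 + r) / (1 - r)) * math.sqrt(n - k - 3)
    return math.erfc(abs(z) / math.sqrt(2))


# Hypothetical correlations that a chain X -> Y -> Z would produce: r_xz = r_xy * r_yz.
r_xy, r_yz = 0.5, 0.5
r_xz = r_xy * r_yz

r = partial_corr(r_xz, r_xy, r_yz)
print(r)                              # 0.0
print(fisher_z_pvalue(r, n=83, k=1))  # 1.0
# Marginal test of X and Z (no conditioning): dependence is detected.
print(fisher_z_pvalue(r_xz, n=83, k=0))
```

X and Z are correlated, yet they become independent once Y is conditioned on. That pattern is what a search procedure reads as "no direct edge between X and Z".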

  12. CSR Online Background Knowledge: Temporal Tiers (earliest first): • Online, Pre-test • Recitation • Final Exam

  13. CSR Online Model Search: • No latents (patterns, with PC or GES): no time order, 729 models; temporal tiers, 96 models • With latents (PAGs, with FCI search): no time order, 4,096 models; temporal tiers, 2,916 models
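The two no-latent counts appear to follow a simple rule. Each of the 6 pairs among the 4 variables is either unconnected or joined by an edge in one of two directions (3^6 = 729). With temporal tiers, only the same-tier pair (Online, Pre-test) keeps all 3 options, while the other 5 pairs lose the backward-in-time edge (3 · 2^5 = 96). The Python sketch below reproduces that arithmetic under this assumption. It is a naive count that does not exclude cycles or group Markov-equivalent graphs, so it does not show how PC or GES actually search. The latent-variable counts (4,096 and 2,916) depend on PAG edge types and are not reproduced here. The names `TIER`, `edge_options`, and `count_models` are hypothetical.

```python
from itertools import combinations, product

VARS = ["Online", "Pre", "Rec", "Final"]
# Temporal tiers from the previous slide: a lower number means earlier.
TIER = {"Online": 0, "Pre": 0, "Rec": 1, "Final": 2}


def edge_options(a, b, tiers=None):
    """Possible states of the pair (a, b): no edge, a -> b, or b -> a.
    With tiers, an edge may not point from a later tier to an earlier one."""
    options = [None, (a, b), (b, a)]
    if tiers is None:
        return options
    return [e for e in options if e is None or tiers[e[0]] <= tiers[e[1]]]


def count_models(tiers=None):
    """Count every combination of per-pair edge states (cycles not excluded)."""
    per_pair = [edge_options(a, b, tiers) for a, b in combinations(VARS, 2)]
    return sum(1 for _ in product(*per_pair))


print(count_models())      # 729
print(count_models(TIER))  # 96
```

Even this crude count shows why background knowledge matters: three temporal tiers shrink the candidate space by a factor of more than seven before any data are consulted.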

  14. Tetrad Demo: Online vs. Lecture (data file: corrs.dat)

  15. Estimate and Test: Results • Model fit was excellent • Online students attended 10% fewer recitations • Each recitation attended adds 2% to the final exam score • Online students did half a standard deviation better than lecture students (p = .059)

