Statistical Tools: A Few Comments

Harrison B. Prosper, Florida State University. PHYSTAT Workshop 2004, 1-2 March 2004.

Presentation Transcript


  1. Statistical Tools: A Few Comments. Harrison B. Prosper, Florida State University. PHYSTAT Workshop 2004, 1-2 March 2004.

  2. Outline
     • Issues
     • Wish List
     • Example
     • Summary

  3. Statistical Tools: Issues
     • Some difficulties with tools used in HEP
       • Difficult to express ideas cleanly and clearly
       • Tools scattered over different (typically, monolithic) programs
       • Interface between heterogeneous data formats and disparate tools is a headache
       • Histograms are tightly coupled to their viewers
       • Algebra of histograms relatively crude
       • Inadequate support for systematic study of ensembles

  4. Issues – II
     • In a systematic statistical study one may wish to:
       • Generate different ensembles of observations, possibly with conditioning, and study various statistical properties (bias, variance, coverage etc.)
       • Assess robustness with respect to prior densities and likelihoods
       • Study different confidence limit procedures
       • Study different optimization criteria
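
The kind of ensemble study listed on this slide can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the Gaussian model with known sigma, the nominal 95% interval, and every name here (`Experiment`, `generate_ensemble`, `condition`, and so on) are hypothetical and do not come from the talk.

```python
import math
import random
import statistics
from dataclasses import dataclass


@dataclass
class Experiment:
    """One pseudo-experiment: a point estimate and its interval (hypothetical type)."""
    estimate: float
    lower: float
    upper: float


def generate_ensemble(true_mean, sigma, n_obs, n_experiments, seed=1):
    """Simulate repeated Gaussian experiments with known sigma (an assumption)."""
    rng = random.Random(seed)
    half_width = 1.96 * sigma / math.sqrt(n_obs)  # nominal 95% central interval
    ensemble = []
    for _ in range(n_experiments):
        data = [rng.gauss(true_mean, sigma) for _ in range(n_obs)]
        mean = statistics.fmean(data)
        ensemble.append(Experiment(mean, mean - half_width, mean + half_width))
    return ensemble


def bias(ensemble, truth):
    """Average estimate minus the true value."""
    return statistics.fmean([e.estimate for e in ensemble]) - truth


def variance(ensemble):
    """Variance of the estimates across the ensemble."""
    return statistics.pvariance([e.estimate for e in ensemble])


def coverage(ensemble, truth):
    """Fraction of intervals that contain the true value."""
    return sum(e.lower <= truth <= e.upper for e in ensemble) / len(ensemble)


def condition(ensemble, predicate):
    """Extract a sub-ensemble, e.g. only experiments that fluctuated upward."""
    return [e for e in ensemble if predicate(e)]


ensemble = generate_ensemble(true_mean=5.0, sigma=1.0, n_obs=10, n_experiments=5000)
upward = condition(ensemble, lambda e: e.estimate > 5.0)
print(bias(ensemble, 5.0), variance(ensemble), coverage(ensemble, 5.0))
print(coverage(upward, 5.0))  # coverage conditioned on an upward fluctuation
```

Keeping the ensemble as a plain list of records means that any query (bias, variance, coverage, or a conditioned subset) is an ordinary function of that list, which is the flexibility the slide asks for.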

  5. Issues – III
     • One may wish to study:
       • Type I and type II error rates
       • Consistency – both convergence to, and rate of convergence to, the true answer as sample size increases
       • Probability densities p(z) given underlying distributions p(x)
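
For the last item, one simple way to obtain p(z) from p(x) is to sample x, apply the transformation, and histogram the result. The sketch below assumes plain Monte Carlo propagation; `sample_z`, `density`, and the choice z = x² with x drawn from a standard normal are illustrative assumptions, not anything specified in the talk.

```python
import random


def sample_z(transform, draw_x, n, seed=1):
    """Draw x from p(x) via draw_x(rng) and return the transformed values z."""
    rng = random.Random(seed)
    return [transform(draw_x(rng)) for _ in range(n)]


def density(samples, lo, hi, nbins):
    """Histogram estimate of p(z) on [lo, hi); entries outside are ignored."""
    width = (hi - lo) / nbins
    counts = [0] * nbins
    for z in samples:
        if lo <= z < hi:
            index = min(int((z - lo) / width), nbins - 1)  # guard against rounding
            counts[index] += 1
    total = len(samples)
    return [c / (total * width) for c in counts]


# Illustrative choice: z = x^2 with x ~ N(0, 1), so z follows a chi-square(1) law.
z_values = sample_z(lambda x: x * x, lambda rng: rng.gauss(0.0, 1.0), n=20000)
p_z = density(z_values, lo=0.0, hi=10.0, nbins=50)
```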

  6. Wish List
     • Decoupling
       • Statistical tool separate from, and independent of, the environment in which it might be used.
       • However, provide bindings for different environments/languages (R, Root, Python, Java, etc.)
     • Modularity
       • Each statistical tool encapsulates a single coherent statistical idea. Avoid monoliths.
     • Histograms
       • Histogram and histogram viewers independent of each other. (A sensible idea from Marc Paterno!)
       • Elegant algebra of histograms h = a*h1+b*h2/h3 etc.
       • Powerful, intuitive tools for multi-dim. data exploration
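
A histogram type that knows nothing about viewers, yet supports the algebra `h = a*h1 + b*h2/h3`, might look like the following sketch. The `Hist` class is hypothetical. It carries bin contents only (no uncertainties), and the convention that an empty denominator bin yields 0.0 is an assumption made for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Hist:
    """Bin edges and contents only; no drawing code lives here (hypothetical class)."""
    edges: tuple
    counts: tuple

    def _check(self, other):
        if self.edges != other.edges:
            raise ValueError("histograms must share the same binning")

    def __add__(self, other):
        if not isinstance(other, Hist):
            return NotImplemented
        self._check(other)
        return Hist(self.edges, tuple(a + b for a, b in zip(self.counts, other.counts)))

    def __mul__(self, factor):
        if isinstance(factor, Hist):
            return NotImplemented  # only scalar scaling is sketched here
        return Hist(self.edges, tuple(factor * c for c in self.counts))

    __rmul__ = __mul__  # allows a*h as well as h*a

    def __truediv__(self, other):
        if not isinstance(other, Hist):
            return NotImplemented
        self._check(other)
        # Assumed convention: a bin with an empty denominator is set to 0.0.
        return Hist(self.edges, tuple(a / b if b != 0 else 0.0
                                      for a, b in zip(self.counts, other.counts)))


edges = (0.0, 1.0, 2.0)
h1 = Hist(edges, (2.0, 4.0))
h2 = Hist(edges, (1.0, 3.0))
h3 = Hist(edges, (2.0, 0.0))
a, b = 2.0, 4.0
h = a * h1 + b * h2 / h3  # Python precedence matches the usual arithmetic reading
```

Because `Hist` is immutable plain data, any viewer (or none) can consume it, which keeps the histogram and its display independent.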

  7. Wish List – II
     • Likelihoods
       • Flexible method for reporting them; maybe as swarms of points generated via MCMC?
     • Frequency Methods
       • Flexible ensemble generator, which allows easily extracted sub-ensembles
       • Flexible query of ensembles (to get coverage, error rates, variances, bias etc.)
     • Bayesian Methods
       • Flexible robustness studies (prior family, likelihood family etc.)
       • Multi-dimensional integration (adaptive and Markov chain MC)
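
To make the "swarm of points" idea concrete, here is a minimal Metropolis sketch for a single Poisson counting experiment with a known background. The model, the function names, the step size, and the burn-in length are all assumptions chosen for illustration; the talk does not prescribe a sampler.

```python
import math
import random


def log_likelihood(s, n_obs, background):
    """Poisson log-likelihood for signal s >= 0 (constant terms dropped)."""
    if s < 0:
        return -math.inf
    mu = s + background
    if mu <= 0.0:
        return 0.0 if n_obs == 0 else -math.inf
    return n_obs * math.log(mu) - mu


def likelihood_swarm(n_obs, background, n_points=20000, step=3.0, burn_in=1000, seed=1):
    """Return (s, log L) points distributed in proportion to the likelihood."""
    rng = random.Random(seed)
    s = max(n_obs - background, 1.0)  # start near the maximum
    logl = log_likelihood(s, n_obs, background)
    swarm = []
    for i in range(n_points + burn_in):
        proposal = s + rng.gauss(0.0, step)
        logl_new = log_likelihood(proposal, n_obs, background)
        # Metropolis rule: accept uphill moves always, downhill with probability L'/L.
        if logl_new >= logl or rng.random() < math.exp(logl_new - logl):
            s, logl = proposal, logl_new
        if i >= burn_in:
            swarm.append((s, logl))
    return swarm


swarm = likelihood_swarm(n_obs=10, background=3.0)
```

A swarm like this can be reported as a plain table of points; a reader can then reweight it with a different prior, which is one way to run the robustness studies listed under Bayesian Methods.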

  8. Example: A Current Statistical Problem From DØ Single Top Group
     • Set limit on σ(p + pbar → t + X) given a histogram for each of
       • 4 signal channels
         • tq(EC), tqb(EC), tq(CC), tqb(CC)
       • 4 background sources per signal channel
         • QCD, ttbar(l+jets), ttbar(ll), W+Jets
     • Some histograms are weighted, some unweighted
     • We would like to study different limit procedures, including Bayesian, and study their frequency properties. Currently using ad hoc and rather inflexible pieces of homegrown C++!
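
As a rough illustration of one such limit procedure, the sketch below computes a Bayesian upper limit on a cross section from several counting channels, using a flat prior and a grid integration. It is not the DØ analysis: the model (one Poisson count per channel, known backgrounds, no systematic uncertainties, no weighted histograms), the function names, and all numbers are assumptions made up for the example.

```python
import math


def log_posterior(sigma, channels):
    """Log of (flat prior) x (product of Poisson likelihoods), constants dropped.

    Each channel is (n_obs, efficiency_times_luminosity, background).
    """
    total = 0.0
    for n_obs, eff_lumi, background in channels:
        mu = sigma * eff_lumi + background
        if mu > 0.0:
            total += n_obs * math.log(mu) - mu
        elif n_obs > 0:
            return -math.inf  # observed events are impossible with zero mean
    return total


def upper_limit(channels, cl=0.95, sigma_max=20.0, n_grid=20000):
    """Smallest grid value of sigma whose posterior mass below it reaches cl."""
    step = sigma_max / n_grid
    grid = [k * step for k in range(n_grid + 1)]
    logp = [log_posterior(s, channels) for s in grid]
    peak = max(logp)
    weights = [math.exp(lp - peak) for lp in logp]  # rescaled to avoid overflow
    total = sum(weights)
    running = 0.0
    for s, w in zip(grid, weights):
        running += w
        if running >= cl * total:
            return s
    return sigma_max


# Made-up inputs, one tuple per signal channel; backgrounds already summed.
channels = [
    (3, 0.8, 2.5),  # tq(EC)   -- illustrative numbers only
    (2, 0.6, 1.9),  # tqb(EC)
    (5, 1.1, 4.2),  # tq(CC)
    (4, 0.9, 3.7),  # tqb(CC)
]
limit = upper_limit(channels)
```

Binned histograms fit the same pattern if each bin is treated as its own channel. Studying the frequency properties of the procedure then amounts to running `upper_limit` over an ensemble of simulated counts.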

  9. Summary
     • The Good
       • Lots of statistical tools already exist
       • A lot more needed – opportunity for creativity!
     • The Bad
       • Use of current tools, however, often requires familiarity with several frameworks/languages
     • The Ugly
       • Lack of a simple, but powerful, language for expression of statistical ideas. Rapid “what if” analyses done with C++. This is crazy! I don’t want to think about pointers and de-referencing when I’m trying to think about mathematics.
