A Toolkit for the modeling of Multi-parametric fit problems

A Toolkit for the modeling ofMulti-parametric fit problems Luca Lista INFN Napoli

Motivation • Initially developed while rewriting a fortran fitter for BaBar analysis • Simultaneous estimate of: • B(BJ/) / B(BJ/K) • direct CP asymmetry • More control on the code was needed to justify a bias appeared in the original fitter • As much as possible of the code has to be under control and testable separately

Requirements • Provide Tools for modeling parametric fit problems • Unbinned Maximum Likelihood (UML[*]) fit of: • PDF parameters • Yields of different sub-samples • Both, mixed • 2 fits • Toy Monte Carlo to study the fit properties • Fitted parameter distributions • Pulls, Bias, Confidence level of fit results [*] not Unified Modeling Language …  …

Design issues • Trying to optimize as much as possible the PDF code • Gets called a large number of times • Yes, it can be done in C++: • Addressed with Template Metaprogramming • No need to use virtual functions • The underlying minimization engine is Minuit, as always • Wrapped in different flavours (ROOT, …)

class PdfFlat { public: typedef double type; enum { variables = 1 }; PdfFlat( double a, double b ) : min( a ), max( b ) { } double operator()( type * v ) const { type x = *v; return ( x < min || x > max ? 0 : 1 / ( max - min ) ); } double min, max; }; class PdfPoissonian { public: typedef int type; enum { variables = 1 }; PdfPoissonian( double m ) : mean( m ) { } double operator()( type * v ) const { type n = *v; return ( exp( - mean ) * pow( mean, n ) / TMath::Factorial( n ) ); } double mean; }; PDF interface Variable type Returns P(x) Variable set Returns dP(x)/dx The user can define its own pdfs with the above interface

template< class Pdf, class Generator = RootRandom > class RandomGenerator { public: typedef typename Pdf::type type; RandomGenerator( const Pdf& pdf ); void generate( type * v ) const; }; template< class Generator > class RandomGenerator< PdfFlat, Generator > { public: typedef PdfFlat Pdf; typedef typename Pdf::type type; RandomGenerator( const Pdf& pdf ) : _min( pdf.min ), _max( pdf.max ) { } void generate( double * v ) const { v[ 0 ] = Generator::shootFlat( _min, _max ); } private: const double& _min, &_max; }; Random number generators Partial specialization Generic Random engine: Root, CLHEP, … The user can define its own generators with the preferred method

Combining PDFs template<class Pdf1, class Pdf2> class PdfIndependent2 { public: typedef typename Pdf1::type type; enum { variables = Pdf1::variables + Pdf2::variables }; PdfIndependent2( const Pdf1& pdf1, const Pdf2& pdf2 ) : _pdf1( pdf1 ), _pdf2( pdf2 ) { } double operator()( double * val ) const { return _pdf1( val ) * _pdf2( val + Pdf1::variables ); } private: const Pdf1 &_pdf1; const Pdf2 &_pdf2; }; template<class Pdf1, class Pdf2, class Pdf3> class PdfIndependent3 { ... }; template<class Pdf1, class Pdf2, class Pdf3, class Pdf4> class PdfIndependent4 { ... }; n1 + n2 variables Product of pdfs

Transformation of variables The Jacobian must be 1! template<class Pdf, class Transformation> class PdfTransformed { public: typedef typename Pdf::type type; enum { variables = Pdf::variables }; PdfTransformed( const Pdf& pdf, const Transformation& trans ) : _pdf( pdf ), _trans( trans ) { } double operator()( double * val ) const { double x[ variables ]; copy( val, val + variables, x ); _trans( x ); return _pdf( x ); } private: const Pdf &_pdf; Transformation _trans; }; typedef PdfIndependent2<PdfGaussian, PdfGaussian> PdfBasic; typedef PdfTransformed<PdfBasic, Rotation2D> Pdf; PdfGaussian g1( 0, 0.1 ), g2( 0, 1 ); Pdf pdf( PdfBasic( g1, g2 ), Rotation2D( M_PI / 4 ) ); Client code example

A data sample to be fitted template< int n, class type = double > class Sample { public: typedef std::vector<type *> container; typedef typename container::size_type size_type; typedef typename container::iterator iterator; typedef typename container::const_iterator const_iterator; ~Sample() { for( iterator i = begin(); i != end(); i ++ ) delete [] *i; } size_type size() const { return _v.size(); } const type* operator[]( int i ) const { return _v[ i ]; } iterator begin() { return _v.begin(); } iterator end() { return _v.end(); } const_iterator begin() const { return _v.begin(); } const_iterator end() const { return _v.end(); } type * extend() { _v.push_back( new type[ n ] ); return _v.back(); } private: container _v; }; Fixed number of variables Basically, a vector of double*

UML PDF Parameter fit const int sig = 100; double mean = 0, sigma = 1; PdfConstant p( sig ); // alternative: PdfPoissonian PdfGaussian q( mean, sigma ); Experiment<PdfConstant, PdfGaussian> experiment( p, q ); PdfGaussian pdf( mean, sigma ); Likelihood<PdfGaussian> like( pdf ); UMLParameterFitter<Likelihood<PdfGaussian> > fitter( like ); fitter.addParameter( "mean", & pdf.mean ); fitter.addParameter( "sigma", & pdf.sigma ); for ( int i = 0; i < 5000; i++ ) { Sample< 1 > sample; experiment.generate( sample ); double par[ 2 ] = { mean, sigma }, err[ 2 ] = { 1, 1 }, logLike; logLike = fitter.fit( sample.begin(), sample.end(), par, err ); double pullm = ( par[ 0 ] - mean ) / err[ 0 ]; double pulls = ( par[ 1 ] - sigma ) / err[ 1 ]; } Fixed PDF for MC generation Variable PDF For fitting Parameters linked to the fitter

Parameter fit Results (Pulls) There is a bias (as expected): 2 = 1/ni(xi-)2  1/n-1i(xi-)2

UML Yield fit const int sig = 10, bkg = 5; typedef PdfPoissonian Fluctuation; // alternative: PdfConstant Fluctuation fluctuationSig( sig ), fluctuationBkg( bkg ); typedef PdfIndependent2< PdfGaussian, PdfGaussian > PdfSig; typedef PdfIndependent2< PdfFlat, PdfFlat > PdfBkg; PdfSig pdfSig( PdfGaussian( 0, 1 ), PdfGaussian( 0, 0.5 ) ); PdfBkg pdfBkg( PdfFlat( -5, 5 ), PdfFlat( -5, 5 ) ); typedef Experiment< Fluctuation, PdfSig > ToySig; typedef Experiment< Fluctuation, PdfBkg > ToyBkg; ToySig toySig( fluctuationSig, pdfSig ); ToyBkg toyBkg( fluctuationBkg, pdfBkg ); Experiment2< ToySig, ToyBkg > toy( toySig, toyBkg ); typedef ExtendedLikelihood2< PdfSig, PdfBkg > Likelihood; Likelihood like( pdfSig, pdfBkg ); UMLYieldFitter< Likelihood > fitter( like ); for ( int i = 0; i < 5000; i++ ) { Sample< 2 > sample; toy.generate( sample ); double s[] = { sig, bkg }, err[] = { 1, 1 }; double logLike = fitter.fit( sample.begin(), sample.end(), s, err ); double pull1 = ( s[0] - sig ) / err[0] ), pull2 = ( ( s[1] - bkg ) / err[1] ); } In 2 dimensions: Flat background Gaussian signal

Yield fit Results (Pulls) <s> = 10 <b> = 5 Discrete structure because of low statistics Poisson fluctuation

const int sig = 10, bkg = 5; typedef PdfPoissonian Fluctuation; Fluctuation fluctuationSig( sig ), fluctuationBkg( bkg ); typedef PdfIndependent2< PdfGaussian, PdfGaussian > PdfSig; typedef PdfIndependent2< PdfFlat, PdfFlat > PdfBkg; PdfGaussian g1( 0, 1 ), g2( 0, 0.5 ); PdfFlat f1( -5, 5 ), f2( -5, 5 ); PdfSig pdfSig( g1, g2 ); PdfBkg pdfBkg( f1, f2 ); typedef Experiment<Fluctuation, PdfSig> ToySig; typedef Experiment<Fluctuation, PdfBkg> ToyBkg; ToySig toySig( fluctuationSig, pdfSig ); ToyBkg toyBkg( fluctuationBkg, pdfBkg ); Experiment2<ToySig, ToyBkg> toy( toySig, toyBkg ); typedef ExtendedLikelihood2<PdfSig, PdfBkg> Likelihood; PdfGaussian G1( 0, 1 ); PdfSig pdfSig1( G1, g2 ); Likelihood like( pdfSig1, pdfBkg ); UMLYieldAndParameterFitter<Likelihood> fitter( like ); fitter.addParameter( "mean", & G1.mean ); double pull1, pull2, pull3; for ( int i = 0; i < 5000; i++ ) { Sample< 2 > sample; toy.generate( sample ); double s[] = { sig, bkg, 0 }; double err[] = { 1, 1, 1 }; double logLike = fitter.fit( sample.begin(), sample.end(), s, err ); pull1 = ( s[ 0 ] - sig ) / err[ 0 ]; pull2 = ( s[ 1 ] - bkg ) / err[ 1 ]; pull3 = ( s[ 2 ] - 0 ) / err[ 2 ]; } Combined Yield and parameter fit

Combined fit Results (Pulls) <s> = 10 <b> = 5

Support for 2 fit class Line { public: Line( double A, double B ) : a( A ), b( B ) { } double operator()( double v ) const { return a + b * v; } double a, b; }; { Line line( 0, 1 ); Chi2<Line> chi2line( line, partition ); Chi2Fitter<Chi2<Line> > fitter1( chi2line ); fitter1.addParameter( "a", &line.a ); fitter1.addParameter( "b", &line.b ); Parabola para( 0, 1, 0 ); Chi2<Parabola> chi2para( para, partition ); Chi2Fitter<Chi2<Parabola> > fitter2( chi2para ); fitter2.addParameter( "a", &para.a ); fitter2.addParameter( "b", &para.b ); fitter2.addParameter( "c", &para.c ); // ... } Still no support for Correlated errors!

Application to B(BJ/) / B(BJ/K) • Four variables: • B reconstructed mass • Beam - B energy in the  mass hypothesis • Beam – B energy in the K mass hypothesis • B meson charge • Two samples: • J/ , J/ ee • Simultaneous fit of: • Total yield of BJ/, BJ/K and background • Charge asymmetry • Resolution and energy shitfs separately for J/ , J/ ee

B(BJ/) / B(BJ/K), cont. • “Peculiar” distribution of kinematical variables • Non trivial variable transformation to factorize the pdfs • Different samples factorize w.r.t. different variable combinations • Real experience: • Code much more manageable and under control • Different components are testable separately • Pdf, random generation • Different approaches to the fits • Getting confidence in the fit results require massive testing

Model for independent PDFs EK D D E

Dealing with kinematical pre-selection -120 MeV < E, EK < 120MeV B A C D A B C D The area is preserved after the trasformation

Extracting the signal B J/K B J/ J/y  ee events Background Likelihood projection J/y  mm events

Possible improvement • Tools for upper limit extraction based on Toy Monte Carlo • Adaptable code available (from B analysis) • Support for 2 fit with full covariance matrix • Provide more “standard” pdfs and random generators • Exponential, Argus, Crystal ball, … • Generic Hit-or-miss generator, etc. • Managing singular PDF • Delta-Dirac components • Factorize out dependence on ROOT, CLHEP, etc. • Done for random generators • Can be improved for Minuit interface • More thoughts about the interface • Is passing a double* as set of PDF variables suitable?

A Toolkit for the modeling of Multi-parametric fit problems

A Toolkit for the modeling of Multi-parametric fit problems

Presentation Transcript

PARAMETRIC SOLID MODELING AND DESIGN

SPICE Toolkit Common Problems

Parametric Modeling

MOAT: A Multi-Object Assignment Toolkit

Recent Parametric Instability Modeling

Review of the Growing Modeling Toolkit

SRILM - The SRI Language Modeling Toolkit

PREFER: A System for the Efficient Execution of Multi-parametric Ranked Queries

Introduction to Solid Modeling Parametric Modeling

The parametric equations of a line

PARAMETRIC OPTIMAL CONTROL PROBLEMS

Parametric Modeling

Multi-View Stereo : A Parametric Study

A Comparison of Modeling Scales in Flexible Parametric Models

SPICE Toolkit Common Problems

Parametric Modeling

A MULTI-SCALE/MULTI-PHYSICS MODELING FRAMEWORK FOR SOLIDIFICATION SYSTEMS

E-R Modeler: A Database Modeling Toolkit for Eclipse

Update on the Goodness of Fit Toolkit

Introduction to Parametric Modeling

Spring Design using Parametric Modeling