
Selected Topics in Particle Physics




Presentation Transcript


  1. Selected Topics in Particle Physics, Avner Soffer, Spring 2007, Lecture 4

  2. Simplest variable combination: diagonal cut

  3. Combining variables • Many variables that weakly separate signal from background • Often correlated distributions • Complicated to deal with or to use in a fit • Easiest to combine them into one simple variable, the Fisher discriminant (defined on the following slides)

  4. Neural networks [plots: NN output distributions for signal MC vs. BB and qq (continuum) background MC]

  5. Input variables for the neural net: Legendre, Fisher, log(Δz), cos θT, log(K-D DOCA), lepton tagging (BtgElectronTag & BtgMuonTag) [plots: distributions for signal, BB background, and cc+uds continuum]

  6. Uncorrelated, (approximately) Gaussian-distributed variables • “Gaussian-distributed” means the distribution of v is the normal distribution G(v) = exp[−(v − ⟨v⟩)² / 2σ²] / (σ√(2π)) • How to combine the information? • Option 1: V = v1 + v2 • Option 2: V = v1 − v2 • Option 3: V = a1 v1 + a2 v2 • What are the best weights a_i? • How about a_i = ⟨v_i^s⟩ − ⟨v_i^b⟩, the difference between the signal and background means? [plots: signal and background distributions in v1 and v2]

  7. Incorporating spreads in v_i • ⟨v_1^s⟩ − ⟨v_1^b⟩ > ⟨v_2^s⟩ − ⟨v_2^b⟩, but v2 has a smaller spread and more actual separation between S and B • a_i = (⟨v_i^s⟩ − ⟨v_i^b⟩) / ((σ_i^s)² + (σ_i^b)²), where σ_i^s, defined by (σ_i^s)² = ⟨(v_i^s − ⟨v_i^s⟩)²⟩ = Σ_e (v_{i,e}^s − ⟨v_i^s⟩)² / N_i, is the RMS spread in the v_i distribution of a pure signal sample of N_i events (σ_i^b is defined similarly) • You may be familiar with the form ⟨(v − ⟨v⟩)²⟩ = ⟨v²⟩ + ⟨v⟩² − 2⟨v⟩⟨v⟩ = ⟨v²⟩ − ⟨v⟩² [plots: signal and background distributions in v1 and v2]
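As a concrete illustration of this weighting, here is a minimal numpy sketch (the function name is mine, and this per-variable treatment still ignores correlations between the variables):

```python
import numpy as np

def weights_ignoring_correlations(sig, bkg):
    """Per-variable weights a_i = (<v_i>_s - <v_i>_b) / ((sigma_i^s)^2 + (sigma_i^b)^2).

    sig, bkg: MC signal / background samples, arrays of shape (n_events, n_variables).
    Correlations between the variables are NOT taken into account here.
    """
    mean_diff = sig.mean(axis=0) - bkg.mean(axis=0)          # <v_i>_s - <v_i>_b
    return mean_diff / (sig.var(axis=0) + bkg.var(axis=0))   # divide by summed variances
```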

  8. Linearly correlated, Gaussian-distributed variables • Linear correlation: ⟨v1⟩ = ⟨v1⟩₀ + c v2, with (σ1)² independent of v2 • a_i = (⟨v_i^s⟩ − ⟨v_i^b⟩) / ((σ_i^s)² + (σ_i^b)²) doesn’t account for the correlation • Recall (σ_i^s)² = ⟨(v_i^s − ⟨v_i^s⟩)²⟩ • Replace it with the covariance matrix C_ij^s = ⟨(v_i^s − ⟨v_i^s⟩)(v_j^s − ⟨v_j^s⟩)⟩ • a_i = Σ_j (⟨v_j^s⟩ − ⟨v_j^b⟩) [(C^s + C^b)⁻¹]_ij, where (C^s + C^b)⁻¹ is the inverse of the sum of the S and B covariance matrices • Fisher discriminant: F = Σ_i a_i v_i
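A minimal numpy sketch of the full Fisher coefficient calculation on MC samples (the function names and the toy numbers below are illustrative, not from the lecture):

```python
import numpy as np

def fisher_coefficients(sig, bkg):
    """a_i = sum_j [(C_s + C_b)^-1]_ij (<v_j>_s - <v_j>_b).

    sig, bkg: MC signal / background samples, arrays of shape (n_events, n_variables).
    """
    mean_diff = sig.mean(axis=0) - bkg.mean(axis=0)
    cov_sum = (np.cov(sig, rowvar=False, bias=True)     # bias=True -> 1/N normalization,
               + np.cov(bkg, rowvar=False, bias=True))  # matching the definition above
    return np.linalg.solve(cov_sum, mean_diff)          # solves (C_s + C_b) a = mean_diff

def fisher(events, a):
    """Evaluate F = sum_i a_i v_i for every event."""
    return events @ a

# Toy usage with two linearly correlated Gaussian variables (hypothetical numbers):
rng = np.random.default_rng(0)
cov = [[1.0, 0.6], [0.6, 1.0]]
sig = rng.multivariate_normal([1.0, 0.5], cov, size=10000)
bkg = rng.multivariate_normal([0.0, 0.0], cov, size=10000)
a = fisher_coefficients(sig, bkg)
F_sig, F_bkg = fisher(sig, a), fisher(bkg, a)   # one-dimensional discriminant per event
```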

  9. Fisher discriminant properties • Best S-B separation for a linearly correlated set of Gaussian-distributed variables • Non-Gaussian-ness of v is usually not a problem… • There must be a mean difference ⟨v_i^s⟩ − ⟨v_i^b⟩ ≠ 0 (take the absolute value) • Need to calculate the a_i coefficients using (correctly simulated) Monte Carlo (MC) signal and background samples • Should validate using control samples (true for any discriminant)

  10. More properties • F is more Gaussian than its inputs (virtual calorimeter example) • Central limit theorem: if x_j (j = 1, …, n) are independent random variables with means ⟨x_j⟩ and variances σ_j², then for large n the sum Σ_j x_j is a Gaussian-distributed variable with mean Σ_j ⟨x_j⟩ and variance Σ_j σ_j² • F can usually be fit with 2 Gaussians or a bifurcated Gaussian • A cut on F corresponds to an (n−1)-dimensional plane cut through the n-dimensional variable space
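A quick numerical illustration of the central limit theorem at work (not from the slides; pure numpy):

```python
import numpy as np

rng = np.random.default_rng(1)
# Sum n independent, strongly non-Gaussian (exponential) variables: the skewness of
# the sum falls towards 0, i.e. the distribution becomes increasingly Gaussian.
for n in (1, 2, 10, 50):
    s = rng.exponential(size=(100_000, n)).sum(axis=1)
    skew = np.mean(((s - s.mean()) / s.std()) ** 3)
    print(f"n = {n:3d}   skewness = {skew:+.3f}")
```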

  11. Nonlinear correlations • Linear methods (Fisher) are not optimal for such cases • May fail altogether if there is no S-B mean difference

  12. Artificial neural networks • “Complex nonlinearity” • Each neuron takes many inputs and outputs a response-function value • The output of each neuron serves as input for the others • Neurons are divided among layers for efficiency • The weight w_ij^l between neuron i in layer l and neuron j in layer l+1 is calculated using a MC “training sample”

  13. Response functions • Neuron output = r(inputs, weights) = a(k(inputs, weights)): an activation function a applied to a combination function k of the weighted inputs

  14. Common usage • a = linear in the output layer • a = tanh in the hidden layer • k = sum of the weighted inputs in both the hidden and output layers
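A minimal sketch of a forward pass with exactly this choice of response functions (bias terms are omitted and all names are illustrative):

```python
import numpy as np

def forward(x, w_hidden, w_output):
    """Single-hidden-layer perceptron with k = weighted sum everywhere,
    a = tanh in the hidden layer and a = linear in the output layer.

    x:        input variable vector, shape (n_var,)
    w_hidden: hidden-layer weights, shape (n_hidden, n_var)
    w_output: output-layer weights, shape (n_hidden,)
    """
    h = np.tanh(w_hidden @ x)   # hidden layer: k = sum, a = tanh
    return w_output @ h         # output layer: k = sum, a = linear
```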

  15. Training (calculating weights) • Event a (a = 1…N) has an input variable vector x_a = (x_1, …, x_nvar) • For each event, calculate the deviation of the network output y_a from the desired value ŷ_a (0 for background, 1 for signal) • Calculate the error function (typically the summed squared deviation, E(w) = Σ_a ½ (y_a − ŷ_a)²) for random starting values w of the weights

  16. … Training • Change the weights in the direction of the steepest decline in E: w → w − η ∂E/∂w, with learning rate η • “Online learning”: remove the sums over events, i.e. update the weights after each event • Requires a randomized training sample
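A hedged sketch of such an online (event-by-event) steepest-descent update for the toy network sketched above, using a squared-error E (the learning rate, architecture, and all names are illustrative assumptions, not the lecture's actual implementation):

```python
import numpy as np

def train_online(events, targets, n_hidden=5, eta=0.01, n_epochs=10, seed=0):
    """Online gradient descent for the single-hidden-layer net sketched earlier.

    events:  training events, array of shape (N, n_var)
    targets: desired outputs, array of shape (N,) -- 0 for background, 1 for signal
    """
    rng = np.random.default_rng(seed)
    n_var = events.shape[1]
    w_hidden = rng.normal(scale=0.5, size=(n_hidden, n_var))
    w_output = rng.normal(scale=0.5, size=n_hidden)

    for _ in range(n_epochs):
        for a in rng.permutation(len(events)):    # randomized training sample
            x, t = events[a], targets[a]
            h = np.tanh(w_hidden @ x)             # hidden-layer response
            y = w_output @ h                      # linear output
            d = y - t                             # deviation from the desired value
            # Gradients of E_a = d^2 / 2 with respect to the weights:
            grad_out = d * h
            grad_hid = d * np.outer(w_output * (1.0 - h**2), x)
            w_output -= eta * grad_out            # steepest-descent step per event
            w_hidden -= eta * grad_hid
    return w_hidden, w_output
```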

  17. What architecture to use? • Weierstrass theorem: for a multilayer perceptron, 1 hidden layer is sufficient to approximate a continuous correlation function to any precision, if the number of neurons in the layer is high enough • Alternatively: several hidden layers with fewer neurons may converge faster and be more stable • Instability problem: the output distribution changes with different samples

  18. What variables to use? • Improvement with added variables: • Importance of variable i:

  19. More info • A cut on a NN output = a non-linear slice through the n-dimensional variable space • The NN output shape can be (approximately) Gaussianized: q → q′ = tanh⁻¹[(q − ½(qmax + qmin)) / (½(qmax − qmin))]
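A one-line numpy version of this transformation (the function name is illustrative):

```python
import numpy as np

def gaussianize(q, q_min, q_max):
    """q -> q' = atanh( (q - (q_max + q_min)/2) / ((q_max - q_min)/2) ).

    q must lie strictly inside (q_min, q_max), e.g. a NN output in (0, 1).
    """
    return np.arctanh((q - 0.5 * (q_max + q_min)) / (0.5 * (q_max - q_min)))
```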
