1 / 26

Computational Strategies for Tool Marks: Principal  Component  Analysis  and  Other  Methods

Computational Strategies for Tool Marks: Principal  Component  Analysis  and  Other  Methods. Outline. Introduction Details of Our Approach Data acquisition Surface/Profile Pre-processing Methods of Statistical Discrimination Error Rate Estimates Suggested Operating Guidelines.

caesar
Download Presentation

Computational Strategies for Tool Marks: Principal  Component  Analysis  and  Other  Methods

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Computational Strategies for Tool Marks: Principal  Component  Analysis  and  Other  Methods

  2. Outline • Introduction • Details of OurApproach • Data acquisition • Surface/Profile Pre-processing • Methods of Statistical Discrimination • Error Rate Estimates • Suggested Operating Guidelines

  3. Background Information • All impressions made by tools and firearms can be viewed as numerical patterns • Machine learning trains a computer to recognize patterns • Can give “…the quantitative difference between an identification and non-identification”Moran • Can yield identification error rate estimates • May be even confidence for I.D.s, posterior error probability estimates…??

  4. Data Acquisition • Obtain striation/impression patterns from 3D confocal microscopy • Store files in ever expanding database • Data acquisition is labor intensive and time consuming • Data files are available to practitioner community through web interface

  5. Screwdriver Striation Patterns in Lead 3D surfaces (interactive) 2D profiles

  6. Glock Fired Cartridges Bottom of Firing pin imp. Glock 19 fired cartridge cases

  7. Shear marks on primer of two different Glock 19s

  8. Mean total profile: Mean “waviness” profile: Mean “roughness” profile:

  9. Profile Simulator • Nice for data sets to be large but real data acquisition is SLOW • Need SOMETHING in the interim • Our simulator is based on DWT MRA • May shed light on processed generating surfaces • Should be extendable to 2D striations/impressions…

  10. What Other Statistics Can Be Used? • Multivariate Statistical pattern comparison! • Modern algorithms are called machine learning • Idea is to measure features of the physical evidence that characterize it • Train algorithm to recognize “major” differences between groups of features while taking into account natural variation and measurement error.

  11. Setup for Multivariate Analysis • Need a data matrix to do machine learning Represent as a vector of values {-4.62, -4.60, -4.58, ...} • Each profile vector is a row in the data matrix • Typical length is ~4000 points/profile • 2D surfaces are far longer • HIGHLY REDUNDANT representation • of surface data • PCA can: • Remove much of the redundancy • Make discrimination computations • far more tractable

  12. 3D PCA 24 Glocks, 720 simulated and real primer shear profiles: • How many PCs should we use to represent the data?? • No unique answer • FIRST we need an algorithm to I.D. a toolmark to a tool • ~47% variance retained

  13. Support Vector Machines • Support Vector Machines (SVM) determine efficient association rules • In the absence of any knowledge of probability densities • For small to medium sized data sets Gaussian LDA decision boundary SVM decision boundary

  14. PCA-SVM • How many PCs should we use? • Gather training set • Use systematically increase PC dimension and check Hold-one-out Cross-Validation with SVM With 7 PCs, expect ~3% error rate

  15. Error Rate Estimation • Cross-Validation: systematically hold-out chunks of data set for testing • Most common: Hold-one-out • Bootstrap: Make up data sets with randomly selected observation vectors (with replacement) • Repeat thousands of times • Can yield confidence intervals around error rate estimate • The Best: Small training set, BIG test set

  16. 18D PCA-SVM Primer Shear I.D. Model, 2000 Bootstrap Resamples Refined bootstrapped I.D. error rate for primer shear striation patterns= 0.35% 95% C.I. = [0%, 0.83%] (sample size = 720 real and simulated profiles)

  17. How good of a “match” is it? Conformal Prediction • Can give a judge or jury an easy to understand measure of reliability of classification result • Confidence on a scale of 0%-100% 80% confidence 20% error Slope = 0.2 99% confidence 1% error Slope = 0.01 95% confidence 5% error Slope = 0.05 • This is an orthodox “frequentist” • approach Cumulative # of Errors • You choose the confidence level • A Set of possible I.D. obtained • Varying confidence changes • the size of the sets Sequence of Unk Obs Vects

  18. Empirical Bayes’ • Bayes’ Rule: can we realistically estimate posterior error probabilities empirically/falsifiably?? • Given algorithm indicates a toolmark is associated with a tool, what is the probability it is truly not a “match”? • Probability that there is another tool in the pop. that can generate a toolmark which the algorithm will match to the wrong tool Probability of no actual association given a test/algorithm indicates a positive ID

  19. Empirical Bayes’ • Erfon’s machinery for “empirical Bayes’ two-groups model”Efron 2007 • Surprisingly simple! • S-, truly no association, Null hypothesis • S+, truly an association, Non-null hypothesis • z, a Gaussian random variate derived from a machine learning task to ID an unknown pattern with a group • Scheme yields estimate of Pr(S-|z) along with it’s standard error • Called the local false discovery rate (fdr) or posterior error probability • Given a similarity score, fdr is an estimate of the probability that the computer is wrong in calling a “match” • Catch: you need A LOT of z and they should be fairly independent and Pr(S-) > 0.9

  20. Empirical Bayes’ • Try Erfon’s machinery for “empirical Bayes’ two-groups model” • Gives a calibrated “no-match posterior probability” model The SVM alg got these Primer shear IDs wrong

  21. Empirical Bayes’ • Model’s use with crime scene “unknowns”: This is an uncertainty in the estimate This is the est. post. prob. of no association = 0.00027 = 0.027% Computer outputs “match” for: unknown crime scene toolmarks-with knowns from “Bob the burglar” tools

  22. Future Directions • All feature vectors must be aligned and of the same length, • Use CCF to align within and between guns. • Padded any overhang with uniform random noise Glock 1 Glock 23 • A real pain for 1D, not to mention 2D!!

  23. Future Directions • Try invariant feature extraction from digital imaging: • Legendre moments: • Translationally invariant • Trying for 2D striation patterns • Zernike moments: • Rotationally and translationally invariant but more expensive. • Trying for 2D impression patterns

  24. Future Directions • Extend ImageJ Surface Metrology functionality • DigitalSurf Mountains map is EXPEN$IVE!! • DWT and MODWT modules for • De-noising • Form-removal • Surface Filtering • GP-GPU implementation of computationally intensive routines • This WILL work in Java with JOCL bindings • Need big surfaces to make copying overhead worthwhile

  25. Acknowledgements • Helen Chan • Julie Cohen • Aurora Dimitrova • FraniKammerman • Loretta Kuo • Dale Purcel • Stephanie Pollut • Chris Singh • Melodie Yu • Research Team: • Mr. Peter Diaczuk • Ms. Carol Gambino • Dr. James Hamby • Dr. Brooke Kammrath • Dr. Thomas Kubic • Mr. Chris Lucky • Off. Patrick McLaughlin • Dr. Linton Mohammed • Mr. Jerry Petillo • Mr. Nicholas Petraco • Dr. Graham Rankin • Dr. Jacqueline Speir • Dr. Peter Shenkin • Mr. Peter Tytell

  26. Website Information and Reprints/Preprints: toolmarkstatistics.no-ip.org/ npetraco@gmail.com npetraco@jjay.cuny.edu

More Related