Explore the accuracy of error rates in biometric systems, assess sample quality, and calculate confidence intervals using various estimation methods. Understand the implications of biased and unbiased estimators and delve into confidence interval calculations. Discover parametric and non-parametric estimation methods, including subset and bootstrap techniques. Consider dependencies in fingerprint databases and how they impact confidence intervals.
Chapter 15: System Errors Revisited. Ali Erol, 10/19/2005
System Errors Revisited • Quantify the accuracy of FAR and FRR estimates. • Confidence intervals, a well-known technique from statistical analysis, are used. • See references [22], [23]. • The algorithm of the first three authors [23] was experimentally demonstrated to provide better confidence interval estimates.
FAR/FRR • Definition: FRR(x) = Prob(s_m ≤ x | H0) = F(x), FAR(y) = Prob(s_n > y | Ha) = 1 − Prob(s_n ≤ y | Ha) = 1 − G(y) • We need • F(x) = Dist(x): genuine (matching) score distribution function • G(y) = Dist(y): imposter (non-matching) score distribution function
FAR/FRR • Instead we have • A set of genuine scores X = {X1, X2, …, XM} • A set of imposter scores Y = {Y1, Y2, …, YN} • We estimate FRR(x) by the empirical fraction of genuine scores ≤ x, i.e. F̂(x) = #{Xi ≤ x}/M, and FAR(y) by 1 − Ĝ(y) = #{Yj > y}/N, as sketched below.
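As a concrete illustration (not part of the original slides), a minimal sketch of the empirical estimates, assuming the genuine and imposter scores are given as plain arrays and higher scores mean better matches:

```python
import numpy as np

def empirical_frr(genuine_scores, x):
    """FRR(x): fraction of genuine scores s_m <= x (empirical F(x))."""
    return np.mean(np.asarray(genuine_scores) <= x)

def empirical_far(imposter_scores, y):
    """FAR(y): fraction of imposter scores s_n > y, i.e. 1 - empirical G(y)."""
    return np.mean(np.asarray(imposter_scores) > y)

# Made-up example scores and threshold (illustrative only)
X = [0.81, 0.92, 0.75, 0.88, 0.95]   # genuine (matching) scores
Y = [0.10, 0.35, 0.22, 0.55, 0.41]   # imposter (non-matching) scores
t = 0.5
print(empirical_frr(X, t), empirical_far(Y, t))
```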
Problem • What is the accuracy of these error rates? It depends on: • The number of biometric samples • The quality of the samples • The data collection procedure (e.g. 10 consecutive samples) • The subjects involved, the acquisition device, etc.
An Estimation Problem • Given x: a random variable (F(x) denotes Dist(x)) and X = {X1, X2, …, XM}: a sample set • Estimate θ = E(x) • Solution: θ̂ = (1/M) Σ Xi, the sample mean • Error: r = θ̂ − θ, with E(θ̂) = θ (an unbiased estimator*)
Biased/Unbiased Estimators • For an unbiased estimator we have E(θ̂) = θ • Example: Gaussian model: estimate the mean θ1 = μ and the variance θ2 = σ² using the maximum likelihood criterion, i.e. maximize Prob(X | μ, σ) • μ̂ = (1/M) Σ Xi (unbiased estimator) • σ̂² = (1/M) Σ (Xi − μ̂)² (biased estimator) • s² = (1/(M−1)) Σ (Xi − μ̂)² (unbiased estimator)
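A small sketch (mine, not the authors') contrasting the maximum-likelihood (biased) and the corrected (unbiased) variance estimates on simulated Gaussian data:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(loc=1.0, scale=2.0, size=50)   # M = 50 samples, true variance = 4

mu_hat = X.mean()                    # ML estimate of the mean (unbiased)
var_ml = np.mean((X - mu_hat)**2)    # ML estimate of the variance, 1/M sum (biased)
var_unbiased = X.var(ddof=1)         # corrected estimate, 1/(M-1) sum (unbiased)

print(mu_hat, var_ml, var_unbiased)  # var_ml is systematically a bit too small
```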
Confidence Interval • Assume F(x) is given; then Dist(r) can be calculated • r = θ̂ − θ is a function of θ̂, which is a function of x • Find an interval such that with (1 − α)·100% certainty (next slide) r ∈ [z1(α, X), z2(α, X)] • This leads to the (1 − α)·100% confidence interval for θ given by [θ̂ − z2(α, X), θ̂ − z1(α, X)]
Confidence Interval • Example • Discard α/2 of the probability mass of Dist(r) on the lower end and α/2 on the higher end • Find the r values corresponding to the interval boundaries (called quantiles) • Prob(q(α/2) ≤ r ≤ q(1 − α/2)) = 1 − α
Confidence Interval • Interpretation: • Generate sample sets X from F(x) • Calculate a confidence interval for each X • (1 − α)·100% of these intervals contain θ
Parametric Method • Xi identically distributed • Assume the Xi are independent (not true in general) • Then θ̂ can be taken to be normally distributed using the central limit theorem (large M) • Result: θ̂ ± z · σ̂/√M • E.g. for 95% confidence z = 1.96 • Smaller interval with increasing M and decreasing σ̂ (a sketch follows below)
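A sketch of the normal-approximation interval, assuming θ is the mean score; function and variable names are mine, and z = 1.96 is hard-coded for the 95% case:

```python
import numpy as np

def parametric_ci(samples, z=1.96):
    """(1-alpha)*100% CI for E(x) via the central limit theorem.
    z = 1.96 corresponds to a 95% interval."""
    samples = np.asarray(samples, dtype=float)
    M = samples.size
    theta_hat = samples.mean()
    sigma_hat = samples.std(ddof=1)
    half = z * sigma_hat / np.sqrt(M)
    return theta_hat - half, theta_hat + half

rng = np.random.default_rng(1)
X = rng.normal(0.8, 0.1, size=200)
print(parametric_ci(X))   # the interval narrows as M grows or sigma_hat shrinks
```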
Non-Parametric Method • Assume F(x) is available • Starting from the sample set X, draw B additional sample sets from the density f(x) of the random variable x, each of size M, and compute an estimate θ̂k for each set
Non-Parametric Method • FACT: For large B the empirical distribution of the estimates θ̂k approximates Dist(θ̂) • Define the error to be rk = θ̂k − θ̂ • Calculate Dist(r) from the rk • Solution: read the confidence interval off the quantiles of Dist(r)
Non-Parametric Method • Interval calculation: sorting and counting • Sort the B error values rk, discard the lowest α/2·B and the highest α/2·B of them; the remaining extremes give the interval boundaries [Figure: Dist(r) with α/2 discarded in each tail]
Bootstrap Method • F(x) is not available; all we have is X • How do we generate the additional sample sets? • Solution (the bootstrap method): sampling with replacement (SR) from X • Put the samples in a bag; draw, record, and put the drawn sample back • Draw M samples from X, B times; some samples Xi may not appear in a given set (a sketch follows below)
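A minimal sketch of a bootstrap confidence interval following the sorting-and-counting idea above, with the mean score as the statistic; the quantile indexing is one common convention, not necessarily the exact recipe of [23]:

```python
import numpy as np

def bootstrap_ci(X, stat=np.mean, B=2000, alpha=0.10, seed=0):
    """Sample with replacement B times, compute the errors r_k = theta_k - theta_hat,
    sort them, and read off the alpha/2 and 1-alpha/2 empirical quantiles."""
    rng = np.random.default_rng(seed)
    X = np.asarray(X, dtype=float)
    M = X.size
    theta_hat = stat(X)
    r = np.empty(B)
    for k in range(B):
        resample = rng.choice(X, size=M, replace=True)   # some X_i may be missing
        r[k] = stat(resample) - theta_hat
    r.sort()
    lo = r[int(np.floor((alpha / 2) * B))]
    hi = r[int(np.ceil((1 - alpha / 2) * B)) - 1]
    # r in [lo, hi] with probability 1-alpha  =>  theta in [theta_hat - hi, theta_hat - lo]
    return theta_hat - hi, theta_hat - lo

rng = np.random.default_rng(2)
X = rng.normal(0.8, 0.1, size=200)
print(bootstrap_ci(X))
```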
Bootstrap Method (Imperfections) • The Xi are not independent • In SR the dependence between samples is not replicated; the resampled sets behave as if the samples were independent • The variance of θ̂ is then estimated to be smaller than it really is • This leads to confidence intervals that are too small
Subset Bootstrap • Potential sources of dependency: • All samples from the same person (e.g. multiple fingers) • All samples from the same biometric (e.g. one finger) • Partition X into (approximately) independent subsets • Apply SR to the subsets instead of to individual samples (a sketch follows below)
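A sketch of the subset idea (names and interface are mine): resample whole subsets with replacement and pool their scores, so the dependence inside each subset is preserved; the quantile step is the same as in the plain bootstrap sketch above.

```python
import numpy as np

def subset_bootstrap_ci(subsets, stat=np.mean, B=2000, alpha=0.10, seed=0):
    """subsets: list of 1-D arrays of scores, one per (approximately) independent
    unit, e.g. one per finger or one per person. Resample the subsets with
    replacement and concatenate, instead of resampling individual scores."""
    rng = np.random.default_rng(seed)
    subsets = [np.asarray(s, dtype=float) for s in subsets]
    theta_hat = stat(np.concatenate(subsets))
    n = len(subsets)
    r = np.empty(B)
    for k in range(B):
        picked = rng.integers(0, n, size=n)              # draw subsets with replacement
        pooled = np.concatenate([subsets[i] for i in picked])
        r[k] = stat(pooled) - theta_hat
    r.sort()
    lo = r[int(np.floor((alpha / 2) * B))]
    hi = r[int(np.ceil((1 - alpha / 2) * B)) - 1]
    return theta_hat - hi, theta_hat - lo
```

Person-based partitioning is simply a coarser choice of subsets than finger-based partitioning; the code is unchanged.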
Subset Bootstrap (An example) • Fingerprint database • P persons • c fingers per person, D = cP fingers in total • d samples per finger • DB size = cPd • Matching pairs • d(d−1) per finger • cd(d−1) per person • cPd(d−1) = Dd(d−1) in total • Using a symmetric or an asymmetric matcher does not make any difference [23].
Subset Bootstrap (An Example) • Two databases with X1 ⊂ X2 • X1: P=10, c=2, D=20, d=8, M=1120 • X2: P=50, c=2, D=100, d=8, M=5600 • Finger-based partition: take the subsets to be the scores from the same finger (i.e. D subsets of d(d−1) matching scores each) • Person-based partition: take the subsets to be the scores from the same person (i.e. P subsets of cd(d−1) matching scores each)
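As a quick check of these counts (my arithmetic, not in the slides): for X1, M = Dd(d−1) = 20·8·7 = 1120, and for X2, M = 100·8·7 = 5600, matching the figures above.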
Subset Bootstrap (An Example) • We expect: • CI1 (light gray) to be larger than CI2 (dark gray), because X1 has a smaller number of samples • CI2 (dark gray) to be contained in CI1 (light gray), because X1 ⊂ X2 • The intervals are larger for person-based partitioning, because there is dependency between fingers of the same person
CIs for FAR/FRR • Calculate CIs for each threshold T = t0 and a given α
CI for FRR • Given the genuine score set X • Generate bootstrap sets X¹, …, Xᴮ by (subset) sampling with replacement • Calculate the FRR estimate at t0 for each set • Sort and count to obtain the interval
CI for FAR • Given the imposter score set Y • Generate bootstrap sets Y¹, …, Yᴮ by (subset) sampling with replacement • Calculate the FAR estimate at t0 for each set • Sort and count to obtain the interval (see the sketch below)
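Continuing the hypothetical `bootstrap_ci` sketch and the X, Y score arrays from earlier (so this snippet assumes those definitions are in scope), the per-threshold intervals amount to swapping in the error-rate statistic:

```python
import numpy as np

# Assumes bootstrap_ci, X (genuine scores), Y (imposter scores) from the sketches above
t0 = 0.5                                                       # chosen threshold T = t0
frr_ci = bootstrap_ci(X, stat=lambda s: np.mean(s <= t0))      # CI for FRR(t0)
far_ci = bootstrap_ci(Y, stat=lambda s: np.mean(s > t0))       # CI for FAR(t0)
print(frr_ci, far_ci)
```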
Subset Bootstrap for FAR • The imposter scores Y are not independent • We are using multiple impressions of the same finger • Let I_x^k be the kth impression of a finger of subject x; then sim(I_a^1, I_b^1), sim(I_a^1, I_b^2), sim(I_a^2, I_b^3) are not statistically independent • If each finger were used only once, D fingers would give only D/2 such pairs • There is actually dependency between X and Y as well
Subset Bootstrap for FAR • Fingerprint database • P persons • c fingers per person, D = cP fingers in total • d samples per finger • DB size = cPd • Non-matching pairs • N = d²D(D−1) = P[(dc)²(P−1) + d²c(c−1)] • d²(D−1) per finger • (dc)²(P−1) + d²c(c−1) per person
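A quick consistency check with the X1 numbers from before (my arithmetic): P = 10, c = 2, d = 8, D = 20 gives d²(D−1) = 64·19 = 1216 pairs per finger, so N = D·1216 = 24320; per person, (dc)²(P−1) + d²c(c−1) = 256·9 + 64·2 = 2432, and P·2432 = 24320, so the two expressions for N agree.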
Subset Bootstrap for FAR [Diagram: DB partitioned into I1, …, IN; subset Yi collects the non-matching scores of Ii against the other Ik] • Finger (N = D): take Ii (d elements), match it against each Ik, k ≠ i (d² pairs each), giving d²(D−1) pairs; repeat with all Ii to construct the subsets Yi • Person (N = P): take Ii (cd elements), match it against each Ik, k ≠ i ((dc)² pairs each), giving (dc)²(P−1) pairs; inside Ii itself there are d²c(c−1) pairs; repeat with all Ii to construct the subsets Yi • Not completely independent: each Ii is used many times.
Subset Bootstrap for FRR • The person-based subsets give a better estimate
How good are the CIs? • There exists a true confidence interval (At the beginning we assumed F(x) is known) • The CI we calculate is just one estimate. • How accurate is that estimate?
How good are the CIs? • We estimate θ = E(x) • Ideal test: assume F(x) is available • Generate many sample sets from F(x) • Calculate a confidence interval for each set • Since θ is known, test whether θ falls inside each interval; the hit fraction should approach 1 − α
How good are the CIs? • Practical test (for comparison): 1. Randomly split X into two subsets Xa and Xb 2. Calculate θ̂b and CIa 3. Test whether θ̂b falls inside CIa • Repeat 1–3 many times and count the number of hits, i.e. the probability of θ̂b falling into CIa • The hit rate is not equal to the confidence; to relate the two, θ̂a and θ̂b are assumed to have normal distributions • The higher the hit rate, the better the estimates (see the sketch below)
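A rough sketch of this split-half hit-rate test (function and variable names are mine), reusing the hypothetical `bootstrap_ci` helper from the earlier sketch and the mean as the statistic:

```python
import numpy as np

# Assumes bootstrap_ci from the earlier bootstrap sketch is defined
def hit_rate(X, n_trials=200, alpha=0.10, seed=0):
    """Randomly split X in half, build a CI from one half (Xa), and check whether
    the other half's estimate (theta_b) lands inside it. Returns the hit fraction."""
    rng = np.random.default_rng(seed)
    X = np.asarray(X, dtype=float)
    hits = 0
    for _ in range(n_trials):
        perm = rng.permutation(X)
        Xa, Xb = perm[: X.size // 2], perm[X.size // 2:]
        lo, hi = bootstrap_ci(Xa, B=500, alpha=alpha, seed=int(rng.integers(1_000_000)))
        theta_b = Xb.mean()
        hits += (lo <= theta_b <= hi)
    return hits / n_trials

# e.g. print(hit_rate(X)) with X the genuine score array from the earlier sketches
```

Note that, as the slide says, this hit rate is not the nominal confidence 1 − α; it is only a comparative measure of how good the interval estimates are.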
How good are the CIs? • α = 0.1 • Person-based partitioning provides more accurate confidence intervals • 73.10% is very close to the expected value