Flight Test and Statistics

PRESENTED BY

Richard Duprey

Director, FAA Certification Programs

National Test Pilot School

Mojave, California

Flight Test and Statistics

“If you want to be absolutely certain you are right, you can’t say you know anything.”

- Background on National Test Pilot School
- Coverage of Statistics
- Scope - six hours of academics
- Detail

- Use of statistics in flight test
- Types of questions we try to answer

- Private non-profit
- Grants Master of Science degree

- Only civilian school of its kind
- Recognized by SETP as equivalent to USAF and Navy Test Pilot Schools

- Offers variety of courses (Fixed Wing and Helicopters)
- Professional - 1 year
- Introductory
- Performance and Flying Qualities Testing
- Systems Testing
- Operational Test and Evaluation
- NVG

- FAA Test Pilot / FTE initial and recurrent training

- Types of Errors
- Types of Data
- Elementary Probability
- Classical Probability
- Experimental Probability
- Axioms

- Examples

- Flight testing involves data collection
- time to climb
- fuel flow for range estimates
- qualitative flying qualities ratings
- INS drift rate
- Landing and Take-off data
- Weapon effectiveness

- All of these experimental observations have inaccuracies
- Understanding these errors, their sources, and developing methods to minimize their effect is crucial to good flight testing

- There are two very different types of errors
- systematic errors and random errors

- Systematic errors
- repeatable errors
- caused by flawed measuring process
- ex: measuring with an 11 inch ruler, or an airspeed indicator that needs corrections

- Random errors
- not repeatable and usually small
- caused by unobserved changes in the experimental situation
- errors by observer - reading airspeed indicator
- unpredictable variations - small voltage fluctuations causing fuel counter errors

- cannot be eliminated, but are typically scattered according to a well-defined distribution

- There are four types of numerical data:
- NOMINAL DATA
- numerical in name only - say an aircraft configuration
- 1 = gear down, 2 = gear up, 3 = slats extended

- normal arithmetic processes not applicable
- 3 > 1 or 3 - 1 = 2 are not valid relationships

- ORDINAL DATA
- contains information about rank order only
- #1 = C-150, #2 = B-1, #3 = F-15

- in terms of max speed: 3 > 1 is valid, but not 3 - 1 = 2

- There are four types of numerical data (continued)
- INTERVAL DATA
- contains rank and difference information - ex: temperature in degrees Fahrenheit
- 30, 45, 60 at different times, 15 deg. difference
- zero point arbitrary, so 60°F is not twice 30°F

- RATIO DATA
- all arithmetic processes apply
- most flight test data falls into this category
- Can say that a 1000 pound per hour fuel flow is 4 times as great as 250 PPH

- Quantitative analysis of random errors of measurement in flight testing must rely on probability theory
- Goal
- Student to understand what technique is appropriate and limitations on the results

- The probability of event A occurring is the fraction of the total times that we expect A to occur:

$P(A) = \dfrac{n_a}{N}$

- Where: - P(A) is the probability of A occurring
- - na is the number of times we expect A to occur
- - N is the total number of attempts or trials

- From this definition, P(A) must always be between 0 and 1
- if A always happens, na = N and P(A) = 1
- if A never happens, na= 0 and P(A) = 0

- In order to determine P(A) we can take two different approaches
- make predictions based on foreknowledge (“a priori”)
- conduct experiments (“a posteriori”)

- If it is true that
- every single trial leads to one of a finite number of outcomes
- and, every possible outcome is equally likely

- Then,
- na is the number of ways that A can happen
- N is the total number of possible outcomes

- For example:
- six-sided die implies six possible outcomes: N = 6
- if A is getting a 6 on one roll, na = 1
- P(A) = 1/6 = 0.1667

- What is the probability of getting two heads when we toss two fair coins?
- There are four possible outcomes (N = 4)
- (H,H) (H,T) (T,H) (T,T)

- na = 1 since only one of the possible outcomes results in two heads (H,H)
- Thus P(A) = 1/4 = 0.25
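The die and two-coin examples above can be checked by listing the equally likely outcomes and counting. This is a minimal sketch; the helper name `classical_probability` is an illustrative assumption, not something from the slides.

```python
from fractions import Fraction
from itertools import product


def classical_probability(outcomes, is_event):
    """Return P(A) = na / N, assuming every outcome is equally likely."""
    outcomes = list(outcomes)
    n_a = sum(1 for outcome in outcomes if is_event(outcome))
    return Fraction(n_a, len(outcomes))


# One roll of a six-sided die: A = "roll a 6"
p_six = classical_probability(range(1, 7), lambda face: face == 6)

# Two fair coins: A = "both heads"
p_two_heads = classical_probability(
    product("HT", repeat=2), lambda toss: toss == ("H", "H")
)

print(p_six, p_two_heads)
```

Enumeration only works while the outcome list is finite and equally likely, which is the limitation the next slide raises for flight test data.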

- Approach instructive
- Generally not applicable to flight test where:
- Possible outcomes infinite
- Each possible outcome not equally likely
- Leads us to second approach

- Experimental probability is defined as

$P(A) = \dfrac{n_{A\,obs}}{N_{obs}}$

- Where
- nA obs is the number of times we observe A (versus the number of times we expect A to occur)
- Nobs is the number of trials

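- If the probability of getting heads on a single toss of a coin is determined experimentally, we might get

[Figure: observed probability of heads, Pobs(heads), on a vertical axis from 0 to 1.0 with 0.5 marked, plotted against the number of tosses, nobs = 1, 10, 100, 1000]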

- Probability Theory can be used to describe relationships between events

- Three probability axioms are easily justified as opposed to proven
- P(not A) = 1 - P(A)
- Probability of something happening has to be one

- P(A or B) = P(A) + P(B)
- P(H or T) = 0.5 + 0.5 = 1 for a single coin

- P(A and B) = P(A) x P(B)
- P(T and T) = 0.5 x 0.5 = 0.25 for two coins
- same answer we got when examining all possible outcomes

- The last two axioms require that
- each outcome is independent (needed for the "and" rule)
- A occurring doesn’t affect probability of B occurring

- each outcome is mutually exclusive (needed for the "or" rule)
- Only one can occur in a single trial

- Problem:
- Based on test data, 95% of the time an F-4 will successfully make an approach-end barrier engagement on an icy runway
- what is the probability that at least one of a flight of four F-4’s will miss?

- Solution:
- P (1 or more miss) = 1 - P(all engage)
- Probability that at least one will miss is the complement of the probability that all will engage

- P (all engage) = P(1st success) × P(2nd) × P(3rd) × P(4th)
= 0.95 × 0.95 × 0.95 × 0.95 = 0.95^4 = 0.81

Thus,

- P (1 or more miss) = 1 - 0.81 = 0.19
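The barrier-engagement arithmetic can be reproduced in a few lines. This sketch assumes, as the solution does, that the four engagements are independent; the function name is hypothetical.

```python
def prob_at_least_one_miss(p_success, n_aircraft):
    """Complement rule: 1 - P(all engage), assuming independent engagements."""
    p_all_engage = p_success ** n_aircraft
    return 1.0 - p_all_engage


p_miss = prob_at_least_one_miss(0.95, 4)
print(f"P(1 or more miss) = {p_miss:.4f}")
```

Carrying more digits gives roughly 0.186; the slide's 0.19 comes from rounding 0.95^4 to 0.81 first.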

Example

- Problem:
- What is the probability of getting 7 or 11 on a single roll of a pair of dice?

- Solution:
- Since getting 7 and getting 11 on a single roll are mutually exclusive events, we can say
- P (7 or 11) = P (7) + P (11)

- N = 6^2 = 36
- n7 = 6
- (6, 1) (1, 6) (5, 2) (2, 5) (4, 3) (3, 4)

- n11 = 2
- (6, 5) (5, 6)

- Thus,
- P (7) = 6/36, P (11) = 2/36
- P (7 or 11) = 6/36 + 2/36 = 0.222
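As a cross-check on the dice example, the 36 outcomes can be enumerated directly. This is only a sketch of the counting argument; the variable names are illustrative.

```python
from fractions import Fraction
from itertools import product

# All 36 equally likely (die 1, die 2) outcomes
rolls = list(product(range(1, 7), repeat=2))

n7 = sum(1 for a, b in rolls if a + b == 7)
n11 = sum(1 for a, b in rolls if a + b == 11)

# 7 and 11 cannot both occur on one roll, so the probabilities add
p_7_or_11 = Fraction(n7, len(rolls)) + Fraction(n11, len(rolls))
print(n7, n11, p_7_or_11, float(p_7_or_11))
```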

- Populations and Samples
- Measures of Central Tendency
- Dispersion

- Probability Distributions
- Discrete
- Continuous
- Cumulative

- A population is all possible observations
- Many populations are infinite
- A pair of dice can be rolled indefinitely
- Population of F-117 weapons deliveries is all the possible drops it could make in its lifetime

- Some populations are limited
- Votes by registered Republicans

- A sample is any subset of a population
- For example
- 100 rolls of a pair of dice
- Bomb scores for 100 weapon delivery sorties

- Constructing a population
- Must impose assumptions
- Homogeneous
- Independent
- Random

- Homogeneous
- the data must come from one population only
- DC-10 take-off data shouldn’t be used with MD-11

- Independent
- selecting one data point must not affect subsequent probabilities
- selecting and removing a heart from a deck of cards changes the probability of drawing another heart
- DC-10 landing 75 feet past touchdown aim point on one landing doesn’t change probability that next landing will miss by same distance (or any distance)

- Random
- equal probability of selecting any member of population
- using a member of a population with a bias would be non-random
- F-16 with boresight error would cause a bias in downrange miss distance

- Given a homogeneous, independent, random sample, we need to describe the contents of that sample
- Measure steel rod diameter with a micrometer - would get several different answers
- Tighten the micrometer
- Dust particles on the rod
- Reading scale on micrometer

- What to do with answers that are different?

- There are three common measures of central tendency:
- Mean (arithmetic average) - most commonly used
- Mode
- most common value in the sample
- there may be more than one mode

- Median
- middle value
- for an even-numbered sample, average the two middle values

- Dangers ........

- Just reporting the mean as the answer can be very misleading
- Consider the following two samples, both with a mean of 100 (and same median as well)
- Sample 1: 99.9, 100, 100.1
- Sample 2: 0.1, 100, 199.9

- We also need to report how much the data generally differs from the mean value

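- We define deviation as the difference between the ith data point and the mean:

$d_i = x_i - \bar{x}$

- Averaging the deviations does not help, because the positive and negative deviations cancel:

$\dfrac{1}{N}\sum_{i=1}^{N} d_i = 0$

- Instead, we could average the absolute values of deviations (the mean deviation):

$\text{mean deviation} = \dfrac{1}{N}\sum_{i=1}^{N} |d_i|$

- While the mean deviation can be used, the standard deviation σ is a more common measure of dispersion:

$\sigma = \sqrt{\dfrac{\sum_{i=1}^{N}(x_i - \mu)^2}{N}}$ (population) versus $s = \sqrt{\dfrac{\sum_{i=1}^{N}(x_i - \bar{x})^2}{N - 1}}$ (sample)

- The square of the standard deviation, σ², is called the variance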

- Normally, we use Greek letters to denote statistics for populations:
μ for population mean

σ² for population variance

- And we use Roman letters for sample statistics:
x̄ for sample mean

s² for sample variance

- One other difference exists between σ and s
- The sample standard deviation has the sum of the squares divided by N - 1 versus N
- Mathematically, this is due to a loss of one degree of freedom
- The effect is to increase the standard deviation slightly
- Difference decreases as sample gets larger
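The two three-point samples from the "Dangers" slide show why the mean alone is not enough, and how the N - 1 divisor changes the result. The sketch below uses Python's standard `statistics` module, where `stdev` divides by N - 1 and `pstdev` divides by N.

```python
import statistics

sample_1 = [99.9, 100, 100.1]
sample_2 = [0.1, 100, 199.9]

for name, data in (("Sample 1", sample_1), ("Sample 2", sample_2)):
    mean = statistics.mean(data)
    median = statistics.median(data)
    s = statistics.stdev(data)        # divides by N - 1 (sample)
    sigma = statistics.pstdev(data)   # divides by N (population)
    print(f"{name}: mean={mean:.1f} median={median:.1f} s={s:.2f} sigma={sigma:.2f}")
```

Both samples report a mean and median of 100, but their standard deviations differ by a factor of about a thousand, and with only three points the N - 1 form is noticeably larger than the N form.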

- Two data points eliminated - wrong configuration, improper technique
- Data adjusted for standard weight (2150 lbs.), runway slope (GPS), temperature, pressure, airspeed/altimeter corrections
- Technique, rotate at 65, liftoff at 70, maintain 75 until 50 feet AGL

- Statistical applications require an understanding of the characteristics of the data obtained
- Probability distributions give us such understanding

- To understand probability distributions, consider the problem of tossing 2 coins
- Let n represent the number of heads for a single toss of both coins
- Then the probabilities of getting n = 0, 1, or 2 can be calculated:
- for n = 0, P(0) = 0.25
- for n = 1, P(1) = 0.5
- for n = 2, P(2) = 0.25

- We can present the data as a bar graph

- In flight test, we are concerned with empirical distributions versus theoretical in the coin example
- If we collect data on landing errors:

- If we get more and more data, and make the intervals smaller, our histogram approaches a continuous curve:
Continuous Probability Distribution of Touchdown Miss Distance

- Can’t be interpreted same way as the previous discrete distribution

- Height of curve above a point is not the probability of “x” having that point value
- Any one point on the x-axis represents a non-zero point on the curve
- But the probability associated with that single point must be zero, since there are an infinite number of points on the x-axis
- We can meaningfully talk only about the probability of being between two points a and b on the x-axis

- The probability of getting a result between a and b is represented by the area under the probability distribution curve between a and b

[Figure: probability distribution f(x) versus x, with the area between a and b marked as P(a ≤ x ≤ b)]

- A cumulative probability distribution gives the probability that x is less than or equal to some value, a
- Relative probability of aircraft landing miss distances could be displayed in the following cumulative distribution

[Figure: cumulative distribution versus miss distance x, rising to 1.0, with the 0.5 and 0.95 levels marked and a point xT on the x-axis]

- Special Probability Distributions:
- Binomial
- Normal
- Student’s t
- Chi squared

- The binomial is a discrete distribution
- It tells us the probability of getting n successes in N trials given the probability (p) of a single success
- Limiting cases
- if n = N, then obviously P(N) = p^N
- if n = 0, then P(0) = (1 - p)^N
- or, letting q = 1 - p, P(0) = q^N

- For 0 < n < N, the possible number of combinations of success and failure gives

$P(n) = \dfrac{N!}{n!\,(N-n)!}\, p^{n} q^{N-n}$

- Two flight control systems are equally desirable
- What is probability that 6 out of 8 pilots would prefer system A over B?
- If A and B are truly equally good, probability of pilot picking A over B is 0.5 (p = q = 0.5)
- Probability of 6 pilots picking A over B is:

$P(6) = \dfrac{8!}{6!\,2!}\,(0.5)^{6}(0.5)^{2} = \dfrac{28}{256} = 0.109$

- There is only an 11% probability that this would happen. If it did, it would suggest that your initial assumption about the two flight control systems was in error

- If p = q = 0.5, then for N = 8, the binomial distribution is symmetric, and from the figure, P(6) (equal to P(2) by symmetry) is about 11%

[Figure: binomial distribution for N = 8, p = q = 0.5]
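The pilot-preference number can be reproduced with the standard binomial formula. This is a minimal sketch using `math.comb`; the function name `binomial_pmf` is an illustrative choice.

```python
from math import comb


def binomial_pmf(n, N, p):
    """Probability of exactly n successes in N trials with success probability p."""
    q = 1.0 - p
    return comb(N, n) * p**n * q ** (N - n)


p_six_of_eight = binomial_pmf(6, 8, 0.5)
print(f"P(6 of 8 prefer A) = {p_six_of_eight:.3f}")

# The full distribution for N = 8 is symmetric when p = q = 0.5
distribution = [binomial_pmf(n, 8, 0.5) for n in range(9)]
```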

- The normal distribution is a continuous probability distribution based on the binomial
- SINGLE MOST IMPORTANT DISTRIBUTION IN FLIGHT TEST ANALYSIS

- Any deviation from a mean value is assumed to be composed of multiples of elemental errors evenly distributed
- The mathematical derivation is left as an exercise

- Graphically, it can be seen that x = μ gives the maximum value and x = μ ± σ are the two points of inflection on the curve

[Figure: normal curve f(x) versus x, with μ - σ, μ, and μ + σ marked on the x-axis]

- Thus the probability that x lies between some value a and b is given by:

$P(a \le x \le b) = \int_a^b \dfrac{1}{\sigma\sqrt{2\pi}}\, e^{-(x-\mu)^2/(2\sigma^2)}\,dx$

- Major problem - cannot be solved explicitly
- numerical techniques are required
- tables could be used, but different tables would be required for each μ and σ.

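- By using a substitution of variables

$z = \dfrac{x - \mu}{\sigma}$

- We can use tables for a normal distribution where the mean is zero and the deviation is one
- Thus

$f(x) = \dfrac{1}{\sigma\sqrt{2\pi}}\, e^{-(x-\mu)^2/(2\sigma^2)}$

- Becomes

$f(z) = \dfrac{1}{\sqrt{2\pi}}\, e^{-z^2/2}$

- Mean of zero and a standard deviation of one

[Figure: standard normal curve f(z) versus z from -3 to 3; about 68% of the area lies within ±1, 95% within ±2, and 99.7% within ±3, i.e. roughly 34%, 13.5%, and 2.5% in successive bands on each side of the mean]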

- Cruise performance test flown 40 times
- Mean fuel used was 8,000 pounds
- Standard deviation was found to be 500 pounds

- Find probability that on the next sortie, we will use between 7000 and 8200 pounds
- Given μ = 8000, σ = 500
- find the probability that 7000 < x < 8200
- z1 = (7000 - 8000)/500 = -2.0 and z2 = (8200 - 8000)/500 = 0.4

- From table: 0.6554 - 0.0228 = 0.6326
- 63% probability that fuel used would be within the specified range
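The table lookup in the cruise-fuel example can be replaced by a direct evaluation of the normal cumulative distribution. This sketch uses `statistics.NormalDist` from the Python standard library and treats the sample mean and standard deviation as the population values, as the example does.

```python
from statistics import NormalDist

fuel_used = NormalDist(mu=8000, sigma=500)

# Standardized limits, matching the substitution z = (x - mu) / sigma
z_low = (7000 - fuel_used.mean) / fuel_used.stdev
z_high = (8200 - fuel_used.mean) / fuel_used.stdev

# Area under the curve between the two limits
p_in_range = fuel_used.cdf(8200) - fuel_used.cdf(7000)
print(f"z from {z_low:.1f} to {z_high:.1f}, P = {p_in_range:.4f}")
```

The small difference from the slide's 0.6326 is table rounding.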

- Problem: To use the normal distribution we had to know the population mean and standard deviation
- Flight Test - don’t normally know the population - just have sample
- The difference between sample and population mean is described by the statistic:

$t = \dfrac{\bar{x} - \mu}{s/\sqrt{n}}$

- Different t distributions must be tabulated for each value of ν (degrees of freedom, ν = n - 1)
- For large n, the t-distribution approaches the standard normal distribution - use normal distribution when n ≥ 30

[Figure: Student’s t distributions versus t for ν = 2 and ν = 10]

- B-33 landing distance example

- Just as the sample mean may differ from the population mean, we should expect a difference in the variances
- The difference is distributed according to:

$\chi^2 = \dfrac{(n-1)\,s^2}{\sigma^2}$

[Figure: chi-squared distributions f(χ²) versus χ² from 0 to 10, for ν = 1, ν = 4, and ν = 10]

- Find χ² for 95th percentile (11.1)
- one-tailed
- 5 degrees of freedom

- Find χ² bounds for the central 95% (0.831, 12.83)
- two-tailed
- 5 degrees of freedom

- Find the median value of χ² (27.3)
- 28 degrees of freedom

- Confidence Limits
- Intervals for mean and variance

- Hypothesis Testing
- Null and alternate hypotheses
- Tests on mean and variance

- In practice, we take a sample from a population such as Take-off distance
- Report it as if it were the true answer
- Subsequent tests will differ - sample mean/variance will differ from true population

- Can be considered sufficiently accurate if we
- Standardize test method and conditions
- Take sufficient samples

- Quantitative methods (confidence intervals) exist to determine how certain we are that we have the correct answer

Given a population with mean μ, and variance σ², then the distribution of successive sample means, from samples of n observations, approaches a normal distribution with mean μ, and variance σ²/n

- Regardless of original distribution of A, the distribution of the means will be approximately normal - gets better as n increases
- Mean of the means will be the same as the mean of A
- Variance of means = variance of A divided by n
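A quick simulation illustrates the central limit theorem statements above. This is a sketch under stated assumptions: the parent population is taken to be uniform on 0 to 1 (mean 0.5, variance 1/12), and the sample size, trial count, and seed are arbitrary illustrative choices.

```python
import random
import statistics


def sample_means(n, trials, seed=0):
    """Draw `trials` samples of size n from a uniform(0, 1) population and return their means."""
    rng = random.Random(seed)
    return [statistics.fmean(rng.random() for _ in range(n)) for _ in range(trials)]


n = 10
means = sample_means(n, trials=20000)

mean_of_means = statistics.fmean(means)
variance_of_means = statistics.pvariance(means)
expected_variance = (1 / 12) / n  # parent variance divided by n

print(f"mean of means     = {mean_of_means:.4f} (population mean 0.5)")
print(f"variance of means = {variance_of_means:.5f} (sigma^2 / n = {expected_variance:.5f})")
```

Even though the parent distribution is flat rather than bell-shaped, the sample means cluster around 0.5 with a variance close to σ²/n.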

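[Figure: distribution of x and, for sample size n, the narrower distribution of the sample means x̄; standard normal curve f(z) versus z with an area of α/2 in each tail]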

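- If we take samples of size n, the means of multiple samples will be normally distributed

- Thus

$P\left(-z_{1-\alpha/2} < z < z_{1-\alpha/2}\right) = 1 - \alpha$

- If z comes from one of our samples

$z = \dfrac{x - \mu}{\sigma}$

or, using the central limit theorem

$z = \dfrac{\bar{x} - \mu}{\sigma/\sqrt{n}}$

- Thus

$P\left(\bar{x} - z_{1-\alpha/2}\,\dfrac{\sigma}{\sqrt{n}} < \mu < \bar{x} + z_{1-\alpha/2}\,\dfrac{\sigma}{\sqrt{n}}\right) = 1 - \alpha$

- Thus 100(1 - α) percent of the time, the true population mean μ will be within a certain range about the sample mean
- The range of values is the interval $\bar{x} \pm z_{1-\alpha/2}\,\sigma/\sqrt{n}$
- And (1 - α) is the confidence level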

- Find 95% confidence interval for F-100 engine thrust given:
n = 50 engines tested

mean thrust = 22,700 lbs

s = 500 lbs

- At 95%, α = 0.05, z(1 - α/2) = 1.96
μ = 22,700 +/- 1.96 (500/√50)

22,561 < μ < 22,839

- At 99%, α = 0.01, z(1 - α/2) = 2.58
μ = 22,700 +/- 2.58 (500/√50)

22,518 < μ < 22,882

- Observations
- Interval widens for increased certainty
- Had to use “s” as an estimate for σ, legitimate for n > 30
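The F-100 thrust intervals can be recomputed without a z table. This sketch assumes the large-sample form x̄ ± z·s/√n used in the example and gets z from `statistics.NormalDist`; the function name is hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def z_confidence_interval(mean, s, n, confidence):
    """Large-sample confidence interval for the mean: mean +/- z * s / sqrt(n)."""
    alpha = 1.0 - confidence
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    half_width = z * s / sqrt(n)
    return mean - half_width, mean + half_width


low_95, high_95 = z_confidence_interval(22700, 500, 50, 0.95)
low_99, high_99 = z_confidence_interval(22700, 500, 50, 0.99)
print(f"95%: {low_95:,.0f} to {high_95:,.0f}")
print(f"99%: {low_99:,.0f} to {high_99:,.0f}")
```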

- Some flight tests involve numerous repeated test points; most do not
- But when n <30, we must substitute t for z
- For example, if our earlier problem were based on only a sample of 5, what would the 95% confidence interval be?

- Find 95% confidence interval for F-100 engine thrust given:
n = 5 engines tested

mean thrust = 22,700 lbs

s = 500 lbs

- At 95%, α/2 = 0.025, ν = 4, t(4, 0.975) = 2.78
μ = 22,700 +/- 2.78 (500/√5)

22,078 < μ < 23,321

vs. 22,561 < μ < 22,839 for 95% with n = 50

vs. 22,518 < μ < 22,882 for 99% with n = 50
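For the five-engine case the same calculation applies with t in place of z. The standard library has no t distribution, so this sketch takes the critical value 2.78 from the example as a hard-coded table value; that input is an assumption of the sketch, not something the code derives.

```python
from math import sqrt

T_CRIT_4DOF_975 = 2.78  # table value quoted in the example (nu = 4, 0.975)


def t_confidence_interval(mean, s, n, t_crit):
    """Small-sample confidence interval for the mean: mean +/- t * s / sqrt(n)."""
    half_width = t_crit * s / sqrt(n)
    return mean - half_width, mean + half_width


low_t, high_t = t_confidence_interval(22700, 500, 5, T_CRIT_4DOF_975)
print(f"95% with n = 5: {low_t:,.1f} to {high_t:,.1f}")
```

The interval is more than four times as wide as the 95% interval for n = 50, partly from the larger critical value and partly from the smaller √n.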

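- Similar to intervals for means, the confidence interval for variance is based on the χ² statistic:

$\dfrac{(n-1)\,s^2}{\chi^2_{\nu,\,1-\alpha/2}} < \sigma^2 < \dfrac{(n-1)\,s^2}{\chi^2_{\nu,\,\alpha/2}}$

- For example, find the 95% confidence interval where n = 6, s = 2

- At 95%, α/2 = 0.025, 1 - α/2 = 0.975, ν = 5, s = 2

$\dfrac{5 \times 4}{12.83} < \sigma^2 < \dfrac{5 \times 4}{0.831}$, so $1.56 < \sigma^2 < 24.1$

- Large band due to small sample size, if n = 18, interval would be smaller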
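The variance interval for n = 6, s = 2 can be worked numerically as follows. This sketch assumes the usual form (n - 1)s²/χ² for each bound and hard-codes the two-tailed table values for 5 degrees of freedom (0.831 and 12.83); both values are inputs taken from a χ² table, not computed here.

```python
from math import sqrt

CHI2_LOW_5DOF = 0.831   # table value, 2.5th percentile, nu = 5
CHI2_HIGH_5DOF = 12.83  # table value, 97.5th percentile, nu = 5


def variance_confidence_interval(s, n, chi2_low, chi2_high):
    """Bounds on the population variance: (n-1)s^2/chi2_high to (n-1)s^2/chi2_low."""
    numerator = (n - 1) * s**2
    return numerator / chi2_high, numerator / chi2_low


var_low, var_high = variance_confidence_interval(2, 6, CHI2_LOW_5DOF, CHI2_HIGH_5DOF)
print(f"variance: {var_low:.2f} to {var_high:.1f}")
print(f"std dev:  {sqrt(var_low):.2f} to {sqrt(var_high):.2f}")
```

Note that the larger χ² value gives the lower bound, and that the interval is far from symmetric about s² = 4.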

- Instead of just using data to estimate some parameter, we hypothesize an answer and then use data to judge its reasonableness
- Truth can be known with certainty only if we examine the entire population
- Example
- assume a coin is fair (hypothesis)
- toss the coin 100 times
- if results are
- 48 heads, conclude coin is fair
- 35 heads, conclude coin is not fair

- Acceptance of a statistical hypothesis
- result of insufficient evidence to reject it
- doesn’t necessarily mean that it is true

- Thus, it is important to carefully select initial hypothesis (the null hypothesis - H0)
- selected for purposes of rejecting it
- if we don’t gather enough data we must accept the null hypothesis
- Formulated so that in case of insufficient data, we return to the status quo or safe conclusion

- Examples of null hypothesis
- the defendant is innocent
- the new RADAR is no better than the old
- the MTBF of a new part is no better than the old

- Since we are trying to negate the null hypothesis (H0) with data, the alternate hypothesis (H1) must be defined -- H0 must be “opposite” of H1
- Examples:
- 1. H0: μ = 15 H1: μ ≠ 15
- 2. H0: p ≥ 0.9 H1: p < 0.9
- 3. Lock-on range of new radar is better than old

- A Type I error
- rejecting null hypothesis when it is true
- chance variation of fair coin gives 35/100 heads

- probability is denoted as α (the level of significance)

- A Type II error
- accepting null hypothesis when it is false
- 43/100 concluded as fair when P(A) = 0.4

- probability is denoted as β (the power of the test is 1 - β)

- We want small α
- as α decreases, β increases (fixed sample size)
- Large β implies we stay with the status quo, H0, more frequently than we should - a more “acceptable error”

- to decrease both, increase sample size

- Step One
- Form null and alternate hypothesis

- Step Two
- Choose level of significance (a)
- Define areas of acceptance and rejection (one or two tailed)

- Step Three
- Collect data and compare to expectations

- Step Four
- Accept or reject the null hypothesis

- Some tests - interested in extremes in either direction
- Two Tailed

- Example: Burn times on an ejection seat rocket motor
- Too short - don’t clear aircraft
- Too long - impose too many g’s on pilot

- Form hypothesis of the form
- H0: μ = μ0 H1: μ ≠ μ0
- Reject H0 whenever sample produces results too low or high

- Not the usual for flight test - usually deal with “One Tailed”

- Early Testing of F-19 bombing system for 30º dive angles gave
- Cross range errors were normally distributed
- Mean error of 20 ft and a standard deviation of 3 feet.

- After a flight control modification to solve a high AOA flying qualities problem, it was found
- Sample mean cross range error for nine bombs was 22 feet.
- Has the mean changed at the 0.05 level of significance?

- Step One
- Form null and alternate hypothesis
- H0: μ = 20 (status quo) H1: μ ≠ 20

- Step Two
- Choose level of significance: α = 0.05 (given)
- Define areas of acceptance and rejection (one or two tailed)
- α = 0.05 would be divided into two tails - hi/lo
- extreme values in either direction would indicate change in μ
- not changed significantly from unmodified system

- Step Three
- Collect data and compare to expectations

$z = \dfrac{\bar{x} - \mu}{\sigma/\sqrt{n}} = \dfrac{22 - 20}{3/\sqrt{9}} = 2.0$

- Step Four
- Accept or reject the null hypothesis
- Since z = 2 which is > 1.96
- Conclude with 95% confidence to reject null hypothesis
- Mean cross range bombing error has changed due to flight control modification

[Figure: standard normal curve versus z, with a central “Accept” region and a “Reject” region of area α/2 = 0.025 in each tail]
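The four steps of the bombing-error example reduce to one comparison. This is a minimal sketch of a two-tailed z test with a known population standard deviation; the function name and return layout are illustrative assumptions.

```python
from math import sqrt
from statistics import NormalDist


def two_tailed_z_test(sample_mean, mu_0, sigma, n, alpha):
    """Return (z, z_critical, reject_h0) for H0: mu = mu_0 against H1: mu != mu_0."""
    z = (sample_mean - mu_0) / (sigma / sqrt(n))
    z_critical = NormalDist().inv_cdf(1.0 - alpha / 2.0)  # alpha split across two tails
    return z, z_critical, abs(z) > z_critical


z, z_critical, reject_h0 = two_tailed_z_test(
    sample_mean=22, mu_0=20, sigma=3, n=9, alpha=0.05
)
print(f"z = {z:.2f}, critical = {z_critical:.2f}, reject H0: {reject_h0}")
```

The margin is thin: z = 2.0 against a critical value of about 1.96, so one more or one fewer bomb could change the conclusion.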

- Most flight tests - interested in extremes in only one direction
- One Tailed - small sample, unknown σ

- Example: Does aircraft satisfy contractual range requirements
- Only care if distance is shorter than specified

- Form hypothesis of the form
- H0: μ ≤ μ0 H1: μ > μ0
Or

- H0: μ ≥ μ0 H1: μ < μ0

- Reject H0 whenever sample produces results extreme in one direction

- Contract fuel climb requirements
- Use less than 1500 pounds in climb from Sea Level to 20,000 feet

- Test results
- Nine climbs average of 1600 lbs
- Sample standard deviation of 200lbs.

- Do we penalize the contractor?

- Step One
- Form null and alternate hypothesis
- H0: μ ≤ 1500 (until proven guilty) H1: μ > 1500

- Step Two
- Choose α = 0.05 for level of significance
- α = 0.01 reserved for safety of flight questions

- Define areas of acceptance and rejection (one or two tailed)
- one tailed - contract not met only if fuel used was on the high side

- Step Three
- Collect data and compare to expectations

$t = \dfrac{\bar{x} - \mu_0}{s/\sqrt{n}} = \dfrac{1600 - 1500}{200/\sqrt{9}} = 1.5$

- Step Four
- Accept or reject the null hypothesis
- Since t = 1.5 which is < 1.860 (critical t for ν = 8 at α = 0.05)
- Conclude at the 0.05 level of significance to accept null hypothesis
- Contractor has met climb fuel requirements

- Put another way
- Don’t have data at the 95% confidence level to show contractor failed to meet specs

[Figure: distribution with a single “Reject” tail of area α = 0.05 on the high side and an “Accept” region below it]
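The climb-fuel decision can be sketched the same way with a one-tailed t test. The critical value is hard-coded here as 1.860, the standard table entry for 8 degrees of freedom at α = 0.05; it is an assumed input rather than something the standard library computes.

```python
from math import sqrt

T_CRIT_8DOF_95 = 1.860  # one-tailed table value, nu = 8, alpha = 0.05 (assumed input)


def one_tailed_t_test(sample_mean, mu_0, s, n, t_critical):
    """Return (t, reject_h0) for H0: mu <= mu_0 against H1: mu > mu_0."""
    t = (sample_mean - mu_0) / (s / sqrt(n))
    return t, t > t_critical


t_stat, reject_h0_fuel = one_tailed_t_test(1600, 1500, 200, 9, T_CRIT_8DOF_95)
print(f"t = {t_stat:.2f}, reject H0: {reject_h0_fuel}")
```

With only nine climbs and a 200 lb scatter, a 100 lb overshoot is not enough evidence to reject the null hypothesis.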

- “Four” steps still valid here
- Substitute chi-squared for z or t
- Example on variance
- The contract states the standard deviation of miss distances for particular weapon system delivery mode must not exceed 10 meters at 90% confidence.
- In ten test runs we get s = 12 meters.
- Is the contractor in compliance?

- Step One
- Form null and alternate hypothesis
- H0: σ ≤ 10 H1: σ > 10

- Step Two
- α = 0.10 was specified
- smaller σ’s good >>> implies one sided test
- Extremely large σ’s will nullify H0

- Step Three
- Collect data and compare to expectations

$\chi^2 = \dfrac{(n-1)\,s^2}{\sigma_0^2} = \dfrac{9 \times 12^2}{10^2} = 12.96 \approx 13$

- Step Four
- Accept or reject the null hypothesis
- Since 13 < 14.7 (critical χ² for ν = 9 at α = 0.10), accept H0 that σ ≤ 10 meters
- Can’t conclude contractor has failed to meet spec
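The miss-distance variance check follows the same pattern with the χ² statistic. In this sketch the critical value 14.7 is the figure quoted in the example for 9 degrees of freedom at α = 0.10, hard-coded as a table input.

```python
CHI2_CRIT_9DOF_90 = 14.7  # one-tailed critical value quoted in the example (nu = 9)


def chi_squared_variance_test(s, n, sigma_0, chi2_critical):
    """Return (chi2, reject_h0) for H0: sigma <= sigma_0 against H1: sigma > sigma_0."""
    chi2 = (n - 1) * s**2 / sigma_0**2
    return chi2, chi2 > chi2_critical


chi2_stat, reject_h0_sigma = chi_squared_variance_test(12, 10, 10, CHI2_CRIT_9DOF_90)
print(f"chi-squared = {chi2_stat:.2f}, reject H0: {reject_h0_sigma}")
```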

Data Analysis - Hour 5

- Tests for non-normal distributions
- Sample size
- Error Analysis

- Non-parametric tests make no assumption about population distribution
- Everything so far --- assumed normal
- These tests are less useful when used on normal distributions – they require a larger sample size to give us the same info from the test

- Use “goodness of fit tests” to determine distribution type
- Normal – use methods already described
- Otherwise, use non-parametric

- Three nonparametric tests useful in flight test are
- Rank Sum Test
- also called U test, Wilcoxon test, and Mann-Whitney test

- Sign Test
- can be applied to ordinal data

- Signed Rank Test
- combination of sign and rank sum tests

- All test the null hypothesis that two different samples come from the same population - assumes both are equivalent
- Calculates statistics from the two samples
- Determines probability --- decide if original assumption correct

The method (based on binomial distribution) consists of:

- Rank order all data from each sample
- Assign rank values to each data point
- average rank for repeated data values

- Compute the sum of the ranks for each sample (R1, R2)
- Calculate the U statistic for each sample (n = sample size)

$U_1 = n_1 n_2 + \dfrac{n_1(n_1+1)}{2} - R_1, \qquad U_2 = n_1 n_2 + \dfrac{n_2(n_2+1)}{2} - R_2$

- Compare the smaller U to the critical value in reference
- If U < critical value, reject H0 (i.e. μ1 = μ2)

- The target detection range (nm) of two radars was
- System 1: 9, 10, 11, 14, 15, 16, 20
- System 2: 4, 5, 5, 6, 7, 8, 12, 13, 17

- Is there a difference between the two systems at 90% confidence?

- Rank order all scores and assign rank values

Score   System   Rank
4       2        1
5       2        2.5
5       2        2.5
6       2        4
7       2        5
8       2        6
9       1        7
10      1        8
11      1        9
12      2        10
13      2        11
14      1        12
15      1        13
16      1        14
17      2        15
20      1        16

R1 = 7+8+9+12+13+14+16 = 79

R2 = 1+2.5+2.5+4+5+6+10+11+15 = 57

Calculate U1, U2

U1 = (7)(9) + (7)(8)/2 - 79 = 12

U2 = (7)(9) + (9)(10)/2 - 57 = 51

- Compare smaller U (12 in this case) with critical values for
- α = 0.10 n1 = 7 n2 = 9 Ucr = 15

- Since U < Ucr
Reject null hypothesis that two radars have the same performance with 90% confidence
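The ranking and U calculation for the two radars can be sketched as below. The helper names are illustrative, the U formula is the common n1·n2 + n(n+1)/2 - R form, which reproduces the smaller U of 12 quoted above, and the critical value 15 still has to come from a table.

```python
def average_ranks(values):
    """Map each distinct value to its 1-based rank, averaging ranks across ties."""
    ordered = sorted(values)
    ranks = {}
    i = 0
    while i < len(ordered):
        j = i
        while j < len(ordered) and ordered[j] == ordered[i]:
            j += 1
        # Tied values occupy ranks i+1 .. j; assign their average
        ranks[ordered[i]] = (i + 1 + j) / 2
        i = j
    return ranks


def rank_sum_u(sample_1, sample_2):
    """Return (R1, R2, U1, U2) for two independent samples."""
    ranks = average_ranks(sample_1 + sample_2)
    r1 = sum(ranks[x] for x in sample_1)
    r2 = sum(ranks[x] for x in sample_2)
    n1, n2 = len(sample_1), len(sample_2)
    u1 = n1 * n2 + n1 * (n1 + 1) / 2 - r1
    u2 = n1 * n2 + n2 * (n2 + 1) / 2 - r2
    return r1, r2, u1, u2


system_1 = [9, 10, 11, 14, 15, 16, 20]
system_2 = [4, 5, 5, 6, 7, 8, 12, 13, 17]

r1, r2, u1, u2 = rank_sum_u(system_1, system_2)
print(f"R1={r1}, R2={r2}, U1={u1}, U2={u2}, smaller U={min(u1, u2)}")
```

A useful arithmetic check is that U1 + U2 always equals n1·n2, here 63.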

- Requires paired observations of two samples with a “better than” evaluation
- Can be used on ordinal data, such as pilots preferring system A or B
- Under H0, a pilot preferring system A over B is as likely as B over A

- The probability of system A being preferred over system B, x times in N tests is just

$f(x) = \dfrac{N!}{x!\,(N-x)!}\, p^{x} q^{N-x}$

- But if H0 is A=B, then p = q = 0.5, and

$f(x) = \dfrac{N!}{x!\,(N-x)!}\,(0.5)^{N}$

- But f(x) is just the probability for one discrete point, such as 3 of 8 pilots preferring A over B, and we need the whole tail
- Thus (i.e. sum)

$P(X \le x) = \sum_{k=0}^{x} \dfrac{N!}{k!\,(N-k)!}\,(0.5)^{N}$

- Suppose 10 pilots evaluate handling qualities of two different sets of control laws during powered lift approaches
- The results are
- 7 prefer system B
- 2 prefer system A
- 1 had no preference

- Should we switch to the new control laws?

- Null hypothesis is that both systems (old and new) are equally desirable
- Choose 0.05 level of significance since SOF not an issue
- Calculate probability of 0, 1 or 2 pilots choosing system A if there were really no difference (N = 9, ignoring the pilot with no preference)
- If probability is less than level of significance, reject H0
- Conclude B is better than A

$P(X \le 2) = \dfrac{1 + 9 + 36}{2^{9}} = \dfrac{46}{512} = 0.09$

- Can only be 91% sure that B is really better than A
- Not enough – need 95% to justify added expense of System B
- Thus, accept H0 – no significant difference between A and B
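The 91% figure in the sign test example is a binomial tail sum. This sketch assumes the pilot with no preference is dropped, leaving N = 9, which is the reading that reproduces the quoted result; the function name is hypothetical.

```python
from math import comb


def sign_test_tail(x, N):
    """P(X <= x) when each of N pilots independently prefers A with probability 0.5."""
    return sum(comb(N, k) for k in range(x + 1)) / 2**N


# 2 of the 9 pilots with a preference chose system A
p_tail = sign_test_tail(2, 9)
alpha = 0.05

print(f"P(0, 1 or 2 prefer A) = {p_tail:.4f}")
print("reject H0" if p_tail < alpha else "accept H0")
```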

- Combines elements of both the Sign Test and the Rank Sum Test
- That is, the Sign Test can be made more powerful if there is some indication of how much one system was preferred over another
- Method:
- Rank differences by absolute magnitude
- Sum the positive and negative ranks (W+, W-)
- Compare the smaller W with critical values in reference
- Reject H0 if W < Wcr

- If ten pilots who evaluated two competing systems gave them a Cooper Harper rating on a scale of 1 to 10:

Pilot   System A   System B   Difference
1       3          1          2
2       5          2          3
3       3          4          -1
4       4          3          1
5       3          3          0
6       4          2          2
7       4          1          3
8       2          1          1
9       3          1          2
10      1          2          -1

- Ranking differences by absolute magnitude, ignoring zero difference:

Difference   Rank
-1           2.5
1            2.5
1            2.5
-1           2.5
2            6
2            6
2            6
3            8.5
3            8.5

- Summing positive and negative ranks:
W+ = 2.5 + 2.5 + 6 + 6 + 6 + 8.5 + 8.5 = 40.0

W- = 2.5 + 2.5 = 5.0

- Using α = 0.05, WCR = 8 (one tailed criteria)
- Since 5 < 8 (WCR), can reject H0
- There is a difference between A and B with 95% confidence
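The signed rank sums for the ten Cooper Harper pairs can be computed as follows. This is a sketch of the method listed above (drop zero differences, rank by absolute size with averaged ties, sum by sign); the function name is illustrative and the critical value 8 still comes from a table.

```python
def signed_rank_sums(ratings_a, ratings_b):
    """Return (W_plus, W_minus) for paired ratings, ignoring zero differences."""
    diffs = [a - b for a, b in zip(ratings_a, ratings_b) if a != b]
    ordered = sorted(abs(d) for d in diffs)

    def average_rank(magnitude):
        first = ordered.index(magnitude) + 1   # lowest rank held by this magnitude
        count = ordered.count(magnitude)       # number of tied differences
        return first + (count - 1) / 2

    w_plus = sum(average_rank(abs(d)) for d in diffs if d > 0)
    w_minus = sum(average_rank(abs(d)) for d in diffs if d < 0)
    return w_plus, w_minus


system_a = [3, 5, 3, 4, 3, 4, 4, 2, 3, 1]
system_b = [1, 2, 4, 3, 3, 2, 1, 1, 1, 2]

w_plus, w_minus = signed_rank_sums(system_a, system_b)
print(f"W+ = {w_plus}, W- = {w_minus}, smaller W = {min(w_plus, w_minus)}")
```

With nine non-zero differences the two sums must total 9 × 10 / 2 = 45, which is a handy check on the hand ranking.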

- One of the most significant aspects of statistics for flight testing is to determine how much you need to test
- Too few data points will result in poor conclusions or recommendations
- Too many data points will waste limited resources

- Two approaches for determining sample size
- Sample size when accuracy is the driving factor
- An approach for determining significant differences between means

- Tradeoffs

- Required to determine a population statistic such as takeoff distance within some accuracy ~ 10%
- Concept of confidence interval can be used to determine required number of sample points

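- Remember the confidence interval of the mean:

$\bar{x} \pm z_{1-\alpha/2}\,\dfrac{\sigma}{\sqrt{n}}$

- But $z_{1-\alpha/2}\,\sigma/\sqrt{n}$ is the error E, thus

$n = \left(\dfrac{z_{1-\alpha/2}\,\sigma}{E}\right)^{2}$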

- System Program Office wants us to determine Takeoff distance within 10% during the test program
- Historically we find the standard deviation for similar aircraft to be about 20% of the mean
- We need to be 95% confident of our answer
- How many data points should we plan?

- z0.975 = 1.96 for 95% confidence
- σ = 0.2 (historical σ is 20% of the mean)
- Error = +/- 0.1 (10% error)
- Tests required (n) = (1.96 × 0.2 / 0.1)² = 15.4
16 Takeoffs would be required

- Check to see if assumption about standard deviation remains reasonable (test hypothesis on variance) during testing
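The takeoff-distance planning number follows from solving the confidence interval half-width for n. This sketch expresses σ and the allowed error as fractions of the mean, as the example does; the function name is hypothetical.

```python
from math import ceil


def sample_size_for_accuracy(z, sigma, error):
    """Solve z * sigma / sqrt(n) = error for n (not yet rounded)."""
    return (z * sigma / error) ** 2


n_exact = sample_size_for_accuracy(z=1.96, sigma=0.2, error=0.1)
n_takeoffs = ceil(n_exact)  # round up: a fraction of a test point is still a test point
print(f"n = {n_exact:.1f}, plan for {n_takeoffs} takeoffs")
```

Because n grows with the square of σ/error, halving the allowed error to 5% would quadruple the requirement to about 62 takeoffs.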

- For the general problem of whether or not a system meets a specification or if there is a significant difference between two systems, the approach is more complex
- The difference between paired samples (d) from two populations will have some distribution
- If the two populations are the same, the mean of the d’s will be zero
- If they are not the same, the mean will be non-zero

Determining Significant Differences Between Means

- If the difference between the population means is d1, then test results above and below a d of xc will give
- Test result giving mean difference above xc
- Populations differ in their means with level of significance α

- Test result below xc
- Not a difference when in fact there was with probability β

[Figure: two overlapping distributions f(d) versus d, one centered at zero and one at d1 = minimum significant difference, with the cutoff xc between them; α is the area above xc under the first curve and β the area below xc under the second]

- Move xc to right, reduce α but increase β etc.
- Only way to reduce both is to increase sample size
- The sample size needed to determine the difference between two populations is a function of α, β, d1, σ1, and σ2

- How many data points are required to determine if a system meets the specification for a weapon delivery accuracy of 5 mils?
- We need
- α; normally set it at 0.10, 0.05, or 0.01 (0.01 is usually reserved for critical safety-of-flight issues) - use 0.05 here
- β; set this larger than α, typically 0.1 or 0.2 - use 0.1 here
- d1; the least difference considered significant - use 1 mil here
- σ1 and σ2; these come from testing (initially from historical data)
- note that σ for a specification is zero
- assume 3 mils for σ1 here (i.e. results from previous test)

- 77 Test points required - probably not feasible - must look at trade-offs
- How significant is it if we change β from 0.10 to 0.20 or change d1 from 1 to 1.5?

- The general approach
- can lead to unacceptable answers
- has several choices

- Analyzing these options can lead to logical choices

[Figure: required sample size n versus d1 for α = 0.1, with curves for β = 0.1 and β = 0.2]

- Sample size cannot be determined with accuracy
- Signed rank test is about 90% as efficient as a test on means using the z statistic
- Calculate n as just described and divide by 0.90

- How many pilots do we need to evaluate new flight control system laws and be 90% certain that there is a significant improvement (defined by Cooper Harper Scale)?
α = 0.10 β = 0.20 (arbitrary) d1 = 1

σ1, σ2 - review of similar tests shows σ ≈ 1

- Yields
- Thus -- 10 Evaluation pilots would be needed
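The slides do not show the sample-size formula itself, so the sketch below assumes the common one-tailed form n = (z_α + z_β)² (σ1² + σ2²) / d1². That assumption reproduces both quoted answers (about 77 test points for the weapon delivery case, about 10 pilots after dividing by 0.90), but it should be checked against the course reference before use. The function name is hypothetical.

```python
from statistics import NormalDist


def sample_size_for_difference(alpha, beta, d1, sigma_1, sigma_2=0.0):
    """Assumed form: n = (z_alpha + z_beta)^2 * (sigma_1^2 + sigma_2^2) / d1^2."""
    z_alpha = NormalDist().inv_cdf(1.0 - alpha)
    z_beta = NormalDist().inv_cdf(1.0 - beta)
    return (z_alpha + z_beta) ** 2 * (sigma_1**2 + sigma_2**2) / d1**2


# Weapon delivery spec: alpha = 0.05, beta = 0.1, d1 = 1 mil, sigma_1 = 3 mils, spec sigma = 0
n_weapon = sample_size_for_difference(0.05, 0.10, 1.0, 3.0)

# Trade-offs raised on the slide: relax beta to 0.2, or relax d1 to 1.5 mils
n_relaxed_beta = sample_size_for_difference(0.05, 0.20, 1.0, 3.0)
n_relaxed_d1 = sample_size_for_difference(0.05, 0.10, 1.5, 3.0)

# Pilot evaluation: alpha = 0.10, beta = 0.20, d1 = 1, sigma about 1 for each system,
# then divide by 0.90 for the signed rank test's efficiency
n_pilots = sample_size_for_difference(0.10, 0.20, 1.0, 1.0, 1.0) / 0.90

print(f"weapon delivery: {n_weapon:.1f}")
print(f"beta = 0.2: {n_relaxed_beta:.1f}, d1 = 1.5: {n_relaxed_d1:.1f}")
print(f"pilots: {n_pilots:.1f}")
```

Under this assumed formula, relaxing d1 helps much more than relaxing β, because n scales with 1/d1².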

- Thus far we have discussed errors of directly measured parameters
- In flight test we normally combine observations into calculated values
- fuel used = fuel flow x time
- specific range = velocity / fuel flow

- The propagation or combination of errors can thus be significantly larger than any one individual piece would imply

- The number of significant figures in a result implies a level of precision
- Definition
- the left most nonzero digit is the most significant figure
- the least significant figure is
- right most nonzero digit (no decimal point)
- right most digit (with a decimal point)

- all digits between least and most significant are significant digits

- Rules
- addition/subtraction: keep one more decimal digit than in least accurate number
- other: use one more digit than in least accurate, then round result to least accurate

- Ex. Timing event with watch with tenth of a second division
- shouldn’t record more than two decimal places --10.24 seconds

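- Precision of computed value is dependent on the precision of each directly measured value

In a computed value (say Q) it can be shown that the error in Q (ΔQ), where Q = f(a, b, c, ...), is (partial derivative form):

$\Delta Q = \dfrac{\partial Q}{\partial a}\,\Delta a + \dfrac{\partial Q}{\partial b}\,\Delta b + \dfrac{\partial Q}{\partial c}\,\Delta c + \cdots$

- But in this course, we have seen that individual errors are stochastic (randomly variable), so

$\sigma_Q^2 = \left(\dfrac{\partial Q}{\partial a}\right)^2 \sigma_a^2 + \left(\dfrac{\partial Q}{\partial b}\right)^2 \sigma_b^2 + \left(\dfrac{\partial Q}{\partial c}\right)^2 \sigma_c^2 + \cdots$

- Example
- Find the standard deviation of CL (lift coefficient) given a 1% standard deviation each for n, W and Ve

- Where:

$C_L = \dfrac{2\,n\,W}{\rho_0\,V_e^{2}\,S}$

- A 1 % error in each term gives a 2.4% error in the final result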
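The 2.4% result can be checked with a root-sum-square combination of relative errors. This sketch assumes the lift coefficient depends on the measured terms as n·W/Ve² (so Ve carries an exponent of -2) and that the three errors are independent; the helper name is illustrative.

```python
from math import sqrt


def relative_error_of_product(terms):
    """Combine independent relative errors for Q = a^p * b^q * ...

    `terms` is a list of (exponent, relative standard deviation) pairs;
    each term contributes (exponent * relative error)^2 to the variance.
    """
    return sqrt(sum((exponent * rel_sigma) ** 2 for exponent, rel_sigma in terms))


# n and W enter to the first power, Ve enters as Ve^-2, each with a 1% standard deviation
cl_relative_error = relative_error_of_product([(1, 0.01), (1, 0.01), (-2, 0.01)])
print(f"relative standard deviation of CL = {cl_relative_error:.1%}")
```

The squared airspeed term dominates: it alone contributes four of the six units under the square root.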

Questions?