HSBC Six Sigma Black Belt Training
Analyse
April 2006 Rev 1.0
Module 1
Recap of the Measure Phase
Module 2
Overview of the Analyse Phase
Module 3
Graphical Data Analysis
Module 4
Simple – Identify, Rank and Validate Key X’s
 5 Why
 Cause and effect diagram
 Multivoting
Module 5
Validate the Vital Few
 One sample methods
 Two sample methods
 Chi-Square
Module 6
More Advanced – Identify, Rank and Validate Key X’s
 ANOVA
 Simple regression
Advanced – Identify, Rank and Validate Key X’s
 Introduction to design of experiments
Module 7
Module 8
Tollgate
1. Complete team charter
4. Map and analyse the process
7. Identify sources of variation
10. Generate solution ideas
13. Implement solution
2. Specify customer requirements & MGP
5. Remove measurement variation & collect data
8. Rank key causes of variation
11. Select best fit solution
14. Monitor process and results
3. Complete high level process map
6. Determine process capability
9. Validate root causes
12. Test solution and confirm results
15. Replicate and share best practice
Define
Measure
Analyse
Engineer
Control
Tollgate
Tollgate
Tollgate
Tollgate
Tollgate
Steps:
Module 1
Recap of the Measure Phase
Module 2
Overview of the Analyse Phase
Module 3
Graphical Data Analysis
Module 4
Simple – Identify, Rank and Validate Key X’s
 5 Why
 Cause and effect diagram
 Multivoting
Module 5
Validate the Vital Few
 One sample methods
 Two sample methods
 Chi-Square
Module 6
More Advanced – Identify, Rank and Validate Key X’s
 ANOVA
 Simple regression
Advanced – Identify, Rank and Validate Key X’s
 Introduction to design of experiments
Module 7
Module 8
Tollgate
Measure Phase
Before moving into analyse, let’s leverage our tools to better explore and explain our project data
Define
Measure
Analyse
Engineer
Control
Tollgate
Tollgate
Tollgate
Tollgate
Tollgate
Steps:
1. Complete team charter
4. Map and analyse the process
7. Identify sources of variation
10. Generate solution ideas
13. Implement solution
2. Specify customer requirements & MGP
5. Remove measurement variation & collect data
8. Rank key causes of variation
11. Select best fit solution
14. Monitor process and results
3. Complete high level process map
6. Determine process capability
9. Validate root causes
12. Test solution and confirm results
15. Replicate and share best practice
Pocket guide (pages 41–45)
Measure
Tollgate
4. Map and analyse the process
5. Remove measurement variation & collect data
6. Determine process capability
A. Collect or create inputs
B. Determine the correct decision
C. Identify the decision makers
D. Administer the assessment
E. Analyse the outcomes and take action
MSA is an essential first step to minimise measurement bias prior to sampling & data collection
Unfortunately, these components can bring their own level of variation to the process. MSA is analysis and control of measurement variation
Repeatability
Variation that occurs when repeated measurements are made of the same item under absolutely identical conditions
Reproducibility
Variation that occurs when different conditions are used to take measurements
Operator B
You’re a smooth operator… Let’s see if your friends are as smooth as you!
Ask me again, and I’ll tell you the same!
Operator C
Operator A
Repeatability
Reproducibility
Our measures need to be reliable
Step 1 – Create inputs
Decide on the outputs to be evaluated (inputs)
The inputs need to be
Representative
Equally represented in the set of outputs
Correct decision should not be too obvious
Sufficient sample size
Ensure 50% of inputs are defect-free
Determine number of inputs needed
Step 2 – Determine standard
Develop a standard
Have two credible decision makers inspect or review each input
Come to a consensus agreement with each other as to the correct disposition
Step 3 – Identify decision makers
Identify the person or persons who are going to participate in the study
These should be individuals who make the decisions within the process under study on a regular basis
Selected decision makers have to meet the following guidelines
Familiar with the process
Same location
Same time constraints
Step 4 – Administer
The structure of the assessment requires that each decision maker evaluate each item at least twice
Process
1st person evaluates all of the samples in trial 1
2nd person does the same
Once all the people have assessed, the samples are returned to the 1st person for an evaluation in trial 2
Then back to 2nd person
Be aware of bias
Step 5 – Analyse outcome and take action
Analyse outcome and take actions
Assess
Assessor effectiveness
Overall MSA
Biases
Actions include
Immediate corrections
Refined definitions
Training
Gage recalibration
Starting over
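The Step 5 metrics can be sketched in code. This is a minimal illustration with hypothetical decisions (the names `standard`, `trial1`, `trial2` and the "good"/"defect" labels are invented for the example), not the full MSA report JMP would produce:

```python
# Sketch of Step 5 for an attribute MSA (hypothetical data):
# effectiveness = decisions matching the consensus standard / total decisions,
# repeatability = items where an assessor agrees with themselves across trials.
def assessor_effectiveness(standard, trials):
    """Fraction of all decisions (across trials) that match the standard."""
    total = correct = 0
    for trial in trials:
        for decision, truth in zip(trial, standard):
            total += 1
            correct += (decision == truth)
    return correct / total

def assessor_repeatability(trial1, trial2):
    """Fraction of items on which the assessor made the same call twice."""
    same = sum(a == b for a, b in zip(trial1, trial2))
    return same / len(trial1)

standard = ["good", "defect", "good", "defect"]  # consensus disposition (Step 2)
trial1   = ["good", "defect", "good", "good"]    # assessor's first pass
trial2   = ["good", "defect", "good", "defect"]  # assessor's second pass

print(assessor_effectiveness(standard, [trial1, trial2]))  # 0.875
print(assessor_repeatability(trial1, trial2))              # 0.75
```

Low repeatability points at refined definitions or training; low effectiveness against the standard points at bias.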
Define
Measure
Analyse
Engineer
Control
Tollgate
Tollgate
Tollgate
Tollgate
Tollgate
Steps:
1. Complete team charter
4. Map and analyse the process
7. Identify sources of variation
10. Generate solution ideas
13. Implement solution
2. Specify customer requirements & MGP
5. Remove measurement variation & collect data
8. Rank key causes of variation
11. Select best fit solution
14. Monitor process and results
3. Complete high level process map
6. Determine process capability
9. Validate root causes
12. Test solution and confirm results
15. Replicate and share best practice
Pocket guide (pages 22–52)
Measure
Tollgate
4. Map and analyse the process
5. Remove measurement variation & collect data
6. Determine process capability
A. Data demographics
B. Sampling
Remember, you must complete MSA prior to collecting or analysing data!
Data collection begins with the end in mind. “What do you want to know?”
Measure
Tollgate
4. Map and analyse the process
5. Remove measurement variation & collect data
6. Determine process capability
A. Data demographics
A1. Select what to measure
A2. Develop operational definitions
A3. Identify data sources
A4. Prepare data collection form
A5. Implement and refine data collection
B. Sampling
B1. Sample types and terminology
B2. Confidence
B3. Sampling techniques
B4. Sample size formulas/calculators
Data demographics + sampling = Data collection
Application availability (uptime)
Systems
Funding source
Network availability (uptime)
Market instructions
Timely trades (Y)
Advisor
Origination
Destination
Equity or fixed income
Security
Lot size
Market
1st Step: What are you going to measure?
2nd Step: Create clear and understandable data definitions
There are two main sources of data available to the team
Data that is already being collected in your organisation and has been around for some time (usually called “historical” data)
New data that your team collects
Historical data can be handy when you have it: it requires fewer resources to gather, it’s often computerised, and you can start using it right away
But be warned! Existing data may not be suitable if
It was originally collected for reasons other than process improvement
It uses different definitions
Data structure makes it hard to stratify (or database lacks sort capability)
3rd Step: Where are you going to get the data?
What? – Measures, operational definition of the data, formula
Why? – Purpose
Who? – Single person responsible
How? – Collection method
When? – Dates/times, frequency
Where? – Source of the data collection
Actions taken to validate measurement system
Sampling information
Type of data: Discrete / Continuous (please circle one)
Sample size
Collection time period
4th Step: Now that you know what you want… how do we plan to get it?
Final Step: Are you sure that your plan will work? Have you tested it?
Define
Measure
Analyse
Engineer
Control
Tollgate
Tollgate
Tollgate
Tollgate
Tollgate
Steps:
1. Complete team charter
4. Map and analyse the process
7. Identify sources of variation
10. Generate solution ideas
13. Implement solution
2. Specify customer requirements & MGP
5. Remove measurement variation & collect data
8. Rank key causes of variation
11. Select best fit solution
14. Monitor process and results
3. Complete high level process map
6. Determine process capability
9. Validate root causes
12. Test solution and confirm results
15. Replicate and share best practice
Pocket guide (pages 47–51)
Judgmental vs Statistical sampling
Judgmental sample
Statistical sample
1st Consideration – Do you need opinions or facts?
Sampling is the process of collecting only a portion of available data either from a static data group (population) or on an ongoing basis (process), and drawing conclusions about the total population when the process is stable (statistical inference)
The sample is a “window” into the population
Population approach
Process approach
Population approach
Process approach
…the type of “window” you use depends on the population
[Chart: required sample size (0–10,000) rises steeply as the precision interval narrows from 0.2 toward 0.05]
Sample size and precision interval
2nd Consideration – How precise do you need to be?
Random sampling – every item in the population has an equal chance of selection
[Diagram: population and randomly drawn sample]
Stratified random sampling – the population is divided into strata (A, B, C, D) and random samples are drawn from each stratum
[Diagram: stratified population and sample]
Systematic sampling – every nth item is selected, preserving time order
[Diagram: systematic sample over time]
Subgroup sampling – small subgroups are sampled at intervals (e.g. 09:30, 09:45, 10:00, 10:15), preserving time order
[Diagram: subgroup samples over time]
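The four techniques can be sketched in a few lines of Python. This is an illustrative sketch (the `trades` list and the function names are invented for the example):

```python
import random

def random_sample(population, n, seed=0):
    """Simple random sample: every item has an equal chance of selection."""
    return random.Random(seed).sample(population, n)

def systematic_sample(population, step, start=0):
    """Systematic sample: every `step`-th item, preserving time order."""
    return population[start::step]

def stratified_sample(population, strata_key, n_per_stratum, seed=0):
    """Stratified random sample: draw separately from each stratum."""
    rng = random.Random(seed)
    strata = {}
    for item in population:
        strata.setdefault(strata_key(item), []).append(item)
    return {k: rng.sample(v, min(n_per_stratum, len(v)))
            for k, v in strata.items()}

trades = list(range(20))                 # stand-in for 20 time-ordered trades
print(systematic_sample(trades, 5))      # [0, 5, 10, 15]
```

Subgroup sampling is systematic sampling applied to small consecutive subgroups rather than single items.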
Data type determines the formula:
Variable/continuous: n = (1.96 s / d)²
Attribute/discrete: n = (1.96 / d)² × p(1 − p)
Example (continuous): s = 24.03, d = 2, n = ?
Example (discrete): d = 0.02, p = 5%, n = ?
where n = sample size, s = standard deviation, d = precision, p = proportion defective, 1.96 = z-value for 95% confidence
Lastly, how much data do you need?
Sample size n=54
Sample Size Calculators (JMP)
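As a cross-check on the formulas, a small calculator for both data types, using the slide’s example inputs (s = 24.03, d = 2 for continuous; p = 5%, d = 0.02 for discrete). Rounding up to a whole unit is an assumption:

```python
import math

Z_95 = 1.96  # z-value for 95% confidence

def n_continuous(s, d, z=Z_95):
    """Variable/continuous data: n = (z*s/d)^2, rounded up."""
    return math.ceil((z * s / d) ** 2)

def n_discrete(p, d, z=Z_95):
    """Attribute/discrete data: n = (z/d)^2 * p*(1-p), rounded up."""
    return math.ceil((z / d) ** 2 * p * (1 - p))

print(n_continuous(s=24.03, d=2))   # 555
print(n_discrete(p=0.05, d=0.02))   # 457
```

A sample-size calculator such as JMP’s applies the same logic with the t distribution, so its answers can differ slightly.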
Define
Measure
Analyse
Engineer
Control
Tollgate
Tollgate
Tollgate
Tollgate
Tollgate
Steps:
1. Complete team charter
4. Map and analyse the process
7. Identify sources of variation
10. Generate solution ideas
13. Implement solution
2. Specify customer requirements & MGP
5. Remove measurement variation & collect data
8. Rank key causes of variation
11. Select best fit solution
14. Monitor process and results
3. Complete high level process map
6. Determine process capability
9. Validate root causes
12. Test solution and confirm results
15. Replicate and share best practice
Pocket guide (pages 86–92)
Measure
Tollgate
4. Map and analyse the process
5. Remove measurement variation & collect data
6. Determine process capability
A. VOC
B. Determine Y
C. Z, Sigma, Yield
Process capability is the metric that our customers feel
Identify process
Define CTQ
Define unit, defect, & defect opportunity
Count units, opportunities, & defects
Discrete or continuous data?
Discrete:
Calculate defect rate: count defects per million opportunities
Look up Sigma value in table
Continuous:
Identify distribution, set defect limits, calculate yield
Convert yield into short-term sigma value
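Both branches end in a short-term sigma value. A minimal sketch of the discrete path, assuming the conventional 1.5-sigma shift between long-term and short-term performance:

```python
from statistics import NormalDist

def dpmo(defects, units, opportunities_per_unit):
    """Defects per million opportunities."""
    return defects / (units * opportunities_per_unit) * 1_000_000

def short_term_sigma(dpmo_value, shift=1.5):
    """Convert DPMO to a short-term sigma value.
    Long-term z from the normal distribution, plus the conventional
    1.5-sigma shift."""
    long_term_z = NormalDist().inv_cdf(1 - dpmo_value / 1_000_000)
    return long_term_z + shift

print(round(short_term_sigma(3.4), 1))    # 6.0  (classic Six Sigma level)
print(round(short_term_sigma(66807), 1))  # 3.0
```

The table lookup on the slide encodes exactly this conversion.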
Module 1
Recap of the Measure Phase
Module 2
Overview of the Analyse Phase
Module 3
Graphical Data Analysis
Module 4
Simple – Identify, Rank and Validate Key X’s
 5 Why
 Cause and effect diagram
 Multivoting
Module 5
Validate the Vital Few
 One sample methods
 Two sample methods
 Chi-Square
Module 6
More Advanced – Identify, Rank and Validate Key X’s
 ANOVA
 Simple regression
Advanced – Identify, Rank and Validate Key X’s
 Introduction to design of experiments
Module 7
Module 8
Tollgate
Define
Measure
Analyse
Engineer
Control
Tollgate
Tollgate
Tollgate
Tollgate
Tollgate
Steps:
1. Complete team charter
4. Map and analyse the process
7. Identify sources of variation
10. Generate solution ideas
13. Implement solution
2. Specify customer requirements & MGP
5. Remove measurement variation & collect data
8. Rank key causes of variation
11. Select best fit solution
14. Monitor process and results
3. Complete high level process map
6. Determine process capability
9. Validate root causes
12. Test solution and confirm results
15. Replicate and share best practice
Analyse phase
Modules 4–7 cover steps 7–9:
7. Identify sources of variation
8. Rank key causes of variation
9. Validate root causes
Analyse requires us to identify the “likely suspects…”
6. Determine process capability
Process stable? Yes → common cause strategy; No → special cause strategy
Tool sets by process sigma:
Simple (“<2 Sigma”)
More advanced (“2–3.5 Sigma”) – all tools above and…
Advanced (“>3.5 Sigma”) – all tools above and…
Module 1
Recap of the Measure Phase
Module 2
Overview of the Analyse Phase
Module 3
Graphical Data Analysis
Module 4
Simple – Identify, Rank and Validate Key X’s
 5 Why
 Cause and effect diagram
 Multivoting
Module 5
Validate the Vital Few
 One sample methods
 Two sample methods
 Chi-Square
Module 6
More Advanced – Identify, Rank and Validate Key X’s
 ANOVA
 Simple regression
Advanced – Identify, Rank and Validate Key X’s
 Introduction to design of experiments
Module 7
Module 8
Tollgate
Define
Measure
Analyse
Engineer
Control
Tollgate
Tollgate
Tollgate
Tollgate
Tollgate
Steps:
1. Complete team charter
4. Map and analyse the process
7. Identify sources of variation
10. Generate solution ideas
13. Implement solution
2. Specify customer requirements & MGP
5. Remove measurement variation & collect data
8. Rank key causes of variation
11. Select best fit solution
14. Monitor process and results
3. Complete high level process map
6. Determine process capability
9. Validate root causes
12. Test solution and confirm results
15. Replicate and share best practice
By end of this module you should be able to
Measure
Tollgate
7. Identify sources of variation
8. Rank key causes of variation
9. Validate root causes
A. Graphical data analysis
B. Brainstorming
C. Five whys
D. Cause and effect diagram
We have to first think “out of the box” before we can focus “inside the box…”
We have so much data! We need good filters to sort through it all!
Graphical data analysis
Brainstorming
Five whys
Cause and effect
Process analysis
Key “nuggets”
(root causes)
Our filters (tools) separate the vital few “nuggets” from the rest of the trivial many
Stratification is a data analysis technique by which data are sorted into various categories
Through the identification of specific factors, one can surface suspicious patterns and uncover differences in processes
Important to “stratify the data” to focus and prioritise improvement efforts later in the Engineer Phase
Data stratification helps to identify the impact of each x on Y
Compares variation in Y with each individual x
Results identify critical x(s)
Concerned with the direction and magnitude of the relationship, not why one exists
Stratification is exploration
Stratification will point to the factors that have the greatest impact on a problem
May need to expand (or narrow) the problem and goal of your project
Establish priorities for further analysis
Provide clues regarding possible causes by comparing “good” and “bad”
Common factors used for stratification:
Type (What is occurring?)
Timing (When does it occur?)
Frequency (How often does it occur?)
Where (Where in the process or location?)
Who (Which business, department, employee, customer group?)
Goal of data stratification is to prioritise and focus efforts
Total trade cycle time
[Run charts: total trade cycle time (70–140) over ~30 months starting January 2003, overall and stratified into Fixed income and Equity]
Y \ X | Discrete | Continuous
Discrete/counts | Bar chart, Histogram, Pareto chart, Pie chart | Bar chart, Histogram, Pareto chart
Continuous | Box plot, Multi-variability chart | Scatter plot, Run chart*, Multi-variability chart
* When plotting Y against time, always use a Run Chart
[Pareto chart: N = 160 loan application errors; largest bar Typos (88 errors, 55%); remaining categories (Empty fields, Missing pages, Wrong field entries, Other) account for the rest (bar counts 32, 16, 8, 16); cumulative line: 55%, 75%, 85%, 90%, 100%]
Check sheet fields: Date prepared, Type of error, Collected by, Data source, Formula
Pie chart
Trade error rate by brokerage offices
Pareto chart
Pareto chart of loan application errors, 5/12 to 5/13/96, Raleigh office
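The cumulative line on a Pareto chart is just a running percentage. A sketch using the bar counts recoverable from the slide (the bar order is inferred from the cumulative labels):

```python
# Cumulative-percentage line for a Pareto chart, N = 160 errors.
counts = [88, 32, 16, 8, 16]   # bars in descending order; last is "Other"
total = sum(counts)            # 160

cumulative = []
running = 0
for c in counts:
    running += c
    cumulative.append(100 * running / total)

print(cumulative)  # [55.0, 75.0, 85.0, 90.0, 100.0]
```

The first bar alone explains over half the defects, which is the focusing power of the Pareto view.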
Discrete
Continuous
X
Y
Discrete/counts
Bar chart, Histogram, Pareto chart, Pie chart
Bar chart, Histogram, Pareto chart
Continuous
Box plot, Multivariability chart
Scatter plot, Run chart*, Multivariability chart
* When plotting Y against time  always use a Run Chart
Distributions
2004 YTD total
Discrete
Continuous
X
Y
Discrete/counts
Bar chart, Histogram, Pareto chart, Pie chart
Bar chart, Histogram, Pareto chart
Continuous
Box plot, Multivariability chart
Scatter plot, Run chart*, Multivariability chart
* When plotting Y against time  always use a Run Chart
One way analysis of 204 Avg P by R
Third quartile
Median
First quartile
Outliers
Discrete
Continuous
X
Y
Discrete/counts
Bar chart, Histogram, Pareto chart, Pie chart
Bar chart, Histogram, Pareto chart
Continuous
Box plot, Multivariability chart
Scatter plot, Run chart*, Multivariability chart
* When plotting Y against time  always use a Run Chart
Strong positive correlation
Strong negative correlation
No correlation
Possible positive correlation
Possible negative correlation
Other pattern
Checks processed per hour
A run chart plots results in time order, with the X-axis describing the time component and the Y-axis describing the measured variable
[Run chart: checks processed, hourly ticks 9–4 across Monday, Tuesday, Wednesday]
X-axis units could be dates, days, hours or other time intervals. They could be just numeric counts, but more value might be added by proper annotation
A run is a group of data points that all fall on one side of the median (Exclude points that fall directly on the median)
You can conclude that you have a special cause signal if
Number of data points | You see fewer runs than this | You see more runs than this
10 | 3 | 8
11 | 3 | 9
12 | 3 | 10
13 | 4 | 10
14 | 4 | 11
15 | 4 | 12
16 | 5 | 12
17 | 5 | 13
18 | 6 | 13
19 | 6 | 14
20 | 6 | 15
21 | 7 | 15
22 | 7 | 16
23 | 8 | 16
24 | 8 | 17
25 | 9 | 17
26 | 9 | 18
27 | 9 | 19
28 | 10 | 19
29 | 10 | 20
30 | 11 | 20
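Counting runs about the median can be sketched as follows (the sample data are invented; compare the resulting run count and number of points against the table above):

```python
from statistics import median

def count_runs(data):
    """Count runs above/below the median, excluding points on the median.
    Returns (number of runs, number of points used)."""
    m = median(data)
    signs = [x > m for x in data if x != m]   # True = above, False = below
    runs = 1 if signs else 0
    for prev, cur in zip(signs, signs[1:]):
        if cur != prev:                       # side changed -> new run
            runs += 1
    return runs, len(signs)

data = [8, 9, 7, 12, 13, 11, 6, 5, 14, 15]   # invented measurements
runs, n = count_runs(data)
print(runs, n)  # 4 10
```

For 10 points the table flags a special cause only below 3 or above 8 runs, so 4 runs here is consistent with common-cause variation.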
[Run chart patterns, each plotting Measurement against the Median: Shift – too few runs; Trend; Alternate (up & down) – too many runs; Same value]
Instructions:
Goal: Practice using graphical analysis and exploration tools to identify the most and least profitable customer groups in HSBC Futures business
Time: 30 minutes
Maintenance level per management
Reactive/proactive?
Clearing or execution
The variability chart permits analysis of more than two X factors simultaneously!
Variability Chart for Net After Servicing and Assigned
Low Maintenance, Proactive and Clearing is the combination with the highest standard deviation. Some clients are very profitable, others much less so.
Discrete
Continuous
X
Y
Discrete/counts
Bar chart, Histogram, Pareto chart, Pie chart
Bar chart, Histogram, Pareto chart
Continuous
Box plot, Multivariability chart
Scatter plot, Run chart*, Multivariability chart
Analyse Phase
Module 1
Recap of the Measure Phase
Module 2
Overview of the Analyse Phase
Module 3
Graphical Data Analysis
Module 4
Simple – Identify, Rank and Validate Key X’s
 5 Why
 Cause and effect diagram
 Multivoting
Module 5
Validate the Vital Few
 One sample methods
 Two sample methods
 Chi-Square
Module 6
More Advanced – Identify, Rank and Validate Key X’s
 ANOVA
 Simple regression
Advanced – Identify, Rank and Validate Key X’s
 Introduction to design of experiments
Module 7
Module 8
Tollgate
Define
Measure
Analyse
Engineer
Control
Tollgate
Tollgate
Tollgate
Tollgate
Tollgate
Steps:
1. Complete Team Charter
4. Map and Analyse the Process
7. Identify Sources of Variation
10. Generate Solution Ideas
13. Implement Solution
2. Specify Customer Requirements & MGP
5. Remove Measurement Variation & Collect Data
8. Rank Key Causes of Variation
11. Select Best Fit Solution
14. Monitor Process and Results
3. Complete High Level Process Map
6. Determine Process Capability
9. Validate Root Causes
12. Test Solution and Confirm Results
15. Replicate and Share Best Practice
By end of this module you should
Graphical data analysis
Brainstorming
Five whys
Cause and effect
Process analysis
Key “nuggets”
(root causes)
Our filters (tools) separate the vital few “nuggets” from the rest of the trivial many
Brainstorming is used to generate a lot of ideas quickly to identify potential causes
Brainstorming encourages creativity, involves everyone, generates excitement and energy, and separates people from the ideas they suggest
Methods
Rounds – Go around in turn, one item per turn, until everyone passes
Popcorn – Anyone calls out ideas, no order, until all ideas are out
Brainstorming guidelines
Give advance notice so people can leverage data or prior analysis from the Define and Measure Phases
Start with silent “think” time
Freewheel  don’t hold back
NO CRITICISM
Hitchhike  build upon ideas
The more ideas, the better
Post ideas
Use affinity diagrams (where appropriate) to model themes or patterns
[Five Whys tree for the problem: “40% of customers didn't pay us for their last bill”]
Why? → Incorrect bill 80%; Bill never received 10%; Customer given rebate 5%; Customer changed order 5%; Customers went bankrupt 5%; Shipment short or damaged 5%
Why an incorrect bill? → Pricing error 85%; Invoice keypunch error 20%
Why a pricing error? → Contract not up-to-date 70%
Why not up-to-date? → Sales rep forgot 40%; Negotiations started late 20%; Wait for internal approval; Different interpretations of contract requirements 15%
Why did the rep forget? → Sales reps have poor time management skills 50%; Sales reps distracted by admin tasks 50%
Ask “Why” Five Times
Cause
Effect
Categories
Problem statement
Causes
Measurements
Materials
Men & women
Problem statement
Environment
Methods
Machines
Measurements
Materials
Men & women
Cause
Why
Problem statement
Environment
Methods
Machines
Goal: Practice constructing a fishbone (Ishikawa) diagram
Instructions:
Time (teams): 30 minutes
Report-out: 15 minutes
[Worksheet: suspected causes (Xs) mapped to Ys, scored across columns 1–6 with Total and %, and classified as Cr / VA / NVA / VE]
Now that we have ideas, suspicions, and hypotheses about potential root causes ... How do we get organised?
Process variation (predictor)
Process capability (response)
Output
S
I
P
O
C
x
x
x
x
Y
Not all x’s have the same contribution to Y… need to identify the vital few!
Those items receiving the most votes get further attention/consideration
Impact
High
Low
Further investigation required
Further investigation possible
Many
Frequency
Further investigation possible
Few
Those items in the upper-left quadrant get the most further attention/consideration
Module 1
Recap of the Measure Phase
Module 2
Overview of the Analyse Phase
Module 3
Graphical Data Analysis
Module 4
Simple – Identify, Rank and Validate Key X’s
 5 Why
 Cause and effect diagram
 Multivoting
Module 5
Validate the Vital Few
 One sample methods
 Two sample methods
 Chi-Square
Module 6
More Advanced – Identify, Rank and Validate Key X’s
 ANOVA
 Simple regression
Advanced – Identify, Rank and Validate Key X’s
 Introduction to design of experiments
Module 7
Module 8
Tollgate
Define
Measure
Analyse
Engineer
Control
Tollgate
Tollgate
Tollgate
Tollgate
Tollgate
Steps:
1. Complete Team Charter
4. Map and Analyse the Process
7. Identify Sources of Variation
10. Generate Solution Ideas
13. Implement Solution
2. Specify Customer Requirements & MGP
5. Remove Measurement Variation & Collect Data
8. Rank Key Causes of Variation
11. Select Best Fit Solution
14. Monitor Process and Results
3. Complete High Level Process Map
6. Determine Process Capability
9. Validate Root Causes
12. Test Solution and Confirm Results
15. Replicate and Share Best Practice
By end of this module you should be able to
Black belt world – Challenges
Tools of choice
Statistical method | Data Y | Data X | Typical null hypothesis | JMP platform
One-Sample | C | None | H0: mean = specified value | Distribution
Matched Pairs | C | 2 continuous columns | H0: meanA = meanB | Matched Pairs
Two-Sample | C | N with 2 values | H0: meanA = meanB | Fit Y by X
ANOVA | C | N with k ≥ 2 values | H0: mean1 = … = meank | Fit Y by X or Fit Model
Simple Regression | C | C | H0: slope = 0 | Fit Y by X or Fit Model
Chi-Square | N | N | H0: X and Y are independent | Fit Y by X
DOE | C | C or N | ANOVA-type H0: factors have no effect on Y | DOE, Fit Model
(C = continuous data, N = nominal/discrete data)
Baseline
Benchmark
Regulatory requirement
Standard
Current process
Comparing before & after…
Before promotion
campaign
After promotion
campaign
Comparing separate groups
Cross-selling ratio – Northern Region
Cross-selling ratio – Southern Region
Case 1
Case 2
Case 3
Region 1
Region 2
Region 3
Region n
Case 4
Lower confidence limit < point estimate < upper confidence limit
90% lower
confidence limit
90% upper
confidence limit
Point
estimate
95% lower
confidence limit
95% upper
confidence limit
Point
Estimate
99% lower
confidence limit
Point
estimate
99% upper
confidence limit
Higher confidence levels result in wider confidence intervals. The point estimate remains the same
Sample size n=10
Sample size n=40
Sample size n=100
Sample size n=400
Sample mean and standard deviation being the same
What is the regional cross-selling ratio?
Answer 1: Based on 44 observations from different branches, the mean cross-selling ratio is 2.87
Single point estimate: 2.87
Answer 2: Based on 44 observations from different branches, the mean cross-selling ratio is from 2.73 to 3.0
Confidence interval: 2.73 to 3.0 (point estimate 2.87)
Which answer is better – the point estimate or the confidence interval?
Lower CI = lower confidence limit < population mean < upper CI = upper confidence limit
7.14 days
Longterm mean cycle time from application to decision
5.2 days
Mean cycle time of 50 observations AFTER change
Based on longterm experience, the mean cycle time from date of application to decision was 7.14 days. This was too long
The loan application process has been streamlined. The new process is working in a pilot program of the NYPAFL region
Based on a sample of 37 cycle times from date of application to decision, the new mean cycle time is estimated at 5.2 days
Is this a real improvement?
7.14 days
Longterm mean cycle time from application to decision 7.14 days
90% LCI = 3.5 days
90% UCI = 7.0 days
95% LCI = 3.1 days
Mean = 5.2 days
95% UCI = 7.3 days
Is this a real improvement?
95% confidence interval mean cycle time of 50 observations AFTER change
The 90% confidence interval is from 3.5 days to 7.0 days. The longterm mean cycle time of 7.14 days is outside that range
What about the 95% confidence interval for the mean?
7.14 days
Longterm mean cycle time from application to decision 7.14 days
90% LCI = 3.5 days
90% UCI = 7.0 days
95% LCI = 3.1 days
Mean = 5.2 days
95% UCI = 7.3 days
Recommendation – Look further for causes of the long cycle times!
“What is going on and how precisely do we know it?” Confidence intervals
Point estimates
95% confidence interval
Data is in Cross Sell n_44.jmp
Confidence intervals for mean and standard deviation
The Normal Quantile Plot option adds a graph to the report that is useful for visualising the extent to which the variable is normally distributed
If a variable is normal, the normal quantile plot approximates a diagonal straight line
This kind of plot is also called a quantile-quantile plot, or Q-Q plot
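A confidence interval for the mean can be sketched with the standard large-sample formula. This uses the z-value rather than JMP's t-based interval, so it is slightly narrow for small samples; the data here are invented:

```python
from statistics import NormalDist, mean, stdev
from math import sqrt

def z_confidence_interval(data, confidence=0.95):
    """Large-sample (z-based) confidence interval for the mean.
    JMP uses the t distribution, which widens the interval slightly
    for small samples."""
    z = NormalDist().inv_cdf(0.5 + confidence / 2)   # 1.96 for 95%
    m = mean(data)
    se = stdev(data) / sqrt(len(data))               # standard error
    return m - z * se, m + z * se

low, high = z_confidence_interval([1, 2, 3, 4, 5])   # invented data
print(round(low, 3), round(high, 3))  # 1.614 4.386
```

Raising `confidence` widens the interval while the point estimate (the sample mean) stays the same, which is exactly the ladder shown above.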
Goal: Practice computing and evaluating confidence intervals
Instructions:
The situation is described in the notes below
Time: 15 minutes
Baseline
Benchmark
Regulatory requirement
Standard
Current process
Case 1
The Cross Selling ratio has been 2.7 per event
Recently a system to customize product offerings has been introduced and a promotional campaign built around it
Current process
Proposed change
Did we make
a real change?
The time to decision for small loan applications takes 7.14 days from receipt of application
A new online process was intended to reduce that time. Data collected under the new system are available
A bank buying a loan portfolio, set the price assuming an average credit score of 700
Buyer and seller request a sample of loans to determine any price altering differences from the target 700
Y: Mean loan cycle time
Y: Cross-selling
Y: Pricing (target price)
Desired goal | Relationship to CTQ | Typical examples | JMP: p-values
Smaller or lower is better (one-sided) | Smallest value; fastest processing; lowest variability | Loan cycle time | Prob < t
Bigger or higher is better (one-sided) | Highest amount; biggest payback | Cross-selling; profitability; customer retention | Prob > t
Target is best (two-sided) | Midpoint of range or maximum return at specific zone | Deal pricing | Prob > |t|
Step 1: Problem question – Specify the improvement goals. Determine what you are trying to prove. In what direction (>, <, ≠) do you expect the improvement? Leads to the Alternative Hypothesis
Step 2: Baseline – Specify the standard of comparison: benchmarks, long-term means/std dev, standards, regulatory requirements, customer specifications. Leads to the Null Hypothesis
Step 3: Conduct test – Specify error probabilities (α, β), minimum differences to detect, sample size or data source; run JMP. Results and interpretation
Question: Did a recent change in the small business loan approval process reduce the mean cycle time?
Answers
No, mean cycle time did not change.
Yes, mean cycle time is lower than 7.14.
Hypotheses
Null Hypothesis
H0: mean = 7.14
Alternative hypothesis
HA: mean < 7.14
Smaller is better!
Answer
No, Cross Selling ratio did not change
Yes, Cross Selling ratio is now larger than 2.7
Hypotheses
Null Hypothesis
H0: mean = 2.7
Alternative Hyp.
HA: mean > 2.7
Question: Did a recent promotional campaign for mortgage loans increase the Cross Selling ratio?
Bigger is better!
Example
Null Hypothesis H0: Standard, existing performance
Alternative hypothesis HA: Research question or claim: What we want to prove with the data
A
Cross-selling without customized offerings achieves a mean ratio equal to 2.7: H0: mean = 2.7
Cross-selling with customized offerings increases average cross-selling above 2.7: HA: mean > 2.7
B
Loan applications require a mean processing time of 7.14 days with the old process: H0: mean = 7.14 days
Loan applications processed with the new online component require less time: HA: mean < 7.14 days
C
The loan portfolio has an average credit worthiness of 700: H0: mean = 700
The loan portfolio has a mean credit worthiness different from 700: HA: mean ≠ 700
A. Right-sided HA (bigger is better!): cross-selling
B. Left-sided HA (smaller is better!): decision time of loan applications
C. Two-sided HA (target is best!): pricing a loan portfolio. The data are used to fix the final price of the loan portfolio
[Diagram: two-sided alternative around 700 – H0 in the centre, HA on either side]
Credit scores in portfolio have mean score < 700 OR mean score > 700
H0: Actual mean = specified constant – the mean credit worthiness is 700
HA: Actual mean ≠ specified constant – the mean credit worthiness is different from 700
Defendant is not guilty
Defendant is guilty
Verdict is:“NotGuilty”
Good decision to acquit the innocent
Error: a guilty defendant is declared not guilty
Verdict is:“Guilty”
Error: an innocent defendant is convicted
Good decision to convict the guilty
Data is evidence in hypothesis testing
True state of nature
Decision
HO is true
HA is true
Accept HO
Reject HA
Correct decision
Type II error (error probability β)
Accept HA
Reject HO
Type I error (error probability α)
Correct decision
Hypothesis testing allows us to quantify in advance the probability of an incorrect decision (error probabilities α and β). The error probabilities can be controlled by taking appropriate sample sizes
Relatively large sample sizes reduce the error probabilities
Business impact of change | Magnitude of desired change | Process variability | Required sample size n | Required α | Required β
Critical and costly | small | small | moderate | 0.05 or less | 0.05 or less
Critical and costly | small | large | large | 0.05 or less | 0.05 or less
Neither costly nor critical | large | small | small | 0.05 or larger | 0.05 or larger
Neither costly nor critical | large | large | moderate | 0.05 or larger | 0.05 or larger
[Diagram: one-sided test. The H0 mean and the critical value of the test statistic divide the axis into an acceptance region of area 1−α (accept H0, p > α) and a rejection region of area α (reject H0, p < α); α = significance level]
[Diagram: two-sided test. Lower and upper critical values of the test statistic around the H0 mean; each tail of area α/2 is a rejection region (p < α), and the central region of area 1−α is the acceptance region (p > α)]
For these simple tests, the data consist of a single variable column (Total XSell)
H0 entered
Sample mean
Twosided pvalue
Rightsided pvalue
Leftsided pvalue
Conclusion: The sample mean is 2.866. The right-sided p-value = 0.0074 is less than α = 0.05. Reject H0! Conclude that the changes significantly increased the mean cross-selling ratio above the historical mean of 2.7
[JMP output annotations: historical average of Total X-Sell = 2.7; H0: mean = 2.7; Reject H0! The 95% CI does not include the H0 mean]
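The JMP analysis above can be cross-checked in code. This is a minimal sketch with SciPy (1.6+ for the `alternative` argument), using illustrative numbers rather than the course data file:

```python
import numpy as np
from scipy import stats

# Illustrative cross-selling ratios for 20 accounts (not the course data)
rng = np.random.default_rng(1)
xsell = rng.normal(loc=2.87, scale=0.35, size=20)

# Right-sided one-sample t-test: H0 mean = 2.7 vs HA mean > 2.7
t_stat, p_value = stats.ttest_1samp(xsell, popmean=2.7, alternative="greater")

print(f"sample mean = {xsell.mean():.3f}")
print(f"t = {t_stat:.3f}, right-sided p = {p_value:.4f}")
print("Reject H0!" if p_value < 0.05 else "Accept H0!")
```

With real data, the simulated `xsell` column would be replaced by the Total X-Sell measurements.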
Finding an Improvement in CrossSelling
Difference to detect: 3.0 − 2.7 = 0.3
σ = 0.4, α = 0.05, β = 0.05
[Diagram: distributions under H0 (mean = 2.7) and HA (mean = 3.0) with error probabilities α and β]
Required sample size n = 25
α = probability of rejecting H0 when it is true
β = probability of rejecting HA when it is true
Power = 1 − β = probability of accepting HA when it is true
Difference to detect   Sample size
1 day                  833
2 days                 210
3 days                 94
H0: mean =7.14
[Figure: acceptance (A) and rejection (R) regions for the sample mean (scale 3 to 7 days) for differences of 1 day (n = 833), 2 days (n = 210), and 3 days (n = 94)]
H0: mean = 7.14
HA: mean < 7.14
Accept H0!
Improvement did not reduce cycle time
Reject H0!
Improvement reduced cycle time
α = 0.05, Power = 0.95, σ = 8
Large sample sizes give better discrimination for small and costly changes
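The sample sizes in the table follow from the normal-approximation formula n ≈ ((z(α/2) + z(β)) · σ / δ)². A sketch (JMP iterates with the t distribution, so its values of 833, 210 and 94 run slightly above this z-approximation):

```python
import math
from scipy.stats import norm

def sample_size(delta, sigma=8.0, alpha=0.05, power=0.95):
    """Normal-approximation sample size to detect a mean shift of `delta`
    with a two-sided significance level alpha and the given power."""
    z_alpha = norm.ppf(1 - alpha / 2)   # 1.96 for alpha = 0.05
    z_beta = norm.ppf(power)            # 1.645 for power = 0.95
    return math.ceil(((z_alpha + z_beta) * sigma / delta) ** 2)

for delta in (1, 2, 3):
    print(f"difference = {delta} day(s): n ~ {sample_size(delta)}")
```

This prints 832, 208 and 93; the small gap to the slide's table is the t-distribution correction.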
Case 2
Cross Selling ratio
Before promotion campaign
After promotion campaign
Case 3
Crossselling ratio Northern region
Crossselling ratioSouthern region
Previously we saw that
This section covers
Two Sample questions
Is there a difference in average cross-selling between regions NY/PA/FL and Rochester?
Does the average time from application received to decision depend on whether the application is received complete?
Does the average time from application to decision depend on customer type (2 categories)?
Matched Pairs question
Does the new promotion campaign affect cross-selling at branches? Compare cross-selling at branches before and after the promotion campaign.
Match pairs whenever you can, Two Sample if you must!
[Data layout: 15 pairs of Before (B) and After (A) observations, one pair per branch]
15 branches with Cross Selling ratio recorded before (B) and 2 months After (A) pilot program to increase CrossSelling was introduced. Each branch is compared with itself, after versus before
[Data layout: a single response column with a group label; 32 observations labeled 1 and 27 observations labeled 2]
32 observations from Group 1
27 observations from Group 2
n1=32
n2=27
Samples are taken from different processes or populations
Two Sample designs are highly versatile for comparing two alternatives
One Sample comparisons: compare sample results with an external standard. Always possible, if a sample can be taken. The long-term average and the sample results may have been collected under different conditions; one group (the long-term average or benchmark) is not sampled at all. Use when resources are limited or when one needs to compare to a standard.
Two Sample comparisons: compare results of samples from two different groups. Usually possible, but with restrictions. Sampling of two different groups is a source of variability; the sample sizes in the two groups may differ. Use when needed.
Matched Pairs comparisons: direct comparison of the sample at different times, etc. Not always possible. Sampling variability is virtually eliminated due to the direct comparison; the pairs of observations require only one sample size. Use whenever possible.
Two Sample
Comparison based on different sampling units (8 branches, 4 with and 4 without pilot)
Not a direct comparison
Matched pair
Comparison based on same sampling units (4 branches before and after pilot)
Direct comparison is very sensitive
Matched pair (before → with pilot): Branch A 2.9 → 3.0; Branch B 2.5 → 2.65; Branch C 2.8 → 2.85; Branch D 3.2 → 3.25
Two Sample: with pilot: Branch A 3.0, Branch B 2.65, Branch C 2.85, Branch D 3.25; without pilot: Branch E 2.9, Branch F 2.5, Branch G 2.8, Branch H 3.2
Cross-selling with and without pilot program: Does the pilot program increase cross-selling? Is the difference between with pilot and without pilot greater than 0?
Branch
XSell
Branch
XSell before pilot
XSell after pilot
Two Sample results of SimTwoSamXsell.JMP
Matched pair results of SimPairedXsell.JMP
Mean difference = 0.0875
8 branches
pvalue: Prob>t = 0.3322 > a=0.05
Accept Null Hypothesis. Conclude that the pilot did not change CrossSelling
The 95% confidence interval for the mean difference is from −0.38 to 0.55
Mean Difference = 0.0875
4 branches
pvalue: Prob>t = 0.0177 < a=0.05
Reject Null Hypothesis. Conclude that the pilot significantly changed CrossSelling
The 95% confidence interval for the mean difference is from 0.0113 to 0.1637
Step 1: Compare CrossSelling of each branch with itself
Cross-selling in a branch before promotion vs. cross-selling in the same branch after promotion
The difference in cross-selling, after − before, is of interest
Step 2: analyse the differences
Nonzero differences indicate change, impact, etc
Zero difference implies no change
Pairs 15 cross sell.JMP
Did Pilot program increase CrossSelling?
[Plot: difference in cross-selling (after − before) for branches 1 through 15, scale −0.04 to +0.08, with the H0 line at 0.00; observed mean difference = 0.024]
Did Pilot program increase CrossSelling?
Average difference of Cross Selling ratio is 0.024. Is this significantly different from a zero difference?
12 out of 15 differences are positive, indicating higher Cross Selling ratio after promotion. However, is this significant overall?
The mean difference based on 15 (before, after) pilot pairs is 0.02409
This difference is statistically very significant (p=0.0005 < a=0.05)
The difference is small enough to make one wonder whether the pilot program is worth extending to the other branches
The 95% Confidence Interval for the mean difference is:
Lower95% = 0.011 < actual mean difference < Upper95% = 0.037
Need to evaluate proper course of action
Small difference, but statistically significant!
But does it really matter?
Did pilot program significantly increase CrossSelling? Yes!
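A paired analysis like this one can be sketched as follows. The 15 before/after values are hypothetical stand-ins for the course file, chosen so that the mean difference is also 0.024:

```python
import numpy as np
from scipy import stats

# Hypothetical before/after cross-selling ratios for 15 branches
# (illustrative values, not the course data file)
before = np.array([2.70, 2.81, 2.65, 2.90, 2.75, 2.88, 2.60, 2.72,
                   2.95, 2.68, 2.80, 2.77, 2.85, 2.63, 2.91])
after = before + np.array([0.03, 0.02, 0.04, -0.01, 0.03, 0.02, 0.05,
                           0.01, 0.02, 0.04, -0.02, 0.03, 0.02, 0.05, 0.03])

diff = after - before
t_stat, p_value = stats.ttest_rel(after, before)   # paired t-test
print(f"mean difference = {diff.mean():.4f}")
print(f"t = {t_stat:.3f}, two-sided p = {p_value:.4f}")
```

Because each branch is compared with itself, the branch-to-branch variability drops out and even this small average difference tests as significant.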
Check table output for deciding on H0 or HA
Graphical output is interesting, but can be ignored at first cut
Whenever two different processes or populations (regions, suppliers, programs) need to be compared and direct observations are impossible or difficult
Match pairs whenever you can! Use two samples if you must!
32 branches in region 1: NY/PA/FL
27 branches in region 2: Rochester
[Data layout: cross-sell observations for each branch in the two regions]
Sample Mean (Region 1): Region 1 sample mean of the cross-selling ratio
Sample Mean (Region 2): Region 2 sample mean of the cross-selling ratio
Two Sample compares means of samples from different groups!
Is the mean Cross Selling ratio of Region 1 different from that of Region 2?
CrossSellTwoSample.jmp excerpt
Is there a difference in mean CrossSelling between Region 1 and 2 branches?
[Plot: sample mean cross-sell ratios; NY/PA/FL: 2.810, Rochester: 2.753]
Are the two CrossSell ratios different? Yes, but are they significantly different?
Question
Answer
[Plot: 95% confidence intervals for the mean cross-sell ratio (scale 2.5 to 3.0) for NY/PA/FL and Rochester]
1. Select the Fit Y by X platform for the data in CrossSellTwoSample.jmp.
2. For the Two Sample t Test one needs:
Is there a difference in mean cross-selling between Region 1 and 2 branches?
[Screenshot steps 3 to 5]
Other output omitted!
Is there a significant difference in mean cross-selling between Region 1 and 2 branches? No!
There is no statistically significant difference! The mean of Region 1 is not different from that of Region 2: p = 0.6423 > 0.05 and the CI for the difference includes 0 (−0.3 to 0.19)
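The same two-sample comparison can be sketched with SciPy; the region samples below are illustrative, not the CrossSellTwoSample.jmp data:

```python
import numpy as np
from scipy import stats

# Illustrative cross-selling ratios (not the course data file)
rng = np.random.default_rng(7)
region1 = rng.normal(2.81, 0.15, size=32)   # NY/PA/FL branches
region2 = rng.normal(2.75, 0.15, size=27)   # Rochester branches

# Standard two-sample t-test (pooled, assumes equal variances as in JMP)
t_stat, p_value = stats.ttest_ind(region1, region2)
print(f"means: {region1.mean():.3f} vs {region2.mean():.3f}")
print(f"t = {t_stat:.3f}, two-sided p = {p_value:.4f}")
print("significant difference" if p_value < 0.05 else "no significant difference")
```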
Testing for unequal variances is performed for two reasons
1a. Do the two processes have the same variation? The process with the lower variation is the more consistent one.
1b. Are the returns of two investment sectors equally (or unequally) certain? The variance here is a measure of uncertainty or risk. The lower the risk, the lower the variance of the returns.
2. The standard Two-Sample t Test assumes that the variation of the underlying processes is the same. If the variation is statistically different, as determined by the Levene test, test the difference between means using the Welch ANOVA t-test.
Fit Y by X
Test for Unequal Variances:
The Levene test for unequal variances is preferred in Six Sigma.
p-value = 0.8281 > 0.05 → variances are EQUAL.
Test for Equal Means:
The Welch test takes unequal variances into account.
p-value = 0.6375 → means are EQUAL
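The Levene-then-Welch sequence can be sketched in code; the two groups below are illustrative, with deliberately different spreads:

```python
import numpy as np
from scipy import stats

# Illustrative samples (not the course data); group B has a larger spread
rng = np.random.default_rng(3)
group_a = rng.normal(10.0, 1.0, size=40)
group_b = rng.normal(10.2, 3.0, size=40)

# Step 1: Levene test for unequal variances
lev_stat, lev_p = stats.levene(group_a, group_b)
print(f"Levene p = {lev_p:.4f}")

# Step 2: if the variances differ, compare means with Welch's t-test
# (equal_var=False); otherwise the pooled t-test is appropriate
equal_var = lev_p > 0.05
t_stat, t_p = stats.ttest_ind(group_a, group_b, equal_var=equal_var)
print(f"{'pooled' if equal_var else 'Welch'} t-test p = {t_p:.4f}")
```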
CrossSellTwoSample.jmp
Do complete applications have a shorter mean cycle time to decision?
Scatterplot shows overlap of observations in groups. Group mean diamonds do not overlap
Clear separation of mean diamonds indicates significant difference of group means
SmallBusinessData.jmp excerpt
[Plot: days from application to decision by X factor "Application Complete?" (FALSE mean = 12.4, TRUE mean = 5.7), with 95% individual confidence intervals for the means; y-scale 5 to 13 days]
1. The average difference between the TRUE and FALSE levels is 6.74 days
2. A 95% confidence interval for the mean difference is between 5.3 and 8.2 days
3. The p-value for testing no difference in means versus a difference is p = 0.0000. The Null Hypothesis should be rejected: the difference is significant at either a one-sided or a two-sided level
4. The Means table shows means and confidence intervals for the 158 incomplete and 581 complete applications in the data set.
Incomplete applications significantly delay the average decision time! They approximately double the cycle time
Small Business Data.jmp
The Levene test has a pvalue <0.0001.
The variances (standard deviation) between the two levels are significantly different. Need to check Welch test.
Factor: Applications Complete?
Levels: TRUE, FALSE
Large sample sizes bring out significant results.
The Welch test for unequal MEANS is also very significant, with p < 0.0001. This is the same conclusion as the test assuming equal variances, but the Welch result is the proper one to report here.
Goal: Practice Two Sample hypothesis tests
Instructions:
Time: Teams 30 minutes
Report out: 15 minutes
Internet banking use (X2)
Age or income group (X1)
Categorical data are often easier or cheaper to obtain. They often give sufficient information to analyse a problem
Chi-square is used to validate conclusions about cause and effect, or assumptions about performance
[Mosaic plot: preference (Like/Dislike) by person type (Men, Women, Children), with the overall proportions shown alongside]
Person by preference
Count          Dislike   Like   Row total
Men              17       24       41
Women            19       63       82
Children         37       68      105
Column total     73      155      228
SaveBookPrefer.jmp
Person by preference (expected cell frequencies in italics)
Count, Expected   Dislike        Like*          Row total
Men               17, 13.1272    24, 27.8728       41
Women             19, 26.2544    63, 55.7456       82
Children          37, 33.6184    68, 71.3816      105
Column total      73             155              228
Independence of variables (people, preference), if observed cell frequencies and expected cell frequencies are close
* Expected cell frequencies for Like are computed similar to those of Dislike
For each individual cell (i,j) JMP calculates a CellChiSquare. Small values support the Null Hypothesis
The SUM of all cellChiSquares represents the test statistic from which the pvalue is calculated
The Chi-Square test statistic is the sum, over all cells, of the squared differences between observed and expected cell frequencies, each divided by the expected cell frequency to make the cells comparable: χ² = Σ (observed − expected)² / expected. This is the Pearson Chi-Square test
Loading raw data into JMP Table
Person by preference
Count      Dislike   Like
Men          17       24
Women        19       63
Children     37       68
Preassigned frequency role!
The choice of X and Y does not matter for the statistical test, but may be important for the conclusions
Count, Expected, Cell Chi^2   Dislike                Like                   Row total
Men                           17, 13.1272, 1.1426    24, 27.8728, 0.5381       41
Women                         19, 26.2544, 2.0045    63, 55.7456, 0.9440       82
Children                      37, 33.6184, 0.3401    68, 71.3816, 0.1602      105
Column total                  73                     155                      228
(Example cell, Men/Dislike: Count 17, Expected 13.1272, Cell Chi^2 1.1426)
Test               ChiSquare   Prob>ChiSq
Likelihood ratio     5.227       0.0733
Pearson              5.130       0.0769
Pearson Chi-Square p-value: the Pearson p-value is the standard one for testing this hypothesis
In this example Prob>ChiSq = 0.0769 > 0.05
The Null Hypothesis cannot be rejected
One cannot conclude that the proportions of men, women and children who prefer the new savings book are different
There is not enough evidence to say with statistical significance that the proportion of likes and dislikes changes by type of person, OR that type of person and preference (like, dislike) are associated
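The whole analysis can be reproduced from the counts in the table; SciPy returns the Pearson statistic, its p-value, the degrees of freedom and the expected cell frequencies in one call:

```python
import numpy as np
from scipy.stats import chi2_contingency

# Savings book preference counts from the slides
# (rows: Men, Women, Children; columns: Dislike, Like)
observed = np.array([[17, 24],
                     [19, 63],
                     [37, 68]])

chi2, p_value, dof, expected = chi2_contingency(observed)
print(f"Pearson ChiSquare = {chi2:.3f}, dof = {dof}, p = {p_value:.4f}")
print("expected cell frequencies:")
print(np.round(expected, 4))
```

This reproduces the JMP output: ChiSquare 5.130 with Prob>ChiSq 0.0769, and the same expected frequencies (13.1272, 27.8728, ...).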
Stratification by income variable!
Combined table of strata 1 and 2 – FreqALL
Count, Expected           Personal loan: No   Personal loan: Yes   Row total
Refinanced mortgage No      84, 53.5385         48, 78.4615          132
Refinanced mortgage Yes     32, 62.4615        122, 91.5385          154
Column total               116                 170                   286

STRATA 1: High income respondents – FreqHIGHinc
Count, Expected           Personal loan: No   Personal loan: Yes   Row total
Refinanced mortgage No       4, 19.6842         40, 24.3158           44
Refinanced mortgage Yes     30, 14.3158          2, 17.6842           32
Column total                34                  42                    76

STRATA 2: Low Income Respondents – FreqLOWinc
Count, Expected           Personal loan: No   Personal loan: Yes   Row total
Refinanced mortgage No      80, 34.3619          8, 53.6381           88
Refinanced mortgage Yes      2, 47.6381        120, 74.3619          122
Column total                82                 128                   210

Data are in LoanRefinChiStrata.jmp
[Mosaic plots: Mortgage refinanced (Yes/No) by Personal/Auto loan (NO/YES) for the combined table (ALL respondents), STRATA 1 (high income respondents), and STRATA 2 (low income respondents)]
Overall results of package services
Count, Expected   Late       On time    Row total
Bagone            230, 225   270, 275     500
Mess Ex           220, 225   280, 275     500
Column total      450        550         1000
Data are in PackageService.jmp

500 express mail packages
Count, Expected   Late       On time    Row total
Bagone             70, 54     30, 46      100
Mess Ex           200, 216   200, 184     400
Column total      270        230          500
ChiSquare = 12.882 with p = 0.000

500 regular mail packages
Count, Expected   Late       On time    Row total
Bagone            160, 144   240, 256     400
Mess Ex            20, 36     80, 64      100
Column total      180        320          500
ChiSquare = 13.889 with p = 0.000

[Mosaic plots: on-time performance (On time/Late) by package company (Bagone, Mess Ex) for each table]
Which package service has better regular mail? Which service has better express mail?

           Service A   Service B
On time       49          51
Late          51          49
Total sample size n = 200
ChiSquare = 0.08 with p = 0.78, NOT significant!

           Service A   Service B
On time     4,900       5,100
Late        5,100       4,900
Total sample size n = 20,000
ChiSquare = 8.0 with p < 0.01, Significant!
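The sample-size effect is easy to verify in code: the proportions are identical, but multiplying every count by 100 multiplies the Pearson statistic by 100 (with `correction=False` to get the plain Pearson value for this 2×2 table):

```python
import numpy as np
from scipy.stats import chi2_contingency

# Same proportions, two sample sizes (counts from the slide)
small = np.array([[49, 51],    # rows: On time / Late
                  [51, 49]])   # columns: Service A / Service B
large = small * 100            # 4,900 / 5,100 etc.

# correction=False gives the plain Pearson chi-square for a 2x2 table
chi2_s, p_s, _, _ = chi2_contingency(small, correction=False)
chi2_l, p_l, _, _ = chi2_contingency(large, correction=False)
print(f"n = 200:    ChiSquare = {chi2_s:.2f}, p = {p_s:.2f}")
print(f"n = 20,000: ChiSquare = {chi2_l:.2f}, p = {p_l:.4f}")
```

The small table gives 0.08 (not significant); the large one gives 8.0 (significant), even though the late/on-time split is the same.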
Goal: Practice Chi-Square hypothesis tests
Instructions:
Time: Teams 15 minutes
Report out: 15 minutes
Internet banking use
Count               None   Somewhat   A lot   Row total
Income level 50<     62       51        34       147
Income level 51+     37       62        64       163
Column total         99      113        98       310
Module 6
More Advanced – Identify, Rank and Validate Key X’s
 ANOVA
 Simple regression
By end of this module you should be able to
Region 1
Region 2
Region 3
Region n
Case 4
[Data layout: 22 branch observations labeled by district 1, 2, or 3]
Western region by districts: 22 branches sampled
Group 1Southwestern
8 Branches sampledMean 1
Group 2Buffalo
7 Branches sampledMean 2
Group 3Niagara
7 Branches sampledMean 3
X Factor: Districts within Western region. Y Response: CrossSelling Ratio
Data are in Sam22WesternCrossSell.jmp
Type of hypotheses in oneway ANOVA
Null Hypothesis: All group means are the same
Alternative hypothesis: At least one mean is different from the rest (in the extreme case, all three means differ from each other)
Sample means (plot scale 2.8 to 3.4): Mean 1 = 2.81, Mean 2 = 3.37, Mean 3 = 2.97
Cross-selling ratio sample results from Sam22WesternCrossSell.jmp
Which of these means are significantly different from each other?
Does the overall mean explain the data well, OR should the group means be used instead?
Where should we focus our analysis: the overall mean or the group means?
Sam22WesternCrossSell.jmp.
1. Use Fit Y by X
2. Scatterplot with overall mean
3
3.1
Select means/ANOVA
pvalue < 0.05 →Reject H0!
But which means are different?
3.2
Individual 95% confidence intervals are too wide to be conclusive
4
4.1
Select compare means – each pair
Which are different?
Connected Letters Report
4.2
Significant difference? Check if CI includes 0!
Twosided
Ordered Differences Report
Table of pairwise Means, 95% Confidence Intervals, & 2sided pvalues
Level          Connecting letters   Mean
NIAGARA        A                    3.376
SOUTHWESTERN   A B                  2.975
BUFFALO        B                    2.814
Levels not connected by the same letter are significantly different.
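A pairwise comparison similar to JMP's "compare means, each pair" can be run in SciPy 1.8+ with Tukey's HSD (which, unlike JMP's each-pair Student's t, controls the family-wise error rate); the district samples below are illustrative, not the course data:

```python
import numpy as np
from scipy import stats

# Illustrative district samples (not the course data file)
rng = np.random.default_rng(11)
buffalo = rng.normal(2.81, 0.2, size=7)
niagara = rng.normal(3.37, 0.2, size=7)
southwestern = rng.normal(2.97, 0.2, size=8)

# Tukey's HSD compares every pair of group means at a family-wise 95% level
res = stats.tukey_hsd(buffalo, niagara, southwestern)
print(res)   # pairwise differences with p-values and confidence intervals
```

As in the Ordered Differences Report, a pair differs significantly when its confidence interval excludes 0.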
[Plot: Total X-sell (scale 2 to 4) by District: Buffalo, Niagara, Southwestern]
Sam22WesternCrossSell.jmp
5
6
The Levene test is marginally not significant. So the results assuming equal variances apply.
Factor: District
Levels: Buffalo, Niagara, SouthWestern
Small sample sizes make significant results difficult to achieve.
The p-value (0.0351) of the Welch ANOVA test is close to the p-value assuming equal variances (p = 0.0475). Both tests come to the same conclusion, but the Welch test is not needed here.
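The one-way ANOVA itself can be sketched with `f_oneway`; the three district samples below are illustrative, not Sam22WesternCrossSell.jmp:

```python
import numpy as np
from scipy import stats

# Illustrative cross-selling ratios by district (not the course data)
rng = np.random.default_rng(5)
southwestern = rng.normal(2.97, 0.35, size=8)
buffalo = rng.normal(2.81, 0.35, size=7)
niagara = rng.normal(3.37, 0.35, size=7)

# One-way ANOVA: H0 all district means equal vs HA at least one differs
f_stat, p_value = stats.f_oneway(southwestern, buffalo, niagara)
print(f"F = {f_stat:.3f}, p = {p_value:.4f}")
if p_value < 0.05:
    print("Reject H0: at least one district mean differs")
else:
    print("Accept H0: no significant district differences")
```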
Goal: Practice ANOVA skills
Instructions:
Time: Teams 15 minutes
Report out: 15 minutes
              Factor B
Factor A      Level B1         Level B2         Level B3
Level A1      A1,B1   A1,B1    A1,B2   A1,B2    A1,B3   A1,B3
Level A2      A2,B1   A2,B1    A2,B2   A2,B2    A2,B3   A2,B3

1. One observation per factor level combination:
              Factor B
Factor A      Level B1   Level B2   Level B3
Level A1      A1,B1      A1,B2      A1,B3
Level A2      A2,B1      A2,B2      A2,B3

2. Two or more observations per factor level combination (as in the table above)
Do levels of maintenance by management and reactive/proactive affect profitability of futures trading accounts?
Profitability of customers in $10,000
                                   X2 Factor: Maintenance by Management
X1 Factor (Reactive/Proactive)     Medium    High      Low
Reactive                           19, 11    20, 17    22, 31
Proactive                          27, 29    25, 30    31, 49
12 randomly selected customers are classified into maintenance by management categories low, medium, and high depending on how much effort management has to spend on this customer. Customers are also classified by their behavior into a reactive and a proactive group
[Plot: mean profit in $10,000 (scale 10 to 50) by maintenance level (High, Medium, Low), with separate lines for Proactive and Reactive customers]
Profitability seems to increase as the maintenance level decreases
Proactive customers appear to be more profitable than Reactive customers
Which of these differences are significant?
Effects of reactive/proactive & maintenance on response=mean profitability
[Plots: profitability ($) versus maintenance level (High, Medium, Low) for Proactive and Reactive customers. Non-parallel lines indicate interaction; parallel lines indicate no interaction]
Two-way ANOVA decomposes the total variation (SSTotal) into four parts:
 Variation due to factor A alone (SSFactor A)
 Variation due to factor B alone (SSFactor B)
 Variation due to the interaction of A and B (SSInteraction AB)
 Unexplained random variation, or noise (SSError)
SSTotal = SSFactor A + SSFactor B + SSInteraction AB + SSError
Effect tests give p-values for the significance of the factors and the interaction
The LS Means plot (from the interaction window options) gives a view of the means of the factor level combinations
[LS Means plot: profit in $10,000 (scale 10 to 50) by maintenance level (High, Medium, Low), Proactive vs. Reactive]
                                   X2 Factor: Maintenance by Management
X1 Factor (Reactive/Proactive)     Medium    High      Low
Reactive                           19, 11    20, 17    22, 31
Proactive                          27, 29    25, 30    31, 49
These are options accessed from the factor windows
From Effect Test: p = 0.0197 (Reactive/Proactive), p = 0.0832 (Maintenance by Management), p = 0.8688 (interaction)
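Because the design is balanced (two observations per cell), the two-way ANOVA can be computed directly from the profitability table above; this sketch reproduces the three Effect Test p-values:

```python
import numpy as np
from scipy.stats import f

# Profitability data from the slide, in $10,000: shape (2 rows, 3 cols, 2 reps)
# rows: Reactive, Proactive; columns: Medium, High, Low maintenance
y = np.array([[[19, 11], [20, 17], [22, 31]],
              [[27, 29], [25, 30], [31, 49]]], dtype=float)

a, b, r = y.shape                      # 2 levels of A, 3 of B, 2 replicates
grand = y.mean()
mean_a = y.mean(axis=(1, 2))           # Reactive / Proactive means
mean_b = y.mean(axis=(0, 2))           # Medium / High / Low means
cell = y.mean(axis=2)                  # cell means

ss_a = b * r * np.sum((mean_a - grand) ** 2)
ss_b = a * r * np.sum((mean_b - grand) ** 2)
ss_cells = r * np.sum((cell - grand) ** 2)
ss_ab = ss_cells - ss_a - ss_b         # interaction sum of squares
ss_err = np.sum((y - cell[:, :, None]) ** 2)

df_a, df_b, df_ab, df_err = a - 1, b - 1, (a - 1) * (b - 1), a * b * (r - 1)
ms_err = ss_err / df_err
for name, ss, df_n in [("A (reactive/proactive)", ss_a, df_a),
                       ("B (maintenance)", ss_b, df_b),
                       ("A*B interaction", ss_ab, df_ab)]:
    f_stat = (ss / df_n) / ms_err
    print(f"{name}: F = {f_stat:.3f}, p = {f.sf(f_stat, df_n, df_err):.4f}")
```

Note this by-hand decomposition is only valid for a balanced design; for unequal cell counts, use a least-squares (LS Means) fit as the slides recommend.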
For each factor ask: Which of these means are different from each other?
LS Means are the appropriate means; they are the only ones to use in ANOVA
LS Means (numbers from the previous slide): Reactive 20, Proactive 31.83; Medium 21.5, High 23, Low 33.25; cell means: Reactive/Medium 15, Reactive/High 18.5, Reactive/Low 26.5, Proactive/Medium 28, Proactive/High 27.5, Proactive/Low 40
Crosstab Report for LS Means Differences: which levels are different?
Differences:
 High vs. Medium: NS
 High vs. Low: significant
 Medium vs. Low: NS
How can one tell significance?
1. Check whether the 95% confidence interval for the difference (Lower 95% CL for Dif to Upper 95% CL for Dif, around the difference and its Std Err Dif) includes 0
2. Check the LS Means Connected Letters Report
The interaction plot gives two views of the same interaction; sometimes one view is more instructive. Take your pick!
Profitability = net after servicing and assigned
Contingency table
Maintenance level per management   Reactive/Proactive: P   Reactive/Proactive: R   Row total
High                                     18                      12                    30
Medium                                  110                      90                   200
Low                                      22                      42                    64
Column total                            150                     144                   294
Table contains frequencies of factor level combinations
The real data file does not have an equal number of observations per factor level combination, so the LS Means and the regular means in the least squares tables do not agree. Use only the LS Means
Data excerpt of first 8 observations
Specify a 2way model with interaction between main effects
Effect tests
Source                                                   DF   F Ratio   Prob > F
Maintenance level per management                          2    8.4634    0.0003
Reactive/proactive?                                       1    7.2719    0.0074
Maintenance level per management * Reactive/proactive?    2    4.2316    0.0154
p-value column
Y = Net After Servicing and Assigned: significance of the X terms
All p-values are less than 0.05; all effects are statistically significant
[LS Means plots of Net After Servicing and Assigned (scale −50,000 to 200,000): by Maintenance Level per Management (High, Medium, Low) and by Reactive/Proactive (P, R). The LS Means plot of the interaction shows the results best, and the two interaction views plot the same interaction against either factor. Connected Letters Report: the differences are statistically significant]
Goal: Practice twoway ANOVA skills
Instructions:
Read the situation described in the Notes of this page
Restate the problem in terms of statistical hypotheses concerning both factors Branch size and Loan type
Fit the appropriate statistical model using JMP. The data is in BranchLoanSatisfac.JMP
Does branch size matter with regard to customer satisfaction?
Does loan type matter with regard to customer satisfaction?
Is this a situation where interactions can play an important role? If yes, what do the data say about interaction?
Is there a “Branch size” – “Loan type” combination that is a clear (statistically significant) winner of the customer satisfaction derby?
Time: Teams 15 minutes
Report out: 15 minutes
BranchLoanSatisfac.jmp
Obtain this LS Means plot as an Interaction Plot by clicking on the red triangle on Branch size* Loan type bar. Select LS Means Plot
Lines are not parallel, which may indicate interaction. Highest satisfaction is with small branches and secured loans. Significant?
Level              Connecting letters   LeastSq Mean
Small, secured     A                    80.7
Medium, secured    B                    75.3
Small, personal    B C                  72.0
Medium, personal   C D                  70.0
High, personal     D                    67.0
High, secured      D                    60.0
Case
Effect of X
Performance
of Y (CTQ)
Three objectives in using regression
1.Summarize Data
2.Rate of Change
3.Predict Y from X
[Plot: fitted line through the observations; (1) the line as a data summary, (2) the change in Y per unit change in X (from x to x+1), and (3) predicting Y0 at X0]
Explain relationships between
Is advertising cost effective?
[Plot: Y = loans written in $100,000 versus X = advertising $; slope b1 = change in loans (in $100,000) per unit change in advertising]
Is advertising cost effective?
Yhat = b0 + b1·X
Slope b1 = estimated average change in Y per unit change in X: moving from x to x+1 changes Yhat from b0 + b1·x to b0 + b1·(x+1), i.e. by b1
Intercept b0 = estimated average of Y when X = 0; often used as a mere fitting constant and not itself of interest
[Plots of nonlinear relationships between Y and X: an S-shaped curve with a threshold and a saturation level; limited growth approaching saturation; exponential decay; exponential growth]
Both Y (response) and X (factor) variables must be continuous!
Loans in 100000 $ = 13.120814 + 1.5574123 · Advertising in 10000 $
(Intercept, Slope)
Summary of Fit (fitted line): RSquare 0.762; RSquare Adj 0.722; Root Mean Square Error 0.438; Mean of Response 16.981; Observations 8
Fit Mean: Mean 16.981; Std Dev [RMSE] 0.830; Std Error 0.293; SSE 4.820
Loans in 100,000 $ = 13.120814 + 1.5574123 · Advertising in 10,000 $
Overall mean line vs. regression line
Fitted line: linear relationship between Y and X; Ybar = mean of Y: no relationship
Compare RMSE: Summary of Fit (regression line) has Mean of Response 16.981 and Root Mean Square Error 0.438; Fit Mean (overall mean line) has Mean 16.981 and Std Dev [RMSE] 0.830
Which line fits the data better?
Parameter estimates table
Term                      Estimate   Std Error   t Ratio   Prob>|t|
Intercept                 13.1208    0.895       14.66     <.0001
Advertising in 10000$      1.5574    0.356        4.38      0.0047

Reading the table:
Intercept row: b0 = Yhat at X = 0; its standard error; t Ratio = b0/Std Error; p-value for b0
Slope row (Advertising in 10000$): b1 = change in Y per unit change in X; its standard error; t Ratio = b1/Std Error; p-value for b1
Parameter Table gives the pvalue of the slope. If Prob>t of slope <0.05, the regression equation is said to be significant. The slope is significantly different from 0.
Predicted values (Yhat = the estimated average of Y at X0; available in Fit Y by X – Linear Fit, and in Fit Model)

Advertising in 10000 $   Loans in 100000 $   Predicted loans in 100000 $
2.00                     16.80               16.24
2.82                     17.00               17.51
2.40                     16.90               16.86
3.13                     18.00               17.996
1.88                     16.00               16.05
2.64                     17.10               17.23
2.06                     15.85               16.33
2.90                     18.20               17.64
Estimated average Loan $ for $20K of Advertising:
X = 2.0, Y = 16.8 → predicted value Yhat = 16.24.
Yhat = Predicted Loans $ are all on the regression line
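The fit can be reproduced from the eight (advertising, loans) pairs in the table above; `linregress` returns the same intercept, slope, R-square and slope p-value as the JMP reports:

```python
from scipy.stats import linregress

# The eight observations from the table above
advertising = [2.00, 2.82, 2.40, 3.13, 1.88, 2.64, 2.06, 2.90]   # in $10,000
loans = [16.80, 17.00, 16.90, 18.00, 16.00, 17.10, 15.85, 18.20]  # in $100,000

fit = linregress(advertising, loans)
print(f"Loans = {fit.intercept:.4f} + {fit.slope:.4f} * Advertising")
print(f"R-square = {fit.rvalue ** 2:.3f}, slope p-value = {fit.pvalue:.4f}")

# Predicted loans for $20,000 of advertising (X = 2.0)
print(f"Yhat at X = 2.0: {fit.intercept + fit.slope * 2.0:.2f}")
```

This prints the fitted equation Loans = 13.1208 + 1.5574 · Advertising, R-square 0.762, slope p-value 0.0047, and the predicted value 16.24.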
[Scatterplots of Y versus X: one fit with R² = 0.68, one with R² = 0; an outlier is flagged as the reason a fit can collapse]
[Scatterplots of Y versus X: a range of X from 2 to 4 gives a mediocre R² = 0.5; a range of X from 1 to 5 gives a much stronger R² = 0.75]
X-values need to be spread over a reasonable range
[Diagram: within the range of observed X values, interpolation is OK; extrapolation beyond either end of the range is dangerous]
Why do customers go over their credit limit?
Is there a fixed set of customers that exceed the limit?
Is there a systemic problem that encourages exceedances?
Does the number of rejected files depend on the volume?
ScatterACh.jmp
[Scatterplot with fitted line: total rejects (350 to 600) versus customer files processed (3500 to 5000); the Jun02 point is flagged]
Total Rejects = 103.3791 + 0.1331215 Customer files processed
Intercept Slope
Goal: Practice regression skills
Instructions:
Time: Teams 15 minutes
Report out: 15 minutes
Module 7
Advanced – Identify, Rank and Validate Key X’s
 Introduction to design of experiments
By end of this module you should be able to
The results?
1. Can use low-cost options
2. Process costs are reduced / risk mitigated; less control required
3. Quality performance improves
D  M  A  E  C
Business problem → Practical problem (quantify problem & identify gaps) → Statistical problem (DOE: identify improvement levers) → Statistical solution (implement solutions) → Practical solution
DOE can be leveraged to deliver the DMAE phases’ solutions!
DOE for Financial Services
Define
Set factors & run exper.
Analyse
Engineer
Control
Depending on strategy
(1) Identify the problem
Basic quality tools are used to set the design
JMP increases the ease of application
Terms
Our Challenge
a. Determine (Y) –  response
b. Identify & set factors
c. Select & adjust design
d. Run & analyse experiment
Framework/reference
1. Cause → Effect
2. X factor → Y response
3. Response function: Y = f(X1, X2, …, Xn)
[Plot: Response 1 and Response 2 of Y at X factor Level 1 and Level 2]
Execution challenge
Design strategy
What are the important factors?
What is the best way to run or execute?
What is the optimal way to run or execute?
Uses
Find important X factors affecting Y
Estimate factor effects & interactions
Find optimal factor combinations
Benefits
Efficiency of the data collection
DOE provides more bang for the buck in terms of information
DOE determines the necessary number of runs to obtain the desired information
Results are easy to understand
Designed experiments provide simpler and more meaningful interpretation of results than nondesigned data collection methods
Results can be extended
Experimental designs can be extended and augmented to improve clarity and further the understanding of the problem.
Plot of scaled factor effects
Pareto plot of factor effects
Data in ProcessTime6Fac.jmp
Which factors have the largest effect on loan processing times?
1. Staff Level seems to have high impact on processing time. High staff levels reduce processing time.
2. Processing Center seems tied with Staff Level for most important factor. The New Processing Center takes considerably less time than the Old.
3. Application Complete appears less important than 1 & 2. But complete Applications take less time than incomplete ones.
Other factors appear not important!
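Factor effects in a two-level design are simple averages: the effect of a factor is the mean response at its high level minus the mean at its low level. A sketch on a hypothetical 2^3 full factorial (illustrative numbers, not the ProcessTime6Fac.jmp experiment):

```python
import itertools
import numpy as np

# Hypothetical 2^3 full factorial with factors coded -1 / +1
factors = ["StaffLevel", "ProcessingCenter", "ApplicationComplete"]
design = np.array(list(itertools.product([-1, 1], repeat=3)))

# Simulated processing times (days) for the 8 runs, in design order
times = np.array([17.3, 15.6, 13.4, 11.7, 12.6, 10.4, 8.6, 6.4])

# Main effect of a factor = mean response at +1 minus mean at -1
for j, name in enumerate(factors):
    hi = times[design[:, j] == 1].mean()
    lo = times[design[:, j] == -1].mean()
    print(f"{name}: effect = {hi - lo:+.2f} days")
```

Here the (made-up) effects come out as −5.00, −3.95 and −1.95 days, mirroring the slide's ranking: staff level and processing center dominate, application completeness matters less.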
DOE focuses on efficient and systematic collection of data involving many factors
DOE provides unique and exclusive insights into complex problems
Model for defining the problem
Leverage problem statement F(Y) = X1..Xn
Identity
Location
Timing
Magnitude
To develop CTQ’s and, later, factors & levels
(From Plunkett and Hale, The Proactive Manager, J. Wiley and Sons, 1982)
The problem statement identifies the response variable Y
Experiment
Uncontrollable inputs subject
to variation
Process causes variability
Controllable inputs subject
to variation
Variability in output
Design factors
Any product or process/service design parameter that can be controlled
Controllable
Low-cost alternatives
Reduce sensitivity to noise factors
Typically nominal values
Control factors affect mean and variability or variability only
Adjustment factors affect the mean only
Cost reduction factors have little effect on mean or variability
Noise factors
Inherent processing/operating variation, unit-to-unit differences under “same” conditions
Uncontrollable
Costly/undesirable to control
Typically tolerances, or variations across different customer applications
Deterioration of system components (internal noise)
Variation in the operating environment, e.g. regulatory requirements (external noise)
Human errors
The cause and effect diagram is the method of choice in factor selection
Note: The CE analysis can be (1) a guide for discussion when identifying factors, preventing participants from straying, and (2) a further reference if the experiment does not resolve the problem. The previous diagram may be used as a starting point
The material on CE is repeated from Module 4 to show its use in the DOE process
[Cause-and-effect diagram skeleton: Methods, Design, Materials, and Machine branches, each carrying candidate factors, feed into the Response]
[Cause-and-effect diagram for a successful telephone survey:
 Data collection: untimely analysis, missing data
 Interviewer: sounds untrustworthy on phone, voices own opinion, poor training
 Customer: refuses answers, not at home, unable to respond
 Questionnaire: confusing question sequence, ambiguous questions, non-actionable questions, interview timing]
[Cause-and-effect diagram for a successful direct mail survey:
 Customer: non-response, wrong address
 Data: untimely analysis, illegible response, incomplete response, questionnaire lost in mail
 Questionnaire: question sequence, length, layout, non-actionable questions, ambiguous questions]
Make sure that factor level combinations do not result in operational difficulties!
[Diagram: settings of Factor A (horizontal) and Factor B (vertical). Each factor has a potential low-to-high range enclosing an actual low-to-high range; non-feasible regions cut off two corners, so the region of the actual experiment must lie within the feasible region]
Execution challenge and recommended design type:
1. What are the important factors? Fractional factorial
2. What is the best way to run or execute? Full factorial
3. What is the optimal way to run or execute? Response Surface Design
Note: see MBB for design adjustments
Factorial designs at two levels systematically combine factors at two-level combinations, giving the data needed to identify the important factors and to estimate their effects on the response variable Y
[2 × 2 design square: factor telephone contact (No, Yes) crossed with factor direct mail (No, Yes), giving four cells: no contact, telephone contact only, direct mail only, and telephone + direct mail]
Uses
Allow identification of important factors
Allow estimation of how large each factor effect is
Some allow estimation of interaction terms
Benefits
They are very efficient for discovery
They are sufficient for evaluating linear Response Surfaces
They require the fewest number of runs per factor and overall
They can be used as building blocks in sequences of experiments
Factorial Designs at two levels evaluate each factor at only two levels
[Plots: with no interaction, the effect of Factor A between Level 1 and Level 2 is the same at both levels of Factor B (parallel lines); with interaction, the effect of Factor A changes with the level of Factor B (non-parallel lines)]
DOE platform in JMP
Data table for the 2 x 2 factorial:

Run  Direct Mail  Telephone  Account Bal K$
1    No (-)       No (-)     10
2    No (-)       Yes (+)    12
3    Yes (+)      No (-)     14
4    Yes (+)      Yes (+)    16

The same data as a 2 x 2 grid of cell means:

                 Direct mail No   Direct mail Yes
Telephone Yes          12               16
Telephone No           10               14
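The factor effects in this 2 x 2 example can be checked by hand from the four cell values. A minimal Python sketch (not part of the original JMP workflow; the four balances are the table values above):

```python
# 2x2 factorial: account balance (K$) for each Direct Mail / Telephone combination.
# Coded levels: -1 = No, +1 = Yes.
runs = [
    {"mail": -1, "tele": -1, "y": 10},
    {"mail": -1, "tele": +1, "y": 12},
    {"mail": +1, "tele": -1, "y": 14},
    {"mail": +1, "tele": +1, "y": 16},
]

def effect(key):
    """Main effect = mean(y at +1) - mean(y at -1)."""
    hi = [r["y"] for r in runs if r[key] == +1]
    lo = [r["y"] for r in runs if r[key] == -1]
    return sum(hi) / len(hi) - sum(lo) / len(lo)

mail_effect = effect("mail")  # 4.0: mailing raises balance by 4 K$ on average
tele_effect = effect("tele")  # 2.0: calling raises balance by 2 K$ on average

# Interaction = half the difference between the telephone effect with and
# without direct mail; zero here, so the two factors act additively.
interaction = ((16 - 14) - (12 - 10)) / 2
```

The full effects are 4 K$ for direct mail and 2 K$ for telephone; software that reports regression-style coefficients (as the JMP estimates shown later do) reports half these values, since a coefficient measures change per unit of the -1/+1 coded factor.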
Two factors at two levels:
Factor A: Direct mail contacts
Factor B: Telephone contacts
Click and overtype the default labels.
Run order options:
 Sort left to right: the levels of the first factor column vary slowest, the second faster, and the third (if present) fastest
 Sort right to left: the levels of the first factor column vary fastest, the second less fast, and the third (if present) slowest
 Randomize: the runs are presented in the random order in which the experiment should be performed
Click to create a new data table with the experimental conditions.
Run the experiment and enter the Y response (Acc Bal).
Data table for 2 x 2 factorial “On Board” design
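Outside JMP, the same run lists can be generated programmatically. A sketch (standard library only, assumed workflow rather than the deck's JMP steps) of the "sort left to right" order and a randomised run order:

```python
import itertools
import random

# Two factors at two levels, as in the example above.
factors = {"Direct Mail": ["No", "Yes"], "Telephone": ["No", "Yes"]}

# "Sort left to right": the first factor varies slowest, which is exactly
# itertools.product's default ordering over the factor level lists.
runs = list(itertools.product(*factors.values()))

# "Randomize": shuffle the run order before executing the experiment so that
# unknown time-ordered noise is not confounded with the factors.
random.seed(2006)  # fixed seed only to make the sketch reproducible
run_order = runs[:]
random.shuffle(run_order)
```

The randomised list is the order in which the experiment should actually be performed; the sorted list is only a convenient way to check that every factor combination is present.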
Get all required data
Randomise
Replicates
Order of experiment  Staff  Center  Application complete?  Time
3                    10     Old     Yes                    20.72
4                    10     Old     No                     28.89
1                    10     New     Yes                    12.34
7                    10     New     No                     12.53
6                    20     Old     Yes                    11.35
8                    20     Old     No                     18.49
5                    20     New     Yes                    4.19
2                    20     New     No                     6.78
Complete matrix: eliminating one run leads to poor results; eliminating two runs leads to no results.
Run order: the first condition to run (order 1) is staff = 10, New center, applications complete.
Easily done!
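From the eight runs above, each main effect can be estimated directly as a difference of averages. A Python sketch (the values are computed from the table; they are close to, though not identical with, the fitted-model effects reported on the slides):

```python
# Eight runs of the 2^3 factorial (standard order), response = time in days.
# Columns: staff level, processing center, application complete?, time.
runs = [
    (10, "Old", "Yes", 20.72),
    (10, "Old", "No",  28.89),
    (10, "New", "Yes", 12.34),
    (10, "New", "No",  12.53),
    (20, "Old", "Yes", 11.35),
    (20, "Old", "No",  18.49),
    (20, "New", "Yes", 4.19),
    (20, "New", "No",  6.78),
]

def main_effect(idx, hi, lo):
    """Mean response at the high level minus mean at the low level."""
    hi_mean = sum(r[3] for r in runs if r[idx] == hi) / 4
    lo_mean = sum(r[3] for r in runs if r[idx] == lo) / 4
    return hi_mean - lo_mean

staff    = main_effect(0, 20, 10)        # about -8.4 days: more staff, faster
center   = main_effect(1, "New", "Old")  # about -10.9 days: new center faster
complete = main_effect(2, "Yes", "No")   # about -4.5 days: complete apps faster
```

Each factor column is balanced (four runs at each level), which is what lets a simple difference of averages estimate the effect.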
Data Table
Pareto plot of estimates:
Term                                            Estimate
Direct Mail[Mail(No)]                           2.000000
Telephone[Tele(No)]                             1.000000
Direct Mail[Mail(No)]*Telephone[Tele(No)]       0.000000
Pareto plot of main effects and interactions
Prediction profiler
[Prediction profiler panels for Acc Bal: the left panel, set to Mail (No) and Tele (No), shows an average value of 10; the right panel, set to Mail (Yes) and Tele (No), shows an average value of 14]
LS Means plots
[LS Means plots for the main effects: Acc Bal LS Means (axis 8 to 18) plotted for Mail (No) vs Mail (Yes) and for Tele (No) vs Tele (Yes)]
Interaction profiles
[Interaction profile matrix: Acc Bal (axis 8 to 18) plotted for Direct Mail at each level of Telephone, and for Telephone at each level of Direct Mail; the lines are parallel, consistent with the zero interaction estimate]
The interaction plot is similar to the LS Means plot
Cube plot
[Cube plot of the four cell means: at Tele (No), Acc Bal is 10 for Mail (No) and 14 for Mail (Yes); at Tele (Yes), it is 12 for Mail (No) and 16 for Mail (Yes)]
Note: JMP contains additional, more advanced types of analysis for DOE results
1. Six factors require a minimum of 8 runs!
2. Select the Screening Design with 6 factors from the table in JMP
3. Enter factor names, types, and ranges (data are in ProcessTime6Fac.jmp)
4. Generate the design matrix
5. Run the experiment and obtain the Y values
Add the 6 factors as Model Effects.
Full effects (days):
1. Staff level: 8.76
2. Center: 8.44
3. App complete: +4.68
How big is the factor effect?
Prediction profiler: the important factors show the steepest slopes
Estimate the magnitude of each main effect and possible interaction effects!
Each RUN (row) is an experimental condition.
Each column shows the level at which a factor is run; high (+) and low (-) each appear equally often.
The column + and - signs are used to compute the factor effects.
Run  A  B  C  Code
1    -  -  -  (1)
2    +  -  -  a
3    -  +  -  b
4    +  +  -  ab
5    -  -  +  c
6    +  -  +  ac
7    -  +  +  bc
8    +  +  +  abc
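The design matrix and its Yates codes can be generated rather than typed. A minimal Python sketch reproducing the table above:

```python
import itertools

# Standard-order 2^3 design matrix. itertools.product varies its last
# argument fastest, so generate (C, B, A) and read the levels back as
# (A, B, C) to get the classic order where factor A varies fastest.
rows = []
for c, b, a in itertools.product((-1, +1), repeat=3):
    # Yates code: a lower-case letter for each factor at its high level;
    # "(1)" when every factor is at its low level.
    code = "".join(
        letter for letter, level in zip("abc", (a, b, c)) if level == +1
    ) or "(1)"
    rows.append(((a, b, c), code))
```

Each of the eight rows is one experimental condition; the coded tuple gives the +/- columns used to compute factor effects.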
Systematic collection of data based on DESIGN MATRIX
Only important factors are considered in this DOE
Purpose: Determine which combination of these three factors results in the shortest time to complete the loan application
Factorial of 3 factors each at two levels
Full factorial requires 8 = 2*2*2 runs
Response variable = Time in days
Factors:
 Staff level: 10 or 20
 Center: Old or New
 Application complete: Yes or No
Data are in ProcessTime3Factor.jmp
Eliminate the staff level × center and staff level × app. complete interactions!
1. Rerun observations or use the old observations
2. Estimate model terms with more precision by eliminating non-significant interaction terms
3. All p-values are at or below the 0.05 level
4. Full effects (days): Staff -8.36, Center -10.86, App Complete -4.48
Important factors have a steep prediction profile
[Interaction plots of the fitted cell means (Time LS Means, axis 0 to 30). Fitted times in days:
 Old center: 20.22 (staff 10, complete), 27.77 (staff 10, incomplete), 11.85 (staff 20, complete), 19.41 (staff 20, incomplete)
 New center: 12.45 (staff 10, complete), 13.84 (staff 10, incomplete), 4.08 (staff 20, complete), 5.47 (staff 20, incomplete)]
Best Performance = Shortest Time
New Center with 20 Staff
Why Response Surface methodology is needed
What Response Surface methodology can do
Goal: Practice DOE Skills
Instructions:
Time: Teams: 30 minutes
Report out: 15 minutes
Module 8
Tollgate
Analyse
Probing question(s)
What to look for?
Critical checkpoints
Phase objective
Recommended tools
Do’s
Don'ts
(1) Strongly recommended for Tollgate
[Tollgate requirements matrix: requirements (1.0, 2.0, ...) marked with "+" against the D, M, A, I, C phases]
Prioritised list of improvements (X’s)
Factors impacting requirements (Y’s)
Comments
Opportunity assessment
Validation/verification of (X’s)
Lessons learned
What we need
Major elements (sub-processes) with requirements
Key takeaways
[Tollgate storyboard: sub-process map Receive request, Proposal prep, Approval, tracked against the D, M, A, I, C phases]
List of priority improvements (X’s)
Factors impacting outputs (Y’s)
1. Time-consuming applications
2. Decision cycle time with rework
3. Decline process – Large
4. Resources consumed in renewals (ABC)
Comments
1. Reducing transmission time will provide immediate cycle-time improvement
2. For Large credits, the rework loop consumes 10 days of cycle time one third of the time
3. Large decline rate: 42%. Resources consumed to decline Large deals: $4.0m per year
4. Resources consumed to process Large renewals: $3.2m per year
5. Process hand-offs eliminated
6. Reducing transmission time will provide immediate cycle-time improvement
Small
Medium
Large
1. Resolve submittal of applications through mail
2. Reduce DCO additional information requests
3. Move the kill point for unapprovable applications earlier in the process, before they reach the DCOs; also institute better strike-zoning (currently deals are declined at the kill point)
4. Reduce resources consumed in renewals process
5. Reconfigure personnel and process into teams including underwriting, support and documentation
6. Resolve transmission of documents through mail
Root causes
Used Cause and Effect analysis to confirm major causes (see Appendix I)
All tests of medians conducted at 95% confidence
Receive requests, analysis and proposal preparation
Credit approval
Documentation request and prep
Doc transit, closing and booking
% of total Large cycle time consumed by processes above:
27%
31%
12%
30%
Improvement targeted at the circled areas!
[Pareto chart of NVA time by survey question: 4a (time for annual review), 4b (CAT A request), 4c (CAT B request), 4d (time for admin CARM); count axis 0 to 2,000 with a cumulative percent line]
Areas of focus:
Which CARM steps consume RM time?
Team hosted brainstorming session with other CIB staff
Specific improvements have been identified
What we need:
Appendix 1: Glossary of terms
BLOCKING VARIABLES
A relatively homogenous set of conditions within which different
conditions of the primary variables are compared. Used to ensure that
background variables do not contaminate the evaluation of primary
variables
ABSCISSA
The horizontal axis of a graph
ACCEPTANCE REGION
The region of values for which the Null Hypothesis is accepted
ALPHA RISK
The probability of accepting the alternate hypothesis when, in reality, the Null Hypothesis is true
BRAINSTORMING
A teamoriented meeting used in problem solving to develop a list of
possible causes that may be linked to an observed effect
ALTERNATIVE HYPOTHESIS
A tentative explanation which indicates that an event does not follow a
chance distribution; a contrast to the Null Hypothesis
CAPABILITY INDICES
A mathematical calculation used to compare the process variation to a
specification. Examples are Cp, Cpk, Pp, PpK, Zst, and Zlt. General
Electric uses Zst and Zlt as the common communication language on
process capability
ANALYSIS OF VARIANCE
A statistical method for evaluating the effect that factors have on
process mean and for evaluating the differences between the means of two or more normal distributions
CAUSALITY
The principle that every change implies the operation of a cause.
ASSIGNABLE CAUSE
A process input variable that can be identified and that contributes in an observable manner to nonrandom shifts in process mean and/or standard deviation
CAUSATIVE
Effective as a cause
CAUSE
That which produces an effect or brings about a change
ASSIGNABLE
VARIATIONS
Variations in data which can be attributed to specific causes
CAUSE AND EFFECT
(C&E) DIAGRAM
One of the seven basic tools for problem solving and is sometimes
referred to as a "fishbone" diagram because of its structure. Spine
represents the "effect" and the major legs of the structure are the "cause
categories.” The substructure represents the list of potential causes which can induce the "effect." The 6 M's (men & women, machine, material, methods, measurements, and Mother Nature) are sometimes used as cause categories
ATTRIBUTE DATA
Quality data that typically reflects the number of conforming or nonconforming units or the number of nonconformities per unit on a go/no go or accept/reject basis
AVERAGE
Sum of all measurements divided by the total number of measurements. Statistic which is used to estimate the population mean. Same as MEAN
BACKGROUND
VARIABLES
Variables which are of no experimental interest and are not held
constant. Their effects are often assumed insignificant or negligible, or they are randomised to ensure that contamination of the primary response does not occur. Also referred to as environmental variables and uncontrolled variables
C CHARTS
Charts which display the number of defects per sample. Used where
sample size is constant
CENTRAL TENDENCY
Numerical average, e.g., mean, median, and mode; centre line on a
statistical process control chart
BENCHMARKING
A process for identification of external best-in-class practices and standards for comparison against internal practices
CENTER LINE
The line on a statistical process control chart which represents the
characteristic's tendency
BETA RISK
The probability of accepting the Null Hypothesis when, in reality, the
alternate hypothesis is true
CHAMPION
An executive level business leader who facilitates the leadership,
implementation, and deployment of Six Sigma philosophies
BINOMIAL
DISTRIBUTION
A statistical distribution associated with data that take one of two possible
states, such as Go/No-Go or Pass/Fail
CHANGE ACCELERATION
PROGRAM (CAP)
A process which helps accelerate stakeholder buyin and implementation
of new philosophies and processes within a business
BLACK BELT
A process improvement project team leader who is trained and certified
in Six Sigma methodology and tools and who is responsible for
successful project execution
CHARACTERISTIC
A definable or measurable feature of a process, product, or service
CRITICAL TO QUALITY
(CTQ) CHARACTERISTIC
A drawing characteristic determined to be important for variability
reduction based on a requirement from production, engineering, customer
application, or regulatory agency. Can also apply to transactional or
service delivery processes
CLASSIFICATION
Differentiation of variables
COMMON CAUSE
See RANDOM CAUSE
CONFIDENCE LEVEL
The probability that a randomly distributed variable "x" lies within a
defined interval of a normal curve
CUTOFF POINT
The point which partitions the acceptance region from the rejection region
CONFIDENCE LIMITS
The two values that define the confidence interval
DATA
Factual information used as a basis for reasoning, discussion, or
calculation; often refers to quantitative information
CONFOUNDING
Allowing two or more variables to vary together so that it is impossible to
separate their unique effects
DATA TRANSFORMATION
A mathematical technique used to create a near-normally distributed data set
out of a non-normal (skewed) data set
CONSUMER’S RISK
Probability of accepting a lot when, in fact, the lot should have been
rejected (see BETA RISK)
DEFECT
Any product characteristic that deviates outside of specification limits
DEFECT PER MILLION
OPPORTUNITIES (DPMO)
Quality metric used in the Six Sigma process and is calculated by the
number of defects observed divided by the number of opportunities for
defects normalised to 1 million units
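The DPMO definition above translates directly into arithmetic. A small Python sketch (the defect and opportunity counts are hypothetical, chosen only for illustration):

```python
def dpmo(defects, units, opportunities_per_unit):
    """Defects per million opportunities: observed defects divided by total
    opportunities, normalised to one million."""
    return defects / (units * opportunities_per_unit) * 1_000_000

# Hypothetical example: 25 defects found in 1,000 loan applications,
# each application offering 10 opportunities for a defect.
rate = dpmo(25, 1000, 10)  # 2,500 DPMO
```

Note that a Six Sigma level of performance corresponds to 3.4 DPMO, e.g. 34 defects across ten million single-opportunity units.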
CONTINUOUS DATA
Data obtained from a measurement system which has an infinite number of
possible outcomes
CONTINUOUS RANDOM VARIABLE
A random variable which can assume any value continuously within some
specified interval
DESIGN FOR CUSTOMER
IMPACT (DFCI)
Approach to customers characterised by customer centricity and
measuring from the customer point of view
CONTROL CHART
A graphical rendition of a characteristic's performance across time in
relation to its natural limits and central tendency
FAILURE MODE &
EFFECTS ANALYSIS
(FMEA)
Analytical technique focused on problem prevention through
identification of potential problems. The FMEA is a proactive tool
used pragmatically to identify potential failures and their effects, to
numerically rate the combined risk associated with severity, probability
of occurrence, and detectability, and to document appropriate plans for
prevention. FMEAs can be applied to system, application, and product
design and to manufacturing and non-manufacturing processes (i.e.,
services and transactional processes)
CONTROL LIMITS
Apply to both range or standard deviation and subgroup average (X)
portions of process control charts and are used to determine the state of
statistical control. Control limits are derived statistically and are not
related to engineering specification limits in any way
CONTROL PLAN
A formal quality document that describes all of the elements required to
control variations in a particular process or could apply to a complete
product or family of products
FIRST TIME YIELD
Yield that occurs in any process step prior to any rework that may be
required to overcome process shortcomings
CONTROL SPECIFICATIONS
Specification requirements for the product being manufactured
FIXED EFFECTS MODEL
An experimental model where treatments are specifically selected by the
researcher. Conclusions only apply to the factor levels considered in the
analysis. Inferences are restricted to the experimental levels
CORRELATION
The relationship between two sets of data such that when one changes,
the other is likely to make a corresponding change. Also, a statistical tool
for determining the relationship between two sets of data
FLUCTUATIONS
Variances in data which are caused by a large number of minute
variations or differences
COST OF POOR QUALITY
(COPQ)
Cost associated with providing poor quality products or services. Can be
divided into four cost categories: Appraisal, Scrap, Rework, and Field
Complaint (warranty costs)
FREQUENCY
DISTRIBUTION
The pattern or shape formed by the group of measurements in a
distribution based on frequency of occurrence
GAGE REPEATABILITY
A measure of the variation observed when a single operator uses a gage to measure a group of randomly ordered (but identifiable) parts on a repetitive basis
GAGE REPRODUCIBILITY
A measure of the average variation observed between operators when multiple operators use the same gage to measure a group of randomly ordered (but identifiable) parts on a repetitive basis
GAGE STABILITY
A measure of the variation observed when a gage is used to measure the same master over an extended period of time
GAGE LINEARITY
A measure of gage accuracy variation when evaluated over the expected operating range
GAGE ACCURACY
The average difference observed between a gage under evaluation and a master gage when measuring the same parts over multiple readings
GREEN BELT
Six Sigma role similar in function to Black Belt, but length of training and project scope are reduced
HISTOGRAM
Vertical display of a population distribution in terms of frequencies; a formal method of plotting a frequency distribution
HOMOGENEITY OF VARIANCE
The variances of the data groups being contrasted are equal (as defined by a statistical test of significant difference)
INDEPENDENT VARIABLE
A controlled variable; a variable whose value is independent of the value of another variable
INSTABILITY
Unnaturally large fluctuations in a process input or output characteristic
INTERACTION
The tendency of two or more variables to produce an effect in combination which neither variable would produce if acting alone
INTERVAL
Numeric categories with equal units of measure but no absolute zero point, i.e., a quality scale or index
LINE CHARTS
Charts used to track performance without relationship to process capability or limits
LOWER CONTROL LIMIT
A horizontal dotted line plotted on a control chart which represents the lowest process deviation that should occur if the process is in control (free from assignable cause variation)
MEAN
See AVERAGE
MEAN TIME BETWEEN FAILURES (MTBF)
Average time to failure for a statistically significant population of a product operating in its normal environment
MEASUREMENT SYSTEMS ANALYSIS (MSA)
Means of evaluating a continuous or discrete measurement system to quantify the amount of variation contributed by the measurement system. Refer to the Automotive Standard (AIAG STD) for details
MEDIAN
The mid-value in a group of measurements when ordered from low to high
MINITAB
Statistical software package that operates on Microsoft Windows with a spreadsheet format and has powerful statistical analysis ability
MISTAKE-PROOFING
A proactive technique used to positively prevent errors from occurring
MIXED EFFECTS MODEL
Contains elements of both the fixed and random effects models
MULTI-VARI
Method used in the measure/analyse phases of Six Sigma to display graphically the variation within parts, between machines or process streams, and over time
KEY PROCESS INPUT
VARIABLES (KPIVs)
The vital few input variables, called "x's" (normally 2-6), that drive 80%
of the observed variation in the process output characteristic ("y")
MASTER BLACK BELT
A person who is an "expert" on Six Sigma techniques and on project
implementation. Master Black Belts play a major role in training,
coaching and in removing barriers to project execution in addition to
overall promotion of the Six Sigma philosophy
HYPOTHESIS
When used as a statistical term, it is a theory proposed or postulated for
comparing means and standard deviations of two or more data sets. A
"null" hypothesis states that the data sets are from the same statistical
population, while the "alternate" hypothesis states that the data sets are not from the same statistical population
NONCONFORMING UNIT
A unit which does not conform to one or more specifications, standards, and/or requirements
NORMAL DISTRIBUTION
A continuous, symmetrical density function characterised by a bell-shaped curve, e.g., distribution of sampling averages
NORMALIZED ROLLED THROUGHPUT YIELD (RTYN)
The estimate of the average process yield used to determine RTY. It is determined by taking the nth root of the RTY (where "n" is the number of process steps included in the RTY calculation)
NULL HYPOTHESIS
An assertion to be proven by statistical analysis where two or more data sets are stated to be from the same population
ONE-SIDED ALTERNATIVE
The value of a parameter which has an upper bound or a lower bound, but not both
ORDINAL
Ordered categories (ranking) with no information about distance between each category, i.e., rank ordering of several measurements of an output parameter
ORDINATE
The vertical axis of a graph
OUT OF CONTROL
Condition which applies to a statistical process control chart where plot points fall outside of the control limits or fail an established run or trend criterion, indicating that an assignable cause is present in the process
P CHARTS
Charts used to plot percent defectives in a sample where sample size is variable
PARAMETER
A constant defining a particular property of the density function of a variable
PARETO DIAGRAM
A chart which places common occurrences in rank order
PERTURBATION
A non-random disturbance
POISSON DISTRIBUTION
A statistical distribution associated with attribute data (the number of nonconformities found in a unit) which can be used to predict first pass yield
POPULATION
The entire set of items from which a sample is drawn; a group of similar items, often referred to as the universe
POWER OF AN EXPERIMENT
The probability of rejecting the Null Hypothesis when it is false and accepting the alternate hypothesis when it is true
PRECISION TO TOLERANCE RATIO (P/T)
A ratio used to express the portion of the engineering specification consumed by the 99% confidence interval of measurement system repeatability and reproducibility error (5.15 standard deviations of R&R error)
PREVENTION
The practice of eliminating unwanted variation before the fact, e.g., predicting a future condition from a control chart and then applying corrective action before the predicted event transpires
PROBABILITY
The chance of an event happening or condition occurring by pure chance, stated in numerical form
PROBABILITY OF AN EVENT
The number of successful events divided by the total number of trials
PROBLEM
A deviation from a specified standard
PROBLEM-SOLVING
The process of solving problems; the isolation and control of those conditions which generate or facilitate the creation of undesirable symptoms
PROCESS
A particular method of doing something, generally involving a number of steps or operations
PROCESS AVERAGE
The central tendency of a given process characteristic across a given amount of time or at a specific point in time
PROCESS CONTROL
See STATISTICAL PROCESS CONTROL
PROCESS CONTROL CHART
Any of a number of various types of graphs upon which data are plotted against specific control limits
NONCONFORMITY
A condition within a unit which does not conform to some specific
specification, standard, and/or requirement; often referred to as a defect;
any given nonconforming unit can have the potential for more than one
nonconformity
PRIMARY CONTROL
VARIABLES
The major independent variables used in the experiment
PROCESS MAP
A detailed stepbystep pictorial sequence of a process showing process
inputs, potential or actual controllable and uncontrollable sources of
variation, process outputs, cycle time, rework operations, and inspection
points
PROCESS SPREAD
The range of values which a given process characteristic displays; this
particular term most often applies to the range but may also encompass the variance. The spread may be based on a set of data collected at a specific point in time or may reflect the variability across a given period of time
RANDOM CAUSE
A source of variation which is random, usually associated with the "trivial
many" process input variables, and which will not produce a highly
predictable change in the process output response (dependent variable),
e.g., a correlation does not exist; any source of variation results in a small
amount of variation in the response; cannot be economically eliminated
from a process; an inherent natural source of variation
PRODUCER'S RISK
Probability of rejecting a lot when, in fact, the lot should have been accepted (see ALPHA RISK)
PROJECT
A problem, usually calling for planned action
QUALITY FUNCTION DEPLOYMENT (QFD)
QFD is a disciplined matrix methodology used for documenting and transforming customer wants and needs – "the voice of the customer" – into operational "requirement" terms. It is an effective tool for determining critical-to-quality characteristics for transactional processes, services, and products
R CHART
Plot of the difference between the highest and lowest values in a sample; normally associated with the range control portion of an X-bar, R chart
RANDOM SAMPLE
Selecting a sample such that each item in the population has an equal chance of being selected
RANDOM VARIABLE
A variable which can assume any value from a distribution which represents a set of possible values
RANDOM VARIATIONS
Variations in data which result from causes which cannot be pinpointed or controlled
RANDOMNESS
A condition in which any individual event in a set of events has the same mathematical probability of occurrence as all other events within the specified set, i.e., individual events are not predictable even though they may collectively belong to a definable distribution; lack of predictability; without pattern
RANGE
The difference between the highest and lowest values in a "subgroup" sample
RANK
Values assigned to items in a sample to determine their relative occurrence in a population
RATIONAL SUBGROUP
A subgroup usually made up of consecutive pieces chosen from the process stream so that the variation represented within each subgroup is as small as feasible. Any changes, shifts, and drifts in the process will appear as differences between subgroups selected over time
REGRESSION
A statistical technique for determining the best mathematical expression that describes the functional relationship between one response and one or more independent variables
REJECTION REGION
The region of values for which the alternate hypothesis is accepted
REPLICATION
Repeat observations made under identical test conditions
REPRESENTATIVE SAMPLE
A sample which accurately reflects a specific condition or set of conditions within the universe
RESEARCH
Critical and exhaustive investigation or experimentation having for its aim the revision of accepted conclusions in the light of newly discovered facts
RESIDUAL ERROR
See EXPERIMENTAL ERROR
RESPONSE SURFACE METHODOLOGY (RSM)
A graphical (pictorial) analysis technique used in conjunction with DOE for determining optimum process parameter settings
ROBUST
The condition or state in which a response parameter exhibits a high degree of resistance to external causes of a non-random nature, i.e., impervious to perturbing influence
ROLLED THROUGHPUT YIELD (RTY)
The product (series multiplication) of all of the individual first pass yields of each step of the total process
ROOT SUM SQUARED (RSS)
Square root of the sum of the squares; a means of combining standard deviations from independent causes
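The rolled throughput yield and its normalised form, as defined in this glossary, reduce to a product and an nth root. A Python sketch with hypothetical step yields:

```python
def rolled_throughput_yield(step_yields):
    """RTY: the product of the individual first-pass yields of each step."""
    rty = 1.0
    for y in step_yields:
        rty *= y
    return rty

def normalized_rty(step_yields):
    """Normalised RTY (RTYN): the nth root of RTY, i.e. the average
    per-step yield across n process steps."""
    n = len(step_yields)
    return rolled_throughput_yield(step_yields) ** (1 / n)

# Hypothetical four-step process with first-pass yields per step:
yields = [0.98, 0.95, 0.99, 0.97]
rty = rolled_throughput_yield(yields)  # about 0.894
```

Even though every individual step yields 95% or better, the rolled yield of the whole process drops below 90%, which is why RTY rather than per-step yield is the honest process metric.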
SAMPLE
A portion of a population of data chosen to estimate some characteristic
about the whole population. One or more observations drawn from a larger collection of observations or universe (population)
SUBGROUP
A logical grouping of objects or events which displays only random eventtoevent variations, e.g., the objects or events are grouped to create
homogenous groups free of assignable or special causes. By virtue of
minimising within subgroup variability, any change in the central tendency
or variance of the universe will be reflected in the "subgrouptosubgroup"
variability
A predetermined sample of consecutive parts or other data bearing objects removed from the process for the purpose of data collection
SCATTER DIAGRAMS (PLOTS)
Charts which allow the study of correlation, e.g., the relationship between
two variables or data sets
SHORT RUN STATISTICAL PROCESS CONTROL
A statistical control charting technique which applies to any process
situation where there is insufficient frequency of subgroup data to use
traditional control charts (typically associated with low-volume
manufacturing or where set-ups occur frequently). Multiple part numbers
and multiple process streams can be plotted on a single chart
SIPOC
A high-level process map. Stands for Supplier-Inputs-Process-Outputs-Customer
SIX M'S
The major categories that contribute to effects on the fishbone diagram
(men & women, machine, material, methods, measurement, and Mother
Nature)
SIX SIGMA
A term coined by Motorola to express process capability in parts per
million. A Six Sigma process generates a maximum defect probability of
3.4 parts per million (PPM) when the amount of process shifts and drifts
is controlled over the long term to less than ±1.5 standard deviations
SKEWED DISTRIBUTION
A non-symmetrical distribution having a tail in either a positive or
negative direction
SPECIAL CAUSE
See ASSIGNABLE CAUSE
STABLE PROCESS
A process which is free of assignable causes, e.g., in statistical control
STANDARD DEVIATION
A statistical index of variability which describes the process spread or
width of a distribution
STATISTICAL CONTROL
A quantitative condition which describes a process that is free of
assignable/special causes of variation (both mean and standard
deviation). Such a condition is most often evidenced on a control chart,
i.e., a control chart which displays an absence of non-random variation
STATISTICAL PROCESS CONTROL (SPC)
The application of standardised statistical methods and procedures to a
process for control purposes
SYMPTOM
That which serves as evidence of something not fully understood in
factual terms
SYSTEM
That which is connected according to a scheme
SYSTEMATIC VARIABLES
A pattern which displays predictable tendencies
TEST OF SIGNIFICANCE
A statistical procedure used to determine whether or not a process
observation (data set) differs from a postulated value by an amount
greater than that due to random variation alone
THEORY
A plausible or scientifically acceptable general principle offered to
explain phenomena
TWO-SIDED ALTERNATIVE
The values of a parameter which designate both an upper and lower bound
TYPE I ERROR
See ALPHA RISK
TYPE II ERROR
See BETA RISK
UNNATURAL PATTERN
Any pattern in which a significant number of the measurements do not
group themselves around a central tendency. When the pattern is
unnatural, it means that non-random disturbances are present and are
affecting the process
UPPER CONTROL LIMIT
A horizontal line on a control chart (usually dotted) which represents
the upper limits of capability for a process operating with only random
variation
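The 3.4 PPM figure in the SIX SIGMA definition can be checked from the normal distribution: after allowing the process mean to shift by 1.5 standard deviations, the nearest specification limit sits 6 − 1.5 = 4.5 standard deviations away. A minimal check using only the standard library:

```python
from math import erfc, sqrt

def upper_tail(z: float) -> float:
    """P(Z > z) for a standard normal variable Z."""
    return 0.5 * erfc(z / sqrt(2))

# Six Sigma capability with a 1.5-sigma shift: defects occur beyond
# the limit that is 4.5 standard deviations from the shifted mean.
defect_probability = upper_tail(6.0 - 1.5)
print(f"{defect_probability * 1e6:.1f} defects per million")  # ~3.4 PPM
```

This is a one-sided calculation; the opposite tail at 7.5 sigma contributes a negligible amount.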
VARIABLE
A characteristic that may take on different values
VARIABLES DATA
Data collected from a process input or output where the measurement
scale has a significant level of subdivisions or resolution, e.g., ohms,
voltage, diameter, etc.
VARIATION
Any quantifiable difference between individual measurements; such
differences can be classified as being due to common causes (random) or
special causes (assignable)
VARIATION RESEARCH
Procedures, techniques, and methods used to isolate one type of
variation from another (for example, separating product variation from
test variation)
VOICE OF THE CUSTOMER
Data gathered from the customers that provides information about their
needs and requirements
X-BAR & R CHARTS
A control chart which is a representation of process capability over
time; displays the variability in the process average and range
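A minimal sketch of how X-bar and R chart control limits are computed from subgroup data, using the standard Shewhart constants for subgroups of five (all figures hypothetical):

```python
import statistics

# Hypothetical data: 6 subgroups of 5 consecutive measurements each.
subgroups = [
    [9.8, 10.1, 10.0, 9.9, 10.2],
    [10.0, 10.3, 9.7, 10.1, 9.9],
    [9.9, 10.0, 10.2, 10.1, 9.8],
    [10.1, 9.8, 10.0, 10.2, 10.0],
    [9.7, 10.0, 10.1, 9.9, 10.2],
    [10.0, 10.1, 9.9, 10.0, 10.1],
]

# Standard Shewhart control chart constants for subgroup size n = 5.
A2, D3, D4 = 0.577, 0.0, 2.114

xbars  = [statistics.mean(g) for g in subgroups]    # subgroup averages
ranges = [max(g) - min(g) for g in subgroups]       # subgroup ranges
grand_mean, rbar = statistics.mean(xbars), statistics.mean(ranges)

# Control limits: the process is judged in statistical control while
# subgroup averages and ranges stay inside these lines.
ucl_x, lcl_x = grand_mean + A2 * rbar, grand_mean - A2 * rbar
ucl_r, lcl_r = D4 * rbar, D3 * rbar
print(f"X-bar chart: LCL={lcl_x:.3f}, CL={grand_mean:.3f}, UCL={ucl_x:.3f}")
print(f"R chart:     LCL={lcl_r:.3f}, CL={rbar:.3f}, UCL={ucl_r:.3f}")
```

Minimising within-subgroup variation (see SUBGROUP) is what makes shifts in the process mean show up between subgroups on the X-bar chart.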
Appendix 2 – Six Sigma tool selector