Bootstrap Hints

lclifford

Presentation Transcript


  1. Bootstrap Hints

  2. Overview of Bootstrapping Hints • The objective of a good bootstrap model is to model the judges' intuitive judgments realistically while being even more accurate than the judges themselves • The measure of effectiveness in this area is R squared • Roughly, R squared is the percentage of variance explained by the model • These hints should help improve R squared
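
As a rough illustration of "percentage of variance explained", the sketch below computes R squared for a set of model predictions against the evaluators' scores. The function name and the sample numbers are hypothetical, not taken from the presentation.

```python
def r_squared(actual, predicted):
    """Fraction of the variance in `actual` that `predicted` explains."""
    mean_actual = sum(actual) / len(actual)
    # Total variation of the evaluators' scores around their mean.
    ss_total = sum((a - mean_actual) ** 2 for a in actual)
    # Variation the model fails to explain.
    ss_residual = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    return 1.0 - ss_residual / ss_total


# Hypothetical data: averaged evaluator scores and the model's predictions.
evaluator_scores = [0.2, 0.4, 0.6, 0.8]
model_scores = [0.25, 0.35, 0.65, 0.75]
print(round(r_squared(evaluator_scores, model_scores), 3))
```

A value near 1 means the model reproduces the evaluators' judgments closely; a value near 0 means it explains little of their variation.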

  3. Strategies for Improving R Squared • Hints for choosing the right variables • Hints for improving data gathering • Hints for improving quantification • Hints for finding higher-order variables

  4. Hints for Choosing Variables • For some commonly bootstrapped variables, such as Confidence Index and Cancellation Probability, these input variables may be considered: • Project cost and/or duration • Whether it is a compliance project and/or a documented strategic requirement • The scope of the business covered (e.g., number of departments involved, number of users) • Sponsor characteristics such as level, whether the sponsor is business or IT, or the sponsor's track record on past projects • Whether the investment is new software development, package modification, an upgrade to a previous system, hardware only, etc. • Technology risk, such as the technology's track record, IT's familiarity with it, and its maturity • Watch how many variables are added: much more than 8 variables starts to become unproductive and may degrade the accuracy of the model, so stick to the important ones

  5. Data Gathering Hints • You will probably always get a higher R squared when averaging across larger groups • Be sure to allow time for calibration • Use a trial bootstrap list that the evaluators discuss as a group • The group can check results with “pair-wise comparisons”: they pick pairs of investments at random, determine which of each pair they would prefer, then confirm that the evaluators' scores reflect this
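
The pair-wise comparison check can be sketched as follows. This is a minimal illustration under assumed inputs: the `scores` dictionary, the `prefers` callback, and the sample investments are all hypothetical stand-ins for whatever the group actually records.

```python
import random


def pairwise_agreement(scores, prefers, n_pairs=20, seed=0):
    """Share of randomly drawn pairs where the higher score matches the stated preference.

    scores  : dict mapping investment name -> evaluator score
    prefers : function (a, b) -> name of the investment the group says it prefers
    """
    rng = random.Random(seed)
    names = sorted(scores)
    agree = 0
    for _ in range(n_pairs):
        a, b = rng.sample(names, 2)  # two distinct investments at random
        higher = a if scores[a] >= scores[b] else b
        if prefers(a, b) == higher:
            agree += 1
    return agree / n_pairs


# Hypothetical scores and a stated ranking (most preferred first).
scores = {"A": 0.9, "B": 0.6, "C": 0.3}
ranking = ["A", "B", "C"]
stated = lambda a, b: min(a, b, key=ranking.index)
print(pairwise_agreement(scores, stated))
```

An agreement share well below 1 would suggest the scores do not yet reflect the group's preferences and that more calibration is needed.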

  6. Hints for Quantifying Variables • Regression assumes that every variable has a basically linear relationship to the output • Reviewing each variable for non-linearity and finding a way to make it linear will improve R squared • Variables that can be captured as 0 or 1 (binary) need no review • Continuous variables need to be graphed to check for non-linearity • Discrete variables that are not binary require pivot table analysis (see pivot table procedure for details)

  7. Continuous Variables • One way to improve R squared is to convert your non-linear variables into linear variables • To check which variables are non-linear, make an XY graph with the continuous variable on the X axis and the bootstrapped variable (from the evaluators) on the Y axis • If you find an obviously non-linear relationship, you can change the variable so that it becomes linear • Depending on how the graph looks, you can take the appropriate steps

  8. Linear • This is an obviously linear relationship; leave it as it is

  9. Scattered Distribution • If the XY plot is not obviously non-linear, then just leave it as it is • If the Excel regression output indicates that this variable has little or no effect, consider removing it

  10. Clustered Distribution • Here, a “threshold” would be the best quantification of this variable • Instead of being linear, this variable appears to make a difference only when it is above or below a certain value (in this case, about 6% on the horizontal scale) • Try converting the continuous variable to a binary variable. In this case you would use “=if(x<.06, 1, 0)”
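
For readers working outside Excel, the threshold conversion can be sketched as below. The function name and the sample values are illustrative; only the 6% cutoff and the direction of the comparison come from the slide's formula.

```python
def threshold_flag(x, cutoff=0.06):
    """Mirror of the spreadsheet formula =IF(x<.06, 1, 0)."""
    return 1 if x < cutoff else 0


# Hypothetical values of the continuous variable.
values = [0.02, 0.05, 0.06, 0.10]
flags = [threshold_flag(v) for v in values]
print(flags)  # [1, 1, 0, 0]
```

The binary column then replaces the continuous one in the regression.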

  11. Upward Sloping • [Chart: vertical axis 0% to 90%, horizontal axis 0 to 20] • If the graph slopes upward, then you might try setting the scale of the X axis to “logarithmic” • If this makes it look linear then use the formula “=log(X)” • If that doesn’t work try “=X^.5” or some other power of X less than 1

  12. Leveling Off • [Chart: vertical axis 0% to 90%, horizontal axis 0% to 300%] • Try setting the scale of the Y axis to “logarithmic” • If this makes it look linear then use “=exp(X)” • If it doesn’t work, try “=X^2” or some other power of X

  13. Downward Sloping • [Chart: vertical axis 0% to 90%, horizontal axis 0% to 300%] • Try setting the scale of the Y axis to “logarithmic” • If this makes it look linear then use “=exp(X)” • If it doesn’t work, try “=1/X”
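
Instead of judging each transformation by eye, one could compare candidates numerically. The sketch below is an assumption-laden illustration rather than the presenters' procedure: it applies each formula mentioned on the preceding slides to a positive-valued X, fits a one-variable line, and ranks the results by R squared. The helper names and sample data are hypothetical.

```python
import math


def linear_r_squared(xs, ys):
    """R squared of a one-variable least-squares line through (xs, ys)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return (sxy * sxy) / (sxx * syy)


# Candidate transformations named on the slides. X must be positive for
# log(X) and 1/X, and small enough that exp(X) does not overflow.
TRANSFORMS = {
    "X": lambda x: x,
    "log(X)": math.log10,  # Excel's LOG defaults to base 10
    "X^.5": math.sqrt,
    "exp(X)": math.exp,
    "X^2": lambda x: x * x,
    "1/X": lambda x: 1.0 / x,
}


def rank_transforms(xs, ys):
    """Return (name, R squared) pairs, best fit first."""
    results = {
        name: linear_r_squared([f(x) for x in xs], ys)
        for name, f in TRANSFORMS.items()
    }
    return sorted(results.items(), key=lambda item: item[1], reverse=True)


# Hypothetical data where the output rises with the logarithm of X.
xs = [1, 2, 4, 8, 16]
ys = [0.1, 0.2, 0.3, 0.4, 0.5]
for name, r2 in rank_transforms(xs, ys):
    print(f"{name:7s} {r2:.3f}")
```

The ranking is only a screening aid; the XY graph remains the check on whether the transformed variable really looks linear.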

  14. Hints for Higher-Order Terms • After your first attempt at a regression, you may improve your R squared by adding some “higher-order” variables • Higher-order variables include products of other variables, conditional statements involving other variables, etc. • To find potential candidates for higher-order terms, ask yourself whether the importance of some variables depends on the values of other variables • Try several new terms and plot each one. If a term shows an obvious linear relationship, then add it • If you make a higher-order variable, run a new regression, and the R squared is higher, it was probably a good choice

  15. Continuous Higher-Order Terms • If the importance of one variable depends on the value of another, and they are both continuous, try the following (we’ll call these two variables X and Y) • If the bootstrapped variable should increase when both X and Y are high (or when both are low) then try “=X*Y” • If the bootstrapped variable should increase when one variable is high and the other is low then try “=X/Y” • If X is especially important when Y is over/under a certain value N then try “=if(Y>N, X, 0)”
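
The three candidate terms above translate directly into small functions. This is a sketch with hypothetical function names and sample values; each result would become a new column in the regression data.

```python
def product_term(x, y):
    """Mirror of =X*Y."""
    return x * y


def ratio_term(x, y):
    """Mirror of =X/Y. Y must be non-zero."""
    return x / y


def conditional_term(x, y, n):
    """Mirror of =IF(Y>N, X, 0): X only counts when Y exceeds the cutoff N."""
    return x if y > n else 0


# Hypothetical rows of (X, Y) with an assumed cutoff N = 5.
rows = [(7, 10), (7, 3), (2, 8)]
for x, y in rows:
    print(product_term(x, y), ratio_term(x, y), conditional_term(x, y, 5))
```

For the "under a certain value" case, the comparison in `conditional_term` would be reversed.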

  16. Discrete Higher-Order Terms • You might try a pivot table that compares the average bootstrapped output variable across combinations of the two variables: put one variable in the columns of the pivot table and the other in the rows • You can then try a nested IF statement that allows you to put a separate discrete value on each combination of the two variables • For example, suppose you found a compounding relationship between “strategic” (Y) and “multiple departments” (X) • You might try “=if(X=1,if(Y=1,.41,.11),.5)” • [Pivot table of averages: Strategic = 1: .41 when Multiple Departments = 1, .51 when Multiple Departments = 0; Strategic = 0: .11 when Multiple Departments = 1, .49 when Multiple Departments = 0] • The two Multiple Departments = 0 values (.51 and .49) are not significantly different, so you can average them and use the same value
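
Outside a spreadsheet, the pivot-table averages and the nested IF can be sketched as follows. The row data here are hypothetical and chosen only so that the averages come out to the figures quoted on the slide; the function names are illustrative.

```python
from collections import defaultdict


def pivot_average(rows):
    """Average output for each (strategic, multiple_departments) combination."""
    totals = defaultdict(lambda: [0.0, 0])  # key -> [running sum, count]
    for strategic, multi_dept, output in rows:
        cell = totals[(strategic, multi_dept)]
        cell[0] += output
        cell[1] += 1
    return {key: total / count for key, (total, count) in totals.items()}


def combined_term(multi_dept, strategic):
    """Mirror of =IF(X=1, IF(Y=1, .41, .11), .5), X = multi_dept, Y = strategic."""
    if multi_dept == 1:
        return 0.41 if strategic == 1 else 0.11
    return 0.5  # average of the two similar cells, .51 and .49


# Hypothetical rows of (strategic, multiple_departments, bootstrapped output).
rows = [
    (1, 1, 0.40), (1, 1, 0.42),
    (1, 0, 0.51),
    (0, 1, 0.11),
    (0, 0, 0.49),
]
for key, avg in sorted(pivot_average(rows).items()):
    print(key, round(avg, 2))
```

Cells whose averages are close can share one value, which keeps the nested IF short.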

  17. Improvements Due to Bootstrap • This chart shows the percentage reduction in error of bootstrapped estimates relative to intuitive estimates • Results vary depending on how objective and systematic the model was, like ours • [Bar chart, horizontal axis 0% to 40%, one bar each for: Mean across many studies; Business failures using financial ratios; Psychology course grades; IQ scores using Rorschach tests; Student ratings of teaching effectiveness; Mental illness using personality tests; Changes in stock prices; Graduate students' grades; Life-insurance sales rep performance; Cancer patient life expectancy]

  18. Actual Classification Plots • An Illinois insurance company created a classification chart to help prioritize its current list of proposed investments • They wanted to determine which investments could be accepted without more analysis and which needed more analysis • 18 investments were plotted on the classification chart • The results had a profound effect on investment priorities • Some investments that were assumed to be beneficial now required analysis, and some that had required analysis could now be approved immediately

  19. Classification of Example Projects • Accept without Further Analysis: 5. Lucent switch upgrade; 7. Image Server Relocation; 17. Enterprise IntraNet to all sites • Do Abbreviated Risk-Return Analysis: 6. DLSW Router Network Redesign; 9. Extended Hours; 18. Doc. Access Strategy • Do Full Risk-Return Analysis: 8. Pearl Indicator and Pearl I/O interface; 11. Richardson Data Center Consolidation; 15. MVS DB2 Tools • Success Factor Adjustments: 4. Network OS migration to Novell 5.x; 10. Optimize Single Code Base • Reject; Consider Other Options: 1. Data Strategy; 2. Enterprise Security Strategy; 3. Remote Server Redundancy; 12. MQ Series: Base; 13. Development Environment 2000 (mf); 14. “Source Control” Source Code Mgmt; 16. Enterprise InterNet • [Chart: the 18 numbered investments plotted by Confidence Index (vertical axis, 0.3 to 1) against Expected Investment Size in $000 (horizontal axis, 10 to 10,000); one region is labeled “No Classification Needed”]

  20. Bootstrapping Deliverables • Final presentation including: • An XY chart showing the correlation of the original estimates to the bootstrap model • Any “solution space” that was developed, such as classification charts • A worksheet for input of various values, which uses the bootstrap model to estimate some output variable(s) • Any customization to RAVI documentation for that client for proper use of the worksheets and solution spaces • Any recommendations based on the bootstrap
