
Performance Engineering



  1. Performance Engineering MEASUREMENT AND STATISTICS Prof. Jerry Breecher

  2. Measurement and Statistics
In order to get you in the mood for doing some measuring, statistics, and estimating, here are some quotations with the right flavour:
"Figures don't lie, but liars figure." - Mark Twain
"There are three kinds of untruths: lies, damn lies, and statistics." - Mark Twain
The following are from "Policy Paradox and Political Reason" by Deborah Stone:
"Numerals hide all the difficult choices that go into a measurement."
"Certain kinds of numbers, big ones, numbers with decimal points, ones not multiples of ten, seemingly advertise the prowess of the measurer."
"How accurate a number is depends on the cost of acquiring it and on how important it is."

  3. Measurement and Statistics "Numbers are a form of poetry. Symbols are another." "No number is innocent, for it is impossible to count without making categorization." "Every number is a political statement about where to draw the line." "The first number you measure becomes the status quo."

  4. Measurement and Statistics
Purpose: This section is about the methodology of measurement: what goes into designing an experiment, gathering some numbers, interpreting the results, and presenting those results to management in a way that allows them to make the necessary decisions.
Warm-up Experiment: Divide into teams and measure the length of an object in the classroom. To do so you will need to make team decisions about tools, techniques, and reporting metrics. Upon completion, discuss what can be learned from this experiment.

  5. Measurement and Statistics
FUNDAMENTAL QUESTIONS ABOUT MEASUREMENT:
1. What kind of accuracy can you expect from a computer (or any other) measurement?
2. When you make a measurement, can you believe the result? How sure are you of the result?
3. How should you state the result of an experiment? How do you reflect your belief in its accuracy?
4. Can one number represent the performance of a product?
5. When have you measured enough?
6. Figures don't lie, but liars figure. How do you extrapolate from what you know to what you'd like to know?

  6. Measurement and Statistics
FUNDAMENTAL QUESTIONS ABOUT MEASUREMENT:
7. How do you know what tools to use?
8. Is everything in a computer measurable?
9. How do you know what to measure?
10. Should you always know the result of a measurement before you make it?
11. How do you figure out dependencies; how does one variable depend on another?
12. So after all this talk about the details of measurement, how do you actually design an experiment?

  7. Measurement and Statistics
1. What kind of accuracy can you expect from a computer (or any other) measurement?
Associated questions are:
• What are some sources of uncertainty when measuring a computer and its software?
• Is a computer deterministic? (What is the meaning of deterministic? Do a detour on predictable, deterministic, stochastic, and chaotic.)
• What are the pros and cons of taking all the variation out of an environment? Repeatability vs. believability.
Here are some factors that lead to experimental variation:
• System/Component/Molecule/Atom – how granular is the measurement?
• Background activity.
• End effects and incomplete cycle effects. Measurement error.
• Randomness doesn't mean equality (stochastic process). Example: travelling around a Monopoly board.
• Randomness from resource contention (stochastic process). Example: six processes do nothing but read randomly from a single disk. Do they each make approximately the same number of accesses after 1 second? 1 minute? 1 hour? 1 day? (See the simulation sketch below.)
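
To get a feel for the last bullet, here is a minimal Python sketch (the parameters and helper name are purely illustrative; it ignores real seek and rotation behaviour and simply hands the disk to one randomly chosen waiting process per service slot):

```python
import random
from collections import Counter

def simulate(num_procs=6, num_services=1000, seed=1):
    """Hand a single disk to one randomly chosen process per service slot
    and count how many accesses each process completes."""
    random.seed(seed)
    counts = Counter()
    for _ in range(num_services):
        winner = random.randrange(num_procs)  # contention resolved at random
        counts[winner] += 1
    return counts

# Short run vs. long run: the spread between the luckiest and unluckiest
# process shrinks relative to the mean as the run gets longer, but the
# absolute difference in access counts typically keeps growing.
for n in (60, 6_000, 600_000):
    c = simulate(num_services=n)
    print(n, min(c.values()), max(c.values()))
```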

  8. Measurement and Statistics
1. What kind of accuracy can you expect from a computer (or any other) measurement?
Here are some factors that lead to experimental variation (continued):
• Changing hardware. Example: variations in fullness of a disk, CPU boards, interrupt traffic.
• Tool granularity.
Example: our experiment in class.
Example: you write a program that measures time in seconds. What percentage accuracy can you get from your experiment?
Example: you want to measure the time required to execute a routine and have available a system call named get_time_of_day. get_time_of_day returns time in units of 1/65535 seconds ≈ 16 microseconds. The time required to execute the get_time_of_day routine itself is 100 microseconds. What is the shortest routine that can be measured with this tool? How would you do it? (One common trick is sketched below.)
Bottom Line: Never believe a real system number to better than 5 - 10%. Artificial numbers can sometimes be repeated to 1 - 2%, but are susceptible to spurious factors.
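
One common way around coarse timer granularity and the cost of the clock call itself is to time many iterations of the routine in a loop and divide. A minimal Python sketch (the helper names and iteration count are just for illustration; time.time() stands in for the get_time_of_day call from the example):

```python
import time

def per_call_time(routine, iterations=100_000):
    """Estimate the per-call time of a short routine with a coarse clock.
    The clock-read overhead and its granularity are amortized over all
    iterations, so routines far shorter than one clock tick can be measured."""
    start = time.time()
    for _ in range(iterations):
        routine()
    elapsed = time.time() - start
    return elapsed / iterations

def tiny_routine():
    return sum(range(10))

print(f"~{per_call_time(tiny_routine) * 1e6:.2f} microseconds per call")
```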

  9. Measurement and Statistics
2. When you make a measurement, can you believe the result? How sure are you of the result?
Suppose you make several determinations of some measure. If you can answer yes to the following questions, then you can have some faith in your measurement:
• Can you explain why the numbers vary? (“Handwaving” isn't allowed here, but “statistics” may be a valid answer.)
• If variations are greater than 10%, can you figure out what's causing the variation, and could you eliminate it if time allowed?
• If the granularity of your tool is greater than the measurement variations, is that acceptable? (Your granularity then becomes your uncertainty.)
But how much do you trust it? To answer this we need a brief digression into some math. Suppose we've taken a number of measurements x1, x2, ..., xn.

  10. Measurement and Statistics
2. When you make a measurement, can you believe the result? How sure are you of the result?
Then the mean and standard deviation are:
mean = ( x1 + x2 + ... + xn ) / n
SD = SQRT( SUM( (xi - mean)² ) / n )          (first form)
SD = SQRT( SUM( (xi - mean)² ) / (n - 1) )    (second form)
s² = variance = SD²
The first form of the Standard Deviation is the form of the underlying data. The second form is that of the measured data. They are the same for an infinite amount of data and close enough for a large set of numbers.
NOTE: Use of these equations assumes that the measurements are independent of each other.
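
A minimal standard-library Python sketch of the "measured data" form of these formulas (the sample values are the ones labelled "f" in the table a few slides ahead):

```python
import math

def mean_and_sd(samples):
    """Sample mean and sample standard deviation
    (n - 1 in the denominator, i.e. the 'measured data' form)."""
    n = len(samples)
    mean = sum(samples) / n
    variance = sum((x - mean) ** 2 for x in samples) / (n - 1)
    return mean, math.sqrt(variance)

print(mean_and_sd([14.40, 14.29, 14.43, 14.36, 14.44]))  # about (14.38, 0.061)
```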

  11. Measurement and Statistics
2. When you make a measurement, can you believe the result? How sure are you of the result?
Confidence Intervals: We'd like to say “I'm p% sure that with n samples the actual value is within d of the mean of the measurements.” In this section, we develop simple ways to be able to make that statement.
Example of Standard Deviations using Normal Distributions: By quoting the standard deviation of a measurement, we say we're 68% sure the true mean is within a standard deviation of the measured mean. Unfortunately, that 68% depends on having a large number of samples. For smaller numbers, the percentage will change.
(Figure: normal distribution showing mean and variance.)

  12. Measurement and Statistics
2. When you make a measurement, can you believe the result? How sure are you of the result?
Distributions: Student-T. Both the normal and Student-T distributions represent how random data should be found. The difference lies in how many samples are taken; the Normal Distribution assumes a very large (like infinite) number of samples, while the Student-T is for n (less than infinite) samples. As you see in the examples on subsequent pages, n is used as part of the confidence calculation.
(Figure: t distribution showing dependence only on the number of samples.)
The derivation of the t-distribution was first published in 1908 by William Sealy Gosset, while he worked at a Guinness brewery in Dublin. He was not allowed to publish under his own name, so the paper was written under the pseudonym "Student". The t-test and the associated theory became well known through the work of R. A. Fisher, who called the distribution "Student's distribution".

  13. Measurement and Statistics
2. When you make a measurement, can you believe the result? How sure are you of the result?
The Burns Co. is now making laptop computers in its Shelbyville plant. Mr. Burns is too cheap to wreck too many computers in a test, so he's letting his QA guru, Homer, smash five of them. Homer is to record from how high in the air he can drop each laptop on the floor before it won't work anymore. Mr. Burns wants laptops that can survive a fall from his height of five feet, two inches. The t-test will tell us if we can accept that the average breaking point for a Burns laptop is greater than 5'2", given what we know about the sample.
Let's say the five computers broke at drops of:
• 4 feet, 8 inches
• 5 feet, 1 inch
• 2 feet, 3 inches
• 6 feet, 10 inches
• 7 feet, 1 inch

  14. Measurement and Statistics
2. When you make a measurement, can you believe the result? How sure are you of the result?
Using the formula:
t = ( (avg. of sample) - (presumed avg. of larger pop.) ) / ( (st. dev. of sample) / (sq. root of sample size) )
• we get an average breaking height of 62.2 inches, a standard deviation of 23.4, and a t-score of 0.0191.
• Let's go to the t-score table. There we find the t-value for four degrees of freedom and a 90-percent confidence interval (that's p = .05, since taking .05 off each side of the bell curve leaves us with .90 in the middle). That value is 2.13.
• Since the value we calculated is less than the table's t-value, we cannot conclude that Burns laptops as a whole have an average breaking drop of over 62 inches, even though our sample's average came in (just) over that.
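
The same arithmetic can be checked in a few lines of standard-library Python (heights converted to inches; this is a sketch of the calculation, not a full hypothesis-testing framework):

```python
import math
from statistics import mean, stdev

drops = [56, 61, 27, 82, 85]   # breaking heights in inches (4'8", 5'1", 2'3", 6'10", 7'1")
target = 62                    # Mr. Burns' height: 5 feet 2 inches

m, sd, n = mean(drops), stdev(drops), len(drops)
t = (m - target) / (sd / math.sqrt(n))
print(m, sd, t)                # roughly 62.2, 23.4, 0.019
# Compare t against the table value for n-1 = 4 degrees of freedom (about 2.13);
# 0.019 < 2.13, so the sample does not let us conclude the true mean exceeds 62 inches.
```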

  15. Measurement and Statistics
2. When you make a measurement, can you believe the result? How sure are you of the result?
Example – Use of Student-T: As part of our ongoing regression test package, monitoring the performance of PRODUCT X, we run tests that tickle a number of code paths. In this table, higher numbers are better - they represent the number of transactions completed – they are throughput.

RESULTS
Model -->               110    120    130      140      150      160
Product X, Version A    3.25   6.34   9.37(a)  11.8(b)  14.3     16.6(c)
Product X, Version B    3.20   6.30   9.22(d)  11.8(e)  14.4(f)  16.8

Here are the raw numbers which went into making up the averages indicated above:
 (a)    (b)     (c)     (d)    (e)     (f)
 9.36   11.76   16.59   9.21   11.83   14.40
 9.37   11.80   16.59   9.22   11.82   14.29
 9.38   11.79   16.58   9.20   11.85   14.43
 9.35   11.77   16.63   9.22   11.82   14.36
 9.38   11.85   16.66   9.23   11.88   14.44

  16. Measurement and Statistics
2. When you make a measurement, can you believe the result? How sure are you of the result?
Example – Use of Student-T: Let's work through in detail the numbers in "(f)".
We find the mean = ( 14.40 + 14.29 + 14.43 + 14.36 + 14.44 ) / 5 = 14.38
SD = SQRT( ( .02² + .09² + .05² + .02² + .06² ) / 4 ) = SQRT( 0.00375 ) = 0.061
s² = variance = SD² = 0.00375
Suppose we want to find the confidence interval for 95% confidence. With 5 samples, we have n - 1 = 4 degrees of freedom. Read the table for t(0.975) (there's 2.5% UNconfidence on each side of the curve), giving 2.78.
d = t * SQRT( s² / n ) = 2.78 * SQRT( 0.00375 / 5 ) = 2.78 * 0.027 = 0.075
The number is 14.38 +- 0.075 with 95% confidence. (How should you round off this number to accurately reflect your confidence?)
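
The same confidence-interval arithmetic as a standard-library Python sketch (the t value, 2.78 here, still has to come from a table or a statistics package):

```python
import math
from statistics import mean, stdev

def ci_half_width(samples, t_critical):
    """Half-width of a confidence interval around the sample mean:
    d = t * SD / sqrt(n), which is the same as t * sqrt(variance / n)."""
    return t_critical * stdev(samples) / math.sqrt(len(samples))

data_f = [14.40, 14.29, 14.43, 14.36, 14.44]
d = ci_half_width(data_f, t_critical=2.78)     # 4 degrees of freedom, 95% confidence
print(f"{mean(data_f):.2f} +/- {d:.3f}")        # about 14.38 +/- 0.076
```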

  17. Measurement and Statistics Example of Normal Distribution: Suppose we’ve been making measurements as shown in the first column in the Table below. By inserting those numbers in Excel, the spreadsheet will calculate all kinds of things for us automatically.

  18. Measurement and Statistics
2. When you make a measurement, can you believe the result? How sure are you of the result?
COMPARING TWO SETS OF MEASUREMENTS: You’ve just measured the performance of the latest release of your product. The numbers are better than they were when you measured them on the last release. But what does “better” mean? How do you show that two sets of numbers, with lots of uncertainty in each of the sets, really have one set better than the other?
First of all, here’s the easy way. With your two sets, calculate their means and their confidence intervals (the % confidence you use is up to you). Visually plot these results; they will fall into one of three cases:
A. The confidence intervals don’t overlap: the results are different from each other.
B. The mean of one set is within the confidence interval of the other set: the two sets are NOT different.
C. The confidence intervals overlap, but the means are not inside the CI of the other set: you need to do a more complex test.

  19. Measurement and Statistics
2. When you make a measurement, can you believe the result? How sure are you of the result?
COMPARING TWO SETS OF MEASUREMENTS: In essence this is a way to combine the confidences for the two data sets so as to determine the confidence in the difference between the two sets. This is called a t-test. Excel can do a t-test, as shown in the data below:

Data Set 1   Data Set 2
  5.36         19.12
 16.57          3.52
  0.62          3.38
  1.41          2.50
  0.64          3.60
  7.26          1.74

  5.31          5.64   <-- Average              =AVERAGE(A3:A8)
  6.16          6.64   <-- Standard Deviation   =STDEV(A3:A8)

  0.465703  <-- Result of the t-test says there is a 46% chance these are from the same distribution   =TTEST(A3:A8,B3:B8,1,1)

So for these sets of data, the answer is inconclusive. We can’t tell if there’s a significant difference between the data sets.
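
If you would rather do this in code than in a spreadsheet, SciPy reproduces the same paired, one-tailed result (a sketch; it assumes the scipy package is available):

```python
from scipy import stats

set1 = [5.36, 16.57, 0.62, 1.41, 0.64, 7.26]
set2 = [19.12, 3.52, 3.38, 2.50, 3.60, 1.74]

# Paired t-test, mirroring Excel's =TTEST(A3:A8, B3:B8, 1, 1);
# ttest_rel returns the two-tailed p-value, so halve it for one tail.
t_stat, p_two_tailed = stats.ttest_rel(set1, set2)
print(p_two_tailed / 2)   # about 0.466 - inconclusive, as the slide says
```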

  20. Measurement and Statistics
2. When you make a measurement, can you believe the result? How sure are you of the result?
CHECKING A SERIES OF VALUES: We'd like to know if a series of values matches a predicted distribution. In other words, we have a theory of what an experiment should give - do the results in fact match the theory? Chi-squared tables are available for this purpose.
Calculate chi-squared = SUM over all categories of ( (O - E)² / E ), where O = Observed count and E = Expected count.

  21. Measurement and Statistics
2. When you make a measurement, can you believe the result? How sure are you of the result?
CHECKING A SERIES OF VALUES: Example: Suppose a random number generator is invoked 200 times and produces the values shown in this table (for a uniform generator the expected count in each range is 20):

Range       Number of Values
0.0 - 0.1        23
0.1 - 0.2        22
0.2 - 0.3        19
0.3 - 0.4        15
0.4 - 0.5        22
0.5 - 0.6        21
0.6 - 0.7        20
0.7 - 0.8        16
0.8 - 0.9        21
0.9 - 1.0        21

Plugging this into the equation gives chi-squared = 3.1. There are nine degrees of freedom. In a chi-squared table, look along the 9-degree row and find that 3.1 is between 3.325 (0.050) and 2.700 (0.025) - interpolated as approximately 0.040. We can reject the hypothesis that the results are the same with a probability of about 4%. Conversely, we can be 96% sure the distribution is uniform.
Exercise: Do this same calculation using the chi-squared function in Excel.
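
The exercise asks for Excel, but the same calculation can also be sketched with SciPy (assumes scipy is installed):

```python
from scipy import stats

observed = [23, 22, 19, 15, 22, 21, 20, 16, 21, 21]   # 200 invocations, 10 ranges
expected = [20] * 10                                   # uniform hypothesis

chi2, p = stats.chisquare(observed, f_exp=expected)
print(chi2, p)   # chi-squared = 3.1; p is about 0.96, consistent with a uniform generator
```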

  22. Measurement and Statistics
3. How should you state the result of an experiment? How do you reflect your belief in its accuracy?
Pat has developed a new product, "rabbit", about which she wishes to determine performance. There is special interest in comparing the new product, rabbit, to the old product, turtle, since the product was rewritten for performance reasons. (Pat had used Performance Engineering techniques and thus knew that rabbit was "about twice as fast" as turtle.) The measurements showed:

Performance Comparisons
Product   Transactions/second   Seconds/transaction   Seconds to process transaction
Turtle           30                   0.0333                      3
Rabbit           60                   0.0166                      1

Which of the following statements reflect the performance comparison of rabbit and turtle?
o Rabbit is 100% faster than turtle.
o Rabbit is twice as fast as turtle.
o Rabbit takes 1/2 as long as turtle.
o Rabbit takes 1/3 as long as turtle.
o Rabbit takes 100% less time than turtle.
o Rabbit takes 200% less time than turtle.
o Turtle is 50% as fast as rabbit.
o Turtle is 50% slower than rabbit.
o Turtle takes 200% longer than rabbit.
o Turtle takes 300% longer than rabbit.
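
Working only from the transactions-per-second column, the ratios behind these statements can be checked with a little arithmetic (a sketch; it deliberately leaves the classroom question of which phrasings are fair for you to decide):

```python
turtle_tps, rabbit_tps = 30, 60                 # transactions per second

speed_ratio = rabbit_tps / turtle_tps            # 2.0  -> "twice as fast", "100% faster"
rabbit_time_fraction = turtle_tps / rabbit_tps   # 0.5  -> rabbit takes 1/2 as long
turtle_extra_time_pct = (rabbit_tps / turtle_tps - 1) * 100   # 100% -> turtle takes 100% longer

print(speed_ratio, rabbit_time_fraction, turtle_extra_time_pct)
```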

  23. Measurement and Statistics
3. How should you state the result of an experiment? How do you reflect your belief in its accuracy?
• The guiding principle in stating a result is to keep it simple.
• State the accuracy using the same methods we've just discussed: use means, standard deviations, and confidence intervals.
• Include the number of decimal places that reflects the accuracy of your answer. Avoid things like 7.365 with a standard deviation of 2.
• It goes without saying that reflecting your belief in the accuracy presupposes you’ve done the experiment correctly. Some simple guidelines:
• In my experience, you always do the experiment wrong the first five times. Through experience you learn to look critically at your result to see if it makes sense. If not, then you go figure out what went wrong. Usually it’s some parameter that wasn’t controlled.
• Only vary one parameter at a time.
• Watch out for interactions between parameters: changing one parameter may cause some other parameter to change as well.
• Don’t do too many or too few experiments.
• Get someone else to check your results – by the time you finish a measurement you have too much invested in it and are very likely to miss something obvious.

  24. Measurement and Statistics
4. Can one number represent the performance of a product?
Answer: No, but you'll be asked to do it anyway.
Preparation for this section – some definitions:
Mean or Expected Value: the average of the distribution; for measured data, the sum of the values divided by their number.
Median: that value for which there’s an equal probability of being above it and below it.
Mode: the most likely value; the value with the highest probability.
(Figure: a distribution with the mode, median, and mean marked.)

  25. Measurement and Statistics
4. Can one number represent the performance of a product?
Example: The Performance Group at the XYZ Corporation has developed a synthetic workload that they feel reflects the kind of computer work done by XYZ's "typical" customer. This workload is composed of various programs driven by a remote terminal emulator (RTE). The RTE can both initiate programs and log when the programs complete. This workload was run last week with results shown in the table:

Results of XYZ Corp Performance Benchmark
Transaction Type              Time to complete transaction
Edit a file                        14 sec
Compile and link a file           143 sec
Run compiled program               17 sec
200 disk reads                      6 sec
1000 process reschedules            3 sec
100 physical page faults           10 sec
Send and receive mail              57 sec
TOTAL TIME                        250 sec

NOTE: Because all these programs are started simultaneously, there is contention for resources.
The time reported to management was 250 seconds.

  26. Measurement and Statistics
4. Can one number represent the performance of a product?
Example questions:
• Is this a good performance indicator?
• If yes, then sit and relax a few minutes.
• If no, how would you express the results of these tests? How might you revamp the tests?
• What guidelines can be derived for producing one-number performance metrics?

  27. Measurement and Statistics
5. When have you measured enough?
This is really two questions:
• When have you measured enough to get the accuracy of answer that management expects at this time?
This is a matter of setting the correct expectations before you start. Many times the answer is in response to a “what if” question – you can get the appropriate accuracy in one hour. Other times you’ll need weeks of design/setup/measurement/analysis to get the expected accuracy.
NOTE: Only a small amount of the total experimental time is in the measurement. Most time goes for design and elimination of unwanted factors. So this question could be stated as “How complicated should an experiment be?”
• When have you measured enough to get the degree of accuracy you expected for the experiment?
You can use the confidence measures we discussed before. In essence, confidence grows with the number of samples; the next slide quantifies how many samples you need.

  28. Measurement and Statistics
5. When have you measured enough?
The relationship between the number of required samples and the experimental parameters is:
n = ( (100 * z * s) / (r * x_mean) )²
Here
n = number of samples required
z = the number of standard deviations (normal deviate) for the desired confidence
s = standard deviation
r = the desired accuracy, in percent of the mean
x_mean = the mean of the measurement
NOTE: The more accuracy you want (smaller r), and the more variation in your data (larger s), the more measurements you need. (A small sketch of this formula follows.)
NOTE: If your numbers all come out the same, stop. Measurement uncertainty is not the largest part of the error in your metric.
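
A minimal Python sketch of this sample-size rule (the pilot-run numbers below are hypothetical; z = 1.96 corresponds to 95% confidence for a normal distribution):

```python
import math

def samples_needed(z, sd, sample_mean, r_percent):
    """n = (100 * z * s / (r * mean))^2 : samples required to pin the mean
    down to within r_percent of itself at the confidence implied by z."""
    n = (100.0 * z * sd / (r_percent * sample_mean)) ** 2
    return math.ceil(n)

# Hypothetical pilot run: mean 14.38, SD 0.061, want +/- 0.1% at 95% confidence.
print(samples_needed(z=1.96, sd=0.061, sample_mean=14.38, r_percent=0.1))   # about 70
```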

  29. Measurement and Statistics
6. Figures don't lie, but liars figure. How do you extrapolate from what you know to what you'd like to know?
Often we need a result that is unmeasurable, or would require eons to determine. Is it legal to guess?
Answer: Sure - as long as you also estimate the uncertainty of your guess.
Here are a few practice situations that will help you improve your powers of estimation. Remember, there is no RIGHT answer.
1. Estimate how many people will come to this class next week. More important than the answer is the assumptions you use for your answer.
2. Approximately how many cars were in the parking lot outside this building when you came in tonight? How many are there now?
3. What is the probability that you will be killed in a car accident?
4. I recently saw a lawn service truck that had printed on its side “Over 7 trillion blades cut.” Is this a reasonable claim for them to make?

  30. Measurement and Statistics
6. Figures don't lie, but liars figure. How do you extrapolate from what you know to what you'd like to know?
5. Here is a comic strip version of an approximation problem. It contains a model, and then an estimation of the required parameters in the model.
6. But be careful; sometimes the model doesn’t work.

  31. Measurement and Statistics
7. How do you know what tools to use?
• We'll do a lot more on tools later, but for right now, the best answer is to measure the simplest way possible.
• Usually tools are easier to come by than environments.
• Make sure the tool is less granular than the required uncertainty.

  32. Measurement and Statistics
8. Is everything in a computer measurable?
• Some electrical signals may not be available.
• The place where you need to make a measurement may be in code not under your control.
• We have a very poor sense of typical/normal. We don't know what our users typically do with the machine.
• The measurement may perturb the system and destroy what we wanted to know.
• Available measurements may not relate to what I want to know - for instance, which disk blocks are being accessed by each of the processes on a system.

  33. Measurement and Statistics
9. How do you know what to measure?
• This is the hardest question of all. To know what to measure you must have a picture or model of your product. Most of the rest of this course will deal with various kinds of pictures.
• Often an adequate model is a causal one: first procedure A executes; this causes hardware B to produce an effect; then interrupt code handles the hardware result; etc.
• Things to keep in mind include:
  - Interaction between variables – do you expect a change in X to produce a change in Y? You should have a guess as to the result before you make the measurement.
  - Changing one variable at a time, and measuring it at 10 different values, can be extremely wasteful and time consuming.
  - Change only the variables that matter. If you don’t know, try changing something, just once, and see what happens.
• Example: You wish to design an experiment that will measure the time required to execute a program on various Intel processors. What parameters would you need to vary to try different processors and configurations? DESIGN THE TESTS TO BE RUN.

  34. Measurement and Statistics
10. Should you always know the results of a measurement before you make it?
• You should always have a guess so you can tell if your result is way off. That guess should be the result of a model/theory of how the mechanism you are measuring is working.

  35. Measurement and Statistics
11. How do you figure out dependencies; how does one variable depend on another?
This whole topic is something called linear regression. It says that if you can plot two variables, x and y, and there’s a simple relationship between the variables, then you can define the dependency between them.
(Figure: three example fits - a good SIMPLE model, a good complicated model, and a BAD model.)
A linear regression means that we can fit a curve of the form y = a + bx. The quality of the fit (error) can be defined as the sum of the y distances between the fitting curve and the experimental data.

  36. Measurement and Statistics
11. How do you figure out dependencies; how does one variable depend on another?
So the “best fit” is defined to be the curve that minimizes the sum of errors squared:
E = SUM over i of ( yi - (a + b*xi) )²
Setting the derivatives of E with respect to a and b to zero gives
b = ( n*SUM(xi*yi) - SUM(xi)*SUM(yi) ) / ( n*SUM(xi²) - (SUM(xi))² )
a = y_mean - b * x_mean
so the values of a and b can be determined immediately from the experimental data.
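
A minimal code sketch of these closed-form formulas (hypothetical data; Excel's linear trendline computes the same least-squares line):

```python
def linear_fit(points):
    """Closed-form least-squares fit of y = a + b*x, using the sums that
    drop out of setting the derivatives of the squared error to zero."""
    n = len(points)
    sx = sum(x for x, _ in points)
    sy = sum(y for _, y in points)
    sxy = sum(x * y for x, y in points)
    sxx = sum(x * x for x, _ in points)
    b = (n * sxy - sx * sy) / (n * sxx - sx * sx)
    a = (sy - b * sx) / n
    return a, b

print(linear_fit([(1, 2.1), (2, 3.9), (3, 6.2), (4, 7.8)]))   # roughly (0.15, 1.94)
```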

  37. Measurement and Statistics
11. How do you figure out dependencies; how does one variable depend on another?
Let’s use as an example the following pairs of data: (14,2), (16,5), (27,7), (42,9), (39,10), (50,13), (83,20). We COULD use the equations above to determine a and b. Or, Excel can be used in the same way and gives the same results.
The equation in this case is Y = 0.036 + 0.25449 X.

  38. Measurement and Statistics
(14,2), (16,5), (27,7), (42,9), (39,10), (50,13), (83,20).
Also, if you know what you’re doing, you can use “Tools → Data Analysis → Regression” and Excel will give you all kinds of statistics evaluating the goodness of fit of the straight line. (Note that you may need to use Tools → Options to bring in the analysis tools.)
If the model you’re expecting isn’t a straight line, then you’ll need to do more sophisticated analysis, but the method follows in the same way as we’ve just done.

  39. Measurement and Statistics
12. So after all this talk about the details of measurement, how do you actually design an experiment?
We’re going to follow through these steps and recommend that you use them in your experiments. (These are originally due to Jain.)
1. State Goals and Define the System
  a. What is it you hope to accomplish? Why is it worth doing?
  b. What is the hardware and software (the system) that you will use to achieve these goals?
2. List Services and Outcomes
  a. For the system you’ve chosen, what are the services provided? For instance, if you’re studying a disk subsystem, it can absorb data (write), present you with data (read), or give an error.
  b. By outcomes we mean very high-level statements. The outcome of a disk read is DATA; a performance or quantifiable answer is not what's expected here.
3. Select Metrics
  a. What are the criteria you want to use to compare performance? This is still not a quantifiable value, but simply what it is you will measure. This could be a speed metric or an accuracy metric.

  40. Measurement and Statistics
4. List Parameters
  a. What parameters affect performance? If you’re measuring disks, then the model of disk determines its seek time, its rotational latency, etc. This is a system parameter.
  b. The kind of test you use, determined by the workload you use, can also define parameters. These might be requested I/Os per second, random or sequential blocks, etc.
5. Select Factors to Study
  a. A factor is a parameter that you vary.
  b. So, for the parameters you’ve just listed – all of which you COULD vary – which ones will you actually modify during the course of the experiment?
6. Select Evaluation Technique
  a. You could do this experiment by modeling. You would mathematically represent the system under study and modify parameters in this model.
  b. You could do this experiment by simulation. You would write a program that represented the system. Again you could modify parameters and look at results.
  c. You could do this experiment by measurement. Here you have a real system, drive it with some kind of workload, and get the results.
  d. In practice, in industry, only measurements are valued. It’s generally cheaper to use the real system than it is to build a mathematical or simulated system.

  41. Measurement and Statistics
7. Select Workload
  a. How will you drive the system under test?
  b. It depends on the evaluation technique. With a simulation you may have collected some data that you can feed into your program.
  c. For a measurement evaluation, you will have some kind of software that drives the system you’re testing. You will need to find a workload that tickles the parameter of interest to you.
8. Design Experiments
  a. What experiments will you do to collect the data you want?
  b. This means selecting the actual values to be used as factors. If one of your factors is the type/model of disk, then how many different disks will you use?

  42. Measurement and Statistics
9. Make A Guess What The Result Will Be
  a. Many people take a measurement and say “Oh, that must be right.” The best way to be able to make that statement is to have understood what should happen, and then either get what you expected or not.
  b. If you get what’s expected, then you can be confident that:
    - You understand a picture of how the system is working.
    - You did your measurements correctly.
  c. If you DON’T get what’s expected, then you can be confident that:
    - You didn’t understand the system, and so you need to form a new picture.
    - You did the measurement wrong – there’s some experimental error.
10. Conduct the Measurement, Analyze and Interpret Data
  a. Now actually do the measurement, simulation, or whatever you’ve designed.
  b. It’s rare that you just get a number and you’re all done.
  c. There is always interpretation to be done:
    - What does the data mean?
    - Is this the result I would expect?
  d. There are always statistics to be done:
    - Is the data valid?
    - What is the uncertainty in the measurements?

  43. Measurement and Statistics
11. Figure Out What You Want To Talk About
  a. Know your audience. Are they management types (who want only an overview) or are they technical people (who want all the details)? Proper targeting is important!
  b. Choose, from all the data you have, the pieces that are most relevant. Don’t forget to make it interesting!
12. Present Final Results
  a. As you know, in the real world, it’s not what you do, it’s what others think you do.
  b. Presentation is everything.

  44. Measurement and Statistics
BONUS: There are various terms and definitions we never got around to formally defining. Here they are.
Definitions of Measured Data - some basic terms to define so we have a common lingo.
Independent Events: Two events are independent if there’s no way that the occurrence of the first event can have anything to do with the second event.
Random Variate: A variable that can take on one of a particular set of values with a specified probability.
Cumulative Distribution Function: The CDF maps a given value a to the probability that the variable has a value less than or equal to a.
Probability Density Function: The derivative of the CDF. Integrating it over an interval (x1, x2) gives the probability of x being in that interval.

  45. Measurement and Statistics
Definitions of Measured Data - some basic terms to define so we have a common lingo.
Probability Mass Function: The equivalent of the PDF, but used for discrete variables.
Mean or Expected Value: The average of the distribution - the sum of each possible value weighted by its probability.
Variance: A measure of the deviation of the values from the mean.
Standard Deviation: Another measure of the deviation of values - the square root of the variance, usually written σ.

  46. Measurement and Statistics
Definitions of Measured Data
Covariance: Given two random variables x and y with means μx and μy, their covariance is Cov(x, y) = E[ (x - μx)(y - μy) ]. For independent variables, the covariance is 0.
Correlation Coefficient: Another measure of how two variables are interdependent.
Median: That value for which there’s an equal probability of being above it and below it.
Mode: The most likely value; the value with the highest probability.
Normal Distribution: The most commonly used distribution. The sum of a large number of independent observations from any distribution has (approximately) a normal distribution.
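
A quick sketch of several of these definitions using Python's statistics module (covariance and correlation require Python 3.10 or later; the data values are hypothetical):

```python
from statistics import correlation, covariance, median, mode

x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 8.1, 9.8]        # roughly linear in x

print(covariance(x, y))               # positive: x and y move together
print(correlation(x, y))              # near 1.0 for a nearly linear relationship
print(median([3, 1, 4, 1, 5]))        # equal probability mass above and below
print(mode([3, 1, 4, 1, 5]))          # the most frequent value: 1
```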
