# CPE 619 The Art of Data Presentation - PowerPoint PPT Presentation

CPE 619 The Art of Data Presentation. Aleksandar Milenković The LaCASA Laboratory Electrical and Computer Engineering Department The University of Alabama in Huntsville http://www.ece.uah.edu/~milenka http://www.ece.uah.edu/~lacasa. Overview. Types of Variables

### CPE 619 The Art of Data Presentation

Aleksandar Milenković

The LaCASA Laboratory

Electrical and Computer Engineering Department

The University of Alabama in Huntsville

http://www.ece.uah.edu/~milenka

http://www.ece.uah.edu/~lacasa

Overview

• Types of Variables

• Guidelines for Preparing Good Charts

• Common Mistakes in Preparing Charts

• Pictorial Games

• Special Charts for Computer Performance

• Gantt Charts

• Kiviat Graphs

• Schumacher Charts

• Decision Maker’s Games

Types of Variables

• Type of computer: Super computer, minicomputer, microcomputer

• Type of Workload: Scientific, engineering, educational

• Number of processors

• Response time of system

Guidelines for Preparing Good Charts

• 1) Require minimum effort from the reader

• Direct labeling vs. legend box

• 2) Maximize Information

• Words in place of symbols; clearly label the axes

• 3) Minimize ink

• No grid lines, more details

• 4) Use commonly accepted practices

• origin at (0,0); independent variable (cause) along x axis; the dependent variable (effect) along the y axis; linear scales; increasing scales; equal divisions

• 5) Avoid ambiguity

• Show coordinate axes, scale divisions, origin; identify individual curves and bars

• Are both coordinate axes shown and labeled?

• Are the axes labels self-explanatory and concise?

• Are the scales and divisions shown on both axes?

• Are the minimum and maximum of the ranges shown on the axes appropriate to present maximum information?

• Is the number of curves reasonably small?

• Do all graphs use the same scale?

• Is there no curve that can be removed without reducing information?

• Are the curves on a line chart individually labeled?

• Are the cells in a bar chart individually labeled?

• Are all symbols on the graph accompanied by appropriate textual explanations?

• If the curves cross, are the line patterns different to avoid confusion?

• Are the units of measurement indicated?

• Is the horizontal scale increasing from left to right?

• Is the vertical scale increasing from bottom to top?

• Are the grid lines aiding in reading the curves?

• Does this whole chart add to information available to the reader?

• Are the scales contiguous?

• Is the order of bars in a bar chart systematic?

• If the vertical axis represents a random quantity, are confidence intervals shown?

• Are there no curves, symbols, or texts on the graph that can be removed without affecting the information?

• Is there a title for the whole chart?

• Is the chart title self-explanatory and concise?

• For bar charts with unequal class intervals, are the area and width representative of the frequency and interval?

• Do the variables plotted on this chart give more information than other alternatives?

• Does the chart clearly bring out the intended message?

• Is the figure referenced and discussed in the text of the report?

Common Mistakes in Preparing Charts

• Presenting too many alternatives on a single chart

• Max 5 to 7 messages => max 6 curves in a line chart, no more than 10 bars in a bar chart, max 8 components in a pie chart

• Presenting many y variables on a single chart

• Using symbols in place of text

• Placing extraneous information on the chart

• E.g., grid lines, granularity of the grid lines

• Selecting scale ranges improperly

• Automatic selection by programs may not be appropriate

Common Mistakes in Charts (cont’d)

• Using a line chart in place of column chart

• line => continuity

[Figure: MIPS by CPU type]

Pictorial Games

• Using non-zero origins to emphasize the difference

• Three-quarters-high rule => height/width > 3/4

Mine and yours are almost the same (conceal difference)

Mine is much better than yours (emphasize difference)

Height of the highest point should be at least ¾ of the horizontal offset of the rightmost point
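
The three-quarters-high rule can be written as a one-line check. This is a minimal sketch, assuming the height of the highest point and the horizontal offset of the rightmost point are measured in the same physical units (for example, inches on the page); the function name and arguments are hypothetical.

```python
def satisfies_three_quarters_high(max_height: float, rightmost_offset: float) -> bool:
    """Return True if the highest point is at least 3/4 of the horizontal
    offset of the rightmost point (both in the same physical units)."""
    if rightmost_offset <= 0:
        raise ValueError("rightmost_offset must be positive")
    return max_height >= 0.75 * rightmost_offset
```

A chart that is 4 units wide would need its highest point at 3 units or more to pass this check.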

• Using double-whammy graph for dramatization

• Using related metrics

• Plotting random quantities without showing confidence intervals

Means of two random variables

Means are not enough. Overlapping confidence intervals usually mean that the two random quantities are not statistically different.
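
To illustrate why the intervals matter, here is a minimal sketch that computes a confidence interval for a mean and tests two intervals for overlap. It assumes a normal approximation (for small samples a t quantile would be more appropriate); the function names and the 90% default are illustrative choices, not taken from the slides.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev


def confidence_interval(samples, confidence=0.90):
    """Approximate two-sided confidence interval for the mean of `samples`.

    Assumption: uses the normal quantile, which is only a rough
    approximation when the sample is small.
    """
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two samples")
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    half_width = z * stdev(samples) / sqrt(n)
    center = mean(samples)
    return (center - half_width, center + half_width)


def intervals_overlap(first, second):
    """Return True if two (low, high) intervals share at least one point."""
    return first[0] <= second[1] and second[0] <= first[1]
```

If `intervals_overlap` returns True for two systems, plotting only their means would suggest a difference that the data may not support.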

• Pictograms scaled by height

• Wrong scaling: Area(MINE) > 4*Area(YOURS)??

Mine: Performance = 2

Yours: Performance = 1
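
The scaling problem can be made concrete with a short sketch. It assumes the pictogram keeps its aspect ratio, so scaling the height by a factor also scales the width by the same factor; the function names are hypothetical.

```python
from math import sqrt


def perceived_area_ratio(linear_factor: float) -> float:
    """Area ratio that results when both height and width are scaled
    by `linear_factor` (aspect ratio preserved)."""
    return linear_factor ** 2


def area_preserving_factor(value_ratio: float) -> float:
    """Linear factor to apply to height and width so that the AREA,
    not the height, grows in proportion to `value_ratio`."""
    if value_ratio <= 0:
        raise ValueError("value_ratio must be positive")
    return sqrt(value_ratio)
```

With a performance ratio of 2, scaling by height gives `perceived_area_ratio(2.0)`, a fourfold area, whereas `area_preserving_factor(2.0)` gives the smaller linear factor that keeps the area ratio at 2.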

Pictorial Games (cont’d)

• Using inappropriate cell size in histograms

[Figure: two histograms of frequency versus response time, titled "Normal distribution" (cells [0,2), [2,4), [4,6), [6,8), [8,10), [10,12)) and "Exponential distribution" (cells [0,6), [6,12))]

Pictorial Games (cont’d)

• Using broken scales in column charts

• Amplify differences

[Figure: two column charts of response time (Resp. Time) for systems A through F, drawn with different vertical scales]

• Gantt charts

• Kiviat Graphs

• Schumacher's charts

Gantt Charts

• Shows relative duration of a number of conditions

[Figure: Gantt chart of utilization (0% to 100%) for CPU, I/O channel, and network; segment labels: CPU 60; I/O channel 20, 20; network 10, 30, 5, 15]

[Figure labels: CPU Busy, CPU in Supervisor State, CPU Only Busy, CPU/Channel Overlap, CPU in Problem State, Channel Only Busy, CPU Wait, Any Channel Busy]

Kiviat Graphs

• Radial chart with even number of metrics

• HB and LB metrics alternate

• Ideal shape: star

[Figure: Kiviat Graph for a Balanced System; axes: CPU Busy, CPU in Supervisor State, CPU Only Busy, CPU in Problem State, CPU/Channel Overlap, CPU Wait, Channel Only Busy, Any Channel Busy]

• Problem: Inter-related metrics

• CPU busy = problem state + Supervisor state

• CPU wait = 100 – CPU busy

• Channel only = any channel – CPU/channel overlap

• CPU only = CPU busy – CPU/channel overlap
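
The relations above mean that only four of the eight metrics are independent. A minimal sketch, with all values as percentages of elapsed time; the function and key names are hypothetical.

```python
def derive_kiviat_metrics(problem_state, supervisor_state,
                          any_channel_busy, cpu_channel_overlap):
    """Derive the dependent metrics from four measured ones (all in percent)."""
    cpu_busy = problem_state + supervisor_state
    return {
        "cpu_busy": cpu_busy,
        "cpu_wait": 100 - cpu_busy,
        "channel_only_busy": any_channel_busy - cpu_channel_overlap,
        "cpu_only_busy": cpu_busy - cpu_channel_overlap,
    }
```

Because half of the axes follow from the other half, a Kiviat graph built on these eight metrics shows less independent information than its eight axes suggest.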

[Figure: typical Kiviat graph shapes: CPU keelboat (CPU-bound system), I/O wedge (I/O-bound system), I/O arrow (CPU- and I/O-bound system)]

Merrill’s Figure of Merit (FoM)

• Performance = $\{x_1, x_2, x_3, \ldots, x_{2n}\}$. Odd-indexed values are HB and even-indexed values are LB.

• $x_{2n+1}$ is the same as $x_1$

• Average FOM = 50%

• System A:

• System B: System B has a higher figure of merit and it is better.

• All axes are considered equal

• Extreme values are assumed to be better

• Utility is not a linear function of FoM

• Two systems with the same FoM are not equally good

• System with slightly lower FoM may be better

• Use Kiviat graphs for networks

[Figure: Kiviat graph for a network; labeled axes include Application Throughput, Packets With Error, Implicit Acknowledgements, Duplicate Packets]

Schumacher Charts

• Performance metrics are plotted in a tabular manner

• Values are normalized with respect to long term means and standard deviations

• Any observations that are beyond mean ± one standard deviation need to be explained

• See Figure 10.25 in the book
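
The normalization step can be sketched as follows. This is an illustration only: the metric names, the dictionary layout, and the function names are assumptions, and the long-term mean and standard deviation are taken as given inputs.

```python
def normalize(value, long_term_mean, long_term_std):
    """Express an observation in units of long-term standard deviations."""
    if long_term_std <= 0:
        raise ValueError("long_term_std must be positive")
    return (value - long_term_mean) / long_term_std


def needs_explanation(observed, long_term):
    """Return the metrics whose normalized value lies beyond +/- 1.

    observed:  {metric: value}
    long_term: {metric: (mean, standard deviation)}
    """
    flagged = {}
    for metric, value in observed.items():
        mean, std = long_term[metric]
        z = normalize(value, mean, std)
        if abs(z) > 1.0:
            flagged[metric] = z
    return flagged
```

Each returned entry corresponds to a cell of the chart that would call for an explanation.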

Metrics

Configuration

Details

• This needs more analysis.

• You need a better understanding of the workload.

• It improves performance only for long IOs/packets/jobs/files, and most of the IOs/packets/jobs/files are short.

• It improves performance only for short IOs/packets/jobs/files, but who cares for the performance of short IOs/packets/jobs/files; it's the long ones that impact the system.

• It needs too much memory/CPU/bandwidth and memory/CPU/bandwidth isn't free.

• It only saves us memory/CPU/bandwidth and memory/CPU/bandwidth is cheap.

See Box 10.2 on page 162 of the book for a complete list

• Qualitative/quantitative, ordered/unordered, discrete/continuous variables

• Good charts should require minimum effort from the reader and provide maximum information with minimum ink

• Use no more than 5-6 curves, select ranges properly, and follow the three-quarters-high rule

• Gantt Charts show utilizations of various components

• Kiviat Graphs show HB and LB metrics alternately on a circular graph

• Schumacher Charts show mean and standard deviations

• Workload, metrics, configuration, and details can always be challenged, so they should be carefully selected.

What type of chart (line or bar) would you use to plot:

• CPU usage for 12 months of the year

• CPU usage as a function of time in months

• Number of I/O's to three disk drives: A, B, and C

• Number of I/O's as a function of number of disk drives in a system

• List the problems with the following charts

• A system consists of three resources, called A, B, and C. The measured utilizations are shown in the following table. A zero in a column indicates that the resource is not utilized. Draw a Gantt chart showing utilization profiles.

• The measured values of the eight performance metrics listed in Example 10.2 for a system are: 70%, 10%, 60%, 20%, 80%, 30%, 50%, and 20%. Draw the Kiviat graph and compute its figure of merit.

• For a computer system of your choice, list a number of HB and LB metrics and draw a typical Kiviat graph using data values of your choice.

### Ratio Games

• Ratio Game Examples

• Using an Appropriate Ratio Metric

• Using Relative Performance Enhancement

• Ratio Games with Percentages

• Ratio Games Guidelines

• Numerical Conditions for Ratio Games

• Conclusion: 6502 is worse. It takes 4.7% more time than 8080.

1. Ratio of Totals

2. 6502 as the base:

3. 8080 as the base:

• Ratio of Totals: 6502 is worse. It takes 4.7% more time than 8080.

• With 6502 as a base: 6502 is better. It takes 1% less time than 8080.

• With 8080 as a base: 6502 is worse. It takes 6% more time.
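
The three computations can be sketched as below. The execution times are hypothetical round numbers chosen to reproduce the same kind of contradiction; they are not the 6502 and 8080 measurements, and the names `x_times` and `y_times` are placeholders.

```python
from statistics import mean

# Hypothetical execution times (seconds) on two benchmarks; not the slide's data.
x_times = [40.0, 60.0]
y_times = [50.0, 48.0]


def ratio_of_totals(times, base_times):
    """Total time of one system divided by the total time of the base."""
    return sum(times) / sum(base_times)


def mean_of_ratios(times, base_times):
    """Average of per-benchmark time ratios relative to the base system."""
    return mean([t / b for t, b in zip(times, base_times)])


x_vs_y_totals = ratio_of_totals(x_times, y_times)  # above 1: X looks slower
y_with_x_base = mean_of_ratios(y_times, x_times)   # above 1: Y looks slower
x_with_y_base = mean_of_ratios(x_times, y_times)   # above 1: X looks slower
```

With these numbers, each system looks slower whenever the other one is used as the base, which is the essence of the ratio game.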

• Conclusion: RISC-I has the largest code size. The second processor Z8002 requires 9% less code than RISC-I.

• Conclusion: Z8002 has the largest code size; it takes 18% more code than RISC-I. [Peterson and Sequin 1982]

Example:

• Throughput: A is better

• Response Time: A is worse

• Power: A is better

• Example: Two floating point accelerators

• Problem: Incomparable bases. Need to try both on the same machine

• Example: Tests on two systems

1. System B is better on both systems

2. System A is better overall.

System A:

System B:
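
How both conclusions can hold at once is easiest to see with numbers. The counts below are hypothetical (passed, total) pairs for two tests, chosen only to illustrate the effect of combining percentages with different bases; they are not the slide's data.

```python
# Hypothetical (passed, total) counts per test; not the slide's data.
results = {
    "A": {"test1": (60, 100), "test2": (1, 10)},
    "B": {"test1": (7, 10), "test2": (20, 100)},
}


def per_test_percentages(system):
    """Percentage passed on each test, each with its own base."""
    return {name: 100.0 * passed / total
            for name, (passed, total) in results[system].items()}


def overall_percentage(system):
    """Percentage passed after pooling the counts of all tests."""
    passed = sum(p for p, _ in results[system].values())
    total = sum(t for _, t in results[system].values())
    return 100.0 * passed / total
```

Here B has the higher percentage on each test taken separately, yet A has the higher percentage once the counts are pooled, because the per-test percentages rest on very different sample sizes.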

• Other Misuses of Percentages:

• A 1000% improvement sounds more impressive than 11 times, particularly if the performance before and after the improvement are both small

• Small sample sizes disguised in percentages

• Base = Initial. 400% reduction in prices ⇒ Base = Final

• If one system is better on all benchmarks, contradicting conclusions cannot be drawn by any ratio game technique

• Even if one system is better than the other on all benchmarks, a better relative performance can be shown by selecting appropriate base.

• In the previous example, System A is 40% better than System B using raw data, 43% better using system A as a base, and 42% better using System B as a base.

• If a system is better on some benchmarks and worse on others, contradicting conclusions can be drawn in some cases, but not in all cases.

• If the performance metric is an LB metric, it is better to use your system as the base

• If the performance metric is an HB metric, it is better to use your opponent as the base

• Those benchmarks that perform better on your system should be elongated and those that perform worse should be shortened

• Raw Data

• A is better than B iff

• With A as the Base

• A is better than B iff

• With B as the base

• A is better than B iff
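
One way to write the three conditions, as a sketch derived from the definitions rather than copied from the slides: assume two benchmarks $i$ and $j$, an LB metric such as response time, and let $a_i, a_j$ and $b_i, b_j$ (symbols introduced here for illustration) be the measurements of systems A and B. A is better than B iff

$$
\begin{aligned}
\text{Raw data:}\quad & a_i + a_j < b_i + b_j \\
\text{With A as the base:}\quad & 1 + 1 < \frac{b_i}{a_i} + \frac{b_j}{a_j} \\
\text{With B as the base:}\quad & \frac{a_i}{b_i} + \frac{a_j}{b_j} < 1 + 1
\end{aligned}
$$

Each condition defines a different boundary through the point where both B/A ratios equal 1, so a pair of ratios can satisfy one condition while failing another.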

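[Figure: regions in which each system is rated better, plotted against the ratio of B/A response on benchmark i and the ratio of B/A response on benchmark j; boundaries for Base B, Raw Data, and Base A; regions labeled "A is better using all 3" and "B is better using all 3"]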

• Ratio games arise from use of incomparable bases

• Ratios may be part of the metric

• Relative performance enhancements

• Percentages are ratios

• For HB metrics, it is better to use opponent as the base

• The following table shows execution times of three benchmarks I, J, and K on three systems A, B, and C. Use ratio game techniques to show the superiority of various systems.

• Derive conditions necessary for you to be able to use the technique of combined percentages to your advantage.