
CPE 619 The Art of Data Presentation




Aleksandar Milenković

The LaCASA Laboratory

Electrical and Computer Engineering Department

The University of Alabama in Huntsville

http://www.ece.uah.edu/~milenka

http://www.ece.uah.edu/~lacasa

- Types of Variables
- Guidelines for Preparing Good Charts
- Common Mistakes in Preparing Charts
- Pictorial Games
- Special Charts for Computer Performance
- Gantt Charts
- Kiviat Graphs
- Schumacher Charts

- Decision Maker's Games

- Qualitative, ordered: Type of computer (supercomputer, minicomputer, microcomputer)
- Qualitative, unordered: Type of workload (scientific, engineering, educational)
- Quantitative, discrete: Number of processors
- Quantitative, continuous: Response time of system

- 1) Require minimum effort from the reader
- Direct labeling vs. legend box

- 2) Maximize Information
- Words in place of symbols; clearly label the axes

- 3) Minimize ink
- Avoid unnecessary grid lines; present more details instead

- 4) Use commonly accepted practices
- origin at (0,0); independent variable (cause) along x axis; the dependent variable (effect) along the y axis; linear scales; increasing scales; equal divisions

- 5) Avoid ambiguity
- Show coordinate axes, scale divisions, and origin; identify individual curves and bars

- Are both coordinate axes shown and labeled?
- Are the axes labels self-explanatory and concise?
- Are the scales and divisions shown on both axes?
- Are the minimum and maximum of the ranges shown on the axes appropriate to present maximum information?
- Is the number of curves reasonably small?
- Do all graphs use the same scale?
- Is there no curve that can be removed without reducing information?
- Are the curves on a line chart individually labeled?
- Are the cells in a bar chart individually labeled?
- Are all symbols on the graph accompanied by appropriate textual explanations?
- If the curves cross, are the line patterns different to avoid confusion?
- Are the units of measurement indicated?
- Is the horizontal scale increasing from left to right?
- Is the vertical scale increasing from bottom to top?
- Are the grid lines aiding in reading the curves?
- Does this whole chart add to the information available to the reader?
- Are the scales contiguous?
- Is the order of bars in a bar chart systematic?
- If the vertical axis represents a random quantity, are confidence intervals shown?
- Are there no curves, symbols, or texts on the graph that can be removed without affecting the information?
- Is there a title for the whole chart?
- Is the chart title self-explanatory and concise?
- For bar charts with unequal class intervals, are the area and width representative of the frequency and interval?
- Do the variables plotted on this chart give more information than other alternatives?
- Does the chart clearly bring out the intended message?
- Is the figure referenced and discussed in the text of the report?

- Presenting too many alternatives on a single chart
- Max 5 to 7 messages => max 6 curves in a line chart, no more than 10 bars in a bar chart, max 8 components in a pie chart
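The limits above can be turned into a simple check that a chart-producing script might run before rendering. This is a minimal Python sketch. The function name `too_many_alternatives` and the chart-type keys are hypothetical. Only the numeric limits come from the slide.

```python
# Upper limits from the slide: at most 6 curves in a line chart,
# 10 bars in a bar chart, and 8 components in a pie chart.
MAX_ELEMENTS = {"line": 6, "bar": 10, "pie": 8}


def too_many_alternatives(chart_type: str, count: int) -> bool:
    """Return True if a chart of this type shows more elements than the rule allows."""
    if chart_type not in MAX_ELEMENTS:
        raise ValueError(f"unknown chart type: {chart_type!r}")
    return count > MAX_ELEMENTS[chart_type]


if __name__ == "__main__":
    print(too_many_alternatives("line", 7))  # True: 7 curves exceed the limit of 6
    print(too_many_alternatives("pie", 8))   # False: 8 components is the maximum
```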

- Presenting many y variables on a single chart

- Using symbols in place of text
- Placing extraneous information on the chart
- E.g., grid lines, granularity of the grid lines

- Selecting scale ranges improperly
- Automatic selection by programs may not be appropriate


- Using a line chart in place of a column chart
- A line implies continuity

[Figure: MIPS vs. CPU Type]
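The variable types from the start of this chapter can drive the choice between a line chart and a column chart. This is a minimal sketch, and the names `VarKind` and `suggest_chart` are hypothetical. The mapping is a conservative reading of "a line implies continuity": it picks a line only for a continuous x variable. A discrete quantitative x may still deserve a judgment call.

```python
from enum import Enum


class VarKind(Enum):
    """Kinds of variables, with the examples from the types-of-variables slide."""
    ORDERED_QUALITATIVE = "ordered qualitative"          # type of computer: super, mini, micro
    UNORDERED_QUALITATIVE = "unordered qualitative"      # type of workload
    DISCRETE_QUANTITATIVE = "discrete quantitative"      # number of processors
    CONTINUOUS_QUANTITATIVE = "continuous quantitative"  # response time


def suggest_chart(x_kind: VarKind) -> str:
    """Suggest a chart type from the kind of the x-axis (independent) variable.

    A line implies continuity, so this conservative rule picks a line chart
    only for a continuous x variable and a column chart otherwise.
    """
    return "line" if x_kind is VarKind.CONTINUOUS_QUANTITATIVE else "column"


if __name__ == "__main__":
    # CPU type is qualitative, so MIPS vs. CPU type belongs in a column chart.
    print(suggest_chart(VarKind.UNORDERED_QUALITATIVE))  # column
```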

- Using non-zero origins to emphasize the difference
- Three-quarter-high rule => height/width ≥ 3/4

Mine and yours are almost the same (conceal difference)

Mine is much better than yours (emphasize difference)

Height of the highest point should be at least 3/4 of the horizontal offset of the rightmost point
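The rule concerns physical distances on the page, not data units. Checking it therefore means mapping the data through the chosen axis ranges first. This is a sketch under that assumption. The function names, the 4 x 3 plot area and the data are hypothetical.

```python
def point_offsets(xs, ys, x_range, y_range, width, height):
    """Map the rightmost and highest data points to physical offsets (e.g., inches)
    from the origin of a plot area that is `width` by `height`."""
    (x_min, x_max), (y_min, y_max) = x_range, y_range
    right = width * (max(xs) - x_min) / (x_max - x_min)
    top = height * (max(ys) - y_min) / (y_max - y_min)
    return right, top


def three_quarter_high_ok(xs, ys, x_range, y_range, width, height):
    """True if the highest point is at least 3/4 of the rightmost point's horizontal offset."""
    right, top = point_offsets(xs, ys, x_range, y_range, width, height)
    return top >= 0.75 * right


if __name__ == "__main__":
    xs, ys = [1, 2, 3, 4], [80, 90, 100, 95]  # hypothetical data
    # On a 4 x 3 inch plot area, a y axis running to 200 leaves the peak at 1.5 in.
    print(three_quarter_high_ok(xs, ys, (0, 4), (0, 200), 4, 3))  # False
    # Tightening the y axis to 0..100 raises the peak to 3 in, exactly 3/4 of 4 in.
    print(three_quarter_high_ok(xs, ys, (0, 4), (0, 100), 4, 3))  # True
```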

- Using a double-whammy graph for dramatization
- Plotting related metrics together on one chart

- Plotting random quantities without showing confidence intervals

[Figure: Means of two random variables]

Means are not enough. Overlapping confidence intervals usually mean that the two random quantities are not statistically different.
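Confidence intervals for the two means can be computed and compared for overlap. This sketch assumes large samples and uses the normal approximation via `statistics.NormalDist`. Small samples should use Student's t instead. The data and function names are hypothetical.

```python
import statistics
from statistics import NormalDist


def confidence_interval(sample, confidence=0.95):
    """Approximate confidence interval for the mean (normal approximation).

    Assumption: the sample is large (roughly 30+ observations); for small
    samples a Student's t quantile should replace the normal quantile.
    """
    n = len(sample)
    mean = statistics.mean(sample)
    sem = statistics.stdev(sample) / n ** 0.5  # standard error of the mean
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    return mean - z * sem, mean + z * sem


def intervals_overlap(ci_a, ci_b):
    """Quick visual check: overlap usually means no significant difference (not a formal test)."""
    return ci_a[0] <= ci_b[1] and ci_b[0] <= ci_a[1]


if __name__ == "__main__":
    # Hypothetical response times, 40 observations each.
    mine = [10 + (i % 5) for i in range(40)]     # mean 12
    yours = [10.5 + (i % 5) for i in range(40)]  # mean 12.5
    ci_mine, ci_yours = confidence_interval(mine), confidence_interval(yours)
    print(ci_mine, ci_yours, intervals_overlap(ci_mine, ci_yours))
```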

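- Pictograms scaled by height
- Wrong scaling: doubling the height also doubles the width, so Area(MINE) = 4 × Area(YOURS), although MINE performs only 2 times better

[Figure: pictograms for Mine (Performance = 2) and Yours (Performance = 1)]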
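The arithmetic behind the pictogram distortion is short. This sketch assumes the pictogram keeps its aspect ratio when resized, and the function names are hypothetical. Scaling the height by r scales the area by r squared. Scaling each side by the square root of r keeps the area proportional to performance.

```python
import math


def area_ratio_when_scaled_by_height(performance_ratio: float) -> float:
    """Height and width both scale by r (aspect ratio preserved), so area scales by r**2."""
    return performance_ratio ** 2


def side_scale_for_area(performance_ratio: float) -> float:
    """Per-side scale factor that makes the area, not the height, proportional to performance."""
    return math.sqrt(performance_ratio)


if __name__ == "__main__":
    print(area_ratio_when_scaled_by_height(2))  # 4: MINE looks 4 times as big
    print(side_scale_for_area(2))               # about 1.414 per side gives 2 times the area
```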

- Using inappropriate cell size in histograms

[Figure: two histograms of Frequency vs. Response Time, one with cells [0,2), [2,4), [4,6), [6,8), [8,10), [10,12) and one with cells [0,6), [6,12); panel titles "Normal distribution" and "Exponential distribution"]
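Binning the same observations with two cell widths shows how the cell size alone changes the apparent shape. This is a sketch with hypothetical response-time data. The helper `histogram` is illustrative and not from the slides.

```python
import math
from collections import Counter


def histogram(data, cell_width, upper):
    """Count observations in cells [0, w), [w, 2w), ..., up to `upper`."""
    n_cells = math.ceil(upper / cell_width)
    counts = Counter(int(x // cell_width) for x in data if 0 <= x < upper)
    return [counts[i] for i in range(n_cells)]  # Counter returns 0 for empty cells


if __name__ == "__main__":
    # Hypothetical response times.
    data = [1] * 2 + [3] * 4 + [5] * 6 + [7] * 5 + [9] * 2 + [11] * 1
    print(histogram(data, 2, 12))  # [2, 4, 6, 5, 2, 1]: rises, then falls
    print(histogram(data, 6, 12))  # [12, 8]: only decreases
```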

- Using broken scales in column charts
- Broken scales amplify differences

[Figure: two column charts of response time (Resp. Time) for systems A to F, drawn with different vertical scales]
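The amplification can be quantified: a column's drawn height is measured from where the axis starts, not from zero. This sketch uses hypothetical response times of 10 and 12, and the function name is illustrative.

```python
def apparent_ratio(a: float, b: float, baseline: float = 0.0) -> float:
    """Ratio of the drawn column heights for values b and a when the axis starts at `baseline`."""
    if a <= baseline or b <= baseline:
        raise ValueError("both values must lie above the baseline")
    return (b - baseline) / (a - baseline)


if __name__ == "__main__":
    # Hypothetical response times of two systems: 10 and 12 (12 is 20% larger).
    print(apparent_ratio(10, 12))              # 1.2 with a zero-based axis
    print(apparent_ratio(10, 12, baseline=9))  # 3.0 when the axis is broken at 9
```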

- Gantt charts
- Kiviat Graphs
- Schumacher's charts

- Gantt charts: show the relative duration of a number of conditions, e.g., the utilization of various components

[Figure: Gantt chart of CPU, IO Channel, and Network utilization on a 0% to 100% scale]
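A text-mode Gantt chart can be produced from busy periods expressed as percentages of the measurement interval. This is a minimal sketch under that assumption. The busy periods, the 50-character width and the helper names are hypothetical, and `overlap` assumes each resource's own intervals do not overlap one another.

```python
def gantt_rows(busy, width=50):
    """Render each resource's busy periods as a text bar spanning 0%..100% of the time.

    `busy` maps a resource name to a list of (start, end) percentages.
    """
    rows = []
    for name, intervals in busy.items():
        cells = [" "] * width
        for start, end in intervals:
            for i in range(round(start * width / 100), round(end * width / 100)):
                cells[i] = "#"
        rows.append(f"{name:<12}|{''.join(cells)}|")
    return rows


def overlap(a, b):
    """Percentage of time both resources are busy (intervals within a list must not overlap)."""
    return sum(max(0, min(e1, e2) - max(s1, s2)) for s1, e1 in a for s2, e2 in b)


if __name__ == "__main__":
    busy = {  # hypothetical busy periods, in percent of the measurement interval
        "CPU": [(0, 60)],
        "IO Channel": [(40, 60), (80, 100)],
        "Network": [(50, 70)],
    }
    for row in gantt_rows(busy):
        print(row)
    print("CPU/IO overlap:", overlap(busy["CPU"], busy["IO Channel"]), "%")
```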


- Kiviat graph: radial chart with an even number of metrics
- HB (higher is better) and LB (lower is better) metrics alternate
- Ideal shape: star

[Figure: Kiviat graph with eight axes: CPU Busy, CPU in Supervisor State, CPU Only Busy, CPU in Problem State, CPU/Channel Overlap, CPU Wait, Channel Only Busy, Any Channel Busy]
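Plotting a Kiviat graph requires placing each value along evenly spaced radial axes. This sketch assumes the first axis points to 12 o'clock and the axes run clockwise. The slides do not fix the orientation, and `kiviat_vertices` is a hypothetical name. With every HB axis at 100% and every LB axis at 0%, the polygon becomes the ideal star.

```python
import math


def kiviat_vertices(values):
    """(x, y) vertices of a Kiviat graph polygon for metric values in percent.

    Axes are spaced evenly, starting at 12 o'clock and going clockwise;
    value i is plotted at distance values[i] from the center along axis i.
    Odd-numbered axes (x1, x3, ...) are HB and even-numbered ones are LB.
    """
    if len(values) % 2:
        raise ValueError("a Kiviat graph needs an even number of metrics")
    step = 2 * math.pi / len(values)
    return [(v * math.sin(i * step), v * math.cos(i * step)) for i, v in enumerate(values)]


if __name__ == "__main__":
    # Ideal system: every HB axis at 100%, every LB axis at 0%, giving a star.
    star = kiviat_vertices([100, 0] * 4)
    # The eight measured values given in the exercises for this chapter.
    measured = kiviat_vertices([70, 10, 60, 20, 80, 30, 50, 20])
    for x, y in measured:
        print(f"({x:7.2f}, {y:7.2f})")
```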

- Problem: Inter-related metrics
- CPU busy = problem state + Supervisor state
- CPU wait = 100 - CPU busy
- Channel only = Any channel - CPU/channel overlap
- CPU only = CPU busy - CPU/channel overlap
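The four identities above mean that four of the eight Kiviat values determine the other four. This sketch computes the dependent metrics from the independent ones. All values are percentages, and the inputs and function name are hypothetical.

```python
def derived_metrics(problem_state, supervisor_state, any_channel_busy, cpu_channel_overlap):
    """Compute the dependent Kiviat metrics (all in %) from four independent ones."""
    cpu_busy = problem_state + supervisor_state
    return {
        "CPU busy": cpu_busy,
        "CPU wait": 100 - cpu_busy,
        "Channel only busy": any_channel_busy - cpu_channel_overlap,
        "CPU only busy": cpu_busy - cpu_channel_overlap,
    }


if __name__ == "__main__":
    # Hypothetical measurements: 40% problem state, 20% supervisor state,
    # 50% any channel busy, 30% CPU/channel overlap.
    print(derived_metrics(40, 20, 50, 30))
```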

[Figure: typical Kiviat graph shapes: CPU keel boat (CPU-bound system), I/O wedge (I/O-bound system), and I/O arrow (CPU- and I/O-bound system)]

- Performance = $\{x_1, x_2, x_3, \ldots, x_{2n}\}$; odd-numbered values are HB and even-numbered values are LB
- $x_{2n+1}$ is the same as $x_1$
- Average FoM = 50%

- System A:

- System B: System B has a higher figure of merit and is therefore better.

- All axes are considered equal
- Extreme values are assumed to be better
- Utility is not a linear function of FoM
- Two systems with the same FoM are not necessarily equally good
- A system with a slightly lower FoM may be better

- Use Kiviat graphs for networks

[Figure: Kiviat graph for a network with axes Application Throughput, Link Overhead, Packets With Error, Implicit Acknowledgements, Link Utilization, Duplicate Packets]

- Schumacher charts: performance metrics are plotted in a tabular manner
- Values are normalized with respect to long-term means and standard deviations
- Any observation beyond the mean ± one standard deviation needs to be explained
- See Figure 10.25 in the book
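The normalization and the flagging rule can be sketched as follows. The layout of the book's Figure 10.25 is not reproduced. The metric names, long-term statistics and function name are hypothetical, and "beyond" is read as strictly outside one standard deviation.

```python
def schumacher_flags(observed, long_term_mean, long_term_sd):
    """Normalize each metric and flag it if it lies outside mean ± one standard deviation.

    Returns {metric: (normalized value, needs_explanation)}, where the normalized
    value is (observed - long-term mean) / long-term standard deviation.
    """
    result = {}
    for name, value in observed.items():
        z = (value - long_term_mean[name]) / long_term_sd[name]
        result[name] = (z, abs(z) > 1)
    return result


if __name__ == "__main__":
    # Hypothetical metrics and long-term statistics.
    observed = {"CPU busy (%)": 75, "Response time (s)": 2.1}
    mean = {"CPU busy (%)": 60, "Response time (s)": 2.0}
    sd = {"CPU busy (%)": 10, "Response time (s)": 0.5}
    for name, (z, flag) in schumacher_flags(observed, mean, sd).items():
        print(f"{name:<20} {z:+.2f}{'  <- explain' if flag else ''}")
```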

[Figure: Decision maker's games: Workload, Metrics, Configuration, Details]

- This needs more analysis.
- You need a better understanding of the workload.
- It improves performance only for long IOs/packets/jobs/files, and most of the IOs/packets/jobs/files are short.
- It improves performance only for short IOs/packets/jobs/files, but who cares about the performance of short IOs/packets/jobs/files? It's the long ones that impact the system.
- It needs too much memory/CPU/bandwidth and memory/CPU/bandwidth isn't free.
- It only saves us memory/CPU/bandwidth and memory/CPU/bandwidth is cheap.
See Box 10.2 on page 162 of the book for a complete list

- Qualitative/quantitative, ordered/unordered, discrete/continuous variables
- Good charts should require minimum effort from the reader and provide maximum information with minimum ink
- Use no more than 5-6 curves, select ranges properly, and follow the three-quarter-high rule
- Gantt Charts show utilizations of various components
- Kiviat Graphs show HB and LB metrics alternately on a circular graph
- Schumacher Charts show mean and standard deviations
- Workload, metrics, configuration, and details can always be challenged, so they should be carefully selected

What type of chart (line or bar) would you use to plot:

- CPU usage for 12 months of the year
- CPU usage as a function of time in months
- Number of I/O's to three disk drives: A, B, and C
- Number of I/O's as a function of number of disk drives in a system

- List the problems with the following charts

- A system consists of three resources, called A, B, and C. The measured utilizations are shown in the following table. A zero in a column indicates that the resource is not utilized. Draw a Gantt chart showing utilization profiles.

- The measured values of the eight performance metrics listed in Example 10.2 for a system are: 70%, 10%, 60%, 20%, 80%, 30%, 50%, and 20%. Draw the Kiviat graph and compute its figure of merit.

- For a computer system of your choice, list a number of HB and LB metrics and draw a typical Kiviat graph using data values of your choice.

Ratio Games

- Ratio Game Examples
- Using an Appropriate Ratio Metric
- Using Relative Performance Enhancement
- Ratio Games with Percentages
- Ratio Games Guidelines
- Numerical Conditions for Ratio Games

- Conclusion: 6502 is worse. It takes 4.7% more time than 8080.

1. Ratio of Totals

2. 6502 as the base:

3. 8080 as the base:

- Ratio of Totals: 6502 is worse. It takes 4.7% more time than 8080.
- With 6502 as a base: 6502 is better. It takes 1% less time than 8080.
- With 8080 as a base: 6502 is worse. It takes 6% more time.
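The three methods can be replayed in a few lines. The original execution-time table is not reproduced here. The times below are assumed illustrative values, chosen so that the three methods reproduce the percentages quoted above to within rounding, and the benchmark names are hypothetical. Normalized values are averaged arithmetically.

```python
from statistics import mean

# Illustrative execution times for two benchmarks (assumed values, not the
# book's table; substitute measured data).
times = {
    "6502": {"bench1": 41.16, "bench2": 63.17},
    "8080": {"bench1": 51.50, "bench2": 48.08},
}


def ratio_of_totals(system, other):
    """Total time of `system` divided by total time of `other`."""
    return sum(times[system].values()) / sum(times[other].values())


def mean_normalized(system, base):
    """Arithmetic mean of `system`'s times after dividing by `base`'s time per benchmark."""
    return mean(times[system][b] / times[base][b] for b in times[base])


if __name__ == "__main__":
    # 1. Ratio of totals: about 1.048, so the 6502 takes roughly 4.7-4.8% more time.
    print(ratio_of_totals("6502", "8080"))
    # 2. 6502 as the base: the 8080 averages about 1.006, so the 6502 looks slightly better.
    print(mean_normalized("8080", "6502"))
    # 3. 8080 as the base: the 6502 averages about 1.057, so it looks about 6% worse.
    print(mean_normalized("6502", "8080"))
```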

- Conclusion: RISC-I has the largest code size. The second processor Z8002 requires 9% less code than RISC-I.

- Conclusion: Z8002 has the largest code size; it takes 18% more code than RISC-I. [Peterson and Sequin 1982]


Example:

- Throughput: A is better
- Response Time: A is worse
- Power (throughput/response time): A is better
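Power combines the two conflicting metrics into a single ratio, which settles the comparison. This sketch uses hypothetical numbers in which A has the higher throughput but the worse response time.

```python
def power(throughput: float, response_time: float) -> float:
    """Power metric: throughput divided by response time (higher is better)."""
    return throughput / response_time


if __name__ == "__main__":
    # Hypothetical: A completes 10 jobs/s at 2 s response time; B completes 4 jobs/s at 1 s.
    print(power(10, 2), power(4, 1))  # 5.0 4.0: A has the higher power
```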

- Example: Two floating point accelerators
- Problem: Incomparable bases. Each accelerator's enhancement was measured on a different machine; both need to be tried on the same machine

- Example: Tests on two systems
1. System B is better on both tests

2. System A is better overall.

System A:

System B:
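Pooling pass counts across tests can reverse the per-test ranking, an instance of Simpson's paradox. This sketch uses hypothetical counts chosen to show the reversal, not the table from the slides. The helper names are illustrative.

```python
# Hypothetical (passed, runs) counts for two tests on two systems.
results = {
    "A": {"Test 1": (80, 100), "Test 2": (2, 10)},
    "B": {"Test 1": (9, 10), "Test 2": (30, 100)},
}


def pass_rate(system, test):
    passed, runs = results[system][test]
    return passed / runs


def combined_rate(system):
    """Pass rate over all runs of all tests (counts pooled across tests)."""
    passed = sum(p for p, _ in results[system].values())
    runs = sum(n for _, n in results[system].values())
    return passed / runs


if __name__ == "__main__":
    for test in ("Test 1", "Test 2"):
        print(test, pass_rate("A", test), pass_rate("B", test))  # B higher on each test
    print("Overall", combined_rate("A"), combined_rate("B"))     # A higher overall
```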

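- Other Misuses of Percentages:
- An improvement of 1000% sounds more impressive than an 11-fold improvement, although the two are equal, particularly if the performance values before and after the improvement are both small
- Small sample sizes are disguised in percentages
- The base should be the initial value. A 400% reduction in prices is possible only if the final value is used as the base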
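Both misuses reduce to which value is used as the base of the percentage. This sketch uses hypothetical performance and price figures.

```python
def percent_change(before: float, after: float, base: float) -> float:
    """Change from `before` to `after`, expressed as a percentage of `base`."""
    return 100 * (after - before) / base


if __name__ == "__main__":
    # Performance going from 1 to 11 units (hypothetical): "1000% better" is an 11-fold improvement.
    print(percent_change(1, 11, base=1), 11 / 1)  # 1000.0 11.0
    # Price falling from 500 to 100 (hypothetical).
    print(percent_change(500, 100, base=500))  # -80.0: initial value as the base
    print(percent_change(500, 100, base=100))  # -400.0: the misleading "400% reduction"
```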

- If one system is better on all benchmarks, contradictory conclusions cannot be drawn by any ratio game technique

- Even if one system is better than the other on all benchmarks, a better relative performance can be shown by selecting an appropriate base.
- In the previous example, System A is 40% better than System B using raw data, 43% better using system A as a base, and 42% better using System B as a base.

- If a system is better on some benchmarks and worse on others, contradictory conclusions can be drawn in some cases, but not in all cases.
- If the performance metric is an LB metric, it is better to use your system as the base
- If the performance metric is an HB metric, it is better to use your opponent as the base
- Those benchmarks that perform better on your system should be elongated and those that perform worse should be shortened

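- Raw Data (let $a_i$ and $b_i$ be the response times of A and B on benchmark $i$, $i = 1, \ldots, n$; lower is better)

- A is better than B iff $\sum_{i=1}^{n} b_i > \sum_{i=1}^{n} a_i$

- With A as the Base

- A is better than B iff $\frac{1}{n}\sum_{i=1}^{n} \frac{b_i}{a_i} > 1$

- With B as the base

- A is better than B iff $\frac{1}{n}\sum_{i=1}^{n} \frac{a_i}{b_i} < 1$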

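[Figure: ratio of B/A response on benchmark i (x axis) vs. ratio of B/A response on benchmark j (y axis); the curves for Raw Data, Base A, and Base B separate a region where A is better using all 3 methods, a region where B is better using all 3, and regions where the methods disagree]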
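The three decision rules (raw totals, A as the base, B as the base) can be checked numerically for any set of benchmarks. This sketch assumes the metric is a response time (LB) and that normalized values are averaged arithmetically, as in the 6502/8080 example. The function name and data are hypothetical.

```python
def a_is_better(a, b):
    """Which ratio-game methods declare A better than B.

    a, b: response times (lower is better) of A and B on the same n benchmarks.
    Normalized values are combined with the arithmetic mean.
    """
    n = len(a)
    return {
        "raw data": sum(b) > sum(a),
        "A as base": sum(bi / ai for ai, bi in zip(a, b)) / n > 1,
        "B as base": sum(ai / bi for ai, bi in zip(a, b)) / n < 1,
    }


if __name__ == "__main__":
    # Hypothetical: A is twice as fast on benchmark 1 but twice as slow on benchmark 2.
    print(a_is_better([1, 1], [2, 0.5]))  # methods disagree
    # A is faster on both benchmarks: all three methods agree.
    print(a_is_better([1, 1], [2, 2]))
```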

- Ratio games arise from use of incomparable bases
- Ratios may be part of the metric
- Relative performance enhancements
- Percentages are ratios
- For HB metrics, it is better to use the opponent as the base

- The following table shows execution times of three benchmarks I, J, and K on three systems A, B, and C. Use ratio game techniques to show the superiority of various systems.

- Derive conditions necessary for you to be able to use the technique of combined percentages to your advantage.

- Read Chapters 10 and 11