introduction to dimensional analysis session 2 n.
Download
Skip this Video
Download Presentation
Introduction to Dimensional Analysis Session 2

Loading in 2 Seconds...

play fullscreen
1 / 40

Introduction to Dimensional Analysis Session 2 - PowerPoint PPT Presentation


  • 128 Views
  • Uploaded on

M. D. Metadata Solutions. Introduction to Dimensional Analysis Session 2. 5/20/2005. Dan McCreary President Dan McCreary & Associates dan@danmccreary.com (952) 931-9198. Agenda. General introduction to Data Dictionaries that drive Business Intelligence (BI) concepts and terminology

loader
I am the owner, or an agent authorized to act on behalf of the owner, of the copyrighted work described.
capcha
Download Presentation

PowerPoint Slideshow about 'Introduction to Dimensional Analysis Session 2' - carsten


Download Now An Image/Link below is provided (as is) to download presentation

Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author.While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server.


- - - - - - - - - - - - - - - - - - - - - - - - - - E N D - - - - - - - - - - - - - - - - - - - - - - - - - -
Presentation Transcript
introduction to dimensional analysis session 2

M

D

Metadata Solutions

Introduction to Dimensional AnalysisSession 2

5/20/2005

Dan McCreary

President

Dan McCreary & Associates

dan@danmccreary.com

(952) 931-9198

agenda
Agenda
  • General introduction to Data Dictionaries that drive Business Intelligence (BI) concepts and terminology
  • Understand why Data Dictionaries are so critical in accurate BI
  • Understand how BI looks at the world in different ways
  • Understand how data warehouse concepts and data dictionary impact analysis and research
what is a data warehouse
What is a Data Warehouse?
  • Fast Retrieval
  • Internally Consistent
  • Slice and Dice Capability
  • Easy to “Browse”
  • Complete and Reliable
  • Data Quality Controls
    • GI-GO (Garbage-In, Garbage-Out)

Source: Ralph Kimball

factors driving business intelligence
Factors Driving Business Intelligence
  • Computer process and store twice as much data per dollar every 18 months (Moore’s Law)
  • People can make better decisions if they have tools to quickly see only the data they are interested in seeing
  • People frequently want to analyze data in new ways that was unanticipated by people creating "canned reports"
  • Tools can be designed to allow non-technical (non-SQL programmers) to generate their own reports
  • People have an incredible ability to categorize things base on their properties and attributes but if they don't have consistent definitions of these properties they will not be generating consistent results
the bi iterative process
The BI Iterative Process

Access

Data Warehouse

  • The BI process in an on-going iterative process where the structure of the data warehouse changes based on what data is critical to an organizations business objectives.

Analysis

BI ProjectManagement

Publishing, Change, DataGap Analysis, New DataGathered

Insights, Conclusions andFindings

bi evolution
BI Evolution
  • Shorten the time-to-report interval
  • Allow users to "browse" data sets interactively
  • Remove programmers with "backlogs" of reports
  • Users frequently waited days, weeks for months to get a custom report created

Increasing Responsiveness

Monthly Green Bar Reports

BrowseableGraphical Interface

dimensions of bi
Dimensions of BI

Technical SophisticationRequired

Low

(analysts)

Highly Responsive to "What If" Scenarios

Degree ofEnd User Control

High

(programmers)

Few Dimensions

few parameters, few filters

Many Dimensions

many variables

overlapping terminology
Overlapping Terminology

Data Mining

Statistical Analysis

Data Dictionaries

Data Modeling

Semantics

PatternDiscovery

Business

Intelligence

Data

Warehousing

Transaction

Processing

(OLTP)

Indexing

Dimensional

Analysis

Aggregates

Data Storage

(RDBMS)

key terms covered in this class
Key Terms Covered in This Class
  • Properties
  • Dimension
  • Aggregation and Levels
  • Enumerations of Categorical Data
  • Labeling Categories
  • Giving precise definitions to Labels
  • Dimension Hierarchies and Levels
  • Cubes
  • Measures
  • Filters
  • Data Warehouse Presentation
things have many properties
Things Have Many "Properties"

People are very good at recognizing and sorting things by their properties.

sorting by an object s property
Sorting by an Object's Property
  • Sort objects by their color
sorting by a property
Sorting by A Property
  • Sort objects by their shape
sorting by color and shape
Sorting by Color AND Shape

Shape “Dimension”

Color “Dimension”

dimensional analysis
Dimensional Analysis
  • The science of figuring out intuitive ways that people want to categorize information using independent variables to graphically filter and browse their data
dimension
Dimension
  • List of categories used to partition the information based on a property of the objects
  • Dimension Names: Color, Shape
labels
Labels
  • A name given to a non-overlapping category within a dimensions

Labels

"red"

"blue"

"green"

enumeration
Enumeration
  • Whenever we decide to break the continuous observable world into a predefined list of categories when each category has a label we call this an "enumerated value domain". These will then become the "dimensions" of our cube.

"green"

"red"

"blue"

Note: NO OVERLAP!

Statisticians call this type of "categorical data" and it requires the categories to be non-overlapping.

the challenge of semantic classification
The Challenge of Semantic Classification
  • People are good at sorting based on a property they see
  • People are good at assigning names to a property type
  • People usually come up with different names for properties
  • Some dimensions people easily agree on
  • Some are very difficult to classifyand even more difficult go get peopleto agree on a non-overlapping classification system

"Red Circle"

"Polygon"

"Square"

"Blue"

"Green"

"Blue-Green"

What happens with a small percentage of data does not quite fit into a discrete category?

level
Level
  • A layer of "aggregation" within a single dimension – categorization of properties

All Shapes

Shapes

With Curves

Shapes

Without Curves

Levels

Circle

Heart

Moon

Square

Trapezoid

Star

Diamond

measures example weight
Measures (example weight)

5.7

7.4

3.5

6.6

10

1.1

8.2

6.1

5.5

3.8

9.1

9.3

2.6

8.4

A measure is any property that you can perform math on (sums, averages).

measures
Measures
  • Something that you can do math on.

-

X

+

%

/

sum

average

sample object fact table
Sample Object "Fact Table"

Measures tend to have

data types of integers and

floating point numbers.

Note that categorical data can not beadded together. But we can count thefrequencies of items with a category!

shape dimension
Shape Dimension

Note that there is no reference to "Has Curves" in the prior table. "HasCurves" is a property of the shape value domain because it can be "inferred"from the shape of the object.

Some categorical definitions use "exclusionary" language.Note that "Has Curves" also must have a precise definition in the data dictionary.

facts and dimension
Facts and Dimension

Shape Facts

Color Dim

Color_FK

Color Name

Shape_FK

Shape Dim

Weight

Has Curves

Shape Name

Note that "Has curves" does not need to be in the central fact table.It is a property of the shape!

adding dimensions
Adding Dimensions

5.7

7.4

3.5

6.6

10

1.1

8.2

6.1

5.5

3.8

9.1

9.3

2.6

8.4

We have now added a 3rd dimension – "Dash Style"

each new property is another dimension
Each New Property is Another Dimension

Shape Facts

Color Dim

Color_FK

Shape Dim

Color Code

Shape_FK

Has Curves

DashStyle_FK

Shape Code

Weight

DashStype Dim

Shape

filters
Filters

5.7

7.4

3.5

6.6

10

1.1

8.2

6.1

5.5

3.8

9.1

9.3

2.6

8.4

A filter will exclude all objects with a specified property.

For example we can exclude all shapes with a property of "Circle"

example discarding invalid scores
Example: Discarding Invalid Scores

This example filter removes all scores EXCEPT the valid scores.

selecting only scale scores
Selecting Only Scale Scores

This filter removes all scores EXCEPT the assessments Scale Scoreusing the Test Score Type dimension.

the star schema

Dim1

Dim2

Dim4

Dim3

Dim5

PK

PK

PK

PK

PK

Cat1

Cat1

Cat1

Cat1

Cat1

Facts

Primary Key

Cat2

Cat2

Cat2

Cat2

Cat2

Foreign Key

Cat3

Cat3

Cat3

Cat3

Cat3

Foreign Key

Foreign Key

Foreign Key

Foreign Key

Measure1

Measure1

The Star Schema
adding measures
Adding Measures

Shape Facts

Color Dim

Color_FK

ColorCode

Shape_FK

Shape Dim

DashStyle_FK

ShapeCode

WeightValue

DashStype Dim

Measures are

Integers or floats

that you can perform

math on.

HeightValue

ShapeCode

PriceAmount

DensityValue

Measures can be easily be added to the fact table without changing any of the dimensions.

slide32

5.7

7.4

3.5

6.6

10

1.1

8.2

6.1

5.5

3.8

9.1

9.3

2.6

8.4

Cube
  • A Cube is a pre-built structure that has facts and many dimensions (not necessarily just three)
  • Designed to have averages and sums for most levels "pre-calculated" to make analysis fast

Color Dimension

Dash-Style Dimension

Shape Dimension

build a mental model

5.7

7.4

3.5

6.6

10

1.1

8.2

6.1

5.5

3.8

9.1

9.3

2.6

8.4

Build a Mental Model

(aka "Page Fields")

Filter

Funnel

Measure = count

Vertical Dimension (rows)

Presentation

Horizontal Dimension (columns)

using cubes in excel
Using Cubes in Excel

Filter Dropped Here

Measures Dropped Here

Row and Column Dropped Here

count of year vs assessment name
Count of Year vs. Assessment Name

The measure is the count of records in the cube.

The Column Dimension is the "Fiscal Year"

There are around

25 million

test

results

The Row Dimension is the "Test Name".

conformed dimensions
Conformed Dimensions
  • When building many cubes, there is a large benefit to "reusing" dimensions
  • Commonly reused dimensions
    • Time (Fiscal Year, Quarter)
    • Organization (School, District)
    • Expense Category
    • Student
slide37

Each bar represents the

sum of all the expendituresin the category (Expenditures

On girls athletics for the fiscalyear 1991)

sample of national conformed dimensions
Sample of National Conformed Dimensions

School Incident Data

School Technology

Assessment

Organization

Financial

Student

Teacher

Claims

Process

Date

Student Attendance

Student Assessment

District Financial Reporting

School and District Status

Teacher Licensing

School Food and Nutrition

Student Disciplinary Reporting

Student Safety Reporting

District Technology Planning

role of data architecture
Role of Data Architecture
  • Facilitate how business users want to identify and categorize data
  • Assist in the creation and documentation of categorical value domains and measures
  • Creation of machine-readable data dictionaries for use in building data warehouse structures
summary
Summary
  • We found a way for non-SQL programmers to analyze complex data by looking at one dimension at a time
  • Users don't have to memorize "codes"
  • Users do need to understand how continuous data is mapped into categories and what the labels on these categories mean