Data Quality Management

Geospatial errors can cause real-life problems!

http://www.brownsmarina.com/fun.html

CS 128/ES 228 - Lecture 14a

One management strategy …

Murphy’s Law

Ignoring data quality issues usually doesn’t work very well

Some geospatial goofs

This one’s worse…

Mars Climate Orbiter (MCO) was lost on 23 Sep 1999 when it failed to enter orbit around Mars and instead crashed into the planet. The loss destroyed the $125 million craft, part of a $328 million mission.

http://www.boeing.com/companyoffices/gallery/images/space/d2_mars_climate_orbiter_01.htm

The root cause of the failure was a computer program that was supposed to provide its output in newton seconds (N·s) but instead provided pound-force seconds (lbf·s).

http://lamar.colostate.edu/~hillger/unit-mixups.html#mco

And these are really bad!

Just a 'map error'?

The China Daily website carries a cartoon of the damaged US plane at Hainan Island's airbase and asks sarcastically if Sunday's collision "might be due to another map error", a reference to the US bombing of the Chinese embassy in Belgrade in 1999. "Last time it's due to a map error, and this time another map error? What about the next?"

http://news.bbc.co.uk/1/hi/world/monitoring/media_reports/1260185.stm

What is error?
  • “Error is the physical difference between the real world and the GIS facsimile”

-Heywood, Cornelius, & Carver, p. 301

  • Errors are impossible to avoid, but can be managed (chapt. 10)

A Data Management Model

Data acquisition

Data representation & analysis

Data outputs

Data acquisition errors

Scientists use the term “error” for two very different concepts:

  • natural variability
  • actual mistakes

Take a sidewalk …

Suppose its width is about 1 ¾ m. What happens if we measure the width at 3 points?

  • “Error” (natural variability): measured widths = 1.77, 1.69, 1.82 m
  • “Error” (actual mistake): mean of the 3 measurements = 1.67 ft

How can you tell this average is wrong?
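
One way to catch a mistake like this is a simple range check: a mean can never fall outside the smallest and largest values it averages. The sketch below applies that check to the three widths from the slide. The function name `mean_is_plausible` is hypothetical, and the check is only one of several possible tests.

```python
from statistics import mean

widths_m = [1.77, 1.69, 1.82]   # measured sidewalk widths, in meters
reported_mean = 1.67            # the value reported on the slide (labeled "ft")

def mean_is_plausible(values, reported):
    """A mean must lie between the minimum and maximum of its inputs."""
    return min(values) <= reported <= max(values)

actual_mean = mean(widths_m)
plausible = mean_is_plausible(widths_m, reported_mean)
print(f"actual mean = {actual_mean:.2f} m, reported value plausible: {plausible}")
```

The check flags the reported value without needing to know the right answer. The unit label (ft where the measurements are in m) is a second, independent clue.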

Accuracy vs. Precision

Figure 10.1, An Introduction to Geographic Information Systems by Heywood, Cornelius, and Carver

Random error vs. Bias

Where does lack of precision come from?
  • Natural variability
  • Imprecise equipment
  • Sloppy measurement
  • Accumulated error

Random error is often “normal”

[Figure: normal (bell-shaped) curve with the mean and the standard deviation labeled]

95% of observations ±2 s.d.

[Figure: normal curve marked at the mean, mean − 2 s.d., and mean + 2 s.d.]
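
The 95% figure can be checked numerically. For a normal distribution, the fraction of values within k standard deviations of the mean is erf(k/√2). The short sketch below evaluates that expression with Python's standard library; the function name `fraction_within` is hypothetical.

```python
import math

def fraction_within(k):
    """Fraction of a normal distribution lying within k standard deviations of the mean."""
    return math.erf(k / math.sqrt(2))

for k in (1, 2, 3):
    print(f"within ±{k} s.d.: {fraction_within(k):.4f}")
```

For k = 2 the result is about 0.9545, so "95% within ±2 s.d." is a convenient rounding. This applies only to the extent that the errors really are normally distributed.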

Means have smaller variability than single measurements

S.E. (mean) = standard deviation / √n

If n = 4, √n = ?

What n is needed to reduce the S. E. to ¼ of the std. deviation?
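
A minimal sketch of the standard-error formula, useful for checking answers to the two questions above. The function name `standard_error` and the standard deviation of 1.0 are illustrative assumptions.

```python
import math

def standard_error(sd, n):
    """Standard error of the mean of n independent measurements with standard deviation sd."""
    return sd / math.sqrt(n)

sd = 1.0  # illustrative standard deviation, in arbitrary units
for n in (1, 4, 16):
    print(f"n = {n:2d}: S.E. = {standard_error(sd, n):.2f} x s.d.")
```

Because the standard error shrinks with the square root of n, halving it takes four times as many measurements.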

Where does lack of accuracy come from?
  • Dubious source data
  • Incompatible source data: data collected at different times, through different methods, possibly in different formats (e.g. different datums or coordinate systems)
  • Bias (instrumental or operator)

How can we fix it?
  • Benchmarks

e.g. the National Geodetic Survey maintains a database of survey “monuments” at http://www.ngs.noaa.gov/cgi-bin/datasheet.prl

  • Otherwise – just measure variability

http://upload.wikimedia.org/wikipedia/commons/thumb/6/66/USCGS-E134.jpg/617px-USCGS-E134.jpg

Data representation errors
  • Transference error
  • Data storage errors
  • Analysis errors

Where does transference error come from?
  • Typos, etc.
    • Less likely with automated data collection and transformation
    • Can be prevented through diligence and software “sanity” checks
  • Format conversion
    • Many inter-format conversions cause loss/corruption of data/information
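
A software “sanity” check can be as simple as testing whether each value lies in its physically possible range. The sketch below does this for latitude/longitude pairs in decimal degrees. The function name `check_coordinate` and the sample records are hypothetical, and a real project would likely add checks specific to its own study area.

```python
def check_coordinate(lat, lon):
    """Return a list of problems found in a latitude/longitude pair (decimal degrees)."""
    problems = []
    if not -90.0 <= lat <= 90.0:
        problems.append(f"latitude {lat} outside [-90, 90]")
    if not -180.0 <= lon <= 180.0:
        problems.append(f"longitude {lon} outside [-180, 180]")
    if lat == 0.0 and lon == 0.0:
        problems.append("coordinate is exactly (0, 0); possibly a missing value")
    return problems

# Hypothetical records: one plausible, one with a misplaced decimal point, one empty.
records = [(42.08, -78.48), (420.8, -78.48), (0.0, 0.0)]
for lat, lon in records:
    issues = check_coordinate(lat, lon)
    print((lat, lon), "OK" if not issues else issues)
```

Range checks like these catch only gross typos. A latitude and longitude that have been swapped, for example, can still pass if both values happen to fall in range.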

Data checking

Something got lost in the translation
  • “geographic information systems is an interesting course”
  • “지리적인 정보 시스템은 재미있는 과정 이다 ”
  • “The geography information system is the process which is fun”

Thanks to http://babelfish.altavista.com/babelfish/tr

Raster/vector conversions

Aliasing is an intrinsic problem of GISs
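
A rough sketch of why a raster round trip loses positional detail: a point converted to a grid cell and back comes out at the cell center, not at its original location. The 30 m cell size, the coordinates, and the function names are all illustrative assumptions, and the example covers only single points, not the stair-step effect on lines and polygon edges.

```python
import math

def rasterize_point(x, y, cell_size):
    """Return the (col, row) index of the grid cell containing (x, y)."""
    return math.floor(x / cell_size), math.floor(y / cell_size)

def vectorize_cell(col, row, cell_size):
    """Return the center coordinates of a grid cell."""
    return (col + 0.5) * cell_size, (row + 0.5) * cell_size

cell_size = 30.0                 # hypothetical 30 m raster
original = (1234.0, 5678.0)      # hypothetical point, in meters
cell = rasterize_point(*original, cell_size)
recovered = vectorize_cell(*cell, cell_size)
shift = math.dist(original, recovered)
worst_case = cell_size * math.sqrt(2) / 2   # half the cell diagonal

print(f"cell {cell}, recovered {recovered}, shifted {shift:.2f} m")
print(f"worst-case shift for this cell size: {worst_case:.2f} m")
```

The displacement can never exceed half the cell diagonal, so a finer grid reduces the error but does not remove it.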

Digitization errors

Georeferencing

Raster to vector conversion

Topology errors

Figure 10.5, An Introduction to Geographic Information Systems by Heywood, Cornelius, and Carver

Data storage/retrieval errors

Hardware failure

Hardware limitations

What is a hardware limitation?
  • Numbers in a computer are stored in a finite number of bits.
  • Using too few bits can cause round-off error.

Box 9.2, Principles of Geographic Information Systems by Burrough and McDonnell
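
To make round-off concrete, the sketch below stores a coordinate in 32 bits and reads it back. The northing value is hypothetical, and the helper `as_float32` is an illustrative name; it uses the standard `struct` module to force single-precision storage.

```python
import struct

def as_float32(x):
    """Round-trip a Python float (64-bit) through 32-bit floating-point storage."""
    return struct.unpack("f", struct.pack("f", x))[0]

northing = 4512345.678           # hypothetical UTM northing, in meters
stored = as_float32(northing)
error = stored - northing

print(f"original {northing}, stored as {stored}, error {error:+.3f} m")
```

A 32-bit float carries 24 significant bits, so values of this magnitude can only be stored in steps of 0.5 m. Millimeter-level survey coordinates need 64-bit storage or a local origin that keeps the numbers small.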

Where do errors of data rot come from?
  • Link rot

Not Found

The requested URL /cs/dlevine/ was not found on this server.

Apache/1.3.27 Server at www.xxx.edu Port 80

  • Poor “style”
    • E.g. “Employees may appeal to Sr. Carney” as opposed to “Employees may appeal to the President of the University”

Where do errors of analysis come from?

How long do you have? …

  • Mistaken queries
  • Analyzing layers with different datums or coordinate systems
  • Comparing attributes with incompatible units
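
One defensive habit against unit mix-ups is to carry the unit alongside every value and convert explicitly before comparing. The sketch below does this for impulse, the quantity involved in the Mars Climate Orbiter loss. The function name and the unit strings are hypothetical; the conversion factor follows from the definition of the pound-force.

```python
LBF_TO_NEWTON = 4.4482216152605   # newtons per pound-force

def to_newton_seconds(value, unit):
    """Convert an impulse to newton-seconds, refusing units it does not recognize."""
    if unit == "N*s":
        return value
    if unit == "lbf*s":
        return value * LBF_TO_NEWTON
    raise ValueError(f"unknown impulse unit: {unit!r}")

# Two hypothetical readings of the same quantity from different programs.
reading_a = (10.0, "lbf*s")
reading_b = (44.5, "N*s")
a_ns = to_newton_seconds(*reading_a)
b_ns = to_newton_seconds(*reading_b)
print(f"{a_ns:.2f} N*s vs {b_ns:.2f} N*s")
```

Raising an error for an unrecognized unit is a deliberate choice here: a program that stops is easier to diagnose than one that silently treats every number as if it were already in the expected unit.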

More errors of analysis …
  • Inappropriate resolution
  • Combining rasters/vectors with different resolutions
  • Using exact/abrupt surface fits when approx./gradual is appropriate (or vice versa)

Data output errors
  • Maps
  • Reports

Junket at taxpayers’ expense?

Did a politician misuse federal funds to visit Alaska on the way to official business in Japan?

Muehrcke, Map Use, 2nd ed., p. 395

No - Intentional map error*

*More like lying with maps!

Muehrcke, Map Use, 2nd ed., p. 395

Should maps be as accurate as possible?
  • Map simplification
    • Features are omitted
    • Area features become lines or points
  • Exaggeration
    • Features’ apparent size is “increased” (e.g. hydrants)
    • Features’ separation is increased on the map for visibility

Must MapQuest be accurate?

Reporting significance of findings
  • What does the term “significant” mean to scientists?
  • Hypothesis testing

Are two means really different?

These two normal distributions have a very large overlap. The means of the two populations are not significantly different, because the overlap is > 5% of the area under the curves. t would be very small.

http://www.steve.gb.com/science/statistics.html#t

What about these two means?

http://www.steve.gb.com/science/statistics.html#t

These means are also significantly different - why?

http://www.steve.gb.com/science/statistics.html#t

How do we actually test for statistical differences?

Student’s t-test

t = (difference in means) / (measure of variability)
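
The slide does not say which measure of variability goes in the denominator. A common choice, assumed here, is the pooled-variance form for two independent samples with equal variances. The sketch below computes t that way; the function name `pooled_t` and the sample data are hypothetical.

```python
import math
from statistics import mean, variance

def pooled_t(sample_a, sample_b):
    """Student's t statistic for two independent samples, assuming equal variances."""
    n_a, n_b = len(sample_a), len(sample_b)
    # Pooled variance: the two sample variances weighted by their degrees of freedom.
    pooled_var = ((n_a - 1) * variance(sample_a)
                  + (n_b - 1) * variance(sample_b)) / (n_a + n_b - 2)
    # Standard error of the difference between the two means.
    se_diff = math.sqrt(pooled_var * (1 / n_a + 1 / n_b))
    return (mean(sample_a) - mean(sample_b)) / se_diff

# Hypothetical widths (m) measured on two different sidewalks.
sidewalk_a = [1.77, 1.69, 1.82]
sidewalk_b = [1.95, 2.02, 1.90]
t = pooled_t(sidewalk_a, sidewalk_b)
degrees_of_freedom = len(sidewalk_a) + len(sidewalk_b) - 2
print(f"t = {t:.2f} with {degrees_of_freedom} degrees of freedom")
```

To decide significance, the absolute value of t is compared with a critical value from a t table. For 4 degrees of freedom at the two-tailed 5% level that value is 2.776.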

Three Commandments of Data Reporting
  • Thou Shalt Not …
    • Report insignificant digits (or omit significant trailing zeros)
    • Report means without also reporting sample sizes and variability
    • Report results as “significant” (or even worth talking about) without doing the appropriate statistical tests.
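
A small sketch of the first two rules in practice: report the mean together with its variability and sample size, all rounded to the precision of the measurements. The function name `summarize` is hypothetical, and the data are the sidewalk widths used earlier in the lecture.

```python
from statistics import mean, stdev

def summarize(values, unit, decimals):
    """Format a sample as 'mean ± s.d. unit (s.d., n = N)' at a fixed number of decimals."""
    m = mean(values)
    s = stdev(values)
    return f"{m:.{decimals}f} ± {s:.{decimals}f} {unit} (s.d., n = {len(values)})"

report = summarize([1.77, 1.69, 1.82], "m", 2)
print(report)
```

Fixed-decimal formatting also keeps significant trailing zeros: a mean of 1.7 measured to the centimeter is printed as 1.70, not 1.7.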

How do we minimize (NOT avoid) error?

“CONSTANT VIGILANCE”

http://news.bbc.co.uk/1/shared/spl/hi/pop_ups/05/entertainment_goblet_of_fire/html/3.stm

-- “Mad Eye” Moody

Defense Against The Dark Arts Instructor

Hogwarts School of Witchcraft and Wizardry
