GG 313, Fall 2005.
Text: Introduction to Geological Data Analysis, Swan and Sandilands (OPTIONAL – Bookstore says it’s out of print!)
Class Notes: PDF file. Go to: http://www.higp.hawaii.edu/~cecily/courses/gg313.html
Download Lecture supplements: 1. Lecture Notes PDF
This site also includes many files that will be used extensively in the course.
Web Page: Not yet
20% mid-term exam
20% final exam
Late homework will suffer!
Need access to computer room? Need an account?
See Susan in the Geology Office POST 801
Modifications by Cecily Wolfe and Rob Dunn.
Many thanks for their help and for access to their notes.
• Geosciences are very data intensive
• Jobs require quantitative skills
• Need to test hypotheses
• Need to defend arguments
You need to take this course if
• You plan to do any data analysis
• You want to understand the analyses of others
• You want to be prepared for a career in science
Don’t be intimidated - ASK QUESTIONS ANY TIME.
Don’t be surprised if I can’t answer your question -
I’m learning, too.
Analyze: to examine and study in detail to determine properties and essential features.
Why do we analyze data?
To pass information to others, to simplify and compress information into useful quantities, to make your experiment understandable in a broader context. Extraction of “signals” from “noise”. We know what information we want to pass on – analysis allows us to do it efficiently.
Do all data need to be analyzed? No – some data are clear enough in context to be understood without analysis.
With difficulty. Be aware of problems and subterfuge in analysis.
BIAS is rampant in science, and it is often unconscious. Bias toward a particular hypothesis can warp thinking.
Hypothesis testing should always be aimed at DISCREDITING hypotheses, not proving them.
• Earthquake recurrence data
• Ash thickness
• Rock chemistry
• Sample age
• Seismic noise level
• Other examples from your research?
Discrete data have distinct outcomes - like rolling dice, counts of items or groups.
“Ordinal” data are ranked but without a constant interval between ranks, as in the Mohs hardness scale.
“Nominal” data are arbitrary labels with no natural order, such as color or lithology.
Continuous data have a large range of values without breaks, with an infinite number of possible values in a given range.
• Voltage across a resistor
• Percentage depletion of an oil field
• The value of gravity
Time and location are the INDEPENDENT variables, since their values do not depend on other values.
ANALOG data: data that have continuous values, such as the Earth’s magnetic field.
No matter how finely you measure, there is always a finer value in between.
WHAT KIND OF DATA IS DNA?
To be analyzed on a computer, analog data must first be DIGITIZED.
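Digitizing combines two steps: sampling the analog signal at discrete times, and rounding each sample to one of a finite number of levels (quantization). A minimal Python sketch, assuming a hypothetical `digitize` helper, a sine wave standing in for a real analog sensor, and an idealized converter with `n_bits` bits spanning ±`full_scale`; real instruments add their own noise and nonlinearity:

```python
import math


def digitize(signal, t_start, t_end, dt, n_bits, full_scale):
    """Sample a continuous signal every dt and quantize each sample.

    signal     -- callable returning the analog value at time t
                  (hypothetical stand-in for a real sensor)
    n_bits     -- resolution of an idealized analog-to-digital converter
    full_scale -- the converter spans [-full_scale, +full_scale]
    """
    n_levels = 2 ** n_bits
    step = 2 * full_scale / (n_levels - 1)               # spacing between levels
    n_samples = int(round((t_end - t_start) / dt)) + 1
    times, values = [], []
    for i in range(n_samples):
        t = t_start + i * dt                              # sampling: discrete times
        x = max(-full_scale, min(full_scale, signal(t)))  # clip to converter range
        level = round((x + full_scale) / step)            # quantization: nearest level
        times.append(t)
        values.append(-full_scale + level * step)
    return times, values


# A 1 Hz sine wave sampled 4 times per second with a 3-bit converter.
times, values = digitize(lambda t: math.sin(2 * math.pi * t), 0.0, 1.0, 0.25, 3, 1.0)
for t, v in zip(times, values):
    print(f"t = {t:.2f} s  analog = {math.sin(2 * math.pi * t):+.3f}  digital = {v:+.3f}")
```

Each digital value differs from the analog value by at most half a quantization step; more bits shrink that error, and a smaller `dt` captures faster changes.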
The data DOMAIN is the region over which the data are defined, such as a time interval or a map quadrangle.
The data RANGE is the set of values that are recorded.
We normally use a log scale to measure range, with the common unit of the decibel (dB).
A factor of 10 change in AMPLITUDE is equal to 20 dB. Because power is proportional to amplitude squared, a factor of 10 in POWER is 10 dB, and a factor of 100 in POWER is 20 dB.
This is slightly different from the definition in the notes. The ratio shown is a value (A) divided by a reference (A0), expressed in decibels as
$$DR = 20\log_{10}\left(\frac{A}{A_0}\right)\ \text{dB}$$
The value of A can be larger or smaller than A0.
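The two decibel conventions can be compared directly. A minimal Python sketch (the function names are illustrative, not from the course notes): amplitude ratios use 20 log10, power ratios use 10 log10, so a factor of 10 in amplitude and a factor of 100 in power both come out to 20 dB.

```python
import math


def amplitude_db(ratio):
    """Decibels for a ratio of amplitudes (voltage, pressure, displacement)."""
    return 20.0 * math.log10(ratio)


def power_db(ratio):
    """Decibels for a ratio of powers; power goes as amplitude squared."""
    return 10.0 * math.log10(ratio)


print(amplitude_db(10))   # 20.0
print(power_db(10))       # 10.0
print(power_db(100))      # 20.0: same change as a factor of 10 in amplitude
```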
Consider A=2 and A0=1. What is DR?
What if A=1/2 and A0=1. What is DR?
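Working the two cases numerically, assuming the amplitude form DR = 20 log10(A/A0) (the helper name `dynamic_range_db` is ours, for illustration):

```python
import math


def dynamic_range_db(a, a0):
    """DR = 20*log10(A/A0) in dB, treating A and A0 as amplitudes."""
    return 20.0 * math.log10(a / a0)


print(round(dynamic_range_db(2.0, 1.0), 2))   # 6.02
print(round(dynamic_range_db(0.5, 1.0), 2))   # -6.02
```

Doubling an amplitude adds about 6 dB and halving it subtracts about 6 dB; a value smaller than the reference gives a negative number of decibels.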
DR stands for “Dynamic Range” when A0 is the smallest value we can observe and A is the largest value.
You are offered a headset that is advertised to have a dynamic range of 24 dB. How much larger is the loudest sound than the quietest sound that this headset can handle?
Should you buy it?
Your ear can hear sounds from about $10^{-9}$ Pa up to ~$10^{3}$ Pa. What is the dynamic range of your ear?
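Both questions reduce to the same formula run in opposite directions. A minimal Python sketch, assuming the headset’s advertised 24 dB refers to sound pressure (an amplitude, so 20 log10 applies) and using the pressure limits quoted above for the ear; the helper names are hypothetical:

```python
import math


def db_to_amplitude_ratio(db):
    """Invert DR = 20*log10(A/A0): the amplitude ratio A/A0 for a given dB."""
    return 10.0 ** (db / 20.0)


def dynamic_range_db(a, a0):
    """DR = 20*log10(A/A0) in dB for amplitudes such as sound pressure."""
    return 20.0 * math.log10(a / a0)


headset_ratio = db_to_amplitude_ratio(24.0)
ear_db = dynamic_range_db(1e3, 1e-9)   # pressure limits quoted above, in Pa

print(f"Headset: loudest/quietest pressure ratio is about {headset_ratio:.1f}")
print(f"Ear: dynamic range is about {ear_db:.0f} dB")
```

This gives a pressure ratio of about 15.8 for the headset and about 240 dB for the ear. If the advertised 24 dB referred to power instead, the ratio would be $10^{2.4}$, roughly 250. Either way the headset covers a far smaller range than the ear, which is worth weighing before buying.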
We’ll talk about at least two kinds of frequency in this course. The DATA FREQUENCY is how often the data repeat with respect to time or another independent variable (tides, for example).
We’ll also talk about statistical frequency - how often a particular outcome is observed - when we talk about statistics.
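The two meanings of frequency can be contrasted in a few lines of Python. The tidal period is the standard value for the principal lunar semidiurnal (M2) tide; the lithology list is invented purely for illustration:

```python
from collections import Counter

# Data frequency: repeats per unit of the independent variable (here, time).
# The principal lunar semidiurnal tide (M2) has a period of about 12.42 hours.
m2_period_hours = 12.42
m2_cycles_per_day = 24.0 / m2_period_hours
print(f"M2 tide: {m2_cycles_per_day:.2f} cycles per day")   # 1.93

# Statistical frequency: how often each outcome is observed.
# Hypothetical lithology log, for illustration only.
lithologies = ["basalt", "tuff", "basalt", "ash", "basalt", "tuff"]
counts = Counter(lithologies)
for rock, n in counts.most_common():
    print(f"{rock}: {n} of {len(lithologies)} ({n / len(lithologies):.2f})")
```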
Noise is any part of a signal that is not desired. Noise is almost always a factor in time series. We can define a Signal-to-Noise ratio: S/N
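One common way to quantify S/N is the ratio of root-mean-square (RMS) amplitudes, often quoted in decibels. The sketch below assumes a synthetic 1 Hz sine wave plus Gaussian noise, so both parts are known exactly; with real data the noise level usually has to be estimated, for example from a quiet interval before the signal arrives:

```python
import math
import random


def rms(xs):
    """Root-mean-square amplitude of a sequence."""
    return math.sqrt(sum(x * x for x in xs) / len(xs))


random.seed(0)   # fixed seed so the example is repeatable
dt = 0.01        # sample interval, s
n = 1000         # 10 s of data: exactly 10 cycles of the 1 Hz signal
signal = [math.sin(2 * math.pi * i * dt) for i in range(n)]   # wanted part, amplitude 1
noise = [random.gauss(0.0, 0.1) for _ in range(n)]           # unwanted part, std 0.1
# A real recording would be signal[i] + noise[i]; here the parts are known separately.

snr = rms(signal) / rms(noise)      # ratio of RMS amplitudes
snr_db = 20.0 * math.log10(snr)     # amplitude ratio, so 20*log10
print(f"S/N is about {snr:.1f}, or {snr_db:.1f} dB")
```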
The characteristics of noise are often as important as the characteristics of the signal.
The instruments we use to observe the characteristics of our data are always limited in two important ways - precision and accuracy.
An accurate measurement is close to the true value. A precise measurement has little scatter.
A rifleman shoots at targets with two different weapons. The gun used on the left is accurate but has poor precision. The one used on the right has good precision but has poor accuracy.
Often, we can’t see our target well when analyzing data, and measures of precision and accuracy are critical to establishing the validity of our results. Importance of CALIBRATION.
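Accuracy and precision can be put into numbers when repeated measurements of a known standard are available: the mean offset from the true value (bias) measures accuracy, and the standard deviation measures precision. A minimal sketch with hypothetical readings that mimic the two rifles; the helper name and the data are illustrative only:

```python
import statistics


def accuracy_and_precision(measurements, true_value):
    """Return (bias, spread) for repeated measurements of a known quantity.

    bias   = mean - true_value          (accuracy: how far off on average)
    spread = sample standard deviation  (precision: how much scatter)
    """
    bias = statistics.mean(measurements) - true_value
    spread = statistics.stdev(measurements)
    return bias, spread


# Hypothetical repeated readings of a calibration standard whose true value is 10.0
instrument_a = [9.2, 10.9, 10.4, 9.1, 10.5]    # accurate on average, but scattered
instrument_b = [11.5, 11.6, 11.4, 11.5, 11.5]  # tightly clustered, but offset

for name, readings in [("A", instrument_a), ("B", instrument_b)]:
    bias, spread = accuracy_and_precision(readings, 10.0)
    print(f"{name}: bias = {bias:+.2f}, spread = {spread:.2f}")

# Calibration: once the bias of B is known, subtract it from B's readings.
bias_b, _ = accuracy_and_precision(instrument_b, 10.0)
corrected_b = [x - bias_b for x in instrument_b]
```

Subtracting the bias fixes accuracy but leaves precision unchanged; scatter can only be reduced by a better instrument or by averaging more measurements.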
The values of some data are completely determined by some natural law - such as the freezing temperature of pure water at standard pressure. These are deterministic properties.
Other data have no structure or patterns, and are called chaotic or random.
Most data have some random components which make individual values impossible to predict. These are probabilistic or stochastic phenomena.
We’re after an understanding of some natural phenomenon. Steps that might lead to this:
• HYPOTHESIS - what do we think is happening?
• Devise an experiment
• Collect data
• Perform exploratory analysis
• Reduce the data to relevant information
• Compare results with hypothesis
Defining your hypothesis and designing your experiment are critical to the success of science, but the details are specific to each problem.
Data collection must be done carefully to maximize accuracy and minimize noise.
Exploratory data analysis is done to get familiar with the data, establish its validity, and see if we can spot trends and patterns. This involves PLOTTING raw data.
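Alongside plotting, a few summary numbers help with the "establish its validity" step. A minimal sketch with hypothetical ash-thickness values; the 5-MAD cutoff for flagging a value is an arbitrary assumption, and a flagged point is a cue to go back to the raw data and the plot, not to delete it automatically:

```python
import statistics

# Hypothetical ash-thickness measurements in cm; 152.0 may be a misplaced decimal.
thickness_cm = [12.1, 14.3, 13.8, 15.2, 152.0, 13.1, 14.7]

mean = statistics.mean(thickness_cm)
median = statistics.median(thickness_cm)
# Median absolute deviation: a measure of spread that one bad value barely moves.
mad = statistics.median([abs(x - median) for x in thickness_cm])

# Flag values more than 5 MADs from the median (the cutoff is an arbitrary choice).
suspects = [x for x in thickness_cm if abs(x - median) > 5 * mad]

print(f"n = {len(thickness_cm)}, min = {min(thickness_cm)}, max = {max(thickness_cm)}")
print(f"mean = {mean:.1f}, median = {median:.1f}")
print("check these values:", suspects)
```

Here the mean (about 33.6 cm) sits far above the median (14.3 cm), which by itself signals that one value is pulling the average.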
Data reduction removes irrelevant parameters and involves MODELS of how the relevant parameters might behave.
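As one example of reducing data with a model: if age is assumed to increase linearly with depth in a core, five (depth, age) pairs reduce to two parameters, an intercept and a slope, plus residuals that show how well the model fits. The sketch below uses ordinary least squares, which is one common choice rather than the course’s prescribed method; `fit_line` and the core data are hypothetical:

```python
def fit_line(x, y):
    """Ordinary least-squares fit of the model y = a + b*x; returns (a, b)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    b = sxy / sxx
    a = mean_y - b * mean_x
    return a, b


# Hypothetical sediment core: depth (m) and sample age (kyr).
depth_m = [0.0, 1.0, 2.0, 3.0, 4.0]
age_kyr = [0.1, 2.0, 4.1, 5.9, 8.0]

a, b = fit_line(depth_m, age_kyr)
residuals = [yi - (a + b * xi) for xi, yi in zip(depth_m, age_kyr)]
print(f"age = {a:.2f} + {b:.2f} * depth   (about {b:.2f} kyr per m)")
print("residuals:", [round(r, 2) for r in residuals])
```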
Comparison with the hypothesis may involve statistical tests, or other means of comparison.
PLOTTING your data.
One of the simplest plots is the two-dimensional scatter plot. This is VERY easy using Excel. In most of our work, we will use MATLAB, but you should also be familiar with Excel. The difference is like that between a calculator and a computer - both have their uses.
In-class exercise: scatter plot with Excel.
What you should learn:
• How to use Excel for simple plotting
• How to add data series to a plot
• How to read in data from a file
• How to label the axes and add a title
• How to print the plot