GG 313 Fall, 2005

Text: Introduction to Geological Data Analysis, Swan and Sandilands (OPTIONAL: the bookstore says it’s out of print!)
Class Notes: PDF file. Go to http://www.higp.hawaii.edu/~cecily/courses/gg313.html and download “Lecture supplements: 1. Lecture Notes PDF”. This site also includes many files that will be used extensively in the course.
Web Page: Not yet

Grades:
• Lots of homework: 60% of grade
• Mid-term exam: 20%
• Final exam: 20%
Late homework will suffer!
Need access to the computer room? Need an account? See Susan in the Geology Office, POST 801.

Much of this course was designed by Paul Wessel, with modifications by Cecily Wolfe and Rob Dunn. Many thanks for their help and for access to their notes and ideas.

Why take this course?
• Geosciences are very data intensive
• Jobs require quantitative skills
• Need to test hypotheses
• Need to defend arguments
You need to take this course if
• You plan to do any data analysis
• You want to understand the analyses of others
• You want to be prepared for a career in science

Don’t:
• be intimidated. ASK QUESTIONS ANY TIME.
• be surprised if I can’t answer your question. I’m learning, too.

What does it mean to “analyze data”? To examine and study in detail to determine properties and essential features. Why do we analyze data? To pass information to others, to simplify and compress information into useful quantities, to make your experiment understandable in a broader context. Extraction of “signals” from “noise”. We know what information we want to pass on – analysis allows us to do it efficiently.

Is analysis always necessary? No – some data are clear enough in context to be understandable without analysis.

How can you tell if an analysis is valid? With difficulty. Be aware of problems and subterfuge in analysis. BIAS is rampant in science – many times unconscious. Bias towards a particular hypothesis can warp thinking. Hypothesis testing should always be aimed at DISCREDITING hypotheses, not proving them.

What types of data might need analysis?
• Earthquake recurrence data
• Ash thickness
• Rock chemistry
• Sample age
• Seismic noise level
Other examples? From your research?

What types of data are there? Discrete data have distinct outcomes, like rolling dice or counts of items or groups. “Ordinal” data are ranked but without a constant interval, as in the Mohs hardness scale.

“Nominal” data are classified by some more or less arbitrary measure, such as color or lithology. Continuous data have a large range of values without breaks, with an infinite number of possible values in a given range:
• Voltage across a resistor
• Percentage depletion of an oil field
• The value of gravity
Others?

Time series data vary with location and/or time. Time and location are the INDEPENDENT variables, since their values do not depend on other values. ANALOG data: data that have continuous values, such as sea level or the earth’s magnetic field. No matter how finely you measure, there is always a value. ???? WHAT KIND OF DATA IS DNA ????

Analog data cannot be analyzed in a computer. They must first be DIGITIZED. The data DOMAIN is the region over which the data are defined, such as time, or quadrangle. The data RANGE is the set of values that are recorded. We normally use a log scale to measure range, with the common unit of the decibel, or dB.
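
As a rough illustration of what digitizing involves, the sketch below samples a continuous function at fixed time steps and rounds each sample to the nearest allowed level. The function names, the 1 Hz sine "signal", the sampling interval, and the level spacing are all hypothetical choices made for this example, not part of the course notes.

```python
import math

def digitize(signal, t_start, t_end, dt, step):
    """Sample signal(t) every dt and round each sample to a multiple of step.

    Sampling makes the domain discrete; rounding makes the range discrete.
    """
    n = int(round((t_end - t_start) / dt)) + 1
    times = [t_start + i * dt for i in range(n)]
    values = [round(signal(t) / step) * step for t in times]
    return times, values

# Hypothetical analog signal: a 1 Hz sine wave (an assumption for illustration).
def analog(t):
    return math.sin(2 * math.pi * t)

# Sample one second of signal every 0.25 s, with levels spaced 0.5 apart.
times, values = digitize(analog, 0.0, 1.0, 0.25, 0.5)
print(times)
print(values)
```

A smaller `dt` or `step` follows the analog signal more closely, at the cost of more numbers to store.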

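A decibel (named after Alexander Graham Bell) is defined such that a factor of 10 change in AMPLITUDE is equal to 20 dB. Because power is proportional to amplitude squared, a factor of 10 in POWER is equal to 10 dB. Or, in terms of amplitude, $\mathrm{DR} = 20 \log_{10}(A / A_0)$ dB.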

This is slightly different from the definition in the notes. The ratio shown is a value (A) divided by a reference (A0). The value of A can be larger or smaller than A0. Consider A=2 and A0=1. What is DR? What if A=1/2 and A0=1. What is DR? DR stands for “Dynamic Range” when A0 is the smallest value we can observe and A is the largest value.
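
The two questions above can be checked numerically. This is a minimal sketch assuming the amplitude form of the decibel, DR = 20 log10(A/A0); the function name `decibels` is made up for this example.

```python
import math

def decibels(a, a0):
    """Return the amplitude ratio a/a0 expressed in dB (20 * log10 of the ratio)."""
    return 20.0 * math.log10(a / a0)

# A larger than the reference gives a positive value,
# A smaller than the reference gives a negative value.
print(round(decibels(2, 1), 2))    # about 6.02
print(round(decibels(0.5, 1), 2))  # about -6.02
```

Doubling or halving the amplitude changes the level by roughly 6 dB in either direction, which is why "6 dB" is a common rule of thumb.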

Example: You are offered a headset that is advertised to have a dynamic range of 24 dB. How much larger is the loudest sound than the quietest sound that this headset can handle? Should you buy it? Your ear can hear sounds from about 10^-9 Pa up to ~10^3 Pa. What is the dynamic range of your ear?
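
One way to work this example is to invert the decibel formula. The sketch below assumes the amplitude definition DR = 20 log10(A/A0) and uses the pressure limits quoted on the slide; the function names are hypothetical.

```python
import math

def amplitude_ratio(db):
    """Convert a level in dB back to an amplitude ratio A/A0."""
    return 10.0 ** (db / 20.0)

def dynamic_range_db(largest, smallest):
    """Dynamic range in dB between the largest and smallest observable amplitudes."""
    return 20.0 * math.log10(largest / smallest)

# Headset advertised at 24 dB: the loudest sound is only about 16 times the quietest.
print(round(amplitude_ratio(24), 2))           # about 15.85

# Ear, using the slide's limits of 1e-9 Pa and 1e3 Pa: 12 factors of 10 in amplitude.
print(round(dynamic_range_db(1e3, 1e-9), 1))   # about 240.0
```

Comparing the two numbers shows how far the headset falls short of the range quoted for the ear.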

FREQUENCY: We’ll talk about at least two frequencies in this course. The DATA FREQUENCY is how the data repeat with respect to time or other independent variable - tides, for example. We’ll also talk about statistical frequency - how often a particular outcome is observed - when we talk about statistics.

NOISE: Noise is any part of a signal that is not desired. Noise is almost always a factor in time series. We can define a signal-to-noise ratio, S/N. The characteristics of noise are often as important as the characteristics of the signal.
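
The slide does not say how S/N is measured. A common convention, assumed here, is the ratio of root-mean-square (RMS) amplitudes, optionally expressed in dB. The sample values and function names below are made up for illustration.

```python
import math

def rms(values):
    """Root-mean-square amplitude of a list of samples."""
    return math.sqrt(sum(v * v for v in values) / len(values))

def snr_db(signal, noise):
    """Signal-to-noise ratio in dB, assuming S/N is a ratio of RMS amplitudes."""
    return 20.0 * math.log10(rms(signal) / rms(noise))

# Hypothetical samples: the signal is ten times larger than the noise.
signal = [1.0, -1.0, 1.0, -1.0]
noise = [0.1, -0.1, 0.1, -0.1]

print(round(rms(signal) / rms(noise), 2))  # amplitude ratio, about 10
print(round(snr_db(signal, noise), 2))     # about 20 dB
```

In practice the signal and noise are rarely available separately, so the noise level is often estimated from a stretch of data where no signal is expected.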

MEASUREMENT: The instruments we use to observe the characteristics of our data are always limited in two important ways: precision and accuracy. An accurate measurement is close to the true value. A precise measurement has little scatter.

A rifleman shoots at targets with two different weapons. The gun used on the left is accurate but has poor precision. The one used on the right has good precision but has poor accuracy. Often, we can’t see our target well when analyzing data, and measures of precision and accuracy are critical to establishing the validity of our results. Importance of CALIBRATION.
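
The two cases in the target analogy can be put in numbers. In this sketch, accuracy is summarized by the bias (mean minus true value) and precision by the sample standard deviation; both choices, and all of the measurement values, are assumptions made for illustration.

```python
import statistics

def accuracy_and_precision(measurements, true_value):
    """Return (bias, scatter): bias measures accuracy, scatter measures precision."""
    bias = statistics.mean(measurements) - true_value
    scatter = statistics.stdev(measurements)
    return bias, scatter

true_value = 10.0
left = [8.0, 12.0, 9.0, 11.0]      # accurate (centered on 10) but poor precision
right = [12.9, 13.0, 13.1, 13.0]   # precise (tight cluster) but poor accuracy

bias_left, scatter_left = accuracy_and_precision(left, true_value)
bias_right, scatter_right = accuracy_and_precision(right, true_value)
print(bias_left, scatter_left)
print(bias_right, scatter_right)
```

Calibration addresses the bias: once the offset of the second instrument is known, it can be subtracted, whereas the scatter of the first cannot be removed from any single measurement.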

Randomness: The values of some data are completely determined by some natural law, such as the freezing temperature of pure water at STP. These are deterministic properties. Other data have no structure or patterns, and are called chaotic or random. Most data have some random components which make individual values impossible to predict. These are probabilistic or stochastic phenomena.

Steps in Analysis: We’re after an understanding of some natural phenomenon. Steps that might lead to this:
• HYPOTHESIS: what do we think is happening?
• Devise an experiment
• Collect data
• Perform exploratory analysis
• Reduce the data to relevant information
• Compare results with hypothesis

Defining your hypothesis and experiment is critical to the success of science, but is specific to particular problems. Data collection must be done carefully to maximize accuracy and minimize noise. Exploratory data analysis is done to get familiar with the data, establish its validity, and see if we can spot trends and patterns. This involves PLOTTING raw data.

Data reduction removes irrelevant parameters and involves MODELS of how the relevant parameters might behave. Comparison with the hypothesis may involve statistical tests, or other means of comparison.
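
As one possible illustration of reducing data with a model, the sketch below fits a straight line by ordinary least squares, which compresses many (x, y) pairs into two numbers. The slide does not name a particular model, so the straight line, the function name, and the data values are all assumptions for this example.

```python
def fit_line(x, y):
    """Least-squares fit of y = intercept + slope * x; returns (slope, intercept)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sums of products of deviations from the means.
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Hypothetical noisy observations that roughly follow y = 1 + 2x.
x = [0.0, 1.0, 2.0, 3.0]
y = [1.1, 2.9, 5.1, 6.9]

slope, intercept = fit_line(x, y)
print(round(slope, 2), round(intercept, 2))  # about 1.96 and 1.06
```

The fitted slope and intercept are the "relevant information"; whether they agree with the hypothesis is then a matter for a statistical test or other comparison.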

Exploratory Data Analysis: PLOTTING your data. One of the simplest plots is the two-dimensional scatter plot. This is VERY easy using Excel. In most of our work, we will use MATLAB, but you should also be familiar with Excel. The difference is like that between a calculator and a computer: both have their uses.

Excel in-class exercise: Scatter plot with Excel. What you should learn:
• How to use Excel for simple plotting
• How to add data series to a plot
• How to read in data from a file
• How to label the axes and add a title
• How to print the plot
