Data Domains and Introduction to Statistics Chemistry 243
Instrumental methods and what they measure • Electromagnetic methods: photons are modulated by the sample • Electrical methods
Instruments are translators • Convert physical or chemical properties that we cannot directly observe into information that we can interpret.
Sometimes multiple translations are needed • Thermometer • Bimetallic coil converts temperature to physical displacement • Scale converts angle of the pointer to an observable value of meaning • Thermostat: displacement used to activate a switch • Adapted from C.G. Enke, The Art and Science of Chemical Analysis, 2001. • Images: http://upload.wikimedia.org/wikipedia/commons/d/d2/Bimetaal.jpg http://upload.wikimedia.org/wikipedia/commons/2/26/Bimetal_coil_reacts_to_lighter.gif http://static.howstuffworks.com/gif/home-thermostat-thermometer.jpg
Data domains • Information is encoded and transferred between domains • Non-electrical domains • Beginning and end of a measurement • Electrical domains • Intermediate data collection and processing
Data domains • Quantity to be measured: emission intensity • Initial conversion device: PMT, giving intermediate quantity 1: current • Intermediate conversion device: resistor, giving intermediate quantity 2: voltage (V = iR) • Readout conversion device: digital voltmeter, giving a number • Often viewed on a GUI (graphical user interface)
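As a rough illustration of the middle step in this chain, the resistor's conversion of PMT current into voltage is just Ohm's law. The function name and the numerical values below (2 µA across 1 MΩ) are hypothetical, chosen only to make the arithmetic easy to follow.

```python
def pmt_current_to_voltage(current_amps: float, resistance_ohms: float) -> float:
    """Intermediate conversion: the resistor turns current into voltage (V = iR)."""
    return current_amps * resistance_ohms


# Hypothetical values: 2 microamps of PMT current across a 1 megaohm resistor.
voltage = pmt_current_to_voltage(2e-6, 1e6)
print(f"{voltage:.3f} V")
```

The voltage is still an analog quantity at this point; the readout device is what turns it into a number.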
Electrical domains • Analog signals • Magnitude of voltage, current, charge, or power • Continuous in both amplitude and time • Time-domain signals • Time relationship of signal fluctuations • (not amplitudes) • Frequency, pulse width, phase • Digital information • Data encoded in only two discrete levels • A simplification for transmission and storage of information which can be re-combined with great accuracy and precision • The heart of modern electronics
Digital and analog signals • Analog signals • Magnitude of voltage, current, charge, or power • Continuous in both amplitude and time • Digital information • Data encoded in only discrete levels
Analog-to-digital conversion • Limited by bit resolution of ADC • 4-bit card has $2^4 = 16$ discrete binary levels • 8-bit card has $2^8 = 256$ discrete binary levels • 32-bit card has $2^{32} = 4{,}294{,}967{,}296$ discrete binary levels • Common today • Maximum resolution comes from full use of ADC voltage range. • Trade-offs • More bits is usually slower • More expensive • K.A. Rubinson, J.F. Rubinson, Contemporary Instrumental Analysis, 2000.
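The level counts above, and the point about using the full voltage range, can be sketched with an idealized ADC model. The 0 to 10 V input range and the function names are assumptions made for illustration; real ADC cards differ in range, coding, and rounding behavior.

```python
def adc_levels(bits: int) -> int:
    """Number of discrete binary levels for an ADC with the given bit depth."""
    return 2 ** bits


def quantize(voltage: float, bits: int, v_min: float = 0.0, v_max: float = 10.0) -> int:
    """Map an analog voltage to an integer code (idealized; range is an assumption)."""
    levels = adc_levels(bits)
    step = (v_max - v_min) / levels          # volts per code
    clamped = min(max(voltage, v_min), v_max)
    code = int((clamped - v_min) / step)     # truncate to the code below
    return min(code, levels - 1)             # full-scale input maps to the top code


for bits in (4, 8, 32):
    print(bits, "bits:", adc_levels(bits), "levels")

# A signal that only reaches 0.5 V on a 0-10 V, 8-bit ADC never gets past
# this code, so most of the 256 available levels go unused.
top_code_small_signal = quantize(0.5, 8)
print(top_code_small_signal)
```

Amplifying the signal so that it spans the whole input range would spread it over all available codes, which is the sense in which full use of the range gives maximum resolution.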
Byte prefixes • Kilo (k): about 1000 • Mega (M): about a million • Giga (G): about a billion
Serial and parallel binary encoding • Serial: slow; not digital; outdated • Binary serial: fast; between instruments; "serial-coded binary" data • Binary parallel: very fast; within an instrument; "parallel digital" data
Introductory statistics • Statistical handling of data is essential because it establishes the significance of a result. • The ability or inability to state definitively that two values are statistically different has profound ramifications for data interpretation. • Measurements are not absolute, so robust methods for establishing run-to-run reproducibility and instrument-to-instrument variability are essential.
Introductory statistics: Mean, median, and mode • Population mean ($\mu$): average value of replicate data • Median ($\mu_{1/2}$): ½ of the observations are greater; ½ are less • Mode ($\mu_{md}$): most probable value • For a symmetrical distribution: $\mu = \mu_{1/2} = \mu_{md}$ • Real distributions are rarely perfectly symmetrical
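A small sketch of the three measures using Python's standard `statistics` module. The replicate values are invented for illustration and are deliberately skewed by one high reading, so the mean is pulled away from the median and mode.

```python
import statistics

# Hypothetical replicate measurements with one high outlier.
replicates = [1.2, 1.3, 1.3, 1.4, 1.9]

mean_value = statistics.mean(replicates)      # arithmetic average
median_value = statistics.median(replicates)  # middle value when sorted
mode_value = statistics.mode(replicates)      # most frequent value

print(mean_value, median_value, mode_value)
```

For a perfectly symmetrical data set the three numbers would coincide; here the single high value raises the mean above the other two.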
Statistical distribution • Often follows a Gaussian functional form
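Introductory statistics: Standard deviation and variance • Standard deviation ($\sigma$): $\sigma = \sqrt{\frac{\sum_{i=1}^{N}(x_i - \mu)^2}{N}}$ • Variance ($\sigma^2$): the square of the standard deviation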
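The sketch below computes both the population forms (dividing by N) and the sample forms (dividing by N − 1) with the standard `statistics` module. The data set is invented so that the population values come out as round numbers; which form applies depends on whether the data are treated as the whole population or as a sample of it.

```python
import statistics

# Hypothetical data chosen so the population standard deviation is exactly 2.
data = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]

population_sd = statistics.pstdev(data)       # divides by N
population_var = statistics.pvariance(data)   # square of population_sd
sample_sd = statistics.stdev(data)            # divides by N - 1
sample_var = statistics.variance(data)        # square of sample_sd

print(population_sd, population_var)
print(sample_sd, sample_var)
```

The sample values are always slightly larger than the population values for the same data, and the gap shrinks as N grows.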
Gaussian distribution • Common distribution with well-defined statistics • 68.3% of data is within $1\sigma$ of the mean • 95.5% within $2\sigma$ • 99.7% within $3\sigma$
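These percentages can be checked numerically: for a Gaussian, the fraction of values within k standard deviations of the mean is erf(k/√2). The helper name below is an assumption; `math.erf` is part of the Python standard library.

```python
import math


def gaussian_coverage(k: float) -> float:
    """Fraction of a Gaussian population within k standard deviations of the mean."""
    return math.erf(k / math.sqrt(2))


for k in (1, 2, 3):
    print(f"within {k} sigma: {100 * gaussian_coverage(k):.2f}%")
```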
Statistical distribution • 50 absorbance measurements of an identical sample • Let's go to Excel (Table a1-1, Skoog)
Standard deviation and variance, continued • s is a measure of precision (magnitude of indeterminate error) • Other useful definitions: • Standard error of the mean: $s_m = s/\sqrt{N}$
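The standard error of the mean is the standard deviation divided by the square root of the number of measurements, so averaging more replicates tightens the estimate of the mean even when the spread of individual readings stays the same. A minimal sketch, with a hypothetical function name and an assumed standard deviation of 2.0:

```python
import math


def standard_error_of_mean(s: float, n: int) -> float:
    """Standard error of the mean: s / sqrt(N)."""
    return s / math.sqrt(n)


# Same assumed standard deviation, increasing number of replicates.
for n in (4, 16, 64):
    print(n, standard_error_of_mean(2.0, n))
```

Each fourfold increase in N only halves the standard error, which is why precision gained by replication becomes expensive quickly.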
Confidence intervals • In most situations $\mu$ cannot be determined • Would require an infinite number of measurements • Statistically, we can establish a confidence interval around $\bar{x}$ in which $\mu$ is expected to lie with a certain level of probability.
Calculating confidence intervals • We cannot absolutely determine $\sigma$, so when s is not a good estimate of it (small number of samples) use: CI for $\mu = \bar{x} \pm \frac{ts}{\sqrt{N}}$ • Note that t approaches z as N increases. • Table: 2-sided t values
Example of confidence interval determination for a smaller number of samples • Given the following values for serum carcinoembryonic antigen (CEA) measurements, determine the 95% confidence interval. • 16.9 ng/mL, 12.7 ng/mL, 15.3 ng/mL, 17.2 ng/mL • Sample mean = 15.525 ng/mL • s = 2.059733 ng/mL • t = 3.182 for N − 1 = 3 degrees of freedom (95%, 2-sided) • Answer: 15.525 ± 3.277, but when you consider sig figs you get: 16 ± 3
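The CEA example can be reproduced with a short script. This is a sketch under two assumptions: the t value is looked up with N − 1 degrees of freedom, and the small table of 95% two-sided t values is hardcoded from standard tables because the Python standard library has no t distribution. The function and table names are hypothetical.

```python
import math
import statistics

# Two-sided 95% Student's t values, keyed by degrees of freedom (standard table values).
T_95_TWO_SIDED = {1: 12.706, 2: 4.303, 3: 3.182, 4: 2.776, 5: 2.571}


def confidence_interval_95(values):
    """Return (mean, half_width) of the 95% confidence interval: mean ± t*s/sqrt(N)."""
    n = len(values)
    mean = statistics.mean(values)
    s = statistics.stdev(values)          # sample standard deviation (N - 1)
    t = T_95_TWO_SIDED[n - 1]             # degrees of freedom = N - 1
    return mean, t * s / math.sqrt(n)


cea_ng_per_ml = [16.9, 12.7, 15.3, 17.2]
mean, half_width = confidence_interval_95(cea_ng_per_ml)
print(f"{mean:.3f} ± {half_width:.3f} ng/mL")
```

With four measurements there are three degrees of freedom, so t = 3.182 and the half-width comes out near 3.3 ng/mL, which still rounds to 16 ± 3 once significant figures are considered.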
Propagation of errors • How do errors at each step contribute to the final result?