A Multi-Stage Expert System for Aerosol Classification

Statisticians:

Raymond Mugno and Wei Zhu

Computer Scientists:

Peter Imrich and Klaus Mueller

Environmental Chemists:

Dan Imre and Alla Zelenyuck


Our Project

  • Working with Environmental Chemists at Brookhaven National Laboratory

  • Design an Expert System to classify aerosols

  • Existing tools are not accurate enough

  • Existing tools are not fast enough


Our Project: An Expert System

  • Use statistical techniques to reduce data

  • Use visualization techniques to show reduction

  • Give experts a tool to view classification

  • Experts can change how data is reduced, by moving particles and making “classification rules”


Our Project: The Chemistry

  • Aerosols are collected by a SPLAT mass spectrometer

  • 5200 voltage readings are collected for each particle

  • The 5200 voltage readings are converted to a 450-dimensional vector of intensities

  • These vectors are the mass spectra


Mass Spectra

  • Each spectrum can be represented as a vector, v = (v1, v2, …, v450)

  • The subscript j = 1, 2, …, 450 represents the atomic weight of an element in the particle

  • The value of vj represents the amount of that element


Dataset

  • Over 1 million mass spectra collected from Houston, Texas

  • Filtered to 238,160 actual particles

  • Each particle has a unique time stamp, down to the millisecond

  • Each particle has a time of flight, from which mass can be determined

  • Each particle has a mass spectrum, an ordered array of 450 integer peaks


The Two-Level Classification Scheme

  • Level 1 (dimension reduction): Classify large numbers of mass spectra into clusters with very similar particles, i.e. similar mass spectra, using K-means clustering.

  • Level 2 (class determination): Guided by chemical experts, combine clusters using binary classifiers. Determine the particle/cluster membership (acid, aromatics, or finer classes).


Clustering Analysis

  • Statistical tool used to classify multi-dimensional entities

  • Hierarchical clustering

  • Non-hierarchical clustering


K-Means Clustering Analysis

  • Start with k seeds (representative entities of the k clusters) and a threshold distance

  • For each entity find the distance to each seed

  • If the minimum distance is less than the threshold, add the entity to that cluster

  • If the minimum distance is greater than the threshold, the entity becomes the seed of a new cluster


K-Means Clustering Analysis

  • After iterating through all the entities, update the seeds

  • Iterate through the particles again

  • Continue iterating through the particles until none of the entities change clusters or another stopping criterion is met

  • Differs from classification, because the final number of classes and the classes themselves are not set in advance
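The iteration described on the two K-means slides can be sketched roughly as follows. This is only an illustration under stated assumptions: the function names, the Euclidean placeholder distance, the iteration cap, and the choice to let a distance exactly equal to the threshold join the existing cluster are not taken from the slides.

```python
import math
from typing import Callable, List, Sequence, Tuple

Vector = Sequence[float]


def euclidean(a: Vector, b: Vector) -> float:
    """Placeholder distance (assumption); any other distance can be passed in."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def mean_vector(vectors: List[Vector]) -> List[float]:
    """Component-wise average of a non-empty list of vectors."""
    return [sum(column) / len(vectors) for column in zip(*vectors)]


def threshold_kmeans(
    entities: List[Vector],
    seeds: List[Vector],
    threshold: float,
    distance: Callable[[Vector, Vector], float] = euclidean,
    max_iterations: int = 10,
) -> Tuple[List[int], List[List[float]]]:
    """Assign each entity to its nearest seed, or open a new cluster when the
    nearest seed is farther away than the threshold. Needs at least one seed."""
    seeds = [list(s) for s in seeds]
    labels = [-1] * len(entities)
    for _ in range(max_iterations):
        changed = False
        for index, entity in enumerate(entities):
            dists = [distance(entity, seed) for seed in seeds]
            nearest = min(range(len(seeds)), key=dists.__getitem__)
            if dists[nearest] > threshold:
                # Too far from every seed: the entity becomes a new seed.
                seeds.append(list(entity))
                nearest = len(seeds) - 1
            if labels[index] != nearest:
                labels[index] = nearest
                changed = True
        # After a full pass, update each seed to the average of its members.
        for k in range(len(seeds)):
            members = [entities[i] for i, label in enumerate(labels) if label == k]
            if members:
                seeds[k] = mean_vector(members)
        if not changed:
            break
    return labels, seeds


# Toy data: two tight groups plus one outlier that opens a third cluster.
points = [(0.0, 0.0), (0.1, 0.0), (5.0, 5.0), (5.1, 5.0), (10.0, 0.0)]
labels, final_seeds = threshold_kmeans(
    points, seeds=[(0.0, 0.0), (5.0, 5.0)], threshold=1.0
)
```

Because new seeds are created on the fly, the number of clusters returned can exceed the number of starting seeds, which is the behaviour the slides contrast with ordinary classification.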


First Level Classification: K-Means Clustering

  • Start with 25 seeds supplied by the experts, each the average spectrum of a particle class

  • For each particle, the distance between its spectrum and each seed’s spectrum is calculated

  • If the minimum distance is less than a threshold distance, the particle is put into that corresponding cluster

  • If the minimum distance is greater than the threshold distance, the current particle is set as a new seed


Distance Function

  • Distance = 1 – r

  • r is the Pearson correlation coefficient

  • We label the seed spectrum as (x1, x2, …, xn) and the particle spectrum as (y1, y2, …, yn), where xi or yi represents the magnitude of the ‘peak’ at location i, the mass-to-charge ratio (i = 13, 14, …, 250)
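A minimal sketch of this distance, written with the textbook formula for the Pearson correlation coefficient. The function names are illustrative, and the decision to raise an error for a constant spectrum (where r is undefined) is an assumption rather than something the slides specify.

```python
import math
from typing import Sequence


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equal-length spectra."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("spectra must have the same length (at least 2)")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        # r is undefined when one spectrum has no variation (assumed handling).
        raise ValueError("correlation is undefined for a constant spectrum")
    return sxy / math.sqrt(sxx * syy)


def correlation_distance(x: Sequence[float], y: Sequence[float]) -> float:
    """Distance = 1 - r, so it ranges from 0 (r = 1) to 2 (r = -1)."""
    return 1.0 - pearson_r(x, y)


# Made-up peak values: identical shape at double the intensity.
seed = [20.0, 5.0, 2.0, 0.0]
particle = [40.0, 10.0, 4.0, 0.0]
d_same_shape = correlation_distance(seed, particle)            # approximately 0
d_opposite = correlation_distance([1.0, 2.0, 3.0], [3.0, 2.0, 1.0])  # approximately 2
```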


Notation

X39=20

X41=5

X53=2

Y26=1

Y28=25

Y68=2


Why use Correlation Coefficient?

  • Spectra with “peaks” of the same proportion at the same locations will have small distance between them

  • Classify similarly shaped spectra together for dimension reduction
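To illustrate the point about proportional peaks, the sketch below compares 1 – r with a plain Euclidean distance on made-up spectra. The spectra, variable names, and the Euclidean comparison are illustrative assumptions and are not part of the original analysis.

```python
import math
from typing import Sequence


def correlation_distance(x: Sequence[float], y: Sequence[float]) -> float:
    """1 - r, where r is the Pearson correlation coefficient."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return 1.0 - sxy / math.sqrt(sxx * syy)


def euclidean_distance(x: Sequence[float], y: Sequence[float]) -> float:
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


base = [0.0, 20.0, 5.0, 2.0, 0.0, 0.0]
scaled = [3.0 * v for v in base]            # same peaks, three times the signal
shifted = [0.0, 0.0, 0.0, 20.0, 5.0, 2.0]   # same peak heights at other locations

same_shape = correlation_distance(base, scaled)        # close to 0
different_shape = correlation_distance(base, shifted)  # well above a 0.3 threshold
euclid_scaled = euclidean_distance(base, scaled)       # large despite identical shape
```

With these toy values the scaled copy stays essentially at distance zero under 1 – r, whereas the Euclidean distance grows with overall signal strength.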


First Level Results

  • Started with 25 seeds

  • Threshold to create a new cluster set to 0.3

  • Processed 238,160 particles (5 iterations)

  • Finished with 2000 clusters/seeds

  • Seeds are updated to be average of spectra in cluster

  • Dimension reduction of about 120-fold (238,160 particles to 2000 seeds)


Cluster Example 1, based on an Organic seed (23)

19 major peaks from 27 to 97


Cluster Example 2, based on an Fe seed (13)

4 major peaks at 54, 55, 56 and 57


New Cluster --- Example 1 (464)

6 Major Peaks at: 23, 24, 28, 30, 36, and 39


New Cluster --- Example 2 (640)

6 Major Peaks at: 53, 54, 55, 56, 57, and 58


New Cluster --- Example 3 (574)

19 Major Peaks from 23 to 131


Measuring Within-Cluster Similarity

Calculated the average distance from each particle to its cluster’s center

Calculated the standard deviation of the distances to the cluster’s center for each cluster

Found the particle furthest from the cluster’s center
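The three measurements above could be computed per cluster along these lines. This is a sketch only: the function name, the use of `math.dist` as a stand-in distance, and the population form of the standard deviation are assumptions, since the slides do not say which variants were used.

```python
import math
from typing import Callable, Dict, List, Sequence

Vector = Sequence[float]


def cluster_spread(
    members: List[Vector],
    center: Vector,
    distance: Callable[[Vector, Vector], float] = math.dist,
) -> Dict[str, float]:
    """Mean distance to the center, its standard deviation, and the farthest member."""
    dists = [distance(member, center) for member in members]
    mean = sum(dists) / len(dists)
    # Population standard deviation (assumed; a sample version divides by n - 1).
    std = math.sqrt(sum((d - mean) ** 2 for d in dists) / len(dists))
    farthest = max(range(len(dists)), key=dists.__getitem__)
    return {
        "mean": mean,
        "std": std,
        "farthest_index": farthest,
        "farthest_distance": dists[farthest],
    }


# Toy cluster whose members lie at distances 1, 3 and 5 from the center.
stats = cluster_spread([(1.0, 0.0), (0.0, 3.0), (0.0, 5.0)], center=(0.0, 0.0))
```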


Comments on the First Level Classifier

  • Using a distance threshold of 0.3 yielded clusters with very high within-cluster similarity

  • Ideal for dimension reduction: instead of working with the 238,160 original spectra, we can work with the 2000 cluster seeds in a second-level classification

  • A dimension reduction of about 120-fold!


Second Level Classification

  • 2000 clusters is too many

  • Find clusters that are very similar and combine them

  • Find clusters that are very similar, classify them into a general group, and have the chemical experts subdivide the general groups


Second Level Classification

Hierarchical Clustering

  • Find the pairwise distance between every pair of entities

  • Merge the two “closest” entities

  • Repeat the procedure until there is only 1 entity, or until a merging-distance threshold is met
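The merge loop above can be sketched naively as follows. All names here are hypothetical, `math.dist` is a stand-in distance, and the closest-member rule used to compare two groups is only a placeholder assumption. The sketch recomputes every pairwise distance on each pass, so it is meant for illustration on small inputs only.

```python
import math
from itertools import combinations
from typing import Callable, List, Optional, Sequence, Tuple

Vector = Sequence[float]
Cluster = List[Vector]


def closest_members(a: Cluster, b: Cluster) -> float:
    """Distance between two groups = distance between their closest members."""
    return min(math.dist(p, q) for p in a for q in b)


def agglomerate(
    entities: List[Vector],
    linkage: Callable[[Cluster, Cluster], float] = closest_members,
    stop_distance: Optional[float] = None,
) -> Tuple[List[Cluster], List[float]]:
    """Repeatedly merge the two closest clusters; return clusters and merge distances."""
    clusters: List[Cluster] = [[e] for e in entities]
    merge_distances: List[float] = []
    while len(clusters) > 1:
        # Find the pair of clusters with the smallest linkage distance.
        i, j = min(
            combinations(range(len(clusters)), 2),
            key=lambda pair: linkage(clusters[pair[0]], clusters[pair[1]]),
        )
        d = linkage(clusters[i], clusters[j])
        if stop_distance is not None and d > stop_distance:
            break  # the closest remaining pair is already too far apart
        merged = clusters[i] + clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged)
        merge_distances.append(d)
    return clusters, merge_distances


points = [(0.0, 0.0), (0.0, 1.0), (5.0, 5.0), (5.0, 6.0), (20.0, 20.0)]
groups, heights = agglomerate(points, stop_distance=2.0)
everything, all_heights = agglomerate(points)  # no threshold: merge down to one group
```

The recorded merge distances are what a dendrogram would draw as merge heights.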


Second Level Classification

Hierarchical Clustering

  • Single Linkage (distance between closest entities)

  • Average Linkage (average distance over all entities)

  • Centroid Linkage (distance between average entities)

  • Complete Linkage (distance between furthest elements)
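The four linkage rules listed above differ only in how the distance between two groups is derived from the distances between their members. A compact sketch, assuming `math.dist` as the underlying distance and using illustrative function names:

```python
import math
from typing import List, Sequence

Vector = Sequence[float]
Cluster = List[Vector]


def closest_linkage(a: Cluster, b: Cluster) -> float:
    """Distance between the two closest members."""
    return min(math.dist(p, q) for p in a for q in b)


def complete_linkage(a: Cluster, b: Cluster) -> float:
    """Distance between the two furthest members."""
    return max(math.dist(p, q) for p in a for q in b)


def average_linkage(a: Cluster, b: Cluster) -> float:
    """Average of all member-to-member distances across the two groups."""
    return sum(math.dist(p, q) for p in a for q in b) / (len(a) * len(b))


def centroid(cluster: Cluster) -> List[float]:
    """Component-wise mean of the members (the 'average entity')."""
    return [sum(column) / len(cluster) for column in zip(*cluster)]


def centroid_linkage(a: Cluster, b: Cluster) -> float:
    """Distance between the average entities of the two groups."""
    return math.dist(centroid(a), centroid(b))


left = [(0.0, 0.0), (0.0, 2.0)]
right = [(3.0, 0.0), (3.0, 2.0)]
d_closest = closest_linkage(left, right)
d_complete = complete_linkage(left, right)
d_average = average_linkage(left, right)
d_centroid = centroid_linkage(left, right)
```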


Second Level Classification

  • Find clusters that are very similar, classify them into a general group, and have the chemical experts subdivide the general groups

  • Using Centroid Linkage

  • Each cluster is represented by a seed, the average spectrum of that cluster


Second Level Classification

  • Use Binary Matching metric to group clusters

  • From a particle spectrum vector v, create a binary vector w

  • If vi > Peak_Threshold of the total of v’s peaks, wi = 1; otherwise wi = 0

  • Experts gave Peak_Threshold = 10 to filter out noise
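A possible sketch of the binarization step. The slide does not state the units of Peak_Threshold, so this example assumes, without confirmation from the source, that the value 10 means 10 percent of the spectrum's total intensity. The `percent_of_total` flag is a hypothetical switch that lets the same function apply the threshold as an absolute intensity instead.

```python
from typing import List, Sequence


def binarize(
    spectrum: Sequence[float],
    peak_threshold: float = 10.0,
    percent_of_total: bool = True,
) -> List[int]:
    """Map a spectrum v to a 0/1 vector w marking its significant peaks."""
    cutoff = peak_threshold
    if percent_of_total:
        # ASSUMPTION: the threshold is a percentage of the summed intensities.
        cutoff = peak_threshold * sum(spectrum) / 100.0
    return [1 if value > cutoff else 0 for value in spectrum]


spectrum = [500.0, 300.0, 150.0, 50.0]                  # made-up intensities, total 1000
w_relative = binarize(spectrum)                         # cutoff is 100 here
w_absolute = binarize(spectrum, percent_of_total=False)  # cutoff is 10 here
```

Under the percentage reading the smallest peak is dropped as noise, while under the absolute reading all four peaks survive, so the interpretation matters and should be confirmed against the original analysis.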


Second Level Classification

Metric for comparing 2 binary vectors, w and x

Binary score = Number of peaks in common

Max Peaks = the larger of the number of peaks in x and the number of peaks in w

Distance = 1 – (Binary Score)/Max Peaks
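The metric can be written out directly from the three definitions above. In this sketch the function name is illustrative, and returning 0 when neither vector has any peak is an assumption, since that case would otherwise divide by zero.

```python
from typing import Sequence


def binary_distance(w: Sequence[int], x: Sequence[int]) -> float:
    """1 - (peaks in common) / (larger peak count of the two vectors)."""
    if len(w) != len(x):
        raise ValueError("binary vectors must have the same length")
    binary_score = sum(1 for a, b in zip(w, x) if a == 1 and b == 1)
    max_peaks = max(sum(w), sum(x))
    if max_peaks == 0:
        return 0.0  # ASSUMPTION: two peakless vectors are treated as identical
    return 1.0 - binary_score / max_peaks


w = [1, 1, 0, 1, 0]   # three peaks
x = [1, 0, 0, 1, 1]   # three peaks, two shared with w
d_partial = binary_distance(w, x)                   # 1 - 2/3
d_identical = binary_distance(w, w)                 # 0.0
d_disjoint = binary_distance([1, 0, 0], [0, 1, 1])  # 1.0
```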


Second Level Classification

Circular Dendrogram

Seeds are located around the circumference

More similar seeds are merged closer to the outer edge of the circle
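One way such a layout might be computed is shown below, purely as an illustration. The even angular spacing of the leaves and the linear mapping from merge distance to radius are assumptions. The slides only state that seeds sit on the circumference and that more similar seeds merge nearer the outer edge.

```python
import math
from typing import List, Tuple


def leaf_positions(n_leaves: int, radius: float = 1.0) -> List[Tuple[float, float]]:
    """Place the seeds evenly around a circle of the given radius."""
    step = 2.0 * math.pi / n_leaves
    return [
        (radius * math.cos(k * step), radius * math.sin(k * step))
        for k in range(n_leaves)
    ]


def merge_radius(merge_distance: float, max_distance: float, radius: float = 1.0) -> float:
    """Radius at which a merge is drawn: a small merge distance stays near the edge.

    ASSUMPTION: linear scaling, with the largest merge drawn at the center.
    """
    return radius * (1.0 - merge_distance / max_distance)


positions = leaf_positions(4)
near_edge = merge_radius(0.1, max_distance=1.0)    # very similar seeds
near_center = merge_radius(0.8, max_distance=1.0)  # dissimilar groups
```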


Second Level Classification

User can zoom in on different areas of the dendrogram

From dendrogram, user can obtain seed spectra

From dendrogram, user can obtain cluster information, such as number of particles


Conclusion

Chemists now have a tool to view distribution of the data

Can look at 2000 seeds instead of 238,000 particles

Gives chemists insight into the distribution of particle classes

Chemists can give feedback that will improve metrics


Future Work - Molecule Library


Future Work - Time Series Analysis

  • Time series distribution of average (or median) aerosol size

  • Time series distribution of atmospheric composition

  • Time series models for oxidation rate comparison


Future Work - Comparison of Classifiers

1. Hierarchical clustering (Hinz et al., 1995)

2. Non-hierarchical clustering (Trieger et al., 1995)

3. Fuzzy clustering (Hinz et al., 1996/1999; Trieger et al., 1995)

4. Discriminant analysis (Alsberg et al., 1998)

5. Neural network (Song et al., 1999)

6. Classification tree (Harrington et al., 1989)

  • Which approach is the best?


Future Work - Spatial Temporal Analysis

  • Analyzing the spatial & temporal evolution of air-borne particles

  • Random field theory

  • Interactive classification & analysis via the graphical user interface

    The Human-Machine Interface: examine chemical significance of clusters; adjust classifiers for better classification; explore the spatial/temporal trend & model goodness-of-fit graphically ...


In the News...

  • Technology Takes on Terrorism

  • By Earl Lane, Washington Bureau, Newsday: February 26, 2002

