Weka: A Useful Tool for Air Quality Forecasting

William F. Ryan

Department of Meteorology

The Pennsylvania State University

wfr1@psu.edu

2007 National Air Quality Conference, Orlando


The weka, or woodhen, is a bird native to New Zealand. Weka is also the name of a suite of machine learning software tools, written in Java and developed at the University of Waikato in New Zealand.


Machine Learning
  • Machine learning is a subfield of artificial intelligence (AI) concerned with the development of algorithms and techniques that allow computers to "learn".
  • The machine learning algorithms in Weka include, among others, linear regression, classification trees, clustering and artificial neural networks (ANN).
Weka Can Be A Useful Tool
  • Weka has the potential to be a useful tool to support local air quality forecasting efforts – particularly those operating on a limited budget.
    • Weka is open-source (free) software, although purchase of the associated textbook is strongly recommended.
    • Weka is easily installed on standard PCs but also runs on Linux and other platforms.
    • Only minimal modifications are necessary to prepare data files for use in Weka.
    • The user interface is simple and intuitive.
Weka and PM2.5 Forecasting
  • Of particular interest to air quality forecasters is the wide range of algorithms included in Weka.
  • These algorithms may be useful to address shortcomings in statistical forecast guidance for fine particulate matter (PM2.5).
  • Simple linear regression methods provide reasonable skill for O3 forecasting, due to the very strong and nearly linear ozone-temperature relationship, but linear regression methods have shown limited skill in forecasting PM2.5.
PM2.5 Forecasting

O3 (left panel) is well-behaved statistically. Its distribution is near normal, with a strong association with maximum temperature. As a result, linear techniques provide reasonable skill.

PM2.5 (right panel) is not well-behaved. Its distribution is skewed, with no strong association with any particular weather variable.

Tools included in Weka, such as ANN and classification and regression trees (CART), are capable of addressing the non-linear problems posed by PM2.5.
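The contrast between the two panels can be made concrete with two simple statistics: the Pearson correlation with maximum temperature and the sample skewness. The sketch below is plain Python, not Weka, and the numbers in the `__main__` section are hypothetical values chosen only to illustrate the pattern, not observations from the presentation.

```python
import math


def pearson_r(x, y):
    """Pearson correlation between two equal-length lists with no missing values."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def skewness(values):
    """Moment coefficient of skewness, g1 = m3 / m2**1.5.

    Close to 0 for a symmetric (near-normal) sample; clearly positive for a
    right-skewed sample with a long tail of high values.
    """
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5


if __name__ == "__main__":
    # Hypothetical illustration values only (not data from the presentation).
    tmax = [24.0, 27.0, 29.0, 31.0, 33.0, 35.0]
    o3 = [48.0, 55.0, 61.0, 66.0, 72.0, 80.0]
    pm25 = [8.0, 9.0, 10.0, 11.0, 12.0, 41.0]
    print("r(Tmax, O3) =", round(pearson_r(tmax, o3), 3))
    print("skew(O3)    =", round(skewness(o3), 3))
    print("skew(PM2.5) =", round(skewness(pm25), 3))
```

A single high-concentration episode is enough to push the skewness well above zero, which is the kind of behavior that weakens a linear fit.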

Weka: Information


Input File Format

Weka uses its own file format, called ARFF (Attribute-Relation File Format). All you need to provide, though, is a *.csv file with variable names in the first line, and Weka will convert it.

ARFF Format

The ARFF format is simple anyway:
  • ASCII file
  • A list of variables (attributes) and their types
  • Then the data follow, comma-separated
  • Missing data are marked as “?”
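Weka performs this conversion itself (its CSVLoader reads CSV files), so the sketch below is only meant to show what the resulting ARFF file contains. It is a minimal stand-alone Python converter under stated assumptions: a column is declared `numeric` if every non-missing value parses as a number, otherwise it becomes a nominal attribute listing its distinct labels; empty cells become “?”. The function name `csv_to_arff` and the sample columns are hypothetical, and labels containing commas or spaces are not quoted here.

```python
import csv
import io

MISSING = {"", "?"}  # treat empty cells and "?" as missing values


def _is_number(s):
    try:
        float(s)
        return True
    except ValueError:
        return False


def csv_to_arff(csv_text, relation="air_quality"):
    """Convert CSV text whose first row holds variable names into ARFF text.

    Sketch assumption: numeric if every non-missing value parses as a float,
    otherwise nominal with labels listed in first-seen order.
    """
    rows = [r for r in csv.reader(io.StringIO(csv_text)) if r]
    header, data = rows[0], rows[1:]
    out = [f"@relation {relation}", ""]
    for j, name in enumerate(header):
        name = name.strip()
        if " " in name:
            name = f"'{name}'"  # attribute names containing spaces must be quoted
        present = [r[j].strip() for r in data if r[j].strip() not in MISSING]
        if all(_is_number(v) for v in present):
            out.append(f"@attribute {name} numeric")
        else:
            labels = list(dict.fromkeys(present))  # distinct labels, first-seen order
            out.append(f"@attribute {name} {{{','.join(labels)}}}")
    out += ["", "@data"]
    for r in data:
        out.append(",".join("?" if v.strip() in MISSING else v.strip() for v in r))
    return "\n".join(out) + "\n"


if __name__ == "__main__":
    # Hypothetical two-day file; the empty PM2.5 cell becomes "?".
    sample = ("date,tmax,pm25,category\n"
              "2006-07-01,31.5,22.4,moderate\n"
              "2006-07-02,33.0,,unhealthy\n")
    print(csv_to_arff(sample))
```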

Data Editing

Data can be easily edited within Weka itself.

Analyzing Data

Variables can be easily scanned with the basic statistics and histograms provided by Weka.
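The kind of per-variable summary described here (count, missing values, minimum, maximum, mean, standard deviation and a histogram) can be sketched in a few lines of Python. This is an illustration of the idea, not Weka's code; the choice of five equal-width bins and of the sample (n − 1) standard deviation are assumptions of this sketch.

```python
import math


def summarize(values, bins=5):
    """Basic statistics plus equal-width histogram counts for one variable.

    None marks a missing value (written "?" in ARFF) and is skipped.
    """
    present = [v for v in values if v is not None]
    n = len(present)
    mean = sum(present) / n
    std = math.sqrt(sum((v - mean) ** 2 for v in present) / (n - 1))  # sample std dev
    lo, hi = min(present), max(present)
    width = (hi - lo) / bins
    counts = [0] * bins
    for v in present:
        k = int((v - lo) / width) if width > 0 else 0
        counts[min(k, bins - 1)] += 1  # the maximum value falls in the last bin
    return {"n": n, "missing": len(values) - n, "min": lo, "max": hi,
            "mean": mean, "std": std, "counts": counts}


if __name__ == "__main__":
    # Hypothetical PM2.5 values with one missing day.
    print(summarize([10.0, 12.0, 14.0, 16.0, 30.0, None]))
```

A lopsided histogram, with most counts in the low bins and a few in the top bin, is the quick visual cue for the skewed PM2.5 distribution discussed earlier.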

Functions Available

Weka includes a number of different techniques that can be useful for forecast development. These include:
  • Linear and logistic regression
  • Perceptron models (neural networks)

Linear Regression

Unfortunately, the “workhorse” linear regression module in Weka is limited:
  • No automatic stepwise function
  • Poor diagnostics

Compare: SYSTAT, Minitab.
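For readers who want to see what an automatic stepwise function does, here is a minimal forward-selection sketch in plain Python. It is not Weka code and not the SYSTAT or Minitab procedure: it only adds predictors (no removal step), uses an F-to-enter threshold of 4.0 as an illustrative rule of thumb, and assumes the predictors are not collinear (the normal equations are solved directly). The predictor names are hypothetical.

```python
def _solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(m[r][c]))
        m[c], m[p] = m[p], m[c]
        for r in range(c + 1, n):
            f = m[r][c] / m[c][c]
            for k in range(c, n + 1):
                m[r][k] -= f * m[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][k] * x[k] for k in range(r + 1, n))) / m[r][r]
    return x


def ols_sse(columns, y):
    """Residual sum of squares for y regressed on an intercept plus `columns`."""
    n = len(y)
    cols = [[1.0] * n] + columns
    k = len(cols)
    ata = [[sum(cols[i][r] * cols[j][r] for r in range(n)) for j in range(k)] for i in range(k)]
    aty = [sum(cols[i][r] * y[r] for r in range(n)) for i in range(k)]
    beta = _solve(ata, aty)
    return sum((y[r] - sum(beta[i] * cols[i][r] for i in range(k))) ** 2 for r in range(n))


def forward_stepwise(predictors, y, f_to_enter=4.0):
    """Greedy forward selection: repeatedly add the predictor that most reduces SSE,
    stopping when its partial F statistic falls below f_to_enter.
    `predictors` maps a variable name to its list of values."""
    n = len(y)
    mean_y = sum(y) / n
    sse = sum((v - mean_y) ** 2 for v in y)  # intercept-only model
    chosen = []
    while len(chosen) < len(predictors):
        trials = {name: ols_sse([predictors[c] for c in chosen + [name]], y)
                  for name in predictors if name not in chosen}
        name, new_sse = min(trials.items(), key=lambda kv: kv[1])
        dof = n - len(chosen) - 2  # residual dof of the enlarged model (with intercept)
        if dof <= 0:
            break
        if new_sse > 0 and (sse - new_sse) / (new_sse / dof) < f_to_enter:
            break
        chosen.append(name)
        sse = new_sse
        if sse == 0:
            break
    return chosen
```

Classic stepwise procedures also test whether an already-entered variable should be dropped at each step; that removal pass is omitted here to keep the sketch short.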

Classification and Regression Trees (CART)

A variety of classification algorithms are available. The standard algorithm is J48, Weka's Java implementation of C4.5 (release 8), the last freely available version of Quinlan's decision-tree algorithm. The commercial successor is C5.0.
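C4.5, and therefore J48, ranks candidate splits by the gain ratio: the information gain of a split divided by the information carried by the split itself. The sketch below computes it for binary threshold splits on one continuous predictor. It is a simplified illustration in plain Python; the real J48/C4.5 code adds refinements not shown here, and the temperature values and category labels in the usage note are hypothetical.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def gain_ratio(values, labels, threshold):
    """Gain ratio of the binary test `value <= threshold` (C4.5-style criterion)."""
    left = [lab for v, lab in zip(values, labels) if v <= threshold]
    right = [lab for v, lab in zip(values, labels) if v > threshold]
    if not left or not right:
        return 0.0
    n = len(labels)
    remainder = len(left) / n * entropy(left) + len(right) / n * entropy(right)
    gain = entropy(labels) - remainder
    split_info = entropy(["L"] * len(left) + ["R"] * len(right))
    return gain / split_info


def best_threshold(values, labels):
    """Try each distinct value (except the largest) as a cut point; keep the best."""
    candidates = sorted(set(values))[:-1]
    return max(candidates, key=lambda t: gain_ratio(values, labels, t))
```

For example, with hypothetical maximum temperatures `[25, 27, 29, 31, 33, 35]` labeled `good, good, good, moderate, moderate, moderate`, the cut at 29 separates the categories perfectly and so receives the highest gain ratio.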

CART Options
  • A number of options are available to fine-tune the CART analysis:
  • Minimum number of cases per leaf
  • Types of pruning, e.g., subtree raising
  • Confidence factor used in pruning
CART Diagnostics

CART is notorious for heavy CPU use, but the Weka version runs efficiently on my standard PC.

Diagnostics are better for CART than for linear regression.

The example on the left is a four-category PM2.5 CART forecast.
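Weka's evaluation output for a classifier includes a confusion matrix and a kappa statistic. The sketch below shows how those two diagnostics are computed for a categorical forecast such as a four-category PM2.5 forecast. It is plain Python rather than Weka's implementation, and the AQI-style category names in the usage note are hypothetical.

```python
def confusion_matrix(actual, predicted, classes):
    """Rows are observed categories, columns are forecast categories."""
    index = {c: i for i, c in enumerate(classes)}
    m = [[0] * len(classes) for _ in classes]
    for a, p in zip(actual, predicted):
        m[index[a]][index[p]] += 1
    return m


def accuracy_and_kappa(m):
    """Fraction correct and Cohen's kappa (agreement beyond chance)."""
    k = len(m)
    n = sum(sum(row) for row in m)
    observed = sum(m[i][i] for i in range(k)) / n
    row_tot = [sum(m[i]) for i in range(k)]
    col_tot = [sum(m[r][i] for r in range(k)) for i in range(k)]
    expected = sum(row_tot[i] * col_tot[i] for i in range(k)) / n ** 2
    return observed, (observed - expected) / (1 - expected)
```

Usage with hypothetical categories: `confusion_matrix(obs, fcst, ["good", "moderate", "usg", "unhealthy"])`. Off-diagonal cells show which categories are confused with each other, which a single skill score hides.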

Artificial Neural Networks (ANN)

“Linear regression by a mob”

Produces a forecast by taking the weighted sum of the predictors and then layering the results.

Artificial Neural Networks - Summary

Known samples (historical data) are used to “train” the network.

Input data (xi) are assigned weights (wi) and combined in the “hidden” layer, like a set of linear regressions. These sets are then combined in additional layers, like regressions of regressions.

The weighted sums are transformed (“squashed”) to the range of the training data, and the error is measured.

A supervised training algorithm uses the output error to adjust the network weights to minimize errors.
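The steps in this summary can be written compactly for a network with one hidden layer. Only $x_i$ and $w_i$ come from the slide; the hidden outputs $h_j$, output weights $v_j$, biases $b_j$ and $c$, targets $t_k$ and learning rate $\eta$ are our notation. The logistic sigmoid is assumed as the squashing function, so targets are assumed to be scaled into $(0, 1)$, and squared error is assumed as the error measure.

```latex
\begin{aligned}
h_j &= \sigma\Big(\sum_i w_{ij}\,x_i + b_j\Big), \qquad
\hat{y} = \sigma\Big(\sum_j v_j\,h_j + c\Big), \qquad
\sigma(z) = \frac{1}{1 + e^{-z}} \\
E &= \tfrac{1}{2}\sum_{k} \big(t_k - \hat{y}_k\big)^2 \quad \text{(sum over training cases } k\text{)} \\
\frac{\partial E}{\partial v_j} &= -\big(t - \hat{y}\big)\,\hat{y}\,\big(1 - \hat{y}\big)\,h_j \quad \text{(one training case)} \\
v_j &\leftarrow v_j - \eta\,\frac{\partial E}{\partial v_j}
\end{aligned}
```

Backpropagation applies the same chain-rule step again to pass the error back to the hidden-layer weights $w_{ij}$.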

Artificial Neural Networks – Pros/Cons
  • Pro: ANNs are a powerful technique used across scientific disciplines.
  • Pro: Theoretically well suited to non-linear processes like air quality.
  • Con: Not transparent to users. Hard to integrate into forecast thinking.
  • Con: Technically difficult to understand, raises risk of misuse.
Example: Neural Network Structure


WEKA Neural Networks

Weka provides user control of training parameters:
  • Number of iterations, or epochs (“training time”)
  • Size of the weight adjustments in backpropagation (“learning rate”)
  • Carry-over of the previous weight change into the next adjustment (“momentum”)
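The three training parameters listed above map directly onto the weight-update loop. The sketch below trains a single sigmoid unit by online backpropagation with momentum, in plain Python rather than Weka. To our knowledge Weka's MultilayerPerceptron exposes these as `-N` (training time), `-L` (learning rate) and `-M` (momentum), with defaults 500, 0.3 and 0.2; the sketch uses those values as defaults for familiarity, but please check the Weka documentation. The function names and toy data are hypothetical.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def predict(w, b, x):
    """Output of a single sigmoid unit for input vector x."""
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)


def train_unit(samples, epochs=500, learning_rate=0.3, momentum=0.2):
    """Online backpropagation for one sigmoid unit, with momentum.

    epochs        -> passes through the training data ("training time")
    learning_rate -> size of each weight adjustment
    momentum      -> fraction of the previous adjustment added to the current one
    """
    n_in = len(samples[0][0])
    w, b = [0.0] * n_in, 0.0
    dw_prev, db_prev = [0.0] * n_in, 0.0
    for _ in range(epochs):
        for x, target in samples:
            y = predict(w, b, x)
            delta = (target - y) * y * (1.0 - y)  # -dE/dz for E = (target - y)**2 / 2
            dw = [learning_rate * delta * xi + momentum * p for xi, p in zip(x, dw_prev)]
            db = learning_rate * delta + momentum * db_prev
            w = [wi + d for wi, d in zip(w, dw)]
            b += db
            dw_prev, db_prev = dw, db
    return w, b


if __name__ == "__main__":
    # Hypothetical scaled predictor with an "exceedance" (1) / "no exceedance" (0) target.
    data = [([0.0], 0.0), ([0.2], 0.0), ([0.8], 1.0), ([1.0], 1.0)]
    w, b = train_unit(data)
    print([round(predict(w, b, x), 3) for x, _ in data])
```

A larger learning rate speeds training but can overshoot; momentum smooths successive adjustments so that training moves steadily in a consistent direction.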

  • Weka is a low-cost tool that has the potential to be useful for air quality forecasting, particularly in situations where non-linear effects dominate.
  • Some Weka modules are not fully developed for forecast algorithm development.
  • Patience, the textbook and the Weka listserv are needed to get the most out of the program.
URLs of Interest
  • Weka:
    • http://www.cs.waikato.ac.nz/ml/weka
  • Mailing List:
    • https://list.scms.waikato.ac.nz/mailman/listinfo/wekalist
  • Mailing List Archives
    • https://list.scms.waikato.ac.nz/mailman/htdig/wekalist/
  • Informal FAQ:
    • http://www.public.asu.edu/~sksinghi/weka-faq.html
Acknowledgments
  • The Delaware Valley Regional Planning Commission (DVRPC) – Mike Boyer and Sean Greene – and the member states (PA, DE and NJ) for supporting air quality forecast development.
  • Dr. George Young of Penn State for his advice, patience and teaching skill.