Weka: A Useful Tool for Air Quality Forecasting
William F. Ryan
Department of Meteorology
The Pennsylvania State University
2007 National Air Quality Conference, Orlando
The weka, or woodhen, is a bird native to New Zealand. Weka is also the name of a suite of machine learning software tools, written in Java and developed at the University of Waikato in New Zealand.
O3 (left panel) is well-behaved statistically. Its distribution is near normal, with a strong association with maximum temperature. As a result, linear techniques work well.
PM2.5 (right panel) is not well-behaved. Its distribution is skewed, and it has no strong association with any particular weather variable.
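One quick way to check how "well-behaved" a pollutant is, before choosing linear or non-linear tools, is to compute the skewness of its distribution and its correlation with a candidate predictor such as maximum temperature. The sketch below is an illustrative assumption, not part of Weka or the original analysis; the function names `skewness` and `pearson_r` are hypothetical.

```python
import math
from statistics import mean


def skewness(values):
    """Moment coefficient of skewness g1 = m3 / m2**1.5 (population moments).

    Near 0 for a symmetric distribution; positive for a long right tail.
    """
    m = mean(values)
    n = len(values)
    m2 = sum((v - m) ** 2 for v in values) / n
    m3 = sum((v - m) ** 3 for v in values) / n
    return m3 / m2 ** 1.5


def pearson_r(x, y):
    """Pearson correlation between two equal-length sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)
```

A skewness near 0 is consistent with a near-normal distribution like O3's, while a clearly positive value signals a long right tail. A correlation near +1 or -1 indicates a strong linear association, such as O3 with maximum temperature.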
Tools included in Weka, such as artificial neural networks (ANN) and classification and regression trees (CART), can address the non-linear problems posed by PM2.5.
Weka uses its own file format, called ARFF (Attribute-Relation File Format).
All you need to do, though, is provide a *.csv file with the variable names in the first line, and Weka will convert it.
The ARFF format is simple anyway:
- A list of the variables and their types
- Then the data follow
- Missing data are marked as “?”
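Weka performs this conversion itself, so the sketch below is only meant to show what an ARFF file looks like: a @relation line, one @attribute line per variable with its type, then @data, with “?” for missing values. For simplicity it assumes that names and values contain no spaces, commas, or quotes; the function `csv_to_arff` and the sample values in the usage line are hypothetical.

```python
import csv
import io

MISSING = {"", "?"}  # treat empty cells and "?" as missing (an assumption of this sketch)


def _is_number(text):
    try:
        float(text)
        return True
    except ValueError:
        return False


def csv_to_arff(csv_text, relation="airquality"):
    """Convert CSV text (variable names in the first line) to ARFF text."""
    rows = list(csv.reader(io.StringIO(csv_text)))
    header, data = rows[0], rows[1:]
    lines = [f"@relation {relation}", ""]
    for j, name in enumerate(header):
        present = [r[j] for r in data if r[j] not in MISSING]
        if all(_is_number(v) for v in present):
            lines.append(f"@attribute {name} numeric")
        else:
            # Nominal attribute: distinct observed labels, in first-seen order
            labels = list(dict.fromkeys(present))
            lines.append(f"@attribute {name} {{{','.join(labels)}}}")
    lines += ["", "@data"]
    for r in data:
        # ARFF marks missing values with "?"
        lines.append(",".join("?" if v in MISSING else v for v in r))
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Hypothetical sample rows, for illustration only
    print(csv_to_arff("tmax,pm25,category\n31.2,18.5,moderate\n28.0,,good\n"))
```

Running the example prints a header declaring tmax and pm25 as numeric and category as {moderate,good}, followed by the data rows, with the empty pm25 cell written as “?”.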
Data can be edited easily within Weka itself. Variables can be scanned quickly using the basic statistics and histograms that Weka provides.
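The kind of per-variable summary described above can be approximated in a few lines. This is a hypothetical sketch (`summarize` and `histogram` are illustrative names) that uses None for missing values and equal-width bins. Weka's own statistics and binning may differ in detail.

```python
import math


def summarize(values):
    """Basic statistics for one numeric variable; None marks missing data."""
    present = [v for v in values if v is not None]
    n = len(present)
    mean = sum(present) / n
    # Sample standard deviation (n - 1 denominator)
    std = math.sqrt(sum((v - mean) ** 2 for v in present) / (n - 1)) if n > 1 else 0.0
    return {"min": min(present), "max": max(present), "mean": mean,
            "stddev": std, "missing": len(values) - n}


def histogram(values, bins=10):
    """Counts in equal-width bins spanning [min, max]; missing values skipped."""
    present = [v for v in values if v is not None]
    lo, hi = min(present), max(present)
    width = (hi - lo) / bins or 1.0  # avoid zero width when all values are equal
    counts = [0] * bins
    for v in present:
        k = min(int((v - lo) / width), bins - 1)  # the maximum value goes in the last bin
        counts[k] += 1
    return counts
```

A lopsided set of counts, with most values in the first few bins, is the kind of skewed picture described for PM2.5.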
WEKA includes several techniques that are useful for forecast development.
Linear and logistic regression
Unfortunately, the “workhorse” linear regression module in Weka is limited in its features:
- No automatic stepwise function (compare SYSTAT, Minitab)
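To illustrate what an automatic stepwise function does, here is a minimal forward-selection sketch in plain Python. At each step it adds the predictor that most improves adjusted R², and it stops when no candidate improves it by more than a small tolerance. This is a simplified variant: it is forward-only and uses adjusted R² rather than the F-test or p-value criteria that statistics packages commonly use. The names `ols_sse` and `forward_select` are hypothetical, and the sketch assumes the predictors are not collinear.

```python
def ols_sse(x_cols, y):
    """Fit y = b0 + sum(b_k * x_k) by least squares; return the residual sum of squares."""
    n = len(y)
    A = [[1.0] + [col[i] for col in x_cols] for i in range(n)]  # intercept column first
    p = len(A[0])
    # Augmented normal equations [A^T A | A^T y], solved by Gauss-Jordan elimination
    M = [[sum(A[r][i] * A[r][j] for r in range(n)) for j in range(p)]
         + [sum(A[r][i] * y[r] for r in range(n))] for i in range(p)]
    for c in range(p):
        piv = max(range(c, p), key=lambda r: abs(M[r][c]))  # partial pivoting
        M[c], M[piv] = M[piv], M[c]
        for r in range(p):
            if r != c:
                f = M[r][c] / M[c][c]
                M[r] = [u - f * v for u, v in zip(M[r], M[c])]
    b = [M[i][p] / M[i][i] for i in range(p)]
    return sum((y[r] - sum(b[k] * A[r][k] for k in range(p))) ** 2 for r in range(n))


def forward_select(predictors, y, min_gain=1e-6):
    """Greedy forward selection over a dict {name: values} using adjusted R^2."""
    n = len(y)
    ybar = sum(y) / n
    sst = sum((v - ybar) ** 2 for v in y)
    chosen, best_adj = [], float("-inf")
    while True:
        candidates = []
        for name in predictors:
            if name in chosen:
                continue
            names = chosen + [name]
            k = len(names)
            if n - k - 1 <= 0:
                continue  # not enough residual degrees of freedom
            sse = ols_sse([predictors[m] for m in names], y)
            adj = 1 - (sse / (n - k - 1)) / (sst / (n - 1))
            candidates.append((adj, name))
        if not candidates:
            break
        adj, name = max(candidates)
        if adj <= best_adj + min_gain:
            break  # no candidate improves adjusted R^2 enough
        chosen.append(name)
        best_adj = adj
    return chosen
```

A backward step that drops variables which stop contributing would turn this into full stepwise selection.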
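A variety of classification algorithms are available. The standard algorithm is J48, a souped-up Java implementation of C4.5 (release 8), the last free version of Quinlan's decision-tree algorithm, a method in the same family as CART. The commercial successor is C5.0.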
CART is notorious for consuming CPU resources, but the WEKA version runs efficiently on my computer.
Diagnostics are better for CART than for linear regression. The example on the left is a four-category PM2.5 CART forecast.
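At each node, a tree learner searches for the split that best separates the forecast categories. The sketch below shows that search for one numeric predictor. It scores each candidate threshold by entropy-based information gain, the criterion family behind C4.5/J48 (CART itself uses the Gini index). This is an illustrative assumption, not Weka's code: J48 adds gain ratio, pruning, and other refinements. The names `entropy` and `best_threshold` and the category labels below are hypothetical.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def best_threshold(x, labels):
    """Return (threshold, information gain) of the best binary split x <= t."""
    pairs = sorted(zip(x, labels))
    base = entropy(labels)
    n = len(pairs)
    best = (None, 0.0)
    for i in range(1, n):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # a threshold can only fall between distinct values
        left = [lab for _, lab in pairs[:i]]
        right = [lab for _, lab in pairs[i:]]
        t = (pairs[i - 1][0] + pairs[i][0]) / 2  # midpoint convention
        gain = base - (len(left) / n) * entropy(left) - (len(right) / n) * entropy(right)
        if gain > best[1]:
            best = (t, gain)
    return best
```

For example, best_threshold([1, 2, 3, 4], ["good", "good", "unhealthy", "unhealthy"]) returns (2.5, 1.0): splitting at 2.5 separates the two classes perfectly. A four-category forecast works the same way, with more labels in the entropy sums.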
“Linear Regression by …”
Produces a forecast by taking weighted sums of the predictors and then layering the results.
Known samples (historical data) are used to “train” the network.
Input data (x_i) are assigned weights (w_i) and combined in the “hidden” layer, like a set of linear regressions. These sets are then combined in additional layers, like regressions of regressions.
The weighted sums of the inputs are transformed (“squashed”) to the range of the training data, and the error is measured.
A supervised training algorithm uses the output error to adjust the network weights so as to minimize error.
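For one hidden layer, the weighted-sum-then-squash idea can be written out as below. This is one common formulation, assumed here for illustration. The logistic function does the squashing and training minimizes squared error. The hidden-to-output weights v_j, the biases b_j and c, the learning rate η, and the momentum α are notation introduced for this sketch alongside the slide's x_i and w_i.

```latex
h_j = \sigma\!\left(\sum_i w_{ij}\, x_i + b_j\right), \qquad
\hat{y} = \sigma\!\left(\sum_j v_j\, h_j + c\right), \qquad
\sigma(z) = \frac{1}{1 + e^{-z}}

E = \frac{1}{2} \sum_{n} \left(y_n - \hat{y}_n\right)^2

\Delta w^{(t)} = -\eta\, \frac{\partial E}{\partial w} + \alpha\, \Delta w^{(t-1)}
```

Each h_j is a linear regression on the inputs passed through σ, and ŷ is a regression on those regressions. The last line is the backpropagation update applied to every weight.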
WEKA provides user control of the training parameters:
- Number of iterations, or epochs
- Size of the weight adjustments in backpropagation (“learning rate”)
- How much of the previous adjustment carries into the next one (“momentum”)
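The training controls above can be seen in a small backpropagation loop. This is a from-scratch sketch for one hidden layer of sigmoid units, assuming the targets are already scaled into (0, 1); `train_mlp` and its parameter names are hypothetical. In Weka's MultilayerPerceptron the corresponding options are -N (epochs), -L (learning rate), and -M (momentum), and the defaults used here (500, 0.3, 0.2) follow Weka's defaults. Weka's implementation differs in details such as input normalization.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train_mlp(X, y, hidden=3, epochs=500, learning_rate=0.3, momentum=0.2, seed=0):
    """Online backpropagation with momentum for a one-hidden-layer sigmoid network.

    Returns a predict function and the mean squared error recorded during each epoch.
    """
    rng = random.Random(seed)
    n_in = len(X[0])
    # W[j]: weights from the inputs (plus a bias, last) to hidden unit j
    W = [[rng.uniform(-0.5, 0.5) for _ in range(n_in + 1)] for _ in range(hidden)]
    # V: weights from the hidden units (plus a bias, last) to the output
    V = [rng.uniform(-0.5, 0.5) for _ in range(hidden + 1)]
    dW = [[0.0] * (n_in + 1) for _ in range(hidden)]  # previous changes, for momentum
    dV = [0.0] * (hidden + 1)
    history = []

    def forward(x):
        xb = list(x) + [1.0]
        hb = [sigmoid(sum(w * xi for w, xi in zip(Wj, xb))) for Wj in W] + [1.0]
        out = sigmoid(sum(v * h for v, h in zip(V, hb)))
        return xb, hb, out

    for _ in range(epochs):
        sq_err = 0.0
        for x, target in zip(X, y):
            xb, hb, out = forward(x)
            err = target - out
            sq_err += err ** 2
            d_out = err * out * (1 - out)  # output delta for squared error
            # Hidden deltas use the output weights before they are updated
            d_hid = [d_out * V[j] * hb[j] * (1 - hb[j]) for j in range(hidden)]
            for j in range(hidden + 1):
                dV[j] = learning_rate * d_out * hb[j] + momentum * dV[j]
                V[j] += dV[j]
            for j in range(hidden):
                for i in range(n_in + 1):
                    dW[j][i] = learning_rate * d_hid[j] * xb[i] + momentum * dW[j][i]
                    W[j][i] += dW[j][i]
        history.append(sq_err / len(X))

    def predict(x):
        return forward(x)[2]

    return predict, history
```

Each change adds `momentum` times the previous change for that weight. This smooths the path of the weights, while a larger `learning_rate` takes bigger steps at the risk of oscillation.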