
Weka: A Useful Tool for Air Quality Forecasting

William F. Ryan, Department of Meteorology, The Pennsylvania State University, wfr1@psu.edu. 2007 National Air Quality Conference, Orlando.

Presentation Transcript


  1. Weka: A Useful Tool for Air Quality Forecasting William F. Ryan Department of Meteorology The Pennsylvania State University wfr1@psu.edu 2007 National Air Quality Conference, Orlando

  2. Weka The weka, or woodhen, is a bird native to New Zealand. Weka is also the name of a suite of machine learning software tools, written in Java and developed at the University of Waikato in New Zealand. http://www.cs.waikato.ac.nz/ml/weka

  3. Machine Learning • Machine learning is a subfield of artificial intelligence (AI) concerned with the development of algorithms and techniques that allow computers to "learn". • The machine learning algorithms in Weka include, among others, linear regression, classification trees, clustering and artificial neural networks (ANN).

  4. Weka Can Be A Useful Tool • Weka has the potential to be a useful tool to support local air quality forecasting efforts, particularly those operating on a limited budget. • Weka is open-source (free) software, although purchase of the associated textbook is strongly recommended. • Weka installs easily on standard PCs and, because it is written in Java, also runs on Linux and other platforms. • Only minimal modifications are necessary to prepare data files for use in Weka. • The user interface is simple and intuitive.

  5. Weka and PM2.5 Forecasting • Of particular interest to air quality forecasters is the wide range of algorithms included in Weka. • These algorithms may help address shortcomings in statistical forecast guidance for fine particulate matter (PM2.5). • Simple linear regression methods provide reasonable skill for O3 forecasting because of the very strong, nearly linear ozone-temperature relationship, but they have shown limited skill in forecasting PM2.5.
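
To make the O3 case concrete, here is a minimal sketch (plain Python, not Weka code) of a one-predictor least-squares fit of peak O3 on maximum temperature. The function name and the toy numbers are hypothetical, chosen only to illustrate the calculation.

```python
# Sketch only: one-predictor ordinary least squares, the kind of fit used for O3 guidance.
# The data values below are hypothetical toy numbers, not observations.


def fit_simple_ols(x, y):
    """Return (intercept, slope) that minimize the sum of squared residuals."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


if __name__ == "__main__":
    tmax = [24.0, 27.0, 29.0, 31.0, 33.0, 35.0]  # hypothetical max temperature, deg C
    o3 = [41.0, 50.0, 56.0, 63.0, 69.0, 76.0]    # hypothetical peak O3, ppb
    b0, b1 = fit_simple_ols(tmax, o3)
    print(f"O3 ~ {b0:.1f} + {b1:.2f} * Tmax")
    print(f"guidance for Tmax = 30 C: {b0 + b1 * 30.0:.1f} ppb")
```

The closed-form slope and intercept work well only when the relationship really is close to linear, which is the point the slide makes about O3 versus PM2.5.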

  6. PM2.5 Forecasting O3 (left panel) is well-behaved statistically: its distribution is near normal, with a strong association with maximum temperature. As a result, linear techniques are useful. PM2.5 (right panel) is not well-behaved: its distribution is skewed, and it has no strong association with any particular weather variable. Tools included in Weka, including ANN and classification and regression trees (CART), are capable of addressing the non-linear problems posed by PM2.5.
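
The two properties contrasted above, skewness of the distribution and strength of association with a weather variable, can be checked numerically. This is a minimal sketch under the assumption that the moment skewness and the Pearson correlation are adequate quick checks; the function names and sample values are hypothetical.

```python
# Sketch only: two quick checks of how "well-behaved" a pollutant series is for
# linear methods. All data here are hypothetical.
import math


def skewness(values):
    """Moment coefficient of skewness, m3 / m2**1.5 (no small-sample correction)."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


if __name__ == "__main__":
    pm25 = [6.0, 8.0, 9.0, 10.0, 12.0, 14.0, 35.0]  # hypothetical, one high episode
    tmax = [20.0, 31.0, 25.0, 28.0, 22.0, 30.0, 26.0]
    print(f"skewness = {skewness(pm25):.2f}, r(PM2.5, Tmax) = {pearson_r(pm25, tmax):.2f}")
```

A skewness near zero suggests a roughly symmetric distribution, while a large positive value indicates the long right tail typical of PM2.5.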

  7. Weka: Information http://www.cs.waikato.ac.nz/ml/weka/

  8. Input File Format Weka uses its own file format, ARFF (*.arff). All you need to do, though, is provide a *.csv file with variable names in the first line, and Weka will convert it.
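
For readers who want to see what that conversion involves, here is a minimal sketch of a CSV-to-ARFF converter. It is not Weka's converter: it assumes every column is numeric and that an empty cell means "missing", whereas Weka's own CSV loader also handles non-numeric columns. The function name and relation name are hypothetical.

```python
# Sketch only: convert CSV text (variable names in the first line) to ARFF text.
# Assumptions: every column is numeric and an empty cell means "missing".
# This function is not part of Weka.
import csv
import io


def csv_to_arff(csv_text, relation="airquality"):
    rows = list(csv.reader(io.StringIO(csv_text)))
    header, data = rows[0], rows[1:]
    lines = [f"@relation {relation}", ""]
    for name in header:
        name = name.strip()
        # ARFF attribute names that contain spaces must be quoted.
        quoted = f"'{name}'" if " " in name else name
        lines.append(f"@attribute {quoted} numeric")
    lines += ["", "@data"]
    for row in data:
        cells = [cell.strip() or "?" for cell in row]  # blank cell -> ARFF missing value
        lines.append(",".join(cells))
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    sample = "tmax,rh,pm25\n31.2,55,18.4\n29.0,,22.1\n"  # hypothetical values
    print(csv_to_arff(sample))
```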

  9. ARFF Format The ARFF format is simple anyway: • ASCII file • List of variables and their types • Then the data follow, comma separated • Missing data marked as “?”
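
The layout listed above is simple enough to read with a few lines of code. This minimal sketch assumes numeric attributes only, unquoted names and dense rows; real ARFF files may also contain nominal, string and date attributes and sparse rows. The function name is hypothetical.

```python
# Sketch only: read the simple ARFF layout described above into Python lists.
# Assumptions: numeric attributes only, unquoted attribute names, dense data.
# @relation and any other header lines are ignored.


def read_arff(text):
    names, rows, in_data = [], [], False
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("%"):  # skip blank lines and ARFF comments
            continue
        keyword = line.lower()
        if keyword.startswith("@attribute"):
            names.append(line.split()[1])
        elif keyword.startswith("@data"):
            in_data = True
        elif in_data:
            # "?" marks a missing value; represent it as None.
            rows.append([None if c.strip() == "?" else float(c) for c in line.split(",")])
    return names, rows


if __name__ == "__main__":
    example = """% hypothetical example file
@relation pm
@attribute tmax numeric
@attribute pm25 numeric
@data
30.5,?
27.0,12.5
"""
    print(read_arff(example))
```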

  10. Data Editing Data can be easily edited within Weka itself

  11. Analyzing Data Variables can be easily scanned with basic statistics and histograms provided by Weka
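
Weka's Preprocess panel reports statistics such as minimum, maximum, mean and standard deviation for a numeric attribute, along with a histogram. A rough stand-alone equivalent, sketched under the assumption that missing values are stored as None (function names and data hypothetical), might look like this:

```python
# Sketch only: per-variable summary and a crude text histogram.
# Missing values are represented as None (an assumption of this sketch).
import statistics


def summarize(values):
    present = [v for v in values if v is not None]
    return {
        "n": len(present),
        "missing": len(values) - len(present),
        "min": min(present),
        "max": max(present),
        "mean": statistics.mean(present),
        "stdev": statistics.stdev(present),  # sample standard deviation
    }


def text_histogram(values, bins=5):
    present = [v for v in values if v is not None]
    lo, hi = min(present), max(present)
    width = (hi - lo) / bins or 1.0  # avoid zero width when all values are equal
    counts = [0] * bins
    for v in present:
        counts[min(int((v - lo) / width), bins - 1)] += 1  # maximum goes in the last bin
    return [f"{lo + i * width:8.1f} | {'#' * c}" for i, c in enumerate(counts)]


if __name__ == "__main__":
    pm25 = [6.1, 8.4, None, 9.0, 10.2, 12.5, 14.8, 35.3]  # hypothetical values
    print(summarize(pm25))
    print("\n".join(text_histogram(pm25)))
```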

  12. Quick Analysis Tools

  13. Sampling and Test Data Set Options
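
Weka's Explorer offers test options such as cross-validation (10 folds by default) and a percentage split (66% training by default). The sketch below shows the idea behind both; it is not Weka's code, and unlike Weka it does not stratify folds for a nominal class. Function names and the seed are hypothetical.

```python
# Sketch only: two ways to hold out test data (random percentage split and k-fold CV).
import random


def percentage_split(rows, train_pct=66, seed=1):
    """Shuffle, then put the first train_pct percent of rows in the training set."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    cut = round(len(shuffled) * train_pct / 100)
    return shuffled[:cut], shuffled[cut:]


def k_fold_indices(n, k=10, seed=1):
    """Yield (train_idx, test_idx) pairs; each index lands in exactly one test fold."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    for i in range(k):
        train = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train, folds[i]


if __name__ == "__main__":
    days = list(range(30))  # stand-ins for 30 daily records
    train, test = percentage_split(days)
    print(len(train), len(test))
    for train_idx, test_idx in k_fold_indices(len(days), k=5):
        print(len(train_idx), len(test_idx))
```

Because daily air quality data are serially correlated, a chronological split may give a less optimistic estimate of skill than random shuffling.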

  14. Functions Available Weka includes a number of different techniques that can be useful for forecast development. These include: • Linear and logistic regression • Perceptron models (neural networks)
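
Logistic regression suits yes/no questions such as whether a day will exceed a PM2.5 threshold. This is a minimal sketch fitted by batch gradient descent; Weka's own Logistic classifier uses a different fitting procedure, and the function names, learning rate and data are hypothetical.

```python
# Sketch only: one-predictor logistic regression fitted by batch gradient descent,
# e.g. the probability that a day exceeds a PM2.5 threshold (0/1 target).
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(x, y, learning_rate=0.1, epochs=5000):
    """Return (b0, b1) for P(y = 1) = sigmoid(b0 + b1 * x)."""
    b0 = b1 = 0.0
    n = len(x)
    for _ in range(epochs):
        g0 = g1 = 0.0
        for xi, yi in zip(x, y):
            err = sigmoid(b0 + b1 * xi) - yi  # gradient of the log-loss w.r.t. the linear score
            g0 += err
            g1 += err * xi
        b0 -= learning_rate * g0 / n
        b1 -= learning_rate * g1 / n
    return b0, b1


if __name__ == "__main__":
    x = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]  # hypothetical scaled predictor
    y = [0, 0, 0, 1, 1, 1]              # hypothetical exceedance flags
    b0, b1 = fit_logistic(x, y)
    print(f"P(exceedance | x = 4) = {sigmoid(b0 + b1 * 4.0):.2f}")
```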

  15. Linear Regression Unfortunately, the “workhorse” linear regression module in Weka is of limited use: • No automatic stepwise function • Poor diagnostics (compare SYSTAT or Minitab)
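
Where an automatic stepwise function is missing, a simple forward selection can be scripted outside Weka. This is a minimal sketch, not a reproduction of any package's procedure: it adds the predictor that most increases R² and stops when the gain falls below a hypothetical threshold `min_gain`, whereas classical stepwise routines use F-to-enter and F-to-remove tests. All names are illustrative.

```python
# Sketch only: greedy forward selection for multiple linear regression.
# Simplification: add predictors while R^2 rises by at least `min_gain`.


def solve(a, b):
    """Solve the square system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def r_squared(columns, y):
    """R^2 of an OLS fit of y on an intercept plus the given predictor columns."""
    n, p = len(y), len(columns) + 1
    X = [[1.0] + [col[i] for col in columns] for i in range(n)]
    xtx = [[sum(X[i][j] * X[i][k] for i in range(n)) for k in range(p)] for j in range(p)]
    xty = [sum(X[i][j] * y[i] for i in range(n)) for j in range(p)]
    beta = solve(xtx, xty)  # normal equations (X'X) beta = X'y
    mean_y = sum(y) / n
    sse = sum((y[i] - sum(X[i][j] * beta[j] for j in range(p))) ** 2 for i in range(n))
    sst = sum((v - mean_y) ** 2 for v in y)
    return 1.0 - sse / sst


def forward_stepwise(predictors, y, min_gain=0.01):
    """predictors: dict name -> list of values. Returns (chosen names, final R^2)."""
    chosen, best_r2 = [], 0.0
    remaining = dict(predictors)
    while remaining:
        base = [predictors[name] for name in chosen]
        name, r2 = max(
            ((cand, r_squared(base + [col], y)) for cand, col in remaining.items()),
            key=lambda pair: pair[1],
        )
        if r2 - best_r2 < min_gain:
            break  # the best remaining candidate does not improve the fit enough
        chosen.append(name)
        best_r2 = r2
        del remaining[name]
    return chosen, best_r2
```

Usage: `forward_stepwise({"tmax": tmax, "rh": rh}, pm25)` would return the selected predictor names and the final R², where the three lists are hypothetical equal-length data columns.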

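  16. Classification and Regression Trees (CART) A variety of classification algorithms are available. The standard algorithm is J48, Weka's Java implementation of Ross Quinlan's C4.5 (release 8), the last freely available version of that algorithm; the commercial successor is C5.0. (Strictly speaking, C4.5 is a close relative of CART rather than a version of it.)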
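
C4.5 ranks candidate splits by gain ratio (information gain divided by the split information). This minimal sketch scores binary thresholds on a single numeric predictor and requires a minimum number of cases on each side, loosely echoing J48's minimum-cases option. Real J48 adds refinements that this sketch omits, and the function names are hypothetical.

```python
# Sketch only: scoring binary splits on a numeric predictor the C4.5 way (gain ratio).
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def gain_ratio(values, labels, threshold):
    left = [lab for v, lab in zip(values, labels) if v <= threshold]
    right = [lab for v, lab in zip(values, labels) if v > threshold]
    if not left or not right:
        return 0.0
    n = len(labels)
    weighted = (len(left) / n) * entropy(left) + (len(right) / n) * entropy(right)
    gain = entropy(labels) - weighted
    split_info = entropy(["L"] * len(left) + ["R"] * len(right))
    return gain / split_info


def best_threshold(values, labels, min_leaf=2):
    """Try midpoints between sorted distinct values; require min_leaf cases per side."""
    distinct = sorted(set(values))
    best = (None, 0.0)
    for a, b in zip(distinct, distinct[1:]):
        t = (a + b) / 2
        n_left = sum(v <= t for v in values)
        if n_left < min_leaf or len(values) - n_left < min_leaf:
            continue
        score = gain_ratio(values, labels, t)
        if score > best[1]:
            best = (t, score)
    return best


if __name__ == "__main__":
    tmax = [20.0, 22.0, 25.0, 28.0, 30.0, 33.0]  # hypothetical predictor
    cat = ["low", "low", "low", "high", "high", "high"]  # hypothetical PM2.5 category
    print(best_threshold(tmax, cat))
```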

  17. CART Options A number of options are available to fine-tune the tree (J48) analysis: • Minimum number of cases per leaf • Types of pruning, e.g., subtree raising • Confidence factor used in pruning
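
The confidence factor controls how pessimistically a leaf's error rate is estimated before pruning. This minimal sketch uses the normal-approximation upper bound presented in Witten and Frank's Data Mining textbook. J48's internal calculation may differ in detail, and the function name is hypothetical.

```python
# Sketch only: pessimistic (upper-bound) error estimate for a tree leaf.
import math
from statistics import NormalDist


def pessimistic_error(errors, n, confidence=0.25):
    """Upper confidence limit on the true error rate of a leaf with `errors` of `n` cases wrong."""
    f = errors / n
    z = NormalDist().inv_cdf(1.0 - confidence)  # about 0.67 when confidence = 0.25
    spread = z * math.sqrt(f / n - f * f / n + z * z / (4 * n * n))
    return (f + z * z / (2 * n) + spread) / (1 + z * z / n)


if __name__ == "__main__":
    for cf in (0.25, 0.10):
        print(f"confidence {cf}: estimated error {pessimistic_error(2, 6, cf):.3f}")
```

Lowering the confidence factor raises the estimated error, which makes pruning more aggressive.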

  18. CART Diagnostics CART is notorious for consuming CPU resources, but the Weka version runs efficiently on my standard PC. Diagnostics are better for CART than for linear regression. The example on the left is a four-category PM2.5 CART forecast.
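
Weka's classifier output includes a confusion matrix and a kappa statistic. For a multi-category forecast like the one described above, both can be computed as in this minimal sketch. The category labels and data are hypothetical.

```python
# Sketch only: confusion matrix, proportion correct and Cohen's kappa
# (agreement beyond chance) for a multi-category forecast.
from collections import Counter


def confusion_matrix(observed, forecast, categories):
    """Rows are observed categories, columns are forecast categories."""
    counts = Counter(zip(observed, forecast))
    return [[counts[(o, f)] for f in categories] for o in categories]


def accuracy_and_kappa(matrix):
    k = len(matrix)
    total = sum(sum(row) for row in matrix)
    p_obs = sum(matrix[i][i] for i in range(k)) / total
    row_totals = [sum(row) for row in matrix]
    col_totals = [sum(matrix[r][c] for r in range(k)) for c in range(k)]
    p_chance = sum(row_totals[i] * col_totals[i] for i in range(k)) / total ** 2
    return p_obs, (p_obs - p_chance) / (1 - p_chance)


if __name__ == "__main__":
    cats = ["good", "moderate", "usg", "unhealthy"]  # hypothetical category labels
    obs = ["good", "moderate", "moderate", "usg", "good", "moderate"]
    fc = ["good", "moderate", "usg", "usg", "moderate", "moderate"]
    m = confusion_matrix(obs, fc, cats)
    acc, kappa = accuracy_and_kappa(m)
    print(m, f"accuracy = {acc:.2f}, kappa = {kappa:.2f}")
```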

  19. CART Visualization

  20. Artificial Neural Networks (ANN) “Linear Regression by a mob” An ANN produces a forecast by taking weighted sums of the predictors and then layering the process.

  21. Artificial Neural Networks - Summary Known samples (historical data) are used to “train” the network. Input data (x_i) are assigned weights (w_i) and combined in the “hidden” layer, like a set of linear regressions. These sets are then combined in additional layers, like regressions of regressions. Each weighted sum is transformed (“squashed”) into a limited range matched to the scaled training data, and the output error is measured. A supervised training algorithm uses the output error to adjust the network weights so as to minimize errors.
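
The "regressions of regressions" description above corresponds to the forward pass in this minimal sketch of a network with one hidden layer. It assumes sigmoid squashing at every unit and targets scaled to the range 0 to 1. The weights, names and error measure are hypothetical.

```python
# Sketch only: forward pass of a network with one hidden layer.
# Each hidden unit is a weighted sum of the inputs plus a bias (a "regression"),
# squashed by a sigmoid; the output unit combines the hidden units the same way.
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def forward(x, hidden_weights, hidden_biases, output_weights, output_bias):
    hidden = [
        sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)
        for weights, bias in zip(hidden_weights, hidden_biases)
    ]
    return sigmoid(sum(w * h for w, h in zip(output_weights, hidden)) + output_bias)


def squared_error(prediction, target):
    """Error measure that a supervised training algorithm would try to reduce."""
    return 0.5 * (prediction - target) ** 2


if __name__ == "__main__":
    # Hypothetical weights for 2 inputs -> 2 hidden units -> 1 output.
    y_hat = forward([0.6, 0.2], [[0.5, -0.3], [0.1, 0.8]], [0.1, -0.2], [1.5, -1.0], 0.3)
    print(f"forecast = {y_hat:.3f}, error vs 0.7 = {squared_error(y_hat, 0.7):.4f}")
```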

  22. Artificial Neural Networks – Pros/Cons • Pro: ANNs are a powerful technique used across scientific disciplines. • Pro: Theoretically well suited to non-linear processes like air quality. • Con: Not transparent to users; hard to integrate into forecast thinking. • Con: Technically difficult to understand, which raises the risk of misuse.

  23. Example: Neural Network Structure www.doc.ic.ac.uk/~sgc/teaching/v231/

  24. Weka Neural Networks Weka provides user control of training parameters: • Number of iterations or epochs (“training time”) • Size of each weight adjustment in backpropagation (“learning rate”) • Fraction of the previous weight adjustment carried into the next one (“momentum”)
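
The three controls listed above can be seen at work on the simplest possible "network", a single linear unit trained by per-example gradient descent, in this minimal sketch. The default values 500, 0.3 and 0.2 follow, as we understand its documentation, the defaults of Weka's MultilayerPerceptron (trainingTime, learningRate, momentum). The function name and data are hypothetical.

```python
# Sketch only: epochs, learning rate and momentum applied to one linear unit.
#   epochs        - passes through the training data ("training time")
#   learning_rate - size of each weight adjustment
#   momentum      - fraction of the previous adjustment carried into the next one


def train_linear_unit(data, epochs=500, learning_rate=0.3, momentum=0.2):
    w = b = 0.0
    dw_prev = db_prev = 0.0
    for _ in range(epochs):
        for x, target in data:
            error = (w * x + b) - target  # derivative of 0.5 * error**2 w.r.t. the output
            dw = -learning_rate * error * x + momentum * dw_prev
            db = -learning_rate * error + momentum * db_prev
            w, b = w + dw, b + db
            dw_prev, db_prev = dw, db
    return w, b


if __name__ == "__main__":
    toy = [(0.0, 1.0), (0.25, 1.5), (0.5, 2.0), (0.75, 2.5), (1.0, 3.0)]  # points on y = 2x + 1
    print(train_linear_unit(toy))
```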

  25. Conclusions • Weka is a low-cost tool that has the potential to be useful for air quality forecasting, particularly in situations where non-linear effects dominate. • Some Weka modules are not fully developed for forecast algorithm development. • Patience, the textbook and the Weka listserv are required to get the most out of the program.

  26. URLs of Interest • Weka: http://www.cs.waikato.ac.nz/ml/weka • Mailing List: https://list.scms.waikato.ac.nz/mailman/listinfo/wekalist • Mailing List Archives: https://list.scms.waikato.ac.nz/mailman/htdig/wekalist/ • Informal FAQ: http://www.public.asu.edu/~sksinghi/weka-faq.html

  27. Acknowledgements • The Delaware Valley Regional Planning Commission (DVRPC) – Mike Boyer and Sean Greene – and the member states (PA, DE and NJ) for supporting air quality forecast development. • Dr. George Young of Penn State for his advice, patience and teaching skill.
