
Alison M. Eyth, Prashant P. Pai Carolina Environmental Program

The Analysis Engine – A New Tool for Model Evaluation, Sensitivity and Uncertainty Analysis, and more… Alison M. Eyth, Prashant P. Pai, Carolina Environmental Program, University of North Carolina at Chapel Hill, October 19, 2004.


Presentation Transcript


  1. The Analysis Engine – A New Tool for Model Evaluation, Sensitivity and Uncertainty Analysis, and more…
     Alison M. Eyth, Prashant P. Pai
     Carolina Environmental Program, University of North Carolina at Chapel Hill
     October 19, 2004
     Carolina Environmental Program UNC Chapel Hill

  2. Background
     • Supports data analysis by creating plots and tables
     • “Analysis Configurations” facilitate repeated analyses
     • Developed as part of the Multimedia Integrated Modeling System (but can be used standalone)
     • Java application that runs on Windows, Linux, Unix, and Mac OS X
     • Open source – available from http://sourceforge.net/projects/mimsfw
     • Three main components:
       • Table application
       • Plotting engine
       • Statistics package

  3. Table Application
     • Provides the top-level user interface
     • File menu accesses import and export functions
     • Currently supported file formats include:
       • Comma separated (.csv), custom and tab delimited, fixed column width, SMOKE Report, and ARFF (to support data mining with WEKA)
     • Data files are imported as rows and columns
     • Multiple files can be loaded, with each file shown in its own tab
       • Tabs include the file name, header, data table, and footer
     • Toolbar and pop-up (i.e., right-click) menus provide access to functions such as sort, filter, top N rows, format, plot, and statistics

  4. Table Application GUI

  5. Toolbar and Pop-up Menu Functions
     • Multi-column sort
     • Show only the rows with the Top N values
     • Show only the rows with the Bottom N values
     • Filter rows based on criteria (e.g., NOx > 500)
     • Show / hide columns
     • Format columns
       • e.g., color, width, font, number or date style
     • Create plots
     • Compute statistics
     • Edit analysis configuration
     • Reset
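
The multi-column sort and Top N / Bottom N functions listed on this slide can be illustrated with a minimal Python sketch. This is not the Analysis Engine's own Java code; the function names, column names, and data values below are all assumptions made up for the example.

```python
# Illustrative rows; the column names and values are invented for this sketch.
rows = [
    {"state": "NC", "sector": "point", "NOx": 520.0},
    {"state": "VA", "sector": "area", "NOx": 310.5},
    {"state": "NC", "sector": "area", "NOx": 145.2},
    {"state": "GA", "sector": "point", "NOx": 780.1},
]


def multi_column_sort(rows, keys):
    """Sort by several columns; keys is a list of (column, descending) pairs,
    most significant column first."""
    result = list(rows)
    # Python's sort is stable, so sorting by the least significant key first
    # and the most significant key last produces a multi-column ordering.
    for column, descending in reversed(keys):
        result.sort(key=lambda r: r[column], reverse=descending)
    return result


def top_n(rows, column, n):
    """Keep only the n rows with the largest values in the given column."""
    return sorted(rows, key=lambda r: r[column], reverse=True)[:n]


def bottom_n(rows, column, n):
    """Keep only the n rows with the smallest values in the given column."""
    return sorted(rows, key=lambda r: r[column])[:n]


# Sort by state (ascending), then by NOx (descending) within each state.
by_state_then_nox = multi_column_sort(rows, [("state", False), ("NOx", True)])
top_two = top_n(rows, "NOx", 2)
bottom_one = bottom_n(rows, "NOx", 1)
```

The sketch returns new lists rather than modifying the input, which mirrors the slide's wording that these functions change which rows are shown, not the underlying data.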

  6. Filter Rows Dialog
     • Use Filter Rows to limit the rows shown in the table
     • Any number of criteria can be added
     • Each criterion has a column, operation, and value
     • Available operations are <, <=, >, >=, not =, starts with, contains, ends with, does not start with, does not contain, ...
     • Select between showing rows matching ALL criteria or ANY
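
The criterion model described on this slide (a column, an operation, and a value, combined with ALL or ANY) can be sketched in Python as follows. This is only an illustration of the idea, not the tool's implementation: the `OPERATIONS` table, the `filter_rows` function, and the sample data are assumptions, and only the operations named on the slide are included.

```python
import operator

# Operations named on the slide, mapped to Python predicates (cell, value).
OPERATIONS = {
    "<": operator.lt,
    "<=": operator.le,
    ">": operator.gt,
    ">=": operator.ge,
    "not =": operator.ne,
    "starts with": lambda cell, value: str(cell).startswith(str(value)),
    "contains": lambda cell, value: str(value) in str(cell),
    "ends with": lambda cell, value: str(cell).endswith(str(value)),
    "does not start with": lambda cell, value: not str(cell).startswith(str(value)),
    "does not contain": lambda cell, value: str(value) not in str(cell),
}


def filter_rows(rows, criteria, match_all=True):
    """Return the rows that satisfy the criteria.

    criteria is a list of (column, operation, value) triples.
    match_all=True keeps rows matching ALL criteria; False keeps rows
    matching ANY criterion. With no criteria, every row is kept.
    """
    if not criteria:
        return list(rows)
    combine = all if match_all else any
    return [
        row
        for row in rows
        if combine(OPERATIONS[op](row[column], value) for column, op, value in criteria)
    ]


# Invented sample data for the sketch.
rows = [
    {"state": "NC", "NOx": 520.0},
    {"state": "VA", "NOx": 310.5},
    {"state": "NC", "NOx": 145.2},
    {"state": "GA", "NOx": 780.1},
]
criteria = [("NOx", ">", 500), ("state", "starts with", "N")]

matching_all = filter_rows(rows, criteria, match_all=True)
matching_any = filter_rows(rows, criteria, match_all=False)
```

With these sample rows, ALL keeps only the North Carolina row above 500, while ANY keeps every row that either exceeds 500 or has a state starting with "N".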

  7. Plotting Options Dialog
     • Choose Plot type from Bar, Box, CDF, Discrete Category, Histogram, Rank Order, XY (Scatter), Line, Time Series, and Tornado
     • Select Data Columns to plot
     • Specify Units and one to three columns to use for labels
     • Selected data is passed to the plotting engine

  8. Plot Properties are Specified using the Analysis Engine GUI

  9. Example Discrete Category Plot
     Note: Plots are created using a custom Java interface to R

  10. Statistics Dialog
      • Provides interface to the statistics package
      • Specify statistics to compute and data columns to analyze
      • Additional details are specified on other tabs
      • Statistics outputs appear as new tabs in the table application
      • Statistics are computed using Colt and Weka

  11. Example of Histogram Statistics
      Note: This is a new tab that supports all the standard functions such as sort, filter, format, and plot

  12. Analysis Configuration Dialog
      • The Analysis Configuration stores all the table settings and plots that you have created during your session
      • The selected plots can be viewed, edited, or deleted
      • Plots can be given new names by double-clicking the name
      • Some (or all) of the settings can be saved to a configuration file
      • Configuration files can be loaded in future sessions or for other data files in the current session

  13. Automation
      • An optional command line interface may be used to specify:
        • Data files to load
        • Analysis configuration file to use
        • Type of plots to create (e.g., JPG, PDF, PNG)
        • Output directory for plots and tables
      • This allows plots and tables to be created in an automated fashion
      • Standard analysis products may be created for newly available data sets that have the same format
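
To make the four command-line inputs on this slide concrete, here is a hypothetical Python sketch of a batch front end that accepts them. The slides do not give the Analysis Engine's actual option names or syntax, so every flag, file name, and function below is an assumption for illustration only. The rule of one output plot per data file is likewise invented.

```python
import argparse
from pathlib import Path


def build_parser():
    """Build a parser for the four inputs listed on the slide.

    All option names here are hypothetical; they are not the real tool's flags.
    """
    parser = argparse.ArgumentParser(
        description="Hypothetical batch front end for repeated analyses"
    )
    parser.add_argument("data_files", nargs="+", help="data files to load")
    parser.add_argument("--config", required=True,
                        help="analysis configuration file to use")
    parser.add_argument("--plot-format", choices=["JPG", "PDF", "PNG"],
                        default="PNG", help="type of plots to create")
    parser.add_argument("--output-dir", default=".",
                        help="output directory for plots and tables")
    return parser


def plan_outputs(args):
    """Assumed naming rule: one plot per data file, named after the file."""
    out_dir = Path(args.output_dir)
    suffix = "." + args.plot_format.lower()
    return [out_dir / (Path(name).stem + suffix) for name in args.data_files]


# Passing an explicit argument list keeps the example independent of sys.argv.
example_args = build_parser().parse_args(
    ["jan.csv", "feb.csv", "--config", "qa.cfg",
     "--plot-format", "PDF", "--output-dir", "out"]
)
planned = plan_outputs(example_args)
```

Because the same configuration file is applied to every data file, a newly available data set in the same format only needs to be added to the argument list to get the standard set of products.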

  14. Examples of Potential Applications
      • Model Evaluation
        • Sort to find stations at which the error was the largest
        • Plot modeled and observed values on box plots, etc.
        • Create scatter plots of one species vs. another
      • Sensitivity and Uncertainty Analysis
        • Perform linear regression and show in plots and tables
        • Compute correlation coefficients
      • Emissions Modeling Quality Assurance
        • Find states with top 10 emission values
        • Stacked bar charts to show total emissions
        • Compute histograms
      • General Data Analysis
        • Analyze data by sorting, filtering, and computing statistics
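
Three of the applications above (finding the stations with the largest error, linear regression, and correlation coefficients) rest on standard calculations that can be sketched in plain Python. The tool itself computes statistics with Colt and Weka; the version below is an independent illustration using textbook least-squares and Pearson formulas, and the station names and values are invented.

```python
import math


def linear_regression(x, y):
    """Ordinary least-squares fit y = slope * x + intercept."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def correlation(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    return sxy / math.sqrt(sxx * syy)


def largest_errors(stations, modeled, observed, n):
    """Return the n stations with the largest absolute model error."""
    errors = [(abs(m - o), s) for s, m, o in zip(stations, modeled, observed)]
    errors.sort(reverse=True)
    return [station for _, station in errors[:n]]


# Invented example data: the modeled values follow modeled = 2 * observed + 1.
stations = ["A", "B", "C", "D"]
observed = [1.0, 2.0, 3.0, 4.0]
modeled = [3.0, 5.0, 7.0, 9.0]

slope, intercept = linear_regression(observed, modeled)
r = correlation(observed, modeled)
worst_two = largest_errors(stations, modeled, observed, 2)
```

A perfect correlation alongside a slope well away from 1 is a useful reminder that the correlation coefficient alone does not show model bias, which is why the error sort is computed separately.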

  15. Future Directions
      • Initial version will be released on SourceForge by 10/31/04 (which is the end date for the current funding for this work)
      • Many potential enhancements are listed on SourceForge, e.g.:
        • Create new rows and columns using functions (e.g., difference, sum)
        • Create plots and tables with data from multiple tabs
      • Will likely be used as part of the new emissions quality assurance tool (http://sourceforge.net/projects/emisview)
      • Mr. Tommy Cathey will continue to develop the custom Java interface to R at the EPA Scientific Visualization Laboratory in FY05

  16. References
      • MIMS SourceForge page (for downloads): http://sourceforge.net/projects/mimsfw
      • R (for plots): http://www.r-project.org
      • Colt (for basic statistics): http://www-itg.lbl.gov/~hoschek/colt
      • Weka (for regression and correlation analysis): http://www.cs.waikato.ac.nz/~ml/weka/
      • Carolina Environmental Program (for more information): http://www.cep.unc.edu
      • Authors: eyth@unc.edu, prapai@unc.edu
