A computational tool for depth-based statistical analysis

Presentation Transcript


  1. A computational tool for depth-based statistical analysis. Eynat Rafalin, Tufts University Computer Science Department

  2. The tool
     • Easy to use, efficient, and expandable interface for statistical research, based on the notion of data depth.
     • For scientists with no computer science background.

  3. Our goal
     • Present the tool to the community
        • Code/software available on request
     • Run on real data
     • Get feedback
        • Is such a tool needed?
        • Additions/improvements?

  4. General
     • C++-based software (no additional tools or software needed)
     • Simple interface. It should allow users to
        • enter data files, sort the data points, and filter unwanted data
        • perform calculations
        • present the results in an easy-to-understand graphical interface
        • save and output data for future use
     • Fast
     • Portable code

  5. General description (diagram; components shown: data filter, txt and Excel files, output, statistical modules, Geomview, contour display and selection)

  6. Data filter
     • Graphical user interface developed in C++
     • Used to crop or manipulate a data set before it is fed into the statistical modules
     • Fast and light
     • Convenient, easy-to-use interface
     • Portable code (UNIX, Solaris, Linux, Windows)

  7. Data filter

  8. Statistical modules: depth contours (2D)
     • Half-space (location) depth contours
        • Optimal O(n^2) time
        • Supports two approaches for defining contours
        • Including the Tukey median and the bagplot
        • Including contour parameters (size, etc.)
     • Convex hull peeling depth contours
     • Simplicial depth contours
     • Tukey median computation (O(n log^3 n))
     • Locating a new point in a set of depth contours (O(log n) query time)
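The half-space depth of a single query point can be illustrated with a brute-force sketch. This is not the tool's contour algorithm: it assumes integer or otherwise exactly comparable coordinates, spends O(n^2) time per query point, and the function name `halfspace_depth` is hypothetical. The depth is taken as the smallest number of data points in any closed half-plane whose boundary passes through the query point.

```python
def halfspace_depth(q, points):
    """Half-space (Tukey) depth of q: the minimum number of data points in
    any closed half-plane whose boundary line passes through q.
    Brute-force illustration only, O(n^2) per query point."""
    qx, qy = q
    offsets = [(x - qx, y - qy) for x, y in points]
    # Points that coincide with q lie in every closed half-plane through q.
    coincident = sum(1 for d in offsets if d == (0, 0))
    others = [d for d in offsets if d != (0, 0)]
    if not others:
        return coincident
    best = len(others)
    for ax, ay in others:
        # Classify every point relative to the line through q and this point.
        left = right = same_ray = opposite_ray = 0
        for bx, by in others:
            cross = ax * by - ay * bx
            if cross > 0:
                left += 1
            elif cross < 0:
                right += 1
            elif ax * bx + ay * by > 0:
                same_ray += 1
            else:
                opposite_ray += 1
        # Tilting the line slightly keeps one open side and one of the two rays.
        best = min(best, min(left, right) + min(same_ray, opposite_ray))
    return coincident + best
```

For the four corners of a square, the center has depth 2, each corner has depth 1, and any point outside the square has depth 0.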

  9. Approaches for defining depth contours
     • P. Rousseeuw et al.
        • The k-th depth contour is the boundary of the set of points in the plane with depth at least k.
     • R. Liu et al. (based on order statistics)
        • The sample p-th central hull is the convex hull containing the most central fraction p of the sample points.
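The order-statistics definition can be sketched in a few lines: rank the sample points by depth, keep the most central fraction p, and take their convex hull. The sketch below is an assumption-laden illustration, not the tool's code. It uses simplicial depth as the default ranking, breaks ties arbitrarily, assumes points are tuples in general position, and all function names are hypothetical.

```python
import math
from itertools import combinations

def cross(o, a, b):
    """Twice the signed area of triangle (o, a, b)."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

def simplicial_depth(q, points):
    """Number of closed triangles with vertices in `points` that contain q."""
    count = 0
    for a, b, c in combinations(points, 3):
        d1, d2, d3 = cross(a, b, q), cross(b, c, q), cross(c, a, q)
        has_neg = d1 < 0 or d2 < 0 or d3 < 0
        has_pos = d1 > 0 or d2 > 0 or d3 > 0
        if not (has_neg and has_pos):
            count += 1
    return count

def convex_hull(points):
    """Andrew's monotone chain; returns hull vertices counterclockwise."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts
    lower, upper = [], []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]

def central_hull(points, p, depth=simplicial_depth):
    """Convex hull of the most central fraction p of the sample points."""
    ranked = sorted(points, key=lambda q: depth(q, points), reverse=True)
    k = max(1, math.ceil(p * len(points)))
    return convex_hull(ranked[:k])
```

Passing a different `depth` callable with the same `(q, points)` signature swaps the ranking without touching the hull step.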

  10. Half-space (location) depth contours module (two figures: depth contours for a sample set with 8 data points; depth contours for a data set describing diabetic patients)

  11. Statistical modules, cont.: plots
     • DD (Depth vs. Depth) plots
        • O(n^2) time
     • Shrinkage plots
     • Fan plots

  12. DD (Depth vs. Depth) plots module (figure: DD plot with axes "Depth according to set A" and "Depth according to set B"). Two 2D data sets of 50 points each, drawn from normal distributions centered at (0,0) with different covariance matrices (the identity and 4 times the identity).
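A DD plot assigns each point of the pooled sample a pair of coordinates: its depth with respect to set A and its depth with respect to set B. The sketch below shows only that coordinate computation. To stay short it substitutes Mahalanobis depth, 1 / (1 + squared Mahalanobis distance), for the depth functions listed on the earlier slides; this substitution, the function names, and the random seed are assumptions made for illustration.

```python
import random

def mahalanobis_depth(q, sample):
    """Mahalanobis depth of q with respect to a 2D sample:
    1 / (1 + squared Mahalanobis distance from the sample mean)."""
    n = len(sample)
    mx = sum(x for x, _ in sample) / n
    my = sum(y for _, y in sample) / n
    sxx = sum((x - mx) ** 2 for x, _ in sample) / (n - 1)
    syy = sum((y - my) ** 2 for _, y in sample) / (n - 1)
    sxy = sum((x - mx) * (y - my) for x, y in sample) / (n - 1)
    det = sxx * syy - sxy * sxy  # assumed non-zero (non-degenerate sample)
    dx, dy = q[0] - mx, q[1] - my
    d2 = (syy * dx * dx - 2 * sxy * dx * dy + sxx * dy * dy) / det
    return 1.0 / (1.0 + d2)

def dd_plot_coordinates(set_a, set_b):
    """One (depth w.r.t. A, depth w.r.t. B) pair per point of the pooled sample."""
    pooled = list(set_a) + list(set_b)
    return [(mahalanobis_depth(q, set_a), mahalanobis_depth(q, set_b))
            for q in pooled]

# Two samples of 50 points each, centered at (0, 0): covariance I and 4I
# (standard deviations 1 and 2 per coordinate).
random.seed(0)
set_a = [(random.gauss(0, 1), random.gauss(0, 1)) for _ in range(50)]
set_b = [(random.gauss(0, 2), random.gauss(0, 2)) for _ in range(50)]
coords = dd_plot_coordinates(set_a, set_b)
```

When both sets come from the same distribution the pairs cluster around the diagonal; a difference in scale, as here, pulls them away from it.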

  13. Fan plots (figure: fan plot with axes "Relative area (CH of p% / CH)" and "Percentile of points", where CH denotes the convex hull). 50 data points, created from a random distribution with covariance matrix 4 times the identity. The fans are created for data sets containing the 1/6, 2/6, ... central regions. For each region, the area of the convex hull of 2, 4, 6, ... % of the points is computed.
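The quantity on the vertical axis, the area of the convex hull of the most central p% of the points divided by the area of the full convex hull, can be sketched as follows. This is a simplified illustration that produces a single curve rather than the full set of fans. It ranks points by convex hull peeling depth, which is an assumption about the ordering, treats collinear boundary points as belonging to the next layer, breaks ties arbitrarily, and uses hypothetical function names.

```python
import math

def cross(o, a, b):
    """Twice the signed area of triangle (o, a, b)."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

def convex_hull(points):
    """Andrew's monotone chain; returns hull vertices counterclockwise."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts
    lower, upper = [], []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]

def polygon_area(poly):
    """Shoelace formula; returns 0 for fewer than three vertices."""
    s = 0
    for i in range(len(poly)):
        x0, y0 = poly[i]
        x1, y1 = poly[(i + 1) % len(poly)]
        s += x0 * y1 - x1 * y0
    return abs(s) / 2

def peeling_depths(points):
    """Convex hull peeling depth: 1 for the outermost layer, 2 for the next, ..."""
    remaining = list(set(points))
    depth, layer = {}, 1
    while remaining:
        hull = set(convex_hull(remaining))
        for p in hull:
            depth[p] = layer
        remaining = [p for p in remaining if p not in hull]
        layer += 1
    return depth

def relative_areas(points, fractions):
    """Area of the hull of the deepest fraction f of the points, divided by
    the area of the hull of all points, for each f in `fractions`."""
    depth = peeling_depths(points)
    ranked = sorted(set(points), key=lambda p: depth[p], reverse=True)
    total = polygon_area(convex_hull(ranked))
    areas = []
    for f in fractions:
        k = max(1, math.ceil(f * len(ranked)))
        areas.append(polygon_area(convex_hull(ranked[:k])) / total)
    return areas
```

Calling `relative_areas(points, [0.02 * i for i in range(1, 51)])` would give the curve values at 2%, 4%, ..., 100% of the points.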

  14. Graphical contour selection tool
     • Plots depth contours and selects data ranges.
     • Actions
        • Import/export
        • Select points
        • Depth slider
        • Filter

  15. Future work
     • Run the tool on existing data sets
     • Distribute preliminary versions and get user feedback
     • Data filter
        • Group by row/column
        • Filter by row/column
        • Interactions between rows/columns (addition, substitution, logical operations)
     • Statistical modules
        • Implement additional modules
        • Improve running times

  16. Contributors • Prof. Diane Souvaine • Prof. Alva Couch • Eynat Rafalin • Michael Burr • Joe Handelman • James Hayes • Ori Taka • Alok Lal • Janet Luan • Kim Miller • Tim Mitchell • Nikolai Shvertner