Data 2 Knowledge study project. Ashish Mahabal ( aam@astro.caltech.edu ) Ciro Donalek Matthew Graham Ray Plante George Djorgovski. VAO-LSST Meeting, NOAO, 24 March 2011. Goals. Feasibility study What is out there What is needed Milestones What can be done.
Data 2 Knowledge study project
Ashish Mahabal (aam@astro.caltech.edu), Ciro Donalek, Matthew Graham, Ray Plante, George Djorgovski
VAO-LSST Meeting, NOAO, 24 March 2011
Goals
• Feasibility study
• What is out there
• What is needed
• Milestones
• What can be done
Exploration of observable parameter spaces and searches for rare or new types of objects (Djorgovski)
Overview – many connections
• Astroinformatics (next meeting in Sep. 2011)
• VOStat and other R/Statistics tools
• Data challenges
• Various sky surveys
• Related issues
  • Semantics
  • Classification/characterization
  • Distributed data
  • GPUs
• Focus on time domain
Focus on time-domain
• Expertise, and it encompasses all aspects of data mining (save one)
• Plus, real-time forces us to be fast.
• Portfolio building – growing columns of tables
• Bayesian networks utilizing auxiliary information
• Lightcurve techniques for characterizing objects
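As a rough illustration of what "lightcurve techniques for characterizing objects" might involve, the sketch below reduces a light curve to a few summary statistics. The function name `lightcurve_features` and the particular feature set are assumptions made for this example, not the project's actual pipeline.

```python
from statistics import median


def lightcurve_features(points):
    """Summarise a light curve given as (time, magnitude) pairs.

    Hypothetical feature set, chosen only for illustration.
    """
    if len(points) < 2:
        raise ValueError("need at least two observations")
    points = sorted(points)  # order observations by time
    times = [t for t, _ in points]
    mags = [m for _, m in points]
    med = median(mags)
    # Median absolute deviation: a scatter measure robust to outliers.
    mad = median(abs(m - med) for m in mags)
    # Largest magnitude change per unit time between consecutive epochs.
    max_slope = max(
        (abs(m2 - m1) / (t2 - t1)
         for (t1, m1), (t2, m2) in zip(points, points[1:])
         if t2 > t1),
        default=0.0,
    )
    return {
        "n_obs": len(points),
        "time_span": times[-1] - times[0],
        "median_mag": med,
        "amplitude": max(mags) - min(mags),
        "mad": mad,
        "max_slope": max_slope,
    }
```

Each new statistic of this kind would add one more column to an object's growing "portfolio" table, which a classifier could then consume.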
Missing stat and CS tools
• Bootstrap aggregating
• Mixture of experts
• Boosting
• Simulated annealing
• Semi-supervised learning
• ….
From IVOA KDD User guide for Data Mining (Nick Ball)
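To make the first item concrete, here is a minimal sketch of bootstrap aggregating (bagging) in plain Python. It assumes a toy one-feature, two-class problem and uses decision stumps as the base learner; the helper names `fit_stump`, `bagging_fit` and `bagging_predict` are hypothetical and do not come from any of the tools discussed here.

```python
import random
from collections import Counter


def fit_stump(sample):
    """Fit a one-feature threshold classifier (decision stump).

    `sample` is a list of (x, label) pairs with labels 0 or 1.
    Returns (threshold, label_below, label_above).
    """
    best = None
    for threshold, _ in sample:
        for below, above in ((0, 1), (1, 0)):
            # Count training errors for this threshold and orientation.
            errors = sum(
                (below if x <= threshold else above) != y for x, y in sample
            )
            if best is None or errors < best[0]:
                best = (errors, threshold, below, above)
    return best[1:]


def bagging_fit(data, n_models=25, seed=0):
    """Train n_models stumps, each on its own bootstrap resample."""
    rng = random.Random(seed)
    models = []
    for _ in range(n_models):
        # Bootstrap: draw len(data) points with replacement.
        sample = [rng.choice(data) for _ in data]
        models.append(fit_stump(sample))
    return models


def bagging_predict(models, x):
    """Classify x by majority vote over all stumps."""
    votes = Counter(below if x <= t else above for t, below, above in models)
    return votes.most_common(1)[0][0]


# Toy data: class 0 at small x, class 1 at large x (illustrative only).
data = [(0.1, 0), (0.4, 0), (0.8, 0), (1.2, 0),
        (1.9, 1), (2.3, 1), (2.7, 1), (3.1, 1)]
models = bagging_fit(data)
```

Using an odd number of models avoids tied votes in the two-class case. In practice the base learner would be something richer than a stump.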
Science goal: to close the growing gap between the huge volume of data being generated and our understanding of it
• Data Gathering (e.g., new generation instruments …)
• Data Farming:
  • Storage/Archiving
  • Indexing, Searchability
  • Data Fusion, Interoperability, ontologies, etc.
• Data Mining (or Knowledge Discovery in Databases):
  • Pattern or correlation search
  • Clustering analysis, automated classification
  • Outlier / anomaly searches
  • Hyperdimensional visualization
• Data visualization and understanding
  • Computer aided understanding
  • KDD
  • Etc.
• New Knowledge
Data storage, Pbytes
Data access >103 access
Scalability: Petaflops, Exaflops
Computing power (multicore)
Algorithm: parallelism
Visualization: N-dimensional
Currently on the plate
• DAME
• Knime (Konstanz Information Miner)
• Orange (Visual/python)
• Weka (ML/Java)
• Rapidminer (standalone)
Comparison matrix for DM/Viz tools
• Accuracy
• Scalability
• Interpretability
• Usability
• Robustness
• Versatility
• Speed
• Popularity
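One possible way to hold such a matrix and turn it into a ranking is sketched below. The tool names `tool_A` and `tool_B`, the 1-5 scale, the scores themselves and the equal default weights are all placeholders for illustration; they are not measurements of any real package.

```python
CRITERIA = ["accuracy", "scalability", "interpretability", "usability",
            "robustness", "versatility", "speed", "popularity"]


def rank_tools(scores, weights=None):
    """Rank tools by weighted mean score, best first.

    scores: {tool: {criterion: value}}; every criterion must be present.
    weights: optional {criterion: weight}; defaults to equal weights.
    """
    weights = weights or {c: 1.0 for c in CRITERIA}
    total_weight = sum(weights[c] for c in CRITERIA)
    totals = {
        tool: sum(weights[c] * row[c] for c in CRITERIA) / total_weight
        for tool, row in scores.items()
    }
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)


# Placeholder scores on a 1-5 scale; NOT measurements of any real tool.
scores = {
    "tool_A": dict.fromkeys(CRITERIA, 3),
    "tool_B": {**dict.fromkeys(CRITERIA, 3), "speed": 5},
}
ranking = rank_tools(scores)
```

Passing a different `weights` dictionary lets a particular use case, such as real-time classification, emphasise speed over popularity.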
Related activities
• Skyalert integration (Graham) – adding data and methods
• Solicitation of examples from community
  • WD, Blazars’ example
• Making R more astronomy friendly
• Various datasets
  • Differing number of rows, columns
  • For supervised/unsupervised classification
• TA on GPUs – incorporate in pipeline
[Slide from Budavari] CUDA zone, PyCUDA, …
VAO People working on this
• Ashish Mahabal, Ciro Donalek, Matthew Graham, George Djorgovski (Caltech)
• Ray Plante (NCSA)
• We are also in touch with many others in astro/CS/stats and rely on many groups, including the LSST transients and informatics working groups