A web service for distributed covariance computation on astronomy catalogs

Presented by Haimonti Dutta, CMSC 691D.

ROADMAP

  • Background Information

  • Interesting Astronomy Data Mining Problems

  • What has / not been done (Literature review)

  • My project objectives

  • The problem of Alignment in astronomy catalogs

  • The Fundamental Plane

  • A case study for recreating the Fundamental Plane from astronomy catalogs

  • Experimental Results

  • Efforts towards building Web services

Background Information

  • Next generation Astronomy catalogs will contain data for most of the sky

  • Existing astronomy sky surveys: SDSS, 2MASS, FIRST, etc.

  • Terabytes and petabytes of data

  • Data Avalanche in Astronomy

  • Getting useful information is like looking for a needle in a haystack

  • National Virtual Observatory (NVO) has been set up to facilitate scientific discovery

  • Obvious need for Distributed Data Mining

What kind of data mining activities are astronomers interested in?

  • Detection of transient objects such as supernovae (Online transient object detection in real time)

  • Obtain statistics of variable and moving objects (model variability, refine existing models, fit models to irregularly sampled data)

  • Parameterize shapes of objects using rotationally invariant quantities

  • Efficient cluster and outlier detection

  • Supervised data mining problems (match objects detected in multiple bands, derive photometric redshifts)

What has / has not been done

  • A lot of effort in centralized data mining (NVO, FMass, Class X, FIRST, etc.)

  • Some grid mining (notably the GRIST project)

  • Very few distributed data mining efforts, all still in preliminary stages


Objectives of this project

  • Aligning of Catalogs (The Fundamental Plane Problem)

  • Implementation of algorithms for Distributed Data Mining on Astronomy Catalogs

  • Development of Web services for the catalogs, and investigation into what needs to be done to integrate them into the NVO

Alignment of Astronomy Catalogs

Cross matching is a nontrivial problem in itself. We assume cross matching happens offline and that there exists an indexing scheme by which the catalogs know the exact cross matched tuples.

Some interesting numbers

  • The current SDSS catalog is 3.0 TB in size and contains about 180 million objects (as per Data Release 4)

  • 2MASS has already observed 99% of the sky and reports 470,992,970 point sources and 1,647,599 extended sources

Portion of the sky observed by SDSS

Problems

Cross Matching is an inherently difficult problem for the astronomy catalogs

We assume data sets are cross matched and this computation is done offline

This is a strong assumption and often may not be acceptable to astronomers

A real-life cross matching exercise

Problems encountered

  • Which catalogs to use?

  • We tried several: SDSS, 2MASS, HyperLeda, CfA Redshift Catalog

  • Catalogs have different indexing schemes: more recent ones use HTM (Hierarchical Triangular Mesh), others use (ra, dec) or even names of objects

  • Some attributes are really not available! (SDSS has -9999 for most of its redshift values)

  • Different catalogs observe different portions of the sky (SDSS covers only about 16% of the sky in the latest release, while 2MASS covers the entire sky). Select subsets to cross match wisely!
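The -9999 placeholder has to be masked before any statistics are computed, otherwise it is treated as a real measurement. A minimal sketch, assuming the attribute has already been loaded into an array; the function name and the sample values are illustrative and not taken from the catalogs:

```python
import numpy as np

SENTINEL = -9999.0  # missing-value marker mentioned above for SDSS redshifts


def mask_missing(values, sentinel=SENTINEL):
    """Return a float array with every sentinel entry replaced by NaN."""
    arr = np.asarray(values, dtype=float)
    return np.where(arr == sentinel, np.nan, arr)


# Illustrative values only, not real catalog data.
redshift = mask_missing([0.021, -9999, 0.034, -9999])
usable = ~np.isnan(redshift)   # boolean mask of rows that can be used
n_usable = int(usable.sum())
```

Rows where `usable` is false would be dropped from the cross matched set before computing a covariance.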

The successful cross matching

  • Chose a region of the sky between 0 and 15 degrees (dec) and between 150 and 200 degrees (ra), observed by both SDSS and 2MASS

  • Used a web interface provided by SDSS to do the cross matching

  • Selected the K-band for obtaining redshift and surface brightness (astronomical significance)
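The region selection can be expressed as a simple box filter on (ra, dec). This is only a sketch of the selection step, assuming coordinates in decimal degrees; the actual cross matching was done through the SDSS web interface, and the function name and sample coordinates here are made up for illustration:

```python
import numpy as np


def in_region(ra, dec, ra_range=(150.0, 200.0), dec_range=(0.0, 15.0)):
    """Boolean mask of objects inside the ra/dec box (degrees, inclusive)."""
    ra = np.asarray(ra, dtype=float)
    dec = np.asarray(dec, dtype=float)
    return ((ra >= ra_range[0]) & (ra <= ra_range[1]) &
            (dec >= dec_range[0]) & (dec <= dec_range[1]))


# Illustrative coordinates only.
ra = np.array([149.9, 150.0, 175.3, 200.1])
dec = np.array([5.0, 0.0, 14.9, 7.0])
selected = in_region(ra, dec)
```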

    Case Study

  • Centralized database of 1249 cross matched objects

  • Attributes are size, surface brightness, velocity dispersion

  • Does not really make a case for a distributed data mining scenario! Solution: try a larger subset of the data from both catalogs

The Fundamental Plane

  • Interesting problem in astronomy: identify correlations in high dimensional spaces

  • For the class of elliptical and spiral galaxies

    Observed features: radius, mean surface brightness and central velocity dispersion

    A two dimensional plane exists in the observed 3D parameter space, called the Fundamental Plane


Experimental Results

  • First PC captured 69.4193% of variance

  • Second PC captured 12.1333% of the variance

  • The astronomy literature suggests 1st and 2nd PC together should capture about 88% of variance

Reasonably close recreation of the Fundamental Plane from two cross matched data sets in the centralized setting
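The percentages above are the shares of total variance carried by each principal component. A minimal sketch of how such shares can be computed for a three attribute table follows. It assumes the attributes are standardized before the covariance is taken, which the slides do not state, and it uses synthetic data lying near a plane rather than the 1249 cross matched objects:

```python
import numpy as np


def explained_variance_ratio(X):
    """Fraction of total variance captured by each principal component."""
    X = np.asarray(X, dtype=float)
    # Standardize each attribute (an assumption, not stated in the slides).
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    cov = np.cov(Z, rowvar=False)
    eigvals = np.linalg.eigvalsh(cov)[::-1]  # largest eigenvalue first
    return eigvals / eigvals.sum()


# Synthetic stand-in: 3 attributes driven by 2 latent factors plus noise.
rng = np.random.default_rng(0)
n = 1249
latent = rng.normal(size=(n, 2))
mixing = np.array([[1.0, 0.5, -0.3],
                   [0.2, -1.0, 0.8]])
X = latent @ mixing + 0.1 * rng.normal(size=(n, 3))
ratios = explained_variance_ratio(X)
```

When the data lie close to a two dimensional plane, the first two entries of `ratios` account for nearly all of the variance, which is the signature looked for in the Fundamental Plane.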

Algorithm for Distributed Covariance Computation

  • A central coordination site S sends A and B a random number generation seed

  • A and B each generate the same n x l random matrix R, where l << n

  • A and B send S the projections R^T A and R^T B

  • S computes (R^T A)^T (R^T B) / n
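The steps above can be sketched as follows. Several details are assumptions rather than something the slides specify: the entries of R are drawn i.i.d. from a standard normal distribution, each site centers its own columns before projecting, and the function names are hypothetical. Because the expectation of R R^T is l times the identity for unit variance entries, this sketch divides by l to estimate A^T B and then by n to obtain a covariance; the slide itself writes only "/ n".

```python
import numpy as np


def site_projection(local_data, seed, l):
    """Run at site A or B: rebuild R from the shared seed, return R^T X (l x p)."""
    X = np.asarray(local_data, dtype=float)
    X = X - X.mean(axis=0)                    # center local columns (assumption)
    rng = np.random.default_rng(seed)         # same seed gives the same R at both sites
    R = rng.standard_normal((X.shape[0], l))  # i.i.d. entries, mean 0, variance 1
    return R.T @ X


def coordinator_estimate(proj_a, proj_b, n, l):
    """Run at site S: estimate the cross covariance of A's and B's columns."""
    # E[R R^T] = l * I, so dividing by l estimates A^T B;
    # dividing by n turns that inner product matrix into a covariance.
    return (proj_a.T @ proj_b) / (l * n)


# Synthetic stand-in data: 2 attributes at site A, 1 attribute at site B.
rng = np.random.default_rng(1)
n, l = 1249, 300
A = rng.standard_normal((n, 2))
B = 0.8 * A[:, [0]] + 0.6 * rng.standard_normal((n, 1))

seed = 2024  # the value S would broadcast to both sites
estimate = coordinator_estimate(site_projection(A, seed, l),
                                site_projection(B, seed, l), n, l)
exact = (A - A.mean(axis=0)).T @ (B - B.mean(axis=0)) / n
```

Each site ships an l x p matrix instead of its n x p columns, so communication shrinks by roughly n / l. The price is an approximation error that falls off as l grows.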

Experimental Results – Distributed Setting

Case Study

  • 1249 objects at each of sites A and B

  • 2 attributes at site A and 1 attribute at site B

More results

Development of a Web Service

Architecture of the Proposed System


[Diagram not reproduced. Recoverable labels: "SOAP Message" (twice) and "For Distributed" (truncated)]


Current Implementation

  • Using Apache Axis (a SOAP engine: a framework for building SOAP processors such as clients and servers)

  • Tomcat version 4.1

  • SOAP version 1.2

  • Short Demo

  • Further System Developmental Issues (use of SOAP with attachments)
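For orientation, a SOAP 1.2 request that the coordinating site might send to a catalog site could look like the output of the sketch below. Only the SOAP 1.2 envelope namespace is standard. The service namespace, the operation name `computeProjection` and its parameters are hypothetical and do not describe the actual Axis service interface:

```python
import xml.etree.ElementTree as ET

SOAP12_NS = "http://www.w3.org/2003/05/soap-envelope"  # SOAP 1.2 envelope namespace
SERVICE_NS = "urn:example:covariance"                  # hypothetical service namespace


def build_request(seed, projection_dim):
    """Build a SOAP 1.2 envelope asking a site for its random projection."""
    ET.register_namespace("env", SOAP12_NS)
    envelope = ET.Element(f"{{{SOAP12_NS}}}Envelope")
    body = ET.SubElement(envelope, f"{{{SOAP12_NS}}}Body")
    # Hypothetical operation and parameter names.
    op = ET.SubElement(body, f"{{{SERVICE_NS}}}computeProjection")
    ET.SubElement(op, f"{{{SERVICE_NS}}}seed").text = str(seed)
    ET.SubElement(op, f"{{{SERVICE_NS}}}projectionDim").text = str(projection_dim)
    return ET.tostring(envelope, encoding="unicode")


message = build_request(seed=42, projection_dim=300)
```

The projected matrix returned by a site would be a natural candidate for the attachment mechanism mentioned in the last bullet, since it is bulk numeric data.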

QUESTIONS?