
Scientific Databases Lecture: Hubble Space Telescope Science Databases

Dr. Kirk Borne, GMU SCS

November 11, 2003

GMU CSI 710

Outline
  • Introduction to the Information Age
  • Data Mining - a target application area for scientific databases
  • Hubble Space Telescope (HST)
  • HST Databases
  • HST Science Data Archive
  • Multi-mission Archive at Space Telescope (MAST)

Hubble Space Telescope Databases

The Information Age


The Information Age is Here!

  • "Data doubles about every year, but useful information seems to be decreasing."
    • Margaret Dunham, "Data Mining Techniques & Algorithms", 2002
  • "There is a growing gap between the generation of data and our understanding of it."
    • Witten & Frank, "Data Mining: Practical Machine Learning Tools", 1999
  • "The trouble with facts is that there are so many of them"
    • Samuel McChord Crothers, "The Gentle Reader", 1973
  • "Get your facts first, and then you can distort them as much as you please."
    • Mark Twain


Characteristics of The Information Age:

  • Data “Avalanche”
    • the flood of Terabytes of data is already happening, whether we like it or not
    • our present techniques of handling these data do not scale well with data volume
  • Distributed Digital Archives
    • will be the main access to data
    • will need to handle hundreds to thousands of queries per day
  • Systematic Data Exploration and Data Mining
    • will have a central role
      • statistical analysis of “typical” events
      • automated search for “rare” events

Data Mining Application: Outlier Detection

Figure: The clustering of data clouds (dc#) within a multidimensional parameter space (p#).

Such a mapping can be used to search for and identify clusters, voids, outliers, one-of-kinds, relationships, and associations among arbitrary parameters in a database (or among various parameters in geographically distributed databases).
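As a rough illustration of an automated search for "rare" events, here is a minimal sketch in Python. It works on one parameter only (the figure describes a multidimensional space), and it uses a modified z-score based on the median absolute deviation, which is one common robust rule and not necessarily the method behind the figure. The brightness values are made up for the example.

```python
from statistics import median

def find_outliers(values, threshold=3.5):
    """Return (index, value) pairs whose modified z-score exceeds `threshold`.

    The modified z-score uses the median and the median absolute deviation
    (MAD), so one extreme value cannot hide itself by inflating the spread.
    """
    center = median(values)
    mad = median([abs(v - center) for v in values])
    if mad == 0:
        return []  # no spread to measure against; treat everything as typical
    return [
        (i, v)
        for i, v in enumerate(values)
        if 0.6745 * abs(v - center) / mad > threshold
    ]

# Hypothetical brightness measurements: many "typical" values and one "rare" one.
brightness = [14.1, 14.3, 14.2, 14.0, 14.4, 14.2, 14.1, 14.3, 21.7, 14.2, 14.0, 14.3]
rare_events = find_outliers(brightness)
print(rare_events)
```

The same idea extends to several parameters by measuring distance from the bulk of the data cloud instead of from a single median.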

Data Mining = A Target Application Area for Scientific Databases

http://nvo.gsfc.nasa.gov/nvo_datamining.html


What is Data Mining? Here is one idea …

What is Data Mining?
  • Data mining is defined as “an information extraction activity whose goal is to discover hidden facts contained in (large) databases."
  • Data mining is used to find patterns and relationships in data. (EDA = Exploratory Data Analysis)
  • Patterns can be analyzed via 2 types of models:
    • Descriptive: describe patterns and create meaningful subgroups or clusters.
    • Predictive: forecast explicit values, based upon patterns in known results.
  • How does this apply to Scientific Research? …
    • through KNOWLEDGE DISCOVERY

Data → Information → Knowledge → Understanding / Wisdom!


Some words of wisdom

  • "We have confused information (of which there is too much) with ideas (of which there are too few)."
    • Paul Theroux
  • "The great Information Age is really an explosion of non-information; it is an explosion of data ... it is imperative to distinguish between the two; information is that which leads to understanding."
    • R.S. Wurman in his book: Information Anxiety 2

Scientific data have a purpose …

Data → Information → Knowledge → Understanding / Wisdom!

  • EXAMPLE :
  • Data = 00100100111010100111100 (stored in database)
  • Information = ages and heights of children (metadata)
  • Knowledge = the older children tend to be taller
  • Understanding = children’s bones grow as they get older
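The first three steps of this example can be sketched in a few lines of Python. The byte layout (one byte for age in years, one byte for height in cm) and the five records are hypothetical, chosen only to show how metadata turns raw bits into information and how a simple statistic turns information into knowledge.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

# Data: raw bytes as they might sit in a database (hypothetical values).
raw = bytes([4, 102, 6, 115, 8, 128, 10, 138, 12, 149])

# Information: the metadata says each record is (age in years, height in cm).
ages = list(raw[0::2])
heights_cm = list(raw[1::2])

# Knowledge: a strong positive correlation means older children tend to be taller.
r = pearson_r(ages, heights_cm)
print(f"age-height correlation r = {r:.3f}")
```

The last step, understanding, is not something the code can supply: it comes from explaining why the correlation exists.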


Astronomy Example

Data:

(a) Imaging data (ones & zeroes)

(b) Spectral data (ones & zeroes)

Information (catalogs / databases):

  • Measure brightness of galaxies from image (e.g., 14.2 or 21.7)
  • Measure redshift of galaxies from spectrum (e.g., 0.0167 or 0.346)

Knowledge:

Hubble Diagram → Redshift-Brightness Correlation → Redshift = Distance

Understanding: the Universe is expanding!!

What is the Goal of Building and Maintaining Scientific Databases?
  • The end goal is not the data themselves, but the new knowledge and understanding that are revealed through the analysis of the data.
  • This is why the Data Mining research field is usually referred to as KDD = Knowledge Discovery in Databases.

HST satellite architecture

HST focal plane layout

HST Scientific Instruments
  • 1990: WFPC, FOC, FOS, GHRS, HSP, FGS
  • 1993: WFPC2, COSTAR (removed WFPC, HSP)
  • 1997: NICMOS, STIS (removed FOS, GHRS)
  • 1999: 1 of 3 FGS sensors and all 6 gyros were replaced
  • 2002: ACS, NICMOS cryocooler upgrade (removed FOC)
  • 2004(?): COS, WFC3 (will remove WFPC2, COSTAR)
      • Cameras
      • Spectrometers
      • Photometer
      • Fine Guidance Sensor
      • Optical Path Correction Device

More details at: http://www.ess.sunysb.edu/fwalter/AST443/hst.html


The Nature of Astronomical Data

  • Imaging
    • 2D map of the sky at multiple wavelengths
  • Derived catalogs
    • subsequent processing of images
    • extracting object parameters (400+ per object)
  • Spectroscopic follow-up
    • spectra: more detailed object properties
    • clues to physical state and formation history
    • lead to distances: 3D maps
  • Numerical simulations
  • All inter-related!

Derived data from images: tables of numbers, that can be plotted to study correlations


The Electromagnetic Spectrum

  • Radiation is the Astronomer’s only source of information about the Universe!
  • And it is a remarkably rich & diverse source!

Need Multi-Wavelength Science Instruments to Observe a Multi-Wavelength Universe

NASA Astronomy Mission Data: the tip of the data mountain

NSSDC’s astrophysics data holdings: one of many science data collections for astronomy across the US and the world!

NSSDC = National Space Science Data Center @ NASA/GSFC

http://nssdc.gsfc.nasa.gov/astro/astrolist.html

Why so many Telescopes?

Why so many Telescopes? …

Because …

Many great astronomical discoveries have come from inter-comparisons of various wavelengths:
  • Quasars
  • Gamma-ray bursts
  • Ultraluminous IR galaxies
  • X-ray black-hole binaries
  • Radio galaxies
  • . . .

Overlay


Therefore, our science data archive systems should enable multi-wavelength interdisciplinary distributed database access, discovery, mining, and analysis.

So what wavelengths does HST observe?

Figure: HST observes a range of about 10^1 in wavelength (λ), while the full electromagnetic spectrum spans a range of >10^16 in λ.

Where has HST looked?

HST’s cameras have very small field-of-view

3°

HST

Edwin Hubble measured distances to galaxies, and thereby discovered expansion of the Universe.

The #1 goal of HST: to measure the expansion rate of the Universe to within 10% uncertainty. Previously, it was not known to within a factor of 2 = typical astronomical accuracy, but definitely not good enough.

Henrietta Leavitt measured brightness variations of 1000’s of stars – the basis for the distance scale of the Universe

The Cepheus Constellation: “The King”


Variable Star Data Examples

  • Periodic -- sinusoidal:
  • Periodic -- smooth non-sine:
  • Periodic -- spiked events:
  • Aperiodic events:
  • Single spiked events:
  • Single long-duration events:

(Chirp)

Real Cepheid variable star data. Note the characteristic light curve shape – a rapid rise, and then slow decline …

Cepheid Variables = Cosmic Yardsticks

Period-Luminosity Relation – shows 2 types of Cepheid Variables – notice the 2 bands in this correlation plot.

We need to know which Cepheid type to assign to a given star in order to get the star’s distance right!

The most famous example is Polaris = The North Star.
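To see why the Cepheid type matters, here is a minimal sketch that turns a period and an apparent brightness into a distance. The distance modulus formula, m − M = 5 log10(d / 10 pc), is standard. The period-luminosity slope and zero point, the 1.5 magnitude offset between the two bands, and the example star are illustrative assumptions, not values taken from the plot.

```python
from math import log10

def absolute_magnitude(period_days, slope=-2.43, zero_point=-4.05):
    """Period-luminosity relation: M = slope * (log10(P) - 1) + zero_point.

    The default coefficients are illustrative assumptions for one Cepheid type.
    """
    return slope * (log10(period_days) - 1.0) + zero_point

def distance_parsecs(apparent_mag, absolute_mag):
    """Invert the distance modulus m - M = 5 * log10(d / 10 pc)."""
    return 10 ** ((apparent_mag - absolute_mag + 5.0) / 5.0)

# Hypothetical star: 10-day period, apparent magnitude 20.
period, m = 10.0, 20.0
M_bright_type = absolute_magnitude(period)
M_faint_type = M_bright_type + 1.5  # assumed offset between the two bands

d_bright = distance_parsecs(m, M_bright_type)
d_faint = distance_parsecs(m, M_faint_type)
print(f"distance ratio between the two assignments: {d_bright / d_faint:.2f}")
```

Under these assumptions, picking the wrong band changes the derived distance by a factor of 10^(1.5/5), roughly 2, which is the size of error the slide warns about.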

HST reaches its goal! Determines expansion rate to within 10%, and age of Universe = 14 billion yrs
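The link between expansion rate and age can be checked with a short calculation. This sketch assumes an expansion rate of H0 = 72 km/s/Mpc (a value not given on the slide) and uses the Hubble time 1/H0 as a rough age estimate. The true age also depends on the cosmological model, so treat the numbers as an order-of-magnitude check.

```python
KM_PER_MPC = 3.0857e19          # kilometres in one megaparsec
SECONDS_PER_GYR = 3.15576e16    # seconds in 1e9 years of 365.25 days

def hubble_time_gyr(h0_km_s_mpc):
    """Return the Hubble time 1/H0 in billions of years."""
    h0_per_second = h0_km_s_mpc / KM_PER_MPC
    return 1.0 / h0_per_second / SECONDS_PER_GYR

H0 = 72.0  # assumed expansion rate in km/s/Mpc
central = hubble_time_gyr(H0)
youngest = hubble_time_gyr(H0 * 1.10)  # expansion rate 10% higher
oldest = hubble_time_gyr(H0 * 0.90)    # expansion rate 10% lower
print(f"1/H0 = {central:.1f} Gyr (range {youngest:.1f} to {oldest:.1f} Gyr)")
```

A 10% uncertainty in the expansion rate translates into a similar fractional uncertainty in this age estimate, which is why the factor-of-2 uncertainty before HST was not good enough.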

But, HST almost didn’t get it right at all! Why? … well… something about a mirror problem. Bad news early in 1990.

This is HST’s first-light image -- not too impressive. This should have told us that things were less than expected.

Note that the left and right images are not particularly different in image resolution quality.


HST should have much better image resolution. Resolution is measured in arcseconds. 1 degree = 60 arcminutes = 3600 arcseconds. Note that the moon is ½ degree (30 arcmin) on the sky.

HST image is better, but not dramatically … and not even particularly scientifically new.

Figure panels: Ground Telescope image; HST image

COSTAR installed in December 1993. So let us compare before and after images.

PLUTO and its moon: BEFORE REPAIR (1990)


AFTER Optical Repair (1994)

Can you notice any difference from previous slide?

Pluto’s moon Charon

Pluto

Here is the real comparison test: Before and After images of a single star!

Software fixes
  • Before COSTAR was installed in 1993:
    • Image restoration (deconvolution) was needed.
    • One of the image restoration algorithms was later used on a regular basis for the analysis of medical images in potential cancer patients (mammograms).
    • To design and build COSTAR, an exact mapping of the image distortion characteristics had to be derived from long and numerous HST images of star fields … for each science instrument (S.I.) and each mode of that S.I. … the design of the telescope architecture then became important for the design of the science database and data analysis systems.
    • All new science instruments now include this optical correction within their design.
    • Users of the Science Data Archive need database info to track the condition of each image; and need image processing tools to correct pre-COSTAR images.
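To make "image restoration (deconvolution)" concrete, here is a minimal one-dimensional sketch of Richardson-Lucy iteration, one widely used restoration method. The slides do not say which algorithms were applied to HST data, so this is an illustration of the idea only. The 3-element blur kernel, the 16-pixel "image", and the wrap-around boundary are assumptions made to keep the example small.

```python
def convolve_circular(signal, kernel):
    """Circular convolution with a short odd-length kernel centred on its middle."""
    n = len(signal)
    half = len(kernel) // 2
    return [
        sum(kernel[k] * signal[(i - (k - half)) % n] for k in range(len(kernel)))
        for i in range(n)
    ]

def richardson_lucy(observed, psf, iterations=50):
    """Iteratively sharpen `observed`, given the blur kernel `psf` (sums to 1)."""
    n = len(observed)
    flipped = psf[::-1]
    estimate = [sum(observed) / n] * n  # start from a flat image with the same flux
    for _ in range(iterations):
        blurred = convolve_circular(estimate, psf)
        # Compare the data with the re-blurred estimate, pixel by pixel.
        ratio = [o / b if b > 0 else 0.0 for o, b in zip(observed, blurred)]
        correction = convolve_circular(ratio, flipped)
        estimate = [e * c for e, c in zip(estimate, correction)]
    return estimate

# Hypothetical scene: a single star, smeared by an assumed point-spread function.
true_scene = [0.0] * 16
true_scene[8] = 100.0
psf = [0.25, 0.5, 0.25]
observed = convolve_circular(true_scene, psf)
restored = richardson_lucy(observed, psf, iterations=50)
print(f"peak before: {max(observed):.1f}, peak after: {max(restored):.1f}")
```

The restoration only works if the point-spread function is known, which is why an exact mapping of the distortion for each instrument and mode had to be recorded alongside the data.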


“Before Repair” images of a Globular Cluster. (Note how the smeared images of single stars overlap and therefore ruin any chance of studying individual stars in this massive pile of 100,000 stars.)


Okay, I will say more … Individual White Dwarf Stars were identified and discovered for the first time ever in Globular Clusters, as predicted by stellar evolution theories since the 1930’s.

Therefore, the mirror flaw is what could have prevented HST from fulfilling its #1 goal.

But there is so much more – Here are a few of the “big impact” HST science results!
  • Hubble expansion rate & age of Universe
  • Super star clusters in merging galaxies
  • Massive black holes in every(?) galaxy
  • Quasar host galaxies revealed
  • Protoplanetary disks found and studied
  • Starbirth unveiled and mapped in exquisite detail
  • Supernovae and novae shells resolved
  • Hierarchical evolution of galaxies proven
  • Most distant galaxies ever seen
  • Storms on planets
  • Kuiper belt comets found
  • Outflows from young stars
  • Gamma-Ray Burst (GRB) sources solved, at last!

HST: 1990-2010, and beyond? Already a rich legacy of spectacular images & discoveries.

What comes next? ... The JWST
  • The Next-Generation Space Telescope is now named the James Webb Space Telescope (JWST): launch in 2011?
  • If HST has shown the first galaxies, then JWST will see the first stars (“first light in the Universe”)
  • JWST will include some on-board processing (controversial)

HST Databases


The HST Data Pipeline = how the data flows

What types of databases are needed for HST?
  • Science Instrument (S.I.) observations
  • Engineering - instrument status
  • Telemetry - satellite status
  • Scientific users - P.I. information
  • Proposals - abstracts, titles, proposers, etc.
  • Approved programs - who, what, where
  • Science observation scheduling – mission calendars
  • Science data processing system
  • Calibration data - for every S.I. mode & filter
  • … and more …


Levels of Science Data Processing

http://mars96.dlr.de/science/processing.shtml
  • Level 0 (preprocessing output): compressed, unmerged raw data
  • Level 1 (edited output): decompressed, merged data (i.e. data from different ground stations and different telemetry frames are combined into a consistent file)
  • Level 2 (standard processing output): photometrically corrected data (based on in-flight and laboratory calibration sets)
  • Level 3 (systematic processing output): geometrically corrected data (based on “ground truth” calibration information)
  • Level 4 (scientific processing output; value-added): specific value-added scientific data products:
    • models and image overlays
    • combined images from multiple instruments and filters
    • analyzed data
    • coordinate-rectified products (maps)

HST Archive Databases
  • 3 archive databases: http://archive.stsci.edu/hst/manual/datadesc.htm
    • Catalog database:
      • contains information on scientific and engineering datasets
      • 45 tables = archive_data_set_all, proposal, target_keyword, science, shp_data, scan_parameters, wfpc2_primary_data, wfpc2_ref_data, moving_target_position, archive_extensions, … etc.
    • Proposal database:
      • contains information on observations that have been approved
      • 11 tables = abstract, address_view, conflicts, conflicts_abstract, coverpage, exposure, proposals, prop_track, su_track, tar_fixedpos_j2000, targets
    • Calibration database (CDBS):
      • provides information on the raw data used to create the recommended reference files (files used to calibrate the science data)
      • 2 sets of files for each S.I. = Reference Files, Reference Tables. Plus additional synthetic calibration data files. Therefore, CDBS consists of at least 20 tables.

HST Science Data Archive

General Features of Science Data Archives
  • Data need to be calibrated and processed
  • Data are often reprocessed (enhanced data product)
  • Data values have associated errors (S/N)
  • Derived data products are routinely generated
  • Science database front-end (data-ordering GUI)
  • Hierarchical storage management (HSM):
      • On-line: Metadata, ordering system database
      • Near-line: Cached “popular” (processed) data products
      • Off-line: Deep archive of full (raw) data sets

Simple User Model of a Science Data Archive

HST Science Data Archive History
  • NASA originally did not expect to have a science data archive for HST
  • Assumed that primary data users would be the P.I. teams and STScI (Science Institute) personnel only
  • Archival Science research was considered low priority, at best
  • Original storage system was DMF = Data Management Facility
    • As name suggests, it was primarily to manage the HST data, not to distribute it to science users
  • Interim solution for Archival Researchers (1990-1993): AEC = Archived Exposures Catalog -- listed main observation and proposal parameters to help search for science data (AEC is still used today – “poor man’s HST science database” – invented by yours truly)
  • Permanent HST Science Data Archive came on-line in early 1994: ST-DADS = Space Telescope Data Archive and Distribution System
  • Transition from DMF to ST-DADS took 2 years:
    • Data Verification (4 years of data, from all science instruments and modes, plus non-science db)
    • Software and Hardware: installation and verification and upgrades
    • Database design and implementation (data warehouse for tracking and accessing the data)
    • StarView User Interface developed for science users to query the full HST science data archive

HST Science Archive Features
  • On-the-fly (OTF) calibration:
    • BEST (recommended) versus USED calibration reference files
  • Quick-look science database table:
    • composite of frequently used information found in a number of other tables. By combining this information into a single table, the speed of searches through the archive is improved (few or zero joins).
  • Data previews (thumbnails):

http://archive.stsci.edu/hst/search.php

  • Ad hoc queries (user-generated SQL queries)
  • Duplication / Conflict tracking (proprietary observations)
  • Scrapbook of multiple obs for single objects:

http://archive.stsci.edu/scrapbook.php
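The quick-look table idea can be sketched with Python's built-in sqlite3 module. The table and column names below are simplified, hypothetical stand-ins for the real archive schema, and the ID numbers are made up. Only the dataset names, instruments, targets, and P.I. come from the sample values in this lecture.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Normalized tables (hypothetical, simplified schema)
CREATE TABLE science  (dataset_id TEXT PRIMARY KEY, instrument TEXT,
                       proposal_id INTEGER, target_id INTEGER);
CREATE TABLE proposal (proposal_id INTEGER PRIMARY KEY, pi_name TEXT);
CREATE TABLE target   (target_id INTEGER PRIMARY KEY, target_name TEXT);

INSERT INTO proposal VALUES (1, 'K. Borne'), (2, 'K. Borne'), (3, 'K. Borne');
INSERT INTO target   VALUES (10, 'Cartwheel Ring Galaxy'),
                            (20, 'UltraLuminous IR Galaxy');
INSERT INTO science  VALUES ('U2JB0101T', 'WFPC2',  1, 10),
                            ('U33L2001M', 'WFPC2',  2, 20),
                            ('N4GV1201O', 'NICMOS', 3, 20);

-- Quick-look table: the joins are done once, ahead of time.
CREATE TABLE quicklook AS
    SELECT s.dataset_id, s.instrument, p.pi_name, t.target_name
    FROM science s
    JOIN proposal p ON p.proposal_id = s.proposal_id
    JOIN target   t ON t.target_id   = s.target_id;
""")

def search_with_joins(conn, instrument):
    """Answer the query from the normalized tables (two joins per search)."""
    return conn.execute(
        "SELECT s.dataset_id, p.pi_name, t.target_name "
        "FROM science s "
        "JOIN proposal p ON p.proposal_id = s.proposal_id "
        "JOIN target t ON t.target_id = s.target_id "
        "WHERE s.instrument = ? ORDER BY s.dataset_id",
        (instrument,),
    ).fetchall()

def search_quicklook(conn, instrument):
    """Answer the same query from the quick-look table (zero joins)."""
    return conn.execute(
        "SELECT dataset_id, pi_name, target_name FROM quicklook "
        "WHERE instrument = ? ORDER BY dataset_id",
        (instrument,),
    ).fetchall()

print(search_quicklook(conn, "WFPC2"))
```

Both functions return the same rows. The trade-off is that the quick-look table duplicates information and must be refreshed whenever the underlying tables change.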


Unique Identifiers for HST Science Observation Datasets:
  • dataset_ID = IPPPSSOOT (9-character SI-specific naming convention)
  • Used to track dataset info through all HST database tables

I = Denotes the instrument type:
  • J - Advanced Camera for Surveys
  • U - Wide Field / Planetary Camera 2
  • V - High Speed Photometer
  • W - Wide Field / Planetary Camera
  • X - Faint Object Camera
  • Y - Faint Object Spectrograph
  • Z - Goddard High Resolution Spectrograph
  • E - Reserved for engineering data
  • F - Fine Guidance Sensor (Astrometry)
  • H-I,M - Reserved for additional instruments
  • N - Near Infrared Camera Multi Object Spectrograph
  • O - Space Telescope Imaging Spectrograph
  • S - Reserved for engineering subset data
  • T - Reserved for guide star position data

PPP = Denotes the program ID, any combination of letters or numbers

SS = Denotes the observation set ID, any combination of letters or numbers

OO = Denotes the observation ID, any combination of letters or numbers

T = Denotes the source of transmission:
  • R - Real time (not tape recorded)
  • T - Tape recorded
  • M - Merged real time and tape recorded
  • N - Retransmitted merged real time and tape recorded
  • O - Retransmitted real time
  • P - Retransmitted tape recorded
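The IPPPSSOOT convention is easy to decode in code. The sketch below is a hypothetical helper, not part of any archive software. The letter codes are copied from the list in this section, while the function and field names are made up for illustration.

```python
from typing import NamedTuple

INSTRUMENTS = {
    "J": "Advanced Camera for Surveys",
    "U": "Wide Field / Planetary Camera 2",
    "V": "High Speed Photometer",
    "W": "Wide Field / Planetary Camera",
    "X": "Faint Object Camera",
    "Y": "Faint Object Spectrograph",
    "Z": "Goddard High Resolution Spectrograph",
    "E": "Reserved for engineering data",
    "F": "Fine Guidance Sensor (Astrometry)",
    "H": "Reserved for additional instruments",
    "I": "Reserved for additional instruments",
    "M": "Reserved for additional instruments",
    "N": "Near Infrared Camera Multi Object Spectrograph",
    "O": "Space Telescope Imaging Spectrograph",
    "S": "Reserved for engineering subset data",
    "T": "Reserved for guide star position data",
}

TRANSMISSION = {
    "R": "Real time (not tape recorded)",
    "T": "Tape recorded",
    "M": "Merged real time and tape recorded",
    "N": "Retransmitted merged real time and tape recorded",
    "O": "Retransmitted real time",
    "P": "Retransmitted tape recorded",
}

class DatasetID(NamedTuple):
    instrument: str
    program_id: str
    obs_set_id: str
    obs_id: str
    transmission: str

def parse_dataset_id(dataset_id):
    """Split a 9-character IPPPSSOOT name into its five fields."""
    name = dataset_id.upper()
    if len(name) != 9 or not name.isalnum():
        raise ValueError(f"expected 9 letters or digits, got {dataset_id!r}")
    if name[0] not in INSTRUMENTS:
        raise ValueError(f"unknown instrument code {name[0]!r}")
    if name[8] not in TRANSMISSION:
        raise ValueError(f"unknown transmission code {name[8]!r}")
    return DatasetID(
        instrument=INSTRUMENTS[name[0]],
        program_id=name[1:4],
        obs_set_id=name[4:6],
        obs_id=name[6:8],
        transmission=TRANSMISSION[name[8]],
    )

print(parse_dataset_id("U2JB0101T"))
```

Because the same 9 characters appear in every table, a decoded name like this is enough to tell which instrument-specific tables to look in before running any query.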

Sample dataset_ID values
  • K.Borne’s WFPC2 image of Cartwheel Ring Galaxy = U2JB0101T
  • K.Borne’s WFPC2 image of an UltraLuminous IR Galaxy = U33L2001M
  • K.Borne’s NICMOS image of an UltraLuminous IR Galaxy = N4GV1201O

Another ring galaxy?

Sample Raw Image: quite a mess – really needs some image processing

This has already been processed a little bit …

The WFPC2 is actually 4 separate 800x800 cameras in one! The images from the 4 separate cameras have already been mosaicked into the one image here.

3 of those cameras (WF) are the same, while the 4th camera (PC) is different (different pixel scale only).


HST images are not in color.

Pixel values are “grey”: 0 to 2^16-1 (65535), i.e. 16 bits per pixel.

Multiple images through different filters are combined to reconstruct a color image.
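A minimal sketch of that reconstruction: three grey images, one per filter, are scaled from 16 bits to 8 bits and stacked into red, green, and blue channels. The linear scaling and the choice of which filter feeds which channel are simplifying assumptions. Real colour composites normally apply a more careful brightness stretch.

```python
MAX_16BIT = 2**16 - 1  # 65535, the largest 16-bit pixel value

def to_8bit(pixel16):
    """Linearly rescale a 16-bit grey value (0..65535) to 8 bits (0..255)."""
    if not 0 <= pixel16 <= MAX_16BIT:
        raise ValueError(f"pixel value {pixel16} is outside the 16-bit range")
    return pixel16 * 255 // MAX_16BIT

def combine_rgb(red_img, green_img, blue_img):
    """Stack three same-sized grey images into one image of (R, G, B) tuples."""
    return [
        [(to_8bit(r), to_8bit(g), to_8bit(b)) for r, g, b in zip(r_row, g_row, b_row)]
        for r_row, g_row, b_row in zip(red_img, green_img, blue_img)
    ]

# Hypothetical 1x2 images taken through three different filters.
red_filter = [[65535, 0]]
green_filter = [[0, 0]]
blue_filter = [[32768, 65535]]
color = combine_rgb(red_filter, green_filter, blue_filter)
print(color)
```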

Multiwavelength view of a Spiral Galaxy

Multi-mission Archive at Space Telescope (MAST)

MAST - as part of ADEC
  • NASA has several astrophysics data centers:
      • UV/Optical mission data @ MAST
      • X-ray/GammaRay @ HEASARC
      • IR/Radio @ IPAC
      • Microwave @ LAMBDA
      • Astronomy literature @ ADS
      • Extragalactic (catalog) Database @ NED
      • Mission-specific centers @ SIRTF, Chandra
      • Permanent NASA archive @ NSSDC
      • Astronomy data tables (thousands) @ former-ADC (closed in 2002)
      • European partner data center @ CDS
  • Coordinated by the Astrophysics Data Centers Executive Council (ADEC) @ http://www.adccc.org/

NASA’s ADEC
  • Coordinates NASA data centers’ roles and protocols
  • Coordinates data and metadata standards
  • Integrates all NASA astro mission data sets
  • Shares resources
  • Maximizes NASA data centers’ efficiency
  • Reduces duplication of effort
  • Sets priorities across NASA data centers
  • MAST is one member of ADEC, and HST is one of many NASA astrophysics data sets managed and distributed through MAST.

MAST data sets
  • Missions:
    • HST
    • EUVE
    • IUE
    • FUSE
    • Astro-1,2 = HUT, UIT, WUPPE
    • ROSAT (WFC)
    • Copernicus
    • BEFS, IMAPS, ORFEUS
  • Catalogs and Surveys:
    • GALEX
    • GSC
    • VLA-FIRST
    • Digital Sky Survey (DSS)
    • Sloan Digital Sky Survey (SDSS)

How does one integrate and use these distributed data archives? …


…The National Virtual Observatory (NVO)

  • National Academy of Sciences “Decadal Survey” recommended NVO as highest priority small (<$100M) project :

“Several small initiatives recommended by the committee span both ground and space. The first among them—the National Virtual Observatory (NVO)—is the committee’s top priority among the small initiatives. The NVO will provide a “virtual sky” based on the enormous data sets being created now and the even larger ones proposed for the future. It will enable a new mode of research for professional astronomers and will provide to the public an unparalleled opportunity for education and discovery.” (p.14)

Next Lectures
  • November 18 – Virtual Observatories for Space Science (interoperable systems for science research)
  • November 25 – Intelligent Archives of the Future

Simple User Model of a Science Data Archive - 1

Simple User Model of a Science Data Archive - 2

SUMMARY
  • Introduction to the Information Age
  • Data Mining - a target application area for scientific databases
  • Hubble Space Telescope (HST)
  • HST Databases
  • HST Science Data Archive
  • Multi-mission Archive at Space Telescope (MAST)
