


Information At Your Fingertips: Web Services

Jim Gray & Tom Barclay

Microsoft Research

Alex Szalay

Johns Hopkins University


Communications Excitement!!

  • Immediate, Point-to-Point: conversation, money

  • Immediate, Broadcast: lecture, concert

  • Time Shifted, Point-to-Point: mail

  • Time Shifted, Broadcast: book, newspaper

Network + DB (Data Base)

It’s ALL going electronic

Immediate is being stored for analysis (so ALL database)

Analysis & Automatic Processing are being added

Slide borrowed from Craig Mundie


Information Excitement!

  • All information will be online (somewhere)

    text, speech, sound, vision, graphics, spatial, time…

  • You might record everything

    • read: 10MB/day, 400 GB/lifetime (5 disks today)

    • hear: 400MB/day, 16 TB/lifetime (2 disks/year today)

    • see: 1MB/s, 40GB/day, 1.6 PB/lifetime (150 disks/year maybe someday)

  • Information at Your Fingertips

    • Make it easy to capture & present

    • Make it easy to store & organize & access

    • Make it easy to analyze & summarize
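The "record everything" totals above can be checked with a little arithmetic. The sketch below assumes a lifetime of 40,000 days (roughly 110 years); that figure is not stated on the slide, it is inferred because it reproduces all three totals from the daily rates. Decimal units (1 GB = 10^9 bytes) are also an assumption.

```python
# Assumption: a "lifetime" of 40,000 days, inferred from the slide's totals.
LIFETIME_DAYS = 40_000

# Daily capture rates from the slide, in megabytes (decimal units assumed).
DAILY_MB = {"read": 10, "hear": 400, "see": 40_000}


def lifetime_bytes(mb_per_day: float, days: int = LIFETIME_DAYS) -> float:
    """Total bytes recorded over a lifetime at a constant daily rate."""
    return mb_per_day * 1e6 * days


def human(n_bytes: float) -> str:
    """Format a byte count with decimal prefixes (GB, TB, PB)."""
    for unit, scale in (("PB", 1e15), ("TB", 1e12), ("GB", 1e9)):
        if n_bytes >= scale:
            return f"{n_bytes / scale:g} {unit}"
    return f"{n_bytes:g} B"


LIFETIME_TOTALS = {kind: human(lifetime_bytes(mb)) for kind, mb in DAILY_MB.items()}
```

Under these assumptions the three entries come out as 400 GB, 16 TB, and 1.6 PB, matching the slide.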


How much information is there?

  • Soon everything can be recorded and indexed

  • Most bytes will never be seen by humans.

  • Data summarization, trend detection, anomaly detection are key technologies

    See Mike Lesk: How much information is there: http://www.lesk.com/mlesk/ksg97/ksg.html

    See Lyman & Varian: How much information: http://www.sims.berkeley.edu/research/projects/how-much-info/

[Figure: byte scale running Kilo, Mega, Giga, Tera, Peta, Exa, Zetta, Yotta, with markers for A Book, A Photo, A Movie, All LoC books (words), All Books MultiMedia, and Everything Recorded!]

Small prefixes, by negative power of ten: 24 yocto, 21 zepto, 18 atto, 15 femto, 12 pico, 9 nano, 6 micro, 3 milli


How do we get information today?

  • Human searches web (with an index)

  • Human browses pages


How do we get information tomorrow?

Digital Dashboard

  • Agents gather and digest it for us.

  • Q: How?

  • A (Microsoft): Dot Net

    • Discovery: UDDI, WSDL

    • Explore: SOAP

[Diagram: My Agents, SOAP, WSDL, Web Services]


How do you publish information?

  • Get the data.

  • Conceptualize the data schema

  • Provide methods that return data subsets.

    • Challenge: how much processing on your server?

  • Publish the schema and methods.

  • We are exploring these issues.
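The publishing steps above can be sketched in a few lines. Everything here is hypothetical and only illustrates the shape of the idea: the `Place` schema, the `PlaceService` class, its method names, and the two sample records (approximate city coordinates) are not taken from the talk.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class Place:
    """Hypothetical schema for one record of the published data."""
    name: str
    lon: float
    lat: float


class PlaceService:
    """Hypothetical publisher: holds the data and exposes subset methods."""

    def __init__(self, places):
        self._places = list(places)  # step 1: get the data

    def get_places_in_rect(self, west, south, east, north):
        """Return only the records inside a bounding box (a data subset)."""
        return [p for p in self._places
                if west <= p.lon <= east and south <= p.lat <= north]

    def count_places_in_rect(self, west, south, east, north):
        """Server-side aggregate: return a count instead of the records."""
        return len(self.get_places_in_rect(west, south, east, north))

    def describe(self):
        """Publish the schema and the public method names as plain data."""
        schema = {f.name: f.type.__name__ for f in fields(Place)}
        methods = sorted(m for m in dir(self)
                         if not m.startswith("_") and callable(getattr(self, m)))
        return {"schema": schema, "methods": methods}


# Made-up sample data with approximate coordinates.
service = PlaceService([Place("Seattle", -122.33, 47.61),
                        Place("Baltimore", -76.61, 39.29)])
description = service.describe()
```

The split between `get_places_in_rect` and `count_places_in_rect` is one small instance of the "how much processing on your server?" question: the count does more work on the server so that less data crosses the wire.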

f, g, x, y…


TerraServer Example

  • What is TerraServer?

    • 3TB Internet Map DB available since June 1998

    • USGS photo and topo maps of the US

    • Integrated with Home Advisor

    • Shows off SQL Server availability & scalability

    • Designed for basic computer systems and low speed communications

  • What is TerraService?

    • A .NET web service

    • Makes TerraServer data available to other apps


Application Goals

AVAILABLE: always, 24x7x52, 99.99% of the time

PROGRAMMABLE: .NET applications can integrate TerraServer data into their apps

BIG: 1 TB of data including catalog, temporary space, etc.

PUBLIC: available on the World Wide Web

INTERESTING: to a wide audience

ACCESSIBLE: using standard browsers (IE, Netscape)

REAL: a LOB application (users can buy imagery)

FREE: cannot require an NDA or payment from a user for access

FAST: usable on low-speed (56 kbps) and high-speed (T-1+) connections

EASY: we do not want a large group to develop, deploy, or maintain the application
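To make the availability goal concrete, the sketch below converts a 99.99% target into a yearly downtime budget. It takes the slide's "24x7x52" literally as 8,736 hours per year; the function name is illustrative only.

```python
HOURS_PER_YEAR = 24 * 7 * 52  # the slide's "24x7x52" = 8736 hours


def allowed_downtime_minutes(availability: float) -> float:
    """Minutes per year a service may be down and still meet the target."""
    return (1.0 - availability) * HOURS_PER_YEAR * 60


budget = allowed_downtime_minutes(0.9999)  # a little under an hour per year
```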

3 TB


Demo: http://terraserver.microsoft.com

Show: photo, topo, gazetteer, demographics


Hardware

8 Compaq DL360 “Photon” Web Servers

4 Compaq ProLiant 8500 Db Servers

Fiber SAN Switches

One SQL database per rack

Each rack contains 4.5 TB

261 total drives / 13.7 TB total

Meta Data: stored on 101 GB “Fast, Small Disks” (18 x 18.2 GB)

Imagery Data: stored on 4 339 GB “Slow, Big Disks” (15 x 73.8 GB)

To add 90 72.8 GB disks in Feb 2001 to create 18 TB SAN

[Diagram: storage units labeled 2200; SQL\Inst1, SQL\Inst2, and SQL\Inst3 with drive letters E through S; one Spare]


TerraServer Experience

  • Successful Web Site

    • Met all 8 goals – interesting, big, real, public, fast, easy, accessible, and free

    • High Availability – Windows Data Center & Compaq SAN Technology

    • Top 1000 Web Site – continues to be popular

  • New Feature Requests

    • Programmable access to meta-data

    • User selectable image sizes, i.e. “a map server”

    • Permission to use TerraServer data within server applications


What is a Web Service?

Web Service

  • SOAP Discovery: you can ask a site for a description of the Web Services it offers

  • UDDI (Universal Description, Discovery, and Integration): provides a directory of services on the Internet

  • SOAP Contract Language: Web Services are defined in terms of the formats and ordering of messages

  • SOAP: Web Service consumers can send and receive messages using XML

  • XML & HTTP (open Internet protocols): all these capabilities are built using open Internet protocols

A programmable application component accessible via standard Web protocols
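As a rough illustration of "send and receive messages using XML", the sketch below wraps a method call in a SOAP 1.1 envelope using only the Python standard library. The envelope namespace is the standard SOAP 1.1 one; the service namespace, the parameter names, and both helper functions are placeholders made up for this example and do not describe any real service's contract.

```python
import xml.etree.ElementTree as ET

SOAP_ENV = "http://schemas.xmlsoap.org/soap/envelope/"  # SOAP 1.1 envelope namespace
SERVICE_NS = "http://example.org/hypothetical-service"   # placeholder, not a real service


def build_soap_request(method: str, params: dict) -> bytes:
    """Wrap a method call and its parameters in a SOAP 1.1 envelope."""
    ET.register_namespace("soap", SOAP_ENV)
    envelope = ET.Element(f"{{{SOAP_ENV}}}Envelope")
    body = ET.SubElement(envelope, f"{{{SOAP_ENV}}}Body")
    call = ET.SubElement(body, f"{{{SERVICE_NS}}}{method}")
    for name, value in params.items():
        ET.SubElement(call, f"{{{SERVICE_NS}}}{name}").text = str(value)
    return ET.tostring(envelope, encoding="utf-8")


def parse_soap_body(xml_bytes: bytes) -> tuple:
    """Return (method name, {parameter: text}) from an envelope built above."""
    root = ET.fromstring(xml_bytes)
    call = root.find(f"{{{SOAP_ENV}}}Body")[0]  # first child of Body is the call
    strip = lambda tag: tag.split("}", 1)[1]    # drop the "{namespace}" prefix
    return strip(call.tag), {strip(child.tag): child.text for child in call}


# Hypothetical call; the parameter names are illustrative only.
request_xml = build_soap_request("GetPlaceList", {"placeName": "Seattle", "maxItems": 5})
method, arguments = parse_soap_body(request_xml)
```

In practice a client would POST `request_xml` over HTTP and read a similar envelope back; a WSDL document is what tells the client which methods and parameter names are valid.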


.NET TerraService Architecture

[Architecture diagram, components as labeled:]

  • Clients: Standard Browsers; Smart Clients (Windows Forms, .NET Framework)

  • Web tier: Map UI (Web Forms), Map Server Http Handler, TerraServer Web Service

  • Protocols and formats: HTML, image/jpeg, SOAP/XML

  • Data access: ADO.NET, OLEDB

  • Existing DB Server: three SQL 2000 databases of 1.0 TB each, 705 M rows


TerraServer Web Services

Terra-Tile-Service

  • Query Gazetteer

  • Retrieve imagery meta-data

  • Retrieve imagery

  • Simple projection conversions

  • Clients can present TerraServer imagery in new ways.

Landmark-Service

  • Geo-coded places, e.g. Schools, Golf Courses, Hospitals, etc.

  • Place Polygons, e.g. Zip Codes, Cities, etc.

  • Allows “overlay” information for Terra-Tile-Service applications


Web Service Methods

Place Search

  • GetPlaceFacts

  • GetPlaceList

  • GetPlaceListInRect

  • CountPlacesInRect

Projection

  • ConvertLonLatPtToUtmPt

  • ConvertUtmPtToLonLatPt

  • ConvertLonLatToNearestPlace

  • GetTheme

  • GetLatLonMetrics

Tile

  • GetAreaFromPt

  • GetAreaFromRect

  • GetAreaFromTileId

  • GetTileMetaFromLonLatPt

  • GetTileMetaFromTileId

  • GetTile (Image)

Landmark

  • GetLandmarkTypes

  • CountOfLandmarkPointsByRect

  • GetLandmarkPointsByRect

  • CountOfLandmarkShapesByRect

  • GetLandmarkShapesByRect

http://terraservice.net
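For a feel of what a projection method such as ConvertLonLatPtToUtmPt involves, the sketch below shows only the first step of any longitude/latitude to UTM conversion: picking the 6-degree zone and its central meridian. It is not the service's implementation; the function names are illustrative, the full conversion also needs the transverse Mercator equations, and the special zones around Norway and Svalbard are ignored.

```python
import math


def utm_zone(lon_deg: float) -> int:
    """Standard UTM zone number (1..60) for a longitude in degrees."""
    if not -180.0 <= lon_deg <= 180.0:
        raise ValueError("longitude must be within [-180, 180]")
    # Zones are 6 degrees wide starting at 180 W; clamp so +180 maps to zone 60.
    return min(int(math.floor((lon_deg + 180.0) / 6.0)) + 1, 60)


def central_meridian(zone: int) -> float:
    """Longitude in degrees of the central meridian of a UTM zone."""
    return zone * 6.0 - 183.0


zone = utm_zone(-122.33)            # approximate longitude of Seattle
meridian = central_meridian(zone)   # center line of that zone
```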



Custom End Product

Web Soil Data Viewer

XML Soil Report

Soil Interpretation Map


What Tom Showed You

  • Converted a Web Server

    • HTML via HTTP GET/POST

    • Server returns pictures to people

  • to a Web Service

    • SOAP service

    • returns XML self-describing data

    • Application integrates data (Agriculture and Geo data)


Rosetta Stone

Distributed computing + basic services → Dot Net

  • Yellow Pages → UDDI (Universal Description, Discovery, and Integration)

  • ? → Schema, XLANG

  • RPC (remote procedure call: CORBA, DCOM, RMI) → SOAP (Simple Object Access Protocol)

  • IDL (interface definition language) → WSDL (Web Services Description Language)

  • XDR (eXternal Data Representation) → XML (eXtensible Markup Language)


Sky Server

  • 50 K Spectro Objects: ~100 attributes + 30 lines

  • 15 M Photo Objects: ~400 attributes

  • Like TerraServer, but with pictures of the sky.

  • But also LOTS of data on each object

    • Luminosity (multi-spectra), morphology, spectrum

    • So, it is a data mining application (a data mining web service)

    • Cross-correlation is challenging because

      • Multi-resolution

      • Data is dirty/fuzzy (error bars, cosmic rays, airplanes…)

      • Time varying


Astronomy Data

  • In the “old days” astronomers took photos.

  • Starting in the 1960’s they began to digitize.

  • New instruments are digital (100s of GB/nite)

  • Detectors are following Moore’s law.

  • Data avalanche: doubles every year

[Figure, courtesy of Alex Szalay: total area of 3m+ telescopes in the world (m²) and total number of CCD pixels (megapixels), as a function of time.]

Growth over 25 years is a factor of 30 in glass, 3000 in pixels.


Astronomy Data

  • Astronomers have a few Petabytes now.

    • 1 pixel (byte) / sq arc second ~ 4TB

    • Multi-spectral, temporal, … → 1PB

  • They mine it looking for

    • new (kinds of) objects or more of interesting ones (quasars)

    • density variations in 400-D space

    • correlations in 400-D space

  • Data doubles every year.

  • Data is public after a year.

  • So, 50% of the data is public.

  • Some have private access to 5% more data.

  • So: 50% vs 55% access for everyone
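The "50% is public" step follows directly from the doubling assumption: if the cumulative volume doubles every year, then everything older than one year is exactly half of what exists today. A minimal sketch, assuming smooth exponential growth (the function name and the ten-year cross-check are illustrative):

```python
def public_fraction(growth_per_year: float = 2.0, embargo_years: float = 1.0) -> float:
    """Fraction of all data collected so far that is older than the embargo.

    Assumes the cumulative volume grows as growth_per_year ** t.
    """
    return growth_per_year ** (-embargo_years)


# Cross-check with explicit cumulative totals: 1, 2, 4, ... units of data.
totals = [2 ** year for year in range(11)]
share_public_now = totals[-2] / totals[-1]  # last year's total over today's
```

With a doubling period of one year and a one-year embargo both expressions give one half, which is why an insider's extra 5% changes the picture so little.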


Astronomy Data

  • But…..

  • How do I get at that 50% of the data?

  • Astronomers have culture of publishing.

    • FITS files and many tools: http://fits.gsfc.nasa.gov/fits_home.html

    • Encouraged by NASA.

  • Publishing data “details” is difficult. Astronomers want to do it but it is VERY hard. (What programs were used? What were the processing steps? How were errors treated?…)


Virtual Observatory

http://www.astro.caltech.edu/nvoconf/

http://www.voforum.org/

  • Premise: Most data is (or could be) online

  • So, the Internet is the world’s best telescope:

    • It has data on every part of the sky

    • In every measured spectral band: optical, x-ray, radio..

    • As deep as the best instruments (1 year ago).

    • It is up when you are up.

    • The “seeing” is always great (no working at night, no clouds, no moons, no...).

    • It’s a smart telescope: links objects and data to literature on them.


Virtual Observatory: The Age of Mega-Surveys

  • Large number of new surveys

    • multi-TB in size, 100 million objects or more

    • individual archives planned, or under way

    • Data publication an integral part of the survey

    • Software bill a major cost in the survey

  • Multi-wavelength view of the sky

    • more than 13 wavelengths covered in 5 years

  • Impressive early discoveries

    • finding exotic objects by unusual colors

      • L,T dwarfs, high-z quasars

    • finding objects by time variability

      • gravitational micro-lensing

Surveys: MACHO, 2MASS, DENIS, SDSS, PRIME, DPOSS, GSC-II, COBE, MAP, NVSS, FIRST, GALEX, ROSAT, OGLE ...

Slide courtesy of Alex Szalay, modified by Jim


Virtual Observatory: Federating the Archives

  • The next generation mega-surveys are different

    • top-down design

    • large sky coverage

    • sound statistical plans

    • well controlled/documented data processing

  • Each survey has a publication plan

  • Data mining will lead to stunning new discoveries

  • Federating these archives → Virtual Observatory

Slide courtesy of Alex Szalay


The Multiwavelength Crab Nebula

Crab star

1053 AD

Nova first sighted 1054 A.D. by Chinese Astronomers

Now: Crab Nebula in X-ray, optical, infrared, and radio

Slide courtesy of Robert Brunner @ CalTech.


Exploring Parameter Space

  • Given an arbitrary parameter space:

  • Data Clusters

  • Points between Data Clusters

  • Isolated Data Clusters

  • Isolated Data Groups

  • Holes in Data Clusters

  • Isolated Points

Nichol et al. 2001

Slide courtesy of Robert Brunner @ CalTech.


Virtual Observatory and Education

  • In the beginning science was empirical.

  • Then theoretical branches evolved.

  • Now, we have a computational branch.

    • The computational branch has been simulation

    • It is becoming data analysis/visualization

  • The Virtual Observatory can be used to

    • Teach astronomy: make it interactive, demonstrate ideas and phenomena

    • Teach computational science skills and the process of scientific discovery


Sloan Digital Sky Survey http://sdss.org/

  • A group of astronomers has been building a telescope for the last 12 years (with $90M from the Sloan Foundation, NSF, and a dozen universities)!

  • Now data is arriving:

    • 250GB/nite (20 nights per year).

    • 100 M stars, 100 M galaxies, 1 M spectra.

  • Public data at http://sdss.org/

    • 5% of the survey, 600 sq degrees, 15 M objects, 60 GB.

    • This data includes most of the known high z quasars.

    • It has a lot of science left in it but… that is just the start.


Demo of Sky Server

Alex built SkyServer (based on TerraServer design).

http://skyserver.sdss.org/

Demo:

famous places

navigator

data

shopping cart

spectrum

SQL?

?


Virtual Observatory Challenges

  • Size : multi-Petabyte

    40,000 square degrees is 2 Trillion pixels

    • One band (at 1 sq arcsec) 4 Terabytes

    • Multi-wavelength 10-100 Terabytes

    • Time dimension >> 10 Petabytes

    • Need auto parallelism tools

  • Unsolved Meta-Data problem

    • Hard to publish data & programs

    • Hard to find/understand data & programs

  • Current tools inadequate

    • new analysis & visualization tools

  • Transition to the new astronomy

    • Sociological issues
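The size estimates above can be reproduced with a short calculation. The pixel size and bytes per pixel below are assumptions chosen to match the slide's round numbers and are not stated on it: pixels 0.5 arcsecond on a side and 2 bytes per pixel give roughly 2 trillion pixels and 4 terabytes for one band. With pixels exactly 1 arcsecond on a side, the same area holds closer to 0.5 trillion pixels.

```python
ARCSEC_PER_DEG = 3600


def pixel_count(area_sq_deg: float, pixel_arcsec: float) -> float:
    """Number of square pixels of the given side length covering the area."""
    pixels_per_deg = ARCSEC_PER_DEG / pixel_arcsec
    return area_sq_deg * pixels_per_deg ** 2


def survey_terabytes(area_sq_deg: float, pixel_arcsec: float,
                     bytes_per_pixel: float) -> float:
    """Storage for one band in decimal terabytes."""
    return pixel_count(area_sq_deg, pixel_arcsec) * bytes_per_pixel / 1e12


SKY_SQ_DEG = 40_000  # from the slide

coarse = pixel_count(SKY_SQ_DEG, 1.0)             # 1 arcsec pixels
fine = pixel_count(SKY_SQ_DEG, 0.5)               # assumed 0.5 arcsec pixels
one_band_tb = survey_terabytes(SKY_SQ_DEG, 0.5, 2)  # assumed 2 bytes per pixel
```

Multiplying the single-band figure by the number of wavelengths, and then by the number of epochs, is what drives the multi-wavelength and time-dimension estimates on the slide.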


3-steps to Virtual Observatory

  • Get SDSS and Palomar online

    • Alex Szalay, Jan Vandenberg, Ani Thakar….

    • Roy Williams, Robert Brunner, Julian Bunn

  • Do queries and crossID matches with CalTech and SDSS to expose

    • Schema, Units,…

    • Dataset problems

    • the typical use scenarios.

  • Implement WebServices at CalTech and SDSS


The Challenges

  • How to federate the Archives to make a VO?

  • The hope: XML is the answer.

  • The reality: XML is syntax and tools. FITS on XML will be good, but explaining the data will still be very difficult.

  • Define Astronomy Objects and Methods.

    • Based on UDDI, WSDL, SOAP.

    • Each archive is a service

  • http://TerraService.net/ shows the idea.

    • Working with Caltech (Brunner, Williams, Djorgovski, Bunn)

    • But, how does data mining work?


SkyServer as a WebService: WSDL + SOAP, just add details

Archive ss = new VOService(SkyServer);

Attributes A[] = ss.GetObjects(ra,dec,radius);

?? What are the objects (attributes…)?

?? What are the methods (GetObjects()...)?

?? What query language? SQL, Xquery…?
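One possible reading of GetObjects(ra, dec, radius) is a cone search: return every object within an angular radius of a sky position. The sketch below is a hypothetical in-memory stand-in, not the SkyServer interface. The `SkyObject` fields, the Python `VOService` class, and the three sample objects are all made up for illustration, and angles are assumed to be in degrees.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class SkyObject:
    """Hypothetical minimal attribute set for one catalog object."""
    obj_id: int
    ra: float   # right ascension, degrees
    dec: float  # declination, degrees


def angular_separation(ra1, dec1, ra2, dec2) -> float:
    """Great-circle separation in degrees (haversine form)."""
    ra1, dec1, ra2, dec2 = map(math.radians, (ra1, dec1, ra2, dec2))
    h = (math.sin((dec2 - dec1) / 2) ** 2
         + math.cos(dec1) * math.cos(dec2) * math.sin((ra2 - ra1) / 2) ** 2)
    return math.degrees(2 * math.asin(min(1.0, math.sqrt(h))))


class VOService:
    """Hypothetical in-memory stand-in for an archive behind a web service."""

    def __init__(self, catalog):
        self._catalog = list(catalog)

    def get_objects(self, ra, dec, radius):
        """Return the objects within `radius` degrees of (ra, dec)."""
        return [o for o in self._catalog
                if angular_separation(ra, dec, o.ra, o.dec) <= radius]


# Made-up catalog: two neighbors near (180, 0) and one far away.
ss = VOService([SkyObject(1, 180.00, 0.0),
                SkyObject(2, 180.05, 0.0),
                SkyObject(3, 10.00, 10.0)])
nearby_ids = [o.obj_id for o in ss.get_objects(180.0, 0.0, 0.1)]
```

A linear scan is fine for a toy catalog; an archive with millions of objects would need a spatial index, and the open questions on the slide (which attributes, which methods, which query language) remain exactly that.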


Summary

  • All information at your fingertips.

  • How do we publish information so that our agents can digest it?

  • Example: TerraServer -> TerraService

  • The Virtual Observatory Concept

      • The Internet is the world’s best telescope

      • For astronomy

      • For teaching astronomy and

      • For teaching computational science

