
The Future of the VO - PowerPoint PPT Presentation




The Future of the VO

(the Tale of Tails...)

Alex Szalay, The Johns Hopkins University

Jim Gray, Microsoft Research


The VO: Fast (and Furious)

  • We have been moving forward at a very fast pace

  • First services in production

  • Big international collaboration

  • Strong national projects

  • But we need to understand

    • where are we running?

    • who will use the VO?

    • why?

  • Answers at the intersection of Technology, Sociology and Economics


Astronomy in an Exponential World

  • Astronomers have several hundred TB now

    • 1 pixel (byte) / sq arc second ~ 4TB

    • Multi-spectral, temporal, … → 1PB

  • Data doubles every year

  • Q: How much disk space do you own?

    • 0.1TB

    • 1TB

    • 10TB

    • 100TB

Q: How long can this growth continue?


Evolving Science

  • Thousand years ago: science was empirical

    describing natural phenomena

  • Last few hundred years: theoretical branch

    using models, generalizations

  • Last few decades: a computational branch

    simulating complex phenomena

  • Today: data exploration (eScience)

    synthesizing theory, experiment and computation with advanced data management and statistics; new algorithms!


Technical Challenges

  • Data Access:

    • Move analysis to the data

    • Locality is the key (e.g.: Image Stacking Service)

    • If downloaded, keep it

  • Discovery:

    • Shannon → new dimensions

    • Federation requires data movement (UDT)

  • Analysis:

    • max NlogN algorithms possible
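A rough sketch of why the analysis bullet caps algorithms at N log N. The catalog size (400 million objects, matching the SkyServer figure quoted later in the talk) and the machine speed (1e9 simple operations per second) are illustrative assumptions, not numbers from this slide.

```python
import math

def op_counts(n: int) -> dict:
    """Return rough operation counts for N log N and N^2 scaling."""
    return {
        "n_log_n": n * math.log2(n),
        "n_squared": float(n) ** 2,
    }

# Assumptions for illustration only: catalog size and machine speed.
N_OBJECTS = 400_000_000
OPS_PER_SECOND = 1e9

counts = op_counts(N_OBJECTS)
for name, ops in counts.items():
    seconds = ops / OPS_PER_SECOND
    print(f"{name}: {ops:.2e} operations, about {seconds:.2e} s")
```

Under these assumptions the N log N pass finishes in seconds, while the N² pass would take years on a single machine.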


Sociological Challenges

  • How to avoid trying to be everything for everybody?

  • Rapidly changing “outside world”

  • Make it simple!!!

  • Publishing:

    • Exponential → linear

    • Data reliability → credits and career paths


Where are we going?

  • Relatively easy to predict until 2010

    • Exponential growth continues

    • Most ground based observatories join the VO

    • More and more sky surveys in different wavebands

    • Simulations will have VO interfaces: can be ‘observed’

  • Much harder beyond 2010

    • PetaSurveys are coming on line (Pan-STARRS, VISTA, LSST)

    • Technological predictions much harder

    • Changing funding climate

    • Changing sociology


Similarities to HEP

  • HEP

    • Van de Graaff

    • Cyclotrons

    • National Labs

    • International Labs

    • SSC vs LHC

  • Optical Astronomy

    • 2.5m telescopes

    • 4m telescopes

    • 8-10m class telescopes

    • Surveys/Time Domain

    • 30-100m telescopes

  • Similar trends with a 20 year delay,

    • fewer and ever bigger projects…

    • increasing fraction of cost is in software…

    • more conservative engineering…

  • Can the exponential trend continue, or will it be logistic?

  • What can astronomy learn from High Energy Physics?


But: Why Is Astronomy Different?

  • Especially attractive for the wide public

  • Data has more dimensions

    • Spatial, temporal, cross-correlations

  • Diverse and distributed

    • Many different instruments from many different places and many different times

  • A broad distribution of different questions


Future

How long does the data growth continue?

  • High end always linear

  • Exponential comes from technology + economics

     rapidly changing generations

    • like CCDs replacing plates, and becoming ever cheaper

  • How many new generations of instruments do we have left?

  • Software is also an instrument

    • hierarchical data replication

    • virtual data

    • data cloning


Technology+Sociology+Economics

  • None of them alone is enough

    • We have technology changing very rapidly

    • Google, tags, sensors, Moore's Law

    • Trend driven by changing generations of technologies

  • Sociology is changing in unpredictable ways

    • In general, people will use a new technology if it

      • Offers something entirely new

      • Or substantially cheaper

      • Or substantially simpler

  • Funding is essentially level


Tale of the Tails

  • Long tailed distributions

    • Pareto: 20% of population holds 80% of wealth

    • Zipf: word frequency follows a power law

    • C. Anderson: everything on the web is a power law

  • Lognormal vs Gaussian

    • Multiplicative processes lead to lognormal

      Log P = Log p1 + Log p2 + … + Log pn …

    • Central limit theorem: Log P is a normal random var

    • Kapteyn: random fragmentation

  • Lognormal resembles a 1/f power law over a large dynamic range

  • Extremely important in web-based economics

    • Amazon, Time-Warner, blogs, etc
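The multiplicative argument above can be checked numerically. This is a minimal sketch, assuming each factor p_i is drawn uniformly from [0.5, 1.5] and that 50 factors are multiplied; both choices are arbitrary illustrations, and the helper names are hypothetical.

```python
import math
import random
import statistics

def multiplicative_sample(n_factors: int, rng: random.Random) -> float:
    """Product P = p1 * p2 * ... * pn of independent positive factors."""
    product = 1.0
    for _ in range(n_factors):
        product *= rng.uniform(0.5, 1.5)  # assumed factor distribution
    return product

def skewness(values: list) -> float:
    """Population skewness: mean of cubed standardized deviations."""
    mean = statistics.fmean(values)
    std = math.sqrt(sum((v - mean) ** 2 for v in values) / len(values))
    return sum(((v - mean) / std) ** 3 for v in values) / len(values)

rng = random.Random(42)  # fixed seed so the run is repeatable
samples = [multiplicative_sample(50, rng) for _ in range(20_000)]
log_samples = [math.log(s) for s in samples]

raw_skew = skewness(samples)      # P itself: strongly right-skewed
log_skew = skewness(log_samples)  # Log P: close to symmetric (normal)
mean_p = statistics.fmean(samples)
median_p = statistics.median(samples)

print(f"skewness of P:     {raw_skew:.2f}")
print(f"skewness of Log P: {log_skew:.2f}")
print(f"mean of P / median of P: {mean_p / median_p:.2f}")
```

Log P is a sum of independent terms, so the central limit theorem makes it nearly symmetric, while P keeps a long right tail with its mean well above its median.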


Tale of the Tails #2

  • Barabasi: Power laws tend to arise in social systems where people are faced with many choices

  • The more choices, the more extreme the distribution

    • Measured by the distance between #1 and the median

  • Most elements in the power law system are below the average

  • People’s choices affect one another, they are not random independent events
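Two of the claims above (most elements sit below the average, and the gap between #1 and the median measures how extreme the distribution is) can be illustrated with a toy ranked list. The sketch assumes 1000 items whose sizes fall off as 1/rank; the item count, the exponent, and the function name are illustrative assumptions.

```python
import statistics

def zipf_sizes(n: int, exponent: float = 1.0) -> list:
    """Sizes proportional to 1 / rank**exponent for ranks 1..n."""
    return [1.0 / rank ** exponent for rank in range(1, n + 1)]

sizes = zipf_sizes(1000)  # assumed: 1000 items, exponent 1
mean_size = statistics.fmean(sizes)
median_size = statistics.median(sizes)

# Fraction of items smaller than the average item.
below_mean = sum(1 for s in sizes if s < mean_size) / len(sizes)

# "Distance between #1 and the median", expressed as a ratio.
top_to_median = sizes[0] / median_size

print(f"fraction below the mean: {below_mean:.3f}")
print(f"#1 / median:             {top_to_median:.1f}")
```

Raising `n` (more choices) pushes the #1-to-median ratio higher, which is the sense in which more choices give a more extreme distribution in this toy model.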


Examples: the Grid

  • The size of computational problems is multiplicative

    • Has to have a lognormal distribution

  • Computers bought for the average job will not be large enough in the tail, but the system is still often idle

    • Need to borrow CPU for large jobs and loan when idle

M. Ripeanu (UC): Top 500 computers
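If job sizes really are lognormal, the "too small in the tail, yet often idle" point has a closed form. The sketch below assumes a lognormal job-size distribution whose logarithm has standard deviation sigma, and a machine sized exactly to the mean job; the sigma values and function names are illustrative assumptions.

```python
import math

def normal_cdf(x: float) -> float:
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def fraction_exceeding_mean(sigma: float) -> float:
    """P(job > mean job) for lognormal sizes: 1 - Phi(sigma / 2)."""
    return 1.0 - normal_cdf(sigma / 2.0)

def median_over_mean(sigma: float) -> float:
    """Share of a mean-sized machine used by the median job."""
    return math.exp(-sigma ** 2 / 2.0)

for sigma in (0.5, 1.0, 2.0):  # assumed spreads of log(job size)
    too_big = fraction_exceeding_mean(sigma)
    typical_load = median_over_mean(sigma)
    print(f"sigma={sigma}: {too_big:.1%} of jobs exceed the machine, "
          f"median job uses {typical_load:.1%} of it")
```

As sigma grows, the typical job uses a shrinking share of the machine while a sizable minority of jobs still do not fit, which is the case for borrowing CPU for large jobs and loaning it when idle.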


Footprints and Cardinalities

SkyServer tables

S. Lubow (STScI)


Analyzing the SkyServer

  • Sloan Digital Sky Survey: Pixels + Objects

  • About 500 attributes per “object”, 400M objects

  • Currently 2.4TB fully public

  • Prototype eScience lab (800 users) CasJobs

    • Moving analysis to the data

  • Visual tools

    • Join pixels with objects

  • Prototype in data publishing

    • 200 million web hits in 5 years

    • 1,000,000 distinct users vs 10,000 astronomers

      http://skyserver.sdss.org/







Data Sharing in the VO

  • Users are more willing to part with their data if machine obtained

  • What is the business model?

  • Three tiers (power law!!!)

    (a) big surveys

    (b) value added, refereed products

    (c) more ad-hoc data, images, outreach info

  • largely done (a)

  • need “Journal for Data” to solve (b)

  • need Flickr and an integrated environment for virtual excursions for (c)


Data Reliability

[Figure: labels EDR, DR1, DR2, DR3]

  • Gilmore: Is new data necessarily better?

    • Yes: more of it, better calibrations

    • But: always on the edge, Malmquist bias, etc

  • Usage of old data: changing into a power law

    • (CNN, Time-Warner)

  • Data publishing: once published, must stay

  • SDSS: DR1 is still used


VO Trends

  • VO is inevitable, a new way of doing science

  • Present on every physical scale today, not just astronomy (NEON, Neptune, CERN, MS)

  • Driven by advances in technology, and economics, mapped onto society

  • Boundary conditions: funding will be at best level

  • Computational methods, algorithmic thinking will come just as naturally as mathematics today


VO Technology

  • We will have Petabytes

  • We will need to save them, move them

    • several big archive centers connected

  • Need Journal for Data

    • curation is the key

  • Always will be an open-ended modular system

  • Archives -- also computational services

    • driven by economics: cheaper to process than move


VO Economics

  • The Price of Software

    • 30% from SDSS, 50% for LSST

    • should there be full reuse vs no reuse today?

    • neither: we are not systems integrators

    • risks and benefits are power law

    • repurpose for other disciplines is an example

  • The Price of Data

    • $100,000 / paper (Norris et al.)

    • Drives new projects

      • For SDSS there are 1300 refereed papers for $100M so far

  • Level budgets
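The two price-of-data figures above can be compared with one line of arithmetic. The sketch uses only the numbers on the slide (the $100,000 per paper quoted figure, and 1300 refereed papers for $100M for SDSS); the variable names are illustrative.

```python
# Figures taken from the slide; names are illustrative.
QUOTED_COST_PER_PAPER_USD = 100_000   # "$100,000 / paper"
SDSS_COST_USD = 100_000_000           # "$100M so far"
SDSS_REFEREED_PAPERS = 1300

sdss_cost_per_paper = SDSS_COST_USD / SDSS_REFEREED_PAPERS
ratio = sdss_cost_per_paper / QUOTED_COST_PER_PAPER_USD

print(f"SDSS cost per refereed paper: ${sdss_cost_per_paper:,.0f}")
print(f"relative to the quoted figure: {ratio:.2f}")
```

On these numbers SDSS comes out at roughly $77,000 per refereed paper, somewhat below the quoted $100,000, and the per-paper cost keeps falling as more papers are written on the same data.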


VO Sociology

  • Learn from particle physics

    • do not take for granted that there will be a next one

    • small is beautiful

  • What happens to the rest of astronomy after the world's biggest telescope?

  • The impact of power laws:

    • we need to look at problems in octaves

    • the astronomers may be the tail of our users

    • there is never a natural end or an edge (except for our funding)


The Changing VO

  • Boundary conditions change, we need to change every year!

  • We must change at least as fast as the outside world or we will be left behind

  • We will make mistakes! We need to recognize and recover from them, step back and do it differently

  • If we do not make mistakes, we are not taking enough risks

  • But: we need to buffer/dampen these changes to the astronomy community


Summary: The Future of the VO

  • Does not have much of a past…

  • We need to keep running forward

  • We must take risks

  • Technology driving Sociology - limited by economics

  • Everything is a power law – do not make assumptions!

  • Enormous potential

  • May be the only way to do 'small science' in 2020

