introduction to the millennium database with an sql tutorial
Download
Skip this Video
Download Presentation
Introduction to the Millennium Database with an SQL tutorial

Loading in 2 Seconds...

play fullscreen
1 / 50

Introduction to the Millennium Database with an SQL tutorial - PowerPoint PPT Presentation


  • 116 Views
  • Uploaded on

Introduction to the Millennium Database with an SQL tutorial. Overview. Why relational database ? Overview relational databases general Millennium DB design SQL Tutorial Science queries Tools Advanced subjects (not now). Website documentation http://www.g-vo.org/Millennium/Help.

loader
I am the owner, or an agent authorized to act on behalf of the owner, of the copyrighted work described.
capcha
Download Presentation

PowerPoint Slideshow about ' Introduction to the Millennium Database with an SQL tutorial ' - quamar-riley


An Image/Link below is provided (as is) to download presentation

Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author.While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server.


- - - - - - - - - - - - - - - - - - - - - - - - - - E N D - - - - - - - - - - - - - - - - - - - - - - - - - -
Presentation Transcript
overview
Overview
  • Why relational database ?
  • Overview relational databases
    • general
    • Millennium DB design
  • SQL Tutorial
  • Science queries
  • Tools
  • Advanced subjects (not now)

Millennium DB Tutorial

why use relational database
Why use relational database ?
  • encapsulation of data in terms of rigorous logical model
    • no need to know about internals of data storage
    • forces one to think carefully about data structure
  • ANSI standard query language (SQL) for finding information one is interested in
    • remote filtering
    • speeds up path from science question to answer
    • facilitates communication
  • many implementations, commercial and open source
    • advanced query optimizers (indexes, clustering)

Millennium DB Tutorial

relational database concepts

Relational Database concepts

Millennium database design

Millennium DB Tutorial

tables
Tables
  • Tables have names
  • Related data values are stored in rows
  • Rows have columns
    • all the same for a given table
  • Columns have names and data types
  • Rows often have a unique identifier consisting of the values of >= 1 columns: primary key

Millennium DB Tutorial

slide8

Primary Key Column

Column

Foreign Key Columns

Row

Millennium DB Tutorial

foreign keys
Foreign keys
  • Database can contain many tables
  • The set of table definitions in a database is called the schemaof the database
  • Tables can related by foreign keys: pointers (by value) from a row in one table to a row in another (or possibly the same) table
  • Why not combine these rows into one table ?
  • Consider storing galaxies, with info about their sub-halo as well as the FOF groups these live in.Note, a subhalo contains >=1 galaxies, a FOF halo >= 0 subhalos

Millennium DB Tutorial

one table redundancy
One table: redundancy

GalaxyEtc

Millennium DB Tutorial

normalization
Normalization

FOF

Galaxy

SubHalo

Millennium DB Tutorial

millennium database
Millennium database

FOF

DHalo

Bower2006a

SubHalo

MPAMocks

DSubHalo

MField

DeLucia2006a

MPAHalo

Millennium DB Tutorial

web browser http www g vo org millennium http www g vo org mymillennium

Web browser: http://www.g-vo.org/Millenniumhttp://www.g-vo.org/MyMillennium

Millennium DB Tutorial

sql tutorial

SQL Tutorial

Millennium DB Tutorial

slide15
SQL
  • Sequentiual Query Language
  • Filtering, combining, sub-setting of tables
  • Functions, procedures, aggregations
  • Data manipulation: insert/update/delete
  • A query produces tabular results, which can be used as tables again in sub-queries, or stored in a database
  • Table creation...

Millennium DB Tutorial

table creation statement
Table creation statement

create table MPAHalo (

haloId long not null,

descendantId long, -- foreign key

lastProgenitorId long, -- foreign key

snapnum integer, redshift real,

x real,y real,z real,

np integer, velDisp real, vmax real,

...,

primary key (haloId)

);

Millennium DB Tutorial

select from where
SELECT ... FROM ... WHERE ...

1.

select *

from MPAHalo

2.

select snapnum, redshift, np

from MPAHalo

3.

select *

from MPAHalo

where redshift = 0

Millennium DB Tutorial

where conditions
WHERE conditions
  • = <> != < > <= >=
  • np between 100 and 200
  • name like ‘%Frenk’
  • a=b and d=e
  • a=b or e=d
  • id in (1,2,3)
  • a is null
  • a is not null
  • exists ... (later)

Millennium DB Tutorial

custom column names
Custom column names

select snapnum as snapshotIndex

, redshift as z

, np as numberOfParticles

from MPAHalo

Millennium DB Tutorial

demo queries
Demo queries

select *

from snapshots

select haloid,snapnum

from MPAHalo

where np = 100

select x,y

from MPAHalo

where z between 10 and 12

and np > 50

and snapnum = 63

Millennium DB Tutorial

order by asc desc
ORDER BY ... [ASC | DESC]

select h.*

from MPAHalo h

order by h.snapnum desc

, h.x asc

Millennium DB Tutorial

slide22
TOP

select top 10 haloid, np

from mpahalo

where snapnum = 63

order by np desc

Millennium DB Tutorial

aggregation count sum max min avg stddev
Aggregation: count, sum, max, min, avg, stddev

select count(*) as num

, max(stellarmass) as maxmass

, avg(stellarmass) as avgmass

from delucia2006a

where snapnum = 63

and type = 1

Millennium DB Tutorial

join note the aliases
JOIN (note the aliases)

select h.haloid, g.stellarMass from delucia2006a g , mpahalo h where h.np = 1000

and g.haloid = h.haloid

Millennium DB Tutorial

demo galaxies in massive halos
Demo: galaxies in massive halos

select h.haloId, g.*

from DeLucia2006a g

, MPAHalo h

where h.snapnum = 63

and h.np between 10000 and 11000

and g.haloId = h.haloId

Millennium DB Tutorial

demo direct progenitors of massive halos
Demo: direct progenitors of massive halos

select prog.*

from MPAHalo prog

, MPAHalo des

where des.haloId = prog.descendantId

and des.np > 10000

and des.snapnum = 63

Millennium DB Tutorial

group by
GROUP BY

select redshift

, type

, count(*) as numGal

, avg(stellarMass) as m_avg

, max(stellarMass) as m_max

from DeLucia2006a

group by redshift, type

order by redshift, type

Millennium DB Tutorial

sub selects
Sub-selects

select g.galaxyId

from DeLucia2006a g

, (select top 10 haloId

from mpahalo

where snapnum = 63

order by np desc) mh

where g.haloId = mh.haloId

Millennium DB Tutorial

science questions as sql

Science questions as SQL

Millennium DB Tutorial

motivation for data model

Return the galaxies residing in halos of mass between 10^13 and 10^14 solar masses.

  • Return the galaxy content at z=3 of the progenitors of a halo identified at z=0
  • Return all the galaxies within a sphere of radius 3Mpc around a particular halo
  • Return the complete halo merger tree for a halo identified at z=0
  • Find positions and velocities for all galaxies at redshift zero with B-luminosity, colour and bulge-to-disk ratio within given intervals.
  • Find properties of all galaxies in haloes of mass 10**14 at redshift 1 which have had a major merger (mass-ratio < 4:1) since redshift 1.5.
  • Find all the z=3 progenitors of z=0 red ellipticals (i.e. B-V>0.8 B/T > 0.5)
  • Find the descendents at z=1 of all LBG\'s (i.e. galaxies with SFR>10 Msun/yr) at z=3
  • Make a list of all haloes at z=3 which contain a galaxy of mass >10**9 Msun which is a progenitor of BCG\'s in z=0 cluster of mass >10**14.5
  • Find all z=3 galaxies which have NO z=0 descendant.
  • Return the complete galaxy merging history for a given z=0 galaxy.
  • Find all the z=2 galaxies which were within 1Mpc of a LBG (i.e. SFR>10Msun/yr) at some previous redshift.
  • Find the multiplicity function of halos depending on their environment (overdensity of density field smoothed on certain scale)
  • Find the dependency of halo formation times on environment (“Gao-effect”)
Motivation for data model

Millennium DB Tutorial

slide31

5. Find positions and velocities for all galaxies at redshift zero with B-luminosity, colour and bulge-to-disk ratio within given intervals.

select x,y,z,velX, velY, velZ

from DeLucia2006a

where mag_b between -23 and -18

and bulgeMass >= .9*stellarMass

and snapnum = 50

Millennium DB Tutorial

4 return the complete halo merger tree for a halo identified at z 0

4. Return the complete halo merger tree for a halo identified at z=0

Millennium DB Tutorial

efficient storage of merger trees in a relational database
Efficient storage of merger trees in a relational database
  • Goal: allow queries for the formation history of any object
  • No recursion possible in RDB, nor desired
  • Method:
    • depth first ordering of trees
    • label by rank in order
    • pointer to “last progenitor” below each node
    • all progenitors have label BETWEEN label of root AND that of last progenitor
    • cluster table on label

Millennium DB Tutorial

merger trees
Merger trees

Millennium DB Tutorial

slide35
select prog.snapnum

, prog.x

, prog.y

, prog.np

from millimil..mpahalo des

, millimil..mpahalo prog

where prog.haloId between des.haloId

and des.lastProgenitorId

and des.haloId = 0

Millennium DB Tutorial

slide36

Some more features of the merger tree data model

  • Leaves :
  • select galaxyId as leaf
  • from galaxies des
  • where galaxyId
    • = lastProgenitorId

Branching points :

select descendantId

from galaxies des

where descendantId != -1

group by descendantId

having count(*) > 1

Millennium DB Tutorial

main branches
Main branches
  • Roots and leaves:

select des.galaxyId as rootId

, min(prog.lastprogenitorid) as leafId

into rootLeaf

from mpagalaxies..delucia2006a des

, mpagalaxies.. delucia2006a prog

where des.galaxyId = 0

and prog.galaxyId between

des.galaxyId and des.lastProgenitorId

  • Main branch

select rl.rootId, b.*

from rootLeaf rl

, mpagalaxies..delucia2006a b

where b.galaxyId between

rl.rootId and rl.leafId

Millennium DB Tutorial

find all halos in a subvolume of space 15 x 20 20 y 25 5 z 10

Find all halos in a subvolume of space:15 <= x <= 2020 <= y <= 255 <= z <= 10

Millennium DB Tutorial

slide39
select x,y,z

from mpahalo

where snapnum = 63

and x between 10 and 20

and y between 20 and 30

and z between 0 and 10

Inefficient, even when indexed !

Millennium DB Tutorial

slide40
x y z

15.001083 42.471325 24.673561

15.001247 58.420914 42.722874

15.002215 38.042484 29.557423

15.002735 50.487785 57.716877

15.00275320.0001778.21466

15.005095 13.637599 16.135191

15.00659322.170828 48.242783

15.01148824.824438 19.773285

15.011741 48.099907 11.500685

15.01186823.312265 27.858799

15.01306523.969515 18.883507

15.013158 56.041866 40.82894

15.014361 59.503357 45.31733

15.017322 46.257664 44.37695

15.01820227.3338959.441319

Millennium DB Tutorial

spatial indexes
Spatial indexes
  • Performance of finding things is improved if those things are co-located on disk: ordering, indices
  • Co-locating a 3D configuration of points on a 1D disk can only be done approximately
  • Space filling curves: Peano-Hilbert, Z-curve

Millennium DB Tutorial

slide42

Zones

Millennium DB Tutorial

zone index
Zone index
  • Course sampling of points in multiple dimensions allows simple multi-dimensional ordering
  • ix = floor(x/10Mpc)iy = floor(y/10Mpc)iz = floor(z/10Mpc)
  • index on (snapnum,ix,iy,iz,x,y,z,galaxyId)

Millennium DB Tutorial

slide44
IX IY IZ X Y Z

1 2 0 15.061804 20.891907 4.4156647

1 2 0 15.069336 23.437601 9.812217

1 2 0 15.100678 20.905642 4.613036

1 2 0 15.173968 22.36883 8.01832

1 2 0 15.194122 20.67583 4.8034463

1 2 0 15.2500305 24.246683 1.6651521

1 2 0 15.365576 23.290754 9.404872

1 2 0 15.372606 20.203691 2.0006201

1 2 0 15.524696 21.03997 4.280077

1 2 0 15.583943 22.344622 9.421347

1 2 0 15.6358385 26.785904 9.881406

1 2 0 15.66383 22.829983 7.137772

1 2 0 15.673803 26.918291 3.302736

1 2 0 15.717824 22.365341 9.221828

1 2 0 15.847992 24.700747 1.389664

1 2 0 15.883896 22.593819 7.277129

1 2 0 15.91041 26.531118 2.5693457

1 2 0 15.916905 27.137867 4.289855

1 2 0 16.047333 28.93811 5.414605

Millennium DB Tutorial

slide45
Return B-band luminosity function of galaxies residing in halos of mass between 10^13 and 10^14 solar masses.

select .2*floor(5*g.mag_b) as magB

, count(*) as num

from DeLucia2006a g

, MPAHalo h

where g.haloId = h.haloId

and h.m_TopHat between 1000 and 10000

and h.redshift=0

group by .2*floor(5*g.mag_b)

Millennium DB Tutorial

13 find the dependency of halo formation times on environment

13.Find the dependency of halo formation times on environment

Millennium DB Tutorial

slide47
select zForm

, avg(g5) as g5

, avg(g10) as g10

from MMField

, ( select des.haloId, des.phkey,

max(PROG.redshift) as zForm

from MPAHalo PROG,

MPAHalo DES

where DES.snapnum = 63

and PROG.haloId between DES.haloId

and DES.lastProgenitorId

and prog.np >= des.np/2

and des.np between 100 and 200

group by des.haloId, des.phkey ) t

where t.phkey = f.phkey

and f.snapnum=63

group by zForm

Millennium DB Tutorial

tools

Tools

Millennium DB Tutorial

other tools
Other tools
  • wget, UNIX/LINUX command

wget "http://www.g-vo.org/Millennium?action=doQuery & SQL=select top 10 haloid,snapnum, x,y,z,np from mpahalo"

  • Use in R (similar in IDL) ...
  • TOPCAT

Millennium DB Tutorial

thank you

Thank you.

Millennium DB Tutorial

ad