Introduction to the millennium database with an sql tutorial
Download
1 / 50

Introduction to the Millennium Database with an SQL tutorial - PowerPoint PPT Presentation


  • 117 Views
  • Uploaded on

Introduction to the Millennium Database with an SQL tutorial. Overview. Why relational database ? Overview relational databases general Millennium DB design SQL Tutorial Science queries Tools Advanced subjects (not now). Website documentation http://www.g-vo.org/Millennium/Help.

loader
I am the owner, or an agent authorized to act on behalf of the owner, of the copyrighted work described.
capcha
Download Presentation

PowerPoint Slideshow about ' Introduction to the Millennium Database with an SQL tutorial ' - quamar-riley


An Image/Link below is provided (as is) to download presentation

Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author.While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server.


- - - - - - - - - - - - - - - - - - - - - - - - - - E N D - - - - - - - - - - - - - - - - - - - - - - - - - -
Presentation Transcript
Introduction to the millennium database with an sql tutorial

Introduction to the Millennium Database with an SQL tutorial

Millennium DB Tutorial


Overview
Overview

  • Why relational database ?

  • Overview relational databases

    • general

    • Millennium DB design

  • SQL Tutorial

  • Science queries

  • Tools

  • Advanced subjects (not now)

Millennium DB Tutorial


Website documentation http www g vo org millennium help

Website documentationhttp://www.g-vo.org/Millennium/Help

Millennium DB Tutorial


Why use relational database
Why use relational database ?

  • encapsulation of data in terms of rigorous logical model

    • no need to know about internals of data storage

    • forces one to think carefully about data structure

  • ANSI standard query language (SQL) for finding information one is interested in

    • remote filtering

    • speeds up path from science question to answer

    • facilitates communication

  • many implementations, commercial and open source

    • advanced query optimizers (indexes, clustering)

Millennium DB Tutorial


Relational database concepts

Relational Database concepts

Millennium database design

Millennium DB Tutorial


Relational database stores data in relations tables
Relational database stores data in relations ( = tables)

Millennium DB Tutorial


Tables
Tables

  • Tables have names

  • Related data values are stored in rows

  • Rows have columns

    • all the same for a given table

  • Columns have names and data types

  • Rows often have a unique identifier consisting of the values of >= 1 columns: primary key

Millennium DB Tutorial


Introduction to the millennium database with an sql tutorial

Primary Key Column

Column

Foreign Key Columns

Row

Millennium DB Tutorial


Foreign keys
Foreign keys

  • Database can contain many tables

  • The set of table definitions in a database is called the schemaof the database

  • Tables can related by foreign keys: pointers (by value) from a row in one table to a row in another (or possibly the same) table

  • Why not combine these rows into one table ?

  • Consider storing galaxies, with info about their sub-halo as well as the FOF groups these live in.Note, a subhalo contains >=1 galaxies, a FOF halo >= 0 subhalos

Millennium DB Tutorial


One table redundancy
One table: redundancy

GalaxyEtc

Millennium DB Tutorial


Normalization
Normalization

FOF

Galaxy

SubHalo

Millennium DB Tutorial


Millennium database
Millennium database

FOF

DHalo

Bower2006a

SubHalo

MPAMocks

DSubHalo

MField

DeLucia2006a

MPAHalo

Millennium DB Tutorial


Web browser http www g vo org millennium http www g vo org mymillennium

Web browser: http://www.g-vo.org/Millenniumhttp://www.g-vo.org/MyMillennium

Millennium DB Tutorial


Sql tutorial

SQL Tutorial

Millennium DB Tutorial


Introduction to the millennium database with an sql tutorial
SQL

  • Sequentiual Query Language

  • Filtering, combining, sub-setting of tables

  • Functions, procedures, aggregations

  • Data manipulation: insert/update/delete

  • A query produces tabular results, which can be used as tables again in sub-queries, or stored in a database

  • Table creation...

Millennium DB Tutorial


Table creation statement
Table creation statement

create table MPAHalo (

haloId long not null,

descendantId long, -- foreign key

lastProgenitorId long, -- foreign key

snapnum integer, redshift real,

x real,y real,z real,

np integer, velDisp real, vmax real,

...,

primary key (haloId)

);

Millennium DB Tutorial


Select from where
SELECT ... FROM ... WHERE ...

1.

select *

from MPAHalo

2.

select snapnum, redshift, np

from MPAHalo

3.

select *

from MPAHalo

where redshift = 0

Millennium DB Tutorial


Where conditions
WHERE conditions

  • = <> != < > <= >=

  • np between 100 and 200

  • name like ‘%Frenk’

  • a=b and d=e

  • a=b or e=d

  • id in (1,2,3)

  • a is null

  • a is not null

  • exists ... (later)

Millennium DB Tutorial


Custom column names
Custom column names

select snapnum as snapshotIndex

, redshift as z

, np as numberOfParticles

from MPAHalo

Millennium DB Tutorial


Demo queries
Demo queries

select *

from snapshots

select haloid,snapnum

from MPAHalo

where np = 100

select x,y

from MPAHalo

where z between 10 and 12

and np > 50

and snapnum = 63

Millennium DB Tutorial


Order by asc desc
ORDER BY ... [ASC | DESC]

select h.*

from MPAHalo h

order by h.snapnum desc

, h.x asc

Millennium DB Tutorial


Introduction to the millennium database with an sql tutorial
TOP

select top 10 haloid, np

from mpahalo

where snapnum = 63

order by np desc

Millennium DB Tutorial


Aggregation count sum max min avg stddev
Aggregation: count, sum, max, min, avg, stddev

select count(*) as num

, max(stellarmass) as maxmass

, avg(stellarmass) as avgmass

from delucia2006a

where snapnum = 63

and type = 1

Millennium DB Tutorial


Join note the aliases
JOIN (note the aliases)

select h.haloid, g.stellarMass from delucia2006a g , mpahalo h where h.np = 1000

and g.haloid = h.haloid

Millennium DB Tutorial


Demo galaxies in massive halos
Demo: galaxies in massive halos

select h.haloId, g.*

from DeLucia2006a g

, MPAHalo h

where h.snapnum = 63

and h.np between 10000 and 11000

and g.haloId = h.haloId

Millennium DB Tutorial


Demo direct progenitors of massive halos
Demo: direct progenitors of massive halos

select prog.*

from MPAHalo prog

, MPAHalo des

where des.haloId = prog.descendantId

and des.np > 10000

and des.snapnum = 63

Millennium DB Tutorial


Group by
GROUP BY

select redshift

, type

, count(*) as numGal

, avg(stellarMass) as m_avg

, max(stellarMass) as m_max

from DeLucia2006a

group by redshift, type

order by redshift, type

Millennium DB Tutorial


Sub selects
Sub-selects

select g.galaxyId

from DeLucia2006a g

, (select top 10 haloId

from mpahalo

where snapnum = 63

order by np desc) mh

where g.haloId = mh.haloId

Millennium DB Tutorial


Science questions as sql

Science questions as SQL

Millennium DB Tutorial


Motivation for data model

  • Return the galaxies residing in halos of mass between 10^13 and 10^14 solar masses.

  • Return the galaxy content at z=3 of the progenitors of a halo identified at z=0

  • Return all the galaxies within a sphere of radius 3Mpc around a particular halo

  • Return the complete halo merger tree for a halo identified at z=0

  • Find positions and velocities for all galaxies at redshift zero with B-luminosity, colour and bulge-to-disk ratio within given intervals.

  • Find properties of all galaxies in haloes of mass 10**14 at redshift 1 which have had a major merger (mass-ratio < 4:1) since redshift 1.5.

  • Find all the z=3 progenitors of z=0 red ellipticals (i.e. B-V>0.8 B/T > 0.5)

  • Find the descendents at z=1 of all LBG's (i.e. galaxies with SFR>10 Msun/yr) at z=3

  • Make a list of all haloes at z=3 which contain a galaxy of mass >10**9 Msun which is a progenitor of BCG's in z=0 cluster of mass >10**14.5

  • Find all z=3 galaxies which have NO z=0 descendant.

  • Return the complete galaxy merging history for a given z=0 galaxy.

  • Find all the z=2 galaxies which were within 1Mpc of a LBG (i.e. SFR>10Msun/yr) at some previous redshift.

  • Find the multiplicity function of halos depending on their environment (overdensity of density field smoothed on certain scale)

  • Find the dependency of halo formation times on environment (“Gao-effect”)

Motivation for data model

Millennium DB Tutorial


Introduction to the millennium database with an sql tutorial

5. Find positions and velocities for all galaxies at redshift zero with B-luminosity, colour and bulge-to-disk ratio within given intervals.

select x,y,z,velX, velY, velZ

from DeLucia2006a

where mag_b between -23 and -18

and bulgeMass >= .9*stellarMass

and snapnum = 50

Millennium DB Tutorial


4 return the complete halo merger tree for a halo identified at z 0

4. Return the complete halo merger tree for a halo identified at z=0

Millennium DB Tutorial


Efficient storage of merger trees in a relational database
Efficient storage of merger trees in a relational database identified at z=0

  • Goal: allow queries for the formation history of any object

  • No recursion possible in RDB, nor desired

  • Method:

    • depth first ordering of trees

    • label by rank in order

    • pointer to “last progenitor” below each node

    • all progenitors have label BETWEEN label of root AND that of last progenitor

    • cluster table on label

Millennium DB Tutorial


Merger trees
Merger trees identified at z=0

Millennium DB Tutorial


Introduction to the millennium database with an sql tutorial

select prog.snapnum identified at z=0

, prog.x

, prog.y

, prog.np

from millimil..mpahalo des

, millimil..mpahalo prog

where prog.haloId between des.haloId

and des.lastProgenitorId

and des.haloId = 0

Millennium DB Tutorial


Introduction to the millennium database with an sql tutorial

Some more features of the merger tree data model identified at z=0

  • Leaves :

  • select galaxyId as leaf

  • from galaxies des

  • where galaxyId

    • = lastProgenitorId

Branching points :

select descendantId

from galaxies des

where descendantId != -1

group by descendantId

having count(*) > 1

Millennium DB Tutorial


Main branches
Main branches identified at z=0

  • Roots and leaves:

    select des.galaxyId as rootId

    , min(prog.lastprogenitorid) as leafId

    into rootLeaf

    from mpagalaxies..delucia2006a des

    , mpagalaxies.. delucia2006a prog

    where des.galaxyId = 0

    and prog.galaxyId between

    des.galaxyId and des.lastProgenitorId

  • Main branch

    select rl.rootId, b.*

    from rootLeaf rl

    , mpagalaxies..delucia2006a b

    where b.galaxyId between

    rl.rootId and rl.leafId

Millennium DB Tutorial


Find all halos in a subvolume of space 15 x 20 20 y 25 5 z 10

Find all halos in a subvolume of space: identified at z=0 15 <= x <= 2020 <= y <= 255 <= z <= 10

Millennium DB Tutorial


Introduction to the millennium database with an sql tutorial

select x,y,z identified at z=0

from mpahalo

where snapnum = 63

and x between 10 and 20

and y between 20 and 30

and z between 0 and 10

Inefficient, even when indexed !

Millennium DB Tutorial


Introduction to the millennium database with an sql tutorial

x y z identified at z=0

15.001083 42.471325 24.673561

15.001247 58.420914 42.722874

15.002215 38.042484 29.557423

15.002735 50.487785 57.716877

15.00275320.0001778.21466

15.005095 13.637599 16.135191

15.00659322.170828 48.242783

15.01148824.824438 19.773285

15.011741 48.099907 11.500685

15.01186823.312265 27.858799

15.01306523.969515 18.883507

15.013158 56.041866 40.82894

15.014361 59.503357 45.31733

15.017322 46.257664 44.37695

15.01820227.3338959.441319

Millennium DB Tutorial


Spatial indexes
Spatial indexes identified at z=0

  • Performance of finding things is improved if those things are co-located on disk: ordering, indices

  • Co-locating a 3D configuration of points on a 1D disk can only be done approximately

  • Space filling curves: Peano-Hilbert, Z-curve

Millennium DB Tutorial


Introduction to the millennium database with an sql tutorial

Zones identified at z=0

Millennium DB Tutorial


Zone index
Zone index identified at z=0

  • Course sampling of points in multiple dimensions allows simple multi-dimensional ordering

  • ix = floor(x/10Mpc)iy = floor(y/10Mpc)iz = floor(z/10Mpc)

  • index on (snapnum,ix,iy,iz,x,y,z,galaxyId)

Millennium DB Tutorial


Introduction to the millennium database with an sql tutorial

IX IY IZ X Y Z identified at z=0

1 2 0 15.061804 20.891907 4.4156647

1 2 0 15.069336 23.437601 9.812217

1 2 0 15.100678 20.905642 4.613036

1 2 0 15.173968 22.36883 8.01832

1 2 0 15.194122 20.67583 4.8034463

1 2 0 15.2500305 24.246683 1.6651521

1 2 0 15.365576 23.290754 9.404872

1 2 0 15.372606 20.203691 2.0006201

1 2 0 15.524696 21.03997 4.280077

1 2 0 15.583943 22.344622 9.421347

1 2 0 15.6358385 26.785904 9.881406

1 2 0 15.66383 22.829983 7.137772

1 2 0 15.673803 26.918291 3.302736

1 2 0 15.717824 22.365341 9.221828

1 2 0 15.847992 24.700747 1.389664

1 2 0 15.883896 22.593819 7.277129

1 2 0 15.91041 26.531118 2.5693457

1 2 0 15.916905 27.137867 4.289855

1 2 0 16.047333 28.93811 5.414605

Millennium DB Tutorial


Introduction to the millennium database with an sql tutorial
Return B-band luminosity function of galaxies residing in halos of mass between 10^13 and 10^14 solar masses.

select .2*floor(5*g.mag_b) as magB

, count(*) as num

from DeLucia2006a g

, MPAHalo h

where g.haloId = h.haloId

and h.m_TopHat between 1000 and 10000

and h.redshift=0

group by .2*floor(5*g.mag_b)

Millennium DB Tutorial


13 find the dependency of halo formation times on environment

13.Find the dependency of halo formation times on environment

Millennium DB Tutorial


Introduction to the millennium database with an sql tutorial

select zForm environment

, avg(g5) as g5

, avg(g10) as g10

from MMField

, ( select des.haloId, des.phkey,

max(PROG.redshift) as zForm

from MPAHalo PROG,

MPAHalo DES

where DES.snapnum = 63

and PROG.haloId between DES.haloId

and DES.lastProgenitorId

and prog.np >= des.np/2

and des.np between 100 and 200

group by des.haloId, des.phkey ) t

where t.phkey = f.phkey

and f.snapnum=63

group by zForm

Millennium DB Tutorial


Tools

Tools environment

Millennium DB Tutorial


Other tools
Other tools environment

  • wget, UNIX/LINUX command

    wget "http://www.g-vo.org/Millennium?action=doQuery & SQL=select top 10 haloid,snapnum, x,y,z,np from mpahalo"

  • Use in R (similar in IDL) ...

  • TOPCAT

Millennium DB Tutorial


Thank you

Thank you. environment

Millennium DB Tutorial