" The Maximum Likelihood Problem and Fitting the Sagittarius Dwarf Tidal Stream "

"The Maximum Likelihood Problem and Fitting the Sagittarius Dwarf Tidal Stream" Matthew Newby Astronomy Seminar RPI Oct. 22, 2009

Overview: • Introduction • The Sagittarius Stream • SDSS • Locating • Maximum Likelihood • Methods • Differential Evolution • Monte-Carlo Markov-Chain • Gradient Descent • Genetic Search • Particle Swarm • Revisit the Sagittarius Stream • BOINC • Overview • Current and Future Work

Introduction • Modern astronomy: no longer staring through a telescope • Automated surveys produce large data sets • Measurements carry errors, so statistical methods are needed • Fast and accurate computer routines are needed to analyze this information! Images: NASA.gov; Wikimedia Commons (terminal prompt reading "computer$ go faster_")

The Sloan Digital Sky Survey (SDSS): • 230+ million objects • 8,400 square degrees in the sky • Large percentage of north galactic cap • Very little data in galactic plane (too much dust) • Several hundred thousand stars Image: sdss.org

The Sagittarius Dwarf Tidal Stream • The Sagittarius Dwarf Galaxy is merging with the Milky Way • The dwarf is being tidally disrupted by the Milky Way, creating long “tails.” Mapping the Tidal Stream will: • Provide information on matter distribution in Milky Way • Provide constraints on Galactic Halo Image (above): [Ibata et al. 1997, AJ] Image (left): David Martinez-Delgado (MPIA) & Gabriel Perez (IAC)

The Milky Way (figure labels): Halo, Bulge, Thin Disk, Thick Disk, Data Wedge, Sun, Sagittarius Dwarf Galaxy, Tidal Stream. Scale: ~30 kiloparsecs (100,000 light-years)

Data Stripe: • F-turnoff stars on the H-R diagram • Stripe 82 (southern galactic cap) Image: Newberg & Yanny 2006, JoP Conference Series (modified by N. Cole)

Sag. Stream: Model • Assume stream is a cylinder • Radial drop-off given by a Gaussian distribution • 2 background parameters: r0, q • 6 parameters per stream: ε, μ, r, θ, φ, σ • At least 8 parameters in the search: an 8-dimensional solution space! Background distribution: Credit: Cole, N.

Maximum Likelihood: • Bayesian Method • Must assume a “prior”: a model explaining the data • Find the parameters that are the “most likely” in a data set, given the prior • Law of large numbers • Can assume that large data sets have normally distributed data points • Find the probability that each data point lies in the given distribution • Then you can get the likelihood as the product over all data points: $L(Q|D) = \prod_i \mathrm{DataPointProb}_i$
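A minimal sketch of the likelihood product, assuming each data point's probability comes from a one-dimensional normal distribution with parameters Q = (mu, sigma). The function names and the sample values are illustrative, not taken from the talk. The sketch sums logarithms instead of multiplying probabilities, since a product of many small numbers underflows quickly.

```python
import math

def gaussian_pdf(x, mu, sigma):
    """Probability density of a normal distribution at x."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))

def log_likelihood(params, data):
    """log L(Q|D): the sum over data points of log(DataPointProb_i)."""
    mu, sigma = params
    return sum(math.log(gaussian_pdf(x, mu, sigma)) for x in data)

# Made-up sample values, for illustration only.
data = [4.8, 5.1, 5.0, 4.9, 5.2]

good = log_likelihood((5.0, 0.2), data)  # parameters near the data
poor = log_likelihood((3.0, 0.2), data)  # parameters far from the data
```

Parameters that describe the data well give the larger log-likelihood, so `good` exceeds `poor` here; a search algorithm only needs this one number per trial set of parameters.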

Computational Algorithms • Overview: • Set up problem • Parameter space: all allowed values of parameters • Likelihood evaluator for given parameters • Evaluation method: moves in parameter space in an efficient way • End conditions: when change in best is below a limit, or a predefined number of iterations is reached • Problems: • Likelihood calculation is usually time-consuming • Need to avoid local maxima and find the global maximum • What is the best method?

Computational Methods: “No Free Lunch” (David H. Wolpert, William G. Macready) • Poor students: Rosencrantz, Guildenstern, Ophelia • Diets: vegetarian, only eats meat, low-carb diet • Local eateries, same menus, random prices: Burger Palace, Gourmet Salads, No Carbs at All • Prices differ by restaurant! Not everyone can eat cheaply! • One restaurant cannot be the best solution for every person (problem)! • One solution method (or algorithm) will not be ideal for all problems! • Need to choose the best solution for the job at hand!

Conjugate Gradient Descent (CGD) • Calculates the gradient of the surface for each parameter • Moves towards best likelihood using a line search • Conjugate gradient uses the gradient of the previous step to converge faster • Requires many likelihood calculations per move • Unfortunately, may end at local maxima • Need to run from several different directions in order to find global best • The gradient, G: L = likelihood function, Q = parameter (i or j), h_i = step size for the ith parameter
Figure: Gradient Descent, 1-dimensional case (likelihood vs. position; labels: gradient location, local maximum, best solution)
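A minimal sketch of a gradient-based search. The slide's formula for G is not reproduced in this text, so the central-difference estimate below is an assumption. For brevity the sketch uses plain steepest ascent with a fixed step rate rather than the conjugate variant with a line search. The names `numerical_gradient`, `gradient_ascent` and `toy_likelihood` are hypothetical.

```python
def numerical_gradient(L, Q, h):
    """Central-difference estimate of dL/dQ_i, with step size h[i] per parameter.

    Assumed form: G_i = (L(Q + h_i) - L(Q - h_i)) / (2 h_i).
    """
    grad = []
    for i in range(len(Q)):
        up = list(Q)
        up[i] += h[i]
        down = list(Q)
        down[i] -= h[i]
        grad.append((L(up) - L(down)) / (2.0 * h[i]))
    return grad

def gradient_ascent(L, Q, h, rate=0.1, tol=1e-10, max_iter=10000):
    """Plain steepest ascent (not conjugate): step uphill until L stops improving."""
    Q = list(Q)
    best = L(Q)
    for _ in range(max_iter):
        g = numerical_gradient(L, Q, h)
        trial = [q + rate * gi for q, gi in zip(Q, g)]
        value = L(trial)
        if value - best < tol:   # end condition: change in best below a limit
            break
        Q, best = trial, value
    return Q, best

def toy_likelihood(Q):
    # Hypothetical smooth surface with a single maximum at (1, -2).
    return -((Q[0] - 1.0) ** 2) - 2.0 * (Q[1] + 2.0) ** 2

Q_best, L_best = gradient_ascent(toy_likelihood, [0.0, 0.0], h=[1e-4, 1e-4])
```

Each gradient estimate costs two likelihood evaluations per parameter, which is why the method needs many likelihood calculations per move. On a surface with several peaks, the same loop would stop at whichever local maximum is uphill from the starting point.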

Line Search • Evaluates two points in direction of gradient: one a distance 1d away, the other 2d • d is usually related to the gradient (slope) • If the middle point is not at a better likelihood than the end points, d is doubled and the process repeated • If the middle point is higher, then the middle point becomes the starting point for another CGD • Line Search causes the algorithm to reach the best likelihood efficiently
Line Search example (left): The first search does not find a better likelihood for the middle point (yellow), so the distance is doubled. This time, the new middle point (red) has the best likelihood. The next iteration of CGD will start at this point. Figure labels: starting point, first middle point, first end point, next middle point, next end point
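The doubling rule can be sketched as follows. This is an illustrative reading of the slide, not the talk's own code: `line_search` and `toy_likelihood` are hypothetical names, and the starting distance `d=0.01` and the cap on doublings are assumptions. The sketch relies on the first trial distance being small; if the first middle point already overshoots the peak, doubling cannot recover.

```python
def line_search(L, start, direction, d=0.01, max_doublings=60):
    """Double d until the middle point beats both end points."""
    def point(t):
        # Position a distance t along the search direction.
        return [s + t * g for s, g in zip(start, direction)]

    L_start = L(start)
    for _ in range(max_doublings):
        L_middle = L(point(d))
        L_end = L(point(2.0 * d))
        if L_middle > L_start and L_middle > L_end:
            return point(d), L_middle   # the next CGD step starts here
        d *= 2.0
    return list(start), L_start         # no better middle point found


def toy_likelihood(Q):
    # Hypothetical 1-D surface with its maximum at Q[0] = 3.
    return -((Q[0] - 3.0) ** 2)

new_point, new_L = line_search(toy_likelihood, [0.0], [1.0])
```

In this toy run the distance doubles until the middle point passes the far end point in likelihood, which brackets the peak between the start and the end point.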

Monte-Carlo Markov-Chain (MCMC) • A “random walk” method • Samples parameter space well • Automatically produces error distribution • Easy to code • Sensitive to running time and step size • Never truly converges • Metropolis-Hastings: • Take a step in each direction (parameter) • Step size/direction is random, drawn from a normal distribution • If the new location has a better likelihood, move to it • If the new location has a worse likelihood, then there is still a chance of moving to it
Figure: The trajectory of a 1000-step MCMC straight-line fit (top) and the distribution in b (bottom).
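A minimal random-walk sketch of the Metropolis rule. The slide does not say how large the "chance" of accepting a worse point is; the sketch assumes the standard Metropolis choice, the likelihood ratio L_new / L_old. The toy log-likelihood, the step size and the burn-in length are illustrative assumptions.

```python
import math
import random

def metropolis(log_L, start, step_size, n_steps, seed=0):
    """Random-walk Metropolis sampler working with log-likelihoods."""
    rng = random.Random(seed)
    current = list(start)
    current_logL = log_L(current)
    chain = [list(current)]
    for _ in range(n_steps):
        # Step every parameter by a normally distributed random amount.
        proposal = [q + rng.gauss(0.0, step_size) for q in current]
        proposal_logL = log_L(proposal)
        # Better points are always accepted; worse points are accepted
        # with probability L_new / L_old (assumed standard rule).
        delta = proposal_logL - current_logL
        if delta >= 0.0 or rng.random() < math.exp(delta):
            current, current_logL = proposal, proposal_logL
        chain.append(list(current))
    return chain

def toy_log_L(Q):
    # Hypothetical log-likelihood: unit-width normal centred on b = 2.
    return -0.5 * (Q[0] - 2.0) ** 2

chain = metropolis(toy_log_L, [0.0], step_size=1.0, n_steps=5000)
samples = [q[0] for q in chain[500:]]   # discard early steps as burn-in
mean_b = sum(samples) / len(samples)
```

The chain never settles on a single point. Instead, a histogram of `samples` approximates the distribution of the parameter, which is how the method produces an error estimate for free.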

Genetic Search • Inspired by natural selection • Start with multiple “individuals” (positions) in parameter space • Evaluate likelihood for each individual • Remove individuals with the worst likelihoods • Replace the removed individuals with “children” of the remaining individuals (“parents”) • Parents can be chosen randomly or from the best likelihoods • Create children through crossover and mutation: • Crossover: A child inherits the parameters of multiple parents, either by averaging the parents’ parameters or by inheriting select parameters from each parent • Mutation: Replace a parameter with a new, randomly generated one • Repeat until end conditions are met
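The steps above can be sketched as follows, assuming randomly chosen parents, averaging crossover, and a mutation that redraws a parameter uniformly within its bounds. The population size, removal count, mutation rate and toy likelihood are illustrative values, and the function names are hypothetical.

```python
import random

def genetic_search(L, bounds, pop_size=30, n_generations=200,
                   n_remove=10, mutation_rate=0.1, seed=0):
    """Keep the best individuals and refill the population with children."""
    rng = random.Random(seed)

    def random_individual():
        return [rng.uniform(lo, hi) for lo, hi in bounds]

    population = [random_individual() for _ in range(pop_size)]
    for _generation in range(n_generations):
        # Rank by likelihood and remove the worst individuals.
        population.sort(key=L, reverse=True)
        survivors = population[:pop_size - n_remove]
        children = []
        for _child in range(n_remove):
            mother, father = rng.sample(survivors, 2)   # random parents
            # Crossover: average the two parents' parameters.
            child = [(m + f) / 2.0 for m, f in zip(mother, father)]
            # Mutation: occasionally replace a parameter with a random one.
            for i, (lo, hi) in enumerate(bounds):
                if rng.random() < mutation_rate:
                    child[i] = rng.uniform(lo, hi)
            children.append(child)
        population = survivors + children
    return max(population, key=L)

def toy_likelihood(Q):
    # Hypothetical surface with a single maximum at (1, -2).
    return -((Q[0] - 1.0) ** 2) - (Q[1] + 2.0) ** 2

best = genetic_search(toy_likelihood, [(-5.0, 5.0), (-5.0, 5.0)])
```

Because the best individuals always survive, the best likelihood in the population never gets worse from one generation to the next. The end condition here is simply a fixed number of generations.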

Differential Evolution • An individual moves according to the weighted difference between the locations of two “parent” individuals • If the new position has a worse likelihood, then the individual does not move • Parents may be random or chosen from the population best • Also, multiple pairs of parents may be used (averaging over the differences)
Figure: change in position (center is global best; labels: Difference Vector, No Change)
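A minimal sketch of the move rule, assuming one pair of randomly chosen parents per move and a fixed weight of 0.7. Both are illustrative choices, and full differential evolution implementations usually add a crossover step that is omitted here. The function names are hypothetical.

```python
import random

def differential_evolution(L, bounds, pop_size=20, n_generations=200,
                           weight=0.7, seed=0):
    """Move each individual by a weighted difference of two random parents."""
    rng = random.Random(seed)
    population = [[rng.uniform(lo, hi) for lo, hi in bounds]
                  for _ in range(pop_size)]
    scores = [L(ind) for ind in population]
    for _ in range(n_generations):
        for i in range(pop_size):
            others = [j for j in range(pop_size) if j != i]
            a, b = rng.sample(others, 2)
            # Trial position: current position plus the weighted difference vector.
            trial = [x + weight * (pa - pb) for x, pa, pb in
                     zip(population[i], population[a], population[b])]
            trial_score = L(trial)
            if trial_score > scores[i]:   # a worse position means no move
                population[i], scores[i] = trial, trial_score
    best = max(range(pop_size), key=lambda k: scores[k])
    return population[best], scores[best]

def toy_likelihood(Q):
    # Hypothetical surface with a single maximum at (1, -2).
    return -((Q[0] - 1.0) ** 2) - (Q[1] + 2.0) ** 2

best, best_score = differential_evolution(
    toy_likelihood, [(-5.0, 5.0), (-5.0, 5.0)])
```

The step length scales with the spread of the population, so moves shrink automatically as the individuals gather around a peak.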

Particle-Swarm Optimization • Physically intuitive: based on animal behavior • Particles have velocities • “Forces” towards personal best, global best • Position (x) change at step t: w, c1, c2 are weighting parameters, p is personal best, g is global best, rand() is a random number
Figure: Parameter Space (labels: particle, velocity, to personal best, to global best, Personal best, Global best)
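The update equation itself is not reproduced in this text. The sketch below assumes the commonly used form, which matches the symbols listed on the slide: v(t+1) = w v(t) + c1 rand() (p - x(t)) + c2 rand() (g - x(t)), followed by x(t+1) = x(t) + v(t+1). The values w = 0.7 and c1 = c2 = 1.5 are illustrative, and the function names are hypothetical.

```python
import random

def particle_swarm(L, bounds, n_particles=20, n_steps=200,
                   w=0.7, c1=1.5, c2=1.5, seed=0):
    """Particle swarm search using the assumed standard velocity update."""
    rng = random.Random(seed)
    dims = range(len(bounds))
    x = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(n_particles)]
    v = [[0.0 for _ in bounds] for _ in range(n_particles)]
    p = [list(xi) for xi in x]                 # personal best positions
    p_score = [L(xi) for xi in x]
    g_index = max(range(n_particles), key=lambda k: p_score[k])
    g, g_score = list(p[g_index]), p_score[g_index]   # global best
    for _ in range(n_steps):
        for k in range(n_particles):
            for d in dims:
                # Inertia plus "forces" towards personal and global best.
                v[k][d] = (w * v[k][d]
                           + c1 * rng.random() * (p[k][d] - x[k][d])
                           + c2 * rng.random() * (g[d] - x[k][d]))
                x[k][d] += v[k][d]
            score = L(x[k])
            if score > p_score[k]:
                p[k], p_score[k] = list(x[k]), score
                if score > g_score:
                    g, g_score = list(x[k]), score
    return g, g_score

def toy_likelihood(Q):
    # Hypothetical surface with a single maximum at (1, -2).
    return -((Q[0] - 1.0) ** 2) - (Q[1] + 2.0) ** 2

g_best, g_best_score = particle_swarm(
    toy_likelihood, [(-5.0, 5.0), (-5.0, 5.0)])
```

The inertia weight w controls how much of the previous velocity a particle keeps; values below 1 damp the motion so the swarm can settle around the global best.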

BOINC Berkeley Open Infrastructure for Network Computing • Users volunteer spare processor / graphics card time to the project • Massively parallel • Graphics processor technology has created a large increase in processing power • Milkyway@home is now the #2 ranked BOINC project • You can help, too: http://milkyway.cs.rpi.edu/milkyway/

Separation: Stripe 82 (figure panels: Sgr Stream Stars; Non-Sgr Stream Stars)

Conclusions: • Modern astronomy produces large data sets • The Maximum Likelihood method is ideal for analyzing this data • Powerful computer algorithms exist to perform MLE • Mapping the Sagittarius Stream is possible by using these methods

Credits: • The Sloan Digital Sky Survey • BOINC.com • Milkyway@home • Prof. Heidi Newberg, Rensselaer Polytechnic Institute • Nathan Cole, “Maximum Likelihood Fitting of Tidal Streams with Applications to the Sagittarius Dwarf Tidal Tails” (PhD Thesis, Rensselaer Polytechnic Institute, 2008) • Travis Desell, “Aysnchronous [sic] Global Optimization for Massively Distributed Computing” (PhD candidacy document, 2009) • Shakespeare, et al., “Hamlet”

3 stream search:
