Maximum Entropy

Maximum Entropy RESM 575 Spring 2010 Lecture 6

Maximum entropy (Phillips et al. 2008) • History • E. T. Janes 1957 • Thermodynamics • Inference and information theory

The Maximum Entropy Method • Origins: Jaynes 1957, statistical mechanics • Recent use: machine learning, eg. automatic language translation • To estimate an unknown distribution: • Determine what you know (constraints) • Among distributions satisfying constraints: • Output the one with maximum entropy

What is it? • Maxent is a general-purpose method for making predictions or inferences from incomplete information. • Its origins lie in statistical mechanics (Jaynes, 1957), and it remains an active area of research with an Annual Conference, Maximum Entropy and Bayesian Methods, that explores applications in diverse areas such as • astronomy, portfolio optimization, image reconstruction, statistical physics and signal processing.

Like other Bayesian models… • Uses prior information • Maxent is an alternative to methods of inference of classical statistics

Maximum Entropy Principle The fact that a certain probability distribution maximizes entropy subject to certain constraints representing our incomplete information, is the fundamental property which justifies the use of that distribution for inference; it agrees with everything that is known but carefully avoids assuming anything that is not known (Jaynes, 1990).

Why? • Introduced as a general approach for presence only modeling of species distributions, suitable for all existing applications involving presence-only datasets.

Predicted potential distribution Modeling species distributions Yellow-throated Vireo occurrence points … environmental variables

Estimating a probability distribution Given: • Map divided into cells • Environmental variables, with values in each cell • Occurrence points: samples from an unknown distribution Our task is to estimate the unknown probability distribution Note: • The distribution sums to 1 over the whole map • Most probability values will be very small • Different from estimating probability of presence

Entropy • More entropy : more spread out, closer to uniform distribution • 2nd law of thermodynamics: • Without external influences, a system moves to increase entropy • Maximum entropy method: • Apply constraints to remove external influences • Species spreads out to fill areas with suitable conditions

Using Maxent for Species Distributions “Features” “Constraints” “Regularization”

precipitation sample average temperature find distribution p such that for all features f: mean(f) = sample average of f Features impose constraints Feature = environmental variable, or function thereof find distribution p of maximum entropy such that for all features f: mean(f) = sample average of f

Features • Environmental variables or functions thereof. • Maxent has these classes of features (others are possible): • Linear … variable itself • Quadratic … square of variable • Product … product of two variables • Binary (indicator) … membership in a category • Threshold … • Hinge … 1 0 Environmental variable 1 0 Environmental variable

Constraints Each feature type imposes constraints on output distribution Linear features … mean Quadratic features … variance Product features … covariance Threshold features … proportion above threshold Hinge features … mean above threshold Binary features (categorical) … proportion in each category

confidence region Regularization precipitation sample average true mean temperature find distribution p of maximum entropy such that Mean(f) in confidence region of sample average of f

The Maxent distribution … is always a Gibbs distribution: qλ(x) = exp(Σjλjfj(x)) / Z Z is a scaling factor so distribution sums to 1 fj is the j’th feature λj is a coefficient, calculated by the program

Maxent is penalized maximum likelihood Log likelihood: LogLikelihood(qλ) = 1/m Σi ln(qλ(xi)) where x1 … xm are the occurrence points. Maxent maximizes regularized likelihood: LogLikelihood(qλ)- Σj βj|λj| where βj is the width of the confidence interval for fj Similar to Akaike Information Criterion (AIC).

Output • When Maxent is applied to presence-only species distribution modeling, the pixels of the study area make up the space on which the Maxent probability distribution is defined, • Pixels with known species occurrence records constitute the sample points, and the features are • climatic variables, • elevation, • soil category, • vegetation type or other environmental variables, and functions thereof.

To note • Sometimes both presence and absence occurrence data are available for the development of models, in which case general-purpose statistical methods can be used (for an overview of the variety of techniques currently in use, see Corsi et al., 2000; Elith, 2002; Guisan and Zimmerman, 2000; Scott et al., 2002).

Opportunity • However, while vast stores of presence-only data exist, (records etc.) absence data are rarely available, • Poorly sampled areas, remote, difficult… • Absence data may be of questionable value in many situations

Background • 16 modeling methods • 226 well surveyed species in 6 regions of the world

The authors used three statistics, the area under the Receiver Operating Characteristic curve (AUC), correlation (COR) and Kappa, to assess the agreement between the presence-absence records and the predictions.

Maximum Entropy • Only useful when applied to testable information. (whether a given distribution is consistent with it) • Given testable information, the maximum entropy procedure consists of seeking the probability distribution which maximizes information entropy, subject to the constraints of the information. • This constrained optimization problem is typically solved using the method of Lagrange multipliers.

Output format Raw output Cumulative output

Cumulative output format • Gives estimate of omission rate • A pixel p has cumulative value c: • Total probability of pixels with lower probability than p is c% • Set a threshold of c: • Binary model with presence if cumulative value ≥ c • Omission rate is c% if test data drawn from Maxent distribution • Predict omission rate of c% for real test data • Example thresholds: • 5% (light red) • 20% (dark red)

Logistic output format • Estimates probability of presence • Between 0 and 1 • Scaled so that a “typical” presence has value 0.5 • Defined as: c qλ(x) / (1 + c qλ(x)) where c = exp(H(qλ(x)) • Probability of presence depends on sampling details: • Site size • Observation time • These details should correspond to collection effort for occurrence points

Response curves • Show how predicted probability of presence depends on each variable • Simple features → simpler model • Easier interpretation • Complex features → complex model • Better fit to data • Linear + quadratic (top) • Threshold features (middle) • All feature types (bottom)

Effect of regularization: multiplier = 0.2 Smaller confidence Intervals Lower entropy Less spread-out

Effect of regularization: multiplier = 5 Larger confidence Intervals Higher entropy More spread-out

Effect of regularization: over-fitting Regularization multiplier = 1.0 (not over-fit) Regularization multiplier = 0.2 (clearly over-fit)

Sage grouse distribution model MAXENT software package Consistently superior to alternative methods Robust to colinearity between explanatory variables Accepts continuous and categorical variables Stable distribution with limited training data Evaluates relative variable importance

West Virginia Conservation Prioritization using Species Distribution Modeling Michael Dougherty West Virginia Division of Natural Resources The Conservation Fund

Project Goals: • Develop statewide conservation prioritization map based on the • distribution of: • Species of Greatest Conservation Need (SGCN) • Habitats of Concern • Existing public land • The Challenge: • Develop distribution models for 500 state-tracked species • Species include: plants, herps, birds, bats, mammals, aquatics • Modeling process must be defensible, transparent, and repeatable

Occurrence data: • 1. State Natural Heritage Program “Biotics” database: • Biologists collect “Source Features” • “Source Features” are grouped into “Element Occurrences” (EOs) • EOs represent known populations • Species identification is accurate and spatial accuracy documented • Use of EOs seems to greatly reduce spatial autocorrelation • 2. Community Ecologists’ Vegetation Plots Database

Predictor Variables: • Developed a broad range of predictor variables: • Climate • Landcover • Terrain • Ecoregions • Geology • Soils • Disturbances

Workflow Overview: • Build an array of workstations to run models • Develop R scripts to automate running the maxent models by iterating through all 500 species • Develop web-based map viewer to assist biologists in reviewing maxent model results • Perform patch and connectivity analysis using FunConn • (TBD) Assign weights to patches and connectors

Scripting Steps: • Developed R script to performed variable pre-selection using boosted regression trees to reduce the number of variables to an appropriate number (~30) • Developed R script to produce the maxent batch files and perform file management • Developed R script to harvest maxent results, a Python script to store grids in an ArcSDE database, and publish results to a website • (TBD) Develop R scripts to perform functional connectivity analysis • (TBD) Perform layer weighting to produce conservation prioritization index

Occurrence localities Csv file format. Each line has: • Species name • X coordinate • Y coordinate Multiple species can be in 1 file. Example: species,longitude,latitude bradypus_variegatus,-65.4,-10.3833 bradypus_variegatus,-65.3833,-10.3833 bradypus_variegatus,-65.1333,-16.8 bradypus_variegatus,-63.6667,-17.45

Environmental variables ESRI ascii raster grid file format. • One file per environmental variable • All files must have exactly the same bounds, cell size • Coordinate system must be same as for occurrence localities Alternative: Diva .grd format. …

Samples with data (SWD) format Environmental data given with samples in a .csv format file Example: species,longitude,latitude,cld,dtr,ecoreg,frs,h_dem,pre,pre_l10,pre_l1,pre_l4,pre_l7,tmn,tmp,tmx,vap bradypus_variegatus,-65.4,-10.3833,76.0,104.0,10.0,2.0,121.0,46.0,41.0,84.0,54.0,3.0,192.0,266.0,337.0,279.0 bradypus_variegatus,-65.3833,-10.3833,76.0,104.0,10.0,2.0,121.0,46.0,40.0,84.0,54.0,3.0,192.0,266.0,337.0,279.0 bradypus_variegatus,-65.1333,-16.8,57.0,114.0,10.0,1.0,211.0,65.0,56.0,129.0,58.0,34.0,140.0,244.0,321.0,221.0 bradypus_variegatus,-63.6667,-17.45,57.0,112.0,10.0,3.0,363.0,36.0,33.0,71.0,27.0,13.0,135.0,229.0,307.0,202.0

Maximum Entropy

Maximum Entropy

Presentation Transcript

A brief maximum entropy tutorial

Maximum Entropy (ME) Maximum Entropy Markov Model (MEMM) Conditional Random Field (CRF )

Maximum Entropy Model (I)

MaxImum entropy

Feature Selection & Maximum Entropy

Maximum Entropy

Maximum Entropy Model (I)

General Database Statistics Using Maximum Entropy

Maximum Entropy Model (II)

Natural Language Learning: MaxImum entropy

Maximum Entropy: Modeling, Decoding, Training

Maximum Entropy Model

Segmentation via Maximum Entropy Model

The Maximum-Entropy Stewpot

Maximum Entropy Discrimination

Maximum Entropy, Maximum Entropy Production and their Application to Physics and Biology

Maximum Entropy Model

MAXIMUM ENTROPY MARKOV MODEL

Maximum Entropy

Maximum Entropy, Maximum Entropy Production and their Application to Physics and Biology

Maximum Entropy Model (II)

Maximum Entropy Discrimination

Maximum Entropy

Maximum Entropy

Presentation Transcript

A brief maximum entropy tutorial

Maximum Entropy (ME) Maximum Entropy Markov Model (MEMM) Conditional Random Field (CRF )

Maximum Entropy Model (I)

MaxImum entropy

Feature Selection &amp; Maximum Entropy

Maximum Entropy

Maximum Entropy Model (I)

General Database Statistics Using Maximum Entropy

Maximum Entropy Model (II)

Natural Language Learning: MaxImum entropy

Maximum Entropy: Modeling, Decoding, Training

Maximum Entropy Model

Segmentation via Maximum Entropy Model

The Maximum-Entropy Stewpot

Maximum Entropy Discrimination

Maximum Entropy, Maximum Entropy Production and their Application to Physics and Biology

Maximum Entropy Model

MAXIMUM ENTROPY MARKOV MODEL

Maximum Entropy

Maximum Entropy, Maximum Entropy Production and their Application to Physics and Biology

Maximum Entropy Model (II)

Maximum Entropy Discrimination

Feature Selection & Maximum Entropy