## Introduction to Conditional Random Fields

Presentation Transcript

Overview

- Useful Definitions
- Background
- HMM
- MEMM
- Conditional Random Fields
- Statistical and Graph Definitions
- Computation (Training and Inference)
- Extensions
- Bayesian Conditional Random Fields
- Hierarchical Conditional Random Fields
- Semi-CRFs
- Future Directions

Useful Definitions

- Random Field (wikipedia)
- In probability theory, let S = {X1, ..., Xn}, with each Xi taking values in {0, 1, ..., G − 1}, be a set of random variables on the sample space Ω = {0, 1, ..., G − 1}^n. A probability measure π is a random field if π(ω) > 0 for all ω in Ω.
- Markov Process (chain if finite sequence)
- Stochastic process with Markov property
- Markov Property
- The probability that a random variable assumes a value depends on the other random variables only through the ones that are its immediate neighbors
- “memoryless”
- Hidden Markov Model (HMM)
- Markov Model where the current state is unobserved
- Viterbi Algorithm
- Dynamic programming technique to recover the most likely sequence of hidden states that explains the sequence of observations in an HMM
- Determine labels
- Potential Function == Feature Function
- In a CRF the potential function scores the compatibility of yt, yt-1, and wt(X)
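The Viterbi algorithm defined above can be sketched in a few lines of Python. The transition and emission tables below are made-up toy values, not from the slides:

```python
import numpy as np

# Toy HMM: 2 hidden states, 3 possible observation symbols.
start = np.array([0.6, 0.4])                 # P(state at t=0)
trans = np.array([[0.7, 0.3],                # P(next state | current state)
                  [0.4, 0.6]])
emit = np.array([[0.5, 0.4, 0.1],            # P(observation | state)
                 [0.1, 0.3, 0.6]])

def viterbi(obs):
    """Most likely hidden-state sequence for an observation sequence."""
    T, S = len(obs), len(start)
    delta = np.zeros((T, S))                 # best log-score ending in each state
    back = np.zeros((T, S), dtype=int)       # backpointers for path recovery
    delta[0] = np.log(start) + np.log(emit[:, obs[0]])
    for t in range(1, T):
        for s in range(S):
            scores = delta[t - 1] + np.log(trans[:, s])
            back[t, s] = np.argmax(scores)
            delta[t, s] = scores[back[t, s]] + np.log(emit[s, obs[t]])
    # Trace the best path backwards from the highest-scoring final state.
    path = [int(np.argmax(delta[-1]))]
    for t in range(T - 1, 0, -1):
        path.append(back[t, path[-1]])
    return [int(p) for p in path[::-1]]

print(viterbi([0, 1, 2]))   # → [0, 0, 1]
```

Working in log space avoids underflow on longer sequences; the same dynamic program (with potentials in place of transition/emission probabilities) performs CRF decoding.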

Background

- Interest in CRFs arose from Richa’s work with gene expression
- Current literature shows them performing better on NLP tasks than other commonly used approaches such as support vector machines (SVMs), neural networks, and HMMs
- Term coined by Lafferty et al. in 2001
- Predecessors were HMMs and maximum entropy Markov models (MEMMs)

HMM

- Definition
- Markov Model where the current state is unobserved
- Generative Model
- Examining all of the input X would be prohibitive, hence the Markov property: only the current element in the sequence is considered
- No multiple interacting features or long-range dependencies

MEMMs

- McCallum et al, 2000
- Non-generative finite-state model based on next-state classifier
- Directed graph
- P(Y|X) = ∏t P(yt | yt-1, wt(X)), where wt(X) is a sliding window over the X sequence
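A minimal sketch of this factorization, with invented weights and a window-average feature map standing in for wt(X); each step is a locally normalized next-state classifier:

```python
import numpy as np

def softmax(z):
    """Numerically stable softmax."""
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

def memm_prob(Y, X, W, window=1):
    """P(Y|X) = prod_t P(y_t | y_{t-1}, w_t(X)).

    X: (T, n_feats) observation features; W: (n_states, n_states, n_feats)
    weights, W[prev] scoring each candidate next state. Toy setup, not
    from the slides.
    """
    prob = 1.0
    prev = 0                                    # assumed start state
    for t, y in enumerate(Y):
        lo, hi = max(0, t - window), t + window + 1
        feats = np.mean(X[lo:hi], axis=0)       # w_t(X): sliding-window average
        scores = W[prev] @ feats                # one score per next state
        prob *= softmax(scores)[y]              # local (per-step) normalization
        prev = y
    return prob
```

Because each factor is normalized on its own, the per-state transition scores compete only locally — which is exactly what gives rise to the label bias problem discussed next.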

Label Bias Problem

- Transitions leaving a given state compete only against each other, rather than against all other transitions in the model
- Implies “conservation of score mass” (Bottou, 1991)
- Observations can be ignored, Viterbi decoding can’t downgrade a branch

- CRF will solve this problem by having a single exponential model for the joint probability of the ENTIRE SEQUENCE OF LABELS given the observation sequence
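The contrast can be made concrete with a toy globally normalized model, where Z(X) sums the exponentiated score of every possible label sequence (brute-force enumeration, fine for tiny chains; the potentials are illustrative):

```python
import itertools
import numpy as np

def score(Y, psi):
    """Total score of a label sequence: sum of potentials psi[t][y_prev][y]."""
    s, prev = 0.0, 0                    # assumed start label 0
    for t, y in enumerate(Y):
        s += psi[t][prev][y]
        prev = y
    return s

def crf_prob(Y, psi, n_labels=2):
    """P(Y|X) = exp(score(Y)) / Z, with Z over ALL label sequences."""
    T = len(psi)
    Z = sum(np.exp(score(seq, psi))
            for seq in itertools.product(range(n_labels), repeat=T))
    return np.exp(score(Y, psi)) / Z
```

Since every label sequence competes against every other inside Z, no state can “conserve” score mass regardless of the observations. Real implementations compute Z with the forward algorithm rather than enumeration.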

Big Picture Definition

- Wikipedia Definition (Aug 2009)
- A conditional random field (CRF) is a type of discriminative probabilistic model most often used for the labeling or parsing of sequential data, such as natural language text or biological sequences.
- Probabilistic model is a statistical model, in math terms “a pair (Y,P) where Y is the set of possible observations and P the set of possible probability distributions on Y”
- In statistics terms this means the objective is to infer (or pick) the distinct element (probability distribution) in the set “P” given your observation Y
- Discriminative model meaning it models the conditional probability distribution P(y|x) which can predict y given x.
- It cannot do the reverse (produce x from y), since it is not a generative model (one capable of generating sample data given a model): it does not model a joint probability distribution
- Similar to other discriminative models like support vector machines and neural networks
- When analyzing sequential data a conditional model specifies the probabilities of possible label sequences given an observation sequence

CRF Graphical Definition

Definition from Lafferty

CRF Undirected Graph

- Undirected graphical model
- Let G = (V,E) be a graph such that Y = (Yv), v ∈ V, so that Y is indexed by the vertices of G. Then (X,Y) is a conditional random field when, conditioned on X, the random variables Yv obey the Markov property with respect to the graph: p(Yv | X, Yw, w ≠ v) = p(Yv | X, Yw, w ∼ v), where w ∼ v means that w and v are neighbors in G

Computation of CRF

- Training
- Conditioning
- Calculation of Feature Function
- P(Y|X) = (1/Z(X)) exp ∑t ψ(yt, yt-1, wt(X))
- Z(X) is a normalizing factor
- Potential function ψ in parentheses
- Inference
- Viterbi Decoding
- Approximate Model Averaging
- Others?

Training Approaches

- CRF is supervised learning so can train using
- Maximum Likelihood (original paper)
- Used iterative scaling method, was very slow
- Gradient Ascent
- Also slow when naïve
- Mallet Implementation used BFGS algorithm
- http://en.wikipedia.org/wiki/BFGS
- Broyden-Fletcher-Goldfarb-Shanno
- Approximate 2nd order algorithm
- Stochastic Gradient Method (2006) accelerated via Stochastic Meta Descent
- Gradient Tree Boosting (a variant of gradient boosting, Friedman 2001)
- http://jmlr.csail.mit.edu/papers/volume9/dietterich08a/dietterich08a.pdf
- Potential functions are sums of regression trees
- Decision trees using real values
- Published 2008
- Competitive with Mallet
- Bayesian (estimate posterior probability)
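A toy maximum-likelihood training loop illustrating the common thread behind these approaches: the log-likelihood gradient is the empirical feature counts minus the model-expected feature counts. Everything here is a sketch — features are just transition indicators and expectations come from brute-force enumeration, whereas real implementations (e.g. Mallet) use forward-backward plus BFGS or similar:

```python
import itertools
import numpy as np

n_labels, T = 2, 3

def transition_counts(Y):
    """Feature vector: how often each (prev, next) label transition fires."""
    c = np.zeros((n_labels, n_labels))
    for a, b in zip(Y, Y[1:]):
        c[a, b] += 1
    return c

def log_score(Y, W):
    """Unnormalized log-score: sum of transition weights along the sequence."""
    return sum(W[a, b] for a, b in zip(Y, Y[1:]))

def train(Y_obs, steps=500, lr=0.5):
    """Gradient ascent on log P(Y_obs): empirical minus expected counts."""
    W = np.zeros((n_labels, n_labels))
    seqs = list(itertools.product(range(n_labels), repeat=T))
    for _ in range(steps):
        probs = np.array([np.exp(log_score(s, W)) for s in seqs])
        probs /= probs.sum()                       # brute-force Z
        expected = sum(p * transition_counts(s) for p, s in zip(probs, seqs))
        W += lr * (transition_counts(Y_obs) - expected)
    return W

W = train((0, 0, 1))
```

The log-likelihood is concave in W, which is why second-order methods like BFGS converge much faster than this naive loop.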

Conditional Random Field Extensions: Semi-CRF

- Semi-CRF
- Instead of assigning labels to each member of sequence, labels are assigned to sub-sequences
- Advantage – “features for semi-CRF can measure properties of segments, and transition within a segment can be non-Markovian”
- http://www.cs.cmu.edu/~wcohen/postscript/semiCRF.pdf

Bayesian CRF

- Qi et al, (2005)
- http://www.cs.purdue.edu/homes/alanqi/papers/Qi-Bayesian-CRF-AIstat05.pdf
- Replacement for ML method of Lafferty
- Reducing over-fitting
- “Power EP Method”

Hierarchical CRF (HCRF)

- http://www.springerlink.com/content/r84055k2754464v5/
- http://www.cs.washington.edu/homes/fox/postscripts/places-isrr-05.pdf
- Applied to GPS motion data: surveillance, tracking, dividing a person’s workday into labels of work, travel, sleep, etc.
- Less work has been done in this area

Future Directions

- Less work on conditional random fields in biology
- PubMed hits
- Conditional Random Field - 21
- Conditional Random Fields - 43
- CRF variants & promoter/regulatory element show no hits
- CRF and ontology show no hits
- Plan
- Implement CRF in Java, apply to biology problems, try to find ways to extend?

Useful Papers

- Link to original paper and review paper
- http://www.inference.phy.cam.ac.uk/hmw26/crf/
- Review paper:
- http://www.inference.phy.cam.ac.uk/hmw26/papers/crf_intro.pdf
- Another review
- http://www.cs.umass.edu/~mccallum/papers/crf-tutorial.pdf
- Review slides
- http://www.cs.pitt.edu/~mrotaru/comp/nlp/Random%20Fields/Tutorial%20CRF%20Lafferty.pdf
- The boosting paper has a nice review
- http://jmlr.csail.mit.edu/papers/volume9/dietterich08a/dietterich08a.pdf
