Introduction to Conditional Random Fields

John Osborne, Sept 4, 2009

Overview

  • Useful Definitions
  • Background
    • HMM
    • MEMM
  • Conditional Random Fields
    • Statistical and Graph Definitions
  • Computation (Training and Inference)
  • Extensions
    • Bayesian Conditional Random Fields
    • Hierarchical Conditional Random Fields
    • Semi-CRFs
  • Future Directions
Useful Definitions
  • Random Field (wikipedia)
    • In probability theory, let $S = \{X_1, \ldots, X_n\}$, with each $X_i \in \{0, 1, \ldots, G-1\}$, be a set of random variables on the sample space $\Omega = \{0, 1, \ldots, G-1\}^n$. A probability measure $\pi$ is a random field if, for all $\omega \in \Omega$, $\pi(\omega) > 0$.
  • Markov Process (a Markov chain when the time steps or states are discrete)
    • Stochastic process with Markov property
  • Markov Property
    • The probability that a random variable assumes a value depends on the other random variables only through the ones that are its immediate neighbors
    • “memoryless”
  • Hidden Markov Model (HMM)
    • Markov Model where the current state is unobserved
  • Viterbi Algorithm
    • Dynamic programming technique to discover the most likely sequence of hidden states that explains the observed sequence in an HMM
    • Used to determine labels
  • Potential Function == Feature Function
    • In a CRF the potential function scores the compatibility of $y_t$, $y_{t-1}$ and $w_t(X)$
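
The Viterbi definition above can be illustrated with a minimal sketch for a first-order HMM. The state names, observation names, and probability tables below are a hypothetical toy model chosen for illustration; they do not come from the slides.

```python
def viterbi(observations, states, start_p, trans_p, emit_p):
    """Return (best_path, best_prob) for a first-order HMM."""
    # best[t][s]: probability of the most likely state path ending in s at time t
    best = [{s: start_p[s] * emit_p[s][observations[0]] for s in states}]
    back = [{}]  # back[t][s]: predecessor of s on that best path
    for t in range(1, len(observations)):
        best.append({})
        back.append({})
        for s in states:
            prev, score = max(
                ((r, best[t - 1][r] * trans_p[r][s]) for r in states),
                key=lambda pair: pair[1],
            )
            best[t][s] = score * emit_p[s][observations[t]]
            back[t][s] = prev
    # Follow the back-pointers from the best final state
    last = max(states, key=lambda s: best[-1][s])
    path = [last]
    for t in range(len(observations) - 1, 0, -1):
        path.append(back[t][path[-1]])
    path.reverse()
    return path, best[-1][last]


# Hypothetical toy model (all numbers are illustrative assumptions)
states = ("Healthy", "Fever")
start_p = {"Healthy": 0.6, "Fever": 0.4}
trans_p = {
    "Healthy": {"Healthy": 0.7, "Fever": 0.3},
    "Fever": {"Healthy": 0.4, "Fever": 0.6},
}
emit_p = {
    "Healthy": {"normal": 0.5, "cold": 0.4, "dizzy": 0.1},
    "Fever": {"normal": 0.1, "cold": 0.3, "dizzy": 0.6},
}
path, prob = viterbi(("normal", "cold", "dizzy"), states, start_p, trans_p, emit_p)
```

The returned `path` is the label sequence assigned to the observations. Working in log probabilities instead of raw products would be the usual choice for long sequences, to avoid underflow.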
Background
  • Interest in CRFs arose from Richa’s work with gene expression
  • Current literature shows them performing better on NLP tasks than other commonly used NLP approaches such as Support Vector Machines (SVMs), neural networks, and HMMs
    • Term coined by Lafferty et al. in 2001
  • Predecessors were HMMs and maximum entropy Markov models (MEMMs)
HMM
  • Definition
    • Markov model where the current state is unobserved
  • Generative model
  • Examining all of the input X would be prohibitive, hence the Markov property: only the current element in the sequence is considered
  • Cannot represent multiple interacting features or long-range dependencies
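
For comparison with the conditional models that follow, the joint distribution of a first-order HMM can be written out in its standard form. The symbol $x_t$ for the observation at position $t$ is notation introduced here, and $P(y_1 \mid y_0)$ is read as the initial state distribution.

```latex
P(X, Y) = \prod_{t=1}^{T} P(y_t \mid y_{t-1}) \, P(x_t \mid y_t)
```

Each observation $x_t$ depends only on the current state $y_t$, which is why features that span several observations do not fit this factorization.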
MEMM
  • McCallum et al., 2000
  • Non-generative finite-state model based on a next-state classifier
  • Directed graph
  • $P(Y \mid X) = \prod_t P(y_t \mid y_{t-1}, w_t(X))$, where $w_t(X)$ is a sliding window over the X sequence
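
The MEMM factorization can be sketched as a product of locally normalized next-state classifiers. The label set, the `score` function, and the one-word window below are hypothetical simplifications for illustration, not the features used by McCallum et al.

```python
import itertools
import math

LABELS = ["N", "V"]  # hypothetical label set


def score(y_prev, y_t, window):
    """Hypothetical feature scores for the next-state classifier."""
    s = 0.0
    if window.endswith("s") and y_t == "V":
        s += 1.0
    if y_prev == "N" and y_t == "V":
        s += 0.5
    return s


def next_state_prob(y_t, y_prev, window):
    """P(y_t | y_prev, w_t(X)), normalized over the labels at this position only."""
    z = sum(math.exp(score(y_prev, cand, window)) for cand in LABELS)
    return math.exp(score(y_prev, y_t, window)) / z


def memm_prob(labels, windows, start="<s>"):
    """P(Y | X) as the product of the per-position next-state probabilities."""
    prob, prev = 1.0, start
    for y, w in zip(labels, windows):
        prob *= next_state_prob(y, prev, w)
        prev = y
    return prob


# Here w_t(X) is simplified to the single word at position t (an assumption)
windows = ["dog", "barks"]
total = sum(
    memm_prob(ys, windows)
    for ys in itertools.product(LABELS, repeat=len(windows))
)
```

Because every factor is normalized on its own, the probabilities of all label sequences sum to 1 without a global normalizer.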
Label Bias Problem
  • Transitions leaving a given state compete only against each other, rather than against all other transitions in the model
  • Implies “conservation of score mass” (Bottou, 1991)
  • Observations can be ignored, Viterbi decoding can’t downgrade a branch
  • CRF will solve this problem by having a single exponential model for the joint probability of the ENTIRE SEQUENCE OF LABELS given the observation sequence
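
A small numeric sketch may make the label bias effect concrete. The state graph and scores below are a hypothetical toy model: each of the states `a1` and `b1` has a single outgoing transition, so per-state normalization gives that transition probability 1 whatever the observation is.

```python
import math

# Hypothetical toy model: the successors allowed from each state
allowed = {"start": ["a1", "b1"], "a1": ["a2"], "b1": ["b2"]}


def raw_score(prev, nxt, obs):
    """Unnormalized score: the 'a' branch fits observation 'i', the 'b' branch fits 'o'."""
    match = (nxt.startswith("a") and obs == "i") or (nxt.startswith("b") and obs == "o")
    return 2.0 if match else -2.0


def local_prob(prev, nxt, obs):
    """Per-state normalization: transitions compete only with others leaving `prev`."""
    z = sum(math.exp(raw_score(prev, cand, obs)) for cand in allowed[prev])
    return math.exp(raw_score(prev, nxt, obs)) / z


p_match = local_prob("a1", "a2", "i")     # observation agrees with the branch
p_mismatch = local_prob("a1", "a2", "o")  # observation disagrees with the branch
```

Both values come out as 1: the low raw score for the mismatching observation is normalized away, so the observation at that step cannot lower the score of the branch.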
Big Picture Definition
  • Wikipedia Definition (Aug 2009)
    • A conditional random field (CRF) is a type of discriminative probabilistic model most often used for the labeling or parsing of sequential data, such as natural language text or biological sequences.
  • Probabilistic model is a statistical model, in math terms “a pair (Y,P) where Y is the set of possible observations and P the set of possible probability distributions on Y”
    • In statistics terms this means the objective is to infer (or pick) the distinct element (probability distribution) in the set “P” given your observation Y
  • Discriminative model meaning it models the conditional probability distribution P(y|x) which can predict y given x.
    • It cannot go the other way around (produce x from y) since it is not a generative model (one capable of generating sample data given a model): it does not model a joint probability distribution
    • Similar to other discriminative models like support vector machines and neural networks
  • When analyzing sequential data a conditional model specifies the probabilities of possible label sequences given an observation sequence
CRF Graphical Definition

Definition from Lafferty

CRF Undirected Graph

  • Undirected graphical model
  • Let $G = (V, E)$ be a graph such that $Y = (Y_v)_{v \in V}$, so that $Y$ is indexed by the vertices of $G$. Then $(X, Y)$ is a conditional random field in case, when conditioned on $X$, the random variables $Y_v$ obey the Markov property with respect to the graph: $p(Y_v \mid X, Y_w, w \neq v) = p(Y_v \mid X, Y_w, w \sim v)$, where $w \sim v$ means that $w$ and $v$ are neighbors in $G$
Computation of CRF
  • Training
    • Conditioning
    • Calculation of Feature Function
    • $P(Y \mid X) = \frac{1}{Z(X)} \exp\left(\sum_t \psi(y_t, y_{t-1}, w_t(X))\right)$
      • $Z(X)$ is the normalizing factor
      • The potential function $\psi$ is the term in parentheses
  • Inference
    • Viterbi Decoding
    • Approximate Model Averaging
    • Others?
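
The globally normalized form of $P(Y \mid X)$ can be sketched for a linear-chain CRF as follows. The potential `psi`, the label set, and the one-word window are hypothetical choices for illustration. The normalizer $Z(X)$ is computed with the forward algorithm in log space, which is one common approach rather than something the slides specify.

```python
import itertools
import math

LABELS = ["N", "V"]  # hypothetical label set


def logsumexp(values):
    """Numerically stable log(sum(exp(v)))."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def psi(y_t, y_prev, window):
    """Hypothetical potential scoring the compatibility of y_t, y_prev and w_t(X)."""
    s = 0.0
    if window.endswith("s") and y_t == "V":
        s += 1.0
    if y_prev == "N" and y_t == "V":
        s += 0.5
    if y_prev == y_t:
        s -= 0.3
    return s


def sequence_score(labels, windows, start="<s>"):
    """Sum over t of psi(y_t, y_{t-1}, w_t(X)) for one label sequence."""
    prev, total = start, 0.0
    for y, w in zip(labels, windows):
        total += psi(y, prev, w)
        prev = y
    return total


def log_partition(windows, start="<s>"):
    """log Z(X) via the forward recursion over positions."""
    alpha = {y: psi(y, start, windows[0]) for y in LABELS}
    for w in windows[1:]:
        alpha = {
            y: logsumexp([alpha[r] + psi(y, r, w) for r in LABELS])
            for y in LABELS
        }
    return logsumexp(list(alpha.values()))


def crf_prob(labels, windows):
    """P(Y | X) = exp(score(Y, X)) / Z(X)."""
    return math.exp(sequence_score(labels, windows) - log_partition(windows))


# Here w_t(X) is simplified to the single word at position t (an assumption)
windows = ["dog", "barks", "loudly"]
total = sum(
    crf_prob(ys, windows)
    for ys in itertools.product(LABELS, repeat=len(windows))
)
```

Unlike the per-state normalization of an MEMM, the single normalizer $Z(X)$ is shared by all label sequences, so a low score at one position lowers the probability of the whole sequence.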
Training Approaches
  • CRF is supervised learning so can train using
    • Maximum Likelihood (original paper)
      • Used iterative scaling method, was very slow
    • Gradient Ascent
      • Also slow when naïve
    • Mallet implementation used the BFGS algorithm
      • Broyden-Fletcher-Goldfarb-Shanno
      • Approximate 2nd order algorithm
    • Stochastic Gradient Method (2006) accelerated via Stochastic Meta Descent
    • Gradient Tree Boosting (variant of a 2001
      • Potential functions are sums of regression trees
        • Decision trees using real values
      • Published 2008
      • Competitive with Mallet
    • Bayesian (estimate posterior probability)
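
The gradient-based methods in this list all need the gradient of the conditional log-likelihood. As a sketch, assume the potential is a weighted sum of feature functions, $\psi = \sum_k \lambda_k f_k$, with training pairs $(X^{(i)}, Y^{(i)})$. The weights $\lambda_k$, features $f_k$, and superscript notation are assumptions introduced here, not taken from the slides.

```latex
\ell(\lambda) = \sum_i \Big[ \sum_t \sum_k \lambda_k f_k\big(y^{(i)}_t, y^{(i)}_{t-1}, w_t(X^{(i)})\big) - \log Z(X^{(i)}) \Big]

\frac{\partial \ell}{\partial \lambda_k}
  = \sum_i \sum_t f_k\big(y^{(i)}_t, y^{(i)}_{t-1}, w_t(X^{(i)})\big)
  - \sum_i \sum_t \mathbb{E}_{P(Y \mid X^{(i)})}\big[ f_k\big(y_t, y_{t-1}, w_t(X^{(i)})\big) \big]
```

The gradient is the observed feature count minus the count expected under the current model, so it vanishes when the two agree. Gradient ascent, BFGS, and stochastic gradient methods differ in how they use this quantity to update $\lambda$.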
Conditional Random Field Extensions: Semi-CRF
  • Semi-CRF
    • Instead of assigning labels to each member of sequence, labels are assigned to sub-sequences
    • Advantage – “features for semi-CRF can measure properties of segments, and transition within a segment can be non-Markovian”
Bayesian CRF
  • Qi et al. (2005)
  • Replacement for ML method of Lafferty
  • Reducing over-fitting
  • “Power EP Method”
Hierarchical CRF (HCRF)
  • GPS motion, for surveillance, tracking, dividing people’s workday into labels of work, travel, sleep, etc.
  • Less work
Future Directions
  • Less work on conditional random fields in biology
    • PubMed hits
      • Conditional Random Field - 21
      • Conditional Random Fields - 43
    • CRF variants & promoter/regulatory element show no hits
  • CRF and ontology show no hits
  • Plan
    • Implement CRF in Java, apply to biology problems, try to find ways to extend?
Useful Papers
  • Link to original paper and review paper
    • Review paper:
  • Another review
  • Review slides
  • The boosting paper has a nice review