Learning Based Java for Rapid Development of NLP Systems
1 / 1

static model noOverlaps :: ArgumentCandidate [] candidates -> discrete [] types - PowerPoint PPT Presentation

  • Uploaded on

Learning Based Java for Rapid Development of NLP Systems Nick Rizzolo and Dan Roth. Introduction. For Example, Semantic Role Labeling:. LBJ: Specifying Models, Features, and Constraints.

I am the owner, or an agent authorized to act on behalf of the owner, of the copyrighted work described.
Download Presentation

PowerPoint Slideshow about 'static model noOverlaps :: ArgumentCandidate [] candidates -> discrete [] types' - chet

An Image/Link below is provided (as is) to download presentation

Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author.While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server.

- - - - - - - - - - - - - - - - - - - - - - - - - - E N D - - - - - - - - - - - - - - - - - - - - - - - - - -
Presentation Transcript

Learning Based Java for Rapid Development of NLP Systems

Nick Rizzolo and Dan Roth


For Example, Semantic Role Labeling:

LBJ: Specifying Models, Features, and Constraints

Today's natural language processing systems are growing more complex with the need to incorporate a wider range of language resources and more sophisticated statistical methods. In many cases, it is necessary to learn a component with input that includes the predictions of other learned components or to simultaneously assign values to multiple components with an expressive, data dependent structure among them. As a result, the design of systems with multiple learning components is inevitably quite technically complex, and implementations of conceptually simple NLP systems can be time consuming and prone to error. Our new modeling language, Learning Based Java (LBJ), facilitates the rapid development of systems that learn and perform inference. LBJ has already been used to build state of the art NLP systems.

In this work, we first demonstrate that there exists a theoretical model that describes most NLP approaches adeptly. Second, we show how our LBJ language enables the programmer to describe the theoretical model succinctly. Finally, we introduce the concept of data driven compilation, a translation process in which the efficiency of the generated code benefits from the data given as input to the learning algorithms.

Goal: identify and classify the arguments of a given verb.

Models can be specified modularly.

modelArgumentIdentifier :: discrete[] input -> booleanisArgument

input[*] /\ ^isArgument;

modelArgumentType :: discrete[] input -> discrete type

input[*] /\ type;

input[*] /\ input[*] /\ type;

static model pertinentData :: ArgumentCandidate candidate

-> discrete[] data

data.phraseType = candidate.phraseType();

data.headWord = candidate.hadWord();

data.headTag = candidate.headTag();

data.path = candidate.path();

The multi-class argument type classifier labels each argument with a type.

The binary argument identification classifier determines whether each sequence of words is an argument.

Features are written in First-Order Logic, and will have learned weights associated with them.



Every application of either classifier contributes to


Models can also be hard-coded, interfacing directly with data.






static model noOverlaps :: ArgumentCandidate[] candidates

-> discrete[] types

for (i : (0 .. candidates.size() - 1))

for (j : (i + 1 .. candidates.size() - 1))

#: candidates[i].overlapsWith(candidates[j])

=> types[i] :: "null" || types[j] :: "null";

static model noDuplicates :: -> discrete[] types

#: forall (v : types[0].values)

atmost1of (t : types) t :: v;

static model referenceConsistency :: -> discrete[] types

#: forall (value : types[0].values)

(exists (var : types) var :: "R-" + value)

=> (exists (var : types) var :: value);

Constraints look just like features, but their weights are explicit and constant.




to my daughter

The pearls

were shiny.


Extentions to first order logic, like the “atmost” quantifier, make common constraints simple.

R-A1 =>


A2 =>

No other A2

Thus, we have a CCM.



LBJ: Model Combination and Inference

Data-Driven Compilation

Developing a machine learning framework as a stand-alone language as opposed to a library opens the door to opportunities for automatically improving the efficiency of the code based on high-level analyses. In particular, much of the information necessary to generate the final program code is only available in the training data. Thus, we say that an LBJ compiler performs data-driven compilation.

Constrained Conditional Models (CCMs)

Model application looks like assignment of another model to this model’s output variables.

Inference can be as easy as selecting algorithms and applying them over specific variables.

The models from the upper-right panel are now applied over a single group of candidate arguments corresponding to a single verb.

Example: Feature Extraction

Consider the following lexicon, mapping from features to integers, that appears in many NLP systems.

model SRLProblem :: ArgumentIdentifier ai, ArgumentType at,

ArgumentCandidate[] candidates

-> boolean[] isArgument, discrete[] types

for (i : (0 .. candidates.size() - 1))

1e9: isArgument[i] <- ai (commonFeatures candidates[i]);

1: types[i] <- at (commonFeatures candidates[i]);

#: ~isArgument[i] => types[i] :: "null";

types <- noOverlaps candidates;

types <- noDuplicates ();

types <- referenceConsistency ();

In a nut shell, to perform inference, a CCM finds the values of the output variables that maximize the following scoring function:

solver SRLInference :: SRLProblem problem

Greedy.solve problem.isArgument[*];

ILP.solve problem.types[*];

Input variables

Output variables

State-of-the-art LBJ Implementations


Learned Weights

LBJ has already been used to develop several state-of-the-art resources.


Constant Weights

The LBJ POS tagger reports a competitive 96.6% accuracy on the standard Wall Street Journal corpus.

Result of Inference

Output space

In the named entity recognizer of (Ratinov and Roth, 2009), non-local features, gazetteers, and wikipedia are all incorporated into a system that achieves 90.8 F1on the CoNLL-2003 dataset, the highest score we are aware of.

CCMs subsume discriminative and probabilistic modeling formalisms. Thus, a wide variety of learning and inference algorithms can be applied to them.

While a typical lexicon will store an integer index associated with every feature value, LBJ will only store indexes for the atomic features. Indexes of composite features can then be computed on the fly based on the indexes of the atomic features’ values. The result is a much smaller lexicon, and fewer hash table look-ups.

The co-reference resolution system of (Bengston and Roth, 2008) achieves state-of-the-art performance on the ACE 2004 dataset while employing only a single learned classifier and a single constraint.

This work was supported by NSF grant NSF SoD-HCER-0613885.