Learning Based Java for Rapid Development of NLP Systems Nick Rizzolo and Dan Roth. Introduction. For Example, Semantic Role Labeling:. LBJ: Specifying Models, Features, and Constraints.
Nick Rizzolo and Dan Roth
For Example, Semantic Role Labeling:
LBJ: Specifying Models, Features, and Constraints
Today's natural language processing systems are growing more complex with the need to incorporate a wider range of language resources and more sophisticated statistical methods. In many cases, it is necessary to learn a component with input that includes the predictions of other learned components or to simultaneously assign values to multiple components with an expressive, data dependent structure among them. As a result, the design of systems with multiple learning components is inevitably quite technically complex, and implementations of conceptually simple NLP systems can be time consuming and prone to error. Our new modeling language, Learning Based Java (LBJ), facilitates the rapid development of systems that learn and perform inference. LBJ has already been used to build state of the art NLP systems.
In this work, we first demonstrate that there exists a theoretical model that describes most NLP approaches adeptly. Second, we show how our LBJ language enables the programmer to describe the theoretical model succinctly. Finally, we introduce the concept of data driven compilation, a translation process in which the efficiency of the generated code benefits from the data given as input to the learning algorithms.
Goal: identify and classify the arguments of a given verb.
Models can be specified modularly.
modelArgumentIdentifier :: discrete input -> booleanisArgument
input[*] /\ ^isArgument;
modelArgumentType :: discrete input -> discrete type
input[*] /\ type;
input[*] /\ input[*] /\ type;
static model pertinentData :: ArgumentCandidate candidate
-> discrete data
data.phraseType = candidate.phraseType();
data.headWord = candidate.hadWord();
data.headTag = candidate.headTag();
data.path = candidate.path();
The multi-class argument type classifier labels each argument with a type.
The binary argument identification classifier determines whether each sequence of words is an argument.
Features are written in First-Order Logic, and will have learned weights associated with them.
Every application of either classifier contributes to
Models can also be hard-coded, interfacing directly with data.
static model noOverlaps :: ArgumentCandidate candidates
-> discrete types
for (i : (0 .. candidates.size() - 1))
for (j : (i + 1 .. candidates.size() - 1))
=> types[i] :: "null" || types[j] :: "null";
static model noDuplicates :: -> discrete types
#: forall (v : types.values)
atmost1of (t : types) t :: v;
static model referenceConsistency :: -> discrete types
#: forall (value : types.values)
(exists (var : types) var :: "R-" + value)
=> (exists (var : types) var :: value);
Constraints look just like features, but their weights are explicit and constant.
to my daughter
Extentions to first order logic, like the “atmost” quantifier, make common constraints simple.
No other A2
Thus, we have a CCM.
LBJ: Model Combination and Inference
Developing a machine learning framework as a stand-alone language as opposed to a library opens the door to opportunities for automatically improving the efficiency of the code based on high-level analyses. In particular, much of the information necessary to generate the final program code is only available in the training data. Thus, we say that an LBJ compiler performs data-driven compilation.
Constrained Conditional Models (CCMs)
Model application looks like assignment of another model to this model’s output variables.
Inference can be as easy as selecting algorithms and applying them over specific variables.
The models from the upper-right panel are now applied over a single group of candidate arguments corresponding to a single verb.
Example: Feature Extraction
Consider the following lexicon, mapping from features to integers, that appears in many NLP systems.
model SRLProblem :: ArgumentIdentifier ai, ArgumentType at,
-> boolean isArgument, discrete types
for (i : (0 .. candidates.size() - 1))
1e9: isArgument[i] <- ai (commonFeatures candidates[i]);
1: types[i] <- at (commonFeatures candidates[i]);
#: ~isArgument[i] => types[i] :: "null";
types <- noOverlaps candidates;
types <- noDuplicates ();
types <- referenceConsistency ();
In a nut shell, to perform inference, a CCM finds the values of the output variables that maximize the following scoring function:
solver SRLInference :: SRLProblem problem
State-of-the-art LBJ Implementations
LBJ has already been used to develop several state-of-the-art resources.
The LBJ POS tagger reports a competitive 96.6% accuracy on the standard Wall Street Journal corpus.
Result of Inference
In the named entity recognizer of (Ratinov and Roth, 2009), non-local features, gazetteers, and wikipedia are all incorporated into a system that achieves 90.8 F1on the CoNLL-2003 dataset, the highest score we are aware of.
CCMs subsume discriminative and probabilistic modeling formalisms. Thus, a wide variety of learning and inference algorithms can be applied to them.
While a typical lexicon will store an integer index associated with every feature value, LBJ will only store indexes for the atomic features. Indexes of composite features can then be computed on the fly based on the indexes of the atomic features’ values. The result is a much smaller lexicon, and fewer hash table look-ups.
The co-reference resolution system of (Bengston and Roth, 2008) achieves state-of-the-art performance on the ACE 2004 dataset while employing only a single learned classifier and a single constraint.
This work was supported by NSF grant NSF SoD-HCER-0613885.