Learning-Based Java for NLP Systems Development

Learning Based Java for Rapid Development of NLP Systems Nick Rizzolo and Dan Roth Introduction For Example, Semantic Role Labeling: LBJ: Specifying Models, Features, and Constraints Today's natural language processing systems are growing more complex with the need to incorporate a wider range of language resources and more sophisticated statistical methods. In many cases, it is necessary to learn a component with input that includes the predictions of other learned components or to simultaneously assign values to multiple components with an expressive, data dependent structure among them. As a result, the design of systems with multiple learning components is inevitably quite technically complex, and implementations of conceptually simple NLP systems can be time consuming and prone to error. Our new modeling language, Learning Based Java (LBJ), facilitates the rapid development of systems that learn and perform inference. LBJ has already been used to build state of the art NLP systems. In this work, we first demonstrate that there exists a theoretical model that describes most NLP approaches adeptly. Second, we show how our LBJ language enables the programmer to describe the theoretical model succinctly. Finally, we introduce the concept of data driven compilation, a translation process in which the efficiency of the generated code benefits from the data given as input to the learning algorithms. Goal: identify and classify the arguments of a given verb. Models can be specified modularly. modelArgumentIdentifier :: discrete[] input -> booleanisArgument input[*] /\ ^isArgument; modelArgumentType :: discrete[] input -> discrete type input[*] /\ type; input[*] /\ input[*] /\ type; static model pertinentData :: ArgumentCandidate candidate -> discrete[] data data.phraseType = candidate.phraseType(); data.headWord = candidate.hadWord(); data.headTag = candidate.headTag(); data.path = candidate.path(); The multi-class argument type classifier labels each argument with a type. The binary argument identification classifier determines whether each sequence of words is an argument. Features are written in First-Order Logic, and will have learned weights associated with them. Type ID Every application of either classifier contributes to Classifiers Models can also be hard-coded, interfacing directly with data. A1 R-A1 A0 V A2 static model noOverlaps :: ArgumentCandidate[] candidates -> discrete[] types for (i : (0 .. candidates.size() - 1)) for (j : (i + 1 .. candidates.size() - 1)) #: candidates[i].overlapsWith(candidates[j]) => types[i] :: "null" || types[j] :: "null"; static model noDuplicates :: -> discrete[] types #: forall (v : types[0].values) atmost1of (t : types) t :: v; static model referenceConsistency :: -> discrete[] types #: forall (value : types[0].values) (exists (var : types) var :: "R-" + value) => (exists (var : types) var :: value); Constraints look just like features, but their weights are explicit and constant. which I left to my daughter The pearls were shiny. ∞ # ≡ Extentions to first order logic, like the “atmost” quantifier, make common constraints simple. R-A1 => A1 A2 => No other A2 Thus, we have a CCM. Constraints E LBJ: Model Combination and Inference Data-Driven Compilation Developing a machine learning framework as a stand-alone language as opposed to a library opens the door to opportunities for automatically improving the efficiency of the code based on high-level analyses. In particular, much of the information necessary to generate the final program code is only available in the training data. Thus, we say that an LBJ compiler performs data-driven compilation. Constrained Conditional Models (CCMs) Model application looks like assignment of another model to this model’s output variables. Inference can be as easy as selecting algorithms and applying them over specific variables. The models from the upper-right panel are now applied over a single group of candidate arguments corresponding to a single verb. Example: Feature Extraction Consider the following lexicon, mapping from features to integers, that appears in many NLP systems. model SRLProblem :: ArgumentIdentifier ai, ArgumentType at, ArgumentCandidate[] candidates -> boolean[] isArgument, discrete[] types for (i : (0 .. candidates.size() - 1)) 1e9: isArgument[i] <- ai (commonFeatures candidates[i]); 1: types[i] <- at (commonFeatures candidates[i]); #: ~isArgument[i] => types[i] :: "null"; types <- noOverlaps candidates; types <- noDuplicates (); types <- referenceConsistency (); In a nut shell, to perform inference, a CCM finds the values of the output variables that maximize the following scoring function: solver SRLInference :: SRLProblem problem Greedy.solve problem.isArgument[*]; ILP.solve problem.types[*]; Input variables Output variables State-of-the-art LBJ Implementations Features Learned Weights LBJ has already been used to develop several state-of-the-art resources. Constraints Constant Weights The LBJ POS tagger reports a competitive 96.6% accuracy on the standard Wall Street Journal corpus. Result of Inference Output space In the named entity recognizer of (Ratinov and Roth, 2009), non-local features, gazetteers, and wikipedia are all incorporated into a system that achieves 90.8 F1on the CoNLL-2003 dataset, the highest score we are aware of. CCMs subsume discriminative and probabilistic modeling formalisms. Thus, a wide variety of learning and inference algorithms can be applied to them. While a typical lexicon will store an integer index associated with every feature value, LBJ will only store indexes for the atomic features. Indexes of composite features can then be computed on the fly based on the indexes of the atomic features’ values. The result is a much smaller lexicon, and fewer hash table look-ups. The co-reference resolution system of (Bengston and Roth, 2008) achieves state-of-the-art performance on the ACE 2004 dataset while employing only a single learned classifier and a single constraint. This work was supported by NSF grant NSF SoD-HCER-0613885.

Learning-Based Java for NLP Systems Development

Learning-Based Java for NLP Systems Development

Presentation Transcript

Discrete Distributions

The Role of Logic and Proof in Teaching Discrete Mathematics

RECRUITMENT TALK

NAVY ADVANCEMENT CENTER

Static Race Detection for C

Entity-Relationship Model

Foundations of Discrete Mathematics

Discrete and Categorical Data

MASTER MASON EXAMINATION NO. 3

Discrete Choice Modeling

NJ Heath & Physiology test for TCNJ Teacher Candidates

Republican Candidates 2012

Slides for CISC 2315: Discrete Structures Chapter 1

Discrete Structures

Slides for CISC 2315: Discrete Structures Chapter 1

Discrete Mathematics Lecture 2.

Model Transformation

Chapter 3-2 Discrete Random Variables

Biomedical Signal processing Chapter 6 structures for discrete-time system

Discrete Techniques

Discrete Mathematics

The Ultimate FREE Java Course Part 1

Learning-Based Java for NLP Systems Development