
Using String-Kernels for Learning Semantic Parsers


Presentation Transcript


  1. Using String-Kernels for Learning Semantic Parsers Rohit J. Kate Raymond J. Mooney

  2. Semantic Parsing • Semantic Parsing: Transforming natural language (NL) sentences into computer-executable complete meaning representations (MRs) for some application • Example application domains • CLang: RoboCup Coach Language • Geoquery: A Database Query Application

  3. CLang: RoboCup Coach Language • In the RoboCup Coach competition, teams compete to coach simulated players [http://www.robocup.org] • The coaching instructions are given in a formal language called CLang [Chen et al. 2003] • Example NL: If the ball is in our goal area then player 1 should intercept it. • Semantic parsing maps this to CLang: (bpos (goal-area our) (do our {1} intercept)) • [Figure: simulated soccer field]

  4. Geoquery: A Database Query Application • Query application for a U.S. geography database containing about 800 facts [Zelle & Mooney, 1996] • Example NL: Which rivers run through the states bordering Texas? • Semantic parsing yields the query: answer(traverse(next_to(stateid(‘texas’)))) • Answer: Arkansas, Canadian, Cimarron, Gila, Mississippi, Rio Grande, …

  5. Learning Semantic Parsers • We assume meaning representation languages (MRLs) have deterministic context-free grammars • True for almost all computer languages • MRs can be parsed unambiguously

  6. Example • NL: Which rivers run through the states bordering Texas? • MR: answer(traverse(next_to(stateid(‘texas’)))) • [Figure: parse tree of the MR, with non-terminals at internal nodes and terminals at the leaves] • Non-terminals: ANSWER, RIVER, TRAVERSE, STATE, NEXT_TO, STATEID • Terminals: answer, traverse, next_to, stateid, ‘texas’ • Productions: ANSWER → answer(RIVER), RIVER → TRAVERSE(STATE), STATE → NEXT_TO(STATE), STATE → STATEID, TRAVERSE → traverse, NEXT_TO → next_to, STATEID → ‘texas’
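
The MR parse above can be viewed as a tree of productions. Below is a minimal sketch (Python; the nested-tuple encoding and the helper function are illustrative assumptions, not the authors' representation) of that structure and of reading off the productions it uses, which is the information the training algorithm later relies on:

```python
# Hypothetical nested-tuple encoding of the MR parse tree shown above:
# each node is (production, [child subtrees]).
mr_parse = ("ANSWER -> answer(RIVER)", [
    ("RIVER -> TRAVERSE(STATE)", [
        ("TRAVERSE -> traverse", []),
        ("STATE -> NEXT_TO(STATE)", [
            ("NEXT_TO -> next_to", []),
            ("STATE -> STATEID", [
                ("STATEID -> 'texas'", []),
            ]),
        ]),
    ]),
])

def productions_used(tree):
    """Return every production appearing in the MR parse tree, top-down."""
    production, children = tree
    return [production] + [p for child in children for p in productions_used(child)]

print(productions_used(mr_parse))
```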

  7. Learning Semantic Parsers • Assume meaning representation languages (MRLs) have deterministic context-free grammars • True for almost all computer languages • MRs can be parsed unambiguously • Training data consists of NL sentences paired with their MRs • Induce a semantic parser which can map novel NL sentences to their correct MRs • The learning problem differs from that of syntactic parsing, where training data has trees annotated over the NL sentences

  8. KRISP: Kernel-based Robust Interpretation for Semantic Parsing • Learns a semantic parser from NL sentences paired with their respective MRs, given the MRL grammar • Productions of the MRL are treated like semantic concepts • An SVM classifier with a string subsequence kernel is trained for each production to identify whether an NL substring represents that semantic concept • These classifiers are used to compositionally build MRs of the sentences

  9. Overview of KRISP • [Diagram] Training: NL sentences with MRs and the MRL grammar → collect positive and negative examples → train string-kernel-based SVM classifiers → semantic parser; the parser's best MRs (correct and incorrect) feed back into example collection • Testing: novel NL sentences → semantic parser → best MRs

  11. KRISP’s Semantic Parsing • We first define Semantic Derivation of an NL sentence • We next define Probability of a Semantic Derivation • Semantic parsing of an NL sentence involves finding its Most Probable Semantic Derivation • Straightforward to obtain MR from a semantic derivation

  12. Semantic Derivation of an NL Sentence • MR parse with non-terminals on the nodes • [Figure: the parse tree with non-terminals ANSWER, RIVER, TRAVERSE, STATE, NEXT_TO, STATEID over the sentence] • Which rivers run through the states bordering Texas?

  13. Semantic Derivation of an NL Sentence • MR parse with productions on the nodes: ANSWER → answer(RIVER), RIVER → TRAVERSE(STATE), TRAVERSE → traverse, STATE → NEXT_TO(STATE), NEXT_TO → next_to, STATE → STATEID, STATEID → ‘texas’ • Which rivers run through the states bordering Texas?

  14. Semantic Derivation of an NL Sentence • Semantic derivation: each node covers an NL substring: ANSWER → answer(RIVER), RIVER → TRAVERSE(STATE), TRAVERSE → traverse, STATE → NEXT_TO(STATE), NEXT_TO → next_to, STATE → STATEID, STATEID → ‘texas’ • Which rivers run through the states bordering Texas?

  15. Semantic Derivation of an NL Sentence • Semantic derivation: each node contains a production and the substring of the NL sentence it covers: (ANSWER → answer(RIVER), [1..9]), (RIVER → TRAVERSE(STATE), [1..9]), (TRAVERSE → traverse, [1..4]), (STATE → NEXT_TO(STATE), [5..9]), (NEXT_TO → next_to, [5..7]), (STATE → STATEID, [8..9]), (STATEID → ‘texas’, [8..9]) • Sentence (word positions 1..9): Which rivers run through the states bordering Texas?

  16. Semantic Derivation of an NL Sentence • Substrings in the NL sentence may be in a different order: ANSWER → answer(RIVER), RIVER → TRAVERSE(STATE), TRAVERSE → traverse, STATE → NEXT_TO(STATE), NEXT_TO → next_to, STATE → STATEID, STATEID → ‘texas’ • Through the states that border Texas which rivers run?

  17. Semantic Derivation of an NL Sentence • Nodes are allowed to permute the child productions of the original MR parse: (ANSWER → answer(RIVER), [1..10]), (RIVER → TRAVERSE(STATE), [1..10]), (STATE → NEXT_TO(STATE), [1..6]), (TRAVERSE → traverse, [7..10]), (NEXT_TO → next_to, [1..5]), (STATE → STATEID, [6..6]), (STATEID → ‘texas’, [6..6]) • Sentence (word positions 1..10): Through the states that border Texas which rivers run?

  18. Probability of a Semantic Derivation • Let Pπ(s[i..j]) be the probability that production π covers the substring s[i..j] of sentence s • For example, P_{NEXT_TO → next_to}(“the states bordering”) = 0.99 for the node (NEXT_TO → next_to, [5..7]) covering words 5..7 • These probabilities are obtained from the string-kernel-based SVM classifier trained for each production π • Assuming independence, the probability of a semantic derivation D is P(D) = ∏ over nodes (π, [i..j]) in D of Pπ(s[i..j])

  19. Probability of a Semantic Derivation contd. • Example derivation with node probabilities: (ANSWER → answer(RIVER), [1..9]) 0.98, (RIVER → TRAVERSE(STATE), [1..9]) 0.9, (TRAVERSE → traverse, [1..4]) 0.95, (STATE → NEXT_TO(STATE), [5..9]) 0.89, (NEXT_TO → next_to, [5..7]) 0.99, (STATE → STATEID, [8..9]) 0.93, (STATEID → ‘texas’, [8..9]) 0.98 • Sentence (word positions 1..9): Which rivers run through the states bordering Texas?
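
As a concrete check of the independence assumption from slide 18, here is a small sketch (Python; the node-tuple format is an assumption for illustration) that multiplies the node probabilities of the example derivation above:

```python
import math

# Each node of the derivation: (production, (start, end), P_pi(s[i..j])),
# with the probabilities taken from the example above.
derivation = [
    ("ANSWER -> answer(RIVER)",  (1, 9), 0.98),
    ("RIVER -> TRAVERSE(STATE)", (1, 9), 0.90),
    ("TRAVERSE -> traverse",     (1, 4), 0.95),
    ("STATE -> NEXT_TO(STATE)",  (5, 9), 0.89),
    ("NEXT_TO -> next_to",       (5, 7), 0.99),
    ("STATE -> STATEID",         (8, 9), 0.93),
    ("STATEID -> 'texas'",       (8, 9), 0.98),
]

def derivation_probability(nodes):
    """P(D) = product over all nodes (pi, [i..j]) in D of P_pi(s[i..j])."""
    return math.prod(prob for _, _, prob in nodes)

print(round(derivation_probability(derivation), 3))  # about 0.673
```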

  20. Computing the Most Probable Semantic Derivation • Task of semantic parsing is to find the most probable semantic derivation of the NL sentence given all the probabilities Pπ(s[i..j]) • Implemented by extending Earley’s [1970] context-free grammar parsing algorithm • Resembles PCFG parsing but different because: • Probability of a production depends on which substring of the sentence it covers • Leaves are not terminals but substrings of words

  21. Computing the Most Probable Semantic Derivation contd. • Does a greedy approximate search with beam width ω=20 and returns the ω most probable derivations it finds • Uses a threshold θ=0.05 to prune low-probability trees
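
A minimal sketch of the pruning step just described (Python; the (probability, derivation) pair representation is an assumption, and the surrounding extended Earley chart parsing is not shown):

```python
def prune_beam(candidates, beam_width=20, threshold=0.05):
    """Keep at most beam_width candidate derivations, discarding any whose
    probability falls below threshold; candidates are (probability, derivation) pairs."""
    survivors = [c for c in candidates if c[0] >= threshold]
    survivors.sort(key=lambda c: c[0], reverse=True)
    return survivors[:beam_width]

# Example: the lowest-probability candidate is dropped by the threshold.
pool = [(0.67, "D1"), (0.42, "D2"), (0.03, "D3")]
print(prune_beam(pool))  # [(0.67, 'D1'), (0.42, 'D2')]
```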

  22. Overview of KRISP • [Diagram: the pipeline from slide 9, now with the classifiers made explicit] Training: NL sentences with MRs and the MRL grammar → collect positive and negative examples → train string-kernel-based SVM classifiers Pπ(s[i..j]) → semantic parser; the best semantic derivations (correct and incorrect) feed back into example collection • Testing: novel NL sentences → semantic parser → best MRs

  23. KRISP’s Training Algorithm • Takes NL sentences paired with their respective MRs as input • Obtains MR parses • Induces the semantic parser and refines it over iterations • In the first iteration, for every production π: • Call those sentences positives whose MR parses use that production • Call the remaining sentences negatives

  24. KRISP’s Training Algorithm contd. • First iteration, production STATE → NEXT_TO(STATE) • Positives: which rivers run through the states bordering texas ? • what is the most populated state bordering oklahoma ? • what is the largest city in states that border california ? • … • Negatives: what state has the highest population ? • what states does the delaware river run through ? • which states have cities named austin ? • what is the lowest point of the state with the largest area ? • … • These are used to train a string-kernel-based SVM classifier

  25. String Subsequence Kernel • Define kernel between two strings as the number of common subsequences between them [Lodhi et al., 2002] s = “states that are next to” t = “the states next to” K(s,t) = ?

  26. String Subsequence Kernel • Define kernel between two strings as the number of common subsequences between them [Lodhi et al., 2002] s = “states that are next to” t = “the states next to” u = states K(s,t) = 1+?

  27. String Subsequence Kernel • Define kernel between two strings as the number of common subsequences between them [Lodhi et al., 2002] s = “states that are next to” t = “the states next to” u = next K(s,t) = 2+?

  28. String Subsequence Kernel • Define kernel between two strings as the number of common subsequences between them [Lodhi et al., 2002] s = “states that are next to” t = “the states next to” u = to K(s,t) = 3+?

  29. String Subsequence Kernel • Define kernel between two strings as the number of common subsequences between them [Lodhi et al., 2002] s = “states that are next to” t = “the states next to” u = states next K(s,t) = 4+?

  30. String Subsequence Kernel • Define kernel between two strings as the number of common subsequences between them [Lodhi et al., 2002] s = “states that are next to” t = “the states next to” K(s,t) = 7
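
The count of 7 can be reproduced with a brute-force sketch (Python). It enumerates word subsequences explicitly and, unlike the kernel of Lodhi et al. [2002], uses no gap-penalty weighting and runs in exponential time; it is only meant to make the definition concrete:

```python
from itertools import combinations

def subsequences(words):
    """All non-empty word subsequences (order-preserving), as a set of tuples."""
    subs = set()
    for length in range(1, len(words) + 1):
        for idx in combinations(range(len(words)), length):
            subs.add(tuple(words[i] for i in idx))
    return subs

def subsequence_kernel(s, t):
    """Unweighted count of word subsequences common to both strings."""
    return len(subsequences(s.split()) & subsequences(t.split()))

s = "states that are next to"
t = "the states next to"
print(subsequence_kernel(s, t))  # 7: states, next, to, states next, states to, next to, states next to
```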

  31. String Subsequence Kernel contd. • The kernel is normalized to remove any bias due to different string lengths • Lodhi et al. [2002] give an O(n|s||t|) algorithm for computing the string subsequence kernel • Used for Text Categorization [Lodhi et al., 2002] and Information Extraction [Bunescu & Mooney, 2005]
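
The normalization mentioned above is, in the standard form used for kernels, K'(s,t) = K(s,t) / sqrt(K(s,s) * K(t,t)); the exact formula is inferred from that standard practice rather than quoted from the slides. A short sketch reusing subsequence_kernel() from the previous block:

```python
import math

def normalized_kernel(s, t):
    """Length-normalized kernel: K(s,t) / sqrt(K(s,s) * K(t,t))."""
    return subsequence_kernel(s, t) / math.sqrt(
        subsequence_kernel(s, s) * subsequence_kernel(t, t))

print(round(normalized_kernel("states that are next to", "the states next to"), 3))
```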

  32. String Subsequence Kernel contd. • The examples are implicitly mapped to the feature space of all subsequences, and the kernel computes the dot products • [Figure: example phrases in the implicit feature space: “state with the capital of”, “states with area larger than the”, “states next to”, “states that border”, “states bordering”, “states through which”, “states that share border”]

  33. Support Vector Machines • SVMs find a separating hyperplane such that the margin is maximized • [Figure: separating hyperplane between the example phrases; “states that are next to” receives a probability estimate of 0.97] • A probability estimate of an example belonging to a class can be obtained using its distance from the hyperplane [Platt, 1999]
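
To illustrate how a string-kernel SVM with Platt-scaled probability estimates can be set up, here is a sketch using scikit-learn's SVC with a precomputed Gram matrix and the toy normalized_kernel() from the blocks above. The training phrases are illustrative, and this is not KRISP's actual implementation (which trains one such classifier per MRL production):

```python
import numpy as np
from sklearn.svm import SVC

# Illustrative substrings for the concept STATE -> NEXT_TO(STATE)
positives = ["states next to", "states that border", "states bordering",
             "states through which", "states that share border"]
negatives = ["state with the capital of", "states with area larger than the",
             "what state has the highest population", "cities named austin"]
examples = positives + negatives
labels = [1] * len(positives) + [0] * len(negatives)

# Gram matrix under the (toy) normalized string subsequence kernel
gram = np.array([[normalized_kernel(a, b) for b in examples] for a in examples])

# probability=True enables Platt scaling, turning SVM scores into probabilities
clf = SVC(kernel="precomputed", probability=True)
clf.fit(gram, labels)

# Estimated probability that a new substring expresses the concept
new = ["states that are next to"]
k_new = np.array([[normalized_kernel(a, b) for b in examples] for a in new])
print(clf.predict_proba(k_new)[:, 1])
```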

  34. KRISP’s Training Algorithm contd. • First iteration, production STATE → NEXT_TO(STATE) • Positives: which rivers run through the states bordering texas ? • what is the most populated state bordering oklahoma ? • what is the largest city in states that border california ? • … • Negatives: what state has the highest population ? • what states does the delaware river run through ? • which states have cities named austin ? • what is the lowest point of the state with the largest area ? • … • The trained string-kernel-based SVM classifier gives P_{STATE → NEXT_TO(STATE)}(s[i..j])

  35. Overview of KRISP • [Diagram: the same training and testing pipeline as on slide 22, revisited before the later training iterations]

  37. KRISP’s Training Algorithm contd. • Using these classifiers Pπ(s[i..j]), obtain the ω best semantic derivations of each training sentence • Some of these derivations give the correct MR (correct derivations); others give incorrect MRs (incorrect derivations) • For the next iteration, collect positives from the most probable correct derivation • The extended Earley’s algorithm can be forced to derive only correct derivations by making sure all subtrees it generates exist in the correct MR parse • Collect negatives from incorrect derivations with higher probability than the most probable correct derivation
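
The selection of positive and negative sources just described can be sketched in isolation (Python; the (probability, MR, nodes) triple format and the toy candidates are assumptions, and for simplicity the correct derivations are filtered from the candidate list rather than obtained by constraining the extended Earley parser as KRISP does):

```python
def split_derivations(derivations, gold_mr):
    """Return the most probable correct derivation plus the incorrect derivations
    that outrank it; candidates are (probability, mr, nodes) triples."""
    correct = [d for d in derivations if d[1] == gold_mr]
    if not correct:
        return None, []
    best_correct = max(correct, key=lambda d: d[0])
    stronger_incorrect = [d for d in derivations
                          if d[1] != gold_mr and d[0] > best_correct[0]]
    return best_correct, stronger_incorrect

# Toy candidates for the example sentence; positives come from best_correct's nodes,
# negatives from the nodes of the outranking incorrect derivation.
gold = "answer(traverse(next_to(stateid('texas'))))"
candidates = [
    (0.71, "answer(traverse(stateid('texas')))", ["(TRAVERSE -> traverse, [1..7])"]),
    (0.67, gold, ["(TRAVERSE -> traverse, [1..4])"]),
    (0.10, "answer(traverse(stateid('texas')))", ["(TRAVERSE -> traverse, [1..2])"]),
]
best_correct, negative_sources = split_derivations(candidates, gold)
print(best_correct[0], len(negative_sources))  # 0.67 1
```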

  38. KRISP’s Training Algorithm contd. • Most probable correct derivation: (ANSWER → answer(RIVER), [1..9]), (RIVER → TRAVERSE(STATE), [1..9]), (TRAVERSE → traverse, [1..4]), (STATE → NEXT_TO(STATE), [5..9]), (NEXT_TO → next_to, [5..7]), (STATE → STATEID, [8..9]), (STATEID → ‘texas’, [8..9]) • Sentence (word positions 1..9): Which rivers run through the states bordering Texas?

  39. KRISP’s Training Algorithm contd. • Most probable correct derivation (collect positive examples): (ANSWER → answer(RIVER), [1..9]), (RIVER → TRAVERSE(STATE), [1..9]), (TRAVERSE → traverse, [1..4]), (STATE → NEXT_TO(STATE), [5..9]), (NEXT_TO → next_to, [5..7]), (STATE → STATEID, [8..9]), (STATEID → ‘texas’, [8..9]) • Sentence (word positions 1..9): Which rivers run through the states bordering Texas?

  40. KRISP’s Training Algorithm contd. • Incorrect derivation with probability greater than the most probable correct derivation: (ANSWER → answer(RIVER), [1..9]), (RIVER → TRAVERSE(STATE), [1..9]), (TRAVERSE → traverse, [1..7]), (STATE → STATEID, [8..9]), (STATEID → ‘texas’, [8..9]) • Sentence (word positions 1..9): Which rivers run through the states bordering Texas? • Incorrect MR: answer(traverse(stateid(‘texas’)))

  41. KRISP’s Training Algorithm contd. • Incorrect derivation with probability greater than the most probable correct derivation (collect negative examples): (ANSWER → answer(RIVER), [1..9]), (RIVER → TRAVERSE(STATE), [1..9]), (TRAVERSE → traverse, [1..7]), (STATE → STATEID, [8..9]), (STATEID → ‘texas’, [8..9]) • Sentence (word positions 1..9): Which rivers run through the states bordering Texas? • Incorrect MR: answer(traverse(stateid(‘texas’)))

  42. KRISP’s Training Algorithm contd. • Most probable correct derivation: (ANSWER → answer(RIVER), [1..9]), (RIVER → TRAVERSE(STATE), [1..9]), (TRAVERSE → traverse, [1..4]), (STATE → NEXT_TO(STATE), [5..9]), (NEXT_TO → next_to, [5..7]), (STATE → STATEID, [8..9]), (STATEID → ‘texas’, [8..9]) • Incorrect derivation: (ANSWER → answer(RIVER), [1..9]), (RIVER → TRAVERSE(STATE), [1..9]), (TRAVERSE → traverse, [1..7]), (STATE → STATEID, [8..9]), (STATEID → ‘texas’, [8..9]) • Sentence: Which rivers run through the states bordering Texas? • Traverse both trees in breadth-first order till the first nodes where their productions differ are found.

  47. KRISP’s Training Algorithm contd. • [The same pair of correct and incorrect derivations as on slide 42, with the first differing nodes found] • Mark the words under these nodes.

  49. KRISP’s Training Algorithm contd. • [The same pair of correct and incorrect derivations as on slide 42, with the marked words highlighted] • Consider all the productions covering the marked words. Collect negatives for productions which cover any marked word in the incorrect derivation but not in the correct derivation.
