PFA Node Alignment Algorithm


# PFA Node Alignment Algorithm - PowerPoint PPT Presentation


## PowerPoint Slideshow about ' PFA Node Alignment Algorithm' - tia

PFA Node Alignment Algorithm

Consider the parse trees of a Chinese-English parallel pair of sentences.

PFA Node Alignment Algorithm

Each of the nodes stores a value.

All nodes are initialized with the value 1.

Each Word to Word alignment is assigned a unique prime number.

PFA Node Alignment Algorithm
• For every word to word alignment, we do the following:
• Let p be the unique prime value assigned to the alignment.
• Let ws and wt be the aligned words on the source and target sides.
• Assign the value p to the nodes corresponding to the words ws and wt.
• Example: “Australia” gets value 2, “is” gets value 3.
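A minimal sketch of the prime assignment, assuming alignments are given as (source word, target word) pairs in sentence order. The helper names `primes` and `assign_primes` are illustrative and do not come from the slides.

```python
from itertools import count

def primes():
    """Yield 2, 3, 5, 7, ... by trial division (enough for sentence-sized input)."""
    found = []
    for n in count(2):
        if all(n % p for p in found):
            found.append(n)
            yield n

def assign_primes(alignments):
    """Give each word-to-word alignment its own prime, in the order given."""
    gen = primes()
    return {pair: next(gen) for pair in alignments}

links = [("澳洲", "Australia"), ("是", "is")]
values = assign_primes(links)
print(values)  # {('澳洲', 'Australia'): 2, ('是', 'is'): 3}
```

Both words of a pair then carry the pair's prime, which matches the example above: "Australia" gets 2 and "is" gets 3.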
PFA Node Alignment Algorithm

In case there are “one-to-many” alignments, they are considered as multiple “one-to-one” alignments, and all of these alignments are given the same prime value.

Example: “North Korea” is a single word (北韩) on the Chinese side, aligned to two English words. Both alignments get the prime 5, so the Chinese word is assigned the value 25, the product 5 × 5.
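The one-to-many case can be sketched as follows, assuming each link already carries its prime and that every link multiplies the value of both of its endpoints. Words are keyed by their string here for brevity; a real implementation would key them by position, since a word can occur twice in a sentence. The function name `leaf_values` is hypothetical.

```python
from collections import defaultdict

def leaf_values(links_with_primes):
    """links_with_primes: iterable of (source_word, target_word, prime)."""
    src = defaultdict(lambda: 1)  # every word starts at 1
    tgt = defaultdict(lambda: 1)
    for ws, wt, p in links_with_primes:
        src[ws] *= p
        tgt[wt] *= p
    return dict(src), dict(tgt)

links = [
    ("澳洲", "Australia", 2),
    ("是", "is", 3),
    ("北韩", "North", 5),  # one Chinese word, two English words,
    ("北韩", "Korea", 5),  # so both links share the prime 5
]
src, tgt = leaf_values(links)
print(src["北韩"], tgt["North"], tgt["Korea"])  # 25 5 5
```

An English node covering exactly "North" and "Korea" then also reaches 5 × 5 = 25 and can match the Chinese word's node.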

PFA Node Alignment Algorithm
• Once all the lexical items have values, we propagate the values up the tree as follows:
• Work bottom-up
• A node updates its value as the product of the values of its children.
• Values could become large!
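The bottom-up step can be sketched as a post-order walk. The `Node` class and the `propagate` function are assumptions made for illustration; the slides do not specify a tree representation. Unaligned words keep the initial value 1.

```python
from dataclasses import dataclass, field
from math import prod
from typing import Optional

@dataclass
class Node:
    label: str
    children: list = field(default_factory=list)
    word: Optional[str] = None  # set only on lexical-level nodes
    value: int = 1              # all nodes start at 1

def propagate(node, leaf_value):
    """Set node.value bottom-up and return it."""
    if not node.children:
        node.value = leaf_value.get(node.word, 1)
    else:
        node.value = prod(propagate(c, leaf_value) for c in node.children)
    return node.value

np_node = Node("NP", [Node("NNP", word="North"), Node("NNP", word="Korea")])
print(propagate(np_node, {"North": 5, "Korea": 5}))  # 25
```

On the point about large values: Python integers have arbitrary precision, so the products cannot overflow in this sketch, although they do grow with sentence length. A fixed-width integer type would need a different encoding.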
PFA Node Alignment Algorithm
• Once all nodes have values, they can be aligned as follows:
• If a node on the Chinese side has the same value as a node on the English side, align them.
• If several nodes on one side share that value, take the one at the lowest level in the tree that is not a lexical-level node.
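A sketch of the matching step, assuming node values have already been computed and that nodes with children are the non-lexical ones. Skipping nodes whose value is 1 (they cover no aligned word) is an assumption of this sketch, not something the slides state. All names are illustrative.

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    label: str
    value: int
    children: list = field(default_factory=list)

def lowest_by_value(root):
    """Map each value to the deepest non-lexical node that carries it."""
    best = {}
    stack = [(root, 0)]
    while stack:
        node, depth = stack.pop()
        if node.children and node.value != 1:
            if node.value not in best or depth > best[node.value][0]:
                best[node.value] = (depth, node)
        stack.extend((child, depth + 1) for child in node.children)
    return {value: node for value, (_, node) in best.items()}

def align_nodes(src_root, tgt_root):
    """Pair up source and target nodes that carry the same value."""
    src, tgt = lowest_by_value(src_root), lowest_by_value(tgt_root)
    return [(src[v], tgt[v]) for v in sorted(src.keys() & tgt.keys())]

zh = Node("IP", 6, [Node("NP", 2, [Node("NR", 2)]), Node("VP", 3, [Node("VC", 3)])])
en = Node("S", 6, [Node("NP", 2, [Node("NNP", 2)]), Node("VP", 3, [Node("VBZ", 3)])])
pairs = [(s.label, t.label) for s, t in align_nodes(zh, en)]
print(pairs)  # [('NP', 'NP'), ('VP', 'VP'), ('IP', 'S')]
```

The leaf nodes NR, VC, NNP and VBZ are never aligned because they are at the lexical level, so the NP above 澳洲 is matched rather than its tag node.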
PFA Node Alignment Algorithm

Features of the algorithm:

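The order of the constituents does not matter in node alignment, because a node's value is a product and multiplication is commutative.

Extra (unaligned) words inside a constituent are allowed: they keep the value 1 and do not change the product. Taking the lowest matching node keeps the number of such words as small as possible.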

PFA Node Alignment Algorithm

Extraction of Phrases:

Take the yields of the aligned nodes and build a phrase table tagged with the syntactic categories on the source and target sides.

Example:

NP # NP :: 澳洲 # Australia
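A sketch of how one phrase-table entry could be produced from a pair of aligned nodes. Trees are written here as nested tuples, `(label, [children])` for internal nodes and `(tag, word)` for leaves; this encoding and the helper names are assumptions made for the example.

```python
def yield_of(tree):
    """Return the words under a node, left to right."""
    label, rest = tree
    if isinstance(rest, str):  # leaf: (tag, word)
        return [rest]
    return [w for child in rest for w in yield_of(child)]

def phrase_entry(src_tree, tgt_tree):
    """Format one entry as 'SRC_CAT # TGT_CAT :: source words # target words'."""
    src_words = " ".join(yield_of(src_tree))
    tgt_words = " ".join(yield_of(tgt_tree))
    return f"{src_tree[0]} # {tgt_tree[0]} :: {src_words} # {tgt_words}"

src_np = ("NP", [("NR", "北韩")])
tgt_np = ("NP", [("NNP", "North"), ("NNP", "Korea")])
print(phrase_entry(src_np, tgt_np))  # NP # NP :: 北韩 # North Korea
```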

PFA Node Alignment Algorithm

All Phrases from this tree:

IP # S :: 澳洲 是 与 北韩 有 邦交 的 少数 国家 之一 。 # Australia is one of the few countries that have diplomatic relations with North Korea .

VP # VP :: 是 与 北韩 有 邦交 的 少数 国家 之一 # is one of the few countries that have diplomatic relations with North Korea

NP # NP :: 与 北韩 有 邦交 的 少数 国家 之一 # one of the few countries that have diplomatic relations with North Korea

VP # VP :: 与 北韩 有 邦交 # have diplomatic relations with North Korea

NP # NP :: 邦交 # diplomatic relations

NP # NP :: 北韩 # North Korea

NP # NP :: 澳洲 # Australia

PFA Node Alignment Performance
• If the data is manually word-aligned, the word alignment error rate is very small, and so is the PFA node alignment error rate.
• What happens when word-alignments are done automatically?
PFA Node Alignment Performance
• Evaluation Data: Treebank corpus.
• Parallel Chinese-English Treebank with manual word-alignments
• 3342 Sentence Pairs
• Node Alignments: 39874 (About 12/tree pair)
• NP to NP Alignments: 5427
• (Makes good phrase table!)
• Node alignments obtained from the manual word alignments serve as the gold standard; node alignments obtained from automatic word alignments are evaluated against them.
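One way such a comparison could be scored is with precision and recall over node-alignment pairs. The slides do not define the metric that was used, so this is only an assumed scoring scheme. The span pairs below are made-up toy data, with each node identified by its (start, end) word span.

```python
def score(predicted, gold):
    """Return (precision, recall) of predicted node alignments against gold."""
    predicted, gold = set(predicted), set(gold)
    correct = len(predicted & gold)
    precision = correct / len(predicted) if predicted else 0.0
    recall = correct / len(gold) if gold else 0.0
    return precision, recall

# Each item is (source_span, target_span); values are illustrative only.
gold = {((0, 1), (0, 1)), ((1, 10), (1, 16)), ((3, 4), (14, 16))}
auto = {((0, 1), (0, 1)), ((3, 4), (14, 16)), ((2, 4), (13, 16))}
precision, recall = score(auto, gold)
print(round(precision, 3), round(recall, 3))  # 0.667 0.667
```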
PFA Node Alignment Performance

Viterbi word alignments from the Chinese-to-English and English-to-Chinese directions were merged using different algorithms to test the performance of node alignment.
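The slides do not name the merging algorithms. As an illustration only, the sketch below shows two common ways to combine directional alignments, intersection and union; whether these were among the methods tested is an assumption. Alignments are sets of index pairs, and `symmetrize` is a hypothetical name.

```python
def symmetrize(zh_en, en_zh, how="intersection"):
    """Merge two directional alignments.

    zh_en: set of (zh_index, en_index) links from the Chinese-to-English run.
    en_zh: set of (en_index, zh_index) links from the reverse run.
    """
    flipped = {(zh, en) for en, zh in en_zh}
    if how == "intersection":
        return zh_en & flipped  # fewer links, usually more precise
    if how == "union":
        return zh_en | flipped  # more links, usually higher recall
    raise ValueError(f"unknown method: {how}")

zh_en = {(0, 0), (1, 1), (3, 14)}
en_zh = {(0, 0), (1, 1), (15, 3)}
print(sorted(symmetrize(zh_en, en_zh)))           # [(0, 0), (1, 1)]
print(sorted(symmetrize(zh_en, en_zh, "union")))  # [(0, 0), (1, 1), (3, 14), (3, 15)]
```

The choice matters for node alignment because a single wrong link puts its prime into a node's product and changes the value of every node above that word.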