# PFA Node Alignment Algorithm

Consider the parse trees of a Chinese-English parallel pair of sentences.

Each of the nodes stores a value.

All nodes are initialized with the value 1.

Each Word to Word alignment is assigned a unique prime number.

• For every word to word alignment, we do the following:

• Let p be the unique prime value assigned to the alignment.

• Let ws and wt be the aligned words on the source and target side.

• Assign the value p to the nodes corresponding to the words ws and wt.

• Example: “Australia” gets value 2, “is” gets value 3.

In case there are “one-to-many” alignments, they are considered as multiple “one-to-one” alignments, and all of these alignments are given the same prime value.

Example: “North Korea” is a single word on the Chinese side (北韩). That word is assigned the value 25, which is the product 5*5: one factor of 5 for each of its alignments to “North” and “Korea”.
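The value assignment above can be sketched in Python. This is only an illustration under stated assumptions: word alignments are given as hypothetical `(source_index, target_index)` pairs, the function names are made up for this sketch, and only simple one-to-many or many-to-one groups are handled.

```python
from itertools import count

def primes():
    """Yield 2, 3, 5, 7, ... by trial division (enough for sentence-sized input)."""
    found = []
    for n in count(2):
        if all(n % p for p in found):
            found.append(n)
            yield n

def assign_word_values(links):
    """links: list of (source_word_index, target_word_index) pairs (assumed format).

    Returns (src_values, tgt_values), each mapping a word index to its value.
    Links that share a word reuse that word's prime, so a one-to-many
    alignment is treated as several one-to-one links with the same prime.
    """
    gen = primes()
    prime_of_src, prime_of_tgt = {}, {}
    src_values, tgt_values = {}, {}
    for s, t in links:
        # Reuse the prime of an already-linked word, otherwise take a new one.
        p = prime_of_src.get(s) or prime_of_tgt.get(t) or next(gen)
        prime_of_src[s] = prime_of_tgt[t] = p
        # A word's value is the product of the primes of all its links.
        src_values[s] = src_values.get(s, 1) * p
        tgt_values[t] = tgt_values.get(t, 1) * p
    return src_values, tgt_values

# Toy links: source word 3 is aligned to two target words (11 and 12).
src_values, tgt_values = assign_word_values([(0, 0), (1, 1), (3, 11), (3, 12)])
print(src_values)  # {0: 2, 1: 3, 3: 25}
print(tgt_values)  # {0: 2, 1: 3, 11: 5, 12: 5}
```

Words that take part in no alignment never appear in the returned dictionaries, which corresponds to keeping their initial value of 1.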

• Once all the lexical items have values, we propagate the values up the tree as follows:

• Work bottom-up

• A node updates its value as the product of the values of its children.

• Values could become large!
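A minimal sketch of the bottom-up step, assuming a hypothetical `Node` class (not from the slides) whose leaves already carry their word values. Python integers have arbitrary precision, so large products are not a problem here; an implementation in a language with fixed-width integers would need a big-integer type.

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    label: str
    children: list = field(default_factory=list)
    value: int = 1  # every node is initialized with the value 1

def propagate(node):
    """Post-order walk: an internal node's value becomes the product of its children's values."""
    if node.children:
        product = 1
        for child in node.children:
            product *= propagate(child)
        node.value = product
    return node.value

# Toy English-side tree with word values 2, 3, 5 and 5 at the leaves.
tree = Node("S", [
    Node("NP", [Node("Australia", value=2)]),
    Node("VP", [
        Node("is", value=3),
        Node("NP", [Node("North", value=5), Node("Korea", value=5)]),
    ]),
])
print(propagate(tree))  # 150
```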

• Once all nodes have values, they can be aligned as follows:

• If a node on the Chinese side has the same value as a node on the English side, align them.

• If two nodes have equal values, take the node at the lowest level in the tree, excluding the lexical-level node.
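The matching step could look roughly like the sketch below. It makes several assumptions that the slides do not state: trees are nested `(label, value, children)` tuples with values already propagated, a node without children counts as the lexical level, "lowest" means deepest in the tree, and nodes with value 1 (no aligned words) are skipped.

```python
def lowest_nodes_by_value(tree):
    """Map each value to the deepest non-lexical node that carries it.

    tree: (label, value, children); a node with no children is treated as lexical.
    """
    best = {}  # value -> (depth, node)
    stack = [(tree, 0)]
    while stack:
        (label, value, children), depth = stack.pop()
        if children and value != 1:
            if value not in best or depth > best[value][0]:
                best[value] = (depth, (label, value, children))
        stack.extend((child, depth + 1) for child in children)
    return {value: node for value, (_, node) in best.items()}

def align_nodes(src_tree, tgt_tree):
    """Return (source_label, target_label, value) for every value found on both sides."""
    src = lowest_nodes_by_value(src_tree)
    tgt = lowest_nodes_by_value(tgt_tree)
    return [(src[v][0], tgt[v][0], v) for v in sorted(src.keys() & tgt.keys())]

# Toy trees; the POS labels (NR, VC, NNP, VBZ) are illustrative only.
zh = ("IP", 150, [
    ("NP", 2, [("NR", 2, [])]),
    ("VP", 75, [("VC", 3, []), ("NP", 25, [("NR", 25, [])])]),
])
en = ("S", 150, [
    ("NP", 2, [("NNP", 2, [])]),
    ("VP", 75, [("VBZ", 3, []), ("NP", 25, [("NNP", 5, []), ("NNP", 5, [])])]),
])
print(align_nodes(zh, en))
# [('NP', 'NP', 2), ('NP', 'NP', 25), ('VP', 'VP', 75), ('IP', 'S', 150)]
```

Because the comparison uses only the products, the order of the children never enters the match.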

Features of the algorithm:

Order of the constituents does not matter in node alignment.

Extra (unaligned) words inside a constituent are allowed, but as few of them as possible are included.

Extraction of Phrases:

Get the yields of the aligned nodes and build a phrase table tagged with the syntactic categories on the source and target sides!

Example:

NP # NP :: 澳洲 # Australia

All Phrases from this tree:

IP # S :: 澳洲 是 与 北韩 有 邦交 的 少数 国家 之一 。 # Australia is one of the few countries that have diplomatic relations with North Korea .

VP # VP :: 是 与 北韩 有 邦交 的 少数 国家 之一 # is one of the few countries that have diplomatic relations with North Korea

NP # NP :: 与 北韩 有 邦交 的 少数 国家 之一 # one of the few countries that have diplomatic relations with North Korea

VP # VP :: 与 北韩 有 邦交 # have diplomatic relations with North Korea

NP # NP :: 邦交 # diplomatic relations

NP # NP :: 北韩 # North Korea

NP # NP :: 澳洲 # Australia
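Entries in this format can be produced from a pair of aligned nodes by reading off their yields. The sketch below assumes a hypothetical tree encoding of `(label, children)` tuples with plain strings as words; the POS labels in the example are illustrative.

```python
def tree_yield(node):
    """Return the words under a node, left to right. A leaf is a plain string."""
    if isinstance(node, str):
        return [node]
    _label, children = node
    words = []
    for child in children:
        words.extend(tree_yield(child))
    return words

def phrase_entry(src_node, tgt_node):
    """Format one phrase-table line: SRC_CAT # TGT_CAT :: source yield # target yield."""
    return "{} # {} :: {} # {}".format(
        src_node[0],
        tgt_node[0],
        " ".join(tree_yield(src_node)),
        " ".join(tree_yield(tgt_node)),
    )

zh_np = ("NP", [("NR", ["北韩"])])
en_np = ("NP", [("NNP", ["North"]), ("NNP", ["Korea"])])
print(phrase_entry(zh_np, en_np))  # NP # NP :: 北韩 # North Korea
```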

• If the data is manually word-aligned, the word alignment error rate is very small, and so is the PFA node alignment error rate.

• What happens when word-alignments are done automatically?

• Evaluation Data: Treebank corpus.

• Parallel Chinese-English Treebank with manual word-alignments

• 3342 Sentence Pairs

• Node Alignments: 39874 (About 12/tree pair)

• NP to NP Alignments: 5427

• (Makes good phrase table!)

• With the manual alignments as the gold standard, evaluation was done with automatic word alignments.

Viterbi word alignments from the Chinese-English and reverse directions were merged using different algorithms to test the performance of node alignment.
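The slides do not name the merging algorithms. As an illustration only, the sketch below shows two standard ways of combining two directional alignments, intersection and union; the function name and the link format are assumptions made for this example.

```python
def merge_alignments(src_to_tgt, tgt_to_src, method="intersection"):
    """Combine two directional word alignments into one set of (s, t) links.

    src_to_tgt: set of (source_index, target_index) links.
    tgt_to_src: set of (target_index, source_index) links from the reverse direction.
    """
    # Flip the reverse-direction links so both sets use (source, target) order.
    flipped = {(s, t) for t, s in tgt_to_src}
    if method == "intersection":
        return src_to_tgt & flipped  # keep links both directions agree on
    if method == "union":
        return src_to_tgt | flipped  # keep links proposed by either direction
    raise ValueError("unknown method: " + method)

forward = {(0, 0), (1, 1), (3, 11)}
backward = {(0, 0), (11, 3), (12, 3)}
print(sorted(merge_alignments(forward, backward, "intersection")))  # [(0, 0), (3, 11)]
print(sorted(merge_alignments(forward, backward, "union")))
# [(0, 0), (1, 1), (3, 11), (3, 12)]
```

Intersection tends to give fewer, more reliable links, while union gives more links with more noise, so the choice changes which nodes end up with matching products.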