
Bilingual Co-Training for monolingual hyponymy-relation acquisition



  1. Bilingual Co-Training for monolingual hyponymy-relation acquisition Jong-Hoon Oh, Kiyotaka Uchimoto, Kentaro Torisawa ACL 2009

  2. Outline • Goal • Motivation • Co-Training Concept • Task • Co-Training Algorithm • System Architecture • Experiment • Conclusion

  3. Goal • Bilingual co-training to improve monolingual semantic knowledge. • A hyponym is a word whose meaning is included in that of another word (an is-a relationship): "hydrolase" is a hyponym of "enzyme". • Prior work mined Wikipedia to acquire hyponymy relations (Sumida and Torisawa, 2008: Hacking Wikipedia for Hyponymy Relation Acquisition). • That approach requires a manually labeled data set.

  4. Motivation • Developing high-level NLP applications requires accurate semantic knowledge. • Hyponymy-relation acquisition can be cast as a binary classification task over candidate word pairs. • This technique can inexpensively enlarge a semantic knowledge base. • Learning settings vary from language to language, so an instance classified reliably in one language may be classified unreliably in the other; the reliable classifications can be used to correct the unreliable ones.

  5. Bilingual Co-Training Concept

  6. Task • Hyponymy-relation acquisition from Wikipedia. • The original approach of Sumida and Torisawa (2008) recognized relations between words such as 酵素 (enzyme) and 加水分解酵素 (hydrolase), where the Japanese hyponym contains its hypernym as a substring; this cue fails for the English pair enzyme and hydrolase, which share no substring. • Solution: borrow reliably labeled instances from Japanese and add them to the English training data. • Swap roles back and forth continually to grow the data in both languages.

  7. Should we use Machine Translation? • No. • Since we are dealing only with nouns, simple dictionary lookup suffices, and consistent results were achieved without machine translation.

  8. Co-Training 1/5 • S and T are two different languages. • CL = {yes, no} is the set of binary class labels. • X = XS ∪ XT is the set of instances in languages S and T to be classified. • A classifier c assigns a class label cl in CL and a confidence value r in R+ to each instance: c(x) = (x, cl, r). • Support Vector Machines (SVMs) are used; the distance between a sample and the separating hyperplane serves as the confidence value r.
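A minimal sketch of this classifier interface in Python, assuming scikit-learn's LinearSVC as the SVM; the ConfidenceClassifier wrapper and its method names are illustrative, not the authors' code:

```python
# Sketch of c(x) = (x, cl, r), with the distance to the SVM's separating
# hyperplane used as the confidence value r (an assumption-laden example).
from sklearn.svm import LinearSVC

class ConfidenceClassifier:
    def __init__(self):
        self.svm = LinearSVC()

    def learn(self, feature_vectors, labels):
        # c = LEARN(L): fit the SVM on labeled feature vectors.
        self.svm.fit(feature_vectors, labels)
        return self

    def classify(self, x, feature_vector):
        # Signed distance to the hyperplane: its sign picks the label cl,
        # its magnitude serves as the confidence r in R+.
        score = self.svm.decision_function([feature_vector])[0]
        cl = self.svm.classes_[1] if score > 0 else self.svm.classes_[0]
        return (x, cl, abs(score))
```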

  9. Co-training 2/5 • A training set L is a subset of the Cartesian product X × CL. • A classifier c is trained from training data L: c = LEARN(L). • LS and LT are manually prepared training sets for S and T. • The bilingual instance dictionary DBI holds translation pairs of instances in XS and XT: DBI = {(s, t)} ⊆ XS × XT. • Example: (s = (enzyme, hydrolase), t = (酵素, 加水分解酵素)).
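As a small illustration, DBI could be represented as a mapping from an instance in XS to its translation in XT; the dict layout below is an assumption, and the single entry is the slide's own example:

```python
# Hypothetical representation of the bilingual instance dictionary DBI.
DBI = {
    ("enzyme", "hydrolase"): ("酵素", "加水分解酵素"),
}

def translate(instance, dbi=DBI):
    # Return the translated instance pair, or None if it is not registered.
    return dbi.get(instance)
```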

  10. Co-training 3/5 • c0S and c0T are learned from the manually labeled instances LS and LT. • ciS and ciT are applied to classify instances in XS and XT. • CRiS is the set of classification results of ciS on instances in XS that are not in LiS and are registered in DBI. • From CRiS, newly labeled instances are selected and added to the other language's training set. • TopN(CRiS) is the set of results ciS(x) whose confidence rS is among the N highest in CRiS. • ciS acts as the teacher and ciT as the student.
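The TopN selection step might look like the following sketch, reusing the (x, cl, r) triples from the classifier interface above; the function name and the default N = 900 (the value used in the experiments) are illustrative:

```python
def top_n(classification_results, n=900):
    # TopN(CR_i^S): keep the n results whose confidence r is highest.
    # Each result is a triple (x, cl, r) as defined on slide 8.
    return sorted(classification_results,
                  key=lambda result: result[2], reverse=True)[:n]
```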

  11. Co-training 4/5 • The teacher instructs the student on the class label of xT (the translation of xS through DBI) via clS, but only when the teacher is sufficiently confident (rS > threshold) and the student either lacks confidence (rT < theta) or already agrees (clS = clT); this avoids overruling a student that is confident but disagrees with the teacher. • The roles are then reversed. • Standard co-training relies on different feature views of the same instances; here the views are divided by language.
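A sketch of the transfer rule exactly as stated on this slide; the parameter names threshold and theta follow the slide, while the function itself is illustrative:

```python
def transfer_label(teacher_result, student_result, threshold, theta):
    # The teacher's label is handed to the student only if the teacher
    # is confident (r_S > threshold) and the student either lacks
    # confidence (r_T < theta) or already agrees (cl_S == cl_T).
    _, cl_s, r_s = teacher_result
    _, cl_t, r_t = student_result
    if r_s > threshold and (r_t < theta or cl_s == cl_t):
        return cl_s   # label added to the student's training data
    return None       # otherwise nothing is transferred
```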

  12. Co-Training 5/5

  13. System Architecture 1/4

  14. System Architecture - Candidate Extraction 2/4 • Each English and Japanese article is decomposed into its layout hierarchy: • Item: Subsection: List items • A pair drawn from this hierarchy, e.g. (Tiger, Siberian Tiger), is a hyponymy-relation candidate (see the sketch below).
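A minimal sketch of candidate extraction under the assumption that each article is a nested (label, children) outline and that every item is paired with each of its ancestors; both assumptions are for illustration only:

```python
def extract_candidates(node, ancestors=()):
    # Pair each item with every ancestor above it in the article outline.
    label, children = node
    pairs = [(ancestor, label) for ancestor in ancestors]
    for child in children:
        pairs += extract_candidates(child, ancestors + (label,))
    return pairs

article = ("Tiger", [("Subspecies", [("Siberian Tiger", [])])])
print(extract_candidates(article))
# [('Tiger', 'Subspecies'), ('Tiger', 'Siberian Tiger'),
#  ('Subspecies', 'Siberian Tiger')]
```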

  15. System Architecture - Hyponymy-Relation Classification 3/4 • hyper is a hypernym candidate. • hypo is a hyponym candidate. • (hyper, hypo) is the hyponymy-relation candidate to be classified.

  16. System Architecture - Bilingual Instance Dictionary Construction 4/4 • Wikipedia articles on the same topic in different languages are connected by cross-language links. • Linked English and Japanese articles are extracted and their titles are regarded as translation pairs. • These pairs are used to build the bilingual instance dictionary (sketched below).
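A sketch of the dictionary construction, assuming cross_links maps English article titles to the Japanese titles they are cross-language-linked to; the sample data is illustrative:

```python
cross_links = {"Enzyme": "酵素", "Hydrolase": "加水分解酵素"}

def build_instance_dictionary(en_candidates, cross_links):
    # An English (hyper, hypo) candidate becomes a translation pair of
    # instances when both titles have a linked Japanese counterpart.
    dbi = {}
    for hyper, hypo in en_candidates:
        if hyper in cross_links and hypo in cross_links:
            dbi[(hyper, hypo)] = (cross_links[hyper], cross_links[hypo])
    return dbi

print(build_instance_dictionary([("Enzyme", "Hydrolase")], cross_links))
# {('Enzyme', 'Hydrolase'): ('酵素', '加水分解酵素')}
```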

  17. Experiment 1/3 • May 2008 English Wikipedia and June 2008 Japanese Wikipedia. • 24,000 randomly selected hyponymy-relation candidates were used per language. • 8,000 relations were found in the manually checked data for both languages. • TinySVM was used, with 100 iterations, threshold = 1, and TopN = 900.

  18. Experiment 2/3 • Three experiments show the effects of bilingual co-training, training-data size, and the bilingual instance dictionary. • SYT = Sumida and Torisawa (2008); INIT = a classifier trained only on the initial training data; TRAN = a classifier trained with training data translated from the other language; BICO = bilingual co-training.

  19. Experiment 3/3 • Can performance always be improved through bilingual co-training with one strong and one weak classifier? • A strong classifier was trained on 20,000 instances in one language; weak classifiers for the other language were trained on 1,000; 5,000; 10,000; and 15,000 instances.

  20. Conclusion • BICO showed a 3.6%–10.9% improvement in F1. • It can reduce the cost of preparing new training data in other languages. • It can be useful for any language with a weak training set whenever a strong set exists in another language.
