Optimizing Automatic Name Transliteration Using OCR and NLP

Automatic Name Transliteration via OCR and NLP
Yu Cao, Tao Wang

Integration

Optical Character Recognition (OCR)
• ICDAR 2011 dataset
• characters embedded in natural scenes
• histogram of oriented gradients (HOG) features
• 8x8 window sliding across the image at a step of 2
• linear-kernel SVM
• 52 classes, i.e. capital and small letters
• overall character-level accuracy: 74%
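The feature-extraction step above can be sketched in plain Python. This is a minimal illustration, not the authors' implementation: the 8x8 window and step of 2 come from the poster, while the 9 unsigned orientation bins, the per-window L2 normalization, and the function names are assumptions. The resulting vector is what a 52-class linear SVM would be trained on; the classifier itself is not shown.

```python
import math

def sliding_windows(height, width, size=8, step=2):
    """Top-left corners of every size x size window that fits in the image."""
    return [(r, c)
            for r in range(0, height - size + 1, step)
            for c in range(0, width - size + 1, step)]

def orientation_histogram(image, top, left, size=8, bins=9):
    """Magnitude-weighted histogram of unsigned gradient orientations for one window."""
    hist = [0.0] * bins
    h, w = len(image), len(image[0])
    for r in range(top, top + size):
        for c in range(left, left + size):
            # Central differences, clamped at the image border.
            gx = image[r][min(c + 1, w - 1)] - image[r][max(c - 1, 0)]
            gy = image[min(r + 1, h - 1)][c] - image[max(r - 1, 0)][c]
            magnitude = math.hypot(gx, gy)
            angle = math.degrees(math.atan2(gy, gx)) % 180.0  # unsigned orientation
            hist[int(angle / (180.0 / bins)) % bins] += magnitude
    return hist

def hog_features(image, size=8, step=2, bins=9):
    """Concatenate L2-normalized histograms from all windows (bins=9 is an assumption)."""
    feats = []
    for top, left in sliding_windows(len(image), len(image[0]), size, step):
        hist = orientation_histogram(image, top, left, size, bins)
        norm = math.sqrt(sum(v * v for v in hist)) + 1e-6
        feats.extend(v / norm for v in hist)
    return feats

# Toy 16x16 image with a single vertical edge (left half dark, right half bright).
image = [[0] * 8 + [1] * 8 for _ in range(16)]
features = hog_features(image)
```

On a 16x16 patch, an 8x8 window with step 2 fits at 5 positions per axis, so the sketch produces 25 windows and a 225-dimensional feature vector.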

Bayesian Correction
• character-level bigram language model
• character-level accuracy improved to 75.3%
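One way to read the correction step is as decoding the most probable character sequence under the product of the OCR classifier's per-character scores and a character bigram prior. The sketch below uses Viterbi decoding for that; the poster does not say which decoding procedure was used, and the alphabet, probabilities, and function names here are toy assumptions chosen only to show the bigram model overriding a wrong OCR guess.

```python
import math

def greedy_decode(emissions):
    """Pick the top OCR hypothesis at each position independently."""
    return "".join(max(em, key=em.get) for em in emissions)

def viterbi_correct(emissions, bigram, alphabet, floor=1e-6):
    """Most probable string under P(char | image) * P(char | previous char).

    emissions: one dict per position mapping char -> OCR score.
    bigram: dict mapping (prev, cur) -> P(cur | prev); "^" marks the word start.
    """
    def lp(p):
        return math.log(max(p, floor))  # floor avoids log(0) for unseen events

    # best[c] = (log score, best string ending in c)
    best = {c: (lp(bigram.get(("^", c), 0)) + lp(emissions[0].get(c, 0)), c)
            for c in alphabet}
    for em in emissions[1:]:
        nxt = {}
        for c in alphabet:
            score, path = max((best[p][0] + lp(bigram.get((p, c), 0)), best[p][1])
                              for p in alphabet)
            nxt[c] = (score + lp(em.get(c, 0)), path + c)
        best = nxt
    return max(best.values())[1]

# Toy values (assumptions): the OCR slightly prefers "u" over "n" at position 2.
alphabet = "anu"
bigram = {
    ("^", "a"): 0.5, ("^", "n"): 0.3, ("^", "u"): 0.2,
    ("a", "a"): 0.1, ("a", "n"): 0.7, ("a", "u"): 0.2,
    ("n", "a"): 0.5, ("n", "n"): 0.3, ("n", "u"): 0.2,
    ("u", "a"): 0.2, ("u", "n"): 0.6, ("u", "u"): 0.2,
}
emissions = [
    {"a": 0.90, "n": 0.05, "u": 0.05},
    {"a": 0.05, "n": 0.40, "u": 0.55},
    {"a": 0.05, "n": 0.90, "u": 0.05},
]
raw = greedy_decode(emissions)
corrected = viterbi_correct(emissions, bigram, alphabet)
```

With these toy numbers the uncorrected reading is "aun", while the bigram prior makes "ann" the highest-scoring sequence.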

Named Entity Recognition (NER)
• essentially two types of labels, “PERSON” and “NONPERSON”
• MUC-7 corpora
• maximum entropy Markov model
• set of features: “CUR_WORD”, “PREV_LABEL”, “MID_INITIAL”, “IN_DICT”, “IN_NAME_DATABASE”, “NEXT_WORD”
• F1 score of 77.5% (precision 76.9%, recall 78.1%)
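The feature set listed above could be computed per token roughly as follows. The feature names come from the poster (with the name-database feature written as IN_NAME_DATABASE), but their exact definitions are not given there, so the definitions below are assumptions: MID_INITIAL is taken to mean a single capital letter followed by a period, and the dictionary and name database are plain lowercase sets. The small `f1_score` helper only checks that the reported precision and recall are consistent with the reported F1.

```python
import re

def extract_features(tokens, i, prev_label, dictionary, name_db):
    """Features for token i, conditioned on the previous label as in an MEMM."""
    word = tokens[i]
    next_word = tokens[i + 1] if i + 1 < len(tokens) else "</s>"
    return {
        "CUR_WORD": word.lower(),
        "PREV_LABEL": prev_label,
        # Assumed definition: a middle initial such as "F."
        "MID_INITIAL": bool(re.fullmatch(r"[A-Z]\.", word)),
        # Assumed definition: the word appears in a common-word dictionary.
        "IN_DICT": word.lower() in dictionary,
        # Assumed definition: the word appears in a list of known person names.
        "IN_NAME_DATABASE": word.lower() in name_db,
        "NEXT_WORD": next_word.lower(),
    }

def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)

# Hypothetical resources and sentence, for illustration only.
dictionary = {"spoke", "the", "today"}
name_db = {"john", "kennedy"}
tokens = ["John", "F.", "Kennedy", "spoke"]
feats = extract_features(tokens, 1, "PERSON", dictionary, name_db)
reported_f1 = f1_score(0.769, 0.781)
```

Plugging in the reported precision (76.9%) and recall (78.1%) gives an F1 of about 0.775, which matches the 77.5% on the poster.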

Transliteration
• example: Francisco → 弗朗西斯科
• character-level translation model
• training data: 4,256 English-Chinese name pairs obtained online
• trigram Chinese language model
• alignment models: IBM Models 1, 3, and 4
• human evaluation
• 120 English names obtained by NER for testing
• acceptance score: 100 ± 2 out of 120
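To make the alignment step concrete, here is a minimal sketch of IBM Model 1 trained with EM on a handful of toy pairs. It covers only the simplest of the three alignment models named above, omits the NULL token, and does not include the trigram language model or a decoder. The three training pairs are illustrative, not drawn from the authors' 4,256-pair dataset, and the segmentation of English names into units is an assumption.

```python
from collections import defaultdict

def train_ibm_model1(pairs, iterations=20):
    """Estimate t(target | source) by EM with a uniform alignment prior."""
    targets = {t for _, tgt in pairs for t in tgt}
    t_prob = defaultdict(lambda: 1.0 / len(targets))  # uniform initialization
    for _ in range(iterations):
        count = defaultdict(float)
        total = defaultdict(float)
        # E-step: distribute each target unit over the source units it may align to.
        for src, tgt in pairs:
            for t in tgt:
                z = sum(t_prob[(t, s)] for s in src)
                for s in src:
                    frac = t_prob[(t, s)] / z
                    count[(t, s)] += frac
                    total[s] += frac
        # M-step: renormalize expected counts per source unit.
        for (t, s), c in count.items():
            t_prob[(t, s)] = c / total[s]
    return t_prob

def best_target(t_prob, source, candidates):
    """Most probable target unit for one source unit."""
    return max(candidates, key=lambda t: t_prob[(t, source)])

# Toy training pairs (assumptions): source units paired with Chinese characters.
pairs = [
    (["li"], ["李"]),
    (["li", "na"], ["李", "娜"]),
    (["an", "na"], ["安", "娜"]),
]
t_prob = train_ibm_model1(pairs)
candidates = ["李", "娜", "安"]
mapping = {s: best_target(t_prob, s, candidates) for s in ["li", "na", "an"]}
```

Because "li" appears alone with 李, EM can attribute 娜 to "na" and then 安 to "an", even though no explicit alignments were provided.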
