Explore the complexities of multilingual machine learning (MLR) and the necessity to adapt to diverse data environments. Discover the significance of different languages, varying query types, and emerging features for optimal MLR performance.
International/JP MLR Issues
• Have to do more with less data
• Blending different languages?
• Can’t necessarily filter adult content
• May need new/different features
• Different types of queries: English, bracket, phrase, etc.
• Metrics designed for English
• China has lots more spam
• Japan has much less spam
• Germany looks 10-20% ahead of Google by DCG
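The DCG comparison on this slide can be illustrated with a minimal sketch. The slides do not say which DCG variant or gain scale was used, so the log2 position discount and the raw-gain numerator here are assumptions, and `dcg` and `relative_lead` are hypothetical helper names.

```python
import math

def dcg(gains, k=None):
    """Discounted cumulative gain over a ranked list of relevance gains.

    Assumption: gain / log2(rank + 1), one common variant. The slides
    do not specify the formula actually used.
    """
    if k is not None:
        gains = gains[:k]
    return sum(g / math.log2(rank + 1) for rank, g in enumerate(gains, start=1))

def relative_lead(dcg_a, dcg_b):
    """Fraction by which system A's DCG exceeds system B's."""
    return (dcg_a - dcg_b) / dcg_b
```

Under these assumptions, a statement such as "10-20% ahead by DCG" would correspond to `relative_lead` returning a value between 0.10 and 0.20 when averaged over a query set.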
Different features important for JP
• http://internal.inktomi.com/~lukeb/FeatureImportance.html
• “Linkflux”
• How soon the word appears in the document
• Is the first word in query in the title
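Two of the features above, how soon a word appears in the document and whether the first query word is in the title, could be computed roughly as follows. This is a sketch only: it assumes the text is already tokenized (Japanese needs a segmenter, which is not shown), and the function names and the choice to normalize position by document length are hypothetical.

```python
def first_occurrence_fraction(term, doc_tokens):
    """Position of the first occurrence of term, as a fraction of document length.

    Returns 0.0 if the term is the first token and 1.0 if it never appears.
    Normalizing by length is an assumption, not taken from the slides.
    """
    for i, tok in enumerate(doc_tokens):
        if tok == term:
            return i / len(doc_tokens)
    return 1.0

def first_query_word_in_title(query_tokens, title_tokens):
    """True if the first query token occurs anywhere in the title tokens."""
    return bool(query_tokens) and query_tokens[0] in title_tokens
```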
New features for JP
• Query Word Length very important
• Query type important
• Phonetic url match
• Future:
  • vcano match
  • Matching segmented chunks
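As a rough illustration of the query word length and query type features, the sketch below measures the length of each whitespace-separated query word and assigns a coarse type. The slides do not define either feature precisely, so the type labels (taken loosely from the "English/Bracket/Phrase" list on the earlier slide), the detection rules, and the function names are all assumptions.

```python
def query_word_lengths(query):
    """Character length of each whitespace-separated word in the query."""
    return [len(tok) for tok in query.split()]

def query_type(query):
    """Coarse query type; the rules here are illustrative guesses only."""
    q = query.strip()
    # Assumed rule: a query wrapped in double quotes is a phrase query.
    if len(q) >= 2 and q[0] == '"' and q[-1] == '"':
        return "phrase"
    # Assumed rule: any ASCII or CJK bracket character marks a bracket query.
    if any(ch in q for ch in "[]「」『』【】"):
        return "bracket"
    # Assumed rule: a non-empty, ASCII-only query is treated as English.
    if q and q.isascii():
        return "english"
    return "other"
```

For example, `query_word_lengths("東京 ホテル")` gives one length per word, and `query_type("tokyo hotel")` falls into the ASCII-only branch.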