Join the DLSI Lexical Analysis project led by Prof. Brook Wu and Ph.D. student Xin Chen, which focuses on the complexities of processing textual information. The project addresses issues such as word sense ambiguities (e.g., "mouse" as an animal vs. a device) and part-of-speech ambiguities (e.g., "offer" as a noun vs. a verb). The goal is to create link anchors for key concepts in documents. Participants will analyze glossaries and thesauri using techniques such as tokenization and part-of-speech tagging. Proficiency in Java or C++ is required. This is an exciting opportunity as text processing gains importance in industry.
DLSI Lexical Analysis
Prof. Brook Wu and Ph.D. student Xin Chen
Lexical Analysis
• Focus on processing “text”
• Difficulties:
  • Word sense ambiguities, e.g., a regular “mouse” vs. a computer “mouse”
  • Irregularities, e.g., datum, data
  • Part-of-speech tag ambiguities, e.g., an “offer” (noun) vs. “Prof. Bieber offers …” (verb)
Lexical Analysis in DLSI project
• Purpose: generate link anchors for important concepts in returned documents.
• Work involved:
  • Find glossaries/thesauri on the web or contact DLSI partners for information.
  • Organize them into a master file.
  • Find glossary/thesaurus terms in text using lexical analysis techniques, including tokenization, part-of-speech tagging, parsing, and matching.
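The tokenization and matching steps listed above can be sketched in Java roughly as follows. This is a minimal illustration, not the project's actual implementation: the class name GlossaryLinker, the in-memory map standing in for the master file, the "#..." link targets, and the HTML anchor output are all assumptions made for the example. Part-of-speech tagging and parsing are left out.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch: tokenize text, then wrap known glossary terms in link anchors.
class GlossaryLinker {
    // A token is a run of letters/digits, optionally joined by hyphens ("part-of-speech").
    private static final Pattern WORD =
            Pattern.compile("[\\p{L}\\p{Nd}]+(?:-[\\p{L}\\p{Nd}]+)*");
    // Assumed upper bound on the number of tokens in a glossary term.
    private static final int MAX_TERM_TOKENS = 4;

    // Stand-in for the master file: normalized term -> link target (assumed format).
    private final Map<String, String> glossary = new LinkedHashMap<>();

    void addTerm(String term, String target) {
        glossary.put(normalize(term), target);
    }

    // Tokenization: split text into word tokens, dropping punctuation and whitespace.
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        Matcher m = WORD.matcher(text);
        while (m.find()) {
            tokens.add(m.group());
        }
        return tokens;
    }

    // Normalization: lowercase tokens joined by single spaces.
    static String normalize(String text) {
        return String.join(" ", tokenize(text)).toLowerCase(Locale.ROOT);
    }

    // Matching: at each token, try the longest candidate phrase first.
    String link(String text) {
        List<int[]> spans = new ArrayList<>(); // [start, end) offsets of each token
        Matcher m = WORD.matcher(text);
        while (m.find()) {
            spans.add(new int[] {m.start(), m.end()});
        }

        StringBuilder out = new StringBuilder();
        int copied = 0; // text before this offset has already been written to out
        int i = 0;
        while (i < spans.size()) {
            int matchedEnd = -1;
            String target = null;
            for (int j = Math.min(spans.size(), i + MAX_TERM_TOKENS) - 1; j >= i; j--) {
                String candidate = normalize(text.substring(spans.get(i)[0], spans.get(j)[1]));
                target = glossary.get(candidate);
                if (target != null) {
                    matchedEnd = j;
                    break;
                }
            }
            if (matchedEnd < 0) {
                i++; // no glossary term starts at this token
                continue;
            }
            int start = spans.get(i)[0];
            int end = spans.get(matchedEnd)[1];
            out.append(text, copied, start)
               .append("<a href=\"").append(target).append("\">")
               .append(text, start, end)
               .append("</a>");
            copied = end;
            i = matchedEnd + 1;
        }
        return out.append(text.substring(copied)).toString();
    }

    public static void main(String[] args) {
        GlossaryLinker linker = new GlossaryLinker();
        linker.addTerm("lexical analysis", "#lexical-analysis");
        linker.addTerm("tokenization", "#tokenization");
        System.out.println(linker.link("Lexical analysis starts with tokenization."));
    }
}
```

Matching here is exact after lowercasing, so this sketch would not link “data” to a glossary entry for “datum”, nor tell the noun “offer” from the verb. Handling those cases is where part-of-speech tagging and a treatment of irregular forms would come in.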
Qualifications and Supervision
• Why participate: text processing and lexical analysis are growing in popularity because text holds very rich information. Industry will want people who know how to process documents effectively.
• Qualifications:
  • Proficiency in Java or C++
• Supervision:
  • A team of up to 3 students will be supervised by Prof. Wu but led mainly by Xin Chen, a Ph.D. candidate in IS.