
Sketch engine for Chinese

Sketch engine for Chinese: discussion notes. Wordsketch, subsequently Sketch Engine, was developed by Kilgarriff et al. at Brighton. It gives automatic, corpus-based summaries of a word’s grammatical and collocational behaviour.



Presentation Transcript


  1. Sketch engine for Chinese Discussion notes

  2. Wordsketch, subsequently Sketch Engine
     • Was developed by Kilgarriff et al. at Brighton
     • Gives automatic, corpus-based summaries of a word’s grammatical and collocational behaviour
     • Captures information in a more accessible way than hundreds of KWIC lines
     • Uses an MI-based salience algorithm
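The slide says only that the salience algorithm is "MI based" and gives no formula. As a rough illustration, the sketch below computes plain pointwise mutual information from corpus counts. The function name, its parameters and the counts are all assumptions made for this example; the actual Wordsketch score may weight or adjust MI differently.

```python
import math


def mutual_information(pair_count, word_count, collocate_count, total):
    """Pointwise MI in bits: log2(P(word, collocate) / (P(word) * P(collocate))).

    All four arguments are raw corpus counts (hypothetical here).
    """
    if min(pair_count, word_count, collocate_count, total) <= 0:
        raise ValueError("all counts must be positive")
    return math.log2(pair_count * total / (word_count * collocate_count))


# Invented counts: the pair occurs 8 times, the word 64 times,
# the collocate 128 times, in a corpus of 1,048,576 tokens.
score = mutual_information(pair_count=8, word_count=64,
                           collocate_count=128, total=1_048_576)
print(score)
```

A score of zero means the two words co-occur exactly as often as chance predicts; larger positive values mean a stronger association.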

  3. Other corpus query tools do collocational salience too, but…
     • Sketch Engine uses lemmata, not word-forms
       • So that eat and eats are treated the same
     • And it takes account of grammatical relations
       • So that The plane banks and The investment banks are treated separately
       • And (if the corpus is appropriately parsed) He robs banks and He robbed the bank would be accorded similar treatment
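The two points above (lemmata rather than word-forms, and counts kept separate per grammatical relation) can be sketched as counting (relation, lemma, lemma) triples. The toy lemma table, the relation names and the hand-written "parsed" triples below are assumptions for illustration only; a real system would take them from a lemmatiser and a parsed corpus.

```python
from collections import Counter

# Hypothetical toy lemma table; a real lemmatiser would replace this.
LEMMAS = {"banks": "bank", "robs": "rob", "robbed": "rob", "eats": "eat"}


def lemma(word_form):
    """Map a word-form to its lemma, falling back to the lowercased form."""
    form = word_form.lower()
    return LEMMAS.get(form, form)


def count_relations(triples):
    """Count (relation, head lemma, dependent lemma) triples."""
    counts = Counter()
    for relation, head, dependent in triples:
        counts[(relation, lemma(head), lemma(dependent))] += 1
    return counts


# Hand-written triples standing in for parser output (relation names invented).
parsed = [
    ("object", "robs", "banks"),          # He robs banks
    ("object", "robbed", "bank"),         # He robbed the bank
    ("subject", "banks", "plane"),        # The plane banks
    ("modifier", "banks", "investment"),  # The investment banks
]
counts = count_relations(parsed)
print(counts[("object", "rob", "bank")])
```

Because the key includes the relation, the verb use in "The plane banks" and the noun use in "The investment banks" never share a count, while "robs banks" and "robbed the bank" fall into the same cell.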

  4. Grammatical relations example
     • Unary relations: Word2 and Prep are not specified
     • Binary relations: Prep not specified
     • Binary relations: Word2 not specified
     • Trinary relations

  5. Sketch Engine modules
     • Concordance
       • KWIC or sentence context
     • Thesaurus
       • A list of “similar” words
     • Sketch differences, for distinguishing near-synonyms
       • If both lemmata x and y have strong collocational salience with a, then they are near-synonyms
     • Wordsketch
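One way to read the near-synonym criterion above is as an overlap between the sets of salient collocates of two lemmata. The sketch below is only that reading, not the thesaurus or sketch-difference algorithm itself: the threshold, the function names and all salience scores are invented for the example.

```python
def salient_collocates(scores, threshold):
    """Return the collocates whose salience score reaches the threshold."""
    return {collocate for collocate, score in scores.items() if score >= threshold}


def compare(scores_x, scores_y, threshold=5.0):
    """Split salient collocates into shared and lemma-specific sets."""
    x = salient_collocates(scores_x, threshold)
    y = salient_collocates(scores_y, threshold)
    union = x | y
    overlap = len(x & y) / len(union) if union else 0.0
    return {"shared": x & y, "only_x": x - y, "only_y": y - x, "overlap": overlap}


# Invented salience scores for two near-synonymous adjectives.
clever = {"idea": 7.1, "trick": 8.0, "child": 6.2, "the": 0.3}
intelligent = {"child": 6.5, "idea": 5.4, "design": 7.7, "the": 0.2}

result = compare(clever, intelligent)
print(result["shared"], result["only_x"], result["only_y"], result["overlap"])
```

The shared set supports treating the two words as similar, while the two lemma-specific sets are the kind of contrast a sketch difference is meant to surface.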

  6. Sample of grammatical relation definitions script (m4 language)
     • define(`wh_word',`[tag="AVQ"|tag="DTQ"|tag="PNQ"]')
     • define(`whether_if',`[tag="PNQ" & word="if" |word="whether"]')
     • define(`determiner',`[tag="AT."|tag="DT."|tag=poss_pro]')
     • define(`conjunction',`"CJC"')
     • define(`simple_neg',`"XX."')
     • define(`rel_start',`[tag="DTQ"|tag="PNQ"|tag=that_comp]')
     • define(`adv_neg',`[tag=any_adv|tag=simple_neg]')
     • define(`number',`"[OC]RD"')
     • define(`goal_adv',`[word="back"|word="over"|word="home"|word="away"|word="out"]')
     • define(`long_np',`[tag="AT."|tag="DT."|tag=poss_pro|tag=number|tag=any_adv|tag=any_adj|tag=genitive]{0,3} any_noun{0,2} 2:any_noun [tag!=any_noun & tag != genitive]')
     • define(`np_start',`[tag="AT."|tag="DT."|tag=poss_pro|tag=number|tag=any_adj|tag=any_noun]')
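The definitions above build larger patterns out of smaller named ones, for example adv_neg refers to simple_neg. The sketch below imitates that name-by-name substitution in Python so the effect is easy to see. It is a simplification written for this note, not an m4 implementation: it ignores m4's quoting rules and macro arguments, and the names `expand` and `MACROS` are made up here.

```python
import re

# A few of the definitions from the slide, as plain name -> body pairs.
MACROS = {
    "simple_neg": '"XX."',
    "number": '"[OC]RD"',
    "adv_neg": "[tag=any_adv|tag=simple_neg]",
}

WORD = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def expand(text, macros, depth=0):
    """Replace every word that names a macro with its (re-expanded) body."""
    if depth > 20:
        raise RecursionError("macro definitions appear to be circular")

    def substitute(match):
        name = match.group(0)
        if name in macros:
            return expand(macros[name], macros, depth + 1)
        return name  # not a macro: leave the word unchanged

    return WORD.sub(substitute, text)


print(expand("adv_neg", MACROS))
print(expand("[tag=number]", MACROS))
```

Names with no definition in the table, such as any_adv, are left as they are, which mirrors how the sample relies on definitions not shown on the slide.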

  7. Applications
     • Intended as an aid to lexicographers
     • At least one paper on MT application
     • Could be used in pedagogical applications
       • Earlier NSF grant aimed at a complete Chinese learning platform, with Wordsketch as a module
     • Comparison of similar lexemes cross-linguistically
       • Yiching is publishing about express vs biaoshi, and this work may use Wordsketch

  8. Chinese Wordsketch
     • Kilgarriff et al. report that Wordsketch can be ported to any language
     • Pavel Rychly in the Czech Republic has implemented concordancing at Chinese character level only
     • AS has acquired Chinese Gigaword, and POS-tagged it automatically
     • No parsing has been attempted so far
     • Grammatical relations ruleset for Chinese is needed
     • I would plan to
       • contribute to the writing of this ruleset
       • collaborate on cross-linguistic lexical analyses, using Wordsketch where possible
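Concordancing "at Chinese character level only" means searching for a string of characters with no word segmentation. A minimal sketch of such a KWIC lookup follows; the function name, the context width and the sample sentence are invented for illustration and do not describe the actual implementation.

```python
def kwic(text, keyword, width=5):
    """Return (left context, keyword, right context) for every match.

    Matching is purely by character position, with no word segmentation.
    """
    lines = []
    start = text.find(keyword)
    while start != -1:
        end = start + len(keyword)
        left = text[max(0, start - width):start]
        right = text[end:end + width]
        lines.append((left, keyword, right))
        start = text.find(keyword, start + 1)
    return lines


# Invented example sentence containing 表示 (biaoshi) twice.
sample = "他表示同意，她也表示支持。"
for left, key, right in kwic(sample, "表示", width=2):
    print(left, key, right)
```

Because nothing here knows where words begin or end, a query can also match across word boundaries, which is one reason lemma-based sketches need tagging and a grammatical relations ruleset on top of this.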

  9. Links
     • http://nlp.fi.muni.cz/projects/bonito2/chinese/
       • test chin
     • http://www.sketchengine.co.uk/sampler/
       • ssmith ssmith
