1 / 33

Automatic Identification of Similar Languages, Varieties, and Dialects

This lecture discusses the automatic identification of similar languages, varieties, and dialects using traditional and deep learning models. It includes examples and code snippets.

lovej
Download Presentation

Automatic Identification of Similar Languages, Varieties, and Dialects

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. LING/C SC 581: Advanced Computational Linguistics Lecture 12 Feb 19th

  2. Administrivia • Hope you sent feedback on the last lecture on HMMs by Tatjana Scheffler. • This Thursday (Feb 21st), we have another guest lecture from faculty candidate Marcos Zampieri. • His job talk on "Automatic Identification of Similar Languages, Varieties, and Dialects: Between traditional and deep learning models" is tomorrow at noon in CHEM 209

  3. Last Time: Perl Interface to WordNet 3.0

  4. Last Time: Perl Interface to WordNet 3.0 • Files: • bfs.perl(basic code) • bfs4.perl (all minimal length solutions) • Example: perl bfs4.perl lorry#n#1 car#n#1 Found at distance 5 (1011 nodes explored) car#n#1 hypo motor_vehicle#n#1 hypo self-propelled_vehicle#n#1 hypo wheeled_vehicle#n#1 hype wagon#n#1 hype lorry#n#1 All minimal solutions found

  5. cpanm on Ubuntu • sudo apt-get install cpanminus • which cpanm /usr/bin/cpanm • sudocpanm WordNet::QueryData • sudo apt-get install pmtools

  6. WordNet::QueryData on Ubuntu

  7. WordNet Verb senses • destroy#v#1-4

  8. Framenet: destroy • https://framenet2.icsi.berkeley.edu/fnReports/data/frameIndex.xml?frame=Destroying • https://framenet2.icsi.berkeley.edu/fnReports/data/frameIndex.xml?frame=Killing • https://framenet2.icsi.berkeley.edu/fnReports/data/frameIndex.xml?frame=Stimulate_emotion

  9. Framenet • https://framenet.icsi.berkeley.edu/fndrupal/

  10. WordNet Verb senses: destroy • Coordinate terms (up 1, down 1)

  11. WordNet Verb senses: destroy • Troponyms (ways to):

  12. WordNet Verb senses: destroy • Derivationally related to (parts of speech):

  13. WordNet Adjective senses: happy • Adjective

  14. WordNet Adjective senses: happy • Antonyms: unhappy#a#1 happy#a#1

  15. WordNet Adjective senses: happy • Value of: • Related to:

  16. WordNet Adjective senses: happy • Pertains to: in old WordNet version (1.7.1) abaxialsee also axial no connection here… as we’ll see

  17. A simple program

  18. perldocWordNet::QueryData • Start up • use WordNet::QueryData; • my $wn = WordNet::QueryData->new( noload => 1); • noload => 0 (caches indexes in computer memory, slower startup, faster lookup time)

  19. perldocWordNet::QueryData • querySense, queryWord • 1st argument • (1) word returns list of word#pos • Example: $wn-­‐>querySense("run")  • Answer: run#n, run#v • (2) word#posreturns list of word#pos#sense • Example: $wn-­‐>querySense("run#v")  • Answer: run#v#1, run#v#2, run#v#3, run#v#4, run#v#5, run#v#6, run#v#7, run#v#8, run#v#9, run#v#10, run#v#11, run#v#12, run#v#13, run#v#14, run#v#15, run#v#16, run#v#17, run#v#18, run#v#19, run#v#20, run#v#21, run#v#22, run#v#23, run#v#24, run#v#25, run#v#26, run#v#27, run#v#28, run#v#29, run#v#30, run#v#31, run#v#32, run#v#33, run#v#34, run#v#35, run#v#36, run#v#37, run#v#38, run#v#39, run#v#40, run#v#41 • (3) word#pos#sense + 2nd argument: a relation

  20. perldocWordNet::QueryData • querySense, queryWord • 1st argument • (3) word#pos#sense+ 2nd argument: a relation returns list of word#pos#sense • Example: $wn-->querySense("cat#n#1", "hypo") • Answer: domestic_cat#n#1, wildcat#n#3

  21. perldocWordNet::QueryData • To find other members of the synsets: perlcat.perl domestic_cat#n#1 house_cat#n#1 Felis_domesticus#n#1 Felis_catus#n#1 wildcat#n#3

  22. perldocWordNet::QueryData • Possible relations for queryWord: • “also” - also see • “ants” – antonyms • “deri” - derived forms (nouns and verbs only) • “part” - participle of verb (adjectives only) • “pert” - pertainym (pertains to noun) (adjectives only) • “vgrp” - verb group (verbs only)

  23. perldocWordNet::QueryData • Possible relations for querySense: • also - also see • glos - word definition • syns - synset words • hype – hypernyms • inst - instance of • hypes - hypernyms and "instance of” • hypo – hyponyms • hasi - has instance • hypos - hyponums and "has instance” • mmem - member meronyms • msub - substance meronyms • mprt - part meronyms • mero - all meronyms • hmem - member holonyms • hsub - substance holonyms • hprt - part holonyms • holo - all holonyms • attr - attributes (?) • sim - similar to (adjectives only) • enta - entailment (verbs only) • caus - cause (verbs only) • domn - domain – all • dmnc - domain – category • dmnu - domain – usage • dmnr - domain – region • domt - member of domain - all (nouns only) • dmtc - member of domain - category (nouns only) • dmtu - member of domain - usage (nouns only) • dmtr - member of domain - region (nouns only)

  24. perldocWordNet::QueryData • validForms: 1 argument word or word#pos • It returns a list of all alternate forms (alternate spellings, conjugations, plural/singular forms, etc.). perlvalidforms.perl go#v Note: in documentation “lay down#v” returns lay_down#v, lie_down#v “bank” returns bank#nbank#v

  25. perldocWordNet::QueryData • level: one argument word#pos#sense 13. 8. 7. 3. 1.

  26. wnb (WordNet browser) • Exercise: • find the relationship between minibike and convertible

  27. Programmed search CMDLINE PARAMETERS 1. word#pos#sense 2. word#pos#sense OPTIONAL $max can be set using a 3rdcmdline parameter for bfs4.perl

  28. Programmed search • We can write a program to find the minimum distance between word senses: bfs.perl (on website)

  29. Programmed search mark is used to count the depth of breadth-first search hash %seen holds already visited nodes so we don’t loop semantic distance = depth of breadth-first search when $end is matched

  30. Programmed search Example Synset: minibike#n#1 motorbike#n#1

  31. Programmed search get node to expand bookkeeping for depth of search record node as seen Recall Perl array: end (push/pop), beginning (shift/unshift)

  32. Programmed search node is related to newnode via rel increment node count if $end is found, jump out of nested loop up to scope FOUND if we haven’t seen newnode before, queue it for later expansion

  33. Programmed search only two ways to get to the end of the program, either we found our target node $end or $n is no longer < $max

More Related