GoogleDictionary

Presentation Transcript


  1. GoogleDictionary Paul Nepywoda Alla Rozovskaya

  2. Goal • Develop a tool for English that, given a word, will illustrate its usage

  3. Who Will Benefit • Learners of English • Teachers of English • Native speakers who wish to find common usages of a word

  4. Similar Tools? • Dictionaries, BUT our tool • focuses on how words are used rather than on defining their meanings • ranks expressions by frequency • extracts examples directly from the contexts in which the word occurs

  5. Similar Tools? • Google, BUT our tool • focuses on finding high-frequency neighboring words rather than simply the documents that contain the target word

  6. Data Resources • Corpus of newspaper articles (3.5 million words) [used for the demo] • Advantage: large amount of data • Disadvantage: limited domain • Use a search engine to build a corpus of documents containing the target word • Advantages: various domains, dynamic data source • Disadvantage: time needed to download the documents

  7. Implementation (1) • Search a corpus for the most typical words by extracting the words within a fixed-size window around the target word and ranking them by frequency • Compute the rank of single words and of word pairs within the window
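
The windowing step above can be sketched in Python as follows. This is a minimal illustration, not the authors' code: the regex tokenizer, the default window of 3 tokens on each side, and the reading of "pairs of words" as pairs of context words that co-occur in the same window are all assumptions.

```python
import re
from collections import Counter
from itertools import combinations


def tokenize(text):
    """Lowercase and keep runs of letters/apostrophes (a simplifying assumption)."""
    return re.findall(r"[a-z']+", text.lower())


def window_counts(tokens, target, window=3):
    """Count single context words and pairs of context words found
    within `window` tokens on either side of each occurrence of `target`."""
    singles = Counter()
    pairs = Counter()
    for i, tok in enumerate(tokens):
        if tok != target:
            continue
        left = tokens[max(0, i - window):i]
        right = tokens[i + 1:i + 1 + window]
        # Drop other occurrences of the target so it never counts as its own context.
        context = [w for w in left + right if w != target]
        singles.update(context)
        # Assumed reading of "pairs": two context words sharing a window, in text order.
        pairs.update(combinations(context, 2))
    return singles, pairs
```

For example, `window_counts(tokenize("Come in. Please come in and come back soon."), "come", window=1)` ranks "in" first, since it follows two of the three occurrences of "come".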

  8. Implementation (2) • Computing the rank of an expression • Tf: raw count • Idf of a word: • Position normalization: reward context words closer to the target
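
One way the three factors on slide 8 might combine is sketched below. Because the slide's own idf formula did not survive, the standard definition idf(w) = log(N / df(w)) is assumed, and weighting each occurrence by 1/d (d = distance from the target) is a hypothetical stand-in for the position normalization. `idf_table` and `rank_context_words` are illustrative names only.

```python
import math
from collections import Counter, defaultdict


def idf_table(documents):
    """idf(w) = log(N / df(w)), where df(w) is the number of documents
    containing w. Standard definition, assumed here."""
    n_docs = len(documents)
    df = Counter()
    for doc in documents:
        df.update(set(doc))
    return {w: math.log(n_docs / count) for w, count in df.items()}


def rank_context_words(documents, target, window=3):
    """Score each context word as (position-weighted tf) * idf.

    Each occurrence at distance d from the target contributes 1/d
    (hypothetical position normalization); with every weight set to 1
    the sum would reduce to the raw count tf."""
    idf = idf_table(documents)
    weighted_tf = defaultdict(float)
    for doc in documents:
        for i, tok in enumerate(doc):
            if tok != target:
                continue
            lo, hi = max(0, i - window), min(len(doc), i + window + 1)
            for j in range(lo, hi):
                if j != i and doc[j] != target:
                    weighted_tf[doc[j]] += 1.0 / abs(j - i)
    scores = {w: tf * idf[w] for w, tf in weighted_tf.items()}
    # Highest score first.
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
```

Under this scheme a word that appears in every document gets an idf of 0, so frequent function words such as "the" sink to the bottom of the ranking even when they sit next to the target. This may be why the slides show results for "come" both with and without idf.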

  9. Interface • Outputs a ranked list of expressions with example sentences via the Web • Examples: course, information, notorious, come, come (without idf)
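
To pair each ranked expression with example sentences, as the interface above does, one could scan the corpus for sentences that contain both the target and the context word. The sketch below assumes a naive regex sentence splitter and leaves out the Web front end; `example_sentences` is a hypothetical helper, not part of the original tool.

```python
import re


def example_sentences(text, target, ranked_words, per_word=1):
    """For each ranked context word, return up to `per_word` sentences
    from `text` that contain both `target` and that word."""
    # Naive split after ., ! or ? followed by whitespace (an assumption).
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    results = []
    for word in ranked_words:
        hits = []
        for sentence in sentences:
            tokens = set(re.findall(r"[a-z']+", sentence.lower()))
            if target in tokens and word in tokens:
                hits.append(sentence)
                if len(hits) == per_word:
                    break
        results.append((word, hits))
    return results
```

A context word with no qualifying sentence simply gets an empty list, so the caller can decide whether to hide it or show it without an example.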

  10. Further Improvements • Use a search engine to build a corpus • Allow phrase searching • Provide an option to search for high-frequency phrases as opposed to idiomatic expressions

  11. Conclusion • We have presented a tool that, given a word, finds typical usages of that word in natural language • The tool should be useful for • learners of English • native speakers