
A Model for Learning Words by Crawling the Web

Jeff Thomson, Sygys.com

Rex Gantenbein, University of Wyoming

CAINE November 2009

Overview
  • Goal: create an autonomous language learning system
    • Use Web crawler technology
    • Extract meaning from paragraphs and sentences to create language understanding
  • Major issues
    • Irregularity of natural language constructions
    • Understanding paragraphs and sentences
    • Determining meaning of new words

Handling irregularities
  • Most major parts of a language (English, anyway) can be generalized
    • Exceptions require preprocessing to fit them into generalizable categories
    • Example: Inflectional endings on verbs
      • Regular: bat, bats, batting, batted
      • Irregular: is, am, are, was
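
The contrast between a generalizable verb and an exception can be sketched in Python. This is only an illustration, not the authors' implementation: the table `IRREGULAR_FORMS`, the helper `doubles_final_consonant`, and the consonant-doubling heuristic are all assumptions, and the heuristic is only reasonable for short one-syllable roots such as "bat".

```python
# Hypothetical exception table: irregular verbs are listed explicitly.
IRREGULAR_FORMS = {
    "be": ["is", "am", "are", "was"],
}

VOWELS = "aeiou"


def doubles_final_consonant(root: str) -> bool:
    """Rough heuristic: roots ending consonant-vowel-consonant double the last letter."""
    if len(root) < 3:
        return False
    c1, v, c2 = root[-3], root[-2], root[-1]
    return (
        c1 not in VOWELS
        and v in VOWELS
        and c2 not in VOWELS
        and c2 not in "wxy"
    )


def inflect(root: str) -> list[str]:
    """Return inflected forms, consulting the exception table before the general rule."""
    if root in IRREGULAR_FORMS:
        return IRREGULAR_FORMS[root]
    stem = root + root[-1] if doubles_final_consonant(root) else root
    return [root, root + "s", stem + "ing", stem + "ed"]
```

Calling `inflect("bat")` applies the general rule, while `inflect("be")` is answered entirely from the exception table.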

Handling irregularities
  • Idiomatic phrases require understanding of the entire phrase in a colloquial context
    • “Go jump in the lake” vs. “Go cook yourself an egg”
  • Pronoun resolution
    • “Three boys each bought a pizza. They ate them in the park.”

Extracting understanding
  • Paragraph understanding
    • Matching paragraph structure to common forms
    • Finding the nucleus of the paragraph’s meaning
  • Sentence understanding
    • Matching sentence structure to common forms
    • Determining the meaning of the words in the sentence

Our approach
  • Exception-first processing
    • Preprocessing to handle irregularities
  • Linguistic classifications based on tree structure

Our approach
  • Parser (incorporated into Web crawler) to determine structure
    • Some structures are disregarded when keywords are already classified
  • Word classification
    • Type, gender, number
    • Unknown words are analyzed according to rules using placement in sentence and surrounding classified words
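
As a rough sketch of how an unknown word might be classified from its placement and its already classified neighbours, consider the Python below. The lexicon, the rule table, and the function names are hypothetical, and only word type is handled here (gender and number are left out).

```python
# Hypothetical lexicon of already classified words (word -> type).
LEXICON = {
    "the": "determiner",
    "a": "determiner",
    "boys": "noun",
    "park": "noun",
    "ate": "verb",
    "in": "preposition",
}

# Hypothetical context rules, tried in order:
# ((type of previous word, type of next word), guessed type).
# None in a rule matches any neighbour, including a sentence boundary.
CONTEXT_RULES = [
    (("determiner", "noun"), "adjective"),
    (("determiner", None), "noun"),
    (("noun", "determiner"), "verb"),
    (("preposition", None), "noun"),
]


def guess_type(prev_type, next_type):
    """Return the guess of the first rule whose context matches, else "unknown"."""
    for (want_prev, want_next), guess in CONTEXT_RULES:
        if want_prev in (None, prev_type) and want_next in (None, next_type):
            return guess
    return "unknown"


def classify(sentence: str):
    """Pair each word with its known type, or with a guess based on its neighbours."""
    words = sentence.lower().rstrip(".!?").split()
    types = [LEXICON.get(w) for w in words]
    for i, t in enumerate(types):
        if t is not None:
            continue  # already classified
        prev_t = types[i - 1] if i > 0 else None
        next_t = types[i + 1] if i + 1 < len(types) else None
        types[i] = guess_type(prev_t, next_t)
    return list(zip(words, types))
```

With this toy rule table, "pizza" in "The boys ate the pizza in the park." is guessed to be a noun because it follows a determiner, and "hungry" in "The hungry boys ate the pizza." is guessed to be an adjective because it sits between a determiner and a noun.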

Our approach
  • Keyword recognition
    • Use “word chains” (sequences of words) with application of linguistic knowledge
  • Word-level understanding
    • Reduce words to root form to process them as keywords
    • Reduce irregular forms using an exception database created at preprocessing
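
A minimal Python sketch of root reduction with an exception lookup, followed by building word chains from the reduced words. The contents of `EXCEPTION_DB`, the suffix list, and the reading of "word chains" as fixed-length sliding windows are assumptions made for illustration. The suffix stripping is deliberately naive.

```python
# Hypothetical exception database: irregular form -> root.
EXCEPTION_DB = {
    "is": "be", "am": "be", "are": "be", "was": "be",
    "ate": "eat", "bought": "buy",
}

# Hypothetical suffix rules for regular words, tried in this order.
SUFFIX_RULES = ["ing", "ed", "s"]


def to_root(word: str) -> str:
    """Reduce a word to its root: exception lookup first, suffix rules second."""
    word = word.lower()
    if word in EXCEPTION_DB:
        return EXCEPTION_DB[word]
    for suffix in SUFFIX_RULES:
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            stem = word[: -len(suffix)]
            # Undo consonant doubling ("batt" -> "bat"); vowels, "l" and "s" are left alone.
            if (
                suffix != "s"
                and len(stem) >= 2
                and stem[-1] == stem[-2]
                and stem[-1] not in "aeiouls"
            ):
                stem = stem[:-1]
            return stem
    return word


def word_chains(sentence: str, length: int = 2):
    """Return every run of `length` consecutive roots in the sentence."""
    roots = [to_root(w) for w in sentence.rstrip(".!?").split()]
    return [tuple(roots[i:i + length]) for i in range(len(roots) - length + 1)]
```

Here "batting", "batted", and "bats" all reduce to "bat" through the rules, while "was" and "ate" are resolved from the exception database before any rule is tried.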

System model
  • Exception database
    • Separates generalizable and exception verbs
    • Processes word endings
    • Scans exception database for exceptions
    • Processes “normal” words according to rules

System model
  • Categorization generator
    • Separates generalizable and exception words
    • Processes word endings
    • Scans exception database for exceptions and processes these first
    • Processes “normal” words according to rules
  • Sentence parser with disregard capacity
  • Paragraph understanding rules
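
One way to picture the separation step is to test each observed form against the general rule and send anything that does not fit to the exception database. The Python below is a sketch under that assumption. The function names, the single past-tense rule, and the sample data are hypothetical, and the slides do not say how the separation is actually performed.

```python
def regular_past(root: str) -> str:
    """Regular rule assumed here: append "ed" (or "d" after a final "e")."""
    return root + "d" if root.endswith("e") else root + "ed"


def separate(observed: dict[str, str]) -> tuple[list[str], dict[str, str]]:
    """Split (root -> observed past form) pairs into generalizable roots and exceptions."""
    generalizable = []
    exceptions = {}
    for root, past in observed.items():
        if past == regular_past(root):
            generalizable.append(root)
        else:
            # Stored as form -> root so the form can be looked up directly later.
            exceptions[past] = root
    return generalizable, exceptions


# Hypothetical observations gathered during preprocessing.
observed = {"walk": "walked", "bake": "baked", "eat": "ate", "buy": "bought"}
generalizable, exceptions = separate(observed)
```

Because this sketch knows only one rule, a verb such as "bat" (past form "batted") would also land in the exception set until a consonant-doubling rule is added.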

System model
  • Web crawler searches for source material
    • Processes the material and enhances its own rules and exceptions
    • Eventually will learn enough to understand most material in a given language
  • Future work
    • Implement a pilot version of this system
    • Determine how to control for a “given” language
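
The crawl-and-learn loop described above can be sketched without any network access. In the Python below, the dictionary `PAGES` is a hypothetical stand-in for the Web, and "learning" is reduced to recording words that are not yet known. A pilot system would replace both with real fetching and real linguistic processing.

```python
from collections import deque

# Hypothetical stand-in for the Web: page name -> (text, links to other pages).
PAGES = {
    "start": ("the boys ate pizza", ["park"]),
    "park": ("the boys walked in the park", ["start", "lake"]),
    "lake": ("go jump in the lake", []),
}


def crawl(seed: str, known_words: set[str]) -> list[str]:
    """Breadth-first crawl that records words not seen before, in discovery order."""
    learned = []
    queue = deque([seed])
    visited = {seed}
    while queue:
        page = queue.popleft()
        text, links = PAGES[page]
        for word in text.split():
            if word not in known_words:
                known_words.add(word)  # the knowledge base grows as the crawl proceeds
                learned.append(word)
        for link in links:
            if link in PAGES and link not in visited:
                visited.add(link)
                queue.append(link)
    return learned
```

Starting from `crawl("start", {"the", "in"})`, each page is visited once, and the set passed in is updated in place, so later pages are processed with everything learned from earlier ones.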

Questions?
