Internet Resources Discovery (IRD)

Presentation Transcript


  1. Internet Resources Discovery (IRD): Concrete Learning Agents. T.Sharon-A.Frank

  2. Concrete Learning Agents
     • Ahoy! - homepage finder: finds the homepage of any person by name and organization.
     • ShopBot - robot for comparison shopping: finds where a user can buy a product in any pre-learned domain.
     • ILA - Internet Learning Agent: learns to understand the content of semi-structured pages in terms of internal concepts.

  3. Ahoy! Homepage Finder
     • Personal homepages are a relatively new resource to be located on the Web.
     • Search engines do a poor job of finding personal homepages because they are hard to define and locate.
     • Ahoy! does it much better.
     • Ahoy! implements a new search method: DRS - Dynamic Reference Sifting.

  4. Dynamic Reference Sifting (DRS)
     • How to improve recall and precision?
     • The DRS architecture is proposed as a way to provide high recall and precision in an automatic page-finding system.
     • DRS Components:
       • Candidate References Source
       • Cross Filter
       • Heuristic-based filter
       • Buckets
       • URL generator
       • URL pattern extractor

  5. DRS Components (1)
     • Candidate References Source
       • Comprehensive web indexes, like AltaVista.
       • E-mail services, like Whowhere, Bigfoot, Iaf.
     • Cross Filter
       • Filters candidates based on some orthogonal references source, like e-mail address directories.
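The cross filter described on slide 5 can be illustrated with a minimal sketch. It assumes candidates are URL strings and that the orthogonal source is a list of e-mail addresses; the function name `cross_filter`, the domain-matching rule, and the sample data are hypothetical and are not taken from Ahoy! itself.

```python
from urllib.parse import urlparse


def cross_filter(candidate_urls, email_addresses):
    """Keep candidate URLs whose host lies in the domain of a known e-mail address.

    Hypothetical rule: a candidate survives if its host equals, or is a
    subdomain of, the domain part of any e-mail address for the target person.
    """
    email_domains = {
        addr.split("@", 1)[1].lower() for addr in email_addresses if "@" in addr
    }
    kept = []
    for url in candidate_urls:
        host = (urlparse(url).hostname or "").lower()
        if any(host == d or host.endswith("." + d) for d in email_domains):
            kept.append(url)
    return kept


# Illustrative data only (example.edu / example.com are placeholder domains).
candidates = [
    "http://www.cs.example.edu/~taly/",
    "http://other.example.com/taly.html",
]
filtered = cross_filter(candidates, ["taly@cs.example.edu"])
```

Here only the first candidate survives, because its host is a subdomain of the e-mail domain.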

  6. DRS Components (2)
     • Heuristic-based filter
       • Filters candidates using domain-specific knowledge and heuristics.
       • For homepages - looks for the words: “homepage”, “my homepage”, “personal page”, etc.
       • For names - uses a nicknames database and templates like “Sharon, Taly”, etc.
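The heuristics on slide 6 might look roughly like the following sketch. The phrase list comes from the slide; the scoring scheme, the function names, and the shape of the nicknames table are assumptions made for illustration.

```python
import re

# Phrases taken from the slide; a real system would likely use more.
HOMEPAGE_PHRASES = ("homepage", "my homepage", "personal page")


def name_variants(first, last, nicknames=None):
    """Build lowercase name templates such as 'taly sharon' and 'sharon, taly'.

    `nicknames` is a hypothetical table mapping a lowercase first name
    to a list of alternative first names.
    """
    firsts = [first] + list((nicknames or {}).get(first.lower(), []))
    variants = []
    for f in firsts:
        variants.append(f"{f} {last}".lower())
        variants.append(f"{last}, {f}".lower())
    return variants


def heuristic_score(page_text, first, last, nicknames=None):
    """Return 0-2: one point for a homepage phrase, one for a name template."""
    text = re.sub(r"\s+", " ", page_text).lower()
    score = 0
    if any(phrase in text for phrase in HOMEPAGE_PHRASES):
        score += 1
    if any(variant in text for variant in name_variants(first, last, nicknames)):
        score += 1
    return score
```

For example, `heuristic_score("Welcome to my homepage. Sharon, Taly", "Taly", "Sharon")` scores both points, while a page mentioning neither a homepage phrase nor the name scores zero.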

  7. DRS Components (3)
     • Buckets
       • Ranks and labels the candidates into buckets of matches and near misses.
     • URL generator
       • Tries to synthesize new candidate URLs if everything else fails.
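As a rough illustration of the bucketing step on slide 7, the sketch below sorts candidates into labeled buckets. The slide does not say which criteria decide a match versus a near miss, so the rule used here (name and institution both match, or only one of them) is an assumption, as are the bucket labels and the function name.

```python
def bucket_candidates(candidates):
    """Sort candidates into labeled buckets.

    `candidates` is a list of (url, name_matches, institution_matches)
    tuples; the two booleans are assumed to come from earlier filters.
    """
    buckets = {"match": [], "near miss": [], "rejected": []}
    for url, name_ok, institution_ok in candidates:
        if name_ok and institution_ok:
            buckets["match"].append(url)
        elif name_ok or institution_ok:
            buckets["near miss"].append(url)
        else:
            buckets["rejected"].append(url)
    return buckets


# Illustrative data only.
buckets = bucket_candidates([
    ("http://www.cs.example.edu/~taly/", True, True),
    ("http://www.example.org/taly.html", True, False),
    ("http://www.example.com/shop", False, False),
])
```

Returning the buckets in a fixed order (matches first, then near misses) gives the ranking the slide refers to.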

  8. Example: URL Generator

  9. DRS Components (4)
     • URL pattern extractor
       • Extracts patterns from successful queries, to be used in the URL generator.
       • For each successful hit, saves: name, institution, URL.
       • Learns institutions' server names and homepage paths.
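The interplay between the URL pattern extractor and the URL generator can be sketched as follows. This is a simplified illustration that assumes the pattern is learned by replacing a known username in the path of a successful hit; the `{user}` placeholder, the function names, and the sample URLs are hypothetical.

```python
from urllib.parse import urlparse


def extract_pattern(username, url):
    """Turn a successful hit into a (server, path template) pair.

    Returns None when the username does not appear in the URL path,
    since no reusable pattern can be learned from such a hit.
    """
    parsed = urlparse(url)
    if username not in parsed.path:
        return None
    return parsed.netloc, parsed.path.replace(username, "{user}")


def generate_urls(username, patterns):
    """Synthesize candidate URLs for a new username from stored patterns."""
    return [
        f"http://{server}{path.replace('{user}', username)}"
        for server, path in patterns
    ]


# Learn from one successful hit, then guess a URL for another person
# at the same institution (placeholder domain).
pattern = extract_pattern("taly", "http://www.cs.example.edu/~taly/index.html")
guesses = generate_urls("ariel", [pattern])
```

The generated URLs are only guesses; they would still need to be fetched and passed through the filters before being returned.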

  10. Ahoy! Flow
     • User inputs target name and institution.
     • Institutional DB provides server names; MetaCrawler provides raw references; e-mail services provide user names.
     • Raw references are filtered and bucketed.
     • Success?
       • YES: references are returned; URL patterns are extracted and stored.
       • NO: URLs are generated using server name, username, and stored URL patterns.

  11. Ahoy! Search Example

  12. Ahoy! Example: Success

  13. Ahoy! Example Details

  14. Search Engines Results

  15. Ahoy! Evaluation (figures: recall, precision)

  16. ILA - Internet Learning Agent
     • Translation problem: how to interpret the source response in terms of the agent's internal concepts?
     • Search engines can’t understand the information contained in the returned source response.
     • ILA, as a learning agent, parses the response and uses heuristics to learn its format and data fields.
     • ILA uses learning by comparison.

  17. /etc/passwd - Sample
       daemon:*:1:1:Mr Background:/:/dev/null
       sys:*:2:2::/:/bin/true
       bin:*:3:3::/bin:/bin/true
       gibuy:bncKACcgNpmFA:49:3:,,,,:/u/opers/gibuy:/bin/tcsh
       ariel:zNdAzJUj2G6vs:105:100:Ariel J. Frank,CS 019,035318407,03749454,:/u/opers/ariel:/bin/tcsh
       taly:pxEi5OQD/4N3E:1991:180:Sharon Taly:/u/grad/taly:/bin/Tcsh
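Learning by comparison, as named on slide 16, can be illustrated on records shaped like the /etc/passwd sample. The sketch below is not ILA's actual algorithm: it assumes the agent already knows a few facts about some objects (here a login and a full name) and guesses which colon-separated field carries each concept by intersecting the matching fields across examples. The function names are hypothetical, and the sample lines are adapted from the slide with the password field masked.

```python
def split_fields(line, sep=":"):
    """Split one source response line into its raw fields."""
    return line.split(sep)


def learn_field_mapping(known_objects, responses):
    """Guess which field index holds each internal concept.

    known_objects: {key: {concept: known value}}
    responses:     {key: raw response line for that object}
    A field index is kept for a concept only if it contains the known
    value in every example; ties are broken by the lowest index.
    """
    votes = {}
    for key, facts in known_objects.items():
        fields = split_fields(responses[key])
        for concept, value in facts.items():
            hits = {i for i, f in enumerate(fields) if value.lower() in f.lower()}
            votes[concept] = hits if concept not in votes else votes[concept] & hits
    return {concept: min(idx) for concept, idx in votes.items() if idx}


responses = {
    "taly": "taly:*:1991:180:Sharon Taly:/u/grad/taly:/bin/tcsh",
    "daemon": "daemon:*:1:1:Mr Background:/:/dev/null",
}
known_objects = {
    "taly": {"login": "taly", "full_name": "Sharon Taly"},
    "daemon": {"login": "daemon", "full_name": "Mr Background"},
}
mapping = learn_field_mapping(known_objects, responses)
```

With one example alone, "taly" appears in three fields (login, full name, home directory); comparing against the second example narrows the login down to field 0 and the full name to field 4.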
