Explore automating the process of gathering and verifying facts from the web to enhance the Cyc knowledge base. Discover the stages of knowledge acquisition from the World Wide Web, from choosing queries to reviewing and asserting learned sentences. See experimental results and conclusions on this innovative approach.
Searching for Common Sense: Populating Cyc from the Web Presented by Yu-Chung Shen 2007/05/03
Introduction • In the last twenty years, over 3 million facts and rules have been entered manually into the Cyc knowledge base by ontologists. • Shouldn’t there be a better way? • The proposal: automate the process of gathering and verifying facts from the World Wide Web.
Knowledge acquisition from WWW • Gathering information from the web proceeds in six stages: • Choosing queries • Searching (Google) • Parsing results • KB consistency checking • Google verification • Reviewing and asserting
Choosing Queries and Generating Search Strings • Example: • Limited to a set of 134 binary predicates. • Search strings are generated using templates.
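Template-based generation of search strings can be sketched roughly as follows. The slide does not list the actual templates, so the `TEMPLATES` table, the `search_strings` function, and the `*` wildcard convention are all assumptions made for illustration; only the predicate name `foundingAgent` and the abbreviation `PIJ` come from the deck.

```python
# Hypothetical templates keyed by binary predicate. The real system is
# limited to 134 predicates; the wording of its templates is not shown
# in the slides, so these strings are illustrative only.
TEMPLATES = {
    "foundingAgent": [
        "{arg1} founder *",
        "* founded {arg1}",
    ],
}


def search_strings(predicate: str, arg1_name: str) -> list[str]:
    """Fill every template for `predicate` with the known first argument.

    The `*` marks the slot whose value the search is meant to discover.
    Predicates without templates yield an empty list.
    """
    return [t.format(arg1=arg1_name) for t in TEMPLATES.get(predicate, [])]


if __name__ == "__main__":
    for query in search_strings("foundingAgent", "PIJ"):
        print(query)
```

Keeping the templates in a plain table means a new predicate can be supported by adding an entry rather than writing new code, which is presumably why a template approach suits a fixed predicate set.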
Parsing search results into CycL sentences • Example:
Checking Cyc KB Consistency • Discard facts that are redundant or contradictory via inference. • Example: Fact: (foundingAgent PalestineIslamicJihad AugusteRodin) • Cyc knows AugusteRodin died in 1917. • Cyc knows PIJ was founded in 1989. • The fact is contradictory, so it is discarded.
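The Rodin example can be mimicked with a toy check. This is only a sketch: Cyc reaches its verdict through general inference over the KB, whereas the code below hard-codes a single rule (a founder cannot have died before the founding year). The `Fact` class, the lookup tables, and the `check` function are hypothetical names; the years are the ones given on the slide.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Fact:
    predicate: str
    arg1: str
    arg2: str


# Toy stand-ins for what the KB already knows (years as stated on the slide).
DEATH_YEAR = {"AugusteRodin": 1917}
FOUNDING_YEAR = {"PalestineIslamicJihad": 1989}


def check(fact: Fact, known_facts: set) -> str:
    """Classify a candidate fact as redundant, contradictory, or consistent."""
    if fact in known_facts:
        return "redundant"  # already in the KB, nothing new to learn
    if fact.predicate == "foundingAgent":
        founded = FOUNDING_YEAR.get(fact.arg1)
        died = DEATH_YEAR.get(fact.arg2)
        # Single hand-written rule standing in for real inference.
        if founded is not None and died is not None and died < founded:
            return "contradictory"
    return "consistent"


if __name__ == "__main__":
    candidate = Fact("foundingAgent", "PalestineIslamicJihad", "AugusteRodin")
    print(check(candidate, known_facts=set()))
```

Only facts classified as consistent would move on to the next stage; redundant and contradictory ones are dropped.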
Google Verification • Guards against parser error. • Example: New fact: (foundingAgent PalestineIslamicJihad xasdawqeqw) Search string: PIJ founder xasdawqeqw • A search string that returns no results leaves the fact unverified.
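A minimal sketch of this verification step, assuming a fact counts as verified when its search string returns at least one hit. No real search API is called: `hit_count` is an injected function, and the `FAKE_HITS` table, its counts, and the name `ExampleFounderName` are invented purely so the example runs offline.

```python
from typing import Callable


def verification_string(arg1_name: str, relation_word: str, arg2_name: str) -> str:
    """Render a candidate fact as a plain search string."""
    return f"{arg1_name} {relation_word} {arg2_name}"


def is_verified(query: str, hit_count: Callable[[str], int], min_hits: int = 1) -> bool:
    """Treat the fact as verified if the search yields at least `min_hits` results.

    `min_hits=1` is an assumed threshold, not one stated in the slides.
    """
    return hit_count(query) >= min_hits


# Made-up hit counts standing in for a search engine.
FAKE_HITS = {"PIJ founder ExampleFounderName": 3}


def fake_hit_count(query: str) -> int:
    return FAKE_HITS.get(query, 0)


if __name__ == "__main__":
    q = verification_string("PIJ", "founder", "xasdawqeqw")
    print(q, is_verified(q, fake_hit_count))
```

Passing the hit counter in as a parameter keeps the check testable and makes it easy to swap in whatever search backend is actually available.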
Review and Assertion • Learned sentences are reviewed by a human curator. • If correct, the learned sentences are asserted into the Cyc knowledge base.
Experimental Results • The majority of the searches expended, about 80%, were performed in the verification phase. The results were as follows: (GAFs: ground atomic formulas, i.e. atomic sentences in the Cyc KB.)
Experimental Results • A human reviewer then went through the verified GAFs, and a sample of 53 of the unverified GAFs, and determined their actual correctness.
Conclusions • The work being done here is immediately useful as a tool that makes human knowledge entry faster, easier, and more effective. • The longer-term hope is to provide Cyc with a mechanism to truly acquire knowledge by learning. • Q&A?