Never-Ending Language Learning for Vietnamese: Coupled SEAL
Student: Phạm Xuân Khoái
Instructor: PhD Lê Hồng Phương
1. Introduction
• SEAL (Set Expander for Any Language) is a set expansion system that accepts input elements (seeds) of some target set S and automatically finds other probable elements of S in semi-structured documents such as web pages.
• CSEAL (Coupled SEAL) is a SEAL system extended with two constraints:
  • mutual exclusion
  • type checking
1. Introduction
Coupled SEAL: a semi-structured extractor
• SEAL uses a wrapper induction algorithm.
• CSEAL queries the internet with sets of beliefs from each category or relation, and mines lists and tables for instances.
• It uses mutual-exclusion relationships to provide negative examples for filtering overly general lists and tables.
• It issues 5 queries per category and 10 queries per relation, and fetches 50 web pages per query.
• Candidates are ranked by probabilities assigned as in CPL (Coupled Pattern Learner).
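The mutual-exclusion filter above can be illustrated with a minimal Python sketch. It is not the CSEAL implementation: the function name `filter_overly_general`, the `max_negative` threshold, and the sample data are assumptions made for illustration. The idea is that beliefs from a mutually exclusive category (here, countries) act as negative examples, and any list that extracts too many of them is treated as overly general and discarded.

```python
def filter_overly_general(extractions, positives, negatives, max_negative=0):
    """Drop lists that extract too many negative examples.

    extractions  -- dict: list/wrapper id -> items extracted from it
    positives    -- current beliefs for the target category
    negatives    -- beliefs of mutually exclusive categories
    max_negative -- hypothetical tolerance; 0 rejects on the first negative hit
    """
    kept = {}
    for source_id, items in extractions.items():
        items = set(items)
        if len(items & negatives) > max_negative:
            continue  # overly general: it mixes in other categories
        kept[source_id] = items - positives  # keep only the new candidates
    return kept


# Hypothetical data: target category "city", exclusive with "country".
city_beliefs = {"Hà Nội", "Huế"}
country_beliefs = {"Việt Nam", "Lào"}
extractions = {
    "list_a": ["Hà Nội", "Huế", "Đà Nẵng"],
    "list_b": ["Hà Nội", "Việt Nam", "Lào", "Huế"],
}
candidates = filter_overly_general(extractions, city_beliefs, country_beliefs)
print(candidates)
```

Here `list_b` is rejected because it contains known countries, while `list_a` contributes one new candidate city.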
1. Introduction
[Figure: CSEAL data flow. Labels: Beliefs, CSEAL, Internet, New candidate facts.]
1. Introduction
[Figure: system architecture with steps numbered 1 to 3. Labels: Data Resources, Knowledge Base (Beliefs, Candidate facts), Knowledge Integrator, Subsystem Components (CSEAL, CPL, CMC, RL).]
2. Concepts
• Seed: an input element.
• Wrapper: a pair of character strings that specify the left context and right context an entity must have in order to be extracted from a page. These strings are chosen to satisfy two conditions:
  • the contexts are maximally long
  • they bracket at least one occurrence of every seed string on the page
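The two wrapper conditions can be sketched in Python as follows. This is a simplified illustration under stated assumptions, not the SEAL algorithm itself: the function names and the sample page are hypothetical, and it enumerates every combination of seed occurrences by brute force. For each combination it takes the longest common suffix of the text before the seeds as the left context and the longest common prefix of the text after them as the right context.

```python
import re
from itertools import product


def common_prefix(strings):
    """Longest string that every element of `strings` starts with."""
    prefix = strings[0]
    for s in strings[1:]:
        i = 0
        while i < min(len(prefix), len(s)) and prefix[i] == s[i]:
            i += 1
        prefix = prefix[:i]
    return prefix


def common_suffix(strings):
    """Longest string that every element of `strings` ends with."""
    return common_prefix([s[::-1] for s in strings])[::-1]


def induce_wrappers(page, seeds):
    """Return (left, right) pairs bracketing one occurrence of every seed."""
    positions = [[m.start() for m in re.finditer(re.escape(s), page)]
                 for s in seeds]
    if any(not p for p in positions):
        return set()  # some seed is missing, so no wrapper exists on this page
    wrappers = set()
    for combo in product(*positions):  # one occurrence per seed
        lefts = [page[:pos] for pos in combo]
        rights = [page[pos + len(s):] for pos, s in zip(combo, seeds)]
        left, right = common_suffix(lefts), common_prefix(rights)
        if left and right:  # maximally long for this combination
            wrappers.add((left, right))
    return wrappers


def extract(page, wrapper):
    """Extract every string bracketed by the wrapper's contexts."""
    left, right = wrapper
    # Lookahead leaves the right context unconsumed, so it can overlap
    # with the left context of the next item.
    pattern = re.escape(left) + r"(.+?)(?=" + re.escape(right) + ")"
    return [m.group(1) for m in re.finditer(pattern, page)]


# Hypothetical page and seeds.
page = (
    "<html><body>\n"
    "<tr><td>Hà Nội</td><td>north</td></tr>\n"
    "<tr><td>Huế</td><td>central</td></tr>\n"
    "<tr><td>Đà Nẵng</td><td>central</td></tr>\n"
    "<tr><td>Cần Thơ</td><td>south</td></tr>\n"
    "</body></html>"
)
wrappers = induce_wrappers(page, ["Hà Nội", "Cần Thơ"])
for w in wrappers:
    print(repr(w), extract(page, w))
```

On this page the two seeds yield a single wrapper whose contexts also bracket the other two city names, which is how unseen members of the set are found.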
References
• Toward an Architecture for Never-Ending Language Learning (http://www.cs.cmu.edu/~acarlson/papers/carlson-aaai10.pdf)
• Language-Independent Set Expansion of Named Entities using the Web (http://www.cs.cmu.edu/~wcohen/postscript/icdm-2007.pdf)
• Coupled Semi-Supervised Learning for Information Extraction (http://www.cs.cmu.edu/~rcwang/papers/wsdm-2010.pdf)
• Character-level Analysis of Semi-Structured Documents for Set Expansion (https://www.cs.cmu.edu/~rcwang/papers/emnlp-2009.pdf)