This research explores SUBDUE, a graph-based relational learner, as a non-logic-based approach to concept learning. SUBDUE discovers patterns in labeled graphs using a compression-guided search and evaluates candidate concepts with the Minimum Description Length principle. Preliminary results show performance competitive with ILP systems. The study also relates SUBDUE's search space to the Galois lattice and proposes further empirical and theoretical comparisons with ILP systems.
Graph-Based Concept Learning
Jesus A. Gonzalez, Lawrence B. Holder, and Diane J. Cook
Department of Computer Science and Engineering
University of Texas at Arlington
Box 19015, Arlington, TX 76019-0015
{gonzalez,holder,cook}@cse.uta.edu
http://cygnus.uta.edu/subdue/
MOTIVATION AND GOAL
• Need for a non-logic-based relational concept learner
• Empirical and theoretical comparisons of relational learners
• Logic-based relational learners (ILP)
  • FOIL [Quinlan et al.]
  • Progol [Muggleton et al.]
• Graph-based relational learner
  • SUBDUE
SUBDUE KNOWLEDGE DISCOVERY SYSTEM
• SUBDUE discovers patterns (substructures) in structural data sets
• SUBDUE represents data as a labeled graph
  • Vertices represent objects or attributes
  • Edges represent relationships between objects
• Input: labeled graph
• Output: discovered patterns and their instances
SUBDUE EXAMPLE
• Input: a labeled graph of object vertices with "shape" attributes (triangle, square) connected by "on" edges
• Output: the discovered substructure (an object with shape triangle on an object with shape square), with 4 instances in the input graph
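As an illustration, the input graph in this example could be encoded roughly as follows. This is a minimal Python sketch, not SUBDUE's actual input format; the LabeledGraph class and the specific vertex/edge identifiers are illustrative assumptions.

    # Minimal sketch of a labeled-graph input (illustrative only; not SUBDUE's
    # actual input format). Vertices carry labels for objects or attribute
    # values, edges carry labels for relationships.
    from dataclasses import dataclass, field

    @dataclass
    class LabeledGraph:
        vertices: dict = field(default_factory=dict)   # vertex id -> label
        edges: list = field(default_factory=list)      # (source id, target id, label)

        def add_vertex(self, vid, label):
            self.vertices[vid] = label

        def add_edge(self, src, dst, label):
            self.edges.append((src, dst, label))

    # One instance of the discovered pattern: a triangle-shaped object on a
    # square-shaped object.
    g = LabeledGraph()
    g.add_vertex(1, "object")
    g.add_vertex(2, "object")
    g.add_vertex(3, "triangle")
    g.add_vertex(4, "square")
    g.add_edge(1, 3, "shape")     # object 1 has shape triangle
    g.add_edge(2, 4, "shape")     # object 2 has shape square
    g.add_edge(1, 2, "on")        # object 1 is on object 2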
SUBDUE'S SEARCH
• Starts with a single vertex and repeatedly expands by one edge
• Computationally-constrained beam search
• Polynomially-constrained inexact graph matching
• Search space is all sub-graphs of the input graph
• Guided by compression heuristic
  • Minimum description length
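The control loop can be sketched as below. This is a rough Python sketch, not SUBDUE's actual code; the extend (grow a substructure by one edge) and value (compression-based score) functions and the parameter names are caller-supplied stand-ins.

    # Hedged sketch of a computationally-constrained beam search of the kind
    # SUBDUE uses. `extend` and `value` are hypothetical stand-ins for SUBDUE's
    # one-edge expansion and MDL heuristic; the real system also uses
    # polynomially-constrained inexact graph matching to collect instances.
    def beam_search(initial_candidates, extend, value, beam_width=4, max_expansions=100):
        beam = sorted(initial_candidates, key=value, reverse=True)[:beam_width]
        best = beam[0] if beam else None
        expansions = 0
        while beam and expansions < max_expansions:
            parent = beam.pop(0)                   # best unexpanded candidate
            children = extend(parent)              # substructures grown by one edge
            expansions += 1
            beam = sorted(beam + children, key=value, reverse=True)[:beam_width]
            if beam and (best is None or value(beam[0]) > value(best)):
                best = beam[0]
        return best

In SUBDUE the initial candidates are single-vertex substructures, one per unique vertex label in the input graph.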
EVALUATION CRITERION: MINIMUM DESCRIPTION LENGTH
• Minimum Description Length (MDL) principle
  • The best theory to describe a set of data is the one that minimizes the DL of the entire data set.
• DL of the graph: the number of bits necessary to completely describe the graph.
• Search for the substructure that results in the maximum compression.
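Concretely, the heuristic scores a substructure S against graph G by how much replacing S's instances compresses G. A commonly used formulation (a sketch; the bit-level encodings of the DL terms are omitted here) is value(S, G) = DL(G) / (DL(S) + DL(G|S)).

    # Sketch of the MDL-based compression value (assumed formulation; the DL
    # terms are taken as precomputed bit counts rather than derived here).
    def compression_value(dl_graph, dl_sub, dl_graph_given_sub):
        # dl_graph:           DL(G), bits to describe the whole input graph
        # dl_sub:             DL(S), bits to describe the substructure
        # dl_graph_given_sub: DL(G|S), bits to describe G after compressing
        #                     every instance of S to a single vertex
        # Higher value means better compression.
        return dl_graph / (dl_sub + dl_graph_given_sub)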
CONCEPT LEARNING SUBDUE
• Modify Subdue for concept learning (SubdueCL)
• Accept positive and negative graphs as input examples
• Find substructures describing positive examples, but not negative examples
• Learn multiple rules (DNF)
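One way to read "learn multiple rules (DNF)" is a sequential-covering loop: find the best substructure on the remaining positive examples, add it as one disjunct, and remove the positives it covers. The sketch below assumes caller-supplied find_best_substructure and covers functions; it is not the actual SubdueCL implementation.

    # Hedged sketch of a sequential-covering loop producing a DNF hypothesis
    # (a list of substructures, one per disjunct). `find_best_substructure`
    # and `covers` are hypothetical stand-ins for SubdueCL's search and
    # subgraph matching.
    def learn_dnf(positives, negatives, find_best_substructure, covers):
        hypothesis = []
        remaining = list(positives)
        while remaining:
            sub = find_best_substructure(remaining, negatives)
            if sub is None:                        # no useful substructure left
                break
            hypothesis.append(sub)
            remaining = [ex for ex in remaining if not covers(sub, ex)]
        return hypothesis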
CONCEPT LEARNING SUBDUE
• Evaluation criterion based on the number of positive examples covered without covering negative examples
• Substructure value = 1 - Error
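A minimal sketch of this value, assuming Error is the fraction of misclassified examples (positive examples not covered plus negative examples covered, over all examples); the exact error definition used by SubdueCL may differ.

    # Sketch: substructure value = 1 - Error (the error definition below is an
    # assumption consistent with the slide, not quoted from the system).
    def substructure_value(pos_covered, pos_total, neg_covered, neg_total):
        error = ((pos_total - pos_covered) + neg_covered) / (pos_total + neg_total)
        return 1.0 - error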
CONCEPT LEARNING SUBDUE EXAMPLE
• Examples in graph format (chess domain):
  a) Board configuration
  b) Graph representation
• Legend: WK = White King, WR = White Rook, BK = Black King, pos = position, adj = adjacent, lt = less than, eq = equal
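For illustration, one board configuration might be encoded along these lines. The exact vertex/edge layout is an assumption; only the labels follow the legend above.

    # Hedged sketch of one chess example as a labeled graph (layout assumed;
    # labels follow the legend: WK, WR, BK, pos, adj, lt, eq).
    vertices = {
        "wk": "WK", "wr": "WR", "bk": "BK",          # pieces
        "wk_row": "pos", "wk_col": "pos",            # White King coordinates
        "bk_row": "pos", "bk_col": "pos",            # Black King coordinates
    }
    edges = [
        ("wk", "wk_row", "pos"), ("wk", "wk_col", "pos"),
        ("bk", "bk_row", "pos"), ("bk", "bk_col", "pos"),
        ("wk_row", "bk_row", "adj"),                 # kings on adjacent rows
        ("wk_col", "bk_col", "eq"),                  # kings on the same column
    ]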
PRELIMINARY RESULTS
• Comparison with FOIL and Progol
• Significance test p for the Vote domain:
  SUBDUE: error = 0.051163 +/- 0.044935
  FOIL:   error = 0.069767 +/- 0.054814
  PROGOL: error = 0.230233 +/- 0.066187
  SUBDUE - FOIL   = -0.018605 +/- 0.052347 (p = 0.145068)
  SUBDUE - PROGOL = -0.179070 +/- 0.074394 (p = 0.000016)
  FOIL - PROGOL   = -0.160465 +/- 0.067979 (p = 0.000019)
  ANOVA: 0.000000
• Significance test p for the Chess domain:
  SUBDUE: error = 0.004600 +/- 0.006186
  FOIL:   error = 0.006600 +/- 0.007183
  PROGOL: error = 0.002600 +/- 0.002675
  SUBDUE - FOIL   = -0.002000 +/- 0.007542 (p = 0.211723)
  SUBDUE - PROGOL =  0.002000 +/- 0.004989 (p = 0.118354)
  FOIL - PROGOL   =  0.004000 +/- 0.007242 (p = 0.057322)
  ANOVA: 0.306232
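The slides do not state how the p-values and ANOVA figures were computed; one plausible procedure, sketched below under that assumption, is paired t-tests and one-way ANOVA over per-fold cross-validation error rates.

    # Hedged sketch (assumed methodology): pairwise paired t-tests and one-way
    # ANOVA over per-fold error rates for the three systems.
    from scipy import stats

    def compare_errors(subdue_err, foil_err, progol_err):
        pairwise_p = {
            "SUBDUE - FOIL":   stats.ttest_rel(subdue_err, foil_err).pvalue,
            "SUBDUE - PROGOL": stats.ttest_rel(subdue_err, progol_err).pvalue,
            "FOIL - PROGOL":   stats.ttest_rel(foil_err, progol_err).pvalue,
        }
        anova_p = stats.f_oneway(subdue_err, foil_err, progol_err).pvalue
        return pairwise_p, anova_p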
RELATED THEORY
• Galois lattice [reference?]
  • Subdue's search space is similar to the Galois lattice
  • Polynomial convergence results for the Galois lattice apply to Subdue
• PAC analysis of conceptual graphs [reference?]
  • Subdue's representation is a superset of conceptual graphs
  • PAC sample complexity results for conceptual graphs apply to Subdue
CONCLUSIONS
• Empirical results indicate Subdue is competitive with ILP systems
  • More empirical comparisons are necessary
• Theoretical results on the Galois lattice and conceptual graphs apply to Subdue
  • Need to identify specific components of the theory directly applicable to Subdue
  • Expand theories where needed