Towards a Logic Formalization of Taxonomic Concepts Dave Thau, Bertram Ludäscher, Shawn Bowers UC Davis thau@learningsite.com
Names are Confusing (adapted from R. Peet)
[Figure: names used in four treatments (Gray 1834, Chapman 1860, Kral 1998, Thau 2006). Names shown: Ranunculus plumosa, R. plumosa var intermedia, R. plumosa var plumosa, Ranunculus pinetcola, Ranunculus plumosa, Ranunculus plumosa, Ranunculus homunculus]
5th International Conference on Ecological Informatics
Impact on Data Analysis
• Can’t find data
  – If A ≡ B, a search on A should retrieve B
  – Same if A ⊃ B
• Can’t aggregate data
  – If A ⊂ B, you should be able to combine data from A into B
Where in Greece Can I Find Ranunculus aquatilis?
[Figure labels: R. aquatilis, R. trichophyllus]
Mapping Taxonomies
Possible relations between two concepts A and B: A ≡ B, A ⊂ B, A ⊃ B, A overlap B, A disjoint B
[Figure: Ranunculus aquatilis in Benson, 1948 and in FNA-03, 1997. Concepts shown: Ranunculus aquatilis (in both), R.a. var calvescens, R.a. var capillaceus, R.a. var aquatilis, R.a. var diffusus, R.a. var hispidulus; three pairs are marked ≡]
This results in 5^12 (more than 240 million) possible sets of relationships.
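The size of that search space can be checked with a line of arithmetic. This is only a sketch: it assumes (the slide does not say so) that the exponent 12 counts cross-taxonomy concept pairs and that the base 5 counts the possible relations per pair. The names `RELATIONS` and `PAIRS` are illustrative.

```python
# Assumption: each of 12 concept pairs takes exactly one of 5 relations.
RELATIONS = ["equivalent", "included in", "includes", "overlap", "disjoint"]
PAIRS = 12  # hypothetical reading of the exponent on the slide

total = len(RELATIONS) ** PAIRS
print(total)  # 244140625
```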
Overview
• The problems – Names change, experts disagree, data become incomparable
• The partial solution – Taxonomic Concepts
• Another part of the solution – Logic
• Representing taxonomy in logic
• Using the representation to detect inconsistencies and discover new relations
• Applications
Logic, why?
• Precise modeling language
• Solid mathematical basis
• Good tools for reasoning are available
• Explicit, “portable” representation (not buried in code)
Basic Taxonomy
• Rooted tree
• Only “isa” relations
Example: T = (N, E), with N = {A, B, C} and E = {B isa A, C isa A}
[Figure: tree with root A and children B and C; each edge is labeled isa]
Φ_isa(T) = { ∀x: m(x) → n(x) | (m isa n) ∈ E, T = (N, E) }
In the basic taxonomy, Φ_T = Φ_isa(T)
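As a rough illustration of how a taxonomy T = (N, E) turns into isa formulas, here is a minimal Python sketch. The `Taxonomy` type, the `isa_formulas` helper, and the textual formula syntax are all assumptions made for this example, not the authors' representation.

```python
from typing import NamedTuple


class Taxonomy(NamedTuple):
    nodes: frozenset
    edges: frozenset  # pairs (child, parent), read "child isa parent"


def isa_formulas(t: Taxonomy) -> list:
    """Emit one first-order formula per isa edge, in sorted order."""
    return sorted(f"forall x: {child}(x) -> {parent}(x)" for child, parent in t.edges)


# The example taxonomy from the slide: A with children B and C.
T = Taxonomy(frozenset("ABC"), frozenset({("B", "A"), ("C", "A")}))
formulas = isa_formulas(T)
for formula in formulas:
    print(formula)
```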
Some Additional Constraints
• No empty nodes
  – All nodes have at least one element
  – For T = (N, E): ∃x: n(x), for every n ∈ N
• Disjointness
  – The children of a node are disjoint
  – For T = (N, E): ¬∃x: n1(x) ∧ n2(x), for all distinct n1, n2 with (n1 isa m) ∈ E and (n2 isa m) ∈ E
• Closed World
  – A node with children is defined as the union of those children
  – This one’s formula is a bit long – trust me…
[Figure: tree with root A and children B and C; each edge is labeled isa]
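The three constraints can be read as checks on a concrete assignment of elements to nodes. The sketch below assumes each node's extension is given as a Python set; `children`, `check_constraints`, and the sample data are hypothetical names introduced only for illustration.

```python
def children(edges, parent):
    """Return the nodes that have an isa edge to `parent`, in sorted order."""
    return sorted(c for c, p in edges if p == parent)


def check_constraints(nodes, edges, ext):
    """Check the three constraints against `ext`, a dict from node to its set of elements."""
    result = {"non_empty": True, "disjoint": True, "closed_world": True}
    for n in nodes:
        if not ext[n]:
            result["non_empty"] = False
        kids = children(edges, n)
        if not kids:
            continue  # leaves have no sibling or union constraint
        for i, a in enumerate(kids):
            for b in kids[i + 1:]:
                if ext[a] & ext[b]:
                    result["disjoint"] = False
        if ext[n] != set().union(*(ext[k] for k in kids)):
            result["closed_world"] = False
    return result


NODES = {"A", "B", "C"}
EDGES = {("B", "A"), ("C", "A")}
good = check_constraints(NODES, EDGES, {"A": {1, 2}, "B": {1}, "C": {2}})
bad = check_constraints(NODES, EDGES, {"A": {1, 2, 3}, "B": {1}, "C": {1, 2}})
```

In the second assignment B and C share element 1 and element 3 of A belongs to no child, so both the disjointness and closed-world checks fail.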
Mapping Formulae
• Mappings between nodes in two different taxonomies have their own formulae
• In the slides and proofs to come I will use these symbols:
  – A ⊂ B: A is included in B
  – A ⊃ B: A includes B
  – A ≡ B: A and B are equivalent
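When the members of two concepts are known, the mapping relation between them can be read off directly. The following sketch assumes concepts are non-empty Python sets; the function name `relation` and the returned labels are illustrative choices.

```python
def relation(a: set, b: set) -> str:
    """Classify how set `a` relates to set `b` (both assumed non-empty)."""
    if a == b:
        return "equivalent"
    if a < b:  # proper subset
        return "included in"
    if a > b:  # proper superset
        return "includes"
    if a & b:
        return "overlap"
    return "disjoint"


print(relation({1, 2}, {1, 2, 3}))  # included in
print(relation({1, 2}, {2, 3}))     # overlap
```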
Inferring Unstated Correspondences
Benson, 1948: Ranunculus arizonicus, with R.a. var chihuahua and R.a. var typicus
Kartesz, 2004: Ranunculus arizonicus
Given (Peet, 2005): B.1948:R. arizonicus is congruent to K.2004:R. arizonicus
We can demonstrate: B.1948:R.a.typicus is included in K.2004:R. arizonicus
Proving New Mappings
Benson, 1948: A = Ranunculus arizonicus, with children B = R.a. var chihuahua and C = R.a. var typicus
Kartesz, 2004: D = Ranunculus arizonicus
Given: A ≡ D. Unknown: the relation between B and D.
Show B ⊆ D and ¬(D ⊆ B), i.e. B is properly included in D
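The claim can be sanity-checked by brute force over a small universe. This is not the formal proof from the talk, only a sketch: it enumerates every way to assign subsets of a three-element universe to A, B, C, and D, keeps the assignments that satisfy the isa, non-emptiness, disjointness, and closed-world constraints plus the given mapping A ≡ D, and checks that B is properly included in D in all of them. The universe size and all names are assumptions.

```python
from itertools import combinations, product

UNIVERSE = (0, 1, 2)  # assumed small universe, enough for illustration
SUBSETS = [frozenset(c) for r in range(len(UNIVERSE) + 1)
           for c in combinations(UNIVERSE, r)]


def models():
    """Yield every assignment (A, B, C, D) that satisfies all constraints."""
    for A, B, C, D in product(SUBSETS, repeat=4):
        if not (A and B and C and D):    # no empty nodes
            continue
        if not (B <= A and C <= A):      # B isa A, C isa A
            continue
        if B & C:                        # children of A are disjoint
            continue
        if A != B | C:                   # closed world: A is the union of its children
            continue
        if A != D:                       # given mapping: A is equivalent to D
            continue
        yield A, B, C, D


found = list(models())
# B is included in D, and D is not included in B, in every model found.
entailed = bool(found) and all(B <= D and not D <= B for A, B, C, D in found)
print(len(found), entailed)
```

A finite enumeration like this only shows that no counterexample exists in the chosen universe; the formal proof is what establishes the mapping in general.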
Formal Proof of Mapping
[Proof figure, Part 1 and Part 2, not reproduced]
Inconsistent Mapping
Benson, 1948: Ranunculus hydrocharoides, with R.h. var natans, R.h. var stolonifer, R.h. var typicus
Kartesz, 2004: Ranunculus hydrocharoides, with R.h. var stolonifer, R.h. var typicus
Peet, 2005:
• B.1948:R.h.stolonifer is congruent to K.2004:R.h.stolonifer
• B.1948:R.h.typicus is congruent to K.2004:R.h.typicus
• B.1948:R. hydrocharoides is congruent to K.2004:R. hydrocharoides
Proving Inconsistency
[Figure: the Benson, 1948 and Kartesz, 2004 treatments of Ranunculus hydrocharoides (varieties natans, stolonifer, typicus and stolonifer, typicus), with three pairs marked ≡]
Formal Proof of Inconsistency
Showing Inconsistency Using Popular Tools
Benson, 1948: Ranunculus, with Ranunculus macranthus, Ranunculus petiolaris, …
Kartesz, 2004: Ranunculus, with Ranunculus petiolaris, …
Peet, 2005:
• B.1948:R. macranthus contains K.2004:R. petiolaris
• B.1948:R. petiolaris is contained by K.2004:R. petiolaris
B.48:R. petiolaris ⊂ K.04:R. petiolaris ⊂ B.48:R. macranthus contradicts the constraint that B.48:R. macranthus and B.48:R. petiolaris are disjoint.
Resolving Inconsistencies
• The inconsistency arises from trying to satisfy non-emptiness, disjointness, and the closed world simultaneously
• Relaxing any one of these makes the mapping consistent, which gives us clues to where the two treatments really differ
• It turns out that Kartesz and Benson focus on different localities.
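The effect of relaxing each constraint can be sketched by brute force on the Ranunculus hydrocharoides mapping. The code assumes that Benson's treatment has the three varieties natans, stolonifer, and typicus, that Kartesz's has stolonifer and typicus, and that the three congruences are modeled by using the same set on both sides. The universe size and the function name `consistent` are illustrative.

```python
from itertools import combinations, product

UNIVERSE = (0, 1, 2)  # assumed small universe
SUBSETS = [frozenset(c) for r in range(len(UNIVERSE) + 1)
           for c in combinations(UNIVERSE, r)]


def consistent(non_empty=True, disjoint=True, closed_world=True):
    """Return True if some assignment satisfies the selected constraints.

    H is R. hydrocharoides, N is var natans, S is var stolonifer and
    T is var typicus. The congruences from the mapping are built in by
    sharing H, S and T between the two taxonomies.
    """
    for H, N, S, T in product(SUBSETS, repeat=4):
        if not (N <= H and S <= H and T <= H):           # isa edges
            continue
        if non_empty and not (H and N and S and T):
            continue
        if disjoint and (N & S or N & T or S & T):
            continue
        # Benson: H = N ∪ S ∪ T; Kartesz: H = S ∪ T.
        if closed_world and not (H == N | S | T and H == S | T):
            continue
        return True
    return False


print(consistent())                    # all three constraints
print(consistent(non_empty=False))     # allow empty nodes
print(consistent(disjoint=False))      # allow overlapping siblings
print(consistent(closed_world=False))  # parents may exceed their children
```

With all three constraints in force, var natans would have to be non-empty, disjoint from the other two varieties, and yet contained in their union, so no assignment exists; dropping any one constraint removes that conflict.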
Summary
• Taxonomic Concepts are important
• Logic is a useful tool when reasoning about mappings between taxonomies
• We have the beginnings of a representation for taxonomies
• That representation can find unstated mappings
• And detect inconsistent mappings
Future Work
• Beefing up the representation
  – Formalizing more constraints, such as rank
  – Working in other factors, such as locality
• Adding ‘intelligence’ to tools which build mappings
• Using the representation in a workflow system to aid data integration
Thanks! Questions?
• We would like to acknowledge:
  – Bob Peet for the Ranunculus data set
  – NSF, under SEEK awards 0225676, 0225665, 0225635, and 0533368