An Empirical Investigation of Learning from the Semantic Web

Pete Edwards, Gunnar AAstrand Grimnes, Alun Preece. Computing Science Department, University of Aberdeen. {pedwards,ggrimnes,apreece}@csd.abdn.ac.uk. Semantic Web Mining Workshop @ ECML 2002.


Presentation Transcript


  1. An Empirical Investigation of Learning from the Semantic Web
     Pete Edwards, Gunnar AAstrand Grimnes, Alun Preece
     Computing Science Department, University of Aberdeen
     {pedwards,ggrimnes,apreece}@csd.abdn.ac.uk
     Semantic Web Mining Workshop @ ECML 2002

  2. Motivation
     • The Semantic Web should:
       • Facilitate learning from the Web.
       • Facilitate reuse of learning outcomes.
     Hypothesis: Learning from semantically marked-up data should outperform learning from plain text.

  3. Methods
     • Compare the performance of learning from plain text with that of learning from semantic meta-data.
     • Use traditional ML algorithms as the baseline approach:
       • Naïve Bayes
       • K-Nearest Neighbour
     • Explore the application of more knowledge-intensive approaches, such as ILP.
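As a rough illustration of the kind of baseline named above, the following is a minimal sketch of a k-nearest-neighbour classifier over bag-of-words instances. The cosine similarity measure, the majority vote, the function names and the toy training data are all assumptions made for this example; the slides do not say how the baselines were configured.

```python
import math
from collections import Counter


def bag(text):
    """Turn a string into a bag-of-words Counter (lowercased, whitespace split)."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two word-count Counters."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def knn_classify(train, query, k=3):
    """Return the majority label among the k training instances most similar to query.

    train is a list of (Counter, label) pairs; query is a Counter.
    """
    ranked = sorted(train, key=lambda item: cosine(item[0], query), reverse=True)
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


# Hypothetical toy data, not taken from the ITTalks or ResearchIndex datasets.
TRAIN = [
    (bag("mobile agent platform"), "agents"),
    (bag("bdi agent architecture"), "agents"),
    (bag("case based learning"), "ml"),
    (bag("inductive learning of rules"), "ml"),
]

ml_label = knn_classify(TRAIN, bag("learning rules from cases"), k=1)
agent_label = knn_classify(TRAIN, bag("mobile agent"), k=1)
```

The same classifier can be run unchanged on any representation that reduces an instance to a bag of tokens, which is what makes it convenient as a baseline across plain text and meta-data.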

  4. Datasets
     • The Semantic Web is still in its infancy, so available datasets are limited.
     • We need a dataset with instances represented both in plain text and in some semantic markup language.
     • Forced to use artificial datasets.
     • No ontological support.

  5. ITTalks (http://ittalks.org)
     • ITTalks is a real Semantic Web application.
     • Information about seminars at universities in the US.
     • The plain HTML and DAML+OIL versions of each talk have slightly different, but largely overlapping, content.
     • The data is not classified, so we labelled it by personal preference.

  6. ITTalks example
     <rdf:RDF>
       <rdf:Description about="http://www.ittalks.org/jsp/Controller.jsp?action=ViewTalk&amp;as=HTML&amp;talkid=20010620141011">
         <Talk rdf:parseType="Resource">
           <Title>PROBABILISTIC OPTIMIZATION TECHNIQUES FOR MULTICAST KEY MANAGEMENT … </Title>
           <Abstract>Multicast is a key technology to support large group communications over the Internet… </Abstract>
           <BeginTime>
             <time:Year>2001</time:Year>
             <time:Month>06</time:Month>
             <time:Day>20</time:Day>
             ...
           </BeginTime>
           ...
           <Audience>General Public</Audience>
           <DomainName>umbc</DomainName>
           <Location rdf:parseType="Resource">
             <Institution>UMBC</Institution>
           </Location>
           <Speaker rdf:parseType="Resource">
             <Name>Ali Selcuk</Name>
             <Organization>UMBC</Organization>
             <Email>aselcu1@csee.umbc.edu</Email>
           </Speaker>
         </Talk>
       </rdf:Description>
     </rdf:RDF>

  7. ResearchIndex (http://citeseer.nj.nec.com)
     • ResearchIndex is a digital library of scientific literature.
     • Articles from 17 different subject areas within Computing Science.
     • Full text of each article and its BibTeX entry are provided.
     • BibTeX converted to RDF.
     • The full text is typically 6000 words.
     • The BibTeX entry is typically 10 RDF statements.

  8. BibTeX → RDF mapping
     @inproceedings{ davies94agentk,
       author = "W. H. E. Davies and P. Edwards",
       title = "Agent-K: An Integration of AOP and KQML",
       booktitle = "Proceedings of the CIKM'94 Workshop on Intelligent Agents",
       address = "Gaithersburg, MD, USA",
       editor = "T. Finin and Y. Labrou",
       year = "1994",
       url = "citeseer.nj.nec.com/15298.html"
     }

     <inproceedings rdf:about="davies94agentk">
       <author>W. H. E. Davies and P. Edwards</author>
       <title>Agent-K: An Integration of AOP and KQML</title>
       <booktitle>Proceedings of the CIKM'94 Workshop on Intelligent Agents</booktitle>
       <address>Gaithersburg, MD, USA</address>
       <editor>T. Finin and Y. Labrou</editor>
       <year>1994</year>
       <url>citeseer.nj.nec.com/15298.html</url>
     </inproceedings>
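The mapping on this slide is one element per BibTeX field, wrapped in an element named after the entry type. A minimal sketch of that idea follows, assuming the BibTeX entry has already been parsed into a Python dict. The function name `bibtex_to_rdf` and the dict-based input are assumptions for illustration; the slides do not describe the converter actually used.

```python
from xml.sax.saxutils import escape, quoteattr


def bibtex_to_rdf(entry_type, key, fields):
    """Render one already-parsed BibTeX entry as an RDF/XML-style fragment.

    entry_type: e.g. "inproceedings"; key: the citation key;
    fields: dict of field name -> string value, in the desired output order.
    """
    lines = [f"<{entry_type} rdf:about={quoteattr(key)}>"]
    for name, value in fields.items():
        # escape() protects &, < and > inside the field text.
        lines.append(f"  <{name}>{escape(value)}</{name}>")
    lines.append(f"</{entry_type}>")
    return "\n".join(lines)


# The entry shown on the slide, typed in by hand as a dict.
FIELDS = {
    "author": "W. H. E. Davies and P. Edwards",
    "title": "Agent-K: An Integration of AOP and KQML",
    "booktitle": "Proceedings of the CIKM'94 Workshop on Intelligent Agents",
    "address": "Gaithersburg, MD, USA",
    "editor": "T. Finin and Y. Labrou",
    "year": "1994",
    "url": "citeseer.nj.nec.com/15298.html",
}

rdf = bibtex_to_rdf("inproceedings", "davies94agentk", FIELDS)
print(rdf)
```

The fragment omits namespace declarations, as the slide does, so it would need an enclosing `rdf:RDF` element with the appropriate `xmlns` attributes before a strict RDF parser would accept it.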

  9. Knowledge Sparse Learning: Representation
     • For each algorithm we use 3 instance representations:
       1. Conventional plain text
       2. Meta-data as plain text
       3. Meta-data tags to feature mapping

  10. Method 3: Meta-data tags to feature mapping
      Meta-data instance:
        <xml>
          <rdf>
            <talk id='mlsemweb1'>
              <title>An Empirical Investigation of Learning from the Semantic Web</title>
              <speaker>
                <name>Gunnar AAstrand Grimnes</name>
                <url>http://www.csd.abdn.ac.uk/~ggrimnes</url>
              </speaker>
              ...
      Feature tags: talk, title, speaker, name, url ...
      Instance representation: {}, {empirical, investigation, learning, semantic, web}, {}, {gunnar, aastrand, grimnes}, {csd, abdn, ggrimnes} ...
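The tag-to-feature mapping can be sketched as follows: each tag becomes one feature, whose value is the set of words found directly inside that tag. The stop list and minimum word length below are assumptions chosen only so that the output resembles the slide; the original preprocessing is not documented. The example instance is the slide's fragment with the elided parts dropped and the tags closed so that it parses.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical filtering rules, picked to reproduce the word sets on the slide.
STOP_WORDS = {"the", "from", "http", "www"}
MIN_LENGTH = 3


def tokenize(text):
    """Lowercase the text and return its set of retained words."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    return {w for w in words if len(w) >= MIN_LENGTH and w not in STOP_WORDS}


def tags_to_features(xml_source):
    """Map each tag name to the set of words in the text directly under it."""
    root = ET.fromstring(xml_source)
    features = {}
    for element in root.iter():  # document order: parent before children
        features.setdefault(element.tag, set()).update(tokenize(element.text or ""))
    return features


EXAMPLE = """<talk id='mlsemweb1'>
  <title>An Empirical Investigation of Learning from the Semantic Web</title>
  <speaker>
    <name>Gunnar AAstrand Grimnes</name>
    <url>http://www.csd.abdn.ac.uk/~ggrimnes</url>
  </speaker>
</talk>"""

features = tags_to_features(EXAMPLE)
for tag, words in features.items():
    print(tag, sorted(words))
```

Container tags such as `talk` and `speaker` hold only whitespace, so they map to empty sets, matching the `{}` entries in the instance representation above.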

  11. Knowledge Sparse Learning: Results
      [Charts: ITTalks, ResearchIndex]
      • ITTalks:
        • Meta 2 performs poorly, caused by redundant features.
        • Text and Meta 1 perform very similarly, as the two kinds of instance are almost identical in this dataset.
      • ResearchIndex:
        • KNN performs better on the full-text instances, as it is better at dealing with large numbers of features.

  12. Knowledge Intensive Learning: Representation
      • Ignore the plain-text representations.
      • RDF maps to a first-order logic (Prolog) representation.
      • Use the ILP system Progol 4.4 to learn Prolog rules for class descriptions.
      • Solve binary classification problems.

  13. RDF → Prolog mapping
      <inproceedings rdf:about="davies94agentk">
        <author>W. H. E. Davies and P. Edwards</author>
        <title>Agent-K: An Integration of AOP and KQML</title>
        <booktitle>Proceedings of the CIKM'94 Workshop on Intelligent Agents</booktitle>
        <address>Gaithersburg, MD, USA</address>
        <editor>T. Finin and Y. Labrou</editor>
        <year>1994</year>
        <url>citeseer.nj.nec.com/15298.html</url>
      </inproceedings>

      url( davies94agentk, 'citeseer.nj.nec.com/15298.html' ).
      editor( davies94agentk, 'T. Finin' ).
      editor( davies94agentk, 'Y. Labrou' ).
      titleword( davies94agentk, 'agent' ).
      titleword( davies94agentk, 'integration' ).
      titleword( davies94agentk, 'aop' ).
      titleword( davies94agentk, 'kqml' ).
      author( davies94agentk, 'W. Davies' ).
      author( davies94agentk, 'P. Edwards' ).
      address( davies94agentk, 'Gaithersburg, MD, USA' ).
      year( davies94agentk, '1994' ).
      type( davies94agentk, '#inproceedings' ).
      booktitleword( davies94agentk, 'proceedings' ).
      booktitleword( davies94agentk, 'cikm94' ).
      booktitleword( davies94agentk, 'workshop' ).
      booktitleword( davies94agentk, 'intelligent' ).
      booktitleword( davies94agentk, 'agents' ).
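A minimal sketch of how such facts could be generated is given below. It assumes the RDF has already been read into a dict of fields, that citation keys are valid unquoted Prolog atoms, and that words are filtered with a hypothetical stop list and length cutoff chosen to match the slide. It splits people on " and " but does not reproduce the slide's shortening of 'W. H. E. Davies' to 'W. Davies', since that step is not described.

```python
import re

# Hypothetical word filter, picked to reproduce the titleword/booktitleword facts.
STOP_WORDS = {"and", "the"}
MIN_LENGTH = 3


def words(text):
    """Lowercase, drop apostrophes (so CIKM'94 becomes cikm94), keep longer words."""
    tokens = re.findall(r"[a-z0-9]+", text.lower().replace("'", ""))
    return [t for t in tokens if len(t) >= MIN_LENGTH and t not in STOP_WORDS]


def quote(value):
    """Write a string as a quoted Prolog atom, doubling embedded single quotes."""
    return "'" + value.replace("'", "''") + "'"


def fact(predicate, subject, value):
    return f"{predicate}({subject}, {quote(value)})."


def rdf_to_prolog(entry_type, key, fields):
    """Return a list of Prolog facts describing one entry."""
    facts = [fact("type", key, "#" + entry_type)]
    for name, value in fields.items():
        if name in ("author", "editor"):
            # One fact per person.
            facts.extend(fact(name, key, p.strip()) for p in value.split(" and "))
        elif name in ("title", "booktitle"):
            # One fact per retained word, e.g. titleword/2.
            facts.extend(fact(name + "word", key, w) for w in words(value))
        else:
            facts.append(fact(name, key, value))
    return facts


FIELDS = {
    "author": "W. H. E. Davies and P. Edwards",
    "title": "Agent-K: An Integration of AOP and KQML",
    "booktitle": "Proceedings of the CIKM'94 Workshop on Intelligent Agents",
    "editor": "T. Finin and Y. Labrou",
    "year": "1994",
}

facts = rdf_to_prolog("inproceedings", "davies94agentk", FIELDS)
print("\n".join(facts))
```

Splitting multi-valued fields into one fact per value is what lets a learned clause test a single author or a single title word, as the rules on the following slides do.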

  14. Knowledge Intensive Learning: Results
      Agents experiment (155 clauses):
        inClass(A) :- author(A,'A. Rao').
        inClass(A) :- author(A,'D. Lambrinos').
        inClass(A) :- titleword(A,agent), titleword(A,mobile).
        inClass(A) :- type(A,'http://www.csd.abdn.ac.uk/~ggrimnes/exp/#misc'), textword(A,agent), titleword(A,agent).
        inClass(A) :- year(A,1999), titleword(A,agents).
        inClass(A) :- titleword(A,bdi).
      Machine Learning (259 clauses):
        inClass(A) :- publisher(A,'Morgan Kaufmann'), booktitleword(A,learning).
        inClass(A) :- titleword(A,based), titleword(A,case).
      Theory (279 clauses):
        inClass(A) :- volume(A,18).
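Read as a classifier, a set of `inClass` clauses is a disjunction of conjunctions: an article is in the class if every condition of at least one clause holds for it. The sketch below mimics that reading in Python for a few of the Agents clauses. The encoding of facts as (predicate, value) pairs, the use of strings for all values, and the two test articles are assumptions for illustration; this is not how Progol evaluates clauses.

```python
# Each rule is a list of (predicate, value) conditions that must all hold
# for the same article; the rule list as a whole is an "or" over rules.
AGENT_RULES = [
    [("author", "A. Rao")],
    [("author", "D. Lambrinos")],
    [("titleword", "agent"), ("titleword", "mobile")],
    [("year", "1999"), ("titleword", "agents")],
    [("titleword", "bdi")],
]


def in_class(article_facts, rules):
    """True if at least one rule has all of its conditions among the article's facts."""
    return any(all(cond in article_facts for cond in rule) for rule in rules)


# Hypothetical articles, described by the facts that hold for them.
mobile_paper = {("titleword", "mobile"), ("titleword", "agent"), ("year", "2000")}
older_paper = {("titleword", "agents"), ("year", "1998")}

print(in_class(mobile_paper, AGENT_RULES))
print(in_class(older_paper, AGENT_RULES))
```

Because each clause is readable on its own, a rule set in this form can be inspected, edited or reused outside the learner that produced it.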

  15. Future work: Learning Personal Profiles
      Gunnar’s profile, based on 200 manually rated articles from the ResearchIndex dataset:
        inClass(A) :- titleword(A,image).
        inClass(A) :- type(A,'http://www.csd.abdn.ac.uk/~ggrimnes/exp/#misc'), textword(A,learning).
        inClass(A) :- booktitleword(A,mining).
        inClass(A) :- author(A,'N. Jennings').
        inClass(A) :- titleword(A,indexing).
        inClass(A) :- pages(A,143).

  16. Conclusion
      • In terms of accuracy, learning from the Semantic Web was not superior.
      • Learning from RDF requires fewer resources.
      • Datasets have no ontological support.
      • Learning outcomes from the Semantic Web can be real, reusable knowledge.
