
Concept-Based Analysis of Scientific Literature

Concept-Based Analysis of Scientific Literature. Chen-Tse Tsai, Gourab Kundu, Dan Roth, CS @ UIUC. Understanding Research Communities. Consider the following questions: What are the key applications studied by the community?


Presentation Transcript


  1. Concept-Based Analysis of Scientific Literature Chen-Tse Tsai, Gourab Kundu, Dan Roth CS @ UIUC

  2. Understanding Research Communities • Consider the following questions: • What are the key applications studied by the community? • What applications have matured enough to be used as a technique in other applications? • What methods were developed to solve a particular problem? • In this paper: • Extract concepts from scientific papers • A concept is a cluster of possible mentions • {svm, support vector machines, maximal margin classifiers, …} • Analyze computational linguistic research by answering the above questions

  3. Outline • Computational Approach • Concept Mention Extraction • Citation-Context based Concept Clustering • Evaluation of Algorithms • Understanding Computational Linguistic Research

  4. Concept Mention Extraction • Identify and categorize mentions of concepts (Gupta and Manning, 2011) • Categories: TECHNIQUE and APPLICATION • Example: “We apply support vector machines on text classification.” • Unsupervised bootstrapping algorithm (Yarowsky, 1995; Collins and Singer, 1999) • The proposed algorithm: • Extract noun phrases (Punyakanok and Roth, 2001). • For each category, initialize a decision list with seeds. • For several rounds: • Annotate NPs using the decision lists. • Extract the top features from the newly annotated phrases, and add them to the decision lists.
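
The bootstrapping loop on this slide can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the feature set (phrase words plus a `prev=` context feature), the scoring rule (count of matching features), the `top_k` cutoff, and the seed features are all hypothetical choices made for the example.

```python
from collections import Counter

def features(phrase, context):
    # Hypothetical feature set: words of the noun phrase plus its context features.
    return set(phrase.split()) | set(context)

def bootstrap(noun_phrases, seeds, rounds=3, top_k=10):
    """noun_phrases: list of (phrase, context_features) pairs.
    seeds: dict mapping each category to its seed features."""
    decision_lists = {cat: set(fs) for cat, fs in seeds.items()}
    labels = [None] * len(noun_phrases)
    for _ in range(rounds):
        # Annotate each NP with the category whose decision list matches
        # the most features; leave it unlabeled on a tie or no match.
        for i, (phrase, context) in enumerate(noun_phrases):
            feats = features(phrase, context)
            scores = {cat: len(feats & dl) for cat, dl in decision_lists.items()}
            best = max(scores.values())
            winners = [cat for cat, s in scores.items() if s == best]
            labels[i] = winners[0] if best > 0 and len(winners) == 1 else None
        # Extract the most frequent unused features of the annotated phrases
        # and add them to the decision list of their category.
        for cat in decision_lists:
            counts = Counter()
            for (phrase, context), label in zip(noun_phrases, labels):
                if label == cat:
                    counts.update(features(phrase, context))
            taken = set().union(*decision_lists.values())
            ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
            new = [f for f, _ in ranked if f not in taken]
            decision_lists[cat].update(new[:top_k])
    return labels, decision_lists

# Toy data (invented for illustration only).
nps = [
    ("support vector machines", ["prev=apply"]),
    ("text classification", ["prev=on"]),
    ("decision trees", ["prev=apply"]),
    ("parsing", ["prev=on"]),
    ("support vector machines", ["prev=use"]),
    ("maximum entropy models", ["prev=use"]),
]
seeds = {"TECHNIQUE": {"prev=apply"}, "APPLICATION": {"prev=on"}}
labels, lists = bootstrap(nps, seeds)
```

In this toy run the last phrase is only labeled in the third round, after `prev=use` has been learned from an already-labeled mention, which is the effect bootstrapping relies on.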

  5. Citation-Context Based Concept Clustering (CitClus) • Cluster mentions into semantically coherent concepts • Group concept mentions by citation context • Merge clusters based on lexical similarity between mentions in the clusters • [Figure: example papers and their citations. Paper1 mentions support vector machine and c4.5; Paper2 mentions svm-based classification and decision_trees; the citations shown with these papers are (Cortes, 1995) and (Quinlan, 1993). Paper3 mentions svm and Paper4 mentions maximal_margin_classifiers; the citation shown with these papers is (Vapnik, 1995).] • Mentions listed in the figure: c4.5, decision trees, support vector machine, svm-based classification, svm, maximal margin classifiers
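
The two CitClus steps can be sketched as below. This is a rough illustration only: the pairing of each mention with a citation is an assumed reading of the slide's figure, and the token-level Jaccard similarity and the `threshold` value are hypothetical stand-ins for whatever lexical similarity the authors used.

```python
import re
from collections import defaultdict

def tokens(mention):
    # Split on whitespace, underscores and hyphens (an assumed tokenization).
    return {t for t in re.split(r"[\s_\-]+", mention.lower()) if t}

def jaccard(a, b):
    ta, tb = tokens(a), tokens(b)
    return len(ta & tb) / len(ta | tb)

def citclus(mention_citations, threshold=0.3):
    # Step 1: group mentions that appear with the same citation.
    by_citation = defaultdict(set)
    for mention, citation in mention_citations:
        by_citation[citation].add(mention)
    clusters = [set(ms) for ms in by_citation.values()]
    # Step 2: repeatedly merge two clusters if any pair of their mentions
    # is lexically similar enough, until no merge applies.
    merged = True
    while merged:
        merged = False
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                if any(jaccard(a, b) >= threshold
                       for a in clusters[i] for b in clusters[j]):
                    clusters[i] |= clusters[j]
                    del clusters[j]
                    merged = True
                    break
            if merged:
                break
    return clusters

# Mention/citation pairs: an assumed reading of the slide's example.
pairs = [
    ("support vector machine", "Cortes,1995"),
    ("c4.5", "Quinlan,1993"),
    ("svm-based classification", "Cortes,1995"),
    ("decision_trees", "Quinlan,1993"),
    ("svm", "Vapnik,1995"),
    ("maximal_margin_classifiers", "Vapnik,1995"),
]
clusters = citclus(pairs)
```

Here the citation step alone yields three clusters; the lexical step then joins the two SVM-related clusters because "svm" and "svm-based classification" share a token.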

  6. Outline • Computational Approach • Concept Mention Extraction • Citation-Context based Concept Clustering • Evaluation of Algorithms • Understanding Computational Linguistic Research

  7. Evaluation of Mention Extraction • ACL Anthology Network Corpus (Radev et al., 2009) • Training data: 11,005 abstracts • Test data: 474 abstracts (Gupta and Manning, 2011)

  8. Evaluation of Concept Clustering • Manually cluster the extracted mentions from 1,000 full-text papers. • CitClus: the proposed approach • LexClus: group the concept mentions by lexical similarity • CitClus groups: • “maximal entropy classifier” and “logistic classifier” • “topic modeling” and “latent dirichlet allocation”

  9. Outline • Computational Approach • Concept Mention Extraction • Citation-Context based Concept Clustering • Evaluation of Algorithms • Understanding Computational Linguistic Research

  10. Trends Analysis • [Figure panels: The emergence of SVM; The emergence of topic modeling. Labels in figure: CitClus, LexClus, LDA] • Topic modeling is high in the 90’s, because LDA cannot generate a tight enough cluster for a specific concept

  11. Predictive Quality • For a concept, predict the number of papers in a year, given the number of papers in the previous three years • Linear regression over every three consecutive years • The better the grouping of mentions into coherent concepts, the more stable the trend graph.
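
One possible reading of this slide, sketched below, is to fit a least-squares line through each window of three consecutive yearly counts and extrapolate it one year ahead. The slide does not spell out the exact regression setup, so this interpretation, the function names, and the mean-absolute-error score are assumptions made for illustration.

```python
def predict_next(y0, y1, y2):
    # Least-squares line through (0, y0), (1, y1), (2, y2):
    # slope = (y2 - y0) / 2, and the line passes through (1, mean).
    slope = (y2 - y0) / 2.0
    mean = (y0 + y1 + y2) / 3.0
    # Extrapolate to x = 3, two steps past the window's midpoint.
    return mean + 2.0 * slope

def mean_abs_error(counts):
    # Slide a three-year window over the yearly paper counts and compare
    # each extrapolated value with the count actually observed.
    errors = [abs(predict_next(*counts[i:i + 3]) - counts[i + 3])
              for i in range(len(counts) - 3)]
    return sum(errors) / len(errors)

# Invented yearly paper counts for two hypothetical concept clusters.
smooth = [10, 20, 30, 40, 50]
noisy = [10, 30, 10, 30, 10]
```

Under this reading, a cluster with a steady trend such as `smooth` gets a lower error than an erratic one such as `noisy`, which matches the slide's point that better-grouped concepts give more stable trend graphs.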

  12. Relations Between Concept Categories • For a given concept, calculate the ratio between the number of application mentions and the number of technique mentions. • Three concepts in the ACL community: • Support vector machines, Machine translation, POS tagging • [Figure: plotted ratios. POS tagging: #tech/#app; SVM: #app/#tech; MT: #tech/#app]
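
The ratio on this slide is a simple count comparison. The sketch below assumes each extracted mention has already been mapped to a concept and labeled TECHNIQUE or APPLICATION; the function name, the input format, and the toy counts are hypothetical.

```python
from collections import Counter

def category_ratio(mentions, concept,
                   numerator="TECHNIQUE", denominator="APPLICATION"):
    """mentions: list of (concept, category) pairs.
    Returns #numerator / #denominator for the concept, or None when the
    denominator category has no mentions."""
    counts = Counter(cat for c, cat in mentions if c == concept)
    if counts[denominator] == 0:
        return None
    return counts[numerator] / counts[denominator]

# Invented mention counts for illustration only.
mentions = (
    [("pos tagging", "TECHNIQUE")] * 3
    + [("pos tagging", "APPLICATION")] * 6
    + [("svm", "TECHNIQUE")] * 4
)
pos_ratio = category_ratio(mentions, "pos tagging")
svm_ratio = category_ratio(mentions, "svm", "APPLICATION", "TECHNIQUE")
```

Computing the ratio separately for each publication year would give the kind of per-concept curve the slide's figure plots.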

  13. Relations Between Concept Categories • For a given application, what techniques have been applied to it? • [Figure panels: Machine translation (labels: Phrase-based and MERT); Named entity recognition (labels: Decision Tree, CRF; note: Decision Tree disappears)]

  14. Conclusion • This work proposed algorithms for identifying, categorizing, and clustering mentions of scientific concepts. • These tools can provide a rather deep understanding of, and useful insight into, research communities.
