Visualizing Knowledge Structure of an Academic Field

Visualizing Knowledge Structure of an Academic Field • Sung-Chien Lin (林頌堅), Department of Information and Communications, Shih Hsin University (世新大學資訊傳播學系) • scl@cc.shu.edu.tw

Research Problems

The rapid growth of papers • Retrieving the EBSCOhost database with the query: “social network” or “social networks”

The rapid growth of papers • Retrieving the ACM DL with the query: “information visualization”

For many academic fields, it is more difficult than before to analyze and understand their knowledge structures.

Knowledge structure in an academic field • Important research topics (taking the field of information visualization as an example) • interaction, evaluation, user study, interface, visual analytics, focus+context, overall+detail, trees, maps, networks, …… • Relations between research topics • e.g., user interface, evaluation, interaction, visual analytics, ……

Paper contents present the research topics of studies

“In this paper we present an approach that integrates interactive visualizations in the exploratory search process. In this model visualizations can act as hubs where large amounts of information are made accessible in easy user interfaces. Through interaction techniques this information can be combined with related information on the World Wide Web. We applied the new search concept to the domain of stock market information and conducted a user study. Participants could use this interface without instructions, could complete complex tasks like identifying related information items, link heterogeneous information types and use different interaction techniques to access related information more easily. In this way, users could quickly acquire knowledge in an unfamiliar domain.”

When scientists decide to write a paper, one of the first things they do is identify an interesting subset of the many possible topics of scientific investigation. The topics addressed by a paper are also one of the first pieces of information a person tries to extract when reading a scientific abstract.

Papers with the same topics have similar contents

Contents of papers show knowledge structure • The topics of a paper are presented in its content • Papers with similar contents are relevant to some topics in common • For a given topic, the number of relevant papers may indicate its importance • If two different topics are related, they are likely to appear in the same set of papers

Papers as a source for extracting the knowledge structure of an academic field 1. Collecting papers published in the academic field 2. Establishing feature vectors to represent the papers 3. Estimating the relation between every pair of papers 4. Presenting the relations with a proper data structure for visualizing the knowledge structure of the field

Data structures to present knowledge structure for visualization • Trees • Scatter plots • Networks • 2-dimensional maps

Network representation for knowledge structure of an academic domain • Vertices • Papers published in the field • Edges • Relations between the papers • Determined by their relevance scores

Features of network representation • Intuitive presentation • Easy visual navigation and search • Established network analysis techniques can be applied

Problems of network visualization • Observing the overview and the details of a large and dense network simultaneously is very difficult • Process of identifying topics by visual analysis and naming them is usually very arbitrary

Research Methods

Ideas for solving the problems • Deleting redundant edges to reduce network complexity • Grouping highly related nodes based on characteristics of the network structure • Labeling node groups according to the contents of the corresponding papers

The process of visualizing knowledge structure of a domain • Term Extraction → Relevance Estimation → Network Establishment → PFNet Scaling → Network Partitioning → Graph Drawing → Topic Labeling

Term extraction • Terms are extracted from papers and used as representative features • In Chinese text, word boundaries are not explicit • Term extraction should consider unithood and termhood (Kageura and Umino, 1996) • Unithood: the degree of strength or stability of syntagmatic combinations and collocations • Termhood: the degree to which a linguistic unit is related to domain-specific concepts • Automatic Chinese term extraction using statistical information on the occurrences of character strings

Term extraction • A feature vector is assigned to each paper based on the occurrence frequency of the extracted terms in the paper and in the collection: $w_{i,j} = tf_{i,j} \times idf_j$ • $tf_{i,j}$: the frequency count of term $j$ occurring in paper $i$ • $idf_j$: the inverse document frequency of term $j$ in the collection
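The weighting on this slide can be sketched in a few lines of Python. This is a minimal illustration, not the author's implementation: the function name `tfidf_vectors`, the input format (one list of terms per paper) and the IDF form log(N / df) are assumptions, since the slide does not give the exact formula.

```python
import math
from collections import Counter

def tfidf_vectors(papers, vocabulary):
    """Return one sparse vector (dict: term -> weight) per paper.

    papers: list of term lists, one per paper (hypothetical input format).
    vocabulary: set of extracted terms used as features.
    """
    n_papers = len(papers)
    # Document frequency: number of papers in which each term occurs.
    df = Counter()
    for terms in papers:
        df.update(set(terms) & vocabulary)
    # Assumed IDF form: log(N / df); the slide does not specify it.
    idf = {t: math.log(n_papers / df[t]) for t in df}
    vectors = []
    for terms in papers:
        tf = Counter(t for t in terms if t in idf)
        vectors.append({t: tf[t] * idf[t] for t in tf})
    return vectors

# Toy collection of three "papers" (made-up terms for illustration).
papers = [
    ["network", "visualization", "network"],
    ["network", "library"],
    ["library", "reader"],
]
vocabulary = {"network", "visualization", "library", "reader"}
vectors = tfidf_vectors(papers, vocabulary)
```

Storing each vector as a dictionary keeps only the non-zero weights, which is convenient when most of the extracted terms do not occur in a given paper.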

Relevance estimation • The relevance score between any pair of papers is determined by the closeness of their feature vectors • Vector space model
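One common closeness measure in the vector space model is cosine similarity. The slide does not say which measure was used, so the sketch below is an assumption; the function name `cosine` and the sparse dictionary representation are illustrative choices.

```python
import math

def cosine(u, v):
    """Cosine similarity of two sparse vectors given as dicts (term -> weight)."""
    # Only terms present in both vectors contribute to the dot product.
    dot = sum(w * v[t] for t, w in u.items() if t in v)
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # a paper with no weighted terms is relevant to nothing
    return dot / (norm_u * norm_v)

# Two made-up feature vectors.
score = cosine({"network": 3.0, "library": 4.0}, {"network": 3.0})
```

With non-negative weights the score lies between 0 (no shared terms) and 1 (same direction), which makes it easy to threshold when building the network.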

Network Establishment • Each paper corresponds to a vertex in the network • The edge between a pair of vertices is weighted by the relevance score of their corresponding papers • Edges with very small relevance scores are deleted to reduce computational cost
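A minimal sketch of this step, assuming the pairwise relevance scores are already available as a symmetric matrix. The function name `build_network` and the edge dictionary format are hypothetical; the default threshold of 0.1 is the value reported later in the experiment.

```python
from itertools import combinations

def build_network(scores, threshold=0.1):
    """Return weighted undirected edges {(i, j): score} with i < j.

    scores: symmetric matrix (list of lists) of pairwise relevance scores.
    Pairs whose score does not exceed the threshold get no edge.
    """
    n = len(scores)
    edges = {}
    for i, j in combinations(range(n), 2):
        if scores[i][j] > threshold:
            edges[(i, j)] = scores[i][j]
    return edges

# Made-up relevance scores for four papers.
scores = [
    [1.00, 0.80, 0.05, 0.00],
    [0.80, 1.00, 0.30, 0.10],
    [0.05, 0.30, 1.00, 0.60],
    [0.00, 0.10, 0.60, 1.00],
]
edges = build_network(scores)
```

Note that the comparison is strict, so a pair scoring exactly 0.1 is dropped, matching the "relevance score > 0.1" condition.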

PF-Net scaling • Pruning a large number of less salient edges to reduce network complexity • Retaining the structural characteristics of the original network • Keeping only those edges that do not violate the triangle inequality: $d_{ij} \le d_{ik} + d_{kj}$

PF-Net scaling • Generalizing triangle inequality by using the Minkowski distance

PF-Net scaling • Generalizing triangle inequality by extending to q intermediate vertices, ……
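The formulas for these two generalizations did not survive on the slides. The following is a reconstruction from the standard Pathfinder network definition, offered as an assumption rather than the author's exact notation; here $d_{ij}$ denotes the distance (dissimilarity) between vertices $i$ and $j$.

```latex
% Triangle inequality under the Minkowski r-metric (one intermediate vertex k):
d_{ij} \le \left( d_{ik}^{\,r} + d_{kj}^{\,r} \right)^{1/r}

% Extension to paths i = k_0, k_1, \dots, k_p = j with at most q links (p \le q):
d_{ij} \le \left( \sum_{m=0}^{p-1} d_{k_m k_{m+1}}^{\,r} \right)^{1/r}

% Limiting case r \to \infty: the path length becomes the largest link on the path.
d_{ij} \le \max_{0 \le m < p} d_{k_m k_{m+1}}
```

An edge is kept only if the inequality holds for every such path, so a larger $r$ or a larger $q$ removes more edges.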

PF-Net scaling • The result of PF-Net scaling is a family of networks determined by the parameters q and r: PF-Net(q, r) • PF-Net(n − 1, ∞) includes all of the edges in any minimum spanning tree
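For the sparsest member of the family, PF-Net(n − 1, ∞), the pruning rule can be sketched with a Floyd-Warshall style pass that computes minimax path distances. This is an illustrative sketch under assumptions: the function name `pfnet_inf` is hypothetical, and the input is taken to be distances rather than relevance scores (a conversion such as d = 1 − score would be needed first, which the slides do not specify).

```python
import math

def pfnet_inf(n, dist_edges):
    """Prune a network to PF-Net(q = n - 1, r = infinity).

    n: number of vertices, labeled 0 .. n-1.
    dist_edges: {(i, j): distance} for undirected edges with i < j.
    """
    # D[i][j] = smallest achievable "largest link" over all paths from i to j.
    D = [[math.inf] * n for _ in range(n)]
    for i in range(n):
        D[i][i] = 0.0
    for (i, j), d in dist_edges.items():
        D[i][j] = D[j][i] = d
    for k in range(n):
        for i in range(n):
            for j in range(n):
                via_k = max(D[i][k], D[k][j])
                if via_k < D[i][j]:
                    D[i][j] = via_k
    # Keep an edge only if no alternative path beats its direct distance.
    return {e: d for e, d in dist_edges.items() if d <= D[e[0]][e[1]]}

# Made-up distances: edge (0, 2) is longer than the detour 0-1-2.
dist_edges = {(0, 1): 0.2, (1, 2): 0.3, (0, 2): 0.9, (2, 3): 0.5}
pruned = pfnet_inf(4, dist_edges)
```

In the toy example the edge (0, 2) is removed because the path through vertex 1 has a largest link of 0.3, which is smaller than the direct distance 0.9. The triple loop is cubic in the number of vertices, which is acceptable for a few hundred papers.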

Network partition • Papers related to the same topics have similar contents • The corresponding vertices are likely to be close to each other in the established network • Partitioning the network into groups of highly inter-connected nodes • The resulting node groups are considered to be research topics → Community detection

Network partition • Partitioning the network into groups of highly inter-connected nodes • Nodes belonging to different groups are only sparsely connected • The quality of a possible partition is measured by its modularity • The fraction of all edges that lie within groups minus the expected value of the same quantity in a graph in which the vertices have the same degrees but edges are placed at random without regard for the groups
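The verbal definition of modularity above corresponds to the usual Newman-Girvan formula. The notation below ($Q$, $e_{cc}$, $a_c$, $k_v$, $m$) is introduced here for illustration and does not appear on the slides.

```latex
Q = \sum_{c} \left( e_{cc} - a_c^{2} \right),
\qquad
a_c = \frac{1}{2m} \sum_{v \in c} k_v

% e_{cc}: fraction of all edges with both endpoints in group c
% a_c:    fraction of all edge endpoints attached to vertices in group c
% k_v:    degree of vertex v;  m: total number of edges
```

The term $a_c^{2}$ is the expected fraction of edges falling inside group $c$ when edges are placed at random with the vertex degrees preserved, so $Q$ is near 0 for a partition no better than chance and grows as the groups become more cohesive.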

Network partition • Searching for the network partition whose modularity is maximal among all possible partitions • Divisive algorithms • Detecting inter-community links and removing them from the network • Agglomerative algorithms • Merging similar nodes/communities recursively • Optimization methods • Maximizing an objective function

Network partition • The Girvan-Newman algorithm was used in this study • Divisive algorithm • All partitions are generated by iteratively removing edges from the network • The edge to remove is determined by its betweenness • The partition with the maximal modularity is output as the result
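The procedure on this slide can be sketched in plain Python. This is a simplified illustration, not the code used in the study: it treats the network as unweighted, recomputes edge betweenness after every removal, and all function names are hypothetical.

```python
from collections import deque

def components(nodes, adj):
    """Connected components found by breadth-first search."""
    seen, comps = set(), []
    for s in nodes:
        if s in seen:
            continue
        comp, queue = {s}, deque([s])
        seen.add(s)
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in seen:
                    seen.add(v)
                    comp.add(v)
                    queue.append(v)
        comps.append(comp)
    return comps

def edge_betweenness(nodes, adj):
    """Brandes-style edge betweenness for an unweighted graph."""
    bet = {}
    for s in nodes:
        dist, order, queue = {s: 0}, [], deque([s])
        sigma = {v: 0 for v in nodes}   # number of shortest paths from s
        sigma[s] = 1
        preds = {v: [] for v in nodes}  # predecessors on shortest paths
        while queue:
            u = queue.popleft()
            order.append(u)
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
                if dist[v] == dist[u] + 1:
                    sigma[v] += sigma[u]
                    preds[v].append(u)
        delta = {v: 0.0 for v in nodes}
        for v in reversed(order):       # accumulate from the farthest vertices
            for u in preds[v]:
                c = sigma[u] / sigma[v] * (1.0 + delta[v])
                e = (min(u, v), max(u, v))
                bet[e] = bet.get(e, 0.0) + c
                delta[u] += c
    return bet

def modularity(comps, orig_adj, m):
    """Modularity of a partition, measured on the original graph."""
    q = 0.0
    for comp in comps:
        inside = sum(1 for u in comp for v in orig_adj[u] if v in comp) / 2
        degree_sum = sum(len(orig_adj[u]) for u in comp)
        q += inside / m - (degree_sum / (2 * m)) ** 2
    return q

def girvan_newman(nodes, edges):
    """Remove highest-betweenness edges one by one; keep the best partition."""
    orig = {v: set() for v in nodes}
    for u, v in edges:
        orig[u].add(v)
        orig[v].add(u)
    adj = {v: set(ns) for v, ns in orig.items()}
    m, best_q, best = len(edges), -1.0, None
    for remaining in range(m, -1, -1):
        comps = components(nodes, adj)
        q = modularity(comps, orig, m)
        if q > best_q:
            best_q, best = q, comps
        if remaining == 0:
            break
        bet = edge_betweenness(nodes, adj)
        u, v = max(bet, key=bet.get)
        adj[u].discard(v)
        adj[v].discard(u)
    return best, best_q

# Toy graph: two triangles joined by the bridge (2, 3).
nodes = [0, 1, 2, 3, 4, 5]
edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
best, best_q = girvan_newman(nodes, edges)
```

On the toy graph the bridge has the highest betweenness and is removed first, which yields the two triangles as the partition with maximal modularity. Recomputing betweenness after each removal is costly, so this sketch is only practical for small networks such as the pruned PF-Net.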

Graph drawing • Kamada-Kawai algorithm (1989) • A force-directed graph layout algorithm • Suitable for visualizing the network resulting from PFNet scaling

Topic labeling • The subgroups of nodes produced by the Girvan-Newman algorithm are considered to be important research topics • Each subgroup is labeled with the five terms that occur most frequently in the contents of its papers
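The labeling rule is simple enough to sketch directly. The function name `label_groups`, the input formats and the toy data below are hypothetical; only the "five most frequent terms" rule comes from the slide.

```python
from collections import Counter

def label_groups(groups, paper_terms, top_k=5):
    """Label each node group with its top_k most frequent terms.

    groups: {group name: list of paper ids} (e.g. from community detection).
    paper_terms: {paper id: list of extracted terms occurring in that paper}.
    """
    labels = {}
    for name, papers in groups.items():
        counts = Counter()
        for p in papers:
            counts.update(paper_terms[p])  # sum term frequencies over the group
        labels[name] = [term for term, _ in counts.most_common(top_k)]
    return labels

# Made-up example with two papers in one group.
paper_terms = {
    0: ["library", "reader", "library"],
    1: ["library", "collection", "reader"],
}
labels = label_groups({"C1": [0, 1]}, paper_terms, top_k=2)
```

Ties between equally frequent terms are broken by first occurrence here; a real system might prefer a different tie-breaking rule or a weighting such as TF-IDF.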

An Experiment on the Field of Information Communication in Taiwan

Experimental data • Data source: master's theses published by related graduate schools • Retrieved from the database of the National Digital Library of Theses and Dissertations in Taiwan • 778 theses in total

Extracted terms • 293 terms were extracted from the titles and abstracts of the collected theses • 1 thesis without any extracted term was excluded from the following experiment • 777 feature vectors

Established network • 777 nodes for the examined theses • 7168 edges with relevance score > 0.1 • Network density = 2 × 7168 / (777 × 776) ≈ 0.024 (undirected network)

[Figure: the established network]

The result of PF-Net scaling • Insignificant edges were deleted • 7168 edges → 768 edges • Network structure emerges

The result of community detection • 30 subgroups at maximal modularity

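Some examples of topics found • C4: online games (線上遊戲), players (玩家), online game, forums (論壇), women (女性) • C11: style (風格), graphics (圖形), form (造形), art (藝術), two-dimensional (平面) • C13: story (故事), children (兒童), reality (實境), reading (閱讀), experience (體驗) • C21: library (圖書館), librarians (館員), collections (館藏), community (社區), readers (讀者) • C23: image (圖像), retrieval (檢索), noise (雜訊), copying (複製), image quality (影像品質) • C30: news coverage (報導), frame (框架), public image (形象), China (中國), coding (編碼) • C17: group (團體), family (家庭), telephone (電話), stress (壓力), health (健康) • C27: transformation (轉換), tags (標籤), visualization (視覺化), algorithm (演算法), investment (投資)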

Concluding Remarks

Conclusions • Integrating several technologies to visualize knowledge structure of an academic field • Automatic Chinese term extraction • Relevance estimation • PF-Net scaling • Community detection algorithm • Easily discovering important topics and their relations on the resulting network • Most of the labels of topics are related to the research problems and the methodologies in the examined field

Future work • Improving term extraction • Handling interference from papers that are less relevant to the field • Interactive functions for visual analytics of academic fields • More evaluations
