Informetric methods seminar

Tutorial 2: Using Matlab for network construction, ranking, clustering, topic modeling, and path finding

Erjia Yan


Contents

  • Network construction

  • Ranking

  • Clustering

  • Topic modeling

  • Path finding


Network construction

From data to networks

  • Bibliographical data


Web of Science format

  • Paper-to-paper citation network is the base

  • Web of Science cited references format:

    • First Author, Year Of Publication, Abbreviated Journal Name, Volume Number, Beginning Page Number

    • AANESTAD M, 2011, J STRATEGIC INF SYST, V20, P161

  • All fields are available through the “full record + cited references” download option.

Some newer records also include a DOI. For better matching, remove the DOI from the cited references.
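
As a rough illustration of the format above, the Python sketch below builds a cited-reference string from a citing paper's fields and strips a trailing DOI. The function names are hypothetical, the `, DOI ...` suffix pattern is an assumption about how the DOI appears in a cited reference, and the DOI value is a placeholder.

```python
import re

def to_cited_ref(first_author, year, journal_abbrev, volume, begin_page):
    """Format a citing paper's fields as a Web of Science style cited reference."""
    return "{}, {}, {}, V{}, P{}".format(
        first_author.upper(), year, journal_abbrev.upper(), volume, begin_page
    )

def strip_doi(cited_ref):
    """Drop a trailing ', DOI ...' field (assumed form) so records with and without a DOI match."""
    return re.sub(r",\s*DOI\s+\S+.*$", "", cited_ref, flags=re.IGNORECASE).strip()

key = to_cited_ref("Aanestad M", 2011, "J Strategic Inf Syst", 20, 161)
print(key)
# Placeholder DOI, used only to exercise strip_doi.
print(strip_doi(key + ", DOI 10.1000/example"))
```

Both printed lines should be identical, which is what allows a citing paper and a cited reference to be matched as plain strings.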


Citation matching

  • For citing papers, extract these fields and format them in the Web of Science cited reference format.

  • The citing papers and the cited references now share the same format.

  • Using these two fields, construct an internal citation network that contains only those cited references that are themselves papers in the data set.
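
The matching step can be sketched in Python as a simple filter. This is a minimal illustration on toy data, not the Access workflow; the function name and the input layout (a mapping from each citing paper's formatted string to its list of cited-reference strings) are assumptions.

```python
def internal_citations(cited_refs_by_paper):
    """Keep only (citing, cited) pairs whose cited reference is itself a paper in the data set."""
    papers = set(cited_refs_by_paper)
    return sorted(
        (citing, cited)
        for citing, refs in cited_refs_by_paper.items()
        for cited in refs
        if cited in papers
    )

# Toy data: keys are citing papers already formatted as cited-reference strings.
data = {
    "A, 2010, J X, V1, P1": [],
    "B, 2011, J X, V2, P5": ["A, 2010, J X, V1, P1", "Z, 1999, J Y, V9, P9"],
    "C, 2012, J X, V3, P7": ["A, 2010, J X, V1, P1", "B, 2011, J X, V2, P5"],
}
edges = internal_citations(data)
for citing, cited in edges:
    print(citing, "->", cited)
```

The reference to paper Z is dropped because Z is not among the citing papers, which is the "internal" restriction described above.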


Procedures

  • If you can write an app for this, it would be great!

  • Otherwise, you can follow these instructions

    • Converting into

    • Use Access to construct the network

      • Have a table for citing papers

      • Import the converted citation pairs to Access

      • Use a query to extract those pairs whose papers are in the table

    • Now you have the node info and link info

    • Import both into Matlab


Adjacency matrices

  • We now have paper-to-paper citation networks, but to construct, for instance, author-to-author citation or author co-citation networks, we need adjacency matrices.

[Figure: matrix with papers as rows and authors as columns. A cell value of 1 at (i,j) indicates that paper i is written by author j.]
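
One common way to use such a matrix, sketched here in Python with toy data, is to project the paper-to-paper citation matrix C onto authors as AᵀCA, where A is the paper-by-author matrix. The matrices and helper names are illustrative assumptions; in Matlab the same product would be written `A' * C * A`.

```python
def transpose(m):
    return [list(row) for row in zip(*m)]

def matmul(a, b):
    bt = transpose(b)
    return [[sum(x * y for x, y in zip(row, col)) for col in bt] for row in a]

# A[i][j] = 1 if paper i is written by author j (3 papers, 2 authors).
A = [[1, 0],
     [0, 1],
     [1, 1]]
# C[i][j] = 1 if paper i cites paper j.
C = [[0, 0, 0],
     [1, 0, 0],
     [1, 1, 0]]

# Entry (p, q) counts citations from papers by author p to papers by author q.
author_citation = matmul(matmul(transpose(A), C), A)
print(author_citation)
```

Co-authored papers are counted once per author in this sketch; fractional counting would require normalizing the rows of A first.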


Procedures

  • Convert into

  • Add to the beginning of the file

  • Use Txt2Pajek on the linkage file

  • Import the edge section of the .net file to Matlab

  • Select M(1:n,n+1:m), where m is the number of columns. The selection is our author-paper adjacency matrix.
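
The selection step can be illustrated in Python on a toy edge list. This sketch assumes the nodes of one type are numbered 1..n and the nodes of the other type n+1..m, which is what makes the upper-right block of the full matrix the two-mode matrix; the edge list and the function name are hypothetical.

```python
def bipartite_block(M, n):
    """Python equivalent of Matlab's M(1:n, n+1:m): rows 0..n-1, columns n..m-1."""
    return [row[n:] for row in M[:n]]

# Edges as (first-type node, second-type node), 1-based as in a .net edge section.
# Assumed numbering: nodes 1..2 are one type, nodes 3..5 the other.
edges = [(1, 3), (1, 4), (2, 4), (2, 5)]
n, m = 2, 5

# Full m-by-m matrix built from the edge list.
M = [[0] * m for _ in range(m)]
for u, v in edges:
    M[u - 1][v - 1] = 1

B = bipartite_block(M, n)
print(B)
```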



Co-citation and bibliographic coupling
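
Under the usual definitions, both networks follow from the citation matrix C, where C(i,j)=1 means paper i cites paper j: co-citation counts are CᵀC and bibliographic coupling counts are CCᵀ. The Python sketch below uses a toy matrix and hypothetical helper names; in Matlab these would be `C' * C` and `C * C'`.

```python
def transpose(m):
    return [list(row) for row in zip(*m)]

def matmul(a, b):
    bt = transpose(b)
    return [[sum(x * y for x, y in zip(row, col)) for col in bt] for row in a]

# C[i][j] = 1 if paper i cites paper j.
C = [[0, 0, 0, 0],
     [1, 0, 0, 0],
     [1, 1, 0, 0],
     [1, 1, 0, 0]]

# cocitation[j][k]: number of papers that cite both j and k.
cocitation = matmul(transpose(C), C)
# coupling[i][k]: number of references shared by papers i and k.
coupling = matmul(C, transpose(C))
print(cocitation)
print(coupling)
```

The diagonals hold each paper's citation count and reference count respectively, and are usually zeroed before the matrices are treated as networks.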



Ranking

PageRank

  • By David Gleich of Purdue University

  • http://www.mathworks.com/matlabcentral/fileexchange/11613-pagerank

  • pagerank(M,options)

    • options.c: the teleportation coefficient [double | {0.85}]

    • options.v: the personalization vector [vector | {uniform: 1/n}]
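
To show what the two options control, here is a minimal power-iteration sketch in Python. It is not the File Exchange implementation: the function name, the convergence settings, and the handling of dangling nodes (their rank is redistributed according to v) are assumptions made for illustration.

```python
def pagerank_sketch(adj, c=0.85, v=None, tol=1e-10, max_iter=1000):
    """Power iteration. adj[i][j] > 0 means node i links to (cites) node j."""
    n = len(adj)
    v = v if v is not None else [1.0 / n] * n   # uniform personalization vector
    out_deg = [sum(row) for row in adj]
    x = list(v)
    for _ in range(max_iter):
        # Rank held by nodes with no out-links is spread according to v (assumed strategy).
        dangling = sum(x[i] for i in range(n) if out_deg[i] == 0)
        new = [(1.0 - c) * v[j] + c * dangling * v[j] for j in range(n)]
        for i in range(n):
            if out_deg[i]:
                share = c * x[i] / out_deg[i]
                for j in range(n):
                    if adj[i][j]:
                        new[j] += share * adj[i][j]
        if sum(abs(a - b) for a, b in zip(new, x)) < tol:
            return new
        x = new
    return x

# Toy citation network: paper 1 cites paper 0; paper 2 cites papers 0 and 1.
adj = [[0, 0, 0],
       [1, 0, 0],
       [1, 1, 0]]
ranks = pagerank_sketch(adj)
print(ranks)
```

Lowering c moves the scores toward v, and a non-uniform v biases the ranking toward the chosen nodes.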


Clustering

Built-in functions

  • K-means

    • IDX = kmeans(X,k)

    • http://www.mathworks.com/help/stats/kmeans.html

  • Hierarchical clustering

    • http://www.mathworks.com/help/stats/hierarchical-clustering.html
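
For readers who want to see what k-means does internally, the following Python sketch implements the basic Lloyd iteration. It is an illustration only, not Matlab's `kmeans`: the function name is hypothetical and the initial centroids are passed in explicitly to keep the result deterministic.

```python
def kmeans_sketch(points, init_centroids, max_iter=100):
    """Lloyd's algorithm: returns (idx, centroids), idx[i] being the cluster of points[i]."""
    centroids = [list(c) for c in init_centroids]
    idx = None
    for _ in range(max_iter):
        # Assignment step: each point goes to its nearest centroid (squared Euclidean distance).
        new_idx = [
            min(range(len(centroids)),
                key=lambda k: sum((p - q) ** 2 for p, q in zip(pt, centroids[k])))
            for pt in points
        ]
        if new_idx == idx:
            break
        idx = new_idx
        # Update step: each centroid moves to the mean of its members.
        for k in range(len(centroids)):
            members = [pt for pt, a in zip(points, idx) if a == k]
            if members:  # keep the old centroid if a cluster becomes empty
                centroids[k] = [sum(col) / len(members) for col in zip(*members)]
    return idx, centroids

X = [[0, 0], [0, 1], [10, 10], [10, 11]]
idx, centroids = kmeans_sketch(X, [[0, 0], [10, 10]])
print(idx, centroids)
```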


Modularity-based clustering

  • By MIT Strategic Engineering

  • http://strategic.mit.edu/downloads.php?page=matlab_networks

    • [modules,module_hist,Q] = newmangirvan(adj,k)

    • [groups_hist,Q]=newman_comm_fast(adj)
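
The Q returned by these functions is the modularity of a partition. As a sketch of the standard Newman-Girvan definition for an undirected, unweighted graph, Q = (1/2m) Σ_ij [A_ij − k_i k_j / 2m] δ(c_i, c_j), the Python function below scores a given partition. The function name and toy graph are assumptions, and this does not reproduce the search that the toolbox functions perform.

```python
def modularity(adj, communities):
    """Modularity Q of a partition; adj is a symmetric 0/1 matrix, communities[i] a label."""
    n = len(adj)
    degree = [sum(row) for row in adj]
    two_m = float(sum(degree))
    q = 0.0
    for i in range(n):
        for j in range(n):
            if communities[i] == communities[j]:
                q += adj[i][j] - degree[i] * degree[j] / two_m
    return q / two_m

# Two triangles (0-1-2 and 3-4-5) joined by the single edge 2-3.
pairs = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
adj = [[0] * 6 for _ in range(6)]
for u, v in pairs:
    adj[u][v] = adj[v][u] = 1

q_split = modularity(adj, [0, 0, 0, 1, 1, 1])
q_single = modularity(adj, [0, 0, 0, 0, 0, 0])
print(q_split, q_single)
```

Splitting the graph into its two triangles gives a clearly positive Q, while putting every node in one community gives Q of zero.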


VOSviewer clustering

  • By Nees van Eck and Ludo Waltman of Leiden University

  • http://www.vosviewer.com/relatedsoftware/

    • A variant of the modularity-based clustering technique

    • [X, cluster_size, V] = VOS_clustering(A, P)


Topic modeling

Matlab Topic Modeling Toolbox

  • By Mark Steyvers of the University of California, Irvine

  • http://psiexp.ss.uci.edu/research/programs_data/toolbox.htm

  • Input: a bag-of-words representation containing the number of times each word occurs in each document.
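
A bag-of-words representation can be sketched in Python as a vocabulary plus per-document word counts. The whitespace tokenization, the function name, and the output layout are simplifying assumptions; the toolbox documentation defines the exact input variables it expects.

```python
from collections import Counter

def bag_of_words(docs):
    """Return (vocab, counts): counts[d][w] is how often vocabulary word w occurs in document d."""
    vocab = sorted({w for doc in docs for w in doc.lower().split()})
    index = {w: i for i, w in enumerate(vocab)}
    counts = [Counter(index[w] for w in doc.lower().split()) for doc in docs]
    return vocab, counts

docs = ["citation network analysis", "topic model topic"]
vocab, counts = bag_of_words(docs)
print(vocab)
for d, c in enumerate(counts):
    print(d, sorted(c.items()))
```

Word order is discarded; only the counts per document are kept, which is all a topic model of this kind uses.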


Path finding

Bioinformatics toolbox

  • http://www.mathworks.com/help/bioinfo/ref/graphshortestpath.html

  • [dist, path, pred]=graphshortestpath(G,S,T)

    • finds the shortest path from node S to node T in graph G

  • [dist] = graphallshortestpaths(G)

    • finds all shortest paths in graph G; dist is a matrix of shortest-path distances between every pair of nodes
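
To make the outputs concrete, the Python sketch below computes the same three quantities (distance, path, predecessors) with Dijkstra's algorithm, plus an all-pairs distance table. It assumes non-negative edge weights and a dictionary-of-dictionaries graph; the function names are hypothetical and this is not the toolbox implementation.

```python
import heapq

def shortest_path(graph, s, t):
    """Dijkstra: graph[u] = {v: weight}. Returns (dist, path, pred) for s -> t."""
    dist = {s: 0.0}
    pred = {s: None}
    heap = [(0.0, s)]
    done = set()
    while heap:
        d, u = heapq.heappop(heap)
        if u in done:
            continue
        done.add(u)
        if u == t:
            break
        for v, w in graph.get(u, {}).items():
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                pred[v] = u
                heapq.heappush(heap, (nd, v))
    if t not in done:
        return float("inf"), [], pred   # t is unreachable from s
    path, node = [], t
    while node is not None:             # walk predecessors back to s
        path.append(node)
        node = pred[node]
    return dist[t], path[::-1], pred

def all_shortest_paths(graph):
    """Distance table for every ordered pair of nodes (inf when unreachable)."""
    nodes = sorted(set(graph) | {v for nbrs in graph.values() for v in nbrs})
    return {u: {v: shortest_path(graph, u, v)[0] for v in nodes} for u in nodes}

G = {1: {2: 1.0, 3: 4.0}, 2: {3: 1.0}, 3: {}}
d, path, pred = shortest_path(G, 1, 3)
table = all_shortest_paths(G)
print(d, path)
```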

