Link Prediction in Co-Authorship Network

Le Nhat Minh ( A0074403N)

Supervisor: Dongyuan Lu

Introduction
  • Link prediction
    • Predict future connections within the scope of the network
  • Co-authorship network
    • A network of collaborations among researchers, scientists, and academic writers
  • Potential applications
    • Recommend experts or groups of researchers to an individual researcher
Outline
  • Problem Background
  • Related Work
  • Workflow
  • Conclusion
    • Result Analysis
    • Research plan
Problem Background
  • What connects researchers?
  • Given an instance of a co-authorship network:
    • A researcher is connected to another if they have collaborated on at least one paper.
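
The connection rule above can be sketched in a few lines of Python. This is only an illustration: the function name `build_coauthor_graph` and the toy paper list are assumptions, not part of the original dataset or code.

```python
from collections import defaultdict
from itertools import combinations


def build_coauthor_graph(papers):
    """Return {author: set of co-authors} from an iterable of author lists.

    Two authors are linked if they share at least one paper.
    """
    graph = defaultdict(set)
    for authors in papers:
        for author in authors:
            graph[author]  # touch the key so single-author papers still add a node
        for a, b in combinations(set(authors), 2):
            graph[a].add(b)
            graph[b].add(a)
    return dict(graph)


# Hypothetical toy data: each inner list is the author list of one paper.
papers = [["A", "B", "C"], ["C", "D"], ["E"]]
graph = build_coauthor_graph(papers)
for author in sorted(graph):
    print(author, sorted(graph[author]))
```

Storing neighbors as sets keeps the graph undirected and makes the later neighborhood-based measures simple set operations.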









Problem Background
  • How to predict the link?
  • Based on criteria:
    • Co-authorship network topology
    • Researcher’s personal information
    • Researcher’s papers
  • Boost link prediction performance
    • A recommended link should be relevant to the authors' interests, or at least be a feasible collaboration for the researchers.
Related Work
  • Link prediction problems in social networks
    • Liben‐Nowell, D., & Kleinberg, J., 2007
    • Bliss, C. A., Frank, M. R., Danforth, C. M., & Dodds, P. S., 2013
  • In social networks, interactions among users are very dynamic, with:
    • Creation of new links within a few days
    • Deletion or replacement of existing links
  • The two networks present different features
    • Characteristics of an individual researcher (co-authorship network): citations, affiliations, institutions, ...
    • Characteristics of a person (social network): marital status, age, workplace, …

Three mainstream approaches for link prediction:

    • Similarity based estimation
      • Liben‐Nowell, D., & Kleinberg, J., 2007
    • Maximum likelihood estimation
      • Murata, T., & Moriyasu, S., 2008
      • Guimerà, R., & Sales-Pardo, M., 2009
    • Supervised Learning model
      • Pavlov, M., & Ichise, R., 2007
      • Al Hasan, M., Chaoji, V., Salem, S., & Zaki, M., 2006
Similarity Based Estimation
  • Use metrics to estimate the proximity of each pair of researchers
  • Rank pairs of researchers by those proximities
  • The top-ranked pairs are the likely recommendations.
Similarity Based Estimation
  • Network structure based measurement

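Some conventions (standard notation, following Liben‐Nowell & Kleinberg, 2007):

  • Γ(x): the set of neighbors of node x
  • |Γ(x)|: the number of neighbors (degree) of node x
  • score(x, y): the proximity assigned to the pair of nodes (x, y)

Similarity Based Estimation
  • Common Neighbor: score(x, y) = |Γ(x) ∩ Γ(y)|

Similarity Based Estimation
  • Jaccard’s coefficient: score(x, y) = |Γ(x) ∩ Γ(y)| / |Γ(x) ∪ Γ(y)|

Similarity Based Estimation
  • Preferential Attachment: score(x, y) = |Γ(x)| · |Γ(y)|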
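
A minimal sketch of the three neighborhood-based measures, using their standard definitions from Liben‐Nowell & Kleinberg (2007): the size of the shared neighborhood, the shared neighborhood divided by the combined neighborhood, and the product of the two degrees. The graph representation (a dict mapping each author to a set of co-authors) and the toy data are assumptions made for illustration.

```python
def common_neighbors(graph, x, y):
    """Number of co-authors that x and y share."""
    return len(graph[x] & graph[y])


def jaccard(graph, x, y):
    """Shared co-authors divided by all co-authors of either node."""
    union = graph[x] | graph[y]
    return len(graph[x] & graph[y]) / len(union) if union else 0.0


def preferential_attachment(graph, x, y):
    """Product of the two nodes' degrees."""
    return len(graph[x]) * len(graph[y])


# Hypothetical toy graph: author -> set of co-authors.
graph = {
    "A": {"B", "C"},
    "B": {"A", "C", "D"},
    "C": {"A", "B"},
    "D": {"B"},
}

print(common_neighbors(graph, "A", "D"))
print(jaccard(graph, "A", "D"))
print(preferential_attachment(graph, "A", "D"))
```

Scoring every unconnected pair with one of these functions and sorting by score gives the ranked recommendation list described earlier.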



Similarity Based Estimation
  • Shortest Path:
    • The minimum number of edges connecting two nodes.
  • PageRank:
    • A random walk on the graph assigns each node the probability of being reached. The proximity of a pair of nodes can be determined by the sum of the two nodes' PageRank values.
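
Both global measures can be sketched without external libraries. The code below assumes the graph is a dict mapping each author to a set of co-authors; the damping factor 0.85 and the fixed iteration count are conventional choices, not values taken from the slides.

```python
from collections import deque


def shortest_path_length(graph, source, target):
    """Breadth-first search; returns the edge count, or None if unreachable."""
    if source == target:
        return 0
    seen = {source}
    queue = deque([(source, 0)])
    while queue:
        node, dist = queue.popleft()
        for neighbor in graph[node]:
            if neighbor == target:
                return dist + 1
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append((neighbor, dist + 1))
    return None


def pagerank(graph, damping=0.85, iterations=50):
    """Power iteration on an undirected graph given as {node: set of neighbors}."""
    n = len(graph)
    rank = {v: 1.0 / n for v in graph}
    for _ in range(iterations):
        new_rank = {v: (1.0 - damping) / n for v in graph}
        for v, neighbors in graph.items():
            if neighbors:
                share = damping * rank[v] / len(neighbors)
                for u in neighbors:
                    new_rank[u] += share
            else:
                # An isolated node spreads its rank evenly over all nodes.
                for u in graph:
                    new_rank[u] += damping * rank[v] / n
        rank = new_rank
    return rank


def pagerank_proximity(rank, x, y):
    """Pair proximity as the sum of the two nodes' PageRank values."""
    return rank[x] + rank[y]


# Hypothetical path graph A - B - C - D.
graph = {"A": {"B"}, "B": {"A", "C"}, "C": {"B", "D"}, "D": {"C"}}
rank = pagerank(graph)
print(shortest_path_length(graph, "A", "D"))
print(pagerank_proximity(rank, "A", "D"))
```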
Maximum Likelihood Estimation
  • Predefine specific rules for the network
  • Requires prior knowledge of the network
  • The likelihood of each non-existent link is calculated according to those rules.
Supervised Learning Model
  • Construct multi-dimensional feature vectors
  • Feed these vectors to classifiers to optimize a target function (model training)
  • Link prediction becomes a binary classification problem
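
To make the binary-classification framing concrete, here is a small sketch that trains a classifier on pair feature vectors. Logistic regression is used purely as a stand-in because it fits in a few lines; it is not one of the classifiers named in the cited related work, and the feature values below are invented.

```python
import math


def train_logistic(X, y, lr=0.1, epochs=500):
    """Fit weights and bias by stochastic gradient descent on the log loss."""
    w = [0.0] * len(X[0])
    b = 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            z = sum(wj * xj for wj, xj in zip(w, xi)) + b
            p = 1.0 / (1.0 + math.exp(-z))
            err = p - yi  # gradient of the log loss with respect to z
            w = [wj - lr * err * xj for wj, xj in zip(w, xi)]
            b -= lr * err
    return w, b


def predict(w, b, x):
    """Return 1 (link) or 0 (no link) for one feature vector."""
    z = sum(wj * xj for wj, xj in zip(w, x)) + b
    return 1 if z >= 0 else 0


# Hypothetical feature vectors: [common neighbors, Jaccard coefficient].
X = [[3, 0.6], [4, 0.7], [5, 0.8], [0, 0.0], [1, 0.2], [0, 0.0]]
y = [1, 1, 1, 0, 0, 0]  # 1 = link exists, 0 = no link

w, b = train_logistic(X, y)
print(predict(w, b, [5, 0.8]), predict(w, b, [0, 0.0]))
```

Any classifier with a fit/predict interface can replace the two functions; only the feature vectors and the 0/1 labels matter for the framing.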
Supervised Learning Model
  • Related work (Al Hasan, M., Chaoji, V., Salem, S., & Zaki, M., 2006) using:
    • Decision Tree
    • SVM (Linear Kernel)
    • K nearest neighbor
    • Multilayer Perceptron
    • Naive Bayes
    • Bagging
  • Combine many classifiers (Pavlov, M., & Ichise, R., 2007)
    • Decision stump + AdaBoost
    • Decision Tree + AdaBoost
    • SMO + AdaBoost
  • Similarity based estimation
    • Does not perform well
  • Maximum likelihood
    • Depends on the network
  • Supervised learning model
    • Performs better than similarity based estimation

Classifier Model


Graph Description
  • Co-authorship graph:
    • Undirected graph G(V, E)
  • Node or Vertex (Author)
    • Author ID
    • Author Name
  • Link or Edge (Co-authorship)
    • Pair of author IDs
    • List of publication years, each followed by the paper title (e.g., 2004: "Introduction to …")

Setting up data
  • The dataset is split into two time spans: 2000 – 2010 and 2010 – 2013
  • The first is for training, the second for testing.
  • Currently, there are 134,307 researchers in the 2000 – 2013 network.
  • Remove authors who do not appear in the testing period, leaving 104,265 researchers
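
The split and the author filter can be sketched as follows. The record layout `(author_a, author_b, year)` is hypothetical, and because the two spans share the year 2010, this sketch assumes 2010 belongs to the training period; the slides do not say which side it falls on.

```python
def split_by_year(records, split_year=2010):
    """Split (author_a, author_b, year) records into training and testing sets.

    Assumption: the boundary year itself is treated as training data.
    """
    train = [r for r in records if r[2] <= split_year]
    test = [r for r in records if r[2] > split_year]
    return train, test


def authors_in(records):
    """Set of all authors that appear in the given records."""
    return {author for r in records for author in r[:2]}


def crop_to_test_authors(train, test):
    """Keep only training links whose two authors both appear in the testing period."""
    active = authors_in(test)
    return [r for r in train if r[0] in active and r[1] in active]


# Hypothetical toy records.
records = [
    ("A", "B", 2003),
    ("B", "C", 2008),
    ("C", "D", 2009),
    ("A", "C", 2012),
    ("B", "C", 2011),
]
train, test = split_by_year(records)
print(crop_to_test_authors(train, test))
```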
Setting up data
  • Choose a subset from 104,265 researchers
  • Experiment on 937 researchers
Baseline Features
  • Extract features from the network structure:
    • Local similarity
      • Common Neighbor
      • Adamic/Adar
      • Preferential Attachment
      • Jaccard’s coefficient
    • Global similarity
      • Shortest Path
      • PageRank
Baseline Features
  • Feature for co-authorship network
    • Keyword matching (Cohen, S., & Ebel, L., 2013)

A suggested metric that measures textual relevance using a TF-IDF based function.
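
As a rough illustration of a TF-IDF based relevance score, the sketch below weights each author's keywords by TF-IDF and compares two authors by cosine similarity. This is a generic formulation, not necessarily the exact function of Cohen & Ebel (2013), and the keyword lists are invented.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Map each author to a {term: tf-idf weight} dict.

    docs: {author: list of keywords drawn from that author's papers}.
    """
    n = len(docs)
    doc_freq = Counter(term for words in docs.values() for term in set(words))
    vectors = {}
    for author, words in docs.items():
        counts = Counter(words)
        total = len(words)
        vectors[author] = {
            term: (count / total) * math.log(n / doc_freq[term])
            for term, count in counts.items()
        }
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(weight * v.get(term, 0.0) for term, weight in u.items())
    norm_u = math.sqrt(sum(weight * weight for weight in u.values()))
    norm_v = math.sqrt(sum(weight * weight for weight in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


# Hypothetical keyword lists per author.
docs = {
    "A": ["graph", "link", "prediction"],
    "B": ["link", "prediction", "network"],
    "C": ["protein", "folding"],
}
vectors = tfidf_vectors(docs)
print(cosine(vectors["A"], vectors["B"]))
print(cosine(vectors["A"], vectors["C"]))
```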

Proposed Features
  • Productivity of the authors

Observe the “history” of an author

  • For example, at a particular node A:

T0 = 2000, T1 = 2004, T2 = 2005, T3 = 2006

n: number of shared papers

m: number of collaborators

Proposed Features
  • Productivity of the authors

Observe the “history” of an author

The “productivity” of node A:

α : a constant to assign the weight of each time period

Training set
  • Set up training data
    • With n nodes, there are n(n − 1)/2 possible links.
    • Among those, separate two kinds of links
      • Positive links: links that appear in the training years.
      • Negative links: the remaining links, which do not exist in the training years.

Note: Avoid biased training by balancing the number of instances with true and false labels.
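
One way to build such a balanced set is to undersample the non-existent links to the number of existing ones. The slides do not state which balancing method was used, so random undersampling here is an assumption, as are the function names and toy data.

```python
import random
from itertools import combinations


def total_possible_links(n):
    """Number of unordered node pairs in an undirected graph with n nodes."""
    return n * (n - 1) // 2


def balanced_training_pairs(nodes, edges, seed=0):
    """Return [((a, b), label)] with equal numbers of positive and negative pairs."""
    edge_set = {frozenset(e) for e in edges}
    all_pairs = [frozenset(p) for p in combinations(sorted(nodes), 2)]
    positives = [p for p in all_pairs if p in edge_set]
    negatives = [p for p in all_pairs if p not in edge_set]
    rng = random.Random(seed)  # fixed seed so the sample is reproducible
    k = min(len(positives), len(negatives))
    chosen_pos = rng.sample(positives, k)
    chosen_neg = rng.sample(negatives, k)
    labeled = [(tuple(sorted(p)), 1) for p in chosen_pos]
    labeled += [(tuple(sorted(p)), 0) for p in chosen_neg]
    return labeled


# Hypothetical toy network: 5 authors, 3 existing links in the training years.
nodes = ["A", "B", "C", "D", "E"]
edges = [("A", "B"), ("B", "C"), ("C", "D")]
print(total_possible_links(len(nodes)))
for pair, label in balanced_training_pairs(nodes, edges):
    print(pair, label)
```

In a sparse network the negative pairs vastly outnumber the positive ones, which is why some form of balancing is needed before training.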

    • Classify all the non-existent links
    • Compare with the testing data
Experimental Results
  • New links to predict: 57 links
  • Measurement of performance
    • Precision:
    • Recall:
    • Harmonic mean:
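
For reference, the three measures under their standard definitions: precision is the fraction of predicted links that are correct, recall is the fraction of actual new links that were predicted, and the harmonic mean (F1) combines the two. The link sets below are invented and are not the experiment's results.

```python
def precision_recall_f1(predicted, actual):
    """Compare predicted links against the links that actually appeared."""
    predicted = {frozenset(p) for p in predicted}
    actual = {frozenset(a) for a in actual}
    true_positives = len(predicted & actual)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(actual) if actual else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical predictions and ground truth.
predicted = [("A", "B"), ("A", "C"), ("B", "D"), ("C", "E")]
actual = [("A", "B"), ("B", "D"), ("D", "E"), ("A", "E"), ("C", "D")]
print(precision_recall_f1(predicted, actual))
```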
Result Analysis
  • Possible reasons
    • Features
    • Small set of data – sampling problem
    • Instances of the negative links used for training
Research Plan
  • Use weighted graph with parameters:
    • No. of papers
    • No. of neighbors
    • No. of citations
  • Focus on features that specifically target the co-authorship network:
    • Citations
    • Institutions
  • Enlarge the experiment dataset size

Thank you

  • Adamic, L. A., & Adar, E. (2003). Friends and neighbors on the web. Social Networks, 25(3), 211-230.
  • Al Hasan, M., Chaoji, V., Salem, S., & Zaki, M. (2006). Link prediction using supervised learning. In SDM’06: Workshop on Link Analysis, Counter-terrorism and Security.
  • Liben‐Nowell, D., & Kleinberg, J. (2007). The link‐prediction problem for social networks. Journal of the American Society for Information Science and Technology, 58(7), 1019-1031.
  • Pavlov, M., & Ichise, R. (2007). Finding Experts by Link Prediction in Co-authorship Networks. FEWS, 290, 42-55.
  • Murata, T., & Moriyasu, S. (2008). Link prediction based on structural properties of online social networks. New Generation Computing, 26(3), 245-257.
  • Guimerà, R., & Sales-Pardo, M. (2009). Missing and spurious interactions and the reconstruction of complex networks. Proceedings of the National Academy of Sciences, 106(52), 22073-22078.
  • Bliss, C. A., Frank, M. R., Danforth, C. M., & Dodds, P. S. (2013). An Evolutionary Algorithm Approach to Link Prediction in Dynamic Social Networks. arXiv preprint arXiv:1304.6257.
  • Cohen, S., & Ebel, L. (2013). Recommending collaborators using keywords. In Proceedings of the 22nd International Conference on World Wide Web Companion (pp. 959-962).

Links per year in the training set outnumber links per year in the testing set:

    • In the testing period, only “new” collaborations are considered.
    • Any collaboration between researchers who already have a link is disregarded.
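
The filtering rule amounts to a set difference between testing-period links and training-period links. A minimal sketch, with a hypothetical function name and toy data:

```python
def new_links(train_edges, test_edges):
    """Links seen in the testing period that did not exist in the training period."""
    known = {frozenset(e) for e in train_edges}
    return {frozenset(e) for e in test_edges} - known


# Hypothetical toy data: ("B", "A") repeats an existing link and is dropped.
train_edges = [("A", "B"), ("B", "C")]
test_edges = [("B", "A"), ("A", "C"), ("C", "D")]
for link in sorted(tuple(sorted(l)) for l in new_links(train_edges, test_edges)):
    print(link)
```

Using `frozenset` for each pair makes the comparison independent of author order, which matches the undirected graph.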
Proposed Feature
  • The reason for proposing this feature:
    • Keep track of each researcher's tendencies
    • Give a “bonus” to researchers who tend to collaborate with “new” colleagues rather than “old” ones
    • Also give a high score to prolific researchers (based on the number of published papers)
Stochastic Block Model
  • Guimerà, R., & Sales-Pardo, M., 2009
Stochastic Block Model










The reliability of an individual link is: