
Link Prediction in Co-Authorship Network

Le Nhat Minh (A0074403N)

Supervisor: Dongyuan Lu


Introduction

  • Link prediction

    • Predict the connections that are likely to form in the future within the network

  • Co-authorship network

    • A network of collaborations among researchers, scientists, and academic writers

  • Potential applications

    • Recommend experts or groups of researchers to an individual researcher


  • Problem Background

  • Related Work

  • Workflow

  • Conclusion

    • Result Analysis

    • Research plan

Problem Background

  • What connects researchers?

  • Given an instance of a co-authorship network:

    • Two researchers are connected if they have collaborated on at least one paper.
  • How can a link be predicted?

  • Prediction is based on the following criteria:

    • Co-authorship network topology

    • Researchers’ personal information

    • Researchers’ papers

  • Boosting link-prediction performance

    • A recommended link should be truly relevant to the authors' interests, or should at least be a feasible collaboration.

Related Work

  • Link prediction in social networks

    • Liben‐Nowell, D., & Kleinberg, J., 2007

    • Bliss, C. A., Frank, M. R., Danforth, C. M., & Dodds, P. S., 2013

  • In a social network, interactions among users are highly dynamic:

    • New links are created within a few days

    • Existing links are deleted or replaced

  • Social networks and co-authorship networks present different features

    • Characteristics of an individual researcher: citations, affiliations, institutions, ...

    • Characteristics of a person: marital status, age, workplace, …

  • Three mainstream approaches for link prediction:

    • Similarity based estimation

      • Liben‐Nowell, D., & Kleinberg, J., 2007

    • Maximum likelihood estimation

      • Murata, T., & Moriyasu, S., 2008

      • Guimerà, R., & Sales-Pardo, M., 2009

    • Supervised Learning model

      • Pavlov, M., & Ichise, R., 2007

      • Al Hasan, M., Chaoji, V., Salem, S., & Zaki, M., 2006

Similarity Based Estimation

  • Use metrics to estimate the proximity of each pair of researchers

  • Rank the pairs of researchers by their proximity

  • The top-ranked pairs are the likely recommendations.

  • Measures based on network structure

Some conventions (following Liben‐Nowell & Kleinberg, 2007): $\Gamma(x)$ denotes the set of neighbors of node $x$, and $|\Gamma(x)|$ is the degree of $x$.

  • Common Neighbor: $\mathrm{score}(x, y) = |\Gamma(x) \cap \Gamma(y)|$

  • Jaccard’s coefficient: $\mathrm{score}(x, y) = \dfrac{|\Gamma(x) \cap \Gamma(y)|}{|\Gamma(x) \cup \Gamma(y)|}$

  • Preferential Attachment: $\mathrm{score}(x, y) = |\Gamma(x)| \cdot |\Gamma(y)|$

  • Adamic/Adar: $\mathrm{score}(x, y) = \sum_{z \in \Gamma(x) \cap \Gamma(y)} \dfrac{1}{\log |\Gamma(z)|}$
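The four local measures named on these slides can be sketched in a few lines of Python. This is a minimal illustration using the standard textbook definitions, not the presenter's own code; the adjacency-dict representation, the function names, and the toy graph are all assumptions made for the example.

```python
import math

def common_neighbors(adj, x, y):
    """Number of neighbors shared by x and y."""
    return len(adj[x] & adj[y])

def jaccard(adj, x, y):
    """Shared neighbors divided by the size of the combined neighborhood."""
    union = adj[x] | adj[y]
    return len(adj[x] & adj[y]) / len(union) if union else 0.0

def preferential_attachment(adj, x, y):
    """Product of the two node degrees."""
    return len(adj[x]) * len(adj[y])

def adamic_adar(adj, x, y):
    """Sum of 1 / log(degree) over shared neighbors.

    A shared neighbor touches both x and y, so its degree is at least 2
    and the logarithm is never zero.
    """
    return sum(1.0 / math.log(len(adj[z])) for z in adj[x] & adj[y])

# Hypothetical toy co-authorship graph: node -> set of co-authors.
adj = {
    "A": {"C", "D"},
    "B": {"C", "D", "E"},
    "C": {"A", "B"},
    "D": {"A", "B", "E"},
    "E": {"B", "D"},
}

for name, fn in [("CN", common_neighbors), ("Jaccard", jaccard),
                 ("PA", preferential_attachment), ("AA", adamic_adar)]:
    print(name, fn(adj, "A", "B"))
```

For the unconnected pair (A, B) in this toy graph, the two share neighbors C and D, so Common Neighbor is 2, Jaccard's coefficient is 2/3, and Preferential Attachment is 6.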




  • Shortest Path:

    • The minimum number of edges connecting two nodes.

  • PageRank:

    • A random walk on the graph assigns each node the probability of being reached. The proximity of a pair of nodes can be taken as the sum of the two nodes' PageRank scores.
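Both global measures can be sketched without external libraries. The code below is an illustrative assumption, not the presenter's implementation: shortest path uses breadth-first search on an unweighted graph, and PageRank uses plain power iteration with a damping factor of 0.85 (a conventional default that the slides do not specify). The function names and the toy path graph are hypothetical.

```python
from collections import deque

def shortest_path_length(adj, source, target):
    """Minimum number of edges between source and target, or None if unreachable."""
    if source == target:
        return 0
    seen = {source}
    queue = deque([(source, 0)])
    while queue:
        node, dist = queue.popleft()
        for nb in adj[node]:
            if nb == target:
                return dist + 1
            if nb not in seen:
                seen.add(nb)
                queue.append((nb, dist + 1))
    return None

def pagerank(adj, damping=0.85, iterations=100):
    """Power iteration; rank held by isolated nodes is spread evenly."""
    nodes = list(adj)
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(iterations):
        dangling = sum(rank[v] for v in nodes if not adj[v])
        new_rank = {v: (1.0 - damping) / n + damping * dangling / n for v in nodes}
        for v in nodes:
            for nb in adj[v]:
                new_rank[nb] += damping * rank[v] / len(adj[v])
        rank = new_rank
    return rank

def pagerank_proximity(rank, x, y):
    """Pair score as described on the slide: the sum of the two PageRank values."""
    return rank[x] + rank[y]

# Hypothetical toy graph: a simple path A - B - C - D.
path = {"A": {"B"}, "B": {"A", "C"}, "C": {"B", "D"}, "D": {"C"}}
rank = pagerank(path)
print(shortest_path_length(path, "A", "D"), pagerank_proximity(rank, "A", "D"))
```

In the path graph the two inner nodes receive a higher PageRank than the two end nodes, so a pair of inner nodes scores higher than a pair of end nodes under the sum-based proximity.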

Maximum Likelihood Estimation

  • Assumes specific organizing rules for the network

  • Requires prior knowledge of the network

  • The likelihood of each unconnected pair of nodes is calculated according to those rules.

Supervised Learning Model

  • Construct a feature vector for each pair of researchers

  • Feed these vectors to classifiers to optimize a target function (model training)

  • Link prediction becomes a binary classification problem
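To make the binary-classification framing concrete, here is a minimal sketch that trains a logistic-regression classifier by batch gradient descent on pair feature vectors. Logistic regression is chosen only because it fits in a few lines; it is not one of the classifiers named in the presentation. The two features, the toy values, the learning rate, and the epoch count are all assumptions.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train_logistic(X, y, lr=0.1, epochs=500):
    """Fit weights and bias by batch gradient descent on the log-loss."""
    w = [0.0] * len(X[0])
    b = 0.0
    n = len(X)
    for _ in range(epochs):
        grad_w = [0.0] * len(w)
        grad_b = 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j, xj in enumerate(xi):
                grad_w[j] += err * xj
            grad_b += err
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b

def predict(w, b, x):
    """Return 1 (link) if the estimated probability is at least 0.5, else 0."""
    return 1 if sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b) >= 0.5 else 0

# Hypothetical feature vectors, one per researcher pair:
# [common neighbors, Jaccard's coefficient]; label 1 means the pair is linked.
X = [[3, 0.5], [2, 0.4], [0, 0.0], [0, 0.1]]
y = [1, 1, 0, 0]

w, b = train_logistic(X, y)
print([predict(w, b, x) for x in X])
```

In a real experiment the feature vector would hold every baseline and proposed feature for the pair, and the model would be evaluated on held-out pairs rather than on its own training data.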

  • Related work (Al Hasan, M., Chaoji, V., Salem, S., & Zaki, M., 2006) uses:

    • Decision Tree

    • SVM (Linear Kernel)

    • K-Nearest Neighbors

    • Multilayer Perceptron

    • Naive Bayes

    • Bagging

  • Combining many classifiers (Pavlov, M., & Ichise, R., 2007):

    • Decision Stump + AdaBoost

    • Decision Tree + AdaBoost

    • SMO + AdaBoost

  • Summary

    • Similarity based estimation

      • Does not perform very well

    • Maximum likelihood

      • Depends on the network

    • Supervised learning model

      • Performs better than similarity based estimation


    Classifier Model


    Graph Description

    • Co-authorship graph:

      • Undirected graph G(V, E)

    • Node or Vertex (Author)

      • Author ID

      • Author Name

    • Link or Edge (Co-authorship)

      • Pair of author IDs

      • List of publication years, each followed by the paper title

        (Ex: 2004: “Introduction to …”)
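One way to hold the graph described above in memory is sketched below. The input format (a list of year, title, author-ID tuples), the function name, and the sample records are assumptions for illustration; the slides do not say how the data was stored.

```python
from collections import defaultdict
from itertools import combinations

def build_coauthorship_graph(papers):
    """papers: iterable of (year, title, [author_id, ...]) tuples.

    Returns a dict mapping an unordered author pair (a frozenset of two IDs)
    to the list of (year, title) entries the pair co-authored.
    """
    edges = defaultdict(list)
    for year, title, author_ids in papers:
        # Every pair of distinct authors on the same paper gets an edge entry.
        for a, b in combinations(sorted(set(author_ids)), 2):
            edges[frozenset((a, b))].append((year, title))
    return edges

# Hypothetical sample data.
authors = {1: "Author One", 2: "Author Two", 3: "Author Three"}
papers = [
    (2004, "Paper X", [1, 2]),
    (2006, "Paper Y", [1, 2, 3]),
]

edges = build_coauthorship_graph(papers)
for pair, pubs in edges.items():
    print(sorted(pair), pubs)
```

Using a frozenset as the edge key keeps the graph undirected: the pair (1, 2) and the pair (2, 1) map to the same entry.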

    Setting up data

    • The dataset is split into two time spans: 2000 – 2010 and 2010 – 2013

    • The first span is used for training, the second for testing.

    • Currently, there are 134,307 researchers in the 2000 – 2013 network.

    • Authors who do not appear in the testing period are removed, leaving 104,265 researchers

    • A subset is chosen from these 104,265 researchers

    • Experiments are run on 937 researchers
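The split and the author filter can be sketched as follows. The slides list 2010 in both spans and do not say which side the boundary year belongs to; this sketch assigns 2010 to the testing span purely as an assumption. The edge-map format, the function names, and the sample data are also hypothetical.

```python
def split_by_period(edges, train_years, test_years):
    """edges: {frozenset({a, b}): [(year, title), ...]}.

    Each period is an inclusive (first_year, last_year) pair.
    Returns (train_edges, test_edges) in the same format as the input.
    """
    def restrict(first, last):
        out = {}
        for pair, pubs in edges.items():
            kept = [(yr, title) for yr, title in pubs if first <= yr <= last]
            if kept:
                out[pair] = kept
        return out
    return restrict(*train_years), restrict(*test_years)

def active_authors(edge_map):
    """All author IDs that appear on at least one edge."""
    return set().union(*edge_map)

# Hypothetical sample data.
sample = {
    frozenset((1, 2)): [(2003, "P1"), (2011, "P2")],
    frozenset((2, 3)): [(2005, "P3")],
    frozenset((1, 4)): [(2012, "P4")],
}

# ASSUMPTION: boundary year 2010 is counted as testing data.
train, test = split_by_period(sample, (2000, 2009), (2010, 2013))

# Drop training edges that involve an author absent from the testing period.
keep = active_authors(test)
train = {pair: pubs for pair, pubs in train.items() if pair <= keep}
print(sorted(map(sorted, train)), sorted(map(sorted, test)))
```

Here author 3 never publishes in the testing span, so the training edge between authors 2 and 3 is removed.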

    Baseline Features

    • Extract features from the network structure:

      • Local similarity

        • Common Neighbor

        • Adamic/Adar

        • Preferential Attachment

        • Jaccard’s coefficient

      • Global similarity

        • Shortest Path

        • PageRank

    • Feature specific to the co-authorship network

      • Keyword matching (Cohen, S., & Ebel, L., 2013)

        A suggested metric of textual relevance, computed with a TF-IDF-based function.
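As a rough illustration of a TF-IDF-based relevance score, the sketch below weights each author's keywords by TF-IDF and compares two authors by cosine similarity. This is a generic TF-IDF cosine, not the specific function of Cohen & Ebel (2013), which the slides do not reproduce; the keyword lists and function names are assumptions.

```python
import math
from collections import Counter

def tfidf_vectors(docs):
    """docs: {author_id: list of keyword tokens}. Returns {author_id: {term: weight}}."""
    n = len(docs)
    # Document frequency: in how many authors' keyword lists each term occurs.
    df = Counter(term for tokens in docs.values() for term in set(tokens))
    vectors = {}
    for author, tokens in docs.items():
        tf = Counter(tokens)
        vectors[author] = {
            term: (count / len(tokens)) * math.log(n / df[term])
            for term, count in tf.items()
        }
    return vectors

def cosine(u, v):
    """Cosine similarity of two sparse vectors stored as dicts."""
    dot = sum(weight * v.get(term, 0.0) for term, weight in u.items())
    norm = (math.sqrt(sum(w * w for w in u.values()))
            * math.sqrt(sum(w * w for w in v.values())))
    return dot / norm if norm else 0.0

# Hypothetical keywords extracted from each author's paper titles.
docs = {
    "A": ["graph", "link", "prediction"],
    "B": ["graph", "link", "mining"],
    "C": ["protein", "folding"],
}

vecs = tfidf_vectors(docs)
print(cosine(vecs["A"], vecs["B"]), cosine(vecs["A"], vecs["C"]))
```

Authors A and B share two keywords and get a positive score, while A and C share none and score zero.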

    Proposed Features

    • Productivity of the authors

      Observe the “history” of an author

    • For example, at a particular node A, with time points T0 = 2000, T1 = 2004, T2 = 2005, T3 = 2006:

      n: number of shared papers

      m: number of collaborators

    • The “productivity” of node A:

      α: a constant that sets the weight of each time period

    Training set

    • Set up training data

      • With n nodes, there are n(n − 1)/2 possible links.

      • Among those, distinguish two kinds of link:

        • Positive links: links that appear in the training years.

        • Negative links: the remaining pairs, which have no link in the training years.

          Note: Avoid biased training by balancing the number of instances with true and false labels.

      • Classify all the non-existent links

      • Compare the predictions with the testing data
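The labelling and balancing step can be sketched as below. Random undersampling of the negative pairs is one common way to balance the classes and is an assumption here; the slides do not say which balancing method was used. The function name, the seed, and the toy node set are hypothetical.

```python
import random
from itertools import combinations
from math import comb

def build_training_pairs(nodes, positive_links, seed=0):
    """Return (pair, label) tuples with as many negative pairs as positive ones."""
    positives = {frozenset(p) for p in positive_links}
    # Every unordered node pair without a training-period link is a negative candidate.
    negatives = [frozenset(p) for p in combinations(sorted(nodes), 2)
                 if frozenset(p) not in positives]
    rng = random.Random(seed)
    sampled = rng.sample(negatives, min(len(positives), len(negatives)))
    return ([(p, 1) for p in sorted(positives, key=sorted)]
            + [(p, 0) for p in sampled])

# Hypothetical toy network: 5 nodes, 3 links in the training years.
nodes = range(1, 6)
links = [(1, 2), (2, 3), (4, 5)]
data = build_training_pairs(nodes, links)

print(comb(5, 2), len(data))   # 10 possible pairs, 6 balanced training instances
print(comb(937, 2))            # candidate pairs for the 937-researcher subset
```

For the 937 researchers used in the experiment, n(n − 1)/2 gives 438,516 candidate pairs, which shows why the negative class would dominate without balancing.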

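    Experimental Results

    • New links to predict: 57 links

    • Measurement of performance

      TP: predicted links that appear in the testing period; FP: predicted links that do not; FN: new links that were not predicted.

      • Precision: $P = \dfrac{TP}{TP + FP}$

      • Recall: $R = \dfrac{TP}{TP + FN}$

      • Harmonic mean: $F = \dfrac{2PR}{P + R}$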
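The three measures can be computed directly from the set of predicted links and the set of links that actually appear in the testing period. The sketch uses the standard definitions; the function name and the toy link sets are assumptions and are not the experiment's data.

```python
def evaluate(predicted, actual):
    """Return (precision, recall, harmonic mean) for two collections of node pairs."""
    predicted = {frozenset(p) for p in predicted}
    actual = {frozenset(p) for p in actual}
    tp = len(predicted & actual)  # correctly predicted new links
    precision = tp / len(predicted) if predicted else 0.0
    recall = tp / len(actual) if actual else 0.0
    total = precision + recall
    f_score = 2 * precision * recall / total if total else 0.0
    return precision, recall, f_score

# Hypothetical example: 4 predicted links, 3 links that really appeared.
predicted_links = [(1, 2), (2, 3), (3, 4), (4, 5)]
actual_links = [(2, 1), (3, 4), (5, 6)]

precision, recall, f_score = evaluate(predicted_links, actual_links)
print(precision, recall, f_score)
```

In this toy case two of the four predictions are correct and two of the three real links are found, so precision is 1/2, recall is 2/3, and the harmonic mean is 4/7.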

    Result Analysis

    • Possible reasons

      • Features

      • Small set of data – sampling problem

      • Instances of the negative links used for training

    Research Plan

    • Use a weighted graph with parameters:

      • No. of papers

      • No. of neighbors

      • No. of citations

    • Focus on features that specifically target the co-authorship network:

      • Citations

      • Institutions

    • Enlarge the experimental dataset

    Thank you

    References


    • Adamic, L. A., & Adar, E. (2003). Friends and neighbors on the web. Social networks, 25(3), 211-230.

    • Al Hasan, M., Chaoji, V., Salem, S., & Zaki, M. (2006). Link prediction using supervised learning. In SDM’06: Workshop on Link Analysis, Counter-terrorism and Security.

    • Liben‐Nowell, D., & Kleinberg, J. (2007). The link‐prediction problem for social networks. Journal of the American society for information science and technology, 58(7), 1019-1031.

    • Pavlov, M., & Ichise, R. (2007). Finding Experts by Link Prediction in Co-authorship Networks. FEWS, 290, 42-55.

    • Murata, T., & Moriyasu, S. (2008). Link prediction based on structural properties of online social networks. New Generation Computing, 26(3), 245-257.

    • Guimerà, R., & Sales-Pardo, M. (2009). Missing and spurious interactions and the reconstruction of complex networks. Proceedings of the National Academy of Sciences, 106(52), 22073-22078.

    • Bliss, C. A., Frank, M. R., Danforth, C. M., & Dodds, P. S. (2013). An Evolutionary Algorithm Approach to Link Prediction in Dynamic Social Networks. arXiv preprint arXiv:1304.6257.

    • Cohen, S., & Ebel, L. (2013). Recommending collaborators using keywords. In Proceedings of the 22nd International Conference on World Wide Web Companion (pp. 959-962).

    Proposed Feature

    • The reasons for proposing this feature:

      • Keep track of each researcher's tendencies

      • Give a “bonus” to researchers who tend to collaborate with “new” colleagues rather than “old” ones

      • Also give a high score to prolific researchers (based on the number of published papers)

    Stochastic Block Model

    • Guimerà, R., & Sales-Pardo, M., 2009

    The reliability of an individual link is:
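For reference, the link reliability defined by Guimerà & Sales-Pardo (2009) can be written as below. This is reproduced from that paper as best recalled and should be checked against it; the symbols are the paper's and may differ from the notation on the original slide. Here $P$ ranges over partitions of the nodes into groups, $\sigma_i$ is the group of node $i$ in partition $P$, $l^{O}_{\alpha\beta}$ is the number of observed links between groups $\alpha$ and $\beta$, and $r_{\alpha\beta}$ is the maximum possible number of such links.

```latex
R^{L}_{ij} = \frac{1}{Z} \sum_{P \in \mathcal{P}}
  \left( \frac{l^{O}_{\sigma_i \sigma_j} + 1}{r_{\sigma_i \sigma_j} + 2} \right)
  \exp\left[-H(P)\right],
\qquad
H(P) = \sum_{\alpha \le \beta}
  \left[ \ln\left(r_{\alpha\beta} + 1\right)
       + \ln \binom{r_{\alpha\beta}}{l^{O}_{\alpha\beta}} \right],
\qquad
Z = \sum_{P \in \mathcal{P}} \exp\left[-H(P)\right]
```

Because the sum runs over all partitions, it cannot be evaluated exactly for networks of realistic size; the paper estimates it by sampling partitions.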