Link Prediction in Co-Authorship Network

- By
**ciel**


Introduction

- Link prediction
- Predict future connections within the network

- Co-authorship network
- A network of collaborations among researchers, scientists, and academic writers

Introduction

- Potential applications
- Recommend experts or groups of researchers to an individual researcher.

Outline

- Problem Background
- Related Work
- Workflow
- Conclusion
- Result Analysis
- Research plan

Problem Background

- What connects researchers?
- Given an instance of a co-authorship network:
- A researcher is connected to another if they have collaborated on at least one paper.

(Figure: example co-authorship network with researchers X and Y, shown in 2001 and in 2004)

Problem Background

- How to predict the link?
- Based on criteria:
- Co-authorship network topology
- Researcher’s personal information
- Researcher’s papers

- Improve link prediction performance
- Recommended links should be relevant to the authors' research interests, or at least be collaborations the researchers could plausibly form.

Related Work

- Link prediction problems in social networks
- Liben‐Nowell, D., & Kleinberg, J., 2007
- Bliss, C. A., Frank, M. R., Danforth, C. M., & Dodds, P. S., 2013

- In social networks, interactions among users are highly dynamic:
- New links are created within a few days
- Existing links are deleted or replaced

- The two kinds of network present different features:
- Co-authorship network: characteristics of an individual researcher, e.g. citations, affiliations, institutions, …
- Social network: characteristics of a person, e.g. marital status, age, workplace, …

- Three mainstream approaches for link prediction:
- Similarity based estimation
- Liben‐Nowell, D., & Kleinberg, J., 2007

- Maximum likelihood estimation
- Murata, T., & Moriyasu, S., 2008
- Guimerà, R., & Sales-Pardo, M., 2009

- Supervised Learning model
- Pavlov, M., & Ichise, R., 2007
- Al Hasan, M., Chaoji, V., Salem, S., & Zaki, M., 2006


Similarity Based Estimation

- Use metrics to estimate the proximity of each pair of researchers
- Rank the pairs of researchers by those proximities
- The top-ranked pairs are the most likely recommendations.
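The score-then-rank procedure above can be sketched as follows. This is a minimal illustration, assuming the graph is held as a dict mapping each author ID to the set of co-author IDs; the helper names `common_neighbors` and `rank_candidate_pairs` are hypothetical, and common-neighbor count is used only as an example proximity metric.

```python
from itertools import combinations

# Hypothetical helpers; adjacency is assumed to be {author: set(co-authors)}.

def common_neighbors(adj, u, v):
    """Proximity = number of co-authors shared by u and v."""
    return len(adj[u] & adj[v])

def rank_candidate_pairs(adj, score, top_k=10):
    """Score every non-connected pair and return the top_k highest-scoring pairs."""
    candidates = []
    for u, v in combinations(sorted(adj), 2):
        if v in adj[u]:
            continue  # already collaborators: nothing to predict
        candidates.append((score(adj, u, v), u, v))
    # Highest proximity first; ties broken by author ID so the order is deterministic.
    candidates.sort(key=lambda t: (-t[0], t[1], t[2]))
    return [(u, v, s) for s, u, v in candidates[:top_k]]

adj = {"A": {"B", "C"}, "B": {"A", "C", "D"}, "C": {"A", "B"}, "D": {"B"}}
print(rank_candidate_pairs(adj, common_neighbors))
```

Any other pairwise metric with the same `(adj, u, v)` signature can be passed as `score`.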

Similarity Based Estimation

- Shortest Path:
- The minimum number of edges on a path connecting two nodes.

- PageRank:
- A random walk on the graph that assigns each node the probability of being reached. The proximity of a pair of nodes can be taken as the sum of the two nodes' PageRank scores.
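Both global measures can be sketched in plain Python, assuming the same `{author: set(co-authors)}` adjacency dict. Shortest path is a breadth-first search; PageRank is computed by power iteration with a damping factor of 0.85 (a common default, assumed here rather than taken from the slides), and the pair proximity is the sum of the two scores as described above. Function names are hypothetical.

```python
from collections import deque

def shortest_path_length(adj, source, target):
    """Minimum number of edges between source and target (BFS); None if unreachable."""
    if source == target:
        return 0
    seen = {source}
    queue = deque([(source, 0)])
    while queue:
        node, dist = queue.popleft()
        for nb in adj[node]:
            if nb == target:
                return dist + 1
            if nb not in seen:
                seen.add(nb)
                queue.append((nb, dist + 1))
    return None

def pagerank(adj, damping=0.85, iterations=100):
    """Power-iteration PageRank on an undirected graph (each edge walked both ways)."""
    nodes = list(adj)
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(iterations):
        new = {v: (1.0 - damping) / n for v in nodes}
        for v in nodes:
            if adj[v]:
                share = damping * rank[v] / len(adj[v])
                for nb in adj[v]:
                    new[nb] += share
            else:
                # Isolated author: spread its rank uniformly over all nodes.
                for w in nodes:
                    new[w] += damping * rank[v] / n
        rank = new
    return rank

def pagerank_proximity(ranks, u, v):
    """Pair proximity as the sum of the two nodes' PageRank scores."""
    return ranks[u] + ranks[v]

adj = {"A": {"B"}, "B": {"A", "C"}, "C": {"B"}, "D": set()}
ranks = pagerank(adj)
```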

Maximum Likelihood Estimation

- Define specific rules for the network in advance
- Requires prior knowledge of the network
- The likelihood of each non-connected pair becoming linked is calculated according to those rules.
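One instance of this approach is the stochastic block model of Guimerà & Sales-Pardo (2009), in which the reliability of a link between nodes in groups α and β is estimated as (l_αβ + 1) / (r_αβ + 2), where l_αβ is the number of observed links between the groups and r_αβ the number of possible ones. Their method averages this over many partitions; the sketch below, as a simplifying assumption, uses ONE fixed partition given as a node-to-group dict. The function name is hypothetical.

```python
def sbm_link_reliability(edges, groups, i, j):
    """
    Estimated link probability for (i, j) under a stochastic block model with a
    single fixed partition `groups` (node -> group label). Simplification: the
    original method averages over many partitions, not just one.
    """
    a, b = groups[i], groups[j]
    size_a = sum(1 for g in groups.values() if g == a)
    size_b = sum(1 for g in groups.values() if g == b)
    # r: number of node pairs that could be linked between the two groups.
    r = size_a * (size_a - 1) // 2 if a == b else size_a * size_b
    # l: number of observed links between the two groups.
    l = sum(1 for u, v in edges if {groups[u], groups[v]} == {a, b})
    return (l + 1) / (r + 2)

groups = {"A": 0, "B": 0, "C": 0, "D": 1, "E": 1}
edges = [("A", "B"), ("B", "C"), ("D", "E"), ("C", "D")]
```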

Supervised Learning Model

- Construct a feature vector for each pair of researchers
- Feed these vectors to classifiers that optimize a target function (model training)
- Link prediction becomes a binary classification problem
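The shape of the supervised pipeline (feature vector per pair in, link/no-link label out) can be sketched with any binary classifier. The sketch below uses plain logistic regression trained by batch gradient descent, which is NOT one of the classifiers listed in the slides; it is chosen only because it fits in a few dependency-free lines. Function names and hyperparameters are hypothetical.

```python
import math

def train_logistic_regression(X, y, lr=0.1, epochs=1000):
    """Batch gradient descent on the mean log-loss; returns (weights, bias)."""
    d = len(X[0])
    w = [0.0] * d
    b = 0.0
    n = len(X)
    for _ in range(epochs):
        grad_w = [0.0] * d
        grad_b = 0.0
        for xi, yi in zip(X, y):
            z = sum(wk * xk for wk, xk in zip(w, xi)) + b
            p = 1.0 / (1.0 + math.exp(-z))
            err = p - yi  # derivative of the log-loss with respect to z
            for k in range(d):
                grad_w[k] += err * xi[k]
            grad_b += err
        w = [wk - lr * gk / n for wk, gk in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b

def predict_link(w, b, x, threshold=0.5):
    """True if the pair with feature vector x is predicted to collaborate."""
    z = sum(wk * xk for wk, xk in zip(w, x)) + b
    return 1.0 / (1.0 + math.exp(-z)) >= threshold
```

In use, each row of `X` would hold one pair's features (e.g. common neighbors, Jaccard, keyword match) and `y` would be 1 for a positive link and 0 for a negative one.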

Supervised Learning Model

- Related work (Al Hasan, M., Chaoji, V., Salem, S., & Zaki, M., 2006) using:
- Decision Tree
- SVM (Linear Kernel)
- K nearest neighbor
- Multilayer Perceptron
- Naive Bayes
- Bagging

- Combining many classifiers (Pavlov, M., & Ichise, R., 2007):
- Decision stump + AdaBoost
- Decision Tree + AdaBoost
- SMO + AdaBoost

Summary

- Similarity based estimation
- Does not perform very well

- Maximum likelihood
- Depends on the network

- Supervised learning model
- Performs better than similarity based estimation

Graph Description

- Co-authorship graph:
- Undirected graph G(V, E)

- Node or vertex (author)
- Author ID
- Author name

- Link or edge (co-authorship)
- Pair of author IDs
- List of publication years, each followed by the paper title
(e.g., 2004: "Introduction to …")
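A minimal in-memory version of this graph description might look like the sketch below. The class and method names are hypothetical; it stores author names per ID, an adjacency set per author, and, for each undirected edge, the list of (year, title) records of shared papers.

```python
from collections import defaultdict

class CoAuthorshipGraph:
    """Hypothetical undirected graph G(V, E): nodes are authors, edges carry paper records."""

    def __init__(self):
        self.names = {}                  # author ID -> author name
        self.adj = defaultdict(set)      # author ID -> set of co-author IDs
        self.papers = defaultdict(list)  # frozenset({id1, id2}) -> [(year, title), ...]

    def add_author(self, author_id, name):
        self.names[author_id] = name

    def add_paper(self, year, title, author_ids):
        """Link every pair of authors on the paper and record the publication on each edge."""
        for i, a in enumerate(author_ids):
            for b in author_ids[i + 1:]:
                if a == b:
                    continue
                self.adj[a].add(b)
                self.adj[b].add(a)
                self.papers[frozenset((a, b))].append((year, title))

    def edge_records(self, a, b):
        """Publication records of the edge (a, b), sorted by year; empty if no edge."""
        return sorted(self.papers.get(frozenset((a, b)), []))

g = CoAuthorshipGraph()
g.add_paper(2004, "Introduction to link prediction", [1, 2, 3])
g.add_paper(2006, "Follow-up study", [1, 2])
```

Keying edges by `frozenset` makes the lookup order-independent, which matches an undirected graph.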

Setting up data

- The dataset is split into two time spans: 2000 – 2010 and 2010 – 2013
- The first is used for training, the second for testing.
- There are currently 134,307 researchers in the 2000 – 2013 network.
- Authors who do not appear in the testing period are removed, leaving 104,265 researchers.
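The split and the author filter can be sketched as below, assuming papers arrive as `(year, [author IDs])` tuples. The two spans on the slide share the year 2010, so the sketch takes inclusive year ranges as parameters; the example's choice of 2000–2009 for training and 2010–2013 for testing is an assumption. The function name is hypothetical.

```python
def in_span(year, span):
    """True if year lies in the inclusive range span = (first, last)."""
    return span[0] <= year <= span[1]

def split_by_period(papers, train_years, test_years):
    """
    papers: iterable of (year, [author IDs]).
    Returns (train_papers, test_papers, active_authors), where only authors that
    appear in the testing period are kept in the training papers.
    """
    train = [(y, ids) for y, ids in papers if in_span(y, train_years)]
    test = [(y, ids) for y, ids in papers if in_span(y, test_years)]
    # Authors still active in the testing period.
    active = {a for _, ids in test for a in ids}
    train = [(y, [a for a in ids if a in active]) for y, ids in train]
    return train, test, active

papers = [(2001, [1, 2]), (2005, [2, 3]), (2011, [1, 3]), (2012, [3, 4])]
train, test, active = split_by_period(papers, (2000, 2009), (2010, 2013))
```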

Setting up data

- Choose a subset of the 104,265 researchers
- Experiments are run on 937 researchers

Baseline Features

- Extract features from the network structure:
- Local similarity
- Common Neighbor
- Adamic/Adar
- Preferential Attachment
- Jaccard’s coefficient

- Global similarity
- Shortest Path
- PageRank
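The four local-similarity features follow standard definitions and can be computed from the `{author: set(co-authors)}` adjacency dict as sketched below (function name hypothetical). Adamic/Adar weights each shared neighbor z by 1 / log(degree(z)), following Adamic & Adar (2003).

```python
import math

def local_features(adj, u, v):
    """Common Neighbor, Adamic/Adar, Preferential Attachment and Jaccard for pair (u, v)."""
    nu, nv = adj[u], adj[v]
    common = nu & nv
    union = nu | nv
    return {
        "common_neighbors": len(common),
        # Shared neighbours with fewer co-authors count more; degree-1 nodes are skipped
        # to avoid dividing by log(1) = 0.
        "adamic_adar": sum(1.0 / math.log(len(adj[z])) for z in common if len(adj[z]) > 1),
        "preferential_attachment": len(nu) * len(nv),
        "jaccard": len(common) / len(union) if union else 0.0,
    }

adj = {"A": {"B", "C"}, "B": {"A", "C", "D"}, "C": {"A", "B", "D"}, "D": {"B", "C"}}
f = local_features(adj, "A", "D")
```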


Baseline Features

- Feature specific to the co-authorship network
- Keyword matching (Cohen, S., & Ebel, L., 2013)
A suggested metric that measures the textual relevance between researchers using a TF-IDF-based function.
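The exact function of Cohen & Ebel is not given here, so the sketch below uses a generic substitute: TF-IDF weighting over each researcher's keywords (for example, words from paper titles) followed by cosine similarity. The input format and function names are assumptions.

```python
import math
from collections import Counter

def tfidf_vectors(docs):
    """docs: author ID -> list of keywords. Returns author ID -> {term: tf-idf weight}."""
    n = len(docs)
    # Document frequency: in how many authors' keyword lists each term appears.
    df = Counter(term for terms in docs.values() for term in set(terms))
    vectors = {}
    for author, terms in docs.items():
        tf = Counter(terms)
        vectors[author] = {t: (c / len(terms)) * math.log(n / df[t]) for t, c in tf.items()}
    return vectors

def cosine(a, b):
    """Cosine similarity of two sparse term-weight dicts (0.0 if either is all zeros)."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm = math.sqrt(sum(w * w for w in a.values())) * math.sqrt(sum(w * w for w in b.values()))
    return dot / norm if norm else 0.0

docs = {
    "A": ["graph", "mining", "link"],
    "B": ["graph", "link", "prediction"],
    "C": ["protein", "folding"],
}
v = tfidf_vectors(docs)
```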


Proposed Features

- Productivity of the authors
Observe the “history” of an author

- For example, at a particular node A (n: number of shared papers, m: number of collaborators):
- i = 0, T0 = 2000: n = 3, m = 1
- i = 1, T1 = 2004: n = 4, m = 2
- i = 2, T2 = 2005: n = 6, m = 2
- i = 3, T3 = 2006: n = 7, m = 3

Proposed Features

- Productivity of the authors
Observe the “history” of an author

The “productivity” of node A:

α: a constant that sets the weight of each time period

Training set

- Set up training data
- With n nodes, there are n(n − 1)/2 possible links.
- These are split into two classes:
- Positive links: links that appear in the training years.
- Negative links: the remaining pairs with no link in the training years.
Note: avoid biased training by balancing the number of instances with true and false labels.

- Classify all the non-existent links
- Compare with the testing data
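The labeling and balancing step can be sketched as follows: enumerate all n(n − 1)/2 pairs, label collaborating pairs 1 and the rest 0, then randomly down-sample the negatives to match the positives. The random down-sampling and the fixed seed are assumptions about how balancing is done; the function name is hypothetical.

```python
import random
from itertools import combinations

def balanced_training_pairs(nodes, edges, seed=0):
    """
    Label every pair of nodes (1 = collaborated in the training years, 0 = not),
    then randomly down-sample the negatives to the number of positives.
    """
    edge_set = {frozenset(e) for e in edges}
    positives, negatives = [], []
    for u, v in combinations(sorted(nodes), 2):  # n(n - 1)/2 candidate pairs
        if frozenset((u, v)) in edge_set:
            positives.append((u, v))
        else:
            negatives.append((u, v))
    rng = random.Random(seed)  # fixed seed so the sample is reproducible
    negatives = rng.sample(negatives, min(len(positives), len(negatives)))
    return [(p, 1) for p in positives] + [(p, 0) for p in negatives]

train_edges = [(1, 2), (2, 3), (4, 5)]
pairs = balanced_training_pairs([1, 2, 3, 4, 5], train_edges)
```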

Experimental Results

- New links to predict: 57 links

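- Measurement of performance (TP: correctly predicted links; FP: predicted links absent from the testing data; FN: new links that were missed)
- Precision: TP / (TP + FP)
- Recall: TP / (TP + FN)
- Harmonic mean (F1): 2 × Precision × Recall / (Precision + Recall)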
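These three measures can be computed directly from the set of predicted links and the set of new links observed in the testing period. A minimal sketch, assuming links are represented as `frozenset({u, v})` so that the order of the two authors does not matter; the function name is hypothetical.

```python
def evaluate(predicted, actual):
    """predicted, actual: sets of frozenset({u, v}) links. Returns (precision, recall, f1)."""
    tp = len(predicted & actual)
    precision = tp / len(predicted) if predicted else 0.0
    recall = tp / len(actual) if actual else 0.0
    # Harmonic mean of precision and recall.
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

predicted = {frozenset((1, 2)), frozenset((1, 3)), frozenset((2, 4)), frozenset((3, 4))}
actual = {frozenset((1, 2)), frozenset((3, 4))}
p, r, f1 = evaluate(predicted, actual)
```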

Result Analysis

- Possible reasons
- Features
- Small set of data – sampling problem
- Instances of the negative links used for training

Research Plan

- Use a weighted graph with parameters:
- No. of papers
- No. of neighbors
- No. of citations

- Focus on features that specifically target the co-authorship network:
- Citations
- Institutions

- Enlarge the experiment dataset size

Thank you

References

- Adamic, L. A., & Adar, E. (2003). Friends and neighbors on the web. Social Networks, 25(3), 211-230.
- Al Hasan, M., Chaoji, V., Salem, S., & Zaki, M. (2006). Link prediction using supervised learning. In SDM’06: Workshop on Link Analysis, Counter-terrorism and Security.
- Liben‐Nowell, D., & Kleinberg, J. (2007). The link‐prediction problem for social networks. Journal of the American Society for Information Science and Technology, 58(7), 1019-1031.
- Pavlov, M., & Ichise, R. (2007). Finding experts by link prediction in co-authorship networks. FEWS, 290, 42-55.
- Murata, T., & Moriyasu, S. (2008). Link prediction based on structural properties of online social networks. New Generation Computing, 26(3), 245-257.
- Guimerà, R., & Sales-Pardo, M. (2009). Missing and spurious interactions and the reconstruction of complex networks. Proceedings of the National Academy of Sciences, 106(52), 22073-22078.
- Bliss, C. A., Frank, M. R., Danforth, C. M., & Dodds, P. S. (2013). An evolutionary algorithm approach to link prediction in dynamic social networks. arXiv preprint arXiv:1304.6257.
- Cohen, S., & Ebel, L. (2013). Recommending collaborators using keywords. In Proceedings of the 22nd International Conference on World Wide Web Companion (pp. 959-962).

- The number of links per year is greater in the training set than in the testing set:
- In the testing period, only "new" collaborations are considered.
- Collaborations between researchers who are already linked are disregarded.
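Restricting the testing data to "new" collaborations amounts to a set difference between the test-period edges and the training-period edges. A minimal sketch, assuming edges are given as author-ID pairs; the function name is hypothetical.

```python
def new_test_links(train_edges, test_edges):
    """Test-period collaborations between authors who were not already linked in training."""
    seen = {frozenset(e) for e in train_edges}
    return {
        frozenset(e)
        for e in test_edges
        if len(set(e)) == 2 and frozenset(e) not in seen  # skip self-pairs and old links
    }

links = new_test_links([(1, 2), (2, 3)], [(2, 1), (1, 3), (3, 4), (1, 3)])
```

Using `frozenset` also collapses repeated collaborations in the testing period into a single link to predict.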

Results with different classifiers on the testing set:

Proposed Feature:

- Reasons for proposing this feature:
- Keep track of each researcher's collaboration tendency
- Give a "bonus" to researchers who tend to collaborate with "new" colleagues rather than "old" ones
- Also give a high score to prolific researchers (based on the number of published papers)

Stochastic Block Model:

- Guimerà, R., & Sales-Pardo, M., 2009
