Virtual Knowledge Studio (VKS)

Information Studies

What is web link mining?

Mike Thelwall

Statistical Cybermetrics Research Group

University of Wolverhampton, UK

1. Definition and scope

  • Link analysis is:

    • mapping and measuring hyperlink networks for collections of web pages or sites

    • a flexible toolkit of methods and software rather than a field or single technique

  • A new source of information about:

    • relationships between people, organisations and information - via the web

    • the impact of information and ideas

  • Used in:

    • media studies, information science, politics, marketing, sociology

Link Analysis: Motivation

  • Individual hyperlinks are created for concrete reasons, such as connections between the contents or the creators of web pages

  • Counts of large numbers of hyperlinks may reflect wider underlying social processes

  • Links may reflect phenomena that have previously been difficult to study; e.g.,

    • informal scholarly communication

    • informal news discussions

    • friendship patterns

    • “amateur” politics

But link patterns vary by context…

  • Commercial web sites tend not to link much

  • Academic and government web sites link more

  • Disciplinary differences: e.g., History Web use is very low, Chemistry is very high

  • Individual projects/resources can have an enormous impact upon web sites

    • E.g. Arts web sites are often for specific exhibitions or for digital media projects

  • Links are often not frequent enough to reveal underlying patterns reliably

Link Type Definitions



  • Inlink – a hyperlink to a web page from anywhere

  • Site inlink – a hyperlink to a web page from a different web site

  • Outlink – a hyperlink from a web page to any other page

  • Site outlink – a hyperlink from a web page to a page in a different site
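The four definitions above can be sketched in a few lines of Python. This is only an illustration: the helper names are hypothetical, and treating the host name of a URL as its "web site" is an assumption, since the slides do not say how a site is delimited.

```python
from urllib.parse import urlsplit


def site_of(url):
    """Assumed site definition: the lower-cased host name of the URL."""
    return (urlsplit(url).hostname or "").lower()


def link_counts(links, page):
    """Count the four link types for `page`.

    `links` is an iterable of (source_url, target_url) pairs.
    """
    counts = {"inlinks": 0, "site_inlinks": 0, "outlinks": 0, "site_outlinks": 0}
    for source, target in links:
        crosses_sites = site_of(source) != site_of(target)
        if target == page:
            counts["inlinks"] += 1                 # link to the page from anywhere
            counts["site_inlinks"] += crosses_sites  # only if from another site
        if source == page:
            counts["outlinks"] += 1                # link from the page to anywhere
            counts["site_outlinks"] += crosses_sites  # only if to another site
    return counts


# Made-up example URLs on reserved .example domains.
example_links = [
    ("http://a.example/1", "http://a.example/2"),
    ("http://b.example/x", "http://a.example/2"),
    ("http://a.example/2", "http://c.example/"),
]
print(link_counts(example_links, "http://a.example/2"))
```

A different site definition (for example, grouping all host names under one registered domain) would only require changing `site_of`.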

Indirect link types - colinks

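  • Useful when direct links are rare

    • Colinks give an indirect connection between two pages or sites

  • Co-inlinks

    • B and C are co-inlinked when a third page links to both of them

  • Co-outlinks

    • D and E are co-outlinked when both link to the same third page

Lennart Björneborn’s terminology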
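As a rough sketch of how colinks could be counted from a list of links, the Python below counts, for every pair of nodes, how many third nodes link to both (co-inlinks) or are linked to by both (co-outlinks). The function names and the single-letter node labels are illustrative assumptions, not part of any tool named in these slides.

```python
from collections import defaultdict
from itertools import combinations


def co_inlink_counts(links):
    """For each unordered pair of targets, count the sources linking to both."""
    targets_by_source = defaultdict(set)
    for source, target in links:
        if source != target:  # ignore self-links
            targets_by_source[source].add(target)
    counts = defaultdict(int)
    for targets in targets_by_source.values():
        for pair in combinations(sorted(targets), 2):
            counts[pair] += 1
    return dict(counts)


def co_outlink_counts(links):
    """For each unordered pair of sources, count the targets both link to.

    Reversing every link turns shared targets into shared sources, so the
    co-inlink routine can be reused.
    """
    return co_inlink_counts((target, source) for source, target in links)


# A links to B and C; D and E both link to F.
example = [("A", "B"), ("A", "C"), ("D", "F"), ("E", "F")]
print(co_inlink_counts(example))
print(co_outlink_counts(example))
```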

What to count?

  • Links between individual pages

  • Links between entire web sites

    • Site A links to site B if any page in site A links to any page in site B
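The site-level rule above can be sketched as follows, assuming page-level links are available as (source URL, target URL) pairs and, as a simplification that the slides do not specify, that a site is identified by its host name. The function names are hypothetical.

```python
from urllib.parse import urlsplit


def site_of(url):
    """Assumed site definition: the lower-cased host name of the URL."""
    return (urlsplit(url).hostname or "").lower()


def site_links(page_links):
    """Collapse page-level links into a set of (source_site, target_site) pairs.

    Site A links to site B if any page in A links to any page in B, so
    repeated page links between the same two sites count once. Links within
    a single site are dropped.
    """
    result = set()
    for source, target in page_links:
        a, b = site_of(source), site_of(target)
        if a != b:
            result.add((a, b))
    return result
```

Counting each pair of sites once, however many pages are involved, stops a single heavily interlinked page collection from dominating the totals.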



2. Link Networks – Methods

  • Draw a network diagram

    • LexiURL Searcher, Issue Crawler, SocSciBot (web networks)

    • Pajek, UCINET, NetMiner (generic networks)

    • About 10-50 sites/pages is recommended

    • Diagrams should reveal patterns in the data

  • Social Network Analysis statistics

    • E.g., density, degree centrality
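To make the two statistics named above concrete, here is a minimal pure-Python sketch for a directed link network. It uses the standard definitions (density is the number of links divided by the number of possible links; normalised in-degree centrality is the in-degree divided by n − 1). The function names are illustrative, and the code assumes every link endpoint appears in `nodes`. Packages such as Pajek or UCINET compute these and many more.

```python
def density(nodes, links):
    """Fraction of the n*(n-1) possible directed links that are present."""
    n = len(nodes)
    edges = {(s, t) for s, t in links if s != t}  # distinct links, no self-links
    return len(edges) / (n * (n - 1)) if n > 1 else 0.0


def in_degree_centrality(nodes, links):
    """In-degree of each node divided by the maximum possible value, n - 1."""
    n = len(nodes)
    edges = {(s, t) for s, t in links if s != t}
    in_degree = {node: 0 for node in nodes}
    for _, target in edges:
        in_degree[target] += 1
    return {node: (d / (n - 1) if n > 1 else 0.0) for node, d in in_degree.items()}


nodes = ["A", "B", "C"]
links = [("A", "B"), ("C", "B"), ("B", "A")]
print(density(nodes, links))               # 3 of 6 possible links
print(in_degree_centrality(nodes, links))
```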

Direct link networks

  • Start with list of web sites (or pages)

  • Build from many linkdomain:A site:B Yahoo searches

    • Powerful and free way to scan the entire web for links!

    • Returns pages in web site B that link to web site A

    • Can be automated with LexiURL Searcher

    • Or use SocSciBot to crawl web sites and get links

e.g., linkdomain:ox.ac.uk site:pku.edu.cn
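A direct link network for a list of sites needs one such query per ordered pair of sites. The sketch below only builds the query strings in the format shown above; it does not contact any search engine, and the function name and the third domain are illustrative assumptions.

```python
from itertools import permutations


def direct_link_queries(domains):
    """Build one 'linkdomain:A site:B' query per ordered pair of domains.

    The key (a, b) maps to the query that returns pages in site b that
    link to site a, i.e. evidence for a link from b to a.
    """
    return {
        (a, b): f"linkdomain:{a} site:{b}"
        for a, b in permutations(domains, 2)
    }


queries = direct_link_queries(["ox.ac.uk", "pku.edu.cn", "example.edu"])
print(queries[("ox.ac.uk", "pku.edu.cn")])
```

For n sites this yields n × (n − 1) queries, which is one reason automation is suggested above.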

[Figure: direct links in the top ASEAN universities network (> 100 links). Han Woo]

Co-inlink networks

  • Start with a list of web sites or pages

  • Build from many linkdomain:A linkdomain:B -site:A -site:B Yahoo searches

    • can be automated in LexiURL Searcher

  • Suitable for commercial or competitive web sites that do not interlink

    • normally better than direct link diagrams

  • A web environment (co-inlink) network for a single web site

    • finds web sites that link to it

    • picks the top 50 web sites linked to by these web sites

    • draws a co-inlink diagram of these web sites
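The three web environment steps above can be sketched on site-level link data as shown below. This is an illustration under stated assumptions rather than a description of how LexiURL Searcher works: the function names are hypothetical, the input is a set of (source site, target site) pairs, and co-inlinks are counted only among the sites that link to the starting site. The query builder simply reproduces the search format given earlier in this slide.

```python
from collections import Counter
from itertools import combinations


def co_inlink_query(a, b):
    """Query for pages outside sites a and b that link to both of them."""
    return f"linkdomain:{a} linkdomain:{b} -site:{a} -site:{b}"


def web_environment(site_links, target, top_n=50):
    """Return (top sites, co-inlink counts) for the environment of `target`."""
    site_links = set(site_links)
    # Step 1: the web sites that link to the target site.
    linkers = {s for s, t in site_links if t == target and s != target}
    # Step 2: the sites most often linked to by those linkers. The target
    # itself is included, since every linker links to it.
    popularity = Counter(t for s, t in site_links if s in linkers and t != s)
    top_sites = [site for site, _ in popularity.most_common(top_n)]
    # Step 3: co-inlink counts between the top sites, ready for drawing.
    co_inlinks = {}
    for a, b in combinations(sorted(top_sites), 2):
        shared = sum(
            1 for s in linkers if (s, a) in site_links and (s, b) in site_links
        )
        if shared:
            co_inlinks[(a, b)] = shared
    return top_sites, co_inlinks
```

The resulting pairs and counts could then be passed to a network drawing package as weighted, undirected edges.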

[Figure: The web environment of … (indirect links)]

[Figure: Another example: no patterns but interesting]

3. Link Impact - Methods

  • Inlink counts often used as an impact/visibility indicator

    • Impact = “The effect or impression of one thing on another”, “to have an effect” *

  • Compare links to web sites to assess which site/organisation has the most online impact

* http://www.thefreedictionary.com/impact, definition 3
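A comparison of this kind might be sketched as below, assuming site-level links are already available as (source site, target site) pairs. Counting each linking site once per target is one possible choice of indicator, not necessarily the one used in the reports described in these slides, and the function name is hypothetical.

```python
def rank_by_site_inlinks(site_links, sites):
    """Rank `sites` by the number of distinct other sites linking to them."""
    counts = {site: 0 for site in sites}
    for source, target in set(site_links):  # each site pair counts once
        if target in counts and source != target:
            counts[target] += 1
    # Highest count first; ties are broken alphabetically for a stable order.
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))
```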

Link Impact Reports

  • Standardised comparative analysis of the link impact of web sites

  • Example audit:

  • http://cybermetrics.wlv.ac.uk/audit/101/

  • Similar reports can be created for non-link impact (citation impact)

  • http://cybermetrics.wlv.ac.uk/audit/books/

[Figure labels: impact, spread]


4. Tools

  • E.g., …

5. Statistical analyses…

[Figure: Links to UK universities against their research productivity]

The reason for the strong correlation is the quantity of Web publication, not its quality
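The slide does not say which correlation statistic was used. As one plausible choice for skewed count data such as inlink counts, the sketch below computes a Spearman rank correlation in pure Python; the function names are illustrative.

```python
def average_ranks(values):
    """Rank values from 1 upwards, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to the end of the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length, non-constant sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the average ranks."""
    return pearson(average_ranks(x), average_ranks(y))
```

Here `x` could hold the inlink count for each university and `y` its research productivity score, in the same order.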

More statistical analyses…

Universities tend to link to neighbours

6. Content analysis

  • Content analysis of a random sample of links is recommended to get context

  • Example of usefulness of content analysis results:

    • 90% of links between UK university sites relate to scholarly activity

      • But less than 1% are equivalent to citations

    • Link counts do not measure research but are a natural by-product of scholarly activity

      • Use link counts to track (an aspect of) communication
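Drawing the random sample and summarising the coded results could look like the sketch below. The function names and the category labels are illustrative assumptions; the classification itself is done by human coders and is not automated here.

```python
import random
from collections import Counter


def sample_links(links, sample_size, seed=0):
    """Draw a reproducible simple random sample of links for manual coding."""
    links = list(links)
    rng = random.Random(seed)  # fixed seed so the sample can be re-created
    return rng.sample(links, min(sample_size, len(links)))


def category_shares(coded_categories):
    """Share of sampled links assigned to each category by the coders."""
    counts = Counter(coded_categories)
    total = sum(counts.values())
    return {category: n / total for category, n in counts.items()}


# Hypothetical labels, one per coded link.
print(category_shares(["scholarly", "scholarly", "other", "scholarly"]))
```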

7. Summary

  • Link networks

    • To investigate relationship patterns within collections of web sites

  • Link impact

    • Compare impact of web sites using inlinks

  • Methods

    • Toolkit of visual and statistical methods

    • Specialist software like LexiURL Searcher & Issue Crawler

  • Use to investigate web phenomena or offline phenomena reflected online in web sites


  • Thelwall, M. (2009). Introduction to webometrics: Quantitative web research for the social sciences. New York: Morgan & Claypool.

  • Rogers, R. (2005). Information politics on the Web. Massachusetts: MIT Press.

  • Thelwall, M. (2004). Link analysis: An information science approach. San Diego: Academic Press.

  • http://lexiurl.wlv.ac.uk

  • http://webometrics.wlv.ac.uk

  • http://www.issuecrawler.net