
Methods for Exploiting Academic Hyperlinks

This research explores methods for mapping communication patterns between researchers based on university websites and journal citations. It provides valuable insights into the structure and evolution of research fields, identifying previously unknown connections. The analysis of web data can illustrate wider and more current patterns.


Presentation Transcript


  1. Methods for Exploiting Academic Hyperlinks • Mike Thelwall • Statistical Cybermetrics Research Group • University of Wolverhampton, UK

  2. The Problem • To map patterns of communication between researchers in a country based upon university web sites • Patterns of communication are also mapped based upon journal citations or journal title words • Provides useful information about the structure and evolution of research fields • Can identify previously unknown field connections • Web analysis could illustrate wider and more current patterns

  3. Data collection • Web crawler • AltaVista advanced queries, e.g. host:wlv.ac.uk AND link:gla.ac.uk • AllTheWeb advanced queries • Google: does not support the same level of Boolean querying
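The query pattern on this slide can be generated for every ordered pair of university domains. The following is a minimal sketch only: the function name and the third domain are illustrative assumptions, and it only builds query strings in the slide's `host:... AND link:...` form without contacting any search engine.

```python
from itertools import permutations


def pairwise_link_queries(domains):
    """Build one 'host:A AND link:B' query per ordered pair of distinct domains.

    The query syntax is copied from the slide's AltaVista example; the
    function itself is a hypothetical helper, not part of any search API.
    """
    return {
        (source, target): f"host:{source} AND link:{target}"
        for source, target in permutations(domains, 2)
    }


# wlv.ac.uk and gla.ac.uk come from the slide; ox.ac.uk is added for illustration.
queries = pairwise_link_queries(["wlv.ac.uk", "gla.ac.uk", "ox.ac.uk"])
print(queries[("wlv.ac.uk", "gla.ac.uk")])
```

Each query asks for pages hosted at the first domain that link to the second, so n domains need n × (n − 1) queries.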

  4. Types of link count • Direct link counts • Inter-site links only • Co-inlink counts • B and C are co-inlinked • Co-outlink counts • D and E are co-outlinked [Diagram: link graph of nodes A to F]
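The three link count types can be illustrated on a toy graph. The slide's diagram did not survive in the transcript, so the edges below are an assumption chosen to match its two statements: a shared source A makes B and C co-inlinked, and a shared target F makes D and E co-outlinked. Function names are hypothetical.

```python
from collections import Counter
from itertools import combinations

# Assumed edges (source, target); the original diagram is not recoverable.
links = {("A", "B"), ("A", "C"), ("D", "F"), ("E", "F")}


def co_inlink_counts(edges):
    """Count, for each pair of nodes, how many sources link to both of them."""
    targets_by_source = {}
    for source, target in edges:
        targets_by_source.setdefault(source, set()).add(target)
    counts = Counter()
    for targets in targets_by_source.values():
        for pair in combinations(sorted(targets), 2):
            counts[pair] += 1
    return counts


def co_outlink_counts(edges):
    """Count, for each pair of nodes, how many targets both of them link to."""
    # Reversing every edge turns shared targets into shared sources.
    return co_inlink_counts({(target, source) for source, target in edges})


direct_links = len(links)
co_inlinks = co_inlink_counts(links)
co_outlinks = co_outlink_counts(links)
print(direct_links, dict(co_inlinks), dict(co_outlinks))
```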

  5. Alternative Document Models • Domain ADM • Count links between domains (ignoring multiple links) instead of pages [Diagram: pages P1 to P6 on the domains www.scit.wlv.ac.uk and www.dcs.gla.ac.uk]

  6. Alternative Document Models • Directory ADM • Counts links between directories • Estimated using URL slashes • University ADM • Counts links between entire university Web sites • Too extreme for most purposes • ADMs reduce the impact of replicated links • E.g. a subsite of 1000 pages linking to another university home page in its navigation bar
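The effect of the ADMs on replicated links can be sketched as follows. This is an illustration under stated assumptions, not the author's implementation: the function names and URLs are hypothetical, the directory is taken as everything up to the last slash of the path, and the university level assumes UK-style names whose last three host labels identify the institution (e.g. wlv.ac.uk).

```python
from urllib.parse import urlsplit


def adm_unit(url, level):
    """Map a URL to its document unit under the given model (assumed rules)."""
    parts = urlsplit(url)
    host = parts.netloc.lower()
    if level == "page":
        return host + parts.path
    if level == "directory":
        # Everything up to and including the last slash of the path.
        return host + parts.path.rsplit("/", 1)[0] + "/"
    if level == "domain":
        return host
    if level == "university":
        # Assumption: UK-style names, institution = last three host labels.
        return ".".join(host.split(".")[-3:])
    raise ValueError(f"unknown level: {level}")


def count_links(page_links, level):
    """Count distinct unit-to-unit links, ignoring duplicates and self-links."""
    unit_links = {
        (adm_unit(source, level), adm_unit(target, level))
        for source, target in page_links
    }
    return sum(1 for source, target in unit_links if source != target)


# The slide's example: a 1000-page subsite whose navigation bar links to
# another university's home page (hypothetical URLs).
navbar_links = [
    (f"http://www.scit.wlv.ac.uk/sub/p{i}.html", "http://www.gla.ac.uk/")
    for i in range(1000)
]
counts = {
    level: count_links(navbar_links, level)
    for level in ("page", "directory", "domain", "university")
}
print(counts)
```

Under the page model the navigation bar contributes 1000 links; under the directory, domain and university models it collapses to one.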

  7. Some Inter-University Hyperlink Patterns For the UK and Europe

  8. Citation-Style Hyperlink Analysis • Citation counts are known to be reasonable indicators of research quality, but is the same true for inlink counts? • Counts of links to universities within a country can correlate significantly with measures of research productivity • The significance of this result is that it gives ‘permission’ to investigate the use of inter-university links for researching scholarly communication

  9. Most links are only loosely related to research • 90% of links between UK university sites have some connection with scholarly activity, including teaching and research • But less than 1% are equivalent to citations • So link counts do not measure research dissemination but are more a natural by-product of scholarly activity • Cannot use link counts to assess research • Can use link counts to track an aspect of communication

  10. Links to UK universities against their research productivity • The reason for the strong correlation is the quantity of Web publication, not its quality • This is different to citation analysis

  11. Universities tend to link to neighbours

  12. Universities cluster geographically

  13. Language is a factor in international interlinking • English is the dominant language for Web sites in the Western EU • In a typical country, 50% of pages are in the national language(s) and 50% in English • Non-English-speaking countries extensively interlink in English {Research with Rong Tang & Liz Price}

  14. Can map patterns of international communication • Counts of links between EU universities in Swedish are represented by arrow thickness.

  15. Counts of links between EU universities in French are represented by arrow thickness.

  16. Which language???

  17. Which language???

  18. Linking patterns vary enormously by discipline • No evidence of a significant geographic trend • Disciplinary differences in the extent of interlinking: e.g., Web use in history is very low, in chemistry very high • Individual research projects can have an enormous impact upon individual departments • E.g. Arts web sites are often for specific exhibitions or for digital media projects • Links are not frequent enough to reliably reveal patterns of interdisciplinarity

  19. Clustering using links

  20. Background: Power laws in Academic Webs • Academic Webs have a topology dominated by power laws, including • Counts of links to pages (inlink counts) • Counts of links from pages (outlink counts) • Groups of interconnected pages • Directed component sizes • Undirected component sizes • Power laws mean that clustering connected components will not yield useful results
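A power law means the number of pages with k links falls roughly as k to a negative exponent, which appears as a straight line on a log-log plot. The sketch below estimates that exponent with an ordinary least-squares fit on the logged values. It is a rough illustration only: the data are synthetic (an exact power law with exponent -2, not figures from the study), the function name is hypothetical, and log-log regression is a crude estimator compared with maximum-likelihood fitting.

```python
import math


def loglog_slope(frequency_by_count):
    """Least-squares slope of log(frequency) against log(link count)."""
    points = [
        (math.log(count), math.log(frequency))
        for count, frequency in frequency_by_count.items()
        if count > 0 and frequency > 0
    ]
    mean_x = sum(x for x, _ in points) / len(points)
    mean_y = sum(y for _, y in points) / len(points)
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in points)
    variance = sum((x - mean_x) ** 2 for x, _ in points)
    return covariance / variance


# Synthetic distribution: the number of pages with k inlinks is 10000 * k^-2.
synthetic = {k: 10000 * k ** -2.0 for k in range(1, 51)}
slope = loglog_slope(synthetic)
print(round(slope, 6))
```

On real link data the tail is noisy, so the fitted slope should be read as indicative rather than exact.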

  21. Page Outlinks

  22. Topological component sizes

  23. Community Identification Algorithm • Can apply to page, directory and domain models • Gives complementary results: a “layered approach”

  24. Stretching links further: co-inlinks, co-outlinks • For the UK academic Web, about 42% of domains connected by links alone host similar disciplines, and about 43% connected by links, co-inlinks and co-outlinks • But over 100 times more domains are colinked or coupled than are directly linked • Links in any form are less than 50% reliable as indicators of subject similarity

  25. Summary • Studies of the relatively restricted subdomain of university web sites • Produce directly useful results • For Web IR, they also • Help refine methodologies • Help build intuition
