Link analysis as a social science technique

This presentation explores link analysis as a social science technique, focusing on academic hyperlink analysis, data collection methods, and the interpretation of link counts. It also discusses the correlation between link counts and research productivity, the role of language in international interlinking, and the clustering of universities based on their interlinking patterns.

Presentation Transcript


  1. Link analysis as a social science technique Mike Thelwall Statistical Cybermetrics Research Group University of Wolverhampton, UK http://cybermetrics.wlv.ac.uk/

  2. Link Analysis Manifesto • Links are: • A wonderful new source of information about relationships between people, organisations and information • An easy to collect data source • But: • Results should be interpreted with care

  3. Talk Structure • Part 1: Academic link analysis –mainly from an information science perspective • Part 2: Software demonstration • Part 3: A social science link analysis methodology

  4. Link Analysis: Motivation • Individual hyperlinks reflect concrete creation reasons such as connections between web page contents or creators • Counts of large numbers of hyperlinks may reflect wider underlying social processes • Links may reflect phenomena that have previously been difficult to study, opening up new research areas • E.g. informal scholarly communication

  5. Part 1: Academic Hyperlink Analysis • To map patterns of communication between researchers in a country based upon university web sites • Patterns of communication are also mapped based upon journal citations or journal title words • Provides useful information about the structure and evolution of research fields • Can identify previously unknown field connections • Web analysis could illustrate wider and more current patterns

  6. Data Collection • Web crawler • AltaVista advanced queries, e.g. Links from Wolves Uni. to Oxford Uni. domain:wlv.ac.uk AND linkdomain:ox.ac.uk • Google link queries • Find links to specific URLs, e.g. links to the Institute home page link:www.oii.ox.ac.uk
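
As a minimal sketch of how the query format on this slide could be generated for several universities at once, the Python snippet below builds one query string per ordered pair of domains. The function names are hypothetical, and the snippet only builds strings in the `domain:… AND linkdomain:…` form quoted on the slide; it does not contact any search engine.

```python
from itertools import permutations


def altavista_link_query(source_domain: str, target_domain: str) -> str:
    """Build the advanced-query string for links from one domain to another."""
    return f"domain:{source_domain} AND linkdomain:{target_domain}"


def all_pair_queries(domains):
    """Return one query per ordered (source, target) pair of distinct domains."""
    return {
        (source, target): altavista_link_query(source, target)
        for source, target in permutations(domains, 2)
    }


queries = all_pair_queries(["wlv.ac.uk", "ox.ac.uk"])
print(queries[("wlv.ac.uk", "ox.ac.uk")])
```

Ordered pairs are used because links from A to B and links from B to A are separate counts.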

  7. Types of link count • Direct link counts • Inter-site links only • Co-inlink counts • B and C are co-inlinked • Co-outlink counts • D and E are co-outlinked (Diagram of nodes A to F.)
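
The three link-count types can be illustrated with a small Python sketch. The edge list below is hypothetical: it is chosen so that B and C come out co-inlinked and D and E co-outlinked, as the slide states, but it is not a reconstruction of the slide's diagram. The function names are also assumptions.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical (source, target) links between sites; ("B", "B") is a self-link.
links = [("A", "B"), ("A", "C"), ("D", "F"), ("E", "F"), ("B", "B")]


def inter_site(links):
    """Drop self-links so that only links between different sites remain."""
    return [(s, t) for s, t in links if s != t]


def co_inlink_counts(links):
    """For each pair of targets, count the sources that link to both."""
    targets_of = defaultdict(set)
    for source, target in inter_site(links):
        targets_of[source].add(target)
    counts = defaultdict(int)
    for targets in targets_of.values():
        for pair in combinations(sorted(targets), 2):
            counts[pair] += 1
    return dict(counts)


def co_outlink_counts(links):
    """For each pair of sources, count the targets that both link to."""
    # Reversing every link turns the co-outlink problem into a co-inlink one.
    return co_inlink_counts([(t, s) for s, t in links])


print(co_inlink_counts(links))   # {('B', 'C'): 1}
print(co_outlink_counts(links))  # {('D', 'E'): 1}
```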

  8. Alternative Document Models • A method to ignore multiple similar links • E.g., domain ADM: count links between domains instead of pages (Diagram: pages P1 to P6 on the domains www.scit.wlv.ac.uk and www.oii.ox.ac.uk.)
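
A rough Python sketch of the domain ADM idea follows: page-level links are collapsed to (source domain, target domain) pairs, so that many similar page links count once. The page URLs and the placement of pages on the two domains are invented for illustration, and the function names are assumptions.

```python
from urllib.parse import urlsplit

# Hypothetical page-level links; the paths are invented for illustration.
page_links = [
    ("http://www.scit.wlv.ac.uk/p1.html", "http://www.oii.ox.ac.uk/p4.html"),
    ("http://www.scit.wlv.ac.uk/p2.html", "http://www.oii.ox.ac.uk/p4.html"),
    ("http://www.scit.wlv.ac.uk/p3.html", "http://www.oii.ox.ac.uk/p5.html"),
]


def domain_of(url: str) -> str:
    """Return the host name of an absolute URL (empty string if missing)."""
    return urlsplit(url).hostname or ""


def page_adm_count(links):
    """Count distinct page-to-page links."""
    return len(set(links))


def domain_adm_count(links):
    """Count distinct domain-to-domain links, ignoring links within a domain."""
    pairs = {(domain_of(s), domain_of(t)) for s, t in links}
    return len({(s, t) for s, t in pairs if s != t})


print(page_adm_count(page_links))    # 3
print(domain_adm_count(page_links))  # 1
```

Three page-level links become a single domain-level link, which is how the model damps the effect of replicated or templated links.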

  9. Some Inter-University Hyperlink Patterns Mainly for the UK and Europe

  10. Citation-Style Hyperlink Analysis • Citation counts are known to be reasonable indicators of research quality but is the same true for inlink counts? • Counts of links to universities within a country can correlate significantly with measures of research productivity • The significance of this result is in giving ‘permission’ to investigate the use of inter-university links for researching scholarly communication
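
One way to check for this kind of association is a rank correlation between inlink counts and a productivity measure. The Python sketch below implements Spearman's rho with the standard library only. The slides do not say which test was used, so the choice of Spearman is an assumption, and the numbers are made up purely to show the calculation.

```python
from statistics import mean


def average_ranks(values):
    """Rank values from 1, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def spearman(x, y):
    """Spearman's rho: the Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Made-up illustrative values, one entry per university; not real data.
inlink_counts = [120, 450, 80, 900, 300]
productivity = [1.1, 2.5, 0.7, 4.8, 2.9]
print(round(spearman(inlink_counts, productivity), 3))
```

A rank-based coefficient is a reasonable default here because link counts tend to be highly skewed.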

  11. Most links are only loosely related to research • 90% of links between UK university sites have some connection with scholarly activity, including teaching and research • But less than 1% are equivalent to citations • So link counts do not measure research dissemination but are more a natural by-product of scholarly activity • Cannot use link counts to assess research • Can use link counts to track an aspect of communication

  12. Links to UK universities against their research productivity The reason for the strong correlation is the quantity of Web publication, not its quality This is different to citation analysis

  13. Universities tend to link to neighbours

  14. Universities cluster geographically

  15. Language is a factor in international interlinking • English is the dominant language for Web sites in the Western EU • In a typical country, 50% of pages are in the national language(s) and 50% in English • Non-English-speaking countries extensively interlink in English (Research with Rong Tang & Liz Price)

  16. Can map patterns of international communication Counts of links between EU universities in Swedish are represented by arrow thickness.

  17. Counts of links between EU universities in French are represented by arrow thickness.

  18. Which language???

  19. Which language???

  20. Linking patterns vary enormously by discipline • No evidence of a significant geographic trend • Disciplinary differences in the extent of interlinking: e.g., history Web use is very low, chemistry is very high • Individual research projects can have an enormous impact upon individual departments • E.g. Arts web sites are often for specific exhibitions or for digital media projects • Links not frequent enough to reliably reveal patterns of interdisciplinarity

  21. The next slide is a (Kamada-Kawai) network of the interlinking of the “top” 5 universities in AEAN countries (Asia and Europe) with arrows representing at least 100 links and universities not connected removed. (Research with Han Woo Park)

  22. Clustering using links

  23. Background: Power laws in Academic Webs • Academic Webs have a topology dominated by power laws, including • Counts of links to pages (inlink counts) • Counts of links from pages (outlink counts) • Groups of interconnected pages • Power laws mean that • Link creation obeys the ‘rich get richer’ law • “Communities” of pages or sites are rarely pure but tend to multiply overlap
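
The ‘rich get richer’ idea can be sketched with a generic preferential-attachment simulation in Python. This is a textbook toy model, not the method used in the research behind these slides: each new page links to one existing page, chosen with probability proportional to that page's inlink count plus one. The function name and parameters are assumptions.

```python
import random
from collections import Counter


def simulate_rich_get_richer(n_pages, seed=0):
    """Grow a toy web in which already-popular pages attract more new links."""
    rng = random.Random(seed)
    inlinks = [0]  # page 0 starts with no inlinks
    # Each page appears in the urn once, plus once per inlink it has received,
    # so a uniform draw from the urn is proportional to (inlinks + 1).
    urn = [0]
    for new_page in range(1, n_pages):
        target = rng.choice(urn)
        inlinks[target] += 1
        urn.append(target)
        inlinks.append(0)
        urn.append(new_page)
    return inlinks


inlinks = simulate_rich_get_richer(10_000)
distribution = Counter(inlinks)  # inlink count -> number of pages
for count in sorted(distribution)[:5]:
    print(count, distribution[count])
print("largest inlink count:", max(inlinks))
```

In runs of this kind, most pages end up with few or no inlinks while a handful collect many, which is the heavy-tailed shape a power law describes.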

  24. Page Outlinks

  25. Topological component sizes: “pure link communities”
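
A minimal Python sketch of how component sizes could be computed from a link list is given below. It assumes that a “pure link community” means a set of pages connected to each other by links when link direction is ignored (a weakly connected component); the slides do not spell out the definition, and the example links are hypothetical.

```python
from collections import defaultdict, deque


def component_sizes(links):
    """Return the sizes of the connected components, largest first.

    Links are treated as undirected, which is an assumption of this sketch.
    """
    neighbours = defaultdict(set)
    for source, target in links:
        neighbours[source].add(target)
        neighbours[target].add(source)

    seen = set()
    sizes = []
    for start in neighbours:
        if start in seen:
            continue
        # Breadth-first search over everything reachable from `start`.
        seen.add(start)
        queue = deque([start])
        size = 0
        while queue:
            node = queue.popleft()
            size += 1
            for other in neighbours[node]:
                if other not in seen:
                    seen.add(other)
                    queue.append(other)
        sizes.append(size)
    return sorted(sizes, reverse=True)


# Hypothetical page links forming one component of three and one of two.
print(component_sizes([("a", "b"), ("b", "c"), ("d", "e")]))  # [3, 2]
```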

  26. Community Identification Algorithm: “Impure communities” • Can apply to pages, directories and domains • Gives complementary results: a “layered approach”

  27. Stretching links further: co-inlinks, co-outlinks • More interlinked does not imply more similar • For the UK academic Web, about 42% of domains connected by links alone host similar disciplines, and about 43% connected by links, co-inlinks and co-outlinks • Can use any type of link to look for similar sites • Over 100 times more domains are co-inlinked or co-outlinked than are directly linked • Links in any form are less than 50% reliable as indicators of subject similarity

  28. Summary • Studies of the relatively restricted subdomain of university web sites • Produce direct research results • For Web Information Retrieval (e.g. search engines), they also • Help refine methodologies • Help build intuition about web structure

  29. Part 2: Software Demonstration • SocSciBot • Web crawler for social sciences research • SocSciBot Tools • Link analyser for SocSciBot data • Cyclist • Search engine with some corpus linguistics capability (e.g. word frequency lists for each site) • http://socscibot.wlv.ac.uk/

  30. Part 3: A General Social Science Link Analysis Methodology • A general framework for using link counts in social sciences research • For research into link creation or • Together with other sources, for research into other online or offline phenomena • Applicable when there are enough links relevant to the research question to count • For collections of large web sites or • For large collections of small web sites

  31. Nine stages for a research project • Formulate an appropriate research question, taking into account existing knowledge of web structure • Conduct a pilot study • Identify web pages or sites that are appropriate to address the research question

  32. Nine stages for a research project • Collect link data from a commercial search engine or a personal crawler, taking appropriate accuracy safeguards • Apply data cleansing techniques to the links, if possible, and select an appropriate counting method • Partially validate the link count results through correlation tests, if possible

  33. Nine stages for a research project • Partially validate the interpretation of the results through a link classification exercise • Report results with an interpretation consistent with link classification exercise, including either a detailed description of the classification or exemplars to illustrate the categories • Report the limitations of the study and parameters used in data collection and processing

  34. Interpreting link counts • For most research, need to be able to place an interpretation on link counts • E.g. A links to B more than C, therefore… • A is inlinked more than B therefore… • Do links ‘measure’ visibility, luminosity, authority, information exports/imports, communication, impact, online impact, quality, importance, interpersonal communication, nothing, random actions,…?

  35. Interpreting link counts • Classifying random samples of links can help decide how to interpret them • E.g. Links predominantly reflect… • Correlation tests are also useful as a form of triangulation • E.g. Link counts associate with…
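
The sampling step of a link classification exercise can be sketched in Python as below: draw a reproducible random sample of links for manual coding, then summarise the share of each category. The category labels and function names are hypothetical; the classification itself is a human judgment and is not automated here.

```python
import random
from collections import Counter


def sample_links(links, k, seed=0):
    """Draw up to k links uniformly at random, without replacement."""
    rng = random.Random(seed)
    return rng.sample(links, min(k, len(links)))


def category_shares(labels):
    """Return the proportion of classified links in each category."""
    counts = Counter(labels)
    total = sum(counts.values())
    return {category: n / total for category, n in counts.items()}


# Hypothetical links, and hypothetical labels assigned by a human coder.
all_links = [(f"page{i}", f"target{i}") for i in range(100)]
to_classify = sample_links(all_links, 10)
labels = ["research", "teaching", "research", "other"]
print(category_shares(labels))
```

Fixing the seed keeps the sample reproducible, which makes it easier to report how the classified links were chosen.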

  36. The theoretical perspective for link counting • In order to be able to reliably interpret link counts, all links should be created • individually and independently, • by humans, • through equivalent gravity judgments (e.g., about the quality of the information in the target page). • Additionally, links to a site should target pages created by the site owner or somebody else closely associated with the site.

  37. Summary • Link counts are an information source that may reveal new insights into online and offline phenomena • Can be used in conjunction with other data sources to address many research questions • With existing tools, are relatively easy to use in research
