Steps Towards Mapping e-Research and Measuring Impact

Steps Towards Mapping e-Research and Measuring Impact. Alex Voss, Rob Procter, Peter Halfpenny, Meik Poschen, Marzieh Asgari-Targhi. AHM’08: Workshop on Profiling e-Research: Mapping Communities and Measuring Impacts, Edinburgh, 10th September 2008.

Presentation Transcript


  1. Steps Towards Mapping e-Research and Measuring Impact • Alex Voss, Rob Procter, Peter Halfpenny, Meik Poschen, Marzieh Asgari-Targhi • AHM’08: Workshop on Profiling e-Research: Mapping Communities and Measuring Impacts • Edinburgh, 10th September 2008

  2. Aims • To compile a comprehensive* database of e-Social Science activities in the UK and elsewhere • To analyse the data in order to capture a snapshot of e-Social Science • To provide a monitoring tool that flags up new content • To provide an infrastructure for further research

  3. Problem • What I would call e-Social Science is not always labeled e-Social Science • Simply googling for the term will provide only a partial view • Need to establish a network of relevant nodes with context information on the web and expand search from there

  4. Approach • Using lists of conference and workshop attendees • Search for relevant URLs • Review resulting data • Harvest web pages connected to these • Extract key terms • Visualise results • Further steps…

  5. Seed List • Data about attendees of events (Intl. Conference and Agenda Setting) • 226 individuals • Removal of duplicates and erroneous entries • Import into SQL database

  6. Search • Using Yahoo Search API, generating list of URLs matching name, surname and affiliation • Restricted to .ac.uk, .edu, .nhs.uk and .gov.uk • Results in 30k hits for 226 people • Extraction of hostnames from URLs
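The hostname extraction and domain restriction on this slide could look roughly like the following. This is a minimal sketch, not the authors' code: the function names are hypothetical, and the slide does not say whether the restriction was applied in the search query itself or as a filter afterwards (the sketch assumes the latter).

```python
from urllib.parse import urlsplit

# Domain suffixes named on the slide; everything else is dropped.
ALLOWED_SUFFIXES = (".ac.uk", ".edu", ".nhs.uk", ".gov.uk")


def hostname_of(url):
    """Return the lower-cased hostname of a URL, or None if it has none."""
    host = urlsplit(url).hostname
    return host.lower() if host else None


def in_allowed_domain(url):
    """True if the URL's hostname ends in one of the allowed suffixes."""
    host = hostname_of(url)
    return host is not None and host.endswith(ALLOWED_SUFFIXES)
```

Applied to a list of search hits, `[u for u in hits if in_allowed_domain(u)]` keeps only the academic, health service and government hosts.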

  7. Removing False Positives • Clustering of hostnames by frequency showed some systematic false positives caused by long lists of names on some sites • e.g., lists of alumni, sports teams etc. • Manually removing these for the top 80 hostnames reduced the number of URLs by 10k, to 20k
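The same review can be sketched outside the database: count URLs per hostname, inspect the largest hosts by hand, then drop every URL on a host judged to be a false positive. The helper names below are hypothetical, and the decision about which hosts to reject stays manual, as on the slide.

```python
from collections import Counter
from urllib.parse import urlsplit


def host_counts(urls):
    """Count how many URLs each hostname contributes."""
    hosts = (urlsplit(u).hostname for u in urls)
    return Counter(h for h in hosts if h)


def drop_hosts(urls, rejected_hosts):
    """Remove every URL whose hostname was manually rejected."""
    rejected = set(rejected_hosts)
    return [u for u in urls if urlsplit(u).hostname not in rejected]
```

`host_counts(urls).most_common(80)` would give the list of hosts to review, mirroring the "top 80 hostnames" step.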

  8. Review • Clustering of hostnames by frequency (after cleaning):

    select count(host) as size, host from url group by host order by size desc;

    +------+---------------------------------+
    | size | host                            |
    +------+---------------------------------+
    |  211 | www.geog.leeds.ac.uk            |
    |  204 | www.nottingham.ac.uk            |
    |  140 | www.shef.ac.uk                  |
    |  126 | www.ncess.ac.uk                 |
    |  109 | www.manchester.ac.uk            |
    |   97 | www.lancs.ac.uk                 |
    |   95 | www.psychology.nottingham.ac.uk |
    |   93 | redress.lancs.ac.uk             |
    |   92 | www.cs.bris.ac.uk               |
    |   91 | www.comlab.ox.ac.uk             |

  9. Review (II) • Clustering of URLs by number of persons mentioned (after cleaning):

    +------+------------------------------------------------------------+
    | size | url                                                        |
    +------+------------------------------------------------------------+
    |   24 | http://ess.si.umich.edu/papers.htm                         |
    |   17 | http://www.ncess.ac.uk/events/ASW/visualisation/           |
    |   17 | http://www.ncess.ac.uk/events/conference/2006/papers/      |
    |   12 | http://ess.si.umich.edu/committee.htm                      |
    |   12 | http://redress.lancs.ac.uk/resources/                      |
    |   10 | http://www.kato.mvc.mcc.ac.uk/rss-wiki/VizNET              |
    |   10 | http://www.informatics.manchester.ac.uk/aboutus/staff/     |
    |    8 | http://www.ncess.ac.uk/about_us/people/?centre=            |
    |    7 | http://www.geog.leeds.ac.uk/people/a.turner/personal/blog/ |
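The slide shows only the output of this clustering, not the query behind it. One plausible form is sketched below using Python's `sqlite3` module. The schema is an assumption pieced together from the table and column names that appear in the other queries in this talk (`url`, `delegate`, `delegate_url`, `delegate_id`, `url_id`); the authors' actual schema and query may differ.

```python
import sqlite3

# Assumed schema: one row per person, one per URL, and a link table.
SCHEMA = """
CREATE TABLE delegate (id INTEGER PRIMARY KEY, name TEXT, surname TEXT);
CREATE TABLE url (id INTEGER PRIMARY KEY, url TEXT);
CREATE TABLE delegate_url (delegate_id INTEGER, url_id INTEGER);
"""

# Count distinct people linked to each URL, largest first.
QUERY = """
SELECT COUNT(DISTINCT du.delegate_id) AS size, u.url
FROM url AS u
JOIN delegate_url AS du ON du.url_id = u.id
GROUP BY u.id, u.url
ORDER BY size DESC, u.url
LIMIT ?
"""


def urls_by_people(conn, limit=10):
    """Return (size, url) rows for the URLs mentioning the most people."""
    return conn.execute(QUERY, (limit,)).fetchall()
```

`COUNT(DISTINCT ...)` guards against a person being linked to the same page more than once, which would otherwise inflate the size column.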

  10. Checking Completeness • select id from url where url = 'http://ess.si.umich.edu/committee.htm'; • > 59765 • select surname, name from delegate join delegate_url on id = delegate_id where url_id = 59765; • This returns a list of 12 people, but the actual list of conference PC members is much longer • The search missed people who are in the database, and the page also names people who are not in the database • Potential to expand list of people involved in e-Social Science

  11. Harvesting Content • Harvesting 20k web pages takes time • Using multithreaded code to mask latency • Using 40 harvesters still takes about 4h • All but 230 pages harvested • 1.3GB of data
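Masking latency with a pool of harvesters can be sketched with the standard library's thread pool. This is an illustration under assumptions rather than the code used for the talk: the function names are hypothetical, the default of 40 workers simply echoes the slide, and the download function is passed in as a parameter so the harvesting logic can be exercised without network access.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed
from urllib.request import urlopen


def fetch(url, timeout=30):
    """Download one page and return its raw bytes."""
    with urlopen(url, timeout=timeout) as response:
        return response.read()


def harvest(urls, fetch_page=fetch, workers=40):
    """Fetch all URLs concurrently; return (pages, failures) keyed by URL."""
    pages, failures = {}, {}
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = {pool.submit(fetch_page, url): url for url in urls}
        for future in as_completed(futures):
            url = futures[future]
            try:
                pages[url] = future.result()
            except Exception as error:  # record the failure and carry on
                failures[url] = error
    return pages, failures
```

Keeping failures in their own dictionary makes it easy to report figures like the "all but 230 pages" on the slide and to retry only those URLs later.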

  12. Amending Seed Data • Extracting email addresses • Finding mailto: links actually works quite well • Not much need to deal with obfuscation (such as alex.voss-at-ncess.ac.uk) • But doing this may improve results • How to deal with multiple valid emails • Extracting affiliations • Again, surprising how effective this was but ho • Again, how to deal with multiple affiliations • Affiliation does not map 1:1 to research area
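Extracting addresses from `mailto:` links, plus the one obfuscation style the slide mentions (`name-at-domain`), might be sketched as follows. The regular expressions are deliberately simple and are assumptions of this sketch; they are not a full email grammar and do not cover other obfuscation schemes.

```python
import re

# Address part of a mailto: link, up to a quote, '?', '>' or whitespace.
MAILTO = re.compile(r'mailto:([^"\'?>\s]+)', re.IGNORECASE)

# "user-at-domain.tld" style obfuscation, as in the slide's example.
OBFUSCATED = re.compile(r'\b([\w.+-]+)-at-([\w-]+(?:\.[\w-]+)+)\b', re.IGNORECASE)


def extract_emails(html):
    """Return the sorted, de-duplicated, lower-cased addresses found in a page."""
    found = {address.lower() for address in MAILTO.findall(html)}
    found |= {f"{user}@{domain}".lower() for user, domain in OBFUSCATED.findall(html)}
    return sorted(found)
```

Returning every address found leaves open the slide's question of how to choose between multiple valid emails for one person.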

  13. Key Term Extraction • Using NaCTeM’s Termine (using website at the moment, web service soon)

    Rank  Term
       5  e-social science
      10  national centre
      11  rob procter
      12  social science
      13  marina jirotka
      14  international conference
      15  social sciences
      18  mark rouncefield
      19  computer science
      22  research centre
      27  science studies unit
      35  lancaster university
      40  computer supported cooperative work
      46  text mining
      48  paul luff

  14. Key Term Extraction (II) • Next steps: • Change code to use web services API • Repeat key term extraction for 226 individuals • Create unified key term list • Review and create stop-list • Factor this into tailored Termine service • Named entity recognition to extend seed list

  15. Social Map • Co-occurrence of names on web pages
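Co-occurrence of names on web pages can be computed by counting, for every pair of people, the number of pages that mention both. The sketch below assumes the input is a mapping from URL to the people found on that page; the function name and input shape are hypothetical.

```python
from collections import Counter
from itertools import combinations


def cooccurrence(people_by_url):
    """Count, for each pair of people, the number of pages naming both."""
    pairs = Counter()
    for people in people_by_url.values():
        # Sort and de-duplicate so each pair has one canonical key per page.
        for a, b in combinations(sorted(set(people)), 2):
            pairs[(a, b)] += 1
    return pairs
```

The resulting counts are one candidate for edge weights in a social network graph, although raw counts favour people who appear on many long lists.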

  16. Further Next Steps • Add weights to social map – how strongly are people connected? • Drawing social network graphs for interactive analysis using information about link structure • Repeating Yahoo searches to flag up new data appearing • RSS feed on what’s new in e-Social Science • Doing Yahoo searches on the top key terms emerging
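Flagging new data when a search is repeated amounts to comparing the latest result list with the URLs already stored. A minimal sketch, with a hypothetical function name, that preserves the order of the new results:

```python
def new_urls(previous, current):
    """Return URLs in the current results that were not seen before, in order."""
    seen = set(previous)
    fresh = []
    for url in current:
        if url not in seen:
            seen.add(url)  # also suppresses duplicates within the new results
            fresh.append(url)
    return fresh
```

The returned list is the kind of material a "what's new" feed could be built from.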

  17. Next Steps? • FOAF-type semantic data on e-Social Science projects • What incentives could we leverage to get people to provide the information we are interested in? • Combining with bibliometric work • New kinds of entities: • Publications • Projects, Organisations
