Linking to a New Data Source
Disambiguating publication authorship is a well-recognized problem in academic publishing, and the task is further complicated by the large number of available sources of publication data. UF has entered publication data by hand and through automated ingests of Thomson Reuters data. Future efforts to add new data sources, such as Microsoft Academic Search, to VIVO will enrich this data, with the end goal of a complete, fully disambiguated publication record for each author.
This project was intended as a proof of concept: to demonstrate that we can create a programmatic link between VIVO and the Microsoft Academic Search API and then retrieve publication data about University of Florida investigators. The work was performed on a small subset of investigators based in our Clinical and Translational Science Institute (CTSI).
Additional efforts have centered on providing the Microsoft Academic Search team with a list of publications involving University of Florida authors.
Future work is expected to include reconciling (correcting or merging) the details attached to publications that are present in both systems.
Using the Academic Search Data
The Microsoft Academic Search API is accessed with an API key, which can be requested from the Academic Search web site.
On the Academic Search side, our process involves getting JSON objects back from the RESTful interface using Python.
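The Academic Search step can be sketched roughly as follows, using only the Python standard library. The endpoint URL and the query parameter names (AppId, AuthorQuery, ResultObjects, StartIdx, EndIdx) are placeholders assumed for illustration; the real values come from the documentation supplied with the API key. This is a minimal sketch, not the code used in the project.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Hypothetical endpoint; replace with the URL given in the API documentation.
BASE_URL = "https://academic-search.example.org/json/search"


def build_search_url(app_id, author_name, start=1, end=10):
    """Return a GET URL asking for one page of an author's publications.

    The parameter names below are assumptions made for this sketch.
    """
    params = {
        "AppId": app_id,             # the API key requested from the service
        "AuthorQuery": author_name,
        "ResultObjects": "Publication",
        "StartIdx": start,
        "EndIdx": end,
    }
    return BASE_URL + "?" + urlencode(params)


def fetch_json(url, timeout=30):
    """Fetch a URL and decode the response body as a JSON object."""
    with urlopen(url, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

A call such as fetch_json(build_search_url(key, "Jane Doe")) would then return the decoded response as ordinary Python dictionaries and lists.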
On the VIVO side, our process involves running SPARQL queries and receiving the results as JSON objects.
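The VIVO step might look like the sketch below. The query is illustrative only: the predicates (core:authorInAuthorship, core:linkedInformationResource) are assumed from the VIVO core ontology and should be checked against the local VIVO version, and the person URI is a made-up example. The parsing function relies on the standard SPARQL JSON results layout, which a SPARQL endpoint returns when the request asks for application/sparql-results+json.

```python
import json

# Illustrative query; predicates are assumptions to verify against local VIVO.
QUERY_TEMPLATE = """
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX core: <http://vivoweb.org/ontology/core#>
SELECT ?pub ?title
WHERE {{
  <{person_uri}> core:authorInAuthorship ?authorship .
  ?authorship core:linkedInformationResource ?pub .
  ?pub rdfs:label ?title .
}}
"""


def build_query(person_uri):
    """Fill the person's URI into the query template."""
    return QUERY_TEMPLATE.format(person_uri=person_uri)


def rows_from_sparql_json(text):
    """Flatten a SPARQL JSON result document into a list of plain dicts."""
    data = json.loads(text)
    rows = []
    for binding in data["results"]["bindings"]:
        # Each cell looks like {"type": "...", "value": "..."}; keep the value.
        rows.append({var: cell["value"] for var, cell in binding.items()})
    return rows
```

Each returned row is a dictionary keyed by the SELECT variables, here pub and title, which makes it easy to line up with the Academic Search results.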
A hybrid record consisting of data elements from both services is then constructed. Future work will involve serializing the data back out to VIVO-compliant RDF/XML to enrich the VIVO publication record.
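One possible way to build such a hybrid record is sketched below. It assumes both records have already been reduced to flat dictionaries describing the same publication, and it assumes a policy of preferring the hand-curated VIVO value whenever one exists, using Academic Search only to fill gaps. The field names in the usage note are hypothetical.

```python
def merge_records(vivo_record, academic_record):
    """Combine two dicts that describe the same publication.

    Assumed policy: a non-empty VIVO value wins; otherwise the Academic
    Search value is used. The origin of each field is kept under "_sources".
    """
    hybrid = {}
    sources = {}
    for key in sorted(set(vivo_record) | set(academic_record)):
        vivo_value = vivo_record.get(key)
        academic_value = academic_record.get(key)
        if vivo_value not in (None, ""):
            hybrid[key] = vivo_value
            sources[key] = "vivo"
        elif academic_value not in (None, ""):
            hybrid[key] = academic_value
            sources[key] = "academic_search"
        # Fields that are empty in both records are left out.
    hybrid["_sources"] = sources
    return hybrid
```

For example, merging {"title": "A", "year": ""} with {"title": "a", "year": 2011, "citation_count": 5} keeps the VIVO title and takes the year and citation count from Academic Search. Recording the source of each field would help when the record is later serialized back to RDF/XML.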
Microsoft Academic Search uses machine-learning algorithms to disambiguate authorship, which sometimes leads to papers being incorrectly attributed or grouped. Because UF’s data is hand-curated and covers authors we have a direct interest in, future work should include identifying and correcting these misattributed papers. We believe a hybrid approach (automation plus hand entry) is needed to cover all cases.
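The automated half of that hybrid approach could start with something like the sketch below, which compares the two title lists for one author. Matching on a normalized title is an assumption made for illustration; it is a crude key that discards non-ASCII characters and would miss retitled or translated papers. Academic Search titles with no VIVO counterpart are only queued for human review, not rejected.

```python
import re


def normalize_title(title):
    """Lowercase, replace punctuation with spaces, and collapse whitespace."""
    cleaned = re.sub(r"[^a-z0-9 ]+", " ", title.lower())
    return " ".join(cleaned.split())


def partition_titles(vivo_titles, academic_titles):
    """Split one author's titles into matched, review, and missing groups."""
    vivo_keys = {normalize_title(t): t for t in vivo_titles}
    academic_keys = {normalize_title(t): t for t in academic_titles}
    both = vivo_keys.keys() & academic_keys.keys()
    only_academic = academic_keys.keys() - vivo_keys.keys()
    only_vivo = vivo_keys.keys() - academic_keys.keys()
    return {
        # Present in both systems; candidates for a hybrid record.
        "matched": sorted(vivo_keys[k] for k in both),
        # Either new to VIVO or misattributed; a curator decides which.
        "needs_review": sorted(academic_keys[k] for k in only_academic),
        # Known to VIVO but absent from Academic Search.
        "missing_from_academic_search": sorted(vivo_keys[k] for k in only_vivo),
    }
```

The last group corresponds to the list of publications that could be sent back to the Academic Search team.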
University of Florida CTSI: Consuming and disambiguating publications data from Microsoft Academic Search in VIVO.
Did it Work?
We consider the project a success: we have been able to retrieve data from Academic Search, compare it with existing VIVO data to match or otherwise disambiguate publications, and then ingest any new data into VIVO. We can also produce a list of publications missing from Academic Search, and we are working on a process to provide that list to the Academic Search team.
We believe that evaluating publication details is now a matter of developing the proper code, likely in Python, since all connections are in place and the required data is available.
Please contact any of the authors of this poster regarding this work. All authors can be found in UF VIVO.
What is Academic Search?
Microsoft Academic Search is a free service developed by Microsoft Research to help scholars, scientists, students, and practitioners quickly and easily find academic content, researchers, institutions, and activities.
Microsoft Academic Search takes full advantage of results from the Bing search engine, indexing thousands more publications than can be found at any other single source (almost 39 million publications for 20 million authors at this time).
Fetching data from two sources
Reading JSON objects
Nicholas Rejack¹, Erik Schmidt¹, Michael Conlon¹
¹ Clinical and Translational Science Institute, University of Florida