The Scientific Literature Background • Lack of universal, well-structured repositories of scientific and research data for experimentation and benchmarking of pertinent research works in a given thematic area • Fragmented, lengthy, weak and inefficient peer review processes, given the growing number of journals, magazines and conferences • Non-objective and narrowly focused tools and metrics (covering only aspects such as impact and popularity) for assessing research work as well as individuals, institutions and organizations, based on a specific snapshot of the scientific work • Poor linking of research articles to data journals
The Scientific Literature Background • Growing wealth of the scientific work and information produced by researchers and scholars • scientific/research articles • monographs • research datasets • Need for more effective processes and improved tools and techniques towards: • reviewing scientific articles and research data • organising and managing data journals • bibliographic analysis • management of scientometrics and development of new ones • better collaboration between researchers
OpenScienceLink Objectives Open Semantically-enabled, Social-aware Access to Scientific Data • Provide a holistic approach to the publication, sharing, linking, reviewing and evaluation of research results based on open access to scientific information • Empower a novel eco-system for open access to scientific information, which will provide a range of added-value services for all stakeholders • Main Outcomes: • The OpenScienceLink platform • Implementation of 5 pilots • The Biomedical Data Journal (BMDJ)
OpenScienceLink Pilots Research Dynamics-aware Open Access Data Journals Development Novel open, semantically-assisted peer review process Data Mining for Biomedical and Clinical Research Trends Detection and Analysis Data Mining for Proactive Formulation of Scientific Collaborations Scientific field-aware, Productivity- and Impact-oriented Enhanced Research Evaluation Services
Pilots Overview • Pilot 1: Research Dynamics-aware Open Access Data Journals Development • Data journal establishment • Journal issue suggestion • Dataset submission • Dataset peer review • Publishing • Assessment and evaluation • Identification of research dynamics associated with specific datasets • Pilot 2: Novel open, semantically-assisted peer review process • Article-based reviewer suggestion • Assignment of competent reviewers • Review support tools (e.g. automatic retrieval of relevant research articles) • Review form submission • Post-review discussion
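The article-based reviewer suggestion step could be approximated by comparing the terms of a submitted article with the expertise terms of candidate reviewers. The sketch below is only an illustration: the Jaccard score, the function name suggest_reviewers and the exclude parameter are assumptions, not the method used by the OpenScienceLink platform.

```python
from typing import AbstractSet, Dict, List, Set, Tuple


def suggest_reviewers(
    article_terms: Set[str],
    expertise: Dict[str, Set[str]],
    exclude: AbstractSet[str] = frozenset(),
    top_n: int = 3,
) -> List[Tuple[str, float]]:
    """Rank candidate reviewers by Jaccard overlap with the article terms.

    `expertise` maps a reviewer name to the terms describing their work.
    `exclude` lists people who must not review (hypothetical example:
    the article's own authors).
    """
    scored: List[Tuple[str, float]] = []
    for reviewer, terms in expertise.items():
        if reviewer in exclude:
            continue
        union = article_terms | terms
        score = len(article_terms & terms) / len(union) if union else 0.0
        if score > 0:
            scored.append((reviewer, score))
    # Highest score first; ties are broken alphabetically for determinism.
    scored.sort(key=lambda item: (-item[1], item[0]))
    return scored[:top_n]


# Illustrative data only.
article = {"stroke", "aspirin", "trial"}
candidates = {
    "reviewer_a": {"stroke", "aspirin"},
    "reviewer_b": {"genomics"},
    "reviewer_c": {"stroke", "imaging", "trial", "mri"},
}
ranking = suggest_reviewers(article, candidates)
```

A real system would more likely score against ontology concepts than raw keywords, but the ranking step would look similar.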
Pilots Overview • Pilot 3: Data Mining for Biomedical and Clinical Research Trends Detection and Analysis • Detect research trends • Analyse research trends • Essential for: • allocation of research funding (by private sponsors and governmental agencies) • overall planning of research strategies • Pilot 4: Data Mining for Proactive Formulation of Scientific Collaborations • Enable the networking and collaboration of researchers and scholars working on similar or potentially collaborating scientific fields and sharing similar research interests • Infer relationships between researchers and research groups, including (in several cases) non-obvious, non-declared ones
Pilots Overview • Pilot 5: Scientific field-aware, Productivity- and Impact-oriented Enhanced Research Evaluation Services • Current simplified indices and impact factors evaluate only one aspect of the scientific work • Introduce, produce and track new metrics of research and scientific performance, beyond conventional ones, for the evaluation of: • Research work (incl. data papers) • Researcher • Research group or community • Conference, Journal, Publisher • Department, Laboratory, Institution, University, Organisation • Country • Research grant
Integrated Platforms • FP7 SocIoS • A set of tools that leverage the potential of Social Networking Sites (SNSs) • Serves as an umbrella for accessing user data scattered among various SNSs through a common and secure interface • Hides SNS-specific complexity • Enables the delivery of services which exploit social graphs • GoPubMed • A semantic search engine for the life sciences • Allows exploring PubMed search results with concepts from the Medical Subject Headings (MeSH), the Gene Ontology (GO) and the Universal Protein Resource (UniProt) • A data management model expanded with the ability to index, annotate, and semantically search datasets • FP7 PONTE • A knowledge-oriented platform that supports the design and creation of clinical trial protocols • Provides a set of semantic web enabled mechanisms and services facilitating clinical trials lifecycle • Incorporates a set of advanced data mining and semantic reasoning mechanisms which are applied on a variety of web data sources containing clinical and non-clinical information
OpenScienceLink Core Components • The OpenScienceLink core components implement the main functionality of the Platform and form the OpenScienceLink API • Users Management • Responsible for handling all functionality related to the Platform users, their profile and access rights, such as user registration, profile editing, authentication and role-based authorisation. • Datasets Management • Responsible for handling all functionality related to datasets and the corresponding metadata. • Metadata are partially based on the Dryad Metadata Application Profile, including extensions at the level of parameters, e.g. dataset source type (real-world vs. synthetic), level of noise, and species, among others
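To make the metadata extensions concrete, the record handled by Datasets Management might be modelled roughly as follows. This is a sketch under assumptions: the class and field names are hypothetical, and the core fields shown are placeholders rather than the actual Dryad Metadata Application Profile fields. Only the three extensions (source type, level of noise, species) come from the slide.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Dict, List, Optional


class SourceType(Enum):
    """Dataset source type extension named on the slide."""
    REAL_WORLD = "real-world"
    SYNTHETIC = "synthetic"


@dataclass
class DatasetMetadata:
    # Placeholder core fields (assumed, not taken from the Dryad profile).
    title: str
    creators: List[str]
    # Extensions mentioned on the slide.
    source_type: SourceType
    noise_level: Optional[str] = None  # free-text here; the real scale is not specified
    species: List[str] = field(default_factory=list)

    def to_record(self) -> Dict[str, object]:
        """Flatten the metadata into a plain dict, e.g. for indexing."""
        return {
            "title": self.title,
            "creators": list(self.creators),
            "source_type": self.source_type.value,
            "noise_level": self.noise_level,
            "species": list(self.species),
        }


# Illustrative values only.
example = DatasetMetadata(
    title="Example ECG recordings",
    creators=["A. Researcher"],
    source_type=SourceType.SYNTHETIC,
    noise_level="low",
    species=["Homo sapiens"],
)
record = example.to_record()
```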
Core Components Layer • Articles Management • Responsible for handling all functionality related to articles • Authors Management • This component is responsible for handling all functionality related to authors • Groups Management • This component is responsible for handling all functionality related to groups of people
Core Components Layer • Review Data Management • Responsible for handling all functionality related to the review process and the corresponding data • Covers the initiation and updating of the review process as well as the provision of access to the reviews to the corresponding users • For example, for a particular article or dataset, some users can see their own review (e.g. a reviewer), some users can see all reviews without knowing the reviewers (e.g. an author), and some users can see all reviews and reviewers (e.g. a publisher) • Comments and ratings are also managed by this component, always considering each user's access rights
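The three visibility levels in the example above (reviewer, author, publisher) can be expressed as a small filtering function. This is a minimal sketch: the role names mirror the slide, while the classes and the function visible_reviews are hypothetical and not part of the OpenScienceLink API.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List, Optional


class Role(Enum):
    REVIEWER = "reviewer"
    AUTHOR = "author"
    PUBLISHER = "publisher"


@dataclass(frozen=True)
class Review:
    reviewer_id: str
    text: str


@dataclass(frozen=True)
class ReviewView:
    reviewer_id: Optional[str]  # None when the reviewer's identity is hidden
    text: str


def visible_reviews(reviews: List[Review], user_id: str, role: Role) -> List[ReviewView]:
    """Return the reviews of one article or dataset as a given user may see them."""
    if role is Role.PUBLISHER:
        # Publishers see all reviews together with the reviewers.
        return [ReviewView(r.reviewer_id, r.text) for r in reviews]
    if role is Role.AUTHOR:
        # Authors see all reviews, but not who wrote them.
        return [ReviewView(None, r.text) for r in reviews]
    if role is Role.REVIEWER:
        # Reviewers see only their own review.
        return [ReviewView(r.reviewer_id, r.text) for r in reviews if r.reviewer_id == user_id]
    return []


# Illustrative data only.
all_reviews = [Review("rev1", "Sound methodology."), Review("rev2", "Needs more data.")]
as_author = visible_reviews(all_reviews, "auth1", Role.AUTHOR)
as_reviewer = visible_reviews(all_reviews, "rev2", Role.REVIEWER)
as_publisher = visible_reviews(all_reviews, "pub1", Role.PUBLISHER)
```

Returning a separate view object, rather than the stored review, keeps the hidden reviewer identity from leaking by accident.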
Adaptors Layer The OpenScienceLink core components interact with the SocIoS, GoPubMed and PONTE platforms by means of the adaptors The latter undertake the required actions, mappings and transformations in order to enable communication with the existing platforms and ultimately the underlying data sources for the exploitation of the existing wealth of information
Social Networks Adaptor • Comprises a simplification layer on top of the SocIoS services • Undertakes the integration of the underlying SocIoS platform and the communication with the connected SNS(s) • Receives requests from the OpenScienceLink core components for the provision of data stemming from the connected SNSs, including the exact type of information required and the SNS(s) involved • Combines SocIoS services in order to provide tailored functionality pertaining to the specific data needs of the OpenScienceLink core components • Queries the services built on top of the SocIoS platform in order to further process the specific requests and gather the required data • Internally performs any data processing or mapping that may be required for the seamless collaboration between the OpenScienceLink core components and the SocIoS platform, in either direction • Offered functionality: persons retrieval, connected persons retrieval, media items retrieval, activities retrieval, data transformation and data extraction
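The adaptor role described here, receiving a request that names the information type and the SNS(s) involved and then mapping the results into one common shape, might look roughly like the sketch below. Everything in it is hypothetical: SnsBackend, search_persons and the Person fields are stand-ins invented for illustration and do not reflect the real SocIoS services.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List, Protocol


@dataclass(frozen=True)
class Person:
    """Common person representation handed back to the core components."""
    sns: str
    user_id: str
    display_name: str


class SnsBackend(Protocol):
    """Hypothetical stand-in for a service exposed by the underlying platform."""

    def search_persons(self, query: str) -> List[dict]: ...


class SocialNetworksAdaptor:
    """Simplification layer: one call fans out to the requested SNS backends."""

    def __init__(self, backends: Dict[str, SnsBackend]) -> None:
        self._backends = backends

    def retrieve_persons(self, query: str, sns_names: Iterable[str]) -> List[Person]:
        persons: List[Person] = []
        for name in sns_names:
            backend = self._backends[name]  # KeyError if the SNS is not connected
            for raw in backend.search_persons(query):
                # Map the backend-specific record onto the common Person shape.
                persons.append(Person(name, str(raw["id"]), raw["name"]))
        return persons


class InMemoryBackend:
    """Toy backend used only to exercise the adaptor in this sketch."""

    def __init__(self, rows: List[dict]) -> None:
        self._rows = rows

    def search_persons(self, query: str) -> List[dict]:
        return [r for r in self._rows if query.lower() in r["name"].lower()]


adaptor = SocialNetworksAdaptor({
    "sns_a": InMemoryBackend([{"id": 1, "name": "Ada Lovelace"}]),
    "sns_b": InMemoryBackend([{"id": "x9", "name": "Ada Yonath"}, {"id": "x7", "name": "Alan Turing"}]),
})
found = adaptor.retrieve_persons("ada", ["sns_a", "sns_b"])
```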
Content and Data Management Adaptor • Integrates the data management system of GoPubMed within the OpenScienceLink Platform • Integrates the services of the GoPubMed semantic search engine • Comprises a simplified layer of services on top of the GoPubMed platform that pertain to the indexing of data, annotation with the underlying ontology concepts, importing of new ontologies, semantic search on the indexed data and identification of trends in the indexed data. • Utilised for presenting statistics about the resulting set of documents, such as the number of publications over time, the top countries, cities, journals, authors and ontology terms • It is, thus, a summary of the trends observed for the documents that are returned via the input query
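The statistics mentioned above (publications over time, top countries, top journals) amount to frequency counts over the documents returned by a query. The following sketch shows the idea on plain dictionaries. The field names year, country and journal and the function summarise_results are assumptions made for illustration; the GoPubMed services themselves are not shown.

```python
from collections import Counter
from typing import Dict, List


def summarise_results(documents: List[dict], top_n: int = 3) -> Dict[str, object]:
    """Summarise a result set: publications per year plus the most frequent values."""
    per_year = Counter(doc["year"] for doc in documents)
    countries = Counter(doc["country"] for doc in documents)
    journals = Counter(doc["journal"] for doc in documents)
    return {
        # Chronological order makes the time trend easy to read or plot.
        "publications_per_year": dict(sorted(per_year.items())),
        "top_countries": countries.most_common(top_n),
        "top_journals": journals.most_common(top_n),
    }


# Illustrative result set only.
results = [
    {"year": 2012, "country": "Greece", "journal": "Journal A"},
    {"year": 2013, "country": "Greece", "journal": "Journal B"},
    {"year": 2013, "country": "Germany", "journal": "Journal A"},
    {"year": 2013, "country": "Greece", "journal": "Journal A"},
]
summary = summarise_results(results)
```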
Semantically-enabled Inference Adaptor • Enables the integration of the PONTE platform with OpenScienceLink • Exploits the PONTE data mining and semantic reasoning mechanisms and services, as well as the rich knowledge base of the PONTE platform • Uses the term co-occurrence index building capability of the PONTE platform: relevant terms appear together in the literature, and the more often this happens, the more relevant they are considered to be. A co-occurrence index is built for pairs and triples of terms, ranked in each case by frequency. This offers a first-stage filter that reduces the amount of information to manageable levels, without sacrificing interesting results, for guiding research • Exploits a local knowledge base built on curated data from the web of linked data, as well as specialized data sources (incl. KEGG, ChEBI, DrugBank, Sider, etc.) • Applies various ranking algorithms to the discovered data, following the knowledge-based concept correlations capability stemming from PONTE; the ranking results are used either for presentation purposes (top first) or for adjusting the level of inclusion / exclusion of terms deemed relevant
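A co-occurrence index for pairs and triples of terms, ranked by frequency, can be sketched in a few lines. The example assumes each document has already been reduced to a list of terms; the function name and the data are hypothetical, and the sketch does not reproduce the PONTE implementation.

```python
from collections import Counter
from itertools import combinations
from typing import Dict, Iterable, List, Tuple


def build_cooccurrence_index(
    documents: Iterable[List[str]],
    sizes: Tuple[int, ...] = (2, 3),
) -> Dict[int, Counter]:
    """Count how many documents each pair / triple of terms appears in together."""
    index: Dict[int, Counter] = {k: Counter() for k in sizes}
    for terms in documents:
        # Deduplicate and sort so every combination has one canonical form.
        unique_terms = sorted(set(terms))
        for k in sizes:
            index[k].update(combinations(unique_terms, k))
    return index


# Illustrative term lists only.
docs = [
    ["aspirin", "stroke", "bleeding"],
    ["aspirin", "stroke"],
    ["stroke", "statin"],
]
index = build_cooccurrence_index(docs)
top_pairs = index[2].most_common(1)    # most frequent pair first
top_triples = index[3].most_common(1)  # most frequent triple first
```

Keeping only the top entries of each ranking is what provides the first-stage filter: rare combinations are dropped before any heavier reasoning is applied.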
Conclusions • The OpenScienceLink platform enables accessing and offering added-value services (including trends detection and analysis, development of new scientometrics, data journals management, enhanced review processes) based on a multitude of openly accessible data sources (from literature and datasets to social network data), while at the same time empowering their semantic linking and data processing • It further offers a wide range of opportunities for better collaboration between researchers, scholars and research organisations, including their ability to formulate added-value scientific / research networks
Future Work • Expand the capabilities of the components and user interfaces according to the recorded end user needs and requirements regarding all Pilots • Address any issues with the implemented functionality and provide improvements based on the end users' evaluation feedback • Consider additional data sources for inclusion via integration with the underlying platforms, according to the needs of OpenScienceLink • Investigate the integration of more SNSs, with the aim of also including networks specifically addressed to researchers and research communities, with the most probable first candidate being Mendeley • Analyse the steps required (e.g. linking with other domains' ontologies, data sources and models) for enabling the Platform to offer its services beyond the biomedical domain and thus, ideally, become domain-agnostic
Thank you • Contact National Technical University of Athens School of Electrical and Computer Engineering Distributed Knowledge and Media Systems Group http://grid.ece.ntua.gr Efstathios Karanastasis Research Engineer +30 210 772 2132 ekaranas@mail.ntua.gr
[Platform screenshots: Log in · My profile · My datasets · Trends]
PONTE: Eligibility Criteria Model Scope within PONTE: • Formal representation of Eligibility (Inclusion/Exclusion) Criteria • Patients Model for Clinical Research purposes (especially recruitment) Current Status: 1st year work • Work upon extending and adapting the eligibility criteria model for OpenScienceLink purposes Future work: 2nd and mainly 3rd year • Update and Integrate I/E criteria model within OpenScienceLink platform • Annotate literature search results • Improve literature search process
PONTE: Abbreviations - Introduction • An abbreviation is a shortened form of a term or expression (aka the expanded form) • Abbreviations are widely used in biomedical articles and datasets • Example: • An abbreviation is present within a document, e.g. “Cardiac testing for all patients at low-risk for ACS is not sustainable”… • But its expansion, Acute Coronary Syndrome, is missing • Abbreviations are highly ambiguous • Over 5 expansions per abbreviation on average • Detecting or predicting the expansion of an abbreviation is a real challenge
PONTE: Abbreviations - Tasks • Current Status: Work done during 1st year • In-depth analysis of problem • Abbreviation Expansion Detection and Prediction System Architecture • Description of Algorithms / Methodology • Future Work: for 2nd and 3rd year • Repository of abbreviations with expansions along with context • Suggestion of most appropriate expansion for an abbreviation
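The planned suggestion of the most appropriate expansion could, in its simplest form, compare the words around an abbreviation with the context stored for each candidate expansion. The sketch below uses plain word overlap as the score. This scoring rule, the function names and the repository contents are assumptions for illustration and are not the PONTE algorithms, which the slides do not detail.

```python
import re
from typing import Dict, Optional, Set

# abbreviation -> expansion -> context words seen with that expansion
Repository = Dict[str, Dict[str, Set[str]]]


def tokenize(text: str) -> Set[str]:
    """Lower-case the text and return its set of alphabetic words."""
    return set(re.findall(r"[a-z]+", text.lower()))


def suggest_expansion(abbreviation: str, sentence: str, repository: Repository) -> Optional[str]:
    """Pick the stored expansion whose context overlaps most with the sentence."""
    candidates = repository.get(abbreviation, {})
    if not candidates:
        return None
    context = tokenize(sentence)
    # On ties, max() keeps the first candidate in insertion order.
    return max(candidates, key=lambda expansion: len(context & candidates[expansion]))


# Hypothetical repository entries, for illustration only.
repo: Repository = {
    "ACS": {
        "Acute Coronary Syndrome": {"cardiac", "patients", "risk", "chest"},
        "American Chemical Society": {"chemistry", "journal", "society"},
    }
}
sentence = "Cardiac testing for all patients at low-risk for ACS is not sustainable"
suggestion = suggest_expansion("ACS", sentence, repo)
```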