Partners: Instruct-ERIC, Euro-BioImaging
Project description
Although useful resources for data handling and integration are being developed, their visibility to the scientific community is limited by the lack of suitable platforms to host and promote them. These resources must also remain available and maintained after the projects that created them have ended. This science project therefore brings together the social sciences and humanities and the life sciences clusters to assess their metadata schemas and domain catalogues, and to develop a strategy for mapping and relating the different schemas.
The science clusters use different metadata schemas, based on strong, well-established domain standards such as the ECRIN metadata schema for clinical research, MIABIS for biobanks and DICOM for images. To address complex scientific and societal problems, the metadata of data objects managed by these different research infrastructures should be made interoperable.
This project will focus specifically on data objects related to COVID-19:
- reviewing models for the automatic preservation and sharing of raw and intermediate data, including software and analysis workflows;
- identifying and sharing best practices;
- linking the catalogues with the emerging EOSC resources and the EOSC interoperability framework.
This work will align with, and enrich, the data currently being exposed within the European COVID-19 Data Portal.
Technical challenge
The huge amount of biological and chemical data generated by different scientific communities (EU-OPENSCREEN, ChEMBL, Work Package partners, etc.) leads to data silos. These data therefore need to be made FAIR (Findable, Accessible, Interoperable and Reusable) so that diseases and medical conditions can be understood as a whole. The task in this project was to create a workflow that FAIRifies the data and thereby enables data integration and harmonization. The integrated data are then represented as semantic networks known as knowledge graphs (KGs), which can be visualized, queried to answer scientific questions, or used for downstream analyses. The project created reproducible workflows that make building KGs time- and cost-effective. In this process, APIs from public databases (such as ChEMBL and UniProt) are used to enrich the KGs.
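To make the KG workflow above more concrete, the following is a minimal sketch, not the project's actual code. It assumes the records have already been fetched from public database APIs and flattened into simple dictionaries; the record layout, identifiers (`compound:A`, `protein:P1`, `disease:D1`) and predicate names (`targets`, `associated_with`) are hypothetical and do not reproduce the ChEMBL or UniProt response formats. It builds a small triple store, enriches target nodes with protein annotations, and runs a two-hop query of the kind used in drug-repurposing analyses.

```python
from collections import defaultdict

# Hypothetical, pre-fetched records. In a real workflow these would come from
# public database APIs (e.g. ChEMBL, UniProt); these field names are illustrative only.
COMPOUNDS = [
    {"id": "compound:A", "name": "Compound A", "targets": ["protein:P1"]},
    {"id": "compound:B", "name": "Compound B", "targets": ["protein:P1", "protein:P2"]},
]
PROTEINS = {
    "protein:P1": {"name": "Protein 1", "disease": "disease:D1"},
    "protein:P2": {"name": "Protein 2", "disease": "disease:D2"},
}


class KnowledgeGraph:
    """A minimal in-memory triple store of (subject, predicate, object) edges."""

    def __init__(self):
        self.triples = set()
        # Index on (subject, predicate) for fast forward lookups.
        self._index = defaultdict(set)

    def add(self, s, p, o):
        self.triples.add((s, p, o))
        self._index[(s, p)].add(o)

    def objects(self, s, p):
        """Return all objects linked from subject s via predicate p."""
        return sorted(self._index.get((s, p), set()))

    def subjects(self, p, o):
        """Return all subjects linked to object o via predicate p (linear scan)."""
        return sorted(s for (s, pp, oo) in self.triples if pp == p and oo == o)


def build_graph(compounds, proteins):
    kg = KnowledgeGraph()
    # Integration step: compound nodes and their target edges.
    for c in compounds:
        kg.add(c["id"], "label", c["name"])
        for t in c["targets"]:
            kg.add(c["id"], "targets", t)
    # Enrichment step: attach protein annotations to the target nodes.
    for pid, info in proteins.items():
        kg.add(pid, "label", info["name"])
        kg.add(pid, "associated_with", info["disease"])
    return kg


def compounds_for_disease(kg, disease):
    """Two-hop query: compound -targets-> protein -associated_with-> disease."""
    hits = set()
    for protein in kg.subjects("associated_with", disease):
        hits.update(kg.subjects("targets", protein))
    return sorted(hits)


kg = build_graph(COMPOUNDS, PROTEINS)
print(compounds_for_disease(kg, "disease:D1"))  # ['compound:A', 'compound:B']
print(compounds_for_disease(kg, "disease:D2"))  # ['compound:B']
```

Keeping enrichment as a separate step after the core graph is built mirrors the idea of adding API-derived annotations to an existing KG; a production workflow would more likely use an established graph library or RDF store rather than this hand-rolled index.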
The EOSC Future added value
- EOSC provided an environment for collaborating with experts from different domains
- The resources are hosted in the EOSC Marketplace, where they can be found easily
- As a service provider, we receive information about total views and downloads, which helps us monitor the impact of our resources
- The resources were created using EGI Notebooks, a horizontal service in EOSC. This is very useful because developers do not have to install any software or tools on their local PC; everything can be done on a personalized, secure remote server.
Main results
- Resources for in silico drug repurposing (use case: Mpox KG)
- Characterization of novel fragments using KGs and fingerprint analysis
- Alignment with similar resources developed by others (e.g. Disease Maps in WP5 of BY-COVID)
- Enriching KGs with resources developed by other partners
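As a rough illustration of the fingerprint analysis mentioned in the results, the sketch below compares a fragment against a small reference library using Tanimoto similarity on binary fingerprints. The source does not state which fingerprint type or similarity metric was used; Tanimoto on bit sets is assumed here because it is a common choice in cheminformatics. The fingerprints are hypothetical sets of on-bit indices, not real data; in practice they would be generated by a cheminformatics toolkit such as RDKit.

```python
def tanimoto(fp_a, fp_b):
    """Tanimoto (Jaccard) similarity of two binary fingerprints given as sets of on-bit indices."""
    a, b = set(fp_a), set(fp_b)
    union = len(a | b)
    if union == 0:
        # Convention chosen here: two empty fingerprints are treated as dissimilar.
        return 0.0
    return len(a & b) / union


def nearest_neighbours(query_fp, library, top_k=3):
    """Rank library entries by Tanimoto similarity to the query (ties broken by name)."""
    scored = [(name, tanimoto(query_fp, fp)) for name, fp in library.items()]
    scored.sort(key=lambda item: (-item[1], item[0]))
    return scored[:top_k]


# Hypothetical fingerprints (on-bit indices), for illustration only.
fragment = {1, 4, 7, 9}
library = {
    "ref_1": {1, 4, 7, 9},
    "ref_2": {1, 4, 8},
    "ref_3": {2, 3, 5},
}
print(nearest_neighbours(fragment, library))
# [('ref_1', 1.0), ('ref_2', 0.4), ('ref_3', 0.0)]
```

Similarity scores above a chosen threshold could, for example, be stored as additional edges between fragment nodes in a KG; that linking step is an assumption, not something the project description specifies.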
Other resources
The codebases for the KG resources are maintained at:
Publications:
Poster: https://doi.org/10.5281/zenodo.7990992
KG hosted at NDEx: https://doi.org/10.18119/N9SG7D