Partners: Instruct-ERIC, EuroBioimaging
Project description
Although useful resources for data handling and integration are being developed, their visibility to the scientific community is limited because there are no suitable platforms to showcase them. These resources also need to be sustained in the long term, even after the projects that created them have ended. This science project therefore brings together the social sciences and humanities cluster and the life sciences cluster to assess their metadata schemas and domain catalogues, and to develop a strategy for mapping and relating the different schemas.
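Mapping and relating metadata schemas is often done with a crosswalk, a table that pairs each field of one schema with its counterpart in another. The sketch below illustrates the idea; both schemas and all field names are hypothetical and are not taken from any of the catalogues assessed in the project. Fields without a mapping are returned separately so they can be reviewed rather than silently dropped.

```python
# Hypothetical crosswalk from an illustrative "schema A" to an illustrative
# "schema B"; neither corresponds to a real catalogue schema.
CROSSWALK_A_TO_B = {
    "title": "name",
    "creator": "author",
    "date_issued": "published",
    "keywords": "keywords",
}


def map_record(record: dict, crosswalk: dict) -> tuple[dict, dict]:
    """Translate a schema-A record into schema B.

    Returns the mapped record and the fields that had no mapping,
    so unmapped terms can be reviewed instead of being lost.
    """
    mapped, unmapped = {}, {}
    for field, value in record.items():
        target = crosswalk.get(field)
        if target is None:
            unmapped[field] = value
        else:
            mapped[target] = value
    return mapped, unmapped


# Hypothetical example record.
record_a = {
    "title": "Example COVID-19 dataset",
    "creator": "Example Lab",
    "date_issued": "2022-06-01",
    "licence": "CC-BY-4.0",
}
mapped, unmapped = map_record(record_a, CROSSWALK_A_TO_B)
print(mapped)
print(unmapped)
```

A real crosswalk would also have to handle one-to-many mappings and differing controlled vocabularies, which a flat dictionary like this does not capture.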
This project focused specifically on data objects related to COVID-19:
- reviewing models for the automatic preservation and sharing of raw and intermediate data, including software and analysis workflows;
- identifying and sharing best practices;
- linking the catalogues with the emerging EOSC resources and the EOSC interoperability framework.
This work aligned with, and enriched, the data exposed through the European COVID-19 Data Portal.
Technical challenge
The huge amount of biological and chemical data generated by different scientific communities (EU-OpenScreen, ChEMBL, Work Package partners, etc.) has led to data silos. To understand diseases and medical conditions as a whole, these data need to be made FAIR (Findable, Accessible, Interoperable and Reusable). The task in this project was to create a workflow that FAIRifies the data and thereby enables their integration and harmonisation. The integrated data are then represented as semantic networks known as Knowledge Graphs (KGs), which can be visualised, queried to answer scientific questions, or used for downstream analyses. In this project, reproducible workflows were created that allow KGs to be built in a time- and cost-effective way, using the APIs of public databases (ChEMBL, UniProt and others) to enrich the KGs.
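The following is a minimal sketch of how records from several sources can be merged into one knowledge graph stored as (subject, predicate, object) triples, and then queried, for example to list compounds that target proteins associated with a disease, as in drug repurposing. It is not the project's workflow: the records, identifiers and predicate names are hypothetical placeholders for data that would come from APIs such as those of ChEMBL and UniProt, and a production KG would typically live in a graph database or RDF store.

```python
# Minimal knowledge-graph sketch: triples kept in a Python set.
# All identifiers and predicates below are hypothetical placeholders.
Triple = tuple[str, str, str]


def add_records(kg: set[Triple], records: list[dict], subject_key: str,
                predicate: str, object_key: str) -> None:
    """Add one (subject, predicate, object) triple per source record."""
    for rec in records:
        kg.add((rec[subject_key], predicate, rec[object_key]))


def compounds_for_disease(kg: set[Triple], disease: str) -> set[str]:
    """Return compounds that target any protein associated with `disease`."""
    proteins = {s for s, p, o in kg if p == "associated_with" and o == disease}
    return {s for s, p, o in kg if p == "targets" and o in proteins}


# Hypothetical records standing in for API responses
# (e.g. compound-target activities and protein-disease annotations).
compound_targets = [
    {"compound": "compound:C1", "target": "protein:P1"},
    {"compound": "compound:C2", "target": "protein:P2"},
]
protein_diseases = [
    {"protein": "protein:P1", "disease": "disease:D1"},
    {"protein": "protein:P2", "disease": "disease:D2"},
]

kg: set[Triple] = set()
add_records(kg, compound_targets, "compound", "targets", "target")
add_records(kg, protein_diseases, "protein", "associated_with", "disease")
# Enriching again with the same records adds no duplicate triples.
add_records(kg, protein_diseases, "protein", "associated_with", "disease")

print(len(kg))                                  # 4
print(compounds_for_disease(kg, "disease:D1"))  # {'compound:C1'}
```

Storing triples in a set makes repeated enrichment idempotent: re-running an API-based enrichment step on records that are already present does not create duplicate edges.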
The EOSC Future added value
- EOSC provided an environment for collaborating with experts from different domains
- The resources were hosted in the EOSC Marketplace, where they can be found easily
- Service providers could see up-to-date statistics on total views and downloads, which helps them monitor the impact of specific resources
- The resources were created using EGI-Notebooks, a horizontal service in EOSC (this is very useful because developers do not have to install any software or tools on their local PC; everything can be done on a personalised and secure remote server)
Main results
- Resources for in silico drug-repurposing methods (use case: Mpox KG)
- Characterisation of novel fragments using KGs and fingerprint analysis
- Alignment with similar resources developed by others (e.g. Disease Maps in BY-COVID WP5)
- Enriching KGs with resources developed by other partners
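Fingerprint analysis commonly compares molecules through the Tanimoto coefficient of their binary fingerprints: the number of on-bits the two fingerprints share divided by the number of bits set in either of them. The sketch below assumes fingerprints are represented as sets of on-bit indices and ranks hypothetical reference fragments by similarity to a query fragment. The fingerprint type and similarity measure actually used in the project are not stated here; in practice the fingerprints would be generated with a cheminformatics toolkit such as RDKit.

```python
def tanimoto(fp_a: set[int], fp_b: set[int]) -> float:
    """Tanimoto similarity of two binary fingerprints given as sets of on-bits."""
    union = len(fp_a | fp_b)
    if union == 0:  # two empty fingerprints: define similarity as 0
        return 0.0
    return len(fp_a & fp_b) / union


def rank_by_similarity(query: set[int], references: dict[str, set[int]]):
    """Return (name, score) pairs sorted from most to least similar."""
    scores = {name: tanimoto(query, fp) for name, fp in references.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical on-bit sets for a query fragment and three reference fragments.
fragment = {1, 5, 9, 12}
library = {
    "ref_1": {1, 5, 9, 40},
    "ref_2": {2, 3, 7},
    "ref_3": {1, 5, 9, 12, 33},
}

for name, score in rank_by_similarity(fragment, library):
    print(f"{name}: {score:.2f}")
# ref_3: 0.80
# ref_1: 0.60
# ref_2: 0.00
```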
Other resources
The codebases for the KG resources are maintained at:
Publications:
Poster: https://doi.org/10.5281/zenodo.7990992
KG hosted at NDEx: https://doi.org/10.18119/N9SG7D