
Distributed data management for Genomics in the EOSC Federation

Sector: Data science
Typology: International
Programme: Horizon Europe
Project duration: 06/05/2026 - 30/11/2026

The FORGE project deploys and validates a cross-node genomics scientific use case within the EOSC Federation, demonstrating the practical added value of interoperable data and service integration across Nodes. Coordinated by Area Science Park, the initiative integrates the ORFEO HPC/AI infrastructure with the CERN EOSC Node through Rucio for federated data management, and with the Italian EOSC Node (Fondazione ICSC) for metadata registration and discoverability.

Two key technologies sit at the heart of the workflow:

  • ORFEO is Area Science Park’s HPC/AI data centre, supporting large-scale genomics, multi-omics, and data-intensive workflows within a FAIR-by-design ecosystem, and is being positioned as a federation-ready national infrastructure through collaboration with Fondazione ICSC.
  • Rucio is an open-source, policy-aware data management system developed at CERN for orchestrating distributed data across heterogeneous storage infrastructures, currently proposed as one of the possible backbones for EOSC-wide data federation beyond the High-Energy Physics domain.
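In Rucio, "policy-aware" refers to declarative replication rules. A rule states how many copies of a dataset must exist on storage elements (RSEs) that match an expression, and Rucio then works out placement and the transfers needed to satisfy it. The Python sketch below is a deliberately simplified toy model of that idea. It is not Rucio's client API or its actual placement logic. The RSE names other than ORFEO, their attributes, the dataset identifier, the single `key=value` expression syntax, and the "most free space first" preference are all illustrative assumptions.

```python
from dataclasses import dataclass


@dataclass
class RSE:
    """Toy stand-in for a Rucio Storage Element: a named endpoint with attributes."""
    name: str
    attributes: dict[str, str]
    free_bytes: int


@dataclass
class Rule:
    """Declarative policy: keep `copies` replicas of `dataset` on RSEs matching `rse_expression`."""
    dataset: str          # hypothetical "scope:name" identifier
    copies: int
    rse_expression: str   # simplified "key=value" syntax (assumption, not Rucio's full grammar)


def matching_rses(expression: str, rses: list[RSE]) -> list[RSE]:
    """Return the RSEs whose attribute `key` equals `value`."""
    key, _, value = expression.partition("=")
    return [r for r in rses if r.attributes.get(key) == value]


def place(rule: Rule, size_bytes: int, rses: list[RSE]) -> list[str]:
    """Choose target RSEs for a rule, preferring those with the most free space."""
    eligible = [r for r in matching_rses(rule.rse_expression, rses) if r.free_bytes >= size_bytes]
    if len(eligible) < rule.copies:
        raise ValueError(
            f"rule for {rule.dataset} cannot be satisfied: "
            f"{len(eligible)} eligible RSEs, {rule.copies} copies required"
        )
    eligible.sort(key=lambda r: r.free_bytes, reverse=True)
    return [r.name for r in eligible[: rule.copies]]


if __name__ == "__main__":
    TB = 10**12
    rses = [
        RSE("ORFEO_S3", {"country": "IT", "type": "s3"}, 500 * TB),
        RSE("SITE_B_DISK", {"country": "IT", "type": "disk"}, 200 * TB),  # hypothetical site
        RSE("SITE_C_DISK", {"country": "CH", "type": "disk"}, 900 * TB),  # hypothetical site
    ]
    rule = Rule("genomics:cohort_A", copies=2, rse_expression="country=IT")
    print(place(rule, 50 * TB, rses))  # ['ORFEO_S3', 'SITE_B_DISK']
```

Declaring the desired end state, rather than individual copy operations, is what lets a system like Rucio keep enforcing the policy as storage changes. The real rule engine also tracks rule state, lifetimes, and transfers, which this toy model omits.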

The innovation lies in delivering a functional “find, access, process” workflow across nodes in a real-world genomics scenario.
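The "find, access, process" sequence breaks down into three small steps. First, query a catalogue by metadata to obtain dataset identifiers. Next, resolve each identifier to a concrete replica. Finally, run the analysis where that replica lives. The sketch below models this with in-memory Python structures. The catalogue records, the `scope:name` identifiers, the example URLs, and the rule of preferring a replica co-located with the compute site are all hypothetical. None of the functions correspond to the actual ORFEO, Rucio, or Italian EOSC Node interfaces.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class CatalogueRecord:
    """Hypothetical catalogue entry: a dataset identifier plus searchable keywords."""
    did: str                  # Rucio-style "scope:name" identifier
    keywords: frozenset[str]


@dataclass(frozen=True)
class Replica:
    """Hypothetical replica location: storage element name and access URL."""
    rse: str
    url: str


def find(catalogue: list[CatalogueRecord], keyword: str) -> list[str]:
    """FIND: query the catalogue by metadata and return matching dataset identifiers."""
    return [rec.did for rec in catalogue if keyword in rec.keywords]


def access(replicas: dict[str, list[Replica]], did: str, compute_rse: str) -> Replica:
    """ACCESS: resolve an identifier to one replica, preferring one at the compute site."""
    options = replicas.get(did, [])
    if not options:
        raise LookupError(f"no replica registered for {did}")
    for rep in options:
        if rep.rse == compute_rse:
            return rep
    return options[0]  # fall back to a remote replica


def run_workflow(
    catalogue: list[CatalogueRecord],
    replicas: dict[str, list[Replica]],
    keyword: str,
    compute_rse: str,
    process: Callable[[Replica], str],
) -> dict[str, str]:
    """PROCESS: apply the analysis step to each dataset found, at its resolved replica."""
    return {did: process(access(replicas, did, compute_rse)) for did in find(catalogue, keyword)}


if __name__ == "__main__":
    catalogue = [
        CatalogueRecord("genomics:cohort_A", frozenset({"wgs", "homo sapiens"})),
        CatalogueRecord("genomics:cohort_B", frozenset({"rna-seq"})),
    ]
    replicas = {
        "genomics:cohort_A": [
            Replica("SITE_B_DISK", "root://site-b.example/cohort_A"),
            Replica("ORFEO_S3", "s3://orfeo.example/cohort_A"),
        ],
    }
    print(run_workflow(catalogue, replicas, "wgs", "ORFEO_S3", lambda r: f"analysed {r.url}"))
    # {'genomics:cohort_A': 'analysed s3://orfeo.example/cohort_A'}
```

Keeping the three steps as separate functions mirrors the separation of concerns in the source. Discovery, data location, and compute can each be provided by a different Node.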

The project builds on mature, production-grade infrastructures and open-source technologies, focusing on orchestrating interoperability between them rather than on developing new tools.


MAIN OUTCOME

The main outcome is a reusable integration pattern (National Catalog + Rucio + Federated Analysis Environment) that can be adopted by other research infrastructures and future EOSC Nodes.
By strengthening intra-node and cross-node workflows and aligning metadata standards, the project contributes to the EOSC Build-Up portfolio and supports scalable, FAIR-aligned data federation for life science research.

Outputs include:

  • a federation-ready integration of ORFEO S3 storage into CERN’s Rucio instance;
  • ingestion, metadata enrichment, and registration of genomics datasets;
  • metadata profiling and registration within the Italian EOSC Node catalogue to enhance findability;
  • a policy-compliant cross-node access model;
  • a fully documented scientific workflow demonstrating distributed analysis;
  • and the reusable integration pattern (National Catalog + Rucio + Federated Analysis Environment) itself.
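Metadata profiling can be understood as checking each dataset record against an agreed set of required fields and controlled vocabularies before registration, so that catalogue searches behave predictably. The Python sketch below illustrates such a gate under explicit assumptions. The profile (`REQUIRED_FIELDS`), the access-rights vocabulary, and the `register` helper are hypothetical. They do not reflect the actual profile or catalogue API used by the Italian EOSC Node.

```python
from typing import Any

# Hypothetical minimal profile: required field -> expected Python type.
REQUIRED_FIELDS: dict[str, type] = {
    "identifier": str,
    "title": str,
    "organism": str,
    "assay_type": str,
    "access_rights": str,
}
# Hypothetical controlled vocabulary for the access_rights field.
ALLOWED_ACCESS_RIGHTS = {"public", "restricted", "non-public"}


def profile_errors(record: dict[str, Any]) -> list[str]:
    """Return every profile violation; an empty list means the record conforms."""
    errors = []
    for name, expected in REQUIRED_FIELDS.items():
        if name not in record:
            errors.append(f"missing field: {name}")
        elif not isinstance(record[name], expected) or not record[name]:
            errors.append(f"invalid value for {name}")
    rights = record.get("access_rights")
    if isinstance(rights, str) and rights and rights not in ALLOWED_ACCESS_RIGHTS:
        errors.append(f"access_rights {rights!r} not in controlled vocabulary")
    return errors


def register(catalogue: dict[str, dict[str, Any]], record: dict[str, Any]) -> None:
    """Add a record to an in-memory catalogue only if it passes the profile check."""
    errors = profile_errors(record)
    if errors:
        raise ValueError("; ".join(errors))
    catalogue[record["identifier"]] = record


if __name__ == "__main__":
    record = {
        "identifier": "genomics:cohort_A",
        "title": "Cohort A whole-genome sequencing",
        "organism": "Homo sapiens",
        "assay_type": "WGS",
        "access_rights": "open",
    }
    print(profile_errors(record))
    # ["access_rights 'open' not in controlled vocabulary"]
```

Collecting all violations instead of stopping at the first one gives data providers a complete list of fixes in a single pass.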

Partners

Area Science Park

EOSC NODE CERN
EOSC NODE ITALY

Contact