SCOLORINA

Machine learning algorithms for single-cell genomics from long-read sequencing technologies

Sector: Data Science, Life Sciences

Typology: National

Programme: PRIN 2022 PNRR

Project duration: 01/12/2023 - 30/11/2025

SCOLORINA is dedicated to developing new machine learning and artificial intelligence algorithms for the analysis of genomic data generated by next-generation sequencing technologies, with a particular focus on single-cell and long-read sequencing. The initiative stems from the convergence of two major frontiers in contemporary genomics: the ability to investigate the molecular content of individual cells, and the ability to read much longer DNA or RNA sequences than conventional techniques allow. This combination opens promising avenues for understanding biological and disease mechanisms, but it also requires entirely new computational tools.

The project focuses in particular on the emerging single-cell long-read sequencing paradigm, a very recent technological field for which dedicated analysis methods are still lacking. To address this need, the SCOLORINA project aims to develop a first generation of advanced inference tools for single-cell long-read data produced on Oxford Nanopore platforms. The objective is to transform highly complex genomic data into meaningful information for biomedical research, contributing to the development of new methodologies for computational biology and precision medicine.

A central part of the activities concerns the generation and analysis of new single-cell long-read RNA data from chronic lymphocytic leukemia samples. Building on these data, the project will develop algorithms that identify allele-specific transcription patterns and copy number alterations at the single-cell level, with the goal of detecting biological signals that are difficult to observe through conventional approaches. The methods will combine Bayesian probabilistic models with deep learning techniques and will be implemented as software tools in R and Python, released as open source to encourage reuse by the scientific community.

Overall, the project seeks to strengthen the role of digital technologies and artificial intelligence in the life sciences by providing new analysis frameworks and advanced software tools to support researchers, research infrastructures, and clinical stakeholders involved in the study of complex diseases. In this way, the initiative contributes to the development of more advanced health technologies, with potential benefits for disease understanding and for the design of increasingly precise approaches in the field of personalized medicine.

Objectives

The project aims to develop new machine learning and artificial intelligence algorithms for the analysis of single-cell long-read data, to generate new genomic datasets from chronic lymphocytic leukemia samples, and to create open-source software tools for the inference of allele-specific transcription and copy number alterations. The objectives also include validating the methods on real-world data and disseminating advanced computational frameworks to support biomedical research and precision medicine. The expected outcome is the availability of new digital tools for interpreting highly complex genomic data and enabling a more accurate understanding of the biological mechanisms underlying disease.

Partners

University of Trieste (Cancer Data Science Laboratory, Prof. Giulio Caravagna)

Contact

Alberto Cazzaniga
Data Engineering Laboratory, Area Science Park
alberto.cazzaniga@areasciencepark.it