Resolving transcription signals at the level of individual cells is fundamental to understanding more complex biological systems such as tissues and organs, and is essential to characterizing cell-to-cell heterogeneity. Examining the epigenomic landscape is equally important for understanding the regulatory programs that control transcription. Emerging high-throughput sequencing technologies now allow transcript quantification (scRNA-seq) and chromatin accessibility profiling (scATAC-seq) at the single-cell level. Nevertheless, these technologies present unique challenges: the small amount of mRNA sequenced per cell in scRNA-seq and the low copy numbers in scATAC-seq both lead to inherent data sparsity. In scRNA-seq, proper signal correction is key to accurate gene expression quantification, and its effects propagate into downstream analyses such as differential gene expression analysis, identification of cell-type-specific markers, and reconstruction of differentiation trajectories. In the even sparser scATAC-seq data, correct identification of informative features is key to assessing cell heterogeneity at the chromatin level.
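To make the notion of data sparsity concrete, the following minimal sketch computes the fraction of zero entries in a cells-by-features count matrix. It is illustrative only and is not part of the challenge pipeline: the toy matrices are simulated, and the expression rates, the capture efficiency, the per-copy accessibility probability, and the assumption of at most two copies per locus are all hypothetical choices made for this example.

```python
import numpy as np


def sparsity(counts) -> float:
    """Return the fraction of zero entries in a cells x features count matrix."""
    counts = np.asarray(counts)
    if counts.size == 0:
        raise ValueError("count matrix is empty")
    return float(np.count_nonzero(counts == 0) / counts.size)


# Simulated toy data; every parameter below is an assumption for illustration.
rng = np.random.default_rng(0)
n_cells, n_features = 200, 1000

# scRNA-seq-like counts: only a small fraction of each cell's mRNA is captured.
true_rate = rng.gamma(shape=0.5, scale=2.0, size=n_features)  # hypothetical per-gene rates
capture_efficiency = 0.1                                      # hypothetical capture fraction
rna_like = rng.poisson(true_rate * capture_efficiency, size=(n_cells, n_features))

# scATAC-seq-like counts: assume at most two copies of each locus per cell,
# each observed as accessible with a small hypothetical probability.
atac_like = rng.binomial(2, 0.02, size=(n_cells, n_features))

print(f"scRNA-seq-like sparsity:  {sparsity(rna_like):.2f}")
print(f"scATAC-seq-like sparsity: {sparsity(atac_like):.2f}")
```

Under these assumed parameters most entries in both matrices are zero, which is the situation that signal correction and peak identification methods must handle.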
The aim of this challenge is three-fold:
- evaluate computational methods for signal correction and peak identification in scRNA-seq and scATAC-seq, respectively
- assess the impact these methods have on downstream analysis
- map scRNA-seq data to scATAC-seq data (a task also known as manifold alignment) for measurements collected simultaneously from the same cells
This challenge will be implemented in two phases: phase 1 will assess methods for signal correction and peak identification within each modality, and phase 2 will address the integration of the two modalities.