Transcription factors (TFs) are regulatory proteins that bind specific sequence motifs in the genome to activate or repress transcription of target genes. Chromatin immunoprecipitation followed by sequencing (ChIP-seq) is the most commonly used experimental technique for profiling genome-wide protein-DNA binding maps. However, in vivo binding landscapes differ from one TF to another and, for a given TF, across cellular contexts. While the number of in vivo TF-binding datasets continues to grow, it is still not feasible to perform ChIP-seq assays for every expressed TF in every cell type and tissue under all possible physiological conditions. Hence, accurate, high-resolution computational approaches are necessary to close this gap and complement experimental results.
Several computational approaches have been used to model the in vitro sequence affinity of TFs, ranging from simple position weight matrices (PWMs) to more complex representations that capture higher-order positional interactions between nucleotides. However, models that rely on sequence alone are generally inadequate for accurately predicting context-specific TF binding maps. They fail to capture several context-specific factors beyond the intrinsic DNA sequence affinity of the TF and the local sequence context, such as TF concentration, cooperation and competition with other TFs and nucleosomes, local chromatin state, and distal regulatory interactions. More recently, the DNase-seq assay was developed to obtain genome-wide maps of chromatin accessibility by leveraging the DNase I enzyme, which preferentially cleaves accessible chromatin bound by TFs. New computational approaches that integrate DNA sequence with chromatin accessibility data are now capable of predicting context-specific TF-DNA binding maps. However, differences in data processing, selection of negative background sets, and evaluation measures have confounded performance comparisons between these methods.
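To make the simplest of these sequence models concrete, the following is a minimal sketch of PWM scoring in Python. It is an illustration only, not a method used in the challenge: the count matrix, the uniform background frequency of 0.25, the pseudocount, and the function names `build_pwm` and `scan` are all assumptions introduced here.

```python
import math

BASES = "ACGT"


def build_pwm(counts, pseudocount=1.0, background=0.25):
    """Convert per-position base counts into log2-odds scores.

    Assumes a uniform background frequency; both the background and the
    pseudocount are illustrative choices, not values from the text.
    """
    pwm = []
    for column in counts:
        total = sum(column[b] for b in BASES) + pseudocount * len(BASES)
        pwm.append({
            b: math.log2((column[b] + pseudocount) / total / background)
            for b in BASES
        })
    return pwm


def scan(sequence, pwm):
    """Score every motif-width window; return (offset, score) pairs."""
    width = len(pwm)
    hits = []
    for start in range(len(sequence) - width + 1):
        window = sequence[start:start + width]
        score = sum(pwm[i][base] for i, base in enumerate(window))
        hits.append((start, score))
    return hits


# Hypothetical counts for a 4-bp motif resembling "TGAC" (made-up data).
toy_counts = [
    {"A": 1, "C": 1, "G": 1, "T": 17},
    {"A": 1, "C": 1, "G": 17, "T": 1},
    {"A": 17, "C": 1, "G": 1, "T": 1},
    {"A": 1, "C": 17, "G": 1, "T": 1},
]
toy_pwm = build_pwm(toy_counts)
hits = scan("AATGACGT", toy_pwm)
best_offset, best_score = max(hits, key=lambda hit: hit[1])
```

The score depends only on the DNA sequence, so the same window receives the same score in every cell type. This is the limitation described above: a PWM alone cannot express chromatin state, TF concentration, or competition with nucleosomes.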
The goal of the ENCODE-DREAM in vivo Transcription Factor Binding Site Prediction Challenge is twofold:
1. We aim to identify the best-performing model for predicting positional in vivo TF binding maps across cell types and tissues.
2. The results of the challenge will provide a systematic benchmark and comparison of such computational methods, and will be used both to assess the current state of the field and to guide future method development.