Cancer is driven by aberrations in the genome [1,2], and these alterations manifest largely as changes in the structure and abundance of proteins, the main functional gene products. Hence, characterizing and analyzing alterations in the proteome promises to shed light on cancer development and may improve the development of both biomarkers and therapeutics. Measuring the proteome is very challenging, but recent rapid advances in mass spectrometry are enabling deep proteomic analysis [3]. Multiple initiatives, such as the Clinical Proteomic Tumor Analysis Consortium (CPTAC), have been launched to take advantage of these advances to characterize the proteome of tumors. These efforts hold the promise to revolutionize cancer research, but this will only be possible if the community develops computational tools powerful enough to extract the most information from the proteome and to understand the associations between genome, transcriptome and proteome in tumors.
To address this need, we have created a community-based collaborative competition: the NCI-CPTAC DREAM Proteogenomics Challenge. The challenge will use public and novel proteogenomic data generated by CPTAC to answer fundamental questions about how different levels of biological signal relate to one another. In particular, we focus on the following questions:
- Can one impute missing values in proteomics data given observed proteins?
- Can one predict abundance of any given protein from mRNA and genetic data?
- Can one predict phosphoproteomic data from proteomic, mRNA and genetic data?
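
To make the first question concrete, the following is a minimal baseline sketch in Python, not the challenge's own method or scoring. It assumes a proteins-by-samples abundance matrix with missing entries encoded as NaN, fills each missing entry with that protein's mean over the observed samples, and scores the result on deliberately hidden entries with a normalized root-mean-square error. The function names `mean_impute` and `nrmse`, the choice of metric, and the simulated data are all illustrative assumptions.

```python
import numpy as np


def mean_impute(matrix):
    """Fill NaNs in each row (protein) with that row's mean over observed samples.

    Rows with no observed values fall back to the global mean of all observed
    entries. Assumes at least one entry in the matrix is observed.
    """
    observed = ~np.isnan(matrix)
    counts = observed.sum(axis=1)
    sums = np.where(observed, matrix, 0.0).sum(axis=1)
    global_mean = sums.sum() / counts.sum()
    # np.maximum avoids dividing by zero for fully missing rows.
    row_means = np.where(counts > 0, sums / np.maximum(counts, 1), global_mean)
    return np.where(observed, matrix, row_means[:, None])


def nrmse(truth, estimate, mask):
    """RMSE over the masked entries, divided by the std of the true masked values."""
    err = truth[mask] - estimate[mask]
    return np.sqrt(np.mean(err ** 2)) / np.std(truth[mask])


# Simulated stand-in for a proteomics matrix: 50 proteins x 20 samples,
# with a per-protein offset so that row means carry some signal.
rng = np.random.default_rng(0)
truth = rng.normal(size=(50, 20)) + 2.0 * rng.normal(size=(50, 1))

# Hide roughly 20% of the entries to mimic missing measurements.
mask = rng.random(truth.shape) < 0.2
with_missing = truth.copy()
with_missing[mask] = np.nan

imputed = mean_impute(with_missing)
score = nrmse(truth, imputed, mask)
print(f"NRMSE on held-out entries: {score:.3f}")
```

A baseline like this ignores correlations between proteins; methods that exploit them (for example, nearest-neighbor or low-rank approaches) would be natural candidates to compare against it using the same hold-out scheme.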