The highly distributed and disparate nature of genomic and clinical data generated around the world presents an enormous challenge for scientists who wish to integrate and analyze these data. The sheer volume of data often exceeds the storage capacity of any single site and makes efficient transfer between sites impractical. To address this challenge, researchers must bring their computation to the data. Numerous groups are now developing technologies and best-practice methodologies for running portable, reproducible genomic analysis pipelines, as well as tools and APIs for discovering genomic analysis resources. Software development, deployment, and sharing efforts in these groups commonly rely on modular workflow pipelines and on container-based virtualization using Docker and related tools.
There are three highly active and related Containerized Tool/Workflow groups: the GA4GH Containers and Workflows Task Team, the NCI Containers and Workflows Interest Group, and the NIH Commons Framework Working Group on Workflow Sharing and Docker Registry. Members of these groups have decided to work together to test and demonstrate tool portability while developing common standards. The resulting series of GA4GH/DREAM Infrastructure Challenges offers frameworks for testing and evaluating different systems and platforms for executing tools and workflows. In the previous GA4GH/DREAM Tool Execution Challenge, participants downloaded a Dockerized tool described in CWL/WDL and showed that it could be executed on a variety of platforms; over 30 groups participated successfully. In the current GA4GH/DREAM Workflow Execution Challenge, participants will download a Dockerized, CWL/WDL-described workflow, along with any required input, reference, or parameter files, from a central registry on Synapse. Participants will then run the workflow on their system and upload the results back to Synapse, together with a description of how they ran it on their system of choice. Like the earlier tool challenge, the workflow execution challenge will demonstrate workflow portability and reproducibility in a concrete way.