Workflow Tools
Eric Eide copied this page into the report Subversion repository on Jun 15, 2010. The text below will soon be removed, so that it doesn't inadvertently diverge from the actual report text.
Workflows have emerged as a useful paradigm for describing, managing, and sharing complex scientific analyses [Gil et al 07]. A workflow declaratively represents the components or codes that must be executed in a complex application, as well as the data dependencies among those components. Workflow systems address reproducibility by automatically managing the execution of applications in distributed environments, and by helping scientists assemble workflows and customize them to their particular data. A workflow captures an end-to-end analysis composed of individual analytic steps as a dependency graph that indicates both dataflow links and control-flow links among steps. Workflows have been used to manage complex applications in biomedical imaging, genomics, astronomy, and geophysics, among other fields.
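To make the dependency-graph idea concrete, the following is a minimal sketch in Python, not the interface of any system discussed at the workshop. The function run_workflow, the step names, and the sample data are all hypothetical. It assumes each step is a plain function and that dataflow links are given as a mapping from a step to the steps it depends on; the standard-library graphlib module (Python 3.9 or later) supplies the execution order.

```python
from graphlib import TopologicalSorter


def run_workflow(steps, dependencies, inputs):
    """Run every step after the steps it depends on and return all results.

    steps:        dict mapping step name -> function(upstream_results, inputs)
    dependencies: dict mapping step name -> set of upstream step names
    inputs:       dict of initial data made available to every step
    """
    # Include every step in the graph, even those with no dependencies.
    graph = {name: dependencies.get(name, set()) for name in steps}
    results = {}
    # static_order() yields each step only after all of its predecessors.
    for name in TopologicalSorter(graph).static_order():
        upstream = {dep: results[dep] for dep in graph[name]}
        results[name] = steps[name](upstream, inputs)
    return results


# Hypothetical three-step analysis: load -> normalize -> summarize.
steps = {
    "load": lambda up, inp: inp["samples"],
    "normalize": lambda up, inp: [x / max(up["load"]) for x in up["load"]],
    "summarize": lambda up, inp: sum(up["normalize"]) / len(up["normalize"]),
}
dependencies = {"normalize": {"load"}, "summarize": {"normalize"}}

results = run_workflow(steps, dependencies, {"samples": [2.0, 4.0, 8.0]})
print(results["normalize"])
```

Because the steps and their links are declared as data rather than hard-coded as a script, the same description can be rerun on different inputs, which is the property that the reproducibility argument above relies on. A real workflow system would add scheduling on distributed resources, failure recovery, and provenance capture on top of this ordering.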
Two workflow systems were discussed at the workshop:
VisTrails (http://www.vistrails.org) is an open-source scientific workflow and provenance management system developed at the University of Utah that provides support for data exploration and visualization. A key distinguishing feature of VisTrails is a comprehensive provenance infrastructure that maintains detailed history information about the steps followed and the data derived in the course of an exploratory task: VisTrails maintains the provenance of data products, of the workflows that derive those products, and of the workflow executions. This information is persisted as XML files or in a relational database, and it allows users to navigate workflow versions in an intuitive way, to undo changes without losing any results, to visually compare different workflows and their results, and to examine the actions that led to a result. VisTrails can combine loosely coupled resources, specialized libraries, and grid and Web services.
Wings on Pegasus. The Pegasus/Condor workflow execution system (http://pegasus.isi.edu) succeeds at isolating users from execution concerns by automatically managing resource allocation (in grids, clusters, or clouds), failure recovery and resubmission, and large-scale data management. The Wings semantic workflow system (http://wings.isi.edu) investigates the use of semantic representations to ensure valid reuse of an experimental method rendered as a workflow. Semantic workflow representations support automatic constraint propagation and reasoning algorithms that manage constraints among the individual workflow steps [Gil et al 10]. Recently, results published in the literature could be reproduced by reusing workflows from a library that captures a wide range of methods common in population genomics.
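The VisTrails description above mentions navigating workflow versions and undoing changes without losing results. One way to picture that behavior is a version tree in which every change adds a node and nothing is ever deleted. The sketch below is an illustration under that assumption only; the VersionTree class, its methods, and the parameter names are hypothetical and are not the VisTrails API or its storage format.

```python
class VersionTree:
    """Hypothetical change-based history: each edit adds a node, none are removed."""

    def __init__(self):
        self.parent = {0: None}   # version id -> parent version id
        self.action = {0: None}   # version id -> (key, value) change that created it
        self.current = 0          # version the user is currently looking at
        self._next_id = 1

    def apply(self, key, value):
        """Record a parameter change as a new child of the current version."""
        vid = self._next_id
        self._next_id += 1
        self.parent[vid] = self.current
        self.action[vid] = (key, value)
        self.current = vid
        return vid

    def undo(self):
        """Step back to the parent version; the undone version stays in the tree."""
        if self.parent[self.current] is not None:
            self.current = self.parent[self.current]
        return self.current

    def materialize(self, vid):
        """Rebuild the parameters of a version by replaying changes from the root."""
        path = []
        while self.action[vid] is not None:
            path.append(self.action[vid])
            vid = self.parent[vid]
        params = {}
        for key, value in reversed(path):
            params[key] = value
        return params


tree = VersionTree()
v1 = tree.apply("isovalue", 0.3)
v2 = tree.apply("colormap", "hot")
tree.undo()                          # back to v1; v2 is still stored
v3 = tree.apply("colormap", "cool")  # a sibling branch of v2
print(tree.materialize(v2), tree.materialize(v3))
```

After the undo, both branches remain reachable, so the two variants can be rebuilt and compared side by side. Storing changes rather than whole workflow copies is a design choice made for this sketch; it keeps each node small at the cost of replaying the path from the root.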
