The convergence of traditional High Performance Computing (HPC) and new simulation, analysis, and data science approaches provides unprecedented opportunities for discovery but also creates new application and infrastructure challenges. Several Exascale Computing Project (ECP) workflows exemplify this new reality: a heterogeneous combination of applications, models, and "glue" code, running on heterogeneous compute nodes with machine learning in the middle, and a scalable workflow infrastructure orchestrating the entire process. At extreme scale, these workflows will almost certainly require specialized workflow management software, which is within the reach of only large, specialized interdisciplinary teams. Historically, such approaches have tended to result in complex, integrated, and stovepiped software systems. Further, there are now hundreds of moribund workflow systems, which indicates that many teams, small and large, elect to build their own custom workflow solution rather than adopt, or build upon, an integrated system.
ExaWorks represents a new approach: developing a multi-level SDK that will enable teams to produce scalable and portable workflows for a wide range of exascale applications. ExaWorks does not aim to replace the many workflow solutions already deployed and used by scientists, but rather to provide a robust SDK and to work with the community to identify well-defined and scalable component interfaces that can be leveraged by new and existing workflows. Most importantly, this SDK will enable a sustainable software infrastructure for workflows, so that the software artifacts produced by teams will be easier to port, modify, and use long after projects end. SDK components will be usable by many other workflow management systems (WMS), thus facilitating software convergence in the workflows community.
The major aims of the project are to:
- Create the ExaWorks SDK following a community-oriented process. We will first define a community policy for inclusion in the SDK. Based on application requirements, we will then assemble a set of vertical workflow technologies (level 0), integrate those technologies via existing native interfaces and shim layers (level 1), and collaborate to identify common component interfaces that can be used for interoperation between systems.
- Impact ECP applications. We will work closely with ECP applications to design and develop workflows, support adoption of the ExaWorks SDK, derive lessons and best practices, and share these approaches with the community via documentation, tutorials, and hackathons.
- Engage the community. We will work with the ECP applications, workflows, and facility communities to harmonize the disparate and stovepiped workflow landscape, ensure the ExaWorks SDK operates on exascale systems, and create resources to support ECP applications and workflow users.
ExaWorks is supported by the DOE Exascale Computing Project.