A framework for the visual exploration of workflow schedules

Abstract

In a distributed processing environment, a scheduling mechanism is required to assign work to the compute nodes. Many different aspects play a role in creating the schedule. Besides the tasks themselves and the times at which they are queued, tasks can depend on other tasks that have to finish first. Tasks can have special resource requirements (e.g., a minimum amount of memory) that have to be respected by the scheduler. With the increasing size of a compute cluster, the failure of nodes becomes more likely. If a node processing a task fails, the task might need to be rescheduled on a different node. Further, some tasks might have priority over other tasks. This variety of aspects, together with a high number of tasks and compute nodes, can make it difficult for a human to understand what is happening in a cluster. A number of activities, such as tracing errors or optimizing schedulers, benefit from an understanding of the schedule. However, there is currently little tool support for exploring schedules and investigating issues in them.

The goal of this bachelor thesis is to improve tool support for exploring schedules. In the first step, current approaches and tools for visualizing and exploring schedules will be evaluated. Then, a conceptual software architecture will be developed that allows interactive exploration. This architecture should be flexible with respect to integrating different information sources and should allow for flexible visual exploration, e.g., by highlighting or showing only specific parts of the schedule based on what the user is interested in. This concept will be implemented in the form of a prototype that will be evaluated in an industry context by surveying developers and experts.
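
To illustrate what "flexible with respect to integrating different information sources" could mean in practice, the following is a minimal sketch and not the architecture of the thesis. It assumes that every source can deliver its information as records keyed by a task identifier. The names `DataSource`, `InMemorySource`, and `merge_sources` as well as the record fields are hypothetical.

```python
from typing import Dict, Iterable, List, Protocol


class DataSource(Protocol):
    """Hypothetical interface: anything that can yield per-task records."""

    def records(self) -> Iterable[Dict[str, object]]:
        ...


class InMemorySource:
    """Stand-in for a real source such as a scheduler log or a metrics store."""

    def __init__(self, rows: List[Dict[str, object]]) -> None:
        self._rows = list(rows)

    def records(self) -> Iterable[Dict[str, object]]:
        return iter(self._rows)


def merge_sources(sources: Iterable[DataSource], key: str = "task") -> Dict[object, Dict[str, object]]:
    """Combine the records of all sources into one entry per task."""
    merged: Dict[object, Dict[str, object]] = {}
    for source in sources:
        for row in source.records():
            # Later sources add fields to (or overwrite fields of) earlier ones.
            merged.setdefault(row[key], {}).update(row)
    return merged


# Assumed example data: one source knows the placement, another the CPU usage.
schedule = InMemorySource([{"task": "a", "node": "n1"}, {"task": "b", "node": "n2"}])
usage = InMemorySource([{"task": "a", "cpu": 0.5}])
merged = merge_sources([schedule, usage])
```

With this shape, adding a new information source only requires a class that provides `records()`; the visualization would work on the merged view.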

Feature ideas

A number of ideas for supporting the exploration and investigation of schedules already exist. Some of them are:

  • Possibility to integrate different data sources providing the required information (e.g. schedule, queue-time, resource usage)
  • Exploration of the scheduled tasks by filtering and highlighting parts of the schedule with user-given criteria (e.g. via a GUI and/or query language)
  • Display/visualization of relevant metrics such as
    • queue-time (distinguished by reason: no free resources, dependency on unfinished tasks),
    • resource usage (CPU, memory, disk space, IO – individually for each compute node)
    • unutilized compute nodes
  • Consideration of compute cluster changes such as the addition of new compute nodes or the failure thereof (relevant for correct calculation of some metrics as well as for the visualization)
  • Graphical report generation of schedule views (e.g. current view after user applied some filtering & highlighting; option to make annotations/add markers)
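
As a rough illustration of the filtering and metric ideas above, the following sketch shows how user-given criteria and a queue-time breakdown could be expressed. It is a sketch under stated assumptions, not the prototype's implementation: the `Task` fields and the functions `filter_tasks`, `queue_time_by_reason`, and `idle_nodes` are hypothetical, and it assumes that a task waits for its dependencies first and for free resources afterwards.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Task:
    # All field names are hypothetical; real schedulers expose different data.
    name: str
    queued_at: float    # time the task entered the queue
    started_at: float   # time the task began executing
    finished_at: float  # time the task completed
    node: str           # compute node that ran the task
    depends_on: List[str] = field(default_factory=list)


def filter_tasks(tasks: List[Task], predicate: Callable[[Task], bool]) -> List[Task]:
    """Keep only the tasks matching a user-given criterion."""
    return [t for t in tasks if predicate(t)]


def queue_time_by_reason(task: Task, by_name: Dict[str, Task]) -> Dict[str, float]:
    """Split the queue time into waiting for dependencies and for resources."""
    # The task becomes ready once its last dependency has finished.
    ready_at = max([by_name[d].finished_at for d in task.depends_on] + [task.queued_at])
    ready_at = min(ready_at, task.started_at)
    return {
        "dependency": ready_at - task.queued_at,
        "resources": task.started_at - ready_at,
    }


def idle_nodes(nodes: List[str], tasks: List[Task], at: float) -> List[str]:
    """Return the nodes that run no task at time `at`."""
    busy = {t.node for t in tasks if t.started_at <= at < t.finished_at}
    return sorted(set(nodes) - busy)


# Assumed example schedule: task b depends on task a.
tasks = [
    Task("a", queued_at=0.0, started_at=0.0, finished_at=5.0, node="n1"),
    Task("b", queued_at=1.0, started_at=8.0, finished_at=10.0, node="n2", depends_on=["a"]),
]
by_name = {t.name: t for t in tasks}
on_n2 = filter_tasks(tasks, lambda t: t.node == "n2")
waiting = queue_time_by_reason(by_name["b"], by_name)
```

In this example, task b waits 4 time units for its dependency and a further 3 for free resources. A query language or GUI filter would ultimately produce a predicate like the lambda passed to `filter_tasks`.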

Related work

  • Type: Bachelor Thesis
  • Status: Current
  • ID: 2020-016
  • Student: Anton Stanchev