Task types

Each task has a specific type that defines the load injected into the simulated platform and, depending on the type, different properties that specify its execution behavior. This page summarizes all task types with their corresponding properties and introduces how ElastiSim distributes the simulated load across resources.

Task payloads & distribution patterns

All tasks carry a load simulated on the platform, which ElastiSim defines as the task’s payload. The unit of a payload depends on the task type (e.g., FLOPS for compute tasks or bytes for communication tasks). Payloads are always defined as a single number and distributed among the participating resources following a payload distribution pattern.

ElastiSim defines two types of distribution patterns: vector and matrix. While vector distribution patterns consider a one-dimensional distribution (e.g., FLOPS per compute node), matrix distribution patterns define communication matrices.

Vector distribution patterns

| Pattern | Description |
| --- | --- |
| all_ranks | The payload is evenly distributed among all resources |
| root_only | Only the first resource performs the specified payload |
| even_ranks | The payload is evenly distributed among all even-numbered resources |
| odd_ranks | The payload is evenly distributed among all odd-numbered resources |
| uniform | All assigned resources perform the specified payload without any distribution |
| vector | An explicit vector defining the payload for each participating resource (only applicable to rigid jobs) |

uniform is the only distribution pattern (vector or matrix) that does not distribute the workload; instead, it describes the payload each assigned resource performs. It is syntactic sugar for the all_ranks pattern with the performance model <payload size> * num_nodes or <payload size> * num_gpus.
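
As a brief illustration, on four compute nodes the following two compute tasks yield the same simulated load: the first assigns 2e11 FLOPS to every node (uniform), while the second distributes a total of 8e11 FLOPS evenly across the nodes (all_ranks). The task names are chosen for illustration only.

{
  "type": "cpu",
  "name": "Per-node payload",
  "flops": 2e11,
  "computation_pattern": "uniform"
}

{
  "type": "cpu",
  "name": "Equivalent total payload",
  "flops": 8e11,
  "computation_pattern": "all_ranks"
}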

Matrix distribution patterns

| Pattern | Description |
| --- | --- |
| all_to_all | Each resource communicates bi-directionally with every other resource |
| gather | The first resource receives uni-directionally from all remaining resources |
| scatter | The first resource sends uni-directionally to all remaining resources |
| master_worker | The first resource communicates bi-directionally with all remaining resources |
| ring | Each resource communicates bi-directionally with its direct neighbors |
| ring_clockwise | Each resource communicates uni-directionally with its right neighbor |
| ring_counter_clockwise | Each resource communicates uni-directionally with its left neighbor |
| matrix | An explicit matrix defining the payload for each possible pair of resources (defined as a vector with the dimension #resources × #resources; only applicable to rigid jobs; see the sketch below) |
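
As a sketch of the explicit matrix pattern, a scatter-like exchange among three resources can be described by a 3 × 3 matrix flattened into a vector of nine payload entries. The array representation of the payload and the row-major layout (rows as senders, columns as receivers) are assumptions made for illustration and may differ from the exact job description syntax.

{
  "type": "cpu",
  "name": "Explicit communication matrix",
  "bytes": [0, 2e10, 2e10,
            0, 0, 0,
            0, 0, 0],
  "communication_pattern": "matrix"
}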

CPU computation & communication task

ElastiSim divides computational tasks into two parts: computation and communication. The reasoning behind this twofold structure is to allow computation and communication to overlap or to be coupled. Each assigned node computes the load based on its computational capabilities and communicates over the links defined by the underlying topology. However, users are not required to specify both payloads: specifying only a computation or a communication payload is valid and simulates only the specified part (see the compute-only sketch after the example below). Setting the type property to cpu makes the following properties available:

| Property | Description | Value type | Default value | Mandatory |
| --- | --- | --- | --- | --- |
| flops | Computational load of the task | integer (FLOPS) | - | Yes, if bytes is not specified |
| computation_pattern | Payload distribution pattern of the computational load | vector distribution pattern | - | Yes, if flops is specified |
| bytes | Communication load of the task | integer (bytes) | - | Yes, if flops is not specified |
| communication_pattern | Payload distribution pattern of the communication load | matrix distribution pattern | - | Yes, if bytes is specified |
| coupled | Whether computation and communication are strictly coupled (i.e., bound by the slowest resource among all participating nodes) | bool | false | No |

Example

{
  "type": "cpu",
  "name": "CPU compute & communication",
  "flops": 8e11,
  "computation_pattern": "uniform",
  "bytes": 5e10,
  "communication_pattern": "all_to_all",
  "coupled": true
}
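
Since only one of the two payloads is required, a compute-only task simply omits the communication properties. A minimal sketch:

{
  "type": "cpu",
  "name": "CPU compute only",
  "flops": 8e11,
  "computation_pattern": "all_ranks"
}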

GPU computation & communication task

Analogous to CPU tasks, GPU tasks comprise computation and communication. However, as compute nodes can be equipped with multiple GPUs, communication among GPUs takes place via intra- or inter-node links. Depending on the platform topology, ElastiSim automatically utilizes the correct links. Setting the type property to gpu makes the following properties available:

| Property | Description | Value type | Default value | Mandatory |
| --- | --- | --- | --- | --- |
| flops | Computational load of the task | integer (FLOPS) | - | Yes, if bytes is not specified |
| computation_pattern | Payload distribution pattern of the computational load | vector distribution pattern | - | Yes, if flops is specified |
| bytes | Communication load of the task | integer (bytes) | - | Yes, if flops is not specified |
| communication_pattern | Payload distribution pattern of the communication load | matrix distribution pattern | - | Yes, if bytes is specified |

Example

{
  "type": "gpu",
  "name": "GPU compute & communication",
  "flops": 8e12,
  "computation_pattern": "all_ranks",
  "bytes": 7e10,
  "communication_pattern": "ring_clockwise"
}

I/O tasks

All I/O tasks follow the same structure and define the operation (read or write) and the target of the operation (PFS or node-local burst buffer).

| type | Description |
| --- | --- |
| pfs_read | Read operation targeting the PFS |
| pfs_write | Write operation targeting the PFS |
| bb_read | Read operation targeting burst buffers |
| bb_write | Write operation targeting burst buffers |

In contrast to compute tasks, I/O tasks support asynchronous execution and provide the following properties:

| Property | Description | Value type | Default value | Mandatory |
| --- | --- | --- | --- | --- |
| bytes | I/O size | integer (bytes) | - | Yes |
| pattern | Payload distribution pattern of the I/O size | vector distribution pattern | - | Yes |
| async | Whether the operation is executed asynchronously | bool | false | No |

Example

{
  "type": "pfs_write",
  "name": "PFS write",
  "bytes": 5e11,
  "pattern": "all_ranks"
}
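
An asynchronous read targeting node-local burst buffers could look as follows; setting async to true allows the I/O to overlap with subsequently executed tasks.

{
  "type": "bb_read",
  "name": "Asynchronous burst buffer read",
  "bytes": 3e10,
  "pattern": "all_ranks",
  "async": true
}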

Delay tasks

Delay tasks are generic tasks that occupy compute nodes for a given amount of time. They are useful for representing any activity that computation, communication, or I/O tasks cannot model appropriately. ElastiSim provides two flavors of delay tasks, representing either an idling or a busy-wait activity: while idle tasks occupy compute nodes without utilizing resources, busy wait tasks fully utilize the compute capabilities. Setting the type property to either idle or busy_wait introduces the following property:

| Property | Description | Value type | Default value | Mandatory |
| --- | --- | --- | --- | --- |
| delay | Period of time to occupy resources | integer (seconds) | - | Yes |

Example

{
  "type": "busy_wait",
  "name": "Busy wait",
  "delay": 720,
  "pattern": "uniform"
}
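
The idle flavor follows the same structure. A minimal sketch that occupies the assigned nodes for ten minutes without utilizing them:

{
  "type": "idle",
  "name": "Idle wait",
  "delay": 600
}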

Task sequences

Task sequences are simple containers that are especially useful for the repeated execution of a specific sequence of tasks. Setting the type property to sequence defines a task sequence and makes the following property available:

| Property | Description | Value type | Default value | Mandatory |
| --- | --- | --- | --- | --- |
| tasks | Array of tasks | array | - | Yes |

ElastiSim defines sequences recursively, allowing them to be nested.

Example

{
  "type": "sequence",
  "iterations": 12,
  "tasks": [
    {
      "type": "cpu",
      "flops": 8e10,
      "computation_pattern": "uniform"
    },
    {
      "type": "pfs_write",
      "name": "PFS write",
      "bytes": 6e10,
      "pattern": "all_ranks"
    }
  ]
}
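
Since sequences are defined recursively, they can contain other sequences. The following sketch nests an inner compute sequence inside an outer sequence; it assumes the iterations property from the example above applies at any nesting level.

{
  "type": "sequence",
  "iterations": 3,
  "tasks": [
    {
      "type": "sequence",
      "iterations": 4,
      "tasks": [
        {
          "type": "cpu",
          "flops": 5e10,
          "computation_pattern": "all_ranks"
        }
      ]
    },
    {
      "type": "pfs_read",
      "name": "PFS read",
      "bytes": 2e10,
      "pattern": "all_ranks"
    }
  ]
}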

Resource contention

All tasks in ElastiSim (except idle) utilize resources. While compute capabilities can be exclusively available to jobs if oversubscription is disabled (see Configuration), network communication depends on the underlying platform topology. The simulation engine evenly distributes the bandwidth of shared links when they are utilized by multiple jobs or by overlapping asynchronous I/O tasks. However, if oversubscription is enabled, jobs can share computational resources: while CPUs are shared evenly and immediately upon the execution of a new task, GPUs (and intra-node links) are utilized exclusively following a first-come, first-served policy.

As busy_wait tasks utilize the compute capabilities of a node, multiple jobs oversubscribing the same node with busy_wait (or even cpu) tasks will compete for resources (e.g., two 15-minute busy_wait tasks sharing a node take 30 minutes to finish).

