Simulation results

ElastiSim writes detailed results for each simulated scenario to comma-separated values (CSV) files. Enabling the monitoring module and setting a sensing interval (see Configuration) gives deeper insight into the utilization of the platform. This page describes the columns available in each CSV file.

Job statistics

The job statistics file provides the results of each job in the simulated scenario. Each row corresponds to one job.

Column | Value type | Description
ID | integer | see Scheduling interface
Type | string | see Job
Submit Time | float (seconds) | see Job
Start Time | float (seconds) | see Scheduling interface
End Time | float (seconds) | see Scheduling interface
Wait Time | float (seconds) | see Scheduling interface
Makespan | float (seconds) | see Scheduling interface
Turnaround Time | float (seconds) | see Scheduling interface
Status | string | Whether the job completed gracefully or was killed ("completed" or "killed")
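
As an illustration of how the job statistics file might be consumed, the following minimal sketch counts jobs per status and averages the wait time. It assumes that the CSV header uses exactly the column names listed above; the function name `summarize_jobs` and the sample rows are made up for this example and are not ElastiSim output.

```python
import csv
import io
from collections import Counter

# Made-up sample rows for illustration only; not real ElastiSim output.
# Assumption: the header matches the column names documented above.
SAMPLE = """ID,Type,Submit Time,Start Time,End Time,Wait Time,Makespan,Turnaround Time,Status
0,rigid,0.0,0.0,100.0,0.0,100.0,100.0,completed
1,rigid,10.0,100.0,150.0,90.0,50.0,140.0,completed
2,moldable,20.0,150.0,160.0,130.0,10.0,140.0,killed
"""


def summarize_jobs(csv_file):
    """Return (jobs per status, mean wait time in seconds) for a job statistics file."""
    rows = list(csv.DictReader(csv_file))
    status_counts = Counter(row["Status"] for row in rows)
    # Guard against an empty file to avoid a division by zero.
    mean_wait = sum(float(row["Wait Time"]) for row in rows) / len(rows) if rows else 0.0
    return status_counts, mean_wait


status_counts, mean_wait = summarize_jobs(io.StringIO(SAMPLE))
print(dict(status_counts), round(mean_wait, 2))
```

To read a real result file, an opened file object (for example `open(path, newline="")`, with `path` pointing to the job statistics file) can be passed instead of the `io.StringIO` wrapper.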

Node utilization

The node utilization file provides insights into node state changes during the simulation. Each row describes a change of state and its corresponding node.

Column | Value type | Description
Time | float (seconds) | Time (absolute) of node state change
Node | string | Node name
State | string | Node state
Running jobs | string | IDs of running jobs separated by a semicolon
Expected jobs | string | IDs of expected jobs separated by a semicolon
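
Because the job ID columns hold semicolon-separated lists inside a single CSV field, they need a second parsing step. The sketch below shows one way to do this. The helper names, the node names, and the state names in the sample are hypothetical, and the assumption that an empty list appears as an empty field has not been verified against real output.

```python
import csv
import io

# Made-up sample rows for illustration only; node and state names are placeholders.
# Assumption: the header matches the column names documented above.
SAMPLE = """Time,Node,State,Running jobs,Expected jobs
0.0,node0,allocated,0,
5.0,node1,allocated,0;1,2
9.5,node0,free,,
"""


def parse_job_ids(field):
    """Split a semicolon-separated ID field into a list of ints (empty field gives [])."""
    return [int(part) for part in field.split(";") if part.strip()]


def read_node_events(csv_file):
    """Return one dict per node state change with the job ID lists already parsed."""
    events = []
    for row in csv.DictReader(csv_file):
        events.append({
            "time": float(row["Time"]),
            "node": row["Node"],
            "state": row["State"],
            "running": parse_job_ids(row["Running jobs"]),
            "expected": parse_job_ids(row["Expected jobs"]),
        })
    return events


events = read_node_events(io.StringIO(SAMPLE))
for event in events:
    print(event)
```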

CPU utilization

CPU utilization requires active monitoring.

The monitoring module measures CPU utilization per node. The column count in the resulting CSV file depends on the number of compute nodes.

Column | Value type | Description
Time | float (seconds) | Time (absolute)
<Node_0> | float (percentage, [0,1]) | CPU utilization
<Node_n> | float (percentage, [0,1]) | CPU utilization

Values represent the utilization of the compute node’s total computational power. Because CPU workloads fully utilize the compute node, the utilization of a single node is normally either 0 (0 %) or 1 (100 %). The only exception is coupled CPU tasks (compute and communication) in which the network is the bottleneck, leaving CPU resources underutilized.
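
Since the file contains one column per compute node, a reader has to discover the node names from the header rather than hard-code them. The following sketch averages the sampled utilization per node. It assumes that the first column is named `Time`, that every other column is a node, and that samples are equally spaced (one per sensing interval), so an unweighted mean is meaningful. The function name and the sample values are made up for illustration.

```python
import csv
import io

# Made-up sample for illustration only; node names are placeholders.
SAMPLE = """Time,node0,node1
0.0,1.0,0.0
1.0,1.0,0.5
2.0,0.0,0.5
"""


def mean_utilization_per_node(csv_file):
    """Return {node name: mean utilization in [0, 1]} for a per-node utilization file."""
    reader = csv.DictReader(csv_file)
    # Every column except the timestamp is treated as a compute node.
    nodes = [name for name in (reader.fieldnames or []) if name != "Time"]
    totals = dict.fromkeys(nodes, 0.0)
    samples = 0
    for row in reader:
        samples += 1
        for node in nodes:
            totals[node] += float(row[node])
    if samples == 0:
        return {}
    return {node: totals[node] / samples for node in nodes}


means = mean_utilization_per_node(io.StringIO(SAMPLE))
print(means)
```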

Network activity

Network activity requires active monitoring.

ElastiSim defines network activity relative to the sum of the maximum bandwidths of all links in the simulated platform. Realistically, the network activity will never reach 100 %, as it is unlikely that all links in the network topology will be fully utilized simultaneously.

Column | Value type | Description
Time | float (seconds) | Time (absolute)
Utilization | float (percentage, [0,1]) | Network activity

Network activity does not take parallel file system (PFS) links or loopback links into account.
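
One way to read the definition above is as the ratio of the aggregate link load to the aggregate link capacity. The sketch below illustrates that interpretation only; it is not taken from the ElastiSim source code, and the function name and the link values are hypothetical.

```python
def network_activity(links):
    """Return aggregate load divided by aggregate capacity.

    links: iterable of (current_load, max_bandwidth) pairs in bytes/s.
    This is an interpretation of the documented definition, not ElastiSim's code.
    """
    links = list(links)
    capacity = sum(max_bandwidth for _, max_bandwidth in links)
    if capacity == 0:
        return 0.0
    return sum(load for load, _ in links) / capacity


# Hypothetical platform with three links; PFS and loopback links are left out.
example_links = [(1e9, 1e9), (0.0, 1e9), (5e8, 2e9)]
activity = network_activity(example_links)
print(activity)
```

In this made-up example one link is saturated, yet the overall value stays well below 1 because the other links are idle or only partially loaded, which is why a value of 100 % is not expected in practice.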

PFS utilization

PFS utilization requires active monitoring.

The monitoring module senses the utilization of the PFS links specified in the configuration file. The results include absolute and relative utilization (relative to maximum bandwidth).

Column | Value type | Description
Time | float (seconds) | Time (absolute)
Read | float (bytes/s) | PFS read utilization (absolute)
Write | float (bytes/s) | PFS write utilization (absolute)
Read (rel.) | float (percentage, [0,1]) | PFS read utilization (relative to maximum bandwidth)
Write (rel.) | float (percentage, [0,1]) | PFS write utilization (relative to maximum bandwidth)
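
The absolute and relative columns are linked through the maximum bandwidth of the PFS link. As a minimal sketch of that relationship, the helpers below convert between the two and recover the implied maximum bandwidth from one sample. Both function names and the numbers are hypothetical and only illustrate the arithmetic.

```python
def relative_utilization(absolute, max_bandwidth):
    """Return absolute utilization (bytes/s) as a fraction of max_bandwidth (bytes/s)."""
    if max_bandwidth <= 0:
        raise ValueError("max_bandwidth must be positive")
    return absolute / max_bandwidth


def implied_max_bandwidth(absolute, relative):
    """Recover the maximum bandwidth (bytes/s) from one sample, or None if the link is idle."""
    if relative <= 0:
        return None
    return absolute / relative


# Made-up sample: 2.5 GB/s read traffic reported as 25 % utilization.
read_abs, read_rel = 2.5e9, 0.25
max_bw = implied_max_bandwidth(read_abs, read_rel)
print(max_bw, relative_utilization(read_abs, max_bw))
```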

GPU utilization

GPU utilization requires active monitoring.

Analogous to CPU utilization, the monitoring module measures GPU utilization per node. The utilization is a relative value and reflects partial usage in the case of multiple GPUs per node.

Column | Value type | Description
Time | float (seconds) | Time (absolute)
<Node_0> | float (percentage, [0,1]) | GPU utilization
<Node_n> | float (percentage, [0,1]) | GPU utilization

Task times

Task time logging requires log_task_times to be set to true.

If activated, ElastiSim measures task times per job and node. The duration of a task sequence includes all subtask times.

Column | Value type | Description
Time | float (seconds) | Time (absolute)
Job | integer | Job ID
Node | string | Node name
Task | string | Task name (empty if not set)
Duration | float (seconds) | Measured task duration
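
Because task times are logged per job and node, the same task usually appears once for every node it ran on. The sketch below groups the rows by job and task name and keeps the longest duration across nodes, which can help to spot stragglers. It assumes that the CSV header matches the column names above; the function name, the task names, and the sample rows are made up.

```python
import csv
import io
from collections import defaultdict

# Made-up sample rows for illustration only; the third row has an empty task name.
SAMPLE = """Time,Job,Node,Task,Duration
10.0,0,node0,compute,10.0
10.0,0,node1,compute,9.5
12.0,0,node0,,2.0
"""


def longest_duration_per_task(csv_file):
    """Return {(job ID, task name): longest measured duration in seconds across nodes}."""
    longest = defaultdict(float)
    for row in csv.DictReader(csv_file):
        key = (int(row["Job"]), row["Task"])
        longest[key] = max(longest[key], float(row["Duration"]))
    return dict(longest)


longest = longest_duration_per_task(io.StringIO(SAMPLE))
print(longest)
```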

Copyright © 2023, Technical University of Darmstadt.