Application model

Application models have a strict hierarchical structure and follow a building-block approach comprising phases and tasks. While phases represent stages within an application, tasks represent the load executed on the simulated platform.

To allow dynamic reconfigurations during runtime, ElastiSim introduces scheduling points. Malleable applications reaching a scheduling point can adapt to new resources if the scheduler has decided to reconfigure the job. Thus, dividing applications into phases and tasks allows for modeling malleable (and evolving) workloads. ElastiSim includes scheduling points by default at the beginning of each phase (unless specified otherwise).

Figure: ElastiSim's application execution model of a malleable job.

Scheduling decisions taken while a job is not at a scheduling point (e.g., during a periodic scheduler invocation) are applied when the job reaches its next scheduling point. If the job has no further scheduling point, the reconfiguration is never applied.
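
The deferral rule can be illustrated with a small Python sketch. It is not ElastiSim code: the `Job` class, its `pending_nodes` field, and both method names are hypothetical and only mirror the behavior described above.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Job:
    """Hypothetical stand-in for a malleable job (not an ElastiSim class)."""
    num_nodes: int
    pending_nodes: Optional[int] = None  # decision waiting for a scheduling point

    def schedule(self, new_num_nodes: int) -> None:
        # The scheduler may decide at any time (e.g., periodic invocation);
        # the decision is only recorded here, not applied.
        self.pending_nodes = new_num_nodes

    def reach_scheduling_point(self) -> bool:
        # Apply a pending decision, if any, and report whether one was applied.
        if self.pending_nodes is None:
            return False
        self.num_nodes = self.pending_nodes
        self.pending_nodes = None
        return True


job = Job(num_nodes=4)
job.schedule(8)                      # decided between scheduling points
nodes_before_point = job.num_nodes   # nothing has been applied yet
reconfigured = job.reach_scheduling_point()
```

A job that never calls `reach_scheduling_point()` again keeps its old node count, which corresponds to the reconfiguration remaining unapplied.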

Phases

Users define phases by specifying the following properties using the JSON format:

| Property | Description | Value type | Default value | Mandatory |
|---|---|---|---|---|
| phases | Array containing all phases (top-level structure) | array | - | Yes |
| iterations | Number of iterations (i.e., repetitions) of the phase (has to be inferable at start time) | integer or string (see Performance models) | 0 | No |
| scheduling_point | Whether a scheduling point is included before (each iteration of) the phase | bool | true | No |
| evolving_request | Evolving request specifying the number of nodes to issue before (each iteration of) the phase (unspecified implies no evolving request) | integer or string (see Performance models) | unspecified | No |
| final_scheduling_point (deprecated) | Whether the final scheduling point is included | bool | true | No |
| barrier | Whether there is a barrier at the beginning of the phase (only considered when there is no corresponding scheduling point or evolving request) | bool | true | No |
| tasks | Array holding the tasks of the phase | array | - | Yes |

By definition, applications do not hold scheduling points or evolving requests before the first phase (for phases specified in the phases array).
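
As a minimal sketch of the phase properties, the following Python snippet assembles a model with a single phase and serializes it to JSON. The property names come from the table above and the task fields from the example application model; the concrete values are illustrative assumptions, not recommended settings.

```python
import json

# One phase, repeated ten times, with a scheduling point before each iteration.
application_model = {
    "phases": [
        {
            "iterations": 10,
            "scheduling_point": True,
            "tasks": [
                {
                    "type": "cpu",
                    "name": "Compute",
                    "flops": 1000000000,          # illustrative value
                    "computation_pattern": "uniform",
                }
            ],
        }
    ]
}

model_json = json.dumps(application_model, indent=4)
print(model_json)
```

Python's `True` is written as JSON `true`, which matches the `bool` value type listed for `scheduling_point`.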

ElastiSim clips evolving requests by default to stay in the possible range of configurations (i.e., [num_nodes_min, num_nodes_max], see Configuration). Applications consider evolving requests only if the number of requested nodes differs from the number of assigned nodes. In those cases, invoking the scheduler with an evolving request is mandatory. For adaptive jobs, if a phase specifies an evolving request and a scheduling point, the evolving request has a higher priority and will take precedence over the scheduling point.
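
The clipping and the "differs from assigned" check can be sketched in a few lines of Python. Both function names are hypothetical, and comparing the request with the assigned node count after clipping (rather than before) is an assumption; the text above does not state the order.

```python
def clip_evolving_request(requested: int, num_nodes_min: int, num_nodes_max: int) -> int:
    """Clamp a requested node count into [num_nodes_min, num_nodes_max]."""
    return max(num_nodes_min, min(requested, num_nodes_max))


def is_considered(requested: int, assigned: int, num_nodes_min: int, num_nodes_max: int) -> bool:
    """A request is considered only if it would change the node count.

    Assumption: the comparison uses the clipped value.
    """
    return clip_evolving_request(requested, num_nodes_min, num_nodes_max) != assigned


clipped = clip_evolving_request(12, num_nodes_min=2, num_nodes_max=8)
ignored = is_considered(12, assigned=8, num_nodes_min=2, num_nodes_max=8)
shrink = is_considered(6, assigned=8, num_nodes_min=2, num_nodes_max=8)
```

Under these assumptions, a job already running on 8 nodes that requests 12 with `num_nodes_max` set to 8 issues no request, while a request for 6 nodes does invoke the scheduler.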

Reconfiguration penalty

Reconfiguring applications during runtime can introduce additional overhead while the application adapts to its new configuration (e.g., data redistribution). ElastiSim calls this overhead the reconfiguration penalty and provides two special phases to model it. The on reconfiguration phase runs on all assigned resources on each reconfiguration. In contrast, the on expansion phase runs only on newly assigned resources. Because applications may also have an initialization stage, ElastiSim provides a third special phase, on initialization, which is executed only on the first configuration of a job. Together, these phases allow modeling all stages of application execution.

| Phase | Description |
|---|---|
| on_init | Executed only on the first job configuration (i.e., once in total) |
| on_reconfiguration | Executed on each reconfiguration on all resources |
| on_expansion | Executed on each reconfiguration only on newly assigned resources |
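
To make the distinction concrete, the hypothetical Python helper below maps each special phase to the nodes it would run on. It rests on assumptions not confirmed by the text: that `on_init` runs on all initially assigned nodes, and that `on_reconfiguration` does not run on the first configuration. The dictionary order carries no meaning about execution order.

```python
from typing import Dict, List, Optional


def special_phases(previous: Optional[List[str]], current: List[str]) -> Dict[str, List[str]]:
    """Map each special phase that runs to the nodes it runs on.

    `previous` is None for the first configuration of a job.
    """
    if previous is None:
        # First configuration: only the initialization phase applies.
        return {"on_init": list(current)}
    # Every reconfiguration runs on_reconfiguration on all assigned nodes.
    plan = {"on_reconfiguration": list(current)}
    # on_expansion is limited to nodes the job did not hold before.
    new_nodes = [node for node in current if node not in previous]
    if new_nodes:
        plan["on_expansion"] = new_nodes
    return plan


first = special_phases(None, ["n0", "n1"])
expand = special_phases(["n0", "n1"], ["n0", "n1", "n2"])
shrink = special_phases(["n0", "n1"], ["n0"])
```

When a job shrinks, there are no newly assigned nodes, so the sketch schedules only `on_reconfiguration`.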

Tasks

Analogously, tasks are defined with the following properties:

| Property | Description | Value type | Default value | Mandatory |
|---|---|---|---|---|
| type | Task type (see Task types) | string | - | Yes |
| name | Name of the task (only relevant in log messages) | string | None | No |
| iterations | Number of iterations (i.e., repetitions) of the task | integer or string (see Performance models) | 0 | No |
| synchronized | Whether all resources (i.e., compute nodes) synchronize before executing the task (similar to an MPI_Barrier() before execution) | bool | false | No |
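
Before handing a model to the simulator, it can help to check that the mandatory properties are present. The following Python sketch does this for the `phases` array only; `missing_properties` is a hypothetical helper, not part of ElastiSim, and it checks nothing beyond the properties marked as mandatory in the phase and task tables.

```python
from typing import Any, Dict, List

MANDATORY_PHASE_PROPERTIES = ("tasks",)  # per the phase table
MANDATORY_TASK_PROPERTIES = ("type",)    # per the task table


def missing_properties(model: Dict[str, Any]) -> List[str]:
    """Return the paths of mandatory properties that are absent."""
    if "phases" not in model:
        return ["phases"]
    missing = []
    for p, phase in enumerate(model["phases"]):
        for prop in MANDATORY_PHASE_PROPERTIES:
            if prop not in phase:
                missing.append(f"phases[{p}].{prop}")
        for t, task in enumerate(phase.get("tasks", [])):
            for prop in MANDATORY_TASK_PROPERTIES:
                if prop not in task:
                    missing.append(f"phases[{p}].tasks[{t}].{prop}")
    return missing


incomplete = {"phases": [{"tasks": [{"name": "Checkpoint", "synchronized": True}]}]}
problems = missing_properties(incomplete)
```

Here the task sets the optional `name` and `synchronized` properties but omits `type`, so the helper reports that path.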

Example application model

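An example of an application model that repeats a compute (CPU) task and a PFS write task; the number of repetitions is given by the iterations variable (e.g., ten). The on_init and on_reconfiguration phases read the model from the PFS and scatter it, and the phase issues an evolving request before each iteration.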

See Task types for detailed task descriptions.

{
    "on_init": {
        "tasks": [
            {
                "type": "pfs_read",
                "name": "Read model",
                "bytes": "model_size",
                "pattern": "root_only"
            },
            {
                "type": "cpu",
                "name": "Scatter model",
                "bytes": "model_size",
                "communication_pattern": "scatter"
            }
        ]
    },
    "on_reconfiguration": {
        "tasks": [
            {
                "type": "pfs_read",
                "name": "Read model",
                "bytes": "model_size",
                "pattern": "root_only"
            },
            {
                "type": "cpu",
                "name": "Scatter model",
                "bytes": "model_size",
                "communication_pattern": "scatter"
            }
        ]
    },
    "phases": [
        {
            "iterations": "iterations",
            "evolving_request": "num_nodes % 2 == 0 ? num_nodes + 1 : num_nodes - 1",
            "tasks": [
                {
                    "type": "cpu",
                    "name": "Compute & communicate",
                    "flops": "flops/num_nodes^alpha",
                    "computation_pattern": "uniform",
                    "bytes": "communication_size",
                    "communication_pattern": "all_to_all",
                    "coupled": true
                },
                {
                    "type": "pfs_write",
                    "name": "Checkpoint",
                    "bytes": "checkpoint_size",
                    "pattern": "all_ranks"
                }
            ]
        }
    ]
}
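
The `evolving_request` expression in this example uses a C-style conditional. As a rough illustration, the Python function below is an equivalent of that expression, assuming `num_nodes` refers to the number of nodes currently assigned to the job; the function name is chosen here for illustration only.

```python
def evolving_request(num_nodes: int) -> int:
    """Python equivalent of: num_nodes % 2 == 0 ? num_nodes + 1 : num_nodes - 1"""
    return num_nodes + 1 if num_nodes % 2 == 0 else num_nodes - 1


requests = {n: evolving_request(n) for n in (2, 3, 4, 5)}
print(requests)  # {2: 3, 3: 2, 4: 5, 5: 4}
```

With an even node count the job asks for one more node, and with an odd count for one fewer, so a job whose requests are granted alternates between two sizes from one iteration to the next.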

Copyright © 2023, Technical University of Darmstadt.