Quickstart
The easiest way to get started with ElastiSim is to clone the example project available on GitHub. The scenario simulates an FCFS (first-come, first-served) scheduling algorithm applied to 500 jobs of all job types, with alternating compute and I/O phases (see Application model), running on a crossbar topology with 128 compute nodes. Evolving and adaptive jobs request a new configuration at the beginning of each iteration of the specified phase, whereas malleable jobs accept any reconfiguration. The scheduler accepts all evolving requests that shrink a job, but accepts evolving requests to expand a job only up to the number of available nodes. Malleable jobs are expanded to the highest number of nodes that the available nodes allow. The following steps build a Docker image that includes all libraries required by ElastiSim and start the simulation in a container.
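The scheduling policy described above can be sketched in a few lines of Python. This is a minimal illustration, not the code shipped in the example project: the `Job` class, its fields, and `decide_node_count` are hypothetical names. The sketch also assumes that an expand request is granted partially when fewer nodes are free than requested, and that every job has an upper node limit (`max_nodes`).

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Job:
    """Hypothetical job record for illustration; not the ElastiSim job type."""
    kind: str                        # e.g. "malleable", "evolving", "adaptive"
    assigned: int                    # nodes currently assigned to the job
    max_nodes: int                   # assumed upper bound on usable nodes
    requested: Optional[int] = None  # node count asked for by an evolving/adaptive job


def decide_node_count(job: Job, free_nodes: int) -> int:
    """Return the node count the job runs on after this scheduling point."""
    if job.requested is not None:
        if job.requested <= job.assigned:
            # Requests that shrink the job are always accepted.
            return job.requested
        # Requests to expand are capped by the nodes that are currently free
        # (assumption: the request is granted partially rather than rejected).
        return min(job.requested, job.assigned + free_nodes)
    if job.kind == "malleable":
        # Malleable jobs grow as far as the free nodes and their own limit allow.
        return min(job.max_nodes, job.assigned + free_nodes)
    # All other jobs keep their current allocation.
    return job.assigned
```

For example, under these assumptions an evolving job on 8 nodes that requests 12 while only 2 nodes are free would continue on 10 nodes.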
Installation
To build the Docker image required to run ElastiSim, install Docker and execute the following command:
docker build -t elastisim .
Simulation
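To run the simulation, execute the following commands in two separate terminal sessions. The second command attaches to the container started by the first, so run them in the order shown: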
Linux:
docker run -v $PWD/data:/data -v $PWD/algorithm:/algorithm -u `id -u $USER` --name elastisim -it --rm elastisim /data/input/configuration.json --log=root.thresh:warning
docker exec -u `id -u $USER` -it elastisim python3 /algorithm/algorithm.py
macOS:
docker run -v $PWD/data:/data -v $PWD/algorithm:/algorithm --name elastisim -it --rm elastisim /data/input/configuration.json --log=root.thresh:warning
docker exec -it elastisim python3 /algorithm/algorithm.py
Windows (PowerShell):
docker run -v ${PWD}\data:/data -v ${PWD}\algorithm:/algorithm --name elastisim -it --rm elastisim /data/input/configuration.json --log=root.thresh:warning
docker exec -it elastisim python3 /algorithm/algorithm.py
The first command runs the ElastiSim simulator process and accepts two inputs:
- the configuration file (JSON)
- the logging level
For more detailed output, change --log=root.thresh:warning to --log=root.thresh:info (caution: verbose).
The second command runs the scheduling algorithm.
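When the simulation is started repeatedly with different log levels, it can be convenient to assemble both commands programmatically. The following sketch only rebuilds the argument lists shown above; the function names `simulator_command` and `scheduler_command` and their parameters are hypothetical, and the `uid` argument corresponds to the `-u` option of the Linux variant.

```python
from pathlib import Path
from typing import List, Optional


def simulator_command(project_dir: Path, log_level: str = "warning",
                      uid: Optional[int] = None) -> List[str]:
    """Assemble the `docker run` arguments that start the simulator process."""
    cmd = [
        "docker", "run",
        "-v", f"{project_dir / 'data'}:/data",
        "-v", f"{project_dir / 'algorithm'}:/algorithm",
    ]
    if uid is not None:
        # Linux variant of the command: pass the calling user's ID.
        cmd += ["-u", str(uid)]
    cmd += [
        "--name", "elastisim", "-it", "--rm", "elastisim",
        "/data/input/configuration.json",   # first input: configuration file
        f"--log=root.thresh:{log_level}",   # second input: logging level
    ]
    return cmd


def scheduler_command(uid: Optional[int] = None) -> List[str]:
    """Assemble the `docker exec` arguments that start the scheduling algorithm."""
    cmd = ["docker", "exec"]
    if uid is not None:
        cmd += ["-u", str(uid)]
    cmd += ["-it", "elastisim", "python3", "/algorithm/algorithm.py"]
    return cmd
```

Each list can be passed to `subprocess.run` from its own terminal session, for example `simulator_command(Path.cwd(), log_level="info")` for the verbose variant.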
Application model
The following flowchart visualizes the application model used in the example project.
flowchart TD
Start([Start])
Start --> Read["Read model from PFS<br>(root only)"]
Read --> Scatter[Scatter model]
Scatter --> Compute[Compute &<br>communicate]
Compute --> Write[Checkpoint to PFS]
Write --> WD{Workload<br>done?}
WD -->|yes| Stop([End])
WD -->|no| Evol{Evolving or<br>adaptive job?}
Evol -->|yes| Even{Number of phase<br>iteration even?}
Evol -->|no| Mall{Malleable<br>job?}
Even -->|yes| Req_fewer[Request four<br>fewer nodes]
Even -->|no| Req_more[Request four<br>more nodes]
Req_more -.-> Inv
Req_fewer -.-> Inv
Mall -.->|yes| Inv
Inv[[invoke scheduler]]
Inv -.-> NC{New<br>configuration?}
Mall -->|no| Compute
NC -->|no| Compute
NC -->|yes| Reconf[[Reconfigure]]
Reconf --> Read
Communication with the scheduler (the dotted links) is the runtime’s responsibility and is not controlled by the application model.
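The request pattern in the flowchart can be expressed as a small function. This is a sketch under stated assumptions rather than the example project's code: `requested_nodes` and its parameters are hypothetical names, how iterations are numbered is not specified here, and any lower or upper bounds on the node count are not modeled.

```python
from typing import Optional

STEP = 4  # node delta taken from the flowchart


def requested_nodes(job_type: str, iteration: int, current_nodes: int) -> Optional[int]:
    """Node count an evolving or adaptive job asks for, or None if it makes no request."""
    if job_type not in ("evolving", "adaptive"):
        # Malleable jobs only invoke the scheduler; other jobs continue computing.
        return None
    if iteration % 2 == 0:
        # Even phase iteration: request four fewer nodes.
        return current_nodes - STEP
    # Odd phase iteration: request four more nodes.
    return current_nodes + STEP
```

Whether the request is granted is decided by the scheduler, not by the application model.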