34/Flux Task Map
The Flux Task Map is a compact mapping between job task ranks and node IDs.
Name | github.com/flux-framework/rfc/spec_34.rst |
Editor | Jim Garlick <garlick@llnl.gov> |
State | raw |
Language
The key words “MUST”, “MUST NOT”, “REQUIRED”, “SHALL”, “SHALL NOT”, “SHOULD”, “SHOULD NOT”, “RECOMMENDED”, “MAY”, and “OPTIONAL” in this document are to be interpreted as described in RFC 2119.
Background
The task map communicates how a parallel program’s tasks are assigned to the allocated nodes. Given a node ID, the task map can provide a set of ranks. Given a rank, the task map can provide a node ID. The task map has the following uses:
- Inform parallel runtimes which tasks are co-located on a node so they can use local inter-process communication instead of the network.
- Assuming that the RFC 20 resource set (R) and the task map are part of the persistent job record, allow stderr tagged with a task rank to be mapped to a node ID for postmortem correlation of job errors with node problems.
Note
The task map does not communicate which tasks are bound to or contained within specific resources on the node such as cores or GPUs.
A task map can be naively represented as a node ID-ordered list of RFC 22 idsets, with each idset separated by a semicolon. We use this format when defining test vectors and refer to it as the raw task map.
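To make the raw format concrete, the following minimal Python sketch expands a raw task map string into per-node rank lists. It is an illustration only: it handles just the plain comma and range forms of idsets that appear in this document, not the full RFC 22 grammar, and the function names `parse_idset` and `parse_raw_taskmap` are hypothetical rather than part of any Flux API.

```python
def parse_idset(text):
    """Expand a simple idset such as "0-1,8-9" into a list of ranks.

    Assumption: only comma-separated integers and "lo-hi" ranges occur.
    """
    ranks = []
    for part in text.split(","):
        if "-" in part:
            lo, hi = part.split("-")
            ranks.extend(range(int(lo), int(hi) + 1))
        else:
            ranks.append(int(part))
    return ranks


def parse_raw_taskmap(text):
    """Split a raw task map on semicolons; list index N holds node N's ranks."""
    return [parse_idset(idset) for idset in text.split(";")]


print(parse_raw_taskmap("0,2;1,3"))  # [[0, 2], [1, 3]]
```

Here node 0 hosts ranks 0 and 2, and node 1 hosts ranks 1 and 3.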
Goals
- Represent common regular task distributions such as block and cyclic in a space efficient manner so that the task map can be scalably communicated.
- Avoid the need for a custom parser.
- Allow custom mappings to be expressed.
Existing Implementations
A de facto standard task map format is the PMI-1 PMI_process_mapping format described in RFC 13, which specifies a list of map blocks, each a 3-tuple of (nodeid, nnodes, ppn). Some examples are:
nnodes*ppn | block | cyclic:1 | cyclic:2 |
---|---|---|---|
4*4 | (vector,(0,4,4)) | (vector,(0,4,1),(0,4,1),(0,4,1),(0,4,1)) | (vector,(0,4,2),(0,4,2)) |
4*2 + 2*4 | (vector,(0,4,2),(4,2,4)) | (vector,(0,6,1),(0,6,1),(4,2,1),(4,2,1)) | (vector,(0,6,2),(4,2,2)) |
4096*256 | (vector,(0,4096,256)) | long (256 map blocks) | long (128 map blocks) |
Note
The cyclic:N distribution for N > 1 is equivalent to Slurm’s plane distribution.
This mapping is compact for block task distributions, where blocks of contiguous task ranks are assigned to nodes in ascending order. Its scalability breaks down for cyclic task distributions, where one or more task ranks are assigned to nodes in round-robin order. As an example, a PMI task map for 1M tasks distributed over 4K nodes in block distribution is compact as shown above, but the same job with a cyclic distribution (stride of 1) is a string of 2824 characters.
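The size difference can be checked with a short Python sketch that builds both strings. The helper name `pmi_process_mapping` is hypothetical, and the exact spelling of the string (no whitespace) is assumed to follow the examples in the table above.

```python
def pmi_process_mapping(blocks):
    """Format (nodeid, nnodes, ppn) tuples in the style shown in the table."""
    body = ",".join(f"({nodeid},{nnodes},{ppn})" for nodeid, nnodes, ppn in blocks)
    return f"(vector,{body})"


# 4096 nodes * 256 tasks per node = 1048576 tasks
block = pmi_process_mapping([(0, 4096, 256)])

# cyclic:1 needs one map block per round of the round-robin, 256 in total
cyclic = pmi_process_mapping([(0, 4096, 1)] * 256)

print(len(block), len(cyclic))  # 21 2824
```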
Implementation
The Flux task map SHALL be represented as a JSON array to avoid the need for a custom parser. The array MUST contain zero or more map blocks.
A Flux task map that contains zero map blocks SHALL indicate that the task mapping is unknown.
A Flux task map block is a JSON array with four REQUIRED integer array elements:
- nodeid
The starting node ID for the block (zero-origin).
- nnodes
The number of nodes represented by the block.
- ppn
The number of tasks per node in the block.
- repeat
The number of times the map block is logically repeated.
Note
The Flux 4-tuple map block is a superset of the 3-tuple employed by PMI. Flux adds the repeat element so that map blocks need not be explicitly repeated in cyclic distributions.
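One reading of the map block semantics, consistent with the test vectors in this document, is that ranks are handed out in ascending order: map blocks are processed in array order, each block is applied `repeat` times, and each application gives `ppn` consecutive ranks to every node from `nodeid` through `nodeid + nnodes - 1`. The Python sketch below expands a task map under that assumption; `expand_taskmap` is an illustrative name, not a Flux API.

```python
def expand_taskmap(taskmap):
    """Expand a Flux task map into per-node rank lists (index = node ID).

    Assumption: ranks are assigned in ascending order while walking the
    map blocks in array order, as the test vectors suggest.
    """
    total_nodes = max(
        (nodeid + nnodes for nodeid, nnodes, _ppn, _repeat in taskmap),
        default=0,
    )
    nodes = [[] for _ in range(total_nodes)]
    rank = 0
    for nodeid, nnodes, ppn, repeat in taskmap:
        for _ in range(repeat):
            for node in range(nodeid, nodeid + nnodes):
                nodes[node].extend(range(rank, rank + ppn))
                rank += ppn
    return nodes


print(expand_taskmap([[0, 2, 1, 2]]))  # [[0, 2], [1, 3]]
```

An empty task map expands to an empty list, which matches the "mapping unknown" case.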
The following table provides simple examples of Flux task maps for common regular task distributions:
nnodes*ppn | block | cyclic:1 | cyclic:2 |
---|---|---|---|
4*4 | [[0,4,4,1]] | [[0,4,1,4]] | [[0,4,2,2]] |
4*2 + 2*4 | [[0,4,2,1],[4,2,4,1]] | [[0,6,1,2],[4,2,1,2]] | [[0,6,2,1],[4,2,2,1]] |
4096*256 | [[0,4096,256,1]] | [[0,4096,1,256]] | [[0,4096,2,128]] |
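Because each map block covers `nnodes * ppn * repeat` ranks, a rank can be mapped to its node ID by walking the blocks without expanding the whole map. The Python sketch below assumes ranks are assigned in ascending order across blocks and repeats, as the test vectors in this document suggest; `rank_to_nodeid` is a hypothetical helper name.

```python
def rank_to_nodeid(taskmap, rank):
    """Return the node ID hosting `rank`, scanning map blocks in order."""
    if rank < 0:
        raise ValueError("rank must be non-negative")
    base = 0  # first rank covered by the current block
    for nodeid, nnodes, ppn, repeat in taskmap:
        per_repeat = nnodes * ppn  # ranks covered by one pass over the block
        ntasks = per_repeat * repeat
        if rank < base + ntasks:
            offset = (rank - base) % per_repeat
            return nodeid + offset // ppn
        base += ntasks
    raise ValueError("rank is not covered by the task map")


# 4096*256 job: the same rank lands on different nodes per distribution
print(rank_to_nodeid([[0, 4096, 256, 1]], 4097))  # block: 16
print(rank_to_nodeid([[0, 4096, 1, 256]], 4097))  # cyclic:1: 1
```

The lookup cost grows with the number of map blocks rather than the number of tasks, which is the property the compact encoding is meant to preserve.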
The Flux task map MAY be wrapped in a JSON object when it is communicated. The JSON object has the following REQUIRED keys:
- version
The integer task map version (1 for this RFC).
- map
The task map array described above.
Example:
{"version":1, "map":[[0,4096,256,1]]}
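Since both forms are plain JSON, a consumer can accept either one with a standard JSON parser. The Python sketch below is one possible approach, not a reference implementation: the function name `load_taskmap` and the specific validation checks are assumptions chosen for illustration.

```python
import json


def load_taskmap(text):
    """Parse a bare or wrapped Flux task map and return the array of blocks."""
    obj = json.loads(text)
    if isinstance(obj, dict):
        # Wrapped form: both "version" and "map" are REQUIRED keys
        if obj.get("version") != 1:
            raise ValueError("unsupported or missing task map version")
        taskmap = obj.get("map")
    else:
        taskmap = obj
    if not isinstance(taskmap, list):
        raise ValueError("task map must be a JSON array")
    for block in taskmap:
        ok = (
            isinstance(block, list)
            and len(block) == 4
            # bool is a subclass of int in Python, so reject it explicitly
            and all(isinstance(x, int) and not isinstance(x, bool) for x in block)
        )
        if not ok:
            raise ValueError("map block must be an array of four integers")
    return taskmap


print(load_taskmap('{"version":1, "map":[[0,4096,256,1]]}'))  # [[0, 4096, 256, 1]]
```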
Test Vectors
raw task map | Flux task map |
---|---|
mapping unknown | [] |
0 | [[0,1,1,1]] |
0;1 | [[0,2,1,1]] |
0-1 | [[0,1,2,1]] |
0-1;2-3 | [[0,2,2,1]] |
0,2;1,3 | [[0,2,1,2]] |
1;0 | [[1,1,1,1],[0,1,1,1]] |
0-3;4-7;8-11;12-15 | [[0,4,4,1]] |
0,4,8,12;1,5,9,13;2,6,10,14;3,7,11,15 | [[0,4,1,4]] |
0-1,8-9;2-3,10-11;4-5,12-13;6-7,14-15 | [[0,4,2,2]] |
0-1;2-3;4-5;6-7;8-11;12-15 | [[0,4,2,1],[4,2,4,1]] |
0,6;1,7;2,8;3,9;4,10,12,14;5,11,13,15 | [[0,6,1,2],[4,2,1,2]] |
14-15;12-13;10-11;8-9;4-7;0-3 | [[5,1,4,1],[4,1,4,1],[3,1,2,1],[2,1,2,1],[1,1,2,1],[0,1,2,1]] |
0-1;2-3;4-5;6-7;8-9;12-13;10-11;14-15 | [[0,5,2,1],[6,1,2,1],[5,1,2,1],[7,1,2,1]] |
12-15;8-11;4-7;0-3 | [[3,1,4,1],[2,1,4,1],[1,1,4,1],[0,1,4,1]] |