20/Resource Set Specification Version 1
This specification defines the version 1 format of the resource-set representation, or R for short.
Name: github.com/flux-framework/rfc/spec_20.rst
Editor: Dong H. Ahn <ahn1@llnl.gov>
State: Raw
Language
The key words “MUST”, “MUST NOT”, “REQUIRED”, “SHALL”, “SHALL NOT”, “SHOULD”, “SHOULD NOT”, “RECOMMENDED”, “MAY”, and “OPTIONAL” in this document are to be interpreted as described in RFC 2119.
Overview
Flexible resource representation is important for some of the key components of Flux. Resource requests are part of Flux jobspec, described in RFC 14. This RFC describes the format of a concrete resource-set representation referred to as R, constructed by the scheduler in response to a resource request. R is input to the remote execution system, which uses information expressed in R to establish containment, binding, mapping, and execution of program tasks, apportioned across broker ranks. As a program terminates, the execution system releases shards of the original R, eventually adding up to its union, back to the scheduler. Finally, when a Flux instance launches a child instance, R is passed down from the enclosing instance to the child instance, where it primes the child scheduler with a block of allocatable resources.
Design Goals
The R format is designed with the following goals:
- Allow the resource data conformant to our resource model (RFC 4) to be serialized and deserialized with no data loss;
- Express the resource allocation information to the program execution service (RFC 8);
- Use the same format to release a resource subset of R to the scheduler;
- Allow the consumers of R to deserialize an R object while minimizing the parsing complexity and the data to read.
Producers and Consumers
The scheduler for a Flux instance (or instance scheduler) uses this format to serialize each resource allocation as REQUIRED by the instance program execution service and OPTIONALLY REQUIRED by child scheduler instances.
The instance scheduler deserializes an R object to build its internal resource data used for scheduling.
Users MAY manually write an R object for testing and debugging.
User-facing utilities that query a resource status (e.g., what resources are available or idle, or what resources are allocated to a job) MAY use an R object to extract this information.
The program execution service emits a valid R object to release a resource subset of an R to the instance scheduler.
Resource Set Format Definition
The JSON documents that conform to the R format SHALL be referred to as R JSON documents or, in short, R documents.
An R JSON document SHALL consist of a dictionary with four keys: version, execution, scheduling and attributes. It SHALL be valid if and only if it contains the version key and either or both of the execution and scheduling keys. The value of the execution key SHALL contain sufficient data for the execution system to perform its core tasks. The value of scheduling SHALL contain sufficient data for schedulers. Finally, the value of attributes SHALL provide optional information including, but not limited to, data specific to the scheduler used to create this JSON document.
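The validity rule above can be sketched in a few lines of Python. This is a minimal illustration, not part of the specification: the function name is_valid_r_toplevel is hypothetical, and the check covers only the presence of the top-level keys, not the contents of their values.

```python
def is_valid_r_toplevel(doc):
    """Apply only the top-level validity rule of an R document.

    Hypothetical helper: it checks that `version` is present together with
    `execution`, `scheduling`, or both. It does not inspect the values.
    """
    if not isinstance(doc, dict):
        return False
    if "version" not in doc:
        return False
    return "execution" in doc or "scheduling" in doc


print(is_valid_r_toplevel({"version": 1, "execution": {}}))   # True
print(is_valid_r_toplevel({"version": 1, "attributes": {}}))  # False
```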
Version
The value of the version key SHALL contain 1 to indicate the format version.
Execution
The value of the execution key SHALL contain at least the keys R_lite and nodelist, with optional keys properties, starttime and expiration. Other keys are reserved for future extensions.
R_lite is a strict list of dictionaries, each of which SHALL contain at least the following two keys:
- rank
The value of the rank key SHALL be a string list of broker rank identifiers in idset format (see RFC 22). This list SHALL indicate the broker ranks to which other information in the current entry applies.
- children
The children key encodes the information about certain compute resources contained within this compute node. The value of this key SHALL contain a dictionary with two keys: core and gpu. Other keys are reserved for future extensions.
  - core
  The core key SHALL contain a string of logical compute core IDs in RFC 22 idset format.
  - gpu
  The OPTIONAL gpu key SHALL contain a string of logical GPU IDs in RFC 22 idset format.
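As an illustration of how a consumer might read R_lite, the sketch below expands each entry into a per-rank view of cores and GPUs. Both function names are hypothetical, and decode_idset is a deliberately simplified stand-in that handles only comma-separated ids and a-b ranges; it is not a full RFC 22 implementation.

```python
def decode_idset(text):
    """Decode a simplified idset such as "0-3,7" into a list of ints.

    Assumption: only comma-separated ids and inclusive a-b ranges are
    handled; this is not a complete RFC 22 parser.
    """
    ids = []
    for part in text.split(","):
        part = part.strip()
        if not part:
            continue
        if "-" in part:
            lo, hi = part.split("-")
            ids.extend(range(int(lo), int(hi) + 1))
        else:
            ids.append(int(part))
    return ids


def expand_r_lite(r_lite):
    """Map each broker rank to the core and gpu ids listed for it."""
    per_rank = {}
    for entry in r_lite:
        children = entry["children"]
        cores = decode_idset(children["core"])
        gpus = decode_idset(children.get("gpu", ""))  # gpu is OPTIONAL
        for rank in decode_idset(entry["rank"]):
            per_rank[rank] = {"core": list(cores), "gpu": list(gpus)}
    return per_rank


resources = expand_r_lite(
    [{"rank": "19-22", "children": {"core": "0-47", "gpu": "0-7"}}]
)
print(sorted(resources))            # [19, 20, 21, 22]
print(len(resources[19]["core"]))   # 48
```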
The nodelist key SHALL be an array of hostnames that correspond to the rank entries of R_lite; it serves as a mapping from R_lite rank entries to hostnames. Each entry in nodelist MAY contain a string in RFC 29 Hostlist Format, e.g. host[0-16].
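The rank-to-hostname mapping can be sketched as follows. Several things here are assumptions made for illustration: the function names are hypothetical, expand_hostlist handles only a single prefix[a-b,c] pattern without zero padding (far less than RFC 29 allows), and ranks are paired with hostnames in ascending rank order.

```python
import re


def expand_hostlist(entry):
    """Expand a simplified hostlist such as "node[186-189]".

    Assumption: one bracketed range list after a prefix, no zero padding;
    anything else is returned unchanged as a single hostname.
    """
    match = re.fullmatch(r"([^\[\]]*)\[([0-9,\-]+)\]", entry)
    if match is None:
        return [entry]
    prefix, body = match.groups()
    hosts = []
    for part in body.split(","):
        lo, _, hi = part.partition("-")
        hi = hi or lo  # a single id such as "5" is the range 5-5
        hosts.extend(f"{prefix}{i}" for i in range(int(lo), int(hi) + 1))
    return hosts


def map_ranks_to_hosts(ranks, nodelist):
    """Pair ranks (ascending) with the expanded hostnames, in order."""
    hosts = [host for entry in nodelist for host in expand_hostlist(entry)]
    if len(hosts) != len(ranks):
        raise ValueError("rank count and hostname count differ")
    return dict(zip(sorted(ranks), hosts))


print(map_ranks_to_hosts([19, 20, 21, 22], ["node[186-189]"]))
# {19: 'node186', 20: 'node187', 21: 'node188', 22: 'node189'}
```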
The execution key MAY also contain any of the following optional keys:
- properties
The optional properties key SHALL be a dictionary where each key maps a single property name to an RFC 22 idset string. The idset string SHALL represent a set of execution target ranks. A given execution target rank MAY appear in multiple property mappings. Property names SHALL be valid UTF-8, and MUST NOT contain the following illegal characters:
! & ' " ^ ` | ( )
Additionally, the @ character is reserved for scheduler-specific property use. In this case, the literal property SHALL still apply to the defined execution target ranks, but the scheduler MAY use the suffix after @ to apply the property to children resources of the execution target or for another scheduler-specific purpose. For example, the property amd-mi50@gpu SHALL apply to the defined execution target ranks, but a scheduler MAY use the gpu suffix to perform scheduling optimization for GPUs of the corresponding ranks. This MAY result in both amd-mi50@gpu and amd-mi50 being valid properties for resources in the instance.
- starttime
The value of the starttime key, if present, SHALL encode the start time at which the resource set is valid. The value SHALL be the number of seconds elapsed since the Unix Epoch (1970-01-01 00:00:00 UTC) with optional microsecond precision. If starttime is unset, then the resource set has no specified start time and is valid beginning at any time up to expiration.
- expiration
The value of the expiration key, if present, SHALL encode the end or expiration time of the resource set in seconds since the Unix Epoch, with optional microsecond precision. If starttime is also set, expiration MUST be greater than starttime. If expiration is unset, the resource set has no specified end time and is valid beginning at starttime without expiration.
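The property-name rules above lend themselves to a short check. The sketch below is illustrative only: the helper names are hypothetical, and splitting on the first @ is an assumption about how a scheduler might read the suffix, since the specification leaves that use to the scheduler.

```python
# Characters that property names MUST NOT contain, as listed above.
ILLEGAL_PROPERTY_CHARS = set("!&'\"^`|()")


def is_legal_property_name(name):
    """Return True if the name contains none of the illegal characters."""
    return not (ILLEGAL_PROPERTY_CHARS & set(name))


def split_property(name):
    """Split a property name into (base, suffix) at the first '@'.

    The suffix is None when no '@' is present. How the suffix is used is
    scheduler specific; this helper only separates the two parts.
    """
    base, sep, suffix = name.partition("@")
    return base, (suffix if sep else None)


print(is_legal_property_name("amd-mi50@gpu"))  # True
print(is_legal_property_name("big|mem"))       # False
print(split_property("amd-mi50@gpu"))          # ('amd-mi50', 'gpu')
print(split_property("amd-mi50"))              # ('amd-mi50', None)
```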
Scheduling
The scheduling key allows RFC 4-compliant schedulers to serialize any subset of graph resource data into its value and later deserialize this value with no data loss. The scheduling key contains a dictionary with a single key: graph. Other keys are reserved for future extensions.
The graph key SHALL conform to the latest version of the JSON Graph Format (JGF). Thus, its value is a dictionary with two keys, nodes and edges, that encode the resource vertices and edges as described in RFC 4.
Graph Vertices
The value of the nodes key defined in JGF is a strict list of graph vertices. Each list member is a vertex that contains two keys: id and metadata.
The id key SHALL contain a unique string ID for the containing vertex. The value of the metadata key is a dictionary that encodes the resource pool data described in RFC 4. Thus, this dictionary SHALL contain the following keys to describe the base data of a resource pool:
- type
- uuid
- basename
- name
- id
- properties
- size
- unit
It MAY contain other OPTIONAL resource vertex data.
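A consumer could verify the required vertex keys with a check along these lines. The function name and the sample vertex values are hypothetical and chosen only for illustration; the value types of the metadata keys are defined by RFC 4 and are not checked here.

```python
REQUIRED_METADATA_KEYS = {
    "type", "uuid", "basename", "name", "id", "properties", "size", "unit",
}


def missing_vertex_keys(vertex):
    """Return the sorted names of required keys absent from a JGF vertex.

    Top-level keys are reported as-is; metadata keys are reported with a
    "metadata." prefix.
    """
    missing = sorted({"id", "metadata"} - set(vertex))
    metadata = vertex.get("metadata", {})
    missing += sorted(
        f"metadata.{key}" for key in REQUIRED_METADATA_KEYS - set(metadata)
    )
    return missing


# Hypothetical vertex; every value below is made up for the example.
vertex = {
    "id": "3",
    "metadata": {
        "type": "core",
        "uuid": "3",
        "basename": "core",
        "name": "core0",
        "id": 0,
        "properties": {},
        "size": 1,
    },
}
print(missing_vertex_keys(vertex))  # ['metadata.unit']
```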
Graph Edges
The value of the edges key defined in JGF SHALL be a strict list of graph edges. Each list element SHALL be an edge that connects two graph vertices and contains the source, target and metadata keys.
The value of the source key SHALL contain the ID of the source graph vertex. The value of the target key SHALL contain the ID of the target graph vertex. The value of this metadata key SHALL contain a dictionary that encodes the resource subsystem and relationship data for the containing edge as described in RFC 4. It SHALL contain two keys:
- subsystem
The value of the subsystem key SHALL be a string that indicates a specific subsystem to which this edge belongs (e.g., containment or power subsystems).
- relationship
The value of the relationship key SHALL be a string that indicates a relationship between the source and target resource vertices. The relationship SHALL only be defined within the subsystem defined above (e.g., “contains” relationship within the “containment” subsystem).
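The edge rules can be checked in a similar way. In this sketch the function name is hypothetical, and treating an edge whose source or target does not match any vertex id as a problem is an interpretation of the text above rather than an explicit requirement of this RFC.

```python
def check_edges(graph):
    """Return a list of human-readable problems found in graph["edges"]."""
    vertex_ids = {vertex["id"] for vertex in graph["nodes"]}
    problems = []
    for index, edge in enumerate(graph["edges"]):
        for key in ("source", "target"):
            if edge.get(key) not in vertex_ids:
                problems.append(f"edge {index}: {key} is not a known vertex id")
        metadata = edge.get("metadata", {})
        for key in ("subsystem", "relationship"):
            if not isinstance(metadata.get(key), str):
                problems.append(f"edge {index}: metadata.{key} is not a string")
    return problems


# Minimal hypothetical graph: a node vertex that contains a core vertex.
graph = {
    "nodes": [{"id": "0", "metadata": {}}, {"id": "1", "metadata": {}}],
    "edges": [
        {
            "source": "0",
            "target": "1",
            "metadata": {"subsystem": "containment", "relationship": "contains"},
        }
    ],
}
print(check_edges(graph))  # []
```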
Attributes
The purpose of the attributes key is to provide optional information on this R document. The attributes key SHALL be a dictionary with one key: system. Other keys are reserved for future extensions.
system
Attributes in the system dictionary provide additional system information that has affected the creation of this R document. All of the system attributes are optional.
A common system attribute is:
- scheduler
The value of the scheduler key is a free-form dictionary that may provide information specific to the scheduler used to produce this document. For example, a scheduler that manages multiple job queues may add queue=batch to indicate that this resource set was allocated from within its batch queue.
Example R
The following is an example of a version 1 resource specification. The example below indicates a resource set with the ranks 19 through 22. These ranks correspond to the nodes node186 through node189. Each of the nodes contains 48 cores (0-47) and 8 GPUs (0-7). The starttime and expiration indicate the resources were valid for 30 minutes (1800 seconds) on February 16, 2023.
{
  "version": 1,
  "execution": {
    "R_lite": [
      {
        "rank": "19-22",
        "children": {
          "core": "0-47",
          "gpu": "0-7"
        }
      }
    ],
    "nodelist": [
      "node[186-189]"
    ],
    "starttime": 1676560542,
    "expiration": 1676562342
  }
}
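To make the time fields of the example concrete, the following sketch loads the document above with Python's standard json module and derives the validity window. The helper name validity_window is hypothetical; the expiration check mirrors the rule that expiration MUST be greater than starttime when both are set.

```python
import json
from datetime import datetime, timezone

# The example R document from above, condensed onto fewer lines.
R_EXAMPLE = """
{
  "version": 1,
  "execution": {
    "R_lite": [
      {"rank": "19-22", "children": {"core": "0-47", "gpu": "0-7"}}
    ],
    "nodelist": ["node[186-189]"],
    "starttime": 1676560542,
    "expiration": 1676562342
  }
}
"""


def validity_window(execution):
    """Return (starttime, expiration); either may be None when unset."""
    start = execution.get("starttime")
    end = execution.get("expiration")
    if start is not None and end is not None and not end > start:
        raise ValueError("expiration MUST be greater than starttime")
    return start, end


execution = json.loads(R_EXAMPLE)["execution"]
start, end = validity_window(execution)
duration_minutes = (end - start) / 60
start_date = datetime.fromtimestamp(start, tz=timezone.utc).date().isoformat()
print(duration_minutes, start_date)  # 30.0 2023-02-16
```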
References
JSON Graph Format, GitHub, Anthony Bargnesi, et al., visited Jan. 2019