flux-config-rabbit(5)
DESCRIPTION
Flux system instance configuration is needed to enable Flux to interact with HPE rabbit software. No configuration is necessary in Flux sub-instances.
COMPONENTS
Jobtap Plugin
For a Flux system instance to allocate rabbit storage, the flux-jobtap-dws(1) plugin must be loaded in the leader broker of that instance. The plugin can be loaded in a config file like so:
[job-manager]
plugins = [
{ load = "dws-jobtap.so", conf = { epilog-timeout = 0.0 }}
]
Systemd Service
Also, the flux-coral2-dws systemd service must be started
on the same node as the rank 0 broker of the system instance.
The flux user must have a kubeconfig file in its home directory granting
it read and write access to, at a minimum, Storages, Workflows,
Servers, and Computes resources (all of which are defined by
dataworkflowservices). There are instructions for how to grant Flux
the minimum permissions necessary by setting up role-based access control
here.
Fluxion Configuration
The Fluxion scheduler must be configured to recognize rabbit
resources. This can be done by generating a file describing the rabbit layout
for the cluster and then running flux dws2jgf like so:
flux rabbitmapping > /tmp/rabbitmapping.json
flux dws2jgf [--no-validate] --from-config /etc/flux/system/conf.d/resource.toml --only-sched /tmp/rabbitmapping.json
The output (which may be large) must be saved to a file and pointed to with the
resource.scheduling config key (see
here).
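As a sketch of this last step, assuming the flux dws2jgf output was saved to the hypothetical path /etc/flux/system/rabbit-jgf.json, the resource table might point to it like so:

```toml
[resource]
# Hypothetical path: wherever the flux dws2jgf output was saved.
scheduling = "/etc/flux/system/rabbit-jgf.json"
```

Within a single file, TOML does not allow the same table to be defined twice, so if a [resource] table already exists in that file, the key would be added to it rather than to a second table.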
To facilitate Fluxion restart when using this JGF (JSON Graph Format)
output, Fluxion must be configured to use a match-format
of rv1 instead of the otherwise recommended default of rv1_nosched.
For example, in a config file:
[sched-fluxion-resource]
match-format = "rv1"
Prolog/Epilog Scripts
Prolog and epilog scripts, provided by the flux-coral2 package, run automatically during those phases of a job. The scripts start and stop the nnf-clientmount service, respectively. The prolog script also holds the job in the prolog phase until the rabbit file systems have been mounted.
Shell Plugin
A dws_environment shell plugin, responsible for managing the rabbit
environment presented to applications, is loaded automatically for each job.
KEYS
The rabbit config table captures site-general policies and options for
Flux's interactions with the rabbits. The following keys are valid:
- mapping (string)
(required) Path to rabbitmapping file for the cluster, as generated by flux-rabbitmapping(1).
- kubeconfig (string)
(optional) Path to kubeconfig file for Flux to use, ideally with restricted permissions. This can be left undefined if the file is placed at the path ~flux/.kube/config (assuming the flux user is the instance owner).
- tc_timeout (integer)
(optional) Time in seconds to tolerate a workflow stuck in TransientCondition state before killing the associated job. Defaults to 10 seconds.
- teardown_after (float)
(optional) Maximum time for a workflow to be in either PostRun or DataOut state before it is moved to Teardown. If unset or negative, allow the workflow to stay in those states indefinitely. See also the epilog-timeout option to flux-jobtap-dws(1), which is similar but takes more drastic action. It may be useful to set the teardown_after timeout to something smaller than the epilog-timeout, to give the NNF software time to clean up before the epilog-timeout takes effect.
- setup_timeout (float)
(optional) Maximum time for a workflow to be in the Setup state before the job is canceled. If unset or negative, do not set a timer.
- prerun_timeout (float)
(optional) Maximum time for a workflow to be in the PreRun state before the job is canceled. If unset or negative, do not set a timer.
- postrun_timeout (float)
(optional) Maximum time for a workflow to be in the PostRun state before it is moved to Teardown. If unset or negative, do not set a timer. If both postrun_timeout and teardown_after are set, postrun_timeout should be set to a smaller number.
- drain_compute_nodes (boolean)
(optional) Whether to automatically drain compute nodes that lose PCIe connection with their rabbit. Defaults to true.
- save_datamovements (integer)
(optional) Number of nnfdatamovement resources to save to jobs' KVS. This may be useful for debugging, but saving too many may degrade performance. Defaults to 0.
- restrict_persistent_creation (boolean)
(optional) Restrict the creation of persistent file systems to the instance owner (in most cases the flux user).
- prolog_timeout (FSD)
(optional) Maximum time in Flux Standard Duration format to wait for the dws_environment event in the prolog script.
- policy.maximums (table)
(optional) The maximum filesystem capacity per node, in GiB, that users may request. Leave undefined for no limit. See below for an example.
- presets (table)
(optional) Defines preset #DW strings. These can save users time by allowing them to run, for instance, flux alloc -N1 -S dw=NAME rather than flux alloc -N1 -S "dw=#DW jobdw ...". See below for an example.
EXAMPLE
[rabbit]
kubeconfig = "/var/flux/.kube/config"
tc_timeout = 600
drain_compute_nodes = true
save_datamovements = 5
restrict_persistent_creation = true
teardown_after = 4800.0
# maximum filesystem capacity per node, in GiB
[rabbit.policy.maximums]
xfs = 1024
gfs2 = 2048
raw = 4096
lustre = 1024
# defines preset #DW strings
[rabbit.presets]
small_xfs = "#DW jobdw type=xfs capacity=100GiB name=smallxfs"
large_lustre = "#DW jobdw type=lustre capacity=50TiB name=largelustre"
[job-manager]
plugins = [
{ load = "dws-jobtap.so", conf = { epilog-timeout = 5400.0 }}
]
[sched-fluxion-resource]
match-format = "rv1"
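The example above does not set the required mapping key or the per-state timeouts. A minimal sketch of how those keys might look follows. The mapping path and all of the timeout values are assumptions chosen for illustration, not recommended defaults. These keys would go in the same [rabbit] table as the ones above.

```toml
[rabbit]
# Hypothetical path to the output of flux rabbitmapping
mapping = "/etc/flux/system/rabbitmapping.json"
# Cancel the job if its workflow stays in Setup or PreRun this long
setup_timeout = 600.0
prerun_timeout = 600.0
# Kept smaller than teardown_after (4800.0 above), which in turn is
# smaller than the jobtap plugin's epilog-timeout (5400.0 above)
postrun_timeout = 3600.0
# Flux Standard Duration, assumed here to be written as a string
prolog_timeout = "10m"
```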
SEE ALSO
flux-dws2jgf(1), flux-rabbitmapping(1), flux-config(5), flux-config-job-manager(5)
Flux CORAL2 Documentation: https://flux-framework.readthedocs.io/projects/flux-coral2