flux-jobtap-plugins(7)
DESCRIPTION
The jobtap interface supports loading of builtin and external plugins into the job manager broker module. These plugins can be used to assign job priorities using algorithms other than the default, assign job dependencies, aid in debugging of the flow of job states, or generically extend the functionality of the job manager.
Jobtap plugins are defined using the Flux standard plugin format. Therefore
a jobtap plugin should export the single symbol flux_plugin_init(), from
which calls to flux_plugin_add_handler(3) should be used to register
functions which will be called for the callback topic strings described in
the JOB CALLBACK TOPICS section below.
Each callback function uses the Flux standard plugin callback form, e.g.:
    int callback (flux_plugin_t *p,
                  const char *topic,
                  flux_plugin_arg_t *args,
                  void *arg);
where p is the handle for the current jobtap plugin, topic is the topic
string for the currently invoked callback, args contains a set of plugin
arguments which may be unpacked with the flux_plugin_arg_unpack(3) call,
and arg is any opaque argument passed along when registering the handler.
Multiple plugins may be loaded in the job-manager simultaneously. In this case, all matching handlers are called in all loaded plugins in the order in which they were loaded. For more information about loading plugins see the flux-conf-job-manager(7) or flux-jobtap(1) manpage.
JOBTAP PLUGIN NAMES
Jobtap plugins are loaded into the job-manager and referenced in the
output of flux jobtap list by file name. If a plugin is loaded by a fully
qualified path, the plugin name is shortened to the basename, such that
all dynamically loaded plugins have names such as plugin-name.so.
Builtin plugins, on the other hand, are named with a leading ".", and are
hidden in flux jobtap list, do not match the glob(7) "*" or "all" keyword,
etc. (similar to hidden filesystem files). To list builtin plugins, use
the -a, --all option to flux jobtap list, and to remove them use the name
explicitly or include the leading "." in any pattern.
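The hidden-file analogy can be tried out with the POSIX fnmatch(3)
function, whose FNM_PERIOD flag requires a leading "." in a name to be
matched explicitly by the pattern. The sketch below is a standalone
illustration of the matching rules described above, not the job manager's
implementation; the helper name plugin_name_matches() is hypothetical.

```c
#include <fnmatch.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical helper: return true if the plugin called `name` would be
 * selected by `pattern` under the rules described in this section.
 */
bool plugin_name_matches (const char *pattern, const char *name)
{
    /* Treat the "all" keyword like the glob "*" */
    if (strcmp (pattern, "all") == 0)
        pattern = "*";
    /* FNM_PERIOD: a leading "." in name is matched only by an explicit
     * "." in pattern, so builtin plugins stay hidden from "*".
     */
    return fnmatch (pattern, name, FNM_PERIOD) == 0;
}
```

With this helper, "*" and "all" select plugin-name.so but not
.priority-default, while ".priority-default" or ".*" select the builtin
plugin.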
A plugin may optionally assign a name with flux_plugin_set_name(3);
however, this name is not displayed in flux jobtap list or used in
matching. The internal plugin name is only used as part of the service
name generated by flux_jobtap_service_register(), i.e. the service name
will be job-manager.<name>.<method>. If a plugin does not set a name with
flux_plugin_set_name(3), then the basename of the plugin file will be used
with the trailing .so removed.
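The default naming rule and the resulting service name can be sketched in
plain C. This is a standalone illustration of the rule as stated above,
not code from the job manager; both helper names are hypothetical.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: copy the default plugin name for `path` into buf,
 * i.e. the basename with a trailing ".so" removed.
 * Returns 0 on success, -1 if buf is too small.
 */
int default_plugin_name (const char *path, char *buf, size_t len)
{
    const char *base = strrchr (path, '/');
    size_t n;

    base = base ? base + 1 : path;
    n = strlen (base);
    if (n > 3 && strcmp (base + n - 3, ".so") == 0)
        n -= 3;
    if (n >= len)
        return -1;
    memcpy (buf, base, n);
    buf[n] = '\0';
    return 0;
}

/* Hypothetical helper: build "job-manager.<name>.<method>" in buf.
 * Returns 0 on success, -1 on truncation or error.
 */
int service_topic (const char *name, const char *method,
                   char *buf, size_t len)
{
    int n = snprintf (buf, len, "job-manager.%s.%s", name, method);
    return (n < 0 || (size_t) n >= len) ? -1 : 0;
}
```

For a plugin loaded from the hypothetical path
/tmp/plugins/complex-priority.so that registers a hypothetical method
named query, these helpers yield complex-priority and
job-manager.complex-priority.query.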
JOBTAP PLUGIN ARGUMENTS
For job-specific callbacks, all job data is passed to the plugin via the
flux_plugin_arg_t *args, and return data is sent back to the job manager
via the same args. Incoming arguments may be unpacked using
flux_plugin_arg_unpack(3), e.g.:
    rc = flux_plugin_arg_unpack (args, FLUX_PLUGIN_ARG_IN,
                                 "{s{s:o}, s:I}",
                                 "jobspec", "resources", &resources,
                                 "id", &id);
will unpack the resources section of jobspec and the jobid into resources
and id respectively.
The full list of available args includes the following:
name | type | description |
---|---|---|
jobspec | o | jobspec with environment redacted |
R | o | R with scheduling key redacted (RUN state or later) |
id | I | jobid |
state | i | current job state |
prev_state | i | previous state ( |
userid | i | userid |
urgency | i | current urgency |
priority | I | current priority |
t_submit | f | submit timestamp in floating point seconds |
entry | o | posted eventlog entry, including context |
Return arguments can be packed using the FLUX_PLUGIN_ARG_OUT and
optionally FLUX_PLUGIN_ARG_REPLACE flags. For example, to return a
priority:
    rc = flux_plugin_arg_pack (args, FLUX_PLUGIN_ARG_OUT,
                               "{s:I}",
                               "priority", (int64_t) priority);
While a job is pending, jobtap plugin callbacks may also add job
annotations by returning a value for the annotations key:
    flux_plugin_arg_pack (args, FLUX_PLUGIN_ARG_OUT,
                          "{s:{s:s}}",
                          "annotations", "test", value);
JOB CALLBACK TOPICS
The following job callback "topic strings" are currently provided by the jobtap interface:
- job.create
The job.create topic notifies a jobtap plugin about a newly introduced job. This call may be made in three different situations:
1. on job submission
2. when the job manager is restarted and has reloaded a job from the KVS
3. when a new jobtap plugin is loaded
In case 1 above, the job state will always be FLUX_JOB_STATE_NEW, while jobs in cases 2 and 3 can be in any state except FLUX_JOB_STATE_INACTIVE.
In case 1, the job is not yet validated. If necessary, job.create may reject the job in the same manner as job.validate, using flux_jobtap_reject_job(3) and a negative return code from the callback.
In cases 2 and 3, fatal errors may be handled by raising a fatal job exception, as usual.
It is safe to post events from a job.create handler in all cases.
- job.destroy
The job.destroy topic is called after a job is rejected or becomes inactive.
- job.validate
The job.validate topic allows a plugin to reject a job before it is introduced to the job manager. A rejected job will result in a job submission error in the submitting client, and any job data in the KVS will be purged. No further callbacks except job.destroy will be made for rejected jobs.
Note: If a job is not rejected, then the job.new callback will be invoked immediately after job.validate. This allows limits or other checks to be implemented in the job.validate callback, but accounting for those limits should be confined to the job.new callback, since job.new may also be called during job-manager restart or plugin reload.
- job.dependency.*
The job.dependency.* topic allows a dependency plugin to notify the job-manager that it handles a given dependency scheme. The job-manager will scan the attributes.system.dependencies array, if provided, and issue a job.dependency.SCHEME callback for each listed dependency. If no plugin has registered for SCHEME, then the job is rejected. In the callback, the plugin should call flux_jobtap_dependency_add(3) to add a new named dependency to the job (if necessary). Jobs with dependencies will remain in the DEPEND state until all dependencies are removed with a corresponding call to flux_jobtap_dependency_remove(3). See job.state.depend below for more information about dependencies. If there is an error in the dependency specification, the job may be rejected with flux_jobtap_reject_job(3) and a negative return code from the callback.
- job.new
The job.new topic announces a new valid job. It may be called in the same three situations listed for job.create.
- job.state.*
The job.state.* callbacks are made just after a job state transition. The callback is made after the state has been published to the job's eventlog, but before any action has been taken on that state (since the action could involve immediately transitioning to a new state).
- job.event.*
The job.event.* callbacks are only made for plugins that have explicitly subscribed to a job with flux_jobtap_job_subscribe(). In this case, all job events result in this callback being invoked on all subscribed plugins. This may be useful for plugins to get notification of events that do not necessarily result in a state transition, e.g. the start event or a non-fatal exception.
- job.state.depend
The callback for FLUX_JOB_STATE_DEPEND is the final place from which a plugin may add dependencies to a job. Dependencies are added via the flux_jobtap_dependency_add() function. This function allows a named dependency to be attached to a job. Jobs with dependencies will remain in the DEPEND state until all dependencies are removed with a corresponding call to flux_jobtap_dependency_remove(). A dependency may only be used once. A second call to flux_jobtap_dependency_add() with the same dependency description will return EEXIST, even if the dependency was subsequently removed. (This allows idempotent operation of plugin-managed dependencies for job-manager or plugin restart.)
- job.state.priority
The callback for FLUX_JOB_STATE_PRIORITY is special, in that a plugin must return a priority at the end of the callback (if the plugin is a priority-managing plugin). If the job priority is not available, the plugin should use flux_jobtap_priority_unavail() to indicate that the priority cannot be set. Jobs that do not have a priority due to unavailable priority or when no current priority plugin is loaded will remain in the PRIORITY state until a priority is assigned. Therefore, a plugin should arrange for the priority to be set asynchronously using flux_jobtap_reprioritize_job(). See the PRIORITY section for more detailed information about plugin management of job priority.
- job.state.sched
In the callback for FLUX_JOB_STATE_SCHED a plugin may set R in output args. In this case, if an R is not already assigned, then this will force R for the current job and bypass the scheduler.
- job.priority.get
The job manager calls the job.priority.get topic whenever it wants to update the job priority of a single job. The plugin should return a priority immediately, but if one is not available when a job is in the PRIORITY state, the plugin may use flux_jobtap_priority_unavail() to indicate the priority is not available. Returning an unavailable priority in the SCHED state is an error and it will be logged, but otherwise ignored. A call of job.priority.get can be requested for all jobs by calling flux_jobtap_reprioritize_all(). See the PRIORITY section for more information about plugin management of job priority.
- job.inactive-add
The job has transitioned to INACTIVE state and has been added to the inactive hash.
- job.inactive-remove
The job has been purged from the inactive hash.
- job.update
The job has been updated with an RFC 21 jobspec-update event.
CONFIGURATION CALLBACK TOPIC
Jobtap plugins may register a conf.update callback. The current/proposed
configuration object is present in the input arguments under the conf key.
The callback is invoked in the following circumstances:
- When the plugin is first loaded. If the callback returns failure, the plugin load fails.
- Each time the configuration changes. If the callback returns failure, flux config reload fails.
The callback should return 0 on success, and -1 on failure. On failure,
it may optionally set a human readable error string in the errstr output
argument. The flux_jobtap_error() convenience function may be useful here.
JOB UPDATE CALLBACKS
The job manager allows updates of select job attributes through a
plugin-based scheme. Plugins may register a callback topic matching
job.update.KEY, where KEY is a period-delimited jobspec attribute, e.g.
job.update.attributes.system.duration. The requested updates are passed
as an additional argument to the plugin in the updates key.
The purpose of job.update.* callbacks is to enable plugins to allow or
deny the update of specific job attributes. Updates are denied by default
unless a callback exists for the updated attribute and the plugin returns 0
from the callback. Plugins deny an attribute update by returning -1 from
the callback, and may optionally set an error message to return to the
user with flux_jobtap_error(3).
After all updates in a request are allowed by plugins, the updated
jobspec is passed through the job.validate plugin stack to ensure the
result is valid. Plugins can note that an update is already validated by
setting a validated flag in the FLUX_PLUGIN_OUT_ARGS. If all updated
attributes have this flag, then this validation step is skipped. This can
be useful, for example, to allow an instance owner to update a job
attribute beyond limits.
Some updates may benefit from a job feasibility check before the updates
are applied. This prevents a user from inadvertently causing a job that
was feasible at the time of submission to become infeasible through an
update. Because the update plugin is in the best position to determine
if a feasibility check should be completed for an update, feasibility
checks are only done if a feasibility flag in FLUX_PLUGIN_OUT_ARGS is set.
If any plugin for a set of updates requires a feasibility check, then
feasibility of the updated jobspec as a whole will be checked. If the
updated job is determined to be infeasible, then the update is aborted
and an error returned to the user.
The update of one attribute may require modification of other attributes.
For example, an update of attributes.system.queue may require modification
of attributes.system.constraints to apply the constraints of the new
queue. To support this use case, plugins may additionally push an updates
object onto FLUX_PLUGIN_OUT_ARGS. This object has the same form as the
jobspec-update context defined in RFC 21. For example, if a plugin wishes
to update attributes.system.foo to 1, it can set
{"updates": {"attributes.system.foo": 1}} in the FLUX_PLUGIN_OUT_ARGS
before returning. These additional updates are applied by merging them
into the requested updates, so this method could overwrite other
user-requested updates and caution is advised.
PLUGIN CALLBACK TOPICS
- plugin.query
The job manager calls the plugin.query callback topic to give a plugin the opportunity to provide extra data in response to a jobtap-query request (as used by the flux jobtap query PLUGIN command). This can be used by a plugin to export internal plugin state for inspection by an admin or user by placing the data in the output arguments of the callback, e.g.:
    flux_plugin_arg_pack (args, FLUX_PLUGIN_ARG_OUT,
                          "{s:O}",
                          "data", internal_data);
PRIORITY
Custom assignment of job priority values is one of the core features
supported by the jobtap plugin interface. A builtin .priority-default
plugin is always loaded in the job-manager to ensure that jobs move past
the PRIORITY state when no other priority plugin is loaded. The default
plugin simply assigns the priority to the same value as the current job
urgency.
When loading a new jobtap plugin that assigns priority, it is important
to be cognizant of the fact that the .priority-default plugin may still
be loaded. As a result, the priority set in the return arguments will
always be initialized to the job urgency. However, since plugin
job.state.priority and job.priority.get callbacks are run in order, any
subsequently loaded plugin that assigns a priority will overwrite the
returned default priority, and thus the last loaded priority plugin will
be active.
To ensure the default priority is always overridden, priority plugins
should therefore make sure to always set a priority, or use
flux_jobtap_priority_unavail() if the priority is not available, in any
callback in which a priority is expected to be returned, i.e.
job.state.priority and job.priority.get.
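The ordering described above can be modeled in a few lines of standalone
C. This is a toy model only: it does not use the Flux API, and every type
and function name in it is hypothetical. Each "plugin" is a function that
may write a priority, and the functions run in load order.

```c
#include <stddef.h>
#include <stdint.h>

/* Minimal stand-in for a job: only the fields the model needs */
struct priority_model_job {
    int urgency;
    int64_t priority;
};

/* Stand-in for a priority callback writing to the return arguments */
typedef void (*model_priority_f) (struct priority_model_job *job);

/* Models .priority-default: priority is set to the job urgency */
void model_default_priority (struct priority_model_job *job)
{
    job->priority = job->urgency;
}

/* Models a custom plugin that always sets a priority */
void model_custom_priority (struct priority_model_job *job)
{
    job->priority = 1000 + job->urgency;
}

/* Models a plugin that forgets to set a priority */
void model_silent_plugin (struct priority_model_job *job)
{
    (void) job;
}

/* Run callbacks in load order; the last value written wins */
int64_t model_run_priority (model_priority_f *cbs, size_t count,
                            struct priority_model_job *job)
{
    for (size_t i = 0; i < count; i++)
        cbs[i] (job);
    return job->priority;
}
```

For a job with urgency 16, running the default callback followed by the
custom one yields 1016, while running the default callback followed by the
silent one leaves the default value of 16 in place. The second case is why
a priority plugin should set a priority in every callback where one is
expected.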
To fully ensure priority plugins do not conflict, the builtin priority
plugin may be removed explicitly with flux jobtap remove .priority-default
or via configuration (see flux-conf-job-manager(7)):
    [job-manager]
    plugins = [
      { remove = ".priority-default", load = "complex-priority.so" },
    ]
PROLOG AND EPILOG ACTIONS
Plugins that need to perform asynchronous tasks for jobs after an alloc
event but before the job is running, or after a finish event but before
resources are freed to the scheduler, can make use of job manager prolog
or epilog actions.
Prolog and epilog actions are delineated by the following functions:
    int flux_jobtap_prolog_start (flux_plugin_t *p,
                                  const char *description);
    int flux_jobtap_prolog_finish (flux_plugin_t *p,
                                   flux_jobid_t id,
                                   const char *description,
                                   int status);
    int flux_jobtap_epilog_start (flux_plugin_t *p,
                                  const char *description);
    int flux_jobtap_epilog_finish (flux_plugin_t *p,
                                   flux_jobid_t id,
                                   const char *description,
                                   int status);
To initiate a prolog action, a plugin should call the function
flux_jobtap_prolog_start(). This will block the job from starting, even
after resources have been assigned, until a corresponding call to
flux_jobtap_prolog_finish() has been made. While the status of the prolog
action is passed to flux_jobtap_prolog_finish() so it can be captured in
the eventlog, the action itself is responsible for raising a job exception
or taking other action on failure. That is, a non-zero prolog finish
status does not cause any automated behavior on the part of the job
manager. Similarly, the prolog description is used for informational
purposes only, so that multiple actions in an eventlog may be
differentiated.
Similarly, an epilog action is initiated with flux_jobtap_epilog_start(),
and prevents resources from being released to the scheduler until a
corresponding call to flux_jobtap_epilog_finish(). The same caveats
described for prolog actions regarding description and completion status
apply to epilog actions.
The flux_jobtap_prolog_start() function may be called any time before the
start request is made to the execution system, though it is most often
called from the job.state.run or job.event.alloc callbacks, since this is
the point at which a job has been allocated resources. (Note: plugins will
only receive the job.event.* callbacks for jobs to which they have
subscribed with a call to flux_jobtap_job_subscribe().) A prolog action
cannot be started after a job enters the CLEANUP state.
The flux_jobtap_epilog_start() function may only be called after a job is
in the CLEANUP state, but before the free request has been sent to the
scheduler, for example from the job.state.cleanup or job.event.finish
callbacks.
If flux_jobtap_prolog_start(), flux_jobtap_prolog_finish(),
flux_jobtap_epilog_start() or flux_jobtap_epilog_finish() are called for
a job in an invalid state, these functions will return -1 with errno set
to EINVAL.
Multiple prolog or epilog actions can be active at the same time.
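The way several outstanding prolog actions hold back a job can be pictured
as a simple counter. The sketch below is a standalone toy model and not
the job manager's implementation: it does not call the Flux API, and the
structure and function names are hypothetical. The EINVAL result when
starting an action after CLEANUP follows the text above; failing a finish
call when no action is active is an assumption made only for this model.

```c
#include <errno.h>
#include <stdbool.h>

/* Minimal stand-in for the per-job state the model needs */
struct prolog_model {
    int prolog_active;   /* prolog actions started but not finished */
    bool in_cleanup;     /* job has entered the CLEANUP state */
};

/* Models flux_jobtap_prolog_start(): fails with EINVAL after CLEANUP */
int model_prolog_start (struct prolog_model *job)
{
    if (job->in_cleanup) {
        errno = EINVAL;
        return -1;
    }
    job->prolog_active++;
    return 0;
}

/* Models flux_jobtap_prolog_finish(). Failing with EINVAL when no
 * action is active is a choice made for this model only.
 */
int model_prolog_finish (struct prolog_model *job)
{
    if (job->prolog_active == 0) {
        errno = EINVAL;
        return -1;
    }
    job->prolog_active--;
    return 0;
}

/* The start request is held back while any prolog action is active */
bool model_start_allowed (const struct prolog_model *job)
{
    return job->prolog_active == 0;
}
```

In this model, a job with two started prolog actions may not start until
both have finished. Epilog actions and the free request to the scheduler
can be pictured the same way.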
RESOURCES
Flux: http://flux-framework.org
Flux RFC: https://flux-framework.readthedocs.io/projects/flux-rfc
Issue Tracker: https://github.com/flux-framework/flux-core/issues