flux-jobtap-plugins(7)
DESCRIPTION
The jobtap interface supports loading of builtin and external plugins into the job manager broker module. These plugins can be used to assign job priorities using algorithms other than the default, assign job dependencies, aid in debugging of the flow of job states, or generically extend the functionality of the job manager.
Jobtap plugins are defined using the Flux standard plugin format. Therefore
a jobtap plugin should export the single symbol: flux_plugin_init(),
from which calls to flux_plugin_add_handler(3) should be used to
register functions which will be called for the callback topic strings
described in the JOB CALLBACK TOPICS section below.
Each callback function uses the Flux standard plugin callback form, e.g.:
int callback (flux_plugin_t *p,
              const char *topic,
              flux_plugin_arg_t *args,
              void *arg);
where p is the handle for the current jobtap plugin, topic is
the topic string for the currently invoked callback, args contains
a set of plugin arguments which may be unpacked with the
flux_plugin_arg_unpack(3) call, and arg is any opaque argument
passed along when registering the handler.
Multiple plugins may be loaded in the job-manager simultaneously. In this case, all matching handlers are called in all loaded plugins in the order in which they were loaded. For more information about loading plugins see the flux-conf-job-manager(7) or flux-jobtap(1) manpage.
JOBTAP PLUGIN NAMES
Jobtap plugins are loaded into the job-manager and referenced in the
output of flux jobtap list by file name. If a plugin is loaded by
a fully qualified path, the plugin name is shortened to the basename,
such that all dynamically loaded plugins have names such as
plugin-name.so.
Builtin plugins, on the other hand, are named with a leading . and,
like hidden filesystem files, are hidden in flux jobtap list and do
not match the glob(7) * or the "all" keyword. To list builtin plugins,
use the -a, --all option to flux jobtap list, and to remove them use
the name explicitly or include the leading . in any pattern.
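The hidden-file analogy can be tried out with the POSIX fnmatch(3) function. The following is only an illustration of the matching rule described above, not the job manager's own matching code; the helper name plugin_name_matches() is hypothetical.

```c
#include <fnmatch.h>
#include <stdbool.h>

/* Hypothetical helper: return true if a plugin name matches a glob
 * pattern. FNM_PERIOD makes a leading '.' in the name match only an
 * explicit '.' in the pattern, as with hidden files.
 */
bool plugin_name_matches (const char *pattern, const char *name)
{
    return fnmatch (pattern, name, FNM_PERIOD) == 0;
}
```

With this rule, the pattern * matches plugin-name.so but not .priority-default, while .* or the full name .priority-default does match the builtin plugin.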
A plugin may optionally assign a name with flux_plugin_set_name(3);
however, this name is not displayed in flux jobtap list or used in
matching. The internal plugin name is only used as part of the service
name generated by flux_jobtap_service_register(), i.e. the service
name will be job-manager.<name>.<method>. If a plugin does not
set a name with flux_plugin_set_name(3), then the basename of the
plugin file will be used with the trailing .so removed.
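As a rough sketch of the default naming rule, the helper below derives the service name a plugin would get if it never calls flux_plugin_set_name(3). The function default_service_name(), the example path, and the method name are hypothetical and are not part of flux-core; the sketch only mirrors the basename and .so rules stated above.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper (not a flux-core function): take the basename of
 * path, drop a trailing ".so", and write
 * "job-manager.<name>.<method>" into buf.
 * Returns 0 on success, -1 if buf is too small.
 */
int default_service_name (const char *path,
                          const char *method,
                          char *buf,
                          size_t len)
{
    const char *base = strrchr (path, '/');
    base = base ? base + 1 : path;

    size_t n = strlen (base);
    if (n > 3 && strcmp (base + n - 3, ".so") == 0)
        n -= 3;                     /* strip the ".so" suffix */

    int rc = snprintf (buf, len, "job-manager.%.*s.%s",
                       (int) n, base, method);
    return (rc < 0 || (size_t) rc >= len) ? -1 : 0;
}
```

For a plugin loaded from a path ending in plugin-name.so and a method named query, this yields job-manager.plugin-name.query.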
JOBTAP PLUGIN ARGUMENTS
For job-specific callbacks, all job data is passed to the plugin via
the flux_plugin_arg_t *args, and return data is sent back to the
job manager via the same args. Incoming arguments may be unpacked
using flux_plugin_arg_unpack(3), e.g.:
rc = flux_plugin_arg_unpack (args, FLUX_PLUGIN_ARG_IN,
                             "{s{s:o}, s:I}",
                             "jobspec", "resources", &resources,
                             "id", &id);
will unpack the resources section of jobspec and the jobid into
resources and id respectively.
The full list of available args includes the following:
| name | type | description |
|---|---|---|
| jobspec | o | jobspec with environment redacted |
| R | o | R with scheduling key redacted (RUN state or later) |
| id | I | jobid |
| state | i | current job state |
| prev_state | i | previous state ( |
| userid | i | userid |
| urgency | i | current urgency |
| priority | I | current priority |
| t_submit | f | submit timestamp in floating point seconds |
| entry | o | posted eventlog entry, including context |
| end_event | o | copy of event that caused transition to CLEANUP, if available |
Return arguments can be packed using the FLUX_PLUGIN_ARG_OUT and
optionally FLUX_PLUGIN_ARG_REPLACE flags. For example, to return
a priority:
rc = flux_plugin_arg_pack (args, FLUX_PLUGIN_ARG_OUT,
                           "{s:I}",
                           "priority", (int64_t) priority);
While a job is pending, jobtap plugin callbacks may also add job
annotations by returning a value for the annotations key:
flux_plugin_arg_pack (args, FLUX_PLUGIN_ARG_OUT,
                      "{s:{s:s}}",
                      "annotations", "test", value);
JOB CALLBACK TOPICS
The following job callback "topic strings" are currently provided by the jobtap interface:
- job.create
The job.create topic notifies a jobtap plugin about a newly introduced job. This call may be made in three different situations:
1. on job submission
2. when the job manager is restarted and has reloaded a job from the KVS
3. when a new jobtap plugin is loaded
In case 1 above, the job state will always be FLUX_JOB_STATE_NEW, while jobs in cases 2 and 3 can be in any state except FLUX_JOB_STATE_INACTIVE.
In case 1, the job is not yet validated. If necessary, job.create may reject the job in the same manner as job.validate using flux_jobtap_reject_job(3) and a negative return code from the callback.
In cases 2 and 3, fatal errors may be handled by raising a fatal job exception, as usual.
It is safe to post events from a job.create handler in all cases.
Note: In case 3, job.create is called for active jobs in unspecified order. If a plugin requires an ordering guarantee, the plugin should call flux_jobtap_set_load_sort_order(3) from the flux_plugin_init() callback. This function takes a mode parameter of either state, to sort jobs by state (then jobid), or -state to sort by reverse state (then jobid). For example,
flux_jobtap_set_load_sort_order (p, "state");
will ensure that job.create and job.new are called on jobs in PRIORITY first, then DEPEND, then SCHED, and so on.
- job.destroy
The job.destroy topic is called after a job is rejected or becomes inactive.
- job.validate
The job.validate topic allows a plugin to reject a job before it is introduced to the job manager. A rejected job will result in a job submission error in the submitting client, and any job data in the KVS will be purged. No further callbacks except job.destroy will be made for rejected jobs. Note: If a job is not rejected, then the job.new callback will be invoked immediately after job.validate. This allows limits or other checks to be implemented in the job.validate callback, but accounting for those limits should be confined to the job.new callback, since job.new may also be called during job-manager restart or plugin reload.
- job.dependency.*
The job.dependency.* topic allows a dependency plugin to notify the job-manager that it handles a given dependency scheme. The job-manager will scan the attributes.system.dependencies array, if provided, and issue a job.dependency.SCHEME callback for each listed dependency. If no plugin has registered for SCHEME, then the job is rejected. The plugin should then call flux_jobtap_dependency_add(3) to add a new named dependency to the job (if necessary). Jobs with dependencies will remain in the DEPEND state until all dependencies are removed with a corresponding call to flux_jobtap_dependency_remove(3). See job.state.depend below for more information about dependencies. If there is an error in the dependency specification, the job may be rejected with flux_jobtap_reject_job(3) and a negative return code from the callback.
- job.new
The job.new topic announces a new valid job. It may be called in the same three situations listed for job.create.
- job.state.*
The job.state.* callbacks are made just after a job state transition. The callback is made after the state has been published to the job's eventlog, but before any action has been taken on that state (since the action could involve immediately transitioning to a new state).
- job.event.*
The job.event.* callbacks are only made for plugins that have explicitly subscribed to a job with flux_jobtap_job_subscribe(). In this case, all job events result in this callback being invoked on all subscribed plugins. This may be useful for plugins to get notification of events that do not necessarily result in a state transition, e.g. the start event or a non-fatal exception.
- job.state.depend
The callback for FLUX_JOB_STATE_DEPEND is the final place from which a plugin may add dependencies to a job. Dependencies are added via the flux_jobtap_dependency_add() function. This function allows a named dependency to be attached to a job. Jobs with dependencies will remain in the DEPEND state until all dependencies are removed with a corresponding call to flux_jobtap_dependency_remove(). A dependency may only be used once. A second call to flux_jobtap_dependency_add() with the same dependency description will return EEXIST, even if the dependency was subsequently removed. (This allows idempotent operation of plugin-managed dependencies for job-manager or plugin restart.)
- job.state.priority
The callback for FLUX_JOB_STATE_PRIORITY is special, in that a plugin must return a priority at the end of the callback (if the plugin is a priority-managing plugin). If the job priority is not available, the plugin should use flux_jobtap_priority_unavail() to indicate that the priority cannot be set. Jobs that do not have a priority due to unavailable priority or when no current priority plugin is loaded will remain in the PRIORITY state until a priority is assigned. Therefore, a plugin should arrange for the priority to be set asynchronously using flux_jobtap_reprioritize_job(). See the PRIORITY section for more detailed information about plugin management of job priority.
- job.state.sched
In the callback for FLUX_JOB_STATE_SCHED a plugin may set R in output args. In this case, if an R is not already assigned, then this will force R for the current job and bypass the scheduler.
- job.priority.get
The job manager calls the job.priority.get topic whenever it wants to update the job priority of a single job. The plugin should return a priority immediately, but if one is not available when a job is in the PRIORITY state, the plugin may use flux_jobtap_priority_unavail() to indicate the priority is not available. Returning an unavailable priority in the SCHED state is an error and it will be logged, but otherwise ignored. A call of job.priority.get can be requested for all jobs by calling flux_jobtap_reprioritize_all(). See the PRIORITY section for more information about plugin management of job priority.
- job.inactive-add
The job has transitioned to INACTIVE state and has been added to the inactive hash.
- job.inactive-remove
The job has been purged from the inactive hash.
- job.update
The job has been updated with an RFC 21 jobspec-update event.
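The once-only rule for dependencies described under job.state.depend can be modelled in a few lines of standalone C. This is a sketch of the documented semantics only, not the job manager's implementation: the dep_model structure and the dep_* functions are hypothetical, and the ENOSPC and ENOENT error codes are assumptions made for this model (only EEXIST comes from the text above).

```c
#include <errno.h>
#include <stdbool.h>
#include <string.h>

#define MAX_DEPS 8                  /* arbitrary limit for this sketch */

struct dep_model {
    const char *used[MAX_DEPS];     /* every description ever added */
    bool active[MAX_DEPS];          /* true while still outstanding */
    int count;
};

static int dep_find (const struct dep_model *m, const char *description)
{
    for (int i = 0; i < m->count; i++)
        if (strcmp (m->used[i], description) == 0)
            return i;
    return -1;
}

/* Add a named dependency; a description may only ever be used once. */
int dep_add (struct dep_model *m, const char *description)
{
    if (dep_find (m, description) >= 0) {
        errno = EEXIST;             /* also true after removal */
        return -1;
    }
    if (m->count == MAX_DEPS) {
        errno = ENOSPC;             /* assumption for this model */
        return -1;
    }
    m->used[m->count] = description;
    m->active[m->count] = true;
    m->count++;
    return 0;
}

/* Mark a dependency satisfied; it stays in the "used" list. */
int dep_remove (struct dep_model *m, const char *description)
{
    int i = dep_find (m, description);
    if (i < 0 || !m->active[i]) {
        errno = ENOENT;             /* assumption for this model */
        return -1;
    }
    m->active[i] = false;
    return 0;
}

/* The job may leave DEPEND only when nothing is outstanding. */
bool dep_all_satisfied (const struct dep_model *m)
{
    for (int i = 0; i < m->count; i++)
        if (m->active[i])
            return false;
    return true;
}
```

Because a removed description stays in the used list, a plugin that replays its dependency logic after a restart gets EEXIST instead of blocking the job a second time, which is the idempotence the text refers to.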
CONFIGURATION CALLBACK TOPIC
Jobtap plugins may register a conf.update callback. The current/proposed
configuration object is present in the input arguments under the conf key.
The callback is invoked in the following circumstances:
- When the plugin is first loaded. If the callback returns failure, the plugin load fails.
- Each time the configuration changes. If the callback returns failure, flux config reload fails.
The callback should return 0 on success, and -1 on failure. On failure,
it may optionally set a human readable error string in the errstr output
argument. The flux_jobtap_error() convenience function may be useful here.
JOB UPDATE CALLBACKS
The job manager allows updates of select job attributes through a
plugin-based scheme. Plugins may register a callback topic matching
job.update.KEY, where KEY is a period-delimited jobspec attribute,
e.g. job.update.attributes.system.duration. The requested updates are
passed as an additional argument to the plugin in the updates key.
The purpose of job.update.* callbacks is to enable plugins to allow or
deny the update of specific job attributes. Updates are denied by default
unless a callback exists for the updated attribute and the plugin returns 0
from the callback. Plugins deny an attribute update by returning -1 from
the callback, and may optionally set an error message to return to the
user with flux_jobtap_error(3).
After all updates in a request are allowed by plugins, the updated
jobspec is passed through the job.validate plugin stack to ensure the
result is valid. Plugins can note that an update is already validated by
setting a validated flag in the output arguments (FLUX_PLUGIN_ARG_OUT).
If all updated attributes have this flag, this validation step is skipped.
This can be useful, for example, to allow an instance owner to update a
job attribute beyond limits.
Some updates may benefit from a job feasibility check before the updates
are applied. This prevents a user from inadvertently causing a job that
was feasible at the time of submission to become infeasible through an
update. Because the update plugin is in the best position to determine
if a feasibility check should be completed for an update, feasibility
checks are only done if a feasibility flag in the output arguments (FLUX_PLUGIN_ARG_OUT)
is set. If any plugin for a set of updates requires a feasibility check,
then feasibility of the updated jobspec as a whole will be checked. If
the updated job is determined to be infeasible, then the update is aborted
and an error returned to the user.
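The decision rules above (deny by default, skip validation only when every update is marked validated, check feasibility when any update asks for it) can be summarized in a small standalone model. This is a sketch for reasoning about the rules, not job manager code; the update_result and update_plan structures and plan_update() are hypothetical names.

```c
#include <stdbool.h>
#include <stddef.h>

/* What the job.update.KEY callback reported for one updated attribute.
 * has_handler is false when no plugin registered for that key.
 */
struct update_result {
    bool has_handler;
    int rc;                 /* 0 = allow, -1 = deny */
    bool validated;         /* "validated" flag set in output args */
    bool feasibility;       /* "feasibility" flag set in output args */
};

struct update_plan {
    bool allowed;           /* every attribute update was allowed */
    bool run_validate;      /* pass result through job.validate */
    bool run_feasibility;   /* run a feasibility check */
};

struct update_plan plan_update (const struct update_result *r, size_t n)
{
    struct update_plan plan = { .allowed = true,
                                .run_validate = false,
                                .run_feasibility = false };

    for (size_t i = 0; i < n; i++) {
        /* Denied by default: no handler, or the handler returned -1 */
        if (!r[i].has_handler || r[i].rc != 0) {
            plan.allowed = false;
            plan.run_validate = false;
            plan.run_feasibility = false;
            return plan;
        }
        if (!r[i].validated)
            plan.run_validate = true;   /* one unvalidated key suffices */
        if (r[i].feasibility)
            plan.run_feasibility = true;
    }
    return plan;
}
```

The model deliberately says nothing about the order in which validation and feasibility checks run, since the text above does not specify it.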
The update of one attribute may require modification of other attributes.
For example, an update of attributes.system.queue may require
modification of attributes.system.constraints to apply the constraints
of the new queue. To support this use case, plugins may additionally push
an updates object onto the output arguments (FLUX_PLUGIN_ARG_OUT). This
object has the same form as the jobspec-update context defined in RFC 21.
For example, if a plugin wishes to update attributes.system.foo to 1, it
can set
{"updates": {"attributes.system.foo": 1}}
in the output arguments before returning. These additional updates are
merged into the requested updates, so this method could overwrite other
user-requested updates and caution is advised.
PLUGIN CALLBACK TOPICS
- plugin.query
The job manager calls the plugin.query callback topic to give a plugin the opportunity to provide extra data in response to a jobtap-query request (as used by the flux jobtap query PLUGIN command). This can be used by a plugin to export internal plugin state for inspection by an admin or user by placing the data in the output arguments of the callback, e.g.:
flux_plugin_arg_pack (args, FLUX_PLUGIN_ARG_OUT,
                      "{s:O}",
                      "data", internal_data);
PRIORITY
Custom assignment of job priority values is one of the core
features supported by the jobtap plugin interface. A builtin
.priority-default plugin is always loaded in the job-manager to
ensure that jobs move past the PRIORITY state when no other priority
plugin is loaded. The default plugin simply assigns the priority to
the same value as the current job urgency.
When loading a new jobtap plugin that assigns priority, keep in mind
that the .priority-default plugin may still be loaded. As a result, the
priority in the return arguments is always initialized to the job
urgency. However, since
plugin job.state.priority and job.priority.get callbacks are
run in order, any subsequently loaded plugin that assigns a priority
will overwrite the returned default priority and thus the last
loaded priority plugin will be active.
To ensure the default priority is always overridden, priority plugins
should therefore make sure to always set a priority, or use
flux_jobtap_priority_unavail() if the priority is not available,
in any callback in which a priority is expected to be returned, i.e.
job.state.priority and job.priority.get.
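The "last loaded plugin wins" behavior can be illustrated with a standalone model of the handler chain. This is not the job manager's dispatch code: the priority_handler_f type, the three handlers, and run_priority_chain() are hypothetical, and the factor of 1000 is an arbitrary stand-in for a custom priority algorithm.

```c
#include <stddef.h>
#include <stdint.h>

/* Stand-in for a priority callback: it may write *priority, or leave
 * the value set by an earlier handler untouched.
 */
typedef int (*priority_handler_f) (int urgency, int64_t *priority);

/* Models the builtin default: priority equals urgency. */
int default_priority (int urgency, int64_t *priority)
{
    *priority = urgency;
    return 0;
}

/* Hypothetical custom plugin that always sets a priority. */
int custom_priority (int urgency, int64_t *priority)
{
    *priority = (int64_t) urgency * 1000;
    return 0;
}

/* Hypothetical plugin that forgets to set a priority. */
int silent_priority (int urgency, int64_t *priority)
{
    (void) urgency;
    (void) priority;
    return 0;
}

/* Run handlers in load order; the last one to write wins. */
int64_t run_priority_chain (priority_handler_f *handlers,
                            size_t count,
                            int urgency)
{
    int64_t priority = -1;          /* placeholder for "not yet set" */

    for (size_t i = 0; i < count; i++)
        handlers[i] (urgency, &priority);
    return priority;
}
```

With the chain default_priority then custom_priority, the custom value is returned. With default_priority then silent_priority, the job silently keeps the urgency-based default, which is why a priority plugin should set a value on every call.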
To fully ensure priority plugins do not conflict, the builtin priority plugin may explicitly be removed with:
flux jobtap remove .priority-default
or via configuration (see flux-conf-job-manager(7)):
[job-manager]
plugins = [
  { remove = ".priority-default",
    load = "complex-priority.so"
  },
]
PROLOG AND EPILOG ACTIONS
Plugins that need to perform asynchronous tasks for jobs after an alloc
event but before the job is running, or after a finish event but before
resources are freed to the scheduler can make use of job manager prolog or
epilog actions.
Prolog and epilog actions are delineated by the following functions:
int flux_jobtap_prolog_start (flux_plugin_t *p,
                              const char *description);
int flux_jobtap_prolog_finish (flux_plugin_t *p,
                               flux_jobid_t id,
                               const char *description,
                               int status);
int flux_jobtap_epilog_start (flux_plugin_t *p,
                              const char *description);
int flux_jobtap_epilog_finish (flux_plugin_t *p,
                               flux_jobid_t id,
                               const char *description,
                               int status);
To initiate a prolog action, a plugin should call the function
flux_jobtap_prolog_start(). This will block the job from starting
even after resources have been assigned until a corresponding call to
flux_jobtap_prolog_finish() is made. While the status of the
prolog action is passed to flux_jobtap_prolog_finish() so it can be
captured in the eventlog, the action itself is responsible for raising
a job exception or taking other action on failure. That is, a non-zero
prolog finish status does not cause any automated behavior on the part of
the job manager. Similarly, the prolog description is used for
informational purposes only, so that multiple actions in an eventlog
may be differentiated.
Similarly, an epilog action is initiated with flux_jobtap_epilog_start(),
and prevents resources from being released to the scheduler until a
corresponding call to flux_jobtap_epilog_finish(). The same caveats
described for prolog actions regarding description and completion status
of epilog actions apply.
The flux_jobtap_prolog_start() function may be called at any time
before the start request is made to the execution system, though most
often from the job.state.run or job.event.alloc callbacks,
since this is the point at which a job has been allocated resources.
(Note: plugins will only receive the job.event.* callbacks for
jobs to which they have subscribed with a call to
flux_jobtap_job_subscribe()). A prolog action cannot be started
after a job enters the CLEANUP state.
The flux_jobtap_epilog_start() function may only be called after a
job is in the CLEANUP state, but before the free request has been
sent to the scheduler, for example from the job.state.cleanup
or job.event.finish callbacks.
If flux_jobtap_prolog_start(), flux_jobtap_prolog_finish(),
flux_jobtap_epilog_start() or flux_jobtap_epilog_finish() are
called for a job in an invalid state, these functions will return -1 with
errno set to EINVAL.
Multiple prolog or epilog actions can be active at the same time.
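The gating behavior of prolog and epilog actions can be pictured as two counters per job. The following standalone model is a sketch of the rules stated in this section, not the job manager's implementation; the job_model structure and model_* functions are hypothetical, and returning EINVAL from a finish call with no matching start is an assumption of the model.

```c
#include <errno.h>
#include <stdbool.h>

struct job_model {
    bool in_cleanup;        /* job has entered the CLEANUP state */
    int prolog_active;      /* prolog actions started, not finished */
    int epilog_active;      /* epilog actions started, not finished */
};

int model_prolog_start (struct job_model *job)
{
    if (job->in_cleanup) {  /* no prolog after CLEANUP */
        errno = EINVAL;
        return -1;
    }
    job->prolog_active++;
    return 0;
}

int model_prolog_finish (struct job_model *job)
{
    if (job->prolog_active == 0) {  /* assumption: unmatched finish */
        errno = EINVAL;
        return -1;
    }
    job->prolog_active--;
    return 0;
}

int model_epilog_start (struct job_model *job)
{
    if (!job->in_cleanup) { /* epilog only once in CLEANUP */
        errno = EINVAL;
        return -1;
    }
    job->epilog_active++;
    return 0;
}

int model_epilog_finish (struct job_model *job)
{
    if (job->epilog_active == 0) {  /* assumption: unmatched finish */
        errno = EINVAL;
        return -1;
    }
    job->epilog_active--;
    return 0;
}

/* The start request is held back while any prolog is active. */
bool model_can_start (const struct job_model *job)
{
    return !job->in_cleanup && job->prolog_active == 0;
}

/* Resources are not freed while any epilog is active. */
bool model_can_free (const struct job_model *job)
{
    return job->in_cleanup && job->epilog_active == 0;
}
```

Since several actions may be active at once, the job proceeds only after the last outstanding action of that kind has finished.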
CALLING OTHER PLUGINS
Plugins may invoke custom callbacks in other plugins using
flux_jobtap_call(). Note that topic strings starting with job.
are reserved for use by the job-manager and will cause this function
to fail immediately with errno set to EINVAL:
int flux_jobtap_call (flux_plugin_t *p,
                      flux_jobid_t id,
                      const char *topic,
                      flux_plugin_arg_t *args);
Much of the jobtap API assumes a current job, so a job id argument
is required. Note that args is passed unmodified when invoking
callbacks for topic, so the data listed in JOBTAP PLUGIN ARGUMENTS for
job id may not be present in newly created args unless manually
added by the caller. However, when invoked from another jobtap callback,
the existing args object along with FLUX_JOBTAP_CURRENT_JOB may be
used in flux_jobtap_call(), in which case args will still contain
the expected job arguments. For example, the following will call all
plugins registered for the topic custom.topic when the callback
is called:
int callback (flux_plugin_t *p,
              const char *topic,
              flux_plugin_arg_t *args,
              void *arg)
{
    return flux_jobtap_call (p,
                             FLUX_JOBTAP_CURRENT_JOB,
                             "custom.topic",
                             args);
}
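Since topic strings starting with job. are rejected with EINVAL, a plugin that builds topic names at runtime may want to screen them before calling flux_jobtap_call(). A minimal sketch of such a check follows; topic_is_reserved() is a hypothetical helper and not part of the jobtap API.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical helper: true if topic falls in the "job." namespace
 * reserved for the job-manager.
 */
bool topic_is_reserved (const char *topic)
{
    return strncmp (topic, "job.", 4) == 0;
}
```

Under this check custom.topic is acceptable, while job.state.run is reserved. A name such as jobs.custom is not reserved, because only the exact job. prefix counts.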
RESOURCES
Flux: http://flux-framework.org
Flux RFC: https://flux-framework.readthedocs.io/projects/flux-rfc
Issue Tracker: https://github.com/flux-framework/flux-core/issues