flux-jobtap-plugins(7)

DESCRIPTION

The jobtap interface supports loading of builtin and external plugins into the job manager broker module. These plugins can be used to assign job priorities using algorithms other than the default, assign job dependencies, aid in debugging of the flow of job states, or generically extend the functionality of the job manager.

Jobtap plugins are defined using the Flux standard plugin format. A jobtap plugin should therefore export a single symbol, flux_plugin_init(), which uses flux_plugin_add_handler(3) to register the functions to be called for the callback topic strings described in the JOB CALLBACK TOPICS section below.

Each callback function uses the Flux standard plugin callback form, e.g.:

int callback (flux_plugin_t *p,
              const char *topic,
              flux_plugin_arg_t *args,
              void *arg);

where p is the handle for the current jobtap plugin, topic is the topic string for the currently invoked callback, args contains a set of plugin arguments which may be unpacked with the flux_plugin_arg_unpack(3) call, and arg is any opaque argument passed along when registering the handler.

Multiple plugins may be loaded in the job-manager simultaneously. In this case, all matching handlers are called in all loaded plugins in the order in which they were loaded. For more information about loading plugins see the flux-conf-job-manager(7) or flux-jobtap(1) manpage.

JOBTAP PLUGIN NAMES

Jobtap plugins are loaded into the job-manager and referenced in the output of flux jobtap list by file name. If a plugin is loaded by a fully qualified path, the plugin name is shortened to the basename, such that all dynamically loaded plugins have names such as plugin-name.so.

Builtin plugins, on the other hand, are named with a leading . and behave like hidden filesystem files: they are hidden in flux jobtap list and do not match the glob(7) * or the "all" keyword. To list builtin plugins, use the -a, --all option to flux jobtap list. To remove them, use the name explicitly or include the leading . in any pattern.
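The hiding rule for builtin plugins resembles shell globbing of dotfiles. The following is a minimal sketch of that behavior, assuming POSIX fnmatch(3) with FNM_PERIOD; the job manager's actual matching code may differ, and plugin_name_matches() is a hypothetical helper, not part of the jobtap API:

```c
#include <fnmatch.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical helper: return true if a plugin name is selected by
 * pattern. The "all" keyword is treated like "*". FNM_PERIOD requires
 * a leading '.' in the name to be matched by an explicit '.' in the
 * pattern, which keeps builtin plugins hidden from "*" and "all".
 */
bool plugin_name_matches (const char *pattern, const char *name)
{
    if (strcmp (pattern, "all") == 0)
        pattern = "*";
    return fnmatch (pattern, name, FNM_PERIOD) == 0;
}
```

Under this model, the pattern * selects plugin-name.so but not .priority-default, while .priority-default or .prio* selects the builtin plugin.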

A plugin may optionally assign a name with flux_plugin_set_name(3), however this name is not displayed in flux jobtap list or used in matching. The internal plugin name is only used as part of the service name generated by flux_jobtap_service_register(), i.e. the service name will be job-manager.<name>.<method>. If a plugin does not set a name with flux_plugin_set_name(3), then the basename of the plugin file will be used with the trailing .so removed.
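The two naming rules above can be illustrated with plain string handling. This is only a sketch of the described behavior, not the job manager's implementation; plugin_list_name() and default_service_name() are hypothetical helpers:

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: copy the basename of path into buf. This is the
 * name a dynamically loaded plugin is listed under.
 * Returns 0 on success, -1 if buf is too small.
 */
int plugin_list_name (const char *path, char *buf, size_t len)
{
    const char *base = strrchr (path, '/');

    base = base ? base + 1 : path;
    if (strlen (base) >= len)
        return -1;
    strcpy (buf, base);
    return 0;
}

/* Hypothetical helper: build "job-manager.<name>.<method>" for a plugin
 * that did not set its own name, so <name> is the file basename with a
 * trailing ".so" removed. Returns 0 on success, -1 on truncation.
 */
int default_service_name (const char *path,
                          const char *method,
                          char *buf,
                          size_t len)
{
    const char *base = strrchr (path, '/');
    size_t n;
    int rc;

    base = base ? base + 1 : path;
    n = strlen (base);
    if (n > 3 && strcmp (base + n - 3, ".so") == 0)
        n -= 3;
    rc = snprintf (buf, len, "job-manager.%.*s.%s", (int) n, base, method);
    return (rc < 0 || (size_t) rc >= len) ? -1 : 0;
}
```

For a plugin loaded from a path ending in plugin-name.so, the listed name would be plugin-name.so and a service method get would be registered as job-manager.plugin-name.get.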

JOBTAP PLUGIN ARGUMENTS

For job-specific callbacks, all job data is passed to the plugin via the flux_plugin_arg_t *args, and return data is sent back to the job manager via the same args. Incoming arguments may be unpacked using flux_plugin_arg_unpack(3), e.g.:

rc = flux_plugin_arg_unpack (args, FLUX_PLUGIN_ARG_IN,
                             "{s{s:o}, s:I}",
                             "jobspec", "resources", &resources,
                             "id", &id);

will unpack the resources section of jobspec and the jobid into resources and id respectively.

The full list of available args includes the following:

name        type  description
jobspec     o     jobspec with environment redacted
R           o     R with scheduling key redacted (RUN state or later)
id          I     jobid
state       i     current job state
prev_state  i     previous state (job.state.* callbacks)
userid      i     userid
urgency     i     current urgency
priority    I     current priority
t_submit    f     submit timestamp in floating point seconds
entry       o     posted eventlog entry, including context
end_event   o     copy of the event that caused the transition to CLEANUP, if available

Return arguments can be packed using the FLUX_PLUGIN_ARG_OUT and optionally FLUX_PLUGIN_ARG_REPLACE flags. For example to return a priority:

rc = flux_plugin_arg_pack (args, FLUX_PLUGIN_ARG_OUT,
                           "{s:I}",
                           "priority", (int64_t) priority);

While a job is pending, jobtap plugin callbacks may also add job annotations by returning a value for the annotations key:

flux_plugin_arg_pack (args, FLUX_PLUGIN_ARG_OUT,
                      "{s:{s:s}}",
                      "annotations", "test", value);

JOB CALLBACK TOPICS

The following job callback "topic strings" are currently provided by the jobtap interface:

job.create

The job.create topic notifies a jobtap plugin about a newly introduced job. This call may be made in three different situations:

1. on job submission

2. when the job manager is restarted and has reloaded a job from the KVS

3. when a new jobtap plugin is loaded

In case 1 above, the job state will always be FLUX_JOB_STATE_NEW, while jobs in cases 2 and 3 can be in any state except FLUX_JOB_STATE_INACTIVE.

In case 1, the job is not yet validated. If necessary, job.create may reject the job in the same manner as job.validate using flux_jobtap_reject_job(3) and a negative return code from the callback.

In cases 2 and 3, fatal errors may be handled by raising a fatal job exception, as usual.

It is safe to post events from a job.create handler in all cases.

Note

In case 3 job.create is called for active jobs in unspecified order. If a plugin requires an ordering guarantee, the plugin should call flux_jobtap_set_load_sort_order(3) from the flux_plugin_init() callback. This function takes a mode parameter of either state, to sort jobs by state (then jobid), or -state to sort by reverse state (then jobid). For example

flux_jobtap_set_load_sort_order (p, "state");

will ensure that job.create and job.new are called on jobs in DEPEND first, then PRIORITY, then SCHED, and so on.
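The two sort modes can be pictured as qsort(3) comparators. This is a sketch under assumptions, not the job manager's code: struct job_entry and the ST_* values are stand-ins, chosen so that state values increase in lifecycle order (DEPEND, PRIORITY, SCHED, RUN, CLEANUP).

```c
#include <stdint.h>
#include <stdlib.h>

/* Stand-in state values, assumed to increase in lifecycle order */
enum {
    ST_DEPEND = 2,
    ST_PRIORITY = 4,
    ST_SCHED = 8,
    ST_RUN = 16,
    ST_CLEANUP = 32,
};

/* Stand-in for the job manager's internal job record */
struct job_entry {
    uint64_t id;
    int state;
};

/* Tie-breaker shared by both modes: ascending jobid */
static int cmp_id (const struct job_entry *x, const struct job_entry *y)
{
    if (x->id != y->id)
        return x->id < y->id ? -1 : 1;
    return 0;
}

/* "state" mode: ascending state, then jobid */
int cmp_state (const void *a, const void *b)
{
    const struct job_entry *x = a;
    const struct job_entry *y = b;

    if (x->state != y->state)
        return x->state < y->state ? -1 : 1;
    return cmp_id (x, y);
}

/* "-state" mode: descending state, then jobid */
int cmp_state_reverse (const void *a, const void *b)
{
    const struct job_entry *x = a;
    const struct job_entry *y = b;

    if (x->state != y->state)
        return x->state > y->state ? -1 : 1;
    return cmp_id (x, y);
}
```

Sorting an array of active jobs with qsort (jobs, n, sizeof (jobs[0]), cmp_state) then yields the callback order described for the state mode.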

job.destroy

The job.destroy topic is called after a job is rejected or becomes inactive.

job.validate

The job.validate topic allows a plugin to reject a job before it is introduced to the job manager. A rejected job will result in a job submission error in the submitting client, and any job data in the KVS will be purged. No further callbacks except job.destroy will be made for rejected jobs. Note: If a job is not rejected, then the job.new callback will be invoked immediately after job.validate. This allows limits or other checks to be implemented in the job.validate callback, but accounting for those limits should be confined to the job.new callback, since job.new may also be called during job-manager restart or plugin reload.

job.dependency.*

The job.dependency.* topic allows a dependency plugin to notify the job-manager that it handles a given dependency scheme. The job-manager will scan the attributes.system.dependencies array, if provided, and issue a job.dependency.SCHEME callback for each listed dependency. If no plugin has registered for SCHEME, then the job is rejected. The plugin should then call flux_jobtap_dependency_add(3) to add a new named dependency to the job (if necessary). Jobs with dependencies will remain in the DEPEND state until all dependencies are removed with a corresponding call to flux_jobtap_dependency_remove(3). See job.state.depend below for more information about dependencies. If there is an error in the dependency specification, the job may be rejected with flux_jobtap_reject_job(3) and a negative return code from the callback.

job.new

The job.new topic announces a new valid job. It may be called in the same three situations listed for job.create.

job.state.*

The job.state.* callbacks are made just after a job state transition. The callback is made after the state has been published to the job's eventlog, but before any action has been taken on that state (since the action could involve immediately transitioning to a new state).

job.event.*

The job.event.* callbacks are only made for plugins that have explicitly subscribed to a job with flux_jobtap_job_subscribe(). In this case, all job events result in this callback being invoked on all subscribed plugins. This may be useful for plugins to get notification of events that do not necessarily result in a state transition, e.g. the start event or a non-fatal exception.

job.state.depend

The callback for FLUX_JOB_STATE_DEPEND is the final place from which a plugin may add dependencies to a job. Dependencies are added via the flux_jobtap_dependency_add() function. This function allows a named dependency to be attached to a job. Jobs with dependencies will remain in the DEPEND state until all dependencies are removed with a corresponding call to flux_jobtap_dependency_remove(). A dependency may only be used once. A second call to flux_jobtap_dependency_add() with the same dependency description will return EEXIST, even if the dependency was subsequently removed. (This allows idempotent operation of plugin-managed dependencies for job-manager or plugin restart).
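The single-use rule can be modeled with a small per-job table that remembers every description ever added. The sketch below is a standalone model, not the jobtap implementation: struct job_deps and the deps_*() functions are hypothetical, and failure is reported as -1 with errno set to EEXIST, which is an assumption about the error convention.

```c
#include <errno.h>
#include <string.h>

#define MAX_DEPS 16

struct dep {
    char description[64];
    int active;             /* 1 until removed */
};

/* Hypothetical per-job record of every dependency ever added */
struct job_deps {
    struct dep deps[MAX_DEPS];
    int count;
};

/* Add a named dependency. Fails with EEXIST if the description was
 * used before on this job, even if it has since been removed.
 */
int deps_add (struct job_deps *jd, const char *description)
{
    for (int i = 0; i < jd->count; i++) {
        if (strcmp (jd->deps[i].description, description) == 0) {
            errno = EEXIST;
            return -1;
        }
    }
    if (jd->count == MAX_DEPS
        || strlen (description) >= sizeof (jd->deps[0].description)) {
        errno = ENOMEM;
        return -1;
    }
    strcpy (jd->deps[jd->count].description, description);
    jd->deps[jd->count].active = 1;
    jd->count++;
    return 0;
}

/* Remove an active dependency; the entry is kept so it cannot be reused */
int deps_remove (struct job_deps *jd, const char *description)
{
    for (int i = 0; i < jd->count; i++) {
        if (jd->deps[i].active
            && strcmp (jd->deps[i].description, description) == 0) {
            jd->deps[i].active = 0;
            return 0;
        }
    }
    errno = ENOENT;
    return -1;
}

/* The job may leave DEPEND only when this returns 0 */
int deps_outstanding (const struct job_deps *jd)
{
    int n = 0;

    for (int i = 0; i < jd->count; i++)
        n += jd->deps[i].active;
    return n;
}
```

Because a removed entry is retained, a plugin that replays the same add after a restart gets EEXIST instead of re-blocking a job whose dependency was already satisfied.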

job.state.priority

The callback for FLUX_JOB_STATE_PRIORITY is special, in that a plugin must return a priority at the end of the callback (if the plugin is a priority-managing plugin). If the job priority is not available, the plugin should use flux_jobtap_priority_unavail() to indicate that the priority cannot be set. Jobs that do not have a priority due to unavailable priority or when no current priority plugin is loaded will remain in the PRIORITY state until a priority is assigned. Therefore, a plugin should arrange for the priority to be set asynchronously using flux_jobtap_reprioritize_job(). See the PRIORITY section for more detailed information about plugin management of job priority.

job.state.sched

In the callback for FLUX_JOB_STATE_SCHED a plugin may set R in output args. In this case, if an R is not already assigned, then this will force R for the current job and bypass the scheduler.

job.priority.get

The job manager calls the job.priority.get topic whenever it wants to update the job priority of a single job. The plugin should return a priority immediately, but if one is not available when a job is in the PRIORITY state, the plugin may use flux_jobtap_priority_unavail() to indicate the priority is not available. Returning an unavailable priority in the SCHED state is an error and it will be logged, but otherwise ignored. A call of job.priority.get can be requested for all jobs by calling flux_jobtap_reprioritize_all(). See the PRIORITY section for more information about plugin management of job priority.

job.inactive-add

The job has transitioned to INACTIVE state and has been added to the inactive hash.

job.inactive-remove

The job has been purged from the inactive hash.

job.update

The job has been updated with an RFC 21 jobspec-update event.

CONFIGURATION CALLBACK TOPIC

Jobtap plugins may register a conf.update callback. The current/proposed configuration object is present in the input arguments under the conf key. The callback is invoked in the following circumstances:

When the plugin is first loaded. If the callback returns failure, the plugin load fails.

Each time the configuration changes. If the callback returns failure, flux config reload fails.

The callback should return 0 on success, and -1 on failure. On failure, it may optionally set a human readable error string in the errstr output argument. The flux_jobtap_error() convenience function may be useful here.

JOB UPDATE CALLBACKS

The job manager allows updates of select job attributes through a plugin-based scheme. Plugins may register a callback topic matching job.update.KEY, where KEY is a period-delimited jobspec attribute, e.g. job.update.attributes.system.duration. The requested updates are passed as an additional argument to the plugin in the updates key.

The purpose of job.update.* callbacks is to enable plugins to allow or deny the update of specific job attributes. Updates are denied by default unless a callback exists for the updated attribute and the plugin returns 0 from the callback. Plugins deny an attribute update by returning -1 from the callback, and may optionally set an error message to return to the user with flux_jobtap_error(3).
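The default-deny rule can be modeled as a topic lookup. The sketch below is illustrative only: struct update_handler and update_allowed() are hypothetical, and each handler is reduced to the return code its callback would produce.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for a registered job.update.KEY callback */
struct update_handler {
    const char *topic;  /* e.g. "job.update.attributes.system.duration" */
    int rc;             /* what the callback would return: 0 or -1 */
};

/* Return true only if a handler is registered for job.update.KEY and
 * that handler returns 0. A missing handler means the update is denied.
 */
bool update_allowed (const struct update_handler *tab,
                     size_t count,
                     const char *key)
{
    char topic[256];
    int n = snprintf (topic, sizeof (topic), "job.update.%s", key);

    if (n < 0 || (size_t) n >= sizeof (topic))
        return false;
    for (size_t i = 0; i < count; i++) {
        if (strcmp (tab[i].topic, topic) == 0)
            return tab[i].rc == 0;
    }
    return false;
}
```

With a single handler registered for job.update.attributes.system.duration that returns 0, an update of attributes.system.duration is allowed, while an update of any other key is denied because no callback exists for it.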

After all updates in a request are allowed by plugins, the updated jobspec is passed through the job.validate plugin stack to ensure the result is valid. Plugins can note that an update is already validated by setting a validated flag in the output arguments (FLUX_PLUGIN_ARG_OUT). If all updated attributes have this flag, then this validation step is skipped. This can be useful, for example, to allow an instance owner to update a job attribute beyond limits.

Some updates may benefit from a job feasibility check before the updates are applied. This prevents a user from inadvertently causing a job that was feasible at the time of submission to become infeasible through an update. Because the update plugin is in the best position to determine if a feasibility check should be completed for an update, feasibility checks are only done if a feasibility flag is set in the output arguments (FLUX_PLUGIN_ARG_OUT). If any plugin for a set of updates requires a feasibility check, then feasibility of the updated jobspec as a whole will be checked. If the updated job is determined to be infeasible, then the update is aborted and an error is returned to the user.

The update of one attribute may require modification of other attributes. For example, an update of attributes.system.queue may require modification of attributes.system.constraints to apply the constraints of the new queue. To support this use case, plugins may additionally push an updates object onto the output arguments (FLUX_PLUGIN_ARG_OUT). This object has the same form as the jobspec-update context defined in RFC 21. For example, if a plugin wishes to update attributes.system.foo to 1, it can set

{"updates": {"attributes.system.foo": 1}}

in the output arguments (FLUX_PLUGIN_ARG_OUT) before returning. These updates are merged into the requested updates, so this method can overwrite other user-requested updates; use it with caution.

PLUGIN CALLBACK TOPICS

plugin.query

The job manager calls the plugin.query callback topic to give a plugin the opportunity to provide extra data in response to a jobtap-query request (as used by the flux jobtap query PLUGIN command). This can be used by a plugin to export internal plugin state for inspection by an admin or user by placing the data in the output arguments of the callback, e.g.:

flux_plugin_arg_pack (args, FLUX_PLUGIN_ARG_OUT,
                      "{s:O}",
                      "data", internal_data);

PRIORITY

Custom assignment of job priority values is one of the core features supported by the jobtap plugin interface. A builtin .priority-default plugin is always loaded in the job-manager to ensure that jobs move past the PRIORITY state when no other priority plugin is loaded. The default plugin simply assigns the priority to the same value as the current job urgency.

When loading a new jobtap plugin that assigns priority, keep in mind that the .priority-default plugin may still be loaded. As a result, the priority in the return arguments is always initialized to the job urgency. However, since plugin job.state.priority and job.priority.get callbacks are run in load order, any subsequently loaded plugin that assigns a priority will overwrite the returned default priority, and thus the last loaded priority plugin will be active.

To ensure the default priority is always overridden, priority plugins should always set a priority, or use flux_jobtap_priority_unavail() if the priority is not available, in any callback in which a priority is expected to be returned, i.e. job.state.priority and job.priority.get.
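The "last loaded plugin wins" behavior can be modeled as a chain of handlers that each may overwrite a shared output value. This is a standalone sketch, not jobtap code: priority_f and the three handlers are hypothetical stand-ins for job.state.priority callbacks, and the factor of 1000 is an arbitrary example policy.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a priority callback: it may overwrite
 * *priority, or leave it untouched.
 */
typedef void (*priority_f) (int urgency, int64_t *priority);

/* Models the builtin default: priority equals urgency */
void default_priority (int urgency, int64_t *priority)
{
    *priority = urgency;
}

/* Models a custom priority plugin that always sets a value */
void custom_priority (int urgency, int64_t *priority)
{
    *priority = (int64_t) urgency * 1000;
}

/* Models a plugin that handles the topic but never sets a priority */
void passive_plugin (int urgency, int64_t *priority)
{
    (void) urgency;
    (void) priority;
}

/* Run handlers in load order; the last one to write determines the result */
int64_t run_priority_stack (const priority_f *handlers,
                            size_t count,
                            int urgency)
{
    int64_t priority = -1;

    for (size_t i = 0; i < count; i++)
        handlers[i] (urgency, &priority);
    return priority;
}
```

With default_priority followed by custom_priority and an urgency of 16, the result is 16000. If the second handler is passive_plugin instead, the default value of 16 passes through unchanged, which is why a priority plugin should set a value in every such callback.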

To fully ensure priority plugins do not conflict, the builtin priority plugin may explicitly be removed with

flux jobtap remove .priority-default

or via configuration (See flux-conf-job-manager(7))

[job-manager]
plugins = [
  { remove = ".priority-default", load = "complex-priority.so" },
]

PROLOG AND EPILOG ACTIONS

Plugins that need to perform asynchronous tasks for jobs after an alloc event but before the job is running, or after a finish event but before resources are freed to the scheduler can make use of job manager prolog or epilog actions.

Prolog and epilog actions are delineated by the following functions:

int flux_jobtap_prolog_start (flux_plugin_t *p,
                              const char *description);

int flux_jobtap_prolog_finish (flux_plugin_t *p,
                               flux_jobid_t id,
                               const char *description,
                               int status);

int flux_jobtap_epilog_start (flux_plugin_t *p,
                              const char *description);

int flux_jobtap_epilog_finish (flux_plugin_t *p,
                               flux_jobid_t id,
                               const char *description,
                               int status);

To initiate a prolog action, a plugin should call the function flux_jobtap_prolog_start(). This will block the job from starting, even after resources have been assigned, until a corresponding call to flux_jobtap_prolog_finish() has been made. While the status of the prolog action is passed to flux_jobtap_prolog_finish() so it can be captured in the eventlog, the action itself is responsible for raising a job exception or taking other action on failure. That is, a non-zero prolog finish status does not cause any automated behavior on the part of the job manager. Similarly, the prolog description is used for informational purposes only, so that multiple actions in an eventlog may be differentiated.

Similarly, an epilog action is initiated with flux_jobtap_epilog_start(), and prevents resources from being released to the scheduler until a corresponding call to flux_jobtap_epilog_finish(). The same caveats described for prolog actions regarding description and completion status of epilog actions apply.

The flux_jobtap_prolog_start() function may be called at any time before the start request is made to the execution system, though it is most often called from the job.state.run or job.event.alloc callbacks, since this is the point at which a job has been allocated resources. (Note: plugins will only receive the job.event.* callbacks for jobs to which they have subscribed with a call to flux_jobtap_job_subscribe().) A prolog action cannot be started after a job enters the CLEANUP state.

The flux_jobtap_epilog_start() function may only be called after a job is in the CLEANUP state, but before the free request has been sent to the scheduler, for example from the job.state.cleanup or job.event.finish callbacks.

If flux_jobtap_prolog_start(), flux_jobtap_prolog_finish(), flux_jobtap_epilog_start() or flux_jobtap_epilog_finish() is called for a job in an invalid state, the function will return -1 with errno set to EINVAL.

Multiple prolog or epilog actions can be active at the same time.
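The gating rules in this section can be summarized with a small counter-based model. It is a sketch of the described behavior only: struct job_model and the model_*() functions are hypothetical, and treating a finish without a matching start as EINVAL is an assumption.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical model of the per-job state relevant to prolog/epilog */
struct job_model {
    bool cleanup;   /* job has entered CLEANUP */
    int prologs;    /* active prolog actions */
    int epilogs;    /* active epilog actions */
};

/* A prolog may not be started once the job is in CLEANUP */
int model_prolog_start (struct job_model *job)
{
    if (job->cleanup) {
        errno = EINVAL;
        return -1;
    }
    job->prologs++;
    return 0;
}

int model_prolog_finish (struct job_model *job)
{
    if (job->prologs == 0) {
        errno = EINVAL;
        return -1;
    }
    job->prologs--;
    return 0;
}

/* An epilog may only be started after the job is in CLEANUP */
int model_epilog_start (struct job_model *job)
{
    if (!job->cleanup) {
        errno = EINVAL;
        return -1;
    }
    job->epilogs++;
    return 0;
}

int model_epilog_finish (struct job_model *job)
{
    if (job->epilogs == 0) {
        errno = EINVAL;
        return -1;
    }
    job->epilogs--;
    return 0;
}

/* The job may start only when every prolog action has finished */
bool model_can_start (const struct job_model *job)
{
    return !job->cleanup && job->prologs == 0;
}

/* Resources may be freed only when every epilog action has finished */
bool model_can_free (const struct job_model *job)
{
    return job->cleanup && job->epilogs == 0;
}
```

Using counters rather than a single flag reflects that several actions may be active at once: the job is released only after the last matching finish call.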

CALLING OTHER PLUGINS

Plugins may invoke custom callbacks in other plugins using flux_jobtap_call(). Note that topic strings starting with job. are reserved for use by the job-manager and will cause this function to fail immediately with errno set to EINVAL:

int flux_jobtap_call (flux_plugin_t *p,
                      flux_jobid_t id,
                      const char *topic,
                      flux_plugin_arg_t *args);

Much of the jobtap API assumes a current job, so a job id argument is required. Note that args is passed unmodified when invoking callbacks for topic, so the data listed in JOBTAP PLUGIN ARGUMENTS for job id may not be present in newly created args unless manually added by the caller. However, when invoked from another jobtap callback, the existing args object may be used with FLUX_JOBTAP_CURRENT_JOB in flux_jobtap_call(), in which case args will still contain the expected job arguments. For example, the following will call all plugins registered for the topic custom.topic when the callback is called:

int callback (flux_plugin_t *p,
              const char *topic,
              flux_plugin_arg_t *args,
              void *arg)
{
     return flux_jobtap_call (p,
                              FLUX_JOBTAP_CURRENT_JOB,
                              "custom.topic",
                              args);
}

RESOURCES

Flux: http://flux-framework.org

Flux RFC: https://flux-framework.readthedocs.io/projects/flux-rfc

Issue Tracker: https://github.com/flux-framework/flux-core/issues