16/KVS Job Schema
This specification describes the format of data stored in the KVS for Flux jobs.
Editor: Jim Garlick <email@example.com>
The key words “MUST”, “MUST NOT”, “REQUIRED”, “SHALL”, “SHALL NOT”, “SHOULD”, “SHOULD NOT”, “RECOMMENDED”, “MAY”, and “OPTIONAL” in this document are to be interpreted as described in RFC 2119.
Components that use the KVS job schema
Instance components have direct, read/write access to the primary KVS namespace:
Guest components have direct, read/write access to a private KVS namespace:
Command line tools
Job Life Cycle
A job is submitted to the ingest agent which validates jobspec, adds the job to the KVS, and informs the job manager of the new job. Upon success, the jobid is returned to the user. The job manager then takes the active role in moving a job through its life cycle:
If a job has dependencies, interacting with a job dependency subsystem to ensure they are met before proceeding.
Submitting an allocation request to the scheduler to obtain resources.
Once resources are allocated, submitting a start request to the exec service.
The exec service starts job shells directly in a single-user instance. In a multi-user instance, it directs the IMP to start them with guest credentials, with appropriate containment.
The job shell examines jobspec and allocated resource set, then launches tasks on local resources. It provides standard I/O, parallel bootstrap, signal propagation, and exit code collection services. It is a user-replaceable component.
Once tasks exit, or an exceptional condition such as cancellation or expiration of wall clock allocation occurs, the exec service cleans up any lingering tasks and job shells, and notifies the job manager, which frees resources back to the scheduler.
The job is now complete.
Primary KVS Namespace
The Flux instance has a default, shared namespace that is accessible only by the instance owner.
All job data is stored under a jobs directory in the primary namespace. Each job has a directory under job.<jobid>, where <jobid> is a unique sequence number assigned by the ingest agent.
Jobs listed in the jobs directory may need to be periodically archived and purged to keep its size manageable in long-running instances.
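As a minimal sketch of the key layout described above, the hypothetical helper below joins a job ID and optional path components into a dotted KVS key. The function name `job_key` and the plain decimal rendering of the job ID are assumptions made for illustration; they are not defined by this specification.

```python
def job_key(jobid: int, *parts: str) -> str:
    """Build a dotted KVS key under a job's directory (hypothetical helper)."""
    if jobid < 0:
        raise ValueError("jobid must be non-negative")
    # Assumption: the job ID is rendered as a plain decimal number.
    return ".".join(["job", str(jobid), *parts])


job_dir = job_key(42)                   # "job.42"
eventlog_key = job_key(42, "eventlog")  # "job.42.eventlog"
print(job_dir, eventlog_key)
```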
Guest KVS Namespace
A guest-writable KVS namespace is created by the exec service for the use of the job shell and the application. While the job is active, this namespace is linked from job.<jobid>.guest in the primary KVS namespace. While linked, it can be changed by the guest components without impacting performance of the primary namespace, while still being accessible through the link in the primary namespace.
When the job transitions to inactive, the final snapshot of the guest namespace content is linked by the exec service into the primary namespace, and the guest namespace is destroyed.
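The link-then-snapshot behavior described above can be illustrated with a toy in-memory model. This is only a sketch: `ToyKVS`, `Link`, the namespace name `guest-ns-42`, and all method names are hypothetical and do not correspond to any Flux API.

```python
import copy
from dataclasses import dataclass


@dataclass(frozen=True)
class Link:
    """Marker stored in the primary namespace that points at another namespace."""
    namespace: str


class ToyKVS:
    """In-memory stand-in for a KVS with several namespaces (illustration only)."""

    def __init__(self):
        self.namespaces = {"primary": {}}

    def create_guest(self, name, link_key):
        # Create the guest namespace and link it from the primary namespace.
        self.namespaces[name] = {}
        self.namespaces["primary"][link_key] = Link(name)

    def put_guest(self, name, key, value):
        # Guest writes touch only the guest namespace.
        self.namespaces[name][key] = value

    def get_via_primary(self, link_key, key):
        # Follow the link while the job is active; read the snapshot afterwards.
        entry = self.namespaces["primary"][link_key]
        if isinstance(entry, Link):
            entry = self.namespaces[entry.namespace]
        return entry[key]

    def finalize_guest(self, name, link_key):
        # Replace the link with a final snapshot, then destroy the guest namespace.
        self.namespaces["primary"][link_key] = copy.deepcopy(self.namespaces[name])
        del self.namespaces[name]


kvs = ToyKVS()
kvs.create_guest("guest-ns-42", "job.42.guest")
kvs.put_guest("guest-ns-42", "output", "hello")
before = kvs.get_via_primary("job.42.guest", "output")
kvs.finalize_guest("guest-ns-42", "job.42.guest")
after = kvs.get_via_primary("job.42.guest", "output")
```

In this model a reader that goes through the primary namespace sees the same data before and after the job becomes inactive, even though the guest namespace no longer exists afterwards.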
Access to Primary Namespace by Guest Users
Guests may access data in the primary KVS namespace only through instance services that allow selective guest access, by proxy or by staging copies to the guest namespace.
Guest access for the primary namespace eventlog is provided via a proxy service in the instance.
Active jobs undergo change represented as events that are recorded under job.<jobid>.eventlog. A KVS append operation is used to add events to this log. Each append consists of a string matching the format described in RFC 18.
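The sketch below shows one way an append-only eventlog could be built and read back. It assumes that RFC 18 encodes each event as a single-line JSON object with `timestamp`, `name`, and an optional `context`, terminated by a newline; RFC 18 remains the authoritative definition. The helper names and the event name `example-event` are hypothetical.

```python
import json
import time


def encode_event(name, context=None, timestamp=None):
    """Encode one event as a newline-terminated JSON line (assumed RFC 18 shape)."""
    event = {
        "timestamp": time.time() if timestamp is None else timestamp,
        "name": name,
    }
    if context is not None:
        event["context"] = context
    return json.dumps(event) + "\n"


def append_event(log, entry):
    """Model a KVS append: existing log content is never rewritten."""
    return log + entry


def decode_eventlog(log):
    """Parse eventlog text back into a list of event dictionaries."""
    return [json.loads(line) for line in log.splitlines() if line.strip()]


log = ""
log = append_event(log, encode_event("example-event", {"note": "first"}, timestamp=1.0))
log = append_event(log, encode_event("example-event", timestamp=2.0))
events = decode_eventlog(log)
```

Because each append adds exactly one complete line, a reader can process the log incrementally without re-parsing earlier entries.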
Content Produced by Ingest Agent
A user submits J with attached signature, as described in RFC 15.
The ingest agent validates J and if accepted, populates the KVS with:
signed user request token for passing to IMP in a multi-user instance.
jobspec in JSON form, as described in RFC 14
eventlog described above
The ingest agent logs one event to the eventlog:
job was submitted, with authenticated userid and urgency (0-31)
Content Consumed/Produced by Job Manager
Upon notification of a new job.<jobid>, the job manager takes the active role in moving a job through its life cycle, and logs events to the eventlog as described in RFC 21.
When the job manager is restarted, it recovers its state by scanning jobs and replaying the eventlog for each job found there.
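The restart recovery described above amounts to folding each job's eventlog into a current state. The following sketch assumes eventlogs are stored as JSON lines with a `name` key, and uses a deliberately toy transition table: the event and state names here are hypothetical placeholders, since the real ones are defined in RFC 21.

```python
import json

# Toy event-to-state table, for illustration only.
# RFC 21 defines the real events and states.
TOY_TRANSITIONS = {
    "toy-submitted": "waiting",
    "toy-started": "running",
    "toy-finished": "done",
}


def replay(eventlog_text, transitions, initial="unknown"):
    """Replay one eventlog in order and return the resulting state."""
    state = initial
    for line in eventlog_text.splitlines():
        if not line.strip():
            continue
        name = json.loads(line)["name"]
        # Events without an entry in the table leave the state unchanged.
        state = transitions.get(name, state)
    return state


def recover(eventlogs, transitions):
    """Rebuild a jobid -> state map from a jobid -> eventlog text map."""
    return {jobid: replay(text, transitions) for jobid, text in eventlogs.items()}


eventlogs = {
    1: '{"timestamp": 1.0, "name": "toy-submitted"}\n'
       '{"timestamp": 2.0, "name": "toy-started"}\n',
    2: '{"timestamp": 3.0, "name": "toy-submitted"}\n',
}
recovered = recover(eventlogs, TOY_TRANSITIONS)
```

Keeping all state derivable from the eventlog means the job manager needs no separate checkpoint; the log itself is the record of truth.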
Content Consumed/Produced by Scheduler
When the scheduler receives an allocation request containing a jobid,
it reads the jobspec from
The scheduler allocates resources by writing a resource set as described in RFC 20 to job.<jobid>.R and answering the allocation request.
The scheduler frees resources by answering the free request, leaving R in place for job provenance. During a restart, the job manager uses the eventlog to determine whether R is currently allocated.
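Because R is left in the KVS after resources are freed, its presence alone does not say whether the resources are still held; the eventlog has to be consulted. A minimal sketch of that check follows. The default event names `alloc` and `free` are assumptions made for illustration (RFC 21 is the authoritative source for event names), and `resources_allocated` is a hypothetical helper.

```python
def resources_allocated(event_names, alloc_event="alloc", free_event="free"):
    """Return True if the last allocation in the log has not been freed.

    event_names is the ordered list of event names from one job's eventlog.
    """
    allocated = False
    for name in event_names:
        if name == alloc_event:
            allocated = True
        elif name == free_event:
            allocated = False
    return allocated


still_held = resources_allocated(["submit", "alloc", "start"])
released = resources_allocated(["submit", "alloc", "start", "finish", "free"])
```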
Content Consumed/Produced by Exec Service
When the exec service receives a start request containing a jobid,
it reads the
and uses this information to launch job shells and subsequently tasks.
The exec service creates the job’s guest namespace and links it to
job.<jobid>.guest. Its initial contents are populated with
An eventlog for the use of job shells, TBD.
Once all job shells have exited and all outstanding writes to the guest namespace have stopped, the exec service links the guest namespace into the primary KVS namespace before notifying the job manager that the job is finished.
Content Produced/Consumed by Other Instance Services
Other services not mentioned in this RFC MAY store arbitrary data associated
with jobs under the
<service> is a name unique to the service producing the data.
For example, a job tracing service may store persistent trace data under
Content Consumed/Produced by Other Guest Services
Other guest services not mentioned in this RFC MAY store service-specific
data in the guest KVS namespace under
a name unique to the service producing the data.
Content Consumed/Produced by the Application
The application MAY store application-specific data in the guest KVS
Content Consumed/Produced by Tools
Tools such as parallel debuggers, running as the guest, MAY store data
in the guest KVS namespace under
a name unique to the tool producing the data.