42/Subprocess Server Protocol

The subprocess server protocol is used for execution, monitoring, and standard I/O management of remote processes.

Name

github.com/flux-framework/rfc/spec_42.rst

Editor

Jim Garlick <garlick@llnl.gov>

State

raw

Language

The key words “MUST”, “MUST NOT”, “REQUIRED”, “SHALL”, “SHALL NOT”, “SHOULD”, “SHOULD NOT”, “RECOMMENDED”, “MAY”, and “OPTIONAL” in this document are to be interpreted as described in RFC 2119.

Background

The subprocess server protocol is implemented in two distinct Flux components:

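====================  ============  ================================
Name                  Service Name  Notes
====================  ============  ================================
Flux broker           rexec         Always available
sdexec broker module  sdexec        If systemd support is configured
====================  ============  ================================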

The primary use cases are:

  1. The job execution service runs job shells on job nodes.

  2. The job manager perilog plugin runs prolog/epilog scripts on job nodes.

  3. The instance owner runs arbitrary processes with flux exec.

In a multi-user Flux instance where a user transition is necessary in order for the instance owner to run commands with the credentials of a guest user, the subprocess server delegates this to the IMP. On its own, the subprocess server can only run processes with the credentials of the process it is embedded within (the broker, for example). For more detail, refer to RFC 15.

Goals

  • Run a command with configurable path, arguments, environment, and working directory.

  • Launch the command directly, without a remote shell.

  • Monitor the command for completion or error.

  • Forward standard I/O.

  • Optionally forward additional I/O channels.

  • Provide signal delivery capability.

  • Protect against unauthorized use.

Implementation

exec

The streaming exec RPC creates a new subprocess. Payloads are defined as follows:

exec request

The request SHALL consist of a JSON object with the following keys:

cmd

(object, REQUIRED) An object that defines the command.

See Command Object below.

flags

(integer, REQUIRED) A bitfield comprised of zero or more flags:

stdout (1)

Forward standard output to the client.

stderr (2)

Forward standard error to the client.

channel (4)

Forward auxiliary channel output to the client.
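The flags bitfield can be illustrated with a short Python sketch. The ExecFlag class and its member names are hypothetical and chosen only for illustration; the bit values 1, 2, and 4 are the ones listed above.

```python
from enum import IntFlag


class ExecFlag(IntFlag):
    """Bit values of the exec request "flags" key (class name is illustrative)."""
    STDOUT = 1   # forward standard output to the client
    STDERR = 2   # forward standard error to the client
    CHANNEL = 4  # forward auxiliary channel output to the client


# Ask for stdout and stderr forwarding, but no auxiliary channel output.
flags = ExecFlag.STDOUT | ExecFlag.STDERR

# The request payload carries the plain integer value.
flags_value = int(flags)

# Testing a single bit, as a server might do when deciding what to forward.
wants_channel = bool(flags & ExecFlag.CHANNEL)
```

With these definitions, flags_value is 3, which matches the flags value used in the exec request example at the end of this document.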

Several response types are distinguished by the type key:

exec started response

The remote process has been started.

The response SHALL consist of a JSON object with the following keys:

type

(string, REQUIRED) The response type with a value of started.

pid

(integer, REQUIRED) The remote process ID returned from the UNIX fork() system call.

exec stopped response

The remote process has been stopped due to a SIGSTOP signal. The response SHALL consist of a JSON object with the following keys:

type

(string, REQUIRED) The response type with a value of stopped.

Note

No response is generated if the process continues with SIGCONT.

exec finished response

The remote process is no longer running. The response SHALL consist of a JSON object with the following keys:

type

(string, REQUIRED) The response type with a value of finished.

status

(integer, REQUIRED) The UNIX wait() status value.
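As a hedged illustration, a client might interpret the status value with the standard wait status macros, which Python exposes in the os module. The function name describe_wait_status is hypothetical. The sketch assumes the client and the server run on platforms that encode wait status the same way, which this document does not state.

```python
import os


def describe_wait_status(status: int) -> str:
    """Summarize a UNIX wait() status value in human-readable form."""
    if os.WIFEXITED(status):
        # Normal termination: the exit code is embedded in the status.
        return f"exited with code {os.WEXITSTATUS(status)}"
    if os.WIFSIGNALED(status):
        # Abnormal termination: the terminating signal is embedded instead.
        return f"killed by signal {os.WTERMSIG(status)}"
    return f"unrecognized wait status {status}"
```

A status of 0 therefore describes a process that exited with code 0, as in the finished response of the example at the end of this document.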

exec output response

The remote process has produced output. The response SHALL consist of a JSON object with the following keys:

type

(string, REQUIRED) The response type with a value of output.

io

(object, REQUIRED) Output data from the process.

See I/O Object below.

exec error response

The exec response stream SHALL be terminated by an error response per RFC 6, with ENODATA (61) indicating success.

Failure of the remote command SHALL be indicated in the finished response and SHALL NOT result in an error response. Other errors, such as an ENOENT error from the execvp() system call, SHALL result in an error response.
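The response handling described in this section can be sketched as a small client-side state tracker. This is a minimal sketch, not the Flux client implementation: ExecState, handle_response, and handle_error are hypothetical names, and the transport that delivers payloads and error numbers is left out entirely.

```python
import errno
import os


class ExecState:
    """Client-side view of one remote process (illustrative only)."""

    def __init__(self):
        self.pid = None           # set by the started response
        self.stopped = False      # set by the stopped response
        self.status = None        # set by the finished response
        self.eof_streams = set()  # streams that have reported eof
        self.done = False         # set when the response stream terminates


def handle_response(state: ExecState, payload: dict) -> None:
    """Update state from one non-error exec response payload."""
    kind = payload["type"]
    if kind == "started":
        state.pid = payload["pid"]
    elif kind == "stopped":
        # No response is sent on SIGCONT, so this flag is never cleared here.
        state.stopped = True
    elif kind == "output":
        io = payload["io"]
        if io.get("eof", False):
            state.eof_streams.add(io["stream"])
    elif kind == "finished":
        state.status = payload["status"]
    else:
        raise ValueError(f"unknown exec response type: {kind!r}")


def handle_error(state: ExecState, errnum: int) -> None:
    """Handle the terminating error response; ENODATA means success."""
    state.done = True
    if errnum != errno.ENODATA:
        raise OSError(errnum, os.strerror(errnum))
```

Note that a nonzero exit of the remote command arrives through handle_response as a finished response with a nonzero status, while a failure to launch the command arrives through handle_error.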

write

The write RPC sends data to an I/O channel of a remote process. Valid I/O channel names MAY include stdin and auxiliary channel names specified in the exec request command object.

write request

The request SHALL consist of a JSON object with the following keys:

pid

(integer, REQUIRED) The process ID of the remote process.

io

(object, REQUIRED) Input data for the process.

See I/O Object below.

This request receives no response, thus the request message SHOULD set FLUX_MSGFLAG_NORESPONSE. Write requests to invalid channel names MAY be ignored by the subprocess server.
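A write request payload could be assembled as in the following sketch. The helper name make_write_request is hypothetical, the default rank of "0" is an assumption made only because the I/O object requires a rank key, and sending the message (including setting FLUX_MSGFLAG_NORESPONSE) is not shown.

```python
import json


def make_write_request(pid, stream, data=None, eof=False, rank="0"):
    """Build a write request payload for one I/O channel (illustrative)."""
    io = {"stream": stream, "rank": rank}
    if data is not None:
        io["data"] = data   # no "encoding" key, so UTF-8 is assumed
    if eof:
        io["eof"] = True    # signals that no more input will follow
    return {"pid": pid, "io": io}


# Send one line to stdin, then close it with a separate eof-only request.
first = make_write_request(1848495, "stdin", data="hello\n")
last = make_write_request(1848495, "stdin", eof=True)
encoded = json.dumps(first)
```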

kill

The kill RPC sends a signal to a remote process.

kill request

The request SHALL consist of a JSON object with the following keys:

pid

(integer, REQUIRED) The process ID of the remote process.

signum

(integer, REQUIRED) The signal number.

kill response

The successful response SHALL contain no payload.
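For illustration, a kill request payload might be built as follows. The helper name make_kill_request is hypothetical. The sketch takes the signal number from the client's own signal module, which assumes the client and server platforms number signals identically.

```python
import signal


def make_kill_request(pid: int, sig: signal.Signals) -> dict:
    """Build a kill request payload (illustrative helper)."""
    # The payload carries a plain integer, not a signal name.
    return {"pid": pid, "signum": int(sig)}


request = make_kill_request(1848495, signal.SIGTERM)
```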

Command Object

The subprocess server command object SHALL consist of a JSON object with the following keys:

cwd

(string, OPTIONAL) The current working directory.

If unspecified, the server working directory SHALL be used.

cmdline

(array of string, REQUIRED) The command and its arguments.

The array SHALL have at least one element.

env

(object, REQUIRED) A set of key-value pairs that define the command’s environment.

All values SHALL be of type string.

opts

(object, REQUIRED) A set of key-value pairs that set subprocess options.

All values SHALL be of type string.

Options are implementation dependent and are not specified here.

channels

(array of string, REQUIRED) A list of I/O channel names.

A socketpair SHALL be created for each channel and one end passed to the subprocess in an environment variable whose name is the same as the channel name.
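The constraints on the command object can be checked mechanically. The following is a minimal sketch of such a check, assuming nothing beyond the key definitions above; validate_command is a hypothetical name and is not part of any Flux API.

```python
def validate_command(cmd: dict) -> None:
    """Raise ValueError if cmd violates the command object definition."""
    for key in ("cmdline", "env", "opts", "channels"):
        if key not in cmd:
            raise ValueError(f"missing required key: {key}")

    cmdline = cmd["cmdline"]
    if not isinstance(cmdline, list) or len(cmdline) < 1:
        raise ValueError("cmdline must be an array with at least one element")
    if not all(isinstance(arg, str) for arg in cmdline):
        raise ValueError("cmdline elements must be strings")

    # env and opts are objects whose values are all strings.
    for key in ("env", "opts"):
        table = cmd[key]
        if not isinstance(table, dict):
            raise ValueError(f"{key} must be an object")
        if not all(isinstance(value, str) for value in table.values()):
            raise ValueError(f"all {key} values must be strings")

    channels = cmd["channels"]
    if not isinstance(channels, list) or not all(
        isinstance(name, str) for name in channels
    ):
        raise ValueError("channels must be an array of strings")

    # cwd is optional; the server working directory is used when absent.
    if "cwd" in cmd and not isinstance(cmd["cwd"], str):
        raise ValueError("cwd must be a string")
```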

I/O Object

The subprocess server io object is identical to the RFC 24 Data Event context.

It SHALL consist of a JSON object with the following keys:

stream

(string, REQUIRED) The stream name, such as stdout or stderr.

rank

(string, REQUIRED) An RFC 22 idset describing the source rank(s).

data

(string, OPTIONAL) Output data, encoded per encoding.

encoding

(string, OPTIONAL) Encoding type for data.

Possible values:

UTF-8

Encode as a UTF-8 string.

base64

Encode as a base64 string.

If not present, UTF-8 is assumed.

eof

(boolean, OPTIONAL) EOF indicator for stream.
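The interplay of data and encoding can be sketched as a pair of helpers. The names encode_io and decode_io are hypothetical. Falling back to base64 only when the bytes are not valid UTF-8 is one possible policy chosen for this sketch and is not mandated by this document.

```python
import base64


def encode_io(stream: str, rank: str, data: bytes = b"", eof: bool = False) -> dict:
    """Build an io object, choosing an encoding that can carry data."""
    io = {"stream": stream, "rank": rank}
    if data:
        try:
            # Valid UTF-8 is sent as-is; "encoding" may then be omitted.
            io["data"] = data.decode("utf-8")
        except UnicodeDecodeError:
            # Arbitrary binary data is carried as base64 text.
            io["data"] = base64.b64encode(data).decode("ascii")
            io["encoding"] = "base64"
    if eof:
        io["eof"] = True
    return io


def decode_io(io: dict) -> bytes:
    """Return the raw bytes carried by an io object (empty if no data)."""
    data = io.get("data")
    if data is None:
        return b""
    encoding = io.get("encoding", "UTF-8")  # UTF-8 is assumed when absent
    if encoding == "UTF-8":
        return data.encode("utf-8")
    if encoding == "base64":
        return base64.b64decode(data)
    raise ValueError(f"unsupported encoding: {encoding!r}")
```

Under this policy, ordinary text output produces an object with no encoding key, which is the form shown in the example responses at the end of this document.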

Example

exec request

{
  "cmd": {
    "cwd": "/home/test",
    "cmdline": [
      "hostname"
    ],
    "env": {
      "PATH": "/bin:/usr/bin:/home/test/bin"
    },
    "opts": {},
    "channels": []
  },
  "flags": 3
}

exec responses

{
  "type": "started",
  "pid": 1848495
}
{
  "type": "output",
  "pid": 1848495,
  "io": {
    "stream": "stdout",
    "rank": "0",
    "data": "system76-pc\n"
  }
}
{
  "type": "output",
  "pid": 1848495,
  "io": {
    "stream": "stderr",
    "rank": "0",
    "eof": true
  }
}
{
  "type": "output",
  "pid": 1848495,
  "io": {
    "stream": "stdout",
    "rank": "0",
    "eof": true
  }
}
{
  "type": "finished",
  "status": 0
}