flux-start(1)

SYNOPSIS

[launcher] flux start [OPTIONS] [initial-program [args...]]

flux start --test-size=N [OPTIONS] [initial-program [args...]]

DESCRIPTION

flux start assists with launching a new Flux instance, which consists of one or more flux-broker(1) processes functioning as a distributed system. It is primarily useful in environments that don't run Flux natively, or when a standalone Flux instance is required for test, development, or post-mortem debugging of another Flux instance.

When already running under Flux, single-user Flux instances can be more conveniently started with flux-batch(1) and flux-alloc(1). The Flux Administration Guide covers setting up a multi-user Flux "system instance", where Flux natively manages a cluster's resources and those commands work ab initio for its users.

flux start operates in two modes. In NORMAL MODE, it does not launch broker processes; it becomes a single broker which joins an externally bootstrapped parallel program. In TEST MODE, it starts one or more brokers locally, provides their bootstrap environment, and then cleans up when the instance terminates.

NORMAL MODE

Normal mode is used when an external launcher like Slurm or Hydra starts the broker processes and provides the bootstrap environment. It is selected when the --test-size option is not specified.

In normal mode, flux start replaces itself with a broker process by calling execvp(2). The brokers bootstrap as a parallel program and establish overlay network connections. The usual bootstrap method is some variant of the Process Management Interface (PMI) provided by the launcher.

For example, Hydra provides a simple PMI server. The following command starts brokers on the hosts listed in a file called hosts. The instance's initial program prints a URI that can be used with flux-proxy(1) and then sleeps forever:

mpiexec.hydra -f hosts -launcher ssh \
  flux start "flux uri --remote \$FLUX_URI; sleep inf"

Slurm has a PMI-2 server plugin with backwards compatibility to the simple PMI-1 wire protocol that Flux prefers. The following command starts a two node Flux instance in a Slurm allocation, with an interactive shell as the initial program (the default if none is specified):

srun -N2 --pty --mpi=pmi2 flux start

When Flux is started by a launcher that is not Flux, resources are probed using HWLOC. If all goes well when Slurm launches Flux, running flux resource info inside the new instance should show all the nodes, cores, and GPUs that Slurm allocated to the job.

TEST MODE

Test mode, selected by specifying the --test-size option, launches a single node Flux instance that is independent of any configured resource management on the node. In test mode, flux start provides the bootstrap environment and launches the broker process(es). It remains running as long as the Flux instance is running. Test mode covers the following use cases:

Start an interactive Flux instance on one node such as a developer system:
```
flux start --test-size=1
```
Jobs can be submitted from the interactive shell started as the initial program, similar to the experience of running on a one node cluster.
Mock a multi-node (multi-broker) Flux instance on one node:
```
flux start --test-size=64
```
When the test size is greater than one, the actual resource inventory is multiplied by the test size, since each broker thinks it is running on a different node and re-discovers the same resources.
Start a Flux instance to run a continuous integration test. A test that runs jobs in Flux can be structured as:
```
flux start --test-size=1 test.sh
```
where test.sh (the initial program) runs work under Flux. The exit status of flux start reflects the exit status of test.sh. This is how many of Flux's own tests work.
Start a Flux instance to access job data from an inactive batch job that was configured to leave a dump file:
```
flux start --test-size=1 --recovery=dump.tar
```
Start a Flux instance to repair the on-disk state of a crashed system instance (experts only):
```
sudo -u flux flux start --test-size=1 --recovery
```

Run the broker under gdb(1) from the source tree:

${top_builddir}/src/cmd/flux start --test-size=1 \
   --wrap=libtool,e,gdb
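
The inventory multiplication described for mock multi-broker instances is simple arithmetic. As a rough sketch (mock_total is a hypothetical helper, not a Flux command, and the 8-core node is an assumed example), the total that a mock instance would be expected to report can be estimated as:

```shell
# Estimate the resource total reported by a mock multi-broker instance.
# Each of the N brokers rediscovers the same local resources, so the
# apparent total is N times the real count.
# mock_total is a hypothetical helper, not part of Flux.
mock_total() {
    test_size=$1     # value given to --test-size
    local_count=$2   # real number of cores (or GPUs) on this node
    echo $(( test_size * local_count ))
}

mock_total 64 8    # 64 brokers on an 8-core node: prints 512
```

A mismatch between this estimate and what the instance reports would suggest that the brokers are not all discovering the same local resources.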

OPTIONS

-o, --broker-opts=OPTIONS: Add options to the message broker daemon, separated by commas.

-v, --verbose=[LEVEL]: This option may be specified multiple times, or with a value, to set a verbosity level (1: display commands before executing them, 2: trace PMI server requests in TEST MODE only).

-X, --noexec: Don't execute anything. This option is most useful with -v.

--caliper-profile=PROFILE: Run brokers with Caliper profiling enabled, using a Caliper configuration profile named PROFILE. Requires a version of Flux built with --enable-caliper. Unless CALI_LOG_VERBOSITY is already set in the environment, it will default to 0 for all brokers.

--rundir=DIR: (only with --test-size) Set the directory that will be used as the rundir directory for the instance. If the directory does not exist then it will be created during instance startup. If a DIR is not set with this option, a unique temporary directory will be created. Unless DIR was pre-existing, it will be removed when the instance is destroyed.

--wrap=ARGS: Wrap broker execution in a comma-separated list of arguments. This is useful for running flux-broker directly under debuggers or valgrind.

-s, --test-size=N: Launch an instance of size N on the local host.

--test-hosts=HOSTLIST: Set FLUX_FAKE_HOSTNAME in the environment of each broker so that the broker can bootstrap from a config file instead of PMI. HOSTLIST is assumed to be in rank order. The broker will use the fake hostname to find its entry in the configured bootstrap host array.

--test-exit-timeout=FSD: After a broker exits, kill the remaining brokers once the timeout expires (default 20s).

--test-exit-mode=MODE: Set the mode for the exit timeout. If set to leader, the exit timeout is only triggered upon exit of the leader broker, and the flux start exit code is that of the leader broker. If set to any, the exit timeout is triggered upon exit of any broker, and the flux start exit code is the highest exit code of all brokers. Default: any.
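
The exit code rule for the two modes can be modeled with a short shell function. This is only a sketch of the rule as worded above, not Flux's implementation; flux_start_exit_code is a hypothetical name, and the first exit code passed in is taken to be the leader broker's:

```shell
# Model of the documented --test-exit-mode exit code rule.
# Usage: flux_start_exit_code MODE LEADER_RC [OTHER_RC ...]
# Hypothetical helper for illustration; not part of Flux.
flux_start_exit_code() {
    mode=$1
    shift
    case $mode in
        leader)
            # Only the leader broker's exit code is reported.
            echo "$1"
            ;;
        any)
            # The highest exit code among all brokers is reported.
            highest=0
            for rc in "$@"; do
                if [ "$rc" -gt "$highest" ]; then
                    highest=$rc
                fi
            done
            echo "$highest"
            ;;
        *)
            echo "unknown exit mode: $mode" >&2
            return 1
            ;;
    esac
}

flux_start_exit_code any 0 3 1      # prints 3
flux_start_exit_code leader 0 3 1   # prints 0
```

With the same broker exit codes, any mode surfaces a follower's failure while leader mode hides it, which is worth keeping in mind when flux start is used to drive tests.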

--test-start-mode=MODE: Set the start mode. If set to all, all brokers are started immediately. If set to leader, only the leader is started. Hint: in leader mode, use --setattr=broker.quorum=1 to let the initial program start before the other brokers are online. Default: all.

--test-rundir=PATH: Set the directory to be used as the broker rundir instead of creating a temporary one. The directory must exist, and is not cleaned up unless --test-rundir-cleanup is also specified.

--test-rundir-cleanup: Recursively remove the directory specified with --test-rundir upon completion of flux start.

--test-pmi-clique=MODE: Set the pmi clique mode, which determines how PMI_process_mapping is set in the PMI server used to bootstrap the brokers. If none, the mapping is not created. If single, all brokers are placed in one clique. If per-broker, each broker is placed in its own clique. Default: single.

-r, --recovery=[TARGET]: Start the rank 0 broker of an instance in recovery mode. If TARGET is a directory, treat it as a statedir from a previous instance. If TARGET is a file, treat it as an archive file from flux-dump(1). If TARGET is unspecified, assume the system instance is to be recovered. In recovery mode, any rc1 errors are ignored, broker peers are not allowed to connect, and resources are offline.

--sysconfig: Run the broker with --config-path set to the default system instance configuration directory. This option is unnecessary if --recovery is specified without its optional argument. It may be required if recovering a dump from a system instance.

TROUBLESHOOTING

NORMAL MODE requires Flux, the launcher, and the network to cooperate. If flux start appears to hang, the following tips may be helpful:

Reduce the size of the Flux instance to at most two nodes. This reduces the volume of log data to look at and may be easier to allocate on a busy system. Rule out the simple problems that can be reproduced with a small allocation first.
Use an initial program that prints something and exits rather than the default interactive shell, in case there are problems with the launcher's pty setup. Something like:
```
[launcher] flux start [options] echo hello world
```
Ensure that standard output and error are being captured, and add the launcher option that prefixes each line of output with its rank: --label for Slurm, -prepend-rank for Hydra, or --label-io for flux-run(1).
Tell the broker to print its rank, size, and network endpoint by adding the flux start -o,-v option. If this doesn't happen, most likely the PMI bootstrap is getting stuck.
Trace Flux's PMI client on stderr by setting the FLUX_PMI_DEBUG environment variable:
```
FLUX_PMI_DEBUG=1 [launcher] flux start ...
```
Consider altering FLUX_PMI_CLIENT_METHODS to better match the launcher's PMI offerings. See flux-environment(7).
A launcher's PMI capabilities can also be explored in a simplified way using the flux-pmi(1) client.
If PMI is successful but the initial program fails to run, the brokers may not be able to reach each other over the network. After one minute, the rank 0 broker should log a "quorum delayed" message if this is true.
Examine the network endpoints that the brokers printed with -o,-v. Flux preferentially binds to the IPv4 network address associated with the default route and a random port. The address choice can be modified by setting FLUX_IPADDR_HOSTNAME and/or FLUX_IPADDR_V6. See flux-environment(7).
More logging can be enabled by adding the flux start -o,-Slog-stderr-level=7 option, which instructs the broker to forward its internal log buffer to stderr. See flux-broker-attributes(7).
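
Several of the tips above can be combined into one command line. The sketch below only prints the command rather than running it, so it can be reviewed first; flux_debug_cmd is a hypothetical helper, the Slurm launcher arguments are an example, and the two broker options are joined with commas as described for --broker-opts:

```shell
# Print (do not run) a flux start command line with the debugging aids
# from the tips above: PMI client tracing, broker endpoint output,
# stderr logging, and a trivial initial program.
# flux_debug_cmd is a hypothetical helper, not part of Flux.
flux_debug_cmd() {
    launcher=$1   # e.g. "srun -N2 --label --mpi=pmi2"
    printf '%s\n' "FLUX_PMI_DEBUG=1 $launcher flux start -o,-v,-Slog-stderr-level=7 echo hello world"
}

flux_debug_cmd "srun -N2 --label --mpi=pmi2"
```

The printed line can be pasted into a shell inside the allocation. Dropping individual pieces, such as FLUX_PMI_DEBUG=1 once PMI is known to work, keeps the output manageable.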

Another common failure mode is getting a single node instance when multiple nodes were expected. This can occur if no viable PMI server was found and the brokers fell back to singleton operation. It may be helpful to enable PMI tracing, check the launcher's PMI options, and possibly adjust the order of PMI client methods that Flux tries using FLUX_PMI_CLIENT_METHODS as described above.

Finally, if Flux starts but GPUs are missing from flux resource info output, verify that the version of HWLOC that Flux is using was built with the appropriate GPU plugins.

RESOURCES

Flux: http://flux-framework.org

Flux RFC: https://flux-framework.readthedocs.io/projects/flux-rfc
