Some frequently asked questions about Flux and their answers.
Does Flux run on a Mac?
How do I report a bug?
Why is Flux not discovering and managing all of the resources on the system/node?
This can be due to various bind flags that need to be passed to the parallel launcher that started Flux. For example, at LLNL you must pass
On all systems, Flux also relies on hwloc to auto-detect the on-node resources
available for scheduling. The hwloc that Flux is linked against must be configured with
--enable-cuda for Flux to be able to detect NVIDIA GPUs.
You can check whether your system's default hwloc is CUDA-enabled with:
lstopo | grep CoProc
On a node with NVIDIA GPUs, if no output is produced, then your hwloc is not CUDA-enabled.
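If you run this check in scripts (for example, a site health check), a small wrapper can turn it into an explicit message and exit status. The sketch below is a hypothetical helper, not part of Flux or hwloc: it reads lstopo output on standard input and applies the same grep CoProc test shown above, so it carries the same caveat and is only meaningful on a node that actually has NVIDIA GPUs.

```shell
#!/bin/sh
# Hypothetical helper (not part of Flux or hwloc): applies the same
# "grep CoProc" test to lstopo output read from standard input.
# Exit status 0 means at least one CoProc object was listed.
hwloc_lists_coproc() {
  if grep -q 'CoProc'; then
    echo "CoProc objects found: hwloc appears to be CUDA-enabled"
  else
    echo "No CoProc objects found: hwloc is likely not CUDA-enabled" >&2
    return 1
  fi
}

# Usage on a node with NVIDIA GPUs (requires hwloc's lstopo in PATH):
#   lstopo | hwloc_lists_coproc
```

Reading from standard input rather than calling lstopo directly also lets you feed it saved lstopo output collected from another node.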
How do I efficiently launch a large number of jobs to Flux?
Memory exhaustion on a node when running large ensembles with Flux
Flux’s in-memory KVS is backed by an on-disk content store. The default location for the content store is
/tmp, which on some systems is configured as a RAM disk. To minimize Flux’s memory footprint at the cost of a slower content store, set the
rundir broker attribute so that the content store is saved not to
/tmp but to a disk-backed filesystem.
For example, rundir can be set when starting an instance with:
flux start -o,-Srundir=/path/to/rundir
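Whether /tmp is RAM-backed varies by system. On Linux, one way to find out is to ask for the filesystem type of the directory. The sketch below assumes GNU coreutils stat (its -f and -c %T options) and treats tmpfs and ramfs as RAM-backed; is_ram_backed is a hypothetical helper name, not a Flux command.

```shell
#!/bin/sh
# Hypothetical helper: succeed (exit 0) if directory $1 is on a RAM-backed
# filesystem. Assumes GNU coreutils `stat`: `-f` queries the filesystem
# instead of the file, and `%T` prints its type name (e.g. "tmpfs", "xfs").
is_ram_backed() {
  # A missing or unreadable path is treated as "not RAM-backed".
  fstype=$(stat -f -c %T "$1" 2>/dev/null) || return 1
  case "$fstype" in
    tmpfs|ramfs) return 0 ;;
    *) return 1 ;;
  esac
}

if is_ram_backed /tmp; then
  echo "/tmp is RAM-backed; consider setting rundir to a disk-backed path"
else
  echo "/tmp does not appear to be RAM-backed"
fi
```

If /tmp is reported as RAM-backed, point rundir at a directory on a disk-backed filesystem, as in the flux start command above.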
How do I mimic Slurm’s job step semantics?
Submitting a script that contains multiple flux mini run invocations with flux mini submit will not result in Slurm-style job steps unless the job script is prefixed with flux start.
Flux is failing to bootstrap a specific MPI implementation (e.g. OpenMPI, MPICH, Spectrum MPI)
Flux’s shell plugins for Intel MPI, MVAPICH, and OpenMPI run by default with every job. If you have trouble bootstrapping one of these MPIs, try the -o mpi= option to flux mini run or flux mini submit, or open an issue (see How do I report a bug?).
For Spectrum MPI, follow the instructions in Launching Spectrum MPI within Flux.
My message callback is not being run. How do I debug?
- Check the error codes from
- Use FLUX_HANDLE_TRACE to see messages moving through the overlay.
- Set FLUX_HANDLE_TRACE in the environment when starting a Flux instance:
FLUX_HANDLE_TRACE=t flux start
- Alternatively, pass FLUX_O_TRACE as a flag to flux_open().
I’m experiencing a hang while running my parallel application. How can I debug?
flux mini run/submit with the
- If it is hanging in startup, try adding the
PMI_DEBUG=t flux mini run my_app.exe
How does the versioning of Flux work with its multi-repo structure?
For any given repository, versioning follows typical semantic versioning. All of the Flux repos are still below v1.0, so all of their interfaces are subject to change. Once a repo reaches v1.0, its interfaces will break backward compatibility only on major version increments, and new features will be added in minor releases.
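The compatibility rule described above can be made concrete with a small check. This is a sketch only: may_break_compat is a hypothetical helper, not part of any Flux or Spack tooling, and it assumes plain MAJOR.MINOR.PATCH version strings (no leading "v" and no pre-release suffixes). Because the rule depends only on the major version, that is all it compares.

```shell
#!/bin/sh
# Hypothetical helper: succeed (exit 0) if upgrading from version $1 to
# version $2 may break interface compatibility under semantic versioning.
# Assumes plain MAJOR.MINOR.PATCH strings such as 0.16.0.
may_break_compat() {
  # Identical versions are trivially compatible.
  [ "$1" = "$2" ] && return 1
  old_major=${1%%.*}
  new_major=${2%%.*}
  # Before v1.0, all interfaces are subject to change.
  if [ "$old_major" -eq 0 ] || [ "$new_major" -eq 0 ]; then
    return 0
  fi
  # From v1.0 on, only a major version increment may break compatibility.
  [ "$old_major" -ne "$new_major" ]
}

# Example: a pre-1.0 upgrade versus a post-1.0 minor upgrade.
may_break_compat 0.16.0 0.17.0 && echo "0.16.0 -> 0.17.0 may break"
may_break_compat 1.2.0 1.3.0 || echo "1.2.0 -> 1.3.0 is compatible"
```

This check applies within a single repository; it cannot tell you which flux-core and flux-sched versions work together.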
The interesting part of the versioning comes from the multi-repo structure. Flux-sched is its own repo with its own versioning scheme. A flux-core release may not break or require changes in flux-sched, and thus may not warrant a new flux-sched release, so flux-core and flux-sched versions are not incremented in lockstep. As of June 2020, for example, flux-core is at 0.16.0 and flux-sched is at 0.8.0. The compatibility between the various flux-core and flux-sched versions is codified in our Spack packages, and that will become more extensive as we add repos such as flux-depend and flux-accounting.
A ‘flux’ meta-package (such as in Spack or distribution package managers) that would pull in compatible versions of the various sub-packages/repos is also versioned independently of any of its subcomponents. The same applies to the flux-docs repo and the documentation on Read the Docs: each repo has its own documentation, which is tagged and released along with the code, but the high-level “meta” documentation has its own versioning, separate from that of any particular sub-package/repo.