FAQs

Some frequently asked questions about Flux and their answers.

Does Flux run on a Mac?

Not yet. We have an open issue on GitHub tracking progress toward compiling natively on a Mac. In the meantime, you can run Flux in Docker; see Quick Start.

How do I report a bug?

You can read about reporting bugs in Contributing, or report one directly against flux-core or flux-sched.

Why is Flux not discovering and managing all of the resources on the system/node?

This can be due to bind flags that need to be passed to the parallel launcher that started Flux. For example, at LLNL you must pass --mpibind=off to srun and --bind=none to jsrun.

Also, on all systems, Flux relies on hwloc to auto-detect the on-node resources available for scheduling. The hwloc that Flux is linked against must be configured with --enable-cuda for Flux to be able to detect NVIDIA GPUs.

You can test to see if your system default hwloc is CUDA-enabled with:

lstopo | grep CoProc

If no output is produced, then your hwloc is not CUDA-enabled.
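
The check above can be wrapped in a small shell function for use in scripts. This is only a minimal sketch: the function name hwloc_has_cuda is hypothetical, and it reads lstopo output on standard input and applies the same grep CoProc test, so it is no more precise than that test.

```shell
# Hypothetical helper: reads lstopo output on stdin and applies the
# same "grep CoProc" test described above.
hwloc_has_cuda() {
    if grep -q CoProc; then
        echo "hwloc appears to be CUDA-enabled"
    else
        echo "hwloc does not appear to be CUDA-enabled"
        return 1
    fi
}

# On a real node:  lstopo | hwloc_has_cuda
```

The function returns a non-zero status when no CoProc line is found, so it can be used directly in an if statement.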

How do I efficiently launch a large number of jobs to Flux?

See bulksubmit.py for an example workflow, or see flux tree.

Memory exhaustion on a node when running large ensembles with Flux

Flux’s in-memory KVS is backed by an on-disk content store. The default location for the content store is /tmp, which on some systems is configured as a RAM disk. To minimize Flux’s memory footprint at the cost of a slower content store, set rundir so that the content store is saved to a disk-backed filesystem rather than to /tmp. An example command to set the rundir could look like:

flux start -o,-Srundir=/path/to/rundir
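
Before changing rundir, it can help to confirm whether /tmp is actually RAM-backed on the node. The following is an illustrative sketch and not part of Flux: the helper name is_ram_backed is hypothetical, it treats only the tmpfs and ramfs filesystem types as RAM-backed, and it assumes GNU coreutils stat is available.

```shell
# Hypothetical helper: succeeds if the given filesystem type name is RAM-backed.
# Assumption: only tmpfs and ramfs are treated as RAM-backed here.
is_ram_backed() {
    case "$1" in
        tmpfs|ramfs) return 0 ;;
        *)           return 1 ;;
    esac
}

# Assumes GNU coreutils stat, where -f -c %T prints the filesystem type name.
tmp_fstype=$(stat -f -c %T /tmp 2>/dev/null || echo unknown)
if is_ram_backed "$tmp_fstype"; then
    echo "/tmp is $tmp_fstype: consider setting rundir to a disk-backed path"
else
    echo "/tmp is $tmp_fstype"
fi
```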

How do I mimic Slurm’s job step semantics?

Using flux mini submit to submit a script containing multiple flux mini run invocations will not result in Slurm-style job steps unless the job script is prefixed with flux start.
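
As a minimal sketch of the pattern described above, the following writes a job script containing two flux mini run invocations and submits it prefixed with flux start. The file name steps.sh and the hostname and date commands are placeholders chosen for illustration, and the submission is skipped if flux is not on the PATH.

```shell
# Write a job script with two steps (steps.sh and its commands are placeholders).
cat > steps.sh <<'EOF'
#!/bin/sh
flux mini run hostname
flux mini run date
EOF
chmod +x steps.sh

# Prefix the script with "flux start" so the flux mini run invocations
# inside it act like job steps.
if command -v flux >/dev/null 2>&1; then
    flux mini submit flux start ./steps.sh
else
    echo "flux not found; skipping submission"
fi
```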

Flux is failing to bootstrap a specific MPI implementation (e.g. OpenMPI, MPICH, Spectrum MPI)

Flux’s shell plugins for Intel MPI, MVAPICH, and OpenMPI run by default with every job. If you experience any issues bootstrapping these MPIs, use the flux mini run/submit -o mpi= option when running or submitting, or open an issue (see How do I report a bug?).

For Spectrum MPI, follow the instructions in Launching Spectrum MPI within Flux.

My message callback is not being run. How do I debug?

  • Check the error codes from flux_msg_handler_addvec, flux_register_service, flux_rpc_get, etc.

  • Use FLUX_O_TRACE and FLUX_HANDLE_TRACE to see messages moving through the overlay

    • FLUX_HANDLE_TRACE is set when starting a Flux instance: FLUX_HANDLE_TRACE=t flux start
    • FLUX_O_TRACE is passed as a flag to flux_open

I’m experiencing a hang while running my parallel application. How can I debug?

  • Run flux mini run/submit with the -vvv argument
  • If it is hanging in startup, try adding the PMI_DEBUG environment variable: PMI_DEBUG=t flux mini run my_app.exe

How does the versioning of Flux work with its multi-repo structure?

For any given repository, the versioning follows typical semantic versioning. All of the Flux repos are still < v1.0, so all of our interfaces are subject to change. Once a repo hits v1.0, the interfaces for that repo will only break backwards compatibility on major version increments, and new features will be added in minor releases.

The interesting part of the versioning comes from the multi-repo structure. Flux-sched is its own repo with its own versioning scheme. A release of flux-core may not break anything in flux-sched or require changes to it, and thus might not warrant a new flux-sched release. So the flux-core and flux-sched versions do not get incremented in lockstep. For example, as of June 2020, flux-core is on 0.16.0 and flux-sched is on 0.8.0. We have the compatibility of the various flux-core/flux-sched versions codified in our spack packages, and that will get more extensive as we add additional repos like flux-depend and flux-accounting.

A ‘flux’ meta-package (such as in spack or distro package managers) that would pull in compatible versions of the various sub-packages/repos is also versioned independently of any of its subcomponents. The situation is similar for the flux-docs repo and the documentation on readthedocs. Each repo has its own documentation, which is tagged and released along with the code, but the high-level “meta” documentation has its own versioning that is independent of the versioning of any particular sub-package or repo.