.. _admin-guide:

********************
Migrating from Slurm
********************

*I run a large HPC center.  Should I kick Slurm to the curb?*

Not today.  Enjoy your Slurm (it's highly addictive!).
As a reminder, Flux can coexist with Slurm as an enhanced step
manager and portability layer.  See :ref:`start_slurm`.

Check back with us at the end of 2026.

Flux Maturity
=============

The Flux project began around 2012.  Flux has been used for a decade or more
for managing complex ensembles and workflows at LLNL under Slurm and LSF
in situations where the traditional workload managers were not up to the task.
Although Flux was designed from the beginning to replace Slurm and was
deployed as such on several small systems at LLNL, it did not gain momentum
as a system workload manager until 2024 with the early deliveries of
`El Capitan <https://hpc.llnl.gov/hardware/compute-platforms/el-capitan>`_.

Flux is now in daily production use as the sole system workload manager on
El Capitan (ranked first on the `TOP500 <https://www.top500.org/>`_ as of late 2025)
and its unclassified sister systems at LLNL.  These machines are capability
workhorses, continuously in demand for LLNL's most cutting-edge,
mission-critical activities.  The process of standing them up brought the
Flux team together to address problems of scale and stability with
unprecedented urgency.  As a result, system deployments of Flux are now
viable on systems of virtually any size.

This experience also brought into focus several missing features, which
have now been prioritized for near-term development.  We expect substantial
progress to be made on all of these items in 2026:

Rolling software upgrade
  Flux nodes won't interact with other Flux nodes unless they are running
  *exactly* the same flux-core version.  When flux-core 1.0.0 is released,
  relaxed rules will be enacted to support rolling upgrades.

Flux restart with running jobs
  Restarting Flux kills running jobs.  The design to allow running
  jobs to continue is not yet fully implemented.

Reservations
  Flux does not yet have a way to request immovable future allocations.
  Arrangements for dedicated application time currently
  require manual/scripted actions by system administrators.

Improve graph-based scheduling serialization efficiency
  Resource serialization inefficiency prevents Flux's graph-based
  scheduling advantages from being utilized on large systems. Efforts
  to optimize the Fluxion scheduler resource graph serialization are
  ongoing.

Preservation of job step information
  Slurm's smallest unit of work is the job step.  Slurm keeps metadata
  for each step which can be retrieved after the job (allocation) completes.
  In Flux, there are no job steps.  Instead, a sub-instance of Flux is started
  on a resource allocation, and user applications are run as jobs within the
  sub-instance.  There is little coupling between the system instance and its
  sub-instances, which improves scalability.  Unfortunately, sub-instance job
  data is not preserved unless the user explicitly arranges for it, which some
  find surprising.

Multi-system fair share accounting
  Flux has an optional fair-share accounting system for gathering usage data
  and setting job priorities.  Unlike Slurm, there is not yet a capability to
  deploy one accounting system for multiple Flux systems.

Package availability
  Flux system administrators are encouraged to install Flux as system packages,
  but pre-built packages are not widely available.  Source RPM packages for
  RHEL 8 and 9 are manually attached to GitHub releases.  Very soon this will
  be automated for all sub-projects that the team currently packages for RHEL.
  Also, discussions are underway with Red Hat regarding inclusion in Fedora
  and EPEL.

Commercial support and training
  This is an important and current area of discussion.
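
The job step item above can be pictured with a toy model.  The sketch below
is not Flux code: the ``Instance`` class and its methods are hypothetical,
and it only mimics the behavior described, namely that a parent instance
records the job that ran a sub-instance but not the jobs that ran inside it,
unless the user saves them before the sub-instance exits.

.. code-block:: python

   class Instance:
       """Toy stand-in for a Flux instance that keeps its own job records."""

       def __init__(self, name):
           self.name = name
           self.jobs = []  # records known to this instance only

       def run_job(self, command):
           """Record a job that ran directly in this instance."""
           self.jobs.append(command)

       def run_subinstance(self, commands, archive=None):
           """Run ``commands`` in a child instance occupying one job here.

           If ``archive`` is a list, the child's job records are copied into
           it before the child exits; otherwise they vanish with the child.
           """
           child = Instance(f"{self.name}/sub")
           for command in commands:
               child.run_job(command)
           if archive is not None:
               archive.extend(child.jobs)
           # The parent sees a single job: the sub-instance itself.
           self.run_job("sub-instance")


   system = Instance("system")
   system.run_subinstance(["sim --step 1", "sim --step 2"])
   print(system.jobs)  # the two sim jobs are not recorded here

   saved = []
   system.run_subinstance(["post-process"], archive=saved)
   print(saved)  # records the user chose to keep

In this model the ``archive`` argument stands in for whatever explicit
arrangement a user makes to keep sub-instance job data.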

Flux Design Advantages
======================

Replacing Slurm with Flux may bring substantial long-term rewards.
Here are some advantages of Flux's design over Slurm:

Flux has a solid security design
  While a significant amount of Slurm code runs as root, virtually none of
  Flux's does.  For more detail on Flux's security model, refer to
  :ref:`background_security`.

Flux doesn't need a step manager
  In Slurm, a *job* is a resource allocation scheduled by the Slurm scheduler.
  Within the allocation, a *job step* is a unit of work scheduled by the
  Slurm step manager.  Since a new Flux instance can be launched as a
  :term:`job` in another Flux instance (subdividing its resources), there is
  no need to have a step abstraction or a step manager.  All units of work are
  jobs scheduled and launched by the same robust system.  Limitations of
  Slurm's step manager have been a source of long-standing problems.

Flux APIs are not an afterthought
  Well-thought-out Python and C APIs make it easy to integrate Flux with
  workflow systems and new environments.  Flux APIs are licensed under the
  non-viral LGPL-3.0.  Experimental bindings for Go and Julia are also
  available.

Flux has a rich resource representation
  Flux's design incorporates a graph-based resource model that is
  extensible to arbitrary resource types.

Flux is dynamic
  The Flux design supports growing and shrinking allocations.

Flux is scalable
  Flux's recursive launch design enables each allocation to scale
  independently.  Each instance of Flux has job throughput and node count
  scalability comparable to Slurm, but a cluster typically runs many
  instances of Flux, whereas it would run only one instance of Slurm.

Flux uses event-driven messaging
  In contrast to Slurm's multi-threaded, monolithic server design, Flux
  is built upon distributed message brokers and event-driven (asynchronous,
  reactive) agents that communicate only with messages.  Building distributed
  services on this substrate is interesting, fun, and scales well.

Flux is portable, modular, and composable
  A single-user Flux instance is trivial to start anywhere, without
  administrative privilege.  It can run as a parallel job in an allocation
  of any workload manager, standalone on your laptop or Raspberry Pi, and can
  be integrated into converged and cloud environments.  A single-user Flux
  instance is controlled by its owner and may be reconfigured, extended, or
  modified at will.  Slurm's monolithic design cannot offer this freedom.
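
As a brief illustration of the Python API mentioned above, the following
sketch submits a command as a job to the enclosing Flux instance.  It assumes
that the Flux Python bindings are installed and that the code runs inside a
Flux instance (for example, one started with ``flux start``).  The helper
name ``submit_command`` is hypothetical.  The bindings are imported inside
the function only so that the module can be loaded on a machine without Flux.

.. code-block:: python

   def submit_command(command, num_tasks=1, cores_per_task=1):
       """Submit ``command`` (a list of strings) as a job; return its ID."""
       # Imported lazily: these modules come with the Flux Python bindings.
       import flux
       import flux.job

       handle = flux.Flux()  # connect to the enclosing Flux instance
       jobspec = flux.job.JobspecV1.from_command(
           command, num_tasks=num_tasks, cores_per_task=cores_per_task
       )
       return flux.job.submit(handle, jobspec)

Inside a running instance, ``submit_command(["hostname"], num_tasks=2)``
returns the ID of the newly submitted job.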

Slurm Long Term Viability
=========================

When it was started at LLNL, Slurm was named "SLURM", a backronym for *Simple
Linux Utility for Resource Management*.  The guiding principle of its
design was simplicity, a laudable goal.  However, Slurm's brief design
phase and relatively short development period before its first production
deployment, followed by rapid expansion into even non-Linux environments,
put a lot of stress on that original design without leaving much time to
pay back the technical debt that accrued.

SchedMD and the impressive Slurm community have taken Slurm on quite a
journey since those days.  Despite growing complexity, the fundamental
design of Slurm has not changed and has not provided a strong basis for
the organic feature growth that has occurred.  Consequently, the Slurm
code base is not well positioned to support the emerging needs of
HPC/Cloud/ML computing into the future.

Continuing to extend Slurm indefinitely without breaking it will cost
more than building support for emerging capabilities on Flux, which
is well on its way to basic feature parity with Slurm and offers a
superior, stable foundation.

Command Equivalencies
=====================

LLNL's `Batch System Cross-Reference Guides
<https://hpc.llnl.gov/banks-jobs/running-jobs/batch-system-cross-reference-guides>`_ may be helpful.

Slurm Wrappers
==============

Asking very important people who do very important things to change their
workflows can cause friction.  Wrapper scripts that implement Slurm
functionality in terms of Flux are available if you need them.

Installing Slurm wrappers on a Flux system cuts two ways.  On one hand,
it can ease users through the transition to Flux and reduce the support
burden.  On the other hand, it can be a crutch that delays learning and
obfuscates problems.

.. list-table::
   :header-rows: 1

   * - Package
     - Functionality

   * - `flux-wrappers <https://github.com/LLNL/flux-wrappers>`_
     - Wrapper scripts to ease the transition to Flux.
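
To make the wrapper idea concrete, here is a minimal Python sketch of how a
Slurm-style ``srun`` command line could be rewritten as a ``flux run``
command line.  It is not taken from flux-wrappers.  The function name
``srun_to_flux`` is hypothetical, and only the node count (``-N``) and task
count (``-n``) options are handled, on the assumption that they carry the
same meaning in both tools.  A real wrapper must cover many more options and
edge cases.

.. code-block:: python

   import argparse


   def srun_to_flux(argv):
       """Translate a small subset of srun arguments into a flux run command.

       Hypothetical helper for illustration; not part of flux-wrappers.
       """
       parser = argparse.ArgumentParser(
           prog="srun", add_help=False, allow_abbrev=False
       )
       parser.add_argument("-N", "--nodes", type=int)
       parser.add_argument("-n", "--ntasks", type=int)
       # Everything from the first non-option word onward is the user command.
       parser.add_argument("command", nargs=argparse.REMAINDER)
       args = parser.parse_args(argv)

       flux_cmd = ["flux", "run"]
       if args.nodes is not None:
           flux_cmd.append(f"--nodes={args.nodes}")
       if args.ntasks is not None:
           flux_cmd.append(f"--ntasks={args.ntasks}")
       return flux_cmd + args.command


   if __name__ == "__main__":
       print(" ".join(srun_to_flux(["-N", "2", "-n", "8", "./app", "--verbose"])))

Returning an argument list rather than a shell string lets the caller pass
the result straight to ``subprocess.run`` without quoting concerns.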