flux-config-pam(5)

DESCRIPTION

The pam table configures flux-pam features that manage systemd user slices for Flux job users. This includes:

  • The flux-pam prolog and housekeeping scripts, which run during job prolog and housekeeping phases to start/stop user services and optionally apply resource constraints to user slices.

  • The pam_flux.so PAM session module, which attaches login sessions authenticated via the account module to the user's managed slice when manage-user-slice is enabled. See pam_flux(8).

PREREQUISITES

pam.manage-user-slice requires systemd ≥ 239 and the cgroup v2 unified hierarchy. Resource constraints (AllowedCPUs, AllowedMemoryNodes, DevicePolicy, DeviceAllow) are applied to user slice units via systemctl set-property --runtime, and systemd enforces these properties only on the unified hierarchy. cgroup v1 systems are not supported.

When pam.manage-user-slice is enabled, Flux takes ownership of the user@UID.service manager for job users, starting it at the user's first job on the node and stopping it after the last. systemd linger must not be enabled for job users on compute nodes. Linger (loginctl enable-linger) keeps user@UID.service running independently of jobs, which interferes with Flux's lifecycle management and can silently let login sessions escape containment. If the prolog detects that linger is enabled for a job user, it fails rather than proceeding with an inconsistent state.
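
As a rough illustration of a linger check, the following Python sketch tests for a per-user marker file. It assumes that systemd-logind records linger as a file named after the user under /var/lib/systemd/linger; the function name linger_enabled is hypothetical, and the flux-pam prolog may detect linger in a different way.

```python
import os


def linger_enabled(username, linger_dir="/var/lib/systemd/linger"):
    """Return True if a linger marker file exists for username.

    Assumption: logind creates <linger_dir>/<username> when
    'loginctl enable-linger' is run for that user. The flux-pam
    prolog may use a different detection method.
    """
    return os.path.exists(os.path.join(linger_dir, username))
```

A prolog-style caller would treat a True result as a fatal error for the job rather than continuing.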

The flux-pam package installs flux-pam-prolog and flux-pam-housekeeping into $libexecdir/flux/prolog.d/ and $libexecdir/flux/housekeeping.d/, where they are run by flux-run-prolog and flux-run-housekeeping respectively.

For these scripts to execute on compute nodes, the Flux system instance must load the perilog.so job-manager plugin with per-rank = true for prolog and housekeeping. With per-rank = true, the default command is flux-imp run prolog (or housekeeping), which invokes flux-run-prolog (or flux-run-housekeeping) as root, executing all scripts in the drop-in directory.

Flux system instance (/etc/flux/system/conf.d/):

[job-manager]
plugins = [
  { load = "perilog.so" }
]

[job-manager.prolog]
per-rank = true

[job-manager.housekeeping]
per-rank = true

IMP (/etc/flux/imp/conf.d/):

[run.prolog]
allowed-users = [ "flux" ]
allowed-environment = [ "FLUX_*" ]
path = "/usr/libexec/flux/cmd/flux-run-prolog"

[run.housekeeping]
allowed-users = [ "flux" ]
allowed-environment = [ "FLUX_*" ]
path = "/usr/libexec/flux/cmd/flux-run-housekeeping"

See flux-config-job-manager(5) and the Flux Administrator's Guide for further details.

KEYS

All keys are optional and default to false unless otherwise noted.

manage-user-slice

Boolean value that enables systemd user slice lifecycle management via prolog and housekeeping scripts. When enabled, the prolog starts user@UID.service (if not already running) for each job user, and housekeeping stops it when the user's last job completes. This is the master switch for all user slice management features, including session attachment in pam_flux.so (see pam_flux(8)). (Default: false).

When this feature is disabled, the prolog and housekeeping scripts exit early without managing user services or applying resource constraints. Jobs run by the instance owner are never managed, regardless of this setting.

kill-user-slice

Boolean value that controls whether housekeeping actively terminates processes remaining in the user slice when stopping user@UID.service. (Default: false).

When set to true, housekeeping implements aggressive cleanup:

  • Checks for orphan processes (processes in user-UID.slice but not under user@UID.service, such as leftover SSH sessions or other systemd scopes)

  • If orphans exist, sends SIGTERM to all processes in the slice

  • Waits for kill-slice-grace-time for processes to exit

  • If processes remain, sends SIGKILL to all processes in the slice

  • Waits for kill-slice-grace-time again

  • If processes still remain, raises an error and drains the node
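
The escalation above can be sketched in Python as follows. This is a simplified model, not the housekeeping script itself: list_pids and send_signal are hypothetical callables supplied by the caller (for example, a function that enumerates the PIDs in user-UID.slice, and os.kill), and the initial orphan check is folded into the first call to list_pids.

```python
import signal
import time


def wait_for_exit(list_pids, grace_time, poll_interval=0.1):
    """Poll until no processes remain or grace_time (seconds) elapses."""
    deadline = time.monotonic() + grace_time
    while True:
        if not list_pids():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(poll_interval)


def kill_slice(list_pids, send_signal, grace_time):
    """Return True if the slice ends up empty, False if processes remain.

    list_pids: hypothetical callable returning PIDs still in the slice.
    send_signal: hypothetical callable(pid, signum), e.g. os.kill.
    A False result corresponds to raising an error and draining the node.
    """
    for signum in (signal.SIGTERM, signal.SIGKILL):
        pids = list_pids()
        if not pids:
            return True
        for pid in pids:
            send_signal(pid, signum)
        # The same grace time is applied after each signal.
        if wait_for_exit(list_pids, grace_time):
            return True
    return False
```

Passing the process listing and signal delivery in as callables keeps the sketch independent of how the slice's cgroup is actually walked.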

When set to false (the default), housekeeping stops user@UID.service without attempting to kill processes. Cleanup is delegated to other mechanisms such as site-specific tools.

Warning

With kill-user-slice = true, all processes in the user's slice, including interactive login sessions, are terminated when the user's last job on the node completes.

kill-slice-grace-time

Duration in Flux Standard Duration (FSD) format specifying how long to wait for processes to exit after each kill signal. Only applies when kill-user-slice = true. (Default: "30s").

The grace time is applied twice: once after SIGTERM, once after SIGKILL. Maximum total cleanup time is therefore 2 * kill-slice-grace-time. If processes remain after both waits, housekeeping drains the node.
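
The arithmetic can be checked with a small Python sketch. The parser below is an assumption-laden simplification: it handles only the s, m, h and d suffixes and treats a bare number as seconds, and both function names are hypothetical. RFC 23 remains the authority on the full FSD grammar.

```python
def fsd_to_seconds(fsd):
    """Convert a simple FSD string such as "30s" or "1.5m" to seconds.

    Assumption: only the s, m, h and d suffixes are handled here,
    and a value without a suffix is taken to be seconds.
    """
    units = {"s": 1.0, "m": 60.0, "h": 3600.0, "d": 86400.0}
    if fsd and fsd[-1] in units:
        return float(fsd[:-1]) * units[fsd[-1]]
    return float(fsd)


def max_cleanup_seconds(grace_time_fsd):
    """Upper bound on cleanup time: one wait after SIGTERM, one after SIGKILL."""
    return 2 * fsd_to_seconds(grace_time_fsd)
```

With the default of "30s", max_cleanup_seconds returns 60.0, so housekeeping for a user's last job may take up to a minute before the node is drained.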

See RFC 23/Flux Standard Duration for the FSD format specification.

debug

Boolean value that enables verbose debug logging for the prolog and housekeeping scripts. When true, each script logs its actions to stderr, which is captured in the Flux job-manager log. Equivalent to setting the FLUX_PAM_SCRIPTS_DEBUG environment variable. (Default: false).

RESOURCE CONSTRAINTS

The pam table works in conjunction with the exec configuration for resource management:

exec.sdexec-constrain-resources

When enabled (along with pam.manage-user-slice), prolog scripts compute the union of resources allocated to all of a user's jobs on a node and apply corresponding systemd properties to the user slice:

  • AllowedCPUs - Restricts slice to allocated CPU cores

  • AllowedMemoryNodes - Restricts slice to NUMA nodes for allocated cores

  • DeviceAllow - Grants access only to allocated GPUs

  • DevicePolicy=closed - Blocks access to physical devices except those explicitly allowed

When exec.sdexec-constrain-resources is disabled, prolog/housekeeping still manage the user service lifecycle (start/stop) if pam.manage-user-slice is enabled, but do not apply resource constraints.
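
To make the "union of resources" idea concrete, here is a Python sketch for the CPU case only. In the real scripts the mapping from resources to systemd properties comes from sdexec-mapper; the helper names, the per-job CPU sets, and the exact property string below are illustrative assumptions. The sketch only builds the systemctl argument list and does not run it.

```python
def format_cpu_list(cpus):
    """Format a set of CPU indices as a range list such as "0-3,8"."""
    ranges = []
    start = prev = None
    for cpu in sorted(cpus):
        if start is None:
            start = prev = cpu
        elif cpu == prev + 1:
            prev = cpu
        else:
            ranges.append((start, prev))
            start = prev = cpu
    if start is not None:
        ranges.append((start, prev))
    return ",".join(str(a) if a == b else f"{a}-{b}" for a, b in ranges)


def set_property_argv(uid, jobs_cpus):
    """Build a systemctl command applying the union of all jobs' CPUs.

    jobs_cpus: iterable of sets, one per active job of this user
    (hypothetical input; the real scripts derive this from job data).
    """
    union = set().union(*jobs_cpus)
    return [
        "systemctl", "set-property", "--runtime",
        f"user-{uid}.slice",
        f"AllowedCPUs={format_cpu_list(union)}",
    ]
```

For two jobs holding cores {0, 1} and {2, 3, 8}, the resulting property is AllowedCPUs=0-3,8, which every process in the user's slice then shares.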

See flux-config-exec(5) for details on the exec configuration.

OPERATION

Prolog Scripts

Prolog scripts run at job start and perform the following actions (when pam.manage-user-slice is enabled):

  1. Acquire an exclusive lock for the user (prevents races between concurrent prolog/housekeeping operations)

  2. Check that linger is not enabled for the user (fail hard if it is — see PREREQUISITES)

  3. Count active jobs on the node for this user (excluding the starting job)

  4. Start user@UID.service (idempotent: no-op if already running due to a concurrent prolog for the same user)

  5. If exec.sdexec-constrain-resources is enabled:

    • Compute the union of resources from all active jobs (including the starting job)

    • Query sdexec-mapper for systemd properties corresponding to the resource union

    • Apply properties to user-UID.slice via systemctl set-property

  6. Release the lock

Housekeeping Scripts

Housekeeping scripts run at job completion and perform the following actions (when pam.manage-user-slice is enabled):

  1. Acquire an exclusive lock for the user

  2. Count remaining active jobs on the node for this user (excluding the completed job)

  3. If jobs remain (count > 0) and exec.sdexec-constrain-resources is enabled, recalculate and apply resource constraints for the remaining jobs

  4. If no jobs remain (count = 0):

    • If pam.kill-user-slice is true, perform cleanup sequence (see kill-user-slice above)

    • Stop user@UID.service

  5. Release the lock
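
The branching in the housekeeping steps can be summarized as a small decision function. This Python sketch only returns the ordered list of actions; the action names are hypothetical labels chosen for illustration, not identifiers used by the scripts.

```python
def housekeeping_actions(remaining_jobs, constrain_resources, kill_user_slice):
    """Return the ordered actions housekeeping would take for one user.

    remaining_jobs: active jobs for the user on this node, excluding
    the job that just completed.
    """
    actions = ["acquire-lock"]
    if remaining_jobs > 0:
        # Other jobs still run: shrink constraints to what they hold.
        if constrain_resources:
            actions.append("reapply-constraints")
    else:
        # Last job finished: optionally clean up, then stop the manager.
        if kill_user_slice:
            actions.append("kill-slice")
        actions.append("stop-user-service")
    actions.append("release-lock")
    return actions
```

Note that the kill sequence and the service stop occur only when the count of remaining jobs is zero, so a user's other running jobs are never signalled by housekeeping for a completed job.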

LOCKING AND SERIALIZATION

Prolog and housekeeping scripts acquire an exclusive lock (via flock) on /run/flux-pam/uid.UID.lock to serialize operations for each user. This prevents race conditions when multiple jobs for the same user start or complete concurrently on the same node.

The lock is held for the entire duration of prolog/housekeeping execution and released automatically when the script exits.
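
A per-user flock of this kind can be sketched in Python with the standard fcntl module. The context manager name user_lock and its lock_dir parameter are hypothetical; the file naming follows the uid.UID.lock pattern described above, and the scripts' actual implementation may differ.

```python
import fcntl
import os
from contextlib import contextmanager


@contextmanager
def user_lock(uid, lock_dir="/run/flux-pam"):
    """Hold an exclusive flock on <lock_dir>/uid.<uid>.lock.

    The lock file is created if missing and is left in place afterwards.
    """
    path = os.path.join(lock_dir, f"uid.{uid}.lock")
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o600)
    try:
        fcntl.flock(fd, fcntl.LOCK_EX)  # blocks until the lock is free
        yield path
    finally:
        os.close(fd)  # closing the descriptor releases the lock
```

Because the lock is tied to the open file descriptor, it is also released if the holding process exits unexpectedly, which matches the "released automatically" behavior described above.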

The lock directory (/run/flux-pam by default) must have permissions 0700 (owner read/write/execute only) and be owned by root. Lock files within the directory are created with permissions 0600 (owner read/write only) and are never deleted (they persist to avoid recreating them on each operation). If the lock directory has group or other write permissions, both the prolog/housekeeping scripts and the PAM session module will refuse to proceed and log an error. The directory is created with correct permissions at boot by the flux-pam tmpfiles.d drop-in (/usr/lib/tmpfiles.d/flux-pam.conf).
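
The directory check described above might look like the following Python sketch. The function name lock_dir_is_safe and the expected_uid parameter are hypothetical; the sketch rejects a directory that is not owned by the expected user (root by default) or that is writable by group or other.

```python
import os
import stat


def lock_dir_is_safe(path, expected_uid=0):
    """Return True if path is a directory owned by expected_uid
    with no group or other write permission."""
    st = os.stat(path)
    if not stat.S_ISDIR(st.st_mode):
        return False
    if st.st_uid != expected_uid:
        return False
    return (st.st_mode & (stat.S_IWGRP | stat.S_IWOTH)) == 0
```

A caller following the behavior described above would log an error and refuse to proceed when this returns False.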

SECURITY CONSIDERATIONS

User Isolation

When exec.sdexec-constrain-resources is enabled, systemd resource constraints ensure that:

  • Users can only access CPU cores allocated to their jobs

  • Users can only access GPUs allocated to their jobs

  • Users cannot access physical devices not explicitly granted

However, all processes in the same user slice (user-UID.slice) share these constraints. All of a user's jobs on a node, plus any other processes the user runs in the slice (such as SSH sessions, if permitted), collectively share the union of resources allocated to the user's jobs.

Orphan Processes

With kill-user-slice = false (the default), processes that outlive a user's jobs may remain in the user slice even after user@UID.service stops. These processes may retain access to resources that were allocated to previous jobs. Sites concerned about this should either:

  • Enable kill-user-slice to forcibly terminate orphans

  • Configure systemd's KillMode for user slices to handle cleanup

  • Deploy separate mechanisms to detect and terminate orphan processes

  • Use pam_flux.so account management to deny non-job logins entirely

EXAMPLES

Minimal configuration to enable user slice lifecycle management:

[pam]
manage-user-slice = true

Enable resource constraints (requires systemd execution service):

[exec]
service = "sdexec"
sdexec-constrain-resources = true

[pam]
manage-user-slice = true

Enable aggressive orphan cleanup with 60-second grace time:

[pam]
manage-user-slice = true
kill-user-slice = true
kill-slice-grace-time = "60s"

Enable debug logging for prolog and housekeeping scripts:

[pam]
manage-user-slice = true
debug = true

RESOURCES

Flux Administrator's Guide: https://flux-framework.readthedocs.io/projects/flux-core/en/latest/guide/admin.html

SEE ALSO

flux-config(5), flux-config-exec(5), flux-config-job-manager(5), Flux Administrator's Guide: Adding Prolog/Housekeeping Scripts, pam_flux(8), pam.d(5)