flux-shell-cray-pals(1)

SYNOPSIS

flux run -opmi=cray-pals

DESCRIPTION

cray-pals is a flux-shell(1) plugin that assists with the launch of programs built with Cray MPICH. It creates an apinfo data file on each node and sets several environment variables which are used by Cray PMI to initialize.

After cray-pals places that data on each node, Cray PMI bootstraps an overlay network and handles the PMI data exchange without Flux.

SHELL OPTIONS

pmi=cray-pals

Enable only the cray-pals PMI plugin. Other PMI implementations may be added, separated by commas.

Note

On systems where Cray MPICH is used, it may be helpful for system administrators to customize /etc/flux/shell/initrc to enable cray-pals by default:

if shell.options['pmi'] == nil then
    shell.options['pmi'] = 'cray-pals,simple'
end

simple PMI is required to launch Flux instances, so if the pmi default is changed, be sure to include it also.

cray-pals.no-edit-env

Prevent the cray-pals PMI plugin from removing Flux's PMI library directory from LD_LIBRARY_PATH, if present.

It is removed by default to ensure that the Cray PMI library is found before Flux's.
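The effect of the default edit can be pictured with a small shell sketch. This is only an illustration of removing one directory from a colon-separated search path, not the plugin's own code, and the directory /usr/lib64/flux used in the usage note is a hypothetical stand-in for Flux's PMI library directory.

```shell
# Illustration only: remove every occurrence of directory $2 from the
# colon-separated list $1 and print the result.
strip_path_dir() {
    list=$1
    dir=$2
    result=
    old_ifs=$IFS
    IFS=:
    # Unquoted on purpose so the list is split on colons.
    for entry in $list; do
        if [ "$entry" != "$dir" ]; then
            result=${result:+$result:}$entry
        fi
    done
    IFS=$old_ifs
    printf '%s\n' "$result"
}
```

For example, strip_path_dir "$LD_LIBRARY_PATH" /usr/lib64/flux prints the search path that a job would see after the edit, assuming that directory is where Flux's PMI library lives on the system in question. Passing -o cray-pals.no-edit-env leaves the variable untouched.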

cray-pals.apinfo-version=1

Force HPE apinfo version 1. The default is version 5.

cray-pals.timeout=SECONDS

The plugin synchronously watches the job eventlog for data from the cray-pmi-bootstrap jobtap plugin. The job is aborted if the eventlog does not make progress for the specified number of seconds. Setting the timeout to a negative value disables it. Default: 10.

cray-pals.pmi-bootstrap=off

Disable setting PMI_CONTROL_PORT and PMI_SHARED_SECRET. This is mainly useful for testing.

cray-pals.pmi-bootstrap=[port1,port2,secret]

Provide fixed values for setting PMI_CONTROL_PORT and PMI_SHARED_SECRET. The option value is expressed as a JSON array of three integers. This is mainly useful for testing.
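Because the value is a JSON array, its square brackets are best quoted on the command line so the shell does not treat them as a glob pattern. The helper below is a hypothetical convenience, not part of Flux; it checks that the three values are non-negative integers and prints the option string.

```shell
# Hypothetical helper: print the shell option for fixed bootstrap values.
# Arguments: port1 port2 secret (non-negative integers).
pmi_bootstrap_opt() {
    for value in "$1" "$2" "$3"; do
        case $value in
            ''|*[!0-9]*)
                echo "not an integer: '$value'" >&2
                return 1
                ;;
        esac
    done
    printf 'cray-pals.pmi-bootstrap=[%s,%s,%s]\n' "$1" "$2" "$3"
}

# Possible use (ports taken from the DEBUGGING output, secret arbitrary):
#   flux run -o pmi=cray-pals -o "$(pmi_bootstrap_opt 11998 11999 42)" -N2 -n2 true
```

The double quotes around the command substitution keep the brackets literal in the argument passed to flux run.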

ENVIRONMENT

The following environment variables are set by cray-pals, as required by Cray PMI.

PALS_APID

Alias for FLUX_JOB_ID, forced into integer form.

PALS_APINFO

The path to the aforementioned apinfo file on the local node.

PALS_RANKID

Alias for FLUX_TASK_RANK.

PALS_NODEID

The index of the local node relative to the job.

PALS_SPOOL_DIR

Alias for FLUX_JOB_TMPDIR.

PMI_CONTROL_PORT

A comma-separated pair of port numbers to assist Cray PMI in bootstrapping peer connections. The ports should be available for binding on all nodes of the job.

PMI_SHARED_SECRET

A random 64-bit integer to assist Cray PMI in bootstrapping secure communications.
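A task can verify that it received the variables listed above. The function below is a minimal sketch with a hypothetical name; it reports any variable that is unset or empty and splits PMI_CONTROL_PORT into its two ports. Note that PMI_CONTROL_PORT and PMI_SHARED_SECRET are expected to be absent when cray-pals.pmi-bootstrap=off is used.

```shell
# Hypothetical check, meant to be run inside a job task.
# Prints "missing: NAME" for each unset or empty variable and returns
# non-zero if any were missing.
check_pals_env() {
    missing=0
    for name in PALS_APID PALS_APINFO PALS_RANKID PALS_NODEID \
                PALS_SPOOL_DIR PMI_CONTROL_PORT PMI_SHARED_SECRET; do
        eval "value=\${$name:-}"
        if [ -z "$value" ]; then
            echo "missing: $name"
            missing=1
        fi
    done
    # PMI_CONTROL_PORT is a comma-separated pair of port numbers.
    if [ -n "${PMI_CONTROL_PORT:-}" ]; then
        echo "port1=${PMI_CONTROL_PORT%%,*} port2=${PMI_CONTROL_PORT#*,}"
    fi
    return "$missing"
}
```

One way to use it is to save the function in a file on a shared filesystem, then source and call it from sh -c under flux run -o pmi=cray-pals.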

APINFO

The APINFO contains application data in the following sections:

comm profiles

One comm profile per NIC, each of which defines a CXI service that includes VNI numbers for access control and allowed traffic classes. The default CXI service is used if none is provided here. Not supported by cray-pals, but a high priority for future development.

command

One entry per MPMD application, each with tasks per node and CPUs per task. MPMD is not supported by cray-pals, so there is always just one entry.

pes

One entry per task rank, each containing a node-local task index, a reference to the assigned MPMD command, and a node index.

nodes

One entry per node allocated to the job, each containing a hostname and a node index.

nics

One entry per NIC assigned to the job, across all nodes. Each entry contains the NIC address and related data used for scalable program launch. Not supported by cray-pals.
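To make the pes section concrete, the sketch below prints the three fields for each task rank. It ASSUMES a block distribution with the same number of tasks on every node, which is not stated above and may not match how a given job is mapped. The function name is hypothetical, and the field labels are borrowed from the Cray PMI debug output shown under DEBUGGING; this does not describe the on-disk apinfo format.

```shell
# Illustration only. Arguments: number of nodes, tasks per node.
# ASSUMPTION: ranks are assigned to nodes in contiguous blocks of equal size.
pes_table() {
    nnodes=$1
    tasks_per_node=$2
    total=$((nnodes * tasks_per_node))
    rank=0
    while [ "$rank" -lt "$total" ]; do
        # cmdidx is always 0 because there is only one command entry.
        echo "rank=$rank localidx=$((rank % tasks_per_node)) cmdidx=0 nodeidx=$((rank / tasks_per_node))"
        rank=$((rank + 1))
    done
}
```

With two nodes and one task per node, pes_table 2 1 lists rank 1 with localidx=0, cmdidx=0 and nodeidx=1, matching the values Cray PMI reports for that rank in the -N2 -n2 debugging example.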

DEBUGGING

The following may be useful if MPI_Init() is failing for unknown reasons.

Tip

Obtain a Flux allocation with flux alloc that will fit the minimum MPI size that can reproduce the issue.

1. Run with flux run -o verbose=2 and check for output from cray-pals.

$ flux run -o pmi=cray-pals -N2 -n2 -o verbose=2 true
...
0.051s: flux-shell[1]: DEBUG: pmi-cray-pals: enabled
0.068s: flux-shell[1]: TRACE: pmi-cray-pals: created pals apinfo file
  /var/tmp/user/flux-tBlt5H/jobtmp-1-f4yBboYGo/libpals_apinfo
0.069s: flux-shell[1]: TRACE: pmi-cray-pals: set PMI_SHARED_SECRET to 16945943893152566943
0.069s: flux-shell[1]: TRACE: pmi-cray-pals: set PALS_NODEID to 1
0.069s: flux-shell[1]: TRACE: pmi-cray-pals: set PALS_APID to 8762756694016
0.069s: flux-shell[1]: TRACE: pmi-cray-pals: set PALS_SPOOL_DIR to
  /var/tmp/user/flux-tBlt5H/jobtmp-1-f4yBboYGo
0.069s: flux-shell[1]: TRACE: pmi-cray-pals: set PALS_APINFO to
  /var/tmp/user/flux-tBlt5H/jobtmp-1-f4yBboYGo/libpals_apinfo
0.070s: flux-shell[1]: TRACE: pmi-cray-pals: set PALS_RANKID to 1
0.047s: flux-shell[0]: DEBUG: pmi-cray-pals: enabled
0.064s: flux-shell[0]: TRACE: pmi-cray-pals: created pals apinfo file
  /var/tmp/user/flux-pSw4um/jobtmp-0-f6jyUdR2P/libpals_apinfo
0.065s: flux-shell[0]: TRACE: pmi-cray-pals: set PMI_CONTROL_PORT to 11998,11999
0.065s: flux-shell[0]: TRACE: pmi-cray-pals: set PMI_SHARED_SECRET to 11872392986869071399
0.065s: flux-shell[0]: TRACE: pmi-cray-pals: set PALS_NODEID to 0
0.065s: flux-shell[0]: TRACE: pmi-cray-pals: set PALS_APID to 12675874553856
0.065s: flux-shell[0]: TRACE: pmi-cray-pals: set PALS_SPOOL_DIR to
  /var/tmp/user/flux-pSw4um/jobtmp-0-f6jyUdR2P
0.065s: flux-shell[0]: TRACE: pmi-cray-pals: set PALS_APINFO to
  /var/tmp/user/flux-pSw4um/jobtmp-0-f6jyUdR2P/libpals_apinfo
0.066s: flux-shell[0]: TRACE: pmi-cray-pals: set PALS_RANKID to 0

2. Check that you can launch a PMI test program configured to use Cray PMI using the same options. In this example, flux pmi(1) is used.

$ flux run -o pmi=cray-pals --label-io -N2 -n2 flux pmi --method=libpmi2 --verbose barrier
0: libpmi2: using /opt/cray/pe/lib64/libpmi2.so (cray quirks enabled)
0: libpmi2: initialize: rank=0 size=2 name=kvs_160608288768: success
0: libpmi2: barrier: success
0: libpmi2: barrier: success
0: libpmi2: finalize: success
0: f5DhQTk3: completed pmi barrier on 2 tasks in 0.000s.
1: libpmi2: using /opt/cray/pe/lib64/libpmi2.so (cray quirks enabled)
1: libpmi2: initialize: rank=1 size=2 name=kvs_160608288768: success
1: libpmi2: barrier: success
1: libpmi2: barrier: success
1: libpmi2: finalize: success

3. Check that you can launch an MPI hello world program compiled with Cray MPICH.

$ flux run -o pmi=cray-pals --label-io -N2 -n2 proj/mpi-test/hello
0: fdfdnnoy: completed MPI_Init in 0.581s.  There are 2 tasks
0: fdfdnnoy: completed first barrier in 0.002s
0: fdfdnnoy: completed MPI_Finalize in 0.017s

4. Activate debugging output for Cray PMI.

$ flux run --env=PMI_DEBUG=1 --label-io -N2 -n2 proj/mpi-test/hello
1: Mon Mar 10 14:46:06 2025: [unset]: _pmi_pals_init:my_peidx=1,npes=2,
  nnodes=2,napps=1,my_cmd.pes_per_node=1,my_cmd.npes=2,my_pe.localidx=0,
  my_pe.nodeidx=1,my_pe.cmdidx=0,nid=1
1: Mon Mar 10 14:46:06 2025: [PE_1]: _pmi2_kvs_hash_entries = 1
1: Mon Mar 10 14:46:06 2025: [PE_1]: mmap in a file for shared memory type 4 len 345600
1: Mon Mar 10 14:46:06 2025: [PE_1]:  pals_get_nodes nnodes = 2 pals_get_nics nnics = 0
...

If all else fails, Cray MPICH works at least superficially with Flux's simple PMI:

$ flux run -o pmi=simple -n2 -N2 proj/mpi-test/hello
fB6P3jXzo: completed MPI_Init in 0.396s.  There are 2 tasks
fB6P3jXzo: completed first barrier in 0.000s
fB6P3jXzo: completed MPI_Finalize in 0.010s
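The verbose output from step 1 can be long on larger jobs. As a sketch, the hypothetical filter below reduces it to one line per variable that pmi-cray-pals reported setting, prefixed by the shell rank. It assumes the trace lines have the same layout as the sample output in step 1.

```shell
# Hypothetical filter: read "flux run -o verbose=2" output on stdin and
# print "<shell rank> <variable>" for each "set NAME to ..." trace line.
pals_trace_summary() {
    sed -n 's/^.*flux-shell\[\([0-9][0-9]*\)]: TRACE: pmi-cray-pals: set \([A-Z_]*\) to.*$/\1 \2/p' | sort
}

# Possible use:
#   flux run -o pmi=cray-pals -N2 -n2 -o verbose=2 true 2>&1 | pals_trace_summary
```

A shell rank that is missing a variable its peers have is a reasonable place to start looking.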

SEE ALSO

flux-submit(1), flux-shell(1), flux-pmi(1)