flux-shell-cray-pals(1)
SYNOPSIS
flux run -opmi=cray-pals
DESCRIPTION
cray-pals is a flux-shell(1) plugin that assists
with the launch of programs built with Cray MPICH. It creates an apinfo
data file on each node and sets several environment variables which are
used by Cray PMI to initialize.
After cray-pals places that data on each node, Cray PMI bootstraps an overlay network and handles the PMI data exchange without Flux.
SHELL OPTIONS
- pmi=cray-pals
Enable only the cray-pals PMI plugin. Other PMI implementations may be added, separated by commas.
Note
On systems where Cray MPICH is used, it may be helpful for system
administrators to customize /etc/flux/shell/initrc to enable
cray-pals by default:
if shell.options['pmi'] == nil then
shell.options['pmi'] = 'cray-pals,simple'
end
simple PMI is required to launch Flux instances, so if the
pmi default is changed, be sure to include it also.
- cray-pals.no-edit-env
Prevent the cray-pals PMI plugin from removing Flux's PMI library directory from
LD_LIBRARY_PATH, if present. It is removed by default to ensure that the Cray PMI library is found before Flux's.
- cray-pals.apinfo-version=1
Force HPE apinfo version 1. The default is version 5.
- cray-pals.timeout=SECONDS
The plugin synchronously watches the job eventlog for data from the
cray-pmi-bootstrap jobtap plugin. The job is aborted if the eventlog does not make progress for the specified number of seconds. Setting the timeout to a negative value disables it. Default: 10.
- cray-pals.pmi-bootstrap=off
Disable setting
PMI_CONTROL_PORT and PMI_SHARED_SECRET. This is mainly useful for testing.
- cray-pals.pmi-bootstrap=[port1,port2,secret]
Provide fixed values for setting
PMI_CONTROL_PORT and PMI_SHARED_SECRET. The option value is expressed as a JSON array of three integers. This is mainly useful for testing.
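Because the fixed-value form of cray-pals.pmi-bootstrap is a JSON array, its brackets and commas have to survive shell parsing. The following is a minimal Python sketch, not part of Flux, that builds such a command line. The helper name bootstrap_option and the example port and secret values are hypothetical; only the option name and the three-integer array format come from the text above.

```python
import json
import shlex


def bootstrap_option(port1, port2, secret):
    """Return a shell option string with fixed cray-pals bootstrap values.

    Hypothetical helper for illustration only.
    """
    values = [port1, port2, secret]
    if not all(isinstance(v, int) and not isinstance(v, bool) for v in values):
        raise TypeError("port1, port2 and secret must all be integers")
    # Compact separators keep the JSON array free of spaces.
    array = json.dumps(values, separators=(",", ":"))
    return "cray-pals.pmi-bootstrap=" + array


if __name__ == "__main__":
    # The ports and secret below are arbitrary placeholders.
    argv = [
        "flux", "run",
        "-o", "pmi=cray-pals",
        "-o", bootstrap_option(11998, 11999, 42),
        "-N2", "-n2", "true",
    ]
    # shlex.join quotes the bracketed value so the line can be pasted
    # into a POSIX shell unchanged.
    print(shlex.join(argv))
```

Building the argument list first and quoting it once at the end avoids hand-escaping the brackets, which some shells would otherwise try to expand as a glob pattern.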
ENVIRONMENT
The following environment variables are set by cray-pals, as required by Cray PMI.
- PALS_APID
Alias for
FLUX_JOB_ID, forced into integer form.
- PALS_APINFO
The path to the aforementioned
apinfo file on the local node.
- PALS_RANKID
Alias for
FLUX_TASK_RANK.
- PALS_NODEID
The index of the local node relative to the job.
- PALS_SPOOL_DIR
Alias for
FLUX_JOB_TMPDIR.
- PMI_CONTROL_PORT
A comma-separated pair of port numbers to assist Cray PMI in bootstrapping peer connections. The ports should be available for binding on all nodes of the job.
- PMI_SHARED_SECRET
A random 64-bit integer to assist Cray PMI in bootstrapping secure communications.
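When MPI_Init() fails, it can help to confirm from inside a task that these variables are present and well formed. The following is a minimal Python sketch, not part of Flux, that checks a task environment against the list above. The function name check_cray_pals_env and its expect_bootstrap parameter are hypothetical, and the format checks (decimal integers, two comma-separated ports, a secret below 2**64) are assumptions drawn from the descriptions above rather than from the Cray PMI specification.

```python
import os

# Variables described above. The two bootstrap variables are not set when
# the job runs with cray-pals.pmi-bootstrap=off.
REQUIRED = ["PALS_APID", "PALS_APINFO", "PALS_RANKID", "PALS_NODEID",
            "PALS_SPOOL_DIR"]
BOOTSTRAP = ["PMI_CONTROL_PORT", "PMI_SHARED_SECRET"]


def check_cray_pals_env(env, expect_bootstrap=True):
    """Return a list of problems found in env; an empty list means none."""
    problems = []
    for name in REQUIRED + (BOOTSTRAP if expect_bootstrap else []):
        if name not in env:
            problems.append(f"{name} is not set")

    # ASSUMPTION: these three are plain non-negative decimal integers.
    for name in ("PALS_APID", "PALS_RANKID", "PALS_NODEID"):
        if name in env and not env[name].isdecimal():
            problems.append(f"{name}={env[name]!r} is not a decimal integer")

    ports = env.get("PMI_CONTROL_PORT")
    if ports is not None:
        parts = ports.split(",")
        if len(parts) != 2 or not all(p.isdecimal() for p in parts):
            problems.append(f"PMI_CONTROL_PORT={ports!r} is not 'port1,port2'")

    secret = env.get("PMI_SHARED_SECRET")
    if secret is not None:
        if not secret.isdecimal() or int(secret) >= 2**64:
            problems.append("PMI_SHARED_SECRET is not a 64-bit integer")
    return problems


if __name__ == "__main__":
    found = check_cray_pals_env(os.environ)
    for problem in found:
        print(problem)
    if not found:
        print("cray-pals environment looks complete")
```

Saved under a hypothetical name such as check_env.py, the script could be launched in place of the MPI program, for example with flux run -o pmi=cray-pals --label-io -N2 -n2 python3 check_env.py, so that each task reports on its own environment.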
APINFO
The apinfo file contains application data in the following sections:
- comm profiles
One comm profile per NIC, each of which defines a CXI service that includes VNI numbers for access control and allowed traffic classes. The default CXI service is used if none is provided here. Not supported by cray-pals, but a high priority for future development.
- command
One entry per MPMD application, each with tasks per node and CPUs per task. MPMD is not supported by cray-pals so there is always just one entry.
- pes
One entry per task rank, each containing a node-local task index, a reference to the assigned MPMD command, and a node index.
- nodes
One entry per node allocated to the job, each containing a hostname and a node index.
- nics
One entry for each NIC assigned to the job across all nodes. Each entry contains the NIC address and related data used for scalable program launch. Not supported by cray-pals.
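The relationship between the sections may be easier to see as data structures. The following Python sketch models only the logical content described above; it is not the on-disk apinfo format and not code from cray-pals. All class and function names are hypothetical. The field names localidx, cmdidx and nodeidx are borrowed from the Cray PMI debug output shown under DEBUGGING, and the block assignment of task ranks to nodes is an assumption made for illustration.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Command:
    pes_per_node: int  # tasks per node
    cpus_per_pe: int   # CPUs per task


@dataclass
class PE:
    localidx: int  # node-local task index
    cmdidx: int    # index into the command section
    nodeidx: int   # index into the nodes section


@dataclass
class Node:
    nodeidx: int
    hostname: str


@dataclass
class ApInfo:
    comm_profiles: list = field(default_factory=list)  # left empty by cray-pals
    commands: List[Command] = field(default_factory=list)
    pes: List[PE] = field(default_factory=list)
    nodes: List[Node] = field(default_factory=list)
    nics: list = field(default_factory=list)            # left empty by cray-pals


def build_apinfo(hostnames, tasks_per_node, cpus_per_task=1):
    """Build the logical sections for a single-command (non-MPMD) job."""
    nodes = [Node(i, host) for i, host in enumerate(hostnames)]
    # ASSUMPTION: ranks are laid out in blocks, so ranks 0..tasks_per_node-1
    # run on node 0, the next block on node 1, and so on.
    pes = [
        PE(localidx=rank % tasks_per_node, cmdidx=0,
           nodeidx=rank // tasks_per_node)
        for rank in range(len(hostnames) * tasks_per_node)
    ]
    commands = [Command(tasks_per_node, cpus_per_task)]
    return ApInfo(commands=commands, pes=pes, nodes=nodes)


if __name__ == "__main__":
    # Hostnames are placeholders.
    info = build_apinfo(["node0", "node1"], tasks_per_node=1)
    for rank, pe in enumerate(info.pes):
        print(rank, pe)
```

For a two-node, two-task job this gives rank 1 a localidx of 0, a cmdidx of 0 and a nodeidx of 1, which agrees with the my_pe values printed by Cray PMI in the debugging transcript.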
DEBUGGING
The following may be useful if MPI_Init() is failing for unknown
reasons.
Tip
Obtain a Flux allocation with flux alloc that is just large enough for the
smallest MPI job that reproduces the issue.
1. Run with flux run -o verbose=2 and check for output from
cray-pals.
$ flux run -o pmi=cray-pals -N2 -n2 -o verbose=2 true
...
0.051s: flux-shell[1]: DEBUG: pmi-cray-pals: enabled
0.068s: flux-shell[1]: TRACE: pmi-cray-pals: created pals apinfo file
/var/tmp/user/flux-tBlt5H/jobtmp-1-f4yBboYGo/libpals_apinfo
0.069s: flux-shell[1]: TRACE: pmi-cray-pals: set PMI_SHARED_SECRET to 16945943893152566943
0.069s: flux-shell[1]: TRACE: pmi-cray-pals: set PALS_NODEID to 1
0.069s: flux-shell[1]: TRACE: pmi-cray-pals: set PALS_APID to 8762756694016
0.069s: flux-shell[1]: TRACE: pmi-cray-pals: set PALS_SPOOL_DIR to
/var/tmp/user/flux-tBlt5H/jobtmp-1-f4yBboYGo
0.069s: flux-shell[1]: TRACE: pmi-cray-pals: set PALS_APINFO to
/var/tmp/user/flux-tBlt5H/jobtmp-1-f4yBboYGo/libpals_apinfo
0.070s: flux-shell[1]: TRACE: pmi-cray-pals: set PALS_RANKID to 1
0.047s: flux-shell[0]: DEBUG: pmi-cray-pals: enabled
0.064s: flux-shell[0]: TRACE: pmi-cray-pals: created pals apinfo file
/var/tmp/user/flux-pSw4um/jobtmp-0-f6jyUdR2P/libpals_apinfo
0.065s: flux-shell[0]: TRACE: pmi-cray-pals: set PMI_CONTROL_PORT to 11998,11999
0.065s: flux-shell[0]: TRACE: pmi-cray-pals: set PMI_SHARED_SECRET to 11872392986869071399
0.065s: flux-shell[0]: TRACE: pmi-cray-pals: set PALS_NODEID to 0
0.065s: flux-shell[0]: TRACE: pmi-cray-pals: set PALS_APID to 12675874553856
0.065s: flux-shell[0]: TRACE: pmi-cray-pals: set PALS_SPOOL_DIR to
/var/tmp/user/flux-pSw4um/jobtmp-0-f6jyUdR2P
0.065s: flux-shell[0]: TRACE: pmi-cray-pals: set PALS_APINFO to
/var/tmp/user/flux-pSw4um/jobtmp-0-f6jyUdR2P/libpals_apinfo
0.066s: flux-shell[0]: TRACE: pmi-cray-pals: set PALS_RANKID to 0
2. Check that you can launch a PMI test program configured to use Cray PMI using the same options. In this example, flux pmi(1) is used.
$ flux run -o pmi=cray-pals --label-io -N2 -n2 flux pmi --method=libpmi2 --verbose barrier
0: libpmi2: using /opt/cray/pe/lib64/libpmi2.so (cray quirks enabled)
0: libpmi2: initialize: rank=0 size=2 name=kvs_160608288768: success
0: libpmi2: barrier: success
0: libpmi2: barrier: success
0: libpmi2: finalize: success
0: f5DhQTk3: completed pmi barrier on 2 tasks in 0.000s.
1: libpmi2: using /opt/cray/pe/lib64/libpmi2.so (cray quirks enabled)
1: libpmi2: initialize: rank=1 size=2 name=kvs_160608288768: success
1: libpmi2: barrier: success
1: libpmi2: barrier: success
1: libpmi2: finalize: success
3. Check that you can launch an MPI hello world program compiled with Cray MPICH.
$ flux run -o pmi=cray-pals --label-io -N2 -n2 proj/mpi-test/hello
0: fdfdnnoy: completed MPI_Init in 0.581s. There are 2 tasks
0: fdfdnnoy: completed first barrier in 0.002s
0: fdfdnnoy: completed MPI_Finalize in 0.017s
4. Activate debugging output for Cray PMI.
$ flux run --env=PMI_DEBUG=1 --label-io -N2 -n2 proj/mpi-test/hello
1: Mon Mar 10 14:46:06 2025: [unset]: _pmi_pals_init:my_peidx=1,npes=2,
nnodes=2,napps=1,my_cmd.pes_per_node=1,my_cmd.npes=2,my_pe.localidx=0,
my_pe.nodeidx=1,my_pe.cmdidx=0,nid=1
1: Mon Mar 10 14:46:06 2025: [PE_1]: _pmi2_kvs_hash_entries = 1
1: Mon Mar 10 14:46:06 2025: [PE_1]: mmap in a file for shared memory type 4 len 345600
1: Mon Mar 10 14:46:06 2025: [PE_1]: pals_get_nodes nnodes = 2 pals_get_nics nnics = 0
...
If all else fails, Cray MPICH works at least superficially with Flux's simple PMI:
$ flux run -o pmi=simple -n2 -N2 proj/mpi-test/hello
fB6P3jXzo: completed MPI_Init in 0.396s. There are 2 tasks
fB6P3jXzo: completed first barrier in 0.000s
fB6P3jXzo: completed MPI_Finalize in 0.010s
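The verbose output from step 1 interleaves lines from several shells, which makes it tedious to compare by eye. The following Python sketch, not part of Flux, groups the values that cray-pals reports setting by shell rank. The function name parse_cray_pals_trace is hypothetical, and the regular expression assumes the trace format matches the sample output above; other Flux versions may print it differently.

```python
import re

# Matches lines such as:
#   0.069s: flux-shell[1]: TRACE: pmi-cray-pals: set PALS_NODEID to 1
# The "\s+" after "to" also accepts a value wrapped onto the next line.
SET_RE = re.compile(
    r"flux-shell\[(\d+)\]: TRACE: pmi-cray-pals: set (\w+) to\s+(\S+)"
)

# Excerpt copied from the sample output in step 1.
SAMPLE = """\
0.069s: flux-shell[1]: TRACE: pmi-cray-pals: set PALS_NODEID to 1
0.069s: flux-shell[1]: TRACE: pmi-cray-pals: set PALS_APID to 8762756694016
0.065s: flux-shell[0]: TRACE: pmi-cray-pals: set PMI_CONTROL_PORT to 11998,11999
0.065s: flux-shell[0]: TRACE: pmi-cray-pals: set PALS_NODEID to 0
"""


def parse_cray_pals_trace(text):
    """Map shell rank to a dict of {variable name: value}."""
    by_shell = {}
    for rank, name, value in SET_RE.findall(text):
        by_shell.setdefault(int(rank), {})[name] = value
    return by_shell


if __name__ == "__main__":
    for rank, env in sorted(parse_cray_pals_trace(SAMPLE).items()):
        for name, value in sorted(env.items()):
            print(f"shell {rank}: {name}={value}")
```

A shell rank that is missing from the result, or one that lacks PALS_APINFO, points at a node where the plugin did not finish its setup.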