Glossary
- APU
Accelerated Processing Unit. CPU and GPU are combined on a single chip.
- Cassini
The HPE 200G Slingshot NIC.
- CORAL
Collaboration of Oak Ridge, Argonne, and Livermore. The first CORAL procurement was awarded to IBM and brought the POWER9-based pre-exascale systems Sierra to Livermore and Summit to Oak Ridge in 2018.
- CORAL-2
The second CORAL procurement was awarded to HPE and brought the Cray EX-based exascale systems Frontier to Oak Ridge in 2022, Aurora to Argonne in 2023, and El Capitan to Livermore in 2024.
- CRD
Kubernetes Custom Resource Definition. A mechanism for extending the Kubernetes API with application-specific resource types. DWS and NNF make heavy use of CRDs to represent storage allocations, file system state, mount points, and data movement operations.
- CXI
Cray eXascale Interface. Another way of referring to the Cassini NIC.
- CXI Service
An authorization token that grants specific UNIX user and group IDs access to a set of Cassini NIC resources: VNI numbers, traffic classes, and NIC hardware entities such as transmit and receive queues.
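To make the shape of such a token concrete, here is a minimal Python sketch of a CXI service as a record plus an access check. The class name, field names, queue limits, and the traffic class strings are all assumptions chosen for illustration; they do not reflect the actual Cassini driver or library data structures.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CxiServiceSketch:
    """Hypothetical model of a CXI service; every field name is illustrative."""
    uids: frozenset             # UNIX user IDs granted access
    gids: frozenset             # UNIX group IDs granted access
    vnis: frozenset             # VNI numbers the members may use
    traffic_classes: frozenset  # permitted traffic classes
    max_tx_queues: int          # assumed limit on transmit queues
    max_rx_queues: int          # assumed limit on receive queues

    def permits(self, uid, gid, vni, traffic_class):
        """Return True if this user or group may use the VNI and traffic class."""
        member = uid in self.uids or gid in self.gids
        return member and vni in self.vnis and traffic_class in self.traffic_classes


svc = CxiServiceSketch(
    uids=frozenset({1000}),
    gids=frozenset(),
    vnis=frozenset({1024, 1025}),
    traffic_classes=frozenset({"best_effort", "low_latency"}),
    max_tx_queues=8,
    max_rx_queues=8,
)
allowed = svc.permits(uid=1000, gid=100, vni=1024, traffic_class="low_latency")
denied = svc.permits(uid=1001, gid=100, vni=1024, traffic_class="low_latency")
print(allowed, denied)
```

In this sketch a request passes only when the caller matches a listed user or group ID and also stays within the listed VNIs and traffic classes, which is the sense in which the service acts as an authorization token.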
[Figure: a simplified dragonfly network]
- DWS
Data Workflow Services. An HPE software stack, running as a set of Kubernetes controllers, that manages the lifecycle of Rabbit storage allocations on behalf of a workload manager.
- dragonfly
A two-level interconnect topology that divides switches into groups. The switches in each group have some node ports, inter-group ports, and intra-group ports. The intra-group switch ports fully connect the switches in the group (level 0), although other topologies are permitted. The inter-group switch ports fully connect the groups (level 1).
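As a rough illustration of the two levels, the following Python sketch builds a small dragonfly in which every switch in a group links to every other switch in that group (level 0) and every pair of groups shares one link (level 1). The function name, the group and switch counts, and the round-robin choice of which switch carries each inter-group link are assumptions made for illustration, not a description of how Slingshot is actually wired.

```python
from itertools import combinations


def build_dragonfly(num_groups, switches_per_group):
    """Return (intra_links, inter_links) for a fully connected dragonfly.

    Switches are identified as (group, index) tuples.
    """
    intra_links = []
    for g in range(num_groups):
        switches = [(g, s) for s in range(switches_per_group)]
        # Level 0: all-to-all links between the switches of one group.
        intra_links.extend(combinations(switches, 2))

    inter_links = []
    for a, b in combinations(range(num_groups), 2):
        # Level 1: one link per pair of groups. Which switch hosts the
        # link is a hypothetical round-robin choice.
        src = (a, b % switches_per_group)
        dst = (b, a % switches_per_group)
        inter_links.append((src, dst))
    return intra_links, inter_links


intra, inter = build_dragonfly(num_groups=4, switches_per_group=3)
print(len(intra), len(inter))  # prints "12 6"
```

With 4 groups of 3 switches, each group contributes 3 intra-group links (12 in total) and the 4 groups form 6 pairs, so there are 6 inter-group links.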
- finalizer
A Kubernetes mechanism that prevents an object from being deleted until specific cleanup actions are complete. An object with a finalizer set will not be removed from the API server even after a delete request until the finalizer is explicitly removed by a controller.
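The deletion behavior described above can be sketched with a toy in-memory model. This is not the Kubernetes client API: the classes `ApiObject` and `ToyApiServer`, the object name, and the finalizer string `example.com/cleanup` are hypothetical and exist only to show the ordering of events.

```python
class ApiObject:
    """Toy stand-in for a Kubernetes object (hypothetical, not the real API)."""

    def __init__(self, name, finalizers=()):
        self.name = name
        self.finalizers = list(finalizers)
        self.deletion_requested = False


class ToyApiServer:
    """Minimal model of how finalizers gate object removal."""

    def __init__(self):
        self.objects = {}

    def create(self, obj):
        self.objects[obj.name] = obj

    def delete(self, name):
        obj = self.objects[name]
        # Analogous to the server marking the object with a deletion timestamp.
        obj.deletion_requested = True
        self._reap(obj)

    def remove_finalizer(self, name, finalizer):
        # A controller calls this once its cleanup work is complete.
        obj = self.objects[name]
        obj.finalizers.remove(finalizer)
        self._reap(obj)

    def _reap(self, obj):
        # Remove the object only if deletion was requested and no finalizers remain.
        if obj.deletion_requested and not obj.finalizers:
            del self.objects[obj.name]


server = ToyApiServer()
server.create(ApiObject("storage-1", finalizers=["example.com/cleanup"]))
server.delete("storage-1")
still_present = "storage-1" in server.objects  # delete alone does not remove it
server.remove_finalizer("storage-1", "example.com/cleanup")
gone = "storage-1" not in server.objects  # removed once the finalizer is cleared
print(still_present, gone)
```

The delete request only marks the object; removal happens when the last finalizer is cleared, which is what lets a controller finish cleanup before the object vanishes.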
- fabric manager
The fabric manager configures, manages, and monitors the Slingshot network. It runs on one dedicated server and on the Rosetta switches.
- MPICH
The Argonne MPI implementation. Cray MPICH is a proprietary fork of MPICH.
- PALS
Parallel Application Launch Service. The Cray PALS library provides placement, interconnect, and debugger information to applications.
- PMI
Process Management Interface, a quasi-standard bootstrap API and protocol for MPI implementations.
- NNF
Near-Node Flash. The HPE near-node-local flash storage product that interfaces with nearby Cray EX compute nodes via PCI Express. The Rabbit is the NNF node.
- Rabbit
HPE near-node-local storage product that interfaces with nearby Cray EX compute nodes via PCI Express.
- RDMA
Remote Direct Memory Access. A low-latency technique that allows a NIC to directly access application memory.
- Rosetta
The HPE 200G, 64-port Slingshot switch.
- traffic class
A Cassini NIC quality of service level. The classes include best effort, bulk data, low latency, and dedicated access.
- Slingshot
The HPE 200G Ethernet-compliant interconnect. CORAL-2 is based on Slingshot-11 with Cassini NIC and Rosetta switch devices configured in a dragonfly topology.
- VNI
Virtual Network Identifier. An integer packet label used to isolate Slingshot RDMA traffic.