Glossary

APU

Accelerated Processing Unit. A processor that combines CPU and GPU on a single chip.

Cassini

The HPE 200G Slingshot NIC.

CORAL

Collaboration of Oak Ridge, Argonne, and Livermore. The first CORAL procurement was awarded to IBM and brought the POWER9-based pre-exascale systems Sierra to Livermore and Summit to Oak Ridge in 2018.

CORAL-2

The second CORAL procurement was awarded to HPE and brought the Cray EX-based exascale systems Frontier to Oak Ridge in 2022, Aurora to Argonne in 2023, and El Capitan to Livermore in 2024.

CRD

Kubernetes Custom Resource Definition. A mechanism for extending the Kubernetes API with application-specific resource types. DWS and NNF make heavy use of CRDs to represent storage allocations, file system state, mount points, and data movement operations.
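As an illustration, here is a minimal Python sketch of the manifest that registers a CRD with the API server. The group `example.com`, the kind `StorageAllocation`, and its `capacity` field are hypothetical placeholders, not actual DWS or NNF resource types; only the overall layout follows the `apiextensions.k8s.io/v1` CustomResourceDefinition schema.

```python
import json


def make_crd(group: str, kind: str, plural: str) -> dict:
    """Build a minimal apiextensions.k8s.io/v1 CustomResourceDefinition.

    The group, kind, and plural passed in are illustrative placeholders.
    """
    return {
        "apiVersion": "apiextensions.k8s.io/v1",
        "kind": "CustomResourceDefinition",
        # Kubernetes requires a CRD's name to be "<plural>.<group>".
        "metadata": {"name": f"{plural}.{group}"},
        "spec": {
            "group": group,
            "scope": "Namespaced",
            "names": {"kind": kind, "plural": plural, "singular": kind.lower()},
            "versions": [
                {
                    "name": "v1alpha1",
                    "served": True,   # exposed through the API server
                    "storage": True,  # the one version persisted in etcd
                    "schema": {
                        "openAPIV3Schema": {
                            "type": "object",
                            "properties": {
                                "spec": {
                                    "type": "object",
                                    "properties": {
                                        # Hypothetical field: requested capacity in bytes.
                                        "capacity": {"type": "integer"},
                                    },
                                },
                            },
                        }
                    },
                }
            ],
        },
    }


crd = make_crd("example.com", "StorageAllocation", "storageallocations")
print(json.dumps(crd, indent=2))
```

Once a manifest like this is applied (for example with `kubectl apply -f`), the API server accepts objects of the new kind, and a controller can watch and reconcile them.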

CXI

Cray eXascale Interface. Another way of referring to the Cassini NIC.

CXI Service

An authorization token that grants specific UNIX user and group IDs access to a set of Cassini NIC resources: VNI numbers, traffic classes, and NIC hardware entities such as transmit and receive queues.
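A toy Python model can make the authorization rule concrete. The class, its field names, and the traffic class strings are hypothetical and do not correspond to the libcxi API. The rule modeled here is an assumption: a caller qualifies by UID or GID, and may then use only the VNIs and traffic classes the service lists. Limits on NIC hardware resources such as transmit and receive queues are omitted.

```python
from collections.abc import Iterable
from dataclasses import dataclass


@dataclass(frozen=True)
class CxiService:
    """Toy model of a CXI Service (hypothetical names, not libcxi)."""
    uids: frozenset[int] = frozenset()
    gids: frozenset[int] = frozenset()
    vnis: frozenset[int] = frozenset()
    traffic_classes: frozenset[str] = frozenset()

    def permits(self, uid: int, gids: Iterable[int], vni: int, tc: str) -> bool:
        # The caller must match a listed UID or belong to a listed GID...
        member = uid in self.uids or not self.gids.isdisjoint(gids)
        # ...and may only use the VNIs and traffic classes the service grants.
        return member and vni in self.vnis and tc in self.traffic_classes


svc = CxiService(
    uids=frozenset({1001}),
    gids=frozenset({500}),
    vnis=frozenset({1024}),
    traffic_classes=frozenset({"best_effort", "low_latency"}),
)
print(svc.permits(1001, [], 1024, "low_latency"))    # True: UID matches
print(svc.permits(2002, [500], 1024, "bulk_data"))   # False: traffic class not granted
print(svc.permits(2002, [600], 1024, "best_effort")) # False: neither UID nor GID matches
```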

DWS

Data Workflow Services. An HPE software stack, running as a set of Kubernetes controllers, that manages the lifecycle of Rabbit storage allocations on behalf of a workload manager.

dragonfly

A two-level interconnect topology that divides switches into groups. Each switch has node ports, intra-group ports, and inter-group ports. The intra-group ports fully connect the switches within a group (level 0), although other intra-group topologies are permitted. The inter-group ports fully connect the groups to one another (level 1).

Figure: Dragonfly topology (a simplified dragonfly network).
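The port arithmetic can be made concrete with a small Python sketch. It uses the notation common in the dragonfly literature, assumed here rather than taken from Slingshot documentation: each group has `a` switches, and each switch has `p` node ports, `a - 1` intra-group ports, and `h` inter-group ports. With exactly one global link between every pair of groups, the largest such network has `g = a*h + 1` groups. The sketch builds that maximal topology; it does not describe the port split of any particular Slingshot system.

```python
from itertools import combinations


def build_dragonfly(a: int, h: int):
    """Build a maximal dragonfly with a switches per group and h global ports per switch.

    Switches are named (group, index). Returns (g, local_links, global_links).
    """
    g = a * h + 1  # each group pairs with a*h others, using every global port once
    # Level 0: all-to-all links among the switches within each group.
    local_links = [
        ((grp, i), (grp, j))
        for grp in range(g)
        for i, j in combinations(range(a), 2)
    ]
    # Level 1: one link between every pair of groups. Each group's a*h global
    # ports are handed out in order, h per switch.
    next_port = [0] * g
    global_links = []
    for g1, g2 in combinations(range(g), 2):
        s1, s2 = next_port[g1] // h, next_port[g2] // h
        next_port[g1] += 1
        next_port[g2] += 1
        global_links.append(((g1, s1), (g2, s2)))
    return g, local_links, global_links


def global_degree(global_links):
    """Count the global ports used on each switch."""
    deg = {}
    for u, v in global_links:
        for switch in (u, v):
            deg[switch] = deg.get(switch, 0) + 1
    return deg


a, p, h = 4, 2, 2  # hypothetical small configuration
g, local, glob = build_dragonfly(a, h)
print(g, a * g, a * p * g)                 # groups, switches, nodes: 9 36 72
print(len(local), len(glob))               # 54 36
print(set(global_degree(glob).values()))   # {2}: every switch uses exactly h global ports
```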

finalizer

A Kubernetes mechanism that prevents an object from being deleted until specific cleanup actions are complete. When a delete request arrives for an object that has one or more finalizers, the API server marks the object for deletion by setting its deletion timestamp but does not remove it; the object is removed only after controllers finish their cleanup and remove every finalizer from it.
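The deletion protocol can be sketched as a toy in-memory model in Python. `ToyApiServer`, `Obj`, and the finalizer string are hypothetical stand-ins for illustration; this is not the Kubernetes client API, and it models only the finalizer and deletion-timestamp behavior.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Optional


@dataclass
class Obj:
    name: str
    finalizers: list[str] = field(default_factory=list)
    deletion_timestamp: Optional[datetime] = None


class ToyApiServer:
    """In-memory stand-in for the API server (not the Kubernetes client API)."""

    def __init__(self) -> None:
        self.objects: dict[str, Obj] = {}

    def create(self, obj: Obj) -> None:
        self.objects[obj.name] = obj

    def delete(self, name: str) -> None:
        obj = self.objects[name]
        if obj.finalizers:
            # Deletion is only requested: mark the object and keep it.
            if obj.deletion_timestamp is None:
                obj.deletion_timestamp = datetime.now(timezone.utc)
        else:
            del self.objects[name]

    def remove_finalizer(self, name: str, finalizer: str) -> None:
        obj = self.objects[name]
        obj.finalizers.remove(finalizer)
        # When the last finalizer leaves an object marked for deletion,
        # the object is actually removed.
        if obj.deletion_timestamp is not None and not obj.finalizers:
            del self.objects[name]


api = ToyApiServer()
api.create(Obj("alloc-0", finalizers=["example.com/teardown"]))  # hypothetical finalizer
api.delete("alloc-0")
print("alloc-0" in api.objects)  # True: cleanup still pending
# A controller sees the deletion timestamp, performs cleanup, then:
api.remove_finalizer("alloc-0", "example.com/teardown")
print("alloc-0" in api.objects)  # False
```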

fabric manager

The fabric manager configures, manages, and monitors the Slingshot network. It runs on one dedicated server and on the Rosetta switches.

MPICH

The MPI implementation developed at Argonne National Laboratory. Cray MPICH is a proprietary fork of MPICH.

PALS

Parallel Application Launch Service. The Cray PALS library provides placement, interconnect, and debugger information to applications.

PMI

Process Management Interface. A quasi-standard bootstrap API and protocol for MPI implementations.
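PMI bootstrapping typically follows a put, fence, get pattern: each process publishes its own connection information to a key-value space, all processes synchronize, and each then reads its peers' entries. In PMI-1 these steps correspond to `PMI_KVS_Put`, `PMI_KVS_Commit` plus `PMI_Barrier`, and `PMI_KVS_Get`. The Python sketch below only simulates the pattern, with threads standing in for MPI ranks; `ToyKVS` and the address strings are hypothetical, and no real PMI library is called.

```python
import threading


class ToyKVS:
    """In-process stand-in for a PMI key-value space (hypothetical, not a PMI binding)."""

    def __init__(self, size: int) -> None:
        self._data: dict[str, str] = {}
        self._lock = threading.Lock()
        self._barrier = threading.Barrier(size)

    def put(self, key: str, value: str) -> None:
        with self._lock:
            self._data[key] = value

    def fence(self) -> None:
        # Plays the role of commit + barrier: once every rank returns from
        # here, all ranks' puts are visible.
        self._barrier.wait()

    def get(self, key: str) -> str:
        with self._lock:
            return self._data[key]


def rank_main(kvs: ToyKVS, rank: int, size: int, results: list) -> None:
    # 1. Publish this rank's (hypothetical) network address.
    kvs.put(f"addr-{rank}", f"nic-address-of-rank-{rank}")
    # 2. Wait until every rank has published.
    kvs.fence()
    # 3. Read every peer's address to wire up the communication layer.
    results[rank] = [kvs.get(f"addr-{peer}") for peer in range(size)]


size = 4
kvs = ToyKVS(size)
results = [None] * size
threads = [threading.Thread(target=rank_main, args=(kvs, r, size, results)) for r in range(size)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(results[0])
```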

NNF

Near-Node Flash. The HPE near-node-local flash storage product that interfaces with nearby Cray EX compute nodes via PCI Express. The Rabbit is the NNF node.

Rabbit

The HPE near-node-local storage product (the NNF node) that interfaces with nearby Cray EX compute nodes via PCI Express.

RDMA

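Remote Direct Memory Access. A technique that lets a NIC read and write application memory directly, without copying data through the operating system or involving the remote host's CPU, which reduces latency and CPU overhead.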

Rosetta

The HPE 200G, 64-port Slingshot switch.

traffic class

A Cassini NIC quality of service level. The classes include best effort, bulk data, low latency, and dedicated access.

Slingshot

The HPE 200G Ethernet-compliant interconnect. The CORAL-2 systems use Slingshot-11, with Cassini NIC and Rosetta switch devices configured in a dragonfly topology.

VNI

Virtual Network Identifier. An integer packet label used to isolate Slingshot RDMA traffic.