flux.resource.TreePool module

TreePool: Rv1Pool extension for sub-node affinity-aware GPU+core allocation.

Reads scheduling.children from the Rv1 R object to build a sub-node topology tree. GPU+core slots are allocated from the finest topology level (e.g. NUMA node) that can satisfy the request, so GPU and core share tight locality. Falls back to coarser levels (socket, whole node) when necessary. CPU-only slots use best-fit within the finest fitting level to preserve intact groups for future GPU jobs.
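The level-selection rule above can be sketched in plain Python. This is a minimal sketch, not the TreePool implementation. Group and pick_group are hypothetical names. Each level is assumed to be a precomputed list of groups with their free core and GPU IDs, with coarser groups being unions of finer ones. "First group that fits" is an assumed tie-break for GPU+core slots.

```python
from dataclasses import dataclass, field


@dataclass
class Group:
    """Hypothetical stand-in for one topology group (e.g. a NUMA node)."""
    cores: set                              # free node-local core IDs
    gpus: set = field(default_factory=set)  # free node-local GPU IDs


def pick_group(levels, ncores, ngpus):
    """Pick a group for one slot, trying the finest level first.

    levels: list of (level_name, [Group, ...]) ordered finest to coarsest.
    Returns (level_name, group), or None if no level can hold the slot.
    """
    for name, groups in levels:
        fits = [g for g in groups
                if len(g.cores) >= ncores and len(g.gpus) >= ngpus]
        if not fits:
            continue  # fall back to the next, coarser level
        if ngpus == 0:
            # CPU-only slot: best fit (fewest cores left over) keeps the
            # roomier groups intact for later GPU+core slots.
            return name, min(fits, key=lambda g: len(g.cores) - ncores)
        return name, fits[0]  # GPU+core slot: first group that fits
    return None
```

Ordering levels finest first is what produces the fallback: a slot that no NUMA group can hold is retried against socket groups, then against the whole node.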

children is a list of objects. Each object has a ranks key (RFC 22 IDset string identifying which broker ranks share this layout) and a topo key holding the node topology object. cores and gpus within leaf nodes are RFC 22 IDset strings using node-local IDs.
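A minimal sketch of expanding a children list per rank. It assumes only the simple RFC 22 forms used here: comma-separated IDs and ranges, optionally enclosed in brackets. decode_idset and topo_by_rank are hypothetical helpers, not part of flux.resource. Code inside flux-core would presumably use the existing idset bindings instead.

```python
def decode_idset(s):
    """Decode a simple RFC 22 idset string such as "0-3,7" or "[0-3,7]"."""
    s = s.strip()
    if s.startswith("[") and s.endswith("]"):
        s = s[1:-1]
    ids = set()
    if not s:
        return ids
    for part in s.split(","):
        lo, sep, hi = part.partition("-")
        if sep:
            ids.update(range(int(lo), int(hi) + 1))  # inclusive range
        else:
            ids.add(int(lo))
    return ids


def topo_by_rank(children):
    """Map each broker rank to the topo object of the entry covering it."""
    out = {}
    for child in children:
        for rank in decode_idset(child["ranks"]):
            out[rank] = child["topo"]  # shared reference, not a copy
    return out
```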

Node with no named topology levels (resources appear directly in topo):

{"ranks": "0-15", "topo": {"cores": "0-7", "memory": 128}}

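Two-socket node with two NUMA domains per socket and one GPU per NUMA domain (the socket array groups cores by physical package; each numa entry pairs a core range with its local GPU):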

{
  "ranks": "0-15",
  "topo": {
    "socket": [
      {"numa": [
        {"cores": "0-14",  "gpus": "0"},
        {"cores": "15-29", "gpus": "1"}
      ]},
      {"numa": [
        {"cores": "30-44", "gpus": "2"},
        {"cores": "45-59", "gpus": "3"}
      ]}
    ]
  }
}
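The following sketch shows how such a topo object yields locality groups. It walks the object recursively, treating every list-valued key as a named level and every dict without list-valued keys as a leaf. leaf_groups is a hypothetical helper, and TreePool's internal tree representation may differ.

```python
def leaf_groups(topo, path=()):
    """Yield (path, cores, gpus) for every leaf of a topo object.

    path is a tuple of (level_name, index) pairs from the root to the leaf.
    """
    levels = [(k, v) for k, v in topo.items() if isinstance(v, list)]
    if not levels:
        # Leaf: resources appear directly as RFC 22 idset strings.
        yield path, topo.get("cores", ""), topo.get("gpus", "")
        return
    for name, children in levels:
        for i, child in enumerate(children):
            yield from leaf_groups(child, path + ((name, i),))
```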

Levels that carry no useful grouping information MAY be omitted. Sites MAY introduce additional locality names (e.g. apu) to create aliases or extra containment layers; all named levels are available for container-exclusive allocation.

Topology deduplication

Large clusters often have only 2-3 distinct node types. The scheduler stores one topology structure per unique node type rather than one per rank, so memory is O(types × resources_per_type) rather than O(nodes × resources_per_type).
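A minimal sketch of the deduplication idea. It assumes topologies are JSON-serializable dicts and uses a canonical JSON string (sorted keys) as the identity of a node type. dedup_topologies is hypothetical, and it covers only the shared topology structure, not any per-rank allocation state.

```python
import json


def dedup_topologies(rank_topos):
    """Return (unique_topos, rank_to_type) for a {rank: topo} mapping."""
    unique = []
    index_of = {}
    rank_to_type = {}
    for rank, topo in rank_topos.items():
        key = json.dumps(topo, sort_keys=True)  # content-based identity
        if key not in index_of:
            index_of[key] = len(unique)
            unique.append(topo)
        rank_to_type[rank] = index_of[key]
    return unique, rank_to_type
```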

Node-exclusive fast path

When a job requests node-exclusive allocation (exclusive: true at the node level, set by the frobnicator as a site policy), every candidate node has its full complement of cores and GPUs free. Groups at the finest sub-node topology level are disjoint and, on such a node, entirely free. Each group's free resources are therefore simply its full contents, so the per-group core intersection is skipped. After the affinity helpers confirm that the node can satisfy the slot count, the full node complement is claimed so that the node is correctly marked as exclusive.
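The fast path can be sketched as follows. The sketch assumes the finest-level leaves of a fully free node are given as (cores, gpus) ID sets that together cover the whole node. It also assumes cores_per_slot is at least 1, so the core-less slot_size 0 case is not handled. exclusive_claim is a hypothetical name. Because every group is free, its capacity comes straight from its size, with no intersection against a free set.

```python
def exclusive_claim(leaves, nslots, cores_per_slot, gpus_per_slot):
    """Return (cores, gpus) to claim for a node-exclusive request, or None.

    leaves: list of (core_ids, gpu_ids) sets for the finest level of a node
    whose resources are all free; the leaves are assumed to cover the node.
    cores_per_slot must be >= 1 in this sketch.
    """
    capacity = 0
    for cores, gpus in leaves:
        # Every ID is free, so the group's size is its free count:
        # no intersection with a free set is needed.
        n = len(cores) // cores_per_slot
        if gpus_per_slot:
            n = min(n, len(gpus) // gpus_per_slot)
        capacity += n
    if capacity < nslots:
        return None
    # Claim the full complement so the node is marked exclusive.
    all_cores = set().union(*(c for c, _ in leaves))
    all_gpus = set().union(*(g for _, g in leaves))
    return all_cores, all_gpus
```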

class flux.resource.TreePool.TreePool(R, log=None, **kwargs)

Bases: ResourcePool

Version-dispatching pool with sub-node affinity-aware GPU+core allocation.

Wraps _TreePoolV1 for Rv1 resources. Set pool_class to this class or pass pool-class=TreePool to sched-simple.

To support a future R version, add a version-specific implementation class and extend _impl_map.
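A minimal sketch of version dispatch through an _impl_map. It assumes the R version is read from R["version"] and that unknown attributes are forwarded to the implementation object. _PoolV1 and VersionDispatchPool are hypothetical stand-ins for _TreePoolV1 and TreePool, and the real class may dispatch differently.

```python
class _PoolV1:
    """Hypothetical stand-in for a version-1 implementation."""

    def __init__(self, R, log=None, **kwargs):
        self.R = R
        self.log = log


class VersionDispatchPool:
    """Hypothetical dispatcher: picks an implementation by R version."""

    _impl_map = {1: _PoolV1}  # a future version would add an entry here

    def __init__(self, R, log=None, **kwargs):
        version = R.get("version")
        try:
            impl_class = self._impl_map[version]
        except KeyError:
            raise ValueError(f"unsupported R version: {version!r}") from None
        self._impl = impl_class(R, log=log, **kwargs)

    def __getattr__(self, name):
        # Only called for attributes not found normally; forward to the impl.
        if name == "_impl":
            raise AttributeError(name)  # avoid recursion before __init__ sets it
        return getattr(self._impl, name)
```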

class flux.resource.TreePool.TreeResourceRequest(*args, container_level=None, **kwargs)

Bases: ResourceRequest

ResourceRequest subclass for TreePool supporting RFC 14 canonical jobspec.

Extends the base RFC 25 V1 parser with:

  • Container-exclusive allocation: an exclusive leaf vertex whose type is not node/slot/core/gpu is treated as a topo locality name (e.g. socket{x}, numa{x}). container_level is set to that type and slot_size is 0; the actual per-container resource counts are resolved from the pool topology at scheduling time.

  • Node-exclusive without explicit cores: slot=N/node{x} is valid even without a core child. slot_size is 0; the scheduler claims the full resource complement of the chosen node. When a core child is present its count acts as a minimum capability filter.
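The container-exclusive rule could be detected in an RFC 14 resources list as in the following sketch. It assumes each vertex is a dict with a type key and optional exclusive and with keys. find_container_level is hypothetical and ignores counts and the parser's other validity checks.

```python
STANDARD_TYPES = {"node", "slot", "core", "gpu"}


def find_container_level(resources):
    """Return the type of an exclusive, non-standard leaf vertex, or None."""
    for vertex in resources:
        children = vertex.get("with", [])
        if children:
            level = find_container_level(children)  # descend into children
            if level is not None:
                return level
        elif vertex.get("exclusive") and vertex["type"] not in STANDARD_TYPES:
            return vertex["type"]  # e.g. "socket" or "numa"
    return None
```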

container_level
classmethod from_jobspec(jobspec)

Parse a jobspec and return a TreeResourceRequest.

flux.resource.TreePool.amend(R, hwloc_xml=None)

Amender for fake-resources: add R.scheduling with TreePool topology.

When hwloc_xml is None (fake-resources.hwloc-xml-path not set), R is returned unchanged and no scheduling key is added. When XML is provided, amend calls rhwloc_treepool_topo_to_json() via CFFI to derive the topology and adds the resulting scheduling key to R.
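The following sketch illustrates the amend contract. topo_from_xml is a hypothetical stand-in for the CFFI call. The sketch also assumes all ranks listed in R_lite share the single XML topology. The real amend may compute ranks differently or preserve other scheduling content.

```python
import copy


def amend_sketch(R, hwloc_xml=None, topo_from_xml=None):
    """Hypothetical illustration of amend(): add R.scheduling from XML."""
    if hwloc_xml is None:
        return R  # no XML configured: R is returned unchanged
    topo = topo_from_xml(hwloc_xml)  # stands in for the CFFI conversion
    # Assumption: every rank in R_lite shares the one XML topology.
    ranks = ",".join(entry["rank"] for entry in R["execution"]["R_lite"])
    amended = copy.deepcopy(R)  # leave the caller's R untouched
    amended.setdefault("scheduling", {})["children"] = [
        {"ranks": ranks, "topo": topo}
    ]
    return amended
```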

Configure via:

--conf=fake-resources.hwloc-xml-path=<topology.xml>
--conf=fake-resources.amend-r=flux.resource.TreePool:amend

flux.resource.TreePool.pool_class

alias of TreePool