flux.resource.TreePool module
TreePool: Rv1Pool extension for sub-node affinity-aware GPU+core allocation.
Reads scheduling.children from the Rv1 R object to build a sub-node
topology tree. GPU+core slots are allocated from the finest topology level
(e.g. NUMA node) that can satisfy the request, so that each slot's GPUs and
cores share tight locality. When no group at that level fits, allocation falls
back to coarser levels (socket, then whole node).
CPU-only slots use best-fit within the finest fitting level to keep groups
intact for future GPU jobs.
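The level-selection policy described above can be illustrated with a small sketch. It does not reproduce TreePool's internals. It assumes free resources have already been flattened into per-level lists of groups, ordered from finest to coarsest, and each group is a dict of free core and GPU ID sets. The function name choose_group is hypothetical.

```python
def choose_group(levels, ncores, ngpus):
    """Pick a group for one slot from per-level lists of free groups.

    levels: lists of groups ordered finest (e.g. NUMA) to coarsest (node);
            each group is a dict with "cores" and "gpus" sets of free IDs.
    Returns (level_index, group_index), or None if nothing fits.
    """
    for li, groups in enumerate(levels):
        fits = [
            (gi, g)
            for gi, g in enumerate(groups)
            if len(g["cores"]) >= ncores and len(g["gpus"]) >= ngpus
        ]
        if not fits:
            continue  # nothing fits here: fall back to the next coarser level
        if ngpus > 0:
            # GPU+core slot: take the first fitting group at this (finest) level
            return li, fits[0][0]
        # CPU-only slot: best fit, i.e. the group with the fewest spare cores,
        # so larger groups stay intact for later GPU jobs
        gi, _ = min(fits, key=lambda item: len(item[1]["cores"]) - ncores)
        return li, gi
    return None
```

With a NUMA level holding a 4-core/1-GPU group and a 2-core/0-GPU group, a 2-core CPU-only request lands in the smaller group. A 5-core request fits no NUMA group and falls back to the node level.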
children is a list of objects. Each object has a ranks key
(an RFC 22 IDset string identifying the broker ranks that share this layout)
and a topo key holding the node topology object. Within leaf nodes, the cores
and gpus values are RFC 22 IDset strings using node-local IDs.
Node with no named topology levels (resources appear directly in topo):
{"ranks": "0-15", "topo": {"cores": "0-7", "memory": 128}}
Two-socket node (socket array groups cores by physical package):
{
  "ranks": "0-15",
  "topo": {
    "socket": [
      {"numa": [
        {"cores": "0-14", "gpus": "0"},
        {"cores": "15-29", "gpus": "1"}
      ]},
      {"numa": [
        {"cores": "30-44", "gpus": "2"},
        {"cores": "45-59", "gpus": "3"}
      ]}
    ]
  }
}
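One way to read a topo object like the one above is to walk it recursively. In this reading, a list-valued key is a named level, and an object carrying cores or gpus holds resources directly. The sketch below assumes that convention. It also includes a minimal RFC 22 IDset expander so that it runs without the flux bindings. The helper names expand_idset and leaf_groups are hypothetical.

```python
def expand_idset(text):
    """Expand a simple RFC 22 IDset string such as "0-3,7" or "[0-3]"."""
    ids = set()
    body = text.strip().strip("[]")
    if not body:
        return ids
    for part in body.split(","):
        if "-" in part:
            lo, hi = part.split("-")
            ids.update(range(int(lo), int(hi) + 1))
        else:
            ids.add(int(part))
    return ids


def leaf_groups(topo, path=()):
    """Yield (path, cores, gpus) for every object that holds cores/gpus directly.

    path records the containment chain as (level_name, index) pairs.
    """
    if "cores" in topo or "gpus" in topo:
        yield path, expand_idset(topo.get("cores", "")), expand_idset(topo.get("gpus", ""))
    for name, value in topo.items():
        if isinstance(value, list):  # list-valued keys are named levels
            for i, child in enumerate(value):
                yield from leaf_groups(child, path + ((name, i),))
```

For the two-socket example, this yields four groups. The last one has path (("socket", 1), ("numa", 1)), cores 45 through 59, and GPU 3. The flat example yields a single group with an empty path, and the scalar memory key is ignored.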
Levels that carry no useful grouping information MAY be omitted.
Sites MAY introduce additional locality names (e.g. apu) to create
aliases or extra containment layers; all named levels are available for
container-exclusive allocation.
- Topology deduplication
Large clusters often have only 2-3 distinct node types. The scheduler stores one topology structure per unique node type rather than one per rank, so memory is O(types × resources_per_type) rather than O(nodes × resources_per_type).
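The deduplication idea can be sketched by keying each topo object on its canonical JSON text. This is an illustrative assumption and not necessarily how TreePool compares topologies. Only one copy of each distinct topology is stored. The per-rank map holds a small integer index per rank, so it grows with the node count but carries no per-resource data. The names dedupe and expand_idset are hypothetical.

```python
import json


def expand_idset(text):
    """Expand a simple RFC 22 IDset string such as "0-3,7" or "[0-3]"."""
    ids = set()
    body = text.strip().strip("[]")
    if not body:
        return ids
    for part in body.split(","):
        if "-" in part:
            lo, hi = part.split("-")
            ids.update(range(int(lo), int(hi) + 1))
        else:
            ids.add(int(part))
    return ids


def dedupe(children):
    """Store one topology per distinct node type plus a rank -> type index."""
    topos = []      # one topo object per distinct node type
    index = {}      # canonical JSON text -> position in topos
    rank_type = {}  # broker rank -> position in topos
    for entry in children:
        key = json.dumps(entry["topo"], sort_keys=True)
        if key not in index:
            index[key] = len(topos)
            topos.append(entry["topo"])
        for rank in expand_idset(entry["ranks"]):
            rank_type[rank] = index[key]
    return topos, rank_type
```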
- Node-exclusive fast path
When a job requests node-exclusive allocation (exclusive: true at the node level, set by the frobnicator as a site policy), every candidate node has its full complement of cores and GPUs free. At the finest sub-node topology level, groups are guaranteed disjoint, so the per-group core intersection is skipped. After the affinity helpers confirm that the node can satisfy the slot count, the full node complement is claimed so that the node is correctly marked as exclusive.
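A simplified model of this fast path follows. Because the node is fully free, each finest-level group's own core and GPU sets already are its free sets, and the groups are disjoint. Slot capacity can therefore be summed per group with no intersection against a free-resource set. If the count is met, the whole node is returned. This is a sketch under those assumptions. The names claim_node_exclusive and slots_in_group are hypothetical, and cores_per_slot is assumed to be at least 1.

```python
def slots_in_group(ncores, ngpus, cores_per_slot, gpus_per_slot):
    """How many slots of the given shape fit into one group."""
    fit = ncores // cores_per_slot
    if gpus_per_slot:
        fit = min(fit, ngpus // gpus_per_slot)
    return fit


def claim_node_exclusive(groups, nslots, cores_per_slot, gpus_per_slot=0):
    """groups: (cores, gpus) ID sets at the finest level of a fully free node.

    Returns (all_cores, all_gpus) for the whole node, or None if the node
    cannot host nslots slots without spanning a group boundary.
    """
    # Disjoint, fully free groups: count capacity directly from group sizes.
    capacity = sum(
        slots_in_group(len(c), len(g), cores_per_slot, gpus_per_slot)
        for c, g in groups
    )
    if capacity < nslots:
        return None
    # Claim the full complement so the node is marked exclusive.
    all_cores = set().union(*(c for c, _ in groups))
    all_gpus = set().union(*(g for _, g in groups))
    return all_cores, all_gpus
```

On a node with two 15-core, 1-GPU NUMA groups, two 4-core/1-GPU slots fit and the node's 30 cores and 2 GPUs are returned. Three such slots do not fit, because each group has only one GPU.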
- class flux.resource.TreePool.TreePool(R, log=None, **kwargs)
Bases: ResourcePool
Version-dispatching pool with sub-node affinity-aware GPU+core allocation.
Wraps _TreePoolV1 for Rv1 resources. Set pool_class to this class or pass pool-class=TreePool to sched-simple.
To support a future R version, add a version-specific implementation class and extend _impl_map.
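The version-dispatch pattern can be sketched in isolation. This is a generic illustration, not TreePool's code. It does not subclass ResourcePool, and _PoolV1, DispatchingPool, and describe() are hypothetical stand-ins. The wrapper looks up an implementation class in _impl_map by R["version"] and forwards attribute access to the instance it creates.

```python
class _PoolV1:
    """Hypothetical stand-in for a version-specific implementation class."""

    def __init__(self, R, log=None):
        self.R = R
        self.log = log

    def describe(self):
        return f"Rv{self.R['version']} pool"


class DispatchingPool:
    """Pick the implementation class from R["version"] and delegate to it."""

    _impl_map = {1: _PoolV1}

    def __init__(self, R, log=None):
        version = R.get("version")
        impl_class = self._impl_map.get(version)
        if impl_class is None:
            raise ValueError(f"unsupported R version: {version!r}")
        self._impl = impl_class(R, log=log)

    def __getattr__(self, name):
        # Only called for attributes not found normally; forward them to the
        # version-specific object. Guard _impl to avoid infinite recursion.
        if name == "_impl":
            raise AttributeError(name)
        return getattr(self._impl, name)
```

In this sketch, supporting a new version means writing another implementation class and adding it under its version number in _impl_map. Callers and the wrapper stay unchanged.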
- class flux.resource.TreePool.TreeResourceRequest(*args, container_level=None, **kwargs)
Bases: ResourceRequest
ResourceRequest subclass for TreePool supporting RFC 14 canonical jobspec.
Extends the base RFC 25 V1 parser with:
  - Container-exclusive allocation: an exclusive leaf vertex whose type is not node, slot, core, or gpu is treated as a topo locality name (e.g. socket{x}, numa{x}). container_level is set to that type and slot_size is 0; the actual per-container resource counts are resolved from the pool topology at scheduling time.
  - Node-exclusive without explicit cores: slot=N/node{x} is valid even without a core child. slot_size is 0, and the scheduler claims the full resource complement of the chosen node. When a core child is present, its count acts as a minimum capability filter.
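The container-exclusive rule can be sketched as a walk over the jobspec resources list that looks for an exclusive leaf vertex of a non-standard type. The vertex layout (type, count, exclusive, with) follows RFC 25. The function name find_container_level is hypothetical and stands in for whatever TreeResourceRequest does internally.

```python
STANDARD_TYPES = {"node", "slot", "core", "gpu"}


def find_container_level(resources):
    """Return the type of the first exclusive, non-standard leaf vertex, or None.

    resources: an RFC 25 style list of vertices, each a dict with "type",
    "count", and optionally "exclusive" and "with".
    """
    for vertex in resources:
        children = vertex.get("with")
        if children:
            level = find_container_level(children)
            if level is not None:
                return level
        elif vertex.get("exclusive") and vertex["type"] not in STANDARD_TYPES:
            return vertex["type"]
    return None
```

A slot containing one exclusive socket yields "socket". An ordinary slot of cores, or a non-exclusive numa vertex, yields None.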
- container_level
- classmethod from_jobspec(jobspec)
Parse a jobspec and return a TreeResourceRequest.
- flux.resource.TreePool.amend(R, hwloc_xml=None)
Amender for fake-resources: add R.scheduling with TreePool topology.
When hwloc_xml is None (fake-resources.hwloc-xml-path not set), R is returned unchanged and no scheduling key is added. When XML is provided, amend calls rhwloc_treepool_topo_to_json() via CFFI to derive the topology and adds the scheduling key to R.
Configure via:
--conf=fake-resources.hwloc-xml-path=<topology.xml> --conf=fake-resources.amend-r=flux.resource.TreePool:amend
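The two amend outcomes can be modeled with a small stand-in. It returns R untouched when there is no topology, and otherwise returns a copy with scheduling.children populated. Here, children is passed in directly rather than derived from hwloc XML. The sketch assumes that scheduling holds only a children list, and with_scheduling is a hypothetical name. The real amend may build additional keys or modify R in place.

```python
import copy


def with_scheduling(R, children):
    """Return R with scheduling.children set, or R itself if children is None."""
    if children is None:
        return R  # mirrors the "no hwloc XML" case: nothing is added
    amended = copy.deepcopy(R)
    amended["scheduling"] = {"children": children}
    return amended
```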