flux-overlay(1)
SYNOPSIS
DESCRIPTION
flux overlay is a utility for the Flux tree based overlay network.
COMMANDS
status
flux overlay status reports the current status of the tree based overlay network. The possible status values are shown in SUBTREE STATUS below.
- -r, --rank=[RANK]
Check health of sub-tree rooted at NODEID (default 0).
- -v, --verbose=[LEVEL]
Increase reporting detail: 1=show time since current state was entered, 2=show round-trip RPC times.
- -t, --timeout=FSD
Set RPC timeout, 0=disable (default 0.5s)
- --summary
Show only the root sub-tree status.
- --down
Show only the partial/degraded sub-trees.
- --no-pretty
Do not indent entries and use line drawing characters to show overlay tree structure
- --no-ghost
Do not fill in presumed state of nodes that are inaccessible behind offline/lost overlay parents.
- -L, --color=WHEN
Colorize output when supported; WHEN can be 'always' (default if omitted), 'never', or 'auto' (default).
- -H, --highlight=TARGET
Highlight one or more targets and their ancestors.
- -w, --wait=STATE
Wait until sub-tree enters STATE before reporting (full, partial, offline, degraded, lost).
errors
flux overlay errors summarizes any errors recorded for lost or offline nodes. The output consists of one line per unique error with a hostlist prefix.
- -t, --timeout=FSD
Set RPC timeout, 0=disable (default 0.5s)
lookup
Translate a hostlist target to a rank idset or a rank idset target to hostlist.
parentof
Show the parent of rank.
disconnect
Disconnect a subtree rooted at target (hostname or rank).
- -r, --parent=NODEID
Set parent rank to NODEID. By default, the parent is determined from the topology.
trace
Display message summaries for messages transmitted and received on the overlay network. A topic string glob pattern may be supplied as a positional argument.
- -f, --full
Include JSON payload in output, if any. Payloads that are not JSON are not displayed.
- -r, --rank=NODEID
Filter output by overlay network peer rank. Note that this rank is not necessarily the same as the message source or destination.
- -t, --type=TYPE,...
Filter output by message type, a comma-separated list. Valid types are
request
,response
,event
, orcontrol
.
- -L, --color=WHEN
Colorize output when supported; WHEN can be
always
(default if omitted),never
, orauto
(default).
EXAMPLES
By default, flux overlay status shows the status of the full Flux instance in graphical form, e.g.
$ flux overlay status
0 test0: partial
├─ 1 test1: offline
├─ 2 test2: full
├─ 3 test3: full
├─ 4 test4: full
├─ 5 test5: offline
├─ 6 test6: full
└─ 7 test7: offline
The time in the current state is reported if -v
is added, e.g.
$ flux overlay status -v
0 test0: partial for 2.55448m
├─ 1 test1: offline for 12.7273h
├─ 2 test2: full for 2.55484m
├─ 3 test3: full for 18.2725h
├─ 4 test4: full for 18.2777h
├─ 5 test5: offline for 12.7273h
├─ 6 test6: full for 18.2784h
└─ 7 test7: offline for 12.7273h
Round trip RPC times are shown with -vv
, e.g.
0 test0: partial for 4.15692m (2.982 ms)
├─ 1 test1: offline for 12.754h
├─ 2 test2: full for 4.15755m (2.161 ms)
├─ 3 test3: full for 18.2992h (2.332 ms)
├─ 4 test4: full for 18.3045h (2.182 ms)
├─ 5 test5: offline for 12.754h
├─ 6 test6: full for 18.3052h (2.131 ms)
└─ 7 test7: offline for 12.754h
At times an error summary for lost nodes may be useful, e.g.
$ flux overlay errors
test[2,5]: lost connection
test[6-7]: lost parent
A broker that is not responding but is not shown as lost or offline may be forcibly disconnected from the overlay network with
$ flux overlay disconnect 2
flux-overlay: asking test0 (rank 0) to disconnect child test2 (rank 2)
However, before doing that it may be useful to see if a broker acting as a router to that node is actually the problem. The parent of a broker rank may be listed with
$ flux overlay parentof 2
0
Finally, translation between hostnames and broker ranks is accomplished with
$ flux overlay lookup 2
test2
$ flux overlay lookup test2
2
SUBTREE STATUS
The possible overlay subtree status values are:
- full
Node is online and no children are in partial, offline, degraded, or lost state.
- partial
Node is online, and some children are in partial or offline state; no children are in degraded or lost state.
- degraded
Node is online, and some children are in degraded or lost state.
- lost
Node has gone missing, from the parent perspective.
- offline
Node has not yet joined the instance, or has been cleanly shut down.
RESOURCES
Flux: http://flux-framework.org
Flux RFC: https://flux-framework.readthedocs.io/projects/flux-rfc
Issue Tracker: https://github.com/flux-framework/flux-core/issues