How to List Flux jobs, Filter Them, and Output Extra Information¶
Inevitably you will want to list your current jobs. Many users will know that
to list jobs you use the flux jobs
command. However, there are many
advanced job listing and filtering options available. This tutorial will go
through some of the most popular ones so you know how to list the jobs you are
looking for more easily and to get the information you are looking for.
Job Submission Setup¶
Before beginning the tutorial, let’s run some jobs to set things up. I have a small Flux instance of
one node with only four processors, which we can see with flux resource list
.
$ flux resource list
STATE NNODES NCORES NODELIST
free 1 4 catalyst160
allocated 0 0
down 0 0
Let’s submit some jobs to the Flux instance.
$ flux submit --cc=0-1 --wait /bin/true
f2RYAVwjM
f2RYByw1h
$ flux submit --cc=0-1 --wait /bin/false
f2SLdmS4F
f2SLfFRLb
$ flux submit --cc=0-7 sleep inf
f2TR77Ao1
f2TR8bA5M
f2TR8bA5N
f2TRBZ8e3
f2TRD37vP
f2TREX7Cj
f2TRG16V5
f2TRHV5mR
$ flux job cancel f2TRHV5mR
In the above we have submitted a number of jobs. First we submit two jobs that
run /bin/true
, then we run two jobs that run /bin/false
. If you are
unfamiliar with the --cc
(i.e. carbon copy) option, it will simply create
duplicates of the job you just submitted. We also specify the --wait
command
to wait for those jobs to finish before moving on.
Next we submit 8 sleep
commands of infinite length. Then for fun, we cancel
one of those jobs via flux job cancel
.
Basic Job Listing¶
Let’s run flux jobs
and see what we get.
$ flux jobs
JOBID USER NAME ST NTASKS NNODES TIME INFO
f2TRD37vP achu sleep S 1 - -
f2TREX7Cj achu sleep S 1 - -
f2TRG16V5 achu sleep S 1 - -
f2TRBZ8e3 achu sleep R 1 1 1.765m catalyst160
f2TR8bA5N achu sleep R 1 1 1.765m catalyst160
f2TR8bA5M achu sleep R 1 1 1.765m catalyst160
f2TR77Ao1 achu sleep R 1 1 1.765m catalyst160
The default output lists the jobs that are currently running or pending. You’ll notice this via
the job state listed under ST
. The S
indicates the job state SCHED
and the R
indicates
the job state RUN
. As mentioned above, this instance only has 4 CPUs, therefore only 4 sleep jobs are running.
You’ll notice that there are only 7 sleep jobs listed and not 8. As you recall above, we canceled one sleep job, therefore there should be only 7 jobs listed here. In addition, the /bin/true
and /bin/false
jobs are not listed.
By default flux jobs
only listed “pending” and “running” jobs. Inactive jobs are not listed. In order to list
inactive jobs we can specify a different filter via the --filter
option. For example we could tell flux jobs
to only list the inactive jobs via flux jobs --filter=inactive
.
$ flux jobs --filter=inactive
JOBID USER NAME ST NTASKS NNODES TIME INFO
f2TRHV5mR achu sleep CA 1 - -
f2SLdmS4F achu false F 1 1 0.057s catalyst160
f2SLfFRLb achu false F 1 1 0.056s catalyst160
f2RYByw1h achu true CD 1 1 0.560s catalyst160
f2RYAVwjM achu true CD 1 1 0.453s catalyst160
In the above we see the five INACTIVE jobs we expect. There are the two
/bin/true
jobs, two /bin/false
jobs, and our canceled job. As you can
see via the job state, they are lited with CD
(completed), F
(failed),
and CA
(canceled) respectively.
Note
In this tutorial there is no color highlighting of flux jobs
output, but
depending on your terminal there may be color highlighting different job
states and results.
While you can use the --filter
option to list inactive jobs, most users prefer to use the
-a
option. The -a
is shorthand for --filter=pending,running,inactive
. In other words,
it lists all of your jobs. Which we can see below.
$ flux jobs -a
JOBID USER NAME ST NTASKS NNODES TIME INFO
f2TRD37vP achu sleep S 1 - -
f2TREX7Cj achu sleep S 1 - -
f2TRG16V5 achu sleep S 1 - -
f2TRBZ8e3 achu sleep R 1 1 39.93m catalyst160
f2TR8bA5N achu sleep R 1 1 39.93m catalyst160
f2TR8bA5M achu sleep R 1 1 39.93m catalyst160
f2TR77Ao1 achu sleep R 1 1 39.93m catalyst160
f2TRHV5mR achu sleep CA 1 - -
f2SLdmS4F achu false F 1 1 0.057s catalyst160
f2SLfFRLb achu false F 1 1 0.056s catalyst160
f2RYByw1h achu true CD 1 1 0.560s catalyst160
f2RYAVwjM achu true CD 1 1 0.453s catalyst160
Advanced Filtering¶
In this particular example, it’s not too annoying to run flux jobs -a
because we only have
12 jobs total. However, over time, you may have hundreds, if not thousands, of jobs to list. It can
become difficult to filter and find your jobs.
There are many ways to filter the job listing to limit the output to the jobs you are interested in. Here are some of the most common options.
The simplest way to limit job output is to specify the jobid of the jobs you wish to list. This is typically done because you want to monitor the status of some finite number of your jobs.
$ flux jobs f2RYAVwjM f2RYByw1h f2SLfFRLb f2SLdmS4F
JOBID USER NAME ST NTASKS NNODES TIME INFO
f2RYAVwjM achu true CD 1 1 0.453s catalyst160
f2RYByw1h achu true CD 1 1 0.560s catalyst160
f2SLfFRLb achu false F 1 1 0.056s catalyst160
f2SLdmS4F achu false F 1 1 0.057s catalyst160
Here we list the jobids of the /bin/true
and /bin/false
jobs to get the results of just those jobids.
By default flux jobs
will limit output to 1000 jobs. If the number of jobs is getting too large (or you want to show even more jobs) you can adjust this via the --count
option.
$ flux jobs --count=4
JOBID USER NAME ST NTASKS NNODES TIME INFO
f2TRD37vP achu sleep S 1 - -
f2TREX7Cj achu sleep S 1 - -
f2TRG16V5 achu sleep S 1 - -
f2TRBZ8e3 achu sleep R 1 1 47.88m catalyst160
$ flux jobs --count=4 --filter=inactive
JOBID USER NAME ST NTASKS NNODES TIME INFO
f2TRHV5mR achu sleep CA 1 - -
f2SLdmS4F achu false F 1 1 0.057s catalyst160
f2SLfFRLb achu false F 1 1 0.056s catalyst160
f2RYByw1h achu true CD 1 1 0.560s catalyst160
Here we pass --count=4
to limit the of jobs output from flux jobs
default output and when we
specify that we should only list inactive jobs.
We already saw above that --filter
can be used to filter jobs on “pending”, “running”, or “inactive”
state. But we can also filter on the result of a job. In the following example, we show that you can
list “completed”, “failed”, or “canceled” jobs respectively.
$ flux jobs --filter=completed
JOBID USER NAME ST NTASKS NNODES TIME INFO
f2RYByw1h achu true CD 1 1 0.560s catalyst160
f2RYAVwjM achu true CD 1 1 0.453s catalyst160
$ flux jobs --filter=failed
JOBID USER NAME ST NTASKS NNODES TIME INFO
f2SLdmS4F achu false F 1 1 0.057s catalyst160
f2SLfFRLb achu false F 1 1 0.056s catalyst160
$ flux jobs --filter=canceled
JOBID USER NAME ST NTASKS NNODES TIME INFO
f2TRHV5mR achu sleep CA 1 - -
Jobs can also be filtered by their job name via the --name
option.
$ flux jobs --name=sleep
JOBID USER NAME ST NTASKS NNODES TIME INFO
f2TRD37vP achu sleep S 1 - -
f2TREX7Cj achu sleep S 1 - -
f2TRG16V5 achu sleep S 1 - -
f2TRBZ8e3 achu sleep R 1 1 50.04m catalyst160
f2TR8bA5N achu sleep R 1 1 50.04m catalyst160
f2TR8bA5M achu sleep R 1 1 50.04m catalyst160
f2TR77Ao1 achu sleep R 1 1 50.04m catalyst160
Remember that the --filter
option only lists “pending” and “running” jobs by default, so you
may get unexpected results if the --name
option is used without an appropriate setting to --filter
.
$ flux jobs --name=true
JOBID USER NAME ST NTASKS NNODES TIME INFO
$ flux jobs --name=true -a
JOBID USER NAME ST NTASKS NNODES TIME INFO
f2RYByw1h achu true CD 1 1 0.560s catalyst160
f2RYAVwjM achu true CD 1 1 0.453s catalyst160
As you can see in the above, flux jobs --name=true` does not output anything. That's because no "active"
jobs have the job name ``true
. However, when specifying --name=true
along with -a
we see
our expected jobs that have already completed.
Advanced Output¶
While the default output of flux jobs
is generally useful, it may not have all the information you wish.
Additional information can be output from flux jobs
via the --format
option, which will inform flux jobs
to adjust the output format to what you wish to use. Here’s a simple example. Let’s get all of the exit codes for all of the jobs that have so far completed. We’ll get the completed jobs via --filter=inactive
like before. We’ll adjust the output to simply be the format {id} {returncode}
. The {id}
field outputs the jobid and {returncode}
outputs the exit code of the job.
$ flux jobs --filter=inactive --format="{id} {returncode}"
JOBID RC
f2TRHV5mR -128
f2SLdmS4F 1
f2SLfFRLb 1
f2RYByw1h 0
f2RYAVwjM 0
There are many additional fields that are available for output in flux-jobs
. This tutorial will not go through them but you can find information on them via the flux-jobs(1) manpage as well as ways to format the output in pretty ways.
For most users, instead of formatting your own output, you may wish to use one of the additional “common” formats available in flux-jobs
. They can be listed with flux jobs --format=help
.
$ flux jobs --format=help
Configured flux-jobs output formats:
default Default flux-jobs format string
cute Cute flux-jobs format string (default with emojis)
long Extended flux-jobs format string
deps Show job urgency, priority, and dependencies
For example, let’s take a look at the long
output.
$ flux jobs --format=long
JOBID USER NAME STATUS NTASKS NNODES T_SUBMIT T_REMAINING TIME INFO
f2TRD37vP achu sleep SCHED 1 - Mar29 17:28 - -
f2TREX7Cj achu sleep SCHED 1 - Mar29 17:28 - -
f2TRG16V5 achu sleep SCHED 1 - Mar29 17:28 - -
f2TRBZ8e3 achu sleep RUN 1 1 Mar29 17:28 - 53.01m catalyst160
f2TR8bA5N achu sleep RUN 1 1 Mar29 17:28 - 53.01m catalyst160
f2TR8bA5M achu sleep RUN 1 1 Mar29 17:28 - 53.01m catalyst160
f2TR77Ao1 achu sleep RUN 1 1 Mar29 17:28 - 53.01m catalyst160
Compared to the default output, we have STATUS
being output with full names instead of abbreviations, we have the time
that the job was submitted via T_SUBMIT
and the time remaining for the job in T_REMAINING
(in this particular example, there is no time limit, thus no time remaining listed).
Note
You can set the default output of flux jobs
through the
environment variable FLUX_JOBS_FORMAT_DEFAULT. For example, by
setting FLUX_JOBS_FORMAT_DEFAULT=long, the long output will be
output as the default output.
Note
Within a script, it is very common to use the following pattern to get information about a specific job.
nnodes=$(flux jobs -no "{nnodes}" $FLUX_JOB_ID)
In order to get the number of nodes for the job we are running, we set the output format to exactly {nnodes}
and nothing else. The -n
option ensures that the header from flux jobs
will not be output. So the only thing output from this call to flux jobs
is just the number of nodes for the specified jobid, which we then store in the nnodes
variable.
Recursive Job Listing¶
Note
This section is independent on the previous one. To continue on with this example from the previous one,
you may wish to cancel your jobs from before via flux cancel --all
.
By default, flux jobs
will not list jobs that are running under subinstances within Flux. Let’s illustrate
this with an example. Submit the following script to flux batch
.
#!/bin/sh
# filename: batchjob.sh
flux submit sleep inf
flux submit sleep inf
flux queue drain
All we’re doing is running two sleep
jobs for infinity, and then calling flux queue drain
to wait for those
jobs to complete. Note that you can use flux queue drain
to wait for jobs to complete.
Let’s run this via flux batch
$ flux batch -N1 ./batchjob.sh
fUdmwwisR
$ flux jobs
JOBID USER NAME ST NTASKS NNODES TIME INFO
fUdmwwisR achu ./batchjo+ R 1 1 14.12s catalyst160
After submitting the batch job, you’ll notice that flux jobs
only lists the jobid of the batch job. It does not list
the jobids of the sleep jobs running within that instance.
In order to see those additional jobs, you’ll have to specify the --recursive
option.
$ flux jobs --recursive
JOBID USER NAME ST NTASKS NNODES TIME INFO
fUdmwwisR achu ./batchjo+ R 1 1 42.99s catalyst160
fUdmwwisR:
f3sv29Cj achu sleep R 1 1 30.23s catalyst160
f3Xfqx1d achu sleep R 1 1 31.02s catalyst160