Troubleshooting Guide

Overview

This page looks to serve as a resource in case you encounter unusual behavior from flux-accounting and any of its components. The format of the guide follows that of a Frequently Asked Questions (FAQ) page, so if you have a suggestion on a question you'd like to see answered, please consider opening an issue on Github.

My job is held with a flux-accounting dependency. How can I figure out why it's held?

For details on how to interpret flux-accounting dependencies and how they are resolved, please see the Limits page.

The job-usage values reported for some associations and banks seem inconsistent and/or inaccurate. How can I resolve this?

If it seems like association or bank job usage values don't look correct, the first thing to check would be the consistency between the association's userid column in both the association_table and the job_usage_factor_table. If they are reporting different values, then the script that updates job usage values for that association will be skipped since the script searches for completed jobs based on userid. To sync the two tables to use the same user ID defined in the association_table in the job_usage_factor_table, you can run the sync-userids command. Under the hood, this command is running this query to see which user IDs are inconsistent:

SELECT j.username,
       j.userid AS old_userid,
       a.userid AS new_userid
  FROM job_usage_factor_table j
  JOIN association_table a
    ON j.username = a.username
 WHERE j.userid != a.userid;

And this one to update job_usage_factor_table with the user ID found in association_table:

UPDATE job_usage_factor_table
   SET userid = (
     SELECT association_table.userid
       FROM association_table
      WHERE association_table.username = job_usage_factor_table.username
   )
WHERE EXISTS (
     SELECT 1
       FROM association_table
      WHERE association_table.username = job_usage_factor_table.username
);

Note

This inconsistency between user IDs in the association_table and job_usage_factor_table was discovered in flux-accounting versions prior to v0.50.0 and should be fixed in versions v0.50.0 or later. If you are running flux-accounting v0.50.0 (or, more specifically, adding associations to the association_table) or later and are still running into this bug, please open an issue on GitHub.

I am getting a "Database is locked" error when trying to read/write from the flux-accounting DB. How can I resolve this?

SQLite uses file-level locking to ensure data integrity when multiple processes access the database simultaneously. A "Database is locked" error usually occurs when:

  • write lock contention: Another process holds an exclusive write lock on the database file, preventing the operation from proceeding. SQLite allows multiple concurrent readers, but only one writer at a time.

  • transaction timeout: A process started a transaction (with BEGIN) but hasn't committed or rolled back yet, keeping the lock active longer than expected.

  • stale connections: Processes that crashed or were terminated without properly closing their database connections can leave locks in place.

  • NFS or networked filesystems: SQLite's locking mechanism may not work reliably on some network filesystems, leading to lock conflicts.

While the scripts that update job usage and fair-share values for the flux-accounting database should be quick, a transaction occurring during this update could cause the database to enter a locked state. To resolve this, try restarting the flux-accounting service with systemctl restart flux-accounting.