flux-filemap(1)

SYNOPSIS

flux filemap map [--tags=LIST] [-C DIR] PATH ...

flux filemap unmap [--tags=LIST]

flux filemap list [--tags=LIST] [--long] [PATTERN]

flux filemap get [--tags=LIST] [-C DIR] [PATTERN]

DESCRIPTION

flux-filemap uses mmap(2) to map files into the rank 0 broker content cache. After mapping, the files may be extracted on any broker rank, taking advantage of scalability properties of the distributed cache to move the data. The files are treated as read-only and must not change while mapped.

flux-filemap map maps one or more file PATH arguments. It must be run on the rank 0 broker, such as within a batch script, and the files must be directly accessible by the rank 0 broker. If a PATH refers to a directory, the directory is recursively mapped. If a file is encountered that is not readable, or has a type other than regular file, directory, or symbolic link, a fatal error occurs. Sparse files such as file system images for virtual machines are mapped efficiently. File discretionary access permission are preserved, but file attributes, ACLs, and group ownership are not.

flux-filemap list lists mapped files. Optionally, a glob(7) pattern may be specified to filter the list.

flux-filemap get extracts mapped files and may be run on any broker or across all brokers using flux-exec(1). Optionally, a glob(7) pattern may be specified to filter the list. When extracting mapped files in parallel, take care to specify a --directory that is not shared and is not on a network file system without considering the ramifications.

flux-filemap unmap unmaps mapped files.

The stage-in shell plugin described in flux-shell(1) may be used to extract previously mapped files into $FLUX_JOB_TMPDIR or another directory.

OPTIONS

-h, --help

Display options and exit

-T, --tags=LIST

Specify a comma separated list of tags. If no tags are specified, the main tag is assumed.

-C, --directory=DIR

Change to the specified directory before performing the operation (map and get subcommands only).

-v, --verbose=[LEVEL]

Increase output verbosity (map and get subcommands only).

-l, --long

Include more detail in file listing (list subcommand only).

--small-file-threshold=N

Set the threshold in bytes over which a regular file is mapped through the distributed content cache. Set to 0 to always use the content cache. The default is 4096 (map subcommand only).

--disable-mmap

Never map a regular file through the distributed content cache.

--chunksize=N

Limit the content mapped blob size to N bytes. Set to 0 for unlimited. The default is 1048576 (map subcommand only).

--direct

Avoid indirection through the content cache when fetching the top level data for each file. This may be fastest for a single or small number of clients, but will scale poorly when performed in parallel (get subcommand only).

--blobref

List blobrefs (list subcommand only).

--raw

List RFC 37 file system objects (list subcommand only).

EXAMPLE

Example 1: a batch script that copies data from /project/dataset1 to a temporary directory on each node of the batch allocation where it can be used by multiple jobs.

#!/bin/bash

flux filemap map -C /project dataset1
flux exec -r all mkdir -p /tmp/project
flux exec -r all flux filemap get -C /tmp/project

# app1 and app2 have access to local copy of dataset1
flux run -N1024 app1 --input /tmp/project/dataset1
flux run -N1024 app2 --input /tmp/project/dataset1

# clean up
flux exec -r all rm -rf /tmp/project
flux filemap unmap

Example 2: a batch script that maps two data sets with tags, then uses the stage-in shell plugin to selectively copy them to $FLUX_JOB_TMPDIR, which is automatically cleaned up after each job.

#!/bin/bash

flux filemap map --tags=ds1 -C /project dataset1
flux filemap map --tags=ds2 -C /project dataset2

# App0 uses $FLUX_JOB_TMPDIR/dataset1 and $FLUX_JOB_TMPDIR/dataset2
flux run -N1024 -o stage-in.tags=ds1,ds2 App0

# App1 uses only $FLUX_JOB_TMPDIR/dataset1
flux run -N1024 -o stage-in.tags=ds1 App1

# App2 uses only $FLUX_JOB_TMPDIR/dataset2
flux run -N1024 -o stage-in.tags=ds2 App2

# clean up
flux filemap unmap --tags=ds1,ds2

CAVEATS

The rank 0 Flux broker may die with a SIGBUS error if a mapped file is removed or truncated, and subsequently accessed, since that renders pages mapped into the brokers address space invalid.

If mapped file content changes, access may fail if the original data is not cached. Under no circumstances will the new content be returned.

RESOURCES

Flux: http://flux-framework.org

SEE ALSO

flux-shell(1),