pyfileindex.pyfileindex.PyFileIndex

class pyfileindex.pyfileindex.PyFileIndex(path: str = '.', filter_function: Callable | None = None, debug: bool = False, df: DataFrame | None = None, watch: bool = False)

Bases: object

The PyFileIndex maintains a pandas DataFrame to track changes in the file system.

Parameters:
  • path (str) – file system path

  • filter_function (Callable) – function to filter for specific files (optional)

  • debug (bool) – enable debug print statements (optional)

  • df (pandas.DataFrame) – DataFrame of a previous PyFileIndex object (optional)

  • watch (bool) – keep the file index in sync using a background file system watcher instead of rescanning the file system on every update() call (optional, default False). The watcher relies on OS-level file change notifications (via the optional watchfiles dependency). These notifications are not always delivered reliably on network filesystems such as NFS, Lustre, or GPFS when the change is made by a different node, which is a common setup when monitoring HPC simulation output from a separate process or login node. In that case, prefer the default polling mode (watch=False).
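As a rough illustration of the parameters above, the sketch below builds a polling-mode index restricted to text files. It assumes that PyFileIndex can be imported from the top-level pyfileindex package and that filter_function is called with a file's name and returns True to keep it; both assumptions should be checked against the installed version. The helper names is_text_output and build_index are hypothetical.

```python
def is_text_output(file_name: str) -> bool:
    """Keep only files whose name ends in .txt.

    Assumption: the filter receives the file name, not the full path.
    """
    return file_name.endswith(".txt")


def build_index(path: str = "."):
    """Create a polling-mode index of `path` that tracks only .txt files."""
    # Imported inside the function so the filter above can be reused and
    # tested on its own, without pyfileindex installed.
    from pyfileindex import PyFileIndex

    return PyFileIndex(path=path, filter_function=is_text_output, watch=False)
```

Calling build_index(".") would then return an object whose df attribute holds the index for the current directory.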

__init__(path: str = '.', filter_function: Callable | None = None, debug: bool = False, df: DataFrame | None = None, watch: bool = False) -> None

Methods

__init__([path, filter_function, debug, df, ...])

close()

Stop the background file system watcher started with watch=True.

open(path)

Open PyFileIndex in the subdirectory path

update([timeout])

Update file index

Attributes

dataframe

Alias for df.

df

The file index as a pandas DataFrame, with one row per file or directory below the indexed path.

close() -> None

Stop the background file system watcher started with watch=True. Safe to call even if no watcher is running.
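Because the watcher keeps running in the background, it can help to guarantee that close() is called even when an error occurs. A minimal sketch using contextlib.closing from the standard library, which calls close() when the block exits; snapshot_and_close is a hypothetical helper that works with any object offering update(), dataframe, and close().

```python
from contextlib import closing


def snapshot_and_close(index):
    """Refresh `index`, return its DataFrame, and always call index.close()."""
    with closing(index):  # closing() calls index.close() when the block exits
        index.update()
        return index.dataframe


# Intended use (requires pyfileindex and the optional watchfiles dependency):
#     from pyfileindex import PyFileIndex
#     df = snapshot_and_close(PyFileIndex(path=".", watch=True))
```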

property dataframe: DataFrame

Alias for df.

property df: DataFrame

The file index as a pandas DataFrame, with one row per file or directory below the indexed path. Columns:

  • basename (str): file or directory name, e.g. "output.txt".

  • path (str): absolute path.

  • dirname (str): absolute path of the parent directory.

  • is_directory (bool): True for directories, False for files.

  • mtime (float): last modification time as a POSIX timestamp (the same value os.stat().st_mtime returns). Useful for finding which simulation directories have written output most recently.

  • nlink (int): hard link count (os.stat().st_nlink). Used internally to detect changes that don’t update mtime; rarely needed directly.

Returns:

the file index

Return type:

pandas.DataFrame
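To make the column meanings concrete, the sketch below computes the same fields for a single path using only the standard library. It illustrates what each column holds and is not the library's own implementation; describe_entry is a hypothetical helper.

```python
import os


def describe_entry(path: str) -> dict:
    """Return one row's worth of the documented columns for `path`."""
    abs_path = os.path.abspath(path)
    stat = os.stat(abs_path)
    return {
        "basename": os.path.basename(abs_path),   # e.g. "output.txt"
        "path": abs_path,                         # absolute path
        "dirname": os.path.dirname(abs_path),     # absolute parent directory
        "is_directory": os.path.isdir(abs_path),  # True for directories
        "mtime": stat.st_mtime,                   # POSIX timestamp (float)
        "nlink": stat.st_nlink,                   # hard link count (int)
    }
```

The mtime value can be converted to a readable time with datetime.datetime.fromtimestamp(row["mtime"]).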

open(path: str) -> PyFileIndex

Open a PyFileIndex for the subdirectory path.

Parameters:

path (str) – subdirectory to open

Returns:

PyFileIndex for subdirectory

Return type:

PyFileIndex

update(timeout: float = 0.1) -> None

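Update the file index. With watch=False this rescans the file system; with watch=True it applies the changes collected by the background watcher.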

Parameters:

timeout (float) – maximum time in seconds to wait for pending changes when watch=True (optional, default 0.1). A file system change made just before calling update() may not have reached the background watcher yet; update() waits up to timeout seconds for such a change to arrive, then applies whatever is available. The default of 100 ms matches watchfiles’ own minimum reporting latency. Ignored when watch=False.
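When a file is expected to appear shortly, one possible pattern is to call update() a few times with a slightly larger timeout and check the index after each call. The sketch below is only an illustration: wait_for_basename and its parameters attempts and pause are hypothetical, and it assumes the index exposes a basename column as described under df.

```python
import time


def wait_for_basename(index, basename, timeout=0.5, attempts=5, pause=0.2):
    """Return True once `basename` appears in the index, else False.

    Calls index.update(timeout=...) up to `attempts` times, sleeping
    `pause` seconds between tries.
    """
    for _ in range(attempts):
        index.update(timeout=timeout)
        # set(...) compares against the column values; a bare `in` on a
        # pandas Series would test the row labels instead.
        if basename in set(index.dataframe["basename"]):
            return True
        time.sleep(pause)
    return False


# Intended use (requires pyfileindex and the optional watchfiles dependency):
#     from pyfileindex import PyFileIndex
#     pfi = PyFileIndex(path=".", watch=True)
#     found = wait_for_basename(pfi, "output.txt")
#     pfi.close()
```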