pyfileindex.pyfileindex.PyFileIndex
- class pyfileindex.pyfileindex.PyFileIndex(path: str = '.', filter_function: Callable | None = None, debug: bool = False, df: DataFrame | None = None, watch: bool = False)
Bases: object
The PyFileIndex maintains a pandas DataFrame to track changes in the file system.
- Parameters:
path (str) – directory to index (optional, default '.', the current directory)
filter_function (Callable) – function to filter for specific files (optional)
debug (bool) – enable debug print statements (optional)
df (pandas.DataFrame) – DataFrame of a previous PyFileIndex object (optional)
watch (bool) – keep the file index in sync using a background file system watcher instead of rescanning the file system on every update() call. Relies on OS-level file change notifications (via the optional watchfiles dependency), which are not always delivered reliably on network filesystems such as NFS, Lustre, or GPFS when the change is made by a different node – a common setup when monitoring HPC simulation output from a separate process or login node. Prefer the default polling mode (watch=False) in that case (optional)
- __init__(path: str = '.', filter_function: Callable | None = None, debug: bool = False, df: DataFrame | None = None, watch: bool = False) -> None
Methods
__init__([path, filter_function, debug, df, ...])
close() – Stop the background file system watcher started with watch=True.
open(path) – Open PyFileIndex in the subdirectory path
update([timeout]) – Update file index
Attributes
Alias for df.
df – The file index as a pandas DataFrame, with one row per file or directory below the indexed path.
- close() -> None
Stop the background file system watcher started with watch=True. Safe to call even if no watcher is running.
- property df: DataFrame
The file index as a pandas DataFrame, with one row per file or directory below the indexed path. Columns:
basename (str): file or directory name, e.g. "output.txt".
path (str): absolute path.
dirname (str): absolute path of the parent directory.
is_directory (bool): True for directories, False for files.
mtime (float): last modification time as a POSIX timestamp (the same value os.stat().st_mtime returns). Useful for finding which simulation directories have written output most recently.
nlink (int): hard link count (os.stat().st_nlink). Used internally to detect changes that don't update mtime; rarely needed directly.
- Returns:
the file index
- Return type:
pandas.DataFrame
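The mtime column makes it straightforward to rank directories by recent activity. The sketch below uses a hand-built DataFrame with the documented columns as a stand-in for pfi.df; the values and the helper latest_output_dirs are made up for illustration.

```python
import pandas as pd


def latest_output_dirs(df: pd.DataFrame, n: int = 3) -> list[str]:
    """Return the n directories whose files were modified most recently."""
    files = df[~df["is_directory"].astype(bool)]
    # Newest file mtime per parent directory, newest first.
    newest = files.groupby("dirname")["mtime"].max().sort_values(ascending=False)
    return newest.head(n).index.tolist()


# Hand-built stand-in for pfi.df (illustrative values only).
example = pd.DataFrame(
    {
        "basename": ["run_a", "out.txt", "run_b", "out.txt"],
        "path": ["/sim/run_a", "/sim/run_a/out.txt", "/sim/run_b", "/sim/run_b/out.txt"],
        "dirname": ["/sim", "/sim/run_a", "/sim", "/sim/run_b"],
        "is_directory": [True, False, True, False],
        "mtime": [100.0, 150.0, 100.0, 200.0],
        "nlink": [2, 1, 2, 1],
    }
)
print(latest_output_dirs(example))  # ['/sim/run_b', '/sim/run_a']
```

Grouping by dirname over files only, rather than using the directories' own mtime, ranks a directory by its newest file; a directory's mtime generally changes only when entries are added, removed, or renamed, not when an existing file is rewritten.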
- open(path: str) -> PyFileIndex
Open a PyFileIndex for the subdirectory path.
- Parameters:
path (str) – subdirectory to open
- Returns:
PyFileIndex for the subdirectory
- Return type:
PyFileIndex
- update(timeout: float = 0.1) -> None
Update the file index to reflect changes in the file system since the last update.
- Parameters:
timeout (float) – when watch=True, a file system change made just before calling update() may not have reached the background watcher yet. timeout is the maximum time in seconds to wait for such a pending change before applying whatever changes have arrived. Ignored when watch=False (optional, default 0.1 s, matching watchfiles’ own minimum reporting latency).