find

Collocations.find(start=None, end=None, sort=True, only_path=False, bundle=None, filters=None, no_files_error=True)

Find all files of this fileset in a given time period.

The start and end parameters define a semi-open interval: only files that are equal to or newer than start and older than end are found.
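
The semi-open interval can be pictured with plain datetime objects. The following is a minimal sketch, not the library's implementation: the helper in_interval and the sample timestamps are hypothetical, and each file is assumed to be represented by a single time stamp.

```python
from datetime import datetime


def in_interval(file_time, start=datetime.min, end=datetime.max):
    """Hypothetical helper: True if file_time lies in [start, end)."""
    return start <= file_time < end


start = datetime(2017, 1, 1)
end = datetime(2017, 1, 2)

candidates = [
    datetime(2016, 12, 31, 23, 59, 59),  # older than start: excluded
    datetime(2017, 1, 1, 0, 0, 0),       # equal to start: included
    datetime(2017, 1, 1, 12, 0, 0),      # inside the interval: included
    datetime(2017, 1, 2, 0, 0, 0),       # equal to end: excluded
]

# Keep only the time stamps that fall into the semi-open interval.
selected = [t for t in candidates if in_interval(t, start, end)]
print(selected)
```

Because the end point is excluded, consecutive calls with adjoining intervals should not return the same time stamp twice.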

While searching, this method checks whether each file lies in one of the time periods that were passed as exclude during initialization.

Parameters:
  • start – Start date, either as a datetime object or as a string (“YYYY-MM-DD hh:mm:ss”). Year, month and day are required. Hours, minutes and seconds are optional. If not given, it defaults to datetime.min.

  • end – End date. Same format as “start”. If not given, it defaults to datetime.max.

  • sort – If true, all files will be yielded sorted by their starting and ending time. Default is true.

  • only_path – If true, only the paths of the files will be returned, not their FileInfo objects.

  • bundle – Instead of yielding one file at a time, you can get a bundle of files. There are two possibilities: setting this to an integer defines the size of the bundle directly, while setting it to a string (e.g. 1H) defines the time period covered by one bundle. See http://pandas.pydata.org/pandas-docs/stable/timeseries.html#offset-aliases for allowed time specifications. Default value is 1. This argument is ignored for single-file filesets. When bundle is used, the returned files are always sorted, regardless of the sort argument.

  • filters – Limits user-defined placeholders to certain values. Must be a dictionary where the keys are the names of user-defined placeholders and the values are either strings or lists of strings with allowed placeholder values (these can be regular expressions). If the key name starts with a ! (exclamation mark), the value represents a black list (values that are not allowed).

  • no_files_error – If true, raises a NoFilesError when no files are found.
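
The white-list and black-list behaviour of filters can be sketched in plain Python. This is an illustration under stated assumptions, not the library's code: the helper passes_filters, the placeholder name satellite and its values are hypothetical, and full-string regular expression matching is assumed.

```python
import re


def passes_filters(placeholders, filters):
    """Hypothetical helper: apply white and black lists to placeholder values."""
    for key, allowed in filters.items():
        # A leading "!" turns the entry into a black list.
        blacklist = key.startswith("!")
        name = key[1:] if blacklist else key
        # Accept a single string as well as a list of strings.
        patterns = [allowed] if isinstance(allowed, str) else list(allowed)
        value = placeholders.get(name, "")
        matched = any(re.fullmatch(p, value) for p in patterns)
        # White list: the value must match. Black list: it must not match.
        if matched == blacklist:
            return False
    return True


files = [
    {"satellite": "noaa18"},
    {"satellite": "noaa19"},
    {"satellite": "metopa"},
]

only_noaa = [f for f in files if passes_filters(f, {"satellite": "noaa.*"})]
no_noaa18 = [f for f in files if passes_filters(f, {"!satellite": ["noaa18"]})]
print(only_noaa)
print(no_noaa18)
```

A dictionary of the same shape, for example {"satellite": "noaa.*"}, is what the filters parameter expects.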

Yields:

Either a FileInfo object for each found file or, if bundle is not None, a list of FileInfo objects.
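
The two bundling modes can be illustrated with a short standard-library sketch. It rests on assumptions: the helpers bundle_by_size and bundle_by_period are hypothetical, a timedelta stands in for a pandas offset alias such as 1H, and the time bins are anchored at the first time stamp, which may differ from how the library aligns them.

```python
from datetime import datetime, timedelta


def bundle_by_size(times, size):
    """Hypothetical helper: split a sorted list into chunks of `size` items."""
    return [times[i:i + size] for i in range(0, len(times), size)]


def bundle_by_period(times, period):
    """Hypothetical helper: group sorted times into bins of length `period`."""
    if not times:
        return []
    origin = times[0]
    bundles = {}
    for t in times:
        index = (t - origin) // period  # whole periods elapsed since origin
        bundles.setdefault(index, []).append(t)
    return [bundles[k] for k in sorted(bundles)]


times = [
    datetime(2017, 1, 1, 0, 0),
    datetime(2017, 1, 1, 0, 20),
    datetime(2017, 1, 1, 0, 40),
    datetime(2017, 1, 1, 1, 10),
    datetime(2017, 1, 1, 2, 30),
]

by_size = bundle_by_size(times, 2)                     # like an integer bundle
by_hour = bundle_by_period(times, timedelta(hours=1))  # like a string bundle
print([len(b) for b in by_size])
print([len(b) for b in by_hour])
```

With an integer, every bundle except possibly the last has the same number of entries. With a time period, the number of entries per bundle depends on how many files fall into each period.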

Examples:

# Assumption: FileSet is provided by typhon.files.
from typhon.files import FileSet

# Define a fileset consisting of multiple files:
fileset = FileSet(
    "/dir/{year}/{month}/{day}/{hour}{minute}{second}.nc"
)

# Find some files of the fileset:
for file in fileset.find("2017-01-01", "2017-01-02"):
    # file is a FileInfo object that has the attributes path
    # and times.
    print(file.path)  # e.g. "/dir/2017/01/01/120000.nc"
    print(file.times)  # list of two datetime objects