collocate_filesets

Collocator.collocate_filesets(filesets, start=None, end=None, processes=None, output=None, bundle=None, skip_file_errors=False, post_processor=None, post_processor_kwargs=None, **kwargs)

Find collocations between the data of two filesets.

If you want to save the collocations directly to disk, it may be easier to use search() directly.

Parameters
  • filesets – A list of two FileSet objects, the primary and the secondary fileset. Collocations objects with read_mode=collapse are also accepted. The order of the filesets does not affect the results of the collocation search, but files from the secondary fileset might be read multiple times when using parallel processing (processes greater than one). The number of output files may also differ (see the bundle option).

  • start – Start date, either as a datetime object or as a string (“YYYY-MM-DD hh:mm:ss”). Year, month and day are required; hours, minutes and seconds are optional. Defaults to datetime.min.

  • end – End date, in the same format as start. Defaults to datetime.max.

  • processes – The number of processes to use. Collocating can be parallelized, which improves performance significantly.

  • output – FileSet object where the collocated data should be stored.

  • bundle – Set this to primary to bundle the output files by their collocated primaries, i.e. there will be only one output file per primary. daily is also possible; then all files from one day are bundled together. By default, the collocations for each file match are saved separately, which might lead to a high number of output files. Note: daily means that one process bundles all collocations from one day into one output file. With multiple processes, this can still produce several output files per day.

  • skip_file_errors – If this is True and a file cannot be read, the file and its match are skipped and a warning is printed. Otherwise the program stops (default).

  • post_processor – A function for post-processing the collocated data before saving it to output. Must accept two parameters: an xarray.Dataset with the collocated data and a dictionary with the path attributes from the collocated files.

  • post_processor_kwargs – A dictionary with keyword arguments that should be passed to post_processor.

  • **kwargs – Further keyword arguments that are allowed for collocate().
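As a rough illustration of the start and end formats described above, the sketch below turns a date string with optional time parts into a datetime and falls back to datetime.min or datetime.max when nothing is given. The helper name parse_bound is hypothetical and is not part of typhon; the library's own parsing may differ.

```python
from datetime import datetime

# Accepted layouts, from most to least specific. Year, month and day are
# always required; hours, minutes and seconds are optional.
_FORMATS = ("%Y-%m-%d %H:%M:%S", "%Y-%m-%d %H:%M", "%Y-%m-%d %H", "%Y-%m-%d")


def parse_bound(value, default):
    """Hypothetical helper: turn a start/end argument into a datetime."""
    if value is None:
        return default  # no bound given, use datetime.min / datetime.max
    if isinstance(value, datetime):
        return value  # datetime objects are passed through unchanged
    for fmt in _FORMATS:
        try:
            return datetime.strptime(value, fmt)
        except ValueError:
            continue  # try the next, less specific layout
    raise ValueError(f"Cannot parse date string: {value!r}")


start = parse_bound("2018-01-01", datetime.min)
end = parse_bound("2018-01-02 12:30", datetime.max)
unbounded_end = parse_bound(None, datetime.max)
```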

Yields

An xarray.Dataset with the collocated data if output is not set. If output is set to a FileSet-like object, only the filename of the stored collocations is yielded. The results are not ordered if you use more than one process. For more information about the yielded xarray.Dataset, see collocate().
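How many filenames are yielded when output is set depends on the bundle option. The sketch below imitates the three grouping modes on a hand-written list of file matches. The tuples, the file names and the functions bundle_key and count_outputs are made up for illustration, and the option values are assumed to be passed as strings; none of this reflects typhon's internals.

```python
from collections import defaultdict
from datetime import datetime

# Hypothetical file matches: (primary file, secondary file, primary start time).
matches = [
    ("mhs_a.h5", "sev_1.nc", datetime(2018, 1, 1, 0, 10)),
    ("mhs_a.h5", "sev_2.nc", datetime(2018, 1, 1, 0, 40)),
    ("mhs_b.h5", "sev_2.nc", datetime(2018, 1, 1, 2, 5)),
    ("mhs_c.h5", "sev_3.nc", datetime(2018, 1, 2, 0, 15)),
]


def bundle_key(match, bundle=None):
    """Return the key under which a match would be grouped (illustrative)."""
    primary, secondary, start = match
    if bundle == "primary":
        return primary  # one group per primary file
    if bundle == "daily":
        return start.date()  # one group per calendar day
    return (primary, secondary)  # default: one group per file match


def count_outputs(matches, bundle=None):
    """Count the groups, i.e. the output files a single process would write."""
    groups = defaultdict(list)
    for match in matches:
        groups[bundle_key(match, bundle)].append(match)
    return len(groups)


per_match = count_outputs(matches)
per_primary = count_outputs(matches, bundle="primary")
per_day = count_outputs(matches, bundle="daily")
```

This models a single process only. With several processes, each process bundles on its own, so daily can still result in more than one file per day.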

Examples:
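A minimal sketch of a function with the shape that post_processor requires: the collocated data first, the dictionary of path attributes second, plus any extra keyword arguments taken from post_processor_kwargs. To keep the snippet runnable without xarray, a small stand-in class with an attrs dictionary replaces the real xarray.Dataset. The function name, the attribute names, the layout of the path attributes and the choice to return the dataset are assumptions, not confirmed typhon behaviour.

```python
class FakeDataset:
    """Stand-in for xarray.Dataset; only mimics the .attrs dictionary."""

    def __init__(self):
        self.attrs = {}


def tag_collocations(dataset, attributes, project="unknown"):
    """Attach provenance metadata before the data is written to output.

    dataset    -- the collocated data (an xarray.Dataset in real use)
    attributes -- path attributes of the collocated files
    project    -- extra keyword, supplied via post_processor_kwargs
    """
    dataset.attrs["project"] = project
    dataset.attrs["source_attributes"] = repr(sorted(attributes.items()))
    return dataset  # assumption: the returned object is what gets saved


# Hypothetical path attributes, as they might be extracted from file names.
path_attributes = {"year": "2018", "month": "01", "day": "01"}
result = tag_collocations(FakeDataset(), path_attributes, project="demo")
```

In real use, such a function would be handed over as post_processor=tag_collocations together with post_processor_kwargs={"project": "demo"}.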