typhon.datasets

Modules in this package contain classes to handle datasets.

That includes the overall framework for handling datasets in the dataset module, as well as concrete datasets for specific sensors, including their reading routines.

To implement a new reading routine, subclass one of the datasets here.

typhon.datasets.dataset

Module containing classes abstracting datasets

exception typhon.datasets.dataset.DataFileError[source]

Bases: Exception

Superclass for any datafile problems

When reading large amounts of data, some files will contain problems. When processing a year of data, we don’t want to fail entirely when one orbit file fails. But if there is a real bug somewhere, we probably do want to fail. Therefore, the typhon dataset framework is optionally resilient to datafile related errors, but only if those derive from DataFileError.

Users implementing their own reading routine may therefore wish to catch errors arising from corrupted data and raise an exception derived from DataFileError instead. That way, typhon knows that a problem is due to corrupted data and not due to a bug.
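For example, a reading routine might wrap a parsing failure like this (a minimal sketch; the granule format and helper name are invented):

    import numpy
    from typhon.datasets.dataset import InvalidDataError

    def _read_orbit_file(path):
        # Hypothetical reading routine resilient to corrupted granules.
        raw = numpy.fromfile(path, dtype="uint16")
        if raw.size == 0:
            # Signal a data problem rather than a bug, so that
            # read_period(onerror="skip") can carry on with the next file.
            raise InvalidDataError(f"empty or truncated granule: {path}")
        return raw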

class typhon.datasets.dataset.Dataset(**kwargs)[source]

Bases: object

Represents a dataset.

This is an abstract class. More specific subclasses are SingleFileDataset and MultiFileDataset.

To add a dataset, subclass one of the subclasses of Dataset, such as MultiFileDataset, and implement the abstract methods.

Dataset objects have a limited number of attributes. To limit the occurrence of bugs, dynamically setting attributes that do not already exist is restricted. Attributes can be set either by passing keyword arguments when creating the object, or by setting the appropriate field in your typhon configuration file (such as .typhonrc). The configuration section corresponds to the object name, the key to the attribute, and the value to the value assigned to the attribute. See also typhon.config.
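For example, either of the following would set basedir (a sketch; MyDataset, its name, and the path are invented):

    # Pass attributes as keyword arguments when creating the object...
    ds = MyDataset(name="mydataset", basedir="/data/mydataset")

    # ...or set them in the typhon configuration file (e.g. ~/.typhonrc):
    #
    #     [mydataset]
    #     basedir = /data/mydataset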

start_date

datetime.datetime or numpy.datetime64 – Starting date for dataset. May be used to search through ALL granules. WARNING! If this is set at a time t_0 before the actual first measurement t_1, then the collocation algorithm (see CollocatedDataset) will conclude that there are 0 collocations in [t_0, t_1], and will not realise if data in [t_0, t_1] are actually added later!

end_date

datetime.datetime or numpy.datetime64 – Similar to start_date, but for ending.

name

str – Name for the dataset. Used to make sure there is only a single dataset with any particular name. If a dataset is initiated with a pre-existing name, the previously created object is used.

aliases

Mapping[str, str] – Aliases for fields. A dictionary can be useful if you want to programmatically loop through the same field for many different datasets that name it differently. For example, an alias could be “ch4_profile”.
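A sketch of how aliases might be used (the dataset objects and field names are invented; this assumes the reading routines resolve aliases when looking up fields):

    # Both datasets carry a methane profile, but under different names.
    ds_a.aliases = {"ch4_profile": "CH4_profile_raw"}
    ds_b.aliases = {"ch4_profile": "ch4"}

    for ds in (ds_a, ds_b):
        M = ds.read_period(start, end, fields=["ch4_profile"])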

unique_fields

Container[str] – Set of fields that make any individual measurement unique. For example, the default value is {“time”, “lat”, “lon”}.

related

Mapping[str, Dataset] – Dictionary whose keys may refer to other datasets with related information, such as DMPs or flags.

aliases = {}
as_xarray_dataset()[source]
combine(my_data, other_obj, other_data=None, other_args=None, trans=None, timetol=numpy.timedelta64(1, 's'))[source]

Combine with data from other dataset.

Combine a set of measurements from this dataset with another dataset, where each individual measurement corresponds to exactly one from the other, as identified by time/lat/lon, orbitid, measurement id, or other characteristics. The object attribute unique_fields determines how those are found.

The other dataset may contain flags, DMPs, or different information altogether.

Parameters:
  • my_data (ndarray) – Data for self. A (masked) array with a dtype such as returned from self.read.
  • other_obj (Dataset) – Dataset to match. Object from a Dataset subclass from which to find matching data.
  • other_data (ndarray) – Data for other. Optional. If not provided or None, this will be read using other_obj.
  • other_args (dict) – Keyword arguments passed to other_obj.read_period. May need to contain things like {“locator_args”: {“satname”: “noaa18”}}.
  • trans (collections.OrderedDict) – Dictionary of what field in my_data corresponds to what field in other_data. Optional; by default, merges self.unique_fields and other_obj.unique_fields, and assumes names between the two are identical. Order is relevant for optimal recursive bisection search for matches, which is to be implemented.
  • timetol (timedelta64) – For datetime types, isclose does not work (https://github.com/numpy/numpy/issues/5610). The user must pass an explicit tolerance, defaulting to 1 second.
Returns:

Masked ndarray of same size as my_data and same dtype as returned by other_obj.read.

TODO: Allow user to pass already-read data from other dataset.
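A hypothetical usage sketch (the dataset objects and the 5-second tolerance are invented):

    import numpy

    M = ds.read_period(start, end)
    # Attach matching records from a related flag dataset; matches are
    # found on the fields in unique_fields, with times considered equal
    # within the given tolerance.
    flags = ds.combine(M, ds.related["flags"],
                       timetol=numpy.timedelta64(5, "s"))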

end_date = None
find_granules(start=datetime.datetime(1, 1, 1, 0, 0), end=datetime.datetime(9999, 12, 31, 23, 59, 59, 999999), include_last_before=False)[source]

Loop through all granules for indicated period.

This is a generator that will loop through all granules from start to end, inclusive.

See also: find_granules_sorted

Parameters:
  • start (datetime.datetime) – Starting datetime. When omitted, start at the complete beginning of the dataset.
  • end (datetime.datetime) – End datetime. When omitted, continue to the end of the dataset. The last granule will start before this datetime, but its contents may continue beyond it.
  • include_last_before (bool) – When True, also return the last granule before start, so that a reader is sure to include all data in the covered period. When False, the first granule yielded is the first granule starting after start.
Yields:

pathlib.Path objects for all files in dataset. Sorting is not guaranteed; if you need guaranteed sorting, use find_granules_sorted.
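For example (a sketch; ds is an instance of any concrete Dataset subclass):

    import datetime

    start = datetime.datetime(2010, 1, 1)
    end = datetime.datetime(2010, 1, 8)
    # include_last_before=True also yields the last granule starting
    # before `start`, so data covering the period start is not missed.
    for path in ds.find_granules(start, end, include_last_before=True):
        print(path)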

find_granules_sorted(start=None, end=None)[source]

Yield all granules sorted by starting time then ending time.

For details, see find_granules.

find_most_recent_granule_before(instant, **locator_args)[source]

Find granule covering instant

Find granule started most recently before instant.

Parameters:
  • instant (datetime.datetime) – Time to search for. The datetime for which a granule is sought.
  • **locator_args – Any other keyword arguments that the particular dataset needs. Commonly, satname is needed.
Returns:

pathlib.Path object for sought granule.

get_additional_field(M, fld)[source]

Get additional field.

Get field from other dataset, original objects, or otherwise. To be implemented by subclass implementations.

Exact fields depend on subclass.

Parameters:
  • M (ndarray) – ndarray with existing data. A (masked) array with a dtype such as returned from self.read.
  • fld (str) – Additional field to read from original data
Returns:

ndarray with fields of M + fld.

mandatory_fields = None
maxsize = 10485760000
my_pseudo_fields = None
name = ''
read(f=None, fields='all', pseudo_fields=None, **kwargs)[source]

Read granule in file and do some other fixes

Shall return an ndarray with at least the fields lat, lon, time.

Parameters:
  • f (str) – String containing path to file to read.
  • fields (Iterable[str] or str) – What fields to return. See Dataset.read_period() for details.
  • pseudo_fields (Mapping[str, function]) – Additional fields to calculate on-the-fly after every read. That may be more attractive from a memory point of view. In this mapping, the keys will be names added to the dtype of the returned ndarray (at the top level). The values are functions. Each function must take a

Any further keyword arguments are passed on to the particular reading routine. For details, please refer to the docstring for the class.
Returns:

Masked array containing data in file with selected fields.

read_period(start=None, end=None, onerror='skip', fields='all', pseudo_fields=None, sorted=True, locator_args=None, reader_args=None, limits=None, filters=None)[source]

Read all granules between start and end, in bulk.

Parameters:
  • start (datetime.datetime) – Starting datetime. When omitted, start at the complete beginning of the dataset.
  • end (datetime.datetime) – End datetime. When omitted, continue to the end of the dataset. The last granule will start before this datetime, but its contents may continue beyond it. Can also be a datetime.timedelta, which is then interpreted as a length after start.
  • onerror (str) – What to do on errors. When reading many files, some may have problems. If onerror is set to “skip”, files with errors are skipped and reading continues with the next file. This is the default behaviour. If onerror is set to “raise”, the method reraises the original exception as soon as a problem occurs.
  • fields (Iterable[str] or str) – What fields to read from the dataset. Either a collection of strings corresponding to fields to read, or the str “all” (default), which means read all fields.
  • pseudo_fields (Mapping[str, function]) – See documentation for self.read.
  • sorted (bool) – Should the granules be read in sorted order? Defaults to True. NB: this does not currently guarantee that the actual results are sorted; that is up to the individual reading routines!
  • locator_args (dict) – Extra keyword arguments passed on to granule-finding routines.
  • reader_args (dict) – Extra keyword arguments to be passed on to the reading routine. Since these differ per type, they are probably documented in the class docstring.
  • limits (dict) – Limitations to apply to each granule. For the exact format, see typhon.math.array.limit_ndarray.
  • filters (container/iterable) – Collection of functions to be applied for filtering. Each must take an ndarray as input and give an ndarray as output.
Returns:

Masked array containing all data in period. Invalid data may or may not be masked, depending on the behaviour of the reading routine implemented by the subclass.
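A usage sketch (all values invented; the contents of locator_args depend on the particular dataset):

    import datetime

    M = ds.read_period(
        datetime.datetime(2010, 1, 1),
        datetime.timedelta(days=7),          # end given as a length after start
        fields=["time", "lat", "lon"],
        onerror="skip",                      # skip granules raising DataFileError
        locator_args={"satname": "noaa18"},
    )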

related = {}
section = ''
setlocal()[source]

Set local attributes, from config or otherwise.

start_date = None
unique_fields = {'lon', 'lat', 'time'}
class typhon.datasets.dataset.DatasetDeque(ds, window, init_time)[source]

Bases: object

A deque-like object sliding through a dataset.

For a particular dataset, keep data corresponding to time period in memory. When reading new data, discard a corresponding time period of data in the past. This should be useful to process longer periods of data that do not fit into memory, to calculate sliding window statistics, or to perform processing where values nearby in time are required.
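A usage sketch (ds, the window length, and the per-window processing are invented):

    import datetime
    from typhon.datasets.dataset import DatasetDeque

    dd = DatasetDeque(ds, window=datetime.timedelta(days=2),
                      init_time=datetime.datetime(2010, 6, 1))
    for _ in range(8):                       # slide forward in 6-hour steps
        dd.move(datetime.timedelta(hours=6))
        process(dd.data)                     # hypothetical sliding-window step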

center_time = None
data = None
dsobj = None
init_time = None
move(period)[source]

Read period on the right, discard period on the left

This moves the window to the right by period by reading period new data, appending this on the right, and discarding an equally long period on the left.

Parameters:period (timedelta) – Duration by which to shift. Must be positive.
reset(newtime=None)[source]

Reset to initial conditions

Sets data to the indicated time, or to self.init_time ± self.window/2.

Parameters:
  • newtime (datetime.datetime) – Optional; time at which to center the window. If not given, reset to the initial time.
resize(window)[source]

Resize window

window = None
exception typhon.datasets.dataset.GranuleLocatorError[source]

Bases: Exception

Problem locating granules.

class typhon.datasets.dataset.HomemadeDataset(**kwargs)[source]

Bases: typhon.datasets.dataset.MultiFileDataset

For any dataset created by typhon or its dependencies.

Currently supports only saving to npz, through the save_npz method. Eventually, should also support other file formats, in particular NetCDF.

find_granule_for_time(**kwargs)[source]

Find granule for specific time.

May or may not exist.

Keyword-only arguments, along with self.__dict__, are used to format self.basedir / self.subdir / self.stored_name.

Returns path to granule.

save_npz(path, M)[source]

Save to compressed npz

Parameters:
  • path (pathlib.Path) – Path to store to
  • M (ndarray) – Contents of what to store.
stored_name = ''
class typhon.datasets.dataset.HyperSpectral(**kwargs)[source]

Bases: typhon.datasets.dataset.Dataset, typhon.physics.units.em.FwmuMixin

Superclass for any hyperspectral instrument

freqfile = None
exception typhon.datasets.dataset.InvalidDataError[source]

Bases: typhon.datasets.dataset.DataFileError

Raised when data is not how it should be.

See DataFileError for more information.

exception typhon.datasets.dataset.InvalidFileError[source]

Bases: typhon.datasets.dataset.DataFileError, ValueError

Raised when the requested information cannot be obtained from the file

See DataFileError for more information.

class typhon.datasets.dataset.MultiFileDataset(**kwargs)[source]

Bases: typhon.datasets.dataset.Dataset

Represents a dataset where measurements are spread over multiple files.

If filenames contain timestamps, this information is used to determine the time for a granule or measurement. If filenames do not contain timestamps, this information is obtained from the file contents.

basedir

pathlib.Path or str – Describes the directory under which all granules are located. Can be either a string or a pathlib.Path object.

subdir

pathlib.Path or str – Describes the directory within basedir where granules are located. May contain string formatting directives where particular fields are replaced, such as year, month, and day. For example: subdir = ‘{year}/{month}’. Sorting cannot be more narrow than by day.

re

str – Regular expression that should match valid granule files within basedir / subdir. Should use symbolic group names to capture relevant information when possible, such as starting time, orbit number, etc. For time identification, relevant fields are contained in MultiFileDataset.date_info, where each field also exists in a version with “_end” appended. MultiFileDataset.refields contains all recognised fields.

If any _end fields are found, the ending time is equal to the beginning time with any _end fields replaced. If no _end fields are found, the granule_duration attribute is used to determine the ending time, or the file is read to get the ending time (hopefully the header is enough).
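A sketch of such a regular expression (the filename pattern is invented; compare the concrete re attributes of the datasets below, such as HIASI.re):

    # Matches e.g. "GRAN_20100101T0000_20100101T0600.nc" (hypothetical).
    re = (r"GRAN_(?P<year>\d{4})(?P<month>\d{2})(?P<day>\d{2})"
          r"T(?P<hour>\d{2})(?P<minute>\d{2})"
          r"_(?P<year_end>\d{4})(?P<month_end>\d{2})(?P<day_end>\d{2})"
          r"T(?P<hour_end>\d{2})(?P<minute_end>\d{2})\.nc")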

granule_cache_file

pathlib.Path or str – If set, use this file to cache information related to granules. This is used to cache granule times if those are not directly inferred from the filename. Otherwise, this is not used. The full path to this file shall be basedir / granule_cache_file.

granule_duration

datetime.timedelta – If the filename contains starting times but no ending times, granule_duration is used to determine the ending time. This should be a datetime.timedelta object.
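Putting these attributes together, a minimal subclass might look like this (all values are invented, and the _read hook is an assumption about where the reading routine belongs; check against the typhon source):

    import datetime
    from typhon.datasets.dataset import MultiFileDataset

    class MyGranules(MultiFileDataset):      # hypothetical dataset
        name = section = "mygranules"
        basedir = "/data/mygranules"
        subdir = "{year}/{month}"
        re = (r"granule_(?P<year>\d{4})(?P<month>\d{2})(?P<day>\d{2})"
              r"T(?P<hour>\d{2})(?P<minute>\d{2})\.nc")
        granule_duration = datetime.timedelta(hours=6)

        def _read(self, f, fields="all"):    # assumed name of the reading hook
            ...                              # actual parsing goes here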

basedir = None
datefields = ['year', 'month', 'day', 'hour', 'minute', 'second']
find_dir_for_time(dt)[source]

Find the directory containing granules/measurements at (date)time

For a given datetime object, find the directory that contains granules/measurement files for this particular time.

Parameters:dt (datetime.datetime) – Timestamp for inquiry. In reality, any object with year, month, and day attributes works.
Returns:pathlib.Path object pointing to the relevant directory
find_granules(dt_start=None, dt_end=None, include_last_before=False, **extra)[source]

Yield all granules/measurement files in period

Accepts extra keyword arguments. Meaning depends on actual dataset. Could be something like a satellite name in the case of sensors occurring on multiple platforms, like HIRS. To see what keyword arguments are accepted or possibly needed for a particular dataset, call self.get_path_format_variables()

If keyword argument return_time is present and True, yield tuples of (start_time, path) rather than just path.

The results are usually sorted by start time, but this is not guaranteed and depends on the filesystem. If you need sorted granules, please use find_granules_sorted.

Parameters:
  • dt_start (datetime.date) – Starting date.
  • dt_end (datetime.date) – Ending date.
  • include_last_before (bool) – Include last granule starting before.
  • **extra – Any extra keyword arguments. This will be passed on to format self.basedir / self.subdir, in case the standard fields like year, month, etc. do not provide enough information.
Yields:

pathlib.Path objects for each datafile in the dataset between dt_start and dt_end.

find_granules_sorted(dt_start=None, dt_end=None, include_last_before=False, **extra)[source]

Yield all granules, sorted by times.

For documentation, see find_granules().

find_most_recent_granule_before(instant, **locator_args)[source]

Find granule covering instant

Find granule started most recently before instant.

Arguments:

instant (datetime.datetime): Time to search for.
Datetime for which a granule is sought.
**locator_args:
Any other keyword arguments that the particular dataset needs. Commonly, satname is needed.
Returns:
pathlib.Path object for sought granule.

Docstring inherited from Dataset

get_info_for_granule(p)[source]

Return dict (re.fullmatch) for granule, based on re

Parameters:p (pathlib.Path) – path to granule
Returns:dictionary with info, such as returned by re.fullmatch().
Return type:dict
get_mandatory_fields()[source]
get_path_format_variables()[source]

What extra format variables are needed in find_granules?

Depending on the dataset, find_granules needs zero or more extra formatting variables. For example, TOVS instruments require the satellite. Required formatting arguments are determined by self.basedir and self.subdir.

get_subdir_resolution()[source]

Return the resolution for the subdir precision.

Returns “year”, “month”, “day”, or None (if there is no subdir).

Based on parsing of self.subdir attribute.

get_time_from_granule_contents(p)[source]

Get datetime objects for beginning and end of granule

If it returns None, the end time is taken to be the same as the start time.

Parameters:p (pathlib.Path) – Path to file
Returns:2-tuple for start and end times
Return type:(datetime, datetime)
get_times_for_granule(p, **kwargs)[source]

For granule stored in path, get start and end times.

May take hints for year, month, day, hour, minute, second, and their endings, according to self.datefields

Parameters:
  • p (pathlib.Path) – path to granule
  • **kwargs – Any more info that may be needed
Returns:

Start and end time for granule

Return type:

(datetime, datetime)

granule_cache_file = None
granule_duration = None
granules_firstline_db = None
granules_firstline_file = None
iterate_subdirs(d_start, d_end, **extra)[source]

Iterate through all subdirs in dataset.

Note that this does not check for existence of those directories.

Yields a 2-element tuple whose first element contains information on the year(/month/day), and whose second element is the path.

Parameters:
  • d_start (datetime.date) – Starting date.
  • d_end (datetime.date) – Ending date
  • **extra – Any extra keyword arguments. This will be passed on to format self.basedir / self.subdir, in case the standard fields like year, month, etc. do not provide enough information.
Yields:

pathlib.Path objects for each directory in the dataset containing files between d_start and d_end.

re = None
refields = ['year', 'year_end', 'month', 'month_end', 'day', 'day_end', 'hour', 'hour_end', 'minute', 'minute_end', 'second', 'second_end']
subdir = ''
valid_field_values = {}
verify_mandatory_fields(extra)[source]
class typhon.datasets.dataset.MultiSatelliteDataset[source]

Bases: object

satellites = set()
valid_field_values
class typhon.datasets.dataset.NetCDFDataset[source]

Bases: object

Mixin for any dataset where the contents are in NetCDF.

This may provide a good default for any NetCDF-based dataset. The reading routine will take the most commonly occurring dimension as the ndarray axes, and put the rest within a structured multidimensional dtype.

USE WITH CARE! PROVISIONAL API!

This one should use xarray, of course! See https://arts.mi.uni-hamburg.de/trac/rt/ticket/145

class typhon.datasets.dataset.SingleFileDataset(**kwargs)[source]

Bases: typhon.datasets.dataset.Dataset

Represents a dataset where all measurements are in one file.

end_date = datetime.datetime(9999, 12, 31, 23, 59, 59, 999999)
find_granules(start=datetime.datetime(1, 1, 1, 0, 0), end=datetime.datetime(9999, 12, 31, 23, 59, 59, 999999), include_last_before=False)[source]

Loop through all granules for indicated period.

This is a generator that will loop through all granules from start to end, inclusive.

See also: find_granules_sorted

Arguments:

start (datetime.datetime): Start
Starting datetime. When omitted, start at complete beginning of dataset.
end (datetime.datetime): End
End datetime. When omitted, continue to end of dataset. Last granule will start before this datetime, but contents may continue beyond it.
include_last_before (bool): Be inclusive
When True, also return the last granule before start, so that a reader is sure to include all data in the covered period. When False, the first granule yielded is the first granule starting after start.
Yields:
pathlib.Path objects for all files in dataset. Sorting is not guaranteed; if you need guaranteed sorting, use find_granules_sorted.

Docstring inherited from Dataset

find_granules_sorted(start=datetime.datetime(1, 1, 1, 0, 0), end=datetime.datetime(9999, 12, 31, 23, 59, 59, 999999), include_last_before=False)[source]

Yield all granules sorted by starting time then ending time.

For details, see find_granules.

Docstring inherited from Dataset

find_most_recent_granule_before(instant, **locator_args)[source]

Find granule covering instant

Find granule started most recently before instant.

Arguments:

instant (datetime.datetime): Time to search for
Datetime for which a granule is sought.
**locator_args:
Any other keyword arguments that the particular dataset needs. Commonly, satname is needed.
Returns:
pathlib.Path object for sought granule.

Docstring inherited from Dataset

get_times_for_granule(gran=None)[source]
read(f=None, fields='all')[source]

Read granule in file and do some other fixes

Shall return an ndarray with at least the fields lat, lon, time.

Arguments:

f (str): String containing path to file to read.

fields (Iterable[str] or str): What fields to return.
See Dataset.read_period() for details.
pseudo_fields (Mapping[str, function]): Additional fields to
calculate on-the-fly after every read. That may be more attractive from a memory point of view. In this mapping, the keys will be names added to the dtype of the returned ndarray (at the top level). The values are functions. Each function must take a

Any further keyword arguments are passed on to the particular reading routine. For details, please refer to the docstring for the class.

Returns:

Masked array containing data in file with selected fields.

Docstring inherited from Dataset

srcfile = None
start_date = datetime.datetime(1, 1, 1, 0, 0)
class typhon.datasets.dataset.SingleMeasurementPerFileDataset(**kwargs)[source]

Bases: typhon.datasets.dataset.MultiFileDataset

Represents datasets where each file contains one measurement.

An example of this would be ACE-FTS, or some radio-occultation datasets.

filename_fields

Mapping[str, dtype] – dict of {name: dtype} for fields that should be copied from the filename (as obtained with self.re) into the header

filename_fields = {}
granule_duration = datetime.timedelta(0)
read_single(p, fields='all')[source]

Read a single measurement from a single file.

Shall take one argument (a path object) and return a tuple with (header, measurement). The header shall contain information like latitude, longitude, time.

Parameters:
  • p (pathlib.Path) – Path to the file containing the measurement.
  • fields (Iterable[str] or str) – What fields to return. See Dataset.read_period() for details.
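A sketch of what an implementation might look like (file format, class name, and values are invented):

    import datetime
    import numpy
    from typhon.datasets.dataset import SingleMeasurementPerFileDataset

    class MyProfiles(SingleMeasurementPerFileDataset):   # hypothetical
        def read_single(self, p, fields="all"):
            data = numpy.loadtxt(p)          # one profile per file
            header = {"lat": 52.1, "lon": 4.5,
                      "time": datetime.datetime(2010, 1, 1, 12, 0)}
            return (header, data)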

typhon.datasets.tovs

Datasets for TOVS/ATOVS

This module imports typhon.physics.units and therefore has a soft dependency on the pint units library. Import this module only if you can accept such a dependency.

class typhon.datasets.tovs.ATOVS[source]

Bases: object

Functionality in common with all ATOVS.

Designed as mixin.

class typhon.datasets.tovs.HIASI(**kwargs)[source]

Bases: typhon.datasets.tovs.TOVSCollocatedDataset, typhon.datasets.dataset.NetCDFDataset, typhon.datasets.dataset.MultiFileDataset, typhon.datasets.dataset.HyperSpectral

HIRS-IASI collocations

combine(M, other_obj, *args, **kwargs)[source]

Combine with data from other dataset.

Combine a set of measurements from this dataset with another dataset, where each individual measurement corresponds to exactly one from the other, as identified by time/lat/lon, orbitid, measurement id, or other characteristics. The object attribute unique_fields determines how those are found.

The other dataset may contain flags, DMPs, or different information altogether.

Arguments:

my_data (ndarray): Data for self.
A (masked) array with a dtype such as returned from self.read.
other_obj (Dataset): Dataset to match
Object from a Dataset subclass from which to find matching data.
other_data (ndarray): Data for other. Optional.
Optionally, pass data for other object. If not provided or None, this will be read using other_obj.
other_args (dict): Keyword arguments passed to
other_obj.read_period. May need to contain things like {“locator_args”: {“satname”: “noaa18”}}
trans (collections.OrderedDict): Dictionary of what field in my_data
corresponds to what field in other_data. Optional; by default, merges self.unique_fields and other_obj.unique_fields, and assumes names between the two are identical. Order is relevant for optimal recursive bisection search for matches, which is to be implemented.
timetol (timedelta64): For datetime types, isclose does not
work (https://github.com/numpy/numpy/issues/5610). User must pass an explicit tolerance, defaulting to 1 second.

Returns:

Masked ndarray of same size as my_data and same dtype as returned by other_obj.read.

TODO: Allow user to pass already-read data from other dataset.

Docstring inherited from MultiFileDataset

end_date = datetime.datetime(2014, 1, 1, 0, 0)
name = 'hiasi'
re = 'W_XX-EUMETSAT-Darmstadt,SATCAL\\+COLLOC\\+LEOLEOIR,opa\\+HIRS\\+M02\\+IASI_C_EUMS_(?P<year>\\d{4})(?P<month>\\d{2})(?P<day>\\d{2})(?P<hour>\\d{2})(?P<minute>\\d{2})(?P<second>\\d{2})_(?P<year_end>\\d{4})(?P<month_end>\\d{2})(?P<day_end>\\d{2})(?P<hour_end>\\d{2})(?P<minute_end>\\d{2})(?P<second_end>\\d{2})\\.nc'
section = 'hiasi'
start_date = datetime.datetime(2013, 1, 1, 0, 0)
subdir = '{year:04d}/{month:02d}'
class typhon.datasets.tovs.HIRS(*args, **kwargs)[source]

Bases: typhon.datasets.dataset.MultiSatelliteDataset, typhon.datasets.tovs.Radiometer, typhon.datasets.dataset.MultiFileDataset

High-resolution Infra-Red Sounder.

This class can read HIRS l1b as published in the NOAA CLASS archive.

Reading routines as for any datasets (see documentation for Dataset, MultiFileDataset, and others).

Specifically for HIRS: when reading a single file (i.e. h.read(path)), takes keyword arguments:

return_header. If true, returns tuple (header, lines). Otherwise, only return the lines. The latter is default behaviour, in particular when reading many granules.

radiance_units. Defaults to “si”, by which I annoyingly mean W/(m²·sr·Hz). Set to “classic” if you want mW/(m²·sr·cm^{-1}), which is the unit more commonly used for HIRS and which it is calibrated against.

apply_scale_factors. If true (default True), apply scale factors as documented in KLM / POD guides. This is required when calibrate is True.

calibrate. If true (default True), apply calibration. When false, will not return any brightness temperatures or radiances, just counts. Note that this relates to the native NOAA calibration, not to any new calibration such as developed for FIDUCEO.

apply_flags. If true, apply flags when reading data.

apply_filter. Apply an outlier filter. FIXME DOC.

max_flagged. Float between 0 and 1. If a larger proportion than this number is flagged, raise an exception (FIXME DOC) and throw away the entire granule.
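For example, a single-granule read might look like this (a sketch; the path is invented):

    # h is an instance of a concrete HIRS subclass, e.g. HIRS4
    header, lines = h.read(
        "/data/hirs/NSS.HIRX.NN.D10001.S0000.E0159.B1234567.GC.gz",
        return_header=True,
        radiance_units="classic",    # mW/(m²·sr·cm⁻¹)
        apply_scale_factors=True,
        calibrate=True,
    )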

Note that this class only reads in the standard HIRS data with its standard calibration. Innovative calibrations including uncertainties are implemented in HIRSFCDR.

To use this class, you need to define in your typhonrc the following settings in the section ‘hirs’:

basedir

subdir

re

(TODO: migrate to definition?)

format_definition_file

only for FIXME
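A sketch of such a configuration section (all paths invented; the re value, omitted here, would be a granule-matching regular expression like HIRS.re below):

    [hirs]
    basedir = /data/hirs
    subdir = {satname}_hirs_{year:04d}/{month:02d}/{day:02d}
    format_definition_file = /data/NWPSAF-MF-UD-003_Formats.pdf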

Work in progress.

TODO/FIXME:

  • What is the correct way to use the odd bit parity? Information in NOAA KLM User’s Guide pages 3-31 and 8-154, but I’m not sure how to apply it.
  • If datasets like MHS or AVHRR are added some common code could probably move to a class between HIRS and MultiFileDataset, or to a mixin such as ATOVS.
  • Better handling of duplicates between subsequent granules. Currently it takes all lines from the older granule and none from the newer, but this should be decided on a case-by-case basis (Jon Mittaz, personal communication).
apply_calibcount_filter(lines, cutoff=10)[source]
as_xarray_dataset(M, skip_dimensions=(), rename_dimensions={})[source]

Convert structured ndarray to xarray dataset

From an object with a dtype such as may be returned by self.read, return an xarray dataset.

This method is in flux and its API is currently not stable. There needs to be a proper system of defining the variable names etc.

See tickets 145, 148, 149.

Parameters:
  • M (ndarray) – ndarray of the same type as returned by self.read.
  • skip_dimensions (Container[str]) – dimensions that shall not be included. For example, normal HIRS data has a scanpos dimension, HIRS-HIRS collocations do not; to convert collocations, pass skip_dimensions=[“scanpos”].
  • rename_dimensions (Mapping[str, str]) – dimensions that shall be renamed. For example, for collocations you may want to rename “time” to “scanline” or to “collocation”.
calc_time_since_last_calib(M)[source]

Calculate time since last calibration.

Calculate time (in seconds) since the last calibration cycle.

Parameters:M (ndarray) – ndarray of the same type as returned by self.read. Must be contiguous or results will be wrong.
Returns:ndarray, seconds since last calibration cycle
calibrate(cc, counts)[source]
check_parity(counts)[source]

Verify parity for counts

NOAA KLM Users Guide – April 2014 Revision, Section 3.2.2.4, Page 3-31, Table 3.2.2.4-1:

Minor Word Parity Check is the last bit of each minor frame or data element and is inserted to make the total number of “ones” in that data element odd. This permits checking for loss of data integrity between transmission from the instrument and reconstruction on the ground.

count_end = 22
count_lines_since_last_calib(M)[source]

Count scanlines since last calibration.

Count scanlines since the last calibration cycle.

Parameters:M (ndarray) – ndarray of the same type as returned by self.read. Must be contiguous or results will be wrong.
Returns:ndarray (uint16), scanlines since last calibration cycle
count_start = 2
filter_bestline(path, header, scanlines)[source]

Choose best lines in overlap between last/current/next granule

filter_firstline(header, scanlines)[source]

Filter out any scanlines that existed in the previous granule.

filter_overlap(path, header, scanlines, method='first')[source]
format_definition_file = ''
get_cc(scanlines)[source]

Extract calibration coefficients from scanlines.

get_dataname(header)[source]

Extract dataname from header.

classmethod get_definition_from_PDF(path_to_pdf)[source]

Get HIRS definition from NWPSAF PDF.

This method needs the external program pdftotext. Put the result in header_dtype manually, but there are some corrections (see comments in source code in _tovs_defs).

Parameters:path_to_pdf (str) – Path to document NWPSAF-MF-UD-003_Formats.pdf
Returns:(head_dtype, head_format, line_dtype, line_format)
get_dtypes(fp)[source]

Get dtypes for file.

Needs an open file object. Used internally by reading routine.

get_iwt(header, elem)[source]

Get temperature of internal warm target

get_mask_from_flags(header, lines, max_flagged=0.5)[source]

Set mask in lines, based on header and lines info

Given header and lines such as returned by self.read, determine flags and set them on lines as appropriate. Returns lines as a masked array.

get_other(scanlines)[source]

Get other information from scanlines.

Exact content depends on implementation. Can be things like scantype, solar zenith angles, etc.

get_pos(scanlines)[source]

Get lat-lon from scanlines.

get_temp(header, elem, anwrd)[source]

Get temperatures from header, element, anwrd

Used internally. FIXME DOC.

get_wn_c1_c2(header)[source]

Read central wavenumber, c1, and c2 from header

Given a header such as returned by self.read, return central wavenumber and the coefficients c1 and c2.

granules_firstline_file = PosixPath('.')
id2name(satid)[source]

Translate satellite id to satellite name.

See also id2no.

WARNING: Does not support NOAA-13 or TIROS-N!

id2no(satid)[source]

Translate satellite id to satellite number.

Sources:
  • POD guide, Table 2.0.4-3.
  • KLM User’s Guide, Table 8.3.1.5.2.1-1.
  • KLM User’s Guide, Table 8.3.1.5.2.2-1.

WARNING: Does not support NOAA-13 or TIROS-N!

max_valid_time_ptp = numpy.timedelta64(3,'h')
n_calibchannels = 19
n_channels = 20
n_minorframes = 64
n_perline = 56
name = 'hirs'
rad2bt(rad_wn, wn, c1, c2)[source]

Apply the standard radiance-to-BT conversion from NOAA KLM User’s Guide.

Applies the standard radiance-to-BT conversion as documented by the NOAA KLM User’s Guide. This is based on a linearisation of a radiance-to-BT mapping for the entire channel. A more accurate method is available in typhon.physics.em.SRF.channel_radiance2bt, which requires explicit consideration of the SRF. Such consideration is implicit here. That means that this method is only valid assuming the nominal SRF!

This method relies on values reported in the header of each granule. See NOAA KLM User’s Guide, Table 8.3.1.5.2.1-1., page 8-108. Please convert to SI units first.

NOAA KLM User’s Guide, Section 7.2.

Parameters:
  • rad_wn – Spectral radiance per wavenumber [W·sr^{-1}·m^{-2}·{m^{-1}}^{-1}]
  • wn – Central wavenumber [m^{-1}]. Note that unprefixed SI units are used.
  • c1 – c1 as contained in hrs_h_tempradcnv
  • c2 – c2 as contained in hrs_h_tempradcnv
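A sketch of the shape of this conversion, assuming (an assumption, not necessarily the exact typhon implementation) an inverse Planck function in wavenumber form followed by a linear band correction using c1 and c2:

    import numpy
    import scipy.constants

    def rad2bt_sketch(rad_wn, wn, c1, c2):
        # Invert Planck's law B = 2hc²ν³/(exp(hcν/kT) − 1) at the
        # central wavenumber wn [m⁻¹]; rad_wn in SI units as above.
        h, c, k = scipy.constants.h, scipy.constants.c, scipy.constants.k
        T_star = (h * c * wn / k) / numpy.log1p(2 * h * c**2 * wn**3 / rad_wn)
        # Assumed form of the band correction with the header coefficients.
        return (T_star - c1) / c2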
re = '(L?\\d*\\.)?NSS.HIR[XS].(?P<satcode>.{2})\\.D(?P<year>\\d{2})(?P<doy>\\d{3})\\.S(?P<hour>\\d{2})(?P<minute>\\d{2})\\.E(?P<hour_end>\\d{2})(?P<minute_end>\\d{2})\\.B(?P<B>\\d{7})\\.(?P<station>.{2})\\.gz'
satname = None
section = 'hirs'
seekhead(f)[source]

Seek open file to header position.

For some files, CLASS prepends 512 bytes to the file. This method shall make sure the file pointer is in the correct position to start reading.

temperature_fields = None
typ_Earth = 0
typ_iwt = 3
typ_space = 1
update_firstline_db(satname=None, start_date=None, end_date=None, overwrite=False)[source]

Create / update the firstline database

Create or update the database describing for each granule what the first scanline is that doesn’t occur in the preceding granule.

If a granule is entirely contained within the previous one, firstline is set to L+1 where L is the number of lines.

class typhon.datasets.tovs.HIRS2(*args, **kwargs)[source]

Bases: typhon.datasets.tovs.HIRSPOD

Sole implementation of HIRSPOD

channel_order = array([ 1, 17, 2, 3, 13, 4, 18, 11, 19, 7, 8, 20, 10, 14, 6, 5, 15, 12, 16, 9])
end_date = datetime.datetime(2006, 10, 10, 0, 0)
satellites = {'noaa12': {'noaa12', 'N12', 'NOAA12', 'n12'}, 'tirosn': {'tn', 'TN', 'TIROSN'}, 'noaa14': {'NOAA14', 'noaa14', 'N14', 'n14'}, 'noaa10': {'N10', 'NOAA10', 'noaa10', 'n10'}, 'noaa08': {'N08', 'NOAA8', 'n8', 'NOAA08', 'N8', 'n08'}, 'noaa13': {'n13', 'NOAA13', 'noaa13', 'N13'}, 'noaa11': {'N11', 'n11', 'NOAA11', 'noaa11'}, 'noaa07': {'NOAA07', 'n07', 'NOAA7', 'N07', 'n7', 'N7'}, 'noaa06': {'n6', 'NOAA6', 'N6', 'n06', 'N06', 'NOAA06'}, 'noaa09': {'n09', 'NOAA09', 'NOAA9', 'n9', 'N9', 'N09'}}
start_date = datetime.datetime(1978, 10, 29, 0, 0)
version = 2
class typhon.datasets.tovs.HIRS2I(*args, **kwargs)[source]

Bases: typhon.datasets.tovs.HIRS2

Sole implementation of HIRSPOD

satellites = {'noaa14', 'noaa11'}
class typhon.datasets.tovs.HIRS3(*args, **kwargs)[source]

Bases: typhon.datasets.tovs.HIRSKLM

Functionality in common with all ATOVS.

Designed as mixin.

channel_order = array([ 1, 17, 2, 3, 13, 4, 18, 11, 19, 7, 8, 20, 10, 14, 6, 5, 15, 12, 16, 9])
end_date = datetime.datetime(2016, 12, 31, 0, 0)
get_mask_from_flags(header, lines, max_flagged=0.5)[source]

Set mask in lines, based on header and lines info

Given header and lines such as returned by self.read, determine flags and set them on lines as appropriate. Returns lines as a masked array.

Docstring inherited from HIRS

Docstring inherited from HIRSKLM

header_dtype = dtype([('hrs_h_siteid', 'S3'), ('hrs_h_blank', 'S1'), ('hrs_h_l1bversnb', '>i2'), ('hrs_h_l1bversyr', '>i2'), ('hrs_h_l1bversdy', '>i2'), ('hrs_h_reclg', '>i2'), ('hrs_h_blksz', '>i2'), ('hrs_h_hdrcnt', '>i2'), ('hrs_h_filler0', '>i2', (3,)), ('hrs_h_dataname', 'S42'), ('hrs_h_prblkid', 'S8'), ('hrs_h_satid', '>i2'), ('hrs_h_instid', '>i2'), ('hrs_h_datatyp', '>i2'), ('hrs_h_tipsrc', '>i2'), ('hrs_h_startdatajd', '>i4'), ('hrs_h_startdatayr', '>i2'), ('hrs_h_startdatady', '>i2'), ('hrs_h_startdatatime', '>i4'), ('hrs_h_enddatajd', '>i4'), ('hrs_h_enddatayr', '>i2'), ('hrs_h_enddatady', '>i2'), ('hrs_h_enddatatime', '>i4'), ('hrs_h_cpidsyr', '>i2'), ('hrs_h_cpidsdy', '>i2'), ('hrs_h_filler1', '>i2', (4,)), ('hrs_h_inststat1', '>i4'), ('hrs_h_filler2', '>i2'), ('hrs_h_statchrecnb', '>i2'), ('hrs_h_inststat2', '>i4'), ('hrs_h_scnlin', '>i2'), ('hrs_h_callocsclin', '>i2'), ('hrs_h_misscnlin', '>i2'), ('hrs_h_datagaps', '>i2'), ('hrs_h_okdatafr', '>i2'), ('hrs_h_pacsparityerr', '>i2'), ('hrs_h_auxsyncerrsum', '>i2'), ('hrs_h_timeseqerr', '>i2'), ('hrs_h_timeseqerrcode', '>i2'), ('hrs_h_socclockupind', '>i2'), ('hrs_h_locerrind', '>i2'), ('hrs_h_locerrcode', '>i2'), ('hrs_h_pacsstatfield', '>i2'), ('hrs_h_pacsdatasrc', '>i2'), ('hrs_h_filler3', '>i4'), ('hrs_h_spare1', 'S8'), ('hrs_h_spare2', 'S8'), ('hrs_h_filler4', '>i2', (5,)), ('hrs_h_autocalind', '>i2'), ('hrs_h_solarcalyr', '>i2'), ('hrs_h_solarcaldy', '>i2'), ('hrs_h_calinf', '>i4', (80,)), ('hrs_h_filler5', '>i4', (2,)), ('hrs_h_tempradcnv', '>i4', (57,)), ('hrs_h_20solfiltirrad', '>i2'), ('hrs_h_20equifiltwidth', '>i2'), ('hrs_h_filler6', '>i4', (2,)), ('hrs_h_modelid', 'S8'), ('hrs_h_nadloctol', '>i2'), ('hrs_h_locbit', '>i2'), ('hrs_h_filler7', '>i2'), ('hrs_h_rollerr', '>i2'), ('hrs_h_pitcherr', '>i2'), ('hrs_h_yawerr', '>i2'), ('hrs_h_epoyr', '>i2'), ('hrs_h_epody', '>i2'), ('hrs_h_epotime', '>i4'), ('hrs_h_smaxis', '>i4'), ('hrs_h_eccen', '>i4'), ('hrs_h_incli', '>i4'), ('hrs_h_argper', '>i4'), ('hrs_h_rascnod', '>i4'), ('hrs_h_manom', '>i4'), ('hrs_h_xpos', '>i4'), ('hrs_h_ypos', '>i4'), ('hrs_h_zpos', '>i4'), ('hrs_h_xvel', '>i4'), ('hrs_h_yvel', '>i4'), ('hrs_h_zvel', '>i4'), ('hrs_h_earthsun', '>i4'), ('hrs_h_filler8', '>i4', (4,)), ('hrs_h_rdtemp', '>i2', (6,)), ('hrs_h_bptemp', '>i2', (6,)), ('hrs_h_eltemp', '>i2', (6,)), ('hrs_h_pchtemp', '>i2', (6,)), ('hrs_h_fhcc', '>i2', (6,)), ('hrs_h_scnmtemp', '>i2', (6,)), ('hrs_h_fwmtemp', '>i2', (6,)), ('hrs_h_p5v', '>i2', (6,)), ('hrs_h_p10v', '>i2', (6,)), ('hrs_h_p75v', '>i2', (6,)), ('hrs_h_m75v', '>i2', (6,)), ('hrs_h_p15v', '>i2', (6,)), ('hrs_h_m15v', '>i2', (6,)), ('hrs_h_fwmcur', '>i2', (6,)), ('hrs_h_scmcur', '>i2', (6,)), ('hrs_h_pchcpow', '>i2', (6,)), ('hrs_h_filler9', '>i4', (890,))])
line_dtype = dtype([('hrs_scnlin', '>i2'), ('hrs_scnlinyr', '>i2'), ('hrs_scnlindy', '>i2'), ('hrs_clockdrift', '>i2'), ('hrs_scnlintime', '>i4'), ('hrs_scnlinf', '>i2'), ('hrs_mjfrcnt', '>i2'), ('hrs_scnpos', '>i2'), ('hrs_scntyp', '>i2'), ('hrs_filler1', '>i4', (2,)), ('hrs_qualind', '>i4'), ('hrs_linqualflgs', '>i4'), ('hrs_chqualflg', '>i2', (20,)), ('hrs_mnfrqual', 'i1', (64,)), ('hrs_filler2', '>i4', (4,)), ('hrs_calcof', '>i4', (60,)), ('hrs_scalcof', '>i4', (60,)), ('hrs_filler3', '>i4', (3,)), ('hrs_navstat', '>i4'), ('hrs_attangtime', '>i4'), ('hrs_rollang', '>i2'), ('hrs_pitchang', '>i2'), ('hrs_yawang', '>i2'), ('hrs_scalti', '>i2'), ('hrs_ang', '>i2', (168,)), ('hrs_pos', '>i4', (112,)), ('hrs_filler4', '>i4', (2,)), ('hrs_elem', '>i2', (1536,)), ('hrs_filler5', '>i4', (3,)), ('hrs_digbinvwbf', '>i2'), ('hrs_digitbwrd', '>i2'), ('hrs_aninvwbf', '>i4'), ('hrs_anwrd', 'i1', (16,)), ('hrs_filler6', '>i4', (11,))])
pdf_definition_pages = (26, 37)
satellites = {'noaa17': {'N17', 'noaa17', 'n17', 'NOAA17'}, 'noaa15': {'n15', 'noaa15', 'NOAA15', 'N15'}, 'noaa16': {'N16', 'NOAA16', 'noaa16', 'n16'}}
start_date = datetime.datetime(1999, 1, 1, 0, 0)
version = 3
class typhon.datasets.tovs.HIRS4(*args, **kwargs)[source]

Bases: typhon.datasets.tovs.HIRSKLM

Functionality in common with all ATOVS.

Designed as mixin.

channel_order = array([ 1, 17, 2, 3, 13, 4, 18, 11, 19, 7, 8, 20, 10, 14, 6, 5, 15, 12, 16, 9])
end_date = datetime.datetime(2016, 12, 31, 0, 0)
get_mask_from_flags(header, lines, max_flagged=0.5)[source]

Set mask in lines, based on header and lines info

Given header and lines such as returned by self.read, determine flags and set them on lines as appropriate. Returns lines as a masked array.

Docstring inherited from HIRS

Docstring inherited from HIRSKLM

get_temp(header, elem, anwrd)[source]

Extract temperatures

header_dtype = dtype([('hrs_h_siteid', 'S3'), ('hrs_h_blank', 'S1'), ('hrs_h_l1bversnb', '>i2'), ('hrs_h_l1bversyr', '>i2'), ('hrs_h_l1bversdy', '>i2'), ('hrs_h_reclg', '>i2'), ('hrs_h_blksz', '>i2'), ('hrs_h_hdrcnt', '>i2'), ('hrs_h_filler0', '>i2', (3,)), ('hrs_h_dataname', 'S42'), ('hrs_h_prblkid', 'S8'), ('hrs_h_satid', '>i2'), ('hrs_h_instid', '>i2'), ('hrs_h_datatyp', '>i2'), ('hrs_h_tipsrc', '>i2'), ('hrs_h_startdatajd', '>i4'), ('hrs_h_startdatayr', '>i2'), ('hrs_h_startdatady', '>i2'), ('hrs_h_startdatatime', '>i4'), ('hrs_h_enddatajd', '>i4'), ('hrs_h_enddatayr', '>i2'), ('hrs_h_enddatady', '>i2'), ('hrs_h_enddatatime', '>i4'), ('hrs_h_cpidsyr', '>i2'), ('hrs_h_cpidsdy', '>i2'), ('hrs_h_fov1offset', '>i2'), ('hrs_h_instrtype', 'S6'), ('hrs_h_inststat1', '>i4'), ('hrs_h_filler1', '>i2'), ('hrs_h_statchrecnb', '>i2'), ('hrs_h_inststat2', '>i4'), ('hrs_h_scnlin', '>i2'), ('hrs_h_callocsclin', '>i2'), ('hrs_h_misscnlin', '>i2'), ('hrs_h_datagaps', '>i2'), ('hrs_h_okdatafr', '>i2'), ('hrs_h_pacsparityerr', '>i2'), ('hrs_h_auxsyncerrsum', '>i2'), ('hrs_h_timeseqerr', '>i2'), ('hrs_h_timeseqerrcode', '>i2'), ('hrs_h_socclockupind', '>i2'), ('hrs_h_locerrind', '>i2'), ('hrs_h_locerrcode', '>i2'), ('hrs_h_pacsstatfield', '>i2'), ('hrs_h_pacsdatasrc', '>i2'), ('hrs_h_filler2', '>i4'), ('hrs_h_spare1', 'S8'), ('hrs_h_spare2', 'S8'), ('hrs_h_filler3', '>i2', (5,)), ('hrs_h_autocalind', '>i2'), ('hrs_h_solarcalyr', '>i2'), ('hrs_h_solarcaldy', '>i2'), ('hrs_h_calinf', '>i4', (80,)), ('hrs_h_filler4', '>i4', (2,)), ('hrs_h_tempradcnv', '>i4', (57,)), ('hrs_h_20solfiltirrad', '>i2'), ('hrs_h_20equifiltwidth', '>i2'), ('hrs_h_filler5', '>i4', (2,)), ('hrs_h_modelid', 'S8'), ('hrs_h_nadloctol', '>i2'), ('hrs_h_locbit', '>i2'), ('hrs_h_filler6', '>i2'), ('hrs_h_rollerr', '>i2'), ('hrs_h_pitcherr', '>i2'), ('hrs_h_yawerr', '>i2'), ('hrs_h_epoyr', '>i2'), ('hrs_h_epody', '>i2'), ('hrs_h_epotime', '>i4'), ('hrs_h_smaxis', '>i4'), ('hrs_h_eccen', '>i4'), ('hrs_h_incli', '>i4'), ('hrs_h_argper', '>i4'), ('hrs_h_rascnod', '>i4'), ('hrs_h_manom', '>i4'), ('hrs_h_xpos', '>i4'), ('hrs_h_ypos', '>i4'), ('hrs_h_zpos', '>i4'), ('hrs_h_xvel', '>i4'), ('hrs_h_yvel', '>i4'), ('hrs_h_zvel', '>i4'), ('hrs_h_earthsun', '>i4'), ('hrs_h_filler7', '>i4', (4,)), ('hrs_h_rdtemp', '>i4', (6,)), ('hrs_h_bptemp', '>i4', (6,)), ('hrs_h_eltemp', '>i4', (6,)), ('hrs_h_pchtemp', '>i4', (6,)), ('hrs_h_fhcc', '>i4', (6,)), ('hrs_h_scnmtemp', '>i4', (6,)), ('hrs_h_fwmtemp', '>i4', (6,)), ('hrs_h_p5v', '>i4', (6,)), ('hrs_h_p10v', '>i4', (6,)), ('hrs_h_p75v', '>i4', (6,)), ('hrs_h_m75v', '>i4', (6,)), ('hrs_h_p15v', '>i4', (6,)), ('hrs_h_m15v', '>i4', (6,)), ('hrs_h_fwmcur', '>i4', (6,)), ('hrs_h_scmcur', '>i4', (6,)), ('hrs_h_pchcpow', '>i4', (6,)), ('hrs_h_iwtcnttmp', '>i4', (30,)), ('hrs_h_ictcnttmp', '>i4', (24,)), ('hrs_h_tttcnttmp', '>i4', (6,)), ('hrs_h_fwcnttmp', '>i4', (24,)), ('hrs_h_patchexpcnttmp', '>i4', (6,)), ('hrs_h_fsradcnttmp', '>i4', (6,)), ('hrs_h_scmircnttmp', '>i4', (6,)), ('hrs_h_pttcnttmp', '>i4', (6,)), ('hrs_h_sttcnttmp', '>i4', (6,)), ('hrs_h_bpcnttmp', '>i4', (6,)), ('hrs_h_electcnttmp', '>i4', (6,)), ('hrs_h_patchfcnttmp', '>i4', (6,)), ('hrs_h_scmotcnttmp', '>i4', (6,)), ('hrs_h_fwmcnttmp', '>i4', (6,)), ('hrs_h_chsgcnttmp', '>i4', (6,)), ('hrs_h_conversions', '>i4', (11,)), ('hrs_h_moonscnlin', '>i2'), ('hrs_h_moonthresh', '>i2'), ('hrs_h_avspcounts', '>i4', (20,)), ('hrs_h_startmanyr', '>i2'), ('hrs_h_startmandy', '>i2'), ('hrs_h_startmantime', '>i4'), ('hrs_h_endmanyr', '>i2'), ('hrs_h_endmandy', 
'>i2'), ('hrs_h_endmantime', '>i4'), ('hrs_h_deltav', '>i4', (3,)), ('hrs_h_mass', '>i4', (2,)), ('hrs_h_filler8', '>i2', (1302,))])
line_dtype = dtype([('hrs_scnlin', '>i2'), ('hrs_scnlinyr', '>i2'), ('hrs_scnlindy', '>i2'), ('hrs_clockdrift', '>i2'), ('hrs_scnlintime', '>i4'), ('hrs_scnlinf', '>i2'), ('hrs_mjfrcnt', '>i2'), ('hrs_scnpos', '>i2'), ('hrs_scntyp', '>i2'), ('hrs_filler1', '>i4', (2,)), ('hrs_qualind', '>i4'), ('hrs_linqualflgs', '>i4'), ('hrs_chqualflg', '>i2', (20,)), ('hrs_mnfrqual', 'i1', (64,)), ('hrs_filler2', '>i4', (4,)), ('hrs_calcof', '>i4', (60,)), ('hrs_scalcof', '>i4', (60,)), ('hrs_yawsteering', '>i2', (3,)), ('hrs_totattcorr', '>i2', (3,)), ('hrs_navstat', '>i4'), ('hrs_attangtime', '>i4'), ('hrs_rollang', '>i2'), ('hrs_pitchang', '>i2'), ('hrs_yawang', '>i2'), ('hrs_scalti', '>i2'), ('hrs_ang', '>i2', (168,)), ('hrs_pos', '>i4', (112,)), ('hrs_moonang', '>i2'), ('hrs_filler3', '>i2', (3,)), ('hrs_elem', '>i2', (1536,)), ('hrs_filler4', '>i4', (3,)), ('hrs_digitbupdatefg', '>i2'), ('hrs_digitbwrd', '>i2'), ('hrs_analogupdatefg', '>i4'), ('hrs_anwrd', 'i1', (16,)), ('hrs_filler5', '>i4', (11,))])
pdf_definition_pages = (38, 54)
satellites = {'noaa18': {'NOAA18', 'N18', 'n18', 'noaa18'}, 'metopa': {'metopa', 'METOPA', 'MA', 'ma'}, 'metopb': {'mb', 'metopb', 'MB', 'METOPB'}, 'noaa19': {'noaa19', 'N19', 'NOAA19', 'n19'}}
start_date = datetime.datetime(2005, 6, 5, 0, 0)
version = 4
class typhon.datasets.tovs.HIRSHIRS(**kwargs)[source]

Bases: typhon.datasets.tovs.TOVSCollocatedDataset, typhon.datasets.dataset.NetCDFDataset, typhon.datasets.dataset.MultiFileDataset

HIRS-HIRS collocations from Brockmann Consult

A.k.a. MMD05

combine(M, other_obj, *args, col_field, **kwargs)[source]

Combine with data from other dataset.

Combine a set of measurements from this dataset with another dataset, where each individual measurement corresponds to exactly one from the other, as identified by time/lat/lon, orbitid, measurement id, or other characteristics. The object attribute unique_fields determines how those are found.

The other dataset may contain flags, DMPs, or different information altogether.

Arguments:

my_data (ndarray): Data for self.
A (masked) array with a dtype such as returned from self.read.
other_obj (Dataset): Dataset to match
Object from a Dataset subclass from which to find matching data.
other_data (ndarray): Data for other. Optional.
Optionally, pass data for other object. If not provided or None, this will be read using other_obj.
other_args (dict): Keyword arguments passed to
other_obj.read_period. May need to contain things like {“locator_args”: {“satname”: “noaa18”}}
trans (collections.OrderedDict): Dictionary of what field in my_data
corresponds to what field in other_data. Optional; by default, merges self.unique_fields and other_obj.unique_fields, and assumes names between the two are identical. Order is relevant for optimal recursive bisection search for matches, which is to be implemented.
timetol (timedelta64): For datetime types, isclose does not
work (https://github.com/numpy/numpy/issues/5610). User must pass an explicit tolerance, defaulting to 1 second.

Returns:

Masked ndarray of same size as my_data and same dtype as returned by other_obj.read.

TODO: Allow user to pass already-read data from other dataset.

Docstring inherited from MultiFileDataset

end_date = datetime.datetime(2016, 12, 31, 0, 0)
name = 'hirshirs'
re = 'mmd05_hirs-(?P<prim>.{2,3})_hirs-(?P<sec>.{2,3})_(?P<year>\\d{4})-(?P<doy>\\d{3})_(?P<year_end>\\d{4})-(?P<doy_end>\\d{3})\\.nc'
section = 'hirshirs'
start_date = datetime.datetime(1978, 10, 29, 0, 0)
subdir = 'hirs_{prim:s}_{sec:s}'
class typhon.datasets.tovs.HIRSKLM(*args, **kwargs)[source]

Bases: typhon.datasets.tovs.ATOVS, typhon.datasets.tovs.HIRS

Functionality in common with all ATOVS.

Designed as mixin.

calibrate(cc, counts)[source]

Apply the standard calibration from NOAA KLM User’s Guide.

NOAA KLM User’s Guide, section 7.2, equation (7.2-3), page 7-12, PDF page 286:

r = a₀ + a₁C + a₂C²

where C are counts, and a₀, a₁, a₂ are contained in hrs_calcof as documented in the NOAA KLM User’s Guide, Section 8.3.1.5.3.1, Table 8.3.1.5.3.1-1. and Section 8.3.1.5.3.2, Table 8.3.1.5.3.2-1.
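In code, the conversion is a direct transcription (a sketch; extracting a₀, a₁, a₂ per channel from hrs_calcof is not shown):

    def counts_to_radiance(counts, a0, a1, a2):
        # NOAA KLM User's Guide eq. (7.2-3): r = a0 + a1·C + a2·C²
        return a0 + a1 * counts + a2 * counts ** 2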

counts_offset = 4096
dist_space_iwct = 1
get_cc(scanlines)[source]

Extract calibration coefficients from scanlines.

Docstring inherited from HIRS

get_dataname(header)[source]

Extract dataname from header.

Docstring inherited from HIRS

get_dtypes(f)[source]

Get dtypes for file.

Needs an open file object. Used internally by reading routine.

Docstring inherited from HIRS

get_mask_from_flags(header, lines, max_flagged=0.5)[source]

Set mask in lines, based on header and lines info

Given header and lines such as returned by self.read, determine flags and set them on lines as appropriate. Returns lines as a masked array.

Docstring inherited from HIRS

get_other(scanlines)[source]

Get other information from scanlines.

Exact content depends on implementation. Can be things like scantype, solar zenith angles, etc.

Docstring inherited from HIRS

static get_pos(scanlines)[source]
get_temp(header, elem, anwrd)[source]

Get temperatures from header, element, anwrd

Used internally. FIXME DOC.

Docstring inherited from HIRS

get_wn_c1_c2(header)[source]

Read central wavenumber, c1, and c2 from header

Given a header such as returned by self.read, return central wavenumber and the coefficients c1 and c2.

Docstring inherited from HIRS

n_wordperframe = 24
static read_cpids(path)[source]

Read calibration parameters input data sets (CPIDS)

The path should point to a CPIDS file for HIRS, such as NK.cpids.HIRS. Reads telemetry conversion data from a Calibration Parameters Input Data Sets (CPIDS) source file, such as available from NOAA. Some were sent by Dejiang Han <dejiang.han@noaa.gov> to Gerrit Holl <g.holl@reading.ac.uk> on 2016-02-17.

scantype_fieldname = 'hrs_scntyp'
seekhead(f)[source]

Seek open file to header position.

For some files, CLASS prepends 512 bytes to the file. This method shall make sure the file pointer is in the correct position to start reading.

Docstring inherited from HIRS

views = ('iwt', 'space', 'Earth')
class typhon.datasets.tovs.HIRSPOD(*args, **kwargs)[source]

Bases: typhon.datasets.tovs.HIRS

Read early HIRS such as documented in POD guide.

Methods are mostly as for HIRS class.

calibrate(cc, counts)[source]

Apply the standard calibration from NOAA POD Guide

Returns radiance in SI units (W m^-2 sr^-1 Hz^-1).

POD Guide, section 4.5

counts_offset = 0
dist_space_iwct = 2
get_cc(scanlines)[source]

Extract calibration coefficients from scanlines.

Docstring inherited from HIRS

get_dataname(header)[source]

Extract dataname from header.

Docstring inherited from HIRS

get_dtypes(f)[source]

Get dtypes for header and lines

Takes as argument an open file object for the granule.

Before 1995, a record was 4256 bytes. After 1995, it became 4253 bytes. This change appears undocumented but can be found in the AAPP source code at AAPP/src/tools/bin/hirs2_class_to_aapp.F.

get_mask_from_flags(header, lines, max_flagged=0.5)[source]

Set mask in lines, based on header and lines info

Given header and lines such as returned by self.read, determine flags and set them on lines as appropriate. Returns lines as a masked array.

Docstring inherited from HIRS

get_other(scanlines)[source]

Get other information from scanlines.

Exact content depends on implementation. Can be things like scantype, solar zenith angles, etc.

Docstring inherited from HIRS

static get_pos(scanlines)[source]
get_temp(header, elem, anwrd)[source]

Get temperatures from header, element, anwrd

Used internally. FIXME DOC.

Docstring inherited from HIRS

get_wn_c1_c2(header)[source]

Read central wavenumber, c1, and c2 from header

Given a header such as returned by self.read, return central wavenumber and the coefficients c1 and c2.

Docstring inherited from HIRS

n_wordperframe = 22
ref_lza = array([ 59.19, 56.66, 54.21, 51.82, 49.48, 47.19, 44.93, 42.7 , 40.5 , 38.33, 36.17, 34.03, 31.91, 29.8 , 27.7 , 25.61, 23.53, 21.46, 19.4 , 17.34, 15.29, 13.24, 11.2 , 9.16, 7.12, 5.09, 3.05, 1.02, 1.01, 3.05, 5.08, 7.12, 9.15, 11.19, 13.24, 15.28, 17.34, 19.39, 21.46, 23.53, 25.6 , 27.69, 29.79, 31.9 , 34.02, 36.16, 38.32, 40.49, 42.69, 44.92, 47.18, 49.47, 51.81, 54.2 , 56.65, 59.18])
scantype_fieldname = 'scantype'
seekhead(f)[source]

Seek open file to header position.

For some files, CLASS prepends 512 bytes to the file. This method shall make sure the file pointer is in the correct position to start reading.

Docstring inherited from HIRS

typ_ict = 2
views = ('ict', 'iwt', 'space', 'Earth')
class typhon.datasets.tovs.IASIEPS(**kwargs)[source]

Bases: typhon.datasets.dataset.MultiFileDataset, typhon.datasets.dataset.HyperSpectral

Read IASI from EUMETSAT EPS L1C

This class depends on the Common Data Access Toolbox (CODA).

http://stcorp.nl/coda/

Reading data requires that in TYPHONRC, the variables ‘tmpdir’ and ‘tmpdirb’ are set in [main].

end_date = datetime.datetime(2015, 11, 17, 16, 38, 59)
granule_duration = datetime.timedelta(0, 6200)
minspace = 10000000000.0
name = 'iasi'
section = 'iasi'
start_date = datetime.datetime(2007, 5, 29, 5, 8, 56)
class typhon.datasets.tovs.IASISub(**kwargs)[source]

Bases: typhon.datasets.dataset.HomemadeDataset, typhon.datasets.dataset.HyperSpectral

For any dataset created by typhon or its dependencies.

Currently supports only saving to npz, through the save_npz method. Eventually, should also support other file formats, in particular NetCDF.

end_date = datetime.datetime(2011, 12, 31, 23, 59, 59)
get_times_for_granule(p, **kwargs)[source]

For granule stored in path, get start and end times.

May take hints for year, month, day, hour, minute, second, and their endings, according to self.datefields

Arguments:

p (pathlib.Path): path to granule
**kwargs: Any more info that may be needed
Returns:
(datetime, datetime): Start and end time for granule

Docstring inherited from HomemadeDataset

name = 'iasisub'
re = 'IASI_1C_selection_(?P<year>\\d{4})_(?P<month>\\d{1,2})_(?P<day>\\d{1,2}).npz'
section = 'iasisub'
start_date = datetime.datetime(2011, 1, 1, 0, 0)
stored_name = 'IASI_1C_selection_{year}_{month}_{day}.npz'
subdir = '{month}'
class typhon.datasets.tovs.MHSL1C(**kwargs)[source]

Bases: typhon.datasets.tovs.ATOVS, typhon.datasets.dataset.NetCDFDataset, typhon.datasets.dataset.MultiFileDataset

Functionality in common with all ATOVS.

Designed as mixin.

end_date = datetime.datetime(2016, 4, 1, 0, 0)
name = 'mhs_l1c'
re = '(\\d*\\.)?NSS.MHSX.(?P<satcode>.{2})\\.D(?P<year>\\d{2})(?P<doy>\\d{3})\\.S(?P<hour>\\d{2})(?P<minute>\\d{2})\\.E(?P<hour_end>\\d{2})(?P<minute_end>\\d{2})\\.B(?P<B>\\d{7})\\.(?P<station>.{2})\\.h5'
section = 'mhs_l1c'
start_date = datetime.datetime(1999, 1, 1, 0, 0)
subdir = '{satname:s}_mhs_{year:04d}/{month:02d}/{day:02d}'
class typhon.datasets.tovs.Radiometer[source]

Bases: object

srf_backend_f = ''
srf_backend_response = ''
srf_dir = ''
class typhon.datasets.tovs.TOVSCollocatedDataset[source]

Bases: object

Mixin for any TOVS collocated dataset. Different because of scanlines.

Should be mixed in before Dataset

combine(M, other_obj, *args, col_field, col_field_slice=slice(None, None, None), **kwargs)[source]