typhon.math

Maths-related modules.

typhon.math.calculate_precisely(f)

Raise all arguments to their highest numpy precision.

This decorator copies any floats to f8, ints to i8, preserving masked arrays and/or ureg units.

Currently only works for simple dtypes of float, int, or uint.

This makes a copy of every argument. It is therefore memory-intensive, and it does not work if the function needs to change values in-place.

Experimental function.

See also: https://github.com/numpy/numpy/issues/593
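
The copy-and-promote behaviour described above can be sketched as a decorator. This is a minimal illustration, not typhon's implementation: the name calculate_precisely_sketch and the _WIDEST mapping are hypothetical, and pint quantities are not handled. Promoting uints to u8 is an assumption based on the note that uint dtypes are supported.

```python
import functools

import numpy as np

# Hypothetical mapping from dtype kind to the widest simple dtype.
_WIDEST = {"f": "f8", "i": "i8", "u": "u8"}


def calculate_precisely_sketch(f):
    """Call f with float/int/uint ndarray arguments copied to 8-byte dtypes."""

    def promote(a):
        # MaskedArray is an ndarray subclass, so astype keeps the mask.
        if isinstance(a, np.ndarray) and a.dtype.kind in _WIDEST:
            return a.astype(_WIDEST[a.dtype.kind])  # astype copies by default
        return a

    @functools.wraps(f)
    def wrapper(*args, **kwargs):
        return f(*[promote(a) for a in args],
                 **{k: promote(v) for k, v in kwargs.items()})

    return wrapper


@calculate_precisely_sketch
def square(x):
    return x * x


@calculate_precisely_sketch
def zero_first(x):
    x[0] = 0  # modifies the promoted copy only


small = np.array([300.0], dtype="f2")
print(square(small).dtype)  # float64

original = np.array([1, 2], dtype="i2")
zero_first(original)  # the caller's array is left untouched
```

The second function shows why in-place changes are lost: the wrapped function only ever sees the promoted copy.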

typhon.math.integrate_column(y, z, axis=None)

Calculate the column integral of given data.

Parameters:
  • y (np.array) – Data array.
  • z (np.array) – Height levels.
Returns:

Column integral.

Return type:

float
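
The integration rule is not stated here. As a rough sketch only, assuming a trapezoidal rule over a 1-D profile (the name integrate_column_sketch is hypothetical and the axis argument is left out), a column integral could look like this:

```python
import numpy as np


def integrate_column_sketch(y, z):
    """Trapezoidal integral of a 1-D profile y over height levels z (assumed rule)."""
    y = np.asarray(y, dtype=float)
    z = np.asarray(z, dtype=float)
    # Mean of neighbouring data values times the thickness of each layer.
    return float(np.sum(0.5 * (y[1:] + y[:-1]) * np.diff(z)))


# A constant profile of 1 over 3000 height units integrates to 3000.
column = integrate_column_sketch([1.0, 1.0, 1.0], [0.0, 1000.0, 3000.0])
```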

typhon.math.nlogspace(start, stop, num=50)

Creates a vector with logarithmically equal spacing.

Creates a vector with length num, equally logarithmically spaced between the given end values.

Parameters:
  • start (int) – The starting value of the sequence.
  • stop (int) – The end value of the sequence.
  • num (int) – Number of samples to generate. Default is 50. Must be non-negative.

Returns: ndarray.
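
Equal logarithmic spacing means that consecutive elements share a constant ratio. A minimal sketch of that idea, assuming positive end values and using a hypothetical helper name, is:

```python
import numpy as np


def nlogspace_sketch(start, stop, num=50):
    """Return num values between start and stop, evenly spaced in log space."""
    # Space the logarithms linearly, then transform back.
    return np.exp(np.linspace(np.log(start), np.log(stop), num))


v = nlogspace_sketch(1, 1000, 4)  # approximately 1, 10, 100, 1000
ratios = v[1:] / v[:-1]           # constant ratio between neighbours
```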

typhon.math.promote_maximally(x)

Return copy of x with high precision dtype.

Converts input of ‘f2’, ‘f4’, or ‘f8’ to ‘f8’. Please don’t pass f16. f16 is misleading and naughty.

Converts input of ‘u1’, ‘u2’, ‘u4’, ‘u8’ to ‘u8’.

Converts input of ‘i1’, ‘i2’, ‘i4’, ‘i8’ to ‘i8’.

Naturally, this copies the data and increases memory usage.

Anything else is returned unchanged.

If you input a pint quantity you will get back a pint quantity.

Experimental function.
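
The conversion rules listed above can be sketched with a lookup on the dtype kind. This is an illustration under assumptions, not the library code: promote_maximally_sketch and _TARGET are hypothetical names, and pint quantities are not covered.

```python
import numpy as np

# Hypothetical lookup: dtype kind -> 8-byte dtype of the same kind.
_TARGET = {"f": "f8", "u": "u8", "i": "i8"}


def promote_maximally_sketch(x):
    """Return a promoted copy for float/uint/int arrays, anything else unchanged."""
    kind = getattr(getattr(x, "dtype", None), "kind", None)
    if kind in _TARGET:
        # Note: an f16 input also has kind "f" and would be narrowed to f8 here.
        return x.astype(_TARGET[kind])
    return x


small = np.array([1, 2, 3], dtype="u1")
promoted = promote_maximally_sketch(small)  # dtype uint64, a new array
text = np.array(["a", "b"])
same = promote_maximally_sketch(text)       # returned unchanged
```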

typhon.math.sum_digits(n)

Calculate the sum of digits.

Parameters: n (int) – Number.
Returns: Sum of digits of n.
Return type: int

Examples

>>> sum_digits(42)
6

typhon.math.array

Functions operating on arrays

typhon.math.array.limit_ndarray(M, limits)

Select elements from structured ndarray based on value ranges

This function filters a structured ndarray based on ranges defined for zero or more fields. For each field f with limits (lo, hi), it will select only those elements where lo<=X[f]<hi.

>>> X = array([(2, 3), (4, 5), (8, 2), (5, 1)],
...           dtype=[("A", "i4"), ("B", "i4")])
>>> print(limit_ndarray(X, {"A": (2, 5)}))
[(2, 3) (4, 5)]
>>> X = array([([2, 3], 3), ([4, 6], 5), ([8, 3], 2), ([5, 3], 1)],
...           dtype=[("A", "i4", 2), ("B", "i4")])
>>> print(limit_ndarray(X, {"A": (2, 5, "all")}))
[([2, 3], 3)]

Parameters:
  • M (numpy.ndarray) – 1-D structured ndarray
  • limits (dict) – Dictionary with limits. Keys must correspond to fields in M. If this is a scalar field (M.dtype[field].shape==()), values are tuples (lo, hi). If this is a multidimensional field, values are tuples (lo, hi, mode), where mode must be either all or any. Values in the range [lo, hi) are retained, applying all or any when needed.
Returns:

ndarray subset of M. This is a view, not a copy.

typhon.math.array.localmin(arr)

Find local minima for 1-D array

Given a 1-dimensional numpy.ndarray, return the locations of any local minimum as a boolean array. The first and last item are always considered False.

Parameters: arr (numpy.ndarray) – 1-D ndarray for which to find local minima. Should have a numeric dtype.
Returns: numpy.ndarray with dtype bool. True for any element that is strictly smaller than both neighbouring elements. First and last element are always False.
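
The strict-neighbour comparison described above can be written with shifted slices. The following is a minimal sketch under the stated rules (the name localmin_sketch is hypothetical), not the library source:

```python
import numpy as np


def localmin_sketch(arr):
    """Boolean mask of elements strictly smaller than both neighbours."""
    arr = np.asarray(arr)
    result = np.zeros(arr.shape, dtype=bool)  # first and last stay False
    # Compare each interior element with its left and right neighbour.
    result[1:-1] = (arr[1:-1] < arr[:-2]) & (arr[1:-1] < arr[2:])
    return result


mask = localmin_sketch(np.array([3, 1, 2, 2, 0, 5, 4]))
# True only at indices 1 and 4; the final 4 is an endpoint and the
# plateau of 2s is not strictly smaller than its neighbour.
```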

typhon.math.array.mad_outliers(arr, cutoff=10, mad0='raise')

Mask out MAD outliers

Mask out any values that are more than cutoff times the median absolute deviation (MAD) away from the median.

Although I (Gerrit Holl) came up with this myself, it’s also documented at:

http://eurekastatistics.com/using-the-median-absolute-deviation-to-find-outliers/

except that I rolled my own approach for “what if mad==0”.

Note: If all values except one are constant, it is not possible to determine whether the remaining one is an outlier or “reasonably close” to the rest, without additional hints. In this case, some outliers may go unnoticed.

Parameters:
  • arr (numpy.ndarray) – n-D array with numeric dtype
  • cutoff (int) – Maximum tolerable normalised fractional distance
  • mad0 (str) – What to do if mad=0. Can be ‘raise’, ‘ignore’, or ‘perc’. In case of ‘perc’, will search for the lowest percentile at which the percentile absolute deviation is nonzero, increase the cutoff by the fractional approach toward percentile 100, and use that percentile instead. So if the first non-zero value is at the 75th percentile, it will use the 75th-percentile absolute deviation and increase the cutoff by a factor (100 - 50)/(100 - 75) = 2.
Returns:

ndarray with bool dtype, True for outliers
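
The basic test can be sketched as follows. This is an illustration only, covering just the behaviour of mad0='raise'; the function name and the choice of ValueError are assumptions, and the ‘ignore’ and ‘perc’ modes are not implemented.

```python
import numpy as np


def mad_outliers_sketch(arr, cutoff=10):
    """True where |value - median| exceeds cutoff times the MAD."""
    arr = np.asarray(arr, dtype=float)
    med = np.median(arr)
    dev = np.abs(arr - med)   # absolute deviation from the median
    mad = np.median(dev)      # median absolute deviation
    if mad == 0:
        # Assumed exception type; mirrors the idea of mad0='raise'.
        raise ValueError("MAD is zero, cannot normalise deviations")
    return dev / mad > cutoff


data = np.array([9.8, 10.1, 10.0, 9.9, 10.2, 55.0])
outliers = mad_outliers_sketch(data)  # only the last value is flagged

# Cutoff scaling from the 'perc' description, for a first non-zero
# deviation at the 75th percentile:
factor = (100 - 50) / (100 - 75)  # 2.0
```

With the data above, the median is 10.05 and the MAD is about 0.15, so 55.0 lies roughly 300 MADs away while every other value stays below 2.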

typhon.math.array.parity(v)

Vectorised parity-checking.

For any ndarray with an integer dtype (np.integer), return an equally shaped array with the bit parity for each element.

Parameters: v (numpy.ndarray) – Array of integer dtype
Returns: ndarray with uint8 dtype with the parity for each value in v
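
Bit parity is 1 when a value has an odd number of set bits and 0 otherwise. One way to vectorise this, shown here as a sketch with a hypothetical name and not necessarily the approach typhon takes, is XOR folding:

```python
import numpy as np


def parity_sketch(v):
    """Bit parity (0 or 1) of every element of an integer array."""
    v = np.asarray(v)
    if not np.issubdtype(v.dtype, np.integer):
        raise TypeError("expected an integer dtype")
    out = v.copy()
    shift = v.dtype.itemsize * 8 // 2
    # Fold the upper half of the bits onto the lower half until bit 0
    # holds the XOR of all bits.
    while shift >= 1:
        out ^= out >> shift
        shift //= 2
    return (out & 1).astype(np.uint8)


bits = parity_sketch(np.array([0, 1, 2, 3, 7, 255], dtype=np.uint8))
# 0 -> 0, 1 -> 1, 2 -> 1, 3 -> 0, 7 -> 1, 255 -> 0
```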

typhon.math.stats

Various statistical functions

typhon.math.stats.adev(x, dim=-1)

Calculate Allan deviation in its simplest form

Parameters:
  • x (ndarray or xarray DataArray) – n-dim array for Allan calculation
  • dim (int or str) – optional, axis to operate along, defaults to last. If you pass a str, x must be a xarray.DataArray and the dimension will be a name.
\[\sigma = \sqrt{\frac{1}{2(N-1)} \sum_{i=1}^{N-1} (y_{i+1} - y_i)^2}\]

Equation source: Jon Mittaz, personal communication, April 2016
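
The equation translates directly into NumPy. The sketch below assumes a plain ndarray and an integer axis, and the name adev_sketch is hypothetical; xarray inputs and named dimensions are left out.

```python
import numpy as np


def adev_sketch(x, axis=-1):
    """Allan deviation: sqrt(sum((y[i+1] - y[i])**2) / (2 * (N - 1)))."""
    x = np.asarray(x, dtype=float)
    n = x.shape[axis]
    squared_steps = np.diff(x, axis=axis) ** 2  # (y_{i+1} - y_i)^2
    return np.sqrt(np.sum(squared_steps, axis=axis) / (2 * (n - 1)))


# Differences are 1, 2, 3, so the sum of squares is 14 and N - 1 = 3.
sigma = adev_sketch([1.0, 2.0, 4.0, 7.0])  # sqrt(14 / 6)
```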

typhon.math.stats.bin(x, y, bins)

Bin/bucket y according to values of x.

Returns a list of arrays, with one element per bin holding the values of y that fall into that bin. Digitising happens with numpy.digitize.

Parameters:
  • x (ndarray) – Coordinate that y is binned along
  • y (ndarray) – Data to be binned. First dimension must match length of x. All subsequent dimensions are left untouched.
  • bins (ndarray) – Bins according to which to sort the data.
Returns:

List of arrays, one element per bin.
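
How the indices from numpy.digitize are mapped onto list elements is not spelled out here. The sketch below makes its own assumption: bins holds edges, so there are len(bins) - 1 bins, and values outside the outermost edges are dropped. The name bin_sketch is hypothetical.

```python
import numpy as np


def bin_sketch(x, y, bins):
    """Group the rows of y by the bin that the matching x value falls in."""
    x = np.asarray(x)
    y = np.asarray(y)
    idx = np.digitize(x, bins)  # i means bins[i-1] <= x < bins[i]
    # Boolean indexing on the first axis keeps trailing dimensions of y.
    return [y[idx == i] for i in range(1, len(bins))]


x = np.array([0.5, 1.5, 2.5, 1.2, 9.0])
y = np.array([10, 20, 30, 40, 50])
edges = np.array([0, 1, 2, 3])
groups = bin_sketch(x, y, edges)
# groups[0] holds 10, groups[1] holds 20 and 40, groups[2] holds 30;
# x = 9.0 lies beyond the last edge and is dropped.
```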

typhon.math.stats.bin_nd(binners, bins, data=None)

Bin/bucket data in arbitrary number of dimensions

For example, one can bin geographical data according to lat/lon through:

>>> binned = bin_nd([lats, lons], [lat_bins, lon_bins])

The data that are actually binned are the indices into the arrays lats/lons, which should correspond to indices in your actual data.

Data that do not fit in any bin are not binned anywhere.

Note: do NOT pass the 3rd argument, data. This is used purely for the implementation using recursion. Passing anything here explicitly is a recipe for disaster.

Parameters:
  • binners (List[ndarray]) – Axes along which the data are binned. This is akin to the x-coordinate in typhon.math.stats.bin().
  • bins (List[ndarray]) – Edges of the bins according to which to bin the data.
Returns:

n-D ndarray of type object, with indices describing what bin elements belong to.
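
To make the shape of the result concrete, here is a non-recursive sketch of the same idea. It is an assumption-laden illustration, not typhon's recursive implementation: bin_nd_sketch is a hypothetical name, and each entry of bins is treated as a list of edges.

```python
import numpy as np


def bin_nd_sketch(binners, bins):
    """Object array whose cells hold the indices of the points in that cell."""
    shape = tuple(len(edges) - 1 for edges in bins)
    out = np.empty(shape, dtype=object)
    # Zero-based bin number per dimension; -1 or len(edges) - 1 means "outside".
    idx = [np.digitize(x, edges) - 1 for x, edges in zip(binners, bins)]
    for cell in np.ndindex(shape):
        mask = np.ones(len(binners[0]), dtype=bool)
        for dim, number in enumerate(cell):
            mask &= idx[dim] == number
        out[cell] = np.flatnonzero(mask)
    return out


lats = np.array([10, 20, 30, 20])
lons = np.array([5, 15, 5, 15])
binned = bin_nd_sketch([lats, lons], [np.array([0, 15, 40]), np.array([0, 10, 20])])
# binned[1, 1] holds indices 1 and 3, which can then index the actual data.
```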

typhon.math.stats.corrcoef(mat)

Calculate correlation coefficient with p-values

Calculate correlation coefficients along with p-values.

Parameters: mat (ndarray) – 2-D array [p×N] for which the correlation matrix is calculated
Returns: (r, p) where r is a p×p matrix with the correlation coefficients, obtained with numpy.corrcoef, and p is the matrix of corresponding p-values.

Attribution:

This code, or an earlier version of it, was posted by user ‘jingchao’ on Stack Overflow on 2014-07-03 at http://stackoverflow.com/a/24547964/974555 and is licensed under CC BY-SA 3.0. This notice may not be removed.

typhon.math.stats.get_distribution_as_percentiles(x, y, bins, ptiles=(5, 25, 5, 75, 95))

Get the distribution of y vs. x as percentiles.

Bin y-data according to x-data (using typhon.math.stats.bin()). Then, within each bin, calculate percentiles.

Parameters:
  • x (ndarray) – data for x-axis
  • y (ndarray) – data for y-axis
  • bins (ndarray) – Specific bins to use for dividing the x-data.
  • ptiles (ndarray) – Percentiles to use.
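
The return format is not documented here. As a sketch under assumptions (one row per bin, one column per percentile, NaN for empty bins, bins given as edges, and a hypothetical function name), the two steps of binning and taking percentiles could be combined like this:

```python
import numpy as np


def percentiles_per_bin_sketch(x, y, bins, ptiles):
    """Percentiles of y within each x-bin, as an array of shape (n_bins, n_ptiles)."""
    x = np.asarray(x)
    y = np.asarray(y, dtype=float)
    idx = np.digitize(x, bins)
    out = np.full((len(bins) - 1, len(ptiles)), np.nan)
    for i in range(1, len(bins)):
        values = y[idx == i]
        if values.size:  # leave NaN where a bin is empty
            out[i - 1] = np.percentile(values, ptiles)
    return out


table = percentiles_per_bin_sketch(
    x=[0.1, 0.2, 0.3, 1.5],
    y=[1.0, 2.0, 3.0, 10.0],
    bins=[0, 1, 2, 3],
    ptiles=(0, 50, 100),
)
# First row: min, median and max of 1, 2, 3; last row is NaN (empty bin).
```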