Submodule: opexebo.general¶

General signal processing functions

Overview¶

Signal processing

`opexebo.general.smooth`(data, sigma, **kwargs)	Smooth provided data with a Gaussian kernel
`opexebo.general.upsample`(array, upscale, …)	A MaskedArray aware upsampling routine.
`opexebo.general.peak_search`(image, **kwargs)	Given a 1D or 2D array, return a list of co-ordinates of the local maxima or minima
`opexebo.general.circular_mask`(axes, …)	Given a set of axes, construct a circular mask defining pixels as within or without a circular arena.
`opexebo.general.power_spectrum`(values, …)	Calculate the power spectrum of a time-series of data
`opexebo.general.spatial_cross_correlation`(…)	Calculate the Pearson cross correlation between two arrays.
`opexebo.general.shuffle`(times, offset_lim, …)	Duplicate the provided time series `iterations` number of times.
`opexebo.general.walk_filter`(speed, …[, fmt])	It is common practice when studying a freely moving subject to exclude data from periods when the subject was stationary, or nearly stationary.

General Helper functions

`opexebo.general.accumulate_spatial`(pos, …)	Given a list of positions, create a histogram of those positions.
`opexebo.general.normxcorr2_general`(array)	Calculate spatial autocorrelation.
`opexebo.general.fit_ellipse`(X, Y)	Fit an ellipse to the provided set of X, Y co-ordinates
`opexebo.general.bin_width_to_bin_number`(…)	This conversion is done in several separate places, and must be done the same in every case.

Detailed Documentation¶

opexebo.general.normxcorr2_general(array)¶

Calculate spatial autocorrelation.

Python implementation of the Matlab generalized-normalized cross correlation function, adapted by Vadim Frolov. Some generality was abandoned in the adaption as unnecessary for autocorrelogram calculation.

For the original function, see https://se.mathworks.com/matlabcentral/fileexchange/29005-generalized-normalized-cross-correlation

Parameters:	array: NxM matrix firing array. array is not necessary a numpy array. Must not contain NaNs!
Returns:	np.ndarray Resulting correlation matrix

opexebo.general.smooth(data, sigma, **kwargs)¶

Smooth provided data with a Gaussian kernel

The smoothing is done with a routine from the astronomical package astropy Like scipy.ndimage.gaussian_filter, this does not handle MaskedArrays - but it handles NaNs much better. Specifically, astropy.convolution.convolve replaces NaN values with an interpolation across the void region.

Therefore, to handle masked arrays, the data at masked positions is replaced by np.nan prior to smoothing, and thus avoids influencing nearby, unmasked cells. The masked cells are then returned to their original values prior to return.

The package is discussed at http://docs.astropy.org/en/stable/convolution/index.html

Parameters:

data: np.ndarray or np.ma.MaskedArray: Data that will be smoothed
sigma: float: Standard deviations for Gaussian kernel in units of pixels/bins
mask_fill: float, optional: The value that masked locations should be treated as This can either be provided as an absolute number (e.g. 0), or nan If nan, then each masked location will get a value by interpolating from nearby cells. This will only apply if the input is a MaskedArray to start with. If the input is a standard np.ndarray, then no values will be substituted, even if there are nans present.
circular: bool, optional: If True, then smoothing at the edge of the array will be handled in a circular manner, i.e. the value to the left of data[0] will be data[-1]. If False, the edge will be handled by padding with values equal to the boundary value. Default False

Returns:

smoothed_data: np.ndarray or np.ma.MaskedArray: Smoothed data

Notes

BNT.+general.smooth http://docs.astropy.org/en/stable/convolution/index.html https://github.com/astropy/astropy/issues/6511

opexebo.general.accumulate_spatial(pos, arena_size, **kwargs)¶

Given a list of positions, create a histogram of those positions. The resulting histogram is typically referred to as a map.

The complexity in this function comes down to selecting where the edges of the arena are, and generating the bins within those limits.

The histogram bin edges must be defined in one of 3 different ways

bin_width: based on the keyword arena_size, the number of bins will be calculated as

opexebo.general.bin_width_to_bin_number

The histogram will use num_bins between the minimum and maximum of the positions (or limit if provided)

bin_number: the histogram will use bin_number of bins between the minimum and maximum of the positions (or limit if provided)

bin_edges: the histogram will use the provided `bin_edg`e arrays

Either zero or one of the three bin_* keyword arguments must be defined. If none are defined, then a default bin_width is used. If more than 1 is defined, an error is raised

Parameters:

pos: np.ndarray

1D or 2D array of positions in row-major format, i.e. x = pos[0], y = pos[1]. This matches the simplest input creation pos = np.array( [x, y] )

arena_size: float or tuple of floats

Dimensions of arena (in cm)

For a linear track, length
For a circular arena, diameter
For a rectangular arena, length or (length, length)

bin_width: float

Bin size in cm. Default 2.5cm. If bin_width is supplied, limit must also be supplied. One of bin_width, bin_number, bin_edges must be provided

bin_number: int or tuple of int

Number of bins. If provided as a tuple, then (x_bins, y_bins). One of bin_width, bin_number, bin_edges must be provided

bin_edges: array-like

Edges of the bins. Provided either as edges or (x_edges, y_edges). One of bin_width, bin_number, bin_edges must be provided

limits: tuple or np.ndarray

(x_min, x_max) or (x_min, x_max, y_min, y_max) Provide concrete limits to the range over which the histogram searches Any observations outside these limits are discarded If no limits are provided, then use np.nanmin(data), np.nanmax(data) to generate default limits. As is standard in python, acceptable values include the lower bound and exclude the upper bound

Returns:

hist: np.ndarray: 1D or 2D histogram of the occurrences of the input observations Dimensions given by arena_size/bin_width Not normalised - each cell gives the integer number of occurrences of the observation in that cell
edges: list-like: x, or (x, y), where x, y are 1d np.ndarrays Here x, y correspond to the output histogram

opexebo.general.shuffle(times: numpy.ndarray, offset_lim: float, iterations: int, t_start: float = None, t_stop: float = None)¶

Duplicate the provided time series iterations number of times. Each duplicate will be incremented circularly by a random value not smaller than offset_lim.

Circular incrementation results in (the majority) of time _differences_ remaining preserved

Initially, we have a time series, times, both with values in the range [min(times), max(times)]. t_start may be smaller than min(times), and t_stop may be larger than max(times)
iterations number of duplicates of times are created.
In each iteraction, a random increment T is generated, and added to each value in that iteration, such that values now fall into the range [min(times)+T, max(times)+T]. max(times)+T may exceed t_stop.
All timestamps matching t_n > t_stop are mapped back into the range

[t_start, t_stop] by subtracting (t_stop-t_start) * The iteration is re-ordered by value (moving those beyond the far edge

back to the beginning)

Parameters:

times : np.ndarray: 1D array of floats. Time series data to be shuffled
offset_lim : float: Minimum offset from the original time values. Each iteration is incremented by a random value evenly distributed in the range [offset_lim, t_stop-offset_lim]
iterations : int: Number of repeats of times to be returned
t_start : float, optional: Lower bound of time domain. Must meet the criteria t_start <= min(times) Defaults to min(times)
t_stop : float, optional: Upper bound of time domain. Must meet the criteria t_stop >= max(times) Defaults to max(times)

Returns:

output : np.ndarray: iterations x N array of times. A single iteration is accessed as output[i]
increments : np.ndarray: 1D array of offset values that were used

opexebo.general.validatekeyword__arena_size(kwv, provided_dimensions)¶

Decipher the possible meanings of the keyword “arena_size”.

“arena_size” is given to describe the arena in which the animal is moving It should be either a float, or an array-like of 2 floats (x, y)

Parameters:	kw: float or array-like of floats The value given for the keyword arena_size provided_dimensions : int the number of spatial dimensions provided to the original function. Acceptable values are 1 or 2 E.g. if the original function was provided with positions = [t, x, y], then provided_dimensions=2 (x and y)
Returns:	arena_size : float or np.ndarray of floats
Raises:	ValueError IndexError

opexebo.general.validate_keyword_arena_shape(arena_shape)¶

Ensure that the arena_shape is a meaningful value

Parameters:	arena_shape : str the value given for the keyword arena_shape
Returns:	arena_shape : str A value that is guaranteed to be an acceptable member of one of the recognised groups of arena_shapes

opexebo.general.fit_ellipse(X, Y)¶

Fit an ellipse to the provided set of X, Y co-ordinates

Based on the approach taken in Authors: Andrew Fitzgibbon, Maurizio Pilu, Bob Fisher Reference: “Direct Least Squares Fitting of Ellipses”, IEEE T-PAMI, 1999

@Article{Fitzgibbon99, author = “Fitzgibbon, A.~W.and Pilu, M. and Fisher, R.~B.”, title = “Direct least-squares fitting of ellipses”, journal = pami, year = 1999, volume = 21, number = 5, month = may, pages = “476–480” } and implemented in MatLab by Vadim Frolov

Parameters:	X - np.ndarray x co-ordinates of points Y - np.ndarray y co-ordinates of points
Returns:	x_centre : float Centre of ellipse y_centre : float Centre of ellipse Ru : float Major radius Rv : float Minor radius theta_rad : float Ellipse orientation (in radians)

Notes

BNT.+general.fitEllipse opexebo.analysis.gridscore

This program is free software; you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation; either version 3 of the License, or (at your option) any later version.

opexebo.general.peak_search(image, **kwargs)¶

Given a 1D or 2D array, return a list of co-ordinates of the local maxima or minima

Multiple searching techniques are provided:

default: uses skimage.morphology.get_maxima

sep: uses the Python wrapper to the Source Extractor astronomy tool to identify peaks

Parameters:

image: np.ndarray: 1D or 2D array of data
search_method : str, optional, {“default”, “sep”}
mask : np.ndarray, optional: Array of masked locations in the image with the same dimensions. Locations where the mask value is True are ignored for the purpose of searching.
maxima: bool, optional: [default search method only] Define whether to search for maxima or minima in the provided array
null_background: bool: [sep search method only] Set the image background to zero for calculation purposes rather than attempt to calculate a background gradient. This should generally be True, as our images are not directly comparable to standard telescope output
threshold : float, optional: [sep search method only] Threshold for identifiying maxima area

Returns:

peak_coords: tuple: Co-ordinates of peaks, in the form ((x0, x1, x2…), (y0, y1, y2…))

Notes

opexebo.general.power_spectrum(values, time_stamps, sampling_frequency, **kwargs)¶

Calculate the power spectrum of a time-series of data

Parameters:

values: np.ndarray: Amplitude of time series at times time_stamps. Must be sampled at a constant frequency
time_stamps: np.ndarray: Time stamps of time-series. Must be the same length as values. Assumed to be in [seconds]
sampling_frequency: float: Sampling rate. Assumed to be in [Hz]
method: str, optional, {“welch”, “fft”}: Method for calculating a power spectrum. Accepted values are: [“welch”, “fft”]. Defaults to “welch”
scale: str, optiona, {“linear”, “log”}: Should the power spectrum be returned in a linear form propto v**2/sqrt(Hz)) (“linear”); or decibel scale (dB/sqrt(Hz)) (“log”) Defaults to “linear”
resolution: float: Desired output frequency resolution. Applicable only to Welch’s method. Due to the averaging effect of Welch’s method, this sets the “effective” resolution, which is NOT the same as the minimum difference in frequencies Default 1 [Hz]. Values lower than 1/(time_series_duration) are meaningless

Returns:

frequencies: np.ndarray: Discrete Fourier Transform sample frequencies
power_spectrum: np.ndarray: Power spectral density at that sample frequency

opexebo.general.spatial_cross_correlation(arr_0, arr_1, **kwargs)¶

Calculate the Pearson cross correlation between two arrays.

Arrays may be either 1d or 2d. If MaskedArrays are supplied, all locations that are masked are excluded from the correlation. Invalid numbers (NaN) are also excluded from the correlation.

If 2d arrays are provided, then calculates a 1d array of row(or column)-wise correlations, controlled by the row_major keyword.

Other Parameters:
Parameters:	arr_0: np.ndarray 1d or 2d array. Must either contain no NaNs, or be an np.ma.MaskedArray with NaN values masked. arr_1: np.ndarray 2nd array to correlate with. Must have the same dimensions as arr_0
Returns:	output_single: float Pearson correlation coefficient between entire arrays. 2d input arrays are flattened to 1d to calculate output_array: np.ndarray 1D array of pearson correlation coefficients, where the i’th value is the coefficient between the i’th rows (if row_major), or the i’th columns of the two arrays Return value is np.nan if 1d arrays are supplied
	row_major: bool, optional If the arrays are 2D, process first by row, or first by column. Default True

Notes

BNT.+analyses.spatialCrossCorrelation()

opexebo.general.bin_width_to_bin_number(length, bin_width)¶

This conversion is done in several separate places, and must be done the same in every case. Therefore, refactor into a quick helper function.

In cases where the bin_width is not a perfect divisor of length, the actual bins will be slightly smaller

Parameters:	length: float or np.ndarray Length of a side to be divided into equally spaced bins bin_width: float Dimension of a square bin or pixel
Returns:	num_bins: int or np.ndarray Same type as length. Integer number of bins.

opexebo.general.circular_mask(axes, diameter, **kwargs)¶

Given a set of axes, construct a circular mask defining pixels as within or without a circular arena.

A pixel is assumed to be inside the circle iff the centre-point is within the radius of the circle. For small radii, this will result in including pixels with <50% of their area inside.

Parameters:

axes: list of np.ndarray

[x, y] - pair of nd-arrays defining the edges of the pixels. This is the return value from np.histogram2d, for instance. Each one is size+1 relative to the number of pixels This function assumes that each dimension has a CONSTANT bin width, although each dimension may have its own bin width.

diameter: float: diameter of the circle
origin: list of floats: [x, y] co-ordinates of the centre of the arena. Optional, defaults to (0,0)

Returns:

in_field: np.ndarray: 2D boolean array of dimensions (axes[0].size-1, axs[1].size-1) Values are true if INSIDE circle and false if OUTSIDE
distance_map: np.ndarray: 2D float array of same size. Values are the distance of the centre of the pixel from the origin
angular_map: np.ndarray: 2D float array of the same size. Values are the angle from the origin to the pixel, in degrees. Angles are given with 0° pointing to increasing y, clockwise.

opexebo.general.upsample(array, upscale, **kwargs)¶

A MaskedArray aware upsampling routine.

skimage provides more reliable methods for interpolated upsampling of simple ndarrays. However, this is not appropriate to MaskedArrays, i.e. where certain locations have invalid values that should not expand.

For non-masked arrays, this function can optionally rely on the skimage routine allowing fractional upscaling. For MaskedArrays, integer upscaling is enforced, with no interpolation

Parameters:

array: array-like

Supports list, tuple, np.ndarray or np.ma.MaskedArray. Supports 1d and 2d (2d for true arrays only). If the array contains any non-finite values, or is a MaskedArray, then integer upscaling is enforced

upscale: int or float

Upscaling factor. The output array shape will be this factor larger. Method of upscaling depends on type(upscale).

int -> integer upscaling with no interpolation

float -> upscaling with interpolation

Note the type distinction: type(1) == int; type(1.) == float

debug: bool, optional

Print out debugging information while running. Default False

Returns:

new_array : np.ndarray or np.ma.MaskedArray: Upsampled array

opexebo.general.walk_filter(speed: numpy.ndarray, speed_cutoff: float, *args, fmt='remove')¶

It is common practice when studying a freely moving subject to exclude data from periods when the subject was stationary, or nearly stationary. This method is described as a “walk-filter” - a high-pass filter on subject speed.

This function allows an arbitrary number of arrays to be filtered in parallel to the speed (or whatever other filtering criteria are used). Filters can be performed either by removing the unwanted elements, or by masking them and retunring a MaskedArray.

Parameters:

speed : np.ndarray: Array of speeds for other data points
speed_cutoff : float: The cutoff, below which values in speed will be excluded.
*args : np.ndarray, optional: Any other arrays that should be filtered in parallel with speed Optional arguments here _must_ be np.ndarrays with size equal to that of speed
fmt : str, optional: Either “remove” or “mask”. Determines how the values are returned “remove” (default) - the invalid valaues are removed from the array “mask” - the original array is returned as a MaskedArray, with the invalid values masked out.

Returns:

np.ndarray: Filtered copy of speed
[np.ndarray]: Arbitrary other filtered arrays, if any other arrays were provided as *args