Submodule: opexebo.general

General signal processing functions

Overview

Signal processing

opexebo.general.smooth(data, sigma, **kwargs) Smooth provided data with a Gaussian kernel
opexebo.general.upsample(array, upscale, …) A MaskedArray aware upsampling routine.
opexebo.general.peak_search(image, **kwargs) Given a 1D or 2D array, return a list of co-ordinates of the local maxima or minima
opexebo.general.circular_mask(axes, …) Given a set of axes, construct a circular mask defining pixels as within or without a circular arena.
opexebo.general.power_spectrum(values, …) Calculate the power spectrum of a time-series of data
opexebo.general.spatial_cross_correlation(…) Calculate the Pearson cross correlation between two arrays.
opexebo.general.shuffle(times, offset_lim, …) Duplicate the provided time series iterations number of times.
opexebo.general.walk_filter(speed, …[, fmt]) It is common practice when studying a freely moving subject to exclude data from periods when the subject was stationary, or nearly stationary.

General Helper functions

opexebo.general.accumulate_spatial(pos, …) Given a list of positions, create a histogram of those positions.
opexebo.general.normxcorr2_general(array) Calculate spatial autocorrelation.
opexebo.general.fit_ellipse(X, Y) Fit an ellipse to the provided set of X, Y co-ordinates
opexebo.general.bin_width_to_bin_number(…) This conversion is done in several separate places, and must be done the same in every case.

Detailed Documentation

opexebo.general.normxcorr2_general(array)

Calculate spatial autocorrelation.

Python implementation of the Matlab generalized-normalized cross correlation function, adapted by Vadim Frolov. Some generality was abandoned in the adaption as unnecessary for autocorrelogram calculation.

For the original function, see https://se.mathworks.com/matlabcentral/fileexchange/29005-generalized-normalized-cross-correlation

Parameters:
array: NxM matrix

firing array. array is not necessary a numpy array. Must not contain NaNs!

Returns:
np.ndarray

Resulting correlation matrix

opexebo.general.smooth(data, sigma, **kwargs)

Smooth provided data with a Gaussian kernel

The smoothing is done with a routine from the astronomical package astropy Like scipy.ndimage.gaussian_filter, this does not handle MaskedArrays - but it handles NaNs much better. Specifically, astropy.convolution.convolve replaces NaN values with an interpolation across the void region.

Therefore, to handle masked arrays, the data at masked positions is replaced by np.nan prior to smoothing, and thus avoids influencing nearby, unmasked cells. The masked cells are then returned to their original values prior to return.

The package is discussed at http://docs.astropy.org/en/stable/convolution/index.html

Parameters:
data: np.ndarray or np.ma.MaskedArray

Data that will be smoothed

sigma: float

Standard deviations for Gaussian kernel in units of pixels/bins

mask_fill: float, optional

The value that masked locations should be treated as This can either be provided as an absolute number (e.g. 0), or nan If nan, then each masked location will get a value by interpolating from nearby cells. This will only apply if the input is a MaskedArray to start with. If the input is a standard np.ndarray, then no values will be substituted, even if there are nans present.

circular: bool, optional

If True, then smoothing at the edge of the array will be handled in a circular manner, i.e. the value to the left of data[0] will be data[-1]. If False, the edge will be handled by padding with values equal to the boundary value. Default False

Returns:
smoothed_data: np.ndarray or np.ma.MaskedArray

Smoothed data

Notes

BNT.+general.smooth http://docs.astropy.org/en/stable/convolution/index.html https://github.com/astropy/astropy/issues/6511

Copyright (C) 2019 by Simon Ball

opexebo.general.accumulate_spatial(pos, arena_size, **kwargs)

Given a list of positions, create a histogram of those positions. The resulting histogram is typically referred to as a map.

The complexity in this function comes down to selecting where the edges of the arena are, and generating the bins within those limits.

The histogram bin edges must be defined in one of 3 different ways

  • bin_width: based on the keyword arena_size, the number of bins will be calculated as

    opexebo.general.bin_width_to_bin_number

    The histogram will use num_bins between the minimum and maximum of the positions (or limit if provided)

  • bin_number: the histogram will use bin_number of bins between the minimum and maximum of the positions (or limit if provided)

  • bin_edges: the histogram will use the provided `bin_edg`e arrays

Either zero or one of the three bin_* keyword arguments must be defined. If none are defined, then a default bin_width is used. If more than 1 is defined, an error is raised

Parameters:
pos: np.ndarray

1D or 2D array of positions in row-major format, i.e. x = pos[0], y = pos[1]. This matches the simplest input creation pos = np.array( [x, y] )

arena_size: float or tuple of floats
Dimensions of arena (in cm)
  • For a linear track, length
  • For a circular arena, diameter
  • For a rectangular arena, length or (length, length)
bin_width: float

Bin size in cm. Default 2.5cm. If bin_width is supplied, limit must also be supplied. One of bin_width, bin_number, bin_edges must be provided

bin_number: int or tuple of int

Number of bins. If provided as a tuple, then (x_bins, y_bins). One of bin_width, bin_number, bin_edges must be provided

bin_edges: array-like

Edges of the bins. Provided either as edges or (x_edges, y_edges). One of bin_width, bin_number, bin_edges must be provided

limits: tuple or np.ndarray

(x_min, x_max) or (x_min, x_max, y_min, y_max) Provide concrete limits to the range over which the histogram searches Any observations outside these limits are discarded If no limits are provided, then use np.nanmin(data), np.nanmax(data) to generate default limits. As is standard in python, acceptable values include the lower bound and exclude the upper bound

Returns:
hist: np.ndarray

1D or 2D histogram of the occurrences of the input observations Dimensions given by arena_size/bin_width Not normalised - each cell gives the integer number of occurrences of the observation in that cell

edges: list-like

x, or (x, y), where x, y are 1d np.ndarrays Here x, y correspond to the output histogram

opexebo.general.shuffle(times: numpy.ndarray, offset_lim: float, iterations: int, t_start: float = None, t_stop: float = None)

Duplicate the provided time series iterations number of times. Each duplicate will be incremented circularly by a random value not smaller than offset_lim.

Circular incrementation results in (the majority) of time _differences_ remaining preserved

  • Initially, we have a time series, times, both with values in the range [min(times), max(times)]. t_start may be smaller than min(times), and t_stop may be larger than max(times)
  • iterations number of duplicates of times are created.
  • In each iteraction, a random increment T is generated, and added to each value in that iteration, such that values now fall into the range [min(times)+T, max(times)+T]. max(times)+T may exceed t_stop.
  • All timestamps matching t_n > t_stop are mapped back into the range

[t_start, t_stop] by subtracting (t_stop-t_start) * The iteration is re-ordered by value (moving those beyond the far edge

back to the beginning)
Parameters:
times : np.ndarray

1D array of floats. Time series data to be shuffled

offset_lim : float

Minimum offset from the original time values. Each iteration is incremented by a random value evenly distributed in the range [offset_lim, t_stop-offset_lim]

iterations : int

Number of repeats of times to be returned

t_start : float, optional

Lower bound of time domain. Must meet the criteria t_start <= min(times) Defaults to min(times)

t_stop : float, optional

Upper bound of time domain. Must meet the criteria t_stop >= max(times) Defaults to max(times)

Returns:
output : np.ndarray

iterations x N array of times. A single iteration is accessed as output[i]

increments : np.ndarray

1D array of offset values that were used

opexebo.general.validatekeyword__arena_size(kwv, provided_dimensions)

Decipher the possible meanings of the keyword “arena_size”.

“arena_size” is given to describe the arena in which the animal is moving It should be either a float, or an array-like of 2 floats (x, y)

Parameters:
kw: float or array-like of floats

The value given for the keyword arena_size

provided_dimensions : int

the number of spatial dimensions provided to the original function. Acceptable values are 1 or 2 E.g. if the original function was provided with positions = [t, x, y], then provided_dimensions=2 (x and y)

Returns:
arena_size : float or np.ndarray of floats
Raises:
ValueError
IndexError
opexebo.general.validate_keyword_arena_shape(arena_shape)

Ensure that the arena_shape is a meaningful value

Parameters:
arena_shape : str

the value given for the keyword arena_shape

Returns:
arena_shape : str

A value that is guaranteed to be an acceptable member of one of the recognised groups of arena_shapes

opexebo.general.fit_ellipse(X, Y)

Fit an ellipse to the provided set of X, Y co-ordinates

Based on the approach taken in Authors: Andrew Fitzgibbon, Maurizio Pilu, Bob Fisher Reference: “Direct Least Squares Fitting of Ellipses”, IEEE T-PAMI, 1999

@Article{Fitzgibbon99, author = “Fitzgibbon, A.~W.and Pilu, M. and Fisher, R.~B.”, title = “Direct least-squares fitting of ellipses”, journal = pami, year = 1999, volume = 21, number = 5, month = may, pages = “476–480” } and implemented in MatLab by Vadim Frolov

Parameters:
X - np.ndarray

x co-ordinates of points

Y - np.ndarray

y co-ordinates of points

Returns:
x_centre : float

Centre of ellipse

y_centre : float

Centre of ellipse

Ru : float

Major radius

Rv : float

Minor radius

theta_rad : float

Ellipse orientation (in radians)

Notes

BNT.+general.fitEllipse opexebo.analysis.gridscore

Copyright (C) 2019 by Simon Ball

This program is free software; you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation; either version 3 of the License, or (at your option) any later version.

Given a 1D or 2D array, return a list of co-ordinates of the local maxima or minima

Multiple searching techniques are provided:

  • default: uses skimage.morphology.get_maxima
  • sep: uses the Python wrapper to the Source Extractor astronomy tool to identify peaks
Parameters:
image: np.ndarray

1D or 2D array of data

search_method : str, optional, {“default”, “sep”}
mask : np.ndarray, optional

Array of masked locations in the image with the same dimensions. Locations where the mask value is True are ignored for the purpose of searching.

maxima: bool, optional

[default search method only] Define whether to search for maxima or minima in the provided array

null_background: bool

[sep search method only] Set the image background to zero for calculation purposes rather than attempt to calculate a background gradient. This should generally be True, as our images are not directly comparable to standard telescope output

threshold : float, optional

[sep search method only] Threshold for identifiying maxima area

Returns:
peak_coords: tuple

Co-ordinates of peaks, in the form ((x0, x1, x2…), (y0, y1, y2…))

Notes

Copyright (C) 2019 by Simon Ball

opexebo.general.power_spectrum(values, time_stamps, sampling_frequency, **kwargs)

Calculate the power spectrum of a time-series of data

Parameters:
values: np.ndarray

Amplitude of time series at times time_stamps. Must be sampled at a constant frequency

time_stamps: np.ndarray

Time stamps of time-series. Must be the same length as values. Assumed to be in [seconds]

sampling_frequency: float

Sampling rate. Assumed to be in [Hz]

method: str, optional, {“welch”, “fft”}

Method for calculating a power spectrum. Accepted values are: [“welch”, “fft”]. Defaults to “welch”

scale: str, optiona, {“linear”, “log”}

Should the power spectrum be returned in a linear form propto v**2/sqrt(Hz)) (“linear”); or decibel scale (dB/sqrt(Hz)) (“log”) Defaults to “linear”

resolution: float

Desired output frequency resolution. Applicable only to Welch’s method. Due to the averaging effect of Welch’s method, this sets the “effective” resolution, which is NOT the same as the minimum difference in frequencies Default 1 [Hz]. Values lower than 1/(time_series_duration) are meaningless

Returns:
frequencies: np.ndarray

Discrete Fourier Transform sample frequencies

power_spectrum: np.ndarray

Power spectral density at that sample frequency

opexebo.general.spatial_cross_correlation(arr_0, arr_1, **kwargs)

Calculate the Pearson cross correlation between two arrays.

Arrays may be either 1d or 2d. If MaskedArrays are supplied, all locations that are masked are excluded from the correlation. Invalid numbers (NaN) are also excluded from the correlation.

If 2d arrays are provided, then calculates a 1d array of row(or column)-wise correlations, controlled by the row_major keyword.

Parameters:
arr_0: np.ndarray

1d or 2d array. Must either contain no NaNs, or be an np.ma.MaskedArray with NaN values masked.

arr_1: np.ndarray

2nd array to correlate with. Must have the same dimensions as arr_0

Returns:
output_single: float

Pearson correlation coefficient between entire arrays. 2d input arrays are flattened to 1d to calculate

output_array: np.ndarray

1D array of pearson correlation coefficients, where the i’th value is the coefficient between the i’th rows (if row_major), or the i’th columns of the two arrays Return value is np.nan if 1d arrays are supplied

Other Parameters:
 
row_major: bool, optional

If the arrays are 2D, process first by row, or first by column. Default True

Notes

BNT.+analyses.spatialCrossCorrelation()

Copyright (C) 2019 by Simon Ball

opexebo.general.bin_width_to_bin_number(length, bin_width)

This conversion is done in several separate places, and must be done the same in every case. Therefore, refactor into a quick helper function.

In cases where the bin_width is not a perfect divisor of length, the actual bins will be slightly smaller

Parameters:
length: float or np.ndarray

Length of a side to be divided into equally spaced bins

bin_width: float

Dimension of a square bin or pixel

Returns:
num_bins: int or np.ndarray

Same type as length. Integer number of bins.

opexebo.general.circular_mask(axes, diameter, **kwargs)

Given a set of axes, construct a circular mask defining pixels as within or without a circular arena.

A pixel is assumed to be inside the circle iff the centre-point is within the radius of the circle. For small radii, this will result in including pixels with <50% of their area inside.

Parameters:
axes: list of np.ndarray

[x, y] - pair of nd-arrays defining the edges of the pixels. This is the return value from np.histogram2d, for instance. Each one is size+1 relative to the number of pixels This function assumes that each dimension has a CONSTANT bin width, although each dimension may have its own bin width.

diameter: float

diameter of the circle

origin: list of floats

[x, y] co-ordinates of the centre of the arena. Optional, defaults to (0,0)

Returns:
in_field: np.ndarray

2D boolean array of dimensions (axes[0].size-1, axs[1].size-1) Values are true if INSIDE circle and false if OUTSIDE

distance_map: np.ndarray

2D float array of same size. Values are the distance of the centre of the pixel from the origin

angular_map: np.ndarray

2D float array of the same size. Values are the angle from the origin to the pixel, in degrees. Angles are given with 0° pointing to increasing y, clockwise.

opexebo.general.upsample(array, upscale, **kwargs)

A MaskedArray aware upsampling routine.

skimage provides more reliable methods for interpolated upsampling of simple ndarrays. However, this is not appropriate to MaskedArrays, i.e. where certain locations have invalid values that should not expand.

For non-masked arrays, this function can optionally rely on the skimage routine allowing fractional upscaling. For MaskedArrays, integer upscaling is enforced, with no interpolation

Parameters:
array: array-like

Supports list, tuple, np.ndarray or np.ma.MaskedArray. Supports 1d and 2d (2d for true arrays only). If the array contains any non-finite values, or is a MaskedArray, then integer upscaling is enforced

upscale: int or float

Upscaling factor. The output array shape will be this factor larger. Method of upscaling depends on type(upscale).

  • int -> integer upscaling with no interpolation
  • float -> upscaling with interpolation

Note the type distinction: type(1) == int; type(1.) == float

debug: bool, optional

Print out debugging information while running. Default False

Returns:
new_array : np.ndarray or np.ma.MaskedArray

Upsampled array

opexebo.general.walk_filter(speed: numpy.ndarray, speed_cutoff: float, *args, fmt='remove')

It is common practice when studying a freely moving subject to exclude data from periods when the subject was stationary, or nearly stationary. This method is described as a “walk-filter” - a high-pass filter on subject speed.

This function allows an arbitrary number of arrays to be filtered in parallel to the speed (or whatever other filtering criteria are used). Filters can be performed either by removing the unwanted elements, or by masking them and retunring a MaskedArray.

Parameters:
speed : np.ndarray

Array of speeds for other data points

speed_cutoff : float

The cutoff, below which values in speed will be excluded.

*args : np.ndarray, optional

Any other arrays that should be filtered in parallel with speed Optional arguments here _must_ be np.ndarrays with size equal to that of speed

fmt : str, optional

Either “remove” or “mask”. Determines how the values are returned “remove” (default) - the invalid valaues are removed from the array “mask” - the original array is returned as a MaskedArray, with the invalid values masked out.

Returns:
np.ndarray

Filtered copy of speed

[np.ndarray]

Arbitrary other filtered arrays, if any other arrays were provided as *args