simcats_datasets.support_functions.data_preprocessing

Data preprocessors to be used with the PyTorch Dataset class.

Every preprocessor must accept either a single array or a list of arrays as input, and the output type should always match the input type. Prefer the in-place operators -=, +=, *=, and /=, which are considerably faster than reassignments such as data = data + …. Avoid map(function, data), as this returns a copy, and copying will slow down your code. See example_preprocessor for a reference implementation.
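The conventions above can be illustrated with a minimal sketch. The function scale_by_two is hypothetical and not part of this module; it only shows the "single array or list of arrays" contract combined with in-place operators, assuming floating-point NumPy arrays.

```python
from typing import List, Union

import numpy as np


def scale_by_two(
    data: Union[np.ndarray, List[np.ndarray]]
) -> Union[np.ndarray, List[np.ndarray]]:
    """Hypothetical preprocessor: doubles every value in place."""
    if isinstance(data, list):
        # Modify each array in place; the list object itself is reused.
        for arr in data:
            arr *= 2
    else:
        data *= 2
    # The return type matches the input type (array in, array out; list in, list out).
    return data


single = np.ones((2, 2), dtype=np.float32)
batch = [np.ones((2, 2), dtype=np.float32), np.full((3, 3), 2.0, dtype=np.float32)]

out_single = scale_by_two(single)
out_batch = scale_by_two(batch)
```

Because the operators work in place, out_single is the same object as single, and no additional memory is allocated for the result.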

Module Contents

Functions

example_preprocessor

Example (reference) for preprocessor implementations.

cast_to_float32

Cast the data to float32. Especially useful to reduce memory usage for preloaded datasets.

cast_to_float16

Cast the data to float16. Especially useful to reduce memory usage for preloaded datasets.

standardization

Standardization of the data (mean=0, std=1).

min_max_0_1

Min max scaling of the data to [0, 1].

min_max_minus_one_one

Min max scaling of the data to [-1, 1].

add_newaxis

Adds a new axis to the data (basically the missing color channel).

only_two_classes

Sets all mask labels that are greater than or equal to 1 to 1 and all other pixels to zero.

shrink_to_shape

Cut off required number of rows/columns of pixels at each edge of the image to get the desired shape.

shrink_to_shape_96x96

Cut off required number of rows/columns of pixels at each edge of the image to get shape 96x96.

resample_image

Resample an image to target size using scipy.signal.resample.

decimate_image

Decimate an image to target size using scipy.signal.decimate.

standardize_to_dataset

Standardization of the data not per image but for a whole dataset.

bm3d_smoothing

Smoothing of the data using the BM3D algorithm.

vertical_median_smoothing

Median-smoothing of the data, for each vertical column independently.

Module Implementation Details

simcats_datasets.support_functions.data_preprocessing.example_preprocessor(data)

Example (reference) for preprocessor implementations.

Parameters:

data (Union[numpy.ndarray, List[numpy.ndarray]]) – Numpy array to be preprocessed (or a list of such).

Returns:

Preprocessed numpy array (or a list of such).

Return type:

Union[numpy.ndarray, List[numpy.ndarray]]

simcats_datasets.support_functions.data_preprocessing.cast_to_float32(data)

Cast the data to float32. Especially useful to reduce memory usage for preloaded datasets.

Parameters:

data (Union[numpy.ndarray, List[numpy.ndarray]]) – Numpy array to be cast to float32 (or a list of such).

Returns:

Float32 numpy array (or a list of such).

Return type:

Union[numpy.ndarray, List[numpy.ndarray]]

simcats_datasets.support_functions.data_preprocessing.cast_to_float16(data)

Cast the data to float16. Especially useful to reduce memory usage for preloaded datasets.

Parameters:

data (Union[numpy.ndarray, List[numpy.ndarray]]) – Numpy array to be cast to float16 (or a list of such).

Returns:

Float16 numpy array (or a list of such).

Return type:

Union[numpy.ndarray, List[numpy.ndarray]]
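As a rough illustration of the memory savings mentioned for the two cast preprocessors, the following sketch uses plain NumPy calls rather than the functions of this module. The array shape is an arbitrary assumption.

```python
import numpy as np

# A 100x100 image stored with NumPy's default float64 precision.
image = np.zeros((100, 100), dtype=np.float64)

# astype returns a new array with the requested dtype.
as_float32 = image.astype(np.float32)
as_float16 = image.astype(np.float16)

# Bytes needed per array: 8, 4, and 2 bytes per element respectively.
sizes = (image.nbytes, as_float32.nbytes, as_float16.nbytes)
```

Each step halves the memory footprint. Note that float16 has a much smaller value range and precision than float32, so it is worth checking that the data still fits.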

simcats_datasets.support_functions.data_preprocessing.standardization(data)

Standardization of the data (mean=0, std=1).

If a list of arrays is passed, each array is standardized individually (no global standardization).

Parameters:

data (Union[numpy.ndarray, List[numpy.ndarray]]) – Numpy array to be standardized (or a list of such).

Returns:

Standardized numpy array (or a list of such).

Return type:

Union[numpy.ndarray, List[numpy.ndarray]]
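The per-array behavior can be sketched as follows. The helper standardize_each is hypothetical and only approximates what the description states; it assumes floating-point arrays with non-zero standard deviation.

```python
import numpy as np


def standardize_each(data):
    """Hypothetical helper: standardize every array on its own, in place."""
    arrays = data if isinstance(data, list) else [data]
    for arr in arrays:
        arr -= arr.mean()  # shift to mean 0
        arr /= arr.std()   # scale to standard deviation 1
    return data


images = [
    np.array([[1.0, 2.0], [3.0, 4.0]]),
    np.array([[10.0, 20.0], [30.0, 50.0]]),
]
standardized = standardize_each(images)
```

Every array in the list ends up with mean 0 and standard deviation 1, regardless of the value range of the other arrays.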

simcats_datasets.support_functions.data_preprocessing.min_max_0_1(data)

Min max scaling of the data to [0, 1].

If a list of arrays is passed, each array is scaled individually (no global scaling).

Parameters:

data (Union[numpy.ndarray, List[numpy.ndarray]]) – Numpy array to be scaled (or a list of such).

Returns:

Rescaled numpy array (or a list of such).

Return type:

Union[numpy.ndarray, List[numpy.ndarray]]

simcats_datasets.support_functions.data_preprocessing.min_max_minus_one_one(data)

Min max scaling of the data to [-1, 1].

If a list of arrays is passed, each array is scaled individually (no global scaling).

Parameters:

data (Union[numpy.ndarray, List[numpy.ndarray]]) – Numpy array to be scaled (or a list of such).

Returns:

Rescaled numpy array (or a list of such).

Return type:

Union[numpy.ndarray, List[numpy.ndarray]]
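The two min-max preprocessors differ only in their target interval. A minimal sketch of both scalings is given below; the helper names are hypothetical, and the code assumes floating-point arrays that are not constant (a constant array would cause a division by zero).

```python
import numpy as np


def scale_to_0_1(arr: np.ndarray) -> np.ndarray:
    """Hypothetical helper: map the values of a float array to [0, 1] in place."""
    arr -= arr.min()  # smallest value becomes 0
    arr /= arr.max()  # largest value becomes 1
    return arr


def scale_to_minus_one_one(arr: np.ndarray) -> np.ndarray:
    """Hypothetical helper: map the values of a float array to [-1, 1] in place."""
    scale_to_0_1(arr)
    arr *= 2  # stretch [0, 1] to [0, 2]
    arr -= 1  # shift [0, 2] to [-1, 1]
    return arr


unit = scale_to_0_1(np.array([[2.0, 4.0], [6.0, 10.0]]))
symmetric = scale_to_minus_one_one(np.array([[2.0, 4.0], [6.0, 10.0]]))
```

For the input values 2, 4, 6, and 10, unit contains 0, 0.25, 0.5, and 1, while symmetric contains -1, -0.5, 0, and 1.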

simcats_datasets.support_functions.data_preprocessing.add_newaxis(data)

Adds a new axis to the data (basically the missing color channel).

Parameters:

data (Union[numpy.ndarray, List[numpy.ndarray]]) – Numpy array to which the axis will be added (or a list of such).

Returns:

Numpy array with additional axis (or a list of such).

Return type:

Union[numpy.ndarray, List[numpy.ndarray]]
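The effect of adding an axis can be shown with plain NumPy indexing. The position of the new axis is an assumption here: the sketch prepends it, following the channel-first layout commonly used with PyTorch, which the description above does not confirm.

```python
import numpy as np

# A single-channel image without an explicit channel axis.
image = np.zeros((96, 96), dtype=np.float32)

# Prepend a channel axis: (height, width) -> (1, height, width).
with_channel = image[np.newaxis, ...]

# Indexing with np.newaxis creates a view, so no pixel data is copied.
is_view = np.shares_memory(image, with_channel)
```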

simcats_datasets.support_functions.data_preprocessing.only_two_classes(data)

Sets all mask labels that are greater than or equal to 1 to 1 and all other pixels to zero.

Parameters:

data (Union[numpy.ndarray, List[numpy.ndarray]]) – Numpy array to be processed (or a list of such).

Returns:

Numpy array with only two classes (or a list of such).

Return type:

Union[numpy.ndarray, List[numpy.ndarray]]
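The reduction to two classes can be sketched with boolean indexing. The helper binarize is hypothetical and only mirrors the rule stated above.

```python
import numpy as np


def binarize(mask: np.ndarray) -> np.ndarray:
    """Hypothetical helper: reduce a label mask to the classes 0 and 1 in place."""
    foreground = mask >= 1
    mask[foreground] = 1   # every label >= 1 becomes class 1
    mask[~foreground] = 0  # everything else becomes class 0
    return mask


labels = np.array([[0, 1, 2], [3, 0, 5]])
binary = binarize(labels)
```

After the call, binary contains the rows [0, 1, 1] and [1, 0, 1].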

simcats_datasets.support_functions.data_preprocessing.shrink_to_shape(data, shape)

Cut off required number of rows/columns of pixels at each edge of the image to get the desired shape.

Warning: This preprocessor cannot be selected by passing its name as a string to the SimcatsDataset class from the simcats_datasets.pytorch module, because name-based selection requires preprocessors that take only the data and no additional parameters. If a list of data is passed, all images in the list are expected to have the same shape!

Parameters:
  • data (Union[numpy.ndarray, List[numpy.ndarray]]) – Numpy array to be preprocessed (or a list of such).

  • shape (Tuple[int, int]) – The target shape to which the data will be shrunk.

Returns:

Shrunk numpy array (or a list of such).

Return type:

Union[numpy.ndarray, List[numpy.ndarray]]
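A centered crop of this kind, together with one way to bind the extra shape parameter, might look like the sketch below. The helper center_crop is hypothetical, and how the real function distributes an odd number of surplus rows or columns between the two edges is an assumption.

```python
from functools import partial

import numpy as np


def center_crop(image: np.ndarray, shape) -> np.ndarray:
    """Hypothetical helper: cut rows/columns at the edges to reach `shape`."""
    top = (image.shape[0] - shape[0]) // 2
    left = (image.shape[1] - shape[1]) // 2
    # Assumption: with an odd surplus, the extra row/column is cut at the end.
    return image[top:top + shape[0], left:left + shape[1]]


# functools.partial fixes the shape, leaving a callable that takes only the data.
crop_96 = partial(center_crop, shape=(96, 96))
cropped = crop_96(np.zeros((100, 100)))

small = center_crop(np.arange(16).reshape(4, 4), (2, 2))
```

Here cropped has shape (96, 96), and small contains the central 2x2 block of the 4x4 input, that is, the rows [5, 6] and [9, 10].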

simcats_datasets.support_functions.data_preprocessing.shrink_to_shape_96x96(data)

Cut off required number of rows/columns of pixels at each edge of the image to get shape 96x96.

Warning: If a list of data is passed, all images in the list are expected to have the same shape!

Parameters:

data (Union[numpy.ndarray, List[numpy.ndarray]]) – Numpy array to be preprocessed (or a list of such).

Returns:

Shrunk numpy array (or a list of such).

Return type:

Union[numpy.ndarray, List[numpy.ndarray]]

simcats_datasets.support_functions.data_preprocessing.resample_image(data, target_size)

Resample an image to target size using scipy.signal.resample.

Warning: This preprocessor cannot be selected by passing its name as a string to the SimcatsDataset class from the simcats_datasets.pytorch module, because name-based selection requires preprocessors that take only the data and no additional parameters.

Parameters:
  • data (Union[numpy.ndarray, List[numpy.ndarray]]) – The image to resample.

  • target_size (Tuple[int, int]) – The target size to resample to.

Returns:

The resampled image or a list of such.

Return type:

Union[numpy.ndarray, List[numpy.ndarray]]
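Since scipy.signal.resample operates along a single axis, resampling a 2D image can be done one axis at a time. The sketch below is not the implementation of this module; in particular, the assumption that target_size is ordered as (rows, columns) is not confirmed by the description above.

```python
import numpy as np
from scipy.signal import resample

rng = np.random.default_rng(0)
image = rng.normal(size=(100, 100))

# Assumption: target_size is given as (rows, columns).
target_size = (64, 48)

# Fourier-based resampling, applied to the rows first and then to the columns.
resampled = resample(image, target_size[0], axis=0)
resampled = resample(resampled, target_size[1], axis=1)
```

The result has the shape given by target_size. Because the method is Fourier-based, it implicitly treats the signal as periodic, which can cause artifacts at the image edges.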

simcats_datasets.support_functions.data_preprocessing.decimate_image(data, target_size)

Decimate an image to target size using scipy.signal.decimate.

Warning: This preprocessor cannot be selected by passing its name as a string to the SimcatsDataset class from the simcats_datasets.pytorch module, because name-based selection requires preprocessors that take only the data and no additional parameters.

Parameters:
  • data (Union[numpy.ndarray, List[numpy.ndarray]]) – The image to decimate.

  • target_size (Tuple[int, int]) – The target size to decimate to.

Returns:

The decimated image or a list of such.

Return type:

Union[numpy.ndarray, List[numpy.ndarray]]
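scipy.signal.decimate takes an integer downsampling factor rather than a target length, so a target size has to be translated into one factor per axis. The sketch below assumes that the original size is an integer multiple of the target size; how this module handles other ratios is not stated above.

```python
import numpy as np
from scipy.signal import decimate

rng = np.random.default_rng(0)
image = rng.normal(size=(100, 100))
target_size = (50, 50)

# Assumption: each image dimension is an integer multiple of the target dimension.
row_factor = image.shape[0] // target_size[0]
col_factor = image.shape[1] // target_size[1]

# decimate applies an anti-aliasing filter and then keeps every q-th sample.
decimated = decimate(image, row_factor, axis=0)
decimated = decimate(decimated, col_factor, axis=1)
```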

simcats_datasets.support_functions.data_preprocessing.standardize_to_dataset(data, mean, std)

Standardization of the data not per image but for a whole dataset.

Warning: This preprocessor cannot be selected by passing its name as a string to the SimcatsDataset class from the simcats_datasets.pytorch module, because name-based selection requires preprocessors that take only the data and no additional parameters.

Parameters:
  • data (Union[numpy.ndarray, List[numpy.ndarray]]) – Numpy array to be standardized (or a list of such).

  • mean (float) – The mean to subtract.

  • std (float) – The standard deviation to divide by.

Returns:

Standardized numpy array (or a list of such).

Return type:

Union[numpy.ndarray, List[numpy.ndarray]]
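The difference to per-image standardization is that mean and std are computed once over the whole dataset and then applied to every image. A minimal sketch follows, with a hypothetical helper standardize_with_stats and a toy dataset of two images.

```python
from functools import partial

import numpy as np


def standardize_with_stats(image: np.ndarray, mean: float, std: float) -> np.ndarray:
    """Hypothetical helper: standardize in place with precomputed statistics."""
    image -= mean
    image /= std
    return image


dataset = [
    np.array([[0.0, 2.0], [4.0, 6.0]]),
    np.array([[8.0, 10.0], [12.0, 14.0]]),
]

# Statistics over all pixels of all images, computed before any image is modified.
stacked = np.stack(dataset)
dataset_mean = float(stacked.mean())
dataset_std = float(stacked.std())

# Bind the statistics so that the resulting callable takes only the data.
preprocess = partial(standardize_with_stats, mean=dataset_mean, std=dataset_std)
standardized = [preprocess(img) for img in dataset]
```

The dataset as a whole now has mean 0 and standard deviation 1, while the individual images keep their relative offsets: the first image has a negative mean and the second a positive one.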

simcats_datasets.support_functions.data_preprocessing.bm3d_smoothing(data)

Smoothing of the data using the BM3D algorithm.

Parameters:

data (Union[numpy.ndarray, List[numpy.ndarray]]) – Numpy array to be smoothed (or a list of such).

Returns:

BM3D-smoothed numpy array (or a list of such).

Return type:

Union[numpy.ndarray, List[numpy.ndarray]]

simcats_datasets.support_functions.data_preprocessing.vertical_median_smoothing(data)

Median-smoothing of the data, for each vertical column independently.

Parameters:

data (Union[numpy.ndarray, List[numpy.ndarray]]) – Numpy array to be smoothed (or a list of such).

Returns:

Smoothed numpy array (or a list of such).

Return type:

Union[numpy.ndarray, List[numpy.ndarray]]