simcats_datasets.support_functions.data_preprocessing
Data preprocessors to be used with the PyTorch Dataset class.
Every preprocessor must accept either a single array or a list of arrays as input, and its output type should always match the input type. Prefer the in-place operators -=, +=, *=, and /=, which are considerably faster than reassignments such as data = data + … because they avoid allocating a new array. Avoid using map(function, data), as this will return a copy and copying will slow down your code. Please look at example_preprocessor for a reference.
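The conventions above can be illustrated with a minimal sketch. The function name `subtract_mean_in_place` is hypothetical and not part of this module; it only shows the array-or-list contract and the use of in-place operators, assuming floating-point input arrays.

```python
from typing import List, Union

import numpy as np


def subtract_mean_in_place(
    data: Union[np.ndarray, List[np.ndarray]]
) -> Union[np.ndarray, List[np.ndarray]]:
    """Hypothetical preprocessor: subtract the mean of each array in place."""
    if isinstance(data, list):
        # A plain loop modifies every array in place, without creating copies.
        for arr in data:
            arr -= arr.mean()
    else:
        data -= data.mean()
    # The returned object has the same type as the input (array or list).
    return data


image = np.array([[1.0, 2.0], [3.0, 4.0]])
result = subtract_mean_in_place(image)
images = [np.array([1.0, 3.0]), np.array([10.0, 20.0])]
results = subtract_mean_in_place(images)
```

Because the arrays are modified in place, `result` is the same object as `image`; callers that still need the raw data should copy it beforehand.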
Module Contents
Functions
| example_preprocessor(data) | Example (reference) for preprocessor implementations. |
| cast_to_float32(data) | Cast the data to float32. Especially useful to reduce memory usage for preloaded datasets. |
| cast_to_float16(data) | Cast the data to float16. Especially useful to reduce memory usage for preloaded datasets. |
| standardization(data) | Standardization of the data (mean=0, std=1). |
| min_max_0_1(data) | Min max scaling of the data to [0, 1]. |
| min_max_minus_one_one(data) | Min max scaling of the data to [-1, 1]. |
| add_newaxis(data) | Adds a new axis to the data (basically the missing color channel). |
| only_two_classes(data) | Sets all mask labels that are greater than or equal to 1 to 1 and all other pixels to zero. |
| shrink_to_shape(data, shape) | Cut off required number of rows/columns of pixels at each edge of the image to get the desired shape. |
| shrink_to_shape_96x96(data) | Cut off required number of rows/columns of pixels at each edge of the image to get shape 96x96. |
| resample_image(data, target_size) | Resample an image to target size using scipy.signal.resample. |
| decimate_image(data, target_size) | Decimate an image to target size using scipy.signal.decimate. |
| standardize_to_dataset(data, mean, std) | Standardization of the data not per image but for a whole dataset. |
| bm3d_smoothing(data) | Smoothing of the data using the BM3D algorithm. |
| vertical_median_smoothing(data) | Median-smoothing of the data, for each vertical column independently. |
Module Implementation Details
- simcats_datasets.support_functions.data_preprocessing.example_preprocessor(data)
Example (reference) for preprocessor implementations.
- Parameters:
data (Union[numpy.ndarray, List[numpy.ndarray]]) – Numpy array to be preprocessed (or a list of such).
- Returns:
Preprocessed numpy array (or a list of such).
- Return type:
Union[numpy.ndarray, List[numpy.ndarray]]
- simcats_datasets.support_functions.data_preprocessing.cast_to_float32(data)
Cast the data to float32. Especially useful to reduce memory usage for preloaded datasets.
- Parameters:
data (Union[numpy.ndarray, List[numpy.ndarray]]) – Numpy array to be cast to float32 (or a list of such).
- Returns:
Float32 numpy array (or a list of such).
- Return type:
Union[numpy.ndarray, List[numpy.ndarray]]
- simcats_datasets.support_functions.data_preprocessing.cast_to_float16(data)
Cast the data to float16. Especially useful to reduce memory usage for preloaded datasets.
- Parameters:
data (Union[numpy.ndarray, List[numpy.ndarray]]) – Numpy array to be cast to float16 (or a list of such).
- Returns:
Float16 numpy array (or a list of such).
- Return type:
Union[numpy.ndarray, List[numpy.ndarray]]
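The memory saving mentioned for the two cast preprocessors can be checked directly with NumPy. This is only an illustration using `ndarray.astype`; how the module performs the cast internally is not stated here.

```python
import numpy as np

# A 100x100 image stored in NumPy's default float64 precision.
image64 = np.zeros((100, 100), dtype=np.float64)

# Casting halves the memory footprint with each step down in precision.
image32 = image64.astype(np.float32)
image16 = image64.astype(np.float16)

print(image64.nbytes, image32.nbytes, image16.nbytes)  # prints: 80000 40000 20000
```

Note that float16 keeps only about three significant decimal digits and cannot represent finite values above 65504, so it is best suited to data that has already been scaled to a small range.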
- simcats_datasets.support_functions.data_preprocessing.standardization(data)
Standardization of the data (mean=0, std=1).
If a list of arrays is passed, each array is standardized individually (no global standardization).
- Parameters:
data (Union[numpy.ndarray, List[numpy.ndarray]]) – Numpy array to be standardized (or a list of such).
- Returns:
Standardized numpy array (or a list of such).
- Return type:
Union[numpy.ndarray, List[numpy.ndarray]]
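A minimal sketch of per-array standardization follows. The helper `standardize_sketch` is hypothetical and assumes non-constant floating-point arrays (a constant array has a standard deviation of zero and would cause a division by zero).

```python
import numpy as np


def standardize_sketch(arr: np.ndarray) -> np.ndarray:
    """Hypothetical helper: shift to mean 0 and scale to std 1, in place."""
    arr -= arr.mean()
    arr /= arr.std()
    return arr


# Two images with very different value ranges.
dim = np.array([[0.0, 2.0], [4.0, 6.0]])
bright = np.array([[10.0, 30.0], [50.0, 70.0]])

# Each array is standardized on its own, not with shared statistics.
for img in (dim, bright):
    standardize_sketch(img)
```

Since every array uses its own mean and standard deviation, `dim` and `bright` end up with identical values here: differences in overall brightness and contrast between images are removed.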
- simcats_datasets.support_functions.data_preprocessing.min_max_0_1(data)
Min max scaling of the data to [0, 1].
If a list of arrays is passed, each array is scaled individually (no global scaling).
- Parameters:
data (Union[numpy.ndarray, List[numpy.ndarray]]) – Numpy array to be scaled (or a list of such).
- Returns:
Rescaled numpy array (or a list of such).
- Return type:
Union[numpy.ndarray, List[numpy.ndarray]]
- simcats_datasets.support_functions.data_preprocessing.min_max_minus_one_one(data)
Min max scaling of the data to [-1, 1].
If a list of arrays is passed, each array is scaled individually (no global scaling).
- Parameters:
data (Union[numpy.ndarray, List[numpy.ndarray]]) – Numpy array to be scaled (or a list of such).
- Returns:
Rescaled numpy array (or a list of such).
- Return type:
Union[numpy.ndarray, List[numpy.ndarray]]
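Both min-max variants can be sketched with in-place operators only. The helper names below are hypothetical, and the sketch assumes non-constant floating-point arrays; it is not necessarily how the module implements the scaling.

```python
import numpy as np


def min_max_0_1_sketch(arr: np.ndarray) -> np.ndarray:
    """Hypothetical helper: scale an array to [0, 1] in place."""
    arr -= arr.min()
    # After the shift the minimum is 0, so the maximum equals the value range.
    arr /= arr.max()
    return arr


def min_max_minus_one_one_sketch(arr: np.ndarray) -> np.ndarray:
    """Hypothetical helper: scale an array to [-1, 1] in place."""
    min_max_0_1_sketch(arr)
    # Stretch [0, 1] to [0, 2], then shift to [-1, 1].
    arr *= 2.0
    arr -= 1.0
    return arr


unit = min_max_0_1_sketch(np.array([2.0, 4.0, 6.0]))
symmetric = min_max_minus_one_one_sketch(np.array([2.0, 4.0, 6.0]))
```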
- simcats_datasets.support_functions.data_preprocessing.add_newaxis(data)
Adds a new axis to the data (basically the missing color channel).
- Parameters:
data (Union[numpy.ndarray, List[numpy.ndarray]]) – Numpy array to which the axis will be added (or a list of such).
- Returns:
Numpy array with additional axis (or a list of such).
- Return type:
Union[numpy.ndarray, List[numpy.ndarray]]
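The effect of adding an axis can be illustrated with `numpy.newaxis`. The position of the new axis is an assumption here: the sketch puts it first, matching the channel-first (C, H, W) layout commonly used with PyTorch, while the documentation above does not state where the module inserts it.

```python
import numpy as np

image = np.zeros((4, 5))  # a 2D image without a channel dimension

# Assumption: the channel axis goes first, giving shape (1, height, width).
with_channel = image[np.newaxis, ...]

# For a list of images, the same indexing is applied to every element.
images = [np.zeros((4, 5)), np.zeros((4, 5))]
with_channels = [img[np.newaxis, ...] for img in images]

print(image.shape, with_channel.shape)  # prints: (4, 5) (1, 4, 5)
```

Indexing with `numpy.newaxis` returns a view, so no pixel data is copied.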
- simcats_datasets.support_functions.data_preprocessing.only_two_classes(data)
Sets all mask labels that are greater than or equal to 1 to 1 and all other pixels to zero.
- Parameters:
data (Union[numpy.ndarray, List[numpy.ndarray]]) – Numpy array to be processed (or a list of such).
- Returns:
Numpy array with only two classes (or a list of such).
- Return type:
Union[numpy.ndarray, List[numpy.ndarray]]
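The binarization rule can be sketched with boolean indexing. The example mask and its label values are made up for illustration.

```python
import numpy as np

# Hypothetical multi-class mask: 0 is background, 1..3 are different labels.
mask = np.array([[0, 1, 2],
                 [3, 0, 1]])

# Compute the selection once, then overwrite both groups in place.
foreground = mask >= 1
mask[foreground] = 1
mask[~foreground] = 0

print(mask.tolist())  # prints: [[0, 1, 1], [1, 0, 1]]
```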
- simcats_datasets.support_functions.data_preprocessing.shrink_to_shape(data, shape)
Cut off required number of rows/columns of pixels at each edge of the image to get the desired shape.
Warning: This preprocessor cannot be selected by passing its name as a string to the SimcatsDataset class from the simcats_datasets.pytorch module, because preprocessors selected that way must take the data as their only parameter. If a list of arrays is passed, all images in the list are expected to have the same shape!
- Parameters:
data (Union[numpy.ndarray, List[numpy.ndarray]]) – Numpy array to be preprocessed (or a list of such).
shape (Tuple[int, int]) – The target shape to which the data will be cropped.
- Returns:
Shrunk numpy array (or a list of such).
- Return type:
Union[numpy.ndarray, List[numpy.ndarray]]
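Cutting rows and columns off every edge amounts to a centered crop, which can be sketched with slicing. The helper `center_crop_sketch` is hypothetical; in particular, how an odd surplus is split between opposite edges is an assumption (here the extra row or column is removed at the bottom or right).

```python
from typing import Tuple

import numpy as np


def center_crop_sketch(arr: np.ndarray, shape: Tuple[int, int]) -> np.ndarray:
    """Hypothetical helper: remove surplus rows/columns evenly from all edges."""
    # Number of rows/columns to drop at the top and at the left edge.
    top = (arr.shape[0] - shape[0]) // 2
    left = (arr.shape[1] - shape[1]) // 2
    return arr[top:top + shape[0], left:left + shape[1]]


image = np.arange(100 * 100).reshape(100, 100)
cropped = center_crop_sketch(image, (96, 96))
print(cropped.shape)  # prints: (96, 96)
```

For a 100x100 input and a 96x96 target, two rows and two columns are removed at each edge, so `cropped[0, 0]` is the original `image[2, 2]`.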
- simcats_datasets.support_functions.data_preprocessing.shrink_to_shape_96x96(data)
Cut off required number of rows/columns of pixels at each edge of the image to get shape 96x96.
Warning: If a list of arrays is passed, all images in the list are expected to have the same shape!
- Parameters:
data (Union[numpy.ndarray, List[numpy.ndarray]]) – Numpy array to be preprocessed (or a list of such).
- Returns:
Shrunk numpy array (or a list of such).
- Return type:
Union[numpy.ndarray, List[numpy.ndarray]]
- simcats_datasets.support_functions.data_preprocessing.resample_image(data, target_size)
Resample an image to target size using scipy.signal.resample.
Warning: This preprocessor cannot be selected by passing its name as a string to the SimcatsDataset class from the simcats_datasets.pytorch module, because preprocessors selected that way must take the data as their only parameter.
- Parameters:
data (Union[numpy.ndarray, List[numpy.ndarray]]) – The image to resample.
target_size (Tuple[int, int]) – The target size to resample to.
- Returns:
The resampled image or a list of such.
- Return type:
Union[numpy.ndarray, List[numpy.ndarray]]
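One way to resample a 2D image with `scipy.signal.resample` is to apply it along each axis in turn. The helper below is a hypothetical sketch under that assumption and may differ from the module's actual implementation.

```python
from typing import Tuple

import numpy as np
from scipy.signal import resample


def resample_image_sketch(image: np.ndarray, target_size: Tuple[int, int]) -> np.ndarray:
    """Hypothetical helper: Fourier-resample rows first, then columns."""
    out = resample(image, target_size[0], axis=0)
    out = resample(out, target_size[1], axis=1)
    return out


image = np.random.default_rng(0).normal(size=(100, 100))
resampled = resample_image_sketch(image, (64, 48))
print(resampled.shape)  # prints: (64, 48)
```

`scipy.signal.resample` uses a Fourier method that treats the signal as periodic, so images whose opposite edges differ strongly can show ringing near the borders.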
- simcats_datasets.support_functions.data_preprocessing.decimate_image(data, target_size)
Decimate an image to target size using scipy.signal.decimate.
Warning: This preprocessor cannot be selected by passing its name as a string to the SimcatsDataset class from the simcats_datasets.pytorch module, because preprocessors selected that way must take the data as their only parameter.
- Parameters:
data (Union[numpy.ndarray, List[numpy.ndarray]]) – The image to decimate.
target_size (Tuple[int, int]) – The target size to decimate to.
- Returns:
The decimated image or a list of such.
- Return type:
Union[numpy.ndarray, List[numpy.ndarray]]
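`scipy.signal.decimate` downsamples by an integer factor after applying an anti-aliasing filter. The hypothetical sketch below derives that factor per axis from the target size and assumes that each image dimension is an integer multiple of the corresponding target dimension; how the module handles other cases is not stated here.

```python
from typing import Tuple

import numpy as np
from scipy.signal import decimate


def decimate_image_sketch(image: np.ndarray, target_size: Tuple[int, int]) -> np.ndarray:
    """Hypothetical helper: decimate along rows, then along columns."""
    # Integer downsampling factors (assumes exact divisibility).
    q_rows = int(image.shape[0] // target_size[0])
    q_cols = int(image.shape[1] // target_size[1])
    out = decimate(image, q_rows, axis=0)
    out = decimate(out, q_cols, axis=1)
    return out


image = np.random.default_rng(0).normal(size=(128, 128))
decimated = decimate_image_sketch(image, (64, 32))
print(decimated.shape)  # prints: (64, 32)
```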
- simcats_datasets.support_functions.data_preprocessing.standardize_to_dataset(data, mean, std)
Standardization of the data not per image but for a whole dataset.
Warning: This preprocessor cannot be selected by passing its name as a string to the SimcatsDataset class from the simcats_datasets.pytorch module, because preprocessors selected that way must take the data as their only parameter.
- Parameters:
data (Union[numpy.ndarray, List[numpy.ndarray]]) – Numpy array to be standardized (or a list of such).
mean (float) – The mean to subtract.
std (float) – The standard deviation to divide by.
- Returns:
Standardized numpy array (or a list of such).
- Return type:
Union[numpy.ndarray, List[numpy.ndarray]]
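A preprocessor with extra parameters can be turned into a data-only callable with `functools.partial`. The sketch below uses a hypothetical stand-in, `standardize_to_dataset_sketch`, rather than the module's function, and computes the statistics from a made-up two-image dataset. Whether SimcatsDataset accepts such a callable is not stated here.

```python
from functools import partial

import numpy as np


def standardize_to_dataset_sketch(data, mean, std):
    """Hypothetical stand-in: standardize with fixed dataset-wide statistics."""
    arrays = data if isinstance(data, list) else [data]
    for arr in arrays:
        arr -= mean
        arr /= std
    return data


images = [np.array([[1.0, 2.0]]), np.array([[3.0, 6.0]])]

# Statistics over all pixels of the whole dataset, not per image.
all_pixels = np.concatenate([img.ravel() for img in images])
dataset_mean = float(all_pixels.mean())
dataset_std = float(all_pixels.std())

# Bind mean and std so that the resulting callable only needs the data.
preprocessor = partial(standardize_to_dataset_sketch, mean=dataset_mean, std=dataset_std)
result = preprocessor(images)
```

Unlike per-image standardization, the individual images keep their relative brightness: only the dataset as a whole has mean 0 and standard deviation 1.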
- simcats_datasets.support_functions.data_preprocessing.bm3d_smoothing(data)
Smoothing of the data using the BM3D algorithm.
- Parameters:
data (Union[numpy.ndarray, List[numpy.ndarray]]) – Numpy array to be smoothed (or a list of such).
- Returns:
BM3D-smoothed numpy array (or a list of such).
- Return type:
Union[numpy.ndarray, List[numpy.ndarray]]
- simcats_datasets.support_functions.data_preprocessing.vertical_median_smoothing(data)
Median-smoothing of the data, for each vertical column independently.
- Parameters:
data (Union[numpy.ndarray, List[numpy.ndarray]]) – Numpy array to be smoothed (or a list of such).
- Returns:
Smoothed numpy array (or a list of such).
- Return type:
Union[numpy.ndarray, List[numpy.ndarray]]
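Column-wise median smoothing can be sketched with `scipy.ndimage.median_filter` and a window that spans several rows but only one column. Both the choice of this function and the window height of 5 are assumptions for illustration; the documentation above does not state which filter or kernel size the module uses.

```python
import numpy as np
from scipy.ndimage import median_filter

image = np.zeros((7, 3))
image[3, 1] = 100.0  # a single-pixel outlier in the middle column

# A window of 5 rows by 1 column filters every column independently.
smoothed = median_filter(image, size=(5, 1))

print(smoothed.max())  # prints: 0.0
```

The outlier is removed because each window of five vertically adjacent pixels contains at most one large value, so the median stays at zero, and neighbouring columns never influence each other.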