:py:mod:`simcats_datasets.loading.pytorch`
==========================================

.. py:module:: simcats_datasets.loading.pytorch

.. autoapi-nested-parse::

   Implementation of a pytorch dataset class. Can be used to train machine learning approaches with CSD data.

   @author: f.hader


Module Contents
---------------

Classes
~~~~~~~

.. autoapisummary::

   simcats_datasets.loading.pytorch.SimcatsDataset
   simcats_datasets.loading.pytorch.SimcatsConcatDataset


Module Implementation Details
-----------------------------

.. py:class:: SimcatsDataset(h5_path, specific_ids = None, load_ground_truth = None, data_preprocessors = None, ground_truth_preprocessors = None, format_output = None, preload = True, max_concurrent_preloads = 100000, progress_bar = False, sensor_scan_dataset = False)

   Bases: :py:obj:`torch.utils.data.Dataset`

   .. autoapi-inheritance-diagram:: simcats_datasets.loading.pytorch.SimcatsDataset
      :parts: 1

   Pytorch Dataset class implementation for SimCATS datasets. Uses simcats_datasets to load and provide (training) data.

   Initializes an object for providing simcats_datasets data to pytorch.

   :param h5_path: The path to the h5 file containing the dataset.
   :param specific_ids: Determines whether only specific ids should be loaded. With this option, the returned values
      are sorted according to the specified ids and are not necessarily in ascending order. If set to None, all data
      is loaded. Default is None.
   :param load_ground_truth: Defines the required type of ground truth data to be loaded. Accepts either a callable or
      a string. Callables must be of the same structure/interface as load_zeros_masks defined in
      simcats_datasets.loading.load_ground_truth. Strings must map to the function names of the loading functions
      defined in simcats_datasets.loading.load_ground_truth. If this is None, no ground truth is loaded, which
      restricts which output formats are possible. Default is None.

      Example of available types (**full list at simcats_datasets.loading.load_ground_truth**):

      - **'tct_masks'**: The Total Charge Transition (TCT) mask generated by SimCATS.
      - **'tc_region_masks'**: Regions with a fixed number of total charges.
      - **'tc_region_minus_tct_masks'**: Regions with a fixed number of total charges, but with zeros between the
        regions (at TCTs).

   :param data_preprocessors: Defines if data should be preprocessed. Accepts a list of callables or strings.
      Callables must be of the same structure/interface as example_preprocessor defined in
      simcats_datasets.support_functions.data_preprocessing. Strings must map to the function names of the
      preprocessors defined in simcats_datasets.support_functions.data_preprocessing. Default is None.

      Example of available types (**full list at simcats_datasets.support_functions.data_preprocessing**):

      - **'min_max_0_1'**: Min-max scaling of the data to [0, 1]
      - **'standardization'**: Standardization of the data (mean=0, std=1)
      - **'add_newaxis'**: Adds a new axis as the first axis (required for UNET)

   :param ground_truth_preprocessors: Defines if ground truth should be preprocessed. Accepts a list of callables or
      strings. Callables must be of the same structure/interface as example_preprocessor defined in
      simcats_datasets.support_functions.data_preprocessing. Strings must map to the function names of the
      preprocessors defined in simcats_datasets.support_functions.data_preprocessing. Default is None.

      Example of available types (**full list at simcats_datasets.support_functions.data_preprocessing**):

      - **'only_two_classes'**: Reduce the number of classes in a mask to 2 (every pixel value greater than 1 is set
        to 1)

   :param format_output: Defines the required type of data format for the output. Accepts either a callable or a
      string. Callables must be of the same structure/interface as format_dict_csd_float_ground_truth_long defined in
      simcats_datasets.support_functions.pytorch_format_output.
      Strings must map to the function names of the format functions defined in
      simcats_datasets.support_functions.pytorch_format_output. If this is None,
      format_dict_csd_float_ground_truth_long is used, which returns the output as a dict with entries 'csd' and
      'ground_truth' of dtype float and long, respectively. Default is None.

      Example of available types (**full list at simcats_datasets.support_functions.pytorch_format_output**):

      - **'format_dict_csd_float_ground_truth_long'**: Formats the output as a dict with entries 'csd' and
        'ground_truth' of dtype float and long, respectively

   :param preload: Enables preloading the whole dataset during initialization (requires more RAM). Default is True.
   :param max_concurrent_preloads: Determines how many CSDs are loaded concurrently from the dataset during the
      preload phase. This option only affects instances with preload = True. It allows preloading large datasets
      (which might not fit into memory all at once) by loading them step by step and, for example, converting the
      CSDs to float32 with a corresponding data preprocessor. Default is 100,000.
   :param progress_bar: Determines whether to display a progress bar while loading data. Default is False.
   :param sensor_scan_dataset: Determines whether the dataset is a sensor scan dataset (contains sensor scans instead
      of CSDs). Default is False.

   .. py:property:: h5_path
      :type: str

   .. py:property:: sensor_scan_dataset
      :type: bool

   .. py:property:: specific_ids
      :type: Union[range, List[int], numpy.ndarray, None]

   .. py:property:: load_ground_truth
      :type: Callable

   .. py:property:: data_preprocessors
      :type: Union[List[Callable], None]

   .. py:property:: ground_truth_preprocessors
      :type: Union[List[Callable], None]

   .. py:property:: format_output
      :type: Callable

   .. py:property:: preload
      :type: bool

   .. py:property:: progress_bar
      :type: bool

   .. py:property:: shape
      :type: Tuple[int]


.. py:class:: SimcatsConcatDataset(h5_paths, specific_ids = None, load_ground_truth = None, data_preprocessors = None, ground_truth_preprocessors = None, format_output = None, preload = True, max_concurrent_preloads = 100000, progress_bar = False, sensor_scan_dataset = False)

   Bases: :py:obj:`torch.utils.data.ConcatDataset`

   .. autoapi-inheritance-diagram:: simcats_datasets.loading.pytorch.SimcatsConcatDataset
      :parts: 1

   Pytorch ConcatDataset class implementation for SimCATS datasets. Uses simcats_datasets to load and provide (training) data.

   Initializes an object for providing concatenated simcats_datasets data to pytorch.

   :param h5_paths: The paths to the h5 files containing the datasets to be concatenated.
   :param specific_ids: Determines whether only specific ids should be loaded. With this option, the returned values
      are sorted according to the specified ids and are not necessarily in ascending order. If set to None, all data
      is loaded. Expects a list of specific_id settings, with one entry for each provided h5_path. Default is None.
   :param load_ground_truth: Defines the required type of ground truth data to be loaded. Accepts either a callable or
      a string. Callables must be of the same structure/interface as load_zeros_masks defined in
      simcats_datasets.loading.load_ground_truth. Strings must map to the function names of the loading functions
      defined in simcats_datasets.loading.load_ground_truth. If this is None, no ground truth is loaded, which
      restricts which output formats are possible. Default is None.

      Example of available types (**full list at simcats_datasets.loading.load_ground_truth**):

      - **'tct_masks'**: The Total Charge Transition (TCT) mask generated by SimCATS.
      - **'tc_region_masks'**: Regions with a fixed number of total charges.
      - **'tc_region_minus_tct_masks'**: Regions with a fixed number of total charges, but with zeros between the
        regions (at TCTs).

   :param data_preprocessors: Defines if data should be preprocessed. Accepts a list of callables or strings.
      Callables must be of the same structure/interface as example_preprocessor defined in
      simcats_datasets.support_functions.data_preprocessing. Strings must map to the function names of the
      preprocessors defined in simcats_datasets.support_functions.data_preprocessing. Default is None.

      Example of available types (**full list at simcats_datasets.support_functions.data_preprocessing**):

      - **'min_max_0_1'**: Min-max scaling of the data to [0, 1]
      - **'standardization'**: Standardization of the data (mean=0, std=1)
      - **'add_newaxis'**: Adds a new axis as the first axis (required for UNET)

   :param ground_truth_preprocessors: Defines if ground truth should be preprocessed. Accepts a list of callables or
      strings. Callables must be of the same structure/interface as example_preprocessor defined in
      simcats_datasets.support_functions.data_preprocessing. Strings must map to the function names of the
      preprocessors defined in simcats_datasets.support_functions.data_preprocessing. Default is None.

      Example of available types (**full list at simcats_datasets.support_functions.data_preprocessing**):

      - **'only_two_classes'**: Reduce the number of classes in a mask to 2 (every pixel value greater than 1 is set
        to 1)

   :param format_output: Defines the required type of data format for the output. Accepts either a callable or a
      string. Callables must be of the same structure/interface as format_dict_csd_float_ground_truth_long defined in
      simcats_datasets.support_functions.pytorch_format_output. Strings must map to the function names of the format
      functions defined in simcats_datasets.support_functions.pytorch_format_output. If this is None,
      format_dict_csd_float_ground_truth_long is used, which returns the output as a dict with entries 'csd' and
      'ground_truth' of dtype float and long, respectively. Default is None.

      Example of available types (**full list at simcats_datasets.support_functions.pytorch_format_output**):

      - **'format_dict_csd_float_ground_truth_long'**: Formats the output as a dict with entries 'csd' and
        'ground_truth' of dtype float and long, respectively

   :param preload: Enables preloading the whole dataset during initialization (requires more RAM). Default is True.
   :param max_concurrent_preloads: Determines how many CSDs are loaded concurrently from the dataset during the
      preload phase. This option only affects instances with preload = True. It allows preloading large datasets
      (which might not fit into memory all at once) by loading them step by step and, for example, converting the
      CSDs to float32 with a corresponding data preprocessor. Default is 100,000.
   :param progress_bar: Determines whether to display a progress bar while loading data. Default is False.
   :param sensor_scan_dataset: Determines whether the datasets are sensor scan datasets (contain sensor scans instead
      of CSDs). Default is False.

   .. py:property:: shape
      :type: Tuple[int]

   .. py:property:: h5_paths
      :type: List[str]

   .. py:property:: sensor_scan_dataset
      :type: bool

   .. py:property:: specific_ids
      :type: Union[List[Union[range, List[int], numpy.ndarray, None]], None]

   .. py:property:: load_ground_truth
      :type: Callable

   .. py:property:: data_preprocessors
      :type: Union[List[Callable], None]

   .. py:property:: ground_truth_preprocessors
      :type: Union[List[Callable], None]

   .. py:property:: format_output
      :type: Callable

   .. py:property:: preload
      :type: bool

   .. py:property:: progress_bar
      :type: bool

   .. py:attribute:: datasets
      :type: List[Dataset[_T_co]]

   .. py:attribute:: cumulative_sizes
      :type: List[int]

   .. py:method:: cumsum(sequence)
      :staticmethod:

   .. py:property:: cummulative_sizes
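
The ``datasets`` and ``cumulative_sizes`` attributes and the ``cumsum`` static method listed above are inherited from ``torch.utils.data.ConcatDataset``. The following is a minimal pure-Python sketch of how cumulative sizes can be used to map a global sample index to one of the concatenated datasets. It is an illustration under stated assumptions, not the library code: ``locate`` is a hypothetical helper name, plain lists stand in for the individual SimCATS datasets, and negative indices are simply rejected here.

```python
import bisect
from typing import List, Sequence, Tuple


def cumsum(sequence: Sequence[Sequence]) -> List[int]:
    """Return the running total of the lengths of the given datasets."""
    sizes, total = [], 0
    for dataset in sequence:
        total += len(dataset)
        sizes.append(total)
    return sizes


def locate(cumulative_sizes: List[int], idx: int) -> Tuple[int, int]:
    """Map a global index to (dataset index, index inside that dataset).

    Hypothetical helper for illustration; negative indices are not handled.
    """
    if idx < 0 or idx >= cumulative_sizes[-1]:
        raise IndexError("index out of range")
    # First dataset whose cumulative size is strictly greater than idx.
    dataset_idx = bisect.bisect_right(cumulative_sizes, idx)
    # Number of samples contained in all preceding datasets.
    offset = cumulative_sizes[dataset_idx - 1] if dataset_idx > 0 else 0
    return dataset_idx, idx - offset


# Three stand-in "datasets" with 3, 2 and 4 samples (plain lists, not real SimCATS data).
datasets = [list(range(3)), list(range(2)), list(range(4))]
cumulative_sizes = cumsum(datasets)

for global_idx in (0, 3, 4, 8):
    print(global_idx, locate(cumulative_sizes, global_idx))
```

In this sketch the global index 4 falls into the second stand-in dataset, because the first one only covers the indices 0 to 2. The same idea explains why ``specific_ids`` for ``SimcatsConcatDataset`` is given per h5 file: each file keeps its own local indexing and the concatenation only adds an offset.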