simcats_datasets.loading
Module with functionalities for loading data from dataset files (HDF5 format).
Also contains functionalities for loading data as a PyTorch dataset with different ground truth types.
Submodules
Package Contents
Functions
load_dataset: Loads a dataset consisting of multiple CSDs from a given path.
Package Implementation Details
- simcats_datasets.loading.load_dataset(file, load_csds=True, load_sensor_scans=False, load_occupations=False, load_tct_masks=False, load_ct_by_dot_masks=False, load_line_coords=False, load_line_labels=False, load_metadata=False, load_ids=False, specific_ids=None, progress_bar=False)
Loads a dataset consisting of multiple CSDs from a given path.
- Parameters:
file (Union[str, h5py.File]) – The file to read the data from. Can be either an h5py.File object or the path to the dataset. If a path is supplied, load_dataset opens the file itself. If you want to do multiple consecutive loads from the same file (e.g. when using the PyTorch SimcatsDataset without preloading), consider opening the file yourself and passing the h5py.File object to improve performance.
load_csds (bool) – Determines if CSDs should be loaded. A dataset can have either CSDs or sensor scans, but never both. Default is True.
load_sensor_scans (bool) – Determines if sensor scans should be loaded. A dataset can have either CSDs or sensor scans, but never both. Default is False.
load_occupations (bool) – Determines if occupation data should be loaded. Default is False.
load_tct_masks (bool) – Determines if lead transition masks should be loaded. Default is False.
load_ct_by_dot_masks (bool) – Determines if masks with charge transitions labeled by the affected dot should be loaded. This requires that ct_by_dot_masks have been added to the dataset. If a dataset has been created using create_simulated_dataset, these masks can be added afterward using add_ct_by_dot_masks_to_dataset, mainly to avoid recalculating them multiple times (for example for machine learning purposes). Default is False.
load_line_coords (bool) – Determines if lead transition definitions using start and end points should be loaded. Default is False.
load_line_labels (bool) – Determines if labels for lead transitions defined using start and end points should be loaded. Default is False.
load_metadata (bool) – Determines if the metadata (SimCATS config) of the CSDs should be loaded. Default is False.
load_ids (bool) – Determines if the available ids should be loaded (or in case of specific ids: the specific ids are returned in the given order). Default is False.
specific_ids (Union[range, List[int], numpy.ndarray, None]) – Determines if only specific ids should be loaded. Using this option, the returned values are sorted according to the specified ids and not necessarily ascending. If set to None, all data is loaded. Default is None.
progress_bar (bool) – Determines whether to display a progress bar. This parameter has no functionality since version 2, but is kept for compatibility reasons. Default is False.
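The note on the file parameter suggests reusing one open file object for consecutive loads. A minimal sketch of that pattern follows. The helper load_in_chunks and the path "dataset.h5" are hypothetical and not part of simcats_datasets; the helper takes the loading function as an argument and only relies on the keyword arguments documented above.

```python
def load_in_chunks(load_fn, file, ids, chunk_size=100):
    """Yield one result per chunk of ids, reusing the same file object.

    load_fn is expected to behave like simcats_datasets.loading.load_dataset.
    file should be an already opened h5py.File, so that it is not reopened
    for every call.
    """
    ids = list(ids)
    for start in range(0, len(ids), chunk_size):
        chunk = ids[start:start + chunk_size]
        # specific_ids selects the entries and also fixes their order.
        yield load_fn(file, load_csds=True, load_ids=True, specific_ids=chunk)


# Intended usage (assumes simcats_datasets and h5py are installed and that
# "dataset.h5" is a hypothetical dataset path):
#
#     import h5py
#     from simcats_datasets.loading import load_dataset
#
#     with h5py.File("dataset.h5", "r") as f:
#         for result in load_in_chunks(load_dataset, f, range(300)):
#             print(len(result.csds), result.ids[:3])
```

Passing the loader as an argument keeps the helper independent of the package at import time; in practice load_dataset would simply be passed in, as shown in the comment.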
- Returns:
The namedtuple can be unpacked like every normal tuple, or instead accessed by field names.
Depending on what has been enabled, the following data is included in the named tuple (all lists are sorted by the id of the CSDs or sensor_scans if no specific_ids are provided, else the order is given by specific_ids):
field ‘csds’: List containing all CSDs as numpy arrays.
field ‘sensor_scans’: List containing all sensor scans as numpy arrays.
field ‘occupations’: List containing numpy arrays with occupations.
field ‘tct_masks’: List containing numpy arrays of TCT masks.
field ‘ct_by_dot_masks’: List containing numpy arrays of CT_by_dot masks.
field ‘line_coordinates’: List containing numpy arrays of line coordinates. Each row of the array specifies the start and end points of one line.
field ‘line_labels’: List containing a list of dictionaries (one dict for each line specified as line coordinates).
field ‘metadata’: List containing dictionaries with all metadata (simcats configs) for each CSD.
field ‘ids’: List of the ids of the CSDs.
- Return type:
namedtuple
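The two access styles for the returned namedtuple can be illustrated with a stand-in built from the standard library. This is only a sketch: the type name Loaded, the placeholder string values, and the assumption that fields appear in the order listed above are illustrative and not taken from the library.

```python
from collections import namedtuple

# Hypothetical stand-in for the result of
# load_dataset(file, load_csds=True, load_tct_masks=True, load_ids=True).
# The real type name and the array contents come from the library.
Loaded = namedtuple("Loaded", ["csds", "tct_masks", "ids"])
result = Loaded(
    csds=["csd_7", "csd_2"],
    tct_masks=["mask_7", "mask_2"],
    ids=[7, 2],
)

# Tuple-style unpacking: positions follow the field order.
csds, tct_masks, ids = result

# Access by field name returns the same objects.
same_object = result.csds is csds

# All lists share one ordering, so entries can be paired by position.
csd_by_id = dict(zip(result.ids, result.csds))
```

Access by field name is usually the safer choice, since the set of fields depends on which load_* flags were enabled.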