:py:mod:`simcats_datasets.loading`
==================================

.. py:module:: simcats_datasets.loading

.. autoapi-nested-parse::

   Module with functionalities for loading data from dataset files (HDF5 format).

   Also contains functionalities for loading data as a PyTorch dataset with different ground truth types.


Submodules
----------
.. toctree::
   :titlesonly:
   :maxdepth: 1

   load_ground_truth/index.rst
   pytorch/index.rst


Package Contents
----------------


Functions
~~~~~~~~~

.. autoapisummary::
   :nosignatures:

   simcats_datasets.loading.load_dataset


Package Implementation Details
------------------------------

.. py:function:: load_dataset(file, load_csds = True, load_sensor_scans = False, load_occupations = False, load_tct_masks = False, load_ct_by_dot_masks = False, load_sensor_regime_masks = False, load_sensor_peak_center_masks = False, load_line_coords = False, load_line_labels = False, load_metadata = False, load_ids = False, specific_ids = None, progress_bar = False)

   Loads a dataset consisting of multiple CSDs (or sensor scans) from a given file or path.

   :param file: The file to read the data from. Can either be an object of the type `h5py.File` or the path to the dataset. If a path is supplied, load_dataset opens the file itself. If you want to perform multiple consecutive loads from the same file (e.g. when using the PyTorch SimcatsDataset without preloading), consider opening the file object yourself and passing it in to improve performance.
   :param load_csds: Determines whether CSDs should be loaded. A dataset contains either CSDs or sensor scans, never both. Default is True.
   :param load_sensor_scans: Determines whether sensor scans should be loaded. A dataset contains either CSDs or sensor scans, never both. Default is False.
   :param load_occupations: Determines whether occupation data should be loaded. Default is False.
   :param load_tct_masks: Determines whether lead transition masks should be loaded. Default is False.
   :param load_ct_by_dot_masks: Determines whether charge transition masks labeled by the affected dot should be loaded. This requires that ct_by_dot_masks have been added to the dataset. For a dataset created using create_simulated_dataset, these masks can be added afterward using add_ct_by_dot_masks_to_dataset, mainly to avoid recalculating them multiple times (for example, for machine learning purposes). Default is False.
   :param load_sensor_regime_masks: Determines whether sensor regime masks should be loaded. Only sensor scan datasets contain sensor regime masks. Default is False.
   :param load_sensor_peak_center_masks: Determines whether sensor peak center masks should be loaded. Only sensor scan datasets contain sensor peak center masks. Default is False.
   :param load_line_coords: Determines whether lead transitions defined by their start and end points should be loaded. Default is False.
   :param load_line_labels: Determines whether the labels of the lead transitions defined by start and end points should be loaded. Default is False.
   :param load_metadata: Determines whether the metadata (SimCATS config) of the CSDs should be loaded. Default is False.
   :param load_ids: Determines whether the available ids should be loaded (if specific_ids is given, those ids are returned in the given order). Default is False.
   :param specific_ids: Restricts loading to the given ids. With this option, the returned values are ordered according to the specified ids and not necessarily ascending. If set to None, all data is loaded. Default is None.
   :param progress_bar: Determines whether to display a progress bar. This parameter has had no effect since version 2 but is kept for compatibility reasons. Default is False.
   :returns: A namedtuple that can be unpacked like a regular tuple or accessed by field name.
       Depending on which data has been enabled, the namedtuple contains the following fields. All lists are sorted
       by the id of the CSDs or sensor scans if no specific_ids are provided; otherwise, the order is given by
       specific_ids.

       - field 'csds': List containing all CSDs as numpy arrays.
       - field 'sensor_scans': List containing all sensor scans as numpy arrays.
       - field 'occupations': List containing numpy arrays with occupations.
       - field 'tct_masks': List containing numpy arrays of TCT masks.
       - field 'ct_by_dot_masks': List containing numpy arrays of CT_by_dot masks.
       - field 'line_coordinates': List containing numpy arrays of line coordinates. Each row of an array specifies the start and end points of one line.
       - field 'line_labels': List containing, for each CSD, a list of dictionaries (one dict for each line specified in the line coordinates).
       - field 'metadata': List containing dictionaries with all metadata (SimCATS configs) for each CSD.
       - field 'ids': List of the ids of the CSDs (or sensor scans).
   :rtype: namedtuple

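
The ordering rules for ``specific_ids`` and the two ways of reading the returned namedtuple can be illustrated without an HDF5 file. The following is a minimal, self-contained sketch, not the library's implementation: ``toy_load`` and ``LoadedData`` are hypothetical stand-ins written only for this illustration, they model just the ``csds`` and ``ids`` fields, and the sketch assumes that the CSD with id ``i`` is stored at position ``i``.

.. code-block:: python

   from collections import namedtuple
   from typing import List, Optional, Sequence

   import numpy as np

   # Hypothetical stand-in for the namedtuple returned by load_dataset.
   # Only the 'csds' and 'ids' fields are modelled here.
   LoadedData = namedtuple("LoadedData", ["csds", "ids"])


   def toy_load(stored_csds: List[np.ndarray],
                specific_ids: Optional[Sequence[int]] = None) -> LoadedData:
       """Mimic the documented ordering rules of load_dataset on in-memory data.

       Assumption (hypothetical): the CSD with id i is stored at position i.
       """
       if specific_ids is None:
           # No specific ids: return everything, sorted by ascending id.
           ids = list(range(len(stored_csds)))
       else:
           # Specific ids: keep the caller's order instead of sorting.
           ids = list(specific_ids)
       return LoadedData(csds=[stored_csds[i] for i in ids], ids=ids)


   # Five tiny fake "CSDs", each filled with its own id so the order is visible.
   stored = [np.full((2, 2), i, dtype=float) for i in range(5)]

   result = toy_load(stored, specific_ids=[3, 0, 4])
   csds, ids = result           # unpack like a regular tuple ...
   print(ids)                   # [3, 0, 4]
   print(result.csds[0][0, 0])  # ... or access fields by name: 3.0

With the real function, a comparable call might look like ``load_dataset("dataset.h5", load_csds=True, load_ids=True, specific_ids=[3, 0, 4])``, where ``"dataset.h5"`` is a placeholder path. For repeated loads, the description of ``file`` suggests opening the file once, for example with ``h5py.File("dataset.h5", "r")``, and passing the resulting object instead of the path.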