GuardianDataset#
- class pepbench.datasets.GuardianDataset(base_path: path_t, groupby_cols: Sequence[str] | None = None, subset_index: Sequence[str] | None = None, *, return_clean: bool = True, exclude_no_recorded_data: bool = True, exclude_noisy_data: bool = True, use_cache: bool = True, only_labeled: bool = False, label_type: str = 'rater_01')[source]#
Dataset class for the Guardian Dataset.
Provides access to Task Force Monitor ECG/ICG signals, preprocessed signals, timelogs describing experimental phases, reference annotations, and participant metadata.
- Parameters:
- base_path : Path
Path to the root directory of the Guardian dataset.
- groupby_cols : sequence of str, optional
Columns to group the dataset index by.
- subset_index : sequence of str, optional
Subset of the dataset index to operate on.
- return_clean : bool, optional
If True, return preprocessed/cleaned ECG and ICG signals. Default is True.
- exclude_no_recorded_data : bool, optional
If True, exclude known participant/phase combinations with no recorded data. Default is True.
- exclude_noisy_data : bool, optional
If True, exclude known noisy participant/phase combinations. Default is True.
- use_cache : bool, optional
If True, cache loading of TFM files. Default is True.
- only_labeled : bool, optional
If True, return only labeled sections (cut to labeling borders). Default is False.
- label_type : {‘rater_01’, ‘rater_02’, ‘average’}, optional
Which label set to use for reference annotations. Default is ‘rater_01’.
- Attributes:
- SAMPLING_RATES : dict
Per-channel sampling rates in Hz.
- PHASES : sequence
Ordered list of experimental phases.
- GENDER_MAPPING : dict
Mapping to recode gender values from the source.
- SUBSET_NO_RECORDED_DATA, SUBSET_NOISY_DATA : sequence
Known participant/phase tuples to optionally exclude.
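A minimal usage sketch, assuming a local copy of the dataset. The path is a placeholder, and selecting the first index row through `get_subset(index=...)` is only one way to narrow the selection to a single participant and phase, which the `ecg` and `icg` properties require. It has not been checked against a real copy of the data.

```python
from pathlib import Path


def load_first_recording(base_path: Path):
    """Load ECG and ICG for the first participant/phase row of the index.

    The import sits inside the function so that this sketch can be read
    and imported without pepbench being installed.
    """
    from pepbench.datasets import GuardianDataset

    dataset = GuardianDataset(base_path, only_labeled=True, label_type="rater_01")

    # ``ecg`` and ``icg`` need exactly one participant and one phase,
    # so reduce the dataset to a single row of its index first.
    single = dataset.get_subset(index=dataset.index.iloc[[0]])

    return single.ecg, single.icg, single.sampling_rates


# Hypothetical location; replace it with the real dataset root before calling.
EXAMPLE_BASE_PATH = Path("/path/to/guardian_dataset")
```

Calling `load_first_recording(EXAMPLE_BASE_PATH)` only works once the placeholder path points at an actual dataset folder.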
Methods
as_attrs(): Return a version of the Dataset class that can be subclassed using attrs defined classes.
as_dataclass(): Return a version of the Dataset class that can be subclassed using dataclasses.
assert_is_single(groupby_cols, property_name): Raise error if index does contain more than one group/row with the given groupby settings.
assert_is_single_group(property_name): Raise error if index does contain more than one group/row.
clone(): Create a new instance of the class with all parameters copied over.
create_index(): Create the dataset index DataFrame.
create_string_group_labels(label_cols): Generate a list of string labels for each group/row in the dataset.
get_params([deep]): Get parameters for this algorithm.
get_subset(*[, group_labels, index, bool_map]): Get a subset of the dataset.
groupby(groupby_cols): Return a copy of the dataset grouped by the specified columns.
index_as_tuples(): Get all datapoint labels of the dataset (i.e. a list of the rows of the index as named tuples).
is_single(groupby_cols): Return True if index contains only one row/group with the given groupby settings.
is_single_group(): Return True if index contains only one group.
iter_level(level): Return generator object containing a subset for every category from the selected level.
set_params(**params): Set the parameters of this Algorithm.
create_group_labels
- __init__(base_path: path_t, groupby_cols: Sequence[str] | None = None, subset_index: Sequence[str] | None = None, *, return_clean: bool = True, exclude_no_recorded_data: bool = True, exclude_noisy_data: bool = True, use_cache: bool = True, only_labeled: bool = False, label_type: str = 'rater_01') None[source]#
Initialize a new GuardianDataset instance.
- Parameters:
- base_path : Path or str
Path to the root directory of the Guardian dataset.
- return_clean : bool
Whether to return the preprocessed/cleaned ECG and ICG data when accessing the respective properties. Default: True.
- exclude_no_recorded_data : bool, optional
Whether to exclude participants with no recorded data. Default: True.
- exclude_noisy_data : bool, optional
Whether to exclude participants with noisy data. Default: True.
- use_cache : bool, optional
Whether to use caching for loading TFM data. Default: True.
- only_labeled : bool, optional
Whether to only return segments that are labeled (i.e., cut the data to the labeling borders). This is necessary when using the dataset for evaluating the performance of PEP extraction algorithms or for training ML-based PEP extraction algorithms. Default: False.
- label_type : str, optional
Which annotations to use. Can be either “rater_01”, “rater_02”, or “average”. Default: “rater_01”.
- create_index() DataFrame[source]#
Create the dataset index DataFrame.
- Returns:
- DataFrame
Dataset index with columns “participant” and “phase”.
- property sampling_rates: dict[str, int]#
Return sampling rates of the ECG and ICG signals.
- Returns:
- dict
Dictionary with the sampling rates of the ECG and ICG signals in Hz.
- property sampling_rate_ecg: int#
Return sampling rate of the ECG signal.
- Returns:
- int
Sampling rate of the ECG signal in Hz.
- property sampling_rate_icg: int#
Return sampling rate of the ICG signal.
- Returns:
- int
Sampling rate of the ICG signal in Hz.
- property date: Series | Timestamp#
Return recording date(s) for the selected participant(s).
- property tfm_data: DataFrame | dict[str, DataFrame]#
Task Force Monitor (TFM) data for the current selection.
The property loads raw TFM data files for a single participant. It supports accessing either a single phase or all phases for that participant. When only_labeled is True, returned signals are cut to the labeling borders.
- Returns:
- DataFrame or dict
If a single phase is selected, a DataFrame of channel signals is returned. If all phases are selected, a dict mapping phase names to DataFrames is returned.
- Raises:
- ValueError
If accessed for multiple participants or unsupported multi-phase selections.
- property icg: DataFrame#
Return ICG channel for the current selection.
If return_clean is True, the ICG is preprocessed using IcgPreprocessingBandpass.
- Returns:
- DataFrame
ICG signal (cleaned or raw) for the selected participant/phase.
- Raises:
- ValueError
If not operating on a single participant and phase.
- property ecg: DataFrame#
Return ECG channel for the current selection.
If return_clean is True, the ECG is preprocessed using EcgPreprocessingNeurokit.
- Returns:
- DataFrame
ECG signal (cleaned or raw) for the selected participant/phase.
- Raises:
- ValueError
If not operating on a single participant and phase.
- property labeling_borders: DataFrame#
Return labeling borders describing annotated segments for a participant.
- Returns:
- DataFrame
Labeling borders with columns including sample_absolute and description.
- Raises:
- ValueError
If not operating on a single participant.
- FileNotFoundError
If the expected labeling borders CSV is missing for the participant.
- property reference_heartbeats: DataFrame#
Return computed reference heartbeat markers derived from ECG reference labels.
- Returns:
- DataFrame
Heartbeat segmentation/reference table derived from ECG reference labels.
- property reference_labels_ecg: DataFrame#
Return the reference labels for the ECG signal and the current selection.
- Returns:
- DataFrame or dict
If a single phase is selected, returns a DataFrame for that phase. If all phases are selected, returns a concatenated DataFrame indexed by phase.
- property reference_labels_icg: DataFrame#
Return the reference labels for the ICG signal.
- Returns:
- DataFrame
Reference labels for the ICG signal as a pandas DataFrame.
- property heartbeats: DataFrame#
Segment heartbeats from the ECG data and return the heartbeat borders.
- Returns:
- DataFrame
Heartbeats as a DataFrame.
- property metadata: DataFrame#
Return metadata for the selected participants.
- Returns:
- DataFrame
Metadata as a DataFrame.
- property age: DataFrame#
Return the age of the selected participants.
- Returns:
- DataFrame
Age as a DataFrame.
- classmethod as_attrs()[source]#
Return a version of the Dataset class that can be subclassed using attrs defined classes.
Note, this requires attrs to be installed!
- classmethod as_dataclass()[source]#
Return a version of the Dataset class that can be subclassed using dataclasses.
- assert_is_single(groupby_cols: list[str] | str | None, property_name) None[source]#
Raise error if index does contain more than one group/row with the given groupby settings.
This should be used when implementing access to data values that can only be accessed when a single trial/participant/etc. exists in the dataset.
- Parameters:
- groupby_cols
None (no grouping) or a valid subset of the columns available in the dataset index.
- property_name
Name of the property this check is used in. Used to format the error message.
- assert_is_single_group(property_name) None[source]#
Raise error if index does contain more than one group/row.
Note that this is different from assert_is_single as it is aware of the current grouping. Instead of checking that a certain combination of columns is left in the dataset, it checks that only a single group exists with the already selected grouping as defined by self.groupby_cols.
- Parameters:
- property_name
Name of the property this check is used in. Used to format the error message.
- property base_demographics: DataFrame#
Return base demographics of the participants.
- Returns:
- DataFrame
The base demographics DataFrame including gender, age, and BMI.
- clone() Self[source]#
Create a new instance of the class with all parameters copied over.
This will create a new instance of the class itself and all nested objects.
- create_string_group_labels(label_cols: str | list[str]) list[str][source]#
Generate a list of string labels for each group/row in the dataset.
Note
This has a different use case than the dataset-wide groupby. Using groupby reduces the effective size of the dataset to the number of groups. This method produces a group label for each group/row that is already in the dataset, without changing the dataset.
The output of this method can be used in combination with GroupKFold as the group label.
- Parameters:
- label_cols
The columns that should be included in the label. If the dataset is already grouped, this must be a subset of self.groupby_cols.
- property gender: DataFrame#
Return the gender of the selected participants.
- Returns:
- DataFrame
Gender as a pandas DataFrame, recoded as {“M”: “Male”, “F”: “Female”}.
- get_params(deep: bool = True) dict[str, Any][source]#
Get parameters for this algorithm.
- Parameters:
- deep
Only relevant if the object contains nested algorithm objects. If this is the case and deep is True, the params of these nested objects are included in the output using a prefix like nested_object_name__ (note the two “_” at the end).
- Returns:
- params
Parameter names mapped to their values.
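The double-underscore prefix is easiest to see on a toy object. The sketch below is not the library's implementation: `ToyBase`, `ToyCleaner` and `ToyPipeline` are hypothetical classes written only to mimic the documented behaviour of `deep=True`.

```python
class ToyBase:
    """Toy stand-in for a parameterized object; not the real base class."""

    _param_names = ()

    def get_params(self, deep=True):
        params = {name: getattr(self, name) for name in self._param_names}
        if deep:
            # Add the parameters of nested objects, prefixed with "<name>__".
            for name in self._param_names:
                value = params[name]
                if isinstance(value, ToyBase):
                    for sub_name, sub_value in value.get_params(deep=True).items():
                        params[f"{name}__{sub_name}"] = sub_value
        return params


class ToyCleaner(ToyBase):
    _param_names = ("cutoff_hz",)

    def __init__(self, cutoff_hz=40.0):
        self.cutoff_hz = cutoff_hz


class ToyPipeline(ToyBase):
    _param_names = ("label", "cleaner")

    def __init__(self, label="demo", cleaner=None):
        self.label = label
        self.cleaner = cleaner if cleaner is not None else ToyCleaner()


pipeline = ToyPipeline()
shallow_params = pipeline.get_params(deep=False)
deep_params = pipeline.get_params(deep=True)

print(sorted(shallow_params))  # ['cleaner', 'label']
print(sorted(deep_params))     # ['cleaner', 'cleaner__cutoff_hz', 'label']
```

In this toy version, `deep` only changes the result when a parameter is itself a parameterized object; plain values such as strings and booleans are reported the same way in both modes.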
- get_subset(*, group_labels: list[tuple[str, ...]] | None = None, index: DataFrame | None = None, bool_map: Sequence[bool] | None = None, **kwargs: list[str] | str) Self[source]#
Get a subset of the dataset.
Note
All arguments are mutually exclusive!
- Parameters:
- group_labels
A valid row locator or slice that can be passed to self.grouped_index.loc[locator, :]. This basically needs to be a subset of self.group_labels. Note that this is the only indexer that works on the grouped index. All other indexers work on the pure index.
- index
pd.DataFrame that is a valid subset of the current dataset index.
- bool_map
Boolean map that is used to index the current index DataFrame. The list must be of the same length as the number of rows in the index.
- **kwargs
The key must be the name of an index column. The value is a list containing strings that correspond to the categories that should be kept. For examples see above.
- Returns:
- subset
New dataset object filtered by specified parameters.
- property group: GroupLabelT#
Get the current group label. Deprecated, use group_label instead.
- property group_label: GroupLabelT#
Get the current group label.
The group is defined by the current groupby settings.
Note, this attribute can only be used, if there is just a single group. This will return a named tuple. The tuple will contain only one entry if there is only a single groupby column or column in the index. The elements of the named tuple will have the same names as the groupby columns and will be in the same order.
- property group_labels: list[GroupLabelT]#
Get all group labels of the dataset based on the set groupby level.
This will return a list of named tuples. The tuples will contain only one entry if there is only one groupby level or index column.
The elements of the named tuples will have the same names as the groupby columns and will be in the same order.
Note that if one of the groupby levels/index columns is not a valid Python attribute name (e.g. it contains spaces or starts with a number), the named tuple will not contain the correct column name! For more information see the documentation of the rename parameter of collections.namedtuple.
For some examples and additional explanation see this example.
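The renaming behaviour comes from the standard library and can be reproduced without the dataset. The column names and values below are made up for illustration, and whether the library constructs its tuples in exactly this way is an assumption.

```python
from collections import namedtuple

# Hypothetical index columns; the last two are not valid Python identifiers.
columns = ["participant", "phase 1", "2nd_rater"]

# With rename=True, invalid field names are replaced by positional names.
GroupLabel = namedtuple("GroupLabel", columns, rename=True)
label = GroupLabel("p01", "rest", "rater_02")

print(GroupLabel._fields)  # ('participant', '_1', '_2')
print(label._1)            # rest
```

Values stay accessible by position and through the replacement names, but the original column names `phase 1` and `2nd_rater` are lost as attribute names.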
- groupby(groupby_cols: list[str] | str | None) Self[source]#
Return a copy of the dataset grouped by the specified columns.
This does not change the order of the rows of the dataset index.
Each unique group represents a single data point in the resulting dataset.
- Parameters:
- groupby_cols
None (no grouping) or a valid subset of the columns available in the dataset index.
- property grouped_index: DataFrame#
Return the index with the groupby columns set as multiindex.
- property groups: list[GroupLabelT]#
Get the current group labels. Deprecated, use group_labels instead.
- property index: DataFrame#
Get index.
- index_as_tuples() list[GroupLabelT][source]#
Get all datapoint labels of the dataset (i.e. a list of the rows of the index as named tuples).
- property index_is_unchanged: bool#
Returns True if the index is the same as the one created by create_index.
This can be used to check whether the index represents a subset or the actual full index. Note that this is independent of the groupby_cols setting.
Note
Under the hood this uses the attrs functionality of pandas to store a hash of the original index on the dataframe. If the index is modified or a new index is created, this property either no longer exists or its content is modified.
- is_single(groupby_cols: list[str] | str | None) bool[source]#
Return True if index contains only one row/group with the given groupby settings.
If groupby_cols=None, this checks if there is only a single row left. If you want to check if there is only a single group within the current grouping, use is_single_group instead.
- Parameters:
- groupby_cols
None (no grouping) or a valid subset of the columns available in the dataset index.
- iter_level(level: str) Iterator[Self][source]#
Return generator object containing a subset for every category from the selected level.
- Parameters:
- level
Optional str that sets the level which shall be used for iterating. This must be one of the column names of the index.
- Returns:
- subset
New dataset object containing only one category in the specified level.
- property reference_pep: DataFrame#
Compute the reference PEP values between the reference Q-peak and B-point labels.
- Returns:
- DataFrame
DataFrame containing the computed PEP values.
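As a rough illustration of what a PEP value between a Q-peak and a B-point label means, the sketch below converts a sample difference into milliseconds. The sample indices and the 1000 Hz sampling rate are invented, both labels are assumed to be expressed on the same sample grid, and the unit and exact computation used by the library are not confirmed by this page.

```python
def pep_ms(q_peak_sample: int, b_point_sample: int, sampling_rate_hz: float) -> float:
    """Return the Q-peak to B-point interval in milliseconds."""
    return (b_point_sample - q_peak_sample) * 1000.0 / sampling_rate_hz


# Hypothetical (q_peak_sample, b_point_sample) pairs, one per heartbeat.
reference_labels = [(1000, 1100), (1850, 1945)]

# Illustrative sampling rate; read the real one from the dataset instead.
sampling_rate_hz = 1000

pep_values = [pep_ms(q, b, sampling_rate_hz) for q, b in reference_labels]
print(pep_values)  # [100.0, 95.0]
```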
- set_params(**params: Any) Self[source]#
Set the parameters of this Algorithm.
To set parameters of nested objects use nested_object_name__para_name=.