TimeWindowIcgDataset#

class pepbench.datasets.TimeWindowIcgDataset(base_path: path_t, groupby_cols: Sequence[str] | None = None, subset_index: Sequence[str] | None = None, *, return_clean: bool = True, use_cache: bool = True, exclude_r_peak_detection_errors: bool = True, only_labeled: bool = False)[source]#

Dataset accessor for TimeWindow ICG data.

The dataset exposes methods and properties to load raw and cleaned ECG/ICG data, compute and return heartbeat borders, and load reference labels for ECG and ICG channels.

Parameters:

base_path (str or pathlib.Path): Path to the dataset root directory containing signals, annotations and reference_heartbeats subfolders.
groupby_cols (sequence of str or None, optional): Columns to group the dataset by when creating indices. Default is None (no grouping).
subset_index (sequence of tuple or None, optional): Explicit subset of the dataset index to work with. Default is None.
return_clean (bool, optional): If True (default), return cleaned signals for ecg and icg properties; otherwise return raw signals.
use_cache (bool, optional): If True (default), cache parsed .txt files to speed up repeated loads.
exclude_r_peak_detection_errors (bool, optional): If True (default), exclude known participants with R-peak detection issues (see SUBSET_R_PEAK_DETECTION_ERRORS).
only_labeled (bool, optional): If True, restrict the dataset to entries that have labels. Default is False.

Attributes:

SAMPLING_RATE (int): Sampling rate of raw signals in Hz (default: 2000).
PHASES (sequence of str): Known experiment phases (default: ["Baseline", "EmotionInduction"]).
SUBSET_R_PEAK_DETECTION_ERRORS (sequence of str): Participant-phase pairs to exclude by default due to R-peak detection errors.
base_path (pathlib.Path): Normalized dataset base path.
use_cache (bool): Whether to use cached parsing for text files.
exclude_r_peak_detection_errors (bool): Whether to exclude problematic recordings.
data_to_exclude (sequence of tuple): List of participant-phase pairs that should be dropped from the index.

Methods

`as_attrs`()	Return a version of the Dataset class that can be subclassed using `attrs` defined classes.
`as_dataclass`()	Return a version of the Dataset class that can be subclassed using dataclasses.
`assert_is_single`(groupby_cols, property_name)	Raise an error if the index contains more than one group/row with the given groupby settings.
`assert_is_single_group`(property_name)	Raise an error if the index contains more than one group/row.
`clone`()	Create a new instance of the class with all parameters copied over.
`create_index`()	Create a dataset index of participant/phase rows.
`create_string_group_labels`(label_cols)	Generate a list of string labels for each group/row in the dataset.
`get_params`([deep])	Get parameters for this algorithm.
`get_subset`(*[, group_labels, index, bool_map])	Get a subset of the dataset.
`groupby`(groupby_cols)	Return a copy of the dataset grouped by the specified columns.
`index_as_tuples`()	Get all datapoint labels of the dataset (i.e. a list of the rows of the index as named tuples).
`is_single`(groupby_cols)	Return True if index contains only one row/group with the given groupby settings.
`is_single_group`()	Return True if index contains only one group.
`iter_level`(level)	Return generator object containing a subset for every category from the selected level.
`set_params`(**params)	Set the parameters of this Algorithm.

create_group_labels

__init__(base_path: path_t, groupby_cols: Sequence[str] | None = None, subset_index: Sequence[str] | None = None, *, return_clean: bool = True, use_cache: bool = True, exclude_r_peak_detection_errors: bool = True, only_labeled: bool = False) → None[source]#

Initialize the dataset.

See class-level documentation for parameter meanings.

Raises:

OSError: If the provided base_path does not exist or required folders are missing (implementation-specific checks may raise other errors).

create_index() → DataFrame[source]#

Create a dataset index of participant/phase rows.

The index contains one row per combination of participant and phase (PHASES). Participant identifiers are derived from file names in the signals folder and normalized to the form IDN_XX.

Returns:

DataFrame: DataFrame with columns participant and phase and a default integer index. Rows present in data_to_exclude are dropped.
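
The row layout described above can be sketched in plain Python. This is only an illustration of the participant/phase cross product with exclusions, not the library's implementation; the helper name build_index and the example participant IDs are assumptions, and the real method returns a pandas DataFrame rather than a list of dicts.

```python
from itertools import product

# Phases as documented in the PHASES attribute.
PHASES = ["Baseline", "EmotionInduction"]


def build_index(participants, data_to_exclude=()):
    """Hypothetical helper: one row per (participant, phase), minus exclusions."""
    excluded = set(data_to_exclude)
    return [
        {"participant": participant, "phase": phase}
        for participant, phase in product(sorted(participants), PHASES)
        if (participant, phase) not in excluded
    ]


# Two made-up participants; one participant-phase pair is excluded.
rows = build_index(
    ["IDN_02", "IDN_01"],
    data_to_exclude=[("IDN_02", "Baseline")],
)
for row in rows:
    print(row["participant"], row["phase"])
```

Sorting the participants first keeps the row order deterministic, which matters when index positions are reused later (for example in a bool_map).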

property sampling_rate_ecg: int#

Sampling rate used for ECG processing.

Returns:

int: Sampling rate in Hz (same as SAMPLING_RATE).

property sampling_rate_icg: int#

Sampling rate used for ICG processing.

Returns:

int: Sampling rate in Hz (same as SAMPLING_RATE).

property data: DataFrame#

Load raw signal data for the selected participant (and phase).

The returned DataFrame contains numeric columns (including ecg and icg_der) and uses a pandas.TimedeltaIndex named t representing seconds.

If the dataset is restricted to a single phase, the data is sliced at 120 seconds: Baseline returns 0–120s, EmotionInduction returns 120s–end.

Returns:

DataFrame: Time-indexed signal data.

Raises:

ValueError: If called when the dataset is not restricted to a single participant.
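
To make the 120-second split concrete, the following sketch converts the phase boundary into sample positions at the documented sampling rate. The function name phase_sample_bounds is hypothetical, and treating the bounds as a half-open range (start included, stop excluded) is an assumption; the property itself slices a time-indexed DataFrame.

```python
SAMPLING_RATE = 2000  # Hz, as documented for SAMPLING_RATE
SPLIT_SECONDS = 120   # documented boundary between Baseline and EmotionInduction


def phase_sample_bounds(phase, n_samples):
    """Hypothetical helper: return (start, stop) sample positions for a phase."""
    split = min(SPLIT_SECONDS * SAMPLING_RATE, n_samples)
    if phase == "Baseline":
        return 0, split
    if phase == "EmotionInduction":
        return split, n_samples
    raise ValueError(f"Unknown phase: {phase!r}")


# A made-up 300 s recording contains 600000 samples at 2000 Hz.
baseline_bounds = phase_sample_bounds("Baseline", 600_000)
emotion_bounds = phase_sample_bounds("EmotionInduction", 600_000)
print(baseline_bounds)  # (0, 240000)
print(emotion_bounds)   # (240000, 600000)
```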

property ecg: _EcgRawDataFrame | DataFrame#

Return ECG data for the active subset.

If return_clean was set to True during initialization, the ECG is cleaned using EcgPreprocessingNeurokit.

Returns:

EcgRawDataFrame: ECG signals (cleaned or raw) as expected by downstream algorithms.

property icg: _IcgRawDataFrame | DataFrame#

Return ICG data for the active subset.

If return_clean was set to True during initialization, the ICG derivative channel is cleaned using IcgPreprocessingBandpass.

Returns:

IcgRawDataFrame: ICG signals (cleaned or raw).

property heartbeats: DataFrame#

Segment ECG into heartbeat borders.

Uses HeartbeatSegmentationNeurokit to compute heartbeat borders and returns the algorithm’s heartbeat list.

Returns:

DataFrame: Heartbeat borders and related metadata (one row per heartbeat).

property labeling_borders: DataFrame#

Return the labeling borders for the currently selected participant/phase.

The returned DataFrame contains one row with columns start_sample and end_sample (inclusive indices) describing the available sample range.

Returns:

DataFrame: Single-row DataFrame with labeling borders.

Raises:

ValueError: If called when the dataset is not restricted to a single participant.

property reference_heartbeats: DataFrame#

Load reference heartbeat borders.

Reference heartbeats are read from reference_heartbeats/IDN<id>.csv and indexed by heartbeat_id. When a single phase is selected, borders are filtered and (for EmotionInduction) shifted to start at zero.

Returns:

DataFrame: Reference heartbeat borders with integer sample columns: start_sample, end_sample, r_peak_sample, rr_interval_sample, and time column start_time (float seconds).

Raises:

ValueError: If the reference heartbeat folder is missing or if called for multiple participants.

classmethod as_attrs()[source]#

Return a version of the Dataset class that can be subclassed using attrs defined classes.

Note, this requires attrs to be installed!

classmethod as_dataclass()[source]#: Return a version of the Dataset class that can be subclassed using dataclasses.

assert_is_single(groupby_cols: list[str] | str | None, property_name) → None[source]#

Raise an error if the index contains more than one group/row with the given groupby settings.

Use this when implementing access to data values that are only available when a single trial/participant/etc. exists in the dataset.

Parameters:

groupby_cols: None (no grouping) or a valid subset of the columns available in the dataset index.
property_name: Name of the property this check is used in. Used to format the error message.

assert_is_single_group(property_name) → None[source]#

Raise an error if the index contains more than one group/row.

Note that this is different from assert_is_single as it is aware of the current grouping. Instead of checking that a certain combination of columns is left in the dataset, it checks that only a single group exists with the already selected grouping as defined by self.groupby_cols.

Parameters:

property_name: Name of the property this check is used in. Used to format the error message.

clone() → Self[source]#

Create a new instance of the class with all parameters copied over.

This will create a new instance of the class itself and of all nested objects.

create_string_group_labels(label_cols: str | list[str]) → list[str][source]#

Generate a list of string labels for each group/row in the dataset.

Note

This has a different use case than the dataset-wide groupby. Using groupby reduces the effective size of the dataset to the number of groups. This method produces a group label for each group/row that is already in the dataset, without changing the dataset.

The output of this method can be used in combination with GroupKFold as the group label.

Parameters:

label_cols: The columns that should be included in the label. If the dataset is already grouped, this must be a subset of self.groupby_cols.
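
A minimal sketch of the idea, assuming an ungrouped participant/phase index: every row receives one string label built from the selected columns, so rows of the same participant share a label. The helper string_group_labels and the underscore-joined label format are assumptions; the strings produced by the library may look different.

```python
# Column order of the dataset index, as documented for create_index.
INDEX_COLUMNS = ("participant", "phase")


def string_group_labels(rows, label_cols):
    """Hypothetical helper: build one string label per index row."""
    positions = [INDEX_COLUMNS.index(col) for col in label_cols]
    return ["_".join(row[pos] for pos in positions) for row in rows]


# Made-up index rows for two participants.
rows = [
    ("IDN_01", "Baseline"),
    ("IDN_01", "EmotionInduction"),
    ("IDN_02", "Baseline"),
    ("IDN_02", "EmotionInduction"),
]

labels = string_group_labels(rows, ["participant"])
print(labels)  # ['IDN_01', 'IDN_01', 'IDN_02', 'IDN_02']
```

A list like this has one entry per row and can be passed as the groups argument of a grouped cross-validation splitter, so that both phases of a participant stay in the same fold.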

get_params(deep: bool = True) → dict[str, Any][source]#

Get parameters for this algorithm.

Parameters:

deep: Only relevant if object contains nested algorithm objects. If this is the case and deep is True, the params of these nested objects are included in the output using a prefix like nested_object_name__ (Note the two “_” at the end)

Returns:

params: Parameter names mapped to their values.
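
The double-underscore prefix for nested parameters can be illustrated with two toy classes. Both classes (Filter and Pipeline) are hypothetical and not part of pepbench; including the nested object itself under its own name is an assumption borrowed from the scikit-learn convention.

```python
class Filter:
    """Hypothetical nested object with a single parameter."""

    def __init__(self, cutoff=40):
        self.cutoff = cutoff

    def get_params(self, deep=True):
        return {"cutoff": self.cutoff}


class Pipeline:
    """Hypothetical outer object holding a nested Filter."""

    def __init__(self, filter, use_cache=True):
        self.filter = filter
        self.use_cache = use_cache

    def get_params(self, deep=True):
        params = {"filter": self.filter, "use_cache": self.use_cache}
        if deep:
            # Nested parameters are exposed as <name>__<nested parameter>.
            for key, value in self.filter.get_params(deep=True).items():
                params[f"filter__{key}"] = value
        return params


pipeline = Pipeline(Filter(cutoff=40))
deep_params = pipeline.get_params(deep=True)
shallow_params = pipeline.get_params(deep=False)
print(sorted(deep_params))     # ['filter', 'filter__cutoff', 'use_cache']
print(sorted(shallow_params))  # ['filter', 'use_cache']
```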

get_subset(*, group_labels: list[tuple[str, ...]] | None = None, index: DataFrame | None = None, bool_map: Sequence[bool] | None = None, **kwargs: list[str] | str) → Self[source]#

Get a subset of the dataset.

Note

All arguments are mutually exclusive!

Parameters:

group_labels: A valid row locator or slice that can be passed to self.grouped_index.loc[locator, :]. This basically needs to be a subset of self.group_labels. Note that this is the only indexer that works on the grouped index. All other indexers work on the pure index.
index: pd.DataFrame that is a valid subset of the current dataset index.
bool_map: bool-map that is used to index the current index-dataframe. The list must be of same length as the number of rows in the index.
**kwargs: The key must be the name of an index column. The value is a string or a list of strings that correspond to the categories that should be kept.

Returns:

subset: New dataset object filtered by specified parameters.
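
The two most common selectors, keyword filters and bool_map, can be sketched on a plain list of index rows. The helpers subset_by_values and subset_by_bool_map are hypothetical stand-ins that mimic the documented behaviour; the real method returns a new dataset object, not a list.

```python
def subset_by_values(rows, **kwargs):
    """Hypothetical stand-in for keyword filtering (column name -> kept values)."""
    wanted = {
        col: [values] if isinstance(values, str) else list(values)
        for col, values in kwargs.items()
    }
    return [
        row for row in rows
        if all(row[col] in values for col, values in wanted.items())
    ]


def subset_by_bool_map(rows, bool_map):
    """Hypothetical stand-in for bool_map filtering."""
    if len(bool_map) != len(rows):
        raise ValueError("bool_map must have one entry per index row")
    return [row for row, keep in zip(rows, bool_map) if keep]


# Made-up index rows for two participants.
rows = [
    {"participant": "IDN_01", "phase": "Baseline"},
    {"participant": "IDN_01", "phase": "EmotionInduction"},
    {"participant": "IDN_02", "phase": "Baseline"},
    {"participant": "IDN_02", "phase": "EmotionInduction"},
]

by_values = subset_by_values(rows, participant=["IDN_01"], phase="Baseline")
by_mask = subset_by_bool_map(rows, [True, False, False, False])
print(by_values == by_mask)  # True
```

With the dataset itself, the equivalent keyword call would take the form get_subset(participant=["IDN_01"], phase="Baseline").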

property group: GroupLabelT#: Get the current group label. Deprecated, use group_label instead.

property group_label: GroupLabelT#

Get the current group label.

The group is defined by the current groupby settings.

Note that this attribute can only be used if there is just a single group. It returns a named tuple. The tuple contains only one entry if there is only a single groupby column or a single column in the index. The elements of the named tuple have the same names as the groupby columns and appear in the same order.

property group_labels: list[GroupLabelT]#

Get all group labels of the dataset based on the set groupby level.

This will return a list of named tuples. The tuples will contain only one entry if there is only one groupby level or index column.

The elements of the named tuples will have the same names as the groupby columns and will be in the same order.

Note that if one of the groupby levels/index columns is not a valid Python attribute name (e.g. it contains spaces or starts with a number), the named tuple will not contain the correct column name! For more information see the documentation of the rename parameter of collections.namedtuple.

For some examples and additional explanation see this example.
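
The renaming behaviour mentioned above comes directly from collections.namedtuple and can be reproduced in isolation. The tuple name GroupLabel and the column name "test phase" are made up for this illustration.

```python
from collections import namedtuple

# "test phase" contains a space, so it is not a valid attribute name.
# With rename=True, namedtuple replaces it with a positional name.
GroupLabel = namedtuple("GroupLabel", ["participant", "test phase"], rename=True)

label = GroupLabel("IDN_01", "Baseline")
fields = GroupLabel._fields

print(fields)             # ('participant', '_1')
print(label.participant)  # IDN_01
print(label._1)           # Baseline
```

Valid column names such as participant and phase are unaffected, so this only matters for indices with unusual column names.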

groupby(groupby_cols: list[str] | str | None) → Self[source]#

Return a copy of the dataset grouped by the specified columns.

This does not change the order of the rows of the dataset index.

Each unique group represents a single data point in the resulting dataset.

Parameters:

groupby_cols: None (no grouping) or a valid subset of the columns available in the dataset index.

property grouped_index: DataFrame#: Return the index with the groupby columns set as multiindex.

property groups: list[GroupLabelT]#: Get the current group labels. Deprecated, use group_labels instead.

property index: DataFrame#: Get index.

index_as_tuples() → list[GroupLabelT][source]#: Get all datapoint labels of the dataset (i.e. a list of the rows of the index as named tuples).

property index_is_unchanged: bool#

Returns True if the index is the same as the one created by create_index.

This can be used to check whether the index represents a subset or the actual full index. Note that this is independent of the groupby_cols setting.

Note

Under the hood, this uses the attrs functionality of pandas to store a hash of the original index on the dataframe. If the index is modified or a new index is created, this attribute either no longer exists or its content has changed.

is_single(groupby_cols: list[str] | str | None) → bool[source]#

Return True if index contains only one row/group with the given groupby settings.

If groupby_cols=None this checks if there is only a single row left. If you want to check if there is only a single group within the current grouping, use is_single_group instead.

Parameters:

groupby_cols: None (no grouping) or a valid subset of the columns available in the dataset index.

is_single_group() → bool[source]#: Return True if index contains only one group.

iter_level(level: str) → Iterator[Self][source]#

Return generator object containing a subset for every category from the selected level.

Parameters:

level: Name of the level to iterate over. This must be one of the column names of the index.

Returns:

subset: New dataset object containing only one category in the specified level.

property reference_labels_icg: DataFrame#

Load reference ICG labels and align them to heartbeat IDs.

The method loads raw B-point annotations from the annotations folder, adjusts them for the selected phase (slicing and shifting) and matches B-points to heartbeat IDs using available reference heartbeat borders.

Returns:

DataFrame: Multi-indexed DataFrame indexed by (heartbeat_id, channel, label) with a single column sample_relative containing the sample index of the annotated point relative to the phase start.

Raises:

ValueError: If called for multiple participants.

property reference_pep: DataFrame#

Compute the reference PEP values between the reference Q-peak and B-point labels.

Returns:

DataFrame: DataFrame containing the computed PEP values.
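
As a worked example of the underlying arithmetic, the PEP of one heartbeat is the distance from the Q-peak to the B-point, converted from samples to milliseconds with the sampling rate. The function pep_from_samples and the sample values are hypothetical; which columns the returned DataFrame contains is not specified here.

```python
SAMPLING_RATE = 2000  # Hz, as documented for SAMPLING_RATE


def pep_from_samples(q_peak_sample, b_point_sample, sampling_rate_hz=SAMPLING_RATE):
    """Hypothetical helper: PEP of one heartbeat in samples and milliseconds."""
    pep_sample = b_point_sample - q_peak_sample
    pep_ms = pep_sample * 1000 / sampling_rate_hz
    return pep_sample, pep_ms


# Made-up labels: Q-peak at sample 1000, B-point at sample 1200.
pep_sample, pep_ms = pep_from_samples(1000, 1200)
print(pep_sample)  # 200
print(pep_ms)      # 100.0
```

At 2000 Hz one sample corresponds to 0.5 ms, so a labeling error of a single sample changes the PEP by 0.5 ms.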

set_params(**params: Any) → Self[source]#

Set the parameters of this Algorithm.

To set parameters of nested objects, use the form nested_object_name__para_name=value.

property shape: tuple[int]#

Get the shape of the dataset.

This only reports a single dimension. This is equal to the number of rows in the index, if self.groupby_cols=None. Otherwise, it is equal to the number of unique groups.
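
The difference between the ungrouped and the grouped shape can be sketched with a plain list of index rows. The helper dataset_shape is a hypothetical stand-in for the property and only mirrors the counting rule described above.

```python
def dataset_shape(rows, groupby_cols=None):
    """Hypothetical stand-in: number of rows, or of unique groups if grouped."""
    if groupby_cols is None:
        return (len(rows),)
    groups = {tuple(row[col] for col in groupby_cols) for row in rows}
    return (len(groups),)


# Made-up index rows: two participants with two phases each.
rows = [
    {"participant": "IDN_01", "phase": "Baseline"},
    {"participant": "IDN_01", "phase": "EmotionInduction"},
    {"participant": "IDN_02", "phase": "Baseline"},
    {"participant": "IDN_02", "phase": "EmotionInduction"},
]

ungrouped_shape = dataset_shape(rows)
grouped_shape = dataset_shape(rows, groupby_cols=["participant"])
print(ungrouped_shape)  # (4,)
print(grouped_shape)    # (2,)
```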

property reference_labels_ecg: DataFrame#

Compute reference ECG labels (Q-peaks) from ECG and reference heartbeats.

Q-peak extraction is performed by QPeakExtractionVanLien2013 and results are returned as a multi-indexed DataFrame consistent with other datasets in the project.

Returns:

DataFrame: Multi-indexed DataFrame indexed by (heartbeat_id, channel, label) with a single column sample_relative.

Raises:

ValueError: If called for multiple participants (since ECG and heartbeats must come from a single recording).
