Data Handling

Module for various data handling helper functions.

This package provides helpers for:

  • Adding a unique identifier column to results dataframes to track samples across datasets and enable reliable merging.

  • Quantifying how the per-sample error changes, either after outlier correction or between two pipelines.

  • Producing performance summaries tailored to PEP estimation tasks for evaluation and reporting.

  • Assessing and quantifying the relationship between reference PEP measurements and heart rate signals.

  • Generating descriptive statistics and diagnostic summaries for PEP values across datasets or cohorts.

  • Extracting the data for a specific algorithm combination from the results-per-sample dataframe.

  • Computing the mean and standard deviation of the error metric grouped by experimental factors.

  • Extracting the PEP values for a specific algorithm combination for downstream processing.

  • Extracting the reference data and reference PEP values, used for benchmarking and validation, from the results-per-sample dataframe.

  • Merging result metrics and results-per-sample dataframes from multiple annotators into a single dataframe.

  • Converting RR intervals (in milliseconds) into instantaneous heart rate values (in beats per minute).

  • Reindexing and renaming dataset-specific data according to mappings for conditions and phases.

Core helpers

add_unique_id_to_results_dataframe(data[, ...])

Add a unique ID to the results dataframe.
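A minimal sketch of the idea, assuming the results dataframe stores its sample identifiers (for example participant, phase and heartbeat) as index levels. The helper name `add_unique_id_sketch`, the `unique_id` column name and the `_` separator are illustrative assumptions, not the package's actual signature.

```python
import pandas as pd


def add_unique_id_sketch(data: pd.DataFrame, levels=None, sep: str = "_") -> pd.DataFrame:
    """Hypothetical helper: join selected index levels into one string ID column."""
    # Default: use every index level, addressed by position so unnamed levels work too
    levels = range(data.index.nlevels) if levels is None else levels
    values = [data.index.get_level_values(level) for level in levels]
    out = data.copy()
    # One ID per row, e.g. ("p01", "rest", 0) -> "p01_rest_0"
    out["unique_id"] = [sep.join(map(str, row)) for row in zip(*values)]
    return out


index = pd.MultiIndex.from_tuples(
    [("p01", "rest", 0), ("p01", "rest", 1), ("p02", "rest", 0)],
    names=["participant", "phase", "heartbeat_id"],
)
results = pd.DataFrame({"pep_ms": [102.0, 98.5, 110.0]}, index=index)
print(add_unique_id_sketch(results)["unique_id"].tolist())
# ['p01_rest_0', 'p01_rest_1', 'p02_rest_0']
```

Storing the ID as a regular column rather than in the index means it survives index resets and can serve directly as a merge key.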

compute_improvement_outlier_correction(data, ...)

Compute the percentage of samples which improved, deteriorated, or remained unchanged after outlier correction.
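One way to express this, as a sketch: assume each sample has an absolute error before and after outlier correction, and call a sample "improved" when that error decreases. The column names and the helper `improvement_after_correction_sketch` are hypothetical; exact equality counts as unchanged, so float data may call for a tolerance.

```python
import pandas as pd


def improvement_after_correction_sketch(
    data: pd.DataFrame,
    before: str = "abs_error_ms_before",
    after: str = "abs_error_ms_after",
) -> pd.Series:
    """Hypothetical helper: share of samples whose absolute error went down, up, or stayed equal."""
    # Ignore samples where either error is missing
    valid = data[[before, after]].dropna()
    diff = valid[after] - valid[before]
    n = len(diff)
    if n == 0:
        raise ValueError("no samples with both errors available")
    return pd.Series(
        {
            "improved_percent": 100 * (diff < 0).sum() / n,
            "deteriorated_percent": 100 * (diff > 0).sum() / n,
            "unchanged_percent": 100 * (diff == 0).sum() / n,
        }
    )


errors = pd.DataFrame(
    {"abs_error_ms_before": [10.0, 5.0, 3.0, 4.0], "abs_error_ms_after": [8.0, 6.0, 3.0, 1.0]}
)
print(improvement_after_correction_sketch(errors))
```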

compute_improvement_pipeline(data, pipelines)

Compute the percentage of samples which showed sign changes in the error metric between two pipelines.
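A sketch of one reading of "sign changes": compare the sign of the signed error (estimate minus reference) that two pipelines produce for the same sample. The helper name and output keys are hypothetical, and the inputs are assumed to be aligned sample by sample with no missing values. A flip to or from exactly zero counts as a change but falls into neither directional column.

```python
import numpy as np
import pandas as pd


def sign_change_sketch(error_a, error_b) -> pd.Series:
    """Hypothetical helper: how often the signed error flips sign between pipeline A and B."""
    # Compare positionally; both inputs must describe the same samples in the same order
    sign_a = np.sign(np.asarray(error_a, dtype=float))
    sign_b = np.sign(np.asarray(error_b, dtype=float))
    if sign_a.shape != sign_b.shape:
        raise ValueError("both pipelines must provide one error per sample")
    n = sign_a.size
    return pd.Series(
        {
            "sign_changed_percent": 100 * np.sum(sign_a != sign_b) / n,
            "pos_to_neg_percent": 100 * np.sum((sign_a > 0) & (sign_b < 0)) / n,
            "neg_to_pos_percent": 100 * np.sum((sign_a < 0) & (sign_b > 0)) / n,
        }
    )


print(sign_change_sketch([5.0, -3.0, 2.0, -1.0], [-1.0, -2.0, 4.0, 3.0]))
```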

compute_pep_performance_metrics(...[, ...])

Compute the performance metrics for the PEP values.
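The page does not list which metrics are reported, so the set below (mean error, its sample standard deviation, mean absolute error and mean absolute relative error) is an assumption chosen for illustration. The helper `pep_performance_sketch` is hypothetical and computes these from paired reference and estimated PEP values in milliseconds.

```python
import numpy as np
import pandas as pd


def pep_performance_sketch(pep_reference_ms, pep_estimated_ms) -> pd.Series:
    """Hypothetical helper: common agreement metrics between estimated and reference PEP."""
    ref = np.asarray(pep_reference_ms, dtype=float)
    est = np.asarray(pep_estimated_ms, dtype=float)
    error = est - ref                      # signed error in ms (positive = overestimation)
    abs_error = np.abs(error)              # absolute error in ms
    abs_rel_error = 100 * abs_error / ref  # absolute error relative to the reference, in %
    return pd.Series(
        {
            "mean_error_ms": np.nanmean(error),
            "std_error_ms": np.nanstd(error, ddof=1),
            "mean_absolute_error_ms": np.nanmean(abs_error),
            "mean_absolute_relative_error_percent": np.nanmean(abs_rel_error),
        }
    )


print(pep_performance_sketch([100.0, 120.0], [110.0, 114.0]))
```

The `nan`-aware NumPy reductions skip samples where either PEP value is missing instead of propagating NaN into every metric.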

correlation_reference_pep_heart_rate(data[, ...])

Compute the correlation between the reference PEP values and the heart rate.
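As a sketch, assuming the reference PEP and the heart rate sit in two columns of the same dataframe (the column names and the helper are hypothetical), the correlation reduces to pandas' `Series.corr`, which excludes rows where either value is missing. Pearson is the default here; Spearman or Kendall can be selected through `method`, but for `Series.corr` these rely on SciPy being installed.

```python
import pandas as pd


def correlation_pep_hr_sketch(
    data: pd.DataFrame,
    pep_col: str = "pep_reference_ms",
    hr_col: str = "heart_rate_bpm",
    method: str = "pearson",
) -> float:
    """Hypothetical helper: correlation coefficient between reference PEP and heart rate."""
    # Series.corr ignores rows where either value is missing
    return data[pep_col].corr(data[hr_col], method=method)


data = pd.DataFrame(
    {"pep_reference_ms": [120.0, 110.0, 100.0, 90.0], "heart_rate_bpm": [60.0, 70.0, 80.0, 90.0]}
)
print(correlation_pep_hr_sketch(data))  # close to -1.0 for this perfectly linear toy data
```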

describe_pep_values(data[, group_cols, metrics])

Compute the descriptive statistics for the PEP values using the pandas.DataFrame.describe method.
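A minimal sketch of how `describe` combines with an optional grouping. It reuses the `group_cols` and `metrics` parameter names from the signature above, but the default metric column `pep_ms` and the helper name are assumptions, not the package's implementation.

```python
import pandas as pd


def describe_pep_sketch(data: pd.DataFrame, group_cols=None, metrics=None) -> pd.DataFrame:
    """Hypothetical helper: pandas describe() on PEP columns, optionally per group."""
    metrics = ["pep_ms"] if metrics is None else list(metrics)
    if group_cols is None:
        return data[metrics].describe()
    # One row per group; columns are (metric, statistic) pairs
    return data.groupby(group_cols)[metrics].describe()


data = pd.DataFrame({"condition": ["A", "A", "B"], "pep_ms": [90.0, 110.0, 120.0]})
print(describe_pep_sketch(data, group_cols="condition"))
```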

get_data_for_algo(results_per_sample, algo_combi)

Extract the data for a specific algorithm combination from the results-per-sample dataframe.
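A sketch assuming that an algorithm combination is a tuple whose entries match named levels of the results-per-sample index. The level names `q_peak_algo` and `b_point_algo`, the algorithm labels and the helper name are hypothetical. `DataFrame.xs` selects the matching rows and drops the matched levels.

```python
import pandas as pd


def get_data_for_algo_sketch(
    results_per_sample: pd.DataFrame,
    algo_combi,
    algo_levels=("q_peak_algo", "b_point_algo"),
) -> pd.DataFrame:
    """Hypothetical helper: select all rows of one algorithm combination."""
    # xs drops the matched algorithm levels, leaving e.g. the participant level
    return results_per_sample.xs(tuple(algo_combi), level=list(algo_levels))


index = pd.MultiIndex.from_tuples(
    [("q1", "b1", "p01"), ("q1", "b2", "p01"), ("q2", "b1", "p01"), ("q1", "b1", "p02")],
    names=["q_peak_algo", "b_point_algo", "participant"],
)
results_per_sample = pd.DataFrame({"pep_ms": [100.0, 105.0, 98.0, 110.0]}, index=index)
print(get_data_for_algo_sketch(results_per_sample, ("q1", "b1")))
```

Selecting only the PEP column of this result mirrors what get_pep_for_algo is described as returning.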

get_error_by_group(results_per_sample[, ...])

Compute mean and standard deviation of the error metric by group.
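A sketch of the grouped aggregation, assuming an absolute-error column and a participant column (both names hypothetical). Note that pandas' `std` is the sample standard deviation (`ddof=1`), so a group with a single sample yields NaN.

```python
import pandas as pd


def get_error_by_group_sketch(
    results_per_sample: pd.DataFrame,
    error_col: str = "abs_error_ms",
    grouper="participant",
) -> pd.DataFrame:
    """Hypothetical helper: mean and standard deviation of an error column per group."""
    # pandas' std uses ddof=1; single-sample groups therefore give NaN
    return results_per_sample.groupby(grouper)[error_col].agg(["mean", "std"])


results_per_sample = pd.DataFrame(
    {"participant": ["p01", "p01", "p02"], "abs_error_ms": [2.0, 4.0, 5.0]}
)
print(get_error_by_group_sketch(results_per_sample))
```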

get_pep_for_algo(results_per_sample, algo_combi)

Extract the PEP values for a specific algorithm combination from the results-per-sample dataframe.

get_reference_data(results_per_sample)

Extract the reference data from the results-per-sample dataframe.

get_reference_pep(results_per_sample)

Extract the reference PEP values from the results-per-sample dataframe.
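Both reference helpers can be sketched under one assumption: the results-per-sample dataframe has two-level columns whose top level separates `reference` from `estimated` values. That column layout and the helper names are hypothetical; the sketch only shows how such a split would be read with `xs`.

```python
import pandas as pd


def get_reference_data_sketch(results_per_sample: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: all columns stored under the 'reference' top-level column."""
    return results_per_sample.xs("reference", axis=1, level=0)


def get_reference_pep_sketch(results_per_sample: pd.DataFrame, pep_col: str = "pep_ms") -> pd.Series:
    """Hypothetical helper: only the reference PEP column."""
    return get_reference_data_sketch(results_per_sample)[pep_col]


columns = pd.MultiIndex.from_tuples(
    [("reference", "pep_ms"), ("reference", "heart_rate_bpm"), ("estimated", "pep_ms")]
)
results = pd.DataFrame([[100.0, 65.0, 104.0], [96.0, 70.0, 93.0]], columns=columns)
print(get_reference_pep_sketch(results).tolist())  # [100.0, 96.0]
```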

merge_result_metrics_from_multiple_annotators(results)

Merge result metrics from multiple annotators into a single dataframe.

merge_results_per_sample_from_different_annotators(results)

Merge results-per-sample dataframes from different annotators into a single dataframe.
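One possible way to merge per-annotator results, sketched with `pd.concat`: each annotator's dataframe goes side by side under a top-level column named after the annotator, aligned on the shared sample index. The annotator labels, the `annotator` level name and the helper are hypothetical.

```python
import pandas as pd


def merge_annotators_sketch(results: dict) -> pd.DataFrame:
    """Hypothetical helper: place each annotator's results side by side."""
    # Dict keys become the top column level; join="inner" keeps only samples all annotators share
    return pd.concat(results, axis=1, join="inner", names=["annotator"])


rater_a = pd.DataFrame({"pep_ms": [100.0, 101.0, 102.0]}, index=[0, 1, 2])
rater_b = pd.DataFrame({"pep_ms": [99.0, 103.0, 97.0]}, index=[1, 2, 3])
merged = merge_annotators_sketch({"rater_a": rater_a, "rater_b": rater_b})
print(merged)
```

Using `join="outer"` instead would keep samples that only one annotator labeled, with NaN in the other annotator's columns.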

rr_interval_to_heart_rate(data)

Convert RR intervals in milliseconds to heart rate in beats per minute.
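Since one minute is 60 000 ms, heart rate in bpm is 60 000 divided by the RR interval in ms (an 800 ms interval gives 75 bpm). A minimal sketch on a pandas Series; the helper name and the choice to mask non-positive intervals as missing are assumptions.

```python
import pandas as pd


def rr_interval_to_heart_rate_sketch(rr_interval_ms: pd.Series) -> pd.Series:
    """Hypothetical helper: heart rate [bpm] = 60 000 ms per minute / RR interval [ms]."""
    # Non-positive intervals are physiologically impossible; mask them as missing
    rr = rr_interval_ms.where(rr_interval_ms > 0)
    return 60_000 / rr


print(rr_interval_to_heart_rate_sketch(pd.Series([1000.0, 800.0, 500.0])).tolist())
# [60.0, 75.0, 120.0]
```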

Utilities

Utility functions for data handling.

This module provides utility functions for handling data from the EmpkinsDataset and GuardianDataset. The functions reindex and rename the data according to specific mappings for conditions and phases. Reindexing can be applied either before or after renaming; reindex_empkins exposes this choice through its after_rename parameter.

reindex_empkins(data[, after_rename])

Reindex data from the EmpkinsDataset.

rename_empkins(data)

Rename the data from the EmpkinsDataset.
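The sketch below illustrates why the order of renaming and reindexing matters: the reindex target must use the labels the index currently holds. It assumes a dataframe indexed by a single condition level (such as a per-condition summary); the raw labels, display names, order and helper names are all hypothetical, since the actual EmpkinsDataset mappings are not listed here.

```python
import pandas as pd

# Hypothetical raw labels, display names and order; the real EmpkinsDataset mappings are not given here.
CONDITION_MAPPING = {"cond_a": "Condition A", "cond_b": "Condition B"}
CONDITION_ORDER = ["cond_a", "cond_b"]


def rename_sketch(data: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: replace raw condition labels with display names."""
    return data.rename(index=CONDITION_MAPPING)


def reindex_sketch(data: pd.DataFrame, after_rename: bool = False) -> pd.DataFrame:
    """Hypothetical helper: put conditions into a fixed order, before or after renaming."""
    # The target labels must match whatever the index currently contains
    order = [CONDITION_MAPPING[c] for c in CONDITION_ORDER] if after_rename else CONDITION_ORDER
    # reindex also inserts NaN rows for conditions missing from the data
    return data.reindex(order)


summary = pd.DataFrame(
    {"pep_ms": [110.0, 100.0]}, index=pd.Index(["cond_b", "cond_a"], name="condition")
)
print(reindex_sketch(rename_sketch(summary), after_rename=True))
```

Reindexing renamed data with the raw order (or raw data with the renamed order) matches no labels and returns only NaN rows, which is the mismatch a flag like after_rename guards against.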