Building and Customizing PEP Extraction Pipelines#
A PEP extraction pipeline in pepbench is a
tpcp pipeline that chains heartbeat segmentation, Q-peak
detection, C- and B-point detection, optional outlier correction, and
finally PEP computation.
The main class is pepbench.pipelines.PepExtractionPipeline.
Conceptual structure#
A pipeline is configured by choosing algorithms for each step:
heartbeat_segmentation_algo– ECG heartbeat boundariesq_peak_algo– Q-peaks on ECGb_point_algo– B-points on ICGc_point_algo– C-points on ICG (optional but required for some B algorithms)outlier_correction_algo– optional B-point post-processing
The pipeline then provides:
methods:
run,safe_runresult attributes:
heartbeat_segmentation_results_,q_peak_results_,c_point_results_,b_point_results_,b_point_after_outlier_correction_results_,pep_results_
A minimal pipeline#
from pepbench.algorithms.heartbeat_segmentation import HeartbeatSegmentationNeurokit
from pepbench.algorithms.ecg import QPeakExtractionVanLien2013
from pepbench.algorithms.icg import (
BPointExtractionLozano2007LinearRegression,
CPointExtractionScipyFindPeaks,
)
from pepbench.algorithms.outlier_correction import OutlierCorrectionLinearInterpolation
from pepbench.pipelines import PepExtractionPipeline
from pepbench.datasets import EmpkinsDataset
# 1. Load a dataset
ds = EmpkinsDataset(
base_path="/path/to/empkins",
only_labeled=True,
exclude_missing_data=True,
)
datapoint = next(iter(ds)) # single datapoint
# 2. Configure algorithms
heartbeat_algo = HeartbeatSegmentationNeurokit()
q_algo = QPeakExtractionVanLien2013(time_interval_ms=40)
c_algo = CPointExtractionScipyFindPeaks()
b_algo = BPointExtractionLozano2007LinearRegression()
outlier_algo = OutlierCorrectionLinearInterpolation()
# 3. Build the pipeline
pipeline = PepExtractionPipeline(
heartbeat_segmentation_algo=heartbeat_algo,
q_peak_algo=q_algo,
b_point_algo=b_algo,
c_point_algo=c_algo,
outlier_correction_algo=outlier_algo,
handle_negative_pep="nan",
handle_missing_events="warn",
)
# 4. Run on a single datapoint
pipeline = pipeline.safe_run(datapoint)
pep_df = pipeline.pep_results_
print(pep_df.head())
Why safe_run?#
PepExtractionPipeline.safe_run wraps run with additional
sanity checks:
verifies that
runreturnsselfchecks that result attributes are set and follow the
*_naming conventionchecks that input parameters are not mutated
When experimenting with custom pipelines or algorithms, prefer
safe_run; once things are stable, run can be used directly if
you need slightly less overhead.
Inspecting intermediate results#
Because the pipeline exposes results per step, you can inspect or plot intermediate stages:
hb = pipeline.heartbeat_segmentation_results_
q = pipeline.q_peak_results_
c = pipeline.c_point_results_
b_raw = pipeline.b_point_results_
b_corr = pipeline.b_point_after_outlier_correction_results_
pep = pipeline.pep_results_
# Example: join PEP with heartbeat table
pep_with_hb = pep.join(hb, how="left")
Configuring parameters#
All algorithms and the pipeline follow the tpcp get_params
/ set_params convention.
# Inspect all parameters (including nested algorithms)
print(pipeline.get_params())
# Change the Q-peak offset
pipeline = pipeline.set_params(q_peak_algo__time_interval_ms=50)
# Change outlier correction behavior
pipeline = pipeline.set_params(
outlier_correction_algo__max_gap_beats=3,
)
After changing parameters, simply call safe_run again.
Working with multiple datapoints#
Pipelines are stateless with respect to the dataset: you reuse the same pipeline instance (or cloned copies) for each datapoint.
for dp in ds:
res = pipeline.clone().safe_run(dp)
# store res.pep_results_ somewhere
For large evaluation runs, you will normally let
PepEvaluationChallenge handle this loop (see
Running Evaluation Challenges).
PepExtractionPipelineReferenceQPeaks and PepExtractionPipelineReferenceBPoints#
pepbench also includes convenience pipelines that use reference annotations instead of algorithmic detection for either Q-peaks or B-points.
These are useful for:
upper-bound comparisons (algorithm vs “perfect” reference)
sanity checks on datasets and scoring
Refer to the API reference for these specialized classes.