Building and Customizing PEP Extraction Pipelines

A PEP extraction pipeline in pepbench is a tpcp pipeline that chains heartbeat segmentation, Q-peak detection, C- and B-point detection, optional outlier correction, and finally PEP computation.

The main class is pepbench.pipelines.PepExtractionPipeline.

Conceptual structure

A pipeline is configured by choosing algorithms for each step:

heartbeat_segmentation_algo – ECG heartbeat boundaries
q_peak_algo – Q-peaks on ECG
b_point_algo – B-points on ICG
c_point_algo – C-points on ICG (may be omitted in general, but required by B-point algorithms that rely on C-points)
outlier_correction_algo – optional B-point post-processing

The pipeline then provides:

methods: run, safe_run
result attributes: heartbeat_segmentation_results_, q_peak_results_, c_point_results_, b_point_results_, b_point_after_outlier_correction_results_, pep_results_
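
The last step of the chain is an interval computation: for each heartbeat, PEP is the time from the Q-peak to the B-point. The following is a minimal sketch of that arithmetic only, not pepbench's implementation. The helper name `compute_pep_ms`, its arguments, and the `"keep"` mode are assumptions made for illustration; only the `"nan"` mode mirrors the `handle_negative_pep="nan"` setting used in this guide.

```python
import math


def compute_pep_ms(q_samples, b_samples, sampling_rate_hz, handle_negative="nan"):
    """Return one PEP value in milliseconds per heartbeat (hypothetical helper)."""
    pep_values = []
    for q, b in zip(q_samples, b_samples):
        # A missing event (None) yields a missing PEP for that heartbeat.
        if q is None or b is None:
            pep_values.append(math.nan)
            continue
        # Convert the sample distance between B-point and Q-peak to milliseconds.
        pep_ms = (b - q) * 1000.0 / sampling_rate_hz
        # A B-point located before the Q-peak gives a negative interval.
        if pep_ms < 0 and handle_negative == "nan":
            pep_ms = math.nan
        pep_values.append(pep_ms)
    return pep_values


# Three heartbeats sampled at 1000 Hz: a valid one, one without a B-point,
# and one where the B-point lies before the Q-peak.
pep = compute_pep_ms([100, 900, 1700], [190, None, 1690], sampling_rate_hz=1000)
```

In the real pipeline these values come from the per-step result attributes listed above rather than from plain lists.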

A minimal pipeline

from pepbench.algorithms.heartbeat_segmentation import HeartbeatSegmentationNeurokit
from pepbench.algorithms.ecg import QPeakExtractionVanLien2013
from pepbench.algorithms.icg import (
    BPointExtractionLozano2007LinearRegression,
    CPointExtractionScipyFindPeaks,
)
from pepbench.algorithms.outlier_correction import OutlierCorrectionLinearInterpolation
from pepbench.pipelines import PepExtractionPipeline
from pepbench.datasets import EmpkinsDataset

# 1. Load a dataset
ds = EmpkinsDataset(
    base_path="/path/to/empkins",
    only_labeled=True,
    exclude_missing_data=True,
)

datapoint = next(iter(ds))  # single datapoint

# 2. Configure algorithms
heartbeat_algo = HeartbeatSegmentationNeurokit()
q_algo = QPeakExtractionVanLien2013(time_interval_ms=40)
c_algo = CPointExtractionScipyFindPeaks()
b_algo = BPointExtractionLozano2007LinearRegression()
outlier_algo = OutlierCorrectionLinearInterpolation()

# 3. Build the pipeline
pipeline = PepExtractionPipeline(
    heartbeat_segmentation_algo=heartbeat_algo,
    q_peak_algo=q_algo,
    b_point_algo=b_algo,
    c_point_algo=c_algo,
    outlier_correction_algo=outlier_algo,
    handle_negative_pep="nan",
    handle_missing_events="warn",
)

# 4. Run on a single datapoint
pipeline = pipeline.safe_run(datapoint)

pep_df = pipeline.pep_results_
print(pep_df.head())

Why `safe_run`?

PepExtractionPipeline.safe_run wraps run with additional sanity checks:

verifies that run returns self
checks that result attributes are set and follow the *_ naming convention
checks that input parameters are not mutated

When experimenting with custom pipelines or algorithms, prefer safe_run; once things are stable, run can be used directly if you need slightly less overhead.
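
To make the three checks concrete, here is a simplified sketch of what such a wrapper might do. It is not the tpcp or pepbench implementation: `ToyPipeline` and `checked_run` are hypothetical names, and the real checks may be stricter or differ in detail.

```python
import copy


class ToyPipeline:
    """Hypothetical stand-in for a pipeline; not a tpcp or pepbench class."""

    def __init__(self, threshold=0.5):
        self.threshold = threshold

    def get_params(self):
        return {"threshold": self.threshold}

    def run(self, datapoint):
        # Results are stored on the instance under a name ending in "_".
        self.result_ = [x for x in datapoint if x > self.threshold]
        return self


def checked_run(pipeline, datapoint):
    """Call run and apply the three checks described above (illustrative only)."""
    params_before = copy.deepcopy(pipeline.get_params())

    returned = pipeline.run(datapoint)

    # Check 1: run must return the pipeline instance itself.
    if returned is not pipeline:
        raise ValueError("run must return self")
    # Check 2: every non-parameter attribute must be a result named "*_".
    non_param_attrs = set(vars(pipeline)) - set(params_before)
    if not non_param_attrs:
        raise ValueError("run must set at least one result attribute")
    if any(not name.endswith("_") for name in non_param_attrs):
        raise ValueError("result attributes must end with an underscore")
    # Check 3: the input parameters must be unchanged after run.
    if pipeline.get_params() != params_before:
        raise ValueError("run must not modify input parameters")
    return returned


pipe = checked_run(ToyPipeline(threshold=0.5), [0.2, 0.7, 0.9])
```

A custom `run` that, for example, overwrites one of its own parameters would fail the third check, which is the kind of mistake these checks are meant to surface early.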

Inspecting intermediate results

Because the pipeline exposes results per step, you can inspect or plot intermediate stages:

hb = pipeline.heartbeat_segmentation_results_
q = pipeline.q_peak_results_
c = pipeline.c_point_results_
b_raw = pipeline.b_point_results_
b_corr = pipeline.b_point_after_outlier_correction_results_
pep = pipeline.pep_results_

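# Example: join PEP with heartbeat table on the shared index
# (rsuffix avoids an error if both tables contain columns with the same name)
pep_with_hb = pep.join(hb, how="left", rsuffix="_hb")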

Configuring parameters

All algorithms and the pipeline follow the tpcp get_params / set_params convention.

# Inspect all parameters (including nested algorithms)
print(pipeline.get_params())

# Change the Q-peak offset
pipeline = pipeline.set_params(q_peak_algo__time_interval_ms=50)

# Change outlier correction behavior
pipeline = pipeline.set_params(
    outlier_correction_algo__max_gap_beats=3,
)

After changing parameters, simply call safe_run again.
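
The double underscore in names such as `q_peak_algo__time_interval_ms` addresses a parameter of a nested object. The toy sketch below shows how such a key can be resolved. It is not tpcp's implementation, and `ToyAlgo`, `ToyPipeline`, and `set_nested_params` are hypothetical names used only for illustration.

```python
class ToyAlgo:
    """Hypothetical algorithm with a single parameter."""

    def __init__(self, time_interval_ms=40):
        self.time_interval_ms = time_interval_ms


class ToyPipeline:
    """Hypothetical pipeline holding one nested algorithm."""

    def __init__(self, q_peak_algo):
        self.q_peak_algo = q_peak_algo


def set_nested_params(obj, **params):
    """Set parameters on obj, following "outer__inner" keys into nested objects."""
    for key, value in params.items():
        # Split only at the first "__": the left part names an attribute of obj,
        # the right part (if any) is resolved on that attribute.
        name, separator, rest = key.partition("__")
        if not hasattr(obj, name):
            raise ValueError(f"unknown parameter: {name!r}")
        if separator:
            set_nested_params(getattr(obj, name), **{rest: value})
        else:
            setattr(obj, name, value)
    return obj


toy = ToyPipeline(q_peak_algo=ToyAlgo(time_interval_ms=40))
set_nested_params(toy, q_peak_algo__time_interval_ms=50)
```

Rejecting unknown names, as the sketch does, means a misspelled key fails loudly instead of silently creating a new attribute.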

Working with multiple datapoints

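A pipeline's configuration does not depend on any particular datapoint, so the same setup can be applied to every datapoint in a dataset. Results, however, are stored on the pipeline instance, so run each datapoint on a cloned copy to keep earlier results from being overwritten.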

for dp in ds:
    res = pipeline.clone().safe_run(dp)
    # store res.pep_results_ somewhere

For large evaluation runs, you will normally let PepEvaluationChallenge handle this loop (see Running Evaluation Challenges).
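
If you do write the loop yourself, a common pattern is to keep one configured pipeline as a template and collect the results of each clone under a key identifying the datapoint. The sketch below illustrates this pattern with a toy class; `ToyPipeline`, its `offset` parameter, and the datapoint keys are hypothetical and stand in for a real pipeline and dataset.

```python
class ToyPipeline:
    """Hypothetical stand-in for a configured pipeline."""

    def __init__(self, offset=0):
        self.offset = offset

    def clone(self):
        # A clone carries the parameters only, never previous results.
        return type(self)(offset=self.offset)

    def run(self, datapoint):
        self.result_ = [x + self.offset for x in datapoint]
        return self


# Made-up datapoints keyed by a participant identifier.
datapoints = {"p01": [1, 2], "p02": [3]}

template = ToyPipeline(offset=10)
results = {}
for key, dp in datapoints.items():
    # Each datapoint gets its own clone, so results never overwrite each other.
    results[key] = template.clone().run(dp).result_

# The template itself was never run and therefore holds no results.
template_has_results = hasattr(template, "result_")
```

With real pipelines, the collected values would be the `pep_results_` tables, which can then be combined into a single table for analysis.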

`PepExtractionPipelineReferenceQPeaks` and `PepExtractionPipelineReferenceBPoints`

pepbench also includes convenience pipelines that use reference annotations instead of algorithmic detection for either Q-peaks or B-points.

These are useful for:

upper-bound comparisons (algorithm vs “perfect” reference)
sanity checks on datasets and scoring

Refer to the API reference for these specialized classes.