score_pep_evaluation

pepbench.evaluation.score_pep_evaluation(pipeline: BasePepExtractionPipeline, datapoint: BasePepDatasetWithAnnotations) → dict

Run a PEP extraction pipeline on a single datapoint and compute evaluation metrics.

The function executes the pipeline on datapoint and matches detected heartbeats to the reference. It computes a set of metrics that are either:

first averaged on the single datapoint and later aggregated across the dataset (returned as scalar floats),

passed through as single values per datapoint (to be aggregated later via a summation aggregator), or

returned as per-sample results (unaggregated) for downstream per-sample aggregation.
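The difference between these three aggregation styles can be sketched in plain Python. This is only an illustration under stated assumptions: the datapoint names and error values are invented, and the code does not use pepbench or tpcp aggregators.

```python
from statistics import mean

# Hypothetical per-sample absolute errors (ms) for two datapoints
# with different numbers of matched heartbeats.
per_datapoint_errors = {
    "dp1": [2.0, 4.0],
    "dp2": [10.0, 10.0, 10.0, 10.0],
}

# Style 1: average within each datapoint first, then across the dataset.
# Every datapoint has the same weight, regardless of its length.
mean_of_means = mean(mean(v) for v in per_datapoint_errors.values())

# Style 2: per-datapoint counters are summed across the dataset.
num_samples_total = sum(len(v) for v in per_datapoint_errors.values())

# Style 3: per-sample values are pooled and aggregated once.
# Every sample has the same weight, so longer datapoints dominate.
pooled_mean = mean(e for v in per_datapoint_errors.values() for e in v)

print(mean_of_means, num_samples_total, pooled_mean)
```

Styles 1 and 3 generally give different results when datapoints contain different numbers of heartbeats, which is why both kinds of metrics are reported.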

The following metrics are computed and returned:

Datapoint-level values that are typically aggregated across the dataset later: pep_reference_ms, pep_estimated_ms, error_ms, absolute_error_ms, absolute_relative_error_percent.
Datapoint-level counters intended for summation across the dataset: num_pep_total, num_pep_valid, num_pep_invalid.
A datapoint-level scalar passed through without aggregation: pearson_r.
Per-sample values kept unaggregated for downstream processing: pep_estimation_per_sample.
Metrics aggregated directly across all matched samples: error_per_sample_ms, absolute_error_per_sample_ms, absolute_relative_error_per_sample_percent.
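As a rough sketch of how the datapoint-level values and counters listed above could be derived from matched reference and estimated PEPs, consider the following. It is not pepbench's implementation: the helper `sketch_datapoint_metrics` is hypothetical, and it assumes that an invalid PEP is encoded as NaN and that the error is defined as estimated minus reference.

```python
import math


def sketch_datapoint_metrics(pep_reference_ms, pep_estimated_ms):
    """Illustrative only: compute datapoint-level metrics from matched PEPs."""
    pairs = list(zip(pep_reference_ms, pep_estimated_ms))
    # Assumption: a NaN in either value marks the heartbeat as invalid.
    valid = [(r, e) for r, e in pairs if not (math.isnan(r) or math.isnan(e))]

    # Assumption: error sign convention is estimated minus reference.
    errors = [e - r for r, e in valid]
    abs_errors = [abs(x) for x in errors]
    abs_rel_errors = [100.0 * abs(e - r) / abs(r) for r, e in valid]

    def _mean(values):
        # Return NaN instead of raising when no valid heartbeat exists.
        return sum(values) / len(values) if values else math.nan

    return {
        "error_ms": _mean(errors),
        "absolute_error_ms": _mean(abs_errors),
        "absolute_relative_error_percent": _mean(abs_rel_errors),
        "num_pep_total": len(pairs),
        "num_pep_valid": len(valid),
        "num_pep_invalid": len(pairs) - len(valid),
    }


metrics = sketch_datapoint_metrics(
    pep_reference_ms=[100.0, 120.0, 80.0],
    pep_estimated_ms=[110.0, math.nan, 76.0],
)
print(metrics)
```

The dictionary keys mirror the metric names documented above; the real function additionally returns the correlation and per-sample entries, which this sketch omits.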

Parameters:

pipeline (pepbench.pipelines.BasePepExtractionPipeline): A PEP extraction pipeline instance. The pipeline is run via its pepbench.pipelines.BasePepExtractionPipeline.safe_run method.
datapoint (pepbench.datasets.BasePepDatasetWithAnnotations): A single datapoint providing reference PEPs, reference heartbeats, and the sampling rate.

Returns:

dict: Dictionary containing the evaluation metrics. Some values are scalar floats, some are wrapped with tpcp.validate.no_agg so that they are passed through unaggregated, and some are wrapped in per-sample aggregators.
