score_pep_evaluation

pepbench.evaluation.score_pep_evaluation(pipeline: BasePepExtractionPipeline, datapoint: BasePepDatasetWithAnnotations) → dict

Run a PEP extraction pipeline on a single datapoint and compute evaluation metrics.

The function runs the pipeline on the datapoint and matches the detected heartbeats to the reference heartbeats. It then computes a set of metrics, each of which is either:

  • first averaged on the single datapoint and later aggregated across the dataset (returned as scalar floats),

  • passed through as single values per datapoint (to be aggregated later via a summation aggregator), or

  • returned as per-sample results (unaggregated) for downstream per-sample aggregation.
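The difference between these three aggregation styles can be illustrated with a small, library-independent sketch. The per-datapoint error lists and variable names below are hypothetical and not part of pepbench; in practice the aggregation is handled by tpcp aggregators rather than by hand-written code like this.

```python
from statistics import fmean

# Hypothetical per-sample absolute errors (in ms) for two datapoints.
errors_per_datapoint = {
    "datapoint_a": [2.0, 4.0],
    "datapoint_b": [10.0, 10.0, 10.0, 10.0],
}

# Style 1: average within each datapoint first, then across datapoints.
per_datapoint_means = [fmean(v) for v in errors_per_datapoint.values()]
mean_of_means = fmean(per_datapoint_means)

# Style 2: one value per datapoint (here a sample count), summed over the dataset.
total_samples = sum(len(v) for v in errors_per_datapoint.values())

# Style 3: keep every sample and aggregate over the pooled samples.
pooled = [e for v in errors_per_datapoint.values() for e in v]
pooled_mean = fmean(pooled)

print(mean_of_means)  # 6.5
print(total_samples)  # 6
print(pooled_mean)    # larger than 6.5, because datapoint_b contributes more samples
```

Styles 1 and 3 give different results whenever datapoints contain different numbers of heartbeats, which is why the function returns both datapoint-level and per-sample variants of the error metrics.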

The following metrics are computed and returned:

  • Datapoint-level values that are typically aggregated across the dataset later: pep_reference_ms, pep_estimated_ms, error_ms, absolute_error_ms, absolute_relative_error_percent.

  • Datapoint-level counters intended for summation across the dataset: num_pep_total, num_pep_valid, num_pep_invalid.

  • A datapoint-level scalar passed through without aggregation: pearson_r.

  • Per-sample values kept unaggregated for downstream processing: pep_estimation_per_sample.

  • Metrics aggregated directly across all matched samples: error_per_sample_ms, absolute_error_per_sample_ms, absolute_relative_error_per_sample_percent.
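As a rough illustration of what the listed keys might contain, the following sketch computes comparable quantities for one datapoint from already matched heartbeats. It is not the pepbench implementation: the helper name sketch_pep_metrics is hypothetical, and the sign convention (estimate minus reference), the normalisation by the reference value, and the use of NaN to mark an invalid estimate are all assumptions made for this example.

```python
import math
from statistics import fmean


def _mean_or_nan(values):
    """Return the mean of values, or NaN if the list is empty."""
    return fmean(values) if values else math.nan


def sketch_pep_metrics(reference_ms, estimated_ms):
    """Illustrative only: error metrics for matched heartbeats of one datapoint."""
    # Assumption: a NaN estimate marks a heartbeat without a valid PEP.
    pairs = [(r, e) for r, e in zip(reference_ms, estimated_ms) if not math.isnan(e)]
    # Assumption: error is defined as estimate minus reference.
    errors = [e - r for r, e in pairs]
    abs_errors = [abs(err) for err in errors]
    # Assumption: the relative error is normalised by the reference value.
    abs_rel_errors = [100.0 * abs(e - r) / r for r, e in pairs]
    return {
        # Datapoint-level means.
        "error_ms": _mean_or_nan(errors),
        "absolute_error_ms": _mean_or_nan(abs_errors),
        "absolute_relative_error_percent": _mean_or_nan(abs_rel_errors),
        # Counters intended to be summed across datapoints.
        "num_pep_total": len(reference_ms),
        "num_pep_valid": len(pairs),
        "num_pep_invalid": len(reference_ms) - len(pairs),
        # Per-sample values kept for pooled aggregation.
        "error_per_sample_ms": errors,
        "absolute_error_per_sample_ms": abs_errors,
    }


metrics = sketch_pep_metrics([100.0, 120.0, 80.0], [110.0, math.nan, 76.0])
print(metrics["error_ms"])           # 3.0
print(metrics["absolute_error_ms"])  # 7.0
print(metrics["num_pep_invalid"])    # 1
```

The signed error can cancel out across heartbeats (here +10 ms and -4 ms average to 3 ms), whereas the absolute error cannot, which is why both are reported.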

Parameters:
pipeline : pepbench.pipelines.BasePepExtractionPipeline

A PEP extraction pipeline instance. The pipeline will be run using its pepbench.pipelines.BasePepExtractionPipeline.safe_run method.

datapoint : pepbench.datasets.BasePepDatasetWithAnnotations

A single datapoint that provides the reference PEPs, the reference heartbeats, and the sampling rate.

Returns:
dict

Dictionary containing the evaluation metrics. Some values are scalar floats, some are structures returned via tpcp.validate.no_agg and some are the result of per-sample aggregators.