Evaluation and Validation

Evaluation challenges for systematic evaluation of PEP extraction pipelines.

The systematic evaluation of different PEP extraction pipelines requires a standardized evaluation procedure. This can be achieved by defining a challenge that evaluates a pipeline on a given dataset and returns the results. tpcp provides the functionality needed to define such challenges and to compare the results of different pipelines; for PEP extraction, this is implemented in pepbench.evaluation.PepEvaluationChallenge.

The evaluation challenge takes a dataset and a scoring function and evaluates the performance of a PEP extraction pipeline on that dataset. The scoring function can be customized; the default is pepbench.evaluation.score_pep_evaluation.
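
The pattern described above (one dataset, one scoring function, results aggregated per pipeline) can be sketched without pepbench itself. The following is a minimal illustration and not the pepbench implementation: `ToyChallenge`, `toy_score`, `ConstantPipeline`, and the dictionary-based datapoints are hypothetical stand-ins that only mirror the keyword-only `dataset` and `scoring` arguments named in this section.

```python
from statistics import mean, stdev


class ConstantPipeline:
    """Hypothetical pipeline that always predicts the same PEP (in ms)."""

    def __init__(self, pep_ms):
        self.pep_ms = pep_ms

    def run(self, datapoint):
        return self.pep_ms


def toy_score(pipeline, datapoint):
    """Hypothetical scoring function: absolute PEP error for one datapoint."""
    estimate = pipeline.run(datapoint)
    return {"abs_error_ms": abs(estimate - datapoint["reference_pep_ms"])}


class ToyChallenge:
    """Hypothetical stand-in for an evaluation challenge."""

    def __init__(self, *, dataset, scoring):
        self.dataset = dataset
        self.scoring = scoring
        self.per_datapoint = []

    def run(self, pipeline):
        # Score every datapoint with the same scoring function.
        self.per_datapoint = [self.scoring(pipeline, dp) for dp in self.dataset]
        errors = [result["abs_error_ms"] for result in self.per_datapoint]
        # Aggregate so that different pipelines are compared on equal terms.
        return {"mean": mean(errors), "std": stdev(errors)}


# Made-up reference values, for illustration only.
dataset = [
    {"reference_pep_ms": 90.0},
    {"reference_pep_ms": 110.0},
    {"reference_pep_ms": 100.0},
]
challenge = ToyChallenge(dataset=dataset, scoring=toy_score)
summary = challenge.run(ConstantPipeline(pep_ms=100.0))
print(summary)
```

Because the dataset and the scoring function are fixed when the challenge is created, running a second pipeline through the same `ToyChallenge` instance yields directly comparable summaries.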

Evaluation Challenges

PepEvaluationChallenge(*, dataset, scoring, ...)

Evaluation challenge for PEP extraction pipelines.

ChallengeResults(agg_mean_std, agg_total, ...)

Container for the results produced by a PEP evaluation challenge.

Evaluation Scoring Functions

score_pep_evaluation(pipeline, datapoint)

Run a PEP extraction pipeline on a single datapoint and compute evaluation metrics.
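
A custom scoring function would presumably follow the same `(pipeline, datapoint)` signature as the one listed above. The sketch below shows what such a function might compute for a single datapoint. It is an assumption-laden illustration: `score_single_datapoint`, `OffsetPipeline`, the `reference_pep_ms` key, and the returned metric names are all hypothetical and are not taken from pepbench.

```python
class OffsetPipeline:
    """Hypothetical pipeline: returns the reference PEP shifted by a fixed offset (ms)."""

    def __init__(self, offset_ms):
        self.offset_ms = offset_ms

    def run(self, datapoint):
        return [pep + self.offset_ms for pep in datapoint["reference_pep_ms"]]


def score_single_datapoint(pipeline, datapoint):
    """Hypothetical custom scoring function with a (pipeline, datapoint) signature."""
    estimated = pipeline.run(datapoint)
    reference = datapoint["reference_pep_ms"]
    # Beat-wise signed error between estimated and reference PEP.
    errors = [est - ref for est, ref in zip(estimated, reference)]
    return {
        "mean_error_ms": sum(errors) / len(errors),
        "mean_absolute_error_ms": sum(abs(e) for e in errors) / len(errors),
        "n_beats": len(errors),
    }


# Made-up per-beat reference values, for illustration only.
datapoint = {"reference_pep_ms": [95.0, 102.0, 98.0]}
metrics = score_single_datapoint(OffsetPipeline(offset_ms=4.0), datapoint)
print(metrics)
```

Returning a flat dictionary of named metrics keeps the per-datapoint results easy to collect and aggregate across a dataset.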