PepEvaluationChallenge

class pepbench.evaluation.PepEvaluationChallenge(*, dataset: BasePepDatasetWithAnnotations, scoring: Callable = score_pep_evaluation, validate_kwargs: dict | None = None)

Evaluation challenge for PEP extraction pipelines.

This is the tpcp implementation of the evaluation challenge for PEP extraction pipelines. It evaluates a given PEP extraction pipeline on a dataset and produces aggregated and per-sample evaluation results stored as pandas DataFrames on the challenge instance.

Parameters:
dataset : BasePepDatasetWithAnnotations

The dataset to evaluate. The dataset must implement the unified interface required by the evaluation utilities.

scoring : Callable, optional

The scoring function to use for evaluation. The scoring function should accept the pipeline and a datapoint and return a dictionary with evaluation outputs. Default is pepbench.evaluation.score_pep_evaluation.

validate_kwargs : dict, optional

Additional keyword arguments passed to tpcp.validate.Scorer.

Attributes:
results_ : dict

Raw results returned by tpcp.validate.validate.

results_agg_mean_std_ : pandas.DataFrame

Mean and standard deviation aggregated results.

results_agg_total_ : pandas.DataFrame

Total counts aggregated results.

results_single_ : pandas.DataFrame

Single (non-aggregated) results for each datapoint.

results_per_sample_ : pandas.DataFrame

Per-sample flattened results.

Methods

clone()

Create a new instance of the class with all parameters copied over.

get_params([deep])

Get parameters for this algorithm.

results_as_df()

Convert the raw validation results to pandas DataFrames and attach them to the instance.

run(pipeline)

Run the evaluation challenge for a given pipeline.

save_results(folder_path, filename_stub)

Save the results of the evaluation to disk.

set_params(**params)

Set the parameters of this Algorithm.

__init__(*, dataset: BasePepDatasetWithAnnotations, scoring: Callable = score_pep_evaluation, validate_kwargs: dict | None = None) -> None

Initialize a new evaluation challenge.

To initialize a new evaluation challenge, you need to provide a dataset and a scoring function. Afterwards, you can challenge a specific PEP extraction pipeline by passing it to the run method.

Parameters:
dataset : BasePepDatasetWithAnnotations

The dataset to evaluate the pipeline on. The dataset needs to be a subclass of BaseUnifiedPepExtractionDataset, which provides the necessary unified interface to access the data.

scoring : Callable, optional

The scoring function to use for the evaluation. The scoring function should take the pipeline and a datapoint from the dataset as input and return a dictionary with the evaluation results. The default scoring function is pepbench.evaluation._scoring.score_pep_evaluation.

validate_kwargs : dict, optional

Additional keyword arguments to pass to the tpcp.validate.Scorer class.
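The shape of a custom scoring callable can be sketched as follows. This is only an illustration of the "pipeline and datapoint in, dictionary out" contract described above: ToyDatapoint, ToyPipeline, the estimate method, and the metric name are hypothetical stand-ins and are not part of pepbench or tpcp.

```python
from dataclasses import dataclass


@dataclass
class ToyDatapoint:
    """Hypothetical stand-in for one datapoint of an annotated dataset."""

    reference_pep_ms: list[float]


class ToyPipeline:
    """Hypothetical stand-in for a PEP extraction pipeline."""

    def estimate(self, datapoint: ToyDatapoint) -> list[float]:
        # Pretend every estimate is 5 ms longer than the reference.
        return [pep + 5.0 for pep in datapoint.reference_pep_ms]


def score_mean_absolute_error(pipeline, datapoint) -> dict[str, float]:
    """Scoring callable: takes (pipeline, datapoint) and returns a dict."""
    estimated = pipeline.estimate(datapoint)
    errors = [abs(est - ref) for est, ref in zip(estimated, datapoint.reference_pep_ms)]
    return {"mean_absolute_error_ms": sum(errors) / len(errors)}


scores = score_mean_absolute_error(ToyPipeline(), ToyDatapoint([100.0, 110.0, 95.0]))
```

A function with this call signature is what would be passed as scoring=...; how a real pipeline exposes its estimates and how a real datapoint exposes its reference annotations depends on the pepbench classes and is not shown here.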

run(pipeline: BasePepExtractionPipeline) -> Self

Run the evaluation challenge for a given pipeline.

Executes validation using tpcp.validate.validate with a tpcp.validate.Scorer and aggregates timing information.

Parameters:
pipeline : BasePepExtractionPipeline

The PEP extraction pipeline to evaluate. The pipeline needs to be a subclass of pepbench.pipelines.BasePepExtractionPipeline and should be able to process the dataset.

Returns:
Self

The challenge instance with results stored as attributes (see class docstring).
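A minimal sketch of the overall workflow, assuming pepbench is installed. The helper name evaluate_pipeline, the default stub "my_pipeline", and the arguments dataset, pipeline, and output_folder are placeholders supplied by the caller. Calling results_as_df() explicitly after run() is an assumption about the intended call order, not something this page states.

```python
def evaluate_pipeline(dataset, pipeline, output_folder, filename_stub="my_pipeline"):
    """Run one pipeline through the challenge and persist its results."""
    # Imported lazily so that defining this helper does not require pepbench.
    from pepbench.evaluation import PepEvaluationChallenge

    # When `scoring` is omitted, the default scoring function is used.
    challenge = PepEvaluationChallenge(dataset=dataset)

    # run() returns the challenge itself, so the calls can be chained.
    challenge.run(pipeline).results_as_df()

    # Writes the results to disk, prefixed with the filename stub.
    challenge.save_results(output_folder, filename_stub)
    return challenge
```

The returned challenge then carries the results attributes listed in the class description, for example results_agg_mean_std_.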

save_results(folder_path: path_t, filename_stub: str) -> None

Save the results of the evaluation to disk.

Saves timing information as JSON and DataFrame results as CSV files using the provided filename stub.

Parameters:
folder_path : pathlib.Path or str

Folder path to save the results to.

filename_stub : str

Filename stub to prefix saved files.
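The idea of prefixing every output file with a common stub can be illustrated with a standard-library sketch. This is not pepbench's implementation: the helper save_with_stub, the file suffixes, and the example data are invented for illustration, and the file names that save_results actually writes are not stated on this page.

```python
import csv
import json
import tempfile
from pathlib import Path


def save_with_stub(folder_path, filename_stub, timing, tables):
    """Write timing as JSON and each table as CSV, prefixing names with the stub."""
    folder = Path(folder_path)
    folder.mkdir(parents=True, exist_ok=True)
    written = []

    timing_path = folder / f"{filename_stub}_timing.json"  # hypothetical suffix
    timing_path.write_text(json.dumps(timing, indent=2))
    written.append(timing_path.name)

    for table_name, rows in tables.items():
        table_path = folder / f"{filename_stub}_{table_name}.csv"  # hypothetical suffix
        with table_path.open("w", newline="") as handle:
            writer = csv.DictWriter(handle, fieldnames=list(rows[0]))
            writer.writeheader()
            writer.writerows(rows)
        written.append(table_path.name)

    return sorted(written)


with tempfile.TemporaryDirectory() as tmp:
    saved_files = save_with_stub(
        tmp,
        "my_pipeline",
        timing={"total_runtime_s": 1.5},
        tables={"results_single": [{"datapoint": "p01", "error_ms": 4.0}]},
    )
```

Using one stub per evaluated pipeline keeps the outputs of several runs distinguishable when they are saved to the same folder.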

clone() -> Self

Create a new instance of the class with all parameters copied over.

This will create a new instance of the class itself and all nested objects.

get_params(deep: bool = True) -> dict[str, Any]

Get parameters for this algorithm.

Parameters:
deep

Only relevant if the object contains nested algorithm objects. If so and deep is True, the parameters of these nested objects are included in the output under a prefix like nested_object_name__ (note the two underscores at the end).

Returns:
params

Parameter names mapped to their values.

set_params(**params: Any) -> Self

Set the parameters of this algorithm.

To set parameters of nested objects, use nested_object_name__para_name=value.
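The double-underscore naming used by get_params and set_params can be sketched with a small stand-in. ToyParamsMixin, ToyFilter, and ToyPipeline are hypothetical classes that imitate only the naming convention; they are not tpcp's or pepbench's implementation.

```python
class ToyParamsMixin:
    """Hypothetical mini base class imitating the naming convention only."""

    _param_names = ()

    def get_params(self, deep=True):
        params = {name: getattr(self, name) for name in self._param_names}
        if deep:
            for name in self._param_names:
                value = getattr(self, name)
                if hasattr(value, "get_params"):
                    # Nested parameters appear as "<outer>__<inner>".
                    for sub_name, sub_value in value.get_params(deep=True).items():
                        params[f"{name}__{sub_name}"] = sub_value
        return params

    def set_params(self, **params):
        for key, value in params.items():
            name, separator, sub_key = key.partition("__")
            if separator:
                # Forward "<outer>__<inner>" to the nested object.
                getattr(self, name).set_params(**{sub_key: value})
            else:
                setattr(self, name, value)
        return self


class ToyFilter(ToyParamsMixin):
    _param_names = ("cutoff_hz",)

    def __init__(self, cutoff_hz=40.0):
        self.cutoff_hz = cutoff_hz


class ToyPipeline(ToyParamsMixin):
    _param_names = ("filter_algo", "window_s")

    def __init__(self, filter_algo, window_s=2.0):
        self.filter_algo = filter_algo
        self.window_s = window_s


pipeline = ToyPipeline(ToyFilter())
deep_names = sorted(pipeline.get_params(deep=True))
shallow_names = sorted(pipeline.get_params(deep=False))
pipeline.set_params(filter_algo__cutoff_hz=25.0, window_s=4.0)
```

With deep=True the nested filter's parameter shows up as filter_algo__cutoff_hz, and the same key can be passed to set_params to change it without touching the nested object directly.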

results_as_df() -> Self

Convert the raw validation results to pandas DataFrames and attach them to the instance.

The method builds the following DataFrames and stores them as instance attributes:
  • results_agg_mean_std_: Mean and standard deviation of aggregated metrics.

  • results_agg_total_: Total counts (e.g. total/valid/invalid PEPs).

  • results_single_: Single (non-aggregated) results per datapoint.

  • results_per_sample_: Per-sample flattened results with multiindex columns.

Returns:
Self

The challenge instance with DataFrame attributes populated.
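How the single, mean/std, and total views relate to each other can be sketched with plain Python. The metric names and values below are invented, and the use of the sample standard deviation is a choice of this sketch; the aggregation pepbench actually performs may differ.

```python
from statistics import mean, stdev

# Hypothetical per-datapoint results; metric names and values are invented.
single_results = {
    "participant_01": {"abs_error_ms": 4.0, "num_pep_valid": 10},
    "participant_02": {"abs_error_ms": 6.0, "num_pep_valid": 12},
    "participant_03": {"abs_error_ms": 8.0, "num_pep_valid": 7},
}

errors = [row["abs_error_ms"] for row in single_results.values()]

# Error-like metrics are summarised by mean and (sample) standard deviation.
agg_mean_std = {"abs_error_ms": {"mean": mean(errors), "std": stdev(errors)}}

# Count-like metrics are summed over all datapoints.
agg_total = {
    "num_pep_valid": sum(row["num_pep_valid"] for row in single_results.values())
}
```

In this analogy, single_results plays the role of results_single_, agg_mean_std that of results_agg_mean_std_, and agg_total that of results_agg_total_.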