PepEvaluationChallenge

class pepbench.evaluation.PepEvaluationChallenge(*, dataset: pepbench.datasets._base_pep_extraction_dataset.BasePepDatasetWithAnnotations, scoring: collections.abc.Callable = score_pep_evaluation, validate_kwargs: dict | None = None)

Evaluation challenge for PEP extraction pipelines.

This is the tpcp implementation of the evaluation challenge for PEP extraction pipelines. It evaluates a given PEP extraction pipeline on a dataset and produces aggregated and per-sample evaluation results stored as pandas DataFrames on the challenge instance.

Parameters:

dataset (BasePepDatasetWithAnnotations): The dataset to evaluate. The dataset must implement the unified interface required by the evaluation utilities.
scoring (Callable, optional): The scoring function to use for evaluation. The scoring function should accept the pipeline and a datapoint and return a dictionary with evaluation outputs. Default is pepbench.evaluation.score_pep_evaluation.
validate_kwargs (dict, optional): Additional keyword arguments passed to tpcp.validate.Scorer.
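
The scoring contract described above (pipeline and datapoint in, dictionary out) might look roughly like the sketch below. It is not the built-in score_pep_evaluation: ToyDatapoint, ToyPipeline, the estimated_pep_ms_ attribute, and the absolute_error_ms metric name are hypothetical stand-ins chosen only so that the example runs without pepbench installed.

```python
from dataclasses import dataclass


@dataclass
class ToyDatapoint:
    """Hypothetical datapoint: one signal value plus a reference PEP in ms."""

    signal: float
    reference_pep_ms: float


class ToyPipeline:
    """Hypothetical pipeline; real ones subclass BasePepExtractionPipeline."""

    def run(self, datapoint: ToyDatapoint) -> "ToyPipeline":
        # Pretend the estimated PEP is the signal value plus a fixed offset.
        self.estimated_pep_ms_ = datapoint.signal + 2.0
        return self


def score_absolute_error(pipeline, datapoint) -> dict:
    """Run the pipeline on one datapoint and return a dict of named metrics."""
    result = pipeline.run(datapoint)
    error = abs(result.estimated_pep_ms_ - datapoint.reference_pep_ms)
    return {"absolute_error_ms": error}


scores = score_absolute_error(
    ToyPipeline(), ToyDatapoint(signal=98.0, reference_pep_ms=103.0)
)
```

A function with this shape could in principle be passed as scoring=score_absolute_error, provided the pipeline and dataset expose the attributes it reads.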

Attributes:

results_ (dict): Raw results returned by tpcp.validate.validate.
results_agg_mean_std_ (pandas.DataFrame): Mean and standard deviation aggregated results.
results_agg_total_ (pandas.DataFrame): Total counts aggregated results.
results_single_ (pandas.DataFrame): Single (non-aggregated) results for each datapoint.
results_per_sample_ (pandas.DataFrame): Per-sample flattened results.

Methods

`clone`()	Create a new instance of the class with all parameters copied over.
`get_params`([deep])	Get parameters for this algorithm.
`results_as_df`()	Convert the raw validation results to pandas DataFrames and attach them to the instance.
`run`(pipeline)	Run the evaluation challenge for a given pipeline.
`save_results`(folder_path, filename_stub)	Save the results of the evaluation to disk.
`set_params`(**params)	Set the parameters of this Algorithm.

__init__(*, dataset: pepbench.datasets._base_pep_extraction_dataset.BasePepDatasetWithAnnotations, scoring: collections.abc.Callable = score_pep_evaluation, validate_kwargs: dict | None = None) → None

Initialize a new evaluation challenge.

To initialize a new evaluation challenge, you need to provide a dataset and a scoring function. Afterwards, you can challenge a specific PEP extraction pipeline by passing it to the run method.

Parameters:

dataset (BasePepDatasetWithAnnotations): The dataset to evaluate the pipeline on. The dataset needs to be a subclass of BaseUnifiedPepExtractionDataset, which provides the necessary unified interface to access the data.
scoring (Callable, optional): The scoring function to use for the evaluation. The scoring function should take the pipeline and a datapoint from the dataset as input and return a dictionary with the evaluation results. The default scoring function is pepbench.evaluation._scoring.score_pep_evaluation.
validate_kwargs (dict, optional): Additional keyword arguments to pass to the tpcp.validate.Scorer class.
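
The initialize-then-run workflow described above can be sketched as a small helper. The helper name evaluate_pipeline and the RecordingChallenge stand-in are hypothetical; the stand-in only mirrors the documented method names so that the example runs without pepbench. Calling results_as_df explicitly after run is an assumption, since this page does not say whether run already does so.

```python
def evaluate_pipeline(challenge_cls, dataset, pipeline, folder_path, filename_stub):
    """Run one pipeline through a challenge and persist the results."""
    # All constructor arguments are keyword-only, as in the documented signature.
    challenge = challenge_cls(dataset=dataset)
    challenge.run(pipeline)
    # Assumption: building the DataFrames explicitly is safe after run().
    challenge.results_as_df()
    challenge.save_results(folder_path, filename_stub)
    return challenge


class RecordingChallenge:
    """Hypothetical stand-in that only records which methods were called."""

    def __init__(self, *, dataset, scoring=None, validate_kwargs=None):
        self.dataset = dataset
        self.calls = []

    def run(self, pipeline):
        self.calls.append("run")
        return self

    def results_as_df(self):
        self.calls.append("results_as_df")
        return self

    def save_results(self, folder_path, filename_stub):
        self.calls.append(f"save_results:{filename_stub}")


challenge = evaluate_pipeline(
    RecordingChallenge,
    dataset=[],
    pipeline=object(),
    folder_path="results",
    filename_stub="demo",
)
```

With the real library, PepEvaluationChallenge would take the place of RecordingChallenge, and dataset and pipeline would be actual pepbench objects.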

run(pipeline: BasePepExtractionPipeline) → Self

Run the evaluation challenge for a given pipeline.

Executes validation using tpcp.validate.validate with a tpcp.validate.Scorer and aggregates timing information.

Parameters:

pipeline (BasePepExtractionPipeline): The PEP extraction pipeline to evaluate. The pipeline needs to be a subclass of pepbench.pipelines.BasePepExtractionPipeline and should be able to process the dataset.

Returns:

Self: The challenge instance with results stored as attributes (see class docstring).

save_results(folder_path: path_t, filename_stub: str) → None

Save the results of the evaluation to disk.

Saves timing information as JSON and DataFrame results as CSV files using the provided filename stub.

Parameters:

folder_path (pathlib.Path or str): Folder path to save the results to.
filename_stub (str): Filename stub to prefix saved files.
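
The stub-prefixed layout described above (timing as JSON, tabular results as CSV) can be illustrated with the standard library alone. This is a sketch, not pepbench's implementation: the helper save_outputs, the "_timing.json" and "_<name>.csv" suffixes, and the use of lists of dicts in place of DataFrames are all assumptions made for illustration.

```python
import csv
import json
import tempfile
from pathlib import Path


def save_outputs(folder_path, filename_stub, timing, tables):
    """Write timing as JSON and each non-empty table as CSV, prefixed by the stub."""
    folder = Path(folder_path)
    folder.mkdir(parents=True, exist_ok=True)
    written = []

    # Hypothetical file name; the real suffix is not given in this page.
    timing_path = folder / f"{filename_stub}_timing.json"
    timing_path.write_text(json.dumps(timing, indent=2))
    written.append(timing_path)

    for name, rows in tables.items():
        if not rows:
            continue  # nothing to write for an empty table
        # Hypothetical file name; rows stand in for a DataFrame.
        table_path = folder / f"{filename_stub}_{name}.csv"
        with table_path.open("w", newline="") as handle:
            writer = csv.DictWriter(handle, fieldnames=list(rows[0]))
            writer.writeheader()
            writer.writerows(rows)
        written.append(table_path)
    return written


with tempfile.TemporaryDirectory() as tmp:
    paths = save_outputs(
        tmp,
        "demo",
        timing={"total_runtime_s": 1.5},
        tables={"results_single": [{"participant": "p01", "error_ms": 3.0}]},
    )
    names = sorted(path.name for path in paths)
    reloaded_timing = json.loads((Path(tmp) / "demo_timing.json").read_text())
```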

clone() → Self

Create a new instance of the class with all parameters copied over.

This creates a new instance of the class itself and of all nested objects.

get_params(deep: bool = True) → dict[str, Any]

Get parameters for this algorithm.

Parameters:

deep: Only relevant if the object contains nested algorithm objects. If so and deep is True, the parameters of these nested objects are included in the output with a prefix of the form nested_object_name__ (note the two underscores at the end).

Returns:

params: Parameter names mapped to their values.
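
The double-underscore prefix used for nested parameters can be illustrated with a toy class. ToyObject is a hypothetical stand-in written for this sketch, not tpcp's base class. Including the nested object itself next to its prefixed parameters follows the scikit-learn-style convention and is assumed here.

```python
class ToyObject:
    """Hypothetical object that mimics the get_params naming convention."""

    def __init__(self, **params):
        self._param_names = list(params)
        for name, value in params.items():
            setattr(self, name, value)

    def get_params(self, deep=True):
        out = {}
        for name in self._param_names:
            value = getattr(self, name)
            out[name] = value
            # With deep=True, nested parameters are added as "<name>__<sub>".
            if deep and hasattr(value, "get_params"):
                for sub_name, sub_value in value.get_params(deep=True).items():
                    out[f"{name}__{sub_name}"] = sub_value
        return out


inner = ToyObject(threshold=0.5)
outer = ToyObject(algo=inner, window=10)

deep_params = outer.get_params(deep=True)
shallow_params = outer.get_params(deep=False)
```

Here deep_params contains the keys algo, algo__threshold, and window, while shallow_params contains only algo and window.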

set_params(**params: Any) → Self

Set the parameters of this Algorithm.

To set parameters of nested objects, use keyword arguments of the form nested_object_name__para_name=value.
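
How a double-underscore key might be routed to a nested object is sketched below. The function apply_params is hypothetical and is not how tpcp implements set_params; it only demonstrates the naming convention on plain namespace objects.

```python
from types import SimpleNamespace


def apply_params(obj, **params):
    """Set attributes on obj, routing "a__b" keys to the nested object obj.a."""
    for key, value in params.items():
        # Split at the first double underscore only, so deeper nesting recurses.
        name, separator, rest = key.partition("__")
        if separator:
            apply_params(getattr(obj, name), **{rest: value})
        else:
            setattr(obj, name, value)
    return obj


outer = SimpleNamespace(window=10, algo=SimpleNamespace(threshold=0.5))
apply_params(outer, window=20, algo__threshold=0.8)
```

After the call, outer.window is 20 and outer.algo.threshold is 0.8.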

results_as_df() → Self

Convert the raw validation results to pandas DataFrames and attach them to the instance.

The method builds the following DataFrames and stores them as instance attributes:

results_agg_mean_std_: Mean and standard deviation of aggregated metrics.
results_agg_total_: Total counts (e.g. total/valid/invalid PEPs).
results_single_: Single (non-aggregated) results per datapoint.
results_per_sample_: Per-sample flattened results with multiindex columns.

Returns:

Self: The challenge instance with DataFrame attributes populated.
