ragrank.evaluation.outputs

Contains the outputs of evaluation.

class ragrank.evaluation.outputs.EvalResult(*, llm: BaseLLM, metrics: List[BaseMetric], dataset: Dataset, scores: List[List[float]], response_time: float)

Represents the result of an evaluation.

llm

The language model used for evaluation.

Type: BaseLLM

metrics

List of metrics used for evaluation.

Type: List[BaseMetric]

dataset

The dataset used for evaluation.

Type: Dataset

scores

List of score lists, one per metric, with one score per datapoint.

Type: List[List[float]]

response_time

Response time for the evaluation process.

Type: float
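EvalResult instances are normally produced by the library's evaluation entry point rather than constructed by hand. A minimal sketch, assuming the top-level ragrank.evaluate helper and the ragrank.dataset.from_dict loader (both assumed here; neither is documented in this section):

    from ragrank import evaluate
    from ragrank.dataset import from_dict

    # Build a single-datapoint dataset (from_dict is an assumed loader).
    data = from_dict({
        "question": "When did World War II end?",
        "context": ["World War II ended in 1945 with the surrender of Japan."],
        "response": "World War II ended in 1945.",
    })

    # evaluate() is assumed to return an EvalResult carrying the llm,
    # metrics, dataset, scores, and response_time fields described above.
    result = evaluate(data)
    print(result.scores, result.response_time)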

model_computed_fields: ClassVar[dict[str, ComputedFieldInfo]] = {}

A dictionary of computed field names and their corresponding ComputedFieldInfo objects.

model_config: ConfigDict = {'arbitrary_types_allowed': True, 'frozen': True}

Configuration for the model, should be a dictionary conforming to pydantic.config.ConfigDict.
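Because the configuration sets frozen=True, an EvalResult is immutable once constructed: assigning to any attribute raises a Pydantic validation error. A small sketch, where result is a hypothetical EvalResult from a prior evaluation:

    # `result` is a hypothetical EvalResult instance.
    try:
        result.response_time = 0.0
    except Exception as err:  # Pydantic raises on assignment to a frozen model
        print(f"EvalResult is frozen: {err}")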

model_fields: ClassVar[dict[str, FieldInfo]] = {'dataset': FieldInfo(annotation=Dataset, required=True, description='The dataset used for evaluation'), 'llm': FieldInfo(annotation=BaseLLM, required=True, description='The language model used for evaluation'), 'metrics': FieldInfo(annotation=List[BaseMetric], required=True, description='List of metrics used for evaluation.'), 'response_time': FieldInfo(annotation=float, required=True, description='Response time for the evaluation process.', metadata=[Gt(gt=0)]), 'scores': FieldInfo(annotation=List[List[float]], required=True, description='List of scores for each metric')}

Metadata about the fields defined on the model, mapping field names to pydantic.fields.FieldInfo.

This replaces Model.__fields__ from Pydantic V1.

to_dataframe() → DataFrame

Convert the evaluation result to a pandas DataFrame.

Returns: A DataFrame containing the evaluation results.

Return type: DataFrame
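A short usage sketch, assuming result is an EvalResult from a prior evaluation (the exact row/column layout of the frame is not specified in this section):

    # `result` is an assumed EvalResult instance.
    df = result.to_dataframe()
    print(df.head())
    df.to_csv("eval_scores.csv", index=False)  # hypothetical export path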

to_dict() → Dict[str, List[str] | str] | Dict[str, List[str] | List[List[str]]]

Convert the evaluation result to a dict.

Returns: A dict containing the evaluation results.

Return type: dict
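The dict form is convenient for JSON serialization. A sketch, again assuming result is an EvalResult instance:

    import json

    payload = result.to_dict()  # plain dict of the evaluation results
    print(json.dumps(payload, indent=2, default=str))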

validator() → EvalResult

Validate the evaluation result after instantiation.

Raises:

ValueError – If the number of metrics does not match the number of score lists, or if the number of scores per metric does not match the number of datapoints.
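A sketch of the failure mode, assuming my_llm, my_metric, and my_dataset are valid stand-ins (all three are hypothetical): passing two score lists for a single metric trips the post-instantiation check.

    from ragrank.evaluation.outputs import EvalResult

    # my_llm, my_metric, and my_dataset are hypothetical stand-ins.
    try:
        EvalResult(
            llm=my_llm,
            metrics=[my_metric],    # one metric ...
            scores=[[0.9], [0.8]],  # ... but two score lists
            dataset=my_dataset,
            response_time=1.2,
        )
    except ValueError as err:
        print(f"validation failed: {err}")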