ragrank.evaluation.outputs

Contains the outputs of evaluation.

class ragrank.evaluation.outputs.EvalResult(*, llm: BaseLLM, metrics: List[BaseMetric], dataset: Dataset, scores: List[List[float]], response_time: float)

Represents the result of an evaluation.

llm

The language model used for evaluation.

Type: BaseLLM

metrics

List of metrics used for evaluation.

Type: List[BaseMetric]

dataset

The dataset used for evaluation.

Type: Dataset

scores

List of score lists, one per metric, with one score per datapoint.

Type: List[List[float]]

response_time

Response time for the evaluation process.

Type: float
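EvalResult instances are normally produced by the library's evaluation entry point rather than constructed by hand. A minimal sketch, assuming the top-level ragrank.evaluate helper and the ragrank.dataset.from_dict loader (both assumed here; neither is documented in this section):

    from ragrank import evaluate
    from ragrank.dataset import from_dict

    # Build a single-datapoint dataset (from_dict is an assumed loader).
    data = from_dict({
        "question": "When did World War II end?",
        "context": ["World War II ended in 1945 with the surrender of Japan."],
        "response": "World War II ended in 1945.",
    })

    # evaluate() is assumed to return an EvalResult carrying the llm,
    # metrics, dataset, scores, and response_time fields described above.
    result = evaluate(data)
    print(result.scores, result.response_time)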

model_computed_fields: ClassVar[dict[str, ComputedFieldInfo]] = {}

A dictionary of computed field names and their corresponding ComputedFieldInfo objects.

model_config: ConfigDict = {'arbitrary_types_allowed': True, 'frozen': True}

Configuration for the model, should be a dictionary conforming to pydantic.config.ConfigDict.
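Because the configuration sets frozen=True, an EvalResult is immutable once constructed: assigning to any attribute raises a Pydantic validation error. A small sketch, where result is a hypothetical EvalResult from a prior evaluation:

    # `result` is a hypothetical EvalResult instance.
    try:
        result.response_time = 0.0
    except Exception as err:  # Pydantic raises on assignment to a frozen model
        print(f"EvalResult is frozen: {err}")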

model_fields: ClassVar[dict[str, FieldInfo]] = {'dataset': FieldInfo(annotation=Dataset, required=True, description='The dataset used for evaluation'), 'llm': FieldInfo(annotation=BaseLLM, required=True, description='The language model used for evaluation'), 'metrics': FieldInfo(annotation=List[BaseMetric], required=True, description='List of metrics used for evaluation.'), 'response_time': FieldInfo(annotation=float, required=True, description='Response time for the evaluation process.', metadata=[Gt(gt=0)]), 'scores': FieldInfo(annotation=List[List[float]], required=True, description='List of scores for each metric')}

Metadata about the fields defined on the model, mapping field names to pydantic.fields.FieldInfo.

This replaces Model.__fields__ from Pydantic V1.

to_dataframe() → DataFrame

Convert the evaluation result to a pandas DataFrame.

Returns: A DataFrame containing the evaluation results.

Return type: DataFrame
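A short usage sketch, assuming result is an EvalResult from a prior evaluation (the exact row/column layout of the frame is not specified in this section):

    # `result` is an assumed EvalResult instance.
    df = result.to_dataframe()
    print(df.head())
    df.to_csv("eval_scores.csv", index=False)  # hypothetical export path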

to_dict() → Dict[str, List[str] | str] | Dict[str, List[str] | List[List[str]]]

Convert the evaluation result to a dict.

Returns: A dict containing the evaluation results.

Return type: dict
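The dict form is convenient for JSON serialization. A sketch, again assuming result is an EvalResult instance:

    import json

    payload = result.to_dict()  # plain dict of the evaluation results
    print(json.dumps(payload, indent=2, default=str))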

validator() → EvalResult

Validate the evaluation result after instantiation.

Raises:

ValueError – If the number of metrics does not match the number of score lists, or if the number of scores per metric does not match the number of datapoints.
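A sketch of the failure mode, assuming my_llm, my_metric, and my_dataset are valid stand-ins (all three are hypothetical): passing two score lists for a single metric trips the post-instantiation check.

    from ragrank.evaluation.outputs import EvalResult

    # my_llm, my_metric, and my_dataset are hypothetical stand-ins.
    try:
        EvalResult(
            llm=my_llm,
            metrics=[my_metric],    # one metric ...
            scores=[[0.9], [0.8]],  # ... but two score lists
            dataset=my_dataset,
            response_time=1.2,
        )
    except ValueError as err:
        print(f"validation failed: {err}")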