ragrank.evaluation

The main module for ragrank. It contains two submodules:

  • ragrank.evaluation.base – the main evaluation module

  • ragrank.evaluation.outputs – contains the outputs of evaluation

class ragrank.evaluation.EvalResult(*, llm: BaseLLM, metrics: List[BaseMetric], dataset: Dataset, scores: List[List[float]], response_time: float)

Represents the result of an evaluation.

llm

The language model used for evaluation.

Type: BaseLLM

metrics

List of metrics used for evaluation.

Type: List[BaseMetric]

dataset

The dataset used for evaluation.

Type: Dataset

scores

List of scores for each metric.

Type: List[List[float]]

response_time

Response time for the evaluation process.

Type: float
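As an illustration, a minimal sketch of reading these attributes from a result returned by ragrank.evaluation.evaluate (the dataset values below are assumptions, used only to produce a result to inspect):

from ragrank import evaluate
from ragrank.dataset import from_dict

data = from_dict({
    "question": "Who is the 46th President of the US?",
    "context": ["Joseph Robinette Biden is the 46th president of the United States."],
    "response": "Joseph Robinette Biden",
})

result = evaluate(data)  # result is an EvalResult

print(result.metrics)        # the metric objects that were run
print(result.scores)         # one list of scores per metric
print(result.response_time)  # time taken by the evaluation process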

model_computed_fields: ClassVar[dict[str, ComputedFieldInfo]] = {}

A dictionary of computed field names and their corresponding ComputedFieldInfo objects.

model_config: ConfigDict = {'arbitrary_types_allowed': True, 'frozen': True}

Configuration for the model, should be a dictionary conforming to pydantic.config.ConfigDict.

model_fields: ClassVar[dict[str, FieldInfo]] = {'dataset': FieldInfo(annotation=Dataset, required=True, description='The dataset used for evaluation'), 'llm': FieldInfo(annotation=BaseLLM, required=True, description='The language model used for evaluation'), 'metrics': FieldInfo(annotation=List[BaseMetric], required=True, description='List of metrics used for evaluation.'), 'response_time': FieldInfo(annotation=float, required=True, description='Response time for the evaluation process.', metadata=[Gt(gt=0)]), 'scores': FieldInfo(annotation=List[List[float]], required=True, description='List of scores for each metric')}

Metadata about the fields defined on the model, mapping of field names to pydantic.fields.FieldInfo.

This replaces Model.__fields__ from Pydantic V1.

to_dataframe() → DataFrame

Convert the evaluation result to a pandas DataFrame.

Returns: A DataFrame containing the evaluation results.

Return type: DataFrame
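A short usage sketch, continuing from the attribute example above (where result is an EvalResult produced by evaluate):

df = result.to_dataframe()  # tabular view of the evaluation results
print(df)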

to_dict() → Dict[str, List[str] | str] | Dict[str, List[str] | List[List[str]]]

Convert the evaluation result to a dict.

Returns: A dict containing the evaluation results.

Return type: dict
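Likewise, a short sketch using the same result object as above:

result_dict = result.to_dict()  # plain-dict view of the evaluation results
print(result_dict)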

validator() → EvalResult

Validate the evaluation result after instantiation.

Raises: ValueError – If the number of metrics and scores are not equal, or if the number of datapoints and scores are not balanced.

ragrank.evaluation.evaluate(dataset: Dataset | DataNode | dict, *, llm: BaseLLM | None = None, metrics: BaseMetric | List[BaseMetric] | None = None) → EvalResult

Evaluate the performance of a given dataset using specified metrics.

Parameters:
  • dataset (Union[Dataset, DataNode, dict]) – The dataset to be evaluated. It can be provided either as a Dataset object, DataNode object, or a dict representing the dataset.

  • llm (Optional[BaseLLM]) – The LLM (large language model) used for evaluation. If None, a default LLM will be used.

  • metrics (Optional[Union[BaseMetric, List[BaseMetric]]]) – The metric or list of metrics used for evaluation. If None, the response relevancy metric will be used.

Returns: An object containing the evaluation results.

Return type: EvalResult

Examples:

from ragrank import evaluate
from ragrank.dataset import from_dict

# Build a single-datapoint dataset from a plain dict
data = from_dict({
    "question": "Who is the 46th President of the US?",
    "context": [
        "Joseph Robinette Biden is an American politician, "
        "he is the 46th and current president of the United States.",
    ],
    "response": "Joseph Robinette Biden",
})

# Evaluate with the default LLM and the default metric
result = evaluate(data)

print(result)
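The metric can also be passed explicitly. A hedged variant of the same example, assuming the response relevancy metric is importable as ragrank.metric.response_relevancy (verify the exact import path in your installed version):

from ragrank import evaluate
from ragrank.dataset import from_dict
from ragrank.metric import response_relevancy  # import path assumed; check your version

data = from_dict({
    "question": "Who is the 46th President of the US?",
    "context": [
        "Joseph Robinette Biden is an American politician, "
        "he is the 46th and current president of the United States.",
    ],
    "response": "Joseph Robinette Biden",
})

# Pass the metric explicitly instead of relying on the default
result = evaluate(data, metrics=[response_relevancy])
print(result.to_dataframe())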