ragrank.evaluation

The main module for ragrank. It contains two submodules:

  • ragrank.evaluation.base – the main evaluation module

  • ragrank.evaluation.outputs – contains the outputs of evaluation

class ragrank.evaluation.EvalResult(*, llm: BaseLLM, metrics: List[BaseMetric], dataset: Dataset, scores: List[List[float]], response_time: float)

Represents the result of an evaluation.

llm

The language model used for evaluation.

Type: BaseLLM

metrics

List of metrics used for evaluation.

Type: List[BaseMetric]

dataset

The dataset used for evaluation.

Type: Dataset

scores

List of scores for each metric.

Type: List[List[float]]

response_time

Response time for the evaluation process.

Type: float
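As an illustration, a minimal sketch of reading these attributes from a result returned by ragrank.evaluation.evaluate (the dataset values below are assumptions, used only to produce a result to inspect):

from ragrank import evaluate
from ragrank.dataset import from_dict

data = from_dict({
    "question": "Who is the 46th President of the US?",
    "context": ["Joseph Robinette Biden is the 46th president of the United States."],
    "response": "Joseph Robinette Biden",
})

result = evaluate(data)  # result is an EvalResult

print(result.metrics)        # the metric objects that were run
print(result.scores)         # one list of scores per metric
print(result.response_time)  # time taken by the evaluation process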

model_computed_fields: ClassVar[dict[str, ComputedFieldInfo]] = {}

A dictionary of computed field names and their corresponding ComputedFieldInfo objects.

model_config: ConfigDict = {'arbitrary_types_allowed': True, 'frozen': True}

Configuration for the model, should be a dictionary conforming to pydantic.config.ConfigDict.

model_fields: ClassVar[dict[str, FieldInfo]] = {'dataset': FieldInfo(annotation=Dataset, required=True, description='The dataset used for evaluation'), 'llm': FieldInfo(annotation=BaseLLM, required=True, description='The language model used for evaluation'), 'metrics': FieldInfo(annotation=List[BaseMetric], required=True, description='List of metrics used for evaluation.'), 'response_time': FieldInfo(annotation=float, required=True, description='Response time for the evaluation process.', metadata=[Gt(gt=0)]), 'scores': FieldInfo(annotation=List[List[float]], required=True, description='List of scores for each metric')}

Metadata about the fields defined on the model, mapping of field names to pydantic.fields.FieldInfo.

This replaces Model.__fields__ from Pydantic V1.

to_dataframe() → DataFrame

Convert the evaluation result to a pandas DataFrame.

Returns: A DataFrame containing the evaluation results.

Return type: DataFrame
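A short usage sketch, continuing from the attribute example above (where result is an EvalResult produced by evaluate):

df = result.to_dataframe()  # tabular view of the evaluation results
print(df)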

to_dict() → Dict[str, List[str] | str] | Dict[str, List[str] | List[List[str]]]

Convert the evaluation result to a dict.

Returns: A dict containing the evaluation results.

Return type: dict
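Likewise, a short sketch using the same result object as above:

result_dict = result.to_dict()  # plain-dict view of the evaluation results
print(result_dict)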

validator() → EvalResult

Validate the evaluation result after instantiation.

Raises: ValueError – If the number of metrics and scores are not equal, or if the number of datapoints and scores are not balanced.

ragrank.evaluation.evaluate(dataset: Dataset | DataNode | dict, *, llm: BaseLLM | None = None, metrics: BaseMetric | List[BaseMetric] | None = None) → EvalResult

Evaluate the performance of a given dataset using specified metrics.

Parameters:
  • dataset (Union[Dataset, DataNode, dict]) – The dataset to be evaluated. It can be provided either as a Dataset object, DataNode object, or a dict representing the dataset.

  • llm (Optional[BaseLLM]) – The LLM (large language model) used for evaluation. If None, a default LLM will be used.

  • metrics (Optional[Union[BaseMetric, List[BaseMetric]]]) – The metric or list of metrics used for evaluation. If None, the response relevancy metric will be used.

Returns: An object containing the evaluation results.

Return type: EvalResult

Examples:

from ragrank import evaluate
from ragrank.dataset import from_dict

# Build a single-datapoint dataset from a plain dict
data = from_dict({
    "question": "Who is the 46th President of the US?",
    "context": [
        "Joseph Robinette Biden is an American politician, "
        "he is the 46th and current president of the United States.",
    ],
    "response": "Joseph Robinette Biden",
})

# Evaluate with the default LLM and the default metric
result = evaluate(data)

print(result)
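The metric can also be passed explicitly. A hedged variant of the same example, assuming the response relevancy metric is importable as ragrank.metric.response_relevancy (verify the exact import path in your installed version):

from ragrank import evaluate
from ragrank.dataset import from_dict
from ragrank.metric import response_relevancy  # import path assumed; check your version

data = from_dict({
    "question": "Who is the 46th President of the US?",
    "context": [
        "Joseph Robinette Biden is an American politician, "
        "he is the 46th and current president of the United States.",
    ],
    "response": "Joseph Robinette Biden",
})

# Pass the metric explicitly instead of relying on the default
result = evaluate(data, metrics=[response_relevancy])
print(result.to_dataframe())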