Ragrank API documentation

Ragrank is a user-friendly Python library created to make evaluating Retrieval Augmented Generation (RAG) models easier.

ragrank.evaluate(dataset: Dataset | DataNode | dict, *, llm: BaseLLM | None = None, metrics: BaseMetric | List[BaseMetric] | None = None) → EvalResult

Evaluate the performance of a given dataset using specified metrics.

Parameters:
  • dataset (Union[Dataset, DataNode, dict]) – The dataset to be evaluated. It can be provided either as a Dataset object, DataNode object, or a dict representing the dataset.

  • llm (Optional[BaseLLM]) – The LLM (Large Language Model) used for evaluation. If None, a default LLM will be used.

  • metrics (Optional[Union[BaseMetric, List[BaseMetric]]]) – The metric or list of metrics used for evaluation. If None, the response relevancy metric will be used.
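When passing the dataset as a plain dict, it should carry the fields used in the example at the end of this page. The sketch below shows that shape with an illustrative validation helper; the helper itself is not part of the ragrank API, only the field names are taken from the documented example.

```python
# Minimal sketch of the dict shape accepted by `evaluate` when a plain
# dict is passed as `dataset`, based on the docs example.
# `check_data_dict` is a hypothetical helper for illustration only.

REQUIRED_FIELDS = ("question", "context", "response")

def check_data_dict(data: dict) -> bool:
    """Return True if the dict has the fields used in the docs example."""
    if not all(field in data for field in REQUIRED_FIELDS):
        return False
    # In the example, `context` is a list of strings
    return isinstance(data["context"], list)

sample = {
    "question": "Who is the 46th President of the United States?",
    "context": [
        "Joseph Robinette Biden is an American politician; "
        "he is the 46th and current president of the United States.",
    ],
    "response": "Joseph Robinette Biden",
}
print(check_data_dict(sample))  # True
```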

Returns:

An object containing the evaluation results.

Return type:

EvalResult

Examples:

from ragrank import evaluate
from ragrank.dataset import from_dict

data = from_dict({
    "question": "Who is the 46th President of the United States?",
    "context": [
        "Joseph Robinette Biden is an American politician; "
        "he is the 46th and current president of the United States.",
    ],
    "response": "Joseph Robinette Biden",
})
result = evaluate(data)

print(result)