ragrank.dataset.base

Contains all of the base classes for datasets.

class ragrank.dataset.base.DataNode(*, question: str, context: List[str], response: str)

Represents a single data point in a dataset.

question

The question associated with the data point.

Type:: str

context

The context or background information related to the question.

Type:: List[str]

response

The response or answer to the question.

Type:: str

model_computed_fields: ClassVar[dict[str, ComputedFieldInfo]] = {}: A dictionary of computed field names and their corresponding ComputedFieldInfo objects.

model_config: ClassVar[ConfigDict] = {}: Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

model_fields: ClassVar[dict[str, FieldInfo]] = {'context': FieldInfo(annotation=List[str], required=True, description='The context information related to the question'), 'question': FieldInfo(annotation=str, required=True, description='The question associated with the data point'), 'response': FieldInfo(annotation=str, required=True, description='The response or answer to the question')}

Metadata about the fields defined on the model: a mapping of field names to pydantic.fields.FieldInfo.

This replaces Model.__fields__ from Pydantic V1.

to_dataset() → Dataset

Convert the data node to a Dataset instance.

Returns:: A Dataset instance containing the current data node.
Return type:: Dataset
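Because Dataset stores its data column-wise, the conversion can be pictured as wrapping each scalar field of the node in a one-element list. The sketch below is a stand-alone, standard-library approximation that assumes this behavior. It uses plain dataclasses in place of ragrank's Pydantic models and is not ragrank's actual implementation.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Dataset:
    # Column-oriented storage: the i-th entry of each list forms one data point.
    question: List[str]
    context: List[List[str]]
    response: List[str]


@dataclass
class DataNode:
    question: str
    context: List[str]
    response: str

    def to_dataset(self) -> Dataset:
        # Wrap each field in a one-element list so the node becomes a
        # single-row dataset (context is copied to avoid sharing the list).
        return Dataset(
            question=[self.question],
            context=[list(self.context)],
            response=[self.response],
        )


node = DataNode(
    question="What is the capital of France?",
    context=["Paris is the capital and largest city of France."],
    response="Paris",
)
ds = node.to_dataset()
print(ds.question)  # ['What is the capital of France?']
```

With ragrank itself, the documented signatures imply the equivalent call is `DataNode(question=..., context=[...], response=...).to_dataset()`.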

class ragrank.dataset.base.Dataset(*, question: List[str], context: List[List[str]], response: List[str])

Represents a dataset containing questions, contexts, and responses.

question

A list of questions.

Type:: List[str]

context

A list of contexts, each represented as a list of strings.

Type:: List[List[str]]

response

A list of responses corresponding to the questions.

Type:: List[str]

append(data_node: DataNode) → None

Append a DataNode to the dataset.

Parameters:: data_node (DataNode) – The DataNode to append.
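Since Dataset keeps three parallel lists, appending a node presumably extends each column by one entry so that all three stay the same length. The following stdlib-only sketch assumes that behavior; the dataclasses stand in for ragrank's Pydantic models and are illustrative only.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class DataNode:
    question: str
    context: List[str]
    response: str


@dataclass
class Dataset:
    question: List[str]
    context: List[List[str]]
    response: List[str]

    def append(self, data_node: DataNode) -> None:
        # Extend all three columns together so they remain aligned.
        self.question.append(data_node.question)
        self.context.append(list(data_node.context))
        self.response.append(data_node.response)


ds = Dataset(question=[], context=[], response=[])
ds.append(DataNode(question="Q1", context=["c1", "c2"], response="A1"))
ds.append(DataNode(question="Q2", context=["c3"], response="A2"))
print(len(ds.question))  # 2
```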

model_computed_fields: ClassVar[dict[str, ComputedFieldInfo]] = {}: A dictionary of computed field names and their corresponding ComputedFieldInfo objects.

model_config: ClassVar[ConfigDict] = {}: Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

model_fields: ClassVar[dict[str, FieldInfo]] = {'context': FieldInfo(annotation=List[List[str]], required=True, description='A list of contexts, each represented as a list of strings'), 'question': FieldInfo(annotation=List[str], required=True, description='A list of questions, each represented as a string'), 'response': FieldInfo(annotation=List[str], required=True, description='A list of responses corresponding to the questions')}

Metadata about the fields defined on the model: a mapping of field names to pydantic.fields.FieldInfo.

This replaces Model.__fields__ from Pydantic V1.

to_csv(path: str | Path, **kwargs: Any) → None

Save the dataset as a CSV file.

Parameters:: path (str | Path) – Path to the CSV file.
**kwargs (Any) – Additional keyword arguments.
Returns:: None
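The source does not say how the context column (a list of strings per row) is written to CSV, or where **kwargs are forwarded. The sketch below makes two assumptions: each context list is encoded as a JSON array string, and extra keyword arguments are passed to Python's csv.writer. The function dataset_to_csv is a hypothetical helper, not part of ragrank, whose real method may handle both points differently.

```python
import csv
import json
import tempfile
from pathlib import Path
from typing import Any, List, Union


def dataset_to_csv(
    question: List[str],
    context: List[List[str]],
    response: List[str],
    path: Union[str, Path],
    **kwargs: Any,
) -> None:
    # Hypothetical helper: extra kwargs (e.g. delimiter) go to csv.writer.
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f, **kwargs)
        writer.writerow(["question", "context", "response"])
        for q, ctx, r in zip(question, context, response):
            # Assumption: each context list is stored as a JSON array string.
            writer.writerow([q, json.dumps(ctx), r])


# Write one row to a temporary file and read it back.
with tempfile.TemporaryDirectory() as tmp:
    out = Path(tmp) / "data.csv"
    dataset_to_csv(["Q1"], [["c1", "c2"]], ["A1"], out)
    with open(out, newline="", encoding="utf-8") as f:
        rows = list(csv.reader(f))
```

Encoding the nested list as JSON keeps each row in a single CSV record and lets json.loads recover the original list when reading the file back.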

to_dataframe() → DataFrame

Return the dataset as a pandas DataFrame.

Parameters:: None
Returns:: A DataFrame representation of the data.
Return type:: DataFrame

to_dict() → Dict[str, List[str] | List[List[str]]]

Return the dataset as a dictionary.

Parameters:: None
Returns:: A dictionary representation of the data.
Return type:: dict
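The documented return type, Dict[str, List[str] | List[List[str]]], suggests a column-oriented dictionary with one key per field. This sketch assumes the keys match the field names (question, context, response); it is a dataclass stand-in, not ragrank's code.

```python
from dataclasses import dataclass
from typing import Dict, List, Union


@dataclass
class Dataset:
    question: List[str]
    context: List[List[str]]
    response: List[str]

    def to_dict(self) -> Dict[str, Union[List[str], List[List[str]]]]:
        # One key per column; each value is that column's list.
        return {
            "question": self.question,
            "context": self.context,
            "response": self.response,
        }


ds = Dataset(question=["Q1"], context=[["c1", "c2"]], response=["A1"])
data = ds.to_dict()
print(data)  # {'question': ['Q1'], 'context': [['c1', 'c2']], 'response': ['A1']}
```

A dictionary of equal-length lists is also the shape the pandas.DataFrame constructor accepts, yielding one row per data point; whether to_dataframe() is built this way is not stated in the source.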

validator() → Dataset

Validate the dataset after instantiation.

Raises:: ValueError – If the number of data points is not consistent across question, context, and response.
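The consistency check amounts to comparing the lengths of the three lists. The signature validator() → Dataset suggests a Pydantic v2 `@model_validator(mode="after")` that returns the instance, but that is an inference from the signature. The stand-alone sketch below uses a hypothetical validate_lengths function to show only the check itself.

```python
from typing import List


def validate_lengths(
    question: List[str], context: List[List[str]], response: List[str]
) -> None:
    # Every column must hold exactly one entry per data point.
    if not (len(question) == len(context) == len(response)):
        raise ValueError(
            "Inconsistent number of data points: "
            f"question={len(question)}, context={len(context)}, "
            f"response={len(response)}"
        )


# Consistent lengths: no exception.
validate_lengths(["Q1", "Q2"], [["c1"], ["c2"]], ["A1", "A2"])

# One context is missing, so a ValueError is raised.
error = ""
try:
    validate_lengths(["Q1", "Q2"], [["c1"]], ["A1", "A2"])
except ValueError as exc:
    error = str(exc)
```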

with_progress(purpose: str = 'Iterating') → tqdm

Return a tqdm progress bar for iterating over the dataset.

Parameters:: purpose (str) – The purpose for iterating over the dataset.
Returns:: A tqdm progress bar.
Return type:: tqdm