ragrank.dataset.base
Contains all of the base classes for datasets.
- class ragrank.dataset.base.DataNode(*, question: str, context: List[str], response: str)
Represents a single data point in a dataset.
- question
The question associated with the data point.
- Type:
str
- context
The context or background information related to the question.
- Type:
List[str]
- response
The response or answer to the question.
- Type:
str
- model_computed_fields: ClassVar[dict[str, ComputedFieldInfo]] = {}
A dictionary of computed field names and their corresponding ComputedFieldInfo objects.
- model_config: ClassVar[ConfigDict] = {}
Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.
- model_fields: ClassVar[dict[str, FieldInfo]] = {'context': FieldInfo(annotation=List[str], required=True, description='The context information related to the question'), 'question': FieldInfo(annotation=str, required=True, description='The question associated with the data point'), 'response': FieldInfo(annotation=str, required=True, description='The response or answer to the question')}
Metadata about the fields defined on the model: a mapping of field names to pydantic.fields.FieldInfo objects.
This replaces Model.__fields__ from Pydantic V1.
- class ragrank.dataset.base.Dataset(*, question: List[str], context: List[List[str]], response: List[str])
Represents a dataset containing questions, contexts, and responses.
- question
A list of questions.
- Type:
List[str]
- context
A list of contexts, each represented as a list of strings.
- Type:
List[List[str]]
- response
A list of responses corresponding to the questions.
- Type:
List[str]
- append(data_node: DataNode) → None
Append a DataNode to the dataset.
- Parameters:
data_node (DataNode) – The DataNode to append.
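The relationship between a DataNode (one data point) and a Dataset (three parallel lists) can be illustrated with a stand-in sketch. `NodeSketch` and `DatasetSketch` below are hypothetical plain dataclasses that mirror the documented fields; they are not ragrank's Pydantic models. The assumption that append adds one entry to each of the three lists is inferred from the field layout, not confirmed by this reference.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class NodeSketch:
    """Hypothetical stand-in for DataNode: a single data point."""
    question: str
    context: List[str]
    response: str


@dataclass
class DatasetSketch:
    """Hypothetical stand-in for Dataset: three parallel lists."""
    question: List[str] = field(default_factory=list)
    context: List[List[str]] = field(default_factory=list)
    response: List[str] = field(default_factory=list)

    def append(self, node: NodeSketch) -> None:
        # Assumed behaviour: extend every column by one entry
        # so the three lists stay the same length.
        self.question.append(node.question)
        self.context.append(node.context)
        self.response.append(node.response)


data = DatasetSketch()
data.append(
    NodeSketch(
        question="What is the capital of France?",
        context=["France is a country in Europe.", "Its capital is Paris."],
        response="Paris",
    )
)
```

With the real classes, the equivalent call would presumably pass a DataNode built with the keyword-only arguments shown in its signature.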
- model_computed_fields: ClassVar[dict[str, ComputedFieldInfo]] = {}
A dictionary of computed field names and their corresponding ComputedFieldInfo objects.
- model_config: ClassVar[ConfigDict] = {}
Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.
- model_fields: ClassVar[dict[str, FieldInfo]] = {'context': FieldInfo(annotation=List[List[str]], required=True, description='A list of contexts, each represented as a list of strings'), 'question': FieldInfo(annotation=List[str], required=True, description='A list of questions, each represented as a string'), 'response': FieldInfo(annotation=List[str], required=True, description='A list of responses corresponding to the questions')}
Metadata about the fields defined on the model: a mapping of field names to pydantic.fields.FieldInfo objects.
This replaces Model.__fields__ from Pydantic V1.
- to_csv(path: str | Path, **kwargs: Any) → None
Save the data as a CSV file.
- Parameters:
path (str | Path) – Path to the CSV file.
- Returns:
None
- to_dataframe() → DataFrame
Return the data as a pandas DataFrame.
- Returns:
The dataset as a DataFrame.
- Return type:
DataFrame
- to_dict() → Dict[str, List[str] | List[List[str]]]
Return the data as a dictionary.
- Returns:
The dataset as a dictionary of lists.
- Return type:
dict
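The return type suggests a column-oriented mapping from names to lists. The sketch below shows what such a dictionary might look like and how to turn it back into per-data-point rows. The key names are an assumption (taken to match the field names question, context, and response), and the sample values are illustrative only.

```python
from typing import Dict, List, Union

Columns = Dict[str, Union[List[str], List[List[str]]]]

# Assumed shape of the dictionary: one key per field, one list per key.
columns: Columns = {
    "question": ["What is the capital of France?", "Who wrote Hamlet?"],
    "context": [
        ["The capital of France is Paris."],
        ["Hamlet is a tragedy.", "It was written by William Shakespeare."],
    ],
    "response": ["Paris", "William Shakespeare"],
}

# Zip the columns together to recover one record per data point.
rows = [
    {"question": q, "context": c, "response": r}
    for q, c, r in zip(
        columns["question"], columns["context"], columns["response"]
    )
]
```

Each entry in `rows` then carries the same three fields as a single DataNode.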
- validator() → Dataset
Validate the dataset after instantiation.
- Raises:
ValueError – If the number of data points is not consistent across question, context, and response.
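The consistency rule described above can be sketched as a standalone function. `check_lengths` is a hypothetical helper written for illustration, not the library's validator; it only shows the kind of check the documentation describes, and the error message is made up.

```python
from typing import List


def check_lengths(
    question: List[str],
    context: List[List[str]],
    response: List[str],
) -> None:
    """Raise ValueError unless all three lists have the same length."""
    lengths = {
        "question": len(question),
        "context": len(context),
        "response": len(response),
    }
    if len(set(lengths.values())) > 1:
        raise ValueError(f"Inconsistent number of data points: {lengths}")


# Consistent input: one question, one context list, one response.
check_lengths(["q1"], [["c1a", "c1b"]], ["r1"])

# Inconsistent input: two questions but only one context and one response.
try:
    check_lengths(["q1", "q2"], [["c1"]], ["r1"])
    mismatch_detected = False
except ValueError:
    mismatch_detected = True
```

Note that only the outer lengths are compared; the number of context strings per data point may vary.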
- with_progress(purpose: str = 'Iterating') → tqdm
Return a tqdm progress bar for iterating over the dataset.
- Parameters:
purpose (str) – The purpose for iterating over the dataset.
- Returns:
A tqdm progress bar.
- Return type:
tqdm