ragrank.dataset.reader
Reader module for Ragrank
- class ragrank.dataset.reader.ColumnMap(*, question: str = 'question', context: str = 'context', response: str = 'response')
- Maps the expected field names (question, context, response) to the corresponding column names in a dataset.
- question
The name of the column containing questions.
- Type:
str
- context
The name of the column containing contexts.
- Type:
str
- response
The name of the column containing responses.
- Type:
str
- model_computed_fields: ClassVar[dict[str, ComputedFieldInfo]] = {}
A dictionary of computed field names and their corresponding ComputedFieldInfo objects.
- model_config: ClassVar[ConfigDict] = {}
Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.
- model_fields: ClassVar[dict[str, FieldInfo]] = {'context': FieldInfo(annotation=str, required=False, default='context', description='The name of the column containing contexts'), 'question': FieldInfo(annotation=str, required=False, default='question', description='The name of the column containing questions'), 'response': FieldInfo(annotation=str, required=False, default='response', description='The name of the column containing responses')}
Metadata about the fields defined on the model: a mapping of field names to pydantic.fields.FieldInfo objects.
This replaces Model.__fields__ from Pydantic V1.
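The idea behind a column map can be illustrated with a minimal, standard-library-only sketch. ColumnMapSketch and rename_columns below are hypothetical stand-ins written for this illustration; they are not the ragrank classes, and the real ColumnMap is a Pydantic model whose internals may differ. The sketch only shows how the three fields point at the column names a particular dataset happens to use.

```python
from dataclasses import dataclass
from typing import Any, Dict


@dataclass(frozen=True)
class ColumnMapSketch:
    """Hypothetical stand-in for ColumnMap (not the ragrank class)."""

    question: str = "question"
    context: str = "context"
    response: str = "response"


def rename_columns(row: Dict[str, Any], column_map: ColumnMapSketch) -> Dict[str, Any]:
    """Return a copy of `row` keyed by the canonical field names."""
    return {
        "question": row[column_map.question],
        "context": row[column_map.context],
        "response": row[column_map.response],
    }


# A record whose columns do not use the default names.
row = {
    "query": "What is RAG?",
    "docs": ["Retrieval-augmented generation combines search with generation."],
    "answer": "A technique that grounds generation in retrieved text.",
}

# Point each canonical field at the column that actually holds it.
cmap = ColumnMapSketch(question="query", context="docs", response="answer")
canonical = rename_columns(row, cmap)
```

With the real library, the equivalent mapping would presumably be written as ColumnMap(question="query", context="docs", response="answer") and passed through the column_map argument of the reader functions.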
- ragrank.dataset.reader.from_csv(path: str | Path, *, column_map: ColumnMap | None = None, **kwargs: Any) -> Dataset | DataNode
Create a Dataset or DataNode object from a CSV file.
- ragrank.dataset.reader.from_dataframe(data: DataFrame, *, return_as_dataset: bool = False, column_map: ColumnMap | None = None) -> Dataset | DataNode
Create a Dataset or DataNode object from a Pandas DataFrame.
- Parameters:
data (pd.DataFrame) – The DataFrame containing the data.
return_as_dataset (bool, optional) – If True, return a Dataset object; otherwise return a DataNode. Defaults to False.
column_map (ColumnMap, optional) – Column mapping. Defaults to ColumnMap().
- Returns:
Either a Dataset or DataNode object.
- Return type:
Dataset | DataNode
- ragrank.dataset.reader.from_dict(data: Dict[str, List[str] | str] | Dict[str, List[str] | List[List[str]]], *, return_as_dataset: bool = False, column_map: ColumnMap | None = None) -> Dataset | DataNode
Create a Dataset or DataNode object from a dictionary representation.
- Parameters:
data (Union[DATANODE_TYPE, DATASET_TYPE]) – The dictionary containing the data representation.
return_as_dataset (bool, optional) – If True, return a Dataset object; otherwise return a DataNode. Defaults to False.
column_map (ColumnMap, optional) – Column mapping. Defaults to ColumnMap().
- Returns:
Either a Dataset or DataNode object.
- Return type:
Dataset | DataNode
- Raises:
ValueError – If the column specified in column_map is not present in the data.
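The two dictionary shapes in the from_dict signature can be sketched as follows. Reading the first shape as a single record and the second as column-wise lists of records is an assumption drawn from the type annotations alone. check_columns is a hypothetical helper written for this illustration, not part of ragrank; it only mimics the documented ValueError condition.

```python
from typing import Any, Dict

# Shape matching Dict[str, List[str] | str]: assumed to describe one record.
single_record: Dict[str, Any] = {
    "question": "Who wrote Hamlet?",
    "context": ["Hamlet is a tragedy by William Shakespeare."],
    "response": "William Shakespeare",
}

# Shape matching Dict[str, List[str] | List[List[str]]]: assumed to describe
# several records stored column-wise.
many_records: Dict[str, Any] = {
    "question": ["Who wrote Hamlet?", "What is the capital of France?"],
    "context": [
        ["Hamlet is a tragedy by William Shakespeare."],
        ["Paris is the capital of France."],
    ],
    "response": ["William Shakespeare", "Paris"],
}


def check_columns(
    data: Dict[str, Any],
    question: str = "question",
    context: str = "context",
    response: str = "response",
) -> None:
    """Hypothetical helper: raise ValueError when a mapped column is absent."""
    missing = [name for name in (question, context, response) if name not in data]
    if missing:
        raise ValueError(f"Columns not found in data: {missing}")


# Both shapes contain the default column names, so neither call raises.
check_columns(single_record)
check_columns(many_records)

# A dictionary that uses "query" instead of "question" fails under the defaults.
renamed = {"query": "Who wrote Hamlet?", "context": ["..."], "response": "..."}
error_message = ""
try:
    check_columns(renamed)
except ValueError as exc:
    error_message = str(exc)
```

Passing question="query" to the helper makes the last check succeed, which mirrors what a column_map argument is for in the real function.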
- ragrank.dataset.reader.from_hfdataset(url: str | Tuple[str], *, split: str, column_map: ColumnMap | None = None) -> Dataset
Create a Dataset object from a Hugging Face dataset.
- Parameters:
url (Union[str, Tuple[str]]) – The URL or tuple of URLs pointing to the dataset.
split (str) – The name of the split to load from the dataset.
column_map (ColumnMap, optional) – Column mapping. Defaults to ColumnMap().
- Returns:
A Dataset object containing the loaded data.
- Return type:
Dataset