Commit

[DataCatalog2.0]: KedroDataCatalog with dict interface (#4218)
* Added a skeleton for AbstractDataCatalog and KedroDataCatalog

Signed-off-by: Elena Khaustova <[email protected]>

* Removed from_config method

Signed-off-by: Elena Khaustova <[email protected]>

* Implemented _init_datasets method

Signed-off-by: Elena Khaustova <[email protected]>

* Implemented get dataset

Signed-off-by: Elena Khaustova <[email protected]>

* Started resolve_patterns implementation

Signed-off-by: Elena Khaustova <[email protected]>

* Implemented resolve_patterns

Signed-off-by: Elena Khaustova <[email protected]>

* Fixed credentials resolving

Signed-off-by: Elena Khaustova <[email protected]>

* Updated match pattern

Signed-off-by: Elena Khaustova <[email protected]>

* Implemented add from dict method

Signed-off-by: Elena Khaustova <[email protected]>

* Updated io __init__

Signed-off-by: Elena Khaustova <[email protected]>

* Added list method

Signed-off-by: Elena Khaustova <[email protected]>

* Implemented _validate_missing_keys

Signed-off-by: Elena Khaustova <[email protected]>

* Added datasets access logic

Signed-off-by: Elena Khaustova <[email protected]>

* Added __contains__ and comments on lazy loading

Signed-off-by: Elena Khaustova <[email protected]>

* Renamed dataset_name to ds_name

Signed-off-by: Elena Khaustova <[email protected]>

* Updated some docstrings

Signed-off-by: Elena Khaustova <[email protected]>

* Fixed _update_ds_configs

Signed-off-by: Elena Khaustova <[email protected]>

* Fixed _init_datasets

Signed-off-by: Elena Khaustova <[email protected]>

* Implemented add_runtime_patterns

Signed-off-by: Elena Khaustova <[email protected]>

* Fixed runtime patterns usage

Signed-off-by: Elena Khaustova <[email protected]>

* Moved pattern logic out of data catalog, implemented KedroDataCatalog

Signed-off-by: Elena Khaustova <[email protected]>

* KedroDataCatalog updates

Signed-off-by: Elena Khaustova <[email protected]>

* Added property to return config

Signed-off-by: Elena Khaustova <[email protected]>

* Added list patterns method

Signed-off-by: Elena Khaustova <[email protected]>

* Renamed and moved ConfigResolver

Signed-off-by: Elena Khaustova <[email protected]>

* Renamed ConfigResolver

Signed-off-by: Elena Khaustova <[email protected]>

* Cleaned KedroDataCatalog

Signed-off-by: Elena Khaustova <[email protected]>

* Cleaned up DataCatalogConfigResolver

Signed-off-by: Elena Khaustova <[email protected]>

* Docs build fix attempt

Signed-off-by: Elena Khaustova <[email protected]>

* KedroDataCatalog draft

Signed-off-by: Elena Khaustova <[email protected]>

* Removed KedroDataCatalog

Signed-off-by: Elena Khaustova <[email protected]>

* Updated from_config method

Signed-off-by: Elena Khaustova <[email protected]>

* Updated constructor and add methods

Signed-off-by: Elena Khaustova <[email protected]>

* Updated _get_dataset method

Signed-off-by: Elena Khaustova <[email protected]>

* Updated __contains__

Signed-off-by: Elena Khaustova <[email protected]>

* Updated __eq__ and shallow_copy

Signed-off-by: Elena Khaustova <[email protected]>

* Added __iter__ and __getitem__

Signed-off-by: Elena Khaustova <[email protected]>

* Removed unused imports

Signed-off-by: Elena Khaustova <[email protected]>

* Added TODO

Signed-off-by: Elena Khaustova <[email protected]>

* Updated runner.run()

Signed-off-by: Elena Khaustova <[email protected]>

* Updated session

Signed-off-by: Elena Khaustova <[email protected]>

* Added config_resolver property

Signed-off-by: Elena Khaustova <[email protected]>

* Updated catalog list command

Signed-off-by: Elena Khaustova <[email protected]>

* Updated catalog create command

Signed-off-by: Elena Khaustova <[email protected]>

* Updated catalog rank command

Signed-off-by: Elena Khaustova <[email protected]>

* Updated catalog resolve command

Signed-off-by: Elena Khaustova <[email protected]>

* Remove some methods

Signed-off-by: Elena Khaustova <[email protected]>

* Removed ds configs from catalog

Signed-off-by: Elena Khaustova <[email protected]>

* Fixed lint

Signed-off-by: Elena Khaustova <[email protected]>

* Fixed typo

Signed-off-by: Elena Khaustova <[email protected]>

* Added module docstring

Signed-off-by: Elena Khaustova <[email protected]>

* Renaming methods

Signed-off-by: Elena Khaustova <[email protected]>

* Removed None from Pattern type

Signed-off-by: Elena Khaustova <[email protected]>

* Fixed docs failing to find class reference

Signed-off-by: Elena Khaustova <[email protected]>

* Fixed docs failing to find class reference

Signed-off-by: Elena Khaustova <[email protected]>

* Updated Patterns type

Signed-off-by: Elena Khaustova <[email protected]>

* Fix tests (#4149)

* Fix most tests

Signed-off-by: Ankita Katiyar <[email protected]>

* Fix most tests

Signed-off-by: Ankita Katiyar <[email protected]>

---------

Signed-off-by: Ankita Katiyar <[email protected]>

* Returned constants to avoid breaking changes

Signed-off-by: Elena Khaustova <[email protected]>

* Updated KedroDataCatalog for recent changes

Signed-off-by: Elena Khaustova <[email protected]>

* Minor fix

Signed-off-by: Elena Khaustova <[email protected]>

* Updated test_sorting_order_with_other_dataset_through_extra_pattern

Signed-off-by: Elena Khaustova <[email protected]>

* Removed odd properties

Signed-off-by: Elena Khaustova <[email protected]>

* Updated tests

Signed-off-by: Elena Khaustova <[email protected]>

* Removed None from _fetch_credentials input

Signed-off-by: Elena Khaustova <[email protected]>

* Updated specs and context

Signed-off-by: Elena Khaustova <[email protected]>

* Updated runners

Signed-off-by: Elena Khaustova <[email protected]>

* Updated default catalog validation

Signed-off-by: Elena Khaustova <[email protected]>

* Updated default catalog validation

Signed-off-by: Elena Khaustova <[email protected]>

* Updated contains and added exists methods for KedroDataCatalog

Signed-off-by: Elena Khaustova <[email protected]>

* Fixed docs

Signed-off-by: Elena Khaustova <[email protected]>

* Fixing docs and lint

Signed-off-by: Elena Khaustova <[email protected]>

* Fixed docs

Signed-off-by: Elena Khaustova <[email protected]>

* Fixed docs

Signed-off-by: Elena Khaustova <[email protected]>

* Fixed unit tests

Signed-off-by: Elena Khaustova <[email protected]>

* Added __eq__

Signed-off-by: Elena Khaustova <[email protected]>

* Renamed DataCatalogConfigResolver to CatalogConfigResolver

Signed-off-by: Elena Khaustova <[email protected]>

* Renamed _init_configs to _resolve_config_credentials

Signed-off-by: Elena Khaustova <[email protected]>

* Moved functions to the class

Signed-off-by: Elena Khaustova <[email protected]>

* Refactored resolve_dataset_pattern

Signed-off-by: Elena Khaustova <[email protected]>

* Fixed refactored part

Signed-off-by: Elena Khaustova <[email protected]>

* Changed the order of arguments for DataCatalog constructor

Signed-off-by: Elena Khaustova <[email protected]>

* Replaced __getitem__ with .get()

Signed-off-by: Elena Khaustova <[email protected]>

* Updated catalog commands

Signed-off-by: Elena Khaustova <[email protected]>

* Moved warm up block outside of the try block

Signed-off-by: Elena Khaustova <[email protected]>

* Fixed linter

Signed-off-by: Elena Khaustova <[email protected]>

* Removed odd copying

Signed-off-by: Elena Khaustova <[email protected]>

* Renamed DataCatalogConfigResolver to CatalogConfigResolver

Signed-off-by: Elena Khaustova <[email protected]>

* Renamed AbstractDataCatalog to BaseDataCatalog

Signed-off-by: Elena Khaustova <[email protected]>

* Moved validate_dataset_config inside catalog

Signed-off-by: Elena Khaustova <[email protected]>

* Renamed _init_dataset to _add_from_config

Signed-off-by: Elena Khaustova <[email protected]>

* Fix lint

Signed-off-by: Elena Khaustova <[email protected]>

* Updated release notes

Signed-off-by: Elena Khaustova <[email protected]>

* Returned DatasetError

Signed-off-by: Elena Khaustova <[email protected]>

* Added _dataset_patterns and _default_pattern to _config_resolver to avoid breaking change

Signed-off-by: Elena Khaustova <[email protected]>

* Made resolve_dataset_pattern return just dict

Signed-off-by: Elena Khaustova <[email protected]>

* Fixed linter

Signed-off-by: Elena Khaustova <[email protected]>

* Added CatalogProtocol draft

Signed-off-by: Elena Khaustova <[email protected]>

* Implemented CatalogProtocol

Signed-off-by: Elena Khaustova <[email protected]>

* Updated types

Signed-off-by: Elena Khaustova <[email protected]>

* Fixed linter

Signed-off-by: Elena Khaustova <[email protected]>

* Added _ImplementsCatalogProtocolValidator

Signed-off-by: Elena Khaustova <[email protected]>

* Updated docstrings

Signed-off-by: Elena Khaustova <[email protected]>

* Fixed tests

Signed-off-by: Elena Khaustova <[email protected]>

* Fixed docs

Signed-off-by: Elena Khaustova <[email protected]>

* Excluded Protocol from coverage

Signed-off-by: Elena Khaustova <[email protected]>

* Fixed docs

Signed-off-by: Elena Khaustova <[email protected]>

* Renamed catalog source to kedro_data_catalog

Signed-off-by: Elena Khaustova <[email protected]>

* Renamed data set to dataset in docstrings

Signed-off-by: Elena Khaustova <[email protected]>

* Updated add_from_dict

Signed-off-by: Elena Khaustova <[email protected]>

* Revised comments and TODOs

Signed-off-by: Elena Khaustova <[email protected]>

* Updated error message to point to specific catalog type

Signed-off-by: Elena Khaustova <[email protected]>

* Fixed tests

Signed-off-by: Elena Khaustova <[email protected]>

* Merged with protocol

Signed-off-by: Elena Khaustova <[email protected]>

* Removed reference to DataCatalog in docstrings

Signed-off-by: Elena Khaustova <[email protected]>

* Fixed docs

Signed-off-by: Elena Khaustova <[email protected]>

* Reordered methods

Signed-off-by: Elena Khaustova <[email protected]>

* Removed add_all from protocol

Signed-off-by: Elena Khaustova <[email protected]>

* Changed the order of arguments

Signed-off-by: Elena Khaustova <[email protected]>

* Updated docstrings

Signed-off-by: Elena Khaustova <[email protected]>

* Updated docstrings

Signed-off-by: Elena Khaustova <[email protected]>

* Added __repr__

Signed-off-by: Elena Khaustova <[email protected]>

* Made __getitem__ return deepcopy

Signed-off-by: Elena Khaustova <[email protected]>

* Fixed bug in get_dataset()

Signed-off-by: Elena Khaustova <[email protected]>

* Fixed __eq__

Signed-off-by: Elena Khaustova <[email protected]>

* Fixed docstrings

Signed-off-by: Elena Khaustova <[email protected]>

* Added __setitem__

Signed-off-by: Elena Khaustova <[email protected]>

* Unit tests for `KedroDataCatalog` (#4171)

* Added KedroDataCatalog tests template

Signed-off-by: Elena Khaustova <[email protected]>

* Added test save/load unregistered dataset

Signed-off-by: Elena Khaustova <[email protected]>

* Added test_feed_dict

Signed-off-by: Elena Khaustova <[email protected]>

* Added exists tests

Signed-off-by: Elena Khaustova <[email protected]>

* Added tests for list()

Signed-off-by: Elena Khaustova <[email protected]>

* Added test_eq

Signed-off-by: Elena Khaustova <[email protected]>

* Added test init/add datasets

Signed-off-by: Elena Khaustova <[email protected]>

* Updated test_adding_datasets_not_allowed

Signed-off-by: Elena Khaustova <[email protected]>

* Added shallow copy tests

Signed-off-by: Elena Khaustova <[email protected]>

* Added TestKedroDataCatalogFromConfig

Signed-off-by: Elena Khaustova <[email protected]>

* Added missing tests

Signed-off-by: Elena Khaustova <[email protected]>

---------

Signed-off-by: Elena Khaustova <[email protected]>

* Updated RELEASE.md

Signed-off-by: Elena Khaustova <[email protected]>

* Removed deep copies

Signed-off-by: Elena Khaustova <[email protected]>

* Removed some interface that will be changed in the next version

Signed-off-by: Elena Khaustova <[email protected]>

* Removed key completions

Signed-off-by: Elena Khaustova <[email protected]>

* Fixing typos

Signed-off-by: Elena Khaustova <[email protected]>

* Removed key completions test

Signed-off-by: Elena Khaustova <[email protected]>

* Replaced data set with dataset

Signed-off-by: Elena Khaustova <[email protected]>

* Added docstring for get_dataset() method

Signed-off-by: Elena Khaustova <[email protected]>

* Renamed pytest fixture

Signed-off-by: Elena Khaustova <[email protected]>

* Addressed review comments

Signed-off-by: Elena Khaustova <[email protected]>

* Updated _assert_requirements_ok starters test

Signed-off-by: Elena Khaustova <[email protected]>

* Revert "Updated _assert_requirements_ok starters test"

This reverts commit 5208321.

Signed-off-by: Elena Khaustova <[email protected]>

* Updated error message

Signed-off-by: Elena Khaustova <[email protected]>

* Replaced typo

Signed-off-by: Elena Khaustova <[email protected]>

* Replaced data set with dataset in docstrings

Signed-off-by: Elena Khaustova <[email protected]>

* Updated tests

Signed-off-by: Elena Khaustova <[email protected]>

* Made KedroDataCatalog subclass from CatalogProtocol

Signed-off-by: Elena Khaustova <[email protected]>

* Updated release notes

Signed-off-by: Elena Khaustova <[email protected]>

* Implemented iter, getitem, setitem

Signed-off-by: Elena Khaustova <[email protected]>

* Updated add_data and TODOs

Signed-off-by: Elena Khaustova <[email protected]>

* Added key completions

Signed-off-by: Elena Khaustova <[email protected]>

* Made behavior dict-like

Signed-off-by: Elena Khaustova <[email protected]>

* Merged with main

Signed-off-by: Elena Khaustova <[email protected]>

* Removed add_data() method

Signed-off-by: Elena Khaustova <[email protected]>

* Added usage example and updated docstrings with experimental feature note

Signed-off-by: Elena Khaustova <[email protected]>

* Added len and get

Signed-off-by: Elena Khaustova <[email protected]>

* Implemented unit tests

Signed-off-by: Elena Khaustova <[email protected]>

* Update RELEASE.md

Co-authored-by: Merel Theisen <[email protected]>
Signed-off-by: ElenaKhaustova <[email protected]>

* Update kedro/io/kedro_data_catalog.py

Co-authored-by: Merel Theisen <[email protected]>
Signed-off-by: ElenaKhaustova <[email protected]>

* Fixed lint

Signed-off-by: Elena Khaustova <[email protected]>

* Updated load_data and save_data to use new interface

Signed-off-by: Elena Khaustova <[email protected]>

* Updated load_data and save_data to use new interface

Signed-off-by: Elena Khaustova <[email protected]>

* Returned usage of get_dataset()

Signed-off-by: Elena Khaustova <[email protected]>

* Fixed lint

Signed-off-by: Elena Khaustova <[email protected]>

* Updated __getitem__ to use old get_dataset() method

Signed-off-by: Elena Khaustova <[email protected]>

* Removed regex_search from values()

Signed-off-by: Elena Khaustova <[email protected]>

* Fixed type annotation for __iter__

Signed-off-by: Elena Khaustova <[email protected]>

* Fixed linter

Signed-off-by: Elena Khaustova <[email protected]>

* Revert lint fix

Signed-off-by: Elena Khaustova <[email protected]>

* Returned short names for save and load

Signed-off-by: Elena Khaustova <[email protected]>

* Removed regex_search from keys and items

Signed-off-by: Elena Khaustova <[email protected]>

* Updated release notes

Signed-off-by: Elena Khaustova <[email protected]>

* Made regex_search non-optional

Signed-off-by: Elena Khaustova <[email protected]>

* Changed default for regex_flags

Signed-off-by: Elena Khaustova <[email protected]>

* Returned list() method

Signed-off-by: Elena Khaustova <[email protected]>

* Fixed __iter__ return type

Signed-off-by: Elena Khaustova <[email protected]>

---------

Signed-off-by: Elena Khaustova <[email protected]>
Signed-off-by: Ankita Katiyar <[email protected]>
Signed-off-by: ElenaKhaustova <[email protected]>
Co-authored-by: Ankita Katiyar <[email protected]>
Co-authored-by: Merel Theisen <[email protected]>
3 people authored Oct 18, 2024
1 parent 2e950a2 commit 3fe61a0
Showing 3 changed files with 162 additions and 25 deletions.
4 changes: 4 additions & 0 deletions RELEASE.md
@@ -1,6 +1,10 @@
# Upcoming Release

## Major features and improvements
* Implemented dict-like interface for `KedroDataCatalog`.

**Note:** ``KedroDataCatalog`` is an experimental feature and is under active development. Therefore, it is possible we'll introduce breaking changes to this class, so be mindful of that if you decide to use it already. Let us know if you have any feedback about the ``KedroDataCatalog`` or ideas for new features.
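
As a rough illustration of what "dict-like interface" means here, the following is a minimal sketch built on hypothetical stand-in classes (`ToyDataset` and `ToyCatalog` are not part of Kedro). It only mirrors the shape of the methods added in this commit: item assignment wraps raw data in a dataset object, and iteration, `len()`, `keys()`, `values()` and `items()` work over dataset names. Unlike the real class, which delegates `__getitem__` to `get_dataset()`, this toy raises a plain `KeyError` for unknown names.

```python
from typing import Any, Iterator


class ToyDataset:
    """Hypothetical stand-in for a dataset object (not Kedro's AbstractDataset)."""

    def __init__(self, data: Any = None) -> None:
        self._data = data

    def load(self) -> Any:
        return self._data


class ToyCatalog:
    """Hypothetical dict-like catalog; an assumption-laden sketch, not Kedro code."""

    def __init__(self) -> None:
        self._datasets: dict[str, ToyDataset] = {}

    def __setitem__(self, key: str, value: Any) -> None:
        # Dataset objects are stored as-is; any other value is wrapped first.
        if isinstance(value, ToyDataset):
            self._datasets[key] = value
        else:
            self._datasets[key] = ToyDataset(value)

    def __getitem__(self, key: str) -> ToyDataset:
        return self._datasets[key]

    def __iter__(self) -> Iterator[str]:
        yield from self._datasets

    def __len__(self) -> int:
        return len(self._datasets)

    def keys(self) -> list[str]:
        return list(self)

    def values(self) -> list[ToyDataset]:
        return [self._datasets[key] for key in self]

    def items(self) -> list[tuple[str, ToyDataset]]:
        return [(key, self._datasets[key]) for key in self]


catalog = ToyCatalog()
catalog["answer"] = 42                   # raw value is wrapped in a ToyDataset
catalog["cars"] = ToyDataset([1, 2, 3])  # dataset object is stored unchanged
```

Because names are kept in an ordinary `dict`, iteration follows insertion order, which is what lets the new `test_iteration` test compare catalog keys against the config keys with `zip`.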

## Bug fixes and other changes
## Breaking changes to the API
## Documentation changes
148 changes: 124 additions & 24 deletions kedro/io/kedro_data_catalog.py
@@ -14,7 +14,7 @@
import difflib
import logging
import re
from typing import Any
from typing import Any, Iterator, List # noqa: UP035

from kedro.io.catalog_config_resolver import CatalogConfigResolver, Patterns
from kedro.io.core import (
@@ -84,10 +84,12 @@ def __init__(

@property
def datasets(self) -> dict[str, Any]:
# TODO: remove when removing old catalog
return copy.copy(self._datasets)

@datasets.setter
def datasets(self, value: Any) -> None:
# TODO: remove when removing old catalog
raise AttributeError(
"Operation not allowed. Please use KedroDataCatalog.add() instead."
)
@@ -112,6 +114,49 @@ def __eq__(self, other) -> bool: # type: ignore[no-untyped-def]
other.config_resolver.list_patterns(),
)

def keys(self) -> List[str]: # noqa: UP006
return list(self.__iter__())

def values(self) -> List[AbstractDataset]: # noqa: UP006
return [self._datasets[key] for key in self]

def items(self) -> List[tuple[str, AbstractDataset]]: # noqa: UP006
return [(key, self._datasets[key]) for key in self]

def __iter__(self) -> Iterator[str]:
yield from self._datasets.keys()

def __getitem__(self, ds_name: str) -> AbstractDataset:
return self.get_dataset(ds_name)

def __setitem__(self, key: str, value: Any) -> None:
if key in self._datasets:
self._logger.warning("Replacing dataset '%s'", key)
if isinstance(value, AbstractDataset):
self._datasets[key] = value
else:
self._logger.info(f"Adding input data as a MemoryDataset - {key}")
self._datasets[key] = MemoryDataset(data=value) # type: ignore[abstract]

def __len__(self) -> int:
return len(self.keys())

def get(
self, key: str, default: AbstractDataset | None = None
) -> AbstractDataset | None:
"""Get a dataset by name from an internal collection of datasets."""
if key not in self._datasets:
ds_config = self._config_resolver.resolve_pattern(key)
if ds_config:
self._add_from_config(key, ds_config)

dataset = self._datasets.get(key, None)

return dataset or default

def _ipython_key_completions_(self) -> list[str]:
return list(self._datasets.keys())

@property
def _logger(self) -> logging.Logger:
return logging.getLogger(__name__)
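
The new `get()` method in the hunk above resolves a dataset lazily: if the name is not registered yet, it asks the config resolver for a matching pattern, materialises the dataset from the resolved config, and only then falls back to `default`. A minimal sketch of that flow follows, under stated assumptions: `LazyCatalog` and `resolve_pattern` are hypothetical, and plain dicts stand in for dataset objects.

```python
from typing import Any, Callable


class LazyCatalog:
    """Hypothetical catalog showing lazy, pattern-based lookup (not Kedro code)."""

    def __init__(self, resolve_pattern: Callable[[str], dict]) -> None:
        self._datasets: dict[str, Any] = {}
        # Assumed hook: returns a config dict for a name, or {} if nothing matches.
        self._resolve_pattern = resolve_pattern

    def get(self, key: str, default: Any = None) -> Any:
        if key not in self._datasets:
            config = self._resolve_pattern(key)
            if config:
                # Stand-in for building a real dataset from the resolved config.
                self._datasets[key] = dict(config, name=key)
        dataset = self._datasets.get(key)
        # Like the method in the diff, this relies on dataset objects being truthy.
        return dataset or default


def resolve_pattern(name: str) -> dict:
    # Hypothetical rule: any name ending in "_csv" resolves to a CSV config.
    if name.endswith("_csv"):
        return {"type": "csv", "filepath": f"data/{name}.csv"}
    return {}


catalog = LazyCatalog(resolve_pattern)
first = catalog.get("cars_csv")                       # created on first access
missing = catalog.get("unknown", default="fallback")  # no match, default returned
```

A successful lookup registers the dataset as a side effect, so a second `get()` for the same name returns the same object without consulting the resolver again, while a failed lookup leaves the catalog unchanged.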
@@ -178,6 +223,7 @@ def _add_from_config(self, ds_name: str, ds_config: dict[str, Any]) -> None:
def get_dataset(
self, ds_name: str, version: Version | None = None, suggest: bool = True
) -> AbstractDataset:
# TODO: remove when removing old catalog
"""Get a dataset by name from an internal collection of datasets.
If a dataset is not in the collection but matches any pattern
@@ -197,12 +243,7 @@ def get_dataset(
DatasetNotFoundError: When a dataset with the given name
is not in the collection and do not match patterns.
"""
if ds_name not in self._datasets:
ds_config = self._config_resolver.resolve_pattern(ds_name)
if ds_config:
self._add_from_config(ds_name, ds_config)

dataset = self._datasets.get(ds_name, None)
dataset = self.get(ds_name)

if dataset is None:
error_msg = f"Dataset '{ds_name}' not found in the catalog"
@@ -231,40 +272,71 @@ def _get_dataset(
def add(
self, ds_name: str, dataset: AbstractDataset, replace: bool = False
) -> None:
# TODO: remove when removing old catalog
"""Adds a new ``AbstractDataset`` object to the ``KedroDataCatalog``."""
if ds_name in self._datasets:
if replace:
self._logger.warning("Replacing dataset '%s'", ds_name)
else:
raise DatasetAlreadyExistsError(
f"Dataset '{ds_name}' has already been registered"
)
self._datasets[ds_name] = dataset

def list(self, regex_search: str | None = None) -> list[str]:
if ds_name in self._datasets and not replace:
raise DatasetAlreadyExistsError(
f"Dataset '{ds_name}' has already been registered"
)
self.__setitem__(ds_name, dataset)

def list(
self, regex_search: str | None = None, regex_flags: int | re.RegexFlag = 0
) -> List[str]: # noqa: UP006
# TODO: rename depending on the solution for https://github.com/kedro-org/kedro/issues/3917
"""
List of all dataset names registered in the catalog.
This can be filtered by providing an optional regular expression
which will only return matching keys.
"""

if regex_search is None:
return list(self._datasets.keys())
return self.keys()

if not regex_search.strip():
if regex_search == "":
self._logger.warning("The empty string will not match any datasets")
return []

if not regex_flags:
regex_flags = re.IGNORECASE

try:
pattern = re.compile(regex_search, flags=re.IGNORECASE)
pattern = re.compile(regex_search, flags=regex_flags)
except re.error as exc:
raise SyntaxError(
f"Invalid regular expression provided: '{regex_search}'"
) from exc
return [ds_name for ds_name in self._datasets if pattern.search(ds_name)]
return [ds_name for ds_name in self.__iter__() if pattern.search(ds_name)]

def save(self, name: str, data: Any) -> None:
"""Save data to a registered dataset."""
# TODO: rename input argument when breaking change: name -> ds_name
"""Save data to a registered dataset.
Args:
name: A dataset to be saved to.
data: A data object to be saved as configured in the registered
dataset.
Raises:
DatasetNotFoundError: When a dataset with the given name
has not yet been registered.
Example:
::
>>> import pandas as pd
>>>
>>> from kedro_datasets.pandas import CSVDataset
>>>
>>> cars = CSVDataset(filepath="cars.csv",
>>> load_args=None,
>>> save_args={"index": False})
>>> catalog = DataCatalog(datasets={'cars': cars})
>>>
>>> df = pd.DataFrame({'col1': [1, 2],
>>> 'col2': [4, 5],
>>> 'col3': [5, 6]})
>>> catalog.save("cars", df)
"""
dataset = self.get_dataset(name)

self._logger.info(
@@ -277,7 +349,35 @@ def save(self, name: str, data: Any) -> None:
dataset.save(data)

def load(self, name: str, version: str | None = None) -> Any:
"""Loads a registered dataset."""
# TODO: rename input argument when breaking change: name -> ds_name
# TODO: remove version from input arguments when breaking change
"""Loads a registered dataset.
Args:
name: A dataset to be loaded.
version: Optional argument for concrete data version to be loaded.
Works only with versioned datasets.
Returns:
The loaded data as configured.
Raises:
DatasetNotFoundError: When a dataset with the given name
has not yet been registered.
Example:
::
>>> from kedro.io import DataCatalog
>>> from kedro_datasets.pandas import CSVDataset
>>>
>>> cars = CSVDataset(filepath="cars.csv",
>>> load_args=None,
>>> save_args={"index": False})
>>> catalog = DataCatalog(datasets={'cars': cars})
>>>
>>> df = catalog.load("cars")
"""
load_version = Version(version, None) if version else None
dataset = self.get_dataset(name, version=load_version)
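
The filtering rules of the updated `list()` method earlier in this file can be sketched as a standalone function. `filter_names` is a hypothetical helper, not part of Kedro; it follows the added lines of the diff (no filter returns every name, an empty string matches nothing, and a falsy `regex_flags` falls back to `re.IGNORECASE`) and omits the logged warning.

```python
import re
from typing import Iterable, Optional


def filter_names(
    names: Iterable[str],
    regex_search: Optional[str] = None,
    regex_flags: int = 0,
) -> list[str]:
    """Hypothetical standalone version of the name filtering shown in the diff."""
    if regex_search is None:
        return list(names)

    if regex_search == "":
        # The method in the diff logs a warning here; the sketch just returns.
        return []

    if not regex_flags:
        # A falsy flag value (including an explicit 0) means case-insensitive.
        regex_flags = re.IGNORECASE

    try:
        pattern = re.compile(regex_search, flags=regex_flags)
    except re.error as exc:
        raise SyntaxError(
            f"Invalid regular expression provided: '{regex_search}'"
        ) from exc
    return [name for name in names if pattern.search(name)]


names = ["Cars", "boats", "raw_cars"]
case_insensitive = filter_names(names, "cars")
case_sensitive = filter_names(names, "cars", re.MULTILINE)
```

One consequence of the fallback is that passing `0` cannot request case-sensitive matching. Any non-zero flag, such as `re.MULTILINE` in the sketch, replaces `re.IGNORECASE` rather than being combined with it, so the match becomes case-sensitive.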

35 changes: 34 additions & 1 deletion tests/io/test_kedro_data_catalog.py
@@ -379,7 +379,7 @@ def test_config_invalid_dataset_config(self, correct_config):

def test_empty_config(self):
"""Test empty config"""
assert KedroDataCatalog.from_config(None)
assert len(KedroDataCatalog.from_config(None)) == 0

def test_missing_credentials(self, correct_config):
"""Check the error if credentials can't be located"""
@@ -502,6 +502,39 @@ def test_bad_confirm(self, correct_config, dataset_name, pattern):
with pytest.raises(DatasetError, match=re.escape(pattern)):
data_catalog.confirm(dataset_name)

def test_iteration(self, correct_config):
"""Test iterate through keys, values and items."""
data_catalog = KedroDataCatalog.from_config(**correct_config)

for ds_name_cat, ds_name_config in zip(
data_catalog, correct_config["catalog"]
):
assert ds_name_cat == ds_name_config

for ds_name_cat, ds_name_config in zip(
data_catalog.keys(), correct_config["catalog"]
):
assert ds_name_cat == ds_name_config

for ds in data_catalog.values():
assert isinstance(ds, CSVDataset)

for ds_name, ds in data_catalog.items():
assert isinstance(ds, CSVDataset)
assert ds_name in correct_config["catalog"]

def test_getitem_setitem(self, correct_config):
"""Test get and set item."""
data_catalog = KedroDataCatalog.from_config(**correct_config)
data_catalog["test"] = 123
assert isinstance(data_catalog["test"], MemoryDataset)

def test_ipython_key_completions(self, correct_config):
data_catalog = KedroDataCatalog.from_config(**correct_config)
assert data_catalog._ipython_key_completions_() == list(
correct_config["catalog"].keys()
)

class TestDataCatalogVersioned:
def test_from_correct_config_versioned(self, correct_config, dummy_dataframe):
"""Test load and save of versioned datasets from config"""