
Preload data method added #2

Merged — 4 commits merged into main on Dec 16, 2024
Conversation

@konstntokas (Collaborator) commented Dec 10, 2024

So far, only an in-built monitoring table is shown when running preload_data; it is updated as soon as the status, progress, or message changes. See the notebook examples/zenodo_data_store_preload.ipynb.

Tests for the preload still need to be added.

@b-yogesh left a comment


Please have a look at my suggestions and questions.

    ext = data_id.split(".")[-1]
    format_id = MAP_FILE_EXTENSION_FORMAT.get(ext.lower())
    return format_id

def estimate_file_format(data_id: str) -> Union[str, None]:


Since estimate sounds a bit like guessing, a better name could be extract, identify, or something along those lines.

Collaborator Author

changed it to identify

self.data_id = data_id
self.status = "Not started"
self.progress = 0.0
self.message = "Preloading not started jet."


I think you meant yet here

Collaborator Author

Thanks for reading carefully! :)

LOG = logging.getLogger(__name__)


class Event:


A better name could be Task instead. Could you explain why you chose Event? Then we could decide whether it makes sense to rename it.

Comment on lines +193 to +200
if list_data_ids_mod:
    LOG.info(
        f"{data_id} is already pre-loaded. The datasets can be "
        f"opened with the following data IDs: "
        f"\n{'\n'.join(str(item) for item in list_data_ids_mod)}"
    )
elif self.cache_store.has_data(data_id):
    LOG.info(f"{data_id} is already pre-loaded.")


What is the difference between the if and elif condition here? They both tell the user that the data_id is already pre-loaded.

Collaborator Author

It depends on the compressed file. If only one dataset is in the compressed file, or all datasets are merged (this is a preload parameter), then the data_id stays. If multiple files are in the compressed file and they cannot be merged, then the data_id is extended by the file name. Hence the two cases in the if and elif.

Collaborator Author

In the first case, I want to tell the user the extended data IDs.
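Taken together, the two cases could be sketched like this (a hypothetical helper; the name, signature, and the exact ID-extension scheme are illustrative assumptions, not the merged implementation):

```python
def expand_data_ids(data_id: str, member_names: list[str], merge: bool) -> list[str]:
    """Sketch: derive the cached data IDs for one compressed file.

    If the compressed file holds a single dataset, or all datasets are
    merged (a preload parameter), the original data_id stays. Otherwise
    the data_id is extended by each member file name.
    """
    if merge or len(member_names) == 1:
        return [data_id]
    # Drop the archive extension and append each member file name.
    stem = data_id.rsplit(".", 1)[0]
    return [f"{stem}/{name}" for name in member_names]
```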

Comment on lines +156 to +158
self._download_data(*data_ids)
self._decompress_data(*data_ids)
self._prepare_data(*data_ids, **preload_params)


QQ: We only have 3 threads in total, one for each of these tasks, right? So each thread will do its task for all the data_ids. If that is the case, I do not see how these threads are useful here, since they download the data one after the other, decompress after the download for each data_id one after the other, and so on. If you want to use concurrency to send multiple download requests, the approach needs to be updated so that each data_id has its own thread and is not dependent on the other data_ids' status.
If this is not the case, I would be happy to discuss how this works.

Collaborator Author

I did it in three threads so as not to open up too many threads. What happens when the user preloads 100 data IDs?

But I agree. I think it would be better to give each data ID its own thread, and maybe limit it to a maximum of 10 threads or so.
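The per-data-ID approach with a capped pool could look roughly like this (a sketch; preload_one stands in for the real download/decompress/prepare pipeline and is not the actual implementation):

```python
from concurrent.futures import ThreadPoolExecutor

MAX_WORKERS = 10  # cap so that preloading 100 data IDs does not spawn 100 threads

def preload_one(data_id: str) -> str:
    # Placeholder for the real per-ID pipeline:
    # download -> decompress -> prepare, independent of all other data IDs.
    return f"{data_id}: prepared"

def preload(data_ids: list[str]) -> list[str]:
    # Each data ID gets its own task; the pool limits concurrency to MAX_WORKERS,
    # so slow downloads of one ID no longer block the others.
    with ThreadPoolExecutor(max_workers=MAX_WORKERS) as pool:
        return list(pool.map(preload_one, data_ids))
```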

@forman self-requested a review on December 11, 2024 10:36

codecov bot commented Dec 11, 2024

Codecov Report

Attention: Patch coverage is 29.61538% with 183 lines in your changes missing coverage. Please review.

Project coverage is 48.45%. Comparing base (021eda3) to head (a405658).
Report is 5 commits behind head on main.

Files with missing lines Patch % Lines
xcube_zenodo/preload.py 18.51% 154 Missing ⚠️
xcube_zenodo/store.py 50.00% 27 Missing ⚠️
xcube_zenodo/_utils.py 84.61% 2 Missing ⚠️
Additional details and impacted files
@@             Coverage Diff              @@
##              main       #2       +/-   ##
============================================
- Coverage   100.00%   48.45%   -51.55%     
============================================
  Files            6        7        +1     
  Lines          118      355      +237     
============================================
+ Hits           118      172       +54     
- Misses           0      183      +183     


@@ -116,7 +116,7 @@
],
"source": [
"%%time\n",
"access_token = \"ZsZVfyPCmLYRZQtfSYWruNwXYBykonv0pXZYnrQYNNL0gGMJipYsx0CYvOSB\"\n",
"access_token = \"fill in you Zenodo access token here\"\n",
Member

Suggested change
"access_token = \"fill in you Zenodo access token here\"\n",
"access_token = \"fill in your Zenodo access token here\"\n",

pyproject.toml Outdated
@@ -18,7 +18,10 @@ readme = {file = "README.md", content-type = "text/markdown"}
license = {text = "MIT"}
requires-python = ">=3.10"
dependencies = [
"IPython",
Member

This should be an optional dependency. Check dynamically for the existence of IPython at runtime.

environment.yml Outdated
@@ -4,7 +4,10 @@ channels:
dependencies:
# Required
- python>=3.10
- IPython
Member

Note, don't include this in the conda recipe. It should be an optional dependency.

from xcube.core.store import MULTI_LEVEL_DATASET_TYPE
from xcube.core.store import DataTypeLike

from typing import Any, Container, Union
Member

Because we are already at Python 3.10, don't use Union; use the | operator instead.

    ext = data_id.split(".")[-1]
    format_id = MAP_FILE_EXTENSION_FORMAT.get(ext.lower())
    return format_id

def identify_file_format(data_id: str) -> Union[str, None]:
Member

Suggested change
def identify_file_format(data_id: str) -> Union[str, None]:
def identify_file_format(data_id: str) -> str | None:

Or use Optional[str].



_LOG = logging.getLogger("xcube")
LOG = logging.getLogger(__name__)
Member

This will evaluate to logging.getLogger("store"). Too unspecific; use the logger name "xcube-zenodo" instead.
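The fix is a one-liner; an explicit project-specific name also keeps the logger stable no matter how the module ends up being imported:

```python
import logging

# An explicit name avoids an unspecific logger such as "store",
# which __name__ can collapse to depending on how the module is loaded.
LOG = logging.getLogger("xcube-zenodo")
```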



class ZenodoDataStore(DataStore):
"""Implementation of the Zenodo data store defined in the ``xcube_zenodo``
plugin."""

def __init__(self, access_token: str):
def __init__(self, access_token: str, preload_cache_folder: str = None):
Member

Add parameters that configure the cache data store to be used: cache_store_id: str, cache_store_params: dict[str, Any].

self._is_cancelled = False
self._is_closed = False
self._cache_store = cache_store
self._cache_fs: fsspec.AbstractFileSystem = cache_store.fs
Collaborator Author

@forman can give me feedback on this. I removed os completely, but I read out root and fs from the cache data store, which I doubt is correct.

Member

No, that's fine, root and fs belong to the FsDataStore's API.

@konstntokas merged commit aa6ef13 into main on Dec 16, 2024
1 of 3 checks passed
@konstntokas deleted the konstntokas-xxx-add_preload_for_zip branch on December 19, 2024 12:02