-
Hi @SwePalm! Would you mind taking a screenshot of how it looks in your IDE?

json_from_dir:
  type: PartitionedDataSet
  path: data/01_raw/hour=01
  dataset: pandas.JSONDataSet
  load_args:
    lines: True
    convert_dates: False
  filename_suffix: ".gz"

Does it look like this? It's a bit hard to see how you've pasted it.
-
Sorry, I am not able to take a screenshot, my path info is a bit revealing... What specifically do you want to see?
-
OK, adding info from a second test, this time with parquet files (instead of JSON). I tried passing columns as load_args, and learnt that the error is thrown when calling the partition_load_func(), even in kedro ipython (sorry for the initial confusion, I am still learning!)
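To illustrate why catalog.load() can succeed while partition_load_func() fails: a partitioned load typically hands back a dictionary of partition ids mapped to load functions, and no file is read until one of those functions is called. Below is a minimal stdlib-only sketch of that pattern, not Kedro's actual implementation. The names load_json_lines and load_partitions are hypothetical, and the gzipped JSON-lines file is created only for the demo.

```python
import gzip
import json
import tempfile
from functools import partial
from pathlib import Path


def load_json_lines(path, lines=False):
    """Read one gzipped JSON file; lines=True means one JSON object per line."""
    with gzip.open(path, "rt", encoding="utf-8") as handle:
        if lines:
            return [json.loads(row) for row in handle if row.strip()]
        return json.load(handle)


def load_partitions(directory, suffix, **load_args):
    """Hypothetical stand-in: return {partition_id: loader}; nothing is read here."""
    return {
        path.name[: -len(suffix)]: partial(load_json_lines, path, **load_args)
        for path in sorted(Path(directory).glob(f"*{suffix}"))
    }


with tempfile.TemporaryDirectory() as tmp:
    # Demo data: two JSON objects, one per line, gzipped.
    with gzip.open(Path(tmp) / "part-0.gz", "wt", encoding="utf-8") as handle:
        handle.write('{"a": 1}\n{"a": 2}\n')

    # Both calls succeed, because building the dictionary reads no files.
    lazy_without_args = load_partitions(tmp, ".gz")
    lazy_with_args = load_partitions(tmp, ".gz", lines=True)

    try:
        lazy_without_args["part-0"]()  # the missing lines=True only fails here
        failed = False
    except json.JSONDecodeError:
        failed = True

    rows = lazy_with_args["part-0"]()
```

In this sketch the load step itself never raises, which would match seeing the error only once the load function of a partition is actually called.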
-
Hi,
I am new to Kedro and still exploring all the options, but I have now found something that I am not able to google :-)
My use case involves data that is partitioned by year/month/day/hour,
where every hour includes multiple files.
I started to explore PartitionedDataSet for this.
My first test was to read a single hour.
The catalog looks like this:
json_from_dir:
  type: PartitionedDataSet
  path: data/01_raw/hour=01
  dataset: pandas.JSONDataSet
  load_args:
    lines: True
    convert_dates: False
  filename_suffix: ".gz"
I have been using the same load_args in my "old" code, and they work there.
I tested the catalog from kedro ipython with
catalog.load("json_from_dir")
This works great.
But when I build and run a simple pipeline with the dataset as input, I get an ERROR, where I can see that the file is loaded with load_args={}
(and the error is thrown because it is json-lines, so lines=True has to be included).
Does anyone have a suggestion on what to do?
A dataset with a single JSON file works as input to a node, so pandas.JSONDataSet picks up the load_args correctly.
So I suspect PartitionedDataSet, but I could of course be wrong :-)
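One possible explanation, offered as an assumption rather than a confirmed diagnosis: as far as the Kedro documentation describes it, the load_args of a PartitionedDataSet are passed to the filesystem listing, while arguments for the per-file dataset are given by writing `dataset` as a dictionary with its own `type` and `load_args`. The sketch below models that rule with plain dictionaries. The helper underlying_load_args is hypothetical and only mimics the forwarding, and it assumes the catalog above has load_args at the same level as `dataset`.

```python
def underlying_load_args(entry):
    """Hypothetical helper: which load_args the per-file dataset would receive."""
    dataset = entry["dataset"]
    # A bare string carries only the type, so there are no extra keys to forward.
    if isinstance(dataset, str):
        dataset = {"type": dataset}
    return dict(dataset.get("load_args", {}))


# Entry shaped like the catalog above: load_args sits next to `dataset`.
flat_entry = {
    "type": "PartitionedDataSet",
    "path": "data/01_raw/hour=01",
    "dataset": "pandas.JSONDataSet",
    "load_args": {"lines": True, "convert_dates": False},
    "filename_suffix": ".gz",
}

# Same entry with load_args moved inside the `dataset` definition.
nested_entry = {
    "type": "PartitionedDataSet",
    "path": "data/01_raw/hour=01",
    "dataset": {
        "type": "pandas.JSONDataSet",
        "load_args": {"lines": True, "convert_dates": False},
    },
    "filename_suffix": ".gz",
}

print(underlying_load_args(flat_entry))    # {}
print(underlying_load_args(nested_entry))  # {'lines': True, 'convert_dates': False}
```

If that reading is right, the empty load_args={} in the error would come from the flat layout, and nesting load_args under `dataset` in the YAML would be the thing to try.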