Make APIDataset.save store the response received #748

npfp · 2024-07-03T10:19:35Z

Description

When running a POST or PUT request with the APIDataset, the response is currently discarded, although it would be useful to store it.

Context

We rely a lot on the APIDataset, not only to fetch data but also to save data to external APIs. Keeping track of the response is therefore really important to us.

Possible Implementation

We built a custom APIDataset that takes a filepath argument. If this argument is not None, a TextDataset(filepath=filepath) is created and called in _execute_save_request:

self.local_dataset.save(response.text)
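
To make the proposal concrete, here is a minimal, self-contained sketch of the pattern described above. It does not use Kedro itself: StubResponse, LocalTextStore, ResponseStoringDataset and the send callable are all hypothetical stand-ins (for requests.Response, TextDataset and the custom APIDataset respectively), so treat it as an illustration of the idea rather than the actual implementation.

```python
from dataclasses import dataclass
from pathlib import Path
from typing import Any, Callable, Optional


@dataclass
class StubResponse:
    """Hypothetical stand-in for requests.Response; only .text is used here."""

    text: str


class LocalTextStore:
    """Hypothetical stand-in for TextDataset(filepath=...)."""

    def __init__(self, filepath: str) -> None:
        self._path = Path(filepath)

    def save(self, data: str) -> None:
        # Create parent folders so the first save does not fail.
        self._path.parent.mkdir(parents=True, exist_ok=True)
        self._path.write_text(data, encoding="utf-8")


class ResponseStoringDataset:
    """Sketch of a dataset that keeps the response of a save request.

    `send` is an assumed callable that performs the POST/PUT and returns
    a response object exposing a `.text` attribute.
    """

    def __init__(
        self,
        send: Callable[[Any], StubResponse],
        filepath: Optional[str] = None,
    ) -> None:
        self._send = send
        # Only create the local store when a filepath is given.
        self.local_dataset = (
            LocalTextStore(filepath) if filepath is not None else None
        )

    def _execute_save_request(self, json_data: Any) -> StubResponse:
        response = self._send(json_data)
        if self.local_dataset is not None:
            # Persist the response body instead of dropping it.
            self.local_dataset.save(response.text)
        return response

    def save(self, data: Any) -> None:
        self._execute_save_request(data)
```

With filepath left as None the behaviour is unchanged: the request is sent and nothing is written locally.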

Possible Alternatives

None found.

I would like to open a PR with this proposed change, but before doing so I wanted to double-check with you that this feature would be of interest to the community.

datajoely · 2024-07-03T10:24:21Z

This is a great point. I think the other, slightly more robust way to do this would be to add a logging.info(response.text) call so that this sort of information can be picked up within an observability stack.
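
A rough sketch of what that logging approach could look like, using only the standard library. The helper name log_response, the max_chars truncation and the message format are assumptions made for illustration; none of them come from Kedro.

```python
import logging

logger = logging.getLogger(__name__)


def log_response(status_code: int, text: str, max_chars: int = 1000) -> str:
    """Log the response body at INFO level and return what was logged.

    Long bodies are truncated (an assumption of this sketch) so that a
    large payload does not flood the log stream.
    """
    if len(text) > max_chars:
        body = text[:max_chars] + "...[truncated]"
    else:
        body = text
    # Lazy %-style arguments: formatting only happens if INFO is enabled.
    logger.info("API save response: status=%s body=%s", status_code, body)
    return body
```

The log records then flow through whatever handlers the project's logging configuration defines, which is how an observability stack would pick them up.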

npfp · 2024-07-03T10:31:31Z

Ah right, I hadn't thought of it that way.

In our case, some of the external endpoints send back an id and some pieces of information that we use as the starting point of another pipeline in a subsequent run, hence the idea of storing the response.

datajoely · 2024-07-03T10:34:30Z

That makes sense, I think the ambition is right, we should store this. I guess this was built under the assumption we only cared about 200 responses, but POST functionality was added by the community later and this is a key point.

npfp · 2024-07-03T10:57:50Z

Great, I will make a PR then.

datajoely · 2024-07-03T11:12:01Z

Before you do any work, I'd like to get some other contributors' opinions! @noklam @merelcht any thoughts here?

merelcht · 2024-07-03T15:25:36Z

It makes total sense to me to save the response. I wouldn't save it as another type of dataset though (e.g. the TextDataset mentioned in the description), but rather save it directly to a file format that makes sense. The main reason is that it feels odd to me to have one type of dataset be the return type of another dataset.

npfp · 2024-07-04T06:52:44Z

@merelcht I see. The use of TextDataset was really meant to reduce the maintenance/test burden by relying on a maintained dataset while keeping the interface simple.

When you say "directly save it to a file format that makes sense", do you mean using the open context manager/write operation directly?
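
For reference, one possible reading of that suggestion is sketched below: the response body is written with a plain open context manager, without going through another dataset. The function name write_response and its parameters are hypothetical, and the choice of plain UTF-8 text as the file format is an assumption.

```python
from pathlib import Path


def write_response(text: str, filepath: str) -> Path:
    """Write a response body straight to disk, with no intermediate dataset."""
    path = Path(filepath)
    # Make sure the target folder exists before opening the file.
    path.parent.mkdir(parents=True, exist_ok=True)
    with open(path, "w", encoding="utf-8") as f:
        f.write(text)
    return path
```

This keeps the dataset free of a dependency on TextDataset, at the cost of having to maintain and test the write logic in the dataset itself.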

astrojuanlu transferred this issue from kedro-org/kedro Jul 3, 2024
