Make APIDataset.save store the response received #748

Open
npfp opened this issue Jul 3, 2024 · 7 comments

@npfp

npfp commented Jul 3, 2024

Description

When running a POST or PUT request with the APIDataset, the response is currently discarded, although it would be useful to store it.

Context

We rely heavily on the APIDataset, not only to fetch data but also to save data to external APIs. Keeping track of the response is therefore really important to us.

Possible Implementation

We built a custom APIDataset that takes a filepath argument. If this argument is not None, a TextDataset(filepath=filepath) is created, and its save method is called in _execute_save_request:

self.local_dataset.save(response.text)
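
The approach described above might look roughly like the sketch below. It is a standalone illustration under stated assumptions, not the actual custom dataset: `ResponseStoringAPIDataset`, `TextFileStore`, `FakeResponse` and the `send` callable are hypothetical stand-ins (for the APIDataset subclass, TextDataset, `requests.Response` and the HTTP call respectively), chosen so the example runs without kedro or network access.

```python
import tempfile
from pathlib import Path
from typing import Callable, Optional


class FakeResponse:
    """Hypothetical stand-in for requests.Response; only .text is modelled."""

    def __init__(self, text: str) -> None:
        self.text = text


class TextFileStore:
    """Hypothetical stand-in for TextDataset(filepath=...)."""

    def __init__(self, filepath: str) -> None:
        self._path = Path(filepath)

    def save(self, data: str) -> None:
        # Create parent folders if needed, then write the text as UTF-8.
        self._path.parent.mkdir(parents=True, exist_ok=True)
        self._path.write_text(data, encoding="utf-8")


class ResponseStoringAPIDataset:
    """Sketch of a dataset whose save step optionally persists the response."""

    def __init__(
        self,
        send: Callable[[object], FakeResponse],
        filepath: Optional[str] = None,
    ) -> None:
        # Assumption: `send` performs the POST/PUT and returns the response.
        self._send = send
        # Only create the local store when a filepath was given.
        self.local_dataset = TextFileStore(filepath) if filepath is not None else None

    def _execute_save_request(self, data: object) -> FakeResponse:
        response = self._send(data)
        if self.local_dataset is not None:
            self.local_dataset.save(response.text)
        return response


# Usage: fake a POST that returns an id, then read back the stored response.
with tempfile.TemporaryDirectory() as tmp:
    target = Path(tmp) / "responses" / "last_response.txt"
    dataset = ResponseStoringAPIDataset(
        send=lambda data: FakeResponse('{"id": 42}'),
        filepath=str(target),
    )
    dataset._execute_save_request({"name": "example"})
    stored_text = target.read_text(encoding="utf-8")
```

With `filepath=None` the local store is never created, so existing behaviour would be unchanged for users who do not opt in.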

Possible Alternatives

I have not found any other.

I would like to open a PR with this proposed change, but before doing so I wanted to double-check with you that this feature would be of interest to the community.

@datajoely
Contributor

This is a great point. I think the other, slightly more robust way to do this would be to add a logging.info(response.text) call so that this sort of thing can be picked up within an observability stack.
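
A minimal sketch of that logging idea, using only the standard library. The logger name, `FakeResponse`, `log_save_response` and `ListHandler` are hypothetical and exist only for illustration; a real observability stack would attach its own handler instead of the in-memory one used here.

```python
import logging

logger = logging.getLogger("api_dataset_sketch")  # hypothetical logger name


class FakeResponse:
    """Hypothetical stand-in for requests.Response."""

    def __init__(self, status_code: int, text: str) -> None:
        self.status_code = status_code
        self.text = text


def log_save_response(response: FakeResponse) -> None:
    """Emit the response body at INFO level so log handlers can collect it."""
    logger.info("Save request returned %s: %s", response.status_code, response.text)


class ListHandler(logging.Handler):
    """Collects messages in memory, standing in for a real log sink."""

    def __init__(self) -> None:
        super().__init__()
        self.messages = []

    def emit(self, record: logging.LogRecord) -> None:
        self.messages.append(record.getMessage())


handler = ListHandler()
logger.addHandler(handler)
logger.setLevel(logging.INFO)

log_save_response(FakeResponse(201, '{"id": 42}'))
```

Logging makes the response visible to monitoring, but it does not by itself make the response available as an input to a later pipeline run.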

@npfp
Author

npfp commented Jul 3, 2024

Ah right, I hadn't thought of it that way.

In our case, some of the external endpoints send back an id and other pieces of information that we use as the starting point of another pipeline in a subsequent run, hence the idea of storing the response.

@datajoely
Contributor

That makes sense. I think the ambition is right: we should store this. I guess this was built under the assumption that we only cared about 200 responses, but POST functionality was added by the community later, and this is a key point.

@npfp
Author

npfp commented Jul 3, 2024

Great, I will make a PR then.

@datajoely
Contributor

Before you do any work, I'd like to get some other contributors' opinions! @noklam @merelcht any thoughts here?

astrojuanlu transferred this issue from kedro-org/kedro Jul 3, 2024
@merelcht
Member

merelcht commented Jul 3, 2024

It makes total sense to me to save the response. I wouldn't save it as another type of dataset though (e.g. the TextDataset mentioned in the description), but rather save it directly to a file format that makes sense. The main reason is that it feels odd to me to have one type of dataset be the return type of another dataset.

@npfp
Author

npfp commented Jul 4, 2024

@merelcht I see. The point of using TextDataset was really to reduce the maintenance and test burden by relying on an already maintained dataset, while keeping the interface simple.

When you say

directly save it to a file format that makes sense

do you mean using the open context manager and a write operation directly?
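
One possible reading of that suggestion, sketched with a plain `open` context manager. `write_response_text` is a hypothetical helper, not something taken from kedro or confirmed in the comments above, and the choice of a UTF-8 text file is an assumption.

```python
import tempfile
from pathlib import Path


def write_response_text(response_text: str, filepath: str) -> None:
    """Write the response body straight to a file, without a nested dataset."""
    path = Path(filepath)
    path.parent.mkdir(parents=True, exist_ok=True)
    with open(path, "w", encoding="utf-8") as handle:
        handle.write(response_text)


# Usage: write a fake response body to a temporary file and read it back.
with tempfile.TemporaryDirectory() as tmp:
    out = Path(tmp) / "response.json"
    write_response_text('{"id": 42}', str(out))
    written = out.read_text(encoding="utf-8")
```

The trade-off raised in the thread still applies: writing the file directly avoids nesting one dataset inside another, but it means re-implementing and testing file handling that an existing dataset already covers.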
