Authentication within a DAG #42

Open
leanneharris opened this issue Aug 16, 2021 · 2 comments

Comments

@leanneharris
leanneharris commented Aug 16, 2021

Why do I get fewer rows if I run the same code from a DAG?
For the same query, I get 4185 rows if I run it locally from the command line, but only 2155 rows if I run it from a DAG in Cloud Composer.
Is this an authentication issue? My DAG is not throwing any errors.
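
A minimal sketch for narrowing down a row-count gap like this, assuming both runs can be exported as lists of dicts. The field names `date`, `page` and `query` and the helper `compare_rows` are hypothetical, chosen only for illustration. It shows which rows appear in one run but not the other, which may help tell a date-range or filter difference apart from an authentication problem.

```python
from collections import Counter


def row_key(row, fields):
    # Reduce a row to the fields that identify it (hypothetical field names).
    return tuple(row.get(field) for field in fields)


def compare_rows(local_rows, dag_rows, fields=("date", "page", "query")):
    # Count rows by key so that duplicates within a run are not lost.
    local = Counter(row_key(row, fields) for row in local_rows)
    dag = Counter(row_key(row, fields) for row in dag_rows)
    return {
        "only_local": list((local - dag).elements()),
        "only_dag": list((dag - local).elements()),
    }
```

If most of the rows found only in the local run share one date or one dimension value, the two runs are probably not issuing quite the same query.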

@joshcarty
Owner

Hi @leanneharris - that's very strange. Are you able to share your local script and DAG so that we can compare them?

@leanneharris
Author

Hi @joshcarty - Thanks very much for your reply and the code :-) I actually should have taken the question down over the weekend, because I'm not seeing this on any of the other days it has run and have decided to ignore it for now. Perhaps you could help with a different DAG-driven authentication problem though?

I am using serialised credentials. Whether I run from the command line or within the DAG, this works nicely for around a week before I get an error saying that my credentials have expired. I then have to run the authentication again and update credentials.json, after which I can run the script for another week (it's not exactly 7 days, but it's thereabouts each time). At the moment I can't fully automate the process, because once a week I need to authenticate in the browser and replace the credentials.json in the DAG bucket in Cloud Composer. It's not terrible, but it's not ideal either. Can you tell me if there's anything I'm doing wrong, or if you know of a way to stop this happening?

I am collecting around 4k rows per search appearance filter (usually 9) per domain (10) per day, so approx. 360k rows in total per day, if that makes any difference. Rather frustratingly, I also don't have control of my work device, which is force-restarted on a regular basis.
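
One known cause of roughly weekly expiry is a Google OAuth consent screen left in "Testing" publishing status with an external user type, which issues refresh tokens that expire after 7 days. Whether that applies here is not established in this thread. A minimal sketch for recording the evidence at the start of each DAG run follows. It assumes the serialised file is JSON with a `refresh_token` key and that the file's modification time approximates the last browser authorisation. Both are assumptions, and `credentials_report` is a hypothetical helper.

```python
import json
import os
import time


def credentials_report(path, now=None):
    # Summarise the credentials file without exposing any secret values.
    now = time.time() if now is None else now
    with open(path, encoding="utf-8") as fh:
        creds = json.load(fh)
    # Age since the file was last written, in days (assumed proxy for
    # the time since the last browser authorisation).
    age_days = (now - os.path.getmtime(path)) / 86400
    return {
        "has_refresh_token": bool(creds.get("refresh_token")),
        "age_days": round(age_days, 1),
    }
```

Logging this report before each run would show whether failures consistently start once the file is about 7 days old, and whether a refresh token is present at all.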
