Very slow performance on any cell that references the %sql magic #55

Closed
sillyotter opened this issue Mar 29, 2023 · 5 comments

@sillyotter

Several days ago, I opened an issue on ploomber-core about its lack of a license, which was preventing me from installing jupysql in a corporate environment. That environment scans every imported package for a variety of legal and security issues, and the lack of an acceptable license was blocking the import.

The addition of a license fixed the problem, but I then found a different problem.

While I can install and run jupyter, every cell I run with a %sql or %%sql magic in it takes 11+ seconds to run. The same operations run on a machine with more open internet access complete instantly. After looking at the docs for jupysql and ploomber-core, I found that it tries to phone home details about what it's doing. I edited the appropriate .ploomber/stats/config.yaml file to turn off stats collection and version checks, and that improved things, but the %sql cells still take 4+ seconds to complete trivial operations.

Are there any other points in jupysql or its dependencies that are trying to phone home and blocking while doing so? Or doing something else that would be prohibited in a tightly run corporate environment? If so, can anyone comment on what they are, and if they can be turned off?

I can't be sure it's related, but when running on my home computer, packet captures show that every cell run seems to generate HTTPS calls to some Google-hosted service. I haven't tried to decrypt the traffic yet to see what it is. I wouldn't be surprised if such calls were blocked in the corporate environment, and if the code in question blocks while waiting for them to return, that could well be the source of the problem.

Any ideas?

@edublancas
Contributor

(moving this issue to ploomber-core)

thanks for your feedback. I think I found the problem (in ploomber-core): while checking if user stats are enabled, we also check if there's an internet connection by hitting google.com. This is inefficient, since there isn't any point in checking the internet connection if the user has disabled user stats.

I can work on a quick fix for this.
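
To illustrate the ordering problem described above, here is a minimal sketch, not ploomber-core's actual code. The names `is_online` and `should_send_stats` are hypothetical, and the plain TCP probe is an assumption about how the connectivity check could be done. The point is only that the network probe should run after the cheap "are stats enabled?" check, not before it.

```python
import socket


def is_online(host="google.com", port=443, timeout=1.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, DNS failures, and timeouts.
        return False


def should_send_stats(stats_enabled, online_check=is_online):
    """Decide whether to send stats, probing the network only when needed."""
    # `and` short-circuits: when stats are disabled, online_check is never
    # called, so a blocked or slow network cannot delay the cell.
    return stats_enabled and online_check()
```

With this ordering, `should_send_stats(False)` returns immediately without touching the network, which is the behavior a locked-down environment needs.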

@edublancas edublancas transferred this issue from ploomber/jupysql Mar 29, 2023
@sillyotter
Author

sillyotter commented Mar 29, 2023

I think you may be on to something, and not doing the google lookup seems like it would fix the problem.

That being said, I know I can reach google from this corp site via browsers, and from code, but the traffic goes through an HTTP proxy server. There are environment variables that many tools I have used recognize and use to redirect traffic. I have not investigated too deeply, but it appears that httplib has to be told manually, via a set_tunnel call, which proxy to use. Not making the call will certainly fix the issue, but if there were a place in there where you needed to talk to external resources, making that httplib.HTTPSConnection proxy aware might help. Or using a library that is already http_proxy aware may help.
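
As a rough sketch of what "proxy aware" could look like with the standard library (where Python 2's httplib is now `http.client`): read the proxy from the environment, connect to the proxy, and use `set_tunnel` to reach the real host. The function name `open_https_connection` is hypothetical, the fallback proxy port of 8080 is an assumption, and this deliberately ignores `NO_PROXY` and proxy authentication.

```python
import http.client
import os
from urllib.parse import urlsplit


def open_https_connection(host, port=443, timeout=3.0):
    """Return an HTTPSConnection to host, tunnelled through a proxy if set.

    No network traffic happens here; the connection is opened lazily on
    the first request.
    """
    proxy_url = os.environ.get("https_proxy") or os.environ.get("HTTPS_PROXY")
    if not proxy_url:
        return http.client.HTTPSConnection(host, port, timeout=timeout)

    # Accept values without a scheme, such as "proxy.example:3128".
    if "://" not in proxy_url:
        proxy_url = "http://" + proxy_url
    proxy = urlsplit(proxy_url)

    # Connect to the proxy, then have it CONNECT to the real target.
    # The 8080 fallback port is an assumption for this sketch.
    conn = http.client.HTTPSConnection(
        proxy.hostname, proxy.port or 8080, timeout=timeout
    )
    conn.set_tunnel(host, port)
    return conn
```

A caller would then use it like any other connection, for example `open_https_connection("google.com").request("HEAD", "/")`.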

Again, thanks for the help.

@edublancas
Contributor

ok, I released ploomber-core 0.2.9 with a fix. Can you check again?

pip install ploomber-core --upgrade

we can leave this issue open so we tackle the proxy later.

@sillyotter
Author

That is much faster. The performance is much closer to what I see in a non-proxied environment.

Thanks again for the help.

In re: the proxy, I know that curl, and following suit, requests and httpx pay attention to the HTTP(S)_PROXY environment variables. Some notes on httpx's support for it can be found here.

I have no insight on if you need a fix, nor what it should be, but I thought I'd pass along that page and its details on some of the env vars that other http request modules pay attention to.
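
For what it's worth, the standard library can also report what those variables resolve to. A small sketch using `urllib.request.getproxies()`, which reads the `<scheme>_proxy` environment variables (and may fall back to system settings on some platforms when none are set); the helper name `proxy_for` is just for illustration:

```python
import urllib.request


def proxy_for(scheme):
    """Return the proxy URL configured for a scheme such as 'https', or None."""
    # getproxies() returns a dict mapping scheme names to proxy URLs.
    return urllib.request.getproxies().get(scheme)
```

Running `proxy_for("https")` in the corporate environment would be a quick way to confirm which proxy a proxy-aware client should be picking up.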

Feel free to close this and pursue (or not) the proxy ideas in a different issue.

@edublancas
Contributor

opened a new issue to keep track of the proxy thing: #57

thanks for your feedback. if you encounter any other problems, let us know!

Also, our Slack community is a good place to get quick help so feel free to join as well: https://ploomber.io/community
