Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

High number of database connections #138

Open
amotl opened this issue Apr 1, 2023 · 5 comments
Open

High number of database connections #138

amotl opened this issue Apr 1, 2023 · 5 comments

Comments

@amotl
Copy link
Member

amotl commented Apr 1, 2023

Hi there,

users of this library reported about a high number of database connections to CrateDB. They are using it in a high-volume data ingest environment.

There seems to be a ATTR_PERSISTENT attribute 1, which, under normal circumstances, would keep a connection open. But in the CrateDB PHP PDO adapter, it dumps that option if it finds it configured.

What we want ideally is connection pooling, that is, for connections to be reused to save all the overhead of creating new connections for each and every database conversation.

With kind regards,
Andreas.

/cc @hlcianfagna, @hammerhead

Footnotes

  1. PHP PDO » Connections and Connection management

@amotl
Copy link
Member Author

amotl commented May 4, 2023

Hi again,

while the best solution for limiting database connections is by using PgBouncer or SQL Relay, this would be a concern separately from this driver, as this will mean to use CrateDB's PostgreSQL interface instead. It is a viable option, and we should explore that route.

With respect to the current implementation of the CrateDB PHP PDO driver, which uses CrateDB's HTTP interface through the Guzzle HTTP client library, we should at least provide a way to limit concurrency by following the corresponding Guzzle » Concurrent requests guideline, i.e. using GuzzleHttp\Pool instead of GuzzleHttp\Client.

With kind regards,
Andreas.

@amotl
Copy link
Member Author

amotl commented May 4, 2023

I was wrong on my assumption GuzzleHttp\Pool would be some kind of solution here. Despite it's called Pool, it is apparently about batching multiple requests, where the number of requests is known in advance.

Obviously, this can't be applied in situations when an unknown amount of requests from any clients has to be processed.

@amotl
Copy link
Member Author

amotl commented May 4, 2023

I am now following up on the HTTP connection keep-alive / connection re-use aspects, with respect to the current HTTP client implementation based on Guzzle and the PHP cURL binding, if I am correct.

-- https://stackoverflow.com/questions/972925/persistent-keepalive-http-with-the-php-curl-library
-- https://curl.se/docs/faq.html#Can_I_perform_multiple_requests

I've identified the following settings from 1 might be interesting candidates to explore. However, it's a different thing if they can actually be used through Guzzle.

  • CURLOPT_FORBID_REUSE
    true to force the connection to explicitly close when it has finished processing, and not be pooled for reuse.
  • CURLOPT_FRESH_CONNECT
    true to force the use of a new connection instead of a cached one.
  • CURLOPT_MAXCONNECTS
    The maximum amount of persistent connections that are allowed. When the limit is reached, the oldest one in the cache is closed to prevent increasing the number of open connections.
  • CURLOPT_TCP_FASTOPEN
    true to enable TCP Fast Open.
  • CURLOPT_TCP_KEEPALIVE
    If set to 1, TCP keepalive probes will be sent. The delay and frequency of these probes can be controlled by the CURLOPT_TCP_KEEPIDLE and CURLOPT_TCP_KEEPINTVL options, provided the operating system supports them. If set to 0 (default) keepalive probes are disabled.
  • CURLOPT_TCP_KEEPIDLE
    Sets the delay, in seconds, that the operating system will wait while the connection is idle before sending keepalive probes, if CURLOPT_TCP_KEEPALIVE is enabled. Not all operating systems support this option. The default is 60.
  • CURLOPT_TCP_KEEPINTVL
    Sets the interval, in seconds, that the operating system will wait between sending keepalive probes, if CURLOPT_TCP_KEEPALIVE is enabled. Not all operating systems support this option. The default is 60.

Footnotes

  1. https://www.php.net/manual/en/function.curl-setopt.php

@amotl
Copy link
Member Author

amotl commented May 4, 2023

Quickly searching my vendor/guzzlehttp directory, it looks like none of the different CURLOPT settings listed above is used by Guzzle's CurlFactory at all.

@amotl
Copy link
Member Author

amotl commented May 4, 2023

GH-53 adjusted the default timeout to 0, which apparently means "indefinitely", mostly. After reading the discussion(s) referenced below, I wonder if it was a good decision? It's not completely clear to me yet.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant