Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

MySQL TCP port argument is ignored #695

Open
brian-pond opened this issue Oct 8, 2024 · 3 comments
Open

MySQL TCP port argument is ignored #695

brian-pond opened this issue Oct 8, 2024 · 3 comments
Labels
bug Something isn't working

Comments

@brian-pond
Copy link

brian-pond commented Oct 8, 2024

What language are you using?

Python 3.11.1

What version are you using?

connectorx 0.3.3

What database are you using?

MariaDB 10.11.6

What dataframe are you using?

Arrow

Can you describe your bug?

Environment Configuration

My host environment has 2 different MariaDB server instances. Each is bound to a different socket:

  • Instance A: TCP port 3306
    • Does not have a SQL table named "foo". 🚫
  • Instance B: TCP port 9999
    • Does have a SQL table named "foo". ✔️

I have triple-checked and tested all credentials and tables. I did this using the standard mysql CLI tool that ships with the server software.

Code

My goal is to query Instance B and create a dataframe from its table "foo"

I have the following Python code:

sql_query = "SELECT name FROM `foo`"
db_uri = "mysql://user:[email protected]:9999/mydatabase"
arrow_table = connectorx.read_sql(db_uri, sql_query, return_type="arrow")

Scenario 1: Both servers online and active

Unexpectedly, my Python code fails with an error: 😕

RuntimeError: MySqlError { ERROR 1146 (42S02): Table 'mydatabase.foo' doesn't exist }

This error should not happen. My database on TCP port 9999 definitely exists, and definitely has a table "foo".
After much testing and troubleshooting, I thought to try another scenario.

Scenario 2: What if I shutdown -only- Instance B?

I completely shut down my Instance B. Nothing is bound to TCP 9999 anymore. I verify that with mysql CLI tool.

I try my code again. The result is a different (but expected) error:
[2024-10-08T20:17:22Z ERROR r2d2] DriverError { Could not connect to address `127.0.0.1:9999': Connection refused (os error 111) }

This error ^ is a Good Thing. It's what I expected. My database instance is offline, so connectorx should definitely fail.

Scenario 3: What if I shutdown -only- Instance A?

I turn Instance B back on.
And on a hunch, I shutdown my Instance A. So now nothing is bound to TCP 3306 anymore.
This shouldn't make a difference. Because I'm not trying to connect to 3306. My URI clearly states 9999.

db_uri = "mysql://user:[email protected]:9999/mydatabase"

I try my code again. And it works perfectly. 👀 🤯

Conclusion

Based on the above, I have to conclude this is what's happening:

  1. When connectorx is initialized, the software initially tries to establish a connection to the Port specified in your URI connection string. I demonstrated this in Scenario 2.
  2. But then
    • If TCP 3306 is online and available? Then connectorx tries to run your SQL query there. No matter what TCP port you actually specified.
    • If TCP 3306 is not online and available, then connectorx tries to use the port you told it to.
@brian-pond brian-pond added the bug Something isn't working label Oct 8, 2024
@wangxiaoying
Copy link
Contributor

wangxiaoying commented Oct 10, 2024

Hi @brian-pond ,

Thanks for opening the issue and the analyze it. However, I cannot reproduce your bug locally. Here is my test of having two mysql instances on both 3306 and 9999 port concurrently:

image

You can see the two tables are different (one with an additional row), indicating that the results are fetching from different mysql instances.

Can you double check your environment? Or is there anything that I missed?

P.S. We do not parse the url by ourselves in connectorx. Instead, we call from_url provided by the mysql rust client. Another quick check is to try running the rust client directly in your environment and see whether it is still the case. Here is an example of the rust code, you may also need to setup rust beforehand.

@brian-pond
Copy link
Author

brian-pond commented Oct 10, 2024

Weird! The results were so surprising that even after double-checking? I brought my spouse into my office, explained the whole thing, and walked them through each test and scenario, one at a time. Just to make sure I wasn't doing something stupid. I've been able to make it happen consistently. 😕

Tonight I'll follow your suggestion by installing the mysql rust client, and see what happens.

Here are more details about my environment, though I'm doubtful this makes a big difference.

  • Host A
    • Debian 12 (bookworm)
    • Python 3.11.2
    • All code installed in a Python virtual environment.
    • MariaDB version 10.11.6, installed locally, bound to TCP 3306.
  • Host B
    • Debian 11 (bullseye)
    • MariaDB version 11.2.2.

Host A is running the Python code. It has an SSH tunnel to Host B, so it "sees" Host B MariaDB over local TCP 9999.

Here are the specific Python packages installed in the virtual environment.

cachetools                    5.3.3
certifi                       2024.2.2
charset-normalizer            3.3.2
click                         8.1.6
connectorx                    0.3.3
contextlib2                   21.6.0
cron-converter                1.2.0
cron-descriptor               1.4.3
dw_etl                        0.1.0       /dw/dw_etl
google-api-core               2.19.0
google-auth                   2.29.0
google-cloud-bigquery         3.13.0
google-cloud-bigquery-storage 2.24.0
google-cloud-core             2.4.1
google-crc32c                 1.5.0
google-resumable-media        2.7.0
googleapis-common-protos      1.63.0
greenlet                      3.0.3
grpcio                        1.63.0
grpcio-status                 1.62.2
idna                          3.7
Jinja2                        3.1.3
local-crontab                 0.3.0
mariadb                       1.1.7
MarkupSafe                    2.1.5
numpy                         1.26.4
ordered-set                   4.1.0
packaging                     24.0
pandas                        2.0.3
phpserialize                  1.3
pip                           23.0.1
polars                        0.20.3
proto-plus                    1.23.0
protobuf                      4.25.3
psycopg                       3.1.19
psycopg-binary                3.1.19
psycopg-pool                  3.2.2
psycopg2                      2.9.9
pyarrow                       16.1.0
pyasn1                        0.6.0
pyasn1_modules                0.4.0
python-dateutil               2.9.0.post0
pytz                          2024.1
pytz-deprecation-shim         0.1.0.post0
requests                      2.31.0
rsa                           4.9
schema                        0.7.5
semantic-version              2.10.0
setuptools                    66.1.1
six                           1.16.0
SQLAlchemy                    1.4.50
sqlalchemy-bigquery           1.9.0
temporal-lib                  0.0.7
toml                          0.10.2
typing_extensions             4.11.0
tzdata                        2024.1
tzlocal                       5.2
urllib3                       2.2.1
wheel                         0.43.0

@brian-pond
Copy link
Author

I even download the connectorx repo, and performed a global search to see if TCP 3306 was hard-coded anywhere, perhaps as a fallback value.

(it is not. only mentioned in markdown once, and a few times in GitHub Actions workflows)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

No branches or pull requests

2 participants