Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Multiple backends error when using join #7

Open
safrazRampersaud opened this issue Feb 4, 2021 · 13 comments
Open

Multiple backends error when using join #7

safrazRampersaud opened this issue Feb 4, 2021 · 13 comments

Comments

@safrazRampersaud
Copy link
Contributor

safrazRampersaud commented Feb 4, 2021

Hello,

I am encountering an error that is related to multiple backends when attempting to perform a join. I can't provide the exact code but I can provide the skeleton syntax of my steps:

In [1]: connect = ibis_mssql.connect(url=url)
In [2]: table1= connect.table(name='table1', database='DBX', schema='prd')
In [3]: table2 = connect.table(name='table2', database='DBX', schema='prd')
In [4]: result = table1.join(table2, table1._id == table2._id)
EDITED: In [5]: result = result.materialize()
In [6]: result.execute()
Out[6]:
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-46-82574fdc8e38> in <module>
----> 1 ab.execute()

~\working\ibis-master\ibis\expr\types.py in execute(self, limit, timecontext, params, **kwargs)
    222 
    223         return execute(
--> 224             self, limit=limit, timecontext=timecontext, params=params, **kwargs
    225         )
    226 

~\working\ibis-master\ibis\client.py in execute(expr, limit, params, timecontext, **kwargs)
    379       Scalar expressions: Python scalar value
    380     """
--> 381     (backend,) = validate_backends(list(find_backends(expr)))
    382     return backend.execute(
    383         expr, limit=limit, params=params, timecontext=timecontext, **kwargs

~\working\ibis-master\ibis\client.py in validate_backends(backends)
    351     print(backends)
    352     if len(backends) > 1:
--> 353         raise ValueError('Multiple backends found')
    354     return backends
    355 

ValueError: Multiple backends found

Here is what I have observed:

  1. I printed out the backends from the ibis-framework client (ibis/client.py) in method validate_backends before the stack trace error. I saw [<ibis_mssql.client.MSSQLClient object at 0x000001A3EED58588>, <ibis_mssql.client.MSSQLClient object at 0x000001A3EED58448>]. My first thought was that these backends correspond to the number of tables I am joining. To determine if that was true, I joined 3 tables and the output from the backends had 3 ibis_mssql.client.MSSQLClient objects.

  2. For comparison, I ran the example in https://labs.quansight.org/blog/2020/06/ibis-an-idiomatic-flavor-of-sql-for-python-programmers/ under the Materializing the join section to see if the join would give the same issue about the number of backends. It did not. Moreover, I investigated the results of the materialize() command in:

join = unmaterialized.materialize()

from the website and it was interesting that there were 2 SQLiteTables and a MaterializedJoin[table]. There was no MaterializedJoin[table] from [5] in my steps when I took a closer look.

Note: I've resolved all the duplicate columns from the join and each of the tables referenced above can successfully have execute() invoked on it. So I do not think there are any issues with my setup.

Have any thoughts about what might be causing this? Happy to provide more details about this error. Thanks!

@xmnlab
Copy link
Contributor

xmnlab commented Feb 4, 2021

thanks for reporting that @safrazRampersaud I will take a look into that today.

@safrazRampersaud
Copy link
Contributor Author

Thank you @xmnlab. Happy to help any way I can ...

@safrazRampersaud
Copy link
Contributor Author

safrazRampersaud commented Feb 9, 2021

@xmnlab Debugging a bit more, for N number of tables the ibis-mssql connection is supplying N number of clients (maybe?). Looking again at the SQLiteClient example with two tables, there is only one object ibis.backends.sqlite.client.SQLiteClient which is referenced twice but since the object has been seen before in the search, the find_backends method from the ibis client implementation does not return two of them to the compile method. This differs when I trace a simple example with ibis-mssql and I find that for 2 tables, there are two different ibis_mssql.client.MSSQLClient objects. Is ibis-mssql creating a different client for every table that gets referenced on the same connection? Hopefully this helps ...

@safrazRampersaud safrazRampersaud changed the title Multiple backends error when using join, absence of materialized table Multiple backends error when using join Feb 9, 2021
@xmnlab
Copy link
Contributor

xmnlab commented Feb 9, 2021

@safrazRampersaud I think you should execute the materialized expression (ref http://ibis-project.org/docs/user_guide/sql.html#join-with-select), for example:

result = result.materialize()
result.execute()

Could you check if it fixes the problem?

@safrazRampersaud
Copy link
Contributor Author

Hi @xmnlab Thanks for the response! Unfortunately, I get the same error even with your suggestion.

@xmnlab
Copy link
Contributor

xmnlab commented Feb 10, 2021

If you are trying ibis-mssql from source, could you try some changes in the connect function in api.py?

First, you will need to import some modules, add it in the beginning of the file:

import ibis.config as cf
from ibis.config import options

these are the changes you can try:

diff --git a/ibis_mssql/api.py b/ibis_mssql/api.py
index 9f4fea3..c81d405 100644
--- a/ibis_mssql/api.py
+++ b/ibis_mssql/api.py
@@ -113,7 +113,7 @@ def connect(
         year : int32
         month : int32
     """
-    return MSSQLClient(
+    client = MSSQLClient(
         host=host,
         user=user,
         password=password,
@@ -123,3 +123,12 @@ def connect(
         driver=driver,
         odbc_driver=odbc_driver,
     )
+
+    if options.default_backend is None:
+        options.default_backend = client
+
+    with cf.config_prefix('sql'):
+        k = 'default_limit'
+        cf.set_option(k, None)
+
+    return client

so basically, instead of return directly MSSQLClient, you will store that to a variable called client ... you will add some extra configuration, and it will return the client variable.

this is the same configuration we are using for ibis-omniscidb

let me know if it works for you.

@safrazRampersaud
Copy link
Contributor Author

@xmnlab Awesome!! Thank you!!

I just looked up and saw your message after I found where my issue lies. When calling the connect method in client.py, if the database parameter is not explicitly set in the signature, then it defaults to master. I used a connection string on my end at first, but then I switched to accessing the connection via other parameters in the signature. Concerning the default, its not a problem in my case until Line 121 in ibis-mssql/client.py, where the client_class instance will get its own ibis_mssql.client.MSSQLClient object. Then, for N tables referenced, N backends will be produced since there is no anchor to an explicit database (master by default does not exist in my setup). Does my interpretation sound in the neighborhood of being correct?

Looking at your suggestion, I agree with where you are going setting a default_backend mapped to the already discovered client. I'd like to implement it for coverage and let you know how it holds. Thanks again ... wish I would have came up with my work around sooner so that you didn't have to waste time on it but maybe the default_backend solution will prove useful later.

@xmnlab
Copy link
Contributor

xmnlab commented Feb 10, 2021

@safrazRampersaud I am glad that you found a workaround for this issue. We also need to add more tests for the project, so maybe we can find this kind of issue early in the test stage

PS: as ibis-framework doesn't have a way to test "external" backends, we need to figure out a way to create some tests that could be reusable.

@safrazRampersaud
Copy link
Contributor Author

@xmnlab I added the default_backend as directed. Works well! Time for a PR? Yes, I hear you on the tests. I think this is something I'll have to think about as well. Happy to have this conversation if it will provide more coverage to the project and then some.

@xmnlab
Copy link
Contributor

xmnlab commented Feb 11, 2021

that sounds great! for this PR, just a simple test for the join expression should be enough for now.

I am going to create an issue for tests, so we can discuss some ideas to increase the coverage for tests there.

@safrazRampersaud
Copy link
Contributor Author

Great! ... I'll look at putting something together this weekend. Thanks!

@xmnlab
Copy link
Contributor

xmnlab commented Feb 12, 2021

That sounds great! thank you so much.

@safrazRampersaud
Copy link
Contributor Author

safrazRampersaud commented Feb 15, 2021

Created a PR for the default_backend change and for getting the ball rolling on tests. Could use another set of eyes on it to see if this is the direction the project wants to go. Moreover, I left comments about what I believe was supposed to be accomplished, but again ... could use some more direction here.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants