Truncate Column Rename Step #93

Open
yannistze opened this issue Oct 18, 2024 · 2 comments

Comments

@yannistze

Hello,

If I understand the order of operations in the SQLPushDownRule.scala file correctly, the first step after adding the shared context to all the Relations is to rename all the columns in each Relation to unique names following the normalizedExprIdMap logic:

// Second, we need to rename the outputs of each SingleStore relation in the tree. This transform is
// done to ensure that we can handle projections which involve ambiguous column name references.
var ptr, nextPtr = normalized.transform({
case SQLGen.Relation(relation) => relation.renameOutput
})

This makes sense as a way to avoid duplicate-name issues later on, for example in joins, since they are wrapped in a selectAll statement rather than a select statement with aliases.
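As a rough illustration of why unique output names matter once a join is wrapped in `SELECT *`, here is a minimal sketch. It is not the connector's code: `Relation`, `renameOutput`, `select`, and the `name#id` alias scheme are all hypothetical stand-ins chosen for the example.

```scala
// Hypothetical illustration only; this is NOT the connector's implementation.
object RenameSketch {
  // A relation is reduced here to a table name and its column names.
  final case class Relation(name: String, columns: Seq[String])

  // Pair every column with a unique alias by appending a running id,
  // e.g. `id` -> `id#1`. The alias scheme is an assumption for this sketch.
  def renameOutput(rel: Relation, firstId: Int): Seq[(String, String)] =
    rel.columns.zipWithIndex.map { case (c, i) => (c, s"${c}#${firstId + i}") }

  // Render the relation as a SELECT that exposes only the unique aliases.
  def select(rel: Relation, firstId: Int): String = {
    val cols = renameOutput(rel, firstId).map { case (c, a) => s"`${c}` AS `${a}`" }
    s"SELECT ${cols.mkString(", ")} FROM `${rel.name}`"
  }

  def main(args: Array[String]): Unit = {
    val orders    = Relation("orders", Seq("id", "customer_id"))
    val customers = Relation("customers", Seq("id", "name"))
    val left  = select(orders, 1)
    val right = select(customers, 1 + orders.columns.size)
    // Both sides have an `id` column, but after renaming the outer
    // SELECT * sees `id#1` and `id#3`, so no name is duplicated.
    println(s"SELECT * FROM ($left) AS a INNER JOIN ($right) AS b ON a.`customer_id#2` = b.`id#3`")
  }
}
```

Without the renaming step, the outer `SELECT *` would expose two columns both named `id`.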



The issue I am facing with this approach is that:

  • when one or more Relations have many columns (more than 50-80),
  • when the column naming conventions produce long names,
  • and the query is fully pushed down,

the query string becomes too long, which causes the schema fetch and the subsequent PrepareStatement code to fail.
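To make the growth concrete, the following sketch builds a projection in which every column is aliased to a unique name and measures the resulting string. Everything in it is an assumption for illustration: the object and function names, the table name `t`, the padded column names, and the `name#id` alias scheme are hypothetical, and it does not model whatever limit is actually being hit.

```scala
// Hypothetical sketch: how an aliased projection grows with the number
// of columns and the length of their names. Not the connector's code.
object QueryLengthSketch {
  // Generate n made-up column names, each padded to roughly nameLength characters.
  def columns(n: Int, nameLength: Int): Seq[String] =
    (1 to n).map(i => ("x" * nameLength) + "_" + i)

  // Every column appears twice: once as the source name, once inside its alias.
  def projection(cols: Seq[String], firstId: Int): String =
    cols.zipWithIndex
      .map { case (c, i) => s"`${c}` AS `${c}#${firstId + i}`" }
      .mkString(", ")

  def query(n: Int, nameLength: Int): String =
    s"SELECT ${projection(columns(n, nameLength), 1)} FROM `t`"

  def main(args: Array[String]): Unit =
    for (n <- Seq(10, 50, 80))
      println(s"$n columns -> ${query(n, 40).length} characters")
}
```

Because each name is repeated in its alias, one such projection is roughly twice the combined length of the column names, and every additional nested subquery or join side that repeats the projection adds to the total.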

I tried:

  • truncating the query in a few places while keeping the Connector logic the same, which helped but didn't fully solve the issue
  • using qualifiers instead of renaming each column, which also helped but required a more extensive refactoring of the Connector (e.g. joins), making it riskier

I was wondering if there is any guidance for scenarios like the one I am facing?

Thanks

@AdalbertMemSQL
Collaborator

Hello,
Could you clarify what error you are encountering?

@yannistze
Author

> Hello, Could you clarify what error you are encountering?

Hello, for sure, the error I get is the following generic one:

java.sql.SQLTransientConnectionException: Driver has reconnect connection after a communications link failure with address=(host=10.133.121.176)(port=3306)(type=primary)
  at com.singlestore.jdbc.client.impl.MultiPrimaryClient.replayIfPossible(MultiPrimaryClient.java:212)
  at com.singlestore.jdbc.client.impl.MultiPrimaryClient.execute(MultiPrimaryClient.java:345)
  at com.singlestore.jdbc.ClientPreparedStatement.executeInternal(ClientPreparedStatement.java:69)
  at com.singlestore.jdbc.ClientPreparedStatement.executeQuery(ClientPreparedStatement.java:251)
  at org.apache.commons.dbcp2.DelegatingPreparedStatement.executeQuery(DelegatingPreparedStatement.java:122)
  at org.apache.commons.dbcp2.DelegatingPreparedStatement.executeQuery(DelegatingPreparedStatement.java:122)
  at com.singlestore.spark.JdbcHelpers$.loadSchema(JdbcHelpers.scala:137)
  at com.singlestore.spark.SinglestoreReader.schema$lzycompute(SinglestoreReader.scala:84)
  at com.singlestore.spark.SinglestoreReader.schema(SinglestoreReader.scala:84)
...

which, if I am not mistaken, comes from this code path in the JDBC driver:

        // no transaction, but connection is now up again.
        // changing exception to SQLTransientConnectionException
        throw new SQLTransientConnectionException(
            String.format(
                "Driver has reconnect connection after a communications link failure with %s",
                oldClient.getHostAddress()),
            "25S03");

and "masks" the root cause 😞
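The masking can be seen with the standard `java.sql` constructors alone. The sketch below builds an exception with the same two-argument `(reason, SQLState)` form as the driver excerpt above, and compares it with the three-argument form that also takes a cause. The `original` exception, the object name, and the message text are made up for the example; nothing here is taken from the driver beyond the constructor shape and the `25S03` state.

```scala
import java.sql.{SQLException, SQLTransientConnectionException}

// Sketch of why the original failure is not recoverable from the thrown
// exception: the (reason, SQLState) constructor records no cause.
object MaskedCauseSketch {
  // Hypothetical stand-in for whatever actually broke the connection.
  val original = new SQLException("hypothetical underlying failure")

  // Same constructor shape as the quoted driver code: no cause is attached.
  val masked = new SQLTransientConnectionException(
    "Driver has reconnect connection after a communications link failure",
    "25S03")

  // The standard overload that keeps the underlying exception reachable.
  val chained = new SQLTransientConnectionException(
    "Driver has reconnect connection after a communications link failure",
    "25S03",
    original)

  def main(args: Array[String]): Unit = {
    println(s"masked cause:  ${masked.getCause}")
    println(s"chained cause: ${chained.getCause}")
  }
}
```

With the two-argument form, `getCause` returns `null`, so inspecting the caught exception on the Spark side cannot recover the underlying error.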
