Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support for DuckDB Python Function API #1008

Closed
mattharrison opened this issue Apr 9, 2024 · 3 comments
Closed

Support for DuckDB Python Function API #1008

mattharrison opened this issue Apr 9, 2024 · 3 comments

Comments

@mattharrison
Copy link

I'm trying to add a custom Python function to duckdb.

This is a silly example, but it shows an example.

import duckdb
def two(col_name: int) -> int:
    return 2

duckdb.create_function('two', two)

I can use duckdb from the Python API and it works:

>>> duckdb.sql('SELECT two(2);')
┌────────┐
│ two(2) │
│ int64  │
├────────┤
│      2 │
└────────┘

However, the %%sql magic fails:

%%sql
SELECT two(2);

It gives the following error:

RuntimeError: (duckdb.duckdb.CatalogException) Catalog Error: Scalar Function with name two does not exist!
Did you mean "%"?
LINE 1: SELECT two(2);
               ^
[SQL: SELECT two(2);]
(Background on this error at: https://sqlalche.me/e/20/f405)
If you need help solving this issue, send us a message: https://ploomber.io/community

@edublancas
Copy link

edublancas commented Apr 9, 2024

this works using a native connection!

In [1]: import duckdb
   ...:
   ...: %load_ext sql
   ...: conn = duckdb.connect()

In [2]: def two(col_name: int) -> int:
   ...:     return 2
   ...:
   ...: conn.create_function('two', two) # note that I'm calling create_function on the connection object!
Out[2]: <duckdb.duckdb.DuckDBPyConnection at 0x14bd553b0>

In [3]: %sql conn --alias duckdb

In [4]: %%sql
   ...: SELECT two(2);
   ...:
   ...:
Running query in 'duckdb'
Out[4]:
+--------+
| two(2) |
+--------+
|   2    |
+--------+

the main issue is that, by default, our documentation shows connecting to DuckDB via duckdb-engine, which uses sqlalchemy. we did this to keep backward compatibility with ipython-SQL. the problem is that using duckdb-native APIs through sqlalchemy is a bit tricky (we've encountered issues like this before), so we ended up writing logic to support native connections.

native connections work pretty much the same as sqlalchemy ones, but the latter has been battle-tested over the years thanks to ipython-sql, so let me know if you encounter issues with native connections

@mattharrison
Copy link
Author

Is that the same issue as this PIVOT example (taken from https://duckdb.org/docs/sql/statements/pivot.html) not working?

%%sql
CREATE TABLE Cities (Country VARCHAR, Name VARCHAR, Year INTEGER, Population INTEGER);
INSERT INTO Cities VALUES ('NL', 'Amsterdam', 2000, 1005);
INSERT INTO Cities VALUES ('NL', 'Amsterdam', 2010, 1065);
INSERT INTO Cities VALUES ('NL', 'Amsterdam', 2020, 1158);
INSERT INTO Cities VALUES ('US', 'Seattle', 2000, 564);
INSERT INTO Cities VALUES ('US', 'Seattle', 2010, 608);
INSERT INTO Cities VALUES ('US', 'Seattle', 2020, 738);
INSERT INTO Cities VALUES ('US', 'New York City', 2000, 8015);
INSERT INTO Cities VALUES ('US', 'New York City', 2010, 8175);
INSERT INTO Cities VALUES ('US', 'New York City', 2020, 8772);

When running the pivot code it shows an empty table (but if you use the Python interface it works):

%%sql
PIVOT Cities
ON Country, Name
USING sum(Population);

@edublancas
Copy link

I think the pivot problem is the same described here

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants