Commit

refactor(pandas): remove the pandas backend
BREAKING CHANGE: The `pandas` backend is removed. Note that **pandas DataFrames are STILL VALID INPUTS AND OUTPUTS** and will remain so for the foreseeable future. Please use one of the other local backends like DuckDB, Polars, or DataFusion to perform operations directly on pandas DataFrames.
cpcloud committed Sep 13, 2024
1 parent f7e5704 commit f72d6fe
Showing 58 changed files with 263 additions and 7,503 deletions.
5 changes: 1 addition & 4 deletions docs/backends/_utils.py
@@ -17,10 +17,7 @@ def get_renderer(level: int) -> MdRenderer:

 @cache
 def get_backend(backend: str):
-    if backend == "pandas":
-        return get_object(f"ibis.backends.{backend}", "BasePandasBackend")
-    else:
-        return get_object(f"ibis.backends.{backend}", "Backend")
+    return get_object(f"ibis.backends.{backend}", "Backend")


 def get_callable(obj, name):
212 changes: 3 additions & 209 deletions docs/backends/pandas.qmd
@@ -1,213 +1,7 @@
# pandas

[https://pandas.pydata.org/](https://pandas.pydata.org/)

![](https://img.shields.io/badge/memtables-native-green?style=flat-square) ![](https://img.shields.io/badge/inputs-CSV | Parquet-blue?style=flat-square) ![](https://img.shields.io/badge/outputs-CSV | pandas | Parquet | PyArrow-orange?style=flat-square)

::: {.callout-warning}
## The Pandas backend is slated for removal in Ibis 10.0
We recommend using one of our other backends.

Many workloads work well on the DuckDB and Polars backends, for example.
:::


## Install

Install Ibis and dependencies for the pandas backend:

::: {.panel-tabset}

## `pip`

Install with the `pandas` extra:

```{.bash}
pip install 'ibis-framework[pandas]'
```

And connect:

```{.python}
import ibis
con = ibis.pandas.connect() # <1>
```

1. Adjust connection parameters as needed.

## `conda`

Install for pandas:

```{.bash}
conda install -c conda-forge ibis-pandas
```

And connect:

```{.python}
import ibis
con = ibis.pandas.connect() # <1>
```

1. Adjust connection parameters as needed.

## `mamba`

Install for pandas:

```{.bash}
mamba install -c conda-forge ibis-pandas
```

And connect:

```{.python}
import ibis
con = ibis.pandas.connect() # <1>
```

1. Adjust connection parameters as needed.
::: {.callout-note}
## The pandas backend was removed in Ibis version 10.0

See [our blog post](../posts/farewell-pandas/index.qmd) on the topic for more information.
:::



## User Defined functions (UDF)

Ibis supports defining three kinds of user-defined functions for operations on
expressions targeting the pandas backend: **element-wise**, **reduction**, and
**analytic**.

### Elementwise Functions

An **element-wise** function is a function that takes N rows as input and
produces N rows of output. `log`, `exp`, and `floor` are examples of
element-wise functions.

Here's how to define an element-wise function:

```python
import ibis.expr.datatypes as dt
from ibis.backends.pandas.udf import udf

@udf.elementwise(input_type=[dt.int64], output_type=dt.double)
def add_one(x):
    return x + 1.0
```

### Reduction Functions

A **reduction** is a function that takes N rows as input and produces 1 row
as output. `sum`, `mean` and `count` are examples of reductions. In
the context of a `GROUP BY`, reductions produce 1 row of output _per
group_.

Here's how to define a reduction function:

```python
import ibis.expr.datatypes as dt
from ibis.backends.pandas.udf import udf

@udf.reduction(input_type=[dt.double], output_type=dt.double)
def double_mean(series):
    return 2 * series.mean()
```

### Analytic Functions

An **analytic** function is like an **element-wise** function in that it takes
N rows as input and produces N rows of output. The key difference is that
analytic functions can be applied _per group_ using window functions. Z-score
is an example of an analytic function.

Here's how to define an analytic function:

```python
import ibis.expr.datatypes as dt
from ibis.backends.pandas.udf import udf

@udf.analytic(input_type=[dt.double], output_type=dt.double)
def zscore(series):
    return (series - series.mean()) / series.std()
```
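
To make the three shapes concrete without depending on the pandas backend, here is a plain-Python sketch using only the standard library. The helper `apply_per_group` and the sample data are hypothetical and are not part of Ibis; `statistics.stdev` is used because it computes the sample standard deviation, which matches the default of `pandas.Series.std()`.

```python
from collections import defaultdict
from statistics import mean, stdev

values = [1.0, 2.0, 3.0, 10.0, 20.0, 30.0]
groups = ["a", "a", "a", "b", "b", "b"]


def add_one(xs):
    """Element-wise: N rows in, N rows out, each row handled independently."""
    return [x + 1.0 for x in xs]


def double_mean(xs):
    """Reduction: N rows in, a single value out."""
    return 2 * mean(xs)


def zscore(xs):
    """Analytic: N rows in, N rows out, using statistics of the whole input."""
    m, s = mean(xs), stdev(xs)
    return [(x - m) / s for x in xs]


def apply_per_group(func, xs, keys):
    """Hypothetical helper: apply func within each group, keeping row order."""
    positions = defaultdict(list)
    for pos, key in enumerate(keys):
        positions[key].append(pos)
    out = [None] * len(xs)
    for idx in positions.values():
        group_result = func([xs[p] for p in idx])
        for p, value in zip(idx, group_result):
            out[p] = value
    return out


elementwise_out = add_one(values)                      # 6 rows in, 6 rows out
reduction_out = double_mean(values)                    # 6 rows in, 1 value out
grouped_z = apply_per_group(zscore, values, groups)    # 6 rows out, per group
```

The per-group z-scores here differ from calling `zscore(values)` on the whole column, which is the distinction between an analytic function applied over a window and a plain element-wise function.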

### Details of pandas UDFs

- Element-wise functions support applying your UDF to any combination of
  scalar values and columns.
- Reductions support whole-column aggregations, grouped aggregations, and
  application of your function over a window.
- Analytic functions work in both grouped and non-grouped settings.
- The objects you receive as input arguments are either `pandas.Series` or
  Python/NumPy scalars.

::: {.callout-warning}
## Keyword arguments must be given a default

Any keyword arguments must be given a default value or the function **will
not work**.
:::

A common Python convention is to set the default value to `None` and
handle setting it to something not `None` in the body of the function.

Using `add_one` from above as an example, the following call will receive a
`pandas.Series` for the `x` argument:

```python
import ibis
import pandas as pd
df = pd.DataFrame({'a': [1, 2, 3]})
con = ibis.pandas.connect({'df': df})
t = con.table('df')
expr = add_one(t.a)
expr
```

And this will receive the `int` 1:

```python
expr = add_one(1)
expr
```

Since the pandas backend passes around `**kwargs` you can accept `**kwargs`
in your function:

```python
import ibis.expr.datatypes as dt
from ibis.backends.pandas.udf import udf

@udf.elementwise([dt.int64], dt.double)
def add_two(x, **kwargs):  # do stuff with kwargs
    return x + 2.0
```

Or you can leave them out as we did in the example above. You can also
optionally accept specific keyword arguments.

For example:

```python
import ibis.expr.datatypes as dt
from ibis.backends.pandas.udf import udf

@udf.elementwise([dt.int64], dt.double)
def add_two_with_none(x, y=None):
    if y is None:
        y = 2.0
    return x + y
```

```{python}
#| echo: false
BACKEND = "Pandas"
```

{{< include ./_templates/api.qmd >}}
2 changes: 1 addition & 1 deletion docs/backends_sankey.py
@@ -42,7 +42,7 @@ def to_greyish(hex_code, grey_value=128):
         "SQLite",
         "Trino",
     ],
-    list(category_colors.keys())[2]: ["Dask", "pandas", "Polars"],
+    list(category_colors.keys())[2]: ["Polars"],
 }

 nodes, links = [], []
22 changes: 2 additions & 20 deletions ibis/backends/conftest.py
@@ -4,15 +4,13 @@
 import importlib
 import importlib.metadata
 import itertools
-import operator
 from functools import cache
 from pathlib import Path
 from typing import TYPE_CHECKING, Any

 import _pytest
 import pytest
 from packaging.requirements import Requirement
-from packaging.version import parse as vparse

 import ibis
 from ibis import util
@@ -30,22 +28,6 @@
from ibis.backends.tests.base import BackendTest


-def compare_versions(module_name, given_version, op):
-    try:
-        current_version = importlib.metadata.version(module_name)
-        return op(vparse(current_version), vparse(given_version))
-    except importlib.metadata.PackageNotFoundError:
-        return False
-
-
-def is_newer_than(module_name, given_version):
-    return compare_versions(module_name, given_version, operator.gt)
-
-
-def is_older_than(module_name, given_version):
-    return compare_versions(module_name, given_version, operator.lt)
-
-
 TEST_TABLES = {
     "functional_alltypes": ibis.schema(
         {
@@ -486,7 +468,7 @@ def _setup_backend(request, data_dir, tmp_path_factory, worker_id):


 @pytest.fixture(
-    params=_get_backends_to_test(discard=("pandas",)),
+    params=_get_backends_to_test(),
     scope="session",
 )
 def ddl_backend(request, data_dir, tmp_path_factory, worker_id):
@@ -501,7 +483,7 @@ def ddl_con(ddl_backend):


 @pytest.fixture(
-    params=_get_backends_to_test(keep=("pandas", "pyspark")),
+    params=_get_backends_to_test(keep=("pyspark",)),
     scope="session",
 )
 def udf_backend(request, data_dir, tmp_path_factory, worker_id):