Merge branch 'main' into use-specific-secondary-roles
vkcelik authored Dec 26, 2024
2 parents d5f9792 + 69c41a2 commit 52d8864
Showing 78 changed files with 6,328 additions and 3,737 deletions.
45 changes: 45 additions & 0 deletions CHANGELOG.md

## 1.27.0 (TBD)

### Snowpark Python API Updates

#### New Features

- Added support for the following functions in `functions.py`:
- `array_reverse`
- `divnull`
- `map_cat`
- `map_contains_key`
- `map_keys`
- `nullifzero`
- `snowflake_cortex_sentiment`
- Added the `Catalog` class to manage Snowflake objects. It can be accessed via `Session.catalog`.
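The SQL semantics behind a few of these helpers can be sketched in plain Python (an illustration only; the actual functions in `functions.py` build Snowpark column expressions that are evaluated in Snowflake):

```python
# Plain-Python sketches of the SQL semantics behind some of the new helpers.

def nullifzero(value):
    """NULLIFZERO: return None (NULL) when the value is 0, else the value."""
    return None if value == 0 else value

def map_cat(map1, map2):
    """MAP_CAT: concatenate two maps into one."""
    return {**map1, **map2}

def map_contains_key(key, mapping):
    """MAP_CONTAINS_KEY: whether the map contains the given key."""
    return key in mapping

def map_keys(mapping):
    """MAP_KEYS: the keys of the map."""
    return list(mapping)

print(nullifzero(0))                      # None
print(map_cat({"a": 1}, {"b": 2}))        # {'a': 1, 'b': 2}
print(map_contains_key("a", {"a": 1}))    # True
print(map_keys({"a": 1, "b": 2}))         # ['a', 'b']
```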

#### Improvements

- Updated README.md to include instructions on how to verify package signatures using `cosign`.

### Snowpark pandas API Updates

#### New Features

- Added support for `Series.str.ljust` and `Series.str.rjust`.
- Added support for `Series.str.center`.
- Added support for `Series.str.pad`.
- Added support for applying Snowpark Python function `snowflake_cortex_sentiment`.
- Added support for `DataFrame.map`.
- Added support for `DataFrame.from_dict` and `DataFrame.from_records`.
- Added support for mixed case field names in struct type columns.
- Added support for `SeriesGroupBy.unique`.
- Added support for `Series.dt.strftime` with the following directives:
- %d: Day of the month as a zero-padded decimal number.
- %m: Month as a zero-padded decimal number.
- %Y: Year with century as a decimal number.
- %H: Hour (24-hour clock) as a zero-padded decimal number.
- %M: Minute as a zero-padded decimal number.
- %S: Second as a zero-padded decimal number.
- %f: Microsecond as a decimal number, zero-padded to 6 digits.
- %j: Day of the year as a zero-padded decimal number.
- %X: Locale’s appropriate time representation.
- %%: A literal '%' character.
- Added support for `Series.between`.
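The directives listed above follow the standard Python `strftime` codes; their behavior can be previewed with a plain `datetime` object:

```python
from datetime import datetime

ts = datetime(2024, 12, 26, 9, 5, 3, 42)

# Directives now supported by Series.dt.strftime, shown with plain datetime:
print(ts.strftime("%Y-%m-%d"))   # 2024-12-26
print(ts.strftime("%H:%M:%S"))   # 09:05:03
print(ts.strftime("%f"))         # 000042 (microseconds, zero-padded to 6)
print(ts.strftime("%j"))         # 361 (day of the year)
print(ts.strftime("100%%"))      # 100% (literal percent sign)
```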

#### Bug Fixes

- Fixed a bug where system functions called through `session.call` had incorrect type conversion.

#### Improvements

- Improved performance of `DataFrame.map`, `Series.apply`, and `Series.map` by mapping NumPy functions to Snowpark functions where possible.
- Updated integration testing for `session.lineage.trace` to exclude deleted objects.
- Added documentation for `DataFrame.map`.
- Improved performance of `DataFrame.apply` by mapping NumPy functions to Snowpark functions where possible.
- Added documentation on the extent of Snowpark pandas interoperability with scikit-learn.

## 1.26.0 (2024-12-05)

22 changes: 22 additions & 0 deletions README.md

Note that the above Snowpark pandas commands will work if Snowpark is installed with the `[modin]` option; the additional `[pandas]` installation is not required.

## Verifying Package Signatures

To ensure the authenticity and integrity of the Python package, follow the steps below to verify the package signature using `cosign`.

**Steps to verify the signature:**
- Install `cosign`:
  - This example uses the Go installation: [installing-cosign-with-go](https://edu.chainguard.dev/open-source/sigstore/cosign/how-to-install-cosign/#installing-cosign-with-go)
- Download the package file from a repository such as PyPI:
  - https://pypi.org/project/snowflake-snowpark-python/#files
- Download the signature files from the release tag, replacing the version number with the version you are verifying:
  - https://github.com/snowflakedb/snowpark-python/releases/tag/v1.22.1
- Verify the signature:
````bash
# replace the version number with the version you are verifying
./cosign verify-blob snowflake_snowpark_python-1.22.1-py3-none-any.whl \
  --certificate snowflake_snowpark_python-1.22.1-py3-none-any.whl.crt \
  --certificate-identity https://github.com/snowflakedb/snowpark-python/.github/workflows/python-publish.yml@refs/tags/v1.22.1 \
  --certificate-oidc-issuer https://token.actions.githubusercontent.com \
  --signature snowflake_snowpark_python-1.22.1-py3-none-any.whl.sig
# expected output:
# Verified OK
````

## Contributing
Please refer to [CONTRIBUTING.md][contributing].

1 change: 1 addition & 0 deletions docs/source/modin/index.rst
window
groupby
resampling
interoperability
numpy
performance
153 changes: 153 additions & 0 deletions docs/source/modin/interoperability.rst
===========================================
Interoperability with third party libraries
===========================================

Many third party libraries are interoperable with pandas, for example by accepting pandas dataframe objects as function
inputs. Below is a non-exhaustive list of third party library use cases with pandas, noting whether each method
works in Snowpark pandas as well.

Snowpark pandas supports the `dataframe interchange protocol <https://data-apis.org/dataframe-protocol/latest/>`_, which
lets some libraries interoperate with Snowpark pandas at the same level of support as pandas.
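As a sketch of the protocol with plain pandas (Snowpark pandas frames expose the same ``__dataframe__()`` entry point):

```python
import pandas as pd

df = pd.DataFrame({"x": [1, 2, 3], "y": [4.0, 5.0, 6.0]})

# Any dataframe implementing the interchange protocol exposes __dataframe__();
# consumers convert it back via the standard pandas entry point.
interchange_obj = df.__dataframe__()
round_tripped = pd.api.interchange.from_dataframe(interchange_obj)
print(round_tripped)
```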

plotly.express
==============

The following table is structured as follows: The first column contains the name of a method in the ``plotly.express`` module.
The second column is a flag for whether or not interoperability is guaranteed with Snowpark pandas. For each of these
operations, we validate that passing in Snowpark pandas dataframes or series as the data inputs behaves equivalently
to passing in pandas dataframes or series.

.. note::
``Y`` stands for yes, i.e., interoperability is guaranteed with this method, and ``N`` stands for no.


.. note::
Currently only plotly versions <6.0.0 are supported through the dataframe interchange protocol.

+-------------------------+---------------------------------------------+--------------------------------------------+
| Method name | Interoperable with Snowpark pandas? (Y/N) | Notes for current implementation |
+-------------------------+---------------------------------------------+--------------------------------------------+
| ``scatter`` | Y | |
+-------------------------+---------------------------------------------+--------------------------------------------+
| ``line`` | Y | |
+-------------------------+---------------------------------------------+--------------------------------------------+
| ``area`` | Y | |
+-------------------------+---------------------------------------------+--------------------------------------------+
| ``timeline`` | Y | |
+-------------------------+---------------------------------------------+--------------------------------------------+
| ``violin`` | Y | |
+-------------------------+---------------------------------------------+--------------------------------------------+
| ``bar`` | Y | |
+-------------------------+---------------------------------------------+--------------------------------------------+
| ``histogram`` | Y | |
+-------------------------+---------------------------------------------+--------------------------------------------+
| ``pie`` | Y | |
+-------------------------+---------------------------------------------+--------------------------------------------+
| ``treemap`` | Y | |
+-------------------------+---------------------------------------------+--------------------------------------------+
| ``sunburst`` | Y | |
+-------------------------+---------------------------------------------+--------------------------------------------+
| ``icicle`` | Y | |
+-------------------------+---------------------------------------------+--------------------------------------------+
| ``scatter_matrix`` | Y | |
+-------------------------+---------------------------------------------+--------------------------------------------+
| ``funnel`` | Y | |
+-------------------------+---------------------------------------------+--------------------------------------------+
| ``density_heatmap`` | Y | |
+-------------------------+---------------------------------------------+--------------------------------------------+
| ``box``                 | Y                                           |                                            |
+-------------------------+---------------------------------------------+--------------------------------------------+
| ``imshow`` | Y | |
+-------------------------+---------------------------------------------+--------------------------------------------+


scikit-learn
============

We break down scikit-learn interoperability by categories of scikit-learn
operations.

For each category, we provide a table of interoperability with the following
structure: The first column describes a scikit-learn operation that may include
multiple method calls. The second column is a flag for whether or not
interoperability is guaranteed with Snowpark pandas. For each of these methods,
we validate that passing in Snowpark pandas objects behaves equivalently to
passing in pandas objects.

.. note::
``Y`` stands for yes, i.e., interoperability is guaranteed with this method, and ``N`` stands for no.

.. note::
While some scikit-learn methods accept Snowpark pandas inputs, their
performance with Snowpark pandas inputs is often much worse than their
performance with native pandas inputs. Generally we recommend converting
Snowpark pandas inputs to pandas with ``to_pandas()`` before passing them
to scikit-learn.
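A minimal sketch of the recommended pattern, with a plain pandas frame standing in for the converted Snowpark pandas input (the column names and toy data here are illustrative; in real code ``X`` would come from ``snowpark_df.to_pandas()``):

```python
import pandas as pd
from sklearn.linear_model import LogisticRegression

# Stand-in for the result of converting a Snowpark pandas frame with to_pandas().
X = pd.DataFrame({"f1": [0.0, 1.0, 2.0, 3.0], "f2": [1.0, 0.0, 1.0, 0.0]})
y = [0, 0, 1, 1]

# Fit and predict entirely in local pandas/scikit-learn.
model = LogisticRegression().fit(X, y)
preds = model.predict(X)
print(preds)
```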


Classification
--------------

+--------------------------------------------+---------------------------------------------+---------------------------------+
| Operation | Interoperable with Snowpark pandas? (Y/N) | Notes for current implementation|
+--------------------------------------------+---------------------------------------------+---------------------------------+
| Fitting a ``LinearDiscriminantAnalysis`` | Y | |
| classifier with the ``fit()`` method and | | |
| classifying data with the ``predict()`` | | |
| method. | | |
+--------------------------------------------+---------------------------------------------+---------------------------------+


Regression
----------

+--------------------------------------------+---------------------------------------------+---------------------------------+
| Operation | Interoperable with Snowpark pandas? (Y/N) | Notes for current implementation|
+--------------------------------------------+---------------------------------------------+---------------------------------+
| Fitting a ``LogisticRegression`` model | Y | |
| with the ``fit()`` method and predicting | | |
| results with the ``predict()`` method. | | |
+--------------------------------------------+---------------------------------------------+---------------------------------+

Clustering
----------

+--------------------------------------------+---------------------------------------------+---------------------------------+
| Clustering method | Interoperable with Snowpark pandas? (Y/N) | Notes for current implementation|
+--------------------------------------------+---------------------------------------------+---------------------------------+
| ``KMeans.fit()`` | Y | |
+--------------------------------------------+---------------------------------------------+---------------------------------+
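For example, a sketch of ``KMeans.fit()`` on two well-separated blobs (a NumPy array is used here; a converted Snowpark pandas frame could be passed the same way):

```python
import numpy as np
from sklearn.cluster import KMeans

# Two well-separated blobs of two points each.
X = np.array([[0.0, 0.0], [0.1, 0.0], [10.0, 10.0], [10.1, 10.0]])

km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
print(km.labels_)  # points in the same blob share a cluster label
```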


Dimensionality reduction
------------------------

+--------------------------------------------+---------------------------------------------+---------------------------------+
| Operation | Interoperable with Snowpark pandas? (Y/N) | Notes for current implementation|
+--------------------------------------------+---------------------------------------------+---------------------------------+
| Getting the principal components of a | Y | |
| numerical dataset with ``PCA.fit()``. | | |
+--------------------------------------------+---------------------------------------------+---------------------------------+


Model selection
------------------------

+--------------------------------------------+---------------------------------------------+-----------------------------------------------+
| Operation                                  | Interoperable with Snowpark pandas? (Y/N)   | Notes for current implementation              |
+--------------------------------------------+---------------------------------------------+-----------------------------------------------+
| Choosing parameters for a                  | Y                                           | ``RandomizedSearchCV`` causes Snowpark pandas |
| ``LogisticRegression`` model with          |                                             | to issue many queries. We strongly recommend  |
| ``RandomizedSearchCV.fit()``.              |                                             | converting Snowpark pandas inputs to pandas   |
|                                            |                                             | before using ``RandomizedSearchCV``.          |
+--------------------------------------------+---------------------------------------------+-----------------------------------------------+

Preprocessing
-------------

+--------------------------------------------+---------------------------------------------+-----------------------------------------------+
| Operation | Interoperable with Snowpark pandas? (Y/N) | Notes for current implementation |
+--------------------------------------------+---------------------------------------------+-----------------------------------------------+
| Scaling training data with | Y | |
| ``MaxAbsScaler.fit_transform()``. | | |
+--------------------------------------------+---------------------------------------------+-----------------------------------------------+
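For example, a sketch of the scaling operation above, with a plain pandas frame standing in for a Snowpark pandas one:

```python
import pandas as pd
from sklearn.preprocessing import MaxAbsScaler

X = pd.DataFrame({"a": [1.0, -2.0, 4.0], "b": [10.0, 5.0, -20.0]})

# MaxAbsScaler divides each column by its maximum absolute value,
# so every value lands in [-1, 1].
scaled = MaxAbsScaler().fit_transform(X)
print(scaled)
```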