Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: Add pandas utility for dataframe #84

Open
wants to merge 12 commits into
base: master
Choose a base branch
from

Conversation

ancy-augustin
Copy link

@ancy-augustin ancy-augustin commented Nov 11, 2024

What does this Pull Request accomplish?

Adds utility functions to interact with SystemLink DataFrame API using pandas data frames.

Why should this Pull Request be merged?

Simplifies working with pandas data frames and SystemLink DataFrame by providing convenient methods for creating, appending, and querying tables directly from pandas.

What testing has been done?

TODO: Detail what testing has been done to ensure this submission meets requirements.

@santhoshramaraj santhoshramaraj linked an issue Nov 11, 2024 that may be closed by this pull request
@santhoshramaraj santhoshramaraj added the enhancement New feature or request label Nov 11, 2024
index_name: str = None
if index:
index_name = _get_table_index_name(client=client, table_id=table_id)
if query.columns:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we combine these two if statements into a single if.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We cannot combine the if statements because the index_name is required regardless of whether query.columns is None, and checking index_name in query.columns should only occur when both are valid.

@santhoshramaraj
Copy link
Member

It would be nice to have a function create_table_with_data_from_pandas_df to cover the common case where the whole data for the table is available and the user would like to upload it (create, upload, close inclusive) without much effort.

There is of course challenge in batching the rows upload if the data is large.

docs/getting_started.rst Outdated Show resolved Hide resolved
docs/getting_started.rst Outdated Show resolved Hide resolved
docs/getting_started.rst Outdated Show resolved Hide resolved
tests/integration/dataframe/test_pandas_utility.py Outdated Show resolved Hide resolved
tests/integration/dataframe/test_pandas_utility.py Outdated Show resolved Hide resolved
tests/integration/dataframe/test_pandas_utility.py Outdated Show resolved Hide resolved
tests/integration/dataframe/test_pandas_utility.py Outdated Show resolved Hide resolved
tests/integration/dataframe/test_pandas_utility.py Outdated Show resolved Hide resolved
@@ -52,6 +53,31 @@ class DataFrame(JsonModel):
columns: Optional[List[str]] = None
"""The names and order of the columns included in the data frame."""

data: List[List[Optional[str]]]
data: Optional[List[List[Optional[str]]]] = None
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could making this field optional lead to an error? This will provide a way to call the API without providing data.

client (DataFrameClient): Instance of DataFrameClient.
df (pd.DataFrame): Pandas dataframe.
table_name (str): Name of the table.
nullable_columns (bool): Make the columns nullable. Nullable columns can contain `null` values.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we have a default value to nullable_columns field?

@santhoshramaraj What do you think?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We need to understand the impact of nullable_columns on performance to choose the default value or require the user to make a conscious decision.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Get DataFrame data as Pandas DataFrame
3 participants