Add Aggregation_FIRST and _LAST options, and used interface to support strings #218

Open. Wants to merge 5 commits into base: master.

@Amnesiac9 Amnesiac9 commented Aug 7, 2023

_FIRST keeps the first value found in each group for that column. This lets you retain data from columns whose values may be mismatched within a group, when you just want the first value found after grouping by another column.

Trying to provide the equivalent of this from pandas:

grouped_df = orders_df.groupby(['customer_id', 'ShortSKU']).agg({
    'Quantity': 'sum',
    'Name': 'first',
    'Address': 'first',
    'Address2': 'first',
    'City': 'first',
    'State': 'first',
    'Zip': 'first',
    'Country': 'first',
    'Phone': 'first',
    'Email': 'first',
    'Club Enrollment': 'first',
    'Account Type': 'first',
    'Order Count': 'count',
}).reset_index(drop=True)

In Go:

// Group rows by customer and short SKU.
group := *df.GroupBy("CustomerId", "ShortSku")
if group.Err != nil {
	return nil, group.Err
}

// Each aggregation type is paired by position with the column name below.
aggDF := group.Aggregation(
	[]dataframe.AggregationType{
		dataframe.Aggregation_SUM,   // Quantity
		dataframe.Aggregation_FIRST, // CustomerName
		dataframe.Aggregation_FIRST, // Address1
		dataframe.Aggregation_FIRST, // Address2
		dataframe.Aggregation_FIRST, // City
		dataframe.Aggregation_FIRST, // State
		dataframe.Aggregation_FIRST, // Zip
		dataframe.Aggregation_FIRST, // Country
		dataframe.Aggregation_FIRST, // Phone
		dataframe.Aggregation_FIRST, // Email
		dataframe.Aggregation_FIRST, // ClubEnrollment
		dataframe.Aggregation_FIRST, // AccountType
		dataframe.Aggregation_SUM,   // Spend
		dataframe.Aggregation_COUNT, // OrderCount
	},
	[]string{
		"Quantity",
		"CustomerName",
		"Address1",
		"Address2",
		"City",
		"State",
		"Zip",
		"Country",
		"Phone",
		"Email",
		"ClubEnrollment",
		"AccountType",
		"Spend",
		"OrderCount",
	},
)
if aggDF.Err != nil {
	return nil, aggDF.Err
}
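
To make the intended semantics concrete, here is a minimal, library-free sketch of "sum one column, keep the first value of another" per group. It does not use the dataframe package at all; the `row` and `customerAgg` types and the `aggregate` function are hypothetical names made up for this illustration, and the choice to emit groups in order of first appearance is an assumption of the sketch, not a statement about how the library orders groups.

```go
package main

import "fmt"

// row is a hypothetical input record, used only for this illustration.
type row struct {
	CustomerID string
	Name       string
	Quantity   float64
}

// customerAgg is the hypothetical aggregated result for one customer.
type customerAgg struct {
	CustomerID string
	FirstName  string  // "first": value from the first row seen in the group
	Quantity   float64 // "sum": total over all rows in the group
}

// aggregate groups rows by CustomerID, keeping the first Name seen and
// summing Quantity. Groups are returned in order of first appearance.
func aggregate(rows []row) []customerAgg {
	index := make(map[string]int) // CustomerID -> position in out
	var out []customerAgg
	for _, r := range rows {
		i, seen := index[r.CustomerID]
		if !seen {
			i = len(out)
			index[r.CustomerID] = i
			out = append(out, customerAgg{CustomerID: r.CustomerID, FirstName: r.Name})
		}
		out[i].Quantity += r.Quantity
	}
	return out
}

func main() {
	rows := []row{
		{"c1", "Alice", 2},
		{"c2", "Bob", 1},
		{"c1", "Alice B.", 3}, // mismatched Name within the same group
	}
	for _, g := range aggregate(rows) {
		fmt.Printf("%s %s %g\n", g.CustomerID, g.FirstName, g.Quantity)
	}
	// Prints:
	// c1 Alice 5
	// c2 Bob 1
}
```

The mismatched name in the third row is dropped because only the first value per group is kept, which is the behaviour described for _FIRST above.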

Tests still need to be updated, but someone more familiar with the code base might see issues with this change. Let me know.

Also, I could not find where the name of the aggregation type is stored. The new type does not get a short suffix such as "_COUNT" appended to the original column name the way the existing types do. Instead, the full type name, including the numeric value of the aggregation type, is appended.
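
One possible cause, offered purely as an assumption and not confirmed anywhere in this thread, is that the suffix comes from a `String` method backed by a name table that has no entry for the newly added constant, so the method falls back to printing the type name and number. The sketch below reproduces that symptom in isolation; `aggType`, `aggNames`, and `columnName` are hypothetical names and are not taken from the library.

```go
package main

import "fmt"

// aggType is a hypothetical stand-in for an aggregation enum.
type aggType int

const (
	aggSum aggType = iota + 1
	aggCount
	aggFirst // newly added constant with no entry in aggNames
)

// aggNames is a hypothetical name table that was not updated for aggFirst.
var aggNames = map[aggType]string{
	aggSum:   "SUM",
	aggCount: "COUNT",
}

// String returns the short name, or a fallback containing the type name
// and numeric value when the constant is missing from the table.
func (t aggType) String() string {
	if name, ok := aggNames[t]; ok {
		return name
	}
	return fmt.Sprintf("aggType(%d)", int(t))
}

// columnName builds the output column name as "<column>_<aggregation>".
func columnName(col string, t aggType) string {
	return fmt.Sprintf("%s_%s", col, t)
}

func main() {
	fmt.Println(columnName("OrderCount", aggCount)) // OrderCount_COUNT
	fmt.Println(columnName("Name", aggFirst))       // Name_aggType(3)
}
```

If the real naming works along these lines, adding the new constants to whatever table or generated code supplies the names would restore the short suffix.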

@vyassamir11

@Amnesiac9 can you please also add _LAST aggregation? Thanks

@vyassamir11

We could argue that _LAST is equivalent to sorting the dataframe in reverse order and applying _FIRST, but it would make sense to add it as a separate option for completeness.
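
The equivalence mentioned here can be checked with a small, library-free sketch. The `pair` type and the `firstByKey`, `lastByKey`, and `reversed` functions are hypothetical helpers written for this illustration only; they say nothing about how the aggregation is implemented in the pull request.

```go
package main

import "fmt"

// pair is a hypothetical key/value record for this illustration.
type pair struct{ Key, Value string }

// firstByKey keeps the first value seen for each key.
func firstByKey(rows []pair) map[string]string {
	out := make(map[string]string)
	for _, r := range rows {
		if _, seen := out[r.Key]; !seen {
			out[r.Key] = r.Value
		}
	}
	return out
}

// lastByKey keeps the last value seen for each key.
func lastByKey(rows []pair) map[string]string {
	out := make(map[string]string)
	for _, r := range rows {
		out[r.Key] = r.Value // later rows overwrite earlier ones
	}
	return out
}

// reversed returns a copy of rows in reverse order.
func reversed(rows []pair) []pair {
	out := make([]pair, len(rows))
	for i, r := range rows {
		out[len(rows)-1-i] = r
	}
	return out
}

func main() {
	rows := []pair{{"a", "1"}, {"b", "2"}, {"a", "3"}}
	fmt.Println(lastByKey(rows))            // map[a:3 b:2]
	fmt.Println(firstByKey(reversed(rows))) // map[a:3 b:2]
}
```

The per-key values match, but reversing the input is an extra pass over the data and can change the order in which groups are first encountered, which is one practical reason to offer a separate option.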

@Amnesiac9
Author

Done. Still need to refactor the aggregation testing now to test both of these.

@Amnesiac9 Amnesiac9 changed the title from "Add _FIRST option for Aggregation and support strings" to "Add Aggregation_FIRST and _LAST options, and used interface to support strings" on Aug 9, 2023
2 participants