Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: flatten the query dataframe for contracts #2196

Closed

Conversation

johnson2427
Copy link
Contributor

What I did

Adding the ability to flatten event_arguments in queries for contracts.

fixes: #

How I did it

Added a check for the event_arguments field

How to verify it

use the uniswap-sdk feat/add-v2 branch

Checklist

  • All changes are completed
  • New test cases have been added
  • Documentation has been updated

@@ -676,6 +676,8 @@ def query(
columns_ls = validate_and_expand_columns(columns, ContractLog)
data = map(partial(extract_fields, columns=columns_ls), contract_events)
df = pd.DataFrame(columns=columns_ls, data=data)
# Note: The below check is to check for `event_arguments` in the request.
# If `event_arguments` exists in the request, we flatten the field.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So originally my idea was that "*" or specific event arg names mentioned in the columns would go grab those fields from the logs specifically, and really only if event_arguments was present in the columns would you keep it, otherwise it should be flattened in the resulting dataframe

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I went through all of Ape, trying to find a good place to make this manipulation, and it looked pretty painful. I tried to do make this happen earlier, but there is no information available for what those dict keys are going to be in the event_arguments field. I can't know until we make the query

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is the comment not correct then? I would think unless event_arguments is in the request, we should flatten that field

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was flattening it either way. If you want:

columns="name,event_arguments"

I planned to flatten event_arguments. We don't have to though, wasn't sure if we wanted to always flatten it or not

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it's good to think through the scenario where one of the event arguments might be called name or something else that ContractLog has

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it's good to think through the scenario where one of the event arguments might be called name or

contract_address is the one that usually bites me... I always mix up if it the one on ContractLog or a custom event input

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it's good to think through the scenario where one of the event arguments might be called name or something else that ContractLog has

This is something I was contemplating here. If that were to happen, pandas will automatically rename both columns to name_x, and name_y.

Do we want to keep event_arguments in the columns names? In the case of the uniswap v2 contract, we'd get event_arguments.token0, event_arguements.token1, event_arguments.pair, event_arguments.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we want to keep event_arguments in the columns names? In the case of the uniswap v2 contract, we'd get event_arguments.token0, event_arguements.token1, event_arguments.pair, event_arguments.

I don't think so, it seems much more natural to me to work with it like this:

df = factory.PoolCreated.query("pair", ...)
pools = pool(p for p in df["pair"])
df = df["pair"] | pd.Dataframe(dict(token0=p.token0_balance, token1=p.token1_balance) for p in df["pair"])
df

|  pair | token0 | token1 |
| ----- | ------ | ------ |
| 0x... | 123... | 456... |
...

Copy link
Contributor Author

@johnson2427 johnson2427 Aug 1, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we want to keep event_arguments in the columns names? In the case of the uniswap v2 contract, we'd get event_arguments.token0, event_arguements.token1, event_arguments.pair, event_arguments.

I don't think so, it seems much more natural to me to work with it like this:

df = factory.PoolCreated.query("pair", ...)
pools = pool(p for p in df["pair"])
df = df["pair"] | pd.Dataframe(dict(token0=p.token0_balance, token1=p.token1_balance) for p in df["pair"])
df

|  pair | token0 | token1 |
| ----- | ------ | ------ |
| 0x... | 123... | 456... |
...

I don't disagree, for this contract (UniswapV2), we do have an empty field. I think it's an ID? Maybe we just drop the empty field? Idk if we want that or not.

If we want the ability to query for sub-fields, it's going to take quite a bit of work. In BaseContractLog we have the event_arguments field as a dict. Since that field can be anything depending on the contract, I believe we'd need a model_validator in that model where we can pull apart event_arguments and set the keys of that value to keys of the model itself.

@model_validator(mode="before")
@classmethod
def validate_event_args(cls, v, values):
    if (event_args := values.get("event_arguments")):
        for k, v in event_args.items():
            # Note: we can do a check on the values.keys() to make sure we aren't duplicating values
            values[k] = v
    return values

I'm not 100% sure this will work, I've never created fields in a model on the fly before. But if it does, we'd have to ensure the downstream functionality remains intact. As long as we don't remove event_arguments we should be okay. Maybe we add something to the field names we create so we know those are generated fields? Not sure

Copy link
Member

@antazoey antazoey left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I just noticed test_query isn't testing much from ContractLog at all!

@@ -676,6 +676,8 @@ def query(
columns_ls = validate_and_expand_columns(columns, ContractLog)
data = map(partial(extract_fields, columns=columns_ls), contract_events)
df = pd.DataFrame(columns=columns_ls, data=data)
# Note: The below check is to check for `event_arguments` in the request.
# If `event_arguments` exists in the request, we flatten the field.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it's good to think through the scenario where one of the event arguments might be called name or

contract_address is the one that usually bites me... I always mix up if it the one on ContractLog or a custom event input

@johnson2427
Copy link
Contributor Author

I just noticed test_query isn't testing much from ContractLog at all!

As part of the acceptance criteria of this PR, I can add some testing for ContractLog if you'd like?

Copy link
Member

@antazoey antazoey left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this cause issues with clashing of fields?
For example contract_address is a common Event input argument, and contract_address is also a regular field on the logs.

event_arguments keeps those separate and avoids name-clashing.

@johnson2427
Copy link
Contributor Author

Does this cause issues with clashing of fields? For example contract_address is a common Event input argument, and contract_address is also a regular field on the logs.

event_arguments keeps those separate and avoids name-clashing.

probably, I have to do this in a smarter way I don't like the way I implemented this

Copy link

github-actions bot commented Oct 5, 2024

This pull request is considered stale because it has been open 30 days with no activity. Remove stale label, add a comment, or make a new commit, otherwise this PR will be closed in 5 days.

@github-actions github-actions bot added the stale No activity for 30 days label Oct 5, 2024
@johnson2427
Copy link
Contributor Author

@fubuloubu this has gone stale, do we want to close this for now and come back to it? This work was to help with the Uniswap work

@fubuloubu fubuloubu closed this Oct 7, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
stale No activity for 30 days
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants