
filter values in an array based on whether they are in a column of another table #10380

Answered by gforsyth
dschneiderch asked this question in Q&A
You're right that contains is backwards for what you're trying to do -- however, isin isn't an array-level method, it works on the column as a whole, so I don't think it's going to do what you want.

And I realized that as_scalar really does require the result to only be a single value. So I think perhaps you'll need an intermediate materialization if a join isn't practical.

It's a bit verbose, but you can evaluate the tax.segment_id as pyarrow and then make it a list, and that can be passed inline to intersect -- the general condition here is filter where arr1.intersect(arr2).length() > 0

[ins] In [53]: tax
Out[53]: 
┏━━━━━━━━━━━━┳━━━━━━━━━━━━━━┓
┃ segment_id ┃ segment_name ┃
┡━━━━━━━━━━━…

Answer selected by dschneiderch