Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ramp with direct category property is broken #779

Closed
davidmanzanares opened this issue Jul 30, 2018 · 7 comments
Closed

ramp with direct category property is broken #779

davidmanzanares opened this issue Jul 30, 2018 · 7 comments
Assignees

Comments

@davidmanzanares
Copy link

@elenatorro and I discovered that expressions like ramp($category, prism) are broken since ramp is assuming that $category internal IDs are between 0 and numCategories and that is only true when there is only one category property on the dataset.

Usage of ramp with buckets or top don't show this problem.

@davidmanzanares
Copy link
Author

Hi @makella , I don't want to work in "workaround guided" development, but this would be easier to solve if we forced a top operation in those expressions, which is a previously discussed idea.

To summarize for other people, the idea is to change (the behavior of) expressions like ramp($category, prism) into ramp(top($category, 16), prism).

I was reluctant to this change, but, knowing that this has been broken for so long and that nobody actually noticed it (we saw this problem by looking into the code for unrelated reasons) makes me think that we should change it.

What do you think, @makella ?

@makella
Copy link
Contributor

makella commented Jul 30, 2018

@davidmanzanares yes, totally this is something we had discussed in the past and looking back, maybe we never came to a clear resolution:

in a couple of old issues:
#110 (comment)
#129 (comment)

on slack:
https://cartoteam.slack.com/archives/C9391D1J5/p1529513536000682

These were both before we had talked through the buckets logic. Based on the logic we have there, with this case, for ramp($category,prism) would mean:

  • if there are less categories than colors, use the available colors in prism
  • if there are equal categories to colors, use the available colors in prism
  • if there are more categories than colors available in prism, symbolize all categories with an interpolated color scheme

For example, if someone did:

ramp($category,[yellow,magenta]) we would go through all of the steps above to provide them the proper symbolization.

Now, if we think about Builder logic, for the third option above, we would symbolize the top 10 categories and then everything else with the "other" gray color. Since we have top I think that by default, we can stick with the logic we already have built-in.

There is definitely a case to not limit the number of categories that we color as many users actually want to "see" all the categories colored in the data.

If we go this direction, I should definitely start revisiting the work I had started on reordering the category color schemes.

@davidmanzanares
Copy link
Author

There is definitely a case to not limit the number of categories that we color as many users actually want to "see" all the categories colored in the data.

This is going to be a problem because it goes in the opposite direction of the alternative implementation of top (see #781).

Another option would be to leave ramp($cat,...) as is, but changing the implementation to hash the input category ID.

@makella
Copy link
Contributor

makella commented Jul 30, 2018

Ahhh, I just started reading #781 after I had hit comment on this :)

Questions:

  • the limit for top is 1024 categories? we give an error at 200 because that is when we start seeing performance issues?

but changing the implementation to hash the input category ID.

what exactly does this mean?

@davidmanzanares
Copy link
Author

The current limit for top is 1024, we don't show any error there, it is uncontrolled right now, the map will appear, but values and colors will be random. The 1024 limit is on the property side of things, not on the number of buckets: top($propertyWithManyDifferentCategories, 3) is broken right now. This is not a performance issue.

what exactly does this mean?

Basically, that each category name will get a random color from the palette. We won't be able to control it in any way, and collisions between category names could exist.

@davidmanzanares davidmanzanares added the status: discussion Needs more discussion, to start or to be closed / merged label Jul 31, 2018
@davidmanzanares davidmanzanares removed the status: discussion Needs more discussion, to start or to be closed / merged label Aug 8, 2018
@davidmanzanares
Copy link
Author

I think we can go and implement the hash-based solution, and if there is a problem, react to that.

@davidmanzanares
Copy link
Author

Fixed by #851

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

3 participants