fix(bq|random): ST_GENERATEPOINTS returning exact number of points #486

Merged

Conversation

vdelacruzb
Contributor

@vdelacruzb vdelacruzb commented Mar 13, 2024

Description

Shortcut

This ticket makes ST_GENERATEPOINTS return the exact number of points requested. As mentioned in the docs, there was a small chance of returning fewer points, but in some cases far fewer points are returned than expected. I've gone for the same approach that is already implemented in Snowflake.

In Snowflake this approach was much faster than the previous one, but that has to do with keeping the result as ARRAY(STRING); kept that way, the timing here was the same. However, UNNESTing, parsing to GEOG, and ARRAY_AGG make this approach slower than the original one. So whether to go for this approach depends on what we want to prioritize.

I removed the warning from the 3 docs because the other providers return the exact number of points.
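To illustrate the difference between "may return fewer points" and "always returns the exact number", here is a minimal Python sketch. It assumes a rejection-sampling style generator (candidates drawn in the bounding box, hits kept), which this PR does not spell out, and it works on planar coordinates rather than GEOGRAPHY. The function names `generate_points_once` and `generate_points_exact` are hypothetical and are not part of the toolbox.

```python
import random


def point_in_polygon(x, y, ring):
    """Ray-casting test. `ring` is a list of (x, y) vertices; a repeated
    closing vertex is harmless because a zero-length edge is never crossed."""
    inside = False
    j = len(ring) - 1
    for i in range(len(ring)):
        xi, yi = ring[i]
        xj, yj = ring[j]
        # The edge (j -> i) is crossed only if it straddles the horizontal
        # line through y, so yj != yi whenever the division is evaluated.
        if (yi > y) != (yj > y) and x < (xj - xi) * (y - yi) / (yj - yi) + xi:
            inside = not inside
        j = i
    return inside


def generate_points_once(ring, n, rng):
    """Single pass: draw n candidates in the bounding box and keep the hits.
    The result can contain fewer than n points."""
    xs = [p[0] for p in ring]
    ys = [p[1] for p in ring]
    min_x, max_x, min_y, max_y = min(xs), max(xs), min(ys), max(ys)
    candidates = (
        (rng.uniform(min_x, max_x), rng.uniform(min_y, max_y)) for _ in range(n)
    )
    return [p for p in candidates if point_in_polygon(p[0], p[1], ring)]


def generate_points_exact(ring, n, rng):
    """Top up with further passes until exactly n points are collected.
    Assumes the polygon has a non-zero area, otherwise this never ends."""
    points = []
    while len(points) < n:
        points.extend(generate_points_once(ring, n - len(points), rng))
    return points


if __name__ == "__main__":
    triangle = [(0.0, 0.0), (4.0, 0.0), (0.0, 4.0), (0.0, 0.0)]
    print(len(generate_points_once(triangle, 1000, random.Random(0))))
    print(len(generate_points_exact(triangle, 1000, random.Random(0))))
```

The extra passes are what buy the exact count, and they are also extra work, which is consistent with the speed trade-off described above.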

Type of change

  • Fix

Acceptance

with a as (SELECT `carto-un`.carto.ST_GENERATEPOINTS(ST_GEOGFROM('{"type":"Polygon","coordinates":[[[-81.83228458193764,41.64497651445618],[-73.83438791064249,45.21751495507643],[-70.9321626546763,45.07761121864185],[-68.39615548876748,47.7299818321738],[-63.739164722242194,41.38848032241172],[-32.653971565046206,56.517498752322986],[-56.58596382377746,60.899302859749085],[-73.81312713859674,77.06548201227298],[-150.572477438432,77.03299512273315],[-140.85567102668878,70.45234214268247],[-140.15529260872003,48.81513877082211],[-93.76095161674623,49.282176839280744],[-83.29647407768371,46.472790002158526],[-81.83228458193764,41.64497651445618]]]}'), 500000) e
)
select ARRAY_LENGTH(e) from a;
-- time 5 sec, 344354 points

with a as (SELECT `cartodb-data-engineering-team`.vdelacruz_carto.ST_GENERATEPOINTS(ST_GEOGFROM('{"type":"Polygon","coordinates":[[[-81.83228458193764,41.64497651445618],[-73.83438791064249,45.21751495507643],[-70.9321626546763,45.07761121864185],[-68.39615548876748,47.7299818321738],[-63.739164722242194,41.38848032241172],[-32.653971565046206,56.517498752322986],[-56.58596382377746,60.899302859749085],[-73.81312713859674,77.06548201227298],[-150.572477438432,77.03299512273315],[-140.85567102668878,70.45234214268247],[-140.15529260872003,48.81513877082211],[-93.76095161674623,49.282176839280744],[-83.29647407768371,46.472790002158526],[-81.83228458193764,41.64497651445618]]]}'), 500000) e
)
select ARRAY_LENGTH(e) from a;
-- time 29 sec, 500000 points


@vdelacruzb vdelacruzb requested a review from Jesus89 March 13, 2024 19:40

@vdelacruzb vdelacruzb merged commit e521cb2 into main Mar 14, 2024
17 checks passed
@vdelacruzb vdelacruzb deleted the bug/sc-379986/internal-st-generate-points-generating-generate branch March 14, 2024 10:55
@vdelacruzb vdelacruzb mentioned this pull request Mar 18, 2024
2 participants