Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: add command to delete persons with no distinct id's #25657

Merged
merged 5 commits into from
Oct 18, 2024

Conversation

oliverb123
Copy link
Contributor

No description provided.

@PostHog PostHog deleted a comment from posthog-bot Oct 17, 2024
@bretthoerner
Copy link
Contributor

bretthoerner commented Oct 17, 2024

Iterating over them does seem super slow, you can toss raw SQL into Django like so:

from django.db import connection

def delete_persons_without_distinct_ids_raw_sql(team_id):
    with connection.cursor() as cursor:
        cursor.execute("""
            WITH persons_to_delete AS (
                SELECT p.id
                FROM posthog_person p
                LEFT JOIN posthog_persondistinctid pd ON p.id = pd.person_id AND p.team_id = pd.team_id
                WHERE p.team_id = %s AND pd.id IS NULL
            )
            DELETE FROM posthog_person
            WHERE id IN (SELECT id FROM persons_to_delete)
            RETURNING id;
        """, [team_id])
        
        deleted_ids = cursor.fetchall()
        deleted_count = len(deleted_ids)

    print(f"Deleted {deleted_count} Person objects with no PersonDistinctIds for team {team_id}.")
    return deleted_count

You could still have a dry run that just does SELECT COUNT(*) FROM persons_to_delete

I tried a "dry run" and got 717680 for that one team so it seems correct? A test never hurts, though.

The only catch is bulk deleting like this won't run Django signals, I'm not sure if we have any signals hooked up to Person? edit: I checked and don't see any outside of tests.

@oliverb123
Copy link
Contributor Author

@bretthoerner I went for iterating to avoid taking a lock on the persons table, but if you reckon that's not a concern I'll go for this

@bretthoerner
Copy link
Contributor

@bretthoerner I went for iterating to avoid taking a lock on the persons table, but if you reckon that's not a concern I'll go for this

The select ran fast enough for me, and this seems like something we'll run very rarely. 🤷‍♂️

@oliverb123
Copy link
Contributor Author

Sounds good, will go with that so

@oliverb123 oliverb123 enabled auto-merge (squash) October 17, 2024 16:08
Copy link
Collaborator

@mariusandra mariusandra left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me

@oliverb123 oliverb123 merged commit d18b83c into master Oct 18, 2024
87 checks passed
@oliverb123 oliverb123 deleted the olly_distinctid_less_persons_delete branch October 18, 2024 07:19
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants