Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fixed duplicate neighbouring names and ids #35

Merged
merged 3 commits into from
Aug 21, 2024

Conversation

pushpak-gote-mapup
Copy link
Contributor

@pushpak-gote-mapup pushpak-gote-mapup commented Aug 21, 2024

User description

Modified script to remove duplicate neighbouring names and IDs


PR Type

enhancement


Description

  • Enhanced the group_and_aggregate function by adding logic to drop duplicate rows based on specific columns before performing group and aggregate operations.
  • Introduced a list drop_duplicates_cols to specify columns for duplicate removal, ensuring unique entries before aggregation.
  • Commented out a line for renaming a column in the state_road dataset, indicating a potential future change or consideration.

Changes walkthrough 📝

Relevant files
Enhancement
get_neighbouring_roads.py
Enhance grouping by removing duplicate rows and update column handling

mile-point-approach/get_neighbouring_roads.py

  • Added logic to drop duplicate rows based on specific columns before
    grouping.
  • Introduced a list drop_duplicates_cols to specify columns for
    duplicate removal.
  • Commented out a line for renaming a column in the state_road dataset.
  • +9/-2     

    💡 PR-Agent usage:
    Comment /help on the PR to get a list of all available PR-Agent tools and their descriptions

    Copy link

    PR Reviewer Guide 🔍

    ⏱️ Estimated effort to review: 2 🔵🔵⚪⚪⚪
    🧪 No relevant tests
    🔒 No security concerns identified
    ⚡ Key issues to review

    Code Redundancy
    The new code line at 210 seems to be a redundant addition, as it simply re-adds a previously commented-out argument description without any changes.

    Commented Code
    The commented code at lines 291-292 might lead to confusion or be accidentally uncommented without proper review. It's generally a good practice to remove or clearly justify commented-out code.

    Copy link

    github-actions bot commented Aug 21, 2024

    PR Code Suggestions ✨

    CategorySuggestion                                                                                                                                    Score
    Possible issue
    Handle potential NaN values in data aggregation to avoid 'nan' in output strings

    The aggregation functions convert values to strings and join them, which might not
    handle NaN values gracefully. Consider using dropna() before the join() to ensure no
    'nan' strings are included in the output.

    mile-point-approach/get_neighbouring_roads.py [223-224]

    -'created_unique_id_1_right': lambda x: ', '.join(x.astype(str)),
    -'RD_NAME_right': lambda x: ', '.join(x.astype(str)),
    +'created_unique_id_1_right': lambda x: ', '.join(x.dropna().astype(str)),
    +'RD_NAME_right': lambda x: ', '.join(x.dropna().astype(str)),
     
    Suggestion importance[1-10]: 9

    Why: The suggestion effectively handles NaN values during aggregation, preventing 'nan' strings in the output, which is crucial for data integrity.

    9
    Possible bug
    Ensure the column exists before renaming to avoid errors

    The renaming of columns should be conditional based on the presence of the column to
    avoid potential errors if the column does not exist. Use if 'NAME' in
    state_road.columns: before renaming.

    mile-point-approach/get_neighbouring_roads.py [291-292]

    -#Change 'NAME' to column name that contains road names from state_road dataset
    -# state_road.rename(columns={'NAME':'RD_NAME'}, inplace=True)
    +if 'NAME' in state_road.columns:
    +    state_road.rename(columns={'NAME':'RD_NAME'}, inplace=True)
     
    Suggestion importance[1-10]: 8

    Why: This suggestion addresses a potential bug by ensuring that the column exists before attempting to rename it, which prevents runtime errors.

    8
    Maintainability
    Separate the concerns of dropping duplicates and grouping to simplify the code

    The list drop_duplicates_cols includes columns for dropping duplicates and then
    grouping by some of them. However, the grouping columns are repeated in the list,
    which is redundant. Simplify the list by separating the concerns of dropping
    duplicates and grouping.

    mile-point-approach/get_neighbouring_roads.py [221-225]

     drop_duplicates_cols=['geometry', 'created_unique_id_1_left', 'bridge_id_left','created_unique_id_1_right','RD_NAME_right']
    -grouped = df[drop_duplicates_cols].drop_duplicates().groupby(['geometry', 'created_unique_id_1_left', 'bridge_id_left']).agg({
    +grouping_cols=['geometry', 'created_unique_id_1_left', 'bridge_id_left']
    +grouped = df.drop_duplicates(subset=drop_duplicates_cols).groupby(grouping_cols).agg({
         'created_unique_id_1_right': lambda x: ', '.join(x.astype(str)),
         'RD_NAME_right': lambda x: ', '.join(x.astype(str)),
     }).reset_index()
     
    Suggestion importance[1-10]: 7

    Why: The suggestion improves code maintainability by separating the logic for dropping duplicates and grouping, making the code more readable and easier to modify.

    7
    Performance
    Optimize the groupby operation by avoiding an unnecessary reset_index()

    The groupby operation is followed by a reset_index() which could be optimized by
    using as_index=False in the groupby method to avoid the extra step.

    mile-point-approach/get_neighbouring_roads.py [222-225]

    -grouped = df[drop_duplicates_cols].drop_duplicates().groupby(['geometry', 'created_unique_id_1_left', 'bridge_id_left']).agg({
    +grouped = df[drop_duplicates_cols].drop_duplicates().groupby(['geometry', 'created_unique_id_1_left', 'bridge_id_left'], as_index=False).agg({
         'created_unique_id_1_right': lambda x: ', '.join(x.astype(str)),
         'RD_NAME_right': lambda x: ', '.join(x.astype(str)),
    -}).reset_index()
    +})
     
    Suggestion importance[1-10]: 6

    Why: The suggestion offers a minor performance improvement by eliminating an unnecessary step, making the code slightly more efficient.

    6

    @pushpak-gote-mapup
    Copy link
    Contributor Author

    pushpak-gote-mapup commented Aug 21, 2024

    Did changes as per necessary bot review:
    Commented Code
    The commented code at lines 291-292 might lead to confusion or be accidentally uncommented without proper review. It's generally a good practice to remove or clearly justify commented-out code.
    Solved
    Changed comment

    @pushpak-gote-mapup pushpak-gote-mapup merged commit 3329d6c into main Aug 21, 2024
    2 checks passed
    Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
    Labels
    Projects
    None yet
    Development

    Successfully merging this pull request may close these issues.

    2 participants