Fixed duplicate neighbouring names and ids #35

pushpak-gote-mapup · 2024-08-21T10:55:44Z

User description

Modified script to remove duplicate neighbouring names and IDs

PR Type

enhancement

Description

Enhanced the group_and_aggregate function by adding logic to drop duplicate rows based on specific columns before performing group and aggregate operations.
Introduced a list drop_duplicates_cols to specify columns for duplicate removal, ensuring unique entries before aggregation.
Commented out a line for renaming a column in the state_road dataset, indicating a potential future change or consideration.

Changes walkthrough 📝

Relevant files

Enhancement

get_neighbouring_roads.py `Enhance grouping by removing duplicate rows and update column handling` mile-point-approach/get_neighbouring_roads.py Added logic to drop duplicate rows based on specific columns before grouping. Introduced a list `drop_duplicates_cols` to specify columns for duplicate removal. Commented out a line for renaming a column in the `state_road` dataset.	+9/-2

💡 PR-Agent usage:
Comment /help on the PR to get a list of all available PR-Agent tools and their descriptions

github-actions · 2024-08-21T10:56:43Z

PR Reviewer Guide 🔍

⏱️ Estimated effort to review: 2 🔵🔵⚪⚪⚪
🧪 No relevant tests
🔒 No security concerns identified
⚡ Key issues to review Code Redundancy The new code line at 210 seems to be a redundant addition, as it simply re-adds a previously commented-out argument description without any changes. Commented Code The commented code at lines 291-292 might lead to confusion or be accidentally uncommented without proper review. It's generally a good practice to remove or clearly justify commented-out code.

github-actions · 2024-08-21T10:56:45Z

PR Code Suggestions ✨

Category	Suggestion	Score
Possible issue	Handle potential NaN values in data aggregation to avoid 'nan' in output strings The aggregation functions convert values to strings and join them, which might not handle `NaN` values gracefully. Consider using `dropna()` before the `join()` to ensure no 'nan' strings are included in the output. mile-point-approach/get_neighbouring_roads.py [223-224] -'created_unique_id_1_right': lambda x: ', '.join(x.astype(str)), -'RD_NAME_right': lambda x: ', '.join(x.astype(str)), +'created_unique_id_1_right': lambda x: ', '.join(x.dropna().astype(str)), +'RD_NAME_right': lambda x: ', '.join(x.dropna().astype(str)), Suggestion importance[1-10]: 9 Why: The suggestion effectively handles NaN values during aggregation, preventing 'nan' strings in the output, which is crucial for data integrity.	9
Possible bug	Ensure the column exists before renaming to avoid errors The renaming of columns should be conditional based on the presence of the column to avoid potential errors if the column does not exist. Use `if 'NAME' in` `state_road.columns:` before renaming. mile-point-approach/get_neighbouring_roads.py [291-292] -#Change 'NAME' to column name that contains road names from state_road dataset -# state_road.rename(columns={'NAME':'RD_NAME'}, inplace=True) +if 'NAME' in state_road.columns: + state_road.rename(columns={'NAME':'RD_NAME'}, inplace=True) Suggestion importance[1-10]: 8 Why: This suggestion addresses a potential bug by ensuring that the column exists before attempting to rename it, which prevents runtime errors.	8
Maintainability	Separate the concerns of dropping duplicates and grouping to simplify the code The list `drop_duplicates_cols` includes columns for dropping duplicates and then grouping by some of them. However, the grouping columns are repeated in the list, which is redundant. Simplify the list by separating the concerns of dropping duplicates and grouping. mile-point-approach/get_neighbouring_roads.py [221-225] drop_duplicates_cols=['geometry', 'created_unique_id_1_left', 'bridge_id_left','created_unique_id_1_right','RD_NAME_right'] -grouped = df[drop_duplicates_cols].drop_duplicates().groupby(['geometry', 'created_unique_id_1_left', 'bridge_id_left']).agg({ +grouping_cols=['geometry', 'created_unique_id_1_left', 'bridge_id_left'] +grouped = df.drop_duplicates(subset=drop_duplicates_cols).groupby(grouping_cols).agg({ 'created_unique_id_1_right': lambda x: ', '.join(x.astype(str)), 'RD_NAME_right': lambda x: ', '.join(x.astype(str)), }).reset_index() Suggestion importance[1-10]: 7 Why: The suggestion improves code maintainability by separating the logic for dropping duplicates and grouping, making the code more readable and easier to modify.	7
Performance	Optimize the `groupby` operation by avoiding an unnecessary `reset_index()` The `groupby` operation is followed by a `reset_index()` which could be optimized by using `as_index=False` in the `groupby` method to avoid the extra step. mile-point-approach/get_neighbouring_roads.py [222-225] -grouped = df[drop_duplicates_cols].drop_duplicates().groupby(['geometry', 'created_unique_id_1_left', 'bridge_id_left']).agg({ +grouped = df[drop_duplicates_cols].drop_duplicates().groupby(['geometry', 'created_unique_id_1_left', 'bridge_id_left'], as_index=False).agg({ 'created_unique_id_1_right': lambda x: ', '.join(x.astype(str)), 'RD_NAME_right': lambda x: ', '.join(x.astype(str)), -}).reset_index() +}) Suggestion importance[1-10]: 6 Why: The suggestion offers a minor performance improvement by eliminating an unnecessary step, making the code slightly more efficient.	6

pushpak-gote-mapup · 2024-08-21T11:06:57Z

Did changes as per necessary bot review:
Commented Code
The commented code at lines 291-292 might lead to confusion or be accidentally uncommented without proper review. It's generally a good practice to remove or clearly justify commented-out code.
Solved
Changed comment

Fixed duplicate neighbouring names and ids

db52d6b

pushpak-gote-mapup requested a review from Abeer-Mapup August 21, 2024 10:55

github-actions bot added enhancement New feature or request Review effort [1-5]: 2 labels Aug 21, 2024

Uncommented line

06ab1dd

Changed comment

cdbdeb6

Abeer-Mapup approved these changes Aug 21, 2024

View reviewed changes

pushpak-gote-mapup merged commit 3329d6c into main Aug 21, 2024
2 checks passed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Fixed duplicate neighbouring names and ids #35

Fixed duplicate neighbouring names and ids #35

pushpak-gote-mapup commented Aug 21, 2024 •

edited by github-actions bot

Loading

github-actions bot commented Aug 21, 2024

github-actions bot commented Aug 21, 2024 •

edited

Loading

pushpak-gote-mapup commented Aug 21, 2024 •

edited

Loading

Fixed duplicate neighbouring names and ids #35

Fixed duplicate neighbouring names and ids #35

Conversation

pushpak-gote-mapup commented Aug 21, 2024 • edited by github-actions bot Loading

User description

PR Type

Description

Changes walkthrough 📝

github-actions bot commented Aug 21, 2024

PR Reviewer Guide 🔍

github-actions bot commented Aug 21, 2024 • edited Loading

PR Code Suggestions ✨

pushpak-gote-mapup commented Aug 21, 2024 • edited Loading

pushpak-gote-mapup commented Aug 21, 2024 •

edited by github-actions bot

Loading

github-actions bot commented Aug 21, 2024 •

edited

Loading

pushpak-gote-mapup commented Aug 21, 2024 •

edited

Loading