Modified scripts for better fuzzy matching #34
…s and state neighboring roads
PR Reviewer Guide 🔍
PR Code Suggestions ✨
Addressing PR agent's reviews.
Error Handling
```python
import logging
from typing import List, Tuple

import pandas as pd

logger = logging.getLogger(__name__)


def read_exploded_osm_data_csv(exploded_osm_data_csv: str, osm_cols_for_road_names: List[str]) -> Tuple[pd.DataFrame, List[str]]:
    """
    Reads the 'exploded_osm_data' CSV file and returns a DataFrame with the required columns.

    Parameters:
        exploded_osm_data_csv (str): The path to the CSV file.
        osm_cols_for_road_names (list): The list of column names to read from the CSV file.

    Returns:
        Tuple[pandas.DataFrame, List[str]]: The DataFrame containing the columns that were
        found, and the list of those column names.
    """
    series_list = []
    available_osm_road_names = []
    for col in osm_cols_for_road_names:
        try:
            series = pd.read_csv(exploded_osm_data_csv, usecols=[col])
            series_list.append(series)
            available_osm_road_names.append(col)
        # EmptyDataError is a subclass of ValueError, so it must be caught first.
        except pd.errors.EmptyDataError as e:
            logger.error(f"Empty data error when reading CSV: {e}", exc_info=True)
            raise
        except ValueError as e:
            # Missing column: log it and skip it, so that available_osm_road_names
            # lists only the columns that were actually read.
            logger.warning(f"Column {col} not found in CSV: {e}")
        except Exception as e:
            logger.error(f"Unexpected error when reading CSV: {e}", exc_info=True)
            raise
    try:
        # Join the single-column frames side by side, in the requested order.
        exploded_osm_data_df = pd.concat(series_list, axis=1)
        return exploded_osm_data_df, available_osm_road_names
    except ValueError as e:
        logger.error(f"ValueError when concatenating series: {e}", exc_info=True)
        raise
    except Exception as e:
        logger.error(f"Unexpected error when concatenating series: {e}", exc_info=True)
        raise
```
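The loop in `read_exploded_osm_data_csv` calls `pd.read_csv` once per requested column, so a large CSV is parsed once per column. A minimal sketch of an alternative, assuming the intent is to keep only the requested columns that exist: read just the header with `nrows=0`, filter the requested names against it, then parse the file a single time with `usecols`. `read_available_columns` is a hypothetical name and is not part of the PR.

```python
import io
from typing import IO, List, Tuple, Union

import pandas as pd


def read_available_columns(
    csv_source: Union[str, IO[str]], wanted_cols: List[str]
) -> Tuple[pd.DataFrame, List[str]]:
    """Hypothetical helper: read only the wanted columns that exist in the CSV.

    The header is read once (nrows=0), then the body is parsed in one call.
    """
    header = pd.read_csv(csv_source, nrows=0).columns
    available = [c for c in wanted_cols if c in header]
    if not available:
        # This sketch returns an empty frame instead of raising.
        return pd.DataFrame(), []
    if hasattr(csv_source, "seek"):
        csv_source.seek(0)  # rewind in-memory buffers before the second read
    df = pd.read_csv(csv_source, usecols=available)
    # usecols returns columns in file order; restore the caller's order.
    return df[available], available
```

Reindexing with `df[available]` keeps the caller's column order, which is what the per-column `pd.concat` produced.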
Performance Concern
```python
def vectorized_fuzz(s1: pd.Series, s2: pd.Series) -> pd.Series:
    try:
        mask = s1.notna() & s2.notna()
        result = pd.Series(0, index=s1.index)
        if mask.any():
            result[mask] = np.vectorize(fuzz.token_sort_ratio, otypes=[np.int64])(
                s1[mask].astype(str), s2[mask].astype(str)
            )
        return result
    except ValueError as e:
        logger.error(f"ValueError in vectorized_fuzz: {str(e)}", exc_info=True)
        raise
    except TypeError as e:
        logger.error(f"TypeError in vectorized_fuzz: {str(e)}", exc_info=True)
        raise
    except Exception as e:
        logger.error(f"Unexpected error in vectorized_fuzz: {str(e)}", exc_info=True)
        raise
```
Exception Handling
```python
def read_geopackage_to_dataframe(filepath: str) -> gpd.GeoDataFrame:
    """
    Read a GeoPackage file into a GeoDataFrame.

    Args:
        filepath (str): Path to the GeoPackage file.

    Returns:
        gpd.GeoDataFrame: The read GeoDataFrame.
    """
    try:
        return gpd.read_file(filepath)
    except FileNotFoundError as e:
        logger.error(f"FileNotFoundError: GeoPackage file not found at {filepath}: {str(e)}", exc_info=True)
        raise
    except PermissionError as e:
        logger.error(f"PermissionError: Unable to access GeoPackage file at {filepath}: {str(e)}", exc_info=True)
        raise
    except gpd.io.file.DriverError as e:
        logger.error(f"DriverError: Unable to read GeoPackage file at {filepath}: {str(e)}", exc_info=True)
        raise
    except Exception as e:
        logger.error(f"Unexpected error while reading GeoPackage file at {filepath}: {str(e)}", exc_info=True)
        raise
```
In addition to this, PR code suggestions are also implemented.
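On the Performance Concern raised for `vectorized_fuzz`: NumPy documents `np.vectorize` as a convenience wrapper that is essentially a `for` loop, so the scorer is still called once per pair in Python. The sketch below makes that loop explicit with a list comprehension and takes the scorer as a parameter, so it can be tried without a fuzzy-matching package. `pairwise_scores` is a hypothetical name; in the PR's setting the scorer would be `fuzz.token_sort_ratio`.

```python
from typing import Callable

import pandas as pd


def pairwise_scores(
    s1: pd.Series, s2: pd.Series, scorer: Callable[[str, str], float]
) -> pd.Series:
    """Hypothetical helper: score aligned string pairs; pairs with a missing value score 0."""
    mask = s1.notna() & s2.notna()
    result = pd.Series(0, index=s1.index, dtype="int64")
    if mask.any():
        # One scorer call per pair, same as np.vectorize, written as a plain loop.
        # round() keeps the int64 dtype if the scorer returns floats.
        result.loc[mask] = [
            round(scorer(str(a), str(b))) for a, b in zip(s1[mask], s2[mask])
        ]
    return result
```

Because the per-pair calls remain, a larger speedup would more likely come from a faster scorer (for example, the C++-backed `rapidfuzz` package, which also provides `fuzz.token_sort_ratio`) than from the loop construct itself.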
hydrography-approach/processing_scripts/bridge_statistics/create_bridge_stats.py
…tive code redundancy.
User description
Added scripts for matching fuzzy similarity scores from new OSM columns and state neighboring roads.
Added feature: Create bridge statistics
PR Type
enhancement, bug fix
Description
Changes walkthrough 📝
calculate_match_percentage.py
Enhanced similarity calculation and CSV reading functions
hydrography-approach/processing_scripts/associate_data/calculate_match_percentage.py
- `calculate_similarity` to compute similarity scores.
- `read_exploded_osm_data_csv` to read specific columns from a large CSV file.
- `run` function to include new similarity calculations and data merging.
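The walkthrough does not show how `calculate_similarity` scores names. Because `vectorized_fuzz` wraps `fuzz.token_sort_ratio`, the sketch below illustrates the token-sort idea using only the standard library: normalise both names, sort their tokens, and compare the rejoined strings. `token_sort_similarity` is a hypothetical name, and `difflib.SequenceMatcher` stands in for the fuzzy library's ratio, so its scores will not match `fuzz` exactly.

```python
import re
from difflib import SequenceMatcher


def token_sort_similarity(a: str, b: str) -> int:
    """Hypothetical approximation of a token-sort ratio on a 0-100 scale."""

    def normalise(s: str) -> str:
        # Lower-case, turn non-alphanumeric runs into spaces, then sort the tokens
        # so that word order does not affect the comparison.
        tokens = re.sub(r"[^0-9a-z]+", " ", s.lower()).split()
        return " ".join(sorted(tokens))

    return round(100 * SequenceMatcher(None, normalise(a), normalise(b)).ratio())
```

For example, "Main Street North" and "north main street" both normalise to "main north street" and score 100.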
get_merged_association_output.py
Refactored and enhanced similarity calculations and data merging
merge-approaches/get_merged_association_output.py
- `calculate_similarity_for_neighbouring_roads` to handle neighbouring roads similarity.
- `main` function to integrate new similarity calculations and data merging.
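The walkthrough only names `calculate_similarity_for_neighbouring_roads`. One plausible reading, assumed here and not confirmed by the PR, is that a road name is scored against each neighbouring road's name and the best match is kept. `best_neighbour_match` is a hypothetical helper, and `difflib.SequenceMatcher` is a stand-in scorer.

```python
from difflib import SequenceMatcher
from typing import Iterable, Tuple


def best_neighbour_match(name: str, neighbours: Iterable[str]) -> Tuple[str, int]:
    """Hypothetical helper: return the most similar neighbouring road name and its 0-100 score.

    An empty neighbour list yields ("", 0).
    """
    best_name, best_score = "", 0
    for candidate in neighbours:
        # Case-insensitive similarity between the road name and this neighbour.
        score = round(100 * SequenceMatcher(None, name.lower(), candidate.lower()).ratio())
        if score > best_score:
            best_name, best_score = candidate, score
    return best_name, best_score
```

Keeping the maximum rather than an average means one well-matching neighbour is enough to produce a high score.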