
Adding logging for Hydrography scripts #31

Open · wants to merge 8 commits into main

Conversation
@varun-andhra-mapup (Contributor) commented Jul 30, 2024

PR Type

enhancement, other


Description

  • Added logger parameter to multiple functions across various scripts to replace print statements with logging.
  • Initialized logging configuration in run-hydrography-pipeline.py.
  • Enhanced logging for better traceability and debugging.
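
The pattern described above can be sketched as follows; the signature is illustrative (the repository's actual filter_ways takes different arguments), showing only how a passed-in logger replaces print:

```python
import io
import logging

# Sketch of the PR's pattern: functions accept a `logger` parameter and call
# logger.info(...) where they previously used print(). The signature below is
# illustrative, not the repository's actual filter_ways signature.
def filter_ways(input_fp: str, logger: logging.Logger) -> None:
    logger.info(f"Filtering ways from {input_fp}")

# Wire a logger to an in-memory buffer so the output can be inspected here;
# run-hydrography-pipeline.py would use logging.basicConfig with a file instead.
buf = io.StringIO()
handler = logging.StreamHandler(buf)
handler.setFormatter(logging.Formatter("[%(levelname)s] %(message)s"))
logger = logging.getLogger("sketch")
logger.setLevel(logging.INFO)
logger.addHandler(handler)

filter_ways("filtered-ways.osm.pbf", logger)
print(buf.getvalue().strip())  # → [INFO] Filtering ways from filtered-ways.osm.pbf
```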

Changes walkthrough 📝

Relevant files (Enhancement)

determine_final_osm_id.py: Add logging to association functions
hydrography-approach/processing_scripts/associate_data/determine_final_osm_id.py
  • Added logger parameter to multiple functions.
  • Replaced print statements with logger.info for logging.
  • +12/-11

exclude_nearby_bridges.py: Add logging to exclude nearby bridges
hydrography-approach/processing_scripts/associate_data/exclude_nearby_bridges.py
  • Added logger parameter to functions.
  • Replaced print statements with logger.info.
  • +6/-7

join_all_data.py: Add logging to join all data script
hydrography-approach/processing_scripts/associate_data/join_all_data.py
  • Added logger parameter to process_all_join function.
  • Replaced print statements with logger.info.
  • +2/-2

filter_osm_ways.py: Add logging to filter OSM ways
hydrography-approach/processing_scripts/filter_data/filter_osm_ways.py
  • Added logger parameter to filter_ways function.
  • Replaced print statements with logger.info.
  • +2/-2

process_filter_nbi_bridges.py: Add logging to filter NBI bridges
hydrography-approach/processing_scripts/filter_data/process_filter_nbi_bridges.py
  • Added logger parameter to functions.
  • Replaced print statements with logger.info.
  • +4/-4

tag_nbi_and_osm_data.py: Add logging to tag NBI and OSM data
hydrography-approach/processing_scripts/tag_data/tag_nbi_and_osm_data.py
  • Added logger parameter to multiple functions.
  • Replaced print statements with logger.info and logger.error.
  • +31/-20

run-hydrography-pipeline.py: Add logging to hydrography pipeline script
hydrography-approach/run-hydrography-pipeline.py
  • Initialized logging configuration.
  • Replaced print statements with logging.info.
  • Passed logger to various processing functions.
  • +27/-15

    💡 PR-Agent usage: Comment /help on the PR to get a list of all available PR-Agent tools and their descriptions.

    github-actions bot commented Jul 30, 2024

    PR Reviewer Guide 🔍

    (Review updated until commit c221f25)

    ⏱️ Estimated effort to review: 3 🔵🔵🔵⚪⚪
    🧪 No relevant tests
    🔒 No security concerns identified
    ⚡ Key issues to review

    Error Handling
    The function load_layers uses sys.exit(1) for error handling, which may not be the best approach in a larger application context. Consider raising exceptions that can be caught and handled at a higher level to give more control over the application's flow.

    Exception Handling
    The main function catches general exceptions and logs them, which is good. However, handling specific exceptions more granularly would provide clearer error messages and recovery options.
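
A minimal sketch of what that more granular handling could look like; run_with_config and missing_key are hypothetical stand-ins, not functions from this repository:

```python
import logging

logger = logging.getLogger("hydrography_pipeline")

def run_with_config(load_config):
    """Handle specific, recoverable failures before the generic catch-all.

    `load_config` is a hypothetical callable standing in for the pipeline's
    config-loading step.
    """
    try:
        return load_config()
    except FileNotFoundError as e:
        logger.error(f"Configuration file missing: {e}")
        raise
    except KeyError as e:
        logger.error(f"Configuration key missing: {e}")
        raise
    except Exception:
        logger.exception("Unexpected failure while loading configuration")
        raise

def missing_key():
    raise KeyError("state_name")

caught = None
try:
    run_with_config(missing_key)
except KeyError as e:
    caught = e
print(type(caught).__name__)  # → KeyError
```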

    github-actions bot commented Jul 30, 2024

    PR Code Suggestions ✨

    Enhancement
    Add error handling for CSV writing operations to improve robustness

    Consider adding error handling for the CSV writing operation to ensure the program
    handles potential I/O errors gracefully.

    hydrography-approach/processing_scripts/associate_data/determine_final_osm_id.py [202-203]

    -df.to_csv(intermediate_association)
    -logger.info(f"{intermediate_association} file has been created successfully!")
    +try:
    +    df.to_csv(intermediate_association)
    +    logger.info(f"{intermediate_association} file has been created successfully!")
    +except IOError as e:
    +    logger.error(f"Failed to write {intermediate_association}: {str(e)}")
     
    Suggestion importance[1-10]: 9

    Why: Adding error handling for I/O operations is crucial for robustness, especially when dealing with file operations that can fail due to various reasons such as permission issues or disk space limitations.

    Add exception handling for robust error management

    For better error handling and debugging, consider adding exception handling around
    critical operations such as file operations and data processing.

    hydrography-approach/run-hydrography-pipeline.py [48]

    -config = load_config(state_name)
    +try:
    +    config = load_config(state_name)
    +except Exception as e:
    +    logger.error(f"Failed to load configuration: {e}")
    +    raise
     
    Suggestion importance[1-10]: 9

    Why: Adding exception handling around critical operations improves the robustness and debuggability of the code, making it easier to identify and handle errors.

    Use a more descriptive logger name

    Consider using a more specific logger name instead of __name__ to help differentiate
    logs from different parts of the application more easily.

    hydrography-approach/run-hydrography-pipeline.py [23]

    -logger = logging.getLogger(__name__)
    +logger = logging.getLogger("hydrography_pipeline")
     
    Suggestion importance[1-10]: 7

    Why: Using a more descriptive logger name can help differentiate logs from different parts of the application, improving maintainability and debugging.

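
As background for this suggestion: Python's dotted logger names form a hierarchy, so a descriptive name also allows per-subsystem configuration; the names below are illustrative, not the repository's:

```python
import logging

# Dotted logger names form a hierarchy: children inherit level and handlers
# from their parent unless they override them. Names here are illustrative.
parent = logging.getLogger("hydrography")
pipeline = logging.getLogger("hydrography.pipeline")
filters = logging.getLogger("hydrography.filter_data")

parent.setLevel(logging.WARNING)
pipeline.setLevel(logging.DEBUG)  # override for one subsystem only

print(pipeline.getEffectiveLevel() == logging.DEBUG)   # → True
print(filters.getEffectiveLevel() == logging.WARNING)  # → True (inherited)
```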
    Best practice
    Use exceptions instead of sys.exit for error handling

    Replace the direct sys.exit calls with exceptions to allow for better error handling
    and testing.

    hydrography-approach/processing_scripts/associate_data/determine_final_osm_id.py [208-209]

     if not nbi_points_gl.isValid():
         logger.error("NBI points layer failed to load!")
    -    sys.exit(1)
    +    raise Exception("Failed to load NBI points layer")
     
    Suggestion importance[1-10]: 8

    Why: Replacing sys.exit with exceptions allows for better error handling and makes the code more testable. This change improves the overall robustness and maintainability of the code.

    Avoid hardcoding file paths in the code

    It is recommended to avoid hardcoding file paths in the logging configuration.
    Consider using a configuration file or environment variable to manage the log file
    path.

    hydrography-approach/run-hydrography-pipeline.py [19]

    -filename="hydrography-pipeline.log",
    +filename=os.getenv("LOG_FILE_PATH", "hydrography-pipeline.log"),
     
    Suggestion importance[1-10]: 8

    Why: Using environment variables for file paths enhances flexibility and security, making the code more adaptable to different environments.

    Use structured logging for consistent log format

    Ensure consistent logging format by using structured logging for better traceability
    and analysis.

    hydrography-approach/processing_scripts/associate_data/determine_final_osm_id.py [203]

    -logger.info(f"{intermediate_association} file has been created successfully!")
    +logger.info("File creation successful", extra={"file_name": intermediate_association})
     
    Suggestion importance[1-10]: 7

    Why: Structured logging improves log readability and traceability, which is beneficial for debugging and monitoring. However, it is a minor enhancement compared to error handling.

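
One caveat worth noting for this suggestion: a field passed via extra only appears in output if the formatter references it. A small self-contained sketch (logger name and file name are illustrative):

```python
import io
import logging

# The formatter must reference the custom field for `extra` to show up.
buf = io.StringIO()
handler = logging.StreamHandler(buf)
handler.setFormatter(logging.Formatter("%(levelname)s %(message)s file=%(file_name)s"))

log = logging.getLogger("structured_demo")
log.setLevel(logging.INFO)
log.addHandler(handler)

# The file name here is illustrative; in the suggestion it is the
# intermediate_association path variable.
log.info("File creation successful", extra={"file_name": "intermediate_association.csv"})
print(buf.getvalue().strip())
# → INFO File creation successful file=intermediate_association.csv
```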
    Ensure proper file handling by using a context manager

    Use a context manager to ensure the CSV file is closed properly after operations,
    which is especially important in a loop or when handling large data.

    hydrography-approach/processing_scripts/associate_data/determine_final_osm_id.py [202]

    -df.to_csv(intermediate_association)
    +with open(intermediate_association, 'w', newline='') as file:
    +    df.to_csv(file)
     
    Suggestion importance[1-10]: 6

    Why: Using a context manager ensures that the file is properly closed after the operation, which is good practice. However, in the context of using pandas' to_csv method, this is less critical as pandas handles file opening and closing internally.

    Possible issue
    Ensure directories exist before use

    To ensure that directories are created before they are used, consider checking and
    creating necessary directories at the start of the script or function.

    hydrography-approach/run-hydrography-pipeline.py [52-54]

    -os.makedirs(
    -    config["output_data_folders"]["state_folder"],
    -    exist_ok=True,
    +if not os.path.exists(config["output_data_folders"]["state_folder"]):
    +    os.makedirs(config["output_data_folders"]["state_folder"])
     
    Suggestion importance[1-10]: 3

    Why: The existing code already ensures the directory exists using exist_ok=True, making this suggestion redundant and less impactful.


    @varun-andhra-mapup (Contributor, Author)

    Fixed logging configuration

    Updated the logging configuration to use a path specified in the config.yml file, replacing the hardcoded path. This enhances flexibility and ensures consistent logging across different environments.

    Commit: e05fbc2

    Code Improvement

    def main() -> None:
        ...
        # Configure logging after loading config
        log_file_path = config["logging"].get("log_file_path", "hydrography-pipeline.log")
        logging.basicConfig(
            filename=log_file_path,  # Use the path from the configuration
            level=logging.INFO,
            format="%(asctime)s - [%(levelname)s] - (%(filename)s).%(funcName)s - %(message)s",
        )
        ...
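
Assuming the lookup above, config.yml would need a matching section; this fragment is a hypothetical illustration of that shape, not the repository's actual file:

```yaml
# Hypothetical config.yml fragment matching config["logging"]["log_file_path"]
logging:
  log_file_path: logs/hydrography-pipeline.log
```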

    @varun-andhra-mapup (Contributor, Author) commented Aug 1, 2024

    Error Handling for Layer Loading

    Improved error handling for layer loading by adding detailed logging and maintaining process integrity. The function now logs specific errors if layers fail to load and exits immediately, as retries are not practical given the likely file path issues. Continuing without these layers is not possible since subsequent processes depend on their successful loading.

    Commit: 16da213

    Code Improvement

    import sys
    from typing import Optional, Tuple

    from qgis.core import QgsVectorLayer
    
    def load_layers(nbi_points_fp: str, osm_fp: str, logger) -> Tuple[Optional[QgsVectorLayer], Optional[QgsVectorLayer]]:
        """
        Load required layers with improved error handling.
    
        Args:
            nbi_points_fp (str): File path to the NBI points layer.
            osm_fp (str): File path to the OSM ways layer.
            logger: Logger instance for logging errors.
    
        Returns:
            Tuple[Optional[QgsVectorLayer], Optional[QgsVectorLayer]]:
                A tuple containing the NBI points layer and the OSM ways layer.
        """
        def load_layer(fp: str, layer_name: str) -> Optional[QgsVectorLayer]:
            """
            Load a layer and log errors if the loading fails.
    
            Args:
                fp (str): File path to the layer.
                layer_name (str): Name of the layer for logging purposes.
    
            Returns:
                Optional[QgsVectorLayer]: Loaded layer or None if failed.
            """
            layer = QgsVectorLayer(fp, layer_name, "ogr")
            if not layer.isValid():
                logger.error(f"{layer_name} layer failed to load. Check the file path and ensure the file exists.")
                return None
            return layer
    
        nbi_points_gl = load_layer(nbi_points_fp, "nbi-points")
        osm_gl = load_layer(osm_fp, "filtered")
    
        if nbi_points_gl is None:
            logger.error("NBI points layer is critical and could not be loaded. Exiting.")
            sys.exit(1)
    
        if osm_gl is None:
            logger.error("OSM ways layer could not be loaded. Exiting.")
            sys.exit(1)
    
        return nbi_points_gl, osm_gl

    @varun-andhra-mapup (Contributor, Author) commented Aug 1, 2024

    Improved Error Handling for CSV Writing

    Enhanced the robustness of the CSV file creation process by adding error handling for potential I/O errors. Previously, if there was an issue writing the CSV file, the application would fail silently without providing any indication of the failure.

    Commit: 16da213

    Code Improvement

    • Added a try-except block around the df.to_csv method to catch and log IOError exceptions.

    try:
        df.to_csv(intermediate_association)
        logger.info(f"{intermediate_association} file has been created successfully!")
    except IOError as e:
        logger.error(f"Failed to write {intermediate_association}: {str(e)}")

    @varun-andhra-mapup (Contributor, Author) commented Aug 1, 2024

    Ensure directory existence before use

    This change adds a check to see if directories already exist before attempting to create them. It also includes logging for both successful directory creation and pre-existing directories.

    Commit: c221f25

    Code Improvement

    • Implemented os.path.exists() to check directory existence.
    • Added logging for directory creation and pre-existing directories.
    • Improved error handling with try-except blocks.

    import logging
    import os
    from typing import Dict

    # Module-level logger assumed by this snippet
    logger = logging.getLogger(__name__)


    def create_directories(config: Dict[str, str]) -> None:
        """
        Create directories for output data as specified in the configuration.
        
        :param config: Dictionary containing paths for directories to be created.
        """
        output_folders = config.get("output_data_folders", {})
    
        # Define required folder keys
        required_folders = ["state_folder", "csv_files", "gpkg_files", "pbf_files"]
    
        for folder in required_folders:
            folder_path = output_folders.get(folder)
            if folder_path:
                if not os.path.exists(folder_path):
                    try:
                        os.makedirs(folder_path)
                        logger.info(f"Directory created: {folder_path}")
                    except Exception as e:
                        logger.error(f"Failed to create directory {folder_path}: {e}")
                else:
                    logger.info(f"Directory already exists: {folder_path}")
            else:
                logger.warning(f"Path for {folder} not specified in configuration.")

    This ensures directories are only created if they do not already exist and provides better feedback during execution.

    @varun-andhra-mapup (Contributor, Author)

    /review

    github-actions bot commented Aug 1, 2024

    Persistent review updated to latest commit c221f25

    @varun-andhra-mapup (Contributor, Author)

    Improved Error Handling in load_layers

    The load_layers function now incorporates a custom LayerLoadError exception to handle errors more gracefully. This change replaces the use of sys.exit(1) with exceptions that can be caught and managed at a higher level, offering better control over the application flow.

    Commit: d8fc364

    Code Improvement

    import logging
    from typing import Optional, Tuple

    from qgis.core import QgsVectorLayer


    class LayerLoadError(Exception):
        """Custom exception for errors in loading vector layers."""
    
        def __init__(self, layer_name: str, message: str):
            super().__init__(f"{layer_name} layer: {message}")
            self.layer_name = layer_name
    
    
    def load_layers(
        nbi_points_fp: str, osm_fp: str, logger: logging.Logger
    ) -> Tuple[Optional[QgsVectorLayer], Optional[QgsVectorLayer]]:
        """
        Load required layers with improved error handling.

        Returns:
            Tuple[Optional[QgsVectorLayer], Optional[QgsVectorLayer]]:
                A tuple containing the NBI points layer and the OSM ways layer.
        """
    
        def load_layer(fp: str, layer_name: str) -> Optional[QgsVectorLayer]:
            """
            Load a layer and log errors if the loading fails.
            """
            layer = QgsVectorLayer(fp, layer_name, "ogr")
            if not layer.isValid():
                logger.error(
                    f"{layer_name} layer failed to load. Check the file path and ensure the file exists."
                )
                raise LayerLoadError(layer_name, "Failed to load layer.")
            return layer
    
        try:
            nbi_points_gl = load_layer(nbi_points_fp, "nbi-points")
            osm_gl = load_layer(osm_fp, "filtered")
        except LayerLoadError as e:
            logger.error(e)
            raise  # Re-raise the exception to be handled at a higher level

        return nbi_points_gl, osm_gl
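
With sys.exit removed, the caller decides what a failure means. A sketch of that caller-side handling, using a stand-in copy of LayerLoadError so the control flow runs without QGIS; run_association_step is hypothetical:

```python
class LayerLoadError(Exception):
    """Stand-in matching the custom exception above (no QGIS required here)."""
    def __init__(self, layer_name: str, message: str):
        super().__init__(f"{layer_name} layer: {message}")
        self.layer_name = layer_name

def run_association_step(load_layers):
    """Hypothetical caller: turn a layer failure into a clean stage abort."""
    try:
        return load_layers()
    except LayerLoadError as e:
        # The caller, not the loading code, decides: skip, retry, or abort.
        print(f"Skipping association step: {e}")
        return None

def failing_loader():
    raise LayerLoadError("nbi-points", "Failed to load layer.")

print(run_association_step(failing_loader))           # → None (after the log message)
print(run_association_step(lambda: "layers-loaded"))  # → layers-loaded
```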
