Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix(monitoring): create duplicated pipelines #225

Merged
merged 6 commits into from
Dec 3, 2024

Conversation

zyy17
Copy link
Collaborator

@zyy17 zyy17 commented Nov 27, 2024

Summary by CodeRabbit

Release Notes

  • New Features

    • Introduced a new method for retrieving existing pipelines, enhancing monitoring status management.
    • Added functionality to manage a new Docker image for the Vector service in the setup script.
    • Implemented a new function to gather specific GreptimeDB resource information.
  • Bug Fixes

    • Improved pod filtering to only include running pods for metrics collection.
  • Chores

    • Updated dependencies for better module management.
    • Removed unnecessary command-line argument affecting standalone GreptimeDB configuration.
  • Documentation

    • Enhanced error handling and clarity in metrics collection processes.

Copy link
Contributor

coderabbitai bot commented Nov 27, 2024

Walkthrough

The pull request introduces multiple changes across several files, primarily enhancing the functionality of the MonitoringDeployer class to include a new method for retrieving pipelines from a MySQL database. It modifies the control flow to prioritize checking for existing pipelines before creation. Additionally, it updates command-line arguments for a standalone GreptimeDB instance, introduces new functions and variables in setup scripts, and refines pod filtering in metrics collection. The go.mod file is also updated to manage new dependencies.

Changes

File Change Summary
controllers/greptimedbcluster/deployers/monitoring.go - Added method getPipeline for retrieving pipelines from MySQL.
- Modified CheckAndUpdateStatus to use getPipeline first.
controllers/greptimedbstandalone/deployer.go - Removed --data-home argument from generateMainContainerArgs.
go.mod - Added dependencies: github.com/go-sql-driver/mysql v1.8.1 and github.com/prometheus/client_golang v1.16.0.
- Removed indirect dependency on github.com/prometheus/client_golang.
tests/e2e/setup/create-cluster.sh - Added variable VECTOR_IMAGE for Docker image.
- Introduced function pull_vector_image to manage image pulling.
tests/e2e/setup/diagnostic-cluster.sh - Added function dump_greptime_resources to gather GreptimeDB resource information.
pkg/metrics/metrics.go - Enhanced getPods method to filter only "Running" pods.
- Refined error handling in collectPodImagePullingDuration.

Possibly related PRs

🐰 In the meadow, changes bloom bright,
New pipelines flow, a wondrous sight.
With metrics refined and images in tow,
The GreptimeDB thrives, watch it grow!
A hop and a skip, we dance with glee,
For every new feature, a victory! 🌼


Thank you for using CodeRabbit. We offer it for free to the OSS community and would appreciate your support in helping us grow. If you find it useful, would you consider giving us a shout-out on your favorite social media?

❤️ Share
🪧 Tips

Chat

There are 3 ways to chat with CodeRabbit:

  • Review comments: Directly reply to a review comment made by CodeRabbit. Example:
    • I pushed a fix in commit <commit_id>, please review it.
    • Generate unit testing code for this file.
    • Open a follow-up GitHub issue for this discussion.
  • Files and specific lines of code (under the "Files changed" tab): Tag @coderabbitai in a new review comment at the desired location with your query. Examples:
    • @coderabbitai generate unit testing code for this file.
    • @coderabbitai modularize this function.
  • PR comments: Tag @coderabbitai in a new PR comment to ask questions about the PR branch. For the best results, please provide a very specific query, as very limited context is provided in this mode. Examples:
    • @coderabbitai gather interesting stats about this repository and render them as a table. Additionally, render a pie chart showing the language distribution in the codebase.
    • @coderabbitai read src/utils.ts and generate unit testing code.
    • @coderabbitai read the files in the src/scheduler package and generate a class diagram using mermaid and a README in the markdown format.
    • @coderabbitai help me debug CodeRabbit configuration file.

Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments.

CodeRabbit Commands (Invoked using PR comments)

  • @coderabbitai pause to pause the reviews on a PR.
  • @coderabbitai resume to resume the paused reviews.
  • @coderabbitai review to trigger an incremental review. This is useful when automatic reviews are disabled for the repository.
  • @coderabbitai full review to do a full review from scratch and review all the files again.
  • @coderabbitai summary to regenerate the summary of the PR.
  • @coderabbitai resolve resolve all the CodeRabbit review comments.
  • @coderabbitai configuration to show the current CodeRabbit configuration for the repository.
  • @coderabbitai help to get help.

Other keywords and placeholders

  • Add @coderabbitai ignore anywhere in the PR description to prevent this PR from being reviewed.
  • Add @coderabbitai summary to generate the high-level summary at a specific location in the PR description.
  • Add @coderabbitai anywhere in the PR title to generate the title automatically.

Documentation and Community

  • Visit our Documentation for detailed information on how to use CodeRabbit.
  • Join our Discord Community to get help, request features, and share feedback.
  • Follow us on X/Twitter for updates and announcements.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

📜 Review details

Configuration used: .coderabbit.yaml
Review profile: CHILL

📥 Commits

Reviewing files that changed from the base of the PR and between 6c905bf and e2b5498.

⛔ Files ignored due to path filters (1)
  • go.sum is excluded by !**/*.sum
📒 Files selected for processing (3)
  • controllers/greptimedbcluster/deployers/monitoring.go (4 hunks)
  • controllers/greptimedbstandalone/deployer.go (0 hunks)
  • go.mod (2 hunks)
💤 Files with no reviewable changes (1)
  • controllers/greptimedbstandalone/deployer.go
🔇 Additional comments (2)
go.mod (2)

16-16: LGTM!

The prometheus client dependency change from indirect to direct with the same version is appropriate for the monitoring-related changes.


9-9: Verify the MySQL driver version.

The addition of mysql driver v1.8.1 for database interactions needs verification for stability and security concerns.

✅ Verification successful

MySQL driver v1.8.1 is secure and up-to-date

The verification shows that:

  • v1.8.1 is the latest available version of the mysql driver
  • No security vulnerabilities were found in the GitHub Security Advisory Database
🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Check for security advisories and latest versions of the mysql driver

# Check for latest version on pkg.go.dev
curl -s "https://pkg.go.dev/github.com/go-sql-driver/mysql?tab=versions" | grep -o 'v[0-9]\+\.[0-9]\+\.[0-9]\+'

# Check for security advisories
gh api graphql -f query='
{
  securityVulnerabilities(first: 5, ecosystem: GO, package: "github.com/go-sql-driver/mysql") {
    nodes {
      advisory {
        summary
        severity
        publishedAt
      }
      vulnerableVersionRange
      firstPatchedVersion {
        identifier
      }
    }
  }
}'

Length of output: 1344

controllers/greptimedbcluster/deployers/monitoring.go Outdated Show resolved Hide resolved
controllers/greptimedbcluster/deployers/monitoring.go Outdated Show resolved Hide resolved
Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

📜 Review details

Configuration used: .coderabbit.yaml
Review profile: CHILL

📥 Commits

Reviewing files that changed from the base of the PR and between e2b5498 and 3f82ecf.

⛔ Files ignored due to path filters (1)
  • tests/e2e/testdata/resources/cluster/enable-monitoring/cluster.yaml is excluded by !tests/e2e/testdata/**
📒 Files selected for processing (1)
  • tests/e2e/setup/create-cluster.sh (2 hunks)

tests/e2e/setup/create-cluster.sh Show resolved Hide resolved
tests/e2e/setup/create-cluster.sh Show resolved Hide resolved
Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Outside diff range and nitpick comments (1)
tests/e2e/setup/diagnostic-cluster.sh (1)

29-32: LGTM! Consider enhancing error handling and output formatting.

The function correctly retrieves GreptimeDB resources. Consider these improvements:

  1. Add error handling in case CRDs aren't installed
  2. Add output formatting for better readability
 function dump_greptime_resources() {
+  echo "===> Dumping GreptimeDB Cluster resources"
+  if ! kubectl api-resources | grep -q "greptimedbclusters.greptime.io"; then
+    echo "Warning: GreptimeDB Cluster CRD not found"
+    return
+  fi
   kubectl get greptimedbclusters.greptime.io -A
+  echo "===> Dumping GreptimeDB Standalone resources"
+  if ! kubectl api-resources | grep -q "greptimedbstandalones.greptime.io"; then
+    echo "Warning: GreptimeDB Standalone CRD not found"
+    return
+  fi
   kubectl get greptimedbstandalones.greptime.io -A
 }
📜 Review details

Configuration used: .coderabbit.yaml
Review profile: CHILL

📥 Commits

Reviewing files that changed from the base of the PR and between 3f82ecf and f3cfd5d.

📒 Files selected for processing (2)
  • tests/e2e/helper/helper.go (1 hunks)
  • tests/e2e/setup/diagnostic-cluster.sh (2 hunks)
✅ Files skipped from review due to trivial changes (1)
  • tests/e2e/helper/helper.go
🔇 Additional comments (1)
tests/e2e/setup/diagnostic-cluster.sh (1)

48-48: LGTM! Logical placement of the new function call.

The new function is appropriately placed in the diagnostic sequence, between basic information gathering and event collection.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🧹 Outside diff range and nitpick comments (2)
controllers/greptimedbcluster/deployers/monitoring.go (2)

98-111: Improve error message formatting

The error message uses single quotes which is inconsistent with Go conventions. Go typically uses double quotes for strings.

Apply this diff:

-			klog.Errorf("failed to get pipeline for standalone, err: '%v'", err)
+			klog.Errorf("failed to get pipeline for standalone, err: %v", err)
-				klog.Errorf("failed to create pipeline for standalone, err: '%v'", err)
+				klog.Errorf("failed to create pipeline for standalone, err: %v", err)

221-227: Improve error handling for table existence check

The current error check for table existence is fragile as it relies on string matching. Consider using a more robust approach.

Apply this diff:

-	rows, err := db.QueryContext(ctx, "SELECT 1 FROM pipelines LIMIT 1")
-	if err != nil {
-		if strings.Contains(err.Error(), "TableNotFound") {
-			return "", nil
-		}
-		return "", err
-	}
+	var exists bool
+	err = db.QueryRowContext(ctx, `
+		SELECT EXISTS (
+			SELECT 1 FROM information_schema.tables 
+			WHERE table_schema = ? AND table_name = 'pipelines'
+		)`, "greptime_private").Scan(&exists)
+	if err != nil {
+		return "", fmt.Errorf("failed to check table existence: %v", err)
+	}
+	if !exists {
+		return "", nil
+	}
🧰 Tools
🪛 GitHub Check: Build the project

[failure] 221-221:
unreachable code

🪛 GitHub Check: Run unit tests

[failure] 221-221:
unreachable code

📜 Review details

Configuration used: .coderabbit.yaml
Review profile: CHILL

📥 Commits

Reviewing files that changed from the base of the PR and between 4a156af and 1d91260.

📒 Files selected for processing (1)
  • controllers/greptimedbcluster/deployers/monitoring.go (4 hunks)
🧰 Additional context used
🪛 GitHub Check: Build the project
controllers/greptimedbcluster/deployers/monitoring.go

[failure] 221-221:
unreachable code

🪛 GitHub Check: Run unit tests
controllers/greptimedbcluster/deployers/monitoring.go

[failure] 221-221:
unreachable code

🔇 Additional comments (2)
controllers/greptimedbcluster/deployers/monitoring.go (2)

20-20: LGTM: Required imports added for database interaction

The addition of database/sql and github.com/go-sql-driver/mysql imports is appropriate for the new database interaction functionality.

Also applies to: 30-30


221-230: ⚠️ Potential issue

Fix resource management for database rows

The rows objects should be properly managed using defer to ensure they're always closed, even in error cases.

Apply this diff:

 	rows, err := db.QueryContext(ctx, "SELECT 1 FROM pipelines LIMIT 1")
 	if err != nil {
 		if strings.Contains(err.Error(), "TableNotFound") {
 			return "", nil
 		}
 		return "", err
 	}
+	defer rows.Close()
-	if rows != nil {
-		rows.Close()
-	}

 	pipelineName := common.LogsPipelineName(cluster.Namespace, cluster.Name)
 	rows, err = db.QueryContext(ctx, "SELECT pipeline FROM pipelines WHERE name = ? LIMIT 1", pipelineName)
 	if err != nil {
 		return "", err
 	}
+	defer rows.Close()

 	if rows.Next() {
 		var pipeline string
 		if err := rows.Scan(&pipeline); err != nil {
 			return "", err
 		}
 		return pipeline, nil
 	}
-
-	if rows != nil {
-		rows.Close()
-	}

Also applies to: 233-249

🧰 Tools
🪛 GitHub Check: Build the project

[failure] 221-221:
unreachable code

🪛 GitHub Check: Run unit tests

[failure] 221-221:
unreachable code

controllers/greptimedbcluster/deployers/monitoring.go Outdated Show resolved Hide resolved
controllers/greptimedbcluster/deployers/monitoring.go Outdated Show resolved Hide resolved
@zyy17 zyy17 force-pushed the fix/duplicate-pipeline branch 2 times, most recently from 7656ec8 to a153da1 Compare December 2, 2024 10:00
Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🧹 Outside diff range and nitpick comments (1)
controllers/greptimedbcluster/deployers/monitoring.go (1)

98-111: Enhance error handling and logging

While the logic flow is correct, consider these improvements:

  1. Add more context to error messages
  2. Use structured logging with fields
-       pipeline, err := d.getPipeline(ctx, cluster)
+       pipeline, err := d.getPipeline(ctx, cluster)
        if err != nil {
-           klog.Errorf("failed to get pipeline for standalone, err: '%v'", err)
+           klog.ErrorS(err, "failed to get pipeline",
+               "cluster", klog.KObj(cluster),
+               "namespace", cluster.Namespace)
            return false, err
        }

        if pipeline == "" {
-           klog.Infof("create pipeline '%s' for standalone monitoring", common.LogsPipelineName(cluster.Namespace, cluster.Name))
+           pipelineName := common.LogsPipelineName(cluster.Namespace, cluster.Name)
+           klog.InfoS("creating pipeline", 
+               "pipeline", pipelineName,
+               "cluster", klog.KObj(cluster))
            if err := d.createPipeline(cluster); err != nil {
-               klog.Errorf("failed to create pipeline for standalone, err: '%v'", err)
+               klog.ErrorS(err, "failed to create pipeline",
+                   "pipeline", pipelineName,
+                   "cluster", klog.KObj(cluster))
                return false, err
            }
        }
📜 Review details

Configuration used: .coderabbit.yaml
Review profile: CHILL

📥 Commits

Reviewing files that changed from the base of the PR and between 7656ec8 and a153da1.

📒 Files selected for processing (2)
  • controllers/greptimedbcluster/deployers/monitoring.go (4 hunks)
  • tests/e2e/setup/create-cluster.sh (3 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • tests/e2e/setup/create-cluster.sh
🔇 Additional comments (4)
controllers/greptimedbcluster/deployers/monitoring.go (4)

20-20: LGTM: Required imports added for database functionality

The new imports for database/sql and mysql driver are appropriate for the added database interaction functionality.

Also applies to: 30-30


218-227: ⚠️ Potential issue

Fix resource leak in table existence check

The current implementation may leak database resources. Use defer immediately after successful query.

 rows, err := db.QueryContext(ctx, "SELECT 1 FROM pipelines LIMIT 1")
 if err != nil {
     if strings.Contains(err.Error(), "TableNotFound") {
         return "", nil
     }
     return "", err
 }
+defer rows.Close()
-if rows != nil {
-    rows.Close()
-}

230-246: ⚠️ Potential issue

Fix resource leak in pipeline query

Similar to the table check, this query also needs proper resource cleanup.

 rows, err = db.QueryContext(ctx, "SELECT pipeline FROM pipelines WHERE name = ? LIMIT 1", pipelineName)
 if err != nil {
     return "", err
 }
+defer rows.Close()

 if rows.Next() {
     var pipeline string
     if err := rows.Scan(&pipeline); err != nil {
         return "", err
     }
     return pipeline, nil
 }

-if rows != nil {
-    rows.Close()
-}

211-216: 🛠️ Refactor suggestion

Add connection pool configuration

The database connection should be configured with proper pool settings.

 db, err := sql.Open("mysql", cfg.FormatDSN())
 if err != nil {
     return "", err
 }
 defer db.Close()
+
+// Configure connection pool
+db.SetMaxOpenConns(defaultMaxOpenConns)
+db.SetMaxIdleConns(defaultMaxIdleConns)
+db.SetConnMaxLifetime(defaultConnMaxLifetime)
+
+// Test connection
+if err := db.PingContext(ctx); err != nil {
+    return "", fmt.Errorf("failed to connect to database: %w", err)
+}

Likely invalid or redundant comment.

@zyy17 zyy17 force-pushed the fix/duplicate-pipeline branch from a153da1 to f16e49e Compare December 2, 2024 14:16
Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Outside diff range and nitpick comments (1)
controllers/greptimedbcluster/deployers/monitoring.go (1)

234-237: Enhance error handling for row scanning

Add more context to the scan error to help with debugging.

Apply this diff:

 if rows.Next() {
     var pipeline string
     if err := rows.Scan(&pipeline); err != nil {
-        return "", err
+        return "", fmt.Errorf("failed to scan pipeline for '%s': %w", pipelineName, err)
     }
     return pipeline, nil
 }
📜 Review details

Configuration used: .coderabbit.yaml
Review profile: CHILL

📥 Commits

Reviewing files that changed from the base of the PR and between a153da1 and f16e49e.

📒 Files selected for processing (2)
  • controllers/greptimedbcluster/deployers/monitoring.go (4 hunks)
  • tests/e2e/setup/create-cluster.sh (3 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • tests/e2e/setup/create-cluster.sh
🔇 Additional comments (5)
controllers/greptimedbcluster/deployers/monitoring.go (5)

20-20: LGTM: Required imports added for database operations

The new imports for database/sql and github.com/go-sql-driver/mysql are appropriate for the database operations introduced.

Also applies to: 30-30


204-210: 🛠️ Refactor suggestion

Make database configuration more configurable

The database configuration values are hardcoded. Consider:

  1. Moving configuration to CRD spec or environment variables
  2. Adding connection pool settings

211-216: 🛠️ Refactor suggestion

Add connection pooling and testing

The database connection should be configured with proper connection pooling settings and tested before use.


233-241: ⚠️ Potential issue

Fix resource leak in rows handling

The defer rows.Close() statement after rows.Next() is ineffective as it comes after a return statement. Move it immediately after the query.

Apply this diff:

 rows, err = db.QueryContext(ctx, "SELECT pipeline FROM pipelines WHERE name = ? LIMIT 1", pipelineName)
 if err != nil {
     return "", err
 }
+defer rows.Close()

 if rows.Next() {
     var pipeline string
     if err := rows.Scan(&pipeline); err != nil {
         return "", err
     }
     return pipeline, nil
 }
-defer rows.Close()

 return "", nil

Likely invalid or redundant comment.


98-111: Consider handling concurrent pipeline creation

While the logic to check before creating is good, there's a potential race condition if multiple instances check simultaneously and find no pipeline exists. Consider adding a distributed locking mechanism or handling the case where pipeline creation fails due to it already existing.

Let's check if there are any existing distributed locking mechanisms in use:

@zyy17 zyy17 requested a review from daviderli614 December 2, 2024 14:27
@zyy17 zyy17 merged commit 3b0b025 into GreptimeTeam:main Dec 3, 2024
4 checks passed
@zyy17 zyy17 deleted the fix/duplicate-pipeline branch December 3, 2024 08:15
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants