Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

New metricbeat module: pgbouncer #41174

Open
wants to merge 44 commits into
base: main
Choose a base branch
from

Conversation

manuelsaks
Copy link

@manuelsaks manuelsaks commented Oct 8, 2024

  • WHAT:
    The module follows the standard design patterns of Metricbeat modules, ensuring consistency with other database modules like PostgreSQL. It implements the MetricSet interface to collect metrics from PgBouncer.
    The module interacts with PgBouncer using SQL queries to gather pool statistics and server performance data. The data is parsed and processed through Metricbeat's internal processing pipeline. It communicates with PgBouncer using the native Postgres protocol over TCP. It sends queries to fetch relevant metrics and processes the responses, converting them into Metricbeat-compatible events.
  • WHY:
    PGBouncer is widely used as a connection pooler for PostgreSQL, and monitoring its performance is crucial for maintaining the health of database-backed systems. By adding this module, Metricbeat provides users with out-of-the-box monitoring capabilities for PGBouncer, enabling them to track metrics such as connection pool utilization, server performance, and query rates. This integration helps users maintain optimal database performance, reduce latency, and prevent resource exhaustion in their systems.
    -->

Checklist

  • My code follows the style guidelines of this project
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • I have made corresponding change to the default configuration files
  • I have added tests that prove my fix is effective or that my feature works
  • I have added an entry in CHANGELOG.next.asciidoc or CHANGELOG-developer.next.asciidoc.

Disruptive User Impact

It shouldn't impact users.

Author's Checklist

  • Code follows the established coding guidelines and style.
  • Proper error handling is in place.
  • Tests passing and the coverage > 80%.
  • Code builds as expected.
  • Data collection is efficient, and no unnecessary queries are run.
  • Documentation is updated to reflect the new module, including how to configure and use it.

How to test this PR locally

  1. Run the docker-compose stack and wait until you will be able to reach kibana on localhost:5601
docker-compose -f ./metricbeat/module/pgbouncer/docker-compose.yml up -d
  1. Run the metricbeat agent:
go run ./metricbeat/main.go -c ./metricbeat/module/pgbouncer/_meta/config_local.yml
  1. Go to kibana panel, create Data View with the test pattern and check logs in discover.

@manuelsaks manuelsaks requested review from a team as code owners October 8, 2024 13:57
@botelastic botelastic bot added the needs_team Indicates that the issue/PR needs a Team:* label label Oct 8, 2024
Copy link

cla-checker-service bot commented Oct 8, 2024

💚 CLA has been signed

Copy link
Contributor

mergify bot commented Oct 8, 2024

This pull request is now in conflicts. Could you fix it? 🙏
To fixup this pull request, you can check out it locally. See documentation: https://help.github.com/articles/checking-out-pull-requests-locally/

git fetch upstream
git checkout -b module_pgbouncer upstream/module_pgbouncer
git merge upstream/main
git push upstream module_pgbouncer

Copy link
Contributor

mergify bot commented Oct 8, 2024

This pull request does not have a backport label.
If this is a bug or security fix, could you label this PR @manuelsaks? 🙏.
For such, you'll need to label your PR with:

  • The upcoming major version of the Elastic Stack
  • The upcoming minor version of the Elastic Stack (if you're not pushing a breaking change)

To fixup this pull request, you need to add the backport labels for the needed
branches, such as:

  • backport-8./d is the label to automatically backport to the 8./d branch. /d is the digit

Copy link
Contributor

mergify bot commented Oct 8, 2024

backport-8.x has been added to help with the transition to the new branch 8.x.
If you don't need it please use backport-skip label and remove the backport-8.x label.

@mergify mergify bot added the backport-8.x Automated backport to the 8.x branch with mergify label Oct 8, 2024
@pierrehilbert pierrehilbert added the Team:Elastic-Agent-Data-Plane Label for the Agent Data Plane team label Oct 8, 2024
@elasticmachine
Copy link
Collaborator

Pinging @elastic/elastic-agent-data-plane (Team:Elastic-Agent-Data-Plane)

@botelastic botelastic bot removed the needs_team Indicates that the issue/PR needs a Team:* label label Oct 8, 2024
@pierrehilbert
Copy link
Collaborator

Thanks @manuelsaks for your contribution.
As a first step before we are reviewing your work, could you please sign our CLA?

@pierrehilbert
Copy link
Collaborator

@lalit-satapathy as this is related to PostgreSQL, would you mind assigning someone from your team for the reviewing part?

@pierrehilbert
Copy link
Collaborator

@manuelsaks it looks like your tests are failing, could you please fix them?

=== FAIL: metricbeat/module/pgbouncer TestQueryStats (5.66s)

@manuelsaks
Copy link
Author

@manuelsaks it looks like your tests are failing, could you please fix them?

=== FAIL: metricbeat/module/pgbouncer TestQueryStats (5.66s)

I don't have an access to the buildkite logs.
What is exactly wrong? Because it's working locally.

Running tool: C:\Program Files\Go\bin\go.exe test -timeout 30s -run ^TestQueryStats$ github.com/elastic/beats/v7/metricbeat/module/pgbouncer
ok  	github.com/elastic/beats/v7/metricbeat/module/pgbouncer	8.825s

@mauri870
Copy link
Member

mauri870 commented Oct 9, 2024

@manuelsaks Thanks for willing to look into the CI issue.

I spotted this error in the logs:

ERROR: for pgbouncer  Cannot start service pgbouncer: failed to create task for container: failed to create shim task: OCI runtime create failed: runc create failed: unable to start container process: error during container init: error mounting "/go/src/github.com/elastic/beats/metricbeat/module/pgbouncer/_meta/userlist.txt" to rootfs at "/etc/pgbouncer/userlist.txt": create mount destination for /etc/pgbouncer/userlist.txt mount: cannot mkdir in /var/lib/docker/overlay2/3883fa799c9c240081c07e9928532040d78e2836a7d53843d8f77634ca22bbc5/merged/etc/pgbouncer/userlist.txt: not a directory: unknown: Are you trying to mount a directory onto a file (or vice-versa)? Check if the specified host path exists and is the expected type
Encountered errors while bringing up the project.

@manuelsaks
Copy link
Author

manuelsaks commented Oct 10, 2024

@manuelsaks Thanks for willing to look into the CI issue.

I spotted this error in the logs:

ERROR: for pgbouncer  Cannot start service pgbouncer: failed to create task for container: failed to create shim task: OCI runtime create failed: runc create failed: unable to start container process: error during container init: error mounting "/go/src/github.com/elastic/beats/metricbeat/module/pgbouncer/_meta/userlist.txt" to rootfs at "/etc/pgbouncer/userlist.txt": create mount destination for /etc/pgbouncer/userlist.txt mount: cannot mkdir in /var/lib/docker/overlay2/3883fa799c9c240081c07e9928532040d78e2836a7d53843d8f77634ca22bbc5/merged/etc/pgbouncer/userlist.txt: not a directory: unknown: Are you trying to mount a directory onto a file (or vice-versa)? Check if the specified host path exists and is the expected type
Encountered errors while bringing up the project.

I successfully recreated the issue in my own temporary buildkite pipeline and confirmed that it should be resolved with the commit 5e8158d.

@pierrehilbert
Copy link
Collaborator

/test

@pierrehilbert
Copy link
Collaborator

run docs-build

@manuelsaks
Copy link
Author

Can you provide me details/logs about those new errors?

@pierrehilbert
Copy link
Collaborator

Every tests are failing on the windows runner for the same reason:

=== Failed
--
  | === FAIL: metricbeat/module/pgbouncer TestDBConnection (0.31s)
  | metricset_test.go:94: failed to start service 'pgbouncer': ps: failed to get container list: failed to get list of service names: exec: "docker-compose": executable file not found in %PATH%
  | metricset_test.go:94: kill container error: exec: "docker-compose": executable file not found in %PATH%
  | metricset_test.go:94: remove container error: exec: "docker-compose": executable file not found in %PATH%
  | metricset_test.go:94: failed to start service 'pgbouncer': ps: failed to get container list: failed to get list of service names: exec: "docker-compose": executable file not found in %PATH%
  | metricset_test.go:94: kill container error: exec: "docker-compose": executable file not found in %PATH%
  | metricset_test.go:94: remove container error: exec: "docker-compose": executable file not found in %PATH%
  | metricset_test.go:94: failed to start service 'pgbouncer': ps: failed to get container list: failed to get list of service names: exec: "docker-compose": executable file not found in %PATH%
  | metricset_test.go:94: kill container error: exec: "docker-compose": executable file not found in %PATH%
  | metricset_test.go:94: remove container error: exec: "docker-compose": executable file not found in %PATH%
  | metricset_test.go:94: getting host for pgbouncer: ps: failed to get container list: failed to get list of service names: exec: "docker-compose": executable file not found in %PATH%

@manuelsaks
Copy link
Author

Every tests are failing on the windows runner for the same reason:

=== Failed
--
  | === FAIL: metricbeat/module/pgbouncer TestDBConnection (0.31s)
  | metricset_test.go:94: failed to start service 'pgbouncer': ps: failed to get container list: failed to get list of service names: exec: "docker-compose": executable file not found in %PATH%
  | metricset_test.go:94: kill container error: exec: "docker-compose": executable file not found in %PATH%
  | metricset_test.go:94: remove container error: exec: "docker-compose": executable file not found in %PATH%
  | metricset_test.go:94: failed to start service 'pgbouncer': ps: failed to get container list: failed to get list of service names: exec: "docker-compose": executable file not found in %PATH%
  | metricset_test.go:94: kill container error: exec: "docker-compose": executable file not found in %PATH%
  | metricset_test.go:94: remove container error: exec: "docker-compose": executable file not found in %PATH%
  | metricset_test.go:94: failed to start service 'pgbouncer': ps: failed to get container list: failed to get list of service names: exec: "docker-compose": executable file not found in %PATH%
  | metricset_test.go:94: kill container error: exec: "docker-compose": executable file not found in %PATH%
  | metricset_test.go:94: remove container error: exec: "docker-compose": executable file not found in %PATH%
  | metricset_test.go:94: getting host for pgbouncer: ps: failed to get container list: failed to get list of service names: exec: "docker-compose": executable file not found in %PATH%

I missed adding build tags, which caused the unit test pipelines to also run integration tests. This failed on Windows Server agents without docker-compose.

I've added the necessary build tags and successfully tested with commit fdd7d0f on Windows Server 2022.

@pierrehilbert
Copy link
Collaborator

/test

@manuelsaks
Copy link
Author

manuelsaks commented Oct 11, 2024

I'm not sure if the error from the x-pack/metricbeat: Python Integration Tests pipeline is related to my changes.

https://buildkite.com/elastic/beats-xpack-metricbeat/builds/7926#01927bde-dd3c-4a49-ad9b-c828f1e5cfc6

I merged from main to have the latest changes.

@pierrehilbert
Copy link
Collaborator

/test

1 similar comment
@pierrehilbert
Copy link
Collaborator

/test

@pierrehilbert
Copy link
Collaborator

We are moving in the right direction, one last error:

__________________________________________________________________ Test.test_index_management __________________________________________________________________
--
  |  
  | self = <test_xpack_base.Test testMethod=test_index_management>
  |  
  | @unittest.skipUnless(INTEGRATION_TESTS, "integration test")
  | def test_index_management(self):
  | """
  | Test that the template can be loaded with `setup --index-management`
  | """
  | es = Elasticsearch([self.get_elasticsearch_url()])
  | self.render_config_template(
  | modules=[{
  | "name": "apache",
  | "metricsets": ["status"],
  | "hosts": ["localhost"],
  | }],
  | elasticsearch={"host": self.get_elasticsearch_url()},
  | )
  | exit_code = self.run_beat(extra_args=["setup", "--index-management", "-E", "setup.template.overwrite=true"])
  |  
  | >       assert exit_code == 0
  | E       assert 1 == 0
  |  
  | ../../metricbeat/tests/system/test_base.py:57: AssertionError

@pierrehilbert
Copy link
Collaborator

/test

@manuelsaks
Copy link
Author

manuelsaks commented Oct 14, 2024

The problem is:

{
  "error": {
    "root_cause": [
      {
        "type": "illegal_argument_exception",
        "reason": "composable template [metricbeat-9.0.0] template after composition is invalid"
      }
    ],
    "type": "illegal_argument_exception",
    "reason": "composable template [metricbeat-9.0.0] template after composition is invalid",
    "caused_by": {
      "type": "illegal_argument_exception",
      "reason": "invalid composite mappings for [metricbeat-9.0.0]",
      "caused_by": {
        "type": "illegal_argument_exception",
        "reason": "Limit of total fields [10000] has been exceeded"
      }
    }
  },
  "status": 400
}

This occurs when running the test_index_management function in beats/metricbeat/tests/system/test_base.py:

def test_index_management(self):
    """
    Test that the template can be loaded with `setup --index-management`
    """
    es = Elasticsearch([self.get_elasticsearch_url()])
    self.render_config_template(
        modules=[{
            "name": "apache",
            "metricsets": ["status"],
            "hosts": ["localhost"],
        }],
        elasticsearch={"host": self.get_elasticsearch_url()},
    )
    exit_code = self.run_beat(extra_args=["setup", "--index-management", "-E", "setup.template.overwrite=true"])

    assert exit_code == 0
    assert self.log_contains('Loaded index template')
    assert len(es.cat.templates(name='metricbeat-*', h='name')) > 0

If I increase the field limit from the default: 10000:

exit_code = self.run_beat(extra_args=["setup", "--index-management", "-E", "setup.template.overwrite=true"])

to 10090:

exit_code = self.run_beat(extra_args=["setup", "--index-management", "-E", "setup.template.overwrite=true", "-E", "setup.template.settings.index.mapping.total_fields.limit=10090"])

the test passes.

Please advise what should I do now.

@pierrehilbert
Copy link
Collaborator

/test

@pierrehilbert
Copy link
Collaborator

/test

@pierrehilbert
Copy link
Collaborator

/test

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
backport-8.x Automated backport to the 8.x branch with mergify Team:Elastic-Agent-Data-Plane Label for the Agent Data Plane team
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants