Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

133/prometheus #173

Merged
merged 11 commits into from
Apr 28, 2021
Merged

133/prometheus #173

merged 11 commits into from
Apr 28, 2021

Conversation

bartmacbartek
Copy link
Contributor

Closes #133

prometheus.yaml Outdated
Comment on lines 1 to 10
global:
scrape_interval: 15s
evaluation_interval: 15s

scrape_configs:
- job_name: confio_relayer

scrape_interval: 5s
static_configs:
- targets: ['host.docker.internal:26660']
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is an example configuration file. Is this the right place to store it?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good to commit it to this repo, so it is maintained.

You could put it in a samples or ops dir. We can add example systemd and other configs there as needed.
(Maybe we move it together with the registry.conf file?)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I realized that init command has registry.yaml hardcoded to main branch, so I cannot move registry under samples.

'https://raw.githubusercontent.com/confio/ts-relayer/main/demo/registry.yaml'

One way to "fix" this would be to use parametrized link with version:

`https://raw.githubusercontent.com/confio/ts-relayer/tree/v${packageJson.version}/demo/registry.yaml`

But it fixes the future versions only. Everything till 0.1.3 included will break.

👉 For now, I moved prometheus.yaml under demo. 92287b8

Copy link
Contributor

@ethanfrey ethanfrey Apr 29, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And let's keep it in demo for now (marked deprecated), and make the main copy in another directory (final location)

I am not sure about fixing to a version. We can think about a better location, but nice to be able to roll out registry without requiring upgrades

return getStubbedMetrics();
}

client.collectDefaultMetrics({ prefix: 'confio_relayer_' });
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@@ -262,5 +294,7 @@ async function relayerLoop(link: Link, options: Options, logger: Logger) {
logger.info(`Sleeping ${options.poll} seconds...`);
await sleep(options.poll * 1000);
logger.info('... waking up and checking for packets!');

metrics?.pollCounter?.inc();
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

With nullish coalescing operator, we can collect metrics nicely w/o condition blocks (if metrics are disabled).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is neat, but I find it less clear than an if statement.

Comment on lines +11 to +14
pollCounter: new client.Counter({
name: 'poll_counter',
help: 'Poll counter',
}),
Copy link
Contributor Author

@bartmacbartek bartmacbartek Apr 27, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is only an example of a custom metric that should be removed. Let's replace this with some real ones.

Copy link
Contributor

@alpe alpe Apr 29, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Very good start! Counter makes sense here for this metric. 👍

Some notes for a future version:
I have noticed though that the naming does not fit the others, yet:

confio_relayer_nodejs_gc_duration_seconds_count{kind="minor"} 80
poll_counter 13

With a common prefix we can find all relayer metrics easily. Otherwise some context is missing as a prometheus instance is likely not exclusive to the relayer.

No idea if the relayer is always bidirectional but as this would involve 2 chains the semantics of poll is not fully clear to me. You can use labels to mark which chain you polled.

Naming is always hard. You don't need to carry the type inside the name. Maybe "_count" or "_total" is a better fit. I always found this page very helpful: https://prometheus.io/docs/practices/naming/

@@ -262,5 +294,7 @@ async function relayerLoop(link: Link, options: Options, logger: Logger) {
logger.info(`Sleeping ${options.poll} seconds...`);
await sleep(options.poll * 1000);
logger.info('... waking up and checking for packets!');

metrics?.pollCounter?.inc();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is neat, but I find it less clear than an if statement.

};
}

export type Metrics = ReturnType<typeof setupPrometheus>;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not define the type here and use it in the function definition?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree it's safer than relying on the inference.

af3e18c

port,
logger,
}: {
enabled: boolean;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we make these readonly?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Of course! 4d93ebd


import { Logger } from '../create-logger';

let initialized = false;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hm, do we need some kind of global state?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's just nice to have to make sure we don't initialize the HTTP server more than once: https://github.com/confio/ts-relayer/pull/173/files#diff-95ad111838746b4a0030c8e37ff66c5de9df11b1a2eab5aa3f59895c9f70016cR28-R32

I thought of implementing a singleton here but since it takes arguments like enabled I decided to just throw on second call.

@changeset-bot
Copy link

changeset-bot bot commented Apr 28, 2021

🦋 Changeset detected

Latest commit: 19664fd

The changes in this PR will be included in the next version bump.

This PR includes changesets to release 1 package
Name Type
@confio/relayer Patch

Not sure what this means? Click here to learn what changesets are.

Click here if you're a maintainer who wants to add another changeset to this PR

Copy link
Contributor

@ethanfrey ethanfrey left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good.

A few minor points that can be addressed in a follow up PR. (There are more prometheus tasks)


> **NOTE:** Metrics can also be enabled via an environment variable `RELAYER_ENABLE_METRICS=true`, or with an `enableMetrics: true` entry in the `app.yaml` file, as explained in the [config specification](./spec/config.md#configuration).

The `GET /metrics` endpoint will be exposed by default on port `26660`, which you can override with `--metrics-port` flag, `RELAYER_METRICS_PORT` env variable, or `metricsPort` entry in `app.yaml`.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

very nice list of options to set the port. 🌷
Just a note: port 26660 is used by Tendermint as their default port for Prometheus and therefore for any blockchain based on cosmos-sdk like wasmd. This may cause some conflicts when both run on the same machine. 🤷
If you want to change this then I would go with 8080 where people expect conflicts anyway :-)

README.md Outdated
1. Start the relayer with metrics enabled
2. Spin up the Prometheus instance:
```sh
docker run -it -v $(pwd):/prometheus -p9090:9090 prom/prometheus --config.file=prometheus.yaml
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Very cool, a default prometheus config committed in repo. Better than any docs.

prometheus.yaml Outdated
Comment on lines 1 to 10
global:
scrape_interval: 15s
evaluation_interval: 15s

scrape_configs:
- job_name: confio_relayer

scrape_interval: 5s
static_configs:
- targets: ['host.docker.internal:26660']
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good to commit it to this repo, so it is maintained.

You could put it in a samples or ops dir. We can add example systemd and other configs there as needed.
(Maybe we move it together with the registry.conf file?)

destConnection?: string;
scanFromSrc?: string;
scanFromDest?: string;
readonly interactive: boolean;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will was here 😄

Boolean(process.env.RELAYER_ENABLE_METRICS) ||
app?.enableMetrics ||
false;
const metricsPort = resolveOption('metricsPort', {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Very nice.

@@ -171,6 +176,20 @@ export async function start(flags: Flags, logger: Logger) {
flags.scanFromDest,
process.env.RELAYER_SCAN_FROM_DEST
);
const enableMetrics =
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why don't you use resolveOption here like everywhere else?

Ah, there is no support for boolean, only string and integer?
If that is the case, leave it for this PR and make an issue to add boolean support, pointing to this line

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Exactly!

#174

logger: Logger;
}): Metrics {
if (initialized) {
throw new Error(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This makes sense to find bugs in our code fast.

I was going to suggest to only set initialized to true if enabled was true, but that means double-registry bugs only surface in production. It is good like this

}: {
enabled: boolean;
port: number;
logger: Logger;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Logger is not yet used. Do you plan to use it in this PR or a direct follow up?
If not, can we remove it until there is a clear use? We can always add it back then

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is used here: https://github.com/confio/ts-relayer/pull/173/files#diff-95ad111838746b4a0030c8e37ff66c5de9df11b1a2eab5aa3f59895c9f70016cR57

or do you think it's better to do console log? I would rather keep the logger to make sure the endpoint is mentioned in the logs.

@bartmacbartek bartmacbartek merged commit fb5934f into main Apr 28, 2021
@bartmacbartek bartmacbartek deleted the 133/prometheus branch April 28, 2021 21:15
Copy link
Contributor

@alpe alpe left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good work! 💐 Very happy to see all the system metrics already 🚀


> **NOTE:** Metrics can also be enabled via an environment variable `RELAYER_ENABLE_METRICS=true`, or with an `enableMetrics: true` entry in the `app.yaml` file, as explained in the [config specification](./spec/config.md#configuration).

The `GET /metrics` endpoint will be exposed by default on port `26660`, which you can override with `--metrics-port` flag, `RELAYER_METRICS_PORT` env variable, or `metricsPort` entry in `app.yaml`.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

very nice list of options to set the port. 🌷
Just a note: port 26660 is used by Tendermint as their default port for Prometheus and therefore for any blockchain based on cosmos-sdk like wasmd. This may cause some conflicts when both run on the same machine. 🤷
If you want to change this then I would go with 8080 where people expect conflicts anyway :-)

Comment on lines +11 to +14
pollCounter: new client.Counter({
name: 'poll_counter',
help: 'Poll counter',
}),
Copy link
Contributor

@alpe alpe Apr 29, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Very good start! Counter makes sense here for this metric. 👍

Some notes for a future version:
I have noticed though that the naming does not fit the others, yet:

confio_relayer_nodejs_gc_duration_seconds_count{kind="minor"} 80
poll_counter 13

With a common prefix we can find all relayer metrics easily. Otherwise some context is missing as a prometheus instance is likely not exclusive to the relayer.

No idea if the relayer is always bidirectional but as this would involve 2 chains the semantics of poll is not fully clear to me. You can use labels to mark which chain you polled.

Naming is always hard. You don't need to carry the type inside the name. Maybe "_count" or "_total" is a better fit. I always found this page very helpful: https://prometheus.io/docs/practices/naming/

@bartmacbartek bartmacbartek mentioned this pull request Apr 30, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Add prometheus support (local dev)
4 participants