-
Notifications
You must be signed in to change notification settings - Fork 63
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
133/prometheus #173
133/prometheus #173
Conversation
prometheus.yaml
Outdated
global: | ||
scrape_interval: 15s | ||
evaluation_interval: 15s | ||
|
||
scrape_configs: | ||
- job_name: confio_relayer | ||
|
||
scrape_interval: 5s | ||
static_configs: | ||
- targets: ['host.docker.internal:26660'] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is an example configuration file. Is this the right place to store it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good to commit it to this repo, so it is maintained.
You could put it in a samples
or ops
dir. We can add example systemd
and other configs there as needed.
(Maybe we move it together with the registry.conf
file?)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I realized that init
command has registry.yaml
hardcoded to main branch, so I cannot move registry under samples
.
'https://raw.githubusercontent.com/confio/ts-relayer/main/demo/registry.yaml' |
One way to "fix" this would be to use parametrized link with version:
`https://raw.githubusercontent.com/confio/ts-relayer/tree/v${packageJson.version}/demo/registry.yaml`
But it fixes the future versions only. Everything till 0.1.3
included will break.
👉 For now, I moved prometheus.yaml
under demo
. 92287b8
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
And let's keep it in demo for now (marked deprecated), and make the main copy in another directory (final location)
I am not sure about fixing to a version. We can think about a better location, but nice to be able to roll out registry without requiring upgrades
return getStubbedMetrics(); | ||
} | ||
|
||
client.collectDefaultMetrics({ prefix: 'confio_relayer_' }); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure if we want to collect these? Reference:
@@ -262,5 +294,7 @@ async function relayerLoop(link: Link, options: Options, logger: Logger) { | |||
logger.info(`Sleeping ${options.poll} seconds...`); | |||
await sleep(options.poll * 1000); | |||
logger.info('... waking up and checking for packets!'); | |||
|
|||
metrics?.pollCounter?.inc(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
With nullish coalescing operator, we can collect metrics nicely w/o condition blocks (if metrics are disabled).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is neat, but I find it less clear than an if
statement.
pollCounter: new client.Counter({ | ||
name: 'poll_counter', | ||
help: 'Poll counter', | ||
}), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is only an example of a custom metric that should be removed. Let's replace this with some real ones.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Very good start! Counter makes sense here for this metric. 👍
Some notes for a future version:
I have noticed though that the naming does not fit the others, yet:
confio_relayer_nodejs_gc_duration_seconds_count{kind="minor"} 80
poll_counter 13
With a common prefix we can find all relayer metrics easily. Otherwise some context is missing as a prometheus instance is likely not exclusive to the relayer.
No idea if the relayer is always bidirectional but as this would involve 2 chains the semantics of poll is not fully clear to me. You can use labels
to mark which chain you polled.
Naming is always hard. You don't need to carry the type inside the name. Maybe "_count" or "_total" is a better fit. I always found this page very helpful: https://prometheus.io/docs/practices/naming/
Also formats with prettier
@@ -262,5 +294,7 @@ async function relayerLoop(link: Link, options: Options, logger: Logger) { | |||
logger.info(`Sleeping ${options.poll} seconds...`); | |||
await sleep(options.poll * 1000); | |||
logger.info('... waking up and checking for packets!'); | |||
|
|||
metrics?.pollCounter?.inc(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is neat, but I find it less clear than an if
statement.
}; | ||
} | ||
|
||
export type Metrics = ReturnType<typeof setupPrometheus>; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why not define the type here and use it in the function definition?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree it's safer than relying on the inference.
port, | ||
logger, | ||
}: { | ||
enabled: boolean; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we make these readonly
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Of course! 4d93ebd
|
||
import { Logger } from '../create-logger'; | ||
|
||
let initialized = false; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hm, do we need some kind of global state?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's just nice to have to make sure we don't initialize the HTTP server more than once: https://github.com/confio/ts-relayer/pull/173/files#diff-95ad111838746b4a0030c8e37ff66c5de9df11b1a2eab5aa3f59895c9f70016cR28-R32
I thought of implementing a singleton here but since it takes arguments like enabled
I decided to just throw on second call.
🦋 Changeset detectedLatest commit: 19664fd The changes in this PR will be included in the next version bump. This PR includes changesets to release 1 package
Not sure what this means? Click here to learn what changesets are. Click here if you're a maintainer who wants to add another changeset to this PR |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good.
A few minor points that can be addressed in a follow up PR. (There are more prometheus tasks)
|
||
> **NOTE:** Metrics can also be enabled via an environment variable `RELAYER_ENABLE_METRICS=true`, or with an `enableMetrics: true` entry in the `app.yaml` file, as explained in the [config specification](./spec/config.md#configuration). | ||
|
||
The `GET /metrics` endpoint will be exposed by default on port `26660`, which you can override with `--metrics-port` flag, `RELAYER_METRICS_PORT` env variable, or `metricsPort` entry in `app.yaml`. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
very nice list of options to set the port. 🌷
Just a note: port 26660
is used by Tendermint as their default port for Prometheus and therefore for any blockchain based on cosmos-sdk like wasmd. This may cause some conflicts when both run on the same machine. 🤷
If you want to change this then I would go with 8080
where people expect conflicts anyway :-)
README.md
Outdated
1. Start the relayer with metrics enabled | ||
2. Spin up the Prometheus instance: | ||
```sh | ||
docker run -it -v $(pwd):/prometheus -p9090:9090 prom/prometheus --config.file=prometheus.yaml |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Very cool, a default prometheus config committed in repo. Better than any docs.
prometheus.yaml
Outdated
global: | ||
scrape_interval: 15s | ||
evaluation_interval: 15s | ||
|
||
scrape_configs: | ||
- job_name: confio_relayer | ||
|
||
scrape_interval: 5s | ||
static_configs: | ||
- targets: ['host.docker.internal:26660'] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good to commit it to this repo, so it is maintained.
You could put it in a samples
or ops
dir. We can add example systemd
and other configs there as needed.
(Maybe we move it together with the registry.conf
file?)
destConnection?: string; | ||
scanFromSrc?: string; | ||
scanFromDest?: string; | ||
readonly interactive: boolean; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will was here 😄
Boolean(process.env.RELAYER_ENABLE_METRICS) || | ||
app?.enableMetrics || | ||
false; | ||
const metricsPort = resolveOption('metricsPort', { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Very nice.
@@ -171,6 +176,20 @@ export async function start(flags: Flags, logger: Logger) { | |||
flags.scanFromDest, | |||
process.env.RELAYER_SCAN_FROM_DEST | |||
); | |||
const enableMetrics = |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why don't you use resolveOption
here like everywhere else?
Ah, there is no support for boolean, only string and integer?
If that is the case, leave it for this PR and make an issue to add boolean support, pointing to this line
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Exactly!
logger: Logger; | ||
}): Metrics { | ||
if (initialized) { | ||
throw new Error( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This makes sense to find bugs in our code fast.
I was going to suggest to only set initialized
to true if enabled
was true, but that means double-registry bugs only surface in production. It is good like this
}: { | ||
enabled: boolean; | ||
port: number; | ||
logger: Logger; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Logger is not yet used. Do you plan to use it in this PR or a direct follow up?
If not, can we remove it until there is a clear use? We can always add it back then
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is used here: https://github.com/confio/ts-relayer/pull/173/files#diff-95ad111838746b4a0030c8e37ff66c5de9df11b1a2eab5aa3f59895c9f70016cR57
or do you think it's better to do console log? I would rather keep the logger to make sure the endpoint is mentioned in the logs.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good work! 💐 Very happy to see all the system metrics already 🚀
|
||
> **NOTE:** Metrics can also be enabled via an environment variable `RELAYER_ENABLE_METRICS=true`, or with an `enableMetrics: true` entry in the `app.yaml` file, as explained in the [config specification](./spec/config.md#configuration). | ||
|
||
The `GET /metrics` endpoint will be exposed by default on port `26660`, which you can override with `--metrics-port` flag, `RELAYER_METRICS_PORT` env variable, or `metricsPort` entry in `app.yaml`. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
very nice list of options to set the port. 🌷
Just a note: port 26660
is used by Tendermint as their default port for Prometheus and therefore for any blockchain based on cosmos-sdk like wasmd. This may cause some conflicts when both run on the same machine. 🤷
If you want to change this then I would go with 8080
where people expect conflicts anyway :-)
pollCounter: new client.Counter({ | ||
name: 'poll_counter', | ||
help: 'Poll counter', | ||
}), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Very good start! Counter makes sense here for this metric. 👍
Some notes for a future version:
I have noticed though that the naming does not fit the others, yet:
confio_relayer_nodejs_gc_duration_seconds_count{kind="minor"} 80
poll_counter 13
With a common prefix we can find all relayer metrics easily. Otherwise some context is missing as a prometheus instance is likely not exclusive to the relayer.
No idea if the relayer is always bidirectional but as this would involve 2 chains the semantics of poll is not fully clear to me. You can use labels
to mark which chain you polled.
Naming is always hard. You don't need to carry the type inside the name. Maybe "_count" or "_total" is a better fit. I always found this page very helpful: https://prometheus.io/docs/practices/naming/
Closes #133