Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

133/prometheus #173

Merged
merged 11 commits into from
Apr 28, 2021
6 changes: 6 additions & 0 deletions .changeset/gorgeous-houses-listen.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,6 @@
---
"@confio/relayer": patch
---

Add support for Prometheus monitoring.
Read more: https://github.com/confio/ts-relayer#monitoring
24 changes: 24 additions & 0 deletions README.md
Original file line number Diff line number Diff line change
Expand Up @@ -150,6 +150,30 @@ There are 3 files that live in the relayer's home.

[Learn more about configuration.](spec/config.md)

## Monitoring

The relayer collects various metrics that a [Prometheus](https://prometheus.io/docs/introduction/overview/) instance can consume.

To enable metrics collection, pass the `--enable-metrics` flag when starting the relayer:

```sh
ibc-relayer start --enable-metrics
```

> **NOTE:** Metrics can also be enabled via an environment variable `RELAYER_ENABLE_METRICS=true`, or with an `enableMetrics: true` entry in the `app.yaml` file, as explained in the [config specification](./spec/config.md#configuration).

The `GET /metrics` endpoint will be exposed by default on port `26660`, which you can override with `--metrics-port` flag, `RELAYER_METRICS_PORT` env variable, or `metricsPort` entry in `app.yaml`.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

very nice list of options to set the port. 🌷
Just a note: port 26660 is used by Tendermint as their default port for Prometheus and therefore for any blockchain based on cosmos-sdk like wasmd. This may cause some conflicts when both run on the same machine. 🤷
If you want to change this then I would go with 8080 where people expect conflicts anyway :-)


### Local setup

1. Start the relayer with metrics enabled
2. Spin up the Prometheus instance:
```sh
docker run -it -v $(pwd):/prometheus -p9090:9090 prom/prometheus --config.file=demo/prometheus.yaml
```
> **NOTE:** Ensure that `the --config.file=<path>` flag points at the existing configuration file. You can find an example here: [prometheus.yaml](demo/prometheus.yaml).
3. Open the Prometheus dashboard in a browser at [http://localhost:9090](http://localhost:9090)

## Development

[Refer to the development page.](DEVELOPMENT.md)
Expand Down
10 changes: 10 additions & 0 deletions demo/prometheus.yaml
Original file line number Diff line number Diff line change
@@ -0,0 +1,10 @@
global:
scrape_interval: 15s
evaluation_interval: 15s

scrape_configs:
- job_name: confio_relayer

scrape_interval: 5s
static_configs:
- targets: ['host.docker.internal:26660']
1 change: 1 addition & 0 deletions package.json
Original file line number Diff line number Diff line change
Expand Up @@ -43,6 +43,7 @@
"fast-safe-stringify": "2.0.4",
"js-yaml": "4.0.0",
"lodash": "4.17.21",
"prom-client": "13.1.0",
"protobufjs": "6.10.2",
"table": "6.0.7",
"triple-beam": "1.3.0",
Expand Down
90 changes: 62 additions & 28 deletions src/binary/ibc-relayer/commands/start.ts
Original file line number Diff line number Diff line change
Expand Up @@ -16,6 +16,7 @@ import { resolveHomeOption } from '../../utils/options/shared/resolve-home-optio
import { resolveKeyFileOption } from '../../utils/options/shared/resolve-key-file-option';
import { resolveMnemonicOption } from '../../utils/options/shared/resolve-mnemonic-option';
import { signingClient } from '../../utils/signing-client';
import { Metrics, setupPrometheus } from '../setup-prometheus';

type ResolveHeightsParams = {
scanFromSrc: number | null;
Expand Down Expand Up @@ -67,53 +68,57 @@ function resolveHeights(
}

type LoopFlags = {
poll?: string;
maxAgeSrc?: string;
maxAgeDest?: string;
once: boolean;
readonly poll?: string;
readonly maxAgeSrc?: string;
readonly maxAgeDest?: string;
readonly once: boolean;
};
type LoopOptions = {
// how many seconds we sleep between relaying batches
poll: number;
readonly poll: number;
// number of seconds old the client on chain A can be
maxAgeSrc: number;
readonly maxAgeSrc: number;
// number of seconds old the client on chain B can be
maxAgeDest: number;
readonly maxAgeDest: number;
// if set to 'true' quit after one pass
once: boolean;
readonly once: boolean;
};

type Flags = {
interactive: boolean;
home?: string;
src?: string;
dest?: string;
keyFile?: string;
mnemonic?: string;
srcConnection?: string;
destConnection?: string;
scanFromSrc?: string;
scanFromDest?: string;
readonly interactive: boolean;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will was here 😄

readonly home?: string;
readonly src?: string;
readonly dest?: string;
readonly keyFile?: string;
readonly mnemonic?: string;
readonly srcConnection?: string;
readonly destConnection?: string;
readonly scanFromSrc?: string;
readonly scanFromDest?: string;
readonly enableMetrics: boolean;
readonly metricsPort?: string;
} & LoggerFlags &
LoopFlags;

type Options = {
home: string;
src: string;
dest: string;
mnemonic: string;
srcConnection: string;
destConnection: string;
heights: RelayedHeights | null;
readonly home: string;
readonly src: string;
readonly dest: string;
readonly mnemonic: string;
readonly srcConnection: string;
readonly destConnection: string;
readonly heights: RelayedHeights | null;
readonly enableMetrics: boolean;
readonly metricsPort: number;
} & LoopOptions;

// some defaults for looping
export const defaults = {
// check once per minute
poll: 60,
// once per day: 86400s
maxAgeSrc: 86400,
maxAgeDest: 86400,
metricsPort: 26660,
};

export async function start(flags: Flags, logger: Logger) {
Expand Down Expand Up @@ -171,6 +176,20 @@ export async function start(flags: Flags, logger: Logger) {
flags.scanFromDest,
process.env.RELAYER_SCAN_FROM_DEST
);
const enableMetrics =
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why don't you use resolveOption here like everywhere else?

Ah, there is no support for boolean, only string and integer?
If that is the case, leave it for this PR and make an issue to add boolean support, pointing to this line

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Exactly!

#174

flags.enableMetrics ||
Boolean(process.env.RELAYER_ENABLE_METRICS) ||
app?.enableMetrics ||
false;
const metricsPort = resolveOption('metricsPort', {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Very nice.

integer: true,
required: true,
})(
flags.metricsPort,
process.env.RELAYER_METRICS_PORT,
app?.metricsPort,
defaults.metricsPort
);

const heights = resolveHeights({ scanFromSrc, scanFromDest, home }, logger);

Expand All @@ -189,12 +208,20 @@ export async function start(flags: Flags, logger: Logger) {
maxAgeDest,
once,
heights,
enableMetrics,
metricsPort,
};

await run(options, logger);
}

async function run(options: Options, logger: Logger) {
const metrics = setupPrometheus({
enabled: options.enableMetrics,
port: options.metricsPort,
logger,
});

const registryFilePath = path.join(options.home, registryFile);
const { chains } = loadAndValidateRegistry(registryFilePath);
const srcChain = chains[options.src];
Expand Down Expand Up @@ -224,10 +251,15 @@ async function run(options: Options, logger: Logger) {
logger
);

await relayerLoop(link, options, logger);
await relayerLoop(link, options, logger, metrics);
}

async function relayerLoop(link: Link, options: Options, logger: Logger) {
async function relayerLoop(
link: Link,
options: Options,
logger: Logger,
metrics: Metrics
) {
let nextRelay = options.heights ?? {};
const lastQueriedHeightsFilePath = path.join(
options.home,
Expand Down Expand Up @@ -262,5 +294,7 @@ async function relayerLoop(link: Link, options: Options, logger: Logger) {
logger.info(`Sleeping ${options.poll} seconds...`);
await sleep(options.poll * 1000);
logger.info('... waking up and checking for packets!');

metrics?.pollCounter?.inc();
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

With nullish coalescing operator, we can collect metrics nicely w/o condition blocks (if metrics are disabled).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is neat, but I find it less clear than an if statement.

}
}
7 changes: 7 additions & 0 deletions src/binary/ibc-relayer/index.ts
Original file line number Diff line number Diff line change
Expand Up @@ -38,6 +38,13 @@ const startCommand = program
.addOption(mnemonicOption)
.addOption(srcConnection)
.addOption(destConnection)
.option(
'--enable-metrics',
'Enable Prometheus metrics collection and GET /metrics endpoint'
)
.option(
`--metrics-port <port>', 'Specify port for GET /metrics http server (default: ${startDefaults.metricsPort})`
)
.option(
'--poll <frequency>',
`How many seconds we sleep between checking for packets (default: ${startDefaults.poll})`
Expand Down
60 changes: 60 additions & 0 deletions src/binary/ibc-relayer/setup-prometheus.ts
Original file line number Diff line number Diff line change
@@ -0,0 +1,60 @@
import http from 'http';

import client from 'prom-client';

import { Logger } from '../create-logger';

let initialized = false;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hm, do we need some kind of global state?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's just nice to have to make sure we don't initialize the HTTP server more than once: https://github.com/confio/ts-relayer/pull/173/files#diff-95ad111838746b4a0030c8e37ff66c5de9df11b1a2eab5aa3f59895c9f70016cR28-R32

I thought of implementing a singleton here but since it takes arguments like enabled I decided to just throw on second call.


function getMetrics() {
return {
pollCounter: new client.Counter({
name: 'poll_counter',
help: 'Poll counter',
}),
Comment on lines +11 to +14
Copy link
Contributor Author

@bartmacbartek bartmacbartek Apr 27, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is only an example of a custom metric that should be removed. Let's replace this with some real ones.

Copy link
Contributor

@alpe alpe Apr 29, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Very good start! Counter makes sense here for this metric. 👍

Some notes for a future version:
I have noticed though that the naming does not fit the others, yet:

confio_relayer_nodejs_gc_duration_seconds_count{kind="minor"} 80
poll_counter 13

With a common prefix we can find all relayer metrics easily. Otherwise some context is missing as a prometheus instance is likely not exclusive to the relayer.

No idea if the relayer is always bidirectional but as this would involve 2 chains the semantics of poll is not fully clear to me. You can use labels to mark which chain you polled.

Naming is always hard. You don't need to carry the type inside the name. Maybe "_count" or "_total" is a better fit. I always found this page very helpful: https://prometheus.io/docs/practices/naming/

};
}

export type Metrics = {
pollCounter: client.Counter<string>;
} | null;
export function setupPrometheus({
enabled,
port,
logger,
}: {
enabled: boolean;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we make these readonly?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Of course! 4d93ebd

port: number;
logger: Logger;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Logger is not yet used. Do you plan to use it in this PR or a direct follow up?
If not, can we remove it until there is a clear use? We can always add it back then

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is used here: https://github.com/confio/ts-relayer/pull/173/files#diff-95ad111838746b4a0030c8e37ff66c5de9df11b1a2eab5aa3f59895c9f70016cR57

or do you think it's better to do console log? I would rather keep the logger to make sure the endpoint is mentioned in the logs.

}): Metrics {
if (initialized) {
throw new Error(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This makes sense to find bugs in our code fast.

I was going to suggest to only set initialized to true if enabled was true, but that means double-registry bugs only surface in production. It is good like this

`"setupPrometheus" func shouldn't be initialized more than once.`
);
}
initialized = true;

if (!enabled) {
return null;
}

client.collectDefaultMetrics({ prefix: 'confio_relayer_' });
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

const server = http.createServer(async (request, response) => {
if (request.method === 'GET' && request.url === '/metrics') {
const metrics = await client.register.metrics();
response.writeHead(200, {
'Content-Type': client.register.contentType,
});
response.end(metrics);

return;
}

response.writeHead(404);
response.end('404');
});
server.listen(port);
logger.info(`Prometheus GET /metrics exposed on port ${port}.`);

return getMetrics();
}
2 changes: 2 additions & 0 deletions src/binary/types.ts
Original file line number Diff line number Diff line change
Expand Up @@ -22,6 +22,8 @@ export type AppConfig = {
destConnection?: string;
mnemonic?: string;
keyFile?: string;
enableMetrics?: boolean;
metricsPort?: number;
};

export type LoggerFlags = {
Expand Down
2 changes: 2 additions & 0 deletions src/binary/utils/load-and-validate-app.ts
Original file line number Diff line number Diff line change
Expand Up @@ -41,6 +41,8 @@ export function loadAndValidateApp(home: string) {
destConnection: { type: 'string', nullable: true },
mnemonic: { type: 'string', nullable: true },
keyFile: { type: 'string', nullable: true },
enableMetrics: { type: 'boolean', nullable: true },
metricsPort: { type: 'number', nullable: true },
},
};
const validate = ajv.compile(schema);
Expand Down
19 changes: 19 additions & 0 deletions yarn.lock
Original file line number Diff line number Diff line change
Expand Up @@ -1199,6 +1199,11 @@ binary-extensions@^2.0.0:
resolved "https://registry.npmjs.org/binary-extensions/-/binary-extensions-2.2.0.tgz"
integrity sha512-jDctJ/IVQbZoJykoeHbhXpOlNBqGNcwXJKJog42E5HDPUwQTSdjCHdihjj0DlnheQ7blbT6dHOafNAiS8ooQKA==

[email protected]:
version "1.0.1"
resolved "https://registry.yarnpkg.com/bintrees/-/bintrees-1.0.1.tgz#0e655c9b9c2435eaab68bf4027226d2b55a34524"
integrity sha1-DmVcm5wkNeqraL9AJyJtK1WjRSQ=

bip39@^3.0.2:
version "3.0.3"
resolved "https://registry.npmjs.org/bip39/-/bip39-3.0.3.tgz"
Expand Down Expand Up @@ -4043,6 +4048,13 @@ progress@^2.0.0:
resolved "https://registry.npmjs.org/progress/-/progress-2.0.3.tgz"
integrity sha512-7PiHtLll5LdnKIMw100I+8xJXR5gW2QwWYkT6iJva0bXitZKa/XMrSbdmg3r2Xnaidz9Qumd0VPaMrZlF9V9sA==

[email protected]:
version "13.1.0"
resolved "https://registry.yarnpkg.com/prom-client/-/prom-client-13.1.0.tgz#1185caffd8691e28d32e373972e662964e3dba45"
integrity sha512-jT9VccZCWrJWXdyEtQddCDszYsiuWj5T0ekrPszi/WEegj3IZy6Mm09iOOVM86A4IKMWq8hZkT2dD9MaSe+sng==
dependencies:
tdigest "^0.1.1"

[email protected], protobufjs@^6.8.8, protobufjs@~6.10.2:
version "6.10.2"
resolved "https://registry.npmjs.org/protobufjs/-/protobufjs-6.10.2.tgz"
Expand Down Expand Up @@ -4742,6 +4754,13 @@ [email protected], table@^6.0.4:
slice-ansi "^4.0.0"
string-width "^4.2.0"

tdigest@^0.1.1:
version "0.1.1"
resolved "https://registry.yarnpkg.com/tdigest/-/tdigest-0.1.1.tgz#2e3cb2c39ea449e55d1e6cd91117accca4588021"
integrity sha1-Ljyyw56kSeVdHmzZEReszKRYgCE=
dependencies:
bintrees "1.0.1"

[email protected]:
version "6.0.1"
resolved "https://registry.npmjs.org/teeny-request/-/teeny-request-6.0.1.tgz"
Expand Down