Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

133/prometheus #173

Merged
merged 11 commits into from
Apr 28, 2021
19 changes: 19 additions & 0 deletions README.md
Original file line number Diff line number Diff line change
Expand Up @@ -148,6 +148,25 @@ There are 3 files that live in the relayer's home.

[Learn more about configuration.](spec/config.md)

## Monitoring
The relayer collects various metrics that [Prometheus](https://prometheus.io/docs/introduction/overview/) instance can consume.

To enable metrics collection, pass `--enable-metrics` flag when starting the relayer:
```sh
ibc-relayer start --enable-metrics
```
> **NOTE:** Metrics can be also enabled via an environment variable `RELAYER_ENABLE_METRICS=true`, or with a `enableMetrics: true` entry in the `app.yaml` file, as explained in the [config specifiation](./spec/config.md#configuration).

The `GET /metrics` endpoint is exposed on a default port `26660`, which you can modify with `--metrics-port` flag/env variable/entry in app.yaml.

### Local setup
1. Start the relayer with metrics enabled
2. Spin up the Prometheus instance:
```sh
docker run -it -v $(pwd):/prometheus -p9090:9090 prom/prometheus --config.file=prometheus.yaml
```
> **NOTE:** Ensure that `--config.file=<path>` flag points at the existing configuration file, you can find an example here: [prometheus.yaml](prometheus.yaml).
3. Open Prometheus dashboard in the browser at [http://localhost:9090](http://localhost:9090)
## Development
[Refer to the development page.](DEVELOPMENT.md)

Expand Down
1 change: 1 addition & 0 deletions package.json
Original file line number Diff line number Diff line change
Expand Up @@ -43,6 +43,7 @@
"fast-safe-stringify": "2.0.4",
"js-yaml": "4.0.0",
"lodash": "4.17.21",
"prom-client": "13.1.0",
"protobufjs": "6.10.2",
"table": "6.0.7",
"triple-beam": "1.3.0",
Expand Down
10 changes: 10 additions & 0 deletions prometheus.yaml
Original file line number Diff line number Diff line change
@@ -0,0 +1,10 @@
global:
scrape_interval: 15s
evaluation_interval: 15s

scrape_configs:
- job_name: confio_relayer

scrape_interval: 5s
static_configs:
- targets: ['host.docker.internal:26660']
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is an example configuration file. Is this the right place to store it?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good to commit it to this repo, so it is maintained.

You could put it in a samples or ops dir. We can add example systemd and other configs there as needed.
(Maybe we move it together with the registry.conf file?)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I realized that init command has registry.yaml hardcoded to main branch, so I cannot move registry under samples.

'https://raw.githubusercontent.com/confio/ts-relayer/main/demo/registry.yaml'

One way to "fix" this would be to use parametrized link with version:

`https://raw.githubusercontent.com/confio/ts-relayer/tree/v${packageJson.version}/demo/registry.yaml`

But it fixes the future versions only. Everything till 0.1.3 included will break.

👉 For now, I moved prometheus.yaml under demo. 92287b8

Copy link
Contributor

@ethanfrey ethanfrey Apr 29, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And let's keep it in demo for now (marked deprecated), and make the main copy in another directory (final location)

I am not sure about fixing to a version. We can think about a better location, but nice to be able to roll out registry without requiring upgrades

40 changes: 37 additions & 3 deletions src/binary/ibc-relayer/commands/start.ts
Original file line number Diff line number Diff line change
Expand Up @@ -16,6 +16,7 @@ import { resolveHomeOption } from '../../utils/options/shared/resolve-home-optio
import { resolveKeyFileOption } from '../../utils/options/shared/resolve-key-file-option';
import { resolveMnemonicOption } from '../../utils/options/shared/resolve-mnemonic-option';
import { signingClient } from '../../utils/signing-client';
import { Metrics, setupPrometheus } from '../setup-prometheus';

type ResolveHeightsParams = {
scanFromSrc: number | null;
Expand Down Expand Up @@ -94,6 +95,8 @@ type Flags = {
destConnection?: string;
scanFromSrc?: string;
scanFromDest?: string;
enableMetrics: boolean;
metricsPort?: string;
} & LoggerFlags &
LoopFlags;

Expand All @@ -105,15 +108,17 @@ type Options = {
srcConnection: string;
destConnection: string;
heights: RelayedHeights | null;
enableMetrics: boolean;
metricsPort: number;
} & LoopOptions;

// some defaults for looping
export const defaults = {
// check once per minute
poll: 60,
// once per day: 86400s
maxAgeSrc: 86400,
maxAgeDest: 86400,
metricsPort: 26660,
};

export async function start(flags: Flags, logger: Logger) {
Expand Down Expand Up @@ -171,6 +176,20 @@ export async function start(flags: Flags, logger: Logger) {
flags.scanFromDest,
process.env.RELAYER_SCAN_FROM_DEST
);
const enableMetrics =
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why don't you use resolveOption here like everywhere else?

Ah, there is no support for boolean, only string and integer?
If that is the case, leave it for this PR and make an issue to add boolean support, pointing to this line

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Exactly!

#174

flags.enableMetrics ||
Boolean(process.env.RELAYER_ENABLE_METRICS) ||
app?.enableMetrics ||
false;
const metricsPort = resolveOption('metricsPort', {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Very nice.

integer: true,
required: true,
})(
flags.metricsPort,
process.env.RELAYER_METRICS_PORT,
app?.metricsPort,
defaults.metricsPort
);

const heights = resolveHeights({ scanFromSrc, scanFromDest, home }, logger);

Expand All @@ -189,12 +208,20 @@ export async function start(flags: Flags, logger: Logger) {
maxAgeDest,
once,
heights,
enableMetrics,
metricsPort,
};

await run(options, logger);
}

async function run(options: Options, logger: Logger) {
const metrics = setupPrometheus({
enabled: options.enableMetrics,
port: options.metricsPort,
logger,
});

const registryFilePath = path.join(options.home, registryFile);
const { chains } = loadAndValidateRegistry(registryFilePath);
const srcChain = chains[options.src];
Expand Down Expand Up @@ -224,10 +251,15 @@ async function run(options: Options, logger: Logger) {
logger
);

await relayerLoop(link, options, logger);
await relayerLoop(link, options, logger, metrics);
}

async function relayerLoop(link: Link, options: Options, logger: Logger) {
async function relayerLoop(
link: Link,
options: Options,
logger: Logger,
metrics: Metrics
) {
let nextRelay = options.heights ?? {};
const lastQueriedHeightsFilePath = path.join(
options.home,
Expand Down Expand Up @@ -262,5 +294,7 @@ async function relayerLoop(link: Link, options: Options, logger: Logger) {
logger.info(`Sleeping ${options.poll} seconds...`);
await sleep(options.poll * 1000);
logger.info('... waking up and checking for packets!');

metrics?.pollCounter?.inc();
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

With nullish coalescing operator, we can collect metrics nicely w/o condition blocks (if metrics are disabled).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is neat, but I find it less clear than an if statement.

}
}
7 changes: 7 additions & 0 deletions src/binary/ibc-relayer/index.ts
Original file line number Diff line number Diff line change
Expand Up @@ -38,6 +38,13 @@ const startCommand = program
.addOption(mnemonicOption)
.addOption(srcConnection)
.addOption(destConnection)
.option(
'--enable-metrics',
'Enable Prometheus metrics collection and GET /metrics endpoint'
)
.option(
`--metrics-port <port>', 'Specify port for GET /metrics http server (default: ${startDefaults.metricsPort})`
)
.option(
'--poll <frequency>',
`How many seconds we sleep between checking for packets (default: ${startDefaults.poll})`
Expand Down
58 changes: 58 additions & 0 deletions src/binary/ibc-relayer/setup-prometheus.ts
Original file line number Diff line number Diff line change
@@ -0,0 +1,58 @@
import http from 'http';

import client from 'prom-client';

import { Logger } from '../create-logger';

let initialized = false;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hm, do we need some kind of global state?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's just nice to have to make sure we don't initialize the HTTP server more than once: https://github.com/confio/ts-relayer/pull/173/files#diff-95ad111838746b4a0030c8e37ff66c5de9df11b1a2eab5aa3f59895c9f70016cR28-R32

I thought of implementing a singleton here but since it takes arguments like enabled I decided to just throw on second call.


function getMetrics() {
return {
pollCounter: new client.Counter({
name: 'poll_counter',
help: 'Poll counter',
}),
Comment on lines +11 to +14
Copy link
Contributor Author

@bartmacbartek bartmacbartek Apr 27, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is only an example of a custom metric that should be removed. Let's replace this with some real ones.

Copy link
Contributor

@alpe alpe Apr 29, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Very good start! Counter makes sense here for this metric. 👍

Some notes for a future version:
I have noticed though that the naming does not fit the others, yet:

confio_relayer_nodejs_gc_duration_seconds_count{kind="minor"} 80
poll_counter 13

With a common prefix we can find all relayer metrics easily. Otherwise some context is missing as a prometheus instance is likely not exclusive to the relayer.

No idea if the relayer is always bidirectional but as this would involve 2 chains the semantics of poll is not fully clear to me. You can use labels to mark which chain you polled.

Naming is always hard. You don't need to carry the type inside the name. Maybe "_count" or "_total" is a better fit. I always found this page very helpful: https://prometheus.io/docs/practices/naming/

};
}

export type Metrics = ReturnType<typeof setupPrometheus>;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not define the type here and use it in the function definition?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree it's safer than relying on the inference.

af3e18c

export function setupPrometheus({
enabled,
port,
logger,
}: {
enabled: boolean;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we make these readonly?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Of course! 4d93ebd

port: number;
logger: Logger;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Logger is not yet used. Do you plan to use it in this PR or a direct follow up?
If not, can we remove it until there is a clear use? We can always add it back then

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is used here: https://github.com/confio/ts-relayer/pull/173/files#diff-95ad111838746b4a0030c8e37ff66c5de9df11b1a2eab5aa3f59895c9f70016cR57

or do you think it's better to do console log? I would rather keep the logger to make sure the endpoint is mentioned in the logs.

}) {
if (initialized) {
throw new Error(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This makes sense to find bugs in our code fast.

I was going to suggest to only set initialized to true if enabled was true, but that means double-registry bugs only surface in production. It is good like this

`"setupPrometheus" func shouldn't be initialized more than once.`
);
}
initialized = true;

if (!enabled) {
return null;
}

client.collectDefaultMetrics({ prefix: 'confio_relayer_' });
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

const server = http.createServer(async (request, response) => {
if (request.method === 'GET' && request.url === '/metrics') {
const metrics = await client.register.metrics();
response.writeHead(200, {
'Content-Type': client.register.contentType,
});
response.end(metrics);

return;
}

response.writeHead(404);
response.end('404');
});
server.listen(port);
logger.info(`Prometheus GET /metrics exposed on port ${port}.`);

return getMetrics();
}
2 changes: 2 additions & 0 deletions src/binary/types.ts
Original file line number Diff line number Diff line change
Expand Up @@ -22,6 +22,8 @@ export type AppConfig = {
destConnection?: string;
mnemonic?: string;
keyFile?: string;
enableMetrics?: boolean;
metricsPort?: number;
};

export type LoggerFlags = {
Expand Down
2 changes: 2 additions & 0 deletions src/binary/utils/load-and-validate-app.ts
Original file line number Diff line number Diff line change
Expand Up @@ -41,6 +41,8 @@ export function loadAndValidateApp(home: string) {
destConnection: { type: 'string', nullable: true },
mnemonic: { type: 'string', nullable: true },
keyFile: { type: 'string', nullable: true },
enableMetrics: { type: 'boolean', nullable: true },
metricsPort: { type: 'number', nullable: true },
},
};
const validate = ajv.compile(schema);
Expand Down
19 changes: 19 additions & 0 deletions yarn.lock
Original file line number Diff line number Diff line change
Expand Up @@ -958,6 +958,11 @@ binary-extensions@^2.0.0:
resolved "https://registry.npmjs.org/binary-extensions/-/binary-extensions-2.2.0.tgz"
integrity sha512-jDctJ/IVQbZoJykoeHbhXpOlNBqGNcwXJKJog42E5HDPUwQTSdjCHdihjj0DlnheQ7blbT6dHOafNAiS8ooQKA==

[email protected]:
version "1.0.1"
resolved "https://registry.yarnpkg.com/bintrees/-/bintrees-1.0.1.tgz#0e655c9b9c2435eaab68bf4027226d2b55a34524"
integrity sha1-DmVcm5wkNeqraL9AJyJtK1WjRSQ=

bip39@^3.0.2:
version "3.0.3"
resolved "https://registry.npmjs.org/bip39/-/bip39-3.0.3.tgz"
Expand Down Expand Up @@ -3513,6 +3518,13 @@ progress@^2.0.0:
resolved "https://registry.npmjs.org/progress/-/progress-2.0.3.tgz"
integrity sha512-7PiHtLll5LdnKIMw100I+8xJXR5gW2QwWYkT6iJva0bXitZKa/XMrSbdmg3r2Xnaidz9Qumd0VPaMrZlF9V9sA==

[email protected]:
version "13.1.0"
resolved "https://registry.yarnpkg.com/prom-client/-/prom-client-13.1.0.tgz#1185caffd8691e28d32e373972e662964e3dba45"
integrity sha512-jT9VccZCWrJWXdyEtQddCDszYsiuWj5T0ekrPszi/WEegj3IZy6Mm09iOOVM86A4IKMWq8hZkT2dD9MaSe+sng==
dependencies:
tdigest "^0.1.1"

[email protected], protobufjs@^6.8.8, protobufjs@~6.10.2:
version "6.10.2"
resolved "https://registry.npmjs.org/protobufjs/-/protobufjs-6.10.2.tgz"
Expand Down Expand Up @@ -4141,6 +4153,13 @@ [email protected], table@^6.0.4:
slice-ansi "^4.0.0"
string-width "^4.2.0"

tdigest@^0.1.1:
version "0.1.1"
resolved "https://registry.yarnpkg.com/tdigest/-/tdigest-0.1.1.tgz#2e3cb2c39ea449e55d1e6cd91117accca4588021"
integrity sha1-Ljyyw56kSeVdHmzZEReszKRYgCE=
dependencies:
bintrees "1.0.1"

[email protected]:
version "6.0.1"
resolved "https://registry.npmjs.org/teeny-request/-/teeny-request-6.0.1.tgz"
Expand Down