Cloudwatch alarms #1955

Merged: 3 commits, Nov 30, 2023
99 changes: 93 additions & 6 deletions infra/aws/index.ts
@@ -23,13 +23,35 @@ const albZoneId = coreInfraStack.getOutput("coreAlbZoneId");
const albHttpsListenerArn = coreInfraStack.getOutput("coreAlbHttpsListenerArn");
// const albData = coreInfraStack.getOutput("coreAlbData");

const snsAlertsTopicArn = coreInfraStack.getOutput("snsAlertsTopicArn");

const defaultTags = {
ManagedBy: "pulumi",
PulumiStack: stack,
Project: "passport"
};

const containerInsightsStatus = stack == "production" ? "enabled" : "disabled"
const logsRetention = Object({
"review": 1,
"staging": 7,
"production": 30
});

const serviceResources = Object({
"review": {
memory: 512, // 512 MiB
cpu: 256 // 0.25 vCPU
},
"staging": {
memory: 512, // 512 MiB
cpu: 256 // 0.25 vCPU
},
"production": {
memory: 2048, // 2GB
cpu: 1024 // 1vCPU
}
});
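
Review note: `logsRetention[stack]` and `serviceResources[stack]` evaluate to `undefined` for any stack name outside the three keys, and because `Object({...})` is untyped the compiler will not flag it. Below is a minimal sketch of a typed lookup that fails early instead. The names `StackName`, `StackSettings`, and `resolveStackSettings` are hypothetical and not part of this PR; the string conversion assumes that the task-level `cpu` and `memory` arguments of `aws.ecs.TaskDefinition` are declared as strings in the Pulumi AWS provider, while the container definition takes numbers.

```typescript
// Hypothetical names throughout: StackName, StackSettings, and
// resolveStackSettings are illustrative and not part of the PR.
type StackName = "review" | "staging" | "production";

interface StackSettings {
  logsRetentionDays: number;
  containerCpu: number;    // CPU units for the container definition
  containerMemory: number; // MiB for the container definition
  taskCpu: string;         // task-level value, as a string
  taskMemory: string;      // task-level value, as a string
}

// Same values as the diff, but typed so a missing key is a compile error.
const logsRetention: Record<StackName, number> = {
  review: 1,
  staging: 7,
  production: 30,
};

const serviceResources: Record<StackName, { memory: number; cpu: number }> = {
  review: { memory: 512, cpu: 256 },      // 512 MiB, 0.25 vCPU
  staging: { memory: 512, cpu: 256 },     // 512 MiB, 0.25 vCPU
  production: { memory: 2048, cpu: 1024 } // 2048 MiB, 1 vCPU
};

// Type guard: true only for keys defined directly on the lookup table.
function isStackName(value: string): value is StackName {
  return Object.prototype.hasOwnProperty.call(logsRetention, value);
}

// Throws for an unknown stack instead of silently returning undefined.
function resolveStackSettings(stack: string): StackSettings {
  if (!isStackName(stack)) {
    throw new Error(`No settings defined for stack "${stack}"`);
  }
  const { cpu, memory } = serviceResources[stack];
  return {
    logsRetentionDays: logsRetention[stack],
    containerCpu: cpu,
    containerMemory: memory,
    taskCpu: String(cpu),
    taskMemory: String(memory),
  };
}
```

With this in place, a new or misspelled stack name stops `pulumi up` with a readable error rather than producing a log group with no retention value.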

//////////////////////////////////////////////////////////////
// Service IAM Role
@@ -179,20 +201,85 @@ const cluster = new aws.ecs.Cluster(`gitcoin`,

const serviceLogGroup = new aws.cloudwatch.LogGroup("passport-iam", {
name: "passport-iam",
retentionInDays: 1, // TODO: make it as a paramater and change it for production & staging
retentionInDays: logsRetention[stack],
tags: {
...defaultTags
}
});

// TaskDefinition

//////////////////////////////////////////////////////////////
// CloudWatch Alerts
//////////////////////////////////////////////////////////////

const unhandledErrorsMetric = new aws.cloudwatch.LogMetricFilter("unhandledErrorsMetric", {
logGroupName: serviceLogGroup.name,
metricTransformation: {
defaultValue: "0",
name: "providerError",
namespace: "/iam/errors/unhandled",
unit: "Count",
value: "1",
},
name: "Unhandled Provider Errors",
pattern: '"UNHANDLED ERROR:" type address',
});

const unhandledErrorsAlarm = new aws.cloudwatch.MetricAlarm("unhandledErrorsAlarm", {
alarmActions: [snsAlertsTopicArn],
comparisonOperator: "GreaterThanOrEqualToThreshold",
datapointsToAlarm: 1,
evaluationPeriods: 1,
insufficientDataActions: [],
metricName: "providerError",
name: "Unhandled Provider Errors",
namespace: "/iam/errors/unhandled",
okActions: [],
period: 21600,
statistic: "Sum",
threshold: 1,
treatMissingData: "notBreaching",
});

const redisFilter = new aws.cloudwatch.LogMetricFilter("redisConnectionErrors", {
logGroupName: serviceLogGroup.name,
metricTransformation: {
defaultValue: "0",
name: "redisConnectionError",
namespace: "/iam/errors/redis",
unit: "Count",
value: "1",
},
name: "Redis Connection Error",
pattern: '"REDIS CONNECTION ERROR:"',
});

const redisErrorAlarm = new aws.cloudwatch.MetricAlarm("redisConnectionErrorsAlarm", {
alarmActions: [snsAlertsTopicArn],
comparisonOperator: "GreaterThanOrEqualToThreshold",
datapointsToAlarm: 1,
evaluationPeriods: 1,
insufficientDataActions: [],
metricName: "redisConnectionError",
name: "Redis Connection Error",
namespace: "/iam/errors/redis",
okActions: [],
period: 21600,
statistic: "Sum",
threshold: 1,
treatMissingData: "notBreaching",
});
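
Review note: the two filter/alarm pairs above differ only in display name, metric name, namespace, and pattern, and each alarm has to repeat the `metricName` and `namespace` of its filter exactly or it will never fire. One possible way to keep them in sync is to build both argument objects from a single spec. This is a sketch only: `ErrorAlertSpec` and `buildErrorAlertArgs` are hypothetical names, and the ARN in the usage line is a placeholder. The returned objects are plain data intended to be passed to `new aws.cloudwatch.LogMetricFilter(...)` and `new aws.cloudwatch.MetricAlarm(...)` as in the diff.

```typescript
// Hypothetical helper: ErrorAlertSpec and buildErrorAlertArgs are illustrative
// names, not part of the PR.
interface ErrorAlertSpec {
  displayName: string; // used as both the filter name and the alarm name
  metricName: string;
  namespace: string;
  pattern: string;
}

// LogGroupName and TopicArn are generic so callers can pass plain strings or
// Pulumi outputs such as serviceLogGroup.name and snsAlertsTopicArn.
function buildErrorAlertArgs<LogGroupName, TopicArn>(
  spec: ErrorAlertSpec,
  logGroupName: LogGroupName,
  alertsTopicArn: TopicArn,
) {
  return {
    // Arguments for the log metric filter: emit 1 per matching log event.
    filter: {
      logGroupName,
      metricTransformation: {
        defaultValue: "0",
        name: spec.metricName,
        namespace: spec.namespace,
        unit: "Count",
        value: "1",
      },
      name: spec.displayName,
      pattern: spec.pattern,
    },
    // Arguments for the alarm: fire when at least one event is counted
    // within a single evaluation period.
    alarm: {
      alarmActions: [alertsTopicArn],
      comparisonOperator: "GreaterThanOrEqualToThreshold",
      datapointsToAlarm: 1,
      evaluationPeriods: 1,
      insufficientDataActions: [],
      metricName: spec.metricName,
      name: spec.displayName,
      namespace: spec.namespace,
      okActions: [],
      period: 21600, // 6 hours, in seconds
      statistic: "Sum",
      threshold: 1,
      treatMissingData: "notBreaching",
    },
  };
}

// Usage with the Redis values from the diff; the ARN is a placeholder.
const redisAlert = buildErrorAlertArgs(
  {
    displayName: "Redis Connection Error",
    metricName: "redisConnectionError",
    namespace: "/iam/errors/redis",
    pattern: '"REDIS CONNECTION ERROR:"',
  },
  "passport-iam",
  "arn:aws:sns:us-east-1:000000000000:example-alerts",
);
```

Adding a third alert would then be one more spec object rather than another pair of hand-copied resource blocks.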

//////////////////////////////////////////////////////////////
// ECS Task & Service
//////////////////////////////////////////////////////////////
const taskDefinition = new aws.ecs.TaskDefinition(`passport-iam`, {
family: `passport-iam`,
containerDefinitions: JSON.stringify([{
name: "iam",
image: dockerGtcPassportIamImage,
cpu: 2048,
memory: 4096,
cpu: serviceResources[stack]["cpu"],
memory: serviceResources[stack]["memory"],
links: [],
essential: true,
portMappings: [{
@@ -394,8 +481,8 @@ const taskDefinition = new aws.ecs.TaskDefinition(`passport-iam`, {
volumesFrom: []
}]),
executionRoleArn: serviceRole.arn,
cpu: "2048",
memory: "4096",
cpu: serviceResources[stack]["cpu"],
memory: serviceResources[stack]["memory"],
networkMode: "awsvpc",
requiresCompatibilities: ["FARGATE"],
tags: {