This page describes the process for configuring logging and monitoring of an FDA MyStudies deployment.
Cloud Logging can be used to configure a log viewer for viewing, querying, and analysis of logs. Log sinks can be set up to manage log retention policies. Cloud Logging can also be used to create log-based metrics and alerts.
Cloud Monitoring is used for setting up charts, dashboards, alerts, and notifications. It also can be used for Service Level Objective (SLO) monitoring and uptime checks.
Cloud Logging API can be used for real-time log management analysis by configuring the following components:
- Logs viewer
- Logs alerting
- Logs Router, Retention & Storage
- Log buckets and views
- Error Reporting
- Audit Logging
Logs Viewer and Logs Explorer can be used to filter and analyze logs during an incident.
Logs-based alerting is integrated with Cloud Monitoring and can be used to set alerts on the logs events.
Cloud Logging uses two predefined log sinks for each GCP Project _Required
and _Default
. All logs that are generated in GCP projects are automatically processed through these two log sinks and then are stored in the correspondingly named log buckets. Cloud Logging doesn't charge to route logs, but destination charges may apply.
Bucket Configurations:
Name | Description | Monthly | Retention | Type |
---|---|---|---|---|
_Default | Default bucket | Billable | 30 days | global |
_Required | Audit bucket | Not billed | 400 days | global |
Logs Storage is used to create logs buckets and to monitor usage and retention.
Error Reporting API can be used to aggregate and display errors produced in the running cloud services. The centralized error management interface can be used to find errors during the deployments and infrastructure update process.
Cloud Audit Logs can be used to help meet security, auditing, and compliance needs by maintaining audit trails such as admin activity, data access, and system events. These are configured to be stored in coldline storage for longer-term storage.
Name | Location | Type | Life Cycle |
---|---|---|---|
<PREFIX>-<ENV>-7yr-audit-logs |
Region | Coldline | 2555+ days since object was update |
The Cloud Monitoring services will be used to monitor the project resources. Different monitoring and alerts are set up for the various projects involved in a deployment, including:
- Apps - the project storing container images and running the compute (k8s) resources
- Audit - the project storing audit logs from the platform and applications
- Data - the project with the CloudSQL databases
- Devops - the project for the CI/CD pipeline
- Firebase - the project used for storing study response data
- Networks - the project for the network policies and firewalls
- Secrets - the project for managing the deployment secrets such as client IDs and credentials
The types of monitoring used include:
- Dashboard Monitoring
- Metrics based alerts with email notifications
- Logs based alerts with email notifications
- Incident monitoring dashboard
Monitoring in the Apps project includes Cloud Storage, Kubernetes Engine (GKE), Compute Engine resources, Load Balancers, and Cloud Pub/Sub. There are metrics based alerts for GKE and SSL certificates. Logs based alerts include Cloud Build and GKE errors. This project also uses uptime and health checks for the GKE deployments. Instructions for creating the individual alerts can be found in the section Configuring Alerts and Notifications.
Dashboard | Resources to be Monitored |
---|---|
Cloud Storage | Requests, Network Traffic Sent, Network Traffic Received, Object Count, Object Size |
GKE | Alerts, Container restarts, Error logs, CPU utilization, Memory utilization, Disk utilization |
VM Instances ( GKE Node VM’s) | CPU , Memory, Disk Utilization, Network Traffic |
External HTTP(S) Load Balancers | 5xx Error Ratio, Backend Latency (95th Percentile), Total Latency (95th Percentile), Forwarding Rules, Health Status |
Cloud Pub/Sub | Publish Message Operations, Average Message Size |
Alert | Threshold | Description |
---|---|---|
SSL certificate expiring soon | 1 Month |
Expiration date of SSL certificates |
Kubernetes Container - CPU utilization | >= 70% |
CPU utilization percentage |
Kubernetes Container - Memory usage | >= 70% |
Memory usage percentage |
Kubernetes Container - Restart count | 20 |
Container restarting more than a specified number of times |
k8s_pod - Volume utilization | >= 70% |
Volume utilization of the kubernetes pod |
k8s_pod - autoscaler_panic_mode | 1 or 0 |
Value of 1 represents autoscaler failing |
Kubernetes Node - Memory usage | >= 70% |
Memory usage percentage of kubernetes node |
Kubernetes Node - node_network_up | 1 or 0 |
Value of 0 represents the node network being down |
Kubernetes Node - CPU allocatable utilization | >= 70% |
CPU utilization percentage of kubernetes node |
VM - CPU utilization | >= 70% |
CPU utilization percentage of underlying VM |
VM - disk utilization too high | >= 70% |
Disk utilization percentage of underlying VM |
VM - p95 networking latency too high | < 1s |
Network latency of underlying VM |
VM - memory utilization too high | >= 70% |
Memory utilization percentage of underlying VM |
Alert | Metrics Query | Description |
---|---|---|
Cloud Build failure | resource.type="build", logName=("projects/<projectname>/logs/cloudaudit.googleapis.com%2f Activity" OR "projects/<projectname>/logs/cloudaudit.googleapis.com%2Fdata_access" OR "projects/<projectname>/logs/cloudbuild"), severity>=ERROR, |
Trigger an alert when Cloud Build experiences a failure or error |
GKE container failure | resource.type="k8s_container", resource.labels.cluster_name="<name of the cluster>", resource.labels.namespace_name="default", severity=(EMERGENCY OR ALERT OR CRITICAL OR ERROR), NOT textPayload:New connection, NOT textPayload:Client closed, |
Trigger an alert when a GKE container has a log message with severity of ERROR, CRITICAL, ALERT, or EMERGENCY |
Uptime checks can be created for the following services which will test accessing the services from various global locations. Use the host name and path below Instructions when configuring the health checks following the instructions in the section Configuring Alerts and Notifications.
Application | Host Name | Path |
---|---|---|
Auth Server | participants.<prefix>-<env>.<domain> |
/auth-server/healthCheck |
Participant Consent Datastore | participants.<prefix>-<env>.<domain> |
/participant-consent-datastore/healthCheck |
Participant Enroll Datastore | participants.<prefix>-<env>.<domain> |
/participant-enroll-datastore/healthCheck |
Participant Manager | participants.<prefix>-<env>.<domain> |
/participant-manager/index.html |
Participant Manager Datastore | participants.<prefix>-<env>.<domain> |
/participant-manager-datastore/healthCheck |
Participant User Datastore | participants.<prefix>-<env>.<domain> |
/participant-user-datastore/healthCheck |
Response Datastore | participants.<prefix>-<env>.<domain> |
/response-datastore/healthCheck |
Study Datastore | studies.<prefix>-<env>.<domain> |
/study-datastore/healthCheck |
Study Builder | studies.<prefix>-<env>.<domain> |
/studybuilder/healthCheck.do |
Monitoring in the Audit project includes Cloud Storage, and BigQuery. There are metrics based alerts for Cloud Storage and BigQuery. Optional logs based alerts can include the amount of logs ingested. Instructions for creating the individual alerts can be found in the section Configuring Alerts and Notifications.
Dashboard | Resources to be Monitored |
---|---|
Cloud Storage | Requests, Network Traffic Sent, Network Traffic Received, Object Count, Object Size |
BigQuery | Tables, Stored Bytes, Uploaded Rows |
Alert | Threshold | Description |
---|---|---|
Bigquery - dataset | # |
Number of bytes stored |
GCS Bucket - Sent bytes | > 1024MB |
Cloud Storage bucket sent data in bytes |
GCS Bucket - Received bytes | > 1024MB |
Cloud Storage bucket received data in bytes |
GCS Bucket - Total bytes | >= 80% |
Cloud Storage cumulative data in bytes |
GCS Bucket - Object count | 5000 |
Cloud Storage bucket object counts |
Log based alerts are not necessary for this project but if needed can be configured to collect payload metrics.
Alert | Type | Description |
---|---|---|
Log bucket monthly bytes ingested | Log | Log bucket month-to-date bytes ingested. |
Log bytes ingested | Log | Log bytes ingested to know the billing of ingested data |
Monitoring in the Data project includes Cloud Storage, Cloud Pub/Sub, and Cloud SQL. There are metrics based alerts for SSL certificates, Cloud SQL, Cloud Storage, Cloud Pub/Sub, and Cloud Functions. This project does not contain logs based alerts. Instructions for creating the individual alerts can be found in the section Configuring Alerts and Notifications.
Dashboard | Resources to be Monitored |
---|---|
Cloud Storage | Requests, Network Traffic Sent, Network Traffic Received, Object Count, Object Size |
Cloud Pub/Sub | Publish Message Operations, Average Message Size |
Cloud SQL | Queries, Network Connections, CPU Utilization, Memory Utilization, Disk Utilization |
Alert | Threshold | Description |
---|---|---|
SSL certificate expiring soon | 1 Month |
Expiration date of SSL certificates |
Cloud SQL Available for failover | > 0 |
A value >0 indicates the failover operation is available on the instance. |
Cloud SQL Connections | < 500 |
Number of SQL Connections |
Cloud SQL Database - Memory utilization | >= 70% |
Memory utilization percentage of the Cloud SQL instance |
Cloud SQL Database - Disk utilization | >= 70% |
Disk utilization percentage of the Cloud SQL instance |
Cloud SQL Instances State | 1 or 0 |
Cloud SQL instance state - a value 1 indicates the instance is up |
Cloud SQL Database - CPU utilization | >=70% |
CPU utilization percentage of the Cloud SQL instance |
Cloud SQL InnoDB pages written | >=70% |
InnoDB pages written percentage |
Cloud SQL InnoDB pages read | >= 70% |
InnoDB pages read percentage |
Cloud SQL - Disk write IO | >= 1000/s |
Disk IO write count of the Cloud SQL instance |
Cloud SQL - Disk read IO | >= 1000/s |
Disk IO read count of the Cloud SQL instance |
Cloud SQL Database - Queries | # |
Number of running SQL Queries |
GCS Bucket - Sent bytes | 1024 MB |
Cloud Storage bucket sent data in bytes |
GCS Bucket - Received bytes | 1024 MB |
Cloud Storage bucket received data in bytes |
GCS Bucket - Total bytes | >= 80% |
Cloud Storage cumulative data in bytes |
GCS Bucket - Object count | 5000 |
Cloud Storage bucket object count |
Cloud Pub/Sub-Topic-Publish requests | >= 10MB |
Cloud Pub/Sub Topics Publish message request |
Cloud Pub/Sub- subscription-Backlog size | <= 1MB |
Size of Cloud Pub/Sub subscription backlog |
Oldest unacked message age | <= 1s |
Age (in seconds) of the oldest unacknowledged message |
Cloud Function - Execution times | >= 9 minutes |
Cloud Function Execution time |
Monitoring in the Devops project is primarily based around Cloud storage usage with metrics based alerts triggering on Cloud Storage criteria. Logs based alerts are used to track failures in Cloud Build. Instructions for creating the individual alerts can be found in the section Configuring Alerts and Notifications.
Below are the recommended dashboard components and alerts for the Devops project
| Dashboard | Resources to be Monitored | | Cloud Storage | Requests, Network Traffic Sent, Network Traffic Received, Object Count, Object Size |
Alert | Threshold | Description |
---|---|---|
GCS Bucket - Sent bytes | >= 1024 MB |
Cloud Storage bucket sent data in bytes |
GCS Bucket - Received bytes | >= 1024 MB |
Cloud Storage bucket received data in bytes |
GCS Bucket - Total bytes | >= 80% |
Cloud Storage cumulative data in bytes |
GCS Bucket - Object count | >= 5000 |
Cloud Storage bucket Object counts |
Alert | Metric Query | Description |
---|---|---|
Cloud Build failure | resource.type="build", logName=("projects/<projectname>/logs/cloudaudit.googleapis.com%2f Activity" OR "projects/<projectname>/logs/cloudaudit.googleapis.com%2Fdata_access" OR "projects/<projectname>/logs/cloudbuild"), severity>=ERROR, |
Trigger an alert when Cloud Build experiences a failure or error |
Monitoring in the Data project includes Cloud Storage, Cloud Pub/Sub, and firewall configurations. There are metrics based alerts for SSL certificates, Cloud Storage, Cloud Pub/Sub, Firestore, and Cloud Functions. Optional logs based alerts can include the amount of logs ingested. Instructions for creating the individual alerts can be found in the section Configuring Alerts and Notifications.
Dashboard | Resources to be Monitored |
---|---|
Cloud Storage | Requests, Network Traffic Sent, Network Traffic Received, Object Count, Object Size |
Cloud Pub/Sub | Tables, Stored Bytes, Uploaded Rows |
App Engine ( informational Dashboard) | App Engine Dashboard will give the information about the firewall configuration. |
Alert | Threshold | Description |
---|---|---|
SSL certificate expiring soon | 1 Month |
Expiration date of SSL certificates |
GCS Bucket - Object count | >= 5000 |
Total number of objects grouped by storage class. This value is measured once per day, and the value is repeated at each sampling interval throughout the day. For this metric, the sampling period is a reporting period, not a measurement period. |
GCS Bucket - Sent bytes | > 1024MB |
Cloud Storage bucket sent data in bytes |
GCS Bucket - Received bytes | > 1024MB |
Cloud Storage bucket received data in bytes |
GCS Bucket - Total bytes | >= 80% |
Cloud storage cumulative data in bytes |
Oldest unacked message age | < 1s |
Age (in seconds) of the oldest unacknowledged message |
Pub/Sub - Publish message size | 10MB |
Distribution of publish message sizes (in bytes) |
Firestore-instance - read | # |
Number of successful document reads from queries or lookups |
Firestore-instance - write | # |
Number of successful document writes |
Firestore-instance - delete | # |
Number of document deletes |
Cloud Function - Execution times | >= 9 minutes |
Cloud Function execution time |
Log based alerts are not necessary for this project but if needed can be configured to collect payload metrics.
| Alert | Type | Description | | Log bucket monthly bytes ingested | Log | Log bucket month-to-date bytes ingested. | | Log bytes ingested | Log | Log bytes ingested to know the billing of ingested data |
Monitoring in the Networks project includes GCE VMs, network traffic and firewall configuration. There are metrics based alerts for the VPC usage and VM utilization. This project does not contain logs based alerts. Instructions for creating the individual alerts can be found in the section Configuring Alerts and Notifications.
Dashboard | Resources to be Monitored |
---|---|
VM instances | CPU , Memory, Disk Utilization, Network Traffic |
Firewalls | Firewall configuration information |
Alert | Type | Description |
---|---|---|
Instances Per VPC Network quota limit | # |
Quota limit for compute.googleapis.com/instances_per_vpc_network |
Instances Per VPC Network quota usage | # |
Current usage on quota metric compute.googleapis.com/instances_per_vpc_network |
VM - CPU utilization | >= 80% |
CPU utilization percentage |
VM - disk utilization | >= 80% |
Disk utilization percentage |
VM - memory utilization | >= 80% |
Memory utilization percentage |
There are not recommended monitoring or alerts to be configured for the Secrets project.
To configure monitoring dashboards go to the Dashboards Overview page in the Monitoring section of the console and use the project selector at the top to select the project. Use the Create Dashboard
button to create individual dashboards and then configure the recommended resources for the project.
Configure Notification Channels
When creating an alerting policy, you can select a configured notification channel and add it to your policy. You can pre-configure your notification channels, or you can configure them as part of the process of creating an alerting policy.
Email notification channels can also be created during the creation of an alerting policy. For more information, see Creating a channel on demand. If you use a group email address as the notification channel for an alerting policy, make sure the group is configured to allow incoming mail from [[email protected]](mailto:[email protected])
.
To configure a notification channel, you must have one of the following Identity and Access Management roles on the project being monitored
Monitoring Notification Channel Editor
Monitoring Editor
Monitoring Admin
Project Editor
Project Owner
A notification channel must be created for each project. Use the following steps to configure a notification channel:
- Go to the Alerting page in the Monitoring section of the console and use the project selector at the top to select the project
- Click the
Edit Notification Channels
button - If creating an email channel, locate the channel type
Email
, clickAdd New
and follow the instructions - If creating another type of notification channel such as a mobile app, PagerDuty, SMS, Slack, Pub/Sub, or webhooks, see the instructions for Creating channels.
- Go to the Uptime checks page in the Monitoring section of the console and use the project selector at the top to select the project
- Click
Create Uptime Check
- Enter a descriptive title such as:
Auth-Server-Health Check
- In the Target, select the protocol
HTTPS
- Choose the Resource Type
URL
- Enter the hostname and path as specified in the project instructions above
- Select a frequency of how often you would like to run the checks
- If you want to only run checks from certain geographic regions, you can edit this in
More Target Options
- Select a value for the response timeout to use and make sure the box
Log check failures
is checked to save errors to Cloud Logging - Give the alert a name and select one of your pre-configured notification channels
- Verify the configuration using the
Test
button and thenCreate
once the settings are confirmed
- Go to the Alerting page in the Monitoring section of the console and use the project selector at the top to select the project
- Click on the
Create Policy
button - Click
Add Condition
- Search for the target resource type, metric, and condition specified in the project recommendations above
- After adding the condition, select one of your pre-configured notification channels and add auto-close duration before reviewing the configuration and creating the policy
- Go to the Logs Explorer page in the Logging section of the console and use the project selector at the top to select the project
- In the Query field enter the metrics query specified in the project recommendations above and then run the query to validate the results
- In the query results pane click on the
Actions
menu and thenCreate log alert
- In the alert details pane, add a name and description
- In
Choose logs to include in the alert
check the query and results by clickingPreview logs
- Select a minimum time between notifications
- Select one of your pre-configured notification channels and save the alert