Overview
Use Stackdriver Monitoring to monitor signals and build operations in your Kubernetes Engine clusters. Stackdriver Monitoring can access metrics about CPU utilization, some disk traffic metrics, network traffic, and uptime information. It uses the Monitoring agent to access additional system resources and application services in virtual machine instances.
Rationale
Enabling Stackdriver Monitoring gives you both system metrics and custom metrics. System metrics are measurements of the cluster's infrastructure, such as CPU or memory usage. For system metrics, Stackdriver creates a Deployment that periodically connects to each node, collects metrics about its Pods and containers, and sends those metrics to Stackdriver. System resource usage metrics are collected from the CPU, memory, evictable memory, non-evictable memory, and disk sources.
Remediation guidance
Using Console
- Go to the Kubernetes Engine page in the GCP Console by visiting https://console.cloud.google.com/kubernetes/list?
- Select each reported Kubernetes cluster for which monitoring is disabled
- Click the EDIT button and set 'Stackdriver Monitoring' to Enabled
Using Command Line
To enable monitoring for an existing cluster, run the following command:
gcloud container clusters update [CLUSTER_NAME] --zone [COMPUTE_ZONE] --monitoring-service monitoring.googleapis.com
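After the update, it can help to read the cluster's monitoringService field back and confirm it holds the expected value. The following is a minimal sketch, assuming an authenticated gcloud CLI; the helper names is_monitoring_enabled and check_cluster are hypothetical and not part of gcloud.

```shell
#!/bin/sh
# Value this control expects in the cluster's monitoringService field.
EXPECTED_SERVICE="monitoring.googleapis.com"

# is_monitoring_enabled (hypothetical helper): succeeds only when the
# supplied monitoringService value equals the expected one.
is_monitoring_enabled() {
  [ "$1" = "$EXPECTED_SERVICE" ]
}

# check_cluster (hypothetical helper): reads monitoringService for one
# cluster and reports the result. Requires an authenticated gcloud CLI.
check_cluster() {
  cluster="$1"
  zone="$2"
  service="$(gcloud container clusters describe "$cluster" \
    --zone "$zone" --format='value(monitoringService)')"
  if is_monitoring_enabled "$service"; then
    echo "$cluster: monitoring enabled"
  else
    echo "$cluster: monitoring NOT enabled (monitoringService='$service')"
    return 1
  fi
}
```

A typical call would be check_cluster [CLUSTER_NAME] [COMPUTE_ZONE], using the same placeholders as the update command. The function returns a non-zero status when the cluster still needs remediation.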
Impact
You are charged for the accrued storage costs when you export logs to another Google Cloud Platform service, such as BigQuery. Exporting logs from Stackdriver has no Stackdriver charge.
Default Value
By default, Stackdriver Monitoring is enabled when you create a new cluster using the gcloud command-line tool or Google Cloud Platform Console.
References
- https://cloud.google.com/kubernetes-engine/docs/how-to/creating-a-container-cluster
- https://cloud.google.com/kubernetes-engine/docs/how-to/monitoring?hl=en_US
- https://cloud.google.com/monitoring/agent/
Notes
If you are using Stackdriver Logging, Stackdriver Error Reporting, Debugging, or Stackdriver Trace, and you are not using any services from Stackdriver Monitoring, then you do not have to associate your GCP project with a Stackdriver account and you do not have to select a service tier. By default, Stackdriver limits the features available to your project to those features in the Basic Tier of service.
Multiple Remediation Paths
Google Cloud
SERVICE-WIDE (RECOMMENDED when many resources are affected): Enforce Organization Policies at org/folder level so new resources inherit secure defaults.
gcloud org-policies set-policy policy.yaml
ASSET-LEVEL: Use the product-specific remediation steps above for only the impacted project/resources.
PREVENTIVE: Use org policy constraints/custom constraints and enforce checks in deployment pipelines.
References for Service-Wide Patterns
- GCP Organization Policy overview: https://cloud.google.com/resource-manager/docs/organization-policy/overview
- GCP Organization policy constraints catalog: https://cloud.google.com/resource-manager/docs/organization-policy/org-policy-constraints
- gcloud org-policies: https://cloud.google.com/sdk/gcloud/reference/org-policies
Operational Rollout Workflow
Use this sequence to reduce risk and avoid repeated drift.
1. Contain at Service-Wide Scope First (Recommended)
- Google Cloud: apply organization policy constraints at org/folder scope.
gcloud org-policies set-policy policy.yaml
2. Remediate Existing Affected Assets
- Execute the control-specific Console/CLI steps documented above for each flagged resource.
- Prioritize internet-exposed and production assets first.
3. Validate and Prevent Recurrence
- Re-scan after each remediation batch.
- Track exceptions with owner and expiry date.
- Add preventive checks in IaC/CI pipelines.
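The re-scan and pipeline check in step 3 could be scripted along these lines. This is a sketch under stated assumptions, not a definitive implementation: it assumes an authenticated gcloud CLI, the helper names find_noncompliant and scan_project are hypothetical, and the comparison is a strict equality against monitoring.googleapis.com, mirroring the stored check for this control.

```shell
#!/bin/sh
# find_noncompliant (hypothetical helper): reads tab-separated lines of
# "name<TAB>location<TAB>monitoringService" on stdin and prints the name of
# every cluster whose service is not monitoring.googleapis.com.
find_noncompliant() {
  awk -F '\t' 'NF && $3 != "monitoring.googleapis.com" { print $1 }'
}

# scan_project (hypothetical helper): lists all clusters in a project and
# fails when any is non-compliant. Requires an authenticated gcloud CLI.
scan_project() {
  project="$1"
  offenders="$(gcloud container clusters list --project "$project" \
    --format='value(name,location,monitoringService)' | find_noncompliant)"
  if [ -n "$offenders" ]; then
    echo "Clusters without Stackdriver Monitoring:"
    echo "$offenders"
    return 1
  fi
  echo "All clusters have Stackdriver Monitoring enabled."
}
```

In a pipeline, calling scan_project with a project ID and failing the job on a non-zero status gives the same pass condition as the stored check: an empty list of offending clusters.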
Query logic
These are the stored checks tied to this control.
Stackdriver Monitoring is set to Enabled on Kubernetes Engine Clusters
Connectors
Covered asset types
Expected check: eq []
gkeClusters(where:{monitoringService_NOT:"monitoring.googleapis.com"}){...AssetFragment}
Google Cloud