Overview
Use Stackdriver Monitoring to monitor signals and build operations in your Kubernetes Engine clusters. Stackdriver Monitoring can access metrics about CPU utilization, some disk traffic metrics, network traffic, and uptime information. It uses the Monitoring agent to access additional system resources and application services in virtual machine instances.
Rationale
Enabling Stackdriver Monitoring gives you both system metrics and custom metrics. System metrics are measurements of the cluster's infrastructure, such as CPU or memory usage. For system metrics, Stackdriver creates a Deployment that periodically connects to each node, collects metrics about its Pods and containers, and sends those metrics to Stackdriver. Metrics for usage of system resources are collected from the CPU, Memory, Evictable memory, Non-evictable memory, and Disk sources.
Remediation guidance
Using Console
- Go to the Kubernetes Engine page in the GCP Console: https://console.cloud.google.com/kubernetes/list
- Select the reported Kubernetes cluster for which monitoring is disabled.
- Click the EDIT button and set 'Stackdriver Monitoring' to Enabled.
Using Command Line
To enable monitoring for an existing cluster, run the following command:
gcloud container clusters update [CLUSTER_NAME] --zone [COMPUTE_ZONE] --monitoring-service monitoring.googleapis.com
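For scripted use, the command above can be assembled as an argument list. The following is a minimal Python sketch, not an official tool: the function names `build_enable_monitoring_cmd` and `enable_monitoring` are hypothetical, and the cluster name and zone are whatever you pass in. It defaults to a dry run; actually executing the command assumes gcloud is installed and authenticated.

```python
import subprocess
from typing import List

# Value taken from the remediation command in this control.
MONITORING_SERVICE = "monitoring.googleapis.com"


def build_enable_monitoring_cmd(cluster_name: str, compute_zone: str) -> List[str]:
    """Return the gcloud argv that enables Stackdriver Monitoring on one cluster."""
    if not cluster_name or not compute_zone:
        raise ValueError("cluster_name and compute_zone are required")
    return [
        "gcloud", "container", "clusters", "update", cluster_name,
        "--zone", compute_zone,
        "--monitoring-service", MONITORING_SERVICE,
    ]


def enable_monitoring(cluster_name: str, compute_zone: str, dry_run: bool = True) -> List[str]:
    """Build the command and, unless dry_run is set, execute it (hypothetical helper)."""
    cmd = build_enable_monitoring_cmd(cluster_name, compute_zone)
    if not dry_run:
        # check=True raises CalledProcessError if gcloud exits non-zero.
        subprocess.run(cmd, check=True)
    return cmd
```

Passing the arguments as a list rather than a single shell string avoids quoting problems if a cluster name contains unexpected characters.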
Impact
You are charged for the accrued storage costs when you export logs to another Google Cloud Platform service, such as BigQuery. Exporting logs from Stackdriver has no Stackdriver charge.
Default Value
By default, Stackdriver Monitoring is enabled when you create a new cluster using the gcloud command-line tool or Google Cloud Platform Console.
References
- https://cloud.google.com/kubernetes-engine/docs/how-to/creating-a-container-cluster
- https://cloud.google.com/kubernetes-engine/docs/how-to/monitoring?hl=en_US
- https://cloud.google.com/monitoring/agent/
Notes
If you are using Stackdriver Logging, Stackdriver Error Reporting, Debugging, or Stackdriver Trace, and you are not using any services from Stackdriver Monitoring, then you do not have to associate your GCP project with a Stackdriver account and you do not have to select a service tier. By default, Stackdriver limits the features available to your project to those features in the Basic Tier of service.
Service-wide remediation
Recommended when many resources are affected: fix the platform baseline first so new resources inherit the secure setting, then remediate the existing flagged resources in batches.
Google Cloud
Use organization or folder policies where available, shared project templates, logs and alerting baselines, and IaC modules so new resources inherit the secure setting.
Operational rollout
- Fix the baseline first at the account, subscription, project, cluster, or tenant scope that owns this control.
- Remediate the currently affected resources in batches, starting with internet-exposed and production assets.
- Re-scan and track approved exceptions with an owner and expiry date.
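The batching step in the rollout above could be planned along these lines. This is an illustrative Python sketch only: the function `plan_batches` and the cluster record keys `name`, `production`, and `internet_exposed` are assumptions made for the example, not fields defined by this control.

```python
from typing import Dict, List


def plan_batches(clusters: List[Dict], batch_size: int = 5) -> List[List[str]]:
    """Order flagged clusters so internet-exposed and production ones come first,
    then split the ordered names into batches of at most batch_size."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")

    def priority(cluster: Dict) -> int:
        # 0 sorts before 1; sorted() is stable, so ties keep their input order.
        urgent = cluster.get("internet_exposed", False) or cluster.get("production", False)
        return 0 if urgent else 1

    names = [c["name"] for c in sorted(clusters, key=priority)]
    return [names[i:i + batch_size] for i in range(0, len(names), batch_size)]
```

Each batch can then be remediated and re-scanned before the next one starts, which keeps the blast radius of any unexpected side effect small.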
Query logic
These are the stored checks tied to this control.
Stackdriver Monitoring is set to Enabled on Kubernetes Engine Clusters
Connectors
Covered asset types
Expected check: eq []
gkeClusters(where:{monitoringService_NOT:"monitoring.googleapis.com"}){...AssetFragment}
Google Cloud
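The stored check above passes when the query returns no clusters, that is, when no cluster has a monitoring service other than monitoring.googleapis.com. A rough Python equivalent of that logic, assuming each cluster is available as a dictionary with a `monitoringService` key (the function names are hypothetical), might look like this:

```python
from typing import Dict, List

EXPECTED_SERVICE = "monitoring.googleapis.com"


def find_noncompliant(clusters: List[Dict]) -> List[Dict]:
    """Mirror monitoringService_NOT: keep clusters whose service differs from the expected one."""
    return [c for c in clusters if c.get("monitoringService") != EXPECTED_SERVICE]


def check_passes(clusters: List[Dict]) -> bool:
    """The control passes when the non-compliant list is empty (expected check: eq [])."""
    return find_noncompliant(clusters) == []
```

In this sketch a cluster with no `monitoringService` value at all is treated as non-compliant; whether the stored query behaves the same way for missing values is not stated here.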