Container technology is a key enabler of the transition to DevOps practices and cloud infrastructure. Kubernetes enables developers and their collaborators to orchestrate containerized applications. You can use it to build a modern microservices architecture, scale it seamlessly, and manage ongoing operations.
However, you cannot run Kubernetes in an enterprise environment without robust monitoring. In this article, we'll explain some of the unique considerations of enterprise Kubernetes and how to set up a monitoring infrastructure that allows you to identify and prevent operational and security issues.
Why is Enterprise Kubernetes Important?
Enterprise Kubernetes is a foundation for digital transformation at many organizations. In the age before container technology, different departments and technical teams were using different tools, applications, and hardware, with no standardized platform for developing and deploying software. An enterprise Kubernetes strategy gives the organization that common platform.
Kubernetes can help shift security and testing left, ensuring they start earlier in the software development life cycle. It can help standardize change cycles and make them more manageable, helping the entire organization deliver more value with the same investment in software projects.
Kubernetes automates many steps of the deployment process by turning components into reusable container images, a key enabler for DevOps processes. Based on open API and service standards, Kubernetes offers endpoints that can be managed, orchestrated, and governed consistently and programmatically. This enables organizations to build and run cross-platform deployments, replicating the same environment in development, test, and production.
How Does Kubernetes Monitoring Work?
The Kubernetes Metrics Server aggregates the data collected by the kubelet on each node and exposes it through the Metrics API.
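As a rough sketch of how Metrics API output might be consumed, the snippet below parses node usage figures expressed in Kubernetes quantity notation (`250m` millicores, `1048576Ki` memory). The JSON shape mirrors what `GET /apis/metrics.k8s.io/v1beta1/nodes` returns, but the node names and usage values are invented for illustration:

```python
# Sketch: parsing node usage from a Metrics API response.
# Node names and usage values below are made up for illustration.

def parse_cpu(quantity: str) -> float:
    """Convert a Kubernetes CPU quantity (e.g. '250m', '2') to cores."""
    if quantity.endswith("n"):
        return int(quantity[:-1]) / 1e9
    if quantity.endswith("u"):
        return int(quantity[:-1]) / 1e6
    if quantity.endswith("m"):
        return int(quantity[:-1]) / 1e3
    return float(quantity)

def parse_memory(quantity: str) -> int:
    """Convert a Kubernetes memory quantity (e.g. '1024Ki') to bytes."""
    units = {"Ki": 2**10, "Mi": 2**20, "Gi": 2**30, "Ti": 2**40}
    for suffix, factor in units.items():
        if quantity.endswith(suffix):
            return int(quantity[:-2]) * factor
    return int(quantity)

# A trimmed-down example response from the Metrics API.
response = {
    "items": [
        {"metadata": {"name": "node-1"},
         "usage": {"cpu": "250m", "memory": "1048576Ki"}},
        {"metadata": {"name": "node-2"},
         "usage": {"cpu": "1500m", "memory": "2097152Ki"}},
    ]
}

for item in response["items"]:
    name = item["metadata"]["name"]
    cpu_cores = parse_cpu(item["usage"]["cpu"])
    mem_bytes = parse_memory(item["usage"]["memory"])
    print(f"{name}: {cpu_cores:.2f} cores, {mem_bytes / 2**30:.1f} GiB")
```

In practice you would fetch this data with `kubectl top nodes` or an HTTP client authenticated against the cluster; the parsing logic stays the same.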
Kubernetes also offers the Node Problem Detector, a DaemonSet that performs problem detection on nodes, collecting metrics and reporting them as node conditions and events back to the API server. This data can be fed to various monitoring and visualization tools, most notably Prometheus, discussed in more detail below.
Some key indicators to consider when monitoring:
Node status—including local system resources and connectivity.
Pod availability—important because pods that are unavailable can indicate an issue with readiness probes or a configuration problem.
Memory, disk, and CPU utilization—should be monitored for both pods and nodes; these are critical metrics for assessing the current health of a cluster component.
API request latency—the time (in milliseconds) it takes the API server to respond to client requests.
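Indicators like these typically feed simple threshold checks. As a sketch, the snippet below computes a nearest-rank 99th-percentile latency over a set of samples and compares it to a threshold; both the sample values and the 200 ms threshold are illustrative, not recommendations:

```python
# Sketch: flagging slow API requests from a set of latency samples.
# Sample values are invented; in practice they would come from
# API server request-duration metrics.

def percentile(samples, pct):
    """Return the pct-th percentile (nearest-rank method) of samples."""
    ordered = sorted(samples)
    rank = max(0, int(round(pct / 100 * len(ordered))) - 1)
    return ordered[rank]

latencies_ms = [12, 15, 11, 230, 14, 13, 380, 16, 12, 15]

p99 = percentile(latencies_ms, 99)
if p99 > 200:  # example threshold -- tune per cluster
    print(f"alert: p99 API latency {p99} ms exceeds 200 ms")
```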
Kubernetes Monitoring Tools and Utilities
Prometheus lets you collect and monitor metrics for cloud native systems, including Kubernetes. It is offered under an open source license, and the project is hosted by the Cloud Native Computing Foundation (CNCF). Prometheus is typically deployed with an exporter pod on each node and collects data on an ongoing basis. The time series data is stored in the Prometheus database, and alerts can be generated automatically based on predefined conditions.
The Prometheus dashboard is quite limited, so it is often extended with other visualization tools, such as Grafana. Administrators can access the Prometheus database directly for complex querying, debugging, and reporting, and can define customized reports for development, test, and operations teams. Prometheus data can also be exported to third-party data analysis systems, such as BI tools.
The Kubernetes dashboard lets you manage cluster resources and debug container applications from a simple web interface. It provides an overview of each cluster node and its resources, and shows all namespaces and predefined storage classes in the cluster.
cAdvisor is a utility that collects data on resource usage and resource isolation at the node and container levels. Its data serves as a basis for many monitoring and visualization tools, and it is also often queried directly by administrators and developers.
Collect Logs From All Layers
To track bugs, crashes, and performance issues, you need to collect logs from the different layers of the environment—cluster, container, and the controller manager. For example, to debug issues in Kubernetes pods, developers first need to ensure the container is running and obtain runtime metrics from the controller manager.
Ensure Data Consistency Across Layers
To quickly and accurately debug systems, you must maintain log data integrity at all levels of the container environment. Using correct timestamps and consistent units of measurement (for example, milliseconds everywhere rather than a mix of milliseconds and seconds), along with common metrics in both applications and Kubernetes components, enables fast and accurate troubleshooting and resolution.
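As a minimal sketch of this idea, the snippet below normalizes two log events to the same clock (UTC epoch milliseconds) and the same latency unit before correlation. It assumes, purely for illustration, that application logs use epoch seconds while Kubernetes component logs use RFC 3339 strings:

```python
# Sketch: normalizing timestamps and units across layers before
# correlating events. Event payloads are invented for illustration.

from datetime import datetime, timezone

def to_epoch_ms(value):
    """Normalize a timestamp (epoch seconds or RFC 3339) to UTC epoch ms."""
    if isinstance(value, (int, float)):
        return int(value * 1000)
    dt = datetime.fromisoformat(value.replace("Z", "+00:00"))
    return int(dt.astimezone(timezone.utc).timestamp() * 1000)

app_event = {"ts": 1700000000.5, "latency_s": 0.042}
k8s_event = {"ts": "2023-11-14T22:13:20.500Z", "latency_ms": 42}

# After normalization, both events line up on the same clock and unit.
assert to_epoch_ms(app_event["ts"]) == to_epoch_ms(k8s_event["ts"])
assert int(app_event["latency_s"] * 1000) == k8s_event["latency_ms"]
print("events aligned at", to_epoch_ms(app_event["ts"]), "ms")
```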
Don’t Focus Monitoring on Individual Containers
Because Kubernetes resources change dynamically and container replicas are provisioned identically, monitoring individual container resources can be very noisy.
Metrics can change on an hourly basis due to the short life cycle of a typical container. For example, when a Deployment rolls out a new ReplicaSet, per-ReplicaSet metrics start over under a new ReplicaSet ID. Typically, you track patterns across sets of containers rather than metrics on individual containers. To do this, aggregate cAdvisor's container-level metrics across groups of containers.
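The snippet below sketches this kind of aggregation: per-container CPU samples are grouped by workload rather than tracked individually. Pod names and values are invented, and deriving the workload by stripping the ReplicaSet and pod hashes from the pod name is a simplification — real pipelines usually group by labels:

```python
# Sketch: aggregating per-container CPU samples by workload instead of
# tracking each container individually. Samples are illustrative; real
# figures would come from cAdvisor container-level metrics.

from collections import defaultdict

samples = [
    # (pod name, CPU millicores); pods named <deployment>-<rs-hash>-<pod-id>
    ("checkout-7d9f6-abcde", 120),
    ("checkout-7d9f6-fghij", 135),
    ("checkout-7d9f6-klmno", 110),
    ("payments-5c4b2-pqrst", 80),
    ("payments-5c4b2-uvwxy", 95),
]

def workload_of(pod_name: str) -> str:
    """Strip the ReplicaSet and pod hashes to recover the workload name."""
    return pod_name.rsplit("-", 2)[0]

totals = defaultdict(list)
for pod, cpu in samples:
    totals[workload_of(pod)].append(cpu)

for workload, values in sorted(totals.items()):
    avg = sum(values) / len(values)
    print(f"{workload}: avg {avg:.0f}m across {len(values)} pods")
```

Aggregated this way, the "checkout" and "payments" workloads each produce one stable time series, even as their underlying pods churn.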
Always Alert on High Disk Usage
High disk utilization (HDU) is one of the most common problems in a Kubernetes system. Kubernetes does not provide self-healing capabilities for StatefulSets or statically attached volumes, which are frequently used. HDU warnings are always significant and can lead to severe application issues. Make sure to keep track of disk volumes, including the root file system, on all Kubernetes components.
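A minimal sketch of such a check is shown below. The volume names, sizes, and the 80% threshold are all hypothetical; in production these figures would come from kubelet or node-level filesystem metrics and the alert would be raised through your monitoring stack rather than printed:

```python
# Sketch: a simple high-disk-usage check over per-volume usage figures.
# Volume names and sizes are invented for illustration.

THRESHOLD = 0.80  # alert above 80% utilization -- tune per environment

volumes = {
    "node-1:/":        (95 * 2**30, 100 * 2**30),   # (used, total) bytes
    "node-1:/var/lib": (40 * 2**30, 100 * 2**30),
    "pv-orders-data":  (171 * 2**30, 200 * 2**30),
}

def over_threshold(used: int, total: int, threshold: float = THRESHOLD) -> bool:
    """True if a volume's utilization exceeds the alert threshold."""
    return used / total > threshold

alerts = [name for name, (used, total) in volumes.items()
          if over_threshold(used, total)]
for name in alerts:
    print(f"alert: {name} above {THRESHOLD:.0%} disk utilization")
```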
Enterprise Kubernetes can be a boon for an enterprise transitioning to a DevOps-driven cloud environment. However, Kubernetes is complex and problems cannot be easily identified or diagnosed using traditional monitoring tools.
Just as Kubernetes revolutionized application development and deployment, it also brings its own ecosystem of monitoring tools and new monitoring practices. DevOps teams need to be well aware of these practices to enjoy the value of Kubernetes while protecting the organization against operational and production issues.
Gilad David Maayan is a technology writer who has worked with over 150 technology companies including SAP, Imperva, Samsung NEXT, NetApp and Ixia, producing technical and thought leadership content that elucidates technical solutions for developers and IT leadership. Today he heads Agile SEO, the leading marketing agency in the technology industry.