Kubernetes: Observability with Prometheus and Grafana

Paul Escarcena included in Kubernetes

2025-12-28

Updated March 2026: This article uses Prometheus 3.x, Grafana 11.x and the updated kube-prometheus-stack chart.

Prerequisites

This post continues the Kubernetes series. You’ll need:

Everything from the first chapter: Docker (or OrbStack), kubectl and Kind with an active cluster.
Familiarity with Deployments and Services from the second chapter.
Helm installed — we’ll use it to deploy the monitoring stack.

If you don’t have Helm, install it quickly:

# macOS
brew install helm

# Windows (Chocolatey)
choco install kubernetes-helm

Introduction

In the previous chapters we learned how to deploy apps and auto-scale them with the HPA. But there’s a fundamental question we haven’t answered: how do you know what’s happening inside your cluster?

Imagine you’re driving a car with no dashboard — no speedometer, no fuel gauge, no warning lights. You can technically drive, but you’re flying blind. That’s exactly what running a Kubernetes cluster without observability is like.

Prometheus and Grafana are the tools that give you that dashboard. And in this chapter we’re going to set them up in our Kind cluster step by step.

Prometheus collects metrics, Grafana visualizes them — together they give you full visibility into your cluster

What is Prometheus?

Prometheus is an open source monitoring and alerting system, originally created at SoundCloud and now part of the Cloud Native Computing Foundation (CNCF) — just like Kubernetes.

Think of it as an obsessive data collector: at a regular interval (60 seconds by default, often tuned to 15 or 30 seconds) it goes to your applications, asks them “how are you doing?” and stores the answers as time series.

Prometheus uses a pull model: it goes out and fetches metrics from your applications

Key Features

Pull model: Prometheus goes and fetches the metrics from your apps (scraping), it doesn’t wait for them to be sent. This gives it full control over what it collects and when.
Time Series Database (TSDB): stores data as timestamp + value pairs, optimized for temporal queries.
PromQL: its own query language for exploring and aggregating metrics.
Alerts: define rules and Prometheus will let you know when something’s wrong.
Service discovery: integrates natively with Kubernetes to automatically discover what to scrape.
Native OTLP: Prometheus 3.x can ingest metrics sent over the OpenTelemetry Protocol (OTLP) directly, once the OTLP receiver is enabled.

What kind of metrics does it collect?

Prometheus works with 4 types of metrics:

Type	What it’s for	Example
Counter	Values that only go up (cumulative)	Total HTTP requests, total errors
Gauge	Values that go up and down	Temperature, memory usage, active Pods
Histogram	Distribution of values in buckets	Request latency (p50, p95, p99)
Summary	Similar to histogram but computes percentiles on the client	Latency with pre-calculated percentiles
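
To make the Histogram row more concrete, here is a minimal Python sketch of how a percentile can be estimated from cumulative bucket counts. It approximates what PromQL's histogram_quantile() does (linear interpolation inside the matching bucket). The function name estimate_quantile and the bucket counts are illustrative assumptions, not part of Prometheus.

```python
import math


def estimate_quantile(q, buckets):
    """Estimate the q-quantile (0 <= q <= 1) from cumulative buckets.

    `buckets` is a list of (upper_bound, cumulative_count) pairs sorted by
    upper bound and ending with the +Inf bucket.
    """
    total = buckets[-1][1]
    if total == 0:
        return math.nan
    rank = q * total
    lower_bound, lower_count = 0.0, 0
    for upper_bound, count in buckets:
        if count >= rank:
            if math.isinf(upper_bound):
                # The rank falls beyond the last finite bound: report that bound.
                return lower_bound
            span = count - lower_count
            if span == 0:
                return lower_bound
            # Linear interpolation inside the bucket that contains the rank.
            fraction = (rank - lower_count) / span
            return lower_bound + (upper_bound - lower_bound) * fraction
        lower_bound, lower_count = upper_bound, count
    return lower_bound


# Illustrative latency buckets in seconds: (le, cumulative count).
example_buckets = [(0.1, 500), (0.5, 900), (1.0, 980), (math.inf, 1000)]
for q in (0.5, 0.95, 0.99):
    print(f"p{round(q * 100)} is about {estimate_quantile(q, example_buckets):.4f}s")
```

Because the estimate interpolates inside a bucket, its accuracy depends on how well the bucket boundaries fit your real latencies.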

How do applications expose metrics?

Your apps expose metrics on an HTTP endpoint (by convention /metrics) in plain text format:

# HELP http_requests_total Total number of HTTP requests
# TYPE http_requests_total counter
http_requests_total{method="GET", path="/api/users", status="200"} 1234
http_requests_total{method="POST", path="/api/users", status="201"} 56

# HELP http_request_duration_seconds Duration of HTTP requests
# TYPE http_request_duration_seconds histogram
http_request_duration_seconds_bucket{le="0.1"} 500
http_request_duration_seconds_bucket{le="0.5"} 900
http_request_duration_seconds_bucket{le="1.0"} 980
http_request_duration_seconds_bucket{le="+Inf"} 1000

Prometheus reads this endpoint periodically and stores the data. If your app doesn’t expose metrics natively, you can use exporters (like node-exporter for operating system metrics or kube-state-metrics for Kubernetes object state).
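
To show how little magic there is in this format, here is a rough Python sketch that parses sample lines like the ones above into (name, labels, value) tuples. It is a simplification for illustration only: it skips comment lines, ignores optional timestamps, and is not the parser Prometheus itself uses. The function name parse_metrics is an assumption.

```python
import re

# metric_name{label="value", ...} value
SAMPLE_RE = re.compile(
    r'^(?P<name>[a-zA-Z_:][a-zA-Z0-9_:]*)'
    r'(?:\{(?P<labels>.*)\})?'
    r'\s+(?P<value>\S+)$'
)
LABEL_RE = re.compile(r'([a-zA-Z_][a-zA-Z0-9_]*)="((?:[^"\\]|\\.)*)"')


def parse_metrics(text):
    """Return a list of (name, labels, value) for each sample line."""
    samples = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        match = SAMPLE_RE.match(line)
        if match is None:
            continue  # unsupported line shape (for example, with a timestamp)
        labels = dict(LABEL_RE.findall(match.group("labels") or ""))
        samples.append((match.group("name"), labels, float(match.group("value"))))
    return samples


example = '''
# HELP http_requests_total Total number of HTTP requests
# TYPE http_requests_total counter
http_requests_total{method="GET", path="/api/users", status="200"} 1234
http_request_duration_seconds_bucket{le="+Inf"} 1000
'''
for name, labels, value in parse_metrics(example):
    print(name, labels, value)
```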

What is Grafana?

Grafana is an open source visualization and dashboards platform. If Prometheus is the one collecting the data, Grafana is the one that turns it into beautiful and useful charts.

Grafana turns raw metrics into interactive, actionable dashboards

Key Features

Multi-datasource: not just Prometheus. It can connect to Loki (logs), Tempo (traces), InfluxDB, Elasticsearch, CloudWatch, and many more.
Interactive dashboards: line charts, bars, gauges, tables, heatmaps, and more — all configurable without code.
Alerts: Grafana has its own alerting system that can complement or replace Prometheus alerts.
Explore: ad-hoc exploration mode for investigating metrics without creating a dashboard.
Templating: variables in dashboards to filter by namespace, pod, node, etc.
Sharing: share dashboards with your team or export as JSON.

What’s the role of each component?

This is the key to understanding how they complement each other:

Component	Role	Analogy
Prometheus	Collect and store metrics	The security camera that records everything
Grafana	Visualize and explore metrics	The monitors where you watch the recordings
Alertmanager	Manage and route alerts	The alarm that goes off when something’s wrong
kube-state-metrics	Expose K8s object state as metrics	The inventory of the cluster (how many pods, deployments, etc.)
node-exporter	Expose operating system metrics	The health checkup of each node (CPU, RAM, disk)

How do metrics flow through the stack?

Understanding the data flow is fundamental. This is how metrics travel from your app to a chart in Grafana:

The journey of a metric: from your app to a Grafana dashboard

The complete flow

How does Prometheus know what to scrape in Kubernetes?

With the Prometheus Operator (which comes included in the stack we’ll install), you use CRDs to tell Prometheus what to monitor:

ServiceMonitor: “scrape the Pods behind this Service”.
PodMonitor: “scrape these Pods directly” (no Service needed).
PrometheusRule: “evaluate these alerting rules”.

# Example: ServiceMonitor to monitor your app
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: mi-app-monitor
  labels:
    release: kube-prometheus-stack  # Important: must match the Helm chart release
spec:
  selector:
    matchLabels:
      app: mi-app
  endpoints:
    - port: metrics
      interval: 30s
      path: /metrics

This tells Prometheus: “find all Services with the label app: mi-app, scrape the metrics port every 30 seconds on the /metrics path”.
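
The matching rule behind that sentence can be sketched in a few lines of Python. This is a simplified model, assuming plain dictionaries in place of real Kubernetes objects: the actual Prometheus Operator generates scrape configuration and discovers endpoints through the Kubernetes API. The function names are hypothetical.

```python
def selector_matches(match_labels, service_labels):
    """A Service matches when it carries every label in matchLabels."""
    return all(service_labels.get(key) == value for key, value in match_labels.items())


def select_targets(service_monitor, services):
    """Return (service name, port name) pairs the ServiceMonitor would pick."""
    selector = service_monitor["spec"]["selector"]["matchLabels"]
    wanted_ports = {endpoint["port"] for endpoint in service_monitor["spec"]["endpoints"]}
    selected = []
    for service in services:
        if not selector_matches(selector, service["labels"]):
            continue
        for port in service["ports"]:
            # The endpoint refers to the Service port by name, not by number.
            if port["name"] in wanted_ports:
                selected.append((service["name"], port["name"]))
    return selected


monitor = {
    "spec": {
        "selector": {"matchLabels": {"app": "mi-app"}},
        "endpoints": [{"port": "metrics", "interval": "30s", "path": "/metrics"}],
    }
}
services = [
    {"name": "mi-app-svc", "labels": {"app": "mi-app"}, "ports": [{"name": "http"}, {"name": "metrics"}]},
    {"name": "other-svc", "labels": {"app": "other"}, "ports": [{"name": "metrics"}]},
]
print(select_targets(monitor, services))
```

Note the two conditions: the Service labels must satisfy the selector, and the Service must have a port with the name the endpoint refers to.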

Installing the full stack with Helm

The recommended way to install Prometheus + Grafana on Kubernetes is with the kube-prometheus-stack chart. This chart includes everything you need in a single package:

Prometheus Operator
Prometheus
Grafana
Alertmanager
kube-state-metrics
node-exporter
Pre-configured dashboards and alerts

Step 1: Add the Helm repository

helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
helm repo update

Step 2: Create the namespace

kubectl create namespace monitoring

Step 3: Configure values for Kind

In Kind, some control plane components aren’t accessible from Pods. Create a prometheus-values.yaml file with these settings:

# prometheus-values.yaml

# Disable components that aren't accessible in Kind
kubeEtcd:
  enabled: false
kubeScheduler:
  enabled: false
kubeControllerManager:
  enabled: false
kubeProxy:
  enabled: false

# Prometheus configuration
prometheus:
  prometheusSpec:
    # Resources adjusted for Kind (we're not in production)
    resources:
      requests:
        memory: 400Mi
        cpu: 200m
      limits:
        memory: 800Mi
        cpu: 500m
    # Data retention
    retention: 7d
    # Select all ServiceMonitors (without filtering by label)
    serviceMonitorSelectorNilUsesHelmValues: false
    podMonitorSelectorNilUsesHelmValues: false

# Grafana configuration
grafana:
  resources:
    requests:
      memory: 128Mi
      cpu: 100m
    limits:
      memory: 256Mi
      cpu: 200m
  # Admin password (in production use a Secret)
  adminPassword: "admin123"

# Alertmanager configuration
alertmanager:
  alertmanagerSpec:
    resources:
      requests:
        memory: 64Mi
        cpu: 50m
      limits:
        memory: 128Mi
        cpu: 100m

Why do we disable etcd, scheduler, controller-manager and proxy? In Kind, these components bind their metrics endpoints to 127.0.0.1 by default (the control plane ones run inside the control plane container), so Prometheus can’t reach them from a Pod. On managed clusters (EKS, GKE, AKS) the provider operates the control plane, so whether these metrics can be scraped depends on what the provider exposes.

Step 4: Install the chart

helm install kube-prometheus-stack prometheus-community/kube-prometheus-stack \
  --namespace monitoring \
  --values prometheus-values.yaml

This takes a few minutes. Verify that all Pods are running:

kubectl get pods -n monitoring

NAME                                                        READY   STATUS    RESTARTS   AGE
alertmanager-kube-prometheus-stack-alertmanager-0            2/2     Running   0          2m
kube-prometheus-stack-grafana-7b9f5c4d5-x2k8j               3/3     Running   0          2m
kube-prometheus-stack-kube-state-metrics-6c4d7b9f8-abc12     1/1     Running   0          2m
kube-prometheus-stack-operator-5d8f7b6c4-def34               1/1     Running   0          2m
kube-prometheus-stack-prometheus-node-exporter-ghi56          1/1     Running   0          2m
prometheus-kube-prometheus-stack-prometheus-0                 2/2     Running   0          2m

Check the created Services:

kubectl get svc -n monitoring

NAME                                              TYPE        CLUSTER-IP      PORT(S)    AGE
alertmanager-operated                             ClusterIP   None            9093/TCP   2m
kube-prometheus-stack-alertmanager                ClusterIP   10.96.10.1      9093/TCP   2m
kube-prometheus-stack-grafana                     ClusterIP   10.96.10.2      80/TCP     2m
kube-prometheus-stack-kube-state-metrics          ClusterIP   10.96.10.3      8080/TCP   2m
kube-prometheus-stack-operator                    ClusterIP   10.96.10.4      443/TCP    2m
kube-prometheus-stack-prometheus                  ClusterIP   10.96.10.5      9090/TCP   2m
kube-prometheus-stack-prometheus-node-exporter    ClusterIP   10.96.10.6      9100/TCP   2m
prometheus-operated                               ClusterIP   None            9090/TCP   2m

All monitoring stack components running in the monitoring namespace

Accessing Grafana

With the stack installed, let’s access Grafana using port-forward:

kubectl port-forward svc/kube-prometheus-stack-grafana 3000:80 -n monitoring

Open your browser at http://localhost:3000 and log in with:

Username: admin
Password: admin123 (the one we set in the values)

Grafana login screen — use the credentials configured in the Helm values

Included dashboards

The chart installs more than 15 pre-configured dashboards ready to use. The most useful ones:

Dashboard	What it shows
Kubernetes / Compute Resources / Cluster	Cluster overview: total CPU, memory and network
Kubernetes / Compute Resources / Namespace (Pods)	Resource usage by namespace
Kubernetes / Compute Resources / Pod	Individual Pod detail
Kubernetes / Compute Resources / Workload	Resources by Deployment/StatefulSet/DaemonSet
Kubernetes / Kubelet	Kubelet health and Pod lifecycle
Kubernetes / API Server	Request rates, latencies and API Server errors
Kubernetes / CoreDNS	DNS queries, latencies and errors
Kubernetes / Networking / Cluster	Network traffic by namespace
Kubernetes / Persistent Volumes	Persistent storage usage
Node Exporter / Full	Operating system-level metrics per node
Prometheus / Overview	Prometheus self-monitoring

To find them, go to Dashboards (the squares icon in the sidebar) and look in the “General” or “Default” folder.

Pre-configured dashboard showing cluster resources: CPU, memory and network usage

Accessing Prometheus

You can also access the Prometheus UI directly to run PromQL queries:

kubectl port-forward svc/kube-prometheus-stack-prometheus 9090:9090 -n monitoring

Open http://localhost:9090 in your browser.

The Prometheus UI lets you run PromQL queries and explore metrics directly

Trying out some PromQL queries

In the Prometheus query bar, try these queries:

# CPU usage by Pod (in cores)
sum by (pod) (rate(container_cpu_usage_seconds_total{namespace="default", container!=""}[5m]))

# Memory used by Pod (in bytes)
sum by (pod) (container_memory_working_set_bytes{namespace="default", container!=""})

# Total Pods by namespace
count by (namespace) (kube_pod_info)

# Pods that aren't Ready (the series is 1 when the condition holds)
kube_pod_status_ready{condition="false"} == 1

# HTTP requests to the API Server (per second, all series combined)
sum(rate(apiserver_request_total[5m]))

Tip

PromQL is a powerful language. The rate() function calculates the per-second rate of change of a counter, and [5m] indicates the time range to consider. For a complete guide, check out the PromQL documentation.
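
If rate() still feels abstract, this Python sketch shows the core idea under simplifying assumptions: it sums the increases between consecutive counter samples, treats a drop in value as a counter reset, and divides by the elapsed time. The real rate() also extrapolates to the edges of the [5m] window, which this sketch leaves out. The function name simple_rate and the sample data are illustrative.

```python
def simple_rate(samples):
    """Per-second rate from (timestamp_seconds, counter_value) samples, oldest first."""
    if len(samples) < 2:
        return None  # at least two samples are needed
    elapsed = samples[-1][0] - samples[0][0]
    if elapsed <= 0:
        return None
    increase = 0.0
    previous = samples[0][1]
    for _, value in samples[1:]:
        if value < previous:
            # Counter reset (for example, the Pod restarted): it began again from zero.
            increase += value
        else:
            increase += value - previous
        previous = value
    return increase / elapsed


# One sample per minute; the counter grows by 60 each time, so 1 request per second.
print(simple_rate([(0, 100), (60, 160), (120, 220)]))
# The last sample dropped to 10, which the sketch treats as a restart.
print(simple_rate([(0, 100), (60, 160), (120, 10)]))
```

This is why rate() only makes sense on counters: a gauge that goes down would be misread as a reset.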

Verifying Prometheus targets

A target is each endpoint that Prometheus scrapes. To verify that everything is being monitored:

In the Prometheus UI, go to Status → Target health (labelled Status → Targets in the 2.x UI), or open http://localhost:9090/targets directly.

All targets should show as UP (green) — if any are DOWN, check the connectivity

You should see targets like:

serviceMonitor/monitoring/kube-prometheus-stack-kubelet — UP
serviceMonitor/monitoring/kube-prometheus-stack-apiserver — UP
serviceMonitor/monitoring/kube-prometheus-stack-kube-state-metrics — UP
serviceMonitor/monitoring/kube-prometheus-stack-node-exporter — UP
serviceMonitor/monitoring/kube-prometheus-stack-prometheus — UP
serviceMonitor/monitoring/kube-prometheus-stack-grafana — UP

If any target is DOWN, check:

Is the exporter Pod running? (kubectl get pods -n monitoring)
Does the Service have endpoints? (kubectl get endpoints -n monitoring)
Does the ServiceMonitor’s selector match the Service’s labels, and does the endpoint port name exist on the Service?

Alertmanager: the alerting system

The stack includes Alertmanager, which manages the alerts that Prometheus generates. It’s not just “fire an email” — it’s a sophisticated system that:

Groups related alerts so it doesn’t bombard you.
Deduplicates repeated alerts.
Routes to different channels based on severity (Slack for warnings, PagerDuty for critical).
Silences alerts temporarily (useful during maintenance).
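
The first three behaviours in the list above can be sketched in Python. This is a toy model under stated assumptions: alerts are plain dictionaries, routes are an ordered list of (matchers, receiver) pairs, and the receiver names are hypothetical. Alertmanager's real routing tree, timers and silences are considerably richer.

```python
from collections import defaultdict


def group_alerts(alerts, group_by):
    """Drop exact duplicates, then group alerts by the given label names."""
    groups = defaultdict(list)
    seen = set()
    for alert in alerts:
        fingerprint = tuple(sorted(alert["labels"].items()))
        if fingerprint in seen:
            continue  # same label set already received: deduplicate
        seen.add(fingerprint)
        key = tuple(alert["labels"].get(name, "") for name in group_by)
        groups[key].append(alert)
    return dict(groups)


def pick_receiver(labels, routes, default="default-receiver"):
    """Return the receiver of the first route whose matchers all match."""
    for matchers, receiver in routes:
        if all(labels.get(key) == value for key, value in matchers.items()):
            return receiver
    return default


alerts = [
    {"labels": {"alertname": "KubePodCrashLooping", "namespace": "default", "pod": "a", "severity": "warning"}},
    {"labels": {"alertname": "KubePodCrashLooping", "namespace": "default", "pod": "a", "severity": "warning"}},
    {"labels": {"alertname": "KubePodCrashLooping", "namespace": "default", "pod": "b", "severity": "warning"}},
]
routes = [({"severity": "critical"}, "pagerduty"), ({"severity": "warning"}, "slack")]

for key, members in group_alerts(alerts, ["alertname", "namespace"]).items():
    print(key, len(members), "alert(s) ->", pick_receiver(members[0]["labels"], routes))
```

Three incoming alerts become one notification group with two distinct alerts, sent to a single channel.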

Access the Alertmanager UI:

kubectl port-forward svc/kube-prometheus-stack-alertmanager 9093:9093 -n monitoring

Open http://localhost:9093.

Alertmanager groups, deduplicates and routes alerts — here you can see the active alerts

Pre-configured alerts

The chart includes dozens of ready-to-go alerts. Some of the most important ones:

Alert	What it detects	Severity
`KubePodCrashLooping`	A Pod that keeps restarting	warning
`KubePodNotReady`	A Pod that won’t transition to Ready	warning
`KubeDeploymentReplicasMismatch`	A Deployment with fewer replicas than desired	warning
`KubeNodeNotReady`	A node that has been NotReady for several minutes	warning
`NodeMemoryHighUtilization`	A node with more than 90% memory usage	warning
`TargetDown`	Targets of a job that stopped responding	warning
`Watchdog`	An alert that’s always active (to verify the pipeline works)	none

To see the configured rules:

kubectl get prometheusrules -n monitoring

Monitoring your own app

So far we’re seeing cluster metrics. How do you add metrics from your app? You need two things:

1. Your app needs to expose metrics

Whether your app is in Node.js, Python, Go, Java or any language, there are Prometheus libraries that make this easy. For example, for a generic app that exposes /metrics:

# deployment with metrics port
apiVersion: apps/v1
kind: Deployment
metadata:
  name: mi-app
  labels:
    app: mi-app
spec:
  replicas: 2
  selector:
    matchLabels:
      app: mi-app
  template:
    metadata:
      labels:
        app: mi-app
    spec:
      containers:
        - name: mi-app
          image: mi-app:latest
          ports:
            - name: http
              containerPort: 8080
            - name: metrics
              containerPort: 9090

---
# Service that exposes both ports
apiVersion: v1
kind: Service
metadata:
  name: mi-app-svc
  labels:
    app: mi-app
spec:
  selector:
    app: mi-app
  ports:
    - name: http
      port: 8080
      targetPort: 8080
    - name: metrics
      port: 9090
      targetPort: 9090
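
The manifests above assume the container already serves metrics. As a minimal sketch of what that means, here is a Python app that uses only the standard library to count requests and expose them on /metrics. The metric name app_requests_total is hypothetical, and for brevity the app and the metrics share one port, whereas the Deployment above uses 8080 and 9090. In a real service you would more likely use an official Prometheus client library for your language.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

request_count = 0  # hypothetical counter: requests served by the app


def render_metrics():
    """Render the counter in the Prometheus text exposition format."""
    return (
        "# HELP app_requests_total Total requests served by the app.\n"
        "# TYPE app_requests_total counter\n"
        f"app_requests_total {request_count}\n"
    )


class AppHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        global request_count
        if self.path == "/metrics":
            body = render_metrics().encode("utf-8")
            content_type = "text/plain; version=0.0.4; charset=utf-8"
        else:
            request_count += 1  # count only application traffic, not scrapes
            body = b"hello\n"
            content_type = "text/plain; charset=utf-8"
        self.send_response(200)
        self.send_header("Content-Type", content_type)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the demo output quiet


def fetch(port, path):
    """GET a path from the local server and return the body as text."""
    conn = http.client.HTTPConnection("127.0.0.1", port, timeout=5)
    try:
        conn.request("GET", path)
        return conn.getresponse().read().decode("utf-8")
    finally:
        conn.close()


# Demo: bind to a free local port, send one app request, then scrape once.
server = HTTPServer(("127.0.0.1", 0), AppHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()
port = server.server_address[1]
fetch(port, "/")
scraped = fetch(port, "/metrics")
server.shutdown()
server.server_close()
print(scraped)
```

Inside a container you would bind to 0.0.0.0 on the port declared as metrics in the Deployment and keep the server running instead of shutting it down.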

2. Create a ServiceMonitor

apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: mi-app-monitor
  namespace: monitoring
  labels:
    release: kube-prometheus-stack
spec:
  namespaceSelector:
    matchNames:
      - default
  selector:
    matchLabels:
      app: mi-app
  endpoints:
    - port: metrics
      interval: 15s
      path: /metrics

kubectl apply -f servicemonitor.yaml

After a few seconds, go to the Prometheus targets and you should see your app listed. Its metrics will be available in Grafana.

Important

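With the chart’s default values, Prometheus only selects ServiceMonitors that carry the release: kube-prometheus-stack label, and it must match the Helm release name. Our prometheus-values.yaml sets serviceMonitorSelectorNilUsesHelmValues: false, so in this setup Prometheus selects every ServiceMonitor and the label is not strictly required. Keeping it is still a good habit, because it makes the ServiceMonitor work on a default installation too.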

Creating your first dashboard in Grafana

The pre-configured dashboards are great, but eventually you’ll want to create your own. Here’s the process:

Step 1: Create a new dashboard

In Grafana, go to Dashboards → New → New Dashboard → Add visualization.

Step 2: Select the datasource

Select Prometheus as the data source (it’s already configured automatically).

Step 3: Write the query

For example, to see your app’s CPU usage:

rate(container_cpu_usage_seconds_total{namespace="default", pod=~"mi-app.*"}[5m])

Step 4: Customize the visualization

Choose the chart type (Time series, Gauge, Stat, etc.).
Configure the title, legends, units (CPU in cores, memory in bytes, etc.).
Add variables so you can filter by namespace or pod dynamically.

Step 5: Save

Give it a name and save. You can also export the dashboard as JSON to version it in Git.

Create custom dashboards to monitor exactly what your team needs

Tip

Grafana has a community dashboards repository with thousands of dashboards ready to import. Search for “kubernetes” and you’ll find excellent options.

Full stack architecture

So you have the complete picture of how everything fits together:

Complete architecture of the observability stack

Stack architecture: exporters → Prometheus → Grafana + Alertmanager → Notifications

Useful day-to-day commands

# See all stack resources
kubectl get all -n monitoring

# See configured ServiceMonitors
kubectl get servicemonitors -n monitoring

# See alerting rules
kubectl get prometheusrules -n monitoring

# See the Prometheus configuration
kubectl get prometheus -n monitoring -o yaml

# Port-forward to Grafana
kubectl port-forward svc/kube-prometheus-stack-grafana 3000:80 -n monitoring

# Port-forward to Prometheus
kubectl port-forward svc/kube-prometheus-stack-prometheus 9090:9090 -n monitoring

# Port-forward to Alertmanager
kubectl port-forward svc/kube-prometheus-stack-alertmanager 9093:9093 -n monitoring

# See Prometheus logs
kubectl logs -n monitoring prometheus-kube-prometheus-stack-prometheus-0 -c prometheus

# Update the stack (after modifying values)
helm upgrade kube-prometheus-stack prometheus-community/kube-prometheus-stack \
  --namespace monitoring \
  --values prometheus-values.yaml

Official references

Prometheus — Official Prometheus documentation
PromQL — Query language guide
Grafana — Official Grafana documentation
kube-prometheus-stack — Helm chart and configuration
Prometheus Operator — Operator and CRDs documentation
Grafana Dashboards — Community dashboards repository
Alertmanager — Alerting and routing configuration
kube-state-metrics — K8s object state metrics
Kubernetes Monitoring with Prometheus (CNCF) — CNCF guide

Summary

Today we set up the full observability stack in our cluster:

Prometheus collects metrics with a pull model, scraping /metrics endpoints every N seconds.
Grafana visualizes those metrics in interactive, pre-configured dashboards.
Alertmanager manages alerts: it groups, deduplicates and routes them to your notification channels.
kube-state-metrics exposes Kubernetes object state as metrics.
node-exporter exposes operating system-level metrics for each node.
Everything is installed with a single Helm chart: kube-prometheus-stack.
We use ServiceMonitors to tell Prometheus what to scrape.
Metrics travel: App → Prometheus (scraping) → Grafana (visualization).

With Prometheus and Grafana you have full visibility into your cluster. You’re no longer flying blind — now you can see exactly what’s happening, when and why.

In the next chapter we’ll put everything to the test: we’ll deploy two real apps (Python and Java) with metrics, HPA, and see how they behave under load in Grafana. Theory + full hands-on practice.

Did you enjoy this article? Share it with someone who’s setting up their observability stack. And if you have questions, leave me a comment!