Kubernetes: Observability with Prometheus and Grafana

Updated March 2026: This article uses Prometheus 3.x, Grafana 11.x and the updated kube-prometheus-stack chart.
Prerequisites
This post continues the Kubernetes series. You’ll need:
- Everything from the first chapter: Docker (or OrbStack), kubectl and Kind with an active cluster.
- Familiarity with Deployments and Services from the second chapter.
- Helm installed — we’ll use it to deploy the monitoring stack.
If you don’t have Helm, install it quickly:
# macOS
brew install helm
# Windows (Chocolatey)
choco install kubernetes-helm
Introduction
In the previous chapters we learned how to deploy apps and auto-scale them with the HPA. But there’s a fundamental question we haven’t answered: how do you know what’s happening inside your cluster?
Imagine you’re driving a car with no dashboard — no speedometer, no fuel gauge, no warning lights. You can technically drive, but you’re flying blind. That’s exactly what running a Kubernetes cluster without observability is like.
Prometheus and Grafana are the tools that give you that dashboard. And in this chapter we’re going to set them up in our Kind cluster step by step.

Prometheus collects metrics, Grafana visualizes them — together they give you full visibility into your cluster
What is Prometheus?
Prometheus is an open source monitoring and alerting system, originally created at SoundCloud and now part of the Cloud Native Computing Foundation (CNCF) — just like Kubernetes.
Think of it as an obsessive data collector: at a regular interval (one minute by default in Prometheus itself, and commonly 30 seconds in Operator-based setups like the one we’ll install) it goes to your applications, asks them “how are you doing?” and stores the answers as time series.

Prometheus uses a pull model: it goes out and fetches metrics from your applications
Key Features
- Pull model: Prometheus fetches the metrics from your apps (scraping) instead of waiting for them to be sent. This gives it full control over what it collects and when.
- Time Series Database (TSDB): stores data as timestamp + value pairs, optimized for temporal queries.
- PromQL: its own query language for exploring and aggregating metrics.
- Alerts: define rules and Prometheus will let you know when something’s wrong.
- Service discovery: integrates natively with Kubernetes to automatically discover what to scrape.
- Native OTLP: Prometheus 3.x can ingest metrics sent over the OpenTelemetry Protocol (OTLP) directly, once its OTLP receiver is enabled.
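To make the “timestamp + value pairs” idea concrete, here is a minimal Python sketch. It is not how the Prometheus TSDB or PromQL are implemented: the function name `simple_rate` and the sample data are made up for illustration, and the real `rate()` also extrapolates to the edges of the window, which is left out here. It only shows the core idea of turning an ever-growing counter into a per-second rate, treating a drop in value as a restart.

```python
from typing import List, Tuple

# One time series: (unix_timestamp_seconds, value) pairs, oldest first.
Sample = Tuple[float, float]


def simple_rate(samples: List[Sample], window_seconds: float) -> float:
    """Approximate per-second increase of a counter over the trailing window."""
    if not samples:
        return 0.0
    newest_ts = samples[-1][0]
    in_window = [s for s in samples if s[0] >= newest_ts - window_seconds]
    if len(in_window) < 2:
        return 0.0  # a rate needs at least two samples

    increase = 0.0
    for (_, prev), (_, curr) in zip(in_window, in_window[1:]):
        if curr >= prev:
            increase += curr - prev
        else:
            # The counter dropped, so the process restarted and began again
            # from zero: everything in `curr` is new since the reset.
            increase += curr
    elapsed = in_window[-1][0] - in_window[0][0]
    return increase / elapsed if elapsed > 0 else 0.0


# Hypothetical http_requests_total samples, scraped every 30 seconds.
series: List[Sample] = [(0, 100), (30, 160), (60, 220), (90, 10), (120, 70)]
print(simple_rate(series, window_seconds=300))
```

With this data the counter grows by 60 + 60 + 10 + 60 = 190 requests over 120 seconds, so the sketch reports roughly 1.58 requests per second even though the raw value fell from 220 to 10 at the restart.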
What kind of metrics does it collect?
Prometheus works with 4 types of metrics:
| Type | What it’s for | Example |
|---|---|---|
| Counter | Values that only go up (cumulative) | Total HTTP requests, total errors |
| Gauge | Values that go up and down | Temperature, memory usage, active Pods |
| Histogram | Distribution of values in buckets | Request latency (p50, p95, p99) |
| Summary | Similar to histogram but computes percentiles on the client | Latency with pre-calculated percentiles |
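The difference between Histogram and Summary is easier to see with numbers. A histogram only ships cumulative bucket counts, and the percentile is estimated later, at query time, by interpolating inside the bucket where the target rank falls. The sketch below follows that idea in plain Python, as I understand the behaviour of PromQL’s `histogram_quantile()` for classic histograms. The function name `estimate_quantile` and the bucket values are illustrative assumptions, not code from Prometheus.

```python
import math
from typing import List, Tuple

# Cumulative buckets: (upper bound "le", number of observations <= le).
Bucket = Tuple[float, float]


def estimate_quantile(q: float, buckets: List[Bucket]) -> float:
    """Estimate the q-quantile (0 <= q <= 1) from cumulative buckets."""
    if not buckets:
        return math.nan
    ordered = sorted(buckets)
    total = ordered[-1][1]  # the +Inf bucket counts every observation
    if total == 0:
        return math.nan
    rank = q * total

    # Assumption: the first bucket starts at 0 (latencies are never negative).
    lower_bound, lower_count = 0.0, 0.0
    for upper_bound, count in ordered:
        if count >= rank:
            if math.isinf(upper_bound) or count == lower_count:
                # Nothing to interpolate against: fall back to the lower edge.
                return lower_bound
            fraction = (rank - lower_count) / (count - lower_count)
            return lower_bound + (upper_bound - lower_bound) * fraction
        lower_bound, lower_count = upper_bound, count
    return math.nan


# Illustrative request-latency buckets, in seconds.
latency_buckets = [(0.1, 500), (0.5, 900), (1.0, 980), (math.inf, 1000)]
print(estimate_quantile(0.95, latency_buckets))
```

For p95 the target rank is 950 of 1000 observations, which lands in the 0.5 to 1.0 second bucket, so the estimate is 0.5 + 0.5 × (950 − 900) / (980 − 900) = 0.8125 seconds. The result is only as precise as the bucket boundaries you chose, which is the trade-off against a Summary’s client-side percentiles.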
How do applications expose metrics?
Your apps expose metrics on an HTTP endpoint (by convention /metrics) in plain text format:
# HELP http_requests_total Total number of HTTP requests
# TYPE http_requests_total counter
http_requests_total{method="GET", path="/api/users", status="200"} 1234
http_requests_total{method="POST", path="/api/users", status="201"} 56
# HELP http_request_duration_seconds Duration of HTTP requests
# TYPE http_request_duration_seconds histogram
http_request_duration_seconds_bucket{le="0.1"} 500
http_request_duration_seconds_bucket{le="0.5"} 900
http_request_duration_seconds_bucket{le="1.0"} 980
http_request_duration_seconds_bucket{le="+Inf"} 1000
Prometheus reads this endpoint periodically and stores the data. If your app doesn’t expose metrics natively, you can use exporters (like node-exporter for operating system metrics or kube-state-metrics for Kubernetes object state).
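If you’re curious what producing that text looks like, here is a rough sketch using only the Python standard library. In a real app you would normally use an official Prometheus client library instead. The names `REQUESTS`, `render_metrics`, `MetricsHandler` and `serve` are hypothetical, the counter values are hard-coded, and label values are not escaped, so treat it as an illustration of the format rather than something to deploy.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer
from typing import Dict, Tuple

# Hypothetical in-memory counter: (method, path, status) -> running total.
LabelValues = Tuple[str, str, str]
REQUESTS: Dict[LabelValues, int] = {
    ("GET", "/api/users", "200"): 1234,
    ("POST", "/api/users", "201"): 56,
}


def render_metrics(requests: Dict[LabelValues, int]) -> str:
    """Render the counter in the plain-text format that Prometheus scrapes."""
    lines = [
        "# HELP http_requests_total Total number of HTTP requests",
        "# TYPE http_requests_total counter",
    ]
    for (method, path, status), total in sorted(requests.items()):
        labels = f'method="{method}",path="{path}",status="{status}"'
        lines.append(f"http_requests_total{{{labels}}} {total}")
    return "\n".join(lines) + "\n"


class MetricsHandler(BaseHTTPRequestHandler):
    """Answers GET /metrics with the rendered text, 404 for anything else."""

    def do_GET(self) -> None:
        if self.path != "/metrics":
            self.send_error(404)
            return
        body = render_metrics(REQUESTS).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "text/plain; version=0.0.4; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def serve(port: int = 9090) -> None:
    """Blocks forever, serving /metrics on the given port."""
    HTTPServer(("0.0.0.0", port), MetricsHandler).serve_forever()


print(render_metrics(REQUESTS), end="")
```

Calling `serve()` would start the endpoint, and `curl http://localhost:9090/metrics` should then return the same text that `render_metrics` prints.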
What is Grafana?
Grafana is an open source visualization and dashboards platform. If Prometheus is the one collecting the data, Grafana is the one that turns it into beautiful and useful charts.

Grafana turns raw metrics into interactive, actionable dashboards
Key Features
- Multi-datasource: not just Prometheus. It can connect to Loki (logs), Tempo (traces), InfluxDB, Elasticsearch, CloudWatch, and many more.
- Interactive dashboards: line charts, bars, gauges, tables, heatmaps, and more — all configurable without code.
- Alerts: Grafana has its own alerting system that can complement or replace Prometheus alerts.
- Explore: ad-hoc exploration mode for investigating metrics without creating a dashboard.
- Templating: variables in dashboards to filter by namespace, pod, node, etc.
- Sharing: share dashboards with your team or export as JSON.
What’s the role of each component?
This is the key to understanding how they complement each other:
| Component | Role | Analogy |
|---|---|---|
| Prometheus | Collect and store metrics | The security camera that records everything |
| Grafana | Visualize and explore metrics | The monitors where you watch the recordings |
| Alertmanager | Manage and route alerts | The alarm that goes off when something’s wrong |
| kube-state-metrics | Expose K8s object state as metrics | The inventory of the cluster (how many pods, deployments, etc.) |
| node-exporter | Expose operating system metrics | The health checkup of each node (CPU, RAM, disk) |
How do they get their metrics?
Understanding the data flow is fundamental. This is how metrics travel from your app to a chart in Grafana:

The journey of a metric: from your app to a Grafana dashboard
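The complete flow
1. Your app (or an exporter) exposes its metrics on /metrics.
2. Prometheus discovers the target and scrapes it on every interval.
3. Prometheus stores the samples in its TSDB and evaluates the alerting rules, handing firing alerts to Alertmanager.
4. Grafana queries Prometheus with PromQL and draws the results in dashboards.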
How does Prometheus know what to scrape in Kubernetes?
With the Prometheus Operator (which comes included in the stack we’ll install), you use CRDs to tell Prometheus what to monitor:
- ServiceMonitor: “scrape the Pods behind this Service”.
- PodMonitor: “scrape these Pods directly” (no Service needed).
- PrometheusRule: “evaluate these alerting rules”.
# Example: ServiceMonitor to monitor your app
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: mi-app-monitor
  labels:
    release: kube-prometheus-stack # Important: must match the Helm chart release
spec:
  selector:
    matchLabels:
      app: mi-app
  endpoints:
    - port: metrics
      interval: 30s
      path: /metrics
This tells Prometheus: “find all Services with the label app: mi-app, scrape the metrics port every 30 seconds on the /metrics path”.
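The selector part is worth a second look, because label mismatches are a common reason a target never shows up. As a rough mental model (not the Operator’s actual code), `matchLabels` selects a Service when every key/value pair in the selector is present on the Service; extra labels on the Service don’t matter. The helper `matches` and the Service names below are hypothetical.

```python
from typing import Dict


def matches(match_labels: Dict[str, str], labels: Dict[str, str]) -> bool:
    """True when every key/value in match_labels is also present in labels."""
    return all(labels.get(key) == value for key, value in match_labels.items())


# Hypothetical Services in the cluster: name -> labels
services = {
    "mi-app-svc": {"app": "mi-app", "tier": "backend"},
    "other-svc": {"app": "other"},
    "unlabelled-svc": {},
}

selected = [
    name for name, labels in services.items()
    if matches({"app": "mi-app"}, labels)
]
print(selected)
```

Only the Service carrying `app: mi-app` is selected. Note that an empty selector matches everything, which is why a typo in a label key tends to produce “no targets” rather than an error.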
Installing the full stack with Helm
The recommended way to install Prometheus + Grafana on Kubernetes is with the kube-prometheus-stack chart. This chart includes everything you need in a single package:
- Prometheus Operator
- Prometheus
- Grafana
- Alertmanager
- kube-state-metrics
- node-exporter
- Pre-configured dashboards and alerts
Step 1: Add the Helm repository
helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
helm repo update
Step 2: Create the namespace
kubectl create namespace monitoring
Step 3: Configure values for Kind
In Kind, some control plane components aren’t accessible from Pods. Create a prometheus-values.yaml file with these settings:
# prometheus-values.yaml
# Disable components that aren't accessible in Kind
kubeEtcd:
  enabled: false
kubeScheduler:
  enabled: false
kubeControllerManager:
  enabled: false
kubeProxy:
  enabled: false
# Prometheus configuration
prometheus:
  prometheusSpec:
    # Resources adjusted for Kind (we're not in production)
    resources:
      requests:
        memory: 400Mi
        cpu: 200m
      limits:
        memory: 800Mi
        cpu: 500m
    # Data retention
    retention: 7d
    # Select all ServiceMonitors (without filtering by label)
    serviceMonitorSelectorNilUsesHelmValues: false
    podMonitorSelectorNilUsesHelmValues: false
# Grafana configuration
grafana:
  resources:
    requests:
      memory: 128Mi
      cpu: 100m
    limits:
      memory: 256Mi
      cpu: 200m
  # Admin password (in production use a Secret)
  adminPassword: "admin123"
# Alertmanager configuration
alertmanager:
  alertmanagerSpec:
    resources:
      requests:
        memory: 64Mi
        cpu: 50m
      limits:
        memory: 128Mi
        cpu: 100m
Why do we disable etcd, scheduler, controller-manager and proxy? In Kind, these components run inside the control plane container and their metrics endpoints are bound to 127.0.0.1, so Prometheus can’t reach them from a Pod. On a managed cluster (EKS, GKE, AKS) the provider runs the control plane, so whether these metrics are reachable depends on what the provider exposes.
Step 4: Install the chart
helm install kube-prometheus-stack prometheus-community/kube-prometheus-stack \
  --namespace monitoring \
  --values prometheus-values.yaml
This takes a few minutes. Verify that all Pods are running:
kubectl get pods -n monitoring
NAME                                                       READY   STATUS    RESTARTS   AGE
alertmanager-kube-prometheus-stack-alertmanager-0          2/2     Running   0          2m
kube-prometheus-stack-grafana-7b9f5c4d5-x2k8j              3/3     Running   0          2m
kube-prometheus-stack-kube-state-metrics-6c4d7b9f8-abc12   1/1     Running   0          2m
kube-prometheus-stack-operator-5d8f7b6c4-def34             1/1     Running   0          2m
kube-prometheus-stack-prometheus-node-exporter-ghi56       1/1     Running   0          2m
prometheus-kube-prometheus-stack-prometheus-0              2/2     Running   0          2m
Check the created Services:
kubectl get svc -n monitoring
NAME                                             TYPE        CLUSTER-IP   PORT(S)    AGE
alertmanager-operated                            ClusterIP   None         9093/TCP   2m
kube-prometheus-stack-alertmanager               ClusterIP   10.96.10.1   9093/TCP   2m
kube-prometheus-stack-grafana                    ClusterIP   10.96.10.2   80/TCP     2m
kube-prometheus-stack-kube-state-metrics         ClusterIP   10.96.10.3   8080/TCP   2m
kube-prometheus-stack-operator                   ClusterIP   10.96.10.4   443/TCP    2m
kube-prometheus-stack-prometheus                 ClusterIP   10.96.10.5   9090/TCP   2m
kube-prometheus-stack-prometheus-node-exporter   ClusterIP   10.96.10.6   9100/TCP   2m
prometheus-operated                              ClusterIP   None         9090/TCP   2m
All monitoring stack components running in the monitoring namespace
Accessing Grafana
With the stack installed, let’s access Grafana using port-forward:
kubectl port-forward svc/kube-prometheus-stack-grafana 3000:80 -n monitoring
Open your browser at http://localhost:3000 and log in with:
- Username: admin
- Password: admin123 (the one we set in the values)

Grafana login screen — use the credentials configured in the Helm values
Included dashboards
The chart installs more than 15 pre-configured dashboards ready to use. The most useful ones:
| Dashboard | What it shows |
|---|---|
| Kubernetes / Compute Resources / Cluster | Cluster overview: total CPU, memory and network |
| Kubernetes / Compute Resources / Namespace (Pods) | Resource usage by namespace |
| Kubernetes / Compute Resources / Pod | Individual Pod detail |
| Kubernetes / Compute Resources / Workload | Resources by Deployment/StatefulSet/DaemonSet |
| Kubernetes / Kubelet | Kubelet health and Pod lifecycle |
| Kubernetes / API Server | Request rates, latencies and API Server errors |
| Kubernetes / CoreDNS | DNS queries, latencies and errors |
| Kubernetes / Networking / Cluster | Network traffic by namespace |
| Kubernetes / Persistent Volumes | Persistent storage usage |
| Node Exporter / Full | Operating system-level metrics per node |
| Prometheus / Overview | Prometheus self-monitoring |
To find them, go to Dashboards (the squares icon in the sidebar) and look in the “General” or “Default” folder.

Pre-configured dashboard showing cluster resources: CPU, memory and network usage
Accessing Prometheus
You can also access the Prometheus UI directly to run PromQL queries:
kubectl port-forward svc/kube-prometheus-stack-prometheus 9090:9090 -n monitoring
Open http://localhost:9090 in your browser.

The Prometheus UI lets you run PromQL queries and explore metrics directly
Trying out some PromQL queries
In the Prometheus query bar, try these queries:
# CPU usage by Pod (in cores)
rate(container_cpu_usage_seconds_total{namespace="default"}[5m])
# Memory used by Pod (in bytes)
container_memory_working_set_bytes{namespace="default"}
# Total Pods by namespace
count by (namespace) (kube_pod_info)
# Pods that aren't Ready
kube_pod_status_ready{condition="false"} == 1
# HTTP requests to the API Server (per second)
rate(apiserver_request_total[5m])
The rate() function calculates the per-second rate of change of a counter, and [5m] is the time window it looks back over. For a complete guide, check out the PromQL documentation.
Verifying Prometheus targets
A target is each endpoint that Prometheus scrapes. To verify that everything is being monitored:
In the Prometheus UI, go to Status → Targets (labelled Target health in the Prometheus 3.x UI), or open http://localhost:9090/targets directly.

All targets should show as UP (green) — if any are DOWN, check the connectivity
You should see targets like:
- serviceMonitor/monitoring/kube-prometheus-stack-kubelet: UP
- serviceMonitor/monitoring/kube-prometheus-stack-apiserver: UP
- serviceMonitor/monitoring/kube-prometheus-stack-kube-state-metrics: UP
- serviceMonitor/monitoring/kube-prometheus-stack-node-exporter: UP
- serviceMonitor/monitoring/kube-prometheus-stack-prometheus: UP
- serviceMonitor/monitoring/kube-prometheus-stack-grafana: UP
If any target is DOWN, check:
- Is the exporter Pod running? (kubectl get pods -n monitoring)
- Does the Service have endpoints? (kubectl get endpoints -n monitoring)
- Do the ServiceMonitor labels match the Service?
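The same check can be scripted against the Prometheus HTTP API, which serves the target list at `/api/v1/targets`. The sketch below is a minimal example under a few assumptions: the port-forward to `localhost:9090` is running when you call `fetch_targets`, the helper names are my own, and the `sample` payload is a trimmed, made-up response in the shape the API returns.

```python
import json
import urllib.request
from typing import List


def down_targets(payload: dict) -> List[str]:
    """Return the scrape URLs of active targets whose health is not 'up'."""
    targets = payload.get("data", {}).get("activeTargets", [])
    return [t.get("scrapeUrl", "?") for t in targets if t.get("health") != "up"]


def fetch_targets(base_url: str = "http://localhost:9090") -> dict:
    """Call the Prometheus HTTP API (needs the port-forward to be running)."""
    with urllib.request.urlopen(f"{base_url}/api/v1/targets", timeout=5) as resp:
        return json.load(resp)


# Trimmed, made-up response used here so the example runs without a cluster.
sample = {
    "status": "success",
    "data": {
        "activeTargets": [
            {
                "labels": {"job": "kubelet"},
                "scrapeUrl": "https://172.18.0.2:10250/metrics",
                "health": "up",
            },
            {
                "labels": {"job": "mi-app-svc"},
                "scrapeUrl": "http://10.244.0.7:9090/metrics",
                "health": "down",
            },
        ]
    },
}

print(down_targets(sample))
```

Against a live cluster you would call `down_targets(fetch_targets())` instead; an empty list means every active target is UP.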
Alertmanager: the alerting system
The stack includes Alertmanager, which manages the alerts that Prometheus generates. It’s not just “fire an email” — it’s a sophisticated system that:
- Groups related alerts so it doesn’t bombard you.
- Deduplicates repeated alerts.
- Routes to different channels based on severity (Slack for warnings, PagerDuty for critical).
- Silences alerts temporarily (useful during maintenance).
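Here is a toy model of the first three behaviours, just to show how they fit together. It is a simplification and not Alertmanager’s implementation: alerts are plain label dictionaries, the routing table `ROUTES`, the receiver names and the alert `DiskFull` are invented for the example, and silences and timing (group wait, repeat intervals) are ignored.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Alert = Dict[str, str]  # an alert is modelled as its set of labels

# Hypothetical routing table: severity -> receiver
ROUTES = {"warning": "slack", "critical": "pagerduty"}
DEFAULT_RECEIVER = "email"


def plan_notifications(
    alerts: List[Alert], group_by: Tuple[str, ...]
) -> Dict[Tuple[str, ...], List[Alert]]:
    """Deduplicate alerts, route each one by severity, then group per receiver."""
    # Two alerts with identical labels are the same alert: keep one copy.
    unique = {tuple(sorted(a.items())): a for a in alerts}.values()

    groups: Dict[Tuple[str, ...], List[Alert]] = defaultdict(list)
    for alert in unique:
        receiver = ROUTES.get(alert.get("severity", ""), DEFAULT_RECEIVER)
        key = (receiver,) + tuple(alert.get(label, "") for label in group_by)
        groups[key].append(alert)
    return dict(groups)


alerts = [
    {"alertname": "KubePodCrashLooping", "namespace": "default",
     "pod": "mi-app-1", "severity": "warning"},
    {"alertname": "KubePodCrashLooping", "namespace": "default",
     "pod": "mi-app-1", "severity": "warning"},  # exact duplicate
    {"alertname": "KubePodCrashLooping", "namespace": "default",
     "pod": "mi-app-2", "severity": "warning"},
    {"alertname": "DiskFull", "namespace": "default", "severity": "critical"},
]

plan = plan_notifications(alerts, group_by=("alertname", "namespace"))
for key, members in plan.items():
    print(key, len(members))
```

Four incoming alerts become two notifications: one Slack message covering both crash-looping Pods, and one PagerDuty page for the critical alert.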
Access the Alertmanager UI:
kubectl port-forward svc/kube-prometheus-stack-alertmanager 9093:9093 -n monitoring
Open http://localhost:9093.

Alertmanager groups, deduplicates and routes alerts — here you can see the active alerts
Pre-configured alerts
The chart includes dozens of ready-to-go alerts. Some of the most important ones:
| Alert | What it detects | Severity |
|---|---|---|
| KubePodCrashLooping | A Pod that keeps restarting | warning |
| KubePodNotReady | A Pod that won’t transition to Ready | warning |
| KubeDeploymentReplicasMismatch | A Deployment with fewer replicas than desired | warning |
| KubeNodeNotReady | A node that stopped reporting Ready | warning |
| NodeMemoryHighUtilization | A node with more than 90% memory usage | warning |
| TargetDown | Targets that stopped responding to scrapes | warning |
| Watchdog | An alert that’s always active (to verify the pipeline works) | none |
To see the configured rules:
kubectl get prometheusrules -n monitoring
Monitoring your own app
So far we’re seeing cluster metrics. How do you add metrics from your app? You need two things:
1. Your app needs to expose metrics
Whether your app is in Node.js, Python, Go, Java or any language, there are Prometheus libraries that make this easy. For example, for a generic app that exposes /metrics:
# deployment with metrics port
apiVersion: apps/v1
kind: Deployment
metadata:
  name: mi-app
  labels:
    app: mi-app
spec:
  replicas: 2
  selector:
    matchLabels:
      app: mi-app
  template:
    metadata:
      labels:
        app: mi-app
    spec:
      containers:
        - name: mi-app
          image: mi-app:latest
          ports:
            - name: http
              containerPort: 8080
            - name: metrics
              containerPort: 9090
# Service that exposes both ports
apiVersion: v1
kind: Service
metadata:
  name: mi-app-svc
  labels:
    app: mi-app
spec:
  selector:
    app: mi-app
  ports:
    - name: http
      port: 8080
      targetPort: 8080
    - name: metrics
      port: 9090
      targetPort: 9090
2. Create a ServiceMonitor
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: mi-app-monitor
  namespace: monitoring
  labels:
    release: kube-prometheus-stack
spec:
  namespaceSelector:
    matchNames:
      - default
  selector:
    matchLabels:
      app: mi-app
  endpoints:
    - port: metrics
      interval: 15s
      path: /metrics
Save this as servicemonitor.yaml and apply it:
kubectl apply -f servicemonitor.yaml
After a few seconds, go to the Prometheus targets and you should see your app listed. Its metrics will be available in Grafana.
The release: kube-prometheus-stack label on the ServiceMonitor matters. With the chart’s default settings, Prometheus only discovers ServiceMonitors whose release label matches the Helm release name. Our values file sets serviceMonitorSelectorNilUsesHelmValues: false, so in this setup Prometheus selects every ServiceMonitor and the label is a good habit rather than a requirement.
Creating your first dashboard in Grafana
The pre-configured dashboards are great, but eventually you’ll want to create your own. Here’s the process:
Step 1: Create a new dashboard
In Grafana, go to Dashboards → New → New Dashboard → Add visualization.
Step 2: Select the datasource
Select Prometheus as the data source (it’s already configured automatically).
Step 3: Write the query
For example, to see your app’s CPU usage:
rate(container_cpu_usage_seconds_total{namespace="default", pod=~"mi-app.*"}[5m])
Step 4: Customize the visualization
- Choose the chart type (Time series, Gauge, Stat, etc.).
- Configure the title, legends, units (CPU in cores, memory in bytes, etc.).
- Add variables so you can filter by namespace or pod dynamically.
Step 5: Save
Give it a name and save. You can also export the dashboard as JSON to version it in Git.

Create custom dashboards to monitor exactly what your team needs
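Since a dashboard is just JSON, you can also generate or review it as code before committing it to Git. The sketch below builds a deliberately tiny dashboard with one panel. The field names reflect my understanding of Grafana’s dashboard JSON model (`title`, `panels`, `gridPos`, `targets`, `expr`); a real export contains many more fields, so compare against a dashboard exported from your own Grafana before relying on it. The function name `build_dashboard` is hypothetical.

```python
import json


def build_dashboard(title: str, panel_title: str, expr: str) -> dict:
    """Build a minimal single-panel dashboard definition (illustrative only)."""
    return {
        "title": title,
        "panels": [
            {
                "type": "timeseries",
                "title": panel_title,
                # Grafana lays panels out on a 24-column grid
                "gridPos": {"h": 8, "w": 24, "x": 0, "y": 0},
                "targets": [{"refId": "A", "expr": expr}],
            }
        ],
    }


dashboard = build_dashboard(
    title="mi-app overview",
    panel_title="CPU usage (cores)",
    expr='rate(container_cpu_usage_seconds_total{namespace="default", pod=~"mi-app.*"}[5m])',
)

# sort_keys keeps the output stable, which makes Git diffs easier to read
print(json.dumps(dashboard, indent=2, sort_keys=True))
```

Keeping the query in one place like this makes reviews easier: a change to the PromQL expression shows up as a one-line diff instead of a screenshot.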
Full stack architecture
Here’s the complete picture of how everything fits together:

Stack architecture: exporters → Prometheus → Grafana + Alertmanager → Notifications
Useful day-to-day commands
# See all stack resources
kubectl get all -n monitoring
# See configured ServiceMonitors
kubectl get servicemonitors -n monitoring
# See alerting rules
kubectl get prometheusrules -n monitoring
# See the Prometheus configuration
kubectl get prometheus -n monitoring -o yaml
# Port-forward to Grafana
kubectl port-forward svc/kube-prometheus-stack-grafana 3000:80 -n monitoring
# Port-forward to Prometheus
kubectl port-forward svc/kube-prometheus-stack-prometheus 9090:9090 -n monitoring
# Port-forward to Alertmanager
kubectl port-forward svc/kube-prometheus-stack-alertmanager 9093:9093 -n monitoring
# See Prometheus logs
kubectl logs -n monitoring prometheus-kube-prometheus-stack-prometheus-0 -c prometheus
# Update the stack (after modifying values)
helm upgrade kube-prometheus-stack prometheus-community/kube-prometheus-stack \
  --namespace monitoring \
  --values prometheus-values.yaml
Official references
- Prometheus — Official Prometheus documentation
- PromQL — Query language guide
- Grafana — Official Grafana documentation
- kube-prometheus-stack — Helm chart and configuration
- Prometheus Operator — Operator and CRDs documentation
- Grafana Dashboards — Community dashboards repository
- Alertmanager — Alerting and routing configuration
- kube-state-metrics — K8s object state metrics
- Kubernetes Monitoring with Prometheus (CNCF) — CNCF guide
Summary
Today we set up the full observability stack in our cluster:
- Prometheus collects metrics with a pull model, scraping /metrics endpoints every N seconds.
- Grafana visualizes those metrics in interactive, pre-configured dashboards.
- Alertmanager manages alerts: it groups, deduplicates and routes them to your notification channels.
- kube-state-metrics exposes Kubernetes object state as metrics.
- node-exporter exposes operating system-level metrics for each node.
- Everything is installed with a single Helm chart: kube-prometheus-stack.
- We use ServiceMonitors to tell Prometheus what to scrape.
- Metrics travel: App → Prometheus (scraping) → Grafana (visualization).
With Prometheus and Grafana you have full visibility into your cluster. You’re no longer flying blind — now you can see exactly what’s happening, when and why.
In the next chapter we’ll put everything to the test: we’ll deploy two real apps (Python and Java) with metrics, HPA, and see how they behave under load in Grafana. Theory + full hands-on practice.
Did you enjoy this article? Share it with someone who’s setting up their observability stack. And if you have questions, leave me a comment!