Kubernetes in Practice: Deploying Python and Java Apps with Metrics, HPA, and Grafana

Updated March 2026: Code and manifests verified with Kubernetes v1.35, Prometheus 3.x, and
autoscaling/v2.
Prerequisites
This is the hands-on chapter of the series. You need everything from previous chapters up and running:
- Kind cluster active — Chapter 1: Setting Up the Cluster
- Know how to create Deployments, Services, and YAML manifests — Chapter 2: K8s Resources
- Understand the HPA — Chapter 3: Autoscaling
- Prometheus and Grafana installed in the cluster — Chapter 4: Observability
- Metrics Server installed (we did this in Chapter 3)
Additional tools for this chapter:
- Docker to build the images.
- Git and a GitHub account to push the images to the Container Registry.
What Are We Going to Build?
We’re going to deploy two real APIs in our Kind cluster, each in a different language, both exposing metrics to Prometheus:
| App | Language | Framework | Metrics | Endpoints |
|---|---|---|---|---|
| kubernetes-demo-apps-01 (Python) | Python 3.12 | Flask + prometheus-client | Counter, Histogram, Gauge | /api/users, /api/heavy, /api/cache |
| kubernetes-demo-apps-02 (Java) | Java 21 | Spring Boot + Micrometer | Counter, Timer, Gauge | /api/products, /api/orders, /api/heavy |
Both apps have:
- An /api/heavy endpoint that consumes CPU, perfect for triggering the HPA.
- Request, latency, and cache metrics exposed in Prometheus format.
- Health checks for Kubernetes probes.
- Complete manifests: Deployment, Service, ServiceMonitor, and HPA.

Two APIs deployed on Kind with metrics, HPA, and monitoring in Grafana
The Source Code
The complete code for both apps is available on GitHub:
- Python: kubernetes-demo-apps-01
- Java: kubernetes-demo-apps-02
Clone the repo and navigate to each app’s folder:
git clone https://github.com/pescarcena/blog-pescarcena-code.git
cd blog-pescarcena-code
# Python
cd kubernetes-demo-apps-01
# Java
cd kubernetes-demo-apps-02
Let’s walk through the key parts of each app.
Python App: Flask + prometheus-client
Project Structure
kubernetes-demo-apps-01/
├── app.py                      # API code
├── requirements.txt            # Dependencies
├── Dockerfile                  # Docker image
├── k8s/
│   ├── deployment.yaml         # K8s Deployment
│   ├── service.yaml            # Service
│   ├── servicemonitor.yaml     # ServiceMonitor for Prometheus
│   └── hpa.yaml                # HPA by CPU and memory
└── .github/
    └── workflows/
        └── build-push.yaml     # CI/CD for GitHub Actions
The Metrics It Exposes
In app.py we define three types of Prometheus metrics:
from prometheus_client import Counter, Histogram, Gauge

# Counter: counts total requests (only goes up)
REQUEST_COUNT = Counter(
    "http_requests_total",
    "Total de requests HTTP",
    ["method", "endpoint", "status"],
)

# Histogram: measures latency distribution
REQUEST_LATENCY = Histogram(
    "http_request_duration_seconds",
    "Duración de requests HTTP en segundos",
    ["method", "endpoint"],
    buckets=[0.01, 0.05, 0.1, 0.25, 0.5, 1.0, 2.5, 5.0],
)

# Gauge: value that goes up and down (items in cache)
ITEMS_IN_CACHE = Gauge(
    "app_cache_items_total",
    "Cantidad de items en caché",
)
The metrics are instrumented automatically with Flask’s before_request and after_request decorators: every incoming request is counted and its latency is measured without touching the business logic.
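The hook pattern can be sketched without Flask or prometheus-client. The following is a framework-free illustration under stated assumptions, not the code from app.py: the `before_request`, `after_request`, and `handle` functions and the two in-memory dictionaries are hypothetical stand-ins. It shows the start time being stamped before the handler runs, and the counter and the cumulative histogram buckets being updated afterwards.

```python
import time
from collections import defaultdict

# Same bucket upper bounds as the Histogram defined above.
BUCKETS = [0.01, 0.05, 0.1, 0.25, 0.5, 1.0, 2.5, 5.0]

# Hypothetical in-memory stand-ins for the Prometheus metric objects.
REQUEST_COUNT = defaultdict(int)                       # (method, endpoint, status) -> count
BUCKET_COUNTS = defaultdict(lambda: defaultdict(int))  # (method, endpoint) -> {le: count}


def before_request(ctx):
    """Stamp the start time before the handler runs."""
    ctx["start"] = time.perf_counter()


def after_request(ctx, status, elapsed=None):
    """Update the counter and the histogram once the handler has finished."""
    if elapsed is None:
        elapsed = time.perf_counter() - ctx["start"]
    REQUEST_COUNT[(ctx["method"], ctx["endpoint"], status)] += 1
    series = BUCKET_COUNTS[(ctx["method"], ctx["endpoint"])]
    # Buckets are cumulative: an observation counts in every bucket
    # whose upper bound is greater than or equal to the observed value.
    for le in BUCKETS:
        if elapsed <= le:
            series[le] += 1
    series[float("inf")] += 1  # the +Inf bucket counts every observation


def handle(method, endpoint, handler):
    """Run a handler wrapped by both hooks, as a web framework would."""
    ctx = {"method": method, "endpoint": endpoint}
    before_request(ctx)
    body, status = handler()
    after_request(ctx, status)
    return body
```

Calling `handle("GET", "/api/users", lambda: ("[]", "200"))` increments the counter for that label combination and adds one observation to the latency buckets, while the handler itself contains no metrics code.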
The Heavy Endpoint (for Testing HPA)
@app.route("/api/heavy")
def heavy_endpoint():
    """CPU-bound endpoint, useful for testing the HPA."""
    total = 0
    for i in range(random.randint(500_000, 2_000_000)):
        total += i * i
    return jsonify({"result": total, "message": "Heavy computation done"})
This endpoint burns CPU on purpose. When you call it repeatedly, CPU usage spikes and the HPA scales out the Pods.
The /metrics Endpoint
@app.route("/metrics")
def metrics():
    return Response(generate_latest(), mimetype=CONTENT_TYPE_LATEST)
Prometheus scrapes this endpoint. If you access localhost:8080/metrics you’ll see something like:
# HELP http_requests_total Total de requests HTTP
# TYPE http_requests_total counter
http_requests_total{endpoint="/api/users",method="GET",status="200"} 42.0
http_requests_total{endpoint="/api/heavy",method="GET",status="200"} 15.0
# HELP http_request_duration_seconds Duración de requests HTTP en segundos
# TYPE http_request_duration_seconds histogram
http_request_duration_seconds_bucket{endpoint="/api/users",le="0.1",method="GET"} 38.0
http_request_duration_seconds_bucket{endpoint="/api/users",le="0.25",method="GET"} 42.0
...
Java App: Spring Boot + Micrometer
Project Structure
kubernetes-demo-apps-02/
├── pom.xml                              # Maven dependencies
├── Dockerfile                           # Multi-stage build
├── src/main/java/com/demo/metricsapi/
│   ├── MetricsApiApplication.java       # Main class
│   ├── ApiController.java               # API endpoints
│   └── MetricsConfig.java               # Micrometer configuration
├── src/main/resources/
│   └── application.yaml                 # Spring Boot config
├── k8s/
│   ├── deployment.yaml
│   ├── service.yaml
│   ├── servicemonitor.yaml
│   └── hpa.yaml
└── .github/
    └── workflows/
        └── build-push.yaml
The Metrics It Exposes
Spring Boot with Micrometer and the Prometheus registry does most of the work automatically. You just need to add the dependency:
<dependency>
    <groupId>io.micrometer</groupId>
    <artifactId>micrometer-registry-prometheus</artifactId>
</dependency>
And in application.yaml enable the endpoint:
management:
  endpoints:
    web:
      exposure:
        include: health,prometheus,info
  prometheus:
    metrics:
      export:
        enabled: true
With this, Spring Boot automatically exposes JVM, HTTP, Tomcat, connection metrics, and more at /actuator/prometheus.
For custom metrics, we use annotations and the registry:
// Automatic timer with @Timed
@GetMapping("/api/products")
@Timed(value = "http_request_duration_seconds", extraTags = {"endpoint", "/api/products"})
public List<Map<String, Object>> getProducts() { ... }

// Manual counter
this.ordersCounter = Counter.builder("app_orders_total")
    .description("Total de órdenes procesadas")
    .register(registry);

// Gauge that tracks cache size
Gauge.builder("app_cache_items_total", cache, ConcurrentHashMap::size)
    .register(registry);
Key Difference: /metrics vs /actuator/prometheus
| Python (prometheus-client) | Java (Micrometer) |
|---|---|
| Endpoint: /metrics | Endpoint: /actuator/prometheus |
| Metrics defined manually | Many automatic metrics (JVM, HTTP, etc.) |
| Native Prometheus format | Prometheus format via Micrometer registry |
This is important for the ServiceMonitors: each app has a different path where Prometheus needs to scrape.
Building and Pushing Images to GHCR
Let’s build the Docker images and push them to the GitHub Container Registry (GHCR) so Kind can use them.
Option A: Automatic CI/CD with GitHub Actions
Both repos include a workflow in .github/workflows/build-push.yaml that builds and pushes the image automatically when you push to main. You just need to:
- Create the repos on GitHub.
- Push the code.
- The workflow runs on its own.
The images will be at:
- ghcr.io/<your-username>/python-metrics-demo:latest
- ghcr.io/<your-username>/java-metrics-demo:latest
Option B: Manual Build and Push
If you prefer doing it manually:
# Login to GHCR
echo $GITHUB_TOKEN | docker login ghcr.io -u <YOUR_USERNAME> --password-stdin
Python:
cd kubernetes-demo-apps-01
# Build
docker build -t ghcr.io/<YOUR_USERNAME>/python-metrics-demo:latest .
# Push
docker push ghcr.io/<YOUR_USERNAME>/python-metrics-demo:latest
Java:
cd kubernetes-demo-apps-02
# Build (multi-stage: compiles with Maven + creates lightweight image)
docker build -t ghcr.io/<YOUR_USERNAME>/java-metrics-demo:latest .
# Push
docker push ghcr.io/<YOUR_USERNAME>/java-metrics-demo:latest
Building and pushing Docker images to the GitHub Container Registry
Loading Images into Kind (Alternative without GHCR)
If you don’t want to use GHCR, you can load the images directly into Kind:
# Local build
docker build -t python-metrics-demo:latest ./kubernetes-demo-apps-01
docker build -t java-metrics-demo:latest ./kubernetes-demo-apps-02
# Load into Kind
kind load docker-image python-metrics-demo:latest --name mi-cluster
kind load docker-image java-metrics-demo:latest --name mi-cluster
If you use this option, change the image in the Deployments to python-metrics-demo:latest and java-metrics-demo:latest (without the ghcr.io prefix) and add imagePullPolicy: Never.
Deploying to Kubernetes
Now comes the fun part. Let’s deploy everything in our cluster.
Step 1: Update the Images in the Manifests
Edit the k8s/deployment.yaml for each app and replace <YOUR_USERNAME> with your GitHub username:
# In both deployment.yaml files, change:
image: ghcr.io/<YOUR_USERNAME>/python-metrics-demo:latest
image: ghcr.io/<YOUR_USERNAME>/java-metrics-demo:latest
Step 2: Deploy the Python App
cd kubernetes-demo-apps-01
kubectl apply -f k8s/deployment.yaml
kubectl apply -f k8s/service.yaml
kubectl apply -f k8s/servicemonitor.yaml
kubectl apply -f k8s/hpa.yaml
Step 3: Deploy the Java App
cd kubernetes-demo-apps-02
kubectl apply -f k8s/deployment.yaml
kubectl apply -f k8s/service.yaml
kubectl apply -f k8s/servicemonitor.yaml
kubectl apply -f k8s/hpa.yaml
Step 4: Verify Everything Is Running
kubectl get deployments
NAME                  READY   UP-TO-DATE   AVAILABLE   AGE
python-metrics-demo   2/2     2            2           30s
java-metrics-demo     2/2     2            2           25s
kubectl get pods
NAME                                  READY   STATUS    RESTARTS   AGE
python-metrics-demo-6b8f9c7d5-abc12   1/1     Running   0          35s
python-metrics-demo-6b8f9c7d5-def34   1/1     Running   0          35s
java-metrics-demo-7c9g0d8e6-ghi56     1/1     Running   0          30s
java-metrics-demo-7c9g0d8e6-jkl78     1/1     Running   0          30s
kubectl get svc
NAME                  TYPE        CLUSTER-IP    PORT(S)    AGE
python-metrics-demo   ClusterIP   10.96.50.10   8080/TCP   40s
java-metrics-demo     ClusterIP   10.96.50.11   8080/TCP   35s
kubectl get hpa
NAME                      REFERENCE                        TARGETS           MINPODS   MAXPODS   REPLICAS
python-metrics-demo-hpa   Deployment/python-metrics-demo   5%/50%, 20%/70%   2         8         2
java-metrics-demo-hpa     Deployment/java-metrics-demo     8%/50%, 35%/70%   2         8         2
Deployments, Services, HPAs, and Pods running in the cluster
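Each entry in the TARGETS column reads as current/target, and for resource metrics the current value is a percentage of what the Pods request, not of what the node has. A minimal sketch of that arithmetic follows; the function name and the 200m requests are hypothetical (the real requests live in k8s/deployment.yaml), and the result is truncated to a whole percent to match the kubectl output.

```python
def utilization_percent(usage_millicores, request_millicores):
    """Return CPU utilization as a percentage of the total requested CPU.

    Both arguments are per-Pod lists in millicores (1000m = 1 core).
    """
    total_request = sum(request_millicores)
    if total_request == 0:
        raise ValueError("utilization targets require CPU requests to be set")
    return sum(usage_millicores) * 100 // total_request


# Hypothetical numbers: two Pods, each requesting 200m, using 8m and 12m.
print(utilization_percent([8, 12], [200, 200]))  # prints 5
```

This is why the Deployments need resource requests: without them the HPA has nothing to divide by and cannot compute a utilization figure.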
Testing the APIs
Let’s use port-forward to test both apps.
Python App
kubectl port-forward svc/python-metrics-demo 8081:8080
In another terminal:
# List users
curl http://localhost:8081/api/users
# Response:
[
{"id": 1, "name": "Alice", "email": "[email protected]"},
{"id": 2, "name": "Bob", "email": "[email protected]"},
{"id": 3, "name": "Charlie", "email": "[email protected]"}
]
# Save to cache
curl -X POST http://localhost:8081/api/cache/mikey/mivalue
# View metrics
curl http://localhost:8081/metrics
Java App
kubectl port-forward svc/java-metrics-demo 8082:8080
In another terminal:
# List products
curl http://localhost:8082/api/products
# Response:
[
{"id": 1, "name": "Laptop Pro", "price": 1299.99},
{"id": 2, "name": "Wireless Mouse", "price": 29.99},
{"id": 3, "name": "USB-C Hub", "price": 49.99}
]
# Create an order
curl -X POST http://localhost:8082/api/orders -H "Content-Type: application/json"
# View metrics (note the different path)
curl http://localhost:8082/actuator/prometheus
Both APIs responding correctly via port-forward
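Both metrics endpoints return the same plain-text exposition format, so the output of either curl command can be read with the same code. The following is a rough reading aid, assuming simple sample lines like the ones shown earlier; `parse_samples` is a hypothetical helper, not part of either app, and it neither unescapes label values nor reads timestamps.

```python
import re

# metric_name{label="value",...} value
SAMPLE_RE = re.compile(
    r'^(?P<name>[a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{(?P<labels>.*)\})?\s+(?P<value>\S+)'
)
LABEL_RE = re.compile(r'([a-zA-Z_][a-zA-Z0-9_]*)="((?:[^"\\]|\\.)*)"')


def parse_samples(text):
    """Parse exposition-format lines into (name, labels, value) tuples.

    Comment lines (# HELP, # TYPE) and blank lines are skipped.
    """
    samples = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        match = SAMPLE_RE.match(line)
        if match is None:
            continue
        labels = dict(LABEL_RE.findall(match.group("labels") or ""))
        samples.append((match.group("name"), labels, float(match.group("value"))))
    return samples
```

Feeding it the text saved from `curl http://localhost:8081/metrics` or `curl http://localhost:8082/actuator/prometheus` yields one tuple per sample, which makes it easy to check that a given metric and label set is actually being exported.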
Checking the Logs
Logs are your first line of debugging. Let’s see how to check them:
# Logs from a specific Pod
kubectl logs python-metrics-demo-6b8f9c7d5-abc12
# Logs from all Pods in a Deployment
kubectl logs -l app=python-metrics-demo
# Follow logs in real time
kubectl logs -l app=java-metrics-demo -f
# Logs from the last 5 minutes
kubectl logs -l app=python-metrics-demo --since=5m
# Logs with timestamps
kubectl logs -l app=java-metrics-demo --timestamps
Checking Pod logs with kubectl — your first line of debugging
Verifying Metrics in Prometheus
Let’s confirm that Prometheus is scraping both apps.
kubectl port-forward svc/kube-prometheus-stack-prometheus 9090:9090 -n monitoring
Open http://localhost:9090/targets and look for your app targets. You should see:
- serviceMonitor/monitoring/python-metrics-demo (UP)
- serviceMonitor/monitoring/java-metrics-demo (UP)

Both apps showing as UP targets in Prometheus
Try some queries in the PromQL bar:
# Total requests from the Python app
http_requests_total{job="python-metrics-demo"}
# p95 latency from the Java app (last 5 minutes)
histogram_quantile(0.95, rate(http_request_duration_seconds_bucket{job="java-metrics-demo"}[5m]))
# Orders created in Java
app_orders_total{job="java-metrics-demo"}
# Cache items from both apps
app_cache_items_total
# CPU usage from the demo Pods
rate(container_cpu_usage_seconds_total{pod=~"python-metrics-demo.*|java-metrics-demo.*"}[5m])
Querying app metrics in the Prometheus UI
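The p95 query relies on histogram_quantile, which estimates a quantile from cumulative bucket counts by interpolating linearly inside the bucket where the quantile falls. Below is a simplified sketch of that idea, not Prometheus’s implementation: it works on raw counts rather than on rate() results and skips several edge cases. The bucket list in the usage note reuses the two counts from the sample /metrics output and assumes a +Inf bucket of 42.

```python
def histogram_quantile(q, buckets):
    """Estimate a quantile from cumulative buckets by linear interpolation.

    `buckets` is a list of (upper_bound, cumulative_count) pairs sorted by
    upper bound and ending with the +Inf bucket.
    """
    total = buckets[-1][1]
    if total == 0:
        return float("nan")
    rank = q * total
    lower_bound, lower_count = 0.0, 0.0
    for i, (upper_bound, count) in enumerate(buckets):
        if count >= rank:
            if upper_bound == float("inf"):
                # Quantile falls in the +Inf bucket: report the highest finite bound.
                return buckets[i - 1][0]
            if count == lower_count:
                return upper_bound
            # Assume observations are spread evenly inside the bucket.
            fraction = (rank - lower_count) / (count - lower_count)
            return lower_bound + (upper_bound - lower_bound) * fraction
        lower_bound, lower_count = upper_bound, count
    return float("nan")
```

With `histogram_quantile(0.95, [(0.1, 38), (0.25, 42), (float("inf"), 42)])` the rank is 39.9, which lands in the 0.1 to 0.25 bucket, and the estimate comes out at about 0.171 seconds. The result can never be more precise than the bucket boundaries, which is why the bucket list in the Histogram definition matters.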
Visualizing in Grafana
Now let’s head to Grafana to see everything visually.
kubectl port-forward svc/kube-prometheus-stack-grafana 3000:80 -n monitoring
Cluster Resource Dashboard
Go to Dashboards -> Kubernetes / Compute Resources / Namespace (Pods) and select the default namespace. You’ll see the CPU and memory consumption of your apps.

Resource view by namespace: CPU and memory for the Python and Java apps
Per-Pod Dashboard
Go to Kubernetes / Compute Resources / Pod and select one of the Pods. You’ll see the individual details:

Individual Pod detail: CPU, memory, network I/O, and filesystem
Custom Dashboard for the Apps
Create a new dashboard with these panels:
Panel 1 - Request Rate (both apps):
sum(rate(http_requests_total{job=~"python-metrics-demo|java-metrics-demo"}[5m])) by (job, endpoint)
Panel 2 - p95 Latency:
histogram_quantile(0.95, sum(rate(http_request_duration_seconds_bucket{job=~"python-metrics-demo|java-metrics-demo"}[5m])) by (job, le))
Panel 3 - Active Pods (Gauge):
count by (job) (up{job=~"python-metrics-demo|java-metrics-demo"})
Panel 4 - Cache Items:
app_cache_items_total
Custom dashboard showing request rate, p95 latency, active pods, and cache
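The request-rate panel wraps the counter in rate() because a raw counter only ever grows; what the graph needs is the per-second increase. A simplified sketch of that calculation follows, assuming a hypothetical list of scraped (timestamp, value) samples. Unlike PromQL’s rate(), it does not extrapolate to the edges of the time window.

```python
def simple_rate(samples):
    """Per-second increase of a counter over (timestamp, value) samples.

    A value lower than the previous one is treated as a counter reset
    (for example after a Pod restart), so counting resumes from zero.
    """
    if len(samples) < 2:
        return None
    increase = 0.0
    for (_, prev), (_, curr) in zip(samples, samples[1:]):
        increase += (curr - prev) if curr >= prev else curr
    elapsed = samples[-1][0] - samples[0][0]
    return increase / elapsed if elapsed > 0 else None


# Hypothetical scrapes every 15 seconds: 60 requests in 30 seconds.
print(simple_rate([(0, 100), (15, 130), (30, 160)]))  # prints 2.0
```

The reset handling is what makes counters safe to graph across Pod restarts, which happen constantly once the HPA starts adding and removing replicas.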
Testing Autoscaling Under Load
Now the real test: we’re going to generate load against the /api/heavy endpoint so the HPA scales automatically.
Generating Load Against Python
kubectl run load-python --image=busybox --rm -it -- /bin/sh -c \
"while true; do wget -q -O- http://python-metrics-demo:8080/api/heavy; done"
Generating Load Against Java
In another terminal:
kubectl run load-java --image=busybox --rm -it -- /bin/sh -c \
"while true; do wget -q -O- http://java-metrics-demo:8080/api/heavy; done"
Watching the Scaling
In another terminal, watch the HPAs in real time:
kubectl get hpa --watch
NAME                      TARGETS            MINPODS   MAXPODS   REPLICAS   AGE
python-metrics-demo-hpa   5%/50%, 20%/70%    2         8         2          10m
java-metrics-demo-hpa     8%/50%, 35%/70%    2         8         2          10m
python-metrics-demo-hpa   72%/50%, 25%/70%   2         8         2          11m
python-metrics-demo-hpa   72%/50%, 25%/70%   2         8         3          11m30s
java-metrics-demo-hpa     65%/50%, 40%/70%   2         8         2          11m30s
java-metrics-demo-hpa     65%/50%, 40%/70%   2         8         3          12m
python-metrics-demo-hpa   58%/50%, 28%/70%   2         8         4          12m30s
...
The HPA detects high CPU load and creates new Pods automatically
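The replica counts in that output follow from the formula in the Kubernetes HPA documentation: desiredReplicas = ceil(currentReplicas × currentMetricValue / desiredMetricValue), evaluated per metric, with the largest result winning and the final value clamped to the min and max. Here is a minimal sketch of that calculation; the function name is hypothetical, the 0.1 tolerance is the documented default, and details such as not-yet-ready Pods and missing metrics are left out.

```python
import math


def desired_replicas(current_replicas, metrics, min_replicas, max_replicas, tolerance=0.1):
    """Apply the documented HPA formula to each metric and keep the largest result.

    `metrics` is a list of (current_value, target_value) pairs, for example
    (72, 50) for "72%/50%".
    """
    proposals = []
    for current, target in metrics:
        ratio = current / target
        if abs(ratio - 1.0) <= tolerance:
            proposals.append(current_replicas)  # close enough to target: no change
        else:
            proposals.append(math.ceil(current_replicas * ratio))
    return max(min_replicas, min(max_replicas, max(proposals)))


# Values taken from the watch output above (min 2, max 8).
print(desired_replicas(2, [(72, 50), (25, 70)], 2, 8))  # prints 3
print(desired_replicas(3, [(58, 50), (28, 70)], 2, 8))  # prints 4
```

The first call reproduces the jump from 2 to 3 replicas at 72% CPU, and the second shows why 58% CPU on 3 replicas still triggers a fourth Pod: 3 × 58 / 50 is 3.48, which rounds up.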
Verify that more Pods were created:
kubectl get pods -l app=python-metrics-demo
NAME                                  READY   STATUS    RESTARTS   AGE
python-metrics-demo-6b8f9c7d5-abc12   1/1     Running   0          12m
python-metrics-demo-6b8f9c7d5-def34   1/1     Running   0          12m
python-metrics-demo-6b8f9c7d5-ghi56   1/1     Running   0          1m
python-metrics-demo-6b8f9c7d5-jkl78   1/1     Running   0          30s
Watching the Scaling in Grafana
Open the namespace dashboard in Grafana and you’ll see in real time how CPU usage goes up and new Pods appear:

Grafana showing in real time: CPU going up -> HPA creating Pods -> CPU going down
Stopping the Load
When you stop the load generators (Ctrl+C in each terminal), the HPA waits out the 5-minute scale-down stabilization window and then scales the replicas back down to 2.
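The delay exists because, for scale-down, the HPA looks back over its recent recommendations and uses the highest one inside the stabilization window. A minimal sketch of that rule, with a hypothetical function name and a made-up timeline in which the load stops right after a recommendation of 4 replicas:

```python
def stabilized_replicas(recommendations, now, window_seconds=300):
    """Pick the replica count for scale-down under a stabilization window.

    `recommendations` is a list of (timestamp_seconds, desired_replicas)
    pairs. The highest recommendation seen inside the window wins, so the
    replica count only drops once every recent recommendation agrees.
    """
    recent = [replicas for ts, replicas in recommendations if now - ts <= window_seconds]
    if not recent:
        raise ValueError("no recommendations inside the window")
    return max(recent)


# Hypothetical timeline: 4 replicas recommended at t=0, then 2 every minute.
history = [(0, 4), (60, 2), (120, 2), (180, 2), (240, 2), (300, 2), (330, 2)]
print(stabilized_replicas(history, now=240))  # prints 4
print(stabilized_replicas(history, now=330))  # prints 2
```

Only once the last high recommendation has aged out of the window does the replica count drop, which prevents the Deployment from flapping when load is bursty.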
Summary of Everything We Used
In this chapter we put everything we learned in the series to the test. Let’s recap which Kubernetes resources and tools we used, and what for:
| Resource | What We Used It For |
|---|---|
| Deployment | Deploy the apps with replicas, rolling updates, and rollback |
| Service | Provide a stable access point to the apps within the cluster |
| ServiceMonitor | Tell Prometheus to scrape our apps |
| HPA | Automatically scale by CPU and memory |
| Port-forward | Test the APIs from our local machine |
| Metrics Server | Provide CPU/memory metrics to the HPA |
| Prometheus | Collect and store app metrics |
| Grafana | Visualize metrics in interactive dashboards |
| kubectl logs | Debugging and troubleshooting Pods |
| Probes (liveness/readiness) | Verify that the apps are healthy |
| Resources (requests/limits) | Control how much CPU/memory each Pod can use |
Source Code
All the code is available on GitHub:
- Python Demo: kubernetes-demo-apps-01
- Java Demo: kubernetes-demo-apps-02
Each repo includes:
- API source code
- Dockerfile
- Kubernetes manifests (k8s/)
- GitHub Actions for CI/CD (.github/workflows/)
References
- prometheus-client (Python) — Official Prometheus library for Python
- Micrometer — Metrics library for Java/Spring
- Spring Boot Actuator + Prometheus — Official Spring Boot guide
- GitHub Container Registry — GHCR documentation
- Kind: Loading an Image — Loading images into Kind
Summary
Today we put it all together:
- We created two real APIs (Python + Java) that expose Prometheus metrics.
- We built and pushed them to the GitHub Container Registry.
- We deployed them to Kubernetes with Deployments, Services, and health probes.
- We configured ServiceMonitors so Prometheus scrapes them.
- We configured HPAs to automatically scale by CPU and memory.
- We verified metrics in Prometheus with PromQL queries.
- We created Grafana dashboards to visualize everything.
- We generated load and watched in real time how the HPA scaled the Pods.
- We checked logs with kubectl for debugging.
This is the complete lifecycle of a workload in Kubernetes: deploy -> expose -> monitor -> autoscale. With this knowledge, you’re ready to deploy and operate real applications in a cluster.