Kubernetes in Practice: Deploying Python and Java Apps with Metrics, HPA, and Grafana

Paul Escarcena included in Kubernetes

2025-12-29 2273 words 11 minutes

Updated March 2026: Code and manifests verified with Kubernetes v1.35, Prometheus 3.x, and autoscaling/v2.

Prerequisites

This is the hands-on chapter of the series. You need everything from previous chapters up and running:

Kind cluster active — Chapter 1: Setting Up the Cluster
Know how to create Deployments, Services, and YAML manifests — Chapter 2: K8s Resources
Understand the HPA — Chapter 3: Autoscaling
Prometheus and Grafana installed in the cluster — Chapter 4: Observability
Metrics Server installed (we did this in Chapter 3)

Additional tools for this chapter:

Docker to build the images.
Git and a GitHub account to push the images to the Container Registry.

What Are We Going to Build?

We’re going to deploy two real APIs in our Kind cluster, each in a different language, both exposing metrics to Prometheus:

App	Language	Framework	Metrics	Endpoints
kubernetes-demo-apps-01 (Python)	Python 3.12	Flask + prometheus-client	Counter, Histogram, Gauge	`/api/users`, `/api/heavy`, `/api/cache`
kubernetes-demo-apps-02 (Java)	Java 21	Spring Boot + Micrometer	Counter, Timer, Gauge	`/api/products`, `/api/orders`, `/api/heavy`

Both apps have:

An /api/heavy endpoint that consumes CPU — perfect for triggering the HPA.
Request, latency, and cache metrics exposed in Prometheus format.
Health checks for Kubernetes probes.
Complete manifests: Deployment, Service, ServiceMonitor, and HPA.

Two APIs deployed on Kind with metrics, HPA, and monitoring in Grafana

The Source Code

The complete code for both apps is available on GitHub:

Python: kubernetes-demo-apps-01
Java: kubernetes-demo-apps-02

Clone the repo and navigate to each app’s folder:

git clone https://github.com/pescarcena/blog-pescarcena-code.git
cd blog-pescarcena-code

# Python
cd kubernetes-demo-apps-01

# Java
cd kubernetes-demo-apps-02

Let’s walk through the key parts of each app.

Python App: Flask + prometheus-client

Project Structure

kubernetes-demo-apps-01/
├── app.py                 # API code
├── requirements.txt       # Dependencies
├── Dockerfile             # Docker image
├── k8s/
│   ├── deployment.yaml    # K8s Deployment
│   ├── service.yaml       # Service
│   ├── servicemonitor.yaml # ServiceMonitor for Prometheus
│   └── hpa.yaml           # HPA by CPU and memory
└── .github/
    └── workflows/
        └── build-push.yaml # CI/CD for GitHub Actions

The Metrics It Exposes

In app.py we define three types of Prometheus metrics:

from prometheus_client import Counter, Histogram, Gauge

# Counter: counts total requests (only goes up)
REQUEST_COUNT = Counter(
    "http_requests_total",
    "Total de requests HTTP",
    ["method", "endpoint", "status"],
)

# Histogram: measures latency distribution
REQUEST_LATENCY = Histogram(
    "http_request_duration_seconds",
    "Duración de requests HTTP en segundos",
    ["method", "endpoint"],
    buckets=[0.01, 0.05, 0.1, 0.25, 0.5, 1.0, 2.5, 5.0],
)

# Gauge: value that goes up and down (items in cache)
ITEMS_IN_CACHE = Gauge(
    "app_cache_items_total",
    "Cantidad de items en caché",
)

The metrics are automatically instrumented with Flask’s before_request and after_request decorators — every incoming request is counted and its latency is measured without touching the business logic.
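To make the hook pattern concrete, here is a minimal, framework-free sketch of what a before/after pair does. It is not the code from app.py: the dictionaries stand in for the Counter and Histogram, and `handle`, `ctx`, and `list_users` are hypothetical names used only for illustration.

```python
import time
from collections import defaultdict

# Stand-ins for the Counter and Histogram (hypothetical, stdlib only).
request_count = defaultdict(int)       # (method, endpoint, status) -> count
request_latency = defaultdict(list)    # (method, endpoint) -> observed seconds


def before_request(ctx):
    """Runs before the handler: remember when the request started."""
    ctx["start"] = time.perf_counter()


def after_request(ctx, status):
    """Runs after the handler: record one count and one latency sample."""
    elapsed = time.perf_counter() - ctx["start"]
    request_count[(ctx["method"], ctx["endpoint"], status)] += 1
    request_latency[(ctx["method"], ctx["endpoint"])].append(elapsed)
    return elapsed


def handle(method, endpoint, handler):
    """Wrap a handler with the two hooks, the way a web framework would."""
    ctx = {"method": method, "endpoint": endpoint}
    before_request(ctx)
    body, status = handler()
    after_request(ctx, status)
    return body, status


def list_users():
    # Business logic stays free of any metrics code.
    return [{"id": 1, "name": "Alice"}], 200


body, status = handle("GET", "/api/users", list_users)
print(dict(request_count))
```

The point of the design is that `list_users` never mentions metrics. Every handler gets counted and timed simply because it runs between the two hooks.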

The Heavy Endpoint (for Testing HPA)

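# At the top of app.py (shown here so the excerpt stands alone)
import random
from flask import Flask, jsonify

app = Flask(__name__)

@app.route("/api/heavy")
def heavy_endpoint():
    """CPU-intensive endpoint, useful for testing the HPA."""
    total = 0
    for i in range(random.randint(500_000, 2_000_000)):
        total += i * i
    return jsonify({"result": total, "message": "Heavy computation done"})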

This endpoint does heavy computations on purpose. When you call it repeatedly, CPU usage will spike and the HPA will scale the Pods.
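If you want a feel for how much CPU a single call burns before deploying anything, the loop can be timed on its own. This is a rough sketch that copies the endpoint's arithmetic into a plain function; `heavy` and `cpu_seconds` are names made up for this example, and the timings will depend entirely on your machine.

```python
import time


def heavy(iterations):
    """Same arithmetic as the endpoint's loop, without Flask."""
    total = 0
    for i in range(iterations):
        total += i * i
    return total


def cpu_seconds(iterations):
    """CPU time (not wall-clock time) spent on one call."""
    start = time.process_time()
    heavy(iterations)
    return time.process_time() - start


if __name__ == "__main__":
    # The endpoint picks a random size between these two bounds.
    for n in (500_000, 2_000_000):
        print(f"{n:>9} iterations: {cpu_seconds(n):.3f} CPU seconds")
```

Comparing the result against the CPU request in the Deployment gives a rough idea of how many calls per second it takes to push a Pod past the HPA target.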

The /metrics Endpoint

from flask import Response
from prometheus_client import CONTENT_TYPE_LATEST, generate_latest

@app.route("/metrics")
def metrics():
    return Response(generate_latest(), mimetype=CONTENT_TYPE_LATEST)

Prometheus scrapes this endpoint. If you access localhost:8080/metrics you’ll see something like:

# HELP http_requests_total Total de requests HTTP
# TYPE http_requests_total counter
http_requests_total{endpoint="/api/users",method="GET",status="200"} 42.0
http_requests_total{endpoint="/api/heavy",method="GET",status="200"} 15.0

# HELP http_request_duration_seconds Duración de requests HTTP en segundos
# TYPE http_request_duration_seconds histogram
http_request_duration_seconds_bucket{endpoint="/api/users",le="0.1",method="GET"} 38.0
http_request_duration_seconds_bucket{endpoint="/api/users",le="0.25",method="GET"} 42.0
...
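The `le` ("less than or equal") buckets in that output are cumulative: one observation increments every bucket whose upper bound is at or above the measured value. Here is a small stdlib sketch of that bookkeeping, using the bucket bounds from the Histogram definition plus the implicit +Inf bucket. The latencies are invented for the example.

```python
import math

# Bucket upper bounds from the Histogram definition, plus the implicit +Inf.
BUCKETS = [0.01, 0.05, 0.1, 0.25, 0.5, 1.0, 2.5, 5.0, math.inf]


def observe(counts, seconds):
    """Add one observation: every bucket whose bound is >= the value goes up."""
    for i, upper in enumerate(BUCKETS):
        if seconds <= upper:
            counts[i] += 1


counts = [0] * len(BUCKETS)
for latency in (0.004, 0.07, 0.07, 0.3):  # made-up request durations in seconds
    observe(counts, latency)

for upper, count in zip(BUCKETS, counts):
    label = "+Inf" if math.isinf(upper) else upper
    print(f'le="{label}" {count}')
```

This is why the +Inf bucket always equals the total number of observations, and why the counts never decrease from one bucket to the next.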

Java App: Spring Boot + Micrometer

Project Structure

kubernetes-demo-apps-02/
├── pom.xml                         # Maven dependencies
├── Dockerfile                      # Multi-stage build
├── src/main/java/com/demo/metricsapi/
│   ├── MetricsApiApplication.java  # Main class
│   ├── ApiController.java          # API endpoints
│   └── MetricsConfig.java          # Micrometer configuration
├── src/main/resources/
│   └── application.yaml            # Spring Boot config
├── k8s/
│   ├── deployment.yaml
│   ├── service.yaml
│   ├── servicemonitor.yaml
│   └── hpa.yaml
└── .github/
    └── workflows/
        └── build-push.yaml

The Metrics It Exposes

Spring Boot with Micrometer and the Prometheus registry does most of the work automatically. You just need to add the dependency (alongside spring-boot-starter-actuator, which provides the /actuator endpoints):

<dependency>
    <groupId>io.micrometer</groupId>
    <artifactId>micrometer-registry-prometheus</artifactId>
</dependency>

And in application.yaml enable the endpoint:

management:
  endpoints:
    web:
      exposure:
        include: health,prometheus,info
  prometheus:
    metrics:
      export:
        enabled: true

With this, Spring Boot automatically exposes JVM, HTTP, Tomcat, connection metrics, and more at /actuator/prometheus.

For custom metrics, we use annotations and the registry:

// Automatic timer with @Timed
@GetMapping("/api/products")
@Timed(value = "http_request_duration_seconds", extraTags = {"endpoint", "/api/products"})
public List<Map<String, Object>> getProducts() { ... }

// Manual counter
this.ordersCounter = Counter.builder("app_orders_total")
        .description("Total orders processed")
        .register(registry);

// Gauge that tracks cache size
Gauge.builder("app_cache_items_total", cache, ConcurrentHashMap::size)
        .register(registry);

Key Difference: /metrics vs /actuator/prometheus

Python (prometheus-client)	Java (Micrometer)
Endpoint: `/metrics`	Endpoint: `/actuator/prometheus`
Metrics defined manually	Many automatic metrics (JVM, HTTP, etc.)
Native Prometheus format	Prometheus format via Micrometer registry

This is important for the ServiceMonitors: each app has a different path where Prometheus needs to scrape.

Building and Pushing Images to GHCR

Let’s build the Docker images and push them to the GitHub Container Registry (GHCR) so Kind can use them.

Option A: Automatic CI/CD with GitHub Actions

Both repos include a workflow in .github/workflows/build-push.yaml that builds and pushes the image automatically when you push to main. You just need to:

Create the repos on GitHub.
Push the code.
The workflow runs on its own.

The images will be at:

ghcr.io/<YOUR_USERNAME>/python-metrics-demo:latest
ghcr.io/<YOUR_USERNAME>/java-metrics-demo:latest

Option B: Manual Build and Push

If you prefer doing it manually:

# Login to GHCR
echo "$GITHUB_TOKEN" | docker login ghcr.io -u <YOUR_USERNAME> --password-stdin

Python:

cd kubernetes-demo-apps-01

# Build
docker build -t ghcr.io/<YOUR_USERNAME>/python-metrics-demo:latest .

# Push
docker push ghcr.io/<YOUR_USERNAME>/python-metrics-demo:latest

Java:

# From the repository root (run `cd ..` first if you are still in the Python app's folder)
cd kubernetes-demo-apps-02

# Build (multi-stage: compiles with Maven + creates lightweight image)
docker build -t ghcr.io/<YOUR_USERNAME>/java-metrics-demo:latest .

# Push
docker push ghcr.io/<YOUR_USERNAME>/java-metrics-demo:latest

Building and pushing Docker images to the GitHub Container Registry

Loading Images into Kind (Alternative without GHCR)

If you don’t want to use GHCR, you can load the images directly into Kind:

# Local build
docker build -t python-metrics-demo:latest ./kubernetes-demo-apps-01
docker build -t java-metrics-demo:latest ./kubernetes-demo-apps-02

# Load into Kind
kind load docker-image python-metrics-demo:latest --name mi-cluster
kind load docker-image java-metrics-demo:latest --name mi-cluster

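If you use this option, change the image in the Deployments to python-metrics-demo:latest and java-metrics-demo:latest (without the ghcr.io prefix) and add imagePullPolicy: Never. Without it, the :latest tag makes Kubernetes default to imagePullPolicy: Always, and the Pods will try to pull from a registry instead of using the image you loaded.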

Deploying to Kubernetes

Now comes the fun part. Let’s deploy everything in our cluster.

Step 1: Update the Images in the Manifests

Edit the k8s/deployment.yaml for each app and replace <YOUR_USERNAME> with your GitHub username:

# In both deployment.yaml files, change:
image: ghcr.io/<YOUR_USERNAME>/python-metrics-demo:latest
image: ghcr.io/<YOUR_USERNAME>/java-metrics-demo:latest

Step 2: Deploy the Python App

cd kubernetes-demo-apps-01

kubectl apply -f k8s/deployment.yaml
kubectl apply -f k8s/service.yaml
kubectl apply -f k8s/servicemonitor.yaml
kubectl apply -f k8s/hpa.yaml

Step 3: Deploy the Java App

# From the repository root (run `cd ..` first if you are still in the Python app's folder)
cd kubernetes-demo-apps-02

kubectl apply -f k8s/deployment.yaml
kubectl apply -f k8s/service.yaml
kubectl apply -f k8s/servicemonitor.yaml
kubectl apply -f k8s/hpa.yaml

Step 4: Verify Everything Is Running

kubectl get deployments

NAME                    READY   UP-TO-DATE   AVAILABLE   AGE
python-metrics-demo     2/2     2            2           30s
java-metrics-demo       2/2     2            2           25s

kubectl get pods

NAME                                    READY   STATUS    RESTARTS   AGE
python-metrics-demo-6b8f9c7d5-abc12     1/1     Running   0          35s
python-metrics-demo-6b8f9c7d5-def34     1/1     Running   0          35s
java-metrics-demo-7c9g0d8e6-ghi56       1/1     Running   0          30s
java-metrics-demo-7c9g0d8e6-jkl78       1/1     Running   0          30s

kubectl get svc

NAME                    TYPE        CLUSTER-IP      PORT(S)    AGE
python-metrics-demo     ClusterIP   10.96.50.10     8080/TCP   40s
java-metrics-demo       ClusterIP   10.96.50.11     8080/TCP   35s

kubectl get hpa

NAME                        REFERENCE                      TARGETS           MINPODS   MAXPODS   REPLICAS
python-metrics-demo-hpa     Deployment/python-metrics-demo 5%/50%, 20%/70%   2         8         2
java-metrics-demo-hpa       Deployment/java-metrics-demo   8%/50%, 35%/70%   2         8         2

Deployments, Services, HPAs, and Pods running in the cluster

Testing the APIs

Let’s use port-forward to test both apps.

Python App

kubectl port-forward svc/python-metrics-demo 8081:8080

In another terminal:

# List users
curl http://localhost:8081/api/users

# Response:
[
  {"id": 1, "name": "Alice", "email": "[email protected]"},
  {"id": 2, "name": "Bob", "email": "[email protected]"},
  {"id": 3, "name": "Charlie", "email": "[email protected]"}
]

# Save to cache
curl -X POST http://localhost:8081/api/cache/mikey/mivalue

# View metrics
curl http://localhost:8081/metrics

Java App

kubectl port-forward svc/java-metrics-demo 8082:8080

# List products
curl http://localhost:8082/api/products

# Response:
[
  {"id": 1, "name": "Laptop Pro", "price": 1299.99},
  {"id": 2, "name": "Wireless Mouse", "price": 29.99},
  {"id": 3, "name": "USB-C Hub", "price": 49.99}
]

# Create an order
curl -X POST http://localhost:8082/api/orders -H "Content-Type: application/json"

# View metrics (note the different path)
curl http://localhost:8082/actuator/prometheus

Both APIs responding correctly via port-forward

Checking the Logs

Logs are your first line of debugging. Let’s see how to check them:

# Logs from a specific Pod
kubectl logs python-metrics-demo-6b8f9c7d5-abc12

# Logs from all Pods in a Deployment (with -l, kubectl shows only the last 10 lines per Pod unless you pass --tail)
kubectl logs -l app=python-metrics-demo

# Follow logs in real time
kubectl logs -l app=java-metrics-demo -f

# Logs from the last 5 minutes
kubectl logs -l app=python-metrics-demo --since=5m

# Logs with timestamps
kubectl logs -l app=java-metrics-demo --timestamps

Checking Pod logs with kubectl — your first line of debugging

Tip

If you need more advanced logs (search, filtering, retention), the natural next step is to add Loki to the observability stack. Prometheus is for metrics, Loki is for logs.

Verifying Metrics in Prometheus

Let’s confirm that Prometheus is scraping both apps.

kubectl port-forward svc/kube-prometheus-stack-prometheus 9090:9090 -n monitoring

Open http://localhost:9090/targets and look for your app targets. You should see:

serviceMonitor/monitoring/python-metrics-demo — UP
serviceMonitor/monitoring/java-metrics-demo — UP

Both apps showing as UP targets in Prometheus
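The same check can be scripted against the Prometheus HTTP API, which serves the target list as JSON at /api/v1/targets. The sketch below only parses a payload. The sample is hand-written in the shape of the API response (trimmed to the `scrapePool` and `health` fields) and is not real output from this cluster, and `target_health` is a helper name invented here.

```python
import json


def target_health(payload):
    """Map each scrape pool to the health values of its active targets."""
    health = {}
    for target in payload["data"]["activeTargets"]:
        health.setdefault(target["scrapePool"], []).append(target["health"])
    return health


# Trimmed, hand-written sample in the shape of the API response (not real output).
sample = json.loads("""
{"status": "success", "data": {"activeTargets": [
  {"scrapePool": "serviceMonitor/monitoring/python-metrics-demo", "health": "up"},
  {"scrapePool": "serviceMonitor/monitoring/python-metrics-demo", "health": "up"},
  {"scrapePool": "serviceMonitor/monitoring/java-metrics-demo", "health": "down"}
]}}
""")

for pool, states in target_health(sample).items():
    print(pool, states)
```

With the port-forward open, you could feed it live data by loading the JSON from http://localhost:9090/api/v1/targets instead of the sample. The scrape pool names in your cluster may differ from the ones shown here.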

Try some queries in the PromQL bar:

# Total requests from the Python app
http_requests_total{job="python-metrics-demo"}

# p95 latency from the Java app (last 5 minutes)
histogram_quantile(0.95, rate(http_request_duration_seconds_bucket{job="java-metrics-demo"}[5m]))

# Orders created in Java
app_orders_total{job="java-metrics-demo"}

# Cache items from both apps
app_cache_items_total

# CPU usage from the demo Pods
rate(container_cpu_usage_seconds_total{pod=~"python-metrics-demo.*|java-metrics-demo.*"}[5m])

Querying app metrics in the Prometheus UI
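It helps to know what `histogram_quantile` does with those buckets: it finds the bucket that contains the requested rank and interpolates linearly inside it. Here is a simplified sketch of that idea in plain Python. Real Prometheus applies it to per-second rates rather than raw counts and handles more edge cases, so treat this as an approximation of the technique, not the actual implementation.

```python
import math


def histogram_quantile(q, buckets):
    """Estimate the q-quantile from cumulative (upper_bound, count) pairs.

    Buckets must be sorted by upper bound and end with (math.inf, total).
    """
    if not 0 < q <= 1:
        raise ValueError("this sketch only supports 0 < q <= 1")
    total = buckets[-1][1]
    if total == 0:
        return math.nan
    rank = q * total
    prev_bound, prev_count = 0.0, 0.0
    for upper, count in buckets:
        if count >= rank:
            if math.isinf(upper):
                # Rank falls in the +Inf bucket: report the highest finite bound.
                return prev_bound
            # Assume observations are spread evenly inside the bucket.
            fraction = (rank - prev_count) / (count - prev_count)
            return prev_bound + (upper - prev_bound) * fraction
        prev_bound, prev_count = upper, count
    return math.nan


# Counts borrowed from the /metrics sample earlier: 38 requests <= 0.1s, 42 <= 0.25s.
p95 = histogram_quantile(0.95, [(0.1, 38), (0.25, 42), (math.inf, 42)])
print(f"estimated p95: {p95:.5f}s")
```

Because of the interpolation, the result is only as precise as the bucket layout. A p95 that lands in the 0.1 to 0.25 bucket can be anywhere in that range in reality.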

Visualizing in Grafana

Now let’s head to Grafana to see everything visually.

kubectl port-forward svc/kube-prometheus-stack-grafana 3000:80 -n monitoring

Cluster Resource Dashboard

Go to Dashboards -> Kubernetes / Compute Resources / Namespace (Pods) and select the default namespace. You’ll see the CPU and memory consumption of your apps.

Resource view by namespace: CPU and memory for the Python and Java apps

Per-Pod Dashboard

Go to Kubernetes / Compute Resources / Pod and select one of the Pods. You’ll see the individual details:

Individual Pod detail: CPU, memory, network I/O, and filesystem

Custom Dashboard for the Apps

Create a new dashboard with these panels:

Panel 1 — Request Rate (both apps):

sum(rate(http_requests_total{job=~"python-metrics-demo|java-metrics-demo"}[5m])) by (job, endpoint)

Panel 2 — p95 Latency:

histogram_quantile(0.95, sum(rate(http_request_duration_seconds_bucket{job=~"python-metrics-demo|java-metrics-demo"}[5m])) by (job, le))

Panel 3 — Active Pods (Gauge):

sum by (job) (up{job=~"python-metrics-demo|java-metrics-demo"})

Panel 4 — Cache Items:

app_cache_items_total

Custom dashboard showing request rate, p95 latency, active pods, and cache
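The request-rate panel relies on `rate()`, which turns an ever-growing counter into a per-second value and copes with counters that restart from zero when a Pod restarts. Here is a simplified sketch of that calculation. It leaves out the extrapolation to the window edges that Prometheus performs, and the samples are invented.

```python
def simple_rate(samples):
    """Per-second increase over (timestamp_seconds, counter_value) samples.

    A drop in value is treated as a counter reset (for example, a Pod restart).
    """
    if len(samples) < 2:
        return None
    increase = 0.0
    for (_, prev), (_, curr) in zip(samples, samples[1:]):
        # After a reset the counter restarts from zero, so curr is the increase.
        increase += curr - prev if curr >= prev else curr
    elapsed = samples[-1][0] - samples[0][0]
    return increase / elapsed


# Made-up scrapes every 15s; the counter resets between t=30 and t=45.
samples = [(0, 100), (15, 130), (30, 160), (45, 10), (60, 40)]
print(f"{simple_rate(samples):.3f} requests/second")
```

This is why the panel stays smooth across Pod restarts and scale-ups: each series is converted to a rate first, and only then summed by job and endpoint.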

Testing Autoscaling Under Load

Now the real test: we’re going to generate load against the /api/heavy endpoint so the HPA scales automatically.

Generating Load Against Python

kubectl run load-python --image=busybox --rm -it -- /bin/sh -c \
  "while true; do wget -q -O- http://python-metrics-demo:8080/api/heavy; done"

Generating Load Against Java

In another terminal:

kubectl run load-java --image=busybox --rm -it -- /bin/sh -c \
  "while true; do wget -q -O- http://java-metrics-demo:8080/api/heavy; done"

Watching the Scaling

In another terminal, watch the HPAs in real time:

kubectl get hpa --watch

NAME                        TARGETS            MINPODS   MAXPODS   REPLICAS   AGE
python-metrics-demo-hpa     5%/50%, 20%/70%    2         8         2          10m
java-metrics-demo-hpa       8%/50%, 35%/70%    2         8         2          10m
python-metrics-demo-hpa     72%/50%, 25%/70%   2         8         2          11m
python-metrics-demo-hpa     72%/50%, 25%/70%   2         8         3          11m30s
java-metrics-demo-hpa       65%/50%, 40%/70%   2         8         2          11m30s
java-metrics-demo-hpa       65%/50%, 40%/70%   2         8         3          12m
python-metrics-demo-hpa     58%/50%, 28%/70%   2         8         4          12m30s
...

The HPA detects high CPU load and creates new Pods automatically
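The replica counts in that output follow the HPA's documented formula: desired replicas = ceil(current replicas × current metric / target metric), evaluated per metric, with the largest result winning. Here is a minimal sketch that replays the numbers from the watch output. It assumes the first TARGETS value is CPU and the second is memory, and it uses the default 10% tolerance; the function names are made up for this example.

```python
import math


def desired_replicas(current_replicas, current_pct, target_pct, tolerance=0.1):
    """ceil(currentReplicas * currentMetric / targetMetric), skipped within tolerance."""
    ratio = current_pct / target_pct
    if abs(ratio - 1.0) <= tolerance:
        return current_replicas
    return math.ceil(current_replicas * ratio)


def hpa_decision(current_replicas, metrics, min_replicas=2, max_replicas=8):
    """With several metrics the HPA takes the largest proposal, then clamps it."""
    proposal = max(desired_replicas(current_replicas, cur, tgt) for cur, tgt in metrics)
    return max(min_replicas, min(max_replicas, proposal))


# (current %, target %) pairs for CPU and memory, taken from the watch output.
print(hpa_decision(2, [(72, 50), (25, 70)]))  # Python app at 2 replicas -> 3
print(hpa_decision(3, [(58, 50), (28, 70)]))  # Python app at 3 replicas -> 4
print(hpa_decision(2, [(65, 50), (40, 70)]))  # Java app at 2 replicas -> 3
```

Memory stays well under its 70% target in every row, so CPU is the metric driving each scale-up here.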

Verify that more Pods were created:

kubectl get pods -l app=python-metrics-demo

NAME                                    READY   STATUS    RESTARTS   AGE
python-metrics-demo-6b8f9c7d5-abc12     1/1     Running   0          12m
python-metrics-demo-6b8f9c7d5-def34     1/1     Running   0          12m
python-metrics-demo-6b8f9c7d5-ghi56     1/1     Running   0          1m
python-metrics-demo-6b8f9c7d5-jkl78     1/1     Running   0          30s

Watching the Scaling in Grafana

Open the namespace dashboard in Grafana and you’ll see in real time how CPU usage goes up and new Pods appear:

Grafana showing in real time: CPU going up -> HPA creating Pods -> CPU going down

Stopping the Load

When you stop the load generators (Ctrl+C in each terminal), the HPA waits out the default 5-minute scale-down stabilization window and then scales the replicas back down to 2.
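The delay exists because, when scaling down, the HPA does not act on its latest recommendation alone. It looks at every recommendation made during the stabilization window and uses the highest one. Here is a small sketch of that rule with an invented timeline; `stabilized_replicas` is a name made up for this example, and the 300-second window is the default.

```python
def stabilized_replicas(recommendations, now, window_seconds=300):
    """Scale-down stabilization: use the highest recommendation seen in the window.

    `recommendations` is a list of (timestamp_seconds, desired_replicas) pairs.
    """
    recent = [replicas for ts, replicas in recommendations if now - ts <= window_seconds]
    return max(recent)


# Hypothetical timeline: load stops at t=0 and raw recommendations drop to 2 at once.
history = [(-60, 4), (0, 2), (60, 2), (120, 2), (180, 2), (240, 2)]
print(stabilized_replicas(history, now=240))  # 4: the old peak is still inside the window

history.append((300, 2))
print(stabilized_replicas(history, now=300))  # 2: the peak has aged out
```

The rule keeps a short dip in traffic from removing Pods that would be needed again moments later.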

Summary of Everything We Used

In this chapter we put everything we learned in the series to the test. Let’s recap which resources and tools we used and what for:

Resource	What We Used It For
Deployment	Deploy the apps with replicas, rolling updates, and rollback
Service	Provide a stable access point to the apps within the cluster
ServiceMonitor	Tell Prometheus to scrape our apps
HPA	Automatically scale by CPU and memory
Port-forward	Test the APIs from our local machine
Metrics Server	Provide CPU/memory metrics to the HPA
Prometheus	Collect and store app metrics
Grafana	Visualize metrics in interactive dashboards
kubectl logs	Debugging and troubleshooting Pods
Probes (liveness/readiness)	Verify that the apps are healthy
Resources (requests/limits)	Control how much CPU/memory each Pod can use

Source Code

All the code is available on GitHub:

Python Demo: kubernetes-demo-apps-01
Java Demo: kubernetes-demo-apps-02

Each repo includes:

API source code
Dockerfile
Kubernetes manifests (k8s/)
GitHub Actions for CI/CD (.github/workflows/)

Summary

Today we put it all together:

We created two real APIs (Python + Java) that expose Prometheus metrics.
We built and pushed them to the GitHub Container Registry.
We deployed them to Kubernetes with Deployments, Services, and health probes.
We configured ServiceMonitors so Prometheus scrapes them.
We configured HPAs to automatically scale by CPU and memory.
We verified metrics in Prometheus with PromQL queries.
We created Grafana dashboards to visualize everything.
We generated load and watched in real time how the HPA scaled the Pods.
We checked logs with kubectl for debugging.

This is the complete lifecycle of a workload in Kubernetes: deploy -> expose -> monitor -> autoscale. With this knowledge, you’re ready to deploy and operate real applications in a cluster.