From 4 Minutes to 2 Seconds: Docker Build Optimization

So our Docker build was taking almost 4 minutes every time we pushed. Even with cache. It was killing our CI/CD speed and honestly just embarrassing at this point. The image was also 562MB which meant slow deployments and sluggish Kubernetes pod starts.

I fixed it. Here's what I did.

What Was Wrong

The old setup was your typical single-stage Dockerfile. Install everything, build everything, ship everything. It worked but it was bloated and slow.

The real issue? Cache wasn't working properly. Every rebuild was basically starting from scratch. And worse, the container ran as root with /bin/sh available. Not great for security.

The Fix: Multi-Stage Build with BuildKit Cache

I rewrote the Dockerfile into three stages:

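# Stage 1: Dependencies - cached layer
FROM node:22-alpine AS deps
WORKDIR /app
# pnpm is not preinstalled in node:22-alpine; corepack provides it
RUN corepack enable
# Assumes package.json and pnpm-lock.yaml sit at the repo root
COPY package.json pnpm-lock.yaml ./
RUN --mount=type=cache,target=/root/.pnpm-store \
    pnpm install --frozen-lockfile --store-dir /root/.pnpm-store

# Stage 2: Builder - needs ALL deps
FROM node:22-alpine AS builder
WORKDIR /app
RUN corepack enable
COPY --from=deps /app/node_modules ./node_modules
COPY . .
RUN --mount=type=cache,target=/app/.next/cache pnpm build

# Stage 3: Runtime - distroless (no shell!)
FROM gcr.io/distroless/nodejs22-debian12 AS runner
WORKDIR /app
COPY --from=builder /app/.next/standalone ./
COPY --from=builder /app/.next/static ./.next/static
COPY --from=builder /app/public ./public
# distroless images ship a "nonroot" user, not "node"
USER nonroot
CMD ["server.js"]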

The magic here is the BuildKit cache mounts. We mount two persistent caches:

~/.pnpm-store keeps npm packages between builds
/app/.next/cache is the Next.js build cache for incremental builds

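Now when only source code changes, Docker reuses the dependency layer and package installation is skipped entirely. When dependencies do change, the pnpm store mount means only the new packages get downloaded.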
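If the caching behaviour feels like magic, here's a toy model of it. This is a sketch, not how BuildKit is actually implemented: it assumes each step is keyed by its instruction plus a checksum of its inputs, and that the manifest files are copied in their own step before the rest of the source. The step names and checksums are made up for illustration.

```typescript
// Toy model of Docker's layer cache: each step is keyed by its instruction
// plus a checksum of its inputs. The first miss invalidates every later step.
interface Step {
  instruction: string;   // e.g. "RUN pnpm install"
  inputChecksum: string; // hypothetical hash of the files the step copies in
}

function stepsToRerun(previous: Step[], current: Step[]): string[] {
  const rerun: string[] = [];
  let cacheValid = true;
  current.forEach((step, i) => {
    const prev = previous[i];
    const unchanged =
      prev !== undefined &&
      prev.instruction === step.instruction &&
      prev.inputChecksum === step.inputChecksum;
    cacheValid = cacheValid && unchanged;
    if (!cacheValid) rerun.push(step.instruction);
  });
  return rerun;
}

// Lockfile copied first, source copied later: a source-only change
// leaves the install step cached.
const lastBuild: Step[] = [
  { instruction: "COPY package.json pnpm-lock.yaml", inputChecksum: "lock-v1" },
  { instruction: "RUN pnpm install", inputChecksum: "" },
  { instruction: "COPY . .", inputChecksum: "src-v1" },
  { instruction: "RUN pnpm build", inputChecksum: "" },
];
const sourceEdit: Step[] = lastBuild.map((s) =>
  s.instruction === "COPY . ." ? { ...s, inputChecksum: "src-v2" } : s
);

console.log(stepsToRerun(lastBuild, sourceEdit));
// ["COPY . .", "RUN pnpm build"]
```

The install step only reruns when the lockfile checksum changes, which is the whole reason for splitting dependencies into their own stage.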

The Numbers

Cold build: 3m 39s → 2m 04s
Cached rebuild: ~3-4 min → 1.9s
Image size: 562MB → 348MB

That 1.9 second cached rebuild was the game changer. Our CI pipeline no longer dies on trivial code changes.
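For anyone who wants the percentages, here's the arithmetic on the numbers above. One assumption on my part: I treat "~3-4 min" as a 180 to 240 second range, so the cached speedup comes out as a range too.

```typescript
// Times in seconds, sizes in MB, taken from the numbers above.
const coldBefore = 3 * 60 + 39; // 219 s
const coldAfter = 2 * 60 + 4;   // 124 s
const cachedBefore = [3 * 60, 4 * 60]; // "~3-4 min" treated as a range
const cachedAfter = 1.9;
const sizeBefore = 562;
const sizeAfter = 348;

// Percentage reduction, rounded to one decimal place.
function percentSaved(before: number, after: number): number {
  return Math.round(((before - after) / before) * 1000) / 10;
}

const coldSaved = percentSaved(coldBefore, coldAfter); // 43.4
const sizeSaved = percentSaved(sizeBefore, sizeAfter); // 38.1
const cachedSpeedup = cachedBefore.map((s) => Math.round(s / cachedAfter)); // [95, 126]
```

So cold builds are about 43% faster, the image is about 38% smaller, and cached rebuilds are roughly two orders of magnitude faster.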

Security: Actually Doing It Right

The biggest win wasn't even the speed. It was actually securing the container properly.

We switched to distroless (gcr.io/distroless/nodejs22-debian12). This image has no shell, no package manager, nothing extra. Just Node.js. On top of that, the Compose service is locked down:

security_opt:
  - no-new-privileges:true
read_only: true
cap_drop:
  - ALL
tmpfs:
  - /tmp:size=64M

The filesystem is read-only, we run as non-root, and all Linux capabilities are dropped.

If someone finds an RCE vulnerability in the app, they have a lot less to work with. There's no shell to spawn, no writable filesystem outside /tmp, no package manager to install tools with. The attack is largely contained to the Node.js process.
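Settings like these tend to get loosened quietly during debugging, so it can be worth checking them in CI. Here's a minimal sketch of such a check. It assumes the Compose service has already been parsed into a plain object; the interface and function names are hypothetical, and it only covers the three flags used above.

```typescript
// Shape of the Compose keys used above; only the fields this check reads.
interface ServiceHardening {
  security_opt?: string[];
  read_only?: boolean;
  cap_drop?: string[];
}

// Returns a list of human-readable problems; an empty list means the
// service matches the hardening described in this post.
function hardeningProblems(svc: ServiceHardening): string[] {
  const problems: string[] = [];
  if (!svc.security_opt?.includes("no-new-privileges:true")) {
    problems.push("missing no-new-privileges");
  }
  if (svc.read_only !== true) problems.push("root filesystem is writable");
  if (!svc.cap_drop?.includes("ALL")) problems.push("capabilities not dropped");
  return problems;
}

const hardened: ServiceHardening = {
  security_opt: ["no-new-privileges:true"],
  read_only: true,
  cap_drop: ["ALL"],
};

console.log(hardeningProblems(hardened));            // []
console.log(hardeningProblems({ read_only: true })); // two problems reported
```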

Nginx Layer

I also added nginx in front of Next.js to handle static assets properly. This means:

Static files cached for 1 year
Gzip compression (level 6, ~70% bandwidth savings)
Rate limiting to protect against traffic spikes
65k worker connections for high concurrency

location /_next/static/ {
    proxy_pass http://frontend;
    expires 1y;
    add_header Cache-Control "public, max-age=31536000, immutable";
    access_log off;
}

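With these headers, browsers cache static assets for a year, so repeat visits never ask the Node.js app for them. Note that with proxy_pass alone, the first request for each file still reaches the frontend upstream. Serving the files from Nginx itself would need proxy_cache or direct access to the build output.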
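To make the header concrete, here's a small sketch that reads a Cache-Control value the way a cache would. It's a deliberately minimal parser of my own (no quoted-string handling), not something Nginx or the browser exposes.

```typescript
// Minimal Cache-Control parser: splits directives on commas and reads
// "name=value" pairs. Not a full HTTP caching parser (no quoted strings).
function parseCacheControl(header: string): Map<string, string | true> {
  const directives = new Map<string, string | true>();
  for (const part of header.split(",")) {
    const [name, value] = part.trim().split("=");
    if (name) directives.set(name.toLowerCase(), value ?? true);
  }
  return directives;
}

const parsed = parseCacheControl("public, max-age=31536000, immutable");
const maxAgeDays = Number(parsed.get("max-age")) / 86400; // 365
const isImmutable = parsed.get("immutable") === true;      // true
```

31536000 seconds is exactly 365 days, and immutable tells the browser not to revalidate during that window. That's only safe because Next.js puts a content hash in the filenames under /_next/static/.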

What's Next

Still have some things on the list. Here's how I'd actually implement them.

Add a CDN for Edge Caching

Cloudflare is the easiest option. Point your DNS to Cloudflare, enable caching, and done.

For Next.js specifically, set up aggressive cache headers in next.config.ts:

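import type { NextConfig } from 'next'

const nextConfig: NextConfig = {
  headers: async () => [
    {
      // Caution: '/(.*)' matches every route, including HTML pages and API responses
      source: '/(.*)',
      headers: [
        { key: 'Cache-Control', value: 'public, max-age=86400, stale-while-revalidate=31536000' },
      ],
    },
  ],
}

export default nextConfig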

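Cloudflare will cache static assets at the edge, so your server barely gets touched for repeat visits. By default it caches by file extension and skips HTML, so pages only get edge-cached if you add a cache rule for them.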
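One thing I'd be careful with is applying a single public cache policy to every route. Here's a sketch of a per-path policy instead. The path prefixes and the no-store choice for the API are my assumptions, not something already in the app.

```typescript
// Hypothetical policy: pick a Cache-Control value per path so that only
// fingerprinted assets are cached long-term. Paths and values are assumptions.
function cacheControlFor(path: string): string {
  if (path.startsWith("/_next/static/")) {
    // Content-hashed filenames: safe to cache for a year
    return "public, max-age=31536000, immutable";
  }
  if (path.startsWith("/api/")) {
    // Responses may be user-specific: keep them out of shared caches
    return "private, no-store";
  }
  // HTML and everything else: one day fresh, then serve stale while revalidating
  return "public, max-age=86400, stale-while-revalidate=31536000";
}

console.log(cacheControlFor("/_next/static/chunks/main.js"));
// "public, max-age=31536000, immutable"
console.log(cacheControlFor("/api/user"));
// "private, no-store"
```

The point is that anything user-specific should never end up in a shared edge cache, whatever the CDN.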

Try Brotli Compression

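Brotli typically compresses text 15-20% smaller than gzip. Stock Nginx doesn't ship with it: you need the ngx_brotli module, either compiled in or loaded as a dynamic module. Once it's there, enable it: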

brotli on;
brotli_types text/plain text/css application/json application/javascript text/xml application/xml;
brotli_comp_level 6;

That's it. Nginx will serve Brotli to any browser that sends br in its Accept-Encoding header and fall back to gzip for the rest.
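The negotiation itself is simple. Here's roughly the decision being made, as a simplified sketch of my own rather than Nginx's actual logic. It prefers Brotli whenever the client lists it and only understands q=0 as a refusal.

```typescript
// Simplified content negotiation: prefer Brotli, then gzip, else identity.
// Ignores q-values other than q=0, which a full implementation would rank.
type Encoding = "br" | "gzip" | "identity";

function pickEncoding(acceptEncoding: string): Encoding {
  const accepted = new Set<string>();
  for (const part of acceptEncoding.split(",")) {
    const [name, ...params] = part.trim().toLowerCase().split(";");
    const refused = params.some((p) => p.trim() === "q=0");
    if (name && !refused) accepted.add(name.trim());
  }
  if (accepted.has("br")) return "br";
  if (accepted.has("gzip")) return "gzip";
  return "identity";
}

pickEncoding("gzip, deflate, br"); // "br"
pickEncoding("gzip, deflate");     // "gzip"
pickEncoding("br;q=0, gzip");      // "gzip"
```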

Set Up Prometheus + Grafana

This is the monitoring stack I want. Export metrics from your Node.js app using prom-client:

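import { register, Histogram } from 'prom-client'

const httpRequestDuration = new Histogram({
  name: 'http_request_duration_seconds',
  help: 'Duration of HTTP requests in seconds',
  labelNames: ['method', 'route', 'status_code'],
  buckets: [0.1, 0.3, 0.5, 1, 3, 5]
})

// In your request handler: start a timer, then stop it with the final labels
const end = httpRequestDuration.startTimer()
// ... handle the request ...
end({ method: 'GET', route: '/api', status_code: 200 })

// In the /metrics handler: register.metrics() returns a Promise<string>
// and register.contentType is the Content-Type to send with it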

Then Prometheus scrapes the /metrics endpoint and Grafana visualizes it. You'll see request rates, error rates, and latency percentiles, plus memory usage if you also call prom-client's collectDefaultMetrics().
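The bucket list matters more than it looks, because latency percentiles are estimated from it. Here's a standalone sketch of how cumulative buckets get counted. It doesn't use prom-client, and the request durations are made up.

```typescript
// Prometheus histograms are cumulative: each bucket counts observations
// less than or equal to its upper bound ("le"), plus a +Inf bucket.
function bucketCounts(bounds: number[], observations: number[]): Map<string, number> {
  const counts = new Map<string, number>();
  for (const le of bounds) {
    counts.set(String(le), observations.filter((v) => v <= le).length);
  }
  counts.set("+Inf", observations.length);
  return counts;
}

const buckets = [0.1, 0.3, 0.5, 1, 3, 5];    // same bounds as the histogram above
const durations = [0.05, 0.2, 0.25, 0.8, 4]; // made-up request durations in seconds
const counts = bucketCounts(buckets, durations);
// 0.1 → 1, 0.3 → 3, 0.5 → 3, 1 → 4, 3 → 4, 5 → 5, +Inf → 5
```

If most of your requests land inside one bucket, the percentiles Grafana shows are only as precise as that bucket's width, so it's worth tuning the bounds once you have real traffic.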

But the core infrastructure is solid now. Fast builds, small images, actually secure containers.

If you're still running root in your containers with a shell available, fix that first. It's 2026, no excuse.
