
Load Testing with k6

A practical introduction to k6 for load testing HTTP services. Covers scripting, stages, thresholds, and how to read the results without fooling yourself.

4 min read · By Codeloom
Intermediate 10 min read

What you'll learn

  • What load testing actually measures
  • How k6 scripts are structured
  • How to model realistic traffic with stages
  • What thresholds give you
  • How to avoid common measurement mistakes

Prerequisites

  • Basic JavaScript
  • Familiarity with HTTP APIs

Load testing answers the question of how your service behaves under realistic and unrealistic traffic. k6 is a modern, scriptable tool that makes it cheap to ask that question repeatedly. This post walks through the basics and the gotchas that catch teams the first time.

What and Why

Load testing is the practice of sending controlled traffic to a service and measuring how it responds. The goals are usually some mix of finding the capacity ceiling, confirming that the service stays within an SLO under expected load, and exposing memory leaks or saturation points before they surface in production.

k6 is a load testing tool written in Go with a JavaScript scripting layer. You write a small script that defines virtual users and their behaviour, then run it locally or in cloud workers. Results come back as time series of request rates, response times, and custom metrics you define.

Mental Model

Picture a control panel with two knobs. One knob is the number of virtual users, which is how many parallel clients are hitting your service. The other is the duration of each phase. You ramp up, hold, ramp down, and watch the dials. Some dials are about the load generator (requests sent, errors received). Some are about the system under test (latency percentiles, error rate). Thresholds are pass/fail rules you attach to those dials.

The trap is to confuse virtual users with real users. A virtual user is just a script loop. Whether that maps to a real user depends on how realistic your script is.
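To make the "script loop" point concrete, here is a rough back-of-the-envelope sketch in plain JavaScript (it is not a k6 script). It assumes each loop iteration sends exactly one request and then pauses, and the response times are made-up figures. Under those assumptions, the request rate is roughly the number of virtual users divided by the time one pass through the loop takes.

```javascript
// Rough estimate of the request rate produced by looping virtual users.
// Assumes one request per iteration; all numbers below are illustrative.
function estimateRequestsPerSecond(vus, responseTimeSec, sleepSec) {
  const iterationSec = responseTimeSec + sleepSec; // time for one pass through the loop
  return vus / iterationSec;
}

// 20 VUs, a hypothetical 0.2 s response time, and a 1 s sleep:
const fastService = estimateRequestsPerSecond(20, 0.2, 1);
// The same 20 VUs when the service slows to 3 s per response:
const slowService = estimateRequestsPerSecond(20, 3, 1);

console.log(fastService.toFixed(1)); // about 16.7 requests per second
console.log(slowService.toFixed(1)); // 5.0 requests per second
```

The same 20 virtual users send less traffic as the service slows down, which independently arriving real users generally would not do. That gap is one reason a VU count alone says little about realism.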

Hands-on Example

A starter script that hits an API and checks the response.

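import http from 'k6/http';
import { check, sleep } from 'k6';

export const options = {
  // Traffic shape: ramp up, hold, ramp down.
  stages: [
    { duration: '30s', target: 20 }, // ramp up to 20 VUs
    { duration: '1m',  target: 20 }, // hold at 20 VUs
    { duration: '30s', target: 0 },  // ramp back down to 0
  ],
  // Pass/fail rules evaluated against the collected metrics.
  thresholds: {
    http_req_failed:   ['rate<0.01'], // fewer than 1% of requests may fail
    http_req_duration: ['p(95)<300'], // 95th percentile under 300 ms
  },
};

// Each virtual user runs this function in a loop.
export default function () {
  const res = http.get('https://api.example.com/products');
  check(res, { 'status is 200': (r) => r.status === 200 });
  sleep(1); // think time between iterations
}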

This ramps up to 20 virtual users over 30 seconds, holds there for a minute, and ramps back down to zero over 30 seconds. The run fails, and k6 exits with a non-zero status, if 1 percent or more of requests fail or if the 95th percentile latency reaches 300 milliseconds.
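One detail that is easy to miss: a failing check is only recorded as a metric and does not fail the run by itself. The fragment below is a sketch of how the options object could be tightened, using a threshold on the built-in checks metric and an abort rule so a badly failing run stops early. The specific limits (the 99 percent check rate, the p(99) bound, the 10 second delay) are illustrative assumptions, not recommendations.

```javascript
// Options fragment for a k6 script; merge it with the stages you already use.
export const options = {
  thresholds: {
    // Stop the test early if the error rate breaches 1%, but wait 10s
    // before evaluating so a few early samples do not trigger the abort.
    http_req_failed: [
      { threshold: 'rate<0.01', abortOnFail: true, delayAbortEval: '10s' },
    ],
    // Several conditions can be attached to one metric.
    http_req_duration: ['p(95)<300', 'p(99)<1000'],
    // Turn check results into a pass/fail rule: at least 99% must pass.
    checks: ['rate>0.99'],
  },
};
```

With the checks threshold in place, a response that returns quickly but with the wrong status or body counts against the run instead of passing silently.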

stages: ramp up -> hold -> ramp down

[k6 VUs] --requests--> [your service] --responses--> [k6]
                                                     |
                                                     v
                                        metrics: rps, p95, errors
                                                     |
                                                     v
                                        thresholds: pass / fail
k6 traffic profile and metrics flow

Run it with k6 run script.js. If you want to inspect the results later, stream them to a time-series database such as InfluxDB using the --out flag.

Common Pitfalls

The first pitfall is testing from a single machine over the public internet. The bottleneck becomes your laptop’s CPU or the bandwidth between you and the service, not the service itself. Run k6 close to the service or use a distributed runner.

The second is reusing a single token or single resource ID across all virtual users. Caches and database hot rows make the service look faster than it is. Generate varied inputs.
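One way to vary inputs is to derive the resource ID from the virtual user number and the iteration number. The helper below is a minimal sketch in plain JavaScript; the ID pool, the stride of 37, and the /products/{id} route mentioned in the closing comment are all hypothetical. In a k6 script, the built-in globals __VU (starting at 1) and __ITER (starting at 0) would supply the two arguments.

```javascript
// Hypothetical ID pool; in a real test, load IDs that exist in the target environment.
const PRODUCT_IDS = Array.from({ length: 500 }, (_, i) => 1000 + i);

// Map a (VU number, iteration number) pair to a product ID so that
// concurrent VUs start on different IDs and each VU moves on every iteration.
function pickProductId(vu, iter, ids = PRODUCT_IDS) {
  // The stride of 37 spaces VUs apart in the pool; it is an arbitrary choice.
  const index = ((vu - 1) * 37 + iter) % ids.length;
  return ids[index];
}

console.log(pickProductId(1, 0)); // 1000
console.log(pickProductId(2, 0)); // 1037
console.log(pickProductId(1, 1)); // 1001

// Inside a k6 default function, the call site might look like:
//   http.get(`https://api.example.com/products/${pickProductId(__VU, __ITER)}`);
```

A deterministic mapping keeps runs reproducible. If the goal is to defeat caching more aggressively, a random pick from a larger pool is a reasonable alternative.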

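The third is reading averages instead of percentiles. A mean of 80 milliseconds with a p99 of 6 seconds is a system on fire. Always look at p95 and p99 alongside the mean. Note that k6's default end-of-test summary prints p(90) and p(95) but not p(99); add it with the summaryTrendStats option.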
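The arithmetic behind that warning is easy to reproduce. The sketch below is plain JavaScript over synthetic data, not k6 output: 989 requests at 20 ms and 11 at 6 seconds, using a simple nearest-rank percentile. Both the sample and the percentile method are assumptions chosen for illustration.

```javascript
// Nearest-rank percentile over a list of latencies in milliseconds.
function percentile(values, p) {
  const sorted = [...values].sort((a, b) => a - b);
  const rank = Math.ceil((p * sorted.length) / 100); // 1-based rank
  return sorted[Math.max(rank, 1) - 1];
}

function mean(values) {
  return values.reduce((sum, v) => sum + v, 0) / values.length;
}

// Synthetic sample: 989 fast requests at 20 ms, 11 slow ones at 6000 ms.
const latencies = [
  ...Array(989).fill(20),
  ...Array(11).fill(6000),
];

console.log(mean(latencies).toFixed(1)); // 85.8
console.log(percentile(latencies, 95));  // 20
console.log(percentile(latencies, 99));  // 6000
```

The mean and even the p95 look healthy here, while roughly one request in a hundred takes six seconds. Only the p99 exposes it.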

Practical Tips

Treat the load test as code. Commit scripts, review them, and run them in CI against a staging environment on every release candidate.

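Tag requests with names so the results break down by endpoint instead of lumping every request into one aggregate. This is useful when a single script exercises several routes, and also when URLs contain IDs, because k6 otherwise records each distinct URL under its own name.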
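As a sketch of what tagging can look like, the script below sets a name tag on each request and attaches a separate threshold to each name. The tag names, the /products/{id} route, the ID range, and the limits are hypothetical; it needs the k6 runtime to run.

```javascript
import http from 'k6/http';
import { sleep } from 'k6';

export const options = {
  vus: 5,
  duration: '30s',
  thresholds: {
    // Separate pass/fail rules per endpoint, selected by the name tag.
    'http_req_duration{name:ListProducts}': ['p(95)<300'],
    'http_req_duration{name:GetProduct}': ['p(95)<300'],
  },
};

export default function () {
  http.get('https://api.example.com/products', {
    tags: { name: 'ListProducts' },
  });

  // Hypothetical detail route; the shared name tag groups all IDs together.
  const id = 1 + (__ITER % 100);
  http.get(`https://api.example.com/products/${id}`, {
    tags: { name: 'GetProduct' },
  });

  sleep(1);
}
```

Because each tagged sub-metric has a threshold, it also shows up as its own line in the end-of-test summary.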

Warm up. A cold cache and a cold JIT make the first thirty seconds of every run misleading. Either discard that window or ramp up slowly.
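One way to discard the warm-up window is to run it as its own scenario and apply thresholds only to the measured phase. The script below is a sketch under assumptions: the scenario names, the phase tag, the 5 warm-up VUs, and the 30 second offset are all illustrative choices. It needs the k6 runtime to run.

```javascript
import http from 'k6/http';
import { sleep } from 'k6';

export const options = {
  scenarios: {
    // Light traffic to fill caches; its samples are tagged and ignored below.
    warmup: {
      executor: 'constant-vus',
      vus: 5,
      duration: '30s',
      tags: { phase: 'warmup' },
    },
    // The real test starts once the warm-up scenario has finished.
    measured: {
      executor: 'ramping-vus',
      startTime: '30s',
      startVUs: 0,
      stages: [
        { duration: '30s', target: 20 },
        { duration: '1m', target: 20 },
        { duration: '30s', target: 0 },
      ],
      tags: { phase: 'measured' },
    },
  },
  thresholds: {
    // Only samples from the measured phase count toward pass/fail.
    'http_req_duration{phase:measured}': ['p(95)<300'],
    'http_req_failed{phase:measured}': ['rate<0.01'],
  },
};

// Both scenarios run the same default function.
export default function () {
  http.get('https://api.example.com/products');
  sleep(1);
}
```

The untagged totals in the summary still include the warm-up samples, so read the phase-filtered lines when judging the result.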

Wrap-up

k6 lowers the cost of asking how your service behaves under load. The hard part is not the tool. It is being honest about whether your script and environment mirror reality well enough for the numbers to mean anything.