Endurance Testing: Finding Memory Leaks and Slow Burns
Introduction
Some performance problems hide in plain sight. A web application that passes a 15-minute load test can still run out of memory after 6 hours, fill a disk after two days, or gradually exhaust a connection pool until requests start timing out. Short tests don’t catch these because they don’t run long enough.
Endurance testing (also called soak testing) fills that gap. You apply steady, moderate traffic for a long stretch of hours or even days and watch what the system does over time. The goal isn’t to push the system to its limit; it’s to confirm that a system staying online under normal use doesn’t gradually fall apart.
What Is Endurance Testing?
An endurance test is a type of load test that runs long enough for time-sensitive failures to surface. Unlike a stress test, which pushes a system past its capacity to see where it breaks, an endurance test holds load at a realistic level and waits. Problems that only show up after hours or days of sustained load — memory leaks, storage exhaustion, gradual slowdowns — get a chance to appear.
The term “soak test” means the same thing. In both cases, the system is “soaking” in load: steady pressure, no spikes, long duration. Some teams also use “stability testing” for the same concept, though stability testing can sometimes imply a broader scope (including functional tests over time, not just performance).
Why Endurance Tests Matter For Your Site
There are a few classes of problems that may never show up in a short test but can take a production system down within a day:
- Memory leaks. The application holds references longer than it should and the heap grows steadily. A short test looks fine; an overnight test eventually runs out of memory or triggers garbage-collection pauses.
- Connection pool exhaustion. Connections to a database, message broker, or third-party API leak one at a time until the pool is drained and requests start failing.
- Disk-fill issues. Log files, cached artifacts, or temp files accumulate faster than they’re rotated or cleaned up. The system works fine until the disk or partition is full, then everything fails at once.
- Cache bloat. In-memory caches without a proper eviction policy grow unbounded. Performance degrades slowly as lookups scan more entries.
- Background job backlog. Scheduled jobs run slightly slower than incoming work. Each job completes, but the queue grows.
- File descriptor leaks. Sockets, files, or pipes aren’t closed and slowly accumulate toward the process limit.
These problems all have something in common: they don’t exist initially but show up after a certain amount of time has elapsed. To detect or reproduce them, the test has to run long enough to observe the effect.
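The memory-leak and cache-bloat patterns above are often this simple in code. Here’s a minimal, hypothetical sketch of a request handler with a cache that is written to but never evicted — the kind of bug that looks fine in a short test and only matters once enough unique requests have accumulated:

```python
# Hypothetical handler illustrating unbounded cache growth (a memory leak).
# Every unique request adds an entry; nothing is ever evicted.
request_cache = {}

def handle_request(request_id, payload):
    # Caching "for performance" -- but with no eviction policy,
    # the dict grows by one entry per unique request, forever.
    if request_id not in request_cache:
        request_cache[request_id] = payload
    return request_cache[request_id]

# A 15-minute test sends thousands of requests and sees nothing wrong;
# millions of unique requests later, the process runs out of memory.
for i in range(10_000):
    handle_request(i, b"x" * 100)

print(len(request_cache))  # 10000 entries, still climbing
```

The fix is usually a bounded structure (an LRU cache, a TTL) rather than anything exotic — but only a long-running test makes the growth visible.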
Endurance vs. Load vs. Stress vs. Spike Testing
These techniques all simulate traffic but answer different questions. In short:
- Load testing tests whether the system can handle expected traffic. This typically takes just 15–60 minutes.
- Stress testing tests what happens when traffic exceeds capacity. This ramps load fairly aggressively until the system breaks.
- Spike testing tests how the system handles a sudden, sharp burst of traffic.
- Endurance testing tests whether the system stays healthy over hours or days at realistic traffic levels.
A thorough performance engineering program runs all four types of performance tests. See Load Testing vs Stress Testing for a longer breakdown of the two most commonly confused variants.
How to Run an Endurance Test
An endurance test has the same anatomy as a load test, just held steady for longer.
Duration. At least several hours. Overnight (8–12 hours) is a common baseline; a full weekend run is a more thorough check for slow-climbing metrics. For systems with long-running scheduled jobs (nightly reports, daily rollups), you want the test to cover at least one full cycle.
Traffic level. Representative of production, not peak. A typical rule of thumb is 50–80% of a stress-test ceiling, or whatever matches your busiest-typical-hour traffic. The point is steady sustained pressure, not a maximum.
Scripting. Reuse your existing load test scripts. The scenarios that matter for endurance are usually the same ones you already run for load tests — the user flows that keep the system warm — so a separate script library isn’t typically required.
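The “steady, sustained pressure” idea is just a fixed-rate pacing loop held for the whole test window. A load testing tool handles this for you; as an illustration only, here’s a minimal sketch with a placeholder `send_request` standing in for one scripted user flow:

```python
import threading
import time

def send_request():
    """Placeholder for one user flow (a real HTTP call in an actual test)."""
    time.sleep(0.001)  # simulate request latency

def steady_load(requests_per_second, duration_seconds):
    """Fire requests at a fixed rate for the whole test window."""
    interval = 1.0 / requests_per_second
    deadline = time.monotonic() + duration_seconds
    next_send = time.monotonic()
    sent = 0
    while time.monotonic() < deadline:
        threading.Thread(target=send_request).start()
        sent += 1
        # Pace against an absolute schedule so drift doesn't accumulate.
        next_send += interval
        sleep_for = next_send - time.monotonic()
        if sleep_for > 0:
            time.sleep(sleep_for)
    return sent

# Short demo: 50 req/s for 2 seconds. An endurance test runs the same
# loop with duration_seconds measured in hours, not seconds.
count = steady_load(50, 2)
print(count)
```

The only difference between this and a load test script is the duration argument — which is exactly why reusing your existing scripts works.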
What to monitor. Throughout the test, watch:
- Memory — heap usage, RSS, or equivalent. Trend matters more than the number.
- CPU — should be roughly flat once the system is warm.
- Open file descriptors, sockets, threads — these shouldn’t climb over time.
- Connection pool utilization — database, HTTP, cache. Leaks here cause the most dramatic failures.
- Disk space and log file sizes — especially if you don’t rotate logs aggressively.
- Queue depths — for any background work.
- Request latency and error rate — are they flat, or drifting up?
Loadster handles the traffic side of this; your APM or infrastructure monitoring handles the target-side metrics. A solid endurance test usually has both windows open side by side.
Common Issues Found During Endurance Tests
The issues endurance tests catch tend to fall into three buckets:
- Leaks. Memory, file descriptors, database connections, thread pools. These usually show as a metric that climbs steadily without stabilizing.
- Accumulation. Logs, caches, queues, temp files. Same shape — climbing — but the fix is usually rotation, eviction, or rate-limiting rather than freeing resources.
- Scheduled job interference. Nightly database vacuums, backups, or report jobs that coexist fine at 3 a.m. but collide with sustained traffic when they run during an endurance test.
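“A metric that climbs steadily without stabilizing” can be made precise with a simple least-squares slope over the sampled values. This is a sketch, not a substitute for eyeballing the chart — flat-but-noisy metrics produce a near-zero slope, while a leak produces a persistently positive one:

```python
def leak_slope(samples):
    """Least-squares slope of a metric over time.

    samples: list of (seconds_elapsed, metric_value) pairs.
    A slope persistently above zero, with no plateau, suggests a leak.
    """
    n = len(samples)
    mean_t = sum(t for t, _ in samples) / n
    mean_v = sum(v for _, v in samples) / n
    num = sum((t - mean_t) * (v - mean_v) for t, v in samples)
    den = sum((t - mean_t) ** 2 for t, _ in samples)
    return num / den

# Hypothetical memory readings (MB) sampled every 10 minutes.
climbing = [(i * 600, 500 + 2 * i) for i in range(12)]   # +2 MB per sample
flat = [(i * 600, 500 + (i % 2)) for i in range(12)]     # noise, no trend

print(round(leak_slope(climbing) * 3600, 2))  # 12.0 MB/hour -- a leak
print(round(leak_slope(flat) * 3600, 2))      # near zero -- healthy
```

A climbing slope tells you *that* something leaks or accumulates; heap dumps, descriptor listings, or query logs tell you *what*.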
In most cases, the fix is cheap once the issue is identified — a missing close(), a misconfigured cache size, an unrotated log. The value of endurance testing is in surfacing the issue before it takes down production, not in fixing anything exotic.
When to Run Endurance Tests
Endurance tests take longer than other performance tests, so most teams don’t run them on every build. Reasonable triggers:
- Before a major release, especially one that changes long-running behavior (new caching, new background jobs, changes to connection pooling, new memory-intensive features).
- Before a seasonal peak or extended campaign where traffic will stay elevated longer than usual.
- After a production incident that hinted at a slow-climb issue, to reproduce and verify the fix.
- On a regular cadence, like monthly or per-release, for critical production systems where quiet regressions are expensive.
If you’ve never run one, the first endurance test will likely find something interesting.
Conclusion
Endurance testing is the cheapest and safest way to catch problems that grow with time. A short load test tells you the system handles the traffic; an endurance test tells you that it keeps handling the traffic. Both are useful, and they answer different questions. If your performance program only runs short tests, you’ll miss an entire class of bugs that will eventually emerge in production instead.
Frequently Asked Questions
What is endurance testing?
Endurance testing (also called soak testing) applies steady, moderate load to a system over a long period — typically hours or days — to surface slow-burn problems like memory leaks, resource exhaustion, and log file growth that don’t show up in short tests. The goal is to confirm the system stays healthy under sustained use, not just under brief spikes.
What's the difference between endurance testing and soak testing?
They’re the same thing. “Endurance test” and “soak test” are used interchangeably in performance engineering to describe a long-running test at moderate load. Some teams also call this “stability testing.” The technique is identical regardless of the name.
How long should an endurance test run?
Long enough for the slow-burn issues to actually surface. A few hours is a reasonable minimum; many teams run overnight (8–12 hours) or across a full weekend. Memory leaks typically show up within a few hours, but log rotation problems, disk-fill issues, and scheduled-job backlog may only surface across days.
What load level should an endurance test use?
Moderate, not peak. Endurance tests use a sustained load that represents typical production traffic, or a little above it. The point isn’t to push the system to its limit — that’s stress testing — but to keep steady pressure on it long enough for gradual degradation to appear.
What should I monitor during an endurance test?
Memory usage, CPU, open file descriptors, database and HTTP connection pools, disk space, log file sizes, and queue depths. Watch the trend over time, not just the instantaneous value. A metric that climbs steadily without plateauing is usually a leak.
When should I run an endurance test?
Before any release that materially changes long-running behavior — new caching, new background jobs, new database pooling, or a new memory-intensive feature. Also before a major campaign or season where traffic will stay elevated longer than usual. For critical production systems, many teams run an endurance test every release as part of performance regression testing.
Related Guides
- Load Testing Guide — primer on load testing types and when to use each.
- Load Testing vs Stress Testing — the difference between the two disciplines.
- Load Testing Best Practices — getting good results from any performance test.
- Best Load Testing Tools — comparison of open source and commercial options.