Scaling Playwright to 1000 Nodes
A case study on reducing CI build times from 4 hours to 10 minutes using test sharding and fleets of transient, Kubernetes-managed Docker containers.
The Linear Time Problem
As our test suite grew past 5,000 end-to-end scenarios, execution time became a major blocker in the CI/CD pipeline. Running the tests sequentially would take many hours, and even with standard parallelization on a single large VM, we hit memory and CPU contention limits at around 50 workers.
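For reference, single-machine parallelism in Playwright is controlled by the worker count. A minimal sketch of the pre-sharding setup (the worker count below reflects the rough ceiling we observed, not a recommended value):
# Run the whole suite on one VM with a fixed number of parallel worker processes
# ~50 workers is roughly where memory and CPU contention started to bite for us
npx playwright test --workers=50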
Native Playwright Sharding
Playwright's built-in --shard flag is the secret weapon for massive scale. It allows you to split your test suite into disjoint sets that can be executed on completely different machines.
# Running shard 1 of 10
npx playwright test --shard=1/10
# Running shard 5 of 100
npx playwright test --shard=5/100
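In CI, the shard index is typically injected by the job matrix rather than hard-coded. A minimal sketch, assuming the runner exposes SHARD_INDEX and SHARD_TOTAL environment variables (illustrative names, not Playwright built-ins); the blob reporter writes a machine-readable report to ./blob-report that can be merged later for centralized reporting:
# Each CI job runs exactly one shard, identified by environment variables
npx playwright test --shard="${SHARD_INDEX}/${SHARD_TOTAL}" --reporter=blob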
Infrastructure Orchestration
To handle 1000 shards, we moved away from static runners to transient Docker containers managed by Kubernetes. Each shard runs in its own short-lived pod, which executes its slice of the tests, uploads the traces to S3, and terminates immediately. This "serverless" approach to test automation means we only pay for the compute we actually use.
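A hedged sketch of what each pod's entrypoint might look like; the bucket name, the BUILD_ID and SHARD_* variables, and the use of the AWS CLI are illustrative assumptions rather than our exact setup:
#!/usr/bin/env bash
set -euo pipefail

# Run this pod's slice of the suite; don't abort on test failures,
# otherwise traces for the failing tests never get uploaded
npx playwright test --shard="${SHARD_INDEX}/${SHARD_TOTAL}" --reporter=blob || true

# Ship traces and the blob report to S3 (recent Playwright versions name
# sharded blob reports uniquely, so a shared prefix should not collide)
aws s3 cp ./test-results "s3://example-e2e-artifacts/${BUILD_ID}/traces/shard-${SHARD_INDEX}" --recursive
aws s3 cp ./blob-report "s3://example-e2e-artifacts/${BUILD_ID}/blob-report" --recursive

# The script exits here, the container stops, and Kubernetes reclaims the resources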
Key Takeaways
Scaling to 1000 nodes isn't just about throwing hardware at the problem. It requires a robust architecture, centralized reporting, and deep knowledge of how Playwright handles test distribution.
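To make the centralized-reporting piece concrete: once every shard has uploaded its blob report, a single follow-up job can stitch them into one HTML report with Playwright's merge-reports command. The bucket and paths below are the same hypothetical ones used in the entrypoint sketch above:
# Pull every shard's blob report into one local directory
aws s3 cp "s3://example-e2e-artifacts/${BUILD_ID}/blob-report" ./all-blob-reports --recursive
# Merge them into a single HTML report covering the whole 1000-shard run
npx playwright merge-reports --reporter html ./all-blob-reports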