Parallel Tests Without Waiting
Picture this: your team is waiting 40+ minutes for CI builds to complete. You've already set up parallel test execution with multiple workers, but there's still one stubborn bottleneck ruining everything.
The culprit? A few massive RSpec test files that can't be split across workers.
While 29 of your 30 test workers finish in 5 minutes, that one unlucky worker gets stuck processing giant spec files for another 35 minutes. Your entire pipeline grinds to a halt, waiting for the slowest worker to finish.
Sound familiar? You're not alone. This is one of the most frustrating CI bottlenecks in Ruby applications.
Let's explore how a smart solution, now available as the rspec-big-split gem, can cut your build times dramatically.
The Problem: When Parallel Execution Isn't Actually Parallel
Here's what's happening behind the scenes: your CI distributes test files evenly across workers, but it treats each file as an indivisible unit. When one worker is assigned several large files that each take many minutes to run, it becomes the bottleneck that determines your total build time.
This creates a long tail in your test suite: the total run time is dictated by the slowest worker, not by the average across all workers.
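To make the tail concrete, here is a minimal sketch that mirrors the scenario from the introduction. The per-file runtimes are made-up numbers chosen for illustration, and the helper names are hypothetical.

```ruby
# Each inner array holds the runtimes (in minutes) of the files given to one worker.
# All numbers are illustrative, not measurements.

# The build finishes only when the slowest worker finishes.
def wall_clock_minutes(worker_assignments)
  worker_assignments.map(&:sum).max
end

# Average load per worker, i.e. what a perfectly balanced split would approach.
def average_minutes(worker_assignments)
  totals = worker_assignments.map(&:sum)
  totals.sum.to_f / totals.size
end

# 29 workers with 5 minutes of small files each, one worker holding two giant files.
assignments = Array.new(29) { [1, 2, 2] } + [[20, 20]]

puts wall_clock_minutes(assignments)        # 40
puts average_minutes(assignments).round(2)  # 6.17
```

The gap between the two numbers is the time a better split could recover.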
Initial Approaches to Tackling This Issue
When faced with this problem, several solutions might come to mind:
- Splitting test files manually - Breaking large spec files into smaller, more focused files for each test scenario
- Knapsack PRO - A powerful, paid solution for test suite parallelisation
- CircleCI's built-in test splitting - A straightforward way to divide test files across parallel containers, for example by historical timing data
- Custom solution - Developing a bespoke approach tailored to your specific needs
It's this last path, the custom solution, that Rspec::Big::Split follows.
Introducing the Custom Splitter: Goals and Benefits
The motivation behind creating a custom splitter is clear, with several key goals in mind:
- Split big test files: The primary aim is to break down large files that take many minutes to run.
- Keep small files together if possible: Maintain efficiency by grouping smaller, faster-running tests.
- Control: Have granular control over what constitutes a "big" versus a "small" file.
Beyond these immediate goals, opting for a custom solution offers significant benefits:
- One-time cost: The implementation involves a one-off effort.
- No monthly payments: Avoid recurring subscription fees.
- No third-party dependencies: This significantly enhances reliability and ensures data/code privacy.
Rspec::Big::Split: Your Open-Source Custom Splitter
Rspec::Big::Split is a Ruby gem designed to address this bottleneck. Its stated purpose is to "Split one big RSpec test file into many smaller ones for parallel execution". It provides the tools you need to implement the custom splitting logic described above, and it is released under the MIT License, so you are free to use and adapt it.
Let's look at how you can integrate Rspec::Big::Split into your workflow:
Installation:
To get started, simply add the gem to your application's Gemfile by executing this command:
bundle add rspec-big-split
Mark Big Files for Splitting
Identify your large RSpec test files and add ci_split_example_group: true as metadata to their example groups. This metadata tells Rspec::Big::Split which parts of your test suite are candidates for splitting. For instance:
RSpec.describe Article, ci_split_example_group: true do
  # ... your tests here
end
Generate a Test Map
Before running your tests, you need to generate a map of all your tests. This is done by running RSpec in "dry-run" mode with the Rspec::Big::Split::Formatter:
bundle exec rspec --dry-run --format Rspec::Big::Split::Formatter --out tmp/rspec_splitter.json
This command creates a JSON file (tmp/rspec_splitter.json) containing the test information needed for splitting.
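Before wiring the map into CI, it can help to confirm that the file was written and is valid JSON. The sketch below deliberately assumes nothing about the schema the formatter emits; it only reports the top-level shape. The helper name summarize_json is hypothetical.

```ruby
require "json"

# Report the top-level shape of a JSON file without assuming its schema.
def summarize_json(path)
  data = JSON.parse(File.read(path))
  case data
  when Hash  then "Hash with keys: #{data.keys.join(', ')}"
  when Array then "Array with #{data.size} entries"
  else data.class.name
  end
end

path = "tmp/rspec_splitter.json"
puts summarize_json(path) if File.exist?(path)
```

If the file is missing or malformed, File.read or JSON.parse raises, which is usually what you want in a CI step.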
Define CI/CD Variables
To enable Rspec::Big::Split to distribute tests across your parallel workers, you must define two environment variables in your CI/CD configuration:
- TEST_NODE_TOTAL: The total number of test nodes/workers you have. For example, TEST_NODE_TOTAL=20.
- TEST_NODE_INDEX: The index of the current test node. Important: indexing must start at 1 and go up to TEST_NODE_TOTAL. If your CI starts indexing from 0, the optional --add-one-on-test-node-index flag shifts the index to start at 1.
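As one way to populate these variables, the sketch below derives them from CircleCI's CIRCLE_NODE_TOTAL and CIRCLE_NODE_INDEX, where the index is zero-based. The helper name test_node_env is hypothetical and other CI providers expose different variable names, so treat this as an illustration rather than the gem's own mechanism.

```ruby
# Map CircleCI's zero-based node variables to the one-based TEST_NODE_* pair.
# `env` is any Hash-like object responding to #fetch (ENV works too).
def test_node_env(env)
  total = Integer(env.fetch("CIRCLE_NODE_TOTAL", "1"))
  zero_based = Integer(env.fetch("CIRCLE_NODE_INDEX", "0"))

  unless (0...total).cover?(zero_based)
    raise ArgumentError, "node index #{zero_based} is outside 0...#{total}"
  end

  {
    "TEST_NODE_TOTAL" => total.to_s,
    "TEST_NODE_INDEX" => (zero_based + 1).to_s # shift to one-based indexing
  }
end

# The last of 20 CircleCI containers becomes node 20 of 20.
puts test_node_env("CIRCLE_NODE_TOTAL" => "20", "CIRCLE_NODE_INDEX" => "19").inspect
```

In a real pipeline you would export the two resulting values before invoking rspec-big-split.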
Run Tests on Each Node
Finally, execute your tests on each CI node using the rspec-big-split command to determine which tests should run on the current node:
bundle exec rspec $(bundle exec rspec-big-split tmp/rspec_splitter.json)
(Optionally, you can also print out the tests that will run on the current node using bundle exec rspec-big-split tmp/rspec_splitter.json before the actual test run).
Flexible Splitting Options:
By default, rspec-big-split distributes tests across workers by file. If you need finer-grained control, you can split by individual examples instead using the --split-by-example option.
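To see why finer granularity helps, here is a small sketch that balances work units across workers with a greedy "longest first" strategy. This is not necessarily the algorithm the gem uses; the timings, the file names, and the helpers distribute and slowest_load are all assumptions made for illustration.

```ruby
# Greedy assignment: take units from longest to shortest and give each one
# to the worker with the smallest load so far.
def distribute(units, worker_count)
  workers = Array.new(worker_count) { { load: 0.0, units: [] } }
  units.sort_by { |_, seconds| -seconds }.each do |name, seconds|
    target = workers.min_by { |worker| worker[:load] }
    target[:units] << name
    target[:load] += seconds
  end
  workers
end

# The slowest worker determines the overall run time.
def slowest_load(workers)
  workers.map { |worker| worker[:load] }.max
end

# Hypothetical timings in seconds, with whole files as the unit of work.
by_file = { "spec/big_spec.rb" => 600, "spec/a_spec.rb" => 60, "spec/b_spec.rb" => 60 }

# The same suite with the big file broken into six 100-second pieces.
by_piece = { "spec/a_spec.rb" => 60, "spec/b_spec.rb" => 60 }
6.times { |i| by_piece["spec/big_spec.rb[1:#{i + 1}]"] = 100 }

puts slowest_load(distribute(by_file, 3))   # 600.0
puts slowest_load(distribute(by_piece, 3))  # 260.0
```

With whole files, no assignment can beat the 600-second file; once it is divisible, the same three workers finish in well under half the time.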
The End Result: Faster and More Predictable CI Builds
By implementing a solution like Rspec::Big::Split, you directly tackle the uneven distribution of test workload caused by large spec files. This leads to more balanced parallel execution, so that no single worker becomes a bottleneck. The result is a shorter overall test suite run on your CI and a faster development feedback loop. To give a sense of how much test optimisation can matter: in another context, also related to optimising Ruby tests, test time was reduced from 450 seconds to 15 seconds.
If you're battling slow CI builds because of monolithic _spec.rb files, Rspec::Big::Split offers a robust, customisable, and cost-effective solution.