Software Testing
In our quest to construct high-quality software, testing stands as the most popular and essential quality assurance activity. While other techniques like static analysis, model checking, and code reviews are valuable, testing is often the primary pillar of industry-standard quality assurance.
Test Classifications
Regression Testing
As software evolves, we must ensure that new features don’t inadvertently break existing functionality. This is the purpose of regression testing: re-running previously executed test cases to confirm that behavior which used to work still works. In a modern agile environment, these tests are usually automated in a Continuous Integration (CI) pipeline that runs them on every code change.
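As a minimal sketch, assume a hypothetical parse_price function that once raised ValueError on input with surrounding whitespace. After the fix, a regression test pins the corrected behavior so that any later change reintroducing the bug fails in CI. The function, the past bug, and the test names are all invented for illustration.

```python
# Hypothetical function under test; the name and the past bug are illustrative.
def parse_price(text: str) -> int:
    """Parse a price like '$12.50' into integer cents."""
    cleaned = text.strip().lstrip("$")  # the fix: strip whitespace before parsing
    dollars, _, cents = cleaned.partition(".")
    # Pad or truncate the fractional part to exactly two digits.
    return int(dollars) * 100 + int(cents.ljust(2, "0")[:2])


def test_parse_price_basic():
    assert parse_price("$12.50") == 1250


def test_regression_whitespace_around_price():
    # Regression test: pins the (hypothetical) previously fixed bug
    # where " $3.05 " raised ValueError instead of parsing.
    assert parse_price(" $3.05 ") == 305


if __name__ == "__main__":
    test_parse_price_basic()
    test_regression_whitespace_around_price()
    print("all regression tests passed")
```

If this code lives in a file such as test_prices.py, pytest collects both functions automatically because their names start with test_, so the same command can run unchanged on every push.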
Black-Box and White-Box
When we design tests, we usually adopt one of two mindsets. Black-box testing treats the system as a “black box” where the internal workings are invisible; tests are derived strictly from the requirements or specification to ensure they don’t overfit the implementation. In contrast, white-box testing requires the tester to be aware of the inner workings of the code, deriving tests directly from the implementation to ensure high code coverage.
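To make the contrast concrete, here is a sketch built around a hypothetical specification: "orders of $50.00 or more ship free; otherwise shipping costs $4.99." The black-box tests come only from that sentence and probe both sides of the boundary. The white-box test is added after reading the code and noticing a validation branch that the spec never mentions. The function name and amounts are illustrative assumptions.

```python
# Hypothetical spec: "Orders of $50.00 or more ship free; otherwise shipping costs $4.99."
# Amounts are integer cents to avoid floating-point rounding.
def shipping_fee_cents(order_total_cents: int) -> int:
    if order_total_cents < 0:  # validation branch that the spec never mentions
        raise ValueError("order total cannot be negative")
    if order_total_cents >= 5000:
        return 0
    return 499


# Black-box tests: derived only from the spec, probing both sides of the $50.00 boundary.
def test_black_box_just_below_threshold_pays_shipping():
    assert shipping_fee_cents(4999) == 499


def test_black_box_at_threshold_ships_free():
    assert shipping_fee_cents(5000) == 0


# White-box test: added after reading the code, to cover the validation branch.
def test_white_box_negative_total_branch():
    try:
        shipping_fee_cents(-1)
    except ValueError:
        return  # expected path
    raise AssertionError("expected ValueError for a negative total")


if __name__ == "__main__":
    test_black_box_just_below_threshold_pays_shipping()
    test_black_box_at_threshold_ships_free()
    test_white_box_negative_total_branch()
    print("all tests passed")
```

Neither set is complete on its own. The black-box tests would survive a rewrite of the function, while the white-box test reaches a branch that the spec gives no reason to probe.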
The Testing Pyramid: Levels of Execution
A robust testing strategy requires a mix of tests at different levels of abstraction.
These levels include:
- Unit Testing: The execution of a complete class, routine, or small program in isolation.
- Component Testing: The execution of a class, package, or larger program element, often still in isolation.
- Integration Testing: The combined execution of multiple classes or packages to ensure they work correctly in collaboration.
- System Testing: The execution of the software in its final configuration, including all hardware and external software integrations.
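The difference between the unit and integration levels is easiest to see when one class is tested both ways. The sketch below is hypothetical: Checkout, SqliteTaxRepository, StubTaxRepository, the region name, and the rate are invented for illustration. The unit test isolates Checkout behind a hand-written stub. The integration test wires Checkout to a real in-memory SQLite database through Python's standard sqlite3 module. A system test would go further and drive the deployed application end to end, which does not fit in a short example.

```python
import sqlite3


class SqliteTaxRepository:
    """Looks up tax rates (in basis points) stored in a SQLite table."""

    def __init__(self, conn: sqlite3.Connection) -> None:
        self.conn = conn

    def rate_bps(self, region: str) -> int:
        row = self.conn.execute(
            "SELECT rate_bps FROM tax_rates WHERE region = ?", (region,)
        ).fetchone()
        if row is None:
            raise KeyError(region)
        return row[0]


class Checkout:
    """Computes an order total with tax; works with any object that has rate_bps()."""

    def __init__(self, tax_repo) -> None:
        self.tax_repo = tax_repo

    def total_cents(self, subtotal_cents: int, region: str) -> int:
        # Tax is rounded down to whole cents (a simplifying assumption).
        tax = subtotal_cents * self.tax_repo.rate_bps(region) // 10_000
        return subtotal_cents + tax


class StubTaxRepository:
    """Test stub: returns canned rates with no database involved."""

    def __init__(self, rates: dict) -> None:
        self.rates = rates

    def rate_bps(self, region: str) -> int:
        return self.rates[region]


def test_unit_checkout_adds_tax():
    # Unit level: Checkout runs in isolation; the stub replaces the database.
    checkout = Checkout(StubTaxRepository({"north": 800}))
    assert checkout.total_cents(10_000, "north") == 10_800


def test_integration_checkout_with_sqlite():
    # Integration level: Checkout and the real SQLite-backed repository run together.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE tax_rates (region TEXT PRIMARY KEY, rate_bps INTEGER)")
    conn.execute("INSERT INTO tax_rates VALUES ('north', 800)")
    checkout = Checkout(SqliteTaxRepository(conn))
    assert checkout.total_cents(10_000, "north") == 10_800
    conn.close()


if __name__ == "__main__":
    test_unit_checkout_adds_tax()
    test_integration_checkout_with_sqlite()
    print("unit and integration tests passed")
```

If the unit test passes but the integration test fails, the bug lies in the collaboration (for example, a wrong SQL query or column type), not in the arithmetic inside Checkout.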
Interactive Tutorials
Three browser-based tutorials let you practice these ideas on live code:
- Testing Foundations — assertions, equivalence partitions, boundary values, oracle strength, and testing behavior rather than implementation.
- TDD — Red-Green-Refactor with pytest, katas, and AI-assisted TDD. Builds on Testing Foundations.
- Test Doubles — stubs, spies, mocks, fakes, the unittest.mock API, the “patch where the SUT looks the name up” pitfall, and when not to reach for a double. Builds on Testing Foundations and TDD.
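The “patch where the SUT looks the name up” pitfall is easy to reproduce with the standard library's unittest.mock.patch. In this sketch the modules demo_config and demo_pricing are hypothetical. They are built in memory with types.ModuleType and exec only so that the example fits in one file. In a real project they would be two .py files, with demo_pricing doing from demo_config import region.

```python
import sys
import types
from unittest import mock

# Hypothetical module that defines the collaborator.
demo_config = types.ModuleType("demo_config")
exec("def region():\n    return 'EU'\n", demo_config.__dict__)
sys.modules["demo_config"] = demo_config

# Hypothetical SUT module: binds its own name `region` at import time.
demo_pricing = types.ModuleType("demo_pricing")
exec(
    "from demo_config import region\n"
    "def currency():\n"
    "    return 'EUR' if region() == 'EU' else 'USD'\n",
    demo_pricing.__dict__,
)
sys.modules["demo_pricing"] = demo_pricing


def currency_patching_definition_site() -> str:
    # Wrong target: replaces demo_config.region, but demo_pricing still holds
    # its own reference to the original function, so nothing changes.
    with mock.patch("demo_config.region", return_value="US"):
        return demo_pricing.currency()


def currency_patching_lookup_site() -> str:
    # Right target: replaces the name in the namespace where the SUT looks it up.
    with mock.patch("demo_pricing.region", return_value="US"):
        return demo_pricing.currency()


if __name__ == "__main__":
    print(currency_patching_definition_site())  # EUR: the patch missed
    print(currency_patching_lookup_site())      # USD: the patch took effect
```

The fix is to patch demo_pricing.region, the name that the code under test actually calls, rather than the original definition in demo_config.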
Test Quality and Test Design
Before choosing a tool or chasing a coverage number, ask whether the tests are good evidence that the code behaves as intended. Two pages in this chapter address separate questions:
- Test Quality explains how to evaluate a whole suite: oracle strength, fault-revealing power, coverage limits, mutation testing, flakiness, and maintainability.
- Writing Good Tests gives a practical recipe for individual tests: behavior-focused names, small fixtures, strong assertions, systematic input selection, deterministic execution, and TDD as a rhythm of small verified steps.
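Oracle strength, coverage limits, and mutation testing can all be shown in a few lines. In this sketch, clamp and clamp_mutant are hypothetical. The mutant is written by hand to mimic the kind of one-token change that a mutation-testing tool generates automatically. A suite "kills" a mutant if it fails on it. A surviving mutant points to a weak oracle.

```python
# Hypothetical function under test.
def clamp(value: int, low: int, high: int) -> int:
    """Return value limited to the closed range [low, high]."""
    if value < low:
        return low
    if value > high:
        return high
    return value


# Hand-written mutant: identical except the upper branch returns `value`
# instead of `high`, the kind of small change a mutation tool makes.
def clamp_mutant(value: int, low: int, high: int) -> int:
    if value < low:
        return low
    if value > high:
        return value  # mutated line
    return value


def weak_suite(f) -> bool:
    # Weak oracle: only checks that an int comes back, not which int.
    return isinstance(f(15, 0, 10), int)


def strong_suite(f) -> bool:
    # Strong oracle: checks exact results below, inside, above, and at both bounds.
    return (
        f(-5, 0, 10) == 0
        and f(5, 0, 10) == 5
        and f(15, 0, 10) == 10
        and f(0, 0, 10) == 0
        and f(10, 0, 10) == 10
    )


def mutant_survives(suite) -> bool:
    # A mutant survives when the suite passes on the original AND on the mutant.
    return suite(clamp) and suite(clamp_mutant)


if __name__ == "__main__":
    print("weak suite, mutant survives:", mutant_survives(weak_suite))      # True
    print("strong suite, mutant survives:", mutant_survives(strong_suite))  # False
```

Both suites execute the mutated line (the input 15 reaches the upper branch), so line coverage cannot tell them apart. Only the strong oracle notices the wrong result.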
Practice
Testing Foundations
Retrieval practice for the core vocabulary of software testing: regression, black-box vs. white-box, and the testing pyramid (unit, component, integration, system). Cards range from Remember-level to Evaluate-level questions and are scenario-based wherever possible.
What is regression testing, and why does it matter in CI?
What is the difference between black-box and white-box testing?
A teammate proposes deleting all white-box tests in favor of black-box tests, saying ‘we should only test the spec’. Critique this proposal.
Name the four levels of the testing pyramid from smallest to largest.
A team has 500 unit tests and 0 integration or system tests. They report production bugs where ‘all the units passed but they didn’t work together’. Diagnose and fix.
Translate into the pyramid: ‘A test starts the full web server, opens a real browser, logs in, navigates to checkout, and clicks Buy.’ Which level, and what does it cost/buy you?
Quantify why a regression caught in CI is cheaper than the same regression caught in production.
Give a three-question heuristic for deciding which pyramid level a new test belongs at.
Testing Foundations Quiz
Apply, Analyze, and Evaluate-level questions on the core vocabulary of testing — regression, black-box vs. white-box, and choosing the right level of the testing pyramid.
A team disables their regression suite for two months ‘because it’s flaky and slow’, planning to fix it later. After two months, a major feature ships with three regressions in unrelated areas. What is the most accurate diagnosis?
You are testing a new discount(cart, customer) function. You write two tests:
Test A (black-box): assert discount(cart_with_100_dollars(), premium()) == 10_00
Test B (white-box): assert discount._tier_lookup_table["premium"] == 0.10
Which test is more likely to survive a refactoring that preserves user-visible behavior, and what does that tell you about how to choose between black-box and white-box tests?
You are about to test the behavior: ‘when a user clicks “Save” in the profile editor, their changes persist and show up on next page load.’ Which level of the testing pyramid is the natural primary home for this test?
A team’s test breakdown is: 5 unit tests, 2 integration tests, 250 system (end-to-end) tests. CI takes 90 minutes; flake rate is 12%. What test-pyramid concept is being violated, and what’s the structural fix?
A reviewer says: ‘White-box testing is just an outdated form of testing — the only modern style is black-box.’ Which of the following are valid counter-arguments? (Select all that apply.)
A team adds ‘CI must pass’ as a release gate. Within a month, the gate is bypassed for ‘urgent fixes’ every other week. A retrospective reveals that CI takes 45 minutes and fails 1 run in 8 due to flake. Which two-part fix would restore the gate’s value?