Software Testing

In our quest to construct high-quality software, testing stands as the most popular and essential quality assurance activity. While other techniques like static analysis, model checking, and code reviews are valuable, testing is often the primary pillar of industry-standard quality assurance.

Test Classifications

Regression Testing

As software evolves, we must ensure that new features don’t inadvertently break existing functionality. This is the purpose of regression testing: re-running previously executed test cases after every change. In a modern agile environment, these tests are usually automated within a Continuous Integration (CI) pipeline, so they run every time the code changes.
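As a minimal sketch of what such a suite might contain, the example below pins a few behaviors of a hypothetical slugify function (the function, its cases, and the test name are illustrative assumptions, not part of this chapter's material). A CI job would re-run the test on every change.

```python
def slugify(title: str) -> str:
    """Hypothetical function under test: turn a title into a URL slug."""
    words = title.lower().split()
    return "-".join(words)


# Each case pins behavior that already worked (or a bug that was already fixed).
# Re-running all of them on every change is what makes this a regression suite.
REGRESSION_CASES = [
    ("Hello World", "hello-world"),
    ("  leading and trailing  ", "leading-and-trailing"),
    ("Already-Slugged", "already-slugged"),
]


def test_slugify_regressions():
    for title, expected in REGRESSION_CASES:
        assert slugify(title) == expected, f"regression for {title!r}"
```

When a bug is fixed, adding its triggering input to the case list is one common way to keep the fix from silently reverting.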

Black-Box and White-Box

When we design tests, we usually adopt one of two mindsets. Black-box testing treats the system as a “black box” where the internal workings are invisible; tests are derived strictly from the requirements or specification to ensure they don’t overfit the implementation. In contrast, white-box testing requires the tester to be aware of the inner workings of the code, deriving tests directly from the implementation to ensure high code coverage.
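The contrast can be sketched on a small example. The function shipping_fee, its one-sentence spec, and the test names below are hypothetical. The first two tests are derived only from the spec; the third was written after reading the code, to cover a defensive branch the spec never mentions.

```python
def shipping_fee(order_total_cents: int) -> int:
    """Hypothetical spec: orders of 5000 cents or more ship free; otherwise 499."""
    if order_total_cents < 0:
        # Defensive branch that the spec sentence does not mention.
        raise ValueError("order total cannot be negative")
    if order_total_cents >= 5000:
        return 0
    return 499


# Black-box: derived only from the spec sentence above.
def test_orders_below_threshold_pay_flat_fee():
    assert shipping_fee(4999) == 499


def test_orders_at_threshold_ship_free():
    assert shipping_fee(5000) == 0


# White-box: written after reading the code, to cover the defensive branch.
def test_negative_total_is_rejected():
    try:
        shipping_fee(-1)
    except ValueError:
        return
    raise AssertionError("expected ValueError for a negative total")
```

The black-box tests would survive any rewrite that keeps the spec true, while the white-box test exists only because the implementation contains that extra branch.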

The Testing Pyramid: Levels of Execution

A robust testing strategy requires a mix of tests at different levels of abstraction.

These levels include:

Unit Testing: The execution of a complete class, routine, or small program in isolation.
Component Testing: The execution of a class, package, or larger program element, often still in isolation.
Integration Testing: The combined execution of multiple classes or packages to ensure they work correctly in collaboration.
System Testing: The execution of the software in its final configuration, including all hardware and external software integrations.
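To make the difference between the lower levels concrete, here is a small sketch using Python's standard sqlite3 module. The normalize_email function and the UserStore class are hypothetical names invented for this illustration. The first test exercises one routine in isolation; the second runs the routine, the store, and a real in-memory database together.

```python
import sqlite3


def normalize_email(raw: str) -> str:
    """Hypothetical unit: pure logic with no collaborators."""
    return raw.strip().lower()


class UserStore:
    """Hypothetical collaborator that persists users in SQLite."""

    def __init__(self, conn: sqlite3.Connection) -> None:
        self.conn = conn
        self.conn.execute("CREATE TABLE IF NOT EXISTS users (email TEXT PRIMARY KEY)")

    def add(self, raw_email: str) -> None:
        self.conn.execute(
            "INSERT INTO users (email) VALUES (?)", (normalize_email(raw_email),)
        )
        self.conn.commit()

    def exists(self, raw_email: str) -> bool:
        row = self.conn.execute(
            "SELECT 1 FROM users WHERE email = ?", (normalize_email(raw_email),)
        ).fetchone()
        return row is not None


# Unit level: one routine in isolation.
def test_normalize_email_strips_and_lowercases():
    assert normalize_email("  Ada@Example.COM ") == "ada@example.com"


# Integration level: the store, the normalizer, and a real SQLite engine together.
def test_added_user_can_be_found_again():
    conn = sqlite3.connect(":memory:")
    try:
        store = UserStore(conn)
        store.add("Ada@Example.com")
        assert store.exists("ada@example.com")
        assert not store.exists("bob@example.com")
    finally:
        conn.close()
```

A system test of the same feature would instead run the deployed application in its final configuration, which is slower and harder to debug but also exercises configuration and external integrations.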

Interactive Tutorials

Three browser-based tutorials let you practice these ideas on live code:

Testing Foundations — assertions, equivalence partitions, boundary values, oracle strength, and testing behavior rather than implementation.
TDD — Red-Green-Refactor with pytest, katas, and AI-assisted TDD. Builds on Testing Foundations.
Test Doubles — stubs, spies, mocks, fakes, the unittest.mock API, the “patch where the SUT looks the name up” pitfall, and when not to reach for a double. Builds on Testing Foundations and TDD.
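As a taste of the test-double material, the sketch below uses unittest.mock.Mock from the standard library in place of a real mail client. The send_welcome function and its mailer parameter are hypothetical and are not taken from the tutorials themselves.

```python
from unittest.mock import Mock


def send_welcome(user_email: str, mailer) -> bool:
    """Hypothetical SUT: sends a welcome message through an injected mailer."""
    if "@" not in user_email:
        return False
    mailer.send(to=user_email, subject="Welcome")
    return True


def test_welcome_is_sent_to_valid_address():
    mailer = Mock()  # stands in for a real mail client
    assert send_welcome("ada@example.com", mailer) is True
    # Mock-style verification: check the interaction the SUT performed.
    mailer.send.assert_called_once_with(to="ada@example.com", subject="Welcome")


def test_invalid_address_sends_nothing():
    mailer = Mock()
    assert send_welcome("not-an-email", mailer) is False
    mailer.send.assert_not_called()
```

Because the mailer is passed in as an argument, no patching is needed here; the patching pitfall listed above only arises when the SUT looks up its collaborator by name at module level.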

Test Quality and Test Design

Before choosing a tool or chasing a coverage number, ask whether the tests are good evidence. The new pages in this chapter separate two questions:

Test Quality explains how to evaluate a whole suite: oracle strength, fault-revealing power, coverage limits, mutation testing, flakiness, and maintainability.
Writing Good Tests gives a practical recipe for individual tests: behavior-focused names, small fixtures, strong assertions, systematic input selection, deterministic execution, and TDD as a rhythm of small verified steps.
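Oracle strength and systematic input selection can be illustrated with a short sketch. The top_scores function and the test names are hypothetical. The first test has a weak oracle that many wrong implementations would still pass, the second pins the exact result, and the third picks boundary inputs deliberately.

```python
def top_scores(scores: list[int], n: int) -> list[int]:
    """Hypothetical function: the n highest scores, highest first."""
    return sorted(scores, reverse=True)[:n]


# Weak oracle: passes for almost any implementation that returns n items.
def test_top_scores_weak():
    result = top_scores([3, 9, 5], 2)
    assert len(result) == 2


# Strong oracle: pins the exact values and their order.
def test_top_scores_returns_highest_first():
    assert top_scores([3, 9, 5], 2) == [9, 5]


# Boundary inputs chosen systematically: n = 0, n larger than the input, empty input.
def test_top_scores_boundaries():
    assert top_scores([3, 9, 5], 0) == []
    assert top_scores([3, 9, 5], 10) == [9, 5, 3]
    assert top_scores([], 3) == []
```

An implementation that returned the lowest scores, or the right scores in the wrong order, would pass the weak test but fail the strong one, which is the gap that mutation testing is designed to expose.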

Testability

Practice

Testing Foundations

Retrieval practice for the core vocabulary of software testing — regression, black-box vs. white-box, and the testing pyramid (unit, component, integration, system). Cards span Remember through Evaluate; scenario-based wherever possible.

Difficulty: Intermediate

What is regression testing, and why does it matter in CI?

Difficulty: Intermediate

What is the difference between black-box and white-box testing?

Difficulty: Advanced

A teammate proposes deleting all white-box tests in favor of black-box tests, saying ‘we should only test the spec’. Critique this proposal.

Difficulty: Intermediate

Name the four levels of the testing pyramid from smallest to largest.

Difficulty: Intermediate

A team has 500 unit tests and 0 integration or system tests. They report production bugs where ‘all the units passed but they didn’t work together’. Diagnose and fix.

Difficulty: Intermediate

Translate into the pyramid: ‘A test starts the full web server, opens a real browser, logs in, navigates to checkout, and clicks Buy.’ Which level, and what does it cost/buy you?

Difficulty: Advanced

Quantify why a regression caught in CI is cheaper than the same regression caught in production.

Difficulty: Advanced

Give a three-question heuristic for deciding which pyramid level a new test belongs at.

Testing Foundations Quiz

Apply, Analyze, and Evaluate-level questions on the core vocabulary of testing — regression, black-box vs. white-box, and choosing the right level of the testing pyramid.

Difficulty: Intermediate

A team disables their regression suite for two months ‘because it’s flaky and slow’, planning to fix it later. After two months, a major feature ships with three regressions in unrelated areas. What is the most accurate diagnosis?

Three unrelated regressions surfacing right after the suite went dark is the exact pattern the suite exists to catch, not coincidental variance. The cost-of-change curve makes late discovery the expensive outcome, not a wash against the suite’s runtime.

Unit tests on the new feature cover the new feature. Regression testing’s job is the breakage outside the area being edited — module A’s change silently breaking module B.

Regression suites can’t prove every regression is caught, but in practice they catch a large fraction of cross-area breakage. “It wouldn’t have caught them” assumes the worst case to justify removing the safety net.

Correct Answer:

Difficulty: Intermediate

You are testing a new discount(cart, customer) function. You write two tests:

Test A (black-box): assert discount(cart_with_100_dollars(), premium()) == 10_00

Test B (white-box): assert discount._tier_lookup_table["premium"] == 0.10

Which test is more likely to survive a refactoring that preserves user-visible behavior, and what does that tell you about how to choose between black-box and white-box tests?

Pinning the implementation is precisely what makes Test B brittle. Renaming _tier_lookup_table, swapping it for a rule engine, or moving the lookup to config all break it while the user still sees a 10% discount — a precise signal about the wrong thing.

They look alike but couple to different things. The black-box test breaks only when premium customers stop getting their discount; the white-box one breaks on internal renames. That gap is the whole point.

The black-box test survives any refactoring that preserves “premium → 10% off $100 = $10”. Calling both equally brittle treats coupling-to-spec and coupling-to-implementation as the same risk.

Correct Answer:

Difficulty: Intermediate

You are about to test the behavior: ‘when a user clicks “Save” in the profile editor, their changes persist and show up on next page load.’ Which level of the testing pyramid is the natural primary home for this test?

Mocking the database stubs out the very thing under test — does the data actually persist? A unit test on save_profile can check input validation or business logic, but a mock cannot confirm a real round-trip to storage.

A browser test verifies this too, but at higher cost — slower, flakier, harder to debug. Integration sits at the right level: it exercises the real persistence layer without driving a browser.

Persistence is a behavior the framework participates in, not one it lets you skip verifying. Misconfigured transactions, wrong boundaries, and migration drift all produce real persistence bugs in code that uses a well-tested ORM.

Correct Answer:

Difficulty: Advanced

A team’s test breakdown is: 5 unit tests, 2 integration tests, 250 system (end-to-end) tests. CI takes 90 minutes; flake rate is 12%. What test-pyramid concept is being violated, and what’s the structural fix?

Realism is genuine, but so is the cost — slow, flaky, hard to debug. The pyramid is a budget: many cheap fast tests, few expensive slow ones, because total feedback time and total flake rate both compound.

More system tests push runtime and flake rate higher, making CI more painful. The diagnosis points the opposite way — move behavior coverage down to faster, cheaper levels.

Unit tests pin contract behavior and integration/system tests pin deployment behavior; both are needed. Deleting the unit layer removes the fastest, most diagnostic tests while leaving the slow layer untouched.

Correct Answer:

Difficulty: Advanced

A reviewer says: ‘White-box testing is just an outdated form of testing — the only modern style is black-box.’ Which of the following are valid counter-arguments? (Select all that apply.)

This is a valid counter the answer should include: white-box tests reach risks the public spec never names, such as defensive paths and edge-case branches.

Worth selecting: coverage is itself a white-box signal, showing which code the black-box suite hasn’t exercised. It doesn’t prove correctness, but it stays useful as navigation.

A valid counter to include: some failures live in implementation choices the spec is silent on (a race in a private cache), and a white-box test can target that risk directly.

Property-based testing varies inputs at the spec boundary; it does not reach private paths the spec never mentions. The two operate at different layers, so one cannot make the other obsolete.

Correct Answers:

Difficulty: Advanced

A team adds ‘CI must pass’ as a release gate. Within a month, the gate is bypassed for ‘urgent fixes’ every other week. A retrospective reveals that CI takes 45 minutes and fails 1 run in 8 due to flake. Which two-part fix would restore the gate’s value?

Removing the gate concedes the goal — preventing broken code from shipping. The right move is to remove the friction (slowness, flakiness) that made the gate impractical, not the gate itself.

A 50% pass requirement removes the gate’s predictive power. Half the failing checks are now allowed; the cost-of-change curve reasserts itself and regressions ship through the holes.

Automatic retries paper over flake without fixing it, and they teach the team that a red test means ‘rerun and hope’. They make the suite less trustworthy over time, not more.

Correct Answer:
