Test Doubles

Why test doubles exist

Imagine you push a PR on April 28 whose test asserts that the daily-event-day function returns True for "2026-04-28". CI is green. You sleep. The next morning, without anyone editing the code, CI turns red. The hidden collaborator was the wall clock: the test never really verified the function’s behavior; it verified that today happened to equal the hardcoded date.

That is the recurring problem test doubles exist to solve: a collaborator the test cannot control or observe makes the test flaky, slow, or unable to verify the right thing. Wall clocks, HTTP services, databases, message queues, payment gateways, email senders, random number generators — each one quietly turns a deterministic unit test into something else.

A test double is any object that stands in for a real dependency during a test. Borrowed from the film-industry stunt double, the metaphor is exact: the double looks like the real thing from the system’s perspective, but the test gets to choose what it does.

Two pieces of vocabulary from Meszaros that we use throughout this chapter:

SUT — System Under Test. The unit (function, class, or small group of collaborators) you actually want to verify.
DOC — Depended-On Component. A component the SUT calls into; replacing it with a test double is what lets the SUT be tested in isolation.

Four questions before you reach for a double

Before naming any specific kind of double, ask the four questions that decide which one fits. Every test double answers exactly one of these:

Question the test is asking	What the double provides	Typical role
“What should this collaborator return so I can drive the SUT down a specific branch?”	Control over indirect input	Stub
“Did the SUT actually call this collaborator, and with what arguments?”	Observation of indirect output	Spy
“Does the SUT follow the expected collaboration protocol — call this once, with these args, before that one?”	Verification of interaction	Mock Object
“I need a working-but-cheap replacement that behaves like the real collaborator across many calls.”	Substitution with simpler behavior	Fake

The first three are about what direction of data the test cares about — values flowing into the SUT (indirect input) versus actions flowing out of it (indirect output). Substitution (the fourth) is about how much state the test needs the collaborator to manage. Get the question right and the kind of double falls out.

The taxonomy — five named doubles, one umbrella

Gerard Meszaros’s canonical taxonomy in xUnit Test Patterns (Meszaros 2007) identifies five kinds of test double: Dummy, Fake, Stub, Spy, and Mock. The umbrella name Test Double covers all five; the five names below it are roles, each suited to a different test-design problem.

Figure: UML class diagram in which Dummy, Stub, Fake, Spy, and MockObject each extend TestDouble. The diagram annotates each class as follows:

Dummy: fills a parameter; never actually used
Stub: controls indirect inputs; feeds canned values INTO the SUT
Fake: working implementation with shortcuts unsuitable for production
Spy: records indirect outputs; verify AFTER execution
MockObject: expects indirect outputs; verify DURING execution

The three with the most subtle distinctions are Stub, Spy, and Mock — covered in depth below. Dummies (objects passed but never used — a parameter required by a signature you don’t care about) and Fakes (working implementations with shortcuts unsuitable for production — for example, an in-memory database) are simpler but worth knowing exist. The three core kinds differ along two axes: which direction of data flow they control (indirect input vs. indirect output) and when verification happens (after the fact vs. during execution).

Keep this map in mind as you read: each section below deepens one of its branches.

The verbatim teaching sentence

Before any code, lock in one sentence — it solves the single biggest source of confusion in Python testing:

Mock is a tool class; stub, spy, and mock are test-design roles. Same in Python, JavaScript, and Java — the role is what matters; the class name is just syntax.

Python’s unittest.mock.Mock is a configurable object that can play any of the three roles depending on what the test does with it. Setting mock.return_value = ... makes it a stub. Asserting mock.method.assert_called_once_with(...) makes it a spy. Conflating the class name “Mock” with the Meszaros role “Mock Object” is the most common reason people say “I added a mock” when they really mean “I added a stub.” The role is determined by what the test does with the object, not by which class instantiated it.
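To make the role-versus-class distinction concrete, here is a minimal sketch in which the same `Mock` class plays two different roles. The functions `greet` and `announce`, and the collaborator methods `get_name` and `send`, are hypothetical names invented for illustration.

```python
from unittest.mock import Mock


def greet(user_repo, user_id):
    """Hypothetical SUT: reads a name from its collaborator (indirect input)."""
    return f"Hello, {user_repo.get_name(user_id)}!"


def announce(notifier, text):
    """Hypothetical SUT: pushes a message to its collaborator (indirect output)."""
    notifier.send("sms", text)


# Stub role: the test only controls what the collaborator returns.
repo = Mock()
repo.get_name.return_value = "Ada"
greeting = greet(repo, "u1")
assert greeting == "Hello, Ada!"

# Spy role: the test inspects, after the fact, what the SUT sent.
notifier = Mock()
announce(notifier, "build finished")
notifier.send.assert_called_once_with("sms", "build finished")
```

Both doubles are instances of `Mock`; only what the test does with each one differs.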

Test Stub

A Test Stub (Meszaros 2007) is an object that replaces a real component so the test can control the indirect inputs of the SUT. Indirect inputs are the values returned to the SUT by another component whose services it uses — return values, output parameters, exceptions. By replacing the real DOC with a Test Stub, the test establishes a control point that forces the SUT down specific execution paths it might not otherwise take (the rare error branch, the timeout path, the empty-result case, the unreachable edge condition). During the test setup phase, the stub is configured to respond to calls from the SUT with highly specific values.

A hand-rolled stub in Python is just a class with a hard-coded method:

class FrozenClock:
    """A stub clock — always returns the datetime it was constructed with."""
    def __init__(self, fixed_dt):
        self._fixed_dt = fixed_dt

    def now(self):
        return self._fixed_dt

The framework-generated equivalent is one line:

from datetime import datetime
from unittest.mock import Mock
clock = Mock()
clock.now.return_value = datetime(2026, 4, 28, 12, 0)

Same role; less typing. While Test Stubs perfectly address the injection of inputs, they inherently ignore the indirect outputs of the SUT. To observe outputs, we must shift to a different class of test double.
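As a sketch of how either stub removes the wall clock from the opening scenario, assume the SUT accepts its clock as a parameter. The function `is_event_day` and the constant `EVENT_DAY` are hypothetical; `FrozenClock` is repeated here only so the block is self-contained.

```python
from datetime import date, datetime
from unittest.mock import Mock

EVENT_DAY = date(2026, 4, 28)  # hypothetical event date, for illustration only


def is_event_day(clock):
    """Hypothetical SUT: asks the injected clock for 'now', never the wall clock."""
    return clock.now().date() == EVENT_DAY


class FrozenClock:
    """A stub clock that always returns the datetime it was constructed with."""

    def __init__(self, fixed_dt):
        self._fixed_dt = fixed_dt

    def now(self):
        return self._fixed_dt


# The hand-rolled stub drives the SUT down both branches, on any calendar day.
on_the_day = is_event_day(FrozenClock(datetime(2026, 4, 28, 12, 0)))
day_after = is_event_day(FrozenClock(datetime(2026, 4, 29, 12, 0)))

# The framework-generated stub plays the same role.
mock_clock = Mock()
mock_clock.now.return_value = datetime(2026, 4, 28, 12, 0)
with_mock = is_event_day(mock_clock)

assert on_the_day is True
assert day_after is False
assert with_mock is True
```

The test now states the date it depends on, so it gives the same result tomorrow.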

Test Spy

When the behavior of the SUT includes actions that cannot be observed through its public interface — sending a message on a network channel, writing a record to a database, dispatching a push notification — we refer to these actions as indirect outputs. To verify these indirect outputs, we use a Test Spy (Meszaros 2007).

A Test Spy is a more capable version of a Test Stub that serves as an observation point by quietly recording all method calls made to it by the SUT during execution. Like a Test Stub, a Test Spy may need to provide values back to the SUT to allow execution to continue, but its defining characteristic is its ability to capture the SUT’s indirect outputs and save them for later verification by the test.

The use of a Test Spy facilitates a technique called procedural behavior verification. The testing lifecycle using a spy looks like this:

The test installs the Test Spy in place of the DOC.
The SUT is exercised.
The test retrieves the recorded information from the Test Spy (often via a Retrieval Interface).
The test uses standard assertion methods to compare the actual values passed to the spy against the expected values.
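The four steps above can be sketched with a hand-rolled spy. `NotifierSpy`, its `recorded_calls` list (acting as the retrieval interface), and the `welcome` function are hypothetical names chosen for this illustration.

```python
class NotifierSpy:
    """Hand-rolled spy: records every send() call for later inspection."""

    def __init__(self):
        self.recorded_calls = []  # retrieval interface read by the test

    def send(self, channel, body):
        self.recorded_calls.append((channel, body))


def welcome(notifier, user_name):
    """Hypothetical SUT whose only observable effect is an outbound call."""
    notifier.send("email", f"Welcome, {user_name}!")


# 1. Install the spy in place of the DOC.
spy = NotifierSpy()
# 2. Exercise the SUT.
welcome(spy, "Ada")
# 3. Retrieve the recorded information.
calls = spy.recorded_calls
# 4. Compare actual against expected with an ordinary assertion.
assert calls == [("email", "Welcome, Ada!")]
```

Nothing fails while the SUT runs; the spy only records, and the test decides afterwards what to check.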

A software engineer should reach for a Test Spy when the assertions should remain clearly visible within the test method itself, or when they cannot predict the values of all attributes of the SUT’s interactions ahead of time. Because a Test Spy does not fail the test at the first deviation from expected behavior, it allows tests to gather more execution data and include highly detailed diagnostic information in assertion failure messages.

The interesting test-design move with a spy is rarely writing it (a class with a list and an append call) — it is how much of each call to pin. Pinning too little produces a Liar test that always passes; pinning too much produces a brittle test that breaks under harmless refactors. The Goldilocks assertion pins exactly what the spec mandates, no more and no less.
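One way to express the three levels of pinning is with `unittest.mock.ANY`, which compares equal to any argument. The sketch below assumes a hypothetical spec that mandates the SMS channel but leaves the message wording free; `notify_premium` is an invented SUT.

```python
from unittest.mock import ANY, Mock


def notify_premium(notifier, user_name):
    """Hypothetical SUT: the assumed spec fixes the channel, not the wording."""
    notifier.send("sms", f"Hi {user_name}, your premium report is ready.")


notifier = Mock()
notify_premium(notifier, "Ada")

# Too little: still passes if the SUT used the wrong channel or sent twice.
notifier.send.assert_called()

# Too much: passes today, but breaks as soon as someone rewords the message.
notifier.send.assert_called_once_with(
    "sms", "Hi Ada, your premium report is ready."
)

# Just right under the assumed spec: one call, SMS channel, any body.
notifier.send.assert_called_once_with("sms", ANY)

sent_channel = notifier.send.call_args.args[0]
assert sent_channel == "sms"
```

Which of the three is correct depends entirely on what the spec promises; if the wording were part of the contract, the second assertion would be the right one.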

Mock Object

A Mock Object (Meszaros 2007), like a Test Spy, acts as an observation point to verify the indirect outputs of the SUT. However, a Mock Object operates using a fundamentally different paradigm known as expected behavior specification. Instead of waiting until after the SUT executes to verify the outputs procedurally, a Mock Object is configured before the SUT is exercised with the exact method calls and arguments it should expect to receive. The Mock Object essentially acts as an active verification engine during the execution phase. As the SUT executes and calls the Mock Object, the mock dynamically compares the actual arguments received against its programmed expectations. If an unexpected call occurs, or if the arguments do not match, the Mock Object fails the test immediately.
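A minimal hand-rolled sketch of expected behavior specification follows. `MockNotifier`, `expect_send`, and `verify` are hypothetical names, not the API of any library; note that Python's `unittest.mock.Mock` records calls and is checked afterwards, so used on its own it behaves more like a spy.

```python
class MockNotifier:
    """Hand-rolled Mock Object: expectations are set before the SUT runs."""

    def __init__(self):
        self._expected = []  # queue of (channel, body) calls still expected

    def expect_send(self, channel, body):
        self._expected.append((channel, body))

    def send(self, channel, body):
        # Verification happens DURING execution, at the offending call.
        if not self._expected:
            raise AssertionError(f"unexpected call: send({channel!r}, {body!r})")
        expected = self._expected.pop(0)
        if (channel, body) != expected:
            raise AssertionError(
                f"expected send{expected!r}, got send({channel!r}, {body!r})"
            )

    def verify(self):
        # Final check for expected calls that never arrived.
        if self._expected:
            raise AssertionError(f"missing calls: {self._expected!r}")


def welcome(notifier, user_name):
    """Hypothetical SUT."""
    notifier.send("email", f"Welcome, {user_name}!")


# Expectations first, then exercise, then a final verify().
mock = MockNotifier()
mock.expect_send("email", "Welcome, Ada!")
welcome(mock, "Ada")
mock.verify()

# A deviation fails inside the SUT's call, not in a later assertion.
strict = MockNotifier()
strict.expect_send("sms", "Welcome, Ada!")
try:
    welcome(strict, "Ada")
    failed_during_execution = False
except AssertionError:
    failed_during_execution = True

assert failed_during_execution
```

The stack trace of the failure points at the line in the SUT that made the wrong call, which is the main diagnostic benefit of this style.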

Fowler’s distinction between classical and mockist testing styles (Fowler 2007) maps onto this difference: classical tests prefer real collaborators and observe the SUT’s state; mockist tests specify the interactions between the SUT and its collaborators up front. Neither style is universally correct. Mocks fit best when the interaction is the contract — “the payment gateway must be charged exactly once for the order total” — and worst when they merely freeze the implementation’s current call shape.

Fake Object

A Fake Object (Meszaros 2007) is a working implementation of the same interface as the real DOC, but with shortcuts that make it unsuitable for production — no durability, no concurrency safety, no transactional guarantees, no remote calls. The canonical example is an in-memory repository standing in for a database-backed one:

class FakeUserRepository:
    """In-memory implementation of UserRepository — for tests only."""
    def __init__(self):
        self._users = {}

    def save(self, user):
        self._users[user.id] = user

    def find_by_id(self, user_id):
        return self._users.get(user_id)

A Fake earns its keep when the SUT round-trips with the collaborator across multiple calls — write a user, look it up, update its email, look it up again. Modeling that sequence with stubs would require coordinating multiple return_value mappings, each one fragile and easy to misalign. The Fake just stores and retrieves; the test reads as if it were running against the real repository.
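A short sketch of such a round trip is shown below. The `User` dataclass and the `change_email` function are assumptions made for illustration; the repository is the same in-memory Fake, repeated so the block runs on its own.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class User:
    """Hypothetical entity; frozen so updates must go through save()."""
    id: str
    email: str


class FakeUserRepository:
    """In-memory implementation of UserRepository, for tests only."""

    def __init__(self):
        self._users = {}

    def save(self, user):
        self._users[user.id] = user

    def find_by_id(self, user_id):
        return self._users.get(user_id)


def change_email(repo, user_id, new_email):
    """Hypothetical SUT: read, modify, write back."""
    user = repo.find_by_id(user_id)
    if user is None:
        raise LookupError(f"no such user: {user_id}")
    repo.save(replace(user, email=new_email))


repo = FakeUserRepository()
repo.save(User(id="u1", email="ada@example.com"))
before = repo.find_by_id("u1").email

change_email(repo, "u1", "ada@new.example")
after = repo.find_by_id("u1").email

assert before == "ada@example.com"
assert after == "ada@new.example"
```

No return values are scripted anywhere; the second lookup sees the update because the Fake actually stored it.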

The Fake’s recurring risk — drift, and the contract test that defends against it

Every Fake is a promise that it behaves enough like the real collaborator for the SUT’s tests to be meaningful. That promise can silently break the moment the real collaborator’s behavior diverges (a new uniqueness constraint, a different error class, a transactional rollback the Fake doesn’t simulate). The defense is a contract test — a single shared test that both the Fake and the real implementation must pass:

def user_repo_contract(repo):
    """Behavioral contract that BOTH FakeUserRepository and the real
    Postgres-backed UserRepository must satisfy."""
    user = User(id="u1", email="ada@example.com")
    repo.save(user)
    assert repo.find_by_id("u1") == user
    assert repo.find_by_id("does-not-exist") is None

Run that test against the Fake (fast, every commit) and against the real repository (slower, on a schedule). When the two diverge, the next contract run fails, and you find out from a test rather than from production.
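The sketch below runs one contract against two implementations. It makes a loud assumption: an in-memory SQLite repository (`SqliteUserRepository`, a hypothetical class) stands in for the Postgres-backed one so the example runs with only the standard library. The `User` dataclass is likewise assumed.

```python
import sqlite3
from dataclasses import dataclass


@dataclass(frozen=True)
class User:
    id: str
    email: str


class FakeUserRepository:
    """In-memory Fake, for tests only."""

    def __init__(self):
        self._users = {}

    def save(self, user):
        self._users[user.id] = user

    def find_by_id(self, user_id):
        return self._users.get(user_id)


class SqliteUserRepository:
    """ASSUMPTION: stands in for the real Postgres-backed repository."""

    def __init__(self):
        self._conn = sqlite3.connect(":memory:")
        self._conn.execute("CREATE TABLE users (id TEXT PRIMARY KEY, email TEXT)")

    def save(self, user):
        self._conn.execute(
            "INSERT OR REPLACE INTO users (id, email) VALUES (?, ?)",
            (user.id, user.email),
        )

    def find_by_id(self, user_id):
        row = self._conn.execute(
            "SELECT id, email FROM users WHERE id = ?", (user_id,)
        ).fetchone()
        return User(*row) if row else None


def user_repo_contract(repo):
    """Behavioral contract every UserRepository implementation must satisfy."""
    user = User(id="u1", email="ada@example.com")
    repo.save(user)
    assert repo.find_by_id("u1") == user
    assert repo.find_by_id("does-not-exist") is None


# The same contract runs against each implementation, each on a fresh instance.
checked = []
for make_repo in (FakeUserRepository, SqliteUserRepository):
    user_repo_contract(make_repo())
    checked.append(make_repo.__name__)
```

In a real suite the loop would typically become a parametrized test, with the slow implementation marked so it can run on its own schedule.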

Dummy Object

A Dummy Object (Meszaros 2007) is the lightest double — it fills a parameter slot but is never actually used by the SUT. Reach for it when the SUT’s signature requires a collaborator the particular test doesn’t care about (the SUT takes a logger but this test ignores logging; the constructor needs a notifier but this code path doesn’t notify). The minimum-viable-double rule says: start with a Dummy and escalate only when the test needs the double to do something.
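A minimal sketch of a Dummy, assuming a hypothetical `PriceCalculator` whose constructor demands a logger that the tested method never touches:

```python
class PriceCalculator:
    """Hypothetical SUT: the constructor requires a logger; total() never logs."""

    def __init__(self, logger):
        self._logger = logger

    def total(self, prices):
        return sum(prices)

    def audit(self, prices):
        # A different code path that does use the logger; not exercised here.
        self._logger.info("total=%s", self.total(prices))


# The Dummy only fills the parameter slot. A bare object() has no info()
# method, so any accidental use raises AttributeError instead of passing.
dummy_logger = object()

calculator = PriceCalculator(dummy_logger)
result = calculator.total([5, 10, 20])
assert result == 35
```

Using `object()` rather than `Mock()` is a design choice: a `Mock` would silently accept any call, hiding the fact that the "unused" collaborator was used after all.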

When NOT to use a double

A test double is a tool you reach for when a real collaborator would make the test flaky, slow, or unable to verify the right thing. It is not a default. It is not a sign of professionalism. It is not a coverage strategy. The right number of doubles for many tests is zero.

A useful heuristic from (Fowler 2007) and the empirical mocking literature: use a real collaborator when it is fast, deterministic, locally available, and free of dangerous side effects. Reach for a double when the collaboration is awkward — slow, nondeterministic, expensive, dangerous, or unable to be put into the state the test needs.

Three antipatterns to recognize on sight:

Antipattern	Symptom	Why it happens	Fix
Over-mocking	Every internal helper is mocked; the test asserts only on the mocks.	“Isolation feels safe; more mocks = more tested.”	Mock at the architectural boundary (HTTP, DB, clock), not at every internal function.
Mocking what you don’t own	A third-party library’s API is mocked directly, scattered across many tests.	The library is brittle and the team doesn’t want to wait for real responses.	Wrap the third-party in your own thin Adapter class; double the Adapter. The third-party’s internals stay invisible to your tests.
Coverage chasing	Every line of the SUT runs in some test, but assertions are weak or mocked-on-mocks.	Coverage is misread as a quality signal.	Stronger oracles, real collaborators where possible, fewer tests that test more meaningfully. Coverage is not correctness.
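The Adapter fix for "mocking what you don’t own" can be sketched as follows. Everything named here is hypothetical: the `WeatherAdapter` class, the URL layout, the `temp_c` JSON field, and the `clothing_advice` SUT. The standard library's `urllib` stands in for whatever third-party HTTP client the team actually uses.

```python
import json
from urllib.request import urlopen


class WeatherAdapter:
    """Thin wrapper the team owns; the only code that touches the HTTP library."""

    def __init__(self, base_url):
        self._base_url = base_url

    def current_temp_c(self, city):
        # Hypothetical endpoint and response shape, for illustration only.
        with urlopen(f"{self._base_url}/current?city={city}") as resp:
            return json.load(resp)["temp_c"]


def clothing_advice(weather, city):
    """Hypothetical SUT: depends on the adapter's interface, not on HTTP."""
    return "coat" if weather.current_temp_c(city) < 10 else "t-shirt"


class StubWeather:
    """Double of the adapter; no HTTP details leak into the test."""

    def __init__(self, temp_c):
        self._temp_c = temp_c

    def current_temp_c(self, city):
        return self._temp_c


cold = clothing_advice(StubWeather(2), "Oslo")
warm = clothing_advice(StubWeather(25), "Lisbon")

assert cold == "coat"
assert warm == "t-shirt"
```

If the HTTP library changes, only `WeatherAdapter` and its own boundary-level test need updating; tests of `clothing_advice` are unaffected.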

A small decision rubric

If the SUT…	Reach for…
…is a pure function — same input always yields same output, no collaborators	No double
…calls a clock, a remote service, or any non-deterministic source	Stub
…needs to verify a fire-and-forget outbound call (e.g., `notifier.send(...)`)	Spy or Mock
…needs to round-trip with a stateful collaborator (write then read)	Fake
…calls a third-party library you don’t own	Adapter wrapper → double the adapter
…is just simple math, string, or list manipulation	No double (don’t make work)
…already uses a fake or adapter, and you need confidence it still matches the real collaborator	Contract / integration check against the real boundary

Test-double smells

Real codebases are full of tests that look productive but verify almost nothing. Naming the smells trains the eye to spot them in code review.

Smell	What it looks like	Why it hurts
The Mockery	A test with so many mocks that nearly every line of the SUT is replaced.	The test verifies orchestration, not behavior; pure refactors break it.
Counting on Spies	The test pins `assert_called_once_with(...)` after every internal call.	Couples the test to the SUT’s call sequence; refactoring becomes brittle.
Unnecessary Stubs	Stubs configured for calls the SUT does not make in this path.	Adds maintenance burden; misleads readers about what the test exercises.
Mystery Guest	The test reads from an external file, fixture, or database not visible in the test method.	Reader cannot tell from the test alone what was set up or why.
Eager Test	A single test exercises many behaviors of the SUT at once.	When it fails, the failure does not localize which behavior broke.
Assertion Roulette	Many unexplained assertions in one test, none with messages.	A failure tells you the test broke; figuring out which assertion requires reading the code.

What a doubled test does not prove

Every test double trades reality for control. That is usually the right trade in a unit test, but it leaves a gap: a stub might not match the real API, a fake might drift from the real database, an adapter mock cannot prove the third-party service still accepts your actual request. A professional test plan says all three parts out loud:

This unit test proves: the SUT behaves correctly given a controlled collaborator.
This unit test does not prove: the real collaborator still speaks the same contract.
Complementary check: a contract test, sandbox integration test, or adapter-level test that exercises the real boundary at lower frequency.

Apply what you’ve read

Build the skill in the Test Doubles Tutorial, which takes you through six steps in a Python sandbox: introducing a seam, hand-rolling a stub, hand-rolling a spy, recognizing the same roles inside unittest.mock, navigating the “patch where the SUT looks up the name” pitfall, and deciding when not to use a double at all.

Practice

Test Doubles

Retrieval practice for the test-double taxonomy — SUT, DOC, indirect inputs vs outputs, the five kinds of double (Dummy, Fake, Stub, Spy, Mock), procedural vs expected-behavior verification, and how to choose. Cards span Remember through Evaluate.

Difficulty: Basic

Define SUT and DOC, and why the distinction matters.

Difficulty: Basic

Difference between an indirect input to the SUT and an indirect output from the SUT? One example each.

Difficulty: Intermediate

Name all five kinds of test double in the standard taxonomy and what each one is for.

Difficulty: Intermediate

You need to drive the SUT down its error-handling branch — the one where the payment gateway returns Status.TIMEOUT. Which double, and why?

Difficulty: Intermediate

Compare Spy and Mock: when does failure occur, and what style of test does each produce?

Difficulty: Advanced

What is a Fake? Canonical example? How is it different from a Stub?

Difficulty: Advanced

A junior engineer asserts mock.method.assert_called_once_with(...) after every line of the SUT’s body. Diagnose.

Difficulty: Advanced

Your SUT calls notifier.send(channel, body) four times in a single workflow, in a data-dependent order. You want to assert each call had the right channel but can’t predict the order. Which double fits best?

Difficulty: Advanced

Pick a double for: ‘My SUT’s constructor requires a loader, but this behavior never calls loader.load_config().’

Difficulty: Advanced

Sketch the procedural verification lifecycle of a Spy-based test in four steps.

Difficulty: Advanced

A controller test does this:

user_repo = Mock()
user_repo.get.return_value = User(id=1)
email_service = Mock()
controller = Controller(user_repo, email_service)
controller.signup(email='a@b.c')
email_service.send.assert_called_once_with('a@b.c', subject='Welcome')

Classify each Mock() instance by the role it actually plays.

Difficulty: Advanced

Module app/report.py does from services.users import fetch_user and then calls fetch_user(user_id). Which patch() target intercepts the call from a test of app.report — "services.users.fetch_user" or "app.report.fetch_user"? Why?

Difficulty: Advanced

Your SUT catches ConnectionError and returns a fallback value. Sketch the Mock() configuration that drives the SUT down that branch deterministically. Why does setting return_value not work?

Set side_effect to the exception class:

api.fetch.side_effect = ConnectionError

side_effect = <exception class> makes the mock raise the exception on call — driving the SUT into its except branch. return_value = ConnectionError() would return an instance of the exception, which the SUT receives as a value rather than as a raise.

side_effect is Mock’s lever for behavior beyond returning a canned value: set it to an exception class to raise; set it to an iterable to return different values across consecutive calls; set it to a callable to compute the return value from the arguments. return_value and side_effect answer different test-design needs and are not interchangeable.
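The behaviors described above can be checked in a short sketch. The `current_temp` function and its `fallback` default are hypothetical; `side_effect` and `return_value` are the real `unittest.mock` attributes.

```python
from unittest.mock import Mock


def current_temp(api, fallback=20):
    """Hypothetical SUT: falls back when the weather API is unreachable."""
    try:
        return api.fetch()
    except ConnectionError:
        return fallback


# Exception class: the mock raises on call, so the except branch runs.
api = Mock()
api.fetch.side_effect = ConnectionError
fallback_result = current_temp(api)

# return_value with an exception instance: it is returned, not raised.
wrong = Mock()
wrong.fetch.return_value = ConnectionError()
returned = current_temp(wrong)

# Iterable: consecutive calls return consecutive values.
seq = Mock()
seq.fetch.side_effect = [18, 21]
readings = [seq.fetch(), seq.fetch()]

# Callable: the return value is computed from the call's arguments.
doubler = Mock(side_effect=lambda x: x * 2)
doubled = doubler(21)

assert fallback_result == 20
assert isinstance(returned, ConnectionError)
assert readings == [18, 21]
assert doubled == 42
```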

Difficulty: Advanced

A team’s tests directly mock requests.get in twelve different modules. A requests version upgrade just broke 30 of those tests. What’s the structural fix — and what’s the principle?

Difficulty: Expert

You use a FakeUserRepository (in-memory dict) for fast unit tests. The unit tests pass. Production then fails because the real PostgresUserRepository raises IntegrityError on a duplicate email, while the Fake had been raising ValueError. How do you keep the Fake’s speed and defend against this drift?

Difficulty: Advanced

Diagnose the test smell:

def test_processes_orders():
    loader = Mock()
    loader.load.return_value = open("/tmp/test_orders.csv").read()
    processor = OrderProcessor(loader)
    processor.process_all()
    assert processor.summary == "5 orders, $1240 total"

Test Doubles Quiz

Apply, Analyze, and Evaluate-level questions on the test-double taxonomy — pick the right double for a scenario, recognize Spy vs Mock by failure timing, and diagnose over-mocking that tests the mock instead of the SUT.

Difficulty: Intermediate

You are testing an OrderProcessor whose process() method calls paymentGateway.charge(amount) and then returns the gateway’s response. For your test, you want to force process() down the “gateway returned Status.DECLINED” branch. Which test double is the right choice?

A Dummy is passed but never used. Here the SUT does use the gateway’s return value to choose its branch — a Dummy gives the SUT no value to react to, so the declined path is never exercised.

Pre-programming the call as an expectation conflates two concerns. The behavior under test is what the SUT does with a declined response, not whether it called the gateway. Mocks fit best when the interaction itself is the contract.

A Spy records calls for after-the-fact checking, but the test needs to control the value the SUT receives — not observe what it sent. Spies observe; Stubs control.

Correct Answer:

Difficulty: Intermediate

A test uses a double for notifier. The SUT may call notifier.send(...) zero or more times depending on user input. The test wants to assert that when the user is a premium member, the notifier received exactly one call with channel="sms". Which double fits best?

A Stub controls indirect inputs. The behavior here is what the SUT sends — an indirect output — so a Stub gives you no way to verify the call pattern that the test cares about.

A Dummy fits when the test ignores the DOC’s role entirely. Here the test cares precisely about whether the SUT called the notifier with the right channel — that interaction is the contract under test.

Pre-programming every possible call sequence would tightly couple the test to the SUT’s internal flow. A Mock fits when the contract specifies a precise call sequence; for “exactly one matching call”, a Spy’s after-the-fact assertion is simpler and less brittle.

Correct Answer:

Difficulty: Advanced

A team’s controller test sets up a Mock() for user_repo with user_repo.get.return_value = User(id=1) and then asserts on the controller’s HTTP response — nothing else. The teammate insists this is a Mock; you disagree. What is the most precise classification?

The class name from the mocking library doesn’t determine the role the object plays. unittest.mock.Mock is one library construct used to implement many of these roles — pick the name that matches the behavior in this test.

A Dummy is passed but never used. Here the controller uses the return value to do its work — the double is doing real work in the SUT’s logic, so it is not a Dummy.

Spies do record calls, but a Spy is identified by the test actually inspecting those recordings. This test never asserts on user_repo calls, so it isn’t using the recording capability at all.

Correct Answer:

Difficulty: Advanced

You are deciding between a Spy and a Mock to verify a notification interaction. Which factor most strongly favors a Spy?

Failing at the exact call site is a Mock property — Mocks compare during execution. Spies fail later, at assertion time. If pinpoint failure location matters most, a Mock fits better than a Spy.

A short, fixed call sequence is a textbook fit for a Mock with strict expectations — the contract is precise and the cost of strictness is low. Spies pay off when the call shape is harder to specify up front.

Pushing expectations into setup is a stylistic feature of Mocks. Spies move assertions into the test body, which is the opposite trade-off — visible and flexible, not terse and strict.

Correct Answer:

Difficulty: Advanced

A teammate writes this test for a checkout controller:

def test_checkout_success():
    repo = Mock()
    gateway = Mock()
    emailer = Mock()
    repo.find_cart.return_value = Cart(items=[...])
    gateway.charge.return_value = ChargeResult(ok=True)
    controller = Controller(repo, gateway, emailer)
    controller.checkout(cart_id=42, token="tok_ok")
    repo.find_cart.assert_called_once_with(42)
    gateway.charge.assert_called_once_with(amount=2000, token="tok_ok")
    emailer.send.assert_called_once_with(template="receipt")
    repo.mark_paid.assert_called_once_with(42)

What’s the strongest critique?

Verifying every collaboration is exactly what makes the test brittle. The test is now a copy of the controller’s body translated into assertions — it locks down the implementation rather than the behavior.

Real implementations for everything would turn this into an end-to-end test, a different artifact with different tradeoffs. The structural problem here — over-specifying the controller’s collaboration sequence — would still be present with real DOCs.

Sharing setup would tidy the syntax but would not address the core problem: the test asserts on how the controller works rather than what the controller guarantees.

Correct Answer:

Difficulty: Advanced

You’re testing a ReportService that reads from a UserRepository (heavy I/O). Which of the following are good reasons to write a Fake InMemoryUserRepository instead of using a Stub or Mock for each test? (Select all that apply.)

Omitted: deduplicating shared data-setup is one of the biggest payoffs of writing a Fake. If you’ve configured the same five return_values across a dozen tests, the Fake is already cheaper than the Stub-heavy alternative.

Omitted: write-then-read sequences are particularly painful to model with Stubs because each call has to map to the right canned response. A Fake just stores and retrieves; the test reads as if against a real repository.

A Fake is by definition unsuitable for production — it takes shortcuts (no durability, no concurrency safety, no transactional guarantees) that make it light and fast for tests. If you intend to ship it, it’s an alternative implementation, not a Fake.

Omitted: query-realism is the strongest case for a Fake over a Stub. A Stub returning canned rows can mask filtering, joining, or sorting bugs that a working in-memory implementation would reveal.

Correct Answers:

Difficulty: Advanced

A test does this:

gateway = Spy()
controller.checkout(...)
assert len(gateway.recorded_calls) == 1
assert gateway.recorded_calls[0].method == "charge"
assert gateway.recorded_calls[0].amount == 2000

The team is migrating to a Mock-based assertion library and wants to express the same contract. Which Mock-style assertion captures the same behavior without strengthening or weakening it?

charge.assert_called() is much weaker — it permits any number of charge calls and says nothing about the amount. The Spy assertions pinned the count to 1, the method to charge, and the amount to 2000; this Mock call loses two of those constraints.

assert_called_with() only checks the most recent call. The Spy test required exactly one call total; allowing multiple charge calls where only the last matches would weaken the contract substantively.

assert_not_called() flips the assertion — the original Spy code requires that charge was called once with the right amount. This would invert the test, not preserve it.

Correct Answer:

Difficulty: Advanced

Your SUT takes a Logger parameter, but this behavior does not log anything. The test cares only about the SUT’s return value. What is the lightest double that lets the test work?

assert_not_called() would actually constrain the SUT — it would fail if the SUT logged anything, which the test explicitly doesn’t care about. That tightens the contract beyond what the test wants to assert.

Recording calls ‘just in case’ adds coupling and noise the test doesn’t need today. Add the Spy when a future test actually asserts on logs; until then, the lightest double is best.

A Fake list-logger is overkill for a test that ignores logs entirely. Building real behavior earns its keep only when many tests need it — premature investment costs more than it saves.

Correct Answer:

Difficulty: Advanced

Module app/report.py does from services.users import fetch_user, and the function display_name(user_id) then calls fetch_user(user_id) directly. A test does:

with patch("services.users.fetch_user", return_value={"name": "Ada"}):
    assert display_name("u1") == "ADA"

The test fails because the assertion saw the real fetch_user run, not the patched one. What is wrong?

autospec enforces the patched callable’s signature on the mock — it does not affect whether the patch intercepts the call. The patch is being applied; it’s just being applied in the wrong namespace.

from ... import is perfectly patchable — the rule is just that you must target the importing module’s namespace. Reshaping the SUT works but is far heavier than the one-line patch-target fix.

patch() works on any importable name — module-level functions, class methods, attributes, dict entries. monkeypatch is the pytest-fixture equivalent and follows the same where-to-patch rule.

Correct Answer:

Difficulty: Advanced

A team imports requests directly in twelve different modules and uses patch("requests.get") (or similar) in each of their tests. The patches are fragile, the tests are slow, and a requests version bump recently broke 30 tests because the library’s exception class names changed. Which refactor most directly addresses the structural problem?

spec= would tighten the signature check but the underlying coupling stays — twelve test files still depend on the shape of an API the team doesn’t own. The next requests upgrade still ripples through all twelve.

Pinning versions postpones the problem until the next security patch forces an upgrade. The structural issue is that the team’s tests are coupled to a third-party’s contract; pinning doesn’t decouple them.

Centralizing the patching reduces duplication but every test still names requests.get. The third-party API still leaks into the test suite. Centralization without an Adapter is a tidier version of the same coupling.

Correct Answer:

Difficulty: Expert

A team uses FakeUserRepository (in-memory dict) for fast unit tests of UserService. The unit tests pass on every commit. In production, a bug surfaces: the real PostgresUserRepository raises IntegrityError on duplicate emails, but UserService had been written assuming a ValueError, which the Fake was happily raising. What is the most direct defense against this class of bug without abandoning the Fake?

Abandoning the Fake forfeits its main benefit (fast, deterministic unit tests). The structural issue is that the Fake and the real repository drifted; the fix is to detect drift, not to remove the Fake.

autospec enforces the method signature, not the behavioral contract. Two implementations can share the same signature and still disagree on which exception class they raise — that’s the exact bug this team hit.

Unit tests catch design issues fast; abandoning them in favor of integration-only coverage trades one signal for another rather than fixing the gap. A small contract test is the proportionate defense, not a full coverage strategy swap.

Correct Answer:

Difficulty: Advanced

Your SUT catches ConnectionError from a weather API and returns a fallback value. You want a unit test that drives the SUT down the error-handling branch deterministically — without waiting for the real network to fail. Which configuration on a Mock() weather client gets you there?

return_value = ConnectionError() makes the mock return the exception object as a value — the SUT receives an exception instance as the function’s result. It does not raise. The SUT’s except branch never fires.

There is no assert_raises method on Mock. The pattern you may be thinking of is pytest.raises(...) in the test body, but that’s an assertion about the SUT’s behavior, not a configuration of the mock.

Patching low-level socket exceptions is a long way around for what side_effect does in one line. It is also fragile: real network code raises many exception classes, and emulating the right one at the socket level is harder than telling the mock to raise the class the SUT already catches.

Correct Answer:

Difficulty: Advanced

A teammate’s test reads:

def test_processes_orders():
    loader = Mock()
    loader.load.return_value = open("/tmp/test_orders.csv").read()
    processor = OrderProcessor(loader)
    processor.process_all()
    assert processor.summary == "5 orders, $1240 total"

Which test smell is this?

Only one mock appears in the test — far from a mockery. The smell here is about where the data lives, not how many doubles were used.

The test has exactly one assertion. The smell here is about a hidden input, not unexplained outputs.

The test exercises exactly one behavior — process_all summarizing a batch of orders. The smell here is about visibility of inputs, not breadth of coverage.

Correct Answer:

References