Imagine that on April 28 you push a PR whose test asserts that a function checking whether today is the event day returns True for "2026-04-28". CI is green. You sleep. The next morning, without anyone editing the code, CI turns red. The hidden collaborator was the wall clock: the test never verified the function’s behavior; it verified that today happened to equal the hardcoded date.
That is the recurring problem test doubles exist to solve: a collaborator the test cannot control or observe makes the test flaky, slow, or unable to verify the right thing. Wall clocks, HTTP services, databases, message queues, payment gateways, email senders, random number generators — each one quietly turns a deterministic unit test into something else.
A test double is any object that stands in for a real dependency during a test. Borrowed from the film-industry stunt double, the metaphor is exact: the double looks like the real thing from the system’s perspective, but the test gets to choose what it does.
Two pieces of vocabulary from Meszaros that we use throughout this chapter:
SUT — System Under Test. The unit (function, class, or small group of collaborators) you actually want to verify.
DOC — Depended-On Component. A component the SUT calls into; replacing it with a test double is what lets the SUT be tested in isolation.
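To make the two terms concrete, here is a minimal sketch of the opening scenario. The function name `is_event_day` and both versions are hypothetical, since the chapter does not show the original code. In the first version the wall clock is a hidden DOC. In the second, the SUT receives its clock as a parameter, which gives the test a seam where a double can stand in.

```python
from datetime import date
from typing import Callable


# Hypothetical SUT, version 1: the wall clock is a hidden DOC.
# A test asserting True for "2026-04-28" only passes on April 28.
def is_event_day_hidden(event_day: str) -> bool:
    return date.today().isoformat() == event_day


# Hypothetical SUT, version 2: the clock is injected, so the test controls it.
# `today` defaults to the real clock, so production callers do not change.
def is_event_day(event_day: str, today: Callable[[], date] = date.today) -> bool:
    return today().isoformat() == event_day


# The test chooses what "today" is; the result no longer depends on when CI runs.
assert is_event_day("2026-04-28", today=lambda: date(2026, 4, 28))
assert not is_event_day("2026-04-28", today=lambda: date(2026, 4, 29))
```

Any zero-argument callable returning a `date` works as the injected clock, so a lambda is already the simplest possible stub.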
Four questions before you reach for a double
Before naming any specific kind of double, ask the four questions that decide which one fits. Every test double answers exactly one of these:
| Question the test is asking | What the double provides | Typical role |
|---|---|---|
| “What should this collaborator return so I can drive the SUT down a specific branch?” | Control over indirect input | Stub |
| “Did the SUT actually call this collaborator, and with what arguments?” | Observation of indirect output | Spy |
| “Does the SUT follow the expected collaboration protocol: call this once, with these args, before that one?” | Verification of interaction | Mock Object |
| “I need a working-but-cheap replacement that behaves like the real collaborator across many calls.” | Substitution with simpler behavior | Fake |
The first three are about what direction of data the test cares about — values flowing into the SUT (indirect input) versus actions flowing out of it (indirect output). Substitution (the fourth) is about how much state the test needs the collaborator to manage. Get the question right and the kind of double falls out.
The taxonomy — five named doubles, one umbrella
Gerard Meszaros’s canonical taxonomy in xUnit Test Patterns (Meszaros 2007) identifies five kinds of test double: Dummy, Fake, Stub, Spy, and Mock. The umbrella name Test Double covers all five; each of the five names is a role suited to a different test-design problem.
The three with the most subtle distinctions are Stub, Spy, and Mock — covered in depth below. Dummies (objects passed but never used — a parameter required by a signature you don’t care about) and Fakes (working implementations with shortcuts unsuitable for production — for example, an in-memory database) are simpler but worth knowing exist. The three core kinds differ along two axes: which direction of data flow they control (indirect input vs. indirect output) and when verification happens (after the fact vs. during execution).
Keep these two axes in mind as you read: each of the next three sections deepens one of the three core kinds.
The verbatim teaching sentence
Before any code, lock in one sentence — it solves the single biggest source of confusion in Python testing:
Mock is a tool class; stub, spy, and mock are test-design roles. Same in Python, JavaScript, and Java — the role is what matters; the class name is just syntax.
Python’s unittest.mock.Mock is a configurable object that can play any of the three roles depending on what the test does with it. Setting mock.return_value = ... makes it a stub. Calling mock.method.assert_called_once_with(...) after the SUT runs makes it a spy. Conflating the class name “Mock” with the Meszaros role “Mock Object” is the most common reason people say “I added a mock” when they really mean “I added a stub.” The role is determined by what the test does with the object, not by which class instantiated it.
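A minimal sketch of that point, using a hypothetical `greet` function: both collaborators below are instances of the same `unittest.mock.Mock` class, but the test configures one (Stub role) and interrogates the other after the run (Spy role).

```python
from unittest.mock import Mock


# Hypothetical SUT: looks up a name, builds a greeting, records an audit event.
def greet(user_id, repo, audit_log):
    name = repo.find_name(user_id)        # indirect input
    message = f"Hello, {name}!"
    audit_log.record("greeted", user_id)  # indirect output
    return message


# Same class, two roles.
repo = Mock()
repo.find_name.return_value = "Ada"  # configured return value -> Stub role
audit_log = Mock()                   # inspected after the run -> Spy role

assert greet(7, repo, audit_log) == "Hello, Ada!"
audit_log.record.assert_called_once_with("greeted", 7)
```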
Test Stub
A Test Stub (Meszaros 2007) is an object that replaces a real component so the test can control the indirect inputs of the SUT. Indirect inputs are the values returned to the SUT by another component whose services it uses: return values, output parameters, exceptions. By replacing the real DOC with a Test Stub, the test establishes a control point that forces the SUT down specific execution paths it might not otherwise take (the rare error branch, the timeout path, the empty-result case, the hard-to-reach edge condition). During the test setup phase, the stub is configured to respond to calls from the SUT with highly specific values.
A hand-rolled stub in Python is just a class with a hard-coded method:
```python
class FrozenClock:
    """A stub clock: always returns the datetime it was constructed with."""

    def __init__(self, fixed_dt):
        self._fixed_dt = fixed_dt

    def now(self):
        return self._fixed_dt
```
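A minimal sketch of the stub in use, assuming a hypothetical `greeting` SUT that branches on the hour. The stub pins `now()` so each branch runs deterministically, whatever time CI happens to run. `FrozenClock` is repeated so the sketch runs on its own.

```python
from datetime import datetime


class FrozenClock:
    """Stub clock: now() always returns the datetime it was built with."""

    def __init__(self, fixed_dt):
        self._fixed_dt = fixed_dt

    def now(self):
        return self._fixed_dt


# Hypothetical SUT: its branch depends on an indirect input (the clock reading).
def greeting(clock):
    return "Good morning" if clock.now().hour < 12 else "Good afternoon"


# The stub forces each branch in turn.
assert greeting(FrozenClock(datetime(2026, 4, 28, 9, 0))) == "Good morning"
assert greeting(FrozenClock(datetime(2026, 4, 28, 15, 0))) == "Good afternoon"
```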
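A configured unittest.mock.Mock plays the same stub role with less typing: set clock.now.return_value to the fixed datetime and pass the mock wherever the SUT expects a clock. Test Stubs handle the injection of indirect inputs, but they ignore the SUT’s indirect outputs; observing those requires a different kind of test double.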
Test Spy
When the behavior of the SUT includes actions that cannot be observed through its public interface (sending a message on a network channel, writing a record to a database, dispatching a push notification), we refer to these actions as indirect outputs. To verify these indirect outputs, we use a Test Spy (Meszaros 2007).
A Test Spy is a more capable version of a Test Stub that serves as an observation point by quietly recording all method calls made to it by the SUT during execution. Like a Test Stub, a Test Spy may need to provide values back to the SUT to allow execution to continue, but its defining characteristic is its ability to capture the SUT’s indirect outputs and save them for later verification by the test.
The use of a Test Spy facilitates a technique called procedural behavior verification. The testing lifecycle using a spy looks like this:
The test installs the Test Spy in place of the DOC.
The SUT is exercised.
The test retrieves the recorded information from the Test Spy (often via a Retrieval Interface).
The test uses standard assertion methods to compare the actual values passed to the spy against the expected values.
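The four steps map onto a hand-rolled spy in a few lines. A minimal sketch, assuming a hypothetical `confirm_order` SUT and a notifier DOC with a `send(channel, body)` method; `SpyNotifier` is likewise illustrative, and its `calls` list serves as the Retrieval Interface.

```python
class SpyNotifier:
    """Hand-rolled Test Spy: records every send() call for later inspection."""

    def __init__(self):
        self.calls = []  # retrieval interface: a plain list of (channel, body)

    def send(self, channel, body):
        self.calls.append((channel, body))


# Hypothetical SUT: notifies by SMS for premium users, by email otherwise.
def confirm_order(order_id, is_premium, notifier):
    channel = "sms" if is_premium else "email"
    notifier.send(channel, f"Order {order_id} confirmed")
    return "confirmed"


# 1. Install the spy in place of the DOC.
spy = SpyNotifier()
# 2. Exercise the SUT.
result = confirm_order(42, is_premium=True, notifier=spy)
# 3. Retrieve the recorded calls.
recorded = spy.calls
# 4. Compare actual vs. expected with ordinary assertions.
assert result == "confirmed"
assert recorded == [("sms", "Order 42 confirmed")]
```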
A software engineer should reach for a Test Spy when the assertions should remain clearly visible within the test method itself, or when they cannot predict the values of all attributes of the SUT’s interactions ahead of time. Because a Test Spy does not fail the test at the first deviation from expected behavior, it allows tests to gather more execution data and include highly detailed diagnostic information in assertion failure messages.
The interesting test-design move with a spy is rarely writing it (a class with a list and an append call) — it is how much of each call to pin. Pinning too little produces a Liar test that always passes; pinning too much produces a brittle test that breaks under harmless refactors. The Goldilocks assertion pins exactly what the spec mandates, no more and no less.
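A sketch of the three levels of pinning, using `unittest.mock.Mock` as the spy and a hypothetical `notify_shipped` SUT. The spec it assumes (exactly one SMS to the customer’s phone that mentions the order id, exact wording unspecified) is invented for illustration.

```python
from unittest.mock import Mock


# Hypothetical SUT. Assumed spec: one SMS to the customer's phone that
# mentions the order id. The exact wording is NOT part of the spec.
def notify_shipped(order_id, phone, notifier):
    notifier.send("sms", phone, f"Good news! Order #{order_id} has shipped.")


notifier = Mock()  # used as a Spy: calls are only inspected afterward
notify_shipped(42, "+15550100", notifier)

# Too little (a Liar): still passes if the SUT uses the wrong channel or number.
assert notifier.send.called

# Too much (brittle): would break the moment someone rewords the message.
# notifier.send.assert_called_once_with(
#     "sms", "+15550100", "Good news! Order #42 has shipped.")

# Goldilocks: pin exactly what the assumed spec mandates.
assert notifier.send.call_count == 1
channel, phone, body = notifier.send.call_args.args
assert channel == "sms"
assert phone == "+15550100"
assert "42" in body
```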
Mock Object
A Mock Object (Meszaros 2007), like a Test Spy, acts as an observation point to verify the indirect outputs of the SUT. However, a Mock Object operates using a fundamentally different paradigm known as expected behavior specification. Instead of waiting until after the SUT executes to verify the outputs procedurally, a Mock Object is configured before the SUT is exercised with the exact method calls and arguments it should expect to receive. The Mock Object essentially acts as an active verification engine during the execution phase. As the SUT executes and calls the Mock Object, the mock dynamically compares the actual arguments received against its programmed expectations. If an unexpected call occurs, or if the arguments do not match, the Mock Object fails the test immediately.
Fowler’s distinction between classical and mockist testing styles (Fowler 2007) maps onto this difference: classical tests prefer real collaborators and observe the SUT’s state; mockist tests specify the interactions between the SUT and its collaborators up front. Neither style is universally correct. Mocks fit best when the interaction is the contract — “the payment gateway must be charged exactly once for the order total” — and worst when they merely freeze the implementation’s current call shape.
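The assertion helpers on `unittest.mock.Mock` run after the fact, so a minimal sketch of expected behavior specification hand-rolls a Mock Object for the payment-gateway contract above. `MockGateway`, `expect_charge`, `verify`, and `checkout` are hypothetical names chosen for illustration.

```python
class MockGateway:
    """Hand-rolled Mock Object for a payment gateway (hypothetical).

    Expectations are recorded before the SUT runs and checked at call time,
    so a wrong or extra call fails inside the SUT's execution.
    """

    def __init__(self):
        self._expected = []  # queue of expected (method_name, args) pairs

    def expect_charge(self, amount):
        self._expected.append(("charge", (amount,)))

    def charge(self, amount):
        actual = ("charge", (amount,))
        if not self._expected:
            raise AssertionError(f"unexpected call: charge({amount!r})")
        expected = self._expected.pop(0)
        if actual != expected:
            raise AssertionError(f"expected {expected}, got {actual}")
        return "approved"

    def verify(self):
        """Fail if any expected call never happened."""
        if self._expected:
            raise AssertionError(f"expected calls not made: {self._expected}")


# Hypothetical SUT: charges the order total exactly once.
def checkout(prices, gateway):
    return gateway.charge(sum(prices))


gateway = MockGateway()
gateway.expect_charge(30)                         # 1. specify expectations up front
assert checkout([10, 20], gateway) == "approved"  # 2. exercise; mismatches raise here
gateway.verify()                                  # 3. catch expected calls that never came
```

The final `verify()` call matters: call-time checks alone cannot detect an expected charge that the SUT never made.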
Fake Object
A Fake Object (Meszaros 2007) is a working implementation of the same interface as the real DOC, but with shortcuts that make it unsuitable for production: no durability, no concurrency safety, no transactional guarantees, no remote calls. The canonical example is an in-memory repository standing in for a database-backed one:
```python
class FakeUserRepository:
    """In-memory implementation of UserRepository, for tests only."""

    def __init__(self):
        self._users = {}

    def save(self, user):
        self._users[user.id] = user

    def find_by_id(self, user_id):
        return self._users.get(user_id)
```
A Fake earns its keep when the SUT round-trips with the collaborator across multiple calls — write a user, look it up, update its email, look it up again. Modeling that sequence with stubs would require coordinating multiple return_value mappings, each one fragile and easy to misalign. The Fake just stores and retrieves; the test reads as if it were running against the real repository.
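A minimal sketch of that round trip, assuming a hypothetical frozen `User` dataclass and a hypothetical `change_email` SUT. The Fake is repeated so the sketch runs on its own; no per-call return values need to be wired up.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class User:  # hypothetical domain object
    id: str
    email: str


class FakeUserRepository:
    """In-memory stand-in for a database-backed UserRepository (tests only)."""

    def __init__(self):
        self._users = {}

    def save(self, user):
        self._users[user.id] = user

    def find_by_id(self, user_id):
        return self._users.get(user_id)


# Hypothetical SUT: reads a user, changes the email, writes it back.
def change_email(repo, user_id, new_email):
    user = repo.find_by_id(user_id)
    if user is None:
        raise KeyError(user_id)
    repo.save(replace(user, email=new_email))


# Write, read, update, read again, exactly as against a real repository.
repo = FakeUserRepository()
repo.save(User(id="u1", email="ada@example.com"))
change_email(repo, "u1", "ada@lovelace.dev")
assert repo.find_by_id("u1").email == "ada@lovelace.dev"
```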
The Fake’s recurring risk — drift, and the contract test that defends against it
Every Fake is a promise that it behaves enough like the real collaborator for the SUT’s tests to be meaningful. That promise can silently break the moment the real collaborator’s behavior diverges (a new uniqueness constraint, a different error class, a transactional rollback the Fake doesn’t simulate). The defense is a contract test — a single shared test that both the Fake and the real implementation must pass:
```python
def user_repo_contract(repo):
    """Behavioral contract that BOTH FakeUserRepository and the real
    Postgres-backed UserRepository must satisfy."""
    user = User(id="u1", email="ada@example.com")
    repo.save(user)
    assert repo.find_by_id("u1") == user
    assert repo.find_by_id("does-not-exist") is None
```
Run that test against the Fake on every commit (fast) and against the real repository on a schedule (slower). When the two diverge, the scheduled run fails, so you learn about the drift from the test suite rather than from production.
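A runnable sketch of one contract exercised against two implementations. As an assumption so that it needs only the standard library, an in-memory `sqlite3` database stands in for the Postgres-backed repository; `User` and `SqliteUserRepository` are hypothetical.

```python
import sqlite3
from dataclasses import dataclass


@dataclass(frozen=True)
class User:  # hypothetical domain object
    id: str
    email: str


class FakeUserRepository:
    def __init__(self):
        self._users = {}

    def save(self, user):
        self._users[user.id] = user

    def find_by_id(self, user_id):
        return self._users.get(user_id)


class SqliteUserRepository:
    """Stand-in for the real database-backed repository (assumption:
    sqlite3 in memory instead of Postgres, so the sketch runs anywhere)."""

    def __init__(self):
        self._db = sqlite3.connect(":memory:")
        self._db.execute("CREATE TABLE users (id TEXT PRIMARY KEY, email TEXT)")

    def save(self, user):
        self._db.execute(
            "INSERT OR REPLACE INTO users VALUES (?, ?)", (user.id, user.email)
        )

    def find_by_id(self, user_id):
        row = self._db.execute(
            "SELECT id, email FROM users WHERE id = ?", (user_id,)
        ).fetchone()
        return User(*row) if row else None


def user_repo_contract(repo):
    """Behavior every UserRepository implementation must share."""
    user = User(id="u1", email="ada@example.com")
    repo.save(user)
    assert repo.find_by_id("u1") == user
    assert repo.find_by_id("does-not-exist") is None


# One contract, every implementation. In a real suite the slow implementation
# would typically run on a schedule rather than on every commit.
for make_repo in (FakeUserRepository, SqliteUserRepository):
    user_repo_contract(make_repo())
```

With pytest, the loop would usually become a parametrized fixture, so each implementation reports as its own test.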
Dummy Object
A Dummy Object (Meszaros 2007) is the lightest double: it fills a parameter slot but is never actually used by the SUT. Reach for it when the SUT’s signature requires a collaborator the particular test doesn’t care about (the SUT takes a logger but this test ignores logging; the constructor needs a notifier but this code path doesn’t notify). The minimum-viable-double rule says: start with a Dummy and escalate only when the test needs the double to do something.
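Passing `None` often works as a Dummy. A slightly stricter variant, sketched here with a hypothetical `PriceCalculator`, fails loudly if the SUT ever touches it, which turns the “never used” assumption into something the test checks.

```python
class DummyLogger:
    """Dummy Object: satisfies the constructor signature; any use is a bug."""

    def __getattr__(self, name):
        # Called only for attributes the class does not define.
        raise AssertionError(f"Dummy logger should not be used (accessed .{name})")


# Hypothetical SUT: only logs on the error path.
class PriceCalculator:
    def __init__(self, logger):
        self._logger = logger

    def total(self, prices):
        if any(p < 0 for p in prices):
            self._logger.warning("negative price")
            raise ValueError("negative price")
        return sum(prices)


# This test exercises the happy path, so a Dummy is all it needs.
calc = PriceCalculator(DummyLogger())
assert calc.total([3, 4]) == 7
```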
When NOT to use a double
A test double is a tool you reach for when a real collaborator would make the test flaky, slow, or unable to verify the right thing. It is not a default. It is not a sign of professionalism. It is not a coverage strategy. The right number of doubles for many tests is zero.
A useful heuristic, drawn from Fowler (Fowler 2007) and the empirical mocking literature: use a real collaborator when it is fast, deterministic, locally available, and free of dangerous side effects. Reach for a double when the collaboration is awkward: slow, nondeterministic, expensive, dangerous, or impossible to put into the state the test needs.
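A small sketch of the “real collaborator” side of the heuristic, with a hypothetical `write_report` SUT. Instead of mocking a file object’s `write` calls, the test hands the SUT an `io.StringIO`, which is fast, deterministic, in-process, and harmless, then asserts on what was actually written.

```python
import io
import json


# Hypothetical SUT: writes a JSON summary to any text stream.
def write_report(totals, stream):
    json.dump({"total": sum(totals), "count": len(totals)}, stream)


# No double needed: the real in-memory stream meets every criterion.
buffer = io.StringIO()
write_report([10, 20, 5], buffer)
assert json.loads(buffer.getvalue()) == {"total": 35, "count": 3}
```

Asserting on the written content rather than on individual `write` calls also keeps the test stable if the SUT changes how it chunks its output.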
Three antipatterns to recognize on sight:
| Antipattern | Symptom | Why it happens | Fix |
|---|---|---|---|
| Over-mocking | Every internal helper is mocked; the test asserts only on the mocks. | “Isolation feels safe; more mocks = more tested.” | Mock at the architectural boundary (HTTP, DB, clock), not at every internal function. |
| Mocking what you don’t own | A third-party library’s API is mocked directly, scattered across many tests. | The library is brittle and the team doesn’t want to wait for real responses. | Wrap the third-party library in your own thin Adapter class; double the Adapter. The third party’s internals stay invisible to your tests. |
| Coverage chasing | Every line of the SUT runs in some test, but assertions are weak or mocked-on-mocks. | Coverage is misread as a quality signal. | Stronger oracles, real collaborators where possible, fewer tests that test more meaningfully. Coverage is not correctness. |
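A minimal sketch of the Adapter fix. To keep it standard-library only, `urllib.request` plays the third-party library (the same shape applies to `requests`); `HttpClient`, `FakeHttpClient`, `fetch_username`, and the URL are hypothetical.

```python
import json
import urllib.request


class HttpClient:
    """Hypothetical thin Adapter: the only HTTP surface the codebase sees."""

    def get_json(self, url):
        with urllib.request.urlopen(url, timeout=5) as resp:
            return json.loads(resp.read().decode("utf-8"))


class FakeHttpClient:
    """Test double for the Adapter; urllib never appears in unit tests."""

    def __init__(self, responses):
        self._responses = responses  # url -> already-parsed JSON

    def get_json(self, url):
        return self._responses[url]


# Hypothetical SUT: depends on the Adapter's interface, not on urllib.
def fetch_username(client, user_id):
    data = client.get_json(f"https://api.example.com/users/{user_id}")
    return data["name"]


client = FakeHttpClient({"https://api.example.com/users/7": {"name": "Ada"}})
assert fetch_username(client, 7) == "Ada"
```

A library upgrade can now break at most `HttpClient`, which is the place for a few slower adapter-level tests against the real boundary.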
A small decision rubric
| If the SUT… | Reach for… |
|---|---|
| …is a pure function (same input always yields same output, no collaborators) | No double |
| …calls a clock, a remote service, or any non-deterministic source | Stub |
| …makes a fire-and-forget outbound call the test must verify (e.g., notifier.send(...)) | Spy or Mock |
| …needs to round-trip with a stateful collaborator (write then read) | Fake |
| …calls a third-party library you don’t own | Adapter wrapper → double the adapter |
| …is just simple math, string, or list manipulation | No double (don’t make work) |
| …already uses a fake or adapter, and you need confidence it still matches the real collaborator | Contract / integration check against the real boundary |
Test-double smells
Real codebases are full of tests that look productive but verify almost nothing. Naming the smells trains the eye to spot them in code review.
| Smell | What it looks like | Why it hurts |
|---|---|---|
| The Mockery | A test with so many mocks that nearly every line of the SUT is replaced. | The test verifies orchestration, not behavior; pure refactors break it. |
| Counting on Spies | The test pins assert_called_once_with(...) after every internal call. | Couples the test to the SUT’s call sequence; refactoring becomes brittle. |
| Unnecessary Stubs | Stubs configured for calls the SUT does not make in this path. | Adds maintenance burden; misleads readers about what the test exercises. |
| Mystery Guest | The test reads from an external file, fixture, or database not visible in the test method. | Reader cannot tell from the test alone what was set up or why. |
| Eager Test | A single test exercises many behaviors of the SUT at once. | When it fails, the failure does not localize which behavior broke. |
| Assertion Roulette | Many unexplained assertions in one test, none with messages. | A failure tells you the test broke; figuring out which assertion requires reading the code. |
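One way to remove a Mystery Guest, sketched with a hypothetical `summarize_orders` SUT: build the input inside the test so a reader can check the expected values by eye, and give the assertion a message so a failure explains itself (which also guards against Assertion Roulette).

```python
import csv
import io


# Hypothetical SUT: summarizes order rows from any CSV text stream.
def summarize_orders(stream):
    rows = list(csv.DictReader(stream))
    return len(rows), sum(int(r["amount"]) for r in rows)


# No Mystery Guest: every value the assertion depends on is visible here.
orders_csv = io.StringIO(
    "id,amount\n"
    "1,200\n"
    "2,340\n"
    "3,100\n"
)
count, total = summarize_orders(orders_csv)
assert (count, total) == (3, 640), f"got {count} orders totalling {total}"
```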
What a doubled test does not prove
Every test double trades reality for control. That is usually the right trade in a unit test, but it leaves a gap: a stub might not match the real API, a fake might drift from the real database, an adapter mock cannot prove the third-party service still accepts your actual request. A professional test plan states all three parts out loud:
This unit test proves: the SUT behaves correctly given a controlled collaborator.
This unit test does not prove: the real collaborator still speaks the same contract.
Complementary check: a contract test, sandbox integration test, or adapter-level test that exercises the real boundary at lower frequency.
Apply what you’ve read
Build the skill in the Test Doubles Tutorial, which takes you through six steps in a Python sandbox: introducing a seam, hand-rolling a stub, hand-rolling a spy, recognizing the same roles inside unittest.mock, navigating the “patch where the SUT looks up the name” pitfall, and deciding when not to use a double at all.
Practice
Test Doubles
Retrieval practice for the test-double taxonomy — SUT, DOC, indirect inputs vs outputs, the five kinds of double (Dummy, Fake, Stub, Spy, Mock), procedural vs expected-behavior verification, and how to choose. Cards span Remember through Evaluate.
Difficulty: Basic
Define SUT and DOC, and why the distinction matters.
SUT — System Under Test, the unit you want to verify. DOC — Depended-On Component, something the SUT calls into. Replacing a DOC with a double is what lets the SUT be tested in isolation.
When you reach for a mock or stub, naming the SUT and the DOC keeps the test honest: you are checking the SUT’s behavior, and you are controlling or observing the DOC’s role in it. Confusion between the two is the root of many over-mocked, brittle suites.
Difficulty: Basic
Difference between an indirect input to the SUT and an indirect output from the SUT? One example each.
Indirect input — a value the SUT receives from a DOC (return, exception). Example: DB query result. Indirect output — an effect the SUT produces through a DOC. Example: SMS sent.
The choice of test double follows from which direction matters: control indirect inputs with a Stub; observe indirect outputs with a Spy or Mock. Tests that try to do both with one double are often the ones that feel tangled — separate the concerns and the test usually clarifies.
Difficulty: Basic
Name all five kinds of test double in the standard taxonomy and what each one is for.
Dummy — fills a parameter, never used. Fake — working implementation with shortcuts (in-memory DB). Stub — returns canned values. Spy — records calls for after-the-fact assertion. Mock — pre-programmed expectations, fails during execution.
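The three core kinds sit on two axes: which direction of data flow they handle (input for a Stub, output for a Spy or Mock) and when verification happens (after execution for a Spy, during it for a Mock). Dummy and Fake answer different questions: whether the collaborator is used at all, and how much state it must manage. Knowing the full taxonomy keeps you from reaching for a Mock when a Stub or Spy is closer to what you actually need.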
Difficulty: Intermediate
You need to drive the SUT down its error-handling branch — the one where the payment gateway returns Status.TIMEOUT. Which double, and why?
A Stub. You need to control what the SUT receives from the gateway (indirect input) to force the path. You don’t need to observe what the SUT sent.
Stubs shine for exercising paths that are hard to trigger with real DOCs — error responses, slow paths, rare states. If you also need to verify what message the SUT sent in response to the timeout, you would add a Spy or Mock — but the input control always belongs to a Stub.
Difficulty: Intermediate
Compare Spy and Mock: when does failure occur, and what style of test does each produce?
Spy records calls quietly; test asserts on the recording after the SUT runs (procedural verification). Mock is pre-programmed with expectations; fails during the SUT’s execution if a call doesn’t match (expected behavior specification).
Spy-based tests put assertions in the test method, so the reader sees what is verified next to the act step; Mock-based tests push expectations into setup. Spies are friendlier when you can’t predict all attributes of the interaction up front; mocks fail faster, at the call site, when you can specify the contract precisely.
Difficulty: Intermediate
What is a Fake? Canonical example? How is it different from a Stub?
A Fake is a working alternative implementation with shortcuts unsuitable for production (e.g. in-memory DB satisfying the real interface). A Stub returns canned values for specific calls; a Fake actually implements the behavior.
Fakes are ideal when you want realistic behavior at high speed — write a row, read it back, query by index — without standing up the real dependency. They cost more to build but pay back across many tests. Stubs are cheap and case-specific; Fakes are richer and scenario-general.
Difficulty: Advanced
A junior engineer asserts mock.method.assert_called_once_with(...) after every line of the SUT’s body. Diagnose.
The test has crossed from checking behavior to encoding the implementation. Any refactor that changes how the SUT calls its collaborators breaks the test — even when user-visible behavior is preserved. The test is testing the mock, not the system.
This is the most common Mock anti-pattern. Interaction checks are useful when the interaction is the contract (‘exactly one receipt email after payment succeeds’) and harmful when they merely freeze the current implementation’s wiring. The remedy is usually to assert on the SUT’s outputs or persisted state instead, reserving interaction assertions for the cases where collaboration is the behavior.
Difficulty: Advanced
Your SUT calls notifier.send(channel, body) four times in a single workflow, in a data-dependent order. You want to assert each call had the right channel but can’t predict the order. Which double fits best?
A Spy. Let the SUT run, retrieve the recorded calls, sort or group them, and assert each. A Mock with strict-order expectations would fail on the first reorder; a Spy collects everything for flexible after-the-fact assertion.
Procedural verification with a Spy is well suited when you cannot predict all attributes of the interactions up front or when assertions need richer logic (grouping, sorting, set comparisons). The cost is that errors are detected at assertion time, not the moment they happen — but you trade that for flexibility the Mock model lacks.
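A sketch of that flexibility with `unittest.mock.Mock` as the spy and a hypothetical `broadcast` SUT whose call order depends on its input. Both checks are order-insensitive: one groups recorded channels with `collections.Counter`, the other uses `assert_has_calls(..., any_order=True)`.

```python
from collections import Counter
from unittest.mock import Mock, call


# Hypothetical SUT: one send() per alert, in whatever order the input yields.
# A set is used so the iteration order is not something the test can rely on.
def broadcast(alerts, notifier):
    for channel, body in alerts:
        notifier.send(channel, body)


notifier = Mock()  # used as a Spy
broadcast({("sms", "a"), ("email", "b"), ("sms", "c"), ("push", "d")}, notifier)

# Group the recording: right number of calls per channel, order ignored.
channels = Counter(c.args[0] for c in notifier.send.call_args_list)
assert channels == Counter({"sms": 2, "email": 1, "push": 1})

# Or check the exact calls while ignoring their order.
notifier.send.assert_has_calls(
    [call("sms", "a"), call("sms", "c"), call("email", "b"), call("push", "d")],
    any_order=True,
)
```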
Difficulty: Intermediate
Pick a double for: ‘My SUT’s constructor requires a loader, but this behavior never calls loader.load_config().’
A Dummy suffices — the loader satisfies the signature but is never used in this path. If the SUT does read fields from loader.load_config(), escalate to a Stub returning a specific config.
Reaching for a Mock or Spy here would over-specify the test. The minimum-viable-double rule says pick the simplest double that lets the test do its job — a Dummy exists only to satisfy the signature, and anything heavier is extra coupling for no benefit.
Difficulty: Advanced
Sketch the procedural verification lifecycle of a Spy-based test in four steps.
(1) Install the Spy in place of the DOC. (2) Exercise the SUT. (3) Retrieve recorded calls from the Spy. (4) Use ordinary assertions to compare recorded vs expected values.
This is the chapter’s four-step lifecycle. The contrast with mocks is the placement of the verification: spies make it explicit in the test body (visible, flexible, late); mocks make it implicit in setup (terse, strict, early). Both are valid; each suits a different shape of test.
Classify each Mock() instance by the role it actually plays.
user_repo acts as a Stub (returns canned User, no call assertion). email_service is on the Spy / Mock Object boundary: the test verifies an outbound call after execution with assert_called_once_with, so the important classification is behavior verification, not the Mock() class name.
Mock libraries blur the taxonomy — unittest.mock.Mock plays every role, so naming the role each instance plays is what keeps the test honest. Rule of thumb: configured return values → Stub; post-execution call assertions → Spy-style behavior verification; up-front strict expectations → Mock Object. A single object can even combine roles within one test.
Difficulty: Advanced
Module app/report.py does from services.users import fetch_user and then calls fetch_user(user_id). Which patch() target intercepts the call from a test of app.report — "services.users.fetch_user" or "app.report.fetch_user"? Why?
"app.report.fetch_user". After from services.users import fetch_user, the name fetch_user is bound in app.report’s namespace; the SUT looks it up there. Patching services.users.fetch_user leaves app.report’s local reference untouched.
Patch where the SUT looks up the name, not where it was defined. This is the #1 Python mocking pitfall. A similar trap exists in JavaScript CommonJS, where const { y } = require('x') copies the export into a local binding at require time. The underlying principle: names live in the namespace of the module that introduces them.
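A self-contained sketch of the pitfall. To avoid files on disk, it builds two hypothetical in-memory modules, `services_users` and `app_report` (flattened names standing in for services/users.py and app/report.py), and registers them in `sys.modules` so that `patch()` can resolve them by name.

```python
import sys
import types
from unittest.mock import patch

# Simulate services/users.py.
users = types.ModuleType("services_users")


def fetch_user(user_id):
    return {"id": user_id, "name": "real user from the network"}


users.fetch_user = fetch_user
sys.modules["services_users"] = users

# Simulate app/report.py, which does `from services_users import fetch_user`.
# That import binds the function object in report's own namespace.
report = types.ModuleType("app_report")
sys.modules["app_report"] = report
exec(
    "from services_users import fetch_user\n"
    "def build(user_id):\n"
    "    return 'Report for ' + fetch_user(user_id)['name']\n",
    report.__dict__,
)

# Patching where the function is DEFINED misses report's local binding.
with patch("services_users.fetch_user", return_value={"name": "stub"}):
    assert report.build(1) == "Report for real user from the network"

# Patching where the SUT LOOKS IT UP intercepts the call.
with patch("app_report.fetch_user", return_value={"name": "stub"}):
    assert report.build(1) == "Report for stub"
```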
Difficulty: Advanced
Your SUT catches ConnectionError and returns a fallback value. Sketch the Mock() configuration that drives the SUT down that branch deterministically. Why does setting return_value not work?
Set side_effect to the exception class:
```python
api.fetch.side_effect = ConnectionError
```
side_effect = <exception class> makes the mock raise the exception on call — driving the SUT into its except branch. return_value = ConnectionError() would return an instance of the exception, which the SUT receives as a value rather than as a raise.
side_effect is Mock’s lever for behavior beyond returning a canned value: set it to an exception class to raise; set it to an iterable to return different values across consecutive calls; set it to a callable to compute the return value from the arguments. return_value and side_effect answer different test-design needs and are not interchangeable.
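A sketch of all three `side_effect` forms, plus the `return_value` contrast, against a hypothetical `fetch_or_default` SUT that falls back when the API is unreachable.

```python
from unittest.mock import Mock


# Hypothetical SUT: returns a fallback value when the API is unreachable.
def fetch_or_default(api, default="cached"):
    try:
        return api.fetch()
    except ConnectionError:
        return default


api = Mock()

# 1. Exception class: the call raises, driving the except branch.
api.fetch.side_effect = ConnectionError
assert fetch_or_default(api) == "cached"

# Contrast: return_value hands back an exception *instance* as a plain value.
api.fetch.side_effect = None
api.fetch.return_value = ConnectionError()
assert isinstance(fetch_or_default(api), ConnectionError)

# 2. Iterable: consecutive calls get consecutive items; exception items raise.
api.fetch.side_effect = [ConnectionError, "fresh"]
assert fetch_or_default(api) == "cached"
assert fetch_or_default(api) == "fresh"

# 3. Callable: the result is computed from the call's arguments.
api.lookup.side_effect = lambda key: key.upper()
assert api.lookup("abc") == "ABC"
```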
Difficulty: Advanced
A team’s tests directly mock requests.get in twelve different modules. A requests version upgrade just broke 30 of those tests. What’s the structural fix — and what’s the principle?
Wrap requests in a thin Adapter class (e.g., HttpClient) that exposes only the methods the codebase needs. Have all twelve modules depend on HttpClient. Mock the Adapter, not requests directly. Principle: don’t mock what you don’t own.
When tests depend on a third party’s API directly, every library upgrade can ripple through the suite. The Adapter pattern (one of the classic Gang of Four structural patterns) flips the dependency direction: the codebase depends on an interface the team controls, and tests double that interface. The third party stays invisible to the test suite.
Difficulty: Expert
You use a FakeUserRepository (in-memory dict) for fast unit tests. The unit tests pass. Production then fails because the real PostgresUserRepository raises IntegrityError on a duplicate email, while the Fake had been raising ValueError. How do you keep the Fake’s speed and defend against this drift?
Write a shared contract test that both FakeUserRepository and PostgresUserRepository must pass, including the exception class raised on a duplicate email. Run it against the Fake on every commit (fast) and, on a schedule, against the real repository backed by a sandbox database (slower).
Every Fake is a promise that it behaves enough like the real collaborator, and that promise can break silently. A contract test captures the behavioral expectations once and runs against both implementations, so the Fake keeps its speed while drift becomes visible the moment one side changes.
Mystery Guest. The test depends on the contents of /tmp/test_orders.csv, an external file invisible from the test body. A reader sees the assertion (5 orders, $1240 total) but cannot tell what data it is computed from.
Mystery Guest is one of several named test-double smells. Neighbors to keep distinct: The Mockery (so many mocks the test verifies orchestration, not behavior); Counting on Spies (asserting every internal call, freezing the implementation); Unnecessary Stubs (stubs for calls the SUT never makes); Eager Test (one test, many behaviors). Naming the smell makes it easier to spot in review.
Test Doubles Quiz
Apply, Analyze, and Evaluate-level questions on the test-double taxonomy — pick the right double for a scenario, recognize Spy vs Mock by failure timing, and diagnose over-mocking that tests the mock instead of the SUT.
Difficulty: Intermediate
You are testing an OrderProcessor whose process() method calls paymentGateway.charge(amount) and then returns the gateway’s response. For your test, you want to force process() down the “gateway returned Status.DECLINED” branch. Which test double is the right choice?
A Dummy is passed but never used. Here the SUT does use the gateway’s return value to choose its branch — a Dummy gives the SUT no value to react to, so the declined path is never exercised.
Pre-programming the call as an expectation conflates two concerns. The behavior under test is what the SUT does with a declined response, not whether it called the gateway. Mocks fit best when the interaction itself is the contract.
A Spy records calls for after-the-fact checking, but the test needs to control the value the SUT receives — not observe what it sent. Spies observe; Stubs control.
Correct Answer: Stub
Explanation
The cleanest framing is: which direction of data flow do you need? Indirect input (the SUT consumes a DOC’s output) → Stub. Indirect output (the SUT produces something through the DOC) → Spy or Mock. Here you need to force a specific indirect input — Status.DECLINED — so a Stub is the minimum-viable double.
Difficulty: Intermediate
A test uses a double for notifier. The SUT may call notifier.send(...) zero or more times depending on user input. The test wants to assert that when the user is a premium member, the notifier received exactly one call with channel="sms". Which double fits best?
A Stub controls indirect inputs. The behavior here is what the SUT sends — an indirect output — so a Stub gives you no way to verify the call pattern that the test cares about.
A Dummy fits when the test ignores the DOC’s role entirely. Here the test cares precisely about whether the SUT called the notifier with the right channel — that interaction is the contract under test.
Pre-programming every possible call sequence would tightly couple the test to the SUT’s internal flow. A Mock fits when the contract specifies a precise call sequence; for “exactly one matching call”, a Spy’s after-the-fact assertion is simpler and less brittle.
Correct Answer: Spy
Explanation
Spies record calls quietly during the SUT’s execution and let the test do the verification afterward. That fits this scenario well because the SUT’s behavior is data-dependent — the test can collect everything and then assert on the property it cares about (exactly one SMS call), without pre-specifying the full call sequence.
Difficulty: Intermediate
A team’s controller test sets up a Mock() for user_repo with user_repo.get.return_value = User(id=1) and then asserts on the controller’s HTTP response — nothing else. The teammate insists this is a Mock; you disagree. What is the most precise classification?
The class name from the mocking library doesn’t determine the role the object plays. unittest.mock.Mock is one library construct used to implement many of these roles — pick the name that matches the behavior in this test.
A Dummy is passed but never used. Here the controller uses the return value to do its work — the double is doing real work in the SUT’s logic, so it is not a Dummy.
Spies do record calls, but a Spy is identified by the test actually inspecting those recordings. This test never asserts on user_repo calls, so it isn’t using the recording capability at all.
Correct Answer: Stub
Explanation
These roles are about what the double does in this test, not which library type implements it. If only return values are configured and no calls are asserted on, the role is a Stub — regardless of whether the implementation is Mock(), a hand-rolled subclass, or a Fake with shortcuts. Naming the role explicitly keeps tests honest and helps reviewers spot over-mocking.
Difficulty: Advanced
You are deciding between a Spy and a Mock to verify a notification interaction. Which factor most strongly favors a Spy?
Failing at the exact call site is a Mock property — Mocks compare during execution. Spies fail later, at assertion time. If pinpoint failure location matters most, a Mock fits better than a Spy.
A short, fixed call sequence is a textbook fit for a Mock with strict expectations — the contract is precise and the cost of strictness is low. Spies pay off when the call shape is harder to specify up front.
Pushing expectations into setup is a stylistic feature of Mocks. Spies move assertions into the test body, which is the opposite trade-off — visible and flexible, not terse and strict.
Explanation
Spies and Mocks both observe indirect outputs but differ in when and how strictly they verify. Spies record everything and let the test method assert flexibly afterward — ideal when the SUT’s call pattern is data-dependent or when you want assertions richer than literal matchers. Mocks specify the contract up front and fail at the moment of divergence — ideal when the call sequence is precise and short.
Difficulty: Advanced
A teammate writes this test for a checkout controller:
Verifying every collaboration is exactly what makes the test brittle. The test is now a copy of the controller’s body translated into assertions — it locks down the implementation rather than the behavior.
Real implementations for everything would turn this into an end-to-end test, a different artifact with different tradeoffs. The structural problem here — over-specifying the controller’s collaboration sequence — would still be present with real DOCs.
Sharing setup would tidy the syntax but would not address the core problem: the test asserts on how the controller works rather than what the controller guarantees.
Correct Answer:
Explanation
This is an over-mocked test: it mirrors the SUT’s body line-for-line and breaks under any internal refactor. The fix is to assert on the outcomes the contract specifies — repo.mark_paid(42) may be one, but find_cart, charge, and emailer.send are usually implementation choices. Reserve interaction assertions for the cases where the interaction itself is the behavior.
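The teammate's test is not reproduced here, so this sketch assumes a hypothetical `checkout` controller built from the collaborators the explanation names (`find_cart`, `charge`, `mark_paid`, `emailer.send`). The rewritten test pins only the outcome the contract promises.

```python
from unittest.mock import Mock


def checkout(cart_id, repo, gateway, emailer):
    # Hypothetical controller; the order of these calls is an implementation detail.
    cart = repo.find_cart(cart_id)
    gateway.charge(cart["total"])
    repo.mark_paid(cart_id)
    emailer.send(cart["email"], "Thanks for your order")
    return "paid"


def test_checkout_marks_cart_paid():
    repo = Mock()
    repo.find_cart.return_value = {"total": 2000, "email": "a@example.com"}  # Stub role
    gateway, emailer = Mock(), Mock()  # called by the SUT, never asserted on

    result = checkout(42, repo, gateway, emailer)

    # Assert the contract: the cart ends up paid and the caller is told so.
    assert result == "paid"
    repo.mark_paid.assert_called_once_with(42)
    # Deliberately no assertions on find_cart, charge, or emailer.send:
    # reordering or batching those calls should not break this test.


test_checkout_marks_cart_paid()
```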
Difficulty: Advanced
You’re testing a ReportService that reads from a UserRepository (heavy I/O). Which of the following are good reasons to write a FakeInMemoryUserRepository instead of using a Stub or Mock for each test? (Select all that apply.)
Omitted: deduplicating shared data-setup is one of the biggest payoffs of writing a Fake. If you’ve configured the same five return_values across a dozen tests, the Fake is already cheaper than the Stub-heavy alternative.
Omitted: write-then-read sequences are particularly painful to model with Stubs because each call has to map to the right canned response. A Fake just stores and retrieves; the test reads as if against a real repository.
A Fake is by definition unsuitable for production — it takes shortcuts (no durability, no concurrency safety, no transactional guarantees) that make it light and fast for tests. If you intend to ship it, it’s an alternative implementation, not a Fake.
Omitted: query-realism is the strongest case for a Fake over a Stub. A Stub returning canned rows can mask filtering, joining, or sorting bugs that a working in-memory implementation would reveal.
Correct Answers:
Explanation
Fakes earn their keep when many tests share the same dependency shape and rely on its nontrivial behavior — queries, writes, joins. The cost is the upfront work to build the in-memory implementation; the payoff is dozens of tests that are simpler, more realistic, and less coupled to canned return values than a Stub-heavy alternative.
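A minimal sketch of such a Fake, assuming a hypothetical repository interface with `add`, `get`, and `active_users` methods (the real `UserRepository` API is not shown in this chapter). The Fake really stores, filters, and sorts, so write-then-read tests need no canned return values.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class User:
    id: int
    name: str
    active: bool


class FakeInMemoryUserRepository:
    """In-memory Fake: real query and write behaviour, none of the durability,
    concurrency, or transaction guarantees of the production repository."""

    def __init__(self):
        self._rows = {}

    def add(self, user):
        self._rows[user.id] = user

    def get(self, user_id):
        return self._rows.get(user_id)

    def active_users(self):
        # Real filtering and sorting, so query bugs are not masked by canned rows.
        return sorted((u for u in self._rows.values() if u.active), key=lambda u: u.name)


class ReportService:
    # Hypothetical SUT: depends only on the repository's methods.
    def __init__(self, users):
        self.users = users

    def active_names(self):
        return [u.name for u in self.users.active_users()]


# Write-then-read reads as if against a real repository.
repo = FakeInMemoryUserRepository()
repo.add(User(1, "zoe", True))
repo.add(User(2, "amir", False))
repo.add(User(3, "bea", True))
assert ReportService(repo).active_names() == ["bea", "zoe"]
```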
The team is migrating to a Mock-based assertion library and wants to express the same contract. Which Mock-style assertion captures the same behavior without strengthening or weakening it?
charge.assert_called() is much weaker — it permits any number of charge calls and says nothing about the amount. The Spy assertions pinned the count to 1, the method to charge, and the amount to 2000; this Mock call loses two of those constraints.
assert_called_with() only checks the most recent call. The Spy test required exactly one call total; allowing multiple charge calls where only the last matches would weaken the contract substantively.
assert_not_called() flips the assertion: the original Spy code requires that charge was called once with the right amount. This would invert the test, not preserve it.
Correct Answer:
Explanation
Translating between Spy-style and Mock-style assertions is a place where tests quietly drift in strength. The parent mock_calls list preserves all three claims the Spy made: one gateway call in total, method charge, and amount 2000. The cousins (assert_called_with, assert_called, and method-only assert_called_once_with) look similar but encode different contracts. When migrating, audit each translation: a test should make the same claim before and after, no more and no less.
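The original Spy test is not reproduced here, so this sketch assumes a `gateway` double whose `charge` method the SUT calls once with 2000. It shows which assertion keeps all three claims and where the method-scoped cousins stop noticing extra calls.

```python
from unittest.mock import Mock, call

# Hypothetical gateway double; the line below stands in for the SUT's single call.
gateway = Mock()
gateway.charge(2000)

# Strongest translation: exactly one call on the whole gateway,
# to the method `charge`, with the amount 2000.
assert gateway.mock_calls == [call.charge(2000)]

# Weaker cousins: all pass here, but each drops part of the claim.
gateway.charge.assert_called_once_with(2000)  # ignores calls to other gateway methods
gateway.charge.assert_called_with(2000)       # checks only the most recent charge call
gateway.charge.assert_called()                # any number of charge calls, any amount

# Where they diverge: an extra call on a different method.
noisy = Mock()
noisy.refund(500)
noisy.charge(2000)
noisy.charge.assert_called_once_with(2000)      # still passes
assert noisy.mock_calls != [call.charge(2000)]  # the full-list comparison catches it
```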
Difficulty: Intermediate
Your SUT takes a Logger parameter, but this behavior does not log anything. The test cares only about the SUT’s return value. What is the lightest double that lets the test work?
assert_not_called() would actually constrain the SUT — it would fail if the SUT logged anything, which the test explicitly doesn’t care about. That tightens the contract beyond what the test wants to assert.
Recording calls ‘just in case’ adds coupling and noise the test doesn’t need today. Add the Spy when a future test actually asserts on logs; until then, the lightest double is best.
A Fake list-logger is overkill for a test that ignores logs entirely. Building real behavior earns its keep only when many tests need it — premature investment costs more than it saves.
Correct Answer:
Explanation
The minimum-viable-double rule: pick the simplest double that makes the test work and adds no further coupling. A Dummy is the lightest — it exists only to satisfy the signature. Escalating to a Stub, Spy, Mock, or Fake should be justified by what the test actually needs to verify or control.
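A minimal sketch of a Dummy, assuming a hypothetical `apply_discount` SUT whose signature requires a logger. The Dummy has no behaviour and is never configured or asserted on; it only fills the slot.

```python
import logging


class DummyLogger:
    """Dummy: fills the `logger` parameter and nothing more.
    It is never configured, consulted, or asserted on."""


def apply_discount(price: int, percent: int, logger: logging.Logger) -> int:
    # Hypothetical SUT: accepts a logger, but this code path never logs.
    return price * (100 - percent) // 100


def test_apply_discount_returns_discounted_price():
    assert apply_discount(200, 10, DummyLogger()) == 180


test_apply_discount_returns_discounted_price()
```

Because `DummyLogger` has none of `Logger`'s methods, this test would fail with `AttributeError` if the SUT ever did log. A `Mock(spec=logging.Logger)` that is never asserted on still plays the Dummy role and tolerates stray calls; which to prefer is a judgment call.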
Difficulty: Advanced
Module app/report.py does from services.users import fetch_user, and the function display_name(user_id) then calls fetch_user(user_id) directly. A test does:
The test fails because the assertion saw the real fetch_user run, not the patched one. What is wrong?
autospec enforces the patched callable’s signature on the mock — it does not affect whether the patch intercepts the call. The patch is being applied; it’s just being applied in the wrong namespace.
from ... import is perfectly patchable — the rule is just that you must target the importing module’s namespace. Reshaping the SUT works but is far heavier than the one-line patch-target fix.
patch() works on any importable name — module-level functions, class methods, attributes, dict entries. monkeypatch is the pytest-fixture equivalent and follows the same where-to-patch rule.
Correct Answer:
Explanation
After from services.users import fetch_user, the name fetch_user is bound in app.report’s namespace. The SUT looks it up there when it calls fetch_user(user_id). Patching the original services.users binding leaves app.report’s local reference untouched — the real function runs, the patch never intercepts. Rule: patch where the SUT looks the name up, not where it was originally defined.
Difficulty: Advanced
A team imports requests directly in twelve different modules and uses patch("requests.get") (or similar) in each of their tests. The patches are fragile, the tests are slow, and a requests version bump recently broke 30 tests because the library’s exception class names changed. Which refactor most directly addresses the structural problem?
spec= would tighten the signature check but the underlying coupling stays — twelve test files still depend on the shape of an API the team doesn’t own. The next requests upgrade still ripples through all twelve.
Pinning versions postpones the problem until the next security patch forces an upgrade. The structural issue is that the team’s tests are coupled to a third-party’s contract; pinning doesn’t decouple them.
Centralizing the patching reduces duplication but every test still names requests.get. The third-party API still leaks into the test suite. Centralization without an Adapter is a tidier version of the same coupling.
Correct Answer:
Explanation
Don’t mock what you don’t own. When tests depend on a third-party’s API surface directly, every library upgrade can ripple through the suite. The Adapter pattern flips the dependency: the codebase depends on an interface the team controls, and the tests double that interface. The third-party is wrapped once, in one place, and the tests stay decoupled from it. (Hynek Schlawack’s essay popularized this phrasing; the underlying idea is older.)
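A minimal sketch of the Adapter refactor. `WeatherClient`, `RequestsWeatherClient`, `WeatherUnavailable`, the URL layout, and the `temp_c` field are all hypothetical. Only the adapter imports `requests` and translates its exceptions into a team-owned one; the SUT and its tests see only the interface the team controls.

```python
from typing import Protocol


class WeatherUnavailable(Exception):
    """Team-owned error; callers never see the HTTP library's exception classes."""


class WeatherClient(Protocol):
    def current_temp_c(self, city: str) -> float: ...


class RequestsWeatherClient:
    """The one place that knows about `requests` (hypothetical URL and JSON shape)."""

    def __init__(self, base_url: str):
        self.base_url = base_url

    def current_temp_c(self, city: str) -> float:
        import requests  # imported here so the rest of the sketch runs without it

        try:
            resp = requests.get(f"{self.base_url}/current", params={"city": city}, timeout=5)
            resp.raise_for_status()
            return float(resp.json()["temp_c"])
        except requests.RequestException as exc:
            raise WeatherUnavailable(city) from exc


def dress_advice(city: str, weather: WeatherClient) -> str:
    # SUT depends on the team-owned interface, not on requests.
    try:
        temp = weather.current_temp_c(city)
    except WeatherUnavailable:
        return "check the sky"
    return "coat" if temp < 10 else "t-shirt"


class FakeWeather:
    """Test double for the interface the team owns."""

    def __init__(self, temps):
        self.temps = temps

    def current_temp_c(self, city):
        if city not in self.temps:
            raise WeatherUnavailable(city)
        return self.temps[city]


assert dress_advice("Oslo", FakeWeather({"Oslo": 3.0})) == "coat"
assert dress_advice("Lima", FakeWeather({})) == "check the sky"
```

A `requests` upgrade that renames exception classes now touches one `except` clause in the adapter instead of every test file.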
Difficulty: Expert
A team uses FakeUserRepository (in-memory dict) for fast unit tests of UserService. The unit tests pass on every commit. In production, a bug surfaces: the real PostgresUserRepository raises IntegrityError on duplicate emails, but UserService had been written assuming a ValueError, which the Fake was happily raising. What is the most direct defense against this class of bug without abandoning the Fake?
Abandoning the Fake forfeits its main benefit (fast, deterministic unit tests). The structural issue is that the Fake and the real repository drifted; the fix is to detect drift, not to remove the Fake.
autospec enforces the method signature, not the behavioral contract. Two implementations can share the same signature and still disagree on which exception class they raise — that’s the exact bug this team hit.
Unit tests catch design issues fast; abandoning them in favor of integration-only coverage trades one signal for another rather than fixing the gap. A small contract test is the proportionate defense, not a full coverage strategy swap.
Correct Answer:
Explanation
Every Fake is a promise that it behaves enough like the real collaborator, and that promise can break silently. A contract test is a single shared test that both the Fake and the real implementation must satisfy — exception classes, return shapes, edge-case behavior. Run it fast against the Fake every commit and slower against the real repository on a schedule, so drift surfaces at the contract test rather than at 3 a.m. in production.
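A minimal sketch of a contract test. It assumes the team settles on its own `DuplicateEmailError` (hypothetical) that every repository must raise; pinning the database driver's `IntegrityError` directly would be another option. The same check function runs against any implementation, so a Fake that raises `ValueError` fails it immediately.

```python
class DuplicateEmailError(Exception):
    """Team-owned error that every user repository implementation must raise."""


class FakeUserRepository:
    """In-memory Fake (hypothetical API: add/find)."""

    def __init__(self):
        self._by_email = {}

    def add(self, email, name):
        if email in self._by_email:
            raise DuplicateEmailError(email)
        self._by_email[email] = name

    def find(self, email):
        return self._by_email.get(email)


def check_user_repository_contract(make_repo):
    """Shared contract: run against every implementation of the repository."""
    repo = make_repo()
    repo.add("a@example.com", "Ana")
    assert repo.find("a@example.com") == "Ana"
    assert repo.find("missing@example.com") is None
    try:
        repo.add("a@example.com", "Someone else")
    except DuplicateEmailError:
        pass
    else:
        raise AssertionError("duplicate email must raise DuplicateEmailError")


# Every commit: fast, against the Fake.
check_user_repository_contract(FakeUserRepository)
# On a schedule (not runnable here), the same check against the real repository,
# e.g. check_user_repository_contract(lambda: PostgresUserRepository(test_dsn)),
# where that hypothetical class translates IntegrityError into DuplicateEmailError.
```

In a pytest suite this usually becomes one test function with a fixture parametrized over both implementations.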
Difficulty: Advanced
Your SUT catches ConnectionError from a weather API and returns a fallback value. You want a unit test that drives the SUT down the error-handling branch deterministically — without waiting for the real network to fail. Which configuration on a Mock() weather client gets you there?
return_value = ConnectionError() makes the mock return the exception object as a value — the SUT receives an exception instance as the function’s result. It does not raise. The SUT’s except branch never fires.
There is no assert_raises method on Mock. The pattern you may be thinking of is pytest.raises(...) in the test body, but that’s an assertion about the SUT’s behavior, not a configuration of the mock.
Patching low-level socket exceptions is a long way around for what side_effect does in one line. It is also fragile: real network code raises many exception classes, and emulating the right one at the socket level is harder than telling the mock to raise the class the SUT already catches.
Correct Answer:
Explanation
side_effect is Mock’s lever for behavior beyond returning a canned value. Set it to an exception class (or instance) and the mock raises on call; set it to an iterable to return different values on consecutive calls; set it to a callable to compute the return value dynamically from the arguments. Using side_effect = ConnectionError (the class) is the canonical way to drive the SUT into its error-handling branch in a deterministic, network-free test.
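The three `side_effect` forms, plus the `return_value` trap from the distractor above, in one sketch. `forecast_or_fallback` and the `forecast` method are hypothetical stand-ins for the weather SUT and client in the question.

```python
from unittest.mock import Mock


def forecast_or_fallback(client, city):
    # Hypothetical SUT: falls back when the API is unreachable.
    try:
        return client.forecast(city)
    except ConnectionError:
        return "unknown"


# 1. Exception class: the mock raises on call, driving the except branch.
client = Mock()
client.forecast.side_effect = ConnectionError
assert forecast_or_fallback(client, "Oslo") == "unknown"

# Contrast: return_value hands the exception back as an ordinary value.
client = Mock()
client.forecast.return_value = ConnectionError()
assert isinstance(forecast_or_fallback(client, "Oslo"), ConnectionError)

# 2. Iterable: consecutive calls get consecutive items; exception items are raised.
client = Mock()
client.forecast.side_effect = [ConnectionError(), "sunny"]
assert forecast_or_fallback(client, "Oslo") == "unknown"
assert forecast_or_fallback(client, "Oslo") == "sunny"

# 3. Callable: computes the result from the call's arguments.
client = Mock()
client.forecast.side_effect = lambda city: f"rain in {city}"
assert forecast_or_fallback(client, "Lima") == "rain in Lima"
```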
Only one mock appears in the test — far from a mockery. The smell here is about where the data lives, not how many doubles were used.
The test has exactly one assertion. The smell here is about a hidden input, not unexplained outputs.
The test exercises exactly one behavior — process_all summarizing a batch of orders. The smell here is about visibility of inputs, not breadth of coverage.
Correct Answer:
Explanation
Mystery Guest is the smell where a test depends on data living outside the test method, such as a file, a shared fixture, or a database row. A reader cannot tell from the test alone what "5 orders, $1240 total" is computed from. The fix is to inline the relevant data (or use a clearly named local builder) so the reader sees both halves of the assertion: what went in and what came out.
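The original test is not reproduced here, so this sketch assumes a hypothetical `process_all` and `Order`, with inline amounts chosen to add up to the quoted summary. The reader can check the expected value without opening any other file.

```python
from dataclasses import dataclass


@dataclass
class Order:
    id: int
    amount: int  # whole dollars, to keep the arithmetic visible


def process_all(orders):
    # Hypothetical SUT: summarises a batch of orders.
    total = sum(o.amount for o in orders)
    return f"{len(orders)} orders, ${total} total"


def test_process_all_summarises_batch():
    # Inputs live in the test: 200 + 300 + 240 + 250 + 250 = 1240.
    orders = [Order(1, 200), Order(2, 300), Order(3, 240), Order(4, 250), Order(5, 250)]
    assert process_all(orders) == "5 orders, $1240 total"


test_process_all_summarises_batch()
```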