Test Doubles — Stubs, Spies, and Mocks
Learn to test code that depends on a clock, an HTTP service, a database, or a notification system — without actually hitting them. Concepts first; the Python+pytest syntax is provided so you can focus on the test-design decisions.
The Test That Lied: A Test That Passes Today and Fails Tomorrow
🎯 Goal: See a test that passes today but is fundamentally broken — then carve out a seam where you can later substitute a controlled clock. 🧠 Skills you’ll gain: Recognize when a real collaborator (clock, HTTP, database) makes a test non-deterministic, and introduce a seam — a parameter the test can swap out — so future tests can stay deterministic.
📣 The kind of test that ships green and rots overnight. Imagine you’re on the QuestForge team. A daily-quest event is scheduled for April 28, 2026. A teammate writes a test on April 28 that asserts is_today_event_day("2026-04-28") returns True. Test passes. PR merges. Next day, without a single code change, the same test fails on CI. Why? Because the test depends on the wall clock, not on what the function should do. That hidden dependency is what test doubles exist to control.
🧭 What you already know — and what’s about to shift
From Testing Foundations you know how to write a strong oracle, choose partition + boundary inputs, and avoid peeking at private state. From TDD you know the Red-Green-Refactor rhythm. Every example so far has had one thing in common: the function under test was self-contained. Pass it inputs, observe the output, done.
Real code is rarely like that. Real functions talk to collaborators — clocks, network APIs, databases, payment gateways, email services. Each of those collaborators turns a deterministic test into a flaky test, a slow test, or — worst — a test that appears green but actually never exercised the behavior you cared about. This entire tutorial is about that problem.
📖 New vocabulary (visible glossary)
| Term | Meaning |
|---|---|
| System Under Test (SUT) | The code being tested. Here: is_today_event_day. |
| Collaborator | Anything the SUT calls into. Here: datetime.now(). |
| Indirect input | A value the SUT receives from a collaborator (rather than from its caller). Here: today’s date from the clock. |
| Seam | A point where you can substitute a collaborator at test time without changing production behavior. We’re about to introduce one. |
| Dependency Injection | The technique: pass the collaborator in as a parameter instead of hard-coding it. Meszaros, Dependency Injection, p.678. |
🌍 The same vocabulary in another language
These terms come from xUnit Test Patterns (Meszaros, 2007). They’re language-agnostic. JavaScript+Jest, Java+Mockito, C#+Moq, Ruby+RSpec — all use the same words for the same roles. What changes between languages is the syntax of how you express a stub or a mock. The role doesn’t change.
⚙️ Task — three small moves:
- Read quest_service.py and test_quest_service.py. The test asserts that is_today_event_day("2026-04-28") returns True. Predict: will it pass today? Will it pass tomorrow? Why?
- Run the test (▶ button). It passes today (April 28, 2026). 🎉 That green dot is misleading: tomorrow the same code, with no changes, will fail.
- Refactor is_today_event_day to accept a clock parameter (default datetime.datetime). This creates the seam, but you don’t use it yet. Step 2 will show how a stub fills it.
flowchart LR
subgraph before["BEFORE — no seam"]
direction TB
S1["is_today_event_day(date_str)"]:::sut
S1 --> C1["datetime.now()<br/>📅 wall clock"]:::bad
end
subgraph after["AFTER — seam introduced"]
direction TB
S2["is_today_event_day(date_str, clock)"]:::sut
S2 --> C2["clock.now()<br/>↑ caller decides<br/>what clock"]:::good
end
before --> after
classDef sut fill:#e3f2fd,stroke:#1565c0,color:#0d47a1
classDef good fill:#e8f5e9,stroke:#2e7d32,color:#1b5e20
classDef bad fill:#ffebee,stroke:#c62828,color:#b71c1c
💡 Pedagogical move — concept over syntax. Your code change is a single keyword (clock) and one default. The point isn’t the syntax; the point is the idea — “this function used to depend on the wall clock; now its caller decides what ‘now’ means.” That idea is the foundation of every test double in this tutorial.
🧠 Why a default value? Why not just require the parameter?
Two reasons:
- Backwards compatibility. Other code that calls is_today_event_day(date_str) keeps working: the default clock=datetime.datetime reproduces the old behavior exactly.
- Pedagogy. The original test (assert is_today_event_day("2026-04-28") is True) keeps passing without modification. The seam is available to any test that needs it; tests that don’t need it ignore it. That’s what non-intrusive seam means.
In Java, the equivalent would be a constructor parameter with a default factory. In JavaScript, an options object with a default. The concept — “let the caller swap this dependency” — is what carries.
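As a rough sketch of what the refactor might look like, the snippet below adds the clock parameter with its default and then calls the function both ways. The FixedClock class is a hypothetical stand-in invented for this illustration, not part of the exercise files; Step 2 develops the real stub.

```python
import datetime


def is_today_event_day(event_date_str: str, clock=datetime.datetime) -> bool:
    """Return True if the clock's current date matches event_date_str (YYYY-MM-DD)."""
    today = clock.now().strftime("%Y-%m-%d")
    return today == event_date_str


class FixedClock:
    """Hypothetical stand-in clock for illustration; always reports 2026-04-28, noon."""

    @staticmethod
    def now() -> datetime.datetime:
        return datetime.datetime(2026, 4, 28, 12, 0)


# Old call style still works: the default clock is the real wall clock,
# so this result depends on the day the code runs.
uses_wall_clock = is_today_event_day("2026-04-28")

# New call style: the caller decides what "now" means.
on_event_day = is_today_event_day("2026-04-28", clock=FixedClock)
on_other_day = is_today_event_day("2026-04-29", clock=FixedClock)
```

Only the last two results are deterministic; the first one is exactly the wall-clock dependency the seam lets a test avoid.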
🔭 Coming in Step 2: You created a seam. Now we’ll actually use it — by passing in a FrozenClock object that always says it’s Tuesday. Same SUT, same test shape, but now fully deterministic.
"""QuestForge — daily quest event service."""
from datetime import datetime


def is_today_event_day(event_date_str: str) -> bool:
    """Return True if today is the event date.

    event_date_str is in YYYY-MM-DD format.

    ⚠️ This function calls datetime.now() directly. Tests that pin a
    specific date will pass on that date and fail on every other day.
    That hidden non-determinism is what we're about to fix.
    """
    today = datetime.now().strftime("%Y-%m-%d")
    return today == event_date_str
"""Test for is_today_event_day.
⚠️ This test was written on 2026-04-28. It passes today.
Read it carefully — would it pass *tomorrow*? Why or why not?
"""
from quest_service import is_today_event_day


def test_april_28_is_event_day():
    # Today (2026-04-28) IS the event day, so this should return True.
    assert is_today_event_day("2026-04-28") is True
Why Test Doubles? — Knowledge Check
Min. score: 80%
1. Which of these collaborators are likely to make a test flaky (sometimes pass, sometimes fail without code changes)? (select all that apply)
Flakiness comes from collaborators that the test cannot fully control: wall clocks, network calls, remote databases, file systems, randomness. Pure in-memory operations (list reversal, arithmetic) are deterministic and don’t need a double.
2. What is an indirect input to the System Under Test?
Indirect input = a value the SUT obtains from a collaborator rather than
from its caller. clock.now(), db.fetch_user(id), api.get_weather() —
each returns an indirect input that the SUT then uses. Stubs control these.
3. (Spaced review — Testing Foundations) A test asserts result is not None after refactoring the SUT to accept a clock parameter. Is that a strong oracle?
Oracle strength is independent of whether collaborators are doubled.
is not None is the canonical weak oracle in any context. Even after
you replace a real clock with a stub, the assertion still has to pin
exactly what the spec mandates.
4. Why is dependency injection the right move before introducing any test doubles?
Dependency Injection is the design move that makes test doubles possible. Pass the collaborator as a parameter; now any test can substitute a controlled version. (Same principle in Java with constructor injection, in C# with interfaces, in JavaScript with options-object patterns. The pattern is language-agnostic.)
Hand-Rolled Stub: A Clock That Always Says Tuesday
🎯 Goal: Replace a real collaborator (clock, HTTP API) with a hand-written Test Stub — a tiny class that returns canned values — and watch a flaky test become deterministic. 🧠 Skills you’ll gain: Recognize a Test Stub as a role (Meszaros, p.529), implement one in plain Python without any library, and pick canned values that drive the SUT down a specific behavior partition.
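🧭 Bridge from Step 1. You created a seam in is_today_event_day by turning the clock into a parameter. DailyQuestService(clock, api) applies the same idea to a class: it accepts its collaborators as constructor parameters. Now we’ll use the seam by passing in objects that always answer the same way. That’s a stub.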
📖 The verbatim teaching sentence
“Mock is a tool class; stub, spy, and mock are test-design roles. Same in Python, JavaScript, and Java: the role is what matters; the class name is just syntax.”
Read that twice. Most confusion about test doubles in Python comes from conflating Python’s unittest.mock.Mock class with the conceptual Mock role. They’re not the same thing. We’ll dismantle that confusion in Step 4. For now, lock in this: the role is the question; the syntax is the answer.
📖 What is a Test Stub? (Meszaros, xUnit Test Patterns, p.529)
A Test Stub replaces a collaborator with a hand-controlled object that answers questions with canned values. It does not record what was asked of it; it does not enforce a contract. It just answers.
flowchart LR
T["Test"]:::test --> S["DailyQuestService<br/>(SUT)"]:::sut
S -->|"clock.now()"| C1["FrozenClock<br/>📅 STUB<br/><i>always returns<br/>April 28, noon</i>"]:::stub
S -->|"api.fetch_quests(...)"| C2["StubQuestApiClient<br/>📋 STUB<br/><i>always returns<br/>the canned quest list</i>"]:::stub
T -.->|"asserts on return value"| S
classDef test fill:#e3f2fd,stroke:#1565c0,color:#0d47a1
classDef sut fill:#fff3e0,stroke:#e65100,color:#bf360c
classDef stub fill:#e8f5e9,stroke:#2e7d32,color:#1b5e20
Notice what the test asserts on: the SUT’s return value, not the stubs. That’s state verification — we observe the result of calling the SUT, not whether it talked to anyone. Stubs make state verification possible by removing the variability the real collaborators would have introduced.
⚙️ Task — two small moves:
- Read the worked example test_tuesday_picks_tuesday_quest. The FrozenClock, the StubQuestApiClient, and the assertion are all written for you. Predict the test’s outcome before running. Then run it: green.
- Fill in the assertion in test_thursday_picks_thursday_quest. The clock is frozen to a Thursday; the canned API quests include a Thursday entry. Compute the expected value from the spec; don’t run-and-paste. Replace "FILL_IN_HERE" with the exact title the SUT should return.
💡 The conceptual move. A stub answers questions — it doesn’t decide what those answers should be. You decide. Your decision drives the SUT down whichever behavior branch the test is meant to exercise. The canned quest list and the frozen weekday together form a precise input partition; the assertion locks in what the SUT does for that partition.
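To make the "input partition" idea concrete, here is a minimal sketch that picks canned values for a different partition: a weekday with no matching quest. The classes are condensed copies of the ones in this step so the snippet runs on its own; the test name and the user id "u789" are made up for illustration.

```python
from datetime import datetime


class FrozenClock:
    """Stub clock: always returns the datetime it was constructed with."""

    def __init__(self, fixed_dt):
        self._fixed_dt = fixed_dt

    def now(self):
        return self._fixed_dt


class StubQuestApiClient:
    """Stub API client: returns the same canned quests for any user_id."""

    def __init__(self, canned_quests):
        self._canned = canned_quests

    def fetch_quests(self, user_id):
        return self._canned


class DailyQuestService:
    """Condensed copy of the SUT from this step."""

    def __init__(self, clock, api):
        self._clock = clock
        self._api = api

    def daily_quest_title(self, user_id):
        try:
            quests = self._api.fetch_quests(user_id)
        except ConnectionError:
            return "No quests today"
        weekday = self._clock.now().strftime("%A")
        for quest in quests:
            if quest["weekday"] == weekday:
                return quest["title"]
        return "No quests today"


def test_wednesday_with_no_wednesday_quest():
    # 2026-04-29 is a Wednesday; the canned list deliberately has no Wednesday entry.
    clock = FrozenClock(datetime(2026, 4, 29, 12, 0))
    api = StubQuestApiClient([
        {"weekday": "Monday", "title": "Slay the Slime Lord"},
        {"weekday": "Tuesday", "title": "Find the Lost Amulet"},
    ])
    service = DailyQuestService(clock, api)
    assert service.daily_quest_title("u789") == "No quests today"
```

The stubs are unchanged from the worked example; only the canned values differ, and that alone steers the SUT into its fallback branch.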
📖 Why we wrote `StubQuestApiClient` as a class with one method, not as a function
DailyQuestService calls self._api.fetch_quests(user_id) — it expects a fetch_quests method on the api object. So our stub must be an object with that method. A function alone wouldn’t have a .fetch_quests attribute.
In Python this is duck typing: any object with a fetch_quests(self, user_id) method that returns a list of quest dicts is acceptable. The real QuestApiClient does it. Our stub does it. The SUT can’t tell them apart — that’s the whole point.
In Java, you’d give both classes a common interface. In TypeScript, you’d type the parameter as { fetchQuests: (userId: string) => Quest[] }. The mechanism differs; the idea (stub satisfies the same contract as the real collaborator) is universal.
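If you want the duck-typed contract written down in Python, one option is typing.Protocol. The sketch below is an assumption about how that could look, not something the exercise files contain; the name QuestSource is hypothetical. Note that the runtime isinstance check only looks for the method's presence, not its signature.

```python
from typing import Protocol, runtime_checkable


@runtime_checkable
class QuestSource(Protocol):
    """Hypothetical name for the contract DailyQuestService relies on."""

    def fetch_quests(self, user_id: str) -> list:
        ...


class StubQuestApiClient:
    """Same shape as the stub in this step: one method, canned answer."""

    def __init__(self, canned_quests):
        self._canned = canned_quests

    def fetch_quests(self, user_id):
        return self._canned


def not_a_client(user_id):
    """A bare function: callable, but it has no .fetch_quests attribute."""
    return []


# The stub has a fetch_quests method, so it satisfies the protocol.
stub_satisfies_contract = isinstance(StubQuestApiClient([]), QuestSource)
# The bare function does not, which is why the stub is written as a class.
function_satisfies_contract = isinstance(not_a_client, QuestSource)
```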
🧠 Stub vs Fake — the cousin you'll meet briefly
A Fake Object (Meszaros p.551) is the next-of-kin to a stub: a working but lightweight implementation. Where StubQuestApiClient returns the same canned list no matter what user_id is passed, a FakeQuestApiClient could keep an in-memory dict of {user_id: [quests]} and return different lists for different users.
class FakeQuestApiClient:
    def __init__(self):
        self._data = {}

    def add_quests_for(self, user_id, quests):
        self._data[user_id] = quests

    def fetch_quests(self, user_id):
        return self._data.get(user_id, [])
When to reach for a Fake instead of a Stub: when one canned answer isn’t enough — typically when multiple SUTs share the collaborator, or when the test sequence depends on state that the stub would have to manually thread.
We won’t use Fakes in the worked exercises (one canned list per test is plenty here), but it’s worth knowing they exist. Step 6’s decision guide covers when each one fits.
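For a sense of how a Fake behaves differently from a Stub, the following sketch loads the FakeQuestApiClient shown above with quests for two users and reads them back. The user ids and quest pairings are illustrative only.

```python
class FakeQuestApiClient:
    """In-memory fake: a working, lightweight implementation of the API client."""

    def __init__(self):
        self._data = {}

    def add_quests_for(self, user_id, quests):
        self._data[user_id] = quests

    def fetch_quests(self, user_id):
        return self._data.get(user_id, [])


fake = FakeQuestApiClient()
fake.add_quests_for("u1", [{"weekday": "Tuesday", "title": "Find the Lost Amulet"}])
fake.add_quests_for("u2", [{"weekday": "Tuesday", "title": "Defeat the Dragon"}])

# Unlike a stub, the answer depends on which user is asked about.
u1_titles = [quest["title"] for quest in fake.fetch_quests("u1")]
u2_titles = [quest["title"] for quest in fake.fetch_quests("u2")]
unknown_user_quests = fake.fetch_quests("nobody")
```

A stub would have returned the same list for all three calls; the fake keeps enough state to tell the users apart.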
🌍 The same idea in another language
FrozenClock is just a class with a hard-coded method. Every language has a way to write that.
JavaScript (no framework):
const frozenClock = {
now: () => new Date('2026-04-28T12:00:00')
};
Java:
Clock frozenClock = Clock.fixed(
Instant.parse("2026-04-28T12:00:00Z"),
ZoneOffset.UTC
);
Same role; different syntax. Frameworks (unittest.mock, Jest, Mockito) generate these objects more concisely — but that’s boilerplate reduction, not a different idea.
🔭 Coming in Step 3: A stub answers questions. What if your SUT’s interesting behavior is whom it asks — like a complete_quest that should call ledger.credit(user_id, gold)? That’s where Test Spy comes in.
"""Reusable test helper: a clock that always says it's `fixed_dt`."""
from datetime import datetime


class FrozenClock:
    """A stub clock: always returns the datetime it was constructed with."""

    def __init__(self, fixed_dt: datetime):
        self._fixed_dt = fixed_dt

    def now(self) -> datetime:
        return self._fixed_dt
"""The REAL HTTP client — don't call this in tests.
Instantiating QuestApiClient and calling fetch_quests() would actually
hit the network. Tests that exercise `DailyQuestService` should pass
a stub instead.
"""
import json
import urllib.request


class QuestApiClient:
    def fetch_quests(self, user_id: str) -> list[dict]:
        url = f"https://questforge.example.com/quests/{user_id}"
        with urllib.request.urlopen(url) as r:
            return json.loads(r.read())
"""QuestForge — daily quest service.
DailyQuestService takes a clock and an API client as constructor
parameters (Dependency Injection). At test time we pass in stubs;
in production the caller passes the real ones.
"""
import datetime


def is_today_event_day(event_date_str: str, clock=datetime.datetime) -> bool:
    today = clock.now().strftime("%Y-%m-%d")
    return today == event_date_str


class DailyQuestService:
    """Picks today's daily quest title for a user."""

    def __init__(self, clock, api):
        self._clock = clock
        self._api = api

    def daily_quest_title(self, user_id: str) -> str:
        """Return today's quest title, or 'No quests today' if none match."""
        try:
            quests = self._api.fetch_quests(user_id)
        except ConnectionError:
            return "No quests today"
        if not quests:
            return "No quests today"
        weekday = self._clock.now().strftime("%A")
        for quest in quests:
            if quest["weekday"] == weekday:
                return quest["title"]
        return "No quests today"
"""Step 2 — Hand-rolled stubs for DailyQuestService.
Two stubs are used here. FrozenClock is imported from clock.py.
StubQuestApiClient is defined right below — because it's a regular
class, not anything special. (Step 4 will show that `unittest.mock`
generates the same conceptual object in a single line — but the *idea*
is what we're locking in here, not the syntax.)
"""
from datetime import datetime

from clock import FrozenClock
from quest_service import DailyQuestService


class StubQuestApiClient:
    """A Test Stub (Meszaros, p.529): returns canned quests regardless of user_id."""

    def __init__(self, canned_quests: list[dict]):
        self._canned = canned_quests

    def fetch_quests(self, user_id: str) -> list[dict]:
        return self._canned


# ===== WORKED EXAMPLE 1: fully written =====
# Read carefully. Predict the assertion's outcome BEFORE running.
def test_tuesday_picks_tuesday_quest():
    clock = FrozenClock(datetime(2026, 4, 28, 12, 0))  # 2026-04-28 is a Tuesday
    api = StubQuestApiClient([
        {"weekday": "Monday", "title": "Slay the Slime Lord"},
        {"weekday": "Tuesday", "title": "Find the Lost Amulet"},
        {"weekday": "Wednesday", "title": "Defeat the Dragon"},
    ])
    service = DailyQuestService(clock, api)
    assert service.daily_quest_title("u123") == "Find the Lost Amulet"


# ===== FADED EXAMPLE 2: student fills in the expected value =====
# The stub class, the FrozenClock, and the canned data are all provided.
# YOUR JOB: replace "FILL_IN_HERE" with the EXACT title the SUT should return.
# Compute it from the spec; don't run-and-paste.
def test_thursday_picks_thursday_quest():
    clock = FrozenClock(datetime(2026, 4, 30, 12, 0))  # 2026-04-30 is a Thursday
    api = StubQuestApiClient([
        {"weekday": "Monday", "title": "Slay the Slime Lord"},
        {"weekday": "Thursday", "title": "Battle the Lich King"},
        {"weekday": "Sunday", "title": "Save the Princess"},
    ])
    service = DailyQuestService(clock, api)
    # TODO: pin the exact title with `==` (strong oracle, Testing Foundations Step 3).
    assert service.daily_quest_title("u456") == "FILL_IN_HERE"
Test Stubs — Knowledge Check
Min. score: 80%
1. Which best describes a Test Stub?
Stub = canned answers. The SUT calls the stub; the stub returns whatever the test configured. Used to control what the SUT receives, not to inspect what the SUT does. (Step 3 covers the latter — that’s a Spy.)
2. Why is hardcoded datetime.now() (used directly inside the SUT) not a stub?
Stub = under the test’s control. datetime.now() is the opposite —
the wall clock is shared, mutable, and impossible for the test to
pin. Replacing it with FrozenClock(...) is what makes the
indirect input controllable.
3. (Spaced review — Testing Foundations Step 3) A teammate writes:
assert service.daily_quest_title("u123") is not None
Stubs and strong oracles solve independent problems. Stubs make indirect inputs controllable; oracles make assertions precise. You need both. Putting a weak oracle inside a stubbed test is a Liar test wearing a stub’s clothes.
4. When would a Fake Object (in-memory implementation) be a better choice than a Test Stub?
Stub: one canned answer per call. Fake: working in-memory implementation, useful when the SUT needs consistent stateful behavior across multiple calls (add → fetch → update → fetch again, etc.). Step 6’s decision guide covers when each fits.
Hand-Rolled Spy: Did the Ledger Actually Get the Gold?
🎯 Goal: Verify that the SUT called a collaborator with the right arguments — even when the SUT itself returns nothing observable. Implement a Test Spy in plain Python, and pin exactly the right amount of detail in the assertion. 🧠 Skills you’ll gain: Recognize a Test Spy as a role (Meszaros, p.538) — a stub that also records calls. Write spy assertions that are strong enough to catch bugs but loose enough to survive harmless refactors.
🧭 Bridge from Step 2. A stub answers the SUT’s questions. A spy also records what the SUT did. The new conceptual move:
| | Stub (Step 2) | Spy (Step 3) |
|---|---|---|
| What the test asserts on | The SUT’s return value | The recorded calls on the spy |
| What the SUT looks like | A function that returns something | Often a method that returns None (fire-and-forget) |
| Verification kind | State verification (Meszaros, p.462) | State verification of the spy — Step 5 will introduce the third kind |
The new collaborator is RewardLedger — its job is to credit gold to a user. The SUT calls ledger.credit(user_id, gold) and that’s the only observable effect. The SUT itself returns nothing useful — the call to credit IS the contract. To verify it, we need a spy.
📖 What is a Test Spy? (Meszaros, xUnit Test Patterns, p.538)
A Test Spy behaves like a stub and records every call made to it. The test runs the SUT, then inspects the spy’s recorded-call list. Same SUT/collaborator structure as Step 2; what changes is what the test asserts on.
flowchart LR
T["Test"]:::test --> S["DailyQuestService"]:::sut
S -->|"clock.now()"| C1["FrozenClock<br/>📅 STUB"]:::stub
S -->|"api.fetch_quests(...)"| C2["StubQuestApiClient<br/>📋 STUB"]:::stub
S -->|"ledger.credit(u1, 100)"| C3["SpyLedger<br/>🎙️ SPY<br/><i>records every call</i>"]:::spy
T -.->|"asserts on spy.calls"| C3
classDef test fill:#e3f2fd,stroke:#1565c0,color:#0d47a1
classDef sut fill:#fff3e0,stroke:#e65100,color:#bf360c
classDef stub fill:#e8f5e9,stroke:#2e7d32,color:#1b5e20
classDef spy fill:#f3e5f5,stroke:#6a1b9a,color:#4a148c
Notice the test now asserts on spy.calls, not on the SUT’s return value. The contract being verified is “the SUT called credit with these arguments”.
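The SpyLedger in this step only needs to record, because credit returns nothing. To illustrate the "behaves like a stub and records" half of the definition, here is a hedged sketch of a spy that does both. SpyQuestApiClient is a hypothetical class made up for this example; it does not appear in the exercise files.

```python
class SpyQuestApiClient:
    """Hypothetical spy: returns canned quests AND records each user_id it was asked about."""

    def __init__(self, canned_quests):
        self._canned = canned_quests
        self.calls = []

    def fetch_quests(self, user_id):
        self.calls.append(user_id)  # spy half: remember the call
        return self._canned         # stub half: answer with the canned value


canned = [{"weekday": "Tuesday", "title": "Find the Lost Amulet"}]
spy_api = SpyQuestApiClient(canned)

returned = spy_api.fetch_quests("u1")
recorded = spy_api.calls
```

A test could now assert on returned (state of the answer) or on recorded (which user the SUT asked about), depending on what the spec cares about.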
📖 The hard part isn’t writing the spy — it’s writing the assertion
A spy is even simpler than a stub: a class with a list and an append. The interesting test-design move is how much of each call to pin.
| Assertion | What still passes (i.e., what it misses) | Pattern |
|---|---|---|
| assert len(spy.calls) >= 0 | Everything. Always passes. Liar test. | Weak: same family as result is not None from Testing Foundations |
| assert spy.calls == [("u1", 100, "2026-04-28T12:00:00Z", {"meta": "blob"})] | Nothing, but it also breaks if the SUT later calls credit with cleaner arguments, even when the contract is unchanged. Brittle. | Over-specified |
| assert spy.calls == [("u1", 100)] | Nothing the spec cares about: a wrong user_id, a wrong gold amount, no call at all, or two calls all make it fail. Goldilocks. | Strong, behaviorally-bounded |
Same lesson as Testing Foundations Step 4: assert on exactly what the spec says — no less, no more. The spec for complete_quest: “credit the user the gold for the completed quest.” That maps to a 2-tuple (user_id, gold). Anything beyond that is over-specification; anything less is a Liar.
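One way to convince yourself of the difference is to run the Liar and Goldilocks assertions against deliberately broken code. The sketch below does that with three hypothetical SUT variants invented for this comparison (they are plain functions, not the real DailyQuestService).

```python
class SpyLedger:
    """Records every credit() call as a (user_id, gold) tuple."""

    def __init__(self):
        self.calls = []

    def credit(self, user_id, gold):
        self.calls.append((user_id, gold))


# Three hypothetical SUT variants, invented for this comparison only.
def correct_sut(ledger):
    ledger.credit("u1", 100)


def forgets_ledger_sut(ledger):
    pass  # bug: never credits anyone


def wrong_gold_sut(ledger):
    ledger.credit("u1", 10)  # bug: wrong amount


def run_both_oracles(sut):
    """Run one SUT variant against a fresh spy and evaluate both assertions."""
    spy = SpyLedger()
    sut(spy)
    return {
        "liar_passes": len(spy.calls) >= 0,
        "goldilocks_passes": spy.calls == [("u1", 100)],
    }


results = {
    "correct": run_both_oracles(correct_sut),
    "forgets_ledger": run_both_oracles(forgets_ledger_sut),
    "wrong_gold": run_both_oracles(wrong_gold_sut),
}
```

The Liar oracle reports a pass for all three variants, including the two buggy ones; the Goldilocks oracle passes only for the correct one.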
⚙️ Task — three small moves:
- Read test_complete_quest_LIAR_oracle. The assertion is assert len(spy.calls) >= 0; it always passes, regardless of whether the SUT called the spy at all. Add a Python comment above the assertion explaining (in your own words) why this is a Liar test. Use the phrase “Liar test” or “weak oracle”. Don’t change the assertion; the test stays a Liar so the lesson is preserved.
- Read and run test_complete_quest_credits_correct_gold, which is fully written and pins the exact 2-tuple. This is the Goldilocks shape.
- Fill in the assertion in test_award_streak_bonus_5_days. The streak-bonus rule: 10 gold per day, capped at 100. The test passes days=5. Compute the gold; pin the call.
📖 Why fire-and-forget methods need spies
complete_quest returns None. From the SUT’s caller’s perspective, nothing happens — the function is “void”. Yet the SUT did do something important: it told the ledger to credit gold. Without a spy, that work is invisible to the test.
A spy makes invisible side effects visible. The idea is the same in every language: Java mocks (Mockito.verify(...)), JavaScript spies (jest.fn() + expect(spy).toHaveBeenCalledWith(...)), Python’s unittest.mock recorded calls. When a fire-and-forget method’s only effect is a call to a collaborator, recording that call is the most direct way to test it without touching the real collaborator.
🌍 The same idea in another language
JavaScript with Jest:
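// clock and api are assumed to be stubs created earlier in the test.
const ledger = { credit: jest.fn() };  // function spy, wired in as the ledger
const service = new DailyQuestService(clock, api, ledger);
service.completeQuest('u1', 'Slay the Slime Lord');
expect(ledger.credit).toHaveBeenCalledWith('u1', 100);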
Java with Mockito:
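// clock and api are assumed to be stubs created earlier in the test.
RewardLedger spy = mock(RewardLedger.class);  // also acts as a spy
DailyQuestService service = new DailyQuestService(clock, api, spy);
service.completeQuest("u1", "Slay the Slime Lord");
verify(spy).credit("u1", 100);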
Same role; different syntax. The hand-rolled SpyLedger class makes the recording mechanism visible; framework spies (Step 4) hide the boilerplate.
🔭 Coming in Step 4: Hand-rolling spies gets repetitive — you’re writing the same self.calls.append(...) boilerplate every time. Python’s unittest.mock.Mock generates the entire SpyLedger class for you in a single line. But it’s the same conceptual object — just less typing.
"""The real reward ledger — would persist gold to a database in production."""
class RewardLedger:
    def credit(self, user_id: str, gold: int) -> None:
        # In production: writes a credit row to the rewards database.
        raise NotImplementedError(
            "Don't call the real ledger in tests; pass a SpyLedger instead."
        )
"""QuestForge — daily quest service with reward ledger collaborator."""
import datetime

QUEST_REWARDS = {
    "Slay the Slime Lord": 100,
    "Find the Lost Amulet": 150,
    "Battle the Lich King": 250,
    "Defeat the Dragon": 500,
}


def is_today_event_day(event_date_str: str, clock=datetime.datetime) -> bool:
    today = clock.now().strftime("%Y-%m-%d")
    return today == event_date_str


class DailyQuestService:
    """Picks today's quest, completes quests, and awards streak bonuses."""

    def __init__(self, clock, api, ledger=None):
        self._clock = clock
        self._api = api
        self._ledger = ledger

    def daily_quest_title(self, user_id: str) -> str:
        try:
            quests = self._api.fetch_quests(user_id)
        except ConnectionError:
            return "No quests today"
        if not quests:
            return "No quests today"
        weekday = self._clock.now().strftime("%A")
        for quest in quests:
            if quest["weekday"] == weekday:
                return quest["title"]
        return "No quests today"

    def complete_quest(self, user_id: str, quest_title: str) -> None:
        """Credit the user the gold for the completed quest. Returns None."""
        gold = QUEST_REWARDS.get(quest_title, 0)
        self._ledger.credit(user_id, gold)

    def award_streak_bonus(self, user_id: str, days: int) -> None:
        """Award 10 gold per streak day, capped at 100. Returns None."""
        gold = min(days * 10, 100)
        self._ledger.credit(user_id, gold)
"""Step 3 — Hand-rolled spies for fire-and-forget collaborator calls.
A spy is a stub that ALSO records calls. The interesting test-design
move isn't writing the spy — it's writing the assertion. Pin exactly
what the spec mandates: no less (Liar), no more (over-specified).
"""
from datetime import datetime

from clock import FrozenClock
from quest_service import DailyQuestService


class StubQuestApiClient:
    def __init__(self, canned_quests):
        self._canned = canned_quests

    def fetch_quests(self, user_id):
        return self._canned


class SpyLedger:
    """A Test Spy (Meszaros, p.538): records every credit() call."""

    def __init__(self):
        self.calls = []

    def credit(self, user_id, gold):
        self.calls.append((user_id, gold))


# ===== WORKED EXAMPLE 1: the Liar test =====
# This assertion ALWAYS passes, even if the SUT never called the spy.
# YOUR JOB: add a Python comment ABOVE the assertion explaining (in
# your own words) why this is a "Liar test" / "weak oracle".
# Don't change the assertion; keep the Liar visible for comparison.
def test_complete_quest_LIAR_oracle():
    spy = SpyLedger()
    service = DailyQuestService(
        FrozenClock(datetime(2026, 4, 28, 12, 0)),
        StubQuestApiClient([]),
        spy,
    )
    service.complete_quest("u1", "Slay the Slime Lord")
    # TODO: add a comment HERE explaining the Liar pattern.
    assert len(spy.calls) >= 0


# ===== WORKED EXAMPLE 2: Goldilocks =====
# Pins exactly the (user_id, gold) the spec mandates. Read and run.
def test_complete_quest_credits_correct_gold():
    spy = SpyLedger()
    service = DailyQuestService(
        FrozenClock(datetime(2026, 4, 28, 12, 0)),
        StubQuestApiClient([]),
        spy,
    )
    service.complete_quest("u1", "Slay the Slime Lord")
    # Slay the Slime Lord rewards 100 gold (per QUEST_REWARDS in quest_service.py).
    assert spy.calls == [("u1", 100)]


# ===== FADED EXAMPLE 3: student writes the expected call =====
# The SUT is `award_streak_bonus(user_id, days)`.
# Spec: 10 gold per day, capped at 100.
# YOUR JOB: replace the placeholder gold value with the correct one
# for `days=5`. Compute it from the spec.
def test_award_streak_bonus_5_days():
    spy = SpyLedger()
    service = DailyQuestService(
        FrozenClock(datetime(2026, 4, 28, 12, 0)),
        StubQuestApiClient([]),
        spy,
    )
    service.award_streak_bonus("u2", 5)
    # TODO: replace 999 with the correct gold for a 5-day streak.
    assert spy.calls == [("u2", 999)]
Test Spies — Knowledge Check
Min. score: 80%
1. What is the defining role of a Test Spy that distinguishes it from a Test Stub?
Spy = stub + call recording. The test asserts on the recorded
call list (spy.calls), which is how we verify that the SUT
did something — even when “did something” leaves no observable
return value.
2. (Spaced review — Testing Foundations Step 3) A teammate asserts:
assert len(spy.calls) >= 0
The Liar pattern is independent of the assertion operator. The
issue is the assertion’s expression — len(...) >= 0 is
structurally trivial. Replace it with assert spy.calls == [...]
pinning the exact expected call.
3. Which spy assertion is brittle (would break under a harmless internal refactor)?
Brittle = pins details outside the spec. The 4-tuple includes a
timestamp and a metadata dict that aren’t part of the credit
contract — they’re internals. A pure refactor that drops the
metadata would break this test even though credit(user_id, gold)
is still being called correctly. (Same family as the
internal-coupling brittleness from Testing Foundations Step 4.)
4. (Spaced review — Step 2) Stub vs Spy in one sentence:
Stub: "control what the SUT receives." Spy: "observe what the SUT did." Same role-vs-syntax distinction as Step 2 — these are test-design roles, independent of whether you hand-roll them or generate them with a library (Step 4 incoming).
Meet `unittest.mock`: Same Roles, Less Typing
🎯 Goal: Re-recognize the stubs and spies you wrote by hand in Steps 2-3, now generated by unittest.mock.Mock in a single line. See three syntactic forms of the same conceptual stub side-by-side. 🧠 Skills you’ll gain: Read Mock(return_value=...) and recognize it as a stub. Read a Mock with assert_called_once_with(...) and recognize it as a spy. Use side_effect to simulate collaborator failures.
🧭 Bridge from Steps 2-3. You wrote StubQuestApiClient and SpyLedger by hand. The recording boilerplate (self.calls.append(...)) gets repetitive. Python’s unittest.mock.Mock is a class that generates the same conceptual object on demand:
- Set api.fetch_quests.return_value = [...] → api.fetch_quests(...) returns that list. (Stub.)
- Set api.fetch_quests.side_effect = ConnectionError → api.fetch_quests(...) raises. (Failing stub.)
- Call api.fetch_quests("u1") → Mock auto-records the call; api.fetch_quests.assert_called_once_with("u1") checks the recording. (Spy.)
One class, three roles — depending on what the test asks of it. The role isn’t determined by the class; it’s determined by what the test does with it.
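The three bullets can be tried directly on a bare Mock, without the SUT. This is a minimal sketch using only unittest.mock; the canned quest list is illustrative.

```python
from unittest.mock import Mock

canned = [{"weekday": "Tuesday", "title": "Find the Lost Amulet"}]

# Role 1, stub: configure a canned answer.
api = Mock()
api.fetch_quests.return_value = canned
returned = api.fetch_quests("u1")

# Role 2, spy: the same object recorded the call above automatically.
api.fetch_quests.assert_called_once_with("u1")  # raises AssertionError on mismatch
recorded_args = api.fetch_quests.call_args.args

# Role 3, failing stub: configure the call to raise instead of return.
failing_api = Mock()
failing_api.fetch_quests.side_effect = ConnectionError
try:
    failing_api.fetch_quests("u1")
    raised = False
except ConnectionError:
    raised = True
```

Nothing about the Mock class changed between the three uses; only the configuration and the question the test asked of it.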
📖 The verbatim teaching sentence — louder this time
“Mock is a tool class; stub, spy, and mock are test-design roles. Same in Python, JavaScript, and Java: the role is what matters; the class name is just syntax.”
unittest.mock.Mock is the most overloaded class name in Python testing. It is not a “Mock object” in Meszaros’ sense (Step 5 will introduce that role). It’s a tool — a configurable double that can play stub, spy, or mock depending on how the test uses it.
⚠️ Why this matters for your career
Reading other people’s tests, you’ll see Mock everywhere. Most uses are stubs in disguise (Mock(return_value=...)). When someone says “I added a mock for the database,” nine times out of ten they actually added a stub. Recognizing the role behind the class name is the difference between parroting Mock syntax and understanding what the test verifies.
⚙️ Task — read four tests, fill in one slot:
- Read test_a_handrolled_stub: the Step 2 hand-rolled style, for comparison.
- Read test_b_mock_return_value: same SUT, same role, generated by Mock. Confirm both pass and verify the same behavior.
- Read test_c_mock_as_spy: the same Mock class, now playing the spy role. Notice that nothing about Mock changes between Test B and Test C, only what the test does with it.
- Fill in test_d_side_effect_simulates_api_failure: replace the placeholder exception class. Read DailyQuestService.daily_quest_title to find which exception it catches; use that class. The concept: a stub doesn’t have to return. It can raise, simulating a real-world failure path.
📖 return_value vs side_effect — concept-level contrast
| Attribute | What it does | When to reach for it |
|---|---|---|
| mock.return_value = X | Calls return X (a canned answer) | The collaborator should succeed; you want to drive the SUT down a happy-path partition. |
| mock.side_effect = Exception | Calls raise the exception | The collaborator should fail; you want to drive the SUT down its error-handling branch. |
| mock.side_effect = [a, b, c] | First call returns a, second b, third c | The collaborator returns different values across the test sequence. |
| mock.side_effect = my_function | Calls invoke my_function(*args) | The return value depends dynamically on the arguments. |
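Both attributes are configurations of the same Mock object, and they answer different test-design questions. If both are set, side_effect takes precedence; return_value is used only when side_effect is unset or when a side_effect function returns unittest.mock.DEFAULT.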
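The list and function forms of side_effect are easier to see in action. The following is a small sketch on a bare Mock; the quest strings and the quests_for helper are made up for illustration. It also shows what happens when the list runs out and when both attributes are set.

```python
from unittest.mock import Mock

api = Mock()

# List form: each call consumes the next item.
api.fetch_quests.side_effect = [["quest A"], ["quest B"]]
first = api.fetch_quests("u1")
second = api.fetch_quests("u1")
try:
    api.fetch_quests("u1")  # a third call has nothing left to return
    exhausted = False
except StopIteration:
    exhausted = True


# Function form: the answer is computed from the call's arguments.
def quests_for(user_id):
    """Hypothetical helper, used only in this sketch."""
    return [f"quest for {user_id}"]


api.fetch_quests.side_effect = quests_for
dynamic = api.fetch_quests("u7")

# With both attributes set, side_effect is the one that answers.
api.fetch_quests.return_value = ["fallback"]
still_dynamic = api.fetch_quests("u8")

# Clearing side_effect lets return_value answer again.
api.fetch_quests.side_effect = None
fallback = api.fetch_quests("u9")
```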
📖 What about `monkeypatch`?
pytest’s monkeypatch fixture is another way to swap a collaborator at test time — particularly useful when the collaborator is a module-level function or constant that the SUT imports, rather than a constructor parameter:
def test_with_monkeypatch(monkeypatch):
    # Replace QUEST_REWARDS at the module level for this one test only.
    # monkeypatch automatically restores it after the test.
    monkeypatch.setattr("quest_service.QUEST_REWARDS", {"Slay the Slime Lord": 9999})
    spy = Mock()
    service = DailyQuestService(FrozenClock(...), Mock(), spy)
    service.complete_quest("u1", "Slay the Slime Lord")
    spy.credit.assert_called_once_with("u1", 9999)
monkeypatch.setattr(target, value) replaces target with value. After the test, monkeypatch restores the original — automatically. The auto-cleanup is what makes monkeypatch safe: a manual replacement that you forgot to restore would leak into every subsequent test.
Conceptually, monkeypatch.setattr is a stub — you’re feeding the SUT a controlled value. Same role; different syntactic vehicle. Use it when the seam is at module level rather than at constructor level.
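The restore-on-exit guarantee can be pictured with a few lines of standard-library Python. This is only a sketch of the idea, not pytest's implementation: swapped_attr and the settings object are hypothetical names invented for the illustration.

```python
from contextlib import contextmanager
from types import SimpleNamespace

# Hypothetical stand-in for a module that holds a constant.
settings = SimpleNamespace(QUEST_REWARDS={"Slay the Slime Lord": 100})


@contextmanager
def swapped_attr(target, name, value):
    """Temporarily replace target.<name>, restoring it even if the body raises."""
    original = getattr(target, name)
    setattr(target, name, value)
    try:
        yield
    finally:
        # The restore step a hand-written setattr tends to forget.
        setattr(target, name, original)


with swapped_attr(settings, "QUEST_REWARDS", {"Slay the Slime Lord": 9999}):
    inside = settings.QUEST_REWARDS["Slay the Slime Lord"]

after = settings.QUEST_REWARDS["Slay the Slime Lord"]
```

The finally clause is the whole point: the original value comes back whether the body passes, fails an assertion, or raises.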
Step 5 will use the heavier unittest.mock.patch (decorator/context manager) for the same purpose — and explore the canonical pitfall: where in the namespace to patch.
🌍 The same idea in another language
JavaScript with Jest:
const api = { fetchQuests: jest.fn().mockReturnValue([...]) }; // stub
// OR
const api = { fetchQuests: jest.fn().mockImplementation(() => { throw new Error('boom'); }) }; // failing stub (Jest's counterpart to side_effect)
Java with Mockito:
QuestApiClient api = mock(QuestApiClient.class);
when(api.fetchQuests(anyString())).thenReturn(List.of(...)); // stub
// OR
when(api.fetchQuests(anyString())).thenThrow(new ConnectionException()); // failing stub
Same conceptual moves: tell the double “return X” or “raise X.” The names of the methods differ across libraries — the roles don’t.
🔭 Coming in Step 5: Mock can also play the third role — Mock Object in Meszaros’ strict sense (behavior verification). To see it cleanly, we need one more idea: patch(), and where in the namespace to patch. That’s the #1 Python-mocking pitfall.

"""Step 4 — unittest.mock generates the same conceptual objects you wrote by hand.

Four tests below, all testing the same SUT (DailyQuestService). They
differ only in HOW the double is constructed and what role it plays.
Read them as a side-by-side comparison.
"""
from unittest.mock import Mock
from datetime import datetime
from clock import FrozenClock
from quest_service import DailyQuestService


# Hand-rolled stub class (Step 2 style) — kept for direct comparison.
class StubQuestApiClient:
    def __init__(self, canned_quests):
        self._canned = canned_quests

    def fetch_quests(self, user_id):
        return self._canned


# ===== TEST A — Hand-rolled stub (Step 2 style) =====
def test_a_handrolled_stub():
    clock = FrozenClock(datetime(2026, 4, 28, 12, 0))
    api = StubQuestApiClient([
        {"weekday": "Tuesday", "title": "Find the Lost Amulet"},
    ])
    service = DailyQuestService(clock, api)
    assert service.daily_quest_title("u1") == "Find the Lost Amulet"


# ===== TEST B — Mock with return_value (same ROLE: stub) =====
# `Mock()` creates an auto-magic object. Setting
# `api.fetch_quests.return_value = [...]` configures what
# `api.fetch_quests(anything)` returns. Functionally equivalent to
# the StubQuestApiClient class above — just no class definition.
def test_b_mock_return_value():
    clock = FrozenClock(datetime(2026, 4, 28, 12, 0))
    api = Mock()
    api.fetch_quests.return_value = [
        {"weekday": "Tuesday", "title": "Find the Lost Amulet"},
    ]
    service = DailyQuestService(clock, api)
    assert service.daily_quest_title("u1") == "Find the Lost Amulet"


# ===== TEST C — Mock used as a SPY (different ROLE, same class) =====
# Watch this carefully: `Mock` is the same class as Test B's. But
# we're using it as a SPY — recording the call to `credit` and
# asserting on the recording afterwards. The role isn't determined
# by the class; it's determined by what we DO with it.
def test_c_mock_as_spy():
    clock = FrozenClock(datetime(2026, 4, 28, 12, 0))
    api = Mock()
    api.fetch_quests.return_value = []  # api still acts as stub
    ledger = Mock()  # ledger plays SPY
    service = DailyQuestService(clock, api, ledger)
    service.complete_quest("u1", "Slay the Slime Lord")
    # Mock auto-records every call; `assert_called_once_with` checks the recording.
    # This is identical in spirit to: assert ledger.calls == [("u1", 100)]
    # — just generated automatically.
    ledger.credit.assert_called_once_with("u1", 100)


# ===== TEST D — fill in the side_effect =====
# The SUT catches ConnectionError and returns "No quests today".
# Use side_effect to make the stub RAISE that exception instead of returning.
# YOUR JOB: replace `ValueError` (the wrong exception) with the right one.
# Read DailyQuestService.daily_quest_title in quest_service.py to confirm
# which exception class is caught.
def test_d_side_effect_simulates_api_failure():
    clock = FrozenClock(datetime(2026, 4, 28, 12, 0))
    api = Mock()
    # TODO: replace ValueError with the exception class the SUT catches.
    api.fetch_quests.side_effect = ValueError
    service = DailyQuestService(clock, api)
    assert service.daily_quest_title("u1") == "No quests today"
unittest.mock — Knowledge Check
Min. score: 80%
1. Given:
api = Mock()
api.fetch_quests.return_value = [{"weekday": "Tuesday", "title": "..."}]
What role is api playing here?
Mock(return_value=X) is the framework’s way of writing what
you wrote by hand as class StubX: def method(self): return X.
Same role; less typing. The class is Mock; the role is stub.
(Verbatim teaching sentence in action.)
2. When should you reach for side_effect instead of return_value?
return_value: one canned answer for every call.
side_effect: dynamic — exception-raising, sequenced returns,
or computed-from-args. Pick based on what the test needs the
collaborator to do, not by what looks shorter.
3. A teammate writes:
ledger.credit.assrt_called_once_with("u1", 100) # typo
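The typo trap. Mock’s auto-attribute behavior, convenient for
quickly stubbing nested attribute chains, can also turn a misspelled
assert_* method into a fresh child Mock: the line runs and verifies
nothing. Since Python 3.5, Mock guards the most common misspellings:
attribute names starting with assert, assret, asert, aseert, or assrt
raise AttributeError unless the Mock was created with unsafe=True, so
this exact typo fails loudly on a current interpreter. Misspellings
outside those prefixes still slip through. Step 5 examines how much
autospec=True adds; spelling assert_called_once_with carefully
remains the first defense.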
4. (Spaced review — TDD) During the Red-Green-Refactor cycle, when do you typically introduce a Mock?
Red is the test-design moment. Choosing stub/spy/mock/fake/no-double is a Red-phase decision because it shapes both the test’s structure and (often) the production design that emerges in Green. (Step 6 covers when not to double — also a Red-phase decision.)
5. Why does it matter that pytest’s monkeypatch fixture automatically restores the original value?
Test isolation. A test that patches a module attribute and
forgets to restore it leaves a time bomb for every subsequent
test. monkeypatch and with patch(...) both handle restoration
for you; manual setattr/delattr does not. Always prefer the
framework-managed forms.
Where to Patch — The #1 Python Pitfall, and Why autospec Defends You
🎯 Goal: Recognize and fix the wrong-namespace patch — the most common Python-mocking failure mode. See autospec=True defend you against signature drift, and learn how far it helps with the typo trap. 🧠 Skills you’ll gain: Pick the correct patch() target string by tracing the SUT’s import. Recognize when a Mock has no spec and the risks that come with that.
🧭 Bridge from Step 4. Step 4 used Mocks at constructor parameters — DailyQuestService(clock, api, ledger) accepts the doubles directly. Sometimes that’s not possible: the SUT might call a module-level function directly, with no constructor parameter to swap. Then we use unittest.mock.patch() — and confront the canonical Python pitfall: where in the namespace does the patch belong?
📖 The new SUT — celebrate_milestone
Look at quest_service.py. There’s a new method celebrate_milestone(user_id, days) that calls send_push(...) from push_notifier. The import line in quest_service.py is:
from push_notifier import send_push
That single line is the source of every where-to-patch confusion in Python. After this import, send_push is bound in quest_service’s namespace. The quest_service module now has its own reference to the function — separate from push_notifier’s.
flowchart LR
subgraph push_mod["push_notifier module"]
P_DEF["send_push<br/>= <real function>"]:::neutral
end
subgraph quest_mod["quest_service module"]
Q_REF["send_push<br/>= <ref to real function>"]:::neutral
Q_USE["celebrate_milestone<br/>calls send_push(...)<br/>looks up 'send_push' HERE"]:::sut
Q_REF -.->|"looked up in<br/>this namespace"| Q_USE
end
P_DEF -->|"from push_notifier import send_push<br/>copies the reference"| Q_REF
classDef neutral fill:#fafafa,stroke:#bdbdbd,color:#424242
classDef sut fill:#fff3e0,stroke:#e65100,color:#bf360c
📜 The rule
Patch where the SUT looks up the name — not where it was originally defined.
celebrate_milestone does send_push(...). Python finds that name by looking it up in quest_service’s namespace (the importing module). So the patch target is "quest_service.send_push", not "push_notifier.send_push". Patching the latter replaces the attribute on push_notifier but has no effect on the SUT, because quest_service already holds its own reference to the original function.
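The rule can be observed without any files on disk. The sketch below builds two throwaway in-memory modules, demo_push_notifier and demo_quest_service (hypothetical names that mimic the tutorial's layout), and patches each namespace in turn. Registering modules in sys.modules by hand is a demonstration device here, not something the exercise itself does.

```python
import sys
import types
from unittest.mock import patch

# Module 1: defines the real function.
provider = types.ModuleType("demo_push_notifier")
exec("def send_push(user_id, message):\n    return 'REAL'\n", provider.__dict__)
sys.modules["demo_push_notifier"] = provider

# Module 2: imports the name with `from ... import ...` and calls it.
consumer = types.ModuleType("demo_quest_service")
sys.modules["demo_quest_service"] = consumer
exec(
    "from demo_push_notifier import send_push\n"
    "def celebrate(user_id):\n"
    "    return send_push(user_id, 'hi')\n",
    consumer.__dict__,
)

# Wrong target: replaces the name where it was DEFINED.
with patch("demo_push_notifier.send_push") as wrong_mock:
    wrong_result = consumer.celebrate("u1")
    wrong_called = wrong_mock.called

# Right target: replaces the name where the caller LOOKS IT UP.
with patch("demo_quest_service.send_push") as right_mock:
    right_mock.return_value = "FAKE"
    right_result = consumer.celebrate("u1")
    right_called = right_mock.called

# Clean up the throwaway modules.
del sys.modules["demo_push_notifier"], sys.modules["demo_quest_service"]
```

With the wrong target the real function still runs and the mock records nothing; with the right target the mock intercepts the call.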
Part A — Predict and fix the patch target
⚙️ Task: open test_celebrate.py. The patch target is currently wrong. Run the test (it fails). Read the failure carefully — mock_send was never called, even though the SUT did run celebrate_milestone. That’s the signature of a wrong-namespace patch.
Then fix it: change the patch target string to the right one. Re-run.
💡 Pedagogical note. Your fix is one string change. The conceptual move is naming where the SUT looks the name up. That insight ports to JavaScript (CommonJS’ const { y } = require('x') has the same trap) and Java (static imports have a similar effect). Once you internalize the rule, you stop being trapped by the syntax.
Part B — autospec is a design guardrail, not a syntactic flourish
Read the second pair of tests in the file: test_loose_mock_accepts_wrong_call and test_autospec_rejects_wrong_call. Both run successfully — but they verify very different things.
| | Loose Mock (no spec) | Autospec’d Mock |
|---|---|---|
| Setup | with patch("X") as m: | with patch("X", autospec=True) as m: |
| What m(wrong_args) does | Silently records the call | Raises TypeError because the real function’s signature is enforced |
| What m.assrt_called_once_with(...) (typo) does | Raises AttributeError on Python 3.5+, because Mock rejects names starting with assrt unless unsafe=True. A misspelling outside the guarded prefixes silently auto-creates an attribute and returns yet another Mock. | Usually raises AttributeError, since the spec limits attribute access to names the real object has. Treat that as a bonus: autospec defends primarily against call-signature drift. Use linters and careful review as the typo defense. |
| When you’d want it | Quick exploratory test where signature isn’t a concern | Default-safe habit for any patched callable — catches signature drift the moment a teammate’s refactor breaks the contract |
The pedagogical takeaway: autospec=True is a design guardrail. It says “make this Mock as strict as the real thing it’s replacing.” Without it, your test silently accepts calls that the real function would reject — until production catches it for you, which is the worst place to find out.
📖 Behavior verification — the third kind
Steps 2 and 3 used state verification: stubs feed inputs, the test asserts on the SUT’s return value or on the spy’s recorded list. The SUT’s internal call sequence was incidental.
test_celebrate_milestone_sends_push (after you fix the patch target) is different. The SUT returns None. Nothing in its observable state changes. The call itself is the entire contract. We assert that mock_send was called once with specific arguments. That’s behavior verification (Meszaros, p.468).
A Mock configured with call assertions is, in Meszaros’ strict sense, a Mock Object (p.544). The role isn’t “what class did you instantiate” — it’s “what does the test verify, and how?”
| Role | What the test verifies | Verification kind |
|---|---|---|
| Stub | The SUT’s return value (driven by canned indirect inputs) | State |
| Spy | The recorded call list, after the fact | State (of the spy) |
| Mock Object | The interaction itself, often with strict expectations | Behavior |
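The Spy and Mock Object rows can be compared side by side on one small function. This is a sketch under assumptions: celebrate and SpyNotifier are hypothetical stand-ins, simpler than the tutorial's DailyQuestService, chosen so the snippet runs on its own.

```python
from unittest.mock import Mock


class SpyNotifier:
    """Hand-rolled spy: records every call so the test can inspect it later."""

    def __init__(self):
        self.calls = []

    def send(self, user_id, message):
        self.calls.append((user_id, message))


def celebrate(notifier, user_id, days):
    """Hypothetical SUT: returns None; its only effect is the outbound call."""
    if days % 7 == 0:
        notifier.send(user_id, f"{days}-day streak!")


# Spy style: run the SUT, then read the recorded list.
spy = SpyNotifier()
celebrate(spy, "u1", 7)
spy_calls = spy.calls

# Mock Object style: the assertion targets the interaction itself.
mock_notifier = Mock()
celebrate(mock_notifier, "u1", 7)
mock_notifier.send.assert_called_once_with("u1", "7-day streak!")

# The negative case matters too: no notification off-milestone.
quiet = Mock()
celebrate(quiet, "u1", 8)
quiet.send.assert_not_called()
```

Both styles check the same contract; the difference is whether the test reads a recording or asks the double to verify the call.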
🌍 The same idea in another language
JavaScript with Jest (CommonJS): Same trap exists.
// questService.js
const { sendPush } = require('./pushNotifier');
function celebrateMilestone(...) { sendPush(...); }
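jest.mock('./pushNotifier') works because Jest hoists the call above the require statements and replaces the module before questService.js loads it. The trap appears when a test swaps a property on the module object after the consumer has already destructured it (for example with jest.spyOn): questService.js keeps its own reference to the original function, which is the same family of problem.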
Java with Mockito static imports: Less prone to this since Java imports are class-level and Mockito patches at the type level. But PowerMock for static methods has its own where-to-patch dance.
The general lesson, language-independent: a call resolves a name in the namespace of the module that makes the call, not the module that defined it. Patch there.
🧠 The typo trap and `autospec` — the precise truth
A common claim: “autospec catches typos like assrt_called_once_with.” Half-true. Here’s the precise picture.
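autospec=True constrains the Mock to the spec of the patched object: its arguments, its attributes (if it’s a class), its method signatures. Attribute access is restricted to names the real object has, plus the Mock’s own assert_* methods, so a misspelled assertion name usually raises AttributeError instead of creating a child Mock. Independently of autospec, Mock itself has rejected attribute names starting with assert, assret, asert, aseert, or assrt since Python 3.5 (unless unsafe=True), so mock.assrt_called_once_with fails even on a loose Mock; misspellings outside those prefixes do not.
The reliable defense against assertion-name typos is layered: keep a spec on the mock, spell the assert_* methods carefully, and treat static tooling as a helper whose coverage you have checked, since Mock is designed to accept arbitrary attribute names. Don’t rely on autospec alone for typo prevention.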
The reliable defense against signature drift (calling send_push("u1") when the real function needs send_push("u1", "msg")): autospec catches this immediately. That’s the use case worth the keystrokes.
🔭 Coming in Step 6: You can build any of the three roles and you know the patching pitfalls. The harder skill is choosing which one, and choosing none at all when over-mocking would make the test brittle.

"""The real push-notification service — would call APNS / FCM in production."""


def send_push(user_id: str, message: str) -> None:
    # In production: dispatches a real push notification.
    # The print is a teaching aid — if you see this in test output,
    # the patch DIDN'T intercept and the real function ran.
    print(f"📲 REAL send_push fired: user={user_id!r}, message={message!r}")

"""QuestForge — daily quest service with milestone celebration."""
import datetime
from push_notifier import send_push

QUEST_REWARDS = {
    "Slay the Slime Lord": 100,
    "Find the Lost Amulet": 150,
    "Battle the Lich King": 250,
    "Defeat the Dragon": 500,
}


def is_today_event_day(event_date_str: str, clock=datetime.datetime) -> bool:
    today = clock.now().strftime("%Y-%m-%d")
    return today == event_date_str


class DailyQuestService:
    def __init__(self, clock, api, ledger=None):
        self._clock = clock
        self._api = api
        self._ledger = ledger

    def daily_quest_title(self, user_id: str) -> str:
        try:
            quests = self._api.fetch_quests(user_id)
        except ConnectionError:
            return "No quests today"
        if not quests:
            return "No quests today"
        weekday = self._clock.now().strftime("%A")
        for quest in quests:
            if quest["weekday"] == weekday:
                return quest["title"]
        return "No quests today"

    def complete_quest(self, user_id: str, quest_title: str) -> None:
        gold = QUEST_REWARDS.get(quest_title, 0)
        self._ledger.credit(user_id, gold)

    def award_streak_bonus(self, user_id: str, days: int) -> None:
        gold = min(days * 10, 100)
        self._ledger.credit(user_id, gold)

    def celebrate_milestone(self, user_id: str, days: int) -> None:
        """When a streak hits a multiple of 7, send a push notification."""
        if days % 7 == 0:
            send_push(user_id, f"🎉 {days}-day streak!")

"""Step 5 — Where-to-patch and autospec.

Three tests below. Tests B and C are correct as-is and demonstrate
autospec's value. Test A's PATCH TARGET IS WRONG — fix it.
"""
from unittest.mock import Mock, patch
from datetime import datetime
from clock import FrozenClock
from quest_service import DailyQuestService


def _service():
    return DailyQuestService(FrozenClock(datetime(2026, 4, 28, 12, 0)), Mock(), Mock())


# ===== TEST A — Part A: patch target is WRONG. Fix it. =====
# Run this test as-is. It FAILS — `mock_send.assert_called_once_with(...)`
# complains the mock was never called. That's the symptom of a
# wrong-namespace patch: the real send_push ran, the mock did nothing.
# YOUR JOB: change the patch target string from "push_notifier.send_push"
# to the correct one. Read `quest_service.py`'s import line — the SUT
# looks the name up in *which* namespace?
def test_celebrate_milestone_sends_push():
    service = _service()
    # ← FIX THE STRING BELOW. It's wrong.
    with patch("push_notifier.send_push") as mock_send:
        service.celebrate_milestone("u1", 7)
        mock_send.assert_called_once_with("u1", "🎉 7-day streak!")


# ===== TEST B — Part B: a LOOSE Mock accepts a wrong-signature call =====
# The real send_push takes 2 arguments (user_id, message).
# Without autospec, the Mock will silently accept a 1-argument call.
# Watch what gets through.
def test_loose_mock_accepts_wrong_call():
    with patch("quest_service.send_push") as mock_send:
        # Imagine a teammate's refactor that drops the message arg
        # (real production bug). The Mock has no spec — it accepts.
        mock_send("u1")  # Real send_push REQUIRES 2 args; Mock doesn't care.
        # The recorded call passes assertion. The bug slipped through.
        mock_send.assert_called_once_with("u1")


# ===== TEST C — Part B: autospec REJECTS the wrong-signature call =====
# With autospec=True, the Mock matches the real function's signature.
# Calling it with the wrong number of arguments raises TypeError.
def test_autospec_rejects_wrong_call():
    with patch("quest_service.send_push", autospec=True) as mock_send:
        try:
            mock_send("u1")  # Same bad call as Test B — autospec catches it
            assert False, "autospec should have raised TypeError"
        except TypeError as e:
            # autospec correctly rejected the call. The signature was enforced.
            print(f"✅ autospec caught it: {e}")
Where to Patch + autospec — Knowledge Check
Min. score: 80%
1. quest_service.py does:
from push_notifier import send_push
celebrate_milestone calls send_push(...). Which patch target intercepts the call?
The rule: patch where the SUT looks up the name, not where it
was defined. After from X import Y, the name Y is bound in the
importing module — that’s where the SUT will resolve it. The same
principle applies to JavaScript CommonJS, Java static imports, and
any language with import scoping.
2. What does autospec=True primarily defend against?
autospec=True is the default-safe habit for patched callables:
it makes the mock as strict as the real thing it’s replacing.
Signature drift (the most common refactoring bug) gets caught
immediately. Use it unless you have a reason not to.
3. (Spaced review) Match each Meszaros pattern to its book page:
Meszaros (2007) is the canonical reference. The page numbers come up regularly in code reviews, design discussions, and Stack Overflow answers — knowing them lets you point to the right chapter when the team is debating which double to use.
4. (Spaced review — Step 4) A Mock is patched in for the SUT’s collaborator. The test asserts mock.method.assert_called_once_with("u1", 100). What role is this Mock playing?
unittest.mock blurs the Spy/Mock-Object line that Meszaros drew
crisply. Both are forms of behavior verification; they differ
mainly in whether the expectation is set up-front (mockist style)
or read after-the-fact (spy style). For your day-to-day work:
don’t worry too much about which side of the line you’re on —
worry about whether the test actually verifies the contract.
5. (Spaced review — Step 4 typo trap) What’s the most reliable defense against typos like mock.assrt_called_once_with(...) silently passing?
Static tooling > runtime defense for spelling. Linters and type
checkers can flag misspelled assertion names at edit time, before the
test ever runs, though how much they catch depends on the tool and on
how the mock is typed, so confirm the check fires in your setup.
When NOT to Use a Double — The Decision Guide
🎯 Goal: Build the judgment to not reach for a double when a real collaborator would do. Recognize over-mocking as brittleness. Apply a decision guide to five real scenarios. 🧠 Skills you’ll gain: Diagnose an over-mocked test by reading it. Classify scenarios by which double (or none) fits. Recognize the “mock what you own” heuristic and the Adapter wrap-and-mock pattern.
🧭 The whole arc, in one sentence. A test double is a tool you reach for when a real collaborator would make the test flaky, slow, or unable to verify the right thing. It is not a default. It is not a sign of professionalism. It is not a coverage strategy. The right number of doubles for many tests is zero.
📖 The decision flow
flowchart TD
A["What does this test need to verify?"]:::neutral --> B{"Does the SUT have collaborators<br/>worth doubling?<br/>(slow/flaky/unavailable)"}
B -->|"No — pure function"| NO["No double<br/>Just call it"]:::good
B -->|"Yes"| C{"Do you control the test's input<br/>via a collaborator?"}
C -->|"Yes — control input"| STUB["Stub<br/>(canned answers)"]:::good
C -->|"No — verify a call happened"| D{"Inspect after the fact<br/>or set up-front?"}
D -->|"After"| SPY["Spy<br/>(record + assert)"]:::good
D -->|"Up-front strict"| MOCK["Mock Object<br/>(behavior verification)"]:::good
B -->|"Yes — but stateful + multi-call"| FAKE["Fake<br/>(in-memory implementation)"]:::good
B -->|"Third-party library<br/>you don't own"| ADAPT["Wrap in an Adapter<br/>then double the adapter"]:::warn
classDef good fill:#e8f5e9,stroke:#2e7d32,color:#1b5e20
classDef warn fill:#fff3e0,stroke:#e65100,color:#bf360c
classDef neutral fill:#fafafa,stroke:#bdbdbd,color:#424242
📖 Three antipatterns to recognize on sight
| Antipattern | Symptom | Why it happens | Fix |
|---|---|---|---|
| Over-mocking | Every internal helper is mocked; the test asserts only on the mocks. | “Isolation feels safe; more mocks = more tested.” | Mock at the architectural boundary (HTTP, DB, clock), not at every internal function. |
| Mocking what you don’t own | A third-party library’s API is mocked directly, scattered across many tests. | The library is brittle and the team doesn’t want to wait for real responses. | Wrap the third-party in an Adapter (Adapter pattern); mock the Adapter. The third-party’s internals stay invisible to your tests. |
| Coverage chasing | Every line of the SUT runs in some test, but assertions are weak (is not None) or mocked-on-mocks. | Coverage is misread as a quality signal. | Stronger oracles, real collaborators where possible, fewer tests that test more meaningfully. Coverage ≠ correctness (Testing Foundations Step 3). |
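The wrap-and-mock fix from the second row might look like the sketch below. Everything in it is hypothetical: QuestHttpAdapter, StubQuestAdapter, quest_titles, and the URL layout are invented for illustration, and urllib stands in for whichever HTTP library a real project uses.

```python
import json
from urllib.request import urlopen


class QuestHttpAdapter:
    """Hypothetical adapter: the only place that touches the HTTP library."""

    def __init__(self, base_url):
        self._base_url = base_url

    def fetch_quests(self, user_id):
        # Real network call; never executed by the test code below.
        with urlopen(f"{self._base_url}/quests/{user_id}") as response:
            return json.load(response)


class StubQuestAdapter:
    """Test double with the same interface as QuestHttpAdapter."""

    def __init__(self, canned_quests):
        self._canned = canned_quests

    def fetch_quests(self, user_id):
        return self._canned


def quest_titles(adapter, user_id):
    """Hypothetical SUT: depends on the adapter's interface, not on urllib."""
    return [quest["title"] for quest in adapter.fetch_quests(user_id)]


# The test doubles the adapter you own; the HTTP library never appears.
titles = quest_titles(StubQuestAdapter([{"title": "Find the Lost Amulet"}]), "u1")
```

If the HTTP library is swapped later, only QuestHttpAdapter changes; tests written against the adapter's interface stay untouched.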
Part 1 — Read the over-mocked vs clean tests
Open xp_calculator.py. The function compute_total_xp(quests) is pure: it takes a list, computes a number, returns it. No clock, no HTTP, no database. No collaborators worth doubling. Yet test_xp_overmocked.py mocks every internal helper.
⚙️ Task 1: read both test_xp_overmocked.py and test_xp_clean.py. In test_xp_clean.py, find the placeholder docstring near the top and fill in your one-line answer to: “What did the over-mocked version mock unnecessarily — and what did that cost?”
📖 What the over-mocked test actually verifies (look only after writing your answer)
Look at test_xp_overmocked.py. The mocks intercept _filter_completed, _apply_multipliers, and _sum_xp. With those internals replaced by Mocks returning canned values, the test only verifies that compute_total_xp calls the helpers in some order and returns the last one’s result. That’s not the spec. The spec is “given these quest dicts, return the total XP.”
Worse: if a teammate refactors the internals (rename _apply_multipliers to _apply_modifiers; merge two helpers into one; inline a helper away entirely), every one of those changes preserves the function’s behavior — but breaks the over-mocked test. Brittleness without protection. The clean test never breaks under those refactors because it asserts on the spec, not on the implementation choreography.
Same lesson as Testing Foundations Step 4 (“test behavior, not implementation”), now applied to mocks instead of internal state access. The principle is one principle.
Part 2 — Classify five scenarios
Open scenarios.py. For each of the five scenarios, set the variable to the best single recommendation from this list:
"no_double" "stub" "spy" "mock" "fake" "adapter"
The validator accepts any defensible answer for each scenario (some scenarios have more than one defensible answer — e.g., spy and mock are often interchangeable for a single outbound call). It rejects clearly wrong choices.
🧰 Quick decision rubric (use, don't memorize)
| If the SUT… | Reach for… |
|---|---|
| …is a pure function — same input always yields same output, no collaborators | No double |
| …calls a clock, a remote service, or any non-deterministic source | Stub |
| …needs to verify a fire-and-forget outbound call (e.g., notifier.send(...)) | Spy or Mock |
| …needs to round-trip with a stateful collaborator (write then read) | Fake |
| …calls a third-party library you don’t own | Adapter wrapper → double the adapter |
| …is just simple math/string/list manipulation | No double (don’t make work) |
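The Fake row is the one the earlier steps did not exercise in code, so here is a minimal sketch of it. FakeUserRepository and change_email are hypothetical names; the point is that a fake has real (if simplified) behavior, so a write followed by a read works without any canned answers.

```python
class FakeUserRepository:
    """Hypothetical in-memory fake: real behavior, no database."""

    def __init__(self):
        self._emails = {}

    def save(self, user_id, email):
        self._emails[user_id] = email

    def find_email(self, user_id):
        return self._emails.get(user_id)


def change_email(repo, user_id, new_email):
    """Hypothetical SUT: updates only users that already exist."""
    if repo.find_email(user_id) is None:
        return False
    repo.save(user_id, new_email)
    return True


repo = FakeUserRepository()
repo.save("u1", "old@example.com")

# Round trip: read, write, then read again through the same fake.
changed = change_email(repo, "u1", "new@example.com")
missing = change_email(repo, "u2", "ghost@example.com")
```

A stub would need a fresh canned answer for every read in that sequence; the fake stays correct however many round trips the test makes.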
🌍 The same decision in another language
The decision is purely about test design, not about syntax. JavaScript, Java, C#, Ruby, Go — every language with serious testing culture has the same five-or-so doubles, the same antipatterns, and the same heuristic: only mock what you own; only mock what’s actually a collaborator; pure functions don’t need doubles.
The frameworks differ; the design judgment doesn’t.
Part 3 — Forward pointers
You now have the conceptual vocabulary to read any test in any modern Python codebase and recognize what role each double is playing — even when the author called everything a “mock.” That recognition transfers across languages.
🔭 Where this leads in the rest of the curriculum:
- solid.yml — Dependency Inversion makes doubles trivial: define an interface, have the SUT depend on it, swap implementations at test time. Most painful mocks are caused by skipped DIP.
- observer-pattern.yml — the Observer pattern is essentially a spy made into a permanent design feature.
- TDD with doubles — the next natural sequel: TDD where the SUT has collaborators from the start. Red phase becomes “decide what to double, then write the failing test.”
🪞 Recalibrate. Look back at Step 1 — the test that passed today and would have failed tomorrow. Your toolkit now has six things to do instead of “ship and pray”:
- Recognize a flaky/slow/opaque collaborator (Step 1).
- Inject the collaborator as a parameter (Step 1).
- Substitute a stub when you need to control input (Step 2, Meszaros p.529).
- Substitute a spy when you need to verify a call (Step 3, p.538).
- Reach for unittest.mock when boilerplate gets tedious (Step 4) — but recognize the role you’re playing.
- Use patch() carefully — where the SUT looks the name up — and prefer autospec=True (Step 5).
And the seventh, just learned: sometimes the right answer is no double at all. That judgment is what makes you good at this.

"""A PURE function for computing XP earned across quests.

No collaborators. No clock. No HTTP. No database.
Helper functions are private (underscore prefix) — implementation detail.
"""


def _filter_completed(quests: list[dict]) -> list[dict]:
    return [q for q in quests if q.get("completed")]


def _apply_multipliers(quests: list[dict]) -> list[tuple[str, int]]:
    return [(q["title"], q["xp"] * q.get("multiplier", 1)) for q in quests]


def _sum_xp(items: list[tuple[str, int]]) -> int:
    return sum(xp for _title, xp in items)


def compute_total_xp(quests: list[dict]) -> int:
    """Return the total XP earned from completed quests, with multipliers applied.

    Each quest is a dict with keys: title (str), xp (int), completed (bool),
    and an optional multiplier (int, default 1).
    """
    completed = _filter_completed(quests)
    with_multipliers = _apply_multipliers(completed)
    return _sum_xp(with_multipliers)

"""SMELL — every internal helper is mocked. Read this and recoil.

Notice what's actually verified: nothing about the SUT's behavior.
The mocks made up the answer; the SUT just orchestrated them.
"""
from unittest.mock import patch
from xp_calculator import compute_total_xp


def test_total_xp_overmocked_brittle():
    with patch("xp_calculator._filter_completed") as mock_filter, \
         patch("xp_calculator._apply_multipliers") as mock_apply, \
         patch("xp_calculator._sum_xp") as mock_sum:
        mock_filter.return_value = "<canned>"
        mock_apply.return_value = "<canned>"
        mock_sum.return_value = 200
        result = compute_total_xp([{"completed": True, "xp": 50}])
        assert result == 200
        # The "test" passes whether or not the SUT correctly filters,
        # multiplies, or sums — because we mocked all three.
        # If a teammate renames _apply_multipliers, this test breaks
        # for the WRONG reason (refactor, not behavior change).

"""Clean: no doubles. compute_total_xp is a pure function — exercise it directly."""
# TODO: in your own words, in ONE LINE, answer the question below.
# The validator just checks that this docstring is no longer empty.
"""The over-mocked version mocked: ___ FILL IN ___
What that cost: ___ FILL IN ___"""
from xp_calculator import compute_total_xp


def test_total_xp_for_two_completed_quests():
    quests = [
        {"title": "Slay", "xp": 50, "completed": True, "multiplier": 2},
        {"title": "Find", "xp": 30, "completed": False, "multiplier": 1},
        {"title": "Defeat", "xp": 100, "completed": True, "multiplier": 1},
    ]
    # 50*2 + (Find skipped: not completed) + 100*1 = 200
    assert compute_total_xp(quests) == 200


def test_total_xp_for_no_completed_quests():
    quests = [{"title": "Skip", "xp": 999, "completed": False}]
    assert compute_total_xp(quests) == 0
"""Classify each scenario by the BEST single recommendation.
Allowed values:
"no_double" — the SUT is pure (or close enough); call it directly
"stub" — control indirect input with canned values
"spy" — verify a fire-and-forget call after the fact
"mock" — strict behavior verification of a single contract call
"fake" — stateful in-memory implementation across multiple calls
"adapter" — wrap a third-party library, then double the adapter
"""
# Scenario 1: A pure function `compute_tax(price: float, rate: float) -> float`
# that returns price * rate. No collaborators.
SCENARIO_1_BEST = "FILL_IN"
# Scenario 2: A function `is_coupon_expired(coupon)` that calls datetime.now()
# internally to compare against `coupon.expires_at`. We want a deterministic test.
SCENARIO_2_BEST = "FILL_IN"
# Scenario 3: `process_order(order)` POSTs to a payment gateway. The test must
# verify the gateway was called exactly once with the right amount.
SCENARIO_3_BEST = "FILL_IN"
# Scenario 4: A `UserRepository` reads/writes user records to Postgres.
# The SUT under test does many round-trips: register a user, then look them up,
# then update their email, then look them up again. Tests run on CI without a DB.
SCENARIO_4_BEST = "FILL_IN"
# Scenario 5: Throughout the codebase, many modules call `requests.get(...)`
# directly. Patching `requests` everywhere is fragile; the tests are slow.
SCENARIO_5_BEST = "FILL_IN"
Decision Guide — Synthesis Quiz
Min. score: 80%
1. A test mocks every internal helper of the SUT and asserts only on the mocks’ return values. Which antipattern is this?
Mock at the architectural boundary; let internal helpers be real. The line “this collaborator is worth doubling” runs through the boundary between your code and the unpredictable world (clock, HTTP, DB, queue) — not through every function-call edge inside your own module.
2. (Cumulative review) Match each scenario to the best single double:
- A: A pure function that adds two integers
- B: A function that calls datetime.now() to decide an expiration
- C: A function that POSTs to a payment gateway, fire-and-forget
- D: A function that round-trips with a Postgres user table 5 times
The rubric: pure → no double; non-deterministic → stub; outbound call → spy/mock; stateful sequence → fake. Memorize the rubric shape (the diagram in the instructions); the words follow.
3. “Don’t mock what you don’t own.” What does this rule actually mean?
"Mock what you own" is shorthand for "depend on interfaces you control, then mock those interfaces." The Adapter pattern from the design-patterns literature is exactly the maneuver this rule recommends.
4. (Spaced review — TDD) During Red-Green-Refactor, when do you typically decide which double to use?
Choosing a double is part of test design; test design happens in Red. Same lesson as Testing Foundations Step 5: input choice and oracle strength are independent test-design dimensions, both decided when you write the test. Add "choice of double" as a third independent dimension.
5. (Spaced review — Step 5) Why is autospec=True worth almost always reaching for when you patch a callable?
Default-safe habit: use autospec=True whenever you’re patching
a callable. It costs nothing at edit time, catches a real-world
bug class at test time, and makes refactoring safer in the long
run.