Debugging Python: From Symptom to Fix
Learn debugging as a distinct, learnable skill — not as accidental tinkering. You'll work through three real Python bugs (recursive boundary, data representation, temporal ordering) using a hypothesis-driven process and the time-travel debugger's breakpoints, conditional breakpoints, watch expressions, and history scrubber. Ends with an interleaved triage drill and an independent transfer challenge.
The Debugging Process
🎯 Goal: Apply the 7-stage debugging cycle to a tiny off-by-one bug.
flowchart TD
A[1. Symptom — what's wrong?] --> B[2. Predict — what should the state be?]
B --> C[3. Evidence — collect data with the right tool]
C --> D[4. Hypothesis — one sentence cause]
D --> E[5. Localize — first wrong line]
E --> F[6. Fix — minimal change]
F --> G[7. Verify — rerun ALL tests]
No edit happens until stage 6. That’s the central discipline.
Why this matters & what you'll learn
Debugging is a systematic, learnable process — not a vibe. Most engineers default to tinkering (edit, run, hope, repeat) and the bug eventually goes away without them learning what was wrong. The 7-stage cycle above replaces tinkering with a discipline you can repeat on any bug. Walking through it once on a tiny off-by-one anchors the cycle before you face anything harder.
You will learn to:
- Apply the 7-stage hypothesis-driven cycle to a small failing test.
- Distinguish fault, error, and failure — and trace one to the next.
- Evaluate why the local-verification trap (only rerunning the failing test) hides regressions.
📖 Recap from lecture: the four phases of debugging
Lecture 10 framed debugging as a systematic process with four phases:
- Investigating symptoms to reproduce the bug
- Locating the faulty code
- Determining the root cause of the bug
- Implementing and verifying a fix
Inside that frame, each phase has its own moves. The 7-stage cycle is the zoomed-in version of those four phases — same process, more resolution. The four phases tell you what to do; the seven stages tell you how.
| Lecture phase | This tutorial’s stages |
|---|---|
| 1. Investigate symptoms / Reproduce | Symptom + Predict + Evidence |
| 3. Determine root cause | Hypothesis |
| 2. Locate the faulty code | Localize |
| 4. Implement & verify fix | Fix + Verify |
🐞 Lecture vocabulary: fault vs error vs failure
The lecture distinguished three terms that get sloppily blurred in everyday speech:
| Term | Definition | Where it lives |
|---|---|---|
| Fault | The erroneous location in the code (e.g., range(1, ...) skipping index 0). | In source code. |
| Error | An incorrect program state during execution (e.g., the loop variable i starts at the wrong value). | In memory at runtime. |
| Failure | The observed outside behavior (e.g., greet(["Ada", "Linus", "Grace"]) returns "Hello, Linus, Grace!" instead of including Ada). | What the user / test sees. |
Flow: Fault → (program execution) → Error → (error reaches the system boundary) → Failure.
A useful question the lecture leaves you with: “How can we prevent this error from becoming a failure?” — assertions and defensive checks are exactly that prevention. The bug you’re about to fix demonstrates this chain end-to-end.
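Here is a minimal sketch of that chain on the greet bug this step uses — illustrative only, not one of the exercise files. The assertion is one example of the prevention the lecture asks about: it stops the error at the function boundary, before it escapes as a failure.

```python
# Illustrative sketch (not an exercise file): the fault -> error -> failure
# chain, plus an assertion that stops the error before it leaves the function.
def greet(names: list[str]) -> str:
    parts: list[str] = ["Hello"]
    for i in range(1, len(names)):  # FAULT: the range starts at 1, skipping index 0
        parts.append(names[i])      # ERROR: parts never receives names[0]
    # Defensive check: the greeting plus every name should be present.
    # It fires here, before the wrong string reaches any caller as a FAILURE.
    assert len(parts) == len(names) + 1, "a name was skipped"
    return ", ".join(parts) + "!"

greet(["Ada", "Linus", "Grace"])  # AssertionError: a name was skipped
```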
📋 Reproducing the bug — what the lecture said about Step 1
The lecture spent extra time on the first phase (“Reproduce the bug”) because everything downstream depends on it. Two pieces to reproduce:
- Problem environment — the setting in which the bug occurs: hardware, OS, settings, runtime dependencies, software versions. Try to re-create it on a different machine.
- Problem history — the steps needed to recreate the failure: the sequence of data inputs, user interactions, communications with other components. Plus timing, randomness, physical influences.
And whenever possible, write an automated bug reproduction test — a test that fails on the bug and passes after the fix. Run it repeatedly during debugging so “did I fix it yet?” is one click, not five minutes of manual reproduction. After the fix, keep the test in the suite for regression testing — re-running existing tests after later code changes to make sure the bug doesn’t sneak back in.
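A bug-reproduction test can be tiny. A sketch for the greet bug — the test name is ours, not part of the tutorial files:

```python
# Sketch of a bug-reproduction test: it fails while the bug exists, passes
# after the fix, then stays in the suite as a regression test.
from greet import greet

def test_reproduces_skipped_first_name() -> None:
    # The smallest input that shows the reported failure.
    assert greet(["Solo"]) == "Hello, Solo!"
```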
In this tutorial the bug reproduction is already automated for you (the failing pytest test is the reproduction). Notice that we never click “I think I fixed it” without re-running the test — that’s the lecture’s discipline in action.
Reference: Andreas Zeller, Why Programs Fail – A Guide to Systematic Debugging (2009).
📂 What you have
Two files: greet.py (production code, has a bug) and test_greet.py (three pytest tests, one of which fails). Don’t run anything yet.
🔍 1. Symptom — predict, then run
Open greet.py. Read it. Predict what each of these returns:
greet(["Ada", "Linus", "Grace"])greet([])greet(["Solo"])
Now click Run. Read the failing assertion — the mismatch is the symptom. State it in your own words.
🧠 2. Predict the state
Before opening the debugger, predict: at the moment the loop body first executes, what should i be? What is names[i] supposed to be? Hold the answer.
🔬 3. Evidence — your first breakpoint
A breakpoint is already set on line 4 (the for line). Click Debug (next to Run). Execution pauses before the marked line runs. The Variables tab shows names. The Watch tab is empty — add i to it (you’ll see <not yet defined> since the loop hasn’t started).
Now click Step Over (F10) once. The loop has started one iteration. Look at i in Watch. Look at names[i]. Compare with your prediction.
🔎 4. Hypothesis (one sentence)
Don’t fix yet. Write your hypothesis as a single sentence — what is wrong and where it lives.
Compare with a sample sentence
*"The loop starts at index 1, so `names[0]` is never appended to `parts`."* Did yours name *which iteration* is wrong and *what consequence* follows? That's the schema.📍 5. Localize
Three candidates: the test, the return, the range(...). Pick the first divergence — the earliest line whose behavior contradicts your hypothesis. Justify in one sentence why the other two are not it.
🩹 6. Minimal fix
Now you may edit. Smallest possible change. Don’t refactor the whole function. Don’t add a special case for empty lists. Just fix the iteration range.
✅ 7. Verify
Click Run. All three tests must pass — the one that was failing AND the two that already passed. Verification means no regressions. Confusing those is the local-verification trap.
def greet(names: list[str]) -> str:
    parts: list[str] = ["Hello"]
    for i in range(1, len(names)):
        parts.append(names[i])
    return ", ".join(parts) + "!"
from greet import greet

def test_three_names_all_appear() -> None:
    assert greet(["Ada", "Linus", "Grace"]) == "Hello, Ada, Linus, Grace!"

def test_empty_list_just_says_hello() -> None:
    assert greet([]) == "Hello!"

def test_single_name_appears() -> None:
    assert greet(["Solo"]) == "Hello, Solo!"
Step 1 — Knowledge Check
Min. score: 80%
1. A teammate says: “I added print(repr(x)) and saw the value had a leading space.”
Which stage of the debugging cycle is this?
Adding instrumentation and observing values is evidence collection (stage 3). The hypothesis comes after you have evidence — and the fix and verification come later still. Naming the stage you’re in helps you avoid skipping straight to fixing.
2. A student fixes their failing test, runs pytest test_failing.py (just that one file) and sees green. They mark the bug fixed and move on. What stage did they skip?
Verification means rerunning the entire test suite — including tests that previously passed. A fix in one place can introduce a regression somewhere else, and that’s exactly the kind of regression a quick “did the failing test go green?” check will miss.
3. A debugger user types len(parts) into the Watch panel during a paused session and sees 2, when they expected 3. Which stage of the cycle is this?
Reading a watched value during a pause is evidence collection. Predict happens upstream (before the run); Localize and Verify happen downstream (after a hypothesis or fix). Naming the stage you’re in is what keeps the cycle from collapsing into tinkering.
4. total(items) returns $5 too high for one user. You discover the discount-loading function reads the wrong database column, so that user’s discount is never applied.
Which is the symptom and which is the cause?
The symptom is what you observe (the wrong total). The cause is the reason it happens (the discount-loading function reading the wrong column). Symptom-patching — e.g., inserting a special if user_id == BAD_USER: total -= 5 check — would make one test green without fixing the underlying bug, and would fail on any other user affected by the same column read.
Debugger Tour
🎯 Goal: Build minimum tool fluency. Each section below pairs a debugging question with the smallest tool move that answers it. There’s no bug to fix — tour.py runs correctly.
Click Debug (not Run) to start each section.
Why this matters & what you'll learn
Tools subordinate to questions, not the other way around. If you learn debugger features as a feature menu, you’ll forget them; if you learn each one as the answer to a specific debugging question, they stick. This step pairs six common questions with the smallest tool move that answers each — on correct code — so when a real bug forces the question, the move is already in your fingers.
You will learn to:
- Apply six debugger moves (breakpoint, hover, watch, conditional breakpoint, call stack, history scrubber) to answer specific questions.
- Analyze which question each tool actually answers — and which it doesn’t.
1. “Where is execution right now?” → Breakpoint
Click the gutter next to line 8 in tour.py (the line total += score). A breakpoint marker appears — that’s the breakpoint you’ll edit later.
Click Debug. Execution pauses before line 8 runs; the debugger reports the current paused line, and sighted users also see an arrow marker in the gutter. The current line is highlighted.
2. “What does this variable hold right now?” → Variables tab + hover
Look at the Variables tab. You’ll see locals like score and total. Each value has a type badge (int, list, dict).
Now hover over score in the editor. A tooltip shows the value. The same trick works on any identifier in the source — no need to dig through the panel.
3. “What value will an expression have at this point?” → Watch
Open the Watch tab. Click ➕ and add total + score. The expression evaluates as if it ran right now. Click Step Over (F10). The value updates.
Watches are how you ask “what would len(items) * factor be at this exact moment?” without editing the program to add a print.
4. “Which iteration first violates an invariant?” → Conditional breakpoint
Right-click the breakpoint marker you placed on line 8 → Edit Breakpoint → enter score < 0 as the condition. Click Continue (F5).
Execution flies through every iteration where score >= 0 and pauses only at the iteration where score < 0 (line 8). That’s the iteration where the invariant first fails.
Without conditional breakpoints, you’d step 9 times through normal iterations to reach the one you care about. With one, the debugger does the filtering.
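If you ever need the same filtering without debugger support, Python’s built-in breakpoint() behind a guard is the code-level stand-in — a sketch, not something tour.py asks you to add:

```python
# Code-level stand-in for a conditional breakpoint (sketch only).
scores = [95.0, 88.0, 92.0, 72.0, 81.0, 78.0, 98.0, 95.0, 91.0, -3.0, 55.0]
total = 0.0
for score in scores:
    if score < 0:     # the same predicate the conditional breakpoint uses
        breakpoint()  # drops into pdb only on the suspicious iteration
    total += score
```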
5. “How did we get here?” → Call Stack
Open the Call Stack tab. You’ll see process_scores → main. Click each frame to inspect that scope’s locals. The stack tells the story of how this line got executed.
For recursive code, the stack is a vertical history of decisions. You’ll use it heavily in Case 1.
6. “What was this variable BEFORE this line ran?” → History scrubber
Drag the History scrubber backward by 5-10 ticks. Watch total rewind in the Variables tab. Drag forward — it advances. The debugger switches from live execution to a rewound history state; sighted users also see the gutter marker change appearance.
This is the time-travel feature. You can move to any moment in the program’s history without restarting. You’ll drill it deliberately in the Backward Tour before Case 3.
🪞 Reflect
Close the editor. From memory, list the six moves. For each, name the debugging question it answers. If you can’t, that move isn’t yet yours — flag it for revisit.
Carry this forward: for any new debugger feature you encounter, name the question it answers. If you can’t, you don’t need it yet.
# Tour program — no bug. Exercise the debugger UI here.
def compute_score(raw: list[int]) -> float:  # extra hover target; not called in main
    return sum(raw) / len(raw)

def process_scores(scores: list[float]) -> float:
    total: float = 0
    for score in scores:
        total += score
    return total / len(scores)

def main() -> float:
    raw: list[tuple[str, list[int]]] = [
        ("Ada", [95, 88, 92]),
        ("Linus", [72, 81, 78]),
        ("Grace", [98, 95, 91]),
        ("Alan", [-3, 55, 70]),  # negative — used by §4
        ("Margaret", [85, 89, 87]),
    ]
    # Flatten every raw score into one stream so the §4 conditional
    # breakpoint (score < 0) has a negative value to pause on.
    scores: list[float] = []
    for _name, raw_scores in raw:
        scores.extend(float(value) for value in raw_scores)
    average = process_scores(scores)
    print(f"average score: {average:.2f}")
    return average

main()
Step 2 — Knowledge Check
Min. score: 80%
1. “I want to know which iteration of a 10,000-item loop is the first one to break the invariant.” Which tool answers it?
Conditional breakpoints filter. The condition runs at every loop pass; the debugger pauses only when it’s true.
2. “I want to inspect what total was 5 lines ago.” Which tool answers it?
Time-travel. The scrubber lets you slide back through any moment in the run without re-executing. (You’ll drill backward localization specifically in the Backward Tour before Case 3.)
3. The tour file’s def enroll(student, students=[]) lights up the ↔ aliasing badge across calls. Why?
Default argument values are evaluated exactly once, at function-definition time. The students=[] creates one list, bound to the function as its default. Every subsequent call that doesn’t override the parameter reuses that same list. Standard fix: def enroll(student, students=None): students = students if students is not None else []. The ↔ badge is the time-travel debugger’s way of pointing at exactly this aliasing — saving you 30 minutes of head-scratching.
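The trap and its standard fix, as a runnable sketch you can paste into a scratch file:

```python
# Mutable default argument: the default list is created once, at def-time.
def enroll(student: str, students: list[str] = []) -> list[str]:
    students.append(student)  # mutates the one shared default list
    return students

print(enroll("Ada"))    # ['Ada']
print(enroll("Linus"))  # ['Ada', 'Linus'] — the "fresh" default was reused

# Standard fix: a None sentinel, and a new list per call.
def enroll_fixed(student: str, students: list[str] | None = None) -> list[str]:
    students = students if students is not None else []
    students.append(student)
    return students

print(enroll_fixed("Ada"))    # ['Ada']
print(enroll_fixed("Linus"))  # ['Linus']
```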
Case 1 — Maze Pathfinder (Boundary Bug)
🎯 Goal: A maze has a valid 10-step path from S to G, but the pathfinder returns None when called with max_steps=10. Find why.
📋 Open debugging_log.md and fill each field as you work. The first time, the log carries you stage by stage. Cases 2 and 3 fade this scaffolding — by Case 3 you’ll name three of the stages yourself. Committing each stage to writing is the difference between thinking the cycle and doing the cycle.
Why this matters & what you'll learn
Boundary bugs — off-by-one in range, slice indices, comparison operators, loop sentinels — are the most common shape of algorithmic bug, and they hide in plain sight because nine of ten test cases pass. This case forces the discipline you just learned (the 7-stage cycle) onto a recursive boundary bug, so the cycle has to handle a real call stack before you internalize it.
You will learn to:
- Apply the full 7-stage cycle to a recursive boundary bug, writing each stage in the debugging log.
- Analyze recursive execution by walking the Call Stack tab to read frame-by-frame state.
- Evaluate which of two adjacent if checks is the first divergence between intended and actual behavior.
📂 What you have
A small delivery robot has a battery measured in grid steps. find_path(maze, max_steps) should return a path if one exists using at most max_steps moves, otherwise None.
Three pytest tests in test_pathfinder.py:
- test_tiny_maze_found_with_extra_budget — passes.
- test_path_rejected_when_battery_too_small — passes (max_steps=9, no 9-step path).
- test_path_found_when_battery_limit_is_exact — fails (max_steps=10, but a 10-step path exists).
1. Symptom — run and read
Click Run. Read the failing assertion. State the symptom in one sentence: expected what / got what.
2. Predict before debugging
Open pathfinder.py. Read _dfs carefully — especially the two checks at the top of the function:
if steps_used >= max_steps:
    return None
if current == goal:
    return path.copy()
Predict: at the moment a recursive call has just stepped onto the goal cell using exactly the budget, what are steps_used and max_steps? Which of the two checks above runs first? What does it return?
3. Set evidence — breakpoint and watches
Set a breakpoint at the top of _dfs (the steps_used = len(path) - 1 line). In the Watch tab, add at least the values your prediction depends on. Add more if you want orientation (e.g., current, goal, current == goal).
4. Drive
Click Debug. Continue (F5) advances to each next pause — repeat until current == goal is True in the Watch tab. Don’t fix yet.
As recursion deepens, the Call Stack tab grows. Click any frame to see that level’s locals — this is how you read recursion in a debugger.
5. Compare prediction to observation
When current == goal is True in the Watch tab, look at steps_used and max_steps.
- What did you predict steps_used would be at the moment the goal cell is reached?
- What does the debugger show?
- If they differ, complete this sentence before continuing: “My model assumed ___, but the code computes steps_used as len(path) - 1, which means ___.”
⚠️ Click only AFTER you've written your prediction — what the comparison typically reveals
Most students predict `steps_used = 9` (the nine moves *leading to* the goal). The actual value is `10` — because the goal cell has already been appended to `path` before this recursive call starts, so `len(path) - 1` counts the goal cell itself as a step. If your prediction was wrong, that gap is the heart of the bug.
Which conditional fires first when _dfs runs on this call — the cutoff or the goal check?
That is the first divergence between intended behavior (“we reached the goal, return the path”) and actual behavior (“we hit the budget, return None”).
6. Hypothesis
Write your one-sentence hypothesis. Format: *"\<which check\> \<does what\> \<when\>."*
⚠️ Click only AFTER you've written your hypothesis — compare with a sample sentence
*"The cutoff check rejects exact-budget arrivals before the goal check can accept them."* Did yours name the *check* and the *timing*? If so, you have the schema for a debugging hypothesis: a specific code element doing the wrong thing at a specific moment.7. Minimal fix
Edit _dfs so the goal check runs before the cutoff check.
🪞 Reflect — before you verify
Bug family: Off-by-one boundaries hide in range, slice indices, comparison operators, loop sentinels, array bounds. Name one place in your own code where this exact shape could appear.
Cycle stage: Which stage was hardest on this case — Predict, Evidence, or Hypothesis? Name it.
If it was Predict: recursive code is hard to predict because you’d need to mentally simulate the whole call stack. The debugger’s Call Stack tab is built for exactly that gap.
If it was Hypothesis: the schema that helped was “which check does what when.” That schema transfers to every boundary bug you’ll meet.
8. Verify
Click Run. All three tests must pass — including test_path_rejected_when_battery_too_small. If that one breaks, your fix is too aggressive.
# Mazes used by the pathfinder case.
# Shortest valid path from S to G is exactly 10 steps.
BATTERY_LIMIT_MAZE: list[str] = [
    "#########",
    "#S..#..G#",
    "#.#.#.#.#",
    "#.#...#.#",
    "#.#####.#",
    "#.......#",
    "#########",
]

# Sanity maze whose shortest path is 2 steps.
TINY_MAZE: list[str] = [
    "#####",
    "#S.G#",
    "#####",
]
"""Depth-first maze pathfinder."""
from collections.abc import Iterator
Position = tuple[int, int]
Maze = list[str]
def find_marker(maze: Maze, marker: str) -> Position:
for row_index, row in enumerate(maze):
col_index = row.find(marker)
if col_index != -1:
return row_index, col_index
raise ValueError(f"marker {marker!r} not found")
def is_open(maze: Maze, position: Position) -> bool:
row, col = position
return maze[row][col] != "#"
def neighbors(maze: Maze, position: Position) -> Iterator[Position]:
"""Yield neighbors in a deterministic order so traces are repeatable."""
row, col = position
for next_position in [
(row, col + 1), # east
(row + 1, col), # south
(row, col - 1), # west
(row - 1, col), # north
]:
if is_open(maze, next_position):
yield next_position
def find_path(maze: Maze, max_steps: int) -> list[Position] | None:
"""Return a path from S to G using at most max_steps moves.
A path includes both the start and goal positions, so:
steps_used == len(path) - 1
"""
start = find_marker(maze, "S")
goal = find_marker(maze, "G")
return _dfs(
maze=maze,
current=start,
goal=goal,
max_steps=max_steps,
path=[start],
seen={start},
)
def _dfs(
maze: Maze,
current: Position,
goal: Position,
max_steps: int,
path: list[Position],
seen: set[Position],
) -> list[Position] | None:
steps_used = len(path) - 1
# Stop searching when the path has used the available battery budget.
if steps_used >= max_steps:
return None
if current == goal:
return path.copy()
for next_position in neighbors(maze, current):
if next_position in seen:
continue
seen.add(next_position)
path.append(next_position)
result = _dfs(maze, next_position, goal, max_steps, path, seen)
if result is not None:
return result
path.pop()
seen.remove(next_position)
return None
from maze_data import BATTERY_LIMIT_MAZE, TINY_MAZE
from pathfinder import find_path

def test_tiny_maze_found_with_extra_budget() -> None:
    path = find_path(TINY_MAZE, max_steps=3)
    assert path is not None
    assert len(path) - 1 == 2

def test_path_rejected_when_battery_too_small() -> None:
    path = find_path(BATTERY_LIMIT_MAZE, max_steps=9)
    assert path is None

def test_path_found_when_battery_limit_is_exact() -> None:
    path = find_path(BATTERY_LIMIT_MAZE, max_steps=10)
    assert path is not None, "A 10-step path exists and should be accepted."
    assert len(path) - 1 == 10
# Debugging log — Case 1 (Maze Pathfinder)
The 7 stages match the cycle from Step 1. Fill each field as you work.
1. **Symptom** — one sentence, expected vs actual: _..._
2. **Predict** — at the moment a recursive call has just stepped onto the goal cell on an exact-budget run, what should `steps_used` and `max_steps` be? Which of the two early checks should fire? _..._
3. **Evidence** — which tool you used, what cue you were watching, what value you actually observed when paused on the goal cell: _..._
4. **Hypothesis** — one sentence; name the *check* and the *timing* (format: *"\<which check\> \<does what\> \<when\>."*): _..._
5. **Localize** — which line is the first divergence between intended and actual behavior, and one sentence on why each of the other candidates is *not* it: _..._
6. **Fix** — file, line, the minimal change: _..._
7. **Verify** — `pytest` exit code, which tests pass; any regressions in the under-budget rejection case? _..._
Step 3 — Knowledge Check
Min. score: 80%
1. Which of these would be a root-cause fix for this bug, as opposed to a workaround?
The root cause is the order of the two early checks in _dfs. Reordering them is a one-line, minimal change that addresses the cause directly. Every other option here is a workaround: it makes the symptom disappear without fixing the underlying logic.
2. A student fixes _dfs by loosening the cutoff to steps_used > max_steps instead of swapping the check order. The test_path_found_when_battery_limit_is_exact test now passes. Is this a correct fix?
The root-cause fix is the check ordering — goal first, cutoff second — not a looser comparator. Changing >= to > compensates for the misordered checks instead of fixing them: the cutoff no longer means what its comment says (“stop once the budget is spent”), and the search now recurses one level past the budget before pruning. Whether a comparator tweak like this quietly broke the under-budget rejection case is exactly the question the newly-green test alone cannot answer — which is why Verify means rerunning the whole suite.
3. True or false: Once you’ve fixed the boundary bug in _dfs, you can verify the fix is correct by rerunning only test_path_found_when_battery_limit_is_exact (the previously failing test).
Verification means rerunning the whole suite. Specifically: after the goal-first fix, test_path_rejected_when_battery_too_small (max_steps=9) must still pass. If you accidentally over-loosen the cutoff, this test will catch you — but only if you rerun it.
Case 2 — Ledger Reconciliation (Data Representation Bug)
🎯 Goal: A campus debit-card system imports 30 transactions and one account is $36.00 wrong at month end. The technique you’ve used so far (single breakpoint + step) would force you to step through every transaction. Don’t.
📋 Keep filling debugging_log.md. Fields are now name-only — refer to Case 1’s log if you need the per-stage prompts. Writing forces commitment; commitment is what makes the cycle yours.
Why this matters & what you'll learn
Data-representation bugs — hidden whitespace, mixed encodings, silent type coercions — are a different family from algorithmic bugs. The algorithm is correct; the data is carrying something invisible. The forward-stepping technique you used in Case 1 doesn’t scale to 30 transactions, and your eyes won’t catch a leading space. This case introduces two new moves (conditional breakpoints, repr()) that are nearly free once you know to reach for them.
You will learn to:
- Apply conditional breakpoints to filter a long input stream down to the suspicious case.
- Analyze a value with repr() to surface invisible characters that print() hides.
- Evaluate where a normalization fix belongs — at the load boundary, not at the consumer.
🔀 Before you start: Case 1 had a bug you could trace by reading two if checks in one function. Is that true here? Spend 30 seconds predicting: what kind of thing is wrong, and what will the evidence-collection move look like?
The contrast — read after you've tried step 3
Case 1 was *algorithmic* — the data was correct; one check was in the wrong place. This is a *data-representation* bug — the algorithm is correct; the data carries something invisible. Different family, different first move: you don't step through logic looking for a wrong branch; you inspect the data itself to find what it's hiding.
📂 What you have
- ledger.py — loads transactions from a CSV and applies them to account balances.
- transactions.csv — 30 rows of test data.
- test_ledger.py — two pytest tests, both failing.
Read both failures carefully.
1. Symptom — and a clue
Click Run. Two tests fail:
- test_month_end_balances — ACCT-202 is wrong by $36.00.
- test_transaction_types_are_valid_after_loading — the loaded transaction kinds set contains an unexpected value.
The second failure is a clue, not a separate bug. Look at the assertion message — what kind appears that shouldn’t?
2. Predict before debugging
You could step through 30 transactions to find the wrong one. Don’t. That’s exactly the kind of work the debugger is supposed to save you. Predict instead: of the 30 transactions, which one(s) belong to ACCT-202? (You can scan transactions.csv if you want — but only briefly.)
3. Stop only on the suspicious account — conditional breakpoint
Set a breakpoint at the start of apply_transaction (the before = balances.get(...) line). Right-click that breakpoint marker → Edit Breakpoint → enter a condition that pauses only for the suspicious account. What predicate on tx discriminates ACCT-202 from the other accounts?
Predicate answer
`tx.account == "ACCT-202"`
Click Debug. The debugger flies past every transaction for other accounts and pauses only on the rows for ACCT-202. Use Continue to move from one ACCT-202 row to the next.
4. Look closely
For each pause, inspect:
- tx.id
- tx.kind
- repr(tx.kind) ← the secret weapon
Add repr(tx.kind) to your Watch tab so it shows on every pause. Across the ACCT-202 pauses, what does repr show that you wouldn’t notice otherwise?
5. Compare prediction to observation
Across the ACCT-202 pauses, look at repr(tx.kind) in your Watch tab.
- What did you predict tx.kind would be for transaction T011?
- What does repr() show that print() would have hidden?
- Complete this sentence: “My model assumed the value was ___, but repr shows ___ because ___.”
What the comparison reveals
Most students predict `tx.kind == 'REVERSAL'`. The `repr()` output shows `' REVERSAL'` — the outer quotes make the leading space unmistakable. `print()` would have shown ` REVERSAL` with no delimiters, where the space blends invisibly into the line. The gap between prediction and observation is the bug's fingerprint.
6. Where is the divergence?
Once you’ve spotted the malformed transaction, ask: where in the code is the bug? Is it in apply_transaction (which decides DEPOSIT vs WITHDRAWAL etc.)? Or earlier, in how the row got loaded into a Transaction object?
7. Hypothesis
Write your one-sentence hypothesis before expanding. Name the layer (loading vs processing) and what’s wrong with the data.
Compare with a sample sentence
*"The kind field arrives from the CSV with hidden whitespace. `load_transactions` doesn't normalize it, so it falls through to the unknown-kind branch in `apply_transaction` and gets treated as a withdrawal."* A clean hypothesis names *where* the bug enters (the loader) and *why* the symptom appears far from the cause (the if/elif cascade silently misses).8. Minimal fix
One change in load_transactions on the kind=row["type"].upper() line. Resist the temptation to:
- Patch the final balance.
- Edit the CSV.
- Change the reversal arithmetic in apply_transaction.
- Delete the unknown-kind fallback.
The right fix is the smallest change in the right place.
🪞 Reflect — before you verify
Bug family: Hidden-character bugs hide in CSV imports, copy-pasted strings, JSON keys, environment variables, log lines, command-line args. Name one place where repr() would surface something print() hides.
What repr() changed: Did it change the Evidence step for you (you saw the space you wouldn’t have seen), the Localize step (it told you exactly which field), or both? Write one sentence explaining why print() would have missed it.
9. Verify
Click Run. Both tests must turn green. The arithmetic in apply_transaction is unchanged; only the loading code was wrong.
"""Ledger reconciliation — applies CSV transactions to running balances."""
import csv
import logging
from dataclasses import dataclass
from decimal import Decimal
logger = logging.getLogger(__name__)
VALID_KINDS: set[str] = {"DEPOSIT", "WITHDRAWAL", "REFUND", "REVERSAL", "FEE"}
@dataclass(frozen=True)
class Transaction:
id: str
account: str
kind: str
amount_cents: int
def parse_money(text: str) -> int:
"""Convert a dollars-and-cents string to integer cents."""
return int(Decimal(text) * 100)
def load_transactions(path: str) -> list[Transaction]:
transactions: list[Transaction] = []
with open(path, newline="", encoding="utf-8") as csv_file:
reader = csv.DictReader(csv_file)
for row in reader:
transactions.append(
Transaction(
id=row["id"],
account=row["account"],
kind=row["type"].upper(),
amount_cents=parse_money(row["amount"]),
)
)
return transactions
def apply_transaction(balances: dict[str, int], tx: Transaction) -> None:
before = balances.get(tx.account, 0)
if tx.kind == "DEPOSIT":
after = before + tx.amount_cents
elif tx.kind == "WITHDRAWAL":
after = before - tx.amount_cents
elif tx.kind == "FEE":
after = before - tx.amount_cents
elif tx.kind == "REFUND":
after = before + tx.amount_cents
elif tx.kind == "REVERSAL":
after = before + tx.amount_cents
else:
# Realistic but dangerous legacy behavior: old exports used blank
# types for card charges, so unknown types are treated as
# withdrawals.
after = before - tx.amount_cents
balances[tx.account] = after
def reconcile(transactions: list[Transaction]) -> dict[str, int]:
balances: dict[str, int] = {}
for tx in transactions:
apply_transaction(balances, tx)
return balances
id,account,type,amount
T001,ACCT-100,DEPOSIT,200.00
T002,ACCT-100,WITHDRAWAL,45.25
T003,ACCT-100,FEE,2.50
T004,ACCT-100,REFUND,10.00
T005,ACCT-101,DEPOSIT,125.00
T006,ACCT-101,WITHDRAWAL,19.99
T007,ACCT-101,WITHDRAWAL,8.50
T008,ACCT-101,REFUND,8.50
T009,ACCT-202,DEPOSIT,80.00
T010,ACCT-202,WITHDRAWAL,18.00
T011,ACCT-202, REVERSAL,18.00
T012,ACCT-303,DEPOSIT,300.00
T013,ACCT-303,FEE,7.50
T014,ACCT-303,WITHDRAWAL,22.00
T015,ACCT-303,REFUND,3.25
T016,ACCT-100,WITHDRAWAL,16.00
T017,ACCT-101,FEE,2.50
T018,ACCT-202,WITHDRAWAL,7.25
T019,ACCT-303,WITHDRAWAL,41.99
T020,ACCT-100,REFUND,1.25
T021,ACCT-101,DEPOSIT,40.00
T022,ACCT-202,FEE,1.75
T023,ACCT-303,FEE,2.50
T024,ACCT-100,FEE,2.50
T025,ACCT-101,WITHDRAWAL,12.00
T026,ACCT-202,DEPOSIT,5.00
T027,ACCT-303,REFUND,10.00
T028,ACCT-100,WITHDRAWAL,30.00
T029,ACCT-101,REFUND,4.00
T030,ACCT-202,WITHDRAWAL,3.00
from ledger import load_transactions, reconcile

def test_month_end_balances() -> None:
    transactions = load_transactions("/tutorial/transactions.csv")
    balances = reconcile(transactions)
    assert balances == {
        "ACCT-100": 11500,
        "ACCT-101": 13451,
        "ACCT-202": 7300,
        "ACCT-303": 23926,
    }

def test_transaction_types_are_valid_after_loading() -> None:
    transactions = load_transactions("/tutorial/transactions.csv")
    kinds = {tx.kind for tx in transactions}
    assert kinds <= {"DEPOSIT", "WITHDRAWAL", "REFUND", "REVERSAL", "FEE"}, \
        f"unexpected transaction kind(s) loaded: {kinds}"
# Debugging log — Case 2 (Ledger Reconciliation)
Same 7-stage form, names only. If you're stuck on what a stage demands, reread Case 1's log.
1. **Symptom**: _..._
2. **Predict**: _..._
3. **Evidence**: _..._
4. **Hypothesis**: _..._
5. **Localize**: _..._
6. **Fix**: _..._
7. **Verify**: _..._
Step 4 — Knowledge Check
Min. score: 80%
1. Which of these is the root-cause fix?
The bug is that the CSV row had a leading space, so kind became ' REVERSAL' instead of 'REVERSAL'. The fix belongs in load_transactions because that’s where data flows from external (untrusted) format into internal representation. Strip-and-validate at the boundary, then trust the data inside.
2. Why is repr(tx.kind) more useful than print(tx.kind) when investigating this bug?
repr('REVERSAL') returns "'REVERSAL'" — including the surrounding quotes — while repr(' REVERSAL') returns "' REVERSAL'". The leading space jumps out because repr() shows the string as a Python literal, with quotes around its contents. print() displays the string’s content without delimiters, so leading and trailing whitespace becomes invisible. This is the canonical Python trick for spotting whitespace bugs.
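You can see the difference in a few lines of a scratch session — a sketch using the value from this case:

```python
# repr() vs print() on the malformed value from transactions.csv row T011.
kind = " REVERSAL"
print(kind)        # prints: REVERSAL   (the leading space blends into the line)
print(repr(kind))  # prints: ' REVERSAL' (the quotes expose the space)
print(kind == "REVERSAL")                  # False — why the elif cascade misses
print(kind.strip().upper() == "REVERSAL")  # True once normalized at the boundary
```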
3. You have a 30-iteration loop where one specific iteration produces a wrong result. Which technique most efficiently locates the bad iteration?
Conditional breakpoints scale. They turn the debugger into a filter: only stop when this expression is true. The cost is the same regardless of whether the loop has 30 or 30,000 iterations. This is one of the highest-leverage debugger features and the reason “set a conditional breakpoint” is one of the first moves an experienced debugger reaches for in long-running data-processing code.
Backward Tour — Time-Travel Drill
🎯 Goal: Drill the backward moves. Stepping forward through code is the default; rewinding from a final state to find when something first changed is a different motor pattern. There’s no bug — counter.py runs correctly.
Click Debug to start.
Why this matters & what you'll learn
Stepping forward is the default; rewinding from a known-wrong final state to find when it first appeared is a separate motor pattern that takes deliberate practice. Case 3 will demand exactly this move on a real bug — but learning the move during the bug hunt mixes two hard things at once. Drilling the four scrubber moves on correct code now isolates the skill so Case 3 can focus on the bug, not the tool.
You will learn to:
- Apply the four scrubber moves: anchor, single-tick rewind, jump-to-tick, scrub-until-predicate.
- Analyze a recorded execution history by reading the Variables tab as you scrub.
- Evaluate when backward localization beats forward stepping (symptom-far-from-cause bugs).
1. “What was the final state?” → Run to completion, then anchor
Click Debug without setting any breakpoints. The program runs to completion. The debugger pauses at the last line.
In the Variables tab, expand state. Note count and the length of history. This is your anchor — every move below is relative to this final state. Anchoring on a known wrong final state is exactly what Case 3 will ask of you.
2. “Rewind one event” → Scrub backward by one tick
Drag the History scrubber backward by one tick. Watch count change in the Variables tab. The arrow gutter turns gray when you’re rewound — you’re not at “live” execution anymore.
Verify: count should now equal what it was just before the last event. Cross-check against history[-2].
3. “What was count after exactly N events?” → Scrub to a specific moment
Scrub backward until len(state["history"]) shows 3. Read state["count"]. That’s the value after exactly 3 events were applied.
Predict before scrubbing further: what was count after exactly 5 events? Now scrub to len == 5 and verify against your prediction.
4. “When did count first go negative?” → Anchor + walk backward to first divergence
Look at history — each entry is (event, count_after). Scan for the first negative second element. That moment is where count first turned negative.
Now use the scrubber to visit that moment: drag backward until state["count"] first shows a negative value. This is the localization move you’ll use in Case 3 — anchoring on a known state, rewinding to the first moment that state appeared.
5. “What was count immediately before the reset event?” → Predicate-driven scrub
The simulator includes a reset event that zeros count. Find the entry ("reset", 0) in history. Scrub to one tick before that reset fired. What was count?
6. “Forward again to live” → Scrub all the way forward
Drag the scrubber all the way to the right. The arrow gutter returns to its normal color — you’re back at “live” execution. Edits will run from this point if you make any.
🪞 Reflect
From memory, name the four scrubber moves:
- Run to end, inspect the anchor state
- Scrub backward one tick (per-event rewind)
- Scrub to a specific tick (jump by a marker like len(history) == N)
- Scrub backward until a predicate first holds — this is the move for Case 3
The shape is always: anchor on a known state, walk backward to find when it first appeared.
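The scrubber automates a scan you could run by hand if the recorded history were handed to you as data. A sketch over the exact history that counter.py (below) records:

```python
# Anchor-and-locate by hand: with the whole recorded history as data, the
# first tick where the predicate holds is the first divergence. The scrubber
# reaches the same tick by walking backward from the anchor state.
history = [
    ("inc", 2), ("double", 4), ("neg", -4), ("double", -8),
    ("inc", -7), ("reset", 0), ("inc", 1), ("inc", 2),
]

first_negative = next(
    tick for tick, (_event, count_after) in enumerate(history) if count_after < 0
)
print(first_negative, history[first_negative])  # 2 ('neg', -4)
```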
# Backward Tour — no bug. Exercise the history scrubber.
#
# A tiny event-driven counter. Each event modifies `count`.
# `history` records (event_name, count_after_event) for every step.
from typing import Any

CounterState = dict[str, Any]

def apply_event(state: CounterState, event: str) -> None:
    if event == "inc":
        state["count"] += 1
    elif event == "dec":
        state["count"] -= 1
    elif event == "double":
        state["count"] *= 2
    elif event == "neg":
        state["count"] = -state["count"]
    elif event == "reset":
        state["count"] = 0
    else:
        raise ValueError(f"unknown event {event!r}")
    state["history"].append((event, state["count"]))

def main() -> CounterState:
    state: CounterState = {"count": 1, "history": []}
    events: list[str] = ["inc", "double", "neg", "double", "inc", "reset", "inc", "inc"]
    for event in events:
        apply_event(state, event)
    return state

main()
Step 5 — Knowledge Check
Min. score: 80%
1. “I want to find the first event in a 50-event stream that produced a wrong state.” Which scrubber move fits best?
Anchor on the wrong final state, scrub backward until it matches the spec. The first tick where the state is correct again is the one immediately before the bug fired. This is the canonical backward-localization move.
2. “What was count after exactly 4 events?” — which scrubber move answers this?
Scrub to a specific tick by reading a marker (here, len(history)). Pick a state property that monotonically increases (event count, log length, step number) so each tick is identifiable from the Variables tab.
3. After scrubbing backward, the arrow gutter turns gray. What does that mean?
Gray = rewound. You’re inspecting a recorded past state. Drag the scrubber all the way to the right to return to live execution. This visual cue prevents the confusion of “why isn’t my edit running?” — the answer is always “scrub forward first, then run.”
Case 3 — Course Waitlist (Temporal Bug)
🎯 Goal: A course-registration simulator processes 9 events and ends in a wrong state. The visible symptom appears several events after the event that caused it. Find the first bad state transition, not just the final wrong state.
📋 debugging_log.md — three stages are now unlabeled. Name them yourself before filling them in. Naming the stage you’re in is the move that keeps the cycle from collapsing into tinkering.
Why this matters & what you'll learn
Some bugs separate cause from symptom in time: a wrong decision happens early, the visible failure appears events later, and stepping forward forces you to inspect correct state for ages before anything looks wrong. This is what the time-travel debugger is built for — anchor on the wrong final state and rewind to the first divergence. Case 3 demands the backward-localization move you drilled in Step 5, on a real bug where forward stepping would waste the most attention.
You will learn to:
- Apply the anchor-and-rewind technique to find the first wrong state transition in an event stream.
- Analyze a temporal bug whose symptom appears events after the cause.
- Evaluate two correct fixes (pop(0) vs deque.popleft()) on intent, cost, and disruption.
🔀 Before you start: In Cases 1 and 2, you could find the bug by reaching one specific line with a breakpoint. Will that work here? Spend 30 seconds predicting: what kind of thing might be wrong, and will a single well-placed breakpoint be enough to find it?
The contrast — read after step 3
Cases 1–2 were *spatial* — the bug lives at a specific line you can reach with a breakpoint. This one is *temporal* — the cause and the symptom are separated by time. The wrong state is visible at the end, but the wrong decision happened much earlier. The new move is the history scrubber: run to the wrong final state, then rewind to find the first moment things went wrong.
📂 What you have
waitlist.py simulates two courses (CS201, MATH220) with sample events: students join waitlists, students drop, freed seats get allocated. The stated policy is FIFO: the first student to join a full course’s waitlist should be the first admitted when a seat opens.
test_waitlist.py has two tests, one failing:
- test_cs201_waitlist_is_fifo — fails: enrolled list is wrong.
- test_math220_single_waitlisted_student_gets_open_seat — passes (only one waitlisted student, so FIFO/LIFO is indistinguishable).
1. Symptom — read the failure carefully
Click Run. The failing assertion shows expected vs actual enrollment lists. Note the difference — you’ll need it in step 3.
2. Strategy — which direction would you start?
Would you step forward from event 1, watching state change after each event? Or would you let the program finish, then work backward from the known wrong final state?
Which direction is faster here — and why?
Backward. Events 1–3 produce no observable symptom. Starting forward means inspecting correct state for several events before anything looks wrong. Anchoring on the known wrong final state and scrubbing backward walks directly to the first divergence — you stop the moment something changes from wrong to right.
Click Debug without setting any breakpoints. Let the program run to completion. The debugger will be at the end of execution.
Now, in the Variables tab, expand state then 'CS201' then enrolled and waitlist. Observe their final (wrong) values.
3. Scrub backward through history
Drag the History scrubber backward, slowly, while watching the Variables tab. You’ll see enrolled and waitlist change as you rewind through events.
Scrub one event at a time. At each event, ask one question: “Did the front of the waitlist just get admitted?” Stop at the first event where the answer is no.
4. Now narrow to a line
Once you’ve identified that event, scrub forward to it. Set a breakpoint inside allocate_next — the function responsible for moving students from the waitlist into enrolled seats.
Click Continue (or restart with Debug if needed) until execution pauses there for the right event.
5. Compare prediction to observation
Before you step over the pop() line, add these to the Watch tab:
- course.waitlist[0] — the student at the front
- course.waitlist[-1] — the student at the back
Predict: given FIFO policy, which end should pop() remove from — front or back?
Now Step Over the pop() line. Add next_student to Watch (it now has a value). Compare: which end of the waitlist did pop() actually take from?
What the comparison reveals
`pop()` with no argument removes the *last* element (index `-1`). FIFO policy requires removing the *first* element. If your prediction was "front", your model was right — and the code was wrong. If you predicted "back", you may have assumed `pop()` defaults to front. That's the key gap: Python's list is a stack by default, not a queue.
6. Hypothesis
Write your one-sentence hypothesis. Name the operation and the spec it violates.
Compare with a sample sentence
*"`list.pop()` removes the LAST element. The spec says FIFO — the FIRST element should be admitted first."* The hypothesis pins the bug to a *single library call's behavior* rather than to the surrounding orchestration. That precision is what makes the fix one character.7. Minimal fix — and a judgment call
Two correct fixes exist. Pick one and justify in one sentence (write your reasoning as a comment at the top of allocate_next):
- course.waitlist.pop(0) — one-character change, list stays a list.
- Convert waitlist to collections.deque and use popleft() — bigger diff, but the type says “queue”.
Criteria to weigh: communicates intent / asymptotic cost / disruption to surrounding code. There’s no single right answer; the justified choice is what matters.
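For reference while you weigh them, both candidate fixes in miniature — sketches, not a recommendation:

```python
from collections import deque

# Fix A — minimal diff: pop(0) removes the FRONT of a plain list.
# O(n) per admission (the remaining elements shift left), fine at classroom sizes.
waitlist_a = ["Mina Patel", "Theo Rios", "Jules Kim"]
print(waitlist_a.pop(0))  # 'Mina Patel'

# Fix B — bigger diff, clearer intent: the deque type says "queue",
# and popleft() is O(1).
waitlist_b = deque(["Mina Patel", "Theo Rios", "Jules Kim"])
print(waitlist_b.popleft())  # 'Mina Patel'
```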
🪞 Reflect — before you verify
Bug family: Symptom-far-from-cause bugs hide in caches that go stale events ago, message queues processed out of order, undo/redo stacks, optimistic UI updates. Name one place where the wrong final state would have been easier to find by stepping backward than forward.
Did you try stepping forward first? If so, at what point did you decide to switch direction? That decision point is worth naming — it’s the diagnostic cue that says “this is a temporal bug.”
8. Verify
Click Run. Both waitlist tests must pass.
"""Course waitlist simulator with a deliberately seeded ordering bug."""
from dataclasses import dataclass, field
@dataclass
class CourseState:
capacity: int
enrolled: list[str] = field(default_factory=list)
waitlist: list[str] = field(default_factory=list)
@property
def open_seats(self) -> int:
return self.capacity - len(self.enrolled)
@dataclass(frozen=True)
class Event:
step: int
kind: str
course: str
student: str | None = None
def initial_state() -> dict[str, CourseState]:
return {
"CS201": CourseState(capacity=2, enrolled=["Ava Chen", "Ben Ortiz"]),
"MATH220": CourseState(capacity=1, enrolled=["Iris Long"]),
}
def sample_events() -> list[Event]:
"""Reproducible event stream.
CS201 policy: students should be admitted from the waitlist in FIFO order.
"""
return [
Event(1, "join_waitlist", "CS201", "Mina Patel"),
Event(2, "join_waitlist", "CS201", "Theo Rios"),
Event(3, "join_waitlist", "CS201", "Jules Kim"),
Event(4, "drop", "CS201", "Ben Ortiz"),
Event(5, "join_waitlist", "MATH220", "Noor Ali"),
Event(6, "join_waitlist", "CS201", "Kai Morgan"),
Event(7, "drop", "MATH220", "Iris Long"),
Event(8, "drop", "CS201", "Ava Chen"),
Event(9, "join_waitlist", "CS201", "Sam Lee"),
]
def apply_event(state: dict[str, CourseState], event: Event) -> None:
course = state[event.course]
if event.kind == "join_waitlist":
_handle_join(course, event.student)
elif event.kind == "drop":
_handle_drop(event.course, course, event.student)
else:
raise ValueError(f"unknown event kind {event.kind!r}")
def _handle_join(course: CourseState, student: str | None) -> None:
if student in course.enrolled or student in course.waitlist:
raise ValueError(f"duplicate student in course state: {student}")
if course.open_seats > 0:
course.enrolled.append(student)
else:
course.waitlist.append(student)
def _handle_drop(course_name: str, course: CourseState, student: str | None) -> None:
if student in course.enrolled:
course.enrolled.remove(student)
allocate_next(course_name, course)
elif student in course.waitlist:
course.waitlist.remove(student)
def allocate_next(course_name: str, course: CourseState) -> None:
"""Fill open seats from the waitlist."""
while course.open_seats > 0 and course.waitlist:
next_student = course.waitlist.pop()
course.enrolled.append(next_student)
def run_events(
events: list[Event] | None = None,
state: dict[str, CourseState] | None = None,
) -> dict[str, CourseState]:
if state is None:
state = initial_state()
if events is None:
events = sample_events()
for event in events:
apply_event(state, event)
return state
from waitlist import run_events

def test_cs201_waitlist_is_fifo() -> None:
    state = run_events()
    cs201 = state["CS201"]
    assert cs201.enrolled == ["Mina Patel", "Theo Rios"]
    assert cs201.waitlist == ["Jules Kim", "Kai Morgan", "Sam Lee"]

def test_math220_single_waitlisted_student_gets_open_seat() -> None:
    state = run_events()
    math220 = state["MATH220"]
    assert math220.enrolled == ["Noor Ali"]
    assert math220.waitlist == []
# Debugging log — Case 3 (Course Waitlist)
Stages 1, 2, 6, 7 are labeled. Stages 3-5 are not — *name the stage yourself*, then fill in the content.
1. **Symptom** (one sentence — expected vs actual): _..._
2. **Predict** (which end of the waitlist should `pop()` remove from, given FIFO?): _..._
3. **[Stage name?]**: _..._
4. **[Stage name?]**: _..._
5. **[Stage name?]**: _..._
6. **Fix**: _..._
7. **Verify**: _..._
<details><summary>Field labels 3-5 (open only after you've named them yourself)</summary>
3. Evidence
4. Hypothesis
5. Localize
</details>
Step 6 — Knowledge Check
Min. score: 80%
1. For a Python list xs = ['a', 'b', 'c', 'd'], what does xs.pop() return, and what is xs afterward?
list.pop() with no argument removes and returns the last element. This is LIFO (stack) behavior. For FIFO (queue) behavior, use pop(0) (or collections.deque.popleft() for O(1) performance).
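Worth confirming in a REPL once — the asymmetry is the whole bug family:

```python
xs = ["a", "b", "c", "d"]
print(xs.pop())   # 'd' — no argument removes the LAST element (stack / LIFO)
print(xs)         # ['a', 'b', 'c']
print(xs.pop(0))  # 'a' — index 0 removes the FIRST element (queue / FIFO), O(n)
print(xs)         # ['b', 'c']
```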
2. Which of these is the correct fix to enforce FIFO admission policy?
The bug is in how a student is removed from the waitlist, not in any of the data. pop() removes from the back; pop(0) removes from the front. FIFO requires removing from the front.
3. You discover the symptom (CS201 enrolls the wrong students) at the end of the program, but the cause is in event 4 (drop Ben Ortiz, which triggers allocate_next). Which technique most directly localizes the bug?
Back-in-time / history-scrubbing is built for exactly this bug shape. When the symptom appears later than the cause, scrubbing backward from the symptom — instead of stepping forward from the start — directly walks you to the divergence point. Forward stepping spends time on events that produced no observable change.
4. (Bonus — code communication.) Which choice best communicates that a list is being used as a FIFO queue?
collections.deque.popleft() is the idiomatic, readable choice. It tells the next reader: this is a FIFO queue. list.pop(0) works but doesn’t communicate intent (and is O(n) for large lists). For a debugging tutorial, the takeaway is broader: fixes that document intent are easier to get right and easier to maintain than fixes that merely produce the right output.
Triage Drill — Pick the Right Technique
🎯 Goal: Match each scenario to the right first move. The point isn’t speed; it’s discriminating between bug families.
Try the drill from memory. Pass threshold: 85%. After the quiz, you’ll see a recap of the cue→technique mapping for spaced retrieval next time.
Why this matters & what you'll learn
Knowing six debugger moves doesn’t help if you reach for the wrong one first. Real bugs arrive without labels; the skill that separates a competent debugger from a thrashing one is reading the cue in a bug description and picking the right first move. This step interleaves the three bug families you’ve practiced so the discrimination is forced — and adds two ubiquitous moves the lecture covered (rubber duck, post-fix documentation) so they’re in the toolkit.
You will learn to:
- Analyze a bug description and discriminate which family (boundary, data, temporal) it belongs to.
- Evaluate which technique fits each cue — and articulate why neighboring techniques don’t.
- Apply rubber-duck debugging and post-fix documentation as standard moves in your workflow.
🦆 Two debugging moves the lecture covered that you haven’t drilled yet
Before the quiz, lock these in. They’re cheap, ubiquitous in real practice, and the triage drill will mention them.
🦆 Rubber Duck Debugging — your most valuable root-cause tool
The lecture called this the “most valuable root-cause analysis tool” — and the call-out wasn’t ironic.
The Curse of Knowledge. When you’ve held a mental model of your code in your head for the past hour, you read what you intended to write, not what you actually wrote. Your eyes skip the bug because your model says it’s not there. This is why staring at the same five lines for 20 minutes rarely uncovers anything new.
The technique.
- Place a rubber duck (or any silent object — a coffee mug, a textbook, a sympathetic stuffed animal) on your desk.
- Explain to the duck what your code is supposed to do, line by line. Out loud. Slowly.
- At some point — typically a third of the way through — you’ll tell the duck what your code should be doing next, and realize that’s not what it’s actually doing.
That’s the moment your mental model and the actual code diverge. The bug lives in that gap.
Why it works. Verbalization forces you to retrieve and articulate each intermediate step instead of skimming over it. The duck doesn’t help you; explaining helps you. The duck just keeps you from looking like you’re talking to yourself.
Practice tip: when you don’t have a duck, write the explanation as a comment in the code (you can delete it after). Same effect.
📝 After the fix — document and regression-test (don't skip this)
The lecture closed phase 4 (Implement & verify a fix) with three moves you should plan to do every time:
- Add nearby assertions. When you find a bug, related bugs are often hiding in the same neighborhood. assert x is not None, assert len(items) > 0, assert response.status_code == 200 — assertions catch errors before they become failures (a sketch follows this list).
- Document why the fix was necessary in a code comment, in the git commit message, and in the bug report. Future-you (and future-teammate) will need to understand why this line exists; “fix bug” is not enough.
- Keep the bug-reproduction test in the suite for regression testing. Re-running existing tests after later code changes is how you make sure today’s fix doesn’t get silently undone next month. Every bug fix should leave behind a test.
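A sketch of all three moves applied to the Case 2 fix — the helper name normalize_kind is ours, not part of the case files:

```python
# Post-fix hygiene, sketched on the ledger bug (hypothetical helper name).
VALID_KINDS = {"DEPOSIT", "WITHDRAWAL", "REFUND", "REVERSAL", "FEE"}

def normalize_kind(raw: str) -> str:
    # Why this exists: CSV exports have shipped kinds with stray whitespace
    # (row T011 arrived as " REVERSAL" and fell through to the unknown-kind
    # fallback). Normalize and validate at the load boundary.
    kind = raw.strip().upper()
    # Nearby assertion: catch the error before it becomes a failure.
    assert kind in VALID_KINDS, f"malformed transaction kind: {raw!r}"
    return kind

def test_leading_whitespace_kind_is_normalized() -> None:
    # The bug-reproduction test stays in the suite as a regression test.
    assert normalize_kind(" REVERSAL") == "REVERSAL"
```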
The triage quiz below assumes you’ll do all three after picking the right first move.
This step is a quiz only. No code to edit.
Take your time on each scenario — the goal is matching cues to techniques, not memorizing pairs.
Step 8 — Knowledge Check
Min. score: 80%
1. A function processes 50,000 log lines and produces a wrong total. You’ve confirmed the bug is consistent run-to-run. Which technique most efficiently localizes it?
Long streams want conditional breakpoints. The condition is whatever invariant you suspect is broken (running_total > 1e9, line.startswith('ERROR'), etc.). The debugger filters; you only see the iterations that matter.
2. A recursive function returns the wrong answer for one specific input. The function is small (12 lines) and you have a clear test case that reproduces it. Which technique fits best?
For small, well-localized buggy functions, ordinary breakpoint + step + watch + call stack is the simplest and fastest combination. Reach for fancier tools (conditional breakpoints, back-in-time) only when the simpler tool is genuinely insufficient.
3. Final cart total is wrong; a discount appears to have been applied to the wrong line item. The cart processed 8 events (add item, apply coupon, etc.) and the wrong-line discount happened somewhere in the middle. Which technique fits best?
Back-in-time / scrubbing is the right first move when symptom and cause are temporally distant within a single run. After scrubbing localizes the suspicious event, an ordinary breakpoint can give you line-level precision.
4. A function has two parameters that should be independent. After running, you find that modifying one of them mysteriously changes the other. Which technique fits best?
Mysterious co-mutation is the signature of aliasing. The most efficient first move is checking the Variables tab: if two names share an oid, they reference the same object, and modifying one will appear to “modify” the other. The classic Python instance is mutable default arguments — exactly what you saw in Step 2’s enroll example.
5. You’ve spent 20 minutes setting and clearing breakpoints, making small edits, and rerunning tests. Nothing has worked, and you’re starting to feel frustrated. What’s the right next move?
When the cycle stalls, the move is to externalize. Write down the failure precisely, list hypotheses you’ve ruled out (and how), and re-pick a technique deliberately. This isn’t about willpower — it’s about getting the problem out of your head and onto a surface where you can reason about it. Research on debugging found that simply forcing this articulation helped students solve bugs they otherwise would have escalated.
6. A test passes locally on your laptop but fails on the autograder. You’ve reproduced the failure on the autograder twice. What’s the most useful first move?
Reproducibility is upstream of every debugging technique. A bug you can’t reproduce is a bug you can’t debug — none of breakpoints, scrubbing, or watches help if the failure isn’t in front of you. The first move is to find what differs between environments (Python version? OS? data? seed?) and either fix the discrepancy or simulate the autograder’s environment locally.
7. A test that previously passed now fails after a change you just made. The previous test still passes. What does this tell you?
A previously-passing test that newly fails after your change is a regression — your change broke a behavior that was correct. Revert and re-apply more carefully (smaller change, more thought). This is exactly why “verify means rerun the whole suite” — to catch regressions, not just confirm the one fix.
8. A payment processor handles 10,000 transactions. Two adjacent transactions produce totals that are slightly off — but only when a specific merchant ID appears. The failure is consistent run-to-run, and the wrong calculation fires exactly when the bad merchant ID is processed. Which technique fits best?
Conditional breakpoints vs. back-in-time scrubbing depend on temporal distance. Scrubbing earns its cost when symptom and cause are separated by time (many events happen between the bug and when you notice it). Here, the symptom co-occurs with the cause — the bad calculation fires exactly when the suspicious merchant ID is processed. A conditional breakpoint that pauses only on that ID is the direct move.
9. Which of these counts as evidence in the debugging cycle? (select all that apply)
Evidence is observable, specific, and reproducible. Variable values at specific lines, exact failure messages, and repr() outputs all qualify. Hunches are valuable as the starting point for hypothesis generation, but they don’t yet count as evidence — they need to be tested against observations before they earn that status. Distinguishing the two clearly is one of the highest-leverage moves an experienced debugger makes.
Transfer Challenge — You’re On Your Own
🎯 Goal: Find and fix a bug in unfamiliar code without step-by-step prompts. You pick the technique. You type the debugging log.
Compare to Cases 1–3: there, we numbered each stage of the cycle. Here, you do.
📂 What you have
A small program: tagger.py reads articles.txt (each line is "Title|tag") and returns the most common tag.
Two pytest tests in test_tagger.py:
test_python_is_most_common— fails (returns the wrong value).test_no_whitespace_in_result— fails (the result contains whitespace).
📋 Your debugging log
Open debugging_log.md and fill each field as you work.
🚨 Resist the obvious. You may recognize the bug family — but verify with the debugger before assuming. Pattern-matching without evidence is the trap of Step 7’s tinkering item.
Why this matters & what you'll learn
Knowing the cycle on scaffolded examples is one thing; running it without prompts on unfamiliar code is the actual job. Transfer is what tells you whether the cycle has become yours or whether it lived only in the labels we put around each stage. This step removes the per-stage scaffolds — you name the stages, pick the technique, and write the log — so you can see for yourself what you’ve internalized.
You will learn to:
- Apply the full cycle on unfamiliar code without step-by-step prompts.
- Evaluate which case from this tutorial the new bug most resembles structurally — and defend the match.
- Analyze your own default debugging mode (tinkering / print / hypothesis-driven) and name when to override it.
🔗 After fixing — before the quiz
The Transfer Challenge is intentionally in the same bug family as one of the three cases. Before reading the solution or the quiz:
- Which case is it most similar to structurally?
- Write one sentence: “Both bugs share ___ even though the surface is different because ___.”
- Write one sentence: “The surface difference is ___ — which is what makes this feel new.”
Commit to those sentences. Quiz Q1 asks you to defend the match.
🌐 Far-transfer probe — while you debug
Pick one codebase you’ve worked on recently. Where does external data enter (a file read, an API call, a form submission, a database query)? At that entry point: is normalization happening at the boundary, or are downstream consumers doing it — or not doing it at all? Spend 30 seconds answering for one entry point before you start the debugger.
Hint of last resort
If you haven’t found it yet after 10 minutes, the test output already tells you what repr(...) would tell you on a paused breakpoint. Re-read the failing assertion of test_no_whitespace_in_result.
🪞 Self-check — after you fix it
Before this tutorial, which mode would you have defaulted to on this bug?
- Tinkering — try
.strip(),.replace('\n', ''), and other edits until something worked. - Print-first — add
print(tag)everywhere. (The trailing\nprints as a literal newline, easy to miss;repr()makes it impossible to miss.) - Hypothesis-driven — breakpoint, inspect
repr(tag), name the cause, fix at the load boundary. - Honestly not sure — depends on the day and how stuck you felt.
Name which one. That’s the metacognitive skill: knowing your default mode is how you know when to override it.
"""Article tag analyzer.
Reads a file where each line is `"Title|tag"`, returns the most
common tag (uppercased) across all articles.
There is a bug. Both tests in test_tagger.py fail.
"""
from collections import Counter
def top_tag(articles_path: str) -> str:
counts: Counter[str] = Counter()
with open(articles_path) as f:
for line in f:
title, tag = line.split("|", 1)
counts[tag.upper()] += 1
return counts.most_common(1)[0][0]
Why Python rocks|python
JavaScript closures|javascript
Decorators in Python|python
Async Python explained|python
Rust intro|rust
from tagger import top_tag
def test_python_is_most_common() -> None:
# Three of five articles are tagged "python", so PYTHON should win.
assert top_tag('/tutorial/articles.txt') == "PYTHON"
def test_no_whitespace_in_result() -> None:
result = top_tag('/tutorial/articles.txt')
assert result == result.strip(), \
f"Result {result!r} contains whitespace — tags should be normalized at load time."
# Debugging log
Fill each field as you work. Fields 1, 2, 6, 7 are labeled for you.
Fields 3–5 are not — name the stage yourself, then fill in the content.
1. **Symptom** (one sentence — expected vs actual): _..._
2. **Predict** (what should the state be at the suspect line?): _..._
3. **[Stage name?]** (technique chosen and why — write: "I used [tool] because [cue]"): _..._
4. **[Stage name?]** (one sentence — *what* is wrong, *where* it lives): _..._
5. **[Stage name?]** (the line where intended and actual first diverge): _..._
6. **Fix** (file, line, minimal change): _..._
7. **Verify** (which tests pass now; any regressions?): _..._
<details><summary>Field labels 3–5 (open only after completing the log)</summary>
3. Evidence
4. Hypothesis
5. Localize
</details>
Step 9 — Knowledge Check
Min. score: 80%1. Which of the three earlier cases is this bug most structurally similar to?
This bug is the same family as Case 2 in different clothes. Both: external data (CSV row in Case 2, file line here) carries a stray whitespace character; the loading code doesn’t normalize it; the fix is to strip-and-validate at the data boundary. Recognizing isomorphism across surfaces is what transfer means in the research literature.
2. (Final retrieval — spaced from Step 1.) Place these debugging-cycle stages in order: A. Verify B. Symptom C. Hypothesis D. Fix E. Evidence F. Localize G. Predict
Symptom → Predict → Evidence → Hypothesis → Localize → Fix → Verify. The order matters: each stage produces what the next stage needs. Skipping or reordering creates known anti-patterns: tinkering (Fix-first), local verification (skipping Verify of the full suite), or pattern-matching wrong fixes (Localize without Hypothesis).
🪞 Final reflection (no graded answer): Which stage is hardest for you to slow down on? If your honest answer is “Fix” — i.e., you skip ahead to editing — you’re in good company. That’s the most common failure mode. The remedy is not willpower; it’s the explicit form of the cycle plus practice. You just did three rounds of practice.
3. (Spaced retrieval — Step 1’s “no edit until stage 6” rule.) You’re 30 seconds into investigating a bug. You think you see the problem. What does the discipline say to do right now?
“No edit until stage 6” is the central rule. Even a 5-second hypothesis (“I think it’s the off-by-one in the range call”) forces you to articulate what you believe before you commit to a fix. Without articulation, you fix-and-hope, which can take 10× longer than verbalize-then-fix.
4. (Transfer — apply the cycle to a new case.) A teammate reports: “My function expand_aliases is supposed to look up names in aliases.json, but every key returns None.” Which stage of the debugging cycle did your teammate just do, and what’s the next stage?
Symptom = the externally visible fault (“returns None”). The next stage is Predict — what should happen per the spec? Then Evidence — what is happening (use the debugger or print(repr(...))). Then Hypothesis. Skipping Predict is the most common shortcut and the most expensive one — without a written prediction, you can’t tell whether observation matches expectation.
5. (Spaced — Step 2’s aliasing badge.) Your code does:
def add_to(items: list[str] = []) -> list[str]:
items.append("x")
return items
print(add_to()) # ['x']
print(add_to()) # ['x', 'x'] ← surprise
Default argument values are evaluated once, at function-definition time. The items=[] creates one list, bound to the function as its default. Every call that uses the default reuses that same list. The fix is def add_to(items=None): items = items or [] (or if items is None: items = []). This is one of Python’s top-5 gotchas — the time-travel debugger’s aliasing badge (Step 2) lights up on this exact pattern.