TDD Tutorial — Print View

1

Cycle 1 — RED: Write the Failing Test

Prerequisite: Testing Foundations — pytest discovery, assert, partitions, behavior-not-implementation. If those feel new, do that one first.

You’ll grow a small dice-scoring engine over 15 steps using only Test-Driven Development.

Test-Driven Development in one minute

TDD is a design technique that uses tests as the medium of pressure. You write code in short cycles of three phases:

Phase	What you do	Why this phase exists
🔴 RED	Write one failing test that names a behavior you want	Forces the interface and expected behavior to be decided before any logic exists
🟢 GREEN	Write the smallest code that makes the test pass	Resists speculative design; only build what a test demands
🔵 REFACTOR	Improve the code while all tests stay green	The safety net lets you reshape structure without fear of regression

Each phase of cycle 1 is its own tutorial step so the rhythm becomes a felt sequence, not a slogan. From cycle 2 onward, each cycle is one step containing all three phases.

Why a failing test is the goal of RED

A failing test is not a bug — RED is the expected starting state of every cycle. But the failure has to come from the behavior under test, not from a typo:

Right reason — ImportError, AttributeError, a value-mismatch on the assertion you wrote. The test correctly says “this behavior does not exist yet.”
Wrong reason — SyntaxError, missing colon, misspelled test_ prefix. The test never ran. You’ve learned nothing about the unit, only about your typing.

Students commonly delete a failing test to make the bar green. We’re leaning into that discomfort instead. Learning to read the failure is what TDD trains.

The shape of every pytest test

Every pytest test you write has the same four-part structure:

Part	What it does
Import the unit under test	Tells Python what code you’ll call
Define a function whose name starts with `test_`	pytest only discovers functions matching this pattern
Arrange + act	Set up any input and call the unit
Assert an observable property of the result	Pin down one thing the spec promises

The pattern generalises; the specifics (what to import, call, and assert) come from the spec — and only from the spec.

The dragon dice rules (reference for all 12 cycles)

Roll	Event	Damage
Single 1	Dragon Flame	100
Single 5	Lightning Spark	50
Triple 1	Dragon Blast	1000
Triple 2	Goblin Swarm	200
Triple 3	Orc Charge	300
Triple 4	Troll Smash	400
Triple 5	Lightning Storm	500
Triple 6	Demon Strike	600

Triples consume three dice; leftover 1s and 5s still score as singles. Dice are integers 1–6. Today you implement only the empty-roll case. Eleven more cycles add the rest.

Cycle 1’s spec

An empty roll produces a battle report with zero damage and no events.

That sentence names everything you need: a function score, a return value with total_damage and events attributes. Translate it into a pytest test using the four-part shape.

Your task

In test_scorer.py (right pane), fill in the three sub-goal comments. Leave scorer.py empty — its code belongs to the GREEN step.
Predict the exact failure message you’ll see, then click Run.
Expected output: ImportError: cannot import name 'score' (or ModuleNotFoundError if scorer.py is empty). That IS the deliverable — RED for the right reason.

Detail: why a tuple, not a list

The empty events case is written as () (an empty tuple), not [] (an empty list). Tuples are immutable — they can’t be mutated by accident, and they’re safe dataclass defaults (cycle 2 uses that). Every test in this tutorial that pins down events uses a tuple.

References

pytest — Get started
Python — assert statement
If you get stuck on the test, click Show Solution below the editor for one possible form.

Self-test before clicking Next

Close the page in your head and answer:

What two pieces of information does pytest’s failure message give you?
Why is “RED for the right reason” worth distinguishing from “RED for any reason”?
If you had to write a test for a multiply(a, b) function from scratch, what four parts would it contain?

Starter files

scorer.py

# Cycle 1 RED phase — DO NOT WRITE PRODUCTION CODE HERE YET.
#
# The next tutorial step (Cycle 1 GREEN) is where the BattleReport
# class and the score function are introduced. Right now we are
# only writing the failing test on the right.

test_scorer.py

"""Cycle 1 RED — write the first failing test.

The sub-goals below describe the PURPOSE of each line you need to add,
not the syntax. Translate the spec ("an empty roll has no damage and no
events") into pytest assertions yourself. If you get stuck, consult the
rules table and the references in the instructions panel.
"""
from scorer import score


def test_empty_roll_has_zero_damage_and_no_events():
    # Sub-goal: call the unit under test on the simplest input the spec mentions
    #           — and capture the result so the next two lines can inspect it.

    # Sub-goal: pin down what the spec says about damage in this case.

    # Sub-goal: pin down what the spec says about events in this case.
    pass

2

Cycle 1 — GREEN: Make It Pass

The GREEN rule: write the smallest code that makes the failing test pass. Anything more is speculative design — code with no test demanding it.

The test is your contract

Every line of the test is an obligation your code must satisfy:

Line of the test	What your code must provide
`from scorer import score`	A `score` name in `scorer.py`
`report = score([])`	`score` returns something
`assert report.total_damage == 0`	That something exposes `total_damage`, equal to `0`
`assert report.events == ()`	…and `events`, equal to `()`

Four obligations. Anything you write that does not contribute to one of those four is speculative design — it has no test demanding it, so it has no test guarding it, so it has no business being there yet.

Three Python tools the scaffold uses

Three building blocks you’ll need. None are TDD-specific — they’re standard Python tools you’ll reach for many times.

@dataclass auto-writes __init__, __repr__, and __eq__ from your field declarations. You write the field names and types; Python writes the bookkeeping. The auto-generated __eq__ is what makes assert report.events == (...) work field-by-field. (docs)
frozen=True added to @dataclass(frozen=True) makes instances immutable — fields can’t be reassigned after construction. Frozen dataclasses are also hashable. We rely on this in cycle 2, where test assertions compare tuples of ScoringEvent instances. (docs)
@property turns a method into an attribute read. Calling sites use report.total_damage (no parens) but Python runs the function body each time. The test reads total_damage like a field; the @property decorator keeps that grammar consistent. (docs)

The Transformation Priority Premise — why “smallest” beats “best”

Robert Martin’s TPP lists code transformations from simplest to most complex: nothing → constant → variable → conditional → loop. The rule: always pick the simpler transformation that passes the current failing test, even when you “know” a more general one is coming.

For cycle 1, the test only mentions the empty case. You do not need a loop yet — the empty-tuple default already produces 0 damage. The loop arrives when a test (cycle 4) actually demands it.

Your task

In scorer.py (left pane), replace each sub-goal comment with the matching line. Re-read the test for the contract.
Predict what pytest will show. Run.
Resist any “improvement” beyond what the test demands — the next step is REFACTOR, and it only earns work that has somewhere to go.

Common mistakes to avoid

events: list = [] — Python rejects mutable defaults in dataclasses with ValueError. What immutable alternative matches the test’s events == () assertion?
Forgetting @property — without it, report.total_damage is a bound method object, not a number; the assertion fails in a weird way.
@dataclass without frozen=True — passes cycle 1, but cycle 2’s tuple comparisons of value objects need the structural __eq__ that frozen dataclasses provide.
if not dice: ... — speculative branching. The empty-tuple default already handles the empty case.

If you get genuinely stuck, click Show Solution below the editor for one possible form.

Self-test before clicking Next

What three things does @dataclass write for you that you’d otherwise type by hand?
Why is @property a better fit than a plain method for total_damage given how the test reads it?
What does frozen=True buy us, and which later cycle will first rely on it?
What four obligations does the cycle-1 test impose — derived from reading the test alone?

Starter files

scorer.py

"""Cycle 1 GREEN — smallest code that turns the failing test green.

The sub-goals below describe the PURPOSE of each line you need to add,
not the syntax. Re-read test_scorer.py to recover the contract: a name
to export, a return value, two attributes on the return value with
specific values. The Cart example in the instructions shows the toolkit
shape; you must translate it to the dragon-dice naming yourself.
"""
from dataclasses import dataclass


@dataclass(frozen=True)
class BattleReport:
    # Sub-goal: declare the storage that the test reads as `report.events`.
    # Hint: the test compares this to `()`, which already tells you the
    # type and the default value. (See the dataclasses docs link.)

    @property
    def total_damage(self):
        # Sub-goal: derive the total from whatever events the report holds.
        # Hint: with an empty-tuple default for events, an aggregate built-in
        # over an empty sequence already produces the value the test asserts.
        pass


def score(dice):
    # Sub-goal: hand back the kind of object the test reads attributes on.
    # Hint: ignore `dice` for now — no test makes a claim about non-empty
    # rolls yet, so any branching on it would be speculative design.
    pass

test_scorer.py

"""Cycle 1 — first failing test (carried over from the RED step)."""
from scorer import score


def test_empty_roll_has_zero_damage_and_no_events():
    report = score([])

    assert report.total_damage == 0
    assert report.events == ()

3

Cycle 1 — REFACTOR: The Pause That Counts

The discipline: REFACTOR is a phase you enter every cycle — even when the answer is “nothing to clean this time.” Entering and looking is the discipline. Skipping the look is the failure mode that quietly degrades TDD into test-after.

The REFACTOR checklist (you’ll re-use this every cycle)

Category	Question to ask	Cycle 1 answer
Duplication	Two pieces of code expressing the same idea?	No — only one piece of code
Names	Do names describe what they mean, not how they work?	`BattleReport`, `total_damage`, `events`, `score` — all domain words
Test names	Does each test name read as a behavior sentence?	`test_empty_roll_has_zero_damage_and_no_events` — long but unambiguous
Magic constants	Unexplained numbers or strings?	None yet
Imports	Conventional order, no dead imports?	Just `from dataclasses import dataclass` — clean

For cycle 1, every row is “fine.” That’s a real outcome of a REFACTOR phase — and recognising it without skipping the look is the win.

Your task

Re-read your code with the checklist. Spend 30 seconds — don’t rush.
Make any tiny improvement you spot (e.g., a module docstring); keep the bar green.
Take the first quiz below, focused on the rhythm you just lived through.

Why REFACTOR is the most-skipped phase

Martin Fowler calls skipping refactor “the most common way to screw up TDD.” Field studies of student and professional practice agree: developers treat the green bar as the finish line. Within a few cycles, duplication accumulates and the test suite ages — exactly because nobody paused at REFACTOR to look. By making “enter the phase even when there’s nothing to do” a habit now, you defend against that drift for the rest of the tutorial.

Starter files

scorer.py

from dataclasses import dataclass


@dataclass(frozen=True)
class BattleReport:
    events: tuple = ()

    @property
    def total_damage(self):
        return sum(event.damage for event in self.events)


def score(dice):
    return BattleReport()

test_scorer.py

"""Cycle 1 — first failing test, now green."""
from scorer import score


def test_empty_roll_has_zero_damage_and_no_events():
    report = score([])

    assert report.total_damage == 0
    assert report.events == ()

First Quiz — The Red-Green-Refactor Rhythm

Min. score: 80%

1. What is the correct order of phases in a single TDD cycle?

GREEN → RED → REFACTOR — write code, then prove it works with a test, then clean up
RED → GREEN → REFACTOR — write a failing test, write the smallest code to pass it, then improve the design with the safety net in place
RED → REFACTOR → GREEN — write a failing test, design the right structure first, then make the test pass
REFACTOR → RED → GREEN — clean up old code first, then write a test for the new behavior, then implement it

RED → GREEN → REFACTOR. The order is load-bearing. RED proves the test reveals missing behavior. GREEN proves the code now satisfies that behavior. REFACTOR improves the code while the safety net (the green test) protects you.

2. You write a new test and click Run. The bar is immediately green without you having to write any production code. What is the most defensible interpretation?

Celebrate — the existing code already implements this behavior, so there is nothing to do this cycle
Be suspicious — a test that passes without RED may be testing nothing meaningful, OR may genuinely be covered by an earlier refactor. Verify which by temporarily breaking the production code and confirming the test fails for the right reason
Delete the test — a test that passes immediately is redundant and only adds runtime cost
Rewrite the test until it fails, since the rule is RED → GREEN → REFACTOR and you cannot enter GREEN without first being RED

Both halves matter. Sometimes a test passes immediately because a prior refactor generalised behavior — that is a positive outcome, and the test still earns its place by documenting that the behavior is intended. Other times the test is vacuously true. The way to tell is to break the production code on purpose: a real test will catch the break.

3. In the GREEN phase, you have two implementations in mind: one is a hardcoded if dice == []: return BattleReport(), the other is a generic loop. The hardcoded version passes the current test. What does TDD discipline say to do?

Write the generic loop — it will be needed eventually, and writing it now saves work in the next cycle
Write the hardcoded version — TDD’s GREEN rule is the smallest code that passes the current failing test, even if you know more cycles are coming
Skip GREEN entirely and design the loop as part of REFACTOR — the cycle is flexible about which phase contains the new logic
It does not matter — both choices satisfy the test, and the difference is purely stylistic

This is the Transformation Priority Premise in action: prefer simpler transformations. The hardcoded version is fine as long as a future test will challenge it. That challenge — and the design pressure it creates — is exactly what cycle 4 of this tutorial will hand you.

4. Why is REFACTOR worth entering even when there is nothing to clean up?

Because pytest requires every cycle to have all three phases or it will skip the test in the next run
Because the discipline of pausing to look is the whole point — finding nothing actionable is fine, but skipping the look lets duplication and bad names accumulate silently across cycles
Because REFACTOR is the only phase where you can add new tests without breaking the cycle’s contract
Because the GREEN code is always wrong on the first attempt and must be revised before moving on

Field studies report that “skip refactor when it looks fine” is exactly how test-driven discipline degrades into test-after over a few weeks. Every cycle is a checkpoint where you ask “what do I see now?” Most cycles, the answer is mostly fine. But you only know that because you looked.

5. A teammate says: “TDD is just unit testing — write your tests first instead of last.” What is the most accurate correction?

They are right — TDD is a workflow ordering decision; the resulting tests and code are otherwise indistinguishable from test-after
TDD is a design technique that uses tests as the medium. Writing the test first forces interface and behavior decisions before implementation, and the REFACTOR step is where the design pressure pays off — neither of which is part of plain unit testing
TDD is primarily a refactoring technique, and the tests exist only as a regression safety net for the refactoring
TDD requires writing all tests for a feature first, then all the code at once — which is the opposite of unit testing’s incremental style

This is the threshold concept: TDD is design first, testing second. The test forces you to decide the public interface (function name, parameters, return shape) before any logic exists. The REFACTOR phase is where the design that emerged under pressure gets shaped intentionally. Calling it “unit testing in reverse order” misses both halves.

4

Cycle 2 — Single 1 → Dragon Flame

Spec: a single die showing 1 creates a Dragon Flame event worth 100 damage.

From now on, each cycle is one step with three tasks (RED → GREEN → REFACTOR). Same discipline as cycle 1 — tighter packaging.

Your task

🔴 RED — add test_single_one_creates_dragon_flame_event in test_scorer.py. The assertion should pin down total_damage == 100 and one ScoringEvent("Dragon Flame", (1,), 100) in events. Update the import to bring in ScoringEvent. Run — predict the failure, check.
🟢 GREEN — introduce a ScoringEvent dataclass and the smallest change to score. The minimum here is a hardcoded if dice == [1]: branch — that’s deliberate.
🔵 REFACTOR — walk the cycle-1 checklist. Resist generalising; cycle 4 will earn the loop.

Expected RED: ImportError: cannot import name 'ScoringEvent' — the test forces you to name the event class before writing it. That’s the design pressure of test-first thinking.

Why “allow the hard-code” is a TDD discipline

The instinct is to extract a rule, write a loop, build the abstraction now. TDD asks you to wait for the test that demands it. A speculative loop is a guess at the right shape; a loop refactor pulled by cycle 4’s test is a discovery. Refactor toward duplication, not before it.

If you’re genuinely stuck, click Show Solution below the editor.

Starter files

scorer.py

from dataclasses import dataclass


@dataclass(frozen=True)
class BattleReport:
    events: tuple = ()

    @property
    def total_damage(self):
        return sum(event.damage for event in self.events)


def score(dice):
    return BattleReport()

test_scorer.py

"""Cycles 1–2 — adding the Dragon Flame behavior."""
from scorer import score


def test_empty_roll_has_zero_damage_and_no_events():
    report = score([])

    assert report.total_damage == 0
    assert report.events == ()


# TODO (RED): import ScoringEvent from scorer
# TODO (RED): write test_single_one_creates_dragon_flame_event
#             score([1]) should return a report with total_damage == 100
#             and events == (ScoringEvent("Dragon Flame", (1,), 100),)

5

Cycle 3 — Single 5 → Lightning Spark

Spec: a single die showing 5 creates a Lightning Spark event worth 50 damage.

Same shape as cycle 2, different values. The duplication this creates is intentional — cycle 4’s test will earn the right to fix it.

Your task

🔴 RED — add test_single_five_creates_lightning_spark_event, structured exactly like cycle 2’s test but with the Lightning Spark values.
🟢 GREEN — add a second hardcoded if dice == [5]: branch. Resist the urge to write a loop or dict lookup.
🔵 REFACTOR — walk the checklist. The duplication is now visible; the right move is to note it and write nothing. No test demands the loop yet.

Why deliberately keeping ugly code is the disciplined move

You can clearly see duplication. Refactoring it now would be guessing at the right shape with one too few data points. Cycle 4’s test will provide the second data point — and the loop refactor it earns is a discovery, not a guess. Refactor toward duplication, not before it.

Click Show Solution below the editor if you need one form of the test + GREEN code.

Starter files

scorer.py

from dataclasses import dataclass


@dataclass(frozen=True)
class ScoringEvent:
    name: str
    dice_used: tuple
    damage: int


@dataclass(frozen=True)
class BattleReport:
    events: tuple = ()

    @property
    def total_damage(self):
        return sum(event.damage for event in self.events)


def score(dice):
    if dice == [1]:
        return BattleReport((
            ScoringEvent("Dragon Flame", (1,), 100),
        ))
    return BattleReport()

test_scorer.py

"""Cycles 1–3 — adding the Lightning Spark behavior."""
from scorer import score, ScoringEvent


def test_empty_roll_has_zero_damage_and_no_events():
    report = score([])

    assert report.total_damage == 0
    assert report.events == ()


def test_single_one_creates_dragon_flame_event():
    report = score([1])

    assert report.total_damage == 100
    assert report.events == (
        ScoringEvent("Dragon Flame", (1,), 100),
    )


# TODO (RED): write test_single_five_creates_lightning_spark_event
#             score([5]) should return total_damage == 50 with
#             events == (ScoringEvent("Lightning Spark", (5,), 50),)

6

Cycle 4 — Repeated Singles → First Real Refactor

Spec: score([1, 1]) returns total damage 200 with two Dragon Flame events.

The first design-breaking test. Neither dice == [1] nor dice == [5] matches [1, 1] — the duplication you noted in cycle 3 just demanded payment.

Your task

🔴 RED — add test_two_ones_create_two_dragon_flames asserting damage 200 and two Flame events. Run; predict the failure.
🟢 GREEN — you have two options. Pick the better one:
- Option A: a third hardcoded branch for dice == [1, 1]. Passes — and makes duplication worse, with a fourth branch coming next cycle.
- Option B: replace the hardcoded branches with a for loop over each die. Passes cycle 4 and retires the cycle-3 duplication in one move.
🔵 REFACTOR — re-run; the previous three tests are your safety net. Note what just happened: you rewrote the body of score and knew within a second that you didn’t break anything.

The threshold-concept moment: tests enable change

You just rewrote the body of score — two hardcoded branches → generic per-die loop — and confirmed in one second that nothing broke. Without the test suite you’d have had to reason from scratch: “Does this still handle empty? Single 1? Single 5?”

That’s what “tests enable change” means. Not “prevent change.” Not “slow change.” Enable — they replace fear-driven manual reasoning with a mechanical check. This is the cycle that earns the safety-net quiz below.

Click Show Solution if you need to see one form of option B.

Starter files

scorer.py

from dataclasses import dataclass


@dataclass(frozen=True)
class ScoringEvent:
    name: str
    dice_used: tuple
    damage: int


@dataclass(frozen=True)
class BattleReport:
    events: tuple = ()

    @property
    def total_damage(self):
        return sum(event.damage for event in self.events)


def score(dice):
    if dice == [1]:
        return BattleReport((
            ScoringEvent("Dragon Flame", (1,), 100),
        ))
    if dice == [5]:
        return BattleReport((
            ScoringEvent("Lightning Spark", (5,), 50),
        ))
    return BattleReport()

test_scorer.py

"""Cycles 1–4 — repeated singles force the first refactor."""
from scorer import score, ScoringEvent


def test_empty_roll_has_zero_damage_and_no_events():
    report = score([])

    assert report.total_damage == 0
    assert report.events == ()


def test_single_one_creates_dragon_flame_event():
    report = score([1])

    assert report.total_damage == 100
    assert report.events == (
        ScoringEvent("Dragon Flame", (1,), 100),
    )


def test_single_five_creates_lightning_spark_event():
    report = score([5])

    assert report.total_damage == 50
    assert report.events == (
        ScoringEvent("Lightning Spark", (5,), 50),
    )


# TODO (RED): write test_two_ones_create_two_dragon_flames
#             score([1, 1]) should return total_damage == 200 and two
#             Dragon Flame events in events

Cycle 4 Quiz — Tests as Design Pressure

Min. score: 80%

1. You noticed obvious duplication after cycle 3 but did not refactor it. Cycle 4’s test then forced the refactor. Why is this sequence (notice → wait → refactor when test demands) better than (notice → refactor immediately)?

It is not better — refactoring as soon as you see duplication is always the right move; this tutorial just artificially delayed it
Waiting until a test demands the change ensures the refactor is justified by behavior, not by aesthetic preference. It also gives you more data points to design the right shape, instead of guessing
Python performance is better when you delay extracting loops, because the interpreter optimises hardcoded branches differently from iterative ones
It is purely a style choice — the resulting code in either order is identical, so the order of operations does not matter

The test that demands the refactor is also the test that confirms the refactor preserved behavior. Refactoring on aesthetic instinct alone is gambling — you might pick a shape that the next test makes ugly anyway. The discipline is: let the tests speak first, then change the structure.

2. You replaced the hardcoded branches with a for loop and ran pytest. All four tests passed. What did the test suite just do for you that you would otherwise have had to do manually?

Nothing — passing tests do not provide any information beyond what manual inspection of the new code would have
It mechanically verified that the new structure preserves the behavior of the empty roll, single 1, single 5, and [1, 1] cases — replacing fear-driven manual reasoning with a one-second check
It rewrote your code to be more efficient — the suite optimises the implementation as a side effect of being run
It logged the change so a future code reviewer can audit it — the suite acts as a versioned history of refactor decisions

Tests enable change. The four green checkmarks are not just “tests passed” — they are “the empty case still works, single 1 still works, single 5 still works, and [1, 1] works for the first time.” Without the suite, you would have to convince yourself of each line by reading.

3. A teammate looks at your cycle-4 GREEN code and says: “Why didn’t you just add a third if dice == [1, 1]: branch? It would have passed the test.” What is the most accurate rebuttal?

A third branch would have passed cycle 4, but the duplication would have got worse, cycle 5 would have demanded a fourth branch, and the eventual loop refactor would still have to happen — only with more code to delete
A third branch is forbidden by pytest — only one if statement per function is allowed in TDD code
The teammate is right and you should rewrite — TDD always prefers the smallest change, and one new if is smaller than rewriting the function
It does not matter — both approaches produce the same byte code after Python compiles them

The TDD GREEN rule is “smallest code that passes the failing test” given the current direction of the design. A third hardcoded branch is locally minimal but globally wasteful — you’ll throw it away in cycle 5 or cycle 6 anyway. The loop is the smallest durable change.

4. (Spaced review — Cycle 1) Why is “RED for the right reason” still the right framing in cycle 4, even though we are now adding a test on top of an already-passing suite?

It is not — RED for the right reason only applies to cycle 1, where there is no implementation at all
Because the new test should fail with a meaningful mismatch (assertion failure on the expected events tuple) — not a typo or a missing import. That tells you the test is correctly pinning down a behavior that the current implementation cannot produce
Because pytest will only run new tests after old ones have failed at least once
Because every cycle must produce both a red bar and an import error or it will be rejected by the test runner

“RED for the right reason” applies to every cycle. In cycle 1 the right reason is usually ImportError. In later cycles it is typically an assertion failure showing the expected tuple of events versus what the current code actually produced. Either way, the failure has to come from the behavior under test, not from a typo.

7

Cycle 5 — Mixed Dice (No Production Change)

Spec: score([1, 5]) produces a mix of one Dragon Flame and one Lightning Spark.

Your task

🔴 RED — add test_one_and_five_create_two_different_events. Predict what pytest will show this time, then run.
🟢 GREEN — most likely outcome: all five tests pass without any production change. Cycle 4’s loop already handles mixed dice independently. The new test costs zero code — it just documents the intended behavior.
🔵 REFACTOR — verify the “free pass” with the ten-second mutation move (next collapsible).

The mutation move (verify the test isn’t vacuous — do this!)

A test that passes immediately can mean two things:

Why?	What it means
A previous refactor already generalised the behavior	✅ Positive — design is healthy
The assertion is vacuously true / missing / wrong oracle	❌ Defect — a Liar test

Tell them apart in 10 seconds: break the production code on purpose. Open scorer.py, temporarily change if die == 5: to if die == 999:. Run pytest. The new test should now fail — proving it actually checks the behavior. Restore the line.

This single move is the strongest defence against AI-generated Liar tests. Make it a habit; we’ll come back to it.

Starter files

scorer.py

from dataclasses import dataclass


@dataclass(frozen=True)
class ScoringEvent:
    name: str
    dice_used: tuple
    damage: int


@dataclass(frozen=True)
class BattleReport:
    events: tuple = ()

    @property
    def total_damage(self):
        return sum(event.damage for event in self.events)


def score(dice):
    events = []

    for die in dice:
        if die == 1:
            events.append(ScoringEvent("Dragon Flame", (1,), 100))
        if die == 5:
            events.append(ScoringEvent("Lightning Spark", (5,), 50))

    return BattleReport(tuple(events))

test_scorer.py

"""Cycles 1–5 — adding mixed dice."""
from scorer import score, ScoringEvent


def test_empty_roll_has_zero_damage_and_no_events():
    report = score([])

    assert report.total_damage == 0
    assert report.events == ()


def test_single_one_creates_dragon_flame_event():
    report = score([1])

    assert report.total_damage == 100
    assert report.events == (
        ScoringEvent("Dragon Flame", (1,), 100),
    )


def test_single_five_creates_lightning_spark_event():
    report = score([5])

    assert report.total_damage == 50
    assert report.events == (
        ScoringEvent("Lightning Spark", (5,), 50),
    )


def test_two_ones_create_two_dragon_flames():
    report = score([1, 1])

    assert report.total_damage == 200
    assert report.events == (
        ScoringEvent("Dragon Flame", (1,), 100),
        ScoringEvent("Dragon Flame", (1,), 100),
    )


# TODO (RED): write test_one_and_five_create_two_different_events
#             score([1, 5]) should return total_damage == 150 with
#             one Dragon Flame followed by one Lightning Spark

Solution

scorer.py

from dataclasses import dataclass


@dataclass(frozen=True)
class ScoringEvent:
    name: str
    dice_used: tuple
    damage: int


@dataclass(frozen=True)
class BattleReport:
    events: tuple = ()

    @property
    def total_damage(self):
        return sum(event.damage for event in self.events)


def score(dice):
    events = []

    for die in dice:
        if die == 1:
            events.append(ScoringEvent("Dragon Flame", (1,), 100))
        if die == 5:
            events.append(ScoringEvent("Lightning Spark", (5,), 50))

    return BattleReport(tuple(events))

test_scorer.py

"""Cycles 1–5 — five tests, all green."""
from scorer import score, ScoringEvent


def test_empty_roll_has_zero_damage_and_no_events():
    report = score([])

    assert report.total_damage == 0
    assert report.events == ()


def test_single_one_creates_dragon_flame_event():
    report = score([1])

    assert report.total_damage == 100
    assert report.events == (
        ScoringEvent("Dragon Flame", (1,), 100),
    )


def test_single_five_creates_lightning_spark_event():
    report = score([5])

    assert report.total_damage == 50
    assert report.events == (
        ScoringEvent("Lightning Spark", (5,), 50),
    )


def test_two_ones_create_two_dragon_flames():
    report = score([1, 1])

    assert report.total_damage == 200
    assert report.events == (
        ScoringEvent("Dragon Flame", (1,), 100),
        ScoringEvent("Dragon Flame", (1,), 100),
    )


def test_one_and_five_create_two_different_events():
    report = score([1, 5])

    assert report.total_damage == 150
    assert report.events == (
        ScoringEvent("Dragon Flame", (1,), 100),
        ScoringEvent("Lightning Spark", (5,), 50),
    )

The cycle-4 loop already handles mixed dice. Cycle 5’s test passes without any production change — and this is a positive outcome that documents the intended behavior. Verify with the ten-second mutation move described in the instructions.

8

Cycle 6 — Triple 1s → Dragon Blast (Design Moment)

Spec: three 1s in a roll combine into one Dragon Blast (1000 damage) instead of three Dragon Flames.

The pivot moment. Cycle 4’s per-die loop walks each die independently — it has no way to know that the other two 1s exist when it processes the first. This test cannot be satisfied by tweaking a branch; the structure has to change. A design-breaking test.

Your task

🔴 RED — add test_three_ones_create_dragon_blast_instead_of_three_flames. The current loop produces three Flames at 300 damage; the test demands one Blast at 1000. Run, see the value-mismatch.
🟢 GREEN — the structural shift: stop iterating dice in order, count how many of each face appeared (collections.Counter), then decide what to emit. Combos consume dice; leftovers still score as singles.
🔵 REFACTOR — re-run; the previous five tests survived a full body rewrite of score. That’s the safety net working.

Why this was a design-breaking test (and why the previous tests still pass)

A design-breaking test cannot be satisfied by adding a branch — only by reshaping structure. The cycle-4 per-die loop was fundamentally per-die; combos are fundamentally per-face-count. You couldn’t patch your way from one to the other.

Yet all five previous tests still pass after the rewrite. That’s because they assert on observable behavior (total_damage == 100, events == (event,)), not on internals (which loop, which variable name). Behavior tests survive structural rewrites; implementation-tests don’t. This is the Refactoring Litmus Test.

Starter files

scorer.py

from dataclasses import dataclass


@dataclass(frozen=True)
class ScoringEvent:
    name: str
    dice_used: tuple
    damage: int


@dataclass(frozen=True)
class BattleReport:
    events: tuple = ()

    @property
    def total_damage(self):
        return sum(event.damage for event in self.events)


def score(dice):
    events = []

    for die in dice:
        if die == 1:
            events.append(ScoringEvent("Dragon Flame", (1,), 100))
        if die == 5:
            events.append(ScoringEvent("Lightning Spark", (5,), 50))

    return BattleReport(tuple(events))

test_scorer.py

"""Cycles 1–6 — triple 1s break the per-die loop."""
from scorer import score, ScoringEvent


def test_empty_roll_has_zero_damage_and_no_events():
    report = score([])

    assert report.total_damage == 0
    assert report.events == ()


def test_single_one_creates_dragon_flame_event():
    report = score([1])

    assert report.total_damage == 100
    assert report.events == (
        ScoringEvent("Dragon Flame", (1,), 100),
    )


def test_single_five_creates_lightning_spark_event():
    report = score([5])

    assert report.total_damage == 50
    assert report.events == (
        ScoringEvent("Lightning Spark", (5,), 50),
    )


def test_two_ones_create_two_dragon_flames():
    report = score([1, 1])

    assert report.total_damage == 200
    assert report.events == (
        ScoringEvent("Dragon Flame", (1,), 100),
        ScoringEvent("Dragon Flame", (1,), 100),
    )


def test_one_and_five_create_two_different_events():
    report = score([1, 5])

    assert report.total_damage == 150
    assert report.events == (
        ScoringEvent("Dragon Flame", (1,), 100),
        ScoringEvent("Lightning Spark", (5,), 50),
    )


# TODO (RED): write test_three_ones_create_dragon_blast_instead_of_three_flames
#             score([1, 1, 1]) should return total_damage == 1000 with
#             a single Dragon Blast event whose dice_used is (1, 1, 1)

Solution

scorer.py

from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class ScoringEvent:
    name: str
    dice_used: tuple
    damage: int


@dataclass(frozen=True)
class BattleReport:
    events: tuple = ()

    @property
    def total_damage(self):
        return sum(event.damage for event in self.events)


def score(dice):
    counts = Counter(dice)
    events = []

    if counts[1] >= 3:
        events.append(ScoringEvent("Dragon Blast", (1, 1, 1), 1000))
        counts[1] -= 3

    for _ in range(counts[1]):
        events.append(ScoringEvent("Dragon Flame", (1,), 100))

    for _ in range(counts[5]):
        events.append(ScoringEvent("Lightning Spark", (5,), 50))

    return BattleReport(tuple(events))

test_scorer.py

"""Cycles 1–6 — combos enter the design."""
from scorer import score, ScoringEvent


def test_empty_roll_has_zero_damage_and_no_events():
    report = score([])

    assert report.total_damage == 0
    assert report.events == ()


def test_single_one_creates_dragon_flame_event():
    report = score([1])

    assert report.total_damage == 100
    assert report.events == (
        ScoringEvent("Dragon Flame", (1,), 100),
    )


def test_single_five_creates_lightning_spark_event():
    report = score([5])

    assert report.total_damage == 50
    assert report.events == (
        ScoringEvent("Lightning Spark", (5,), 50),
    )


def test_two_ones_create_two_dragon_flames():
    report = score([1, 1])

    assert report.total_damage == 200
    assert report.events == (
        ScoringEvent("Dragon Flame", (1,), 100),
        ScoringEvent("Dragon Flame", (1,), 100),
    )


def test_one_and_five_create_two_different_events():
    report = score([1, 5])

    assert report.total_damage == 150
    assert report.events == (
        ScoringEvent("Dragon Flame", (1,), 100),
        ScoringEvent("Lightning Spark", (5,), 50),
    )


def test_three_ones_create_dragon_blast_instead_of_three_flames():
    report = score([1, 1, 1])

    assert report.total_damage == 1000
    assert report.events == (
        ScoringEvent("Dragon Blast", (1, 1, 1), 1000),
    )

The structural shift: count occurrences with Counter, run the combo check first, subtract the consumed dice, then emit singles for what’s left. The five previous tests act as the safety net for the rewrite — and all five still pass because counting-and-emitting is observationally equivalent to the per-die loop for non-combo cases.

Cycle 6 Quiz — Design-Breaking Tests

Min. score: 80%

1. What makes cycle 6’s test a design-breaking test, as opposed to a normal “add a branch” test?

Cycle 6’s assertion is longer than previous assertions, so the implementation has to grow proportionally to satisfy it
The per-die loop processes each die in isolation. There is no way for the loop to know, when it sees the first 1, that two more 1s exist later. Counting requires a structural shift — not a new branch
It uses pytest.raises, which forces the implementation to add exception handling
It tests three dice instead of one or two, and pytest internally requires a different runner for tests that exceed two arguments

A design-breaking test is one no local edit can satisfy. The cycle-4 loop is fundamentally per-die; combos are fundamentally per-face-count. You cannot patch your way from one to the other — you have to re-shape the algorithm. That re-shaping is what the test extracts as design pressure.

2. After the count-then-emit refactor, all five previous tests still passed. What does that tell you about the previous tests?

That they were unnecessary — passing on a totally different implementation means they were not pinning down anything specific
That they were testing observable behavior (return values), not implementation details (which loop was used). Behavior-level tests survive a structural rewrite; implementation-level tests would have broken
That pytest silently re-imports the previous implementation if the new one breaks them, so the green bar in this case is misleading
That the previous tests were duplicates of each other and could be deleted to simplify the suite

This is the Refactoring Litmus Test concept. Robust tests assert on what the unit does (report.total_damage == 100, report.events == (event,)), not on how it does it (assert "for die in dice" in inspect.getsource(score)). The first survives a rewrite; the second breaks at the slightest refactor.

3. Why does cycle 6’s GREEN code subtract from counts[1] after emitting the Dragon Blast event?

To avoid an off-by-one error in the next cycle’s parametrized tests
Because a combo consumes the dice it uses. If three 1s become a Dragon Blast, those three 1s are no longer available to score as Dragon Flames. The subtraction makes the bookkeeping explicit
Because Python’s Counter requires explicit decrement after every read; otherwise it caches stale values
It is a stylistic choice — counts[1] = 0 and del counts[1] would also work and produce the same behavior

“Combos consume dice, leftovers still score as singles” is one of the rules. The subtraction isn’t a Python quirk — it is the domain rule made explicit in the code. Cycle 7’s test will verify exactly this leftover behavior.

4. (Spaced review — Cycle 4) A teammate suggests “let’s just add if Counter(dice)[1] >= 3: ... at the top of the existing per-die loop.” Why is that not the same as the count-then-emit refactor we did?

It would still re-walk dice in two different shapes (the loop + the Counter pass), which is a different smell — the code now has two competing models of what the input is, and future combos would multiply the inconsistency
It would be slightly faster but would produce the same result, so functionally it is equivalent
It would not work in Python — Counter cannot be called inside a for loop body
Pytest would refuse to run it because the test runner forbids mixing iteration styles

The refactor is not “add Counter on top of the existing loop.” It is “replace the per-die view with the per-face-count view.” Mixing both is the worst of both worlds: now the function maintains two parallel models of the input, and every future change has to update both. The structural shift was the point.

9

Cycle 7 — Combo with Leftover Singles

Spec: score([1, 1, 1, 1, 5]) produces one Dragon Blast plus one leftover Flame and a Lightning Spark.

Cycle 6’s counts[1] -= 3 already handles this — but until a test asserts it, the leftover behavior is only implicitly correct. A future refactor could break it silently. Cycle 7 turns implicit correctness into an explicit guardrail.

Your task

🔴 RED — add test_dragon_blast_plus_leftover_flame_and_spark. Predict whether it passes immediately, then run.
🟢 GREEN — likely no production change. The cycle-6 design already produces this output.
🔵 REFACTOR + mutation check — apply the cycle-5 mutation move required this time: change counts[1] -= 3 to counts[1] -= 4, run, confirm cycle 7 fails, restore. That proves the test is a real guardrail.

Why guardrail tests matter

The most common way a TDD suite degrades over time is implicit correctness silently rotting into implicit incorrectness. A behavior the code does today, but no test asserts, can disappear in a refactor — and nobody notices until a customer reports the bug. The discipline: when you spot behavior that “just works,” ask if a future refactor could break it. If yes, write the test that protects it.

Starter files

scorer.py

from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class ScoringEvent:
    name: str
    dice_used: tuple
    damage: int


@dataclass(frozen=True)
class BattleReport:
    events: tuple = ()

    @property
    def total_damage(self):
        return sum(event.damage for event in self.events)


def score(dice):
    counts = Counter(dice)
    events = []

    if counts[1] >= 3:
        events.append(ScoringEvent("Dragon Blast", (1, 1, 1), 1000))
        counts[1] -= 3

    for _ in range(counts[1]):
        events.append(ScoringEvent("Dragon Flame", (1,), 100))

    for _ in range(counts[5]):
        events.append(ScoringEvent("Lightning Spark", (5,), 50))

    return BattleReport(tuple(events))

test_scorer.py

"""Cycles 1–7 — adding the leftover-singles guardrail."""
from scorer import score, ScoringEvent


def test_empty_roll_has_zero_damage_and_no_events():
    report = score([])

    assert report.total_damage == 0
    assert report.events == ()


def test_single_one_creates_dragon_flame_event():
    report = score([1])

    assert report.total_damage == 100
    assert report.events == (
        ScoringEvent("Dragon Flame", (1,), 100),
    )


def test_single_five_creates_lightning_spark_event():
    report = score([5])

    assert report.total_damage == 50
    assert report.events == (
        ScoringEvent("Lightning Spark", (5,), 50),
    )


def test_two_ones_create_two_dragon_flames():
    report = score([1, 1])

    assert report.total_damage == 200
    assert report.events == (
        ScoringEvent("Dragon Flame", (1,), 100),
        ScoringEvent("Dragon Flame", (1,), 100),
    )


def test_one_and_five_create_two_different_events():
    report = score([1, 5])

    assert report.total_damage == 150
    assert report.events == (
        ScoringEvent("Dragon Flame", (1,), 100),
        ScoringEvent("Lightning Spark", (5,), 50),
    )


def test_three_ones_create_dragon_blast_instead_of_three_flames():
    report = score([1, 1, 1])

    assert report.total_damage == 1000
    assert report.events == (
        ScoringEvent("Dragon Blast", (1, 1, 1), 1000),
    )


# TODO (RED): write test_dragon_blast_plus_leftover_flame_and_spark
#             score([1, 1, 1, 1, 5]) should return total_damage == 1150
#             with a Dragon Blast, then a Dragon Flame, then a Lightning Spark

Solution

scorer.py

from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class ScoringEvent:
    name: str
    dice_used: tuple
    damage: int


@dataclass(frozen=True)
class BattleReport:
    events: tuple = ()

    @property
    def total_damage(self):
        return sum(event.damage for event in self.events)


def score(dice):
    counts = Counter(dice)
    events = []

    if counts[1] >= 3:
        events.append(ScoringEvent("Dragon Blast", (1, 1, 1), 1000))
        counts[1] -= 3

    for _ in range(counts[1]):
        events.append(ScoringEvent("Dragon Flame", (1,), 100))

    for _ in range(counts[5]):
        events.append(ScoringEvent("Lightning Spark", (5,), 50))

    return BattleReport(tuple(events))

test_scorer.py

"""Cycles 1–7 — leftover bookkeeping protected by a guardrail test."""
from scorer import score, ScoringEvent


def test_empty_roll_has_zero_damage_and_no_events():
    report = score([])

    assert report.total_damage == 0
    assert report.events == ()


def test_single_one_creates_dragon_flame_event():
    report = score([1])

    assert report.total_damage == 100
    assert report.events == (
        ScoringEvent("Dragon Flame", (1,), 100),
    )


def test_single_five_creates_lightning_spark_event():
    report = score([5])

    assert report.total_damage == 50
    assert report.events == (
        ScoringEvent("Lightning Spark", (5,), 50),
    )


def test_two_ones_create_two_dragon_flames():
    report = score([1, 1])

    assert report.total_damage == 200
    assert report.events == (
        ScoringEvent("Dragon Flame", (1,), 100),
        ScoringEvent("Dragon Flame", (1,), 100),
    )


def test_one_and_five_create_two_different_events():
    report = score([1, 5])

    assert report.total_damage == 150
    assert report.events == (
        ScoringEvent("Dragon Flame", (1,), 100),
        ScoringEvent("Lightning Spark", (5,), 50),
    )


def test_three_ones_create_dragon_blast_instead_of_three_flames():
    report = score([1, 1, 1])

    assert report.total_damage == 1000
    assert report.events == (
        ScoringEvent("Dragon Blast", (1, 1, 1), 1000),
    )


def test_dragon_blast_plus_leftover_flame_and_spark():
    report = score([1, 1, 1, 1, 5])

    assert report.total_damage == 1150
    assert report.events == (
        ScoringEvent("Dragon Blast", (1, 1, 1), 1000),
        ScoringEvent("Dragon Flame", (1,), 100),
        ScoringEvent("Lightning Spark", (5,), 50),
    )

Cycle 6’s counts[1] -= 3 already produces the right output. Cycle 7’s test documents that intent — converting an implicit correctness into an explicit guardrail. Mutate -= 3 to -= 4 on purpose to confirm the test catches it.

10

Cycle 8 — Goblin Swarm → Rule Objects (Big Refactor)

Spec: three 2s combine into a Goblin Swarm (200 damage).

Before writing anything, imagine where if counts[2] >= 3: would go. Now imagine six more triple combos (cycles 9–14) each adding their own if-and-subtract block. Painful? That pain is the design pressure. Cycle 8 is when the structure has to change.

Your task

🔴 RED — add test_three_twos_create_goblin_swarm, structured like cycle 6’s test. Run; the existing code knows nothing about 2s.
🟢 GREEN — you have two choices:
- Option A: another if counts[2] >= 3: block. Local minimum, multiplies the pain five-fold across cycles 9–14.
- Option B: extract the rule shape into data. Pull combo-rule logic into a ComboRule class with an apply(counts) method; same for SingleRule. Then drive score from two registries.
🔵 REFACTOR — option B is the refactor. Re-run; the seven prior tests are your safety net.

Listening to the test & the Open-Closed Principle

The discipline of listening to the test: the difficulty of imagining six more if-blocks was a signal — the structure no longer fit. The cure is structural extraction.

You just applied the Open-Closed Principle: score is now closed for modification (you won’t edit it again) but open for extension (cycle 9 will add four rules by appending data). Adding behavior = adding data, not editing code.

Refactor toward duplication, not before. Two data points (Dragon Blast + Goblin Swarm) are now enough to see the right shape; with one, you’d be guessing.

Starter files

scorer.py

from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class ScoringEvent:
    name: str
    dice_used: tuple
    damage: int


@dataclass(frozen=True)
class BattleReport:
    events: tuple = ()

    @property
    def total_damage(self):
        return sum(event.damage for event in self.events)


def score(dice):
    counts = Counter(dice)
    events = []

    if counts[1] >= 3:
        events.append(ScoringEvent("Dragon Blast", (1, 1, 1), 1000))
        counts[1] -= 3

    for _ in range(counts[1]):
        events.append(ScoringEvent("Dragon Flame", (1,), 100))

    for _ in range(counts[5]):
        events.append(ScoringEvent("Lightning Spark", (5,), 50))

    return BattleReport(tuple(events))

test_scorer.py

"""Cycles 1–8 — Goblin Swarm forces the rule-object refactor."""
from scorer import score, ScoringEvent


def test_empty_roll_has_zero_damage_and_no_events():
    report = score([])

    assert report.total_damage == 0
    assert report.events == ()


def test_single_one_creates_dragon_flame_event():
    report = score([1])

    assert report.total_damage == 100
    assert report.events == (
        ScoringEvent("Dragon Flame", (1,), 100),
    )


def test_single_five_creates_lightning_spark_event():
    report = score([5])

    assert report.total_damage == 50
    assert report.events == (
        ScoringEvent("Lightning Spark", (5,), 50),
    )


def test_two_ones_create_two_dragon_flames():
    report = score([1, 1])

    assert report.total_damage == 200
    assert report.events == (
        ScoringEvent("Dragon Flame", (1,), 100),
        ScoringEvent("Dragon Flame", (1,), 100),
    )


def test_one_and_five_create_two_different_events():
    report = score([1, 5])

    assert report.total_damage == 150
    assert report.events == (
        ScoringEvent("Dragon Flame", (1,), 100),
        ScoringEvent("Lightning Spark", (5,), 50),
    )


def test_three_ones_create_dragon_blast_instead_of_three_flames():
    report = score([1, 1, 1])

    assert report.total_damage == 1000
    assert report.events == (
        ScoringEvent("Dragon Blast", (1, 1, 1), 1000),
    )


def test_dragon_blast_plus_leftover_flame_and_spark():
    report = score([1, 1, 1, 1, 5])

    assert report.total_damage == 1150
    assert report.events == (
        ScoringEvent("Dragon Blast", (1, 1, 1), 1000),
        ScoringEvent("Dragon Flame", (1,), 100),
        ScoringEvent("Lightning Spark", (5,), 50),
    )


# TODO (RED): write test_three_twos_create_goblin_swarm
#             score([2, 2, 2]) should return total_damage == 200 with
#             a Goblin Swarm event whose dice_used is (2, 2, 2)

Solution

scorer.py

from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class ScoringEvent:
    name: str
    dice_used: tuple
    damage: int


@dataclass(frozen=True)
class BattleReport:
    events: tuple = ()

    @property
    def total_damage(self):
        return sum(event.damage for event in self.events)


@dataclass(frozen=True)
class ComboRule:
    die: int
    count: int
    name: str
    damage: int

    def apply(self, counts):
        events = []

        if counts[self.die] >= self.count:
            dice_used = tuple([self.die] * self.count)
            events.append(ScoringEvent(self.name, dice_used, self.damage))
            counts[self.die] -= self.count

        return events


@dataclass(frozen=True)
class SingleRule:
    die: int
    name: str
    damage: int

    def apply(self, counts):
        events = []

        for _ in range(counts[self.die]):
            events.append(ScoringEvent(self.name, (self.die,), self.damage))

        return events


COMBO_RULES = (
    ComboRule(1, 3, "Dragon Blast", 1000),
    ComboRule(2, 3, "Goblin Swarm", 200),
)

SINGLE_RULES = (
    SingleRule(1, "Dragon Flame", 100),
    SingleRule(5, "Lightning Spark", 50),
)


def score(dice):
    counts = Counter(dice)
    events = []

    for rule in COMBO_RULES:
        events.extend(rule.apply(counts))

    for rule in SINGLE_RULES:
        events.extend(rule.apply(counts))

    return BattleReport(tuple(events))

test_scorer.py

"""Cycles 1–8 — rule objects power the design."""
from scorer import score, ScoringEvent


def test_empty_roll_has_zero_damage_and_no_events():
    report = score([])

    assert report.total_damage == 0
    assert report.events == ()


def test_single_one_creates_dragon_flame_event():
    report = score([1])

    assert report.total_damage == 100
    assert report.events == (
        ScoringEvent("Dragon Flame", (1,), 100),
    )


def test_single_five_creates_lightning_spark_event():
    report = score([5])

    assert report.total_damage == 50
    assert report.events == (
        ScoringEvent("Lightning Spark", (5,), 50),
    )


def test_two_ones_create_two_dragon_flames():
    report = score([1, 1])

    assert report.total_damage == 200
    assert report.events == (
        ScoringEvent("Dragon Flame", (1,), 100),
        ScoringEvent("Dragon Flame", (1,), 100),
    )


def test_one_and_five_create_two_different_events():
    report = score([1, 5])

    assert report.total_damage == 150
    assert report.events == (
        ScoringEvent("Dragon Flame", (1,), 100),
        ScoringEvent("Lightning Spark", (5,), 50),
    )


def test_three_ones_create_dragon_blast_instead_of_three_flames():
    report = score([1, 1, 1])

    assert report.total_damage == 1000
    assert report.events == (
        ScoringEvent("Dragon Blast", (1, 1, 1), 1000),
    )


def test_dragon_blast_plus_leftover_flame_and_spark():
    report = score([1, 1, 1, 1, 5])

    assert report.total_damage == 1150
    assert report.events == (
        ScoringEvent("Dragon Blast", (1, 1, 1), 1000),
        ScoringEvent("Dragon Flame", (1,), 100),
        ScoringEvent("Lightning Spark", (5,), 50),
    )


def test_three_twos_create_goblin_swarm():
    report = score([2, 2, 2])

    assert report.total_damage == 200
    assert report.events == (
        ScoringEvent("Goblin Swarm", (2, 2, 2), 200),
    )

Extract ComboRule and SingleRule dataclasses with an apply method, plus two registry tuples (COMBO_RULES, SINGLE_RULES). The score function becomes two trivial loops. New rules are now data — cycle 9 will append four rows to COMBO_RULES without touching score at all.

Cycle 8 Quiz — Listening to the Test

Min. score: 80%

1. What does “listening to the test” mean as a refactoring heuristic?

Read every test out loud before running it, to catch typos in the test names that pytest cannot detect
Treat the difficulty of writing or understanding a test (or the pain of adding new branches that match its shape) as a diagnostic signal that the production code’s structure may be the real problem
Use only test names that the codebase’s audit log specifically asks for, ignoring the developer’s own intuition about what to call them
Run pytest with --log-cli-level=DEBUG and inspect the trace, since that output reveals the next refactor

“Listen to the test” is a foundational TDD heuristic: when adding a test or a branch hurts disproportionately, the structure is telling you something. In cycle 8, the pain of imagining six more if counts[X] >= 3: branches was the signal. The fix was a structural shift to data, not more code.

2. Why is putting rules in a COMBO_RULES tuple of ComboRule instances better than keeping them as if counts[X] >= 3: branches inside score()?

It is not better — it is just stylistically different. The byte code emitted is identical and the runtime cost is the same
Adding a new rule now means appending one row of data, with no edit to score(). That is the Open-Closed Principle: the function is closed for modification but open for extension via the registry
Tuples are faster than lists in Python, so the rule-object form is a performance optimisation
Python’s import system caches tuples but not function bodies, so subsequent test runs are quicker

The Open-Closed Principle (OCP) says modules should be open for extension and closed for modification. The rule-object refactor is the OCP in five lines: score() no longer changes when you add a new triple combo — only the data does. Cycle 9 will demonstrate the payoff by adding four combos at once.

3. A teammate says: “You should have done the rule-object refactor in cycle 6, when you first introduced combos. Why wait?” What is the most defensible answer?

Refactoring earlier would have been wrong — the rule-object shape only became visible after cycle 8’s test added a second combo, and refactoring with one example is guessing at the right shape
Cycle 6 was technically possible but pytest had not yet released support for dataclasses, so the refactor would have failed at import time
It is a stylistic preference — both orderings produce identical code in the end and one is not measurably better than the other
Earlier refactoring is always better; the tutorial’s order is artificial and you should always do the biggest refactor as soon as you can

Two data points are the minimum for seeing the right shape of an abstraction. With only Dragon Blast (cycle 6), you’d be guessing whether the variation is “the die that triggers” or “the count required” or “the name.” Cycle 8’s Goblin Swarm provides the second data point, and only then is ComboRule(die, count, name, damage) a discovery rather than a guess. Refactor toward duplication, not before it.

4. (Spaced review — Cycle 6) After the rule-object refactor, all seven previous tests still pass. Why is this expected, and what does it confirm about the previous tests?

It is unexpected — refactoring should always break something. If nothing broke, the previous tests must have been weak
It is expected because the previous tests assert on behavior (return values), not on internals (which class held the rule data). Behavior-level assertions survive structural rewrites — that is what makes them robust
It is expected because pytest skips tests whose names contain numbers — and several of the previous tests do
It is unexpected because Python’s import order changed, so most tests should have failed to import

The seven previous tests asserted on report.total_damage, report.events, and ScoringEvent instances. None of them asserted on internal structure. That is precisely why they survived a refactor that replaced the entire implementation strategy. The Refactoring Litmus Test: behavior tests survive structural change; implementation tests don’t.

11

Cycle 9 — Add the Rest of the Combo Rules

Spec: the four remaining triples — Triple 3 → Orc Charge (300), Triple 4 → Troll Smash (400), Triple 5 → Lightning Storm (500), Triple 6 → Demon Strike (600).

This cycle exists to prove cycle 8’s refactor paid off. Four new behaviors → no edits to score().

Your task

🔴 RED — write one parametrised test (@pytest.mark.parametrize) covering all four triples. Each row runs as a separate pytest result.
🟢 GREEN — append four rows to COMBO_RULES. Do not touch score(). That’s the proof.
🔵 REFACTOR — re-run; check duplication (none — rules are data) and that all 12 results pass.

Why parametrize beats a for-loop inside one test

@pytest.mark.parametrize runs the function once per row, reporting each as a separate test result. A for loop inside a single test stops at the first failure, hiding everything after it. The parametrise idiom is the right Python answer to “N tests of the same shape” — DRY tests that still report separate failures.

Starter files

scorer.py

from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class ScoringEvent:
    name: str
    dice_used: tuple
    damage: int


@dataclass(frozen=True)
class BattleReport:
    events: tuple = ()

    @property
    def total_damage(self):
        return sum(event.damage for event in self.events)


@dataclass(frozen=True)
class ComboRule:
    die: int
    count: int
    name: str
    damage: int

    def apply(self, counts):
        events = []

        if counts[self.die] >= self.count:
            dice_used = tuple([self.die] * self.count)
            events.append(ScoringEvent(self.name, dice_used, self.damage))
            counts[self.die] -= self.count

        return events


@dataclass(frozen=True)
class SingleRule:
    die: int
    name: str
    damage: int

    def apply(self, counts):
        events = []

        for _ in range(counts[self.die]):
            events.append(ScoringEvent(self.name, (self.die,), self.damage))

        return events


COMBO_RULES = (
    ComboRule(1, 3, "Dragon Blast", 1000),
    ComboRule(2, 3, "Goblin Swarm", 200),
)

SINGLE_RULES = (
    SingleRule(1, "Dragon Flame", 100),
    SingleRule(5, "Lightning Spark", 50),
)


def score(dice):
    counts = Counter(dice)
    events = []

    for rule in COMBO_RULES:
        events.extend(rule.apply(counts))

    for rule in SINGLE_RULES:
        events.extend(rule.apply(counts))

    return BattleReport(tuple(events))

test_scorer.py

"""Cycles 1–9 — adding the four remaining triple combos."""
import pytest
from scorer import score, ScoringEvent


def test_empty_roll_has_zero_damage_and_no_events():
    report = score([])

    assert report.total_damage == 0
    assert report.events == ()


def test_single_one_creates_dragon_flame_event():
    report = score([1])

    assert report.total_damage == 100
    assert report.events == (
        ScoringEvent("Dragon Flame", (1,), 100),
    )


def test_single_five_creates_lightning_spark_event():
    report = score([5])

    assert report.total_damage == 50
    assert report.events == (
        ScoringEvent("Lightning Spark", (5,), 50),
    )


def test_two_ones_create_two_dragon_flames():
    report = score([1, 1])

    assert report.total_damage == 200
    assert report.events == (
        ScoringEvent("Dragon Flame", (1,), 100),
        ScoringEvent("Dragon Flame", (1,), 100),
    )


def test_one_and_five_create_two_different_events():
    report = score([1, 5])

    assert report.total_damage == 150
    assert report.events == (
        ScoringEvent("Dragon Flame", (1,), 100),
        ScoringEvent("Lightning Spark", (5,), 50),
    )


def test_three_ones_create_dragon_blast_instead_of_three_flames():
    report = score([1, 1, 1])

    assert report.total_damage == 1000
    assert report.events == (
        ScoringEvent("Dragon Blast", (1, 1, 1), 1000),
    )


def test_dragon_blast_plus_leftover_flame_and_spark():
    report = score([1, 1, 1, 1, 5])

    assert report.total_damage == 1150
    assert report.events == (
        ScoringEvent("Dragon Blast", (1, 1, 1), 1000),
        ScoringEvent("Dragon Flame", (1,), 100),
        ScoringEvent("Lightning Spark", (5,), 50),
    )


def test_three_twos_create_goblin_swarm():
    report = score([2, 2, 2])

    assert report.total_damage == 200
    assert report.events == (
        ScoringEvent("Goblin Swarm", (2, 2, 2), 200),
    )


# TODO (RED): write a parametrized test_other_triples_create_combo_events
#             that covers triples of 3, 4, 5, 6 with their respective events

Solution

scorer.py

from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class ScoringEvent:
    name: str
    dice_used: tuple
    damage: int


@dataclass(frozen=True)
class BattleReport:
    events: tuple = ()

    @property
    def total_damage(self):
        return sum(event.damage for event in self.events)


@dataclass(frozen=True)
class ComboRule:
    die: int
    count: int
    name: str
    damage: int

    def apply(self, counts):
        events = []

        if counts[self.die] >= self.count:
            dice_used = tuple([self.die] * self.count)
            events.append(ScoringEvent(self.name, dice_used, self.damage))
            counts[self.die] -= self.count

        return events


@dataclass(frozen=True)
class SingleRule:
    die: int
    name: str
    damage: int

    def apply(self, counts):
        events = []

        for _ in range(counts[self.die]):
            events.append(ScoringEvent(self.name, (self.die,), self.damage))

        return events


COMBO_RULES = (
    ComboRule(1, 3, "Dragon Blast", 1000),
    ComboRule(2, 3, "Goblin Swarm", 200),
    ComboRule(3, 3, "Orc Charge", 300),
    ComboRule(4, 3, "Troll Smash", 400),
    ComboRule(5, 3, "Lightning Storm", 500),
    ComboRule(6, 3, "Demon Strike", 600),
)

SINGLE_RULES = (
    SingleRule(1, "Dragon Flame", 100),
    SingleRule(5, "Lightning Spark", 50),
)


def score(dice):
    counts = Counter(dice)
    events = []

    for rule in COMBO_RULES:
        events.extend(rule.apply(counts))

    for rule in SINGLE_RULES:
        events.extend(rule.apply(counts))

    return BattleReport(tuple(events))

test_scorer.py

"""Cycles 1–9 — all six triples expressible as data."""
import pytest
from scorer import score, ScoringEvent


def test_empty_roll_has_zero_damage_and_no_events():
    report = score([])

    assert report.total_damage == 0
    assert report.events == ()


def test_single_one_creates_dragon_flame_event():
    report = score([1])

    assert report.total_damage == 100
    assert report.events == (
        ScoringEvent("Dragon Flame", (1,), 100),
    )


def test_single_five_creates_lightning_spark_event():
    report = score([5])

    assert report.total_damage == 50
    assert report.events == (
        ScoringEvent("Lightning Spark", (5,), 50),
    )


def test_two_ones_create_two_dragon_flames():
    report = score([1, 1])

    assert report.total_damage == 200
    assert report.events == (
        ScoringEvent("Dragon Flame", (1,), 100),
        ScoringEvent("Dragon Flame", (1,), 100),
    )


def test_one_and_five_create_two_different_events():
    report = score([1, 5])

    assert report.total_damage == 150
    assert report.events == (
        ScoringEvent("Dragon Flame", (1,), 100),
        ScoringEvent("Lightning Spark", (5,), 50),
    )


def test_three_ones_create_dragon_blast_instead_of_three_flames():
    report = score([1, 1, 1])

    assert report.total_damage == 1000
    assert report.events == (
        ScoringEvent("Dragon Blast", (1, 1, 1), 1000),
    )


def test_dragon_blast_plus_leftover_flame_and_spark():
    report = score([1, 1, 1, 1, 5])

    assert report.total_damage == 1150
    assert report.events == (
        ScoringEvent("Dragon Blast", (1, 1, 1), 1000),
        ScoringEvent("Dragon Flame", (1,), 100),
        ScoringEvent("Lightning Spark", (5,), 50),
    )


def test_three_twos_create_goblin_swarm():
    report = score([2, 2, 2])

    assert report.total_damage == 200
    assert report.events == (
        ScoringEvent("Goblin Swarm", (2, 2, 2), 200),
    )


@pytest.mark.parametrize(
    "roll, expected_event",
    [
        ([3, 3, 3], ScoringEvent("Orc Charge", (3, 3, 3), 300)),
        ([4, 4, 4], ScoringEvent("Troll Smash", (4, 4, 4), 400)),
        ([5, 5, 5], ScoringEvent("Lightning Storm", (5, 5, 5), 500)),
        ([6, 6, 6], ScoringEvent("Demon Strike", (6, 6, 6), 600)),
    ],
)
def test_other_triples_create_combo_events(roll, expected_event):
    report = score(roll)

    assert report.total_damage == expected_event.damage
    assert report.events == (expected_event,)

Append four rows to COMBO_RULES — one per remaining triple. The score() function does not change. The parametrize decorator turns one test function into four separately-reported test results, each pinning down one combo.

12

Cycle 10 — Six 1s → Two Dragon Blasts (Hidden Bug)

Spec: score([1, 1, 1, 1, 1, 1]) produces two Dragon Blasts (2000 damage).

Predict first. Before writing a test, look at your ComboRule.apply and trace through six 1s by hand. What does the current code actually produce?

Reveal the hidden bug (open after you’ve predicted)

The if counts[1] >= 3: runs once. It emits one Blast and counts[1] -= 3 leaves counts[1] == 3. Those three 1s fall through to SingleRule, emitting three Flames. Total: 1000 + 300 = 1300 damage.

But six 1s should be two Blasts → 2000 damage. The code is wrong — and no previous test caught it, because every prior combo test used exactly three of a face.

This is the kind of bug TDD literature reports: a defect the developer doesn’t know exists in code they wrote themselves. The test surfaces it.

Your task

🔴 RED — add test_six_ones_create_two_dragon_blasts asserting 2000 damage and two Blast events. Run; see the wrong events tuple.
🟢 GREEN — fix ComboRule.apply so it can emit zero or more combos per call, with the correct leftover bookkeeping. Hint: floor-division (//) and modulo (%) are the right operators.
🔵 REFACTOR — re-run. Especially gratifying: cycle 7’s leftover guardrail still passes, because 4 % 3 == 1 matches 4 - 3 == 1 for that input. The fix only changed behavior on cases no prior test pinned down.

What this teaches about coverage vs. boundary thinking

Every previous combo test used exactly count dice (three 1s, three 2s, etc.). The bug only manifests at 2 × count and beyond. Line coverage told you the if ran. It didn’t tell you the line was right for all relevant inputs.

That’s the gap between coverage and boundary-value analysis: every behavior has boundaries (0, exactly N, 2N, between N and 2N) and a healthy suite probes each. Coverage is a locator of under-tested code; it isn’t a measure of correctness.

Starter files

scorer.py

from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class ScoringEvent:
    name: str
    dice_used: tuple
    damage: int


@dataclass(frozen=True)
class BattleReport:
    events: tuple = ()

    @property
    def total_damage(self):
        return sum(event.damage for event in self.events)


@dataclass(frozen=True)
class ComboRule:
    die: int
    count: int
    name: str
    damage: int

    def apply(self, counts):
        events = []

        if counts[self.die] >= self.count:
            dice_used = tuple([self.die] * self.count)
            events.append(ScoringEvent(self.name, dice_used, self.damage))
            counts[self.die] -= self.count

        return events


@dataclass(frozen=True)
class SingleRule:
    die: int
    name: str
    damage: int

    def apply(self, counts):
        events = []

        for _ in range(counts[self.die]):
            events.append(ScoringEvent(self.name, (self.die,), self.damage))

        return events


COMBO_RULES = (
    ComboRule(1, 3, "Dragon Blast", 1000),
    ComboRule(2, 3, "Goblin Swarm", 200),
    ComboRule(3, 3, "Orc Charge", 300),
    ComboRule(4, 3, "Troll Smash", 400),
    ComboRule(5, 3, "Lightning Storm", 500),
    ComboRule(6, 3, "Demon Strike", 600),
)

SINGLE_RULES = (
    SingleRule(1, "Dragon Flame", 100),
    SingleRule(5, "Lightning Spark", 50),
)


def score(dice):
    counts = Counter(dice)
    events = []

    for rule in COMBO_RULES:
        events.extend(rule.apply(counts))

    for rule in SINGLE_RULES:
        events.extend(rule.apply(counts))

    return BattleReport(tuple(events))

test_scorer.py

"""Cycles 1–10 — surfacing the multi-combo edge case."""
import pytest
from scorer import score, ScoringEvent


def test_empty_roll_has_zero_damage_and_no_events():
    report = score([])

    assert report.total_damage == 0
    assert report.events == ()


def test_single_one_creates_dragon_flame_event():
    report = score([1])

    assert report.total_damage == 100
    assert report.events == (
        ScoringEvent("Dragon Flame", (1,), 100),
    )


def test_single_five_creates_lightning_spark_event():
    report = score([5])

    assert report.total_damage == 50
    assert report.events == (
        ScoringEvent("Lightning Spark", (5,), 50),
    )


def test_two_ones_create_two_dragon_flames():
    report = score([1, 1])

    assert report.total_damage == 200
    assert report.events == (
        ScoringEvent("Dragon Flame", (1,), 100),
        ScoringEvent("Dragon Flame", (1,), 100),
    )


def test_one_and_five_create_two_different_events():
    report = score([1, 5])

    assert report.total_damage == 150
    assert report.events == (
        ScoringEvent("Dragon Flame", (1,), 100),
        ScoringEvent("Lightning Spark", (5,), 50),
    )


def test_three_ones_create_dragon_blast_instead_of_three_flames():
    report = score([1, 1, 1])

    assert report.total_damage == 1000
    assert report.events == (
        ScoringEvent("Dragon Blast", (1, 1, 1), 1000),
    )


def test_dragon_blast_plus_leftover_flame_and_spark():
    report = score([1, 1, 1, 1, 5])

    assert report.total_damage == 1150
    assert report.events == (
        ScoringEvent("Dragon Blast", (1, 1, 1), 1000),
        ScoringEvent("Dragon Flame", (1,), 100),
        ScoringEvent("Lightning Spark", (5,), 50),
    )


def test_three_twos_create_goblin_swarm():
    report = score([2, 2, 2])

    assert report.total_damage == 200
    assert report.events == (
        ScoringEvent("Goblin Swarm", (2, 2, 2), 200),
    )


@pytest.mark.parametrize(
    "roll, expected_event",
    [
        ([3, 3, 3], ScoringEvent("Orc Charge", (3, 3, 3), 300)),
        ([4, 4, 4], ScoringEvent("Troll Smash", (4, 4, 4), 400)),
        ([5, 5, 5], ScoringEvent("Lightning Storm", (5, 5, 5), 500)),
        ([6, 6, 6], ScoringEvent("Demon Strike", (6, 6, 6), 600)),
    ],
)
def test_other_triples_create_combo_events(roll, expected_event):
    report = score(roll)

    assert report.total_damage == expected_event.damage
    assert report.events == (expected_event,)


# TODO (RED): write test_six_ones_create_two_dragon_blasts
#             score([1, 1, 1, 1, 1, 1]) should return total_damage == 2000
#             and events containing TWO Dragon Blast ScoringEvents

Solution

scorer.py

from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class ScoringEvent:
    name: str
    dice_used: tuple
    damage: int


@dataclass(frozen=True)
class BattleReport:
    events: tuple = ()

    @property
    def total_damage(self):
        return sum(event.damage for event in self.events)


@dataclass(frozen=True)
class ComboRule:
    die: int
    count: int
    name: str
    damage: int

    def apply(self, counts):
        events = []

        number_of_combos = counts[self.die] // self.count

        for _ in range(number_of_combos):
            dice_used = tuple([self.die] * self.count)
            events.append(ScoringEvent(self.name, dice_used, self.damage))

        counts[self.die] %= self.count

        return events


@dataclass(frozen=True)
class SingleRule:
    die: int
    name: str
    damage: int

    def apply(self, counts):
        events = []

        for _ in range(counts[self.die]):
            events.append(ScoringEvent(self.name, (self.die,), self.damage))

        return events


COMBO_RULES = (
    ComboRule(1, 3, "Dragon Blast", 1000),
    ComboRule(2, 3, "Goblin Swarm", 200),
    ComboRule(3, 3, "Orc Charge", 300),
    ComboRule(4, 3, "Troll Smash", 400),
    ComboRule(5, 3, "Lightning Storm", 500),
    ComboRule(6, 3, "Demon Strike", 600),
)

SINGLE_RULES = (
    SingleRule(1, "Dragon Flame", 100),
    SingleRule(5, "Lightning Spark", 50),
)


def score(dice):
    counts = Counter(dice)
    events = []

    for rule in COMBO_RULES:
        events.extend(rule.apply(counts))

    for rule in SINGLE_RULES:
        events.extend(rule.apply(counts))

    return BattleReport(tuple(events))

test_scorer.py

"""Cycles 1–10 — multi-combo edge case fixed."""
import pytest
from scorer import score, ScoringEvent


def test_empty_roll_has_zero_damage_and_no_events():
    report = score([])

    assert report.total_damage == 0
    assert report.events == ()


def test_single_one_creates_dragon_flame_event():
    report = score([1])

    assert report.total_damage == 100
    assert report.events == (
        ScoringEvent("Dragon Flame", (1,), 100),
    )


def test_single_five_creates_lightning_spark_event():
    report = score([5])

    assert report.total_damage == 50
    assert report.events == (
        ScoringEvent("Lightning Spark", (5,), 50),
    )


def test_two_ones_create_two_dragon_flames():
    report = score([1, 1])

    assert report.total_damage == 200
    assert report.events == (
        ScoringEvent("Dragon Flame", (1,), 100),
        ScoringEvent("Dragon Flame", (1,), 100),
    )


def test_one_and_five_create_two_different_events():
    report = score([1, 5])

    assert report.total_damage == 150
    assert report.events == (
        ScoringEvent("Dragon Flame", (1,), 100),
        ScoringEvent("Lightning Spark", (5,), 50),
    )


def test_three_ones_create_dragon_blast_instead_of_three_flames():
    report = score([1, 1, 1])

    assert report.total_damage == 1000
    assert report.events == (
        ScoringEvent("Dragon Blast", (1, 1, 1), 1000),
    )


def test_dragon_blast_plus_leftover_flame_and_spark():
    report = score([1, 1, 1, 1, 5])

    assert report.total_damage == 1150
    assert report.events == (
        ScoringEvent("Dragon Blast", (1, 1, 1), 1000),
        ScoringEvent("Dragon Flame", (1,), 100),
        ScoringEvent("Lightning Spark", (5,), 50),
    )


def test_three_twos_create_goblin_swarm():
    report = score([2, 2, 2])

    assert report.total_damage == 200
    assert report.events == (
        ScoringEvent("Goblin Swarm", (2, 2, 2), 200),
    )


@pytest.mark.parametrize(
    "roll, expected_event",
    [
        ([3, 3, 3], ScoringEvent("Orc Charge", (3, 3, 3), 300)),
        ([4, 4, 4], ScoringEvent("Troll Smash", (4, 4, 4), 400)),
        ([5, 5, 5], ScoringEvent("Lightning Storm", (5, 5, 5), 500)),
        ([6, 6, 6], ScoringEvent("Demon Strike", (6, 6, 6), 600)),
    ],
)
def test_other_triples_create_combo_events(roll, expected_event):
    report = score(roll)

    assert report.total_damage == expected_event.damage
    assert report.events == (expected_event,)


def test_six_ones_create_two_dragon_blasts():
    report = score([1, 1, 1, 1, 1, 1])

    assert report.total_damage == 2000
    assert report.events == (
        ScoringEvent("Dragon Blast", (1, 1, 1), 1000),
        ScoringEvent("Dragon Blast", (1, 1, 1), 1000),
    )

The fix in ComboRule.apply: replace the one-shot if with a per-combo loop driven by floor division (counts[self.die] // self.count), and replace the subtraction with modulo (counts[self.die] %= self.count). Cycle 7’s guardrail still passes because 4 % 3 == 1 matches the previous 4 - 3 == 1 for that specific input — the disagreement is only on multi-combo cases.

Cycle 10 Quiz — TDD Finds Bugs You Didn't Know You Had

Min. score: 80%

1. The “one Dragon Blast for any number of 1s ≥ 3” bug was sitting in your code from cycle 6 onward — four cycles. Why did no previous test catch it?

pytest’s collection algorithm has a known issue with bugs that span multiple cycles, so it deferred reporting
Every previous test used at most three of a face (three 1s, three 2s, etc.). The bug only manifests when the count of a face is at least six. The suite had no such input, so the defect lived undisturbed
The bug only manifests on Tuesdays, and the previous test runs happened on other days
Cycle 6’s if counts[1] >= 3: is correct — there was no bug; cycle 10 changed the requirement

Boundary thinking. Every previous combo test gave the rule exactly count dice. The bug was on the boundary where counts[X] >= 2 * count. Without a test on that boundary, the code would have shipped wrong — and the developer would have been completely confident in it. This is exactly the kind of failure mode empirical TDD studies report.

2. What is the most defensible general lesson from cycle 10 about test-suite design?

Always test every input combinatorially — only complete coverage of the input space catches all bugs
Coverage by line is what matters; cycle 10’s bug was in code already covered, so coverage didn’t help. The lesson is to also think about boundary cases (exactly N, 2N, between N and 2N, 0) per behavior
Never use floor-division and modulo in production code — they are too easy to get wrong
TDD is unreliable; this bug proves that even careful test-first development can miss things, so we should add manual QA after every cycle

Coverage told us we executed the line. It did not tell us whether the line was right for all relevant inputs. Boundary-value analysis (the partition tradition you encountered in Foundations) complements coverage: every behavior has a “just-at-the-boundary,” “just-past-the-boundary,” and “well-past-the- boundary” input — and a healthy suite probes each.

3. Cycle 7’s guardrail ([1, 1, 1, 1, 5]) still passes after cycle 10’s refactor. Why? Be precise.

Because pytest skips guardrail tests after they pass once — they are stored as cached green results
Because for counts[1] = 4, the old counts[1] -= 3 and the new counts[1] %= 3 both leave 1 in the counter — the two implementations agree on this specific input even though they disagree on counts[1] >= 6
Because cycle 7’s test is parameterized, and pytest reruns parameterized tests with looser tolerances
Because cycle 7’s test has been deleted by the cycle 10 refactor — the green count went down, not up

This is a subtle but important point: a refactor does not have to change every output. The two implementations of apply produce the same answer for any input where the old answer was correct — and only the new answer is correct where the old was wrong. That is what makes the cycle-10 fix a generalisation rather than a replacement.

4. (Spaced review — Cycle 8) A teammate looks at cycle 10’s fix and says: “This bug only existed because we extracted ComboRule. If we’d kept the inline if counts[1] >= 3: from cycle 6, we wouldn’t have had this issue.” Are they right?

Yes — the rule-object refactor introduced complexity that produced the bug. Inline code would have been simpler and safer
No — the exact same bug was present in cycle 6’s inline code (if counts[1] >= 3: ... counts[1] -= 3 would also fail for six 1s). The rule-object refactor moved the bug from one place to another but did not introduce it. The defect was a missing test, not a missing pattern
It depends on Python version — Python 3.10+ silently rewrites if to a multi-iteration form for combos, which is why old Python had the bug
Yes — Python’s Counter class re-orders subtraction operations under the hood when used with dataclasses

The bug is the algorithm’s bug, not the abstraction’s. Whether the if/-= sits in score() or in ComboRule.apply makes no difference to its behavior. The lesson is honest: TDD does not prevent every bug; it surfaces them when tests probe the right inputs. The rule-object refactor was about future- proofing the design, not correctness of cycle 6’s logic.

13

Cycle 11 — Invalid Dice Raise an Error

Spec: score([1, 2, 9]) raises ValueError("Die values must be between 1 and 6"). Dice are integers 1–6.

So far every test was a happy-path test. Happy-path-only suites are the single most common smell in student-written test code (per testing-education research). TDD applies to error behavior just as much as to success — pytest.raises is the idiom.

Your task

🔴 RED — add test_invalid_die_value_raises_error using pytest.raises(ValueError, match=...). The current score silently ignores the 9, so the test fails because no exception is raised.
🟢 GREEN — add a validate_dice(dice) helper that raises ValueError for any die outside 1–6; call it at the top of score.
🔵 REFACTOR — checklist; constants like VALID_DICE keep the rule explicit.

Why `match=` matters & happy-path-only is dangerous

Without match=, the test would accept any ValueError — even one with no message, or the wrong message. With it, the test pins down the contract: the function raises and explains why. That’s an oracle for the error case, not just for success.

The deeper lesson: line coverage doesn’t catch missing error handling. A function that silently processes invalid input passes every happy-path assertion. Only an invalid-input test asks “does the boundary actually error?”

Starter files

scorer.py

from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class ScoringEvent:
    name: str
    dice_used: tuple
    damage: int


@dataclass(frozen=True)
class BattleReport:
    events: tuple = ()

    @property
    def total_damage(self):
        return sum(event.damage for event in self.events)


@dataclass(frozen=True)
class ComboRule:
    die: int
    count: int
    name: str
    damage: int

    def apply(self, counts):
        events = []

        number_of_combos = counts[self.die] // self.count

        for _ in range(number_of_combos):
            dice_used = tuple([self.die] * self.count)
            events.append(ScoringEvent(self.name, dice_used, self.damage))

        counts[self.die] %= self.count

        return events


@dataclass(frozen=True)
class SingleRule:
    die: int
    name: str
    damage: int

    def apply(self, counts):
        events = []

        for _ in range(counts[self.die]):
            events.append(ScoringEvent(self.name, (self.die,), self.damage))

        return events


COMBO_RULES = (
    ComboRule(1, 3, "Dragon Blast", 1000),
    ComboRule(2, 3, "Goblin Swarm", 200),
    ComboRule(3, 3, "Orc Charge", 300),
    ComboRule(4, 3, "Troll Smash", 400),
    ComboRule(5, 3, "Lightning Storm", 500),
    ComboRule(6, 3, "Demon Strike", 600),
)

SINGLE_RULES = (
    SingleRule(1, "Dragon Flame", 100),
    SingleRule(5, "Lightning Spark", 50),
)


def score(dice):
    counts = Counter(dice)
    events = []

    for rule in COMBO_RULES:
        events.extend(rule.apply(counts))

    for rule in SINGLE_RULES:
        events.extend(rule.apply(counts))

    return BattleReport(tuple(events))

test_scorer.py

"""Cycles 1–11 — adding input validation."""
import pytest
from scorer import score, ScoringEvent


def test_empty_roll_has_zero_damage_and_no_events():
    report = score([])

    assert report.total_damage == 0
    assert report.events == ()


def test_single_one_creates_dragon_flame_event():
    report = score([1])

    assert report.total_damage == 100
    assert report.events == (
        ScoringEvent("Dragon Flame", (1,), 100),
    )


def test_single_five_creates_lightning_spark_event():
    report = score([5])

    assert report.total_damage == 50
    assert report.events == (
        ScoringEvent("Lightning Spark", (5,), 50),
    )


def test_two_ones_create_two_dragon_flames():
    report = score([1, 1])

    assert report.total_damage == 200
    assert report.events == (
        ScoringEvent("Dragon Flame", (1,), 100),
        ScoringEvent("Dragon Flame", (1,), 100),
    )


def test_one_and_five_create_two_different_events():
    report = score([1, 5])

    assert report.total_damage == 150
    assert report.events == (
        ScoringEvent("Dragon Flame", (1,), 100),
        ScoringEvent("Lightning Spark", (5,), 50),
    )


def test_three_ones_create_dragon_blast_instead_of_three_flames():
    report = score([1, 1, 1])

    assert report.total_damage == 1000
    assert report.events == (
        ScoringEvent("Dragon Blast", (1, 1, 1), 1000),
    )


def test_dragon_blast_plus_leftover_flame_and_spark():
    report = score([1, 1, 1, 1, 5])

    assert report.total_damage == 1150
    assert report.events == (
        ScoringEvent("Dragon Blast", (1, 1, 1), 1000),
        ScoringEvent("Dragon Flame", (1,), 100),
        ScoringEvent("Lightning Spark", (5,), 50),
    )


def test_three_twos_create_goblin_swarm():
    report = score([2, 2, 2])

    assert report.total_damage == 200
    assert report.events == (
        ScoringEvent("Goblin Swarm", (2, 2, 2), 200),
    )


@pytest.mark.parametrize(
    "roll, expected_event",
    [
        ([3, 3, 3], ScoringEvent("Orc Charge", (3, 3, 3), 300)),
        ([4, 4, 4], ScoringEvent("Troll Smash", (4, 4, 4), 400)),
        ([5, 5, 5], ScoringEvent("Lightning Storm", (5, 5, 5), 500)),
        ([6, 6, 6], ScoringEvent("Demon Strike", (6, 6, 6), 600)),
    ],
)
def test_other_triples_create_combo_events(roll, expected_event):
    report = score(roll)

    assert report.total_damage == expected_event.damage
    assert report.events == (expected_event,)


def test_six_ones_create_two_dragon_blasts():
    report = score([1, 1, 1, 1, 1, 1])

    assert report.total_damage == 2000
    assert report.events == (
        ScoringEvent("Dragon Blast", (1, 1, 1), 1000),
        ScoringEvent("Dragon Blast", (1, 1, 1), 1000),
    )


# TODO (RED): write test_invalid_die_value_raises_error
#             score([1, 2, 9]) should raise ValueError("Die values must be between 1 and 6")
#             Use `with pytest.raises(ValueError, match="..."): score([1, 2, 9])`

Solution

scorer.py

from collections import Counter
from dataclasses import dataclass


VALID_DICE = {1, 2, 3, 4, 5, 6}


@dataclass(frozen=True)
class ScoringEvent:
    name: str
    dice_used: tuple
    damage: int


@dataclass(frozen=True)
class BattleReport:
    events: tuple = ()

    @property
    def total_damage(self):
        return sum(event.damage for event in self.events)


@dataclass(frozen=True)
class ComboRule:
    die: int
    count: int
    name: str
    damage: int

    def apply(self, counts):
        events = []

        number_of_combos = counts[self.die] // self.count

        for _ in range(number_of_combos):
            dice_used = tuple([self.die] * self.count)
            events.append(ScoringEvent(self.name, dice_used, self.damage))

        counts[self.die] %= self.count

        return events


@dataclass(frozen=True)
class SingleRule:
    die: int
    name: str
    damage: int

    def apply(self, counts):
        events = []

        for _ in range(counts[self.die]):
            events.append(ScoringEvent(self.name, (self.die,), self.damage))

        return events


COMBO_RULES = (
    ComboRule(1, 3, "Dragon Blast", 1000),
    ComboRule(2, 3, "Goblin Swarm", 200),
    ComboRule(3, 3, "Orc Charge", 300),
    ComboRule(4, 3, "Troll Smash", 400),
    ComboRule(5, 3, "Lightning Storm", 500),
    ComboRule(6, 3, "Demon Strike", 600),
)

SINGLE_RULES = (
    SingleRule(1, "Dragon Flame", 100),
    SingleRule(5, "Lightning Spark", 50),
)


def validate_dice(dice):
    for die in dice:
        if die not in VALID_DICE:
            raise ValueError("Die values must be between 1 and 6")


def score(dice):
    validate_dice(dice)

    counts = Counter(dice)
    events = []

    for rule in COMBO_RULES:
        events.extend(rule.apply(counts))

    for rule in SINGLE_RULES:
        events.extend(rule.apply(counts))

    return BattleReport(tuple(events))

test_scorer.py

"""Cycles 1–11 — invalid dice now raise."""
import pytest
from scorer import score, ScoringEvent


def test_empty_roll_has_zero_damage_and_no_events():
    report = score([])

    assert report.total_damage == 0
    assert report.events == ()


def test_single_one_creates_dragon_flame_event():
    report = score([1])

    assert report.total_damage == 100
    assert report.events == (
        ScoringEvent("Dragon Flame", (1,), 100),
    )


def test_single_five_creates_lightning_spark_event():
    report = score([5])

    assert report.total_damage == 50
    assert report.events == (
        ScoringEvent("Lightning Spark", (5,), 50),
    )


def test_two_ones_create_two_dragon_flames():
    report = score([1, 1])

    assert report.total_damage == 200
    assert report.events == (
        ScoringEvent("Dragon Flame", (1,), 100),
        ScoringEvent("Dragon Flame", (1,), 100),
    )


def test_one_and_five_create_two_different_events():
    report = score([1, 5])

    assert report.total_damage == 150
    assert report.events == (
        ScoringEvent("Dragon Flame", (1,), 100),
        ScoringEvent("Lightning Spark", (5,), 50),
    )


def test_three_ones_create_dragon_blast_instead_of_three_flames():
    report = score([1, 1, 1])

    assert report.total_damage == 1000
    assert report.events == (
        ScoringEvent("Dragon Blast", (1, 1, 1), 1000),
    )


def test_dragon_blast_plus_leftover_flame_and_spark():
    report = score([1, 1, 1, 1, 5])

    assert report.total_damage == 1150
    assert report.events == (
        ScoringEvent("Dragon Blast", (1, 1, 1), 1000),
        ScoringEvent("Dragon Flame", (1,), 100),
        ScoringEvent("Lightning Spark", (5,), 50),
    )


def test_three_twos_create_goblin_swarm():
    report = score([2, 2, 2])

    assert report.total_damage == 200
    assert report.events == (
        ScoringEvent("Goblin Swarm", (2, 2, 2), 200),
    )


@pytest.mark.parametrize(
    "roll, expected_event",
    [
        ([3, 3, 3], ScoringEvent("Orc Charge", (3, 3, 3), 300)),
        ([4, 4, 4], ScoringEvent("Troll Smash", (4, 4, 4), 400)),
        ([5, 5, 5], ScoringEvent("Lightning Storm", (5, 5, 5), 500)),
        ([6, 6, 6], ScoringEvent("Demon Strike", (6, 6, 6), 600)),
    ],
)
def test_other_triples_create_combo_events(roll, expected_event):
    report = score(roll)

    assert report.total_damage == expected_event.damage
    assert report.events == (expected_event,)


def test_six_ones_create_two_dragon_blasts():
    report = score([1, 1, 1, 1, 1, 1])

    assert report.total_damage == 2000
    assert report.events == (
        ScoringEvent("Dragon Blast", (1, 1, 1), 1000),
        ScoringEvent("Dragon Blast", (1, 1, 1), 1000),
    )


def test_invalid_die_value_raises_error():
    with pytest.raises(ValueError, match="Die values must be between 1 and 6"):
        score([1, 2, 9])

Add a validate_dice helper called at the top of score. The helper raises ValueError with a message that names the boundary, so the test can pin the message down with match=.

14

Cycle 12 — Human-Readable Battle Summary

Spec: report.summary() returns a one-line string like "Dragon Blast (1000) + Lightning Spark (50) = 1050". An empty report returns "No damage.".

Twelve cycles in. Cycle 12 is the only one that adds behavior to BattleReport itself rather than to score. Where a behavior lives is a design decision — putting summary on the report keeps formatting close to the data.

Your task

🔴 RED — add test_battle_report_has_human_readable_summary. Run; expect AttributeError.
🟢 GREEN — add a summary method to BattleReport that handles empty events as a special case and joins name (damage) segments with " + " for non-empty.
🔵 REFACTOR — final checklist. Re-run; all 16+ tests should be green. You just completed twelve TDD cycles.

A small design choice: method vs. property?

Properties signal cheap, side-effect-free derived values. summary() is cheap, but the empty-case branch and the f-string assembly do nontrivial work. Convention: methods for derived strings; properties for derived numbers and booleans. We chose a method. (Either would work — but be consistent within a project.)

Starter files

scorer.py

from collections import Counter
from dataclasses import dataclass


VALID_DICE = {1, 2, 3, 4, 5, 6}


@dataclass(frozen=True)
class ScoringEvent:
    name: str
    dice_used: tuple
    damage: int


@dataclass(frozen=True)
class BattleReport:
    events: tuple = ()

    @property
    def total_damage(self):
        return sum(event.damage for event in self.events)


@dataclass(frozen=True)
class ComboRule:
    die: int
    count: int
    name: str
    damage: int

    def apply(self, counts):
        events = []

        number_of_combos = counts[self.die] // self.count

        for _ in range(number_of_combos):
            dice_used = tuple([self.die] * self.count)
            events.append(ScoringEvent(self.name, dice_used, self.damage))

        counts[self.die] %= self.count

        return events


@dataclass(frozen=True)
class SingleRule:
    die: int
    name: str
    damage: int

    def apply(self, counts):
        events = []

        for _ in range(counts[self.die]):
            events.append(ScoringEvent(self.name, (self.die,), self.damage))

        return events


COMBO_RULES = (
    ComboRule(1, 3, "Dragon Blast", 1000),
    ComboRule(2, 3, "Goblin Swarm", 200),
    ComboRule(3, 3, "Orc Charge", 300),
    ComboRule(4, 3, "Troll Smash", 400),
    ComboRule(5, 3, "Lightning Storm", 500),
    ComboRule(6, 3, "Demon Strike", 600),
)

SINGLE_RULES = (
    SingleRule(1, "Dragon Flame", 100),
    SingleRule(5, "Lightning Spark", 50),
)


def validate_dice(dice):
    for die in dice:
        if die not in VALID_DICE:
            raise ValueError("Die values must be between 1 and 6")


def score(dice):
    validate_dice(dice)

    counts = Counter(dice)
    events = []

    for rule in COMBO_RULES:
        events.extend(rule.apply(counts))

    for rule in SINGLE_RULES:
        events.extend(rule.apply(counts))

    return BattleReport(tuple(events))

test_scorer.py

"""Cycles 1–12 — adding the human-readable summary."""
import pytest
from scorer import score, ScoringEvent


def test_empty_roll_has_zero_damage_and_no_events():
    report = score([])

    assert report.total_damage == 0
    assert report.events == ()


def test_single_one_creates_dragon_flame_event():
    report = score([1])

    assert report.total_damage == 100
    assert report.events == (
        ScoringEvent("Dragon Flame", (1,), 100),
    )


def test_single_five_creates_lightning_spark_event():
    report = score([5])

    assert report.total_damage == 50
    assert report.events == (
        ScoringEvent("Lightning Spark", (5,), 50),
    )


def test_two_ones_create_two_dragon_flames():
    report = score([1, 1])

    assert report.total_damage == 200
    assert report.events == (
        ScoringEvent("Dragon Flame", (1,), 100),
        ScoringEvent("Dragon Flame", (1,), 100),
    )


def test_one_and_five_create_two_different_events():
    report = score([1, 5])

    assert report.total_damage == 150
    assert report.events == (
        ScoringEvent("Dragon Flame", (1,), 100),
        ScoringEvent("Lightning Spark", (5,), 50),
    )


def test_three_ones_create_dragon_blast_instead_of_three_flames():
    report = score([1, 1, 1])

    assert report.total_damage == 1000
    assert report.events == (
        ScoringEvent("Dragon Blast", (1, 1, 1), 1000),
    )


def test_dragon_blast_plus_leftover_flame_and_spark():
    report = score([1, 1, 1, 1, 5])

    assert report.total_damage == 1150
    assert report.events == (
        ScoringEvent("Dragon Blast", (1, 1, 1), 1000),
        ScoringEvent("Dragon Flame", (1,), 100),
        ScoringEvent("Lightning Spark", (5,), 50),
    )


def test_three_twos_create_goblin_swarm():
    report = score([2, 2, 2])

    assert report.total_damage == 200
    assert report.events == (
        ScoringEvent("Goblin Swarm", (2, 2, 2), 200),
    )


@pytest.mark.parametrize(
    "roll, expected_event",
    [
        ([3, 3, 3], ScoringEvent("Orc Charge", (3, 3, 3), 300)),
        ([4, 4, 4], ScoringEvent("Troll Smash", (4, 4, 4), 400)),
        ([5, 5, 5], ScoringEvent("Lightning Storm", (5, 5, 5), 500)),
        ([6, 6, 6], ScoringEvent("Demon Strike", (6, 6, 6), 600)),
    ],
)
def test_other_triples_create_combo_events(roll, expected_event):
    report = score(roll)

    assert report.total_damage == expected_event.damage
    assert report.events == (expected_event,)


def test_six_ones_create_two_dragon_blasts():
    report = score([1, 1, 1, 1, 1, 1])

    assert report.total_damage == 2000
    assert report.events == (
        ScoringEvent("Dragon Blast", (1, 1, 1), 1000),
        ScoringEvent("Dragon Blast", (1, 1, 1), 1000),
    )


def test_invalid_die_value_raises_error():
    with pytest.raises(ValueError, match="Die values must be between 1 and 6"):
        score([1, 2, 9])


# TODO (RED): write test_battle_report_has_human_readable_summary
#             score([1, 1, 1, 5]).summary() should equal
#             "Dragon Blast (1000) + Lightning Spark (50) = 1050"

Solution

scorer.py

from collections import Counter
from dataclasses import dataclass


VALID_DICE = {1, 2, 3, 4, 5, 6}


@dataclass(frozen=True)
class ScoringEvent:
    name: str
    dice_used: tuple
    damage: int


@dataclass(frozen=True)
class BattleReport:
    events: tuple = ()

    @property
    def total_damage(self):
        return sum(event.damage for event in self.events)

    def summary(self):
        if not self.events:
            return "No damage."

        event_text = " + ".join(
            f"{event.name} ({event.damage})" for event in self.events
        )

        return f"{event_text} = {self.total_damage}"


@dataclass(frozen=True)
class ComboRule:
    die: int
    count: int
    name: str
    damage: int

    def apply(self, counts):
        events = []

        number_of_combos = counts[self.die] // self.count

        for _ in range(number_of_combos):
            dice_used = tuple([self.die] * self.count)
            events.append(ScoringEvent(self.name, dice_used, self.damage))

        counts[self.die] %= self.count

        return events


@dataclass(frozen=True)
class SingleRule:
    die: int
    name: str
    damage: int

    def apply(self, counts):
        events = []

        for _ in range(counts[self.die]):
            events.append(ScoringEvent(self.name, (self.die,), self.damage))

        return events


COMBO_RULES = (
    ComboRule(1, 3, "Dragon Blast", 1000),
    ComboRule(2, 3, "Goblin Swarm", 200),
    ComboRule(3, 3, "Orc Charge", 300),
    ComboRule(4, 3, "Troll Smash", 400),
    ComboRule(5, 3, "Lightning Storm", 500),
    ComboRule(6, 3, "Demon Strike", 600),
)

SINGLE_RULES = (
    SingleRule(1, "Dragon Flame", 100),
    SingleRule(5, "Lightning Spark", 50),
)


def validate_dice(dice):
    for die in dice:
        if die not in VALID_DICE:
            raise ValueError("Die values must be between 1 and 6")


def score(dice):
    validate_dice(dice)

    counts = Counter(dice)
    events = []

    for rule in COMBO_RULES:
        events.extend(rule.apply(counts))

    for rule in SINGLE_RULES:
        events.extend(rule.apply(counts))

    return BattleReport(tuple(events))

test_scorer.py

"""Cycles 1–12 — twelve cycles, all green."""
import pytest
from scorer import score, ScoringEvent


def test_empty_roll_has_zero_damage_and_no_events():
    report = score([])

    assert report.total_damage == 0
    assert report.events == ()


def test_single_one_creates_dragon_flame_event():
    report = score([1])

    assert report.total_damage == 100
    assert report.events == (
        ScoringEvent("Dragon Flame", (1,), 100),
    )


def test_single_five_creates_lightning_spark_event():
    report = score([5])

    assert report.total_damage == 50
    assert report.events == (
        ScoringEvent("Lightning Spark", (5,), 50),
    )


def test_two_ones_create_two_dragon_flames():
    report = score([1, 1])

    assert report.total_damage == 200
    assert report.events == (
        ScoringEvent("Dragon Flame", (1,), 100),
        ScoringEvent("Dragon Flame", (1,), 100),
    )


def test_one_and_five_create_two_different_events():
    report = score([1, 5])

    assert report.total_damage == 150
    assert report.events == (
        ScoringEvent("Dragon Flame", (1,), 100),
        ScoringEvent("Lightning Spark", (5,), 50),
    )


def test_three_ones_create_dragon_blast_instead_of_three_flames():
    report = score([1, 1, 1])

    assert report.total_damage == 1000
    assert report.events == (
        ScoringEvent("Dragon Blast", (1, 1, 1), 1000),
    )


def test_dragon_blast_plus_leftover_flame_and_spark():
    report = score([1, 1, 1, 1, 5])

    assert report.total_damage == 1150
    assert report.events == (
        ScoringEvent("Dragon Blast", (1, 1, 1), 1000),
        ScoringEvent("Dragon Flame", (1,), 100),
        ScoringEvent("Lightning Spark", (5,), 50),
    )


def test_three_twos_create_goblin_swarm():
    report = score([2, 2, 2])

    assert report.total_damage == 200
    assert report.events == (
        ScoringEvent("Goblin Swarm", (2, 2, 2), 200),
    )


@pytest.mark.parametrize(
    "roll, expected_event",
    [
        ([3, 3, 3], ScoringEvent("Orc Charge", (3, 3, 3), 300)),
        ([4, 4, 4], ScoringEvent("Troll Smash", (4, 4, 4), 400)),
        ([5, 5, 5], ScoringEvent("Lightning Storm", (5, 5, 5), 500)),
        ([6, 6, 6], ScoringEvent("Demon Strike", (6, 6, 6), 600)),
    ],
)
def test_other_triples_create_combo_events(roll, expected_event):
    report = score(roll)

    assert report.total_damage == expected_event.damage
    assert report.events == (expected_event,)


def test_six_ones_create_two_dragon_blasts():
    report = score([1, 1, 1, 1, 1, 1])

    assert report.total_damage == 2000
    assert report.events == (
        ScoringEvent("Dragon Blast", (1, 1, 1), 1000),
        ScoringEvent("Dragon Blast", (1, 1, 1), 1000),
    )


def test_invalid_die_value_raises_error():
    with pytest.raises(ValueError, match="Die values must be between 1 and 6"):
        score([1, 2, 9])


def test_battle_report_has_human_readable_summary():
    report = score([1, 1, 1, 5])

    assert report.summary() == (
        "Dragon Blast (1000) + Lightning Spark (50) = 1050"
    )

Add a summary method to BattleReport. Empty events return “No damage.”; otherwise join name (damage) segments with “ + “ and append = total.

15

The Big Picture — Lessons from Twelve Cycles

Twelve cycles. Twelve tests. One small domain model with combos, leftovers, validation, and a summary method. Scroll through your final scorer.py — every line is justified by a test. No speculative code, no dead branch. The tests are the byproduct; the design is the primary product.

The takeaways that travel

TDD is design, not testing. The test is the contract; the implementation emerges under its pressure.
Refactor toward duplication, not before it. Two examples reveal the right shape; one is a guess.
Tests enable change. Behavior-level assertions survive structural rewrites; implementation-coupled tests don’t.
Coverage ≠ correctness. Boundary thinking (cycle 10) catches what coverage misses.
Listen to the test. Pain in writing a test usually points at the production code.

Take the final knowledge check at the end of this step. The sections below review the whole journey, the anti-pattern taxonomy, the empirical case for TDD, and what to learn next.

The journey in one table

Cycle	Behavior	Design move	Lesson
1	Empty roll	First class + function	RED for the right reason
2	Single 1	`ScoringEvent` introduced	Allow the ugly first GREEN
3	Single 5	Second hardcoded branch	Refactor toward duplication
4	Repeated singles	Per-die loop	First real refactor; tests enable change
5	Mixed dice	(no production change)	Free pass — verify with the mutation move
6	Triple 1s	`Counter`, count-then-emit	Design-breaking test; structural shift
7	Combo + leftovers	(no production change)	Guardrail tests for implicit correctness
8	Triple 2s	Rule objects	Listening to the test; Open-Closed
9	Other triples	Append data	Refactor pays off; `parametrize`
10	Six 1s	`//` and `%=`	Hidden edge case; boundary > coverage
11	Invalid dice	`pytest.raises`	Robustness is first-class
12	Summary	Method on `BattleReport`	Behavior on the existing object

When TDD shines, when it’s overkill

TDD shines on: new features with clear behavioral requirements; complex logic with branching cases; long-lived code modified by multiple people; API design (the test forces a caller’s perspective); domains where regressions hurt (payments, scoring, calculations).

TDD is overkill on: one-off throwaway scripts; exploratory prototyping where the problem isn’t yet defined; UI layout (visual correctness); non-binary outcomes (ML accuracy, image recognition); Jupyter research explorations.

Even in the second list, some tests pay off — the question is whether to write them first. Kent Beck: “the discipline of working strictly test-first is valuable but not necessarily something you want to do all the time.”

TDD anti-pattern taxonomy

Level	Anti-pattern	What it looks like	Antidote
I	The Liar	Test passes but asserts vacuously (`isinstance(x, int)` only)	Cycle 5’s mutation move
I	The Nitpicker	Asserts on private attributes / implementation details	Assert on observable behavior
II	Success Against All Odds	New test passes immediately, with no investigation	Verify with mutation
II	Skip-the-Refactor	Stop at green; never enter REFACTOR	Make the look mandatory
III	The Giant	One test asserts dozens of behaviors (mirrors a God Object)	One behavior per test
III	Excessive Setup	30+ lines of fixture before one assertion	Decouple production code
IV	The Mockery	More mock setup than test logic	Listen — the design is wrong
IV	Modify-the-Test	AI rewrites the test to match buggy code	Own the spec yourself

The pattern: higher the level, more it’s an architectural smell. Listen to the test.

The empirical case for TDD

Study	Finding
Microsoft & IBM (Williams et al., 2008)	39–91% decrease in pre-release defect density in TDD teams
Same studies	15–35% longer initial development; offset by reduced debugging
Erdogmus et al. (2005)	Test-first students wrote more tests AND were more productive per test
Janzen & Saiedian (ICSE 2007)	Mature programmers exposed to TDD significantly more likely to prefer TDD later — the Residual Effect
Fucci et al. (2017)	TDD’s benefit comes from granularity + uniformity, not strict test-first ordering — your twelve tiny cycles embody both

Caveat: mixed for solo programmers on short tasks. Strongest in team settings, with CI, on long-lived systems.

What to learn next (the same rhythm, scaled up)

Fixtures (@pytest.fixture) for reusable setup of objects, DBs, mock APIs
Mocks, fakes, stubs — with a strong default toward fakes over mocks
Property-based testing with Hypothesis — score(any list of 1–6) should always satisfy invariants
Mutation testing with mutmut or cosmic-ray — automate the cycle-5 mutation move across the whole suite
The Outside-In / Double-Loop pattern (Percival, Obey the Testing Goat) — high-level acceptance tests drive unit tests

Each lives inside the same Red-Green-Refactor rhythm you just internalised. They scale the discipline; they don’t replace it.

Starter files

scorer.py

# The full, twelve-cycle implementation lives here. Use this step's
# editor to scroll through what you built — every line is justified by
# a test in test_scorer.py. There is no speculative code.
from collections import Counter
from dataclasses import dataclass


VALID_DICE = {1, 2, 3, 4, 5, 6}


@dataclass(frozen=True)
class ScoringEvent:
    name: str
    dice_used: tuple
    damage: int


@dataclass(frozen=True)
class BattleReport:
    events: tuple = ()

    @property
    def total_damage(self):
        return sum(event.damage for event in self.events)

    def summary(self):
        if not self.events:
            return "No damage."

        event_text = " + ".join(
            f"{event.name} ({event.damage})" for event in self.events
        )

        return f"{event_text} = {self.total_damage}"


@dataclass(frozen=True)
class ComboRule:
    die: int
    count: int
    name: str
    damage: int

    def apply(self, counts):
        events = []

        number_of_combos = counts[self.die] // self.count

        for _ in range(number_of_combos):
            dice_used = tuple([self.die] * self.count)
            events.append(ScoringEvent(self.name, dice_used, self.damage))

        counts[self.die] %= self.count

        return events


@dataclass(frozen=True)
class SingleRule:
    die: int
    name: str
    damage: int

    def apply(self, counts):
        events = []

        for _ in range(counts[self.die]):
            events.append(ScoringEvent(self.name, (self.die,), self.damage))

        return events


COMBO_RULES = (
    ComboRule(1, 3, "Dragon Blast", 1000),
    ComboRule(2, 3, "Goblin Swarm", 200),
    ComboRule(3, 3, "Orc Charge", 300),
    ComboRule(4, 3, "Troll Smash", 400),
    ComboRule(5, 3, "Lightning Storm", 500),
    ComboRule(6, 3, "Demon Strike", 600),
)

SINGLE_RULES = (
    SingleRule(1, "Dragon Flame", 100),
    SingleRule(5, "Lightning Spark", 50),
)


def validate_dice(dice):
    for die in dice:
        if die not in VALID_DICE:
            raise ValueError("Die values must be between 1 and 6")


def score(dice):
    validate_dice(dice)

    counts = Counter(dice)
    events = []

    for rule in COMBO_RULES:
        events.extend(rule.apply(counts))

    for rule in SINGLE_RULES:
        events.extend(rule.apply(counts))

    return BattleReport(tuple(events))

test_scorer.py

"""All twelve cycles, all green. Read it as a contract."""
import pytest
from scorer import score, ScoringEvent


def test_empty_roll_has_zero_damage_and_no_events():
    report = score([])

    assert report.total_damage == 0
    assert report.events == ()


def test_single_one_creates_dragon_flame_event():
    report = score([1])

    assert report.total_damage == 100
    assert report.events == (
        ScoringEvent("Dragon Flame", (1,), 100),
    )


def test_single_five_creates_lightning_spark_event():
    report = score([5])

    assert report.total_damage == 50
    assert report.events == (
        ScoringEvent("Lightning Spark", (5,), 50),
    )


def test_two_ones_create_two_dragon_flames():
    report = score([1, 1])

    assert report.total_damage == 200
    assert report.events == (
        ScoringEvent("Dragon Flame", (1,), 100),
        ScoringEvent("Dragon Flame", (1,), 100),
    )


def test_one_and_five_create_two_different_events():
    report = score([1, 5])

    assert report.total_damage == 150
    assert report.events == (
        ScoringEvent("Dragon Flame", (1,), 100),
        ScoringEvent("Lightning Spark", (5,), 50),
    )


def test_three_ones_create_dragon_blast_instead_of_three_flames():
    report = score([1, 1, 1])

    assert report.total_damage == 1000
    assert report.events == (
        ScoringEvent("Dragon Blast", (1, 1, 1), 1000),
    )


def test_dragon_blast_plus_leftover_flame_and_spark():
    report = score([1, 1, 1, 1, 5])

    assert report.total_damage == 1150
    assert report.events == (
        ScoringEvent("Dragon Blast", (1, 1, 1), 1000),
        ScoringEvent("Dragon Flame", (1,), 100),
        ScoringEvent("Lightning Spark", (5,), 50),
    )


def test_three_twos_create_goblin_swarm():
    report = score([2, 2, 2])

    assert report.total_damage == 200
    assert report.events == (
        ScoringEvent("Goblin Swarm", (2, 2, 2), 200),
    )


@pytest.mark.parametrize(
    "roll, expected_event",
    [
        ([3, 3, 3], ScoringEvent("Orc Charge", (3, 3, 3), 300)),
        ([4, 4, 4], ScoringEvent("Troll Smash", (4, 4, 4), 400)),
        ([5, 5, 5], ScoringEvent("Lightning Storm", (5, 5, 5), 500)),
        ([6, 6, 6], ScoringEvent("Demon Strike", (6, 6, 6), 600)),
    ],
)
def test_other_triples_create_combo_events(roll, expected_event):
    report = score(roll)

    assert report.total_damage == expected_event.damage
    assert report.events == (expected_event,)


def test_six_ones_create_two_dragon_blasts():
    report = score([1, 1, 1, 1, 1, 1])

    assert report.total_damage == 2000
    assert report.events == (
        ScoringEvent("Dragon Blast", (1, 1, 1), 1000),
        ScoringEvent("Dragon Blast", (1, 1, 1), 1000),
    )


def test_invalid_die_value_raises_error():
    with pytest.raises(ValueError, match="Die values must be between 1 and 6"):
        score([1, 2, 9])


def test_battle_report_has_human_readable_summary():
    report = score([1, 1, 1, 5])

    assert report.summary() == (
        "Dragon Blast (1000) + Lightning Spark (50) = 1050"
    )

Solution

scorer.py

from collections import Counter
from dataclasses import dataclass


VALID_DICE = {1, 2, 3, 4, 5, 6}


@dataclass(frozen=True)
class ScoringEvent:
    name: str
    dice_used: tuple
    damage: int


@dataclass(frozen=True)
class BattleReport:
    events: tuple = ()

    @property
    def total_damage(self):
        return sum(event.damage for event in self.events)

    def summary(self):
        if not self.events:
            return "No damage."

        event_text = " + ".join(
            f"{event.name} ({event.damage})" for event in self.events
        )

        return f"{event_text} = {self.total_damage}"


@dataclass(frozen=True)
class ComboRule:
    die: int
    count: int
    name: str
    damage: int

    def apply(self, counts):
        events = []

        number_of_combos = counts[self.die] // self.count

        for _ in range(number_of_combos):
            dice_used = tuple([self.die] * self.count)
            events.append(ScoringEvent(self.name, dice_used, self.damage))

        counts[self.die] %= self.count

        return events


@dataclass(frozen=True)
class SingleRule:
    die: int
    name: str
    damage: int

    def apply(self, counts):
        events = []

        for _ in range(counts[self.die]):
            events.append(ScoringEvent(self.name, (self.die,), self.damage))

        return events


COMBO_RULES = (
    ComboRule(1, 3, "Dragon Blast", 1000),
    ComboRule(2, 3, "Goblin Swarm", 200),
    ComboRule(3, 3, "Orc Charge", 300),
    ComboRule(4, 3, "Troll Smash", 400),
    ComboRule(5, 3, "Lightning Storm", 500),
    ComboRule(6, 3, "Demon Strike", 600),
)

SINGLE_RULES = (
    SingleRule(1, "Dragon Flame", 100),
    SingleRule(5, "Lightning Spark", 50),
)


def validate_dice(dice):
    for die in dice:
        if die not in VALID_DICE:
            raise ValueError("Die values must be between 1 and 6")


def score(dice):
    validate_dice(dice)

    counts = Counter(dice)
    events = []

    for rule in COMBO_RULES:
        events.extend(rule.apply(counts))

    for rule in SINGLE_RULES:
        events.extend(rule.apply(counts))

    return BattleReport(tuple(events))

The final implementation as it stood at the end of cycle 12. Use this step for reading and reflection — there are no new tasks, just the comprehensive knowledge check that follows.

Final Quiz — The Big Picture

Min. score: 80%

1. Which single statement most accurately captures TDD?

A way to achieve 100% line coverage without writing extra tests later
A design discipline where tests drive the structure of production code; the test suite is a valuable byproduct
A workflow optimisation that ships features faster than test-after development
A coverage technique ensuring every line is exercised before release

The threshold concept of the tutorial: TDD is design, not testing. Cycle 8’s rule-object refactor emerged under cycle-8’s test pressure, not from a whiteboard. The test suite is a (valuable) byproduct.

2. For which scenario is TDD most beneficial?

A 25-line one-off file-renaming script run once and discarded
A new payment-processing module: complex validation, multi-developer code, high cost of regression
A Jupyter notebook exploring a new dataset
Hand-written CSS for a landing page judged visually

Payment processing has every property TDD rewards: complex logic, strict correctness, multiple developers, long life. Microsoft/IBM measured 39–91% defect-density reductions exactly here.

3. A test file contains:

def test_user_creation():
    user = User("Alice", "alice@example.com")
    assert user._name == "Alice"
    assert user._email == "alice@example.com"

Which anti-pattern is present?

Excessive Setup — too many asserts
The Nitpicker — asserts on private attributes (_name, _email), coupling the test to implementation details
The Mockery — the test verifies the User constructor
Skip-the-Refactor — should be split into smaller tests

The Nitpicker. Leading underscores signal “private — implementation detail.” Tests that touch private state break on every refactor. Assert on observable behavior via the public API.

4. The “one Dragon Blast for any count ≥ 3” bug lived in cycles 6–10. Why didn’t line coverage catch it?

Coverage caught it; pytest ignored the warning
Coverage measures which lines ran, not whether the line is correct for all relevant inputs. The concept that catches it is boundary thinking — probing inputs at exactly N, 2N, between, etc.
Coverage requires the --strict flag we didn’t enable
Coverage can’t measure exception-raising code

Coverage tells you the line ran, not that it was right. Every prior combo test used exactly count dice; nothing probed 2 × count. Boundary thinking is the discipline that catches it. Coverage is a locator of under-tested code, not a measure of correctness.

TDD with pytest — Dragon Dice Battle

Cycle 1 — RED: Write the Failing Test

Test-Driven Development in one minute

Why a failing test is the goal of RED

The shape of every pytest test

The dragon dice rules (reference for all 12 cycles)

Cycle 1’s spec

Your task

Detail: why a tuple, not a list

References

Self-test before clicking Next

Solution

Cycle 1 — GREEN: Make It Pass

The test is your contract

Three Python tools the scaffold uses

The Transformation Priority Premise — why “smallest” beats “best”

Your task

Common mistakes to avoid

Self-test before clicking Next

Solution

Cycle 1 — REFACTOR: The Pause That Counts

The REFACTOR checklist (you’ll re-use this every cycle)

Your task

Why REFACTOR is the most-skipped phase

Solution

First Quiz — The Red-Green-Refactor Rhythm

Cycle 2 — Single 1 → Dragon Flame

Your task

Why “allow the hard-code” is a TDD discipline

Solution

Cycle 3 — Single 5 → Lightning Spark

Your task

Why deliberately keeping ugly code is the disciplined move

Solution

Cycle 4 — Repeated Singles → First Real Refactor

Your task

The threshold-concept moment: tests enable change

Solution

Cycle 4 Quiz — Tests as Design Pressure

Cycle 5 — Mixed Dice (No Production Change)

Your task

The mutation move (verify the test isn’t vacuous — do this!)

Solution

Cycle 6 — Triple 1s → Dragon Blast (Design Moment)

Your task

Why this was a design-breaking test (and why the previous tests still pass)

Solution

Cycle 6 Quiz — Design-Breaking Tests

Cycle 7 — Combo with Leftover Singles

Your task

Why guardrail tests matter

Solution

Cycle 8 — Goblin Swarm → Rule Objects (Big Refactor)

Your task

Listening to the test & the Open-Closed Principle

Solution

Cycle 8 Quiz — Listening to the Test

Cycle 9 — Add the Rest of the Combo Rules

Your task

Why parametrize beats a for-loop inside one test

Solution

Cycle 10 — Six 1s → Two Dragon Blasts (Hidden Bug)

Reveal the hidden bug (open after you’ve predicted)

Your task

What this teaches about coverage vs. boundary thinking

Solution

Cycle 10 Quiz — TDD Finds Bugs You Didn't Know You Had

Cycle 11 — Invalid Dice Raise an Error

Your task

Why match= matters & happy-path-only is dangerous

Solution

Cycle 12 — Human-Readable Battle Summary

Your task

A small design choice: method vs. property?

Solution

The Big Picture — Lessons from Twelve Cycles

The takeaways that travel

The journey in one table

When TDD shines, when it’s overkill

TDD anti-pattern taxonomy

Why `match=` matters & happy-path-only is dangerous