1

Why Test? The Bug That Got Away

The Lay of the Land

Learning objective: After this step you will be able to explain why automated tests are valuable and use assert to verify function behavior.

You write Python code that works… until someone changes it and breaks something you did not expect. Testing is the safety net that catches those breaks before they reach users.

A Common Misconception

Many students think testing is about finding bugs after you write code. That is only half the story. Tests also:

Think of tests as a ratchet: once a test passes, your progress is locked in. You can never accidentally slip backward.

A Note Before We Begin

TDD asks you to write a test before the code exists. This feels backward at first — and that is completely normal. Every developer who learns TDD goes through an uncomfortable adjustment period. If TDD feels strange or slow, you are doing it right. Stick with the process; it gets natural with practice.

Predict Before You Run

The function below is supposed to convert a numeric score (0–100) to a letter grade. Before running it, read the code carefully and predict: what will calculate_grade(85) return? What about calculate_grade(90)?

Write your predictions down, then click Run to see if you were right.

Task

  1. Run grades.py and observe the output. Was your prediction correct?
  2. Find and fix the bug in calculate_grade().
  3. Add three assert statements at the bottom of the file to verify:
    • calculate_grade(95) returns "A"
    • calculate_grade(85) returns "B"
    • calculate_grade(72) returns "C"

An assert works like this:

assert calculate_grade(95) == "A", "95 should be an A"

If the condition is False, Python raises an AssertionError with the message.

Starter files
grades.py
def calculate_grade(score):
    """Convert a numeric score (0-100) to a letter grade."""
    if score > 90:
        return "A"
    if score >= 80:
        return "B"
    if score >= 70:
        return "C"
    if score >= 60:
        return "D"
    return "F"


# Try it out — predict the output before running!
print("grade(95) =", calculate_grade(95))
print("grade(85) =", calculate_grade(85))
print("grade(90) =", calculate_grade(90))
print("grade(45) =", calculate_grade(45))

# TODO: Add three assert statements below to verify:
#   calculate_grade(95) == "A"
#   calculate_grade(85) == "B"
#   calculate_grade(72) == "C"
2

Your First pytest Test

Learning objective: After this step you will be able to write pytest test functions and interpret pytest output.

From assert to pytest

Raw assert statements work, but they give minimal feedback when things fail. pytest is Python’s most popular testing framework. It discovers and runs your tests automatically, gives clear failure output, and scales from tiny scripts to enormous projects.

pytest Conventions

pytest uses simple naming conventions — no classes, no boilerplate:

Convention Rule Example
File name Must start with test_ test_temperature.py
Function name Must start with test_ def test_freezing_point():
Assertions Plain assert — same as before assert celsius_to_fahrenheit(0) == 32

That is it. No import unittest, no class TestSomething(unittest.TestCase), no self.assertEqual(). Just functions and assert.

Running Tests

In this tutorial, clicking Run on a test file will execute pytest automatically. You will see output like:

test_temperature.py::test_freezing_point PASSED
test_temperature.py::test_boiling_point PASSED

A PASSED test means the assertion held. A FAILED test shows you exactly what went wrong.

Predict Before You Run

Before writing any code, predict: what will pytest print if a test function only has pass and no assert? Will it PASS or FAIL? Click Run on test_temperature.py now — before changing anything — and see if your prediction is correct.

Task

The temperature.py module is provided and working. Your job is to complete test_temperature.py:

  1. Fill in test_freezing_point — assert that celsius_to_fahrenheit(0) returns 32.0
  2. Fill in test_boiling_point — assert that celsius_to_fahrenheit(100) returns 212.0
  3. Write test_negative_forty from scratch — assert that celsius_to_fahrenheit(-40) returns -40.0 (the unique crossover point where Celsius equals Fahrenheit!)

Click Run on test_temperature.py after each change to see your tests execute.

Starter files
temperature.py
"""Temperature conversion utilities."""


def celsius_to_fahrenheit(celsius):
    """Convert Celsius to Fahrenheit."""
    return celsius * 9 / 5 + 32


def fahrenheit_to_celsius(fahrenheit):
    """Convert Fahrenheit to Celsius."""
    return (fahrenheit - 32) * 5 / 9
test_temperature.py
"""Tests for the temperature module."""
import pytest
from temperature import celsius_to_fahrenheit, fahrenheit_to_celsius


def test_freezing_point():
    # TODO: assert that celsius_to_fahrenheit(0) returns 32.0
    pass


def test_boiling_point():
    # TODO: assert that celsius_to_fahrenheit(100) returns 212.0
    pass


# TODO: Write a test_negative_forty function from scratch.
# Assert that celsius_to_fahrenheit(-40) returns -40.0


if __name__ == "__main__":
    pytest.main(["-v"])
3

The Red-Green-Refactor Rhythm

Learning objective: After this step you will be able to execute a complete Red-Green-Refactor TDD cycle and explain why TDD is a design technique, not just testing.

The Core Insight

Here is the single most important idea in this tutorial:

TDD is a design technique, not a testing technique.

When you write the test first, you are forced to decide:

You answer all these design questions before writing a single line of implementation. The tests are a byproduct — the real value is the design thinking.

The Three Phases

TDD follows a strict three-phase cycle, repeated in short iterations:

Phase Color What You Do Key Rule
1. RED :red_circle: Write ONE failing test The test describes desired behavior that does not exist yet
2. GREEN :green_circle: Write the MINIMUM code to pass Do not optimize. Do not add features. Just make the test pass
3. REFACTOR :large_blue_circle: Clean up the code Remove duplication, improve names — but don’t change behavior. All tests must still pass

Critical misconception: You do NOT write all your tests first. You write one test, make it pass, refactor, then write the next test. One cycle at a time.

Worked Example: WordCounter

Let’s build a WordCounter class using TDD. We will walk through one full cycle together, then you will continue.

Cycle 1: Count words in a simple sentence (WORKED EXAMPLE)

RED — Write a test that describes the behavior we want:

# Sub-goal: Define WHAT we want (the interface + expected behavior)
def test_count_words_simple():
    counter = WordCounter()
    result = counter.count("hello world")
    assert result == {"hello": 1, "world": 1}

Run this test — it fails because WordCounter does not exist yet. This is the RED phase. The failure is expected and correct.

GREEN — Write the minimum code to make it pass:

# Sub-goal: Write just enough code to pass the test
class WordCounter:
    def count(self, text):
        words = text.split()
        result = {}
        for word in words:
            result[word] = result.get(word, 0) + 1
        return result

Run the test again — it passes. This is the GREEN phase.

REFACTOR — The code is clean enough for now, so no refactoring needed. All tests still pass.

Task

Open test_word_counter.py. Cycle 1’s test is already provided.

  1. SEE RED FIRST: Click Run on test_word_counter.py before writing any code. You should see the test FAIL with an ImportError — this IS the RED phase. The test describes behavior that does not exist yet. This failure is expected and correct.
  2. Go GREEN: Now write the WordCounter class in word_counter.py to make the test pass. Run again — you should see GREEN.
  3. Cycle 2: Complete test_count_repeated_words (partially scaffolded). Run the tests — the new test should FAIL (RED). Then update your implementation if needed to pass it (GREEN).
  4. REFACTOR: After Cycle 2 passes, look at your count() method. If you used a manual dictionary loop, try refactoring to use Python’s collections.Counter — which does the same thing in one line. Run the tests after refactoring to confirm they still pass. This is the REFACTOR phase: improve the code without changing behavior.
  5. Cycle 3: Write test_count_empty_string entirely on your own. What should count("") return? Decide, write the test, see it fail, then make it pass.

Important: Always run the tests and see RED before writing implementation. If you skip this step, you cannot be sure your test is actually testing something.

Starter files
word_counter.py
# TODO: Implement the WordCounter class here.
# It should have a count(text) method that returns
# a dictionary of {word: count} for each word in the text.
test_word_counter.py
"""TDD for WordCounter — practice the Red-Green-Refactor rhythm."""
import pytest
from word_counter import WordCounter


# --- Cycle 1 (test provided — see RED first, then write implementation to go GREEN) ---
def test_count_words_simple():
    """A simple sentence with no repeated words."""
    counter = WordCounter()
    result = counter.count("hello world")
    assert result == {"hello": 1, "world": 1}


# --- Cycle 2 (partially scaffolded — complete the assertion) ---
def test_count_repeated_words():
    """A sentence with repeated words should count each occurrence."""
    counter = WordCounter()
    result = counter.count("the cat sat on the mat")
    # TODO: assert that 'the' appears 2 times and 'cat' appears 1 time
    # Hint: assert result["the"] == ??? and result["cat"] == ???
    pass


# --- Cycle 3 (write this test entirely on your own) ---
# TODO: Write test_count_empty_string
# What should counter.count("") return? Decide, then test it.


if __name__ == "__main__":
    pytest.main(["-v"])
4

TDD Kata: FizzBuzz

Learning objective: After this step you will be able to apply the Red-Green-Refactor cycle independently across multiple iterations with fading scaffolding.

The FizzBuzz Kata

A “kata” is a small, focused exercise designed to practice the TDD rhythm. We will build the classic FizzBuzz function using four TDD cycles, with the scaffolding fading at each step:

FizzBuzz rules:

Cycle 1: Regular numbers (FULL WORKED EXAMPLE)

The first test and implementation are provided for you. Study how the cycle works:

RED: test_returns_number_as_string checks that fizzbuzz(1) returns "1" and fizzbuzz(2) returns "2".

GREEN: The simplest implementation is return str(number).

REFACTOR: Nothing to refactor yet.

Run the tests to confirm Cycle 1 passes (GREEN).

Self-check: Why does the Cycle 1 implementation use return str(number) instead of a more complete solution? What TDD principle does this follow? (Answer: the GREEN phase requires writing the minimum code to pass the current test. We do not add logic for requirements that have no test yet.)

Cycle 2: Fizz (PARTIAL SCAFFOLD)

The test test_fizz_for_multiples_of_3 is written for you. Your job:

  1. Run the tests — you should see this new test FAIL (RED)
  2. Modify fizzbuzz() in fizzbuzz.py to make it pass (GREEN)
  3. Refactor if needed

Cycle 3: Buzz (HINT ONLY)

Write test_buzz_for_multiples_of_5 yourself. The function signature is provided — fill in the assertions. Run the tests and see RED first, then write the implementation to go GREEN.

Cycle 4: FizzBuzz (INDEPENDENT)

Write test_fizzbuzz_for_multiples_of_15 and the corresponding implementation entirely on your own. Full TDD cycle, no scaffolding. Start with the test. See RED. Then go GREEN.

Remember the rhythm: RED (see it fail) → GREEN (make it pass) → REFACTOR (clean up).

Starter files
fizzbuzz.py
def fizzbuzz(number):
    """Return FizzBuzz result for the given number.

    Rules:
    - Divisible by 3 -> "Fizz"
    - Divisible by 5 -> "Buzz"
    - Divisible by both 3 and 5 -> "FizzBuzz"
    - Otherwise -> the number as a string
    """
    # TODO: Add Fizz/Buzz/FizzBuzz logic ABOVE the return below.
    # Each cycle should add a new condition.
    return str(number)
test_fizzbuzz.py
"""TDD Kata: FizzBuzz — practice the rhythm with fading scaffolding."""
import pytest
from fizzbuzz import fizzbuzz


# --- Cycle 1 (FULL WORKED EXAMPLE — provided) ---
def test_returns_number_as_string():
    assert fizzbuzz(1) == "1"
    assert fizzbuzz(2) == "2"
    assert fizzbuzz(4) == "4"


# --- Cycle 2 (PARTIAL SCAFFOLD — test provided, write the implementation) ---
def test_fizz_for_multiples_of_3():
    assert fizzbuzz(3) == "Fizz"
    assert fizzbuzz(6) == "Fizz"
    assert fizzbuzz(9) == "Fizz"


# --- Cycle 3 (HINT ONLY — write the assertions and implementation) ---
def test_buzz_for_multiples_of_5():
    # TODO: assert that fizzbuzz(5), fizzbuzz(10), fizzbuzz(20)
    # all return "Buzz"
    pass


# --- Cycle 4 (INDEPENDENT — write this test from scratch) ---
# TODO: Write test_fizzbuzz_for_multiples_of_15
# Test that fizzbuzz(15), fizzbuzz(30), fizzbuzz(45) return "FizzBuzz"
# Then update fizzbuzz.py to make it pass.
# Hint: If using if/elif, check divisibility by 15 BEFORE 3 or 5. Why?


if __name__ == "__main__":
    pytest.main(["-v"])
5

Test Behavior, Not Implementation

Learning objective: After this step you will be able to distinguish between brittle (implementation-coupled) and robust (behavior-focused) tests, and explain why this matters for refactoring.

The Key Principle

Test what a function DOES, not how it does it.

A brittle test breaks when you change the internal implementation, even if the behavior is unchanged. A robust test only breaks when the actual behavior changes.

Example: Two Ways to Test the Same Function

Consider testing a function that sorts a list of students by GPA:

# BRITTLE — tests implementation details
def test_sort_brittle():
    sorter = StudentSorter()
    sorter.sort(students)
    # Checks internal state — breaks if we rename the field
    assert sorter._sorted_list[0].name == "Alice"
    assert sorter._comparison_count == 3  # Why do we care?

# ROBUST — tests behavior
def test_sort_robust():
    sorter = StudentSorter()
    result = sorter.sort(students)
    # Checks what the function RETURNS — the observable behavior
    assert result[0].gpa >= result[1].gpa  # Sorted descending
    assert len(result) == len(students)     # No items lost

The brittle test will break if you rename _sorted_list or change the sort algorithm (even to a faster one). The robust test only cares about the result.

The Refactoring Litmus Test

Here is a simple rule: if you refactor the internals of a function and all tests still pass, your tests are robust. If tests break after a pure refactoring (no behavior change), they are testing implementation.

Task 1: Spot the Brittle Tests

Open test_brittle_audit.py. It contains four tests for a simple ShoppingCart class. Two tests are brittle (they test implementation details) and two are robust (they test behavior). Your job:

  1. Read each test carefully
  2. Identify which two are brittle and which two are robust
  3. Fix the two brittle tests by rewriting them to test behavior instead of implementation
  4. Run the tests to confirm they all pass

Task 2: The Refactoring Litmus Test

Now refactor your FizzBuzz from Step 4 using a different approach — for example, string concatenation instead of if/elif:

def fizzbuzz(number):
    result = ""
    if number % 3 == 0:
        result += "Fizz"
    if number % 5 == 0:
        result += "Buzz"
    return result or str(number)

Run test_fizzbuzz.py — all tests should still pass, proving they test behavior, not implementation.

Starter files
shopping_cart.py
class ShoppingCart:
    def __init__(self):
        self._items = []

    def add(self, name, price):
        self._items.append({"name": name, "price": price})

    def total(self):
        return sum(item["price"] for item in self._items)

    def item_count(self):
        return len(self._items)

    def items(self):
        return [item["name"] for item in self._items]
test_brittle_audit.py
"""AUDIT: Two of these tests are brittle. Find and fix them."""
import pytest
from shopping_cart import ShoppingCart


# Test 1
def test_add_item_updates_count():
    cart = ShoppingCart()
    cart.add("Apple", 1.50)
    assert cart.item_count() == 1


# Test 2 — BRITTLE? Check if this tests behavior or implementation.
def test_add_item_internal_list():
    cart = ShoppingCart()
    cart.add("Apple", 1.50)
    # Accesses private attribute directly
    assert cart._items[0]["name"] == "Apple"
    assert cart._items[0]["price"] == 1.50


# Test 3
def test_total_sums_prices():
    cart = ShoppingCart()
    cart.add("Apple", 1.50)
    cart.add("Bread", 2.00)
    assert cart.total() == 3.50


# Test 4 — BRITTLE? Check if this tests behavior or implementation.
def test_internal_list_length():
    cart = ShoppingCart()
    cart.add("Apple", 1.50)
    cart.add("Bread", 2.00)
    # Accesses private attribute directly
    assert len(cart._items) == 2


if __name__ == "__main__":
    pytest.main(["-v"])
6

TDD Kata: Password Validator

Learning objective: After this step you will be able to independently apply the full TDD cycle — writing tests first, then implementation — for a new problem.

The Production Phase

This is the real test of your TDD skills. You will receive requirements and a minimal scaffold — a stub function and one example test. The rest is up to you.

Requirements: validate_password(password)

Build a function that validates passwords. It should return a tuple: (is_valid, message).

Rule Valid Example Invalid Example
At least 8 characters "SecureP1" "Short1"
Contains at least one uppercase letter "Password1" "password1"
Contains at least one digit "Password1" "Password"

Return values:

Your TDD Workflow

The first test (test_valid_password) is provided as scaffolding. Your job:

  1. Write at least 3 more test functions in test_password.py — one for each rule (too short, missing uppercase, missing digit). The function signatures are suggested in the comments.
  2. Run tests — they should all FAIL (RED) since password.py only has a stub
  3. Implement validate_password() in password.py to make tests pass (GREEN)
  4. Refactor if needed

If You Feel Stuck

That is completely normal. Writing tests for something that does not exist yet feels backward at first. Every TDD practitioner felt this way initially. Focus on what the function should return for each input — that is all a test needs to express.

Remember: Write the tests before the implementation. This forces you to design the interface (function name, parameters, return type) before writing any logic.

Starter files
password.py
def validate_password(password):
    """Validate a password against security rules.

    Returns:
        (True, "Valid") if all rules pass.
        (False, "reason") if a rule is violated.
    """
    # TODO: Replace this stub with real validation logic.
    # The stub returns a failing tuple so tests show clean RED.
    return (False, "Not implemented")
test_password.py
"""TDD Kata: Password Validator — write your tests FIRST."""
import pytest
from password import validate_password


# Here is the first test as an example (the "happy path"):
def test_valid_password():
    is_valid, message = validate_password("SecureP1")
    assert is_valid is True
    assert message == "Valid"


# TODO: Write at least 3 more test functions below.
# Each test should check ONE validation rule:
#
# def test_rejects_short_password():
#     ...
#
# def test_rejects_missing_uppercase():
#     ...
#
# def test_rejects_missing_digit():
#     ...


if __name__ == "__main__":
    pytest.main(["-v"])
7

The Big Picture

Learning objective: After this step you will be able to evaluate when TDD is appropriate, identify common TDD anti-patterns, and articulate the key principles you learned.

When TDD Shines

TDD is most valuable when:

When TDD is Overkill

TDD is less useful for:

Even in these cases, some tests are almost always valuable. The question is whether to write them first.

The Residual Effect

Research shows that students who learn TDD properly continue to write significantly more tests in future projects, even when not explicitly asked to. TDD doesn’t just teach testing — it permanently shifts how you think about code quality.

Key Takeaways

  1. TDD is a design technique, not a testing technique — tests are a valuable byproduct
  2. Red-Green-Refactor — one test at a time, in short iterations
  3. Test behavior, not implementation — if refactoring breaks tests, the tests are brittle
  4. Listen to the test — difficulty writing a test is diagnostic feedback about your design
  5. The ratchet — each passing test locks in progress permanently

What’s Next

This tutorial covered the fundamentals. In more advanced TDD work, you will encounter:

Final Exercise: Diagnose This Test Suite

Open test_diagnose.py. It contains a test suite written by a junior developer. Read each test and identify which TDD principle it violates (if any). Add a # VERDICT: comment to each test function explaining your analysis. This is not graded — it is a reflection exercise.

Starter files
test_diagnose.py
"""Diagnose: Which principles does each test violate (if any)?"""
import pytest


# --- Test 1 ---
def test_login_system():
    """Tests login, logout, session, and password reset all in one."""
    user = {"username": "alice", "password": "Secret1"}
    assert authenticate(user["username"], user["password"]) == True
    assert create_session(user["username"]) is not None
    assert logout(user["username"]) == True
    assert reset_password(user["username"], "NewSecret1") == True
    # VERDICT: (your analysis here)


# --- Test 2 ---
def test_add_returns_sum():
    assert add(2, 3) == 5
    # VERDICT: (your analysis here)


# --- Test 3 ---
def test_user_internals():
    user = User("Bob")
    assert user._internal_id is not None
    assert user._created_at is not None
    # VERDICT: (your analysis here)


# --- Test 4 ---
def test_divide_by_zero():
    result = safe_divide(10, 0)
    assert result is None
    # VERDICT: (your analysis here)