Playwright Tutorial — Print View

1

Anatomy of a Playwright Test: Navigate, Interact, Assert

Why this matters

Every Playwright test you ever write (at work, on capstones, debugging at 11pm) is a variation on three moves: navigate to the page, interact with the UI, assert what the user sees. Lock that rhythm in now and the rest of the tutorial becomes pattern-matching against it. Skip it, and every later step feels like memorization.

🎯 You will learn to

Analyze a basic Playwright test and identify how each line maps onto the Arrange / Act / Assert pattern from Testing Foundations
Apply the navigate-interact-assert rhythm to read unfamiliar Playwright tests at a glance

In Testing Foundations you wrote tests like this:

def test_valid_name_accepted():
    assert squad_name_valid("epic") is True

That test verifies one function in isolation. A Playwright test verifies a whole React app through a real browser, the way a user experiences it. Same AAA bones, different organism.

🔄 Concept bridge

Testing Foundations (pytest)	Playwright (e2e)
Arrange / Act / Assert	Navigate / Interact / Assert
Function inputs	User actions through the UI
Direct return value	Observable outcome on the page
Synchronous	Async (`await` everywhere)
Strong oracle = `==` exact match	Strong oracle = `toHaveText`, `toHaveCount`, …

The discipline is the same. The mechanics differ.

🌳 Primer: what `getByRole` actually queries

Before you read the test, lock in this concept — every locator in the test below depends on it.

Many HTML elements have an implicit role that the browser exposes to assistive technology (screen readers, voice control, etc.). The browser maintains a parallel tree, the accessibility tree, that mirrors the DOM but contains only the semantically meaningful elements, with their roles, names, and states.

HTML	Implicit role	Accessible name source
`<button>Save</button>`	`button`	the visible text “Save”
`<input type="text">`	`textbox`	a `<label for=...>` or `aria-label`
`<a href="...">Home</a>`	`link`	the visible link text
`<ul><li>X</li></ul>`	`list` containing `listitem`	(none — structural)
`<h2>Settings</h2>`	`heading`	the visible heading text
`<div onclick=...>Click me</div>`	(no button role)	(none); assistive tech does not announce it as a control

page.getByRole('button', { name: /add todo/i }) queries this tree, not the DOM. It says: “find the element with accessible role button whose accessible name matches the regex /add todo/i.” The query doesn’t care whether the button is <button class="primary">, <button data-print-id="add">, or wrapped in five <div>s — only the role and name.

Why this matters:

Locators stay stable across CSS refactors — change the class, change the layout, the locator still works.
Locators break when accessibility breaks — if a teammate replaces <button> with <div onclick="...">, the locator stops finding it. That’s a feature, not a bug: the change made the page worse for screen-reader users, and the test failure surfaces that regression.
You’re testing the same thing the user (and their assistive tech) sees — not the same thing the React renderer happens to emit on a given day.

With that primer in mind, every getByRole(...) call below is a query against the accessibility tree.
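To make the role-plus-name query concrete, here is a minimal sketch in plain JavaScript. It is a toy model, not how Playwright is implemented: the `tree` array and the `queryByRole` helper are hypothetical, and the node names are taken from the starter Todo app.

```javascript
// Toy accessibility tree: one entry per element, reduced to role and name.
// Playwright derives these from the live page; here they are hard-coded.
const tree = [
  { role: 'heading', name: 'Todo Lab' },
  { role: 'textbox', name: 'Todo item' },
  { role: 'button', name: 'Add todo' },
  { role: null, name: null }, // <div onclick=...>Click me</div>: no button role
];

// Hypothetical helper: filter by role first, then (optionally) by name.
// A string name is matched as a case-insensitive substring, which mirrors
// Playwright's documented default; a RegExp is tested as-is.
function queryByRole(nodes, role, { name } = {}) {
  return nodes.filter((node) => {
    if (node.role !== role) return false;
    if (name === undefined) return true;
    if (node.name === null) return false;
    return name instanceof RegExp
      ? name.test(node.name)
      : node.name.toLowerCase().includes(name.toLowerCase());
  });
}

console.log(queryByRole(tree, 'button', { name: /add todo/i }).length); // 1
console.log(queryByRole(tree, 'button', { name: /save/i }).length);     // 0
```

Nothing in the query mentions a class name or a position in the DOM, which is why a restyle leaves it untouched while a lost role breaks it.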

Read this test (don’t run yet)

import { test, expect } from '@playwright/test';

test('user can add a todo', async ({ page }) => {
  await page.goto('/');                                                  // Navigate
  await page.getByRole('textbox', { name: /todo item/i }).fill('Milk');  // Interact
  await page.getByRole('button', { name: /add todo/i }).click();         // Interact
  await expect(page.getByRole('listitem')).toHaveText('Milk');           // Assert
});

Annotations that matter:

async ({ page }) => { … }: every Playwright test is async. page is your handle to the browser tab.
await on every line of the test body: the browser is asynchronous. Without await, JavaScript races past the click before React’s state has updated.
getByRole('button', { name: /add todo/i }): queries the accessibility tree (per the primer above) for a button whose accessible name matches /add todo/i, here “Add todo”.
await expect(...).toHaveText('Milk'): Playwright’s web-first assertions auto-wait and retry until the condition holds (or the timeout expires). They’re the right tool for asynchronous UI.
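The auto-wait-and-retry behavior can be pictured as a polling loop. The sketch below is a toy model in plain JavaScript and not Playwright's real implementation; `expectEventually`, `sleep`, and the simulated `ui` object are all hypothetical names.

```javascript
// Toy model of a web-first assertion: keep re-checking until it holds.
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Hypothetical helper: poll `condition` until it is true or time runs out.
async function expectEventually(condition, { timeout = 1000, interval = 20 } = {}) {
  const deadline = Date.now() + timeout;
  while (true) {
    if (condition()) return;
    if (Date.now() >= deadline) {
      throw new Error(`condition not met within ${timeout} ms`);
    }
    await sleep(interval);
  }
}

async function demo() {
  // Simulated UI: the list item appears 100 ms after the "click".
  const ui = { items: [] };
  setTimeout(() => ui.items.push('Milk'), 100);

  const oneShot = ui.items.includes('Milk');               // checked once, too early
  await expectEventually(() => ui.items.includes('Milk')); // polls until it holds
  return { oneShot, afterRetry: ui.items.includes('Milk') };
}

demo().then((result) => console.log(result)); // { oneShot: false, afterRetry: true }
```

The one-shot check reads the state before the UI has caught up; the retrying check reaches the same state a moment later without any hand-written sleep in the test itself.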

⚠️ Negative-transfer trap: this is *not* React Testing Library or Jest

If you’ve used React Testing Library (RTL) with Jest, the API looks deceptively similar — getByRole, getByText, expect(...).toBeVisible(). The methods have the same names but different machinery underneath:

Comparison point	React Testing Library + Jest	Playwright
What runs the test	jsdom (a fake DOM in Node)	a real browser (Chromium, Firefox, or WebKit)
Render	React’s renderer alone	the full app + bundler + browser
`getByRole(...)`	synchronous; throws immediately if nothing matches	returns a lazy locator; the element is looked up, with retries, when an action or assertion runs
`expect(x).toBeVisible()`	synchronous Jest matcher	`await expect(locator).toBeVisible()` — async, auto-retries
A failing assertion	shows the rendered DOM	shows the failing accessibility tree + screenshot
Snapshot tests	common (`toMatchSnapshot`)	strongly discouraged for e2e: they are brittle and break on any markup change
Deep render assertions	“the component received prop X”	not even possible — Playwright sees only what the user sees

Three habits to retire before continuing:

Never write expect(await locator.isVisible()).toBe(true). That looks like Jest, but it runs once and races. Always await expect(locator).toBeVisible() — Playwright’s web-first form retries.
Don’t reach for snapshot matchers. toMatchSnapshot works in Playwright but is the wrong tool for e2e — every refactor breaks the snapshot, even when the user-visible behavior is unchanged. Use toHaveText, toHaveCount, toHaveURL — assertions that mirror what the user would notice.
Don’t probe component internals. “Was prop X passed?” “Is useState set to Y?” — those are unit-test concerns. Playwright sees what the browser renders. If a behavior isn’t observable through the UI, it’s not Playwright’s job to verify.

🎬 Predict — commit to a letter, then click reveal

Read the test above and pick one answer for each question. Commit (out loud, on paper, or in your head) before opening the reveal. Making the prediction is what primes the encoding; skimming straight to the reveal teaches you little.

Q1. If we changed name: /add todo/i to name: /save/i, what happens?

(a) The test still passes — getByRole matches buttons by role, not name.
(b) The test fails fast — Playwright throws “no such button” on the next line.
(c) The test fails on a 30-second timeout — the locator silently retries waiting for a “Save” button that never appears.
(d) Compile error — name: requires a string literal, not a regex.

Reveal — pick first, then click

(c). Locator actions auto-wait and retry (that’s the whole point of web-first locators). With no matching button, click() keeps retrying until a timeout stops it. By default no separate action timeout is set, so the 30-second test timeout is what ends the wait, and the result is a slow-failing test rather than a fast crash. (a) gets it backwards: when you pass name, it is a filter the element must match, not a hint. (b) is the React Testing Library mental model leaking in: RTL’s getByRole throws synchronously; Playwright’s doesn’t. (d) is wrong because regex is allowed (and idiomatic).
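For reference, the timeouts behind that answer are ordinary config settings. The sketch below shows the three relevant options with what are, to the best of our knowledge, the documented Playwright Test defaults; treat the values as assumptions to verify against your installed version.

```javascript
// Sketch of the timeout settings in a Playwright Test config.
// Values shown are the documented defaults (assumed; check your version).
const config = {
  timeout: 30_000,            // whole-test budget, in milliseconds
  expect: { timeout: 5_000 }, // retry budget for each `await expect(...)`
  use: { actionTimeout: 0 },  // per-action budget; 0 means no separate limit
};

// In playwright.config.js this object would be the file's default export.
console.log(config.timeout / 1000); // 30
```

With `actionTimeout` left at 0, a click on a locator that never matches waits until the test-level `timeout` ends the test.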

Q2. Which line is the Assert step?

(a) await page.goto('/')
(b) await page.getByRole('textbox', ...).fill('Milk')
(c) await page.getByRole('button', ...).click()
(d) await expect(page.getByRole('listitem')).toHaveText('Milk')

Reveal

(d). Only expect(...) calls are assertions — they check an outcome. goto, fill, click are commands that do things to the page. If you can’t point to which line is the assertion, the test isn’t proving what you think.

▶ Run

Click Test in the Live Preview toolbar. The test passes against the demo Todo app.

🔍 Investigate

Why is await on every line? The browser is asynchronous: clicking a button doesn’t instantly produce the result. await says “wait for this to finish before moving on.” Without await, the assertion would race past the click before React re-rendered, and the test would either fail or — worse — pass for the wrong reason.
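A small plain-JavaScript sketch shows the race. Here `makePage` and its `click` method are hypothetical stand-ins for a browser action that takes time; no Playwright is involved.

```javascript
// Toy model: `click` stands in for an asynchronous browser action.
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

function makePage() {
  const page = { items: [] };
  page.click = async () => {
    await sleep(50);           // the browser takes time to react
    page.items.push('Milk');
  };
  return page;
}

async function withoutAwait() {
  const page = makePage();
  page.click();                // Promise dropped: nothing waits for it
  return page.items.length;    // read before the click has finished
}

async function withAwait() {
  const page = makePage();
  await page.click();          // pause here until the click completes
  return page.items.length;
}

Promise.all([withoutAwait(), withAwait()]).then(([racy, safe]) => {
  console.log(racy, safe);     // 0 1
});
```

The version without await reads the list while it is still empty. In a real test the same mistake shows up as an assertion that runs against a page that has not changed yet.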

✏️ Modify — predict the failure shape, then run

Change the assertion to look for 'Bread' instead of 'Milk'. Before you click Test, commit to one of these:

(a) Locator-not-found timeout (no element matched).
(b) Text mismatch — the failure message names both the expected (Bread) and actual (Milk) text.
(c) Both — Playwright reports two failures.
(d) The test passes — toHaveText does a substring match.

Run, then check your prediction.

Reveal

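(b). The locator finds the listitem (it exists); the assertion retries the text comparison until the assertion timeout (5 seconds by default), then fails with a message that includes both expected and actual. (d) is wrong on two counts: with a string argument, toHaveText compares the element’s whole text rather than a substring (toContainText is the substring matcher), and “Bread” does not appear in “Milk” anyway. Building the habit of predicting the failure message shape is the difference between debugging by reading and debugging by guessing.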
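The difference between a whole-text match and a substring match is easy to blur, so here is a simplified model of how string arguments are compared. The helpers `hasText` and `containsText` are hypothetical; the real matchers also accept regexes, arrays, and options, and this sketch covers only the plain-string case.

```javascript
// Toy model of string comparison in toHaveText vs toContainText.
// Both normalize whitespace first; only the comparison differs.
const normalize = (text) => text.trim().replace(/\s+/g, ' ');

// Whole-text comparison, in the spirit of toHaveText('...').
const hasText = (actual, expected) => normalize(actual) === normalize(expected);

// Substring comparison, in the spirit of toContainText('...').
const containsText = (actual, expected) =>
  normalize(actual).includes(normalize(expected));

console.log(hasText('  Milk ', 'Milk'));       // true
console.log(hasText('Milk', 'Bread'));         // false
console.log(hasText('Oat Milk', 'Milk'));      // false
console.log(containsText('Oat Milk', 'Milk')); // true
```

Under this model a list item reading "Oat Milk" would fail a whole-text check for "Milk" but pass a substring check, which is why the choice of matcher is part of the oracle.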

📝 House rule (carry it forward)

A Playwright test reads navigate → interact → assert. The test title is the spec — what user-visible promise we’re proving — not a description of clicks.

Starter files

src/App.jsx

function App() {
  const [items, setItems] = React.useState([]);
  const [text, setText] = React.useState('');

  function addTodo() {
    const trimmed = text.trim();
    if (!trimmed) return;

    setItems([...items, trimmed]);
    setText('');
  }

  return (
    <main className="todo-shell">
      <section className="todo-panel">
        <p className="eyebrow">Playwright tutorial</p>
        <h1>Todo Lab</h1>

        <div className="todo-form">
          <label htmlFor="todo-input">Todo item</label>
          <div className="todo-row">
            <input
              id="todo-input"
              value={text}
              onChange={(event) => setText(event.target.value)}
              placeholder="Buy milk"
            />
            <button onClick={addTodo}>Add todo</button>
          </div>
        </div>

        <ul aria-label="Todo list" className="todo-list">
          {items.map((item, index) => (
            <li key={index}>{item}</li>
          ))}
        </ul>
      </section>
    </main>
  );
}

src/main.jsx

const root = ReactDOM.createRoot(document.getElementById('root'));
root.render(<App />);

src/styles.css

body {
  margin: 0;
  font-family: system-ui, -apple-system, BlinkMacSystemFont, "Segoe UI", sans-serif;
  background: #f6f7fb;
  color: #1f2937;
}

.todo-shell {
  min-height: 100vh;
  display: grid;
  place-items: center;
  padding: 32px;
}

.todo-panel {
  width: min(100%, 560px);
  background: white;
  border: 1px solid #d9dee8;
  border-radius: 8px;
  padding: 28px;
  box-shadow: 0 18px 40px rgba(31, 41, 55, 0.08);
}

.eyebrow {
  margin: 0 0 8px;
  color: #4b5563;
  font-size: 0.85rem;
  font-weight: 700;
  text-transform: uppercase;
  letter-spacing: 0.04em;
}

h1 { margin: 0 0 24px; font-size: 2rem; }
label { display: block; margin-bottom: 8px; font-weight: 700; }
.todo-row { display: flex; gap: 10px; }

input {
  flex: 1;
  min-width: 0;
  background: white;
  color: #1f2937;
  border: 1px solid #b8c0cc;
  border-radius: 6px;
  padding: 10px 12px;
  font: inherit;
}

button {
  border: 0;
  border-radius: 6px;
  padding: 10px 14px;
  background: #2563eb;
  color: white;
  font: inherit;
  font-weight: 700;
  cursor: pointer;
}

.todo-list { margin: 24px 0 0; padding-left: 24px; }
.todo-list:empty { display: none; }
.todo-list li { margin: 8px 0; }

/* Dark mode — the iframe inherits the host page's theme via
   [data-bs-theme="dark"] on <html>. Mirror the site's dark palette
   so the Todo app preview stays legible when students switch themes. */
[data-bs-theme="dark"] body { background: #1c2533; color: #e6edf3; }
[data-bs-theme="dark"] .todo-panel {
  background: #232a36;
  border-color: #2a323e;
  box-shadow: 0 18px 40px rgba(0, 0, 0, 0.4);
}
[data-bs-theme="dark"] .eyebrow { color: #9ca3af; }
[data-bs-theme="dark"] input {
  background: #2a323e;
  color: #e6edf3;
  border-color: #3a4351;
}
[data-bs-theme="dark"] input::placeholder { color: #6b7280; }
[data-bs-theme="dark"] button { background: #2563eb; }

tests/todo.spec.js

import { test, expect } from '@playwright/test';

test('user can add a todo', async ({ page }) => {
  await page.goto('/');
  await page.getByRole('textbox', { name: /todo item/i }).fill('Milk');
  await page.getByRole('button', { name: /add todo/i }).click();
  await expect(page.getByRole('listitem')).toHaveText('Milk');
});

Step 1 — Knowledge Check

Min. score: 80%

1. Which of these test titles best describes a behavioral spec (rather than a click-script)?

clicks add button and waits for list
This describes the clicks the test performs, not what the user can accomplish. A future developer reading a CI failure on this title can’t tell what user-facing promise broke.
user can add a todo and see it in the list
Right. “user can add a todo and see it in the list” reads like a product promise. A failure on this test immediately tells the reader what regressed: the user can no longer add a todo.
test_add_button_click
The test_ prefix is fine, but the rest is tied to UI mechanics (a button click) rather than user behavior. If the button becomes a link tomorrow, this title looks wrong even though the spec is unchanged.
test 1: form submission flow
Numbering tells the reader nothing. Imagine 30 of these — debugging a CI failure means opening each test to figure out what it does.

Test names should read like product promises, not click sequences. A good rule of thumb: if a future developer sees the test fail in CI, can they tell from the name alone what user-facing thing broke? If yes, the name is doing its job.

2. Why does this Playwright assertion need await?

await expect(page.getByText('Milk')).toBeVisible();

JavaScript requires await on every line in async functions
await is required for Promises, not for every line. The reason this line needs it is more specific: web-first assertions like toBeVisible() actively wait and retry until the condition is met.
Browser interactions are asynchronous; await expect(...) auto-waits and retries until the condition holds
Right. Playwright’s web-first assertions auto-wait and retry up to a timeout. Without await, you’d skip past before React’s state settles — a classic flaky-test recipe. The Playwright docs explicitly call out expect(await locator.isVisible()).toBe(true) as an anti-pattern: it doesn’t wait.
await makes the test go faster
await doesn’t speed anything up — if anything, it pauses execution. Its job is correctness under async, not performance.
Without await, the test won’t compile
A missing await here compiles fine — the matcher returns a Promise that’s silently ignored. The test would just behave incorrectly: silent flakiness rather than a build error.

await expect(locator).matcher() is the canonical Playwright shape. The matcher retries until it succeeds or hits the timeout. Without await, JavaScript fires the matcher and immediately moves on, ignoring whether it ever held.

3. In the test below, which line is the Assert step?

test('user can add a todo', async ({ page }) => {
  await page.goto('/');                                                  // Line 1
  await page.getByRole('textbox', { name: /todo item/i }).fill('Milk');  // Line 2
  await page.getByRole('button', { name: /add todo/i }).click();         // Line 3
  await expect(page.getByText('Milk')).toBeVisible();                    // Line 4
});

Line 1 — page.goto('/') confirms we landed on the right page
Line 1 is Navigate (the e2e equivalent of Arrange). It puts the page in the starting state but doesn’t verify anything.
Line 4 — await expect(...).toBeVisible() checks the user-visible outcome
Right. Line 4 is the only line whose job is to check an outcome. The others set up state (goto) or perform actions (fill, click). The assertion is what confirms the user’s promise was met.
Lines 2 and 3 together — they perform the action under test
Lines 2 and 3 are Interact (Act): the user types into the input and clicks the button. They produce the new state but don’t verify it.
All four lines are assertions in async code
Only expect(...) calls are assertions. goto, fill, and click are commands that act on the page — they never check whether their outcome matches a spec.

Playwright’s navigate / interact / assert is the same shape as Testing Foundations’ Arrange / Act / Assert. Each test should have one assertion phase that verifies the user-visible promise. If you can’t point to which line is the assertion, the test probably isn’t proving what you think.

2

The Spec Card: Choosing What User Paths Deserve a Test

Why this matters

The hardest part of e2e testing isn’t writing the test; it’s deciding which tests to write. Without a deliberate selection method, you end up testing whatever came to mind first, missing the partitions that actually catch bugs. The Spec Card is the artifact that forces the question “what about this feature is the stable contract?” before you commit code that pins the wrong thing.

🎯 You will learn to

Apply input-space partitioning from Testing Foundations to user-path partitioning in e2e
Create a Spec Card that names a feature’s stable contract before writing the test
Evaluate which user paths deserve an e2e test versus a lower test layer

🧠 Quick recall — commit before reading on

Q. Why does Playwright need await in front of expect(locator).toBeVisible()?

(a) JavaScript requires await on every line in async functions.
(b) Web-first assertions auto-wait and retry; without await, the test moves on before the assertion has settled and its result is ignored.
(c) await makes the test go faster.
(d) Without await, the test won’t compile.

Reveal

(b). The matcher returns a Promise that retries until the condition holds or the timeout expires. Drop the await and JavaScript moves on without waiting for that Promise, so a failure can go unnoticed: silent flakiness, the worst kind of failure.

From foundations partitions to user-path partitions

In Testing Foundations, you partitioned the input space of a function and picked one representative input per partition. In e2e, you partition the user-path space — the different user behaviors a feature has to support — and pick one representative test per partition.

Same discipline. Different domain.

📋 Introducing the Spec Card

Before you write an e2e test, write down the spec it’s verifying. Five fields, fits on screen:

Spec Card: User can add a todo

✓ Behavior:        User types a name, clicks Add, sees it in the list.
✓ Should pass when: CSS classes change. The Add button is restyled.
                    The input becomes a `<textarea>`. The `<ul>` becomes
                    an `<ol>`.
✗ Should fail when: Adding silently drops items. Empty inputs are
                    accepted. The input doesn't clear after add.
🎯 Locator contract: A textbox labeled "Todo item"; a button named
                    "Add todo"; a list of items.
✅ Oracle:          The new item is visible in the list.

The Spec Card is the artifact you carry through the rest of the tutorial. It forces the question “what about this UI is the stable contract?” before you write code that can pin the wrong thing.

Notice the “Should pass when” line: it lists implementation changes that should not break the test. That’s your defense against brittleness later.

✏️ Fill in your own Spec Card — pick one of two ways

Two equally good options. Pick whichever fits how you think:

In-editor template — Open notes/spec-card.md in the file tree on the left. It’s a fillable Markdown template (auto-saved alongside your code). Fill it in for the whitespace-only input test you’re about to write below.
Standalone tool — Open the Spec Card tool in a new tab. Same five fields, but as a structured form with auto-save, Export-as-Markdown, and Copy-to-clipboard. The tool persists across tutorials so you can build a portfolio of Spec Cards as you write tests at school and at work.

Either way, fill the card in before you touch the test code below. The whole point of the Spec Card is that the decisions get made upstream of typing.

🎬 Predict — which user-path partitions are missing?

Three tests are pre-written in tests/add-todo.spec.js. They cover:

Happy path — "Milk" is accepted.
Empty input — "" is rejected.
Very long input — a 200-character string is accepted.

Read addTodo in src/App.jsx: the app trims the input before deciding whether to accept it. Which partition is missing from the tests?

(In your head, before reading on…)

Reveal

The missing partition is whitespace-only input (for example " "). After trimming it equals "", so the spec says it should be rejected. The outcome matches the empty-string case, but the input only gets there through the trim step, so a regression that drops the trim would slip past all three existing tests.

▶ Run

Click Test. The three pre-written tests pass. The fourth is still an empty stub marked // TODO, so it passes vacuously until you fill it in next.

✏️ Modify — write the missing partition test

In tests/add-todo.spec.js, find the test titled 'whitespace-only input is rejected'. The Arrange / Act / Assert comments are placeholders; fill them in, following the pattern of the three tests above.

Hints will appear on test failure — work through them in layers if you get stuck.

🔍 Investigate

You now have four tests for one feature, each covering a different partition. Why not write a test for every possible input?

The Testing Foundations answer applies: representative coverage at low cost. We don’t need a separate test for " ", "   ", "\t", …; they are all in the same partition (whitespace-only), and the trimming logic processes them identically. One representative test per partition is enough.
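The partition idea can be checked without a browser. The sketch below pulls the accept/reject rule from addTodo in src/App.jsx into a hypothetical pure function, `classifyTodoInput`, purely for illustration; the starter app does not define it.

```javascript
// Same accept/reject rule as addTodo in src/App.jsx, written as a
// hypothetical pure function so the partitions are easy to see.
function classifyTodoInput(text) {
  if (text === '') return 'empty';
  if (text.trim() === '') return 'whitespace-only';
  return 'accepted';
}

// One representative per partition is enough...
const representatives = ['Milk', '', ' ', 'x'.repeat(200)];
console.log(representatives.map(classifyTodoInput));
// [ 'accepted', 'empty', 'whitespace-only', 'accepted' ]

// ...because other members of the same partition behave identically.
const moreWhitespace = ['  ', '\t', ' \n '];
console.log(
  moreWhitespace.every((s) => classifyTodoInput(s) === 'whitespace-only')
); // true
```

A check like this is also a reminder that the many-variants question is cheap to answer at the unit level, leaving the e2e suite with one test per partition.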

📝 House rules added

Use partitions to choose user paths. You don’t need a test for every string. You need one test per behaviorally-distinct partition.
Not every test belongs in e2e. Many edge cases live more cheaply in unit tests. Reserve e2e tests for behaviors that need full-stack browser confidence.

Starter files

src/App.jsx

function App() {
  const [items, setItems] = React.useState([]);
  const [text, setText] = React.useState('');

  function addTodo() {
    const trimmed = text.trim();
    if (!trimmed) return;

    setItems([...items, trimmed]);
    setText('');
  }

  return (
    <main className="todo-shell">
      <section className="todo-panel">
        <p className="eyebrow">Playwright tutorial</p>
        <h1>Todo Lab</h1>

        <div className="todo-form">
          <label htmlFor="todo-input">Todo item</label>
          <div className="todo-row">
            <input
              id="todo-input"
              value={text}
              onChange={(event) => setText(event.target.value)}
              placeholder="Buy milk"
            />
            <button onClick={addTodo}>Add todo</button>
          </div>
        </div>

        <ul aria-label="Todo list" className="todo-list">
          {items.map((item, index) => (
            <li key={index}>{item}</li>
          ))}
        </ul>
      </section>
    </main>
  );
}

src/main.jsx

const root = ReactDOM.createRoot(document.getElementById('root'));
root.render(<App />);

src/styles.css

body { margin: 0; font-family: system-ui, -apple-system, sans-serif; background: #f6f7fb; color: #1f2937; }
.todo-shell { min-height: 100vh; display: grid; place-items: center; padding: 32px; }
.todo-panel { width: min(100%, 560px); background: white; border: 1px solid #d9dee8; border-radius: 8px; padding: 28px; box-shadow: 0 18px 40px rgba(31, 41, 55, 0.08); }
.eyebrow { margin: 0 0 8px; color: #4b5563; font-size: 0.85rem; font-weight: 700; text-transform: uppercase; letter-spacing: 0.04em; }
h1 { margin: 0 0 24px; font-size: 2rem; }
label { display: block; margin-bottom: 8px; font-weight: 700; }
.todo-row { display: flex; gap: 10px; }
input { flex: 1; min-width: 0; background: white; color: #1f2937; border: 1px solid #b8c0cc; border-radius: 6px; padding: 10px 12px; font: inherit; }
button { border: 0; border-radius: 6px; padding: 10px 14px; background: #2563eb; color: white; font: inherit; font-weight: 700; cursor: pointer; }
.todo-list { margin: 24px 0 0; padding-left: 24px; }
.todo-list:empty { display: none; }
.todo-list li { margin: 8px 0; }
/* Dark mode (iframe sets [data-bs-theme="dark"] on <html>) */
[data-bs-theme="dark"] body { background: #1c2533; color: #e6edf3; }
[data-bs-theme="dark"] .todo-panel { background: #232a36; border-color: #2a323e; box-shadow: 0 18px 40px rgba(0, 0, 0, 0.4); }
[data-bs-theme="dark"] .eyebrow { color: #9ca3af; }
[data-bs-theme="dark"] input { background: #2a323e; color: #e6edf3; border-color: #3a4351; }
[data-bs-theme="dark"] input::placeholder { color: #6b7280; }
[data-bs-theme="dark"] button { background: #2563eb; }

tests/add-todo.spec.js

import { test, expect } from '@playwright/test';

test('user can add a todo (happy path)', async ({ page }) => {
  await page.goto('/');
  await page.getByRole('textbox', { name: /todo item/i }).fill('Milk');
  await page.getByRole('button', { name: /add todo/i }).click();
  await expect(page.getByRole('listitem')).toHaveText('Milk');
});

test('empty input is rejected', async ({ page }) => {
  await page.goto('/');
  await page.getByRole('button', { name: /add todo/i }).click();
  await expect(page.getByRole('listitem')).toHaveCount(0);
});

test('very long todo is accepted', async ({ page }) => {
  await page.goto('/');
  const long = 'x'.repeat(200);
  await page.getByRole('textbox', { name: /todo item/i }).fill(long);
  await page.getByRole('button', { name: /add todo/i }).click();
  await expect(page.getByRole('listitem')).toHaveText(long);
});

// TODO: write the missing partition test here.
// The spec trims input before deciding whether to accept it,
// so whitespace-only input must be rejected just like empty input.
test('whitespace-only input is rejected', async ({ page }) => {
  // Arrange: navigate to the page.
  // Act: fill the input with whitespace, click Add todo.
  // Assert: no list item was added.
});

notes/spec-card.md

# Spec Card: User can add a todo (whitespace-only rejected)

Fill this in BEFORE writing the test. The decisions made here
determine which assertions and locators you'll commit to below.

## ✓ Behavior
<!-- One sentence: what user-visible behavior are you proving? -->


## ✓ Should pass when
<!-- Implementation changes the test must SURVIVE.
     Examples: CSS class renames, button restyles, layout shifts. -->


## ✗ Should fail when
<!-- Regressions the test must CATCH.
     Examples: whitespace input is accepted, the input doesn't
     clear after submit, the list silently drops items. -->


## 🎯 Locator contract
<!-- Which semantic queries identify each element?
     Prefer role + accessible name, label, or semantic test ID.
     Avoid CSS classes and DOM positions. -->


## ✅ Oracle
<!-- Observable outcome that confirms success.
     What would the user see? -->


---
Prefer a structured form? Open the standalone Spec Card tool at
/SEBook/tools/spec-card (auto-saves, exports as Markdown).

Step 2 — Knowledge Check

Min. score: 80%

1. Which of these scenarios is the BEST candidate for an end-to-end test (rather than a unit or integration test)?

Validating that 47 different email-format edge cases all produce the right error message
47 email validation cases is exactly what unit tests are for. Each is cheap and isolated. Running 47 full-browser e2e tests would be slow, flaky, and overkill — a single e2e test (“invalid email shows an error”) proves the wiring works; the 47 edge cases belong in unit tests.
Verifying a guest who tries to checkout is prompted to sign in, and their cart is preserved
Right. This needs the full stack — UI, routing, session, cart persistence, sign-in flow. No lower test layer covers all of those at once. This is exactly what e2e tests are best at.
Checking that the cart-total formatter rounds half-up correctly for 30 currency formats
30 formatter cases are a unit-test job. They’re deterministic and fast in isolation. E2E them and you’d burn minutes per CI run for coverage that pytest gets in milliseconds.
Confirming the API endpoint returns the right HTTP status for 12 different input shapes
API contract tests are an integration-layer concern, not e2e. They don’t need a browser — they need a request library and the API. Doing this through e2e adds cost without adding confidence.

E2E tests are expensive confidence. Spend that budget on flows where the full integration matters: auth, routing, state-across-pages, cross-service behaviors. Push validation rules, formatters, and API contracts to lower test layers where they’re cheaper and clearer.

2. What is the purpose of the “Should pass when” field on a Spec Card?

It lists the test cases the test should cover
Test cases (partitions) belong in the test code itself, not on the Spec Card. The Spec Card is meta — it describes what the test is trying to prove and what should/shouldn’t break it.
It documents UI/code changes that the test should survive — your defense against brittle tests
Right. “Should pass when” is the list of harmless implementation changes — CSS class renames, layout shifts, button restyles. If your test breaks under any of those, it’s coupled to implementation rather than behavior. Writing this list before the test is your best defense against brittleness.
It records the date the test was written
The Spec Card is about specification, not metadata. Dates and authorship belong in version control.
It tracks who the assigned reviewer is
Reviewer assignments aren’t part of the Spec Card. The card is about what the test verifies, not who reviewed it.

The Spec Card’s “Should pass when” line forces you to think about the test’s durability before you write it. If you can predict that a CSS class rename should be harmless but you choose a CSS-class locator anyway, you’ve already lost.

3. (Spaced review — Step 1) A Playwright test contains the line:

expect(await page.getByText('Saved').isVisible()).toBe(true);

Which is the most accurate critique?

This is the canonical Playwright pattern — isVisible() returns a Promise that we resolve with await
The Playwright docs explicitly call this an anti-pattern. isVisible() is a one-shot check — it returns immediately, with no retry. The web-first form await expect(locator).toBeVisible() retries until the timeout.
This is an anti-pattern — isVisible() doesn’t auto-wait. Use await expect(page.getByText('Saved')).toBeVisible() instead
Right. isVisible() is non-retrying: if the element isn’t there right now, the assertion fails. The web-first form await expect(...).toBeVisible() retries until the condition holds or the timeout expires. Playwright’s official best practices single out this pattern as one to avoid.
The test is fine as long as the page loads quickly enough
“Loads quickly enough” is the recipe for flaky CI: it works locally, fails on a slow build agent, and nobody can reproduce it. Use await expect(...) and let Playwright handle the timing.
The expect should be wrapped in await for compilation reasons
The compilation works either way. The issue isn’t compilation — it’s correctness under async. The non-retrying form silently produces flaky tests, which is the worst kind of failure.

expect(await locator.isVisible()).toBe(true) is the canonical Playwright anti-pattern. Always use await expect(locator).toBeVisible() — the web-first form auto-waits and retries.

3

The Locator Ladder: Stable Contracts vs Incidental UI

Why this matters

The locator you choose is the contract between your test and the UI — it decides which UI changes will (correctly) break the test and which will (incorrectly) break it. Pick the wrong rung of the ladder and your test either fails on every CSS rename (false alarms that erode trust) or stays green when accessibility regresses (silent failures). The locator ladder is how you make that choice deliberately, not by accident.

🎯 You will learn to

Analyze five locator strategies and identify what each one depends on (semantics vs implementation)
Apply the locator ladder to choose the highest rung the UI actually supports
Evaluate locator durability against three classes of refactor (CSS rename, text change, DOM restructure)

🧠 Quick recall — commit before reading on

Q. From your Spec Card in Step 2, what does the “Locator contract” field name?

(a) The exact CSS selectors the test should use.
(b) The semantic queries (role + accessible name, label, test ID) that identify each element the test interacts with — the stable part of the UI surface.
(c) The list of test cases the test should cover.
(d) The CI pipeline that runs the test.

Reveal

(b). “Locator contract” names what about each element is stable — the role and accessible name, the label association, or the semantic test ID. CSS selectors (a) are the brittle rung. Test cases (c) belong in the test code, not the Spec Card.

🎯 The locator ladder

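There are five common ways to find a UI element in Playwright. Four of the examples below target the same "Add todo" button; getByLabel is shown against the text input, because labels attach to form controls. Each rung depends on something different about the UI.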

// Five ways to find the same "Add todo" button:

// Rung 1 — Role + accessible name. Mirrors how assistive tech finds it.
page.getByRole('button', { name: /add todo/i });

// Rung 2 — Label association (best for form controls).
page.getByLabel(/todo item/i);   // (this would find the input, not the button)

// Rung 3 — Visible text content.
page.getByText('Add todo');

// Rung 4 — Author-supplied stable test ID.
page.getByTestId('add-todo');

// Rung 5 — Raw CSS/DOM selector (last resort).
page.locator('.add-todo-btn');

What each rung depends on:

Rung	Locator	Depends on
1	`getByRole` + `name:`	The button has an accessible name (HTML semantics)
2	`getByLabel`	A `<label for="…">` connection (forms)
3	`getByText`	The visible wording (a string matches as a case-insensitive substring unless `exact: true`)
4	`getByTestId`	An author-added `data-testid` attribute
5	`.locator('.css-class')`	The DOM/CSS structure (implementation detail)

Higher rungs depend on accessible / user-visible facts. Lower rungs depend on implementation decisions (CSS classes, DOM positions). The official Playwright docs put it bluntly: “Your DOM can easily change … Prefer user-facing attributes to XPath or CSS selectors.”

🎬 Predict — commit to a letter, then click reveal

The team is about to ship three independent changes to the Add button: a CSS-class rename (.add-todo-btn → .primary-btn), a button-text change ("Add todo" → "Add"), and a DOM restructure (the button moves into a different parent element). The user-visible behavior — clicking it adds a todo — doesn’t change.

Q. Of the five locators above, which two would survive all three changes without a single edit?

(a) Rungs 1 and 4 — getByRole('button', { name: /add/i }) and getByTestId('add-todo').
(b) Rungs 1 and 3 — both query user-visible text in some form.
(c) Rungs 2 and 5 — both target form-control specifics.
(d) None — every locator breaks on at least one change.

Reveal — pick first, then click

(a). getByRole('button', { name: /add/i }) survives all three: the loosened regex covers the text change (“Add” still matches /add/i), and the role-based query is independent of CSS classes and DOM ancestry. Note that the ladder listing above uses the tighter /add todo/i, which would not match the new “Add” label. getByTestId('add-todo') survives because the data-testid is author-controlled and travels with the element wherever it moves. The remaining rungs either break on at least one change or do not target the button at all. The Investigate table below shows the per-cell breakdown if you want it, but the lesson lands in those two rows.
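The regex tolerance in that answer can be checked with plain JavaScript. The sketch below only exercises `RegExp.prototype.test` against candidate button names; the names "Add user" and "Address book" are made-up examples, included to show what a very loose pattern costs.

```javascript
// Which name patterns still match after "Add todo" is shortened to "Add"?
const looseName = /add/i;       // tolerant: matches any name containing "add"
const tightName = /add todo/i;  // pins the full current wording

const before = 'Add todo';
const after = 'Add';

const results = {
  looseBefore: looseName.test(before),
  looseAfter: looseName.test(after),
  tightBefore: tightName.test(before),
  tightAfter: tightName.test(after),
};

// The price of tolerance: a loose pattern also matches unrelated names
// (hypothetical examples), which could make the locator ambiguous.
const alsoMatched = ['Add user', 'Address book', 'Save'].filter((name) => looseName.test(name));

console.log(results, alsoMatched);
```

Only the tight pattern fails on the new label. The loose one keeps matching, but it would also match other buttons whose names contain "add", and a locator that resolves to more than one element is rejected by Playwright's strictness check when you act on it. A pattern anchored to the start of the name, such as /^add\b/i, is one possible middle ground.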

▶ Run

Click Test. All five locators currently work against the Todo app — the file tests/locator-ladder.spec.js has one test per rung, all passing.

🔍 Investigate — reveal the answer table

                            CSS rename    Text change    DOM restructure
----------------------------------------------------------------------
getByRole({name:/add/i})    ✓              ✓ (a)         ✓
getByLabel                  ✓              ✓ (b)         ✓
getByText('Add todo')       ✓              ✗              ✓
getByTestId('add-todo')     ✓              ✓              ✓
.locator('.add-todo-btn')   ✗              ✓              ✓ (c)

Notes:

(a) With the regex /add/i, the role locator survives “Add todo” → “Add” because the regex still matches. With the tighter /add todo/i, or an exact name: 'Add todo', it would break. Regex tolerance is a deliberate design choice.
(b) getByLabel finds inputs via their <label>. It targets the text input, not the button, so none of the three button changes touch it. Listed for completeness.
(c) The bare class selector .add-todo-btn does not encode ancestry, so moving the button under a new parent leaves it matching. Selectors that spell out structure (for example .todo-row > button) do break on a restructure, and CSS locators tend to accumulate that kind of ancestry over time. Still brittle.

The pattern: getByTestId is the only rung that survives any button-text change, because it never reads the text. The role locator survives this particular change only because /add/i happens to match the new wording. But getByTestId requires the author to have added the test ID, which is a code-level decision. And test IDs done badly (<button data-testid="blue-btn-right-col">) are just CSS coupling under another name.
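The durability reasoning can be replayed as a toy model in plain JavaScript. Nothing here runs a browser: each "locator" is a predicate over a plain object describing the button, and the object shape, the key names, and the `cssWithParent` variant are assumptions made for illustration.

```javascript
// Toy model only. The object shape and key names are hypothetical.
const original = {
  role: 'button',
  name: 'Add todo',
  className: 'add-todo-btn',
  testId: 'add-todo',
  parent: 'todo-row',
};

const locators = {
  role: (el) => el.role === 'button' && /add/i.test(el.name),        // getByRole, name: /add/i
  text: (el) => el.name.toLowerCase().includes('add todo'),          // getByText('Add todo')
  testId: (el) => el.testId === 'add-todo',                          // getByTestId('add-todo')
  cssBare: (el) => el.className === 'add-todo-btn',                  // .add-todo-btn
  cssWithParent: (el) => el.parent === 'todo-row' && el.className === 'add-todo-btn', // .todo-row > .add-todo-btn
};

const refactors = {
  cssRename: (el) => ({ ...el, className: 'primary-btn' }),
  textChange: (el) => ({ ...el, name: 'Add' }),
  domRestructure: (el) => ({ ...el, parent: 'toolbar' }),
};

// For every locator, apply each refactor to a fresh copy and see if it still matches.
const matrix = {};
for (const [label, matches] of Object.entries(locators)) {
  matrix[label] = {};
  for (const [change, apply] of Object.entries(refactors)) {
    matrix[label][change] = matches(apply(original));
  }
}
console.table(matrix);
```

In this model only the role and test-ID predicates stay true across all three refactors. A bare class selector carries no ancestry, so the restructure affects only the CSS variant that spells out its parent.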

✏️ Modify

Open tests/locator-ladder.spec.js. The fifth test uses the brittle .locator('.add-todo-btn') form. Rewrite it as a role-based locator (Rung 1). Run again — your refactored test should still pass.

📝 House rule

Pick the locator that matches the stable contract of this UI element. If the button label is part of the user-visible promise, use getByRole with a sensible regex. If the wording will change but the action is permanent, use getByTestId with a semantically named test ID. Use raw CSS only when nothing else will do — and write a comment explaining why.

Starter files

src/App.jsx

function App() {
  const [items, setItems] = React.useState([]);
  const [text, setText] = React.useState('');

  function addTodo() {
    const trimmed = text.trim();
    if (!trimmed) return;

    setItems([...items, trimmed]);
    setText('');
  }

  return (
    <main className="todo-shell">
      <section className="todo-panel">
        <p className="eyebrow">Playwright tutorial</p>
        <h1>Todo Lab</h1>

        <div className="todo-form">
          <label htmlFor="todo-input">Todo item</label>
          <div className="todo-row">
            <input
              id="todo-input"
              value={text}
              onChange={(event) => setText(event.target.value)}
              placeholder="Buy milk"
            />
            <button
              className="add-todo-btn"
              data-testid="add-todo"
              onClick={addTodo}
            >
              Add todo
            </button>
          </div>
        </div>

        <ul aria-label="Todo list" className="todo-list">
          {items.map((item, index) => (
            <li key={index}>{item}</li>
          ))}
        </ul>
      </section>
    </main>
  );
}

src/main.jsx

const root = ReactDOM.createRoot(document.getElementById('root'));
root.render(<App />);

src/styles.css

body { margin: 0; font-family: system-ui, -apple-system, sans-serif; background: #f6f7fb; color: #1f2937; }
.todo-shell { min-height: 100vh; display: grid; place-items: center; padding: 32px; }
.todo-panel { width: min(100%, 560px); background: white; border: 1px solid #d9dee8; border-radius: 8px; padding: 28px; box-shadow: 0 18px 40px rgba(31, 41, 55, 0.08); }
.eyebrow { margin: 0 0 8px; color: #4b5563; font-size: 0.85rem; font-weight: 700; text-transform: uppercase; letter-spacing: 0.04em; }
h1 { margin: 0 0 24px; font-size: 2rem; }
label { display: block; margin-bottom: 8px; font-weight: 700; }
.todo-row { display: flex; gap: 10px; }
input { flex: 1; min-width: 0; background: white; color: #1f2937; border: 1px solid #b8c0cc; border-radius: 6px; padding: 10px 12px; font: inherit; }
.add-todo-btn,
button { border: 0; border-radius: 6px; padding: 10px 14px; background: #2563eb; color: white; font: inherit; font-weight: 700; cursor: pointer; }
.todo-list { margin: 24px 0 0; padding-left: 24px; }
.todo-list:empty { display: none; }
.todo-list li { margin: 8px 0; }
/* Dark mode */
[data-bs-theme="dark"] body { background: #1c2533; color: #e6edf3; }
[data-bs-theme="dark"] .todo-panel { background: #232a36; border-color: #2a323e; box-shadow: 0 18px 40px rgba(0, 0, 0, 0.4); }
[data-bs-theme="dark"] .eyebrow { color: #9ca3af; }
[data-bs-theme="dark"] input { background: #2a323e; color: #e6edf3; border-color: #3a4351; }
[data-bs-theme="dark"] input::placeholder { color: #6b7280; }
[data-bs-theme="dark"] .add-todo-btn,
[data-bs-theme="dark"] button { background: #2563eb; }

tests/locator-ladder.spec.js

import { test, expect } from '@playwright/test';

// Rung 1 — Role + accessible name (regex-tolerant).
test('rung 1: getByRole finds the Add todo button', async ({ page }) => {
  await page.goto('/');
  await page.getByRole('textbox', { name: /todo item/i }).fill('Milk');
  await page.getByRole('button', { name: /add todo/i }).click();
  await expect(page.getByRole('listitem')).toHaveText('Milk');
});

// Rung 2 — getByLabel (best for inputs, but works through the form).
test('rung 2: getByLabel finds the input via its label', async ({ page }) => {
  await page.goto('/');
  await page.getByLabel(/todo item/i).fill('Bread');
  await page.getByRole('button', { name: /add todo/i }).click();
  await expect(page.getByRole('listitem')).toHaveText('Bread');
});

// Rung 3 — getByText (couples to exact wording).
test('rung 3: getByText finds the button by visible text', async ({ page }) => {
  await page.goto('/');
  await page.getByRole('textbox', { name: /todo item/i }).fill('Eggs');
  await page.getByText('Add todo').click();
  await expect(page.getByRole('listitem')).toHaveText('Eggs');
});

// Rung 4 — getByTestId (semantic test ID).
test('rung 4: getByTestId finds the button via data-testid', async ({ page }) => {
  await page.goto('/');
  await page.getByRole('textbox', { name: /todo item/i }).fill('Cheese');
  await page.getByTestId('add-todo').click();
  await expect(page.getByRole('listitem')).toHaveText('Cheese');
});

// Rung 5 — Raw CSS class (the brittle rung — REWRITE this one!).
// TODO: rewrite this test to use page.getByRole instead of CSS.
test('rung 5: brittle CSS locator (rewrite me)', async ({ page }) => {
  await page.goto('/');
  await page.getByRole('textbox', { name: /todo item/i }).fill('Butter');
  await page.locator('.add-todo-btn').click();
  await expect(page.getByRole('listitem')).toHaveText('Butter');
});

Solution

src/App.jsx

function App() {
  const [items, setItems] = React.useState([]);
  const [text, setText] = React.useState('');

  function addTodo() {
    const trimmed = text.trim();
    if (!trimmed) return;

    setItems([...items, trimmed]);
    setText('');
  }

  return (
    <main className="todo-shell">
      <section className="todo-panel">
        <p className="eyebrow">Playwright tutorial</p>
        <h1>Todo Lab</h1>

        <div className="todo-form">
          <label htmlFor="todo-input">Todo item</label>
          <div className="todo-row">
            <input
              id="todo-input"
              value={text}
              onChange={(event) => setText(event.target.value)}
              placeholder="Buy milk"
            />
            <button
              className="add-todo-btn"
              data-testid="add-todo"
              onClick={addTodo}
            >
              Add todo
            </button>
          </div>
        </div>

        <ul aria-label="Todo list" className="todo-list">
          {items.map((item, index) => (
            <li key={index}>{item}</li>
          ))}
        </ul>
      </section>
    </main>
  );
}

tests/locator-ladder.spec.js

import { test, expect } from '@playwright/test';

test('rung 1: getByRole finds the Add todo button', async ({ page }) => {
  await page.goto('/');
  await page.getByRole('textbox', { name: /todo item/i }).fill('Milk');
  await page.getByRole('button', { name: /add todo/i }).click();
  await expect(page.getByRole('listitem')).toHaveText('Milk');
});

test('rung 2: getByLabel finds the input via its label', async ({ page }) => {
  await page.goto('/');
  await page.getByLabel(/todo item/i).fill('Bread');
  await page.getByRole('button', { name: /add todo/i }).click();
  await expect(page.getByRole('listitem')).toHaveText('Bread');
});

test('rung 3: getByText finds the button by visible text', async ({ page }) => {
  await page.goto('/');
  await page.getByRole('textbox', { name: /todo item/i }).fill('Eggs');
  await page.getByText('Add todo').click();
  await expect(page.getByRole('listitem')).toHaveText('Eggs');
});

test('rung 4: getByTestId finds the button via data-testid', async ({ page }) => {
  await page.goto('/');
  await page.getByRole('textbox', { name: /todo item/i }).fill('Cheese');
  await page.getByTestId('add-todo').click();
  await expect(page.getByRole('listitem')).toHaveText('Cheese');
});

test('rung 5: brittle CSS locator (rewrite me)', async ({ page }) => {
  await page.goto('/');
  await page.getByRole('textbox', { name: /todo item/i }).fill('Butter');
  await page.getByRole('button', { name: /add todo/i }).click();
  await expect(page.getByRole('listitem')).toHaveText('Butter');
});

Rung 5 was rewritten to use the role + accessible-name locator (Rung 1). Same behavior verified, but the test no longer depends on the CSS class .add-todo-btn. Step 5 will demonstrate why this matters when the team renames CSS classes.

Step 3 — Knowledge Check

Min. score: 80%

1. Which of these is the BEST locator for “the user’s primary save button” — assuming the button has the visible text “Save” today, but the team has announced it will be renamed to “Submit” next quarter?

page.getByRole('button', { name: /save/i })
getByRole with name: /save/i is great today, but next quarter when the button becomes “Submit”, every test using this locator breaks for a wording change — that’s a false alarm, not a regression. (You could use name: /save|submit/i to bridge, but that’s a maintenance smell — the locator should reflect what’s stable.)
page.getByText('Save')
getByText('Save') ties the test to the exact visible text. The planned rename to “Submit” will break every test that uses it. The test would correctly fail if save broke — but also fail for a harmless rewording.
page.locator('.btn-primary')
CSS class locators are the most brittle option on the ladder. They depend on styling decisions, not user-visible facts. A designer changing .btn-primary to .btn-action breaks the test for no good reason.
page.getByTestId('save-action')
Right. When the team has announced that wording will change but the action is stable, data-testid is the right tool: the contract becomes “this is the save action” rather than “the button labeled Save.” But the test ID has to be semantically named — data-testid="save-action", not data-testid="blue-btn".

The locator ladder isn’t “always pick option 1.” It’s “pick the rung that matches the stable contract for this UI element.” When wording is stable, getByRole is best. When wording will change but the action is permanent, getByTestId is right. The choice depends on what about this UI is the promise.

2. Two versions of data-testid for the same Add Todo button: which is BETTER, and why?

Version A: <button data-testid="primary-blue-btn-right-column">
Version B: <button data-testid="add-todo-action">

Version A — it’s more descriptive
Descriptive about what? Version A describes color (blue), styling (primary), and layout position (right-column). When the designer changes the color or moves the button, the test ID is wrong even though the behavior is unchanged.
Version B — it names the action, not the styling/layout. A is just CSS coupling under another name.
Right. The data-testid is supposed to be a stable contract. Naming it after styling (primary-blue-btn) or layout (right-column) means the contract drifts every time the design changes. Naming it after the action (add-todo-action) keeps the contract semantic — the test ID changes only when the behavior changes, which is exactly what tests should track.
They’re equivalent — both are test IDs
Both are syntactically test IDs, but they’re not behaviorally equivalent. The whole point of data-testid is to be a stable contract; A pegs the contract to styling, B pegs it to behavior. Different contracts = different durability.
Version A — Playwright recommends descriptive IDs
Playwright’s docs describe test IDs as resilient because they keep working when an element’s text or role changes. “Descriptive” without the right anchor (action vs styling) is worse than no test ID at all: it gives a false sense of stability.

Test IDs are only as durable as their naming. A test ID named after styling or layout is functionally equivalent to a CSS-class locator — it pins implementation. A test ID named after the action or the semantic role (save-action, cart-checkout-button) is what the docs intend: a stable contract that the test can rely on indefinitely.
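One way to make that naming rule reviewable is a small check that flags styling or layout words inside a test ID. The helper below is a hypothetical sketch: `stylingWordsIn` and its word list are assumptions for illustration, not part of Playwright or of this tutorial's project.

```javascript
// Hypothetical review helper. The word list is an illustrative assumption.
const STYLING_WORDS = ['blue', 'red', 'primary', 'left', 'right', 'column', 'col', 'large', 'small'];

// Split the ID on "-" or "_" and return the parts that describe look or layout.
function stylingWordsIn(testId) {
  const parts = testId.toLowerCase().split(/[-_]/);
  return parts.filter((part) => STYLING_WORDS.includes(part));
}

const versionA = stylingWordsIn('primary-blue-btn-right-column');
const versionB = stylingWordsIn('add-todo-action');
console.log({ versionA, versionB });
```

Version A trips the check on four of its five parts, while Version B trips none. A real project would need its own word list, and a human still has to judge whether the remaining words name the action.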

3. (Spaced review — Step 2) Your team is debating: should “rejecting whitespace-only input” have its own e2e test, or can it be tested in the same test as “rejecting empty input”?

They should always be in separate tests for clarity
Separate tests aren’t always needed. If two scenarios are in the same behavioral partition (i.e., the code processes them identically), one test covers both. Adding a redundant test costs maintenance time without adding confidence.
It depends on how addTodo validates — if both go through the same code path (trim then check empty), they’re in the same partition and one representative test is enough
Right. The Spec Card and partition discipline tell us: if addTodo calls .trim() before checking emptiness, then "" and " " end up in the same partition — both produce "" after trimming. One representative test per partition is the rule from foundations.
Whitespace cases are too edge-case for e2e and should be skipped
Skipping the case isn’t the answer. Whitespace input is a real partition (real users hit it), and a test should cover it. The question is where — same test, same file, or its own — and the partition rule says: same partition, one test.
Always merge edge cases into the happy-path test to save time
Cramming multiple partitions into one test makes failures harder to diagnose (which scenario caused the failure?) and tends to mask issues. One test per behavioral partition keeps failures targeted.

Partitions are the unit of test design, not individual inputs. Two inputs are in the same partition if the system processes them the same way. One representative per partition is sufficient — adding more is wasted effort, removing one is missed coverage.
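The partition argument can be checked directly against the guard at the top of addTodo (trim, then reject empty). The sketch below copies only that guard into a standalone function; `isAccepted` and the sample inputs are hypothetical names and values chosen for illustration.

```javascript
// Mirrors the guard in addTodo from src/App.jsx: trim, then reject empty.
function isAccepted(text) {
  return text.trim() !== '';
}

// Group sample inputs by outcome. Inputs that reach the same outcome through
// the same code path belong to one partition.
const samples = ['', '   ', '\t\n', 'Milk', '  Milk  '];
const partitions = { rejected: [], accepted: [] };
for (const input of samples) {
  const bucket = isAccepted(input) ? 'accepted' : 'rejected';
  partitions[bucket].push(JSON.stringify(input));
}
console.log(partitions);
```

The empty string and the whitespace-only strings all land in the rejected bucket because trim() turns each of them into "" before the check. One representative from that bucket is enough for the e2e suite.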

4

Strong Assertions: The Liar Test in the Browser

Why this matters

A green test you can’t trust is worse than no test at all — it gives false confidence while the bug ships. Liar tests are the most dangerous failure mode in an e2e suite because the test visibly clicks buttons, which makes it feel like real verification. This step makes that lie tactile: you’ll watch a buggy app pass a weak assertion, then strengthen it until it tells the truth.

🎯 You will learn to

Analyze a passing Playwright test and recognize when its oracle is too weak to catch the spec violation
Apply web-first assertions (await expect(...)) instead of the non-retrying expect(await locator.isVisible()).toBe(true) anti-pattern
Evaluate three weak assertion patterns and rewrite them to verify the user-visible promise

🧠 Quick recall — commit before reading on

Q. From Testing Foundations: a liar test has a PASS result that doesn’t prove the spec. What’s the defining feature?

(a) The test runs slowly and times out before completing.
(b) The test’s oracle is too weak — the assertion is true for both a correct implementation and a buggy one.
(c) The test only runs on some platforms.
(d) The test asserts on the wrong element entirely.

Reveal

(b). A liar test passes against a correct implementation and against a broken one — the assertion can’t distinguish them. The same pattern exists in e2e, and it’s sneakier here because the test visibly clicks buttons, which makes it feel “more real” than it is.

🎬 Predict — commit to a letter, then click reveal

Read this test. The Todo app you’ll run it against has a bug somewhere in addTodo — predict-and-investigate, don’t peek at the source first.

test('adding a todo shows it in the list', async ({ page }) => {
  await page.goto('/');
  await page.getByRole('textbox', { name: /todo item/i }).fill('Milk');
  await page.getByRole('button', { name: /add todo/i }).click();
  await expect(page.getByRole('listitem')).toHaveCount(1);
});

Q. Against a buggy app where addTodo somehow drops the user’s text, what does this test do?

(a) Fail — Playwright detects the empty list item and raises.
(b) Pass — toHaveCount(1) only counts list items; it never reads their text.
(c) Error — toHaveCount requires non-empty content.
(d) Flaky — sometimes passes, sometimes fails depending on render order.

Reveal — pick first, then click

(b). The assertion only counts. It says nothing about what’s inside the items. The test will be a liar: green check, broken feature.

▶ Run

Click Test.

The test passes. Surprise.

🔍 Investigate — open `src/App.jsx` and find the bug

Now (and only now) open src/App.jsx. The bug: addTodo stores '' instead of trimmed. The user’s text is dropped in the state update itself (setItems([...items, ''])), so every <li> renders empty.

What did toHaveCount(1) actually verify? Just that one list item exists. It said nothing about what’s inside the item. The bug — empty text — is invisible to this assertion.

The assertion is a liar: PASS result, broken feature.
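The same lie can be reproduced without a browser. In the plain-JavaScript sketch below, the list is modeled as an array of strings and each assertion as a predicate. The function names are hypothetical; only the '' bug mirrors src/App.jsx.

```javascript
// Plain-JS model of the lesson: two versions of addTodo, two oracles.
const correctAdd = (items, text) => [...items, text.trim()];
const buggyAdd = (items, text) => [...items, ''];  // drops the user's text

// Weak oracle: only counts items, like toHaveCount(1).
const weakOracle = (items) => items.length === 1;
// Strong oracle: also checks the content, like toHaveText('Milk').
const strongOracle = (items) => items.length === 1 && items[0] === 'Milk';

const verdicts = {
  weak: {
    correct: weakOracle(correctAdd([], 'Milk')),
    buggy: weakOracle(buggyAdd([], 'Milk')),
  },
  strong: {
    correct: strongOracle(correctAdd([], 'Milk')),
    buggy: strongOracle(buggyAdd([], 'Milk')),
  },
};
console.log(verdicts);
```

The weak oracle returns true for both implementations, so it cannot tell them apart. The strong oracle returns true only for the correct one, which is exactly the property that makes a passing result meaningful.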

Three weak assertion patterns to recognize

Weak assertion	Why it lies
`await expect(page.getByRole('list')).toBeVisible()`	An empty `<ul>` is still “visible”
`await expect(page.getByText('')).toBeVisible()`	Always true
`await expect(page.getByRole('listitem')).toHaveCount(1)`	Doesn’t verify item content

And one Playwright-specific anti-pattern from the official docs:

// ❌ Anti-pattern — non-retrying, no auto-wait:
expect(await page.getByText('Milk').isVisible()).toBe(true);

// ✓ Web-first form — auto-waits and retries:
await expect(page.getByText('Milk')).toBeVisible();

✏️ Modify

In tests/todo.spec.js, strengthen the assertion to verify the item’s text, not just the count. Predict the new failure message before re-running.

Hints will appear on test failure — work through them in layers if you get stuck.

📝 House rule

Assert the promise, not the plumbing.

The promise is what the spec said the user would see. The plumbing is which DOM nodes exist, what CSS class they have, what their internal state is. A strong assertion verifies the promise; a weak assertion verifies the plumbing without verifying what the user actually gets.

Starter files

src/App.jsx

// 🐛 BUGGY APP — there's a bug somewhere in addTodo that makes the
// weak assertion lie. Predict + run the test BEFORE you hunt for it
// in the source. The Investigate phase reveals where the bug lives
// (and why the count assertion missed it).
function App() {
  const [items, setItems] = React.useState([]);
  const [text, setText] = React.useState('');

  function addTodo() {
    const trimmed = text.trim();
    if (!trimmed) return;
    setItems([...items, '']);
    setText('');
  }

  return (
    <main className="todo-shell">
      <section className="todo-panel">
        <p className="eyebrow">Buggy Todo Lab</p>
        <h1>Todo Lab</h1>

        <div className="todo-form">
          <label htmlFor="todo-input">Todo item</label>
          <div className="todo-row">
            <input
              id="todo-input"
              value={text}
              onChange={(event) => setText(event.target.value)}
              placeholder="Buy milk"
            />
            <button onClick={addTodo}>Add todo</button>
          </div>
        </div>

        <ul aria-label="Todo list" className="todo-list">
          {items.map((item, index) => (
            <li key={index}>{item}</li>
          ))}
        </ul>
      </section>
    </main>
  );
}

src/main.jsx

const root = ReactDOM.createRoot(document.getElementById('root'));
root.render(<App />);

src/styles.css

body { margin: 0; font-family: system-ui, -apple-system, sans-serif; background: #f6f7fb; color: #1f2937; }
.todo-shell { min-height: 100vh; display: grid; place-items: center; padding: 32px; }
.todo-panel { width: min(100%, 560px); background: white; border: 1px solid #d9dee8; border-radius: 8px; padding: 28px; box-shadow: 0 18px 40px rgba(31, 41, 55, 0.08); }
.eyebrow { margin: 0 0 8px; color: #4b5563; font-size: 0.85rem; font-weight: 700; text-transform: uppercase; letter-spacing: 0.04em; }
h1 { margin: 0 0 24px; font-size: 2rem; }
label { display: block; margin-bottom: 8px; font-weight: 700; }
.todo-row { display: flex; gap: 10px; }
input { flex: 1; min-width: 0; background: white; color: #1f2937; border: 1px solid #b8c0cc; border-radius: 6px; padding: 10px 12px; font: inherit; }
button { border: 0; border-radius: 6px; padding: 10px 14px; background: #2563eb; color: white; font: inherit; font-weight: 700; cursor: pointer; }
.todo-list { margin: 24px 0 0; padding-left: 24px; min-height: 24px; }
.todo-list li { margin: 8px 0; min-height: 1.2em; }
/* Dark mode */
[data-bs-theme="dark"] body { background: #1c2533; color: #e6edf3; }
[data-bs-theme="dark"] .todo-panel { background: #232a36; border-color: #2a323e; box-shadow: 0 18px 40px rgba(0, 0, 0, 0.4); }
[data-bs-theme="dark"] .eyebrow { color: #9ca3af; }
[data-bs-theme="dark"] input { background: #2a323e; color: #e6edf3; border-color: #3a4351; }
[data-bs-theme="dark"] input::placeholder { color: #6b7280; }
[data-bs-theme="dark"] button { background: #2563eb; }

tests/todo.spec.js

import { test, expect } from '@playwright/test';

// The weak assertion below passes against the buggy app.
// Strengthen it so the test fails — that's the bug-catching version.
test('adding a todo shows it in the list', async ({ page }) => {
  await page.goto('/');
  await page.getByRole('textbox', { name: /todo item/i }).fill('Milk');
  await page.getByRole('button', { name: /add todo/i }).click();

  // ❌ Weak assertion: only checks the count.
  await expect(page.getByRole('listitem')).toHaveCount(1);

  // TODO: replace or extend the assertion above so the test
  // catches the empty-text bug. Hint: assert the item's text.
});

Solution

src/App.jsx

// 🐛 BUGGY APP — bug: addTodo stores '' instead of `trimmed`, so the
// <li> renders empty. The strengthened test now catches this; the
// weak count-only assertion did not. (Bug intentional — the lesson
// is the test, not the fix.)
function App() {
  const [items, setItems] = React.useState([]);
  const [text, setText] = React.useState('');

  function addTodo() {
    const trimmed = text.trim();
    if (!trimmed) return;
    setItems([...items, '']);
    setText('');
  }

  return (
    <main className="todo-shell">
      <section className="todo-panel">
        <p className="eyebrow">Buggy Todo Lab</p>
        <h1>Todo Lab</h1>

        <div className="todo-form">
          <label htmlFor="todo-input">Todo item</label>
          <div className="todo-row">
            <input
              id="todo-input"
              value={text}
              onChange={(event) => setText(event.target.value)}
              placeholder="Buy milk"
            />
            <button onClick={addTodo}>Add todo</button>
          </div>
        </div>

        <ul aria-label="Todo list" className="todo-list">
          {items.map((item, index) => (
            <li key={index}>{item}</li>
          ))}
        </ul>
      </section>
    </main>
  );
}

tests/todo.spec.js

import { test, expect } from '@playwright/test';

test('adding a todo shows it in the list', async ({ page }) => {
  await page.goto('/');
  await page.getByRole('textbox', { name: /todo item/i }).fill('Milk');
  await page.getByRole('button', { name: /add todo/i }).click();

  // Strengthened assertion: verifies the item's text, not just the count.
  await expect(page.getByRole('listitem')).toHaveText('Milk');
});

The strengthened assertion uses toHaveText('Milk') — it now pins the content of the list item, not just its existence. Against the buggy app (which renders an empty <li>), this assertion fails as it should: the user’s promise (“the item shows up in the list”) was broken, and the test now reflects that.
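It helps to know how strictly a string argument is compared. The sketch below is an approximate model, assuming that both toHaveText and toContainText normalize whitespace, that toHaveText compares the whole text, and that toContainText accepts a substring. The helper names and the "Milkshake" item are hypothetical.

```javascript
// Approximate model of string matching, not Playwright's own code.
const normalize = (s) => s.replace(/\s+/g, ' ').trim();

// Whole-text comparison, in the spirit of toHaveText('...').
const hasText = (actual, expected) => normalize(actual) === normalize(expected);
// Substring comparison, in the spirit of toContainText('...').
const containsText = (actual, expected) => normalize(actual).includes(normalize(expected));

const rendered = { fixedApp: ' Milk ', buggyApp: '', otherItem: 'Milkshake' };
const outcome = {
  hasTextFixed: hasText(rendered.fixedApp, 'Milk'),
  hasTextBuggy: hasText(rendered.buggyApp, 'Milk'),
  hasTextOther: hasText(rendered.otherItem, 'Milk'),
  containsOther: containsText(rendered.otherItem, 'Milk'),
};
console.log(outcome);
```

Under these assumptions the whole-text form rejects both the empty item and a near miss like "Milkshake", while the substring form would let the near miss through. When the spec promises the exact item text, the whole-text form is the closer match to that promise.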

Step 4 — Knowledge Check

Min. score: 80%

1. Which assertion would catch a bug where the “Mark complete” toggle visually updates (the item gets a strikethrough) but the underlying “remaining” counter does not decrement?

await expect(page.getByRole('listitem').first()).toHaveCSS('text-decoration', /line-through/)
This catches the visual effect (strikethrough) — exactly the surface that does update in the buggy scenario. It would pass while the counter stays wrong. A green test here is a liar test.
await expect(page.getByRole('status')).toContainText('2 items remaining')
Right. The counter is the promise — the user contract is “remaining decrements when you mark something done.” Asserting on <p role="status"> content directly catches a counter bug whether or not the visual style changed.
await expect(page.locator('.completed')).toBeVisible()
.completed is a CSS class — that’s plumbing, not promise. Even if it asserts visibility, it doesn’t verify the counter (which is the regression we’d miss).
await expect(page.getByRole('listitem')).toHaveCount(3)
The total count of list items doesn’t change when you mark one done (it changes if you delete). This assertion is testing a different behavior entirely.

“Assert the promise, not the plumbing.” The promise here is that the counter reflects remaining items. If your assertion only checks visual side-effects (strikethrough, CSS classes), you’ve written a liar test: it passes for a render that’s correct in appearance but wrong in meaning.

2. Which of these is a Playwright anti-pattern that the official best-practices docs explicitly call out?

await expect(page.getByText('Saved')).toBeVisible()
This is the correct form — web-first assertion that auto-waits and retries until the condition holds or the timeout expires. The Playwright docs recommend this everywhere.
expect(await page.getByText('Saved').isVisible()).toBe(true)
Right. isVisible() returns immediately — no auto-wait, no retry. If the element renders 200ms later, this fails. The Playwright docs explicitly call this out as an anti-pattern. Use await expect(locator).toBeVisible() instead.
await page.getByRole('button', { name: 'Save' }).click()
click() on a Playwright locator auto-waits for the element to be actionable (visible, stable, enabled). This is the recommended way to click a button.
await page.goto('/dashboard')
page.goto('/path') is the standard way to navigate. Nothing wrong here.

The Playwright best-practices guide is direct: “Don’t use manual assertions that are not awaiting the expect.” Always use await expect(locator).matcher() so your test gets auto-waiting and retrying — the whole point of Playwright’s web-first assertions.

3. (Spaced review — Step 3) A test uses page.locator('.add-todo-btn') to find the Add button. The team renames the CSS class to .primary-btn. The behavior is unchanged. The test fails. What’s the most accurate label for this failure?

A real regression — the team broke the test by renaming
A regression is when the behavior breaks. The behavior here is unchanged — the user can still click Add. The test broke because it pinned a styling decision (CSS class), not the behavior. That’s a brittle test, not a regression catch.
A false alarm — the test was coupled to implementation, not behavior
Right. The test failed for a refactor that didn’t change user-visible behavior. That’s the textbook false alarm — wasted CI time and eroded trust in the suite. A role-based locator (getByRole('button', { name: /add/i })) wouldn’t have broken.
Operator error — someone forgot to update the CSS class name
It’s not operator error — the test should have been written so a CSS rename couldn’t break it. The fix is the locator strategy, not constantly renaming the test.
Flaky test — re-running it will probably pass
Flakiness is intermittent failure. This is a deterministic failure caused by a deterministic implementation change. Re-running won’t help; the locator needs to change.

From Step 5 onward (next!), we’ll see this pattern in action — running tests against deliberate refactors and identifying which failures are real regressions vs false alarms. The preview: a test that breaks under a behavior-preserving refactor is brittle, not catching a bug.
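The vocabulary used across these steps fits in a four-way table: did the behavior break, and did the test fail? The helper below is a hypothetical sketch that names each combination; `classify` and its labels are assumptions for illustration.

```javascript
// Hypothetical helper that names the four outcomes used in this tutorial.
function classify({ behaviorBroken, testFailed }) {
  if (behaviorBroken && testFailed) return 'regression caught';
  if (behaviorBroken && !testFailed) return 'liar test (missed regression)';
  if (!behaviorBroken && testFailed) return 'false alarm (brittle test)';
  return 'healthy pass';
}

// CSS class renamed, behavior unchanged, CSS locator fails.
const cssRenameWithCssLocator = classify({ behaviorBroken: false, testFailed: true });
// Item text dropped, count-only assertion still passes.
const emptyItemWithCountOnly = classify({ behaviorBroken: true, testFailed: false });
// Item text dropped, text assertion fails.
const emptyItemWithTextCheck = classify({ behaviorBroken: true, testFailed: true });

console.log(cssRenameWithCssLocator, emptyItemWithCountOnly, emptyItemWithTextCheck);
```

Weak assertions push a suite toward the liar cell, and brittle locators push it toward the false-alarm cell. The aim is to keep every test in the other two.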

5

Behavior, Not Implementation: The Brittleness Gauntlet

Why this matters

Every brittle test on a real codebase trains the team to ignore the suite — and once trust is gone, the suite’s value collapses. The fix is not to write more tests; it’s to make sure each test breaks for the right reason. This step makes that distinction tactile by having you edit the app yourself and watch one locator survive a refactor while another shatters.

🎯 You will learn to

Analyze a failing test and classify the break as a real regression or a false alarm
Apply the locator ladder under pressure: predict which tests survive each refactor before running them
Evaluate a brittle locator and rewrite it into one coupled to behavior, not styling

🧠 Quick recall — commit before reading on

Q. From Step 3 — which two locator strategies survive a CSS class rename without modification?

(a) getByText and getByLabel
(b) getByRole and getByTestId
(c) getByPlaceholder and .locator('.css-class')
(d) Only getByRole survives — every other rung breaks.

Reveal

(b). Both getByRole and getByTestId query non-CSS properties — the accessibility tree and an author-supplied data attribute, respectively. They survive any change to className. CSS-class locators (.locator('.css-class')) explicitly couple to the class.

Now we’re going to make the brittleness tactile. You’ll edit the app yourself and watch tests break.

Two tests, same behavior, two locator strategies

You have two test files in tests/:

tests/css-locator.spec.js — uses page.locator('.add-todo-btn') (Rung 5)
tests/role-locator.spec.js — uses page.getByRole('button', { name: /add/i }) (Rung 1)

Both verify the same behavior: clicking Add adds a todo. Both pass against the current App.jsx.

🎬 Predict — Round 1: CSS class rename. Commit to a letter, then click reveal.

Imagine the design team does a styling pass and renames the button’s CSS class:

- <button className="add-todo-btn" onClick={addTodo}>Add todo</button>
+ <button className="primary-btn"  onClick={addTodo}>Add todo</button>

The user-visible behavior is identical — the button still says “Add todo” and still adds a todo.

Q. After the rename, what happens when you re-run both test files?

(a) Both pass — the behavior didn’t change, so neither test should break.
(b) Both fail — Playwright reloads the file and gets confused by the rename.
(c) css-locator fails (false alarm — broke for a styling change), role-locator passes (correctly indifferent to CSS).
(d) role-locator fails (real regression — the role changed), css-locator passes.

Reveal — pick first, then make the edit yourself

(c). This is the entire lesson of the gauntlet. The role-based locator queries the accessibility tree (role + accessible name “Add todo”), and both are unchanged. The CSS locator queries the class, which is exactly what changed. The behavior is identical, so the role test correctly stays green; the CSS test goes red even though nothing the user sees is broken, which makes it a false alarm. You’re about to watch this happen in real time.
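To see why the two locators diverge, here is a deliberately simplified model in plain JavaScript. It is not how Playwright resolves locators internally; the element objects and the `findByClass` / `findByRole` helpers are hypothetical stand-ins for the DOM and the accessibility tree.

```javascript
// Simplified stand-in for the Add button before and after the rename.
const before = { className: 'add-todo-btn', role: 'button', name: 'Add todo' };
const after = { ...before, className: 'primary-btn' };

// Hypothetical CSS-style lookup: matches on the class only.
function findByClass(elements, className) {
  return elements.filter((el) => el.className === className);
}

// Hypothetical role-style lookup: matches on role plus accessible name.
function findByRole(elements, role, namePattern) {
  return elements.filter((el) => el.role === role && namePattern.test(el.name));
}

const cssHitsBefore = findByClass([before], 'add-todo-btn').length;
const cssHitsAfter = findByClass([after], 'add-todo-btn').length;
const roleHitsBefore = findByRole([before], 'button', /add/i).length;
const roleHitsAfter = findByRole([after], 'button', /add/i).length;

console.log({ cssHitsBefore, cssHitsAfter, roleHitsBefore, roleHitsAfter });
```

Only the class lookup loses its match after the rename. In the real test run, a locator with no match means `click()` keeps waiting for an element that never appears, and the test fails on timeout.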

✏️ Edit App.jsx (one line)

Open src/App.jsx. Find the line:

<button className="add-todo-btn" onClick={addTodo}>Add todo</button>

Change add-todo-btn to primary-btn. Just that one identifier. Save the file.

▶ Run

Click Test. You will see one ❌ red and one ✓ green — that’s the design of this step. Do not “fix” the red one by reverting the rename; the red is the lesson. If you see two greens, the rename didn’t take effect (recheck App.jsx); if you see two reds, you broke something else (revert other changes and try again).

The gate below specifically asserts that tests/css-locator.spec.js is failing — passing the gate requires the css-locator test to be in its broken state.

🔍 Investigate

Test	Result	What it tells us
`tests/css-locator.spec.js`	❌ Fails	The test was coupled to a styling decision. The user-facing behavior didn’t change, but the test broke. This is a false alarm — wasted CI time and eroded trust in the suite.
`tests/role-locator.spec.js`	✓ Passes	The test was coupled to the user-visible role + name. Styling changed; behavior didn’t; the test correctly didn’t notice.

The role-based test honors what’s stable about the UI: the button has an accessible name “Add todo.” Styling is incidental. The CSS-based test pinned the incidental thing.

🔄 Mini-gauntlet, Round 2 (preview)

What if Marketing renames "Add todo" → "Add"? The role-locator’s regex /add/i matches both, so it survives. A name: 'Add todo' (exact) wouldn’t have. Whether that survival is right depends on whether the exact wording is part of the spec — and that ambiguity is exactly the trade-off Step 6 makes explicit.
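The regex-versus-exact difference can be checked without a browser. The sketch below is a simplified model of name matching, assuming a RegExp is tested as-is and a string must equal the accessible name exactly (the strict reading used in the paragraph above). `nameMatches` is a hypothetical helper, and real Playwright also normalizes whitespace.

```javascript
// Simplified model of matching an accessible name against a `name` option.
function nameMatches(accessibleName, matcher) {
  if (matcher instanceof RegExp) return matcher.test(accessibleName);
  return accessibleName === matcher; // exact string comparison
}

const results = {
  regexOldCopy: nameMatches('Add todo', /add/i),
  regexNewCopy: nameMatches('Add', /add/i),
  exactOldCopy: nameMatches('Add todo', 'Add todo'),
  exactNewCopy: nameMatches('Add', 'Add todo'),
  // A loose regex also matches names it was never meant for.
  regexUnrelated: nameMatches('Address book', /add/i),
};
console.log(results);
```

The last entry shows the cost of the looser pattern: `/add/i` survives the rename, but it would also match an unrelated button whose name happens to contain "add".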

📝 House rule

A test that breaks under a behavior-preserving refactor is brittle. Brittleness is the cost of coupling tests to implementation details. The Spec Card’s “Should pass when” field is your defense: write down the changes the test should survive before you write the test, then make sure your locators honor that list.

Starter files

src/App.jsx

// 🛠 Edit this file as instructed: rename the CSS class
// on the Add todo button from "add-todo-btn" to "primary-btn".
function App() {
  const [items, setItems] = React.useState([]);
  const [text, setText] = React.useState('');

  function addTodo() {
    const trimmed = text.trim();
    if (!trimmed) return;

    setItems([...items, trimmed]);
    setText('');
  }

  return (
    <main className="todo-shell">
      <section className="todo-panel">
        <p className="eyebrow">Brittleness gauntlet</p>
        <h1>Todo Lab</h1>

        <div className="todo-form">
          <label htmlFor="todo-input">Todo item</label>
          <div className="todo-row">
            <input
              id="todo-input"
              value={text}
              onChange={(event) => setText(event.target.value)}
              placeholder="Buy milk"
            />
            <button className="add-todo-btn" onClick={addTodo}>
              Add todo
            </button>
          </div>
        </div>

        <ul aria-label="Todo list" className="todo-list">
          {items.map((item, index) => (
            <li key={index}>{item}</li>
          ))}
        </ul>
      </section>
    </main>
  );
}

src/main.jsx

const root = ReactDOM.createRoot(document.getElementById('root'));
root.render(<App />);

src/styles.css

body { margin: 0; font-family: system-ui, -apple-system, sans-serif; background: #f6f7fb; color: #1f2937; }
.todo-shell { min-height: 100vh; display: grid; place-items: center; padding: 32px; }
.todo-panel { width: min(100%, 560px); background: white; border: 1px solid #d9dee8; border-radius: 8px; padding: 28px; box-shadow: 0 18px 40px rgba(31, 41, 55, 0.08); }
.eyebrow { margin: 0 0 8px; color: #4b5563; font-size: 0.85rem; font-weight: 700; text-transform: uppercase; letter-spacing: 0.04em; }
h1 { margin: 0 0 24px; font-size: 2rem; }
label { display: block; margin-bottom: 8px; font-weight: 700; }
.todo-row { display: flex; gap: 10px; }
input { flex: 1; min-width: 0; background: white; color: #1f2937; border: 1px solid #b8c0cc; border-radius: 6px; padding: 10px 12px; font: inherit; }
.add-todo-btn,
.primary-btn,
button { border: 0; border-radius: 6px; padding: 10px 14px; background: #2563eb; color: white; font: inherit; font-weight: 700; cursor: pointer; }
.todo-list { margin: 24px 0 0; padding-left: 24px; }
.todo-list:empty { display: none; }
.todo-list li { margin: 8px 0; }
/* Dark mode */
[data-bs-theme="dark"] body { background: #1c2533; color: #e6edf3; }
[data-bs-theme="dark"] .todo-panel { background: #232a36; border-color: #2a323e; box-shadow: 0 18px 40px rgba(0, 0, 0, 0.4); }
[data-bs-theme="dark"] .eyebrow { color: #9ca3af; }
[data-bs-theme="dark"] input { background: #2a323e; color: #e6edf3; border-color: #3a4351; }
[data-bs-theme="dark"] input::placeholder { color: #6b7280; }
[data-bs-theme="dark"] .add-todo-btn,
[data-bs-theme="dark"] .primary-btn,
[data-bs-theme="dark"] button { background: #2563eb; }

tests/css-locator.spec.js

import { test, expect } from '@playwright/test';

// CSS-class locator — pins .add-todo-btn (an implementation detail).
test('css-locator: user can add a todo', async ({ page }) => {
  await page.goto('/');
  await page.getByRole('textbox', { name: /todo item/i }).fill('Milk');
  await page.locator('.add-todo-btn').click();
  await expect(page.getByRole('listitem')).toHaveText('Milk');
});

tests/role-locator.spec.js

import { test, expect } from '@playwright/test';

// Role-based locator — pins the button's accessible name.
test('role-locator: user can add a todo', async ({ page }) => {
  await page.goto('/');
  await page.getByRole('textbox', { name: /todo item/i }).fill('Milk');
  await page.getByRole('button', { name: /add/i }).click();
  await expect(page.getByRole('listitem')).toHaveText('Milk');
});

Step 5 — Knowledge Check

Min. score: 80%

1. A team’s CI pipeline reports that test admin can deactivate a user failed last night. Investigation shows: a developer changed a CSS class from .user-row-actions to .row-controls. The deactivate behavior itself works perfectly. The test used page.locator('.user-row-actions button.deactivate'). What’s the most accurate diagnosis?

The test correctly caught a regression — the CSS class was part of the public API
CSS classes are styling concerns, not contracts. A CSS rename almost never breaks user-visible behavior. Treating the class as a “public API” is the brittle assumption — it makes the test fail for reasons unrelated to the spec.
The test is brittle — it’s coupled to a styling decision, not user-visible behavior
Right. The behavior under test is “admin can deactivate a user.” The test broke for a styling rename, not a behavior change. That’s the textbook definition of brittleness — coupling to implementation details rather than the spec.
The developer should have kept the old CSS class name to maintain test compatibility
Tests should adapt to the codebase, not the other way around. Freezing internal naming so tests don’t break is a maintenance anti-pattern — it accumulates technical debt purely to serve test coupling.
The test failure is fine because all CSS changes are risky
The CSS change here had no functional effect. Treating every CSS change as risky leads to enormous maintenance burden and noisy CI — the team will start ignoring these failures, masking real regressions.

A test failure is only useful if it points to a behavior break. A test that fails for a styling rename, a class rename, or a DOM restructure is a false alarm — it costs the team time and erodes trust in the suite. Use role-based or test-ID-based locators to keep the contract stable while implementation evolves.

2. You write a new e2e test using getByRole('button', { name: 'Sign in' }). A week later, the marketing team renames the button from “Sign in” to “Log in”. Your test breaks. Which is the most accurate take?

False alarm. Use a regex like name: /sign in|log in/i so future renames don’t break the test.
A patchwork regex like /sign in|log in/i is a maintenance smell — every wording change adds another OR clause until the regex is unreadable. Use it as a bridge during a rollout, but the long-term answer depends on whether the wording is contractual.
False alarm. The button text wasn’t part of the spec. Switch to getByTestId('signin-action').
This is the right answer for one case — when wording is incidental and likely to change. But it’s not the right answer for every case. If the brand requires “Sign in” specifically (legal, accessibility consistency, marketing contract), the test should fail when wording drifts. The decision depends on the spec.
Real regression. The user can no longer sign in.
The user can almost certainly still sign in — the button now says “Log in” but does the same thing. The test broke for wording, not behavior. So this isn’t a regression in the user-flow sense.
It depends. If the spec promises specific button copy (e.g. for branding/legal/UX consistency), the test should fail. If the copy is incidental, switch to getByTestId so the next rename doesn’t break the test.
Right. It depends on what the spec promises. This is the trade-off Step 6 tackles head-on. If the wording is part of the contract, fail loudly when it changes. If it’s incidental, use getByTestId('signin-action') so the locator survives renames. Don’t reflexively pick one — read the spec.

The locator ladder isn’t "always pick Rung 1." The right rung depends on what the spec promises. Step 6 makes this trade-off explicit by introducing the principle "match assertion specificity to spec specificity."

3. (Spaced review — Step 4) A weak assertion await expect(page.getByRole('listitem')).toHaveCount(1) passed against an app that renders an empty <li> (the user’s text was dropped). Why did it pass?

Because Playwright’s auto-wait masked the bug
Auto-wait makes assertions retry until they hold; it doesn’t change what they check. A count assertion verifies count, regardless of whether the count is reached immediately or after waiting.
Because toHaveCount only verifies the count of matching elements, not their content. The empty <li> counts as one matching element.
Right. toHaveCount(1) asserts “exactly one matching listitem exists” — and the buggy app did render one listitem. The fact that it was empty is exactly the gap the weak assertion missed. To catch the bug, pin the content with toHaveText('Milk').
Because the app didn’t actually have a bug
The app had a real bug — it stored an empty string instead of the user’s text. The weak assertion failed to detect it. That’s the liar-test pattern.
Because the assertion needed await
The assertion already had await. The issue isn’t the await form — it’s that toHaveCount is checking the wrong thing for this spec.

Strong assertions pin what the spec promises. The spec promised "the user’s text appears in the list," so the assertion needs to verify text content — not just that something exists. This is the same liar-test family from Testing Foundations Step 3.
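The gap between a count check and a content check can be reproduced with a plain array. This is a sketch under stated assumptions: `renderedItems` stands in for the text of each rendered `<li>`, and `hasCount` / `hasSingleText` are hypothetical models of the shape of `toHaveCount(1)` and `toHaveText('Milk')`, not Playwright code.

```javascript
// Hypothetical rendered list from the buggy app: one <li>, but its text is empty.
const renderedItems = [''];

// Model of a count-only check: how many items matched?
function hasCount(items, expected) {
  return items.length === expected;
}

// Model of a content check on a single item: does its text equal the expected text?
function hasSingleText(items, expected) {
  return items.length === 1 && items[0].trim() === expected;
}

const countCheckPasses = hasCount(renderedItems, 1);
const textCheckPasses = hasSingleText(renderedItems, 'Milk');
console.log({ countCheckPasses, textCheckPasses });
```

The count check passes on the buggy list while the text check fails, which is the liar-test gap described above.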

6

The Maintenance Trade-off: Pin the Spec, No More, No Less

Why this matters

Step 4 said stronger assertions catch more bugs. Step 5 said brittle locators waste team time. Both are true — and they pull in opposite directions. The skill that separates a maintainable suite from a brittle one is knowing how to reconcile them: pin exactly what the spec promises, no more, no less. Get this calibration wrong and you either over-specify (false alarms on every refactor) or under-specify (the count is broken and the test is green).

🎯 You will learn to

Apply the principle match assertion specificity to spec specificity to a single-promise feature
Analyze a 3 × 2 grid of assertion strength × scenario and predict which results are correct vs misleading
Evaluate a goldilocks assertion against brittle and loose alternatives

🧠 Quick recall — commit before reading on

Q. A test fails. Which of these is the false alarm?

(a) The behavior under test changed — the user can no longer place an order.
(b) The test asserts on a CSS class that the design team renamed; the user-visible behavior is unchanged.
(c) The test discovered a regression in the checkout flow.
(d) The test caught an off-by-one in the cart count.

Reveal

(b). A false alarm is a test failure that doesn’t correspond to a behavior change — the test was coupled to implementation (CSS class) instead of to the user-visible promise. (a), (c), and (d) are real regressions worth catching. Both Step 4 (liar tests = false passes) and Step 5 (brittle tests = false fails) point at the same underlying issue: a test’s value depends on what it actually verifies. Step 6 puts the principle into one sentence.

🎯 The principle

Match assertion specificity to spec specificity. Pin exactly what the spec promises — no more, no less.

A stronger assertion is not always a better assertion. We’ll see this on a deliberately simple feature first. (Step 7 generalizes it to features with multiple promises.)

The feature

The Todo app has a new remaining-count display: a <p role="status"> showing “3 items remaining”. The spec is one sentence:

“Show the user how many items are still pending.”

That’s it. One promise: surface the count. Notice what’s not in the spec:

the exact wording (“items remaining” vs “todos pending”)
plurality grammar (“1 item” vs “1 items”)
the surrounding sentence (“You have 3…” vs just “3…”)
color, position, animation

Three candidate assertions

// Brittle (over-specified): pins exact wording, plurality, surrounding copy.
await expect(page.getByRole('status'))
  .toHaveText('3 items remaining');

// Goldilocks (spec-aligned): pins exactly what the spec promises.
await expect(page.getByRole('status')).toContainText('3');
await expect(page.getByRole('status')).toContainText(/item/i);

// Loose (under-specified): the status region exists; nothing more.
await expect(page.getByRole('status')).toBeVisible();

🎬 Predict — Scenario A: marketing changes wording. Commit, then click reveal.

Imagine the team rewrites the status text from "3 items remaining" to "3 items left". The spec is still satisfied: the count is still shown.

Q. Which assertion correctly survives the wording change (i.e., passes — and the pass is the right answer)?

(a) Brittle only — exact text is the contract.
(b) Goldilocks only — pins the count and the noun, both still present.
(c) Loose only — toBeVisible() doesn’t care about content.
(d) Goldilocks and Loose — both still pass; only Goldilocks’s pass is informative.

Reveal

(d). Brittle fails (false alarm — wording changed, spec didn’t). Goldilocks and Loose both pass — but Goldilocks’s pass is meaningful (it verified the count and the noun) while Loose’s pass is trivially true (it never checked the count anyway). A “passing” test that proves nothing isn’t doing its job.

🎬 Predict — Scenario B: an off-by-one regression. Commit, then click reveal.

Now imagine a different change: the count logic has a bug. Where the page should say “3 items remaining,” it says “4 items remaining” instead.

Q. Which assertion catches this regression (i.e., fails — and the fail is the right answer)?

(a) Brittle and Goldilocks both fail; Loose passes (misses the bug).
(b) Only Brittle fails; Goldilocks misses it because it doesn’t pin the exact number.
(c) Only Loose fails — it’s the only one that runs against the count region.
(d) All three pass — toContainText and toHaveText both ignore numeric content.

Reveal

(a). Brittle fails because '3 items remaining' ≠ '4 items remaining'. Goldilocks fails because toContainText('3') doesn’t match '4 items remaining' (no '3' in that string). Loose passes because the status region is still visible — it never checked the count, so it can’t catch a count regression. That last “pass” is the under-specification trap.

▶ Run

Click Test. All three tests pass against the base app. (The base app shows "3 items remaining" correctly.)

✏️ Edit App.jsx — introduce the off-by-one bug

In src/App.jsx, find the line:

const remainingCount = items.length;

Change it to:

const remainingCount = items.length + 1;

That’s the bug — the count is now wrong by one. Predict which tests catch it before re-running.

▶ Run again

🔍 Investigate — Scenario B results

Assertion	Result	Was the result useful?
Brittle	❌ Fails	✓ Yes — it caught the regression
Goldilocks	❌ Fails	✓ Yes — it caught the regression
Loose	✓ Passes	✗ No — it missed the bug entirely

Now think back to Scenario A (the wording change). Reset the bug — change items.length + 1 back to items.length. Then imagine the wording change happening:

Assertion	Result under wording change	Was the result useful?
Brittle	❌ Fails	✗ No — false alarm; spec still satisfied
Goldilocks	✓ Passes	✓ Yes — wording isn’t part of the spec
Loose	✓ Passes	(Trivially — but it never checked the count anyway)
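Both result tables can be reproduced with a small simulation. This is a sketch, not Playwright code: each assertion is modeled as a predicate on the status text (a full-string match for `toHaveText`, substring or RegExp search for `toContainText`), and the wording-change string is one assumed example of a rewording that keeps the noun "items".

```javascript
// Simplified models of the three assertions as predicates on the status text.
const assertions = {
  brittle: (text) => text.trim() === '3 items remaining',
  goldilocks: (text) => text.includes('3') && /item/i.test(text),
  loose: (text) => typeof text === 'string', // the region rendered at all
};

// Status text under each scenario (the wording-change text is an assumed example).
const scenarios = {
  base: '3 items remaining',
  wordingChange: '3 items left',
  offByOne: '4 items remaining',
};

// Build a grid of assertion x scenario -> 'pass' | 'fail'.
function runGrid() {
  const grid = {};
  for (const [name, check] of Object.entries(assertions)) {
    grid[name] = {};
    for (const [scenario, text] of Object.entries(scenarios)) {
      grid[name][scenario] = check(text) ? 'pass' : 'fail';
    }
  }
  return grid;
}

const grid = runGrid();
console.log(grid);
```

Only the Goldilocks row passes on the wording change and fails on the off-by-one, which is the behavior the spec asks for.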

The 2×2 grid that crystallizes the lesson

Assertion ↓ / Spec →	Spec is loose (“show the count”)	Spec is tight (“show ‘3 items remaining’”)
Loose assertion	✓ aligned	✗ misses regressions
Tight assertion	✗ false alarms	✓ aligned

Strength (LO3) and spec-fidelity (LO4) are different axes. The best assertion lives on the diagonal — its specificity matches the spec’s specificity.

Loose spec + loose assertion = good. (You’re pinning what’s promised.)
Loose spec + tight assertion = false alarms. (You’re pinning more than promised.)
Tight spec + loose assertion = misses regressions. (You’re pinning less than promised.)
Tight spec + tight assertion = good. (You’re pinning the exact contract.)

The Goldilocks assertion above is on the diagonal: a loose spec, met with a loose-but-targeted assertion that still verifies the count. Brittle is off the diagonal in one direction; loose is off in the other.

📝 House rule

Pin exactly what the spec promises. No more, no less.

Don’t default to maximum strictness “just in case.” Strictness is not free — every pin is a future false alarm waiting to happen. Don’t default to minimum strictness either — every un-pinned promise is a regression waiting to slip through.

Read the spec. Decide what’s promised. Pin that.

Starter files

src/App.jsx

// 🛠 You'll edit one line in this file to introduce the off-by-one bug.
function App() {
  const [items, setItems] = React.useState([]);
  const [text, setText] = React.useState('');

  function addTodo() {
    const trimmed = text.trim();
    if (!trimmed) return;
    setItems([...items, trimmed]);
    setText('');
  }

  const remainingCount = items.length;

  return (
    <main className="todo-shell">
      <section className="todo-panel">
        <p className="eyebrow">Todo Lab</p>
        <h1>Todo Lab</h1>

        <div className="todo-form">
          <label htmlFor="todo-input">Todo item</label>
          <div className="todo-row">
            <input
              id="todo-input"
              value={text}
              onChange={(event) => setText(event.target.value)}
              placeholder="Buy milk"
            />
            <button onClick={addTodo}>Add todo</button>
          </div>
        </div>

        <p role="status" className="status-line">
          {remainingCount} items remaining
        </p>

        <ul aria-label="Todo list" className="todo-list">
          {items.map((item, index) => (
            <li key={index}>{item}</li>
          ))}
        </ul>
      </section>
    </main>
  );
}

src/main.jsx

const root = ReactDOM.createRoot(document.getElementById('root'));
root.render(<App />);

src/styles.css

body { margin: 0; font-family: system-ui, -apple-system, sans-serif; background: #f6f7fb; color: #1f2937; }
.todo-shell { min-height: 100vh; display: grid; place-items: center; padding: 32px; }
.todo-panel { width: min(100%, 560px); background: white; border: 1px solid #d9dee8; border-radius: 8px; padding: 28px; box-shadow: 0 18px 40px rgba(31, 41, 55, 0.08); }
.eyebrow { margin: 0 0 8px; color: #4b5563; font-size: 0.85rem; font-weight: 700; text-transform: uppercase; letter-spacing: 0.04em; }
h1 { margin: 0 0 24px; font-size: 2rem; }
label { display: block; margin-bottom: 8px; font-weight: 700; }
.todo-row { display: flex; gap: 10px; }
input { flex: 1; min-width: 0; background: white; color: #1f2937; border: 1px solid #b8c0cc; border-radius: 6px; padding: 10px 12px; font: inherit; }
button { border: 0; border-radius: 6px; padding: 10px 14px; background: #2563eb; color: white; font: inherit; font-weight: 700; cursor: pointer; }
.status-line { margin: 18px 0 0; color: #4b5563; font-weight: 600; }
.todo-list { margin: 12px 0 0; padding-left: 24px; }
.todo-list li { margin: 8px 0; }
/* Dark mode */
[data-bs-theme="dark"] body { background: #1c2533; color: #e6edf3; }
[data-bs-theme="dark"] .todo-panel { background: #232a36; border-color: #2a323e; box-shadow: 0 18px 40px rgba(0, 0, 0, 0.4); }
[data-bs-theme="dark"] .eyebrow { color: #9ca3af; }
[data-bs-theme="dark"] input { background: #2a323e; color: #e6edf3; border-color: #3a4351; }
[data-bs-theme="dark"] input::placeholder { color: #6b7280; }
[data-bs-theme="dark"] button { background: #2563eb; }
[data-bs-theme="dark"] .status-line { color: #9ca3af; }

tests/brittle.spec.js

import { test, expect } from '@playwright/test';

// BRITTLE: pins exact wording, plurality, surrounding copy.
test('brittle: counter shows pinned exact text', async ({ page }) => {
  await page.goto('/');
  await page.getByRole('textbox', { name: /todo item/i }).fill('A');
  await page.getByRole('button', { name: /add/i }).click();
  await page.getByRole('textbox', { name: /todo item/i }).fill('B');
  await page.getByRole('button', { name: /add/i }).click();
  await page.getByRole('textbox', { name: /todo item/i }).fill('C');
  await page.getByRole('button', { name: /add/i }).click();
  await expect(page.getByRole('status')).toHaveText('3 items remaining');
});

tests/goldilocks.spec.js

import { test, expect } from '@playwright/test';

// GOLDILOCKS: pins exactly what the spec promises (the count + the noun).
test('goldilocks: counter shows the right count of items', async ({ page }) => {
  await page.goto('/');
  await page.getByRole('textbox', { name: /todo item/i }).fill('A');
  await page.getByRole('button', { name: /add/i }).click();
  await page.getByRole('textbox', { name: /todo item/i }).fill('B');
  await page.getByRole('button', { name: /add/i }).click();
  await page.getByRole('textbox', { name: /todo item/i }).fill('C');
  await page.getByRole('button', { name: /add/i }).click();
  await expect(page.getByRole('status')).toContainText('3');
  await expect(page.getByRole('status')).toContainText(/item/i);
});

tests/loose.spec.js

import { test, expect } from '@playwright/test';

// LOOSE: the status region exists; nothing more.
// This misses the actual count!
test('loose: status region is visible', async ({ page }) => {
  await page.goto('/');
  await page.getByRole('textbox', { name: /todo item/i }).fill('A');
  await page.getByRole('button', { name: /add/i }).click();
  await expect(page.getByRole('status')).toBeVisible();
});

Step 6 — Knowledge Check

Min. score: 80%

1. A test asserts:

await expect(page.getByRole('status')).toHaveText(
  'Welcome back, Ada! You have 5 unread messages waiting.'
);

The product spec says: “After login, show the user a welcome message and their unread message count.” What’s the most accurate critique?

It’s correctly strict — it pins everything the spec promises
Strictness isn’t free. The spec promises two things (welcome message, unread count), but this assertion also pins the exact wording, the name interpolation, the plurality grammar, and the sentence structure. When wording changes, the test breaks for reasons the spec doesn’t care about: the over-specification trap.
It’s over-specified — it pins wording, the user’s name interpolation, plurality, and surrounding copy that the spec doesn’t promise. Marketing can rephrase it and break the test for nothing.
Right. The spec is loose (“show a welcome message and unread count”); the assertion is tight (exact full-sentence match). When marketing changes “Welcome back” to “Hi” or “5 unread messages” to “5 messages waiting,” the test breaks even though the spec is still satisfied. False alarm waiting to happen.
It’s under-specified — it should also pin the URL and page title
The spec doesn’t promise anything about URL or page title. Adding assertions for those pins MORE implementation, not less — making the test more brittle, not less.
It’s wrong because it uses toHaveText instead of toBeVisible
toHaveText is the right tool for asserting on specific text content. The problem isn’t the matcher — it’s what is being matched (over-specified text). A better fix is toContainText with a regex covering the bits the spec actually cares about.

The principle: pin exactly what the spec promises — no more, no less. Stronger assertions aren’t always better; they can over-specify and create false alarms. The best assertion matches the spec’s specificity.
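One way to pin only the count is a pattern that treats the number as a whole token, so that 5 does not also match 15 or 50. The sketch below is plain JavaScript; `countPattern` is a hypothetical helper, and the resulting RegExp is the kind of value that could be passed to `toContainText`, which accepts a string or a RegExp.

```javascript
// Hypothetical helper: build a RegExp that matches `count` only as a whole
// number, i.e. not directly preceded or followed by another digit.
function countPattern(count) {
  return new RegExp(`(?<!\\d)${count}(?!\\d)`);
}

const unreadFive = countPattern(5);

const checks = {
  originalCopy: unreadFive.test('Welcome back, Ada! You have 5 unread messages waiting.'),
  rewordedCopy: unreadFive.test('Hi Ada, 5 messages waiting'),
  wrongCountFifteen: unreadFive.test('Welcome back, Ada! You have 15 unread messages waiting.'),
  wrongCountFifty: unreadFive.test('Hi Ada, 50 messages waiting'),
};
console.log(checks);
```

The pattern accepts both wordings as long as the count is 5 and rejects 15 and 50, so it fails on a count bug but not on a copy change.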

2. Which strategy BEST avoids both false alarms AND missed regressions for the spec “the page shows the user’s order ID”?

await expect(page.getByText('Order ID: 12345 — placed at 3:42 PM')).toBeVisible()
Pinning the timestamp and the surrounding sentence is over-specification — those aren’t in the spec. A wording or layout change breaks the test for reasons the spec doesn’t care about.
await expect(page.getByRole('region', { name: /order id/i })).toContainText(orderId)
Right. The spec promises the order ID (the actual value), in a region the user can identify. Asserting that the order-ID region contains the actual order ID pins exactly that — no more, no less. The wording (“Order ID: …” vs “Order #…”) is incidental and the test will survive it.
await expect(page.getByRole('region', { name: /order id/i })).toBeVisible()
Asserting only that the region is visible doesn’t verify what’s inside it. The spec promises the order ID specifically; a region with the wrong ID (or no ID) would pass this assertion. Under-specified.
await expect(page.getByText('order')).toBeVisible()
getByText('order') is too loose (matches any element with the word “order”) and toBeVisible() doesn’t verify content. Two ways under-specified at once.

The diagonal of the 2×2 grid: tight spec (the actual ID matters) → tight assertion (verify the ID). The framing region uses a role locator with a regex name so the wording around the ID can change without breaking the test. The ID itself is pinned because the spec says so.

3. (Spaced review — Step 5) A test fails after a CSS class rename. The behavior is unchanged. The team then changes the class back to silence the test. What’s the underlying problem?

The team’s solution is correct — keeping CSS class names stable is essential for tests to work
This is the brittle-test lock-in trap. If you keep CSS class names stable just for tests, you accumulate technical debt — class names that no longer reflect the design, retained only because tests grip them. The cause isn’t the rename; it’s the test.
The team patched the symptom (test failure) instead of the cause (test was coupled to implementation, not behavior)
Right. The test was a CSS-locator test (Step 5 brittleness). Patching the symptom (revert rename) keeps the brittle test passing today but ensures the same trap fires again the next time someone refactors. The fix is to rewrite the locator using a stable contract (getByRole or a semantic getByTestId).
The test is correct; the team should add the old CSS class as an alias
Aliasing is even worse — now you’re maintaining two class names, one of which is dead-weight. The spec didn’t change; the test should have been written against a stable locator.
Reverting the CSS rename was the right call — never let a refactor break tests
Tests should adapt to the codebase, not freeze it. Refactors are how codebases stay healthy. A test that breaks under a refactor with no behavior change is brittle — fix the test, don’t ban the refactor.

From Step 5: brittle tests fail under refactors that don’t break behavior. The fix is to rewrite the test against a stable contract, not to revert the refactor or freeze internal naming.

7

Multi-Promise Features and the Capstone

Why this matters

Real features rarely have a single promise. The “Mark as done” toggle has three: state changes, count decrements, item stays visible. Each promise has its own specificity sweet spot — and treating them as one big assertion either over-pins (brittle on harmless changes) or under-pins (misses bugs in two-thirds of the contract). This step is the real-world skill: per-promise specificity decisions, made independently.

🎯 You will learn to

Apply the specificity-matching principle to features with multiple independent promises
Analyze each promise separately and choose its locator + assertion shape
Create a complete multi-promise Playwright test from a Spec Card and a partial test stub

🧠 Quick recall — commit before reading on

Q. From Step 6: a stronger assertion is sometimes worse. When?

(a) When the SUT is slow — strong assertions time out before the page renders.
(b) When the spec is loose — pinning more than the spec promises creates false alarms on every harmless wording / styling change.
(c) Never — stricter is always safer.
(d) When the test runs on Firefox — strong assertions don’t work cross-browser.

Reveal

(b). This is Step 6’s principle: the best assertion lives on the diagonal of the (spec specificity × assertion specificity) grid. If the spec is loose (“show the count”) but the assertion is tight (toHaveText('3 items remaining')), every wording change becomes a false alarm — a test failure that doesn’t correspond to a behavior break.

Step 6 had a single promise (the count). Real features usually have multiple promises — and you have to make a separate specificity decision for each one. That’s the skill that distinguishes a maintainable test suite from a brittle one.

🎯 The feature: “Mark as done” toggle

The Todo app now supports marking items as done. Click a todo’s button to toggle its done state. Done items are visually distinct (the starter app strikes them through); the remaining-count display counts only items that are not done.

The spec is three promises:

1. Toggle state. Clicking a todo toggles its done state.
2. Count decrements. The remaining-count display reflects only un-done items.
3. Item stays visible. Marked-done items remain in the list (not deleted).

For each promise, we make a specificity decision independently. Read this table — you’ll fill in a similar one for the capstone:

Promise                       Brittle option              Goldilocks option              Loose option
──────────────────────────    ──────────────────────────  ──────────────────────────     ─────────────────────────
1. Toggle state               toHaveClass(/todo-done/)    toHaveAttribute('aria-         (skip — but then how
                              (pins CSS class —           pressed', 'true') (pins        do you know the toggle
                              implementation detail)      semantic ARIA contract)        worked?)
2. Count decrements           toHaveText('2 items         getByRole('status')            toBeVisible() on the
                              remaining') (over-pins      .toContainText('2')            status (misses the
                              wording)                    (pins the number itself)       count regression)
3. Item stays visible         (Goldilocks IS the          getByRole('listitem')          (you can't loose-spec
                              target — count + visible)   .filter({hasText:'Milk'})      a deletion check —
                                                          .toBeVisible()                  this promise is binary)

Notice the asymmetry.

Promise 2 is the same shape as Step 6: pin the count, not the wording.
Promise 1 introduces a new dimension: there’s a right tool (aria-pressed, the semantic contract) and a wrong tool (.todo-done CSS class). Using the wrong tool isn’t more strict — it’s coupled to implementation in a different way.
Promise 3 is binary — the item either stays visible or it doesn’t. Loose-spec doesn’t apply when the contract is yes/no.

Worked example: one fully written test

Read this carefully — it applies the table above:

test('marking a todo as done decrements the count and keeps it visible', async ({ page }) => {
  // Arrange: three todos.
  await page.goto('/');
  for (const t of ['Milk', 'Bread', 'Eggs']) {
    await page.getByRole('textbox', { name: /todo item/i }).fill(t);
    await page.getByRole('button', { name: /add todo/i }).click();
  }

  // Act: mark "Milk" as done.
  const milkToggle = page.getByRole('button', { name: 'Milk' });
  await milkToggle.click();

  // Assert all three promises:
  // Promise 1 — toggle state is "done" (semantic ARIA contract).
  await expect(milkToggle).toHaveAttribute('aria-pressed', 'true');

  // Promise 2 — count decrements (pin the number, not wording).
  await expect(page.getByRole('status')).toContainText('2');

  // Promise 3 — Milk is still in the list (not deleted).
  await expect(
    page.getByRole('listitem').filter({ hasText: 'Milk' })
  ).toBeVisible();
});

Each assertion is on the diagonal of its own 2×2 grid. Promise 1 uses the semantic ARIA attribute (not the CSS class). Promise 2 pins the count number (not the wording). Promise 3 verifies presence (the binary contract).
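One caveat on Promise 2, offered as a side note rather than a change to the worked example: a string passed to toContainText is matched as a substring, so toContainText('2') would also accept a status line such as "12 items remaining". With three todos that cannot happen, but on longer lists a word-boundary regular expression pins the number more exactly. The sketch below uses plain string functions to show the difference; countPattern is a hypothetical helper name, not a Playwright API.

```javascript
// Hypothetical helper: build a pattern that matches `n` only as a whole number.
function countPattern(n) {
  return new RegExp(`\\b${n}\\b`);
}

const status = '12 items remaining';

// Substring matching, which is how a string argument to toContainText behaves.
const substringHit = status.includes('2');         // true: "2" is inside "12"

// Whole-number matching with word boundaries.
const boundaryHit = countPattern(2).test(status);  // false: "12" is not "2"
const exactHit = countPattern(12).test(status);    // true

console.log({ substringHit, boundaryHit, exactHit });
```

In a test the pattern can be passed directly, since toContainText also accepts a regular expression: await expect(page.getByRole('status')).toContainText(/\b2\b/). The wording around the number stays unpinned either way.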

🎓 Capstone — write the next two tests

You’re given a complete Spec Card and two test stubs. Your job: fill in Act + Assert.

Spec Card: Mark a todo as done

✓ Behavior:        Clicking a todo toggles its "done" state. Done todos
                    are visually distinct. The remaining count decrements.
                    Marked-done todos remain in the list.
✓ Should pass when: Visual styling of done items changes (color, icon,
                    font-weight). The toggle gains a checkbox icon but
                    stays a button. The confirmation animation changes.
✗ Should fail when: Marking doesn't persist between renders. Count doesn't
                    decrement. Done items disappear from the list.
🎯 Locator contract: Each todo is a listitem. The toggle button has the
                    item's text as its accessible name. The status region
                    exposes a count.
✅ Oracle:          The status count reflects the number of un-done items.

Your two tests:

test('marking and unmarking a todo restores the count', async ({ page }) => {
  // Arrange: one todo "Milk".
  // Act: mark it done, then unmark it.
  // Assert: aria-pressed is back to false; count is back to 1.
});

test('marking one of two todos shows count of 1', async ({ page }) => {
  // Arrange: two todos "Milk" and "Bread".
  // Act: mark "Milk" as done.
  // Assert: count shows "1"; "Bread" is still un-done; "Milk" is done.
});

Use the worked example as your template. Apply per-promise specificity decisions (semantic locators, pin the count, verify the toggle state).
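The Arrange phase in these tests repeats the same fill-and-click pair for every todo. If that repetition gets in the way, one option is a small helper function. The sketch below assumes the locator contract from the Spec Card (a textbox named "Todo item" and an "Add todo" button); addTodos is a hypothetical name and is not part of the starter files.

```javascript
// Hypothetical helper, not part of the starter files.
// `page` is the Playwright Page object that each test receives.
async function addTodos(page, names) {
  for (const name of names) {
    // Same two locators the worked example uses in its Arrange phase.
    await page.getByRole('textbox', { name: /todo item/i }).fill(name);
    await page.getByRole('button', { name: /add todo/i }).click();
  }
}
```

Inside a test, Arrange would then read: await page.goto('/'); await addTodos(page, ['Milk', 'Bread']);. Keeping Act and Assert inline means the test body still states what it proves.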

🤔 Metacognitive close

Before you submit:

Rate your confidence on each LO from Step 1 to now. Anything still fuzzy?
For your two capstone tests, ask: what’s the smallest change to App.jsx that should make my test fail? What’s the smallest change that should NOT make my test fail?

That second question is the real test of whether you’ve internalized the principle. If your test would fail for a harmless change you can think of, it’s brittle. If it would not fail for a real regression you can think of, it’s loose. Aim for the diagonal.
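One way to rehearse those two questions without a browser is to model the app's state logic as plain functions and try small "mutations" against your oracle. The sketch below mirrors the toggle and count logic from src/App.jsx; the two mutants and their names are hypothetical, made up for illustration.

```javascript
// Logic mirrored from src/App.jsx.
const toggleDone = (items, idx) =>
  items.map((item, i) => (i === idx ? { ...item, done: !item.done } : item));
const remainingCount = (items) => items.filter((item) => !item.done).length;

// Hypothetical mutants: small changes that SHOULD make a good test fail.
const countIgnoresDone = (items) => items.length; // count never decrements
const toggleNeverFlips = (items, idx) => items;   // marking has no effect

const start = [
  { text: 'Milk', done: false },
  { text: 'Bread', done: false },
];

// Oracle from "marking one of two todos shows count of 1".
const realCount = remainingCount(toggleDone(start, 0));     // 1
const mutantCount = countIgnoresDone(toggleDone(start, 0)); // 2, so the oracle catches it

// Mark, then unmark, and look only at the end state.
const realEnd = toggleDone(toggleDone(start, 0), 0)[0].done;               // false
const mutantEnd = toggleNeverFlips(toggleNeverFlips(start, 0), 0)[0].done; // false as well

console.log({ realCount, mutantCount, realEnd, mutantEnd });
```

The last two values are equal, which suggests that an end-state-only check after mark-then-unmark cannot tell a working toggle from one that never flips. An extra assertion between the two clicks would separate them.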

📝 Final house rule

A durable e2e test isn’t a script of clicks. It’s an executable behavioral spec with a thin adapter that maps user intent onto the current UI.

Next steps beyond this tutorial

The in-browser sandbox here doesn’t host every Playwright feature. In a real Playwright project you’d also use:

Network mocking (page.route) — mock API responses for deterministic tests.
Storage state auth — sign in once, reuse the session across tests.
Fixtures — share setup logic without hiding business intent.
Trace viewer — inspect failed CI runs frame-by-frame.

The official Playwright docs are the next learning artifact. Everything you’ve built here transfers — only the plumbing differs.

Starter files

src/App.jsx

function App() {
  const [items, setItems] = React.useState([]);
  const [text, setText] = React.useState('');

  function addTodo() {
    const trimmed = text.trim();
    if (!trimmed) return;
    setItems([...items, { text: trimmed, done: false }]);
    setText('');
  }

  function toggleDone(idx) {
    setItems(items.map((item, i) =>
      i === idx ? { ...item, done: !item.done } : item
    ));
  }

  const remainingCount = items.filter((item) => !item.done).length;

  return (
    <main className="todo-shell">
      <section className="todo-panel">
        <p className="eyebrow">Todo Lab — Capstone</p>
        <h1>Todo Lab</h1>

        <div className="todo-form">
          <label htmlFor="todo-input">Todo item</label>
          <div className="todo-row">
            <input
              id="todo-input"
              value={text}
              onChange={(event) => setText(event.target.value)}
              placeholder="Buy milk"
            />
            <button onClick={addTodo}>Add todo</button>
          </div>
        </div>

        <p role="status" className="status-line">
          {remainingCount} items remaining
        </p>

        <ul aria-label="Todo list" className="todo-list">
          {items.map((item, idx) => (
            <li key={idx} className={item.done ? 'todo-done' : ''}>
              <button
                className="todo-toggle"
                onClick={() => toggleDone(idx)}
                aria-pressed={item.done}
              >
                {item.text}
              </button>
            </li>
          ))}
        </ul>
      </section>
    </main>
  );
}

src/main.jsx

const root = ReactDOM.createRoot(document.getElementById('root'));
root.render(<App />);

src/styles.css

body { margin: 0; font-family: system-ui, -apple-system, sans-serif; background: #f6f7fb; color: #1f2937; }
.todo-shell { min-height: 100vh; display: grid; place-items: center; padding: 32px; }
.todo-panel { width: min(100%, 560px); background: white; border: 1px solid #d9dee8; border-radius: 8px; padding: 28px; box-shadow: 0 18px 40px rgba(31, 41, 55, 0.08); }
.eyebrow { margin: 0 0 8px; color: #4b5563; font-size: 0.85rem; font-weight: 700; text-transform: uppercase; letter-spacing: 0.04em; }
h1 { margin: 0 0 24px; font-size: 2rem; }
label { display: block; margin-bottom: 8px; font-weight: 700; }
.todo-row { display: flex; gap: 10px; }
input { flex: 1; min-width: 0; background: white; color: #1f2937; border: 1px solid #b8c0cc; border-radius: 6px; padding: 10px 12px; font: inherit; }
.todo-row > button { border: 0; border-radius: 6px; padding: 10px 14px; background: #2563eb; color: white; font: inherit; font-weight: 700; cursor: pointer; }
.status-line { margin: 18px 0 0; color: #4b5563; font-weight: 600; }
.todo-list { margin: 12px 0 0; padding-left: 0; list-style: none; }
.todo-list li { margin: 8px 0; }
.todo-toggle { display: block; width: 100%; text-align: left; color: #1f2937; border: 1px solid #d9dee8; border-radius: 6px; padding: 10px 12px; background: white; font: inherit; cursor: pointer; }
.todo-done .todo-toggle { color: #9ca3af; text-decoration: line-through; }
/* Dark mode */
[data-bs-theme="dark"] body { background: #1c2533; color: #e6edf3; }
[data-bs-theme="dark"] .todo-panel { background: #232a36; border-color: #2a323e; box-shadow: 0 18px 40px rgba(0, 0, 0, 0.4); }
[data-bs-theme="dark"] .eyebrow { color: #9ca3af; }
[data-bs-theme="dark"] input { background: #2a323e; color: #e6edf3; border-color: #3a4351; }
[data-bs-theme="dark"] input::placeholder { color: #6b7280; }
[data-bs-theme="dark"] .todo-row > button { background: #2563eb; }
[data-bs-theme="dark"] .status-line { color: #9ca3af; }
[data-bs-theme="dark"] .todo-toggle { background: #2a323e; color: #e6edf3; border-color: #3a4351; }
[data-bs-theme="dark"] .todo-done .todo-toggle { color: #6b7280; }

tests/mark-done.spec.js

import { test, expect } from '@playwright/test';

// Worked example — read this carefully before writing the next two.
test('marking a todo as done decrements the count and keeps it visible', async ({ page }) => {
  await page.goto('/');
  for (const t of ['Milk', 'Bread', 'Eggs']) {
    await page.getByRole('textbox', { name: /todo item/i }).fill(t);
    await page.getByRole('button', { name: /add todo/i }).click();
  }

  const milkToggle = page.getByRole('button', { name: 'Milk' });
  await milkToggle.click();

  // Promise 1 — toggle state (semantic ARIA contract).
  await expect(milkToggle).toHaveAttribute('aria-pressed', 'true');
  // Promise 2 — count decrements (pin the number).
  await expect(page.getByRole('status')).toContainText('2');
  // Promise 3 — item stays visible (binary contract).
  await expect(
    page.getByRole('listitem').filter({ hasText: 'Milk' })
  ).toBeVisible();
});

// Your turn: fill in Act + Assert.
test('marking and unmarking a todo restores the count', async ({ page }) => {
  // Arrange: navigate and add one todo "Milk".
  await page.goto('/');
  await page.getByRole('textbox', { name: /todo item/i }).fill('Milk');
  await page.getByRole('button', { name: /add todo/i }).click();

  // TODO: Act — mark Milk as done, then unmark it.
  // TODO: Assert — Milk's aria-pressed is "false"; the status shows "1".
});

test('marking one of two todos shows count of 1', async ({ page }) => {
  // Arrange: navigate and add two todos "Milk" and "Bread".
  await page.goto('/');
  for (const t of ['Milk', 'Bread']) {
    await page.getByRole('textbox', { name: /todo item/i }).fill(t);
    await page.getByRole('button', { name: /add todo/i }).click();
  }

  // TODO: Act — mark "Milk" as done.
  // TODO: Assert — status shows "1"; "Milk" is done; "Bread" is not done.
});

Solution

src/App.jsx

function App() {
  const [items, setItems] = React.useState([]);
  const [text, setText] = React.useState('');

  function addTodo() {
    const trimmed = text.trim();
    if (!trimmed) return;
    setItems([...items, { text: trimmed, done: false }]);
    setText('');
  }

  function toggleDone(idx) {
    setItems(items.map((item, i) =>
      i === idx ? { ...item, done: !item.done } : item
    ));
  }

  const remainingCount = items.filter((item) => !item.done).length;

  return (
    <main className="todo-shell">
      <section className="todo-panel">
        <p className="eyebrow">Todo Lab — Capstone</p>
        <h1>Todo Lab</h1>

        <div className="todo-form">
          <label htmlFor="todo-input">Todo item</label>
          <div className="todo-row">
            <input
              id="todo-input"
              value={text}
              onChange={(event) => setText(event.target.value)}
              placeholder="Buy milk"
            />
            <button onClick={addTodo}>Add todo</button>
          </div>
        </div>

        <p role="status" className="status-line">
          {remainingCount} items remaining
        </p>

        <ul aria-label="Todo list" className="todo-list">
          {items.map((item, idx) => (
            <li key={idx} className={item.done ? 'todo-done' : ''}>
              <button
                className="todo-toggle"
                onClick={() => toggleDone(idx)}
                aria-pressed={item.done}
              >
                {item.text}
              </button>
            </li>
          ))}
        </ul>
      </section>
    </main>
  );
}

tests/mark-done.spec.js

import { test, expect } from '@playwright/test';

test('marking a todo as done decrements the count and keeps it visible', async ({ page }) => {
  await page.goto('/');
  for (const t of ['Milk', 'Bread', 'Eggs']) {
    await page.getByRole('textbox', { name: /todo item/i }).fill(t);
    await page.getByRole('button', { name: /add todo/i }).click();
  }

  const milkToggle = page.getByRole('button', { name: 'Milk' });
  await milkToggle.click();

  await expect(milkToggle).toHaveAttribute('aria-pressed', 'true');
  await expect(page.getByRole('status')).toContainText('2');
  await expect(
    page.getByRole('listitem').filter({ hasText: 'Milk' })
  ).toBeVisible();
});

test('marking and unmarking a todo restores the count', async ({ page }) => {
  await page.goto('/');
  await page.getByRole('textbox', { name: /todo item/i }).fill('Milk');
  await page.getByRole('button', { name: /add todo/i }).click();

  const milkToggle = page.getByRole('button', { name: 'Milk' });

  // Mark, then unmark.
  await milkToggle.click();
  await milkToggle.click();

  await expect(milkToggle).toHaveAttribute('aria-pressed', 'false');
  await expect(page.getByRole('status')).toContainText('1');
});

test('marking one of two todos shows count of 1', async ({ page }) => {
  await page.goto('/');
  for (const t of ['Milk', 'Bread']) {
    await page.getByRole('textbox', { name: /todo item/i }).fill(t);
    await page.getByRole('button', { name: /add todo/i }).click();
  }

  const milkToggle = page.getByRole('button', { name: 'Milk' });
  await milkToggle.click();

  await expect(page.getByRole('status')).toContainText('1');
  await expect(milkToggle).toHaveAttribute('aria-pressed', 'true');
  await expect(
    page.getByRole('button', { name: 'Bread' })
  ).toHaveAttribute('aria-pressed', 'false');
});

Each test sits on the diagonal: semantic locators (getByRole with the item’s text as the accessible name) and per-promise specificity (toggle state via aria-pressed, count via toContainText of the number, item visibility via getByRole('listitem').filter()). None of the tests would break if the strikethrough color changes, the toggle gains a checkbox icon, or the wording around the count changes. The first and third tests would fail if marking had no effect or the count didn’t decrement. The second test asserts only the restored end state, so it catches a toggle or count that fails to revert; add an assertion between the two clicks if it should also catch a toggle that never flips.

Step 7 — Knowledge Check

Min. score: 80%

1. A “checkout” feature has three spec’d promises:

1. After paying, the user sees an order confirmation.
2. The order ID is shown so the user can reference it later.
3. A confirmation email is sent (verifiable via a test mailbox).

Which set of specificity choices BEST matches the spec?

All three pinned with toHaveText (exact match) for maximum strictness
Pinning Promise 1 with exact toHaveText means any wording change to the confirmation message (“Order confirmed”, “Thank you for your order”, “Order placed”) breaks the test for no behavior reason. That’s the over-specification trap.
Promise 1: Goldilocks (toContainText(/order|confirm/i)). Promise 2: Tight (toHaveText matching the actual order ID). Promise 3: Tight (assert the exact email arrived in the mailbox).
Right. Each promise gets the specificity its spec demands. Promise 1 (“user sees confirmation”) is loose-specced — wording isn’t promised, so Goldilocks. Promise 2 (“order ID shown”) IS the contract — the specific ID matters. Promise 3 (“email is sent”) is binary reality — the email either arrived or didn’t, so a tight assertion against the test mailbox is appropriate.
All three with toBeVisible() to keep the test minimal
toBeVisible() for Promise 2 (“order ID shown”) doesn’t verify the ID is correct — only that something renders. A bug that shows a hardcoded “ORDER-XXX” instead of the real ID would pass this assertion. Under-specified.
Skip Promise 3 because emails are too hard to test
Email IS the contract for Promise 3 — skipping it means the test can’t catch the most expensive failure mode (the user didn’t get their receipt). Use a test mailbox or queue inspection. “Hard to test” is a maintenance argument, not a spec argument.

Multi-promise features need per-promise specificity decisions. Each promise has its own answer to “what exactly is this asserting, and what’s allowed to change?” Pinning everything strictly creates a brittle suite; pinning everything loosely creates a leaky one. The skill is judgment: read each promise, decide its specificity independently.
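As a browser-free illustration of the per-promise choices in the correct answer, the sketch below applies one oracle per promise to made-up data. Every value and name here (the confirmation text, the order ID, the mailbox array) is hypothetical; in a real test the first two checks would be toContainText and toHaveText on locators, and the third would query a test mailbox.

```javascript
// Hypothetical observed outcome of one checkout run.
const observed = {
  confirmationText: 'Thank you for your order!',
  orderIdText: 'ORD-1042',
  mailbox: [{ to: 'user@example.test', subject: 'Receipt for ORD-1042' }],
};
const expectedOrderId = 'ORD-1042';

// Promise 1: wording is not promised, so tolerate variants.
const confirmationShown = /order|confirm/i.test(observed.confirmationText);

// Promise 2: the ID itself is the contract, so compare exactly.
const orderIdShown = observed.orderIdText === expectedOrderId;

// Promise 3: binary, a matching email either arrived or it did not.
const emailArrived = observed.mailbox.some((mail) =>
  mail.subject.includes(expectedOrderId)
);

// Why a visibility-style check is too loose for Promise 2:
const placeholderIsVisible = 'ORDER-XXX'.length > 0;          // true: the loose check passes
const placeholderMatchesId = 'ORDER-XXX' === expectedOrderId; // false: the exact check catches it

console.log({ confirmationShown, orderIdShown, emailArrived, placeholderIsVisible, placeholderMatchesId });
```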

2. Your team built a notifications panel with these spec’d behaviors:

Unread notifications show a badge with the count (red in the current design).
Clicking the bell icon opens the panel.
Notifications are listed in reverse chronological order.

A designer changes the badge color from red to orange (no spec change). The team’s e2e test fails because it asserts await expect(badge).toHaveCSS('background-color', 'rgb(239, 68, 68)'). What’s the right diagnosis?

Real regression — the badge color is part of the spec
The spec line mentions red, but the question states that the spec did not change when the color did, so the shade is a design detail rather than a promised behavior. The promise that matters is the badge with its count; the test should assert that, not the exact RGB value.
False alarm — the test pinned implementation (specific RGB color) instead of the user-visible promise (the badge with a count)
Right. The test pinned the RGB color value (rgb(239, 68, 68)), which is implementation. The spec’s promise is “badge with the count”: the count is the contract, not the specific color shade. Asserting await expect(badge).toContainText(String(unreadCount)) would survive any color change while still verifying the user-facing behavior.
The team should add red and orange as accepted values to the assertion
Adding multiple accepted colors is a maintenance smell — every redesign expands the OR-list. The deeper fix is to stop testing the color at all if the color isn’t in the spec.
Designers shouldn’t change colors without updating tests
Tests should adapt to design changes, not vice versa. If the test breaks for a design refresh that didn’t change the spec, the test is brittle — that’s exactly Step 5’s lesson applied to assertions instead of locators.

The principle works on both sides — locators (Step 5) and assertions (Step 6). When an assertion pins something the spec doesn’t promise (specific color, exact wording, internal classnames), it generates false alarms. The fix is to find the user-facing promise and pin only that.

3. (Spaced review — Steps 1–6, the integration question) Imagine you’re writing an e2e test for a new feature, before any code exists. Which is the most useful first step?

Open the Playwright codegen tool and click through a planned flow
Codegen records clicks — it doesn’t know your spec. The result is a click-script test, exactly the anti-pattern Step 1 introduced. Codegen is useful as a starting point for mechanics, but not for design decisions. The Spec Card comes first.
Read the spec, then write a Spec Card before any test code: behavior, allowed implementation changes, required failures, locator contract, oracle
Right. The Spec Card forces you to answer the load-bearing questions before you write code: what does this prove? What can change without breaking it? What changes must break it? Once you’ve answered those, the test almost writes itself — and it’s robust by construction.
Look at existing tests for similar features and copy the locator and assertion patterns
Patterns from existing tests are useful style references, but copying without thinking about this feature’s specific spec leads to the wrong specificity for this test. The Spec Card forces you to think feature-specifically.
Write the assertion first, then work backward to the actions
Working backward from the assertion is good practice for AAA structure, but only after you know what to assert. The Spec Card answers that — its Oracle field is what you’ll assert.

The Spec Card is the central artifact this tutorial built up to. Every test should start with one — even a small one written in 30 seconds. The cost of writing it is small; the cost of not writing it is the brittle/loose tests you’ve been learning to avoid.

8

From-Scratch Capstone: Write a Test From a Spec Card Alone

Why this matters

Filling in a TODO inside a tutorial scaffold is not the skill you’ll need at work. At work you get a behavior, an empty file, and a deadline. The gap between “I can finish the test someone started” and “I can write the test from a blank buffer” is enormous — and most Playwright tutorials never close it. This step does. It’s the moment the training wheels come off.

🎯 You will learn to

Create a complete Playwright test — from import to closing }); — given only a behavior spec
Apply every prior step’s discipline (Spec Card, locator ladder, web-first assertions, per-promise specificity) without a stub to lean on
Evaluate your own test against the gates: does it survive harmless refactors and catch real regressions?

🪜 The training wheels come off

Every previous step gave you something to start with: a stub, a TODO, a worked example sitting just above the box where you typed. This step gives you nothing. An empty file. A spec. Your judgment.

That’s how it works at work, and it’s the habit this step builds.

📋 The spec — read carefully, don’t skim

The Todo app from Step 7 supports marking items as done. The team has just added a small new spec promise:

Promise. When every todo in the list is marked done, the remaining-count display shows a count of 0 (today it reads "0 items remaining"), and all the original todos remain visible (done items are not deleted from the list).

Two specific user paths the team wants covered:

Mark-all-then-check. Add three todos. Mark all three as done. The count should read 0; all three items should still be in the list.
Toggle-back-restores. Add two todos. Mark both done. Then unmark one. The count should be 1; both items still in the list.

🃏 Your Spec Card (write this BEFORE you write code — on paper or as a comment)

Fill in the five fields:

Field	Example shape
Behavior	One sentence: what user-visible behavior are you proving?
Should pass when	List the implementation changes the test must survive (CSS class renames, button text tweaks, etc.)
Required failures	List the regressions the test must catch (count not decrementing, items deleted on done, etc.)
Locator contract	Which semantic queries (`getByRole`, `getByLabel`, etc.) — and why each one
Oracle	Per-promise: what assertion shape pins each promise at the right specificity?

Once your Spec Card has all five fields, then open tests/all-done.spec.js and start typing. You will see only the import line and a spec-recap comment; everything else is yours.

✏️ Write the test

Open tests/all-done.spec.js (currently has only the import line and a spec-recap comment). Write two tests covering the two user paths above. Both must:

Use getByRole / getByLabel for every locator (no CSS classes, no XPath).
Use await expect(...) for every assertion (no one-shot checks such as expect(await locator.isVisible()).toBe(true), which read the page once and never retry).
Match assertion specificity to spec specificity: the count number IS the contract, but the wording around it (“0 items remaining” vs “0 left to do”) is not.
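The second rule matters because expect(await locator.isVisible()).toBe(true) reads the page once, whereas await expect(locator).toBeVisible() keeps retrying until the condition holds or the timeout expires. The browser-free sketch below imitates that difference. It is a simplification: Playwright retries against a timeout, while this loop counts attempts so the result stays deterministic, and makeSlowElement, oneShotCheck, and retryingCheck are hypothetical names.

```javascript
// Simulated element that only becomes visible on the third read,
// standing in for a UI that finishes rendering a moment after a click.
function makeSlowElement(readsUntilVisible) {
  let reads = 0;
  return { isVisible: () => ++reads >= readsUntilVisible };
}

// One read, no retry: analogous to expect(await locator.isVisible()).toBe(true).
function oneShotCheck(element) {
  return element.isVisible();
}

// Re-read until visible or out of attempts: analogous in spirit to
// await expect(locator).toBeVisible().
function retryingCheck(element, maxAttempts) {
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    if (element.isVisible()) return true;
  }
  return false;
}

const oneShotResult = oneShotCheck(makeSlowElement(3));       // false: checked too early
const retryingResult = retryingCheck(makeSlowElement(3), 10); // true: the third read succeeds

console.log({ oneShotResult, retryingResult });
```

The one-shot version fails here even though the element does appear, which is the flakiness the rule is meant to prevent.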

📋 What the gates check

The gates check that the file you wrote has:

An import line for test, expect.
Two test('...', async ({ page }) => { … }); blocks.
At least one await page.goto(...) per test.
At least one await expect(...) per test.
At least one getByRole(...) locator (proving you used the accessibility tree).
And of course: both tests must actually pass against the running app.

Don’t peek at Step 7’s solution mid-task. The point of this step is not the answer; it’s the typing-from-blank habit.

Starter files

src/App.jsx

function App() {
  const [items, setItems] = React.useState([]);
  const [text, setText] = React.useState('');

  function addTodo() {
    const trimmed = text.trim();
    if (!trimmed) return;
    setItems([...items, { text: trimmed, done: false }]);
    setText('');
  }

  function toggleDone(idx) {
    setItems(items.map((item, i) =>
      i === idx ? { ...item, done: !item.done } : item
    ));
  }

  const remainingCount = items.filter((item) => !item.done).length;

  return (
    <main className="todo-shell">
      <section className="todo-panel">
        <p className="eyebrow">Todo Lab — From-Scratch Capstone</p>
        <h1>Todo Lab</h1>

        <div className="todo-form">
          <label htmlFor="todo-input">Todo item</label>
          <div className="todo-row">
            <input
              id="todo-input"
              value={text}
              onChange={(event) => setText(event.target.value)}
              placeholder="Buy milk"
            />
            <button onClick={addTodo}>Add todo</button>
          </div>
        </div>

        <p role="status" className="status-line">
          {remainingCount} items remaining
        </p>

        <ul aria-label="Todo list" className="todo-list">
          {items.map((item, idx) => (
            <li key={idx} className={item.done ? 'todo-done' : ''}>
              <button
                className="todo-toggle"
                onClick={() => toggleDone(idx)}
                aria-pressed={item.done}
              >
                {item.text}
              </button>
            </li>
          ))}
        </ul>
      </section>
    </main>
  );
}

src/main.jsx

const root = ReactDOM.createRoot(document.getElementById('root'));
root.render(<App />);

src/styles.css

body { margin: 0; font-family: system-ui, -apple-system, sans-serif; background: #f6f7fb; color: #1f2937; }
.todo-shell { min-height: 100vh; display: grid; place-items: center; padding: 32px; }
.todo-panel { width: min(100%, 560px); background: white; border: 1px solid #d9dee8; border-radius: 8px; padding: 28px; box-shadow: 0 18px 40px rgba(31, 41, 55, 0.08); }
.eyebrow { margin: 0 0 8px; color: #4b5563; font-size: 0.85rem; font-weight: 700; text-transform: uppercase; letter-spacing: 0.04em; }
h1 { margin: 0 0 24px; font-size: 2rem; }
label { display: block; margin-bottom: 8px; font-weight: 700; }
.todo-row { display: flex; gap: 10px; }
input { flex: 1; min-width: 0; background: white; color: #1f2937; border: 1px solid #b8c0cc; border-radius: 6px; padding: 10px 12px; font: inherit; }
.todo-row > button { border: 0; border-radius: 6px; padding: 10px 14px; background: #2563eb; color: white; font: inherit; font-weight: 700; cursor: pointer; }
.status-line { margin: 18px 0 0; color: #4b5563; font-weight: 600; }
.todo-list { margin: 12px 0 0; padding-left: 0; list-style: none; }
.todo-list li { margin: 8px 0; }
.todo-toggle { width: 100%; text-align: left; background: transparent; border: 1px solid #d9dee8; border-radius: 6px; padding: 10px 12px; font: inherit; cursor: pointer; }
.todo-toggle[aria-pressed="true"] { background: #ecfdf5; border-color: #10b981; }
.todo-done .todo-toggle { text-decoration: line-through; color: #6b7280; }
[data-bs-theme="dark"] body { background: #1c2533; color: #e6edf3; }
[data-bs-theme="dark"] .todo-panel { background: #232a36; border-color: #2a323e; box-shadow: 0 18px 40px rgba(0, 0, 0, 0.4); }
[data-bs-theme="dark"] .eyebrow { color: #9ca3af; }
[data-bs-theme="dark"] input { background: #2a323e; color: #e6edf3; border-color: #3a4351; }
[data-bs-theme="dark"] input::placeholder { color: #6b7280; }
[data-bs-theme="dark"] .todo-row > button { background: #2563eb; }
[data-bs-theme="dark"] .todo-toggle { background: transparent; color: #e6edf3; border-color: #3a4351; }
[data-bs-theme="dark"] .todo-toggle[aria-pressed="true"] { background: #064e3b; border-color: #10b981; }

tests/all-done.spec.js

import { test, expect } from '@playwright/test';

// ─────────────────────────────────────────────────────────────
// From-scratch capstone. Two tests, both written by you, both
// following the spec at the top of the step. No TODOs, no stubs.
//
// Spec recap (write this as a comment block before each test):
//   Promise: marking all todos done makes the count read 0,
//            and all items remain visible.
//   Path 1:  add 3 todos, mark all 3 done, expect count = 0
//            and 3 listitems still visible.
//   Path 2:  add 2 todos, mark both done, unmark one,
//            expect count = 1, both listitems visible.
// ─────────────────────────────────────────────────────────────

Solution

tests/all-done.spec.js

import { test, expect } from '@playwright/test';

test('marking every todo done shows count 0 and keeps all items visible', async ({ page }) => {
  await page.goto('/');

  for (const t of ['Milk', 'Bread', 'Eggs']) {
    await page.getByRole('textbox', { name: /todo item/i }).fill(t);
    await page.getByRole('button', { name: /add todo/i }).click();
  }

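  // Act: mark every todo as done.
  for (const t of ['Milk', 'Bread', 'Eggs']) {
    await page.getByRole('button', { name: t }).click();
  }

  // Assert: the count reads 0 (pin the number, not the wording).
  await expect(page.getByRole('status')).toContainText('0');
  // Assert: all three items are still in the list.
  await expect(page.getByRole('listitem')).toHaveCount(3);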
});

test('unmarking one todo restores the count to 1, both items still visible', async ({ page }) => {
  await page.goto('/');

  // Arrange: add two todos through the UI.
  for (const t of ['Milk', 'Bread']) {
    await page.getByRole('textbox', { name: /todo item/i }).fill(t);
    await page.getByRole('button', { name: /add todo/i }).click();
  }

  const milkToggle = page.getByRole('button', { name: 'Milk' });
  const breadToggle = page.getByRole('button', { name: 'Bread' });

  // Act: mark both done, then un-mark Milk.
  await milkToggle.click();
  await breadToggle.click();
  await milkToggle.click();   // un-mark Milk

  // Assert: one item remains, both are still listed, and each toggle reports its state.
  await expect(page.getByRole('status')).toContainText('1');
  await expect(page.getByRole('listitem')).toHaveCount(2);
  await expect(milkToggle).toHaveAttribute('aria-pressed', 'false');
  await expect(breadToggle).toHaveAttribute('aria-pressed', 'true');
});

Two tests, two promises, no scaffolding. Notice every choice the Spec Card forced you to commit to: semantic locators (getByRole everywhere); per-promise specificity (toContainText('0') for the count, because the number is the contract and the wording around it isn’t; toHaveCount for “items still in the list”, because the exact count is the contract); and aria-pressed to verify the toggle state semantically rather than via the .todo-done class.

If you wrote a test that pins the count to the literal string "0 items remaining", your test passes today but breaks when product changes the wording to “0 left to do”: over-specified. If you wrote toBeVisible() on a single listitem (for example, the first one) instead of toHaveCount(3) on all of them, your test still passes when 3 items become 1: under-specified. The Spec Card was the tool that made each of those choices visible before you typed.
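The over-specified / under-specified distinction can be sketched outside the browser with plain string checks. This is only an illustration: the status texts and the three predicate names below are hypothetical, and `containsZero` merely mimics the substring behavior of toContainText('0') rather than calling Playwright.

```javascript
// Hypothetical status texts: today's wording, a harmless rewording, and a wrong count.
const today = '0 items remaining';
const reworded = '0 left to do';
const wrongCount = '10 items remaining';

// Over-specified: pins the whole sentence, so a harmless rewording fails.
const exactWording = (text) => text === '0 items remaining';

// Substring check (the shape of toContainText('0')): survives the rewording,
// but accepts any text that merely contains the character "0".
const containsZero = (text) => text.includes('0');

// Whole-number check: survives the rewording and rejects "10".
const countIsZero = (text) => /\b0\b/.test(text);

console.log(exactWording(today), exactWording(reworded));     // true false
console.log(containsZero(reworded), containsZero(wrongCount)); // true true
console.log(countIsZero(reworded), countIsZero(wrongCount));   // true false
```

With at most three todos on the page, a plain substring check cannot be fooled by "10". If the list could grow past nine items, passing a regex such as /\b0\b/ to toContainText is one way to keep pinning the number and nothing else.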

Step 8 — Knowledge Check

Min. score: 80%

1. (Cumulative — Steps 3 + 6.) You’re testing a button that the team has announced will be renamed from “Submit” to “Place order” next quarter. The action it performs (submitting the order) won’t change. Which locator + assertion shape best matches the spec?

page.getByRole('button', { name: /submit/i }).click() + await expect(page.getByText('Order placed')).toBeVisible() — survives the wording until next quarter.
It works until next quarter, then breaks the day the rename ships — a known wording change should push you off the role+name locator. Use getByTestId when the action is stable but the wording isn’t.
page.getByTestId('submit-order-action').click() + await expect(page.getByRole('status')).toContainText(/order placed|placed your order/i) — pins the action via test ID, pins the outcome via the regex.
Right. The action (‘submit the order’) is the stable contract — getByTestId('submit-order-action') honors that. The outcome region is named by role; the regex tolerates wording variants. Both choices are on the diagonal: pin the action, tolerate the wording.
page.locator('.btn-primary.submit').click() + await expect(page.locator('.confirmation')).toBeVisible() — most specific to today’s UI.
CSS-class locators are the brittle rung (Step 5). They depend on styling, not behavior. The wording-change-resistance is also accidental, not deliberate.
page.getByText('Submit').click() + await expect(page.getByText('Order placed')).toBeVisible() — text on both sides.
getByText('Submit') will break next quarter when ‘Submit’ becomes ‘Place order’. Same fate as the role+name approach — the spec said the wording would change, so don’t pin it.

When the spec tells you wording is going to change but the action is permanent, that’s the canonical case for getByTestId with a semantic test ID. Pair it with a Goldilocks assertion on the outcome region (role + regex) and you’ve matched specificity to spec on both sides.
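To see what “tolerates wording variants” means for the regex in the correct answer, here is a small sketch in plain JavaScript. The candidate confirmation messages are made up for illustration; only the pattern itself comes from the question.

```javascript
// The outcome pattern from the answer above.
const outcomePattern = /order placed|placed your order/i;

// Hypothetical confirmation messages the status region might show.
const candidates = [
  'Order placed',
  'We have placed your order',
  'ORDER PLACED!',
  'Order failed',
];

// true for every wording variant the pattern tolerates, false for a different outcome.
const results = candidates.map((text) => outcomePattern.test(text));

console.log(results); // [ true, true, true, false ]
```

The pattern accepts two phrasings in any letter case but still rejects a message that reports a different outcome, which is the balance the question is asking for.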

2. (Cumulative — Step 5.) A test using getByRole('button', { name: 'Add todo' }) (exact name, not regex) fails after marketing renamed the button to “Add”. The behavior is unchanged. What’s the most accurate diagnosis?

Real regression — the button no longer adds todos.
The behavior didn’t change — the user can still add todos. The test broke for a wording change, not a behavior change. That’s the textbook false alarm from Step 5.
False alarm — the test pinned the exact wording, but the wording wasn’t promised by the spec. Switch to name: /add/i (regex) or getByTestId.
Right. Exact name: 'Add todo' pins the wording. A rewording with no spec change is exactly the false alarm Step 5 made tactile. The fix depends on the spec — if ‘Add todo’ specifically wasn’t promised, regex (/add/i) or getByTestId is the right rung.
Flaky — re-running it will probably pass.
Flakiness is intermittent. This is a deterministic failure caused by a deterministic UI change — re-running won’t help.
Operator error — the developer should have updated the test along with the button.
‘Update the test along with the button’ is the brittle-test trap: every wording change forces a test edit. The fix is to write a locator that doesn’t pin the wording in the first place — that’s the entire lesson of Step 5.

False alarms erode trust in the test suite faster than anything else. The fix isn’t to reactively patch the test on every UI change — it’s to choose locators whose contract matches what the spec actually promises.
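The false alarm in question 2 can be reproduced with a simplified model of name matching. The `matchesName` helper below is hypothetical: it approximates Playwright's documented default for a string `name` (a case-insensitive substring match) and is not the library's implementation.

```javascript
// Simplified, hypothetical model of the `name` option:
// a string is matched as a case-insensitive substring, a RegExp is tested directly.
function matchesName(accessibleName, expected) {
  if (expected instanceof RegExp) return expected.test(accessibleName);
  return accessibleName.toLowerCase().includes(expected.toLowerCase());
}

const before = 'Add todo'; // button name before the rename
const after = 'Add';       // button name after the rename

console.log(matchesName(before, 'Add todo')); // true
console.log(matchesName(after, 'Add todo'));  // false: the false alarm
console.log(matchesName(before, /add/i));     // true
console.log(matchesName(after, /add/i));      // true
```

The looser pattern survives the rename, but it would also match any other button whose name contains "add". Whether that is acceptable depends on what the spec promises and on which other buttons are on the page.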

3. (Cumulative — Steps 4 + 7.) A “Mark complete” feature has two spec’d promises: (1) the item shows visually that it’s complete, (2) the remaining-count decrements. Which assertion set best catches both regressions while surviving harmless styling changes?

await expect(item).toHaveCSS('text-decoration', /line-through/) + await expect(counter).toContainText(String(expectedRemaining))
Promise 1 pinned the visual effect (strikethrough). The visual is incidental — if the design changes to a checkmark icon or color change instead, the test breaks for no spec reason. Use the semantic contract (aria-pressed).
await expect(itemToggle).toHaveAttribute('aria-pressed', 'true') + await expect(page.getByRole('status')).toContainText(String(expectedRemaining))
Right. Per-promise specificity (Step 7): semantic ARIA contract for the toggle state, count-as-number for the counter. Both are on the diagonal — they survive design changes (Step 6) and catch real regressions (Step 4).
await expect(item).toBeVisible() + await expect(counter).toBeVisible()
Both toBeVisible() calls are liar-test territory (Step 4): an empty item is still visible, and a counter showing the wrong number is still visible. Neither pins the actual promise.
await expect(page.locator('.completed')).toBeVisible() + await expect(page.locator('.count').first()).toHaveText('2 items remaining')
.completed is a CSS class (brittle, Step 5) and '2 items remaining' over-pins the wording (Step 6). Both choices are off-diagonal in the wrong direction.

Multi-promise features (Step 7) require per-promise specificity decisions. Each promise gets its own assertion shape — semantic for the toggle state, count-as-number for the counter — and each independently honors the principle: pin what the spec promises, no more, no less.

4. What’s the single most useful artifact you produced in this step?

The two passing tests
The tests are the output. The Spec Card is the method that produced them. The output is one feature; the method scales to every feature you’ll ever test.
The Spec Card you filled in before writing the tests — that’s the artifact you’ll reuse on every test you write at work
The locator queries
Locator queries are tactics, not strategy. Without the Spec Card to drive which locator to use, you’re guessing each time.
The assertion patterns
Assertion patterns are tactics. The Spec Card decides which assertion pattern fits which promise — that’s the higher-order skill.

Tests are downstream of decisions. The Spec Card is the upstream artifact that made every decision visible before you typed. Carry the habit. On your first job’s first PR, the difference between writing a brittle test and a robust one is whether you wrote the Spec Card before opening the test file.

Playwright Tutorial: End-to-End Testing for React Apps

Anatomy of a Playwright Test: Navigate, Interact, Assert

Why this matters

🎯 You will learn to

🔄 Concept bridge

🌳 Primer: what getByRole actually queries

Read this test (don’t run yet)

🎬 Predict — commit to a letter, then click reveal

▶ Run

🔍 Investigate

✏️ Modify — predict the failure shape, then run

📝 House rule (carry it forward)

Solution

Step 1 — Knowledge Check

The Spec Card: Choosing What User Paths Deserve a Test

Why this matters

🎯 You will learn to

🧠 Quick recall — commit before reading on

From foundations partitions to user-path partitions

📋 Introducing the Spec Card

✏️ Fill in your own Spec Card — pick one of two ways

🎬 Predict — which user-path partitions are missing?

▶ Run

✏️ Modify — write the missing partition test

🔍 Investigate

📝 House rules added

Solution

Step 2 — Knowledge Check

The Locator Ladder: Stable Contracts vs Incidental UI

Why this matters

🎯 You will learn to

🧠 Quick recall — commit before reading on

🎯 The locator ladder

🎬 Predict — commit to a letter, then click reveal

▶ Run

🔍 Investigate — reveal the answer table

✏️ Modify

📝 House rule

Solution

Step 3 — Knowledge Check

Strong Assertions: The Liar Test in the Browser

Why this matters

🎯 You will learn to

🧠 Quick recall — commit before reading on

🎬 Predict — commit to a letter, then click reveal

▶ Run

🔍 Investigate — open src/App.jsx and find the bug

Three weak assertion patterns to recognize

✏️ Modify

📝 House rule

Solution

Step 4 — Knowledge Check

Behavior, Not Implementation: The Brittleness Gauntlet

Why this matters

🎯 You will learn to

🧠 Quick recall — commit before reading on

Two tests, same behavior, two locator strategies

🎬 Predict — Round 1: CSS class rename. Commit to a letter, then click reveal.

✏️ Edit App.jsx (one line)

▶ Run

🔍 Investigate

🔄 Mini-gauntlet, Round 2 (preview)

📝 House rule

Solution

Step 5 — Knowledge Check

The Maintenance Trade-off: Pin the Spec, No More, No Less

Why this matters

🎯 You will learn to

🧠 Quick recall — commit before reading on

🎯 The principle

The feature

Three candidate assertions

🎬 Predict — Scenario A: marketing changes wording. Commit, then click reveal.

🎬 Predict — Scenario B: an off-by-one regression. Commit, then click reveal.

▶ Run

✏️ Edit App.jsx — introduce the off-by-one bug

▶ Run again

🔍 Investigate — Scenario B results

The 2×2 grid that crystallizes the lesson

📝 House rule
