Playwright Tutorial: End-to-End Testing for React Apps
Translate the testing concepts from Testing Foundations into the browser. Write end-to-end tests in Playwright that test behavior, not implementation — tests that survive harmless refactors and fail for real bugs.
Anatomy of a Playwright Test: Navigate, Interact, Assert
Learning objective. After this step you can read a basic Playwright test for a React app and identify how each line maps onto the Arrange / Act / Assert pattern from Testing Foundations.
In Testing Foundations you wrote tests like this:
def test_valid_name_accepted():
    assert squad_name_valid("epic") is True
That test verifies one function in isolation. A Playwright test verifies a whole React app through a real browser, the way a user experiences it. Same AAA bones, different organism.
🔄 Concept bridge
| Testing Foundations (pytest) | Playwright (e2e) |
|---|---|
| Arrange / Act / Assert | Navigate / Interact / Assert |
| Function inputs | User actions through the UI |
| Direct return value | Observable outcome on the page |
| Synchronous | Async (await everywhere) |
| Strong oracle = `==` exact match | Strong oracle = toHaveText, toHaveCount, … |
The discipline is the same. The mechanics differ.
Read this test (don’t run yet)
import { test, expect } from '@playwright/test';

test('user can add a todo', async ({ page }) => {
  await page.goto('/'); // Navigate
  await page.getByRole('textbox', { name: /todo item/i }).fill('Milk'); // Interact
  await page.getByRole('button', { name: /add todo/i }).click(); // Interact
  await expect(page.getByRole('listitem')).toHaveText('Milk'); // Assert
});
Annotations that matter:
- `async ({ page }) => { … }`: every Playwright test is async. `page` is your handle to the browser tab.
- `await` on every line: the browser is asynchronous. Without `await`, JavaScript races past the click before React’s state has updated.
- `getByRole('button', { name: /add todo/i })`: finds the button the way assistive tech finds it, by its accessible name, not by its CSS class or DOM position.
- `await expect(...).toHaveText('Milk')`: Playwright’s web-first assertions auto-wait and retry until the condition holds (or the timeout expires). They’re the right tool for asynchronous UI.
🎬 Predict (in your head, before running)
- Which lines are Navigate? Interact? Assert?
- If we changed `name: /add todo/i` to `name: /save/i`, what would happen?
▶ Run
Click Test in the Live Preview toolbar. The test passes against the demo Todo app.
🔍 Investigate
Why is await on every line? The browser is asynchronous: clicking a button doesn’t instantly produce the result. await says “wait for this to finish before moving on.” Without await, the assertion would race past the click before React re-rendered, and the test would either fail or — worse — pass for the wrong reason.
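The race described above can be reproduced without a browser. The sketch below is not Playwright: `createFakeApp` and `clickAdd` are hypothetical stand-ins for an app whose state updates a moment after a click, used only to show what skipping `await` does.

```javascript
// Hypothetical stand-in for an app whose state updates shortly after a click.
// This is NOT Playwright; the names are invented for illustration.
function createFakeApp() {
  const items = [];
  return {
    items,
    clickAdd(text) {
      // Resolve only once the simulated re-render has happened.
      return new Promise((resolve) => {
        setTimeout(() => {
          items.push(text);
          resolve();
        }, 10);
      });
    },
  };
}

async function compareAwait() {
  const racy = createFakeApp();
  racy.clickAdd('Milk'); // no await: execution continues before the update lands
  const countWithoutAwait = racy.items.length;

  const safe = createFakeApp();
  await safe.clickAdd('Milk'); // await: continue only after the update
  const countWithAwait = safe.items.length;

  return { countWithoutAwait, countWithAwait };
}

compareAwait().then((result) => console.log(result));
// prints { countWithoutAwait: 0, countWithAwait: 1 }
```

The un-awaited call reads the list while it is still empty, which is the same shape of mistake as asserting before the page has caught up.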
✏️ Modify
Change the assertion to look for 'Bread' instead of 'Milk'. Predict the failure message before running. Then run.
Did your prediction match the actual failure? If yes, you’re building the mental model that lets you debug Playwright tests by reading them, not by guessing.
📝 House rule (carry it forward)
A Playwright test reads navigate → interact → assert. The test title is the spec — what user-visible promise we’re proving — not a description of clicks.
function App() {
  const [items, setItems] = React.useState([]);
  const [text, setText] = React.useState('');

  function addTodo() {
    const trimmed = text.trim();
    if (!trimmed) return;
    setItems([...items, trimmed]);
    setText('');
  }

  return (
    <main className="todo-shell">
      <section className="todo-panel">
        <p className="eyebrow">Playwright tutorial</p>
        <h1>Todo Lab</h1>
        <div className="todo-form">
          <label htmlFor="todo-input">Todo item</label>
          <div className="todo-row">
            <input
              id="todo-input"
              value={text}
              onChange={(event) => setText(event.target.value)}
              placeholder="Buy milk"
            />
            <button onClick={addTodo}>Add todo</button>
          </div>
        </div>
        <ul aria-label="Todo list" className="todo-list">
          {items.map((item, index) => (
            <li key={index}>{item}</li>
          ))}
        </ul>
      </section>
    </main>
  );
}

const root = ReactDOM.createRoot(document.getElementById('root'));
root.render(<App />);
body {
margin: 0;
font-family: system-ui, -apple-system, BlinkMacSystemFont, "Segoe UI", sans-serif;
background: #f6f7fb;
color: #1f2937;
}
.todo-shell {
min-height: 100vh;
display: grid;
place-items: center;
padding: 32px;
}
.todo-panel {
width: min(100%, 560px);
background: white;
border: 1px solid #d9dee8;
border-radius: 8px;
padding: 28px;
box-shadow: 0 18px 40px rgba(31, 41, 55, 0.08);
}
.eyebrow {
margin: 0 0 8px;
color: #4b5563;
font-size: 0.85rem;
font-weight: 700;
text-transform: uppercase;
letter-spacing: 0.04em;
}
h1 { margin: 0 0 24px; font-size: 2rem; }
label { display: block; margin-bottom: 8px; font-weight: 700; }
.todo-row { display: flex; gap: 10px; }
input {
flex: 1;
min-width: 0;
background: white;
color: #1f2937;
border: 1px solid #b8c0cc;
border-radius: 6px;
padding: 10px 12px;
font: inherit;
}
button {
border: 0;
border-radius: 6px;
padding: 10px 14px;
background: #2563eb;
color: white;
font: inherit;
font-weight: 700;
cursor: pointer;
}
.todo-list { margin: 24px 0 0; padding-left: 24px; }
.todo-list:empty { display: none; }
.todo-list li { margin: 8px 0; }
/* Dark mode — the iframe inherits the host page's theme via
[data-bs-theme="dark"] on <html>. Mirror the site's dark palette
so the Todo app preview stays legible when students switch themes. */
[data-bs-theme="dark"] body { background: #1c2533; color: #e6edf3; }
[data-bs-theme="dark"] .todo-panel {
background: #232a36;
border-color: #2a323e;
box-shadow: 0 18px 40px rgba(0, 0, 0, 0.4);
}
[data-bs-theme="dark"] .eyebrow { color: #9ca3af; }
[data-bs-theme="dark"] input {
background: #2a323e;
color: #e6edf3;
border-color: #3a4351;
}
[data-bs-theme="dark"] input::placeholder { color: #6b7280; }
[data-bs-theme="dark"] button { background: #3b82f6; }
import { test, expect } from '@playwright/test';

test('user can add a todo', async ({ page }) => {
  await page.goto('/');
  await page.getByRole('textbox', { name: /todo item/i }).fill('Milk');
  await page.getByRole('button', { name: /add todo/i }).click();
  await expect(page.getByRole('listitem')).toHaveText('Milk');
});
Anatomy of a Playwright Test — Knowledge Check
Min. score: 80%
1. Which of these test titles best describes a behavioral spec (rather than a click-script)?
Test names should read like product promises, not click sequences. A good rule of thumb: if a future developer sees the test fail in CI, can they tell from the name alone what user-facing thing broke? If yes, the name is doing its job.
2. Why does this Playwright assertion need await?
await expect(page.getByText('Milk')).toBeVisible();
await expect(locator).matcher() is the canonical Playwright shape. The matcher retries until it succeeds or hits the timeout. Without await, JavaScript fires the matcher and immediately moves on, ignoring whether it ever held.
3. In the test below, which line is the Assert step?
test('user can add a todo', async ({ page }) => {
  await page.goto('/'); // Line 1
  await page.getByRole('textbox', { name: /todo item/i }).fill('Milk'); // Line 2
  await page.getByRole('button', { name: /add todo/i }).click(); // Line 3
  await expect(page.getByText('Milk')).toBeVisible(); // Line 4
});
Playwright’s navigate / interact / assert is the same shape as foundations’ Arrange / Act / Assert. Each test should have one assertion phase that verifies the user-visible promise. If you can’t point to which line is the assertion, the test probably isn’t proving what you think.
The Spec Card: Choosing What User Paths Deserve a Test
Learning objective. After this step you can write a Spec Card for a feature and use it to choose which user-path partitions deserve an end-to-end test (and which belong in lower test layers).
🧠 Quick recall (don’t scroll back)
- What’s the navigate-interact-assert rhythm?
- What does `await` buy us in front of `expect(...)`?
From foundations partitions to user-path partitions
In Testing Foundations, you partitioned the input space of a function and picked one representative input per partition. In e2e, you partition the user-path space — the different user behaviors a feature has to support — and pick one representative test per partition.
Same discipline. Different domain.
📋 Introducing the Spec Card
Before you write an e2e test, write down the spec it’s verifying. Five fields, fits on screen:
Spec Card: User can add a todo
✓ Behavior: User types a name, clicks Add, sees it in the list.
✓ Should pass when: CSS classes change. The Add button is restyled.
The input becomes a `<textarea>`. The list becomes
a table.
✗ Should fail when: Adding silently drops items. Empty inputs are
accepted. The input doesn't clear after add.
🎯 Locator contract: A textbox labeled "Todo item"; a button named
"Add todo"; a list of items.
✅ Oracle: The new item is visible in the list.
The Spec Card is the artifact you carry through the rest of the tutorial. It forces the question what about this UI is the stable contract? before you write code that can pin the wrong thing.
Notice the “Should pass when” line: it lists implementation changes that should not break the test. That’s your defense against brittleness later.
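One way to keep a Spec Card honest is to treat it as data and check that no field was skipped. The sketch below is only an illustration: the field names (`behavior`, `shouldPassWhen`, and so on) and the helper `missingSpecFields` are assumptions made for this example, not part of Playwright or of the tutorial's tooling.

```javascript
// The five Spec Card fields, under assumed property names.
const SPEC_CARD_FIELDS = [
  'behavior',
  'shouldPassWhen',
  'shouldFailWhen',
  'locatorContract',
  'oracle',
];

// Return the names of fields that are absent or empty on a card.
function missingSpecFields(card) {
  return SPEC_CARD_FIELDS.filter((field) => {
    const value = card[field];
    if (Array.isArray(value)) return value.length === 0;
    return typeof value !== 'string' || value.trim() === '';
  });
}

// The card from this step, transcribed as an object.
const addTodoCard = {
  title: 'User can add a todo',
  behavior: 'User types a name, clicks Add, sees it in the list.',
  shouldPassWhen: ['CSS classes change', 'The Add button is restyled'],
  shouldFailWhen: [
    'Adding silently drops items',
    'Empty inputs are accepted',
    "The input doesn't clear after add",
  ],
  locatorContract:
    'A textbox labeled "Todo item"; a button named "Add todo"; a list of items.',
  oracle: 'The new item is visible in the list.',
};

// A half-written card: only the behavior is filled in.
const draftCard = { title: 'Draft', behavior: 'User does something.' };

console.log(missingSpecFields(addTodoCard)); // []
console.log(missingSpecFields(draftCard));
// [ 'shouldPassWhen', 'shouldFailWhen', 'locatorContract', 'oracle' ]
```

A card with missing fields is a hint that the test would be written before its contract is clear.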
🎬 Predict — which user-path partitions are missing?
Three tests are pre-written in tests/add-todo.spec.js. They cover:
- Happy path: "Milk" is accepted.
- Empty input: "" is rejected.
- Very long input: a 200-character string is accepted.
Read the spec under App.jsx: the app trims input before deciding. Which partition is missing from the tests?
(In your head, before reading on…)
Reveal
The missing partition is **whitespace-only input** (`" "`). After trimming, it equals `""`, so the spec says it should be rejected, exactly like the empty-string case from the partition perspective, but with a different surface input.
▶ Run
Click Test. Three tests pass; the fourth is a // TODO you’ll fill in next.
✏️ Modify — write the missing partition test
In tests/add-todo.spec.js, find the whitespace-only input is rejected test. The Arrange / Act / Assert comments are placeholders — fill them in, following the pattern of the three tests above.
Hints (use them in order if you get stuck):
- Look at the `empty input is rejected` test. The whitespace test is the same shape; only the input value changes.
- The assertion should prove that no list item was added.
- Use `toHaveCount(0)` on the list items.
🔍 Investigate
You now have four tests for one feature, each covering a different partition. Why not write a test for every possible input?
The foundations answer applies: representative coverage with low cost. We don’t need a separate test for every whitespace-only string (one space, two spaces, three spaces, …): they’re all in the same partition (whitespace-only) and the trimming logic processes them identically. One representative test per partition is enough.
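The partition idea can be checked cheaply outside the browser. The sketch below assumes a helper, `wouldAccept`, that mirrors the trim-then-reject rule in `addTodo`; the helper and the partition table are written for this example and are not part of the app.

```javascript
// Mirrors the rule in addTodo: trim the text, reject it if nothing is left.
function wouldAccept(text) {
  return text.trim().length > 0;
}

// One representative input per behaviorally distinct partition.
const partitions = [
  { name: 'happy path', input: 'Milk', accepted: true },
  { name: 'empty', input: '', accepted: false },
  { name: 'whitespace-only', input: '   ', accepted: false },
  { name: 'very long', input: 'x'.repeat(200), accepted: true },
];

const partitionResults = partitions.map((p) => ({
  name: p.name,
  ok: wouldAccept(p.input) === p.accepted,
}));

// Different surface inputs that all land in the whitespace-only partition.
const whitespaceVariants = [' ', '  ', '\t', ' \n '].map(wouldAccept);

console.log(partitionResults.every((r) => r.ok)); // true
console.log(whitespaceVariants); // [ false, false, false, false ]
```

Every whitespace variant takes the same path through the rule, which is why one representative e2e test covers the partition; the remaining variants, if they matter, fit in a unit test like this one.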
📝 House rules added
- Use partitions to choose user paths. You don’t need a test for every string. You need one test per behaviorally-distinct partition.
- Not every test belongs in e2e. Many edge cases live more cheaply in unit tests. Reserve e2e tests for behaviors that need full-stack browser confidence.
function App() {
  const [items, setItems] = React.useState([]);
  const [text, setText] = React.useState('');

  function addTodo() {
    const trimmed = text.trim();
    if (!trimmed) return;
    setItems([...items, trimmed]);
    setText('');
  }

  return (
    <main className="todo-shell">
      <section className="todo-panel">
        <p className="eyebrow">Playwright tutorial</p>
        <h1>Todo Lab</h1>
        <div className="todo-form">
          <label htmlFor="todo-input">Todo item</label>
          <div className="todo-row">
            <input
              id="todo-input"
              value={text}
              onChange={(event) => setText(event.target.value)}
              placeholder="Buy milk"
            />
            <button onClick={addTodo}>Add todo</button>
          </div>
        </div>
        <ul aria-label="Todo list" className="todo-list">
          {items.map((item, index) => (
            <li key={index}>{item}</li>
          ))}
        </ul>
      </section>
    </main>
  );
}

const root = ReactDOM.createRoot(document.getElementById('root'));
root.render(<App />);
body { margin: 0; font-family: system-ui, -apple-system, sans-serif; background: #f6f7fb; color: #1f2937; }
.todo-shell { min-height: 100vh; display: grid; place-items: center; padding: 32px; }
.todo-panel { width: min(100%, 560px); background: white; border: 1px solid #d9dee8; border-radius: 8px; padding: 28px; box-shadow: 0 18px 40px rgba(31, 41, 55, 0.08); }
.eyebrow { margin: 0 0 8px; color: #4b5563; font-size: 0.85rem; font-weight: 700; text-transform: uppercase; letter-spacing: 0.04em; }
h1 { margin: 0 0 24px; font-size: 2rem; }
label { display: block; margin-bottom: 8px; font-weight: 700; }
.todo-row { display: flex; gap: 10px; }
input { flex: 1; min-width: 0; background: white; color: #1f2937; border: 1px solid #b8c0cc; border-radius: 6px; padding: 10px 12px; font: inherit; }
button { border: 0; border-radius: 6px; padding: 10px 14px; background: #2563eb; color: white; font: inherit; font-weight: 700; cursor: pointer; }
.todo-list { margin: 24px 0 0; padding-left: 24px; }
.todo-list:empty { display: none; }
.todo-list li { margin: 8px 0; }
/* Dark mode (iframe sets [data-bs-theme="dark"] on <html>) */
[data-bs-theme="dark"] body { background: #1c2533; color: #e6edf3; }
[data-bs-theme="dark"] .todo-panel { background: #232a36; border-color: #2a323e; box-shadow: 0 18px 40px rgba(0, 0, 0, 0.4); }
[data-bs-theme="dark"] .eyebrow { color: #9ca3af; }
[data-bs-theme="dark"] input { background: #2a323e; color: #e6edf3; border-color: #3a4351; }
[data-bs-theme="dark"] input::placeholder { color: #6b7280; }
[data-bs-theme="dark"] button { background: #3b82f6; }
import { test, expect } from '@playwright/test';

test('user can add a todo (happy path)', async ({ page }) => {
  await page.goto('/');
  await page.getByRole('textbox', { name: /todo item/i }).fill('Milk');
  await page.getByRole('button', { name: /add todo/i }).click();
  await expect(page.getByRole('listitem')).toHaveText('Milk');
});

test('empty input is rejected', async ({ page }) => {
  await page.goto('/');
  await page.getByRole('button', { name: /add todo/i }).click();
  await expect(page.getByRole('listitem')).toHaveCount(0);
});

test('very long todo is accepted', async ({ page }) => {
  await page.goto('/');
  const long = 'x'.repeat(200);
  await page.getByRole('textbox', { name: /todo item/i }).fill(long);
  await page.getByRole('button', { name: /add todo/i }).click();
  await expect(page.getByRole('listitem')).toHaveText(long);
});

// TODO: write the missing partition test here.
// The spec trims input before deciding whether to accept it,
// so whitespace-only input is in the same partition as empty input.
test('whitespace-only input is rejected', async ({ page }) => {
  // Arrange: navigate to the page.
  // Act: fill the input with whitespace, click Add todo.
  // Assert: no list item was added.
});
Spec Card & Partitions — Knowledge Check
Min. score: 80%
1. Which of these scenarios is the BEST candidate for an end-to-end test (rather than a unit or integration test)?
E2E tests are expensive confidence. Spend that budget on flows where the full integration matters: auth, routing, state-across-pages, cross-service behaviors. Push validation rules, formatters, and API contracts to lower test layers where they’re cheaper and clearer.
2. What is the purpose of the “Should pass when” field on a Spec Card?
The Spec Card’s “Should pass when” line forces you to think about the test’s durability before you write it. If you can predict that a CSS class rename should be harmless but you choose a CSS-class locator anyway, you’ve already lost.
3. (Spaced review — Step 1) A Playwright test contains the line:
expect(await page.getByText('Saved').isVisible()).toBe(true);
expect(await locator.isVisible()).toBe(true) is the canonical Playwright anti-pattern. Always use await expect(locator).toBeVisible() — the web-first form auto-waits and retries.
The Locator Ladder: Stable Contracts vs Incidental UI
Learning objective. After this step you can choose a locator that matches what’s stable about a UI element — and explain when each rung of the locator ladder is the right choice.
🧠 Quick recall
From your Spec Card in Step 2: what does the “Locator contract” field mean? Try to answer for the Todo app’s Add button before reading on.
🎯 The locator ladder
There are five common ways to find the same UI element in Playwright. Each rung depends on something different about the UI.
// Five ways to find the same "Add todo" button:
// Rung 1 — Role + accessible name. Mirrors how assistive tech finds it.
page.getByRole('button', { name: /add todo/i });
// Rung 2 — Label association (best for form controls).
page.getByLabel(/todo item/i); // (this would find the input, not the button)
// Rung 3 — Visible text content.
page.getByText('Add todo');
// Rung 4 — Author-supplied stable test ID.
page.getByTestId('add-todo');
// Rung 5 — Raw CSS/DOM selector (last resort).
page.locator('.add-todo-btn');
What each rung depends on:
| Rung | Locator | Depends on |
|---|---|---|
| 1 | getByRole + name: | The button has an accessible name (HTML semantics) |
| 2 | getByLabel | A <label for="…"> connection (forms) |
| 3 | getByText | Exact visible text |
| 4 | getByTestId | An author-added data-testid attribute |
| 5 | .locator('.css-class') | The DOM/CSS structure (implementation detail) |
Higher rungs depend on accessible / user-visible facts. Lower rungs depend on implementation decisions (CSS classes, DOM positions). The official Playwright docs put it bluntly: “Your DOM can easily change … Prefer user-facing attributes to XPath or CSS selectors.”
🎬 Predict — fill in this table (don’t peek)
For each locator, will it survive the change in each column? Mark ✓ (still works) or ✗ (breaks).
| Locator | CSS rename (.add-todo-btn -> .primary-btn) | Text change ("Add todo" -> "Add") | DOM restructure (button moved to footer) |
|---|---|---|---|
| 1. getByRole({name:/add/i}) | ? | ? | ? |
| 2. getByLabel | ? | ? | ? |
| 3. getByText('Add todo') | ? | ? | ? |
| 4. getByTestId('add-todo') | ? | ? | ? |
| 5. .locator('.add-todo-btn') | ? | ? | ? |
▶ Run
Click Test. All five locators currently work against the Todo app — the file tests/locator-ladder.spec.js has one test per rung, all passing.
🔍 Investigate — reveal the answer table
| Locator | CSS rename | Text change | DOM restructure |
|---|---|---|---|
| 1. getByRole({name:/add/i}) | ✓ | ✓ via regex; ✗ if exact (a) | ✓ |
| 2. getByLabel | ✓ | ✓ (b) | ✓ |
| 3. getByText('Add todo') | ✓ | ✗ | ✓ |
| 4. getByTestId('add-todo') | ✓ | ✓ | ✓ |
| 5. .locator('.add-todo-btn') | ✗ | ✓ | ✗ (c) |
Notes:
- (a) With a regex `/add/i`, the role locator survives “Add todo” → “Add” (regex still matches). With an exact `name: 'Add todo'` it would break. Regex tolerance is a deliberate design choice.
- (b) `getByLabel` finds inputs via their `<label>`; button labels don’t apply, so this rung doesn’t really apply to buttons. Listed for completeness.
- (c) A DOM restructure (changing the button’s surrounding markup) often changes CSS-selector ancestry. Brittle.
The pattern: getByTestId is the only rung for this button that survives a text change no matter how the name is matched; getByRole survives only as far as its regex is tolerant. But getByTestId requires the author to have added the test ID, which is a code-level decision. And test IDs done badly (<button data-testid="blue-btn-right-col">) are just CSS coupling under another name.
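The first two columns of the answer table can be reproduced with a toy model. The sketch below is not Playwright: it describes the button as a plain object and each locator as a predicate, so every name in it (`addButton`, `locatorModels`, `refactors`, `survivalTable`) is an assumption made for illustration. It models `getByText` with a string as a case-insensitive substring match. `getByLabel` is left out because it targets the input, and the DOM restructure column is left out because a flat object has no ancestry to change.

```javascript
// Toy description of the Add button: its role, accessible name, test ID, CSS class.
const addButton = {
  role: 'button',
  name: 'Add todo',
  testId: 'add-todo',
  className: 'add-todo-btn',
};

// Each locator strategy modeled as a predicate over that description.
const locatorModels = {
  'getByRole(name: /add/i)': (el) => el.role === 'button' && /add/i.test(el.name),
  "getByRole(name: 'Add todo', exact)": (el) =>
    el.role === 'button' && el.name === 'Add todo',
  "getByText('Add todo')": (el) => el.name.toLowerCase().includes('add todo'),
  "getByTestId('add-todo')": (el) => el.testId === 'add-todo',
  "locator('.add-todo-btn')": (el) =>
    el.className.split(/\s+/).includes('add-todo-btn'),
};

// Behavior-preserving refactors from the prediction table.
const refactors = {
  cssRename: (el) => ({ ...el, className: 'primary-btn' }),
  textChange: (el) => ({ ...el, name: 'Add' }),
};

// For each locator, record whether it still matches after each refactor.
function survivalTable(element) {
  const table = {};
  for (const [label, matches] of Object.entries(locatorModels)) {
    table[label] = {};
    for (const [change, apply] of Object.entries(refactors)) {
      table[label][change] = matches(apply(element));
    }
  }
  return table;
}

console.table(survivalTable(addButton));
```

In this model only the test ID and the tolerant regex survive both changes, and only the CSS locator breaks on the rename.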
✏️ Modify
Open tests/locator-ladder.spec.js. The fifth test uses the brittle .locator('.add-todo-btn') form. Rewrite it as a role-based locator (Rung 1). Run again — your refactored test should still pass.
📝 House rule
Pick the locator that matches the stable contract of this UI element. If the button label is part of the user-visible promise, use getByRole with a sensible regex. If the wording will change but the action is permanent, use getByTestId with a semantically named test ID. Use raw CSS only when nothing else will do — and write a comment explaining why.
function App() {
  const [items, setItems] = React.useState([]);
  const [text, setText] = React.useState('');

  function addTodo() {
    const trimmed = text.trim();
    if (!trimmed) return;
    setItems([...items, trimmed]);
    setText('');
  }

  return (
    <main className="todo-shell">
      <section className="todo-panel">
        <p className="eyebrow">Playwright tutorial</p>
        <h1>Todo Lab</h1>
        <div className="todo-form">
          <label htmlFor="todo-input">Todo item</label>
          <div className="todo-row">
            <input
              id="todo-input"
              value={text}
              onChange={(event) => setText(event.target.value)}
              placeholder="Buy milk"
            />
            <button
              className="add-todo-btn"
              data-testid="add-todo"
              onClick={addTodo}
            >
              Add todo
            </button>
          </div>
        </div>
        <ul aria-label="Todo list" className="todo-list">
          {items.map((item, index) => (
            <li key={index}>{item}</li>
          ))}
        </ul>
      </section>
    </main>
  );
}

const root = ReactDOM.createRoot(document.getElementById('root'));
root.render(<App />);
body { margin: 0; font-family: system-ui, -apple-system, sans-serif; background: #f6f7fb; color: #1f2937; }
.todo-shell { min-height: 100vh; display: grid; place-items: center; padding: 32px; }
.todo-panel { width: min(100%, 560px); background: white; border: 1px solid #d9dee8; border-radius: 8px; padding: 28px; box-shadow: 0 18px 40px rgba(31, 41, 55, 0.08); }
.eyebrow { margin: 0 0 8px; color: #4b5563; font-size: 0.85rem; font-weight: 700; text-transform: uppercase; letter-spacing: 0.04em; }
h1 { margin: 0 0 24px; font-size: 2rem; }
label { display: block; margin-bottom: 8px; font-weight: 700; }
.todo-row { display: flex; gap: 10px; }
input { flex: 1; min-width: 0; background: white; color: #1f2937; border: 1px solid #b8c0cc; border-radius: 6px; padding: 10px 12px; font: inherit; }
.add-todo-btn,
button { border: 0; border-radius: 6px; padding: 10px 14px; background: #2563eb; color: white; font: inherit; font-weight: 700; cursor: pointer; }
.todo-list { margin: 24px 0 0; padding-left: 24px; }
.todo-list:empty { display: none; }
.todo-list li { margin: 8px 0; }
/* Dark mode */
[data-bs-theme="dark"] body { background: #1c2533; color: #e6edf3; }
[data-bs-theme="dark"] .todo-panel { background: #232a36; border-color: #2a323e; box-shadow: 0 18px 40px rgba(0, 0, 0, 0.4); }
[data-bs-theme="dark"] .eyebrow { color: #9ca3af; }
[data-bs-theme="dark"] input { background: #2a323e; color: #e6edf3; border-color: #3a4351; }
[data-bs-theme="dark"] input::placeholder { color: #6b7280; }
[data-bs-theme="dark"] .add-todo-btn,
[data-bs-theme="dark"] button { background: #3b82f6; }
import { test, expect } from '@playwright/test';

// Rung 1: Role + accessible name (regex-tolerant).
test('rung 1: getByRole finds the Add todo button', async ({ page }) => {
  await page.goto('/');
  await page.getByRole('textbox', { name: /todo item/i }).fill('Milk');
  await page.getByRole('button', { name: /add todo/i }).click();
  await expect(page.getByRole('listitem')).toHaveText('Milk');
});

// Rung 2: getByLabel (best for inputs, but works through the form).
test('rung 2: getByLabel finds the input via its label', async ({ page }) => {
  await page.goto('/');
  await page.getByLabel(/todo item/i).fill('Bread');
  await page.getByRole('button', { name: /add todo/i }).click();
  await expect(page.getByRole('listitem')).toHaveText('Bread');
});

// Rung 3: getByText (couples to exact wording).
test('rung 3: getByText finds the button by visible text', async ({ page }) => {
  await page.goto('/');
  await page.getByRole('textbox', { name: /todo item/i }).fill('Eggs');
  await page.getByText('Add todo').click();
  await expect(page.getByRole('listitem')).toHaveText('Eggs');
});

// Rung 4: getByTestId (semantic test ID).
test('rung 4: getByTestId finds the button via data-testid', async ({ page }) => {
  await page.goto('/');
  await page.getByRole('textbox', { name: /todo item/i }).fill('Cheese');
  await page.getByTestId('add-todo').click();
  await expect(page.getByRole('listitem')).toHaveText('Cheese');
});

// Rung 5: Raw CSS class (the brittle rung; REWRITE this one!).
// TODO: rewrite this test to use page.getByRole instead of CSS.
test('rung 5: brittle CSS locator (rewrite me)', async ({ page }) => {
  await page.goto('/');
  await page.getByRole('textbox', { name: /todo item/i }).fill('Butter');
  await page.locator('.add-todo-btn').click();
  await expect(page.getByRole('listitem')).toHaveText('Butter');
});
The Locator Ladder — Knowledge Check
Min. score: 80%
1. Which of these is the BEST locator for “the user’s primary save button”, assuming the button has the visible text “Save” today, but the team has announced it will be renamed to “Submit” next quarter?
The locator ladder isn’t “always pick option 1.” It’s “pick the rung that matches the stable contract for this UI element.” When wording is stable, getByRole is best. When wording will change but the action is permanent, getByTestId is right. The choice depends on what about this UI is the promise.
2. Two versions of data-testid for the same Add Todo button — which is BETTER, and why?
Version A: <button data-testid="primary-blue-btn-right-column">
Version B: <button data-testid="add-todo-action">
Test IDs are only as durable as their naming. A test ID named after styling or layout is functionally equivalent to a CSS-class locator — it pins implementation. A test ID named after the action or the semantic role (save-action, cart-checkout-button) is what the docs intend: a stable contract that the test can rely on indefinitely.
3. (Spaced review — Step 2) Your team is debating: should “rejecting whitespace-only input” have its own e2e test, or can it be tested in the same test as “rejecting empty input”?
Partitions are the unit of test design, not individual inputs. Two inputs are in the same partition if the system processes them the same way. One representative per partition is sufficient — adding more is wasted effort, removing one is missed coverage.
Strong Assertions: The Liar Test in the Browser
Learning objective. After this step you can recognize weak assertions in Playwright tests, predict when they’ll lie about a buggy app, and strengthen them to catch real regressions.
🧠 Quick recall
From Testing Foundations: a liar test had a passing green checkmark but a weak oracle that didn’t actually verify the spec. What made it lie?
(In your own words, before reading on …)
Same pattern exists in e2e — and it’s sneakier here because the test visibly clicks buttons, which makes it feel “more real.”
🎬 Predict
Read this test. The Todo app you’ll run it against is silently buggy: it adds a list item, but the rendered text is always empty (the user’s input is dropped between state-update and render).
test('adding a todo shows it in the list', async ({ page }) => {
  await page.goto('/');
  await page.getByRole('textbox', { name: /todo item/i }).fill('Milk');
  await page.getByRole('button', { name: /add todo/i }).click();
  await expect(page.getByRole('listitem')).toHaveCount(1);
});
Will this test catch the bug? Will it pass or fail?
(Predict before running …)
▶ Run
Click Test.
The test passes. Surprise.
🔍 Investigate
What did toHaveCount(1) actually verify? Just that one list item exists. It said nothing about what’s inside the item. The bug — empty text — is invisible to this assertion.
The assertion is a liar: green checkmark, broken feature.
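The same lie can be shown on plain data. The sketch below models the buggy and the correct `addTodo` as pure functions over an array and compares a count-only oracle with a content-checking one; `addTodoBuggy`, `addTodoFixed`, `weakOracle`, and `strongOracle` are names invented for this illustration.

```javascript
// Buggy version: accepts the input but stores an empty string.
function addTodoBuggy(items, text) {
  const trimmed = text.trim();
  if (!trimmed) return items;
  return [...items, ''];
}

// Correct version: stores the trimmed text.
function addTodoFixed(items, text) {
  const trimmed = text.trim();
  if (!trimmed) return items;
  return [...items, trimmed];
}

// Weak oracle: only counts items (like toHaveCount(1)).
const weakOracle = (items) => items.length === 1;

// Strong oracle: also checks what the user would read (like toHaveText('Milk')).
const strongOracle = (items) => items.length === 1 && items[0] === 'Milk';

const buggyList = addTodoBuggy([], 'Milk');
const fixedList = addTodoFixed([], 'Milk');

console.log(weakOracle(buggyList)); // true  (the liar: passes on the bug)
console.log(strongOracle(buggyList)); // false (catches the bug)
console.log(strongOracle(fixedList)); // true
```

The weak oracle cannot tell the two implementations apart, which is exactly why it stays green against the buggy app.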
Three weak assertion patterns to recognize
| Weak assertion | Why it lies |
|---|---|
| await expect(page.getByRole('list')).toBeVisible() | An empty <ul> is still “visible” |
| await expect(page.getByText('')).toBeVisible() | Always true |
| await expect(page.getByRole('listitem')).toHaveCount(1) | Doesn’t verify item content |
And one Playwright-specific anti-pattern from the official docs:
// ❌ Anti-pattern — non-retrying, no auto-wait:
expect(await page.getByText('Milk').isVisible()).toBe(true);
// ✓ Web-first form — auto-waits and retries:
await expect(page.getByText('Milk')).toBeVisible();
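The difference between the two forms above can be modeled without a browser. The sketch below is a simplified stand-in, not Playwright's implementation: `expectEventually` is a hypothetical helper that polls a value until it matches or a timeout expires, and `compareChecks` contrasts it with a one-shot read.

```javascript
// Poll `read()` until it returns `expected` or the timeout elapses.
// A simplified model of a retrying assertion; names and defaults are assumptions.
async function expectEventually(read, expected, { timeout = 500, interval = 10 } = {}) {
  const deadline = Date.now() + timeout;
  for (;;) {
    const actual = read();
    if (actual === expected) return;
    if (Date.now() >= deadline) {
      throw new Error(`Timed out: expected ${expected}, last saw ${actual}`);
    }
    await new Promise((resolve) => setTimeout(resolve, interval));
  }
}

async function compareChecks() {
  let visible = false;
  // The simulated UI "renders" 30 ms from now.
  setTimeout(() => {
    visible = true;
  }, 30);

  // One-shot read, in the spirit of expect(await locator.isVisible()).toBe(true).
  const oneShot = visible;

  // Retrying check, in the spirit of await expect(locator).toBeVisible().
  await expectEventually(() => visible, true);

  return { oneShot, afterRetry: visible };
}

compareChecks().then((result) => console.log(result));
// prints { oneShot: false, afterRetry: true }
```

The one-shot read samples the state once, too early; the retrying check keeps sampling until the state arrives or the timeout reports a real failure.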
✏️ Modify
In tests/todo.spec.js, strengthen the assertion to verify the item’s text, not just the count. Predict the new failure message before re-running.
Hints:
- The locator `page.getByRole('listitem')` finds the list item; chain a content-checking matcher.
- `toHaveText('Milk')` pins exact text; `toContainText('Milk')` pins substring.
- The spec promises that the user’s input appears in the list, so exact match is fine here.
📝 House rule
Assert the promise, not the plumbing.
The promise is what the spec said the user would see. The plumbing is which DOM nodes exist, what CSS class they have, what their internal state is. A strong assertion verifies the promise; a weak assertion verifies the plumbing without verifying what the user actually gets.
// 🐛 BUGGY APP: addTodo adds a list item, but the user's text is dropped.
// The list will always render empty <li> elements.
function App() {
  const [items, setItems] = React.useState([]);
  const [text, setText] = React.useState('');

  function addTodo() {
    const trimmed = text.trim();
    if (!trimmed) return;
    // Bug: should be `setItems([...items, trimmed])`.
    // Instead we store an empty string, so the rendered <li> is empty.
    setItems([...items, '']);
    setText('');
  }

  return (
    <main className="todo-shell">
      <section className="todo-panel">
        <p className="eyebrow">Buggy Todo Lab</p>
        <h1>Todo Lab</h1>
        <div className="todo-form">
          <label htmlFor="todo-input">Todo item</label>
          <div className="todo-row">
            <input
              id="todo-input"
              value={text}
              onChange={(event) => setText(event.target.value)}
              placeholder="Buy milk"
            />
            <button onClick={addTodo}>Add todo</button>
          </div>
        </div>
        <ul aria-label="Todo list" className="todo-list">
          {items.map((item, index) => (
            <li key={index}>{item}</li>
          ))}
        </ul>
      </section>
    </main>
  );
}

const root = ReactDOM.createRoot(document.getElementById('root'));
root.render(<App />);
body { margin: 0; font-family: system-ui, -apple-system, sans-serif; background: #f6f7fb; color: #1f2937; }
.todo-shell { min-height: 100vh; display: grid; place-items: center; padding: 32px; }
.todo-panel { width: min(100%, 560px); background: white; border: 1px solid #d9dee8; border-radius: 8px; padding: 28px; box-shadow: 0 18px 40px rgba(31, 41, 55, 0.08); }
.eyebrow { margin: 0 0 8px; color: #4b5563; font-size: 0.85rem; font-weight: 700; text-transform: uppercase; letter-spacing: 0.04em; }
h1 { margin: 0 0 24px; font-size: 2rem; }
label { display: block; margin-bottom: 8px; font-weight: 700; }
.todo-row { display: flex; gap: 10px; }
input { flex: 1; min-width: 0; background: white; color: #1f2937; border: 1px solid #b8c0cc; border-radius: 6px; padding: 10px 12px; font: inherit; }
button { border: 0; border-radius: 6px; padding: 10px 14px; background: #2563eb; color: white; font: inherit; font-weight: 700; cursor: pointer; }
.todo-list { margin: 24px 0 0; padding-left: 24px; min-height: 24px; }
.todo-list li { margin: 8px 0; min-height: 1.2em; }
/* Dark mode */
[data-bs-theme="dark"] body { background: #1c2533; color: #e6edf3; }
[data-bs-theme="dark"] .todo-panel { background: #232a36; border-color: #2a323e; box-shadow: 0 18px 40px rgba(0, 0, 0, 0.4); }
[data-bs-theme="dark"] .eyebrow { color: #9ca3af; }
[data-bs-theme="dark"] input { background: #2a323e; color: #e6edf3; border-color: #3a4351; }
[data-bs-theme="dark"] input::placeholder { color: #6b7280; }
[data-bs-theme="dark"] button { background: #3b82f6; }
import { test, expect } from '@playwright/test';

// The weak assertion below passes against the buggy app.
// Strengthen it so the test fails: that's the bug-catching version.
test('adding a todo shows it in the list', async ({ page }) => {
  await page.goto('/');
  await page.getByRole('textbox', { name: /todo item/i }).fill('Milk');
  await page.getByRole('button', { name: /add todo/i }).click();
  // ❌ Weak assertion: only checks the count.
  await expect(page.getByRole('listitem')).toHaveCount(1);
  // TODO: replace or extend the assertion above so the test
  // catches the empty-text bug. Hint: assert the item's text.
});
Strong Assertions — Knowledge Check
Min. score: 80%
1. Which assertion would catch a bug where the “Mark complete” toggle visually updates (the item gets a strikethrough) but the underlying “remaining” counter does not decrement?
“Assert the promise, not the plumbing.” The promise here is that the counter reflects remaining items. If your assertion only checks visual side-effects (strikethrough, CSS classes), you’ve written a liar test: it passes for a render that’s correct in appearance but wrong in meaning.
2. Which of these is a Playwright anti-pattern that the official best-practices docs explicitly call out?
The Playwright best-practices guide is direct: “Don’t use manual assertions that are not awaiting the expect.” Always use await expect(locator).matcher() so your test gets auto-waiting and retrying — the whole point of Playwright’s web-first assertions.
3. (Spaced review — Step 3) A test uses page.locator('.add-todo-btn') to find the Add button. The team renames the CSS class to .primary-btn. The behavior is unchanged. The test fails. What’s the most accurate label for this failure?
From Step 5 onward (next!), we’ll see this pattern in action — running tests against deliberate refactors and identifying which failures are real regressions vs false alarms. The preview: a test that breaks under a behavior-preserving refactor is brittle, not catching a bug.
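The auto-waiting and retrying behind `await expect(locator).matcher()` can be pictured with a small model. The sketch below is plain JavaScript, not Playwright's implementation: `makeSlowUi` and `retryUntilEqual` are hypothetical names, and read attempts stand in for elapsed time. It shows why a one-shot check can fail against a UI that updates a moment later, while a retrying check passes.

```javascript
// Toy model of a retrying assertion. Not Playwright's real implementation;
// `makeSlowUi` and `retryUntilEqual` are hypothetical names for illustration.

// A fake UI whose text only becomes `finalText` on the Nth read, standing in
// for a React state update that lands shortly after the click.
function makeSlowUi(readyOnRead, finalText) {
  let reads = 0;
  return () => {
    reads += 1;
    return reads >= readyOnRead ? finalText : '';
  };
}

// Re-read the value until it matches or the attempt budget runs out.
function retryUntilEqual(readValue, expected, maxAttempts) {
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    if (readValue() === expected) {
      return { passed: true, attempts: attempt };
    }
  }
  return { passed: false, attempts: maxAttempts };
}

// One-shot check: reads once, too early.
const oneShot = retryUntilEqual(makeSlowUi(3, 'Milk'), 'Milk', 1);
// Retrying check: keeps reading until the UI catches up.
const retried = retryUntilEqual(makeSlowUi(3, 'Milk'), 'Milk', 5);

console.log(oneShot); // { passed: false, attempts: 1 }
console.log(retried); // { passed: true, attempts: 3 }
```

The real assertions retry against a timeout rather than an attempt count, but the shape is the same: keep checking until the condition holds or the budget is spent.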
Behavior, Not Implementation: The Brittleness Gauntlet
Learning objective. After this step you can predict which tests will survive a UI refactor and which will break, classify a break as a real regression vs a false alarm, and rewrite brittle locators into durable ones.
🧠 Quick recall
From Step 3 — which two locator strategies survive a CSS class rename?
(In your head: it’s getByRole and getByTestId. CSS-class locators don’t.)
Now we’re going to make the brittleness tactile. You’ll edit the app yourself and watch tests break.
Two tests, same behavior, two locator strategies
You have two test files in tests/:
- tests/css-locator.spec.js: uses page.locator('.add-todo-btn') (Rung 5)
- tests/role-locator.spec.js: uses page.getByRole('button', { name: /add/i }) (Rung 1)
Both verify the same behavior: clicking Add adds a todo. Both pass against the current App.jsx.
🎬 Predict — Round 1: CSS class rename
Imagine the design team does a styling pass and renames the button’s CSS class:
- <button className="add-todo-btn" onClick={addTodo}>Add todo</button>
+ <button className="primary-btn" onClick={addTodo}>Add todo</button>
The user-visible behavior is identical — the button still says “Add todo” and still adds a todo.
Predict (in your head, before editing):
- Will tests/css-locator.spec.js still pass?
- Will tests/role-locator.spec.js still pass?
- If either breaks, is the break a real regression or a false alarm?
✏️ Edit App.jsx (one line)
Open src/App.jsx. Find the line:
<button className="add-todo-btn" onClick={addTodo}>Add todo</button>
Change add-todo-btn to primary-btn. Just that one identifier. Save the file.
▶ Run
Click Test.
🔍 Investigate
| Test | Result | What it tells us |
|---|---|---|
| tests/css-locator.spec.js | ❌ Fails | The test was coupled to a styling decision. The user-facing behavior didn’t change, but the test broke. This is a false alarm: wasted CI time and eroded trust in the suite. |
| tests/role-locator.spec.js | ✓ Passes | The test was coupled to the user-visible role + name. Styling changed; behavior didn’t; the test correctly didn’t notice. |
The role-based test honors what’s stable about the UI: the button has an accessible name “Add todo.” Styling is incidental. The CSS-based test pinned the incidental thing.
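The Round 1 result can be reproduced without a browser. The sketch below is a toy model under stated assumptions: elements are plain objects, and `findByClass` and `findByRoleAndName` are hypothetical helpers, not Playwright calls. It only illustrates which property each strategy is coupled to.

```javascript
// Toy model of the two locator strategies. The element objects and helper
// names are hypothetical; they are not part of Playwright.

const beforeRefactor = [
  { role: 'button', name: 'Add todo', className: 'add-todo-btn' },
];
// Behavior-preserving refactor: only the styling hook changes.
const afterRefactor = [
  { role: 'button', name: 'Add todo', className: 'primary-btn' },
];

// CSS-style lookup: coupled to the class name.
function findByClass(elements, className) {
  return elements.find((el) => el.className === className);
}

// Role-style lookup: coupled to what the user perceives (role + name).
function findByRoleAndName(elements, role, namePattern) {
  return elements.find((el) => el.role === role && namePattern.test(el.name));
}

console.log(Boolean(findByClass(beforeRefactor, 'add-todo-btn')));      // true
console.log(Boolean(findByClass(afterRefactor, 'add-todo-btn')));       // false
console.log(Boolean(findByRoleAndName(afterRefactor, 'button', /add/i))); // true
```

The class-based lookup loses the button even though nothing a user can perceive has changed; the role-based lookup keeps finding it.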
🔄 Mini-gauntlet, Round 2: button text change
Now imagine Marketing changes the button text:
- <button ...>Add todo</button>
+ <button ...>Add</button>
Predict: will tests/role-locator.spec.js (using name: /add/i) still pass?
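(Answer: yes. The regex /add/i matches both “Add todo” and “Add”. A string name: 'Add todo' would have failed: by default a string name is matched as a case-insensitive substring of the accessible name, and “Add todo” is not a substring of “Add”.)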
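To see why the regex survives the rewording and a string name does not, here is a small model of name matching. It follows the documented defaults for the `name` option of `getByRole` (strings match as case-insensitive substrings unless `exact` is set; `exact` is ignored for regular expressions), but `nameMatches` itself is a hypothetical helper written for this sketch.

```javascript
// Model of how a `name` option is compared with an accessible name.
// `nameMatches` is a hypothetical helper, not a Playwright function.
function nameMatches(accessibleName, name, exact = false) {
  const actual = accessibleName.trim();
  if (name instanceof RegExp) {
    return name.test(actual); // `exact` does not apply to regular expressions
  }
  if (exact) {
    return actual === name; // case-sensitive, whole string
  }
  return actual.toLowerCase().includes(name.toLowerCase()); // substring match
}

for (const label of ['Add todo', 'Add']) {
  console.log(label, {
    regex: nameMatches(label, /add/i),
    string: nameMatches(label, 'Add todo'),
    exactString: nameMatches(label, 'Add todo', true),
  });
}
```

For the label “Add”, only the regex still matches, which is the Round 2 outcome.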
Discussion: was that a regression or a rewording? Depends on whether “Add todo” specifically is part of the spec. If the spec is “user can add a todo,” the rewording is harmless. If the spec is “the button says exactly ‘Add todo’ for branding consistency,” the rewording is a regression.
→ That ambiguity is the trade-off Step 6 will tackle.
📝 House rule
A test that breaks under a refactor it shouldn’t have broken under is brittle. Brittleness is the cost of coupling tests to implementation details. The Spec Card’s “Should pass when” field is your defense — write down the changes the test should survive before you write the test, then make sure your locators honor it.
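The regression-versus-false-alarm call from this step fits in a few lines. The helper below is a hypothetical sketch (`classifyResult` is not part of any library): it labels a test result by comparing it with whether the user-visible behavior actually broke.

```javascript
// Hypothetical helper: label a test result by comparing it with whether the
// user-visible behavior really changed for the worse.
function classifyResult({ testFailed, behaviorBroken }) {
  if (testFailed && behaviorBroken) return 'real regression';
  if (testFailed && !behaviorBroken) return 'false alarm (brittle test)';
  if (!testFailed && behaviorBroken) return 'missed bug (liar test)';
  return 'correct pass';
}

// Round 1: the class rename left behavior intact.
console.log(classifyResult({ testFailed: true, behaviorBroken: false }));  // css-locator spec
console.log(classifyResult({ testFailed: false, behaviorBroken: false })); // role-locator spec
```

Only the first label is a failure worth anyone's time; the second and third are the two ways a suite loses trust.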
// 🛠 Edit this file as instructed: rename the CSS class
// on the Add todo button from "add-todo-btn" to "primary-btn".
function App() {
const [items, setItems] = React.useState([]);
const [text, setText] = React.useState('');
function addTodo() {
const trimmed = text.trim();
if (!trimmed) return;
setItems([...items, trimmed]);
setText('');
}
return (
<main className="todo-shell">
<section className="todo-panel">
<p className="eyebrow">Brittleness gauntlet</p>
<h1>Todo Lab</h1>
<div className="todo-form">
<label htmlFor="todo-input">Todo item</label>
<div className="todo-row">
<input
id="todo-input"
value={text}
onChange={(event) => setText(event.target.value)}
placeholder="Buy milk"
/>
<button className="add-todo-btn" onClick={addTodo}>
Add todo
</button>
</div>
</div>
<ul aria-label="Todo list" className="todo-list">
{items.map((item, index) => (
<li key={index}>{item}</li>
))}
</ul>
</section>
</main>
);
}
const root = ReactDOM.createRoot(document.getElementById('root'));
root.render(<App />);
body { margin: 0; font-family: system-ui, -apple-system, sans-serif; background: #f6f7fb; color: #1f2937; }
.todo-shell { min-height: 100vh; display: grid; place-items: center; padding: 32px; }
.todo-panel { width: min(100%, 560px); background: white; border: 1px solid #d9dee8; border-radius: 8px; padding: 28px; box-shadow: 0 18px 40px rgba(31, 41, 55, 0.08); }
.eyebrow { margin: 0 0 8px; color: #4b5563; font-size: 0.85rem; font-weight: 700; text-transform: uppercase; letter-spacing: 0.04em; }
h1 { margin: 0 0 24px; font-size: 2rem; }
label { display: block; margin-bottom: 8px; font-weight: 700; }
.todo-row { display: flex; gap: 10px; }
input { flex: 1; min-width: 0; background: white; color: #1f2937; border: 1px solid #b8c0cc; border-radius: 6px; padding: 10px 12px; font: inherit; }
.add-todo-btn,
.primary-btn,
button { border: 0; border-radius: 6px; padding: 10px 14px; background: #2563eb; color: white; font: inherit; font-weight: 700; cursor: pointer; }
.todo-list { margin: 24px 0 0; padding-left: 24px; }
.todo-list:empty { display: none; }
.todo-list li { margin: 8px 0; }
/* Dark mode */
[data-bs-theme="dark"] body { background: #1c2533; color: #e6edf3; }
[data-bs-theme="dark"] .todo-panel { background: #232a36; border-color: #2a323e; box-shadow: 0 18px 40px rgba(0, 0, 0, 0.4); }
[data-bs-theme="dark"] .eyebrow { color: #9ca3af; }
[data-bs-theme="dark"] input { background: #2a323e; color: #e6edf3; border-color: #3a4351; }
[data-bs-theme="dark"] input::placeholder { color: #6b7280; }
[data-bs-theme="dark"] .add-todo-btn,
[data-bs-theme="dark"] .primary-btn,
[data-bs-theme="dark"] button { background: #3b82f6; }
import { test, expect } from '@playwright/test';
// CSS-class locator — pins .add-todo-btn (an implementation detail).
test('css-locator: user can add a todo', async ({ page }) => {
await page.goto('/');
await page.getByRole('textbox', { name: /todo item/i }).fill('Milk');
await page.locator('.add-todo-btn').click();
await expect(page.getByRole('listitem')).toHaveText('Milk');
});
import { test, expect } from '@playwright/test';
// Role-based locator — pins the button's accessible name.
test('role-locator: user can add a todo', async ({ page }) => {
await page.goto('/');
await page.getByRole('textbox', { name: /todo item/i }).fill('Milk');
await page.getByRole('button', { name: /add/i }).click();
await expect(page.getByRole('listitem')).toHaveText('Milk');
});
The Brittleness Gauntlet — Knowledge Check
Min. score: 80%
1. A team’s CI pipeline reports that test admin can deactivate a user failed last night. Investigation shows: a developer changed a CSS class from .user-row-actions to .row-controls. The deactivate behavior itself works perfectly. The test used page.locator('.user-row-actions button.deactivate').
What’s the most accurate diagnosis?
A test failure is only useful if it points to a behavior break. A test that fails for a styling rename, a class rename, or a DOM restructure is a false alarm — it costs the team time and erodes trust in the suite. Use role-based or test-ID-based locators to keep the contract stable while implementation evolves.
2. You write a new e2e test using getByRole('button', { name: 'Sign in' }). A week later, the marketing team renames the button from “Sign in” to “Log in”. Your test breaks.
Which is the most accurate take?
The locator ladder isn’t "always pick option 1." The right rung depends on what the spec promises. Step 6 makes this trade-off explicit by introducing the principle “match assertion specificity to spec specificity.”
3. (Spaced review — Step 4) A weak assertion await expect(page.getByRole('listitem')).toHaveCount(1) passed against an app that renders an empty <li> (the user’s text was dropped). Why did it pass?
Strong assertions pin what the spec promises. The spec promised "the user’s text appears in the list," so the assertion needs to verify text content — not just that something exists. This is the same liar-test family from Testing Foundations Step 3.
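The empty-item bug from question 3 can be modeled in plain JavaScript. This is a sketch under stated assumptions: the rendered list is an array of strings, and the two oracle variables are hypothetical stand-ins for a count check and a text check.

```javascript
// Toy model of the buggy render: one list item exists, but the user's text
// was dropped. Variable names are hypothetical.
const renderedListItems = ['']; // a single, empty <li>

// Weak oracle: only the number of items, like a count-only assertion.
const weakOraclePasses = renderedListItems.length === 1;

// Strong oracle: the item must also carry the text the user typed.
const strongOraclePasses =
  renderedListItems.length === 1 && renderedListItems[0] === 'Milk';

console.log({ weakOraclePasses, strongOraclePasses });
// { weakOraclePasses: true, strongOraclePasses: false }
```

The weak oracle is satisfied by anything that produces one item; only the strong one notices that the promised text is missing.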
The Maintenance Trade-off: Pin the Spec, No More, No Less
Learning objective. After this step you can match an assertion’s strictness to what the spec actually promises — neither over-specifying (which causes false alarms) nor under-specifying (which misses real bugs).
🧠 Quick recall
- From Step 4 — what’s a liar test?
- From Step 5 — when is a test break a false alarm?
Both questions point at the same underlying issue: a test’s value depends on what it actually verifies. Step 6 puts the principle into one sentence.
🎯 The principle
Match assertion specificity to spec specificity. Pin exactly what the spec promises — no more, no less.
A stronger assertion is not always a better assertion. We’ll see this on a deliberately simple feature first. (Step 7 generalizes it to features with multiple promises.)
The feature
The Todo app has a new remaining-count display: a <p role="status"> showing “3 items remaining”. The spec is one sentence:
“Show the user how many items are still pending.”
That’s it. One promise: surface the count. Notice what’s not in the spec:
- the exact wording (“items remaining” vs “todos pending”)
- plurality grammar (“1 item” vs “1 items”)
- the surrounding sentence (“You have 3…” vs just “3…”)
- color, position, animation
Three candidate assertions
// Brittle (over-specified): pins exact wording and plurality.
await expect(page.getByRole('status')).toHaveText('3 items remaining');
// Goldilocks (spec-aligned): pins exactly what the spec promises, the count.
await expect(page.getByRole('status')).toContainText('3');
// Loose (under-specified): the status region exists; nothing more.
await expect(page.getByRole('status')).toBeVisible();
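One caveat with a substring check such as `toContainText('3')`: the text “13 items remaining” also contains “3”. `toContainText` accepts a regular expression as well as a string, so a word-boundary pattern is one way to pin the number itself. The sketch below exercises that pattern in plain JavaScript; the sample strings are made up for illustration.

```javascript
// Word-boundary pattern that matches the count 3 only as a whole number.
const exactlyThree = /\b3\b/;

// Sample status texts (hypothetical, for illustration).
const samples = [
  '3 items remaining',  // the base wording
  '3 todos pending',    // a reworded status
  '13 items remaining', // wrong count that still contains the digit 3
  '4 items remaining',  // off-by-one count
];

for (const text of samples) {
  console.log(JSON.stringify(text), {
    substring: text.includes('3'),
    wholeNumber: exactlyThree.test(text),
  });
}
```

The substring check accepts “13 items remaining”; the word-boundary pattern rejects it while still tolerating a change of wording.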
🎬 Predict — Scenario A: marketing changes wording
Imagine the team rewrites the status text from "3 items remaining" to "3 todos pending". The spec is still satisfied — the count is still shown.
Predict (in your head, before running):
| Assertion | Will it pass? | Is the result correct? |
|---|---|---|
| Brittle | ? | ? |
| Goldilocks | ? | ? |
| Loose | ? | ? |
🎬 Predict — Scenario B: an off-by-one regression
Now imagine a different change: the count logic has a bug. Where the page should say “3 items remaining,” it says “4 items remaining” instead.
Predict the same table.
▶ Run
Click Test. All three tests pass against the base app. (The base app shows "3 items remaining" correctly.)
✏️ Edit App.jsx — introduce the off-by-one bug
In src/App.jsx, find the line:
const remainingCount = items.length;
Change it to:
const remainingCount = items.length + 1;
That’s the bug — the count is now wrong by one. Predict which tests catch it before re-running.
▶ Run again
🔍 Investigate — Scenario B results
| Assertion | Result | Was the result useful? |
|---|---|---|
| Brittle | ❌ Fails | ✓ Yes — it caught the regression |
| Goldilocks | ❌ Fails | ✓ Yes — it caught the regression |
| Loose | ✓ Passes | ✗ No — it missed the bug entirely |
Now think back to Scenario A (the wording change). Reset the bug — change items.length + 1 back to items.length. Then imagine the wording change happening:
| Assertion | Result under wording change | Was the result useful? |
|---|---|---|
| Brittle | ❌ Fails | ✗ No — false alarm; spec still satisfied |
| Goldilocks | ✓ Passes | ✓ Yes — wording isn’t part of the spec |
| Loose | ✓ Passes | (Trivially — but it never checked the count anyway) |
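Both results tables can be reproduced with a small model. The sketch below treats each assertion as a predicate over the status text; the predicate names and the scenario strings are assumptions made for this illustration, with the Goldilocks predicate checking only that the count appears.

```javascript
// Plain-JavaScript model: each assertion is a predicate over the status text.
// Names are hypothetical; this is not Playwright code.
const assertions = {
  brittle: (text) => text === '3 items remaining', // exact wording
  goldilocks: (text) => text.includes('3'),        // only the count
  loose: () => true,                               // region exists, nothing more
};

const scenarios = {
  base: '3 items remaining',
  wordingChange: '3 todos pending',
  offByOneBug: '4 items remaining',
};

// Build a scenario-by-assertion table of pass/fail results.
const results = {};
for (const [scenario, text] of Object.entries(scenarios)) {
  results[scenario] = {};
  for (const [name, passes] of Object.entries(assertions)) {
    results[scenario][name] = passes(text);
  }
}

console.table(results);
```

Only the count-only predicate gets both rows right: it passes under the wording change and fails under the off-by-one bug.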
The 2×2 grid that crystallizes the lesson
| | Spec is loose ("show the count") | Spec is tight ("show '3 items remaining'") |
|---|---|---|
| Loose assertion | ✓ aligned | ✗ misses regressions |
| Tight assertion | ✗ false alarms | ✓ aligned |
Strength (LO3) and spec-fidelity (LO4) are different axes. The best assertion lives on the diagonal — its specificity matches the spec’s specificity.
- Loose spec + loose assertion = good. (You’re pinning what’s promised.)
- Loose spec + tight assertion = false alarms. (You’re pinning more than promised.)
- Tight spec + loose assertion = misses regressions. (You’re pinning less than promised.)
- Tight spec + tight assertion = good. (You’re pinning the exact contract.)
The Goldilocks assertion above is on the diagonal: a loose spec, met with a loose-but-targeted assertion that still verifies the count. Brittle is off the diagonal in one direction; loose is off in the other.
📝 House rule
Pin exactly what the spec promises. No more, no less.
Don’t default to maximum strictness “just in case.” Strictness is not free — every pin is a future false alarm waiting to happen. Don’t default to minimum strictness either — every un-pinned promise is a regression waiting to slip through.
Read the spec. Decide what’s promised. Pin that.
// 🛠 You'll edit one line in this file to introduce the off-by-one bug.
function App() {
const [items, setItems] = React.useState([]);
const [text, setText] = React.useState('');
function addTodo() {
const trimmed = text.trim();
if (!trimmed) return;
setItems([...items, trimmed]);
setText('');
}
const remainingCount = items.length;
return (
<main className="todo-shell">
<section className="todo-panel">
<p className="eyebrow">Todo Lab</p>
<h1>Todo Lab</h1>
<div className="todo-form">
<label htmlFor="todo-input">Todo item</label>
<div className="todo-row">
<input
id="todo-input"
value={text}
onChange={(event) => setText(event.target.value)}
placeholder="Buy milk"
/>
<button onClick={addTodo}>Add todo</button>
</div>
</div>
<p role="status" className="status-line">
{remainingCount} items remaining
</p>
<ul aria-label="Todo list" className="todo-list">
{items.map((item, index) => (
<li key={index}>{item}</li>
))}
</ul>
</section>
</main>
);
}
const root = ReactDOM.createRoot(document.getElementById('root'));
root.render(<App />);
body { margin: 0; font-family: system-ui, -apple-system, sans-serif; background: #f6f7fb; color: #1f2937; }
.todo-shell { min-height: 100vh; display: grid; place-items: center; padding: 32px; }
.todo-panel { width: min(100%, 560px); background: white; border: 1px solid #d9dee8; border-radius: 8px; padding: 28px; box-shadow: 0 18px 40px rgba(31, 41, 55, 0.08); }
.eyebrow { margin: 0 0 8px; color: #4b5563; font-size: 0.85rem; font-weight: 700; text-transform: uppercase; letter-spacing: 0.04em; }
h1 { margin: 0 0 24px; font-size: 2rem; }
label { display: block; margin-bottom: 8px; font-weight: 700; }
.todo-row { display: flex; gap: 10px; }
input { flex: 1; min-width: 0; background: white; color: #1f2937; border: 1px solid #b8c0cc; border-radius: 6px; padding: 10px 12px; font: inherit; }
button { border: 0; border-radius: 6px; padding: 10px 14px; background: #2563eb; color: white; font: inherit; font-weight: 700; cursor: pointer; }
.status-line { margin: 18px 0 0; color: #4b5563; font-weight: 600; }
.todo-list { margin: 12px 0 0; padding-left: 24px; }
.todo-list li { margin: 8px 0; }
/* Dark mode */
[data-bs-theme="dark"] body { background: #1c2533; color: #e6edf3; }
[data-bs-theme="dark"] .todo-panel { background: #232a36; border-color: #2a323e; box-shadow: 0 18px 40px rgba(0, 0, 0, 0.4); }
[data-bs-theme="dark"] .eyebrow { color: #9ca3af; }
[data-bs-theme="dark"] input { background: #2a323e; color: #e6edf3; border-color: #3a4351; }
[data-bs-theme="dark"] input::placeholder { color: #6b7280; }
[data-bs-theme="dark"] button { background: #3b82f6; }
[data-bs-theme="dark"] .status-line { color: #9ca3af; }
import { test, expect } from '@playwright/test';
// BRITTLE: pins exact wording, plurality, surrounding copy.
test('brittle: counter shows pinned exact text', async ({ page }) => {
await page.goto('/');
await page.getByRole('textbox', { name: /todo item/i }).fill('A');
await page.getByRole('button', { name: /add/i }).click();
await page.getByRole('textbox', { name: /todo item/i }).fill('B');
await page.getByRole('button', { name: /add/i }).click();
await page.getByRole('textbox', { name: /todo item/i }).fill('C');
await page.getByRole('button', { name: /add/i }).click();
await expect(page.getByRole('status')).toHaveText('3 items remaining');
});
import { test, expect } from '@playwright/test';
// GOLDILOCKS: pins exactly what the spec promises (the count), not the wording.
test('goldilocks: counter shows the right count of items', async ({ page }) => {
await page.goto('/');
await page.getByRole('textbox', { name: /todo item/i }).fill('A');
await page.getByRole('button', { name: /add/i }).click();
await page.getByRole('textbox', { name: /todo item/i }).fill('B');
await page.getByRole('button', { name: /add/i }).click();
await page.getByRole('textbox', { name: /todo item/i }).fill('C');
await page.getByRole('button', { name: /add/i }).click();
await expect(page.getByRole('status')).toContainText('3');
});
import { test, expect } from '@playwright/test';
// LOOSE: the status region exists; nothing more.
// This misses the actual count!
test('loose: status region is visible', async ({ page }) => {
await page.goto('/');
await page.getByRole('textbox', { name: /todo item/i }).fill('A');
await page.getByRole('button', { name: /add/i }).click();
await expect(page.getByRole('status')).toBeVisible();
});
The Maintenance Trade-off — Knowledge Check
Min. score: 80%
1. A test asserts:
await expect(page.getByRole('status')).toHaveText(
'Welcome back, Ada! You have 5 unread messages waiting.'
);
The principle: pin exactly what the spec promises — no more, no less. Stronger assertions aren’t always better; they can over-specify and create false alarms. The best assertion matches the spec’s specificity.
2. Which strategy BEST avoids both false alarms AND missed regressions for the spec “the page shows the user’s order ID”?
The diagonal of the 2×2 grid: tight spec (the actual ID matters) → tight assertion (verify the ID). The framing region uses a role locator with a regex name so the wording around the ID can change without breaking the test. The ID itself is pinned because the spec says so.
3. (Spaced review — Step 5) A test fails after a CSS class rename. The behavior is unchanged. The team then changes the class back to silence the test. What’s the underlying problem?
From Step 5: brittle tests fail under refactors that don’t break behavior. The fix is to rewrite the test against a stable contract, not to revert the refactor or freeze internal naming.
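For the order-ID spec in question 2, "pin the ID, not the wording" can be sketched as a pattern that matches the ID literally and ignores the copy around it. Everything concrete here is hypothetical: the ID `ORD-1042`, the sample messages, and the helper names are assumptions made for illustration.

```javascript
// Hypothetical order ID, for illustration only.
const orderId = 'ORD-1042';

// Escape regex metacharacters so the ID is matched literally.
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

// Match the ID as a whole token: not preceded or followed by a word
// character or hyphen, so 'ORD-10421' does not count.
const showsOrderId = new RegExp(
  `(?<![\\w-])${escapeRegExp(orderId)}(?![\\w-])`
);

// Hypothetical confirmation messages.
const messages = [
  'Thanks! Your order ORD-1042 is confirmed.',
  'Order placed: ORD-1042',
  'Your order ORD-10421 is confirmed.',
  'Your order is confirmed.',
];

for (const message of messages) {
  console.log(showsOrderId.test(message), message);
}
```

The first two messages pass despite different wording; the last two fail because the promised ID is wrong or absent.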
Multi-Promise Features and the Capstone
Learning objective. After this step you can apply the “match assertion specificity to spec specificity” principle to a feature with multiple promises, choosing per-promise specificity independently. This is the real-world skill: most features have more than one thing to verify.
🧠 Quick recall
From Step 6 — what does it mean to match assertion specificity to spec specificity? Why is a stronger assertion sometimes worse?
Step 6 had a single promise (the count). Real features usually have multiple promises — and you have to make a separate specificity decision for each one. That’s the skill that distinguishes a maintainable test suite from a brittle one.
🎯 The feature: “Mark as done” toggle
The Todo app now supports marking items as done. Click a todo’s button to toggle its done state. Done items are shown with a strikethrough; the remaining-count display counts only items that are not done.
The spec is three promises:
- Toggle state. Clicking a todo toggles its done state.
- Count decrements. The remaining-count display reflects only un-done items.
- Item stays visible. Marked-done items remain in the list (not deleted).
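The three promises can be stated as checks on a pure-function model of the feature's state. This is a simplified sketch in plain JavaScript, not the sandbox app itself: items are objects with `text` and `done`, and the two helper functions are written for this illustration.

```javascript
// Pure-function model of the toggle feature (a simplified sketch).
function toggleDone(items, index) {
  return items.map((item, i) =>
    i === index ? { ...item, done: !item.done } : item
  );
}

function remainingCount(items) {
  return items.filter((item) => !item.done).length;
}

const start = ['Milk', 'Bread', 'Eggs'].map((text) => ({ text, done: false }));
const afterToggle = toggleDone(start, 0);

console.log(afterToggle[0].done);               // true: promise 1, state toggled
console.log(remainingCount(afterToggle));       // 2: promise 2, count decrements
console.log(afterToggle.map((item) => item.text)); // promise 3, Milk is still listed
```

Each line checks a different promise, which is why one assertion cannot stand in for the other two.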
For each promise, we make a specificity decision independently. Read this table — you’ll fill in a similar one for the capstone:
| Promise | Brittle option | Goldilocks option | Loose option |
|---|---|---|---|
| 1. Toggle state | toHaveClass(/todo-done/) (pins CSS class, an implementation detail) | toHaveAttribute('aria-pressed', 'true') (pins semantic ARIA contract) | (skip, but then how do you know the toggle worked?) |
| 2. Count decrements | toHaveText('2 items remaining') (over-pins wording) | getByRole('status').toContainText('2') (pins the number itself) | toBeVisible() on the status (misses the count regression) |
| 3. Item stays visible | (Goldilocks IS the target: count + visible) | getByRole('listitem').filter({hasText:'Milk'}).toBeVisible() | (you can't loose-spec a deletion check; this promise is binary) |
Notice the asymmetry.
- Promise 2 is the same shape as Step 6: pin the count, not the wording.
- Promise 1 introduces a new dimension: there’s a right tool (aria-pressed, the semantic contract) and a wrong tool (the .todo-done CSS class). Using the wrong tool isn’t more strict; it’s coupled to implementation in a different way.
- Promise 3 is binary: the item either stays visible or it doesn’t. Loose-spec doesn’t apply when the contract is yes/no.
Worked example: one fully written test
Read this carefully — it applies the table above:
test('marking a todo as done decrements the count and keeps it visible', async ({ page }) => {
// Arrange: three todos.
await page.goto('/');
for (const t of ['Milk', 'Bread', 'Eggs']) {
await page.getByRole('textbox', { name: /todo item/i }).fill(t);
await page.getByRole('button', { name: /add todo/i }).click();
}
// Act: mark "Milk" as done.
const milkToggle = page.getByRole('button', { name: 'Milk' });
await milkToggle.click();
// Assert all three promises:
// Promise 1 — toggle state is "done" (semantic ARIA contract).
await expect(milkToggle).toHaveAttribute('aria-pressed', 'true');
// Promise 2 — count decrements (pin the number, not wording).
await expect(page.getByRole('status')).toContainText('2');
// Promise 3 — Milk is still in the list (not deleted).
await expect(
page.getByRole('listitem').filter({ hasText: 'Milk' })
).toBeVisible();
});
Each assertion is on the diagonal of its own 2×2 grid. Promise 1 uses the semantic ARIA attribute (not the CSS class). Promise 2 pins the count number (not the wording). Promise 3 verifies presence (the binary contract).
🎓 Capstone — write the next two tests
You’re given a complete Spec Card and two test stubs. Your job: fill in Act + Assert.
Spec Card: Mark a todo as done
✓ Behavior: Clicking a todo toggles its "done" state. Done todos
are visually distinct. The remaining count decrements.
Marked-done todos remain in the list.
✓ Should pass when: Visual styling of done items changes (color, icon,
font-weight). The toggle becomes a checkbox instead
of a button. The confirmation animation changes.
✗ Should fail when: Marking doesn't persist between renders. Count doesn't
decrement. Done items disappear from the list.
🎯 Locator contract: Each todo is a listitem. The toggle button has the
item's text as its accessible name. The status region
exposes a count.
✅ Oracle: The status count reflects the number of un-done items.
Your two tests:
test('marking and unmarking a todo restores the count', async ({ page }) => {
// Arrange: one todo "Milk".
// Act: mark it done, then unmark it.
// Assert: aria-pressed is back to false; count is back to 1.
});
test('marking one of two todos shows count of 1', async ({ page }) => {
// Arrange: two todos "Milk" and "Bread".
// Act: mark "Milk" as done.
// Assert: count shows "1"; "Bread" is still un-done; "Milk" is done.
});
Use the worked example as your template. Apply per-promise specificity decisions (semantic locators, pin the count, verify the toggle state).
🤔 Metacognitive close
Before you submit:
- Rate your confidence on each LO from Step 1 to now. Anything still fuzzy?
- For your two capstone tests, ask: what’s the smallest change to App.jsx that should make my test fail? What’s the smallest change that should NOT make my test fail?
That second question is the real test of whether you’ve internalized the principle. If your test would fail for anything you can think of, it’s brittle. If it would not fail for a real regression you can think of, it’s loose. Aim for the diagonal.
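The "smallest change" question can be rehearsed as a tiny mutation check. The sketch below is a hypothetical model, not the sandbox's App.jsx: `correct` encodes the intended behavior, the two mutants each break one promise, and `oraclePasses` mirrors the three assertions of the worked example.

```javascript
// Sketch of the "smallest change" question as a tiny mutation check.
// Everything here is a hypothetical model written for illustration.

// Reference behavior: toggling flips `done`; remaining counts un-done items.
const correct = {
  toggle: (items, i) =>
    items.map((it, k) => (k === i ? { ...it, done: !it.done } : it)),
  remaining: (items) => items.filter((it) => !it.done).length,
};

// Mutant 1: the count ignores the done flag.
const countIgnoresDone = { ...correct, remaining: (items) => items.length };

// Mutant 2: toggling deletes the item instead of marking it.
const toggleDeletes = {
  ...correct,
  toggle: (items, i) => items.filter((_, k) => k !== i),
};

// The oracle mirrors the three promises: presence, state, count.
function oraclePasses(app) {
  const start = ['Milk', 'Bread', 'Eggs'].map((text) => ({ text, done: false }));
  const next = app.toggle(start, 0);
  const milk = next.find((it) => it.text === 'Milk');
  return Boolean(milk) && milk.done === true && app.remaining(next) === 2;
}

console.log(oraclePasses(correct));          // true
console.log(oraclePasses(countIgnoresDone)); // false
console.log(oraclePasses(toggleDeletes));    // false
```

Note that the deleting mutant still reports a remaining count of 2, so a count-only oracle would miss it; the presence check is what catches it.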
📝 Final house rule
A durable e2e test isn’t a script of clicks. It’s an executable behavioral spec with a thin adapter that maps user intent onto the current UI.
Next steps beyond this tutorial
The in-browser sandbox here doesn’t host every Playwright feature. In a real Playwright project you’d also use:
- Network mocking (page.route): mock API responses for deterministic tests.
- Storage state auth: sign in once, reuse the session across tests.
- Fixtures: share setup logic without hiding business intent.
- Trace viewer: inspect failed CI runs frame-by-frame.
The official Playwright docs are the next learning artifact. Everything you’ve built here transfers — only the plumbing differs.
function App() {
const [items, setItems] = React.useState([]);
const [text, setText] = React.useState('');
function addTodo() {
const trimmed = text.trim();
if (!trimmed) return;
setItems([...items, { text: trimmed, done: false }]);
setText('');
}
function toggleDone(idx) {
setItems(items.map((item, i) =>
i === idx ? { ...item, done: !item.done } : item
));
}
const remainingCount = items.filter((item) => !item.done).length;
return (
<main className="todo-shell">
<section className="todo-panel">
<p className="eyebrow">Todo Lab — Capstone</p>
<h1>Todo Lab</h1>
<div className="todo-form">
<label htmlFor="todo-input">Todo item</label>
<div className="todo-row">
<input
id="todo-input"
value={text}
onChange={(event) => setText(event.target.value)}
placeholder="Buy milk"
/>
<button onClick={addTodo}>Add todo</button>
</div>
</div>
<p role="status" className="status-line">
{remainingCount} items remaining
</p>
<ul aria-label="Todo list" className="todo-list">
{items.map((item, idx) => (
<li key={idx} className={item.done ? 'todo-done' : ''}>
<button
className="todo-toggle"
onClick={() => toggleDone(idx)}
aria-pressed={item.done}
>
{item.text}
</button>
</li>
))}
</ul>
</section>
</main>
);
}
const root = ReactDOM.createRoot(document.getElementById('root'));
root.render(<App />);
body { margin: 0; font-family: system-ui, -apple-system, sans-serif; background: #f6f7fb; color: #1f2937; }
.todo-shell { min-height: 100vh; display: grid; place-items: center; padding: 32px; }
.todo-panel { width: min(100%, 560px); background: white; border: 1px solid #d9dee8; border-radius: 8px; padding: 28px; box-shadow: 0 18px 40px rgba(31, 41, 55, 0.08); }
.eyebrow { margin: 0 0 8px; color: #4b5563; font-size: 0.85rem; font-weight: 700; text-transform: uppercase; letter-spacing: 0.04em; }
h1 { margin: 0 0 24px; font-size: 2rem; }
label { display: block; margin-bottom: 8px; font-weight: 700; }
.todo-row { display: flex; gap: 10px; }
input { flex: 1; min-width: 0; background: white; color: #1f2937; border: 1px solid #b8c0cc; border-radius: 6px; padding: 10px 12px; font: inherit; }
.todo-row > button { border: 0; border-radius: 6px; padding: 10px 14px; background: #2563eb; color: white; font: inherit; font-weight: 700; cursor: pointer; }
.status-line { margin: 18px 0 0; color: #4b5563; font-weight: 600; }
.todo-list { margin: 12px 0 0; padding-left: 0; list-style: none; }
.todo-list li { margin: 8px 0; }
.todo-toggle { display: block; width: 100%; text-align: left; color: #1f2937; border: 1px solid #d9dee8; border-radius: 6px; padding: 10px 12px; background: white; font: inherit; cursor: pointer; }
.todo-done .todo-toggle { color: #9ca3af; text-decoration: line-through; }
/* Dark mode */
[data-bs-theme="dark"] body { background: #1c2533; color: #e6edf3; }
[data-bs-theme="dark"] .todo-panel { background: #232a36; border-color: #2a323e; box-shadow: 0 18px 40px rgba(0, 0, 0, 0.4); }
[data-bs-theme="dark"] .eyebrow { color: #9ca3af; }
[data-bs-theme="dark"] input { background: #2a323e; color: #e6edf3; border-color: #3a4351; }
[data-bs-theme="dark"] input::placeholder { color: #6b7280; }
[data-bs-theme="dark"] .todo-row > button { background: #3b82f6; }
[data-bs-theme="dark"] .status-line { color: #9ca3af; }
[data-bs-theme="dark"] .todo-toggle { background: #2a323e; color: #e6edf3; border-color: #3a4351; }
[data-bs-theme="dark"] .todo-done .todo-toggle { color: #6b7280; }
import { test, expect } from '@playwright/test';
// Worked example — read this carefully before writing the next two.
test('marking a todo as done decrements the count and keeps it visible', async ({ page }) => {
await page.goto('/');
for (const t of ['Milk', 'Bread', 'Eggs']) {
await page.getByRole('textbox', { name: /todo item/i }).fill(t);
await page.getByRole('button', { name: /add todo/i }).click();
}
const milkToggle = page.getByRole('button', { name: 'Milk' });
await milkToggle.click();
// Promise 1 — toggle state (semantic ARIA contract).
await expect(milkToggle).toHaveAttribute('aria-pressed', 'true');
// Promise 2 — count decrements (pin the number).
await expect(page.getByRole('status')).toContainText('2');
// Promise 3 — item stays visible (binary contract).
await expect(
page.getByRole('listitem').filter({ hasText: 'Milk' })
).toBeVisible();
});
// Your turn: fill in Act + Assert.
test('marking and unmarking a todo restores the count', async ({ page }) => {
// Arrange: navigate and add one todo "Milk".
await page.goto('/');
await page.getByRole('textbox', { name: /todo item/i }).fill('Milk');
await page.getByRole('button', { name: /add todo/i }).click();
// TODO: Act — mark Milk as done, then unmark it.
// TODO: Assert — Milk's aria-pressed is "false"; the status shows "1".
});
test('marking one of two todos shows count of 1', async ({ page }) => {
// Arrange: navigate and add two todos "Milk" and "Bread".
await page.goto('/');
for (const t of ['Milk', 'Bread']) {
await page.getByRole('textbox', { name: /todo item/i }).fill(t);
await page.getByRole('button', { name: /add todo/i }).click();
}
// TODO: Act — mark "Milk" as done.
// TODO: Assert — status shows "1"; "Milk" is done; "Bread" is not done.
});
Multi-Promise Features — Capstone Knowledge Check
Min. score: 80%
1. A “checkout” feature has three spec’d promises:
- After paying, the user sees an order confirmation.
- The order ID is shown so the user can reference it later.
- A confirmation email is sent (verifiable via a test mailbox).
Multi-promise features need per-promise specificity decisions. Each promise has its own answer to “what exactly is this asserting, and what’s allowed to change?” Pinning everything strictly creates a brittle suite; pinning everything loosely creates a leaky one. The skill is judgment: read each promise, decide its specificity independently.
2. Your team built a notifications panel with these spec’d behaviors:
- Unread notifications show a red badge with the count.
- Clicking the bell icon opens the panel.
- Notifications are listed in reverse chronological order.
await expect(badge).toHaveCSS('background-color', 'rgb(239, 68, 68)').
What’s the right diagnosis?
The principle works on both sides — locators (Step 5) and assertions (Step 6). When an assertion pins something the spec doesn’t promise (specific color, exact wording, internal classnames), it generates false alarms. The fix is to find the user-facing promise and pin only that.
3. (Spaced review — Steps 1–6, the integration question) Imagine you’re writing an e2e test for a new feature, before any code exists. Which is the most useful first step?
The Spec Card is the central artifact this tutorial built up to. Every test should start with one — even a small one written in 30 seconds. The cost of writing it is small; the cost of not writing it is the brittle/loose tests you’ve been learning to avoid.