Playwright Tutorial: End-to-End Testing for React Apps
Translate the testing concepts from Testing Foundations into the browser. Write end-to-end tests in Playwright that test behavior, not implementation — tests that survive harmless refactors and fail for real bugs.
Anatomy of a Playwright Test: Navigate, Interact, Assert
Why this matters
Every Playwright test you ever write — at work, on capstones, debugging at 11pm — is a variation on three lines: navigate to the page, interact with the UI, assert what the user sees. Lock that rhythm in now and the rest of the tutorial becomes pattern-matching against it. Skip it, and every later step feels like memorization.
🎯 You will learn to
- Analyze a basic Playwright test and identify how each line maps onto the Arrange / Act / Assert pattern from Testing Foundations
- Apply the navigate-interact-assert rhythm to read unfamiliar Playwright tests at a glance
In Testing Foundations you wrote tests like this:
def test_valid_name_accepted():
    assert squad_name_valid("epic") is True
That test verifies one function in isolation. A Playwright test verifies a whole React app through a real browser, the way a user experiences it. Same AAA bones, different organism.
🔄 Concept bridge
| Testing Foundations (pytest) | Playwright (e2e) |
|---|---|
| Arrange / Act / Assert | Navigate / Interact / Assert |
| Function inputs | User actions through the UI |
| Direct return value | Observable outcome on the page |
| Synchronous | Async (await everywhere) |
| Strong oracle: == exact match | Strong oracle: toHaveText, toHaveCount, … |
The discipline is the same. The mechanics differ.
🌳 Primer: what getByRole actually queries
Before you read the test, lock in this concept — every locator in the test below depends on it.
Most semantic HTML elements have an implicit role that the browser exposes to assistive technology (screen readers, voice control, etc.). The browser maintains a parallel tree, the accessibility tree, that mirrors the DOM but only contains semantically meaningful elements with their roles, names, and states.
| HTML | Implicit role | Accessible name source |
|---|---|---|
| <button>Save</button> | button | the visible text “Save” |
| <input type="text"> | textbox | a <label for=...> or aria-label |
| <a href="...">Home</a> | link | the visible link text |
| <ul><li>X</li></ul> | list containing listitem | (none; structural) |
| <h2>Settings</h2> | heading | the visible heading text |
| <div onclick=...>Click me</div> | (no role) | (no name); invisible to screen readers |
page.getByRole('button', { name: /add todo/i }) queries this tree, not the DOM. It says: “find the element with accessible role button whose accessible name matches the regex /add todo/i.” The query doesn’t care whether the button is <button class="primary">, <button data-print-id="add">, or wrapped in five <div>s — only the role and name.
Why this matters:
- Locators stay stable across CSS refactors — change the class, change the layout, the locator still works.
- Locators break when accessibility breaks: if a teammate replaces <button> with <div onclick="...">, the locator stops finding it. That’s a feature, not a bug: the change made the page worse for screen-reader users, and the test failure surfaces that regression.
- You’re testing the same thing the user (and their assistive tech) sees, not the same thing the React renderer happens to emit on a given day.
With that primer in mind, every getByRole(...) call below is a query against the accessibility tree.
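To make the primer concrete, here is a minimal sketch in plain JavaScript of what a role + name query does. It is a toy model, not Playwright's implementation: the `accessibilityTree` array and the `queryByRole` helper are hypothetical names invented for illustration, and the roles and names are hard-coded rather than computed from real HTML.

```javascript
// A hand-built stand-in for a fragment of the Todo app's accessibility tree.
// Real browsers compute roles and names from the DOM; here they are hard-coded.
const accessibilityTree = [
  { role: 'heading', name: 'Todo Lab' },
  { role: 'textbox', name: 'Todo item' },
  { role: 'button', name: 'Add todo' },
  { role: 'list', name: 'Todo list' },
];

// Hypothetical helper (not a Playwright API): keep the nodes whose role equals
// `role` and whose accessible name matches the `name` regular expression.
function queryByRole(tree, role, name) {
  return tree.filter((node) => node.role === role && name.test(node.name));
}

const found = queryByRole(accessibilityTree, 'button', /add todo/i);
console.log(found.length); // 1

// Swapping <button> for <div onclick> removes the button role, so the same
// query finds nothing. CSS classes never appear in this model at all.
const regressedTree = accessibilityTree.map((node) =>
  node.role === 'button' ? { role: 'generic', name: '' } : node
);
console.log(queryByRole(regressedTree, 'button', /add todo/i).length); // 0
```

Notice that nothing in the query mentions classes, attributes, or nesting depth. That absence is the reason role-based locators tend to survive styling refactors.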
Read this test (don’t run yet)
import { test, expect } from '@playwright/test';
test('user can add a todo', async ({ page }) => {
  await page.goto('/');                                                  // Navigate
  await page.getByRole('textbox', { name: /todo item/i }).fill('Milk'); // Interact
  await page.getByRole('button', { name: /add todo/i }).click();        // Interact
  await expect(page.getByRole('listitem')).toHaveText('Milk');          // Assert
});
Annotations that matter:
- async ({ page }) => { … }: every Playwright test is async. page is your handle to the browser tab.
- await on every line: the browser is asynchronous. Without await, JavaScript races past the click before React’s state has updated.
- getByRole('button', { name: /add todo/i }): queries the accessibility tree (per the primer above) for a button whose accessible name matches “Add todo”.
- await expect(...).toHaveText('Milk'): Playwright’s web-first assertions auto-wait and retry until the condition holds (or the timeout expires). They’re the right tool for asynchronous UI.
⚠️ Negative-transfer trap: this is *not* React Testing Library or Jest
If you’ve used React Testing Library (RTL) with Jest, the API looks deceptively similar — getByRole, getByText, expect(...).toBeVisible(). The methods have the same names but different machinery underneath:
| Comparison point | React Testing Library + Jest | Playwright |
|---|---|---|
| What runs the test | jsdom (a fake DOM in Node) | a real Chromium browser |
| Render | React’s renderer alone | the full app + bundler + browser |
| getByRole(...) | synchronous, returns immediately | returns a locator (async, retries) |
| expect(x).toBeVisible() | synchronous Jest matcher | await expect(locator).toBeVisible() (async, auto-retries) |
| A failing assertion | shows the rendered DOM | shows the failing accessibility tree + screenshot |
| Snapshot tests | common (toMatchSnapshot) | strongly discouraged for e2e; they break on every rendering change |
| Deep render assertions | “the component received prop X” | not even possible; Playwright sees only what the user sees |
Three habits to retire before continuing:
- Never write expect(await locator.isVisible()).toBe(true). That looks like Jest, but it runs once and races. Always use await expect(locator).toBeVisible(), Playwright’s web-first form, which retries.
- Don’t reach for snapshot matchers. toMatchSnapshot works in Playwright but is the wrong tool for e2e: every refactor breaks the snapshot, even when the user-visible behavior is unchanged. Use toHaveText, toHaveCount, toHaveURL: assertions that mirror what the user would notice.
- Don’t probe component internals. “Was prop X passed?” “Is useState set to Y?” Those are unit-test concerns. Playwright sees what the browser renders. If a behavior isn’t observable through the UI, it’s not Playwright’s job to verify.
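The first habit is easier to retire once you can see the race. The sketch below is a simplified, synchronous model, assuming a fake element that only becomes visible on its third poll. `makeFakeElement`, `checkOnce`, and `checkWithRetry` are hypothetical helpers, not Playwright APIs, and real web-first assertions retry against a time budget rather than an attempt count.

```javascript
// Hypothetical fake element: reports visible only from the `readyAt`-th poll
// onward, standing in for a React re-render that lands slightly late.
function makeFakeElement(readyAt) {
  let polls = 0;
  return {
    isVisible() {
      polls += 1;
      return polls >= readyAt;
    },
  };
}

// One-shot check: the shape of expect(await locator.isVisible()).toBe(true).
function checkOnce(element) {
  return element.isVisible();
}

// Retrying check: a simplified model of a web-first assertion. It polls until
// the condition holds or the attempt budget (standing in for a timeout) runs out.
function checkWithRetry(element, maxAttempts) {
  for (let attempt = 1; attempt <= maxAttempts; attempt += 1) {
    if (element.isVisible()) return true;
  }
  return false;
}

const oneShot = checkOnce(makeFakeElement(3));            // polled too early
const retried = checkWithRetry(makeFakeElement(3), 10);   // third poll succeeds
const timedOut = checkWithRetry(makeFakeElement(50), 10); // never ready within budget
console.log({ oneShot, retried, timedOut }); // { oneShot: false, retried: true, timedOut: false }
```

The one-shot version fails even though the element shows up a moment later. The retrying version only fails when the element never arrives within its budget, which is the failure you actually care about.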
🎬 Predict — commit to a letter, then click reveal
Read the test above and pick one answer for each question. Commit (out loud, on paper, or in your head) before opening the reveal: making a prediction is what primes the encoding; skimming straight to the reveal teaches you nothing.
Q1. If we changed name: /add todo/i to name: /save/i, what happens?
- (a) The test still passes: getByRole matches buttons by role, not name.
- (b) The test fails fast: Playwright throws “no such button” on the next line.
- (c) The test fails on a 30-second timeout: the locator silently retries waiting for a “Save” button that never appears.
- (d) Compile error: name: requires a string literal, not a regex.
Reveal — pick first, then click
(c). The role+name query is lazy and retrying (that’s the whole point of web-first locators). With no matching button, the click keeps retrying until a timeout expires, by default the 30-second test timeout, which surfaces as a slow-failing test, not a fast crash. (a) is the wrong direction: when name is given, it is a filter the element must match, not a hint. (b) is the React Testing Library mental model leaking in: RTL’s getByRole throws synchronously; Playwright’s doesn’t. (d) is wrong because regex is allowed (and idiomatic).
Q2. Which line is the Assert step?
- (a) await page.goto('/')
- (b) await page.getByRole('textbox', ...).fill('Milk')
- (c) await page.getByRole('button', ...).click()
- (d) await expect(page.getByRole('listitem')).toHaveText('Milk')
Reveal
(d). Only expect(...) calls are assertions — they check an outcome. goto, fill, click are commands that do things to the page. If you can’t point to which line is the assertion, the test isn’t proving what you think.
▶ Run
Click Test in the Live Preview toolbar. The test passes against the demo Todo app.
🔍 Investigate
Why is await on every line? The browser is asynchronous: clicking a button doesn’t instantly produce the result. await says “wait for this to finish before moving on.” Without await, the assertion would race past the click before React re-rendered, and the test would either fail or — worse — pass for the wrong reason.
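Here is a small sketch of that ordering problem, assuming a fake page whose click() takes 10 ms to update state. `makeFakePage`, `withoutAwait`, and `withAwait` are hypothetical names for this illustration only, and the timer is a stand-in for the browser round trip plus React's re-render.

```javascript
// Hypothetical fake page: click() resolves only after the "re-render" lands.
function makeFakePage() {
  const page = {
    items: [],
    click() {
      return new Promise((resolve) => {
        setTimeout(() => {
          page.items.push('Milk');
          resolve();
        }, 10);
      });
    },
  };
  return page;
}

async function withoutAwait() {
  const page = makeFakePage();
  const pending = page.click();   // not awaited: execution continues immediately
  const seen = page.items.length; // reads state before the update has landed
  await pending;                  // only so the timer finishes before we return
  return seen;
}

async function withAwait() {
  const page = makeFakePage();
  await page.click();             // pauses here until the update has landed
  return page.items.length;
}

const results = Promise.all([withoutAwait(), withAwait()]);
results.then(([stale, fresh]) => console.log({ stale, fresh })); // { stale: 0, fresh: 1 }
```

Both functions perform the same click. Only the awaited one reads the list after the item exists.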
✏️ Modify — predict the failure shape, then run
Change the assertion to look for 'Bread' instead of 'Milk'. Before you click Test, commit to one of these:
- (a) Locator-not-found timeout (no element matched).
- (b) Text mismatch: the failure message names both the expected (Bread) and actual (Milk) text.
- (c) Both: Playwright reports two failures.
- (d) The test passes: toHaveText does a substring match.
Run, then check your prediction.
Reveal
(b). The locator finds the listitem (it exists); the assertion retries until its timeout, then fails on the text comparison, and the failure message includes both expected and actual. (d) is wrong because toHaveText with a string compares against the element’s full text, not a substring. Building the habit of predicting the failure message shape is the difference between debugging by reading and debugging by guessing.
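If option (d) tempted you, this sketch may help separate the two matching styles. It is a simplified model in plain JavaScript, not Playwright's code: `normalize`, `hasText`, and `containsText` are hypothetical helpers, with `hasText` playing the role of toHaveText given a string and `containsText` playing the role of Playwright's substring matcher, toContainText.

```javascript
// Collapse runs of whitespace and trim, a rough model of the normalization
// text matchers apply before comparing.
function normalize(text) {
  return text.replace(/\s+/g, ' ').trim();
}

// Full-text comparison: the whole normalized text must be equal.
function hasText(actual, expected) {
  return normalize(actual) === normalize(expected);
}

// Substring comparison: the expected text only has to appear somewhere.
function containsText(actual, expected) {
  return normalize(actual).includes(normalize(expected));
}

const rendered = '  Milk  ';
console.log(hasText(rendered, 'Milk'));     // true
console.log(hasText(rendered, 'Mil'));      // false: a prefix is not the full text
console.log(containsText(rendered, 'Mil')); // true
console.log(hasText(rendered, 'Bread'));    // false: the mismatch from the exercise
```

A full-text oracle is the stronger of the two: it fails when extra or missing characters sneak into the rendered item, where a substring check would stay green.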
📝 House rule (carry it forward)
A Playwright test reads navigate → interact → assert. The test title is the spec — what user-visible promise we’re proving — not a description of clicks.
function App() {
  const [items, setItems] = React.useState([]);
  const [text, setText] = React.useState('');
  function addTodo() {
    const trimmed = text.trim();
    if (!trimmed) return;
    setItems([...items, trimmed]);
    setText('');
  }
  return (
    <main className="todo-shell">
      <section className="todo-panel">
        <p className="eyebrow">Playwright tutorial</p>
        <h1>Todo Lab</h1>
        <div className="todo-form">
          <label htmlFor="todo-input">Todo item</label>
          <div className="todo-row">
            <input
              id="todo-input"
              value={text}
              onChange={(event) => setText(event.target.value)}
              placeholder="Buy milk"
            />
            <button onClick={addTodo}>Add todo</button>
          </div>
        </div>
        <ul aria-label="Todo list" className="todo-list">
          {items.map((item, index) => (
            <li key={index}>{item}</li>
          ))}
        </ul>
      </section>
    </main>
  );
}
const root = ReactDOM.createRoot(document.getElementById('root'));
root.render(<App />);
body {
margin: 0;
font-family: system-ui, -apple-system, BlinkMacSystemFont, "Segoe UI", sans-serif;
background: #f6f7fb;
color: #1f2937;
}
.todo-shell {
min-height: 100vh;
display: grid;
place-items: center;
padding: 32px;
}
.todo-panel {
width: min(100%, 560px);
background: white;
border: 1px solid #d9dee8;
border-radius: 8px;
padding: 28px;
box-shadow: 0 18px 40px rgba(31, 41, 55, 0.08);
}
.eyebrow {
margin: 0 0 8px;
color: #4b5563;
font-size: 0.85rem;
font-weight: 700;
text-transform: uppercase;
letter-spacing: 0.04em;
}
h1 { margin: 0 0 24px; font-size: 2rem; }
label { display: block; margin-bottom: 8px; font-weight: 700; }
.todo-row { display: flex; gap: 10px; }
input {
flex: 1;
min-width: 0;
background: white;
color: #1f2937;
border: 1px solid #b8c0cc;
border-radius: 6px;
padding: 10px 12px;
font: inherit;
}
button {
border: 0;
border-radius: 6px;
padding: 10px 14px;
background: #2563eb;
color: white;
font: inherit;
font-weight: 700;
cursor: pointer;
}
.todo-list { margin: 24px 0 0; padding-left: 24px; }
.todo-list:empty { display: none; }
.todo-list li { margin: 8px 0; }
/* Dark mode — the iframe inherits the host page's theme via
[data-bs-theme="dark"] on <html>. Mirror the site's dark palette
so the Todo app preview stays legible when students switch themes. */
[data-bs-theme="dark"] body { background: #1c2533; color: #e6edf3; }
[data-bs-theme="dark"] .todo-panel {
background: #232a36;
border-color: #2a323e;
box-shadow: 0 18px 40px rgba(0, 0, 0, 0.4);
}
[data-bs-theme="dark"] .eyebrow { color: #9ca3af; }
[data-bs-theme="dark"] input {
background: #2a323e;
color: #e6edf3;
border-color: #3a4351;
}
[data-bs-theme="dark"] input::placeholder { color: #6b7280; }
[data-bs-theme="dark"] button { background: #2563eb; }
import { test, expect } from '@playwright/test';
test('user can add a todo', async ({ page }) => {
  await page.goto('/');
  await page.getByRole('textbox', { name: /todo item/i }).fill('Milk');
  await page.getByRole('button', { name: /add todo/i }).click();
  await expect(page.getByRole('listitem')).toHaveText('Milk');
});
Step 1 — Knowledge Check
Min. score: 80%
1. Which of these test titles best describes a behavioral spec (rather than a click-script)?
Test names should read like product promises, not click sequences. A good rule of thumb: if a future developer sees the test fail in CI, can they tell from the name alone what user-facing thing broke? If yes, the name is doing its job.
2. Why does this Playwright assertion need await?
await expect(page.getByText('Milk')).toBeVisible();
await expect(locator).matcher() is the canonical Playwright shape. The matcher retries until it succeeds or hits the timeout. Without await, JavaScript fires the matcher and immediately moves on, ignoring whether it ever held.
3. In the test below, which line is the Assert step?
test('user can add a todo', async ({ page }) => {
  await page.goto('/');                                                  // Line 1
  await page.getByRole('textbox', { name: /todo item/i }).fill('Milk'); // Line 2
  await page.getByRole('button', { name: /add todo/i }).click();        // Line 3
  await expect(page.getByText('Milk')).toBeVisible();                   // Line 4
});
Playwright’s navigate / interact / assert is the same shape as foundations’ Arrange / Act / Assert. Each test should have one assertion phase that verifies the user-visible promise. If you can’t point to which line is the assertion, the test probably isn’t proving what you think.
The Spec Card: Choosing What User Paths Deserve a Test
Why this matters
The hardest part of e2e testing isn’t writing the test — it’s deciding which tests to write. Without a deliberate selection method, you end up testing whatever came to mind first, missing the partitions that actually catch bugs. The Spec Card is the artifact that forces the question what about this feature is the stable contract? before you commit code that pins the wrong thing.
🎯 You will learn to
- Apply input-space partitioning from Testing Foundations to user-path partitioning in e2e
- Create a Spec Card that names a feature’s stable contract before writing the test
- Evaluate which user paths deserve an e2e test versus a lower test layer
🧠 Quick recall — commit before reading on
Q. Why does Playwright need await in front of expect(locator).toBeVisible()?
- (a) JavaScript requires await on every line in async functions.
- (b) Web-first assertions auto-wait and retry; without await, the assertion fires once and races past React’s render.
- (c) await makes the test go faster.
- (d) Without await, the test won’t compile.
Reveal
(b). The matcher returns a Promise that retries until the condition holds or the timeout expires. Drop the await and the test stops waiting for that Promise: JavaScript moves on immediately, and a failure can go unreported. That is silent flakiness, the worst kind of failure.
From foundations partitions to user-path partitions
In Testing Foundations, you partitioned the input space of a function and picked one representative input per partition. In e2e, you partition the user-path space — the different user behaviors a feature has to support — and pick one representative test per partition.
Same discipline. Different domain.
📋 Introducing the Spec Card
Before you write an e2e test, write down the spec it’s verifying. Five fields, fits on screen:
Spec Card: User can add a todo
✓ Behavior: User types a name, clicks Add, sees it in the list.
✓ Should pass when: CSS classes change. The Add button is restyled. The input becomes a `<textarea>`. The list becomes a table.
✗ Should fail when: Adding silently drops items. Empty inputs are accepted. The input doesn't clear after add.
🎯 Locator contract: A textbox labeled "Todo item"; a button named "Add todo"; a list of items.
✅ Oracle: The new item is visible in the list.
The Spec Card is the artifact you carry through the rest of the tutorial. It forces the question what about this UI is the stable contract? before you write code that can pin the wrong thing.
Notice the “Should pass when” line: it lists implementation changes that should not break the test. That’s your defense against brittleness later.
✏️ Fill in your own Spec Card — pick one of two ways
Two equally good options. Pick whichever fits how you think:
- In-editor template: Open notes/spec-card.md in the file tree on the left. It’s a fillable Markdown template (auto-saved alongside your code). Fill it in for the whitespace-only input test you’re about to write below.
- Standalone tool: Open the Spec Card tool in a new tab. Same five fields, but as a structured form with auto-save, Export-as-Markdown, and Copy-to-clipboard. The tool persists across tutorials so you can build a portfolio of Spec Cards as you write tests at school and at work.
Either way, fill the card in before you touch the test code below. The whole point of the Spec Card is that the decisions get made upstream of typing.
🎬 Predict — which user-path partitions are missing?
Three tests are pre-written in tests/add-todo.spec.js. They cover:
- Happy path: "Milk" is accepted.
- Empty input: "" is rejected.
- Very long input: a 200-character string is accepted.
Read the spec under App.jsx: the app trims input before deciding. Which partition is missing from the tests?
(In your head, before reading on…)
Reveal
The missing partition is whitespace-only input ("   "). After trimming, it equals "", so the spec says it should be rejected. From the partition perspective it behaves exactly like the empty-string case, but with a different surface input.
▶ Run
Click Test. Three tests pass; the fourth is a // TODO you’ll fill in next.
✏️ Modify — write the missing partition test
In tests/add-todo.spec.js, find the whitespace-only input is rejected test. The Arrange / Act / Assert comments are placeholders — fill them in, following the pattern of the three tests above.
Hints will appear on test failure — work through them in layers if you get stuck.
🔍 Investigate
You now have four tests for one feature, each covering a different partition. Why not write a test for every possible input?
The foundations answer applies: representative coverage with low cost. We don’t need a separate test for " ", "  ", "   ", "\t", … because they’re all in the same partition (whitespace-only) and the trimming logic processes them identically. One representative test per partition is enough.
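The one-representative-per-partition idea can be sketched in a few lines of plain JavaScript. `isAccepted` mirrors the trim-then-reject rule in addTodo(); `partitionOf`, its partition labels, and the 200-character threshold for "very long" are assumptions made for this illustration (the threshold is borrowed from the pre-written long-input test, not from any rule in the app).

```javascript
// Mirrors the rule in addTodo(): trim first, then reject an empty result.
function isAccepted(input) {
  return input.trim().length > 0;
}

// Hypothetical partition labels for the "add a todo" input space.
function partitionOf(input) {
  if (input.length === 0) return 'empty';
  if (input.trim().length === 0) return 'whitespace-only';
  if (input.length >= 200) return 'very long'; // assumed threshold
  return 'typical';
}

const candidates = ['', ' ', '   ', '\t', 'Milk', 'Bread', 'x'.repeat(200)];

// Keep the first candidate seen in each partition as its representative.
const representatives = new Map();
for (const input of candidates) {
  const partition = partitionOf(input);
  if (!representatives.has(partition)) representatives.set(partition, input);
}

for (const [partition, input] of representatives) {
  console.log(partition, JSON.stringify(input.slice(0, 10)), isAccepted(input));
}
console.log(representatives.size); // 4
```

Seven candidate inputs collapse into four representatives. The three whitespace variants all land in one partition, so a single test covers them.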
📝 House rules added
- Use partitions to choose user paths. You don’t need a test for every string. You need one test per behaviorally-distinct partition.
- Not every test belongs in e2e. Many edge cases live more cheaply in unit tests. Reserve e2e tests for behaviors that need full-stack browser confidence.
function App() {
  const [items, setItems] = React.useState([]);
  const [text, setText] = React.useState('');
  function addTodo() {
    const trimmed = text.trim();
    if (!trimmed) return;
    setItems([...items, trimmed]);
    setText('');
  }
  return (
    <main className="todo-shell">
      <section className="todo-panel">
        <p className="eyebrow">Playwright tutorial</p>
        <h1>Todo Lab</h1>
        <div className="todo-form">
          <label htmlFor="todo-input">Todo item</label>
          <div className="todo-row">
            <input
              id="todo-input"
              value={text}
              onChange={(event) => setText(event.target.value)}
              placeholder="Buy milk"
            />
            <button onClick={addTodo}>Add todo</button>
          </div>
        </div>
        <ul aria-label="Todo list" className="todo-list">
          {items.map((item, index) => (
            <li key={index}>{item}</li>
          ))}
        </ul>
      </section>
    </main>
  );
}
const root = ReactDOM.createRoot(document.getElementById('root'));
root.render(<App />);
body { margin: 0; font-family: system-ui, -apple-system, sans-serif; background: #f6f7fb; color: #1f2937; }
.todo-shell { min-height: 100vh; display: grid; place-items: center; padding: 32px; }
.todo-panel { width: min(100%, 560px); background: white; border: 1px solid #d9dee8; border-radius: 8px; padding: 28px; box-shadow: 0 18px 40px rgba(31, 41, 55, 0.08); }
.eyebrow { margin: 0 0 8px; color: #4b5563; font-size: 0.85rem; font-weight: 700; text-transform: uppercase; letter-spacing: 0.04em; }
h1 { margin: 0 0 24px; font-size: 2rem; }
label { display: block; margin-bottom: 8px; font-weight: 700; }
.todo-row { display: flex; gap: 10px; }
input { flex: 1; min-width: 0; background: white; color: #1f2937; border: 1px solid #b8c0cc; border-radius: 6px; padding: 10px 12px; font: inherit; }
button { border: 0; border-radius: 6px; padding: 10px 14px; background: #2563eb; color: white; font: inherit; font-weight: 700; cursor: pointer; }
.todo-list { margin: 24px 0 0; padding-left: 24px; }
.todo-list:empty { display: none; }
.todo-list li { margin: 8px 0; }
/* Dark mode (iframe sets [data-bs-theme="dark"] on <html>) */
[data-bs-theme="dark"] body { background: #1c2533; color: #e6edf3; }
[data-bs-theme="dark"] .todo-panel { background: #232a36; border-color: #2a323e; box-shadow: 0 18px 40px rgba(0, 0, 0, 0.4); }
[data-bs-theme="dark"] .eyebrow { color: #9ca3af; }
[data-bs-theme="dark"] input { background: #2a323e; color: #e6edf3; border-color: #3a4351; }
[data-bs-theme="dark"] input::placeholder { color: #6b7280; }
[data-bs-theme="dark"] button { background: #2563eb; }
import { test, expect } from '@playwright/test';
test('user can add a todo (happy path)', async ({ page }) => {
  await page.goto('/');
  await page.getByRole('textbox', { name: /todo item/i }).fill('Milk');
  await page.getByRole('button', { name: /add todo/i }).click();
  await expect(page.getByRole('listitem')).toHaveText('Milk');
});

test('empty input is rejected', async ({ page }) => {
  await page.goto('/');
  await page.getByRole('button', { name: /add todo/i }).click();
  await expect(page.getByRole('listitem')).toHaveCount(0);
});

test('very long todo is accepted', async ({ page }) => {
  await page.goto('/');
  const long = 'x'.repeat(200);
  await page.getByRole('textbox', { name: /todo item/i }).fill(long);
  await page.getByRole('button', { name: /add todo/i }).click();
  await expect(page.getByRole('listitem')).toHaveText(long);
});

// TODO: write the missing partition test here.
// The spec trims input before deciding whether to accept it,
// so whitespace-only input is in the same partition as empty input.
test('whitespace-only input is rejected', async ({ page }) => {
  // Arrange: navigate to the page.
  // Act: fill the input with whitespace, click Add todo.
  // Assert: no list item was added.
});
# Spec Card: User can add a todo (whitespace-only rejected)
Fill this in BEFORE writing the test. The decisions made here
determine which assertions and locators you'll commit to below.
## ✓ Behavior
<!-- One sentence: what user-visible behavior are you proving? -->
## ✓ Should pass when
<!-- Implementation changes the test must SURVIVE.
Examples: CSS class renames, button restyles, layout shifts. -->
## ✗ Should fail when
<!-- Regressions the test must CATCH.
Examples: whitespace input is accepted, the input doesn't
clear after submit, the list silently drops items. -->
## 🎯 Locator contract
<!-- Which semantic queries identify each element?
Prefer role + accessible name, label, or semantic test ID.
Avoid CSS classes and DOM positions. -->
## ✅ Oracle
<!-- Observable outcome that confirms success.
What would the user see? -->
---
Prefer a structured form? Open the standalone Spec Card tool at
/SEBook/tools/spec-card (auto-saves, exports as Markdown).
Step 2 — Knowledge Check
Min. score: 80%
1. Which of these scenarios is the BEST candidate for an end-to-end test (rather than a unit or integration test)?
E2E tests are expensive confidence. Spend that budget on flows where the full integration matters: auth, routing, state-across-pages, cross-service behaviors. Push validation rules, formatters, and API contracts to lower test layers where they’re cheaper and clearer.
2. What is the purpose of the “Should pass when” field on a Spec Card?
The Spec Card’s “Should pass when” line forces you to think about the test’s durability before you write it. If you can predict that a CSS class rename should be harmless but you choose a CSS-class locator anyway, you’ve already lost.
3. (Spaced review — Step 1) A Playwright test contains the line:
expect(await page.getByText('Saved').isVisible()).toBe(true);
expect(await locator.isVisible()).toBe(true) is the canonical Playwright anti-pattern. Always use await expect(locator).toBeVisible() — the web-first form auto-waits and retries.
The Locator Ladder: Stable Contracts vs Incidental UI
Why this matters
The locator you choose is the contract between your test and the UI — it decides which UI changes will (correctly) break the test and which will (incorrectly) break it. Pick the wrong rung of the ladder and your test either fails on every CSS rename (false alarms that erode trust) or stays green when accessibility regresses (silent failures). The locator ladder is how you make that choice deliberately, not by accident.
🎯 You will learn to
- Analyze five locator strategies and identify what each one depends on (semantics vs implementation)
- Apply the locator ladder to choose the highest rung the UI actually supports
- Evaluate locator durability against three classes of refactor (CSS rename, text change, DOM restructure)
🧠 Quick recall — commit before reading on
Q. From your Spec Card in Step 2, what does the “Locator contract” field name?
- (a) The exact CSS selectors the test should use.
- (b) The semantic queries (role + accessible name, label, test ID) that identify each element the test interacts with — the stable part of the UI surface.
- (c) The list of test cases the test should cover.
- (d) The CI pipeline that runs the test.
Reveal
(b). “Locator contract” names what about each element is stable — the role and accessible name, the label association, or the semantic test ID. CSS selectors (a) are the brittle rung. Test cases (c) belong in the test code, not the Spec Card.
🎯 The locator ladder
There are five common ways to find the same UI element in Playwright. Each rung depends on something different about the UI.
// Five ways to find the same "Add todo" button:
// Rung 1 — Role + accessible name. Mirrors how assistive tech finds it.
page.getByRole('button', { name: /add todo/i });
// Rung 2 — Label association (best for form controls).
page.getByLabel(/todo item/i); // (this would find the input, not the button)
// Rung 3 — Visible text content.
page.getByText('Add todo');
// Rung 4 — Author-supplied stable test ID.
page.getByTestId('add-todo');
// Rung 5 — Raw CSS/DOM selector (last resort).
page.locator('.add-todo-btn');
What each rung depends on:
| Rung | Locator | Depends on |
|---|---|---|
| 1 | getByRole + name: | The button has an accessible name (HTML semantics) |
| 2 | getByLabel | A <label for="…"> connection (forms) |
| 3 | getByText | Exact visible text |
| 4 | getByTestId | An author-added data-testid attribute |
| 5 | .locator('.css-class') | The DOM/CSS structure (implementation detail) |
Higher rungs depend on accessible / user-visible facts. Lower rungs depend on implementation decisions (CSS classes, DOM positions). The official Playwright docs put it bluntly: “Your DOM can easily change … Prefer user-facing attributes to XPath or CSS selectors.”
🎬 Predict — commit to a letter, then click reveal
The team is about to ship three independent changes to the Add button: a CSS-class rename (.add-todo-btn → .primary-btn), a button-text change ("Add todo" → "Add"), and a DOM restructure (the button moves into a different parent element). The user-visible behavior — clicking it adds a todo — doesn’t change.
Q. Of the five locators above, which two would survive all three changes without a single edit?
- (a) Rungs 1 and 4: getByRole('button', { name: /add/i }) and getByTestId('add-todo').
- (b) Rungs 1 and 3: both query user-visible text in some form.
- (c) Rungs 2 and 5: both target form-control specifics.
- (d) None: every locator breaks on at least one change.
Reveal — pick first, then click
(a). getByRole('button', { name: /add/i }) survives all three: regex tolerance covers the text change (“Add” still matches /add/i), and the role-based query is independent of CSS classes and DOM ancestry. getByTestId('add-todo') survives because the data-testid is author-controlled and travels with the element wherever it moves. Of the rest, getByText('Add todo') breaks on the text change, the CSS selector breaks on the class rename (and often on the DOM restructure), and getByLabel does not apply to a button at all. The Investigate table below shows the per-cell answer if you want the full breakdown, but the lesson lands in those two rows.
▶ Run
Click Test. All five locators currently work against the Todo app — the file tests/locator-ladder.spec.js has one test per rung, all passing.
🔍 Investigate — reveal the answer table
| Locator | CSS rename | Text change | DOM restructure |
|---|---|---|---|
| 1. getByRole({name:/add/i}) | ✓ | ✓ (a) | ✓ |
| 2. getByLabel | ✓ | ✓ (b) | ✓ |
| 3. getByText('Add todo') | ✓ | ✗ | ✓ |
| 4. getByTestId('add-todo') | ✓ | ✓ | ✓ |
| 5. .locator('.add-todo-btn') | ✗ | ✓ | ✗ (c) |
Notes:
- (a) With a regex /add/i, the role locator survives “Add todo” → “Add” (regex still matches). With an exact name: 'Add todo' it would break. Regex tolerance is a deliberate design choice.
- (b) getByLabel finds inputs via their <label>; it doesn’t apply to buttons, so this row is listed only for completeness.
- (c) A DOM restructure (changing the button’s surrounding markup) often changes CSS-selector ancestry. Brittle.
The pattern: getByTestId is the only rung that survives a button-text change without depending on the wording at all; getByRole survives here only because its regex was written loosely. But getByTestId requires the author to have added the test ID, which is a code-level decision. And test IDs done badly (<button data-testid="blue-btn-right-col">) are just CSS coupling under another name.
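The durability reasoning above can be replayed as a small simulation. This is a sketch under stated assumptions, not a run against a real browser: the element is reduced to a plain object, each locator to a predicate, and the new parent `form.todo-form` is a hypothetical name. Rung 5 is modeled as an ancestry-qualified selector (an assumed variant of `.add-todo-btn`) so that the DOM-restructure failure from note (c) is visible, and getByLabel is left out because it doesn't apply to buttons.

```javascript
// Hypothetical model of the Add button: only the facts each rung depends on.
const original = {
  role: 'button',
  name: 'Add todo', // accessible name, taken from the visible text
  text: 'Add todo',
  testId: 'add-todo',
  className: 'add-todo-btn',
  parent: 'div.todo-row',
};

// The three refactors from the prediction exercise.
const refactors = {
  'CSS rename': { className: 'primary-btn' },
  'Text change': { name: 'Add', text: 'Add' },
  'DOM restructure': { parent: 'form.todo-form' }, // hypothetical new parent
};

// Each locator is modeled as a predicate over the element.
const locators = {
  'getByRole(/add/i)': (el) => el.role === 'button' && /add/i.test(el.name),
  "getByText('Add todo')": (el) => el.text.toLowerCase().includes('add todo'),
  "getByTestId('add-todo')": (el) => el.testId === 'add-todo',
  "locator('div.todo-row > .add-todo-btn')": (el) =>
    el.parent === 'div.todo-row' && el.className === 'add-todo-btn',
};

// For every locator, apply each refactor alone and record whether it still matches.
function survivalTable() {
  const table = {};
  for (const [locatorName, matches] of Object.entries(locators)) {
    table[locatorName] = {};
    for (const [refactorName, change] of Object.entries(refactors)) {
      table[locatorName][refactorName] = matches({ ...original, ...change });
    }
  }
  return table;
}

const table = survivalTable();
console.table(table);

// Locators that survive every refactor without an edit.
const durable = Object.keys(table).filter((name) =>
  Object.values(table[name]).every(Boolean)
);
console.log(durable); // [ 'getByRole(/add/i)', "getByTestId('add-todo')" ]
```

Tightening the regex to /add todo/i in the first predicate is a useful experiment: the role locator then drops out of the durable list, which is why the looseness of the regex is part of the contract you are choosing.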
✏️ Modify
Open tests/locator-ladder.spec.js. The fifth test uses the brittle .locator('.add-todo-btn') form. Rewrite it as a role-based locator (Rung 1). Run again — your refactored test should still pass.
📝 House rule
Pick the locator that matches the stable contract of this UI element. If the button label is part of the user-visible promise, use getByRole with a sensible regex. If the wording will change but the action is permanent, use getByTestId with a semantically named test ID. Use raw CSS only when nothing else will do — and write a comment explaining why.
function App() {
  const [items, setItems] = React.useState([]);
  const [text, setText] = React.useState('');
  function addTodo() {
    const trimmed = text.trim();
    if (!trimmed) return;
    setItems([...items, trimmed]);
    setText('');
  }
  return (
    <main className="todo-shell">
      <section className="todo-panel">
        <p className="eyebrow">Playwright tutorial</p>
        <h1>Todo Lab</h1>
        <div className="todo-form">
          <label htmlFor="todo-input">Todo item</label>
          <div className="todo-row">
            <input
              id="todo-input"
              value={text}
              onChange={(event) => setText(event.target.value)}
              placeholder="Buy milk"
            />
            <button
              className="add-todo-btn"
              data-testid="add-todo"
              onClick={addTodo}
            >
              Add todo
            </button>
          </div>
        </div>
        <ul aria-label="Todo list" className="todo-list">
          {items.map((item, index) => (
            <li key={index}>{item}</li>
          ))}
        </ul>
      </section>
    </main>
  );
}
const root = ReactDOM.createRoot(document.getElementById('root'));
root.render(<App />);
body { margin: 0; font-family: system-ui, -apple-system, sans-serif; background: #f6f7fb; color: #1f2937; }
.todo-shell { min-height: 100vh; display: grid; place-items: center; padding: 32px; }
.todo-panel { width: min(100%, 560px); background: white; border: 1px solid #d9dee8; border-radius: 8px; padding: 28px; box-shadow: 0 18px 40px rgba(31, 41, 55, 0.08); }
.eyebrow { margin: 0 0 8px; color: #4b5563; font-size: 0.85rem; font-weight: 700; text-transform: uppercase; letter-spacing: 0.04em; }
h1 { margin: 0 0 24px; font-size: 2rem; }
label { display: block; margin-bottom: 8px; font-weight: 700; }
.todo-row { display: flex; gap: 10px; }
input { flex: 1; min-width: 0; background: white; color: #1f2937; border: 1px solid #b8c0cc; border-radius: 6px; padding: 10px 12px; font: inherit; }
.add-todo-btn,
button { border: 0; border-radius: 6px; padding: 10px 14px; background: #2563eb; color: white; font: inherit; font-weight: 700; cursor: pointer; }
.todo-list { margin: 24px 0 0; padding-left: 24px; }
.todo-list:empty { display: none; }
.todo-list li { margin: 8px 0; }
/* Dark mode */
[data-bs-theme="dark"] body { background: #1c2533; color: #e6edf3; }
[data-bs-theme="dark"] .todo-panel { background: #232a36; border-color: #2a323e; box-shadow: 0 18px 40px rgba(0, 0, 0, 0.4); }
[data-bs-theme="dark"] .eyebrow { color: #9ca3af; }
[data-bs-theme="dark"] input { background: #2a323e; color: #e6edf3; border-color: #3a4351; }
[data-bs-theme="dark"] input::placeholder { color: #6b7280; }
[data-bs-theme="dark"] .add-todo-btn,
[data-bs-theme="dark"] button { background: #2563eb; }
import { test, expect } from '@playwright/test';

// Rung 1: role + accessible name (regex-tolerant).
test('rung 1: getByRole finds the Add todo button', async ({ page }) => {
  await page.goto('/');
  await page.getByRole('textbox', { name: /todo item/i }).fill('Milk');
  await page.getByRole('button', { name: /add todo/i }).click();
  await expect(page.getByRole('listitem')).toHaveText('Milk');
});

// Rung 2: getByLabel (best for inputs, but works through the form).
test('rung 2: getByLabel finds the input via its label', async ({ page }) => {
  await page.goto('/');
  await page.getByLabel(/todo item/i).fill('Bread');
  await page.getByRole('button', { name: /add todo/i }).click();
  await expect(page.getByRole('listitem')).toHaveText('Bread');
});

// Rung 3: getByText (couples to exact wording).
test('rung 3: getByText finds the button by visible text', async ({ page }) => {
  await page.goto('/');
  await page.getByRole('textbox', { name: /todo item/i }).fill('Eggs');
  await page.getByText('Add todo').click();
  await expect(page.getByRole('listitem')).toHaveText('Eggs');
});

// Rung 4: getByTestId (semantic test ID).
test('rung 4: getByTestId finds the button via data-testid', async ({ page }) => {
  await page.goto('/');
  await page.getByRole('textbox', { name: /todo item/i }).fill('Cheese');
  await page.getByTestId('add-todo').click();
  await expect(page.getByRole('listitem')).toHaveText('Cheese');
});

// Rung 5: raw CSS class (the brittle rung; REWRITE this one!).
// TODO: rewrite this test to use page.getByRole instead of CSS.
test('rung 5: brittle CSS locator (rewrite me)', async ({ page }) => {
  await page.goto('/');
  await page.getByRole('textbox', { name: /todo item/i }).fill('Butter');
  await page.locator('.add-todo-btn').click();
  await expect(page.getByRole('listitem')).toHaveText('Butter');
});
Step 3 — Knowledge Check
Min. score: 80%
1. Which of these is the BEST locator for “the user’s primary save button,” assuming the button has the visible text “Save” today, but the team has announced it will be renamed to “Submit” next quarter?
The locator ladder isn’t “always pick option 1.” It’s “pick the rung that matches the stable contract for this UI element.” When wording is stable, getByRole is best. When wording will change but the action is permanent, getByTestId is right. The choice depends on what about this UI is the promise.
2. Two versions of data-testid for the same Add Todo button — which is BETTER, and why?
Version A: <button data-testid="primary-blue-btn-right-column">
Version B: <button data-testid="add-todo-action">
Test IDs are only as durable as their naming. A test ID named after styling or layout is functionally equivalent to a CSS-class locator — it pins implementation. A test ID named after the action or the semantic role (save-action, cart-checkout-button) is what the docs intend: a stable contract that the test can rely on indefinitely.
3. (Spaced review — Step 2) Your team is debating: should “rejecting whitespace-only input” have its own e2e test, or can it be tested in the same test as “rejecting empty input”?
Partitions are the unit of test design, not individual inputs. Two inputs are in the same partition if the system processes them the same way. One representative per partition is sufficient — adding more is wasted effort, removing one is missed coverage.
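To make the partition idea concrete, here is a minimal sketch in plain JavaScript. It assumes the app decides acceptance the way the addTodo function in this tutorial does (trim the text, then reject if nothing is left); the partitionOf helper and its labels are hypothetical.

```javascript
// Hypothetical helper mirroring addTodo's guard: trim, then reject blanks.
function partitionOf(input) {
  return input.trim() === '' ? 'rejected-blank' : 'accepted';
}

// Empty and whitespace-only inputs should land in the same partition.
const inputs = ['', '   ', '\t\n', 'Milk', '  Milk  '];
for (const input of inputs) {
  console.log(JSON.stringify(input), '->', partitionOf(input));
}
```

Under this guard, the empty string and a whitespace-only string are processed identically, so one representative covers both. If the app handled whitespace through a separate code path, they would be separate partitions and each would need its own test.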
Strong Assertions: The Liar Test in the Browser
Why this matters
A green test you can’t trust is worse than no test at all — it gives false confidence while the bug ships. Liar tests are the most dangerous failure mode in an e2e suite because the test visibly clicks buttons, which makes it feel like real verification. This step makes that lie tactile: you’ll watch a buggy app pass a weak assertion, then strengthen it until it tells the truth.
🎯 You will learn to
- Analyze a passing Playwright test and recognize when its oracle is too weak to catch the spec violation
- Apply web-first assertions (await expect(...)) instead of the synchronous expect(await locator.isVisible()).toBe(true) anti-pattern
- Evaluate three weak assertion patterns and rewrite them to verify the user-visible promise
🧠 Quick recall — commit before reading on
Q. From Testing Foundations: a liar test has a PASS result that doesn’t prove the spec. What’s the defining feature?
- (a) The test runs slowly and times out before completing.
- (b) The test’s oracle is too weak — the assertion is true for both a correct implementation and a buggy one.
- (c) The test only runs on some platforms.
- (d) The test asserts on the wrong element entirely.
Reveal
(b). A liar test passes against a correct implementation and against a broken one — the assertion can’t distinguish them. The same pattern exists in e2e, and it’s sneakier here because the test visibly clicks buttons, which makes it feel “more real” than it is.
🎬 Predict — commit to a letter, then click reveal
Read this test. The Todo app you’ll run it against has a bug somewhere in addTodo — predict-and-investigate, don’t peek at the source first.
test('adding a todo shows it in the list', async ({ page }) => {
  await page.goto('/');
  await page.getByRole('textbox', { name: /todo item/i }).fill('Milk');
  await page.getByRole('button', { name: /add todo/i }).click();
  await expect(page.getByRole('listitem')).toHaveCount(1);
});
Q. Against a buggy app where addTodo somehow drops the user’s text, what does this test do?
- (a) Fail: Playwright detects the empty list item and raises.
- (b) Pass: toHaveCount(1) only counts list items; it never reads their text.
- (c) Error: toHaveCount requires non-empty content.
- (d) Flaky: sometimes passes, sometimes fails depending on render order.
Reveal — pick first, then click
(b). The assertion only counts. It says nothing about what’s inside the items. The test will be a liar: green check, broken feature.
▶ Run
Click Test.
The test passes. Surprise.
🔍 Investigate — open src/App.jsx and find the bug
Now (and only now) open src/App.jsx. The bug: addTodo stores '' instead of trimmed, so the user’s text is dropped at the state update and every <li> renders empty.
What did toHaveCount(1) actually verify? Just that one list item exists. It said nothing about what’s inside the item. The bug — empty text — is invisible to this assertion.
The assertion is a liar: PASS result, broken feature.
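The same lie can be reproduced without a browser. The sketch below is a plain-JavaScript model, not Playwright code: addTodoCorrect and addTodoBuggy are pure-function stand-ins for the two versions of addTodo, and the two oracle functions are hypothetical stand-ins for toHaveCount(1) and toHaveText('Milk').

```javascript
// Pure-function model of the two addTodo variants (no React, no browser).
function addTodoCorrect(items, text) {
  const trimmed = text.trim();
  return trimmed ? [...items, trimmed] : items;
}

function addTodoBuggy(items, text) {
  const trimmed = text.trim();
  return trimmed ? [...items, ''] : items; // bug: stores '' instead of trimmed
}

// Weak oracle: only counts items, in the spirit of toHaveCount(1).
const countOracle = (items) => items.length === 1;
// Strong oracle: also checks content, in the spirit of toHaveText('Milk').
const textOracle = (items) => items.length === 1 && items[0] === 'Milk';

for (const [label, addTodo] of [['correct', addTodoCorrect], ['buggy', addTodoBuggy]]) {
  const items = addTodo([], 'Milk');
  console.log(label, { count: countOracle(items), text: textOracle(items) });
}
```

The count oracle reports true for both variants, so it cannot tell them apart. The text oracle reports true only for the correct one, which is what makes it a usable oracle.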
Three weak assertion patterns to recognize
| Weak assertion | Why it lies |
|---|---|
| await expect(page.getByRole('list')).toBeVisible() | An empty <ul> is still “visible” |
| await expect(page.getByText('')).toBeVisible() | Always true |
| await expect(page.getByRole('listitem')).toHaveCount(1) | Doesn’t verify item content |
And one Playwright-specific anti-pattern from the official docs:
// ❌ Anti-pattern — non-retrying, no auto-wait:
expect(await page.getByText('Milk').isVisible()).toBe(true);
// ✓ Web-first form — auto-waits and retries:
await expect(page.getByText('Milk')).toBeVisible();
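Why the one-shot form is fragile can be shown with a toy model. This is a simplified, synchronous sketch: makeProbe, oneShotCheck, and retryingCheck are hypothetical names, and Playwright itself retries until a timeout rather than counting attempts.

```javascript
// A "probe" stands in for asking the page whether the element is visible.
// It answers false until it has been asked more than readyAfterCalls times,
// modeling a UI that needs a moment to render.
function makeProbe(readyAfterCalls) {
  let calls = 0;
  return () => ++calls > readyAfterCalls;
}

// One-shot check: looks exactly once, like
// expect(await locator.isVisible()).toBe(true).
function oneShotCheck(probe) {
  return probe();
}

// Retrying check: keeps looking, like await expect(locator).toBeVisible().
function retryingCheck(probe, maxAttempts) {
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    if (probe()) return true;
  }
  return false;
}

console.log(oneShotCheck(makeProbe(2)));     // false: looked once, too early
console.log(retryingCheck(makeProbe(2), 5)); // true: kept looking until ready
```

The element becomes "visible" in both runs. Only the check that keeps asking notices, which is the behavior the web-first form gives you.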
✏️ Modify
In tests/todo.spec.js, strengthen the assertion to verify the item’s text, not just the count. Predict the new failure message before re-running.
Hints will appear on test failure — work through them in layers if you get stuck.
📝 House rule
Assert the promise, not the plumbing.
The promise is what the spec said the user would see. The plumbing is which DOM nodes exist, what CSS class they have, what their internal state is. A strong assertion verifies the promise; a weak assertion verifies the plumbing without verifying what the user actually gets.
// 🐛 BUGGY APP: there's a bug somewhere in addTodo that makes the
// weak assertion lie. Predict + run the test BEFORE you hunt for it
// in the source. The Investigate phase reveals where the bug lives
// (and why the count assertion missed it).
function App() {
  const [items, setItems] = React.useState([]);
  const [text, setText] = React.useState('');

  function addTodo() {
    const trimmed = text.trim();
    if (!trimmed) return;
    setItems([...items, '']);
    setText('');
  }

  return (
    <main className="todo-shell">
      <section className="todo-panel">
        <p className="eyebrow">Buggy Todo Lab</p>
        <h1>Todo Lab</h1>
        <div className="todo-form">
          <label htmlFor="todo-input">Todo item</label>
          <div className="todo-row">
            <input
              id="todo-input"
              value={text}
              onChange={(event) => setText(event.target.value)}
              placeholder="Buy milk"
            />
            <button onClick={addTodo}>Add todo</button>
          </div>
        </div>
        <ul aria-label="Todo list" className="todo-list">
          {items.map((item, index) => (
            <li key={index}>{item}</li>
          ))}
        </ul>
      </section>
    </main>
  );
}

const root = ReactDOM.createRoot(document.getElementById('root'));
root.render(<App />);
body { margin: 0; font-family: system-ui, -apple-system, sans-serif; background: #f6f7fb; color: #1f2937; }
.todo-shell { min-height: 100vh; display: grid; place-items: center; padding: 32px; }
.todo-panel { width: min(100%, 560px); background: white; border: 1px solid #d9dee8; border-radius: 8px; padding: 28px; box-shadow: 0 18px 40px rgba(31, 41, 55, 0.08); }
.eyebrow { margin: 0 0 8px; color: #4b5563; font-size: 0.85rem; font-weight: 700; text-transform: uppercase; letter-spacing: 0.04em; }
h1 { margin: 0 0 24px; font-size: 2rem; }
label { display: block; margin-bottom: 8px; font-weight: 700; }
.todo-row { display: flex; gap: 10px; }
input { flex: 1; min-width: 0; background: white; color: #1f2937; border: 1px solid #b8c0cc; border-radius: 6px; padding: 10px 12px; font: inherit; }
button { border: 0; border-radius: 6px; padding: 10px 14px; background: #2563eb; color: white; font: inherit; font-weight: 700; cursor: pointer; }
.todo-list { margin: 24px 0 0; padding-left: 24px; min-height: 24px; }
.todo-list li { margin: 8px 0; min-height: 1.2em; }
/* Dark mode */
[data-bs-theme="dark"] body { background: #1c2533; color: #e6edf3; }
[data-bs-theme="dark"] .todo-panel { background: #232a36; border-color: #2a323e; box-shadow: 0 18px 40px rgba(0, 0, 0, 0.4); }
[data-bs-theme="dark"] .eyebrow { color: #9ca3af; }
[data-bs-theme="dark"] input { background: #2a323e; color: #e6edf3; border-color: #3a4351; }
[data-bs-theme="dark"] input::placeholder { color: #6b7280; }
[data-bs-theme="dark"] button { background: #2563eb; }
import { test, expect } from '@playwright/test';

// The weak assertion below passes against the buggy app.
// Strengthen it so the test fails; that's the bug-catching version.
test('adding a todo shows it in the list', async ({ page }) => {
  await page.goto('/');
  await page.getByRole('textbox', { name: /todo item/i }).fill('Milk');
  await page.getByRole('button', { name: /add todo/i }).click();
  // ❌ Weak assertion: only checks the count.
  await expect(page.getByRole('listitem')).toHaveCount(1);
  // TODO: replace or extend the assertion above so the test
  // catches the empty-text bug. Hint: assert the item's text.
});
Step 4 — Knowledge Check
Min. score: 80%
1. Which assertion would catch a bug where the “Mark complete” toggle visually updates (the item gets a strikethrough) but the underlying “remaining” counter does not decrement?
“Assert the promise, not the plumbing.” The promise here is that the counter reflects remaining items. If your assertion only checks visual side-effects (strikethrough, CSS classes), you’ve written a liar test: it passes for a render that’s correct in appearance but wrong in meaning.
2. Which of these is a Playwright anti-pattern that the official best-practices docs explicitly call out?
The Playwright best-practices guide is direct: “Don’t use manual assertions that are not awaiting the expect.” Always use await expect(locator).matcher() so your test gets auto-waiting and retrying — the whole point of Playwright’s web-first assertions.
3. (Spaced review — Step 3) A test uses page.locator('.add-todo-btn') to find the Add button. The team renames the CSS class to .primary-btn. The behavior is unchanged. The test fails. What’s the most accurate label for this failure?
From Step 5 onward (next!), we’ll see this pattern in action — running tests against deliberate refactors and identifying which failures are real regressions vs false alarms. The preview: a test that breaks under a behavior-preserving refactor is brittle, not catching a bug.
Behavior, Not Implementation: The Brittleness Gauntlet
Why this matters
Every brittle test on a real codebase trains the team to ignore the suite — and once trust is gone, the suite’s value collapses. The fix is not to write more tests; it’s to make sure each test breaks for the right reason. This step makes that distinction tactile by having you edit the app yourself and watch one locator survive a refactor while another shatters.
🎯 You will learn to
- Analyze a failing test and classify the break as a real regression or a false alarm
- Apply the locator ladder under pressure: predict which tests survive each refactor before running them
- Evaluate a brittle locator and rewrite it into one coupled to behavior, not styling
🧠 Quick recall — commit before reading on
Q. From Step 3 — which two locator strategies survive a CSS class rename without modification?
- (a) getByText and getByLabel
- (b) getByRole and getByTestId
- (c) getByPlaceholder and .locator('.css-class')
- (d) Only getByRole survives; every other rung breaks.
Reveal
(b). Both getByRole and getByTestId query non-CSS properties — the accessibility tree and an author-supplied data attribute, respectively. They survive any change to className. CSS-class locators (.locator('.css-class')) explicitly couple to the class.
Now we’re going to make the brittleness tactile. You’ll edit the app yourself and watch tests break.
Two tests, same behavior, two locator strategies
You have two test files in tests/:
- tests/css-locator.spec.js: uses page.locator('.add-todo-btn') (Rung 5)
- tests/role-locator.spec.js: uses page.getByRole('button', { name: /add/i }) (Rung 1)
Both verify the same behavior: clicking Add adds a todo. Both pass against the current App.jsx.
🎬 Predict — Round 1: CSS class rename. Commit to a letter, then click reveal.
Imagine the design team does a styling pass and renames the button’s CSS class:
- <button className="add-todo-btn" onClick={addTodo}>Add todo</button>
+ <button className="primary-btn" onClick={addTodo}>Add todo</button>
The user-visible behavior is identical — the button still says “Add todo” and still adds a todo.
Q. After the rename, what happens when you re-run both test files?
- (a) Both pass: the behavior didn’t change, so neither test should break.
- (b) Both fail: Playwright reloads the file and gets confused by the rename.
- (c) css-locator fails (false alarm: it broke for a styling change), role-locator passes (correctly indifferent to CSS).
- (d) role-locator fails (real regression: the role changed), css-locator passes.
Reveal — pick first, then make the edit yourself
(c). This is the entire lesson of the gauntlet. The role-based locator queries the accessibility tree (role + accessible name “Add todo”) — both unchanged. The CSS locator queries the class — which IS what changed. The behavior is identical, so the role test correctly stays green; the CSS test fails for a false alarm. You’re about to watch this happen in real time.
✏️ Edit App.jsx (one line)
Open src/App.jsx. Find the line:
<button className="add-todo-btn" onClick={addTodo}>Add todo</button>
Change add-todo-btn to primary-btn. Just that one identifier. Save the file.
▶ Run
Click Test. You will see one ❌ red and one ✓ green — that’s the design of this step. Do not “fix” the red one by reverting the rename; the red is the lesson. If you see two greens, the rename didn’t take effect (recheck App.jsx); if you see two reds, you broke something else (revert other changes and try again).
The gate below specifically asserts that tests/css-locator.spec.js is failing — passing the gate requires the css-locator test to be in its broken state.
🔍 Investigate
| Test | Result | What it tells us |
|---|---|---|
| tests/css-locator.spec.js | ❌ Fails | The test was coupled to a styling decision. The user-facing behavior didn’t change, but the test broke. This is a false alarm: wasted CI time and eroded trust in the suite. |
| tests/role-locator.spec.js | ✓ Passes | The test was coupled to the user-visible role + name. Styling changed; behavior didn’t; the test correctly didn’t notice. |
The role-based test honors what’s stable about the UI: the button has an accessible name “Add todo.” Styling is incidental. The CSS-based test pinned the incidental thing.
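A toy model shows why the two locators diverge under the rename. This is plain JavaScript, not Playwright: the button is reduced to a hypothetical object with a role, an accessible name, and a class, and byCssClass and byRole are stand-ins for page.locator('.add-todo-btn') and page.getByRole('button', { name: /add/i }).

```javascript
// The button as each locator "sees" it, before and after the styling rename.
const before = { role: 'button', name: 'Add todo', className: 'add-todo-btn' };
const after = { ...before, className: 'primary-btn' };

// Stand-in for page.locator('.add-todo-btn'): looks only at the class list.
const byCssClass = (el) => el.className.split(' ').includes('add-todo-btn');
// Stand-in for page.getByRole('button', { name: /add/i }): role + name only.
const byRole = (el) => el.role === 'button' && /add/i.test(el.name);

console.log('css  before/after:', byCssClass(before), byCssClass(after));
console.log('role before/after:', byRole(before), byRole(after));
```

The CSS stand-in flips from a match to no match, even though role and name are untouched. The role stand-in never reads className, so the rename cannot affect it.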
🔄 Mini-gauntlet, Round 2 (preview)
What if Marketing renames "Add todo" → "Add"? The role-locator’s regex /add/i matches both, so it survives. A name: 'Add todo' (exact) wouldn’t have. Whether that survival is right depends on whether the exact wording is part of the spec — and that ambiguity is exactly the trade-off Step 6 makes explicit.
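The matching rules behind that preview can be modeled in a few lines. This sketch assumes Playwright's documented default for the name option of getByRole (a string matches as a case-insensitive substring, a regular expression is tested as written) and ignores details such as whitespace normalization. The nameMatches helper and the “Add user” label are hypothetical.

```javascript
// Simplified model of matching an accessible name against a name option.
function nameMatches(accessibleName, matcher) {
  if (matcher instanceof RegExp) return matcher.test(accessibleName);
  // String matcher: case-insensitive substring.
  return accessibleName.toLowerCase().includes(matcher.toLowerCase());
}

for (const label of ['Add todo', 'Add', 'Add user']) {
  console.log(label, {
    regex: nameMatches(label, /add/i),
    string: nameMatches(label, 'Add todo'),
  });
}
```

The regex survives the rename to “Add”, and it would also match an unrelated “Add user” button. The string form rejects both. Neither is right in the abstract; which one fits depends on what the spec promises about the wording.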
📝 House rule
A test that breaks under a refactor it shouldn’t have broken under is brittle. Brittleness is the cost of coupling tests to implementation details. The Spec Card’s “Should pass when” field is your defense — write down the changes the test should survive before you write the test, then make sure your locators honor it.
// 🛠 Edit this file as instructed: rename the CSS class
// on the Add todo button from "add-todo-btn" to "primary-btn".
function App() {
  const [items, setItems] = React.useState([]);
  const [text, setText] = React.useState('');

  function addTodo() {
    const trimmed = text.trim();
    if (!trimmed) return;
    setItems([...items, trimmed]);
    setText('');
  }

  return (
    <main className="todo-shell">
      <section className="todo-panel">
        <p className="eyebrow">Brittleness gauntlet</p>
        <h1>Todo Lab</h1>
        <div className="todo-form">
          <label htmlFor="todo-input">Todo item</label>
          <div className="todo-row">
            <input
              id="todo-input"
              value={text}
              onChange={(event) => setText(event.target.value)}
              placeholder="Buy milk"
            />
            <button className="add-todo-btn" onClick={addTodo}>
              Add todo
            </button>
          </div>
        </div>
        <ul aria-label="Todo list" className="todo-list">
          {items.map((item, index) => (
            <li key={index}>{item}</li>
          ))}
        </ul>
      </section>
    </main>
  );
}

const root = ReactDOM.createRoot(document.getElementById('root'));
root.render(<App />);
body { margin: 0; font-family: system-ui, -apple-system, sans-serif; background: #f6f7fb; color: #1f2937; }
.todo-shell { min-height: 100vh; display: grid; place-items: center; padding: 32px; }
.todo-panel { width: min(100%, 560px); background: white; border: 1px solid #d9dee8; border-radius: 8px; padding: 28px; box-shadow: 0 18px 40px rgba(31, 41, 55, 0.08); }
.eyebrow { margin: 0 0 8px; color: #4b5563; font-size: 0.85rem; font-weight: 700; text-transform: uppercase; letter-spacing: 0.04em; }
h1 { margin: 0 0 24px; font-size: 2rem; }
label { display: block; margin-bottom: 8px; font-weight: 700; }
.todo-row { display: flex; gap: 10px; }
input { flex: 1; min-width: 0; background: white; color: #1f2937; border: 1px solid #b8c0cc; border-radius: 6px; padding: 10px 12px; font: inherit; }
.add-todo-btn,
.primary-btn,
button { border: 0; border-radius: 6px; padding: 10px 14px; background: #2563eb; color: white; font: inherit; font-weight: 700; cursor: pointer; }
.todo-list { margin: 24px 0 0; padding-left: 24px; }
.todo-list:empty { display: none; }
.todo-list li { margin: 8px 0; }
/* Dark mode */
[data-bs-theme="dark"] body { background: #1c2533; color: #e6edf3; }
[data-bs-theme="dark"] .todo-panel { background: #232a36; border-color: #2a323e; box-shadow: 0 18px 40px rgba(0, 0, 0, 0.4); }
[data-bs-theme="dark"] .eyebrow { color: #9ca3af; }
[data-bs-theme="dark"] input { background: #2a323e; color: #e6edf3; border-color: #3a4351; }
[data-bs-theme="dark"] input::placeholder { color: #6b7280; }
[data-bs-theme="dark"] .add-todo-btn,
[data-bs-theme="dark"] .primary-btn,
[data-bs-theme="dark"] button { background: #2563eb; }
import { test, expect } from '@playwright/test';

// CSS-class locator: pins .add-todo-btn (an implementation detail).
test('css-locator: user can add a todo', async ({ page }) => {
  await page.goto('/');
  await page.getByRole('textbox', { name: /todo item/i }).fill('Milk');
  await page.locator('.add-todo-btn').click();
  await expect(page.getByRole('listitem')).toHaveText('Milk');
});
import { test, expect } from '@playwright/test';

// Role-based locator: pins the button's accessible name.
test('role-locator: user can add a todo', async ({ page }) => {
  await page.goto('/');
  await page.getByRole('textbox', { name: /todo item/i }).fill('Milk');
  await page.getByRole('button', { name: /add/i }).click();
  await expect(page.getByRole('listitem')).toHaveText('Milk');
});
Step 5 — Knowledge Check
Min. score: 80%
1. A team’s CI pipeline reports that test admin can deactivate a user failed last night. Investigation shows: a developer changed a CSS class from .user-row-actions to .row-controls. The deactivate behavior itself works perfectly. The test used page.locator('.user-row-actions button.deactivate').
What’s the most accurate diagnosis?
A test failure is only useful if it points to a behavior break. A test that fails for a styling rename, a class rename, or a DOM restructure is a false alarm — it costs the team time and erodes trust in the suite. Use role-based or test-ID-based locators to keep the contract stable while implementation evolves.
2. You write a new e2e test using getByRole('button', { name: 'Sign in' }). A week later, the marketing team renames the button from “Sign in” to “Log in”. Your test breaks.
Which is the most accurate take?
The locator ladder isn’t "always pick option 1." The right rung depends on what’s promised by the spec. Step 6 makes this trade-off explicit by introducing the principle “match assertion specificity to spec specificity.”
3. (Spaced review — Step 4) A weak assertion await expect(page.getByRole('listitem')).toHaveCount(1) passed against an app that renders an empty <li> (the user’s text was dropped). Why did it pass?
Strong assertions pin what the spec promises. The spec promised "the user’s text appears in the list," so the assertion needs to verify text content — not just that something exists. This is the same liar-test family from Testing Foundations Step 3.
The Maintenance Trade-off: Pin the Spec, No More, No Less
Why this matters
Step 4 said stronger assertions catch more bugs. Step 5 said brittle locators waste team time. Both are true — and they pull in opposite directions. The skill that separates a maintainable suite from a brittle one is knowing how to reconcile them: pin exactly what the spec promises, no more, no less. Get this calibration wrong and you either over-specify (false alarms on every refactor) or under-specify (the count is broken and the test is green).
🎯 You will learn to
- Apply the principle match assertion specificity to spec specificity to a single-promise feature
- Analyze a 3 × 2 grid of assertion strength × scenario and predict which results are correct vs misleading
- Evaluate a goldilocks assertion against brittle and loose alternatives
🧠 Quick recall — commit before reading on
Q. A test fails. Which of these is the false alarm?
- (a) The behavior under test changed — the user can no longer place an order.
- (b) The test asserts on a CSS class that the design team renamed; the user-visible behavior is unchanged.
- (c) The test discovered a regression in the checkout flow.
- (d) The test caught an off-by-one in the cart count.
Reveal
(b). A false alarm is a test failure that doesn’t correspond to a behavior change — the test was coupled to implementation (CSS class) instead of to the user-visible promise. (a), (c), and (d) are real regressions worth catching. Both Step 4 (liar tests = false passes) and Step 5 (brittle tests = false fails) point at the same underlying issue: a test’s value depends on what it actually verifies. Step 6 puts the principle into one sentence.
🎯 The principle
Match assertion specificity to spec specificity. Pin exactly what the spec promises — no more, no less.
A stronger assertion is not always a better assertion. We’ll see this on a deliberately simple feature first. (Step 7 generalizes it to features with multiple promises.)
The feature
The Todo app has a new remaining-count display: a <p role="status"> showing “3 items remaining”. The spec is one sentence:
“Show the user how many items are still pending.”
That’s it. One promise: surface the count. Notice what’s not in the spec:
- the exact wording (“items remaining” vs “items left”)
- plurality grammar (“1 item” vs “1 items”)
- the surrounding sentence (“You have 3…” vs just “3…”)
- color, position, animation
Three candidate assertions
// Brittle (over-specified): pins exact wording, plurality, surrounding copy.
await expect(page.getByRole('status'))
  .toHaveText('3 items remaining');

// Goldilocks (spec-aligned): pins exactly what the spec promises.
await expect(page.getByRole('status')).toContainText('3');
await expect(page.getByRole('status')).toContainText(/item/i);

// Loose (under-specified): the status region exists; nothing more.
await expect(page.getByRole('status')).toBeVisible();
🎬 Predict — Scenario A: marketing changes wording. Commit, then click reveal.
Imagine the team rewrites the status text from "3 items remaining" to "3 items left". The spec is still satisfied: the count of pending items is still shown.
Q. Which assertion correctly survives the wording change (i.e., passes — and the pass is the right answer)?
- (a) Brittle only: exact text is the contract.
- (b) Goldilocks only: pins the count and the noun, both still present.
- (c) Loose only: toBeVisible() doesn’t care about content.
- (d) Goldilocks and Loose: both still pass; only Goldilocks’s pass is informative.
Reveal
(d). Brittle fails (false alarm — wording changed, spec didn’t). Goldilocks and Loose both pass — but Goldilocks’s pass is meaningful (it verified the count and the noun) while Loose’s pass is trivially true (it never checked the count anyway). A “passing” test that proves nothing isn’t doing its job.
🎬 Predict — Scenario B: an off-by-one regression. Commit, then click reveal.
Now imagine a different change: the count logic has a bug. Where the page should say “3 items remaining,” it says “4 items remaining” instead.
Q. Which assertion catches this regression (i.e., fails — and the fail is the right answer)?
- (a) Brittle and Goldilocks both fail; Loose passes (misses the bug).
- (b) Only Brittle fails; Goldilocks misses it because it doesn’t pin the exact number.
- (c) Only Loose fails — it’s the only one that runs against the count region.
- (d) All three pass: toContainText and toHaveText both ignore numeric content.
Reveal
(a). Brittle fails because '3 items remaining' ≠ '4 items remaining'. Goldilocks fails because toContainText('3') doesn’t match '4 items remaining' (no '3' in that string). Loose passes because the status region is still visible — it never checked the count, so it can’t catch a count regression. That last “pass” is the under-specification trap.
▶ Run
Click Test. All three tests pass against the base app. (The base app shows "3 items remaining" correctly.)
✏️ Edit App.jsx — introduce the off-by-one bug
In src/App.jsx, find the line:
const remainingCount = items.length;
Change it to:
const remainingCount = items.length + 1;
That’s the bug — the count is now wrong by one. Predict which tests catch it before re-running.
▶ Run again
🔍 Investigate — Scenario B results
| Assertion | Result | Was the result useful? |
|---|---|---|
| Brittle | ❌ Fails | ✓ Yes — it caught the regression |
| Goldilocks | ❌ Fails | ✓ Yes — it caught the regression |
| Loose | ✓ Passes | ✗ No — it missed the bug entirely |
Now think back to Scenario A (the wording change). Reset the bug — change items.length + 1 back to items.length. Then imagine the wording change happening:
| Assertion | Result under wording change | Was the result useful? |
|---|---|---|
| Brittle | ❌ Fails | ✗ No — false alarm; spec still satisfied |
| Goldilocks | ✓ Passes | ✓ Yes — wording isn’t part of the spec |
| Loose | ✓ Passes | (Trivially — but it never checked the count anyway) |
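Both tables can be reproduced with a small plain-JavaScript model. The three functions below are simplified stand-ins for the assertions (exact string match for toHaveText, substring and regex checks for toContainText, and a check that ignores content for toBeVisible). The reworded text "3 items left" is an assumed example of a wording change that keeps the noun.

```javascript
// Toy stand-ins for the three assertions, applied to the status text.
const assertions = {
  // Like toHaveText('3 items remaining'): the whole string must match.
  brittle: (text) => text === '3 items remaining',
  // Like toContainText('3') plus toContainText(/item/i).
  goldilocks: (text) => text.includes('3') && /item/i.test(text),
  // Like toBeVisible(): never reads the content at all.
  loose: () => true,
};

const scenarios = {
  base: '3 items remaining',
  rewording: '3 items left',      // assumed wording change, same count
  offByOne: '4 items remaining',  // the count regression
};

for (const [name, check] of Object.entries(assertions)) {
  const row = Object.fromEntries(
    Object.entries(scenarios).map(([scenario, text]) => [scenario, check(text) ? 'pass' : 'fail'])
  );
  console.log(name, row);
}
```

Only the goldilocks row passes on the rewording and fails on the off-by-one. The brittle row fails on both, and the loose row passes on both.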
The 2×2 grid that crystallizes the lesson
| Assertion ↓ / Spec → | Spec is loose (“show the count”) | Spec is tight (“show ‘3 items remaining’”) |
|---|---|---|
| Loose assertion | ✓ aligned | ✗ misses regressions |
| Tight assertion | ✗ false alarms | ✓ aligned |
Strength (LO3) and spec-fidelity (LO4) are different axes. The best assertion lives on the diagonal — its specificity matches the spec’s specificity.
- Loose spec + loose assertion = good. (You’re pinning what’s promised.)
- Loose spec + tight assertion = false alarms. (You’re pinning more than promised.)
- Tight spec + loose assertion = misses regressions. (You’re pinning less than promised.)
- Tight spec + tight assertion = good. (You’re pinning the exact contract.)
The Goldilocks assertion above is on the diagonal: a loose spec, met with a loose-but-targeted assertion that still verifies the count. Brittle is off the diagonal in one direction; loose is off in the other.
📝 House rule
Pin exactly what the spec promises. No more, no less.
Don’t default to maximum strictness “just in case.” Strictness is not free — every pin is a future false alarm waiting to happen. Don’t default to minimum strictness either — every un-pinned promise is a regression waiting to slip through.
Read the spec. Decide what’s promised. Pin that.
// 🛠 You'll edit one line in this file to introduce the off-by-one bug.
function App() {
  const [items, setItems] = React.useState([]);
  const [text, setText] = React.useState('');

  function addTodo() {
    const trimmed = text.trim();
    if (!trimmed) return;
    setItems([...items, trimmed]);
    setText('');
  }

  const remainingCount = items.length;

  return (
    <main className="todo-shell">
      <section className="todo-panel">
        <p className="eyebrow">Todo Lab</p>
        <h1>Todo Lab</h1>
        <div className="todo-form">
          <label htmlFor="todo-input">Todo item</label>
          <div className="todo-row">
            <input
              id="todo-input"
              value={text}
              onChange={(event) => setText(event.target.value)}
              placeholder="Buy milk"
            />
            <button onClick={addTodo}>Add todo</button>
          </div>
        </div>
        <p role="status" className="status-line">
          {remainingCount} items remaining
        </p>
        <ul aria-label="Todo list" className="todo-list">
          {items.map((item, index) => (
            <li key={index}>{item}</li>
          ))}
        </ul>
      </section>
    </main>
  );
}

const root = ReactDOM.createRoot(document.getElementById('root'));
root.render(<App />);
body { margin: 0; font-family: system-ui, -apple-system, sans-serif; background: #f6f7fb; color: #1f2937; }
.todo-shell { min-height: 100vh; display: grid; place-items: center; padding: 32px; }
.todo-panel { width: min(100%, 560px); background: white; border: 1px solid #d9dee8; border-radius: 8px; padding: 28px; box-shadow: 0 18px 40px rgba(31, 41, 55, 0.08); }
.eyebrow { margin: 0 0 8px; color: #4b5563; font-size: 0.85rem; font-weight: 700; text-transform: uppercase; letter-spacing: 0.04em; }
h1 { margin: 0 0 24px; font-size: 2rem; }
label { display: block; margin-bottom: 8px; font-weight: 700; }
.todo-row { display: flex; gap: 10px; }
input { flex: 1; min-width: 0; background: white; color: #1f2937; border: 1px solid #b8c0cc; border-radius: 6px; padding: 10px 12px; font: inherit; }
button { border: 0; border-radius: 6px; padding: 10px 14px; background: #2563eb; color: white; font: inherit; font-weight: 700; cursor: pointer; }
.status-line { margin: 18px 0 0; color: #4b5563; font-weight: 600; }
.todo-list { margin: 12px 0 0; padding-left: 24px; }
.todo-list li { margin: 8px 0; }
/* Dark mode */
[data-bs-theme="dark"] body { background: #1c2533; color: #e6edf3; }
[data-bs-theme="dark"] .todo-panel { background: #232a36; border-color: #2a323e; box-shadow: 0 18px 40px rgba(0, 0, 0, 0.4); }
[data-bs-theme="dark"] .eyebrow { color: #9ca3af; }
[data-bs-theme="dark"] input { background: #2a323e; color: #e6edf3; border-color: #3a4351; }
[data-bs-theme="dark"] input::placeholder { color: #6b7280; }
[data-bs-theme="dark"] button { background: #2563eb; }
[data-bs-theme="dark"] .status-line { color: #9ca3af; }
import { test, expect } from '@playwright/test';
// BRITTLE: pins exact wording, plurality, surrounding copy.
test('brittle: counter shows pinned exact text', async ({ page }) => {
await page.goto('/');
await page.getByRole('textbox', { name: /todo item/i }).fill('A');
await page.getByRole('button', { name: /add/i }).click();
await page.getByRole('textbox', { name: /todo item/i }).fill('B');
await page.getByRole('button', { name: /add/i }).click();
await page.getByRole('textbox', { name: /todo item/i }).fill('C');
await page.getByRole('button', { name: /add/i }).click();
await expect(page.getByRole('status')).toHaveText('3 items remaining');
});
import { test, expect } from '@playwright/test';
// GOLDILOCKS: pins exactly what the spec promises (the count + the noun).
test('goldilocks: counter shows the right count of items', async ({ page }) => {
await page.goto('/');
await page.getByRole('textbox', { name: /todo item/i }).fill('A');
await page.getByRole('button', { name: /add/i }).click();
await page.getByRole('textbox', { name: /todo item/i }).fill('B');
await page.getByRole('button', { name: /add/i }).click();
await page.getByRole('textbox', { name: /todo item/i }).fill('C');
await page.getByRole('button', { name: /add/i }).click();
await expect(page.getByRole('status')).toContainText('3');
await expect(page.getByRole('status')).toContainText(/item/i);
});
import { test, expect } from '@playwright/test';
// LOOSE: the status region exists; nothing more.
// This misses the actual count!
test('loose: status region is visible', async ({ page }) => {
await page.goto('/');
await page.getByRole('textbox', { name: /todo item/i }).fill('A');
await page.getByRole('button', { name: /add/i }).click();
await expect(page.getByRole('status')).toBeVisible();
});
Step 6 — Knowledge Check
Min. score: 80%
1. A test asserts:
await expect(page.getByRole('status')).toHaveText(
'Welcome back, Ada! You have 5 unread messages waiting.'
);
The principle: pin exactly what the spec promises — no more, no less. Stronger assertions aren’t always better; they can over-specify and create false alarms. The best assertion matches the spec’s specificity.
2. Which strategy BEST avoids both false alarms AND missed regressions for the spec “the page shows the user’s order ID”?
The diagonal of the 2×2 grid: tight spec (the actual ID matters) → tight assertion (verify the ID). The framing region uses a role locator with a regex name so the wording around the ID can change without breaking the test. The ID itself is pinned because the spec says so.
3. (Spaced review — Step 5) A test fails after a CSS class rename. The behavior is unchanged. The team then changes the class back to silence the test. What’s the underlying problem?
From Step 5: brittle tests fail under refactors that don’t break behavior. The fix is to rewrite the test against a stable contract, not to revert the refactor or freeze internal naming.
Multi-Promise Features and the Capstone
Why this matters
Real features rarely have a single promise. The “Mark as done” toggle has three: state changes, count decrements, item stays visible. Each promise has its own specificity sweet spot — and treating them as one big assertion either over-pins (brittle on harmless changes) or under-pins (misses bugs in two-thirds of the contract). This step is the real-world skill: per-promise specificity decisions, made independently.
🎯 You will learn to
- Apply the specificity-matching principle to features with multiple independent promises
- Analyze each promise separately and choose its locator + assertion shape
- Create a complete multi-promise Playwright test from a Spec Card and a partial test stub
🧠 Quick recall — commit before reading on
Q. From Step 6: a stronger assertion is sometimes worse. When?
- (a) When the SUT is slow — strong assertions time out before the page renders.
- (b) When the spec is loose — pinning more than the spec promises creates false alarms on every harmless wording / styling change.
- (c) Never — stricter is always safer.
- (d) When the test runs on Firefox — strong assertions don’t work cross-browser.
Reveal
(b). This is Step 6’s principle: the best assertion lives on the diagonal of the (spec specificity × assertion specificity) grid. If the spec is loose (“show the count”) but the assertion is tight (toHaveText('3 items remaining')), every wording change becomes a false alarm — a test failure that doesn’t correspond to a behavior break.
Step 6 had a single promise (the count). Real features usually have multiple promises — and you have to make a separate specificity decision for each one. That’s the skill that distinguishes a maintainable test suite from a brittle one.
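The grid can be made concrete without a browser. The sketch below is plain JavaScript, not Playwright code: each assertion shape from Step 6 is approximated by a string predicate (the loose one, which really checks visibility, is approximated as "some text is present"), and the two alternative renderings are hypothetical.

```javascript
// String-level stand-ins for the three assertion shapes (assumptions, not Playwright APIs).
const shapes = {
  brittle: (text) => text === '3 items remaining',                 // like toHaveText('3 items remaining')
  goldilocks: (text) => text.includes('3') && /item/i.test(text),  // like the two toContainText checks
  loose: (text) => text.trim().length > 0,                         // rough proxy for toBeVisible()
};

// One real rendering plus two hypothetical changes to it.
const renderings = {
  asShipped: '3 items remaining',
  harmlessRewording: '3 items left',    // copy change, behavior intact
  countRegression: '2 items remaining', // real bug: wrong count
};

// Build a shape-by-rendering table of pass/fail results.
const grid = {};
for (const [shapeName, passes] of Object.entries(shapes)) {
  grid[shapeName] = {};
  for (const [renderingName, text] of Object.entries(renderings)) {
    grid[shapeName][renderingName] = passes(text);
  }
}

console.table(grid);
```

Only the goldilocks row passes on the harmless rewording and fails on the count regression. The brittle row fails on both changes, and the loose row passes on both.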
🎯 The feature: “Mark as done” toggle
The Todo app now supports marking items as done. Click on a todo’s button to toggle its done state. Done items are visually distinct (struck through in the sandbox styles); the remaining-count display only counts items that are not done.
The spec is three promises:
- Toggle state. Clicking a todo toggles its done state.
- Count decrements. The remaining-count display reflects only un-done items.
- Item stays visible. Marked-done items remain in the list (not deleted).
For each promise, we make a specificity decision independently. Read this table — you’ll fill in a similar one for the capstone:
| Promise | Brittle option | Goldilocks option | Loose option |
|---|---|---|---|
| 1. Toggle state | toHaveClass(/todo-done/) (pins CSS class, an implementation detail) | toHaveAttribute('aria-pressed', 'true') (pins semantic ARIA contract) | (skip, but then how do you know the toggle worked?) |
| 2. Count decrements | toHaveText('2 items remaining') (over-pins wording) | getByRole('status').toContainText('2') (pins the number itself) | toBeVisible() on the status (misses the count regression) |
| 3. Item stays visible | (Goldilocks IS the target: count + visible) | getByRole('listitem').filter({hasText:'Milk'}).toBeVisible() | (you can't loose-spec a deletion check; this promise is binary) |
Notice the asymmetry.
- Promise 2 is the same shape as Step 6: pin the count, not the wording.
- Promise 1 introduces a new dimension: there’s a right tool (aria-pressed, the semantic contract) and a wrong tool (the .todo-done CSS class). Using the wrong tool isn’t more strict; it’s coupled to implementation in a different way.
- Promise 3 is binary: the item either stays visible or it doesn’t. Loose-spec doesn’t apply when the contract is yes/no.
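To see why the class check is coupled to implementation rather than simply stricter, here is a small plain-JavaScript sketch. The snapshot objects are hypothetical stand-ins for what the page exposes after "Milk" is marked done; `renamedClass` imagines a harmless refactor and `brokenToggle` a real bug.

```javascript
// Hypothetical snapshots of the "Milk" row after it is marked done.
const snapshots = {
  current: { liClass: 'todo-done', ariaPressed: 'true' },
  renamedClass: { liClass: 'is-complete', ariaPressed: 'true' }, // harmless refactor
  brokenToggle: { liClass: '', ariaPressed: 'false' },           // real regression
};

// Stand-ins for the two assertion choices in the table.
const classCheck = (snap) => /todo-done/.test(snap.liClass); // like toHaveClass(/todo-done/)
const ariaCheck = (snap) => snap.ariaPressed === 'true';     // like toHaveAttribute('aria-pressed', 'true')

const results = Object.fromEntries(
  Object.entries(snapshots).map(([name, snap]) => [
    name,
    { classCheck: classCheck(snap), ariaCheck: ariaCheck(snap) },
  ])
);

console.table(results);
```

Both checks catch the broken toggle, but only the class check raises a false alarm on the rename. The ARIA check fails only when the user-facing state is actually wrong.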
Worked example: one fully written test
Read this carefully — it applies the table above:
import { test, expect } from '@playwright/test';

test('marking a todo as done decrements the count and keeps it visible', async ({ page }) => {
  // Arrange: three todos.
  await page.goto('/');
  for (const t of ['Milk', 'Bread', 'Eggs']) {
    await page.getByRole('textbox', { name: /todo item/i }).fill(t);
    await page.getByRole('button', { name: /add todo/i }).click();
  }
  // Act: mark "Milk" as done.
  const milkToggle = page.getByRole('button', { name: 'Milk' });
  await milkToggle.click();
  // Assert all three promises:
  // Promise 1: toggle state is "done" (semantic ARIA contract).
  await expect(milkToggle).toHaveAttribute('aria-pressed', 'true');
  // Promise 2: count decrements (pin the number, not wording).
  await expect(page.getByRole('status')).toContainText('2');
  // Promise 3: Milk is still in the list (not deleted).
  await expect(
    page.getByRole('listitem').filter({ hasText: 'Milk' })
  ).toBeVisible();
});
Each assertion is on the diagonal of its own 2×2 grid. Promise 1 uses the semantic ARIA attribute (not the CSS class). Promise 2 pins the count number (not the wording). Promise 3 verifies presence (the binary contract).
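One caveat on Promise 2 is worth sketching. toContainText('2') is a substring check, so a status reading "12 items remaining" would satisfy it too. That cannot happen with three todos, but it matters once a list can reach two digits. The plain-string helpers below are hypothetical and only illustrate the difference between a substring match and a whole-number match.

```javascript
// Substring match: what a plain string argument amounts to.
const substringMatch = (text, n) => text.includes(String(n));

// Whole-number match: word boundaries keep "2" from matching inside "12".
const wholeNumberMatch = (text, n) => new RegExp(`\\b${n}\\b`).test(text);

const cases = ['2 items remaining', '12 items remaining'];
const comparison = cases.map((text) => ({
  text,
  substring: substringMatch(text, 2),
  wholeNumber: wholeNumberMatch(text, 2),
}));

console.table(comparison);
```

In a Playwright test the stricter form could be written as toContainText(/\b2\b/), since toContainText also accepts a regular expression. It still pins only the number, not the wording around it.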
🎓 Capstone — write the next two tests
You’re given a complete Spec Card and two test stubs. Your job: fill in Act + Assert.
Spec Card: Mark a todo as done
✓ Behavior: Clicking a todo toggles its "done" state. Done todos
are visually distinct. The remaining count decrements.
Marked-done todos remain in the list.
✓ Should pass when: Visual styling of done items changes (color, icon,
font-weight). The CSS class marking done items is
renamed. The confirmation animation changes.
✗ Should fail when: Marking doesn't persist between renders. Count doesn't
decrement. Done items disappear from the list.
🎯 Locator contract: Each todo is a listitem. The toggle button has the
item's text as its accessible name. The status region
exposes a count.
✅ Oracle: The status count reflects the number of un-done items.
Your two tests:
test('marking and unmarking a todo restores the count', async ({ page }) => {
  // Arrange: one todo "Milk".
  // Act: mark it done, then unmark it.
  // Assert: aria-pressed is back to false; count is back to 1.
});
test('marking one of two todos shows count of 1', async ({ page }) => {
  // Arrange: two todos "Milk" and "Bread".
  // Act: mark "Milk" as done.
  // Assert: count shows "1"; "Bread" is still un-done; "Milk" is done.
});
Use the worked example as your template. Apply per-promise specificity decisions (semantic locators, pin the count, verify the toggle state).
🤔 Metacognitive close
Before you submit:
- Rate your confidence on each LO from Step 1 to now. Anything still fuzzy?
- For your two capstone tests, ask: what’s the smallest change to App.jsx that should make my test fail? What’s the smallest change that should NOT make my test fail?
That second question is the real test of whether you’ve internalized the principle. If your test would fail for a harmless change you can think of, it’s brittle. If it would not fail for a real regression you can think of, it’s loose. Aim for the diagonal.
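One way to practice the "smallest change" question is to model a regression in plain JavaScript and see which promise notices it. The sketch below assumes a hypothetical bug in which marking a todo done deletes it instead of flipping its flag; the function names are illustrative and only loosely mirror the sandbox's state logic.

```javascript
// Three un-done todos, as in the worked example's Arrange step.
const seed = ['Milk', 'Bread', 'Eggs'].map((text) => ({ text, done: false }));

// Correct behavior: flip the flag, keep the item.
const toggleCorrect = (items, idx) =>
  items.map((item, i) => (i === idx ? { ...item, done: !item.done } : item));

// Hypothetical regression: the "done" item is dropped from the list.
const toggleDeletes = (items, idx) => items.filter((_, i) => i !== idx);

const remaining = (items) => items.filter((item) => !item.done).length;
const stillListed = (items, text) => items.some((item) => item.text === text);

// Mark "Milk" (index 0) and evaluate Promise 2 and Promise 3 separately.
function promiseReport(toggle) {
  const after = toggle(seed, 0);
  return {
    countIsTwo: remaining(after) === 2,
    milkStillListed: stillListed(after, 'Milk'),
  };
}

const correctReport = promiseReport(toggleCorrect);
const regressedReport = promiseReport(toggleDeletes);
console.log({ correctReport, regressedReport });
```

Under the delete-on-done bug the count still drops to 2, so the count assertion alone would pass. Only the "item stays visible" check catches it, which is why each promise needs its own assertion.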
📝 Final house rule
A durable e2e test isn’t a script of clicks. It’s an executable behavioral spec with a thin adapter that maps user intent onto the current UI.
Next steps beyond this tutorial
The in-browser sandbox here doesn’t host every Playwright feature. In a real Playwright project you’d also use:
- Network mocking (page.route): mock API responses for deterministic tests.
- Storage state auth: sign in once, reuse the session across tests.
- Fixtures: share setup logic without hiding business intent.
- Trace viewer: inspect failed CI runs frame-by-frame.
The official Playwright docs are the next learning artifact. Everything you’ve built here transfers — only the plumbing differs.
function App() {
const [items, setItems] = React.useState([]);
const [text, setText] = React.useState('');
function addTodo() {
const trimmed = text.trim();
if (!trimmed) return;
setItems([...items, { text: trimmed, done: false }]);
setText('');
}
function toggleDone(idx) {
setItems(items.map((item, i) =>
i === idx ? { ...item, done: !item.done } : item
));
}
const remainingCount = items.filter((item) => !item.done).length;
return (
<main className="todo-shell">
<section className="todo-panel">
<p className="eyebrow">Todo Lab — Capstone</p>
<h1>Todo Lab</h1>
<div className="todo-form">
<label htmlFor="todo-input">Todo item</label>
<div className="todo-row">
<input
id="todo-input"
value={text}
onChange={(event) => setText(event.target.value)}
placeholder="Buy milk"
/>
<button onClick={addTodo}>Add todo</button>
</div>
</div>
<p role="status" className="status-line">
{remainingCount} items remaining
</p>
<ul aria-label="Todo list" className="todo-list">
{items.map((item, idx) => (
<li key={idx} className={item.done ? 'todo-done' : ''}>
<button
className="todo-toggle"
onClick={() => toggleDone(idx)}
aria-pressed={item.done}
>
{item.text}
</button>
</li>
))}
</ul>
</section>
</main>
);
}
const root = ReactDOM.createRoot(document.getElementById('root'));
root.render(<App />);
body { margin: 0; font-family: system-ui, -apple-system, sans-serif; background: #f6f7fb; color: #1f2937; }
.todo-shell { min-height: 100vh; display: grid; place-items: center; padding: 32px; }
.todo-panel { width: min(100%, 560px); background: white; border: 1px solid #d9dee8; border-radius: 8px; padding: 28px; box-shadow: 0 18px 40px rgba(31, 41, 55, 0.08); }
.eyebrow { margin: 0 0 8px; color: #4b5563; font-size: 0.85rem; font-weight: 700; text-transform: uppercase; letter-spacing: 0.04em; }
h1 { margin: 0 0 24px; font-size: 2rem; }
label { display: block; margin-bottom: 8px; font-weight: 700; }
.todo-row { display: flex; gap: 10px; }
input { flex: 1; min-width: 0; background: white; color: #1f2937; border: 1px solid #b8c0cc; border-radius: 6px; padding: 10px 12px; font: inherit; }
.todo-row > button { border: 0; border-radius: 6px; padding: 10px 14px; background: #2563eb; color: white; font: inherit; font-weight: 700; cursor: pointer; }
.status-line { margin: 18px 0 0; color: #4b5563; font-weight: 600; }
.todo-list { margin: 12px 0 0; padding-left: 0; list-style: none; }
.todo-list li { margin: 8px 0; }
.todo-toggle { display: block; width: 100%; text-align: left; color: #1f2937; border: 1px solid #d9dee8; border-radius: 6px; padding: 10px 12px; background: white; font: inherit; cursor: pointer; }
.todo-done .todo-toggle { color: #9ca3af; text-decoration: line-through; }
/* Dark mode */
[data-bs-theme="dark"] body { background: #1c2533; color: #e6edf3; }
[data-bs-theme="dark"] .todo-panel { background: #232a36; border-color: #2a323e; box-shadow: 0 18px 40px rgba(0, 0, 0, 0.4); }
[data-bs-theme="dark"] .eyebrow { color: #9ca3af; }
[data-bs-theme="dark"] input { background: #2a323e; color: #e6edf3; border-color: #3a4351; }
[data-bs-theme="dark"] input::placeholder { color: #6b7280; }
[data-bs-theme="dark"] .todo-row > button { background: #2563eb; }
[data-bs-theme="dark"] .status-line { color: #9ca3af; }
[data-bs-theme="dark"] .todo-toggle { background: #2a323e; color: #e6edf3; border-color: #3a4351; }
[data-bs-theme="dark"] .todo-done .todo-toggle { color: #6b7280; }
import { test, expect } from '@playwright/test';
// Worked example — read this carefully before writing the next two.
test('marking a todo as done decrements the count and keeps it visible', async ({ page }) => {
await page.goto('/');
for (const t of ['Milk', 'Bread', 'Eggs']) {
await page.getByRole('textbox', { name: /todo item/i }).fill(t);
await page.getByRole('button', { name: /add todo/i }).click();
}
const milkToggle = page.getByRole('button', { name: 'Milk' });
await milkToggle.click();
// Promise 1 — toggle state (semantic ARIA contract).
await expect(milkToggle).toHaveAttribute('aria-pressed', 'true');
// Promise 2 — count decrements (pin the number).
await expect(page.getByRole('status')).toContainText('2');
// Promise 3 — item stays visible (binary contract).
await expect(
page.getByRole('listitem').filter({ hasText: 'Milk' })
).toBeVisible();
});
// Your turn: fill in Act + Assert.
test('marking and unmarking a todo restores the count', async ({ page }) => {
// Arrange: navigate and add one todo "Milk".
await page.goto('/');
await page.getByRole('textbox', { name: /todo item/i }).fill('Milk');
await page.getByRole('button', { name: /add todo/i }).click();
// TODO: Act — mark Milk as done, then unmark it.
// TODO: Assert — Milk's aria-pressed is "false"; the status shows "1".
});
test('marking one of two todos shows count of 1', async ({ page }) => {
// Arrange: navigate and add two todos "Milk" and "Bread".
await page.goto('/');
for (const t of ['Milk', 'Bread']) {
await page.getByRole('textbox', { name: /todo item/i }).fill(t);
await page.getByRole('button', { name: /add todo/i }).click();
}
// TODO: Act — mark "Milk" as done.
// TODO: Assert — status shows "1"; "Milk" is done; "Bread" is not done.
});
Step 7 — Knowledge Check
Min. score: 80%
1. A “checkout” feature has three spec’d promises:
- After paying, the user sees an order confirmation.
- The order ID is shown so the user can reference it later.
- A confirmation email is sent (verifiable via a test mailbox).
Multi-promise features need per-promise specificity decisions. Each promise has its own answer to “what exactly is this asserting, and what’s allowed to change?” Pinning everything strictly creates a brittle suite; pinning everything loosely creates a leaky one. The skill is judgment: read each promise, decide its specificity independently.
2. Your team built a notifications panel with these spec’d behaviors:
- Unread notifications show a red badge with the count.
- Clicking the bell icon opens the panel.
- Notifications are listed in reverse chronological order.
await expect(badge).toHaveCSS('background-color', 'rgb(239, 68, 68)').
What’s the right diagnosis?
The principle works on both sides — locators (Step 5) and assertions (Step 6). When an assertion pins something the spec doesn’t promise (specific color, exact wording, internal classnames), it generates false alarms. The fix is to find the user-facing promise and pin only that.
3. (Spaced review — Steps 1–6, the integration question) Imagine you’re writing an e2e test for a new feature, before any code exists. Which is the most useful first step?
The Spec Card is the central artifact this tutorial built up to. Every test should start with one — even a small one written in 30 seconds. The cost of writing it is small; the cost of not writing it is the brittle/loose tests you’ve been learning to avoid.
From-Scratch Capstone: Write a Test From a Spec Card Alone
Why this matters
Filling in a TODO inside a tutorial scaffold is not the skill you’ll need at work. At work you get a behavior, an empty file, and a deadline. The gap between “I can finish the test someone started” and “I can write the test from a blank buffer” is enormous — and most Playwright tutorials never close it. This step does. It’s the moment the training wheels come off.
🎯 You will learn to
- Create a complete Playwright test, from import to closing });, given only a behavior spec
- Apply every prior step’s discipline (Spec Card, locator ladder, web-first assertions, per-promise specificity) without a stub to lean on
- Evaluate your own test against the gates: does it survive harmless refactors and catch real regressions?
🪜 The training wheels come off
Every previous step gave you something to start with: a stub, a TODO, a worked example sitting just above the box where you typed. This step gives you nothing. An empty file. A spec. Your judgment.
That’s how it works at work — and that’s the gap most Playwright tutorials never close. We’re closing it here.
📋 The spec — read carefully, don’t skim
The Todo app from Step 7 supports marking items as done. The team has just added a small new spec promise:
Promise. When every todo in the list is marked done, the remaining-count display reads "0 items remaining", and all the original todos remain visible (done items are not deleted from the list).
Two specific user paths the team wants covered:
- Mark-all-then-check. Add three todos. Mark all three as done. The count should read 0; all three items should still be in the list.
- Toggle-back-restores. Add two todos. Mark both done. Then unmark one. The count should be 1; both items still in the list.
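Before typing the test, it can help to derive the expected numbers for each path by hand. The sketch below is a pure-function model, written on the assumption that the app keeps a list of { text, done } items and counts the un-done ones; `runPath` is a hypothetical helper and none of this is Playwright code.

```javascript
// Pure-function model of the todo state (an assumption about the app's logic).
const addTodo = (items, text) => [...items, { text, done: false }];
const toggleDone = (items, idx) =>
  items.map((item, i) => (i === idx ? { ...item, done: !item.done } : item));
const remainingCount = (items) => items.filter((item) => !item.done).length;

// Add the given todos, then click the toggles at the given indexes in order.
function runPath(texts, toggles) {
  let items = [];
  for (const text of texts) items = addTodo(items, text);
  for (const idx of toggles) items = toggleDone(items, idx);
  return { remaining: remainingCount(items), listed: items.length };
}

const markAllThenCheck = runPath(['Milk', 'Bread', 'Eggs'], [0, 1, 2]);
const toggleBackRestores = runPath(['Milk', 'Bread'], [0, 1, 0]);

console.log(markAllThenCheck);   // { remaining: 0, listed: 3 }
console.log(toggleBackRestores); // { remaining: 1, listed: 2 }
```

These two result objects are the values your assertions need to pin: the count in the status region and the number of list items still present.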
🃏 Your Spec Card (write this BEFORE you write code — on paper or as a comment)
Fill in the five fields:
| Field | Example shape |
|---|---|
| Behavior | One sentence: what user-visible behavior are you proving? |
| Should pass when | List the implementation changes the test must survive (CSS class renames, button text tweaks, etc.) |
| Required failures | List the regressions the test must catch (count not decrementing, items deleted on done, etc.) |
| Locator contract | Which semantic queries (getByRole, getByLabel, etc.) — and why each one |
| Oracle | Per-promise: what assertion shape pins each promise at the right specificity? |
Once your Spec Card has all five fields, then open tests/all-done.spec.js and start typing. You will see only the import line; everything else is yours.
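If you prefer to keep the Spec Card next to the code, one option is a plain object with a tiny completeness check. This is only a sketch: the field names are hypothetical camelCase versions of the five table rows, and `missingFields` is an illustrative helper, not part of Playwright.

```javascript
// Hypothetical field names mirroring the five Spec Card rows.
const REQUIRED_FIELDS = [
  'behavior',
  'shouldPassWhen',
  'requiredFailures',
  'locatorContract',
  'oracle',
];

// Returns the names of fields that are absent, blank, or empty lists.
function missingFields(card) {
  return REQUIRED_FIELDS.filter((field) => {
    const value = card[field];
    if (Array.isArray(value)) return value.length === 0;
    return typeof value !== 'string' || value.trim() === '';
  });
}

// A half-finished draft: two fields filled in, three still to write.
const draft = {
  behavior: 'When every todo is marked done, the count shows 0 and all todos stay listed.',
  shouldPassWhen: ['CSS class renames', 'button text tweaks'],
  requiredFailures: [],
  locatorContract: '',
  oracle: '',
};

const stillToWrite = missingFields(draft);
console.log(stillToWrite); // [ 'requiredFailures', 'locatorContract', 'oracle' ]
```

The check is deliberately shallow. It only tells you a field is empty, not whether what you wrote is any good.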
✏️ Write the test
Open tests/all-done.spec.js (currently has only the import line). Write two tests covering the two user paths above. Both must:
- Use getByRole / getByLabel for every locator (no CSS classes, no XPath).
- Use await expect(...) for every assertion (no synchronous expect(await locator.isVisible()).toBe(true)).
- Match assertion specificity to spec specificity: the count number IS the contract, but the wording around it (“0 items remaining” vs “Nothing left to do”) is not.
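Why the second rule matters: Playwright's expect(locator) assertions keep re-checking until they pass or time out, while isVisible() reads the page once and returns. The toy model below imitates that difference using attempt counts instead of timers. `makeSlowStatus`, `checkOnce`, and `checkWithRetry` are hypothetical helpers, not Playwright APIs.

```javascript
// Toy "page": the status text only appears on the Nth read,
// standing in for a render that finishes shortly after a click.
function makeSlowStatus(finalText, readyOnRead) {
  let reads = 0;
  return () => {
    reads += 1;
    return reads >= readyOnRead ? finalText : '';
  };
}

// One-shot check: reads once and decides immediately.
function checkOnce(read, expected) {
  return read().includes(expected);
}

// Retrying check: re-reads up to maxAttempts times before giving up.
function checkWithRetry(read, expected, maxAttempts) {
  for (let attempt = 1; attempt <= maxAttempts; attempt += 1) {
    if (read().includes(expected)) return true;
  }
  return false;
}

const oneShot = checkOnce(makeSlowStatus('0 items remaining', 3), '0');
const retried = checkWithRetry(makeSlowStatus('0 items remaining', 3), '0', 5);

console.log({ oneShot, retried }); // { oneShot: false, retried: true }
```

The one-shot check fails even though the app is correct, which is exactly the flaky false alarm that web-first assertions are meant to avoid.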
📋 What the gates check
The gates below verify you wrote the test from scratch — the file will have:
- An import line for test, expect.
- Two test('...', async ({ page }) => { … }); blocks.
- At least one await page.goto(...) per test.
- At least one await expect(...) per test.
- At least one getByRole(...) locator (proving you used the accessibility tree).
- And of course: both tests must actually pass against the running app.
Don’t peek at Step 7’s solution mid-task. The point of this step is not the answer; it’s the typing-from-blank habit.
function App() {
const [items, setItems] = React.useState([]);
const [text, setText] = React.useState('');
function addTodo() {
const trimmed = text.trim();
if (!trimmed) return;
setItems([...items, { text: trimmed, done: false }]);
setText('');
}
function toggleDone(idx) {
setItems(items.map((item, i) =>
i === idx ? { ...item, done: !item.done } : item
));
}
const remainingCount = items.filter((item) => !item.done).length;
return (
<main className="todo-shell">
<section className="todo-panel">
<p className="eyebrow">Todo Lab — From-Scratch Capstone</p>
<h1>Todo Lab</h1>
<div className="todo-form">
<label htmlFor="todo-input">Todo item</label>
<div className="todo-row">
<input
id="todo-input"
value={text}
onChange={(event) => setText(event.target.value)}
placeholder="Buy milk"
/>
<button onClick={addTodo}>Add todo</button>
</div>
</div>
<p role="status" className="status-line">
{remainingCount} items remaining
</p>
<ul aria-label="Todo list" className="todo-list">
{items.map((item, idx) => (
<li key={idx} className={item.done ? 'todo-done' : ''}>
<button
className="todo-toggle"
onClick={() => toggleDone(idx)}
aria-pressed={item.done}
>
{item.text}
</button>
</li>
))}
</ul>
</section>
</main>
);
}
const root = ReactDOM.createRoot(document.getElementById('root'));
root.render(<App />);
body { margin: 0; font-family: system-ui, -apple-system, sans-serif; background: #f6f7fb; color: #1f2937; }
.todo-shell { min-height: 100vh; display: grid; place-items: center; padding: 32px; }
.todo-panel { width: min(100%, 560px); background: white; border: 1px solid #d9dee8; border-radius: 8px; padding: 28px; box-shadow: 0 18px 40px rgba(31, 41, 55, 0.08); }
.eyebrow { margin: 0 0 8px; color: #4b5563; font-size: 0.85rem; font-weight: 700; text-transform: uppercase; letter-spacing: 0.04em; }
h1 { margin: 0 0 24px; font-size: 2rem; }
label { display: block; margin-bottom: 8px; font-weight: 700; }
.todo-row { display: flex; gap: 10px; }
input { flex: 1; min-width: 0; background: white; color: #1f2937; border: 1px solid #b8c0cc; border-radius: 6px; padding: 10px 12px; font: inherit; }
.todo-row > button { border: 0; border-radius: 6px; padding: 10px 14px; background: #2563eb; color: white; font: inherit; font-weight: 700; cursor: pointer; }
.status-line { margin: 18px 0 0; color: #4b5563; font-weight: 600; }
.todo-list { margin: 12px 0 0; padding-left: 0; list-style: none; }
.todo-list li { margin: 8px 0; }
.todo-toggle { width: 100%; text-align: left; background: transparent; border: 1px solid #d9dee8; border-radius: 6px; padding: 10px 12px; font: inherit; cursor: pointer; }
.todo-toggle[aria-pressed="true"] { background: #ecfdf5; border-color: #10b981; }
.todo-done .todo-toggle { text-decoration: line-through; color: #6b7280; }
[data-bs-theme="dark"] body { background: #1c2533; color: #e6edf3; }
[data-bs-theme="dark"] .todo-panel { background: #232a36; border-color: #2a323e; box-shadow: 0 18px 40px rgba(0, 0, 0, 0.4); }
[data-bs-theme="dark"] .eyebrow { color: #9ca3af; }
[data-bs-theme="dark"] input { background: #2a323e; color: #e6edf3; border-color: #3a4351; }
[data-bs-theme="dark"] input::placeholder { color: #6b7280; }
[data-bs-theme="dark"] .todo-row > button { background: #2563eb; }
[data-bs-theme="dark"] .todo-toggle { background: transparent; color: #e6edf3; border-color: #3a4351; }
[data-bs-theme="dark"] .todo-toggle[aria-pressed="true"] { background: #064e3b; border-color: #10b981; }
import { test, expect } from '@playwright/test';
// ─────────────────────────────────────────────────────────────
// From-scratch capstone. Two tests, both written by you, both
// following the spec at the top of the step. No TODOs, no stubs.
//
// Spec recap (write this as a comment block before each test):
// Promise: marking all todos done makes the count read 0,
// and all items remain visible.
// Path 1: add 3 todos, mark all 3 done, expect count = 0
// and 3 listitems still visible.
// Path 2: add 2 todos, mark both done, unmark one,
// expect count = 1, both listitems visible.
// ─────────────────────────────────────────────────────────────
Step 8 — Knowledge Check
Min. score: 80%
1. (Cumulative: Steps 3 + 6.) You’re testing a button that the team has announced will be renamed from “Submit” to “Place order” next quarter. The action it performs (submitting the order) won’t change. Which locator + assertion shape best matches the spec?
When the spec tells you wording is going to change but the action is permanent, that’s the canonical case for getByTestId with a semantic test ID. Pair it with a Goldilocks assertion on the outcome region (role + regex) and you’ve matched specificity to spec on both sides.
2. (Cumulative: Step 5.) A test using getByRole('button', { name: 'Add todo' }) (a string name that pins the full phrase, not a regex) fails after marketing renamed the button to “Add”. The behavior is unchanged. What’s the most accurate diagnosis?
False alarms erode trust in the test suite faster than anything else. The fix isn’t to reactively patch the test on every UI change — it’s to choose locators whose contract matches what the spec actually promises.
3. (Cumulative — Steps 4 + 7.) A “Mark complete” feature has two spec’d promises: (1) the item shows visually that it’s complete, (2) the remaining-count decrements. Which assertion set best catches both regressions while surviving harmless styling changes?
Multi-promise features (Step 7) require per-promise specificity decisions. Each promise gets its own assertion shape — semantic for the toggle state, count-as-number for the counter — and each independently honors the principle: pin what the spec promises, no more, no less.
4. What’s the single most useful artifact you produced in this step?
Tests are downstream of decisions. The Spec Card is the upstream artifact that made every decision visible before you typed. Carry the habit. On your first job’s first PR, the difference between writing a brittle test and a robust one is whether you wrote the Spec Card before opening the test file.