Use the master deck when you want a mixed review of code-reading, debugging, review, AI, prompting, code-smell, and refactoring vocabulary. Use the master quiz to practice deciding which engineering practice fits a realistic maintenance situation.
Development Practices Master Flashcards
A comprehensive mix of the development-practices flashcards with standalone decks: comprehension, debugging, GenAI, review, code smells, refactoring, and beacons.
Difficulty:Basic
What are the three kinds of cognitive load in code comprehension?
Intrinsic load is the unavoidable difficulty of the domain or algorithm. Extraneous load is unnecessary effort caused by poor presentation, inconsistent names, tangled control flow, or bad tooling. Germane load is productive effort spent building durable mental models of the system.
The engineering goal is not to eliminate all difficulty. It is to reduce extraneous load so the reader has working-memory capacity left for intrinsic and germane load.
Difficulty:Basic
How do bottom-up and top-down comprehension differ?
Bottom-up comprehension starts with statements and builds larger chunks from control flow and data flow. Top-down comprehension starts with a hypothesis about the system’s purpose and looks for beacons that confirm, refine, or reject it.
Bottom-up is useful when the reader lacks context or when the code is genuinely unfamiliar. Top-down is faster when the reader has relevant schemas and the code exposes reliable cues.
Difficulty:Intermediate
What are the four components of the integrated meta-model of program comprehension?
The model combines a situational model of system functions, a low-level program model of control flow, a top-down domain model, and the programmer’s knowledge base. Developers move among these models opportunistically.
The point of the integrated model is that real developers rarely use only one reading strategy. They switch levels as their hypotheses succeed or fail.
Difficulty:Intermediate
What should a reviewer do during the orientation phase before reading a complex diff?
Read the PR description, issue, tests, and design notes to establish the why and what of the change. Then form a hypothesis about the expected design before comparing the implementation against it.
Starting from the diff forces bottom-up reconstruction. Starting from intent gives the reviewer a specification layer that makes the diff easier to evaluate.
Difficulty:Advanced
Why can cyclomatic complexity under-predict human difficulty?
Cyclomatic complexity counts branches, but it treats many branch shapes as equally difficult. Humans find deep nesting, long data-flow dependencies, and large visual blocks more costly than a flat set of familiar branches.
Cognitive complexity metrics try to better match human effort by penalizing nesting and interruptions to linear reading.
Difficulty:Intermediate
What is the architecture-code gap?
It is the mismatch between the architect’s intensional view of components, layers, constraints, and design decisions and the developer’s extensional view of files, classes, packages, and statements.
When this gap is unmanaged, maintainers make local code changes that satisfy the immediate task while eroding the system’s intended architecture.
Difficulty:Advanced
Why can excessive abstraction make code harder to understand?
Abstraction hides detail, but too many shallow layers force readers to jump across files and reconstruct interactions that no one layer explains. This can overload working memory even when each class is individually small.
The useful target is not maximum abstraction. It is deep modules: simple interfaces that hide meaningful complexity and match how readers need to reason.
Difficulty:Intermediate
Name three practices that make code easier to comprehend top-down.
Use domain-rich identifiers, expose architectural intent through package and module structure, and write comments that explain why an approach was chosen. Deep modules, stable interfaces, and tests-as-specifications also help.
These practices give future readers beacons. A reader can verify a hypothesis quickly instead of tracing every line from scratch.
Difficulty:Basic
What is top-down code comprehension?
Top-down comprehension is a whole-to-part reading strategy: the developer starts with a high-level hypothesis about what the system does, then searches for evidence in names, structure, tests, and architecture.
It differs from line-by-line tracing because the reader uses prior knowledge to decide where to look and what details matter.
Difficulty:Basic
How does schema activation help expert programmers read code faster?
A schema is a stored mental pattern. When a developer recognizes an e-commerce system, visitor pattern, repository layer, or sorting loop, the schema supplies expectations that reduce the amount of detail they must hold in working memory.
Expert reading is fast because familiar structure becomes one chunk, not because the expert literally processes every line faster.
Difficulty:Advanced
What is a dangling purpose link in a reader’s mental model?
It is a gap between knowing what a program should accomplish and not yet knowing how the implementation accomplishes it. The gap generates targeted ‘how’ questions that guide the reader into the code.
A good PR description intentionally creates the specification layer first, so the reader’s implementation search is focused rather than random.
Difficulty:Basic
What is the Stepdown Rule?
A source file should present high-level functions first, then place lower-level helper functions below the functions that call them. The reader descends one abstraction level at a time.
The rule supports top-down scanning: read the public story first, drill into details only when a hypothesis needs verification.
Difficulty:Basic
How does the Newspaper Metaphor apply to source files?
Like a newspaper article, a source file should put the most important, high-level ideas first and defer low-level details. A reader should be able to skim the top and understand the main story.
This does not mean hiding details forever. It means organizing details so the reader can choose when to descend.
Difficulty:Intermediate
Why do experts switch between top-down and bottom-up comprehension?
Top-down hypotheses are efficient until they fail. When a beacon is missing, an abstraction leaks, or behavior is surprising, experts temporarily switch to bottom-up tracing to repair the mental model.
The strongest readers are opportunistic. They do not cling to one strategy when evidence says another will be cheaper.
Difficulty:Advanced
When can a design pattern hurt top-down comprehension?
A pattern hurts when the code uses the pattern name but violates the pattern’s expected responsibility, or when the pattern adds layers that obscure a simple domain concept. The beacon then points to the wrong schema.
Patterns are vocabulary, not decoration. A misleading pattern name is worse than no pattern name because it creates a false hypothesis.
Difficulty:Intermediate
Which IDE features support top-down comprehension?
Architecture views, call hierarchy, go-to-definition, symbol search, split views, and intelligent completion all help readers move from high-level beacons to implementation details and back.
These tools are cognitive prosthetics. They reduce navigation load so more working memory can be spent on reasoning.
Difficulty:Basic
What is a code beacon?
A beacon is a recognizable, familiar point in code that gives the reader a hint about the code’s purpose. It acts as evidence for a larger mental model.
Beacons let readers avoid tracing every statement when a reliable clue can activate an existing schema.
Difficulty:Basic
Why are full-word identifiers powerful lexical beacons?
A name like isPrimeNumber communicates domain intent immediately, while a name like pn forces the reader to infer meaning from surrounding code.
Good names move information from working-memory reconstruction into immediate recognition.
Difficulty:Basic
What is a structural beacon?
A structural beacon is a recognizable code shape, such as an accumulator loop, a sorting swap, or a standard request-validation-controller sequence that triggers a familiar programming plan.
The reader recognizes the plan first, then only inspects details that might differ from the expected pattern.
Difficulty:Basic
How do tests act as beacons?
Tests document intended behavior in executable form. A reviewer can read tests first to build a top-down expectation before reading the production implementation.
Tests are especially valuable beacons during code review because they expose the author’s specification layer.
Difficulty:Basic
How do assertions act as beacons?
Assertions make assumptions explicit at the exact point where they matter. They tell readers what state the author believes must hold.
The assertion is both a runtime check and a comprehension cue.
Difficulty:Advanced
What is the Singleton naming paradox for beacons?
Including Singleton in a class name can help readers recognize the design, but it may also leak an implementation decision clients should not depend on.
The trade-off is beacon visibility versus information hiding. Not every helpful cue belongs in a public name.
Difficulty:Intermediate
How do contextual beacons extend beyond source code during review?
PR titles, descriptions, issue links, commit messages, file names, tests, and ownership boundaries all help reviewers build the specification layer before reading the diff.
Modern code review is not just source reading. It is hypothesis formation across the whole change artifact.
Difficulty:Intermediate
Why do experts avoid exhaustive tracing when beacons are reliable?
Exhaustive tracing spends scarce working memory on details that a reliable beacon already compresses. Experts trace line-by-line only when a hypothesis fails or behavior is risky.
Expertise includes knowing when not to trace. Strategic avoidance of unnecessary detail is a strength, not laziness.
Difficulty:Intermediate
Define fault, error, and failure — and explain why keeping them distinct changes how you debug.
Fault — the erroneous location in the code (e.g., radius never converted to a number). Error — an incorrect state during execution (e.g., radius holds '10' instead of 10). Failure — observable incorrect outside behavior at the system boundary (e.g., wrong number printed).
Each term names a different observation point in the system. You see the failure (or the user does), the failure was caused by an error somewhere upstream in the execution, and the error was caused by a fault in the source code. Fixing the bug means changing the fault — but you usually start your investigation from the failure. A try/catch that swallows the exception suppresses the failure but leaves the fault and the error intact, which is why bare excepts make bugs harder to find, not easier.
Difficulty:Basic
Name the four steps of the systematic debugging process, in order.
(1) Investigate symptoms to reproduce the bug. (2) Locate the faulty code. (3) Determine the root cause. (4) Implement and verify the fix.
Order matters: jumping ahead is the most common way to lose hours. Starting in the debugger before you can reproduce the bug means the debugger has nothing to show you. Calling the bug fixed before running the test suite means you may have shipped a regression. Each step has a deliverable — a reproduction, a suspected file, a root-cause story, a passing test — and the next step depends on the previous one.
Difficulty:Basic
Why does reproducing the bug come before trying to fix it? What are you trying to capture?
A bug you cannot reproduce is a bug you cannot debug — and cannot verify the fix for. Capture two things: the problem environment (OS, browser, build version, configuration) and the problem history (the sequence of inputs and interactions that triggered the bug).
The Therac-25 case is the cautionary tale: a radiation therapy machine killed six patients in the 1980s with overdoses triggered by an operator typing faster than the developers expected. The bug was reproducible only with the operator’s actual typing speed — which the test team never matched. Mature bug-report templates ask for environment and history precisely because reproduction is load-bearing for every step that follows.
Difficulty:Intermediate
What is regression testing, and how does it relate to the bug-reproduction test you wrote in step 1?
Regression testing = re-running existing tests after a code change to ensure new updates haven’t broken previously-working behavior. The bug-reproduction test you wrote during debugging becomes a regression test once the fix lands — it stays in the suite forever to catch the same bug if it sneaks back.
Every fixed bug is one assertion away from coming back. The reproduction test you wrote in step 1 is the cheapest insurance against that — it stays in CI, runs on every commit, and fails immediately if anyone (including your future self or an AI agent rewriting the area) reintroduces the same defect. This is one of the highest ROI moves in the entire process.
Difficulty:Intermediate
When debugging your own code, when should you reach for search engines / AI tools vs a debugger? Give the rule.
Search the error message when it comes from a framework, library, or external service (not your own code) — strip project-specific identifiers first, keep error codes. Use the debugger when the error is in your own logic and you need to understand why your variables hold the values they do.
Framework errors have almost always been hit before; a 30-second search beats 30 minutes of stepping. Your own logic errors are unique to your code — no one else has seen them — so the debugger (or a rubber duck) is the right tool. When you do run a command suggested by an AI search result, read it first: prompt-injection attacks on search-result pages occasionally suggest destructive operations like git push --force or curl … | sudo bash.
Difficulty:Basic
You’re explaining your code to a colleague at their desk. Halfway through line 12 you stop, stare, and say ‘oh.’ You’ve just fixed the bug yourself. Name the phenomenon and the technique.
Phenomenon: curse of knowledge — when you read code you wrote, you see what you intended rather than what is actually there. Technique: rubber-duck debugging — explain your code line by line to anything (duck, plant, colleague), and you’ll catch the gap between intent and actual text when you say it aloud.
Verbalizing forces a comparison between what the line should do and what it actually does — that comparison rarely happens silently in your head. A duck is preferable to a colleague because the duck never interrupts, never confirms your biases, and is always available. For students: prefer rubber-ducking over asking an AI for the bug, especially early in your career — the act of explaining is what builds the mental model you’ll need for the next bug.
Difficulty:Advanced
Compare an assertion (assert x > 0) and an exception (if x <= 0: raise ValueError). When is each appropriate?
Assertion = a programmer’s claim about an invariant that should never be false at runtime (catches developer mistakes). Compiled out in production. Exception = a runtime condition that can legitimately occur and should be handled (catches user / external mistakes). Always present.
‘Should never happen’ → assertion. ‘Could happen, here’s the recovery’ → exception. Assertions document and enforce internal invariants; they fail loudly during development so the bug is found early. Exceptions handle external imperfection — bad user input, network failures, missing files. Catching an assertion is almost always a bug; catching an exception is often the point. Many languages let you compile assertions out of production binaries (python -O, gcc -DNDEBUG), which is why exceptions stay and assertions go.
Difficulty:Basic
Your loop iterates 50,000 times and the bug only appears around iteration 12,000. How do you avoid clicking Step Over 12,000 times?
Conditional breakpoint — right-click the breakpoint and set an expression like i == 12000. The debugger only pauses when the expression is true. Alternative: a hit-count breakpoint that fires on the Nth time the line is reached.
Most IDEs support both. The conditional expression can include any variable or function the debugger can evaluate (len(items) > 1000, user.role == 'admin' and amount > 50000, i % 1000 == 0). They’re indispensable for bugs that only manifest at scale, on specific records, or under particular state — exactly the bugs that simple breakpoints cannot help with.
Difficulty:Intermediate
What is a time-travel debugger, and what does it do that an ordinary debugger cannot?
A debugger that records the execution and lets you step backwards in time — re-examine a variable’s value three statements ago, or replay forward from a point with a hypothetically modified value. Ordinary debuggers only move forward.
When you set a breakpoint and miss the moment that mattered, an ordinary debugger forces you to re-run the program. A time-travel debugger lets you reverse-step within the same recorded execution. Not built into VS Code by default but available as extensions (Python’s rr, Node’s recording proxies, GDB’s record full + reverse-continue). The SEBook ships a time-travel-enabled Python debugger tutorial for practice.
Difficulty:Advanced
You write try: do_thing(); except: pass and tell your team ‘this is fault-tolerant.’ Why is this misleading?
A bare except: pass is the opposite of fault-tolerant — it swallows every error silently (including ones that leave the system in a partially-updated, inconsistent state) and gives no signal that anything went wrong. Real fault tolerance is selective handling of known failure modes with deliberate recovery and observability (logs, metrics, alerts, compensating transactions).
Bare-except patterns hide the bug and leave invariants violated. For a transfer(from, to, amount) that’s debited the source but failed to credit the destination, not crashing is much worse than crashing — the crash would have prevented the half-done transaction; the swallow leaves money missing. The right pattern for monetary operations is an atomic transaction (both updates or neither); the right pattern for handled errors elsewhere is a narrowly-typed except SpecificError: with a log line and a recovery path.
Difficulty:Advanced
A regression test passed two weeks ago and fails today. There are ~200 commits between the two versions and no obvious culprit in the diff. What’s the right move, and why does it scale better than the alternatives?
git bisect run ./failing-test.sh — git performs a binary search across the commits, running the test at each midpoint, and converges on the offending commit in roughly $\log_2(200) \approx 8$ test runs instead of ~200. Fully automated if you have a scripted reproduction.
This is one of the highest-leverage payoffs from writing the automated reproduction test in step 1. Without a scripted test, git bisect falls back to manual good/bad judgments at each step — still useful, but slower. The bisect output is the commit that introduced the regression, which usually points straight at the change responsible — much more focused than git blame, which only tells you who last touched a particular line.
Difficulty:Advanced
You just landed a bug fix. The failing reproduction test now passes. What three more things should you do before calling the bug closed?
(1) Add assertions for nearby invariants — the conditions that produced this bug probably hold elsewhere. (2) Run the whole test suite — make sure the fix didn’t break a previously-passing test (a regression). (3) Document the fix — code comment for the non-obvious why, ticket reference in the commit message, root cause in the bug report.
Step 4 is where good debuggers get separated from average ones. The temptation is to mark the ticket FIXED the moment the failing test goes green, but that’s a single data point. Running the full suite turns it into a population check. Adding assertions catches the next bug in the family before it ships. Documenting the why prevents an AI agent (or a teammate, or future you) from reverting the fix six months from now because the line ‘looked unnecessary.’
Difficulty:Advanced
Your team has a 200-step manual reproduction of an intermittent bug. Before fixing the bug, what should you do to the reproduction itself, and why?
Simplify it. Iteratively remove steps and re-run; if the bug still occurs without the step, drop it from the reproduction. Most bug reproductions have ~5 essential steps and ~195 confounders. The smaller the reproduction, the faster every fix attempt, the cleaner the eventual regression test, and the less surface area for an unrelated change to mask the bug.
This is called delta debugging when automated, and it generalizes: minimize the input that still triggers the bug. A 200-step repro that takes 5 minutes to run burns an hour per fix attempt; a 5-step repro that takes 5 seconds lets you iterate dozens of times per hour. The minimal trigger also tells you something about the root cause that the long reproduction obscures.
Difficulty:Advanced
Look at this debugger trace. After input_radius = sys.argv[1], the watch panel shows input_radius = '10' (with quotes). Two steps later, diameter = 2 * radius produces diameter = '1010'. What’s the bug and where is it?
The bug is missing type conversion at the assignment from sys.argv[1] — sys.argv always returns strings, so 2 * '10' performs Python string repetition ('10' + '10' = '1010'), not arithmetic. The fault is one line: input_radius = sys.argv[1] should be input_radius = float(sys.argv[1]). The symptom shows up inside cal_circumference, but the fix belongs at the input boundary.
Classic case of where you observe the symptom ≠ where you fix the fault. The debugger’s quoted string '10' in the watch panel is the diagnostic clue — a number wouldn’t be quoted. Boundary type conversion (string → number when reading CLI args, JSON bodies, query parameters, environment variables) is one of the most common bug sources in scripting languages. A defensive version would assert isinstance(input_radius, (int, float)) immediately after conversion to fail loudly if the input was missing or malformed.
Difficulty:Advanced
A new colleague says: “I’ve been debugging for 4 hours. I’ve read the function 50 times. I just can’t see what’s wrong.” Diagnose what’s happening and prescribe the next 30 minutes.
Diagnosis: stuck mental model — they keep reading what they intended rather than what’s actually there (curse of knowledge). 50 readings won’t fix that because each one applies the same broken model. Prescription: switch tactics. (a) Open the debugger and watch the actual variable values flow — let the program show you instead of inferring it. Or (b) explain the code aloud to a duck or colleague — verbalizing forces intent-vs-actual comparison. Or (c) take a 15-minute walk to reset before either.
Hours of staring is the canonical symptom that the way you are looking has stopped working; the remedy is almost never try harder with the same approach — it’s switch the approach. Debuggers and rubber-ducking both break the curse of knowledge from different angles: the debugger overrides intent with empirical state, while rubber-ducking surfaces the intent so you can compare it against the code. Developing the reflex to switch tactics is itself what separates 1× from 3× debuggers.
Difficulty:Basic
What does it mean to call an LLM a statistical parrot?
An LLM does not understand code in a human sense — it predicts the most likely next token based on statistical patterns in its training data. It mimics fluent code without grounding in formal logic, real-world facts, or the existence of the APIs it references.
This framing explains hallucinations (plausible-looking but fabricated APIs), outdated patterns (repeated from training data), and confident-but-wrong outputs. Linguistic plausibility is not factual correctness.
Difficulty:Intermediate
Why is GenAI’s productivity boost (21–50%) smaller than the compiler revolution (10x)?
Compilers automated accidental complexity — repetitive mechanical translation from high-level intent to machine instructions. GenAI helps with parts of accidental complexity too but does not yet automate essential complexity — understanding requirements, choosing data structures, navigating trade-offs, integrating with real systems. Most of an engineer’s work still lives in essential complexity.
The accidental-vs-essential distinction predicts exactly this ceiling: tools that automate mechanical work give big one-time gains; tools that touch judgment-heavy work give smaller, slower gains.
Difficulty:Basic
Name the three stages of LLM development.
Pre-training: building a base foundation model by training on vast amounts of public code/text to predict the most likely next token. Post-training: fine-tuning on labeled data and applying RLHF (Reinforcement Learning from Human Feedback), where developers rank outputs by readability and correctness. Inference: prompting the model to produce a typically non-deterministic sequence of answer tokens.
Each stage shapes the model’s behavior. Pre-training determines what it ‘knows.’ Post-training (especially RLHF) calibrates what it produces in response to instructions. Inference parameters (temperature, top-p) control how deterministic the output is.
Difficulty:Intermediate
What is the illusion of AI productivity, and how do you avoid being fooled by it?
Generation speed feels like productivity, but if the output is subtly wrong, debugging can dwarf the time saved. Avoid the illusion by measuring productivity end-to-end (features shipped per week with acceptable defect and security rates), not by characters generated per minute.
A controlled study of experienced developers on real open-source work found they felt roughly 24% faster with AI while measured throughput was about 19% slower. Generation is visible and fast; debugging is invisible and slow.
Difficulty:Advanced
Why do AI-generated codebases tend to have higher security vulnerability rates?
Roughly 40% of Copilot suggestions in security-sensitive CWE-specific scenarios have been found to contain vulnerabilities. The AI pattern-matches on training data that mixes secure and insecure examples. Compounding the bug rate, developers with AI assistants often write less secure code while being more confident it is secure — a calibration failure.
The 40% figure is scoped to security-sensitive prompts, not all generated code, but plausible-looking vulnerable patterns appear well beyond that benchmark. Mitigations: explicit security review of every AI block, static-analysis in the loop, extra scrutiny on SQL, deserialization, auth, and never treating AI confidence as evidence of safety.
Difficulty:Basic
What is cognitive offloading, and why is it harmful for junior engineers?
Cognitive offloading is using AI to replace thinking — pasting the prompt, copying the answer, moving on without engaging the material. It minimizes learning, prevents skill formation, and leaves the developer unable to debug or explain the code later. For juniors especially, it kneecaps the foundational understanding their career depends on.
The opposite is conceptual inquiry: asking the AI to explain a concept, compare implementations, or argue trade-offs. This preserves cognitive engagement and exercises continual-learning ability — the skill humans retain over AI.
Difficulty:Basic
What is the Supervisor Mentality for working with GenAI?
Treat GenAI as a knowledgeable but unreliable intern. Three rules: (1) Always review AI-generated code; (2) Explainability rule — never commit AI code you cannot explain to a colleague; (3) Assume subtle incorrectness — work from the premise that the output is subtly buggy or insecure until verified.
This calibration is the antidote to vibe coding (forgetting the code exists and shipping on ‘vibes’). It maps to how a senior engineer would treat any unfamiliar contributor’s PR: review, verify, don’t auto-merge.
Difficulty:Intermediate
Compare the Driver and Navigator roles in AI pair programming.
Driver: the human writes the code and asks the AI to critique it for performance, security, or design issues. Navigator: the human directs the AI to write specific blocks while ensuring they understand every line produced. In both, the human retains intellectual ownership and accountability for the result.
Driver suits security/performance review and design exploration. Navigator suits boilerplate, idiomatic-syntax generation, and well-specified tasks. Both deliberately keep the human in active intellectual control — neither is delegation to autopilot.
Difficulty:Intermediate
What is Test-Driven Generation (TDG), and what are its four steps?
(1) Prompt the AI to generate tests from a problem description. (2) Carefully review the tests as a specification. (3) Prompt the AI to generate the implementation that passes those tests. (4) Use a remediation loop — feed failing test output (stack traces, mismatches) back to the AI until tests pass.
The review step (2) is where TDG earns its quality: if the tests are right, satisfying them produces correct code. Skipping review means satisfying broken tests. This mirrors TDD’s RED-GREEN-REFACTOR rhythm with AI doing the writing under human verification.
Difficulty:Advanced
Why does loose coupling amplify AI effectiveness, and tight coupling sabotage it?
Modular code (Information Hiding, microservices, well-bounded interfaces) limits the context window the AI needs to process. Smaller, well-named modules fit cleanly in context; hidden internals don’t leak unexpected coupling; generated code can be locally verified. In tightly coupled spaghetti code, the AI cannot see (or fit) enough context to reason correctly, and its plausible-looking output silently breaks distant code.
Modern architecture has gained a new payoff: it is now a force multiplier for AI productivity, not just a maintainability concern. Teams that defer architectural cleanup pay a compounding AI-effectiveness tax on every future change.
Difficulty:Intermediate
Why is AI inference typically non-deterministic, and what does that mean for testing?
LLMs sample from a probability distribution over next tokens; identical prompts can produce different outputs depending on the temperature parameter and random seed. Non-determinism means you cannot rely on bit-identical AI output for testing — your tests must verify properties of the result (it compiles, it passes tests, it satisfies a spec), not its exact text.
Some workflows set temperature=0 for more deterministic output, but even then small variations can occur. Anything that depends on the AI’s text matching exactly is brittle; verify behavior or structure, not surface form.
Difficulty:Basic
What is an AI hallucination in coding, and why is it especially dangerous?
The AI confidently produces a call to an API, library, or method that does not exist (e.g., import datafetcher_v2 as dfv2 for a fictitious library). It is dangerous because the output looks correct and would pass casual review; the bug surfaces only when the code actually runs or is integrated.
Hallucinations are a direct consequence of the statistical-parrot architecture: the model generates linguistically plausible tokens without verifying real-world existence. Mitigations: IDE integrations that auto-complete only real symbols, retrieval-augmented generation grounded in real codebases, and treating unfamiliar imports/method calls with extra scrutiny.
Difficulty:Advanced
Why do AI-augmented codebases tend to show rising code complexity and static-analysis warnings?
AI tends to generate additive solutions — adding new code that solves the local problem rather than refactoring toward the existing structure or removing duplication. Without a deliberate refactor step, complexity compounds with each accepted suggestion. The fix is process-level: pair AI generation with refactor passes, enforce linters and complexity limits in CI, and reject AI-suggested duplication.
Industry analysis has reported roughly 42% rising complexity and 30% more warnings in AI-augmented codebases — treat the exact numbers as one data point, not consensus, but the direction matches what review-heavy teams report. The underlying issue is workflow, not tool quality: teams that don’t pair AI generation with refactoring accumulate debt faster than human-written code would.
Difficulty:Intermediate
Why does the leverage of an engineer’s work shift from producing code to specifying and verifying it in the GenAI era?
Because AI can produce plausible-looking code quickly, but cannot reliably decide what code should be produced or whether the produced code is correct in a specific system context. The bottleneck moves from typing-speed (now cheap) to figuring out the spec, designing the architecture, and verifying the output — the parts AI still stumbles on.
Concretely: requirements engineering, systems thinking, architecture, code review, security review, and prompt/context engineering all rise in importance; rote syntax memorization and boilerplate authoring fall. INVEST user stories, formal verification techniques, and architecture-for-context all become increasingly load-bearing skills.
Difficulty:Advanced
Why is prompt and context engineering considered a load-bearing engineering skill rather than a UI trick?
Because what an LLM produces depends sharply on what context it can see (architecture, file boundaries, surrounding code) and how the task is framed. An engineer who can shape both — by structuring the codebase for clean context windows and by writing prompts that surface real constraints — gets dramatically better output than one who treats the AI as a search box.
This is why modular architecture is now an AI multiplier: smaller bounded interfaces fit in context, hidden internals don’t leak, and generated code can be reasoned about locally. Prompt and context engineering compose with architecture skill, not replace it.
Difficulty:Basic
What is vibe coding, and what is the professional alternative?
Vibe coding is forgetting the code exists and relying on ‘vibes’ — letting the AI generate, paste, run, and ship without intellectual ownership of the result. The professional alternative is the Supervisor Mentality: review every block, explain every commit, assume subtle incorrectness, and maintain end-to-end accountability for what ships.
Vibe coding produces immediate results and accumulating hidden debt. It also crushes skill formation, especially for juniors. The Supervisor Mentality is slower per-commit but produces shippable, defensible, debuggable code — and grows the engineer’s skills rather than substituting for them.
Difficulty:Basic
What does an AI coding agent add on top of a plain chatbot?
A coding agent places an LLM inside a development environment: it can inspect files, search the repository, edit code, run tests, read errors, inspect Git history, and sometimes browse documentation. This makes it a workflow participant rather than only a text generator.
The added tool access is why agents feel powerful, but it also raises the supervision bar. If an agent can run useful commands, it can also propose dangerous ones.
Difficulty:Intermediate
What is a prompt injection risk for coding agents?
Prompt injection happens when malicious or irrelevant instructions hidden in a web page, issue, document, or code comment are read by the agent and treated as task instructions. For coding agents, this can lead to unsafe commands, data exposure, or unrelated code changes.
The mitigation is not blind trust: inspect tool calls, understand shell commands before approving them, limit permissions, and keep the task context bounded.
Difficulty:Basic
Why are skill files or project rule files useful for AI-assisted development?
They persist project-specific constraints and checklists — for example accessibility rules, test expectations, storage inventories, dark-mode requirements, naming conventions, or architecture boundaries — so the agent is more likely to apply them without every prompt repeating them.
Skill files improve the agent’s default behavior; they do not remove the need for review. A rule file is an instruction, not proof of compliance.
Difficulty:Intermediate
Why should large AI tasks start in plan mode?
A plan makes the agent’s assumptions visible before code exists. The human can review architecture, state transitions, tests, security, accessibility, and scope, then approve one small step at a time.
Planning changes the workflow from ‘generate a pile of code and hope’ to ‘surface design decisions, bound the task, implement, test, review, refactor.’
Difficulty:Intermediate
Why is dumping the entire repository into an AI context often worse than selecting relevant files?
LLMs have finite context windows and uneven attention. Irrelevant files can bury the important constraints, causing lost-in-the-middle failures or hallucinations. Good context engineering provides the smallest relevant slice: target files, nearby interfaces, tests, and constraints.
More context is not automatically better. High-signal context beats huge low-signal context.
Difficulty:Advanced
What is a design-decision prompt, and why is it useful?
A design-decision prompt asks the AI to compare trade-offs before implementation, such as ‘Should we store the generated SVG or the avatar parameters?’ The AI can list consequences; the human chooses based on product goals and quality attributes.
This preserves human ownership of architecture. The AI helps enumerate options, but the engineer decides which trade-off fits the system.
Difficulty:Intermediate
Which tasks are good candidates for AI assistance once you already understand the domain?
Repetitive scaffolding, familiar boilerplate, first drafts of tests or documentation, simple debugging help, explaining stack traces or APIs, rapid prototypes, edge-case brainstorming, and small refactorings with tests.
These tasks are common, well-bounded, and reviewable. The human still checks the output and quality attributes before shipping.
Difficulty:Intermediate
Which tasks should you be cautious about delegating to AI?
High-stakes security, safety, legal, medical, financial, or accessibility-sensitive work; complex stateful workflows; novel architecture decisions; and any problem you do not understand well enough to review.
AI is an amplifier of engineering skill. If the human lacks the schema needed to evaluate the output, the agent can create an illusion of competence rather than reliable progress.
Difficulty:Advanced
What is the overfitting failure mode in Test-Driven Generation?
The AI may pass visible tests by hard-coding sample inputs and outputs instead of implementing the general rule. The code looks green but fails the real specification.
The fix is to inspect the implementation, add tests for properties and novel inputs, and refactor toward a general solution. Passing weak tests is not enough.
Difficulty:Basic
How did formal inspections differ from modern code review?
Formal inspections were synchronous, role-heavy meetings with printed code and explicit roles like Moderator, Reader, and Reviewers. Modern Code Review is informal, tool-based, asynchronous, and centered on diffs in systems such as GitHub or Gerrit.
MCR traded some rigor for speed and scalability, which fits Agile, CI, and distributed teams.
Difficulty:Basic
What is the defect-finding fallacy in Modern Code Review?
Teams often say review is mainly for finding bugs, but empirical studies show only about 14% to 25% of comments identify functional defects. Most comments concern maintainability, readability, knowledge transfer, norms, and shared ownership.
The practice still improves quality, but its dominant mechanism is broader than bug hunting.
Difficulty:Basic
Name three major non-defect functions of code review.
Maintainability and code improvement, knowledge transfer and mentorship, and shared code ownership / team awareness.
These functions explain why review remains valuable even when few comments point to functional bugs.
Difficulty:Intermediate
What is the Code Review Comprehension Model (CRCM) asking a reviewer to hold in mind?
The reviewer must compare the existing system, the proposed change, and an ideal solution. That comparison is cognitively expensive, so reviewers use linear, difficulty-based, and chunking strategies.
This is why review quality collapses when a PR is too large or lacks context.
Difficulty:Advanced
What practical limits should shape review size and speed?
Keep reviews roughly below the 200-400 LOC danger zone, avoid sessions longer than about 60-90 minutes, and review at a measured pace rather than skimming hundreds of lines too quickly.
The exact numbers are heuristics, not laws. The principle is that review effectiveness drops when the reader’s attention and working memory are exhausted.
Difficulty:Intermediate
Why do stacked pull requests help review quality?
Stacking decomposes a large feature into small, dependent PRs that each fit within the reviewer’s cognitive budget. Reviewers can inspect database, backend, and UI layers as coherent chunks instead of one monolithic code bomb.
Stacking is process design around the human brain.
Difficulty:Intermediate
How do bikeshedding and linters relate?
Bikeshedding wastes human review attention on trivial subjective details. Linters and formatters move style enforcement to automation so reviewers can spend scarce attention on design, correctness, and maintainability.
The highest-value human comments are the ones automation cannot make.
Difficulty:Advanced
What are five authoring practices that make code more reviewable?
Design by Contract, assertions, guard clauses, meaningful abstractions for chunking, and the Boy Scout Rule.
Each practice reduces the reviewer’s cognitive load or makes hidden assumptions inspectable.
Difficulty:Advanced
How do assertions and guard clauses differ?
Assertions express programmer-error invariants that should never be false if the code is correct. Guard clauses handle expected invalid or edge-case inputs gracefully at the top of a function.
Assertions fail fast on broken assumptions. Guard clauses keep normal control flow flat and readable while handling real runtime conditions.
Difficulty:Intermediate
What are Google’s two approval gates in code review?
Ownership approval from someone responsible for the directory and Readability approval from someone certified in the language style and quality norms.
The gates separate domain authority from language and maintainability norms.
Difficulty:Advanced
Why can adding more reviewers reduce accountability?
Large reviewer groups can trigger a bystander effect: each person assumes someone else will read carefully, so focused attention diffuses instead of multiplying.
Review quality depends on active ownership of the review, not the raw number of people copied.
Difficulty:Advanced
Why does AI-generated code shift review toward outcome verification?
AI can generate large, plausible diffs faster than humans can read them. Reviewers need executable evidence, preview environments, tests, security checks, and behavior validation instead of relying only on syntax inspection.
The AI-era risk is rubber stamping plausible code. Outcome verification makes correctness observable.
Difficulty:Basic
What is a code smell?
A code smell is a surface-level sign that code may have a deeper design problem. It is not necessarily a bug, but it often predicts future maintenance pain.
Smells are diagnostic cues. They tell you where to inspect, not what verdict to reach automatically.
Difficulty:Basic
Why is duplicated code dangerous?
A logic change must be made in every copied location. If one copy is missed, the system develops inconsistent behavior.
Duplication is especially harmful when the copied logic encodes a business rule likely to change.
Difficulty:Basic
What usually causes a Long Method smell?
A method is trying to perform too many distinct steps or mix multiple abstraction levels. Extracting well-named helpers can turn each step into a readable chunk.
The goal is not tiny methods for their own sake. The goal is to make each conceptual step visible and separately reviewable.
Difficulty:Intermediate
How do Large Class and Divergent Change relate?
A Large Class often grows by taking on multiple responsibilities. Divergent Change is the behavioral symptom: the same class is edited for unrelated reasons.
Both point toward separating responsibilities so each class has one primary reason to change.
Difficulty:Intermediate
How are Long Parameter List and Data Clumps related?
A Long Parameter List becomes especially suspicious when the same parameters travel together repeatedly. Those repeated groups are Data Clumps and often deserve a named object.
The named object both shortens signatures and captures a domain concept that primitives were hiding.
Difficulty:Intermediate
Distinguish Divergent Change from Shotgun Surgery.
Divergent Change: one module changes for many unrelated reasons. Shotgun Surgery: one conceptual change requires many tiny edits across scattered modules.
They are opposites. Divergent Change suggests responsibilities are too concentrated; Shotgun Surgery suggests a responsibility is too scattered.
Difficulty:Basic
What is Feature Envy?
A method shows Feature Envy when it is more interested in another object’s data or methods than in its own object’s responsibilities.
The typical fix is Move Method or Extract Method plus Move Method, placing behavior closer to the data and invariants it uses.
Difficulty:Advanced
Why should code smells be handled with judgment instead of automatic rules?
A smell may be justified by performance, framework constraints, simple one-off code, or a trade-off that keeps the design clearer. The question is whether the structure makes future change cheaper or more expensive.
Mechanical smell removal can create worse design. Good refactoring starts from the change pressure the code actually faces.
Difficulty:Basic
What is refactoring?
Refactoring is a semantic-preserving transformation: changing internal structure to improve understandability, modifiability, or design quality without changing observable behavior.
The behavior-preserving constraint is what separates refactoring from feature work or bug fixing.
Difficulty:Basic
Why is refactoring an economic activity, not just code cleanup?
Refactoring reduces the future cost of change. If shortcuts are left alone, the codebase drifts toward a big ball of mud where each new change touches many unrelated files and becomes increasingly risky.
The payoff is future velocity and safety. Clean code is cheaper to modify under deadline pressure.
Difficulty:Basic
What are code smells in the refactoring workflow?
Code smells are symptoms that suggest deeper design problems. They are not necessarily bugs, but they signal where refactoring may improve future change.
Smells guide investigation. A smell is a prompt to ask what design force produced it, not automatic proof that a specific refactoring is required.
Difficulty:Basic
Which refactoring often addresses Data Clumps or Long Parameter List?
Introduce Parameter Object groups related values into one named object, such as replacing startDate, endDate with a DateRange.
The new object reduces call-site mistakes and gives the related data a domain name.
Difficulty:Basic
Which refactoring often addresses Divergent Change?
Extract Class separates responsibilities that change for different reasons into specialized classes.
If one class changes for database logic one day and formatting policy the next, it probably owns multiple concerns.
Difficulty:Basic
Which refactoring often addresses repeated type-code conditionals?
Replace Conditional with Polymorphism moves each branch’s behavior behind a shared interface or superclass, often producing Strategy or State objects.
This is strongest when the conditional represents behavior that changes by type, not when it is a simple one-off guard.
Difficulty:Basic
What is the safety net for refactoring?
A trustworthy test suite plus small, reversible steps. Run tests before the change, make one transformation, run tests again, then checkpoint.
Large refactorings fail when they mix many transformations before feedback. The small-step rhythm makes behavior preservation observable.
Difficulty:Advanced
What is the human supervisor’s role when AI performs refactorings?
The human identifies the smell, chooses the transformation, bounds the scope, runs tests after each step, and rejects changes that alter behavior or hide system-level design problems.
AI can execute many catalog refactorings, but it cannot be trusted to decide that the system behavior is preserved without human verification.
Workout Complete!
Your Score: 0/92
Come back later to improve your recall!
Development Practices Master Quiz
A comprehensive mix of the development-practices quizzes with standalone decks: comprehension, debugging, GenAI, review, code smells, refactoring, and beacons.
Difficulty:Intermediate
A function implements a simple discount rule, but the code uses five levels of nested conditionals, inconsistent variable names, and several helper calls whose names do not reveal their purpose. Which kind of cognitive load is the team mostly creating, and what should they do?
A discount rule may have some intrinsic load, but the stem describes avoidable presentation problems: nesting, names, and opaque helpers. That is the kind of load authors can reduce.
Germane load builds useful mental models. Confusing names and tangled control flow usually consume working memory without improving the reader’s schema.
Saturation describes how perceived complexity can stop scaling linearly, not a reason to abandon improvement. The team still controls several obvious sources of avoidable load.
Correct Answer:
Explanation
The useful engineering move is to preserve the rule while reducing extraneous load — flatten control flow, improve names, expose intent. Comprehension improves when the reader’s scarce working memory is spent on the domain, not on deciphering avoidable presentation noise.
Difficulty:Intermediate
A developer joins a legacy project with no domain knowledge and no reliable naming conventions. They must fix a localized bug in a small parsing function. Which comprehension strategy is most likely at first?
Top-down reading depends on prior schemas and reliable beacons. The question removes both, so the reader has little evidence to drive hypotheses.
Architecture recovery can help with system-level erosion, but it is disproportionate for a small localized parser bug.
Design patterns can be beacons, but many functions do not encode a formal pattern. Forcing pattern recognition here would add noise.
Correct Answer:
Explanation
Bottom-up comprehension is costly but appropriate when the reader lacks the context needed for top-down hypotheses. As the developer learns the domain and identifies reliable beacons, they can switch to more opportunistic strategies.
Difficulty:Intermediate
Which artifacts or mental structures belong to the integrated meta-model of program comprehension? Select all that apply.
The situational model is the reader’s high-level understanding of system functions. Omitting it leaves only syntax, not purpose.
The program model captures the low-level implementation view. It is what bottom-up chunking builds.
The top-down domain model is what lets a reader generate expectations before seeing every statement.
The knowledge base supplies schemas and programming plans that make top-down reading possible.
The integrated model is opportunistic, not alphabetical. Expert readers choose routes based on hypotheses, beacons, and difficulty.
Correct Answers:
Explanation
The integrated meta-model explains why real comprehension moves among purpose, code, domain knowledge, and personal experience. It is not a file-order recipe.
Difficulty:Advanced
A system’s architecture document describes a clean separation between presentation, domain, and data_access, but the codebase contains a single UserManager class that validates forms, builds SQL, and formats UI strings. What is the strongest diagnosis?
Removing the document hides the mismatch; it does not repair the code. The reader still lacks trustworthy cues about where responsibilities live.
Searchability is not the same as comprehensibility. A single class that mixes responsibilities may be easy to find and still hard to change safely.
Branch count might be one local symptom, but the stem describes responsibility drift across architectural boundaries.
Correct Answer:
Explanation
The architecture-code gap appears when the intensional architecture (design vocabulary) and extensional code (actual structure) diverge. Repairing it means making intent visible in packages, names, and interfaces, plus task-specific documentation that maps decisions to source elements.
Difficulty:Advanced
A senior engineer proposes adding design-pattern names to every class so future readers can understand the system faster. What is the best response?
Pattern names are helpful only when they map to a real, stable structure. Decorative pattern language can send readers down the wrong mental path.
Explicit vocabulary is often useful. Refusing to name real patterns removes a high-value beacon from the codebase.
Cyclomatic complexity is not the deciding factor. The deciding factor is whether the pattern name accurately communicates design intent that clients or maintainers should know.
Correct Answer:
Explanation
Design patterns are top-down beacons when they are true: a real Observer or Strategy name lets a reader skip ahead with confidence. They become cognitive debt when they imply a schema the code does not actually satisfy — a misapplied or decorative pattern label creates false expectations and forces readers to discover the mismatch the hard way.
Difficulty:Intermediate
You are assigned a 350-line pull request in an unfamiliar area. Which review sequence best applies the chapter’s comprehension advice?
Linear reading can work for tiny changes, but a 350-line unfamiliar change risks exhausting working memory before the reviewer has a useful specification layer.
CI is evidence, not a substitute for human comprehension. It cannot judge architecture, requirements fit, or missing tests on its own.
Textual size is a useful heuristic, but not the only one. A small concurrency change may be harder than a large rename.
Correct Answer:
Explanation
Effective review starts by building top-down context from the PR description, linked issue, and tests, then uses opportunistic navigation — scanning for the core change and likely trouble spots — to spend attention where it has the highest value.
Difficulty:Intermediate
A reviewer opens a complex PR and immediately starts reading the diff line by line. Ten minutes later they still do not know why the change exists. What should they do instead?
More tracing can deepen confusion when the reader lacks purpose. The top-down move is to establish intent first.
Rewriting may eventually be needed, but the immediate problem is the reviewer’s missing context, not proven bad code.
Familiar files can be useful beacons, but ignoring unfamiliar files creates blind spots.
Correct Answer:
Explanation
Top-down review starts by building the ‘why’ and ‘what’ of the change. The implementation becomes easier to inspect once the reader has a hypothesis to test.
Difficulty:Intermediate
Which source-file organization best supports the Stepdown Rule and Newspaper Metaphor?
Placing details before the story forces bottom-up reconstruction. The Stepdown Rule gives the reader the high-level map first.
Alphabetical order may help lookup, but it does not encode abstraction descent or call structure.
Search helps navigation, but layout still shapes the first mental model a reader forms.
Correct Answer:
Explanation
Top-down layout lets a reader skim the main concept first, then descend into lower-level details only when needed.
Difficulty:Intermediate
Which of these are useful beacons for top-down comprehension? Select all that apply.
The test name exposes intended behavior before the reader sees the implementation.
Package names can make architectural intent visible directly in the source tree.
x hides domain information. It forces the reader to infer purpose from surrounding statements.
A truthful pattern name can activate a known schema and compress a whole collaboration into one concept.
Review context can be a beacon too. It helps build the specification layer before source reading begins.
Correct Answers:
Explanation
Beacons are any reliable cues that help a reader connect low-level code to high-level purpose. They can live in source, tests, architecture, or review metadata.
Difficulty:Advanced
A developer expects a payment service to contain a refund path, but no naming, tests, or call hierarchy confirms that hypothesis. What is the most expert next move?
A schema is a hypothesis, not proof. When beacons fail to appear, the reader needs evidence.
Renaming without understanding risks creating false beacons. The reader must first discover the real behavior.
Failed hypotheses are normal. The expert move is to repair the mental model with targeted evidence.
Correct Answer:
Explanation
Top-down comprehension is opportunistic. When beacons fail, experts switch to bottom-up or tool-supported tracing until the hypothesis is confirmed, revised, or rejected.
Difficulty:Advanced
A class named PaymentFactory quietly applies fraud policy, discounts, and audit logging before returning an object. Why is this harmful to top-down comprehension?
Factory names are useful when the class really owns object creation. The issue is mismatch, not the word itself.
A more recognizable but false pattern name would make the problem worse.
File length may contribute, but the deeper issue is semantic deception: the beacon points to the wrong responsibility.
Correct Answer:
Explanation
A beacon is valuable only when it is reliable. Misleading beacons force readers to abandon top-down understanding and re-trace behavior from scratch.
Difficulty:Advanced
You are mentoring students who trace every line of every program, even when the structure is familiar. Which practice best helps them grow toward expert comprehension?
Tracing is still essential when hypotheses fail or syntax is new. The goal is strategic tracing, not no tracing.
Obfuscation is useful for experiments that isolate bottom-up reading, but it is a poor default for teaching expert strategies.
Pattern definitions without code recognition do not build the transfer skill students need.
Correct Answer:
Explanation
Students need a bridge from accurate tracing to purposeful abstraction. Predict-then-verify teaches them when a beacon is strong enough to replace exhaustive line reading.
Difficulty:Intermediate
Researchers want to measure bottom-up comprehension, so they rename isPrimeNumber to pn and remove comments from a code sample. Why does this manipulation matter?
Renaming identifiers does not change runtime behavior. The manipulation targets human comprehension, not performance.
Obfuscating names removes cues; it does not add a recognizable pattern.
Short names can work for narrow conventions, but arbitrary abbreviation destroys domain information.
Correct Answer:
Explanation
Beacons let readers jump to higher-level meaning. Removing them isolates the harder bottom-up work of reconstructing meaning from syntax.
Difficulty:Intermediate
You are reviewing a PR with new production code and tests. Which use of tests best follows the chapter’s beacon argument?
Tests often reveal the author’s intent more directly than production code does, especially for edge cases.
Reading tests after approval wastes their value as specification-layer beacons.
Tests that are unclear may need improvement, but deleting them removes executable intent.
Correct Answer:
Explanation
Tests are powerful review beacons because they show expected behavior before the reviewer dives into implementation details.
Difficulty:Intermediate
Classify the beacons. Which examples are correctly identified? Select all that apply.
The name carries domain intent directly, which is exactly what lexical beacons do.
A recognizable code shape can activate a stored plan without full statement-by-statement reading.
The assertion exposes an assumption the surrounding code depends on.
Review metadata can establish the specification layer before source reading.
A random single-letter variable hides meaning rather than exposing it.
Correct Answers:
Explanation
Beacons operate at several levels: vocabulary, structure, tests, assertions, architecture, and workflow context.
Difficulty:Advanced
A public class is named GlobalConfigSingleton. The name helps maintainers know there is only one instance, but clients now depend on that implementation detail. What is the best evaluation?
Beacon clarity is useful, but public names also define what clients learn and may depend on.
Beacon value does matter. The issue is whether that value belongs in the public abstraction.
Hiding all design information behind vague names destroys useful cues without necessarily protecting the right secret.
Correct Answer:
Explanation
Good naming balances reader support against information hiding: the Singleton suffix is a beacon for maintainers, but it forces clients to depend on the instantiation strategy. Some beacons belong in internal documentation or package structure rather than public API names.
Difficulty:Advanced
An expert reviewer skips a generated client file after confirming it matches the API schema, then spends most of the review on a small authorization change. Which principle explains this behavior?
Strategic attention allocation is not carelessness when the reviewer has reliable evidence about low-risk generated content.
Generated code still needs verification, but often through schema checks or generator trust rather than line-by-line reading.
Size and risk are different. A small authorization change can carry more risk than a large mechanical file.
Correct Answer:
Explanation
Beacon-based expertise means spending deep attention where the evidence is weak, the risk is high, or the hypothesis needs repair.
Difficulty:Advanced
You are designing a review template to help reviewers use contextual beacons. Which prompt belongs in the template?
Duplicating the diff adds reading load without creating a higher-level specification layer.
CI status is useful evidence, but it cannot explain intent, risk, or design structure.
Formatting should usually be automated; leading with it wastes attention before the review has a mental model.
Correct Answer:
Explanation
A good review template creates beacons: behavior, specification evidence, and the architectural center of the change.
Difficulty:Intermediate
A user reports: “I clicked ‘Submit’ and the page froze with a spinning wheel that never stopped.” You open the code and find that a callback in handlePayment() never resolves its Promise when the payment gateway returns a 5xx response. How would you classify each of these in the fault / error / failure vocabulary?
The frozen spinner is what the user observes — that is the failure, not the fault. The fault is the location in the code that produces the bug, which is the missing resolution path in handlePayment().
The 5xx response is an external event, not the bug. The fault is something the developer wrote (or didn’t write) — here, the missing handling for the 5xx case in handlePayment().
The vocabulary is load-bearing for debugging: each term names a different observation point. A try/catch that swallows the exception turns a failure back into a contained error, even though the fault still exists — and you fix it in a different place than where you observe it.
Correct Answer:
Explanation
Fault = the erroneous location in the source code (e.g., the unresolved Promise path). Error = the incorrect program state during execution (a pending Promise that will never settle). Failure = the incorrect observable behavior at the system boundary (the spinner the user sees). Keeping them distinct guides you to the right fix location — you find the failure on the screen, but you fix the fault in the code.
Difficulty:Intermediate
After any immediate privacy risk has been contained, a user reports that your web app sometimes shows them another user’s data. You cannot reproduce it locally. They send a screenshot but no other details. What should your first debugging action be?
Shipping a fix before you can reproduce the bug means you cannot verify the fix worked. A cross-account data leak that seems gone may just be a leak you have not yet reproduced. Reproduce first, then fix.
Setting breakpoints in production stops the world for real users every time the breakpoint fires — unacceptable for a live service. Debuggers belong in a local reproduction of the bug, which is exactly what you don’t yet have.
Spraying print() across every endpoint generates a haystack to search, when the user can hand you a needle. Targeted logging after you have a reproduction hypothesis is useful; blind logging in production is mostly noise.
Correct Answer:
Explanation
Step 1 of the debugging process is reproducing the bug, and that requires both the problem environment (browser, OS, network, build version) and the problem history (the exact click sequence). Without a reproduction you cannot verify a fix worked — and for a cross-account data leak, an unverified fix is a serious incident waiting to recur. Mature bug-report templates ask these questions precisely because reproduction is load-bearing.
Difficulty:Advanced
Your team has just manually reproduced an intermittent payment bug after two days of investigation. Before anyone touches the production code, which of the following are worthwhile next steps? (Select all that apply.)
You are about to try a dozen possible fixes, and re-running the reproduction by hand each time is slow and tempting to skip. Automating it now turns every fix attempt into a seconds-long check — and the test becomes the permanent regression test once the bug is fixed.
A 200-step reproduction usually has a handful of essential steps and many confounders. Stripping the non-load-bearing steps makes every fix attempt faster, yields a cleaner regression test, and exposes the minimal trigger that hints at the root cause.
The notes are precisely what lets a teammate (or future you) reproduce the bug after a context switch. Delete them and the next intermittent failure starts from scratch. Add them to the ticket instead of the trash.
Committing the failing test (often marked xfail or skipped with a TODO referencing the ticket) makes the bug visible in CI and gives the fix a measurable definition of done. Some teams call this “checkpointing the bug.”
Correct Answers:
Explanation
After reproduction, three moves pay dividends: automate the reproduction (a fast feedback loop is what lets you iterate on fixes), simplify it (a 200-step reproduction usually has 5 essential steps and 195 confounders), and preserve it (committed test, ticket notes, anything that lets the next person continue from where you left off). The automated test also becomes the permanent regression test once the bug is fixed.
Difficulty:Intermediate
A teammate has a Python bug they’ve been stuck on for an hour. They walk over to your desk and say “can you look at this?” You read the function — about 30 lines — and notice nothing obviously wrong. Which suggestion is the highest-leverage pedagogical move?
Taking over the keyboard finds the bug faster for you, but the teammate loses the chance to build the debugging skill. They will be in the same spot on the next bug. Make them drive.
Outsourcing the diagnosis short-circuits the most valuable part of debugging — the moment of realizing what the code actually does versus what they intended it to do. That moment is where the mental model updates. AI assistants are useful for things you already understand, less useful for unblocking the learning itself.
A break sometimes helps, but it is a stalling tactic, not a debugging technique. Rubber-duck-explaining produces the same insight without the wait.
Correct Answer:
Explanation
This is rubber-duck debugging applied to a colleague. The curse of knowledge means the author reads what they intended to write — explaining the code line by line forces them to compare intent against the actual text, and the discrepancy is usually the bug. The duck (or in this case, you-as-duck) is most valuable when the explainer says aloud what a line should do and then notices it doesn’t.
Difficulty:Intermediate
You have a regression: a test that passed on Friday now fails on Monday. There are 87 commits between the two versions and no obvious culprit in the diff. Which tool is the most efficient for finding the commit that introduced the regression?
git blame is excellent for “who last touched this line?” but it does not tell you which commit broke a test. A regression often comes from a change in a different file than the one the test exercises.
Linear search through 87 commits is roughly 87 test runs in the worst case. git bisect does the same job in roughly $\log_2(87) \approx 7$ test runs — over an order of magnitude faster.
Batch reverting throws away unrelated work and only narrows the search to a batch of 10 commits, not the single offending one. You still have to bisect the batch.
Correct Answer:
Explanation
git bisect performs a binary search across commits, asking good or bad at each midpoint. With an automated test you can git bisect run ./test.sh and let git work through the history while you do something else. For 87 commits, you go from ~87 tests to ~7 tests. This is exactly why writing the automated reproduction test in Step 1 of debugging is so valuable — it turns regression hunting into a one-liner.
Difficulty:Intermediate
You see this error in your terminal while setting up a new project: ERROR 3680 (HY000): Failed to create schema directory 'tobias_dev_orders_2026_q1' (errno: 2 - No such file or directory). What is the best thing to copy into a search engine or AI assistant?
Including the project-specific schema name pollutes the query: nobody else has a database with that exact name, so search engines can’t match your query to anyone else’s solution. It also leaks information you may not want to send to a third party.
Stripping the error code throws away the most useful diagnostic the message contains. ERROR 3680 (HY000) and errno: 2 are stable identifiers other developers will have searched for. Strip the project-specific bits, keep the framework-specific bits.
Errors from frameworks, libraries, and external services have almost always been encountered before — your job is to find the prior thread. The DBA is a last resort, not a first.
Correct Answer:
Explanation
The pattern is strip project-specific identifiers, keep framework-specific ones. The schema name is unique to you; the error code, error number, and message structure are shared by every MySQL user who hit the same problem. Stripping also helps with privacy — usernames and internal hostnames don’t need to leave your machine. And when you do run a suggested command from the results, read it before executing — prompt-injection attacks on AI-search-result pages are an emerging risk.
Difficulty:Intermediate
You’re chasing a bug that only appears around the 10,000th line item in a specific user’s account. Stepping through the loop one iteration at a time in the debugger would mean clicking Step Over thousands of times. What’s the right move?
Commenting the loop changes the program’s behavior — if the bug interacts with loop state (accumulator overflow, off-by-one at the boundary, an unexpected value at iteration 9,847), the reading-without-running approach misses it entirely.
Hard-coded short lists exercise different code paths than 10,000-item lists. The bug you’re chasing depends on scale and position; shrinking the input is exactly what makes it disappear.
Printing 10,000 iterations to a log is the non-interactive equivalent of clicking Step 10,000 times. A conditional or hit-count breakpoint lets you ask the same question (“what’s happening near iteration 10,000?”) without generating a forest of noise.
Correct Answer:
Explanation
Conditional breakpoints trigger only when a given expression evaluates to true, in any expression the debugger can evaluate (variable comparisons, function calls, boolean combinations). Most IDEs also support hit-count breakpoints (‘fire only on the 10,000th time this line is hit’) — the same idea, expressed differently. Both let you skip directly to the interesting moment without sitting through the boring iterations.
Difficulty:Intermediate
A teammate marks a ticket “FIXED” with this commit: a one-line change that makes the previously-failing reproduction pass. They did not run the rest of the test suite. What is the most important risk they have left exposed?
Searchable commit messages are a real benefit, but missing them produces an inconvenience rather than a broken product. A regression silently shipped to users is a much larger risk.
The documentation point is real (and worth flagging), but a missing comment doesn’t break the product. A regression does.
Nearby assertions are a good practice — they catch related bugs proactively — but they don’t compensate for skipping the existing regression suite. A passing single test plus no suite run is weaker evidence than failing assertions.
Correct Answer:
Explanation
Skipping the test suite is the most common failure mode in step 4 of the debugging process. A fix that closes one bug while opening another is called a regression, and the suite exists precisely to catch them. Run the failing reproduction test (it should now pass), and the entire suite (no previously-green test should now be red), before calling anything fixed. Code comments and assertions matter too, but they don’t replace the suite — they complement it.
The team lead says “This is fault-tolerant — if anything goes wrong, the user doesn’t see a crash.” What’s wrong with this reasoning?
Fault tolerance is selective error handling for known failure modes with deliberate recovery — not a bare except: pass that swallows everything. The latter is one of the most dangerous patterns in the language because it hides bugs and leaves invariants violated.
Printing to the console helps during development but is no substitute for proper error handling in production. The bigger problem is the violated invariant (money debited but not credited), which printing doesn’t fix.
That is a style preference unrelated to the correctness or fault-tolerance argument. The dangerous pattern is the swallow-everything except.
Correct Answer:
Explanation
Fault tolerance does not mean ‘hide every error.’ It means designing so that known failure modes are detected, contained, and (when possible) recovered — usually with a write to a log, a metric, an alert, or a compensating transaction. A bare except: pass does the opposite: it converts a failure (visible, debuggable) into a silent error that leaves the system in an inconsistent state. For money transfers specifically, the right pattern is an atomic transaction that either commits both updates or rolls both back — never half. See also the test design discussion on why broad exception catching also tends to hide real test failures.
Difficulty:Intermediate
A junior engineer is debugging a deeply nested issue in a backend microservice. They have been at it for three hours with no progress, just rereading the same 200 lines of code. What is the single most likely explanation for why they are stuck?
Most bugs in production code do not require deep language esoterica. The much more common pattern is a smart engineer running a mental model that doesn’t match the code — which is exactly what the curse of knowledge predicts and what rubber-ducking breaks.
Unfixable bugs are rare; stuck-on-fixable bugs are common. “Rewrite from scratch” is almost always the wrong answer when the actual problem is a stale mental model.
Tool quality matters at the margins, but a 3-hour stall has a stale-mental-model smell, not a missing-IDE-feature smell. New tools are unlikely to provide the insight that switching debugging tactics would.
Correct Answer:
Explanation
‘Reading the same code for hours’ is the textbook symptom of the curse of knowledge — the author’s mental model is overwriting the actual text. The remedy is to force a comparison between intent and implementation: explain the code line by line to a duck (or colleague), step through it in a debugger so the actual variable values are visible, or set targeted assertions. Three hours of staring rarely beats 15 minutes of any of those. Stuck-because-stale-model is far more common than stuck-because-impossible.
Difficulty:Intermediate
Compilers (1960s) delivered a 10x productivity gain. Current research estimates GenAI delivers 21%–50%. What is the most accurate explanation for the gap?
Compilers were vastly slower than LLMs (compilation took hours on 1960s hardware). Execution speed of the tool is not what produces engineering productivity. The compiler’s leverage came from what it automated, not how fast it ran.
The 21–50% range is the consistent finding across multiple controlled studies — not a measurement artifact. Treating it as undercounted overstates current AI capability and underestimates the work that essential complexity still demands.
Compilers eliminated whole categories of repetitive translation work that previously consumed half a developer’s day. GenAI’s reduction is real but smaller in scope. The asymmetry is well-documented, not marketing.
Correct Answer:
Explanation
Compilers automated accidental complexity (translating high-level intent into machine instructions) — a near-pure mechanical task. GenAI helps with parts of that but leaves essential complexity (understanding requirements, choosing data structures, navigating trade-offs, integrating with messy real systems) largely intact. This is why productivity gains plateau where genuine engineering judgment is needed, and why systems-thinking and requirements skills remain decisive even with AI assistance.
Difficulty:Intermediate
A developer says “Copilot wrote the whole feature in 5 minutes — I’m so much more productive!” Two days later they’re still debugging it and have shipped a security vulnerability. Which trap have they fallen into?
Cognitive offloading is a separate trap — it concerns skill formation, not the productivity illusion specifically. The pattern described is about misattributing speed to productivity, then paying the debt downstream.
Hallucination is one cause of bugs in AI output, but the framing of the question is about how ‘fast’ the generation felt vs how slow the end-to-end work was. The illusion is a measurement error, not a single defect type.
Premature optimization is unrelated — the issue isn’t over-engineering, it’s that the generated code is subtly broken and the bug-tail is long.
Correct Answer:
Explanation
The illusion of AI productivity is the gap between generation (fast, satisfying, visible) and end-to-end shipping (debug, fix, verify, secure — slow and invisible). Measure productivity in features shipped per week with acceptable defect and security rates, not in characters generated per minute. Controlled studies report that developers feel more productive with AI even when measured throughput is flat or lower.
Difficulty:Intermediate
Two computer-science students use a chatbot to learn linked lists. Student A pastes the assignment prompt and copies the answer. Student B asks the chatbot to explain why a tail pointer matters, then implements it themselves. Six months later, which is most likely to struggle on the data-structures exam, and why?
Time-on-task with active engagement is what builds long-term memory. Student B’s extra time was productive struggle, the strongest predictor of durable learning.
Equal performance would mean cognitive engagement has no effect on learning — which contradicts decades of cognitive-science research (effortful retrieval, generation effect, desirable difficulties).
Subscription tier is irrelevant. The difference is how the AI was used, not which version answered.
Correct Answer:
Explanation
Cognitive offloading (paste-prompt, copy-answer) bypasses the effortful retrieval that builds durable knowledge — the same reason students who only re-read notes fail compared to those who self-test. Conceptual inquiry (asking the AI to explain, compare, justify) preserves cognitive engagement and exercises the continual-learning skill humans retain over AI. For junior engineers especially, the way GenAI is used predicts whether it accelerates or kneecaps skill formation.
Difficulty:Intermediate
Which of these are valid items in the Supervisor Mentality for working with GenAI? Select all that apply.
AI output looks polished even when wrong. Every block needs review at the same scrutiny a junior teammate’s code would receive — same defect rate, more confident phrasing.
The explainability rule prevents the team from accumulating code nobody understands. When the bug appears at 3 AM, you’ll need to debug it — being able to explain it is a precondition for being able to fix it.
Roughly 40% of Copilot suggestions in security-sensitive scenarios have been found to contain vulnerabilities, and AI fluently produces plausible-but-wrong patterns it pattern-matched from training data. Defaulting to “subtly broken until proven otherwise” changes review quality immediately.
Reading more code does not produce better judgment. AI lacks domain context, system-specific constraints, and accountability — all of which experienced human teammates bring. Trusting it more is the inversion of the right calibration.
Capable but unreliable is the right mental model: useful for first drafts, dangerous when given final authority. The same trust calibration you’d extend to a smart intern: review, verify, don’t auto-merge.
Correct Answers:
Explanation
The Supervisor Mentality is the antidote to vibe coding. It treats GenAI as a capable but unreliable contributor — every output gets the same scrutiny as an unfamiliar teammate’s PR, and nothing ships that the human can’t explain or own. This calibration is what separates engineers who scale up safely with AI from those who accumulate bugs and security debt invisibly until production catches fire.
Difficulty:Intermediate
Your team adopts Test-Driven Generation. Walk through the correct sequence.
Reversing the order destroys the entire benefit: tests written for the existing implementation just rubber-stamp it instead of constraining it. This is the textbook TDD anti-pattern, AI version.
Tests that ‘defeat’ code is adversarial security testing, not TDG. The point of TDG is to use generated tests as a specification the implementation must satisfy.
Single-shot prompts give the AI no feedback loop to correct itself, and the developer no opportunity to verify the tests before committing to them as the spec. Throughput is fast, defect rate is high.
Correct Answer:
Explanation
Test-Driven Generation: (1) AI generates tests from the description → (2) human reviews tests as the specification → (3) AI generates implementation → (4) remediation loop feeds failing test output back to the AI. The review step in (2) is what gives the workflow its quality: the tests are the contract, and the human’s job is to make sure the contract is right before the AI is asked to satisfy it. Skipping review means the implementation passes broken tests.
Difficulty:Advanced
Two teams adopt the same AI coding assistant. Team A’s codebase is a tightly coupled monolith (“spaghetti”); Team B’s is a set of well-bounded microservices with clean interfaces. Both apply AI to similar tasks. Why does Team B see substantially larger productivity gains?
Same assistant, similar tasks — the structural difference between codebases is the variable, not prompt skill. Even strong prompt engineering on a spaghetti codebase will run into context-window limits and hidden coupling.
Microservices can be written in any language; many are in the same languages as monoliths. The benefit comes from modularity, not language choice.
Attributing the difference to staff skill ignores the architectural variable explicitly described. The same engineers in either codebase would see the same architecture-mediated effect.
Correct Answer:
Explanation
Information Hiding and modularity limit the context window the AI needs to process — bounded interfaces mean the AI sees only the relevant slice, hidden internals don’t leak unexpected coupling, and generated code can be reasoned about locally. In spaghetti codebases the AI is asked to operate in a context it cannot fully see, and its plausible-looking output silently breaks distant code. Good architecture is now a force multiplier for AI productivity, not just a maintainability concern — sloppy architecture pays a compounding tax.
Difficulty:Basic
An LLM confidently produces this line in a Python script: import datafetcher_v2 as dfv2. The library does not exist. What is this called, and why does it happen?
Python is interpreted; the missing import is caught at run time, not by an IDE that does only static checks. Calling this a ‘compiler error’ frames the wrong tool as the safety net.
Some hallucinations are references to deleted libraries, but most are fabricated names that never existed. The mechanism is the same — token prediction without verification — but framing it as ‘old version’ understates the breadth of the problem.
The model has no network connection during inference. Hallucination is a property of the model’s generation process, not of any external lookup.
Correct Answer:
Explanation
Hallucinations come from how LLMs work: they predict the most likely next token given prior context, without any grounding in real-world facts. A plausible-looking import like datafetcher_v2 is linguistically plausible — but linguistic plausibility is not factual existence. This is the ‘statistical parrot’ framing: the model produces sequences that look like correct code without any knowledge of whether the code is correct. Tools like retrieval-augmented generation and IDE integrations help by grounding suggestions in real codebases, but the underlying risk remains.
Difficulty:Basic
Two pair-programming modes with AI: in the Driver mode, the human writes the code; in the Navigator mode, the human directs the AI to write blocks. Which role assignment is correct?
Letting the AI fully drive while a human reviews after is the vibe-coding anti-pattern the SEBook explicitly warns against. The human’s role in both roles is to retain understanding and accountability for every line shipped.
AI handling all decisions removes engineering judgment from the loop and abandons the explainability rule. Pair programming with AI is collaborative, not delegated.
The roles are deliberate and well-defined — they describe different distributions of writing vs reviewing work between human and AI, each appropriate in different situations.
Correct Answer:
Explanation
Both AI-pair-programming roles keep the human in active intellectual control. Driver: human writes, AI critiques (good for security review, performance ideas, edge-case enumeration). Navigator: AI writes under human direction, and the human verifies every line. The crucial invariant in both: the human retains explainability and ownership of the result. The roles change who types, not who understands.
Difficulty:Advanced
Industry analysis has reported that codebases using AI coding assistants had a noticeable rise in code complexity and static-analysis warnings relative to pre-AI baselines. Assume the finding generalizes. What is the architectural risk?
Proportional growth would not produce per-file or per-function complexity rises — the metrics cited normalize for size. The rise is in complexity-per-unit-code, not just total lines.
Mainstream static analyzers handle the same languages and constructs whether code is human- or AI-written. The “new paradigms” framing tries to attribute the gap to tool blind spots; the gap is in the code, not the analyzer.
Tests are typically excluded or analyzed separately. Even if included, the complexity-per-function metric doesn’t credit tests as warnings; the increase is in production code structure.
Correct Answer:
Explanation
AI assistants tend to produce additive solutions — adding code that solves the local problem rather than refactoring to fit the system’s idioms or remove duplication. Without an explicit refactor step in the workflow, complexity compounds and static-analysis warnings climb. The fix is process-level: pair AI generation with a deliberate refactor pass, enforce complexity limits in CI, and reject AI-suggested duplication that human review would have rejected.
Difficulty:Intermediate
A senior architect predicts: “The future belongs to engineers who can orchestrate AI agents, not just write code.” What underlying skills does that prediction imply will become more valuable, and which less?
Typing speed and syntax memorization are exactly the work AI is best at automating. Predicting they will become more valuable inverts the trend.
Equal valuation would mean the skill mix is unchanged, which contradicts every workflow analysis from the past three years. The shift is real and one-directional toward specification, judgment, and verification.
Studies show AI is best as a force multiplier, weakest at autonomous end-to-end engineering. Domain knowledge, real systems thinking, accountability, and the ability to translate ambiguity into structure remain irreplaceable.
Correct Answer:
Explanation
The skill shift is from producing code to specifying and verifying it. Requirements engineering (INVEST stories, acceptance criteria), systems thinking (where the boundaries are, what fails), architecture (modular interfaces the AI can reason inside), security review, and prompt/context engineering all become more decisive. Rote syntax and boilerplate become commoditized. The engineer who raises the ceiling of what they can build is the one who treats AI as leverage over engineering judgment — not as a substitute for it.
Difficulty:Advanced
An AI coding agent reads a blog post while debugging your build and then asks permission to run a shell command you do not recognize. What is the most responsible response?
Finding a command on the web is not evidence that it is safe. A malicious page can plant instructions for agents to copy, so the human must inspect the command and source before approving it.
The lesson is not “never use agents.” The lesson is that tool access raises the supervision bar: inspect commands, bound permissions, and keep the human accountable.
Model confidence is not a security control. The right check is whether the human understands the command’s effects and whether the command is necessary for the task.
Correct Answer:
Explanation
Coding agents are powerful because they can read files and run tools, but that also exposes them to prompt injection and unsafe shell suggestions. A responsible supervisor verifies the command, source, and task fit before allowing execution. If you cannot explain the command, you are not ready to approve it.
Difficulty:Basic
Why do project-level skill files or rule files improve AI coding-agent results?
Skill files improve context, but they do not make an unsound, non-deterministic model sound or deterministic.
Rule files reduce omissions; they do not prove the output is correct. The human still reviews, tests, and owns the resulting code.
Rules are useful only when combined with repository context. They tell the agent how to work here; they do not replace reading the relevant files.
Correct Answer:
Explanation
Skill files encode durable project knowledge: accessibility rules, storage inventories, dark-mode requirements, testing expectations, naming conventions, and similar guardrails. They improve the default behavior of the agent, but they are still instructions to a fallible system, not proof that the system complied.
Difficulty:Advanced
You want an agent to implement a stateful feature in an unfamiliar codebase. Which workflow best applies the lecture’s advice?
A running UI checks the happy path, not the design, state transitions, security, or maintainability. Large one-shot prompts also make it harder to locate where the agent made a bad assumption.
Planning helps, but it does not replace executable verification. Stateful code needs tests because the hard part is often the interaction among cases.
The agent can propose architecture, but the human must judge whether it fits the domain, existing system, and long-term maintenance constraints.
Correct Answer:
Explanation
For complex work, the professional loop is plan, question, approve a small task, implement, test, review, and refactor. This keeps the human in control of architecture and lets mistakes surface while they are still small.
Difficulty:Intermediate
Why is “read the entire repository before coding” often a bad instruction for an AI agent?
Agents can read text files. The issue is not whether text can be read, but whether the right text stays salient inside the model’s limited context.
Speed is not the core problem. A slower prompt can still be worthwhile if it provides the relevant context; the failure is low-signal context, not context itself.
Reading files does not prevent editing. It can simply crowd the context window with details unrelated to the task.
Correct Answer:
Explanation
Context engineering is selective. Give the agent the smallest relevant slice: the target files, nearby interfaces, tests, conventions, and constraints. Dumping everything into context increases search cost and ‘lost in the middle’ failures.
Difficulty:Intermediate
Which tasks are especially well-suited for AI assistance once the human already understands the domain? Select all that apply.
Boilerplate is a strong AI use case when the human can review the pattern and spot deviations.
High-stakes architecture decisions require domain understanding, trade-off judgment, and accountability. AI can help list trade-offs, but it should not make the final decision unreviewed.
Explanation is one of the safest high-value uses: it supports conceptual inquiry while keeping the human responsible for applying the idea.
Prototypes are useful because they make requirements concrete. They still need engineering review before becoming production code.
This is cognitive offloading. It may finish the assignment, but it prevents the student from building the schema needed to review or debug similar code later.
Correct Answers:
Explanation
AI is strongest on repetitive, well-specified, common tasks and on learning support. It is weakest when the task requires unshared domain knowledge, high-stakes judgment, or understanding the student has not yet built.
Difficulty:Advanced
A team adds a hero avatar customizer. A student suggests storing the entire customized SVG in localStorage; another suggests storing the selected parameters and regenerating the SVG. What is the best engineering lesson from this disagreement?
Shorter is only one possible criterion, and often not the important one. Design decisions need explicit quality attributes, not a vague preference.
Storing the SVG captures the current rendering but may make future migrations, validation, and privacy review harder. Exactness today is not the same as good design over time.
Parameters are often better for evolvability, but “always” overstates it. If regeneration is unstable or the renderer changes incompatibly, raw output might have a defensible role.
Correct Answer:
Explanation
AI can implement either storage strategy, but the engineer must decide which strategy fits the product and quality attributes. Good prompts expose the decision: ask for trade-offs, choose deliberately, then give the agent a bounded implementation task.
Difficulty:Advanced
During test-driven generation, the AI writes an implementation that passes every visible example by hard-coding a dictionary from sample inputs to sample outputs. What should the human do?
Passing tests is useful only when the tests specify the behavior rather than merely list examples. A hard-coded lookup table passes examples while failing the real requirement.
The tests revealed a weakness in the specification; removing them loses that signal. Strengthen the tests and inspect the implementation.
Comments do not turn an overfit implementation into a correct one. The problem is behavioral generality, not readability of the wrong approach.
Correct Answer:
Explanation
Generated code can overfit tests just like a student can memorize answers. The human reviewer must inspect whether the implementation solves the general problem, then add stronger tests and refactor until the code matches the actual specification.
Difficulty:Basic
Which sequence correctly names the three main stages discussed for LLM development and use?
That sequence describes a traditional compiled-program toolchain, not the lifecycle of an LLM.
Requirements, design, and maintenance are software-engineering phases. They matter when supervising AI, but they are not the model-development stages.
Tokenization is part of how text is represented, and deployment may follow model development, but this sequence does not capture the training-and-use pipeline from the lecture.
Correct Answer:
Explanation
Pre-training creates the base model, post-training tunes it for useful behavior, and inference is the use-time step where a prompt produces output. This item is the low-Bloom anchor: students need the vocabulary before they can analyze agent workflows.
Difficulty:Intermediate
A reasoning model shows a polished step-by-step explanation before generating code. Why should that trace still be treated cautiously?
Human-looking explanation is not evidence of human-like cognition. The model can generate plausible reasoning text while still missing the real invariant.
Reasoning mode does not turn a non-deterministic system into a deterministic compiler. The same prompt can still lead to different outputs.
Reasoning traces can help, but executable behavior still needs tests and human review.
Correct Answer:
Explanation
Thinking traces can be useful scaffolding, not proof. The engineer should read them as a proposal to inspect, then verify the generated code against requirements, tests, and system context.
Difficulty:Intermediate
You want an agent to add a title-only search box to the SEBook home page. Which prompt best applies the lecture’s prompt-engineering advice?
“Make it work well” gives the agent no acceptance criteria and no scope. The feature it ships may not be the one you wanted.
Dumping the whole repo into context buries the constraints that matter and lets the agent decide design questions you should own.
“Modern” and “polished” are taste words, not criteria. New libraries also expand scope; constrain the feature instead.
Correct Answer:
Explanation
A strong implementation prompt gives role, task, context, acceptance criteria, constraints, and process. It also asks the agent to surface design questions before it silently chooses behavior you did not intend.
Difficulty:Advanced
An agent adds a “schedule study” feature that looks polished, but the generated quiz links use URLs that do not exist. What should a reviewer infer? Select all that apply.
Link validity is observable behavior. A test or manual check should catch it before the feature ships.
Plausible routes are exactly the kind of thing an LLM can invent when it has not been grounded in the repository’s real routing conventions.
Visual polish is not correctness. A polished broken link is still broken.
Acceptance criteria should describe the behavior that makes the feature valuable. If links are part of the value, their validity belongs in the criteria.
Broken links are user-facing defects. They can strand learners and fail the core purpose of the feature.
Correct Answers:
Explanation
This is an analysis-level failure: separate surface polish from behavioral correctness. The reviewer should trace the bug to missing grounding, weak acceptance criteria, and missing verification.
Difficulty:Expert
A team wants AI to implement a feature for a public educational site that must meet WCAG 2.2 AA. Which decision best evaluates the risk?
Accessibility is a release constraint, not optional polish. Waiting for a user complaint shifts the cost to people the system is supposed to serve.
AI can help brainstorm checks and draft code, but the workflow must keep human verification and explicit standards in the loop.
Confidence is not evidence. Accessibility requires concrete checks such as semantic markup, keyboard operation, focus visibility, contrast, reflow, and status-message behavior.
Correct Answer:
Explanation
Evaluation means judging whether the process is adequate for the risk. For a public educational site, the AI workflow must include explicit accessibility criteria and verification, not just generation.
Difficulty:Advanced
You are starting a personal project to learn a library you have never used. Which AI-assisted workflow best creates durable skill rather than cognitive offloading?
Studying only after failure makes the AI do the schema-building work. The project may run while the learner’s understanding stays shallow.
Error-paste loops can fix symptoms without building the mental model needed to debug future problems.
The lecture argues against cognitive offloading, not against all AI use. Conceptual inquiry can strengthen learning when the learner remains active.
Correct Answer:
Explanation
Create-level work means designing a workflow, not just choosing a tool. This plan uses AI as a tutor, reviewer, and bounded helper while preserving the student’s own implementation effort, retrieval, testing, and explanation.
Difficulty:Basic
Which statement best distinguishes formal inspections from Modern Code Review?
Formal inspections were effective but slow, especially because scheduling multiple people into meetings consumed large amounts of development time.
The Reader role belongs to formal inspections, not typical asynchronous MCR.
The shift was process and tooling, not language.
Correct Answer:
Explanation
MCR arose because formal inspections were too slow for Agile, CI, and distributed teams. It keeps peer review but changes the workflow.
Difficulty:Intermediate
Your manager says, “If only 14% to 25% of review comments find functional defects, code review is mostly waste.” What is the strongest response?
CI catches some classes of failures, but it does not provide mentorship, design judgment, ownership diffusion, or maintainability critique.
Guessing produces low-signal comments. Review quality improves by focusing human attention on outcomes humans are suited to judge.
The defect-finding gap is specifically about modern review datasets, not only formal inspections.
Correct Answer:
Explanation
The defect-finding fallacy is assuming review is valuable only when it catches bugs. MCR also teaches teams, enforces norms, spreads context, and improves future modifiability.
Difficulty:Intermediate
A teammate submits a 1,200-line feature PR touching database migrations, backend rules, and UI. They say one large PR is easier because reviewers see the whole feature at once. What should you recommend?
More reviewers can create a bystander effect. It does not guarantee that anyone forms a complete mental model.
Large PRs may happen sometimes, but accepting them as normal makes deep review unlikely.
Faster skimming is the opposite of effective review. Speed without comprehension misses design and functional problems.
Correct Answer:
Explanation
Stacking is a workflow answer to cognitive limits. It preserves reviewability by keeping each change small enough to understand.
Difficulty:Advanced
Which strategies fit the Code Review Comprehension Model for a non-trivial PR? Select all that apply.
Tests can provide the specification layer the reviewer needs before reading implementation detail.
Core-based reading spends attention where the most important design decision lives.
Chunking keeps the mental model small enough to reason about.
A quick scroll is impression-based, not specification-driven review.
Easy-first can be useful when it intentionally reduces clutter before harder reasoning.
Correct Answers:
Explanation
CRCM treats review as comparative comprehension: existing system, proposed change, and ideal solution. Good strategies manage that working-memory load explicitly.
Difficulty:Intermediate
An author wants to make a complex function more reviewable before opening a PR. Which changes are aligned with the chapter? Select all that apply.
Contracts let reviewers check behavior against explicit assumptions instead of reconstructing intent from scratch.
Assertions make impossible states fail fast and expose local assumptions.
Guard clauses reduce nesting and let the normal path stay flat.
Named chunks compress working-memory load and let reviewers drill into one concept at a time.
One giant method removes navigation but overloads working memory. Reviewability depends on meaningful abstraction, not merely file locality.
Correct Answers:
Explanation
Reviewable code is designed around the reader’s cognitive limits. Contracts, assertions, guard clauses, and chunks each make the reviewer’s job more precise.
Difficulty:Advanced
In apply_discount, a check rejects a user-entered discount of 150% and returns a validation error. Elsewhere, assert subtotal >= 0 documents an invariant after pricing. Which statement is most accurate?
User input can be invalid in normal operation. Assertions may be stripped in production and should not be the only handling for expected runtime conditions.
Assertions are useful for invariants that should never be false if the code is correct.
Comments do not fail fast or protect control flow. Executable checks carry stronger evidence.
Correct Answer:
Explanation
Assertions and guard clauses both aid review, but they serve different contracts: impossible programmer-error invariants versus expected invalid inputs or edge cases.
Difficulty:Intermediate
In Google’s review process, why might one change require both an owner approval and a readability approval?
Ownership is about codebase authority, not formatting. Readability is a trained quality norm, not a popularity signal.
Google’s data shows many changes are approved quickly despite the gates. The gates protect different quality dimensions.
Readability does not replace tests or ownership. It adds a human norm check for maintainable code.
Correct Answer:
Explanation
Google separates who knows the directory from who can certify language readability. The two gates protect different aspects of quality at scale.
Difficulty:Advanced
An AI agent opens a 2,000-line PR that passes unit tests. The reviewer feels pressure to approve because the code looks polished and CI is green. What is the safest review posture?
AI code can look authoritative while being subtly wrong. Green tests only prove the existing tests passed.
A blanket ban ignores useful AI assistance. The safer standard is stronger verification and smaller reviewable units.
More comments can help explain intent, but they do not prove behavior or security. Outcome evidence is needed.
Correct Answer:
Explanation
AI-generated code raises the risk of plausible-but-wrong diffs overwhelming human attention. The review artifact must shift toward smaller slices and observable behavior.
Difficulty:Basic
A function works correctly today, but it is 120 lines long, mixes validation, database writes, email formatting, and logging, and is hard to test. Which statement is most accurate?
Passing tests show current behavior, not future modifiability. Smells often matter precisely because they predict later change risk.
Rewriting from scratch is rarely the first move. A smell asks for diagnosis and targeted refactoring.
Testability, responsibility boundaries, and future change cost are engineering concerns, not mere aesthetics.
Correct Answer:
Explanation
A code smell is a warning sign. The code may run correctly today while still being expensive and risky to change tomorrow.
Difficulty:Intermediate
A User class changes when database schema changes, when display-name formatting changes, and when password-reset email copy changes. Which smell is most central?
Shotgun Surgery is one conceptual change scattered across many modules. Here, one class is changing for many unrelated reasons.
Data Clumps are repeated groups of values. The stem is about mixed responsibilities.
Feature Envy is behavior living near the wrong data. The stronger signal here is one class carrying multiple reasons to change.
Correct Answer:
Explanation
Divergent Change means one module changes in different ways for different reasons. The usual response is to split responsibilities along real change axes.
Difficulty:Intermediate
Adding a new tax rule requires tiny edits in Invoice, ReceiptPrinter, TaxReport, OrderSummary, and CustomerExport. Which smell does this suggest?
Large Class concentrates too much behavior in one place. The stem describes a behavior scattered across many places.
A Long Method might exist somewhere, but the defining symptom is one change requiring many scattered edits.
Duplication can contribute, but Shotgun Surgery does not require byte-for-byte repeated lines. It requires scattered change points.
Correct Answer:
Explanation
Shotgun Surgery makes a single conceptual change behave like a hunt across the codebase. The design should consolidate the tax-rule responsibility.
Difficulty:Intermediate
Multiple functions accept street, city, state, zip, and country in that order. Bugs often happen when two adjacent strings are swapped. What is the best smell diagnosis and refactoring response?
Feature Envy concerns behavior leaning on another object’s data. The stem describes related primitives traveling together.
Deleting parameters loses information. The goal is to name and group the related values.
A class merge would likely concentrate responsibilities rather than clarify the address concept.
Correct Answer:
Explanation
A repeated group of related primitives is a Data Clump. A named object reduces call-site mistakes and gives the concept a stable home.
Difficulty:Advanced
A method in InvoicePrinter repeatedly calls invoice.getCustomer().getAddress().getZipCode() and invoice.getCustomer().getDiscountTier() to decide billing rules. Which concerns are plausible? Select all that apply.
The method’s interest is centered on another object’s data and policy, which is the core Feature Envy signal.
Deep getter chains expose internal navigation paths and couple the caller to object structure.
A delegating method can let the client ask a higher-level question without traversing the object graph.
Getters can still leak structure. Encapsulation is about protecting design decisions, not merely using accessor syntax.
Correct Answers:
Explanation
The code may be asking the wrong object to make a billing decision. Smell diagnosis looks at where behavior and data naturally belong.
Difficulty:Advanced
A linter flags a tiny method as a smell because it has only one line. The method name is a domain phrase used throughout the team’s conversations, and it hides a volatile calculation behind a stable interface. What should the team do?
Mechanical smell rules miss context. A tiny method can still earn its keep if it names a concept and hides volatility.
Fewer methods can reduce navigation, but inlining can also expose change-prone detail everywhere.
Shorter names are not automatically clearer. Domain-rich names are often the beacon that justifies the abstraction.
Correct Answer:
Explanation
Smells require judgment. The question is whether the abstraction lowers future change cost and improves comprehension, not whether it satisfies a size heuristic.
Difficulty:Intermediate
Which change is a true refactoring?
Changing accepted inputs changes observable behavior. That may be a good bug fix or feature, but it is not refactoring.
Adding behavior is feature work. Refactoring may prepare for it, but the feature itself is not behavior-preserving.
Deleting a test changes the safety net and may hide behavior changes. It is not an internal structure improvement.
Correct Answer:
Explanation
Refactoring preserves observable behavior while improving internal structure. The behavior-preserving boundary is the key test.
Difficulty:Intermediate
Match the refactoring to the smell it most directly addresses. Which pairings are reasonable? Select all that apply.
Related values that travel together usually deserve a named object that captures the concept.
A class that changes for unrelated reasons likely contains responsibilities that should be split.
A repeated branch on type often means behavior wants to move behind subtype or strategy objects.
A class that does not justify its existence can be folded into a more useful owner.
A missing test is a safety-net gap, not a naming smell. Renaming may improve clarity, but it does not create behavioral evidence.
Correct Answers:
Explanation
Refactoring is strongest when the chosen transformation matches the design force behind the smell.
Difficulty:Intermediate
A team wants to refactor a tangled billing module. What is the safest sequence?
A big rewrite delays feedback until many possible mistakes are mixed together. It becomes hard to know which change broke behavior.
Some tests may be implementation-coupled, but disabling failures before understanding them removes the behavior-preservation signal.
Mechanical extraction without a responsibility model can create shallow classes and new coupling.
Correct Answer:
Explanation
Safe refactoring is a tight feedback loop: a green baseline, one behavior-preserving transformation, green verification, then a checkpoint before the next step.
Difficulty:Advanced
During a feature crunch, a developer notices a misleading local variable name in the function they are already editing. They also want to reorganize the whole package. What is the best refactoring judgment?
Bundling a large reorganization into a feature makes review harder and increases the chance of accidental behavior changes.
Refusing all small cleanup allows broken windows to spread. The key is scope control, not a blanket ban.
Deadline pressure is exactly when large structural changes are riskiest. Keep the current change reviewable.
Correct Answer:
Explanation
Refactoring judgment weighs payoff, scope, and reviewability. Tiny local improvements covered by existing tests belong with the change; large structural work deserves its own focused review so reviewers can separate behavior preservation from feature work.
Difficulty:Advanced
An AI agent proposes to “refactor” a module by extracting helpers, changing error messages, and altering the order in which side effects occur. What should the human supervisor do?
AI can execute many transformations, but it can also quietly change behavior. Capability does not remove the need for verification.
Hiding behavior changes behind names makes review harder and violates the central contract of refactoring.
Formatting proves only surface consistency. It says nothing about observable behavior or side-effect order.
Correct Answer:
Explanation
AI-assisted refactoring still needs human scope control. Refactorings should be small and test-verified; behavior changes and side-effect reordering must be reviewed as feature or bug-fix work so reviewers know what contract they are checking.
Difficulty:Intermediate
A checkout module has a switch on paymentType repeated in five places: fees, validation, receipt text, fraud rules, and retry policy. Which refactoring direction best fits the smell?
Comments may help temporarily, but they do not remove the repeated change point. A new payment type would still require edits in five places.
Consolidating into a utility class may reduce search effort but can create a new god object and preserve the type-code smell.
Inlining worsens working-memory load and makes future payment-type changes harder.
Correct Answer:
Explanation
Repeated conditionals over the same type code are a classic sign that behavior wants to move behind polymorphism or a Strategy boundary, so each subtype owns its variation behind a common interface and adding a new payment type touches one class instead of five switch statements.
Workout Complete!
Your Score: 0/71
Code Beacons
Code Beacons explains how experienced developers use familiar identifiers, structures, tests, and architectural cues as cognitive anchors while reading unfamiliar code.
Code Comprehension
Code Comprehension teaches how developers form mental models of a system, why top-down reading matters, and how architecture-code gaps make comprehension harder. The Part 1 and Part 2 tutorials turn those ideas into guided practice.
Debugging
Debugging covers reproducing a fault, localizing the root cause, using debuggers and assertions, verifying the fix, and preserving the regression test. The Python Debugging Tutorial gives hands-on practice with breakpoints and time-travel debugging.
Defensive Programming and Design by Contract
Defensive Programming in Python teaches boundary validation, precise exceptions, invariant preservation, and failure reporting. Design by Contract in Python follows it with caller/callee responsibility, preconditions, postconditions, old-state reasoning, invariants, and contract strength.
Generative AI
Generative AI in Software Engineering explains how AI coding tools change productivity, verification, skill formation, supervision, and team workflows without replacing engineering judgment.
Modern Code Review
Modern Code Review teaches review as a socio-technical practice: small reviewable changes, reviewer cognition, asynchronous workflows, defect finding, knowledge transfer, and AI-era risks.
Prompt Engineering
Prompt Engineering covers how to communicate tasks, constraints, examples, and verification expectations to AI assistants so their output is useful and reviewable.
Code Smells
Code Smells teaches the symptoms of poor design, including long methods, large classes, duplicated code, feature envy, and deeply nested conditionals.
Refactoring
Refactoring explains behavior-preserving transformations, safe refactoring rhythm, and the relationship between smells, tests, and design improvement. The Code Smells and Refactoring Tutorial provides tool-supported practice.
Top-Down Code Comprehension
Top-Down Code Comprehension focuses on hypothesis-driven reading: start from purpose and architecture, then use targeted navigation to confirm or revise the mental model.
Cookie & Privacy Notice:
This site stores a few preferences and your progress locally in your browser
(cookies and localStorage) so it works the way you left it.
Nothing is sent to or stored on any external server, and this site does not
sell, share, or disclose any user data to third parties.
View & manage your data →