Development Practices

Practice Across Development Practices

Use the master deck when you want a mixed review of code-reading, debugging, review, AI, prompting, code-smell, and refactoring vocabulary. Use the master quiz to practice deciding which engineering practice fits a realistic maintenance situation.

Development Practices Master Flashcards

A comprehensive mix of development-practices flashcards drawn from the standalone decks: comprehension, debugging, GenAI, review, code smells, refactoring, and beacons.

Difficulty: Intermediate

What are the three kinds of cognitive load in code comprehension?

Difficulty: Basic

How do bottom-up and top-down comprehension differ?

Difficulty: Advanced

What are the four components of the integrated meta-model of program comprehension?

Difficulty: Intermediate

What should a reviewer do during the orientation phase before reading a complex diff?

Difficulty: Expert

Why can cyclomatic complexity under-predict human difficulty?

Difficulty: Advanced

What is the architecture-code gap?

Difficulty: Expert

Why can excessive abstraction make code harder to understand?

Difficulty: Intermediate

Name three practices that make code easier to comprehend top-down.

Difficulty: Basic

What is top-down code comprehension?

Difficulty: Intermediate

How does schema activation help expert programmers read code faster?

Difficulty: Expert

What is a dangling purpose link in a reader’s mental model?

Difficulty: Intermediate

What is the Stepdown Rule?

Difficulty: Intermediate

How does the Newspaper Metaphor apply to source files?

Difficulty: Basic

Why do experts switch between top-down and bottom-up comprehension?

Difficulty: Advanced

When can a design pattern hurt top-down comprehension?

Difficulty: Advanced

Which IDE features support top-down comprehension?

Difficulty: Basic

What is a code beacon?

Difficulty: Basic

Why are full-word identifiers powerful lexical beacons?

Difficulty: Basic

What is a structural beacon?

Difficulty: Basic

How do tests act as beacons?

Difficulty: Basic

How do assertions act as beacons?

Difficulty: Advanced

What is the Singleton naming paradox for beacons?

Difficulty: Advanced

How do contextual beacons extend beyond source code during review?

Difficulty: Basic

Why do experts avoid exhaustive tracing when beacons are reliable?

Difficulty: Basic

Define fault, error, and failure — and explain why keeping them distinct changes how you debug.

Difficulty: Basic

Name the four steps of the systematic debugging process, in order.

Difficulty: Basic

Why does reproducing the bug come before trying to fix it? What are you trying to capture?

Difficulty: Basic

What is regression testing, and how does it relate to the bug-reproduction test you wrote in step 1?

Difficulty: Intermediate

When debugging your own code, when should you reach for search engines / AI tools vs a debugger? Give the rule.

Difficulty: Basic

You’re explaining your code to a colleague at their desk. Halfway through line 12 you stop, stare, and say ‘oh.’ You’ve just fixed the bug yourself. Name the phenomenon and the technique.

Difficulty: Advanced

Compare an assertion (assert x > 0) and an exception (if x <= 0: raise ValueError). When is each appropriate?

Difficulty: Basic

Your loop iterates 50,000 times and the bug only appears around iteration 12,000. How do you avoid clicking Step Over 12,000 times?

Difficulty: Intermediate

What is a time-travel debugger, and what does it do that an ordinary debugger cannot?

Difficulty: Advanced

You write try: do_thing(); except: pass and tell your team ‘this is fault-tolerant.’ Why is this misleading?

Difficulty: Intermediate

A regression test passed two weeks ago and fails today. There are ~200 commits between the two versions and no obvious culprit in the diff. What’s the right move, and why does it scale better than the alternatives?

Difficulty: Intermediate

You just landed a bug fix. The failing reproduction test now passes. What three more things should you do before calling the bug closed?

Difficulty: Intermediate

Your team has a 200-step manual reproduction of an intermittent bug. Before fixing the bug, what should you do to the reproduction itself, and why?

Difficulty: Intermediate

Look at this debugger trace. After input_radius = sys.argv[1], the watch panel shows input_radius = '10' (with quotes). Two steps later, diameter = 2 * radius produces diameter = '1010'. What’s the bug and where is it?

Difficulty: Advanced

A new colleague says: “I’ve been debugging for 4 hours. I’ve read the function 50 times. I just can’t see what’s wrong.” Diagnose what’s happening and prescribe the next 30 minutes.

Difficulty: Basic

What does it mean to call an LLM a statistical parrot?

Difficulty: Intermediate

Why is GenAI’s productivity boost (21–50%) smaller than the compiler revolution (10x)?

Difficulty: Basic

Name the three stages of LLM development.

Difficulty: Intermediate

What is the illusion of AI productivity, and how do you avoid being fooled by it?

Difficulty: Intermediate

Why do AI-generated codebases tend to have higher security vulnerability rates?

Difficulty: Basic

What is cognitive offloading, and why is it harmful for junior engineers?

Difficulty: Basic

What is the Supervisor Mentality for working with GenAI?

Difficulty: Intermediate

Compare the Driver and Navigator roles in AI pair programming.

Difficulty: Intermediate

What is Test-Driven Generation (TDG), and what are its five steps?

Difficulty: Advanced

Why does loose coupling amplify AI effectiveness, and tight coupling sabotage it?

Difficulty: Intermediate

Why is AI inference typically non-deterministic, and what does that mean for testing?

Difficulty: Basic

What is an AI hallucination in coding, and why is it especially dangerous?

Difficulty: Advanced

Why do AI-augmented codebases tend to show rising code complexity and static-analysis warnings?

Difficulty: Intermediate

Why does the leverage of an engineer’s work shift from producing code to specifying and verifying it in the GenAI era?

Difficulty: Advanced

Why is prompt and context engineering considered a load-bearing engineering skill rather than a UI trick?

Difficulty: Basic

What is vibe coding, and what is the professional alternative?

Difficulty: Basic

What does an AI coding agent add on top of a plain chatbot?

Difficulty: Advanced

What is a prompt injection risk for coding agents?

Difficulty: Intermediate

Why are skill files or project rule files useful for AI-assisted development?

Difficulty: Intermediate

Why should large AI coding tasks start with a planning step before any code is generated?

Difficulty: Intermediate

Why is dumping the entire repository into an AI context often worse than selecting relevant files?

Difficulty: Intermediate

What is a design-decision prompt, and why is it useful?

Difficulty: Intermediate

Which tasks are good candidates for AI assistance once you already understand the domain?

Difficulty: Intermediate

Which tasks should you be cautious about delegating to AI?

Difficulty: Advanced

What is the overfitting failure mode in Test-Driven Generation?

Difficulty: Intermediate

How did formal inspections differ from modern code review?

Difficulty: Basic

What is the defect-finding fallacy in Modern Code Review?

Difficulty: Basic

Name three major non-defect functions of code review.

Difficulty: Advanced

What is the Code Review Comprehension Model (CRCM) asking a reviewer to hold in mind?

Difficulty: Intermediate

What practical limits should shape review size and speed?

Difficulty: Intermediate

Why do stacked pull requests help review quality?

Difficulty: Advanced

How do bikeshedding and linters relate?

Difficulty: Intermediate

What are five authoring practices that make code more reviewable?

Difficulty: Intermediate

How do assertions and guard clauses differ?

Difficulty: Advanced

What are Google’s two approval gates in code review?

Difficulty: Advanced

Why can adding more reviewers reduce accountability?

Difficulty: Advanced

Why does AI-generated code shift review toward outcome verification?

Difficulty: Basic

What is a code smell?

Difficulty: Intermediate

Why is duplicated code dangerous?

Difficulty: Basic

What usually causes a Long Method smell?

Difficulty: Advanced

How do Large Class and Divergent Change relate?

Difficulty: Advanced

How are Long Parameter List and Data Clumps related?

Difficulty: Advanced

Distinguish Divergent Change from Shotgun Surgery.

Difficulty: Intermediate

What is Feature Envy?

Difficulty: Advanced

Why should code smells be handled with judgment instead of automatic rules?

Difficulty: Basic

What is refactoring?

Difficulty: Basic

Why is refactoring an economic activity, not just code cleanup?

Difficulty: Basic

What are code smells in the refactoring workflow?

Difficulty: Intermediate

Which refactoring often addresses Data Clumps or Long Parameter List?

Difficulty: Intermediate

Which refactoring often addresses Divergent Change?

Difficulty: Intermediate

Which refactoring often addresses repeated type-code conditionals?

Difficulty: Basic

What is the safety net for refactoring?

Difficulty: Expert

What is the human supervisor’s role when AI performs refactorings?

Development Practices Master Quiz

A comprehensive mix of development-practices quiz questions drawn from the standalone decks: comprehension, debugging, GenAI, review, code smells, refactoring, and beacons.

Difficulty: Advanced

A function implements a simple discount rule, but the code uses five levels of nested conditionals, inconsistent variable names, and several helper calls whose names do not reveal their purpose. Which kind of cognitive load is the team mostly creating, and what should they do?

A discount rule may have some intrinsic load, but the stem describes avoidable presentation problems: nesting, names, and opaque helpers. That is the kind of load authors can reduce.

Germane load builds useful mental models. Confusing names and tangled control flow usually consume working memory without improving the reader’s schema.

Saturation describes how perceived complexity can stop scaling linearly, not a reason to abandon improvement. The team still controls several obvious sources of avoidable load.

Correct Answer:

Difficulty: Intermediate

A developer joins a legacy project with no domain knowledge and no reliable naming conventions. They must fix a localized bug in a small parsing function. Which comprehension strategy will they most likely need at first?

Top-down reading depends on prior schemas and reliable beacons. The question removes both, so the reader has little evidence to drive hypotheses.

Architecture recovery can help with system-level erosion, but it is disproportionate for a small localized parser bug.

Design patterns can be beacons, but many functions do not encode a formal pattern. Forcing pattern recognition here would add noise.

Correct Answer:

Difficulty: Advanced

Which artifacts or mental structures belong to the integrated meta-model of program comprehension? Select all that apply.

The situational model is the reader’s high-level understanding of system functions. Omitting it leaves only syntax, not purpose.

The program model captures the low-level implementation view. It is what bottom-up chunking builds.

The top-down domain model is what lets a reader generate expectations before seeing every statement.

The knowledge base supplies schemas and programming plans that make top-down reading possible.

The integrated model is opportunistic, not alphabetical. Expert readers choose routes based on hypotheses, beacons, and difficulty.

Correct Answers:

Difficulty: Advanced

A system’s architecture document describes a clean separation between presentation, domain, and data_access, but the codebase contains a single UserManager class that validates forms, builds SQL, and formats UI strings. What is the strongest diagnosis?

Removing the document hides the mismatch; it does not repair the code. The reader still lacks trustworthy cues about where responsibilities live.

Searchability is not the same as comprehensibility. A single class that mixes responsibilities may be easy to find and still hard to change safely.

Branch count might be one local symptom, but the stem describes responsibility drift across architectural boundaries.

Correct Answer:

Difficulty: Advanced

A senior engineer proposes adding design-pattern names to every class so future readers can understand the system faster. What is the best response?

Pattern names are helpful only when they map to a real, stable structure. Decorative pattern language can send readers down the wrong mental path.

Explicit vocabulary is often useful. Refusing to name real patterns removes a high-value beacon from the codebase.

Cyclomatic complexity is not the deciding factor. The deciding factor is whether the pattern name accurately communicates design intent that clients or maintainers should know.

Correct Answer:

Difficulty: Intermediate

You are assigned a 350-line pull request in an unfamiliar area. Which review sequence best applies the chapter’s comprehension advice?

Linear reading can work for tiny changes, but a 350-line unfamiliar change risks exhausting working memory before the reviewer has a useful specification layer.

CI is evidence, not a substitute for human comprehension. It cannot judge architecture, requirements fit, or missing tests on its own.

Textual size is a useful heuristic, but not the only one. A small concurrency change may be harder than a large rename.

Correct Answer:

Difficulty: Intermediate

A reviewer opens a complex PR and immediately starts reading the diff line by line. Ten minutes later they still do not know why the change exists. What should they do instead?

More tracing can deepen confusion when the reader lacks purpose. The top-down move is to establish intent first.

Rewriting may eventually be needed, but the immediate problem is the reviewer’s missing context, not proven bad code.

Familiar files can be useful beacons, but ignoring unfamiliar files creates blind spots.

Correct Answer:

Difficulty: Advanced

Which source-file organization best supports the Stepdown Rule and Newspaper Metaphor?

Placing details before the story forces bottom-up reconstruction. The Stepdown Rule gives the reader the high-level map first.

Alphabetical order may help lookup, but it does not encode abstraction descent or call structure.

Search helps navigation, but layout still shapes the first mental model a reader forms.

Correct Answer:

Difficulty: Intermediate

Which of these are useful beacons for top-down comprehension? Select all that apply.

The test name exposes intended behavior before the reader sees the implementation.

Package names can make architectural intent visible directly in the source tree.

x hides domain information. It forces the reader to infer purpose from surrounding statements.

A truthful pattern name can activate a known schema and compress a whole collaboration into one concept.

Review context can be a beacon too. It helps build the specification layer before source reading begins.

Correct Answers:

Difficulty: Intermediate

A developer expects a payment service to contain a refund path, but no naming, tests, or call hierarchy confirms that hypothesis. What is the most expert next move?

A schema is a hypothesis, not proof. When beacons fail to appear, the reader needs evidence.

Renaming without understanding risks creating false beacons. The reader must first discover the real behavior.

Failed hypotheses are normal. The expert move is to repair the mental model with targeted evidence.

Correct Answer:

Difficulty: Advanced

A class named PaymentFactory quietly applies fraud policy, discounts, and audit logging before returning an object. Why is this harmful to top-down comprehension?

Factory names are useful when the class really owns object creation. The issue is mismatch, not the word itself.

A more recognizable but false pattern name would make the problem worse.

File length may contribute, but the deeper issue is semantic deception: the beacon points to the wrong responsibility.

Correct Answer:

Difficulty: Intermediate

You are mentoring students who trace every line of every program, even when the structure is familiar. Which practice best helps them grow toward expert comprehension?

Tracing is still essential when hypotheses fail or syntax is new. The goal is strategic tracing, not no tracing.

Obfuscation is useful for experiments that isolate bottom-up reading, but it is a poor default for teaching expert strategies.

Pattern definitions without code recognition do not build the transfer skill students need.

Correct Answer:

Difficulty: Intermediate

Researchers want to measure bottom-up comprehension, so they rename isPrimeNumber to pn and remove comments from a code sample. Why does this manipulation matter?

Renaming identifiers does not change runtime behavior. The manipulation targets human comprehension, not performance.

Obfuscating names removes cues; it does not add a recognizable pattern.

Short names can work for narrow conventions, but arbitrary abbreviation destroys domain information.

Correct Answer:

Difficulty: Basic

You are reviewing a PR with new production code and tests. Which use of tests best follows the chapter’s beacon argument?

Tests often reveal the author’s intent more directly than production code does, especially for edge cases.

Reading tests after approval wastes their value as specification-layer beacons.

Tests that are unclear may need improvement, but deleting them removes executable intent.

Correct Answer:

Difficulty: Intermediate

Classify the beacons. Which examples are correctly identified? Select all that apply.

The name carries domain intent directly, which is exactly what lexical beacons do.

A recognizable code shape can activate a stored plan without full statement-by-statement reading.

The assertion exposes an assumption the surrounding code depends on.

Review metadata can establish the specification layer before source reading.

A random single-letter variable hides meaning rather than exposing it.

Correct Answers:

Difficulty: Advanced

A public class is named GlobalConfigSingleton. The name helps maintainers know there is only one instance, but clients now depend on that implementation detail. What is the best evaluation?

Beacon clarity is useful, but public names also define what clients learn and may depend on.

Beacon value does matter. The issue is whether that value belongs in the public abstraction.

Hiding all design information behind vague names destroys useful cues without necessarily protecting the right secret.

Correct Answer:

Difficulty: Intermediate

An expert reviewer skips a generated client file after confirming it matches the API schema, then spends most of the review on a small authorization change. Which principle explains this behavior?

Strategic attention allocation is not carelessness when the reviewer has reliable evidence about low-risk generated content.

Generated code still needs verification, but often through schema checks or generator trust rather than line-by-line reading.

Size and risk are different. A small authorization change can carry more risk than a large mechanical file.

Correct Answer:

Difficulty: Advanced

You are designing a review template to help reviewers use contextual beacons. Which prompt belongs in the template?

Duplicating the diff adds reading load without creating a higher-level specification layer.

CI status is useful evidence, but it cannot explain intent, risk, or design structure.

Formatting should usually be automated; leading with it wastes attention before the review has a mental model.

Correct Answer:

Difficulty: Intermediate

A user reports: “I clicked ‘Submit’ and the page froze with a spinning wheel that never stopped.” You open the code and find that a callback in handlePayment() never resolves its Promise when the payment gateway returns a 5xx response. How would you classify each of these in the fault / error / failure vocabulary?

The frozen spinner is what the user observes — that is the failure, not the fault. The fault is the location in the code that produces the bug, which is the missing resolution path in handlePayment().

The 5xx response is an external event, not the bug. The fault is something the developer wrote (or didn’t write) — here, the missing handling for the 5xx case in handlePayment().

The vocabulary is load-bearing for debugging: each term names a different observation point. A try/catch that swallows the exception turns a failure back into a contained error, even though the fault still exists — and you fix it in a different place than where you observe it.
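The three terms can be lined up against a small Python analogue of the scenario. This is a sketch under stated assumptions, not the code from the question: `handle_payment` and `submit` are hypothetical names, an `asyncio` future stands in for the JavaScript Promise, and a short timeout stands in for the spinner so the example terminates.

```python
import asyncio


async def handle_payment(status_code):
    """Hypothetical Python analogue of handlePayment()."""
    done = asyncio.get_running_loop().create_future()
    if 200 <= status_code < 300:
        done.set_result("paid")
    # FAULT: no branch resolves `done` when the gateway returns 5xx.
    # ERROR: at run time, `done` therefore stays pending forever.
    return await done


async def submit(status_code):
    try:
        return await asyncio.wait_for(handle_payment(status_code), timeout=0.05)
    except asyncio.TimeoutError:
        # FAILURE: the externally visible symptom. The timeout is an
        # assumption of this sketch; the real page would simply keep spinning.
        return "spinner never stops"


ok = asyncio.run(submit(200))
frozen = asyncio.run(submit(503))
```

The symptom surfaces in `submit`, but the fix belongs in `handle_payment`, where the missing branch lives.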

Correct Answer:

Difficulty: Intermediate

A user reports that your web app sometimes shows them another user’s data. You cannot reproduce it locally. They send a screenshot but no other details. Once any immediate privacy risk has been contained, what should your first debugging action be?

Shipping a fix before you can reproduce the bug means you cannot verify the fix worked. A cross-account data leak that seems gone may just be a leak you have not yet reproduced. Reproduce first, then fix.

Setting breakpoints in production stops the world for real users every time the breakpoint fires — unacceptable for a live service. Debuggers belong in a local reproduction of the bug, which is exactly what you don’t yet have.

Spraying print() across every endpoint generates a haystack to search, when the user can hand you a needle. Targeted logging after you have a reproduction hypothesis is useful; blind logging in production is mostly noise.

Correct Answer:

Difficulty: Intermediate

Your team has just manually reproduced an intermittent payment bug after two days of investigation. Before anyone touches the production code, which of the following are worthwhile next steps? (Select all that apply.)

You are about to try a dozen possible fixes, and re-running the reproduction by hand each time is slow and tempting to skip. Automating it now turns every fix attempt into a seconds-long check — and the test becomes the permanent regression test once the bug is fixed.

A 200-step reproduction usually has a handful of essential steps and many confounders. Stripping the non-load-bearing steps makes every fix attempt faster, yields a cleaner regression test, and exposes the minimal trigger that hints at the root cause.

The notes are precisely what lets a teammate (or future you) reproduce the bug after a context switch. Delete them and the next intermittent failure starts from scratch. Add them to the ticket instead of the trash.

Correct Answers:

Difficulty: Intermediate

A teammate has a Python bug they’ve been stuck on for an hour. They walk over to your desk and say “can you look at this?” You read the function — about 30 lines — and notice nothing obviously wrong. Which suggestion is the highest-leverage pedagogical move?

Taking over the keyboard finds the bug faster for you, but the teammate loses the chance to build the debugging skill. They will be in the same spot on the next bug. Make them drive.

Outsourcing the diagnosis short-circuits the most valuable part of debugging — the moment of realizing what the code actually does versus what they intended it to do. That moment is where the mental model updates. AI assistants are useful for things you already understand, less useful for unblocking the learning itself.

A break sometimes helps, but it is a stalling tactic, not a debugging technique. Rubber-duck-explaining produces the same insight without the wait.

Correct Answer:

Difficulty: Intermediate

You have a regression: a test that passed on Friday now fails on Monday. There are 87 commits between the two versions and no obvious culprit in the diff. Which tool is the most efficient for finding the commit that introduced the regression?

git blame is excellent for “who last touched this line?” but it does not tell you which commit broke a test. A regression often comes from a change in a different file than the one the test exercises.

Linear search through 87 commits takes up to 87 test runs in the worst case. git bisect does the same job in about $\lceil \log_2 87 \rceil = 7$ test runs, over an order of magnitude fewer.

Batch reverting throws away unrelated work and only narrows the search to a batch of 10 commits, not the single offending one. You still have to bisect the batch.
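The bisect estimate can be checked with a small simulation. This is only a sketch: `first_bad_commit` and the `is_bad` predicate are hypothetical stand-ins for what git bisect does with a real test run, commits are modeled as the integers 1 to 87, and the position of the culprit (61) is made up for illustration.

```python
import math


def first_bad_commit(n_commits, is_bad):
    """Binary-search commits 1..n_commits for the first bad one.

    Assumes commit 0 is known good, commit n_commits is known bad, and
    every commit after the first bad one is also bad.
    Returns (first_bad, number_of_test_runs).
    """
    good, bad = 0, n_commits
    runs = 0
    while bad - good > 1:
        mid = (good + bad) // 2
        runs += 1  # one test run per probed commit
        if is_bad(mid):
            bad = mid
        else:
            good = mid
    return bad, runs


# Hypothetical history: commit 61 of 87 introduced the regression.
culprit, runs = first_bad_commit(87, lambda commit: commit >= 61)
worst_case = math.ceil(math.log2(87))
```

Each probe halves the remaining range, which is why the number of test runs grows with the logarithm of the commit count rather than with the count itself.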

Correct Answer:

Difficulty: Intermediate

You see this error in your terminal while setting up a new project: ERROR 3680 (HY000): Failed to create schema directory 'tobias_dev_orders_2026_q1' (errno: 2 - No such file or directory). What is the best thing to copy into a search engine or AI assistant?

Including the project-specific schema name pollutes the query: nobody else has a database with that exact name, so search engines can’t match your query to anyone else’s solution. It also leaks information you may not want to send to a third party.

Stripping the error code throws away the most useful diagnostic the message contains. ERROR 3680 (HY000) and errno: 2 are stable identifiers other developers will have searched for. Strip the project-specific bits, keep the framework-specific bits.

Errors from frameworks, libraries, and external services have almost always been encountered before — your job is to find the prior thread. The DBA is a last resort, not a first.
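Stripping the project-specific parts can be sketched in a few lines. The helper name `generic_query` is hypothetical, and the rule it applies (drop anything in single quotes) is an assumption that happens to fit this message, not a general-purpose sanitizer.

```python
import re


def generic_query(error_message):
    """Remove single-quoted, project-specific values from an error message."""
    without_names = re.sub(r"'[^']*'", "", error_message)
    # Collapse the double space left behind where a quoted name was removed.
    return re.sub(r"\s{2,}", " ", without_names).strip()


raw = (
    "ERROR 3680 (HY000): Failed to create schema directory "
    "'tobias_dev_orders_2026_q1' (errno: 2 - No such file or directory)"
)
query = generic_query(raw)
```

The error code and errno survive, so the query still matches what other developers searched for, while the schema name never leaves your machine.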

Correct Answer:

Difficulty: Intermediate

You’re chasing a bug that only appears around the 10,000th line item in a specific user’s account. Stepping through the loop one iteration at a time in the debugger would mean clicking Step Over thousands of times. What’s the right move?

Commenting out the loop changes the program’s behavior. If the bug interacts with loop state (accumulator overflow, off-by-one at the boundary, an unexpected value at iteration 9,847), reading without running misses it entirely.

Hard-coded short lists exercise different code paths than 10,000-item lists. The bug you’re chasing depends on scale and position; shrinking the input is exactly what makes it disappear.

Printing 10,000 iterations to a log is the non-interactive equivalent of clicking Step Over 10,000 times. A conditional or hit-count breakpoint lets you ask the same question (“what’s happening near iteration 10,000?”) without generating a forest of noise.
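Most debuggers let you attach the condition to the breakpoint itself. Where that is not available, the same effect can be approximated in code. A minimal sketch, assuming a hypothetical `process_items` loop with a `should_pause` predicate and an `on_hit` hook; by default the hook is Python's built-in `breakpoint()`, which drops into the debugger only when the condition is true.

```python
def process_items(items, should_pause, on_hit=breakpoint):
    """Sum items, pausing only on iterations where should_pause is true.

    process_items, should_pause, and on_hit are illustrative names.
    With the default on_hit, execution enters the debugger at exactly
    the iterations of interest instead of at every step.
    """
    total = 0
    for index, item in enumerate(items):
        if should_pause(index, item):
            on_hit()  # the predicate plays the role of the breakpoint condition
        total += item
    return total


# A recording hook replaces the debugger here so the sketch runs
# non-interactively: pause only in a narrow window around iteration 10,000.
hits = []
total = process_items(
    range(50_000),
    should_pause=lambda index, item: 9_998 <= index <= 10_000,
    on_hit=lambda: hits.append("paused"),
)
```

Out of 50,000 iterations the hook fires three times, which is the point: the loop runs at full speed until the state you care about is reached.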

Correct Answer:

Difficulty: Intermediate

A teammate marks a ticket “FIXED” with this commit: a one-line change that makes the previously-failing reproduction pass. They did not run the rest of the test suite. What is the most important risk they have left exposed?

Searchable commit messages are a real benefit, but missing them produces an inconvenience rather than a broken product. A regression silently shipped to users is a much larger risk.

The documentation point is real (and worth flagging), but a missing comment doesn’t break the product. A regression does.

Nearby assertions are a good practice — they catch related bugs proactively — but they don’t compensate for skipping the existing regression suite. A passing single test plus no suite run is weaker evidence than failing assertions.

Correct Answer:

Difficulty: Advanced

Look at this code:

def transfer(account_from, account_to, amount):
    try:
        account_from.balance -= amount
        account_to.balance += amount
    except:
        pass

The team lead says “This is fault-tolerant — if anything goes wrong, the user doesn’t see a crash.” What’s wrong with this reasoning?

Fault tolerance is selective error handling for known failure modes with deliberate recovery — not a bare except: pass that swallows everything. The latter is one of the most dangerous patterns in the language because it hides bugs and leaves invariants violated.

Printing to the console helps during development but is no substitute for proper error handling in production. The bigger problem is the violated invariant (money debited but not credited), which printing doesn’t fix.

That is a style preference unrelated to the correctness or fault-tolerance argument. The dangerous pattern is the swallow-everything except.
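By contrast, selective handling might look like the sketch below. It assumes a hypothetical `Account` class and `InsufficientFunds` exception, neither of which appears in the original snippet. The point is only that the known failure mode is checked before any balance changes, and that unexpected exceptions are left to propagate instead of being swallowed.

```python
from dataclasses import dataclass


class InsufficientFunds(Exception):
    """Known, recoverable failure mode (hypothetical for this sketch)."""


@dataclass
class Account:
    balance: int  # smallest currency unit, to avoid float rounding


def transfer(account_from, account_to, amount):
    # Validate before mutating, so a rejected transfer changes nothing.
    if amount <= 0:
        raise ValueError("amount must be positive")
    if account_from.balance < amount:
        raise InsufficientFunds(f"balance {account_from.balance} < {amount}")
    account_from.balance -= amount
    account_to.balance += amount


alice, bob = Account(100), Account(0)
transfer(alice, bob, 30)

try:
    transfer(alice, bob, 500)
    rejected = False
except InsufficientFunds:
    # Deliberate recovery for one known failure mode; anything else propagates.
    rejected = True
```

A real system would also need the two balance updates to be atomic, for example inside a database transaction; that is outside the scope of this sketch.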

Correct Answer:

Difficulty: Intermediate

A junior engineer is debugging a deeply nested issue in a backend microservice. They have been at it for three hours with no progress, just rereading the same 200 lines of code. What is the single most likely explanation for why they are stuck?

Most bugs in production code do not require deep language esoterica. The much more common pattern is a smart engineer running a mental model that doesn’t match the code — which is exactly what the curse of knowledge predicts and what rubber-ducking breaks.

Unfixable bugs are rare; stuck-on-fixable bugs are common. “Rewrite from scratch” is almost always the wrong answer when the actual problem is a stale mental model.

Tool quality matters at the margins, but a 3-hour stall has a stale-mental-model smell, not a missing-IDE-feature smell. New tools are unlikely to provide the insight that switching debugging tactics would.

Correct Answer:

Difficulty: Intermediate

Compilers (1960s) delivered a 10x productivity gain. Current research estimates GenAI delivers 21%–50%. What is the most accurate explanation for the gap?

Compilers were vastly slower than LLMs (compilation took hours on 1960s hardware). Execution speed of the tool is not what produces engineering productivity. The compiler’s leverage came from what it automated, not how fast it ran.

The 21–50% range is the consistent finding across multiple controlled studies — not a measurement artifact. Treating it as undercounted overstates current AI capability and underestimates the work that essential complexity still demands.

Compilers eliminated whole categories of repetitive translation work that previously consumed half a developer’s day. GenAI’s reduction is real but smaller in scope. The asymmetry is well-documented, not marketing.

Correct Answer:

Difficulty: Intermediate

A developer says “Copilot wrote the whole feature in 5 minutes — I’m so much more productive!” Two days later they’re still debugging it and have shipped a security vulnerability. Which trap have they fallen into?

Cognitive offloading is a separate trap — it concerns skill formation, not the productivity illusion specifically. The pattern described is about misattributing speed to productivity, then paying the debt downstream.

Hallucination is one cause of bugs in AI output, but the framing of the question is about how ‘fast’ the generation felt vs how slow the end-to-end work was. The illusion is a measurement error, not a single defect type.

Premature optimization is unrelated — the issue isn’t over-engineering, it’s that the generated code is subtly broken and the bug-tail is long.

Correct Answer:

Difficulty: Intermediate

Two computer-science students use a chatbot to learn linked lists. Student A pastes the assignment prompt and copies the answer. Student B asks the chatbot to explain why a tail pointer matters, then implements it themselves. Six months later, which is most likely to struggle on the data-structures exam, and why?

Time-on-task with active engagement is what builds long-term memory. Student B’s extra time was productive struggle, the strongest predictor of durable learning.

Equal performance would mean cognitive engagement has no effect on learning — which contradicts decades of cognitive-science research (effortful retrieval, generation effect, desirable difficulties).

Subscription tier is irrelevant. The difference is how the AI was used, not which version answered.

Correct Answer:

Difficulty: Intermediate

Which of these are valid items in the Supervisor Mentality for working with GenAI? Select all that apply.

AI output looks polished even when wrong. Every block needs the same scrutiny a junior teammate’s code would receive: assume a similar defect rate, delivered with more confident phrasing.

The explainability rule prevents the team from accumulating code nobody understands. When the bug appears at 3 AM, you’ll need to debug it — being able to explain it is a precondition for being able to fix it.

Roughly 40% of Copilot suggestions in security-sensitive scenarios have been found to contain vulnerabilities, and AI fluently produces plausible-but-wrong patterns it pattern-matched from training data. Defaulting to “subtly broken until proven otherwise” changes review quality immediately.

Reading more code does not produce better judgment. AI lacks domain context, system-specific constraints, and accountability — all of which experienced human teammates bring. Trusting it more is the inversion of the right calibration.

Capable but unreliable is the right mental model: useful for first drafts, dangerous when given final authority. The same trust calibration you’d extend to a smart intern: review, verify, don’t auto-merge.

Correct Answers:

Difficulty: Intermediate

Your team adopts Test-Driven Generation. Walk through the correct sequence.

Reversing the order destroys the entire benefit: tests written for the existing implementation just rubber-stamp it instead of constraining it. This is the textbook TDD anti-pattern, AI version.

Writing tests designed to ‘defeat’ the code is adversarial security testing, not TDG. The point of TDG is to use generated tests as a specification the implementation must satisfy.

Single-shot prompts give the AI no feedback loop to correct itself, and the developer no opportunity to verify the tests before committing to them as the spec. Throughput is fast, defect rate is high.

Correct Answer:

Difficulty: Advanced

Two teams adopt the same AI coding assistant. Team A’s codebase is a tightly coupled monolith (“spaghetti”); Team B’s is a set of well-bounded microservices with clean interfaces. Both apply AI to similar tasks. Why does Team B see substantially larger productivity gains?

Same assistant, similar tasks — the structural difference between codebases is the variable, not prompt skill. Even strong prompt engineering on a spaghetti codebase will run into context-window limits and hidden coupling.

Microservices can be written in any language; many are in the same languages as monoliths. The benefit comes from modularity, not language choice.

Attributing the difference to staff skill ignores the architectural variable explicitly described. The same engineers in either codebase would see the same architecture-mediated effect.

Correct Answer:

Difficulty: Basic

An LLM confidently produces this line in a Python script: import datafetcher_v2 as dfv2. The library does not exist. What is this called, and why does it happen?

Python has no compile step — a fabricated module fails with ModuleNotFoundError at run time. Linters can flag unresolved imports, but the root cause is the model inventing a library, not a translation error a compiler would catch.

Some hallucinations are references to deleted libraries, but most are fabricated names that never existed. The mechanism is the same — token prediction without verification — but framing it as ‘old version’ understates the breadth of the problem.

The model has no network connection during inference. Hallucination is a property of the model’s generation process, not of any external lookup.
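One cheap guard against fabricated imports is to check that each module resolves in the current environment before running generated code. The sketch below uses the standard library's importlib.util.find_spec; the helper name missing_modules is an illustration, not part of any tool, and it only handles top-level module names.

```python
import importlib.util


def missing_modules(names):
    """Return the top-level module names that cannot be found in this environment."""
    # find_spec returns None when no importable module with that name exists.
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    # "json" ships with Python; "datafetcher_v2" is the fabricated name from the question.
    print(missing_modules(["json", "datafetcher_v2"]))
```

A resolvable name is not proof that the import is the right one. The check only catches names that do not exist at all, so the human still has to confirm that the package is the one intended.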

Correct Answer:

Difficulty: Basic

AI pair programming distinguishes a Driver mode and a Navigator mode for the human. Which role assignment is correct?

Letting the AI fully drive while a human reviews afterward is the vibe-coding anti-pattern the SEBook explicitly warns against. In both modes, the human’s job is to retain understanding and accountability for every line shipped.

AI handling all decisions removes engineering judgment from the loop and abandons the explainability rule. Pair programming with AI is collaborative, not delegated.

The roles are deliberate and well-defined — they describe different distributions of writing vs reviewing work between human and AI, each appropriate in different situations.

Correct Answer:

Difficulty: Advanced

Industry analysis has reported that codebases using AI coding assistants had a noticeable rise in code complexity and static-analysis warnings relative to pre-AI baselines. Assume the finding generalizes. What is the architectural risk?

Proportional growth would not produce per-file or per-function complexity rises — the metrics cited normalize for size. The rise is in complexity-per-unit-code, not just total lines.

Mainstream static analyzers handle the same languages and constructs whether code is human- or AI-written. The “new paradigms” framing tries to attribute the gap to tool blind spots; the gap is in the code, not the analyzer.

Tests are typically excluded or analyzed separately. Even if they were included, a complexity-per-function metric does not count test code as warnings; the increase is in production code structure.

Correct Answer:

Difficulty: Basic

A senior architect predicts: “The future belongs to engineers who can orchestrate AI agents, not just write code.” What underlying skills does that prediction imply will become more valuable, and which less?

Typing speed and syntax memorization are exactly the work AI is best at automating. Predicting they will become more valuable inverts the trend.

Equal valuation would mean the skill mix is unchanged, which contradicts every workflow analysis from the past three years. The shift is real and one-directional toward specification, judgment, and verification.

Studies show AI is best as a force multiplier, weakest at autonomous end-to-end engineering. Domain knowledge, real systems thinking, accountability, and the ability to translate ambiguity into structure remain irreplaceable.

Correct Answer:

Difficulty: Advanced

An AI coding agent reads a blog post while debugging your build and then asks permission to run a shell command you do not recognize. What is the most responsible response?

Finding a command on the web is not evidence that it is safe. A malicious page can plant instructions for agents to copy, so the human must inspect the command and source before approving it.

The lesson is not “never use agents.” The lesson is that tool access raises the supervision bar: inspect commands, bound permissions, and keep the human accountable.

Model confidence is not a security control. The right check is whether the human understands the command’s effects and whether the command is necessary for the task.

Correct Answer:

Difficulty: Intermediate

Why do project-level skill files or rule files improve AI coding-agent results?

Skill files improve context, but they do not make an unsound, non-deterministic model sound or deterministic.

Rule files reduce omissions; they do not prove the output is correct. The human still reviews, tests, and owns the resulting code.

Rules are useful only when combined with repository context. They tell the agent how to work here; they do not replace reading the relevant files.

Correct Answer:

Difficulty: Advanced

You want an agent to implement a stateful feature in an unfamiliar codebase. Which workflow best applies the lecture’s advice?

A running UI checks the happy path, not the design, state transitions, security, or maintainability. Large one-shot prompts also make it harder to locate where the agent made a bad assumption.

Planning helps, but it does not replace executable verification. Stateful code needs tests because the hard part is often the interaction among cases.

The agent can propose architecture, but the human must judge whether it fits the domain, existing system, and long-term maintenance constraints.

Correct Answer:

Difficulty: Intermediate

Why is “read the entire repository before coding” often a bad instruction for an AI agent?

Agents can read text files. The issue is not whether text can be read, but whether the right text stays salient inside the model’s limited context.

Speed is not the core problem. A slower prompt can still be worthwhile if it provides the relevant context; the failure is low-signal context, not context itself.

Reading files does not prevent editing. It can simply crowd the context window with details unrelated to the task.

Correct Answer:

Difficulty: Intermediate

Which tasks are especially well-suited for AI assistance once the human already understands the domain? Select all that apply.

Boilerplate is a strong AI use case when the human can review the pattern and spot deviations.

High-stakes architecture decisions require domain understanding, trade-off judgment, and accountability. AI can help list trade-offs, but it should not make the final decision unreviewed.

Explanation is one of the safest high-value uses: it supports conceptual inquiry while keeping the human responsible for applying the idea.

Prototypes are useful because they make requirements concrete. They still need engineering review before becoming production code.

This is cognitive offloading. It may finish the assignment, but it prevents the student from building the schema needed to review or debug similar code later.

Correct Answers:

Difficulty: Intermediate

A team adds a hero avatar customizer. A student suggests storing the entire customized SVG in localStorage; another suggests storing the selected parameters and regenerating the SVG. What is the best engineering lesson from this disagreement?

Shorter is only one possible criterion, and often not the important one. Design decisions need explicit quality attributes, not a vague preference.

Storing the SVG captures the current rendering but may make future migrations, validation, and privacy review harder. Exactness today is not the same as good design over time.

Parameters are often better for evolvability, but “always” overstates it. If regeneration is unstable or the renderer changes incompatibly, raw output might have a defensible role.
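The parameters-versus-output trade-off can be made concrete with a small sketch. The scenario in the question is browser-side (localStorage), so the Python below only stands in for the idea: a JSON string plays the role of the stored value, and AvatarParams, render_svg, save, and load are hypothetical names invented for illustration.

```python
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class AvatarParams:
    """Hypothetical avatar choices; schema_version leaves room for migrations."""
    hair_color: str
    cape: bool
    schema_version: int = 1


def render_svg(params: AvatarParams) -> str:
    """Regenerate the SVG from parameters (toy renderer, not the real one)."""
    cape = '<rect width="10" height="20" fill="red"/>' if params.cape else ""
    return (
        '<svg xmlns="http://www.w3.org/2000/svg">'
        f'<circle r="8" fill="{params.hair_color}"/>{cape}</svg>'
    )


def save(params: AvatarParams) -> str:
    """Serialize only the small, validatable parameter set."""
    return json.dumps(asdict(params))


def load(raw: str) -> AvatarParams:
    """Rebuild the parameters from the stored JSON string."""
    return AvatarParams(**json.loads(raw))
```

Storing parameters keeps the saved value small and easy to validate, at the cost of depending on a renderer that stays compatible. That dependency is exactly the quality attribute the team should name before choosing.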

Correct Answer:

Difficulty: Advanced

During test-driven generation, the AI writes an implementation that passes every visible example by hard-coding a dictionary from sample inputs to sample outputs. What should the human do?

Passing tests is useful only when the tests specify the behavior rather than merely list examples. A hard-coded lookup table passes examples while failing the real requirement.

The tests revealed a weakness in the specification; removing them loses that signal. Strengthen the tests and inspect the implementation.

Comments do not turn an overfit implementation into a correct one. The problem is behavioral generality, not readability of the wrong approach.
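A small sketch shows why visible examples alone under-specify behavior. The slug task, the sample data, and the function names below are made up for illustration: slugify_overfit passes every visible example through a lookup table, and only held-out cases expose that it does not generalize.

```python
# Visible examples the AI was shown, and held-out cases it never saw (both hypothetical).
VISIBLE = {"Hello World": "hello-world", "Code Review": "code-review"}
HELD_OUT = {"Top Down Reading": "top-down-reading", "  Extra   Spaces ": "extra-spaces"}


def slugify_overfit(title: str) -> str:
    """Overfit 'implementation': a lookup table keyed on the sample inputs."""
    return VISIBLE[title]


def slugify(title: str) -> str:
    """General implementation: lowercase, split on whitespace, join with hyphens."""
    return "-".join(title.lower().split())


def passes(fn, cases) -> bool:
    """Return True only if fn produces the expected output for every case."""
    for given, expected in cases.items():
        try:
            if fn(given) != expected:
                return False
        except Exception:
            # A crash on an unseen input counts as a failure, not a pass.
            return False
    return True
```

Both functions pass VISIBLE, but only slugify passes HELD_OUT. Adding held-out or property-style cases strengthens the specification; reading the implementation is still what reveals the lookup table.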

Correct Answer:

Difficulty: Basic

Which sequence correctly names the three main stages discussed for LLM development and use?

That sequence describes a traditional compiled-program toolchain, not the lifecycle of an LLM.

Requirements, design, and maintenance are software-engineering phases. They matter when supervising AI, but they are not the model-development stages.

Tokenization is part of how text is represented, and deployment may follow model development, but this sequence does not capture the training-and-use pipeline from the lecture.

Correct Answer:

Difficulty: Intermediate

A reasoning model shows a polished step-by-step explanation before generating code. Why should that trace still be treated cautiously?

Human-looking explanation is not evidence of human-like cognition. The model can generate plausible reasoning text while still missing the real invariant.

Reasoning mode does not turn a non-deterministic system into a deterministic compiler. The same prompt can still lead to different outputs.

Reasoning traces can help, but executable behavior still needs tests and human review.

Correct Answer:

Difficulty: Intermediate

You want an agent to add a title-only search box to the SEBook home page. Which prompt best applies the lecture’s prompt-engineering advice?

“Make it work well” gives the agent no acceptance criteria and no scope. The feature it ships may not be the one you wanted.

Dumping the whole repo into context buries the constraints that matter and lets the agent decide design questions you should own.

“Modern” and “polished” are taste words, not criteria. New libraries also expand scope; constrain the feature instead.

Correct Answer:

Difficulty: Advanced

An agent adds a “schedule study” feature that looks polished, but the generated quiz links use URLs that do not exist. What should a reviewer infer? Select all that apply.

Link validity is observable behavior. A test or manual check should catch it before the feature ships.

Plausible routes are exactly the kind of thing an LLM can invent when it has not been grounded in the repository’s real routing conventions.

Visual polish is not correctness. A polished broken link is still broken.

Acceptance criteria should describe the behavior that makes the feature valuable. If links are part of the value, their validity belongs in the criteria.

Broken links are user-facing defects. They can strand learners and fail the core purpose of the feature.
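Because link validity is observable behavior, it can be written as an executable acceptance check. The sketch below assumes the site can list its real routes; KNOWN_ROUTES and the URL shapes are invented for illustration and do not reflect the SEBook's actual routing.

```python
# Hypothetical set of routes that really exist on the site.
KNOWN_ROUTES = {"/quizzes/code-review", "/quizzes/refactoring"}


def broken_links(links, known_routes):
    """Return the generated links that do not match any known route."""
    return [link for link in links if link not in known_routes]


if __name__ == "__main__":
    generated = ["/quizzes/code-review", "/quiz/refactoring-basics"]
    # An empty result is the acceptance criterion; anything else blocks the feature.
    print(broken_links(generated, KNOWN_ROUTES))
```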

Correct Answers:

Difficulty: Advanced

A team wants AI to implement a feature for a public educational site that must meet WCAG 2.2 AA. Which decision best evaluates the risk?

Accessibility is a release constraint, not optional polish. Waiting for a user complaint shifts the cost to people the system is supposed to serve.

AI can help brainstorm checks and draft code, but the workflow must keep human verification and explicit standards in the loop.

Confidence is not evidence. Accessibility requires concrete checks such as semantic markup, keyboard operation, focus visibility, contrast, reflow, and status-message behavior.

Correct Answer:

Difficulty: Intermediate

You are starting a personal project to learn a library you have never used. Which AI-assisted workflow best creates durable skill rather than cognitive offloading?

Studying only after failure makes the AI do the schema-building work. The project may run while the learner’s understanding stays shallow.

Error-paste loops can fix symptoms without building the mental model needed to debug future problems.

The lecture argues against cognitive offloading, not against all AI use. Conceptual inquiry can strengthen learning when the learner remains active.

Correct Answer:

Difficulty: Intermediate

Which statement best distinguishes formal inspections from Modern Code Review?

Formal inspections were effective but slow, especially because scheduling multiple people into meetings consumed large amounts of development time.

The Reader role belongs to formal inspections, not typical asynchronous MCR.

The shift was process and tooling, not language.

Correct Answer:

Difficulty: Intermediate

Your manager says, “If only about 15% of review comments find functional defects, code review is mostly waste.” What is the strongest response?

CI catches some classes of failures, but it does not provide mentorship, design judgment, ownership diffusion, or maintainability critique.

Guessing produces low-signal comments. Review quality improves by focusing human attention on outcomes humans are suited to judge.

The defect-finding gap is specifically about modern review datasets, not only formal inspections.

Correct Answer:

Difficulty: Intermediate

A teammate submits a 1,200-line feature PR touching database migrations, backend rules, and UI. They say one large PR is easier because reviewers see the whole feature at once. What should you recommend?

More reviewers can create a bystander effect. It does not guarantee that anyone forms a complete mental model.

Large PRs may happen sometimes, but accepting them as normal makes deep review unlikely.

Faster skimming is the opposite of effective review. Speed without comprehension misses design and functional problems.

Correct Answer:

Difficulty: Advanced

Which strategies fit the Code Review Comprehension Model for a non-trivial PR? Select all that apply.

Tests can provide the specification layer the reviewer needs before reading implementation detail.

Core-based reading spends attention where the most important design decision lives.

Chunking keeps the mental model small enough to reason about.

A quick scroll is impression-based, not specification-driven review.

Easy-first can be useful when it intentionally reduces clutter before harder reasoning.

Correct Answers:

Difficulty: Intermediate

An author wants to make a complex function more reviewable before opening a PR. Which changes are aligned with the chapter? Select all that apply.

Contracts let reviewers check behavior against explicit assumptions instead of reconstructing intent from scratch.

Assertions make impossible states fail fast and expose local assumptions.

Guard clauses reduce nesting and let the normal path stay flat.

Named chunks compress working-memory load and let reviewers drill into one concept at a time.

One giant method removes navigation but overloads working memory. Reviewability depends on meaningful abstraction, not merely file locality.
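Guard clauses and named chunks can be seen together in a short sketch. The order dictionary, its keys, and the pricing numbers below are assumptions chosen for illustration, not taken from the chapter.

```python
FREE_SHIPPING_THRESHOLD_CENTS = 5000  # hypothetical business rule
BASE_RATE_CENTS = 499                 # hypothetical flat rate


def weight_surcharge_cents(items) -> int:
    """Named chunk: one concept the reviewer can check in isolation."""
    return sum(item["weight_kg"] for item in items) * 50


def shipping_cost_cents(order) -> int:
    """Return the shipping cost in cents for a hypothetical order dict."""
    # Guard clauses: dispose of the special cases first and return early.
    if not order["items"]:
        return 0
    if order["is_digital"]:
        return 0
    if order["subtotal_cents"] >= FREE_SHIPPING_THRESHOLD_CENTS:
        return 0
    # The normal path stays flat, with no nested if/else to hold in memory.
    return BASE_RATE_CENTS + weight_surcharge_cents(order["items"])
```

A reviewer can confirm each guard on its own line and then read the last statement as the ordinary case.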

Correct Answers:

Difficulty: Advanced

In apply_discount, a check rejects a user-entered discount of 150% and returns a validation error. Elsewhere, assert subtotal >= 0 documents an invariant after pricing. Which statement is most accurate?

User input can be invalid in normal operation. Assertions may be stripped in production and should not be the only handling for expected runtime conditions.

Assertions are useful for invariants that should never be false if the code is correct.

Comments do not fail fast or protect control flow. Executable checks carry stronger evidence.
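The distinction can be sketched in a few lines. Only the name apply_discount and the assert subtotal >= 0 invariant come from the question; the ValidationError class, the parameters, and the arithmetic are assumptions for illustration. The sketch also assumes the incoming subtotal is produced by internal pricing code rather than typed by a user.

```python
class ValidationError(ValueError):
    """Raised when user-supplied input is outside the accepted range."""


def apply_discount(subtotal: float, discount_percent: float) -> float:
    """Apply a percentage discount to a subtotal computed by internal pricing code."""
    # Expected runtime condition: users can type 150. Handle it explicitly,
    # because this check must survive in production.
    if not 0 <= discount_percent <= 100:
        raise ValidationError(f"discount must be between 0 and 100, got {discount_percent}")

    subtotal = subtotal * (100 - discount_percent) / 100

    # Invariant: if the code above is correct this can never be false.
    # Asserts are removed under `python -O`, so nothing may depend on them.
    assert subtotal >= 0, "pricing produced a negative subtotal"
    return subtotal
```

The explicit check is part of the function's behavior; the assertion is documentation that happens to execute during development and testing.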

Correct Answer:

Difficulty: Advanced

In Google’s review process, why might one change require both an owner approval and a readability approval?

Ownership is about codebase authority, not formatting. Readability is a trained quality norm, not a popularity signal.

Google’s data shows many changes are approved quickly despite the gates. The gates protect different quality dimensions.

Readability does not replace tests or ownership. It adds a human norm check for maintainable code.

Correct Answer:

Difficulty: Advanced

An AI agent opens a 2,000-line PR that passes unit tests. The reviewer feels pressure to approve because the code looks polished and CI is green. What is the safest review posture?

AI code can look authoritative while being subtly wrong. Green tests only prove the existing tests passed.

A blanket ban ignores useful AI assistance. The safer standard is stronger verification and smaller reviewable units.

More comments can help explain intent, but they do not prove behavior or security. Outcome evidence is needed.

Correct Answer:

Difficulty: Basic

A function works correctly today, but it is 120 lines long, mixes validation, database writes, email formatting, and logging, and is hard to test. Which statement is most accurate?

Passing tests show current behavior, not future modifiability. Smells often matter precisely because they predict later change risk.

Rewriting from scratch is rarely the first move. A smell asks for diagnosis and targeted refactoring.

Testability, responsibility boundaries, and future change cost are engineering concerns, not mere aesthetics.

Correct Answer:

Difficulty: Advanced

A User class changes when database schema changes, when display-name formatting changes, and when password-reset email copy changes. Which smell is most central?

Shotgun Surgery is one conceptual change scattered across many modules. Here, one class is changing for many unrelated reasons.

Data Clumps are repeated groups of values. The stem is about mixed responsibilities.

Feature Envy is behavior living near the wrong data. The stronger signal here is one class carrying multiple reasons to change.

Correct Answer:

Difficulty: Advanced

Adding a new tax rule requires tiny edits in Invoice, ReceiptPrinter, TaxReport, OrderSummary, and CustomerExport. Which smell does this suggest?

Large Class concentrates too much behavior in one place. The stem describes a behavior scattered across many places.

A Long Method might exist somewhere, but the defining symptom is one change requiring many scattered edits.

Duplication can contribute, but Shotgun Surgery does not require byte-for-byte repeated lines. It requires scattered change points.

Correct Answer:

Difficulty: Advanced

Multiple functions accept street, city, state, zip, and country in that order. Bugs often happen when two adjacent strings are swapped. What is the best smell diagnosis and refactoring response?

Feature Envy concerns behavior leaning on another object’s data. The stem describes related primitives traveling together.

Deleting parameters loses information. The goal is to name and group the related values.

A class merge would likely concentrate responsibilities rather than clarify the address concept.
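A parameter object is one common response to values that travel together. The sketch below is an illustration under assumptions: the Address class, its field names, and shipping_label are hypothetical, and zip is spelled zip_code to avoid shadowing Python's built-in.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Address:
    """Names the concept that five loose strings were only implying."""
    street: str
    city: str
    state: str
    zip_code: str
    country: str


def shipping_label(address: Address) -> str:
    """Take one named object instead of five positional strings."""
    return (
        f"{address.street}, {address.city}, "
        f"{address.state} {address.zip_code}, {address.country}"
    )


# Keyword arguments at the construction site make a swapped city/state visible.
home = Address(street="1 Main St", city="Springfield", state="IL",
               zip_code="62701", country="US")
```

Every function that used to accept the five strings can now accept an Address, so the ordering mistake can only occur in one place, and validation has an obvious home.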

Correct Answer:

Difficulty: Expert

A method in InvoicePrinter repeatedly calls invoice.getCustomer().getAddress().getZipCode() and invoice.getCustomer().getDiscountTier() to decide billing rules. Which concerns are plausible? Select all that apply.

The method’s interest is centered on another object’s data and policy, which is the core Feature Envy signal.

Deep getter chains expose internal navigation paths and couple the caller to object structure.

A delegating method can let the client ask a higher-level question without traversing the object graph.

Getters can still leak structure. Encapsulation is about protecting design decisions, not merely using accessor syntax.
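One way to relieve the envy is to move the decision next to the data and let the caller ask a higher-level question. The sketch below is written in Python rather than the getter style of the question, and the classes, the billing_percent method, and the rates are all invented for illustration.

```python
from dataclasses import dataclass


@dataclass
class Address:
    zip_code: str


@dataclass
class Customer:
    address: Address
    discount_tier: str

    def billing_percent(self) -> int:
        """Billing policy lives beside the data it reads (made-up rates)."""
        percent = 90 if self.discount_tier == "gold" else 100
        if self.address.zip_code.startswith("9"):
            percent += 5  # hypothetical regional surcharge
        return percent


@dataclass
class Invoice:
    customer: Customer
    amount: float

    def billing_percent(self) -> int:
        """Delegating method: callers no longer walk invoice -> customer -> address."""
        return self.customer.billing_percent()


class InvoicePrinter:
    def total_line(self, invoice: Invoice) -> str:
        # The printer formats; it does not decide billing rules.
        return f"Total: {invoice.amount * invoice.billing_percent() / 100:.2f}"
```

If the address structure changes, only Customer needs to know. The delegating method on Invoice adds a small amount of indirection, which is the usual price of hiding the navigation path.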

Correct Answers:

Difficulty: Advanced

A linter flags a tiny method as a smell because it has only one line. The method name is a domain phrase used throughout the team’s conversations, and it hides a volatile calculation behind a stable interface. What should the team do?

Mechanical smell rules miss context. A tiny method can still earn its keep if it names a concept and hides volatility.

Fewer methods can reduce navigation, but inlining can also expose change-prone detail everywhere.

Shorter names are not automatically clearer. Domain-rich names are often the beacon that justifies the abstraction.

Correct Answer:

Difficulty: Intermediate

Which change is a true refactoring?

Changing accepted inputs changes observable behavior. That may be a good bug fix or feature, but it is not refactoring.

Adding behavior is feature work. Refactoring may prepare for it, but the feature itself is not behavior-preserving.

Deleting a test changes the safety net and may hide behavior changes. It is not an internal structure improvement.

Correct Answer:

Difficulty: Advanced

Match the refactoring to the smell it most directly addresses. Which pairings are reasonable? Select all that apply.

Related values that travel together usually deserve a named object that captures the concept.

A class that changes for unrelated reasons likely contains responsibilities that should be split.

A repeated branch on type often means behavior wants to move behind subtype or strategy objects.

A class that does not justify its existence can be folded into a more useful owner.

A missing test is a safety-net gap, not a naming smell. Renaming may improve clarity, but it does not create behavioral evidence.

Correct Answers:

Difficulty: Intermediate

A team wants to refactor a tangled billing module. What is the safest sequence?

A big rewrite delays feedback until many possible mistakes are mixed together. It becomes hard to know which change broke behavior.

Some tests may be implementation-coupled, but disabling failures before understanding them removes the behavior-preservation signal.

Mechanical extraction without a responsibility model can create shallow classes and new coupling.

Correct Answer:

Difficulty: Advanced

During a feature crunch, a developer notices a misleading local variable name in the function they are already editing. They also want to reorganize the whole package. What is the best refactoring judgment?

Bundling a large reorganization into a feature makes review harder and increases the chance of accidental behavior changes.

Refusing all small cleanup allows broken windows to spread. The key is scope control, not a blanket ban.

Deadline pressure is exactly when large structural changes are riskiest. Keep the current change reviewable.

Correct Answer:

Difficulty: Advanced

An AI agent proposes to “refactor” a module by extracting helpers, changing error messages, and altering the order in which side effects occur. What should the human supervisor do?

AI can execute many transformations, but it can also quietly change behavior. Capability does not remove the need for verification.

Hiding behavior changes behind names makes review harder and violates the central contract of refactoring.

Formatting proves only surface consistency. It says nothing about observable behavior or side-effect order.

Correct Answer:

Difficulty: Advanced

A checkout module has a switch on paymentType repeated in five places: fees, validation, receipt text, fraud rules, and retry policy. Which refactoring direction best fits the smell?

Comments may help temporarily, but they do not remove the repeated change point. A new payment type would still require edits in five places.

Consolidating into a utility class may reduce search effort but can create a new god object and preserve the type-code smell.

Inlining worsens working-memory load and makes future payment-type changes harder.
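Moving the per-type behavior behind one object per payment type is a common direction for a repeated type switch. The sketch below covers only three of the five concerns (fees, receipt text, retry policy) to stay short; the class names, fee rules, and retry counts are assumptions, not values from the chapter.

```python
from abc import ABC, abstractmethod


class PaymentMethod(ABC):
    """One object per payment type replaces the repeated switch."""
    max_retries = 0

    @abstractmethod
    def fee_cents(self, amount_cents: int) -> int: ...

    @abstractmethod
    def receipt_text(self) -> str: ...


class CardPayment(PaymentMethod):
    max_retries = 3  # hypothetical retry policy

    def fee_cents(self, amount_cents: int) -> int:
        return amount_cents * 3 // 100  # hypothetical 3% fee

    def receipt_text(self) -> str:
        return "Paid by card"


class BankTransfer(PaymentMethod):
    max_retries = 1

    def fee_cents(self, amount_cents: int) -> int:
        return 0

    def receipt_text(self) -> str:
        return "Paid by bank transfer"


# The only place that still maps a type code to behavior.
PAYMENT_METHODS = {"card": CardPayment(), "bank_transfer": BankTransfer()}


def checkout_summary(payment_type: str, amount_cents: int) -> str:
    method = PAYMENT_METHODS[payment_type]
    return f"{method.receipt_text()} (fee: {method.fee_cents(amount_cents)} cents)"
```

Adding a payment type now means writing one class and one registry entry instead of editing five switch statements.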

Correct Answer:

Code Beacons

Code Beacons explains how experienced developers use familiar identifiers, structures, tests, and architectural cues as cognitive anchors while reading unfamiliar code.

Code Comprehension

Code Comprehension teaches how developers form mental models of a system, why top-down reading matters, and how architecture-code gaps make comprehension harder. The Part 1 and Part 2 tutorials turn those ideas into guided practice.

Debugging

Debugging covers reproducing a fault, localizing the root cause, using debuggers and assertions, verifying the fix, and preserving the regression test. The Python Debugging Tutorial gives hands-on practice with breakpoints and time-travel debugging.

Defensive Programming and Design by Contract

Defensive Programming in Python teaches boundary validation, precise exceptions, invariant preservation, and failure reporting. Design by Contract in Python follows it with caller/callee responsibility, preconditions, postconditions, old-state reasoning, invariants, and contract strength.

Generative AI

Generative AI in Software Engineering explains how AI coding tools change productivity, verification, skill formation, supervision, and team workflows without replacing engineering judgment.

Modern Code Review

Modern Code Review teaches review as a socio-technical practice: small reviewable changes, reviewer cognition, asynchronous workflows, defect finding, knowledge transfer, and AI-era risks.

Prompt Engineering

Prompt Engineering covers how to communicate tasks, constraints, examples, and verification expectations to AI assistants so their output is useful and reviewable.

Code Smells

Code Smells teaches the symptoms of poor design, including long methods, large classes, duplicated code, feature envy, and deeply nested conditionals.

Refactoring

Refactoring explains behavior-preserving transformations, safe refactoring rhythm, and the relationship between smells, tests, and design improvement. The Code Smells and Refactoring Tutorial provides tool-supported practice.

Top-Down Code Comprehension

Top-Down Code Comprehension focuses on hypothesis-driven reading: start from purpose and architecture, then use targeted navigation to confirm or revise the mental model.
