The Role of Generative AI in Modern Software Engineering

Enable JavaScript to unlock Galleries, BibTeXs, and the Contact Form.

Dark Mode

Show Highlights

Read Aloud

The integration of Generative AI (GenAI) into software development represents one of the most significant shifts in the industry since the 1960s. During that era, the invention of compilers allowed developers to move from low-level assembly to high-level languages, resulting in a 10x productivity gain because a single statement could translate into approximately ten machine instructions. Current research suggests that while GenAI is disruptive, its current productivity boost is more modest, estimated between 21% and 50%. This discrepancy exists because compilers automated accidental complexity—the repetitive mechanics of coding—whereas modern developers must still grapple with essential complexity, which involves the core logic and design decisions inherent to a problem.

The compiler comparison is useful because it highlights a deeper difference: compilers are sound abstractions. Given the same source program and compiler settings, a developer can predict the compilation result. AI coding agents are usually unsound abstractions: they are non-deterministic, black-box systems that may produce different answers to the same prompt and can confidently generate code that is plausible but wrong. That means the human engineer cannot stop being responsible for requirements, design, review, testing, security, accessibility, and maintainability.

By the end of this chapter, you should be able to:

Explain how an AI coding agent builds on an LLM.
Identify why AI-generated code creates security, correctness, maintainability, and learning risks.
Apply software-engineering techniques such as small user stories, code review, test-driven development, refactoring, and architecture boundaries to control those risks.
Use prompt and context-engineering techniques to get more useful output without surrendering understanding.

How LLMs Work: The “Statistical Parrot”

Large Language Models (LLMs) do not “understand” code in a human sense; instead, they function as statistical parrots. Their development involves three primary stages:

Pre-Training: Creating a base foundation model by training on vast amounts of publicly accessible code to predict the most likely next token.
Post-Training: Optimizing the model for specific use cases through fine-tuning on labeled data (like LeetCode problems) and Reinforcement Learning from Human Feedback (RLHF), where developers rank outputs based on readability and correctness.
Inference: The process of prompting the model to produce a sequence of answer tokens, which is typically non-deterministic.

Because these models rely on linguistic similarities rather than formal logic, they are prone to repeating outdated patterns, quoting factually incorrect statements, or “hallucinating” calls to non-existent methods.

Reasoning or “thinking” models reduce some failures by spending extra inference compute on intermediate steps that resemble a human working through a problem. This can be useful, but it does not make the system a human reasoner. It is still generating likely token sequences, just with more scaffolding between the prompt and the final answer. The output may look like a chain of careful thought while still resting on pattern matching rather than grounded knowledge of your code base or the real world.

What Coding Agents Add

An AI coding agent wraps an LLM in a software-development environment. Instead of only chatting about code, the agent can inspect files, search the repository, edit files, run tests, read compiler errors, inspect Git history, and sometimes browse documentation. This is the jump from “chatbot that suggests code” to “assistant that can participate in a workflow.”

That extra power cuts both ways. An agent that can run npm test can also propose a destructive command such as rm -rf if the prompt or retrieved context leads it there. Modern agents are also exposed to prompt injection attacks: malicious instructions placed in web pages, issues, comments, or documents that the agent reads and then treats as if they were legitimate task instructions. A developer who does not understand shell commands, Git, package managers, or the project architecture cannot safely supervise the agent.

Persistent instruction files help. Tools such as Cursor rules, Claude skills, AGENTS.md, and similar project-level directives let a team encode “always do this here” knowledge: run the test suite after code changes, keep the storage inventory in sync when adding localStorage, preserve dark-mode contrast, or update the shortcut registry when adding a keyboard command. These files are not magic. They improve the default behavior of the agent by making important constraints visible, but the human still has to verify that the agent actually followed them.

Risks: the “Illusion of AI Productivity”

One of the most dangerous traps for developers is the illusion of AI productivity. AI often provides an immediate solution that looks solid, making the developer feel highly productive. However, if the solution is flawed, the time saved in generation is quickly lost in debugging; for example, a task that once took two hours to code and six hours to debug might now take five minutes to generate but 24 hours to debug.

Furthermore, widespread use of AI has introduced significant security risks. Studies indicate that 40% of code generated by tools like GitHub Copilot contains security vulnerabilities. Paradoxically, developers with access to AI assistants often write less secure code while simultaneously being more confident that their code is secure. Additionally, the use of AI can lead to a surge in technical debt; research into repositories using AI coding agents found a 41.6% increase in code complexity and a 30.3% rise in static analysis warnings.

The exact percentages vary by study design and model generation, but the pattern matters more than any single number: AI can increase both defect risk and confidence at the same time. One study discussed in lecture found serious AI-related security vulnerabilities in a substantial fraction of surveyed companies. Other controlled studies found that code generated with AI assistants can be less secure even when developers are explicitly asked to improve security. This is a calibration failure: the AI’s fluency makes the code feel safer than it is.

The same pattern appears outside security. Accessibility, privacy, compliance, and maintainability are not optional polish in professional systems. Regulators, users, and production incidents do not care that the feature looked good in a demo. If the prompt never mentions WCAG compliance, consent, auditability, or domain-specific invariants, the agent may simply optimize for the visible happy path.

Skill Formation

For junior engineers, relying too heavily on GenAI can hinder skill formation. Using AI for “cognitive offloading”—simply copying and pasting answers—minimizes learning and leaves the developer unable to debug or explain the logic later. A more effective approach is conceptual inquiry, where the developer treats the AI as a “Digital Teaching Assistant”, asking it to explain library functions or argue the pros and cons of different implementations. This method ensures the developer utilizes their continual learning ability, which remains a key differentiator between humans and AI.

The practical rule is simple: you can outsource some thinking, but you cannot outsource your understanding. If you use AI to avoid the struggle of learning a data structure, API, design pattern, or debugging strategy, you may finish the immediate task while becoming less capable afterward. If you use AI to ask better questions, compare alternatives, critique your attempt, or explain an unfamiliar algorithm after you have tried it, you can raise your ceiling instead.

For students, that distinction is especially important. A professional engineer may sometimes optimize for delivery speed because the main goal is to ship. A student is usually optimizing for durable skill. That changes the recommended workflow:

Write your own first attempt before asking the AI for code.
Ask the AI to critique, explain, and propose edge cases rather than to replace your work.
When the AI writes code, read it until you can explain it line by line.
If you cannot review the code quickly, shrink the task until you can.

Best Practices: The Supervisor Mentality

Professional software engineering requires moving from “vibe coding”—forgetting the code exists and relying on “vibes”—to a Supervisor Mentality. Developers must treat GenAI like a knowledgeable but unreliable intern. Key rules for this mentality include:

Always Review AI-Generated Code: Every block must be scrutinized as if it were written by an unreliable teammate.
The Explainability Rule: Never commit AI-generated code that you cannot comfortably explain to a colleague.
Assume Subtle Incorrectness: Work from the premise that the AI’s output is subtly buggy or insecure.

This mentality is not anti-AI. It is how experts get leverage from AI. The agent can draft, search, explain, and transform code quickly. The engineer supplies the problem framing, quality bar, domain knowledge, and accountability. If the only value a developer adds is typing “build this,” the developer is replaceable by anyone else who can type the same sentence. The durable value is in specifying the right thing, decomposing it, judging the output, and improving the system afterward.

Advanced Orchestration Techniques

To maximize AI’s usefulness, developers should adopt AI Pair Programming roles. As the Driver, the human writes the code and asks the AI to critique it for performance or security issues. As the Navigator, the human directs the AI to write specific blocks while ensuring they understand every line produced.

Another powerful technique is Test-Driven Generation:

Prompt the AI to generate tests based on a problem description.
Carefully review those tests to ensure they serve as an adequate specification.
Prompt the AI to generate the implementation that passes those tests.
Use a remediation loop by providing the AI with stack traces of any failed tests to increase correctness.

Test-driven generation works because tests give the agent a concrete target and give the human a reviewable contract. The hard part is step 2. If the tests are wrong, incomplete, overfit to examples, or merely duplicate the prompt, the implementation can pass while still failing the real requirement. Watch especially for generated solutions that hard-code the sample inputs and outputs instead of solving the underlying problem.

For larger changes, start with a plan before code:

Ask the agent to inspect only the relevant files and propose a small implementation plan.
Review the plan for architecture, state, edge cases, security, accessibility, and test strategy.
Approve one small task at a time.
Run tests and review the diff after each task.
Refactor deliberately instead of accepting additive code forever.

Good prompt engineering supports this workflow. The most useful prompts are not magic incantations; they expose the context and constraints that a human teammate would need:

Role and quality bar: “Act as a senior software engineer who values maintainability, security, and accessibility.”
Concrete task: “Implement this acceptance criterion in this file; do not change unrelated behavior.”
Relevant context: “This feature belongs to this user story; privacy matters more than performance.”
Explicit steps: “First propose a plan, then wait. After approval, implement, test, and summarize the diff.”
Question prompt: “Before coding, ask me any questions needed to avoid making design assumptions.”
Design-decision prompt: “List the trade-offs between storing the generated SVG and storing the avatar parameters.”
TODO pattern: Put precise TODO comments in the code and ask the agent to fill only those gaps.

Because every model has a finite context window, more context is not always better. Dumping the whole repository into a prompt can bury the important details and trigger “lost in the middle” attention failures. Provide the smallest set of files, constraints, and examples needed for the task. Good architecture helps here too: a well-bounded module is easier for both humans and AI to reason about.

Architecture as an AI Multiplier

Software architecture significantly impacts AI effectiveness. AI’s benefits are amplified in systems with loosely coupled architectures, such as well-defined microservices. Conversely, in tightly coupled “spaghetti code” systems, AI may provide no benefit or even magnify existing dysfunction. By applying Information Hiding and modularity, developers limit the “context window” the AI needs to process, reducing context degradation and leading to more accurate code generation.

What to Delegate, What to Keep

AI shines on tasks that are repetitive, well-specified, and common in the training distribution:

Scaffolding boilerplate that you already know how to write.
Generating first drafts of tests, documentation, examples, and simple refactorings.
Explaining unfamiliar syntax, APIs, compiler errors, or stack traces.
Creating rapid prototypes so users can react to something concrete.
Enumerating edge cases, trade-offs, and review checklists.

AI is much riskier on tasks with complex state, unclear requirements, high stakes, or novel domain constraints:

Security-critical, safety-critical, legal, financial, medical, or accessibility-sensitive code.
Stateful workflows where small rule misunderstandings cascade across the system.
Architecture decisions that require understanding the business, users, and long-term maintenance costs.
Problems you do not yet understand well enough to review.

The boundary changes with your expertise. If you already know how to implement binary search, asking the AI to draft it may save time. If you do not know how an AVL tree works, using AI to skip the learning step makes you a weaker navigator later.

Conclusion: The Future of the Engineer

The future of software engineering belongs to those who can orchestrate AI agents rather than those who simply write code. Essential skills will shift toward requirements engineering, systems thinking, and architecture design—areas where AI currently stumbles because they require domain knowledge and real systems thinking. As the former CEO of GitHub noted, developers who embrace AI are raising the ceiling of what is possible, not just lowering the cost of production. Citing the INVEST criteria for user stories and formal logic for verification will become increasingly vital to “translate ambiguity into structure”, a skill that AI cannot yet automate.

The most important career lesson is not “AI makes homework easier.” It is “AI amplifies the skills you already have.” Strong engineers use AI to attempt more ambitious work, get faster feedback, and expose gaps in their own reasoning. Weak workflows use AI to create an illusion of competence while silently accumulating bugs, security debt, and shallow understanding. The difference is not the model alone; it is the engineering process wrapped around the model.

Practice This

Use the flashcards to retrieve the core concepts without looking, then use the quiz to apply them to realistic engineering decisions. If a quiz explanation surprises you, return to the section above and ask: “What would I do differently the next time an AI agent offers me code?”

Generative AI in Software Engineering Flashcards

Core concepts, productivity trade-offs, skill-formation risks, coding-agent safety, and best practices for using Generative AI in software engineering.

Difficulty: Basic

What does it mean to call an LLM a statistical parrot?

Difficulty: Intermediate

Why is GenAI’s productivity boost (21–50%) smaller than the compiler revolution (10x)?

Difficulty: Basic

Name the three stages of LLM development.

Difficulty: Intermediate

What is the illusion of AI productivity, and how do you avoid being fooled by it?

Difficulty: Intermediate

Why do AI-generated codebases tend to have higher security vulnerability rates?

Difficulty: Basic

What is cognitive offloading, and why is it harmful for junior engineers?

Difficulty: Basic

What is the Supervisor Mentality for working with GenAI?

Difficulty: Intermediate

Compare the Driver and Navigator roles in AI pair programming.

Difficulty: Intermediate

What is Test-Driven Generation (TDG), and what are its five steps?

Difficulty: Advanced

Why does loose coupling amplify AI effectiveness, and tight coupling sabotage it?

Difficulty: Intermediate

Why is AI inference typically non-deterministic, and what does that mean for testing?

Difficulty: Basic

What is an AI hallucination in coding, and why is it especially dangerous?

Difficulty: Advanced

Why do AI-augmented codebases tend to show rising code complexity and static-analysis warnings?

Difficulty: Intermediate

Why does the leverage of an engineer’s work shift from producing code to specifying and verifying it in the GenAI era?

Difficulty: Advanced

Why is prompt and context engineering considered a load-bearing engineering skill rather than a UI trick?

Difficulty: Basic

What is vibe coding, and what is the professional alternative?

Difficulty: Basic

What does an AI coding agent add on top of a plain chatbot?

Difficulty: Advanced

What is a prompt injection risk for coding agents?

Difficulty: Intermediate

Why are skill files or project rule files useful for AI-assisted development?

Difficulty: Intermediate

Why should large AI coding tasks start with a planning step before any code is generated?

Difficulty: Intermediate

Why is dumping the entire repository into an AI context often worse than selecting relevant files?

Difficulty: Intermediate

What is a design-decision prompt, and why is it useful?

Difficulty: Intermediate

Which tasks are good candidates for AI assistance once you already understand the domain?

Difficulty: Intermediate

Which tasks should you be cautious about delegating to AI?

Difficulty: Advanced

What is the overfitting failure mode in Test-Driven Generation?

Generative AI in Software Engineering Quiz

Apply GenAI judgment across Bloom levels, with extra emphasis on analyzing, evaluating, and creating safe AI-assisted engineering workflows.

Difficulty: Intermediate

Compilers (1960s) delivered a 10x productivity gain. Current research estimates GenAI delivers 21%–50%. What is the most accurate explanation for the gap?

Compilers were vastly slower than LLMs (compilation took hours on 1960s hardware). Execution speed of the tool is not what produces engineering productivity. The compiler’s leverage came from what it automated, not how fast it ran.

The 21–50% range is the consistent finding across multiple controlled studies — not a measurement artifact. Treating it as undercounted overstates current AI capability and underestimates the work that essential complexity still demands.

Compilers eliminated whole categories of repetitive translation work that previously consumed half a developer’s day. GenAI’s reduction is real but smaller in scope. The asymmetry is well-documented, not marketing.

Correct Answer:

Difficulty: Intermediate

A developer says “Copilot wrote the whole feature in 5 minutes — I’m so much more productive!” Two days later they’re still debugging it and have shipped a security vulnerability. Which trap have they fallen into?

Cognitive offloading is a separate trap — it concerns skill formation, not the productivity illusion specifically. The pattern described is about misattributing speed to productivity, then paying the debt downstream.

Hallucination is one cause of bugs in AI output, but the framing of the question is about how ‘fast’ the generation felt vs how slow the end-to-end work was. The illusion is a measurement error, not a single defect type.

Premature optimization is unrelated — the issue isn’t over-engineering, it’s that the generated code is subtly broken and the bug-tail is long.

Correct Answer:

Difficulty: Intermediate

Two computer-science students use a chatbot to learn linked lists. Student A pastes the assignment prompt and copies the answer. Student B asks the chatbot to explain why a tail pointer matters, then implements it themselves. Six months later, which is most likely to struggle on the data-structures exam, and why?

Time-on-task with active engagement is what builds long-term memory. Student B’s extra time was productive struggle, the strongest predictor of durable learning.

Equal performance would mean cognitive engagement has no effect on learning — which contradicts decades of cognitive-science research (effortful retrieval, generation effect, desirable difficulties).

Subscription tier is irrelevant. The difference is how the AI was used, not which version answered.

Correct Answer:

Difficulty: Intermediate

Which of these are valid items in the Supervisor Mentality for working with GenAI? Select all that apply.

AI output looks polished even when wrong. Every block needs review at the same scrutiny a junior teammate’s code would receive — same defect rate, more confident phrasing.

The explainability rule prevents the team from accumulating code nobody understands. When the bug appears at 3 AM, you’ll need to debug it — being able to explain it is a precondition for being able to fix it.

Roughly 40% of Copilot suggestions in security-sensitive scenarios have been found to contain vulnerabilities, and AI fluently produces plausible-but-wrong patterns it pattern-matched from training data. Defaulting to “subtly broken until proven otherwise” changes review quality immediately.

Reading more code does not produce better judgment. AI lacks domain context, system-specific constraints, and accountability — all of which experienced human teammates bring. Trusting it more is the inversion of the right calibration.

Capable but unreliable is the right mental model: useful for first drafts, dangerous when given final authority. The same trust calibration you’d extend to a smart intern: review, verify, don’t auto-merge.

Correct Answers:

Difficulty: Intermediate

Your team adopts Test-Driven Generation. Walk through the correct sequence.

Reversing the order destroys the entire benefit: tests written for the existing implementation just rubber-stamp it instead of constraining it. This is the textbook TDD anti-pattern, AI version.

Tests that ‘defeat’ code is adversarial security testing, not TDG. The point of TDG is to use generated tests as a specification the implementation must satisfy.

Single-shot prompts give the AI no feedback loop to correct itself, and the developer no opportunity to verify the tests before committing to them as the spec. Throughput is fast, defect rate is high.

Correct Answer:

Difficulty: Advanced

Two teams adopt the same AI coding assistant. Team A’s codebase is a tightly coupled monolith (“spaghetti”); Team B’s is a set of well-bounded microservices with clean interfaces. Both apply AI to similar tasks. Why does Team B see substantially larger productivity gains?

Same assistant, similar tasks — the structural difference between codebases is the variable, not prompt skill. Even strong prompt engineering on a spaghetti codebase will run into context-window limits and hidden coupling.

Microservices can be written in any language; many are in the same languages as monoliths. The benefit comes from modularity, not language choice.

Attributing the difference to staff skill ignores the architectural variable explicitly described. The same engineers in either codebase would see the same architecture-mediated effect.

Correct Answer:

Difficulty: Basic

An LLM confidently produces this line in a Python script: import datafetcher_v2 as dfv2. The library does not exist. What is this called, and why does it happen?

Python has no compile step — a fabricated module fails with ModuleNotFoundError at run time. Linters can flag unresolved imports, but the root cause is the model inventing a library, not a translation error a compiler would catch.

Some hallucinations are references to deleted libraries, but most are fabricated names that never existed. The mechanism is the same — token prediction without verification — but framing it as ‘old version’ understates the breadth of the problem.

The model has no network connection during inference. Hallucination is a property of the model’s generation process, not of any external lookup.

Correct Answer:

Difficulty: Basic

AI pair programming distinguishes a Driver mode and a Navigator mode for the human. Which role assignment is correct?

Letting the AI fully drive while a human reviews after is the vibe-coding anti-pattern the SEBook explicitly warns against. The human’s role in both roles is to retain understanding and accountability for every line shipped.

AI handling all decisions removes engineering judgment from the loop and abandons the explainability rule. Pair programming with AI is collaborative, not delegated.

The roles are deliberate and well-defined — they describe different distributions of writing vs reviewing work between human and AI, each appropriate in different situations.

Correct Answer:

Difficulty: Advanced

Industry analysis has reported that codebases using AI coding assistants had a noticeable rise in code complexity and static-analysis warnings relative to pre-AI baselines. Assume the finding generalizes. What is the architectural risk?

Proportional growth would not produce per-file or per-function complexity rises — the metrics cited normalize for size. The rise is in complexity-per-unit-code, not just total lines.

Mainstream static analyzers handle the same languages and constructs whether code is human- or AI-written. The “new paradigms” framing tries to attribute the gap to tool blind spots; the gap is in the code, not the analyzer.

Tests are typically excluded or analyzed separately. Even if included, the complexity-per-function metric doesn’t credit tests as warnings; the increase is in production code structure.

Correct Answer:

Difficulty: Basic

A senior architect predicts: “The future belongs to engineers who can orchestrate AI agents, not just write code.” What underlying skills does that prediction imply will become more valuable, and which less?

Typing speed and syntax memorization are exactly the work AI is best at automating. Predicting they will become more valuable inverts the trend.

Equal valuation would mean the skill mix is unchanged, which contradicts every workflow analysis from the past three years. The shift is real and one-directional toward specification, judgment, and verification.

Studies show AI is best as a force multiplier, weakest at autonomous end-to-end engineering. Domain knowledge, real systems thinking, accountability, and the ability to translate ambiguity into structure remain irreplaceable.

Correct Answer:

Difficulty: Advanced

An AI coding agent reads a blog post while debugging your build and then asks permission to run a shell command you do not recognize. What is the most responsible response?

Finding a command on the web is not evidence that it is safe. A malicious page can plant instructions for agents to copy, so the human must inspect the command and source before approving it.

The lesson is not “never use agents.” The lesson is that tool access raises the supervision bar: inspect commands, bound permissions, and keep the human accountable.

Model confidence is not a security control. The right check is whether the human understands the command’s effects and whether the command is necessary for the task.

Correct Answer:

Difficulty: Intermediate

Why do project-level skill files or rule files improve AI coding-agent results?

Skill files improve context, but they do not make an unsound, non-deterministic model sound or deterministic.

Rule files reduce omissions; they do not prove the output is correct. The human still reviews, tests, and owns the resulting code.

Rules are useful only when combined with repository context. They tell the agent how to work here; they do not replace reading the relevant files.

Correct Answer:

Difficulty: Advanced

You want an agent to implement a stateful feature in an unfamiliar codebase. Which workflow best applies the lecture’s advice?

A running UI checks the happy path, not the design, state transitions, security, or maintainability. Large one-shot prompts also make it harder to locate where the agent made a bad assumption.

Planning helps, but it does not replace executable verification. Stateful code needs tests because the hard part is often the interaction among cases.

The agent can propose architecture, but the human must judge whether it fits the domain, existing system, and long-term maintenance constraints.

Correct Answer:

Difficulty: Intermediate

Why is “read the entire repository before coding” often a bad instruction for an AI agent?

Agents can read text files. The issue is not whether text can be read, but whether the right text stays salient inside the model’s limited context.

Speed is not the core problem. A slower prompt can still be worthwhile if it provides the relevant context; the failure is low-signal context, not context itself.

Reading files does not prevent editing. It can simply crowd the context window with details unrelated to the task.

Correct Answer:

Difficulty: Intermediate

Which tasks are especially well-suited for AI assistance once the human already understands the domain? Select all that apply.

Boilerplate is a strong AI use case when the human can review the pattern and spot deviations.

High-stakes architecture decisions require domain understanding, trade-off judgment, and accountability. AI can help list trade-offs, but it should not make the final decision unreviewed.

Explanation is one of the safest high-value uses: it supports conceptual inquiry while keeping the human responsible for applying the idea.

Prototypes are useful because they make requirements concrete. They still need engineering review before becoming production code.

This is cognitive offloading. It may finish the assignment, but it prevents the student from building the schema needed to review or debug similar code later.

Correct Answers:

Difficulty: Intermediate

A team adds a hero avatar customizer. A student suggests storing the entire customized SVG in localStorage; another suggests storing the selected parameters and regenerating the SVG. What is the best engineering lesson from this disagreement?

Shorter is only one possible criterion, and often not the important one. Design decisions need explicit quality attributes, not a vague preference.

Storing the SVG captures the current rendering but may make future migrations, validation, and privacy review harder. Exactness today is not the same as good design over time.

Parameters are often better for evolvability, but “always” overstates it. If regeneration is unstable or the renderer changes incompatibly, raw output might have a defensible role.

Correct Answer:

Difficulty: Advanced

During test-driven generation, the AI writes an implementation that passes every visible example by hard-coding a dictionary from sample inputs to sample outputs. What should the human do?

Passing tests is useful only when the tests specify the behavior rather than merely list examples. A hard-coded lookup table passes examples while failing the real requirement.

The tests revealed a weakness in the specification; removing them loses that signal. Strengthen the tests and inspect the implementation.

Comments do not turn an overfit implementation into a correct one. The problem is behavioral generality, not readability of the wrong approach.

Correct Answer:

Difficulty: Basic

Which sequence correctly names the three main stages discussed for LLM development and use?

That sequence describes a traditional compiled-program toolchain, not the lifecycle of an LLM.

Requirements, design, and maintenance are software-engineering phases. They matter when supervising AI, but they are not the model-development stages.

Tokenization is part of how text is represented, and deployment may follow model development, but this sequence does not capture the training-and-use pipeline from the lecture.

Correct Answer:

Difficulty: Intermediate

A reasoning model shows a polished step-by-step explanation before generating code. Why should that trace still be treated cautiously?

Human-looking explanation is not evidence of human-like cognition. The model can generate plausible reasoning text while still missing the real invariant.

Reasoning mode does not turn a non-deterministic system into a deterministic compiler. The same prompt can still lead to different outputs.

Reasoning traces can help, but executable behavior still needs tests and human review.

Correct Answer:

Difficulty: Intermediate

You want an agent to add a title-only search box to the SEBook home page. Which prompt best applies the lecture’s prompt-engineering advice?

“Make it work well” gives the agent no acceptance criteria and no scope. The feature it ships may not be the one you wanted.

Dumping the whole repo into context buries the constraints that matter and lets the agent decide design questions you should own.

“Modern” and “polished” are taste words, not criteria. New libraries also expand scope; constrain the feature instead.

Correct Answer:

Difficulty: Advanced

An agent adds a “schedule study” feature that looks polished, but the generated quiz links use URLs that do not exist. What should a reviewer infer? Select all that apply.

Link validity is observable behavior. A test or manual check should catch it before the feature ships.

Plausible routes are exactly the kind of thing an LLM can invent when it has not been grounded in the repository’s real routing conventions.

Visual polish is not correctness. A polished broken link is still broken.

Acceptance criteria should describe the behavior that makes the feature valuable. If links are part of the value, their validity belongs in the criteria.

Broken links are user-facing defects. They can strand learners and fail the core purpose of the feature.

Correct Answers:

Difficulty: Advanced

A team wants AI to implement a feature for a public educational site that must meet WCAG 2.2 AA. Which decision best evaluates the risk?

Accessibility is a release constraint, not optional polish. Waiting for a user complaint shifts the cost to people the system is supposed to serve.

AI can help brainstorm checks and draft code, but the workflow must keep human verification and explicit standards in the loop.

Confidence is not evidence. Accessibility requires concrete checks such as semantic markup, keyboard operation, focus visibility, contrast, reflow, and status-message behavior.

Correct Answer:

Difficulty: Intermediate

You are starting a personal project to learn a library you have never used. Which AI-assisted workflow best creates durable skill rather than cognitive offloading?

Studying only after failure makes the AI do the schema-building work. The project may run while the learner’s understanding stays shallow.

Error-paste loops can fix symptoms without building the mental model needed to debug future problems.

The lecture argues against cognitive offloading, not against all AI use. Conceptual inquiry can strengthen learning when the learner remains active.

Correct Answer:

The Role of Generative AI in Modern Software Engineering

How LLMs Work: The “Statistical Parrot”

What Coding Agents Add

Risks: the “Illusion of AI Productivity”

Skill Formation

Best Practices: The Supervisor Mentality

Advanced Orchestration Techniques

Architecture as an AI Multiplier

What to Delegate, What to Keep

Conclusion: The Future of the Engineer

Practice This

Generative AI in Software Engineering Flashcards

Workout Complete!

Generative AI in Software Engineering Quiz

Workout Complete!