Code Comprehension

This chapter explores program comprehension—the cognitive processes developers use to understand existing software. Because developers spend up to 70% of their time reading and comprehending code rather than writing it (Wyrich et al. 2023), optimizing for understandability is paramount. This chapter bridges cognitive psychology, neuro-software engineering, structural metrics, and architectural design to provide a holistic guide to writing brain-friendly software.

Cognitive Effects

Reading code is recognized as the most time-consuming activity in software maintenance, taking up approximately 58% to 70% of a developer’s time (Xia et al. 2018; Wyrich et al. 2023). How comprehensible a piece of code is counts as an “accidental property” (controlled by the engineer) rather than an “essential property” (dictated by the problem space) (Alawad et al. 2018; Brooks 1987). To understand how to optimize for comprehension, we must look at how the human brain processes software.

Working Memory and Cognitive Load

An average human can hold roughly four “chunks” of information in working memory at a time (Gobet and Clarkson 2004). Exceeding this threshold results in confusion, bugs, and mental fatigue (Wondrasek 2025). Cognitive Load Theory (CLT) divides this mental effort into three categories (Sweller 1988; Wondrasek 2025):

Intrinsic Load: The unavoidable mental effort required to solve the core domain problem or algorithm (Wondrasek 2025).
Extraneous Load: The “productivity killer”. This is unnecessary mental overhead caused by poorly presented information, inconsistent naming, or convoluted toolchains (Wondrasek 2025).
Germane Load: The productive mental effort invested in building lasting mental models, such as understanding the architecture through pair programming (Wondrasek 2025).
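
To make the difference between intrinsic and extraneous load concrete, the sketch below implements one small discount rule twice. The rule, the function names, and the thresholds are all invented for illustration. Both versions carry the same intrinsic load (the rule itself); the first adds extraneous load through opaque names and nesting, while the second removes it with descriptive names and a guard clause.

```python
def discount_nested(c, t, m):
    # High extraneous load: single-letter names and four levels of nesting
    # force the reader to track every open branch at once.
    if c is not None:
        if t > 0:
            if m:
                if t >= 100:
                    return 0.10
                else:
                    return 0.05
            else:
                return 0.0
        else:
            return 0.0
    else:
        return 0.0


def discount_rate(customer_id, order_total, is_member):
    # Same (hypothetical) rule with lower extraneous load: one guard clause
    # disposes of the ineligible cases, leaving a single decision to read.
    if customer_id is None or order_total <= 0 or not is_member:
        return 0.0
    return 0.10 if order_total >= 100 else 0.05
```

The two functions return the same result for every input; only the effort needed to confirm that differs.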

Neuro Software Engineering (NeuroSE)

Moving beyond subjective surveys, modern research uses physiological measures (EEG, fMRI, eye-tracking) to measure mental effort objectively (Gao et al. 2023; Peitek et al. 2021). For example, fMRI studies reveal that complex data-flow dependencies heavily activate Broca’s area (BA 44/45), the same region used to process complex, nested grammatical sentences in natural language (Peitek et al. 2021).

Mental Models: Bottom-Up vs. Top-Down

Program comprehension—the mental process of understanding an existing software system—is a highly complex cognitive task that consumes a majority of a software engineer’s time (Xia et al. 2018; Wyrich et al. 2023). To navigate this complexity, human cognition relies on mental models capable of supporting mental simulation (Letovsky 1987; Pennington 1987). The application of these models depends largely on a developer’s expertise, the structure of the code, and the presence of contextual clues (Wiedenbeck 1986).

The Bottom-Up Approach (Inductive Sense-Making)

In the bottom-up model, comprehension begins at the lowest, most granular level of abstraction (Fekete and Porkoláb 2020).

Mechanics of Bottom-Up: A developer reads the code statement-by-statement, analyzing the control flow to group localized lines into higher-level abstractions known as chunks (Shneiderman 1980; Ali and Khan 2019). By progressively combining these chunks, the developer slowly builds a systematic view of the program’s overall control flow (Ali and Khan 2019; Fekete and Porkoláb 2020).
Cognitive Limitations: This approach is highly cognitively demanding. The human mind relies on working memory to store these elements, and working memory is strictly limited in capacity (Darcy et al. 2005). Because reading line-by-line requires a developer to hold many variables, call sequences, and logic branches in their head simultaneously, this approach can quickly lead to cognitive overload if the code is deeply nested or highly coupled (Darcy et al. 2005).
When it is used: Developers are often forced into bottom-up comprehension when they lack domain knowledge, when the code is entirely new to them, or when contextual clues are explicitly stripped away (Wyrich et al. 2023; Ali and Khan 2019). It is the primary method used during isolated maintenance tasks where localized changes are required (Pennington 1987).

The Top-Down Approach (Deductive Hypothesis Verification)

The top-down approach flips the cognitive process. Instead of building understanding from the syntax up, the programmer leverages their existing knowledge base (prior programming experience and domain knowledge) to infer what the code does (Brooks 1983; Fekete and Porkoláb 2020).

Mechanics of Top-Down: The developer formulates a mental hypothesis about the system’s purpose (Brooks 1983; Fekete and Porkoláb 2020). They then actively scan the codebase looking for beacons—familiar, recognizable points in the code that act as evidence (Wiedenbeck 1986; Ali and Khan 2019). Beacons can be anything from specific function names and naming conventions to recognizable architectural patterns (Ali and Khan 2019; Fekete and Porkoláb 2020). Based on the presence or absence of these beacons, the developer either verifies or rejects their initial hypothesis (Ali and Khan 2019).
Cognitive Efficiency: Because it utilizes pre-existing schemas stored in long-term memory, the top-down approach bypasses the strict limits of working memory (Rumelhart 1980; Darcy et al. 2005). It is a vastly more efficient way to navigate a codebase, provided the developer has the requisite expertise and the code contains reliable, recognizable beacons (Wiedenbeck 1986; Fekete and Porkoláb 2020).
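
As a small illustration of a beacon, consider the hypothetical function below. Its name and docstring are invented for this example. A reader who hypothesizes "this sorts something" does not need to simulate the loops: the swap on adjacent elements is a commonly cited beacon for a sorting routine, and spotting it is usually enough to confirm the hypothesis.

```python
def sort_scores(scores):
    """Return the scores in ascending order without modifying the input."""
    items = list(scores)
    for end in range(len(items) - 1, 0, -1):
        for i in range(end):
            if items[i] > items[i + 1]:
                # Beacon: swapping neighbours inside a nested loop is the
                # recognizable signature of an exchange-based sort.
                items[i], items[i + 1] = items[i + 1], items[i]
    return items
```

The function name is itself a second beacon. If the body had been named `process` and the swap hidden behind an opaque helper, the reader would be pushed back into statement-by-statement, bottom-up reading.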

The Integrated Meta-Model (Fluid Navigation)

In practice, developers rarely rely on a single approach. Successful developers employ an Integrated Meta-Model that fluidly combines both top-down and bottom-up strategies (von Mayrhauser and Vans 1995; Fekete and Porkoláb 2020).

First formalized by von Mayrhauser and Vans (1995), the integrated model consists of four interrelated components (Ali and Khan 2019; Fekete and Porkoláb 2020):

The Situational Model: A high-level, abstract representation of the system’s functions (von Mayrhauser and Vans 1995).
The Program Model: The low-level, control-flow abstraction built by chunking code (von Mayrhauser and Vans 1995).
The Top-Down Domain Model: The developer’s understanding of the business or problem domain (von Mayrhauser and Vans 1995).
The Knowledge Base: The programmer’s personal repository of experience (Ali and Khan 2019).

Developers navigate between these models using specific strategies, such as browsing support (scrolling up and down to link beacons to code chunks) and search strategies (iterative code searches based on their knowledge base) (von Mayrhauser and Vans 1995).

Divergent Perspectives: How Developers Apply Mental Models

While the theories of bottom-up and top-down comprehension are well established, empirical studies reveal divergent behaviors in how different programmers apply them:

Systematic vs. Opportunistic Tracing: When attempting to build a control-flow abstraction (a bottom-up task), developers display divergent strategies. Some developers use a systematic approach, reading the code line-by-line to build a complete mental representation before making a change (Arisholm 2001). Others use an opportunistic approach (or “as-needed” strategy), studying code only when necessary, guided by clues and hypotheses to minimize the amount of code they must actually read (Koenemann and Robertson 1991; Arisholm 2001). Studies show that systematic programmers struggle significantly more when dealing with deeply nested, highly modular architectures, as the constant jumping between files exhausts their working memory (Arisholm 2001).
Novice vs. Expert Schemas: The size and quality of a “chunk” varies wildly depending on a developer’s expertise. Experts do not necessarily possess more schemas than novices; they possess larger, more interrelated schemas created through a highly automated chunking process (Kolfschoten et al. 2011). While novices structure their mental models based on surface-level similarities, experts categorize their knowledge based on solution models (Kolfschoten et al. 2011). Consequently, expert mental representations demonstrate a superior extent, depth, and level of detail, allowing them to rapidly map top-down hypotheses to bottom-up implementations (Björklund 2013).

Metrics and Perception

Historically, the industry relied on structural metrics like McCabe’s Cyclomatic Complexity (CC) and Halstead’s volume metrics (McCabe 1976; Halstead 1977). Modern tools (e.g., SonarSource) have shifted toward Cognitive Complexity, which penalizes deep nesting over simple linear branches to better quantify human effort (Campbell 2017). However, empirical and neuroscientific studies reveal divergent perspectives on metric accuracy (Peitek et al. 2021; Gao et al. 2023):

The Failure of Cyclomatic Complexity: CC treats all branching equally (Gao et al. 2023). It ignores the reality that repeated code constructs (like a switch statement) are much easier for humans to process than deeply nested while loops (Ajami et al. 2017; Jbara and Feitelson 2017).
The “Saturation Effect”: Empirical EEG studies suggest that Cognitive Complexity metrics share a critical flaw: their scores grow linearly and without bound (Gao et al. 2023). Human perception, by contrast, features a “saturation effect” (Couceiro et al. 2019; Gao et al. 2023). Once code reaches a certain level of complexity, the brain simply recognizes it as “too complex”, and additional logic does not proportionally increase perceived effort (Couceiro et al. 2019; Gao et al. 2023).
Textual Size as a Visual Heuristic: fMRI data suggests that raw code size (Lines of Code and vocabulary size) acts as a preattentive indicator (Peitek et al. 2021). Developers anticipate high cognitive load simply by looking at the size of the block, driving their attention and working memory load before they even read the logic (Peitek et al. 2021; Gao et al. 2023).
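
As a rough illustration of why the two metric families disagree, the sketch below scores two hypothetical functions with Python's standard `ast` module. `branch_count` is a cyclomatic-style count (one plus the number of `if`, `for`, and `while` statements), and `nesting_weighted_score` charges each branch one point plus its nesting depth, in the spirit of Cognitive Complexity. Both are simplifications assumed for this example; neither reproduces the full McCabe or SonarSource definitions.

```python
import ast

BRANCH_NODES = (ast.If, ast.For, ast.While)


def branch_count(source: str) -> int:
    """Cyclomatic-style score: 1 + number of branching statements."""
    tree = ast.parse(source)
    return 1 + sum(isinstance(node, BRANCH_NODES) for node in ast.walk(tree))


def nesting_weighted_score(source: str) -> int:
    """Each branching statement costs 1 plus its current nesting depth."""

    def visit(node: ast.AST, depth: int) -> int:
        total = 0
        for child in ast.iter_child_nodes(node):
            if isinstance(child, BRANCH_NODES):
                total += 1 + depth + visit(child, depth + 1)
            else:
                total += visit(child, depth)
        return total

    # Caveat: Python's AST stores `elif` as an If nested inside the parent's
    # else branch, so this sketch would score an elif chain as nested.
    return visit(ast.parse(source), 0)


FLAT = """
def label(n):
    if n == 1:
        return "one"
    if n == 2:
        return "two"
    if n == 3:
        return "three"
    return "many"
"""

NESTED = """
def label(a, b, c):
    if a:
        if b:
            if c:
                return "all"
    return "not all"
"""

print(branch_count(FLAT), branch_count(NESTED))                      # 4 4
print(nesting_weighted_score(FLAT), nesting_weighted_score(NESTED))  # 3 6
```

The branch count cannot tell the two functions apart, while the nesting-weighted score doubles for the nested version. Note that this score still grows without bound as nesting deepens, which is exactly the behavior the saturation findings call into question.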

Architecture-Code Gap

One of the most persistent challenges in software engineering is the misalignment of perspectives between different roles in the software lifecycle, creating a cognitive obstacle during architecture realization (Rost and Naab 2016).

The Developer’s View (Bottom-Up): Developers operate at the implementation level, working primarily with extensional elements such as classes, packages, interfaces, and specific lines of code (Rost and Naab 2016; Kapto et al. 2016).
The Architect’s View (Top-Down): Architects reason about the system using intensional elements, such as components, layers, design decisions, and architectural constraints (Rost and Naab 2016; Kapto et al. 2016).

Without proper documentation, developers implementing change requests often introduce technical debt by opting for straightforward code-level changes rather than preserving top-down design integrity, leading to architectural erosion (Candela et al. 2016).

Architecture Recovery

When dealing with eroded legacy systems, engineers use Software Architecture Recovery to build a top-down understanding from bottom-up data (Belle et al. 2015). Reverse engineering tools (like Bunch or ACDC) transform source code into directed graphs, applying clustering algorithms to maximize intra-module cohesion and minimize inter-module coupling (Belle et al. 2015; Shahbazian et al. 2018). By treating recovery as an optimization problem (e.g., a quadratic assignment problem), these clusters can be mapped into hierarchical layers (Belle et al. 2015).
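
The objective these clustering tools optimize can be sketched in a few lines. The example below does not reproduce Bunch or ACDC; it only scores two hand-written candidate partitions of a tiny, invented dependency graph by counting edges that stay inside a module (a proxy for cohesion) versus edges that cross module boundaries (a proxy for coupling). All file and module names are hypothetical.

```python
def intra_and_inter_edges(edges, module_of):
    """Return (edges inside a module, edges crossing module boundaries)."""
    intra = sum(1 for src, dst in edges if module_of[src] == module_of[dst])
    return intra, len(edges) - intra


# Directed "depends on" edges extracted from a hypothetical codebase.
DEPENDENCIES = [
    ("cart_ui", "cart_logic"),
    ("cart_logic", "cart_store"),
    ("user_ui", "user_logic"),
    ("user_logic", "user_store"),
    ("cart_logic", "user_logic"),
]

# Candidate partition 1: group files by feature.
BY_FEATURE = {
    "cart_ui": "cart", "cart_logic": "cart", "cart_store": "cart",
    "user_ui": "user", "user_logic": "user", "user_store": "user",
}

# Candidate partition 2: group files by technical layer.
BY_LAYER = {
    "cart_ui": "ui", "user_ui": "ui",
    "cart_logic": "logic", "user_logic": "logic",
    "cart_store": "store", "user_store": "store",
}

print(intra_and_inter_edges(DEPENDENCIES, BY_FEATURE))  # (4, 1)
print(intra_and_inter_edges(DEPENDENCIES, BY_LAYER))    # (1, 4)
```

A search-based recovery tool explores many such partitions automatically and keeps the ones that score well. For this graph the by-feature grouping wins on the simple edge count, although a layered reading of the same system may still be the one an architect intends.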

Automated vs. Human-in-the-Loop

While fully automated “Big Bang” remodularization tools exist, they often require thousands of unviable code changes (Candela et al. 2016). A recommended alternative is to use interactive genetic algorithms (IGAs) or supervised search-based techniques (Candela et al. 2016). These rely on automated tools for basic metrics but keep the human developer “in the loop” to apply top-down domain knowledge (Candela et al. 2016).

Structural Trade-Offs

High cohesion (grouping related logic) and low coupling (minimizing dependencies) are widely considered the gold standard for understandable modules (Candela et al. 2016). However, empirical studies reveal critical trade-offs when pushing these concepts to their limits.

The Danger of Excessive Abstraction

While modularity isolates complexity, excessive abstraction can severely damage understandability (Arisholm 2001). A controlled experiment comparing a highly modular “Responsibility-Driven” (RD) design against a monolithic “Mainframe” design found that the RD system required 20-50% more change effort (Arisholm 2001). The highly modular system forced developers to constantly jump between many shallow modules to trace deeply nested interactions, exhausting their working memory (Arisholm 2001). The monolithic system allowed for a localized, linear reading experience (Arisholm 2001). Therefore, decreasing coupling and increasing cohesion may actually increase complexity if taken to an extreme (Candela et al. 2016).

The Design Pattern Paradox

Design patterns serve a dual, somewhat paradoxical role in comprehension:

As a High-Level Language: Patterns provide a “theory of the design” (Gamma et al. 1995). Stating that a component uses a “Command Processor” pattern immediately conveys top-down intent and behavioral dynamics to peers without requiring a bottom-up explanation.
As a Source of Cognitive Load: Despite assumptions that patterns improve understandability, empirical studies reveal they often do not (Khomh and Guéhéneuc 2018). Patterns introduce extra layers of abstraction and implicit coupling (e.g., the Observer pattern), which can increase cognitive load and make code harder for maintainers to learn and debug (Mohammed et al. 2016).
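
The implicit coupling mentioned for the Observer pattern can be seen in a minimal sketch. The `Inventory` class and the audit-log listener below are invented for illustration. Reading `set_stock` alone does not reveal what else happens when stock changes; a maintainer has to locate every `subscribe` call in the codebase to find out.

```python
class Inventory:
    """Hypothetical subject that notifies listeners when stock changes."""

    def __init__(self):
        self._listeners = []
        self.stock = {}

    def subscribe(self, listener):
        self._listeners.append(listener)

    def set_stock(self, item, quantity):
        self.stock[item] = quantity
        # Nothing on this line names the code that runs next: the callees
        # are decided at runtime by whoever called subscribe().
        for listener in self._listeners:
            listener(item, quantity)


audit_log = []
inventory = Inventory()
# The coupling between Inventory and the audit log exists only here.
inventory.subscribe(lambda item, quantity: audit_log.append((item, quantity)))
inventory.set_stock("widget", 3)
```

The pattern name still helps a top-down reader ("this is an Observer, so look for subscribers"), which is the other half of the paradox.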

Actionable Practices for Top-Down Comprehension

As developers transition from junior roles to senior engineering positions, their approach to code review and design must undergo a fundamental cognitive shift. Novice reviewers naturally default to a bottom-up approach: reading linearly line-by-line, attempting to reconstruct the program’s overall purpose by mentally compiling raw syntax (Gonçalves et al. 2025). While this works for small patches, it rapidly leads to cognitive overload in complex systems (Gonçalves et al. 2025).

To review and write code efficiently at scale, developers must master top-down comprehension—establishing a high-level mental model of the system’s architecture before diving into specific implementation details (Gonçalves et al. 2025). Based on empirical models like Letovsky’s and the Code Review Comprehension Model (CRCM), here are actionable strategies to elevate your approach (Letovsky 1987; Gonçalves et al. 2025).

1. Master the “Orientation Phase” & Hypothesis-Driven Review

Top-down reviewers do not start by looking at code diffs; they begin by building context and mental models (Gonçalves et al. 2025).

Establish the “Why” and “What”: Spend time exclusively seeking the rationale of the change. Read the PR description, issue tracker, and design documents. In Letovsky’s model, this builds the Specification Layer of your mental model (Letovsky 1987; Gonçalves et al. 2025). If the author hasn’t provided this context, stop and ask for it.
Speculate About the Design: Once you understand the goal, pause. Develop a hypothesis about how you would have solved the problem. Construct a mental representation of the expected ideal implementation (Gonçalves et al. 2025).
Compare and Contrast: When you finally look at the source code, you are no longer trying to figure out what it does from scratch. You are comparing the author’s implementation against your ideal mental model, looking for discrepancies (Gonçalves et al. 2025).

2. Abandon Linear Reading for Strategic Navigation

Reading files sequentially as presented by a review tool strips away structural context (Baum et al. 2017). Use opportunistic strategies to navigate complexity (Gonçalves et al. 2025).

Execute a “First Scan”: Eye-tracking studies reveal expert reviewers perform a rapid first scan, touching roughly 80% of the lines to map out the structure, locate function headers, and identify likely “trouble spots” before scrutinizing for bugs (Uwano et al. 2006; Gonçalves et al. 2025).
Shift from Chunking Lines to Finding Beacons: Instead of building understanding by chunking individual lines of code together, actively scan the codebase for beacons (familiar function names, domain conventions) to verify the hypothesis you built during the orientation phase (Brooks 1983; Wiedenbeck 1986).
Utilize Difficulty-Based Reading: Search the PR for the “core” architectural modification. Understand that core first, then follow the data flow outward to peripheral files. Alternatively, use an easy-first approach to quickly approve simple boilerplate files, clearing them from your working memory before tackling complex logic (Gonçalves et al. 2025).
Segment Massive PRs: If a PR is a massive composite change, manually break it down into logical clusters (e.g., database changes, backend logic, frontend UI) and review them as isolated functional units (Gonçalves et al. 2025).
Leverage Dependency Tools: Actively reconstruct structural context using IDE features or static analysis tools to trace caller/callee trees and view object dependencies (Fekete and Porkoláb 2020). Ask top-down reachability questions like, “Does this change break any code elsewhere?”
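
When no IDE support is at hand, a rough caller/callee map can be built with Python's standard `ast` module. The sketch below is a deliberately limited assumption-laden example: it only sees top-level functions and direct calls by plain name (method calls such as `text.split` are ignored), and the sample source is invented. It is enough to answer a simple reachability question such as "who calls `parse`?".

```python
import ast
from collections import defaultdict


def call_map(source: str) -> dict[str, set[str]]:
    """Map each top-level function to the plain names it calls directly."""
    calls = defaultdict(set)
    for node in ast.parse(source).body:
        if isinstance(node, ast.FunctionDef):
            for inner in ast.walk(node):
                if isinstance(inner, ast.Call) and isinstance(inner.func, ast.Name):
                    calls[node.name].add(inner.func.id)
    return dict(calls)


def callers_of(target: str, calls: dict[str, set[str]]) -> set[str]:
    """Return the functions that call `target` directly."""
    return {caller for caller, callees in calls.items() if target in callees}


SAMPLE = """
def parse(text):
    return text.split(",")

def load(path):
    return parse(read(path))

def report(path):
    rows = load(path)
    return len(rows)
"""

print(callers_of("parse", call_map(SAMPLE)))  # {'load'}
```

A change to `parse` therefore needs `load` re-checked directly and `report` re-checked transitively. Real static analysis tools also resolve methods, imports, and dynamic dispatch, which this sketch does not attempt.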

3. Code-Level Practices for Cognitive Relief

To facilitate top-down thinking for yourself and your team, you must design boundaries that hide bottom-up complexity.

Design Deep Modules: Avoid “Shallow Modules” whose interfaces simply mirror their implementations. Instead, favor “Deep Modules”—encapsulating a massive amount of complex, bottom-up logic behind a very simple, concise, and highly abstracted public interface.
Optimize Identifier Naming: Using full English-word identifiers leads to significantly better comprehension than single letters (Lawrie et al. 2006). Keep the number of domain-information-carrying identifiers to around five to optimize for working memory limits (Gobet and Clarkson 2004).
Comment for “Why”, Not “What”: Code should explain what it does; comments should act as a cognitive guide explaining why an approach was taken and what alternatives were ruled out (Cline 2018).
Make the Architecture Visible: Embed architectural intent directly into the source code through explicit naming conventions, package structures, and directory hierarchies (e.g., grouping classes into presentation or data_access packages) (Ali and Khan 2019; Fekete and Porkoláb 2020).
Program to Interfaces: Rely on abstract interfaces at the root of a class hierarchy rather than concrete implementations. This Dependency Inversion approach allows developers to think about high-level roles rather than bottom-up executions (Martin 2000).
Adopt Hybrid Documentation: Establish a Documentation Roadmap providing a bird’s-eye view of subsystems for top-down navigation (Aguiar and David 2011). Generate task-specific documentation that explicitly maps high-level components to specific source code elements (Rost and Naab 2016).
Practice Architecture-Guided Refactoring: Adopt the “boy scout rule” by integrating top-down improvements into daily feature work to organically evolve modularity and prevent architectural drift, rather than waiting for technical debt sprints (Jeffries 2014; Martini and Bosch 2015).
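
Two of the practices above, deep modules and programming to interfaces, can be sketched together. Everything in the example is hypothetical: `KeyValueStore` is an invented two-method interface, `InMemoryStore` is one possible implementation whose key normalization stays private, and `remember_theme` is a caller that depends only on the role, not on a concrete class.

```python
from typing import Optional, Protocol


class KeyValueStore(Protocol):
    """Narrow interface: two methods hide whatever the implementation does."""

    def get(self, key: str) -> Optional[str]: ...

    def put(self, key: str, value: str) -> None: ...


class InMemoryStore:
    """One possible implementation; normalization details stay private."""

    def __init__(self) -> None:
        self._data: dict[str, str] = {}

    def _normalize(self, key: str) -> str:
        # Why: callers should not have to agree on casing or whitespace,
        # so the store absorbs that complexity behind its interface.
        return key.strip().lower()

    def get(self, key: str) -> Optional[str]:
        return self._data.get(self._normalize(key))

    def put(self, key: str, value: str) -> None:
        self._data[self._normalize(key)] = value


def remember_theme(store: KeyValueStore, user: str, theme: str) -> None:
    # Depends on the role (any KeyValueStore), so a reader can reason about
    # this function without opening a concrete store implementation.
    store.put(f"theme:{user}", theme)


store = InMemoryStore()
remember_theme(store, "Ada", "dark")
print(store.get(" THEME:ada "))  # dark
```

The comments explain why normalization lives inside the store rather than restating what each line does, which follows the commenting guideline in the same list.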

Interactive Tutorials

Build the strategy hands-on in this two-part interactive tutorial sequence. Do Part 1 first, then wait two or three days before continuing with Part 2 so the second tutorial becomes spaced retrieval instead of immediate repetition.

Code Comprehension Part 1: Hypotheses & Beacons — practice attention focus, hypothesis generation, beacon-based reading, and targeted traces in Python and React.
Code Comprehension Part 2: Tests & Cross-File Flow — use tests, Node.js routes, and multi-file React examples to practice deeper hypothesis testing and interaction mapping.

Practice This

Use the flashcards to retrieve the cognitive models, then use the quiz to apply them to code review, architecture-code alignment, and comprehension trade-offs.

Code Comprehension Flashcards

Cognitive load, mental models, comprehension metrics, architecture-code alignment, and practical strategies for making code easier to understand.

Difficulty: Intermediate

What are the three kinds of cognitive load in code comprehension?

Difficulty: Basic

How do bottom-up and top-down comprehension differ?

Difficulty: Advanced

What are the four components of the integrated meta-model of program comprehension?

Difficulty: Intermediate

What should a reviewer do during the orientation phase before reading a complex diff?

Difficulty: Expert

Why can cyclomatic complexity under-predict human difficulty?

Difficulty: Advanced

What is the architecture-code gap?

Difficulty: Expert

Why can excessive abstraction make code harder to understand?

Difficulty: Intermediate

Name three practices that make code easier to comprehend top-down.

Code Comprehension Quiz

Apply code-comprehension research to realistic reading, review, architecture, and refactoring decisions.

Difficulty: Advanced

A function implements a simple discount rule, but the code uses five levels of nested conditionals, inconsistent variable names, and several helper calls whose names do not reveal their purpose. Which kind of cognitive load is the team mostly creating, and what should they do?

A discount rule may have some intrinsic load, but the stem describes avoidable presentation problems: nesting, names, and opaque helpers. That is the kind of load authors can reduce.

Germane load builds useful mental models. Confusing names and tangled control flow usually consume working memory without improving the reader’s schema.

Saturation describes how perceived complexity can stop scaling linearly, not a reason to abandon improvement. The team still controls several obvious sources of avoidable load.

Correct Answer:

Difficulty: Intermediate

A developer joins a legacy project with no domain knowledge and no reliable naming conventions. They must fix a localized bug in a small parsing function. Which comprehension strategy will they most likely need at first?

Top-down reading depends on prior schemas and reliable beacons. The question removes both, so the reader has little evidence to drive hypotheses.

Architecture recovery can help with system-level erosion, but it is disproportionate for a small localized parser bug.

Design patterns can be beacons, but many functions do not encode a formal pattern. Forcing pattern recognition here would add noise.

Correct Answer:

Difficulty: Advanced

Which artifacts or mental structures belong to the integrated meta-model of program comprehension? Select all that apply.

The situational model is the reader’s high-level understanding of system functions. Omitting it leaves only syntax, not purpose.

The program model captures the low-level implementation view. It is what bottom-up chunking builds.

The top-down domain model is what lets a reader generate expectations before seeing every statement.

The knowledge base supplies schemas and programming plans that make top-down reading possible.

The integrated model is opportunistic, not alphabetical. Expert readers choose routes based on hypotheses, beacons, and difficulty.

Correct Answers:

Difficulty: Advanced

A system’s architecture document describes a clean separation between presentation, domain, and data_access, but the codebase contains a single UserManager class that validates forms, builds SQL, and formats UI strings. What is the strongest diagnosis?

Removing the document hides the mismatch; it does not repair the code. The reader still lacks trustworthy cues about where responsibilities live.

Searchability is not the same as comprehensibility. A single class that mixes responsibilities may be easy to find and still hard to change safely.

Branch count might be one local symptom, but the stem describes responsibility drift across architectural boundaries.

Correct Answer:

Difficulty: Advanced

A senior engineer proposes adding design-pattern names to every class so future readers can understand the system faster. What is the best response?

Pattern names are helpful only when they map to a real, stable structure. Decorative pattern language can send readers down the wrong mental path.

Explicit vocabulary is often useful. Refusing to name real patterns removes a high-value beacon from the codebase.

Cyclomatic complexity is not the deciding factor. The deciding factor is whether the pattern name accurately communicates design intent that clients or maintainers should know.

Correct Answer:

Difficulty: Intermediate

You are assigned a 350-line pull request in an unfamiliar area. Which review sequence best applies the chapter’s comprehension advice?

Linear reading can work for tiny changes, but a 350-line unfamiliar change risks exhausting working memory before the reviewer has a useful specification layer.

CI is evidence, not a substitute for human comprehension. It cannot judge architecture, requirements fit, or missing tests on its own.

Textual size is a useful heuristic, but not the only one. A small concurrency change may be harder than a large rename.

Correct Answer:

References