Architectural Tactics

Architectural styles describe the dominant shape of a system: pipe-and-filter, layered, publish-subscribe, client-server, and so on. Architectural tactics are smaller design moves that an architect uses to improve one quality attribute inside that larger shape.

Think of tactics as the architect’s quality-attribute toolbox. A style says, “organize this subsystem as independent filters connected by pipes.” A tactic says, “add a watchdog and timeout so failed components are detected quickly,” or “add a cache so repeated requests avoid expensive reacquisition.”

Tactics are useful because they make quality attributes concrete. Instead of saying “make it available,” the architect can ask: What failure do we need to detect? How quickly? What recovery action happens after detection? What performance cost are we willing to pay for that detection?

Tactics vs. Styles

Concept	Scope	Example	Main question
Architectural style	Shapes the gross structure of a subsystem or whole system	publish-subscribe, layered, pipe-and-filter	What element types, connector types, and constraints dominate this design?
Architectural tactic	Improves a target quality attribute through a reusable design move	heartbeat, ping-echo, caching, redundancy	Which quality scenario improves, and what qualities does the tactic trade away?

A system usually combines both. A robot might use publish-subscribe as its communication style, then apply heartbeat to detect failed components and caching to avoid repeatedly recomputing expensive map data.

Availability Tactics

Availability is the ability of a system to mask, detect, repair, or recover from faults. Many availability tactics start with the same problem: before a system can recover from a failed component, it has to notice the failure.

Ping-Echo

Goal: detect that a component, process, node, or service has stopped responding before the fault escalates into a visible failure.

Solution: a watchdog periodically sends an asynchronous request, the ping, to each monitored component. A healthy component replies with an echo. If the watchdog does not receive the echo before a timeout, it activates a recovery mechanism, such as restarting the component, routing around it, or starting a replacement instance.

Quality impact:

Promotes availability: the system can detect failed components and trigger recovery.
Inhibits performance: pings and echoes consume network bandwidth, processing cycles, and queue capacity.
Simplifies monitored components: most of the logic lives in the watchdog; a monitored component only needs to answer the ping.

Ping-echo is a good fit when the watchdog controls the monitoring schedule and when the extra request-response traffic is acceptable.
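The ping-echo exchange can be sketched in a few lines. This is a minimal simulation under stated assumptions, not a production monitor: the `Component`, `Watchdog`, and `restart` names are hypothetical, no real network is involved, and the echo delay is a simulated number that the watchdog compares against its timeout.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Component:
    """Hypothetical monitored component; it only has to answer a ping."""
    name: str
    alive: bool = True
    echo_delay_s: float = 0.01  # simulated time between ping and echo

    def ping(self) -> Optional[float]:
        # Returns the simulated echo delay, or None when no echo is ever sent.
        return self.echo_delay_s if self.alive else None


class Watchdog:
    """Initiates every check and owns the timeout and recovery logic."""

    def __init__(self, timeout_s: float, recover: Callable[[Component], None]) -> None:
        self.timeout_s = timeout_s
        self.recover = recover

    def check(self, components: List[Component]) -> List[str]:
        """Run one monitoring round; return the names that missed the timeout."""
        failed = []
        for component in components:
            delay = component.ping()
            if delay is None or delay > self.timeout_s:
                failed.append(component.name)
                self.recover(component)
        return failed


def restart(component: Component) -> None:
    # Recovery action for this sketch: bring the component back with a fast echo.
    component.alive = True
    component.echo_delay_s = 0.01


workers = [
    Component("planner"),
    Component("perception", alive=False),   # crashed: never echoes
    Component("mapper", echo_delay_s=0.8),  # alive, but slower than the timeout
]
watchdog = Watchdog(timeout_s=0.5, recover=restart)
first_round = watchdog.check(workers)   # perception and mapper miss the timeout
second_round = watchdog.check(workers)  # both were restarted, so nothing fails
```

Note how all of the monitoring logic sits in `Watchdog`, while `Component` only answers the ping. A real watchdog would send the ping over a connector and wait up to the timeout for the echo.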

Heartbeat

Goal: detect that a component, process, node, or service has stopped working.

Solution: each monitored component periodically sends a heartbeat message to a watchdog. If the watchdog does not receive a heartbeat before a timeout, it activates recovery.

Quality impact:

Promotes availability: the system can infer failure from silence.
Inhibits performance: heartbeat messages consume resources, though heartbeat usually sends fewer messages than ping-echo because each check is a single one-way message rather than a request-response pair.
Complicates monitored components: every monitored component needs a heartbeat routine and must keep sending heartbeats even while doing its normal work.

Heartbeat is a good fit when monitored components already have their own control loop, or when reducing monitoring traffic matters more than keeping monitored components simple.
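A heartbeat monitor can be sketched as a table of last-seen times. The `HeartbeatMonitor` class and its method names are assumptions made for illustration, and the clock is passed in as a plain number so the example stays deterministic; a real monitor would more likely read a monotonic clock such as Python's `time.monotonic()`.

```python
from typing import Dict, List


class HeartbeatMonitor:
    """Passive watchdog: records heartbeats and infers failure from silence."""

    def __init__(self, timeout_s: float) -> None:
        self.timeout_s = timeout_s
        self._last_seen: Dict[str, float] = {}

    def beat(self, name: str, now: float) -> None:
        # Called by the monitored component itself, e.g. once per control-loop pass.
        self._last_seen[name] = now

    def silent(self, now: float) -> List[str]:
        """Return components whose last heartbeat is older than the timeout."""
        return sorted(
            name for name, seen in self._last_seen.items()
            if now - seen > self.timeout_s
        )


monitor = HeartbeatMonitor(timeout_s=2.0)
monitor.beat("perception", now=0.0)
monitor.beat("planner", now=0.0)
monitor.beat("planner", now=1.5)        # planner keeps beating
healthy_view = monitor.silent(now=1.9)  # nobody has been silent for over 2 s
failed_view = monitor.silent(now=2.5)   # perception was last seen 2.5 s ago
```

The monitor never sends anything. The cost moves into the components, each of which must call `beat` on schedule even while busy with its normal work.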

Ping-Echo vs. Heartbeat

Tactic	Who initiates the message?	Message pattern	Main benefit	Main cost
Ping-echo	Watchdog	watchdog ping, component echo	simple monitored components	more messages and centralized monitoring work
Heartbeat	Monitored component	component heartbeat	fewer messages and easy passive monitoring	heartbeat logic inside every monitored component

Both tactics need carefully chosen timeout values. A timeout that is too short creates false positives and unnecessary recovery. A timeout that is too long lets failures remain hidden.
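The message and timeout trade-offs can be made concrete with a little arithmetic. The sketch below assumes one check per component per interval, two messages per ping-echo check and one per heartbeat, and a heartbeat monitor that declares failure once the last heartbeat is older than the timeout with no network delay. The function names and the example numbers are illustrative, not measurements.

```python
from typing import Tuple


def messages_per_second(components: int, interval_s: float, tactic: str) -> float:
    """Steady-state monitoring messages per second across all components."""
    per_check = {"ping-echo": 2, "heartbeat": 1}[tactic]  # ping + echo vs. one beat
    return components * per_check / interval_s


def heartbeat_detection_window(interval_s: float, timeout_s: float) -> Tuple[float, float]:
    """(best, worst) delay in seconds between a crash and its detection.

    A crash just before the next beat is noticed after timeout - interval;
    a crash just after a beat is noticed only after the full timeout.
    """
    if timeout_s < interval_s:
        raise ValueError("a timeout shorter than the interval flags healthy components")
    return (timeout_s - interval_s, timeout_s)


ping_echo_rate = messages_per_second(10_000, 5.0, "ping-echo")  # 4000.0 messages/s
heartbeat_rate = messages_per_second(10_000, 5.0, "heartbeat")  # 2000.0 messages/s
window = heartbeat_detection_window(interval_s=1.0, timeout_s=2.0)  # (1.0, 2.0)
```

Shortening the timeout narrows the detection window, but it also leaves less slack for a delayed message before a healthy component is treated as failed.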

Redundancy

Redundancy improves availability by ensuring that another component can take over when one component fails.

Active redundancy: multiple replicas run at the same time. If one fails, another already-running replica can continue service quickly. This shortens recovery time but costs more CPU, memory, and coordination effort.
Cold spare: a backup component is available but not running the workload until failure occurs. This saves resources but recovery is slower because the spare must be started, warmed up, or synchronized.

Redundancy is rarely enough on its own. The system still needs detection, failover, state synchronization, and tests that prove the recovery path actually works.
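The difference between the two variants shows up in the failover path. The following sketch is a simplified model: `Replica`, `RedundantService`, and the 4-second startup time are hypothetical, and detection and state synchronization are deliberately left out.

```python
from dataclasses import dataclass


@dataclass
class Replica:
    name: str
    running: bool
    startup_s: float = 0.0  # assumed time to start, warm up, and synchronize


class RedundantService:
    def __init__(self, primary: Replica, backup: Replica) -> None:
        self.primary = primary
        self.backup = backup

    def fail_over(self) -> float:
        """Promote the backup and return the simulated recovery delay in seconds."""
        # A running backup takes over at once; a stopped one must be started first.
        delay = 0.0 if self.backup.running else self.backup.startup_s
        self.backup.running = True
        self.primary.running = False
        self.primary, self.backup = self.backup, self.primary
        return delay


active = RedundantService(Replica("worker-a", running=True),
                          Replica("worker-b", running=True, startup_s=4.0))
cold = RedundantService(Replica("worker-a", running=True),
                        Replica("worker-b", running=False, startup_s=4.0))

active_delay = active.fail_over()  # 0.0: the replica was already running
cold_delay = cold.fail_over()      # 4.0: the spare had to start and warm up
```

The active variant pays for its zero delay by keeping `worker-b` running the whole time, which is the resource cost the cold spare avoids.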

Performance Tactic: Caching

Goal: avoid expensive reacquisition or recomputation of a resource.

Solution: store a local copy of a resource in a fast-access cache. When a later request asks for the same resource, the system serves the cached copy instead of asking the slower provider again.

Quality impact:

Promotes performance: repeated requests can avoid slow network calls, database reads, file-system access, or expensive computation.
May improve availability: cached data can sometimes let a system keep serving degraded responses when the source is temporarily unavailable.
Inhibits consistency and modifiability: the system now has to decide when cached data is stale, how invalidation works, and which components are responsible for cache correctness.
Consumes memory or storage: a cache trades space for time.

A good caching requirement names the scenario and the measure. “Use caching” is not a quality requirement. “When the product catalog receives repeated requests for the same item within a 10-minute window, at least 90% of those requests are served from cache and p95 response time stays below 100 ms” is a quality requirement that caching might satisfy.
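To check a requirement like that one, the cache has to expose its hit rate. Here is a minimal sketch of a read-through cache with a time-to-live and hit counters. The `TTLCache` class, the `load_item` provider, and the item name are assumptions for illustration, and the clock is passed in explicitly so staleness is easy to see.

```python
from typing import Callable, Dict, List, Tuple


class TTLCache:
    """Read-through cache whose entries expire after ttl_s seconds."""

    def __init__(self, load: Callable[[str], str], ttl_s: float) -> None:
        self._load = load  # the slow provider (database, network call, ...)
        self._ttl_s = ttl_s
        self._entries: Dict[str, Tuple[float, str]] = {}  # key -> (stored_at, value)
        self.hits = 0
        self.misses = 0

    def get(self, key: str, now: float) -> str:
        entry = self._entries.get(key)
        if entry is not None and now - entry[0] < self._ttl_s:
            self.hits += 1
            return entry[1]
        # Missing or stale: reacquire from the provider and refresh the copy.
        self.misses += 1
        value = self._load(key)
        self._entries[key] = (now, value)
        return value

    def hit_rate(self) -> float:
        total = self.hits + self.misses
        return self.hits / total if total else 0.0


provider_calls: List[str] = []


def load_item(item_id: str) -> str:
    provider_calls.append(item_id)  # stands in for a slow catalog lookup
    return f"details for {item_id}"


cache = TTLCache(load_item, ttl_s=600.0)     # 10-minute window from the scenario
for second in range(10):
    cache.get("item-42", now=float(second))  # 1 miss, then 9 hits
cache.get("item-42", now=700.0)              # older than 600 s: reloaded
```

The time-to-live is the simplest staleness rule. It bounds how old a served copy can be, but it does not react to updates at the source, which is where the consistency cost comes from.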

Choosing a Tactic

Use tactics after the quality attribute scenario is specific enough to judge them. A practical sequence is:

1. State the quality scenario and measure.
2. Identify the failure, delay, change, or risk that blocks the measure.
3. Choose a tactic that directly addresses that blocker.
4. Name the qualities the tactic will likely inhibit.
5. Add observability so the team can verify the tactic works in production-like conditions.

For example, a team trying to improve availability might start with this scenario: “If one perception worker crashes while the robot is operating, the system detects the crash within 2 seconds and starts a replacement worker within 5 seconds.” Ping-echo, heartbeat, or process supervision could all be candidate tactics. The right choice depends on the runtime style, the acceptable monitoring traffic, and how much logic the team wants inside each worker.
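Whichever tactic the team picks, the scenario only counts as met if the measured times fit the limits. The sketch below checks recorded timestamps against the 2-second and 5-second bounds. The `RecoveryEvent` record is hypothetical, and it assumes both limits are measured from the moment of the crash, which the scenario does not state explicitly.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RecoveryEvent:
    """Timestamps in seconds, as they might be collected from logs or metrics."""
    crashed_at: float
    detected_at: float
    replaced_at: float


def meets_scenario(event: RecoveryEvent,
                   detect_within_s: float = 2.0,
                   replace_within_s: float = 5.0) -> bool:
    # Assumption: both limits are measured from the crash, not from detection.
    detection_delay = event.detected_at - event.crashed_at
    replacement_delay = event.replaced_at - event.crashed_at
    return detection_delay <= detect_within_s and replacement_delay <= replace_within_s


on_time = meets_scenario(RecoveryEvent(10.0, 11.5, 14.0))         # 1.5 s and 4.0 s
slow_detection = meets_scenario(RecoveryEvent(10.0, 12.5, 14.0))  # detection took 2.5 s
slow_restart = meets_scenario(RecoveryEvent(10.0, 11.0, 15.5))    # replacement took 5.5 s
```

A check like this can run against test-bench or production-like traces, which turns the scenario into something the team can verify.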

Tactics do not remove trade-offs. They make trade-offs inspectable.

Architectural Tactics Quiz and Flashcards

Use these flashcards and quiz questions to practice distinguishing tactics from styles, matching tactics to quality scenarios, and naming the costs of ping-echo, heartbeat, redundancy, and caching.

Architectural Tactics Flashcards

Availability and performance tactics, including ping-echo, heartbeat, redundancy, and caching.

Difficulty: Basic

What is an architectural tactic?

Difficulty: Basic

How does a tactic differ from an architectural style?

Difficulty: Basic

Describe the ping-echo availability tactic.

Difficulty: Basic

Describe the heartbeat availability tactic.

Difficulty: Intermediate

Compare ping-echo and heartbeat.

Difficulty: Intermediate

Why do timeout values matter in ping-echo and heartbeat tactics?

Difficulty: Basic

Distinguish active redundancy and cold spare.

Difficulty: Basic

Describe the caching performance tactic.

Difficulty: Intermediate

What quality attributes can caching inhibit?

Difficulty: Advanced

What sequence should an architect follow when choosing a tactic?

Architectural Tactics Quiz

Apply availability and performance tactics to concrete quality-attribute scenarios.

Difficulty: Basic

Which statement best distinguishes an architectural tactic from an architectural style?

The labels are swapped. Styles describe the gross structure (publish-subscribe, layered, pipe-and-filter), and tactics are the smaller quality-attribute moves applied inside that structure.

Tactics are not tied to object-oriented programming. Heartbeat, caching, and redundancy appear in many paradigms and runtimes.

Both styles and tactics can affect many qualities. Caching is a performance tactic and dependency injection is a testability tactic, so there is no fixed performance-versus-maintainability split.

Difficulty: Basic

A watchdog sends a request every 2 seconds to each worker. Each healthy worker replies immediately. If no reply arrives before timeout, the watchdog restarts the worker. Which tactic is this?

In heartbeat, the monitored component initiates periodic messages. Here the watchdog initiates the check and expects a reply.

A cold spare is a backup component waiting to be activated after failure. It does not describe the failure-detection message pattern.

Caching stores resources to avoid expensive reacquisition. It is unrelated to liveness checks.

Difficulty: Basic

Each worker sends an “alive” message to a monitor every 5 seconds. If the monitor stops receiving messages from one worker, it replaces that worker. Which tactic is this, and what is one cost?

Ping-echo is watchdog-initiated. The stem says each worker initiates the periodic “alive” message, so the workers are not passive responders.

Cold spare describes the recovery resource (a standby kept stopped until needed), not how the monitor detects that a worker has failed.

Active redundancy is about running multiple replicas simultaneously so failover is fast. It does not describe the periodic liveness signal in the stem.

Difficulty: Intermediate

A team is choosing between ping-echo and heartbeat for 10,000 IoT devices on a low-bandwidth network. Which trade-offs should they consider? Select all that apply.

Ping-echo’s two-message-per-check pattern is exactly what matters at 10,000 devices on a low-bandwidth network — easy to overlook when comparing tactics on a whiteboard.

Heartbeat saves the ping side of the exchange, but the device firmware now owns periodic liveness behavior — this is a real cost to weigh, not a free lunch.

Heartbeat still needs timeouts. The monitor infers failure from silence, but only after a threshold elapses — without one, the monitor could never declare a device dead.

Monitoring is not free under either tactic. Even tiny liveness messages add up at scale and compete with real workload traffic.

Both are availability tactics — both detect failed components so recovery can run. Both also inhibit performance as a cost. There is no clean split where one is a performance tactic and the other an availability tactic.

Difficulty: Basic

A checkout service keeps a standby payment worker stopped until the active worker fails. On failure, the standby is started and warmed up. Which redundancy tactic is this?

Active redundancy keeps multiple replicas running at the same time so another can take over quickly. The stem says the standby is stopped until failure, which is the opposite end of the redundancy trade-off.

Ping-echo is a detection tactic — it tells the system that the active worker has failed. The question asks about the recovery resource the system has waiting after detection.

Caching stores resources to avoid expensive reacquisition. It does not describe whether a backup worker is already running or kept stopped until needed.

Difficulty: Intermediate

A product catalog receives repeated requests for the same item. A cache serves 92% of repeat requests and keeps p95 latency below 100 ms. Which quality attribute does the tactic primarily improve, and what risk did it introduce?

Caching can sometimes help degraded availability, but the scenario’s measure is latency. Dependency cycles are not the cache-specific risk.

Caching may affect tests, but the scenario is explicitly about latency. Lower bandwidth is usually a benefit, not the central risk.

Portability is about moving across environments. CPU scheduling is not the relevant cache trade-off.

Difficulty: Intermediate

A team says, “We should add caching.” What is the best architectural response?

Caching can slow systems down or break semantics if hit rates are low or invalidation is hard.

Caching is not inherently wrong. It is wrong when the consistency cost exceeds the performance benefit for the scenario.

Pipe-and-filter is a style choice and is unrelated to whether a repeated resource should be cached.

Difficulty: Advanced

A quality scenario says: “If one perception worker crashes while the robot is operating, the system detects the crash within 2 seconds and starts a replacement worker within 5 seconds.” Which architectural elements or tactics are likely relevant? Select all that apply.

The scenario’s first half is fault detection within 2 seconds, exactly what heartbeat or ping-echo addresses.

Starting a replacement worker requires recovery capacity, commonly redundancy or supervision.

Old heartbeat messages would hide failure. Liveness must be current.

If the team cannot observe detection and recovery times, it cannot verify the quality scenario.

Layer bridging is a layered-style performance trade-off, not a recovery tactic.
