Python Essentials: Scripting & Automation

1

Hello, Python!

Why this matters

You already write C++ and shell scripts, but Python is the language of choice when you need to get something done fast — process a CSV, call an API, prototype an algorithm. It now ranks among the world’s top 5 most widely used languages, which makes learning it a great investment of your time. Before you can write Python idiomatically, you need a feel for how its execution model differs from what you already know.

🎯 You will learn to

Apply Python’s interpreted execution model by running your first script
Contrast Python’s syntax (no semicolons, no main(), indentation-based) with C++ and Bash

Here is how Python fits into your toolkit alongside C++ and Bash:

Aspect	C++	Bash	Python
Typing	Static (`int x`)	Untyped strings	Dynamic (`x = 5`)
Memory	Manual (`new`/`delete`)	N/A	Garbage-collected
Run with	Compile → `./app`	`bash script.sh`	`python3 script.py`
Strength	Speed, systems code	Glue commands together	Rapid prototyping, data, automation

Python is the language of choice when you need to get something done fast: process a CSV, call an API, write a test harness, or prototype an algorithm before porting it to C++. Very large systems, or systems with strict performance requirements, are often better implemented in statically typed, compiled languages such as C++ or Rust, which catch many bugs at compile time and run faster. Even so, Python has grown significantly in popularity in recent years and now ranks among the five most widely used programming languages; some surveys place it first. Learning it is a great investment of your time!

A Note About Errors

You will see many error messages in this tutorial. That is completely normal — every programmer, from beginner to expert, spends a large part of their time reading errors and debugging. Error messages are Python telling you exactly what to fix. Read them carefully; they are your most useful debugging tool. If you are not stuck at least some of the time, you are not learning.

Your First Python Script

Python’s print() is the equivalent of C++’s printf() / cout and Bash’s echo:

# Bash:   echo "Hello, World!"
# C++:    printf("Hello, World!\n");
# Python:
print("Hello, World!")

Notice there are no semicolons, no #include, and no main() function. Python scripts run top-to-bottom like shell scripts.
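A minimal sketch of what top-to-bottom execution means in practice. The helper name shout is hypothetical and not part of the course files. Because Python runs each statement in order, a function exists only once its def line has executed; there is no forward declaration as in C++.

```python
# Statements run in order, so a name exists only after its definition has run.
try:
    shout("too early")            # shout has not been defined yet at this point
except NameError as err:
    early_error = str(err)
    print(f"NameError: {err}")

def shout(message):               # hypothetical helper, defined only now
    return message.upper() + "!"

print(shout("now it works"))      # the same call succeeds after the def has run
```

The first call fails with a NameError even though the definition appears later in the same file, which is the same behavior you would see in a Bash script that calls a function before defining it.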

Predict Before You Run

Before changing anything, look at hello.py and predict: what will Python print when you click Run? Try it now and compare.

Task

Open hello.py. Change the message so it prints:

Hello, CS 35L!

Then click ▶ Run (or press Ctrl+Enter) to execute your script and see the output.

Starter files

hello.py

# Task: Change the message to "Hello, CS 35L!"
print("Hello, World!")

Solution

hello.py

# Task: Change the message to "Hello, CS 35L!"
print("Hello, CS 35L!")

Why this is correct:

print("Hello, CS 35L!"): Python’s print() is the direct equivalent of C++’s printf() / cout and Bash’s echo. The test checks that the exact string "Hello, CS 35L!" appears in the output.
Python scripts run top-to-bottom with no main() function, no #include, and no semicolons — unlike C++. This is the same execution model as a Bash script.
The string is surrounded by double quotes; Python accepts both single and double quotes interchangeably.

Step 1 — Knowledge Check

Min. score: 80%

1. A C++ programmer sees this Python file and says: “This must be wrong — there’s no main() function and no semicolons.” What should you tell them?

Python requires a main() function but it is inferred automatically
Scripting languages (Python, Ruby, Bash) execute the file top-to-bottom — there is no inferred entry point. The if __name__ == '__main__' idiom is a convention for when a file is also imported as a module, not a required entry point. The C/C++/Java requirement of a main() is a language-design choice, not a property of all languages.
Python scripts run top-to-bottom — no main(), no semicolons
Python is actually compiled, it just hides the main() function internally
CPython does compile each script to .pyc bytecode at runtime, but transparently — the programmer never invokes a compiler, no main() is generated, and there is no separate build artifact to ship. C++ requires an explicit, ahead-of-time compile step that produces a native binary; Python does not.
The programmer is correct — Python requires semicolons in production code
There is no production-vs-script Python. Python’s grammar accepts an optional ; to put two statements on one line, but it is never required. A teammate insisting otherwise is recalling a different language (likely C, Java, or JavaScript).

Python is an interpreted scripting language. Like Bash, it executes statements from top to bottom. There is no required main() entry point (though you can simulate one with if __name__ == '__main__': ...). Semicolons are optional in Python and almost never used.

2. Which of the following statements about Python are correct? (select all that apply)

Python is garbage-collected, so you never call delete or free()
Python is dynamically typed — you do not declare variable types
Python must be compiled before running, just like C++
CPython does produce .pyc bytecode, but transparently and at runtime — the programmer never invokes a compiler and there is no separate build artifact to ship. C++ requires an explicit, ahead-of-time compile step that produces a native binary; Python does not.
Python is strong at rapid prototyping, automation, and data processing

Python is an interpreted language — you run it directly with python3 script.py with no separate compile step. Behind the scenes CPython does compile to bytecode (.pyc), but this is invisible to the programmer.

3. In which scenario is Python a better choice than a shell script?

Renaming 10 files using a simple glob pattern
Starting and stopping system services
Parsing a 50-column CSV and writing a report
Chaining three Unix commands with a pipe

Shell scripts excel at chaining Unix commands. Python excels at anything involving data structures, algorithms, or complex logic, such as parsing structured data, calling APIs, or processing text with conditionals and loops. Parsing a wide CSV and producing a report is exactly where Python shines over Bash.
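To make the CSV case concrete, here is a small sketch using the standard-library csv module. The column names and sample rows are invented for illustration, and the data is read from an in-memory string so the example is self-contained; a real script would open a file instead.

```python
import csv
import io

# Hypothetical sample data standing in for a file on disk.
raw = io.StringIO(
    "name,score\n"
    "Alice,95\n"
    "Bob,62\n"
    "Carol,83\n"
)

rows = list(csv.DictReader(raw))               # each row becomes a dict keyed by header
scores = [int(row["score"]) for row in rows]   # CSV fields arrive as strings
average = sum(scores) / len(scores)

print(f"Students: {len(rows)}")
print(f"Average:  {average:.1f}")
```

Doing the same in Bash would likely mean stitching together cut, awk, and arithmetic expansion, with the column handling spread across several tools.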

4. A teammate is choosing between Python and C++ for a new project. The project needs to process 10 GB of sensor data as fast as possible in real time, with strict latency requirements. Another teammate suggests Python because “it’s easier.” Evaluate both suggestions. Which response best captures the trade-off?

Python is always slower than C++, so C++ is the only correct choice for any project with performance requirements
Real systems are layered: a slow glue layer driving a fast hot path is the standard pattern (NumPy in Python wraps C; PyTorch wraps CUDA). Choosing C++ for every line because the hot path needs it picks the wrong scope for the decision — most code in any project isn’t latency-critical.
Python is fine for real-time processing — modern hardware makes the speed difference between Python and C++ negligible
CPython is roughly 30–100× slower than C++ for tight numeric loops; that gap is intrinsic to the interpreter, not something hardware closes over time. Real-time latency budgets (sub-millisecond) cannot absorb a 30× constant factor regardless of how fast the CPU gets.
C++ for the real-time core; Python for prototyping, config, and visualization
They should use Bash — piping data between Unix tools is faster than either Python or C++ for data processing
Pipes are great for line-oriented text streams, but a 10 GB sensor stream with strict latency needs zero-copy buffer management, fixed-rate scheduling, and concurrency primitives — none of which cat | awk | grep provides. Unix tools shine when the data is text and the timing is loose; that’s the opposite of this scenario.

This is a real-world trade-off. Python’s strength is rapid development; C++’s strength is raw performance. For strict latency requirements, C++ is likely needed for the hot path. But Python is excellent for prototyping, data exploration, and glue code around the performance-critical core. Many real systems combine both.

2

Variables, Types & f-Strings

Why this matters

Python’s dynamic typing eliminates the declaration ceremony you write every day in C++, but it does not make Python “weakly typed” — a confusion that traps C++ programmers and produces hard-to-find bugs. f-strings are the modern, readable way to format output, and they are far more compact than printf or cout << chains.

🎯 You will learn to

Apply Python’s dynamic typing to assign and inspect variables without declarations
Analyze the difference between dynamic typing and weak typing
Create formatted output using f-strings

Bridging Your C++ Mental Model

No Type Declarations

In C++ every variable must be declared with its type:

int   score   = 95;
float gpa     = 3.8;
std::string name = "Alice";

In Python, you just assign. Python infers the type:

score = 95        # int
gpa   = 3.8       # float
name  = "Alice"   # str

You can always check the type at runtime: print(type(score)) → <class 'int'>.

String Quotes: `"..."` and `'...'` Are Interchangeable

In C++, single quotes and double quotes mean different things: 'A' is a char, while "Alice" is a string literal (an array of const char that is often converted to std::string). Mixing them up usually produces a compile error or a warning.

In Python, single and double quotes are completely interchangeable for strings — there is no char type:

name = "Alice"    # str
name = 'Alice'    # also str — identical result

This is handy when your string itself contains quotes:

msg = "It's easy"          # double quotes avoid escaping the apostrophe
html = '<div class="box">' # single quotes avoid escaping the double quotes

In C++ you would have to escape the inner double quotes: "<div class=\"box\">". (An apostrophe inside a double-quoted C++ string needs no escape.) Python lets you pick whichever quote style avoids the clash.

Convention: Most Python style guides (including PEP 8) accept either, but recommend picking one and being consistent. You’ll see both in the wild.

⚠️ Dynamic ≠ Weak: Python Still Has Type Rules

Python is dynamically typed (you don’t declare types) but strongly typed (it won’t silently convert between incompatible types). This trips up C++ programmers who assume “no declarations” means “no type errors”:

x = "5" + 3    # TypeError: can only concatenate str (not "int") to str

Unlike JavaScript (which would give "53"), Python refuses to guess. You must be explicit: int("5") + 3 → 8 or "5" + str(3) → "53".
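The two properties can be seen side by side in a short sketch. The variable names here are illustrative only. Rebinding a name to a value of a different type is allowed (dynamic typing), while combining incompatible types in one expression is rejected (strong typing).

```python
x = "5"                 # the name x currently refers to a str
x = 3                   # rebinding to an int is fine: types live on values, not names

try:
    result = "5" + 3    # str + int: Python refuses to guess what you meant
except TypeError as err:
    result = None
    print(f"TypeError: {err}")

as_number = int("5") + 3        # explicit conversion to int
as_text = "5" + str(3)          # explicit conversion to str
print(as_number, as_text)
```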

f-Strings — Like C++’s `printf` but Readable

# C++:    printf("Student: %s, GPA: %.1f\n", name, gpa);
# Python: (note the f prefix and {variable} syntax — same idea as shell's $variable)
print(f"Student: {name}, GPA: {gpa:.1f}")

The f"..." string is called an f-string (formatted string literal). It is Python’s idiomatic way to embed expressions inside strings.
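A few more format specifiers, sketched with made-up values. The part after the colon inside the braces follows Python's format specification mini-language, and any expression may appear before the colon.

```python
name = "Alice"
gpa = 3.819
count = 1234567

print(f"{name:>10}|")        # right-align in a field 10 characters wide
print(f"{gpa:.1f}")          # one decimal place
print(f"{count:,}")          # thousands separator
print(f"{gpa * 2:.2f}")      # an arithmetic expression inside the braces
print(f"{name!r}")           # !r inserts repr(name), which includes the quotes
```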

Predict Before You Code

Before writing any code, predict: what will type(3.14) return in Python? What about type("3.14")? Write your predictions down, then verify with print(type(...)) in the editor.

Task

Complete profile.py by replacing the print(...) placeholder with an f-string that produces:

Student: Alice | Year: 2 | Major: Computer Science | GPA: 3.82

Use :.2f inside the braces to format the GPA to two decimal places.

Starter files

profile.py

name  = "Alice"
year  = 2
gpa   = 3.819
major = "Computer Science"

print(f'The type of 3.14 is {type(3.14)}')
print(f'The type of "3.14" is {type("3.14")}')


# TODO: print the line below using a single f-string:
# Student: Alice | Year: 2 | Major: Computer Science | GPA: 3.82
# Hint: format gpa with :.2f inside the braces
print(...)

Solution

profile.py

name  = "Alice"
year  = 2
gpa   = 3.819
major = "Computer Science"

# Using a single f-string with :.2f to format GPA
print(f"Student: {name} | Year: {year} | Major: {major} | GPA: {gpa:.2f}")

Why this is correct:

f"..." prefix: Marks the string as an f-string so {variable} expressions are evaluated and interpolated. The f prefix is analogous to backtick template literals in JavaScript or C++’s printf format specifiers.
{gpa:.2f}: The :.2f format specifier inside the braces tells Python to format gpa as a float with exactly two decimal places. 3.819 rounds to 3.82 in the output, which is what the test checks. The variable still holds the original value 3.819 — the formatting happens only at display time.
Variables, not literals: The test uses AST inspection to ensure you used the variable names (name, year, major, gpa) inside the f-string rather than hard-coding the values as strings.
Dynamic vs. weak typing: Python infers year as int and gpa as float from the assigned values — no type declarations needed. But Python will refuse "Year: " + year (a TypeError) because it won’t silently coerce int to str.

Step 2 — Knowledge Check

Min. score: 80%

1. What does type(3.14) return in Python?

double
float
decimal
number

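type(3.14) returns <class 'float'>. Python's floating-point type is named float, not double, even though CPython typically implements it with a C double. You can always use type(x) to inspect a value's type at runtime, which is a handy debugging tool; the closest C++ equivalent, typeid, relies on runtime type information (RTTI).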

2. Which of the following correctly uses an f-string to print "Price: €12.50"?

print("Price: €" + price)
print(f"Price: €{price:.2f}")
printf("Price: €%.2f", price)
print("Price: %s" % price)

f-strings use the f"..." prefix and embed expressions with {expr}. Format specifiers like :.2f (two decimal places) go inside the braces. The % operator (the last option) is the older printf-style formatting; it still works in Python 3, but f-strings are the modern idiom.

3. A student runs x = "5" + 3 in Python and gets a TypeError. They say: “But Python is dynamically typed — it should convert automatically!” Analyze their misunderstanding. What is wrong with their reasoning?

They are correct — dynamically typed languages should convert between types automatically, so this is a Python bug
Dynamic and weak are independent properties: dynamic describes WHEN types are checked (runtime); weak describes HOW aggressively the language coerces between them. JavaScript is dynamic + weak ("5" + 3 gives "53"); Python is dynamic + strong on purpose — silent coercion is a famous bug source.
Dynamic ≠ weak: Python checks types at runtime but still refuses to coerce str + int
The error happens because x was already declared as a string elsewhere, and Python does not allow reassignment to a different type
In Python, types live on values, not on names — x = "5" then x = 3 is fine. The error here is purely about the + operator’s two operands at the moment of evaluation. Languages where a variable’s type is fixed at declaration (C/C++/Java) make this rule look stricter than it actually is.
Python only allows concatenation through the explicit concat() function, not the + operator which is reserved for numbers

This is a critical distinction: dynamic typing (types checked at runtime, not compile time) is different from weak typing (implicit type coercion). Python is dynamic and strong. JavaScript is dynamic and weak ("5" + 3 → "53"). C++ is static and comparatively strong, although it does allow some implicit conversions (for example, int to double). Understanding this prevents a whole class of bugs.

4. A student writes x = 42 in Python. What is the type of x?

integer
int
number
float

Python infers the type from the assigned value. Integer literals like 42 become int. Unlike C++, there is no explicit type declaration — Python does this automatically. You can verify with type(x), which returns <class 'int'>.

3

The Indentation Trap

Why this matters

Indentation is the single most common stumbling block when C++ programmers write Python. In C++ indentation is cosmetic; in Python, indentation is the syntax. Wrong indentation produces an IndentationError and confused students who do not know why their previously-fine code is now broken. Confronting this early prevents weeks of frustration.

🎯 You will learn to

Analyze Python code to identify indentation errors caused by negative transfer from C++
Apply correct indentation rules (4 spaces, never mixed with tabs) to fix block structure

⚠️ The Indentation Trap (Negative Transfer from C++)

In C++, indentation is cosmetic — the compiler ignores it, {} defines blocks. In Python, indentation IS the syntax. Wrong indentation = IndentationError.

# C++ programmer's instinct (WRONG in Python):
if score >= 90:
print("A")          # IndentationError: expected an indented block

# Correct Python:
if score >= 90:
    print("A")      # 4 spaces (or 1 tab — never mix them!)

Rule: Use 4 spaces per indent level. Never mix tabs and spaces.

Every block-opening statement (if, elif, else, for, while, def, class, …) ends with a : and the body must be indented one level further.
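Indentation does more than avoid errors: it decides which block a statement belongs to. The following sketch, with invented data, runs the same statement once inside a loop body and once after it, differing only in indentation.

```python
scores = [95, 40]

# Version 1: the second append is indented under the for, so it runs every iteration.
inside = []
for score in scores:
    inside.append(f"score {score}")
    inside.append("checked")

# Version 2: the same statement, dedented, runs once after the loop has finished.
outside = []
for score in scores:
    outside.append(f"score {score}")
outside.append("checked")

print(inside)
print(outside)
```

Both versions are valid Python, so no error is raised in either case. In C++ the equivalent difference would come from where the closing brace sits.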

Task: Fixer Upper

The file grades.py below has two bugs:

An indentation error inside the if block
A type error in one of the print statements

Fix both bugs so the script prints the correct letter grade for each score.

Starter files

grades.py

# Fixer Upper: Find and fix the two bugs in this script.
# Bug 1: Indentation error
# Bug 2: Type error in a print statement

scores = [95, 83, 71, 62, 55]

for score in scores:
    if score >= 90:
    print(f"Score {score}: A")
    elif score >= 80:
        print("Score " + score + ": B")
    elif score >= 60:
        print(f"Score {score}: C")
    else:
        print(f"Score {score}: F")

Solution

grades.py

# Fixer Upper: both bugs fixed
scores = [95, 83, 71, 62, 55]

for score in scores:
    if score >= 90:
        print(f"Score {score}: A")    # Bug 1 fixed: indented 8 spaces
    elif score >= 80:
        print(f"Score {score}: B")    # Bug 2 fixed: f-string instead of + concatenation
    elif score >= 60:
        print(f"Score {score}: C")
    else:
        print(f"Score {score}: F")

Why this is correct:

Bug 1 — indentation error: The original print(f"Score {score}: A") was at the same indentation level as if score >= 90:, which is an IndentationError. The body of an if block must be indented one level further. Python uses indentation (4 spaces) instead of {} to define blocks — this is the most common negative-transfer mistake from C++.
Bug 2 (type error): The original print("Score " + score + ": B") fails with TypeError: can only concatenate str (not "int") to str. Unlike JavaScript, Python will not silently convert score (an int) to a string when concatenating. The fix is to use an f-string: f"Score {score}: B", which formats the value as text for you.
The tests verify that scores 95, 83, and 71 produce the correct letter grades A, B, and C respectively.

Step 3 — Knowledge Check

Min. score: 80%

1. A student writes the following Python and gets IndentationError: expected an indented block:

for item in inventory:
print(item)

What is the fix?

Add a semicolon at the end of the for line
Add braces: for item in inventory: { print(item) }
Indent print(item) with 4 spaces so it is inside the for block
Use for (item in inventory) C-style syntax

Python uses indentation to define blocks, not braces. Any statement inside a for, if, or def must be indented by at least one consistent level (4 spaces is the convention). Forgetting this is the most common mistake for students coming from C++ or Java.

2. In Python, what marks the start of a new indented block (instead of { in C++)?

An opening brace { — same as C++ and Java
The begin keyword — like Pascal or Ruby
A colon : at the end of the control statement
A semicolon ; followed by increased indentation

Every block-opening statement (if, for, while, def, class, …) ends with a colon :. The body of the block is then indented one level. There are no braces — the indentation alone defines where the block ends. This is unlike C++, Java, or JavaScript.

3. A student accidentally mixes tabs and spaces for indentation in the same Python file. What will happen when they run it?

Python auto-converts tabs to spaces and runs fine
Python 3’s parser refuses to guess whether a tab counts as 1, 4, or 8 spaces, because every guess could change the program’s meaning. Auto-conversion would silently re-bind code to a different block — the language designers chose loud failure over silent miscompilation. (Editor settings can enforce a convention, but the parser still won’t second-guess what’s on disk.)
The code runs but indented blocks are silently skipped
Python halts on any indentation inconsistency at parse time, before any code runs — there is no partial execution where some blocks are skipped. The runtime never sees a half-broken program; it sees the SyntaxError instead.
Python raises a TabError or IndentationError
Only the lines with tabs produce output
Python parses the whole file before running anything; there is no per-line execution where some lines succeed and others fail. The whole script either parses or doesn’t.

Mixing tabs and spaces is a syntax error in Python 3. Python raises TabError: inconsistent use of tabs and spaces in indentation. Always use 4 spaces (the universal Python convention) and configure your editor to insert spaces when you press Tab.
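You can observe the failure without creating a broken file. This sketch passes a source string to the built-in compile(), which only parses the code; the source text is a contrived example in which one body line is indented with a tab and the next with eight spaces.

```python
# Source text whose two body lines are indented differently:
# the first with one tab, the second with eight spaces.
source = "if True:\n\tx = 1\n        y = 2\n"

try:
    compile(source, "<demo>", "exec")   # parse only; nothing is executed
    outcome = "compiled"
except TabError as err:
    outcome = f"TabError: {err}"

print(outcome)
print(issubclass(TabError, IndentationError))   # TabError is a kind of IndentationError
```

Because the error is raised while parsing, none of the statements in the source string ever run.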

4. A teammate argues: “Python’s indentation-as-syntax is worse than C++’s braces because you can’t see block boundaries as clearly.” Another teammate replies: “It’s better because it forces everyone to format consistently.” Evaluate both claims. Which assessment is most accurate?

The first teammate is right — braces are always superior because you can collapse blocks and see structure without relying on whitespace
The second teammate is right — indentation-as-syntax is strictly better because it eliminates an entire category of bugs with zero tradeoffs
Both: indentation enforces consistency but causes bugs when copy-pasting or mixing tab settings
Neither is right — the choice of block syntax has no practical effect on code quality

This is a genuine trade-off. Python’s indentation rule eliminates entire classes of formatting debates and ensures code looks like what it does. But it introduces risks when copy-pasting from web pages (which may mix tabs/spaces) or when editors silently convert between them. The key practice: configure your editor to insert 4 spaces for Tab.

4

Functions

Why this matters

Functions are how you compose larger programs. Python’s def syntax is briefer than C++’s — no return type, no parameter types required — but the trade-off is that mistakes surface at runtime instead of compile time. Default parameters let you write APIs that are short to call in the common case and explicit when callers need control.

🎯 You will learn to

Apply def syntax to implement Python functions with optional type hints
Create functions with default parameter values and use them with positional or keyword arguments
Contrast Python’s def signature with C++ function signatures

Functions: `def` vs C++ Signatures

In C++ you must specify return types and parameter types:

int add(int a, int b) { return a + b; }

In Python you just use def. Types are optional (you can add them as type hints, but they are not enforced):

# SUB-GOAL: Define the function with its parameters
def add(a, b):
    # SUB-GOAL: Compute and return the result
    return a + b          # No type declarations required

# With optional type hints (documents intent, not enforced at runtime):
def add(a: int, b: int) -> int:
    return a + b

Default Parameters

A parameter can have a default value, used when the caller omits that argument. Default parameters must come after required ones — the same rule as in C++.

def greet(name, greeting="Hello"):
    print(f"{greeting}, {name}!")

greet("Alice")             # → Hello, Alice!   (uses default)
greet("Bob", "Welcome")    # → Welcome, Bob!   (overrides default)
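Arguments can also be passed by keyword, which is useful when a function has several defaults. The sketch below is a hypothetical variant of greet that returns the string instead of printing it and adds a second default parameter, punctuation, purely for illustration.

```python
def greet(name, greeting="Hello", punctuation="!"):
    return f"{greeting}, {name}{punctuation}"

print(greet("Alice"))                      # positional; both defaults are used
print(greet("Bob", "Welcome"))             # second positional overrides greeting
print(greet("Carol", punctuation="?"))     # keyword skips over greeting
print(greet(greeting="Hi", name="Dave"))   # keywords may appear in any order
```

Keyword arguments make the call site self-documenting, something C++ default arguments do not offer, since C++ callers can only omit trailing arguments.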

Predict Before You Code

Before writing any code, predict: what does mean([4, 8, 15, 16, 23, 42]) return? Do the mental math, write your answer down, then check it after implementing.

Task

Complete two functions in functions.py:

mean(numbers) — returns the arithmetic mean. Hint: sum() and len() are built-in Python functions — no import needed. Python ships dozens of these (builtins) that are always available, similar to how printf is always available in C via <stdio.h> — except builtins require no #include at all.
label_score(score, threshold=50) — returns "pass" if score >= threshold, otherwise "fail".

What does pass mean? In Python, pass is a do-nothing placeholder that makes an otherwise empty function or block body syntactically valid — the same idea as leaving a C++ function body as { }. The starter code uses pass to mark every spot you need to fill in. Replace every pass with your real implementation — no pass statements should remain in your final solution.

Starter files

functions.py

def mean(numbers):
    """Return the arithmetic mean of a list of numbers."""
    # TODO: implement using sum() and len()
    pass

def label_score(score, threshold=50):
    """Return 'pass' if score >= threshold, else 'fail'."""
    # TODO: implement using an if/else
    pass

# --- Quick self-test ---
data = [4, 8, 15, 16, 23, 42]
print(f"Data: {data}")
print(f"Mean: {mean(data)}")
print(f"Score 75: {label_score(75)}")
print(f"Score 30: {label_score(30)}")
print(f"Score 75 (threshold=80): {label_score(75, 80)}")

Solution

functions.py

def mean(numbers):
    """Return the arithmetic mean of a list of numbers."""
    return sum(numbers) / len(numbers)

def label_score(score, threshold=50):
    """Return 'pass' if score >= threshold, else 'fail'."""
    if score >= threshold:
        return 'pass'
    else:
        return 'fail'

# --- Quick self-test ---
data = [4, 8, 15, 16, 23, 42]
print(f"Data: {data}")
print(f"Mean: {mean(data)}")
print(f"Score 75: {label_score(75)}")
print(f"Score 30: {label_score(30)}")
print(f"Score 75 (threshold=80): {label_score(75, 80)}")

Why this is correct:

mean: sum(numbers) and len(numbers) are Python built-ins. In Python 3, / always performs float division (sum / len returns a float), so mean([4, 8, 15, 16, 23, 42]) returns 18.0, not 18. The test checks == 18.0. This is different from C++ where int / int would be integer division.
label_score with default parameter: threshold=50 is a default parameter — calling label_score(75) uses 50 as the threshold (returns 'pass'), while label_score(75, 80) overrides it with 80 (returns 'fail'). Default parameters must always come after required parameters in the signature.
return is explicit: Unlike C++ (which has undefined behavior for missing return), Python functions without return silently return None. You must write return 'pass' explicitly.
def vs C++: Python’s def requires no return type or parameter types — Python infers types dynamically at runtime.
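Two of the points above are easy to check directly. In this sketch the function names no_return and with_return are hypothetical; the division lines contrast / with // on values that do not divide evenly.

```python
def no_return(x):
    x * 2                 # the product is computed, then discarded

def with_return(x):
    return x * 2

print(no_return(5))       # None
print(with_return(5))     # 10

print(7 / 2)              # 3.5  (true division, always a float)
print(7 // 2)             # 3    (floor division)
print(-7 // 2)            # -4   (floors toward negative infinity; C++ truncates to -3)
```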

Step 4 — Knowledge Check

Min. score: 80%

1. What is the output of the following code?

def describe(item, label="unknown"):
    return f"{item} is {label}"

print(describe("gold", "rare"))
print(describe("rock"))

gold is rare then rock is unknown
gold is rare then rock is rare
SyntaxError — default parameters must come before non-default
gold is unknown then rock is unknown

label="unknown" is a default parameter. When describe("rock") is called without a second argument, label falls back to "unknown". When describe("gold", "rare") is called, label is set to "rare".

2. A C++ programmer writes a Python function and is confused that it “doesn’t return anything”:

def double(x):
    x * 2
print(double(5))  # prints None

Analyze the bug. What went wrong, and how does this differ from C++?

Python functions cannot perform multiplication — the * operator only works for string repetition
The function is missing return. In Python, no return means None
double is a reserved word in Python (like C++’s double type), so it shadows the function definition
The function needs a type annotation like def double(x: int) -> int: before Python will return a value

In C++, forgetting return in a non-void function is undefined behavior — the compiler may warn you, but the code might appear to work. In Python, the behavior is defined but surprising: a function without return always returns None. You must explicitly write return x * 2. This is a common mistake when switching languages.

3. What does mean([10, 20]) return if mean is defined as return sum(numbers) / len(numbers)?

15 (an int)
15.0 (a float)
[15] (a list)
TypeError — sum() doesn’t work on lists

In Python 3, / always performs float division: 30 / 2 → 15.0. In C++, dividing two ints performs integer division: 30 / 2 → 15 and 7 / 2 → 3. Python uses // for integer (floor) division: 30 // 2 → 15.

4. (Spaced review — Step 1: Python Execution Model) A teammate is confused: “I wrote a Python file with a helper function and some test prints, but when I import it from another file, all the test prints run too.” What should they use to prevent this?

Move the test prints into a main() function — Python automatically detects and skips main() during import
Wrap them in if __name__ == '__main__': — runs only when executed directly
Use #pragma once at the top of the file to prevent double execution, similar to C++ header guards
Add import guard at the top — this is Python’s built-in mechanism to prevent code from running during import

Python scripts run top-to-bottom (like Bash). When imported, all top-level code executes. if __name__ == '__main__': is the standard Python idiom to separate “run as script” code from “importable” code. C++ doesn’t have this problem because #include only brings in declarations, not executable statements.
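A minimal sketch of the idiom, assuming a file saved under the hypothetical name helpers.py. The function names square and run_checks are invented for illustration.

```python
# Imagine this file is saved as helpers.py (a hypothetical name).

def square(n):
    return n * n

def run_checks():
    print(f"square(4) = {square(4)}")

if __name__ == "__main__":
    # True only when the file is run directly, e.g. python3 helpers.py.
    # On "import helpers", __name__ is "helpers" and this block is skipped.
    run_checks()
```

Another file can then write import helpers and call helpers.square(3) without triggering the test print.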

5. Arrange the lines to define a function that returns the larger of two numbers, with a default for b. (arrange in order)

Correct order:

def max_of(a, b=0):
    if a >= b:
        return a
    else:
        return b

Distractors (not used):

return a, b

The function signature comes first with the default parameter b=0. The if/else block must be indented inside the function. The return statements must be indented inside their respective branches. The distractor return a, b would return a tuple, not the max.

5

Type Hints

Why this matters

Dynamic typing is fast to write but easy to break. Type hints give you a middle ground: contracts that document your intent, that IDEs use for autocomplete, and that mypy enforces statically — without sacrificing Python’s flexibility. They are how serious Python codebases stay maintainable as they grow.

🎯 You will learn to

Apply type hint syntax to annotate Python function parameters and return values
Analyze why Python type hints are checked by external tools (mypy, IDEs) rather than by the interpreter at runtime

A Bridge from C++ Types

In C++, types are part of the contract the compiler enforces:

double mean(std::vector<double> numbers);   // compiler rejects mean("abc")

Python lets you write the same kind of contract — but it is checked by external tools (mypy, IDEs like PyCharm and VS Code/Pyright), not by the Python interpreter. The annotations live on the function but Python itself ignores them at runtime.

def mean(numbers: list[float]) -> float:
    return sum(numbers) / len(numbers)

Read this as: “numbers is annotated as a list of float; this function is annotated to return a float.” Python stores those annotations on mean.__annotations__ but never raises a TypeError from them.
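The stored annotations can be inspected like any other attribute. This sketch assumes Python 3.9 or later, since it uses list[float] as a built-in generic.

```python
def mean(numbers: list[float]) -> float:
    return sum(numbers) / len(numbers)

hints = mean.__annotations__       # a plain dict mapping names to annotations
print(hints)
print(hints["return"] is float)    # the return annotation is the float class itself
print(mean([1.5, 2.5]))            # the function runs normally; hints are not consulted
```

Tools such as mypy read the same information from the source code rather than from the running program.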

Built-in Generics vs. the `typing` Module

Since Python 3.9, you can use the built-in collections directly as generics — no import needed:

def biggest(scores: list[int]) -> int: ...
def lookup(table: dict[str, int], key: str) -> int: ...

For “could be int or None” (a common case), import from typing:

from typing import Optional

def first_failing(scores: list[int], threshold: int = 50) -> Optional[int]:
    """Return the first failing score, or None if everyone passed."""
    ...

Optional[int] is equivalent to int | None. Python 3.10+ accepts the int | None spelling directly in annotations, so on those versions either form works.
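An Optional return type signals that callers need to handle None. The following sketch uses a hypothetical function, lookup_age, and invented data to show both the annotation and the check on the calling side.

```python
from typing import Optional

def lookup_age(ages: dict[str, int], name: str) -> Optional[int]:
    """Return the age stored for name, or None if the name is absent."""
    return ages.get(name)       # dict.get returns None for a missing key

ages = {"Alice": 20, "Bob": 22}
for person in ("Alice", "Zed"):
    age = lookup_age(ages, person)
    if age is None:             # callers handle the None case explicitly
        print(f"{person}: unknown")
    else:
        print(f"{person}: {age}")
```

A static checker can use the Optional[int] annotation to flag code that treats the result as an int without first ruling out None.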

Predict Before You Run

What do you think happens at runtime when this is called with strings?

def add(a: int, b: int) -> int:
    return a + b

add("hello", "world")    # ← what does Python do here?

Predict first — actually write your prediction down or say it aloud — then try it in the editor. Most learners coming from C++ predict that Python rejects the call. Being wrong here is the lesson, not a failure: your C++ instinct is exactly what we are tuning. The answer is illuminating: Python does not raise a TypeError from the annotation. The + between two strings happily concatenates them. The annotation is documentation. The check happens when mypy (or your IDE) reads the source — not when Python runs it.

Task

Complete typed_grades.py. The functions are recycled from Step 4 — your job is to add type hints without changing any of the logic.

Add hints to mean(numbers) so it accepts a list[float] and returns a float.
Add hints to label_score(score, threshold=50) — both parameters are int, return is str. Remember the order: name: type = default.
Add hints to first_failing(scores, threshold=50) — return type is Optional[int] (and don’t forget from typing import Optional).
Predict, then run. At the bottom of the file, uncomment the probe print(mean(['a', 'b'])). Before you run it, write down what you predict happens — does Python raise an error? If so, where does the error come from (the annotation, or the function body)? Then run, and compare to your prediction. This step is the lesson; do not skip it.

Starter files

typed_grades.py

# Goal: add type hints to each function. The behavior is already correct.
# TODO: import Optional from typing (you'll need it for first_failing)

def mean(numbers):                              # TODO: annotate numbers and return type
    return sum(numbers) / len(numbers)

def label_score(score, threshold=50):           # TODO: annotate score, threshold, return type
    if score >= threshold:
        return 'pass'
    return 'fail'

def first_failing(scores, threshold=50):        # TODO: annotate — return type is Optional[int]
    """Return the first score below threshold, or None if all pass."""
    for s in scores:
        if s < threshold:
            return s
    return None

# --- Quick self-test ---
print(f"Mean:           {mean([4, 8, 15, 16, 23, 42])}")
print(f"Label 75:       {label_score(75)}")
print(f"First failing:  {first_failing([90, 80, 30, 70])}")

# --- Step 4 (required): predict, then uncomment ---
# Predict FIRST: does Python raise an error? If so, from where?
# Then uncomment and run, and compare to your prediction.
# print(mean(['a', 'b']))

Solution

typed_grades.py

from typing import Optional

def mean(numbers: list[float]) -> float:
    return sum(numbers) / len(numbers)

def label_score(score: int, threshold: int = 50) -> str:
    if score >= threshold:
        return 'pass'
    return 'fail'

def first_failing(scores: list[int], threshold: int = 50) -> Optional[int]:
    """Return the first score below threshold, or None if all pass."""
    for s in scores:
        if s < threshold:
            return s
    return None

# --- Quick self-test ---
print(f"Mean:           {mean([4, 8, 15, 16, 23, 42])}")
print(f"Label 75:       {label_score(75)}")
print(f"First failing:  {first_failing([90, 80, 30, 70])}")

# Step 4 probe (left commented — uncommenting crashes the file):
# print(mean(['a', 'b']))
#   → TypeError: unsupported operand type(s) for +: 'int' and 'str'
# The error comes from `sum(numbers)`, not from the annotation.
# Python ran the call; mypy would have flagged it at edit-time.

Why this is correct:

numbers: list[float] uses Python 3.9+ built-in generic syntax, so no from typing import List is needed. The legacy typing.List[float] still works but requires that import and has been deprecated since Python 3.9.
-> float declares the return type. sum(...) / len(...) always yields a float in Python 3 (/ is float division), so the annotation is honest.
threshold: int = 50 combines a type hint with a default value. The order is name: type = default.
Optional[int] is the idiom for “either an int or None.” It is shorthand for int | None (which also works on Python 3.10+).
Annotations are inert at runtime. Try the commented mean(['a', 'b']) probe — Python does not raise a TypeError from the annotation. The exception comes from inside sum, when + between the initial 0 and a string fails. Tools like mypy would flag the call before you run it.
Annotations are stored, though — you can inspect them: mean.__annotations__ returns something like {'numbers': list[float], 'return': <class 'float'>}.

Step 5 — Knowledge Check

Min. score: 80%

1. What is the most useful type annotation for this function?

def parse_csv_row(line):
    return line.split(',')

def parse_csv_row(line: str) -> list[str]:
def parse_csv_row(line: str) -> str:
split(',') returns a list, not a str. The string is the input here, not the output.
def parse_csv_row(line: List) -> tuple[str]:
split() returns a list, not a tuple. Also, capital-L List would require from typing import List — modern Python uses lowercase list[str].
def parse_csv_row(line) -> list:
Bare list is better than nothing, but list[str] is more informative — it tells static checkers what the element type is, so callers can be flagged for passing the wrong shape.

str.split(',') returns a list of strings. The Pythonic, modern annotation is list[str] — Python 3.9+ built-in generic. Both list[str] and List[str] work, but list[str] needs no import.

2. What happens at runtime when you call add('1', '2') on this function?

def add(a: int, b: int) -> int:
    return a + b

Python raises TypeError: argument 'a' must be int, not str because of the type annotation
Annotations are stored on add.__annotations__ but Python never raises a TypeError from them. That check is the job of mypy or your IDE, not the interpreter.
Python returns the string '12' — annotations are ignored at runtime
Python raises SyntaxError — type annotations only work with literal types like int(1)
Type annotations are part of Python’s syntax (PEP 526 / PEP 3107) — there’s no SyntaxError. The whole point is that they parse fine but are not enforced.
Python silently coerces the strings to integers and returns 3
Python does not auto-coerce here. '1' + '2' is valid string concatenation, so the function happily returns '12'. No coercion, no error from the interpreter.

Annotations are stored but never checked at runtime — Python returns '12' (string concatenation). A static checker like mypy would flag the call before you run it. This is the runtime-vs-static distinction at the heart of type hints.

3. Given two annotated functions:

def add(a: int, b: int) -> int:
    return a + b

def repeat(s: str, n: int) -> str:
    return s * n

For which calls would mypy flag a type error but Python execute without raising? (Select all that apply.)

add(1, 2) — both 1 and 2 are int
Both arguments match the annotations and the runtime succeeds. Nothing to flag and nothing to raise.
add('a', 'b') — passing strings where ints are annotated
repeat('hi', 3) — both arguments match
Both arguments match the annotations and 'hi' * 3 is valid string repetition. Quiet on both sides.
repeat('hi', '3') — passing a string where int is annotated
mypy would flag this — but Python also raises TypeError: can't multiply sequence by non-int of type 'str'. The runtime error is real, just from *, not from the annotation. The question asks for cases where Python runs without raising; this one isn’t.

Only add('a', 'b') is silently accepted by Python ('a' + 'b' → 'ab') while mypy would flag it as a type error. The other cases either match the annotations (no flag, no error) or fail at runtime for a different reason than the annotation. The lesson: annotations are read by tools, not the interpreter — but the interpreter still has its own opinions about what operations are legal between which types.

4. (Spaced review — Step 4: Functions) Which function signature correctly combines type hints with a default parameter?

def greet(name: str, greeting = 'Hello': str) -> None:
The order is name: type = default — the annotation goes between the name and the equals sign, not after the default value.
def greet(name: str, greeting: str = 'Hello') -> None:
def greet(name = 'World': str, greeting: str) -> None:
Defaulted parameters must come after required ones (same rule as in C++).
def greet(name, greeting): -> None # type: str, str
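Modern Python uses inline annotations (PEP 3107 for function parameters and return values; PEP 526 covers variables). The # type: comment style is a legacy form from PEP 484, and a -> None placed after the colon is a syntax error in any case.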

The correct order is name: type = default. Defaults must come after required parameters, and the return type goes after -> before the colon.

5. A teammate calls first_n([1.5, 2.5, 3.5], 2) against this annotated function:

def first_n(items: list[int], n: int) -> list[int]:
    return items[:n]

What does the call return at runtime, and what would mypy say at edit-time?

Runtime: returns [1.5, 2.5] (annotation ignored). mypy: error — list[float] is not list[int].
Runtime: TypeError (annotation enforced). mypy: error.
Python does not enforce annotations at runtime — they are documentation that tools read.
Runtime: returns [1.5, 2.5]. mypy: ok (floats can be passed where ints are expected).
mypy treats list[float] and list[int] as distinct (they’re invariant in their type parameter, per PEP 484). It would flag this call as an error.
Runtime: returns [1, 2] (Python silently truncates floats to ints). mypy: ok.
Python doesn’t silently coerce values to match annotations. The list elements stay as floats.

Annotations are checked by tools (mypy, IDEs), not by the Python interpreter. Runtime: slicing a list works regardless of its element type, so you get [1.5, 2.5]. mypy: list[float] is not assignable to list[int], so it would flag the call as an error. This is exactly why an external type checker exists.

6

Loops

Why this matters

Iteration is the workhorse of any program. Python’s for is item-based by default — you almost never write for i in range(len(...)) like you would in C++. Mastering enumerate() and range() unlocks idiomatic Python, and avoiding the ** vs ^ and / vs // operator traps will save you hours of confused debugging.

🎯 You will learn to

Apply Python for loops with enumerate() and range() to iterate over collections idiomatically
Analyze the operator differences between Python and C++ (** vs ^, / vs //)

Transfer Note: C++ Range-Based Loops → Python `for`

If you have used modern C++ range-based for (for (auto& x : vec)), Python’s iteration model will feel familiar — Python just makes it the default. The key habit to build: reach for for x in collection first, not for i in range(len(...)).

Tuple Unpacking

Before diving into loops, one quick concept. Python can unpack a pair (or tuple) into separate variables in a single assignment:

pair = (0, "Alice")
i, name = pair        # i = 0, name = "Alice"

This works anywhere Python assigns a value — including in for loops. You will see this pattern immediately below with enumerate().
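A few more forms of the same mechanism, as a small sketch. The names point and pairs are made up for this example.

```python
point = (3, 7)
x, y = point          # x = 3, y = 7

# Swapping needs no temporary variable: the right side is packed first.
x, y = y, x           # x = 7, y = 3
print(x, y)           # 7 3

# Unpacking in a for loop: each element of pairs is a 2-tuple.
pairs = [(0, "Alice"), (1, "Bob")]
for i, name in pairs:
    print(i, name)

# The number of names must match the number of values.
try:
    a, b = (1, 2, 3)
except ValueError:
    print("cannot unpack 3 values into 2 names")
```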

Python `for` Loops: Iterating Over Collections

C++ for loops typically count indices. Python loops iterate over items directly:

// C++: index-based
for (int i = 0; i < nums.size(); i++) { cout << nums[i]; }

# Python: item-based (preferred)
for num in nums:
    print(num)

# Need the index too? enumerate() yields (index, item) pairs.
# Tuple unpacking splits each pair into two loop variables:
for i, num in enumerate(nums):
    print(f"Index {i}: {num}")
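enumerate() also takes an optional start argument, which is handy for human-facing numbering. A brief sketch, with a made-up nums list:

```python
nums = [4, 8, 15]

# Count from 1 instead of 0 for display purposes.
for position, num in enumerate(nums, start=1):
    print(f"Item {position}: {num}")

# The pairs that enumerate() yields are ordinary tuples.
pairs = list(enumerate(nums))
print(pairs)    # [(0, 4), (1, 8), (2, 15)]
```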

`range()` — Generating Integer Sequences

C++ counting loops translate directly to range() in Python:

# C++: for (int i = 0; i < 5; i++) { ... }
for i in range(5):           # i = 0, 1, 2, 3, 4
    print(i)

# C++: for (int i = 1; i <= 5; i++) { ... }
for i in range(1, 6):        # i = 1, 2, 3, 4, 5  (stop is *exclusive*, like C++'s <)
    print(i)

# C++: for (int i = 0; i < 10; i += 2) { ... }
for i in range(0, 10, 2):    # i = 0, 2, 4, 6, 8  (optional step argument)
    print(i)

Key rule: range(start, stop) always includes start and excludes stop — exactly like C++’s i < stop.
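Wrapping a range in list() is a quick way to check what it produces. The sketch below repeats the three forms above and adds a countdown, which the text does not cover but which follows the same start/stop/step rule.

```python
print(list(range(5)))          # [0, 1, 2, 3, 4]
print(list(range(1, 6)))       # [1, 2, 3, 4, 5]
print(list(range(0, 10, 2)))   # [0, 2, 4, 6, 8]

# A negative step counts down; stop is still excluded.
countdown = list(range(5, 0, -1))
print(countdown)               # [5, 4, 3, 2, 1]

# With the default step, the length is stop - start.
print(len(range(3, 8)))        # 5
```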

List Operations (`append`, `remove`, `clear`)

Unlike fixed-size C++ arrays, Python lists are dynamic (like std::vector). A few common operations you will use:

# C++: vec.push_back(5);
# Python:
result = []       # 1. Create an empty list
result.append(5)  # 2. Add an item to the end
result.append(10) # result is now [5, 10]

# Removing items:
result.remove(5)  # Removes the first occurrence of 5 (result is now [10])
                  # (Raises ValueError if 5 is not in the list)

result.clear()    # Empties the entire list (result is now [])
                  # C++: vec.clear();
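Because remove() raises ValueError for a missing value, removals are often guarded. The following sketch shows two common ways to do that; the values are arbitrary.

```python
result = [5, 10, 5]

result.remove(5)          # removes only the first 5
print(result)             # [10, 5]

# Option 1: test membership before removing.
if 99 in result:
    result.remove(99)

# Option 2: attempt the removal and handle the failure.
try:
    result.remove(99)
except ValueError:
    print("99 was not in the list")

print(len(result))        # 2
```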

⚠️ Two Operator Traps from C++

Trap 1: ** for exponentiation — not ^

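Python uses ** for exponentiation. ^ is bitwise XOR, exactly as in C++. Writing ^ for a power is a common slip carried over from math notation, and C++ (which uses pow()) gives you no operator habit to fall back on: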

2 ** 8    # 256  ✓  (two to the eighth power)
9 ** 0.5  # 3.0  ✓  (square root; works on floats)
2 ^ 8     # 10   ✗  (bitwise XOR, NOT exponentiation!)
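A slightly longer sketch of the same trap, including the precedence of ** relative to unary minus. The variable names are illustrative only.

```python
import math

power = 2 ** 10
xor = 2 ^ 10
print(power)              # 1024
print(xor)                # 8, because 0b0010 XOR 0b1010 == 0b1000
print(math.pow(2, 10))    # 1024.0 (math.pow always returns a float)

# ** binds tighter than unary minus, so parenthesize negative bases.
print(-3 ** 2)            # -9
print((-3) ** 2)          # 9
```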

Trap 2: / for float division — not integer division

In C++, 7 / 2 → 3 (integer division). In Python 3, / always gives a float:

7 / 2     # 3.5   (float division, different from C++!)
7 // 2    # 3     (integer/floor division, like C++'s /)
7 % 2     # 1     (modulo, same as C++)
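One caveat for negative operands: // floors toward negative infinity, whereas C++ integer division truncates toward zero. The sketch below shows where the two differ.

```python
print(7 // 2)         # 3
print(-7 // 2)        # -4  (floors toward negative infinity; C++ gives -3)
print(int(-7 / 2))    # -3  (truncates toward zero, matching C++)
print(-7 % 2)         # 1   (result takes the sign of the divisor; C++ gives -1)

# divmod() returns quotient and remainder together.
q, r = divmod(7, 2)
print(q, r)           # 3 1

print(7.0 // 2)       # 3.0 (// floors but keeps the float type)
```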

Predict Before You Code

Before implementing: what does running_total([1, 2, 3]) return? Trace through the loop by hand.

Task

Complete loops.py:

running_total(numbers) — returns a new list where each element is the cumulative sum up to that index. Example: running_total([1, 2, 3]) → [1, 3, 6]. Use a for loop.

Starter files

loops.py

def running_total(numbers: list[int]) -> list[int]:
    """Return a list of cumulative sums.
    Example: running_total([1, 2, 3]) == [1, 3, 6]
    """
    result = []
    total = 0
    for n in numbers:
        # TODO: add n to total, then append total to result
        pass
    return result

# --- Quick self-test ---
data = [4, 8, 15, 16, 23, 42]
print(f"Data:          {data}")
print(f"Running total: {running_total(data)}")

# Verify your understanding of / vs //
print(f"7 / 2  = {7 / 2}")    # What do you predict?
print(f"7 // 2 = {7 // 2}")   # What do you predict?

Solution

loops.py

def running_total(numbers: list[int]) -> list[int]:
    """Return a list of cumulative sums.
    Example: running_total([1, 2, 3]) == [1, 3, 6]
    """
    result = []
    total = 0
    for n in numbers:
        total += n          # add n to the running sum
        result.append(total)  # append the current cumulative total
    return result

# --- Quick self-test ---
data = [4, 8, 15, 16, 23, 42]
print(f"Data:          {data}")
print(f"Running total: {running_total(data)}")

# Verify your understanding of / vs //
print(f"7 / 2  = {7 / 2}")    # 3.5
print(f"7 // 2 = {7 // 2}")   # 3

Why this is correct:

for n in numbers: Python’s for loop iterates over items directly — no index variable needed. This is cleaner than C++’s for (int i = 0; i < nums.size(); i++).
total += n: Adds each element to the running sum before appending.
result.append(total): list.append() is Python’s equivalent of std::vector::push_back(). Appending total (not n) gives the cumulative sum at each position.
result = []: Initializes an empty list. total = 0 is the accumulator. Both must be initialized before the loop.
7 / 2 → 3.5: Python 3’s / always gives a float. For C++-style integer division, use // (7 // 2 → 3). This is one of the most common negative-transfer traps from C++.
The test checks running_total([1, 2, 3]) == [1, 3, 6] — after the first iteration: total = 1, second: total = 3, third: total = 6.
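Once the loop version works, the standard library offers a way to cross-check it: itertools.accumulate produces the same running sums. This is only a verification sketch; the exercise itself still asks for an explicit for loop.

```python
from itertools import accumulate

small = list(accumulate([1, 2, 3]))
print(small)      # [1, 3, 6]

data = [4, 8, 15, 16, 23, 42]
totals = list(accumulate(data))
print(totals)     # [4, 12, 27, 43, 66, 108]
```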

Step 6 — Knowledge Check

Min. score: 80%

1. Which of the following iterates over a list and gives both the index and the item?

for i, x in index(nums):
for i, x in enumerate(nums):
for i in nums.keys():
for i in range(nums):

enumerate(iterable) yields (index, value) pairs. Unpacking them into i, x gives you both at once. This is the Pythonic replacement for C++’s index-based for (int i = 0; i < nums.size(); i++).

2. What does list(range(2, 8, 2)) evaluate to?

[2, 4, 6, 8]
[2, 4, 6]
[2, 3, 4, 5, 6, 7]
[2, 8]

range(start, stop, step) generates numbers from start up to but not including stop, counting by step. So range(2, 8, 2) → 2, 4, 6 (8 is excluded because stop is exclusive). This matches C++’s for (int i = 2; i < 8; i += 2).

3. A C++ programmer expects 6 / 2 to return the integer 3 in Python. What actually happens?

It returns the integer 3 — Python division works just like C++
Python 3 deliberately split the operator: / is always float division, // is always floor division — regardless of whether the operands are int or float. C/C++/Java pick the operator’s behavior based on the operand types, but PEP 238 broke that link in Python 3 precisely because too many learners were surprised by integer truncation.
It returns 3.0 — Python’s / always gives a float; use // for integer division
It raises a TypeError because both operands are integers
Dividing two integers is perfectly legal; / simply returns a float. The TypeError pattern shows up for non-numeric mixing ("5" + 3), not for arithmetic between two numbers.
It returns the fraction object fractions.Fraction(6, 2) — Python automatically converts integer division to a rational number

In Python 3, / is always float division: 6 / 2 → 3.0. For integer (floor) division like C++, use //: 7 // 2 → 3. This is one of the most common negative-transfer traps from C++.

4. What are the values of a and b after this line?

a, b = (3, 7)

a = (3, 7), b is undefined
a = 3, b = 7
a = 7, b = 3
TypeError — cannot assign a tuple to two variables

Python tuple unpacking splits the right-hand side into individual variables left-to-right: a gets 3, b gets 7. This is the same mechanism that lets for i, x in enumerate(...): split each (index, value) pair into two loop variables.

5. (Spaced review — Step 4: Functions) What does this function return when called as compute(10)?

def compute(x: int, power: int = 2) -> int:
    return x ** power

20 — x * power
100 — 10 ** 2
12 — 10 + 2
TypeError — missing required argument

power=2 is a default parameter, so compute(10) uses power=2. 10 ** 2 is 100 (the ** operator is exponentiation, not multiplication). This combines two concepts: default parameters (Step 4) and the ** operator (this step).

7

List Comprehensions

Why this matters

List comprehensions are one of the features that make Python Python. They turn five-line for-loops into a single readable expression, once you can read them. Recognizing the [expr for x in iter if cond] pattern is essential for reading any modern Python codebase, and writing them cleanly is what separates idiomatic Python from “Python written like C++”.

🎯 You will learn to

Create list comprehensions with filters using the [expr for x in iter if cond] pattern
Analyze when a comprehension is clearer than the equivalent for-loop and when it is not

Comprehensions Look Strange at First

List comprehensions are one of Python’s most powerful idioms, but their compact syntax can feel cryptic at first. That is normal — everyone reads comprehensions slowly when they first encounter them. After a few exercises they become natural. Do not worry if you need to mentally “unpack” each one into a for-loop to understand it.

Try It First (Productive Failure)

Challenge: Before reading further, try to build the list [1, 4, 9, 16, 25] (the squares of 1 through 5) in a single line of Python. You already know range() and ** from the previous step. Give it your best shot in the editor, then read on.

✨ Python Beacon: List Comprehensions

A list comprehension is a compact way to build a list. Once you recognize the pattern, you will see it everywhere in Python code:

# C++ equivalent:
# std::vector<int> squares;
# for (int i = 1; i <= 5; i++) squares.push_back(i * i);

# Python: one line — combines range() and **
squares = [x**2 for x in range(1, 6)]          # [1, 4, 9, 16, 25]

The general form is:

[expression  for variable in iterable]

Filtering with a Condition

Add an if at the end to keep only items that match:

evens = [x for x in range(10) if x % 2 == 0]   # [0, 2, 4, 6, 8]
nums  = [4, 8, 15, 16, 23, 42]
big   = [x for x in nums if x > 20]             # [23, 42]
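The expression and the filter can be combined freely. A short sketch, reusing the nums list from above and adding a hypothetical names list:

```python
nums = [4, 8, 15, 16, 23, 42]

# Transform and filter together: square only the odd values.
odd_squares = [x**2 for x in nums if x % 2 == 1]
print(odd_squares)      # [225, 529]

# The expression part can be any expression, including a method call.
names = ["alice", "bob"]
shouting = [n.upper() for n in names]
print(shouting)         # ['ALICE', 'BOB']
```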

Compared to a for-loop

# For-loop version:
result = []
for x in range(10):
    if x % 2 == 0:
        result.append(x)

# List comprehension — same result, one line:
result = [x for x in range(10) if x % 2 == 0]

List comprehensions are preferred when the transformation is simple — they are a recognized Python idiom that experienced readers understand at a glance.
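When the logic involves several steps, a plain loop is often easier to follow. The sketch below contrasts the two styles on a made-up list of string readings; both produce the same result, and which one reads better is a judgment call.

```python
readings = ["12", "x7", "30", "", "5"]

# Dense: parsing, validation, and filtering packed into one expression.
dense = [int(r) for r in readings if r.isdigit() and int(r) >= 10]

# Arguably clearer as a loop: each step gets a name and room for a comment.
clear = []
for r in readings:
    if not r.isdigit():      # skip malformed or empty entries
        continue
    value = int(r)
    if value >= 10:
        clear.append(value)

print(dense == clear)        # True
print(clear)                 # [12, 30]
```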

Predict Before You Code

Before writing any code, predict: what does [x**2 for x in range(4)] produce? Write your answer, then verify by typing it into the editor and clicking Run.

Task

Complete two functions in listcomp.py:

above_average(numbers) — returns a list of numbers strictly greater than the mean. Use a list comprehension with a condition.
squares_up_to(n) — returns [1, 4, 9, ..., n**2]. Use range() starting at 1 and ** for exponentiation in a list comprehension.

Starter files

listcomp.py

from functions import mean

def above_average(numbers: list[float]) -> list[float]:
    """Return a list of numbers strictly greater than the mean."""
    avg = mean(numbers)
    # Use a list comprehension with a condition
    pass

def squares_up_to(n: int) -> list[int]:
    """Return [1**2, 2**2, ..., n**2] using range() and **."""
    pass

# --- Quick self-test ---
data = [4, 8, 15, 16, 23, 42]
print(f"Data:          {data}")
print(f"Above average: {above_average(data)}")
print(f"Squares to 5:  {squares_up_to(5)}")

functions.py

def mean(numbers: list[float]) -> float:
    """Return the arithmetic mean of a list of numbers."""
    return sum(numbers) / len(numbers)

def label_score(score: int, threshold: int = 50) -> str:
    """Return 'pass' if score >= threshold, else 'fail'."""
    if score >= threshold:
        return 'pass'
    else:
        return 'fail'

Solution

functions.py

def mean(numbers: list[float]) -> float:
    """Return the arithmetic mean of a list of numbers."""
    return sum(numbers) / len(numbers)

def label_score(score: int, threshold: int = 50) -> str:
    """Return 'pass' if score >= threshold, else 'fail'."""
    if score >= threshold:
        return 'pass'
    else:
        return 'fail'

listcomp.py

from functions import mean

def above_average(numbers: list[float]) -> list[float]:
    """Return a list of numbers strictly greater than the mean."""
    avg = mean(numbers)
    return [x for x in numbers if x > avg]

def squares_up_to(n: int) -> list[int]:
    """Return [1**2, 2**2, ..., n**2] using range() and **."""
    return [x**2 for x in range(1, n + 1)]

# --- Quick self-test ---
data = [4, 8, 15, 16, 23, 42]
print(f"Data:          {data}")
print(f"Above average: {above_average(data)}")
print(f"Squares to 5:  {squares_up_to(5)}")

Why this is correct:

above_average: The general form is [expression for variable in iterable if condition]. The condition x > avg is strictly greater than (not >=), as the test checks above_average([4, 8, 15, 16, 23, 42]) == [23, 42]. The mean is 18.0; only 23 and 42 are strictly above it.
AST check: The test uses Python’s ast module to verify that above_average contains a ListComp node. A manual for loop with append would pass functionally but fail this test — you must use list comprehension syntax.
squares_up_to: range(1, n + 1) generates 1 through n inclusive (stop is exclusive, so we need n + 1). x**2 uses the ** exponentiation operator — not ^ which is bitwise XOR in Python. The test checks squares_up_to(5) == [1, 4, 9, 16, 25].
** operator check: The test also uses AST inspection to confirm squares_up_to contains a BinOp with Pow — you must use **, not math.pow().
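The course's actual test code is not shown here, so the following is only an assumed sketch of how such an AST inspection might be written with the standard ast module. The source string and the names has_listcomp and has_pow are illustrative.

```python
import ast

source = """
def squares_up_to(n):
    return [x**2 for x in range(1, n + 1)]
"""

tree = ast.parse(source)
nodes = list(ast.walk(tree))    # every node in the syntax tree

# A list comprehension appears as an ast.ListComp node.
has_listcomp = any(isinstance(node, ast.ListComp) for node in nodes)

# x**2 appears as a BinOp whose operator is ast.Pow.
has_pow = any(
    isinstance(node, ast.BinOp) and isinstance(node.op, ast.Pow)
    for node in nodes
)

print(has_listcomp, has_pow)    # True True
```

A check like this looks at the shape of the code rather than its output, which is why a loop with append or a call to math.pow() would be rejected even when the returned list is identical.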

Step 7 — Knowledge Check

Min. score: 80%

1. Which list comprehension correctly produces only the odd numbers from 1 to 9?

[x for x in range(1, 10) if x % 2 != 0]
[x if x % 2 != 0 for x in range(1, 10)]
Swapping if before for is a syntax error — the filter condition must come after the iteration: [expr for var in iterable if condition].
[x for x in range(1, 10, 1) if odd(x)]
odd() is not a built-in Python function. Use x % 2 != 0 as the filter condition.
(x for x in range(1, 10) if x % 2 != 0)
Parentheses () create a generator expression, not a list. Use square brackets [] for a list comprehension.

The filter condition goes at the end: [expr for var in iterable if condition].

2. A student rewrites [x**2 for x in range(5)] as a for-loop and gets the same result. Why would a Python programmer prefer the list comprehension?

List comprehensions run faster than for-loops for all input sizes
More readable for simple transformations — a recognized Python idiom
For-loops are deprecated in Python 3
List comprehensions avoid creating a temporary list in memory

List comprehensions are preferred for their readability and conciseness when the transformation is simple. They are a recognized Python beacon: experienced Python readers immediately understand their intent. Performance-wise, they are often slightly faster than equivalent for-loops, though not for every workload, and readability is the primary motivation.

3. Analyze this code. What does it produce, and could a list comprehension replace it?

result = []
for name in ["Alice", "Bob", "Charlie"]:
    if len(name) > 3:
        result.append(name.upper())

['ALICE', 'CHARLIE'] — yes: [n.upper() for n in names if len(n) > 3]
['Alice', 'Charlie'] — no: comprehensions can’t call methods
['ALICE', 'BOB', 'CHARLIE'] — the if is ignored
['alice', 'charlie'] — upper() converts to lowercase

The loop filters names longer than 3 characters, then converts to uppercase. This is exactly the pattern list comprehensions handle: [expr for var in iterable if condition]. The comprehension equivalent is [name.upper() for name in ["Alice", "Bob", "Charlie"] if len(name) > 3].

4. (Spaced review — Step 2: f-Strings) What does this expression produce?

items = [3, 1, 4]
print(f"Count: {len(items)}, Sum: {sum(items)}")

Count: 3, Sum: 8
Count: [3, 1, 4], Sum: [3, 1, 4]
SyntaxError — you can’t call functions inside f-strings
Count: 3, Sum: 4

f-strings can contain any valid Python expression inside the braces, including function calls like len(items) and sum(items). This is one of their great strengths over C++’s printf — you get the full power of Python expressions inline.

8

Reading Files with open() and with

Why this matters

Reading files is something every program eventually has to do, and resource leaks (forgotten fclose()) are a classic C/C++ bug. Python’s with statement is the language’s elegant answer: a context manager that guarantees cleanup, even on exceptions. The same pattern (RAII in C++ terms) extends to network sockets, locks, and database connections — learning it here pays off everywhere.

🎯 You will learn to

Apply with open() to read files line-by-line in idiomatic Python
Analyze how Python’s context manager pattern relates to C++’s RAII

Python’s “Batteries Included” Philosophy

One of Python’s greatest strengths is its standard library — hundreds of modules ready to use with no installation:

Module	What it does	C++ / Bash equivalent
`os`, `pathlib`	File paths, directory traversal	`<filesystem>` / `ls`, `find`
`sys`	Command-line args, exit codes	`argc/argv` / `$@`
`json`	Parse/write JSON	Requires a library
`re`	Regular expressions	`<regex>` / `grep`
`csv`	Read/write CSV	Manual parsing
`subprocess`	Run shell commands	`system()` / direct Bash
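To give a feel for three of these modules, here is a minimal sketch. The input strings are invented for the example, and io.StringIO (not listed in the table) is used only so that the csv part needs no file on disk.

```python
import csv
import io
import json
import re

# json: parse a JSON string into Python dicts and lists.
config = json.loads('{"name": "demo", "retries": 3}')
print(config["retries"])        # 3

# re: pull every run of digits out of a string.
numbers = re.findall(r"\d+", "ports 80 and 443")
print(numbers)                  # ['80', '443']

# csv: parse rows from an in-memory file object.
rows = list(csv.reader(io.StringIO("id,score\n1,90\n")))
print(rows)                     # [['id', 'score'], ['1', '90']]
```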

Reading Files with `open()` and `with`

In C++ you fopen, check for NULL, process, and fclose. Python’s with statement handles the close automatically — even if an exception occurs:

# SUB-GOAL: Open the file (with ensures automatic close)
with open("data.txt") as f:
    # SUB-GOAL: Process each line
    for line in f:
        # SUB-GOAL: Clean and display
        print(line.strip())   # .strip() removes the trailing newline

The with statement is Python’s resource management idiom — just like RAII in C++, the file is guaranteed to be closed when the block exits.
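What with saves you from writing can be seen by spelling out the manual equivalent. The sketch below creates its own throwaway file in a temporary directory, an assumption made so that it runs anywhere, and then reads it both ways.

```python
import os
import tempfile

# Create a small throwaway file so the sketch is self-contained.
path = os.path.join(tempfile.mkdtemp(), "data.txt")
with open(path, "w") as f:
    f.write("first line\nsecond line\n")

# Manual version: close() must sit in finally to run on every exit path.
f = open(path)
try:
    manual = [line.strip() for line in f]
finally:
    f.close()

# with version: same guarantee, less code.
with open(path) as f:
    managed = [line.strip() for line in f]

print(f.closed)             # True: closed when the block ended
print(manual == managed)    # True
```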

Predict Before You Code

Before writing any code, look at data.txt and predict: how many total words does it contain? Then click Run on the starter code and see if your mental count matches.

Task

Complete word_count.py. It should:

Read every line from data.txt
Split each line into words (.split() splits on whitespace)
Count the total number of words across all lines
Print: Total words: <count>

The file data.txt is already created for you.

Starter files

word_count.py

# SUB-GOAL: Initialize the counter
total = 0

# SUB-GOAL: Open and read the file
with open("data.txt") as f:
    for line in f:
        words = line.split()
        # SUB-GOAL: Accumulate the count
        # TODO: add len(words) to total
        pass

# SUB-GOAL: Report the result
# TODO: print "Total words: <count>"
pass

data.txt

the quick brown fox jumps over the lazy dog
pack my box with five dozen big liquor jugs
how vexingly quick daft zebras jump

Solution

word_count.py

# SUB-GOAL: Initialize the counter
total = 0

# SUB-GOAL: Open and read the file
with open("data.txt") as f:
    for line in f:
        words = line.split()
        # SUB-GOAL: Accumulate the count
        total += len(words)

# SUB-GOAL: Report the result
print(f"Total words: {total}")

Why this is correct:

with open("data.txt") as f: The with statement drives a context manager (here, the file object returned by open()). It guarantees the file is closed when the block exits, even if an exception occurs. This is analogous to RAII in C++. Without with, you must manually call f.close(), and if an exception occurs before that line, the file handle leaks.
for line in f: Files are directly iterable in Python. Each iteration yields one line including the trailing \n. This is memory-efficient — only one line is in memory at a time (important for large files).
line.split() without arguments splits on any whitespace and discards empty strings, so len(words) correctly counts the words per line.
total += len(words): Accumulates the count across all lines. The three lines in data.txt have 9 + 9 + 6 = 24 words. The test checks for 'Total words: 24' in the output.
No line.strip() needed here: split() without arguments already handles the trailing \n by splitting on all whitespace.
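The difference between split() and split(" ") is easy to see on a line with irregular whitespace. The sample string below is made up for illustration.

```python
line = "the  quick\tbrown fox\n"

# No argument: split on any run of whitespace, drop empty strings.
print(line.split())         # ['the', 'quick', 'brown', 'fox']

# Explicit single space: empty strings, tabs, and the newline survive.
print(line.split(" "))      # ['the', '', 'quick\tbrown', 'fox\n']

# A blank line yields an empty list, so it adds 0 to the word count.
print("\n".split())         # []
```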

Step 8 — Knowledge Check

Min. score: 80%

1. A student writes this code and asks why Python is better than C++ for this task:

with open("log.txt") as f:
    errors = [line for line in f if "ERROR" in line]

What is the best answer?

Python runs faster than C++ for file I/O operations
with, file iteration, and list comprehensions cut this to 3-4 lines vs 20+ in C++
C++ cannot open text files, only binary files
Python files never need to be closed because the OS does it automatically

This is Python’s scripting sweet spot: the with statement handles resource cleanup, files are directly iterable (no manual buffering), and the list comprehension filters in one line. The equivalent C++ code would need ifstream, a while(getline(...)) loop, a string search, a container for the matches, and error checks, which is easily 20+ lines for robust code.

2. What does line.strip() do when reading lines from a file?

Removes all spaces from the middle of the line
Removes leading and trailing whitespace (including \n)
Converts the line to lowercase
Splits the line into a list of characters

When you read a line from a file, it includes the trailing newline \n. .strip() removes leading and trailing whitespace (spaces, tabs, \n, \r). This is analogous to trimming a C++ std::string.

3. A teammate proposes reading a 2 GB log file with text = f.read() (loading the entire file into memory). Another proposes for line in f: (iterating line by line). Evaluate both approaches. Which is better for a 2 GB file, and why?

Both are identical in behavior and memory usage — Python handles buffering automatically regardless of which method you use
f.read() is better because reading the entire file into one string is faster than processing line by line due to fewer I/O calls
for line in f: is better — constant memory regardless of file size; f.read() loads all 2 GB
Neither works — Python can’t handle files over 1 GB

f.read() loads the entire file into a single string in memory. For a 2 GB file, that’s 2 GB of RAM just for the string. for line in f: streams one line at a time — the memory usage stays constant regardless of file size. This is the same principle as C++’s getline() in a while loop vs reading the whole file with fstream::read().

4. (Spaced review — Step 3: Indentation) What is wrong with this code?

with open("data.txt") as f:
for line in f:
    print(line)

Nothing — the code is correct
The for line must be indented inside with
You need to call f.close() after the loop
open() requires a mode argument like 'r'

The with statement opens an indented block (note the :). Everything inside that block must be indented — including the for loop. This is the same indentation rule from Step 3: a colon : starts a block that must be indented.

5. (Spaced review — Step 2: String Quotes) A student writes this Python code and gets a SyntaxError. Why?

message = 'It's a beautiful day'

Single quotes can’t be used for strings in Python
The apostrophe ends the string — it matches the opening '
Python strings must use double quotes
The string is too long for single quotes

Unlike C++ where 'x' is a char and "x" is a string, Python uses '...' and "..." interchangeably for strings. The fix is either double quotes ("It's a beautiful day") or escaping the apostrophe ('It\'s a beautiful day'). This flexibility lets you pick whichever quote style avoids conflicts with the string’s content.

6. Arrange the lines to read a file and count total words. (arrange in order)

Correct order:

total = 0
with open('data.txt') as f:
    for line in f:
        total += len(line.split())
print(f'Words: {total}')

Distractors (not used):

f.close()

Initialize the counter first, then open the file with with (no manual close() needed). The for loop must be indented inside with, and the word-counting line inside for. The print is outside both blocks (no indentation) because it runs after the file is processed. The distractor f.close() is unnecessary — with handles closing automatically.

9

Regular Expressions in Python: the re Module

Why this matters

You already know regex from grep and sed. Python’s re module brings that same power inside a script — no subprocess, no fragile shell escaping. Whenever you need to extract structured data from text (log lines, HTML, CSV oddities, error messages), re.findall(), re.search(), and re.sub() are the three tools that solve the vast majority of cases.

🎯 You will learn to

Apply re.findall(), re.search(), and re.sub() to extract, test, and transform text patterns
Apply raw strings (r'...') to write regex patterns without backslash-escaping headaches

From grep to Python

In the RegEx tutorial you used patterns with grep -E and sed. Python’s built-in re module gives you the same power inside a script — no subprocess needed:

Shell	Python `re` equivalent
`grep -oE 'pattern' file`	`re.findall(r'pattern', text)`
`grep -oE 'pattern' file | wc -l`	`len(re.findall(r'pattern', text))`
`sed 's/old/new/g' file`	`re.sub(r'old', 'new', text)`
Test if a match exists	`re.search(r'pattern', text)`

The three essential functions

import re

text = "Error 404: page not found. Error 500: server crash."

# SUB-GOAL: Find the first match
m = re.search(r'Error \d+', text)
if m:
    print(m.group())     # "Error 404"

# SUB-GOAL: Find all matches
codes = re.findall(r'\d+', text)
print(codes)             # ['404', '500']

# SUB-GOAL: Replace all matches
clean = re.sub(r'Error \d+', 'ERR', text)
print(clean)             # "ERR: page not found. ERR: server crash."

Raw strings (r'...') are the standard for regex patterns in Python — they prevent Python from interpreting backslashes before re sees them.
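The difference is easiest to see with \b, which Python itself recognises as an escape. The sample text below is made up; the sketch only compares a plain string with a raw string holding the same characters in source.

```python
import re

text = "cat concatenate cat"

plain = '\bcat\b'    # Python converts each \b into a backspace character
raw = r'\bcat\b'     # re receives backslash + b, meaning "word boundary"

print(len(plain), len(raw))           # 5 7

plain_hits = re.findall(plain, text)  # looks for backspace-cat-backspace
raw_hits = re.findall(raw, text)      # looks for the whole word "cat"
print(plain_hits)                     # []
print(raw_hits)                       # ['cat', 'cat']
```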

Predict Before You Code

Before implementing: what does re.findall(r'\d+', 'boot in 3... 2... 1...') return? Write your prediction, then check in the editor.

Task

Complete log_parser.py. The log file is already loaded as a string for you.

Use re.findall() to collect all timestamps (HH:MM:SS pattern) and print the count
Use re.findall() to collect every ERROR line and print the count
Use re.sub() to redact all IP addresses with "x.x.x.x" and print the redacted log

Starter files

log_parser.py

import re

with open("log.txt") as f:
    text = f.read()

# 1. Extract all timestamps (HH:MM:SS) and print count
# Hint: pattern is r'\d{2}:\d{2}:\d{2}'
# Expected output: Timestamps found: 6

# 2. Extract all ERROR lines and print count
# Hint: pattern is r'ERROR.*'
# Expected output: Errors: 2

# 3. Redact IPv4 addresses and print redacted log
# Hint: pattern is r'\d+\.\d+\.\d+\.\d+'

log.txt

2024-01-15 09:23:11 INFO  Server started on port 8080
2024-01-15 09:23:45 ERROR Connection failed: timeout
2024-01-15 09:24:02 INFO  Request from 192.168.1.42
2024-01-15 09:24:18 WARNING Slow response: 2345ms
2024-01-15 09:24:33 ERROR Disk usage at 94%
2024-01-15 09:24:51 INFO  Request from 10.0.0.7

Solution

log_parser.py

import re

with open("log.txt") as f:
    text = f.read()

# 1. Extract all timestamps (HH:MM:SS) and print count
timestamps = re.findall(r'\d{2}:\d{2}:\d{2}', text)
print(f"Timestamps found: {len(timestamps)}")

# 2. Extract all ERROR lines and print count
errors = re.findall(r'ERROR.*', text)
print(f"Errors: {len(errors)}")

# 3. Redact IPv4 addresses and print redacted log
redacted = re.sub(r'\d+\.\d+\.\d+\.\d+', 'x.x.x.x', text)
print(redacted)

Why this is correct:

re.findall(r'\d{2}:\d{2}:\d{2}', text): \d{2} matches exactly two digits; the colons are literal. This matches all 6 timestamp entries (09:23:11, 09:23:45, etc.). The test checks for 'Timestamps found: 6' in the output.
re.findall(r'ERROR.*', text): ERROR matches the literal word; .* matches everything to the end of the line (. doesn’t match \n by default in Python’s re). This finds the 2 ERROR lines. The test checks for 'Errors: 2'.
re.sub(r'\d+\.\d+\.\d+\.\d+', 'x.x.x.x', text): \d+ matches one or more digits; \. matches a literal dot (unescaped . would match any character). This replaces both 192.168.1.42 and 10.0.0.7 with x.x.x.x. The tests check that x.x.x.x appears in the output and that 192.168.1.42 does not.
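Raw strings (r'...'): The r prefix prevents Python from interpreting backslashes before re sees them. r'\d+' passes the two-character sequence \d to the regex engine. Without r, '\d' happens to survive because Python keeps unrecognised escapes as they are (recent versions emit a warning), but a recognised escape such as '\b' silently becomes a backspace character instead of a word boundary.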
f.read() vs line-by-line: This step uses f.read() to load the entire file as a string, because re.findall() and re.sub() operate on a string. This is fine for small log files; for very large files, you’d process line by line.
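For the large-file case mentioned in the last point, one possible shape is to compile the pattern once and apply it per line. This is a sketch, not part of the exercise: count_timestamps is a hypothetical helper, and io.StringIO stands in for an open file so the snippet runs on its own.

```python
import io
import re

TIMESTAMP = re.compile(r'\d{2}:\d{2}:\d{2}')   # compile once, reuse per line

def count_timestamps(lines) -> int:
    """Count timestamp matches across any iterable of lines."""
    total = 0
    for line in lines:
        total += len(TIMESTAMP.findall(line))
    return total

# Stand-in for `with open("log.txt") as f:`; iterating yields one line at a time.
fake_file = io.StringIO(
    "2024-01-15 09:23:11 INFO  Server started on port 8080\n"
    "2024-01-15 09:23:45 ERROR Connection failed: timeout\n"
)
timestamp_count = count_timestamps(fake_file)
print(timestamp_count)   # 2
```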

Step 9 — Knowledge Check

Min. score: 80%

1. What does re.findall(r'\d+', 'boot in 3... 2... 1...') return?

'3 2 1'
['3', '2', '1']
'321'
3 (just the count)

re.findall() returns a list of strings — one string per non-overlapping match. \d+ matches one or more digit characters, so it finds '3', '2', and '1' independently, returning ['3', '2', '1'].

2. You want to know whether a log line contains an IP address, but you don’t need to extract it. Which function is most appropriate?

re.findall() — it returns all matches, so you can check len() > 0
re.search() — returns a truthy match object or falsy None
re.sub() — it can test for a match while replacing
re.compile() — it tests patterns without needing a string

re.search() is the idiomatic choice for a yes/no existence check:

if re.search(r'\d+\.\d+\.\d+\.\d+', line):
    print("has IP")

It short-circuits on the first match and returns None if there is none — exactly like grep -q in the shell.

3. Why are raw strings (r'\d+') preferred over regular strings ('\\d+') for regex patterns?

Raw strings run faster because Python skips Unicode processing
Raw strings keep backslashes literal, so re receives \d as two characters
The re module only accepts raw strings and will raise a TypeError otherwise
Raw strings automatically escape special regex characters like . and *

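In a regular string, Python processes backslash escapes before re ever sees the pattern. '\d' happens to keep its backslash because \d is not a recognised escape (recent Python versions warn about it), but '\b' becomes a backspace character and '\n' a newline. In a raw string r'\d', the backslash is always preserved literally, so re receives the two-character sequence \d and interprets it as “any digit”. Using raw strings avoids double-escaping ('\\d+') and matches the pattern you see in grep or sed.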

4. Analyze this code. What does results contain after execution?

import re
text = "alice@example.com and bob@test.org"
results = re.findall(r'\w+@\w+\.\w+', text)

['alice@example.com', 'bob@test.org']
['alice', 'bob'] — findall only returns the first group
2 — findall returns a count
'alice@example.com' — findall returns the first match as a string

re.findall() returns a list of all non-overlapping matches. The pattern \w+@\w+\.\w+ matches word characters around an @ and ., capturing both email addresses. This combines \w+ (word chars), literal @, and escaped ..
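The "first group" distractor points at real behaviour worth knowing: once a pattern contains capture groups, re.findall() returns the groups instead of the whole match. A short sketch on the same text, with the grouped patterns added here purely for illustration:

```python
import re

text = "alice@example.com and bob@test.org"

whole = re.findall(r'\w+@\w+\.\w+', text)       # no groups: full matches
users = re.findall(r'(\w+)@\w+\.\w+', text)     # one group: that group only
pairs = re.findall(r'(\w+)@(\w+\.\w+)', text)   # two groups: list of tuples

print(whole)   # ['alice@example.com', 'bob@test.org']
print(users)   # ['alice', 'bob']
print(pairs)   # [('alice', 'example.com'), ('bob', 'test.org')]
```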

5. (Spaced review — Step 6: List Comprehensions) Which expression produces ['ERROR Connection failed: timeout', 'ERROR Disk usage at 94%'] from a variable lines containing all log lines as a list of strings?

[line for line in lines if 'ERROR' in line]
lines.filter(lambda l: 'ERROR' in l)
[line if 'ERROR' in line for line in lines]
lines.findall('ERROR')

A list comprehension with a filter: [line for line in lines if 'ERROR' in line]. This is the same pattern from Step 6 — [expr for var in iterable if condition]. Note: you could also use re.findall(r'ERROR.*', text) on the full text string (as you just learned), but the list comprehension works on a list of lines.

10

sys.argv & stderr

Why this matters

Real Python scripts rarely operate on hard-coded values; they take input from the command line, just like every CLI tool you use daily. sys.argv is the equivalent of argc/argv in C++, and routing error output to sys.stderr lets your scripts compose cleanly with shell pipelines (so users can redirect logs separately from data). Get this right and your scripts behave like proper Unix citizens.

🎯 You will learn to

Apply sys.argv to read and validate command-line arguments in a Python script
Apply sys.stderr (via print(..., file=sys.stderr)) to route error and diagnostic output away from stdout

Command-Line Arguments with `sys.argv`

import sys

# SUB-GOAL: Parse command-line arguments
# sys.argv is a list: ["script.py", "arg1", "arg2", ...]
# C++ equivalent:  argv[0], argv[1], ...

# SUB-GOAL: Validate arguments
if len(sys.argv) < 2:
    print("Usage: python3 script.py <filename>", file=sys.stderr)
    sys.exit(1)              # Exit with non-zero code — just like in C++

# SUB-GOAL: Use the argument
filename = sys.argv[1]

sys.argv[0] is always the script name itself. Extra arguments start at index 1. sys.exit(1) terminates the process with exit code 1 — the same convention as C’s exit(1).
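One way to make the argument check easy to try without a terminal is to pass the list in explicitly. get_filename is a hypothetical helper written for this sketch; a real script would call it with sys.argv.

```python
from typing import Optional

def get_filename(argv: list[str]) -> Optional[str]:
    """Return the first real argument, or None if only the script name is present."""
    if len(argv) < 2:
        return None
    return argv[1]

# Simulated argument lists, shaped like sys.argv.
print(get_filename(["script.py"]))               # None
print(get_filename(["script.py", "data.txt"]))   # data.txt
```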

Writing to `stderr` with `print()`

By default print() writes to stdout. Error and diagnostic messages should go to stderr, matching C++’s std::cerr and Bash’s >&2 redirect:

import sys

# C++: std::cout << "Done." << std::endl;
print("Done.")                                    # → stdout

# C++: std::cerr << "Warning: file not found" << std::endl;
print("Warning: file not found", file=sys.stderr) # → stderr

Separating them lets callers redirect each stream independently:

python3 script.py > output.txt 2> errors.txt
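The same separation can be observed from inside Python. The sketch below uses contextlib.redirect_stdout and contextlib.redirect_stderr to capture each stream into its own buffer, which roughly mirrors what the shell redirect does; report() is a made-up stand-in for a script body.

```python
import contextlib
import io
import sys

def report() -> None:
    print("Total words: 24")                      # data goes to stdout
    print("Reading: data.txt", file=sys.stderr)   # diagnostics go to stderr

out_buf, err_buf = io.StringIO(), io.StringIO()
with contextlib.redirect_stdout(out_buf), contextlib.redirect_stderr(err_buf):
    report()

captured_out = out_buf.getvalue()
captured_err = err_buf.getvalue()
print(repr(captured_out))   # 'Total words: 24\n'
print(repr(captured_err))   # 'Reading: data.txt\n'
```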

Predict Before You Code

Before writing any code, predict: if you run python3 script.py with no arguments, what is sys.argv? Is it an empty list, or does it contain something? Verify by adding print(sys.argv) to a test script.

Task

Write safe_word_count.py from scratch. (Note: type data.txt into the “args:” input box in the Output panel so it is passed to the program as an argument and the script reads this file.) It should:

If no filename argument is provided (len(sys.argv) < 2), print Error: no filename given to sys.stderr and call sys.exit(1)
Read filename = sys.argv[1] and print Reading: <filename> to sys.stderr
Count words and print Total words: <count> to stdout

Starter files

safe_word_count.py

import sys

# Write the complete script from scratch.
# Requirements:
#   1. Check sys.argv — error to stderr + exit(1) if no filename
#   2. Print "Reading: <filename>" to stderr
#   3. Count words, print "Total words: <count>" to stdout

data.txt

the quick brown fox jumps over the lazy dog
pack my box with five dozen big liquor jugs
how vexingly quick daft zebras jump

Solution

safe_word_count.py

import sys

# 1. Check sys.argv — error to stderr + exit(1) if no filename
if len(sys.argv) < 2:
    print("Error: no filename given", file=sys.stderr)
    sys.exit(1)

# 2. Print "Reading: <filename>" to stderr
filename = sys.argv[1]
print(f"Reading: {filename}", file=sys.stderr)

# 3. Count words, print "Total words: <count>" to stdout
total = 0
with open(filename) as f:
    for line in f:
        total += len(line.split())

print(f"Total words: {total}")

Why this is correct:

sys.argv: A list where index 0 is the script name and index 1 onwards are the arguments. len(sys.argv) < 2 means no filename was given. This mirrors C/C++’s argc < 2 check.
print(..., file=sys.stderr): The file= keyword argument redirects the print to sys.stderr instead of sys.stdout. This is Python’s equivalent of C++’s std::cerr and Bash’s echo "error" >&2. Mixing error messages into stdout would corrupt pipelines.
sys.exit(1): Terminates the process with exit code 1 — the Unix convention for failure. The test captures this as a SystemExit exception.
print(f"Reading: {filename}", file=sys.stderr): Diagnostic/progress messages go to stderr. The test captures stderr separately and checks for 'Reading: data.txt'.
print(f"Total words: {total}"): Normal output goes to stdout (the default). The test checks stdout for 'Total words: 24' when data.txt is passed. The word count logic is identical to Step 7.

Step 10 — Knowledge Check

Min. score: 80%

1. A script is run with python3 myscript.py hello world. What is sys.argv[0]?

"hello"
"world"
"myscript.py"
None

sys.argv[0] is always the script name itself. Arguments start at index 1: sys.argv[1] is "hello", sys.argv[2] is "world". This mirrors C/C++’s argv[0] convention.

2. Why should error messages be written to sys.stderr rather than printed normally?

stderr is faster than stdout in Python’s standard library
stdout can only handle one line at a time, while stderr can buffer
Separating stdout and stderr lets users redirect output and errors independently
Python automatically color-codes stderr messages in red on the terminal

When stdout and stderr are separate streams, users can capture output (> out.txt) and errors (2> err.txt) independently. Mixing error messages into stdout breaks pipelines — a downstream command would receive the error text as data. This is the same reason C++ uses std::cerr and Bash scripts use echo "error" >&2.

3. A script should exit with code 1 and print an error if the user provides no arguments. Evaluate these two approaches. Which is correct Python? Approach A:

import sys
if len(sys.argv) == 1:
    print("Error: no arguments", file=sys.stderr)
    sys.exit(1)

Approach B:

import sys
if len(sys.argv) == 1:
    print("Error: no arguments")
    sys.exit(1)

Both are correct and equivalent
Only A — errors must go to stderr so piped stdout stays clean
Only B is correct — file=sys.stderr is not valid Python syntax
Neither is correct — you should use raise SystemExit(1) instead

Approach A is correct. Error messages should go to sys.stderr so that if the user pipes stdout to another program or file, the error message doesn’t contaminate the data stream. Approach B “works” but violates the Unix convention of separating output from diagnostics.

4. (Spaced review — Step 5: Loops) A student writes this code to print each word with its position number. What is wrong?

words = ["apple", "banana", "cherry"]
for i in words:
    print(f"{i}: {words[i]}")

Nothing is wrong — for i in words gives i as the index, and words[i] retrieves each element correctly
i is the word itself (not an index), so words[i] causes TypeError. Use enumerate(words) to get both index and value
The f-string syntax is incorrect — f-strings cannot contain variable references inside braces, so {i} fails at runtime
The loop should use range(words) instead — passing a list to range() automatically generates valid indices

Python’s for i in words gives you the elements, not indices — this is different from C++’s for (int i = 0; ...). Using words['apple'] causes a TypeError. The Pythonic fix: for i, word in enumerate(words): gives both the index and the value. This is a common negative transfer trap from C++.
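A short sketch of both the failure and the fix, using the same list as the question:

```python
words = ["apple", "banana", "cherry"]

# The buggy lookup: a string is not a valid list index.
try:
    words["apple"]
    error_name = None
except TypeError as exc:
    error_name = type(exc).__name__

# The fix: enumerate() yields (index, value) pairs.
numbered = [f"{i}: {word}" for i, word in enumerate(words)]

print(error_name)   # TypeError
print(numbered)     # ['0: apple', '1: banana', '2: cherry']
```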

5. (Spaced review — Step 7: File I/O) What happens if you forget the with keyword and write f = open("data.txt") instead?

The file opens, but you must call f.close() manually or it leaks
Python raises a SyntaxError — open() can only be used with with
The file opens in read-only mode instead of read-write
Nothing different — with is just syntactic sugar with no functional effect

Without with, the file opens normally but there’s no automatic cleanup. You must manually call f.close(). If an exception occurs between open() and close(), the file handle leaks — exactly the same problem as forgetting fclose() in C. The with statement guarantees cleanup via Python’s context manager protocol.
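To make the guarantee concrete, here is a sketch of the manual equivalent next to the with version. The temporary file exists only so the snippet is self-contained.

```python
import os
import tempfile

# Throwaway file for the demonstration.
with tempfile.NamedTemporaryFile("w", delete=False) as tmp:
    tmp.write("one two three\n")
    path = tmp.name

# Manual version: try/finally makes sure close() runs even if reading raises.
f = open(path)
try:
    manual_words = len(f.read().split())
finally:
    f.close()

# with version: same guarantee, less code.
with open(path) as g:
    with_words = len(g.read().split())

print(manual_words, with_words)   # 3 3
print(f.closed, g.closed)         # True True
os.remove(path)
```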

6. (Spaced review — Step 2: String Quotes) In C++, 'A' is a char and "Alice" is a string — they are different types. What is the equivalent distinction in Python?

Python also distinguishes 'A' as a character and "Alice" as a string
No distinction — Python has no char; '...' and "..." both create str objects
Single quotes create byte strings, double quotes create Unicode strings
Single quotes are for single characters, but Python stores them as length-1 strings

Python has no char type at all. 'A' and "A" are both str objects of length 1. This means you can freely choose whichever quote style avoids escaping — e.g., "It's easy" or '<div class="box">'. This is a key difference from C++ where mixing up 'x' and "x" is a compile error.

11

Capstone: Build a Log Analyzer

Why this matters

You now have all the component skills — functions, file I/O, regex, list comprehensions, and command-line arguments. The hard part of programming is not learning each piece in isolation, but composing them into something that solves a real problem. This capstone is your chance to integrate everything you’ve learned with no scaffolding telling you what to type.

🎯 You will learn to

Create a complete Python script that integrates functions, file I/O, regex, list comprehensions, and command-line arguments
Apply your judgment to structure code without step-by-step guidance

Putting It All Together

You now have all the component skills. This capstone integrates them into a single real-world script — with no scaffolding. You decide how to structure the code.

Task

Build log_analyzer.py, a command-line tool that analyzes a server log. (Note: type server.log into the “args:” input box in the Output panel so it is passed to the program as an argument and the script reads this file.)

Requirements:

Accept a filename via sys.argv[1]. If missing, print an error to stderr and exit with code 1.
Read the file and extract:
- The total number of log lines
- All unique IP addresses (use re.findall() and a set)
- The number of ERROR lines
- The number of WARNING lines

Print a summary report to stdout in this exact format:

Log Analysis Report
===================
Total lines:    6
Unique IPs:     2
Errors:         2
Warnings:       1

Print Reading: <filename> to stderr at the start.

Hints (only if you’re stuck):

Use a function for each sub-task (e.g., count_by_level(), extract_ips())
Use list comprehensions or re.findall() to filter lines
Use len(set(...)) to count unique items
f-string format specifiers like {value:>8} right-align in 8 characters
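As a small illustration of the last hint, the sketch below shows left and right alignment with format specifiers. The width of 16 is an assumption chosen to line up with the sample report; hard-coded spaces work just as well.

```python
rows = [("Total lines:", 6), ("Unique IPs:", 2)]

# {label:<16} left-aligns the label and pads it to 16 characters.
formatted = [f"{label:<16}{value}" for label, value in rows]

# {value:>8} right-aligns the value in a field of 8 characters.
right_aligned = f"{42:>8}"

for line in formatted:
    print(line)
print(repr(right_aligned))   # '      42'
```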

Starter files

log_analyzer.py

# Capstone: Build a complete log analyzer.
# No scaffolding — use everything you have learned.
import sys
import re

server.log

2024-01-15 09:23:11 INFO  Server started on port 8080
2024-01-15 09:23:45 ERROR Connection failed: timeout
2024-01-15 09:24:02 INFO  Request from 192.168.1.42
2024-01-15 09:24:18 WARNING Slow response: 2345ms
2024-01-15 09:24:33 ERROR Disk usage at 94%
2024-01-15 09:24:51 INFO  Request from 10.0.0.7

Solution

log_analyzer.py

import sys
import re

def count_by_level(text: str, level: str) -> int:
    """Return the number of lines matching the given log level."""
    return len(re.findall(rf'{level}.*', text))

def extract_ips(text: str) -> set[str]:
    """Return all unique IP addresses found in text."""
    return set(re.findall(r'\d+\.\d+\.\d+\.\d+', text))

def parse_args() -> str:
    """Validate and return the filename argument."""
    if len(sys.argv) < 2:
        print("Error: no filename given", file=sys.stderr)
        sys.exit(1)
    return sys.argv[1]

def read_log(filename: str) -> str:
    """Read and return the full log file as a string."""
    print(f"Reading: {filename}", file=sys.stderr)
    with open(filename) as f:
        return f.read()

def print_report(text: str) -> None:
    """Print the analysis report to stdout."""
    lines = text.strip().splitlines()
    total = len(lines)
    unique_ips = len(extract_ips(text))
    errors = count_by_level(text, 'ERROR')
    warnings = count_by_level(text, 'WARNING')

    print("Log Analysis Report")
    print("===================")
    print(f"Total lines:    {total}")
    print(f"Unique IPs:     {unique_ips}")
    print(f"Errors:         {errors}")
    print(f"Warnings:       {warnings}")

# Main flow
filename = parse_args()
text = read_log(filename)
print_report(text)

Why this is correct:

parse_args(): Validates sys.argv, prints an error to sys.stderr, and calls sys.exit(1) if no argument is given. The test captures SystemExit and verifies the exit code is non-zero.
read_log(): Prints "Reading: <filename>" to sys.stderr (the test captures stderr and checks for this). Returns the full file content as a string for regex processing.
count_by_level(text, 'ERROR'): Uses re.findall(r'ERROR.*', text) — .* matches to end of line. The log has 2 ERROR and 1 WARNING line. Tests use regex re.search(r'[Ee]rror.*2', output) so the label can be Errors: or errors:.
extract_ips(text) with set(...): re.findall() returns all IP matches including duplicates. Wrapping in set() removes duplicates. len(set(...)) is the Pythonic one-liner for counting unique items. The log has 2 unique IPs.
total = len(text.strip().splitlines()): splitlines() splits on newlines and handles the trailing newline correctly (unlike split('\n') which would include an empty string). The log has 6 lines.
Function decomposition: The capstone explicitly rewards a function-based design — each function has a single responsibility, making it testable and readable.
Type hints on every helper: Each function carries the annotation pattern from Step 5 (text: str, -> int, -> set[str], -> None). They don’t change runtime behavior, but mypy would flag a caller that passed the wrong type.
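The splitlines() point above is easy to check in isolation. A minimal sketch with a made-up two-line string:

```python
text = "line one\nline two\n"

by_split = text.split("\n")          # trailing newline leaves an empty string
by_splitlines = text.splitlines()    # no empty entry at the end

print(by_split)        # ['line one', 'line two', '']
print(by_splitlines)   # ['line one', 'line two']
```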

Step 11 — Knowledge Check

Min. score: 80%

1. You need to count the number of unique IP addresses in a log file. You have a list of all IP addresses (with duplicates): ips = ['10.0.0.1', '10.0.0.2', '10.0.0.1']. Which approach is most Pythonic?

Use a for-loop to check each IP against a list of already-seen IPs
len(set(ips)) — convert to a set (which removes duplicates) and count
ips.unique() — lists have a built-in unique method
len(ips) - len(duplicates) — count total minus duplicates

set(ips) creates a set with only unique elements: {'10.0.0.1', '10.0.0.2'}. len(...) gives the count. This is the Pythonic one-liner for “count unique items.” Lists do not have a .unique() method (that’s pandas, not base Python).
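A quick sketch with the list from the question. The dict.fromkeys() line is an extra shown as one option for when you also need the unique values in first-seen order, since a set does not promise any order.

```python
ips = ['10.0.0.1', '10.0.0.2', '10.0.0.1']

unique_count = len(set(ips))               # duplicates collapse in the set
ordered_unique = list(dict.fromkeys(ips))  # unique values, first-seen order

print(unique_count)     # 2
print(ordered_unique)   # ['10.0.0.1', '10.0.0.2']
```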

2. Evaluate this code for a log analyzer. What is the bug?

import sys, re

filename = sys.argv[1]
with open(filename) as f:
    text = f.read()

errors = re.findall(r'ERROR.*', text)
warnings = re.findall(r'WARNING.*', text)
ips = re.findall(r'\d+\.\d+\.\d+\.\d+', text)

print(f"Errors: {len(errors)}")
print(f"Warnings: {len(warnings)}")
print(f"Unique IPs: {len(ips)}")

The regex patterns are wrong — ERROR.* only matches the literal characters E-R-R-O-R, not the full line
Two bugs: no sys.argv check (IndexError if no arg), and len(ips) counts duplicates
The file is never properly closed because with blocks do not support the .read() method on the file handle
There is no bug — the with statement, regex patterns, and f-string formatting are all correct as written

Two bugs: (1) No argument validation — sys.argv[1] will raise IndexError if the user runs the script without arguments. (2) len(ips) counts all IPs including duplicates; len(set(ips)) would count unique IPs. Good code validates inputs and uses the right data structure for the task.

3. Analyze the design of a log analyzer script. A student puts all logic in one long script with no functions. Another student breaks it into functions: parse_args(), read_log(), count_by_level(), extract_ips(), print_report(). Which approach is better, and why?

The single-script approach is better — functions add unnecessary complexity for a short script
Both are equivalent — it’s purely a matter of style
The function-based approach is better — each function is testable, reusable, and clearly named
The function-based approach is worse — Python functions are slower than inline code

Breaking code into functions improves readability (the main flow reads like an outline), testability (each function can be tested independently), and reusability (functions can be imported by other scripts). This is the same principle as C++’s function decomposition, and it becomes even more important as scripts grow. Even for short scripts, named functions act as documentation.

4. (Spaced review — Step 5: Loops) You need to process a list of log lines and print each line’s number alongside it (starting from 1). Which approach is most Pythonic?

for i in range(len(lines)): print(f'{i+1}: {lines[i]}') — use range to generate index numbers
This works but is unpythonic — range(len(...)) then indexing is the C/Java pattern. enumerate() is the idiomatic Python way to get both index and value.
for n, line in enumerate(lines, 1): print(f'{n}: {line}') — yields (index, value) pairs
i = 0; for line in lines: i += 1; print(f'{i}: {line}') — manually track the counter like in C++
Manually tracking a counter variable is C-style — verbose and error-prone. enumerate() handles this automatically.
for line in lines: print(f'{lines.index(line)+1}: {line}') — use index() to find the position
.index() scans the entire list each iteration — O(n²) overall — and returns the first match, so it breaks silently on duplicate lines. Use enumerate() instead.

enumerate(lines, 1) is the Pythonic way: it yields (index, value) pairs without manual indexing. The start=1 parameter avoids the +1 hack.

5. (Spaced review — Step 9: Regular Expressions) A log analyzer needs to extract all timestamps matching the pattern 2024-01-15 14:30:22 from a log string. Which re call is correct?

re.search(r'\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}', log) — search finds the first match only
re.findall(r'\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}', log) — findall returns a list of all matching strings
re.match(r'\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}', log) — match scans the entire string for all occurrences
re.split(r'\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}', log) — split extracts everything that matches the pattern

re.findall() returns a list of ALL non-overlapping matches — exactly what you need to extract every timestamp. re.search() finds only the first match. re.match() only checks the start of the string. re.split() splits the string AT the pattern, returning the parts between matches, not the matches themselves.
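The four functions can be compared on one small input. The two-line log string is invented for this sketch.

```python
import re

log = "2024-01-15 14:30:22 start\n2024-01-15 14:31:05 stop\n"
pattern = r'\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}'

all_stamps = re.findall(pattern, log)              # every match, as strings
first = re.search(pattern, log).group()            # first match only
at_start = re.match(pattern, log) is not None      # anchored at position 0
not_at_start = re.match(pattern, "x " + log) is None
pieces = re.split(pattern, log)                    # text between the matches

print(all_stamps)   # ['2024-01-15 14:30:22', '2024-01-15 14:31:05']
print(first)        # 2024-01-15 14:30:22
print(at_start, not_at_start)   # True True
print(pieces)       # ['', ' start\n', ' stop\n']
```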

12

Data Classes

Why this matters

Plain Python classes force you to write __init__, __eq__, and __repr__ by hand, which is a lot of boilerplate for what is conceptually a simple struct. @dataclass generates that plumbing automatically, frozen=True gives you immutability for free, and @property lets you compute attributes on the fly. Together, these turn data modeling in Python from tedious to elegant.

🎯 You will learn to

Create value-object classes using @dataclass to eliminate __init__ / __eq__ / __repr__ boilerplate
Apply frozen=True to make dataclass instances immutable
Create computed attributes with @property
Evaluate when each tool is the right choice

A Bridge from C++ Structs

In C++ you would describe a 2D point with a struct: a small data holder, usually paired with an operator== for comparison (hand-written as below, or generated with = default since C++20) and an operator<< for printing.

struct Point {
    const int x;          // immutable field
    const int y;
    bool operator==(const Point& o) const { return x == o.x && y == o.y; }
};

Plain Python classes work for this, but you have to write all the boilerplate yourself — __init__, __eq__, __repr__. The starter file shows that pain on purpose. Then @dataclass writes those three methods for you.

from dataclasses import dataclass

@dataclass
class Point:
    x: int
    y: int

That tiny declaration is roughly equivalent to a 10-line hand-written class. It uses the type hints from Step 5 (x: int) — that’s how @dataclass knows what fields exist and what their types are.
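A short sketch of what the generated methods give you. The last lines illustrate that the annotations describe the fields but are not checked at runtime.

```python
from dataclasses import dataclass

@dataclass
class Point:
    x: int
    y: int

a = Point(3, 4)
b = Point(3, 4)

print(a)        # Point(x=3, y=4)   generated __repr__
print(a == b)   # True              generated __eq__ compares fields
print(a is b)   # False             still two separate objects

# Type hints are not enforced: this constructs without any error.
loose = Point("3", 4)
print(loose)    # Point(x='3', y=4)
```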

`frozen=True`: Immutability as a Design Tool

Add frozen=True and instances become immutable — like declaring all fields const in the C++ struct above. Trying to assign raises FrozenInstanceError:

@dataclass(frozen=True)
class Point:
    x: int
    y: int

p = Point(3, 4)
p.x = 99       # ❌ FrozenInstanceError — Point is immutable

Immutability is not just a defensive habit. @dataclass already makes two Point(3, 4) instances compare equal; frozen=True guarantees that this equality cannot change after construction and, as a result, makes the instance hashable (so you can put it in a set or use it as a dict key).
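The hashability difference can be sketched with two otherwise identical classes. FrozenPoint and MutablePoint are names invented for this comparison.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class FrozenPoint:
    x: int
    y: int

@dataclass
class MutablePoint:
    x: int
    y: int

# Frozen instances hash by their fields, so equal points collapse in a set.
visited = {FrozenPoint(3, 4), FrozenPoint(3, 4), FrozenPoint(0, 0)}
labels = {FrozenPoint(0, 0): "origin"}

# A default (non-frozen) dataclass defines __eq__ but is left unhashable.
try:
    hash(MutablePoint(3, 4))
    mutable_hashable = True
except TypeError:
    mutable_hashable = False

print(len(visited))                # 2
print(labels[FrozenPoint(0, 0)])   # origin
print(mutable_hashable)            # False
```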

Value Objects vs. Reference Objects

The distinction underneath all of this:

A value object is its fields. Two Point(3, 4) instances are interchangeable, the same way two copies of the number 5 are interchangeable. Coordinates, money amounts, dates, RGB colors all fit this pattern. Value objects belong in sets, work as dict keys, and benefit from frozen=True.
A reference object has identity that survives equal contents. A database connection, a logger, a shopping cart, a file handle — even two with identical fields are not interchangeable. Reference objects need a regular class (or a non-frozen dataclass) because their internal state changes over time.

frozen=True is the design tool that says “this is a value object.” Asking “is the answer to a == b based on contents alone?” is the test: yes → value object → frozen dataclass; no → reference object → regular class.

`@property`: a Method That Looks Like an Attribute

What about derived values, like the distance from the origin? You could write a method distance_to_origin(). But callers would have to remember the parens. @property lets you define a method that is read as an attribute — no parens at the call site:

@dataclass(frozen=True)
class Point:
    x: int
    y: int

    @property
    def distance_to_origin(self) -> float:
        return (self.x ** 2 + self.y ** 2) ** 0.5

p = Point(3, 4)
print(p.distance_to_origin)   # 5.0  — no parens!

@property does not make a field private (a common Java/C# habit to drop). It just lets a computation look like an attribute on the outside.

(C++ analogy note: @property has no exact C++ counterpart. The closest is a const getter member function — but C++ would still require parens at the call site. @property erases the parens.)
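Two properties of @property are worth seeing in action: the value is recomputed on every access, and without a setter it cannot be assigned. Rectangle is a hypothetical class used only for this sketch, deliberately left non-frozen so its fields can change.

```python
from dataclasses import dataclass

@dataclass
class Rectangle:
    width: float
    height: float

    @property
    def area(self) -> float:
        return self.width * self.height

r = Rectangle(2.0, 3.0)
first_area = r.area      # 6.0
r.width = 10.0
second_area = r.area     # 30.0, computed again from the current fields

try:
    r.area = 1.0         # no setter was defined for the property
    assignable = True
except AttributeError:
    assignable = False

print(first_area, second_area, assignable)   # 6.0 30.0 False
```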

Predict Before You Run

Once you have made Point frozen, what do you predict happens when this runs?

p = Point(3, 4)
p.x = 99

Predict the exception type, then try it. If you guess AttributeError, you are pattern-matching from the “property without a setter” idiom. That is close: FrozenInstanceError is a subclass of AttributeError, so an except AttributeError clause would catch it, but frozen=True raises its own exception type because it works through a different mechanism. Being half-right is informative; the actual exception name reveals the mechanism.

Task

Complete geometry.py. The starter shows PointManual — the hand-written boilerplate version — so you can feel the contrast.

TODO 1. Define Point using @dataclass (no kwargs yet) with two int fields x and y.
TODO 2. Change to @dataclass(frozen=True) so Point is immutable.
TODO 3. Add a @property distance_to_origin that returns (x**2 + y**2) ** 0.5 annotated -> float.
TODO 4 (independent practice). Below Point, define a new frozen dataclass RGB with three int fields r, g, b and a @property as_hex that returns the lowercase 7-character hex string (e.g., RGB(255, 128, 0).as_hex == '#ff8000'). Use the f-string format f'{r:02x}' (Step 2 spaced review) for two-digit hex. No further hints — this one is on you.

Stretch (optional): uncomment the mutation probe at the bottom and observe the FrozenInstanceError.

Starter files

geometry.py

from dataclasses import dataclass

class PointManual:
    """The OLD way: hand-written __init__, __eq__, __repr__."""
    def __init__(self, x, y):
        self.x = x
        self.y = y
    def __eq__(self, other):
        return isinstance(other, PointManual) and self.x == other.x and self.y == other.y
    def __repr__(self):
        return f"PointManual(x={self.x}, y={self.y})"

# TODO 1: Define `Point` using @dataclass with int fields x and y.
# TODO 2: Change to @dataclass(frozen=True) so Point is immutable.
# TODO 3: Add a @property distance_to_origin that returns (x**2 + y**2) ** 0.5.
# TODO 4 (independent practice): Define a frozen dataclass `RGB` with
#         int fields r, g, b and a @property as_hex returning a string
#         like '#ff8000'. Use f'{r:02x}' for two-digit hex.

# --- Quick self-test (uncomment after you finish ALL TODOs above) ---
# a = Point(3, 4)
# b = Point(3, 4)
# print(a == b)                # True (free __eq__)
# print(a)                     # Point(x=3, y=4) (free __repr__)
# print(a.distance_to_origin)  # 5.0 (computed)
# print(RGB(255, 128, 0).as_hex)  # '#ff8000'

# Predict-before-run probe (uncomment after TODO 2):
# a.x = 99                     # What exception type does this raise?

Solution

geometry.py

from dataclasses import dataclass

class PointManual:
    """The OLD way: hand-written __init__, __eq__, __repr__."""
    def __init__(self, x, y):
        self.x = x
        self.y = y
    def __eq__(self, other):
        return isinstance(other, PointManual) and self.x == other.x and self.y == other.y
    def __repr__(self):
        return f"PointManual(x={self.x}, y={self.y})"

@dataclass(frozen=True)
class Point:
    x: int
    y: int

    @property
    def distance_to_origin(self) -> float:
        return (self.x ** 2 + self.y ** 2) ** 0.5

@dataclass(frozen=True)
class RGB:
    r: int
    g: int
    b: int

    @property
    def as_hex(self) -> str:
        return f'#{self.r:02x}{self.g:02x}{self.b:02x}'

# --- Quick self-test ---
a = Point(3, 4)
b = Point(3, 4)
print(a == b)                # True
print(a)                     # Point(x=3, y=4)
print(a.distance_to_origin)  # 5.0
print(RGB(255, 128, 0).as_hex)  # '#ff8000'

Why this is correct:

@dataclass(frozen=True) writes three dunder methods for you: __init__ (so Point(3, 4) works), __eq__ (so Point(3, 4) == Point(3, 4) is True), and __repr__ (so print(p) shows Point(x=3, y=4)). With frozen=True it also makes Point hashable and prevents assignment to fields after construction.
x: int / y: int are not just documentation: @dataclass reads these annotations (Step 5) to discover which fields the class has. Without the annotations, @dataclass would find no fields and generate an __init__ that accepts no arguments, so Point(3, 4) would fail.
frozen=True makes mutation raise FrozenInstanceError. The contract is: “once constructed, a Point value never changes.” This is exactly what makes value-object equality safe and what makes the instance hashable.
@property turns distance_to_origin into a read-as-attribute method. The test reads p.distance_to_origin (no parens). Without @property, that expression would evaluate to a bound method object, not a number — a confusing error mode.
RGB.as_hex reuses every pattern from Point — frozen dataclass, typed int fields, @property returning a typed string. The f-string spec f'{r:02x}' (Step 2 spaced review) formats an int as a two-digit lowercase hex value. Same recipe, different field types and different return type — that’s the point of this independent task.
Mutable defaults are forbidden. If you ever try events: list = [], @dataclass rejects the class with ValueError: mutable default <class 'list'> for field events is not allowed: use default_factory. Use a tuple, or field(default_factory=list) if you really need a list.
PointManual stays in the file as a contrast — it shows what the decorator saved you from writing.
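The mutable-default rule is easiest to believe once you watch it trigger. The sketch below is only an illustration: BadLog and GoodLog are hypothetical classes, and the class definition is wrapped in try/except purely so the script can keep running after the rejection.

```python
from dataclasses import dataclass, field

error_message = ""
try:
    @dataclass
    class BadLog:  # hypothetical class
        events: list = []   # rejected as soon as the decorator runs
except ValueError as exc:
    error_message = str(exc)
    print("ValueError:", exc)


@dataclass
class GoodLog:  # hypothetical class
    # default_factory is called once per instance, so nothing is shared
    events: list = field(default_factory=list)


a = GoodLog()
b = GoodLog()
a.events.append("started")
print(a.events)  # ['started']
print(b.events)  # []
```

Note that the error appears at class-definition time, before any instance is created.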

Step 12 — Knowledge Check

Min. score: 80%

1. Which three dunder methods does @dataclass write for you by default (no extra kwargs)?

__init__, __eq__, __repr__
__init__, __del__, __str__
__del__ is for destructors (rare in Python — garbage collection handles it). @dataclass doesn’t write either of these. __repr__, not __str__, is what @dataclass generates.
__init__, __eq__, __hash__
Close, but __hash__ is only generated when you also pass frozen=True (or unsafe_hash=True). By default, a non-frozen dataclass is unhashable to discourage using mutable objects as dict keys.
__new__, __copy__, __format__
These aren’t related. @dataclass focuses on the standard data-holder boilerplate: construction, equality, and string representation.

@dataclass writes __init__ (so you can write Point(3, 4)), __eq__ (structural equality based on fields), and __repr__ (a readable string like Point(x=3, y=4)). __hash__ is generated only with frozen=True (or unsafe_hash=True); with eq=False the class simply keeps the identity-based __hash__ inherited from object.
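A short sketch of the default behavior, assuming two hypothetical classes MutablePoint and FrozenPoint that differ only in the frozen flag:

```python
from dataclasses import dataclass


@dataclass
class MutablePoint:  # hypothetical name
    x: int
    y: int


@dataclass(frozen=True)
class FrozenPoint:  # hypothetical name
    x: int
    y: int


print(MutablePoint(3, 4) == MutablePoint(3, 4))  # True (generated __eq__)
print(MutablePoint(3, 4))                        # MutablePoint(x=3, y=4)
print(MutablePoint.__hash__ is None)             # True: unhashable by default

try:
    hash(MutablePoint(3, 4))
except TypeError as exc:
    print("TypeError:", exc)

# Equal frozen instances hash equally, so they can be dict keys or set members.
print(hash(FrozenPoint(3, 4)) == hash(FrozenPoint(3, 4)))  # True
```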

2. Given:

@dataclass(frozen=True)
class Point:
    x: int
    y: int

p = Point(3, 4)
p.x = 99

What does the assignment p.x = 99 do?

It silently succeeds — frozen=True only affects equality, not assignment
frozen=True is specifically about preventing mutation — that’s its whole purpose. Equality is generated by eq=True (the default), separately.
It raises AttributeError — the field is read-only because it has no setter
AttributeError is what you get from @property without a setter, or from accessing a missing attribute. Frozen dataclasses raise their own dedicated exception, which is a subclass of AttributeError, so the more specific name is the better answer.
It raises FrozenInstanceError because frozen dataclasses block __setattr__
It raises TypeError — 99 is not declared int in this context
Type annotations are not enforced at runtime (Step 5). The assignment fails because the class is frozen, not because of any type check.

@dataclass(frozen=True) overrides __setattr__ to raise FrozenInstanceError on any attempt to assign to a field. This is what gives you immutability — and it’s also why frozen dataclasses are hashable (immutable values can be safely put into sets and dict keys).
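As a minimal sketch, the blocked assignment can be caught by name. FrozenInstanceError is imported from the dataclasses module, and the final check shows how it relates to AttributeError.

```python
from dataclasses import dataclass, FrozenInstanceError


@dataclass(frozen=True)
class Point:
    x: int
    y: int


p = Point(3, 4)
try:
    p.x = 99
except FrozenInstanceError as exc:
    print(type(exc).__name__)   # FrozenInstanceError

print(issubclass(FrozenInstanceError, AttributeError))  # True
print(p)                                                # Point(x=3, y=4), unchanged
```

Because of the subclass relationship, an except AttributeError clause would also catch it; catching the specific name documents the intent better.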

3. Which of these statements about @property are true? (Select all that apply.)

It lets you read a method’s result as an attribute, without parentheses
It makes the underlying field private — no callers can read it
@property is purely about interface shape (p.x vs p.x()). It doesn’t make anything private. Python’s privacy convention is the underscore prefix (_internal), and even that is only a convention — there are no hard private fields.
It can be combined with a setter (using @<name>.setter) to control writes
It is a special form of __getattr__
@property is implemented as a descriptor on the class, not via __getattr__. __getattr__ is a fallback hook that runs only when normal attribute lookup fails, which is a completely different mechanism.

@property lets a method look like an attribute on the outside (no parens). You can pair it with @<name>.setter to also control writes. It does not make the underlying state private — that’s a Java/C# habit that doesn’t translate. And it is a descriptor, not __getattr__.
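Pairing a property with a setter might look like the sketch below. Thermostat, its celsius attribute, and the absolute-zero check are hypothetical, picked only to show the @<name>.setter syntax on a regular (non-frozen) class.

```python
class Thermostat:  # hypothetical class, for illustration only
    def __init__(self, celsius: float) -> None:
        self.celsius = celsius          # routed through the setter below

    @property
    def celsius(self) -> float:
        return self._celsius

    @celsius.setter
    def celsius(self, value: float) -> None:
        # The setter is the single place where writes are validated.
        if value < -273.15:
            raise ValueError("temperature below absolute zero")
        self._celsius = value


t = Thermostat(21.5)
t.celsius = 19.0
print(t.celsius)     # 19.0
print(t._celsius)    # 19.0: the underscore is a convention, not a lock

try:
    t.celsius = -300
except ValueError as exc:
    print("ValueError:", exc)
print(t.celsius)     # 19.0 (the rejected write changed nothing)
```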

4. (Spaced review — Step 7: List Comprehensions) What is points[2] after this line?

points = [Point(x, x * 2) for x in range(5)]

Point(2, 2)
x * 2 for x = 2 is 4, not 2. The y-coordinate is the doubled value.
Point(2, 4)
Point(4, 8)
Indexing is 0-based: points[2] corresponds to x = 2, not x = 4.
(2, 4) — a plain tuple, since list comprehensions don’t return objects
List comprehensions can absolutely produce class instances. Point(x, x*2) is a function call expression — it constructs a Point for each iteration.

range(5) yields 0, 1, 2, 3, 4. The list comprehension constructs a Point for each, with y = x * 2. So points[2] corresponds to x = 2, giving Point(2, 4). List comprehensions compose just as well with custom classes as with primitives.
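You can check the answer directly with a few lines. This sketch redefines Point so that it runs on its own:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Point:
    x: int
    y: int


points = [Point(x, x * 2) for x in range(5)]
print(len(points))   # 5
print(points[0])     # Point(x=0, y=0)
print(points[2])     # Point(x=2, y=4)
print(points[-1])    # Point(x=4, y=8)
```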

5. Evaluate. For which use case is @dataclass(frozen=True) the best fit?

A shopping cart whose items are added and removed throughout the session
Shopping carts mutate over time — adding/removing items requires assignment to fields or methods that change state. A frozen dataclass would block all of that.
A 2D grid coordinate (row, col) used as a dictionary key
A database connection that holds a socket and a transaction state
Connections have identity (this specific socket) and changing internal state (buffer, transaction). They are reference objects, not value objects. Frozen dataclasses fit values where two with equal fields are interchangeable — connections aren’t.
A logger object that buffers messages and flushes them periodically
Loggers buffer messages — internal state changes constantly. Also, two loggers with equal configuration aren’t interchangeable; they have identity.

frozen=True is the right fit for value objects: small, conceptually immutable, where Point(3, 4) == Point(3, 4) should mean “the same value.” Coordinates, money amounts, dates, RGB colors all fit this pattern. Things with changing internal state (carts, connections, loggers) are reference objects — use a regular class.
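The grid-coordinate case can be sketched as follows. Cell and board are hypothetical names; the point is only that equal frozen values find the same dictionary entry.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Cell:  # hypothetical name for a (row, col) grid coordinate
    row: int
    col: int


board = {Cell(0, 0): "X", Cell(1, 1): "O"}
print(board[Cell(1, 1)])      # O  (a new, equal Cell finds the same entry)
print(Cell(0, 0) in board)    # True
print(Cell(2, 2) in board)    # False

visited = {Cell(0, 0), Cell(0, 0), Cell(0, 1)}
print(len(visited))           # 2  (duplicates collapse by value)
```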

6. (Spaced review — Step 5: Type Hints) Given:

@dataclass(frozen=True)
class Point:
    x: int
    y: int

What happens when you write p = Point(3.5, 4.5)?

Runtime: TypeError — Python rejects floats where ints are annotated.
Type annotations on dataclass fields are not enforced by Python at runtime — same rule as Step 5. The annotations only tell @dataclass what to put in the auto-generated __init__ signature; they do not gate the values.
Runtime: constructs with p.x == 3.5. mypy: flags float ≠ int.
Runtime: Point(3, 4) (Python silently truncates floats). mypy: ok.
Python does not coerce floats to ints in dataclass construction — p.x stays 3.5. The annotation is documentation, not a converter.
Runtime: depends on whether frozen=True is set.
frozen=True controls post-construction mutation, not what types the constructor accepts. Annotations are inert at runtime regardless of frozen.

This is Step 5’s lesson applied inside @dataclass: the field annotations (x: int) are read by the decorator to wire up __init__, but Python never enforces them at runtime. Point(3.5, 4.5) constructs cleanly; mypy would flag it. The runtime-vs-static distinction is the same rule everywhere annotations appear — function signatures (Step 5) or dataclass fields (here).
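A small sketch of the runtime side of this answer. The fields() helper from the dataclasses module is used here only to list the field names the decorator discovered.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class Point:
    x: int
    y: int


p = Point(3.5, 4.5)                      # no runtime error despite x: int
print(p)                                 # Point(x=3.5, y=4.5)
print(type(p.x).__name__)                # float
print([f.name for f in fields(Point)])   # ['x', 'y']
```

A static checker such as mypy would report the float arguments; the interpreter itself never looks at the annotations when the constructor runs.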
