1

Hello, Python!

Why this matters

You already write C++ and shell scripts, but Python is the language of choice when you need to get something done fast — process a CSV, call an API, prototype an algorithm. It now ranks among the world’s top 5 most widely used languages, which makes learning it a great investment of your time. Before you can write Python idiomatically, you need a feel for how its execution model differs from what you already know.

🎯 You will learn to

  • Apply Python’s interpreted execution model by running your first script
  • Contrast Python’s syntax (no semicolons, no main(), indentation-based) with C++ and Bash

You already write C++ and shell scripts. Here is how Python fits into your toolkit:

Aspect   | C++                 | Bash                   | Python
---------|---------------------|------------------------|------------------------------------
Typing   | Static (int x)      | Untyped strings        | Dynamic (x = 5)
Memory   | Manual (new/delete) | N/A                    | Garbage-collected
Run with | Compile → ./app     | bash script.sh         | python3 script.py
Strength | Speed, systems code | Glue commands together | Rapid prototyping, data, automation

Python is the language of choice when you need to get something done fast — process a CSV, call an API, write a test harness, or prototype an algorithm before porting it to C++. Very large systems or systems with high performance requirements are often better implemented in statically typed, compiled languages like C++ or Rust to detect bugs earlier and to improve performance. However, Python has significantly grown in popularity in recent years and is now one of the top 5 most widely used programming languages in the world. In some surveys it even ranks number 1. So learning Python is a great investment of your time!

A Note About Errors

You will see many error messages in this tutorial. That is completely normal — every programmer, from beginner to expert, spends a large part of their time reading errors and debugging. Error messages are Python telling you exactly what to fix. Read them carefully; they are your most useful debugging tool. If you are not stuck at least some of the time, you are not learning.

Your First Python Script

Python’s print() is the equivalent of C++’s printf() / cout and Bash’s echo:

# Bash:   echo "Hello, World!"
# C++:    printf("Hello, World!\n");
# Python:
print("Hello, World!")

Notice there are no semicolons, no #include, and no main() function. Python scripts run top-to-bottom like shell scripts.

Predict Before You Run

Before changing anything, look at hello.py and predict: what will Python print when you click Run? Try it now and compare.

Task

Open hello.py. Change the message so it prints:

Hello, CS 35L!

Then click ▶ Run (or press Ctrl+Enter) to execute your script and see the output.

Starter files
hello.py
# Task: Change the message to "Hello, CS 35L!"
print("Hello, World!")
2

Variables, Types & f-Strings

Why this matters

Python’s dynamic typing eliminates the declaration ceremony you write every day in C++, but it does not make Python “weakly typed” — a confusion that traps C++ programmers and produces hard-to-find bugs. f-strings are the modern, readable way to format output, and they are far more compact than printf or cout << chains.

🎯 You will learn to

  • Apply Python’s dynamic typing to assign and inspect variables without declarations
  • Analyze the difference between dynamic typing and weak typing
  • Create formatted output using f-strings

Bridging Your C++ Mental Model

No Type Declarations

In C++ every variable must be declared with its type:

int   score   = 95;
float gpa     = 3.8;
std::string name = "Alice";

In Python, you just assign. Python infers the type:

score = 95        # int
gpa   = 3.8       # float
name  = "Alice"   # str

You can always check the type at runtime: print(type(score)) → <class 'int'>.
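To see what "dynamic" means in practice, here is a minimal sketch that rebinds one name to values of two different types. The variable names are only for illustration.

```python
# A name is just a label; the type belongs to the object it refers to.
value = 95
first_type = type(value)       # <class 'int'>

value = "ninety-five"          # rebinding the same name to a str is allowed
second_type = type(value)      # <class 'str'>

print(first_type, second_type)

# isinstance() is the usual way to test a type in real code.
print(isinstance(value, str))  # True
```

In C++ the second assignment would not compile, because the type is fixed by the declaration of the variable rather than by the value.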

String Quotes: "..." and '...' Are Interchangeable

In C++, single quotes and double quotes mean different things: 'A' is a char, while "Alice" is a string literal (an array of const char that converts to const char* or std::string). Mixing them up is usually a compile error.

In Python, single and double quotes are completely interchangeable for strings — there is no char type:

name = "Alice"    # str
name = 'Alice'    # also str — identical result

This is handy when your string itself contains quotes:

msg = "It's easy"          # double quotes avoid escaping the apostrophe
html = '<div class="box">' # single quotes avoid escaping the double quotes

In C++ you’d have to escape the inner double quotes: "<div class=\"box\">" (an apostrophe inside a double-quoted C++ string, as in "It's easy", needs no escape). Python lets you pick whichever quote style avoids the clash.

Convention: Most Python style guides (including PEP 8) accept either, but recommend picking one and being consistent. You’ll see both in the wild.

⚠️ Dynamic ≠ Weak: Python Still Has Type Rules

Python is dynamically typed (you don’t declare types) but strongly typed (it won’t silently convert between incompatible types). This trips up C++ programmers who assume “no declarations” means “no type errors”:

x = "5" + 3    # TypeError: can only concatenate str (not "int") to str

Unlike JavaScript (which would give "53"), Python refuses to guess. You must be explicit: int("5") + 3 → 8 or "5" + str(3) → "53".
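The sketch below puts the failing expression and both explicit conversions side by side. It uses try/except, which this tutorial has not introduced yet; treat it as a way to show the error without stopping the script.

```python
text = "5"
number = 3

try:
    result = text + number       # str + int is rejected
except TypeError as err:
    print(f"TypeError: {err}")

as_sum = int(text) + number      # convert the str to an int first
as_text = text + str(number)     # or convert the int to a str first

print(as_sum)    # 8
print(as_text)   # 53
```

Which conversion you want depends on intent: arithmetic or concatenation. Python makes you say which one.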

f-Strings — Like C++’s printf but Readable

# C++:    printf("Student: %s, GPA: %.1f\n", name, gpa);
# Python: (note the f prefix and {variable} syntax — same idea as shell's $variable)
print(f"Student: {name}, GPA: {gpa:.1f}")

The f"..." string is called an f-string (formatted string literal). It is Python’s idiomatic way to embed expressions inside strings.
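A few more f-string forms, as a quick sketch. The variables are sample values chosen for illustration; the braces can hold any expression, and a format specifier follows a colon.

```python
name = "Alice"
year = 2
gpa = 3.819

line_a = f"Next year: {year + 1}"   # arithmetic inside the braces
line_b = f"GPA: {gpa:.1f}"          # one decimal place
line_c = f"Name: {name.upper()}"    # method calls work too
line_d = f"{name} is in year {year}"

print(line_a)   # Next year: 3
print(line_b)   # GPA: 3.8
print(line_c)   # Name: ALICE
print(line_d)   # Alice is in year 2
```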

Predict Before You Code

Before writing any code, predict: what will type(3.14) return in Python? What about type("3.14")? Write your predictions down, then verify with print(type(...)) in the editor.

Task

Complete profile.py by replacing the print(...) placeholder with an f-string that produces:

Student: Alice | Year: 2 | Major: Computer Science | GPA: 3.82

Use :.2f inside the braces to format the GPA to two decimal places.

Starter files
profile.py
name  = "Alice"
year  = 2
gpa   = 3.819
major = "Computer Science"

print(f'The type of 3.14 is {type(3.14)}')
print(f'The type of "3.14" is {type("3.14")}')


# TODO: print the line below using a single f-string:
# Student: Alice | Year: 2 | Major: Computer Science | GPA: 3.82
# Hint: format gpa with :.2f inside the braces
print(...)
3

The Indentation Trap

Why this matters

Indentation is the single most common stumbling block when C++ programmers write Python. In C++ indentation is cosmetic; in Python, indentation is the syntax. Wrong indentation produces an IndentationError and confused students who do not know why their previously-fine code is now broken. Confronting this early prevents weeks of frustration.

🎯 You will learn to

  • Analyze Python code to identify indentation errors caused by negative transfer from C++
  • Apply correct indentation rules (4 spaces, never mixed with tabs) to fix block structure

⚠️ The Indentation Trap (Negative Transfer from C++)

In C++, indentation is cosmetic — the compiler ignores it, {} defines blocks. In Python, indentation IS the syntax. Wrong indentation = IndentationError.

# C++ programmer's instinct (WRONG in Python):
if score >= 90:
print("A")          # IndentationError: expected an indented block

# Correct Python:
if score >= 90:
    print("A")      # 4 spaces (or 1 tab — never mix them!)

Rule: Use 4 spaces per indent level. Never mix tabs and spaces.

Every block-opening statement (if, elif, else, for, while, def, class, …) ends with a : and the body must be indented one level further.
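If you want to see the rule enforced without crashing your script, one option is to hand source text to the built-in compile() and catch the error. This is only a sketch: the helper name compiles is made up for this example, and it uses def and try/except, which come later in the tutorial.

```python
bad_source = 'if True:\nprint("A")\n'         # body is not indented
good_source = 'if True:\n    print("A")\n'    # body indented by 4 spaces

def compiles(source):
    """Return True if Python accepts the source text, False on an indentation error."""
    try:
        compile(source, "<example>", "exec")
        return True
    except IndentationError as err:
        print(f"IndentationError: {err.msg}")
        return False

bad_ok = compiles(bad_source)
good_ok = compiles(good_source)
print(bad_ok, good_ok)   # False True
```

Note that the error is raised before any line of the bad source runs: Python checks block structure when it parses the file.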

Task: Fixer Upper

The file grades.py below has two bugs:

  1. An indentation error inside the if block
  2. A type error in one of the print statements

Fix both bugs so the script prints the correct letter grade for each score.

Starter files
grades.py
# Fixer Upper: Find and fix the two bugs in this script.
# Bug 1: Indentation error
# Bug 2: Type error in a print statement

scores = [95, 83, 71, 62, 55]

for score in scores:
    if score >= 90:
    print(f"Score {score}: A")
    elif score >= 80:
        print("Score " + score + ": B")
    elif score >= 60:
        print(f"Score {score}: C")
    else:
        print(f"Score {score}: F")
4

Functions

Why this matters

Functions are how you compose larger programs. Python’s def syntax is briefer than C++’s — no return type, no parameter types required — but the trade-off is that mistakes surface at runtime instead of compile time. Default parameters let you write APIs that are short to call in the common case and explicit when callers need control.

🎯 You will learn to

  • Apply def syntax to implement Python functions with optional type hints
  • Create functions with default parameter values and use them with positional or keyword arguments
  • Contrast Python’s def signature with C++ function signatures

Functions: def vs C++ Signatures

In C++ you must specify return types and parameter types:

int add(int a, int b) { return a + b; }

In Python you just use def. Types are optional (you can add them as type hints, but they are not enforced):

# SUB-GOAL: Define the function with its parameters
def add(a, b):
    # SUB-GOAL: Compute and return the result
    return a + b          # No type declarations required

# With optional type hints (documents intent, not enforced at runtime):
def add(a: int, b: int) -> int:
    return a + b

Default Parameters

A parameter can have a default value, used when the caller omits that argument. Default parameters must come after required ones — the same rule as in C++.

def greet(name, greeting="Hello"):
    print(f"{greeting}, {name}!")

greet("Alice")             # → Hello, Alice!   (uses default)
greet("Bob", "Welcome")    # → Welcome, Bob!   (overrides default)
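Callers can also pass arguments by keyword, which lets them skip over one default to set another. The function make_greeting and its punctuation parameter are hypothetical, added here only to have two defaults to choose between.

```python
def make_greeting(name, greeting="Hello", punctuation="!"):
    """Build a greeting string; the last two parameters have defaults."""
    return f"{greeting}, {name}{punctuation}"

a = make_greeting("Alice")                     # both defaults used
b = make_greeting("Bob", "Welcome")            # positional override of greeting
c = make_greeting("Cara", punctuation="?")     # keyword skips over greeting
d = make_greeting(greeting="Hi", name="Dev")   # keywords may come in any order

print(a)   # Hello, Alice!
print(b)   # Welcome, Bob!
print(c)   # Hello, Cara?
print(d)   # Hi, Dev!
```

C++ has no keyword arguments, so there you would have to spell out every earlier default to reach a later one.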

Predict Before You Code

Before writing any code, predict: what does mean([4, 8, 15, 16, 23, 42]) return? Do the mental math, write your answer down, then check it after implementing.

Task

Complete two functions in functions.py:

  1. mean(numbers) — returns the arithmetic mean. Hint: sum() and len() are built-in Python functions — no import needed. Python ships dozens of these (builtins) that are always available, similar to how printf is always available in C via <stdio.h> — except builtins require no #include at all.
  2. label_score(score, threshold=50) — returns "pass" if score >= threshold, otherwise "fail".

What does pass mean? In Python, pass is a do-nothing placeholder that makes an otherwise empty function or block body syntactically valid — the same idea as leaving a C++ function body as { }. The starter code uses pass to mark every spot you need to fill in. Replace every pass with your real implementation — no pass statements should remain in your final solution.

Starter files
functions.py
def mean(numbers):
    """Return the arithmetic mean of a list of numbers."""
    # TODO: implement using sum() and len()
    pass

def label_score(score, threshold=50):
    """Return 'pass' if score >= threshold, else 'fail'."""
    # TODO: implement using an if/else
    pass

# --- Quick self-test ---
data = [4, 8, 15, 16, 23, 42]
print(f"Data: {data}")
print(f"Mean: {mean(data)}")
print(f"Score 75: {label_score(75)}")
print(f"Score 30: {label_score(30)}")
print(f"Score 75 (threshold=80): {label_score(75, 80)}")
5

Type Hints

Why this matters

Dynamic typing is fast to write but easy to break. Type hints give you a middle ground: contracts that document your intent, that IDEs use for autocomplete, and that mypy enforces statically — without sacrificing Python’s flexibility. They are how serious Python codebases stay maintainable as they grow.

🎯 You will learn to

  • Apply type hint syntax to annotate Python function parameters and return values
  • Analyze why Python type hints are checked by external tools (mypy, IDEs) rather than by the interpreter at runtime

A Bridge from C++ Types

In C++, types are part of the contract the compiler enforces:

double mean(std::vector<double> numbers);   // compiler rejects mean("abc")

Python lets you write the same kind of contract — but it is checked by external tools (mypy, IDEs like PyCharm and VS Code/Pyright), not by the Python interpreter. The annotations live on the function but Python itself ignores them at runtime.

def mean(numbers: list[float]) -> float:
    return sum(numbers) / len(numbers)

Read this as: “numbers is annotated as a list of float; this function is annotated to return a float.” Python stores those annotations on mean.__annotations__ but never raises a TypeError from them.
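You can inspect the stored annotations yourself. A minimal sketch, assuming Python 3.9 or newer so that list[float] is valid at definition time:

```python
def mean(numbers: list[float]) -> float:
    return sum(numbers) / len(numbers)

hints = mean.__annotations__   # a plain dict attached to the function object
print(hints)

# Parameters appear in order, followed by the special key 'return'.
print(list(hints))             # ['numbers', 'return']
print(hints["return"])         # <class 'float'>
```

The dictionary is ordinary data. Tools such as mypy read the same information from the source text, without running the program.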

Built-in Generics vs. the typing Module

Since Python 3.9, you can use the built-in collections directly as generics — no import needed:

def biggest(scores: list[int]) -> int: ...
def lookup(table: dict[str, int], key: str) -> int: ...

For “could be int or None” (a common case), import from typing:

from typing import Optional

def first_failing(scores: list[int], threshold: int = 50) -> Optional[int]:
    """Return the first failing score, or None if everyone passed."""
    ...

Optional[int] means “int or None”. It is equivalent to int | None, which Python 3.10+ lets you write directly. Both spellings work.
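On the calling side, an Optional return value is usually checked with is None before use. The helper parse_port below is hypothetical and exists only to illustrate that pattern.

```python
from typing import Optional

def parse_port(text: str) -> Optional[int]:
    """Hypothetical helper: return the port as an int, or None if text is not all digits."""
    if text.isdigit():
        return int(text)
    return None

port = parse_port("8080")
missing = parse_port("http")

# Compare against None explicitly: a plain `if port:` would also reject a valid 0.
if port is not None:
    print(f"Port: {port}")
if missing is None:
    print("No port found")
```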

Predict Before You Run

What do you think happens at runtime when this is called with strings?

def add(a: int, b: int) -> int:
    return a + b

add("hello", "world")    # ← what does Python do here?

Predict first — actually write your prediction down or say it aloud — then try it in the editor. Most learners coming from C++ predict that Python rejects the call. Being wrong here is the lesson, not a failure: your C++ instinct is exactly what we are tuning. The answer is illuminating: Python does not raise a TypeError from the annotation. The + between two strings happily concatenates them. The annotation is documentation. The check happens when mypy (or your IDE) reads the source — not when Python runs it.

Task

Complete typed_grades.py. The functions are recycled from Step 4 — your job is to add type hints without changing any of the logic.

  1. Add hints to mean(numbers) so it accepts a list[float] and returns a float.
  2. Add hints to label_score(score, threshold=50) — both parameters are int, return is str. Remember the order: name: type = default.
  3. Add hints to first_failing(scores, threshold=50) — return type is Optional[int] (and don’t forget from typing import Optional).
  4. Predict, then run. At the bottom of the file, uncomment the probe print(mean(['a', 'b'])). Before you run it, write down what you predict happens — does Python raise an error? If so, where does the error come from (the annotation, or the function body)? Then run, and compare to your prediction. This step is the lesson; do not skip it.
Starter files
typed_grades.py
# Goal: add type hints to each function. The behavior is already correct.
# TODO: import Optional from typing (you'll need it for first_failing)

def mean(numbers):                              # TODO: annotate numbers and return type
    return sum(numbers) / len(numbers)

def label_score(score, threshold=50):           # TODO: annotate score, threshold, return type
    if score >= threshold:
        return 'pass'
    return 'fail'

def first_failing(scores, threshold=50):        # TODO: annotate — return type is Optional[int]
    """Return the first score below threshold, or None if all pass."""
    for s in scores:
        if s < threshold:
            return s
    return None

# --- Quick self-test ---
print(f"Mean:           {mean([4, 8, 15, 16, 23, 42])}")
print(f"Label 75:       {label_score(75)}")
print(f"First failing:  {first_failing([90, 80, 30, 70])}")

# --- Task item 4 (required): predict, then uncomment ---
# Predict FIRST: does Python raise an error? If so, from where?
# Then uncomment and run, and compare to your prediction.
# print(mean(['a', 'b']))
6

Loops

Why this matters

Iteration is the workhorse of any program. Python’s for is item-based by default — you almost never write for i in range(len(...)) like you would in C++. Mastering enumerate() and range() unlocks idiomatic Python, and avoiding the ** vs ^ and / vs // operator traps will save you hours of confused debugging.

🎯 You will learn to

  • Apply Python for loops with enumerate() and range() to iterate over collections idiomatically
  • Analyze the operator differences between Python and C++ (** vs ^, / vs //)

Transfer Note: C++ Range-Based Loops → Python for

If you have used modern C++ range-based for (for (auto& x : vec)), Python’s iteration model will feel familiar — Python just makes it the default. The key habit to build: reach for for x in collection first, not for i in range(len(...)).

Tuple Unpacking

Before diving into loops, one quick concept. Python can unpack a pair (or tuple) into separate variables in a single assignment:

pair = (0, "Alice")
i, name = pair        # i = 0, name = "Alice"

This works anywhere Python assigns a value — including in for loops. You will see this pattern immediately below with enumerate().
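Two more places where unpacking shows up, as a short sketch with made-up sample data: swapping two variables without a temporary, and looping over a list of pairs.

```python
point = (3, 4)
x, y = point          # x = 3, y = 4
x, y = y, x           # swap without a temporary variable
print(x, y)           # 4 3

pairs = [("Alice", 95), ("Bob", 83)]
report = []
for name, score in pairs:          # each pair is unpacked into two loop variables
    report.append(f"{name}: {score}")
print(report)         # ['Alice: 95', 'Bob: 83']
```

The number of names on the left must match the number of items on the right; otherwise Python raises a ValueError.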

Python for Loops: Iterating Over Collections

C++ for loops typically count indices. Python loops iterate over items directly:

// C++: index-based
for (int i = 0; i < nums.size(); i++) { cout << nums[i]; }

# Python: item-based (preferred)
for num in nums:
    print(num)

# Need the index too? enumerate() yields (index, item) pairs.
# Tuple unpacking splits each pair into two loop variables:
for i, num in enumerate(nums):
    print(f"Index {i}: {num}")
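enumerate() also accepts an optional start argument, which is handy for human-readable numbering. A small sketch with sample data:

```python
nums = [10, 20, 30]

labels = []
for position, num in enumerate(nums, start=1):   # count from 1 instead of 0
    labels.append(f"#{position}: {num}")

print(labels)   # ['#1: 10', '#2: 20', '#3: 30']
```

The start value changes only the counter; the items still come from the beginning of the list.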

range() — Generating Integer Sequences

C++ counting loops translate directly to range() in Python:

# C++: for (int i = 0; i < 5; i++) { ... }
for i in range(5):           # i = 0, 1, 2, 3, 4
    print(i)

# C++: for (int i = 1; i <= 5; i++) { ... }
for i in range(1, 6):        # i = 1, 2, 3, 4, 5  (stop is *exclusive*, like C++'s <)
    print(i)

# C++: for (int i = 0; i < 10; i += 2) { ... }
for i in range(0, 10, 2):    # i = 0, 2, 4, 6, 8  (optional step argument)
    print(i)

Key rule: range(start, stop) always includes start and excludes stop — exactly like C++’s i < stop.
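A quick way to check what a range() produces is to wrap it in list(). The sketch below also shows a negative step, which counts down.

```python
up = list(range(5))             # [0, 1, 2, 3, 4]
shifted = list(range(1, 6))     # [1, 2, 3, 4, 5]
evens = list(range(0, 10, 2))   # [0, 2, 4, 6, 8]
down = list(range(5, 0, -1))    # [5, 4, 3, 2, 1]  (stop is still excluded)
count = len(range(3, 8))        # 5 values: 3, 4, 5, 6, 7

print(up, shifted, evens, down, count)
```

A range object does not store its values; it computes them on demand, so range(1_000_000) costs no more memory than range(5).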

List Operations (append, remove, clear)

Unlike fixed-size C++ arrays, Python lists are dynamic (like std::vector). A few common operations you will use:

# C++: vec.push_back(5);
# Python:
result = []       # 1. Create an empty list
result.append(5)  # 2. Add an item to the end
result.append(10) # result is now [5, 10]

# Removing items:
result.remove(5)  # Removes the first occurrence of 5 (result is now [10])
                  # (Raises ValueError if 5 is not in the list)

result.clear()    # Empties the entire list (result is now [])
                  # C++: vec.clear();

⚠️ Two Operator Traps from C++

Trap 1: ** for exponentiation — not ^

Python uses ** for exponentiation. ^ is bitwise XOR, just as it is in C++ (where you would call std::pow() instead). Writing ^ for a power is a common carry-over from math notation:

2 ** 8    # 256  ✓  (two to the eighth power)
9 ** 0.5  # 3.0  ✓  (square root — works on floats)
2 ^ 8     # 10   ✗  (bitwise XOR — NOT exponentiation!)

Trap 2: / for float division — not integer division

In C++, 7 / 2 → 3 (integer division). In Python 3, / always gives a float:

7 / 2     # 3.5   (float division — different from C++!)
7 // 2    # 3     (floor division; matches C++'s / for non-negative ints)
7 % 2     # 1     (modulo; matches C++ for non-negative operands)
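One caveat when porting C++ arithmetic: for negative operands, // and % do not match C++. Python rounds the quotient toward negative infinity, while C++ truncates toward zero. A short sketch:

```python
floor_div = -7 // 2            # -4: rounds toward negative infinity
truncated = int(-7 / 2)        # -3: int() truncates toward zero, like C++'s -7 / 2
remainder = -7 % 2             # 1: result takes the sign of the divisor (C++ gives -1)
quotient, rest = divmod(7, 2)  # (3, 1): quotient and remainder in one call

print(floor_div, truncated, remainder, quotient, rest)
```

For non-negative operands the two languages agree, which is why this difference often goes unnoticed until a negative value appears.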

Predict Before You Code

Before implementing: what does running_total([1, 2, 3]) return? Trace through the loop by hand.

Task

Complete loops.py:

  1. running_total(numbers): returns a new list where each element is the cumulative sum up to that index. Example: running_total([1, 2, 3]) → [1, 3, 6]. Use a for loop.
Starter files
loops.py
def running_total(numbers: list[int]) -> list[int]:
    """Return a list of cumulative sums.
    Example: running_total([1, 2, 3]) == [1, 3, 6]
    """
    result = []
    total = 0
    for n in numbers:
        # TODO: add n to total, then append total to result
        pass
    return result

# --- Quick self-test ---
data = [4, 8, 15, 16, 23, 42]
print(f"Data:          {data}")
print(f"Running total: {running_total(data)}")

# Verify your understanding of / vs //
print(f"7 / 2  = {7 / 2}")    # What do you predict?
print(f"7 // 2 = {7 // 2}")   # What do you predict?
7

List Comprehensions

Why this matters

List comprehensions are one of the features that make Python Python. They turn five-line for-loops into a single readable expression, once you can read them. Recognizing the [expr for x in iter if cond] pattern is essential for reading any modern Python codebase, and writing them cleanly is what separates idiomatic Python from “Python written like C++”.

🎯 You will learn to

  • Create list comprehensions with filters using the [expr for x in iter if cond] pattern
  • Analyze when a comprehension is clearer than the equivalent for-loop and when it is not

Comprehensions Look Strange at First

List comprehensions are one of Python’s most powerful idioms, but their compact syntax can feel cryptic at first. That is normal — everyone reads comprehensions slowly when they first encounter them. After a few exercises they become natural. Do not worry if you need to mentally “unpack” each one into a for-loop to understand it.

Try It First (Productive Failure)

Challenge: Before reading further, try to build the list [1, 4, 9, 16, 25] (the squares of 1 through 5) in a single line of Python. You already know range() and ** from the previous step. Give it your best shot in the editor, then read on.

✨ Python Beacon: List Comprehensions

A list comprehension is a compact way to build a list. Once you recognize the pattern, you will see it everywhere in Python code:

# C++ equivalent:
# std::vector<int> squares;
# for (int i = 1; i <= 5; i++) squares.push_back(i * i);

# Python: one line — combines range() and **
squares = [x**2 for x in range(1, 6)]          # [1, 4, 9, 16, 25]

The general form is:

[expression  for variable in iterable]

Filtering with a Condition

Add an if at the end to keep only items that match:

evens = [x for x in range(10) if x % 2 == 0]   # [0, 2, 4, 6, 8]
nums  = [4, 8, 15, 16, 23, 42]
big   = [x for x in nums if x > 20]             # [23, 42]

Compared to a for-loop

# For-loop version:
result = []
for x in range(10):
    if x % 2 == 0:
        result.append(x)

# List comprehension — same result, one line:
result = [x for x in range(10) if x % 2 == 0]

List comprehensions are preferred when the transformation is simple — they are a recognized Python idiom that experienced readers understand at a glance.
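As a rough guide to where that line falls, here is a sketch with sample scores. A transform plus one filter reads well as a comprehension; once several branches are involved, a plain loop is usually easier to follow.

```python
scores = [95, 83, 71, 62, 55]

# Simple transform + filter: a comprehension reads well.
curved = [s + 5 for s in scores if s < 90]
print(curved)    # [88, 76, 67, 60]

# Several branches: a plain loop is usually clearer than a nested one-liner.
letters = []
for s in scores:
    if s >= 90:
        letters.append("A")
    elif s >= 80:
        letters.append("B")
    else:
        letters.append("other")
print(letters)   # ['A', 'B', 'other', 'other', 'other']
```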

Predict Before You Code

Before writing any code, predict: what does [x**2 for x in range(4)] produce? Write your answer, then verify by typing it into the editor and clicking Run.

Task

Complete two functions in listcomp.py:

  1. above_average(numbers) — returns a list of numbers strictly greater than the mean. Use a list comprehension with a condition.
  2. squares_up_to(n) — returns [1, 4, 9, ..., n**2]. Use range() starting at 1 and ** for exponentiation in a list comprehension.
Starter files
listcomp.py
from functions import mean

def above_average(numbers: list[float]) -> list[float]:
    """Return a list of numbers strictly greater than the mean."""
    avg = mean(numbers)
    # Use a list comprehension with a condition
    pass

def squares_up_to(n: int) -> list[int]:
    """Return [1**2, 2**2, ..., n**2] using range() and **."""
    pass

# --- Quick self-test ---
data = [4, 8, 15, 16, 23, 42]
print(f"Data:          {data}")
print(f"Above average: {above_average(data)}")
print(f"Squares to 5:  {squares_up_to(5)}")
functions.py
def mean(numbers: list[float]) -> float:
    """Return the arithmetic mean of a list of numbers."""
    return sum(numbers) / len(numbers)

def label_score(score: int, threshold: int = 50) -> str:
    """Return 'pass' if score >= threshold, else 'fail'."""
    if score >= threshold:
        return 'pass'
    else:
        return 'fail'
8

Reading Files with open() and with

Why this matters

Reading files is something every program eventually has to do, and resource leaks (forgotten fclose()) are a classic C/C++ bug. Python’s with statement is the language’s elegant answer: a context manager that guarantees cleanup, even on exceptions. The same pattern (RAII in C++ terms) extends to network sockets, locks, and database connections — learning it here pays off everywhere.

🎯 You will learn to

  • Apply with open() to read files line-by-line in idiomatic Python
  • Analyze how Python’s context manager pattern relates to C++’s RAII

Python’s “Batteries Included” Philosophy

One of Python’s greatest strengths is its standard library — hundreds of modules ready to use with no installation:

Module      | What it does                    | C++ / Bash equivalent
------------|---------------------------------|------------------------
os, pathlib | File paths, directory traversal | <filesystem> / ls, find
sys         | Command-line args, exit codes   | argc/argv / $@
json        | Parse/write JSON                | Requires a library
re          | Regular expressions             | <regex> / grep
csv         | Read/write CSV                  | Manual parsing
subprocess  | Run shell commands              | system() / direct Bash

Reading Files with open() and with

In C-style code you call fopen, check for NULL, process the file, and call fclose. Python’s with statement handles the close automatically, even if an exception occurs:

# SUB-GOAL: Open the file (with ensures automatic close)
with open("data.txt") as f:
    # SUB-GOAL: Process each line
    for line in f:
        # SUB-GOAL: Clean and display
        print(line.strip())   # .strip() removes leading/trailing whitespace, including the newline

The with statement is Python’s resource management idiom — just like RAII in C++, the file is guaranteed to be closed when the block exits.
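You can verify the guarantee by checking the file object's closed attribute after the block. The sketch below writes its own temporary file so that it runs anywhere; the file contents and the simulated failure are made up for the demonstration.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, "data.txt")
    with open(path, "w") as f:
        f.write("first line\nsecond line\n")

    # Normal exit: the file is closed as soon as the block ends.
    with open(path) as f:
        lines = [line.strip() for line in f]
    closed_after_block = f.closed

    # Exit via an exception: the file is still closed.
    try:
        with open(path) as g:
            raise ValueError("simulated failure")
    except ValueError:
        closed_after_error = g.closed

print(lines)                                    # ['first line', 'second line']
print(closed_after_block, closed_after_error)   # True True
```

TemporaryDirectory is itself a context manager: the directory and its contents are removed when the outer with block exits.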

Predict Before You Code

Before writing any code, look at data.txt and predict: how many total words does it contain? Then complete the task, click Run, and see whether your mental count matches.

Task

Complete word_count.py. It should:

  1. Read every line from data.txt
  2. Split each line into words (.split() splits on whitespace)
  3. Count the total number of words across all lines
  4. Print: Total words: <count>

The file data.txt is already created for you.

Starter files
word_count.py
# SUB-GOAL: Initialize the counter
total = 0

# SUB-GOAL: Open and read the file
with open("data.txt") as f:
    for line in f:
        words = line.split()
        # SUB-GOAL: Accumulate the count
        # TODO: add len(words) to total
        pass

# SUB-GOAL: Report the result
# TODO: print "Total words: <count>"
pass
data.txt
the quick brown fox jumps over the lazy dog
pack my box with five dozen big liquor jugs
how vexingly quick daft zebras jump
9

Regular Expressions in Python: the re Module

Why this matters

You already know regex from grep and sed. Python’s re module brings that same power inside a script — no subprocess, no fragile shell escaping. Whenever you need to extract structured data from text (log lines, HTML, CSV oddities, error messages), re.findall(), re.search(), and re.sub() are the three tools that solve the vast majority of cases.

🎯 You will learn to

  • Apply re.findall(), re.search(), and re.sub() to extract, test, and transform text patterns
  • Apply raw strings (r'...') to write regex patterns without backslash-escaping headaches

From grep to Python

In the RegEx tutorial you used patterns with grep -E and sed. Python’s built-in re module gives you the same power inside a script — no subprocess needed:

Shell                  | Python re equivalent
-----------------------|-----------------------------------
grep -E 'pattern' file | re.findall(r'pattern', text)
grep -c 'pattern' file | len(re.findall(r'pattern', text))
sed 's/old/new/g' file | re.sub(r'old', 'new', text)
Test if a match exists | re.search(r'pattern', text)

The three essential functions

import re

text = "Error 404: page not found. Error 500: server crash."

# SUB-GOAL: Find the first match
m = re.search(r'Error \d+', text)
if m:
    print(m.group())     # "Error 404"

# SUB-GOAL: Find all matches
codes = re.findall(r'\d+', text)
print(codes)             # ['404', '500']

# SUB-GOAL: Replace all matches
clean = re.sub(r'Error \d+', 'ERR', text)
print(clean)             # "ERR: page not found. ERR: server crash."

Raw strings (r'...') are the standard for regex patterns in Python — they prevent Python from interpreting backslashes before re sees them.
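To see what goes wrong without the r prefix, compare the two spellings of a word-boundary pattern. In a normal string, \b is the backspace character; in a raw string it reaches re as backslash-b, the word-boundary token. The sample text is made up.

```python
import re

plain = "\bcat\b"    # Python turns each \b into a backspace character
raw = r"\bcat\b"     # the backslashes survive, so re sees word boundaries

print(len(plain), len(raw))   # 5 7

text = "cat concat cat."
raw_hits = re.findall(raw, text)       # whole-word matches only
plain_hits = re.findall(plain, text)   # looks for literal backspaces, finds none

print(raw_hits)     # ['cat', 'cat']
print(plain_hits)   # []
```

The failure is silent: the non-raw pattern is valid, it just matches something you did not intend.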

Predict Before You Code

Before implementing: what does re.findall(r'\d+', 'boot in 3... 2... 1...') return? Write your prediction, then check in the editor.

Task

Complete log_parser.py. The log file is already loaded as a string for you.

  1. Use re.findall() to collect all timestamps (HH:MM:SS pattern) and print the count
  2. Use re.findall() to collect every ERROR line and print the count
  3. Use re.sub() to redact all IP addresses with "x.x.x.x" and print the redacted log
Starter files
log_parser.py
import re

with open("log.txt") as f:
    text = f.read()

# 1. Extract all timestamps (HH:MM:SS) and print count
# Hint: pattern is r'\d{2}:\d{2}:\d{2}'
# Expected output: Timestamps found: 6

# 2. Extract all ERROR lines and print count
# Hint: pattern is r'ERROR.*'
# Expected output: Errors: 2

# 3. Redact IPv4 addresses and print redacted log
# Hint: pattern is r'\d+\.\d+\.\d+\.\d+'
log.txt
2024-01-15 09:23:11 INFO  Server started on port 8080
2024-01-15 09:23:45 ERROR Connection failed: timeout
2024-01-15 09:24:02 INFO  Request from 192.168.1.42
2024-01-15 09:24:18 WARNING Slow response: 2345ms
2024-01-15 09:24:33 ERROR Disk usage at 94%
2024-01-15 09:24:51 INFO  Request from 10.0.0.7
10

sys.argv & stderr

Why this matters

Real Python scripts do not run from a hard-coded print — they take input from the command line, just like every CLI tool you use daily. sys.argv is the equivalent of argc/argv in C++, and routing error output to sys.stderr lets your scripts compose cleanly with shell pipelines (so users can redirect logs separately from data). Get this right and your scripts behave like proper Unix citizens.

🎯 You will learn to

  • Apply sys.argv to read and validate command-line arguments in a Python script
  • Apply sys.stderr (via print(..., file=sys.stderr)) to route error and diagnostic output away from stdout

Command-Line Arguments with sys.argv

import sys

# SUB-GOAL: Parse command-line arguments
# sys.argv is a list: ["script.py", "arg1", "arg2", ...]
# C++ equivalent:  argv[0], argv[1], ...

# SUB-GOAL: Validate arguments
if len(sys.argv) < 2:
    print("Usage: python3 script.py <filename>", file=sys.stderr)
    sys.exit(1)              # Exit with non-zero code — just like in C++

# SUB-GOAL: Use the argument
filename = sys.argv[1]

sys.argv[0] is always the script name itself. Extra arguments start at index 1. sys.exit(1) terminates the process with exit code 1 — the same convention as C’s exit(1).

Writing to stderr with print()

By default print() writes to stdout. Error and diagnostic messages should go to stderr, matching C++’s std::cerr and Bash’s >&2 redirect:

import sys

# C++: std::cout << "Done." << std::endl;
print("Done.")                                    # → stdout

# C++: std::cerr << "Warning: file not found" << std::endl;
print("Warning: file not found", file=sys.stderr) # → stderr

Separating them lets callers redirect each stream independently:

python3 script.py > output.txt 2> errors.txt
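The same separation can be observed from inside Python. This sketch uses contextlib.redirect_stdout and contextlib.redirect_stderr to capture each stream into its own in-memory buffer, standing in for the two files in the shell command.

```python
import contextlib
import io
import sys

captured_out = io.StringIO()
captured_err = io.StringIO()

with contextlib.redirect_stdout(captured_out), contextlib.redirect_stderr(captured_err):
    print("Done.")                                      # goes to stdout
    print("Warning: file not found", file=sys.stderr)   # goes to stderr

out_text = captured_out.getvalue()
err_text = captured_err.getvalue()

print(repr(out_text))   # 'Done.\n'
print(repr(err_text))   # 'Warning: file not found\n'
```

Neither buffer contains the other stream's message, which is what lets a pipeline consume the data while the diagnostics go elsewhere.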

Predict Before You Code

Before writing any code, predict: if you run python3 script.py with no arguments, what is sys.argv? Is it an empty list, or does it contain something? Verify by adding print(sys.argv) to a test script.

Task

Write safe_word_count.py from scratch. (Note: to pass the file to your script, type data.txt into the “args:” input box in the Output panel.) It should:

  1. If no filename argument is provided (len(sys.argv) < 2), print Error: no filename given to sys.stderr and call sys.exit(1)
  2. Read filename = sys.argv[1] and print Reading: <filename> to sys.stderr
  3. Count words and print Total words: <count> to stdout
Starter files
safe_word_count.py
import sys

# Write the complete script from scratch.
# Requirements:
#   1. Check sys.argv — error to stderr + exit(1) if no filename
#   2. Print "Reading: <filename>" to stderr
#   3. Count words, print "Total words: <count>" to stdout
data.txt
the quick brown fox jumps over the lazy dog
pack my box with five dozen big liquor jugs
how vexingly quick daft zebras jump

11

Capstone: Build a Log Analyzer

Why this matters

You now have all the component skills — functions, file I/O, regex, list comprehensions, and command-line arguments. The hard part of programming is not learning each piece in isolation, but composing them into something that solves a real problem. This capstone is your chance to integrate everything you’ve learned with no scaffolding telling you what to type.

🎯 You will learn to

  • Create a complete Python script that integrates functions, file I/O, regex, list comprehensions, and command-line arguments
  • Apply your judgment to structure code without step-by-step guidance

Putting It All Together

This capstone combines those component skills in a single real-world script, with no scaffolding. You decide how to structure the code.

Task

Build log_analyzer.py, a command-line tool that analyzes a server log. (Note: to pass the log as a program argument, type server.log into the “args:” input box in the Output panel.)

Requirements:

  1. Accept a filename via sys.argv[1]. If missing, print an error to stderr and exit with code 1.
  2. Read the file and extract:
    • The total number of log lines
    • All unique IP addresses (use re.findall() and a set)
    • The number of ERROR lines
    • The number of WARNING lines
  3. Print a summary report to stdout in this exact format:
    Log Analysis Report
    ===================
    Total lines:    6
    Unique IPs:     2
    Errors:         2
    Warnings:       1
    
  4. Print Reading: <filename> to stderr at the start.

Hints (only if you’re stuck):

  • Use a function for each sub-task (e.g., count_by_level(), extract_ips())
  • Use list comprehensions or re.findall() to filter lines
  • Use len(set(...)) to count unique items
  • f-string format specifiers like {value:>8} right-align in 8 characters
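If the last two hints are unfamiliar, here is a small sketch of both techniques on made-up text that is deliberately unrelated to server.log. The sample string and the names in it are assumptions chosen only for illustration:

```python
import re

# Hypothetical sample text, unrelated to server.log.
text = "alice logged in; bob logged in; alice logged out"

# re.findall() returns every match as a list; set() drops the duplicates.
names = re.findall(r"\b(?:alice|bob)\b", text)
unique_names = len(set(names))

# {value:>8} right-aligns the value in a field 8 characters wide.
line = f"Users:{unique_names:>8}"
print(line)
```

The width in the format specifier is yours to choose; pick whatever makes your columns line up with the required report.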
Starter files
log_analyzer.py
# Capstone: Build a complete log analyzer.
# No scaffolding — use everything you have learned.
import sys
import re
server.log
2024-01-15 09:23:11 INFO  Server started on port 8080
2024-01-15 09:23:45 ERROR Connection failed: timeout
2024-01-15 09:24:02 INFO  Request from 192.168.1.42
2024-01-15 09:24:18 WARNING Slow response: 2345ms
2024-01-15 09:24:33 ERROR Disk usage at 94%
2024-01-15 09:24:51 INFO  Request from 10.0.0.7

12

Data Classes

Why this matters

Plain Python classes force you to write __init__, __eq__, and __repr__ by hand, which is a lot of boilerplate for what amounts to a simple struct. @dataclass generates that plumbing automatically, frozen=True makes instances immutable, and @property lets you compute attributes on the fly. Together, these turn data modeling in Python from tedious to elegant.

🎯 You will learn to

  • Create value-object classes using @dataclass to eliminate __init__ / __eq__ / __repr__ boilerplate
  • Apply frozen=True to make dataclass instances immutable
  • Create computed attributes with @property
  • Evaluate when each tool is the right choice

A Bridge from C++ Structs

In C++ you would describe a 2D point with a struct: a small data holder, often with comparison via operator== (hand-written as below, or generated with = default since C++20) and printing via a hand-written operator<<.

struct Point {
    const int x;          // immutable field
    const int y;
    bool operator==(const Point& o) const { return x == o.x && y == o.y; }
};

Plain Python classes work for this, but you have to write all the boilerplate yourself — __init__, __eq__, __repr__. The starter file shows that pain on purpose. Then @dataclass writes those three methods for you.

from dataclasses import dataclass

@dataclass
class Point:
    x: int
    y: int

That tiny declaration is roughly equivalent to a 10-line hand-written class. It uses the type hints from Step 5 (x: int) — that’s how @dataclass knows what fields exist and what their types are.
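To see the generated methods at work, here is a short sketch using the same Point declaration. The variable names are illustrative only:

```python
from dataclasses import dataclass


@dataclass
class Point:
    x: int
    y: int


a = Point(3, 4)          # generated __init__ (positional arguments)
b = Point(x=3, y=4)      # keyword arguments work too
same = (a == b)          # generated __eq__ compares field by field
text = repr(a)           # generated __repr__ shows the class name and fields
print(same, text)

# Type hints are not enforced at runtime: this line runs without error.
loose = Point("three", "four")
```

The last line is worth remembering: @dataclass reads the annotations to discover the fields, but it does not check values against them. A static type checker does that job.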

frozen=True: Immutability as a Design Tool

Add frozen=True and instances become immutable — like declaring all fields const in the C++ struct above. Trying to assign raises FrozenInstanceError:

@dataclass(frozen=True)
class Point:
    x: int
    y: int

p = Point(3, 4)
p.x = 99       # ❌ FrozenInstanceError — Point is immutable

Immutability is not just a defensive habit. @dataclass already compares instances field by field (two Point(3, 4) instances compare equal); freezing guarantees that those fields cannot change afterwards, and it makes the instance hashable (so you can put it in a set or use it as a dict key).
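A brief sketch of what hashability buys you, using the frozen Point from above. The set and dict contents are made up for illustration:

```python
from dataclasses import dataclass, FrozenInstanceError


@dataclass(frozen=True)
class Point:
    x: int
    y: int


# Equal frozen instances hash the same, so a set keeps only one of them.
points = {Point(3, 4), Point(3, 4), Point(0, 0)}

# A frozen instance can also serve as a dict key.
labels = {Point(0, 0): "origin"}
label = labels[Point(0, 0)]     # looked up with a new but equal instance

# Assignment is rejected at runtime.
try:
    Point(3, 4).x = 99
    mutated = True
except FrozenInstanceError:
    mutated = False

print(len(points), label, mutated)
```

A non-frozen dataclass with the default settings is not hashable, so the set and dict lines would raise TypeError there.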

Value Objects vs. Reference Objects

The distinction underneath all of this:

  • A value object is its fields. Two Point(3, 4) instances are interchangeable, the same way two copies of the number 5 are interchangeable. Coordinates, money amounts, dates, RGB colors all fit this pattern. Value objects belong in sets, work as dict keys, and benefit from frozen=True.
  • A reference object has identity that survives equal contents. A database connection, a logger, a shopping cart, a file handle — even two with identical fields are not interchangeable. Reference objects need a regular class (or a non-frozen dataclass) because their internal state changes over time.

frozen=True is the design tool that says “this is a value object.” Asking “is the answer to a == b based on contents alone?” is the test: yes → value object → frozen dataclass; no → reference object → regular class.
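The a == b test can be tried directly. The sketch below assumes two hypothetical classes, Money (a value object) and Cart (a reference object), that are not part of the exercise files:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Money:                 # value object: equality is based on contents
    amount: int
    currency: str


class Cart:                  # reference object: default equality is identity
    def __init__(self):
        self.items = []


same_money = Money(5, "USD") == Money(5, "USD")   # same contents
cart_a, cart_b = Cart(), Cart()
same_cart = cart_a == cart_b                      # two distinct carts
cart_a.items.append("book")                       # state changes over time

print(same_money, same_cart)
```

Both carts start out with identical contents, yet they are not equal, because a plain class compares by identity unless you define __eq__ yourself.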

@property: a Method That Looks Like an Attribute

What about derived values, like the distance from the origin? You could write a method distance_to_origin(). But callers would have to remember the parens. @property lets you define a method that is read as an attribute — no parens at the call site:

@dataclass(frozen=True)
class Point:
    x: int
    y: int

    @property
    def distance_to_origin(self) -> float:
        return (self.x ** 2 + self.y ** 2) ** 0.5

p = Point(3, 4)
print(p.distance_to_origin)   # 5.0  — no parens!

@property does not make a field private; the Java/C# habit of hiding every field behind a getter is one to drop. It just lets a computation look like an attribute on the outside.

(C++ analogy note: @property has no exact C++ counterpart. The closest is a const getter member function — but C++ would still require parens at the call site. @property erases the parens.)
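Two further behaviors are easier to see on a mutable class. This sketch assumes a hypothetical Rectangle class, not part of the exercise: the property is recomputed on each access, and without a setter it is read-only.

```python
class Rectangle:             # hypothetical mutable class for illustration
    def __init__(self, width, height):
        self.width = width
        self.height = height

    @property
    def area(self):
        # Recomputed on every access, so it always reflects the current fields.
        return self.width * self.height


r = Rectangle(2, 3)
before = r.area              # 6
r.width = 10
after = r.area               # 30

# Without a setter, assigning to the property raises AttributeError.
try:
    r.area = 100
    assigned = True
except AttributeError:
    assigned = False

print(before, after, assigned)
```

Since the value is derived rather than stored, it can never drift out of sync with width and height.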

Predict Before You Run

Once you have made Point frozen, what do you predict happens when this runs?

p = Point(3, 4)
p.x = 99

Predict the exception type, then try it. If you guess AttributeError, you are pattern-matching from the “property without a setter” idiom, and you are half-right: frozen=True raises dataclasses.FrozenInstanceError, which is a subclass of AttributeError. Being half-right is informative; the more specific name reveals the mechanism, namely that the dataclass itself rejects the assignment rather than a missing setter.

Task

Complete geometry.py. The starter shows PointManual — the hand-written boilerplate version — so you can feel the contrast.

  1. TODO 1. Define Point using @dataclass (no kwargs yet) with two int fields x and y.
  2. TODO 2. Change to @dataclass(frozen=True) so Point is immutable.
  3. TODO 3. Add a @property distance_to_origin that returns (x**2 + y**2) ** 0.5, annotated -> float.
  4. TODO 4 (independent practice). Below Point, define a new frozen dataclass RGB with three int fields r, g, b and a @property as_hex that returns the lowercase 7-character hex string (e.g., RGB(255, 128, 0).as_hex == '#ff8000'). Use the f-string format f'{r:02x}' (a spaced review of Step 2) for two-digit hex. No further hints: this one is on you.

Stretch (optional): uncomment the mutation probe at the bottom and observe the FrozenInstanceError.

Starter files
geometry.py
from dataclasses import dataclass

class PointManual:
    """The OLD way: hand-written __init__, __eq__, __repr__."""
    def __init__(self, x, y):
        self.x = x
        self.y = y
    def __eq__(self, other):
        return isinstance(other, PointManual) and self.x == other.x and self.y == other.y
    def __repr__(self):
        return f"PointManual(x={self.x}, y={self.y})"

# TODO 1: Define `Point` using @dataclass with int fields x and y.
# TODO 2: Change to @dataclass(frozen=True) so Point is immutable.
# TODO 3: Add a @property distance_to_origin that returns sqrt(x**2 + y**2).
# TODO 4 (independent practice): Define a frozen dataclass `RGB` with
#         int fields r, g, b and a @property as_hex returning a string
#         like '#ff8000'. Use f'{r:02x}' for two-digit hex.

# --- Quick self-test (uncomment after you finish ALL TODOs above) ---
# a = Point(3, 4)
# b = Point(3, 4)
# print(a == b)                # True (free __eq__)
# print(a)                     # Point(x=3, y=4) (free __repr__)
# print(a.distance_to_origin)  # 5.0 (computed)
# print(RGB(255, 128, 0).as_hex)  # #ff8000 (print shows no quotes)

# Predict-before-run probe (uncomment after TODO 2):
# a.x = 99                     # What exception type does this raise?