A hands-on introduction to Python for students who already know C++ and shell scripting. Build a solid Python mental model by mapping every concept to what you already know — and spotting where the analogies break.
Learning objective: After this step you will be able to explain how Python’s execution model differs from C++ and write a basic Python script.
You already write C++ and shell scripts. Here is how Python fits into your toolkit:
| | C++ | Bash | Python |
|---|---|---|---|
| Typing | Static (int x) | Untyped strings | Dynamic (x = 5) |
| Memory | Manual (new/delete) | N/A | Garbage-collected |
| Run with | Compile → ./app | bash script.sh | python3 script.py |
| Strength | Speed, systems code | Glue commands together | Rapid prototyping, data, automation |
Python is the language of choice when you need to get something done fast — process a CSV, call an API, write a test harness, or prototype an algorithm before porting it to C++.
You will see many error messages in this tutorial. That is completely normal — every programmer, from beginner to expert, spends a large part of their time reading errors and debugging. Error messages are Python telling you exactly what to fix. Read them carefully; they are your most useful debugging tool. If you are not stuck at least some of the time, you are not learning.
Python’s print() is the equivalent of C++’s printf() / cout and Bash’s echo:
# Bash: echo "Hello, World!"
# C++: printf("Hello, World!\n");
# Python:
print("Hello, World!")
Notice there are no semicolons, no #include, and no main() function. Python scripts run top-to-bottom like shell scripts.
Before changing anything, look at hello.py and predict: what will Python print when you click Run? Try it now and compare.
Open hello.py. Change the message so it prints:
Hello, CS 35L!
Then click ▶ Run (or press Ctrl+Enter) to execute your script and see the output.
# Task: Change the message to "Hello, CS 35L!"
print("Hello, World!")
1. A C++ programmer sees this Python file and says: “This must be wrong — there’s no main() function and no semicolons.”
What should you tell them?
Python is an interpreted scripting language. Like Bash, it executes statements from top to bottom.
There is no required main() entry point (though you can simulate one with if __name__ == '__main__': ...).
Semicolons are optional in Python and almost never used.
2. Which of the following statements about Python is false?
Python is an interpreted language — you run it directly with python3 script.py with no separate compile step.
Behind the scenes CPython does compile to bytecode (.pyc), but this is invisible to the programmer.
3. In which scenario is Python a better choice than a shell script?
Shell scripts excel at chaining Unix commands. Python excels at anything involving data structures, algorithms, or complex logic — like parsing structured data, calling APIs, or processing text with conditionals and loops. The CSV/statistics task is exactly where Python shines over Bash.
4. A teammate is choosing between Python and C++ for a new project. The project needs to process 10 GB of sensor data as fast as possible in real time, with strict latency requirements. Another teammate suggests Python because “it’s easier.” Evaluate both suggestions. Which response best captures the trade-off?
This is a real-world trade-off. Python’s strength is rapid development; C++’s strength is raw performance. For strict latency requirements, C++ is likely needed for the hot path. But Python is excellent for prototyping, data exploration, and glue code around the performance-critical core. Many real systems combine both.
Learning objective: After this step you will be able to use Python’s dynamic typing and f-strings, and explain the difference between dynamic and weak typing.
In C++ every variable must be declared with its type:
int score = 95;
float gpa = 3.8;
std::string name = "Alice";
In Python, you just assign. Python infers the type:
score = 95 # int
gpa = 3.8 # float
name = "Alice" # str
You can always check the type at runtime: print(type(score)) → <class 'int'>.
"..." and '...' Are Interchangeable
In C++, single quotes and double quotes mean different things: 'A' is a char, while "Alice" is a const char* (or std::string). Mixing them up is a compile error.
In Python, single and double quotes are completely interchangeable for strings — there is no char type:
name = "Alice" # str
name = 'Alice' # also str — identical result
This is handy when your string itself contains quotes:
msg = "It's easy" # double quotes avoid escaping the apostrophe
html = '<div class="box">' # single quotes avoid escaping the double quotes
In C++ a string literal must use double quotes, so any double quote inside it has to be escaped: "<div class=\"box\">". Python lets you pick whichever quote style avoids the clash.
Convention: Most Python style guides (including PEP 8) accept either, but recommend picking one and being consistent. You’ll see both in the wild.
Python is dynamically typed (you don’t declare types) but strongly typed (it won’t silently convert between incompatible types). This trips up C++ programmers who assume “no declarations” means “no type errors”:
x = "5" + 3 # TypeError: can only concatenate str (not "int") to str
Unlike JavaScript (which would give "53"), Python refuses to guess. You must be explicit: int("5") + 3 → 8 or "5" + str(3) → "53".
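A minimal sketch of the rule above, for readers who want to see the error and both explicit fixes side by side. The variable names (`result`, `as_number`, `as_text`) are illustrative only.

```python
# Python raises TypeError instead of silently coercing str and int.
try:
    result = "5" + 3
except TypeError as exc:
    result = None
    print(f"TypeError: {exc}")

# Explicit conversions state which meaning you want.
as_number = int("5") + 3      # numeric addition
as_text = "5" + str(3)        # string concatenation

print(as_number, as_text)     # 8 53
```

Catching the exception here is only for demonstration; in normal code you would convert the value before combining it.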
printf but Readable
# C++: printf("Student: %s, GPA: %.1f\n", name.c_str(), gpa);
# Python: (note the f prefix and {variable} syntax — same idea as shell's $variable)
print(f"Student: {name}, GPA: {gpa:.1f}")
The f"..." string is called an f-string (formatted string literal). It is Python’s idiomatic way to embed expressions inside strings.
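The sketch below shows a few things an f-string can hold inside its braces: a plain variable, a function call, and a format specifier after a colon. The sample values and the `credits` list are made up for illustration.

```python
name = "Alice"
gpa = 3.819
credits = [4, 4, 3]

# Any expression can go inside the braces, optionally followed by a format spec.
line = f"{name}: GPA {gpa:.2f}, credits {sum(credits)}"
padded = f"[{name:>8}]"      # right-align in a field 8 characters wide
percent = f"{0.256:.1%}"     # percentage with one decimal place

print(line)      # Alice: GPA 3.82, credits 11
print(padded)    # [   Alice]
print(percent)   # 25.6%
```

The part after the colon uses the same mini-language as printf-style precision and width, which is why `:.2f` looks familiar.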
Before writing any code, predict: what will type(3.14) return in Python? What about type("3.14")? Write your predictions down, then verify with print(type(...)) in the editor.
Complete profile.py by replacing the print(...) placeholder with an f-string that produces:
Student: Alice | Year: 2 | Major: Computer Science | GPA: 3.82
Use :.2f inside the braces to format the GPA to two decimal places.
name = "Alice"
year = 2
gpa = 3.819
major = "Computer Science"
print(f'The type of 3.14 is {type(3.14)}')
print(f'The type of "3.14" is {type("3.14")}')
# TODO: print the line below using a single f-string:
# Student: Alice | Year: 2 | Major: Computer Science | GPA: 3.82
# Hint: format gpa with :.2f inside the braces
print(...)
1. What does type(3.14) return in Python?
Python’s floating-point type is called float. It is double precision, so it corresponds to C++’s double rather than to C++’s float.
You can always use type(x) to inspect a variable’s type at runtime —
a handy debugging tool that does not exist in C++ without runtime type info (RTTI).
2. Which of the following correctly uses an f-string to print "Price: €12.50"?
f-strings use the f"..." prefix and embed expressions with {expr}.
Format specifiers like :.2f (two decimal places) go inside the braces.
The % operator (option D) is the older printf-style formatting. It still works in Python 3, but f-strings are the modern idiom.
3. A student runs x = "5" + 3 in Python and gets a TypeError. They say: “But Python is dynamically typed — it should convert automatically!”
Analyze their misunderstanding. What is wrong with their reasoning?
This is a critical distinction: dynamic typing (types checked at runtime, not compile time) is
different from weak typing (implicit type coercion). Python is dynamic and strong.
JavaScript is dynamic and weak ("5" + 3 → "53"). C++ is static and mostly strong (it still permits some implicit conversions, such as int to double).
Understanding this prevents a whole class of bugs.
4. A student writes x = 42 in Python. What is the type of x?
Python infers the type from the assigned value. Integer literals like 42 become int.
Unlike C++, there is no explicit type declaration — Python does this automatically.
You can verify with type(x), which returns <class 'int'>.
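To make "dynamic" concrete, here is a small sketch showing that the type belongs to the value rather than to the name, so a name can be rebound to a value of another type. The names `value`, `first_type`, and `second_type` are illustrative only.

```python
# The type travels with the value, not with the variable name.
value = 95
first_type = type(value).__name__    # "int"

value = "ninety-five"                # rebinding to a str is allowed
second_type = type(value).__name__   # "str"

print(first_type, second_type)       # int str

# isinstance() is the usual way to test a type in real code
print(isinstance(3.8, float))        # True
```

In C++ the second assignment would be a compile error, because the type is fixed by the declaration.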
Learning objective: After this step you will be able to identify and fix indentation errors caused by negative transfer from C++, and explain why Python uses indentation instead of braces.
In C++, indentation is cosmetic — the compiler ignores it, {} defines blocks.
In Python, indentation IS the syntax. Wrong indentation = IndentationError.
# C++ programmer's instinct (WRONG in Python):
if score >= 90:
print("A") # IndentationError: expected an indented block
# Correct Python:
if score >= 90:
    print("A") # 4 spaces (or 1 tab, but never mix them!)
Rule: Use 4 spaces per indent level. Never mix tabs and spaces.
Every block-opening statement (if, elif, else, for, while, def, class, …)
ends with a : and the body must be indented one level further.
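One way to see the rule in action without crashing a script is to hand Python the source as a string. The sketch below uses the built-in compile(), which parses code without running it; the two source strings are made up for this illustration.

```python
# compile() parses source text without running it, so the error is safe to trigger.
bad_source = 'if True:\nprint("A")\n'        # body not indented
good_source = 'if True:\n    print("A")\n'   # body indented by 4 spaces

try:
    compile(bad_source, "<bad>", "exec")
    bad_error = None
except IndentationError as exc:
    bad_error = type(exc).__name__

compile(good_source, "<good>", "exec")       # parses without error
print(bad_error)                             # IndentationError
```

Because the check happens while parsing, Python reports the problem before any line of the file runs.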
The file grades.py below has two bugs: an indentation error in the if block and a type error in a print statement. Fix both bugs so the script prints the correct letter grade for each score.
# Fixer Upper: Find and fix the two bugs in this script.
# Bug 1: Indentation error
# Bug 2: Type error in a print statement
scores = [95, 83, 71, 62, 55]
for score in scores:
    if score >= 90:
    print(f"Score {score}: A")
    elif score >= 80:
        print("Score " + score + ": B")
    elif score >= 60:
        print(f"Score {score}: C")
    else:
        print(f"Score {score}: F")
1. A student writes the following Python and gets IndentationError: expected an indented block:
for item in inventory:
print(item)
Python uses indentation to define blocks, not braces. Any statement inside a for, if, or def
must be indented by at least one consistent level (4 spaces is the convention).
Forgetting this is the most common mistake for students coming from C++ or Java.
2. In Python, what marks the start of a new indented block (instead of { in C++)?
Every block-opening statement (if, for, while, def, class, …) ends with a colon :.
The body of the block is then indented one level. There are no braces — the indentation alone
defines where the block ends. This is unlike C++, Java, or JavaScript.
3. A student accidentally mixes tabs and spaces for indentation in the same Python file. What will happen when they run it?
Mixing tabs and spaces is a syntax error in Python 3. Python raises TabError: inconsistent use
of tabs and spaces in indentation. Always use 4 spaces (the universal Python convention) and
configure your editor to insert spaces when you press Tab.
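The sketch below reproduces the mixed-indentation case in a controlled way. It assumes a made-up source string in which one line is indented with a tab and the next with eight spaces, and it uses the built-in compile() so nothing is executed.

```python
# Line 2 is indented with a tab, line 3 with eight spaces.
mixed_source = "if True:\n\tx = 1\n        y = 2\n"

try:
    compile(mixed_source, "<mixed>", "exec")
    error_name = None
except TabError as exc:
    error_name = type(exc).__name__

print(error_name)                              # TabError
print(issubclass(TabError, IndentationError))  # True
```

The two lines look aligned in an editor with 8-column tabs, which is exactly why the mistake is hard to spot by eye.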
4. A teammate argues: “Python’s indentation-as-syntax is worse than C++’s braces because you can’t see block boundaries as clearly.” Another teammate replies: “It’s better because it forces everyone to format consistently.” Evaluate both claims. Which assessment is most accurate?
This is a genuine trade-off. Python’s indentation rule eliminates entire classes of formatting debates and ensures code looks like what it does. But it introduces risks when copy-pasting from web pages (which may mix tabs/spaces) or when editors silently convert between them. The key practice: configure your editor to insert 4 spaces for Tab.
Learning objective: After this step you will be able to implement Python functions with default parameters and contrast Python’s def syntax with C++ function signatures.
def vs C++ Signatures
In C++ you must specify return types and parameter types:
int add(int a, int b) { return a + b; }
In Python you just use def. Types are optional (you can add them as type hints, but they are not enforced):
# SUB-GOAL: Define the function with its parameters
def add(a, b):
    # SUB-GOAL: Compute and return the result
    return a + b # No type declarations required
# With optional type hints (documents intent, not enforced at runtime):
def add(a: int, b: int) -> int:
    return a + b
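A quick sketch of what "not enforced" means in practice: the hinted function accepts strings without complaint, and the hints are simply stored as metadata on the function object.

```python
def add(a: int, b: int) -> int:
    return a + b

# The hints say int, but nothing checks them at runtime.
print(add(2, 3))            # 5
print(add("2", "3"))        # 23 (str + str concatenates)

# The hints are kept as plain metadata on the function object.
print(add.__annotations__["return"])   # <class 'int'>
```

Separate static checkers such as mypy can read these hints and flag the second call, but the interpreter itself does not.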
A parameter can have a default value, used when the caller omits that argument. Default parameters must come after required ones — the same rule as in C++.
def greet(name, greeting="Hello"):
    print(f"{greeting}, {name}!")
greet("Alice") # → Hello, Alice! (uses default)
greet("Bob", "Welcome") # → Welcome, Bob! (overrides default)
Before writing any code, predict: what does mean([4, 8, 15, 16, 23, 42]) return? Do the mental math, write your answer down, then check it after implementing.
Complete two functions in functions.py:
mean(numbers): returns the arithmetic mean.
Hint: sum() and len() are built-in Python functions, so no import is needed. Python ships dozens of these (builtins) that are always available, similar to how printf is available in C via <stdio.h>, except that builtins require no #include at all.
label_score(score, threshold=50): returns "pass" if score >= threshold, otherwise "fail".
def mean(numbers):
    """Return the arithmetic mean of a list of numbers."""
    # TODO: implement using sum() and len()
    pass
def label_score(score, threshold=50):
    """Return 'pass' if score >= threshold, else 'fail'."""
    # TODO: implement using an if/else
    pass
# --- Quick self-test ---
data = [4, 8, 15, 16, 23, 42]
print(f"Data: {data}")
print(f"Mean: {mean(data)}")
print(f"Score 75: {label_score(75)}")
print(f"Score 30: {label_score(30)}")
print(f"Score 75 (threshold=80): {label_score(75, 80)}")
1. What is the output of the following code?
def describe(item, label="unknown"):
    return f"{item} is {label}"
print(describe("gold", "rare"))
print(describe("rock"))
label="unknown" is a default parameter. When describe("rock") is called without
a second argument, label falls back to "unknown". When describe("gold", "rare") is called,
label is set to "rare".
2. A C++ programmer writes a Python function and is confused that it “doesn’t return anything”:
def double(x):
    x * 2
print(double(5)) # prints None
In C++, forgetting return in a non-void function is undefined behavior — the compiler
may warn you, but the code might appear to work. In Python, the behavior is defined but
surprising: a function without return always returns None. You must explicitly
write return x * 2. This is a common mistake when switching languages.
3. What does mean([10, 20]) return if mean is defined as return sum(numbers) / len(numbers)?
In Python 3, / always performs float division: 30 / 2 → 15.0.
This differs from C++, where 30 / 2 → 15 (integer division).
Python uses // for integer (floor) division: 30 // 2 → 15.
4. (Spaced review — Step 1: Python Execution Model) A teammate is confused: “I wrote a Python file with a helper function and some test prints, but when I import it from another file, all the test prints run too.” What should they use to prevent this?
Python scripts run top-to-bottom (like Bash). When imported, all top-level code
executes. if __name__ == '__main__': is the standard Python idiom to separate
“run as script” code from “importable” code. C++ doesn’t have this problem because
#include only brings in declarations, not executable statements.
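The idiom can be sketched as follows. The module contents and the file name helpers.py are hypothetical; only the `if __name__ == "__main__":` guard itself is the standard pattern.

```python
# Hypothetical module contents, e.g. saved as helpers.py (file name assumed).
def mean(numbers):
    return sum(numbers) / len(numbers)

def main():
    # Self-test code lives here instead of at the top level
    print(f"Self-test: {mean([10, 20])}")

# __name__ is "__main__" only when the file is run directly
# (python3 helpers.py); on import it is the module's own name.
if __name__ == "__main__":
    main()
```

With this layout, `from helpers import mean` in another file defines the function but prints nothing.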
5. Arrange the lines to define a function that returns the larger of two numbers, with a default for b.
(arrange in order)
def max_of(a, b=0):
    if a >= b:
        return a
    else:
        return b
return a, b  # distractor
The function signature comes first with the default parameter b=0.
The if/else block must be indented inside the function.
The return statements must be indented inside their respective branches.
The distractor return a, b would return a tuple, not the max.
Learning objective: After this step you will be able to use Python’s for loops, enumerate(), and range(), and identify the key operator differences between Python and C++ (** vs ^, / vs //).
for
If you have used modern C++ range-based for (for (auto& x : vec)), Python’s iteration model will feel familiar; Python just makes it the default. The key habit to build: reach for for x in collection first, not for i in range(len(...)).
Before diving into loops, one quick concept. Python can unpack a pair (or tuple) into separate variables in a single assignment:
pair = (0, "Alice")
i, name = pair # i = 0, name = "Alice"
This works anywhere Python assigns a value — including in for loops. You will see this pattern immediately below with enumerate().
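A short sketch of unpacking beyond a simple pair. The `record` tuple and its values are made up; the point is that the number of names on the left must match the number of items on the right.

```python
record = ("Alice", 2, 3.82)
name, year, gpa = record       # one name per element

# Swapping needs no temporary variable
a, b = 1, 2
a, b = b, a

# The counts must match, otherwise Python raises ValueError
try:
    x, y = record
    unpack_error = None
except ValueError as exc:
    unpack_error = str(exc)

print(name, year, gpa)   # Alice 2 3.82
print(a, b)              # 2 1
print(unpack_error)
```

The closest C++ analogue is a structured binding (`auto [i, name] = pair;`), available since C++17.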
for Loops: Iterating Over Collections
C++ for loops typically count indices. Python loops iterate over items directly:
// C++: index-based
for (int i = 0; i < nums.size(); i++) { cout << nums[i]; }
# Python: item-based (preferred)
for num in nums:
    print(num)
# Need the index too? enumerate() yields (index, item) pairs.
# Tuple unpacking splits each pair into two loop variables:
for i, num in enumerate(nums):
    print(f"Index {i}: {num}")
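To see exactly what enumerate() hands to the loop, the sketch below materialises its output as a list and also uses its optional start argument for 1-based numbering. The `nums` values are made up.

```python
nums = [10, 20, 30]

# list() shows the (index, item) pairs that the for loop unpacks
pairs = list(enumerate(nums))
print(pairs)                       # [(0, 10), (1, 20), (2, 30)]

# start=1 gives human-friendly numbering without manual i + 1
labels = []
for n, x in enumerate(nums, start=1):
    labels.append(f"{n}. {x}")
print(labels)                      # ['1. 10', '2. 20', '3. 30']
```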
range(): Generating Integer Sequences
C++ counting loops translate directly to range() in Python:
# C++: for (int i = 0; i < 5; i++) { ... }
for i in range(5): # i = 0, 1, 2, 3, 4
    print(i)
# C++: for (int i = 1; i <= 5; i++) { ... }
for i in range(1, 6): # i = 1, 2, 3, 4, 5 (stop is *exclusive*, like C++'s <)
    print(i)
# C++: for (int i = 0; i < 10; i += 2) { ... }
for i in range(0, 10, 2): # i = 0, 2, 4, 6, 8 (optional step argument)
    print(i)
Key rule: range(start, stop) always includes start and excludes stop, exactly like C++’s i < stop.
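Since range() is lazy, wrapping it in list() is a handy way to check the boundaries. The sketch below confirms the inclusive-start, exclusive-stop rule for a few cases, including a negative step.

```python
# range() is lazy; list() materialises the values so we can see them.
print(list(range(5)))          # [0, 1, 2, 3, 4]
print(list(range(1, 6)))       # [1, 2, 3, 4, 5]
print(list(range(0, 10, 2)))   # [0, 2, 4, 6, 8]
print(list(range(5, 0, -1)))   # [5, 4, 3, 2, 1] (negative step counts down)
print(len(range(1, 6)))        # 5, i.e. stop - start
```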
List operations (append, remove, clear)
Unlike fixed-size C++ arrays, Python lists are dynamic (like std::vector). A few common operations you will use:
# C++: vec.push_back(5);
# Python:
result = [] # 1. Create an empty list
result.append(5) # 2. Add an item to the end
result.append(10) # result is now [5, 10]
# Removing items:
result.remove(5) # Removes the first occurrence of 5 (result is now [10])
# (Raises ValueError if 5 is not in the list)
result.clear() # Empties the entire list (result is now [])
# C++: vec.clear();
Trap 1: ** for exponentiation — not ^
Python uses ** for exponentiation. ^ is bitwise XOR, just as in C++ (where exponentiation is pow()). Writing ^ for a power is a common carry-over from math notation:
2 ** 8 # 256 ✓ (two to the eighth power)
9 ** 0.5 # 3.0 ✓ (square root — works on floats)
2 ^ 8 # 10 ✗ (bitwise XOR — NOT exponentiation!)
Trap 2: / for float division — not integer division
In C++, 7 / 2 → 3 (integer division). In Python 3, / always gives a float:
7 / 2 # 3.5 (float division — different from C++!)
7 // 2 # 3 (floor division; matches C++'s / for non-negative integers)
7 % 2 # 1 (modulo — same as C++)
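One more place where the C++ analogy breaks is negative operands: Python's // rounds toward negative infinity, while C++'s integer / truncates toward zero. A small sketch of the difference:

```python
print(7 // 2)         # 3
print(-7 // 2)        # -4: floors toward negative infinity (C++ gives -3)
print(int(-7 / 2))    # -3: int() truncates toward zero, like C++'s /
print(-7 % 2)         # 1: result takes the sign of the divisor (C++ gives -1)
print(divmod(7, 2))   # (3, 1): quotient and remainder in one call
```

When porting C++ arithmetic on possibly negative integers, `int(a / b)` is a closer match to the C++ behaviour than `a // b` for small values, though it goes through a float.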
Before implementing: what does running_total([1, 2, 3]) return? Trace through the loop by hand.
Complete loops.py:
running_total(numbers): returns a new list where each element is the cumulative sum up to that index.
Example: running_total([1, 2, 3]) → [1, 3, 6]. Use a for loop.
def running_total(numbers):
    """Return a list of cumulative sums.
    Example: running_total([1, 2, 3]) == [1, 3, 6]
    """
    result = []
    total = 0
    for n in numbers:
        # TODO: add n to total, then append total to result
        pass
    return result
# --- Quick self-test ---
data = [4, 8, 15, 16, 23, 42]
print(f"Data: {data}")
print(f"Running total: {running_total(data)}")
# Verify your understanding of / vs //
print(f"7 / 2 = {7 / 2}") # What do you predict?
print(f"7 // 2 = {7 // 2}") # What do you predict?
1. Which of the following iterates over a list and gives both the index and the item?
enumerate(iterable) yields (index, value) pairs. Unpacking them into i, x gives you both
at once. This is the Pythonic replacement for C++’s index-based for (int i = 0; i < nums.size(); i++).
2. What does list(range(2, 8, 2)) evaluate to?
range(start, stop, step) generates numbers from start up to but not including stop,
counting by step. So range(2, 8, 2) → 2, 4, 6 (8 is excluded because stop is exclusive).
This matches C++’s for (int i = 2; i < 8; i += 2).
3. A C++ programmer expects 6 / 2 to return the integer 3 in Python. What actually happens?
In Python 3, / is always float division: 6 / 2 → 3.0.
For integer (floor) division like C++, use //: 7 // 2 → 3.
This is one of the most common negative-transfer traps from C++.
4. What are the values of a and b after this line?
a, b = (3, 7)
Python tuple unpacking splits the right-hand side into individual variables left-to-right:
a gets 3, b gets 7. This is the same mechanism that lets for i, x in enumerate(...):
split each (index, value) pair into two loop variables.
5. (Spaced review — Step 4: Functions)
What does this function return when called as compute(10)?
def compute(x, power=2):
    return x ** power
power=2 is a default parameter, so compute(10) uses power=2.
10 ** 2 is 100 (the ** operator is exponentiation, not multiplication).
This combines two concepts: default parameters (Step 4) and the ** operator (this step).
Learning objective: After this step you will be able to write list comprehensions with filters, and compare them with equivalent for-loop code.
List comprehensions are one of Python’s most powerful idioms, but their compact syntax can feel cryptic at first. That is normal — everyone reads comprehensions slowly when they first encounter them. After a few exercises they become natural. Do not worry if you need to mentally “unpack” each one into a for-loop to understand it.
Challenge: Before reading further, try to build the list [1, 4, 9, 16, 25] (the squares of 1 through 5) in a single line of Python. You already know range() and ** from the previous step. Give it your best shot in the editor, then read on.
A list comprehension is a compact way to build a list. Once you recognise the pattern, you will see it everywhere in Python code:
# C++ equivalent:
# std::vector<int> squares;
# for (int i = 1; i <= 5; i++) squares.push_back(i * i);
# Python: one line — combines range() and **
squares = [x**2 for x in range(1, 6)] # [1, 4, 9, 16, 25]
The general form is:
[expression for variable in iterable]
Add an if at the end to keep only items that match:
evens = [x for x in range(10) if x % 2 == 0] # [0, 2, 4, 6, 8]
nums = [4, 8, 15, 16, 23, 42]
big = [x for x in nums if x > 20] # [23, 42]
# For-loop version:
result = []
for x in range(10):
    if x % 2 == 0:
        result.append(x)
# List comprehension — same result, one line:
result = [x for x in range(10) if x % 2 == 0]
List comprehensions are preferred when the transformation is simple — they are a recognised Python idiom that experienced readers understand at a glance.
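The sketch below combines both features, a transforming expression and a filter, and checks that the comprehension matches the loop it replaces. The multiply-by-ten transformation is arbitrary, chosen only to make the two parts easy to tell apart.

```python
nums = [4, 8, 15, 16, 23, 42]

# Loop version
loop_result = []
for x in nums:
    if x % 2 == 0:
        loop_result.append(x * 10)

# Comprehension version: expression first, then the loop, then the filter
comp_result = [x * 10 for x in nums if x % 2 == 0]

print(comp_result)                 # [40, 80, 160, 420]
print(loop_result == comp_result)  # True
```

Reading a comprehension in loop order (for, then if, then the expression) is a reliable way to unpack it mentally.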
Before writing any code, predict: what does [x**2 for x in range(4)] produce? Write your answer, then verify by typing it into the editor and clicking Run.
Complete two functions in listcomp.py:
above_average(numbers): returns a list of numbers strictly greater than the mean.
Use a list comprehension with a condition.
squares_up_to(n): returns [1, 4, 9, ..., n**2].
Use range() starting at 1 and ** for exponentiation in a list comprehension.
from functions import mean
def above_average(numbers):
    """Return a list of numbers strictly greater than the mean."""
    avg = mean(numbers)
    # Use a list comprehension with a condition
    pass
def squares_up_to(n):
    """Return [1**2, 2**2, ..., n**2] using range() and **."""
    pass
# --- Quick self-test ---
data = [4, 8, 15, 16, 23, 42]
print(f"Data: {data}")
print(f"Above average: {above_average(data)}")
print(f"Squares to 5: {squares_up_to(5)}")
1. Which list comprehension correctly produces only the odd numbers from 1 to 9?
The filter condition goes at the end: [expr for var in iterable if condition].
Option B has the if before for — that is a syntax error.
Option C calls odd(x) which is not a built-in Python function.
Option D uses () which creates a generator, not a list.
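The difference between square brackets and parentheses is easy to check directly. In this sketch the names `as_list` and `as_gen` are illustrative; the filter is the odd-number test from the question.

```python
as_list = [x for x in range(1, 10) if x % 2 == 1]
as_gen = (x for x in range(1, 10) if x % 2 == 1)

print(type(as_list).__name__)   # list
print(type(as_gen).__name__)    # generator
print(as_list)                  # [1, 3, 5, 7, 9]

first_pass = list(as_gen)       # consumes the generator
second_pass = list(as_gen)      # nothing left: a generator runs only once
print(first_pass, second_pass)  # [1, 3, 5, 7, 9] []
```

A generator produces its items lazily, which saves memory for large inputs but means it cannot be indexed or reused like a list.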
2. A student rewrites [x**2 for x in range(5)] as a for-loop and gets the same result.
Why would a Python programmer prefer the list comprehension?
List comprehensions are preferred for their readability and conciseness when the transformation is simple. They are a recognised Python beacon — experienced Python readers immediately understand their intent. Performance-wise, they are slightly faster than equivalent for-loops, but readability is the primary motivation.
3. Analyze this code. What does it produce, and could a list comprehension replace it?
result = []
for name in ["Alice", "Bob", "Charlie"]:
    if len(name) > 3:
        result.append(name.upper())
The loop filters names longer than 3 characters, then converts to uppercase.
This is exactly the pattern list comprehensions handle: [expr for var in iterable if condition].
The comprehension equivalent is [name.upper() for name in ["Alice", "Bob", "Charlie"] if len(name) > 3].
4. (Spaced review — Step 2: f-Strings) What does this expression produce?
items = [3, 1, 4]
print(f"Count: {len(items)}, Sum: {sum(items)}")
f-strings can contain any valid Python expression inside the braces, including
function calls like len(items) and sum(items). This is one of their great strengths
over C++’s printf — you get the full power of Python expressions inline.
Learning objective: After this step you will be able to read files using with open() and explain how Python’s context manager pattern relates to C++’s RAII.
One of Python’s greatest strengths is its standard library — hundreds of modules ready to use with no installation:
| Module | What it does | C++ / Bash equivalent |
|---|---|---|
| os, pathlib | File paths, directory traversal | <filesystem> / ls, find |
| sys | Command-line args, exit codes | argc/argv / $@ |
| json | Parse/write JSON | Requires a library |
| re | Regular expressions | <regex> / grep |
| csv | Read/write CSV | Manual parsing |
| subprocess | Run shell commands | system() / direct Bash |
open() and with
In C++ you fopen, check for NULL, process, and fclose. Python’s with statement
handles the close automatically, even if an exception occurs:
# SUB-GOAL: Open the file (with ensures automatic close)
with open("data.txt") as f:
    # SUB-GOAL: Process each line
    for line in f:
        # SUB-GOAL: Clean and display
        print(line.strip()) # .strip() removes the trailing newline
The with statement is Python’s resource management idiom — just like RAII in C++,
the file is guaranteed to be closed when the block exits.
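To show what the with statement saves you from writing, the sketch below reads the same file twice: once with an explicit try/finally and once with with. It creates its own throwaway file, so the path and contents are assumptions made for this example.

```python
import os
import tempfile

# Create a small throwaway file so the sketch is self-contained.
path = os.path.join(tempfile.mkdtemp(), "data.txt")
with open(path, "w") as out:
    out.write("first line\nsecond line\n")

# Manual version: roughly what `with` does for you
f = open(path)
try:
    manual_lines = [line.strip() for line in f]
finally:
    f.close()               # runs even if the loop raises

# Idiomatic version
with open(path) as g:
    with_lines = [line.strip() for line in g]

print(manual_lines == with_lines)   # True
print(f.closed, g.closed)           # True True
```

The with form is shorter and harder to get wrong, because there is no close() call to forget.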
Before writing any code, look at data.txt and predict: how many total words does it contain? Then click Run on the starter code and see if your mental count matches.
Complete word_count.py. It should open data.txt, split each line into words (.split() splits on whitespace), and print Total words: <count>. The file data.txt is already created for you.
# SUB-GOAL: Initialize the counter
total = 0
# SUB-GOAL: Open and read the file
with open("data.txt") as f:
    for line in f:
        words = line.split()
        # SUB-GOAL: Accumulate the count
        # TODO: add len(words) to total
        pass
# SUB-GOAL: Report the result
# TODO: print "Total words: <count>"
pass
the quick brown fox jumps over the lazy dog
pack my box with five dozen big liquor jugs
how vexingly quick daft zebras jump
1. A student writes this code and asks why Python is better than C++ for this task:
with open("log.txt") as f:
    errors = [line for line in f if "ERROR" in line]
This is Python’s scripting sweet spot: the with statement handles resource cleanup,
files are directly iterable (no manual buffering), and the list comprehension filters in one line.
The equivalent C++ code would need an ifstream, a while (getline(...)) loop, a string search,
and a vector to collect the results: noticeably more code, even though ifstream also closes itself via RAII.
2. What does line.strip() do when reading lines from a file?
When you read a line from a file, it includes the trailing newline \n.
.strip() removes leading and trailing whitespace (spaces, tabs, \n, \r).
This is analogous to trimming a C++ std::string.
3. A teammate proposes reading a 2 GB log file with text = f.read() (loading the entire file into memory). Another proposes for line in f: (iterating line by line).
Evaluate both approaches. Which is better for a 2 GB file, and why?
f.read() loads the entire file into a single string in memory. For a 2 GB file, that’s
2 GB of RAM just for the string. for line in f: streams one line at a time — the memory
usage stays constant regardless of file size. This is the same principle as C++’s
getline() in a while loop vs reading the whole file with fstream::read().
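The two approaches can be compared on a small stand-in file. This sketch writes 1,000 made-up log lines to a temporary file, then counts ERROR lines by streaming and, separately, loads the whole text at once; the file name and line format are assumptions.

```python
import os
import tempfile

# Small stand-in for a large log (path and contents are made up).
path = os.path.join(tempfile.mkdtemp(), "big.log")
with open(path, "w") as f:
    for i in range(1000):
        level = "ERROR" if i % 100 == 0 else "INFO"
        f.write(f"{level} event {i}\n")

# Streaming: only one line is held in memory at a time
error_count = 0
with open(path) as f:
    for line in f:
        if line.startswith("ERROR"):
            error_count += 1

# Whole-file read: the full text is held in memory at once
with open(path) as f:
    text = f.read()

print(error_count)             # 10
print(len(text.splitlines()))  # 1000
```

Both give the same answers here; the difference only shows up as memory pressure when the file is far larger than this one.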
4. (Spaced review — Step 3: Indentation) What is wrong with this code?
with open("data.txt") as f:
for line in f:
print(line)
The with statement opens an indented block (note the :). Everything inside
that block must be indented — including the for loop. This is the same
indentation rule from Step 3: a colon : starts a block that must be indented.
5. (Spaced review — Step 2: String Quotes)
A student writes this Python code and gets a SyntaxError. Why?
message = 'It's a beautiful day'
Unlike C++ where 'x' is a char and "x" is a string, Python uses '...' and "..." interchangeably
for strings. This flexibility lets you pick whichever quote style avoids conflicts with the string’s content.
Here, "It's a beautiful day" avoids the problem entirely — no escaping needed.
6. Arrange the lines to read a file and count total words. (arrange in order)
total = 0
with open('data.txt') as f:
    for line in f:
        total += len(line.split())
print(f'Words: {total}')
f.close()  # distractor
Initialize the counter first, then open the file with with (no manual close() needed).
The for loop must be indented inside with, and the word-counting line inside for.
The print is outside both blocks (no indentation) because it runs after the file is processed.
The distractor f.close() is unnecessary — with handles closing automatically.
Learning objective: After this step you will be able to apply Python’s re.findall(), re.search(), and re.sub() to extract, test, and transform text patterns.
In the RegEx tutorial you used patterns with grep -E and sed. Python’s built-in
re module gives you the same power inside a script — no subprocess needed:
| Shell | Python re equivalent |
|---|---|
| grep -E 'pattern' file | re.findall(r'pattern', text) |
| grep -c 'pattern' file | len(re.findall(r'pattern', text)) |
| sed 's/old/new/g' file | re.sub(r'old', 'new', text) |
| Test if a match exists | re.search(r'pattern', text) |
import re
text = "Error 404: page not found. Error 500: server crash."
# SUB-GOAL: Find the first match
m = re.search(r'Error \d+', text)
if m:
    print(m.group()) # "Error 404"
# SUB-GOAL: Find all matches
codes = re.findall(r'\d+', text)
print(codes) # ['404', '500']
# SUB-GOAL: Replace all matches
clean = re.sub(r'Error \d+', 'ERR', text)
print(clean) # "ERR: page not found. ERR: server crash."
Raw strings (r'...') are the standard for regex patterns in Python —
they prevent Python from interpreting backslashes before re sees them.
Before implementing: what does re.findall(r'\d+', 'boot in 3... 2... 1...') return? Write your prediction, then check in the editor.
Complete log_parser.py. The log file is already loaded as a string for you.
1. Use re.findall() to collect all timestamps (HH:MM:SS pattern) and print the count
2. Use re.findall() to collect every ERROR line and print the count
3. Use re.sub() to redact all IP addresses with "x.x.x.x" and print the redacted log
import re
with open("log.txt") as f:
    text = f.read()
# 1. Extract all timestamps (HH:MM:SS) and print count
# Hint: pattern is r'\d{2}:\d{2}:\d{2}'
# Expected output: Timestamps found: 6
# 2. Extract all ERROR lines and print count
# Hint: pattern is r'ERROR.*'
# Expected output: Errors: 2
# 3. Redact IPv4 addresses and print redacted log
# Hint: pattern is r'\d+\.\d+\.\d+\.\d+'
2024-01-15 09:23:11 INFO Server started on port 8080
2024-01-15 09:23:45 ERROR Connection failed: timeout
2024-01-15 09:24:02 INFO Request from 192.168.1.42
2024-01-15 09:24:18 WARNING Slow response: 2345ms
2024-01-15 09:24:33 ERROR Disk usage at 94%
2024-01-15 09:24:51 INFO Request from 10.0.0.7
1. What does re.findall(r'\d+', 'boot in 3... 2... 1...') return?
re.findall() returns a list of strings — one string per non-overlapping match.
\d+ matches one or more digit characters, so it finds '3', '2', and '1'
independently, returning ['3', '2', '1'].
2. You want to know whether a log line contains an IP address, but you don’t need to extract it. Which function is most appropriate?
re.search() is the idiomatic choice for a yes/no existence check:
if re.search(r'\d+\.\d+\.\d+\.\d+', line):
    print("has IP")
It short-circuits on the first match and returns None if there is none —
exactly like grep -q in the shell.
3. Why are raw strings (r'\d+') preferred over regular strings ('\\d+') for regex patterns?
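In a regular string, Python processes backslash escapes before re ever sees the pattern: '\b' becomes a backspace character, and an unrecognised escape such as '\d' keeps its backslash but triggers a warning (a SyntaxWarning since Python 3.12).
In a raw string r'\d', the backslash is always preserved literally, so re receives the
two-character sequence \d and interprets it as “any digit”. Using raw strings avoids
double-escaping ('\\d+') and matches the pattern you see in grep or sed.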
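A concrete case where the r prefix changes the result is \b, which Python's string syntax treats as a backspace but the regex engine treats as a word boundary. The sample sentence in this sketch is made up.

```python
import re

# Raw string and doubled backslash produce the same characters
print(r"\d+" == "\\d+")         # True
print(len(r"\d"))               # 2: a backslash and the letter d

# "\b" is a single backspace character in a regular string,
# but a word boundary once the backslash reaches the regex engine.
print(len("\b"), len(r"\b"))    # 1 2

sample = "cat concat cat."
with_raw = re.findall(r"\bcat\b", sample)
without_raw = re.findall("\bcat\b", sample)
print(with_raw)      # ['cat', 'cat']
print(without_raw)   # []
```

The second search fails silently rather than raising an error, which is what makes a missing r prefix hard to debug.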
4. Analyze this code. What does results contain after execution?
import re
text = "alice@example.com and bob@test.org"
results = re.findall(r'\w+@\w+\.\w+', text)
re.findall() returns a list of all non-overlapping matches. The pattern
\w+@\w+\.\w+ matches word characters around an @ and a literal ., so results is
['alice@example.com', 'bob@test.org']. This combines \w+ (word chars), a literal @, and an escaped . (\.).
5. (Spaced review — Step 6: List Comprehensions)
Which expression produces ['ERROR Connection failed: timeout', 'ERROR Disk usage at 94%']
from a variable lines containing all log lines as a list of strings?
A list comprehension with a filter: [line for line in lines if 'ERROR' in line].
This is the same pattern from Step 6 — [expr for var in iterable if condition].
Note: you could also use re.findall(r'ERROR.*', text) on the full text string
(as you just learned), but the list comprehension works on a list of lines.
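The two approaches are close but not identical, which the following sketch illustrates on made-up, shortened log lines. The list comprehension keeps each matching line whole, while re.findall(r'ERROR.*', text) returns only the part from ERROR to the end of the line, so the results agree only when each line starts with ERROR.

```python
import re

# Hypothetical sample data: three short log lines with a time prefix.
text = (
    "09:23:11 INFO Server started\n"
    "09:23:45 ERROR Connection failed: timeout\n"
    "09:24:33 ERROR Disk usage at 94%\n"
)
lines = text.splitlines()

# Filter on a list of lines: each result is the complete line.
whole_lines = [line for line in lines if 'ERROR' in line]
print(whole_lines)
# ['09:23:45 ERROR Connection failed: timeout', '09:24:33 ERROR Disk usage at 94%']

# Regex on the full text: each result starts at "ERROR" (. does not match a newline).
from_error = re.findall(r'ERROR.*', text)
print(from_error)
# ['ERROR Connection failed: timeout', 'ERROR Disk usage at 94%']
```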
Learning objective: After this step you will be able to implement command-line argument handling with sys.argv and use sys.stderr for error messages.
sys.argv
import sys
# SUB-GOAL: Parse command-line arguments
# sys.argv is a list: ["script.py", "arg1", "arg2", ...]
# C++ equivalent: argv[0], argv[1], ...
# SUB-GOAL: Validate arguments
if len(sys.argv) < 2:
    print("Usage: python3 script.py <filename>", file=sys.stderr)
    sys.exit(1)  # Exit with a non-zero code, just like in C++
# SUB-GOAL: Use the argument
filename = sys.argv[1]
sys.argv[0] is always the script name itself. Extra arguments start at index 1.
sys.exit(1) terminates the process with exit code 1 — the same convention as C’s exit(1).
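Because sys.argv is an ordinary list, the indexing rule can be tried without a real command line. In this sketch, get_filename is a hypothetical helper and the lists stand in for what Python would put in sys.argv.

```python
def get_filename(argv):
    """Return the first real argument, or None if only the script name is present."""
    if len(argv) < 2:
        return None
    return argv[1]

# Simulates: python3 script.py
print(get_filename(["script.py"]))               # None

# Simulates: python3 script.py data.txt
print(get_filename(["script.py", "data.txt"]))   # data.txt
```

Passing the list in as a parameter, rather than reading sys.argv inside the function, is one way to keep such a helper easy to test.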
stderr with print()
By default print() writes to stdout. Error and diagnostic messages should go to stderr,
matching C++’s std::cerr and Bash’s >&2 redirect:
import sys
# C++: std::cout << "Done." << std::endl;
print("Done.") # → stdout
# C++: std::cerr << "Warning: file not found" << std::endl;
print("Warning: file not found", file=sys.stderr) # → stderr
Separating them lets callers redirect each stream independently:
python3 script.py > output.txt 2> errors.txt
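The same separation can be observed from inside Python. This is a sketch that uses the standard library's contextlib redirect helpers to capture each stream into its own buffer, as a stand-in for the shell redirection above; the variable names are illustrative.

```python
import io
import sys
from contextlib import redirect_stdout, redirect_stderr

captured_out = io.StringIO()
captured_err = io.StringIO()

# Inside this block, stdout and stderr go to two separate in-memory buffers.
with redirect_stdout(captured_out), redirect_stderr(captured_err):
    print("Done.")                                      # goes to stdout
    print("Warning: file not found", file=sys.stderr)   # goes to stderr

print(repr(captured_out.getvalue()))   # 'Done.\n'
print(repr(captured_err.getvalue()))   # 'Warning: file not found\n'
```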
Before writing any code, predict: if you run python3 script.py with no arguments, what is sys.argv? Is it an empty list, or does it contain something? Verify by adding print(sys.argv) to a test script.
Write safe_word_count.py from scratch. (Note: type data.txt into the “args: “ input box in the Output panel to add it to the program args to read this file). It should:
- If no filename is given (len(sys.argv) < 2), print Error: no filename given to sys.stderr and call sys.exit(1)
- Set filename = sys.argv[1] and print Reading: <filename> to sys.stderr
- Count the words in the file and print Total words: <count> to stdout
import sys
# Write the complete script from scratch.
# Requirements:
# 1. Check sys.argv — error to stderr + exit(1) if no filename
# 2. Print "Reading: <filename>" to stderr
# 3. Count words, print "Total words: <count>" to stdout
the quick brown fox jumps over the lazy dog
pack my box with five dozen big liquor jugs
how vexingly quick daft zebras jump
1. A script is run with python3 myscript.py hello world. What is sys.argv[0]?
sys.argv[0] is always the script name itself. Arguments start at index 1:
sys.argv[1] is "hello", sys.argv[2] is "world".
This mirrors C/C++’s argv[0] convention.
2. Why should error messages be written to sys.stderr rather than printed normally?
When stdout and stderr are separate streams, users can capture output (> out.txt) and errors
(2> err.txt) independently. Mixing error messages into stdout breaks pipelines —
a downstream command would receive the error text as data. This is the same reason C++ uses
std::cerr and Bash scripts use echo "error" >&2.
3. A script should exit with code 1 and print an error if the user provides no arguments.
Evaluate these two approaches. Which is correct Python?
Approach A:
import sys
if len(sys.argv) == 1:
    print("Error: no arguments", file=sys.stderr)
    sys.exit(1)
Approach B:
import sys
if len(sys.argv) == 1:
    print("Error: no arguments")
    sys.exit(1)
Approach A is correct. Error messages should go to sys.stderr so that if the user pipes
stdout to another program or file, the error message doesn’t contaminate the data stream.
Approach B “works” but violates the Unix convention of separating output from diagnostics.
4. (Spaced review — Step 5: Loops) A student writes this code to print each word with its position number. What is wrong?
words = ["apple", "banana", "cherry"]
for i in words:
    print(f"{i}: {words[i]}")
Python’s for i in words gives you the elements, not indices — this is
different from C++’s for (int i = 0; ...). Using words['apple'] causes
a TypeError. The Pythonic fix: for i, word in enumerate(words): gives both
the index and the value. This is a common negative transfer trap from C++.
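The enumerate() fix described above looks like this when written out (a minimal sketch; labeled is an extra variable added only to show the same idea as a list comprehension):

```python
words = ["apple", "banana", "cherry"]

# enumerate() yields (index, element) pairs, so no manual indexing is needed.
for i, word in enumerate(words):
    print(f"{i}: {word}")
# 0: apple
# 1: banana
# 2: cherry

# The same pairs collected into a list.
labeled = [f"{i}: {word}" for i, word in enumerate(words)]
```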
5. (Spaced review — Step 7: File I/O)
What happens if you forget the with keyword and write f = open("data.txt") instead?
Without with, the file opens normally but there’s no automatic cleanup.
You must manually call f.close(). If an exception occurs between open() and
close(), the file handle leaks — exactly the same problem as forgetting fclose()
in C. The with statement guarantees cleanup via Python’s context manager protocol.
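To see the difference side by side, here is a sketch that writes a throwaway file in a temporary directory (so it does not depend on data.txt existing) and then reads it back without with, using try/finally to get the same cleanup guarantee by hand.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "data.txt")   # throwaway file for the demo

    # Preferred: the file is closed when the block ends, even if an exception occurs.
    with open(path, "w") as f:
        f.write("hello\n")
    closed_after_with = f.closed

    # Without with: you must call close() yourself, so guard it with try/finally.
    f = open(path)
    try:
        text = f.read()
    finally:
        f.close()
    closed_after_finally = f.closed

print(closed_after_with, closed_after_finally)   # True True
```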
6. (Spaced review — Step 2: String Quotes)
In C++, 'A' is a char and "Alice" is a string — they are different types. What is the equivalent distinction in Python?
Python has no char type at all. 'A' and "A" are both str objects of length 1.
This means you can freely choose whichever quote style avoids escaping —
e.g., "It's easy" or '<div class="box">'. This is a key difference from C++,
where 'x' is a char and "x" is a string literal, so mixing them up usually causes a compile error.
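You can confirm this in a few lines. The sketch below also shows that indexing a string gives back another str of length 1 rather than a separate character type.

```python
a = 'A'
b = "A"

print(type(a).__name__, type(b).__name__)   # str str
print(a == b)                               # True
print(len(a))                               # 1

# Indexing a string yields a str of length 1, not a char.
first = "Alice"[0]
print(type(first).__name__, first)          # str A

# Pick the quote style that avoids escaping.
print("It's easy")
print('<div class="box">')
```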
Learning objective: After this step you will be able to design and implement a complete Python script that integrates functions, file I/O, regex, list comprehensions, and command-line arguments.
You now have all the component skills. This capstone integrates them into a single real-world script — with no scaffolding. You decide how to structure the code.
Build log_analyzer.py — a command-line tool that analyzes a server log. (Note: type server.log into the “args: “ input box in the Output panel to add it to the program args to read this file).
Requirements:
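- Read the filename from sys.argv[1]. If missing, print an error to stderr and exit with code 1.
- Count the unique IP addresses (use re.findall() and a set)
- Print a report with the line, unique-IP, error, and warning counts in this format:
Log Analysis Report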
===================
Total lines: 6
Unique IPs: 2
Errors: 2
Warnings: 1
- Print Reading: <filename> to stderr at the start.
Hints (only if you’re stuck):
- Break the script into functions (e.g., count_by_level(), extract_ips())
- Use re.findall() to filter lines
- Use len(set(...)) to count unique items
- In an f-string, {value:>8} right-aligns value in 8 characters
# Capstone: Build a complete log analyzer.
# No scaffolding — use everything you have learned.
import sys
import re
2024-01-15 09:23:11 INFO Server started on port 8080
2024-01-15 09:23:45 ERROR Connection failed: timeout
2024-01-15 09:24:02 INFO Request from 192.168.1.42
2024-01-15 09:24:18 WARNING Slow response: 2345ms
2024-01-15 09:24:33 ERROR Disk usage at 94%
2024-01-15 09:24:51 INFO Request from 10.0.0.7
1. You need to count the number of unique IP addresses in a log file.
You have a list of all IP addresses (with duplicates): ips = ['10.0.0.1', '10.0.0.2', '10.0.0.1'].
Which approach is most Pythonic?
set(ips) creates a set with only unique elements: {'10.0.0.1', '10.0.0.2'}.
len(...) gives the count. This is the Pythonic one-liner for “count unique items.”
Lists do not have a .unique() method (that’s pandas, not base Python).
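Run on the list from the question, the one-liner behaves as follows (a minimal sketch; the sorted() call is an optional extra for when you also want the unique values in a predictable order, since sets are unordered):

```python
ips = ['10.0.0.1', '10.0.0.2', '10.0.0.1']

unique_count = len(set(ips))
print(unique_count)        # 2

# Sets have no guaranteed order; sort if you need a stable listing.
unique_sorted = sorted(set(ips))
print(unique_sorted)       # ['10.0.0.1', '10.0.0.2']
```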
2. Evaluate this code for a log analyzer. What is the bug?
import sys, re
filename = sys.argv[1]
with open(filename) as f:
    text = f.read()
errors = re.findall(r'ERROR.*', text)
warnings = re.findall(r'WARNING.*', text)
ips = re.findall(r'\d+\.\d+\.\d+\.\d+', text)
print(f"Errors: {len(errors)}")
print(f"Warnings: {len(warnings)}")
print(f"Unique IPs: {len(ips)}")
Two bugs: (1) No argument validation — sys.argv[1] will raise IndexError if the user
runs the script without arguments. (2) len(ips) counts all IPs including duplicates;
len(set(ips)) would count unique IPs. Good code validates inputs and uses the right
data structure for the task.
3. Analyze the design of a log analyzer script. A student puts all logic in one long script with no functions. Another student breaks it into functions: parse_args(), read_log(), count_by_level(), extract_ips(), print_report().
Which approach is better, and why?
Breaking code into functions improves readability (the main flow reads like an outline), testability (each function can be tested independently), and reusability (functions can be imported by other scripts). This is the same principle as C++’s function decomposition, and it becomes even more important as scripts grow. Even for short scripts, named functions act as documentation.
4. (Spaced review — Step 5: Loops) You need to process a list of log lines and print each line’s number alongside it (starting from 1). Which approach is most Pythonic?
enumerate(lines, 1) is the Pythonic way: it yields (index, value) pairs without
manual indexing. The start=1 parameter avoids the +1 hack. Option A works but is
unpythonic. Option C is C-style manual counting. Option D is O(n²) and breaks on duplicates.
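As a short sketch of the enumerate(lines, 1) form, using made-up log lines:

```python
# Hypothetical sample lines, not the contents of server.log.
lines = ["INFO Server started", "ERROR Connection failed", "INFO Request"]

# The second argument to enumerate() is the starting number.
numbered = [f"{n}: {line}" for n, line in enumerate(lines, 1)]

for entry in numbered:
    print(entry)
# 1: INFO Server started
# 2: ERROR Connection failed
# 3: INFO Request
```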
5. (Spaced review — Step 8: Regular Expressions)
A log analyzer needs to extract all timestamps matching the pattern 2024-01-15 14:30:22 from a log string. Which re call is correct?
re.findall() returns a list of ALL non-overlapping matches — exactly what you need
to extract every timestamp. re.search() finds only the first match. re.match() only
checks the start of the string. re.split() splits the string AT the pattern,
returning the parts between matches, not the matches themselves.
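The four functions can be compared directly on one small string. This is a sketch with a made-up two-line log; the timestamp pattern is one reasonable way to match the format in the question, not the only one.

```python
import re

# Hypothetical two-line log for illustration.
log = "2024-01-15 09:23:11 INFO up\n2024-01-15 09:23:45 ERROR down"
pattern = r'\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}'

# findall: every match, as a list of strings.
all_stamps = re.findall(pattern, log)
print(all_stamps)     # ['2024-01-15 09:23:11', '2024-01-15 09:23:45']

# search: only the first match, as a match object.
first_stamp = re.search(pattern, log).group()
print(first_stamp)    # 2024-01-15 09:23:11

# match: succeeds only if the pattern matches at the very start of the string.
at_start = re.match(pattern, "INFO 2024-01-15 09:23:11")
print(at_start)       # None

# split: returns the text between the matches, not the matches themselves.
pieces = re.split(pattern, log)
print(pieces)         # ['', ' INFO up\n', ' ERROR down']
```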