A Primer on Python's Advanced Usage
This interactive application serves as a handbook for experienced software engineers transitioning to or deepening their knowledge of Python. It moves beyond basic syntax to explore Python's internal architecture, memory model, performance characteristics, and advanced patterns. The goal is to provide a fast, consumable guide to help you write efficient, scalable, and robust Python code. The guide covers the following topics:
Language Fundamentals
Quick reference for Python's core syntax, control flow, and object-oriented constructs.
Memory & Concurrency
Understand Python's memory management and strategies for handling concurrency, including the GIL.
Data Structures
Dive into Python's built-in data structures and their performance characteristics.
I/O Operations
Explore efficient file, network, and asynchronous I/O handling.
Architecture & Patterns
Learn about Clean Architecture principles and Pythonic design patterns.
Performance & Gotchas
Advanced optimization tips and common pitfalls to avoid.
Language Fundamentals: Quick Reference
This section provides a quick overview of essential Python syntax for defining basic programs, controlling flow, and structuring code with classes and "interfaces." It's designed as a rapid refresher for experienced developers.
Basic Program & Output
The simplest Python program prints "Hello, World!" to the console.
# Hello, World!
print("Hello, World!")
# Basic variable assignment and f-string formatting
name = "Alice"
age = 30
print(f"My name is {name} and I am {age} years old.")
Control Flow: If/Else & Loops
Python uses indentation to delimit code blocks. Use `if/elif/else` for conditional logic, `for` to iterate over sequences, and `while` for condition-based loops.
# Conditional Logic: if/elif/else
score = 85
if score >= 90:
    print("Grade: A")
elif score >= 80:
    print("Grade: B")
else:
    print("Grade: C or lower")

# For Loop: Iterating over a list
fruits = ["apple", "banana", "cherry"]
for fruit in fruits:
    print(f"I like {fruit}")

# For Loop: Using range
for i in range(3):  # 0, 1, 2
    print(f"Iteration {i}")

# While Loop
count = 0
while count < 3:
    print(f"Count: {count}")
    count += 1
Defining Classes & Methods
Classes define blueprints for objects, encapsulating data (attributes) and behavior (methods). `__init__` is the constructor.
class Dog:
    # Class attribute
    species = "Canis familiaris"

    # Constructor method
    def __init__(self, name, age):
        self.name = name  # Instance attribute
        self.age = age  # Instance attribute

    # Instance method
    def bark(self):
        return f"{self.name} says Woof!"

    # Another instance method
    def get_age_in_dog_years(self):
        return self.age * 7

# Creating objects (instances)
my_dog = Dog("Buddy", 3)
your_dog = Dog("Lucy", 5)
print(my_dog.name)  # Output: Buddy
print(my_dog.bark())  # Output: Buddy says Woof!
print(your_dog.get_age_in_dog_years())  # Output: 35
print(Dog.species)  # Accessing class attribute
Abstract Base Classes (ABCs) for "Interfaces"
Python doesn't have explicit interfaces like Java, but the `abc` module lets you define abstract base classes (ABCs) that enforce method implementations in subclasses, mimicking interface behavior.
from abc import ABC, abstractmethod

class Vehicle(ABC):  # Inherit from ABC
    @abstractmethod
    def start(self):
        pass

    @abstractmethod
    def stop(self):
        pass

class Car(Vehicle):
    def start(self):
        print("Car engine started.")

    def stop(self):
        print("Car engine stopped.")

class Bicycle(Vehicle):
    def start(self):
        print("Bicycle started pedaling.")

    def stop(self):
        print("Bicycle stopped pedaling.")

# Usage
my_car = Car()
my_car.start()
my_bicycle = Bicycle()
my_bicycle.stop()

# This would raise a TypeError: Can't instantiate abstract class
# abstract_vehicle = Vehicle()
Memory Management & Concurrency
This section dives into how Python handles memory and concurrency, which is critical for building high-performance applications. We'll explore automatic memory management via reference counting and garbage collection, and demystify the Global Interpreter Lock (GIL) and its implications for parallelism.
Automatic Memory Management
Reference Counting
The primary mechanism in CPython. Each object maintains a count of references to it. When a new reference to an object is created, its reference count increases; when a reference is removed or goes out of scope, the count decreases. Once an object's reference count drops to zero, it signifies that no part of the program is using it, and Python automatically deallocates the memory it occupied.
import sys
a = [] # Reference count of [] is 1
b = a # Reference count of [] is 2
c = b # Reference count of [] is 3
# print(sys.getrefcount(a)) # Output will be 4 (a, b, c, and the argument to getrefcount)
del a # Reference count of [] is 2
del b # Reference count of [] is 1
# When the last reference (c) is deleted, memory is reclaimed
del c
Generational Garbage Collector
This secondary mechanism exists specifically to detect and break reference cycles. It groups objects into three "generations" based on their age. Newer objects are checked more frequently, making the process efficient by focusing on objects most likely to become garbage.
import gc

class Node:
    def __init__(self, value):
        self.value = value
        self.next = None

# Create a reference cycle
a = Node(1)
b = Node(2)
a.next = b
b.next = a  # a and b now reference each other

# Delete external references
del a
del b

# At this point, the two Node objects are still in memory due to the cycle.
# The generational garbage collector will eventually reclaim them.
# gc.collect()  # Manually trigger collection
# The objects are now gone from memory.
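The three generations mentioned above are directly observable through the `gc` module. A minimal sketch (the exact thresholds and collected-object counts depend on your interpreter configuration):

```python
import gc

# get_threshold() returns one collection threshold per generation - three in CPython.
thresholds = gc.get_threshold()
print(f"Generations: {len(thresholds)}, thresholds: {thresholds}")

# get_count() shows how many allocations each generation has pending.
print(f"Current counts: {gc.get_count()}")

# collect() returns the number of unreachable objects it found.
class Node:
    def __init__(self):
        self.ref = None

a, b = Node(), Node()
a.ref, b.ref = b, a  # reference cycle
del a, b
unreachable = gc.collect()
print(f"Unreachable objects collected: {unreachable}")
```

Because `gc.collect()` reports the cycle's objects (the two `Node` instances plus their attribute dictionaries), the return value here is non-zero even though reference counting alone could never free them.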
The Global Interpreter Lock (GIL) & Concurrency Strategies
The GIL is a mutex in CPython that allows only one thread to execute Python bytecode at a time, even on multi-core processors. This simplifies memory management but limits true CPU-bound parallelism with threading. Choosing the right concurrency model is key.
Strategy | Best For | GIL Impact | Key Feature
--- | --- | --- | ---
Multi-threading | I/O-bound tasks (e.g., network requests, disk reads) | Limited by GIL for CPU tasks. Threads release the GIL on I/O waits. | Shared memory simplifies data sharing between threads.
Multi-processing | CPU-bound tasks (e.g., heavy calculations, data processing) | Bypassed. Each process has its own interpreter and GIL. | Achieves true parallelism across multiple CPU cores.
Asyncio | High-volume I/O-bound tasks (e.g., thousands of network connections) | Not applicable. Runs on a single thread with an event loop. | Highly efficient task switching with low overhead.
Example: Multi-threading (I/O-bound)
import threading
import time
import requests

def download_site(url):
    print(f"Starting download: {url}")
    response = requests.get(url)  # This is an I/O-bound operation
    print(f"Finished download: {url}, size: {len(response.content)} bytes")

urls = [
    "https://www.example.com",
    "https://www.google.com",
    "https://www.bing.com",
]

# threads = []
# for url in urls:
#     thread = threading.Thread(target=download_site, args=(url,))
#     threads.append(thread)
#     thread.start()
# for thread in threads:
#     thread.join()  # Wait for all threads to complete
# print("All downloads complete with threading.")
Example: Multi-processing (CPU-bound)
import multiprocessing
import time

def cpu_bound_task(n):
    print(f"Starting CPU-bound task for {n}...")
    sum_val = 0
    for i in range(n):
        sum_val += i * i  # CPU-intensive calculation
    print(f"Finished CPU-bound task for {n}, sum: {sum_val}")
    return sum_val

numbers = [10**7, 10**7, 10**7]

# if __name__ == '__main__':  # Required for multiprocessing on Windows/macOS
#     processes = []
#     for num in numbers:
#         process = multiprocessing.Process(target=cpu_bound_task, args=(num,))
#         processes.append(process)
#         process.start()
#     for process in processes:
#         process.join()  # Wait for all processes to complete
#     print("All CPU-bound tasks complete with multiprocessing.")
Example: Asyncio (High-concurrency I/O)
import asyncio
import time

async def async_fetch_data(url):
    print(f"Async starting fetch: {url}")
    await asyncio.sleep(1)  # Simulate async I/O operation (e.g., network call)
    print(f"Async finished fetch: {url}")
    return f"Data from {url}"

async def main_async():
    tasks = [
        async_fetch_data("http://api.service.com/1"),
        async_fetch_data("http://api.service.com/2"),
        async_fetch_data("http://api.service.com/3"),
    ]
    results = await asyncio.gather(*tasks)  # Run coroutines concurrently
    print("All async fetches complete:", results)

# To run:
# if __name__ == '__main__':
#     asyncio.run(main_async())
Built-in Data Structures
Python's built-in data structures are highly optimized. Choosing the right one is fundamental to performance. This section compares their time complexities for common operations.
Performance Comparison
List
Mutable, ordered, dynamic arrays. Excellent for stacks and general collections. Performance suffers for insertions/deletions in the middle (`O(n)`) due to element shifting.
my_list = [1, 2, 3]
my_list.append(4) # Add to end: O(1) amortized
my_list.insert(0, 0) # Add to beginning: O(n)
item = my_list[2] # Access by index: O(1)
# print(my_list) # [0, 1, 2, 3, 4]
Tuple
Immutable, ordered collections. Ideal for fixed data like coordinates or records. Their immutability allows them to be used as dictionary keys.
my_tuple = (10, 20, "hello")
x = my_tuple[0] # Access by index: O(1)
# my_tuple[0] = 5 # TypeError: 'tuple' object does not support item assignment
# print(my_tuple) # (10, 20, 'hello')
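The dictionary-key claim above follows from tuples being hashable (as long as every element is). A small sketch using a hypothetical sparse grid keyed by `(row, col)` pairs:

```python
# Tuples as dict keys: a sparse grid keyed by (row, col)
grid = {}
grid[(0, 0)] = "origin"
grid[(2, 3)] = "treasure"

print(grid.get((2, 3)))  # treasure
print(grid.get((9, 9)))  # None (missing cells need no storage)

# A tuple containing a mutable element is NOT hashable:
try:
    grid[(0, [1, 2])] = "boom"
except TypeError as e:
    print(f"TypeError: {e}")
```

Lists, by contrast, cannot be dictionary keys at all, which is one of the main practical reasons to reach for a tuple.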
Set
Mutable, unordered collections of unique elements, based on hash tables. Blazing fast (`O(1)`) for membership testing and removing duplicates.
my_set = {1, 2, 3, 2} # Duplicates are ignored
my_set.add(4) # Add element: O(1) average
is_present = 3 in my_set # Membership test: O(1) average
# print(my_set) # {1, 2, 3, 4} (order not guaranteed)
Dictionary
Mutable key-value stores, also based on hash tables. Extremely fast (`O(1)`) for lookups, insertions, and deletions by key. The workhorse of Python.
my_dict = {"name": "Alice", "age": 30}
my_dict["city"] = "New York" # Add/Update: O(1) average
name = my_dict["name"] # Lookup by key: O(1) average
# print(my_dict) # {'name': 'Alice', 'age': 30, 'city': 'New York'}
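The O(n) vs. O(1) membership difference between lists and hash-based containers is easy to measure with `timeit`. Absolute numbers are machine-dependent; the ratio is what matters:

```python
import timeit

n = 100_000
data_list = list(range(n))
data_set = set(data_list)
missing = -1  # worst case for the list: a full scan every time

t_list = timeit.timeit(lambda: missing in data_list, number=100)
t_set = timeit.timeit(lambda: missing in data_set, number=100)

print(f"list membership: {t_list:.4f}s")
print(f"set membership:  {t_set:.4f}s")  # typically orders of magnitude faster
```

The same reasoning applies to dictionary key lookups, which use the same hash-table machinery as sets.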
I/O Operations
Efficient I/O is crucial for performance. Python offers robust tools for handling files, networks, and asynchronous operations. The key is to choose the right tool and use it idiomatically to prevent resource leaks and maximize throughput.
File I/O Best Practices
Always use the `with` statement for file operations. It guarantees that the file is closed correctly, even if errors occur, preventing resource leaks. For large files, use generators to iterate line-by-line instead of loading the entire file into memory.
# Good: Memory-efficient processing of a large file
def process_large_file(path):
    with open(path, 'r') as f:
        for line in f:
            yield line.strip().upper()

# Bad: Can cause MemoryError on very large files
def process_large_file_badly(path):
    with open(path, 'r') as f:
        lines = f.readlines()  # Loads everything into memory
        return [line.strip().upper() for line in lines]
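The same streaming idea applies to binary files, where there are no lines to iterate over: read fixed-size chunks instead. A sketch using the two-argument `iter(callable, sentinel)` form (the chunk size and demo file are arbitrary choices):

```python
import os
import tempfile

def read_in_chunks(path, chunk_size=64 * 1024):
    """Yield fixed-size chunks so memory use stays bounded."""
    with open(path, 'rb') as f:
        # iter(callable, sentinel) keeps calling f.read until it returns b''
        for chunk in iter(lambda: f.read(chunk_size), b''):
            yield chunk

# Demo with a small temporary file
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"x" * 150_000)
    path = tmp.name

total = sum(len(chunk) for chunk in read_in_chunks(path))
print(f"Read {total} bytes in bounded memory")
os.remove(path)
```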
Network I/O: Sockets
The `socket` module provides low-level network access. For handling multiple clients concurrently without threads, use non-blocking sockets with the `select` module. This allows a single thread to monitor multiple sockets for I/O readiness, forming the basis of many high-performance servers.
import socket
import select
import time

# Client example for testing the servers
def socket_client(host, port, message):
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        s.connect((host, port))
        s.sendall(message.encode())
        data = s.recv(1024)
        print(f"Received from {host}:{port}: {data.decode()}")

# Basic Blocking Socket Server Example
def blocking_server():
    HOST = '127.0.0.1'
    PORT = 65432
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        s.bind((HOST, PORT))
        s.listen()
        print(f"Blocking server listening on {HOST}:{PORT}")
        conn, addr = s.accept()  # Blocks until a client connects
        with conn:
            print(f"Blocking server connected by {addr}")
            while True:
                data = conn.recv(1024)  # Blocks until data is received
                if not data:
                    break
                conn.sendall(data.upper())  # Echo back uppercase
            print(f"Blocking server connection closed by {addr}")

# Basic Non-Blocking Socket Server with Select Example
def non_blocking_server():
    HOST = '127.0.0.1'
    PORT = 65433
    server_socket = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server_socket.setblocking(False)  # Set to non-blocking
    server_socket.bind((HOST, PORT))
    server_socket.listen(5)
    inputs = [server_socket]
    print(f"Non-blocking server listening on {HOST}:{PORT}")
    while True:  # Keep running indefinitely
        readable, _, _ = select.select(inputs, [], [], 1.0)  # 1.0s timeout
        if not readable:
            continue
        for sock in readable:
            if sock is server_socket:
                try:
                    conn, addr = sock.accept()
                    conn.setblocking(False)
                    inputs.append(conn)
                    print(f"Non-blocking server accepted connection from {addr}")
                except BlockingIOError:
                    pass  # No incoming connection ready yet
            else:
                try:
                    data = sock.recv(1024)
                    if data:
                        sock.sendall(data.upper())
                    else:
                        print(f"Non-blocking server closing connection from {sock.getpeername()}")
                        inputs.remove(sock)
                        sock.close()
                except BlockingIOError:
                    pass  # No data ready yet
                except ConnectionResetError:
                    # Don't call getpeername() here - the peer is already gone
                    print("Non-blocking server connection reset by peer")
                    inputs.remove(sock)
                    sock.close()
# To run these examples (e.g., in separate terminals or using threading):
# import threading
# threading.Thread(target=blocking_server).start()
# threading.Thread(target=non_blocking_server).start()
# time.sleep(1) # Give servers time to start
# threading.Thread(target=socket_client, args=('127.0.0.1', 65432, 'hello from blocking client')).start()
# threading.Thread(target=socket_client, args=('127.0.0.1', 65433, 'hello from non-blocking client')).start()
Asynchronous I/O: `asyncio`
`asyncio` is the modern standard for high-performance I/O-bound applications in Python. Using `async/await` syntax, it allows a single thread to manage thousands of concurrent connections efficiently via an event loop. It's ideal for web servers, database clients, and API gateways.
import asyncio
import time

async def async_fetch_data(url):
    print(f"Async starting fetch: {url}")
    await asyncio.sleep(1)  # Simulate async I/O operation (e.g., network call)
    print(f"Async finished fetch: {url}")
    return f"Data from {url}"

async def main_async():
    tasks = [
        async_fetch_data("http://api.service.com/1"),
        async_fetch_data("http://api.service.com/2"),
        async_fetch_data("http://api.service.com/3"),
    ]
    results = await asyncio.gather(*tasks)  # Run coroutines concurrently
    print("All async fetches complete:", results)

# To run:
# if __name__ == '__main__':
#     asyncio.run(main_async())
Clean Architecture & Design Patterns
Writing maintainable and scalable code goes beyond syntax. This section visualizes the principles of Clean Architecture and highlights key Pythonic design patterns that leverage the language's dynamic features for elegant solutions.
Clean Architecture Layers
Clean Architecture separates concerns into concentric layers with a strict dependency rule: dependencies only point inwards. This isolates your core business logic (Entities) from frameworks, databases, and UI, making the application adaptable, testable, and easier to maintain.
Practical Clean Architecture Examples
Here are conceptual Python code snippets illustrating how each layer of Clean Architecture might be implemented. This demonstrates the separation of concerns.
1. Entities (Domain Layer)
Pure business objects, independent of any framework or database.
from dataclasses import dataclass, field
from datetime import datetime

@dataclass
class User:
    id: str
    name: str
    email: str
    # default_factory gives each instance a fresh timestamp; a plain
    # `datetime.now()` default would be evaluated once, at class definition
    created_at: datetime = field(default_factory=datetime.now)

    def is_active(self) -> bool:
        # Simple domain logic
        return True  # Placeholder for more complex logic
2. Use Cases (Application Layer)
Application-specific business rules. Orchestrates entities and interacts with abstract interfaces (e.g., repositories).
from abc import ABC, abstractmethod
from typing import List, Optional
import uuid

# Abstract interface for user storage (defined in domain/application layer)
class UserRepository(ABC):
    @abstractmethod
    def get_by_id(self, user_id: str) -> Optional[User]:
        pass

    @abstractmethod
    def save(self, user: User) -> None:
        pass

    @abstractmethod
    def get_all(self) -> List[User]:
        pass

class CreateUserUseCase:
    def __init__(self, user_repo: UserRepository):
        self.user_repo = user_repo

    def execute(self, name: str, email: str) -> User:
        # Application-specific business logic
        if "@" not in email:
            raise ValueError("Invalid email format")
        user_id = str(uuid.uuid4())
        new_user = User(id=user_id, name=name, email=email)
        self.user_repo.save(new_user)
        return new_user

class GetUsersUseCase:
    def __init__(self, user_repo: UserRepository):
        self.user_repo = user_repo

    def execute(self) -> List[User]:
        return self.user_repo.get_all()
3. Adapters (Infrastructure Layer)
Concrete implementations of interfaces defined in the Use Cases layer. Connects to external services like databases.
# Example: In-memory implementation of UserRepository
class InMemoryUserRepository(UserRepository):
    def __init__(self):
        self._users = {}  # Simulates a database

    def get_by_id(self, user_id: str) -> Optional[User]:
        return self._users.get(user_id)

    def save(self, user: User) -> None:
        self._users[user.id] = user

    def get_all(self) -> List[User]:
        return list(self._users.values())

# Example: Conceptual SQLAlchemy implementation (requires SQLAlchemy setup)
# from sqlalchemy import create_engine, Column, String, DateTime
# from sqlalchemy.orm import sessionmaker, declarative_base
#
# Base = declarative_base()
#
# class UserTable(Base):
#     __tablename__ = 'users'
#     id = Column(String, primary_key=True)
#     name = Column(String)
#     email = Column(String)
#     created_at = Column(DateTime)
#
# class SQLAlchemyUserRepository(UserRepository):
#     def __init__(self, session_factory):
#         self.session_factory = session_factory
#
#     def get_by_id(self, user_id: str) -> Optional[User]:
#         with self.session_factory() as session:
#             user_data = session.query(UserTable).filter_by(id=user_id).first()
#             return User(**user_data.__dict__) if user_data else None
#
#     def save(self, user: User) -> None:
#         with self.session_factory() as session:
#             user_table = UserTable(**user.__dict__)
#             session.add(user_table)
#             session.commit()
#
#     def get_all(self) -> List[User]:
#         with self.session_factory() as session:
#             return [User(**u.__dict__) for u in session.query(UserTable).all()]
4. Frameworks & Drivers (External Layer)
The outermost layer. Uses the Use Cases to drive the application, without the Use Cases knowing about the framework.
# Example: Flask Web Application (requires Flask)
# from flask import Flask, request, jsonify
# from dependency_injector.containers import DeclarativeContainer
# from dependency_injector.providers import Singleton, Factory
#
# # Simple Dependency Injection Container (conceptual)
# class Container(DeclarativeContainer):
#     user_repository = Singleton(InMemoryUserRepository)  # Or SQLAlchemyUserRepository
#     create_user_use_case = Factory(CreateUserUseCase, user_repo=user_repository)
#     get_users_use_case = Factory(GetUsersUseCase, user_repo=user_repository)
#
# app = Flask(__name__)
# container = Container()
#
# @app.route('/users', methods=['POST'])
# def create_user_endpoint():
#     data = request.get_json()
#     name = data.get('name')
#     email = data.get('email')
#     try:
#         user = container.create_user_use_case().execute(name, email)
#         return jsonify({"id": user.id, "name": user.name, "email": user.email}), 201
#     except ValueError as e:
#         return jsonify({"error": str(e)}), 400
#
# @app.route('/users', methods=['GET'])
# def get_users_endpoint():
#     users = container.get_users_use_case().execute()
#     return jsonify([{"id": u.id, "name": u.name, "email": u.email} for u in users]), 200
#
# # To run:
# # if __name__ == '__main__':
# #     app.run(debug=True)
Deployment Strategies (Briefcase)
Briefcase is a powerful tool that allows you to package Python applications for native distribution across multiple platforms, including desktop (Windows, macOS, Linux), mobile (iOS, Android), and even the web (via WebAssembly). It handles the complexities of creating platform-specific project structures, bundling dependencies, and generating native installers or app bundles.
Key Briefcase Commands
- `briefcase new`: Initializes a new Briefcase project structure.
- `briefcase create`: Creates the necessary platform-specific project scaffolding (e.g., Xcode project for iOS, Android Studio project for Android, native project for desktop).
- `briefcase build`: Builds the native application artifact for the specified platform (e.g., `.app` for macOS, `.exe` for Windows, `.apk` for Android).
- `briefcase run`: Runs the built application on the target platform or emulator.
- `briefcase dev`: Runs the application in a development environment without full packaging.
Cross-Platform Deployment with Briefcase
Briefcase abstracts away much of the platform-specific tooling, allowing developers to focus on their Python code. It integrates with underlying platform SDKs (like Xcode, Android SDK) to produce native applications.
- Desktop Applications: Packages your Python app into native executables and installers for Windows (MSI), macOS (DMG), and Linux (AppImage, DEB, RPM). It bundles the Python interpreter and all dependencies.
- Mobile Applications: Generates Xcode projects for iOS and Android Studio projects for Android. Your Python code runs within a native wrapper, leveraging platform capabilities.
- Web Applications (via WebAssembly): Briefcase can also target web browsers by compiling Python to WebAssembly (WASM), often using tools like PyScript or similar technologies. This allows your Python application to run directly in a web browser, enabling rich client-side logic without server-side Python.
Example: Packaging for Web (Conceptual)
While the exact implementation details depend on the web backend (e.g., PyScript), the general flow involves Briefcase preparing your Python code and its dependencies to be compiled to WebAssembly and served as static web assets.
# Initialize a new Briefcase project (if not already done)
# briefcase new my-web-app
# Navigate into your project directory
# cd my-web-app
# Create the web project structure (this might involve a specific template for web)
# briefcase create web
# Build the web application (compiles Python to WASM, bundles assets)
# briefcase build web
# Run the web application (starts a local web server to serve the assets)
# briefcase run web
# This would typically generate a 'web' directory with HTML, JS, and WASM files
# that can be deployed to any static web host.
Performance & Common "Gotchas"
Even experienced developers can be caught by Python's peculiarities. This section covers key performance optimization strategies and common pitfalls, with details and best practices for each.
Key Optimization Strategies
- Profile First: Don't guess. Use tools like `cProfile` and `line_profiler` to find actual bottlenecks before optimizing.
- Use Built-ins: Python's built-in functions (`sum`, `len`) and standard library modules are often C-optimized and faster than manual implementations.
- Embrace Generators: Use generators (`yield`) and generator expressions for large datasets to drastically reduce memory consumption.
- Cache with `lru_cache`: For expensive functions with repeated calls, cache previously computed results using `@functools.lru_cache`.
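The "use built-ins" advice above is measurable: `sum()` runs its loop in C, while a hand-written loop executes interpreter bytecode per iteration. A quick, machine-dependent comparison using `timeit`:

```python
import timeit

def manual_sum(n):
    total = 0
    for i in range(n):  # one interpreted loop iteration per element
        total += i
    return total

n = 100_000
assert manual_sum(n) == sum(range(n))  # same result either way

t_manual = timeit.timeit(lambda: manual_sum(n), number=50)
t_builtin = timeit.timeit(lambda: sum(range(n)), number=50)
print(f"manual loop:  {t_manual:.4f}s")
print(f"built-in sum: {t_builtin:.4f}s")  # typically several times faster
```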
Example: Profiling with `cProfile`
import cProfile
import time

def expensive_function():
    time.sleep(0.1)
    sum(range(10**6))

def another_function():
    time.sleep(0.05)
    [x*x for x in range(10**5)]

def main_program():
    for _ in range(5):
        expensive_function()
        another_function()

# To run the profiler:
# cProfile.run('main_program()')
# This will print a detailed report of function calls and times.
Example: Generators for Memory Efficiency
import sys
# List comprehension (loads all into memory)
my_list = [i * 2 for i in range(1000000)]
# print(f"List size: {sys.getsizeof(my_list) / (1024*1024):.2f} MB")
# Generator expression (produces values on demand)
my_generator = (i * 2 for i in range(1000000))
# print(f"Generator size: {sys.getsizeof(my_generator):.2f} bytes") # Much smaller!
# You can iterate over a generator
# for value in my_generator:
#     pass  # Process value here
Example: Caching with `functools.lru_cache`
import functools
import time

@functools.lru_cache(maxsize=None)  # maxsize=None means unlimited cache
def fibonacci(n):
    if n <= 1:
        return n
    time.sleep(0.001)  # Simulate expensive computation
    return fibonacci(n-1) + fibonacci(n-2)

# First call is slow
# start_time = time.time()
# fibonacci(30)
# print(f"First call: {time.time() - start_time:.4f} seconds")

# Subsequent calls with same input are fast (cached)
# start_time = time.time()
# fibonacci(30)
# print(f"Second call: {time.time() - start_time:.4f} seconds")
Common "Gotchas"
Mutable Default Arguments
The most famous Python pitfall.
The Problem
A function's default arguments are created ONCE, when the function is defined. If a default is mutable (like a list), it's shared across all calls, leading to unexpected behavior.
# Bad:
def add_item_bad(item, items=[]):
    items.append(item)
    return items

# print(add_item_bad(1))  # [1]
# print(add_item_bad(2))  # [1, 2] - Oops!

# Correct:
def add_item_good(item, items=None):
    if items is None:
        items = []
    items.append(item)
    return items

# print(add_item_good(1))  # [1]
# print(add_item_good(2))  # [2] - Correct!
Lambda Closures in Loops
A subtle scoping issue.
The Problem
Lambdas defined in a loop don't capture the loop variable's value at each iteration. They capture the variable itself, so they all see its final value after the loop finishes.
# Bad:
funcs_bad = []
for i in range(5):
    funcs_bad.append(lambda: i*2)

# print(funcs_bad[0]())  # All print 8 (4*2)

# Correct: Capture value with default arg
funcs_good = [lambda i=i: i*2 for i in range(5)]
# print(funcs_good[0]())  # 0
# print(funcs_good[3]())  # 6
Identity vs. Equality
`is` vs `==`
The Problem
`==` checks if values are equal. `is` checks if two variables point to the exact same object in memory. CPython caches small integers (-5 to 256) and some strings, which can make `is` behave unexpectedly.
a = 257
b = 257
# print(a == b)  # True (values are equal)
# print(a is b)  # Implementation-dependent: often False in a REPL, but may be
#                # True in a script (constant sharing). Never rely on it.
x = 10
y = 10
# print(x is y) # True (cached small integer)
list1 = [1, 2]
list2 = [1, 2]
# print(list1 == list2) # True
# print(list1 is list2) # False (different list objects)
Shallow vs. Deep Copies
Beware of nested mutable objects.
The Problem
A shallow copy creates a new collection but populates it with references to the original's elements. If elements are mutable, changes in one copy affect the other. A deep copy recursively duplicates all elements.
import copy
original = [1, [2, 3], 4]
# Shallow copy
shallow_copy = list(original) # or original[:]
shallow_copy[1].append(5)
# print(original) # [1, [2, 3, 5], 4] - Oops!
# Deep copy
deep_copy = copy.deepcopy(original)
deep_copy[1].append(6)
# print(original) # [1, [2, 3, 5], 4] - Original unchanged!
Modifying List While Iterating
Leads to skipped elements or errors.
The Problem
Modifying a list (adding/removing elements) while iterating over it can lead to unexpected behavior, such as skipping elements or `IndexError`.
numbers = [1, 2, 3, 4]

# Bad:
# for num in numbers:
#     if num % 2 == 0:
#         numbers.remove(num)
# print(numbers)  # [1, 3] - but 3 was never even checked!

# Correct: Iterate over a copy or build a new list
numbers = [1, 2, 3, 4]
new_numbers = [num for num in numbers if num % 2 != 0]
# print(new_numbers)  # [1, 3]
Global vs. Local Variables
Shadowing and unexpected assignments.
The Problem
If you assign to a variable inside a function, it becomes local unless explicitly declared `global` or `nonlocal`. This can lead to shadowing global variables or `UnboundLocalError`.
x = 10  # Global variable

def func_bad():
    x = 5  # This creates a NEW local 'x'
    # print(x)  # 5

def func_good():
    global x  # Refer to the global 'x'
    x = 5
    # print(x)  # 5

# func_bad()
# print(x)  # 10 (global x is unchanged)
# func_good()
# print(x)  # 5 (global x is changed)
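The `UnboundLocalError` case deserves its own illustration: an assignment anywhere in a function makes that name local for the whole function body, so reading it before the assignment fails rather than falling back to the global:

```python
counter = 0  # global

def increment_bad():
    # 'counter += 1' makes 'counter' local to this function, so reading it
    # on the right-hand side fails before any local assignment has happened.
    counter += 1  # UnboundLocalError

def increment_good():
    global counter
    counter += 1

try:
    increment_bad()
except UnboundLocalError as e:
    print(f"UnboundLocalError: {e}")

increment_good()
print(counter)  # 1
```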
Late Binding Closures (Non-Lambda)
Similar to lambda, but with regular functions.
The Problem
When creating functions in a loop that refer to the loop variable, they "close over" the variable itself, not its value at the time of definition. All functions will use the variable's final value.
# Bad:
def create_multipliers_bad():
    multipliers = []
    for i in range(3):
        def multiplier():
            return i * 10
        multipliers.append(multiplier)
    return multipliers

# funcs = create_multipliers_bad()
# print(funcs[0]())  # All print 20 (2*10)

# Correct: Pass as default argument
def create_multipliers_good():
    multipliers = []
    for i in range(3):
        def multiplier(j=i):  # Capture 'i' as default 'j'
            return j * 10
        multipliers.append(multiplier)
    return multipliers

# funcs = create_multipliers_good()
# print(funcs[0]())  # 0
# print(funcs[1]())  # 10
`is` with Strings and Interning
String interning can be tricky.
The Problem
CPython "interns" (caches) short strings and strings that look like identifiers to save memory. This means `is` might unexpectedly return `True` for two identical string literals.
s1 = "hello"
s2 = "hello"
# print(s1 is s2)  # True (interned)

s3 = "hello world"
s4 = "hello world"
# print(s3 is s4)  # Implementation-dependent: may be False in a REPL, but
#                  # identical literals in one module are often shared.

s5 = "".join("a" for _ in range(50))
s6 = "".join("a" for _ in range(50))
# print(s5 is s6)  # False (strings built at runtime are not interned)

# Always use == for string content comparison.
Unpacking Errors
`ValueError` on unpacking.
The Problem
When unpacking sequences (like tuples or lists) into variables, the number of variables must exactly match the number of elements in the sequence, or a `ValueError` occurs.
data = (1, 2, 3)

# Correct:
a, b, c = data
# print(a, b, c)  # 1 2 3

# Bad: Too few variables
# x, y = data  # ValueError: too many values to unpack

# Bad: Too many variables
# p, q, r, s = data  # ValueError: not enough values to unpack

# Use * for flexible unpacking (Python 3+)
first, *rest, last = (1, 2, 3, 4, 5)
# print(first, rest, last)  # 1 [2, 3, 4] 5