CODE FARM

"The roots of education are bitter, but the fruit is sweet."

- Aristotle, Ancient Greek Philosopher

Python Learning Notes

>>> import this
The Zen of Python, by Tim Peters

Beautiful is better than ugly.
Explicit is better than implicit.
Simple is better than complex.
Complex is better than complicated.
Flat is better than nested.
Sparse is better than dense.
Readability counts.
Special cases aren't special enough to break the rules.
Although practicality beats purity.
Errors should never pass silently.
Unless explicitly silenced.
In the face of ambiguity, refuse the temptation to guess.
There should be one-- and preferably only one --obvious way to do it.
Although that way may not be obvious at first unless you're Dutch.
Now is better than never.
Although never is often better than *right* now.
If the implementation is hard to explain, it's a bad idea.
If the implementation is easy to explain, it may be a good idea.
Namespaces are one honking great idea -- let's do more of those!

1. Running

  • Using the interactive interpreter (shell)

    $ python3 -q
    >>> 2+2
    4
    >>> quit()

    IPython provides an enhanced text-based REPL with completion and introspection, whereas JupyterLab is a web-based environment that executes Python via an IPython kernel (ipykernel).

    $ pip install ipython
    $ ipython
    In [1]: 2+2
    Out[1]: 4
    
    In [2]: len?
    $ pip install jupyterlab
    $ jupyter lab
  • Using python files

    # test.py
    print(2+2)

    $ python3 test.py
    4
  • Using python files with shebang

    In computing, a shebang is the character sequence consisting of the characters number sign and exclamation mark (#!) at the beginning of a script. It is also called sharp-exclamation, sha-bang, hashbang, pound-bang, or hash-pling.

     — From Wikipedia, the free encyclopedia

    #!/usr/bin/env python3
    print(2+2)

    $ chmod +x test.py   # the script must be executable before it can be run directly
    $ ./test.py
    4
  • Executing modules as scripts

    In Python, python -m executes installed modules as scripts directly from the command line, removing the need for a separate .py file.

    $ python3 -m venv --help
    usage: venv [-h] [--system-site-packages] [--symlinks | --copies] [--clear] [--upgrade] [--without-pip]
                [--prompt PROMPT] [--upgrade-deps]
                ENV_DIR [ENV_DIR ...]
    
    Creates virtual Python environments in one or more target directories.
    . . .
    $ python3 -m webbrowser https://www.google.com

2. Indentations, comments, and multi-line expressions

  • Python uses indentation (four spaces per level, per PEP 8) instead of curly brackets or keywords to delimit code blocks.

    • Don’t mix tabs and spaces, to avoid messing up the indentation count.

    • Guido van Rossum designed Python to use indentation for structure, avoiding the parentheses and braces common in other languages.

      disaster = True
      if disaster:
          print("Woe!")
      else:
          print("Whee!")
    • A compound statement body can optionally follow the colon on the same line.

      if x > y: print(x)  # Simple statement on header line
  • Line breaks generally terminate statements automatically.

    x = 1  # x = 1;
  • Multiple statements may be placed on one line using semicolon separators.

    a = 1; b = 2; print(a + b) # Three statements on one line
  • Python expressions can span multiple lines when enclosed within delimiters like (), [], or {}.

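    • A trailing backslash (\) also continues a line outside of delimiters. It is still valid in Python 3, but PEP 8 prefers implicit continuation inside delimiters because the backslash is error-prone: any character after it, even a trailing space, is a syntax error.

      # Backslash continuation (valid but error-prone, not recommended)
      long_expression = 1 + 2 + 3 + 4 + 5 + \
                        6 + 7 + 8 + 9 + 10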
    • In modern Python, favor delimiters like (), [], or {} over the backslash (\) to improve readability and structure in multi-line expressions.

      # Parentheses for complex calculations
      long_calculation = (a * b +
                          c) * (d /
                                e - f)
      
      # Brackets for multi-line lists or data structures
      data = [
          "item1",
          "item2 with a longer description",
          "item3"
      ]
      
      # Braces for multi-line dictionaries
      person_info = {
          "name": "Alice",
          "age": 30,
          "hobbies": ["reading", "hiking"]
      }
  • A comment is marked by the # character (hash, sharp, pound, or octothorpe) and extends to the end of the line.

    # 60 sec/min * 60 min/hr * 24 hr/day
    seconds_per_day = 86400
    seconds_per_day = 86400 # 60 sec/min * 60 min/hr * 24 hr/day
    # Python does NOT
    # have a multiline comment.
    print("No comment: quotes make the # harmless.")

3. Keywords

False               class               from                or
None                continue            global              pass
True                def                 if                  raise
and                 del                 import              return
as                  elif                in                  try
assert              else                is                  while
async               except              lambda              with
await               finally             nonlocal            yield
break               for                 not

4. Types

  • Python is dynamically and strongly typed with built-in garbage collection.

    • A dynamically typed language determines a variable type at runtime rather than requiring an explicit declaration during definition.

      age = 30  # age is an integer (no need to declare the data type explicitly)
      age = "thirty"  # age is now a string
    • A statically typed language requires a variable type to be declared at compile time to ensure type compatibility.

      // In Java, declare the type of a variable before assigning a value.
      int age = 30;  // age is declared as an integer
      age = "thirty";  // error: incompatible types: String cannot be converted to int
    • A strongly typed language requires strict type safety by preventing operations between incompatible data types.

      Static typing dictates when a type is verified (compile time), whereas strong typing dictates how strictly that type is enforced (both compile time and runtime).
    • In Python, every data type is an object whose associated methods and attributes are verified for compatibility at runtime.

      # Python supports type inference on assignment.
      name = "Alice"  # String inferred
      name + 10       # TypeError: mixed types (Strongly typed)

      In computer programming, duck typing is an application of the duck test—"If it walks like a duck and it quacks like a duck, then it must be a duck"—to determine whether an object can be used for a particular purpose.

       — From Wikipedia, the free encyclopedia
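A minimal sketch of duck typing in Python follows. The classes Duck and Robot and the function make_it_quack are hypothetical names chosen for illustration. The function never checks the type of its argument; it only assumes a quack() method exists, and the mismatch surfaces at runtime.

```python
# Hypothetical classes for illustration; neither inherits from the other.
class Duck:
    def quack(self) -> str:
        return "Quack!"

class Robot:
    def quack(self) -> str:
        return "Beep-quack!"

def make_it_quack(thing) -> str:
    # No isinstance() check: any object with a quack() method is accepted.
    return thing.quack()

sounds = [make_it_quack(obj) for obj in (Duck(), Robot())]
print(sounds)  # ['Quack!', 'Beep-quack!']

try:
    make_it_quack(42)          # int has no quack() method
except AttributeError as exc:
    print("not a duck:", exc)
```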

      # str, tuple, list, bytes, bytearray
      # dict, set, frozenset
      # int, bool, float, complex, decimal.Decimal, fractions.Fraction
      # function, generator, class, method
      # module, NoneType, Ellipsis, type, code, frame, traceback
      bool    # True, False
      int     # 47, 25000, 25_000, 0b0100_0000, 0o100, 0x40, sys.maxsize, - sys.maxsize - 1
      float   # 3.14, 2.7e5, float('inf'), float('-inf'), float('nan')
      complex # 3j, 5 + 9j
      
      str # unicode: 'alas', "alack", '''a verse attack'''
      
      tuple # (2, 4, 8)
      list  # ['Winken', 'Blinken', 'Nod']
      
      bytes # b'ab\xff'
      bytearray # bytearray(...)
      
      dict      # {}, {'game': 'bingo', 'dog': 'dingo', 'drummer': 'Ringo'}
      set       # set([3, 5, 7])
      frozenset # frozenset(['Elsa', 'Otto'])
      
      # import decimal, fractions
      decimal.Decimal(1/3)     # Decimal('0.333333333333333314829616256247390992939472198486328125')
      fractions.Fraction(1, 3) # Fraction(1, 3)
      # int(), float(), bin(), oct(), hex(), chr(), and ord()
      int(True), int(False)                                # (1, 0)
      int(98.6), int(1.0e4)                                # (98, 10_000)
      int('99'), int('-23'), int('+12'), int('1_000_000')  # (99, -23, 12, 1_000_000)
      
      int('10', 2), 'binary', int('10', 8), 'octal', int('10', 16), 'hexadecimal', int('10', 22), 'chesterdigital'
      # (2, 'binary', 8, 'octal', 16, 'hexadecimal', 22, 'chesterdigital')
      
      float(True), float(False)                     # (1.0, 0.0)
      float('98.6'), float('-1.5'), float('1.0e4')  # (98.6, -1.5, 10_000.0)
      
      bin(65), oct(65), hex(65)  # ('0b1000001', '0o101', '0x41')
      chr(65), ord('A')          # ('A', 65)
      
      False + 0, True + 0, False + 0., True + 0.  # (0, 1, 0.0, 1.0)
      True + True, True + False, False + False    # (2, 1, 0)

4.1. type hints

  • In Python, type hints (annotations) provide optional metadata to specify expected data types for variables, parameters, and return values.

    from typing import Annotated, Any
    
    # primitives & unions (3.10+)
    age: int = 30
    pi: float | None = 3.14  # nullable | optional
    is_active: bool = True
    raw: bytes = b"\x01\x02"
    flex: Any = "can be anything"
    
    # generics (3.9+): list, dict, tuple, set
    def process(
        ids: list[int],
        data: dict[str, float],
        point: tuple[int, int, str],
        unique: set[bytes]
    ) -> str: ...
    
    # classes & metadata
    class User:
        def __init__(self, name: str): self.name = name
    
    def register(
        user: User,
        note: Annotated[str, "Max 20 chars"]
    ) -> bool:
        return True

    any is a built-in function for truthiness checks, whereas typing.Any is the type hint for unconstrained values.

    from typing import Any
    
    x: any = 10                               # function object
    y: Any = 10                               # type hint
    
    print(f"x hint: {__annotations__['x']}")  # <built-in function any>
    print(f"y hint: {__annotations__['y']}")  # typing.Any
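Because hints are optional metadata, the interpreter stores them but does not enforce them. The sketch below uses a hypothetical function, double, to show both halves: the hints can be read back with typing.get_type_hints, and a call with the "wrong" type still runs. Catching such mismatches is left to external static checkers.

```python
from typing import get_type_hints

def double(n: int) -> int:
    return n * 2

# Hints are stored as metadata on the function object.
hints = get_type_hints(double)
print(hints)             # {'n': <class 'int'>, 'return': <class 'int'>}

# The interpreter does not check them: a str argument still "works".
result = double("ab")
print(result)            # abab
```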

4.2. assignments

  • In Python, a variable must be assigned to an object before it is referenced; otherwise, a NameError is raised.

    # assignment statements
    spam = 'Spam'                   # simple assignment
    spam, ham = 'yum', 'YUM'        # tuple unpacking
    [spam, ham] = ['yum', 'YUM']    # list unpacking
    a, b, c, d = 'spam'             # sequence unpacking (each character to a variable)
    a, *b = 'spam'                  # extended sequence unpacking (a='s', b=['p', 'a', 'm'])
    a, *_ = 'spam'                  # use the underscore (_) for unwanted variables
    spam = ham = 'lunch'            # multiple assignment (both variables refer to the same object)
    spams = 0                       # the name must already exist before augmented assignment
    spams += 42                     # augmented assignment (equivalent to spams = spams + 42)
    spam = ham = eggs = 0           # multiple variable names can be assigned a value at the same time
    # swap variable names
    a, b = 1, 2
    b, a = a, b  # 2, 1

4.3. bindings

  • In Python, variables are labels referencing memory objects (PyObjects in CPython) defined by a type, unique ID, value, and reference count.

    import sys
    
    # 1. Type & ID: Exploring the PyObject
    val = 5.20
    print(type(val))  # <class 'float'>
    print(id(val))    # Unique memory address (ID)
    
    # 2. Reference Counting: Labels on a PyObject
    x = y = z = 1000.1
    base_count = sys.getrefcount(x)
    
    del y
    print(sys.getrefcount(x) == base_count - 1)  # True: one label removed
    
    del z
    print(sys.getrefcount(x) == base_count - 2)  # True: only 'x' remains
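One practical consequence of the "labels on objects" model is aliasing. As a small sketch with illustrative variable names: assigning one name to another adds a second label to the same object, so a mutation through either label is visible through both, while a copy is a separate object.

```python
a = [1, 2, 3]
b = a              # second label on the same list object
c = list(a)        # a new list object with equal contents

b.append(4)        # mutate through one label...

print(a)           # [1, 2, 3, 4]  ...and the other label sees it
print(c)           # [1, 2, 3]     the separate object is untouched
print(a is b, a is c)  # True False
```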

4.4. identities

  • A class is a blueprint for creating objects; in Python, "class" and "type" are synonymous.

    type(7)             # <class 'int'>
    type(7) == int      # True
    
    isinstance(7, int)           # True
    isinstance(type(int), type)  # True
    
    # 1. instances vs. blueprints
    print(type(7) == int)          # True
    print(isinstance(7, int))      # True
    
    # 2. bool is a subclass of int
    print(issubclass(bool, int))   # True
    print(isinstance(True, int))   # True (True is an int instance)
    
    # 3. meta identity
    print(isinstance(int, type))   # True (Classes are type objects)

4.5. equality

  • In Python, == compares object values (recursively for nested collections), while the is operator checks whether two names refer to the same object in memory.

    # 1. value equivalence (==)
    L1 = [1, 2, 3]
    L2 = [1, 2, 3]
    print(L1 == L2)    # True: content is identical
    print(L1 is L2)    # False: different objects in memory
    
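    # 2. object identity (is)
    S1 = 'spam'
    S2 = 'spam'
    print(S1 == S2)    # True: same value
    print(S1 is S2)    # True in CPython: short literals are interned (an implementation detail)
    
    # 3. memory addresses (id)
    x = 1024
    y = int('1024')    # built at runtime, so not shared with the literal above
    print(x == y)      # True
    print(x is y)      # False in CPython: only small ints (-5 to 256) are cached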
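As a short sketch of how the two operators are typically used together (the variable names are illustrative): == walks nested containers element by element, whereas is is mainly reserved for singletons such as None.

```python
nested_a = {'pts': [1, [2, 3]], 'tag': ('x', 'y')}
nested_b = {'pts': [1, [2, 3]], 'tag': ('x', 'y')}

# == descends into the dict, the lists, and the tuples.
print(nested_a == nested_b)   # True
print(nested_a is nested_b)   # False: two separate dict objects

# None is a singleton, so identity is the idiomatic test.
value = None
print(value is None)          # True
```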

4.6. sequences

  • Strings, tuples, and lists are ordered, zero-indexed collections; while tuples and lists store any data type, strings are strictly sequences of characters.

    # concatenation (+) and repetition (*)
    combo = ('cat',) + ('dog', 'cow')  # ('cat', 'dog', 'cow')
    alarm = ('bark',) * 3              # ('bark', 'bark', 'bark')
    
    # membership & unpacking
    'c' in 'cat'                        # True
    c, d, w = ['meow', 'bark', 'moo']   # unpacking
    
    # iteration: direct vs. indexed vs. enumerated
    items = ['meow', 'bark', 'moo']
    for item in items: ...
    for i in range(len(items)): ...
    for i, v in enumerate(items): ...
    # indexing
    s = 'hello!'  # len(s) is 6
    
    # positive offsets (0 to len-1)
    print(s[0])     # 'h' (first)
    print(s[5])     # '!' (last)
    
    # negative offsets (-1 to -len)
    print(s[-1])    # '!' (same as s[len(s)-1])
    print(s[-6])    # 'h' (same as s[0])
    
    # out of bounds
    # s[6]          # IndexError: index out of range
    # slicing
    s = 'hello!'
    
    # [start:stop] - stop is non-inclusive
    print(s[1:3])   # 'el' (offsets 1 and 2)
    print(s[:3])    # 'hel' (default start: 0)
    print(s[1:])    # 'ello!' (default end: len)
    
    # [start:stop:step]
    print(s[::2])   # 'hlo' (every 2nd character)
    print(s[::-1])  # '!olleh' (negative step reverses)
    
    # shallow copy
    print(s[:])              # 'hello!' (full slice; for a list this makes a top-level copy)
    print(s[slice(0, 6, 1)]) # 'hello!' (the internal logic)

4.7. truthiness

  • In Python, truthiness and falsiness determine a value’s evaluation in a Boolean context: non-empty collections and non-zero numbers are truthy, while None, zero, and empty objects are falsy.

    # truthy: objects with content or non-zero value
    bool(42)          # True
    bool("hello")     # True
    bool([1, 2])      # True
    
    # falsy: empty, zero, or null
    bool(0)           # False
    bool("")          # False
    bool([])          # False
    bool(None)        # False
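The same rules apply wherever a condition is tested, not only in bool(). The sketch below uses a hypothetical helper, describe, to run several values through an if-style test. It also shows a common surprise: a non-empty container is truthy even when its contents are falsy.

```python
def describe(value):
    # The conditional expression applies the same truth test as an if statement.
    return "truthy" if value else "falsy"

for sample in (42, "hello", [1, 2], 0, 0.0, "", [], {}, None):
    print(f"{sample!r:>8} -> {describe(sample)}")

# Non-empty containers are truthy even when their contents are falsy.
print(describe("0"), describe([0]), describe(" "))  # truthy truthy truthy
```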

4.8. and, or, not

  • In Python, logical operators combine Boolean expressions where not negates a value and both and and or use short-circuiting to return the operand that determines the result.

    # 1. negation
    print(not True)           # False
    print(not 0)              # True
    
    # 2. short-circuiting AND: returns first Falsy or last Truthy
    print([] and "hello")     # []
    print(10 and "hello")     # "hello"
    
    # 3. short-circuiting OR: returns first Truthy or last Falsy
    print("apple" or "pear")  # "apple"
    print(None or 0)          # 0
    
    letter = 'o'
    if letter == 'a' or letter == 'e' or letter == 'i' or letter == 'o' or letter == 'u':
        print(letter, 'is a vowel')
    else:
        print(letter, 'is not a vowel')
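Short-circuiting means the right operand is not evaluated at all once the result is known. The sketch below makes that visible with a hypothetical helper, check, that records each evaluation. It also shows a membership test as a more compact alternative to the chained or comparison in the vowel check.

```python
calls = []

def check(label, value):
    calls.append(label)   # record that this operand was evaluated
    return value

result = check("left", 0) and check("right", "never")
print(result, calls)      # 0 ['left']

calls.clear()
result = check("left", "") or check("right", "fallback")
print(result, calls)      # fallback ['left', 'right']

letter = 'o'
is_vowel = letter in 'aeiou'   # replaces the chain of == ... or == ...
print(is_vowel)           # True
```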

4.9. ~, <<, >>, &, ^, |

  • In Python, bitwise operators perform bit-level manipulations. Unary ~ binds tighter than the arithmetic operators, while the binary bitwise operators bind more loosely than arithmetic, in the order << and >>, then &, then ^, then |.

    x = 5  # 0b0101
    y = 1  # 0b0001
    
    # 1. AND, OR, XOR
    print(f"0b{(x & y):04b}")  # 0b0001 (both bits must be 1)
    print(f"0b{(x | y):04b}")  # 0b0101 (either bit is 1)
    print(f"0b{(x ^ y):04b}")  # 0b0100 (bits must differ)
    
    # 2. shifts & inversion
    print(f"0b{(x << 1):04b}") # 0b1010 (shift left: multiply by 2)
    print(f"0b{(x >> 1):04b}") # 0b0010 (shift right: floor divide by 2)
    print(f"0b{~x:b}")         # 0b-110 (invert: -(x+1))
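A common use of these operators is packing boolean flags into one integer. The following is a minimal sketch; the flag names and values (READ, WRITE, EXEC) are hypothetical. The last two lines illustrate the precedence order described above.

```python
READ, WRITE, EXEC = 0b100, 0b010, 0b001   # hypothetical flag values

perms = READ | WRITE              # set two flags   -> 0b110
can_write = bool(perms & WRITE)   # test a flag
perms &= ~WRITE                   # clear a flag    -> 0b100
perms ^= EXEC                     # toggle a flag   -> 0b101

print(f"{perms:03b}", can_write)  # 101 True

# Precedence: + binds tighter than <<, and & binds tighter than |.
print(1 << 2 + 1)     # 8, parsed as 1 << (2 + 1)
print(1 | 2 & 4)      # 1, parsed as 1 | (2 & 4)
```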

4.10. /, //, %

  • In Python, / performs true division returning a float while // and % perform floor division and modulo returning integers only if both operands are integers.

    # 1. true division (/): always float
    print(10 / 2)          # 5.0
    print(11 / 2)          # 5.5
    
    # 2. floor division (//): rounds toward negative infinity
    print( 11   // 2)         #  5   (int)
    print( 11.0 // 2)         #  5.0 (float if any operand is float)
    print(-11   // 2)         # -6   (floor of -5.5 is -6)
    
    # 3. modulo (%): remainder of division
    print( 10 %  3)          #  1
    print(-10 %  3)          #  2   (result sign matches divisor)
    print( 10 % -3)          # -2
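Floor division and modulo are tied together by the identity a == (a // b) * b + (a % b), which explains why the remainder takes the sign of the divisor. A short sketch checking the identity across sign combinations with the built-in divmod():

```python
for a, b in [(11, 2), (-11, 2), (11, -2), (-11, -2)]:
    q, r = divmod(a, b)          # same as (a // b, a % b)
    assert (q, r) == (a // b, a % b)
    assert a == q * b + r        # the defining identity
    print(f"{a:>3} = {q:>2} * {b:>2} + {r:>2}")
```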

5. Bytes and bytearray

  • In Python, eight-bit integer sequences represent values from 0 to 255 as either immutable bytes or mutable bytearray objects.

    # 1. bytes: immutable literal (b'...')
    b_seq = b'abc'
    # b_seq[0] = 65      # TypeError: immutable
    
    # 2. bytearray: mutable constructor
    ba_seq = bytearray(b'abc')
    ba_seq[0] = 65       # 'a' (97) becomes 'A' (65)
    print(ba_seq)        # bytearray(b'Abc')
    
    # 3. indexing returns integers, slicing returns new sequences
    print(b_seq[0])      # 97 (integer)
    print(b_seq[:1])     # b'a' (bytes)
    
    # 4. initialization from size or list
    empty_bytes = bytes(5)          # b'\x00\x00\x00\x00\x00'
    from_list = bytes([97, 98, 99]) # b'abc'

    Endianness is a computer architecture convention for multi-byte data where "big-endian" (standard for IBM mainframes and networking) stores the most significant byte at the lowest address and "little-endian" (standard for x86 and ARM) stores the least significant byte first.

    import sys, struct
    
    # 1. check local architecture
    print(sys.byteorder)      # 'little' (common)
    
    # 2. multi-byte integer (hex: 0x0400)
    n = 1024
    
    # 3. convert to bytes
    big    = n.to_bytes(2, 'big')    # b'\x04\x00' (MSB first)
    little = n.to_bytes(2, 'little') # b'\x00\x04' (LSB first)
    
    print(f"Big:    {big.hex(' ')}")     # Big:    04 00
    print(f"Little: {little.hex(' ')}")  # Little: 00 04
    
    # 4. interpretation risk
    wrong = int.from_bytes(little, 'big')
    print(wrong)  # 4 (interpreted as 0x0004)
    
    # 5. using struct '>' for network order
    network_pkt = struct.pack('>H', n)  # pack as big-endian (>) unsigned short (H)
    print(network_pkt)                  # b'\x04\x00'

6. Strings

In Python, strings exist as Unicode str for text, immutable bytes for binary data, and mutable bytearray for modified raw data.

  • In Python, files use text mode for Unicode strings or binary mode for raw, untranslated bytes.

# create a sample file with a special character
with open('demo.txt', 'w', encoding='utf-8') as f:
    f.write('Hi 👋')

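# text mode: returns a 'str' (Unicode); pass the encoding explicitly,
# because the platform default is not always UTF-8
with open('demo.txt', 'r', encoding='utf-8') as f:
    print(f"Text:   {f.read()}")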

# binary mode: returns 'bytes' (Raw)
with open('demo.txt', 'rb') as f:
    print(f"Binary: {f.read()}")

Designed by Unix legends Ken Thompson and Rob Pike on a diner placemat in New Jersey, UTF-8 is a variable-width encoding that serves as the standard for Python, Linux, and the Web.

# 1. define Unicode string with an emoji
cafe = 'café ☕'

# 2. len() counts Unicode characters (the emoji is 1 char)
print(len(cafe))       # 6

# 3. encode to bytes
# 'é' is 2 bytes (\xc3\xa9) | '☕' is 3 bytes (\xe2\x98\x95)
cafe_bytes = cafe.encode()

# 4. len() counts raw bytes
print(len(cafe_bytes)) # 9

# 5. decode back to str
print(cafe_bytes.decode()) # 'café ☕'
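"Variable-width" means each character takes between one and four bytes depending on its code point. A small sketch, with sample characters picked for illustration, prints the encoded length of each:

```python
for ch in 'aé☕👋':
    encoded = ch.encode('utf-8')
    print(f"U+{ord(ch):04X} {len(encoded)} byte(s): {encoded.hex(' ')}")

widths = [len(ch.encode('utf-8')) for ch in 'aé☕👋']
print(widths)  # [1, 2, 3, 4]
```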
  • Python strings are created using single, double, or triple quotes, with triple quotes specifically designed to handle multiline text and preserve formatting like newlines and indentation.

    # 1. single and double quotes (interchangeable)
    s1 = 'Snap'
    s2 = "Crackle"
    
    # 2. nesting quotes without escapes
    s3 = "'Nay!' said the naysayer."
    s4 = 'The rare double quote: ".'
    
    # 3. triple quotes for multiline blocks
    poem = """There was a Young Lady of Norway,
        Who casually sat in a doorway;
        When the door squeezed her flat,
        She exclaimed, "What of that?"
        This courageous Young Lady of Norway."""
    
    # 4. raw representation (showing \n and spaces)
    print(repr(poem))
    # 1. repeating and combining
    hi = 'Na ' * 4 + 'Hey ' * 4
    
    # 2. escaping and line continuation
    farewell = '\\' + '\t' + 'Goodbye.' \
               + ' Done.'
    
    # 3. implicit concatenation
    s = ("Auto-" "merged " "literals")
  • Python supports specialized string types via single-letter prefixes that determine how the interpreter processes formatting, escape sequences, and underlying data structures.

    # 1. f-strings: formatted string literals
    animal, loc = 'wereduck', 'werepond'
    print(f'The {animal} is in the {loc}')
    
    # 2. r-strings: raw strings (backslashes are kept literally, not treated as escapes)
    path = r'C:\Users\name'  # same value as 'C:\\Users\\name'
    
    # 3. b-strings: bytes literals (binary data)
    blob = b'\x14\xcd'
    print(list(blob))        # [20, 205]
    
    # 4. fr-strings: raw f-strings (combined)
    var = "Value"
    print(fr'Raw plus {var}')
  • Python supports three formatting methodologies: legacy C-style expressions, the .format() method, and modern interpolated f-strings.

    actor = 'Richard Gere'
    cat, weight = 'Chester', 28
    
    # 1. C-style (%)
    s1 = 'Actor: %s' % actor
    s2 = 'Our cat %s weighs %d lbs' % (cat, weight)
    s3 = '%(cat)s is %(weight)d' % {'cat': cat, 'weight': weight}
    
    # 2. str.format()
    s4 = '{0}, {1} and {2}'.format('spam', 'ham', 'eggs')
    s5 = '{motto}, {0} and {food}'.format('ham', motto='spam', food='eggs')
    s6 = '{}, {} and {}'.format('spam', 'ham', 'eggs')
    
    # 3. f-strings
    s7 = f'Our cat {cat} weighs {weight} pounds'
  • Python’s re module provides a suite of tools for pattern matching, substitution, and splitting string data using regular expressions.

    import re
    
    source = "Charles Baudelaire's 'Les Fleurs du Mal'"
    
    # 1. compiling a pattern (optional, improves performance for reuse)
    pattern = re.compile('Les Fleurs du Mal')
    
    # 2. search(): find first occurrence anywhere
    m = pattern.search(source)
    if m:
        print("Match found within the string.")
    
    # 3. match(): find exact match at the START only
    print(re.match('Les Fleurs du Mal', source))  # None
    
    # 4. findall(): returns a list of all matches
    print(re.findall('es', source))  # ['es', 'es']
    
    # 5. split(): break string at every pattern occurrence
    print(re.split(r'\s', source))   # split by whitespace
    
    # 6. sub(): search and replace patterns
    print(re.sub("'", '?', source)) # replaces single quotes with ?
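Beyond yes/no matching, a match object can return the pieces of text that parts of the pattern captured. The sketch below reuses the same source string with a hypothetical pattern that names two groups, author and title. It is one possible pattern for this particular string, not a general-purpose parser.

```python
import re

source = "Charles Baudelaire's 'Les Fleurs du Mal'"

# Hypothetical pattern: capture the possessive name and the quoted title.
pattern = re.compile(r"(?P<author>[\w ]+)'s '(?P<title>[^']+)'")

m = pattern.search(source)
if m:
    print(m.group('author'))   # Charles Baudelaire
    print(m.group('title'))    # Les Fleurs du Mal
    print(m.span('title'))     # (start, end) offsets of the title
```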

7. If, while, and for

The walrus operator (:=) assigns a value to a variable as part of an expression, so the value is stored and evaluated in a single step.

limit = 280
msg = "Blah " * 60

# value is stored in 'diff' and evaluated by '>=' simultaneously
if (diff := limit - len(msg)) >= 0: # walrus operator
    print("Fitting tweet")
else:
    print(f"Over by {abs(diff)}")
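Two other places where the walrus operator tends to help are read loops and comprehensions. The sketch below uses io.StringIO as a stand-in for a file or socket; the data and variable names are illustrative.

```python
import io

stream = io.StringIO("abcdefghij")   # stands in for a file or socket

chunks = []
while (chunk := stream.read(4)):     # read, bind, and test in one expression
    chunks.append(chunk)
print(chunks)                        # ['abcd', 'efgh', 'ij']

# In a comprehension, := avoids computing the same value twice.
words = ["spam", "", "eggs", "a"]
long_lengths = [n for w in words if (n := len(w)) > 1]
print(long_lengths)                  # [4, 4]
```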
  • Branch with if, elif, and else:

    # 1. standard multi-way branching
    color = "mauve"
    if color == "red":
        print("a tomato")
    elif color == "green":
        print("a green pepper")
    else:
        print("unknown:", color)
    
    # 2. ternary expression
    result = 't' if 'spam' else 'f'
    
    # 3. chained comparisons
    x = 2.5
    if 4 > x > 2 > 1: ...  # evaluates as (4 > x) and (x > 2) and (2 > 1)
    
    # 4. dictionary-based branching (dispatch tables)
    menu = {'spam': 1.25, 'ham': 1.99, 'eggs': 0.99}
    price = menu.get('bacon', 'N/A')
    
    actions = {'spam': lambda: print("order spam"), 'ham': lambda: print("order ham")}
    actions.get('spam', lambda: print("default action"))()
  • Repeat with while, and break, continue, and else:

    items = [1, 3, 5]
    
    while items:
        if (val := items[0]) == 0:
            break           # exit and skip 'else'
    
        items = items[1:]   # slice to progress
    
        if val % 2 == 0:
            continue        # skip to next condition check
        print(f"{val} squared is {val**2}")
    else: # optional
        print("no zeros found") # ONLY if the break above was never hit
  • Iterate with for/in, and break, continue and else:

    # 1. loop control: continue and break
    for char in 'thud':
        if char == 'u': continue  # skip remaining block for this item
        if char == 'x': break     # exit loop immediately
        print(char)
    else: # optional
        print("no 'x' found")     # ONLY if the break above was never hit
    
    # 2. range-based loops (start, stop, step)
    for i in range(0, 10, 2):
        print(i, end=' ')         # 0 2 4 6 8
    
    # 3. parallel and indexed iteration
    s = 'spam'
    for i, char in enumerate(s):  # generates (index, item) pairs
        print(f'{i}: {char}')
    
    for a, b in zip(s, s.upper()): # pairs elements from multiple iterables
        print(a, b)                # s S, p P, a A, m M
    
    # 4. sequence unpacking
    pairs = [[1, 2], [3, 4]]
    for x, y in pairs:             # direct assignment to variables
        print(x + y)

8. Tuples and lists

  • A tuple is an immutable, ordered sequence built with commas as a structural operator or tuple() as a constructor for iterables.

    'cat',                   # singleton  (trailing comma)
    'cat', 'dog', 'cattle'   # multi-item (separating commas)
    
    tuple()                  # constructor: empty ()
    tuple('cat')             # constructor: iterable to ('c', 'a', 't')

    Parentheses are grouping symbols used for empty literals, visual clarity, resolving syntactic ambiguity, and defining generator expressions.

    ()                       # empty literal
    ('cat',)                 # tuple
    ('cat')                  # string
    
    type(('cat',))           # <class 'tuple'>
    type('cat',)             # <class 'str'>
    
    (x for x in range(10))   # generator expression

    A named tuple is a hybrid object factory that creates classes supporting positional indexing (tuple), dotted name attribute (class), and dictionary conversion (_asdict()).

    # modern class-based; supports PEP 484 type hints and IDE autocompletion
    from typing import NamedTuple
    
    class Rec(NamedTuple):
        name: str                   # explicit field type
        age: float                  # enables static analysis
        jobs: list[str]             # self-documenting schema
    # legacy factory-based; quick, dynamic, but lacks static type hints
    from collections import namedtuple
    
    Rec = namedtuple('Rec', ['name', 'age', 'jobs'])
    bob = Rec('Bob', age=40.5, jobs=['dev', 'mgr'])
    
    bob[0]                          # positional indexing (tuple)
    bob.name                        # dotted name attribute (class)
    bob._asdict()['name']           # dictionary conversion (dict)
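Since a named tuple is still a tuple, its fields cannot be rebound after creation. A brief sketch using the class-based Rec from above shows unpacking, the _replace() method for building a modified copy, and the error raised on direct assignment.

```python
from typing import NamedTuple

class Rec(NamedTuple):
    name: str
    age: float
    jobs: list[str]

bob = Rec('Bob', age=40.5, jobs=['dev', 'mgr'])

name, age, jobs = bob              # unpacks like any tuple
older = bob._replace(age=41.5)     # returns a new record; bob is unchanged

try:
    bob.age = 50                   # fields cannot be rebound
except AttributeError as exc:
    print("immutable:", exc)

print(bob.age, older.age)          # 40.5 41.5
print(isinstance(bob, tuple))      # True
```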
  • A list is a mutable, ordered sequence built with brackets [] as a literal or list() as a constructor for iterables.

    []                                # []
    ['meow', 'bark', 'moo']           # ['meow', 'bark', 'moo']
    [('cat', 'meow'), 'bark', 'moo']  # [('cat', 'meow'), 'bark', 'moo']
    
    list()                            # []
    list('cat')                       # ['c', 'a', 't']
    # append(), insert(), extend()
    wow = ['meow']  # ['meow']
    wow.append('moo')  # ['meow', 'moo']
    wow.insert(1, 'bark')  # ['meow', 'bark', 'moo']
    wow.extend(['cluck', 'baa']) # ['meow', 'bark', 'moo', 'cluck', 'baa']
    
    # plus(+), repeat(*)
    plus = ['meow', 'bark', 'moo'] + ['cluck', 'baa'] # ['meow', 'bark', 'moo', 'cluck', 'baa']
    repeat = ['bark'] * 3 # ['bark', 'bark', 'bark']
    
    # index, and slice assignment
    L = ['spam', 'Spam', 'SPAM!']
    # index assignment
    L[1] = 'eggs'  # ['spam', 'eggs', 'SPAM!']
    # slice assignment: delete+insert  # L[start:stop] = iterable
    #   if the iterable is shorter, elements are deleted from the slice.
    #   if the iterable is longer, extra elements are inserted.
    #   (with a step other than 1, the lengths must match exactly)
    L[0:2] = ['eat', 'more']  # ['eat', 'more', 'SPAM!']
    # del, remove(), clear()
    farm = ['cat', 'dog', 'cattle', 'chicken', 'duck']
    
    del farm[-1]
    # ['cat', 'dog', 'cattle', 'chicken']
    
    farm.remove('dog')
    # ['cat', 'cattle', 'chicken']
    
    farm.clear()
    # []
    # pop: remove and return item at index (default last).
    farm = ['cat', 'cattle', 'chicken']
    
    farm.pop()  # 'chicken'
    # ['cat', 'cattle']
    
    farm.pop(-1)  # 'cattle'
    # ['cat']
    # sort() and sorted()
    farm = ['cat', 'dog', 'cattle']
    
    # a sorted copy
    sorted(farm)  # ['cat', 'cattle', 'dog']
    print(farm)  # ['cat', 'dog', 'cattle']
    
    # sorting in-place
    farm.sort()
    print(farm)  # ['cat', 'cattle', 'dog']
    # list comprehensions: [expression for item in iterable]
    even_numbers = [2 * num for num in range(5)]
    # [0, 2, 4, 6, 8]
    
    # list comprehensions: [expression for item in iterable if condition]
    odd_numbers = [num for num in range(10) if num % 2 == 1]
    # [1, 3, 5, 7, 9]
    import copy
    
    # shallow: copies the top-level container with shared nested objects.
    a = [['cat', 'meow'], ['dog', 'bark']]
    c = a[:]
    b = a.copy()  # equivalent to a[:]
    d = list(c)
    
    # deep: creates an independent clone of the container and all nested objects.
    e = copy.deepcopy(a)
    
    a[0][1] = 'moo'
    
    a  # [['cat', 'moo'], ['dog', 'bark']]
    b  # [['cat', 'moo'], ['dog', 'bark']]
    c  # [['cat', 'moo'], ['dog', 'bark']]
    d  # [['cat', 'moo'], ['dog', 'bark']]
    
    e  # [['cat', 'meow'], ['dog', 'bark']]

    A deque (double-ended queue) is optimized for O(1) appends and pops from either end, whereas a list incurs O(N) costs for left-side mutations.

    from collections import deque
    
    q = deque([], maxlen=5)         # fixed-length sliding window
    q.append(0)                     # O(1) end-point growth
    q.appendleft(5)                 # O(1) start-point growth (vs list's O(N))
    
    q.pop()                         # O(1) end-point shrinkage
    q.popleft()                     # O(1) start-point shrinkage
    
    q.extend([1, 2])                # right-side batch add
    q.extendleft([3, 4])            # left-side batch add: deque([4, 3, ...])
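The maxlen argument is what makes a deque work as a sliding window: once it is full, each append silently discards the item at the opposite end. A minimal sketch with made-up readings computes a moving average over the last three values.

```python
from collections import deque

window = deque(maxlen=3)           # keeps only the 3 most recent items
for reading in [10, 20, 30, 40, 50]:
    window.append(reading)         # when full, the oldest item drops off the left
    print(list(window), sum(window) / len(window))

print(window)                      # deque([30, 40, 50], maxlen=3)
```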

9. Dictionaries and sets

In Python, dictionary keys and set elements must be hashable; built-in immutable types such as int, str, and frozenset qualify, while mutable containers such as list, dict, and set do not.

The built-in hash() operates natively with built-in types, but delegates to __hash__ for user-defined types, defaulting to object identity.

# 1. built-in immutables: hashable (int, str, frozenset; a tuple only if all its items are)
hash('python')  # an integer; str hashes are stable within one process but salted per run

# 2. built-in mutables: never hashable (list, dict, set)
# hash([1, 2])  --> raises TypeError: unhashable type: 'list'

# 3. user-defined classes: hashable by default via identity
class Foo: ...
a, b = Foo(), Foo()
print(hash(a), hash(b))  # different hashes based on memory address (id)
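When a user-defined class should compare by value rather than identity, __eq__ and __hash__ are normally defined together, so that equal objects hash equally. Defining __eq__ alone makes instances unhashable. The sketch below uses a hypothetical Point class to show the effect on sets and dict lookups.

```python
class Point:
    def __init__(self, x, y):
        self.x, self.y = x, y

    def __eq__(self, other):
        if not isinstance(other, Point):
            return NotImplemented
        return (self.x, self.y) == (other.x, other.y)

    def __hash__(self):
        # Equal objects must produce equal hashes.
        return hash((self.x, self.y))

p, q = Point(1, 2), Point(1, 2)
print(p == q, hash(p) == hash(q))  # True True
print(len({p, q}))                 # 1: the set treats them as one element

labels = {p: 'first'}
print(labels[q])                   # first: q finds the entry stored under p
```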
  • A dict is a mutable, associative array/map of unique keys to values, built with curly braces {} as a literal or dict() as a constructor.

    {}                              # {}
    {'cat': 'meow', 'dog': 'bark'}  # {'cat': 'meow', 'dog': 'bark'}
    
    dict()                          # constructor: empty {}
    dict(cat='meow', dog='bark')    # constructor: keyword args
    dict([('cat', 'meow')])         # constructor: iterable of pairs
    # [key], get()
    animals = {'cat': 'meow', 'dog': 'bark'}
    animals['cattle'] = 'moo'  # {'cat': 'meow', 'dog': 'bark', 'cattle': 'moo'}
    animals['cat']  # 'meow'
    animals['sheep']  # KeyError: 'sheep'
    animals.get('sheep')  # None
    animals.get('sheep', 'baa')  # 'baa'
    
    # testing
    animals = {'cat': 'meow', 'dog': 'bark'}
    'cat' in animals  # True
    'sheep' in animals  # False
    animals['sheep'] if 'sheep' in animals else 'oops!'  # 'oops!'
    # keys(), values(), items(), len()
    animals = {'cat': 'meow', 'dog': 'bark', 'cattle': 'moo'}
    animals.keys()  # dict_keys(['cat', 'dog', 'cattle'])
    animals.values()  # dict_values(['meow', 'bark', 'moo'])
    animals.items()  # dict_items([('cat', 'meow'), ('dog', 'bark'), ('cattle', 'moo')])
    len(animals)  # 3
    # `**`, update()
    {**{'cat': 'meow'}, **{'dog': 'bark'}}  # {'cat': 'meow', 'dog': 'bark'}
    animals = {'cat': 'meow'}
    animals.update({'dog': 'bark'})  # {'cat': 'meow', 'dog': 'bark'}
    # del, pop(), clear()
    animals = {'cat': 'meow', 'dog': 'bark', 'cattle': 'moo'}
    del animals['dog']
    # {'cat': 'meow', 'cattle': 'moo'}
    animals.pop('cattle')  # 'moo'
    # {'cat': 'meow'}
    animals.clear()
    # {}
    # iterations
    animals = {'cat': 'meow', 'dog': 'bark', 'cattle': 'moo'}
    for key in animals:  # for key in animals.keys()
        print(f'{key} => {animals[key]}', end='\t')
    # cat => meow	dog => bark	cattle => moo
    for key, value in animals.items():
        print(f'{key} => {value}', end='\t')
    # cat => meow     dog => bark     cattle => moo
    # dictionary comprehensions: {key_expression : value_expression for expression in iterable}
    word = 'letters'
    letter_counts = {letter: word.count(letter) for letter in word}
    # {'l': 1, 'e': 2, 't': 2, 'r': 1, 's': 1}
    
    # dictionary comprehensions: {key_expression : value_expression for expression in iterable if condition}
    vowels = 'aeiou'
    word = 'onomatopoeia'
    vowel_counts = {letter: word.count(letter)
                    for letter in set(word) if letter in vowels}
    # {'i': 1, 'o': 4, 'a': 2, 'e': 1}
    # setdefault()
    d = {}
    d[0].extend(range(5))  # KeyError: 0
    d.setdefault(0, []).extend(range(5))
    d[0]  # [0, 1, 2, 3, 4]
    • A defaultdict is a dict subclass that calls a factory function to provide a default value for missing keys.

      from collections import defaultdict  # defaultdict(default_factory=None, /, [...])
      
      # factory: list -> defaults to []
      d_list = defaultdict(list)
      d_list[0].extend(range(5))      # auto-creates [] then extends
      
      # factory: int -> defaults to 0
      d_int = defaultdict(int)
      d_int['count'] += 1             # auto-creates 0 then increments
    • A Counter is a dict subclass for counting hashable items, storing elements as keys and their frequencies as values.

      from collections import Counter  # Counter(iterable=None, /, **kwds)
      
      word = 'banana'
      
      # O(N*K) for K distinct letters (up to O(N²)): one full scan for 'b', then 'a', then 'n'...
      {l: word.count(l) for l in set(word)}
      
      # O(N) — single pass population
      c = Counter(word)               # Counter({'a': 3, 'n': 2, 'b': 1})
      c.most_common(1)                # [('a', 3)]
      list(c.elements())              # ['b', 'a', 'a', 'a', 'n', 'n']
      c['z']                          # missing keys return 0
    • A typed dict is a dict-like factory that defines fixed keys and types for static validation, providing a schema for flexible JSON-like maps.

      from typing import TypedDict, NotRequired
      
      class User(TypedDict):
          name: str                   # Required
          id: int                     # Required
          email: NotRequired[str]     # Optional (PEP 655)
      
      # 1. type check: static tools (Mypy) flag missing keys or wrong types
      user: User = {"name": "Alice", "id": 42}
      
      # 2. runtime: a plain dict
      print(type(user))               # <class 'dict'>
      print(user["name"])             # Standard string-key access
  • A set is a mutable, unordered collection of unique, hashable elements, built with curly braces {} or the set() constructor.

    {}            # <class 'dict'>: empty braces create a dict, not a set
    {0, 2, 4, 6}  # {0, 2, 4, 6}
    
    set()                                                 # set()
    set('letter')                                         # {'l', 't', 'r', 'e'}
    set({'cat': 'meow', 'dog': 'bark', 'cattle': 'moo'})  # {'cat', 'cattle', 'dog'}
    
    frozenset()                    # frozenset()
    frozenset([3, 1, 4, 1, 5, 9])  # frozenset({1, 3, 4, 5, 9})
    # len(), add(), remove()
    nums = {0, 1, 2, 3, 4, }
    len(nums)  # 5
    nums.add(5)  # {0, 1, 2, 3, 4, 5}
    nums.remove(0)  # {1, 2, 3, 4, 5}
    # iteration
    for num in {0, 2, 4, 6, 8}:
        print(num, end='\t')
    # 0	2	4	6	8
    # testing
    2 in {0, 2, 4}  # True
    3 in {0, 2, 4}  # False
    # `&`: intersection(), `|`: union(), `-`: difference(), `^`: symmetric_difference()
    a = {1, 3}
    b = {2, 3}
    a & b  # {3}
    a | b  # {1, 2, 3}
    a - b  # {1}
    a ^ b  # {1, 2}
    # `<=`: issubset(), `<`: proper subset, `>=`: issuperset(), `>`: proper superset
    a <= b  # False
    a < b  # False
    a >= b  # False
    a > b  # False
    # set comprehensions: { expression for expression in iterable }
    {num for num in range(10)}  # {0, 1, 2, 3, 4, 5, 6, 7, 8, 9}
    
    # set comprehensions: { expression for expression in iterable if condition }
    {num for num in range(10) if num % 2 == 0}  # {0, 2, 4, 6, 8}
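
Because keys and elements must be hashable, a mutable set cannot be a dictionary key, while a frozenset can. A small sketch with made-up data, which also shows that dict key views accept set operators:

```python
routes = {}
routes[frozenset({'A', 'B'})] = 5     # frozenset is immutable, so it is hashable
print(routes[frozenset({'B', 'A'})])  # 5: element order does not matter

try:
    routes[{'A', 'B'}] = 5            # a plain set is mutable and unhashable
except TypeError:
    print('a set cannot be a dict key')

# dict key views support set operations directly
old = {'cat': 'meow', 'dog': 'bark'}
new = {'dog': 'woof', 'cattle': 'moo'}
shared = old.keys() & new.keys()
print(shared)                         # {'dog'}
```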

10. Iterations

An iterable is an object supporting the iter() call, while an iterator is the specific object returned by that call which supports next() to produce values.

An iterator is any object with a __next__ method that returns the next result and raises StopIteration to signal completion, giving the iteration protocol its mechanism to advance and terminate.

The iteration protocol—utilized by tools like for loops, comprehensions, and map(), and implemented by objects like files, lists, and generators—relies on two key steps:

  • An iterable’s __iter__ method, triggered by iter(), returns an iterator to manage the state and lifecycle of the iteration.

  • The iterator’s __next__ method, triggered by next(), produces values sequentially until a StopIteration exception signals the end of the series.

nums = [1, 2]          # iterable
i = iter(nums)         # iterator created here
print(next(i))         # 1
print(next(i))         # 2
# next(i) now raises StopIteration
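
The two steps can also be implemented by hand. The following sketch uses a hypothetical Countdown class and spells out roughly what a for loop does with iter() and next() behind the scenes.

```python
class Countdown:
    """Hypothetical iterable that is its own iterator."""

    def __init__(self, start):
        self.current = start

    def __iter__(self):
        return self                  # an iterator returns itself from __iter__

    def __next__(self):
        if self.current <= 0:
            raise StopIteration      # signals the end of the series
        self.current -= 1
        return self.current + 1

# roughly what "for n in Countdown(3)" does
collected = []
it = iter(Countdown(3))
while True:
    try:
        n = next(it)
    except StopIteration:
        break
    collected.append(n)

print(collected)                     # [3, 2, 1]
print(list(Countdown(3)))            # [3, 2, 1]
```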

Iteration contexts in Python include the for loop, list comprehensions, the map built-in function, and the in membership test; the built-in functions sorted, sum, any, and all; the list and tuple constructors; the string join method; and sequence assignments. All of them use the iteration protocol to step across iterable objects one item at a time.

List comprehensions usually run faster than equivalent manual for loops: both compile to bytecode, but in CPython a comprehension appends with a dedicated instruction instead of looking up and calling res.append on every pass.

L = [1, 2, 3, 4, 5]
res = []
for x in L:
    res.append(x+10)
print(res)  # [11, 12, 13, 14, 15]
res2 = [x + 10 for x in L]
print(res2)  # [11, 12, 13, 14, 15]
# filter clauses: if
[line.rstrip() for line in open('script2.py') if line[0] == 'p']
# nested loops: for
[x + y for x in 'abc' for y in 'lmn']
# all, any, map, filter, reduce, zip, enumerate, shuffle, sample, reversed, sorted
nums = list(range(10))  # [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]

all(num > 0 for num in nums)  # False
any(num > 0 for num in nums)  # True

list(map(lambda x: x * x, nums))  # [0, 1, 4, 9, 16, 25, 36, 49, 64, 81]
list(filter(lambda x: x % 2 == 0, nums))  # [0, 2, 4, 6, 8]
from functools import reduce
reduce(lambda x, y: x + y, nums)  # ((0 + 1) + 2) + ... = 45

list(zip(range(3), range(4), range(5)))  # [(0, 0, 0), (1, 1, 1), (2, 2, 2)]

funcs = ['map', 'filter', 'reduce']
list(enumerate(funcs))     # [(0, 'map'), (1, 'filter'), (2, 'reduce')]
list(enumerate(funcs, 1))  # [(1, 'map'), (2, 'filter'), (3, 'reduce')]
[(i, func) for i, func in enumerate(funcs, start=1)]  # [(1, 'map'), (2, 'filter'), (3, 'reduce')]

from random import shuffle, sample
shuffle(nums)  # shuffles nums in place and returns None
nums  # e.g. [4, 2, 5, 9, 6, 0, 1, 3, 8, 7]
sample(nums, k=len(nums))  # e.g. [5, 3, 7, 6, 8, 4, 0, 1, 2, 9]

list(reversed(nums))  # [7, 8, 3, 1, 0, 6, 9, 5, 2, 4]
nums[::-1]  # [7, 8, 3, 1, 0, 6, 9, 5, 2, 4]

sorted(nums, reverse=True)  # [9, 8, 7, 6, 5, 4, 3, 2, 1, 0]
import itertools

names = ["Alan", "Adam", "Wes", "Will", "Albert", "Steven"]

for letter, group in itertools.groupby(names, lambda x: x[0]):
    print(letter, list(group))

# A ['Alan', 'Adam']
# W ['Wes', 'Will']
# A ['Albert']
# S ['Steven']

for num in itertools.chain(range(3), range(3, 7), range(7, 10), [10]):
    print(num, end='\t')

# 0       1       2       3       4       5       6       7       8       9       10

list(itertools.combinations([0, 1, 2], 2))
# [(0, 1), (0, 2), (1, 2)]

list(itertools.combinations_with_replacement([0, 1, 2], 2))
# [(0, 0), (0, 1), (0, 2), (1, 1), (1, 2), (2, 2)]

list(itertools.permutations([0, 1, 2], 2))
# [(0, 1), (0, 2), (1, 0), (1, 2), (2, 0), (2, 1)]

list(itertools.product([0, 1, 2], [3, 4, 5], repeat=1))
# [(0, 3), (0, 4), (0, 5), (1, 3), (1, 4), (1, 5), (2, 3), (2, 4), (2, 5)]
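
In Python 3, map, filter, zip, enumerate, and reversed return lazy, single-pass iterators rather than lists. A short sketch of what that means in practice (values are illustrative):

```python
nums = [1, 2, 3]
squares = map(lambda x: x * x, nums)   # a lazy iterator, nothing computed yet

first_pass = list(squares)             # [1, 4, 9]
second_pass = list(squares)            # []: the iterator is already exhausted

pairs = zip('ab', nums)                # stops at the shortest input
print(next(pairs))                     # ('a', 1)
print(list(pairs))                     # [('b', 2)]
```

Wrapping the call in list() materializes the results when they are needed more than once.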

11. Files and directories

A file is a byte sequence identified by a filename within a directory-based filesystem, categorized into text files—which automatically handle Unicode encoding and line endings—and binary files, which provide raw, unaltered access via the bytes type.

  • A file is opened by open() with an optional mode that selects read/write access and text or binary handling, returning a stream object for reading or writing data.

    open(f, 'r')  # read an EXISTING file
    open(f, 'w')  # create or overwrite a file
    open(f, 'a')  # create or append to a file
    open(f, 'x')  # create a NON-EXISTING file (fails if exists)
    
    open(f, 'r+') # read and write an EXISTING file
    open(f, 'w+') # read and write a file (creates or overwrites)
    open(f, 'a+') # read and append to a file (creates if missing)
    
    open(f, 'rb') # read an EXISTING file as a raw stream of bytes
    open(f, 'wb') # write a file as a raw stream of bytes
    # text mode (str): .txt, .csv, .json
    with open("file.txt", "w", encoding="utf-8") as f:
        f.write("Line 1\n")            # write string
        f.writelines(["L2\n", "L3\n"]) # write list of strings
    
    with open("file.txt", "r", encoding="utf-8") as f:
        content = f.read()             # read entire file as a single string
        f.seek(0)                      # rewind to the start before reading again
        lines = f.readlines()          # read entire file as a list of strings
        f.seek(0)
        for line in f: ...             # read line by line (lazy loading)
    # binary mode (bytes): .jpg, .pdf, .zip, .exe
    with open("image.jpg", "rb") as f:
        header = f.read(10)            # read first 10 bytes
        data = f.read()                # read remainder as bytes object
    
    with open("copy.jpg", "wb") as f:
        f.write(data)                  # write bytes object
    • By default, files open in text mode (t) using universal newlines, which transparently maps OS-specific endings (CRLF on Windows, LF on Unix) to the standard \n character.

      open(f, 'r', newline=None)  # default enables universal newline translation
      open(f, 'r', newline='')    # disables translation to return raw endings
      open(f, 'w', newline='\n')  # forces LF line endings regardless of OS
    • By default, text files are decoded with the system-dependent locale encoding (e.g., cp1252 on Windows), which causes cross-platform failures when reading UTF-8 files.

      import locale
      
      print(locale.getpreferredencoding())  # preferred encoding
      
      open(f, 'r', encoding='utf-8')        # explicit & safe
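
The effect of a mismatched encoding can be seen without touching the filesystem. This sketch decodes the same UTF-8 bytes with three codecs; the sample word is arbitrary.

```python
data = 'café'.encode('utf-8')          # b'caf\xc3\xa9': é takes two bytes in UTF-8

print(data.decode('utf-8'))            # café
print(data.decode('cp1252'))           # cafÃ©  (wrong codec: silent mojibake)

try:
    data.decode('ascii')               # a stricter codec fails loudly instead
except UnicodeDecodeError as exc:
    print('cannot decode:', exc.reason)
```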
  • pathlib is a modern, object-oriented module for path manipulation, replacing the raw string-based logic of os.path.

    from pathlib import Path
    
    # 1. initialization
    p = Path("data/v1/config.yaml")   # object initialization
    home = Path.home()                # home dir
    p = Path.cwd() / "src" / "app.py" # path combination
    
    # 2. attributes
    p.name    # app.py
    p.stem    # app
    p.suffix  # .py
    p.parent  # parent dir
    p.parts   # ('/', ..., 'src', 'app.py')
    
    # 3. verification & metadata
    p.exists()  # existence
    p.is_file() # file
    p.is_dir()  # directory
    p.stat()    # size, mtime, etc.
    p.resolve() # absolute path
    
    # 4. mutations
    p.mkdir(parents=True, exist_ok=True) # create dir + parents
    p.touch()                            # create file/update timestamp
    p.unlink(missing_ok=True)            # delete file
    p.rmdir()                            # delete empty dir
    p.rename("new.py")                   # move/rename
    p.replace("new.py")                  # atomic move/overwrite
    
    # 5. search & iteration
    p.iterdir()     # shallow contents generator
    p.glob("*.csv") # shallow pattern match
    p.rglob("*.py") # recursive pattern match
    
    # 6. stream & I/O
    with p.open('r') as f: f.read()  # manual stream
    p.read_text(encoding="utf-8")          # fast read (locale encoding unless given)
    p.write_text("data", encoding="utf-8") # fast write (locale encoding unless given)

    pathlib supports * (shallow), ** (recursive), ? (single-char), and [] (sets/ranges), but excludes shell-style {} expansion.

    p.glob("*.py")       # shallow  : current directory only
    p.glob("**/*.py")    # recursive: explicit double-star pattern
    p.rglob("*.py")      # recursive: shorthand (implies ** prefix)
    
    # multi-extension workaround (no {} support)
    target_exts = {'.jpg', '.png', '.gif'}
    images = (f for f in p.rglob("*") if f.suffix in target_exts)
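
The difference between shallow and recursive matching is easiest to see on a real tree. The sketch below builds a throwaway directory (file names are made up) and compares glob, rglob, and the suffix-set workaround.

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "src").mkdir()
    (root / "src" / "app.py").write_text("print('hi')\n", encoding="utf-8")
    (root / "notes.txt").write_text("todo\n", encoding="utf-8")

    shallow = sorted(p.name for p in root.glob("*.py"))    # top level only
    deep = sorted(p.name for p in root.rglob("*.py"))      # descends into src/
    wanted = {".py", ".txt"}
    both = sorted(p.name for p in root.rglob("*") if p.suffix in wanted)
    text = (root / "src" / "app.py").read_text(encoding="utf-8")

print(shallow)   # []
print(deep)      # ['app.py']
print(both)      # ['app.py', 'notes.txt']
```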

12. Functions

Python functions are first-class objects existing as named blocks with def, anonymous expressions with lambda, or methods with a bound instance.

  • def is a statement creating a named function at runtime, while lambda is an expression coding an anonymous, single-expression function.

    def add(x, y): return x + y     # named, multiple statements
    add_alt = lambda x, y: x + y    # anonymous, one expression
    
    if persistent:
        def save(): ...             # def works inside logic blocks
    else:
        save = lambda: None         # lambda works where expressions are expected
    
    def future_func(): pass         # NOOP: classic
    def todo_func(): ...            # NOOP: modern
  • return sends a result and exits, while yield produces a result and suspends state to generate a series over time.

    def get_one(): return 1         # terminate
    def get_seq(): yield 1; yield 2 # generator
  • global binds names to the module-level scope, while nonlocal binds names to the nearest enclosing function scope.

    # global: modifies module-level x
    def change_global():
        global x; x = 2
    
    # nonlocal: modifies outer function y
    def outer():
        y = 1
        def inner():
            nonlocal y; y = 2
  • Python uses pass-by-assignment, matching arguments from left to right by default, or by keyword (name=value).

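    def myfunc(arg1, arg2, meat='ham', *args, **kwargs): ...
    
    # 'spam', 'eggs' -> positional, matched left to right
    # meat='ham'     -> keyword, matched by name
    # *args          -> unpacks an iterable into positional arguments
    # **kwargs       -> unpacks a mapping into keyword arguments
    args, kwargs = (), {'side': 'toast'}  # illustrative values
    myfunc('spam', 'eggs', meat='ham', *args, **kwargs)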

    In Python, the / indicates that everything before it is positional-only, and the * (when used alone) indicates that everything after it is keyword-only.

    def feed_animal(qty, /, kind="goat"): ...  # 'qty' is positional-only; 'kind' is standard
    feed_animal(5, "sheep")                    # valid
    # feed_animal(qty=5, kind="sheep")         # TypeError: qty passed by keyword
    
    def harvest(*, crop, tool="scythe"): ...   # everything to the right must be named
    harvest(crop="wheat")                      # valid
    # harvest("wheat")                         # TypeError: crop passed by position
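
Both markers can appear in one signature. The sketch below uses a hypothetical plant function to show which call forms are accepted and which raise TypeError.

```python
def plant(field, /, crop, *, rows=1):
    """Hypothetical function: field is positional-only, rows is keyword-only."""
    return f"{rows} row(s) of {crop} in {field}"

print(plant("north", "wheat"))                # crop passed by position
print(plant("north", crop="wheat", rows=3))   # crop passed by keyword

errors = []
for call in (lambda: plant(field="north", crop="wheat"),   # field by keyword
             lambda: plant("north", "wheat", 3)):          # rows by position
    try:
        call()
    except TypeError as exc:
        errors.append(type(exc).__name__)

print(errors)                                 # ['TypeError', 'TypeError']
```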

12.1. Attributes and annotations

  • A function is a first-class object supporting system and user-defined attributes alongside metadata annotations.

    # 1. annotations (type hint vs. general metadata)
    def cube(n: int) -> int: return n ** 3  # type hint
    def spam(a: 'tag'): return a            # general metadata
    
    # 2. user-defined attribute
    cube.category = "math"
    
    # 3. system-defined attributes
    print(cube.__name__)                    # 'cube'
    print(cube.__annotations__)             # {'n': <class 'int'>, 'return': <class 'int'>}
    print(cube.category)                    # 'math'
    
    # 4. first-class citizen (higher-order use)
    def execute(func, value):
        return func(value)
    
    print(execute(cube, 3))                 # 27
    print(execute(lambda n: n**3, 3))       # 27

12.2. Lambdas

  • A lambda expression is created by the keyword lambda with a comma-separated argument list and a single expression that returns the function’s result.

    from functools import reduce
    nums = range(10)
    
    # map: mapping functions over iterables
    list(map(lambda x: x+1, nums))              # [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
    
    # filter: selecting items in iterables
    list(filter(lambda x: x % 2 == 0, nums))    # [0, 2, 4, 6, 8]
    
    # reduce: combining items in iterables
    reduce(lambda x, y: x+y, nums)              # 45
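
Lambdas are also commonly passed as key functions. A brief sketch with made-up (name, weight) pairs, followed by the comprehension form that often replaces map and filter:

```python
animals = [('cat', 4), ('cattle', 900), ('dog', 30)]    # illustrative data

by_weight = sorted(animals, key=lambda pair: pair[1], reverse=True)
heaviest = max(animals, key=lambda pair: pair[1])

print(by_weight)   # [('cattle', 900), ('dog', 30), ('cat', 4)]
print(heaviest)    # ('cattle', 900)

# the same mapping and filtering written as a comprehension
evens_plus_one = [x + 1 for x in range(10) if x % 2 == 0]
print(evens_plus_one)   # [1, 3, 5, 7, 9]
```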

12.3. Namespaces

A namespace is a mapping from names to objects; the scopes searched for a name form the LEGB levels (local, enclosing, global, and built-in).

  • A name resolution is the process of searching LEGB levels in order and stopping at the first match.

  • A name assignment is bound to the local scope by default, unless overridden by global or nonlocal.

    a = 5.21                    # global (G)
    
    def tester(start):
        state = start           # enclosing (E)
        def nested(label):
            nonlocal state      # bound to 'state' in tester
            global a            # bound to 'a' at module level
    
            state += 1
            print(locals())     # local names and values
            print(globals())    # global names and values
    
            print(vars())       # local names and values (same as locals)
            import math
            print(vars(math))   # attribute names and values of math module
        return nested
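
The search order can be observed by binding the same name at several levels. A minimal sketch, with function names chosen only for illustration:

```python
name = 'global'                      # G: module level

def outer():
    name = 'enclosing'               # E: enclosing function scope
    def inner():
        name = 'local'               # L: local scope, found first
        return name
    def peek():
        return name                  # no local binding, so E is used
    return inner(), peek()

def use_builtin():
    return len('abc')                # not in L, E or G, so B (builtins) is used

print(outer())                       # ('local', 'enclosing')
print(name)                          # global: untouched by the assignments above
print(use_builtin())                 # 3
```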

12.4. Closures

  • A closure is a function object that remembers the values in its enclosing lexical environment even after the outer scope has finished executing.

    A lexical environment (or lexical scope) is a static structure where variable accessibility is determined by the physical placement of code at write-time rather than the execution path at runtime.

    # 1. named closure (using def)
    def maker(n):
        def action(x):
            return x ** n  # remembers n from enclosing scope
        return action
    
    f = maker(2)
    g = maker(3)
    
    print(f(4))  # 16 (remembers n=2)
    print(g(4))  # 64 (remembers n=3)
    
    # 2. anonymous closure (using lambda)
    def lambda_maker(n):
        return lambda x: x ** n  # n captured by lambda expression
    
    h = lambda_maker(4)
    print(h(2))  # 16 (remembers n=4)

    If a lambda or def defined within a function is nested inside a loop, all generated functions will share the loop variable’s final value because the variable is bound late at call-time rather than definition-time.

    # 1. the trap: late binding
    def make_actions():
        # 'i' is not "captured" yet; it is just a name to be looked up later
        return [lambda x: i ** x for i in range(5)]
    
    acts = make_actions()
    # At call-time, the loop is finished and 'i' is 4 in the enclosing scope
    print([f(2) for f in acts])  # [16, 16, 16, 16, 16]
    
    # 2. the fix: early binding
    def make_actions():
        # i=i binds the current value of 'i' to a local parameter immediately
        return [lambda x, i=i: i ** x for i in range(5)]
    
    acts = make_actions()
    print([f(2) for f in acts])  # [0, 1, 4, 9, 16]
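
A closure can also update the state it remembers. This sketch (a hypothetical counter factory) combines nonlocal with a closure and peeks at the cell where CPython keeps the captured variable.

```python
def make_counter():
    count = 0                        # lives in the enclosing scope
    def step():
        nonlocal count               # rebind the enclosing name, not a new local
        count += 1
        return count
    return step

tick = make_counter()
tock = make_counter()                # independent closure with its own count

print(tick(), tick(), tock())        # 1 2 1
print(tick.__closure__[0].cell_contents)  # 2: the remembered value lives in a cell
```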

12.5. Generators

  • A generator is a specialized iterator—a one-way stream that produces items one at a time on demand through the iteration protocol instead of returning a complete sequence at once.

    • A generator function is a generator factory with a def statement and yield to produce an object featuring state suspension, retaining its local scope and code position between yields.

      def count_factory(n):
          for i in range(n):
              yield i               # suspends execution, retains local scope/position
      
      def delegated_factory(n):
          yield from range(n)       # shorthand for "for i in range(n): yield i"
      
      for val in count_factory(5):
          print(val)                # 0, 1, 2, 3, 4
    • A generator expression is a memory-space optimization shorthand with () to produce items on-demand, running slower than list comprehensions due to iteration overhead but essential for large datasets.

      gen_exp = (i for i in range(5))  # memory-efficient
      
      print(next(gen_exp))             # 0            / yields on-demand
      print(list(gen_exp))             # [1, 2, 3, 4] / exhausts the remaining values
      
      next(gen_exp)                    # raises StopIteration
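
Because generators produce items on demand, they can describe a series that never ends. A sketch of a lazy pipeline, where naturals is a hypothetical helper and itertools.islice takes only what is needed:

```python
from itertools import islice

def naturals():
    n = 0
    while True:                      # never returns: only safe because it is lazy
        yield n
        n += 1

evens = (n for n in naturals() if n % 2 == 0)   # nothing runs yet
squares = (n * n for n in evens)

first_five = list(islice(squares, 5))
print(first_five)                    # [0, 4, 16, 36, 64]
```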

13. Classes

A class is a blueprint defining a namespace of shared attributes to create instance objects with a unique namespace for instance attributes while delegating shared attribute lookups to the class.

class Animal:
    """blueprint for creating animal instances with shared and unique traits."""

    kingdom    = "Animalia"       # public
    _territory = "Earth"          # protected (convention)
    __ancient  = "Paleolith"      # private (mangled)

    def __init__(self, species, color, voice):
        self.voice     = voice    # public
        self._color    = color    # protected (convention)
        self.__species = species  # private (mangled)
        # self refers to the specific instance object being created

    # plain class methods: stacking @classmethod on @property was
    # deprecated in Python 3.11 and removed in 3.13
    @classmethod
    def territory(cls):
        """getter for protected class territory."""
        return cls._territory

    @classmethod
    def ancient(cls):
        """getter for private class ancient."""
        return cls.__ancient

    @property
    def species(self):
        """getter for private instance species."""
        return self.__species

    @species.setter
    def species(self, value):
        """setter for private instance species."""
        self.__species = value

    @property
    def color(self):
        """getter for protected instance color."""
        return self._color

    @color.setter
    def color(self, value):
        self._color = value

    def wow(self):
        """prints the animal voice."""
        print(f"{self.voice}!")

    @classmethod
    def change_kingdom(cls, new_name):
        """modifies the shared class namespace."""
        cls.kingdom = new_name
        # cls refers to the class object itself, not a specific instance

    @staticmethod
    def is_living():
        """utility bound to the class namespace."""
        return True

dog = Animal("Canine", "Brown", "Woof")
cat = Animal("Feline", "Orange", "Meow")

dog.wow()                  # Woof!
print(cat.species)         # Feline

print(Animal.territory())  # Earth
print(Animal.ancient())    # Paleolith
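
The "private (mangled)" label refers to name mangling: inside a class body, a name with two leading underscores is rewritten to include the class name. A sketch with a hypothetical Vault class shows that this avoids accidental clashes but does not enforce privacy.

```python
class Vault:
    """Hypothetical class used to show name mangling."""

    def __init__(self, code):
        self._hint = 'four digits'   # single underscore: convention only
        self.__code = code           # double underscore: stored as _Vault__code

v = Vault(1234)
print(v._hint)                       # four digits
print(v._Vault__code)                # 1234: mangled, not truly private
print(hasattr(v, '__code'))          # False
print(sorted(vars(v)))               # ['_Vault__code', '_hint']
```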

The term attribute is an umbrella for any named member accessed with dot (.) notation of an object (states, properties, or methods), which are categorized by scopes (class and instance), visibility (public, protected, and private), and storage mechanisms (__dict__ and __slots__).

  • __dict__ is an object-level dictionary that stores an object’s attributes for attribute lookup and extension.

    class Animal:
        def __init__(self, species):
            self.species = species
    
    dog = Animal("Canine")
    print(dog.__dict__)                 # {'species': 'Canine'}
    
    dog.color = "Brown"                 # add a data attribute
    print(dog.__dict__)                 # {'species': 'Canine', 'color': 'Brown'}
    
    dog.wow = lambda: print("Woof!")    # add a function attribute (not bound: no self)
    print(dog.__dict__)                 # {..., 'wow': <function <lambda> at ...>}
    
    dog.wow()                           # Woof!
    
    cat = Animal("Feline")
    print(cat.__dict__)                 # {'species': 'Feline'}
    
    Animal.kingdom = "Animalia"         # add a class data attribute
    print(Animal.__dict__)              # {..., 'kingdom': 'Animalia'}
    
    print(cat.__dict__)                 # {'species': 'Feline'}
    
    print(dog.kingdom)                  # Animalia
    print(cat.kingdom)                  # Animalia
  • __slots__ is a class-level sequence that declares a fixed set of permitted attributes for memory optimization by bypassing the per-instance __dict__.

    class SlottedAnimal:
        __slots__ = ('species', 'color')
    
        def __init__(self, species):
            self.species = species
    
    bird = SlottedAnimal("Avian")
    
    bird.color = "Blue"
    print(f"{bird.species}, {bird.color}") # Avian, Blue
    
    # print(bird.__dict__) -> 'SlottedAnimal' object has no attribute '__dict__'
    # bird.age = 5         -> 'SlottedAnimal' object has no attribute 'age' and no __dict__ for setting new attributes
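
The two storage mechanisms can be compared side by side. In this sketch, Plain and Slotted are hypothetical classes that differ only in whether __slots__ is declared.

```python
class Plain:
    def __init__(self):
        self.species = 'Avian'

class Slotted:
    __slots__ = ('species',)

    def __init__(self):
        self.species = 'Avian'

p, s = Plain(), Slotted()
p.age = 5                            # fine: stored in p.__dict__

try:
    s.age = 5                        # not declared in __slots__
except AttributeError:
    print('Slotted rejects undeclared attributes')

print(hasattr(p, '__dict__'), hasattr(s, '__dict__'))   # True False
```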

__new__, __init__, and __del__ are specialized lifecycle methods to instantiate the object (memory), initialize its attributes (state), and clean up resources (destruction).

class Robot:
    def __new__(cls, *args, **kwargs):
        print("__new__")
        return super().__new__(cls)

    def __init__(self, name):
        self.name = name
        print("__init__")

    def __del__(self):
        print("__del__")

bot = Robot("WALL·E")  # triggers __new__ then __init__
del bot                 # drops the last reference; CPython then runs __del__

13.1. Methods

  • An instance method is a function that implicitly receives (binds) the instance as its first argument (self) to reference and manipulate instance attributes.

    class Robot:
        def __init__(self, name):
            self.name = name
    
        def rename(self, new_name):
            self.name = new_name
    
    bot = Robot('WALL·E')
    bot.rename('WALL·EVE')
  • A class method is a function decorated by @classmethod that implicitly receives (binds) the class as its first argument (cls) to reference and manage class attributes.

    class Robot:
        name = 'WALL·E'
    
        @classmethod
        def rename(cls, new_name):
            cls.name = new_name
    
    Robot.rename('WALL·EVE')
  • A static method is a function decorated by @staticmethod that receives no implicit argument and serves as a namespace-bound utility.

    class Robot:
        @staticmethod
        def is_sustainable(plant_count):
            """Check if life is sustainable based on current plant discovery."""
            return plant_count > 0
    
    if Robot.is_sustainable(1):
        print("Return to Earth!")
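
The three kinds of methods can live in one class. The sketch below reuses the Robot theme; from_serial and is_valid_name are hypothetical names added only to show what each kind receives as its first argument.

```python
class Robot:
    count = 0                        # class attribute shared by all instances

    def __init__(self, name):
        self.name = name
        Robot.count += 1

    @classmethod
    def from_serial(cls, serial):    # hypothetical alternative constructor
        return cls(f"unit-{serial}") # cls is Robot (or a subclass)

    @staticmethod
    def is_valid_name(name):         # no self or cls: a plain helper
        return bool(name.strip())

    def greet(self):                 # instance method: self is bound automatically
        return f"I am {self.name}"

bot = Robot.from_serial(7)
print(bot.greet())                   # I am unit-7
print(Robot.greet(bot))              # I am unit-7: the explicit form of bot.greet()
print(Robot.is_valid_name("  "))     # False
print(Robot.count)                   # 1
```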

13.2. Inheritances

  • A subclass is a child class that extends or overrides the functionality from one or more base classes.

    class Robot:
        def __init__(self, name):
            self.name = name
    
        def move(self):
            print(f"{self.name} moves on treads.")
    
    class WallE(Robot):
        def __init__(self, name):
            self.name = name
    
        def work(self):
            print(f"{self.name} compacting trash.")
    • A Mixin is a small, specialized class used in multiple inheritance to plug in a specific feature (like flight or tread) without changing the core identity of the target class (like a Robot).

      class Robot:
          def __init__(self, name):
              self.name = name
      
      class FlightMixin:
          def move(self):
              print(f"{self.name} is flying through the air!")
      
      class TreadMixin:
          def move(self):
              print(f"{self.name} is rolling on treads.")
      
      class Eve(FlightMixin, Robot):
          """Identity: Robot | Feature: Flight"""
          def scan(self):
              # Uses the identity name to perform a unique action
              print(f"{self.name} is scanning for plant life...")
      
      class WallE(TreadMixin, Robot):
          """Identity: Robot | Feature: Treads"""
          def work(self):
              # Uses the identity name to perform a unique action
              print(f"{self.name} is compacting trash cubes.")
      
      issubclass(Eve, Robot)         # True
      issubclass(Eve, FlightMixin)   # True
      issubclass(WallE, Robot)       # True
      issubclass(WallE, TreadMixin)  # True
  • In Python, inheritance is an attribute lookup process that uses C3 Linearization to flatten class hierarchies into a single, predictable search path called the MRO (Method Resolution Order).

    • Class.mro() is a method that returns a list of classes representing the search path derived from C3 Linearization for attribute lookup.

      class Base: ...
      
      class Mixin(Base): ...
      
      class Child(Mixin, Base): ...
      
      print(Child.mro())              # [<class 'Child'>, <class 'Mixin'>, <class 'Base'>, <class 'object'>]
    • super() is a proxy object that delegates method calls to the next class in the MRO without specifying the class name explicitly.

      class Robot:
          def __init__(self, name):
              self.name = name
      
      class WallE(Robot):
          def __init__(self, name):
              super().__init__(name)      # FLEXIBLE: finds Robot automatically
      
      class Eve(Robot):
          def __init__(self, name):
              Robot.__init__(self, name)  # RIGID: must name class and pass 'self' manually
  • In Python, duck typing is a loose implementation of polymorphism that prioritizes an object’s behavior (methods and attributes) over its inheritance or class identity.

    # If it walks like a duck and quacks like a duck, it’s a duck.
    #     —— A Wise Person
    class Duck:
    
        def wow(self):
            return 'quack!'
    
    class Cat:
    
        def wow(self):
            return 'meow!'
    
    def speak(entity):
        print(entity.wow())
    
    speak(Duck())  # quack!
    speak(Cat())   # meow!
  • ABC (Abstract Base Class) is an explicit contract for interfaces with runtime type checking (i.e., isinstance()), while Protocol is an implicit shape using structural subtyping for static type checking (e.g., Pyright), aligning with Python’s duck typing philosophy.

    from abc import ABC, abstractmethod
    from typing import Protocol
    
    class RobotABC(ABC):
        @abstractmethod
        def move(self): ...
    
    class WallE(RobotABC):                  # explicitly inherits
        def move(self):
            print("Solar rolling...")
    
    class Flyer(Protocol):
        def move(self) -> None: ...
    
    class Eve:                              # implicitly matches
        def move(self):
            print("Thruster flight...")
    
    def activate(unit: Flyer):
        unit.move()
    
    activate(WallE())                       # works (has .move)
    activate(Eve())                         # works (has .move)
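
The nominal and structural approaches can also be contrasted at runtime. This sketch assumes typing.runtime_checkable, which lets isinstance() test a Protocol by method presence only; the class names are illustrative.

```python
from abc import ABC, abstractmethod
from typing import Protocol, runtime_checkable

class Mover(ABC):
    @abstractmethod
    def move(self): ...

@runtime_checkable
class SupportsMove(Protocol):
    def move(self) -> None: ...

class Drone:                         # no inheritance at all
    def move(self):
        return "flying"

try:
    Mover()                          # abstract methods block instantiation
except TypeError:
    print("Mover cannot be instantiated")

print(isinstance(Drone(), Mover))          # False: nominal, needs inheritance
print(isinstance(Drone(), SupportsMove))   # True: structural, needs only .move
```

A runtime-checkable Protocol verifies only that the method exists, not its signature, so static checkers remain the stricter tool.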

13.3. Operator overloading

  • A dataclass is a specialized class decorated by @dataclass (similar to Lombok in Java), designed primarily to store data while automatically generating boilerplate methods like __init__, __repr__, and __eq__.

    from dataclasses import dataclass
    
    @dataclass
    class Point:
        x: float
        y: float
    
    p1 = Point(1.0, 2.0)
    p2 = Point(1.0, 2.0)
    
    print(p1)        # Point(x=1.0, y=2.0)
    print(p1 == p2)  # True
  • Operator overloading lets classes intercept normal Python operations.

  • Classes can overload all Python expression operators.

  • Classes can also overload built-in operations such as printing, function calls, attribute access, etc.

  • Overloading makes class instances act more like built-in types.

  • Overloading is implemented by providing specially named methods in a class.

13.3.1. Indexing and slicing: __getitem__ and __setitem__

  • When an instance X appears in an indexing expression like X[i], Python calls the __getitem__ method inherited by the instance, passing X as the first argument and the index in brackets as the second.

    class Indexer:
        def __getitem__(self, index):
            return index ** 2
    
    X = Indexer()
    X[2]  # X[i] calls X.__getitem__(i)
    # 4
    for i in range(5):
        print(X[i], end=' ')  # Runs __getitem__(X, i) each time
    # 0 1 4 9 16
  • In addition to indexing, __getitem__ is also called for slice expressions—using upper and lower bounds and a stride bundled up into a slice object.

    class Indexer:
        data = [5, 6, 7, 8, 9]
    
        def __getitem__(self, index: int | slice) -> int | list[int]:  # Called for index or slice
            print('getitem:', index)
            return self.data[index]  # Perform index or slice
    X = Indexer()
    X[0]
    # getitem: 0
    # 5
    X[-1]
    # getitem: -1
    # 9
    X[2:4]
    # getitem: slice(2, 4, None)
    # [7, 8]
    X[1:]
    # getitem: slice(1, None, None)
    # [6, 7, 8, 9]
    X[:-1]
    # getitem: slice(None, -1, None)
    # [5, 6, 7, 8]
    X[::2]
    # getitem: slice(None, None, 2)
    # [5, 7, 9]
  • __getitem__ may also be called automatically as an iteration fallback (every iteration context tries the __iter__ method first), for example by for loops, the in membership test, list comprehensions, the map built-in, list and tuple assignments, and type constructors.

    class StepperIndex:
        def __init__(self, data):
            self.data = data
    
        def __getitem__(self, i):
            return self.data[i]
    X = StepperIndex('Spam')
    
    X[1]  # Indexing calls __getitem__
    # 'p'
    
    for item in X:  # for loops call __getitem__
        print(item, end=' ')  # for indexes items 0..N
    # S p a m
    'p' in X  # All call __getitem__ too
    # True
    [c for c in X]  # List comprehension
    # ['S', 'p', 'a', 'm']
    list(map(str.upper, X))  # map calls
    # ['S', 'P', 'A', 'M']
    (a, b, c, d) = X  # Sequence assignments
    a, c, d
    # ('S', 'a', 'm')
    list(X), tuple(X), ''.join(X)  # And so on...
    # (['S', 'p', 'a', 'm'], ('S', 'p', 'a', 'm'), 'Spam')
  • The __setitem__ index assignment method similarly intercepts both index and slice assignments.

    class IndexSetter:
        def __init__(self, data):
            self.data = data
    
        def __setitem__(self, index, value):  # Intercept index or slice assignment
            self.data[index] = value  # Assign index or slice
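A short usage sketch for the class above, assuming the wrapped data is a plain list (the sample values are made up for illustration):

```python
class IndexSetter:
    def __init__(self, data):
        self.data = data

    def __setitem__(self, index, value):  # Intercept index or slice assignment
        self.data[index] = value


X = IndexSetter([5, 6, 7, 8, 9])
X[0] = 50               # Index assignment: index is the int 0
X[1:3] = [60, 70, 75]   # Slice assignment: index is slice(1, 3, None)
print(X.data)
# [50, 60, 70, 75, 8, 9]
```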
  • The __index__ method returns an integer value for an instance when needed and is used by built-ins that convert to digit strings.

    class C:
        def __index__(self):
            return 255
    X = C()
    hex(X)  # '0xff'
    bin(X)  # '0b11111111'
    oct(X)  # '0o377'
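Beyond digit-string conversions, the same method is consulted wherever Python needs an exact integer, such as indexing and slicing. A minimal sketch, with a hypothetical class Two:

```python
import operator


class Two:  # hypothetical class, for illustration only
    def __index__(self):
        return 2


i = Two()
letters = ['a', 'b', 'c', 'd']

picked = letters[i]          # Used as an index
tail = letters[i:]           # Used as a slice bound
as_int = operator.index(i)   # Explicit conversion through __index__
print(picked, tail, as_int)
# c ['c', 'd'] 2
```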

13.3.2. Iterable objects: __iter__ and __next__

  • Technically, iteration contexts work by passing an iterable object to the iter built-in function to invoke an __iter__ method, which is expected to return an iterator object.

  • If it’s provided, Python then repeatedly calls the iterator object’s __next__ method to produce items until a StopIteration exception is raised.

  • A next built-in function is also available as a convenience for manual iterations: next(I) is the same as I.__next__().

  • In all iteration contexts, Python tries to use __iter__ first, which returns an object that supports the iteration protocol with a __next__ method: if no __iter__ is found by inheritance search, Python falls back on the __getitem__ indexing method, which is called repeatedly, with successively higher indexes, until an IndexError exception is raised.

    class Squares:
        def __init__(self, start, stop):  # Save state when created
            self.value = start - 1
            self.stop = stop
    
        def __iter__(self):  # Get iterator object on iter
            return self  # One-shot iteration, single traversal only
    
        def __next__(self):  # Return a square on each iteration
            if self.value == self.stop:  # Also called by next built-in
                raise StopIteration
            self.value += 1
            return self.value ** 2
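Because __iter__ returns self, an instance of this class can be traversed only once. The following sketch repeats the class so it runs on its own and shows the second pass coming back empty:

```python
class Squares:
    def __init__(self, start, stop):
        self.value = start - 1
        self.stop = stop

    def __iter__(self):
        return self  # The object is its own iterator

    def __next__(self):
        if self.value == self.stop:
            raise StopIteration
        self.value += 1
        return self.value ** 2


X = Squares(1, 5)
first_pass = list(X)    # Consumes the iterator
second_pass = list(X)   # Already exhausted: no state is reset
print(first_pass, second_pass)
# [1, 4, 9, 16, 25] []

manual = Squares(1, 2)
I = iter(manual)        # Same object as manual
print(next(I), next(I))
# 1 4
```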
  • If used, the yield statement can create the __next__ method automatically.

    class Squares:                          # __iter__ + yield generator
        def __init__(self, start, stop):    # __next__ is automatic/implied
            self.start = start
            self.stop = stop
    
        def __iter__(self):
            for value in range(self.start, self.stop + 1):
                yield value ** 2
  • To achieve the multiple-iterator effect on one object, __iter__ simply needs to define a new stateful object for the iterator, instead of returning self for each iterator request.

    class SkipObject:
        def __init__(self, wrapped):  # Save item to be used
            self.wrapped = wrapped
    
        def __iter__(self):
            return SkipIterator(self.wrapped)  # New iterator each time
    
    class SkipIterator:
        def __init__(self, wrapped):
            self.wrapped = wrapped  # Iterator state information
            self.offset = 0
    
        def __next__(self):
            if self.offset >= len(self.wrapped):  # Terminate iterations
                raise StopIteration
            else:
                item = self.wrapped[self.offset]  # else return and skip
                self.offset += 2
                return item
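A usage sketch for the two classes above, repeated here so the block is self-contained. Each call to iter() yields a fresh SkipIterator, so nested loops over the same object do not interfere with each other:

```python
class SkipObject:
    def __init__(self, wrapped):
        self.wrapped = wrapped

    def __iter__(self):
        return SkipIterator(self.wrapped)  # New iterator each time


class SkipIterator:
    def __init__(self, wrapped):
        self.wrapped = wrapped
        self.offset = 0

    def __next__(self):
        if self.offset >= len(self.wrapped):
            raise StopIteration
        item = self.wrapped[self.offset]
        self.offset += 2
        return item


skipper = SkipObject('abcdef')
I = iter(skipper)
manual = [next(I), next(I), next(I)]
print(manual)
# ['a', 'c', 'e']

# Two independent iterators are active at once in the nested loops
pairs = [x + y for x in skipper for y in skipper]
print(pairs)
# ['aa', 'ac', 'ae', 'ca', 'cc', 'ce', 'ea', 'ec', 'ee']
```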

13.3.3. Membership: __contains__, __iter__, and __getitem__

  • In the iterations domain, classes can implement the in membership operator as an iteration, using either the __iter__ or __getitem__ methods.

  • To support more specific membership, though, classes may code a __contains__ method—when present, this method is preferred over __iter__, which is preferred over __getitem__.

  • The __contains__ method should define membership as applying to keys for a mapping (and can use quick lookups), and as a search for sequences.

    class Iters:
        def __init__(self, value):
            self.data = value
    
        def __getitem__(self, i):  # Fallback for iteration
            print('get[%s]:' % i, end='')  # Also for index, slice
            return self.data[i]
    
        def __iter__(self):  # Preferred for iteration
            print('iter=> next:', end='')  # Allows multiple active iterators
            for x in self.data:  # no __next__ to alias to next
                yield x
                print('next:', end='')
    
        def __contains__(self, x):  # Preferred for 'in'
            print('contains: ', end='')
            return x in self.data
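To see which method each operation picks, here is a variant of the class that records method names in a list instead of printing. The calls attribute is a hypothetical addition for this sketch:

```python
class Iters:
    def __init__(self, value):
        self.data = value
        self.calls = []  # Hypothetical log of invoked methods

    def __getitem__(self, i):
        self.calls.append('getitem')
        return self.data[i]

    def __iter__(self):
        self.calls.append('iter')
        yield from self.data

    def __contains__(self, x):
        self.calls.append('contains')
        return x in self.data


X = Iters([1, 2, 3])
found = 3 in X     # Membership prefers __contains__
total = sum(X)     # Iteration prefers __iter__
second = X[1]      # Indexing always uses __getitem__
print(found, total, second, X.calls)
# True 6 2 ['contains', 'iter', 'getitem']
```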

13.3.4. Attribute access: __getattr__ and __setattr__

  • The __getattr__ method intercepts attribute references.

    • It’s called with the attribute name as a string whenever trying to qualify an instance with an undefined (nonexistent) attribute name.

    • It is not called if Python can find the attribute using its inheritance tree search procedure.

      class Empty:
          def __getattr__(self, attrname):  # On self.undefined
              if attrname == 'age':  # age becomes a dynamically computed attribute
                  return 40
              else:
                  raise AttributeError(attrname)  # raises the builtin AttributeError exception
      X = Empty()
      X.age  # 40
      getattr(X, 'age')  # 40
      
      X.name  # AttributeError: name
      getattr(X, 'name', 'Jon')  # 'Jon'
      
      hasattr(X, 'name')  # False
      
      setattr(X, 'name', 'Jon X')
      X.name  # 'Jon X'
  • In the same department, the __setattr__ intercepts all attribute assignments.

    • If the method is defined or inherited, self.attr = value becomes self.__setattr__('attr', value).

    • Assigning to any self attributes within __setattr__ calls __setattr__ again, potentially causing an infinite recursion loop.

    • Avoid loops by coding instance attribute assignments as assignments to attribute dictionary keys: self.__dict__['name'] = x, not self.name = x.

      class Accesscontrol:
          def __setattr__(self, attr, value):
              if attr == 'age':
                  self.__dict__[attr] = value + 10  # Not self.name=val or setattr
                  # It’s also possible to avoid recursive loops in a class that uses __setattr__ by routing
                  # any attribute assignments to a higher superclass with a call, instead of assigning keys
                  # in __dict__:
                  #    self.__dict__[attr] = value + 10 # OK: doesn't loop
                  #    object.__setattr__(self, attr, value + 10) # OK: doesn't loop (new-style only)
              else:
                  raise AttributeError(attr + ' not allowed')
      X = Accesscontrol()
      X.age = 40
      X.age  # 50
      X.name = 'Bob'  # AttributeError: name not allowed
  • A third attribute management method, __delattr__, is passed the attribute name string and invoked on all attribute deletions (i.e., del object.attr).

    • Like __setattr__, it must avoid recursive loops by routing attribute deletions through __dict__ or a superclass.
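A minimal sketch of __delattr__, using a hypothetical Guarded class that protects one attribute and routes every other deletion to the object superclass:

```python
class Guarded:  # hypothetical class, for illustration only
    def __init__(self):
        self.name = 'Bob'
        self.age = 40

    def __delattr__(self, attr):  # Runs on every del self.attr
        if attr == 'name':
            raise AttributeError(attr + ' cannot be deleted')
        object.__delattr__(self, attr)  # Superclass call: does not loop


X = Guarded()
del X.age
has_age = hasattr(X, 'age')

try:
    del X.name
except AttributeError as exc:
    message = str(exc)

print(has_age, message, X.name)
# False name cannot be deleted Bob
```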

  • The built-in getattr function is used to fetch an attribute from an object by name string—getattr(X,N) is like X.N, except that N is an expression that evaluates to a string at runtime, not a variable.

    class Wrapper:  # A wrapper (sometimes called a proxy) class
        def __init__(self, object):
            self.wrapped = object  # Save object
    
        def __getattr__(self, attrname):
            print('Trace: ' + attrname)  # Trace fetch
            return getattr(self.wrapped, attrname)  # Delegate fetch
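A usage sketch for the wrapper above. The constructor parameter is renamed to obj here to avoid shadowing the built-in object; the wrapped values are arbitrary:

```python
class Wrapper:
    def __init__(self, obj):
        self.wrapped = obj  # Stored in the instance, so found normally

    def __getattr__(self, attrname):  # Runs only for undefined names
        print('Trace: ' + attrname)
        return getattr(self.wrapped, attrname)


x = Wrapper([1, 2, 3])
x.append(4)  # Prints "Trace: append", then delegates to list.append
print(x.wrapped)  # No trace: wrapped is a real instance attribute
# [1, 2, 3, 4]

keys = list(Wrapper({'a': 1, 'b': 2}).keys())  # Prints "Trace: keys"
print(keys)
# ['a', 'b']
```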

13.3.5. String representation: __repr__ and __str__

If defined, __repr__ (or its close relative, __str__) is called automatically when class instances are printed or converted to strings.

  • __str__ is tried first for the print operation and the str built-in function (the internal equivalent of which print runs). It generally should return a user-friendly display.

  • __repr__ is used in all other contexts: for interactive echoes, the repr function, and nested appearances, as well as by print and str if no __str__ is present. It should generally return an as-code string that could be used to re-create the object, or a detailed display for developers.

    class adder:
        def __init__(self, value=0):
            self.data = value  # Initialize data
    
        def __add__(self, other):
            self.data += other  # Add other in place (bad form?)
    x = adder()  # Default displays
    print(x)
    # <__main__.adder object at 0x7fd1fd745a50>
    x
    # <__main__.adder object at 0x7fd1fd745a50>
    class addrepr(adder):  # Inherit __init__, __add__
        def __repr__(self):  # Add string representation
            return 'addrepr(%s)' % self.data  # Convert to as-code string
    x = addrepr(2)
    x  # Runs __repr__
    # addrepr(2)
    print(x)  # Runs __repr__
    # addrepr(2)
    str(x), repr(x)  # Runs __repr__ for both
    # ('addrepr(2)', 'addrepr(2)')
    class addstr(adder):
        def __str__(self):  # __str__ but no __repr__
            return '[Value: %s]' % self.data  # Convert to nice string
    x = addstr(3)
    x  # Default __repr__
    # <demo.addstr object at 0x7fd1fd63d2d0>
    print(x)  # Runs __str__
    # [Value: 3]
    str(x), repr(x)
    # ('[Value: 3]', '<demo.addstr object at 0x7fd1fd63d2d0>')
    class addboth(adder):
        def __str__(self):
            return '[Value: %s]' % self.data  # User-friendly string
    
        def __repr__(self):
            return 'addboth(%s)' % self.data  # As-code string
    x = addboth(4)
    x  # Runs __repr__
    # addboth(4)
    print(x)  # Runs __str__
    # [Value: 4]
    str(x), repr(x)
    # ('[Value: 4]', 'addboth(4)')
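The "nested appearances" case is easy to miss: when an object is displayed inside a container, the container uses __repr__ for its items even under print or str. A small sketch with a standalone version of addboth:

```python
class addboth:
    def __init__(self, value=0):
        self.data = value

    def __str__(self):
        return '[Value: %s]' % self.data  # User-friendly string

    def __repr__(self):
        return 'addboth(%s)' % self.data  # As-code string


x = addboth(4)
top_level = str(x)               # __str__ at the top level
nested = str([x, addboth(5)])    # The list displays its items with __repr__
formatted = f'{x} {x!r}'         # {x} uses __str__, {x!r} forces __repr__
print(top_level, nested, formatted)
# [Value: 4] [addboth(4), addboth(5)] [Value: 4] addboth(4)
```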

13.3.6. Right-side and in-place uses: __radd__ and __iadd__

  • Every binary operator has a left, right, and in-place variant overloading methods (e.g., __add__, __radd__, and __iadd__).

  • For example, __add__ alone handles only expressions with the instance on the left side of the + operator; it does not support instance objects on the right side.

    class Number:
        def __init__(self, value=0):
            self.data = value
    
        def __add__(self, other):
            return self.data+other
    x = Number(5)
    x + 2
    # 7
    2 + x
    # TypeError: unsupported operand type(s) for +: 'int' and 'Number'
  • To implement more general expressions, and hence support commutative-style operators, code the __radd__ method as well.

    class Number:
        def __init__(self, value=0):
            self.data = value
    
        def __add__(self, other):
            return self.data+other
    
        def __radd__(self, other):
            return self.data+other
    
        # Reusing __add__ in __radd__
        # def __radd__(self, other):
        #     return self.__add__(other)  # Call __add__ explicitly
        #     return self + other  # Swap order and re-add
        # __radd__ = __add__  # Alias: cut out the middleman
    x = Number(5)
    x + 2
    # 7
    2 + x
    # 7
  • To also implement += in-place augmented addition, code either an __iadd__ or an __add__. The latter is used if the former is absent, but may not be able to optimize in-place cases.

    class Number:
        def __init__(self, value=0):
            self.data = value
    
        def __add__(self, other):
            return self.data+other
    
        __radd__ = __add__
    
        def __iadd__(self, other):  # __iadd__ explicit: x += y
            self.data += other  # Usually returns self
            return self
    x = Number(5)
    x += 1
    x += 1
    x.data
    # 7

13.3.7. Call expressions: __call__

  • Python runs a __call__ method for function call expressions applied to the instances, passing along whatever positional or keyword arguments were sent.

    class Callee:
        def __call__(self, *pargs, **kargs):  # Intercept instance calls
            print('Called:', pargs, kargs)  # Accept arbitrary arguments
    C = Callee()
    C(1, 2, 3)  # C is a callable object
    # Called: (1, 2, 3) {}
    C(1, 2, 3, x=4, y=5)
    # Called: (1, 2, 3) {'x': 4, 'y': 5}
    class C:
        def __call__(self, a, b, c=5, d=6): ...  # Normals and defaults
    
    class C:
        def __call__(self, *pargs, **kargs): ...  # Collect arbitrary arguments
    
    class C:
        def __call__(self, *pargs, d=6, **kargs): ...  # 3.X keyword-only argument
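One common use of __call__ is an object that remembers state between calls and can be passed anywhere a function is expected. A minimal sketch with a hypothetical Prod class:

```python
class Prod:  # hypothetical class, for illustration only
    def __init__(self, value):
        self.value = value  # State retained by the instance

    def __call__(self, other):
        return self.value * other


double = Prod(2)
result = double(3)                        # Runs Prod.__call__(double, 3)
mapped = list(map(Prod(3), [1, 2, 3]))    # Instances work as callbacks
print(result, mapped, callable(double))
# 6 [3, 6, 9] True
```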

13.3.8. Boolean tests: __bool__ and __len__

  • In Boolean contexts, Python first tries __bool__ to obtain a direct Boolean value; if that method is missing, Python tries __len__ to infer a truth value from the object’s length.

    class Truth:
        def __bool__(self): return True
    X = Truth()
    if X: print('yes!')
    # yes!
    class Truth:
        def __bool__(self): return False
    X = Truth()
    bool(X)
    # False
    class Truth:
        def __len__(self): return 0
    X = Truth()
    if not X: print('no!')
    # no!
  • If both methods are present Python prefers __bool__ over __len__, because it is more specific:

    class Truth:
        def __bool__(self): return True # 3.X tries __bool__ first
        def __len__(self): return 0 # 2.X tries __len__ first
    X = Truth()
    if X: print('yes!')
    # yes!
  • If neither truth method is defined, the object is vacuously considered true (though any potential implications for more metaphysically inclined readers are strictly coincidental):

    class Truth:
        pass
    X = Truth()
    bool(X)
    # True

13.3.9. with/as Context Managers: __enter__ and __exit__

with expression [as variable], [expression [as variable]]:
    with-block

The with statement can be used with any object implementing __enter__() (resource acquisition/setup) and __exit__() (resource release/cleanup) to enable automatic resource management.

  • Files: The with open('filename', 'mode') as file: syntax opens a file, assigns it to a variable (file), and automatically closes the file when the indented block exits, even in case of exceptions.

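  • Database Connections: with sqlite3.connect(':memory:') as con: uses the connection as a transaction scope, committing on success and rolling back if the block raises; it does not close the connection, which still needs an explicit con.close() (or contextlib.closing()).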

  • Locks: In multithreaded environments, with can be used with lock objects to acquire a lock at the beginning of the block and release it at the end, ensuring proper synchronization.

    fi = open('test.txt', 'w', encoding='utf-8')
    try:
        fi.write('hello world')
    finally:
        fi.close()
    with open('test.txt', 'r', encoding='utf-8') as fo:
        txt = fo.read()
        print(txt)
    with open('data', 'r', encoding='utf-8') as fin, open('res', 'w', encoding='utf-8') as fout:  # multiple context managers
        for line in fin:
            if 'some key' in line:
                fout.write(line)
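The database and lock cases can be sketched in the same way. This is an illustrative example using only the standard library; the table, values, and worker function are made up. Note that the sqlite3 connection's own context manager manages the transaction (commit or rollback) and leaves the connection open, so contextlib.closing is used here to close it.

```python
import sqlite3
import threading
from contextlib import closing

with closing(sqlite3.connect(':memory:')) as con:  # closing() calls con.close()
    con.execute('CREATE TABLE pets (name TEXT)')
    with con:  # Transaction scope: committed when the block succeeds
        con.execute("INSERT INTO pets VALUES ('cat')")
    try:
        with con:  # Rolled back because the block raises
            con.execute("INSERT INTO pets VALUES ('dog')")
            raise RuntimeError('abort')
    except RuntimeError:
        pass
    names = [row[0] for row in con.execute('SELECT name FROM pets')]

counter = 0
lock = threading.Lock()


def work():
    global counter
    for _ in range(1000):
        with lock:  # Acquired on entry, released on exit
            counter += 1


threads = [threading.Thread(target=work) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(names, counter)
# ['cat'] 4000
```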
class Cat:
    """A custom context manager class that simulates a cat entering and leaving."""

    def __enter__(self):
        """
        Called when entering the `with` block. Prints a message and returns itself.

        Returns:
            The Cat instance (self) to be used within the `with` block.
        """
        print("I'm coming in!")
        return self  # Return self to provide the managed object to the `with` block

    def __exit__(self, exc_type: type, exc_value: object, traceback: object) -> bool:
        """
        Called when exiting the `with` block, regardless of exceptions.
        Prints a message, optionally handles exceptions, and returns True to suppress them.

        Args:
            exc_type (type): The type of exception raised within the `with` block (if any).
            exc_value (object): The actual exception object raised (if any).
            traceback (object): A traceback object containing information about the call stack
                               (if any exception was raised).

        Returns:
            bool: True to suppress any exceptions raised within the `with` block,
                  False to re-raise them. (Can be modified for specific exception handling)
        """
        print("I'm going out.")
        # Suppress potential exceptions (modify for specific handling)
        return True

    def wow(self) -> None:
        """
        Method to simulate a cat's meow. Prints "meow!".

        Returns:
            None
        """
        print("meow!")


with Cat() as cat:  # type: Cat
    """Enters the context manager and assigns the Cat object to 'cat'."""
    cat.wow()  # Calls the cat's meow method within the context

# I'm coming in!
# meow!
# I'm going out.
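The return value of __exit__ decides whether an exception raised inside the block is swallowed. The sketch below uses a hypothetical Guard class that records what __exit__ received and returns a configurable flag:

```python
class Guard:  # hypothetical class, for illustration only
    def __init__(self, suppress):
        self.suppress = suppress
        self.seen = None

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        self.seen = exc_type    # None when the block exits normally
        return self.suppress    # True swallows the exception


with Guard(suppress=False) as calm:
    pass
normal_exit = calm.seen         # None: no exception occurred

with Guard(suppress=True) as quiet:
    raise ValueError('swallowed')
suppressed = quiet.seen         # ValueError was seen, then suppressed

propagated = False
try:
    with Guard(suppress=False):
        raise KeyError('propagates')
except KeyError:
    propagated = True           # A falsy return value re-raises

print(normal_exit, suppressed, propagated)
# None <class 'ValueError'> True
```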
from contextlib import contextmanager

class Cat:
    """A simple Cat class with a meow method."""

    def wow(self) -> None:
        """
        Method to simulate a cat's meow. Prints "meow!".

        Returns:
            None
        """
        print("meow!")


@contextmanager
def cat_context():
    """
    A generator-based context manager that simulates a cat entering and leaving.

    Yields:
        Cat: The Cat instance to be used within the `with` block.
    """
    print("I'm coming in!")
    cat = Cat()

    try:
        yield cat  # Provide the cat to the `with` block
    except Exception as e:
        # Suppress exceptions (like returning True in __exit__)
        pass  # Swallow the exception
    finally:
        print("I'm going out.")


with cat_context() as cat:
    """Enters the context manager and assigns the Cat object to 'cat'."""
    cat.wow()  # Calls the cat's meow method within the context

# I'm coming in!
# meow!
# I'm going out.

13.4. Enum

Added in version 3.4.

from enum import Enum

class Weekday(Enum):
    MONDAY = 1
    TUESDAY = 2
    WEDNESDAY = 3
    THURSDAY = 4
    FRIDAY = 5
    SATURDAY = 6
    SUNDAY = 7

Weekday(3)            # <Weekday.WEDNESDAY: 3>
Weekday["WEDNESDAY"]  # <Weekday.WEDNESDAY: 3>

print(Weekday.THURSDAY)         # Weekday.THURSDAY
print(Weekday.TUESDAY.name)     # TUESDAY
print(Weekday.WEDNESDAY.value)  # 3

for day in list(Weekday):
    print(day)
# Weekday.MONDAY
# Weekday.TUESDAY
# Weekday.WEDNESDAY
# Weekday.THURSDAY
# Weekday.FRIDAY
# Weekday.SATURDAY
# Weekday.SUNDAY
from enum import Flag

class Weekday(Flag):
    MONDAY = 1
    TUESDAY = 2
    WEDNESDAY = 4
    THURSDAY = 8
    FRIDAY = 16
    SATURDAY = 32
    SUNDAY = 64

weekend = Weekday.SATURDAY | Weekday.SUNDAY
# <Weekday.SATURDAY|SUNDAY: 96>
for day in weekend:
    print(day)
# Weekday.SATURDAY
# Weekday.SUNDAY
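Combined flags also support membership tests and bitwise intersection. A short sketch that repeats the Flag definition so it runs on its own:

```python
from enum import Flag


class Weekday(Flag):
    MONDAY = 1
    TUESDAY = 2
    WEDNESDAY = 4
    THURSDAY = 8
    FRIDAY = 16
    SATURDAY = 32
    SUNDAY = 64


weekend = Weekday.SATURDAY | Weekday.SUNDAY

is_weekend = Weekday.SATURDAY in weekend          # Membership test
is_monday_off = Weekday.MONDAY in weekend
overlap = bool(weekend & Weekday.MONDAY)          # Empty intersection is falsy
print(weekend.value, is_weekend, is_monday_off, overlap)
# 96 True False False
```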

14. Exceptions

  • An exception is an instance of a class that derives from BaseException; user-defined exceptions should normally derive from Exception.

    class OopsException(Exception): pass  # user-defined exception
  • The raise statement raises (triggers) a built-in or user-defined exception.

    raise instance  # raise instance of class
    raise clazz     # make and raise instance of class: makes an instance with no constructor arguments
    raise           # reraise the most recent exception
    try:
        1 / 0
    except Exception as E:
        raise TypeError('Bad') from E  # raise newexception from otherexception
    
    # Traceback (most recent call last):
    # ZeroDivisionError: division by zero
    #
    # The above exception was the direct cause of the following exception:
    #
    # Traceback (most recent call last):
    # TypeError: Bad
  • The assert statement raises an AssertionError exception if a condition is false.

    # assert test, data # the data part is optional
    assert False, 'Nobody expects the Spanish Inquisition!'  # AssertionError: Nobody expects the Spanish Inquisition!
  • The try statement catches and recovers from exceptions with one or more handlers for exceptions that may be raised during the block’s execution.

    # try -> except -> else -> finally
    try:
        raise OopsException('panic')  # raising exceptions
    except OopsException as err:  # 3.X localizes 'as' names to except block
        print(err)  # catch and recover from exceptions
    except (RuntimeError, TypeError, NameError) as err:  # multiple exceptions as a parenthesized tuple
        ...
    except Exception as other:  # except to catch all exceptions
        ...
    except:  # bare except to catch all exceptions
        ...
    else:
        ... # run if no exception was raised during try block
    finally:  # termination actions
        ...
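The order in which the clauses run can be traced with a small function. This is an illustrative sketch; divide and the steps list are hypothetical names:

```python
def divide(a, b):
    steps = []  # Records which clauses ran, in order
    try:
        result = a / b
    except ZeroDivisionError:
        steps.append('except')   # Runs only if the try block raised
        result = None
    else:
        steps.append('else')     # Runs only if the try block did not raise
    finally:
        steps.append('finally')  # Runs in both cases
    return result, steps


ok = divide(6, 3)
failed = divide(1, 0)
print(ok, failed)
# (2.0, ['else', 'finally']) (None, ['except', 'finally'])
```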
  • The with/as statement is designed to automate startup and termination activities that must occur around a block of code.

    # try:
    #     file = open('lumberjack.txt', 'w', encoding='utf-8')
    #     file.write('The larch!\n')
    # finally:
    #     if file: file.close()
    with open('lumberjack.txt', 'w', encoding='utf-8') as file:  # always close file on exit
        file.write('The larch!\n')

15. Decorators

A decorator is a callable that returns a callable to specify management or augmentation code for functions and classes.

  • Function decorators rebind names at function definition time: they install wrapper objects that intercept later function calls and process them as needed, usually passing the call on to the original function to run the managed action.

    def decorator(F):
        # Process function F
        return F
    
    @decorator       # Decorate function
    def func(): ...  # func = decorator(func)
    def decorator(F):
        # Save or use function F
        # Return a different callable, a proxy: nested def, class with __call__, etc.
        ...
    
    @decorator
    def func(): ...  # func = decorator(func)
    def decorator(F):                 # On @ decoration
        def wrapper(*args, **kargs):  # On wrapped function call that retains the original function in an enclosing scope
            # Use F, args, and  kargs
            # F(*args, **kargs) calls original function
            ...
        return wrapper
    
    @decorator                         # func = decorator(func)
    def func(x, y, z=122):             # func is passed to decorator's F
        ...
    
    func(6, 7, 8)                      # 6, 7, 8 are passed to wrapper's *args, **kargs
    class decorator:
        def __init__(self, func):  # On @ decoration
            self.func = func
    
        def __call__(self, *args):  # On wrapped function call by overloading the call operation
            # Use self.func and args
            return self.func(*args)  # self.func(*args) calls original function
    
    @decorator
    def func(x, y):                 # func = decorator(func)
        ...                         # func is passed to __init__
    
    func(6, 7)                      # 6, 7 are passed to __call__'s *args
    def decorator(A, B):
        # Save or use A, B
        def actualDecorator(F):
            # Save or use function F
            # Return a callable: nested def, class with __call__, etc.
            return callable
        return actualDecorator
    
    @decorator(A, B)
    def F(arg):  # F = decorator(A, B)(F) # Rebind F to result of decorator's return value
        ...
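The skeletons above can be filled in with two small, concrete decorators. This is a sketch under stated assumptions: count_calls, repeat, add, and greet are hypothetical names, and functools.wraps is used so the wrapper keeps the original function's name and docstring.

```python
import functools


def count_calls(func):  # Plain decorator: func = count_calls(func)
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        wrapper.calls += 1             # State kept on the wrapper itself
        return func(*args, **kwargs)   # Pass the call on to the original
    wrapper.calls = 0
    return wrapper


def repeat(times):  # Decorator with arguments: func = repeat(3)(func)
    def actual_decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            return [func(*args, **kwargs) for _ in range(times)]
        return wrapper
    return actual_decorator


@count_calls
def add(x, y, z=0):
    return x + y + z


@repeat(3)
def greet(name):
    return 'hi ' + name


total = add(1, 2) + add(1, 2, z=3)
greetings = greet('Bob')
print(total, add.calls, add.__name__, greetings)
# 9 2 add ['hi Bob', 'hi Bob', 'hi Bob']
```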
  • Class decorators rebind names at class definition time: they install wrapper objects that intercept later instance creation calls and process them as needed, usually passing the call on to the original class to create a managed instance.

    def decorator(C):
        # Process class C
        return C
    
    @decorator  # Decorate class
    class C:
        ...     # C = decorator(C)
    def decorator(C):
        # Save or use class C
        # Return a different callable, a proxy: nested def, class with __call__, etc.
        ...
    
    @decorator
    class C:
        ...  # C = decorator(C)
    def decorator(cls):                             # On @ decoration
        class Wrapper:
            def __init__(self, *args):              # On instance creation
                self.wrapped = cls(*args)
    
            def __getattr__(self, name):            # On attribute fetch
                return getattr(self.wrapped, name)
        return Wrapper
    
    @decorator
    class C:                        # C = decorator(C)
        def __init__(self, x, y):   # Run by Wrapper.__init__
            self.attr = 'spam'
    
    x = C(6, 7)                     # Really calls Wrapper(6, 7)
    print(x.attr)                   # Runs Wrapper.__getattr__, prints "spam"

16. Ellipsis (…​)

Ellipsis (…​) is Python’s built-in Ellipsis object, a singleton constant (like None, True, False).

>>> ...
Ellipsis
>>> type(...)
<class 'ellipsis'>
>>> Ellipsis is ...
True
>>>
  • Use …​ as a placeholder in function/class bodies (similar to pass):

    def function_to_implement_later():
        ...  # Placeholder - does nothing
    
    class IncompleteClass:
        ...  # Placeholder
    
    # Equivalent to:
    def function_to_implement_later():
        pass
  • In type annotations, …​ represents any number of elements:

    # Tuple with any number of ints
    def process(values: tuple[int, ...]) -> None:
        pass
    
    # Callable with variadic arguments
    from collections.abc import Callable
    func: Callable[..., int]  # Any args, returns int
    
    # Fixed-length tuple
    point: tuple[int, int] = (1, 2)
    
    # Variadic tuple
    numbers: tuple[int, ...] = (1, 2, 3, 4, 5)
  • Libraries use …​ as a sentinel to mark special cases (distinct from None):

    from pydantic import BaseModel, Field
    
    class User(BaseModel):
        # Required field (no default)
        email: str = Field(..., description="Email address")
    
        # Optional field (defaults to None)
        avatar: str | None = Field(None, description="Avatar URL")
    
        # Optional with default value
        is_active: bool = Field(default=True, description="Active status")
    # Simplified Pydantic internal logic
    def Field(default=..., **kwargs):
        if default is ...:  # Checks if default is the Ellipsis object
            # Field is REQUIRED - no default value
            return RequiredField(**kwargs)
        else:
            # Field is OPTIONAL - has a default value
            return OptionalField(default=default, **kwargs)
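The same sentinel pattern works in ordinary functions when None is itself a meaningful value. A minimal sketch, with a hypothetical update_profile function where omitting the argument means "leave unchanged" and passing None means "clear":

```python
def update_profile(profile, nickname=...):  # hypothetical function
    """Return an updated copy of profile without mutating the original."""
    updated = dict(profile)
    if nickname is ...:               # Argument omitted: keep current value
        return updated
    updated['nickname'] = nickname    # Includes an explicit None
    return updated


original = {'nickname': 'Neo'}
unchanged = update_profile(original)
cleared = update_profile(original, nickname=None)
renamed = update_profile(original, nickname='Trinity')
print(unchanged, cleared, renamed)
# {'nickname': 'Neo'} {'nickname': None} {'nickname': 'Trinity'}
```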
  • In NumPy, …​ represents all remaining dimensions:

    import numpy as np
    
    arr = np.array([[[1, 2], [3, 4]], [[5, 6], [7, 8]]])
    # Shape: (2, 2, 2)
    
    arr[..., 0]    # All dimensions except last, then first element
    # Equivalent to: arr[:, :, 0]
    
    arr[0, ...]    # First element of first dimension, all others
    # Equivalent to: arr[0, :, :]
  • In type stub files, …​ indicates implementation not shown:

    # module.pyi (type stub)
    def complex_function(arg1: int, arg2: str) -> bool: ...
        # Implementation details not shown in stub
    
    class MyClass:
        def method(self) -> None: ...
        # Method signature only

17. Modules and packages

# A module is a single Python file (.py extension) containing Python code,
# that can include functions, classes, variables, and statements.

# animal.py (module file)
class Animal:
    def __init__(self, voice: str) -> None:
        self.__voice = voice

    def wow(self):
        print(f'{self.__voice}!')
# A package is a directory containing multiple Python modules and potentially
# subdirectories with even more modules, that represents a collection of related
# modules organized under a common namespace.
#
# A package import turns a directory into another Python namespace, with attributes
# corresponding to the subdirectories and module files that the directory contains.

# .
# ├── animals
# │   ├── cat.py
# │   ├── dog.py
# │   └── __init__.py
# └── main.py

# animals/cat.py
def wow():
    print('meow!')

# animals/dog.py
def wow():
    print('bark!')

# main.py
from animals import cat  # from package import module
import animals.dog as dog  # import package.module

cat.wow()  # meow!
dog.wow()  # bark!

17.1. search path

The module search path is the list of directories that the interpreter searches to locate the files of imported modules. It is the concatenation of the following components and ultimately becomes sys.path, a mutable list of directory name strings:

  1. Home directory (automatic)

    • When running a program, this entry is the directory containing the program’s top-level script file.

    • When working interactively, this entry is the directory you are working in (i.e., the current working directory).

  2. PYTHONPATH directories (if set)

    • In brief, PYTHONPATH is simply a list of user-defined and platform-specific names of directories that contain Python code files.

    • The os.pathsep constant in Python provides the platform-specific separator used between directories on the module search path.

      • Windows: C:\Python310;C:\Users\YourName\Documents\my_modules

        import os, platform
        
        platform.system(), os.pathsep  # ('Windows', ';')
      • Linux/macOS: /usr/lib/python3.10/site-packages:/home/yourname/my_modules

        import os, platform
        
        platform.system(), os.pathsep  # ('Linux', ':')
  3. Standard library directories

  4. The contents of any .pth files (if present)

  5. The site-packages directory of third-party extensions (automatic)

import sys
for path in sys.path:
    print(f"'{path}'")

''  # current working directory where the script is located
'/usr/lib/python311.zip'  # standard library, built-in modules
'/usr/lib/python3.11'
'/usr/lib/python3.11/lib-dynload'  # dynamically loaded modules or libraries
'/usr/local/lib/python3.11/dist-packages'  # third-party libraries
'/usr/lib/python3/dist-packages'

# sys.path is a list, and can be updated programmatically
sys.path
# ['', '/usr/lib/python311.zip', '/usr/lib/python3.11', '/usr/lib/python3.11/lib-dynload', '/usr/local/lib/python3.11/dist-packages', '/usr/lib/python3/dist-packages']
sys.path.insert(0, '/tmp')
sys.path
# ['/tmp', '', '/usr/lib/python311.zip', '/usr/lib/python3.11', '/usr/lib/python3.11/lib-dynload', '/usr/local/lib/python3.11/dist-packages', '/usr/lib/python3/dist-packages']
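To see the effect of editing sys.path, here is a minimal sketch that writes a throwaway module into a temporary directory, puts that directory at the front of the search path, and imports it. The module name `hello_path_demo`, its `greet` function, and the helper `import_from_directory` are made up for this illustration.

```python
import importlib
import sys
import tempfile
from pathlib import Path


def import_from_directory(directory: str, module_name: str):
    """Import module_name after putting directory at the front of sys.path."""
    if directory not in sys.path:
        sys.path.insert(0, directory)
    importlib.invalidate_caches()  # let the path finders notice the new file
    return importlib.import_module(module_name)


with tempfile.TemporaryDirectory() as tmp:
    # hypothetical module, created only for this demonstration
    Path(tmp, "hello_path_demo.py").write_text("def greet():\n    return 'hi'\n")
    mod = import_from_directory(tmp, "hello_path_demo")
    greeting = mod.greet()
    sys.path.remove(tmp)  # undo the change so later imports are unaffected

print(greeting)  # hi
```

Inserting at index 0 gives the new directory priority over every other entry, which is also why a carelessly named file there can shadow a standard library module.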

17.2. __init__.py

# dir0\ # Container on module search path
#     dir1\
#         __init__.py
#         dir2\
#             __init__.py
#             mod.py

import dir1.dir2.mod
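  • dir1 and dir2 must both contain an __init__.py file to be regular packages; this was mandatory before Python 3.3, which introduced namespace packages for directories without one.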

  • dir0, the container, does not require an __init__.py file; this file will simply be ignored if present.

  • dir0, not dir0\dir1, must be listed on the module search path sys.path.

The __init__.py file serves as a hook for package initialization-time actions, declares a directory as a package, generates a module namespace for a directory, and implements the behavior of from * (i.e., from package import *) statements when used with directory imports:

  • Package initialization: The first time a Python program imports through a directory, it automatically runs all the code in the directory’s __init__.py file, which is a natural place to put code that initializes the state required by files in the package.

  • Module usability declarations: Package __init__.py files are also partly present to declare that a directory is a regular module package.

  • Module namespace initialization: In the package import model, the directory paths in a script become real nested object paths after an import.

  • from * statement behavior: As an advanced feature, the __all__ lists in __init__.py files can define what is exported when a directory is imported with the from * statement form.
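The four roles listed above can be observed with a small experiment. This is only a sketch: it builds a throwaway package in a temporary directory, and the names `demo_pkg`, `mod`, and `state` are invented for the example.

```python
import importlib
import sys
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as root:
    pkg = Path(root, "demo_pkg")  # hypothetical package name
    pkg.mkdir()
    # __init__.py runs once, on the first import through the directory
    (pkg / "__init__.py").write_text(
        "print('initializing demo_pkg')\n"
        "__all__ = ['mod']\n"            # names exported by `from demo_pkg import *`
        "state = {'ready': True}\n"      # package-level state set up at import time
    )
    (pkg / "mod.py").write_text("value = 42\n")

    sys.path.insert(0, root)
    importlib.invalidate_caches()
    try:
        import demo_pkg.mod  # prints 'initializing demo_pkg'
        import demo_pkg.mod  # already loaded: __init__.py is not run again
        namespace = {}
        exec("from demo_pkg import *", namespace)
    finally:
        sys.path.remove(root)

print(demo_pkg.state)      # {'ready': True}
print(demo_pkg.mod.value)  # 42: the directory path became a nested object path
print("mod" in namespace, "state" in namespace)  # True False
```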

17.3. import and from statements, reload call

  • import fetches the module as a whole, so its names must be qualified with the module name to be fetched.

    import module_name
  • from fetches (or copies) specific names out of the module into another scope. With a * in place of specific names (allowed only at the top level of a module file, not within a function), it copies all names assigned at the top level of the referenced module, except, by default, names that begin with an underscore.

    # import specific functions or classes from a module.
    from module_name import element1, element2
    # import a specific element and assign it an alias for easier use.
    from module_name import element1 as alias
    # copy out _all_ variables
    from module_name import *
  • Like def, import and from are executable statements, not compile-time declarations, and they are implicit assignments:

    • import assigns an entire module object to a single name.

    • from assigns one or more names to objects of the same names in another module.

  • Modules are loaded and run on the first import or from, and only the first.

  • Unlike import and from:

    • reload is a function in Python, not a statement.

    • reload is passed an existing module object, not a new name.

    • reload lives in a module in Python 3.X and must be imported itself.

    # import module                 # initial import
    # ...use module.attributes...
    # ...                           # now, go change the module file
    # ...
    # from importlib import reload  # get reload itself (in 3.x)
    # reload(module)                # get updated exports
    # ...use module.attributes...
  • A namespace package, a directory that has no __init__.py file, is not fundamentally different from a regular package (which must have an __init__.py file that is run automatically); it is just a different way of creating packages, which are still relative to sys.path at the top level: the leftmost component of a dotted namespace package path must still be located in an entry on the normal module search path.

    import dir1.dir2.mod
    from dir1.dir2.mod import x
    import splitdir.mod
    mkdir -p /code/ns/dir{1,2}/sub  # two dirs of same name in different dirs
    # module files in different directories
    
    # /code/ns/dir1/sub/mod1.py
    print(r'dir1\sub\mod1')
    
    # /code/ns/dir2/sub/mod2.py
    print(r'dir2\sub\mod2')
    PYTHONPATH=/code/ns/dir1:/code/ns/dir2 python -q
    import sub
    sub  # namespace packages: nested search paths
    # <module 'sub' (<_frozen_importlib_external.NamespaceLoader object at 0x7fd1eeda5c50>)>
    sub.__path__
    # _NamespacePath(['/code/ns/dir1/sub', '/code/ns/dir2/sub'])
    
    from sub import mod1
    # dir1\sub\mod1
    import sub.mod2  # content from two different directories
    # dir2\sub\mod2
    
    mod1
    # <module 'sub.mod1' from '/code/ns/dir1/sub/mod1.py'>
    sub.mod2
    # <module 'sub.mod2' from '/code/ns/dir2/sub/mod2.py'>
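The reload steps outlined earlier in this section can be made concrete with a self-contained sketch. It assumes a throwaway module named `reload_demo` (a made-up name) written to a temporary directory, edits the file on disk, and compares a repeated import with an explicit reload.

```python
import importlib
import sys
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    source = Path(tmp, "reload_demo.py")  # hypothetical module name
    source.write_text("version = 1\n")
    sys.path.insert(0, tmp)
    importlib.invalidate_caches()
    try:
        import reload_demo                 # first import: the file is loaded and run
        first = reload_demo.version

        # Edit the file. The new text has a different length, so stale cached
        # bytecode is not reused even if both writes share a timestamp.
        source.write_text("version = 2  # edited\n")

        import reload_demo                 # later imports reuse the loaded module
        cached = reload_demo.version

        importlib.reload(reload_demo)      # reruns the file in the existing module object
        reloaded = reload_demo.version
    finally:
        sys.path.remove(tmp)

print(first, cached, reloaded)  # 1 1 2
```

Because reload updates the existing module object in place, names copied earlier with a from statement keep pointing at the old objects; only qualified references such as reload_demo.version see the new values.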

17.4. relative imports

  • The from statement can use leading dots (.) to specify that it requires modules located within the same package (known as package relative imports), instead of modules located elsewhere on the module import search path (called absolute imports).

    from . import string # relative to this package, imports mypkg.string
    from .string import name1, name2 # imports names from mypkg.string
    from .. import string # imports string sibling of mypkg
    ├── main.py
    └── spam
        ├── eggs.py
        ├── ham.py
        └── __init__.py
    # spam/ham.py
    from . import eggs
    print('eggs')
    # main.py
    from spam import ham
    $ python3 main.py
    eggs

    Running main.py directly sets the module’s __name__ attribute to "__main__" and leaves it without a parent package, so a relative import, which is resolved against the package name, has nothing to resolve against.

    # mypkg\
    #     main.py
    #     string.py
    # string.py
    def some_function(): ...
    # main.py
    from .string import some_function
    $ python3 main.py
    Traceback (most recent call last):
        from .string import some_function
    ImportError: attempted relative import with no known parent package
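One common way around this error is to run the file as a module with python -m from the directory that contains the package. The sketch below recreates the layout in a temporary directory and runs it both ways in a child interpreter. It is an illustration under assumptions: the sibling module is named `helpers.py` here (instead of string.py) so that it cannot shadow the standard library module of the same name.

```python
import subprocess
import sys
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as root:
    pkg = Path(root, "mypkg")
    pkg.mkdir()
    (pkg / "__init__.py").write_text("")
    # hypothetical sibling module used by the relative import
    (pkg / "helpers.py").write_text("def some_function():\n    return 'ok'\n")
    (pkg / "main.py").write_text(
        "from .helpers import some_function\n"
        "print(some_function())\n"
    )

    # 1) Run the file directly: __name__ is '__main__', no parent package is known.
    as_script = subprocess.run(
        [sys.executable, str(pkg / "main.py")],
        capture_output=True, text=True,
    )
    # 2) Run it as a module from the directory that contains mypkg.
    as_module = subprocess.run(
        [sys.executable, "-m", "mypkg.main"],
        cwd=root, capture_output=True, text=True,
    )

print(as_script.returncode != 0, "relative import" in as_script.stderr)
print(as_module.returncode, as_module.stdout.strip())
```

With -m, the interpreter imports mypkg first and then runs mypkg.main with its package context set, so the leading dot can be resolved.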

17.5. import best practices

  • Import statements should be organized in this order: standard library, third-party packages, local application imports, separated by blank lines between groups and sorted alphabetically within each group for consistency and easier maintenance.

    # Standard library
    from collections import defaultdict
    from datetime import datetime
    
    # Third-party
    from fastapi import APIRouter
    from sqlalchemy.orm import Session
    
    # Local application
    from app.models import User
    from app.services import UserService
  • Use absolute imports for cross-package imports and when importing from internal modules within the same package, and use relative imports primarily in __init__.py files to re-export from sibling modules.

    # Absolute import (preferred for most cases)
    from app.services.user import UserService
    
    # Relative import (for __init__.py files)
    from .user import UserService
    # app/services/__init__.py
    from .auth import AuthService
    from .transaction import TransactionService
    from .user import UserService
  • External code should import through the package’s __init__.py rather than directly from submodules; this provides a stable public API and hides the internal structure.

    # External code (e.g., routers, dependencies)
    from app.services import UserService  # Good - uses __init__.py
    
    # Not this:
    from app.services.user import UserService  # Bypasses package API
  • Never import through __init__.py from within the same package to prevent circular dependencies.

    # ❌ BAD: app/services/settlement.py
    from app.services import TransactionService  # Circular import!
    
    # ✅ GOOD: app/services/settlement.py
    from app.services.transaction import TransactionService  # Direct import
  • When modules in the same package need each other, use direct absolute imports to the specific module, not the package’s __init__.py.

    # Internal module importing from same package
    from app.services.user import UserService
    from app.services.transaction import TransactionService
  • Use TYPE_CHECKING from the typing module to import types that are only needed for type hints, not at runtime, preventing circular imports and reducing runtime overhead.

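    from __future__ import annotations  # annotations are not evaluated at runtime
    
    from typing import TYPE_CHECKING
    
    if TYPE_CHECKING:
        # executed by type checkers only, never at runtime
        from .transaction import TransactionService
        from .user import UserService
    
    def process(service: TransactionService) -> None:  # Type hint works
        pass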
  • Never use wildcard imports (from module import *) except in specific controlled scenarios, as they pollute the namespace and make code harder to understand.
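The circular-import rule above can be reproduced in a few lines. The following sketch builds two tiny packages in a temporary directory; `bad_services`, `good_services`, `SettlementService`, and the helper `build_package` are hypothetical names chosen for the demonstration. Both packages have the same __init__.py and differ only in how settlement.py imports TransactionService.

```python
import importlib
import sys
import tempfile
from pathlib import Path


def build_package(root: Path, name: str, settlement_import: str) -> None:
    """Create a package whose __init__.py re-exports two service classes."""
    pkg = root / name
    pkg.mkdir()
    (pkg / "__init__.py").write_text(
        "from .settlement import SettlementService\n"
        "from .transaction import TransactionService\n"
    )
    (pkg / "transaction.py").write_text("class TransactionService: ...\n")
    (pkg / "settlement.py").write_text(
        settlement_import + "\nclass SettlementService: ...\n"
    )


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    # imports through the package __init__.py, which is still being executed
    build_package(root, "bad_services", "from bad_services import TransactionService")
    # imports the specific submodule directly
    build_package(
        root, "good_services",
        "from good_services.transaction import TransactionService",
    )
    sys.path.insert(0, tmp)
    importlib.invalidate_caches()
    try:
        try:
            importlib.import_module("bad_services")
            bad_error = None
        except ImportError as exc:
            bad_error = str(exc)
        good = importlib.import_module("good_services")
    finally:
        sys.path.remove(tmp)

print(bad_error)
print(good.TransactionService.__name__)  # TransactionService
```

In the failing variant, settlement.py asks the package for a name that __init__.py has not assigned yet, because __init__.py is paused on the very import that triggered settlement.py.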

17.6. _X, __all__, __name__, and __main__

  • For a from * statement, Python first looks for an __all__ list in the module and copies the names it lists, irrespective of any underscores; if __all__ is not defined, from * copies all names without a single leading underscore (_X):

    # unders.py
    a, _b, c, _d = 1, 2, 3, 4
    from unders import * # Load non _X names only
    a, c  # (1, 3)
    _b  # NameError: name '_b' is not defined
    
    import unders # But other importers get every name
    unders._b  # 2
    # alls.py
    __all__ = ['a', '_c'] # __all__ has precedence over _X
    a, b, _c, _d = 1, 2, 3, 4
    from alls import *  # load __all__ names only
    a, _c  # (1, 3)
    b  # NameError: name 'b' is not defined
    from alls import a, b, _c, _d  # but other importers get every name
    a, b, _c, _d  # (1, 2, 3, 4)
    
    import alls
    alls.a, alls.b, alls._c, alls._d  # (1, 2, 3, 4)
  • If a module’s __name__ variable is the string "__main__", the file is being executed as a top-level script rather than being imported by another file as a library.

    # cat.py
    def wow():
        return __name__
    
    if __name__ == '__main__':
        print(f'executed: {wow()}')
    $ python3 cat.py  # directly executed (as a script)
    executed: __main__
    # imported by another module
    from cat import wow
    print(f'imported: {wow()}')  # imported: cat
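The two from * rules can also be checked without creating files. The sketch below registers a module built from a source string, runs a from * statement against it, and reports which names were copied. The helper `star_import` and the module names `unders_demo` and `alls_demo` are invented for this illustration.

```python
import sys
import types


def star_import(module_source: str, module_name: str) -> set:
    """Return the names that `from module_name import *` would copy."""
    module = types.ModuleType(module_name)
    exec(module_source, module.__dict__)
    sys.modules[module_name] = module  # make the module importable by name
    try:
        namespace = {}
        exec(f"from {module_name} import *", namespace)
    finally:
        del sys.modules[module_name]
    namespace.pop("__builtins__", None)  # added by exec, not by the import
    return set(namespace)


# no __all__: names with a leading underscore are skipped
print(sorted(star_import("a, _b, c, _d = 1, 2, 3, 4", "unders_demo")))
# ['a', 'c']

# __all__ present: exactly the listed names are copied, underscores included
print(sorted(star_import("__all__ = ['a', '_c']\na, b, _c, _d = 1, 2, 3, 4", "alls_demo")))
# ['_c', 'a']
```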

17.7. modules by name strings

  • To import a module given its name as a string, build and run an import statement with exec, or pass the string name in a call to the built-in __import__ function or to importlib.import_module.

    # The `import` statement can’t directly load a module given its name as a
    # string: Python expects a variable name that’s taken literally and not
    # evaluated, not a string or expression.
    import 'string'
    #   File "<stdin>", line 1
    #     import 'string'
    #            ^^^^^^^^
    # SyntaxError: invalid syntax
    # The most general approach is to construct an `import` statement as a string of Python
    # code and pass it to the `exec` built-in function to run, but it must compile the `import`
    # statement each time it runs, and compiling can be slow.
    modname = 'string'
    exec('import ' + modname) # Run a string of code
    string
    # <module 'string' from '/usr/lib/python3.11/string.py'>
    # In most cases it’s probably simpler and may run quicker to use the built-in `__import__`
    # function to load from a name string instead, which returns the module object, so assign it
    # to a name here to keep it.
    modname = 'string'
    string = __import__(modname)
    string
    # <module 'string' from '/usr/lib/python3.11/string.py'>
    # The newer call `importlib.import_module` does the same work as the built-in `__import__`
    # function, and is generally preferred in more recent Pythons for direct calls to import
    # by name string.
    import importlib
    modname = 'string'
    string = importlib.import_module(modname)
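A typical use of import-by-name is resolving a "module:attribute" string, for example one read from a configuration file. The following is a minimal sketch; the helper name `load_object` and the colon-separated format are assumptions made for this example, not a standard library API.

```python
import importlib


def load_object(path: str):
    """Resolve 'package.module:attribute' (or just 'package.module') to an object."""
    module_name, _, attribute = path.partition(":")
    module = importlib.import_module(module_name)
    return getattr(module, attribute) if attribute else module


ordered_dict = load_object("collections:OrderedDict")
join = load_object("os.path:join")

print(ordered_dict.__name__)                      # OrderedDict
print(load_object("string").ascii_lowercase[:3])  # abc

# One practical difference for dotted names: __import__ returns the top-level
# package, while importlib.import_module returns the named submodule.
print(__import__("os.path").__name__)               # os
print(importlib.import_module("os.path") is __import__("os").path)  # True
```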

17.8. pip: pip install packages

# ensure pip can be run from the command line
python3 -m pip --version  # pip --version
# pip 23.0.1 from /usr/lib/python3/dist-packages/pip (python 3.11)

# OR, install pip, venv modules in Debian/Ubuntu for the system python.
apt install python3-pip python3-venv  # On Debian/Ubuntu systems

17.8.1. virtual environment

# create a virtual environment
python3 -m venv python-learning-notes_env

# activate a virtual environment
source python-learning-notes_env/bin/activate

# ensure pip, setuptools, and wheel are up to date
pip install --upgrade pip setuptools wheel

# show pip version
pip --version  # python3 -m pip --version
# pip 24.0 from .../python-learning-notes_env/lib/python3.11/site-packages/pip (python 3.11)

# deactivate a virtual environment: the deactivate command is often implemented as a shell function.
deactivate
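The same steps are available from Python through the standard venv module, and a running program can tell whether it is inside a virtual environment by comparing sys.prefix with sys.base_prefix. The sketch below is illustrative only: the helper `in_virtualenv` and the directory name `demo_env` are made up, and pip is deliberately left out of the new environment (with_pip=False) to keep the example fast.

```python
import sys
import tempfile
import venv
from pathlib import Path


def in_virtualenv() -> bool:
    """True when the running interpreter belongs to a venv-style environment."""
    return sys.prefix != sys.base_prefix


with tempfile.TemporaryDirectory() as tmp:
    env_dir = Path(tmp, "demo_env")       # hypothetical environment name
    # roughly `python3 -m venv --without-pip demo_env`
    venv.create(env_dir, with_pip=False)
    has_config = (env_dir / "pyvenv.cfg").is_file()  # marker file of a venv

print(in_virtualenv(), has_config)
```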

17.8.2. Version specifiers

A version specifier consists of a series of version clauses, separated by commas. For example:

~= 0.9, >= 1.0, != 1.3.4.*, < 2.0

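The comparison operator determines the kind of version clause:

  • ~=: compatible release clause

  • ==: version matching clause

  • !=: version exclusion clause

  • <=, >=: inclusive ordered comparison clause

  • <, >: exclusive ordered comparison clause

  • ===: arbitrary equality clause

Examples: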

  • ~=3.1: version 3.1 or later, but not version 4.0 or later.

  • ~=3.1.2: version 3.1.2 or later, but not version 3.2.0 or later.

  • ~=3.1a1: version 3.1a1 or later, but not version 4.0 or later.

  • == 3.1: specifically version 3.1 (or 3.1.0), excludes all pre-releases, post releases, developmental releases and any 3.1.x maintenance releases.

  • == 3.1.*: any version that starts with 3.1. Equivalent to the ~=3.1.0 compatible release clause.

  • ~=3.1.0, != 3.1.3: version 3.1.0 or later, but not version 3.1.3 and not version 3.2.0 or later.
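The compatible release rule is easy to get wrong, so here is a deliberately simplified sketch of it for plain numeric versions. It treats ~=X.Y.Z as ">= X.Y.Z and == X.Y.*" and ignores pre-releases, post-releases, and epochs; the function names `parse` and `is_compatible` are invented for the example. For real work, the third-party packaging library (packaging.specifiers.SpecifierSet) implements the full rules.

```python
def parse(version: str) -> tuple:
    """Turn '3.1.2' into (3, 1, 2); only plain numeric releases are handled."""
    return tuple(int(part) for part in version.split("."))


def _pad(release: tuple, width: int) -> tuple:
    """Right-pad with zeros so that 3.1 and 3.1.0 compare as equal."""
    return release + (0,) * (width - len(release))


def is_compatible(candidate: str, spec: str) -> bool:
    """Simplified check of `candidate` against the clause `~=spec`."""
    base = parse(spec)
    if len(base) < 2:
        raise ValueError("~= needs at least two release segments")
    cand = parse(candidate)
    width = max(len(base), len(cand))
    prefix = base[:-1]  # every segment except the last one is fixed
    return (
        _pad(cand, width) >= _pad(base, width)
        and _pad(cand, width)[: len(prefix)] == prefix
    )


print(is_compatible("3.9", "3.1"))      # True:  ~=3.1 allows 3.x from 3.1 onwards
print(is_compatible("4.0", "3.1"))      # False: 4.0 or later is excluded
print(is_compatible("3.1.5", "3.1.2"))  # True:  ~=3.1.2 allows 3.1.x from 3.1.2 onwards
print(is_compatible("3.2.0", "3.1.2"))  # False: 3.2.0 or later is excluded
```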

17.8.3. pip install

# install the latest stable version.
pip install <package_name>

# install a package with extras, i.e., optional dependencies (e.g., pip install 'transformers[torch]').
pip install <package_name>[extra1[,extra2,...]]

# install the exact version (e.g., pip install vllm==0.4.3).
pip install <package_name>==<version>

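# install the latest version greater than or equal to the specified one; there is no upper bound, so a newer major version can be selected (e.g., pip install 'vllm>=0.4.0' gets anything from 0.4.0 onwards). Quote the requirement so the shell does not treat > as a redirection.
pip install '<package_name>>=<version>'

# install a compatible release (~= operator): the last given version segment may increase while the earlier ones stay fixed (e.g., pip install 'vllm~=0.4.0' allows 0.4.x, whereas 'vllm~=0.4' allows any 0.x from 0.4 onwards).
pip install '<package_name>~=<version>'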

# upgrade an already installed package to the latest version from PyPI.
pip install --upgrade <package_name>

# install from an alternate index
pip install --index-url http://my.package.repo/simple/ <package_name>

# search an additional index during install, in addition to PyPI
pip install --extra-index-url http://my.package.repo/simple <package_name>

# install pre-release and development versions, in addition to stable versions
pip install --pre <package_name>
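When pip has to be driven from a Python program, a common approach is to run it as a subprocess of the current interpreter and pass the arguments as a list, which also sidesteps shell quoting problems with characters such as > and [. The sketch below only builds and prints the command; the helper name `pip_install_command` and its parameters are assumptions made for illustration.

```python
import shlex
import sys


def pip_install_command(requirement, *, upgrade=False, pre=False, index_url=None):
    """Build the argument list for `python -m pip install ...`."""
    cmd = [sys.executable, "-m", "pip", "install"]
    if upgrade:
        cmd.append("--upgrade")
    if pre:
        cmd.append("--pre")
    if index_url:
        cmd += ["--index-url", index_url]
    cmd.append(requirement)  # passed as one argument, no shell involved
    return cmd


cmd = pip_install_command("vllm>=0.4.0", upgrade=True)
print(shlex.join(cmd))  # shell-safe rendering; the requirement appears quoted

# To actually install, hand the list to subprocess (not executed here):
#   import subprocess
#   subprocess.run(cmd, check=True)
```

Using sys.executable ensures that the package lands in the environment of the interpreter that is running the program, not in whichever pip happens to be first on PATH.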

17.8.4. cache, configuration

# get the cache directory that pip is currently configured to use
pip cache dir  # ~/.cache/pip
# Configuration files can change the default values for command line options, and pip has 3 levels:
#   - global: system-wide configuration file, shared across users.
#   - user: per-user configuration file.
#   - site: per-environment configuration file; i.e. per-virtualenv.

# the names of the settings are derived from the long command line option.
[global]
timeout = 60
index-url = https://download.zope.org/ppix

# per-command section: pip install
[install]
ignore-installed = true
no-dependencies = yes
# finding the config directory programmatically:
Debian GNU/Linux$ pip config list -v
For variant 'global', will try loading '/etc/xdg/pip/pip.conf'
For variant 'global', will try loading '/etc/pip.conf'
For variant 'user', will try loading '~/.pip/pip.conf'
For variant 'user', will try loading '~/.config/pip/pip.conf'
For variant 'site', will try loading '$VIRTUAL_ENV/pip.conf' or '/usr/pip.conf'

Microsoft Windows 11 > pip config list -v
For variant 'global', will try loading '%ALLUSERSPROFILE%\pip\pip.ini'
For variant 'user', will try loading '%USERPROFILE%\pip\pip.ini'
For variant 'user', will try loading '%APPDATA%\pip\pip.ini'
For variant 'site', will try loading '%VIRTUAL_ENV%\pip.ini' or '%LOCALAPPDATA%\Programs\Python\Python312\pip.ini'
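Because these files use INI syntax, the standard configparser module can read them for inspection. The sketch below parses the sample configuration shown earlier from a string; it does not reproduce how pip merges the global, user, and site levels, and the variable names are chosen for the example.

```python
import configparser

PIP_CONF = """\
[global]
timeout = 60
index-url = https://download.zope.org/ppix

[install]
ignore-installed = true
no-dependencies = yes
"""

# RawConfigParser leaves '%' characters in values (e.g. in URLs) untouched
parser = configparser.RawConfigParser()
parser.read_string(PIP_CONF)

timeout = parser.getint("global", "timeout")
index_url = parser.get("global", "index-url")
no_deps = parser.getboolean("install", "no-dependencies")  # 'yes' counts as true

print(timeout, index_url, no_deps)
# 60 https://download.zope.org/ppix True
```

To read a real file, parser.read(path) can be used with one of the paths printed by pip config list -v.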

17.8.5. mirror

# default: https://pypi.org/simple

# set the PyPI mirror
pip config --user set global.index-url https://pypi.tuna.tsinghua.edu.cn/simple
# pip config --user set global.index-url https://mirrors.aliyun.com/pypi/simple/
# pip config set global.extra-index-url "https://mirrors.sustech.edu.cn/pypi/web/simple https://mirrors.aliyun.com/pypi/simple/"

17.8.6. conda

Conda is a free, open-source software program for package and environment management originally developed by Anaconda.

  • Miniconda is a free, miniature installation of Anaconda Distribution that includes only conda, Python, the packages they both depend on, and a small number of other useful packages.

    # download and install the latest version
    wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh
    bash Miniconda3-latest-Linux-x86_64.sh  # run from the directory where wget saved the installer
    
    # optional: disable auto-activation of base environment on startup
    conda config --set auto_activate_base false
  • Conda channels are the locations where packages are stored that serve as the base for hosting and managing packages. By default, conda can serve packages from two main locations:

    • By default, Conda automatically uses repo.anaconda.com to download and update packages.

      $ conda config --get channels
      --add channels 'https://repo.anaconda.com/pkgs/msys2'   # lowest priority (1)
      --add channels 'https://repo.anaconda.com/pkgs/r' (2)
      --add channels 'https://repo.anaconda.com/pkgs/main'   # highest priority (3)
      1 A Windows-specific channel that provides Unix-like tools and libraries necessary for many packages to function on Windows.
      2 A specialized channel dedicated to packages for the R programming language.
      3 The default, general-purpose channel maintained by Anaconda, Inc., primarily hosting Python-based scientific computing packages.
    • In addition, Conda clients search conda.anaconda.org for community channels like conda-forge or bioconda.

      conda-forge is a separate, community-led channel, required to be added explicitly.

      $ conda config --add channels conda-forge
      $ conda config --get channels
      --add channels 'https://repo.anaconda.com/pkgs/msys2'   # lowest priority
      --add channels 'https://repo.anaconda.com/pkgs/r'
      --add channels 'https://repo.anaconda.com/pkgs/main'
      --add channels 'conda-forge'   # highest priority

      The conda config --add channels conda-forge command modifies the Conda configuration file (~/.condarc or %USERPROFILE%\.condarc), adding conda-forge to the top of the channels list, thereby assigning it the highest priority during package searches.

    • Conda can be configured to use mirror servers instead of the default online repositories.

      # mirror defaults
      default_channels: (1)
          - https://my-mirror.com/pkgs/main
          - https://my-mirror.com/pkgs/r
          - https://my-mirror.com/pkgs/msys2
      
      # mirror all community channels
      channel_alias: https://my-mirror.com (2)
      
      # mirror only some community channels
      custom_channels: (3)
          conda-forge: https://my-mirror.com/conda-forge
      1 The default_channels setting completely replaces Conda’s built-in default channels, redirecting all package requests for them to specified mirror URLs.
      2 The channel_alias setting establishes a base URL that prefixes all non-default channel names (e.g., conda-forge in conda install -c conda-forge), thereby redirecting their package requests to a designated mirror location.
      3 The custom_channels setting allows for direct mapping of specific channel names to particular mirror URLs, providing fine-grained control and overriding any channel_alias for those listed channels.
      # using TUNA mirrors
      show_channel_urls: true
      default_channels:
        - https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
        - https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/r
        - https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/msys2
      custom_channels:
        conda-forge: https://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud
  • Conda is a powerful command line tool for package and environment management that runs on Windows, macOS, and Linux.

    • Create, activate, list, share, remove, and update environments.

      # create a new, empty environment
      conda create -n <env-name>
      
      # create a new environment with the listed packages
      conda create -n <env-name> python pandas
      
      # create a new environment with a specific Python version
      conda create -n <env-name> python=3.12
      
      # create an environment from a file (conda env update -f environment.yml updates an existing one)
      conda env create -f environment.yml
      # list all environments
      conda env list
      # activate an environment
      conda activate myenv
      
      # deactivate the current environment
      conda deactivate
      # export the current environment to a file (verbose)
      conda env export > environment.yml
      
      # export only explicitly installed packages (recommended)
      conda env export --from-history > environment.yml
      # remove an environment and all its packages
      conda env remove --name my_env
    • Run commands (conda run) in an environment without shell activation.

      # Best Practice: Use `conda run` in scripts to execute a command in an
      # environment without needing to activate it first. This is more robust
      # for automation as it doesn't modify the shell's state.
      
      # run a command in a specific conda environment without activating it
      conda run -n myenv python my_script.py
      
      # run an arbitrary command
      conda run -n myenv pytest
      
      # for interactive commands or long-running services, use --no-capture-output
      # to see the output in real-time instead of all at the end.
      conda run -n myenv --no-capture-output python my_interactive_app.py
      #!/bin/bash
      
      # A script that runs a Python application, demonstrating a priority-based
      # approach for choosing the Python interpreter:
      # 1. Use the python from an active Conda environment.
      # 2. If not active, use `conda run` with the environment from `environment.yml`.
      # 3. As a fallback, use the system's `python3`.
      
      PROJECT_ROOT="$(cd "$(dirname "${BASH_SOURCE[0]}")/.." && pwd)"
      PYTHON_CMD="python3"
      
      # Check for active Conda environment (CONDA_PREFIX is set when conda env is activated)
      if [ -n "$CONDA_PREFIX" ]; then
          PYTHON_CMD="python"
      elif command -v conda >/dev/null && [ -f "$PROJECT_ROOT/environment.yml" ]; then
          ENV_NAME=$(grep -E "^name:" "$PROJECT_ROOT/environment.yml" | sed -E 's/^name:[[:space:]]*([^[:space:]#]+).*/\1/' | head -n1)
          # Use 'conda run' to execute in the specified environment without activating it
          [ -n "$ENV_NAME" ] && PYTHON_CMD="conda run --no-capture-output -n $ENV_NAME python"
      fi
      
      cd "$PROJECT_ROOT"
      
      # Execute the main module (e.g., 'cli.main', 'app.main', etc.), passing through all script arguments
      PYTHONPATH="$PROJECT_ROOT" $PYTHON_CMD -m cli.main "$@"
    • Find, install, remove, list, and update packages.

      # search for a package across all configured channels
      conda search scipy
      
      # search ONLY in a specific channel, ignoring all other configured channels
      conda search --override-channels -c conda-forge scipy
      
      # search an additional channel with highest priority (results are combined with configured channels)
      conda search -c conda-forge scipy
      # search for a package with a version pattern (e.g., greater than or equal to)
      conda search "numpy>=1.20"
      
      # search for a package with a version prefix
      conda search "numpy=1.20.*"
      
      # search for a package for a different platform
      conda search numpy --platform linux-64
      
      # filter search results using external tools like grep (e.g., for a specific Python build)
      conda search numpy | grep py39
      # show detailed information for a specific package build
      conda search scipy=1.15.2=py313hf4aebb8_1 --info
      # install a package into the currently active environment
      conda install matplotlib
      
      # install a package into a specific environment
      conda install -n myenv matplotlib
      
      # install a package from a specific channel
      conda install -c conda-forge numpy
      # remove a package from the currently active environment
      conda remove matplotlib
      
      # remove a package from a specific environment
      conda remove -n myenv pandas
      # list installed packages in the current environment
      conda list
      
      # list installed packages in a specific environment
      conda list -n myenv
      # update a specific package
      conda update biopython
      
      # update Python in the current environment
      conda update python
      
      # update a specific package in a specific environment
      conda update -n myenv biopython
    • Update the Conda package manager itself.

      # update conda itself (simple, but may use non-default channels)
      conda update conda
      
      # update conda from the official defaults channel (recommended for stability)
      conda update -n base -c defaults conda
    • Using pip in a Conda Environment.

      # install pip into an environment
      conda install -n myenv pip
      
      # use pip to install a package (after activating the environment)
      conda activate myenv
      pip install <package-name>
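The interpreter-selection logic of the launcher script shown earlier in this section can also be written in Python, which makes it easy to test. This is a sketch under assumptions: the function names `env_name_from_yaml` and `choose_python` are invented, the environment name is extracted with a regular expression instead of a YAML parser (as the shell script does with grep and sed), and the environ and which parameters exist only so that the behavior can be exercised without conda installed.

```python
import os
import re
import shutil
from pathlib import Path


def env_name_from_yaml(text: str):
    """Return the value of the top-level `name:` key, or None if there is none."""
    match = re.search(r"^name:[ \t]*([^\s#]+)", text, flags=re.MULTILINE)
    return match.group(1) if match else None


def choose_python(project_root: Path, environ=os.environ, which=shutil.which) -> list:
    """Pick the interpreter command using the same three-step priority."""
    # 1. an activated conda environment sets CONDA_PREFIX
    if environ.get("CONDA_PREFIX"):
        return ["python"]
    # 2. conda is installed and the project names an environment
    env_file = project_root / "environment.yml"
    if which("conda") and env_file.is_file():
        name = env_name_from_yaml(env_file.read_text())
        if name:
            return ["conda", "run", "--no-capture-output", "-n", name, "python"]
    # 3. fall back to the system interpreter
    return ["python3"]


command = choose_python(Path.cwd()) + ["-m", "cli.main"]
print(command)  # result depends on the machine this runs on
```

Returning a list instead of a single string avoids relying on shell word splitting, which is what the unquoted $PYTHON_CMD expansion in the shell version depends on.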

17.8.7. uv

uv is an extremely fast Python package and project manager, written in Rust, to replace pip, pip-tools, pipx, virtualenv, and more.

  • uv provides a standalone installer to download and install uv:

    $ curl -LsSf https://astral.sh/uv/install.sh | sh
    downloading uv 0.9.26 x86_64-unknown-linux-gnu
    no checksums to verify
    installing to /home/user/.local/bin
      uv
      uvx
    everything's installed!
    pip install uv
    winget install --id=astral-sh.uv  -e
    echo 'eval "$(uv generate-shell-completion bash)"' >> ~/.bashrc
  • installing and managing Python versions.

    # install Python versions.
    uv python install
    
    # view available Python versions.
    uv python list
    
    # find an installed Python version.
    uv python find
    
    # pin the current project to use a specific Python version.
    uv python pin
    
    # uninstall a Python version.
    uv python uninstall
  • creating and working on Python projects, i.e., with a pyproject.toml.

    # create a new Python project.
    uv init
    
    # add a dependency to the project.
    uv add
    
    # remove a dependency from the project.
    uv remove
    
    # sync the project's dependencies with the environment.
    uv sync
    
    # create a lockfile for the project's dependencies.
    uv lock
    
    # run a command in the project environment.
    uv run
    
    # view the dependency tree for the project.
    uv tree
    
    # build the project into distribution archives.
    uv build
    
    # publish the project to a package index.
    uv publish
  • running and installing tools published to Python package indexes, e.g., ruff or black.

    # run a tool in a temporary environment.
    uvx # an alias for `uv tool run`
    
    # install a tool user-wide.
    uv tool install
    
    # uninstall a tool.
    uv tool uninstall
    
    # list installed tools.
    uv tool list
    
    # update the shell to include tool executables.
    uv tool update-shell
  • managing and inspecting uv’s state, such as the cache, storage directories, or performing a self-update:

    # remove cache entries.
    uv cache clean
    
    # remove outdated cache entries.
    uv cache prune
    
    # show the uv cache directory path.
    uv cache dir
    
    # show the uv tool directory path.
    uv tool dir
    
    # show the uv installed Python versions path.
    uv python dir
    
    # update uv to the latest version.
    uv self update

18. Testing

  • unittest

    # **Key Points About `unittest` in Python:**
    #
    # * **Test Cases:** Individual units of testing that verify specific functionality.
    # * **Test Suites:** Collections of test cases that can be run together.
    # * **Assertions:** Methods used to check if expected results match actual results.
    # * **Test Case Structure:** Arrange-Act-Assert (AAA) is a common structure.
    # * **Test Fixtures:** `setUp()` and `tearDown()` methods for setup and cleanup.
    # * **Running Tests:** `unittest.main()` or `python3 -m unittest` runs the tests.
    # * **Best Practices:** Write clear, concise, and well-organized tests.
    # * **Naming Conventions:** Test method names must start with `test`.
    #
    # **Common Assertions:**
    #
    # * `assertEqual(a, b)`: Checks if `a` equals `b`.
    # * `assertNotEqual(a, b)`: Checks if `a` does not equal `b`.
    # * `assertTrue(condition)`: Checks if `condition` is `True`.
    # * `assertFalse(condition)`: Checks if `condition` is `False`.
    # * `assertIn(item, container)`: Checks if `item` is in `container`.
    # * `assertNotIn(item, container)`: Checks if `item` is not in `container`.
    
    # test_cap.py
    import unittest
    
    def cap(text: str) -> str:
        return text.capitalize()
    
    class TestCap(unittest.TestCase):
        def setUp(self) -> None:
            pass
    
        def tearDown(self) -> None:
            pass
    
        def test_one_word(self):
            text = 'duck'  # _arrange_ the objects, create and set them up as necessary.
    
            result = cap(text)  # _act_ on an object.
    
            self.assertEqual('Duck', result)  # _assert_ that something is as expected.
    
        def test_multi_words(self):
            text = 'hello world'  # _arrange_ the objects, create and set them up as necessary.
    
            result = cap(text)  # _act_ on an object.
    
            self.assertEqual('Hello World', result)  # _assert_ that something is as expected.
    
        def test_table_driven(self):
            # _arrange_ the objects, create and set them up as necessary.
            tests = [
                ('duck', 'Duck'),
                ('hello world', 'Hello World')
            ]
    
            for text, expected in tests:
                result = cap(text)  # _act_ on an object.
                self.assertEqual(result, expected)  # _assert_ that something is as expected.
    
    if __name__ == '__main__':
        unittest.main()
    $ python3 test_cap.py
    F.
    ======================================================================
    FAIL: test_multi_words (__main__.TestCap.test_multi_words)
    ----------------------------------------------------------------------
    Traceback (most recent call last):
      File "...", line 27, in test_multi_words
        self.assertEqual('Hello World', result)
    AssertionError: 'Hello World' != 'Hello world'
    - Hello World
    ?       ^
    + Hello world
    ?       ^
    
    
    ----------------------------------------------------------------------
    Ran 2 tests in 0.003s
    
    FAILED (failures=1)
  • doctest

    # doctest_cap.py
    def cap(text: str) -> str:
        """
        >>> cap('duck')
        'Duck'
        >>> cap('hello world')
        'Hello World'
        """
        return text.capitalize()
    
    if __name__ == '__main__':
        import doctest
        doctest.testmod()
    $ python3 doctest_cap.py
    **********************************************************************
    File "...", line 5, in __main__.cap
    Failed example:
        cap('hello world')
    Expected:
        'Hello World'
    Got:
        'Hello world'
    **********************************************************************
    1 items had failures:
       1 of   2 in __main__.cap
    ***Test Failed*** 1 failures.
  • pytest

    # test_cap.py
    def cap(text: str) -> str:
        return text.capitalize()
    
    def test_one_word():
        text = 'duck'
        result = cap(text)
        assert result == 'Duck'
    
    def test_multiple_words():
        text = 'hello world'
        result = cap(text)
        assert result == 'Hello World'
    $ pipenv install pytest
    Installing pytest...
    Installing dependencies from Pipfile.lock (207fdb)...
    $ pytest
    ============================================== test session starts ==============================================
    platform linux -- Python 3.11.2, pytest-8.2.1, pluggy-1.5.0
    rootdir: ...
    collected 2 items
    
    test_cap.py .F                                                                                            [100%]
    
    =================================================== FAILURES ====================================================
    ______________________________________________ test_multiple_words ______________________________________________
    
        def test_multiple_words():
            text = 'hello world'
            result = cap(text)
    >       assert result == 'Hello World'
    E       AssertionError: assert 'Hello world' == 'Hello World'
    E
    E         - Hello World
    E         ?       ^
    E         + Hello world
    E         ?       ^
    
    test_cap.py:12: AssertionError
    ============================================ short test summary info ============================================
    FAILED test_cap.py::test_multiple_words - AssertionError: assert 'Hello world' == 'Hello World'
    ========================================== 1 failed, 1 passed in 0.09s ==========================================
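All three runs above fail on the same case because `str.capitalize()` uppercases only the first character of the whole string and lowercases the rest. A minimal sketch of one possible fix, assuming the intent is to capitalize every whitespace-separated word, uses `string.capwords()` from the standard library (the notes do not say which fix the author prefers):

```python
import string

def cap(text: str) -> str:
    # capwords() splits on whitespace, capitalizes each word, and rejoins them
    return string.capwords(text)

print('hello world'.capitalize())  # Hello world  (the behavior the tests caught)
print(cap('duck'))                 # Duck
print(cap('hello world'))          # Hello World
```

`str.title()` is another option, but it also uppercases letters after apostrophes (`"i'm".title()` gives `"I'M"`), which is often not what is wanted.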

19. Processes and concurrency

# The standard library’s os module provides a common way of accessing some system information.
import os
os.uname()
# posix.uname_result(sysname='Linux', nodename='node-0', release='6.1.0-21-amd64', version='#1 SMP PREEMPT_DYNAMIC Debian 6.1.90-1 (2024-05-03)', machine='x86_64')
os.getloadavg()
# (0.05126953125, 0.03955078125, 0.00341796875)
os.cpu_count()
# 4
(os.getpid(), os.getcwd(), os.getuid(), os.getgid())
# (1295, '/tmp', 1000, 1000)
os.system('date -u')
# Thu Jun  6 11:23:23 AM UTC 2024
# 0
# get system and process information with the third-party package psutil
import psutil  # pip install psutil
print(psutil.cpu_times(percpu=True))
# [scputimes(user=4.37, nice=0.0, system=6.71, idle=1468.69, iowait=0.26, irq=0.0, softirq=1.86, steal=0.0, guest=0.0, guest_nice=0.0), scputimes(user=11.84, nice=0.0, system=9.3, idle=1465.29, iowait=1.02, irq=0.0, softirq=0.75, steal=0.0, guest=0.0, guest_nice=0.0), scputimes(user=10.31, nice=0.0, system=8.58, idle=1468.4, iowait=1.66, irq=0.0, softirq=0.97, steal=0.0, guest=0.0, guest_nice=0.0), scputimes(user=9.11, nice=0.0, system=10.02, idle=1467.95, iowait=0.81, irq=0.0, softirq=0.65, steal=0.0, guest=0.0, guest_nice=0.0)]
print(psutil.cpu_percent(percpu=False))
# 0.0
print(psutil.cpu_percent(percpu=True))
# [0.3, 0.4, 0.4, 0.1]

19.1. subprocess and multiprocessing

import subprocess

# run another program in a shell
# and grab whatever output it created (both standard output and standard error output)
print(subprocess.getoutput('date'))  # Thu Jun  6 07:19:50 PM CST 2024

# A variant method called `check_output()` takes a list of the command and arguments.
# By default it returns standard output only as type bytes rather than a string, and
# does not use the shell:
print(subprocess.check_output(['date', '-u']))  # b'Thu Jun  6 11:30:09 AM UTC 2024\n'

# return a tuple with the status code and output of the other program
print(subprocess.getstatusoutput('date'))  # (0, 'Thu Jun  6 07:32:25 PM CST 2024')

# capture the exit status only
ret = subprocess.call('date -u', shell=True)
# Thu Jun  6 11:45:51 AM UTC 2024
print(ret)
# 0

# pass the command as a list of arguments, so no shell is needed
ret = subprocess.call(['date', '-u'])
# Thu Jun  6 11:50:04 AM UTC 2024
print(ret)
# 0
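The helpers above are the older convenience functions. As a hedged sketch, the same tasks can be done with `subprocess.run()`, which returns a `CompletedProcess` holding the exit status and, when requested, the captured output. The child command here is an assumption chosen for portability: it runs the current interpreter instead of `date`.

```python
import subprocess
import sys

# Hypothetical child command: the current Python interpreter printing one line
cmd = [sys.executable, "-c", "print('hello from child')"]

# capture_output=True collects stdout and stderr; text=True decodes bytes to str;
# check=True raises CalledProcessError on a non-zero exit status
completed = subprocess.run(cmd, capture_output=True, text=True, check=True)
print(completed.returncode)      # 0
print(completed.stdout.strip())  # hello from child

# A failing child: the exit status is available on the exception
try:
    subprocess.run([sys.executable, "-c", "import sys; sys.exit(3)"], check=True)
except subprocess.CalledProcessError as exc:
    failed_code = exc.returncode
    print("child failed with exit status", failed_code)
```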
# create multiple independent processes
import multiprocessing
import os

def whoami(what):
    print("Process %s says: %s" % (os.getpid(), what))

if __name__ == "__main__":
    whoami("I'm the main program")
    for n in range(4):
        p = multiprocessing.Process(
            target=whoami, args=("I'm function %s" % n,))
        p.start()

# Process 1648 says: I'm the main program
# Process 1649 says: I'm function 0
# Process 1650 says: I'm function 1
# Process 1651 says: I'm function 2
# Process 1652 says: I'm function 3
# kill a process with terminate()
import multiprocessing
import time
import os

def whoami(name):
    print("I'm %s, in process %s" % (name, os.getpid()))

def loopy(name):
    whoami(name)
    start = 1
    stop = 1000000
    for num in range(start, stop):
        print("\tNumber %s of %s. Honk!" % (num, stop))
        time.sleep(1)

if __name__ == "__main__":
    whoami("main")
    p = multiprocessing.Process(target=loopy, args=("loopy",))
    p.start()
    time.sleep(5)
    p.terminate()

# I'm main, in process 13084
# I'm loopy, in process 14664
#         Number 1 of 1000000. Honk!
#         Number 2 of 1000000. Honk!
#         Number 3 of 1000000. Honk!
#         Number 4 of 1000000. Honk!
#         Number 5 of 1000000. Honk!

19.2. Queues, processes, and threads

A queue is like a list: items are added at one end and removed from the other, most commonly in FIFO (first in, first out) order. In general, queues transport messages, which can be any kind of information. When they are used for distributed task management, they are also known as work queues, job queues, or task queues.
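As a small illustration of FIFO ordering, the sketch below puts three made-up job names into a `queue.Queue` and takes them out again in a single thread:

```python
import queue

jobs = queue.Queue()
for job in ["wash", "dry", "stack"]:  # hypothetical job names
    jobs.put(job)                     # added at the back of the queue

order = []
while not jobs.empty():
    order.append(jobs.get())          # removed from the front of the queue

print(order)  # ['wash', 'dry', 'stack']
```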

Threads can be dangerous. Like manual memory management in languages such as C and C++, they can cause bugs that are extremely hard to find, let alone fix. To use threads, all the code in the program (and in external libraries that it uses) must be thread safe.

In Python, threads do not speed up CPU-bound tasks because of an implementation detail in CPython, the standard Python implementation, called the Global Interpreter Lock (GIL).

  • Use threads for I/O-bound problems

  • Use processes, networking, or events (discussed in the next section) for CPU-bound problems
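To illustrate the first rule, here is a minimal sketch in which `time.sleep()` stands in for waiting on a network or disk. The function and file names are made up for the example. Because a sleeping thread releases the GIL, the four waits overlap instead of adding up:

```python
import threading
import time

def fake_download(name, results):
    time.sleep(0.2)       # stands in for blocking I/O; the GIL is released while waiting
    results.append(name)  # list.append is safe to call from several threads in CPython

names = ["a.txt", "b.txt", "c.txt", "d.txt"]  # hypothetical file names
results = []

start = time.perf_counter()
threads = [threading.Thread(target=fake_download, args=(name, results))
           for name in names]
for thread in threads:
    thread.start()
for thread in threads:
    thread.join()         # wait for every worker to finish
elapsed = time.perf_counter() - start

# Run one after another, the four waits would take about 0.8 seconds
print(f"{len(results)} downloads in {elapsed:.2f}s")
```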

import multiprocessing as mp

def washer(dishes, output):
    for dish in dishes:
        print('Washing', dish, 'dish')
        output.put(dish)

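def dryer(dish_queue):
    while True:
        dish = dish_queue.get()
        print('Drying', dish, 'dish')
        dish_queue.task_done()

# The guard is needed wherever child processes are started with "spawn"
# (the default on Windows and macOS), because the module is re-imported there.
if __name__ == '__main__':
    dish_queue = mp.JoinableQueue()
    dryer_proc = mp.Process(target=dryer, args=(dish_queue,))
    dryer_proc.daemon = True  # let the program exit while dryer() is still looping
    dryer_proc.start()
    dishes = ['salad', 'bread', 'entree', 'dessert']
    washer(dishes, dish_queue)
    dish_queue.join()  # block until every dish has been marked with task_done()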

# Washing salad dish
# Washing bread dish
# Washing entree dish
# Washing dessert dish
# Drying salad dish
# Drying bread dish
# Drying entree dish
# Drying dessert dish
import threading
import queue
import time

def washer(dishes, dish_queue):
    for dish in dishes:
        print("Washing", dish)
        time.sleep(5)
        dish_queue.put(dish)

def dryer(dish_queue):
    while True:
        dish = dish_queue.get()
        print("Drying", dish)
        time.sleep(10)
        dish_queue.task_done()

dish_queue = queue.Queue()
for n in range(2):
    # daemon=True lets the program exit after join(), even though dryer() loops forever
    dryer_thread = threading.Thread(target=dryer, args=(dish_queue,), daemon=True)
    dryer_thread.start()
dishes = ['salad', 'bread', 'entree', 'dessert']
washer(dishes, dish_queue)
dish_queue.join()

# Washing salad
# Washing bread
# Drying salad
# Washing entree
# Drying bread
# Washing dessert
# Drying entree
# Drying dessert

19.3. concurrent.futures

The concurrent.futures module in the standard library can be used to schedule an asynchronous pool of workers, using threads (when I/O-bound) or processes (when CPU-bound), and get back a future to track their state and collect the results.

Use concurrent.futures whenever you need to launch a batch of concurrent tasks, such as the following:

  • Crawling URLs on the web

  • Processing files, such as resizing images

  • Calling service APIs

from concurrent import futures
import math
import sys

def calc(val):
    result = math.sqrt(float(val))
    return val, result

def use_threads(num, values):
    with futures.ThreadPoolExecutor(num) as tex:
        tasks = [tex.submit(calc, value) for value in values]
        for f in futures.as_completed(tasks):
            yield f.result()

def use_processes(num, values):
    with futures.ProcessPoolExecutor(num) as pex:
        tasks = [pex.submit(calc, value) for value in values]
        for f in futures.as_completed(tasks):
            yield f.result()

def main(workers, values):
    print(f"Using {workers} workers for {len(values)} values")
    print("Using threads:")
    for val, result in use_threads(workers, values):
        print(f'{val} {result:.4f}')
    print("Using processes:")
    for val, result in use_processes(workers, values):
        print(f'{val} {result:.4f}')

if __name__ == '__main__':
    workers = 3
    if len(sys.argv) > 1:
        workers = int(sys.argv[1])
    values = list(range(1, 6))  # 1 .. 5
    main(workers, values)
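The listing above collects results with `as_completed()`, which yields futures in the order they finish. When results should come back in input order instead, `Executor.map()` is a shorter alternative. The sketch below reuses the same `calc()` idea with a thread pool; the pool size of 3 is an arbitrary choice:

```python
from concurrent.futures import ThreadPoolExecutor
import math

def calc(val):
    return val, math.sqrt(float(val))

values = list(range(1, 6))  # 1 .. 5

with ThreadPoolExecutor(max_workers=3) as executor:
    # map() returns results in the order of the inputs, whichever task finishes first
    ordered = list(executor.map(calc, values))

for val, result in ordered:
    print(f'{val} {result:.4f}')
```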

19.4. Asynchronous I/O

Python 3.4 introduced the asyncio module for asynchronous programming, and Python 3.5 added the async and await keywords. Together they provide coroutines (functions that can pause and resume) and an event loop that schedules them. This model suits I/O-bound, structured network code such as web servers, database connections, and distributed task queues.

import asyncio

async def say(phrase, seconds):
    print(phrase)
    await asyncio.sleep(seconds)

async def wicked():
    task_1 = asyncio.create_task(say("Surrender,", 2))
    task_2 = asyncio.create_task(say("Dorothy!", 0))
    await task_1
    await task_2

# asyncio.run() blocks: it creates a new event loop, runs the coroutine to completion,
# then closes the loop, giving the default executor up to 5 minutes to shut down
asyncio.run(wicked())
import asyncio

async def say(phrase, seconds):
    print(phrase)
    await asyncio.sleep(seconds)

async def wicked():
    task_1 = asyncio.create_task(say("Surrender,", 2))
    task_2 = asyncio.create_task(say("Dorothy!", 0))
    await asyncio.gather(task_1, task_2)  # Wait for all tasks to finish concurrently

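# Lower-level alternative to asyncio.run(): create and manage the event loop explicitly.
# (Calling asyncio.get_event_loop() with no running loop is deprecated in recent Python versions.)
loop = asyncio.new_event_loop()
try:
    loop.run_until_complete(wicked())
finally:
    loop.close()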
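Coroutines can also return values. As a hedged sketch, `asyncio.gather()` collects those values in the order the awaitables were passed in, not the order they finished. The `fetch()` coroutine and its names are made up for the example:

```python
import asyncio

async def fetch(name, seconds):
    await asyncio.sleep(seconds)  # stands in for a network call
    return f"{name} done"

async def main():
    # Both coroutines run concurrently; "fast" finishes first,
    # but the result list follows the argument order
    return await asyncio.gather(fetch("slow", 0.2), fetch("fast", 0.1))

results = asyncio.run(main())
print(results)  # ['slow done', 'fast done']
```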

20. SQL

DB-API (the Python Database API, specified in PEP 249) is a standardized interface for relational databases, similar in role to JDBC in Java. It offers a consistent set of functions and methods across database systems such as MySQL, PostgreSQL, SQL Server, and SQLite, which simplifies database access and makes code easier to port.

  • DB-API focuses on fundamental database operations like connecting, executing SQL queries, fetching results, and committing/rolling back transactions.

  • Different database modules (e.g., MySQLdb, psycopg2, sqlite3) implement the DB-API standard, ensuring consistency in these core functionalities across various systems.

  • DB-API promotes parameterization of SQL queries using placeholders (%s, ?, etc.) for values, which enhances security by preventing SQL injection vulnerabilities and improves portability by separating data from the query itself.
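The effect of parameterization can be seen in a small sqlite3 sketch. The table contents and the hostile input string are made up for the example. The query built with string formatting is rewritten by the input, while the placeholder version treats the same input as plain data:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (username TEXT, email TEXT)")
conn.executemany(
    "INSERT INTO users VALUES (?, ?)",
    [("Alice", "alice@example.com"), ("Bob", "bob@example.com")])

hostile = "nobody' OR '1'='1"  # hypothetical attacker-supplied value

# Unsafe: the value becomes part of the SQL text, so the WHERE clause is always true
unsafe_rows = conn.execute(
    f"SELECT username FROM users WHERE username = '{hostile}'").fetchall()

# Safe: the qmark placeholder passes the value as data, so nothing matches
safe_rows = conn.execute(
    "SELECT username FROM users WHERE username = ?", (hostile,)).fetchall()

# sqlite3 also accepts named placeholders
named_rows = conn.execute(
    "SELECT email FROM users WHERE username = :name", {"name": "Alice"}).fetchall()

print(len(unsafe_rows), safe_rows, named_rows)
conn.close()
```

Placeholder syntax differs between drivers (sqlite3 uses `?` and `:name`, while psycopg2 uses `%s`), so check the `paramstyle` attribute of the module in use.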

20.1. Using DB-API with SQLite in Memory

import sqlite3

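# Connect to an in-memory database (no file needed).
# Note: used as a context manager, the connection commits or rolls back the
# open transaction on exit, but it does not close the connection.
with sqlite3.connect(":memory:") as connection: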

    # Create a cursor object
    cursor = connection.cursor()

    # Create a table (assuming you don't have one)
    cursor.execute('''
CREATE TABLE IF NOT EXISTS users (
  id INTEGER PRIMARY KEY AUTOINCREMENT,
  username TEXT NOT NULL,
  email TEXT UNIQUE NOT NULL)
''')

    # Insert some data using parameterization
    users = [("Alice", "alice@example.com"), ("Bob", "bob@example.com")]
    cursor.executemany(
        "INSERT INTO users (username, email) VALUES (?, ?)", users)

    # Commit the changes
    connection.commit()

    # Query the data
    cursor.execute("SELECT * FROM users")

    # Fetch all results
    results = cursor.fetchall()

    # Print the results
    for row in results:
        print(f"ID: {row[0]}, Username: {row[1]}, Email: {row[2]}")
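The transaction side of DB-API can be sketched with the same table layout. With sqlite3's default transaction handling, using the connection as a context manager commits when the block succeeds and rolls back when it raises. The duplicate email below is a deliberate error to trigger the rollback, and `sqlite3.Row` is used so columns can be read by name:

```python
import sqlite3

connection = sqlite3.connect(":memory:")
connection.row_factory = sqlite3.Row  # rows can be indexed by column name
connection.execute('''
CREATE TABLE users (
  id INTEGER PRIMARY KEY AUTOINCREMENT,
  username TEXT NOT NULL,
  email TEXT UNIQUE NOT NULL)
''')

insert = "INSERT INTO users (username, email) VALUES (?, ?)"

with connection:  # commits on success
    connection.execute(insert, ("Alice", "alice@example.com"))

try:
    with connection:  # rolls back both inserts when the second one fails
        connection.execute(insert, ("Bob", "bob@example.com"))
        connection.execute(insert, ("Eve", "alice@example.com"))  # duplicate email
except sqlite3.IntegrityError as exc:
    print("rolled back:", exc)

rows = connection.execute("SELECT username FROM users ORDER BY id").fetchall()
usernames = [row["username"] for row in rows]
print(usernames)  # ['Alice']
connection.close()
```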

Appendix A: Install Python from Source Code on Linux

  1. Download Python Source Releases

    # replace the Python version (e.g. 3.13.0) as needed
    curl -LO https://www.python.org/ftp/python/3.13.0/Python-3.13.0.tar.xz
  2. Extract the XZ compressed source tarball

    tar xvf Python-3.13.0.tar.xz
  3. Configure, build, and install Python

    cd Python-3.13.0 && ./configure && sudo make install

    By default, make install will install all the files in /usr/local/bin, /usr/local/lib etc. You can specify an installation prefix other than /usr/local using --prefix on ./configure, for instance --prefix=$HOME.

    $ ls /usr/local/lib/
    libpython3.12.a  libpython3.13.a  pkgconfig  python3.11  python3.12  python3.13
    $ ls /usr/local/bin/
    2to3  2to3-3.12  idle3  idle3.12  idle3.13  pip3  pip3.12  pip3.13  pydoc3  pydoc3.12  pydoc3.13  python3  python3.12  python3.12-config  python3.13  python3.13-config  python3-config
  4. Check the Python version

    $ python3 --version
    Python 3.13.0

Appendix B: Build a Docker Image for FastAPI

$ ls
Dockerfile  main.py  requirements.txt
# syntax=docker/dockerfile:1
ARG PYTHON_VERSION=3.11

FROM python:${PYTHON_VERSION}-alpine AS builder

ARG PYTHON_VERSION

WORKDIR /app

COPY requirements.txt .

RUN --mount=type=cache,target=/root/.cache/pip \
    pip install -r requirements.txt \
        -i https://pypi.tuna.tsinghua.edu.cn/simple

FROM python:${PYTHON_VERSION}-alpine
ARG PYTHON_VERSION

ENV APP_UID=1654
RUN apk add --no-cache shadow \
    && groupadd -r app -g $APP_UID \
    && useradd --no-log-init -r -g app -u $APP_UID app
# Keep the application files in /app instead of the image root
WORKDIR /app
USER app

COPY --from=builder /usr/local/lib/python${PYTHON_VERSION}/site-packages /usr/local/lib/python${PYTHON_VERSION}/site-packages
COPY --from=builder /usr/local/bin /usr/local/bin
COPY . .

EXPOSE 8000

CMD ["fastapi", "run", "main.py"]

References