> import this
The Zen of Python, by Tim Peters

Beautiful is better than ugly.
Explicit is better than implicit.
Simple is better than complex.
Complex is better than complicated.
Flat is better than nested.
Sparse is better than dense.
Readability counts.
Special cases aren't special enough to break the rules.
Although practicality beats purity.
Errors should never pass silently.
Unless explicitly silenced.
In the face of ambiguity, refuse the temptation to guess.
There should be one-- and preferably only one --obvious way to do it.
Although that way may not be obvious at first unless you're Dutch.
Now is better than never.
Although never is often better than *right* now.
If the implementation is hard to explain, it's a bad idea.
If the implementation is easy to explain, it may be a good idea.
Namespaces are one honking great idea -- let's do more of those!

1. Running Python

  • Using the interactive interpreter (shell)

    $ python3 -q
    >>> 2+2
    4
    >>> quit()
  • Using Python files

    test.py
    print(2+2)
    $ python3 test.py
    4
  • Using Python files with a shebang

    In computing, a shebang is the character sequence consisting of the characters number sign and exclamation mark (#!) at the beginning of a script. It is also called sharp-exclamation, sha-bang, hashbang, pound-bang, or hash-pling.

     — From Wikipedia, the free encyclopedia

    test.py
    #!/usr/bin/env python3
    print(2+2)
    $ chmod +x test.py
    $ ./test.py
    4
  • Executing modules as scripts

    In Python, python -m is a command-line option that locates a module on the module search path and runs it as a script, so you do not have to give the path to its .py file.

    $ python3 -m venv --help
    usage: venv [-h] [--system-site-packages] [--symlinks | --copies] [--clear] [--upgrade] [--without-pip]
                [--prompt PROMPT] [--upgrade-deps]
                ENV_DIR [ENV_DIR ...]
    
    Creates virtual Python environments in one or more target directories.
    . . .
    $ python3 -m webbrowser https://www.google.com
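
A module is usually made runnable this way by guarding its entry point with a `__name__` check. The following is a minimal sketch, assuming a hypothetical file named greet.py in the current directory; the file name and the main() helper are illustrative and do not come from the standard library.

```python
# greet.py (hypothetical file name, used only for illustration)
import sys


def main(argv):
    """Return a greeting for the first command-line argument, if any."""
    name = argv[1] if len(argv) > 1 else "world"
    return f"Hello, {name}!"


if __name__ == "__main__":
    # This branch runs for `python3 greet.py Ada` and for `python3 -m greet Ada`,
    # but not when another module does `import greet`.
    print(main(sys.argv))
```

With that file in place, `python3 -m greet Ada` should print a greeting for Ada, while importing the module only defines main().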

2. Indentations, comments, and multi-line expressions

  • Python uses whitespace indentation, rather than curly brackets or keywords, to delimit blocks. The style guide, PEP 8, recommends four spaces per indentation level.

    • Don’t use tabs, or mix tabs and spaces; it messes up the indent count.

    • When designing the language that became Python, Guido van Rossum decided that the indentation itself was enough to define a program’s structure, and avoided typing all those parentheses and curly braces. Python is unusual in this use of white space to define program structure.

    disaster = True
    if disaster:
        print("Woe!")
    else:
        print("Whee!")
    • As one special case here, the body of a compound statement can instead appear on the same line as the header in Python, after the colon:

      if x > y: print(x)  # Simple statement on header line
  • In Python, the general rule is that the end of a line automatically terminates the statement that appears on that line.

    x = 1  # x = 1;

    Although normally appearing one per line, it is possible to squeeze more than one statement onto a single line in Python by separating them with semicolons:

    a = 1; b = 2; print(a + b) # Three statements on one line
  • Python allows an expression to span multiple lines when it is enclosed in certain delimiters.

    • A backslash (\) at the end of a line also continues the statement on the next line. This form still works in Python 3, but it is error-prone (the backslash must be the very last character on the line, so trailing whitespace or a comment after it breaks the statement) and is rarely needed.

      # Backslash continuation (legal, but error-prone and not recommended)
      long_expression = 1 + 2 + 3 + 4 + 5 + \
                        6 + 7 + 8 + 9 + 10
    • Prefer parentheses (()), brackets ([]), or braces ({}) for multi-line expressions: anything inside an open delimiter continues implicitly onto the following lines.

      # Parentheses for complex calculations
      long_calculation = (a * b +
                          c) * (d /
                                e - f)
      
      # Brackets for multi-line lists or data structures
      data = [
          "item1",
          "item2 with a longer description",
          "item3"
      ]
      
      # Braces for multi-line dictionaries
      person_info = {
          "name": "Alice",
          "age": 30,
          "hobbies": ["reading", "hiking"]
      }
  • A comment is marked by the # character (also called hash, sharp, pound, or the sinister-sounding octothorpe); everything from that point to the end of the current line is part of the comment.

    # 60 sec/min * 60 min/hr * 24 hr/day
    seconds_per_day = 86400
    seconds_per_day = 86400 # 60 sec/min * 60 min/hr * 24 hr/day
    # Python does NOT
    # have a multiline comment.
    print("No comment: quotes make the # harmless.")

3. Types

# Python's reserved words (keywords), as listed by keyword.kwlist.
False               class               from                or
None                continue            global              pass
True                def                 if                  raise
and                 del                 import              return
as                  elif                in                  try
assert              else                is                  while
async               except              lambda              with
await               finally             nonlocal            yield
break               for                 not

# Python's major built-in object types, organized by categories.
Collections:
  Sequences:
    Immutable:
      String:
      Unicode (2.X):
      Bytes (3.X):
      Tuple:
    Mutable:
      List:
      Bytearray (3.X/2.6+):
  Mappings:
    Dictionary:
  Sets:
    Set:
    Frozenset:
Numbers:
  Integers:
    Integer:
    Long (2.X):
    Boolean:
  Float:
  Complex:
  Decimal:
  Fraction:
Callables:
  Function:
  Generator:
  Class:
  Method:
    Bound:
    Unbound (2.X):
Other:
  Module:
  Instance:
  File:
  None:
  View (3.X/2.7):
Internals:
  Type:
  Code:
  Frame:
  Traceback:

Python is a dynamically, strongly typed and garbage-collected programming language.

  • In a dynamically typed language, the data type of a variable is NOT explicitly declared at the time of definition, and is determined at runtime.

    age = 30  # age is an integer (no need to declare the data type explicitly)
    age = "thirty"  # age is now a string
  • In a statically typed language, the data type of a variable MUST be known at compile time (usually through an explicit declaration), and the compiler ensures type compatibility throughout the code.

    // In Java, declare the type of a variable before assigning a value.
    int age = 30;  // age is declared as an integer
    age = "thirty";  // error: incompatible types: String cannot be converted to int
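  • In a strongly typed language, the type of a value does not change implicitly: the compiler or interpreter rejects operations between incompatible types instead of silently coercing one operand. Strong typing does not require declarations, so a language can be both dynamically and strongly typed, as Python is.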

  • In Python, everything is ultimately an object with associated methods and attributes, even values of basic types such as integers and strings. At runtime, Python checks whether an operation is supported by the types of the objects involved.

    # Python takes the type from the assigned value and does not coerce silently.
    name = "Alice"  # name is a string
    name + 10  # TypeError: can only concatenate str (not "int") to str

    In computer programming, duck typing is an application of the duck test—"If it walks like a duck and it quacks like a duck, then it must be a duck"—to determine whether an object can be used for a particular purpose.

     — From Wikipedia, the free encyclopedia
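
Duck typing can be illustrated with a short sketch. The class and function names below (Duck, Robot, make_it_quack) are hypothetical and exist only for this example.

```python
class Duck:
    def quack(self):
        return "Quack!"


class Robot:
    def quack(self):
        return "Beep-quack!"


def make_it_quack(thing):
    # No isinstance() check: any object with a quack() method is accepted.
    return thing.quack()


sounds = [make_it_quack(Duck()), make_it_quack(Robot())]
print(sounds)  # ['Quack!', 'Beep-quack!']

try:
    make_it_quack(42)  # an int has no quack() method
except AttributeError as exc:
    print("Not a duck:", exc)
```

Duck and Robot share no base class; the call works because both happen to provide the method that make_it_quack() uses.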

bool # True, False

int # 47, 25000, 25_000, 0b0100_0000, 0o100, 0x40

float # 3.14, 2.7e5

complex # 3j, 5 + 9j

# In Python 3, strings are Unicode character sequences, not byte arrays.
str # 'alas', "alack", '''a verse attack'''

list # ['Winken', 'Blinken', 'Nod']
tuple # (2, 4, 8)

bytes # b'ab\xff'
bytearray # bytearray(...)

set # set([3, 5, 7])
frozenset # frozenset(['Elsa', 'Otto'])

dict # {}, {'game': 'bingo', 'dog': 'dingo', 'drummer': 'Ringo'}

decimal.Decimal('1.0'), fractions.Fraction(1, 3)  # Decimal and fraction extension types
  • In Python, variables are NOT places, just names: a name is a reference to an object rather than the object itself. An object is a chunk of data that contains at least a type, a unique id, a value, and a reference count.

    import sys

    type(5.20)  # <class 'float'>
    id(5.20)  # 140683748269744 (varies from run to run)
    x = y = z = 0  # More than one variable name can be assigned a value at the same time
    # The exact counts depend on the Python version and build; on Python 3.12 and later,
    # small integers such as 0 are immortal and the reported count does not change.
    sys.getrefcount(x)  # 1000000591
    del y
    sys.getrefcount(x)  # 1000000590
    del z
    sys.getrefcount(x)  # 1000000589
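
One practical consequence of names being references is that a mutable object changed through one name is seen through every other name bound to it. A small sketch, with illustrative variable names:

```python
a = [1, 2, 3]
b = a            # b is a second name for the same list object
b.append(4)      # mutate the object through b

print(a)         # [1, 2, 3, 4]
print(a is b)    # True

c = list(a)      # a new list object with equal contents
c.append(5)      # does not affect a
print(a)         # [1, 2, 3, 4]
print(c)         # [1, 2, 3, 4, 5]
```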
  • A class is the definition of an object, and "class" and "type" mean pretty much the same thing.

    type(7)  # <class 'int'>
    type(7) == int  # True
    isinstance(7, int)  # True
  • Strings, tuples, and lists are the common built-in sequences: ordered collections indexed from zero. Tuples and lists can hold elements of any type, whereas a string holds only characters.

    # iteration
    for item in ['meow', 'bark', 'moo']:
        print(item)
    # enumeration
    for index, item in enumerate(['meow', 'bark', 'moo']):
        print(f'Index: {index}, Item: {item}')
    # comparisons
    ('meow', 'bark', 'moo') == ('meow', 'bark', 'moo')  # True
    ('meow', 'bark', 'moo') >= ('meow', 'bark')  # True
    ('meow', 'bark', 'moo') > ('meow', 'bark')  # True
    # `+`, `*`
    ('cat',) + ('dog', 'cattle')  # ('cat', 'dog', 'cattle')
    ('bark',) * 3  # ('bark', 'bark', 'bark')
    # unpacking
    cat, dog, cattle = ('meow', 'bark', 'moo')
    # testing with `in`
    'c' in 'cat'  # True
    'meow' in ['cat', 'cattle', 'dog']  # False
    # indexing, and slicing a shallow copy subsequence:
    s = 'hello!'  # len(s) is 6
    # s[-7], s[6]  # IndexError: string index out of range
    
    # The slice expression X[I:J:K] is equivalent to indexing with a slice object: X[slice(I, J, K)]:
    #    slice(stop)
    #    slice(start, stop[, step])
    #
    # [:] extracts the entire sequence from start to end.
    # [ start :] specifies from the start offset to the end.
    # [: end ] specifies from the beginning to the end offset minus 1.
    # [ start : end ] indicates from the start offset to the end offset minus 1.
    # [ start : end : step ] extracts from the start offset to the end offset minus 1, skipping characters by step.
    
    # Indexing (S[i]) fetches components at offsets:
    #   The first item is at offset 0.
    #   Negative indexes mean to count backward from the end or right.
    #     Technically, a negative offset is added to the length of a sequence to derive a positive offset.
    #   S[0] fetches the first item.
    #   S[-2] fetches the second item from the end (like S[len(S)-2]).
    #
    # Slicing (S[i:j]) extracts contiguous sections of sequences:
    #   The upper bound is noninclusive.
    #   Slice boundaries default to 0 and the sequence length, if omitted.
    #   S[1:3] fetches items at offsets 1 up to but not including 3.
    #   S[1:] fetches items at offset 1 through the end (the sequence length).
    #   S[:3] fetches items at offset 0 up to but not including 3.
    #   S[:-1] fetches items at offset 0 up to but not including the last item.
    #   S[:] fetches items at offsets 0 through the end, making a top-level copy of S.
    #
    # Extended slicing (S[i:j:k]) accepts a step (or stride) k, which defaults to +1:
    #   Allows for skipping items and reversing order (using a negative stride).
    
    s[:], s[0:6], s[:6], s[:6:], s[0:6:], s[0:6:1]  # ('hello!', 'hello!', 'hello!', 'hello!', 'hello!', 'hello!')
    s[::-1]  # '!olleh'
    len(s), s[-1], s[len(s)-1], s[-len(s)], s[0]  # (6, '!', '!', 'h', 'h')
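
The equivalence between slice expressions and slice objects noted in the comments above can be checked directly. This is a small sketch; the name middle is illustrative.

```python
s = 'hello!'

middle = slice(1, 4)                # the object behind s[1:4]
print(s[middle] == s[1:4])          # True
print(s[slice(None, None, -1)])     # !olleh (same as s[::-1])

# Out-of-range slice bounds are clipped, whereas s[100] would raise IndexError.
print(s[2:100])                     # llo!

# indices() reports the concrete (start, stop, step) for a given length.
print(middle.indices(len(s)))       # (1, 4, 1)
```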
  • In Python, truthiness and falsiness are used to check a value in a Boolean context:

    • Truthy: Values that evaluate to True, which include nonzero numbers, non-empty strings, lists, tuples, and dictionaries, and most other objects.

    • Falsy: Values that evaluate to False, which include False, None, numeric zeros (0, 0.0, 0j), and empty strings (""), lists ([]), tuples (()), dictionaries ({}), and sets (set()).
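
A quick way to see these rules is to pass sample values to bool(). The sketch below uses an illustrative list of values; the first nine are falsy and the rest are truthy.

```python
values = [0, 0.0, "", [], (), {}, set(), None, False,       # falsy
          1, -1, 0.1, "0", [0], (None,), {"k": None}]      # truthy

for value in values:
    print(f"{value!r:12} -> {bool(value)}")

# A common idiom: fall back to a default when a value is falsy.
name = ""
display_name = name or "anonymous"
print(display_name)  # anonymous
```

Note that "0" and [0] are truthy: only emptiness matters for strings and containers, not their contents.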

  • In Python, the logical operators and, or, and not combine conditions. The and and or operators short-circuit and return one of their operands rather than a strict True or False; not always returns a bool.

    letter = 'o'
    if letter == 'a' or letter == 'e' or letter == 'i' or letter == 'o' or letter == 'u':
        print(letter, 'is a vowel')
    else:
        print(letter, 'is not a vowel')
  • int(), float(), bin(), oct(), hex(), chr(), and ord()

    int(True), int(False)  # (1, 0)
    int(98.6), int(1.0e4)  # (98, 10_000)
    int('99'), int('-23'), int('+12'), int('1_000_000')  # (99, -23, 12, 1_000_000)
    
    int('10', 2), 'binary', int('10', 8), 'octal', int('10', 16), 'hexadecimal', int('10', 22), 'chesterdigital'
    # (2, 'binary', 8, 'octal', 16, 'hexadecimal', 22, 'chesterdigital')
    
    float(True), float(False)  # (1.0, 0.0)
    float('98.6'), float('-1.5'), float('1.0e4')  # (98.6, -1.5, 10_000.0)
    
    bin(65), oct(65), hex(65)  # ('0b1000001', '0o101', '0x41')
    
    chr(65), ord('A')  # ('A', 65)
    
    # Python also promotes booleans to integers or floats:
    False + 0, True + 0, False + 0., True + 0.  # (0, 1, 0.0, 1.0)
  • type hints (or type annotations): variable_name: type, def func(argument: type) -> type

    age: int = 30
    pi: float = 3.14159
    def greet(name: str) -> str:
        """Greets the provided name."""
        return f"Hello, {name}!"
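
Type hints are recorded on the function but are not enforced by the interpreter at call time; checking them is left to external tools such as static type checkers. A minimal sketch that reuses the greet() function from above:

```python
def greet(name: str) -> str:
    """Greets the provided name."""
    return f"Hello, {name}!"


# The hints are stored as metadata on the function object.
print(greet.__annotations__["name"])    # <class 'str'>
print(greet.__annotations__["return"])  # <class 'str'>

# Passing an int is not rejected at runtime, despite the str hint.
print(greet(42))  # Hello, 42!
```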
  • Python provides bit-level integer operators, similar to those in the C language.

    x = 5  # 0b0101
    y = 1  # 0b0001
    
    print(f"0b{(x & y):04b}")  # and
    # 0b0001
    print(f"0b{(x | y):04b}")  # or
    # 0b0101
    print(f"0b{(x ^ y):04b}")  # exclusive or
    # 0b0100
    print(f'0b{~x:04b}')  # flip bits
    # 0b-110
    print(f'0b{(x << 1):04b}')  # left shift
    # 0b1010
    print(f'0b{(x >> 1):04b}')  # right shift
    # 0b0010
  • Test for equality: == and is

    # The `==` operator tests value equivalence.
    #   Python performs an equivalence test, comparing all nested objects recursively.
    #
    # The `is` operator tests object identity.
    #   Python tests whether the two are really the same object (i.e., live at the same address in memory).
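    S1 = 'spam'
    S2 = 'spam'
    S1 == S2, S1 is S2  # (True, True)
    # `is` is True here only because CPython happens to reuse one object for identical short
    # string literals (an implementation detail); compare values with `==`.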
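
The difference between the two operators is easier to see with mutable objects, which are never shared implicitly. A short sketch with illustrative names:

```python
L1 = [1, 2, 3]
L2 = [1, 2, 3]   # equal contents, but a separate object
L3 = L1          # another name for the same object

print(L1 == L2, L1 is L2)  # True False
print(L1 == L3, L1 is L3)  # True True

# `is` is the idiomatic test for singletons such as None.
result = None
print(result is None)      # True
```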

4. Strings, bytes and bytearray

In Python 3.X there are three string types: str is used for Unicode text (including ASCII), bytes is used for binary data (including encoded text), and bytearray is a mutable variant of bytes. Files work in two modes: text, which represents content as str and implements Unicode encodings, and binary, which deals in raw bytes and does no data translation.

  • UTF-8 is the standard text encoding in Python, Linux, and HTML.

    Ken Thompson and Rob Pike, whose names will be familiar to Unix developers, designed the UTF-8 dynamic encoding scheme one night on a placemat in a New Jersey diner. It uses one to four bytes per Unicode character:

    • One byte for ASCII

    • Two bytes for accented Latin letters and for several other alphabets, including Greek, Cyrillic, Hebrew, and Arabic

    • Three bytes for the rest of the basic multilingual plane

    • Four bytes for the rest, including rarer CJK characters, emoji, and other symbols

    cafe = 'café'
    
    # len() function on string counts Unicode characters, not bytes:
    len(cafe)  # 4
    
    cafe_bytes = cafe.encode()  # b'caf\xc3\xa9'
    
    # len() returns the number of bytes:
    len(cafe_bytes)  # 5
    
    cafe_text = cafe_bytes.decode()  # 'café'
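
The one-to-four-byte ranges can be checked by encoding one character from each range. The sample characters below are chosen only for illustration and are written as escapes so that the output stays ASCII.

```python
samples = {
    'Latin capital letter A': 'A',
    'Latin small letter e with acute': '\u00e9',
    'Cyrillic capital letter Ya': '\u042f',
    'CJK ideograph U+4E16': '\u4e16',
    'Grinning face emoji': '\U0001f600',
}

for label, ch in samples.items():
    encoded = ch.encode('utf-8')
    print(f'U+{ord(ch):04X} {label}: {len(encoded)} byte(s) {encoded}')

byte_counts = [len(ch.encode('utf-8')) for ch in samples.values()]
print(byte_counts)  # [1, 2, 2, 3, 4]
```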
  • Strings are created by enclosing characters in matching single, double, or triple quotes:

    'Snap'
    "Crackle"
    "'Nay!' said the naysayer. 'Neigh?' said the horse."
    'The rare double quote in captivity: ".'
    '''Boom!'''
    """Eek!"""
  • Triple quotes are very useful to create multiline strings, like this classic poem from Edward Lear:

    poem = '''There was a Young Lady of Norway,
        Who casually sat in a doorway;
        When the door squeezed her flat,
        She exclaimed, "What of that?"
        This courageous Young Lady of Norway.'''
    print(poem)
    There was a Young Lady of Norway,
        Who casually sat in a doorway;
        When the door squeezed her flat,
        She exclaimed, "What of that?"
        This courageous Young Lady of Norway.
    # the line ending characters, and leading or trailing spaces are preserved as below:
    'There was a Young Lady of Norway,\n    Who casually sat in a doorway;\n    When the door squeezed her flat,\n    She exclaimed, "What of that?"\n    This courageous Young Lady of Norway.'
  • Escape with \, combine by using +, duplicate with *

    # adjacent string literals (not string variables) are concatenated automatically
    hi = 'Na ' 'Na ' 'Na ' 'Na ' \
        + 'Hey ' * 4 \
        + '\\' + '\t' + 'Goodbye.'
    print(hi)  # Na Na Na Na Hey Hey Hey Hey \	Goodbye.
  • Python has a few special types of strings, indicated by a letter before the first quote.

    • f or F starts an f-string, used for formatting.

      thing = 'wereduck'
      place = 'werepond'
      print(f'The {thing} is in the {place}')  # 'The wereduck is in the werepond'
    • r or R starts a raw string, which keeps backslashes literal instead of interpreting them as escape sequences.

      info = r'Type a \n to get a new line'  # info = 'Type a \\n to get a new line'
      # raw string does not undo any real (not `\n`) newlines:
      poem = r'''Boys and girls, come out to play.
      The moon doth shine as bright as day.'''  # 'Boys and girls, come out to play.\nThe moon doth shine as bright as day.'
      print(poem)
      Boys and girls, come out to play.
      The moon doth shine as bright as day.
    • fr (or FR, Fr, or fR) combines the two prefixes and starts a raw f-string.

      hello = 'Hello'
      world = '世界'
      print(fr'{hello}, {world}!')  # Hello, 世界!
    • u starts a Unicode string, which is the same as a plain string.

      # Python 3 strings are Unicode character sequences, not byte arrays.
      hi = u'Hello, 世界!'  # same as: hi = 'Hello, 世界!'
    • b starts a value of type bytes.

      b'\x14\xcd\xf3\xa6'  # a bytes literal
      ip = [20, 205, 243, 166]
      bytes(ip)  # b'\x14\xcd\xf3\xa6' (the same value, built from a list of integers)
  • Python has three ways of formatting strings.

    actor = 'Richard Gere'
    cat = 'Chester'
    weight = 28
    # old style (supported in Python 2 and 3): format_string % data
    'My wife\'s favorite actor is %s' % actor  # "My wife's favorite actor is Richard Gere"
    'Our cat %s weighs %d pounds' % (cat, weight)  # 'Our cat Chester weighs 28 pounds'
    'Our cat %(cat)s weighs %(weight)d pounds' % {'cat': cat, 'weight': weight}  # dictionary-based expressions
    # new style (Python 2.6 and up): format_string.format(data)
    '{0}, {1} and {2}'.format('spam', 'ham', 'eggs')  # By position
    '{motto}, {pork} and {food}'.format(motto='spam', pork='ham', food='eggs')  # By keyword
    '{motto}, {0} and {food}'.format('ham', motto='spam', food='eggs')  # By both
    '{}, {} and {}'.format('spam', 'ham', 'eggs')  # By relative position
    # 'spam, ham and eggs'
    # f-strings (Python 3.6 and up): f, F
    f'Our cat {cat} weighs {weight} pounds'  # 'Our cat Chester weighs 28 pounds'
  • Python 3 introduced the following sequences of eight-bit integers, with possible values from 0 to 255, in two types:

    • bytes is immutable, like a tuple of bytes

    • bytearray is mutable, like a list of bytes

    Endian order refers to the byte order used to store multi-byte values (like integers, floats) in computer memory.

    • Big-Endian: In big-endian order, the most significant byte (MSB) of a multi-byte value is stored at the beginning (lower memory address) of the allocated space. The remaining bytes follow in decreasing order of significance.

    • Little-Endian: In little-endian order, the least significant byte (LSB) is stored at the beginning (lower memory address), followed by bytes of increasing significance.
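
Both byte orders can be observed with int.to_bytes() and the struct module. A minimal sketch; the sample value 0x12345678 is arbitrary.

```python
import struct
import sys

value = 0x12345678

big = value.to_bytes(4, byteorder='big')        # most significant byte first
little = value.to_bytes(4, byteorder='little')  # least significant byte first
print(big.hex(), little.hex())  # 12345678 78563412

# struct format strings use '>' for big-endian and '<' for little-endian.
print(struct.pack('>I', value) == big)     # True
print(struct.pack('<I', value) == little)  # True

# Decoding must use the same byte order that was used for encoding.
print(int.from_bytes(little, byteorder='little') == value)  # True

# The native byte order of the machine running the interpreter.
print(sys.byteorder)
```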

    blist = [1, 2, 3, 255]
    
    the_bytes = bytes(blist)
    print(the_bytes)
    # b'\x01\x02\x03\xff'
    
    the_byte_array = bytearray(blist)
    print(the_byte_array)
    # bytearray(b'\x01\x02\x03\xff')
    
    the_bytes[0] = 127  # TypeError: 'bytes' object does not support item assignment
    
    the_byte_array[0] = 127
    
    the_byte_array[1] = 256  # ValueError: byte must be in range(0, 256)
    
    the_bytes = bytes(range(0, 256))
    for i in range(0, len(the_bytes), 16):
        end_index = min(i+16, len(the_bytes))
        print(the_bytes[i:end_index])
    # b'\x00\x01\x02\x03\x04\x05\x06\x07\x08\t\n\x0b\x0c\r\x0e\x0f'
    # b'\x10\x11\x12\x13\x14\x15\x16\x17\x18\x19\x1a\x1b\x1c\x1d\x1e\x1f'
    # b' !"#$%&\'()*+,-./'
    # b'0123456789:;<=>?'
    # b'@ABCDEFGHIJKLMNO'
    # b'PQRSTUVWXYZ[\\]^_'
    # b'`abcdefghijklmno'
    # b'pqrstuvwxyz{|}~\x7f'
    # b'\x80\x81\x82\x83\x84\x85\x86\x87\x88\x89\x8a\x8b\x8c\x8d\x8e\x8f'
    # b'\x90\x91\x92\x93\x94\x95\x96\x97\x98\x99\x9a\x9b\x9c\x9d\x9e\x9f'
    # b'\xa0\xa1\xa2\xa3\xa4\xa5\xa6\xa7\xa8\xa9\xaa\xab\xac\xad\xae\xaf'
    # b'\xb0\xb1\xb2\xb3\xb4\xb5\xb6\xb7\xb8\xb9\xba\xbb\xbc\xbd\xbe\xbf'
    # b'\xc0\xc1\xc2\xc3\xc4\xc5\xc6\xc7\xc8\xc9\xca\xcb\xcc\xcd\xce\xcf'
    # b'\xd0\xd1\xd2\xd3\xd4\xd5\xd6\xd7\xd8\xd9\xda\xdb\xdc\xdd\xde\xdf'
    # b'\xe0\xe1\xe2\xe3\xe4\xe5\xe6\xe7\xe8\xe9\xea\xeb\xec\xed\xee\xef'
    # b'\xf0\xf1\xf2\xf3\xf4\xf5\xf6\xf7\xf8\xf9\xfa\xfb\xfc\xfd\xfe\xff'
  • regular expressions

    import re
    
    p = 'Les Fleurs du Mal'  # pattern
    c = re.compile(p)  # compile
    s = "Charles Baudelaire's 'Les Fleurs du Mal'"  # source
    m = c.search(s)  # match
    if m:  # same as: m is not None
        print("Mon cœur est comme une feuille sèche, emportée par le vent...")
    m = re.match('Les Fleurs du Mal', s)  # find exact beginning match with match()
    print(m)  # return a Match object
    # None
    
    m = re.search('Les Fleurs du Mal', s)  # find first match with search()
    print(m)  # return a Match object
    # <re.Match object; span=(22, 39), match='Les Fleurs du Mal'>
    
    m = re.findall('es', s)  # find all matches with findall()
    print(m)  # return a list
    # ['es', 'es']
    
    m = re.split(r'\s', s)  # split at matches with split()
    print(m)  # return a list
    # ['Charles', "Baudelaire's", "'Les", 'Fleurs', 'du', "Mal'"]
    
    m = re.sub("'", '?', s)  # replace at matches with sub()
    print(m)  # return a string
    # Charles Baudelaire?s ?Les Fleurs du Mal?

5. If, while, and for

  • In Python (version 3.8 and above), the walrus operator (:=, formally known as the assignment expression operator) assigns a value to a name as part of a larger expression, so the value can be tested and reused in one step.

    tweet_limit = 280
    tweet_string = "Blah" * 50
    if (diff := tweet_limit - len(tweet_string)) >= 0:  # walrus operator; parenthesized because := binds more loosely than >=
        print("A fitting tweet")
    else:
        print("Went over by", abs(diff))
  • Compare with if, elif, and else:

    color = "mauve"
    if color == "red":
        print("It's a tomato")
    elif color == "green":
        print("It's a green pepper")
    else:
        print("I've never heard of the color", color)
  • The if/else ternary expression:

    # Python runs expression Y only if X turns out to be true, and runs expression Z only if X turns out to be false.
    # A = Y if X else Z  # same as `((X and Y) or Z)` only when Y is truthy
    A = 't' if 'spam' else 'f'  # (('spam' and 't') or 'f')
    A  # 't'
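
The and/or form differs from the ternary expression whenever the "true" branch is itself falsy. A short sketch with illustrative values:

```python
X, Y, Z = True, 0, 'fallback'

ternary = Y if X else Z   # picks Y because X is true
and_or = (X and Y) or Z   # X and Y is 0, which is falsy, so Z is returned

print(ternary)  # 0
print(and_or)   # fallback
```

For that reason the ternary expression is usually the safer choice.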
  • Dictionary-based multiway branching:

    # Handling switch defaults
    branch = {'spam': 1.25,
              'ham': 1.99,
              'eggs': 0.99}
    print(branch.get('spam', 'Bad choice'))  # 1.25
    print(branch.get('bacon', 'Bad choice'))  # Bad choice
    # membership test in an if statement can have the same default effect:
    choice = 'bacon'
    if choice in branch:
        print(branch[choice])
    else:
        print('Bad choice')  # Bad choice
    
    # handle defaults by catching and handling the exceptions they'd otherwise trigger:
    try:
        print(branch[choice])
    except KeyError:
        print('Bad choice')
    
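    # Handling larger actions
    def function(): return 'ham action'  # placeholder action, for illustration only
    def default(): return 'Bad choice'  # placeholder fallback, for illustration only
    branch = {'spam': lambda: 'spam action',  # A table of callable function objects
              'ham': function,
              'eggs': lambda: 'eggs action'}
    branch.get(choice, default)()  # 'Bad choice', because choice is 'bacon'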
  • Repeat with while, and break, continue, and else:

    while True:
        value = input("Integer, please [q to quit]: ")
        if value == 'q':  # quit
            break
        number = int(value)
        if number % 2 == 0:  # an even number
            continue
        print(number, "squared is", number*number)
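    x = ['spam', 'ham', 'Ni!']  # sample data, for illustration only
    def match(item): return item == 'Ni!'  # placeholder test, for illustration only
    while x:  # Exit when x empty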
        if match(x[0]):  # Value at front?
            print('Ni')
            break  # Exit, go around else
        x = x[1:]  # Slice off front and repeat
    else:  # break not called
        print('Not found')  # Only here if exhausted x
  • Iterate with for/in, and break, continue and else:

    word = 'thud'
    for letter in word:
        if letter == 'u':
            continue
        print(letter)
    word = 'thud'
    for letter in word:
        if letter == 'x':
            print("Eek! An 'x'!")
            break
        print(letter)
    else:  # break not called
        print("No 'x' in there.")
    # counter loops: range
    for num in range(0, 10, 2):
        print(num)  # 0 2 ... 8
    # generating both offsets and items: enumerate
    for (index, item) in enumerate('spam'):
        print(f'{index}: {item}', end='\t')  # 0: s	1: p	2: a	3: m
    # parallel traversals: zip
    for nums in zip(range(0, 10, 2), range(1, 10, 2)):
        print(nums)  # (0, 1) (2, 3) .. (8, 9)

6. Tuples and lists

  • Tuples are built-in immutable sequences.

    # to make a tuple with one or more elements, follow each element with a comma (`,`):
    'cat',  # ('cat',)
    'cat', 'dog', 'cattle'  # ('cat', 'dog', 'cattle')
    
    # to make an empty tuple, using `()`, or `tuple()`:
    ()  # ()
    tuple()  # ()
    
    # the comma is required to make a tuple
    ('cat')  # 'cat'
    
    # parentheses are not required, but they can make the tuple more visible
    ('cat',)  # ('cat',)
    ('cat', 'dog', 'cattle')  # ('cat', 'dog', 'cattle')
    
    # where commas might also have another use, parentheses are needed
    type('cat',)  # <class 'str'>
    type(('cat',))  # <class 'tuple'>
    
    # tuple()
    tuple('cat')  # ('c', 'a', 't')
    
    # zip()
    for x in zip([1, 2, 8], [1, 4, 9], ('cat', 'dog', 'cattle', 'chicken')):
        print(x)
    # (1, 1, 'cat')
    # (2, 4, 'dog')
    # (8, 9, 'cattle')
    # named tuples are a tuple/class/dictionary hybrid.
    from collections import namedtuple  # import extension type
    Rec = namedtuple('Rec', ['name', 'age', 'jobs'])  # make a generated class
    bob = Rec('Bob', age=40.5, jobs=['dev', 'mgr'])  # a named-tuple record
    print(bob)  # Rec(name='Bob', age=40.5, jobs=['dev', 'mgr'])
    
    bob[0], bob[2]  # access by position
    # ('Bob', ['dev', 'mgr'])

    bob.name, bob.jobs  # access by attribute
    # ('Bob', ['dev', 'mgr'])

    # converting to a dictionary supports key-based behavior when needed:
    O = bob._asdict()  # dictionary-like form
    O['name'], O['jobs']  # access by key too
    # ('Bob', ['dev', 'mgr'])
    O
    # {'name': 'Bob', 'age': 40.5, 'jobs': ['dev', 'mgr']}  (a plain dict since Python 3.8; an OrderedDict before that)
  • Lists are built-in mutable sequences.

    # create with `[]` or `list()`
    []  # []
    ['meow', 'bark', 'moo']  # ['meow', 'bark', 'moo']
    [('cat', 'meow'), 'bark', 'moo']  # [('cat', 'meow'), 'bark', 'moo']
    list()  # []
    list('cat')  # ['c', 'a', 't']
    
    # append(), insert()
    wow = ['meow']  # ['meow']
    wow.append('moo')  # ['meow', 'moo']
    wow.insert(1, 'bark')  # ['meow', 'bark', 'moo']
    
    # index, and slice assignment
    L = ['spam', 'Spam', 'SPAM!']
    # index assignment
    L[1] = 'eggs'  # ['spam', 'eggs', 'SPAM!']
    # slice assignment: delete+insert
    L[0:2] = ['eat', 'more']  # ['eat', 'more', 'SPAM!']
    
    # del, remove(), pop(), clear()
    farm = ['cat', 'dog', 'cattle', 'chicken', 'duck']
    
    del farm[-1]
    # ['cat', 'dog', 'cattle', 'chicken']
    
    farm.remove('dog')
    # ['cat', 'cattle', 'chicken']
    
    farm.pop()  # 'chicken'
    # ['cat', 'cattle']
    
    farm.pop(-1)  # 'cattle'
    # ['cat']
    
    farm.clear()
    # []
    
    # sort() and sorted()
    farm = ['cat', 'dog', 'cattle']
    
    # a sorted copy
    sorted(farm)  # ['cat', 'cattle', 'dog']
    print(farm)  # ['cat', 'dog', 'cattle']
    
    # sorting in-place
    farm.sort()
    print(farm)  # ['cat', 'cattle', 'dog']
    
    # shallow copy: a new outer list whose items are the same objects as in the original, so in-place changes to a nested mutable element are visible through every copy.
    a = [['cat', 'meow'], ['dog', 'bark']]
    c = a[:]
    b = a.copy()  # equivalent to slicing with [:]
    d = list(c)
    
    # deep copy: changes to elements within the original list won't affect the copy (and vice versa) because they point to different objects in memory.
    import copy
    e = copy.deepcopy(a)
    
    a[0][1] = 'moo'
    a  # [['cat', 'moo'], ['dog', 'bark']]
    b  # [['cat', 'moo'], ['dog', 'bark']]
    c  # [['cat', 'moo'], ['dog', 'bark']]
    d  # [['cat', 'moo'], ['dog', 'bark']]
    
    e  # [['cat', 'meow'], ['dog', 'bark']]
    
    # list comprehensions: [expression for item in iterable]
    even_numbers = [2 * num for num in range(5)]
    # [0, 2, 4, 6, 8]
    # list comprehensions: [expression for item in iterable if condition]
    odd_numbers = [num for num in range(10) if num % 2 == 1]
    # [1, 3, 5, 7, 9]

7. Dictionaries and sets

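In Python, dictionary keys and set elements must be hashable. The immutable built-in types (numbers, strings, bytes, frozensets, and tuples that contain only hashable items) qualify; mutable containers such as lists, dicts, and sets do not.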
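
The hashability rule can be seen in a short sketch; the variable names are illustrative.

```python
# Tuples of hashable items work as dictionary keys.
point_names = {(0, 0): 'origin', (1, 0): 'unit x'}
print(point_names[(0, 0)])  # origin

# A list is mutable and unhashable, so it cannot be a key.
try:
    bad = {[0, 0]: 'origin'}
except TypeError as exc:
    print('TypeError:', exc)

# A tuple is hashable only if everything inside it is hashable too.
try:
    hash((1, [2, 3]))
except TypeError as exc:
    print('TypeError:', exc)

# frozenset is the hashable counterpart of set, so sets of frozensets are allowed.
groups = {frozenset({'a', 'b'}), frozenset({'c'})}
print(frozenset({'b', 'a'}) in groups)  # True
```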

Dictionaries

# `{}`
{}  # {}
{'cat': 'meow', 'dog': 'bark'}  # {'cat': 'meow', 'dog': 'bark'}

# dict(): keyword argument names need to be legal variable names (no spaces, no reserved words)
dict(cat='meow', dog='bark')  # {'cat': 'meow', 'dog': 'bark'}

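# dict(): from a sequence of two-item (key, value) sequences, or from keys and values zipped together
dict([['cat', 'meow'], ['dog', 'bark']])  # {'cat': 'meow', 'dog': 'bark'}
dict(zip(['cat', 'dog'], ['meow', 'bark']))  # {'cat': 'meow', 'dog': 'bark'}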

# [key], get()
animals = {'cat': 'meow', 'dog': 'bark'}
animals['cattle'] = 'moo'  # {'cat': 'meow', 'dog': 'bark', 'cattle': 'moo'}
animals['cat']  # 'meow'
animals['sheep']  # KeyError: 'sheep'
animals.get('sheep')  # None
animals.get('sheep', 'baa')  # 'baa'

# testing
animals = {'cat': 'meow', 'dog': 'bark', 'cattle': 'moo'}
'cat' in animals  # True
'sheep' in animals  # False
animals['sheep'] if 'sheep' in animals else 'oops!'  # 'oops!'

# keys(), values(), items(), len()
animals.keys()  # dict_keys(['cat', 'dog', 'cattle'])
animals.values()  # dict_values(['meow', 'bark', 'moo'])
animals.items()  # dict_items([('cat', 'meow'), ('dog', 'bark'), ('cattle', 'moo')])
len(animals)  # 3

# `**`, update()
{**{'cat': 'meow'}, **{'dog': 'bark'}}  # {'cat': 'meow', 'dog': 'bark'}
animals = {'cat': 'meow'}
animals.update({'dog': 'bark'})  # {'cat': 'meow', 'dog': 'bark'}

# del, pop(), clear()
animals = {'cat': 'meow', 'dog': 'bark', 'cattle': 'moo'}
del animals['dog']
# {'cat': 'meow', 'cattle': 'moo'}
animals.pop('cattle')  # 'moo'
# {'cat': 'meow'}
animals.clear()
# {}

# iterations
animals = {'cat': 'meow', 'dog': 'bark', 'cattle': 'moo'}
for key in animals:  # for key in animals.keys()
    print(f'{key} => {animals[key]}', end='\t')
# cat => meow	dog => bark	cattle => moo

# dictionary comprehensions: {key_expression: value_expression for item in iterable}
word = 'letters'
letter_counts = {letter: word.count(letter) for letter in word}
# {'l': 1, 'e': 2, 't': 2, 'r': 1, 's': 1}

# dictionary comprehensions: {key_expression: value_expression for item in iterable if condition}
vowels = 'aeiou'
word = 'onomatopoeia'
vowel_counts = {letter: word.count(letter)
                for letter in set(word) if letter in vowels}
# {'i': 1, 'o': 4, 'a': 2, 'e': 1}
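Calling word.count(letter) inside the comprehension rescans the word once per letter. A minimal single-pass sketch of the same counts follows, first with get() and then with collections.Counter from the standard library (an alternative these notes do not otherwise use):

```python
from collections import Counter

word = 'letters'

# single pass with get(): start from 0 when the key is not present yet
letter_counts = {}
for letter in word:
    letter_counts[letter] = letter_counts.get(letter, 0) + 1
print(letter_counts)  # {'l': 1, 'e': 2, 't': 2, 'r': 1, 's': 1}

# Counter is a dict subclass built for counting
counter = Counter(word)
print(counter['t'])  # 2
print(counter['z'])  # 0 (a missing key counts as zero instead of raising KeyError)
```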

Sets

# `{}`, set(), frozenset()
type({})  # <class 'dict'>: {} creates an empty dict, not an empty set
{0, 2, 4, 6}  # {0, 2, 4, 6}

set()  # set()
set('letter')  # {'l', 't', 'r', 'e'}
set({'cat': 'meow', 'dog': 'bark', 'cattle': 'moo'})  # {'cat', 'cattle', 'dog'}

frozenset()  # frozenset()
frozenset([3, 1, 4, 1, 5, 9])  # frozenset({1, 3, 4, 5, 9})

# len(), add(), remove()
nums = {0, 1, 2, 3, 4, }
len(nums)  # 5
nums.add(5)  # {0, 1, 2, 3, 4, 5}
nums.remove(0)  # {1, 2, 3, 4, 5}

# iteration
for num in {0, 2, 4, 6, 8}:
    print(num, end='\t')
# 0	2	4	6	8

# testing
2 in {0, 2, 4}  # True
3 in {0, 2, 4}  # False

# `&`: intersection(), `|`: union(), `-`: difference(), `^`: symmetric_difference()
a = {1, 3}
b = {2, 3}
a & b  # {3}
a | b  # {1, 2, 3}
a - b  # {1}
a ^ b  # {1, 2}

# `<=`: issubset(), `<`: proper subset, `>=`: issuperset(), `>`: proper superset
a <= b  # False
a < b  # False
a >= b  # False
a > b  # False

# set comprehensions: {expression for item in iterable}
{num for num in range(10)}  # {0, 1, 2, 3, 4, 5, 6, 7, 8, 9}
# set comprehensions: {expression for item in iterable if condition}
{num for num in range(10) if num % 2 == 0}  # {0, 2, 4, 6, 8}

8. Iterations and comprehensions

The terms "iterable" and "iterator" are sometimes used interchangeably to refer to an object that supports iteration in general. For clarity, these notes use iterable for an object that supports the iter call, and iterator for the object that iter returns, which supports the next(I) call.

Any object with a __next__ method that advances to the next result and raises StopIteration at the end of the series is an iterator. Such an object can be stepped through with a for loop or any other iteration tool, because these tools work internally by calling __next__ on each iteration and catching StopIteration to decide when to exit.

print(open('script2.py').read())
# import sys
# print(sys.path)
# x = 2
# print(x**32)

f = open('script2.py')
f.__next__()
# 'import sys\n'
f.__next__()
# 'print(sys.path)\n'
f.__next__()
# 'x = 2\n'
f.__next__()
# 'print(x**32)\n'
f.__next__()
# Traceback (most recent call last):
#   File "<stdin>", line 1, in <module>
# StopIteration
# manual iteration: what for loops usually do
with open('script2.py', 'rt', encoding='utf-8') as fi:
    while True:
        try:
            # To simplify manual iteration code, Python 3.X also provides a built-in function, next,
            # that automatically calls an object’s __next__ method.
            line = fi.__next__()  # same as: line = next(fi)
            print(line, end='')
        except StopIteration:
            break
for line in open('script2.py'):  # use file iterators to read by lines
    print(line.upper(), end='')  # calls __next__, catches StopIteration

When the for loop begins, it first uses the iteration protocol to obtain an iterator from the iterable object by passing it to the iter built-in function; the object returned by iter in turn has the required __next__ method. The iter function internally runs the __iter__ method, just as next runs __next__.

The Python iteration protocol is used by for loops, comprehensions, maps, and more, and is supported by files, lists, dictionaries, generators, and more. It involves two kinds of objects:

  • The iterable object you request iteration for, whose __iter__ is run by iter.

  • The iterator object returned by the iterable that actually produces values during the iteration, whose __next__ is run by next and raises StopIteration when finished producing results.

    L = [1, 2, 3]  # iterable
    I = iter(L)  # iterator
    next(I)
    # 1
    next(I)
    # 2
    next(I)
    # 3
    next(I)
    # Traceback (most recent call last):
    #   File "<stdin>", line 1, in <module>
    # StopIteration
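The two roles can also be implemented by hand. A minimal sketch, assuming two hypothetical classes (Countdown and CountdownIterator are names made up for this example): the iterable hands out a fresh iterator on every iter call, and the iterator keeps the position and raises StopIteration when it is done.

```python
class Countdown:
    """Iterable: each call to iter() returns a fresh iterator."""

    def __init__(self, start):
        self.start = start

    def __iter__(self):
        return CountdownIterator(self.start)


class CountdownIterator:
    """Iterator: produces values through __next__ and ends with StopIteration."""

    def __init__(self, current):
        self.current = current

    def __iter__(self):
        return self  # an iterator returns itself from __iter__

    def __next__(self):
        if self.current <= 0:
            raise StopIteration
        value = self.current
        self.current -= 1
        return value


countdown = Countdown(3)
print(list(countdown))  # [3, 2, 1]
print(list(countdown))  # [3, 2, 1] (a new iterator each time)

it = iter(countdown)
print(next(it))  # 3
print(list(it))  # [2, 1] (the iterator remembers its position)
print(list(it))  # [] (exhausted)
```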

Iteration contexts in Python include the for loop; list comprehensions; the map built-in function; the in membership test expression; and the built-in functions sorted, sum, any, and all, and also includes the list and tuple built-ins, string join methods, and sequence assignments, all of which use the iteration protocol to step across iterable objects one item at a time.
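A short sketch of several of these iteration contexts consuming the same kind of iterable. The generator function evens is a hypothetical helper written only for this example:

```python
def evens(limit):
    """Hypothetical generator used here as a generic iterable."""
    for n in range(0, limit, 2):
        yield n


print(sorted(evens(10), reverse=True))  # [8, 6, 4, 2, 0]
print(sum(evens(10)))  # 20
print(any(n > 6 for n in evens(10)))  # True
print(all(n % 2 == 0 for n in evens(10)))  # True
print(4 in evens(10))  # True (membership test steps through the items)
print(tuple(evens(6)))  # (0, 2, 4)
print('-'.join(map(str, evens(6))))  # 0-2-4
a, b, c = evens(6)  # sequence assignment
print(a, b, c)  # 0 2 4
```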

Technically speaking, list comprehensions are never really required, because a list of expression results can always be built up manually with a for loop. However, a list comprehension often runs faster than the equivalent for loop with append calls, because the loop and the appends are carried out by specialized bytecode inside the interpreter rather than by explicit Python-level method calls. The size of the gain depends on the Python version and the workload.

L = [1, 2, 3, 4, 5]
res = []
for x in L:
    res.append(x+10)
print(res)  # [11, 12, 13, 14, 15]
res2 = [x + 10 for x in L]
print(res2)  # [11, 12, 13, 14, 15]
# filter clauses: if
[line.rstrip() for line in open('script2.py') if line[0] == 'p']
# nested loops: for
[x + y for x in 'abc' for y in 'lmn']
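How much faster a comprehension is than an explicit loop can be measured rather than assumed. A rough sketch with the standard timeit module; the function names and the list size are arbitrary choices for this example, and the timings vary by machine and Python version, so no output is shown:

```python
import timeit

L = list(range(1000))


def with_loop():
    res = []
    for x in L:
        res.append(x + 10)
    return res


def with_comprehension():
    return [x + 10 for x in L]


assert with_loop() == with_comprehension()  # same result either way

# total seconds for 1000 calls of each function
print('for loop:     ', timeit.timeit(with_loop, number=1000))
print('comprehension:', timeit.timeit(with_comprehension, number=1000))
```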

9. Files and directories

A file is a sequence of bytes, stored in some filesystem, and accessed by a filename. A directory (or folder) is a collection of files, and possibly other directories.

  • Text files represent content as normal str strings, perform Unicode encoding and decoding automatically, and perform end-of-line translation by default.

  • Binary files represent content as a special bytes string type and allow programs to access file content unaltered.

  • open(filename, mode): Opens a file in the specified mode, and returns a file object used for reading or writing data.

    • file.read(size): Read a specified number of characters (or bytes) from the file (or all remaining bytes if no size is provided).

    • file.readline(): Read a single line from the file.

    • file.readlines(): Read all lines from the file into a list.

    • for line in open('data'): use line: File iterators read line by line.

    • file.write(data): Write a string of characters (or bytes) data to the file.

    • file.writelines(aList): Write all line strings in a list into file.

    • file.flush(): Flush output buffer to disk without closing.

    • file.seek(N): Change file position to offset N for next operation.

    • mode (optional): a string that specifies how the file will be opened, which determines the access permissions and how newline characters (for text files) are handled.

      • r (read): Opens the file for reading. The file must exist, or an error will be raised.

      • w (write): Opens the file for writing. An existing file will be truncated (emptied) before writing. If the file doesn’t exist, it will be created.

      • a (append): Opens the file for appending. New data will be written to the end of the file. If the file doesn’t exist, it will be created.

      • x (exclusive creation): Attempts to create a new file. If the file already exists, an error will be raised.

      • r+ (read and write): Opens the file for both reading and writing. The file must exist.

      • w+ (read and write): Opens the file for both reading and writing. An existing file will be truncated before any operations. If the file doesn’t exist, it will be created.

      • a+ (append and read): Opens the file for both appending and reading. If the file doesn’t exist, it will be created.

      • By default, Python opens files in text mode (t). On reading, the line endings \n, \r, and \r\n are all translated to \n; on writing, \n is translated to the platform's line separator (CRLF on Windows, LF on Unix/Linux).

      • The binary mode (b) can be specified by appending it to any mode (e.g., rb, wb); it treats the file as a raw stream of bytes without newline conversion.

      • Universal newline handling is the default in Python 3 text mode and is controlled by the newline argument of open(); the old U mode flag from Python 2 was deprecated and then removed in Python 3.11.

      poem = '''
      Je suis l'automne, la saison des pluies,
      Le temps des fruits mûrs et des feuilles jaunies,
      Le soleil pâle et les jours qui décroissent,
      Le vent qui hurle et les chaumes qui gémissent.
      
      Je suis l'automne, la saison des regrets,
      Le temps où meurent les amours et les joies,
      Le temps des souvenirs et des larmes secrètes,
      Le temps des nuits longues et des tristesses froides.
      
      Je suis l'automne, la saison des douleurs,
      Le temps des fièvres et des maladies,
      Le temps où l'on se sent mourir sans pouvoir guérir,
      Le temps où l'on voudrait mourir et qu'on n'ose pas.
      
      Je suis l'automne, la saison de la mort,
      Le temps où l'on se couche dans la terre humide,
      Le temps où l'on dort pour toujours sans rêver,
      Le temps où l'on ne souffre plus et qu'on n'aime plus.
      '''
      
      with open('autumn_song.txt', 'w+', encoding='utf-8') as fio:  # explicit encoding: the poem contains non-ASCII characters
          fio.write(poem)
      
          fio.seek(0)
          lines = fio.readlines()
          for line in lines:
              print(line, sep='', end='')
      
          fio.seek(0)
          for line in fio:  # iterate over the lines of the file object
              print(line, sep='', end='')
  • os.mkdir(directory_name): Create a single directory.

  • os.makedirs(directory_path): Create a directory along with any missing parent directories; pass exist_ok=True to avoid an error when the directory already exists.

  • os.remove(filename): Delete a single file.

  • shutil.rmtree(directory_path): Delete a directory and its contents recursively.

  • os.rename(old_name, new_name): Rename a file or directory.

  • os.getcwd(): Get the current working directory.

  • os.chdir(new_path): Change the working directory.

  • os.listdir(directory_path): Get a list of files and subdirectories within a directory.

  • os.path.exists(path): Check if a file or directory exists.

  • os.path.getsize(path): Get the size of a file in bytes.

  • os.path.isdir(path): Check if it’s a directory.

  • os.path.isfile(path): Check whether a path is a regular file.

  • os.walk(directory): Iterate through a directory recursively, yielding a 3-tuple for each directory containing its path, subdirectories, and filenames.

  • glob.glob(pathname): Return a list of paths matching a pathname pattern.
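Several of the calls listed above can be tried together without touching real files. A minimal sketch that works inside a throwaway directory created with tempfile; the directory and file names (animals, cats, tom.txt) are made up for this example:

```python
import glob
import os
import shutil
import tempfile

base = tempfile.mkdtemp()  # throwaway directory, so nothing real is touched
try:
    nested = os.path.join(base, 'animals', 'cats')
    os.makedirs(nested)  # creates 'animals' and 'animals/cats'

    path = os.path.join(nested, 'tom.txt')
    with open(path, 'w', encoding='utf-8') as f:
        f.write('mûr\n')

    with open(path, 'r', encoding='utf-8') as f:
        print(repr(f.read()))  # 'mûr\n' (text mode: decoded str)
    with open(path, 'rb') as f:
        data = f.read()  # binary mode: raw bytes, 'û' takes two bytes in UTF-8
    print(type(data))  # <class 'bytes'>

    print(os.path.isfile(path), os.path.isdir(nested))  # True True
    print(os.listdir(os.path.join(base, 'animals')))  # ['cats']

    # one (path, subdirectories, filenames) tuple per directory
    for dirpath, dirnames, filenames in os.walk(base):
        print(os.path.relpath(dirpath, base), dirnames, filenames)

    matches = glob.glob(os.path.join(base, '**', '*.txt'), recursive=True)
    print(len(matches))  # 1
finally:
    shutil.rmtree(base)  # delete the directory and everything in it

print(os.path.exists(base))  # False
```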

10. Functions

# Function-related statements and expressions

# call expressions
myfunc('spam', 'eggs', meat=ham, *rest)

# def
def printer(message):
    print('Hello ' + message)

# return
def adder(a, b=1, *c):
    return a + b + c[0]

# global
x = 'old'
def changer():
    global x; x = 'new'

# nonlocal (3.X)
def outer():
    x = 'old'
    def changer():
        nonlocal x; x = 'new'

# yield
def squares(x):
  for i in range(x): yield i ** 2

# lambda
funcs = [lambda x: x**2, lambda x: x**3]
# pass
def do_nothing():
    pass  # NOOP
do_nothing()

Python 3.X (but not 2.X) allows ellipses coded as ... (literally, three consecutive dots) to appear any place an expression can. Because ellipses do nothing by themselves, this can serve as an alternative to the pass statement, especially for code to be filled in later, as a sort of Python "TBD":

def func1():
    ... # Alternative to pass
def func2():
    ...
func1() # Does nothing if called

Ellipses can also appear on the same line as a statement header and may be used to initialize variable names if no specific type is required:

def func1(): ... # Works on same line too
def func2(): ...
X = ... # Alternative to None
X  # Ellipsis

This notation is new in Python 3.X, and goes well beyond the original intent of ... in slicing extensions, so time will tell if it becomes widespread enough to challenge pass and None in these roles.

# None
def whatis(thing):  # with type hints: def whatis(thing: object) -> None:
    if thing is None:
        print(thing, "is None")
    elif thing:
        print(thing, "is True")

whatis(None)  # None is None
# docstring
def echo(anything):
    'echo returns its input argument'
    return anything

print(echo.__doc__)  # echo returns its input argument
help(echo)
# arguments
def menu(wine, entree, dessert):
    return {'wine': wine, 'entree': entree, 'dessert': dessert}

# positional arguments: matched to parameters by order
menu('chardonnay', 'chicken', 'cake')
# {'wine': 'chardonnay', 'entree': 'chicken', 'dessert': 'cake'}

# keyword arguments: passed by name
menu(entree='beef', dessert='bagel', wine='bordeaux')
# {'wine': 'bordeaux', 'entree': 'beef', 'dessert': 'bagel'}

# mix positional and keyword arguments
menu('frontenac', dessert='flan', entree='fish')
# {'wine': 'frontenac', 'entree': 'fish', 'dessert': 'flan'}
# optional positional arguments
def print_args(*args):
    print(args)  # gather as a tuple

print_args()
# ()
print_args('meow', 'bark', 'moo')
# ('meow', 'bark', 'moo')
print_args(('meow', 'bark', 'moo'))
# (('meow', 'bark', 'moo'),)
print_args(*('meow', 'bark', 'moo'))  # explode a tuple with `*`
# ('meow', 'bark', 'moo')
# optional keyword arguments
def print_kargs(**kargs):
    print(kargs)  # gather as a dict

print_kargs()
# {}
print_kargs(cat='meow', dog='bark', cattle='moo')
# {'cat': 'meow', 'dog': 'bark', 'cattle': 'moo'}
print_kargs(**{'cat': 'meow', 'dog': 'bark', 'cattle': 'moo'})  # explode a dict with `**`
# {'cat': 'meow', 'dog': 'bark', 'cattle': 'moo'}
# default parameters
def menu(wine, entree, dessert='pudding'):
    return {'wine': wine, 'entree': entree, 'dessert': dessert}

menu('chardonnay', 'chicken')
# {'wine': 'chardonnay', 'entree': 'chicken', 'dessert': 'pudding'}
# keyword-only arguments `*`
def print_data(data, *, start=0, end=100):
    """
    the parameters start and end must be provided as keyword (named) arguments
    """
    for v in data[start:end]:
        print(v, end='\t')

print_data(('meow', 'bark', 'moo'))
# meow	bark	moo
print_data(('meow', 'bark', 'moo'), start=1)
# bark	moo
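def the_order_of_arguments(
    required: str,
    optional: str | None = None,
    *args: object,  # the annotation describes each extra argument, not the tuple
    key: str | None = None,
    **kwargs: object  # likewise, the annotation describes each value
) -> None: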
  """
  This function demonstrates the order of arguments in Python.

  Args:
      required (str): A required positional argument.
      optional (str, optional): An optional positional argument with a default value of None.
      *args (tuple, optional): Captures any remaining positional arguments as a tuple.
      key (str, optional): A keyword-only argument with a default value of None.
      **kwargs (dict, optional): Captures any remaining keyword arguments as a dictionary.

  Returns:
      None
  """
  # Function body (can be replaced with actual logic)
  print(f"Required argument: {required}")
  print(f"Optional argument: {optional}")
  print(f"Positional arguments (as tuple): {args}")
  print(f"Keyword-only argument: {key}")
  print(f"Keyword arguments (as dictionary): {kwargs}")

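the_order_of_arguments("This is required", "This is optional", x=10, y="hello")
# Required argument: This is required
# Optional argument: This is optional
# Positional arguments (as tuple): ()
# Keyword-only argument: None
# Keyword arguments (as dictionary): {'x': 10, 'y': 'hello'}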
# functions are first-class citizens
def answer():
    print(42)

def run_sth(func):
    func()

run_sth(answer)  # 42

# inner functions
def outer(a, b):
    def inner(c, d):
        return c+d
    return inner(a, b)

# closures
def wow(voice):
    def inner():
        return f'Wow: {voice}'
    return inner

cat = wow('meow')
dog = wow('bark')
cat()  # 'Wow: meow'
dog()  # 'Wow: bark'

# recursion
def flatten(lol):
    for item in lol:
        if isinstance(item, list):
            yield from flatten(item)  # yield from expression
        else:
            yield item

lol = [1, 2, [3, 4, 5], [6, [7, 8, 9], []]]
list(flatten(lol))
# [1, 2, 3, 4, 5, 6, 7, 8, 9]

# anonymous functions: lambda
def is_odd(num):
    return num % 2 == 1

nums = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
list(filter(is_odd, nums))
# [1, 3, 5, 7, 9]
list(filter(lambda num: num % 2 == 0, nums))
# [0, 2, 4, 6, 8]

10.1. Generators

A generator is an object that produces a sequence of values lazily, one at a time; it is itself an iterator.

  • It can be used to iterate through potentially huge sequences without creating and storing the entire sequence in memory at once.

  • Each time a generator is iterated, it resumes where it left off on the previous call and returns the next value.

  • A generator can be run only once; it cannot be restarted or backed up.

  • A generator function is a normal function, but it returns its values with a yield statement rather than return.

    def xrange(start=0, stop=10, step=1):
        number = start
        while number < stop:
            yield number
            number += step
    
    ranger = xrange(1, 5)
    print(ranger)  # <generator object xrange at 0x7f119757b220>
    
    for num in ranger:
        print(num, end='\t')  # 1	2	3	4
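The single-pass and lazy behavior described in the list above can be observed directly. A minimal sketch, assuming a hypothetical generator function named countdown, plus a generator expression, which builds a generator without a def:

```python
def countdown(n):
    """Hypothetical generator function: yields n, n-1, ..., 1."""
    while n > 0:
        yield n
        n -= 1


gen = countdown(3)
print(next(gen))  # 3 (runs until the first yield, then pauses)
print(list(gen))  # [2, 1] (resumes where it left off)
print(list(gen))  # [] (exhausted: a generator cannot be restarted)

# generator expression: values are produced on demand,
# so no list of a million items is ever built
squares = (x * x for x in range(1_000_000))
total = sum(squares)
print(total)  # 333332833333500000
```

To iterate again, call the generator function again to get a new generator object.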

10.2. Decorators

A decorator is a function that takes one function as input and returns another function.

def document_it(func):
    def new_function(*args, **kwargs):
        print('Running function:', func.__name__)
        print('Positional arguments:', args)
        print('Keyword arguments:', kwargs)
        result = func(*args, **kwargs)
        print('Result:', result)
        return result
    return new_function

def add_ints(a, b):
    return a+b

cooler_add_ints = document_it(add_ints)  # manual decorator assignment
cooler_add_ints(1, 2)
# Running function: add_ints
# Positional arguments: (1, 2)
# Keyword arguments: {}
# Result: 3
# 3

@document_it  # an alternative to the manual decorator assignment
def add_floats(a: float, b: float) -> float:
    return a + b

def square_it(func):
    def new_function(*args, **kargs):
        result = func(*args, **kargs)
        return result*result
    return new_function

# more than one decorator for a function
@document_it
@square_it
def add_numbers(a: float, b: float) -> float:
    return a + b

add_numbers(2, 3)
# Running function: new_function
# Positional arguments: (2, 3)
# Keyword arguments: {}
# Result: 25
# 25
def dump(func):
    "Print input arguments and output value(s)"
    def wrapped(*args, **kwargs):
        print("Function name:", func.__name__)
        print("Input arguments:", ' '.join(map(str, args)))
        print("Input keyword arguments:", kwargs.items())
        output = func(*args, **kwargs)
        print("Output:", output)
        return output
    return wrapped
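The output "Running function: new_function" above shows that a wrapper hides the name of the function it wraps. A minimal sketch of one common remedy, functools.wraps from the standard library, applied to simplified versions of the two decorators from this section:

```python
import functools


def document_it(func):
    @functools.wraps(func)  # copy __name__, __doc__, etc. from func to the wrapper
    def new_function(*args, **kwargs):
        print('Running function:', func.__name__)
        return func(*args, **kwargs)
    return new_function


def square_it(func):
    @functools.wraps(func)
    def new_function(*args, **kwargs):
        result = func(*args, **kwargs)
        return result * result
    return new_function


@document_it
@square_it
def add_numbers(a, b):
    """Add two numbers."""
    return a + b


print(add_numbers(2, 3))
# Running function: add_numbers
# 25
print(add_numbers.__name__)  # add_numbers
print(add_numbers.__doc__)  # Add two numbers.
```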

10.3. Exceptions

An exception is an object whose class derives from BaseException; user-defined exception classes should derive from Exception.

class OopsException(Exception):
    pass

try:
    raise OopsException('panic')  # raising exceptions
except OopsException as err:
    print(err)  # panic
except (RuntimeError, TypeError, NameError) as err:  # multiple exceptions as a parenthesized tuple
    pass
except Exception as other:  # catches every exception derived from Exception
    pass
except:  # bare except: also catches SystemExit and KeyboardInterrupt, so use it sparingly
    pass
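A try statement can also carry else and finally clauses. A minimal sketch of the order in which the clauses run; the function risky and its steps list are hypothetical names used only to record that order:

```python
class OopsException(Exception):
    """A hypothetical application error."""


def risky(fail):
    steps = []
    try:
        steps.append('try')
        if fail:
            raise OopsException('panic')
    except OopsException as err:
        steps.append(f'except: {err}')
    else:
        steps.append('else')  # runs only when the try block raised nothing
    finally:
        steps.append('finally')  # always runs
    return steps


print(risky(fail=False))  # ['try', 'else', 'finally']
print(risky(fail=True))  # ['try', 'except: panic', 'finally']

print(issubclass(OopsException, Exception))  # True
print(issubclass(KeyboardInterrupt, Exception))  # False (derives from BaseException)
```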

10.4. locals() and globals()

Python provides two functions to access the contents of the namespaces:

  • locals() returns a dictionary of the contents of the local namespace.

  • globals() returns a dictionary of the contents of the global namespace.

a = 5.21

def print_global_a():
    global a  # only required to assign to a; optional for reading (explicit is better than implicit)
    print(a)

print_global_a()
# 5.21

def print_locals_globals():
    a: int = 0
    b: float = 3.14
    print(locals())
    print(globals())

print_locals_globals()
# {'a': 0, 'b': 3.14}
# {'__name__': '__main__', '__doc__': None, '__package__': None, '__loader__': <class '_frozen_importlib.BuiltinImporter'>, '__spec__': None, '__annotations__': {}, '__builtins__': <module 'builtins' (built-in)>, 'print_locals': <function print_locals at 0x7fab761ade40>, 'print_globals': <function print_globals at 0x7fab761adee0>, 'print_locals_globals': <function print_locals_globals at 0x7fab761bbba0>, 'a': 5.21}
  • vars() without arguments, equivalent to locals().

    print(vars())
    # {'__name__': '__main__', '__doc__': None, '__package__': None, '__loader__': <class '_frozen_importlib.BuiltinImporter'>, '__spec__': None, '__annotations__': {}, '__builtins__': <module 'builtins' (built-in)>}

11. Objects and classes

# define a class
class Cat:  # standard class definition
    pass

class Cat():  # less common approach (equivalent in functionality)
    pass

# create an object from a class
cat = Cat()

# assign attributes directly to an object anytime after its creation.
cat.wow = 'meow'
cat.wow  # 'meow'

# initialization: __init__(); to save syllables, the double underscores (__) are often pronounced "dunder".
class Cat:
    # self is not a reserved word, but it’s common as the first argument to refer to the object itself.
    def __init__(self, name):  # initializer
        self.name = name

    # a method is a function in a class or object.
    def wow(self):
        print(f'{self.name}: meow!')


cat = Cat('Tom')
cat.wow()  # Tom: meow!
Cat.wow(cat)  # Tom: meow!

# class and object attributes
class Cat:
    color = 'red'

tom = Cat()
jerry = Cat()
print(tom.color)  # red
print(jerry.color)  # red

tom.color = 'black'  # creates an instance attribute on tom that shadows the class attribute
Cat.color = 'blue'  # affects every object, existing or new, that has no instance attribute of its own

butch = Cat()
print(jerry.color)  # blue
print(tom.color)  # black
print(butch.color)  # blue
# inheritance
class Animal:
    def __init__(self, voice) -> None:
        self.voice = voice

    def wow(self):
        print(f'{self.voice}!')


class Cat(Animal):
    pass


class Dog(Animal):
    def __init__(self) -> None:
        super().__init__('bark')

    def wow(self):
        print(f'{self.voice}! '*3)

cat = Cat('meow')
cat.wow()  # meow!

dog = Dog()
dog.wow()  # bark! bark! bark!

# multiple inheritance: method resolution order
class Animal:
    def wow(self):
        print('I speak!')

class Horse(Animal):
    def wow(self):
        print('Neigh!')

class Donkey(Animal):
    def wow(self):
        print('Hee-haw!')

class Mule(Donkey, Horse):
    pass

print(Mule.mro())
# [<class '__main__.Mule'>, <class '__main__.Donkey'>, <class '__main__.Horse'>, <class '__main__.Animal'>, <class 'object'>]

class Hinny(Horse, Donkey):
    pass

print(Hinny.__mro__)
# (<class '__main__.Hinny'>, <class '__main__.Horse'>, <class '__main__.Donkey'>, <class '__main__.Animal'>, <class 'object'>)
# Mixins are a code reuse technique: small classes that add a piece of functionality
# to other classes through multiple inheritance and are not meant to be instantiated alone.
class PrettyMixin():
    def dump(self):
        import pprint
        pprint.pprint(vars(self))

class Thing():
    def __init__(self) -> None:
        self.name = "Nyarlathotep"
        self.feature = "ichor"
        self.age = "eldritch"

# Mixins are included in a class definition using multiple inheritance syntax.
class PrettyThing(Thing, PrettyMixin):
    pass

t = PrettyThing()
t.dump()  # {'age': 'eldritch', 'feature': 'ichor', 'name': 'Nyarlathotep'}
# Python doesn't have truly private attributes, but a name that begins with two underscores (__)
# is mangled to _ClassName__name, which hides it from casual access outside the class definition.
class Cat:
    def __init__(self, name) -> None:
        self.__name = name

    @property
    def name(self):  # getter
        return self.__name

    @name.setter
    def name(self, name):  # setter
        self.__name = name

cat = Cat('Tom')
print(cat.name)  # Tom
cat.name = 'Jerry'
print(cat.name)  # Jerry
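What the double-underscore prefix actually does can be inspected on an instance. A minimal sketch with a cut-down Cat class: the interpreter rewrites __name inside the class body to _Cat__name (name mangling), which hides the attribute from casual access but does not make it private.

```python
class Cat:
    def __init__(self, name):
        self.__name = name  # stored on the instance as _Cat__name


cat = Cat('Tom')
print(vars(cat))  # {'_Cat__name': 'Tom'}
print(hasattr(cat, '__name'))  # False
print(cat._Cat__name)  # Tom (still reachable, so this is not true privacy)
```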
# instance methods, class methods, static methods
class Cat:
    # Class attribute (shared by all instances)
    species = "Felis catus"

    def __init__(self, name, age):
        self.name = name
        self.age = age

    # Instance method (operates on a specific instance)
    def meow(self):
        print(f"{self.name} says meow!")

    @classmethod
    def create_from_dict(cls, cat_dict):
        """
        Class method to create a Cat object from a dictionary.

        Args:
            cls (class): The Cat class itself.
            cat_dict (dict): A dictionary containing cat data (name, age).

        Returns:
            Cat: A new Cat object.
        """
        return cls(cat_dict["name"], cat_dict["age"])

    @staticmethod
    def is_adult(age):
        """
        Static method to check if a cat is considered adult (age >= 1).

        Args:
            age (int): The cat's age.

        Returns:
            bool: True if the cat is adult, False otherwise.
        """
        return age >= 1


# Create Cat objects
cat1 = Cat("Whiskers", 2)
cat2 = Cat.create_from_dict({"name": "Luna", "age": 5})

# Instance method call (operates on specific objects)
cat1.meow()  # Output: Whiskers says meow!
cat2.meow()  # Output: Luna says meow!

# Class method call
new_cat = Cat.create_from_dict({"name": "Simba", "age": 1})

# Static method call
is_cat1_adult = Cat.is_adult(cat1.age)

# Output: Simba is 1 years old.
print(f"{new_cat.name} is {new_cat.age} years old.")
# Output: Is Whiskers an adult? True
print(f"Is Whiskers an adult? {is_cat1_adult}")
# duck typing: a loose implementation of polymorphism
# If it walks like a duck and quacks like a duck, it’s a duck.
#     —— A Wise Person
class Duck:
    def __init__(self, name) -> None:
        self.__name = name

    def who(self):
        return self.__name

    def wow(self):
        return 'quack!'

class Cat:
    def __init__(self, name) -> None:
        self.__name = name

    def who(self):
        return self.__name

    def wow(self):
        return 'meow!'

def who_wow(obj):
    print(f'{obj.who()}: {obj.wow()}')

who_wow(Duck('Donald'))  # Donald: quack!
who_wow(Cat('Tom'))  # Tom: meow!
# dataclasses
from dataclasses import dataclass

@dataclass
class Cat:
    name: str
    age: int
    color: str = 'blue'

tom = Cat('tom', 3)
print(tom)  # Cat(name='tom', age=3, color='blue')

12. Automatic resource management

fi = open('test.txt', 'w', encoding='utf-8')
try:
    fi.write('hello world')
finally:
    fi.close()
with open('test.txt', 'r', encoding='utf-8') as fo:
    txt = fo.read()
    print(txt)

The with statement can be used with any object that implements the __enter__() and __exit__() special methods that provide hooks for initializing and finalizing resource management. Common resources managed with with include:

  • Files: The with open('filename', 'mode') as file: syntax opens a file, assigns it to a variable (file), and automatically closes the file when the indented block exits, even in case of exceptions.

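  • Database Connections: with sqlite3.connect(':memory:') as con: creates a connection and assigns it to a variable; on exiting the block the pending transaction is committed, or rolled back if an exception occurred. The connection itself is not closed, so call con.close() explicitly or wrap the connection in contextlib.closing().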

  • Locks: In multithreaded environments, with can be used with lock objects to acquire a lock at the beginning of the block and release it at the end, ensuring proper synchronization.
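A small sketch of the lock and database cases. A sqlite3 connection used as a context manager commits or rolls back the transaction rather than closing the connection, so contextlib.closing is used here to close it; the table and its contents are made up for this example.

```python
import sqlite3
import threading
from contextlib import closing

lock = threading.Lock()
with lock:  # acquire() on entry, release() on exit
    print(lock.locked())  # True
print(lock.locked())  # False

# closing() calls con.close() when the outer block exits
with closing(sqlite3.connect(':memory:')) as con:
    with con:  # transaction scope: commit on success, rollback on error
        con.execute('CREATE TABLE animals (name TEXT, voice TEXT)')
        con.execute("INSERT INTO animals VALUES ('cat', 'meow')")
    rows = con.execute('SELECT name, voice FROM animals').fetchall()

print(rows)  # [('cat', 'meow')]
```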

class Cat:
    """A custom context manager class that simulates a cat entering and leaving."""

    def __enter__(self) -> "Cat":
        """
        Called when entering the `with` block. Prints a message and returns itself.

        Returns:
            The Cat instance (self) to be used within the `with` block.
        """
        print("I'm coming in!")
        return self  # Return self to provide the managed object to the `with` block

    def __exit__(self, exc_type: type, exc_value: object, traceback: object) -> bool:
        """
        Called when exiting the `with` block, regardless of exceptions.
        Prints a message, optionally handles exceptions, and returns True to suppress them.

        Args:
            exc_type (type): The type of exception raised within the `with` block (if any).
            exc_value (object): The actual exception object raised (if any).
            traceback (object): A traceback object containing information about the call stack
                               (if any exception was raised).

        Returns:
            bool: True to suppress any exceptions raised within the `with` block,
                  False to re-raise them. (Can be modified for specific exception handling)
        """
        print("I'm going out.")
        # Suppress potential exceptions (modify for specific handling)
        return True

    def wow(self) -> None:
        """
        Method to simulate a cat's meow. Prints "meow!".

        Returns:
            None
        """
        print("meow!")


with Cat() as cat:
    # Enters the context manager and binds the Cat object returned by __enter__ to `cat`.
    cat.wow()  # Calls the cat's wow method within the context

# I'm coming in!
# meow!
# I'm going out.
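For simple cases, the same enter/exit pattern can be written as a generator with contextlib.contextmanager from the standard library. A minimal sketch; cat_visit and the log list are hypothetical names, and unlike the Cat class above, which returns True from __exit__, this version does not suppress exceptions:

```python
from contextlib import contextmanager


@contextmanager
def cat_visit(log):
    """Hypothetical generator-based counterpart of the Cat class."""
    log.append("I'm coming in!")  # plays the role of __enter__
    try:
        yield 'meow!'  # the value bound by `as`
    finally:
        log.append("I'm going out.")  # plays the role of __exit__


log = []
with cat_visit(log) as voice:
    log.append(voice)

print(log)  # ["I'm coming in!", 'meow!', "I'm going out."]
```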

13. Modules and packages

# A module is a single Python file (.py extension) containing Python code,
# that can include functions, classes, variables, and statements.

# animal.py (module file)
class Animal:
    def __init__(self, voice: str) -> None:
        self.__voice = voice

    def wow(self):
        print(f'{self.__voice}!')
# the `import` statement is `import module`, where `module` is the name
# of another Python file, without the .py extension.
from animal import Animal as Duck  # import only what you want from a module
from animal import Animal
import animal as mouse  # import a module with another name
import animal  # import a module

donald = Duck('quack')
donald.wow()  # quack!

tom = Animal('meow')
tom.wow()  # meow!

jerry = mouse.Animal('peep')
jerry.wow()  # peep!

butch = animal.Animal('bark')
butch.wow()  # bark!

13.1. packages

A package is a directory containing multiple Python modules and potentially subdirectories with even more modules, that represents a collection of related modules organized under a common namespace.

If the version of Python is earlier than 3.3, one more thing is needed in the animals directory to make it a Python package: a file named __init__.py.
# .
# ├── animals
# │   ├── cat.py
# │   ├── dog.py
# │   └── __init__.py
# └── main.py

# animals/cat.py
def wow():
    print('meow!')

# animals/dog.py
def wow():
    print('bark!')

# main.py
from animals import cat  # from package import module
import animals.dog as dog  # import package.module

cat.wow()  # meow!
dog.wow()  # bark!

13.2. main

Identifying the main module: the entry point for a Python program’s execution.

  • Python uses a special variable called __name__.

  • When a module is directly executed (as a script), the __name__ variable within that module is set to the string '__main__'.

  • When a module is imported by another module, the __name__ variable within the imported module gets the actual module name (e.g., 'my_module').

# cat.py
def wow():
    return __name__

if __name__ == '__main__':
    print(f'executed: {wow()}')
$ python3 cat.py  # directly executed (as a script)
executed: __main__
# imported by another module
from cat import wow
print(f'imported: {wow()}')  # imported: cat

13.3. import

  • Basic structure:

    import module_name
  • Importing specific elements:

    # import specific functions or classes from a module.
    from module_name import element1, element2
    # import a specific element and assign it an alias for easier use.
    from module_name import element1 as alias
  • Importing a module with an alias:

    # assign an alias to a whole module for shorter references.
    import module_name as alias
  • Importing sub-modules: use the dot (.) to navigate within package hierarchies:

    # import a sub-module from a package.
    import package_name.submodule_name
    
    # import a specific element from a sub-module.
    from package_name.submodule_name import element
  • Relative imports (within packages): use the dot (.) to navigate within the same package structure:

    # import from a sub-module within the same package.
    from .submodule_name import element

13.4. search path

In the context of programming languages and environments, the search path refers to a list of directories that the program or interpreter looks at to locate specific files, particularly modules or libraries.

import sys
for path in sys.path:
    print(f"'{path}'")

''  # empty string: the current working directory (interactive session); for a script, the script's own directory comes first
'/usr/lib/python311.zip'  # standard library, built-in modules
'/usr/lib/python3.11'
'/usr/lib/python3.11/lib-dynload'  # dynamically loaded modules or libraries
'/usr/local/lib/python3.11/dist-packages'  # third-party libraries
'/usr/lib/python3/dist-packages'

# sys.path is a list, and can be updated programmatically
sys.path
# ['', '/usr/lib/python311.zip', '/usr/lib/python3.11', '/usr/lib/python3.11/lib-dynload', '/usr/local/lib/python3.11/dist-packages', '/usr/lib/python3/dist-packages']
sys.path.insert(0, '/tmp')
sys.path
# ['/tmp', '', '/usr/lib/python311.zip', '/usr/lib/python3.11', '/usr/lib/python3.11/lib-dynload', '/usr/local/lib/python3.11/dist-packages', '/usr/lib/python3/dist-packages']
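To see the effect of a sys.path change on import resolution, the sketch below writes a throwaway module into a temporary directory and imports it. The helper import_from_dir and the module name hello_mod are hypothetical names invented for this illustration.

```python
import importlib
import pathlib
import sys
import tempfile

def import_from_dir(directory, module_name):
    """Put `directory` first on sys.path just long enough to import `module_name`."""
    sys.path.insert(0, str(directory))
    try:
        importlib.invalidate_caches()  # make sure the new file is noticed
        return importlib.import_module(module_name)
    finally:
        sys.path.remove(str(directory))  # leave sys.path as we found it

with tempfile.TemporaryDirectory() as tmp:
    # hello_mod is a hypothetical module created only for this demo
    pathlib.Path(tmp, "hello_mod.py").write_text("GREETING = 'hi'\n")
    mod = import_from_dir(tmp, "hello_mod")
    print(mod.GREETING)  # hi
```

Restoring sys.path in a finally block keeps the change from leaking into later imports. For real projects, installing the package into a virtual environment is usually preferable to editing sys.path.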

13.5. pip install packages

# ensure pip can be run from the command line
python3 -m pip --version  # pip --version
# pip 23.0.1 from /usr/lib/python3/dist-packages/pip (python 3.11)

# or install the pip and venv modules for the system Python
apt install python3-pip python3-venv  # On Debian/Ubuntu systems

13.5.1. virtual environment

# create a virtual environment
python3 -m venv python-learning-notes_env

# activate a virtual environment
source python-learning-notes_env/bin/activate

# ensure pip, setuptools, and wheel are up to date
pip install --upgrade pip setuptools wheel

# show pip version
pip --version  # python3 -m pip --version
# pip 24.0 from .../python-learning-notes_env/lib/python3.11/site-packages/pip (python 3.11)

# deactivate a virtual environment: the deactivate command is often implemented as a shell function.
deactivate
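From inside Python, one way to check whether the interpreter is running in a virtual environment created by venv is to compare sys.prefix with sys.base_prefix. A minimal sketch (the function name in_virtualenv is made up for this note):

```python
import sys

def in_virtualenv() -> bool:
    # Inside a venv, sys.prefix points at the environment directory,
    # while sys.base_prefix still points at the base Python installation.
    return sys.prefix != sys.base_prefix

print(in_virtualenv())
print(sys.prefix)
```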

13.5.2. pip install

# install the latest stable version.
pip install <package_name>

# install a package with extras, i.e., optional dependencies (e.g., pip install 'transformers[torch]').
pip install <package_name>[extra1[,extra2,...]]

# install the exact version (e.g., pip install vllm==0.4.3).
pip install <package_name>==<version>

# install the latest version greater than or equal to the specified one, with no upper bound (e.g., pip install 'vllm>=0.4.0' gets anything from 0.4.0 onwards); quote the specifier so the shell does not treat > as a redirection.
pip install '<package_name>>=<version>'

# install a compatible release (~= operator): the latest patch version within the specified major and minor version (e.g., pip install vllm~=0.4.0 allows 0.4.x but not 0.5.0).
pip install <package_name>~=<version>

# upgrade an already installed package to the latest version from PyPI.
pip install --upgrade <package_name>

# install from an alternate index
pip install --index-url http://my.package.repo/simple/ <package_name>

# search an additional index during install, in addition to PyPI
pip install --extra-index-url http://my.package.repo/simple <package_name>

# install pre-release and development versions, in addition to stable versions
pip install --pre <package_name>
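After installing, the version that actually ended up in the environment can be checked from Python with the standard-library importlib.metadata module. A small sketch; the helper name installed_version is an assumption of this note, not a pip feature:

```python
from importlib import metadata

def installed_version(name):
    """Return the installed version string of a distribution, or None if absent."""
    try:
        return metadata.version(name)
    except metadata.PackageNotFoundError:
        return None

print(installed_version("pip"))                    # e.g. a version string, or None
print(installed_version("no-such-distribution"))   # None, assuming nothing by that name is installed
```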

13.5.3. cache, configuration

# get the cache directory that pip is currently configured to use
pip cache dir  # ~/.cache/pip
# INI format configuration files can change the default values for command line options.
#   - global: system-wide configuration file, shared across users.
#   - user: per-user configuration file.
#   - site: per-environment configuration file; i.e. per-virtualenv.

# the names of the settings are derived from the long command line option.
[global]
timeout = 60
index-url = https://download.zope.org/ppix

# per-command section: pip install
[install]
ignore-installed = true
no-dependencies = yes
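Because the file is plain INI, its structure can be inspected with the standard-library configparser. This is only an illustration of the section/option layout shown above; pip uses its own loading logic, and the PIP_CONF string is just the example configuration copied into a variable.

```python
import configparser

# The example configuration from above, held in a string for the demo.
PIP_CONF = """
[global]
timeout = 60
index-url = https://download.zope.org/ppix

[install]
ignore-installed = true
no-dependencies = yes
"""

parser = configparser.ConfigParser()
parser.read_string(PIP_CONF)

for section in parser.sections():
    for key, value in parser[section].items():
        # each option name mirrors a long command line option (--timeout, ...)
        print(f"[{section}] --{key} -> {value}")

print(parser.getboolean("install", "no-dependencies"))  # True
```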

13.5.4. mirror

# set the PyPI mirror
pip config --user set global.index-url https://pypi.tuna.tsinghua.edu.cn/simple
# pip config --user set global.index-url https://mirrors.aliyun.com/pypi/simple/
# pip config set global.extra-index-url "https://mirrors.sustech.edu.cn/pypi/web/simple https://mirrors.aliyun.com/pypi/simple/"

13.6. pipenv

Pipenv is a dependency manager for Python projects, similar in spirit to Node.js’ npm or Ruby’s bundler.

# install pipenv in Debian/Ubuntu for the system python.
apt install pipenv
# install pipenv for the user python.
pip install pipenv --user

# If pipenv isn’t available in a shell after installation, add the user site-packages binary directory to `PATH`.
#
# On Windows, the user base binary directory can be found by running
# `python -m site --user-site`
# and replacing `site-packages` with `Scripts`.
#
# On Linux and macOS, find the user base binary directory by running
# `python -m site --user-base`
# and appending `bin` to the end.

On Debian-based Linux systems, the user-level pip installation above might not work due to limitations with user-based installations. In that case, use one of the following:

  1. Using apt

    apt install pipenv
  2. Using pip with virtualenv

    # Create a virtual environment
    python3 -m venv pipenv_env
    
    # Activate the virtual environment (replace "pipenv_env" with your chosen name)
    source pipenv_env/bin/activate
    
    # Install pipenv within the virtual environment
    pip install pipenv
    
    # Deactivate the virtual environment (optional)
    deactivate
# Pipenv manages dependencies on a per-project basis.
mkdir myproject && cd myproject
pipenv install requests
ls  # Pipfile  Pipfile.lock
# activate the project's virtualenv:
pipenv shell
# main.py
import requests

response = requests.get('https://httpbin.org/ip')

print('Your IP is {0}'.format(response.json()['origin']))
# run a command inside the virtualenv:
pipenv run python main.py
# Your IP is 9.5.2.7
pipenv check         # Checks for PyUp Safety security vulnerabilities and against
                     # PEP 508 markers provided in Pipfile.
pipenv clean         # Uninstalls all packages not specified in Pipfile.lock.
pipenv graph         # Displays currently-installed dependency graph information.
pipenv install       # Installs provided packages and adds them to Pipfile, or (if no
                     # packages are given), installs all packages from Pipfile.
pipenv lock          # Generates Pipfile.lock.
pipenv open          # View a given module in your editor.
pipenv requirements  # Generate a requirements.txt from Pipfile.lock.
pipenv run           # Spawns a command installed into the virtualenv.
pipenv scripts       # Lists scripts in current environment config.
pipenv shell         # Spawns a shell within the virtualenv.
pipenv sync          # Installs all packages specified in Pipfile.lock.
pipenv uninstall     # Uninstalls a provided package and removes it from Pipfile.
pipenv update        # Runs lock, then sync.
pipenv upgrade       # Resolves provided packages and adds them to Pipfile, or (if no
                     # packages are given), merges results to Pipfile.lock
pipenv verify        # Verify the hash in Pipfile.lock is up-to-date.

14. Testing

  • unittest

    # test_cap.py
    import unittest
    
    def cap(text: str) -> str:
        return text.capitalize()
    
    class TestCap(unittest.TestCase):
        def setUp(self) -> None:
            pass
    
        def tearDown(self) -> None:
            pass
    
        def test_one_word(self):
            text = 'duck'  # _arrange_ the objects, create and set them up as necessary.
    
            result = cap(text)  # _act_ on an object.
    
            self.assertEqual('Duck', result)  # _assert_ that something is as expected.
    
        def test_multi_words(self):
            text = 'hello world'  # _arrange_ the objects, create and set them up as necessary.
    
            result = cap(text)  # _act_ on an object.
    
            self.assertEqual('Hello World', result)  # _assert_ that something is as expected.
    
    if __name__ == '__main__':
        unittest.main()
    $ python3 test_cap.py
    F.
    ======================================================================
    FAIL: test_multi_words (__main__.TestCap.test_multi_words)
    ----------------------------------------------------------------------
    Traceback (most recent call last):
      File "...", line 27, in test_multi_words
        self.assertEqual('Hello World', result)
    AssertionError: 'Hello World' != 'Hello world'
    - Hello World
    ?       ^
    + Hello world
    ?       ^
    
    
    ----------------------------------------------------------------------
    Ran 2 tests in 0.003s
    
    FAILED (failures=1)
  • doctest

    # doctest_cap.py
    def cap(text: str) -> str:
        """
        >>> cap('duck')
        'Duck'
        >>> cap('hello world')
        'Hello World'
        """
        return text.capitalize()
    
    if __name__ == '__main__':
        import doctest
        doctest.testmod()
    $ python3 doctest_cap.py
    **********************************************************************
    File "...", line 5, in __main__.cap
    Failed example:
        cap('hello world')
    Expected:
        'Hello World'
    Got:
        'Hello world'
    **********************************************************************
    1 items had failures:
       1 of   2 in __main__.cap
    ***Test Failed*** 1 failures.
  • pytest

    # test_cap.py
    def cap(text: str) -> str:
        return text.capitalize()
    
    def test_one_word():
        text = 'duck'
        result = cap(text)
        assert result == 'Duck'
    
    def test_multiple_words():
        text = 'hello world'
        result = cap(text)
        assert result == 'Hello World'
    $ pipenv install pytest
    Installing pytest...
    Installing dependencies from Pipfile.lock (207fdb)...
    $ pytest
    ============================================== test session starts ==============================================
    platform linux -- Python 3.11.2, pytest-8.2.1, pluggy-1.5.0
    rootdir: ...
    collected 2 items
    
    test_cap.py .F                                                                                            [100%]
    
    =================================================== FAILURES ====================================================
    ______________________________________________ test_multiple_words ______________________________________________
    
        def test_multiple_words():
            text = 'hello world'
            result = cap(text)
    >       assert result == 'Hello World'
    E       AssertionError: assert 'Hello world' == 'Hello World'
    E
    E         - Hello World
    E         ?       ^
    E         + Hello world
    E         ?       ^
    
    test_cap.py:12: AssertionError
    ============================================ short test summary info ============================================
    FAILED test_cap.py::test_multiple_words - AssertionError: assert 'Hello world' == 'Hello World'
    ========================================== 1 failed, 1 passed in 0.09s ==========================================
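All three runs above fail on purpose: str.capitalize() uppercases only the first character of the whole string. If the intent is to capitalize every word, one candidate fix is str.title(). The sketch below assumes that intent and checks both cases with unittest's subTest, running the suite programmatically.

```python
import unittest

def cap(text: str) -> str:
    # str.title() uppercases the first letter of every word,
    # whereas str.capitalize() only touches the first character.
    return text.title()

class TestCap(unittest.TestCase):
    def test_words(self):
        cases = {"duck": "Duck", "hello world": "Hello World"}
        for text, expected in cases.items():
            with self.subTest(text=text):  # report each case separately
                self.assertEqual(expected, cap(text))

suite = unittest.defaultTestLoader.loadTestsFromTestCase(TestCap)
result = unittest.TextTestRunner(verbosity=0).run(suite)
print(result.wasSuccessful())  # True
```

Note that str.title() treats apostrophes as word boundaries (for example, "i'm" becomes "I'M"), so it is not a drop-in answer for arbitrary text.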

15. Processes and concurrency

# The standard library’s os module provides a common way of accessing some system information.
import os
os.uname()
# posix.uname_result(sysname='Linux', nodename='node-0', release='6.1.0-21-amd64', version='#1 SMP PREEMPT_DYNAMIC Debian 6.1.90-1 (2024-05-03)', machine='x86_64')
os.getloadavg()
# (0.05126953125, 0.03955078125, 0.00341796875)
os.cpu_count()
# 4
(os.getpid(), os.getcwd(), os.getuid(), os.getgid())
# (1295, '/tmp', 1000, 1000)
os.system('date -u')
# Thu Jun  6 11:23:23 AM UTC 2024
# 0
# get system and process information with the third-party package psutil
import psutil  # pip install psutil
print(psutil.cpu_times(percpu=True))
# [scputimes(user=4.37, nice=0.0, system=6.71, idle=1468.69, iowait=0.26, irq=0.0, softirq=1.86, steal=0.0, guest=0.0, guest_nice=0.0), scputimes(user=11.84, nice=0.0, system=9.3, idle=1465.29, iowait=1.02, irq=0.0, softirq=0.75, steal=0.0, guest=0.0, guest_nice=0.0), scputimes(user=10.31, nice=0.0, system=8.58, idle=1468.4, iowait=1.66, irq=0.0, softirq=0.97, steal=0.0, guest=0.0, guest_nice=0.0), scputimes(user=9.11, nice=0.0, system=10.02, idle=1467.95, iowait=0.81, irq=0.0, softirq=0.65, steal=0.0, guest=0.0, guest_nice=0.0)]
print(psutil.cpu_percent(percpu=False))
# 0.0
print(psutil.cpu_percent(percpu=True))
# [0.3, 0.4, 0.4, 0.1]
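os.uname() and os.getloadavg() are only available on Unix-like systems. Where portability matters, the standard-library platform module offers a rough equivalent for the system description; a short sketch:

```python
import os
import platform

info = platform.uname()           # works on Windows as well as Unix
print(info.system, info.machine)  # e.g. Linux x86_64
print(os.cpu_count())             # may be None if it cannot be determined

if hasattr(os, "getloadavg"):     # Unix-only, so guard the call
    print(os.getloadavg())
```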

15.1. subprocess and multiprocessing

import subprocess

# run another program in a shell
# and grab whatever output it created (both standard output and standard error output)
print(subprocess.getoutput('date'))  # Thu Jun  6 07:19:50 PM CST 2024

# A variant method called `check_output()` takes a list of the command and arguments.
# By default it returns standard output only as type bytes rather than a string, and
# does not use the shell:
print(subprocess.check_output(['date', '-u']))  # b'Thu Jun  6 11:30:09 AM UTC 2024\n'

# return a tuple with the status code and output of the other program
print(subprocess.getstatusoutput('date'))  # (0, 'Thu Jun  6 07:32:25 PM CST 2024')

# capture the exit status only
ret = subprocess.call('date -u', shell=True)
# Thu Jun  6 11:45:51 AM UTC 2024
print(ret)
# 0

# pass the command as a list of arguments, so there is no need to call the shell
ret = subprocess.call(['date', '-u'])
# Thu Jun  6 11:50:04 AM UTC 2024
print(ret)
# 0
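The functions above can also be expressed with subprocess.run(), which returns a CompletedProcess holding the exit status and, when requested, the captured output. A minimal sketch that runs the current Python interpreter as the child so that it does not depend on the date command being present:

```python
import subprocess
import sys

# capture_output=True collects stdout/stderr; text=True decodes bytes to str;
# check=True raises CalledProcessError on a non-zero exit status.
completed = subprocess.run(
    [sys.executable, "-c", "print(2 + 2)"],
    capture_output=True, text=True, check=True)
print(completed.returncode)       # 0
print(completed.stdout.strip())   # 4

try:
    subprocess.run([sys.executable, "-c", "raise SystemExit(3)"], check=True)
except subprocess.CalledProcessError as err:
    failed_code = err.returncode
    print("failed with", failed_code)  # failed with 3
```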
# create multiple independent processes
import multiprocessing
import os

def whoami(what):
    print("Process %s says: %s" % (os.getpid(), what))

if __name__ == "__main__":
    whoami("I'm the main program")
    for n in range(4):
        p = multiprocessing.Process(
            target=whoami, args=("I'm function %s" % n,))
        p.start()

# Process 1648 says: I'm the main program
# Process 1649 says: I'm function 0
# Process 1650 says: I'm function 1
# Process 1651 says: I'm function 2
# Process 1652 says: I'm function 3
# kill a process with terminate()
import multiprocessing
import time
import os

def whoami(name):
    print("I'm %s, in process %s" % (name, os.getpid()))

def loopy(name):
    whoami(name)
    start = 1
    stop = 1000000
    for num in range(start, stop):
        print("\tNumber %s of %s. Honk!" % (num, stop))
        time.sleep(1)

if __name__ == "__main__":
    whoami("main")
    p = multiprocessing.Process(target=loopy, args=("loopy",))
    p.start()
    time.sleep(5)
    p.terminate()

# I'm main, in process 13084
# I'm loopy, in process 14664
#         Number 1 of 1000000. Honk!
#         Number 2 of 1000000. Honk!
#         Number 3 of 1000000. Honk!
#         Number 4 of 1000000. Honk!
#         Number 5 of 1000000. Honk!

15.2. Queues, processes, and threads

A queue is like a list: things are added at one end and taken away from the other, an ordering most commonly referred to as FIFO (first in, first out). In general, queues transport messages, which can be any kind of information. Queues used for distributed task management are also known as work queues, job queues, or task queues.

Threads can be dangerous. Like manual memory management in languages such as C and C++, they can cause bugs that are extremely hard to find, let alone fix. To use threads, all the code in the program (and in external libraries that it uses) must be thread safe.

In Python, threads do not speed up CPU-bound tasks because of an implementation detail in the standard Python system called the Global Interpreter Lock (GIL).

  • Use threads for I/O-bound problems

  • Use processes, networking, or events (discussed in the next section) for CPU-bound problems

import multiprocessing as mp

def washer(dishes, output):
    for dish in dishes:
        print('Washing', dish, 'dish')
        output.put(dish)

def dryer(input):
    while True:
        dish = input.get()
        print('Drying', dish, 'dish')
        input.task_done()

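if __name__ == '__main__':  # guard needed on platforms that spawn (rather than fork) child processes
    dish_queue = mp.JoinableQueue()
    dryer_proc = mp.Process(target=dryer, args=(dish_queue,))
    dryer_proc.daemon = True  # dryer() loops forever, so let it die with the main process
    dryer_proc.start()
    dishes = ['salad', 'bread', 'entree', 'dessert']
    washer(dishes, dish_queue)
    dish_queue.join()  # block until every dish has been marked with task_done()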

# Washing salad dish
# Washing bread dish
# Washing entree dish
# Washing dessert dish
# Drying salad dish
# Drying bread dish
# Drying entree dish
# Drying dessert dish
import threading
import queue
import time

def washer(dishes, dish_queue):
    for dish in dishes:
        print("Washing", dish)
        time.sleep(5)
        dish_queue.put(dish)

def dryer(dish_queue):
    while True:
        dish = dish_queue.get()
        print("Drying", dish)
        time.sleep(10)
        dish_queue.task_done()

dish_queue = queue.Queue()
for n in range(2):
    # daemon=True lets the program exit after join(), since dryer() loops forever
    dryer_thread = threading.Thread(target=dryer, args=(dish_queue,), daemon=True)
    dryer_thread.start()
dishes = ['salad', 'bread', 'entree', 'dessert']
washer(dishes, dish_queue)
dish_queue.join()

# Washing salad
# Washing bread
# Drying salad
# Washing entree
# Drying bread
# Washing dessert
# Drying entree
# Drying dessert
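In the thread example, dryer() loops forever, so the worker threads never finish on their own. One common alternative is to send a sentinel value through the queue to tell each worker to stop. The sketch below assumes that approach; STOP and run_dryers are names invented for this illustration, not part of the queue API.

```python
import queue
import threading

STOP = object()  # sentinel: a unique object that no real dish can equal

def dryer(dish_queue, dried, lock):
    while True:
        dish = dish_queue.get()
        if dish is STOP:          # sentinel received: leave the loop
            break
        with lock:                # protect the shared result list
            dried.append(dish)

def run_dryers(dishes, workers=2):
    dish_queue = queue.Queue()
    dried, lock = [], threading.Lock()
    threads = [threading.Thread(target=dryer, args=(dish_queue, dried, lock))
               for _ in range(workers)]
    for t in threads:
        t.start()
    for dish in dishes:
        dish_queue.put(dish)
    for _ in threads:             # one sentinel per worker
        dish_queue.put(STOP)
    for t in threads:
        t.join()                  # every worker exits cleanly
    return sorted(dried)

print(run_dryers(['salad', 'bread', 'entree', 'dessert']))
# ['bread', 'dessert', 'entree', 'salad']
```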

15.3. concurrent.futures

The concurrent.futures module in the standard library can be used to schedule an asynchronous pool of workers, using threads (when I/O-bound) or processes (when CPU-bound), and get back a future to track their state and collect the results.

Use concurrent.futures any time you need to launch a bunch of concurrent tasks, such as the following:

  • Crawling URLs on the web

  • Processing files, such as resizing images

  • Calling service APIs

from concurrent import futures
import math
import sys

def calc(val):
    result = math.sqrt(float(val))
    return val, result

def use_threads(num, values):
    with futures.ThreadPoolExecutor(num) as tex:
        tasks = [tex.submit(calc, value) for value in values]
        for f in futures.as_completed(tasks):
            yield f.result()

def use_processes(num, values):
    with futures.ProcessPoolExecutor(num) as pex:
        tasks = [pex.submit(calc, value) for value in values]
        for f in futures.as_completed(tasks):
            yield f.result()

def main(workers, values):
    print(f"Using {workers} workers for {len(values)} values")
    print("Using threads:")
    for val, result in use_threads(workers, values):
        print(f'{val} {result:.4f}')
    print("Using processes:")
    for val, result in use_processes(workers, values):
        print(f'{val} {result:.4f}')

if __name__ == '__main__':
    workers = 3  # default worker count, overridable from the command line
    if len(sys.argv) > 1:
        workers = int(sys.argv[1])
    values = list(range(1, 6))  # 1 .. 5
    main(workers, values)
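With submit() and as_completed(), results arrive in completion order. When results are wanted in the same order as the inputs, Executor.map() is a shorter alternative. A sketch using threads (use_map is a name made up for this example):

```python
from concurrent import futures
import math

def calc(val):
    return val, math.sqrt(float(val))

def use_map(num, values):
    # map() yields results in the order of `values`, not completion order
    with futures.ThreadPoolExecutor(max_workers=num) as tex:
        return list(tex.map(calc, values))

for val, result in use_map(3, range(1, 6)):
    print(f'{val} {result:.4f}')
```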

15.4. Asynchronous programming with async and await

Python 3.4 added a standard asynchronous module called asyncio. Python 3.5 then added the keywords async and await. These implement some new concepts:

  • Coroutines are functions that pause at various points

  • An event loop that schedules and runs coroutines

import asyncio

async def say(phrase, seconds):
    print(phrase)
    await asyncio.sleep(seconds)

async def wicked():
    task_1 = asyncio.create_task(say("Surrender,", 2))
    task_2 = asyncio.create_task(say("Dorothy!", 0))
    await task_1
    await task_2

# blocking: asyncio.run() creates an event loop, runs the passed coroutine to completion, and closes the loop; on shutdown, the default executor is given a timeout of 5 minutes to finish
asyncio.run(wicked())
import asyncio

async def say(phrase, seconds):
    print(phrase)
    await asyncio.sleep(seconds)

async def wicked():
    task_1 = asyncio.create_task(say("Surrender,", 2))
    task_2 = asyncio.create_task(say("Dorothy!", 0))
    await asyncio.gather(task_1, task_2)  # Wait for all tasks to finish concurrently

# lower-level alternative to asyncio.run(): manage the event loop explicitly
loop = asyncio.new_event_loop()
try:
    loop.run_until_complete(wicked())
finally:
    loop.close()
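Coroutines can also return values. asyncio.gather() collects them in the order the awaitables were passed in, regardless of which one finishes first. A small sketch; fetch and main are placeholder names, and the sleeps stand in for real I/O:

```python
import asyncio
import time

async def fetch(name, seconds):
    await asyncio.sleep(seconds)  # stands in for a network call
    return name

async def main():
    start = time.perf_counter()
    # both coroutines run concurrently; results keep the argument order
    results = await asyncio.gather(fetch("slow", 0.2), fetch("fast", 0.1))
    return results, time.perf_counter() - start

results, elapsed = asyncio.run(main())
print(results)             # ['slow', 'fast']
print(f"{elapsed:.1f}s")   # roughly the longest sleep, not the sum
```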

16. SQL

DB-API (Database API, specified in PEP 249) is Python's standardized interface to relational databases, similar to JDBC in Java. It provides a consistent set of functions and methods, which simplifies database access by offering a common ground for working with different database systems like MySQL, PostgreSQL, SQL Server, and SQLite.

  • DB-API focuses on fundamental database operations like connecting, executing SQL queries, fetching results, and committing/rolling back transactions.

  • Different database modules (e.g., MySQLdb, psycopg2, sqlite3) implement the DB-API standard, ensuring consistency in these core functionalities across various systems.

  • DB-API promotes parameterization of SQL queries using placeholders (%s, ?, etc.) for values, which enhances security by preventing SQL injection vulnerabilities and improves portability by separating data from the query itself.
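The security benefit of placeholders can be seen with a deliberately hostile input. The sketch below uses sqlite3 with its ? placeholder style; the table and the hostile string are made up for the demonstration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (username TEXT)")
conn.execute("INSERT INTO users VALUES (?)", ("Alice",))

hostile = "x' OR '1'='1"  # input crafted to change the query's meaning

# Unsafe: the value is pasted into the SQL text, so the WHERE clause
# becomes: username = 'x' OR '1'='1'  (true for every row)
unsafe_sql = f"SELECT * FROM users WHERE username = '{hostile}'"
unsafe_rows = conn.execute(unsafe_sql).fetchall()
print(unsafe_rows)  # [('Alice',)]

# Safe: the value is passed separately and only ever treated as data
safe_rows = conn.execute(
    "SELECT * FROM users WHERE username = ?", (hostile,)).fetchall()
print(safe_rows)    # []

conn.close()
```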

16.1. Using DB-API with SQLite in Memory

import sqlite3

# Connect to an in-memory database (no file needed)
with sqlite3.connect(":memory:") as connection:

    # Create a cursor object
    cursor = connection.cursor()

    # Create a table (assuming you don't have one)
    cursor.execute('''
CREATE TABLE IF NOT EXISTS users (
  id INTEGER PRIMARY KEY AUTOINCREMENT,
  username TEXT NOT NULL,
  email TEXT UNIQUE NOT NULL)
''')

    # Insert some data using parameterization
    users = [("Alice", "alice@example.com"), ("Bob", "bob@example.com")]
    cursor.executemany(
        "INSERT INTO users (username, email) VALUES (?, ?)", users)

    # Commit the changes
    connection.commit()

    # Query the data
    cursor.execute("SELECT * FROM users")

    # Fetch all results
    results = cursor.fetchall()

    # Print the results
    for row in results:
        print(f"ID: {row[0]}, Username: {row[1]}, Email: {row[2]}")
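In sqlite3, using the connection as a context manager commits the transaction on success and rolls it back if an exception escapes the block; it does not close the connection. The sketch below illustrates the rollback with a duplicate email, and also uses named placeholders and sqlite3.Row for access by column name. The table layout is a trimmed-down assumption based on the example above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.row_factory = sqlite3.Row  # rows can be indexed by column name
conn.execute(
    "CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT UNIQUE NOT NULL)")

insert = "INSERT INTO users (email) VALUES (:email)"  # named placeholder

with conn:  # commits on success
    conn.execute(insert, {"email": "alice@example.com"})

try:
    with conn:  # rolls back everything in the block on error
        conn.execute(insert, {"email": "bob@example.com"})
        conn.execute(insert, {"email": "alice@example.com"})  # violates UNIQUE
except sqlite3.IntegrityError as err:
    print("rolled back:", err)

emails = [row["email"] for row in
          conn.execute("SELECT email FROM users ORDER BY id")]
print(emails)  # ['alice@example.com']
conn.close()
```

Bob's row disappears along with the failed insert because both statements belonged to the same transaction.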

References