1. Running Python

  • Using the interactive interpreter (shell)

    $ python3
    Python 3.11.2 (main, Mar 13 2023, 12:18:29) [GCC 12.2.0] on linux
    Type "help", "copyright", "credits" or "license" for more information.
    >>> 2+2
    4
    >>> quit()
  • Using Python files

    $ echo 'print(2+2)' > test.py
    $ python3 test.py
    4
  • Using Python files with a shebang

    In computing, a shebang is the character sequence consisting of the characters number sign and exclamation mark (#!) at the beginning of a script. It is also called sharp-exclamation, sha-bang, hashbang, pound-bang, or hash-pling.

     — From Wikipedia, the free encyclopedia

    $ cat <<EOF > test.py
    > #!/usr/bin/env python3
    > print(2+2)
    > EOF
    $ chmod +x test.py
    $ ./test.py
    4
  • Executing modules as scripts

    python -m module_name runs a module as a script: Python locates the module on its module search path (sys.path), so you give the module’s name instead of the path to a .py file.

    $ python3 -m venv --help
    usage: venv [-h] [--system-site-packages] [--symlinks | --copies] [--clear] [--upgrade] [--without-pip]
                [--prompt PROMPT] [--upgrade-deps]
                ENV_DIR [ENV_DIR ...]
    
    Creates virtual Python environments in one or more target directories.
    . . .
    $ python3 -m webbrowser https://www.google.com

2. Indentations, comments, and multi-line expressions

  • Python uses whitespace indentation, rather than curly brackets or keywords, to delimit blocks; the PEP 8 style guide recommends four spaces per indentation level.

    • Don’t use tabs, and never mix tabs and spaces; Python 3 raises a TabError when a block mixes them inconsistently.

    • When designing the language that became Python, Guido van Rossum decided that indentation itself was enough to define a program’s structure, which spares programmers all those parentheses and curly braces. Python is unusual in using white space to define program structure.

    disaster = True
    if disaster:
        print("Woe!")
    else:
        print("Whee!")
  • A comment is marked by the # character (called hash, sharp, pound, or the sinister-sounding octothorpe); everything from that point to the end of the current line is part of the comment.

    # 60 sec/min * 60 min/hr * 24 hr/day
    seconds_per_day = 86400
    seconds_per_day = 86400  # 60 sec/min * 60 min/hr * 24 hr/day
    # Python does NOT
    # have a multiline comment.
    print("No comment: quotes make the # harmless.")
  • An expression can span multiple lines, either inside delimiters (parentheses, brackets, or braces) or after a trailing backslash.

    • A backslash (\) at the end of a line continues the statement on the next line. It still works in Python 3, but it is fragile (any character after it, even a space or a comment, is a SyntaxError), and it is redundant inside parentheses, brackets, or braces, where lines continue implicitly.

      # Works, but not recommended
      long_expression = 1 + 2 + 3 + 4 + 5 + \
                        6 + 7 + 8 + 9 + 10
    • In modern Python, prefer parentheses (()), brackets ([]), or braces ({}) over the continuation character (\) for readable multi-line expressions.

      # Parentheses for complex calculations
      a, b, c, d, e, f = 2, 3, 4, 10, 5, 1  # sample values
      long_calculation = (a * b +
                          c) * (d /
                                e - f)  # (6 + 4) * (2.0 - 1) == 10.0
      
      # Brackets for multi-line lists or data structures
      data = [
          "item1",
          "item2 with a longer description",
          "item3"
      ]
      
      # Braces for multi-line dictionaries
      person_info = {
          "name": "Alice",
          "age": 30,
          "hobbies": ["reading", "hiking"]
      }

3. Types, values, variables, and names

Python is a dynamically typed, strongly typed, garbage-collected programming language.

  • In a dynamically typed language, the data type of a variable is NOT explicitly declared at the time of definition, and is determined at runtime.

    age = 30  # age is an integer (no need to declare the data type explicitly)
    age = "thirty"  # age is now a string
  • In a statically typed language, the type of every variable is fixed at compile time (declared explicitly, or in some languages inferred), and the compiler checks type compatibility throughout the code.

    // In Java, declare the type of a variable before assigning a value.
    int age = 30;  // age is declared as an integer
    age = "thirty";  // error: incompatible types: String cannot be converted to int
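  • In a strongly typed language, values are not silently converted between unrelated types: an operation that mixes incompatible types, such as adding a string and a number, raises an error instead. Strong typing is independent of static typing; Python is strongly but dynamically typed.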

  • In Python, everything is an object, including integers and strings, and every object has associated methods and attributes. At runtime, Python checks whether the methods or attributes an operation uses are supported by the object’s type.

    # As in other dynamic languages, the type comes from the assigned value.
    name = "Alice"  # name refers to a str
    name + 10  # TypeError: can only concatenate str (not "int") to str

    In computer programming, duck typing is an application of the duck test—"If it walks like a duck and it quacks like a duck, then it must be a duck"—to determine whether an object can be used for a particular purpose.

     — From Wikipedia, the free encyclopedia
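A minimal sketch of duck typing: the function below never checks its argument's type, it just calls .speak(), so any object with that method is accepted. The classes Duck and Robot and the function make_it_speak() are hypothetical:

```python
class Duck:
    def speak(self):
        return 'Quack'


class Robot:
    # Unrelated to Duck, but it also has a speak() method
    def speak(self):
        return 'Beep'


def make_it_speak(thing):
    # No isinstance() check: anything with a speak() method works
    return thing.speak()


for thing in (Duck(), Robot()):
    print(make_it_speak(thing))
# Quack
# Beep

try:
    make_it_speak(42)  # int has no speak() method
except AttributeError as err:
    print('AttributeError:', err)
```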

bool # True, False

int # 47, 25000, 25_000, 0b0100_0000, 0o100, 0x40

float # 3.14, 2.7e5

complex # 3j, 5 + 9j

# In Python 3, strings are Unicode character sequences, not byte arrays.
str # 'alas', "alack", '''a verse attack'''

list # ['Winken', 'Blinken', 'Nod']
tuple # (2, 4, 8)

bytes # b'ab\xff'
bytearray # bytearray(...)

set # set([3, 5, 7])
frozenset # frozenset(['Elsa', 'Otto'])

dict # {'game': 'bingo', 'dog': 'dingo', 'drummer': 'Ringo'}
  • In Python, variables are not places, just names: a name is a reference to an object rather than the object itself. An object is a chunk of data that contains at least a type, a unique id, a value, and a reference count.

    >>> type(5.20)
    <class 'float'>
    >>> id(5.20)
    140683748269744
    >>> x = y = z = 0  # chained assignment: three names refer to the same object
    >>> import sys
    >>> sys.getrefcount(x)  # the exact counts vary between Python versions
    1000000591
    >>> del y
    >>> sys.getrefcount(x)
    1000000590
    >>> del z
    >>> sys.getrefcount(x)
    1000000589
  • A class is the definition of an object, and "class" and "type" mean pretty much the same thing.

    >>> type(7)
    <class 'int'>
    >>> type(7) == int
    True
    >>> isinstance(7, int)
    True
  • Strings, tuples, and lists are common built-in sequences: ordered collections with zero-based indexing. Tuples and lists can hold elements of any type, while strings are sequences of characters.

    # iteration
    for item in ['meow', 'bark', 'moo']:
        print(item)
    # enumeration
    for index, item in enumerate(['meow', 'bark', 'moo']):
        print(f'Index: {index}, Item: {item}')
    # comparisons
    ('meow', 'bark', 'moo') == ('meow', 'bark', 'moo')  # True
    ('meow', 'bark', 'moo') >= ('meow', 'bark')  # True
    ('meow', 'bark', 'moo') > ('meow', 'bark')  # True
    # `+`, `*`
    ('cat',) + ('dog', 'cattle')  # ('cat', 'dog', 'cattle')
    ('bark',) * 3  # ('bark', 'bark', 'bark')
    # unpacking
    cat, dog, cattle = ('meow', 'bark', 'moo')
    # testing with `in`
    'c' in 'cat'  # True
    'meow' in ['cat', 'cattle', 'dog']  # False
    # indexing and slicing (a slice returns a new, shallow-copied subsequence):
    hi = 'hello world!'
    hi[-13], hi[12]  # IndexError: string index out of range
    
    #   [:] extracts the entire sequence from start to end.
    #   [ start :] specifies from the start offset to the end.
    #   [: end ] specifies from the beginning to the end offset minus 1.
    #   [ start : end ] indicates from the start offset to the end offset minus 1.
    #   [ start : end : step ] extracts from the start offset to the end offset minus 1, skipping characters by step.
    hi[:], hi[0:5], hi[:5], hi[:5:], hi[0:5:], hi[0:5:1]  # ('hello world!', 'hello', 'hello', 'hello', 'hello', 'hello')
    len(hi), hi[-1], hi[-12], hi[11], hi[0]  # (12, '!', 'h', '!', 'h')
  • In Python, truthiness and falsiness are used to check a value in a Boolean context:

    • Truthy: values that evaluate to True, which include all non-zero numbers, non-empty strings and containers (lists, tuples, dictionaries, sets), and, by default, instances of user-defined classes.

    • Falsy: values that evaluate to False: False, None, zero of any numeric type (0, 0.0, 0j), and empty strings (""), lists ([]), tuples (()), dictionaries ({}), and sets (set()).

  • In Python, the logical operators and, or, not are used to combine Boolean values (True/False) or expressions that evaluate to Boolean values.

    letter = 'o'
    if letter == 'a' or letter == 'e' or letter == 'i' or letter == 'o' or letter == 'u':
        print(letter, 'is a vowel')
    else:
        print(letter, 'is not a vowel')
  • int(), float(), bin(), oct(), hex(), chr(), and ord()

    int(True), int(False)  # (1, 0)
    int(98.6), int(1.0e4)  # (98, 10000): int() truncates toward zero
    int('99'), int('-23'), int('+12'), int('1_000_000')  # (99, -23, 12, 1000000)
    
    int('10', 2), 'binary', int('10', 8), 'octal', int('10', 16), 'hexadecimal', int('10', 22), 'chesterdigital'
    # (2, 'binary', 8, 'octal', 16, 'hexadecimal', 22, 'chesterdigital')
    
    float(True), float(False)  # (1.0, 0.0)
    float('98.6'), float('-1.5'), float('1.0e4')  # (98.6, -1.5, 10000.0)
    
    bin(65), oct(65), hex(65)  # ('0b1000001', '0o101', '0x41')
    
    chr(65), ord('A')  # ('A', 65)
    
    # Python also promotes booleans to integers or floats:
    False + 0, True + 0, False + 0., True + 0.  # (0, 1, 0.0, 1.0)
  • type hints (or type annotations): variable_name: type, def func(argument: type) -> type

    age: int = 30
    pi: float = 3.14159
    def greet(name: str) -> str:
        """Greets the provided name."""
        return f"Hello, {name}!"
  • Python provides bit-level integer operators, similar to those in the C language.

    x = 5  # 0b0101
    y = 1  # 0b0001
    
    print(f"0b{(x & y):04b}")  # and
    # 0b0001
    print(f"0b{(x | y):04b}")  # or
    # 0b0101
    print(f"0b{(x ^ y):04b}")  # exclusive or
    # 0b0100
    print(f'0b{~x:04b}')  # flip bits
    # 0b-110
    print(f'0b{(x << 1):04b}')  # left shift
    # 0b1010
    print(f'0b{(x >> 1):04b}')  # right shift
    # 0b0010
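The truthiness and and/or bullets earlier in this section can be checked directly with bool(). A short sketch; the sample values are arbitrary, and the last lines show membership testing with in as a shorter alternative to the chained or in the vowel example:

```python
# bool() shows how a value behaves in a Boolean context
falsy = [False, None, 0, 0.0, '', [], (), {}, set()]
truthy = [True, 1, -1, 0.5, 'a', [0], (None,), {'k': 0}]

print(all(not bool(v) for v in falsy))  # True
print(all(bool(v) for v in truthy))     # True

# `and` and `or` short-circuit and return one of their operands
print(0 or 'default')   # default (first operand is falsy, so the second is returned)
print('cat' or 'dog')   # cat (first operand is truthy, so it is returned)
print('cat' and 'dog')  # dog (both truthy: the last one evaluated is returned)
print([] and 'dog')     # [] (first operand is falsy, so it is returned)

# Membership test instead of chaining `or`
letter = 'o'
print(letter in 'aeiou')  # True
```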

4. Strings

  • UTF-8 is the standard text encoding in Python (the default for source files and for str.encode()), on Linux, and in HTML.

    Ken Thompson and Rob Pike, whose names will be familiar to Unix developers, designed the UTF-8 variable-length encoding scheme one night on a placemat in a New Jersey diner. It uses one to four bytes per Unicode character:

    • One byte for ASCII

    • Two bytes for accented Latin letters and for Greek, Cyrillic, Hebrew, and Arabic

    • Three bytes for the rest of the basic multilingual plane

    • Four bytes for the rest (characters outside the basic multilingual plane), such as emoji and rare CJK ideographs

    cafe = 'café'
    
    # len() function on string counts Unicode characters, not bytes:
    len(cafe)  # 4
    
    cafe_bytes = cafe.encode()  # b'caf\xc3\xa9'
    
    # len() on a bytes object counts bytes:
    len(cafe_bytes)  # 5
    
    cafe_text = cafe_bytes.decode()  # 'café'
  • Strings are created by enclosing characters in matching single, double, or triple quotes:

    'Snap'
    "Crackle"
    "'Nay!' said the naysayer. 'Neigh?' said the horse."
    'The rare double quote in captivity: ".'
    '''Boom!'''
    """Eek!"""
  • Triple quotes are very useful to create multiline strings, like this classic poem from Edward Lear:

    poem = '''There was a Young Lady of Norway,
        Who casually sat in a doorway;
        When the door squeezed her flat,
        She exclaimed, "What of that?"
        This courageous Young Lady of Norway.'''
    print(poem)
    # There was a Young Lady of Norway,
    #     Who casually sat in a doorway;
    #     When the door squeezed her flat,
    #     She exclaimed, "What of that?"
    #     This courageous Young Lady of Norway.
    # line breaks and leading spaces are preserved, as the string's repr shows:
    poem  # 'There was a Young Lady of Norway,\n    Who casually sat in a doorway;\n    When the door squeezed her flat,\n    She exclaimed, "What of that?"\n    This courageous Young Lady of Norway.'
  • Escape with \, combine by using +, duplicate with *

    hi = ('Na ' 'Na ' 'Na ' 'Na '  # adjacent string literals (not variables) are joined automatically
          + 'Hey ' * 4
          + '\\' + '\t' + 'Goodbye.')
    print(hi)  # Na Na Na Na Hey Hey Hey Hey \	Goodbye.
  • Python has a few special types of strings, indicated by a letter before the first quote.

    • f or F starts an f-string, used for formatting.

      thing = 'wereduck'
      place = 'werepond'
      print(f'The {thing} is in the {place}')  # The wereduck is in the werepond
    • r or R starts a raw string, in which backslashes are kept as literal characters instead of starting escape sequences.

      info = r'Type a \n to get a new line'  # info = 'Type a \\n to get a new line'
      # a raw string still keeps real line breaks (typed in the source, not written as `\n`):
      poem = r'''Boys and girls, come out to play.
      The moon doth shine as bright as day.'''  # 'Boys and girls, come out to play.\nThe moon doth shine as bright as day.'
      print(poem)
      # Boys and girls, come out to play.
      # The moon doth shine as bright as day.
    • fr or rf (in any combination of upper- and lowercase) starts a raw f-string.

      hello = 'Hello'
      world = '世界'
      print(fr'{hello}, {world}!')  # Hello, 世界!
    • u starts a Unicode string, which is the same as a plain string.

      # Python 3 strings are already Unicode; the u prefix is accepted for compatibility with Python 2 code.
      hi = u'Hello, 世界!'  # same as: hi = 'Hello, 世界!'
    • b starts a bytes literal, a value of type bytes.

      ip = [20, 205, 243, 166]
      bytes(ip)  # b'\x14\xcd\xf3\xa6'
      b'\x14\xcd\xf3\xa6' == bytes(ip)  # True
  • Python has three ways of formatting strings.

    actor = 'Richard Gere'
    cat = 'Chester'
    weight = 28
    # old style (supported in Python 2 and 3): format_string % data
    "My wife's favorite actor is %s" % actor  # "My wife's favorite actor is Richard Gere"
    "Our cat %s weighs %d pounds" % (cat, weight)  # 'Our cat Chester weighs 28 pounds'
    # new style (Python 2.6 and up): format_string.format(data)
    "Our cat {} weighs {} pounds".format(cat, weight)  # 'Our cat Chester weighs 28 pounds'
    # f-strings (Python 3.6 and up): f, F
    f"Our cat {cat} weighs {weight} pounds"  # 'Our cat Chester weighs 28 pounds'
  • regular expressions

    import re
    
    p = 'Les Fleurs du Mal'  # pattern
    c = re.compile(p)  # compile
    s = "Charles Baudelaire's 'Les Fleurs du Mal'"  # source
    m = c.search(s)  # match
    if m:  # m is not None
        print("Mon cœur est comme une feuille sèche, emportée par le vent...")
    m = re.match('Les Fleurs du Mal', s)  # match() only matches at the beginning of s
    print(m)  # no Match object, because s does not start with the pattern
    # None
    
    m = re.search('Les Fleurs du Mal', s)  # find the first match anywhere with search()
    print(m)  # a Match object
    # <re.Match object; span=(22, 39), match='Les Fleurs du Mal'>
    
    m = re.findall('es', s)  # find all matches with findall()
    print(m)  # findall() returns a list
    # ['es', 'es']
    
    m = re.split(r'\s', s)  # split at matches with split()
    print(m)  # split() returns a list
    # ['Charles', "Baudelaire's", "'Les", 'Fleurs', 'du', "Mal'"]
    
    m = re.sub("'", '?', s)  # replace matches with sub()
    print(m)  # sub() returns a string
    # Charles Baudelaire?s ?Les Fleurs du Mal?
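The one-to-four-byte rule from the UTF-8 bullet at the start of this section can be checked by encoding a single character from each range. A quick sketch; the sample characters are arbitrary choices:

```python
# Encode single characters and count the resulting UTF-8 bytes
for char in ['A', 'é', '€', '世', '😀']:
    encoded = char.encode('utf-8')
    print(char, f'U+{ord(char):04X}', len(encoded), encoded)
# A U+0041 1 b'A'
# é U+00E9 2 b'\xc3\xa9'
# € U+20AC 3 b'\xe2\x82\xac'
# 世 U+4E16 3 b'\xe4\xb8\x96'
# 😀 U+1F600 4 b'\xf0\x9f\x98\x80'
```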

5. If, while, and for

  • In Python (version 3.8 and above), the walrus operator (:=, formally known as the assignment expression operator) combines assignment and expression evaluation in a single line.

    tweet_limit = 280
    tweet_string = "Blah" * 50
    if (diff := tweet_limit - len(tweet_string)) >= 0:  # walrus operator; without the parentheses, diff would be the bool result of >=
        print("A fitting tweet")
    else:
        print("Went over by", abs(diff))
  • Compare with if, elif, and else:

    color = "mauve"
    if color == "red":
        print("It's a tomato")
    elif color == "green":
        print("It's a green pepper")
    else:
        print("I've never heard of the color", color)
  • Repeat with while, and break, continue, and else:

    while True:
        value = input("Integer, please [q to quit]: ")
        if value == 'q':  # quit
            break
        number = int(value)
        if number % 2 == 0:  # an even number
            continue
        print(number, "squared is", number*number)
    numbers = [1, 3, 5]
    position = 0
    while position < len(numbers):
        number = numbers[position]
        if number % 2 == 0:
            print('Found even number', number)
            break
        position += 1
    else:  # break not called
        print('No even number found')
  • Iterate with for and in, and break, continue and else:

    word = 'thud'
    for letter in word:
        if letter == 'u':
            continue
        print(letter)
    word = 'thud'
    for letter in word:
        if letter == 'x':
            print("Eek! An 'x'!")
            break
        print(letter)
    else:
        print("No 'x' in there.")
    for num in range(0, 10, 2):
        print(num)  # 0 2 ... 8
    for nums in zip(range(0, 10, 2), range(1, 10, 2)):
        print(nums)  # (0, 1) (2, 3) ... (8, 9)
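Besides if statements, the walrus operator is often used inside a comprehension's condition to compute a value once and reuse it. A sketch assuming a made-up list of lines in a "name: sound" format:

```python
import re

lines = ['cat: meow', 'no sound here', 'dog: bark']

# Bind each match object once with :=, instead of calling re.search() twice
sounds = [m.group(1) for line in lines if (m := re.search(r': (\w+)', line))]
print(sounds)  # ['meow', 'bark']
```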

6. Tuples and lists

  • Tuples are built-in immutable sequences.

    # commas make a tuple; a single element needs a trailing comma (`,`):
    'cat',  # ('cat',)
    'cat', 'dog', 'cattle'  # ('cat', 'dog', 'cattle')
    
    # to make an empty tuple, using `()`, or `tuple()`:
    ()  # ()
    tuple()  # ()
    
    # the comma is required to make a tuple
    ('cat')  # 'cat'
    
    # the parentheses are not required, but they make the tuple more visible
    ('cat',)  # ('cat',)
    ('cat', 'dog', 'cattle')  # ('cat', 'dog', 'cattle')
    
    # where commas have another meaning (such as in a function call), the parentheses are needed
    type('cat',)  # <class 'str'>
    type(('cat',))  # <class 'tuple'>
    
    # tuple()
    tuple('cat')  # ('c', 'a', 't')
    
    # zip()
    for x in zip([1, 2, 8], [1, 4, 9], ('cat', 'dog', 'cattle', 'chicken')):  # zip() stops at the shortest input
        print(x)
    # (1, 1, 'cat')
    # (2, 4, 'dog')
    # (8, 9, 'cattle')
  • Lists are built-in mutable sequences.

    # create with `[]` or `list()`
    []  # []
    ['meow', 'bark', 'moo']  # ['meow', 'bark', 'moo']
    [('cat', 'meow'), 'bark', 'moo']  # [('cat', 'meow'), 'bark', 'moo']
    list()  # []
    list('cat')  # ['c', 'a', 't']
    
    # append(), insert()
    wow = ['meow']  # ['meow']
    wow.append('moo')  # ['meow', 'moo']
    wow.insert(1, 'bark')  # ['meow', 'bark', 'moo']
    
    # del, remove(), pop(), clear()
    farm = ['cat', 'dog', 'cattle', 'chicken', 'duck']
    
    del farm[-1]
    # ['cat', 'dog', 'cattle', 'chicken']
    
    farm.remove('dog')
    # ['cat', 'cattle', 'chicken']
    
    farm.pop()  # 'chicken'
    # ['cat', 'cattle']
    
    farm.pop(-1)  # 'cattle'
    # ['cat']
    
    farm.clear()
    # []
    
    # sort() and sorted()
    farm = ['cat', 'dog', 'cattle']
    
    # a sorted copy
    sorted(farm)  # ['cat', 'cattle', 'dog']
    print(farm)  # ['cat', 'dog', 'cattle']
    
    # sorting in-place
    farm.sort()
    print(farm)  # ['cat', 'cattle', 'dog']
    
    # copy() and deepcopy()
    a = [['cat', 'meow'], ['dog', 'bark']]
    b = a.copy()
    c = a[:]
    d = list(c)
    
    import copy
    e = copy.deepcopy(a)
    
    a[0][1] = 'moo'
    a  # [['cat', 'moo'], ['dog', 'bark']]
    b  # [['cat', 'moo'], ['dog', 'bark']]
    c  # [['cat', 'moo'], ['dog', 'bark']]
    d  # [['cat', 'moo'], ['dog', 'bark']]
    
    e  # [['cat', 'meow'], ['dog', 'bark']]
    
    # list comprehensions: [expression for item in iterable]
    even_numbers = [2 * num for num in range(5)]
    # [0, 2, 4, 6, 8]
    # list comprehensions: [expression for item in iterable if condition]
    odd_numbers = [num for num in range(10) if num % 2 == 1]
    # [1, 3, 5, 7, 9]
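To make the immutable/mutable distinction between tuples and lists concrete, a small sketch (the variable names are illustrative):

```python
animals = ('cat', 'dog')
try:
    animals[0] = 'cow'  # tuples do not support item assignment
except TypeError as err:
    print(err)  # 'tuple' object does not support item assignment

# "Changing" a tuple means building a new one and rebinding the name
animals = ('cow',) + animals[1:]
print(animals)  # ('cow', 'dog')

# A list, by contrast, is changed in place
farm = ['cat', 'dog']
farm[0] = 'cow'
print(farm)  # ['cow', 'dog']
```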

7. Dictionaries and sets

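In Python, keys in dictionaries (dict) and elements in sets must be hashable. In practice this means immutable types such as str, int, float, or tuples of immutable values; mutable types such as list, dict, and set are not allowed.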
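A short sketch of the hashability rule above; the dictionary grid is a made-up example:

```python
# A tuple of immutable values is hashable, so it can be a dict key or a set element
grid = {(0, 0): 'origin', (1, 2): 'point'}
print(grid[(1, 2)])  # point

# A list is mutable and therefore unhashable, so using one as a key raises TypeError
try:
    grid[[3, 4]] = 'not allowed'
except TypeError as err:
    print('TypeError:', err)  # the message reports the unhashable type 'list'

# A tuple is hashable only if everything inside it is hashable
try:
    {([1, 2], 3)}
except TypeError as err:
    print('TypeError:', err)
```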

Dictionaries

# `{}`
{}  # {}
{'cat': 'meow', 'dog': 'bark'}  # {'cat': 'meow', 'dog': 'bark'}

# dict(): argument names need to be legal variable names (no spaces, no reserved words)
dict(cat='meow', dog='bark')  # {'cat': 'meow', 'dog': 'bark'}

# dict(): convert two-value sequences into a dictionary
dict([['cat', 'meow'], ['dog', 'bark']])  # {'cat': 'meow', 'dog': 'bark'}

# [key], get()
animals = {'cat': 'meow', 'dog': 'bark'}
animals['cattle'] = 'moo'  # {'cat': 'meow', 'dog': 'bark', 'cattle': 'moo'}
animals['cat']  # 'meow'
animals['sheep']  # KeyError: 'sheep'
animals.get('sheep')  # None
animals.get('sheep', 'baa')  # 'baa'

# keys(), values(), items(), len()
animals.keys()  # dict_keys(['cat', 'dog', 'cattle'])
animals.values()  # dict_values(['meow', 'bark', 'moo'])
animals.items()  # dict_items([('cat', 'meow'), ('dog', 'bark'), ('cattle', 'moo')])
len(animals)  # 3

# `**`, update()
{**{'cat': 'meow'}, **{'dog': 'bark'}}  # {'cat': 'meow', 'dog': 'bark'}
animals = {'cat': 'meow'}
animals.update({'dog': 'bark'})  # {'cat': 'meow', 'dog': 'bark'}

# del, pop(), clear()
animals = {'cat': 'meow', 'dog': 'bark', 'cattle': 'moo'}
del animals['dog']
# {'cat': 'meow', 'cattle': 'moo'}
animals.pop('cattle')  # 'moo'
# {'cat': 'meow'}
animals.clear()
# {}

# iterations
animals = {'cat': 'meow', 'dog': 'bark', 'cattle': 'moo'}
for key in animals:  # for key in animals.keys()
    print(f'{key} => {animals[key]}', end='\t')
# cat => meow	dog => bark	cattle => moo

# dictionary comprehensions: {key_expression : value_expression for expression in iterable}
word = 'letters'
letter_counts = {letter: word.count(letter) for letter in word}
# {'l': 1, 'e': 2, 't': 2, 'r': 1, 's': 1}

# dictionary comprehensions: {key_expression : value_expression for expression in iterable if condition}
vowels = 'aeiou'
word = 'onomatopoeia'
vowel_counts = {letter: word.count(letter)
                for letter in set(word) if letter in vowels}
# {'i': 1, 'o': 4, 'a': 2, 'e': 1} (key order may vary: set iteration order is arbitrary)

Sets

# `{}`, set(), frozenset()
{}  # an empty dict, not an empty set: type({}) is dict
{0, 2, 4, 6}  # {0, 2, 4, 6}

set()  # set()
set('letter')  # {'l', 't', 'r', 'e'}
set({'cat': 'meow', 'dog': 'bark', 'cattle': 'moo'})  # {'cat', 'cattle', 'dog'}

frozenset()  # frozenset()
frozenset([3, 1, 4, 1, 5, 9])  # frozenset({1, 3, 4, 5, 9})

# len(), add(), remove()
nums = {0, 1, 2, 3, 4}
len(nums)  # 5
nums.add(5)  # {0, 1, 2, 3, 4, 5}
nums.remove(0)  # {1, 2, 3, 4, 5}

# iteration
for num in {0, 2, 4, 6, 8}:
    print(num, end='\t')
# 0	2	4	6	8

# testing
2 in {0, 2, 4}  # True
3 in {0, 2, 4}  # False

# `&`: intersection(), `|`: union(), `-`: difference(), `^`: symmetric_difference()
a = {1, 3}
b = {2, 3}
a & b  # {3}
a | b  # {1, 2, 3}
a - b  # {1}
a ^ b  # {1, 2}

# `<=`: issubset(), `<`: proper subset, `>=`: issuperset(), `>`: proper superset
a <= b  # False
a < b  # False
a >= b  # False
a > b  # False

# set comprehensions: { expression for expression in iterable }
{num for num in range(10)}  # {0, 1, 2, 3, 4, 5, 6, 7, 8, 9}
# set comprehensions: { expression for expression in iterable if condition }
{num for num in range(10) if num % 2 == 0}  # {0, 2, 4, 6, 8}

8. Bytes and bytearray

Python 3 introduced the following sequences of eight-bit integers, with possible values from 0 to 255, in two types:

  • bytes is immutable, like a tuple of bytes

  • bytearray is mutable, like a list of bytes

Endian order refers to the byte order used to store multi-byte values (like integers, floats) in computer memory.

  • Big-Endian: In big-endian order, the most significant byte (MSB) of a multi-byte value is stored at the beginning (lower memory address) of the allocated space. The remaining bytes follow in decreasing order of significance.

  • Little-Endian: In little-endian order, the least significant byte (LSB) is stored at the beginning (lower memory address), followed by bytes of increasing significance.
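The two byte orders can be seen with int.to_bytes() and int.from_bytes(), and with the standard struct module. A sketch using an arbitrary value of 1024 (0x0400):

```python
import struct

n = 1024  # 0x0400

big = n.to_bytes(2, 'big')        # most significant byte first
little = n.to_bytes(2, 'little')  # least significant byte first
print(big, little)  # b'\x04\x00' b'\x00\x04'

print(int.from_bytes(big, 'big'))     # 1024
print(int.from_bytes(big, 'little'))  # 4 (the same bytes read in the wrong order)

# In struct formats, '>' means big-endian and '<' little-endian; 'I' is a 4-byte unsigned int
print(struct.pack('>I', n))  # b'\x00\x00\x04\x00'
print(struct.pack('<I', n))  # b'\x00\x04\x00\x00'
```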

blist = [1, 2, 3, 255]

the_bytes = bytes(blist)
print(the_bytes)
# b'\x01\x02\x03\xff'

the_byte_array = bytearray(blist)
print(the_byte_array)
# bytearray(b'\x01\x02\x03\xff')

the_bytes[0] = 127  # TypeError: 'bytes' object does not support item assignment

the_byte_array[0] = 127

the_byte_array[1] = 256  # ValueError: byte must be in range(0, 256)

the_bytes = bytes(range(0, 256))
for i in range(0, len(the_bytes), 16):
    end_index = min(i+16, len(the_bytes))
    print(the_bytes[i:end_index])
# b'\x00\x01\x02\x03\x04\x05\x06\x07\x08\t\n\x0b\x0c\r\x0e\x0f'
# b'\x10\x11\x12\x13\x14\x15\x16\x17\x18\x19\x1a\x1b\x1c\x1d\x1e\x1f'
# b' !"#$%&\'()*+,-./'
# b'0123456789:;<=>?'
# b'@ABCDEFGHIJKLMNO'
# b'PQRSTUVWXYZ[\\]^_'
# b'`abcdefghijklmno'
# b'pqrstuvwxyz{|}~\x7f'
# b'\x80\x81\x82\x83\x84\x85\x86\x87\x88\x89\x8a\x8b\x8c\x8d\x8e\x8f'
# b'\x90\x91\x92\x93\x94\x95\x96\x97\x98\x99\x9a\x9b\x9c\x9d\x9e\x9f'
# b'\xa0\xa1\xa2\xa3\xa4\xa5\xa6\xa7\xa8\xa9\xaa\xab\xac\xad\xae\xaf'
# b'\xb0\xb1\xb2\xb3\xb4\xb5\xb6\xb7\xb8\xb9\xba\xbb\xbc\xbd\xbe\xbf'
# b'\xc0\xc1\xc2\xc3\xc4\xc5\xc6\xc7\xc8\xc9\xca\xcb\xcc\xcd\xce\xcf'
# b'\xd0\xd1\xd2\xd3\xd4\xd5\xd6\xd7\xd8\xd9\xda\xdb\xdc\xdd\xde\xdf'
# b'\xe0\xe1\xe2\xe3\xe4\xe5\xe6\xe7\xe8\xe9\xea\xeb\xec\xed\xee\xef'
# b'\xf0\xf1\xf2\xf3\xf4\xf5\xf6\xf7\xf8\xf9\xfa\xfb\xfc\xfd\xfe\xff'

9. Functions

# pass
def do_nothing():
    pass  # NOOP
print(do_nothing())  # a function without a return statement returns None
# None
def whatis(thing):
    if thing is None:
        print(thing, "is None")
    elif thing:
        print(thing, "is truthy")
    else:
        print(thing, "is falsy")

whatis(None)  # None is None
whatis(0)  # 0 is falsy
# arguments
def menu(wine, entree, dessert):
    return {'wine': wine, 'entree': entree, 'dessert': dessert}

# positional arguments
menu('chardonnay', 'chicken', 'cake')
# {'wine': 'chardonnay', 'entree': 'chicken', 'dessert': 'cake'}

# keyword arguments
menu(entree='beef', dessert='bagel', wine='bordeaux')
# {'wine': 'bordeaux', 'entree': 'beef', 'dessert': 'bagel'}

# mix positional and keyword arguments
menu('frontenac', dessert='flan', entree='fish')
# {'wine': 'frontenac', 'entree': 'fish', 'dessert': 'flan'}

# default parameters
def menu(wine, entree, dessert='pudding'):
    return {'wine': wine, 'entree': entree, 'dessert': dessert}

menu('chardonnay', 'chicken')
# {'wine': 'chardonnay', 'entree': 'chicken', 'dessert': 'pudding'}
# (tuple) explode/gather optional positional arguments with `*`
def print_args(*args):
    print(args)

print_args()
# ()
print_args('meow', 'bark', 'moo')
# ('meow', 'bark', 'moo')
print_args(('meow', 'bark', 'moo'))
# (('meow', 'bark', 'moo'),)
print_args(*('meow', 'bark', 'moo'))
# ('meow', 'bark', 'moo')

# (dict) explode/gather optional keyword arguments with `**`
def print_kargs(**kargs):
    print(kargs)

print_kargs()
# {}
print_kargs(cat='meow', dog='bark', cattle='moo')
# {'cat': 'meow', 'dog': 'bark', 'cattle': 'moo'}
print_kargs(**{'cat': 'meow', 'dog': 'bark', 'cattle': 'moo'})
# {'cat': 'meow', 'dog': 'bark', 'cattle': 'moo'}
# keyword-only arguments `*`
def print_data(data, *, start=0, end=100):
    """
    the parameters start and end must be provided as keyword arguments
    """
    for v in data[start:end]:
        print(v, end='\t')

print_data(('meow', 'bark', 'moo'))
# meow	bark	moo
print_data(('meow', 'bark', 'moo'), start=1)
# bark	moo
def the_order_of_arguments(
    required: str,
    optional: str | None = None,
    *args: object,  # the annotation applies to each extra positional argument, not to the tuple
    key: str | None = None,
    **kwargs: object,  # the annotation applies to each extra keyword value, not to the dict
) -> None:
    """
    This function demonstrates the order of parameters in Python.

    Args:
        required (str): A required positional argument.
        optional (str, optional): An optional positional argument with a default value of None.
        *args: Captures any remaining positional arguments as a tuple.
        key (str, optional): A keyword-only argument with a default value of None.
        **kwargs: Captures any remaining keyword arguments as a dictionary.

    Returns:
        None
    """
    # Function body (can be replaced with actual logic)
    print(f"Required argument: {required}")
    print(f"Optional argument: {optional}")
    print(f"Positional arguments (as tuple): {args}")
    print(f"Keyword-only argument: {key}")
    print(f"Keyword arguments (as dictionary): {kwargs}")

the_order_of_arguments("This is required", "This is optional", x=10, y="hello")
# Required argument: This is required
# Optional argument: This is optional
# Positional arguments (as tuple): ()
# Keyword-only argument: None
# Keyword arguments (as dictionary): {'x': 10, 'y': 'hello'}
# docstring
def echo(anything):
    'echo returns its input argument'
    return anything

print(echo.__doc__)  # echo returns its input argument
help(echo)
# functions are first-class citizens
def answer():
    print(42)

def run_sth(func):
    func()

run_sth(answer)  # 42

# inner functions
def outer(a, b):
    def inner(c, d):
        return c+d
    return inner(a, b)

# closures
def wow(voice):
    def inner():
        return f'Wow: {voice}'
    return inner

cat = wow('meow')
dog = wow('bark')
cat()  # 'Wow: meow'
dog()  # 'Wow: bark'

# recursion
def flatten(lol):
    for item in lol:
        if isinstance(item, list):
            yield from flatten(item)  # yield from expression
        else:
            yield item

lol = [1, 2, [3, 4, 5], [6, [7, 8, 9], []]]
list(flatten(lol))
# [1, 2, 3, 4, 5, 6, 7, 8, 9]

# anonymous functions: lambda
def is_odd(num):
    return num % 2 == 1

nums = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
list(filter(is_odd, nums))
# [1, 3, 5, 7, 9]
list(filter(lambda num: num % 2 == 0, nums))
# [0, 2, 4, 6, 8]
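The keyword-only rule in print_data above can be checked by passing start positionally. A sketch of a hypothetical variant that returns the slice as a list instead of printing it, so the result is easy to inspect:

```python
def print_data(data, *, start=0, end=100):
    # Parameters after the bare * can only be passed by keyword
    return list(data[start:end])

print(print_data(('meow', 'bark', 'moo'), start=1))  # ['bark', 'moo']

try:
    print_data(('meow', 'bark', 'moo'), 1)  # start passed positionally
except TypeError as err:
    print('TypeError:', err)
```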

9.1. Generators

A generator is an object that produces a sequence of values lazily, one at a time. Generators are iterators, so they often serve as the data source for for loops and other iteration.

  • It can be used to iterate through potentially huge sequences without creating and storing the entire sequence in memory at once.

  • Each time you iterate over a generator, it resumes where it left off and returns the next value.

  • A generator can be run only once; it can’t be restarted or backed up.

  • A generator function is a normal function, except that it produces its values with yield statements instead of returning a single value with return.

    def xrange(start=0, stop=10, step=1):
        number = start
        while number < stop:
            yield number
            number += step
    
    ranger = xrange(1, 5)
    print(ranger)  # <generator object xrange at 0x7f119757b220>
    
    for num in ranger:
        print(num, end='\t')  # 1	2	3	4
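A sketch of the run-only-once behavior described above, using a hypothetical countdown() generator and the built-in next():

```python
def countdown(n):
    # Each yield hands back one value and pauses here until the next request
    while n > 0:
        yield n
        n -= 1

gen = countdown(3)
print(next(gen))  # 3 (runs until the first yield)
print(list(gen))  # [2, 1] (continues where it left off)
print(list(gen))  # [] (exhausted: a generator cannot be restarted)

gen = countdown(3)  # to iterate again, create a new generator
print(sum(gen))     # 6
```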

9.2. Decorators

A decorator is a function that takes one function as input and returns another function.

def document_it(func):
    def new_function(*args, **kwargs):
        print('Running function:', func.__name__)
        print('Positional arguments:', args)
        print('Keyword arguments:', kwargs)
        result = func(*args, **kwargs)
        print('Result:', result)
        return result
    return new_function

def add_ints(a, b):
    return a+b

cooler_add_ints = document_it(add_ints)  # manual decorator assignment
cooler_add_ints(1, 2)
# Running function: add_ints
# Positional arguments: (1, 2)
# Keyword arguments: {}
# Result: 3
# 3

@document_it  # an alternative to the manual decorator assignment
def add_floats(a: float, b: float) -> float:
    return a + b

def square_it(func):
    def new_function(*args, **kargs):
        result = func(*args, **kargs)
        return result*result
    return new_function

# more than one decorator for a function
@document_it
@square_it
def add_numbers(a: float, b: float) -> float:
    return a + b

add_numbers(2, 3)
# Running function: new_function
# Positional arguments: (2, 3)
# Keyword arguments: {}
# Result: 25
# 25
def dump(func):
    "Print input arguments and output value(s)"
    def wrapped(*args, **kwargs):
        print("Function name:", func.__name__)
        print("Input arguments:", ' '.join(map(str, args)))
        print("Input keyword arguments:", kwargs.items())
        output = func(*args, **kwargs)
        print("Output:", output)
        return output
    return wrapped
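The add_numbers output above reports "Running function: new_function" because the wrapper replaces the decorated function's metadata. The standard library's functools.wraps copies the original __name__ and __doc__ onto the wrapper; a sketch applying it to square_it:

```python
import functools

def square_it(func):
    @functools.wraps(func)  # copy __name__, __doc__, etc. from func onto the wrapper
    def new_function(*args, **kwargs):
        result = func(*args, **kwargs)
        return result * result
    return new_function

@square_it
def add_numbers(a: float, b: float) -> float:
    """Return a + b."""
    return a + b

print(add_numbers.__name__)  # add_numbers (not new_function)
print(add_numbers.__doc__)   # Return a + b.
print(add_numbers(2, 3))     # 25
```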

9.3. Exceptions

An exception is an object whose class derives from BaseException; exceptions you define should subclass Exception, as below.

class OopsException(Exception):
    pass

try:
    raise OopsException('panic')  # raising exceptions
except OopsException as err:
    print(err)  # panic
except (RuntimeError, TypeError, NameError) as err:  # multiple exceptions as a parenthesized tuple
    pass
except Exception as other:  # catches any exception derived from Exception
    pass
except:  # bare except: catches everything, even SystemExit and KeyboardInterrupt; best avoided
    pass
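
Since exceptions are ordinary classes, they can form hierarchies and carry extra data. A minimal sketch with hypothetical classes AppError and ConfigError and a hypothetical load_setting() helper: an except clause naming a base class also catches its subclasses, and the caught object exposes the attributes set in __init__.

```python
class AppError(Exception):
    """Hypothetical base class for this application's errors."""

class ConfigError(AppError):
    """Hypothetical error that remembers which setting was bad."""
    def __init__(self, key, message):
        super().__init__(f'{key}: {message}')  # sets str(err) and err.args
        self.key = key

def load_setting(settings, key):
    if key not in settings:
        raise ConfigError(key, 'missing setting')
    return settings[key]

try:
    load_setting({}, 'timeout')
except AppError as err:  # a base class also matches subclass instances
    print(type(err).__name__, err, err.key)  # ConfigError timeout: missing setting timeout

print(issubclass(ConfigError, Exception))  # True
```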

9.4. locals() and globals()

Python provides two functions to access the contents of the namespaces:

  • locals() returns a dictionary of the contents of the local namespace.

  • globals() returns a dictionary of the contents of the global namespace.

a = 5.21

def print_global_a():
    global a  # not needed just to read a (explicit is better than implicit); required to assign to it
    print(a)

print_global_a()
# 5.21

def print_locals_globals():
    a: int = 0
    b: float = 3.14
    print(locals())
    print(globals())

print_locals_globals()
# {'a': 0, 'b': 3.14}
# {'__name__': '__main__', '__doc__': None, ..., '__builtins__': <module 'builtins' (built-in)>, 'a': 5.21, 'print_global_a': <function print_global_a at 0x...>, 'print_locals_globals': <function print_locals_globals at 0x...>}
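  • vars() without arguments is equivalent to locals(); at module level, locals() is the same dictionary as globals(), so the output below lists module names.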

    print(vars())
    # {'__name__': '__main__', '__doc__': None, '__package__': None, '__loader__': <class '_frozen_importlib.BuiltinImporter'>, '__spec__': None, '__annotations__': {}, '__builtins__': <module 'builtins' (built-in)>}
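
The global statement matters only when a function assigns to a module-level name: without it, any assignment makes the name local to the whole function, so even reading it first fails. A minimal sketch with a hypothetical counter:

```python
counter = 0

def increment():
    global counter  # needed because the function assigns to counter
    counter += 1
    return counter

def broken_increment():
    counter += 1    # no `global`: counter is local here, so it is read before assignment
    return counter

increment()
increment()
print(counter)               # 2
print(globals()['counter'])  # 2: globals() is the module's namespace dictionary

try:
    broken_increment()
except UnboundLocalError as err:
    print('UnboundLocalError:', err)
```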

10. Objects and classes

# define a class
class Cat:  # standard class definition
    pass

class Cat():  # less common approach (equivalent in functionality)
    pass

# create an object from a class
cat = Cat()

# assign attributes directly to an object anytime after its creation.
cat.wow = 'meow'
cat.wow  # 'meow'

# initialization: __init__() is the initializer; double underscores (__) are pronounced "dunder" to save syllables.
class Cat:
    # self is not a reserved word, but it’s common as the first argument to refer to the object itself.
    def __init__(self, name):  # initializer
        self.name = name

    # a method is a function in a class or object.
    def wow(self):
        print(f'{self.name}: meow!')


cat = Cat('Tom')
cat.wow()  # Tom: meow!
Cat.wow(cat)  # Tom: meow!

# class and object attributes
class Cat:
    color = 'red'

tom = Cat()
jerry = Cat()
print(tom.color)  # red
print(jerry.color)  # red

tom.color = 'black'  # creates an object attribute on tom that shadows the class attribute
Cat.color = 'blue'  # affects every object, existing or new, that has no color attribute of its own

butch = Cat()
print(jerry.color)  # blue
print(tom.color)  # black
print(butch.color)  # blue
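
The results above follow from where attribute lookups go: assigning through an object stores the name in that object's own __dict__, while reading a name the object lacks falls back to the class. A small check, with the Cat class rebuilt so the snippet runs on its own:

```python
class Cat:
    color = 'red'

tom, jerry = Cat(), Cat()
tom.color = 'black'  # instance attribute, shadows Cat.color for tom only
Cat.color = 'blue'   # class attribute, seen by jerry

print(vars(tom))     # {'color': 'black'}
print(vars(jerry))   # {}: jerry has no color of its own
print(jerry.color)   # blue, found on the class
del tom.color        # remove the shadowing instance attribute
print(tom.color)     # blue
```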
# inheritance
class Animal:
    def __init__(self, voice) -> None:
        self.voice = voice

    def wow(self):
        print(f'{self.voice}!')


class Cat(Animal):
    pass


class Dog(Animal):
    def __init__(self) -> None:
        super().__init__('bark')

    def wow(self):
        print(f'{self.voice}! '*3)

cat = Cat('meow')
cat.wow()  # meow!

dog = Dog()
dog.wow()  # bark! bark! bark!

# multiple inheritance: method resolution order
class Animal:
    def wow(self):
        print('I speak!')

class Horse(Animal):
    def wow(self):
        print('Neigh!')

class Donkey(Animal):
    def wow(self):
        print('Hee-haw!')

class Mule(Donkey, Horse):
    pass

print(Mule.mro())
# [<class '__main__.Mule'>, <class '__main__.Donkey'>, <class '__main__.Horse'>, <class '__main__.Animal'>, <class 'object'>]

class Hinny(Horse, Donkey):
    pass

print(Hinny.__mro__)
# (<class '__main__.Hinny'>, <class '__main__.Horse'>, <class '__main__.Donkey'>, <class '__main__.Animal'>, <class 'object'>)
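
Method lookup follows that order, so Mule().wow() finds Donkey.wow and Hinny().wow() finds Horse.wow. super() also follows the MRO of the object's class rather than the textual parent, so in a diamond each class runs exactly once. A sketch with variants of the classes above whose wow() methods return a list of the classes visited:

```python
class Animal:
    def wow(self):
        return ['Animal']

class Horse(Animal):
    def wow(self):
        return ['Horse'] + super().wow()

class Donkey(Animal):
    def wow(self):
        return ['Donkey'] + super().wow()

class Mule(Donkey, Horse):
    pass

# Mule's MRO is Mule, Donkey, Horse, Animal, object, so super() inside
# Donkey.wow continues with Horse.wow, not Animal.wow.
print(Mule().wow())   # ['Donkey', 'Horse', 'Animal']
print(Horse().wow())  # ['Horse', 'Animal']
```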
# A mixin is a small class that adds one specific piece of functionality to other classes
# through multiple inheritance; it is not meant to be used on its own.
class PrettyMixin():
    def dump(self):
        import pprint
        pprint.pprint(vars(self))

class Thing():
    def __init__(self) -> None:
        self.name = "Nyarlathotep"
        self.feature = "ichor"
        self.age = "eldritch"

# Mixins are included in a class definition using multiple inheritance syntax.
class PrettyThing(Thing, PrettyMixin):
    pass

t = PrettyThing()
t.dump()  # {'age': 'eldritch', 'feature': 'ichor', 'name': 'Nyarlathotep'}
# Python doesn’t have private attributes, but it has a naming convention for attributes that
# should not be visible outside of their class definition: begin with two underscores (__),
# which triggers name mangling (self.__name inside class Cat is stored as self._Cat__name).
class Cat:
    def __init__(self, name) -> None:
        self.__name = name

    @property
    def name(self):  # getter
        return self.__name

    @name.setter
    def name(self, name):  # setter
        self.__name = name

cat = Cat('Tom')
print(cat.name)  # Tom
cat.name = 'Jerry'
print(cat.name)  # Jerry
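
Name mangling hides the attribute from casual access but does not make it truly private. A quick check, with a trimmed Cat class so the snippet runs on its own:

```python
class Cat:
    def __init__(self, name) -> None:
        self.__name = name  # stored as _Cat__name

cat = Cat('Tom')
print(vars(cat))               # {'_Cat__name': 'Tom'}
print(hasattr(cat, '__name'))  # False: the unmangled name does not exist
print(cat._Cat__name)          # Tom: still reachable through the mangled name
```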
# instance methods, class methods, static methods
class Cat:
    # Class attribute (shared by all instances)
    species = "Felis catus"

    def __init__(self, name, age):
        self.name = name
        self.age = age

    # Instance method (operates on a specific instance)
    def meow(self):
        print(f"{self.name} says meow!")

    @classmethod
    def create_from_dict(cls, cat_dict):
        """
        Class method to create a Cat object from a dictionary.

        Args:
            cls (class): The Cat class itself.
            cat_dict (dict): A dictionary containing cat data (name, age).

        Returns:
            Cat: A new Cat object.
        """
        return cls(cat_dict["name"], cat_dict["age"])

    @staticmethod
    def is_adult(age):
        """
        Static method to check if a cat is considered adult (age >= 1).

        Args:
            age (int): The cat's age.

        Returns:
            bool: True if the cat is adult, False otherwise.
        """
        return age >= 1


# Create Cat objects
cat1 = Cat("Whiskers", 2)
cat2 = Cat.create_from_dict({"name": "Luna", "age": 5})

# Instance method call (operates on specific objects)
cat1.meow()  # Output: Whiskers says meow!
cat2.meow()  # Output: Luna says meow!

# Class method call
new_cat = Cat.create_from_dict({"name": "Simba", "age": 1})

# Static method call
is_cat1_adult = Cat.is_adult(cat1.age)

# Output: Simba is 1 years old.
print(f"{new_cat.name} is {new_cat.age} years old.")
# Output: Is Whiskers an adult? True
print(f"Is Whiskers an adult? {is_cat1_adult}")
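
Because a class method receives the class it was called on as cls, an alternative constructor such as create_from_dict also builds the right type when called on a subclass. A minimal sketch; Kitten is a hypothetical subclass and Cat is trimmed to what the example needs:

```python
class Cat:
    def __init__(self, name, age):
        self.name = name
        self.age = age

    @classmethod
    def create_from_dict(cls, cat_dict):
        return cls(cat_dict["name"], cat_dict["age"])  # cls may be a subclass

    @staticmethod
    def is_adult(age):
        return age >= 1  # uses neither cls nor self

class Kitten(Cat):
    pass

kit = Kitten.create_from_dict({"name": "Mittens", "age": 0})
print(type(kit).__name__)        # Kitten
print(Kitten.is_adult(kit.age))  # False
```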
# duck typing: a loose implementation of polymorphism
# If it walks like a duck and quacks like a duck, it’s a duck.
#     —— A Wise Person
class Duck:
    def __init__(self, name) -> None:
        self.__name = name

    def who(self):
        return self.__name

    def wow(self):
        return 'quack!'

class Cat:
    def __init__(self, name) -> None:
        self.__name = name

    def who(self):
        return self.__name

    def wow(self):
        return 'meow!'

def who_wow(obj):
    print(f'{obj.who()}: {obj.wow()}')

who_wow(Duck('Donald'))  # Donald: quack!
who_wow(Cat('Tom'))  # Tom: meow!
# dataclasses
from dataclasses import dataclass

@dataclass
class Cat:
    name: str
    age: int
    color: str = 'blue'

tom = Cat('tom', 3)
print(tom)  # Cat(name='tom', age=3, color='blue')
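
@dataclass also generates __eq__, which compares objects field by field. A mutable default such as a list has to be given through field(default_factory=...) so that each object gets its own list; a plain `= []` default is rejected with ValueError. A short sketch; the toys field is hypothetical:

```python
from dataclasses import dataclass, field

@dataclass
class Cat:
    name: str
    age: int
    color: str = 'blue'
    toys: list[str] = field(default_factory=list)  # a new list per object

tom = Cat('tom', 3)
print(tom == Cat('tom', 3))  # True: generated __eq__ compares the fields
tom.toys.append('mouse')
print(Cat('jerry', 2).toys)  # []: not shared with tom
print(tom)                   # Cat(name='tom', age=3, color='blue', toys=['mouse'])
```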

11. Modules and packages

# A module is a single Python file (.py extension) containing Python code
# that can include functions, classes, variables, and statements.

# animal.py (module file)
class Animal:
    def __init__(self, voice: str) -> None:
        self.__voice = voice

    def wow(self):
        print(f'{self.__voice}!')

# main.py (in the same directory as animal.py)
# the basic form of the `import` statement is `import module`, where `module` is the name
# of another Python file, without the .py extension.
from animal import Animal as Duck  # import only what you want from a module
from animal import Animal
import animal as mouse  # import a module with another name
import animal  # import a module

donald = Duck('quack')
donald.wow()  # quack!

tom = Animal('meow')
tom.wow()  # meow!

jerry = mouse.Animal('peep')
jerry.wow()  # peep!

butch = animal.Animal('bark')
butch.wow()  # bark!

11.1. packages

A package is a directory of related Python modules, possibly with subdirectories (subpackages) holding more modules, organized under a common namespace.

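Before Python 3.3, a directory needed one more thing to be a Python package: a file named __init__.py. Since Python 3.3 (PEP 420), a directory without one can still be imported as a namespace package, but regular packages like the one below still include __init__.py, which can also hold package initialization code.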
# .
# ├── animals
# │   ├── cat.py
# │   ├── dog.py
# │   └── __init__.py
# └── main.py

# animals/cat.py
def wow():
    print('meow!')

# animals/dog.py
def wow():
    print('bark!')

# main.py
from animals import cat  # from package import module
import animals.dog as dog  # import package.module

cat.wow()  # meow!
dog.wow()  # bark!

11.2. main

Identifying the main module: the entry point for a Python program’s execution.

  • Python uses a special variable called __name__.

  • When a module is directly executed (as a script), the __name__ variable within that module is set to the string '__main__'.

  • When a module is imported by another module, the __name__ variable within the imported module gets the actual module name (e.g., 'my_module').

# cat.py
def wow():
    return __name__

if __name__ == '__main__':
    print(f'executed: {wow()}')
$ python3 cat.py  # directly executed (as a script)
executed: __main__
# imported by another module
from cat import wow
print(f'imported: {wow()}')  # imported: cat

11.3. import

  • Basic structure:

    import module_name
  • Importing specific elements:

    # import specific functions or classes from a module.
    from module_name import element1, element2
    # import a specific element and assign it an alias for easier use.
    from module_name import element1 as alias
  • Importing a module with an alias:

    # assign an alias to a whole module for shorter references.
    import module_name as alias
  • Importing sub-modules: use the dot (.) to navigate within package hierarchies:

    # import a sub-module from a package.
    import package_name.submodule_name
    
    # import a specific element from a sub-module.
    from package_name.submodule_name import element
  • Relative imports (within packages): use the dot (.) to navigate within the same package structure:

    # import from a sub-module within the same package.
    from .submodule_name import element
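
A self-contained sketch of a relative import: it writes a hypothetical package demo_shapes (modules base.py and square.py) into a temporary directory, adds that directory to sys.path, and imports from it. The relative form only works inside a package; running square.py directly as a script would fail with an ImportError.

```python
import importlib
import pathlib
import sys
import tempfile

root = pathlib.Path(tempfile.mkdtemp())
pkg = root / 'demo_shapes'  # hypothetical package name
pkg.mkdir()
(pkg / '__init__.py').write_text('')
(pkg / 'base.py').write_text('def area(w, h):\n    return w * h\n')
# square.py reaches its sibling module base.py with a relative import
(pkg / 'square.py').write_text(
    'from .base import area\n\n'
    'def square_area(side):\n    return area(side, side)\n'
)

sys.path.insert(0, str(root))  # make the package importable
importlib.invalidate_caches()  # files were created after the interpreter started
from demo_shapes.square import square_area

print(square_area(3))  # 9
```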

11.4. search path

In the context of programming languages and environments, the search path refers to a list of directories that the program or interpreter looks at to locate specific files, particularly modules or libraries.

import sys
for path in sys.path:
    print(f"'{path}'")

''  # the current working directory (interactive mode); when running a script, the script's directory comes first instead
'/usr/lib/python311.zip'  # standard library, built-in modules
'/usr/lib/python3.11'
'/usr/lib/python3.11/lib-dynload'  # dynamically loaded modules or libraries
'/usr/local/lib/python3.11/dist-packages'  # third-party libraries
'/usr/lib/python3/dist-packages'

# sys.path is a list and can be updated programmatically
sys.path
# ['', '/usr/lib/python311.zip', '/usr/lib/python3.11', '/usr/lib/python3.11/lib-dynload', '/usr/local/lib/python3.11/dist-packages', '/usr/lib/python3/dist-packages']
sys.path.insert(0, '/tmp')
sys.path
# ['/tmp', '', '/usr/lib/python311.zip', '/usr/lib/python3.11', '/usr/lib/python3.11/lib-dynload', '/usr/local/lib/python3.11/dist-packages', '/usr/lib/python3/dist-packages']
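
The search path can be watched at work with importlib.util.find_spec(), which reports whether (and where) a module would be found without importing it. The sketch writes a hypothetical module demo_hello.py into a temporary directory that is not on sys.path yet:

```python
import importlib
import importlib.util
import pathlib
import sys
import tempfile

tmp = pathlib.Path(tempfile.mkdtemp())
(tmp / 'demo_hello.py').write_text('GREETING = "found me"\n')

print(importlib.util.find_spec('demo_hello'))  # None: the directory is not searched yet

sys.path.insert(0, str(tmp))   # search this directory first
importlib.invalidate_caches()  # drop cached directory listings
spec = importlib.util.find_spec('demo_hello')
print(spec.origin)             # path of demo_hello.py inside tmp

import demo_hello
print(demo_hello.GREETING)     # found me
```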

11.5. pip install packages

# make sure pip can be run from the command line
python3 -m pip --version  # pip --version
# pip 23.0.1 from /usr/lib/python3/dist-packages/pip (python 3.11)

# or, on Debian/Ubuntu, install the pip and venv modules for the system Python (as root)
apt install python3-pip python3-venv

11.5.1. virtual environment

# create a virtual environment
python3 -m venv python-learning-notes_env

# activate a virtual environment
source python-learning-notes_env/bin/activate

# ensure pip, setuptools, and wheel are up to date
pip install --upgrade pip setuptools wheel

# show pip version
pip --version  # python3 -m pip --version
# pip 24.0 from .../python-learning-notes_env/lib/python3.11/site-packages/pip (python 3.11)

# deactivate a virtual environment: the deactivate command is often implemented as a shell function.
deactivate
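
The venv module can also be driven from Python, and a running interpreter can tell whether it is inside a virtual environment by comparing sys.prefix with sys.base_prefix (they differ inside a venv). A minimal sketch that creates a throwaway environment without pip in a temporary directory; the name demo_env is hypothetical:

```python
import pathlib
import sys
import tempfile
import venv

def in_virtualenv() -> bool:
    # sys.base_prefix is the base installation; sys.prefix is the venv when one is active
    return sys.prefix != sys.base_prefix

print('inside a venv:', in_virtualenv())

env_dir = pathlib.Path(tempfile.mkdtemp()) / 'demo_env'
venv.create(env_dir, with_pip=False)  # like `python3 -m venv --without-pip demo_env`
print((env_dir / 'pyvenv.cfg').read_text())  # records the base interpreter (home = ...)
```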

11.5.2. pip install

# install the latest stable version.
pip install <package_name>

# install a package with extras, i.e., optional dependencies (e.g., pip install 'transformers[torch]').
pip install <package_name>[extra1[,extra2,...]]

# install the exact version (e.g., pip install vllm==0.4.3).
pip install <package_name>==<version>

# install the latest version greater than or equal to the specified one, with no upper bound
# (e.g., pip install 'vllm>=0.4.0' accepts 0.4.0, 0.5.0, 1.0, ...); quote it, or the shell treats > as a redirection.
pip install '<package_name>>=<version>'

# install the latest compatible release with the ~= operator (e.g., pip install 'vllm~=0.4.0'
# means >=0.4.0 and ==0.4.*, i.e. the latest 0.4 patch; 'vllm~=0.4' means >=0.4 and ==0.*).
pip install '<package_name>~=<version>'

# upgrade an already installed package to the latest version from PyPI.
pip install --upgrade <package_name>

# install from an alternate index
pip install --index-url http://my.package.repo/simple/ <package_name>

# search an additional index during install, in addition to PyPI
pip install --extra-index-url http://my.package.repo/simple <package_name>

# install pre-release and development versions, in addition to stable versions
pip install --pre <package_name>
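
The ~= (compatible release) rule can be spelled out in a few lines: '~=X.Y.Z' means '>=X.Y.Z' together with '==X.Y.*'. The helper below is a simplified, hypothetical sketch for purely numeric versions; it is not pip's implementation, which follows the full version specifier rules (pre-releases, epochs, and so on).

```python
def parse(version: str) -> tuple[int, ...]:
    return tuple(int(part) for part in version.split('.'))

def pad(v: tuple[int, ...], length: int) -> tuple[int, ...]:
    return v + (0,) * (length - len(v))  # treat 0.4 and 0.4.0 as equal

def compatible_release(spec: str, candidate: str) -> bool:
    """Hypothetical check for numeric versions: does candidate satisfy ~=spec?"""
    s, c = parse(spec), parse(candidate)
    if len(s) < 2:
        raise ValueError('~= needs at least two version components')
    n = max(len(s), len(c))
    prefix = s[:-1]  # drop the last component: ~=0.4.0 keeps (0, 4)
    return pad(c, n) >= pad(s, n) and pad(c, n)[:len(prefix)] == prefix

print(compatible_release('0.4.0', '0.4.3'))  # True
print(compatible_release('0.4.0', '0.5.0'))  # False
print(compatible_release('0.4', '0.9.1'))    # True
print(compatible_release('0.4', '1.0'))      # False
```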

11.5.3. cache, configuration

# get the cache directory that pip is currently configured to use
pip cache dir  # ~/.cache/pip
# INI format configuration files can change the default values for command line options.
#   - global: system-wide configuration file, shared across users.
#   - user: per-user configuration file.
#   - site: per-environment configuration file; i.e. per-virtualenv.

# the names of the settings are derived from the long command line option.
[global]
timeout = 60
index-url = https://download.zope.org/ppix

# per-command section: pip install
[install]
ignore-installed = true
no-dependencies = yes
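
Since the files are plain INI, the standard library's configparser can read them. The sketch parses the example above and mimics, in a hypothetical lookup() helper, the rule that a per-command section such as [install] overrides the same name in [global]; it only illustrates the file structure and is not how pip itself loads its configuration.

```python
import configparser

PIP_CONF = """
[global]
timeout = 60
index-url = https://download.zope.org/ppix

[install]
ignore-installed = true
no-dependencies = yes
"""

parser = configparser.ConfigParser()
parser.read_string(PIP_CONF)

def lookup(command: str, option: str):
    """Hypothetical: prefer the per-command section, fall back to [global]."""
    for section in (command, 'global'):
        if parser.has_option(section, option):
            return parser.get(section, option)
    return None

print(lookup('install', 'timeout'))                     # 60 (from [global])
print(parser.getboolean('install', 'no-dependencies'))  # True: 'yes' counts as true
print(lookup('download', 'ignore-installed'))           # None
```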

11.5.4. mirror

# set the PyPI mirror
pip config --user set global.index-url https://pypi.tuna.tsinghua.edu.cn/simple
# pip config --user set global.index-url https://mirrors.aliyun.com/pypi/simple/
# pip config set global.extra-index-url "https://mirrors.sustech.edu.cn/pypi/web/simple https://mirrors.aliyun.com/pypi/simple/"

11.6. pipenv

Pipenv is a dependency manager for Python projects and is similar in spirit to Node.js’ npm or Ruby’s bundler.

# install pipenv in Debian/Ubuntu for the system python.
apt install pipenv
# install pipenv for the user python.
pip install pipenv --user

# If pipenv isn’t available in a shell after installation, add the user site-packages binary directory to `PATH`.
#
# On Windows, the user base binary directory can be found by running
# `python -m site --user-site`
# and replacing `site-packages` with `Scripts`.
#
# On Linux and macOS, find the user base binary directory by running
# `python -m site --user-base`
# and appending `bin` to the end.

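On Debian and its derivatives, the user-based installation above may fail: recent releases mark the system Python as externally managed (PEP 668), so pip refuses to install into it. Two alternatives: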

  1. Using apt

    apt install pipenv
  2. Using pip with virtualenv

    # Create a virtual environment
    python3 -m venv pipenv_env
    
    # Activate the virtual environment (replace "pipenv_env" with your chosen name)
    source pipenv_env/bin/activate
    
    # Install pipenv within the virtual environment
    pip install pipenv
    
    # Deactivate the virtual environment (optional)
    deactivate
# Pipenv manages dependencies on a per-project basis.
mkdir myproject && cd myproject
pipenv install requests
ls  # Pipfile  Pipfile.lock
# activate the project's virtualenv:
pipenv shell
# main.py
import requests

response = requests.get('https://httpbin.org/ip')

print('Your IP is {0}'.format(response.json()['origin']))
# run a command inside the virtualenv:
pipenv run python main.py
# Your IP is 9.5.2.7

12. Testing

  • unittest

    # test_cap.py
    import unittest
    
    def cap(text: str) -> str:
        return text.capitalize()
    
    class TestCap(unittest.TestCase):
        def setUp(self) -> None:
            pass
    
        def tearDown(self) -> None:
            pass
    
        def test_one_word(self):
            text = 'duck'  # _arrange_ the objects, create and set them up as necessary.
    
            result = cap(text)  # _act_ on an object.
    
            self.assertEqual('Duck', result)  # _assert_ that something is as expected.
    
        def test_multi_words(self):
            text = 'hello world'  # _arrange_ the objects, create and set them up as necessary.
    
            result = cap(text)  # _act_ on an object.
    
            self.assertEqual('Hello World', result)  # _assert_ that something is as expected.
    
    if __name__ == '__main__':
        unittest.main()
    $ python3 test_cap.py
    F.
    ======================================================================
    FAIL: test_multi_words (__main__.TestCap.test_multi_words)
    ----------------------------------------------------------------------
    Traceback (most recent call last):
      File "...", line 27, in test_multi_words
        self.assertEqual('Hello World', result)
    AssertionError: 'Hello World' != 'Hello world'
    - Hello World
    ?       ^
    + Hello world
    ?       ^
    
    
    ----------------------------------------------------------------------
    Ran 2 tests in 0.003s
    
    FAILED (failures=1)
  • doctest

    # doctest_cap.py
    def cap(text: str) -> str:
        """
        >>> cap('duck')
        'Duck'
        >>> cap('hello world')
        'Hello World'
        """
        return text.capitalize()
    
    if __name__ == '__main__':
        import doctest
        doctest.testmod()
    $ python3 doctest_cap.py
    **********************************************************************
    File "...", line 5, in __main__.cap
    Failed example:
        cap('hello world')
    Expected:
        'Hello World'
    Got:
        'Hello world'
    **********************************************************************
    1 items had failures:
       1 of   2 in __main__.cap
    ***Test Failed*** 1 failures.
  • pytest

    # test_cap.py
    def cap(text: str) -> str:
        return text.capitalize()
    
    def test_one_word():
        text = 'duck'
        result = cap(text)
        assert result == 'Duck'
    
    def test_multiple_words():
        text = 'hello world'
        result = cap(text)
        assert result == 'Hello World'
    $ pipenv install pytest
    Installing pytest...
    Installing dependencies from Pipfile.lock (207fdb)...
    $ pytest
    ============================================== test session starts ==============================================
    platform linux -- Python 3.11.2, pytest-8.2.1, pluggy-1.5.0
    rootdir: ...
    collected 2 items
    
    test_cap.py .F                                                                                            [100%]
    
    =================================================== FAILURES ====================================================
    ______________________________________________ test_multiple_words ______________________________________________
    
        def test_multiple_words():
            text = 'hello world'
            result = cap(text)
    >       assert result == 'Hello World'
    E       AssertionError: assert 'Hello world' == 'Hello World'
    E
    E         - Hello World
    E         ?       ^
    E         + Hello world
    E         ?       ^
    
    test_cap.py:12: AssertionError
    ============================================ short test summary info ============================================
    FAILED test_cap.py::test_multiple_words - AssertionError: assert 'Hello world' == 'Hello World'
    ========================================== 1 failed, 1 passed in 0.09s ==========================================
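
The failing test has the same cause in all three frameworks: str.capitalize() upper-cases only the first character of the whole string. One possible fix, assuming every word should be capitalized, is string.capwords(), which capitalizes each whitespace-separated word (and collapses runs of whitespace); str.title() would pass these tests too, but treats an apostrophe as a word break. The sketch runs the unittest tests programmatically so the result can be inspected:

```python
import io
import string
import unittest

def cap(text: str) -> str:
    return string.capwords(text)  # 'hello world' -> 'Hello World'

class TestCap(unittest.TestCase):
    def test_one_word(self):
        self.assertEqual('Duck', cap('duck'))

    def test_multi_words(self):
        self.assertEqual('Hello World', cap('hello world'))

    def test_apostrophe(self):
        self.assertEqual("It's Fine", cap("it's fine"))  # str.title() gives "It'S Fine"

suite = unittest.defaultTestLoader.loadTestsFromTestCase(TestCap)
result = unittest.TextTestRunner(stream=io.StringIO()).run(suite)
print(result.testsRun, result.wasSuccessful())  # 3 True
```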

13. Files and Directories

A file is a sequence of bytes, stored in some filesystem, and accessed by a filename. A directory (or folder) is a collection of files, and possibly other directories.

open() opens a file for various operations such as reading, writing, or appending:

fileobj = open(filename, mode='r')
  • fileobj is the file object returned by open()

  • filename: a string representing the path to the file to open.

  • mode (optional): a string that specifies how the file will be opened, which determines the allowed operations and how newline characters are handled (for text files).

    • 'r' (read): Opens the file for reading (the default). The file must exist, or FileNotFoundError is raised.

    • 'w' (write): Opens the file for writing. An existing file will be truncated (emptied) before writing. If the file doesn’t exist, it will be created.

    • 'a' (append): Opens the file for appending. New data will be written to the end of the file. If the file doesn’t exist, it will be created.

    • 'x' (exclusive creation): Creates a new file and opens it for writing. If the file already exists, FileExistsError is raised.

    • 'r+' (read and write): Opens the file for both reading and writing. The file must exist.

    • 'w+' (read and write): Opens the file for both reading and writing. An existing file will be truncated before any operations. If the file doesn’t exist, it will be created.

    • 'a+' (append and read): Opens the file for both appending and reading. If the file doesn’t exist, it will be created.

    • By default, Python opens files in text mode ('t'), which translates line endings: on reading, '\n', '\r', and '\r\n' all become '\n' (universal newlines); on writing, '\n' becomes the platform's line separator (CRLF on Windows, LF on Unix/Linux).

    • Binary mode ('b') is selected by appending it to any mode (e.g., 'rb', 'wb'); it treats the file as a raw stream of bytes without newline conversion.

    • The 'U' (universal newline) mode is a Python 2 leftover: it was deprecated in Python 3 and removed in Python 3.11. Universal newlines are already the default in text mode and are controlled by open()'s newline parameter.
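
The effect of text versus binary mode on line endings can be checked directly. A short sketch using a hypothetical file newlines.txt in a temporary directory: bytes written in binary mode come back as '\n' when read in text mode, and unchanged when newline='' turns the translation off. It also shows 'x' refusing to overwrite an existing file.

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), 'newlines.txt')

with open(path, 'wb') as f:  # binary: the bytes are written as-is
    f.write(b'one\r\ntwo\rthree\n')

with open(path, 'r', encoding='utf-8') as f:  # text mode with universal newlines
    print(repr(f.read()))  # 'one\ntwo\nthree\n'

with open(path, 'r', encoding='utf-8', newline='') as f:  # no translation
    print(repr(f.read()))  # 'one\r\ntwo\rthree\n'

with open(path, 'rb') as f:
    print(f.read())  # b'one\r\ntwo\rthree\n'

try:
    open(path, 'x')  # exclusive creation fails because the file exists
except FileExistsError as err:
    print('refused:', err.filename == path)  # refused: True
```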

poem = '''
Je suis l'automne, la saison des pluies,
Le temps des fruits mûrs et des feuilles jaunies,
Le soleil pâle et les jours qui décroissent,
Le vent qui hurle et les chaumes qui gémissent.

Je suis l'automne, la saison des regrets,
Le temps où meurent les amours et les joies,
Le temps des souvenirs et des larmes secrètes,
Le temps des nuits longues et des tristesses froides.

Je suis l'automne, la saison des douleurs,
Le temps des fièvres et des maladies,
Le temps où l'on se sent mourir sans pouvoir guérir,
Le temps où l'on voudrait mourir et qu'on n'ose pas.

Je suis l'automne, la saison de la mort,
Le temps où l'on se couche dans la terre humide,
Le temps où l'on dort pour toujours sans rêver,
Le temps où l'on ne souffre plus et qu'on n'aime plus.
'''

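with open('autumn_song.txt', 'w+', encoding='utf-8') as fio:  # the poem contains non-ASCII characters
    fio.write(poem)
    fio.seek(0)  # rewind to the beginning before reading back
    for line in fio.readlines():
        print(line, end='')  # each line already ends with '\n'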

14. Processes and concurrency

# The standard library’s os module provides a common way of accessing some system information.
import os
os.uname()
# posix.uname_result(sysname='Linux', nodename='node-0', release='6.1.0-21-amd64', version='#1 SMP PREEMPT_DYNAMIC Debian 6.1.90-1 (2024-05-03)', machine='x86_64')
os.getloadavg()
# (0.05126953125, 0.03955078125, 0.00341796875)
os.cpu_count()
# 4
(os.getpid(), os.getcwd(), os.getuid(), os.getgid())
# (1295, '/tmp', 1000, 1000)
os.system('date -u')
# Thu Jun  6 11:23:23 AM UTC 2024
# 0
# get system and process information with the third-party package psutil
import psutil  # pip install psutil
print(psutil.cpu_times(percpu=True))
# [scputimes(user=4.37, nice=0.0, system=6.71, idle=1468.69, iowait=0.26, irq=0.0, softirq=1.86, steal=0.0, guest=0.0, guest_nice=0.0), scputimes(user=11.84, nice=0.0, system=9.3, idle=1465.29, iowait=1.02, irq=0.0, softirq=0.75, steal=0.0, guest=0.0, guest_nice=0.0), scputimes(user=10.31, nice=0.0, system=8.58, idle=1468.4, iowait=1.66, irq=0.0, softirq=0.97, steal=0.0, guest=0.0, guest_nice=0.0), scputimes(user=9.11, nice=0.0, system=10.02, idle=1467.95, iowait=0.81, irq=0.0, softirq=0.65, steal=0.0, guest=0.0, guest_nice=0.0)]
print(psutil.cpu_percent(percpu=False))
# 0.0
print(psutil.cpu_percent(percpu=True))
# [0.3, 0.4, 0.4, 0.1]

14.1. subprocess and multiprocessing

import subprocess

# run another program in a shell
# and grab whatever output it created (both standard output and standard error output)
print(subprocess.getoutput('date'))  # Thu Jun  6 07:19:50 PM CST 2024

# A variant method called `check_output()` takes a list of the command and arguments.
# By default it returns standard output only as type bytes rather than a string, and
# does not use the shell:
print(subprocess.check_output(['date', '-u']))  # b'Thu Jun  6 11:30:09 AM UTC 2024\n'

# return a tuple with the status code and output of the other program
print(subprocess.getstatusoutput('date'))  # (0, 'Thu Jun  6 07:32:25 PM CST 2024')

# capture the exit status only
ret = subprocess.call('date -u', shell=True)
# Thu Jun  6 11:45:51 AM UTC 2024
print(ret)
# 0

# pass a list of the arguments; no shell is needed
ret = subprocess.call(['date', '-u'])
# Thu Jun  6 11:50:04 AM UTC 2024
print(ret)
# 0
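
Since Python 3.5, the general-purpose call is subprocess.run(), which returns a CompletedProcess holding the exit status and, with capture_output=True, the output; text=True decodes the output to str, and check=True raises CalledProcessError on a non-zero exit status. A sketch that runs the current interpreter (sys.executable) instead of date, so it behaves the same on any platform:

```python
import subprocess
import sys

done = subprocess.run(
    [sys.executable, '-c', 'print(2 + 2)'],  # list of arguments, no shell
    capture_output=True, text=True, check=True,
)
print(done.returncode, repr(done.stdout))  # 0 '4\n'

try:
    subprocess.run([sys.executable, '-c', 'import sys; sys.exit(3)'], check=True)
except subprocess.CalledProcessError as err:
    print('exit status:', err.returncode)  # exit status: 3
```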
# create multiple independent processes
import multiprocessing
import os

def whoami(what):
    print("Process %s says: %s" % (os.getpid(), what))

if __name__ == "__main__":
    whoami("I'm the main program")
    for n in range(4):
        p = multiprocessing.Process(
            target=whoami, args=("I'm function %s" % n,))
        p.start()

# Process 1648 says: I'm the main program
# Process 1649 says: I'm function 0
# Process 1650 says: I'm function 1
# Process 1651 says: I'm function 2
# Process 1652 says: I'm function 3
# kill a process with terminate()
import multiprocessing
import time
import os

def whoami(name):
    print("I'm %s, in process %s" % (name, os.getpid()))

def loopy(name):
    whoami(name)
    start = 1
    stop = 1000000
    for num in range(start, stop):
        print("\tNumber %s of %s. Honk!" % (num, stop))
        time.sleep(1)

if __name__ == "__main__":
    whoami("main")
    p = multiprocessing.Process(target=loopy, args=("loopy",))
    p.start()
    time.sleep(5)
    p.terminate()

# I'm main, in process 13084
# I'm loopy, in process 14664
#         Number 1 of 1000000. Honk!
#         Number 2 of 1000000. Honk!
#         Number 3 of 1000000. Honk!
#         Number 4 of 1000000. Honk!
#         Number 5 of 1000000. Honk!

14.2. Queues, processes, and threads

A queue is like a list: things are added at one end and taken away from the other; the most common arrangement is called FIFO (first in, first out). In general, queues transport messages, which can be any kind of information; when used for distributed task management, they are also known as work queues, job queues, or task queues.

Threads can be dangerous. Like manual memory management in languages such as C and C++, they can cause bugs that are extremely hard to find, let alone fix. To use threads, all the code in the program (and in external libraries that it uses) must be thread safe.

In CPython, the standard Python implementation, threads do not speed up CPU-bound tasks because of the Global Interpreter Lock (GIL), which allows only one thread at a time to execute Python bytecode.

  • Use threads for I/O-bound problems

  • Use processes, networking, or events (discussed in the next section) for CPU-bound problems

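import multiprocessing as mp

def washer(dishes, output):
    for dish in dishes:
        print('Washing', dish, 'dish')
        output.put(dish)

def dryer(dish_queue):  # avoid shadowing the built-in input()
    while True:
        dish = dish_queue.get()
        print('Drying', dish, 'dish')
        dish_queue.task_done()

if __name__ == '__main__':  # required when processes are started with "spawn" (Windows, macOS)
    dish_queue = mp.JoinableQueue()
    dryer_proc = mp.Process(target=dryer, args=(dish_queue,))
    dryer_proc.daemon = True  # the endless dryer is stopped when the main process exits
    dryer_proc.start()
    dishes = ['salad', 'bread', 'entree', 'dessert']
    washer(dishes, dish_queue)
    dish_queue.join()  # block until every dish has been marked task_done()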

# Washing salad dish
# Washing bread dish
# Washing entree dish
# Washing dessert dish
# Drying salad dish
# Drying bread dish
# Drying entree dish
# Drying dessert dish
import threading
import queue
import time

def washer(dishes, dish_queue):
    for dish in dishes:
        print("Washing", dish)
        time.sleep(5)
        dish_queue.put(dish)

def dryer(dish_queue):
    while True:
        dish = dish_queue.get()
        print("Drying", dish)
        time.sleep(10)
        dish_queue.task_done()

dish_queue = queue.Queue()
for n in range(2):
    # daemon threads: the endless dryer loops don't keep the program alive after join()
    dryer_thread = threading.Thread(target=dryer, args=(dish_queue,), daemon=True)
    dryer_thread.start()
dishes = ['salad', 'bread', 'entree', 'dessert']
washer(dishes, dish_queue)
dish_queue.join()

# Washing salad
# Washing bread
# Drying salad
# Washing entree
# Drying bread
# Washing dessert
# Drying entree
# Drying dessert
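
Thread safety usually comes down to guarding shared state. A minimal sketch with a hypothetical shared counter: `count += 1` is a read-modify-write sequence that another thread can interleave with, so each increment is wrapped in a threading.Lock.

```python
import threading

count = 0
lock = threading.Lock()

def add_many(times):
    global count
    for _ in range(times):
        with lock:  # only one thread at a time runs the increment
            count += 1

threads = [threading.Thread(target=add_many, args=(100_000,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()  # wait for every worker before reading count
print(count)  # 400000
```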

14.3. concurrent.futures

The concurrent.futures module in the standard library can be used to schedule an asynchronous pool of workers, using threads (when I/O-bound) or processes (when CPU-bound), and get back a future to track their state and collect the results.

Use concurrent.futures whenever you need to launch a bunch of concurrent tasks, such as the following:

  • Crawling URLs on the web

  • Processing files, such as resizing images

  • Calling service APIs

from concurrent import futures
import math
import sys

def calc(val):
    result = math.sqrt(float(val))
    return val, result

def use_threads(num, values):
    with futures.ThreadPoolExecutor(num) as tex:
        tasks = [tex.submit(calc, value) for value in values]
        for f in futures.as_completed(tasks):
            yield f.result()

def use_processes(num, values):
    with futures.ProcessPoolExecutor(num) as pex:
        tasks = [pex.submit(calc, value) for value in values]
        for f in futures.as_completed(tasks):
            yield f.result()

def main(workers, values):
    print(f"Using {workers} workers for {len(values)} values")
    print("Using threads:")
    for val, result in use_threads(workers, values):
        print(f'{val} {result:.4f}')
    print("Using processes:")
    for val, result in use_processes(workers, values):
        print(f'{val} {result:.4f}')

if __name__ == '__main__':
    workers = 3
    if len(sys.argv) > 1:
        workers = int(sys.argv[1])
    values = list(range(1, 6))  # 1 .. 5, needed whether or not an argument is given
    main(workers, values)
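
as_completed() yields futures in the order they finish. When results are wanted in input order, Executor.map() is a shorter alternative; a sketch with the same calc() function:

```python
from concurrent import futures
import math

def calc(val):
    return val, math.sqrt(float(val))

with futures.ThreadPoolExecutor(max_workers=3) as ex:
    results = list(ex.map(calc, range(1, 6)))  # results follow the input order

for val, result in results:
    print(f'{val} {result:.4f}')
# 1 1.0000
# 2 1.4142
# 3 1.7321
# 4 2.0000
# 5 2.2361
```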

14.4. Asynchronous programming with async and await

In Python 3.4, Python added a standard asynchronous module called asyncio. Python 3.5 then added the keywords async and await. These implement some new concepts:

  • Coroutines are functions that pause at various points

  • An event loop that schedules and runs coroutines

import asyncio

async def say(phrase, seconds):
    print(phrase)
    await asyncio.sleep(seconds)

async def wicked():
    task_1 = asyncio.create_task(say("Surrender,", 2))
    task_2 = asyncio.create_task(say("Dorothy!", 0))
    await task_1
    await task_2

# asyncio.run() blocks: it creates a new event loop, runs the coroutine until it completes, then closes the loop
# (since Python 3.12, waiting up to 5 minutes for the default executor to shut down)
asyncio.run(wicked())
import asyncio

async def say(phrase, seconds):
    print(phrase)
    await asyncio.sleep(seconds)

async def wicked():
    task_1 = asyncio.create_task(say("Surrender,", 2))
    task_2 = asyncio.create_task(say("Dorothy!", 0))
    await asyncio.gather(task_1, task_2)  # Wait for all tasks to finish concurrently

loop = asyncio.new_event_loop()  # get_event_loop() without a current loop is deprecated
try:
    loop.run_until_complete(wicked())
finally:
    loop.close()
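
The benefit of gathering tasks is that their waits overlap. A sketch that times three asyncio.sleep() calls run with asyncio.gather(): the total is close to the longest delay rather than the sum. The delays are arbitrary, and the exact timing depends on the machine.

```python
import asyncio
import time

async def nap(seconds):
    await asyncio.sleep(seconds)  # yields to the event loop while waiting
    return seconds

async def main():
    start = time.perf_counter()
    results = await asyncio.gather(nap(0.3), nap(0.2), nap(0.1))
    elapsed = time.perf_counter() - start
    return results, elapsed

results, elapsed = asyncio.run(main())
print(results)             # [0.3, 0.2, 0.1]: gather keeps the argument order
print(f'{elapsed:.1f} s')  # about 0.3 s rather than 0.6 s
```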

References