Python development as a C gal

In my previous articles I wrote a lot about my opinions on and experiences with C, Hare and Go. Back at university I even taught the introductory C lecture for math and physics students, aimed at numerical simulations. So my main field of expertise is high-performance code in low-level languages (C and Fortran being the ones I use most) with libraries like OpenMP and MPI.

After graduating last fall I joined Elster, the German public digital tax infrastructure project, as a Linux SysAdmin/DevOps something. This change of workplace also brought a change in development environment and goals. High performance is no longer the primary goal for the software I write, or even relevant at all. Instead, maintainability and simplicity take center stage. These days I mainly write small Bash scripts and mid-sized Python libraries and tools (under 1000 lines of code). In this article I want to reflect on the frustrations I ran into while adapting and the solutions I came up with.

I love static typing

Both Python and Bash are dynamically typed languages, which means that any value can be assigned to any variable.

Python
---------------------------
# This variable is a string
myvar = "a string"
# Now it's an integer
myvar = 123
# And now it's a float
myvar = 1.2

In statically typed languages, every variable has a fixed type that is checked at compile time.

C
--------------------------
char myvar[] = "a string";
myvar = 123;

Placed inside a main function, this code fails to compile with the following error.

mia@tuxedo-mia:~$ gcc test.c
test.c: In function ‘main’:
test.c:5:15: error: assignment to expression with array type
    5 |         myvar = 123;
      |               ^

Some people may even consider Python's flexibility a feature, but it opens your code up to a whole class of additional errors. In Python, basically any function is a generic function, which means it can handle many different data types. A simple example is the following add function.

def add(a, b):
    return a+b

It can be used with any types that support the + operator: adding two integers returns their sum, while adding two strings returns the concatenated string.

add(1, 2)       -> 3
add("ab", "cd") -> "abcd"
add(1.2, 3)     -> 4.2
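The flip side is that a wrong type does not necessarily raise an error at all. As a small illustration, assuming the numbers come from a source that only yields strings (for example sys.argv or a text file), add silently concatenates them instead of summing:

```python
def add(a, b):
    return a + b


# Values read from a text file or from sys.argv are strings,
# so nothing fails; the strings are simply concatenated.
print(add("1", "2"))  # 12, not 3
print(add([1], [2]))  # [1, 2]
```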

Again, this may seem quite useful at first, and for some use cases it may well be. However, when writing code I want functions that model specific use cases and reduce complexity as much as possible. Usually a function is meant to be used with one specific type, and I want the code to fail if something unexpected is passed to it. To me, Go shows the ideal behaviour here: the compiler rejects the call below.

package main

func add(a int, b int) int {
    return a + b
}

func main() {
    add(2, 3.2)
}
--------------------------------------------------------
./main.go:10:18: cannot use 3.2 (untyped float constant)
as int value in argument to add (truncated)
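Without extra tooling, the closest plain Python gets to this behaviour is checking the arguments by hand at runtime. The sketch below only illustrates that idea (the name strict_add is hypothetical): it rejects anything that is not an int before doing any work. Since bool is a subclass of int in Python, it has to be excluded explicitly.

```python
def strict_add(a: int, b: int) -> int:
    """Add two integers, failing fast on any other type (illustrative sketch)."""
    for value in (a, b):
        # bool is a subclass of int, so reject it explicitly as well.
        if not isinstance(value, int) or isinstance(value, bool):
            raise TypeError(f"expected int, got {type(value).__name__}")
    return a + b


print(strict_add(2, 3))  # 5
try:
    strict_add(2, 3.2)
except TypeError as err:
    print(err)  # expected int, got float
```

The catch is that such checks only fire when the code actually runs, and they have to be repeated in every function.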

I realize this example is a bit short, so let’s take a look at a more realistic use case. Suppose you want to model timestamps (hour:minute:second) in your program. First you build structs, classes or data types that model the behaviour of a timestamp, with a constructor that restricts seconds and minutes to the range 0 to 59 and hours to 0 to 23. All functions working with timestamps then rely on those checks instead of validating their input again. In this case, passing any other type to those functions should fail immediately. Otherwise the code may fail somewhere deep inside those functions with an unclear message or, even worse, not fail at all and produce wrong results.
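A minimal Python sketch of such a timestamp type, assuming a frozen dataclass is an acceptable way to model it (the names Timestamp and seconds_since_midnight are hypothetical): the constructor validates the ranges once, and every function taking a Timestamp can rely on those checks.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Timestamp:
    """A time of day (hour:minute:second), validated on construction."""

    hour: int
    minute: int
    second: int

    def __post_init__(self) -> None:
        # Reject out-of-range values once, so all other code can trust them.
        if not 0 <= self.hour < 24:
            raise ValueError(f"hour out of range: {self.hour}")
        if not 0 <= self.minute < 60:
            raise ValueError(f"minute out of range: {self.minute}")
        if not 0 <= self.second < 60:
            raise ValueError(f"second out of range: {self.second}")


def seconds_since_midnight(ts: Timestamp) -> int:
    # Relies on the constructor's checks; no re-validation needed here.
    return ts.hour * 3600 + ts.minute * 60 + ts.second
```

For example, seconds_since_midnight(Timestamp(1, 2, 3)) returns 3723, while Timestamp(24, 0, 0) raises a ValueError. Passing a plain tuple like (1, 2, 3) instead only fails at runtime with an AttributeError from inside the function, which is exactly the kind of unclear message described above.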

The way I got around this problem while still using Python is mypy, a static type checker for Python. Let’s look at the add example again, this time with type hints.

def add(a: int, b: int) -> int:
    return a + b

add(1.2, 2)

This time the code is annotated with type hints. [1] Simple annotations like int need no import; the typing module provides more complex types. Running mypy on the file produces the following message.

mia@tuxedo-mia:~$ mypy test.py
test.py:4: error: Argument 1 to "add" has incompatible type "float"; expected "int"  [arg-type]
Found 1 error in 1 file (checked 1 source file)

Great! Running the file directly with python3 still executes without complaint, since the interpreter does not enforce type hints at runtime, but this is good enough for me.
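Type hints are not limited to plain int. As a hedged sketch (the helper find_index is hypothetical, and the list[str] syntax needs Python 3.9 or later), Optional from the typing module marks a value that may be None, and mypy then rejects code that uses the result as an int without checking for None first.

```python
from typing import Optional


def find_index(names: list[str], wanted: str) -> Optional[int]:
    """Return the position of wanted in names, or None if it is missing."""
    for index, name in enumerate(names):
        if name == wanted:
            return index
    return None


position = find_index(["alice", "bob"], "bob")
# mypy would reject `position + 1` here, since position may be None.
if position is not None:
    print(position + 1)  # 2
```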

I love documentation

As you might have guessed from this blog, I really like writing, and documentation is no exception. First I want to discuss three different approaches to documentation in programming languages: the fully external approach of older languages like C, the “simple” integration as docstrings or doc comments in Python, Go and Hare, and more complex solutions like Rust’s documentation tooling.

External

I’ll focus on C in this part. With this approach, comments in the code explain how it works, while the API is documented with external tools like man pages and wikis. If you don’t work on Linux, you might not know the documentation tool man. From the man man page (haha): [2]

man - an interface to the system reference manuals

Linux/Unix systems store documentation for installed programs in a central directory (/usr/share/man on Debian-based distros), and these files can be read with the command-line tool man.

man sudo

Those files can also be used to document your software APIs. This approach is explicitly supported by man, as section 3 is reserved for library documentation.

man 1 man
----------------------------------------------------
1   Executable programs or shell commands
2   System calls (functions provided by the kernel)
3   Library calls (functions within program libraries)

One example is the documentation for the C standard library header stdio.h.

stdio(3)

NAME
        stdio - standard input/output library functions

LIBRARY
        Standard C library (libc, -lc)

SYNOPSIS
        #include <stdio.h>
        ...

DESCRIPTION
        ...
    List of functions
        ...

The resulting documentation fits neatly into either your OS docs (man pages) or your web browsing habits (wikis). However, you have to handle building and installing the documentation yourself.

Simple internal integration

Python supports docstrings at the beginning of a file (module) and as the first statement of a class or function. [3]

def add(a: int, b: int) -> int:
    """This function adds two integers and returns
    the result.

    Args:
        - a: First number
        - b: Second number

    Returns: Sum of both integers
    """
    return a + b

Running the bundled pydoc3 tool on the module produces the following documentation.

FUNCTIONS
    add(a: int, b: int) -> int
        This function adds two integers and returns
        the result.

        Args:
            - a: First number
            - b: Second number

        Returns: Sum of both integers

This makes it easier to keep the documentation up to date when changing the code, since both live in the same file. However, it is yet another tool, separate from your system docs, that you have to keep in mind.
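The docstring is stored on the function object itself, which is where pydoc and help() read it from. A small sketch using the standard library’s inspect.getdoc, which returns the docstring with the common indentation removed:

```python
import inspect


def add(a: int, b: int) -> int:
    """This function adds two integers and returns
    the result.

    Args:
        - a: First number
        - b: Second number

    Returns: Sum of both integers
    """
    return a + b


# The raw docstring is available as add.__doc__; getdoc() also strips
# the common indentation so it reads like the pydoc output.
print(inspect.getdoc(add))
```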

Sophisticated infrastructure

Rustdoc takes a similar but more sophisticated approach. [4] Using cargo doc, Cargo can build documentation for your Rust project and all crates (roughly: libraries) it uses. The documentation is written in Markdown, either in the source code or in separate files. The build produces HTML docs in a uniform, wiki-like format. This is similar to the Python tool Sphinx, which is often used for documentation wikis.

This is the most convenient approach, since all the documentation pages you need for a project are collected in one place. However, it is a lot of infrastructure to keep in mind.

My opinion

I have to make a confession here. In my previous programming jobs I didn’t write much documentation at all. Sure, my lecture was basically one big piece of documentation for C basics plus some exercise code, but when writing code in large numerics projects, documentation somehow took a backseat. A lot of the code in those projects is a one-off implementation by some master’s or PhD student and was never intended, written or documented for use by anyone else. My current job has taught me a lot in this respect, since many different people need to use my code. There are the SysAdmins who just want to use my executables with a minimal amount of hassle, the infrastructure people who want to build small “glue” scripts for common tasks using my tools and libraries, and the other developers working with the actual code itself.

With this in mind, my favorite approach is a mix of the first and second ones. The simple built-in tools producing man-like output fit into my workflow best. In my work I use Python docstrings for library and function/class documentation and scdoc [5] (which generates man pages) for executables. As a result, people actually working with Python can use pydoc or Sphinx to read the documentation, while people more on the Linux side can use man pages.
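For the Sphinx side, a minimal conf.py sketch, assuming the bundled autodoc extension to pull in docstrings and napoleon to understand Google-style sections such as Args: and Returns: (the project name is a placeholder). A reStructuredText page containing an automodule directive for your package then renders those docstrings as HTML.

```python
# conf.py: minimal Sphinx configuration (sketch; the name is a placeholder)
project = "mylib"  # hypothetical project name

extensions = [
    "sphinx.ext.autodoc",  # import modules and collect their docstrings
    "sphinx.ext.napoleon",  # parse Google-style "Args:" / "Returns:" sections
]
```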

References

[1] Python3 type hints
[2] Online manpage for man
[3] Python3 docstrings
[4] Rustdoc
[5] scdoc


Mail: mia.schambeck@mailbox.org
XMPP: mia.schambeck@mailbox.org
Bluesky: @miajara.bsky.social
Mastodon: @miasb.sueden.social