And Then …

2023, March 10

Abstract: there’s that adage/meme about people writing simple code when they are beginners, then writing increasingly complicated code as they gain experience, only to go back to simplicity when they are very experienced. But I don’t think we do “go back”.

I’m a big fan of the show South Park. The creators, Matt Stone and Trey Parker, gave one of my favorite talks. It’s a short talk about how to write stories.

Links and videos have a way of disappearing, so I’ve done some transcription:

[Parker:] We can take these beats, which are basically the beats of your outline. And if the words “and then” belong between those beats, you’re fucked.

[Parker:] What should happen between every beat that you’ve written down is either the word “therefore”, or “but”.

[Stone:] You see movies where you’re just watching and it’s like “this happens, and then this happens, and then this happens” … that’s when you’re in movies just going like “what the fuck am I watching this movie for”.

I was thinking of this video recently while talking about code complexity with a friend. There’s that well-known meme about complexity vs. experience, which takes various forms. I like this one:

Code Complexity vs. Experience from @flaviocopes pic.twitter.com/q9iSa1CiYp
— raupach (@raupach) November 24, 2021

Looking back at my own career as a programmer, and extrapolating a bit, I think that beginners have a natural tendency to use “and then” as the nexus between the parts of a program, that is, to reason chronologically. This is bad for code, for reasons similar to the ones Parker and Stone give for stories. It’s funny to think of code in this way, but then, writing code is a form of writing.

Let me elaborate on “and then”. When learning to program, it is natural to get fixated on loops and conditionals. These are constructs that one doesn’t think about consciously before starting to program. In my old Commodore VIC-20 programming manual, there’s this snippet of BASIC to print a triangle made of A’s on the screen. As a beginner, you learn to “be the machine” to get it.

10  A$ = "A"
20  PRINT A$
30  A$ = A$ + "A"
40  GOTO 20

Beginning programmers are learning to think with loops and conditionals and control flow. They are trying to do a series of tasks one after another, and it is natural to fall into “and then” applied slavishly. This is “simple” code only in the small. Once you get past, say, 50 lines, this kind of code becomes awfully tangled.

Even experienced programmers are likely to default to this form of chronological thinking when writing a quick-and-dirty script to automate some repetitive task. For example:

open a connection to this public API

and then: send a request for this file

and then: open the file

and then: for each line in the file, parse it into columns

…

Chronological thinking often goes hand in hand with some form of Top-Down design.

In 1972, David Parnas published the paper On the criteria to be used in decomposing systems into modules¹. It is one of the most concentrated distillations of programming wisdom I’ve ever read. The paper investigates how to decompose systems into modules, because a good decomposition into modules helps make systems easier to understand, easier to extend, easier to fix.

Parnas proposes a medium-sized task to be solved with a program, then compares two criteria for decomposition, leading to two module decompositions for the task. One of the criteria is chronological / top-down. He comments about it:

Experience on a small scale indicates that this is approximately the decomposition that would be proposed by most programmers for the task specified

Parnas proposes information hiding as a criterion leading to better decomposition. Each module should hide or abstract from the rest of the code one particular design choice. For instance, a module might offer a complete abstraction of data storage/retrieval. Thoughtful decomposition into modules is not done often enough in industrial practice, but I think most experienced programmers have learned to aim in this direction.
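To make information hiding concrete, here is a minimal Go sketch of a module that hides one design decision: how payroll amounts are stored. The names (`PayrollStore`, `NewPayrollStore`, `Put`, `Get`) are hypothetical, invented for illustration. Callers only see the methods; whether the data lives in a map, a sorted slice, or a file is the module’s secret.

```go
package main

// PayrollStore hides how records are stored. Callers use only Put and Get;
// the map below is a private design decision that could be replaced by a
// sorted slice or a file without changing any caller.
type PayrollStore struct {
	byID map[int]int64 // hypothetical: employee ID -> gross pay in cents
}

// NewPayrollStore returns an empty store.
func NewPayrollStore() *PayrollStore {
	return &PayrollStore{byID: make(map[int]int64)}
}

// Put records the gross pay for an employee, replacing any previous value.
func (s *PayrollStore) Put(id int, grossCents int64) {
	s.byID[id] = grossCents
}

// Get returns the gross pay for an employee and whether it was found.
func (s *PayrollStore) Get(id int) (int64, bool) {
	pay, ok := s.byID[id]
	return pay, ok
}
```

In Go the hiding is enforced at the package boundary: code in other packages cannot reach the unexported `byID` field, so it has to go through `Put` and `Get`.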

Let’s look at another example where one learns to abandon chronological thinking: something programmers have to consider as they grow in experience is that their code should be testable. Like Parnas, let’s imagine a simple task: a program to get payroll records from a corporate database, compute employee tax withholdings, then write those back to the corporate database.

The mythical beginner might write this:

func computeTaxWithholdings() {
    open connection to corporate database
    get employee payroll table
    compute tax withholdings
    write tax withholdings into tax-withholdings table
    close database connection
}

This code is not testable. If the developer alters the tax computations to try out some idea, they will be tampering with the company’s tax withholdings for its employees. One learns to parametrize dependencies, or, in object-oriented terminology, to inject dependencies, or to do dependency inversion (I dislike this expression for the assumption it makes about what the “direct” dependency is).

func computeTaxWithholdings(databaseConnection) {
    get employee payroll table
    compute tax withholdings
    write tax withholdings into tax-withholdings table
}

In this new implementation, we could pass a connection to a different database for testing, instead of the real corporate database. We could even pass a pseudo-database implemented with a file on the developer’s machine, depending on how we defined the databaseConnection.
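One way to define the databaseConnection in Go is as an interface. This is a sketch under assumptions: `DatabaseConnection`, `PayrollRow`, `ReadPayroll`, `WriteWithholdings`, `fakeDB`, and the flat 20% withholding rate are all hypothetical, not a real schema or tax rule. Amounts are integer cents to avoid floating-point rounding. A test passes the in-memory `fakeDB` instead of the corporate database.

```go
package main

// PayrollRow is a hypothetical row of the employee payroll table.
type PayrollRow struct {
	EmployeeID int
	GrossCents int64
}

// DatabaseConnection is the hypothetical dependency we parametrize over.
// In production it would wrap the corporate database; a test can supply
// anything that has these two methods.
type DatabaseConnection interface {
	ReadPayroll() ([]PayrollRow, error)
	WriteWithholdings(withholdings map[int]int64) error
}

// computeTaxWithholdings uses the connection it is given; opening and
// closing it is someone else's job.
func computeTaxWithholdings(db DatabaseConnection) error {
	rows, err := db.ReadPayroll()
	if err != nil {
		return err
	}
	withholdings := make(map[int]int64, len(rows))
	for _, row := range rows {
		// Placeholder rule: a flat 20% rate, for illustration only.
		withholdings[row.EmployeeID] = row.GrossCents * 20 / 100
	}
	return db.WriteWithholdings(withholdings)
}

// fakeDB is an in-memory stand-in for tests: it serves fixed rows and
// remembers what was written instead of touching real tax records.
type fakeDB struct {
	rows    []PayrollRow
	written map[int]int64
}

var _ DatabaseConnection = (*fakeDB)(nil) // compile-time interface check

func (f *fakeDB) ReadPayroll() ([]PayrollRow, error) { return f.rows, nil }

func (f *fakeDB) WriteWithholdings(withholdings map[int]int64) error {
	f.written = withholdings
	return nil
}
```

In a test, you would build a `fakeDB` with a few rows, call `computeTaxWithholdings`, and inspect `written`. No employee’s real withholdings are touched.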

Once you learn to write this way, you don’t unlearn it.

In your program running in production, when is that databaseConnection made? Well, it needs to be there before the computeTaxWithholdings function can be called. But as long as you ensure this precedence, you don’t need to care about the particular sequence of events.

Dear reader, I know you’re a smart person, and you may be wondering: why even pass a database connection as a parameter? We just need the payroll records, and where we retrieve them from is immaterial. So how about:

func computeTaxWithholdings(payrollRetriever) {
    for payrollRetriever.Next() {
        record := payrollRetriever.Read()
        …
    }
}

or:

func computeTaxWithholdings(payrollRecords) {
    for record in payrollRecords {
        …
    }
}
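Both variants can be sketched in Go. The `Next`/`Read` shape of the retriever resembles iteration over `database/sql`’s `Rows` (`Next`, then `Scan`). Everything below is hypothetical and for illustration only: `PayrollRecord`, `PayrollRetriever`, `sliceRetriever`, the two function names, and the flat 20% rate.

```go
package main

// PayrollRecord is a hypothetical payroll entry; amounts are in cents.
type PayrollRecord struct {
	EmployeeID int
	GrossCents int64
}

// withhold is a placeholder rule (a flat 20% rate), for illustration only.
func withhold(r PayrollRecord) int64 { return r.GrossCents * 20 / 100 }

// PayrollRetriever yields records one at a time; where they come from
// (a database, a file, a slice) is hidden behind Next and Read.
type PayrollRetriever interface {
	Next() bool          // advance; false when no records are left
	Read() PayrollRecord // the current record, valid after Next returns true
}

// computeFromRetriever follows the first variant: pull records one by one.
func computeFromRetriever(payrollRetriever PayrollRetriever) map[int]int64 {
	withholdings := make(map[int]int64)
	for payrollRetriever.Next() {
		record := payrollRetriever.Read()
		withholdings[record.EmployeeID] = withhold(record)
	}
	return withholdings
}

// computeFromRecords follows the second variant: a pure function over a
// slice, with no I/O at all. Fetching and storing are the caller's job.
func computeFromRecords(payrollRecords []PayrollRecord) map[int]int64 {
	withholdings := make(map[int]int64, len(payrollRecords))
	for _, record := range payrollRecords {
		withholdings[record.EmployeeID] = withhold(record)
	}
	return withholdings
}

// sliceRetriever is an in-memory PayrollRetriever, handy in tests.
type sliceRetriever struct {
	records []PayrollRecord
	pos     int // number of records handed out so far
}

func (r *sliceRetriever) Next() bool {
	if r.pos >= len(r.records) {
		return false
	}
	r.pos++
	return true
}

func (r *sliceRetriever) Read() PayrollRecord { return r.records[r.pos-1] }
```

The slice version is the easiest to test: a test only has to build a slice literal, with no fake of any kind.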

How you abstract your system’s components is part of your job as a system designer, and once you have gained enough experience, system design is how you start to see your work as a programmer.

The beginner’s chronological bent, with its attendant fixation on loops and conditionals, is reflected in the “flow charts” that were once popular and that one still sees today. Another classic beginner mistake is the under-use of data structures.

Learning to leverage data structures is another thing you don’t ever go back from. Like the monoliths in 2001: A Space Odyssey.

Fred Brooks wrote²:

Show me your flowcharts and conceal your tables, and I shall continue to be mystified. Show me your tables, and I won’t usually need your flowcharts; they’ll be obvious.

Today’s reader might translate “tables” as “data representation”. Once you learn that you can simplify complicated control flow with data structures, you never unlearn that.
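A small illustration of trading control flow for data: a progressive tax computed by walking a table of brackets, rather than by a chain of if/else branches with one branch per bracket. The brackets and rates are invented for illustration, not any real tax schedule. Adding or changing a bracket means editing a table row, not the logic.

```go
package main

// bracket is one row of a hypothetical progressive tax table: income above
// Floor (in cents) is taxed at RatePercent, up to the next bracket's Floor.
type bracket struct {
	Floor       int64
	RatePercent int64
}

// brackets must be sorted by Floor. The numbers are made up.
var brackets = []bracket{
	{Floor: 0, RatePercent: 10},
	{Floor: 1_000_000, RatePercent: 20},
	{Floor: 5_000_000, RatePercent: 30},
}

// tax walks the table once. The bracket boundaries and rates live in the
// table; the code only knows how to apply one row.
func tax(incomeCents int64) int64 {
	var total int64
	for i, b := range brackets {
		if incomeCents <= b.Floor {
			break // income does not reach this bracket
		}
		// The slice of income in this bracket ends at the next Floor,
		// or at the income itself if that comes first.
		upper := incomeCents
		if i+1 < len(brackets) && brackets[i+1].Floor < upper {
			upper = brackets[i+1].Floor
		}
		total += (upper - b.Floor) * b.RatePercent / 100
	}
	return total
}
```

For example, with this table an income of 1,500,000 cents is taxed 10% on the first 1,000,000 and 20% on the remaining 500,000, for 200,000 cents in total.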

Every budding programmer should learn about quicksort, heapsort, and merge sort, and about binary search trees³. Heapsort in particular was one of those “2001 monolith” moments for me.
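Since heapsort gets a special mention, here is a compact in-place version in Go, following the standard textbook scheme: build a max-heap, then repeatedly swap the root (the current maximum) to the end of the slice and sift down. It is a study sketch; in production Go you would use `sort.Ints` or `slices.Sort` instead.

```go
package main

// siftDown restores the max-heap property for the subtree rooted at i,
// looking only at a[:n].
func siftDown(a []int, i, n int) {
	for {
		largest := i
		left, right := 2*i+1, 2*i+2
		if left < n && a[left] > a[largest] {
			largest = left
		}
		if right < n && a[right] > a[largest] {
			largest = right
		}
		if largest == i {
			return // both children are no larger than a[i]
		}
		a[i], a[largest] = a[largest], a[i]
		i = largest
	}
}

// heapsort sorts a in ascending order, in place, in O(n log n) time.
func heapsort(a []int) {
	n := len(a)
	// Build a max-heap: sift down every non-leaf node, last to first.
	for i := n/2 - 1; i >= 0; i-- {
		siftDown(a, i, n)
	}
	// Move the current maximum to the end and shrink the heap by one.
	for end := n - 1; end > 0; end-- {
		a[0], a[end] = a[end], a[0]
		siftDown(a, 0, end)
	}
}
```

The “monolith” insight is that the heap is just an array read as a binary tree: the children of index i live at 2i+1 and 2i+2, so no pointers or extra memory are needed.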

Something I see often in code and find unnecessarily complicated is the use of nested loops to match two arrays. Let’s look at that.

Say we have a PersonnelRecord holding company employee data. Sometimes we have PersonnelRecords for the same person stored in different places, and we would like to collate the information.

type PersonnelRecord struct {
    ID         int
    Name       string
    FamilyName string
    SSN        string
}

// combine does something with two records for the same employee.
// As a placeholder, it keeps the ID and name from a, and the family name from b.
func combine(a, b PersonnelRecord) PersonnelRecord {
    return PersonnelRecord{ID: a.ID, Name: a.Name, FamilyName: b.FamilyName}
}

Let’s collate two lists of records, using double-nested loops.

func collate(aList, bList []PersonnelRecord) []PersonnelRecord {
    var re []PersonnelRecord
    for _, recordA := range aList {
        for _, recordB := range bList {
            if recordB.ID == recordA.ID {
                re = append(re, combine(recordA, recordB))
                break
            }
        }
    }
    return re
}

Whenever I see this kind of nested loop, I need to go over the whole code several times, even if I know what the intent is. If you used continue instead of break, the code would no longer stop at the first match: it would keep scanning bList, and if bList held two records with the same ID, it would append two combined records for one employee. You could also rewrite the loop to use a labeled continue instead of break. The point is, you need to get the control flow just right.
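Here is what a continue-based rewrite can look like in Go, using a labeled continue: `continue outer` abandons the inner scan and moves on to the next recordA, which is what break did in the original. The type and a version of `combine` are restated so the block compiles on its own; `collateLabeled` is a hypothetical name.

```go
package main

type PersonnelRecord struct {
	ID         int
	Name       string
	FamilyName string
	SSN        string
}

// combine mirrors the placeholder above, keeping a's ID and name and b's family name.
func combine(a, b PersonnelRecord) PersonnelRecord {
	return PersonnelRecord{ID: a.ID, Name: a.Name, FamilyName: b.FamilyName}
}

// collateLabeled behaves like collate, but stops scanning bList after the
// first match with a labeled continue instead of break.
func collateLabeled(aList, bList []PersonnelRecord) []PersonnelRecord {
	var re []PersonnelRecord
outer:
	for _, recordA := range aList {
		for _, recordB := range bList {
			if recordB.ID == recordA.ID {
				re = append(re, combine(recordA, recordB))
				continue outer // skip the rest of bList for this recordA
			}
		}
	}
	return re
}
```

It works, but it makes the point rather than refuting it: the reader still has to trace exactly where control goes after a match.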

Now let’s use an intermediate map.

func collateBetter(aList, bList []PersonnelRecord) []PersonnelRecord {
    var re []PersonnelRecord

    bByID := make(map[int]PersonnelRecord)
    for _, recordB := range bList {
        bByID[recordB.ID] = recordB
    }
    for _, recordA := range aList {
        matchingB, found := bByID[recordA.ID]
        if found {
            re = append(re, combine(recordA, matchingB))
        }
    }
    return re
}

In my opinion, this is much better code. No double-nested loop there, no break or continue, no complicated temporal flow. Fewer ways to screw this up.

If the lists of records came to us sorted by ID, then we could do something even simpler.
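One way to exploit sorted input, sketched under the assumption that both lists are sorted by ascending ID with no duplicate IDs within a list: walk the two lists in step, like the merge phase of merge sort, with no map and no nested loop. The type and a version of `combine` are restated so the block compiles on its own; `collateSorted` is a hypothetical name.

```go
package main

type PersonnelRecord struct {
	ID         int
	Name       string
	FamilyName string
	SSN        string
}

// combine mirrors the placeholder above, keeping a's ID and name and b's family name.
func combine(a, b PersonnelRecord) PersonnelRecord {
	return PersonnelRecord{ID: a.ID, Name: a.Name, FamilyName: b.FamilyName}
}

// collateSorted assumes aList and bList are sorted by ascending ID, with no
// duplicate IDs within a list. Each step advances whichever side is behind.
func collateSorted(aList, bList []PersonnelRecord) []PersonnelRecord {
	var re []PersonnelRecord
	i, j := 0, 0
	for i < len(aList) && j < len(bList) {
		switch {
		case aList[i].ID < bList[j].ID:
			i++ // aList[i] has no partner in bList
		case aList[i].ID > bList[j].ID:
			j++ // bList[j] has no partner in aList
		default: // same ID on both sides
			re = append(re, combine(aList[i], bList[j]))
			i++
			j++
		}
	}
	return re
}
```

Each list is traversed once, and the loop has a single exit condition: one of the lists runs out.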

There are many things one does unlearn with experience, though. Learning sometimes requires moving backwards. Personally, I think I write simpler code now than I did 10 years ago, or even 5 years ago.

I’ve been a bit tricky with my use of Parker and Stone’s put-down of “and then”. Their beef with “and then” is not about chronological reasoning or complicated flow. It is about scenes arriving without motivation, or if you will, about lack of a narrative thread, lack of intent.

Expressing intent is relevant for programming. This, too, is something programmers learn to pay more attention to, and get better at with experience.

But I want to go back to Matt Stone’s quote:

You see movies where you’re just watching and it’s like “this happens, and then this happens, and then this happens” … that’s when you’re in movies just going like “what the fuck am I watching this movie for”.

Paraphrase this, replacing “movie” with “code”, and “watching a movie” with “doing a code review”, and … my sentiments exactly.

¹ David L. Parnas, “On the criteria to be used in decomposing systems into modules”, Communications of the ACM, Vol. 15 (12), 1972, pp. 1053–1058.
² Fred Brooks, The Mythical Man-Month, Addison-Wesley, 1975.
³ Any algorithms text should do. I learned from Thomas H. Cormen, Charles E. Leiserson, Ronald L. Rivest, and Clifford Stein, Introduction to Algorithms (2nd ed.), MIT Press and McGraw-Hill, 2001.
