And Then …
Abstract: there’s that adage/meme about people writing simple code when they are beginners, then writing increasingly complicated code as they gain experience, only to go back to simplicity when they are very experienced. But I don’t think we do “go back”.
I’m a big fan of the show South Park. The creators, Matt Stone and Trey Parker, gave one of my favorite talks. It’s a short talk about how to write stories.
Links and videos have a way of disappearing, so I’ve done some transcription:
[Parker:] We can take these beats, which are basically the beats of your outline. And if the words “and then” belong between those beats, you’re fucked.
[Parker:] What should happen between every beat that you’ve written down is either the word “therefore”, or “but”.
[Stone:] You see movies where you’re just watching and it’s like “this happens, and then this happens, and then this happens” … that’s when you’re in movies just going like “what the fuck am I watching this movie for”.
I was thinking of this video recently while talking about code complexity with a friend. There’s that well known meme about complexity vs. experience, which takes various forms. I like this one:
[Embedded tweet: “Code Complexity vs. Experience” from @flaviocopes, pic.twitter.com/q9iSa1CiYp, posted by raupach (@raupach), November 24, 2021]
Looking back at my own career as a programmer and extrapolating a bit, I think that when one begins to program there is a natural tendency to fall into “and then” as the nexus between the parts of your program, that is, into chronological reasoning. This is bad for code, for reasons similar to the ones Parker and Stone give for stories. It’s funny to think of code in this way, but then, writing code is a form of writing.
Let me elaborate on “and then”. When learning to program, it is natural to get fixated on loops and conditionals. These are constructs that one doesn’t think about consciously before starting to program. In my old Commodore VIC-20 programming manual, there’s this snippet of BASIC to write a triangle made of A’s on the screen. As a beginner, you learn to “be the machine” to get it.
10 A$ = "++"
20 PRINT A$
30 A$ = "A" + A$ + "-"
40 GOTO 20
Beginning programmers are learning to think with loops and conditionals and control flow. They are trying to do a series of tasks one after another, and it is natural to fall into “and then” applied slavishly. This is “simple” code only in the small. Once you get past, say, 50 lines, this kind of code becomes awfully tangled.
Even experienced programmers are likely to default to this form of chronological thinking when writing a quick-and-dirty script to automate some repetitive task. For example:
open a connection to this public API
and then: send a request for this file
and then: open the file
and then: for each line in the file, parse it into columns
…
Chronological thinking often goes hand in hand with some form of Top-Down design.
In 1972, David Parnas published the paper “On the criteria to be used in
decomposing systems into modules” [1]. It is one of the most concentrated
distillations of programming wisdom I’ve ever read. The paper
investigates how to decompose systems into modules, because a good decomposition
into modules helps make systems easier to understand, easier to extend, easier
to fix.
Parnas proposes a medium task to be solved with a program, then compares two
criteria for decomposition, leading to two module decompositions for the task.
One of the criteria is chronological / top-down. He comments about it:
Experience on a small scale indicates that this is approximately the decomposition that would be proposed by most programmers for the task specified
Parnas proposes information hiding as a criterion leading to better decomposition. Each module should hide or abstract from the rest of the code one particular design choice. For instance, a module might offer a complete abstraction of data storage/retrieval. Thoughtful decomposition into modules is not done often enough in industrial practice, but I think most experienced programmers have learned to aim in this direction.
Let’s look at another example where one learns to abandon chronological thinking: something programmers have to consider as they grow in experience is that their code should be testable. Like Parnas, let’s imagine a simple task: a program to get payroll records from a corporate database, compute employee tax withholdings, then write those back to the corporate database.
The mythical beginner might write this:
func computeTaxWithholdings() {
	open connection to corporate database
	get employee payroll table
	compute tax withholdings
	write tax withholdings into tax-withholdings table
	close database connection
}
This code is not testable, because it always talks to the real corporate database. If the developer alters the tax computations to try out some idea, they will be tampering with the company’s actual tax withholdings for employees. One learns to parametrize dependencies, or in object-oriented terminology to inject dependencies, or do dependency inversion (I dislike this expression for the assumption it makes about what the “direct” dependency is).
func computeTaxWithholdings(databaseConnection) {
	get employee payroll table
	compute tax withholdings
	write tax withholdings into tax-withholdings table
}
In this new implementation, we could pass a connection to a different database
for testing, instead of the real corporate database. We could even pass a
pseudo-database implemented with a file in the developer’s machine, depending
on how we defined the databaseConnection.
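To make this concrete, here is a minimal Go sketch of the parametrized version. Everything in it is my own assumption for illustration: the PayrollStore interface, the PayrollRecord and Withholding types, the in-memory memoryStore, and especially the flat 20% rule, which is invented and has nothing to do with real tax law.

```go
package main

import "fmt"

// PayrollRecord and Withholding are hypothetical types for this sketch.
type PayrollRecord struct {
	EmployeeID int
	GrossCents int64
}

type Withholding struct {
	EmployeeID  int
	AmountCents int64
}

// PayrollStore is all that computeTaxWithholdings knows about storage.
// (Hypothetical interface; a real one would depend on your database.)
type PayrollStore interface {
	ReadPayroll() ([]PayrollRecord, error)
	WriteWithholdings([]Withholding) error
}

// memoryStore is an in-memory stand-in for the corporate database.
type memoryStore struct {
	payroll []PayrollRecord
	written []Withholding
}

func (m *memoryStore) ReadPayroll() ([]PayrollRecord, error) {
	return m.payroll, nil
}

func (m *memoryStore) WriteWithholdings(w []Withholding) error {
	m.written = append(m.written, w...)
	return nil
}

// computeTaxWithholdings withholds a flat 20%, a made-up rule for illustration.
func computeTaxWithholdings(store PayrollStore) error {
	records, err := store.ReadPayroll()
	if err != nil {
		return err
	}
	withholdings := make([]Withholding, 0, len(records))
	for _, r := range records {
		withholdings = append(withholdings, Withholding{
			EmployeeID:  r.EmployeeID,
			AmountCents: r.GrossCents * 20 / 100,
		})
	}
	return store.WriteWithholdings(withholdings)
}

func main() {
	// In a test we hand in the fake; production code would hand in a real store.
	store := &memoryStore{payroll: []PayrollRecord{{EmployeeID: 1, GrossCents: 500000}}}
	if err := computeTaxWithholdings(store); err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(store.written)
}
```

The function never opens or closes anything, so a test can inspect what was "written" without touching a real database.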
Once you learn to write this way, you don’t unlearn it.
In your program running in production, when is that databaseConnection made?
Well, it needs to be there before the computeTaxWithholdings function can be
called. But as long as you ensure this precedence, you don’t need to care about
the particular sequence of events.
Dear reader, I know you’re a smart person, and you wonder, why even pass a database connection as a parameter? We just need the payroll records, and where we retrieve them from is immaterial. So how about:
func computeTaxWithholdings(payrollRetriever) {
	for payrollRetriever.Next() {
		record := payrollRetriever.Read()
		…
	}
}
or:
func computeTaxWithholdings(payrollRecords) {
	for record in payrollRecords {
		…
	}
}
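Spelled out in Go, the second variant might look something like the sketch below. The PayrollRecord and Withholding types and the flat rate are placeholders I made up; the point is only that the function takes plain records and returns plain results, so it has no idea where either one lives.

```go
package main

import "fmt"

// PayrollRecord and Withholding are hypothetical types for this sketch.
type PayrollRecord struct {
	EmployeeID int
	GrossCents int64
}

type Withholding struct {
	EmployeeID  int
	AmountCents int64
}

// flatRatePercent is an invented rate, used only to have something to compute.
const flatRatePercent = 20

// computeTaxWithholdings is a pure function: records in, withholdings out.
// Reading and writing the database is left to the caller.
func computeTaxWithholdings(payrollRecords []PayrollRecord) []Withholding {
	withholdings := make([]Withholding, 0, len(payrollRecords))
	for _, record := range payrollRecords {
		withholdings = append(withholdings, Withholding{
			EmployeeID:  record.EmployeeID,
			AmountCents: record.GrossCents * flatRatePercent / 100,
		})
	}
	return withholdings
}

func main() {
	records := []PayrollRecord{
		{EmployeeID: 1, GrossCents: 500000},
		{EmployeeID: 2, GrossCents: 320000},
	}
	for _, w := range computeTaxWithholdings(records) {
		fmt.Printf("employee %d: withhold %d cents\n", w.EmployeeID, w.AmountCents)
	}
}
```

Testing this needs no fake database at all, just a slice literal.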
How you abstract your system’s components is part of your job as a system designer, which is how you start seeing your work as a programmer once you have gained enough experience.
The beginner’s chronological bent, with its attendant fixation on loops and
conditionals, is reflected in the “flow charts” that were once popular and that
one still sees today. Another classic beginner mistake is the under-use of data
structures.
Learning to leverage data structures is another thing you don’t ever go back
from. Like the monoliths in 2001: A Space Odyssey.
Fred Brooks wrote [2]:
Show me your flowcharts and conceal your tables, and I shall continue to be mystified. Show me your tables, and I won’t usually need your flowcharts; they’ll be obvious.
Today’s reader might translate tables as data representation. Once you learn that you can simplify complicated control flow with data structures, you never unlearn that.
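A small illustration of the idea, sticking with the payroll theme. Instead of a chain of if/else branches that picks a rate, the thresholds live in a table and the control flow shrinks to one loop. The brackets, the rates, and the names bracket and ratePercentFor are all invented for this sketch, and for brevity it applies a single flat rate per bracket rather than marginal rates.

```go
package main

import "fmt"

// bracket says: income up to and including UpTo is taxed at RatePercent.
type bracket struct {
	UpTo        int64
	RatePercent int64
}

// brackets is the "table". It must stay sorted by UpTo.
// The numbers are made up for illustration.
var brackets = []bracket{
	{UpTo: 10000, RatePercent: 0},
	{UpTo: 40000, RatePercent: 10},
	{UpTo: 100000, RatePercent: 20},
}

// topRatePercent applies to any income above the last bracket.
const topRatePercent = 30

// ratePercentFor returns the rate of the first bracket that covers income.
func ratePercentFor(income int64) int64 {
	for _, b := range brackets {
		if income <= b.UpTo {
			return b.RatePercent
		}
	}
	return topRatePercent
}

func main() {
	for _, income := range []int64{5000, 25000, 250000} {
		fmt.Printf("income %d: rate %d%%\n", income, ratePercentFor(income))
	}
}
```

Adding or changing a bracket means editing the table, not the flow of the function.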
Every budding programmer should learn about quicksort, heapsort, and merge sort, and about binary search trees [3]. Heapsort in particular was one of those “2001 monolith” moments for me.
Something I see often in code and find unnecessarily complicated is the use of nested loops to match two arrays. Let’s look at that.
Say we have a PersonnelRecord holding company employee data. Sometimes we have
PersonnelRecords for the same person stored in different places, and we would
like to collate the information.
type PersonnelRecord struct {
	ID         int
	Name       string
	FamilyName string
	SSN        string
}
// combine does something with two records for the same employee
func combine(a, b PersonnelRecord) PersonnelRecord {
	return PersonnelRecord{Name: a.Name, FamilyName: b.FamilyName}
}
Let’s collate two lists of records, using double-nested loops.
func collate(aList, bList []PersonnelRecord) []PersonnelRecord {
	var re []PersonnelRecord
	for _, recordA := range aList {
		// scan all of bList for the first record with the same ID
		for _, recordB := range bList {
			if recordB.ID == recordA.ID {
				re = append(re, combine(recordA, recordB))
				break
			}
		}
	}
	return re
}
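Whenever I see this kind of nested loop, I need to go over the whole code
several times, even if I know what the intent is. If you replaced break with a
bare continue, the code would still compile, but the inner loop would keep
scanning bList after a match, wasting work and emitting duplicates whenever
bList repeats an ID. But you could also rewrite the loop to use continue
instead of break, for instance with a labeled continue on the outer loop. The
point is, you need to get the control flow just right.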
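For the record, here is one way the continue variant could be written, using a labeled continue that jumps to the next iteration of the outer loop. The function name collateWithContinue and the label nextA are mine; the type and combine are repeated only so the sketch compiles on its own.

```go
package main

import "fmt"

type PersonnelRecord struct {
	ID         int
	Name       string
	FamilyName string
	SSN        string
}

// combine does something with two records for the same employee
func combine(a, b PersonnelRecord) PersonnelRecord {
	return PersonnelRecord{Name: a.Name, FamilyName: b.FamilyName}
}

// collateWithContinue behaves like the break version: each record in aList
// is combined with the first record in bList that has the same ID.
func collateWithContinue(aList, bList []PersonnelRecord) []PersonnelRecord {
	var re []PersonnelRecord
nextA:
	for _, recordA := range aList {
		for _, recordB := range bList {
			if recordB.ID == recordA.ID {
				re = append(re, combine(recordA, recordB))
				// A bare "continue" here would only move on to the next
				// recordB; the label makes it move on to the next recordA.
				continue nextA
			}
		}
	}
	return re
}

func main() {
	aList := []PersonnelRecord{{ID: 1, Name: "Ann"}, {ID: 2, Name: "Bob"}}
	bList := []PersonnelRecord{{ID: 2, FamilyName: "Jones"}, {ID: 1, FamilyName: "Smith"}}
	fmt.Println(collateWithContinue(aList, bList))
}
```

It works, but a reader now has to track a label as well as two loops, which rather proves the point.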
Now let’s use an intermediate map.
func collateBetter(aList, bList []PersonnelRecord) []PersonnelRecord {
	var re []PersonnelRecord
	// index bList by ID so each lookup is a single map access
	bByID := make(map[int]PersonnelRecord)
	for _, recordB := range bList {
		bByID[recordB.ID] = recordB
	}
	for _, recordA := range aList {
		matchingB, found := bByID[recordA.ID]
		if found {
			re = append(re, combine(recordA, matchingB))
		}
	}
	return re
}
In my opinion, this is much better code. No double-nested loop there, no break
nor continue, no complicated temporal flow. Fewer ways to screw this up.
If the lists of records came to us sorted by ID, then we could do something
even simpler.
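One possible reading of the sorted case is a single pass that walks both lists in step, with no map at all. This is only a sketch under the assumption that both lists are sorted by ID in ascending order; the name collateSorted is mine, and the type and combine are repeated so the block compiles on its own.

```go
package main

import "fmt"

type PersonnelRecord struct {
	ID         int
	Name       string
	FamilyName string
	SSN        string
}

// combine does something with two records for the same employee
func combine(a, b PersonnelRecord) PersonnelRecord {
	return PersonnelRecord{Name: a.Name, FamilyName: b.FamilyName}
}

// collateSorted assumes aList and bList are both sorted by ascending ID.
// Each record in aList is combined with the first record in bList that has
// the same ID, if there is one.
func collateSorted(aList, bList []PersonnelRecord) []PersonnelRecord {
	var re []PersonnelRecord
	i, j := 0, 0
	for i < len(aList) && j < len(bList) {
		switch {
		case aList[i].ID < bList[j].ID:
			i++ // no partner for this A record
		case aList[i].ID > bList[j].ID:
			j++ // this B record is not needed
		default:
			re = append(re, combine(aList[i], bList[j]))
			i++ // keep j in place in case the next A record has the same ID
		}
	}
	return re
}

func main() {
	aList := []PersonnelRecord{{ID: 1, Name: "Ann"}, {ID: 2, Name: "Bob"}}
	bList := []PersonnelRecord{{ID: 2, FamilyName: "Jones"}, {ID: 5, FamilyName: "Lee"}}
	fmt.Println(collateSorted(aList, bList))
}
```

Every iteration advances one of the two indexes, so the loop visits each record at most once.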
There are many things one does unlearn with experience though. Learning sometimes requires moving backwards. Personally I think I write simpler code now than I did 10 years ago, or even 5 years ago.
I’ve been a bit tricky with my use of Parker and Stone’s put-down of “and then”. Their beef with “and then” is not about chronological reasoning or complicated flow. It is about scenes arriving without motivation, or if you will, about lack of a narrative thread, lack of intent.
Expressing intent is relevant for programming. This, too, is something programmers learn to pay more attention to, and get better at, with experience.
But I want to go back to Matt Stone’s quote:
You see movies where you’re just watching and it’s like “this happens, and then this happens, and then this happens” … that’s when you’re in movies just going like “what the fuck am I watching this movie for”.
Paraphrase this, replacing “movie” with “code”, and “watching a movie” with “doing a code review”, and … my sentiments exactly.
[1] David L. Parnas, On the criteria to be used in decomposing systems into modules, Communications of the ACM, Vol. 15 (12), 1972, pp. 1053-1058.
[2] Fred Brooks, The Mythical Man-Month, Addison-Wesley, 1975.
[3] Any algorithms text should do. I learned from Thomas H. Cormen, Charles E. Leiserson, Ronald L. Rivest, and Clifford Stein, Introduction to Algorithms (2nd ed.), MIT Press and McGraw-Hill, 2001.