Not going to criticize the article much, because the author is right - sometimes I just don’t like the code/style, so it’s just harder to relate to, not that it is actually hard to understand.
But there is a single “case” I’d never agree on: the code whose complexity and/or composition ends with nothing. The more experience you get, the more high-level structure you are able to pick up on, but sometimes structures are just there, serving nothing at all. Overzealous (or compelled) decomposition into 5-line chunks over tens of files. Pointless renaming, re-exporting and encapsulation. Higher-order code golf. All this done to a finished non-extensible project makes no sense and makes it harder to read, for no reason.
This is a lengthy article, but it is readable in one chunk, a good read. Now imagine the author split it into a number of submodules, then in each they’d give a new name to every phenomenon. Then instead of using English they’d construct a new sub-language (also in modules) to express meaning in a shorter way. E.g. this:
I can’t read the code because I don’t have sufficient experience or expertise (with the language or domain).
I haven’t spent enough time trying to read and understand the code (“it’s not obvious” or “it’s not intuitive”).
I don’t have much interest in understanding this code, I prefer to rewrite it in my own style.
…
turns into this:
With code as object, let me as you negating:
frobnicate causality of read object over sufficient experience or expertise (with the language or domain).
frobnicate relation of enough time and attempt to read and understand object (“it’s obvious” or “it’s intuitive”).
have much interest in understanding object, no prefer to rewrite it in my own style. // “no” negates negation
include more items from [here|https://…].
See frobnicate in my frobnication article.
This is why I hate plug-ins that claim to produce readability or code-cleanliness metrics, because they very typically have the opposite effect.
A 1000 line function broken down into 100 different 10 line functions which are then each called in turn (and need lots of parameters to pass around the inevitable shared state) is actually often less readable than just a 1000 line function.
The way to deal with such a function is to ask why there needed to be a 1000 line function in the first place. Why does so much apparently need to be done here and now, and why is so much state shared across so many lines?
Code metrics tools would rather you break everything down into micro-functions which, if you're blindly following the tooling recommendations to get past a check-in gate, will probably end up badly named and with little thought given to actual abstraction.
If you have two functions, but whenever you call foo = Bar() you then have to call Baz(foo), you don't really have two functions at all.
For what it's worth, my take on it is that you should extract whatever bits of pure functions you can find in the 1000-line ones.
This would help testing and understanding. You can test and understand the parts independently, and only have to test the huge function in order to check that the parts are attached correctly.
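As a sketch of the idea (in Rust, with invented names like `apply_discount`), pulling a pure chunk out of a long function might look like:

```rust
// Hypothetical sketch: a pure chunk pulled out of a long function.
// The helper name and the price/discount domain are invented for illustration.

/// Pure: depends only on its arguments, so it can be tested on its own.
fn apply_discount(price_cents: u64, percent: u8) -> u64 {
    price_cents - (price_cents * percent as u64) / 100
}

fn main() {
    // The long "laundry list" function would call the helper here,
    // while I/O and shared state stay in the caller.
    assert_eq!(apply_discount(1000, 10), 900);
    println!("discounted: {}", apply_discount(1000, 10));
}
```

The huge function then only needs tests that check the parts are wired together correctly.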
Well not necessarily just 'pure'. You also want unsurprising. For example imagine a pure function that takes a string and returns a string. The returned string has every character lower cased. Okay name it to_lower and move it out.
But now imagine that it returns the complete text of Shakespeare if the input is 'bob' (why? customer requirement. their business literally falls apart if this doesn't happen). In this case you probably shouldn't pull it out into its own thing, because it's so specific (and weird) that you'll want it near its use case (and not available for developers to accidentally use for the non-Shakespeare cases).
* yeah this is convoluted. There are better ways to factor such a function. However the point is that something can be 'pure' but still not a good candidate to be pulled out of context.
> In this case you probably shouldn't pull it out into its own thing, because it's so specific (and weird) that you'll want it near its use case (and not available for developers to accidentally use for the non-Shakespeare cases).
I guess it depends on how much this weird Shakespearean to_lower is used around the codebase.
If there is one spot where it's used, I would probably factor out to_lower and add a special case check for "bob" in the business logic code (outside the scope of to_lower).
If the Shakespearean to_lower gets used in various places it actually might make sense to be factored into its own function.
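A minimal sketch of that factoring (Rust, with invented names; the Shakespeare text is a placeholder string): keep `to_lower` generic and put the 'bob' special case in the business logic that calls it:

```rust
// Hypothetical sketch: keep the helper pure and generic, and put the
// weird "bob" requirement in the caller, next to its use case.

fn to_lower(s: &str) -> String {
    s.to_lowercase()
}

fn render_customer_field(input: &str) -> String {
    if input == "bob" {
        // The odd customer requirement lives here, not inside to_lower,
        // so nobody accidentally reuses it elsewhere.
        return String::from("<complete text of Shakespeare>");
    }
    to_lower(input)
}

fn main() {
    assert_eq!(render_customer_field("HELLO"), "hello");
    assert_eq!(render_customer_field("bob"), "<complete text of Shakespeare>");
}
```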
> For what it's worth, my take on it is that you should extract whatever bits of pure functions you can find in the 1000-line ones.
Do these bits of pure functions appear anywhere else in the code? If so, sure, I happily pull them out into their own functions.
If not, I will leave them exactly where they are. It's easier to read a laundry list than to look up many different parts in isolation without context.
As for testability: If some bit of code only appears in that one long laundry-list function, what do I gain by testing it outside of that context?
On the other hand, since you don’t know future requirements, it’s easier to refactor/abstract unabstracted code than to refactor/reabstract already-/wrongly-abstracted code.
Function extraction adds indirection that doesn’t necessarily pay its own rent. It’s not free.
> The chunks are easier to understand. But the laundry list function that calls them isn't, because its logic is now spread over 10 different functions
That's... the point of abstractions and programming languages. At work you don't regularly care how System.out.println() is implemented, do you?
If we followed your way of thinking through, all we would have would be binary. Maybe assembly.
The whole point of having programming languages and functions is having high level descriptions of what it is doing without having to worry about the implementation unless we have to.
Now, maybe the 1000 lines of code is actually ultra specific and has no bearing at all on everything else, but that's not really common. Could also be that the 1000 lines are more data structure than code.
There is a level of abstraction that's conducive to understanding stuff. Going at higher level costs obscuring implementation details, going at lower level costs making it hard to understand what it does.
One thousand lines of code is probably way too low level. Chunking those 1000 lines into smaller abstractions is already what your brain will do when trying to make sense of it (because the cache is small for abstractions and concepts that aren't already internalized)
This a bit of an aside, but I wonder if people would be less prone to advocate pure functions if they had a name that didn't sound like a paladin had come up with them. Would people as eagerly advocate rewriting code to break it into greasy functions?
Uhhh if you have a 1,000 line function it definitely should be broken down. Even a 100 line function is borderline. It's gonna be nigh impossible to write comprehensive unit tests for a 1,000 line function.
I had a method around... 700-800 lines. I extracted a bunch of things into smaller bits, and the resulting block was around 140 lines, with generally understandable names. It's 'doing a lot' because... well, we have a complex block of data with many bits that need to be pulled in based on some logic.
Someone new started and... within 10 minutes commented in the code "150 lines is way too long, we need to understand why this is so bad and how to refactor this to be good - you should never do so much in a method".
Could I break it up even more? And have one method that composes 4 then each compose 4 that each compose 3? Possibly. But there's a tradeoff, and... what annoyed me more was that there was 1) no questions asked, 2) no view of the history of how it went from 800 down to 150, 3) no review of the tests in place which document and demonstrate that it does, in fact, work.
Pursuing brevity at all costs has a higher price than some people understand. Worked with the person a while longer, and they eschewed tests because... "hey, everything I write is already so small and understandable, there's no need for tests", demonstrating a pretty basic misunderstanding of tests (imo).
Anyone new that immediately shits on a codebase is a huge red flag.
I have seen a 1,500-line method in an app I was rewriting. The dev no longer worked for us. It was badly written; I told management it wasn't optimal and that there were some issues with it, but I said I didn't know under what constraints the dev was working. I simply wasn't there when it was written and have no idea if it was incompetence or some other factors.
The point is, anyone who decides a certain way is BAD and needs to be immediately improved while barely knowing the code base is a bit incompetent. I honestly would wait weeks before bringing up issues I might see in the code base or SQL. I would learn the team dynamics first and try to see if there is any other reason for those things.
Many languages have ways to reduce scope within a single function (Rust: drop, C/C++/Java: {}, etc.), so you can have a long function that means "these things need to happen in sequence" without suffering from excess shared state. Though obviously not perfect.
let (x, y) = {
... (some code that only serves to create x and y)
};
I really really like that feature for controlling scoping - it's like a single use function that doesn't involve so much navigation.
Other languages allow similar constructs with blocks too. For ones that allow inline lambdas, you can also get similar results by defining and immediately calling the function.
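Both variants can be sketched in Rust: the snippet above is a block expression, and the "define and immediately call" pattern is an immediately-invoked closure (the values here are invented for illustration):

```rust
fn main() {
    // A block expression: the temporaries used to build x and y
    // go out of scope at the closing brace.
    let (x, y) = {
        let raw = 40;
        let offset = 2;
        (raw + offset, raw - offset)
    };
    assert_eq!((x, y), (42, 38));

    // Same idea via an immediately-invoked closure, as in languages
    // that lack block expressions but allow inline lambdas.
    let z = (|| {
        let base = 10;
        base * base
    })();
    assert_eq!(z, 100);
}
```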
> A 1000 line function broken down into 100 different 10 line functions which are then each called in turn (and need lots of parameters to pass around the inevitable shared state) is actually often less readable than just a 1000 line function.
Aside from the 1000 line function (sheesh), this is a failure of the IDE, not of the practice. We have computers that can inline these functions so you could see the code in a big 1000 line chunk if you would like, but only a LISP IDE gives you inlining afaik.
> “Good code is simple” doesn’t actually say anything.
That's a bold claim and I will assert it is wholly incorrect.
Good Code is simple and simple code is VERBOSE
> whether that translates to “simple” code depends on the programmers.
That is incorrect. It largely depends on the language abstractions. When the abstractions are familiar, it goes from incomprehensible (literally) to familiar. Code cannot seem simple if you do not have the familiarity with the abstractions used.
> Should we strive to satisfy the Shakespeare for Dummies demographic
Yes.
> Programmers seem to believe in a realm of beautiful, readable, easy-to-maintain code that they haven’t seen or worked with yet
Lots of programmers believe it because they have seen it. I have. It was embedded in hard-to-maintain code, but it was there.
It doesn't mean anything because both good and simple are subject to on-the-spot redefinition. At what skill level does code become "simple"? Does all code have to be ELI5 to be good? When a Node developer looks at CUDA implementation of k-means regression analysis and finds it "difficult", does that mean the code isn't good?
When the abstractions are familiar
So you're saying it does depend on the programmer, because the programmer should be familiar with the abstractions used?
Short answer: no. "Good" is a different bar (whatever you mean by it). It's not possible for every function to be simple, for more than one reason (for sure, some calculations are inherently complex, some relationships are complex, etc.). However, the vast majority of code can be.
> So you're saying it does depend on the programmer, because the programmer should be familiar with the abstractions used?
Correct. It is expected the viewer is familiar with the language, not additional abstraction on top of that. You have to assume there is a cost to those new abstractions that elevate it beyond simple.
In case you have not seen it, there is an excellent talk by Rich Hickey on what "simple" is and how it differs from "familiar" or "easy": https://www.infoq.com/presentations/Simple-Made-Easy He proposes that "simple" is more objective than subjective.
> 1000 line function (sheesh), this is a failure of the IDE, not of the practice
This can be argued the other way too. Decent IDEs have code folding and other features so you can look at a long function at a higher level without needing to abuse language features for subjective purposes.
> Overzealous (or compelled) decomposition into 5-line chunks over tens of files. Pointless renaming, re-exporting and encapsulation. Higher-order code golf. All this done to a finished non-extensible project makes no sense and makes it harder to read, for no reason.
Exactly.
For decades, people were told this is the way to go, and would somehow, almost magically, make code readable, maintainable, reusable. The result is the exact opposite, a gigantic pile of needless abstractions and spread-out functionality, most of which exist solely to satisfy some paradigms.
There is a definition of "unreadable" missing from the bullet point list at the top. This is code that is known to function correctly, but whose implementation is objectively obscure (this is not equivalent to the second bullet point). In my experience you see it most in code that is old and has had many programmers work on it. For example, you're reading a function trying to figure out how it could ever return a correct value, and yet good tests show it does every time. You realize 100 lines can be replaced by a hash lookup such-and-such, so you make the change expecting the tests to fail because you didn't understand something, but nope, you were right: a simple lookup was all it needed. How does code like this come to be? It grows organically. The latest minor tweak is a tiny blip in the long twisted history of the function's existence.
This might seem like a minor addition to the list, but I think it's an important one. And it has a deeper philosophical implication: if you end up with a "Rube Goldberg machine" chunk of code, that is not simply a stylistic preference, even if the code functions correctly. In other words: you should not claim that just because it functions correctly, it is "correctly written". You certainly can be tempted to make that claim (I have seen many do, and I have done so myself).
There is code that is complex because it needs to be. There is code that is complex even though it doesn't need to be. If you don't actively prune the second type, it will grow all by itself and take over your garden.
This has been a popular topic for decades, and I feel like most people always get two fundamentals twisted: syntax and semantics. Syntax != Semantics.
Code readability = syntax that make semantics obvious.
Whether a block of code is written in an imperative style or a functional style doesn't matter. Ask yourself instead: are the semantics clear? Are we copying or moving the values? Are we cloning the references? Are we iterating over mutable references or copies of the values?
A good language, with good code readability, makes semantics obvious. A good programmer encourages good readability. With good readability we don't have doubts about what a piece of code is doing.
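Rust happens to make several of those semantic questions visible in the syntax; a small illustration (values invented):

```rust
fn main() {
    // Move: v1's buffer is transferred to v2; v1 can no longer be used.
    let v1 = vec![1, 2, 3];
    let v2 = v1; // a move, not an implicit copy

    // Clone: an explicit deep copy, visible at the call site.
    let v3 = v2.clone();

    // Iterating over copies of the values:
    let sum: i32 = v3.iter().copied().sum();

    // Iterating over mutable references, mutating in place:
    let mut v4 = v3.clone();
    for n in v4.iter_mut() {
        *n += 1;
    }

    assert_eq!(v2, vec![1, 2, 3]);
    assert_eq!(sum, 6);
    assert_eq!(v4, vec![2, 3, 4]);
}
```

The point is not Rust specifically, but that the copy/move/clone/mutate distinctions are answerable by reading the code, without consulting documentation.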
This is a really good comment, and I hadn't really thought of it in those terms. So it's insightful to me. It encapsulates a lot of things I hadn't really directly tried to answer before.
My previous answer is that good code makes it easy to discover intent.
But I think the semantics are important too - for example in scala we use monad transformers to treat futures (async code) much like lists. But the problem with this approach is it's not clear without inspecting the type signature if you're doing something that has a high algorithmic complexity eg. spawning threads or not.
I think the best language I've ever read for semantics is Elixir.
But I think the lack of static typing makes it harder to code review, especially on GitHub. Maybe Gleam is the answer?
Based on the way the article is written, I wouldn’t trust the author to write readable code either. It seems unfocused and lacks a coherent point… or did I just fail to understand it?
Readability is a very real concern and something every programmer should think about. It’s a form of communication.
It all starts with naming. Naming the entities, classes, methods and variables is probably 70% of making really readable code. In my experience, the difference between ‘spaghetti mess’ and ‘nice, clean code’ can often be achieved with an identical AST structure but with well-thought-out and consistently applied names.
The other 30% probably deserves its own book. It involves breaking complex portions of code up into well-named segments (ex: functions or variables), keeping related segments near each other (not spreading functionality out over dozens of files), using consistent patterns, and much more.
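A tiny illustration of the naming point (Rust, with invented names): the two functions below have an identical structure, but only one reads at a glance:

```rust
// Hypothetical sketch: identical AST, different names.
// All identifiers and the pricing domain are invented for illustration.

// Hard to follow:
fn f(a: f64, b: f64) -> f64 {
    a * (1.0 + b)
}

// Same structure, readable at a glance:
fn price_with_tax(net_price: f64, tax_rate: f64) -> f64 {
    net_price * (1.0 + tax_rate)
}

fn main() {
    // Behaviorally identical; only the names differ.
    assert_eq!(f(100.0, 0.2), price_with_tax(100.0, 0.2));
}
```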
There is such a thing as unreadable code. The author here has some nuggets of wisdom buried inside a lot of chaos. Maybe his style serves a good purpose for prose, but it isn't an example of efficient and clear communication. Readable code should be efficient and clear communication.
> When we say that some code “is unreadable” we actually mean one or more of:
> 1. I can’t read the code because I don’t have sufficient experience or expertise (with the language or domain).
> 2. I haven’t spent enough time trying to read and understand the code (“it’s not obvious” or “it’s not intuitive”).
> 3. I don’t have much interest in understanding this code, I prefer to rewrite it in my own style.
> 4. The code offends my sense of aesthetics; I would write it differently.
> 5. The original programmer didn’t know how to write code.
> 6. The code appears to violate some principles or patterns I believe in.
The rest of the article mainly deals with 1-4 here, and only selectively tackles 6 as a "performative" repeat of 4.
It does mention vaguely toward the end that there may be some objective sense of readability, but doesn't go into any technical analysis. Which is a shame as I don't think it's that complex.
A few sibling commenters have alluded to this using various terms: ultimately I think objective code readability is about context and locality.
Consider a static analysis security tool that produces a control flow analysis tree: often a long tree is a very good sign of bad readability (in practice a long tree can be caused by many things: excessive boilerplate, frequent variable mutation or reassignment, etc.)
These same tools often fall down analysing control flow of applications with weird state patterns: global state, "magic" model loading, etc. Things that further hamper a reader's ability to hold a complete picture of any given file they're reading in their head at once.
I think this is interesting because not only is it a fairly simple idea to reason about (context & locality) but it's even potentially automatable as a metric.
(at least the programmer would explain the choices made)
and I've found that the need to explain the code in a literate mode has resulted in my better understanding the problem and possible approaches, resulting in a successful coding solution.
What makes code totally unreadable for me is when people think adding more smaller classes and more smaller functions is actually simplifying the code. NO! NO! NO! You are splitting up the code and spreading it around so I can't see the whole picture, and making it very difficult for me to track which function calls what. After 20 go-to-definitions... I mean, come on. Abstracting your code is making it more complex. Or maybe I'm just working with terrible developers that don't know how to abstract in a simple way. Anyway, I don't like it. Procedural programming is best in my opinion. And if you are going to abstract, don't use too many layers. E.g. A calls B, B calls C and D, D also calls C, C sometimes calls E, and E calls B, etc... TOTAL NIGHTMARE.
Yeah I never got that whole "a function can never be longer than X lines". Screw that. Put everything that's relevant in that context in the function and only start breaking things up once it has to be re-used. Which happens a lot less often than many books and guides make out to be.
I once read a comment or post on HN that said something along the lines of: 'good code is code that you can throw away easily'. A long function that can be broken down easily meets that criterion.
Also, defining internal functions/lambdas first, before exposing them externally, makes the most sense in many cases. It reduces namespace pollution and makes it easier to rewrite those inner ones.
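A small sketch of that progression (Rust, with invented names): the helper starts life as a local closure, and would only be promoted to a public function once a second caller actually appears:

```rust
// Hypothetical sketch: a local closure instead of a public helper.
// The report/formatting domain is invented for illustration.

pub fn report(values: &[i32]) -> String {
    // Local closure: no namespace pollution, trivial to rewrite,
    // invisible to the rest of the codebase until reuse demands it.
    let fmt_line = |v: &i32| format!("- {}\n", v);
    values.iter().map(fmt_line).collect()
}

fn main() {
    assert_eq!(report(&[1, 2]), "- 1\n- 2\n");
}
```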
Bit of a shortsighted assessment IMO. Systems abstract tons of stuff that you don't see, even being a programmer. And not having to worry about that is VERY convenient. It sure is nice to not have to give a rip about how to arrange data on a hard drive, or speak network protocols directly.
What you are actually looking for is code that operates in the correct problem domain. That requires abstraction which is done correctly.
My guess is that you have worked on codebases with crappy abstractions.
For me, readable code is smart code written in a way that makes it readable by dummies. I've seen both 1000-LOC-per-file imperative spaghetti messes and super modular dogmatic Clean Code/Hexagonal Architecture/DDD stuff, and I can tell you that both can be nightmare inducing. I'm no big brainer, but I've always been praised for being the guy who produces legible code which other devs, even juniors, don't fear messing with. How? I simply don't push my code to extremes like it's gonna prove I'm smarter than I need to be for my job. And I model pretty damn intricate logic for a job.
> I may be generalising but I find older (40+) programmers more likely to write good code
I think it's also because they don't deal with complexity as well as younger programmers so they strive for simplicity (which is good).
> I must admit, the _art_ of writing readable code really appeals to me and is one of my finest joys in programming
Totally agree. Unfortunately, this isn't rewarded as much in professional settings where you're expected to write features. Refactoring is much less visible, unless you're refactoring a piece of code which everyone was struggling with.
>I think it's also because they don't deal with complexity as well as younger programmers so they strive for simplicity (which is good).
Yeah, I'm going to have to ask for details here. When I was a youngster, it was the older folks who were my guides through the forests of complexity, especially when it came to interconnections between things, unintended side effects, and far-ranging consequences of design choices. When I got to that ripe old state of being 40+, that was my role as well.
> I must admit, the _art_ of writing readable code really appeals to me and is one of my finest joys in programming
One of the best compliments I've ever received was when a team member told me I created the most beautiful looking code she's ever seen. This was from both a pleasing to the eye and ability to grok perspective. The latter was mostly due to naming things in ways that made their purpose obvious and avoiding the temptation to do as much work in a single line of code as possible.
>I may be generalising but I find older (40+) programmers more likely to write good code
[...]
>For example, in Common Lisp
Technology choice can correlate with age. In Lisp's case I would expect that it's long past "cool", i.e. that it's attracting fewer people than it used to, and so I would expect it to skew older.
Just like perl and tcl and awk.
Also you would have to take survivor bias into account - if you only see good lisp projects, maybe that's because the bad lisp projects died out? Maybe the bad old lisp programmers left?
There is a lot of bad lisp code out there, especially these days (most open source projects in CL are more about look how smart I am but by themselves are mostly terrible code).
But I agree with your first point, the _older_ lisp programmers may very well be the ones that kept at it and honed their skills while the poorer programmers jumped ship before then
The article is missing a very important part of the topic, something I call "reasoning".
Clean code is easier to reason about.
The easier the code is to reason about, the cleaner it is; and the less context one needs to understand the problem, the better. State is the real enemy: field variables, mutable objects, lots of incoming parameters, and context-holding objects all make it incredibly hard to understand the edge cases and general mechanism of the code.
Thankfully, goto statements, global variables and singletons already have a stigma attached to them...
On the other hand, singletons and global variables can be incredibly useful for writing clear and simple code.
As a web and mobile app developer, I find that globals can be useful. Developers will go through extraordinary lengths to avoid making things global, but the truth is that for end-user applications a great deal of the relevant product requirements are essentially singletons. There’s only one active profile and one user and one catalog and one persistence layer. There’s only one DOM and one window. The result is that code which endeavors to make state which is truly shared and global actually shared and global will often be much simpler and less bug-prone, provided that reasonable abstractions are chosen for any mutation and event APIs related to global state.
> Developers will go through extraordinary lengths to avoid making things global, but the truth is that for end-user applications a great deal of the relevant product requirements are essentially singletons.
Globals are appropriate when "there can be only one"; however, I tend to find that's rarely the case. Much more common is "there can be only one at a time"; in which case, dynamic variables are far more useful than global variables. The most obvious example is in tests.
> There’s only one active profile and one user and one catalog and one persistence layer. There’s only one DOM and one window.
But --- hopefully --- there are more than one unit/component tests. ;-)
But I agree, there are contexts in which global variables may make sense, e.g. sometimes in an embedded system, sometimes in a short throw away program. As usual, the community of software developers discusses guidelines without clarifying the context or the assumptions they are working with. Since there are almost no universal guidelines for programming those discussion go on forever.
> There’s only one DOM and one window.

And then you start using web workers, service workers, audio worklets, … Or you want to write tests for your code, or you want to use the same code in a backend node service or with react-native.
Globals are never a good choice. Hidden/ambient dependencies always end up causing more problems than they solve. The "extraordinary lengths" are merely passing a parameter through your stack, or a tightly controlled context. This requires proper design if you have a lot of things to pass around, but it's merely organizing your code dependencies rather than hiding them under the global carpet.
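A minimal sketch (Rust, with invented names) of such a "tightly controlled context": dependencies travel in one struct passed down the stack instead of living in a global:

```rust
// Hypothetical sketch: an explicit context struct instead of a global.
// The AppCtx fields and greet function are invented for illustration.

struct AppCtx {
    user: String,
    verbose: bool,
}

fn greet(ctx: &AppCtx) -> String {
    // Every dependency is visible in the signature; nothing is ambient.
    if ctx.verbose {
        format!("Hello, {} (verbose mode)", ctx.user)
    } else {
        format!("Hello, {}", ctx.user)
    }
}

fn main() {
    // Tests can build their own context; no hidden global to reset.
    let ctx = AppCtx { user: "alice".into(), verbose: false };
    assert_eq!(greet(&ctx), "Hello, alice");
}
```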
Imagine I tell you that story, but before that let me tell you something else. Nevermind, let's go on with the story. One moment, what was I saying? Oh sure, yes, the story. The story goes as follows... The ending, preceded by most of the story, preceded by its beginning.
Some people write code that reads like that. Readability counts.
> Does that mean all programmers should dumb their code down so even beginners with no domain expertise can understand it at a glance? Should we strive to satisfy the Shakespeare for Dummies demographic?
Having been on call for other people's code I can tell you that the idiom "Always code as if the person who ends up maintaining your code is a violent psychopath who knows where you live" is a very good idea and you should stick to it.
If I get woken up at 2am due to an outage you caused and open your file to find you've written War and Peace code instead of The Hungry Caterpillar code I will be very very very upset with you.
Like everything else, there's a tendency to lean towards categorizing it into one binary category or another. I think this article makes some great points; however, on the topic of simplicity, I think Rich Hickey's talk "Simple Made Easy" is really informative for thinking about design and building systems (it also presents an interesting definition of the two categories).
Is code reading a use case for GPT-2 text summarization? The examples I see are text to text or code to code. I am wondering what code to text could be like.