The broken promise of static typing


I was quite surprised at a recent blog post by Uncle Bob Martin, titled "Type Wars", in which he writes: "Therefore, I predict, that as Test Driven Development becomes ever more accepted as a necessary professional discipline, dynamic languages will become the preferred languages. The Smalltalkers will, eventually, win."

This statement didn't sit well with some people in the static typing community, who argued that in a sufficiently advanced statically typed language, types are proofs and they make unit tests mostly redundant. The Haskell community even claims that "once your code compiles it usually works"!

For me, all these claims translate into one simple promise: fewer bugs.

And I really hate bugs. I find them to be one of the worst wastes of time and energy for a project, and there is nothing that annoys me more than getting to the end-of-iteration demo and the team somehow being proud to say, "We did X story points and we fixed 20 bugs! Hurray!"

To me it sounds like, "In the last iteration we wrote more than 20 bugs, but our clients were able to find just 20! And we were paid for both writing and fixing them! Hurray!"

Charting bugs

With that in mind, I tried to find some empirical evidence that static types actually do help avoid bugs. Unfortunately, the best source I could find suggests that I am out of luck, so I had to settle for a more naïve approach: searching GitHub.

The following charts compare the number of issues labelled "bug" with the number of repositories on GitHub for different languages. I also tried to remove some noise by only counting repositories with some stars, on the assumption that a repository with no stars means that nobody is using it, so nobody will report bugs against it.
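The exact scripts behind the charts are not included in this post, but a minimal sketch of the idea, assuming the GitHub Search API v3 and the Haskell http-conduit and aeson libraries (the qualifiers shown are illustrative, not necessarily the precise queries used), looks roughly like this:

    {-# LANGUAGE OverloadedStrings #-}

    import Data.Aeson (FromJSON (..), withObject, (.:))
    import Network.HTTP.Simple

    -- The search API returns a "total_count" field; that is all we need here.
    newtype Count = Count Int

    instance FromJSON Count where
      parseJSON = withObject "SearchResult" $ \o -> Count <$> o .: "total_count"

    -- Total number of hits for a GitHub search query ("issues" or "repositories").
    totalCount :: String -> String -> IO Int
    totalCount endpoint query = do
      request <- parseRequest ("https://api.github.com/search/" ++ endpoint ++ "?q=" ++ query)
      response <- httpJSON (setRequestHeader "User-Agent" ["bug-density-sketch"] request)
      let Count n = getResponseBody response
      return n

    -- Issues labelled "bug" divided by repositories, for one language.
    -- ghci> bugDensity "clojure"
    bugDensity :: String -> IO Double
    bugDensity lang = do
      bugs  <- totalCount "issues"       ("label:bug+language:" ++ lang ++ "+type:issue")
      repos <- totalCount "repositories" ("language:" ++ lang)
      return (fromIntegral bugs / fromIntegral repos)

A real script would also have to handle the star filters for rounds 2 and 3 and the API's rate limits; this is only the core of the calculation.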

In green, in the "advanced" statically typed languages corner: Haskell, Scala and F#.
In orange, in the "old and boring" statically typed languages corner: Java, C++ and Go.
In red, in the dynamically typed languages corner: JavaScript, Ruby, Python, Clojure and Erlang.

Round 1. Languages sorted by bug density. All repos

Round 2. Languages sorted by bug density. Repos with more than 10 stars

Round 3. Languages sorted by bug density. Repos with more than 100 stars

Whilst not conclusive, the lack of any evidence in the charts that languages with more advanced type systems save us from writing bugs is very disturbing.

Static vs Dynamic is not the issue

The charts show no evidence of static versus dynamic typing making any difference, but they do show, at least in my humble opinion, a gap between languages that focus on simplicity and those that don't.

Both Rob Pike (Go creator) and Rich Hickey (Clojure creator) have very good talks about simplicity being a core part of their languages. And that simplicity means that your application is going to be easier to understand, easier to change, easier to maintain, and more flexible. All of which means that you are going to write fewer bugs.

What characterizes a simple language? Listing the things in common between Go, Erlang and Clojure, we get:

  • No manual memory management
  • No mutex-based concurrency
  • No classes
  • No inheritance
  • No complex type system
  • Not multiparadigm
  • Not a lot of syntax
  • Not academic

Maybe all those shiny things we get in our languages are actually the sharp tools we end up hurting ourselves with, creating bugs and wasting our time. Maybe all they bring is a lot of additional complexity, when what we really need is a simpler language.

As Tony Hoare said: "There are two ways of constructing a software design: One way is to make it so simple that there are obviously no deficiencies, and the other way is to make it so complicated that there are no obvious deficiencies."

About the author:

Daniel Lebrero works as a technical architect at IG on the Big Data team. With more than 15 years of Java experience and 4 of Clojure, he is now a big advocate of the functional approach.  Find him on Twitter, LinkedIn, or his personal blog.

Comments

Have you also normalised by lines of code? If, for instance, Java projects tend to be larger than Ruby projects, your figures would become skewed otherwise.

I don't want to start a flame war, but I think your process isn't very fair, so I'm not very inclined to agree with your conclusions. A few things you might want to consider to make it a fair comparison:

- as bronger said in a previous comment, normalizing by LOC is a first step
- Static typing doesn't prevent all bugs, only a certain category of them. You should stick to that.
- Project complexity. If you compare a programming language implementation or an asynchronous multithreaded HTTP server written in Java or C++ (which are usually the kinds of projects those languages are used for) against a static blog generator, the comparison is also unfair. The more complex a project is, the more bugs (especially those you can't prevent with static typing) are going to show up.

I'd love to see a comparison of similar projects for bugs that static typing can actually avoid. For example:

if ($i == 5)

The above in PHP can lead to quite a lot of surprises if $i is a string, depending on its actual content, because the interpreter just tries to convert the arguments of the equality operator into something that can be compared (and I know there's ===, but that's not the point). That is a bug for which static typing is useful. On the other hand, if instead of 5 you should have written 10, that is a programming error and static typing can't do much.
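For contrast, here is a tiny sketch of the same check in a statically typed language (Haskell, purely as an illustration), where the mismatch is caught before the program ever runs:

    i :: String
    i = "5"

    -- check = i == 5
    -- Uncommented, the line above is rejected at compile time with an error along
    -- the lines of "No instance for (Num String) arising from the literal '5'",
    -- whereas PHP silently coerces the operands and compares them at run time.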

Whilst I agree that lines of code can measure complexity, they do not measure only the inherent complexity of a project; they can also be a sign of incidental complexity. See https://labs.ig.com/lines-of-code-matter for an example. I am a Java developer, and I wouldn't say that of the two million repositories on GitHub, the typical project is an async HTTP server. Java is awesome because you have libraries for absolutely everything. Also, when I think about Erlang, the first thing that comes to my mind is distributed programming, which in my opinion is in the category of "very difficult problems".

This is interesting, although like bronger I do wonder if this is an artifact of Java projects tending to be larger. Any chance you could make available the code you used to obtain the data?

Thanks,

Alex

Interesting results; certainly food for thought.

Since some of my tweets were linked, I'd like to provide a bit of nuance: while Robert C. Martin's conclusion didn't "sit well with me", I never claimed that types make unit tests largely redundant. What I did state, as much as Twitter's character limit allowed me, is that the stronger the type system, the fewer tests you need to write.

As I've already elaborated, the stronger the type system, the fewer tests you need to write, simply because there are categories of errors that become impossible. As an example, in Haskell (and, to a degree, in F#) nulls are impossible, so it makes no sense to test what happens if a return value is null. In Haskell, you can't even meaningfully write such a test.

The absence of nulls eliminates an entire category of errors, which means that there's an entire category of edge cases you don't need to unit test. Given that even Sir Tony Hoare himself has admitted that null references were a billion-dollar mistake, I'd consider this feature of a strong type system valuable.
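To make that concrete, here is a minimal Haskell sketch (the names are made up for illustration): a lookup that can fail returns a Maybe value, so the "missing" case is an explicit constructor to handle in the code, not a null that someone has to remember to write a test for:

    import qualified Data.Map as Map

    -- A lookup that can fail returns Maybe String, not a nullable String.
    findUser :: Int -> Map.Map Int String -> Maybe String
    findUser = Map.lookup

    greeting :: Int -> Map.Map Int String -> String
    greeting userId users =
      case findUser userId users of
        Just name -> "Hello, " ++ name
        Nothing   -> "Hello, stranger"
    -- There is no null to dereference by accident, and with -Wincomplete-patterns
    -- the compiler even warns if the Nothing case is left out.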

The numbers reported here, though, are food for thought. Did you also run the numbers on C#?

I think it'd be interesting to see the numbers for C#, for the following reason: if we were to approach this question somewhat scientifically, I'd like to make a prediction. I'd expect the numbers for C# to be similar to the numbers for Java. The reason is that these two languages, while running on two different platforms, are quite similar: they have the same sort of type system, they both allow nulls, they're both object-oriented descendants of C, and they are both widely used for enterprise development.

If the C# numbers line up with the Java numbers, then this would strengthen the confidence we can have in the analysis. If the C# numbers don't match, then I think further study is required in order to understand why this is so.

Sorry if I misinterpreted you, but my reasoning is that if I do not need to test for nulls, it is not possible to forget that I need to test for nulls, hence I should have fewer bugs, as I am human and I tend to forget things, like testing for nulls. The C# numbers are 0.16, 9.26 and 64. By the way, your blog post series is excellent and very inspiring.

Not sure one can normalise by LOC; the expressiveness of the languages varies too greatly for that. Regarding static typing, its absence doesn't necessarily mean that bugs otherwise detected during compilation make it into production: they can still be found during system or unit testing. And that testing effort isn't usually seen as something that strong typing reduces.

I'd like to suggest a tweak to the measurements. Instead of filtering based on starred projects, also try to correlate with how old the project is.

The rationale is that projects that are older (and still active, with commits in the last few weeks) have had more time to be battle-tested, and therefore more issues have been found. New projects, even those with many stars and lots of activity, haven't yet had much time for bugs to show up.

From this perspective alone, it would make sense for Java or C++ projects to present more issues than projects in newer languages such as Clojure.

Age could certainly be related, but the charts show plenty of languages that are as old and popular as Java and C++. Note that Erlang is older than Java. Also, what about just looking at the "new" languages? Do the numbers make sense if we compare just those?