Logging levels: the wrong abstraction
So you just wrote another try/catch block and again you start wondering: should I log this exception as an ERROR or a WARN? Or perhaps you wrote another one of those conditional branches that “should never happen” and again you start wondering: what log level should I use for this “impossible” case?
You are asking the wrong question.
These questions are at the wrong abstraction level as they do not help you to come up with an answer. Instead you have to look at this question from the perspective of how the log statement will be used.
At IG we have generally come to the following log level conventions:
- DEBUG: information that is useful during development. Usually very chatty, and will not show in production.
- INFO: information you will need to debug production issues.
- WARN: someone in the team will have to investigate what happened, but it can wait until tomorrow.
- ERROR: Oh-oh, call the fireman! This needs to be investigated now!
With this usage in mind, the log level question becomes: “Do I want to wake up in the middle of the night to investigate this, or can it wait until tomorrow morning?”, which is very concrete and a question that even your Business Manager/Production Owner should be able to answer.
Of course this convention needs to be taught to new team members, and constantly reminded to existing ones. And as much as we love wikis, code reviews and email reminders, having to do the mental mapping from the log level to how the log statement is going to be used is not a natural one, and it is going to be forgotten.
So, after one of those code reviews where I asked several times “Why do you want to wake me up in the middle of the night for this exception?” I decided that we would switch the logger interface from SLF4J to an experimental “IGLogger” one which looks like this:
The results were interesting:
- Laughs when reading “wakeMeInTheMiddleOfTheNight”
- Extra care when using “wakeMeInTheMiddleOfTheNight”
- Less mental gymnastics when logging
- Way easier to make the correct logging choice
- Nothing to explain to new team members, nothing to remind to the existing ones
- Adopted by developers’ choice on the rest of the team’s projects
And last, but not least, we got more sleep: the number of ERROR logs was reduced to 0, not because nobody could be bothered to type “wakeMeInTheMiddleOfTheNight”, but because this service was an offline, non-critical one, so any issue could wait for the next day.
Good night and sweet dreams!
About the author:
Daniel Lebrero works as a technical architect at IG on the Big Data team. With more than 15 years of Java experience and 4 of Clojure, he is now a big advocate of the functional approach. Find him on Twitter or LinkedIn.