Sun 02 March 2014 | -- (permalink)
I've encountered a few hilariously useless error messages while coding recently. They got me thinking about error messages in general and wondering about the best way to build good error messages into my own software.
Some hilarious messages I've seen:
Error in listClassName("Compressed", class(x)) : Could not find a 'CompressedList' subclass for values of type 'GRangesListCompressedListGenomicRangesListGenomicRangesORGRangesListListVectorAnnotated'
Holy long type name, Batman! This is an R error. I was super confused because I have never EVER created, used, or seen an R object of type GRangesListCompressedListGenomicRangesListGenomicRangesORGRangesListListVectorAnnotated because that's a ridiculous name for a class. The problem was actually that I was trying to call the function
split on an object of class
GRangesList, but the
GRangesList class doesn't have a split method. It might be hard to implement a better message, since
split is a generic function and object-oriented programming in R is not the best, but hey, I like to dream big.
This is the best/worst error message I've ever gotten in my life. I got it in 2011 when I was fitting an MCMC using WinBUGS. What I love about this error message is that it doesn't say the word ERROR anywhere in it. It just came up in a little dialog box with an "OK" button, so I clicked OK and results happened. I didn't even realize anything had gone wrong for like half an hour. When my friend told me this was an error message, I tried to fix it by changing some of my prior distributions. When that didn't work, I got really angry and became a frequentist1.
Error: bad restore file magic number (file may be corrupted) -- no data loaded. In addition: warning message: file 'yourfile' has magic number 'chr1'
I had to look up what a "magic number" is. I don't completely understand it yet. I'd bet that only a tiny proportion of R users know what a magic number is. The likely problem here is that the user is loading a data file the wrong way. To be more specific: usually you use
load() to load a workspace (a .rda or .RData file),
source() to run a previously-saved script, and
read.table() to read in data from some delimited file. The magic number error pops up if you use the wrong one of these commands for your data type. I saw it when a friend of mine ran
load() on a data file instead of
read.table(). I feel like this error message would be better if it said something like "error: incorrect file type for function, or corrupt file."
Error: negative-length vectors not allowed
So many questions here! What on earth is a negative-length vector? How do I make one? What would I do with it? Are they crucial to our existence, like imaginary numbers? Anyway, joking aside -- a quick Google search will tell you that this is the error you get in R when you try to make a vector with a length of 2^31 or more. Great! I can fix that. I probably could have fixed it faster if the error message just said "YO you're making a vector with more than 2^31 - 1 entries" rather than harping on about a negative length.
In addition to making me laugh and really confusing me, these got me thinking about error messages in general, from both from the software-user and software-creator perspective. As a user, I'm frustrated by error messages like these: they tell me about symptoms, but not causes. They tell me that there is a problem, but don't give me clues about how to start solving it2. But as a programmer, it's really hard to anticipate every possible thing that might go wrong with a given function and write an informative error message for each of those things. I've often chosen not to built error messages into my own functions, relying instead on my functions' internal functions to throw errors, but I think that's probably how error messages become uninformative in the first place, especially in complex/nested function calls.
The problem is that it's hard to have a user mindset for software I've written myself. I'm too deep in and can't anticipate enough of the ways users might break things. I'm starting to think that the best way to make user-friendly software is to (a) write the software, (b) brace yourself, (c) recruit some people that weren't involved in the development to use your software and give you feedback, and (d) fix all the problems they have with the software. Let them break things, and then you fix all the broken stuff! This process is described in this blog post about "complaint-driven development". I've been finding that it works on a smaller scale too: one of my current projects is a software package, and two colleagues have used it for data analysis and told me about all the weird problems they encountered. I never would have anticipated those without them, and I've been able to improve the error messages because of them -- and that's only two test users! I think that if I really want this project to have great error messages, I just need to get more people to break it and give feedback on the messages.
- Just kidding. I don't actually believe in frequentist/Bayesian dogma. And if I were a frequentist, it would only partially be because of how much I dislike WinBUGS.
- As a user, I'm also appreciative of good error messages. The Python folks at Hacker School were always waxing poetic about how awesome Python error messages are. The more I use Python, the more I agree. I like that Python errors have types, so you at least get one hint as to what's happening. The error types also make it so you can illustrate the Python debugging process with a beautiful flowchart.