Dangerously Confusing Interfaces V: The Erroneous ERRORLEVEL

“Design interfaces that are easy to use correctly and hard to use incorrectly.”
— Scott Meyers

Dangerously confusing interfaces can lurk anywhere, even in the venerable (yuck!) DOS batch scripting language. Some time ago, I burnt my fingers when I made a tiny tweak to an existing batch file, deploy.bat, which was part of a larger build script:

Because we had seen the ‘copy’ command fail in the past, I tried to improve things a little by adding an ‘if’ statement to ensure that we would get a clear error message in such events:
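A check of this kind might look like the following sketch. The file names and paths are hypothetical placeholders, not the contents of the real deploy.bat, and the test deliberately uses the %ERRORLEVEL% syntax that the rest of this story revolves around.

```batch
@echo off
rem Hypothetical source and target paths, chosen only for illustration.
copy build\app.exe \\server\share\deploy\

rem Report a clear error and stop if the copy command signalled a failure.
if %ERRORLEVEL% neq 0 (
    echo ERROR: copy failed with exit code %ERRORLEVEL%
    exit /b 1
)
```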

Alas, it didn’t work. No error message was produced when the copy command failed. Worse yet, the outer build script happily continued to run. Puzzled, I opened a DOS box and did some experiments:

Hmm. Everything worked as expected. Why didn’t it work in deploy.bat? Next, I changed deploy.bat to output the exit code:

And tried again:

What? The copy command failed and yet the exit code was zero? How can this be? After some head scratching, I vaguely remembered that there was another (arcane) way of checking the exit code, namely ERRORLEVEL (without the percentage signs), so I tried it out:

I never really liked this style of checking the exit code, because ‘ERRORLEVEL n’ doesn’t actually test whether the last exit code was n; rather, it checks whether the last exit code was at least n. Thus, a statement like ‘if ERRORLEVEL 0’ doesn’t check whether the exit code is zero (i.e., no error occurred). What it really does is check whether the exit code is greater than or equal to zero, which is more or less always true, no matter the value of the exit code. That’s pretty confusing, if you ask me.
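The "at least n" behavior can be illustrated with a small sketch. It assumes a Windows cmd.exe session and uses ‘cmd /c exit 3’ merely as a convenient way to produce an exit code of 3; the messages are made up for the example.

```batch
@echo off
rem Produce an exit code of 3 by running a child cmd.exe that exits with 3.
cmd /c exit 3

rem "if ERRORLEVEL n" is true when the exit code is at least n.
if ERRORLEVEL 0 echo ERRORLEVEL 0 matches, although the exit code is 3
if ERRORLEVEL 1 echo ERRORLEVEL 1 matches, the exit code is at least 1
if ERRORLEVEL 4 echo ERRORLEVEL 4 matches (not printed for exit code 3)

rem For non-negative exit codes, "exactly zero" is the negation of "at least 1".
if not ERRORLEVEL 1 echo the exit code is zero (not printed for exit code 3)
```

With an exit code of 3, only the first two messages should appear. The last line shows a common way to test for success with this syntax, under the assumption that exit codes are never negative.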

Anyway, for some reason, it seemed to work nicely in deploy.bat:

I could hardly believe my eyes. The copy command had obviously failed and %ERRORLEVEL% was obviously zero, yet the if statement detected a non-zero exit code. What was going on? I delved deeply into the documentation of the DOS batch language. After some searching I found this paragraph:

%ERRORLEVEL% will expand into a string representation of the current value of ERRORLEVEL, provided that there is not already an environment variable with the name ERRORLEVEL, in which case you will get its value instead.

Whoa! There are two kinds of ERRORLEVEL, who knew? One is the internal exit code that ‘if ERRORLEVEL n’ tests. The other, the one you query with %ERRORLEVEL%, expands to the value of the former, provided there is no environment variable named ERRORLEVEL already. Now I had a suspicion about what was going on. I opened the parent batch file and came across the following:

In an attempt to clear the error level, some unlucky developer had introduced an environment variable named ERRORLEVEL, which from that point on shadowed the real exit code whenever %ERRORLEVEL% was expanded. This can be easily verified:
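One way to reproduce the effect is the following sketch. It is not the original experiment; it assumes a cmd.exe batch file and again uses ‘cmd /c exit 1’ to produce a failing exit code.

```batch
@echo off
setlocal

rem The mistake: this creates an ordinary environment variable named ERRORLEVEL.
set ERRORLEVEL=0

rem Produce a real exit code of 1.
cmd /c exit 1

rem The percent syntax now expands to the shadowing variable, so this prints 0.
echo with the variable set: %ERRORLEVEL%

rem The "if ERRORLEVEL n" form tests the real exit code, so this line is printed.
if ERRORLEVEL 1 echo if ERRORLEVEL 1 still sees the failure

rem Delete the variable, fail again, and the real value is visible once more.
set ERRORLEVEL=
cmd /c exit 1
echo after removing the variable: %ERRORLEVEL%

endlocal
```

The first echo should report 0 and the last one 1, which matches the behavior described in the documentation paragraph quoted above.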

Once the problem was understood, it was easy to fix: clear the error level in an “accepted way” (yuck, again!) instead of wrongly setting a variable named ERRORLEVEL to zero:
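Which idiom counts as accepted is a matter of taste. One possibility, shown here as a sketch rather than as the fix that actually went into the build script, is to run a command that is known to succeed, for example a child cmd.exe that exits with 0.

```batch
@echo off
rem Simulate an earlier failure so that there is something to clear.
cmd /c exit 1

rem Reset the real exit code by running a command that succeeds.
rem Do not write "set ERRORLEVEL=0" here: that creates a shadowing variable.
cmd /c exit 0

rem The real exit code is now zero, so this line is printed.
if not ERRORLEVEL 1 echo error level is cleared
```

The point of the design is that the real exit code can only be changed by running a command, never by assigning to a variable.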

Even though the interface to DOS exit codes is dangerously confusing (and disgusting as well), it facilitates a nice practical joke: next time a colleague leaves the room without locking the screen, open the Windows Control Panel, create a new global environment variable called ERRORLEVEL and set it to 0.