« Archives in August, 2011

When bytes bite

rulers.jpgNot long ago I had a discussion with a friend who had just bought a new 2 TB hard disk. He complained that those hard disk manufacturers always cheat: they sell you 2 TB but in the end the drives only have 1.8 TB of memory available. For them, a kilobyte comprises only 1000 bytes and not 1024, as for the rest of mankind.

We use the term “byte” everyday but it is surprising how man developers don’t know exactly what a byte is. “Byte” — and especially byte quantifiers like kilo, mega, and giga — seem to be surrounded by many misuses and misconceptions.

Traditionally, a byte is a collection of bits used to encode a single character for a system. It could be 4, 7, 8, 9, or any other number that the designer of a system happens to choose. This is the main reason for the CHAR_BITS symbolic constant in ISO C’s limits.h: it specifies precisely how many bits there are in a character (char).

Today, of course, we can safely assume that a byte comprises exactly 8 bits, but it is important to note that this is no universal, standardized definition. That’s why the term “octet” is used in RFCs and ASN.1: an octet is defined to be always 8 bits.

But what the heck is a kilobyte? Is it 1024 bytes? Or 1000? We use byte quantifiers so frequently, but not always correctly.

Some folks use the well known SI prefixes to mean powers of 1024:

While hard disk manufacturers usually have a different definition:

Which makes quite a difference, especially if sizes grow larger: for a 1 TB hard disk you might get about 10% less than you thought you’d paid for…

But don’t blame it on the hard disk manufacturers. They’re right, of course. Using SI prefixes to mean ‘powers of 1024’ is strongly discouraged by the SI institute.

Still, it is sometimes useful to use powers of 1024 but how should this be done without annoying the SI institute, or — more importantly — without confusing people?

Fortunately, there is an ISO standard (with roots back to work done by IEC in 1998) that addresses this problem: ISO/IEC IEC 80000-13:2008.

According to this standard, you use binary prefixes like “kibi” and it’s friends, where kibi is short for “kilo binary”; and instead of using SI’s “k” prefix you use “Ki”. Have a look at this table:

Binary prefixes can (and should) be applied to any other unit when the value is based on 1024: 1 Mib/s is 1024^2 bits per second, while 1 Mb/s is 1000^2 bits per second.

The international system of units is a wonderful achievement of the 20th century. We, especially as developers, should honor it and finally stop misusing its well-defined prefixes and instead use ISO binary prefixes.