
When bytes bite

Not long ago I had a discussion with a friend who had just bought a new 2 TB hard disk. He complained that those hard disk manufacturers always cheat: they sell you 2 TB, but in the end the drive only has 1.8 TB of storage available. For them, a kilobyte comprises only 1000 bytes and not 1024, as for the rest of mankind.

We use the term “byte” every day, but it is surprising how many developers don’t know exactly what a byte is. “Byte” — and especially byte quantifiers like kilo, mega, and giga — seems to be surrounded by many misuses and misconceptions.

Traditionally, a byte is the collection of bits used to encode a single character in a system. It could be 4, 7, 8, 9, or any other number that the designer of a system happens to choose. This is the main reason for the CHAR_BIT symbolic constant in ISO C’s limits.h: it specifies precisely how many bits there are in a character (char).
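Curious readers can ask their own platform. A minimal C check (nothing beyond the standard library; assumes a hosted environment):

```c
#include <limits.h>
#include <stdio.h>

int main(void)
{
    /* CHAR_BIT from <limits.h>: the number of bits in a char,
       i.e. in a "byte" in the C sense of the word. */
    printf("bits per char: %d\n", CHAR_BIT);
    return 0;
}
```

On virtually every machine you will meet today it prints 8, but the C standard only guarantees that it is at least 8.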

Today, of course, we can safely assume that a byte comprises exactly 8 bits, but it is important to note that this is no universal, standardized definition. That’s why the term “octet” is used in RFCs and ASN.1: an octet is defined to be always 8 bits.

But what the heck is a kilobyte? Is it 1024 bytes? Or 1000? We use byte quantifiers so frequently, but not always correctly.

Some folks use the well-known SI prefixes to mean powers of 1024:

1 kB = 1,024 bytes
1 MB = 1,024^2 bytes = 1,048,576 bytes
1 GB = 1,024^3 bytes = 1,073,741,824 bytes
1 TB = 1,024^4 bytes = 1,099,511,627,776 bytes

While hard disk manufacturers usually have a different definition:

1 kB = 1,000 bytes
1 MB = 1,000^2 bytes = 1,000,000 bytes
1 GB = 1,000^3 bytes = 1,000,000,000 bytes
1 TB = 1,000^4 bytes = 1,000,000,000,000 bytes

Which makes quite a difference, especially if sizes grow larger: for a 1 TB hard disk you might get about 10% less than you thought you’d paid for…

But don’t blame it on the hard disk manufacturers. They’re right, of course. Using SI prefixes to mean ‘powers of 1024’ is strongly discouraged by the SI institute.

Still, it is sometimes useful to use powers of 1024 but how should this be done without annoying the SI institute, or — more importantly — without confusing people?

Fortunately, there is an ISO standard (with roots going back to work done by the IEC in 1998) that addresses this problem: ISO/IEC 80000-13:2008.

According to this standard, you use binary prefixes like “kibi” and its friends, where “kibi” is short for “kilo binary”; and instead of SI’s “k” prefix you write “Ki”. Have a look at this table:

Ki (kibi) = 2^10 = 1,024
Mi (mebi) = 2^20 = 1,048,576
Gi (gibi) = 2^30 = 1,073,741,824
Ti (tebi) = 2^40 = 1,099,511,627,776

Binary prefixes can (and should) be applied to any other unit when the value is based on 1024: 1 Mib/s is 1024^2 bits per second, while 1 Mb/s is 1000^2 bits per second.
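To see the difference in action, here is a tiny C sketch that prints the capacity of an advertised “2 TB” drive both ways:

```c
#include <stdio.h>

int main(void)
{
    /* A "2 TB" disk as advertised: 2 * 10^12 bytes. */
    unsigned long long bytes = 2000000000000ULL;

    printf("%.2f TB  (SI prefix, powers of 1000)\n", bytes / 1e12);
    printf("%.2f TiB (binary prefix, powers of 1024)\n",
           bytes / 1099511627776.0); /* 1024^4 */
    return 0;
}
```

It prints 2.00 TB but only 1.82 TiB: the “missing” capacity my friend was complaining about.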

The International System of Units is a wonderful achievement of the 20th century. We developers, especially, should honor it: let’s finally stop misusing its well-defined prefixes and use the ISO binary prefixes instead.

The Answer To The Last Question

Today is Towel Day, but due to higher priorities I have to celebrate this important day all by myself. I can’t make it to Innsbruck this year, but I swear to you that I’m wearing my towel around my neck while I’m typing this blog entry, which I dedicate to Douglas Adams.

Few people know that his idea about “The great question of life, the universe, and everything” (to which the answer is, as everybody knows, “forty-two”) was in fact a little bit inspired by Isaac Asimov’s great short story “The Last Question”, in which generation after generation builds ever more powerful computers to find out how to reverse entropy and thus prevent the universe from becoming an infinite, starless nothingness. “The Last Question” is a great read with a surprising ending. I won’t spoil it, don’t worry.

While it is impossible for humans to stop “real” entropy from increasing (let alone reversing it) it is certainly doable in the software world. But how?

It’s not by carrying out the big refactorings and redesigns that nobody wants to do and that no profit-oriented organization can afford: for months, valuable resources are busy cleaning up instead of implementing cool features customers are willing to pay for. It’s the small stuff that counts: the teeny-weeny improvements that you make on a regular basis. As James O. Coplien said: “Quality is the result of a million selfless acts of care.”

I very much like Uncle Bob’s boy scout rule analogy: “Always leave the campground cleaner than you found it”. This principle is helpful for life in general and software development in particular. If the system you are working on is a complete mess, don’t resign. Even if you just improve a comment or rename a badly named variable, you have made the system better. Then, if everybody acts like you, software entropy will be reversed.

Personal Scrum

Even though I’ve never participated in a Scrum project, I’m a big Scrum fan. I’m convinced that a feedback-enabled, quantitative project management approach, one which puts the customer in the driver’s seat, is key to avoiding delays and frustration.

The concept of time-boxing is especially powerful: the Scrum team sets its own goals to achieve within a given period of time. In Scrum, this period of time — or iteration — is called a “sprint” and usually lasts two to four weeks. Because the sprint deadline is in the not-so-distant future, developers stay on track and the likelihood of procrastination and gold-plating is fairly low.

But there is even more time-boxing in Scrum: every day, at the “Daily Scrum Meeting”, the team comes together and everyone says what they have achieved and what they intend to achieve by the next daily scrum. In practice, that’s another 24-hour (or 8 work-hour) time-box.

Still, getting things done is not easy. If you are like me, you are distracted dozens of times every day. While hacking away, you are suddenly reminded of something else. Maybe it’s a phone call that you have to make. Or you want to check out the latest news on Slashdot. Maybe a colleague pops by to tell you about the weird bug he just discovered in the GNU C++ compiler…

If you give in to these interruptions, you won’t get much done in a day. You won’t get into what psychologists call “flow”: a highly productive state where you are totally immersed in your work.

Is there a way to combat such distractions? There is, but let me first tell you what doesn’t work: quiet hours. Quiet hours are team-agreed fixed periods of time where you are not interruptible, say, from 9:00 to 11:00 in the morning and from 14:00 to 16:00 in the afternoon. Every team member is expected to respect these hours. Sounds like a nice idea, but it fails miserably in practice. Especially in large projects, people depend on each other, and productivity drops if developers are blocked because they cannot ask for help for two hours. Every team I belonged to that tried quiet hours abandoned them shortly after introducing them.

The solution is to make the period of highly focused work much shorter, say 25 minutes. If interruptions occur, you make a note of them in your backlog and carry on with your task. When the time expires, you take a quick break (usually 5 minutes), check your backlog, and decide what to do next: either continue with your original task or handle one of your queued interrupts. In any case, you start another highly focused 25-minute period, and after 4 such iterations you take a bigger break (15 to 30 minutes). That’s the Pomodoro technique in a nutshell.
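Just for illustration, here is that schedule as a minimal C sketch (assuming a POSIX system for sleep(); stop it with Ctrl-C):

```c
#include <stdio.h>
#include <unistd.h> /* sleep() is POSIX, not ISO C */

int main(void)
{
    for (int pomodoro = 1; ; ++pomodoro) {
        printf("Pomodoro %d: 25 minutes of focused work...\n", pomodoro);
        sleep(25 * 60);

        if (pomodoro % 4 == 0) {
            /* After every fourth pomodoro: the bigger break. */
            printf("Long break: 15 to 30 minutes.\n");
            sleep(15 * 60);
        } else {
            printf("Short break: 5 minutes.\n");
            sleep(5 * 60);
        }
    }
}
```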

The Pomodoro technique (“pomodoro” is Italian for “tomato”) was invented by Francesco Cirillo, a student who had problems focusing on his studies. He wanted to find a method that allowed him to study effectively — even if only for 10 minutes — without distractions. He used a mechanical kitchen timer in the shape of a tomato to keep track of time, and hence named his technique after it. He experimented with different durations but finally came to the conclusion that iterations of 25 minutes (so-called “Pomodoros”) work best.

I like to think of the Pomodoro technique as “Personal Scrum”. To me, a 25-minute time-box is just perfect. It’s enough time to get something done, yet short enough to ensure that important issues that crop up are not delayed for too long. In his freely available book, Francesco writes that while there are software Pomodoro timers available, a mechanical kitchen timer usually works best — and I definitely agree. The act of manually winding up the timer is a gesture of committing to a task, and the ticking sound helps you stay focused, since you are constantly reminded of time. However, mechanical timers are a clear no-no if you share your office with others: the ticking and especially the ringing would be too annoying.

When I’m all by myself, I prefer a mechanical kitchen timer, but if I share a room with someone else, I prefer something softer. I’ve asked the folks at AudioSparx to implement a Pomodoro kitchen timer MP3 for me: 25 minutes of ticking, followed by a gentle 10-second ring (yes, you can download it — it’s USD 7.95, and no, I don’t get a commission). I listen to it on my PC’s MP3 player wearing headphones, which has two additional benefits: first, headphones shut out office noise; second, they signal to others that I wish to be left alone, so they only interrupt me if it is really, really urgent.

“I have a deadline. I’m glad. I think that will help me get it done.”
–Michael Chabon

Get into ‘Insert’ Mode

Here I am, trying to write something. I’m sitting at my desk, staring at my screen, and it looks like this:

[screenshot of an empty editor window]

It is empty. I just have no clue how to even start.

Are you familiar with such situations? Among writers, this is a well-known phenomenon and it’s called “writer’s block”. But similar things happen in all creative fields: sooner or later, people hit a massive roadblock and don’t know where to start. A painter sits in front of a blank canvas, an engineer in front of a blank piece of paper and a programmer in front of an empty editor buffer.

Is there any help? Sure. You can use a technique called “free writing”, which means you just write down whatever comes to your mind, regardless of how silly it looks. It’s important that you don’t judge what you write; you don’t pay attention to spelling or layout; your only job is to produce a constant stream of words — any words. This exercise will warm up your brain and hopefully remove the block. Applied to programming: you set up a project, you write a “main” routine (even if it only prints out “Hello, World, I don’t know how to implement this freaking application”) and a test driver that invokes it.
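Taken literally, such a free-writing warm-up might look like this in C (a sketch; run_application() is a made-up name):

```c
#include <stdio.h>

/* The "application": it does nothing useful yet, and that's the point. */
static void run_application(void)
{
    printf("Hello, World, I don't know how to implement "
           "this freaking application\n");
}

/* The "test driver": for now it merely invokes the application. */
int main(void)
{
    run_application();
    return 0;
}
```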

The next thing you do is write a “shitty first draft”, as suggested by Anne Lamott. You probably know the old saying: the better is the enemy of the good. By looking for the perfect solution, we often end up achieving nothing, because we cannot accept temporary uncertainty and ugliness. That’s really, really sad. Instead, write a first draft, even if it is a lousy one. Then put it aside and let it mature, but make sure you revisit it regularly. You will be amazed at how new ideas and insights emerge. Experienced programmers are familiar with this idea, but they call it prototyping. They jot down code, they scribble and sketch without paying attention to things like style and error handling, often in a dynamic language like Perl or Python.

So if you have an idea that you think is worth implementing, start it. Start somewhere — anywhere — even if the overall task seems huge. Get into ‘insert’ mode (if you are using the ‘vi’ editor, press the ‘i’ key). Remember the Chinese proverb: “The hardest part of a journey of a thousand miles is leaving your house”.

Intended Use vs. Real Use

Often, things are invented to solve a particular problem, but then the invention is used for something completely different.

Take Post-it® Notes, for instance. In 1970, Spencer Silver at 3M’s research laboratories was looking for a very strong adhesive, but what he found was much weaker than what was already available at his company: it stuck to objects but could easily be lifted off. Years later, a colleague of his, Arthur Fry, dug up Spencer’s weak adhesive — the rest is history.

Another example is the discovery of the little blue pill called Viagra®. Pfizer was looking for medications to treat heart disease, but the desired effects of the drug were minimal. Instead, male subjects reported completely different effects — again, the rest is history.

In 1991, a team of developers at Sun were working on a new programming language called “Oak” — the goal was to create a language and execution platform for all kinds of embedded electronic devices. They changed the name to “Java” and it has become a big success: You can find it almost everywhere, except — big surprise — in embedded systems.

I would never have guessed how minute Java’s impact on embedded systems was until I read Michael Barr’s recent article, provocatively called “Real men program in C”, where he presents survey results showing the usage statistics of various programming languages in embedded systems projects.

The 60-80% dominance of C didn’t surprise me — C is the lingua franca of systems programming: high-level enough to support most system-level programming abstractions, yet low-level enough to give you efficient access to hardware. If it is fine for the Linux kernel (which comprises around 10 million source lines of code, SLOC), it should be fine for your MP3 player as well.

Naturally — at least to me — C++ had to be way behind C: Barr reports a 25% share. C++ is a powerful but difficult language. It is more or less built on top of C, so it is “backwards-efficient”. Alas, to master it, you need to read at least 10 books by Bjarne Stroustrup, Scott Meyers, Herb Sutter et al. and practice for five years — day and night. But the biggest problem with C++ is that it somehow encourages C++ experts to endlessly tinker with their code, using more and more advanced and difficult language features until nobody else understands the code anymore. (Even days after everything is already working they keep polishing — and if people complain that they don’t understand their template meta-programming gibberish, they turn away in disgust.)

But how come Java is only at 2%? Barr, who mentions Java only in his footnotes (maybe to stress the insignificance of Java even more) has this to say: “The use of Java has never been more than a blip in embedded software development, and peaked during the telecom bubble — in the same year as C++.”

Compared to C++, Java has even more weaknesses when it comes to embedded systems programming. First of all, there is no efficient access to hardware, so Java code is usually confined to the upper layers of a system. Second, Java, being an interpreted language, cannot be as fast as compiled native code, and JIT (just-in-time) compilation is only feasible on larger systems with enough memory and computational horsepower.

As for footprint, it is often claimed that Java code is leaner than native code. Obviously, this is true, as the instruction set of the JVM is more “high-level” than the native instruction set of the target CPU. However, for small systems, the size of the VM and the Java runtime libraries has to be taken into account, and this “overhead” only amortizes in larger systems.

Two more properties of Java frequently annoy systems programmers: the fact that all object allocation goes via the heap (i.e. you cannot efficiently pass objects via the stack), and the fact that the ‘byte’ data type is signed, which can be quite a nuisance if you want to work with unsigned 8-bit data (something that happens rather frequently in embedded systems). Finally, if C++ seduces programmers into over-engineering their code by using every obscure feature the language has to offer, Java seduces programmers into over-objectifying their code — something that can lead to a lot of inefficiency by itself.
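To illustrate the signed-byte nuisance from the C perspective, here is a small sketch (the status-register value is made up):

```c
#include <stdint.h>
#include <stdio.h>

int main(void)
{
    uint8_t status = 0xE3; /* hypothetical device status byte */

    /* In C, 0xE3 simply is 227 and the high bit tests cleanly.
       A Java 'byte' holding 0xE3 would be -29 and would need a
       (status & 0xFF) mask before almost every comparison. */
    printf("value:        %u\n", (unsigned)status);
    printf("high bit set: %s\n", (status & 0x80) ? "yes" : "no");
    return 0;
}
```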

I don’t think that the embedded world is that black and white. I’m convinced that for small systems (up to 20 KSLOC) C is usually the best choice — maybe sprinkled with some assembly language in the device drivers and other performance-critical areas. Medium-sized systems can, and large systems definitely will, benefit from languages like C++ and Java, but only in upper layers like application/user-interface frameworks and internal applications. Java clearly wins if external code (e.g. applets, plug-ins) will be installed after the system has been deployed; in such cases, Java has proven to be a reliable, secure, and portable framework for dynamically handling applications. For the rest, that is, the “core” or “kernel” of a larger system, C is usually the best and most efficient choice.

I, Engineer

I could hardly wait for my new Linux PC to arrive. When it finally did, I ripped the cardboard box open, connected everything, pressed the power button and … was utterly disappointed.

I didn’t want an off-the-shelf PC, partly to avoid the usual Microsoft tax (a.k.a. pre-installed Windows Vista), but mostly because I wanted a quiet PC. All the components (including a passively cooled graphics card) were selected with this goal in mind. Still, my PC sounded like a freaking lawn mower.

One option would have been to send everything straight back, but that would have been rather cumbersome; the other was to take care of the problem myself.

I used to be a big fan of “MacGyver”, hero of the eponymous 1980s action series. “Mac” was a wonderful person: a good-looking daredevil who avoids conflict and doesn’t carry firearms; instead, he always carries a Swiss Army knife and duct tape. He knows how to defuse bombs, how to hotwire cars, and is able to fix everything with everyday stuff like paper clips. In short, he is a great problem solver, a great hacker, and a great role model.

MacGyver would not have sent back the PC — he would have taken care of the problem himself. So I opened the case and found out that — even though I had a passively cooled graphics card — there were four fans in my case: a power supply fan, two case fans (one mounted on the front and a larger one mounted on the back), and a CPU fan.

It turned out that the manufacturer had saved a couple of bucks by using really cheap fans, so I ordered ultra-silent replacement fans; yet for my taste, the CPU fan was still too loud. I measured the current that ran through it and did a quick calculation to find out which resistor I needed to slow it down to 1000 rpm. Alas, I only had two resistors that could sustain the amount of current flowing through the fan: one that was too big (which prevented the fan from starting up) and one that was too small (the fan still sounded like a lawn mower). I could have ordered the perfect resistor, but that would have meant waiting a couple of days and paying 10 EUR for shipping and handling. The right “hack” was of course to connect them in parallel, which yielded a resistance very close to the one I had calculated. After a little bit of soldering, I protected the solder joints with heat-shrink tubing and — voilà! — I had a decently quiet PC!
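For reference, the textbook formula for two resistors in parallel (the update at the end of this post comes back to this):

$$\frac{1}{R_{\text{par}}} = \frac{1}{R_1} + \frac{1}{R_2} \quad\Longrightarrow\quad R_{\text{par}} = \frac{R_1 R_2}{R_1 + R_2} < \min(R_1, R_2)$$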

Too many programmers I’ve met are not able to cope with everyday situations. Maybe they know how to optimize SQL queries, but they can’t fix a dripping tap. That’s rather unfortunate as this means that such folks are forever dependent on others. On the other hand, I’ve often observed that principles from other fields can be applied to software development as well, for instance, to build better metaphors. Such metaphors play a major role in getting a deeper understanding of software development, which is very useful for explaining software issues to non-technical people. (As an example, I personally like comparing refactoring to gardening: if you don’t constantly take care of your garden by weeding, fertilizing, watering, mowing, it will require a huge investment in time and money later.)

So step out of the computer-nerd zone and become a “jack-of-all-trades”; try to be a true engineer: a person who is able to solve all kinds of technical problems with technical/scientific knowledge and creativity — for your own benefit as well as the benefit of the people around you, but also for fun and profit.

[update 2009-10-29: Alert reader Jörg (obviously a true engineer) discovered an embarrassing mistake: if you connect resistors in parallel, the resulting resistance is of course smaller than the smallest of the individual resistances, which means that part of my story just doesn’t make sense. Damn! I checked whether the resistors in my PC are really connected in parallel — which they are. I tried hard to recall what the real story was but ultimately gave up. The hack works — that’s what counts in the end, doesn’t it? ;-)
— end update]