Sometimes, it is necessary to find out how an application uses dynamic memory. Especially on constrained (think embedded) systems you should know, for instance, what block sizes are typically requested, how much memory is allocated at any one time, and how much memory has been requested at most so far (the allocation high-water mark). Data like this allows you to fine-tune your memory allocator and/or roll your own highly efficient, special-purpose allocator for popular block sizes.
I recently implemented such a statistics-enhanced version of operator new/delete, and one of its tasks was to find out how frequently blocks of certain sizes were requested. Instead of keeping track of exact request sizes (which would have been prohibitively expensive on the target system), I came up with a simpler but sufficient scheme: I counted allocations per block size, with each size rounded up to the next power of two:
```cpp
void* operator new(std::size_t n) {
    ...
    ++g_allocations[log2ceil(n)];
    ...
}
```
log2ceil is the ‘ceiled’ (rounded-up) binary logarithm of a given value. As an example, log2ceil(100) and log2ceil(128) both yield the same value of 7. Hence, the value g_allocations[7] would tell me how many allocations of block sizes in the range of 65 – 128 bytes there have been.
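To make the bucket mapping concrete, here is a minimal sketch. The names `log2ceil_loop`, `record_allocation`, and `g_bucket_counts` are hypothetical, and the loop-based logarithm is a deliberately simple stand-in, not the log2ceil discussed in this post. It also assumes that requests of 0 or 1 bytes both land in bucket 0.

```cpp
#include <array>
#include <cassert>
#include <climits>
#include <cstddef>

// Hypothetical portable helper (not the article's log2ceil): returns the
// smallest k with 2^k >= n. Assumption: n <= 1 maps to bucket 0.
inline unsigned log2ceil_loop(std::size_t n) {
    if (n <= 1) return 0;
    unsigned k = 0;
    // The bit length of n - 1 equals the rounded-up log2 of n for n >= 2.
    for (std::size_t v = n - 1; v != 0; v >>= 1) ++k;
    return k;
}

// One counter per possible bucket: 0 .. number of bits in std::size_t.
constexpr std::size_t kBucketCount = sizeof(std::size_t) * CHAR_BIT + 1;
static std::array<unsigned long, kBucketCount> g_bucket_counts{};

// Records one request of n bytes in its power-of-two bucket.
inline void record_allocation(std::size_t n) {
    const unsigned bucket = log2ceil_loop(n);
    assert(bucket < g_bucket_counts.size());
    ++g_bucket_counts[bucket];
}
```

With this sketch, requests of 65, 100, and 128 bytes all increment `g_bucket_counts[7]`, while a request of 64 bytes falls into bucket 6.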
Below is the code of a log2ceil function. It works by counting the number of leading (left-most) zero bits in a 32-bit word and subtracting this value from 31, which gives the position of the highest set bit, that is, the rounded-down binary logarithm. To get the desired rounding-up behavior, a well-known trick is applied: first decrement the argument by one, then increment the result by one.
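The following is a minimal sketch of the technique just described, not the original listing, so it should not be read as the exact implementation examined next. The names `log2ceil32` and `count_leading_zeros32` are hypothetical. The leading-zero count is written as a plain loop, where real code would more likely use a compiler intrinsic or a dedicated CPU instruction. Mapping arguments of 0 and 1 to a result of 0 is an assumption as well.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: number of leading zero bits in a 32-bit word.
// Precondition: x != 0 (the loop guard still stops at 32 if asserts are off).
inline unsigned count_leading_zeros32(std::uint32_t x) {
    assert(x != 0);
    unsigned zeros = 0;
    for (std::uint32_t mask = 0x80000000u; mask != 0 && (x & mask) == 0; mask >>= 1) {
        ++zeros;
    }
    return zeros;
}

// Rounded-up binary logarithm of a 32-bit value, following the recipe above:
// decrement the argument, take 31 - clz (the rounded-down log2), then add one.
inline unsigned log2ceil32(std::uint32_t n) {
    if (n <= 1) return 0;  // assumption: 0 and 1 both yield 0
    return (31u - count_leading_zeros32(n - 1u)) + 1u;
}
```

For example, `log2ceil32(100)` decrements to 99, whose highest set bit is bit 6, and then adds one to arrive at 7; `log2ceil32(128)` decrements to 127 and arrives at 7 the same way.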
This implementation is straightforward and all the tests pass. Still, it has a subtle portability issue. Can you see it?