« Posts under Uncategorized

Bug Hunting Adventures #2: Monitoring Temperature Sensors

Imagine a distributed control system that relies on correct and timely temperature measurements. Various temperature controllers distribute their temperature readings periodically over a bus for other controllers to consume.

To ensure that the temperature sensors work as expected, a ‘TemperatureMonitor’ was implemented. It listens for messages from the temperature controllers and checks whether they arrive in time (at the very latest every 20 ms) and whether their values are within the valid range.

This checking of temperature messages upon arrival obviously doesn’t catch cases where temperature controllers don’t send messages at all (or with a delay that would cause the timer to overflow multiple times) so there is an additional cyclic task that caters for situations like these. This cyclic task is executed every 100 ms and it additionally takes care of the so-called ‘idle period’.

During the ‘idle period’, which is 500 ms, ‘TemperatureMonitor’ is lenient and doesn’t report problems (if any) to allow for an undisturbed start-up of the whole system.

Once the idle period is over and ‘TemperatureMonitor’ detects abnormal conditions it notifies the global ‘ErrorManager’, which will decide how to handle problems based on its current error handling policy (eg. just log the error, reset or disable a temperature controller).

Since the cyclic task and the message handler (the one that is invoked upon the reception of a temperature sensor message) may run in parallel, access to shared data is protected by a simple (but sufficient) synchronization scheme based on enabling/disabling interrupts.

Time measurement is implemented based on a wrap-around 32-bit tick counter that counts raw clock cycles. These clock cycles are converted to a more convenient unit (ie. milliseconds).