Running Linux on a PCI Add-in Card: Hardware

Every so often I see someone attempting to run the Linux kernel on a PCI add-in card. I’ve done this myself, but there are a lot of complications. This article covers the hardware, and a second article will cover software. Don’t take this as chipset selection advice: before you commit to hardware, double-check both the errata and the availability of the silicon.


SSL Handshake Overhead for Mobile Devices

If you’re designing an application where devices communicate with a server over a mobile network, there are trade-offs between implementation effort and data transfer. This may not matter for a consumer application, where the application developer doesn’t pay the data charges. But if the application is M2M (machine-to-machine), these trade-offs matter.


In the Year 2038

I have now seen my first ever year 2038 bug. An embedded Linux system that was installed two years ago became unable to acquire a network address by DHCP. The machine did not require an accurate clock, and nobody had initialised its battery-backed real-time clock. Once installed, it had started counting forward from 1st January 1900.

Signed 32-bit Unix time covers a range from December 13th 1901 to January 19th 2038. As the real-time clock value was outside this range, Linux wrapped the time round to the year 2036. After the machine had been running for nearly two years, it passed through the 2038 rollover and jumped back to 1901.
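The wrap-around arithmetic can be checked with a small C sketch. It assumes a signed 32-bit `time_t` that wraps modulo 2^32. The helper names `wrap_to_time32` and `seconds_until_rollover` are hypothetical. The sketch shows that 1st January 1900, which is 2,208,988,800 seconds before the epoch, folds into February 2036. It also shows how long a clock started there can run before the 2038 rollover.

```c
#include <assert.h>
#include <stdint.h>

/* Seconds from 1900-01-01 to the Unix epoch, 1970-01-01
   (70 years of 365 days plus 17 leap days). */
#define SECONDS_1900_TO_1970 INT64_C(2208988800)

/* Fold a 64-bit seconds count into the range of a signed 32-bit
   time_t, [-2^31, 2^31 - 1], as if it had wrapped modulo 2^32. */
int64_t wrap_to_time32(int64_t t)
{
    const int64_t half = INT64_C(1) << 31;
    const int64_t span = INT64_C(1) << 32;
    int64_t r = (t + half) % span;   /* C's % can yield a negative value */
    if (r < 0)
        r += span;
    return r - half;
}

/* Seconds a 32-bit clock starting at `start` can count forward before
   it passes 2^31 - 1 and jumps back to December 1901. */
int64_t seconds_until_rollover(int64_t start)
{
    assert(start >= INT32_MIN && start <= INT32_MAX);
    return (int64_t)INT32_MAX - start;
}
```

`wrap_to_time32(-SECONDS_1900_TO_1970)` gives 2085978496, which is 7th February 2036. Passing that value to `seconds_until_rollover` gives 61,505,151 seconds, or about 712 days. That is consistent with the "nearly two years" of running time before the failure.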

This would have been harmless in itself if all the applications on the machine had used a monotonic clock, such as the uptime counter returned by the sysinfo function. But the machine in question used an older version of Busybox, and the udhcpc DHCP client in that release failed when faced with a negative time value, that is, a time before 1st January 1970.
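The C sketch below measures elapsed time without depending on the wall clock. It uses the POSIX `clock_gettime` call with `CLOCK_MONOTONIC`, an alternative to the sysinfo uptime counter mentioned above. The helpers `monotonic_seconds` and `interval_elapsed` are hypothetical, and so is the DHCP lease-timing use.

```c
#define _POSIX_C_SOURCE 199309L
#include <assert.h>
#include <stdbool.h>
#include <time.h>

/* Seconds since an unspecified starting point (on Linux, roughly boot),
   read from CLOCK_MONOTONIC. This clock cannot be set and never steps
   backwards, so a wrap of the wall clock does not affect it.
   Returns a negative value if the clock cannot be read. */
double monotonic_seconds(void)
{
    struct timespec ts;
    if (clock_gettime(CLOCK_MONOTONIC, &ts) != 0)
        return -1.0;
    return (double)ts.tv_sec + (double)ts.tv_nsec / 1e9;
}

/* True once `duration` seconds have passed since `start`, where `start`
   came from monotonic_seconds(). Hypothetical use: deciding when a
   DHCP lease is due for renewal. */
bool interval_elapsed(double start, double duration)
{
    assert(start >= 0.0); /* a negative start means the clock read failed */
    return monotonic_seconds() - start >= duration;
}
```

Because both timestamps come from the same monotonic clock, their difference stays correct even if the wall clock jumps from 2038 back to 1901 in between.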

The moral of the story? Even if a machine doesn’t need a real-time clock, it may not be immune to clock-related bugs.

Vorbis on DM642

Theora video on the DM642 may not be entirely successful, but Vorbis audio is a different story. I’ve been experimenting with the Tremor integer-only implementation of Vorbis decoding.

Tremor offers two modes of operation: normal mode and low precision mode. Normal mode requires 64-bit intermediate results in arithmetic operations, whereas low precision mode requires only 32-bit intermediates. I tested both modes against oggdec, the standard Linux command-line Vorbis decoder. Normal mode has an RMS error of 0.71 bits, whereas low precision mode has an RMS error of 58 bits. Here one bit of error means one least significant bit of the 16-bit output sample. (I performed the test using Lepidoptera by Epoq as the sample track, decoding to 16-bit, 44.1 kHz stereo.) The result for low precision mode is consistent with user complaints of audible distortion.
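The text doesn't say exactly how the RMS error was computed. A plausible C sketch assumes a sample-by-sample comparison of two time-aligned 16-bit outputs. The function name `mean_squared_error_lsb` is hypothetical. To avoid a libm dependency it returns the mean squared error, and the RMS figure is its square root.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Mean squared difference between a decoder's 16-bit PCM output and a
   reference decode of the same track, in units of one least significant
   bit squared. Samples are compared one-to-one, so interleaved stereo
   works as long as both buffers use the same channel order.
   RMS error = sqrt(return value). */
double mean_squared_error_lsb(const int16_t *reference,
                              const int16_t *decoded, size_t n)
{
    assert(n > 0);
    double sum = 0.0;
    for (size_t i = 0; i < n; i++) {
        double d = (double)decoded[i] - (double)reference[i];
        sum += d * d;
    }
    return sum / (double)n;
}
```

On this measure, an RMS error of 0.71 corresponds to a mean squared error of about 0.5.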

The good news for Vorbis on DM642 is that 48-bit intermediate results give output very close to normal mode, with an RMS error of 1.0 bits. The DM642’s mpylir instruction multiplies a 16-bit by a 32-bit quantity and shifts the result to fit within 32 bits. This allows a decoder with quality almost indistinguishable from normal Vorbis output, but with performance as fast as Tremor’s low precision mode.
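The following portable C sketch shows the kind of 16 × 32-bit multiply described above. The exact product needs at most 48 bits, and a rounding right shift brings it back into 32 bits. Two details are assumptions about mpylir's behaviour: the shift of 15, which suits a Q15 fixed-point coefficient, and the round-to-nearest step. Check TI's C64x instruction set reference before relying on bit-exact results. The function name is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Multiply a signed 16-bit Q15 coefficient by a signed 32-bit value.
   The exact product fits in 48 bits; adding 2^14 and shifting right
   by 15 rounds it to the nearest integer (ties upward) and yields a
   32-bit result. Assumes >> on a negative int64_t is an arithmetic
   shift, as on mainstream compilers. */
int32_t mul16x32_round_q15(int16_t coeff, int32_t value)
{
    /* -32768 * INT32_MIN is the one product whose rounded result
       (2^31) would not fit in int32_t. */
    assert(!(coeff == INT16_MIN && value == INT32_MIN));
    int64_t product = (int64_t)coeff * (int64_t)value;
    return (int32_t)((product + (INT64_C(1) << 14)) >> 15);
}
```

With 16384 representing 0.5 in Q15, `mul16x32_round_q15(16384, 1000)` returns 500. Keeping the full 48-bit product before the shift is what separates this from a plain 32-bit multiply that throws away low-order bits first.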

Embedded Theora Video

Last year I did some experiments with the Theora video decoder on a Texas Instruments DM642 DSP. A royalty-free video decoder is very attractive for embedded devices, but after some major restructuring for performance, some problems remained.

The main problem is that, unlike MPEG video, Theora video is not packed in the bitstream in the raster order in which it is displayed on screen, but in Hilbert curve order. This is not a problem in itself, but Theora’s DC prediction and post-processing loop filter are both defined in raster order. Because the decoder has to go over the data once in Hilbert curve order and once in raster order, Theora decoding needs more memory bandwidth than MPEG decoding.
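The mismatch between the two orders can be made concrete with a C sketch. It assumes the curve runs over a 4×4 grid of blocks, which is how Theora groups blocks into superblocks. The code uses the widely published iterative "d2xy" Hilbert algorithm. Whether its orientation matches the one fixed in the Theora specification should be checked against the spec. The function names are hypothetical.

```c
#include <assert.h>

/* Map position d (0 <= d < n*n) along a Hilbert curve filling an
   n x n grid to (x, y) coordinates. n must be a power of two. */
void hilbert_d2xy(int n, int d, int *x, int *y)
{
    assert(n > 0 && (n & (n - 1)) == 0);
    assert(d >= 0 && d < n * n);
    int t = d;
    *x = 0;
    *y = 0;
    for (int s = 1; s < n; s *= 2) {
        int rx = 1 & (t / 2);
        int ry = 1 & (t ^ rx);
        if (ry == 0) {
            /* Rotate/reflect the sub-square so the curve joins up. */
            if (rx == 1) {
                *x = s - 1 - *x;
                *y = s - 1 - *y;
            }
            int tmp = *x;
            *x = *y;
            *y = tmp;
        }
        *x += s * rx;
        *y += s * ry;
        t /= 4;
    }
}

/* Table from coding order to raster order for one 4x4 group of
   blocks: raster_index[d] = y * 4 + x. A decoder could use such a
   table to place blocks decoded in curve order into the raster layout
   walked by DC prediction and the loop filter. */
void hilbert_to_raster_4x4(int raster_index[16])
{
    for (int d = 0; d < 16; d++) {
        int x, y;
        hilbert_d2xy(4, d, &x, &y);
        raster_index[d] = y * 4 + x;
    }
}
```

`hilbert_to_raster_4x4` fills the table with {0, 1, 5, 4, 8, 12, 13, 9, 10, 14, 15, 11, 7, 6, 2, 3}. Consecutive entries are always adjacent blocks, but they move back and forth between raster rows.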

The encoder faces a similar problem. Andrey N. Filippov describes an FPGA implementation of the Theora encoder and comments on the high memory bandwidth it requires. The solution in the article is a custom SDRAM controller with knowledge of the Theora data structures, an option not available on a DSP.

Other, smaller problems remain. The DM642 has instructions to assist video encoding and decoding, but these are optimised for MPEG and may not apply easily to Theora. For example, the avg2 instruction averages two pairs of 16-bit values using the formula (x + y + 1) >> 1, whereas Theora’s half-pixel predictor uses (x + y) >> 1.
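One possible workaround, not taken from the original experiments, relies on the fact that the two formulas differ by exactly 1 when x + y is odd. The truncating average can therefore be recovered from the rounding one with an XOR, an AND and a subtract. The scalar C sketch below only demonstrates the arithmetic. On a SIMD machine the same correction would be applied lane by lane, and whether that beats other instruction sequences on the DM642 is untested here. All names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Rounding average, (x + y + 1) >> 1, the formula given for avg2. */
int32_t avg_round(int16_t x, int16_t y)
{
    return ((int32_t)x + y + 1) >> 1;
}

/* Truncating average, (x + y) >> 1, as used by Theora's half-pixel
   predictor. Assumes >> on a negative int is an arithmetic shift. */
int32_t avg_trunc(int16_t x, int16_t y)
{
    return ((int32_t)x + y) >> 1;
}

/* x + y is odd exactly when the low bits of x and y differ; only then
   does the rounding average exceed the truncating one, by 1. */
int32_t avg_trunc_via_round(int16_t x, int16_t y)
{
    return avg_round(x, y) - ((x ^ y) & 1);
}

/* Exhaustively check the identity over a small range that includes
   negative values. */
void check_avg_identity(void)
{
    for (int x = -256; x < 256; x++)
        for (int y = -256; y < 256; y++)
            assert(avg_trunc_via_round((int16_t)x, (int16_t)y) ==
                   avg_trunc((int16_t)x, (int16_t)y));
}
```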

Where does this leave Theora decode on DSP? The DM642 is only just capable of decoding NTSC-quality video (640×480, 30 fps), provided that the bitrate is controlled. The good news is that the newer DaVinci architecture provides extra memory bandwidth through a DDR2 memory controller. It also makes it possible to split the workload, placing bitstream decode on the ARM processor and frame reconstruction on the DSP.
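A rough back-of-envelope sketch in C shows what each full pass over the frame costs at this resolution. It assumes 8-bit 4:2:0 video and one read of the whole frame per pass. The function names are hypothetical, and the figures are estimates, not measurements.

```c
#include <assert.h>
#include <stdint.h>

/* Bytes in one 8-bit 4:2:0 frame: a full-resolution luma plane plus
   two chroma planes at half the width and half the height. */
uint64_t frame_bytes_420(uint32_t width, uint32_t height)
{
    assert(width % 2 == 0 && height % 2 == 0);
    uint64_t luma = (uint64_t)width * height;
    return luma + 2 * (luma / 4);
}

/* Bytes per second for `passes` full sweeps over every decoded frame.
   A lower bound: it ignores reference-frame reads for motion
   compensation, writes, and cache effects. */
uint64_t pass_bandwidth(uint32_t width, uint32_t height,
                        uint32_t fps, uint32_t passes)
{
    return frame_bytes_420(width, height) * fps * passes;
}
```

At 640×480 and 30 fps a frame is 460,800 bytes, so one pass costs about 13.8 MB/s and two passes about 27.6 MB/s. Motion compensation and the loop filter's neighbour reads come on top of that.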