Last year I did some experiments with the Theora video decoder on a Texas Instruments DM642 DSP. A royalty-free video decoder is very attractive for embedded devices, but even after major restructuring for performance, several problems remained.
The main problem is that, unlike MPEG video, Theora does not code blocks in the bitstream in the raster order in which they are displayed on screen, but in Hilbert curve order (within each 32×32-pixel superblock). This is not a problem in itself, but Theora's DC prediction and loop filter are both defined in raster order. Because the frame data has to be traversed once in Hilbert curve order and again in raster order, and a whole frame is generally too large to stay in on-chip memory between the two passes, Theora decoding needs more memory bandwidth than MPEG decoding.
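To make the ordering concrete, here is a small C sketch that computes, for each 8×8 block in a 4×4-block superblock, its position in coded order, using the standard Hilbert-curve index-to-coordinate conversion. The function names are hypothetical, and the curve orientation chosen here is an assumption; check it against the block-order figure in the Theora specification before relying on it. Superblocks themselves are visited in raster order across the frame, which this sketch does not model.

```c
/* Rotate/flip a quadrant so the sub-curve has the correct orientation
   (part of the standard Hilbert index-to-coordinate conversion). */
static void hilbert_rot(int n, int *x, int *y, int rx, int ry)
{
    if (ry == 0) {
        if (rx == 1) {
            *x = n - 1 - *x;
            *y = n - 1 - *y;
        }
        int t = *x;   /* swap x and y */
        *x = *y;
        *y = t;
    }
}

/* Map distance d along a Hilbert curve covering an n x n grid
   (n a power of two) to column x and row y. */
static void hilbert_d2xy(int n, int d, int *x, int *y)
{
    int t = d;
    *x = *y = 0;
    for (int s = 1; s < n; s *= 2) {
        int rx = 1 & (t / 2);
        int ry = 1 & (t ^ rx);
        hilbert_rot(s, x, y, rx, ry);
        *x += s * rx;
        *y += s * ry;
        t /= 4;
    }
}

/* Fill coded_index[row][col] with the coded-order position of each
   block in a 4x4-block superblock (hypothetical helper). */
static void build_superblock_map(int coded_index[4][4])
{
    for (int d = 0; d < 16; d++) {
        int x, y;
        hilbert_d2xy(4, d, &x, &y);
        coded_index[y][x] = d;
    }
}
```

With this orientation the map comes out as rows `0 1 14 15`, `3 2 13 12`, `4 7 8 11`, `5 6 9 10`. A decoder that parses blocks in this order has to scatter them into their raster positions, and a raster-order pass such as DC prediction then reads them back in a completely different sequence, which is where the second trip through memory comes from.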
The encoder faces a similar problem. Andrey N. Filippov describes an FPGA implementation of a Theora encoder and comments on the high memory bandwidth it requires. The article's solution is a custom SDRAM controller with knowledge of the Theora data structures, an option not available on a DSP.
There are other, minor problems. The DM642 has instructions to assist video encoding and decoding, but these are optimised for MPEG and do not always map directly onto Theora. For example, the avg2 instruction averages two pairs of 16-bit values using the formula (x + y + 1) >> 1, whereas Theora's half-pixel predictor uses (x + y) >> 1, so the two results differ by one whenever x + y is odd.
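One way to reuse a rounding average for Theora is to subtract the rounding bit afterwards: the two formulas differ exactly when x + y is odd, which is when the low bits of x and y differ. The portable C sketch below models a single unsigned lane only (pixel values 0 to 255); the function names are hypothetical, and it does not claim to reproduce the exact packed behaviour of avg2 or to be faster than a plain add and shift on the DM642, which was not measured here.

```c
/* Rounding average, as computed per lane by an MPEG-style instruction
   such as avg2 (modelled here on one unsigned lane only). */
static unsigned avg_round_up(unsigned x, unsigned y)
{
    return (x + y + 1) >> 1;
}

/* Truncating average, as used by Theora's half-pixel predictor. */
static unsigned avg_truncate(unsigned x, unsigned y)
{
    return (x + y) >> 1;
}

/* Recover the truncating average from the rounding one. When x + y is
   odd, (x ^ y) & 1 is 1 and the rounded result is one too high; when
   x + y is even, both formulas already agree. No underflow is possible,
   because an odd sum makes the rounded result at least 1. */
static unsigned avg_truncate_via_round(unsigned x, unsigned y)
{
    return avg_round_up(x, y) - ((x ^ y) & 1u);
}
```

The correction costs an extra XOR, AND and subtract per lane, which eats into whatever the dedicated instruction saves.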
Where does this leave Theora decoding on a DSP? The DM642 is only just capable of decoding NTSC-quality video (640×480, 30 fps), provided that the bitrate is kept under control. The good news is that the newer DaVinci architecture provides extra memory bandwidth through a DDR2 memory controller, plus the possibility of splitting the workload: bitstream decoding on the ARM processor and frame reconstruction on the DSP.
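For a sense of scale, the arithmetic below estimates how much external-memory traffic one extra full-frame pass costs at 640×480 and 30 fps. It assumes 8-bit 4:2:0 frames and a pass that reads and writes every byte of the frame exactly once with no cache reuse; the function names are hypothetical and the figure is a rough estimate, not a measurement.

```c
#include <stdint.h>

/* Bytes in one 8-bit 4:2:0 frame (assumed format): a full-resolution
   luma plane plus two chroma planes subsampled by 2 in each direction. */
static uint64_t frame_bytes_420(uint32_t width, uint32_t height)
{
    uint64_t luma = (uint64_t)width * height;
    uint64_t chroma = 2u * (uint64_t)(width / 2) * (height / 2);
    return luma + chroma;
}

/* Bytes per second moved by one pass that reads the whole frame and
   writes it back once (assumption: no benefit from on-chip caching). */
static uint64_t pass_traffic_per_second(uint32_t width, uint32_t height,
                                        uint32_t fps)
{
    return 2u * frame_bytes_420(width, height) * fps;
}
```

At 640×480 a frame is 460,800 bytes, so each extra read-and-write pass adds about 27.6 MB/s of traffic at 30 fps. This ignores motion-compensation reference reads, bitstream traffic and cache effects, so it only indicates the order of magnitude.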