<?xml version="1.0" encoding="UTF-8"?><!-- generator="wordpress/2.3.3" -->
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	>
<channel>
	<title>Comments on: Embedded Theora Video</title>
	<link>http://www.humboldt.co.uk/2006/02/embedded-theora-video.html</link>
	<description>Software Development and Consulting</description>
	<pubDate>Thu, 24 Jul 2008 04:28:57 +0000</pubDate>
	<generator>http://wordpress.org/?v=2.3.3</generator>
		<item>
		<title>By: Timothy B. Terriberry</title>
		<link>http://www.humboldt.co.uk/2006/02/embedded-theora-video.html#comment-5</link>
		<dc:creator>Timothy B. Terriberry</dc:creator>
		<pubDate>Fri, 01 Dec 2006 18:09:00 +0000</pubDate>
		<guid>http://www.humboldt.co.uk/2006/02/embedded-theora-video.html#comment-5</guid>
		<description>The real problem, as I saw it, was not the raster/Hilbert curve ordering (though this certainly introduces many complexities in the encoder; the decoder is not as bad), but the fact that coefficients are ordered by their frequency first, and the block they belong to second, meaning that if the 63rd coefficient in a block is non-zero, you need to decode virtually all the coefficients in the frame before you can decode that one block.&lt;br/&gt;&lt;br/&gt;This structure actually helps avoid the raster order problem for DC prediction, because you can undo the DC prediction after decoding just the DC coefficients, i.e., on 1/64th of the final image data, which often fits entirely in cache (at 640x480 it's less than 16k of data).&lt;br/&gt;&lt;br/&gt;The loop filter problem can similarly be handled by applying it (with a one-block-row delay) after each superblock row is decoded. This requires slightly more cache (just under 64k of image data at 640x480, plus additional overhead for the coefficients that were decoded), but should still fit well within the 256K L2 cache on your DSP.&lt;br/&gt;&lt;br/&gt;The &lt;a HREF="http://svn.xiph.org/trunk/theora-exp/" REL="nofollow"&gt;theora-exp&lt;/a&gt; decoder does both these things, to good effect. The spec describes everything as separate processes for conceptual simplicity, but to get good performance they really need to be pipelined.&lt;br/&gt;&lt;br/&gt;For the loop filter at larger resolutions, you could (if you're careful about it) process 3/4 of each superblock (with a one-block-column delay) immediately after decoding it, without waiting for the entire row to finish. Then you only need to cache one row of block data between superblock rows.</description>
		<content:encoded><![CDATA[<p>The real problem, as I saw it, was not the raster/Hilbert curve ordering (though this certainly introduces many complexities in the encoder; the decoder is not as bad), but the fact that coefficients are ordered by their frequency first, and the block they belong to second, meaning that if the 63rd coefficient in a block is non-zero, you need to decode virtually all the coefficients in the frame before you can decode that one block.</p>
<p>This structure actually helps avoid the raster order problem for DC prediction, because you can undo the DC prediction after decoding just the DC coefficients, i.e., on 1/64th of the final image data, which often fits entirely in cache (at 640&#215;480 it&#8217;s less than 16k of data).</p>
<p>The loop filter problem can similarly be handled by applying it (with a one-block-row delay) after each superblock row is decoded. This requires slightly more cache (just under 64k of image data at 640&#215;480, plus additional overhead for the coefficients that were decoded), but should still fit well within the 256K L2 cache on your DSP.</p>
<p>The <a HREF="http://svn.xiph.org/trunk/theora-exp/" REL="nofollow">theora-exp</a> decoder does both these things, to good effect. The spec describes everything as separate processes for conceptual simplicity, but to get good performance they really need to be pipelined.</p>
<p>For the loop filter at larger resolutions, you could (if you&#8217;re careful about it) process 3/4 of each superblock (with a one-block-column delay) immediately after decoding it, without waiting for the entire row to finish. Then you only need to cache one row of block data between superblock rows.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Adrian Cox</title>
		<link>http://www.humboldt.co.uk/2006/02/embedded-theora-video.html#comment-3</link>
		<dc:creator>Adrian Cox</dc:creator>
		<pubDate>Wed, 15 Feb 2006 21:29:00 +0000</pubDate>
		<guid>http://www.humboldt.co.uk/2006/02/embedded-theora-video.html#comment-3</guid>
		<description>Interesting question, which I'd like to look into later. Dirac has advanced to a stage where someone could do the experiment, but optimisation for DSP often involves different tactics to optimisation on a desktop platform.</description>
		<content:encoded><![CDATA[<p>Interesting question, which I&#8217;d like to look into later. Dirac has advanced to a stage where someone could do the experiment, but optimisation for DSP often involves different tactics to optimisation on a desktop platform.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Stefan de Konink</title>
		<link>http://www.humboldt.co.uk/2006/02/embedded-theora-video.html#comment-2</link>
		<dc:creator>Stefan de Konink</dc:creator>
		<pubDate>Sun, 12 Feb 2006 01:59:00 +0000</pubDate>
		<guid>http://www.humboldt.co.uk/2006/02/embedded-theora-video.html#comment-2</guid>
		<description>Makes me wonder how much effort it would take to put an embedded version of &lt;a HREF="http://dirac.sf.net/" REL="nofollow"&gt;Dirac&lt;/a&gt; inside. After all, Theora is Open and Free, but it is still VP3, so from a codec perspective it is an old codec based on older technology.&lt;br/&gt;&lt;br/&gt;The &lt;a HREF="http://schrodinger.sourceforge.net/" REL="nofollow"&gt;new Dirac project&lt;/a&gt; is working on performance on x86, which makes you wonder how well it could work on a DSP.</description>
		<content:encoded><![CDATA[<p>Makes me wonder how much effort it would take to put an embedded version of <a HREF="http://dirac.sf.net/" REL="nofollow">Dirac</a> inside. After all, Theora is Open and Free, but it is still VP3, so from a codec perspective it is an old codec based on older technology.</p>
<p>The <a HREF="http://schrodinger.sourceforge.net/" REL="nofollow">new Dirac project</a> is working on performance on x86, which makes you wonder how well it could work on a DSP.</p>
]]></content:encoded>
	</item>
</channel>
</rss>
