There’s a hole in my dataset, dear MFC, dear MFC…
There’s a block of local store that holds a tile of pixels that needs to be DMAd out to main memory. With 128 lines per tile, issuing a separate small transfer for every line gets complicated, so we’ll try DMA lists. Why not? The problem here is that the pixel data for each line does not run straight into the next: between them sits the extra-tile information that was needed for the calculation. When using DMA lists to transfer data to main memory, the source for each DMA transfer starts where the previous one ended. This means that between each DMA there’s (148-128)×4=80 bytes of information that we don’t want to see on the screen.
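To make the "source starts where the previous one ended" point concrete, here is a small plain-C model. It is only a sketch: `list_element`, `gap_bytes` and `source_offset` are names made up for illustration, not the real MFC types or calls, and it assumes 4-byte pixels and a list that begins at the first visible pixel of the first line.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum {
    TILE_PIXELS     = 128, /* visible pixels per tile line */
    STORED_PIXELS   = 148, /* pixels per line as held in local store */
    BYTES_PER_PIXEL = 4
};

/* Hypothetical stand-in for one DMA list element: a size and a destination.
   Note there is no source address field. */
typedef struct {
    uint32_t size; /* bytes to transfer */
    uint64_t ea;   /* destination address in main memory */
} list_element;

/* Bytes of unwanted data between the visible parts of consecutive lines. */
uint32_t gap_bytes(void)
{
    return (STORED_PIXELS - TILE_PIXELS) * BYTES_PER_PIXEL;
}

/* Local store offset that element `index` reads from: the sum of the sizes
   of all earlier elements, because the source simply keeps advancing. */
uint32_t source_offset(const list_element *list, size_t index)
{
    assert(list != NULL);
    uint32_t offset = 0;
    for (size_t i = 0; i < index; ++i)
        offset += list[i].size;
    return offset;
}
```

With a list of 512, 80, 512 bytes, the third element reads from offset 592, which is the start of the second line's visible pixels. The 80 bytes have to go somewhere, because there is no way to skip them on the source side.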
There are a lot of things that are “best-practice” for SPU DMA, particularly relating to size and alignment of transfers, and I’m typically a religious adherent. In this case, I did not comply with best practice, still met my time budget, and only feel a small amount of shame :P
Overall, less time is spent performing DMA than is spent calculating pixels, so further optimising DMA for this particular program is unnecessary.
When transferring tiles from the SPU to the framebuffer, there are three cases of particular interest:
1. Tiles on the right edge of the screen
2. Tiles on the lower edge of the screen
3. The Other Tiles
3. The Other Tiles
These are the easy ones. They will never need to be trimmed to fit the screen edges, and if drawn in the right order they have the wonderful characteristic that the extra data can be DMAd to the framebuffer and will be overwritten by pixel data from a later tile. There’s no special case needed: just draw each pixel line, and all the data between pixel lines, to the screen.
(For the diagrams, the amount of overdraw is far greater than the actual tile part — this is a consequence of the small size. It’s a much smaller percentage for 128 pixel tiles. I’ll post some actual screengrabs here sometime…)
1. Tiles on the right edge of the screen
Overdraw isn’t the answer in this case. It is not possible to overdraw on either the left or right of the rightmost tile in a way that will be correct when the screen is finished. Instead, the extra information (including any portion that may not fit onto the visible screen) must be dealt with in some other way.
My solution, ugly as it is, is to write each surplus portion of a tile to a scratch location in memory — every one of them to the same location. It works :|
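Roughly, the scratch trick looks like this in list form. This is a plain-C sketch rather than the actual code: `list_element` and `build_right_edge_list` are hypothetical names, pixels are assumed to be 4 bytes, and the list is assumed to start at the first visible pixel of the tile's first line.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for one DMA list element. */
typedef struct {
    uint32_t size; /* bytes to transfer */
    uint64_t ea;   /* destination address in main memory */
} list_element;

/* Two elements per line: the visible pixels go to the framebuffer, and the
   rest of the stored line goes to one shared scratch address.
   `out` must have room for 2 * lines elements. Returns the element count. */
size_t build_right_edge_list(list_element *out,
                             uint64_t fb_ea,      /* first visible pixel of the tile */
                             uint32_t fb_pitch,   /* bytes per framebuffer line */
                             uint64_t scratch_ea, /* dumping ground for surplus data */
                             uint32_t visible_px, /* pixels per line that fit on screen */
                             uint32_t stored_px,  /* pixels per line in local store */
                             uint32_t lines)
{
    assert(out != NULL);
    assert(visible_px > 0 && visible_px < stored_px);

    size_t n = 0;
    for (uint32_t y = 0; y < lines; ++y) {
        /* Pixels we want to see. */
        out[n].size = visible_px * 4;
        out[n].ea   = fb_ea + (uint64_t)y * fb_pitch;
        ++n;
        /* Everything up to the next line's first visible pixel. */
        out[n].size = (stored_px - visible_px) * 4;
        out[n].ea   = scratch_ea;
        ++n;
    }
    return n;
}
```

Every surplus element points at the same `scratch_ea`, so the scratch area only needs to be as large as one surplus chunk. Its contents are garbage afterwards, which is fine because nothing reads it.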
2. Tiles on the lower edge of the screen
These tiles are really just like the others, except they’ll stop a few lines short. They’re still fully calculated, but only the visible lines are transferred.
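Stopping a few lines short comes down to clamping the line count before building the list. A minimal sketch, with `lines_to_transfer` being a made-up helper name:

```c
#include <assert.h>
#include <stdint.h>

/* Number of lines to transfer for a tile whose top line sits at screen
   line tile_y: the full tile, or fewer if the screen ends first. */
uint32_t lines_to_transfer(uint32_t tile_y,
                           uint32_t tile_lines,
                           uint32_t screen_lines)
{
    assert(tile_y < screen_lines);
    uint32_t remaining = screen_lines - tile_y;
    return remaining < tile_lines ? remaining : tile_lines;
}
```

For example, on a hypothetical 720-line screen, a 128-line tile starting at line 640 would transfer only 80 lines.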
(In hindsight, increasing the spacing between lines would help reduce the alignment and size problem here. Adding an extra 48 bytes to each line would allow every transfer to be optimally aligned and sized. And would probably make no measurable difference to the total runtime. Heck, there’s probably enough time to repack the data in local store before performing the DMA. There’s not that much…)
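The 48-byte figure can be checked with a little arithmetic, assuming the padding is applied to each stored line of 148 four-byte pixels and the target is a multiple of 128 bytes. `padding_to_multiple` is a hypothetical helper, not part of any SPU API.

```c
#include <assert.h>
#include <stdint.h>

/* Bytes to add to `bytes` to reach the next multiple of `alignment`
   (zero if it is already a multiple). */
uint32_t padding_to_multiple(uint32_t bytes, uint32_t alignment)
{
    assert(alignment != 0);
    return (alignment - bytes % alignment) % alignment;
}
```

A stored line is 148×4=592 bytes, and `padding_to_multiple(592, 128)` gives 48, for a 640-byte line stride. The unwanted chunk between visible lines then grows from 80 to 128 bytes, so the list alternates between 512-byte and 128-byte transfers.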
Awesome! :)