A summary of Linux distros for the PS3

I’ve been asked about selecting a Linux distro for use on the PlayStation 3 a few times recently, so I’ve put together a page summarising some of the options, covering pdaXrom, Debian, Ubuntu, Yellow Dog, Fedora, RHEL and Gentoo.

It’s all based on my own knowledge and experience – if there’s something you think is worth adding (or correcting, or improving) let me know.

[Update 20091021: Expanded the entry for Gentoo based on suggestions from unsolo & cheriff, and added a link to Windows-Hosted Cell SDK which I saw mentioned by @domipheus]

sixaxis as joystick and mouse over bluetooth

sixaxisd is a little daemon that will translate sixaxis input, sent via bluetooth, into joystick and (optionally) mouse input.  I found it as part of the pdaXrom-ng distro, available here.

To set it up, grab the source here, unpack, apply the patch from here (which fixes a couple of axis mappings) and compile using make.

You need a kernel with uinput support (Device drivers -> Input device support -> Miscellaneous devices -> User level driver support – or CONFIG_INPUT_UINPUT) and appropriate bluetooth support (I use ps3_defconfig’s defaults in this area), although you don’t need any particular system bluetooth services running – we set that up ourselves.

To configure the bluetooth device using hciconfig (which is part of Debian’s bluez package), run the following commands –

hciconfig hci0 up        # bring up the interface
hciconfig hci0 lm master # set link mode to master
hciconfig hci0 piscan    # enable page and inquiry scan

And then start the daemon –

# optional -mouse param provides mouse emulation
./sixaxisd -mouse

Hit the PS button on the controller to bring it to life – all going well, the device nodes /dev/input/js0 and /dev/input/mouse0 will be created.

An init script that handles the hci configuration and the rest of this setup can be found in the pdaXrom svn repo.

(I use my sixaxis with my PS3 – if you want to use it with a different system, you’ll need to use sixpair, available here)
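Once /dev/input/js0 exists, any program that speaks the standard Linux joystick API can read it. The sketch below is illustrative only and not part of sixaxisd: it uses struct js_event and the JS_EVENT_* constants from <linux/joystick.h>, while describe_js_event() and dump_js_events() are hypothetical names. I haven’t checked which button and axis numbers correspond to which sixaxis controls, so treat the printed numbers as raw indices.

```c
#include <fcntl.h>
#include <linux/joystick.h>
#include <stdio.h>
#include <unistd.h>

/* Format one event, e.g. "button 3 pressed" or "axis 0 = -32767 (initial)".
 * Returns snprintf's result (the length of the full text). */
int describe_js_event(const struct js_event *e, char *out, size_t n)
{
    /* JS_EVENT_INIT marks synthetic events reporting the state at open() time. */
    unsigned type = e->type & ~JS_EVENT_INIT;
    const char *init = (e->type & JS_EVENT_INIT) ? " (initial)" : "";

    if (type == JS_EVENT_BUTTON)
        return snprintf(out, n, "button %u %s%s", (unsigned)e->number,
                        e->value ? "pressed" : "released", init);
    if (type == JS_EVENT_AXIS)
        return snprintf(out, n, "axis %u = %d%s", (unsigned)e->number,
                        (int)e->value, init);
    return snprintf(out, n, "unknown event type 0x%02x", (unsigned)e->type);
}

/* Print up to max_events events from a joystick device such as /dev/input/js0.
 * Returns the number of events read, or -1 if the device can't be opened. */
int dump_js_events(const char *path, int max_events)
{
    int fd = open(path, O_RDONLY);
    if (fd < 0) {
        perror(path);
        return -1;
    }

    struct js_event e;
    char line[64];
    int count = 0;
    /* Each read of sizeof(struct js_event) bytes returns one event. */
    while (count < max_events && read(fd, &e, sizeof e) == (ssize_t)sizeof e) {
        describe_js_event(&e, line, sizeof line);
        printf("%10u ms  %s\n", (unsigned)e.time, line);
        count++;
    }
    close(fd);
    return count;
}
```

Calling dump_js_events("/dev/input/js0", 20) from a small main() and pressing a few buttons should show the events as they arrive. The first events after open() carry the JS_EVENT_INIT flag: they report the controller’s current state rather than actual presses.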

And now I know – adventures in double precision

Refining the buddhabrot renderer, I’ve added vectorisation to iterate two points at once, which gives (at least) twice the performance. Huzzah.
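As an illustration of iterating two points at once, here is a minimal sketch that uses GCC/Clang vector extensions as a portable stand-in for the SPU’s vec_double2 type and spu_* intrinsics. It is not the renderer’s code, and mandel_iterate2(), v2df and v2mask are hypothetical names. The key idea is the per-lane escape mask: the loop runs until both lanes have escaped (or max_iter is reached), so a lane that escapes early keeps iterating and its count has to be recorded the first time it escapes.

```c
/* Two doubles per vector, standing in for the SPU's vec_double2. */
typedef double v2df __attribute__((vector_size(16)));
/* Lane masks: all-ones (-1) where a condition holds, 0 elsewhere. */
typedef long long v2mask __attribute__((vector_size(16)));

/* Iterate z = z^2 + c for two starting points at once. iters[k] receives the
 * iteration at which lane k first had |z|^2 > 4, or max_iter if it never did. */
void mandel_iterate2(v2df cr, v2df ci, int max_iter, int iters[2])
{
    const v2df two = {2.0, 2.0}, four = {4.0, 4.0};
    v2df zr = {0.0, 0.0}, zi = {0.0, 0.0};
    v2mask escaped = {0, 0};                      /* sticky per-lane escape flags */

    iters[0] = iters[1] = max_iter;
    for (int i = 0; i < max_iter; i++) {
        v2df zr2 = zr * zr, zi2 = zi * zi;
        v2mask now = (v2mask)((zr2 + zi2) > four); /* per-lane compare */
        v2mask fresh = now & ~escaped;             /* lanes escaping on this iteration */
        for (int k = 0; k < 2; k++)
            if (fresh[k])
                iters[k] = i;
        escaped |= now;
        if (escaped[0] && escaped[1])
            break;                                 /* both lanes finished */
        v2df nzr = zr2 - zi2 + cr;
        zi = two * zr * zi + ci;
        zr = nzr;
    }
}
```

For example, with cr = {0.0, 1.0} and ci = {0.0, 0.0}, the first lane (c = 0) never escapes and reports max_iter, while the second (c = 1) reports 3. Keeping the escape flags sticky means a lane that later overflows can’t un-record its result.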

To begin with, I lifted code from one of the later revisions of Jeremy’s Mandelbrot renderer. That code was written for single precision float, whereas I’ve been working in double precision for this buddhabrot code. Some things worth noting about the change from single to double precision –

  • Double precision numbers behave differently to single precision on the SPU (see section 9 of the SPU ISA doc) – I was bitten by infs and NaNs.
  • When browsing that document, I missed the large “Optional v1.2” marking on instructions like dfcgt. To be clear, the Cell BE SPU does not support this instruction.
  • GCC does include vec_ullong2 spu_cmpgt(vec_double2, vec_double2), but in the absence of dfcgt it takes forty extra instructions to achieve the same result (yeah, that’s what I get for using general intrinsics)
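One plausible way to be bitten by infs and NaNs (an illustration, not necessarily what happened in this code): on the SPU, single-precision arithmetic saturates at the largest representable magnitude instead of producing infinities, whereas double precision follows IEEE behaviour. If an escaped point keeps being iterated, as happens to one lane of a two-lane vector, its real part overflows to inf, the 2·zr·zi term then computes inf × 0 = NaN, and every comparison with NaN is false, so a naive |z|² > 4 test concludes the point has not escaped. The host-side C sketch below shows the effect; mag2_after(), naive_escaped() and robust_escaped() are hypothetical helpers.

```c
#include <math.h>

/* Iterate z = z^2 + c n times WITHOUT stopping at escape (as happens to an
 * already-escaped lane in a vectorised loop) and return |z|^2. */
double mag2_after(double cr, double ci, int n)
{
    double zr = 0.0, zi = 0.0;
    for (int i = 0; i < n; i++) {
        double nzr = zr * zr - zi * zi + cr;
        zi = 2.0 * zr * zi + ci;   /* once zr is inf and zi is 0: inf * 0 = NaN */
        zr = nzr;
    }
    return zr * zr + zi * zi;
}

/* Naive test: every comparison with NaN is false, so NaN reads as "not escaped". */
int naive_escaped(double m2)
{
    return m2 > 4.0;
}

/* Safer test: for finite c, a NaN magnitude only arises from an orbit that
 * has already overflowed, so count it as escaped. */
int robust_escaped(double m2)
{
    return isnan(m2) || m2 > 4.0;
}
```

With c = 2, the real part overflows to inf after 11 iterations and the imaginary part becomes NaN one iteration later, so mag2_after(2.0, 0.0, 20) is NaN: naive_escaped() returns 0 while robust_escaped() returns 1.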

When starting to use double precision, I was expecting much lower performance than single precision on the SPU, but I had not fully understood how much lower – from the Programming Handbook, page 71:

Although double-precision instructions have 13-clock-cycle latencies, on the Cell/B.E. processor, only the final seven cycles are pipelined. No other instructions are dual-issued with double-precision instructions, and no instructions of any kind are issued for six cycles after a double-precision instruction is issued.

Ouch.  I knew this, but I didn’t know it – a run of spu_timing on the generated assembly really rammed it home.

0  0123456789012                                      dfs  $75,$45,$44
0   ------7890123456789                               dfma $46,$59,$47
0          ------4567890123456                        dfa  $43,$45,$44
0                 ------1234567890123                 dfa  $42,$80,$75
0                        ------8901234567890          dfm  $32,$46,$46
0                               ------5678901234567   frds $40,$43
0  01234                               ------23456789 dfm  $33,$42,$42
0  012345678901                               ------9 dfm  $36,$42,$81

(Oh, and I’ve noticed again that dfma and friends use RT as an input operand as well as the destination, which presumably makes register scheduling even more fun. The above fragment is from a heavily unrolled inner loop.)
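The quoted figures can be turned into a rough peak-throughput comparison. This is back-of-envelope arithmetic only, assuming the PS3’s 3.2 GHz clock, counting a fused multiply-add as two flops, and ignoring loads, stores and branches. A double-precision instruction blocks issue for the following six cycles, so at best one starts every seven cycles (the seven-cycle stagger in the spu_timing listing), and it operates on two lanes; a single-precision multiply-add is fully pipelined across four lanes.

```latex
\begin{aligned}
\text{DP issue interval} &= 1 + 6 = 7 \text{ cycles} \\
\text{DP peak} &\approx \frac{2 \times 2 \text{ flops}}{7 \text{ cycles}} \times 3.2 \times 10^{9}\,\text{s}^{-1} \approx 1.83 \text{ GFLOPS per SPE} \\
\text{SP peak} &\approx \frac{4 \times 2 \text{ flops}}{1 \text{ cycle}} \times 3.2 \times 10^{9}\,\text{s}^{-1} = 25.6 \text{ GFLOPS per SPE} \\
\frac{\text{SP peak}}{\text{DP peak}} &= \frac{8}{4/7} = 14
\end{aligned}
```

So even before latency is considered, a double-precision inner loop can reach at most about a fourteenth of the single-precision rate, and dependent operations must additionally wait out the full 13-cycle latency.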

At some point, I’ll try to measure the practical difference between double and single precision for this program, to see what (if anything) would be lost by switching over to single precision. Or perhaps there’s some other way around the problem – I’ve been considering fixed point or even multi-single precision fp alternatives.
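The “multi-single precision” idea usually means representing each value as an unevaluated sum of two floats (often called double-single or float-float arithmetic), built from error-free transformations such as Knuth’s TwoSum. Below is a minimal sketch of addition only, assuming round-to-nearest float arithmetic with no excess precision and no -ffast-math (which would optimise the error terms away). The names ds, two_sum(), fast_two_sum() and ds_add() are hypothetical, and multiplication would also need an exact product (for example via fmaf()). My understanding is that SPU single precision only rounds toward zero, so whether these transformations stay exact on the SPE would need checking.

```c
/* A value represented as the unevaluated sum hi + lo of two floats. */
typedef struct {
    float hi, lo;
} ds;

/* Knuth's TwoSum: s + err == a + b exactly, for any magnitudes of a and b. */
static inline void two_sum(float a, float b, float *s, float *err)
{
    float sum = a + b;
    float bb = sum - a;
    *err = (a - (sum - bb)) + (b - bb);
    *s = sum;
}

/* Dekker's FastTwoSum: same result, but requires |a| >= |b|. */
static inline void fast_two_sum(float a, float b, float *s, float *err)
{
    float sum = a + b;
    *err = b - (sum - a);
    *s = sum;
}

ds ds_from_float(float x)
{
    ds r = { x, 0.0f };
    return r;
}

/* Add two double-single values (the simple variant: the low parts are
 * summed naively, then the result is renormalised). */
ds ds_add(ds x, ds y)
{
    float s, e;
    ds r;
    two_sum(x.hi, y.hi, &s, &e);
    e += x.lo + y.lo;
    fast_two_sum(s, e, &r.hi, &r.lo);
    return r;
}

double ds_to_double(ds x)
{
    return (double)x.hi + (double)x.lo;
}
```

For example, ds_add(ds_from_float(1.0f), ds_from_float(1e-8f)) keeps hi = 1.0f and lo = 1e-8f, whereas the plain float sum 1.0f + 1e-8f rounds back to 1.0f.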

Three buddhabrot

I’ve been experimenting with buddhabrot colouring tonight (actually, I think these are nebulabrot, although the colour composition isn’t as nice as I’d like).

Colouring is based on three passes with different parameters, with each hit on a pixel incrementing the colour channel (with saturation).
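A minimal sketch of this colouring scheme, assuming the ranges listed below are minimum and maximum escape iteration counts for the orbits plotted in each pass, and that each channel is an 8-bit value. The types and functions (struct pass, struct rgb8, pass_accepts(), plot_hit()) are hypothetical; the pass table uses the first set of ranges listed below.

```c
#include <stdint.h>

enum channel { CH_RED, CH_GREEN, CH_BLUE };

/* One colouring pass: orbits whose escape iteration count lies in
 * [min_iter, max_iter] are plotted into the pass's channel. */
struct pass {
    enum channel channel;
    int min_iter;
    int max_iter;
};

/* 8 bits per channel, indexed by enum channel. */
struct rgb8 {
    uint8_t c[3];
};

/* The first set of ranges listed below, read as iteration counts (assumption). */
const struct pass first_image_passes[3] = {
    { CH_BLUE,   312,  5000 },
    { CH_GREEN,  625, 10000 },
    { CH_RED,   1250, 20000 },
};

int pass_accepts(const struct pass *p, int escape_iter)
{
    return escape_iter >= p->min_iter && escape_iter <= p->max_iter;
}

/* One hit on a pixel: increment the pass's channel, saturating at 255
 * instead of wrapping back to 0. */
void plot_hit(struct rgb8 *px, enum channel ch)
{
    if (px->c[ch] < UINT8_MAX)
        px->c[ch]++;
}
```

Saturation matters here: with a plain uint8_t increment, a pixel hit 256 times would wrap back to black.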


Blue: 312-5,000  Green: 625-10,000  Red: 1250-20,000

Blue: 19-5,000  Green: 39-10,000  Red: 78-20,000

Blue: 10-5,000  Green: 5,000-10,000  Red: 10,000-15,000

CellBE Buddhabrot renderer

For my next TUCS tech talk I’ll be continuing on from the Mandelbrot rendering in the last one (which can be seen here) to something a little more complex.


The Buddhabrot is conceptually no more complex than the Mandelbrot in terms of its generation – rather than colouring points based on the number of iterations before they ‘escape’, we apply colour to each point reached while iterating starting points that escape. This has consequences for drawing the Buddhabrot – rather than each output point being generated independently of all the others, iterating a single input point may affect thousands of different output points. That makes implementation on the Cell BE trickier: parallel writes by SPEs to shared locations need some form of synchronisation. That could be messy, and a load/modify/store expressed in terms of SPU DMA can be quite clumsy.
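The difference from the Mandelbrot shows up in a scalar sketch of the contribution of a single starting point c: one pass decides whether the orbit escapes, and a second pass replays the orbit and increments every pixel it passes through, which is exactly the scattered write pattern that complicates the Cell BE version. The view (both axes spanning [-2, 2)), buddhabrot_sample() and to_pixel() are illustrative assumptions; a full renderer would call this for many (typically random) starting points.

```c
#include <stddef.h>
#include <stdint.h>

/* Map a coordinate in the assumed view [-2, 2) to a pixel index, or -1 if outside. */
static int to_pixel(double v, int size)
{
    double f = (v + 2.0) / 4.0 * size;
    if (f < 0.0 || f >= size)
        return -1;
    return (int)f;
}

/* Add the Buddhabrot contribution of starting point c = cr + ci*i to counts
 * (a w*h histogram). Returns the number of pixel increments made. */
int buddhabrot_sample(uint32_t *counts, int w, int h,
                      double cr, double ci, int max_iter)
{
    double zr = 0.0, zi = 0.0;
    int n = 0;

    /* Pass 1: does the orbit escape, and after how many iterations? */
    while (n < max_iter && zr * zr + zi * zi <= 4.0) {
        double nzr = zr * zr - zi * zi + cr;
        zi = 2.0 * zr * zi + ci;
        zr = nzr;
        n++;
    }
    if (zr * zr + zi * zi <= 4.0)
        return 0;   /* did not escape: contributes nothing */

    /* Pass 2: replay the orbit, plotting every point before the escaping one. */
    int plotted = 0;
    zr = zi = 0.0;
    for (int i = 0; i < n - 1; i++) {
        double nzr = zr * zr - zi * zi + cr;
        zi = 2.0 * zr * zi + ci;
        zr = nzr;
        int x = to_pixel(zr, w), y = to_pixel(zi, h);
        if (x >= 0 && y >= 0) {
            counts[(size_t)y * (size_t)w + (size_t)x]++;
            plotted++;
        }
    }
    return plotted;
}
```

Replaying the orbit rather than storing it keeps the sketch free of allocation, at the cost of iterating each escaping point twice.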

Rather than try to implement a complex locking/synchronisation system, I have tried to apply some ideas from a set of post-it notes by Mike Acton (you can see them here).  This isn’t identical to Mike’s solution, because it’s not the same problem.

To explain – each SPE thread iterates various points on the screen and generates a list of points to be written. This list is sent via DMA to a buffer set aside for that SPE, and the PPE works through the list, plotting the points to the framebuffer. The advantage of this approach is that there is only one writer to the framebuffer (the PPE), and that each SPE has its own buffers to write its data into. The only synchronisation necessary is between each SPE and the PPE, to ensure that all data in a buffer is consumed before more is written into it. This is achieved through:

  • interrupt mailboxes, with which the SPE tells the PPE that there is data;
  • a fenced DMA acting as a sentinel: the PPE spins on the arrival of the sentinel data to ensure that DMA of a buffer has completed (this doesn’t feel like the right way to solve this particular problem, though);
  • the SPE signal notification register in OR mode, to inform the SPE that a particular buffer has been finished with.

Interrupt mailbox events are aggregated through libspe2’s spe_event_*() functions.
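The single-writer scheme can be sketched on an ordinary multicore host, with POSIX threads and C11 atomics standing in for the Cell BE mechanisms. This is an analogy, not libspe2 code: the ready bitmask plays the part of the interrupt mailbox, freed plays the signal notification register in OR mode (the consumer ORs in a bit, the producer reads and clears the accumulated bits), one producer thread stands in for one SPE, and the calling thread stands in for the PPE. Shared memory replaces DMA, so no fenced sentinel is needed. All names are hypothetical.

```c
#include <pthread.h>
#include <stdatomic.h>
#include <stdint.h>

#define NBUFS 2          /* double buffering per "SPE" */
#define BUF_POINTS 64    /* points per buffer (arbitrary for this sketch) */

struct point_buf {
    uint32_t count;
    uint32_t pts[BUF_POINTS];
};

struct channel {
    struct point_buf bufs[NBUFS];
    atomic_uint ready;   /* mailbox stand-in: bit b set => buffer b holds data */
    atomic_uint freed;   /* OR-mode signal register stand-in: bit b => buffer b consumed */
    atomic_int done;     /* producer has published its final buffer */
    uint32_t total;      /* how many points the producer emits */
};

/* Producer (SPE stand-in): fill a free buffer, then publish it. */
static void *producer(void *arg)
{
    struct channel *ch = arg;
    unsigned local_free = (1u << NBUFS) - 1;   /* every buffer starts free */
    uint32_t next = 0;

    while (next < ch->total) {
        /* Like reading an OR-mode signal register: take the bits and clear them. */
        while (local_free == 0)
            local_free |= atomic_exchange(&ch->freed, 0u);

        unsigned b = 0;
        while (!(local_free & (1u << b)))
            b++;

        struct point_buf *buf = &ch->bufs[b];
        buf->count = 0;
        while (buf->count < BUF_POINTS && next < ch->total)
            buf->pts[buf->count++] = next++;   /* a "point" is just a sequence number */

        local_free &= ~(1u << b);
        atomic_fetch_or(&ch->ready, 1u << b);  /* "mailbox": buffer b is ready */
    }
    atomic_store(&ch->done, 1);
    return NULL;
}

/* Consumer (PPE stand-in) and sole writer of the results: here it just sums
 * the points. Returns 0 on success. */
int run_pipeline(uint32_t total, uint64_t *sum, uint32_t *count)
{
    struct channel ch;
    pthread_t tid;

    ch.total = total;
    atomic_init(&ch.ready, 0u);
    atomic_init(&ch.freed, 0u);
    atomic_init(&ch.done, 0);
    *sum = 0;
    *count = 0;
    if (pthread_create(&tid, NULL, producer, &ch) != 0)
        return -1;

    for (;;) {
        int finished = atomic_load(&ch.done);       /* read before draining */
        unsigned r = atomic_exchange(&ch.ready, 0u);
        for (unsigned b = 0; b < NBUFS; b++) {
            if (!(r & (1u << b)))
                continue;
            for (uint32_t i = 0; i < ch.bufs[b].count; i++)
                *sum += ch.bufs[b].pts[i];
            *count += ch.bufs[b].count;
            atomic_fetch_or(&ch.freed, 1u << b);    /* "signal": buffer b is free */
        }
        if (finished && r == 0)
            break;
    }
    return pthread_join(tid, NULL);
}
```

Calling run_pipeline(10000, &sum, &count) should give count == 10000 and sum == 49995000, since each point is just its sequence number. Only the consumer touches the totals, mirroring the PPE being the framebuffer’s only writer, and the producer never refills a buffer until the consumer has handed it back.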

It’s not an especially complex piece of code – I wrote it for my own interest and to use in the tech talk. I think it will do nicely for explaining some of the complexities and curiosities of the Cell BE architecture, and of programming it with the IBM SDK.

There are a few extra features that I’d like to add – particularly better colouring (including saturation, whose absence is unfortunately apparent), and a number of optimisations to the render_fractal() function that I need to lift from my earlier Mandelbrot efforts.

The program includes code by Jeremy Kerr (See hackfest items at http://ozlabs.org/~jk/diary/tech/cell/) and Mike Acton (framebuffer utilities, from http://cellperformance.beyond3d.com/articles/2007/03/handy-ps3-linux-framebuffer-utilities.html).  My thanks to Jeremy and Mike, and to all those that have offered comments & feedback via twitter.

[edit: Oh, and it includes cheriff’s fine VNC code ;)]

Read the code: fractal.c and spe-fractal.c, or grab a tarball.  Comments & suggestions most welcome.

Addition: I added pixel value saturation and experimented with some alternative approaches to colouring…


Some recent SPU toolchain patches

A couple of patches that I’ve noticed on various mailing lists –

libspe – Some small changes, but includes spe_image_open_library() to load an spe image from a ppe shared library.

binutils – “Also, if DLL’s were supported on SPU…”  Interesting idea – will have to wait to see what comes of it.

gcc – Support non-constants as the second argument of __builtin_expect.  Interesting idea.

(Also, Revital Eres’s function partitioning patch has had some activity, and there’s the odd patch from Alan Modra on overlays and software icache.)

20090627 photograph

(Photo: dsc_6192-3cc, licensed CC BY-NC-SA)

A little water and compressed air can be a lot of fun.  Especially when photographing from a distance :)

I managed to get a lot of now-you-see-it-now-you-don’t photo sets over the course of the bottle-rocket session, so I was very happy to get this one, where the bottle was still visible after launch. A couple of launches later, one of the bottles exploded, breaking the compressor nozzle and putting an end to the fun.

Taken 20090201 at dcypher with Andrew’s D70s.