“premature optimization”
by jonathan on Nov.25, 2009, under general, history, programming
Here is an except from Donald Knuth’s 1974 paper “Structured Programming with go to Statements”. There is a well known quote at the end of the second paragraph :
The improvement in speed from Example 2 to Example 2a is only about 12%, and many people would pronounce that insignificant. The conventional wisdom shared by many of today’s software engineers calls for ignoring efficiency in the small; but I believe this is simply an overreaction to the abuses they see being practiced by penny-wise-and-pound-foolish programmers, who can’t debug or maintain their “optimized” programs. In established engineering disciplines a 12% improvement, easily obtained, is never considered marginal; and I believe the same viewpoint should prevail in software engineering. Of course I wouldn’t bother making such optimizations on a one-shot job, but when it’s a question of preparing quality programs, I don’t want to restrict myself to tools that deny me such efficiencies.
There is no doubt that the grail of efficiency leads to abuse. Programmers waste enormous amounts of time thinking about, or worrying about, the speed of noncritical parts of their programs, and these attempts at efficiency actually have a strong negative impact when debugging and maintenance are considered. We should forget about small efficiencies, say about 97% of the time: premature optimization is the root of all evil.
Yet we should no pass up our opportunities in that critical 3%. A good programmer will not be lulled into complacency by such reasoning, he will be wise to look carefully at the critical code; but only after that code has been identified. It is often a mistake to make a priori judgments about what parts of a program are really critical, since the universal experience of programmers who have been using measurement tools has been that their intuitive guesses fail. After working with such tools for seven years, I’ve become convinced that all compilers written from now on should be designed to provide all programmers with feedback indicating what parts of their programs are costing the most; indeed, this feedback should be supplied automatically unless it has been specifically turned off.
The quote to which I am referring – that “premature optimization is the root of all evil” – is, from time to time, used fait accompli to assert that any time spent optimising is time wasted – something that is clearly not the case when considering the quote in context.
There are a lot of other good ideas in Knuth’s article, many still quite relevant to software engineering. The full article (which can be obtained via citeseerx) is, primarily, a contribution to a discussion of the time on the use of go to statements – another topic often reduced to a simple, extreme position, attributed to Edsger Dijkstra. Knuth’s paper makes it clear that Dijkstra did not hold so tightly to that position himself.
This is the sort of paper that I find to be a fascinating read, with both historical and technical value.
The final page of the article contains the following quote :
A disturbingly large percentage of the students ran into situations that require go to’s, and sure enough, it was often because while didn’t work well to their plan, but almost invariably because their plan was poorly thought out.
Which brought to mind another quote that I had seen recently, this one from George Orwell :
A man may take to drink because he feels himself to be a failure, and then fail all the more completely because he drinks. It is rather the same thing that is happening to the English language. It becomes ugly and inaccurate because our thoughts are foolish, but the slovenliness of our language makes it easier for us to have foolish thoughts.
It seems quite valid to say that there is a connection between our language and our expression (in terms of ideas, spoken word, programs, and more).
When trying to find a source for Orwell’s words (which you can find here), I was heartened to find these sentences follow those quoted above :
The point is that the process is reversible. Modern English, especially written English, is full of bad habits which spread by imitation and which can be avoided if one is willing to take the necessary trouble.
Which I believe is also applicable to the expression and optimisation of computer programs.
Some Linux on PS3 updates
by jonathan on Nov.22, 2009, under general, ps3
I’ve updated the Linux distros page with links to articles about installing Ubuntu 9.10 [Joep Cremers Weblog] and Fedora 12 [Geoff Levand, Ken Werner] on the Playstation 3.
In mosty unrelated news,
- IBM will not be releasing a previously planned successor to the PowerXCell 8i [heise online, Playstation University]
(neither article conveys particularly clear information to me – not least because I have only a machine translation of the first one) - Playstation 3 firmware version 3.10 seems to have added ABC iView and BBC iPlayer support to the XMB (the presence of which *may* be related to your PSN region, geographical restrictions to accessing video content still seem to apply). If you can’t see it in the TV column of your XMB, try rebooting…
PS3, Fedora 11 and OpenCL
by jonathan on Nov.03, 2009, under general, ps3
Ken Werner has written a nice guide to installing Fedora 11 on the PS3. (And I’ve added the link to the Linux distro page).
Inside Ken’s page is some details on installing and running demos from the recent “OpenCL Development Kit for Linux on Power”, which is something I’m now even more keen to try :)
(It seems a little odd that the only place that OpenCL SPE acceleration is mentioned is in one of the FAQs…)
20091025 photographs
by jonathan on Oct.25, 2009, under general, imagery
Over my back fence is a little patch of greenery that I’m quite certain doesn’t look as consistently green as this any more.
The were some heavy winds across the state a couple of weeks back – we experienced a small amount of superficial damage to the house, but the trees down the hill (leading down to Fern Glade) suffered a great deal. There are a lot of broken branches, some trees snapped in half, while some of the largest have been tipped over.
Last week I wandered down the hill with the rest of the family to take a closer look at the extent of what had happened. Unfortunately, I don’t have any ‘before’ shots to give a better context for the photos, but in part it feels like a sizable section has been clear-felled by the wind.
(Photos were taken on 20091017)
Click images for larger version.

All photos are licensed under a Creative Commons Attribution-Noncommercial-Share Alike 2.5 Australia License.
sixaxisd and N900? Perhaps…
by jonathan on Oct.21, 2009, under general
My previous post was mentioned over at maemo.org, with some speculation as to whether sixaxisd might work on the Nokia N900.
My initial thought was that with a small modification to sixpair, it would be possible to feed it an arbitrary BD address, so I started poking around in the source – and found that the feature is already there :D
You need the BD address of the device that you want to pair with the sixaxis – get this with hciconfig – run it (on the N900 – or whatever system) and you’ll hopefully see something like this (I don’t have an N900) :
# hciconfig
hci0: Type: USB
BD Address: 00:1A:80:2C:BE:D0 ACL MTU: 384:8 SCO MTU: 64:8
DOWN
RX bytes:87170 acl:1503 sco:0 events:62 errors:0
TX bytes:574 acl:22 sco:0 commands:21 errors:0
Plug your sixaxis into some other machine via USB, grab (and compile) sixpair.c and run it with the BD address
wget http://www.pabr.org/sixlinux/sixpair.c gcc sixpair.c -lusb -o sixpair ./sixpair 00:1A:80:2C:BF:D3 # use your own BD addr here
And that should be about it – unplug the sixaxis, continue with the instructions from the previous post and when you press the PS button, everything should work as if by magic.
If it does not, get more magic. Either way, leave a comment and let me know how it goes :)
A summary of Linux distros for the PS3
by jonathan on Oct.21, 2009, under general, ps3
I’ve been asked about selecting a Linux distro for use on the Playstation 3 a few times recently, so I’ve put together a page summarising some of the options which has mentions of pdaXrom, Debian, Ubuntu, Yellow Dog, Fedora, RHEL and Gentoo.
It’s all based on my own knowledge and experience – if there’s something you think is worth adding (or correcting, or improving) let me know.
[Update 20091021: Expanded the entry for Gentoo based on suggestions from unsolo & cheriff, and added a link to Windows-Hosted Cell SDK which I saw mentioned by @domipheus]
sixaxis as joystick and mouse over bluetooth
by jonathan on Oct.21, 2009, under general, ps3
sixaxisd is a little daemon that will translate sixaxis input, sent via bluetooth, into joystick and (optionally) mouse input. I found it as part of the pdaXrom-ng distro, available here.
To set it up, grab the source here, unpack, apply the patch from here (which fixes a couple of axis mappings) and compile using make.
You need a kernel with uinput support (Device drivers -> Input device support -> Miscellaneous devices -> User level driver support – or CONFIG_INPUT_UINPUT) and appropriate bluetooth support (I use ps3_defconfig’s defaults in this area), although you don’t need any particular system bluetooth services running – we set that up ourselves.
To configure the bluetooth device using hciconfig (which is part of Debian’s bluez package), run the following commands -
hciconfig hci0 up # bring up the interface hciconfig hci0 lm master # set link mode to master hciconfig hci0 piscan # enable page and inquiry scan
And then start the daemon -
# optional -mouse param provides mouse emulation ./sixaxisd -mouse
Hit the PS button on the controller to bring it to life – all going well, there device nodes /dev/input/js0 and /dev/input/mouse0 will be created.
There’s an init script to handle the hci configuration and all of this that may be found in the pdaXrom svn repo.
(I use my sixaxis with my PS3 – if you want to use it with a different system, you’ll need to use sixpair, available here)
And now I know – adventures in double precision
by jonathan on Sep.21, 2009, under general, spu
Refining the buddhabrot renderer, I’ve added vectorisation to iterate two points at once, which gives (at least) twice the performance. Huzzah.
To begin with, I lifted code from one of the later revisions of Jeremy’s Mandelbrot renderer. This was written for single precision float, whereas I’ve been working in double precision for this buddhabrot code. Worth noting on the change from single to double precision -
- Double precision numbers behave differently to single precision on the SPU (see section 9 of the SPU ISA doc) – I was bitten by infs and NaNs.
- When browsing that document, I missed the large “Optional v1.2″ for instructions like dfcgt. To be clear, the Cell BE SPU does not support this instruction.
- GCC does include vec_ullong2 spu_cmpgt(vec_double2, vec_double2), but in the absence of dfcgt it takes forty extra instructions to achieve the same result (yeah, that’s what I get for using general intrinsics)
When starting to use double precision, I was expecting much lower performance than single precision on the SPU, but I had not fully understood how much lower – from the Programming Handbook, page 71:
Although double-precision instructions have 13-clock-cycle latencies, on the Cell/B.E. processor, only the final seven cycles are pipelined. No other instructions are dual-issued with double-precision instructions, and no instructions of any kind are issued for six cycles after a double-precision instruction is issued.
Ouch. I knew this, but I didn’t know it – a run of spu_timing on the generated assembly really rammed it home.
0 0123456789012 dfs $75,$45,$44 0 ------7890123456789 dfma $46,$59,$47 0 ------4567890123456 dfa $43,$45,$44 0 ------1234567890123 dfa $42,$80,$75 0 ------8901234567890 dfm $32,$46,$46 0 ------5678901234567 frds $40,$43 0 01234 ------23456789 dfm $33,$42,$42 0 012345678901 ------9 dfm $36,$42,$81
(Oh, and I’ve noticed again that dfma and friends use RT as an operand, which presumably makes register scheduling even more fun. The above fragment is from a heavily unrolled inner loop.)
At some point, I’ll try to measure the practical difference between double and single precision for this program, to see what (if anything) would be lost by switching over to single precision. Or perhaps there’s some other way around the problem – I’ve been considering fixed point or even multi-single precision fp alternatives.
Three buddhabrot
by jonathan on Sep.15, 2009, under general, imagery
I’ve been experimenting with buddhabrot colouring tonight (actually, I think these are nebulabrot, although the colour composition isn’t as nice as I’d like).
Colouring is based on three passes with different parameters, with each hit on a pixel incrementing the colour channel (with saturation).
Click each one for a 1080p version.
Blue: 312-5,000 Green: 625-10,000 Red: 1250-20,000
Blue: 19-5,000 Green: 39-10,000 Red: 78-20,000
CellBE Buddhabrot renderer
by jonathan on Sep.10, 2009, under general, ps3, spu
For my next TUCS tech talk I’ll be continuing on from the Mandelbrot rendering in the last one (which can be seen here) to something a little more complex.
The Buddhabrot is conceptually not any more complex than the Mandelbrot in terms of its generation – rather than colouring points based on the number of iterations before they ‘escape’, we apply colour to each point reached while iterating escaping starting points. This has consequences for the drawing of the Buddhabrot – rather than generating one point at a time independently of all other points in the output, iterating a single input point may effect thousands of different output points. This makes it all trickier when implementing this on the Cell BE – parallel writes by SPEs to shared locations will need some form of synchronisation. That could be messy, and the process of load/modify/store when expressed in terms of SPU DMA can be quite clumsy.
Rather than try to implement a complex locking/synchronisation system, I have tried to apply some ideas from a set of post-it notes by Mike Acton (you can see them here). This isn’t identical to Mike’s solution, because it’s not the same problem.
To explain – each SPE thread iterates various points on the screen, and generates a list of points to be written. This list of points is sent via DMA to a buffer for the SPE’s use the PPE, which proceeds through the list plotting the points to the framebuffer. The advantage of this approach is that there is only one writer to the framebuffer (the PPE), and that each SPE has it’s own buffers to write its data into. The only synchronisation that is necessary is between each SPE and the PPE to ensure that all data in a buffer is consumed before writing more into it. This is achieved through the use of interrupt mailboxes (SPE tells PPE that there is data), a fenced DMA to act as sentinel (the PPE spins on the arrival of the sentinel data to ensure that DMA of a buffer has completed – this doesn’t feel like the right way to solve this particular problem, though), and the SPE signal register in OR mode to inform the SPE that a particular buffer has been finished with. Interrupt mailbox events are aggregated through libspe2’s spe_event_*() functions.
It’s not an especially complex piece of code – the motivation in its writing is for my own interest and to use for the tech talk. I think it will do nicely for explaining some of the complexities and curiosities of the Cell BE architecture, and the programming of it with the IBM SDK.
There are a few extra features that I’d like to add – particularly better colouring (including saturation which is unfortunately apparent in its absence), and a number of optimisations to the render_fractal() function that I need to lift from my earlier Mandelbrot efforts.
The program includes code by Jeremy Kerr (See hackfest items at http://ozlabs.org/~jk/diary/tech/cell/) and Mike Acton (framebuffer utilities, from http://cellperformance.beyond3d.com/articles/2007/03/handy-ps3-linux-framebuffer-utilities.html). My thanks to Jeremy and Mike, and to all those that have offered comments & feedback via twitter.
[edit: Oh, and it includes cheriff's fine VNC code ;)]
Read the code: fractal.c and spe-fractal.c, or grab a tarball. Comments & suggestions most welcome.
Addition: I added pixel value saturation and experimented with some alternative approaches to colouring…












