hbr

spu_µopt[0] // shifting and arrays

by jonathan on May.07, 2010, under general, programming, spu

There are a couple of mostly-obvious SPU micro-optimisation tricks that I’ve learned. Here’s one.

qword a[128];
qword f(int i) {
    return a[i>>4];
}

Simple enough code. spu-gcc -O3 produces

f:                                                                                                                                                                                                                                                                                                                          
    ila     $2,a                                                                                                                                                                                                                                                                                                        
    andi    $3,$3,-16                                                                                                                                                                                                                                                                                                   
    lqx     $3,$2,$3                                                                                                                                                                                                                                                                                                    
    bi      $lr

And it’s worse if the shift is not 4 – here’s the result of using 5:

f:
    rotmai  $4,$3,-5
    ila     $2,a
    shli    $3,$4,4
    lqx     $3,$2,$3
    bi      $lr

Which is pretty ugly.

The compiler is doing what it has been asked to do, but doesn’t appear to know that SPU load and store instructions force the four rightmost bits of the calculated address to zero.  Which means the andi masking can be removed from the first case, and in the second the shift right and shift left instructions could be replaced with a single shift right – without changing the result.

How to avoid it?  I’ve used manual pointer arithmetic to get around this compiler feature, e.g.

return *(qword*)((uint)a+i)

Which is pretty ugly.

I don’t know what that sort of thing does for the compiler’s alias analysis. Probably nothing good (judicious use of restrict might help…).  si_lqx() would also work, but my  experience – without in-depth examination – is that measured performance gets worse when using load and store intrinsics. I’m not sure if it’s something to do with scheduling, aliasing or programmer competence.

When would this ever matter? I’ve run into this in some very hot lookup table access. Two or four less cycles can make a big difference. Sometimes. (If it really matters, rewriting the whole thing in assembly can work, too)

It’s a micro-optimisation – measure often, use with care.

Leave a Comment : more...

Building gdb for Cell/B.E.

by jonathan on Apr.05, 2010, under ps3, spu

Trying to debug a bus error on my PS3, I realised that the version of GDB I have installed doesn’t support debugging of SPU programs. There doesn’t seem to be a Debian packaged version available that does, so I built my own.

Because I found no obvious google result, I share this with the zero other people that I expect may one day be interested : the key option for configure appears to be

--enable-targets=spu

This information was brought to you via the gdb.spec file, and a post to the gcc-testresults mailing list.

4 Comments :, more...

20100319 photographs

by jonathan on Mar.18, 2010, under general, imagery

On Australia Day this year I climbed to the summit of Cradle Mountain with some friends.  Here are some of the photos that I took before the clouds rolled in and the rain came down.

It’s a great place to take a long walk  :)

Photos taken on 20100126. Click any image for a giant version.

“Wombat Pool”

The mountain. You could see the top at this stage in the day.

Creative Commons License
All photos are licensed under a Creative Commons Attribution-Noncommercial-Share Alike 2.5 Australia License.

Leave a Comment : more...

“premature optimization”

by jonathan on Nov.25, 2009, under general, history, programming

Here is an except from Donald Knuth’s 1974 paper “Structured Programming with go to Statements”.  There is a well known quote at the end of the second paragraph :

The improvement in speed from Example 2 to Example 2a is only about 12%, and many people would pronounce that insignificant.  The conventional wisdom shared by many of today’s software engineers calls for ignoring efficiency in the small; but I believe this is simply an overreaction to the abuses they see being practiced by penny-wise-and-pound-foolish programmers, who can’t debug or maintain their “optimized” programs. In established engineering disciplines a 12% improvement, easily obtained, is never considered marginal; and I believe the same viewpoint should prevail in software engineering. Of course I wouldn’t bother making such optimizations on a one-shot job, but when it’s a question of preparing quality programs, I don’t want to restrict myself to tools that deny me such efficiencies.

There is no doubt that the grail of efficiency leads to abuse. Programmers waste enormous amounts of time thinking about, or worrying about, the speed of noncritical parts of their programs, and these attempts at efficiency actually have a strong negative impact when debugging and maintenance are considered. We should forget about small efficiencies, say about 97% of the time: premature optimization is the root of all evil.

Yet we should no pass up our opportunities in that critical 3%. A good programmer will not be lulled into complacency by such reasoning, he will be wise to look carefully at the critical code; but only after that code has been identified. It is often a mistake to make a priori judgments about what parts of a program are really critical, since the universal experience of programmers who have been using measurement tools has been that their intuitive guesses fail. After working with such tools for seven years, I’ve become convinced that all compilers written from now on should be designed to provide all programmers with feedback indicating what parts of their programs are costing the most; indeed, this feedback should be supplied automatically unless it has been specifically turned off.

The quote to which I am referring – that “premature optimization is the root of all evil” – is, from time to time, used fait accompli to assert that any time spent optimising is time wasted – something that is clearly not the case when considering the quote in context.

There are a lot of other good ideas in Knuth’s article, many still quite relevant to software engineering. The full article (which can be obtained via citeseerx) is, primarily, a contribution to a  discussion of the time on the use of go to statements – another topic often reduced to a simple, extreme position, attributed to Edsger Dijkstra.  Knuth’s paper makes it clear that Dijkstra did not hold so tightly to that position himself.

This is the sort of paper that I find to be a fascinating read, with both historical and technical value.

The final page of the article contains the following quote :

A disturbingly large percentage of the students ran into situations that require go to’s, and sure enough, it was often because while didn’t work well to their plan, but almost invariably because their plan was poorly thought out.

Which brought to mind another quote that I had seen recently, this one from George Orwell :

A man may take to drink because he feels himself to be a failure, and then fail all the more completely because he drinks. It is rather the same thing that is happening to the English language. It becomes ugly and inaccurate because our thoughts are foolish, but the slovenliness of our language makes it easier for us to have foolish thoughts.

It seems quite valid to say that there is a connection between our language and our expression (in terms of ideas, spoken word, programs, and more).

When trying to find a source for Orwell’s words (which you can find here), I was heartened to find these sentences follow those quoted above :

The point is that the process is reversible.   Modern English, especially written English, is full of bad habits which spread by imitation and which can be avoided if one is willing to take the necessary trouble.

Which I believe is also applicable to the expression and optimisation of computer programs.

2 Comments :, , , more...

Some Linux on PS3 updates

by jonathan on Nov.22, 2009, under general, ps3

I’ve updated the Linux distros page with links to articles about installing Ubuntu 9.10 [Joep Cremers Weblog] and Fedora 12 [Geoff Levand, Ken Werner] on the Playstation 3.

In mosty unrelated news,

  • IBM will not be releasing a previously planned successor to the PowerXCell 8i [heise online, Playstation University]
    (neither article conveys particularly clear information to me – not least because I have only a machine translation of the first one)
  • Playstation 3 firmware version 3.10 seems to have added ABC iView and BBC iPlayer support to the XMB (the presence of which *may* be related to your PSN region, geographical restrictions to accessing video content still seem to apply). If you can’t see it in the TV column of your XMB, try rebooting…
Leave a Comment :, more...

PS3, Fedora 11 and OpenCL

by jonathan on Nov.03, 2009, under general, ps3

Ken Werner has written a nice guide to installing Fedora 11 on the PS3.  (And I’ve added the link to the Linux distro page).

Inside Ken’s page is some details on installing and running demos from the recent “OpenCL Development Kit for Linux on Power”, which is something I’m now even more keen to try :)

(It seems a little odd that the only place that OpenCL SPE acceleration is mentioned is in one of the FAQs…)

2 Comments :, more...

20091025 photographs

by jonathan on Oct.25, 2009, under general, imagery

Over my back fence is a little patch of greenery that I’m quite certain doesn’t look as consistently green as this any more.


View Larger Map

The were some heavy winds across the state a couple of weeks back – we experienced a small amount of superficial damage to the house, but the trees down the hill (leading down to Fern Glade) suffered a great deal. There are a lot of broken branches, some trees snapped in half, while some of the largest have been tipped over.

Last week I wandered down the hill with the rest of the family to take a closer look at the extent of what had happened. Unfortunately, I don’t have any ‘before’ shots to give a better context for the photos, but in part it feels like a sizable section has been clear-felled by the wind.

(Photos were taken on 20091017)

Click images for larger version.

CRW_34108
CRW_34122
CRW_34123
CRW_34130
CRW_34135
CRW_34143
CRW_34161
CRW_34162
CRW_34203
CRW_34172

Creative Commons License
All photos are licensed under a Creative Commons Attribution-Noncommercial-Share Alike 2.5 Australia License.

Leave a Comment : more...

sixaxisd and N900? Perhaps…

by jonathan on Oct.21, 2009, under general

My previous post was mentioned over at maemo.org, with some speculation as to whether sixaxisd might work on the Nokia N900.

My initial thought was that with a small modification to sixpair, it would be possible to feed it an arbitrary BD address, so I started poking around in the source – and found that the feature is already there :D

You need the BD address of the device that you want to pair with the sixaxis – get this with hciconfig – run it (on the N900 – or whatever system) and you’ll hopefully see something like this (I don’t have an N900) :

# hciconfig
hci0:   Type: USB
        BD Address: 00:1A:80:2C:BE:D0 ACL MTU: 384:8 SCO MTU: 64:8
        DOWN
        RX bytes:87170 acl:1503 sco:0 events:62 errors:0
        TX bytes:574 acl:22 sco:0 commands:21 errors:0

Plug your sixaxis into some other machine via USB, grab (and compile) sixpair.c and run it with the BD address

wget http://www.pabr.org/sixlinux/sixpair.c
gcc sixpair.c -lusb -o sixpair
./sixpair 00:1A:80:2C:BF:D3 # use your own BD addr here

And that should be about it – unplug the sixaxis, continue with the instructions from the previous post and when you press the PS button, everything should work as if by magic.

If it does not, get more magic.  Either way, leave a comment and let me know how it goes :)

13 Comments :, , , more...

A summary of Linux distros for the PS3

by jonathan on Oct.21, 2009, under general, ps3

I’ve been asked about selecting a Linux distro for use on the Playstation 3 a few times recently, so I’ve put together a page summarising some of the options which has mentions of pdaXrom, Debian, Ubuntu, Yellow Dog, Fedora, RHEL and Gentoo.

It’s all based on my own knowledge and experience – if there’s something you think is worth adding (or correcting, or improving) let me know.

[Update 20091021: Expanded the entry for Gentoo based on suggestions from unsolo & cheriff, and added a link to Windows-Hosted Cell SDK which I saw mentioned by @domipheus]

Leave a Comment : more...

sixaxis as joystick and mouse over bluetooth

by jonathan on Oct.21, 2009, under general, ps3

sixaxisd is a little daemon that will translate sixaxis input, sent via bluetooth, into joystick and (optionally) mouse input.  I found it as part of the pdaXrom-ng distro, available here.

To set it up, grab the source here, unpack, apply the patch from here (which fixes a couple of axis mappings) and compile using make.

You need a kernel with uinput support (Device drivers -> Input device support -> Miscellaneous devices -> User level driver support – or CONFIG_INPUT_UINPUT) and appropriate bluetooth support (I use ps3_defconfig’s defaults in this area), although you don’t need any particular system bluetooth services running – we set that up ourselves.

To configure the bluetooth device using hciconfig (which is part of Debian’s bluez package), run the following commands -

hciconfig hci0 up        # bring up the interface
hciconfig hci0 lm master # set link mode to master
hciconfig hci0 piscan    # enable page and inquiry scan

And then start the daemon -

# optional -mouse param provides mouse emulation
./sixaxisd -mouse

Hit the PS button on the controller to bring it to life – all going well, there device nodes /dev/input/js0 and /dev/input/mouse0 will be created.

There’s an init script to handle the hci configuration and  all of this that may be found in the pdaXrom svn repo.

(I use my sixaxis with my PS3 – if you want to use it with a different system, you’ll need to use sixpair, available here)

2 Comments :, , , more...

Looking for something?