## f32, u32, and const

Some time ago, I wrote “floats, bits, and constant expressions” about converting floating point number into its representative ones and zeros as a C++ constant expression – constructing the IEEE 754 representation without being able to examine the bits directly.

I’ve been playing around with Rust recently, and rewrote that conversion code as a bit of a learning exercise for myself, with a thoroughly contrived set of constraints: using integer and single-precision floating point math, at compile time, without unsafe blocks, while using as few unstable features as possible.

I’ve included the listing below, for your bemusement and/or head-shaking, and you can play with the code in the Rust Playground and rust.godbolt.org

Continue reading “f32, u32, and const”

## Navigation Mesh and Sunset Overdrive

Navigation mesh encodes where in the game world an agent can stand, and where it can go. (here “agent” means bot, actor, enemy, NPC, etc)

At runtime, the main thing navigation mesh is used for is to find paths between points using an algorithm like A*: https://en.wikipedia.org/wiki/A*_search_algorithm

In Insomniac’s engine, navigation mesh is made of triangles. Triangle edge midpoints define a connected graph for pathfinding purposes.

In addition to triangles, we have off-mesh links (“Custom Nav Clues” in Insomniac parlance) that describe movement that isn’t across the ground. These are used to represent any kind of off-mesh connection – could be jumping over a car or railing, climbing up to a rooftop, climbing down a ladder, etc. Exactly what it means for a particular type of bot is handled by clue markup and game code.

These links are placed by artists and designers in the game environment, and included in prefabs for commonly used bot-traversable objects in the world, like railings and cars.

Navigation mesh makes a certain operations much, much simpler than it would be if done by trying to reason about render or physics geometry.

Our game work is made up of a lot of small objects, which are each typically made from many triangles.

Using render or physics geometry to answer the question “can this bot stand here” hundreds of times every frame is not scalable. (Sunset Overdrive had 33ms frames. That’s not a lot of time.)

It’s much faster to ask: is there navigation mesh where this bot is

Navigation mesh is relatively sparse and simple, so the question can be answered quickly. We pre-compute bounding volumes for navmesh, to make answering that question even faster, and if a bot was standing on navmesh last frame, it’s even less work to reason about where they are this frame.

In addition to path-finding, navmesh can be useful to quickly and safely limit movement in a single direction. We sweep lines across navmesh to find boundaries to clamp bot movement. For example, a bot animating through a somersault will have its movement through the world clamped to the edge of navmesh, rather than rolling off into who-knows-what.

(If you’re making a game where you want bots to be able to freely somersault in any direction, you can ignore the navmesh 😁)

Building navmesh requires a complete view of the static world. The generated mesh is only correct when it accounts for all objects: interactions between objects affect the generated mesh in ways that are not easy (or fast) to reason about independently.

Intersecting objects can become obstructions to movement. Or they can form new surfaces that an agent can stand upon. You can’t really tell what it means to an agent until you mash it all together.

To do as little work as possible at runtime, we required *all* of the static objects to be loaded at one time to pre-build mesh for Sunset City.

We keep that pre-built navmesh loading during the game at all times. For the final version of the game (with both of the areas added via DLC) this required ~55MB memory.

We use Recast https://github.com/recastnavigation/recastnavigation to generate the triangle mesh, and (mostly for historical reasons) repack this into our own custom format.

Sunset Overdrive had two meshes: one for “normal” humanoid-sized bots (2m tall, 0.5m radius)

and one for “large” bots (4.5m tall, 1.35m radius)

Both meshes are generated as 16x16m tiles, and use a cell size of 0.125m when rasterizing collision geometry.

There were a few tools used in Sunset Overdrive to add some sense of dynamism to the static environment:

For pathfinding and bot-steering, we have runtime systems to control bot movement around dynamic obstacles.

For custom nav clues, we keep track of whether they are in use, to make it less likely that multiple bots are jumping over the same thing at the same time. This can help fan-out groups of bots, forcing them to take distinctly different paths.

Since Sunset Overdrive, we’ve added a dynamic obstruction system based on Detour https://github.com/recastnavigation/recastnavigation to temporarily cut holes in navmesh for larger impermanent obstacles like stopped cars or temporary structures.

We also have a way to mark-up areas of navmesh so that they can be toggled in a controlled fashion from script. It’s less flexible than the dyanamic obstruction system – but it is very fast: toggling flags for tris rather than retriangulation.

I spoke about Sunset Overdrive at the AI Summit a few years back – my slide deck is here:
Sunset City Express: Improving the NavMesh Pipeline in Sunset Overdrive

I can also highly recommend @AdamNoonchester‘s talk from GDC 2015:
AI in the Awesomepocalypse – Creating the Enemies of Sunset Overdrive

Here’s some navigation mesh, using the default in-engine debug draw (click for larger version)

What are we looking at? This is a top-down orthographic view of a location in the middle of Sunset City.

The different colors indicate different islands of navigation mesh – groups of triangles that are reachable from other islands via custom nav clues.
Bright sections are where sections of navmesh overlap in the X-Z plane.

There are multiple visualization modes for navmesh.

Usually, this is displayed over some in-game geometry – it exists to debug/understand the data in game and editor. Depending on what the world looks like, some colors are easier to read than others. (click for larger versions)

The second image shows the individual triangles – adjacent triangles do not reliably have different colors. And there is stable color selection as the camera moves, almost 😐

Also, if you squint, you can make out the 16x16m tile boundaries, so you can get a sense of scale.

Here’s a map of the entirety of Sunset City:

“The Mystery of the Mooil Rig” DLC area:

“Dawn of the Rise of the Fallen Machine” DLC area:

Referencing the comments from up-thread, these maps represent the places where agents can be. Additionally, there is connectivity information – we have visualization for that as well.

This image has a few extra in-engine annotations, and some that I added:

The purple lines represent custom nav clues – one line in each direction that is connected.

Also marked are some railings with clues placed at regular intervals, a car with clues crisscrossing it, and moored boats with clues that allow enemies to chase the player.

Also in this image are very faint lines on the mesh that show connectivity between triangles. When a bot is failing to navigate, it can be useful to visualize the connectivity that the mesh thinks it has :)

The radio tower where the fight with Fizzie takes place:

The roller coaster:

The roller coaster tracks are one single, continuous and complete island of navmesh.

Navigation mesh doesn’t line up neatly with collision geometry, or render geometry. To make it easier to see, we draw it offset +0.5m up in world-space, so that it’s likely to be above the geometry it has been generated for. (A while ago, I wrote a full-screen post effect that drew onto rendered geometry based on proximity to navmesh. I thought it was pretty cool, and it was nicely unambiguous & imho easier to read – but I never finished it, it bitrot, and I never got back to it alas.)

Since shipping Sunset Overdrive, we added support for keeping smaller pieces of navmesh in memory – they’re now loaded in 128x128m parts, along with the rest of the open world.

@despair‘s recent technical postmortem has a little more on how this works:
‘Marvel’s Spider-Man’: A Technical Postmortem

Even so, we still load it all of an open world region to build the navmesh: the asset pipeline doesn’t provide information that is needed to generate navmesh for sub-regions efficiently & correctly, so it’s all-or-nothing. (I have ideas on how to improve this. One day…)

Let me know if you have any questions – preferably via twitter @twoscomplement

## Modern C++ Randomness

So I did a little digging to satisfy my own curiosity about the “modern C++” version, and have learned a few things that I didn’t know previously…

(this is a manual unrolled twitter thread that starts here, with slight modifications)

Nearly all of this I gleaned from the invaluable and . Comments about implementation refer specifically to the gcc-8.1 C++ standard library, examined using Compiler Explorer and the -E command line option.

std::random_device is a platform-specific source of entropy.

std: mt19937 is a parameterized typedef of std::mersenne_twister_engine

specifically:
std::mersenne_twister_engine<uint_fast32_t, 32, 624, 397, 31, 0x9908b0df, 11, 0xffffffff, 7, 0x9d2c5680, 15, 0xefc60000, 18, 1812433253>
(What do those number mean? I don’t know.)

And std::uniform_int_distribution produces uniformly distributed random numbers over a specified range, from a provided generator.

The default constructor for std::random_device takes an implementation-defined argument, with a default value.

The meaning of the argument is implementation-defined – but the type is not: std::string. (I’m not sure why a dynamically modifiable string object was the right choice to be the configuration parameter for an entropy generator.)

There are out-of-line private functions for much of this implementation of std::random_device. The constructor that calls the out-of-line init function is itself inline – so the construction and destruction of the default std::string param is also generated inline.

Also, peeking inside std::random_generator, there is a union with two members:

void* _M_file, which I guess would be used to store a file handle for /dev/urandom or similar.

std::mt19937 _M_mt, which is a … parameterized std::mersenne_twister_engine object.

So it seems reasonable to me that if you can’t get entropy* from outside your program, generate your own approximation. It looks like it is possible that the entropy for the std::mersenne_twister_engine will be provided by a std::mersenne_twister_engine.

Unlike std::random_device, which has its implementation out of line, std::mersenne_twister_engine‘s implementation seems to be all inline. It is unclear what benefits this brings, but it results in a few hundred additional instructions generated.

And then there’s std::uniform_int_distribution, which seems mostly unsurprising. It is again fully inline, which (from a cursory eyeballing) may allow a sufficiently insightful compiler to avoid a couple of branches and function calls.

The code that got me started on this was presented in jest – but (std::random_device + std::mt19937 + std::uniform_int_distribution) is a commonly recommended pattern for generating random numbers using these modern C++ library features.

My takeaways:
std::random_device is potentially very expensive to use – and doesn’t provide strong cross-platform guarantees about the randomness it provides. It is configured with an std::string – the meaning of which is platform dependent. I am not compelled to use this type.

std::mt19937 adds a sizeable chunk of codegen via its inline implementation – and there are better options than Mersenne Twister.

Bottom line: I’m probably going to stick with rand(), and if I need something a little fancier,  or one of the other suggestions provided as replies to the twitter thread.

## Watch as the OS rewrites my buggy program.

I didn’t know that SetErrorMode(SEM_NOALIGNMENTFAULTEXCEPT) was a thing, until I wrote a bad test that wouldn’t crash.

Digging into it, I found that a movaps instruction was being rewritten as movups, which was a thoroughly confusing thing to see.

The one clue I had was that a fault due to an unaligned load had been observed in non-test code, but did not reproduce when written as a test using the google-test framework. A short hunt later (including a failed attempt at writing a small repro case), I found an explanation: google test suppresses this class of failure.

The code below will successfully demonstrate the behavior, printing out the SIMD load instruction before and after calling the function with an unaligned pointer.

[Gist]

## Priorities for my team

During the day, I’m a Lead of a group of programmers. We’re responsible for a range of tools and tech used by others at the company for making games.

I have a list of the my priorities (and some related questions) of things that I think are important for us to be able to do well as individuals, and as a team:

1. Treat people with respect. Value their time, place high value on their well-being, and start with the assumption that they have good intentions
(“People” includes yourself: respect yourself, value your own time and well-being, and have confidence in your good intentions.)
2. When solving a problem, know the user and understand their needs.
• Do you understand the problem(s) that need to be solved? (it’s easy to make assumptions)
• Have you spoken to the user and listened to their perspective? (it’s easy to solve the wrong problem)
• Have you explored the specific constraints of the problem by asking questions like:
• Is this part needed? (it’s easy to over-reach)
• Is there a satisfactory simpler alternative? (actively pursue simplicity)
• What else will be needed? (it’s easy to overlook details)
• Have your discussed your proposed solution with users, and do they understand what you intend to do? (verify, and pursue buy-in)
• Do you continue to meet regularly with users? Do they know you? Do they believe that you’re working for their benefit? (don’t under-estimate the value of trust)
3. Have a clear understanding of what you are doing.
• Do you understand the system you’re working in? (it’s easy to make assumptions)
• Have you read the documentation and/or code? (set yourself up to succeed with whatever is available)
• For code:
• Have you tried to modify the code? (pull a thread; see what breaks)
• Can you explain how the code works to another programmer in a convincing way? (test your confidence)
• Can you explain how the code works to a non-programmer?
4. When trying to solve a problem, debug aggressively and efficiently.
• Does the bug need to be fixed? (see 1)
• Do you understand how the system works? (see 2)
• Is there a faster way to debug the problem? Can you change code or data to cause the problem to occur more quickly and reliably? (iterate as quickly as you can, fix the bug, and move on)
• Do you trust your own judgement? (debug boldly, have confidence in what you have observed, make hypotheses and test them)
5. Pursue excellence in your work.
• How are you working to be better understood? (good communication takes time and effort)
• How are you working to better understand others? (don’t assume that others will pursue you with insights)
• Are you responding to feedback with enthusiasm to improve your work? (pursue professionalism)
• Are you writing high quality, easy to understand, easy to maintain code? How do you know? (continue to develop your technical skills)
• How are you working to become an expert and industry leader with the technologies and techniques you use every day? (pursue excellence in your field)
• Are you eager to improve (and fix) systems you have worked on previously? (take responsibility for your work)

The list was created for discussion with the group, and as an effort to articulate my own expectations in a way that will help my team understand me.

Composing this has been useful exercise for me as a lead, and definitely worthwhile for the group. If you’ve never tried writing down your own priorities, values, and/or assumptions, I encourage you to try it :)

## A little bit of floating point in a memory allocator — Part 2: The floating point

[Previously]

This post contains the same material as this thread of tweets, with a few minor edits.

In IEEE754, floating point numbers are represented like this:

±2ⁿⁿⁿ×1.sss…

nnn is the exponent, which is floor(log2(size)) — which happens to be the fl value computed by TLSF.

sss… is the significand fraction: the part that follows the decimal point, which happens to be sl.

And so to calculate fl and sl, all we need to do is convert size to a floating point value (on recent x86 hardware, that’s a single instruction). Then we can extract the exponent, and the upper bits of the fractional part, and we’re all done :D

That can be implemented like this:

```double sf = (int64_t)size;
uint64_t sfi;
memcpy(&sfi, &sf, 8);
fl = (sfi >> 52) - (1023 + 7);
sl = (sfi >> 47) & 31;
```

There’s some subtleties (there always is). I’ll break it down…

```double sf = (int64_t)size;
```

Convert size to a double, with an explicit cast. size has type size_t, but using TLSF from github.com/mattconte/tlsf, the largest supported allocation on 64bit architecture is 2^32 bytes – comfortably less than the precision provided by the double type. If you need your TLSF allocator to allocate chunks bigger than 2^53, this isn’t the technique for you :)

I first tried using float (not double), which can provide correct results — but only if the rounding mode happens to be set correctly. double is easier.

The cast to (int64_t) results in better codegen on x86: without it, the compiler will generate a full 64bit unsigned conversion, and there is no single instruction for that.

The cast tells the compiler to (in effect) consider the bits of size as if they were a two’s complement signed value — and there is an SSE instruction to handle that case (cvtsi2sdq or similar). Again, with the implementation we’re using size can’t be that big, so this will do the Right Thing.

```uint64_t sfi;
memcpy(&sfi, &sf, 8);
```

Copy the 8 bytes of the double into an unsigned integer variable. There are a lot of ways that C/C++ programmers copy bits from floating point to integer – some of them are well defined :) memcpy() does what we want, and any moderately respectable compiler knows how to select decent instructions to implement it.

Now we have floating point bits in an integer register, consisting of one sign bit (always zero for this, because size is always positive), eleven exponent bits (offset by 1023), and 52 bits of significant fraction. All we need to do is extract those, and we’re done :)

```fl = (sfi >> 52) - (1023 + 7);
```

Extract the exponent: shift it down (ignoring the always-zero sign bit), subtract the offset (1023), and that 7 we saw earlier, at the same time.

```sl = (sfi >> 47) & 31;
```

Extract the five most significant bits of the fraction – we do need to mask out the exponent.

And, just like that*, we have mapping_insert(), implemented in terms of integer -> floating point conversion.

* Actual code (rather than fragments) may be included in a later post…

## A little bit of floating point in a memory allocator — Part 1: Background

This post contains the same material as this thread of tweets, with a few minor edits.

Over my holiday break at the end of 2017, I took a look into the TLSF (Two Level Segregated Fit) memory allocator to better understand how it works. I’ve made use of this allocator and have been impressed by its real world performance, but never really done a deep dive to properly understand it.

The mapping_insert() function is a key part of the allocator implementation, and caught my eye. Here’s how that function is described in the paper A constant-time dynamic storage allocator for real-time systems:

I’ll be honest: from that description, I never developed a clear picture in my mind of what that function does.

(Reading it now, it seems reasonably clear – but I can say that only after I spent quite a bit of time using other methods to develop my understanding)

Something that helped me a lot was by looking at the implementation of that function from github.com/mattconte/tlsf/.  There’s a bunch of long-named macro constants in there, and a few extra implementation details. If you collapse those it looks something like this:

```void mapping_insert(size_t size, int* fli, int* sli)
{
int fl, sl;
if (size < 256)
{
fl = 0;
sl = (int)size / 8;
}
else
{
fl = fls(size);
sl = (int)(size >> (fl - 5)) ^ 0x20;
fl -= 7;
}
*fli = fl;
*sli = sl;
}
```

It’s a pretty simple function (it really is). But I still failed to *see* the pattern of results that would be produced in my mind’s eye.

I went so far as to make a giant spreadsheet of all the intermediate values for a range of inputs, to paint myself a picture of the effect of each step :) That helped immensely.

Breaking it down…

There are two cases handled in the function: one for when size is below a certain threshold, and on for when it is larger. The first is straightforward, and accounts for a small number of possible input values. The large size case is more interesting.

The function computes two values: fl and sl, the first and second level indices for a lookup table. For the large case, fl (where fl is “first level”) is computed via fls(size) (where fls is short for “find last set” – similar names, just to keep you on your toes).

fls() returns the index of the largest bit set, counting from the least significant slbit, which is the index of the largest power of two. In the words of the paper:

“the instruction fls can be used to compute the ⌊log2(x)⌋ function”

Which is, in C-like syntax: floor(log2(x))

And there’s that “fl -= 7” at the end. That will show up again later.

For the large case, the computation of sl has a few steps:

sl = (size >> (fl – 5)) ^ 0x20;

Depending on shift down size by some amount (based on fl), and mask out the sixth bit?

(Aside: The CellBE programmer in me is flinching at that variable shift)

It took me a while (longer than I would have liked…) to realize that this
size >> (fl – 5) is shifting size to generate a number that has exactly six significant bits, at the least significant end of the register (bits 5 thru 0).

Because fl is the index of the most significant bit, after this shift, bit 5 will always be 1 – and that “^ 0x20” will unset it, leaving the result as a value between 0 and 31 (inclusive).

So here’s where floating point comes into it, and the cute thing I saw: another way to compute fl and sl is to convert size into an IEEE754 floating point number, and extract the exponent, and most significant bits of the mantissa. I’ll cover that in the next part, here.

## Ralph Waldo Emerson, 8 November 1838

Let me never fall into the vulgar mistake of dreaming that I am persecuted whenever I am contradicted. No man, I think, had ever a greater well-being with a less desert than I. I can very well afford to be accounted bad or foolish by a few dozen or a few hundred persons[…] who see myself greeted by the good expectation of so many friends far beyond any power of thought or communication of thought residing in me. Besides, I own, I am often inclined to take part with those who say I am bad or foolish, for I fear I am both. I believe and know there must be a perfect compensation. I know too well my own dark spots. Not having myself attained, not satisfied myself, far from a holy obedience,— how can I expect to satisfy others, to command their love? A few sour faces, a few biting paragraphs,—is but a cheap expiation for all these short-comings of mine.

– Ralph Waldo Emerson, Journals, 8 November 1838

## Aside: Over-engineered Min() [C++, variadic templates, constexpr, fold left]

Q: Given a function `constexpr int Min(int a, int b)`, construct a function `constexpr int Min(Args... args)` that returns the minimum of all the provided args. Fail to justify your over-engineering.

A: Rename `Min(int, int)` as `MinImpl(int, int)` or stick it in a namespace. Overloading the function is not only unnecessary, it gets in the way of the implementation.

```constexpr int MinImpl(int a, int b)
{
return a < b ? a : b;
}```

Implement a `constexpr` fold left function. If we can use it for `Min()`, we should be able to do the same for `Max()`, and other similar functions. Should we be able to find any (#prematuregeneralization).

```template<typename ArgA, typename ArgB, typename Func>
constexpr auto foldl(Func func, ArgA a, ArgB b)
{
return func(a, b);
}

template<typename ArgA, typename ArgB,
typename Func, typename ...Args>
constexpr auto foldl(Func func, ArgA a, ArgB b, Args... args)
{
return foldl(func, func(a, b), args...);
}```

Combine the two.

```template<typename ...Args>
constexpr auto Min(Args... args)
{
return foldl(MinImpl, args...);
}```

Add the bare minimum amount of testing for a constexpr function: slap a `static_assert()` on it.

`static_assert(Min(6, 4, 5, 3, 9) == 3), "Nope");`

I did so with Visual Studio 2015 Update 2. It did not object.

Addendum: Some discussion with @nlguillemot and @DrPizza led to this attempt to do something similar with a C++17/C++1z fold-expression:

```#include <limits.h>

constexpr int MinImpl1(int a, int b) { return a < b ? a : b; }

constexpr void MinImpl2(int* m, int a, int b) { *m = a < b ? a : b; }

template<typename ...Args>
constexpr int Min(Args... args)
{
int m = INT_MAX;
// a binary expression in an operand of a fold-expression
// is not allowed, so this won't compile:
//((m = MinImpl1(m, args), ...);

// But this does:
(MinImpl2(/*out*/&m, m, args), ...);
return m;
}

int main()
{
static_assert(Min(3,4,5) == 3, "nope");
}```

This compiles with a gcc-6 pre-release snapshot.

Update: Here’s a further updated version, based on a refinement by @dotstdy.