I’ve been playing around with Rust recently, and rewrote that conversion code as a bit of a learning exercise for myself, with a thoroughly contrived set of constraints: using integer and single-precision floating point math, at compile time, without unsafe blocks, while using as few unstable features as possible.

I’ve included the listing below for your bemusement and/or head-shaking, and you can play with the code in the Rust Playground and on rust.godbolt.org.

```rust
// Jonathan Adamczewski 2020-05-12
//
// Constructing the bit-representation of an IEEE 754 single precision floating
// point number, using integer and single-precision floating point math, at
// compile time, in rust, without unsafe blocks, while using as few unstable
// features as I can.
//
// or "What if this silly C++ thing https://brnz.org/hbr/?p=1518 but in Rust?"

// Q. Why? What is this good for?
// A. To the best of my knowledge, this code serves no useful purpose.
//    But I did learn a thing or two while writing it :)

// This is needed to be able to perform floating point operations in a const
// function:
#![feature(const_fn)]

// bits_transmute(): Returns the bits representing a floating point value, by
//                   way of std::mem::transmute()
//
// For completeness (and validation), and to make clear the fundamentally
// unnecessary nature of the exercise :D - here's a short, straightforward,
// library-based version. But it needs the const_transmute flag and an unsafe
// block.
#![feature(const_transmute)]
const fn bits_transmute(f: f32) -> u32 {
  unsafe { std::mem::transmute::<f32, u32>(f) }
}

// get_if_u32(predicate: bool, if_true: u32, if_false: u32):
//   Returns if_true if predicate is true, else if_false
//
// If and match are not able to be used in const functions (at least, not
// without #![feature(const_if_match)]) - so here's a branch-free select
// function for u32s
const fn get_if_u32(predicate: bool, if_true: u32, if_false: u32) -> u32 {
  let pred_mask = (-1 * (predicate as i32)) as u32;
  let true_val = if_true & pred_mask;
  let false_val = if_false & !pred_mask;
  true_val | false_val
}

// get_if_f32(predicate, if_true, if_false):
//   Returns if_true if predicate is true, else if_false
//
// A branch-free select function for f32s.
//
// If either if_true or if_false is NaN or an infinity, the result will be NaN,
// which is not ideal. I don't know of a better way to implement this function
// within the arbitrary limitations of this silly little side quest.
const fn get_if_f32(predicate: bool, if_true: f32, if_false: f32) -> f32 {
  // can't convert bool to f32 - but can convert bool to i32 to f32
  let pred_sel = (predicate as i32) as f32;
  let pred_not_sel = ((!predicate) as i32) as f32;
  let true_val = if_true * pred_sel;
  let false_val = if_false * pred_not_sel;
  true_val + false_val
}

// bits(): Returns the bits representing a floating point value.
const fn bits(f: f32) -> u32 {
  // the result value, initialized to a NaN value that will otherwise not be
  // produced by this function.
  let mut r = 0xffff_ffff;

  // These floating point operations (and others) cause the following error:
  //     only int, `bool` and `char` operations are stable in const fn
  // hence #![feature(const_fn)] at the top of the file

  // Identify special cases
  let is_zero    = f == 0_f32;
  let is_inf     = f == f32::INFINITY;
  let is_neg_inf = f == f32::NEG_INFINITY;
  let is_nan     = f != f;

  // Writing this as !(is_zero || is_inf || ...) causes the following error:
  //     Loops and conditional expressions are not stable in const fn
  // so instead write this as type conversions, and bitwise operations
  //
  // "normalish" here means that f is a normal or subnormal value
  let is_normalish = 0 == ((is_zero as u32) | (is_inf as u32) |
                        (is_neg_inf as u32) | (is_nan as u32));

  // set the result value for each of the special cases
  r = get_if_u32(is_zero,    0,           r); // if (is_zero)    { r = 0; }
  r = get_if_u32(is_inf,     0x7f80_0000, r); // if (is_inf)     { r = 0x7f80_0000; }
  r = get_if_u32(is_neg_inf, 0xff80_0000, r); // if (is_neg_inf) { r = 0xff80_0000; }
  r = get_if_u32(is_nan,     0x7fc0_0000, r); // if (is_nan)     { r = 0x7fc0_0000; }

  // It was tempting at this point to try setting f to a "normalish"
  // placeholder value so that special cases do not have to be handled in the
  // code that follows, like so:
  //   f = get_if_f32(is_normalish, f, 1_f32);
  //
  // Unfortunately, get_if_f32() returns NaN if either input is NaN or
  // infinite. Instead of switching the value, we work around the
  // non-normalish cases later.
  //
  // (This whole function is branch-free, so all of it is executed regardless
  // of the input value)

  // extract the sign bit
  let sign_bit = get_if_u32(f < 0_f32, 1, 0);

  // compute the absolute value of f
  let mut abs_f = get_if_f32(f < 0_f32, -f, f);

  // This part is a little complicated. The algorithm is functionally the same
  // as the C++ version linked from the top of the file.
  //
  // Because of the various contrived constraints on this problem, we compute
  // the exponent and significand, rather than extract the bits directly.
  //
  // The idea is this:
  // Every finite single precision floating point number can be represented as
  // a series of (at most) 24 significant digits as a 128.149 fixed point
  // number (128: 126 exponent values >= 0, plus one for the implicit leading
  // 1, plus one more so that the decimal point falls on a power-of-two
  // boundary :)
  // 149: 126 negative exponent values, plus 23 for the bits of precision in
  // the significand.)
  //
  // If we are able to scale the number such that all of the precision bits
  // fall in the upper-most 64 bits of that fixed-point representation (while
  // tracking our effective manipulation of the exponent), we can then
  // predictably and simply scale that computed value back to a range that can
  // be converted safely to a u64, count the leading zeros to determine the
  // exact exponent, and then shift the result into position for the final u32
  // representation.

  // Start with the largest possible exponent - subsequent steps will reduce
  // this number as appropriate
  let mut exponent: u32 = 254;
  {
    // Hex float literals are really nice. I miss them.

    // The threshold is 2^87 (think: 64+23 bits) to ensure that the number
    // will be large enough that, when scaled down by 2^64, all the precision
    // will fit nicely in a u64
    const THRESHOLD: f32 = 154742504910672534362390528_f32; // 0x1p87f == 2^87

    // The scaling factor is 2^41 (think: 64-23 bits) to ensure that a number
    // between 2^64 and 2^87 will not overflow in a single scaling step.
    const SCALE_UP: f32 = 2199023255552_f32; // 0x1p41f == 2^41

    // Because loops are not available (no #![feature(const_loops)]), and 'if'
    // is not available (no #![feature(const_if_match)]), perform repeated
    // branch-free conditional multiplication of abs_f.

    // use a macro, because why not :D It's the most compact, simplest option
    // I could find.
    macro_rules! maybe_scale {
      () => {{
        // care is needed: if abs_f is above the threshold, multiplying by
        // 2^41 will cause it to overflow (INFINITY) which will cause
        // get_if_f32() to return NaN, which will destroy the value in abs_f.
        // So compute a safe scaling factor for each iteration.
        //
        // Roughly equivalent to :
        //   if (abs_f < THRESHOLD) {
        //     exponent -= 41;
        //     abs_f *= SCALE_UP;
        //   }
        let scale = get_if_f32(abs_f < THRESHOLD, SCALE_UP,      1_f32);
        exponent  = get_if_u32(abs_f < THRESHOLD, exponent - 41, exponent);
        abs_f     = get_if_f32(abs_f < THRESHOLD, abs_f * scale, abs_f);
      }}
    }
    // 41 bits per iteration means up to 246 bits shifted.
    // Even the smallest subnormal value will end up in the desired range.
    maybe_scale!(); maybe_scale!(); maybe_scale!();
    maybe_scale!(); maybe_scale!(); maybe_scale!();
  }

  // Now that we know that abs_f is in the desired range
  // (2^87 <= abs_f < 2^128) scale it down to be in the range
  // (2^23 <= _ < 2^64), and convert without loss of precision to u64.
  const INV_2_64: f32 = 5.42101086242752217003726400434970855712890625e-20_f32; // 0x1p-64f == 2^-64
  let a = (abs_f * INV_2_64) as u64;

  // Count the leading zeros.
  // (C++ doesn't provide a compile-time constant function for this. It's
  // nice that rust does :)
  let mut lz = a.leading_zeros();

  // if the number isn't normalish, lz is meaningless: we stomp it with
  // something that will not cause problems in the computation that follows -
  // the result of which is meaningless, and will be ignored in the end for
  // non-normalish values.
  lz = get_if_u32(!is_normalish, 0, lz); // if (!is_normalish) { lz = 0; }

  {
    // This step accounts for subnormal numbers, where there are more leading
    // zeros than can be accounted for in a valid exponent value, and leading
    // zeros that must remain in the final significand.
    //
    // If lz < exponent, reduce exponent to its final correct value - lz will
    // be used to remove all of the leading zeros.
    //
    // Otherwise, clamp exponent to zero, and adjust lz to ensure that the
    // correct number of bits will remain (after multiplying by 2^41 six times
    // - 2^246 - there are 7 leading zeros ahead of the original subnormal's
    // computed significand of 0.sss...)
    //
    // The following is roughly equivalent to:
    //   if (lz < exponent) {
    //     exponent = exponent - lz;
    //   } else {
    //     exponent = 0;
    //     lz = 7;
    //   }

    // we're about to mess with lz and exponent - compute and store the
    // relative value of the two
    let lz_is_less_than_exponent = lz < exponent;

    lz       = get_if_u32(!lz_is_less_than_exponent, 7,             lz);
    exponent = get_if_u32( lz_is_less_than_exponent, exponent - lz, 0);
  }

  // compute the final significand.
  // + 1 shifts away a leading 1-bit for normal, and 0-bit for subnormal
  // values. Shifts are done in u64 (that leading bit is shifted into the
  // void), then the resulting bits are shifted back to their final resting
  // place.
  let significand = ((a << (lz + 1)) >> (64 - 23)) as u32;

  // combine the bits
  let computed_bits = (sign_bit << 31) | (exponent << 23) | significand;

  // return the normalish result, or the non-normalish result, as appropriate
  get_if_u32(is_normalish, computed_bits, r)
}

// Compile-time validation - able to be examined in rust.godbolt.org output
pub static BITS_BIGNUM: u32 = bits(std::f32::MAX);
pub static TBITS_BIGNUM: u32 = bits_transmute(std::f32::MAX);
pub static BITS_LOWER_THAN_MIN: u32 = bits(7.0064923217e-46_f32);
pub static TBITS_LOWER_THAN_MIN: u32 = bits_transmute(7.0064923217e-46_f32);
pub static BITS_ZERO: u32 = bits(0.0f32);
pub static TBITS_ZERO: u32 = bits_transmute(0.0f32);
pub static BITS_ONE: u32 = bits(1.0f32);
pub static TBITS_ONE: u32 = bits_transmute(1.0f32);
pub static BITS_NEG_ONE: u32 = bits(-1.0f32);
pub static TBITS_NEG_ONE: u32 = bits_transmute(-1.0f32);
pub static BITS_INF: u32 = bits(std::f32::INFINITY);
pub static TBITS_INF: u32 = bits_transmute(std::f32::INFINITY);
pub static BITS_NEG_INF: u32 = bits(std::f32::NEG_INFINITY);
pub static TBITS_NEG_INF: u32 = bits_transmute(std::f32::NEG_INFINITY);
pub static BITS_NAN: u32 = bits(std::f32::NAN);
pub static TBITS_NAN: u32 = bits_transmute(std::f32::NAN);
pub static BITS_COMPUTED_NAN: u32 = bits(std::f32::INFINITY/std::f32::INFINITY);
pub static TBITS_COMPUTED_NAN: u32 = bits_transmute(std::f32::INFINITY/std::f32::INFINITY);

// Run-time validation of many more values
fn main() {
  let end: usize = 0xffff_ffff;
  let count = 9_876_543; // number of values to test
  let step = end / count;
  for u in (0..=end).step_by(step) {
    let v = u as u32;

    // reference
    let f = unsafe { std::mem::transmute::<u32, f32>(v) };

    // compute
    let c = bits(f);

    // validation
    if c != v &&
       !(f.is_nan() && c == 0x7fc0_0000) && // nans
       !(v == 0x8000_0000 && c == 0) {      // negative 0
      println!("{:x?} {:x?}", v, c);
    }
  }
}
```

At runtime, the main thing navigation mesh is used for is to find paths between points using an algorithm like A*: https://en.wikipedia.org/wiki/A*_search_algorithm

In Insomniac’s engine, navigation mesh is made of triangles. Triangle edge midpoints define a connected graph for pathfinding purposes.

In addition to triangles, we have off-mesh links (“Custom Nav Clues” in Insomniac parlance) that describe movement that isn’t across the ground. These are used to represent any kind of off-mesh connection – could be jumping over a car or railing, climbing up to a rooftop, climbing down a ladder, etc. Exactly what it means for a particular type of bot is handled by clue markup and game code.

These links are placed by artists and designers in the game environment, and included in prefabs for commonly used bot-traversable objects in the world, like railings and cars.

Navigation mesh makes certain operations much, much simpler than they would be if done by trying to reason about render or physics geometry.

Our game world is made up of a lot of small objects, which are each typically made from many triangles.

Using render or physics geometry to answer the question “can this bot stand here” hundreds of times every frame is not scalable. (Sunset Overdrive had 33ms frames. That’s not a lot of time.)

It’s much faster to ask: is there navigation mesh where this bot is?

Navigation mesh is relatively sparse and simple, so the question can be answered quickly. We pre-compute bounding volumes for navmesh, to make answering that question even faster, and if a bot was standing on navmesh last frame, it’s even less work to reason about where they are this frame.

In addition to path-finding, navmesh can be useful to quickly and safely limit movement in a single direction. We sweep lines across navmesh to find boundaries to clamp bot movement. For example, a bot animating through a somersault will have its movement through the world clamped to the edge of navmesh, rather than rolling off into who-knows-what.

(If you’re making a game where you want bots to be able to freely somersault in any direction, you can ignore the navmesh.)

Building navmesh requires a complete view of the static world. The generated mesh is only correct when it accounts for all objects: interactions between objects affect the generated mesh in ways that are not easy (or fast) to reason about independently.

Intersecting objects can become obstructions to movement. Or they can form new surfaces that an agent can stand upon. You can’t really tell what it means to an agent until you mash it all together.

To do as little work as possible at runtime, we required *all* of the static objects to be loaded at one time to pre-build mesh for Sunset City.

We keep that pre-built navmesh loaded during the game at all times. For the final version of the game (with both of the areas added via DLC) this required ~55MB memory.

We use Recast https://github.com/recastnavigation/recastnavigation to generate the triangle mesh, and (mostly for historical reasons) repack this into our own custom format.

Sunset Overdrive had two meshes: one for “normal” humanoid-sized bots (2m tall, 0.5m radius)

and one for “large” bots (4.5m tall, 1.35m radius)

Both meshes are generated as 16x16m tiles, and use a cell size of 0.125m when rasterizing collision geometry.

There were a few tools used in Sunset Overdrive to add some sense of dynamism to the static environment:

For pathfinding and bot-steering, we have runtime systems to control bot movement around dynamic obstacles.

For custom nav clues, we keep track of whether they are in use, to make it less likely that multiple bots are jumping over the same thing at the same time. This can help fan-out groups of bots, forcing them to take distinctly different paths.

Since Sunset Overdrive, we’ve added a dynamic obstruction system based on Detour https://github.com/recastnavigation/recastnavigation to temporarily cut holes in navmesh for larger impermanent obstacles like stopped cars or temporary structures.

Sometimes you can find interesting things in a crash dump, if you look. pic.twitter.com/DfiWGZicb9

— Jonathan Adamczewski (@twoscomplement) August 21, 2018

We also have a way to mark-up areas of navmesh so that they can be toggled in a controlled fashion from script. It’s less flexible than the dynamic obstruction system – but it is very fast: toggling flags for tris rather than retriangulation.

I spoke about Sunset Overdrive at the AI Summit a few years back – my slide deck is here:

Sunset City Express: Improving the NavMesh Pipeline in Sunset Overdrive

I can also highly recommend @AdamNoonchester‘s talk from GDC 2015:

AI in the Awesomepocalypse – Creating the Enemies of Sunset Overdrive

Here’s some navigation mesh, using the default in-engine debug draw (click for larger version)

What are we looking at? This is a top-down orthographic view of a location in the middle of Sunset City.

The different colors indicate different islands of navigation mesh – groups of triangles that are reachable from other islands via custom nav clues.

Bright sections are where sections of navmesh overlap in the X-Z plane.

There are multiple visualization modes for navmesh.

Usually, this is displayed over some in-game geometry – it exists to debug/understand the data in game and editor. Depending on what the world looks like, some colors are easier to read than others. (click for larger versions)

The second image shows the individual triangles – adjacent triangles do not reliably have different colors. And color selection is stable as the camera moves… almost.

Also, if you squint, you can make out the 16x16m tile boundaries, so you can get a sense of scale.

Here’s a map of the entirety of Sunset City:

“The Mystery of the Mooil Rig” DLC area:

“Dawn of the Rise of the Fallen Machine” DLC area:

Referencing the comments from up-thread, these maps represent the places where agents can be. Additionally, there is connectivity information – we have visualization for that as well.

This image has a few extra in-engine annotations, and some that I added:

The purple lines represent custom nav clues – one line in each direction that is connected.

Also marked are some railings with clues placed at regular intervals, a car with clues crisscrossing it, and moored boats with clues that allow enemies to chase the player.

Also in this image are very faint lines on the mesh that show connectivity between triangles. When a bot is failing to navigate, it can be useful to visualize the connectivity that the mesh thinks it has :)

The radio tower where the fight with Fizzie takes place:

The roller coaster:

The roller coaster tracks are one single, continuous and complete island of navmesh.

Navigation mesh doesn’t line up neatly with collision geometry, or render geometry. To make it easier to see, we draw it offset +0.5m up in world-space, so that it’s likely to be above the geometry it has been generated for. (A while ago, I wrote a full-screen post effect that drew onto rendered geometry based on proximity to navmesh. I thought it was pretty cool, and it was nicely unambiguous & imho easier to read – but I never finished it, it bitrotted, and I never got back to it, alas.)

Since shipping Sunset Overdrive, we added support for keeping smaller pieces of navmesh in memory – they’re now loaded in 128x128m parts, along with the rest of the open world.

@despair‘s recent technical postmortem has a little more on how this works:

‘Marvel’s Spider-Man’: A Technical Postmortem

Even so, we still load all of an open world region to build the navmesh: the asset pipeline doesn’t provide information that is needed to generate navmesh for sub-regions efficiently & correctly, so it’s all-or-nothing. (I have ideas on how to improve this. One day…)

Let me know if you have any questions – preferably via twitter @twoscomplement

This post was originally a twitter thread:

A thread about navigation mesh and Sunset Overdrive…

— Jonathan Adamczewski (@twoscomplement) April 19, 2019

Choose:

— Elan Ruskin (@despair) May 24, 2018

What is the type of x?

— Jonathan Adamczewski (@twoscomplement) May 24, 2018

std::optional<std::any>

— Andreas Fredriksson (@deplinenoise) May 24, 2018

I’ll see your C++17, and raise you C89:#define x rand()

— Jonathan Adamczewski (@twoscomplement) May 24, 2018

auto x = []() { return rand(); }

— Andreas Fredriksson (@deplinenoise) May 24, 2018

tsk tsk… that’s not mordern enough: auto x = []() { std::random_device rd; std::mt19937 gen(rd()); std::uniform_int_distribution<> dis(0, 32767); return dis(gen); } <— FIXED IT FOR YOU!

— Bobby Anguelov (@BobbyAnguelov) May 24, 2018

So, rand() assembles to ‘jmp rand’. That abomination on the other hand.. My eyes! https://t.co/ujNkLrzavs

— Andreas Fredriksson (@deplinenoise) May 24, 2018

So I did a little digging to satisfy my own curiosity about the “modern C++” version, and have learned a few things that I didn’t know previously…

(this is a manually unrolled twitter thread that starts here, with slight modifications)

Nearly all of this I gleaned from the invaluable http://cppreference.com and http://godbolt.org . Comments about implementation refer specifically to the gcc-8.1 C++ standard library, examined using Compiler Explorer and the -E command line option.

std::random_device is a platform-specific source of entropy.

std::mt19937 is a parameterized typedef of std::mersenne_twister_engine

specifically:

std::mersenne_twister_engine<uint_fast32_t, 32, 624, 397, 31, 0x9908b0df, 11, 0xffffffff, 7, 0x9d2c5680, 15, 0xefc60000, 18, 1812433253>

(What do those numbers mean? I don’t know.)

And std::uniform_int_distribution produces uniformly distributed random numbers over a specified range, from a provided generator.

The default constructor for std::random_device takes an implementation-defined argument, with a default value.

The meaning of the argument is implementation-defined – but the type is not: std::string. (I’m not sure why a dynamically modifiable string object was the right choice to be the configuration parameter for an entropy generator.)

There are out-of-line private functions for much of this implementation of std::random_device. The constructor that calls the out-of-line init function is itself inline – so the construction and destruction of the default std::string param is also generated inline.

Also, peeking inside std::random_device, there is a union with two members:

void* _M_file, which I guess would be used to store a file handle for /dev/urandom or similar.

std::mt19937 _M_mt, which is a … parameterized std::mersenne_twister_engine object.

So it seems reasonable to me that if you can’t get entropy* from outside your program, generate your own approximation. It looks like it is possible that the entropy for the std::mersenne_twister_engine will be provided by a std::mersenne_twister_engine.

Unlike std::random_device, which has its implementation out of line, std::mersenne_twister_engine‘s implementation seems to be all inline. It is unclear what benefits this brings, but it results in a few hundred additional instructions generated.

And then there’s std::uniform_int_distribution, which seems mostly unsurprising. It is again fully inline, which (from a cursory eyeballing) may allow a sufficiently insightful compiler to avoid a couple of branches and function calls.

The code that got me started on this was presented in jest – but (std::random_device + std::mt19937 + std::uniform_int_distribution) is a commonly recommended pattern for generating random numbers using these modern C++ library features.

My takeaways:

std::random_device is potentially very expensive to use – and doesn’t provide strong cross-platform guarantees about the randomness it provides. It is configured with an std::string – the meaning of which is platform dependent. I am not compelled to use this type.

std::mt19937 adds a sizeable chunk of codegen via its inline implementation – and there are better options than Mersenne Twister.

Bottom line: I’m probably going to stick with rand(), and if I need something a little fancier, http://www.pcg-random.org/ or one of the other suggestions provided as replies to the twitter thread.

Addition: the code I was able to gather, representing some relevant parts

Digging into it, I found that a movaps instruction was being rewritten as movups, which was a *thoroughly* confusing thing to see.

The one clue I had was that a fault due to an unaligned load had been observed in non-test code, but did not reproduce when written as a test using the google-test framework. A short hunt later (including a failed attempt at writing a small repro case), I found an explanation: google test suppresses this class of failure.

The code below will successfully demonstrate the behavior, printing out the SIMD load instruction before and after calling the function with an unaligned pointer.

[Gist]




During the day, I’m a Lead of a group of programmers. We’re responsible for a range of tools and tech used by others at the company for making games.

I have a list of my priorities (and some related questions): things that I think are important for us to be able to do well as individuals, and as a team:

**Treat people with respect. Value their time, place high value on their well-being, and start with the assumption that they have good intentions.**
(“People” includes yourself: respect yourself, value your own time and well-being, and have confidence in your good intentions.)

**When solving a problem, know the user and understand their needs.**
- Do you understand the problem(s) that need to be solved? (it’s easy to make assumptions)
- Have you spoken to the user and listened to their perspective? (it’s easy to solve the wrong problem)
- Have you explored the specific constraints of the problem by asking questions like:
- Is this part needed? (it’s easy to over-reach)
- Is there a satisfactory simpler alternative? (actively pursue simplicity)
- What else will be needed? (it’s easy to overlook details)

- Have you discussed your proposed solution with users, and do they understand what you intend to do? (verify, and pursue buy-in)
- Do you continue to meet regularly with users? Do they know you? Do they believe that you’re working for their benefit? (don’t under-estimate the value of trust)

**Have a clear understanding of what you are doing.**
- Do you understand the system you’re working in? (it’s easy to make assumptions)
- Have you read the documentation and/or code? (set yourself up to succeed with whatever is available)
- For code:
- Have you tried to modify the code? (pull a thread; see what breaks)
- Can you explain how the code works to another programmer in a convincing way? (test your confidence)
- Can you explain how the code works to a non-programmer?

**When trying to solve a problem, debug aggressively and efficiently.**
- Does the bug need to be fixed? (see 1)
- Do you understand how the system works? (see 2)
- Is there a faster way to debug the problem? Can you change code or data to cause the problem to occur more quickly and reliably? (iterate as quickly as you can, fix the bug, and move on)
- Do you trust your own judgement? (debug boldly, have confidence in what you have observed, make hypotheses and test them)

**Pursue excellence in your work.**
- How are you working to be better understood? (good communication takes time and effort)
- How are you working to better understand others? (don’t assume that others will pursue you with insights)
- Are you responding to feedback with enthusiasm to improve your work? (pursue professionalism)
- Are you writing high quality, easy to understand, easy to maintain code? How do you know? (continue to develop your technical skills)
- How are you working to become an expert and industry leader with the technologies and techniques you use every day? (pursue excellence in your field)
- Are you eager to improve (and fix) systems you have worked on previously? (take responsibility for your work)

The list was created for discussion with the group, and as an effort to articulate my own expectations in a way that will help my team understand me.

Composing this has been a useful exercise for me as a lead, and definitely worthwhile for the group. If you’ve never tried writing down your own priorities, values, and/or assumptions, I encourage you to try it :)

This post contains the same material as this thread of tweets, with a few minor edits.

In IEEE754, floating point numbers are represented like this:

±2ⁿⁿⁿ×1.sss…

nnn is the exponent, which is floor(log2(size)) — which happens to be the fl value computed by TLSF.

sss… is the significand fraction: the part that follows the decimal point, which happens to be sl.

And so to calculate fl and sl, all we need to do is convert size to a floating point value (on recent x86 hardware, that’s a single instruction). Then we can extract the exponent, and the upper bits of the fractional part, and we’re all done :D

That can be implemented like this:

```c
double sf = (int64_t)size;
uint64_t sfi;
memcpy(&sfi, &sf, 8);
fl = (sfi >> 52) - (1023 + 7);
sl = (sfi >> 47) & 31;
```

There are some subtleties (there always are). I’ll break it down…

```c
double sf = (int64_t)size;
```

Convert size to a double, with an explicit cast. size has type size_t, but using TLSF from github.com/mattconte/tlsf, the largest supported allocation on 64bit architecture is 2^32 bytes – comfortably less than the precision provided by the double type. If you need your TLSF allocator to allocate chunks bigger than 2^53, this isn’t the technique for you :)

I first tried using float (not double), which can provide correct results — but only if the rounding mode happens to be set correctly. double is easier.

The cast to (int64_t) results in better codegen on x86: without it, the compiler will generate a full 64bit unsigned conversion, and there is no single instruction for that.

The cast tells the compiler to (in effect) consider the bits of size as if they were a two’s complement signed value — and there is an SSE instruction to handle that case (cvtsi2sdq or similar). Again, with the implementation we’re using size can’t be that big, so this will do the Right Thing.

```c
uint64_t sfi;
memcpy(&sfi, &sf, 8);
```

Copy the 8 bytes of the double into an unsigned integer variable. There are a lot of ways that C/C++ programmers copy bits from floating point to integer – some of them are well defined :) memcpy() does what we want, and any moderately respectable compiler knows how to select decent instructions to implement it.

Now we have floating point bits in an integer register, consisting of one sign bit (always zero for this, because size is always positive), eleven exponent bits (offset by 1023), and 52 bits of significand fraction. All we need to do is extract those, and we’re done :)

```c
fl = (sfi >> 52) - (1023 + 7);
```

Extract the exponent: shift it down (ignoring the always-zero sign bit), subtract the offset (1023), and that 7 we saw earlier, at the same time.

```c
sl = (sfi >> 47) & 31;
```

Extract the five most significant bits of the fraction – we do need to mask out the exponent.

And, just like that*, we have mapping_insert(), implemented in terms of integer -> floating point conversion.

* Actual code (rather than fragments) may be included in a later post…

Over my holiday break at the end of 2017, I took a look into the TLSF (Two Level Segregated Fit) memory allocator to better understand how it works. I’ve made use of this allocator and have been impressed by its real world performance, but never really done a deep dive to properly understand it.

The mapping_insert() function is a key part of the allocator implementation, and caught my eye. Here’s how that function is described in the paper A constant-time dynamic storage allocator for real-time systems:

I’ll be honest: from that description, I never developed a clear picture in my mind of what that function does.

(Reading it now, it seems reasonably clear – but I can say that only after I spent quite a bit of time using other methods to develop my understanding)

Something that helped me a lot was looking at the implementation of that function from github.com/mattconte/tlsf/. There’s a bunch of long-named macro constants in there, and a few extra implementation details. If you collapse those it looks something like this:

void mapping_insert(size_t size, int* fli, int* sli)
{
    int fl, sl;
    if (size < 256)
    {
        fl = 0;
        sl = (int)size / 8;
    }
    else
    {
        fl = fls(size);
        sl = (int)(size >> (fl - 5)) ^ 0x20;
        fl -= 7;
    }
    *fli = fl;
    *sli = sl;
}

It’s a pretty simple function (it really is). But I still failed to *see* the pattern of results that would be produced in my mind’s eye.

I went so far as to make a giant spreadsheet of all the intermediate values for a range of inputs, to paint myself a picture of the effect of each step :) That helped immensely.

Breaking it down…

There are two cases handled in the function: one for when size is below a certain threshold, and one for when it is larger. The first is straightforward, and accounts for a small number of possible input values. The large size case is more interesting.

The function computes two values: fl and sl, the first and second level indices for a lookup table. For the large case, fl (where fl is “first level”) is computed via fls(size) (where fls is short for “find last set” – similar names, just to keep you on your toes).

fls() returns the index of the most significant set bit, counting from the least significant bit – that is, the index of the largest power of two no greater than the input. In the words of the paper:

“the instruction fls can be used to compute the ⌊log2(x)⌋ function”

Which is, in C-like syntax: floor(log2(x))
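As an aside, fls() is commonly implemented with a count-leading-zeros instruction. A sketch using the GCC/Clang builtin (my assumption – this is not the code from the paper or the repo):

```cpp
#include <cstddef>

// fls(): index of the most significant set bit, i.e. floor(log2(x)).
// __builtin_clzll counts leading zeros in a 64-bit value; behaviour is
// undefined for x == 0, so callers must handle that case themselves.
int fls(size_t x)
{
    return 63 - __builtin_clzll((unsigned long long)x);
}
```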

And there’s that “fl -= 7” at the end. That will show up again later.

For the large case, the computation of sl has a few steps:

sl = (size >> (fl - 5)) ^ 0x20;

Shift size down by some amount (based on fl), and flip the sixth bit?

(Aside: The CellBE programmer in me is flinching at that variable shift)

It took me a while (longer than I would have liked…) to realize that this

size >> (fl - 5)

is shifting size to generate a number that has exactly six significant bits, at the least significant end of the register (bits 5 thru 0).

Because fl is the index of the most significant bit, after this shift, bit 5 will always be 1 – and that “^ 0x20” will unset it, leaving the result as a value between 0 and 31 (inclusive).
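A worked example makes the two steps concrete (the helper name is mine, just for illustration):

```cpp
#include <cstddef>

// For size = 1000 (0b1111101000), fl = fls(1000) = 9:
//   1000 >> (9 - 5) = 1000 >> 4 = 62 = 0b111110  (six significant bits,
//                                                 bit 5 always set)
//   62 ^ 0x20       = 30        = 0b011110       (bit 5 cleared, 0..31)
int second_level_index(size_t size, int fl)
{
    return (int)(size >> (fl - 5)) ^ 0x20;
}
```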

So here’s where floating point comes into it, and the cute thing I saw: another way to compute fl and sl is to convert size into an IEEE 754 floating point number, and extract the exponent and the most significant bits of the mantissa. I’ll cover that in the next part.

Let me never fall into the vulgar mistake of dreaming that I am persecuted whenever I am contradicted. No man, I think, had ever a greater well-being with a less desert than I. I can very well afford to be accounted bad or foolish by a few dozen or a few hundred persons[…] who see myself greeted by the good expectation of so many friends far beyond any power of thought or communication of thought residing in me. Besides, I own, I am often inclined to take part with those who say I am bad or foolish, for I fear I am both. I believe and know there must be a perfect compensation. I know too well my own dark spots. Not having myself attained, not satisfied myself, far from a holy obedience,— how can I expect to satisfy others, to command their love? A few sour faces, a few biting paragraphs,—is but a cheap expiation for all these short-comings of mine.

– Ralph Waldo Emerson, Journals, 8 November 1838

Copied from http://www.perfectidius.com/Volume_5_1838-1841.pdf

**Q:** Given `constexpr int Min(int a, int b)`, construct a function `constexpr int Min(Args... args)` that returns the minimum of all the provided args. Fail to justify your over-engineering.

**A:** Rename `Min(int, int)` as `MinImpl(int, int)`, or stick it in a namespace. Overloading the function is not only unnecessary, it gets in the way of the implementation.

constexpr int MinImpl(int a, int b)
{
    return a < b ? a : b;
}

Implement a `constexpr` fold left function. If we can use it for `Min()`, we should be able to do the same for `Max()`, and other similar functions. Should we be able to find any (#prematuregeneralization).

template<typename ArgA, typename ArgB, typename Func>
constexpr auto foldl(Func func, ArgA a, ArgB b)
{
    return func(a, b);
}

template<typename ArgA, typename ArgB, typename Func, typename ...Args>
constexpr auto foldl(Func func, ArgA a, ArgB b, Args... args)
{
    return foldl(func, func(a, b), args...);
}

Combine the two.

template<typename ...Args>
constexpr auto Min(Args... args)
{
    return foldl(MinImpl, args...);
}

Add the bare minimum amount of testing for a constexpr function: slap a `static_assert()` on it.

static_assert(Min(6, 4, 5, 3, 9) == 3, "Nope");

I did so with Visual Studio 2015 Update 2. It did not object.

**Addendum:** Some discussion with @nlguillemot and @DrPizza led to this attempt to do something similar with a C++17/C++1z fold-expression:

#include <limits.h>

constexpr int MinImpl1(int a, int b)
{
    return a < b ? a : b;
}

constexpr void MinImpl2(int* m, int a, int b)
{
    *m = a < b ? a : b;
}

template<typename ...Args>
constexpr int Min(Args... args)
{
    int m = INT_MAX;
    // a binary expression in an operand of a fold-expression
    // is not allowed, so this won't compile:
    //((m = MinImpl1(m, args), ...);
    // But this does:
    (MinImpl2(/*out*/&m, m, args), ...);
    return m;
}

int main()
{
    static_assert(Min(3,4,5) == 3, "nope");
}

This compiles with a gcc-6 pre-release snapshot.

**Update:** Here’s a further updated version, based on a refinement by @dotstdy.