{"id":1850,"date":"2020-05-13T07:03:47","date_gmt":"2020-05-12T21:03:47","guid":{"rendered":"http:\/\/brnz.org\/hbr\/?p=1850"},"modified":"2020-05-13T07:16:48","modified_gmt":"2020-05-12T21:16:48","slug":"f32-u32-and-const","status":"publish","type":"post","link":"https:\/\/brnz.org\/hbr\/?p=1850","title":{"rendered":"f32, u32, and const"},"content":{"rendered":"\n<p>Some time ago, I wrote &#8220;<a href=\"https:\/\/brnz.org\/hbr\/?p=1518\">floats, bits, and constant expressions<\/a>&#8221; about converting floating point number into its representative ones and zeros as a C++ constant expression &#8211; constructing the IEEE 754 representation without being able to examine the bits directly.<\/p>\n\n\n\n<p>I&#8217;ve been playing around with Rust recently, and rewrote that conversion code as a bit of a learning exercise for myself, with a thoroughly contrived set of constraints: using integer and single-precision floating point math, at compile time, without unsafe blocks, while using as few unstable features as possible.<\/p>\n\n\n\n<p>I&#8217;ve included the listing below, for your bemusement and\/or head-shaking, and you can play with the code in the <a href=\"https:\/\/play.rust-lang.org\/?version=nightly&amp;mode=release&amp;edition=2018&amp;gist=fd5c6a38da5925ded21c70e55f39876c\">Rust Playground<\/a> and <a href=\"https:\/\/rust.godbolt.org\/z\/unRn_B\">rust.godbolt.org<\/a><\/p>\n\n\n\n<!--more-->\n\n\n\n<div class=\"wp-block-codemirror-blocks-code-block code-block\"><pre class=\"CodeMirror\" data-setting=\"{&quot;mode&quot;:&quot;rust&quot;,&quot;mime&quot;:&quot;text\/x-rustsrc&quot;,&quot;theme&quot;:&quot;default&quot;,&quot;lineNumbers&quot;:false,&quot;styleActiveLine&quot;:false,&quot;lineWrapping&quot;:false,&quot;readOnly&quot;:true,&quot;languageLabel&quot;:&quot;no&quot;,&quot;language&quot;:&quot;Rust&quot;,&quot;modeName&quot;:&quot;rust&quot;}\">\/\/ Jonathan Adamczewski 2020-05-12\n\/\/\n\/\/ Constructing the bit-representation of an IEEE 754 single precision floating \n\/\/ point number, using integer and single-precision floating point math, at \n\/\/ compile time, in rust, without unsafe blocks, while using as few unstable \n\/\/ features as I can.\n\/\/\n\/\/ or &quot;What if this silly C++ thing https:\/\/brnz.org\/hbr\/?p=1518 but in Rust?&quot;\n\n\n\/\/ Q. Why? What is this good for?\n\/\/ A. To the best of my knowledge, this code serves no useful purpose. \n\/\/    But I did learn a thing or two while writing it :)\n\n\n\/\/ This is needed to be able to perform floating point operations in a const \n\/\/ function:\n#![feature(const_fn)]\n\n\n\/\/ bits_transmute(): Returns the bits representing a floating point value, by\n\/\/                   way of std::mem::transmute()\n\/\/\n\/\/ For completeness (and validation), and to make it clear the fundamentally \n\/\/ unnecessary nature of the exercise :D - here's a short, straightforward, \n\/\/ library-based version. But it needs the const_transmute flag and an unsafe \n\/\/ block.\n#![feature(const_transmute)]\nconst fn bits_transmute(f: f32) -&gt; u32 {\n  unsafe { std::mem::transmute::&lt;f32, u32&gt;(f) }\n}\n\n\n\n\/\/ get_if_u32(predicate:bool, if_true: u32, if_false: u32):\n\/\/   Returns if_true if predicate is true, else if_false\n\/\/\n\/\/ If and match are not able to be used in const functions (at least, not \n\/\/ without #![feature(const_if_match)] - so here's a branch-free select function\n\/\/ for u32s\nconst fn get_if_u32(predicate: bool, if_true: u32, if_false: u32) -&gt; u32 {\n  let pred_mask = (-1 * (predicate as i32)) as u32;\n  let true_val = if_true &amp; pred_mask;\n  let false_val = if_false &amp; !pred_mask;\n  true_val | false_val\n}\n\n\/\/ get_if_f32(predicate, if_true, if_false):\n\/\/   Returns if_true if predicate is true, else if_false\n\/\/\n\/\/ A branch-free select function for f32s.\n\/\/ \n\/\/ If either is_true or is_false is NaN or an infinity, the result will be NaN,\n\/\/ which is not ideal. I don't know of a better way to implement this function\n\/\/ within the arbitrary limitations of this silly little side quest.\nconst fn get_if_f32(predicate: bool, if_true: f32, if_false: f32) -&gt; f32 {\n  \/\/ can't convert bool to f32 - but can convert bool to i32 to f32\n  let pred_sel = (predicate as i32) as f32;\n  let pred_not_sel = ((!predicate) as i32) as f32;\n  let true_val = if_true * pred_sel;\n  let false_val = if_false * pred_not_sel;\n  true_val + false_val\n}\n\n\n\/\/ bits(): Returns the bits representing a floating point value.\nconst fn bits(f: f32) -&gt; u32 {\n  \/\/ the result value, initialized to a NaN value that will otherwise not be\n  \/\/ produced by this function.\n  let mut r = 0xffff_ffff;\n\n  \/\/ These floation point operations (and others) cause the following error:\n  \/\/     only int, `bool` and `char` operations are stable in const fn\n  \/\/ hence #![feature(const_fn)] at the top of the file\n  \n  \/\/ Identify special cases\n  let is_zero    = f == 0_f32;\n  let is_inf     = f == f32::INFINITY;\n  let is_neg_inf = f == f32::NEG_INFINITY;\n  let is_nan     = f != f;\n\n  \/\/ Writing this as !(is_zero || is_inf || ...) cause the following error:\n  \/\/     Loops and conditional expressions are not stable in const fn\n  \/\/ so instead write this as type coversions, and bitwise operations\n  \/\/\n  \/\/ &quot;normalish&quot; here means that f is a normal or subnormal value\n  let is_normalish = 0 == ((is_zero as u32) | (is_inf as u32) | \n                        (is_neg_inf as u32) | (is_nan as u32));\n\n  \/\/ set the result value for each of the special cases\n  r = get_if_u32(is_zero,    0,           r); \/\/ if (iz_zero)    { r = 0; }\n  r = get_if_u32(is_inf,     0x7f80_0000, r); \/\/ if (is_inf)     { r = 0x7f80_0000; }\n  r = get_if_u32(is_neg_inf, 0xff80_0000, r); \/\/ if (is_neg_inf) { r = 0xff80_0000; }\n  r = get_if_u32(is_nan,     0x7fc0_0000, r); \/\/ if (is_nan)     { r = 0x7fc0_0000; }\n \n  \/\/ It was tempting at this point to try setting f to a &quot;normalish&quot; placeholder \n  \/\/ value so that special cases do not have to be handled in the code that \n  \/\/ follows, like so:\n  \/\/ f = get_if_f32(is_normal, f, 1_f32);\n  \/\/\n  \/\/ Unfortunately, get_if_f32() returns NaN if either input is NaN or infinite.\n  \/\/ Instead of switching the value, we work around the non-normalish cases \n  \/\/ later.\n  \/\/\n  \/\/ (This whole function is branch-free, so all of it is executed regardless of \n  \/\/ the input value)\n\n  \/\/ extract the sign bit\n  let sign_bit  = get_if_u32(f &lt; 0_f32,  1, 0);\n\n  \/\/ compute the absolute value of f\n  let mut abs_f = get_if_f32(f &lt; 0_f32, -f, f);\n\n  \n  \/\/ This part is a little complicated. The algorithm is functionally the same \n  \/\/ as the C++ version linked from the top of the file.\n  \/\/ \n  \/\/ Because of the various contrived constraints on thie problem, we compute \n  \/\/ the exponent and significand, rather than extract the bits directly.\n  \/\/\n  \/\/ The idea is this:\n  \/\/ Every finite single precision float point number can be represented as a\n  \/\/ series of (at most) 24 significant digits as a 128.149 fixed point number \n  \/\/ (128: 126 exponent values &gt;= 0, plus one for the implicit leading 1, plus \n  \/\/ one more so that the decimal point falls on a power-of-two boundary :)\n  \/\/ 149: 126 negative exponent values, plus 23 for the bits of precision in the \n  \/\/ significand.)\n  \/\/\n  \/\/ If we are able to scale the number such that all of the precision bits fall \n  \/\/ in the upper-most 64 bits of that fixed-point representation (while \n  \/\/ tracking our effective manipulation of the exponent), we can then \n  \/\/ predictably and simply scale that computed value back to a range than can \n  \/\/ be converted safely to a u64, count the leading zeros to determine the \n  \/\/ exact exponent, and then shift the result into position for the final u32 \n  \/\/ representation.\n  \n  \/\/ Start with the largest possible exponent - subsequent steps will reduce \n  \/\/ this number as appropriate\n  let mut exponent: u32 = 254;\n  {\n    \/\/ Hex float literals are really nice. I miss them.\n\n    \/\/ The threshold is 2^87 (think: 64+23 bits) to ensure that the number will \n    \/\/ be large enough that, when scaled down by 2^64, all the precision will \n    \/\/ fit nicely in a u64\n    const THRESHOLD: f32 = 154742504910672534362390528_f32; \/\/ 0x1p87f == 2^87\n\n    \/\/ The scaling factor is 2^41 (think: 64-23 bits) to ensure that a number \n    \/\/ between 2^87 and 2^64 will not overflow in a single scaling step.\n    const SCALE_UP: f32 = 2199023255552_f32; \/\/ 0x1p41f == 2^41\n\n    \/\/ Because loops are not available (no #![feature(const_loops)], and 'if' is\n    \/\/ not available (no #![feature(const_if_match)]), perform repeated branch-\n    \/\/ free conditional multiplication of abs_f.\n\n    \/\/ use a macro, because why not :D It's the most compact, simplest option I \n    \/\/ could find.\n    macro_rules! maybe_scale {\n      () =&gt; {{\n        \/\/ care is needed: if abs_f is above the threshold, multiplying by 2^41 \n        \/\/ will cause it to overflow (INFINITY) which will cause get_if_f32() to\n        \/\/ return NaN, which will destroy the value in abs_f. So compute a safe \n        \/\/ scaling factor for each iteration.\n        \/\/\n        \/\/ Roughly equivalent to :\n        \/\/ if (abs_f &lt; THRESHOLD) {\n        \/\/   exponent -= 41;\n        \/\/   abs_f += SCALE_UP;\n        \/\/ }\n        let scale = get_if_f32(abs_f &lt; THRESHOLD, SCALE_UP,      1_f32);    \n        exponent  = get_if_u32(abs_f &lt; THRESHOLD, exponent - 41, exponent); \n        abs_f     = get_if_f32(abs_f &lt; THRESHOLD, abs_f * scale, abs_f);\n      }}\n    }\n    \/\/ 41 bits per iteration means up to 246 bits shifted.\n    \/\/ Even the smallest subnormal value will end up in the desired range.\n    maybe_scale!();  maybe_scale!();  maybe_scale!();\n    maybe_scale!();  maybe_scale!();  maybe_scale!();\n  }\n\n  \/\/ Now that we know that abs_f is in the desired range (2^87 &lt;= abs_f &lt; 2^128)\n  \/\/ scale it down to be in the range (2^23 &lt;= _ &lt; 2^64), and convert without \n  \/\/ loss of precision to u64.\n  const INV_2_64: f32 = 5.42101086242752217003726400434970855712890625e-20_f32; \/\/ 0x1p-64f == 2^64\n  let a = (abs_f * INV_2_64) as u64;\n\n  \/\/ Count the leading zeros.\n  \/\/ (C++ doesn't provide a compile-time constant function for this. It's nice \n  \/\/ that rust does :)\n  let mut lz = a.leading_zeros();\n\n  \/\/ if the number isn't normalish, lz is meaningless: we stomp it with \n  \/\/ something that will not cause problems in the computation that follows - \n  \/\/ the result of which is meaningless, and will be ignored in the end for \n  \/\/ non-normalish values.\n  lz = get_if_u32(!is_normalish, 0, lz); \/\/ if (!is_normalish) { lz = 0; }\n\n  {\n    \/\/ This step accounts for subnormal numbers, where there are more leading \n    \/\/ zeros than can be accounted for in a valid exponent value, and leading \n    \/\/ zeros that must remain in the final significand.\n    \/\/\n    \/\/ If lz &lt; exponent, reduce exponent to its final correct value - lz will be\n    \/\/ used to remove all of the leading zeros.\n    \/\/\n    \/\/ Otherwise, clamp exponent to zero, and adjust lz to ensure that the \n    \/\/ correct number of bits will remain (after multiplying by 2^41 six times - \n    \/\/ 2^246 - there are 7 leading zeros ahead of the original subnormal's\n    \/\/ computed significand of 0.sss...)\n    \/\/ \n    \/\/ The following is roughly equivalent to:\n    \/\/ if (lz &lt; exponent) {\n    \/\/   exponent = exponent - lz;\n    \/\/ } else {\n    \/\/   exponent = 0;\n    \/\/   lz = 7;\n    \/\/ }\n\n    \/\/ we're about to mess with lz and exponent - compute and store the relative \n    \/\/ value of the two\n    let lz_is_less_than_exponent = lz &lt; exponent;\n\n    lz       = get_if_u32(!lz_is_less_than_exponent, 7,             lz);\n    exponent = get_if_u32( lz_is_less_than_exponent, exponent - lz, 0);\n  }\n\n  \/\/ compute the final significand.\n  \/\/ + 1 shifts away a leading 1-bit for normal, and 0-bit for subnormal values\n  \/\/ Shifts are done in u64 (that leading bit is shifted into the void), then\n  \/\/ the resulting bits are shifted back to their final resting place.\n  let significand = ((a &lt;&lt; (lz + 1)) &gt;&gt; (64 - 23)) as u32;\n\n  \/\/ combine the bits\n  let computed_bits = (sign_bit &lt;&lt; 31) | (exponent &lt;&lt; 23) | significand;\n\n  \/\/ return the normalish result, or the non-normalish result, as appopriate\n  get_if_u32(is_normalish, computed_bits, r)\n}\n\n\n\/\/ Compile-time validation - able to be examined in rust.godbolt.org output\npub static BITS_BIGNUM: u32 = bits(std::f32::MAX);\npub static TBITS_BIGNUM: u32 = bits_transmute(std::f32::MAX);\npub static BITS_LOWER_THAN_MIN: u32 = bits(7.0064923217e-46_f32);\npub static TBITS_LOWER_THAN_MIN: u32 = bits_transmute(7.0064923217e-46_f32);\npub static BITS_ZERO: u32 = bits(0.0f32);\npub static TBITS_ZERO: u32 = bits_transmute(0.0f32);\npub static BITS_ONE: u32 = bits(1.0f32);\npub static TBITS_ONE: u32 = bits_transmute(1.0f32);\npub static BITS_NEG_ONE: u32 = bits(-1.0f32);\npub static TBITS_NEG_ONE: u32 = bits_transmute(-1.0f32);\npub static BITS_INF: u32 = bits(std::f32::INFINITY);\npub static TBITS_INF: u32 = bits_transmute(std::f32::INFINITY);\npub static BITS_NEG_INF: u32 = bits(std::f32::NEG_INFINITY);\npub static TBITS_NEG_INF: u32 = bits_transmute(std::f32::NEG_INFINITY);\npub static BITS_NAN: u32 = bits(std::f32::NAN);\npub static TBITS_NAN: u32 = bits_transmute(std::f32::NAN);\npub static BITS_COMPUTED_NAN: u32 = bits(std::f32::INFINITY\/std::f32::INFINITY);\npub static TBITS_COMPUTED_NAN: u32 = bits_transmute(std::f32::INFINITY\/std::f32::INFINITY);\n\n\n\/\/ Run-time validation of many more values\nfn main() {\n  let end: usize = 0xffff_ffff;\n  let count = 9_876_543; \/\/ number of values to test\n  let step = end \/ count;\n  for u in (0..=end).step_by(step) {\n      let v = u as u32;\n      \n      \/\/ reference\n      let f = unsafe { std::mem::transmute::&lt;u32, f32&gt;(v) };\n      \n      \/\/ compute\n      let c = bits(f);\n\n      \/\/ validation\n      if c != v &amp;&amp; \n         !(f.is_nan() &amp;&amp; c == 0x7fc0_0000) &amp;&amp; \/\/ nans\n         !(v == 0x8000_0000 &amp;&amp; c == 0) { \/\/ negative 0\n          println!(&quot;{:x?} {:x?}&quot;, v, c); \n      }\n  }\n}\n<\/pre><\/div>\n","protected":false},"excerpt":{"rendered":"<p>Some time ago, I wrote &#8220;floats, bits, and constant expressions&#8221; about converting floating point number into its representative ones and zeros as a C++ constant expression &#8211; constructing the IEEE 754 representation without being able to examine the bits directly. I&#8217;ve been playing around with Rust recently, and rewrote that conversion code as a bit &hellip; <a href=\"https:\/\/brnz.org\/hbr\/?p=1850\" class=\"more-link\">Continue reading<span class=\"screen-reader-text\"> &#8220;f32, u32, and const&#8221;<\/span><\/a><\/p>\n","protected":false},"author":2,"featured_media":0,"comment_status":"closed","ping_status":"open","sticky":false,"template":"","format":"standard","meta":[],"categories":[5,26],"tags":[71],"_links":{"self":[{"href":"https:\/\/brnz.org\/hbr\/index.php?rest_route=\/wp\/v2\/posts\/1850"}],"collection":[{"href":"https:\/\/brnz.org\/hbr\/index.php?rest_route=\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/brnz.org\/hbr\/index.php?rest_route=\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/brnz.org\/hbr\/index.php?rest_route=\/wp\/v2\/users\/2"}],"replies":[{"embeddable":true,"href":"https:\/\/brnz.org\/hbr\/index.php?rest_route=%2Fwp%2Fv2%2Fcomments&post=1850"}],"version-history":[{"count":9,"href":"https:\/\/brnz.org\/hbr\/index.php?rest_route=\/wp\/v2\/posts\/1850\/revisions"}],"predecessor-version":[{"id":1865,"href":"https:\/\/brnz.org\/hbr\/index.php?rest_route=\/wp\/v2\/posts\/1850\/revisions\/1865"}],"wp:attachment":[{"href":"https:\/\/brnz.org\/hbr\/index.php?rest_route=%2Fwp%2Fv2%2Fmedia&parent=1850"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/brnz.org\/hbr\/index.php?rest_route=%2Fwp%2Fv2%2Fcategories&post=1850"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/brnz.org\/hbr\/index.php?rest_route=%2Fwp%2Fv2%2Ftags&post=1850"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}