Yak shaving conditional compilation in Rust

In C++ there is a common idiom used when writing a low level interface that has different implementations for multiple architectures. That is to use the preprocessor to select the appropriate implementation at compile time. This pattern is frequently used in C++ math libraries.

For example, the C++ Realtime Math library sets up its own defines based on definitions passed in from the build system:

#if !defined(RTM_NO_INTRINSICS)
  #if defined(__SSE2__) || defined(_M_IX86) || defined(_M_X64)
    #define RTM_SSE2_INTRINSICS
  #endif
#endif

Intrinsics can be entirely disabled if RTM_NO_INTRINSICS is defined. Otherwise the predefined compiler specific definitions (different compilers use different defines for this stuff, welcome to C/C++) __SSE2__, _M_IX86 and _M_X64 are checked, if any of these are defined the Realtime Math library enables SSE2 support with it’s own RTM_SSE2_INTRINSICS define. These defines are then used to select the appropriate implementation to compile:

inline float RTM_SIMD_CALL vector_get_x(vector4f_arg0 input) RTM_NO_EXCEPT
{
#if defined(RTM_SSE2_INTRINSICS)
  return _mm_cvtss_f32(input);
#elif defined(RTM_NEON_INTRINSICS)
  return vgetq_lane_f32(input, 0);
#else
  return input.x;
#endif
}

I don’t know if this idiom has a name but it’s pretty common, especially in math libraries which usually have a lot of small performance critical functions.

From the C preprocessor to Rust

There is nothing quite the same as the C preprocessor in Rust. That is generally considered a good thing, however when it comes to the above idiom there is nothing I’ve found in Rust that feels quite as convenient.

I’ve tried three different approaches to solving this problem in glam which I’ll discuss the pros on cons of here. Perhaps first though, let’s talk about the primary features that Rust provides to solve this problem.

cfg attributes

Rust code can be conditionally compiled using the attributes cfg and cfg_attr and the built-in cfg! macro. These can be used to check conditions like the target architecture of the compiled crate, enabled features passed to the compiler by cargo, and so on. Usually for SIMD we’re interested in target_arch, target_feature and our own crate’s feature flags.

See the conditional compilation section of the Rust book for more details.

Can we achieve the same thing as the C/C++ example above using the cfg attribute?

#[cfg(all(not(feature = "no-intrinsics"), target_feature = "sse2"))]
// now what?

In some ways Rust is simpler in that we only need to check target_feature = sse2 instead of multiple vendor specific compiler flags in C or C++, however unlike #define we are unable to introduce a new feature in Rust code. These can only come from the build system. I will come back to this point later one.

Perhaps instead of setting our own define let’s just make these cfg checks in our function.

#[inline]
impl Vec4 {
  fn x(self) -> f32 {
    #[cfg(all(not(feature = "no-intrinsics"), target_feature = "sse2"))]
    unsafe { _mm_cvtss_f32(self.0) }
    #[cfg(all(not(feature = "no-intrinsics"), target_feature = "wasm32"))]
    unsafe { f32x4_extract_lane(self.0, 0) }
    #[cfg(any(feature = "no-intrinsics", not(any(target_feature = "sse2", target_feature = "wasm32"))))]
    { self.0 }
  }
}

The lack of an else equivalent means this final fallback cfg check is lot more verbose than the C preprocessor version. It might be okay for one or two methods, but not for hundreds. Also there would be a combinatorial explosion every time a new target_feature is supported - the fallback cfg check is going to get even more complex.

If you are familiar with Rust you might be thinking we could use the cfg! macro here, let’s consider that:

#[inline]
impl Vec4 {
  fn x(self) -> f32 {
    if cfg!(all(not(feature = "no-intrinsics"), target_feature = "sse2")) {
      unsafe { _mm_cvtss_f32(self.0) }
    } else if cfg!(all(not(feature = "no-intrinsics"), target_feature = "wasm32")) {
      unsafe { f32x4_extract_lane(self.0, 0) }
    } else {
      self.0
    }
  }
}

This looks much better in principle but it doesn’t compile. The problem here is the compiler still tries to parse every conditional block regardless and if you are on a platform that supports sse2 then the wasm32 module won’t exist, and vice versa.

That is the definition of the problem I’m trying to solve, let’s look at the different approaches I’ve tried with glam.

Conditional compilation by module

This approach splits the different implementations for each architecture into it’s own module and then selects the appropriate module at compile time. This is the first approach I took in glam. Continuing with the above example, separating into modules looks like so:

// vec4_sse2.rs
pub struct Vec4(__m128);
impl Vec4 {
  /// Returns element `x`.
  #[inline]
  pub fn x(self) -> f32 {
    unsafe { _mm_cvtss_f32(self.0) }
  }
}
// file: vec4_wasm32.rs
pub struct Vec4(f32x4);
impl Vec4 {
  /// Returns element `x`.
  #[inline]
  pub fn x(self) -> f32 {
    unsafe { f32x4_extract_lane(self.0) }
  }
}
// file: vec4_f32.rs
pub struct Vec4(f32, f32, f32, f32);
impl Vec4 {
  /// Returns element `x`.
  #[inline]
  pub fn x(self) -> f32 {
    self.0
  }
}
// file: lib.rs
#[cfg(all(not(feature = "no-intrinsics"), target_feature = "sse2"))]
mod vec4_sse2;
#[cfg(all(not(feature = "no-intrinsics"), target_feature = "wasm32"))]
mod vec4_wasm32;
#[cfg(any(feature = "no-intrinsics", not(any(target_feature = "sse2", target_feature = "wasm32"))))]
mod vec4_f32;

#[cfg(all(not(feature = "no-intrinsics"), target_feature = "sse2"))]
pub use vec4_sse2::*;
#[cfg(all(not(feature = "no-intrinsics"), target_feature = "wasm32"))]
pub use vec4_wasm32::*;
#[cfg(any(feature = "no-intrinsics", not(any(target_feature = "sse2", target_feature = "wasm32"))))]
pub use vec4_f32::*;

impl Vec4 {
  // common implementations
}

This simple code sample probably looks verbose compared to the earlier example, however when the module contains 50 or so functions a complete absence of any cfg checks inside each architecture’s module implementation is quite pleasant.

The advantages of doing things this way are:

  • Keeps different implementations separate, so there’s no need for complicated #[cfg(...)] blocks in every method.
  • The implementation for each architecture is normal code and easy to read.

Unfortunately I found a number of downsides to this approach:

  • The interface needs to be duplicated for each implementation - this is mitigated in glam somewhat by ensuring the interface has near 100% test coverage.
  • Documentation also needs to be duplicated for each implementation - this is a bit more annoying Adding additional architectures will start to make the library difficult to maintain due to the above issues

Having to duplicate the interface was annoying and even with a lot of tests sometimes I still made mistakes. Having to duplicate the docs to keep rust doc happy was really the reason I moved away from this method though.

You can see an example of this approach in glam 0.8.5.

Some improvements to this approach were suggested on reddit, see the addendum at the end of this post for more details.

cfg-if

The cfg-if crate is probably the de facto solution to this problem in the Rust ecosystem. The crate provides the cfg_if macro which supports similar functionality to the if/else if/else branches of the C preprocessor. Here’s our familiar sample code using cfg_if.

impl Vec4 {
  /// Returns element `x`.
  #[inline]
  pub fn x(self) -> f32 {
    cfg_if! {
      if #[cfg(all(not(feature = "no-intrinsics"), target_feature = "sse2"))] {
        unsafe { _mm_cvtss_f32(self.0) }
      } else if #[cfg(all(not(feature = "no-intrinsics"), target_feature = "wasm32"))] {
        unsafe { f32x4_extract_lane(self.0) }
      } else {
        self.0
      }
    }
  }
}

This is a pretty good option for mimicking the functionality provided by the C preprocessor and nicely solves the convoluted else block problem you run into using Rust’s cfg attribute on it’s own, but I ran into a few problems.

Aesthetically, I didn’t like having to wrap almost all of the glam code in a macro. This is a bit of a personal preference but I did run into a couple of tooling issues also.

Rustfmt did not format code blocks inside cfg_if macros. I like being able to write code fast and messy and then running cargo fmt to tidy it up when I’m done, so this was a bit of a hindrance to my workflow. The other tooling problem was that tarpaulin, the code coverage tool I am using did not like cfg_if at all it seemed and my test coverage mysteriously dropped from 87% to 73% which was a bit annoying.

The tooling issues are nothing to do with cfg_if and are most probably bugs that may well get fixed one day, but until that happens it felt like I’d gone backwards a bit.

You can see an example of this approach in glam 0.8.6.

build.rs

The cfg_if macro solves the convoluted else block problem that you encounter when using the cfg attribute on it’s own, but the other problem I wanted to solve was my if blocks were also a bit complicated and they were duplicated in every method. What I really wanted was the C preprocessor ability to create new defines. While you can’t do this in your crate, you can do it from build.rs.

The idea here is simplify the feature and target_feature options into a single cfg check. The other idea is to create a cfg for what would be the else condition. The advantage of doing this in the build.rs is I only need to maintain this complicated condition in one place, and I can use normal rust code to define it.

// build.rs
fn main() {
  let force_no_intrinsics = env::var("CARGO_FEATURE_NO_INTRINSICS").is_ok();

  let target_feature_sse2 = env::var("CARGO_CFG_TARGET_FEATURE")
    .map_or(false, |cfg| cfg.split(',').find(|&f| f == "sse2").is_some());

  if target_feature_sse2 && !force_no_intrinsics {
    println!("cargo:rustc-cfg=vec4sse2");
  } else if target_feature_wasm32 && !force_no_intrinsics {
    println!("cargo:rustc-cfg=vec4wasm32");
  } else {
    println!("cargo:rustc-cfg=vec4f32");
  }
}

Then in my Rust code I can check for this with the cfg attribute.

impl Vec4 {
  /// Returns element `x`.
  #[inline]
  pub fn x(self) -> f32 {
    #[cfg(vec4sse2)]
    unsafe {
      _mm_cvtss_f32(self.0)
    }

    #[cfg(vec4wasm32)]
    unsafe {
      f32x4_extract_lane(self.0)
    }

    #[cfg(vec4f32)]
    self.0
  }
}

Each cfg is mutually exclusive so there will never be more than one used in a single method. It’s a bit different if you are used to the C preprocessor if/else if/else conditionals but the end result achieves the same goal.

You can see an example of this approach in glam 0.8.7.

It is possible to pass key value cfg pairs from build.rs so I could have something like #[cfg(myfeature = "sse2")] if I liked, see the cargo rustc-cfg documentation for more details.

Also there’s no reason this technique of creating your own cfg keys with build.rs won’t also work with the other technique’s I’ve explored above.

Many yaks were shaved

This is the end of my journey in conditional compilation in Rust, for now. I’m not totally sure I’ll stick with the solution I’ve got. If I were to support many architectures in glam it may get too unwieldy to continue doing things this way, but that is a future problem.

By modules addendum

CAD1197 suggested some improvements to my by module approach which would address both the duplicated documentation and inconsistent public interface issues I was seeing in my original implementation. The idea is a public interface calls an internal method which is implemented in conditionally compiled modules.

// file: vec4.rs
// public interface with docs

#[cfg(all(not(feature = "no-intrinsics"), target_feature = "sse2"))]
mod vec4_sse2;
#[cfg(all(not(feature = "no-intrinsics"), target_feature = "wasm32"))]
mod vec4_wasm32;
#[cfg(any(feature = "no-intrinsics", not(any(target_feature = "sse2", target_feature = "wasm32"))))]
mod vec4_f32;

#[cfg(all(not(feature = "no-intrinsics"), target_feature = "sse2"))]
pub struct Vec4(__m128);

#[cfg(all(not(feature = "no-intrinsics"), target_feature = "wasm32"))]
pub struct Vec4(f32x4);

#[cfg(any(feature = "no-intrinsics", not(any(target_feature = "sse2", target_feature = "wasm32"))))]
pub struct Vec4(f32, f32, f32, f32);

impl Vec4 {
  /// Returns element `x`.
  #[inline]
  pub fn x(self) -> f32 {
    self._x()
  }
}
// vec4_sse2.rs
use super::Vec4;
impl Vec4 {
  #[inline(always)]
  pub(super) fn _x(self) -> f32 {
    unsafe { _mm_cvtss_f32(self.0) }
  }
}
// file: vec4_wasm32.rs
use super::Vec4;
impl Vec4 {
  #[inline(always)]
  pub(super) fn _x(self) -> f32 {
    unsafe { f32x4_extract_lane(self.0) }
  }
}
// file: vec4_f32.rs
use super::Vec4;
impl Vec4 {
  #[inline(always)]
  pub(super) fn _x(self) -> f32 {
    self.0
  }
}

I think if I were to support a lot of different architectures in glam I would switch to this approach. There is a lot more boilerplate than my current solution, however once the API is stable the boilerplate won’t need to change often.

If you have an comments or feedback you can reply to my posts on /r/rust or twitter.