A common C++ idiom for writing a low-level interface that has different implementations for multiple architectures is to use the preprocessor to select the appropriate implementation at compile time. This pattern is frequently used in C++ math libraries.
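For comparison with the Rust code discussed elsewhere on this page, here is a minimal sketch of the same compile-time selection written with Rust's `cfg` attributes instead of the C++ preprocessor. The `imp` module and the `add4` function are hypothetical names chosen for illustration and are not taken from any particular library.

```rust
// Hypothetical example: add two 4-lane float vectors, with the
// implementation chosen at compile time from the target's features.

// Variant compiled only when targeting x86_64 with SSE2 enabled.
#[cfg(all(target_arch = "x86_64", target_feature = "sse2"))]
mod imp {
    use std::arch::x86_64::{_mm_add_ps, _mm_loadu_ps, _mm_storeu_ps};

    pub fn add4(a: [f32; 4], b: [f32; 4]) -> [f32; 4] {
        let mut out = [0.0f32; 4];
        // The cfg above guarantees these instructions exist on the target.
        // The unaligned load/store intrinsics take raw pointers, hence unsafe.
        unsafe {
            let sum = _mm_add_ps(_mm_loadu_ps(a.as_ptr()), _mm_loadu_ps(b.as_ptr()));
            _mm_storeu_ps(out.as_mut_ptr(), sum);
        }
        out
    }
}

// Scalar fallback compiled for every other target.
#[cfg(not(all(target_arch = "x86_64", target_feature = "sse2")))]
mod imp {
    pub fn add4(a: [f32; 4], b: [f32; 4]) -> [f32; 4] {
        [a[0] + b[0], a[1] + b[1], a[2] + b[2], a[3] + b[3]]
    }
}

// Callers see a single function regardless of which variant was built.
pub use imp::add4;
```

Only one of the two `imp` modules is ever compiled, so callers just use `add4(a, b)` and there is no runtime branch.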
Fast iteration times are something that many game developers consider to be of utmost importance. Keeping build times short is a major component of quick iteration for a programmer. Aside from the actual time spent compiling, any wait long enough that you start to lose focus, get distracted or lose track of what you were doing costs you more time.
Thus one of my goals when writing glam was to ensure it was fast to compile. Rust compile times are known to be a bit slow compared to many other languages, and I didn’t want to pour fuel on to that particular fire.
As part of writing glam I also wrote mathbench so I could compare performance with similar libraries. I also always wanted to include build time comparisons as part of mathbench and I’ve finally got around to doing that with a new tool called
glam is a simple and fast Rust linear algebra library for games and
In my last post on optimising my Rust path tracer with SIMD I had got within 10% of my performance target, that is Aras’s C++ SSE4.1 path tracer. From profiling I had determined that the main differences were MSVC using SSE versions of cosf and differences between Rayon and enkiTS thread pools. The first thing I tried was to implement an SSE2 version of sin_cos based on Julien Pommier’s code, which I found via a bit of googling. This was enough to get my SSE4.1 implementation to match the performance of Aras’s SSE4.1 code. I had a slight advantage in that I just call sin_cos as a single function versus separate cos functions, but meh, I’m calling my performance target reached. Final performance results are at the end of this post if you just want to skip to that.
The other part of this post is about Rust’s runtime and compile time CPU feature detection and some wrong turns I took along the way.
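As a rough illustration of the two kinds of detection, the sketch below uses the standard library's `is_x86_feature_detected!` macro for the runtime check and `cfg!(target_feature = ...)` for the compile time check. The functions `sum`, `sum_sse41`, `sum_scalar` and `compiled_with_sse41` are hypothetical and are not code from the path tracer.

```rust
/// Hypothetical example: sum a slice, picking an implementation at runtime.
pub fn sum(values: &[f32]) -> f32 {
    // Runtime detection: the CPU is queried while the program is running,
    // so one binary can work on machines with and without SSE4.1.
    #[cfg(any(target_arch = "x86", target_arch = "x86_64"))]
    {
        if is_x86_feature_detected!("sse4.1") {
            // Sound because SSE4.1 support was confirmed just above.
            return unsafe { sum_sse41(values) };
        }
    }
    sum_scalar(values)
}

// `target_feature` lets the compiler use SSE4.1 instructions inside this one
// function even if the rest of the crate is built without them. Calling it on
// a CPU without SSE4.1 is undefined behaviour, hence `unsafe`.
#[cfg(any(target_arch = "x86", target_arch = "x86_64"))]
#[target_feature(enable = "sse4.1")]
unsafe fn sum_sse41(values: &[f32]) -> f32 {
    // Same logic as the fallback; only the permitted instruction set differs.
    sum_scalar(values)
}

fn sum_scalar(values: &[f32]) -> f32 {
    values.iter().sum()
}

/// Compile time detection: evaluates to a constant fixed when the crate is
/// built, for example true when compiling with `-C target-feature=+sse4.1`.
pub fn compiled_with_sse41() -> bool {
    cfg!(target_feature = "sse4.1")
}
```

The runtime check adds a branch per call to `sum`, while the compile time check costs nothing at runtime but ties the binary to CPUs that support the feature.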
Following on from path tracing in parallel with Rayon I had a lot of other optimisations I wanted to try. In particular I wanted to see if I could match the CPU performance of @aras_p’s C++ path tracer in Rust. He’d done a fair amount of optimising so it seemed like a good target to aim for. To get a better comparison I copied his scene and also added his light sampling approach which he talks about here. I also implemented a live render loop mimicking his.
My initial unoptimized code was processing 10Mrays/s on my laptop. Aras’s code (with GPGPU disabled) was doing 45.5Mrays/s. I had a long way to go from here!
tl;dr did I match the C++ in Rust? Almost. My SSE4.1 version is doing 41.2Mrays/s, about 10% slower than the target 45.5Mrays/s, running on Windows on my laptop. The long answer is more complicated but I will go into that later.