Sunday, January 31, 2021

Multiplication and the optimizer

Test case

 the other day I was profiling the difference between floating point and integer multiplication. To test the multiplication contribution with the least amount of overhead contribution the functions did a number of multiplications. 

floating point

The test for floating point case was as follows:


double PrfMultiply(size_t nMax, double d, double dNumber)
{
   for (size_t n = 0; n != nMax; ++n)
   {
      d *= dNumber;
      d *= dNumber;
      d *= dNumber;
      d *= dNumber;

      d += (d < 100 ? 100.0 : 0);
   }

   return d;
}

  For x64 the Visual Studio compiler uses SSE instructions for floating point for numbers so it's no surprise then to see four (scalar) multiplication instructions in the generated code:


00007FF7C0D41700  mulsd       xmm7,xmm1  
00007FF7C0D41704  mulsd       xmm7,xmm1  
00007FF7C0D41708  mulsd       xmm7,xmm1  
00007FF7C0D4170C  mulsd       xmm7,xmm1 

Note: it's also remarkable that the compiler only uses the scalar SSE instruction and not invokes the packed variant (mulpd); From this table one can also see that (double) floating point multiplication lasts around 5 CPU cycles which makes it a fairly fast instruction

integer

  For integer the function looks almost the same:


size_t PrfIntegerMultiply(size_t nMax, size_t n, size_t nNumber)
{
   for (size_t n2 = 0; n2 != nMax; ++n2)
   {
      n *= nNumber;
      n *= nNumber;
      n *= nNumber;
      n *= nNumber;

      n -= (n > 1000 ? 995 : 0);
   }

   return n;
}

 It turns out that Visual Studio 2019 (in release mode) already optimized the four multiplication steps and coalesced them in one multiplication instruction (3^4 == 81 == 51h):


00007FF61C131840  imul        rbx,rbx,51h  

Conclusion

When profiling it's always advisable to look at the generated assembly code. Often the optimizer is smarter than you think or may even completely removes function invocation. This is especially the case with fixed predefined numbers and (static) functions in translation units.

No comments:

Post a Comment

Careful with std::ranges

<ranges>   C++20 has added the the ranges library. Basically it works on ranges instead of iterators but added some subtle constraint...