Sunday, February 20, 2022

Windows spotlight not working

Windows spotlight

 Windows spotlight is responsible for those nice aesthetic pictures on the logon screen. Somehow it was broken on my machine and I was deprived of this feature. Even when one selected the option (under 'settings', 'personalization' 'lock screen'), it would jump to normal picture mode. 

 Google didn't help at first (their search results become worse and worse over the years) until I got the tip from 'Software Test Tips': some necessary background apps where disabled. After enabling the app 'settings' (under 'settings'; 'privacy'; 'background apps') rebooting and waiting some time it works now.

Wednesday, February 16, 2022

Watch out for virtual function invocation

virtual function

 In some heavily invoked code the following construct was used:

interface ITable
{
   virtual std::string GetData() const = 0;
};

class Table : public ITable
{
public:
   virtual std::string GetData() const override
   {
      return std::string{"1"};
   }
};

class TableCt : public Table
{
public:
   long GetDataCt() const
   {
      const std::string strData = GetData();  // watch out
      return std::stol(strData);
   }
};

Client code invokes 'GetDataCt', e.g.:

void f()
{
    TableCt tbl;
    tbl.GetDataCt();  
}

in the real case TableCt wasn't the final class but GetData will not be overriden anywhere else. The code is functional correct but it takes a performance hit since base class function 'GetData' is invoked through vtable:

    TableCt tbl;
    tbl.GetDataCt();
00007FF653F710BC  lea         rdx,[rsp+30h]  
00007FF653F710C1  lea         rcx,[tbl]  
00007FF653F710C6  call        rax  

  On closer thought this is logical since the compiler must deal with 'GetData' being overwritten by a derived class. Therefore explicitly calling the base class function is needed to tell the compiler that it must use the base class:

class TableCt : public Table
{
   long GetDataCt() const
   {
      return std::stol(__super::GetData());
   }
};

 Using the keyword final could also do the trick too although it depends then on the cleverness of the compiler.

Sunday, February 13, 2022

Careful with profiling

Profiling

 The other day I was profiling a SSE optimized distance function and the more optimized form was 3 times as slow as the basic variant. The code was a bit like this (skipping the SSE variant):

template <typename T>
class Point
{
public:
   constexpr      Point     (T x, T y);

   constexpr T    GetX      () const;
   constexpr T    GetY      () const;

private:
   T              m_x;
   T              m_y;
};

template <typename T>
constexpr Point<T>::Point(T x, T y)
:  m_x(x)
,  m_y(y)
{
}

template <typename T>
constexpr T Point<T>::GetX() const
{
   return m_x;
}

template <typename T>
constexpr T Point<T>::GetY() const
{
   return m_y;
}

// explicit (DLL) instantiation
template class __declspec(dllexport) Point<double>;

double DistSqr(const Point<double>& rpt1, const Point<double>& rpt2)
{
    const double dx = rpt1.GetX() - rpt2.GetX();
    const double dy = rpt1.GetY() - rpt2.GetY();
   
    return (dx * dx) + (dy * dy);
}

It turned out that exported functions take a major performance hit; much larger than the two multiplications in 'DistSqr' function. The explicit exported template instantiation exports all functions; even constexpr and inline functions.The reason that calling exported function is slower:

  • invocation is a call instead of one simple memory read instruction
  • it suppresses other optimizations
  • just plain more instructions needed to transform data from 'Point' class to function

Removing the export the function was 3 times faster than without the exported attribute. The accessor functions 'GetX' are then inlined.

Lessons learned:

  • DLL and call invocations can harm performance
  • inspect the assembly

Note: accessor functions like 'GetX'  are prescribed by OOP but be aware of their potential performance cost.



Saturday, February 12, 2022

Mapping enums to enums

Mapping enums to enums

 There is sometimes the need to translate one enumerate value to another. For example let's have the case of enum 'A' and 'B':

enum A
{
   eA0 = 0,
   eA1,
   eA2,
};

enum B
{
   eB0 = 0,
   eB1,
   eB2,
};

A simple translation without asserting on non existing values could work as follow:

B Translate(A eA)
{
    B e = eB0;

    switch (eA)
    {
    case eA0: e = eB0;  break;
    case eA1: e = eB1;  break;
    case eA2: e = eB2;  break;
    }

    return e;
}

This is a straightforward one to one mapping. With Compiler explorer one can see that the Visual Studio 2019 compiler translates this in a bunch of compare and jump statements:

B Translate(A) PROC
        test    ecx, ecx
        je      SHORT $LN4@Translate
        sub     ecx, 1
        je      SHORT $LN5@Translate
        cmp     ecx, 1
        jne     SHORT $LN4@Translate
        mov     eax, 2
        ret     0
$LN5@Translate:
        mov     eax, 1
        ret     0
$LN4@Translate:
        xor     eax, eax
        ret     0
B Translate(A) ENDP   

In this case the value are exact the same so a more optimal solution would be that it just uses the value:

B Translate2(A eA)
{
    return static_cast<B>(eA);
}

In Compiler explorer it can be seen now that this is denser and faster:

B Translate2(EA) PROC
        mov     eax, ecx
        ret     0
EB Translate2(EA) ENDP  

It seems that Visual Studio doesn't recognize this optimization by itself.

More robust

Above cast can be dangerous if values do not correspond. One can trap for this and for accidental changes by adding some static_asserts

enum A
{
   eAbegin = 0,
   eA0     = ABegin,
   eA1,
   eA2,
   eAend
};

B Translate2(A eA)
{
    static_assert(eAbegin == eBbegin);
    static_assert(eAend == eBend);

    // or (verbose and requires updating after adding an enum):
    static_assert(eA0 == eB0);
    static_assert(eA1 == eB1);
    static_assert(eA2 == eB2);
    
    return static_cast<B>(eA);
}
 

Careful with std::ranges

<ranges>   C++20 has added the the ranges library. Basically it works on ranges instead of iterators but added some subtle constraint...