Thursday, September 23, 2021

Careful with that initializer

Initializer

 The other day I used some test coding and created a vector with some points like the following:


std::vector<Point> vecPoint{10};

 Unfortunately this didn't created a std::vector of 10 points. Instead this was intepreted as an initializer_list since the point class had a possiblity to be created from one argument. The code was somewhat similar to this one (but templated):


struct Point
{
   Point (int x = 0, int y = 0);
};

 The Point class was adjusted but the problem is also with std::vector. The ambiguity can easily be solved by introducing a jumper class as size argument:


struct Size
{
   Size(size_t n)
};

Size operator ""_sz(size_t n)
{
   return Size{n};
}

template<class T, class Allocator = std::allocator<T>>
class vector
{
public:
	vector(Size n);
};

 There is no ambiguity now:


void f()
{
  std::vector<Point> vec1{10_sz};  // 10 default points
  std::vector<Point> vec2{10};     // initialized with one point
}

Saturday, August 28, 2021

Domain objects vs primitive types

Domain objects

 In code bases I regularly stumble upon use of primitive types with a particular interpretation. Examples are:

  • int as seconds
  • int x, int y as point location
  • int w, int h as size
  • string for file- and directory names

Using primitive types has some drawbacks:

  • no implicit documentation value
  • error prone with any integer fits the argument
  • possibility to swap arguments, e.g. in case width and height

In this situation it's better to use domain objects which are compile time safe and have an intrinsic doucmentation value. For example in above case it it is preferred to use:

  • std::chrono::seconds
  • Point
  • Size
  • std::filesystem::path

A complete example for time becomes then as follow:


#include <chrono>

using namespace std::chrono_literals;

void SetDurationAlarm(int seconds);
void SetDuration(std::chrono::seconds s);

// client:
SetDuration(100);	// 100 what?
SetDuration(100s);	// 100 seconds no doubt

 Example primitve type arguments:


// correct but error prone

void SetRect(int l, int t, int r, int b);

// client
constexpr int x = 0;
constexpr int y = 0;
constexpr int w = 100;
constexpr int h = 200;

SetRect(x, y, w, h);	      // compiles but wrong
SetRect(x, y, x + w, y + h);  // ok

 It's better to use domain objects:


struct Point
{
   int  m_x;
   int  m_y;
};

struct Size
{
   int  m_cx;
   int  m_cy;
};

void SetRect(const Point& rpt, const Size& rsz);

constexpr Point pt{0, 0};
constexpr Size  size{100, 200};

SetRect(pt, size);

  A careful reader may argue that on the definiton of point and size primitive types are used. Indeed there is an object creation location where you have to take care but the rest of the argument passing is compile time safe when using domain objects as 'Point' and 'Size'.

 Example of argument passing of primitive types:


// correct but error prone

struct Image
{
   Image(int x, int y, int w, int h);   //watch out
};

// client
void FooImage(int x, int y, int w, int h)
{  
   Image img{x, y, w, h};
   
   SetRect(x, y, x + w, y + h);   
   SetLocation(x, y);
   SetSize(w, h);
}

 Example of using domain objects in argument passing:


struct Image
{
   Image(const Point& rpt, const Size& rsz);
};

void FooImage(const Point& rpt, const Size& rsz)
{  
   Image img{rpt, rsz};
   
   SetRect(rpt, rsz);   
   SetLocation(rpt);
   SetSize(rsz);
}

C++ standard

 Unfortuantely C++ does not define many frequently used primtive types like points, size and rects. <chrono> and <filesystem> were only added in C++11; before that there was no duration type and one had to use strings for files and directories. 

 Not sure of things become better in the future since the committee is more interested in adding yet another obscure language feature than adding much needed library support for standard use (e.g. networking; graphics; persistence; cryptography).

  So we ended up that many libraries define their own point, size and rect. This ofc unfortunate with possible conversions errors and that one has to learn a dozen of point classes; each with their own subtle semantics:

Windows POINT
MFC CPoint
Direct2D D2D1_POINT_2U
OpenCV cv::Point2i
Boost.Geometry boost::geometry::model::d2::point_xy
glm vec4

  Note: Boost.Geometry is actual flexible and in 3D vectors are used for points as well.

External links

  •  https://isocpp.github.io/CppCoreGuidelines/CppCoreGuidelines#Rp-direct

Wednesday, August 25, 2021

Unicode characters and ADO Jet.OleDB

Problem

 The other day we had a problem that databases with Chinese characters couldn't be opened with ADO. I am pretty sure this has worked years ago but that was probably Windows 7 back then. 

Example code


#include "tappch.hpp"
#include <cassert>
#include <comutil.h>
#include <filesystem>
#include <iostream>
#include <string>
#include <tchar.h>

#import "msado15.dll"  rename("EOF", "EndOfFile")

namespace
{
   const _bstr_t     g_bstrEmpty;
   constexpr wchar_t g_szDbTest[]   = _T("WL11-旷场.evxt.mdb");
   constexpr wchar_t g_szProvider[] = _T("Provider=Microsoft.Jet.OLEDB.4.0;Data Source='");
  
   std::wstring MakeConnectionStringOleDb(const std::filesystem::path& rpthDatabase)
   {
      const std::wstring strConnection = g_szProvider 
                                       + rpthDatabase.wstring()
                                       + _T("';Persist Security Info=False");
                                       
       return strConnection;
   }
}


int main()
{
   HRESULT hr = ::CoInitialize(nullptr);
      
   try
   {
      ADODB::_ConnectionPtr ptrConnection;   
      hr = ptrConnection.CreateInstance(__uuidof(ADODB::Connection));
         
      const _bstr_t bstrConnection = MakeConnectionStringOleDb(g_szDbTest).c_str();
      hr = ptrConnection->Open(bstrConnection, g_bstrEmpty, g_bstrEmpty, ADODB::adConnectUnspecified);
         
      const long n = ptrConnection->State;
         
      hr = ptrConnection->Close();
   }
   catch (const _com_error& re)
   {
      std::cout << re.Description() << std::endl;
   }
      
   ::CoUninitialize();
      
   return 0;
}

 I was trying to track down the bug in Visual Studio by spitting through assembly code of the Windows DLL of ADO and OleDB. Visual Studio tough crashed suddenly and it took all my carefully created breakpoint locations with it. So I gave up and assume a bug.

 

Sunday, August 15, 2021

Mapping enums to value

Mapping enums

  The STL has map and  std::unorderd_map to map enums to values. For example:

#include <string>
#include <unordered_map>

enum E
{
   e0,
   e1,
};

struct Foo
{
   Foo()
   {
      m_umap.emplace(e0, "Test1");
      m_umap.emplace(e1, "Test2");
   }

   std::string Find(E e) const
   {
      auto it = m_umap.find(e);
      
      return it != m_umap.cend() ? it->second : std::string{};
   }
   
   std::unordered_map<E, std::string>  m_umap;
};
 std::unordered_maps are fast with on average O(c) lookup (besides the hash function). Still it can be more optimal by using an array and use the enum as index:

#include <array>
#include <string>

enum E
{
   e0 = 0,
   e1,
   eEnd,
};

struct Foo
{
   Foo()
   {
      m_a[e0] = "Test1"
      m_a[e1] = "Test2"
   }

   std::string Find(E e) const
   {
      return m_a[e];
   }
   
   std::array<std::string, eEnd>  m_a;
};

This is the optimal form since with array's the values are in contiguous memory and the lookup is O(c) without a the need to calculate a hash beforehand. 

 Other alternatives are using a switch-case or linear lookup in a  std::vector.

 Some performance measurements with a 10 value enum: 


Method Time (s)
array 0.32
map 2.40
switch-case 1.11
unordered_map 1.60

Friday, August 6, 2021

Explicit member function template specialisation on Visual Studio 2019

Template member function templates

  Classes can have member function templates. These member templates can be explicit specialized. For example:

struct Foo
{
   template <typename T>
   void f()
   {
   }
};

template <>
inline void Foo::f<int>()
{
   // full specialization
}
It seems that Visual Studio 2019 allows for an illegal syntax with the full specialization declaration in the class itself:
struct Foo
{
   template <typename T>
   void f();
   
   // illegal?
   template <>
   void Foo::f<int>();
};

template <>
void Foo::f<int>()
{
}
Also the inline specifier isn't necessary then in Visual Studio without running into errors with duplicate symbols.

Links

Sunday, April 11, 2021

Return value optimization and assignment

Return value optimization

 'Return Value Optimization' (RVO) is a compiler optimization where it can apply copy elision in case a function returns an object by value. There are two flavors:

  • RVO
  • NRVO

 In the following example a function returns a class X:

class X
{
};

X Get()
{
   return X{};
}

int main()
{
   X x1 = Get();  // 1 constructor call
}

 Without RVO in above case there would be two constructor calls: one for creating the anonymous temporary and one to copy construct that to the call side.

 With RVO the compiler can directly construct the object in the call side location and circumvent the extra copy. In C++ 17 this is only guaranteed when returning anonymous variables.

 The mechanism works as if the function 'Get' has a hidden argument in which a class of type 'X' can be created directly. The MSDN article in the link below does a great job of explaining it.

Named return value optimization

With 'Named Return Value Optimization' (NRVO) the return variable inside the function has a name. Still the compiler is allowed to optimize this one away as well:

X Get()
{
   X x;  // named value

   return x;
}

int main()
{
   X x1 = Get();
}

  Be aware that the compiler is allowed but not required to remove the extra copy. Observations on Visual Studio 2019:

  • in _DEBUG mode two constructor calls: one inside the function and one copy constructor to the final destination.
  • in RELEASE mode only one constructor call. NRVO was applied here.

Assignment 

Things are less ideal when an object is assigned instead of constructed. Consider the following example:

int main()
{
   X x1;         // 1 constructor call
   x1 = Get();   // 1 constructor and one (move) assignment call
}

The code needs two constructor- and one assignment calls: The 'Get' function needs again a type X for its hidden argument and the compiler creates a temporary hidden object to be filled inside the function. This temporary is then assigned to call side variable 'x1'. 

 From a performance perspective this is less ideal to fill an object; especially if default constructors are relative heavy. It can be considered to switch back to the classic way of filling an object through argument. This can be an attractive alternative when assignment is a dominant use case:

void Fill(X* p)
{
}

int main()
{
   X x1;         // 1 constructor call
   Fill(&x1);
}

Links

Wednesday, March 24, 2021

FILE_ATTRIBUTE_UNPINNED

GetFileAttributes

 The other day I encountered an assert in some external library code. The code asserted that the high DWORD bits of the file attribute (from GetFileAttributes) should be zero. The file attribute value was 0x00100020. 

 It turned out that the high part comes from a file attribute defined in 'winnt.h':


#define FILE_ATTRIBUTE_UNPINNED             0x00100000

  This file attribute is not described in MSDN (yet). It seems to be related to 'OneDrive'.

Sunday, March 7, 2021

Enums in a container

Enums

 Enumerates (enums) are often used to distinguish mutual exclusive properties. An enum variable can have than only one value. There are other use cases where more than one enum value can be active at the same time. If these enums are unique and ascending it gives a possibility for optimized storage by using std::bitset over other STL containers.

 Container

 To allow non mutual exclusive enums and store them in a container the following options exist:

  • use a std::vector
  • use a std::bitset
  • enums are represented by unique
 using a std:vector is the traditional solution but using the alternative of a std::bitset is more optimal; both in performance as in less memory consumption. The last option is effectively the same as using bitset but enums are more naturllly sequentially ordened.

Example 

Suppose you have the case where used file types are represented by enums and there is a 'Content' class which gives an oversight of all used file types.


enum EFileTypes
{
   eFtBegin = 0,
   eFtData  = eFtBegin,
   eFtBitmap,
   eFtConfig,
   eFtEnd,
};

using std::vector 

Normally one uses a std::vector to store elements. Above example would then become like this:


#include <algorithm>
#include <vector>


struct Content
{
   void AddFileType(EFileTypes eFt)
   {
      if (!HasFileType(eFt))
      {
         m_vecTypes.push_back(eFt);
      }
   }

   bool HasFileType(EFileTypes eFt) const
   {
      return std::find(m_vecTypes.cbegin(), m_vecTypes.cend(), eFt) != m_vecTypes.cend();
   }

   std::vector<EFileTypes>	m_vecTypes;
};

using std::bitset

 with a bitset one can use the (unique) enum value as position in a bitset:


#include <bitset>


struct Content
{
   void AddFileType(EFileTypes eFt)
   {
      m_btsTypes.set(eFt);
   }

   bool HasFileType(EFileTypes eFt) const
   {
      return m_btsTypes.test(eFt);
   }

   std::bitset<eFtEnd>	m_btsTypes;
};

 Using a bitset has thus the following advantages:

  • O (c) lookup to check if an enum is present
  • memory dense when there are only a couple of enums. Only when you have a large enum range (e.g. > 64) this can become a less attractive option since the bitset is always the size of the largest enum value
  • equality operator is easily made

 

Sunday, January 31, 2021

Multiplication and the optimizer

Test case

 the other day I was profiling the difference between floating point and integer multiplication. To test the multiplication contribution with the least amount of overhead contribution the functions did a number of multiplications. 

floating point

The test for floating point case was as follows:


double PrfMultiply(size_t nMax, double d, double dNumber)
{
   for (size_t n = 0; n != nMax; ++n)
   {
      d *= dNumber;
      d *= dNumber;
      d *= dNumber;
      d *= dNumber;

      d += (d < 100 ? 100.0 : 0);
   }

   return d;
}

  For x64 the Visual Studio compiler uses SSE instructions for floating point for numbers so it's no surprise then to see four (scalar) multiplication instructions in the generated code:


00007FF7C0D41700  mulsd       xmm7,xmm1  
00007FF7C0D41704  mulsd       xmm7,xmm1  
00007FF7C0D41708  mulsd       xmm7,xmm1  
00007FF7C0D4170C  mulsd       xmm7,xmm1 

Note: it's also remarkable that the compiler only uses the scalar SSE instruction and not invokes the packed variant (mulpd); From this table one can also see that (double) floating point multiplication lasts around 5 CPU cycles which makes it a fairly fast instruction

integer

  For integer the function looks almost the same:


size_t PrfIntegerMultiply(size_t nMax, size_t n, size_t nNumber)
{
   for (size_t n2 = 0; n2 != nMax; ++n2)
   {
      n *= nNumber;
      n *= nNumber;
      n *= nNumber;
      n *= nNumber;

      n -= (n > 1000 ? 995 : 0);
   }

   return n;
}

 It turns out that Visual Studio 2019 (in release mode) already optimized the four multiplication steps and coalesced them in one multiplication instruction (3^4 == 81 == 51h):


00007FF61C131840  imul        rbx,rbx,51h  

Conclusion

When profiling it's always advisable to look at the generated assembly code. Often the optimizer is smarter than you think or may even completely removes function invocation. This is especially the case with fixed predefined numbers and (static) functions in translation units.

Tuesday, January 12, 2021

CComPtr

IUnknown

 COM is Microsoft's component framework. It was created in the 90's but still used for native development and UWP. All components in this framework must support the IUnknown interface:


interface IUnknown
{
   virtual HRESULT STDMETHODCALLTYPE QueryInterface(REFIID riid, void** ppvObject) = 0;
   virtual ULONG   STDMETHODCALLTYPE AddRef() = 0;
   virtual ULONG   STDMETHODCALLTYPE Release() = 0;
};

This interface has three functions:

  • QueryInterface to query and access the components supported interfaces
  • AddRef to increment the reference count
  • Release to decrement the reference count. When the reference count goes to zero the component is destroyed.

 Client code must adhere to the protocol of incrementing the interface when using it and releasing the interface when done. Mostly components returned from functions are already incremented so that client code only need to decrement the reference count.

 Example

  For the example here the function to create error info is used. The address of a pointer must supplied and when successful the interface must be released:


ICreateErrorInfo* pErrorInfo = nullptr;

HRESULT hr = ::CreateErrorInfo(&pErrorInfo);

if (pErrorInfo)
{
   pErrorInfo->Release();
}

 One of the major error causes when using COM is that reference counts are not administered correctly:

  • when reference counts are more released than incremented by client code it causes cashes and access violations. This may happen beyond the fault location (in time and space).
  • when too few reference counts are released it may cause resource leaks. For example on my work in the past a colleague released one VMR9 interface too little. This resulted in a full thread leak since the VMR9 is a fat object.

 Smart pointer

 Luckily Microsoft has acknowledged this problem and created smart pointer classes for COM interfaces. There are two flavor's. One comes from the compiler support classes '_com_ptr_t'. The other one is CComPtr and comes from the ATL library. The CComPtr is desmontrated in the followign examples.

Code in above example becomes easier especially when there would be multiple return paths:


CComPtr<ICreateErrorInfo> ptrErrorInfo;
HRESULT hr = ::CreateErrorInfo(&ptrErrorInfo);

if (ptrErrorInfo)
{
   //no release necessary
}

 As usual with smart pointers they work transparent. One can assign them to other smart pointers or even return from functions:


CComPtr<ICreateErrorInfo> CreateErrorInfoPtr()
{
  CComPtr<ICreateErrorInfo> ptrErrorInfo;
  HRESULT hr = ::CreateErrorInfo(&ptrErrorInfo);

  return ptrErrorInfo;
}

const CComPtr<ICreateErrorInfo> ptr = CreateMyErrorInfoPtr();

 With returning raw pointers a leak is easily created when the client code ignores the created interface return value. In above case with smart pointers it is suboptimal but it wouldn't hurt when client code ignores the return value since the returned temporary object goes out of scope and releasing thereby the created interface.

 Implementation

  A possible implementation could be as follow (borrowed and modified from the original source):


template <class T>
class CComPtr
{
public:
   CComPtr()
      : m_p(nullptr)
   {
   }

   CComPtr(T* p)
      : m_p(p)
   {
      if (m_p != nullptr)
         m_p->AddRef();
   }

   CComPtr(const CComPtr& rptr)
      : CComPtr(rptr.m_p)
   {
   }

   ~CComPtr()
   {
      if (m_p)
         m_p->Release();
   }

   CComPtr& operator=(const CComPtr& rptr)
   {
      if (m_p != rptr.m_p)
      { 
         if (m_p) 
            m_p->Release();
         
         m_p = rptr.m_p;
         m_p->AddRef();
      }

      return *this;
   }

private:
   T*    m_p;
};

Conclusion

  CComPtr is one the best addition to COM programming. It greatly simplifies COM client implementaitons and solves almost completely all reference counting and tracking issues. It's actually hard to do wrong with CComPtr since it also asserts when a contained interface is overwritten accidently.

 'Effective COM' mentions in item 22 that 'smart interface pointers add at least as much complexity as they remove'. I strongly disagree as can be read from this article. The book mentions a small problem with old CComPtr which is also solved in the latest release of ATL.


Sunday, January 10, 2021

C++ solutions for some C issues

The C language

 The C language was co-developed with UNIX and played and important part in the ICT history. It was a small language with little overhead. Unfortunately in the use of it it had some issues as well:

  • dangling pointers
  • memory leaks
  • buffer overruns

 C++

  C++ original goal was to offer and object oriented programming language compatible with C. Later it incorporated generics as well in the form of 'templates'. 

 Modern C++ offers solutions for above issues:

  • smart pointers like unique_ptr and shared_ptr own a memory resource. The smart pointer releases the memory when the last reference to the smart pointer is going out of scope. This solves the problems of dangling pointers and memory leaks. Note that shared_ptr's are not completely opaque: they still have some sharp edges as well like the circular reference problem and the inability to used shared_from_this from a constructor
  • std::vector offers a safe way to manage a contiguous buffer. It has iterators for access and it automatically grows when elements are added. The memory is released when going out of scope. Again it offers an alternative for all problems above.
  • std::string is comparable with std::vector but is specialized for strings. C uses character arrays and they are prone for all of the above mentioned problems

Good C++ code can be as fast or even faster than corresponding C code. As usual one has to know the idioms and read the standard C++ books.

Example 

Suppose you have a function which fills a variable length buffer and do some processing on it. It has multiple early out paths.

 C case

In C this could be:


#include <stdlib.h>

bool f()
{
   size_t nLen = GetBufferLength();

   int* p = malloc(nLen * sizeof(int));
   
   if (!GetBuffer(p, nLen))
   {
      free(p);
      return false;
   }

   if (!EncodeBuffer(p, nLen))
   {
      free(p);
      return false;
   }
   
   free(p);
   return true;
}

 C++ case

 In C++ one can use std:vector as variable length buffer:


#include <vector>

bool f()
{
   const size_t nLen = GetBufferLength();

   std::vector<int>	vec(nLen);
   
   if (!GetBuffer(vec.data(), vec.size()))
   {
      return false;
   }

   if (!EncodeBuffer(vec.data(), vec.size()))
   {
      return false;
   }
   
   return true;
}

There is only one small performance drawback in using std::vector: its elements get default or zero initialized which may be an issue in case a huge buffer is allocated.

Sunday, January 3, 2021

Visual Studio std::pow implementations

pow

After an upgrade of Visual Studio 2017 to 2019 it was noticed that regression tests were failing with the new version. There were multiple causes; one of them was that (yet again) the implementation of std::pow had changed.

 The Visual Studio 2017 implementation uses a different code path for the common power 2 (square) case: it issues a simple multiplication. The implementation is something like this:


_Check_return_ inline double pow(_In_ double _Xx, _In_ int _Yx) noexcept
       {
       if (_Yx == 2)
              return (_Xx * _Xx);

       return (_CSTD pow(_Xx, static_cast<double>(_Yx)));
       }

 The Visual Studio 2019 implementation isn't available in source code form but the exceptional code path for calculating the square seems not present anymore. This gives (small) differences with some numbers, e.g. the square of '0.10000000055703842' gives a different result.

Alternative

Luckily boost offers an alternative for calculating squared and other integer powers known at compile time in its math library:


#include <boost/math/special_functions/pow.hpp>

constexpr double d = 0.10000000055703842;

const double d2 = boost::math::pow<2>(d);

 Using this function should give stable result for the coming upgrades of Visual Studio. It has also the extra benefit of better performance. Results of a test case with running many power calculations:

Function Time (s)
boost::math::pow<2> 0.241
std::pow 9.395

  Note that instead of using a power function direct multiplication is ofc also possible. Often though these power functions are fed with another calculated value which otherwise has to be duplicated or write down explicitly:


const double d = std::sqrt(boost::math::pow<2>(pt.x - x) + boost::math::pow<2>(pt.y - y));

 Boost's math::pow has the extra benefit of doing the least amount of multiplications in case the power is larger than 3.

Saturday, January 2, 2021

Careful with std::mutex

 recursive_mutex

 std::recursive_mutex can be locked again by the owning thread without blocking. A normal std::mutex doesn't support this behavior and attempt to lock it again from the owning thread is even undefined behavior.

 Still C++ experts promote std::mutex over std::recursive_mutex. The locking is more clear but programmers need to take extra care now.

Example

 Suppose a class with two data members (A and B) and the data needs to be protected from access by multiple threads. The class has two functions; one to change A and one to change A and B. The following implementation is wrong:


#include <mutex>

class Dummy
{
public:
   void SetA(int a)
   {
      std::unique_lock<std::mutex> lck(m_mtx);

      m_a = a;
   }

   void SetAB(int a, int b)
   {
      std::unique_lock<std::mutex> lck(m_mtx);

      m_b = b;

      SetA(a);  // error: mutex is still locked
   }

private:
   std::mutex  m_mtx;
   int         m_a;
   int         m_b;
};

  In the function SetAB the mutex is still locked when invoking SetA leading to undefined behavior.

 Fixing this by adding a SetB function and invoking this function is also incorrect:


class Dummy
{
public:
   void SetA(int a)
   {
      std::unique_lock<std::mutex> lck(m_mtx);

      m_a = a;
   }

   void SetB(int b)
   {
      std::unique_lock<std::mutex> lck(m_mtx);

      m_b = b;
   }

   void SetAB(int a, int b)
   {
      SetA(a);  
      SetB(b);  
   }

   // rest same as before
};

  This is wrong because of another reason: A and B are now not modified under the same lock. A race condition may emerge where A is already changed but B not yet. This may break an invariant if A and B need to be updated together.

 There are multiple solutions to this issue:

  • differentiate between locking and non locking (implementation) functions. Drawback is that the number of (private) member functions are increasing and thereby obfuscating the design
  • set member data A and B directly without using member or other functions. For simple functions like class above this is okay but many times the SetXxx member function does extra things (e.g. notifying clients). Duplicating these code is not attractive.

 Example of differentiate between locking and non locking functions:

class Dummy
{
public:
   void SetA(int a)
   {
      std::unique_lock<std::mutex> lck(m_mtx);

      SetAImpl(a);
   }

   void SetB(int b)
   {
      std::unique_lock<std::mutex> lck(m_mtx);

      SetBImpl(b);
   }

   void SetAB(int a, int b)
   {
      std::unique_lock<std::mutex> lck(m_mtx);

      SetAImpl(a);
      SetBImpl(b);
   }

private:
   void SetAImpl(int a)
   {
      m_a = a;
   }

   //etc.
   
   // rest same as before
};

  Above example is a simplified version. In real code classes may have dozens of member functions. Also the invocation of another member function when the mutex is already locked may happen indirect; e.g. through an event notification.

Careful with std::ranges

<ranges>   C++20 has added the the ranges library. Basically it works on ranges instead of iterators but added some subtle constraint...