Saturday, August 28, 2021

Domain objects vs primitive types

Domain objects

 In code bases I regularly stumble upon the use of primitive types that carry a particular interpretation. Examples are:

  • int as seconds
  • int x, int y as point location
  • int w, int h as size
  • string for file- and directory names

Using primitive types has some drawbacks:

  • no implicit documentation value
  • error prone, since any integer fits the argument
  • arguments can be swapped unnoticed, e.g. width and height

In this situation it is better to use domain objects, which are compile-time safe and have intrinsic documentation value. For the cases above the preferred types are:

  • std::chrono::seconds
  • Point
  • Size
  • std::filesystem::path

A complete example for time then becomes:


#include <chrono>

using namespace std::chrono_literals;

void SetDurationAlarm(int seconds);
void SetDuration(std::chrono::seconds s);

// client:
SetDurationAlarm(100);	// 100 what?
SetDuration(100s);	// 100 seconds no doubt
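
A further benefit can be sketched as follows. The `SetDuration` below is a hypothetical stand-in that only records its argument in an assumed global `g_duration`; the point is that a coarser unit such as minutes converts implicitly and without loss, while a bare integer is rejected at compile time.

```cpp
#include <cassert>
#include <chrono>
#include <type_traits>

using namespace std::chrono_literals;

// Hypothetical stand-in for SetDuration: it only records the value.
inline std::chrono::seconds g_duration{0};

void SetDuration(std::chrono::seconds s)
{
   assert(s >= 0s);   // sketch assumption: negative durations are not meaningful here
   g_duration = s;
}

// A bare int does not convert implicitly to std::chrono::seconds ...
static_assert(!std::is_convertible_v<int, std::chrono::seconds>);
// ... but a coarser unit such as minutes does, without loss.
static_assert(std::is_convertible_v<std::chrono::minutes, std::chrono::seconds>);
```

A call such as `SetDuration(2min)` then stores 120 seconds without any manual multiplication.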

 Example with primitive type arguments:


// correct but error prone

void SetRect(int l, int t, int r, int b);

// client
constexpr int x = 0;
constexpr int y = 0;
constexpr int w = 100;
constexpr int h = 200;

SetRect(x, y, w, h);	      // compiles but wrong
SetRect(x, y, x + w, y + h);  // ok

 It's better to use domain objects:


struct Point
{
   int  m_x;
   int  m_y;
};

struct Size
{
   int  m_cx;
   int  m_cy;
};

void SetRect(const Point& rpt, const Size& rsz);

constexpr Point pt{0, 0};
constexpr Size  size{100, 200};

SetRect(pt, size);

  A careful reader may argue that the definitions of Point and Size still use primitive types. Indeed, care is still needed at the location where the objects are created, but from there on argument passing is compile-time safe when using domain objects such as 'Point' and 'Size'.
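
 The compile-time safety can be made visible with a small sketch. `Rect` and `MakeRect` are hypothetical names that do not appear in the code above, and the non-negative size check is an assumption of this sketch only:

```cpp
#include <cassert>
#include <type_traits>

struct Point
{
   int  m_x;
   int  m_y;
};

struct Size
{
   int  m_cx;
   int  m_cy;
};

// Hypothetical Rect type, only for illustration.
struct Rect
{
   int  m_left;
   int  m_top;
   int  m_right;
   int  m_bottom;
};

// The only place where the raw ints are combined; callers pass typed objects.
inline Rect MakeRect(const Point& rpt, const Size& rsz)
{
   assert(rsz.m_cx >= 0 && rsz.m_cy >= 0);   // sketch assumption: sizes are non-negative

   return Rect{rpt.m_x, rpt.m_y, rpt.m_x + rsz.m_cx, rpt.m_y + rsz.m_cy};
}

// Passing the arguments in the wrong order no longer compiles.
static_assert(std::is_invocable_v<decltype(&MakeRect), Point, Size>);
static_assert(!std::is_invocable_v<decltype(&MakeRect), Size, Point>);
```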

 Example of argument passing of primitive types:


// correct but error prone

struct Image
{
   Image(int x, int y, int w, int h);   //watch out
};

// client
void FooImage(int x, int y, int w, int h)
{  
   Image img{x, y, w, h};
   
   SetRect(x, y, x + w, y + h);   
   SetLocation(x, y);
   SetSize(w, h);
}

 Example of using domain objects in argument passing:


struct Image
{
   Image(const Point& rpt, const Size& rsz);
};

void FooImage(const Point& rpt, const Size& rsz)
{  
   Image img{rpt, rsz};
   
   SetRect(rpt, rsz);   
   SetLocation(rpt);
   SetSize(rsz);
}

C++ standard

 Unfortunately C++ does not define many frequently used domain types such as point, size and rect. <chrono> was only added in C++11 and <filesystem> in C++17; before that there was no standard duration type and one had to use strings for files and directories. 

 Not sure if things will become better in the future, since the committee seems more interested in adding yet another obscure language feature than in adding much needed library support for standard use (e.g. networking, graphics, persistence, cryptography).

  So we ended up with many libraries defining their own point, size and rect. This is of course unfortunate: conversions are a source of errors and one has to learn a dozen point classes, each with its own subtle semantics:

  • Windows: POINT
  • MFC: CPoint
  • Direct2D: D2D1_POINT_2U
  • OpenCV: cv::Point2i
  • Boost.Geometry: boost::geometry::model::d2::point_xy
  • glm: vec4

  Note: Boost.Geometry is actually flexible, and in 3D graphics vectors are used for points as well.
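
  One way to contain the conversion risk is to keep a single explicit conversion function per pair of libraries, so that the translation happens in one place. The sketch below uses two hypothetical point types, `LibAPoint` and `LibBPoint`, as stand-ins; they are not the real library types.

```cpp
#include <cassert>
#include <limits>

// Hypothetical stand-ins for two library point types with different layouts.
struct LibAPoint
{
   long x;
   long y;
};

struct LibBPoint
{
   int  m_x;
   int  m_y;
};

// Single, explicit conversion; narrowing is checked in debug builds.
inline LibBPoint ToLibB(const LibAPoint& rpt)
{
   assert(rpt.x >= std::numeric_limits<int>::min() && rpt.x <= std::numeric_limits<int>::max());
   assert(rpt.y >= std::numeric_limits<int>::min() && rpt.y <= std::numeric_limits<int>::max());

   return LibBPoint{static_cast<int>(rpt.x), static_cast<int>(rpt.y)};
}

inline LibAPoint ToLibA(const LibBPoint& rpt)
{
   return LibAPoint{rpt.m_x, rpt.m_y};   // int to long never narrows
}
```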

External links

  •  https://isocpp.github.io/CppCoreGuidelines/CppCoreGuidelines#Rp-direct

Wednesday, August 25, 2021

Unicode characters and ADO Jet.OleDB

Problem

 The other day we had a problem: databases with Chinese characters in the file name couldn't be opened with ADO. I am pretty sure this worked years ago, but that was probably on Windows 7. 

Example code


#include "tappch.hpp"
#include <cassert>
#include <comutil.h>
#include <filesystem>
#include <iostream>
#include <string>
#include <tchar.h>

#import "msado15.dll"  rename("EOF", "EndOfFile")

namespace
{
   const _bstr_t     g_bstrEmpty;
   // wide literals match wchar_t regardless of the UNICODE setting
   constexpr wchar_t g_szDbTest[]   = L"WL11-旷场.evxt.mdb";
   constexpr wchar_t g_szProvider[] = L"Provider=Microsoft.Jet.OLEDB.4.0;Data Source='";
  
   std::wstring MakeConnectionStringOleDb(const std::filesystem::path& rpthDatabase)
   {
      const std::wstring strConnection = g_szProvider 
                                       + rpthDatabase.wstring()
                                       + L"';Persist Security Info=False";
                                       
       return strConnection;
   }
}


int main()
{
   HRESULT hr = ::CoInitialize(nullptr);
      
   try
   {
      ADODB::_ConnectionPtr ptrConnection;   
      hr = ptrConnection.CreateInstance(__uuidof(ADODB::Connection));
         
      const _bstr_t bstrConnection = MakeConnectionStringOleDb(g_szDbTest).c_str();
      hr = ptrConnection->Open(bstrConnection, g_bstrEmpty, g_bstrEmpty, ADODB::adConnectUnspecified);
         
      const long n = ptrConnection->State;
         
      hr = ptrConnection->Close();
   }
   catch (const _com_error& re)
   {
      std::cout << re.Description() << std::endl;
   }
      
   ::CoUninitialize();
      
   return 0;
}

 I tried to track down the bug in Visual Studio by digging through the assembly code of the Windows ADO and OLE DB DLLs. Visual Studio, though, suddenly crashed and took all my carefully created breakpoint locations with it. So I gave up and assume it is a bug.

 

Sunday, August 15, 2021

Mapping enums to value

Mapping enums

  The STL has std::map and std::unordered_map to map enums to values. For example:

#include <string>
#include <unordered_map>

enum E
{
   e0,
   e1,
};

struct Foo
{
   Foo()
   {
      m_umap.emplace(e0, "Test1");
      m_umap.emplace(e1, "Test2");
   }

   std::string Find(E e) const
   {
      auto it = m_umap.find(e);
      
      return it != m_umap.cend() ? it->second : std::string{};
   }
   
   std::unordered_map<E, std::string>  m_umap;
};
 std::unordered_map is fast, with O(1) lookup on average (not counting the hash function). Still, it can be done more efficiently by using an array with the enum as index:

#include <array>
#include <string>

enum E
{
   e0 = 0,
   e1,
   eEnd,
};

struct Foo
{
   Foo()
   {
      m_a[e0] = "Test1";
      m_a[e1] = "Test2";
   }

   std::string Find(E e) const
   {
      return m_a[e];
   }
   
   std::array<std::string, eEnd>  m_a;
};

This is the optimal form, since with arrays the values are in contiguous memory and the lookup is O(1) without the need to calculate a hash beforehand. 

 Other alternatives are using a switch-case or linear lookup in a  std::vector.
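
 The switch-case alternative could look like the sketch below. It is a free function rather than a member of 'Foo', and the assert on the valid range is an assumption of this sketch, not something the measurements depend on.

```cpp
#include <cassert>
#include <string>

enum E
{
   e0 = 0,
   e1,
   eEnd,
};

// Switch-case variant of Find; no container is needed at all.
std::string Find(E e)
{
   assert(e >= e0 && e < eEnd);   // sketch assumption: only mapped values are passed

   switch (e)
   {
      case e0:   return "Test1";
      case e1:   return "Test2";
      case eEnd: break;           // sentinel, has no mapping
   }

   return std::string{};
}
```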

 Some performance measurements with a 10 value enum: 


Method         Time (s)
array          0.32
map            2.40
switch-case    1.11
unordered_map  1.60

Friday, August 6, 2021

Explicit member function template specialisation on Visual Studio 2019

Member function templates

  Classes can have member function templates. These member templates can be explicitly specialized. For example:

struct Foo
{
   template <typename T>
   void f()
   {
   }
};

template <>
inline void Foo::f<int>()
{
   // full specialization
}
It seems that Visual Studio 2019 also accepts a presumably illegal syntax, with the declaration of the full specialization inside the class itself:
struct Foo
{
   template <typename T>
   void f();
   
   // illegal?
   template <>
   void Foo::f<int>();
};

template <>
void Foo::f<int>()
{
}
In that case Visual Studio also does not require the inline specifier; omitting it does not lead to duplicate symbol errors.
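
For code that also has to build with other compilers, two forms that should be portable are sketched below: the specialization at namespace scope right after the class, or, from C++17 on, avoiding the specialization altogether with `if constexpr`. The struct `Bar` and the return values are illustrative only.

```cpp
#include <cassert>
#include <type_traits>

struct Foo
{
   template <typename T>
   int f() const
   {
      return 0;   // primary template
   }
};

// Declared and defined at namespace scope; 'inline' keeps it header-safe.
template <>
inline int Foo::f<int>() const
{
   return 1;      // full specialization for int
}

struct Bar
{
   template <typename T>
   int f() const
   {
      // C++17 alternative: branch on the type instead of specializing.
      if constexpr (std::is_same_v<T, int>)
      {
         return 1;
      }
      else
      {
         return 0;
      }
   }
};
```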

Sunday, April 11, 2021

Return value optimization and assignment

Return value optimization

 'Return Value Optimization' (RVO) is a compiler optimization that applies copy elision when a function returns an object by value. There are two flavors:

  • RVO
  • NRVO

 In the following example a function returns a class X:

class X
{
};

X Get()
{
   return X{};
}

int main()
{
   X x1 = Get();  // 1 constructor call
}

 Without RVO there would be two constructor calls in the case above: one to create the anonymous temporary and one to copy construct it at the call site.

 With RVO the compiler constructs the object directly in the call site location and avoids the extra copy. Since C++17 this is guaranteed, but only when returning an unnamed temporary (a prvalue).

 The mechanism works as if the function 'Get' has a hidden argument in which an object of type 'X' can be created directly. The MSDN article in the link below does a great job of explaining it.
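
 The "as if" mechanism can be imitated by hand. The sketch below is only a model of the idea, not what any particular compiler emits; `GetLowered` is a hypothetical name and the member `m_n` is added only so there is something to observe.

```cpp
#include <cassert>
#include <new>

struct X
{
   int m_n = 42;
};

// What the programmer writes.
X Get()
{
   return X{};
}

// Hand-written model of the hidden argument: the caller passes the address
// of uninitialized storage and the callee constructs the result in place.
X* GetLowered(void* pResult)
{
   return ::new (pResult) X{};
}
```

 A caller would reserve suitably aligned storage (e.g. `alignas(X) unsigned char buffer[sizeof(X)];`), pass it to `GetLowered` and destroy the object explicitly afterwards.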

Named return value optimization

With 'Named Return Value Optimization' (NRVO) the return variable inside the function has a name. Still the compiler is allowed to optimize this one away as well:

X Get()
{
   X x;  // named value

   return x;
}

int main()
{
   X x1 = Get();
}

  Be aware that the compiler is allowed but not required to remove the extra copy. Observations on Visual Studio 2019:

  • in _DEBUG mode two constructor calls: one inside the function and one copy constructor to the final destination.
  • in RELEASE mode only one constructor call. NRVO was applied here.

Assignment 

Things are less ideal when an object is assigned instead of constructed. Consider the following example:

int main()
{
   X x1;         // 1 constructor call
   x1 = Get();   // 1 constructor and one (move) assignment call
}

The code needs two constructor calls and one assignment call: the 'Get' function again needs an object of type X for its hidden argument, so the compiler creates a hidden temporary that is filled inside the function. This temporary is then assigned to the call site variable 'x1'. 
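
 The call counts can be checked with an instrumented class. `Tracked` and its counters are hypothetical and exist only for this sketch; it assumes a C++17 compiler.

```cpp
#include <cassert>

// Hypothetical instrumented class that counts constructions and assignments.
struct Tracked
{
   static inline int s_nConstruct = 0;
   static inline int s_nAssign    = 0;

   static void Reset() { s_nConstruct = 0; s_nAssign = 0; }

   Tracked()                            { ++s_nConstruct; }
   Tracked(const Tracked&)              { ++s_nConstruct; }
   Tracked(Tracked&&) noexcept          { ++s_nConstruct; }
   Tracked& operator=(const Tracked&)   { ++s_nAssign; return *this; }
   Tracked& operator=(Tracked&&) noexcept { ++s_nAssign; return *this; }
};

Tracked GetTracked()
{
   return Tracked{};
}

void ConstructFromGet()
{
   Tracked x1 = GetTracked();   // 1 constructor call, no assignment
   (void)x1;
}

void AssignFromGet()
{
   Tracked x1;                  // constructor call 1
   x1 = GetTracked();           // constructor call 2 (temporary) + 1 move assignment
}
```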

 From a performance perspective this is a less ideal way to fill an object, especially if default constructors are relatively heavy. One can consider switching back to the classic way of filling an object through an argument. This can be an attractive alternative when assignment is the dominant use case:

void Fill(X* p)
{
}

int main()
{
   X x1;         // 1 constructor call
   Fill(&x1);
}

Links

Wednesday, March 24, 2021

FILE_ATTRIBUTE_UNPINNED

GetFileAttributes

 The other day I encountered an assert in some external library code. The code asserted that the high-order bits of the file attribute DWORD (from GetFileAttributes) are zero. The file attribute value was 0x00100020. 

 It turned out that the high part comes from a file attribute defined in 'winnt.h':


#define FILE_ATTRIBUTE_UNPINNED             0x00100000

  This file attribute is not described in MSDN (yet). It seems to be related to 'OneDrive'.
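
  How the reported value decomposes can be shown with a small sketch. To keep it buildable without the Windows headers, the two winnt.h values are redeclared under hypothetical names (`kAttrArchive`, `kAttrUnpinned`), and `HasAttribute` is a helper invented for this sketch.

```cpp
#include <cassert>
#include <cstdint>

// Values as defined in winnt.h, redeclared under hypothetical names.
constexpr std::uint32_t kAttrArchive  = 0x00000020;   // FILE_ATTRIBUTE_ARCHIVE
constexpr std::uint32_t kAttrUnpinned = 0x00100000;   // FILE_ATTRIBUTE_UNPINNED

// True when all bits of dwFlag are set in dwAttributes.
constexpr bool HasAttribute(std::uint32_t dwAttributes, std::uint32_t dwFlag)
{
   return (dwAttributes & dwFlag) == dwFlag;
}

// The value from the failing assert is exactly these two flags combined.
static_assert((kAttrArchive | kAttrUnpinned) == 0x00100020);
static_assert(HasAttribute(0x00100020, kAttrUnpinned));
```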

Sunday, March 7, 2021

Enums in a container

Enums

 Enumerations (enums) are often used to distinguish mutually exclusive properties. An enum variable can then have only one value. There are other use cases where more than one enum value can be active at the same time. If these enum values are unique and ascending, storage can be optimized by using std::bitset instead of other STL containers.

 Container

 To store enums that are not mutually exclusive in a container, the following options exist:

  • use a std::vector
  • use a std::bitset
  • represent the enums by unique bit values (bit flags)
 Using a std::vector is the traditional solution, but the alternative of a std::bitset is more efficient, both in performance and in memory consumption. The last option is effectively the same as using a bitset, except that for a bitset the enums are more naturally sequentially ordered.
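
 For the last option, assuming it means enumerators that each own one bit, a minimal sketch could look as follows. `EFileTypeFlags` and `FlagContent` are hypothetical names chosen for this sketch.

```cpp
#include <cassert>

// Hypothetical flag enum: each enumerator owns exactly one bit.
enum EFileTypeFlags : unsigned
{
   eFtfNone   = 0,
   eFtfData   = 1u << 0,
   eFtfBitmap = 1u << 1,
   eFtfConfig = 1u << 2,
};

struct FlagContent
{
   void AddFileType(EFileTypeFlags eFt)
   {
      m_uTypes |= eFt;
   }

   bool HasFileType(EFileTypeFlags eFt) const
   {
      return (m_uTypes & eFt) != 0;
   }

   unsigned m_uTypes = eFtfNone;
};
```

 Compared with a bitset, the enumerator values here are 1, 2, 4, ... instead of 0, 1, 2, ..., so they cannot double as array indices.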

Example 

Suppose the file types in use are represented by enums and there is a 'Content' class that gives an overview of all file types used.


enum EFileTypes
{
   eFtBegin = 0,
   eFtData  = eFtBegin,
   eFtBitmap,
   eFtConfig,
   eFtEnd,
};

using std::vector 

Normally one uses a std::vector to store elements. The example above would then look like this:


#include <algorithm>
#include <vector>


struct Content
{
   void AddFileType(EFileTypes eFt)
   {
      if (!HasFileType(eFt))
      {
         m_vecTypes.push_back(eFt);
      }
   }

   bool HasFileType(EFileTypes eFt) const
   {
      return std::find(m_vecTypes.cbegin(), m_vecTypes.cend(), eFt) != m_vecTypes.cend();
   }

   std::vector<EFileTypes>	m_vecTypes;
};

using std::bitset

 With a bitset one can use the (unique) enum value as the position in the bitset:


#include <bitset>


struct Content
{
   void AddFileType(EFileTypes eFt)
   {
      m_btsTypes.set(eFt);
   }

   bool HasFileType(EFileTypes eFt) const
   {
      return m_btsTypes.test(eFt);
   }

   std::bitset<eFtEnd>	m_btsTypes;
};

 Using a bitset thus has the following advantages:

  • O(1) lookup to check whether an enum is present
  • memory dense when there are only a few enums. Only with a large enum range (e.g. > 64) does this become less attractive, since the size of the bitset is always determined by the largest enum value
  • an equality operator is easily made
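
 The last point can be sketched as follows: equality simply delegates to the comparison that std::bitset already provides. The friend operator is an addition for this sketch and is not part of the 'Content' class shown earlier.

```cpp
#include <bitset>
#include <cassert>

enum EFileTypes
{
   eFtBegin = 0,
   eFtData  = eFtBegin,
   eFtBitmap,
   eFtConfig,
   eFtEnd,
};

struct Content
{
   void AddFileType(EFileTypes eFt)
   {
      m_btsTypes.set(eFt);
   }

   bool HasFileType(EFileTypes eFt) const
   {
      return m_btsTypes.test(eFt);
   }

   // Delegates to std::bitset::operator==; the order of insertion is irrelevant.
   friend bool operator==(const Content& rlhs, const Content& rrhs)
   {
      return rlhs.m_btsTypes == rrhs.m_btsTypes;
   }

   std::bitset<eFtEnd>	m_btsTypes;
};
```

 With the std::vector variant the same comparison would depend on the order in which the file types were added, unless the vector is kept sorted.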

 
