Saturday, February 12, 2022

Mapping enums to enums

Mapping enums to enums

Sometimes one enum value needs to be translated to another. For example, take the enums 'A' and 'B':

enum A
{
   eA0 = 0,
   eA1,
   eA2,
};

enum B
{
   eB0 = 0,
   eB1,
   eB2,
};

A simple translation that does not assert on non-existing values could work as follows:

B Translate(A eA)
{
    B e = eB0;

    switch (eA)
    {
    case eA0: e = eB0;  break;
    case eA1: e = eB1;  break;
    case eA2: e = eB2;  break;
    }

    return e;
}

This is a straightforward one-to-one mapping. With Compiler Explorer one can see that the Visual Studio 2019 compiler translates this into a series of compare and jump instructions:

B Translate(A) PROC
        test    ecx, ecx
        je      SHORT $LN4@Translate
        sub     ecx, 1
        je      SHORT $LN5@Translate
        cmp     ecx, 1
        jne     SHORT $LN4@Translate
        mov     eax, 2
        ret     0
$LN5@Translate:
        mov     eax, 1
        ret     0
$LN4@Translate:
        xor     eax, eax
        ret     0
B Translate(A) ENDP   

In this case the values are exactly the same, so a more efficient solution is to just use the value:

B Translate2(A eA)
{
    return static_cast<B>(eA);
}

Compiler Explorer now shows that this is denser and faster:

B Translate2(A) PROC
        mov     eax, ecx
        ret     0
B Translate2(A) ENDP

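It seems that Visual Studio doesn't perform this optimization by itself. Note that the two functions are only equivalent for the listed values: for any other input the switch returns eB0, while the cast passes the value through unchanged.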

More robust

The cast above is dangerous if the values do not correspond. One can guard against this, and against accidental changes, by adding some static_asserts:

enum A
{
   eAbegin = 0,
   eA0     = eAbegin,
   eA1,
   eA2,
   eAend
};

// enum B is extended in the same way with eBbegin and eBend

B Translate2(A eA)
{
    static_assert(eAbegin == eBbegin);
    static_assert(eAend == eBend);

    // or (verbose and requires updating after adding an enum):
    static_assert(eA0 == eB0);
    static_assert(eA1 == eB1);
    static_assert(eA2 == eB2);
    
    return static_cast<B>(eA);
}
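The checked cast can be sketched as a self-contained snippet. It assumes that enum B gets the same eBbegin and eBend markers as A (they are used above but not shown). The helper Same is a hypothetical name; it compares the enumerators through int, because comparing two different enumeration types directly is deprecated since C++20.

```cpp
enum A
{
   eAbegin = 0,
   eA0     = eAbegin,
   eA1,
   eA2,
   eAend
};

enum B
{
   eBbegin = 0,
   eB0     = eBbegin,
   eB1,
   eB2,
   eBend
};

// Hypothetical helper: compare enumerators of different enum types through int.
constexpr bool Same(A eA, B eB)
{
   return static_cast<int>(eA) == static_cast<int>(eB);
}

constexpr B Translate2(A eA)
{
   static_assert(Same(eAbegin, eBbegin), "A and B must start at the same value");
   static_assert(Same(eAend, eBend), "A and B must have the same number of values");

   return static_cast<B>(eA);
}
```

The begin and end checks only catch an enum that grows or shrinks. A reordering inside one of the enums still needs the verbose per-value asserts.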
 

Thursday, January 6, 2022

auto considered harmful

auto

auto was introduced to avoid writing out complex types inside template functions. Lately, however, many C++ gurus think that one should use it as much as possible since the compiler always deduces the correct type.

Counterarguments are as follows:

  1. the deduced type may not be what you want
  2. auto makes code harder to read
  3. auto can introduce silent bugs

unwanted deduction

The canonical example of unwanted deduction is the use of auto in a for loop:

std::vector<std::string> vec;
for (auto a : vec)
{
}

Here the variable 'a' is deduced as std::string, so a copy of the string is made on every iteration. This is suboptimal; const std::string& would be better. This is a well-known pitfall and people are advised to use 'const auto&', but it illustrates that blindly using auto doesn't always give the 'automatic' or 'best' type.

Another example is the use of doubles vs floats. This can be important since there is a tradeoff between precision (double) and performance (float on GPUs).
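Both cases can be checked with a small sketch. The function name DeductionExamples and the variable names are made up for illustration; the static_asserts only confirm what the compiler deduces.

```cpp
#include <string>
#include <type_traits>
#include <vector>

void DeductionExamples()
{
   std::vector<std::string> vec{"a", "b"};

   for (auto a : vec)          // a is std::string: every element is copied
   {
      static_assert(std::is_same<decltype(a), std::string>::value, "copy");
   }

   for (const auto& a : vec)   // a is const std::string&: no copy
   {
      static_assert(std::is_same<decltype(a), const std::string&>::value, "reference");
   }

   auto f = 1.0f;   // float
   auto d = 1.0;    // double: a missing 'f' suffix silently changes the type
   auto m = f * d;  // double: the float operand is converted to double

   static_assert(std::is_same<decltype(f), float>::value, "float");
   static_assert(std::is_same<decltype(d), double>::value, "double");
   static_assert(std::is_same<decltype(m), double>::value, "double");
}
```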

readability

 This is a personal preference. Use of auto may reduce readability. Consider the following example:

auto a = GetPerson();

From this statement alone one cannot see whether the returned person is a (smart) pointer, a value or something else. This makes a difference, for example when it is used in multithreaded environments.

Consider the alternative where the type is made explicit:

std::shared_ptr<Person> ptr = GetPerson();

It's now clear that it is a pointer type and that the memory of the returned object is taken care of.

Consider another case:

Person* p = GetPerson();

As a reader of the code it's now clear that one needs to question the ownership. Raw pointers are not recommended but sometimes they are unavoidable (e.g. with an external library like TensorRT).

The readability can be somewhat mitigated by the use of Hungarian notation, but with auto there is the added danger that the prefix will not reflect the (deduced) type.

bugs

The previous topic already touched on it: the use of auto may also cover up bugs. Recently I stumbled upon a case like this:

auto a = GetSerialize();

It turned out that the returned object was some kind of interface (again from TensorRT) which needed to be destroyed. This went completely unnoticed until the type was written out:

IHostMemory* p = GetSerialize();
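Once the raw pointer is visible, one way to make the ownership explicit is a std::unique_ptr with a custom deleter. The following is only a sketch under assumptions: IBlob, BlobDestroyer, CountingBlob and this GetSerialize are hypothetical stand-ins, not the TensorRT API.

```cpp
#include <memory>

// Hypothetical interface: objects are released with destroy(), not with delete.
struct IBlob
{
   virtual void destroy() = 0;

protected:
   ~IBlob() = default;
};

// Deleter so that std::unique_ptr calls destroy() automatically.
struct BlobDestroyer
{
   void operator()(IBlob* p) const { p->destroy(); }
};

using BlobPtr = std::unique_ptr<IBlob, BlobDestroyer>;

// Hypothetical implementation that counts how often destroy() is called.
struct CountingBlob final : IBlob
{
   explicit CountingBlob(int& rnDestroyed) : m_rnDestroyed(rnDestroyed) {}

   void destroy() override
   {
      ++m_rnDestroyed;
      delete this;
   }

   int& m_rnDestroyed;
};

// Stand-in for a factory that hands out an owning raw pointer.
IBlob* GetSerialize(int& rnDestroyed)
{
   return new CountingBlob{rnDestroyed};
}
```

With BlobPtr ptr{GetSerialize(n)}; the object is destroyed when ptr goes out of scope, and the type in the declaration documents who owns it.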

conclusion

auto is a very good addition to the C++ language. It has its merits in its original use case and with complex types where the actual type is not important. Iterators are a canonical example. As with any technique, it can also hurt if you overuse it.

Thursday, September 23, 2021

Careful with that initializer_list

initializer_list

The other day I was writing some test code and created a vector with some points like the following:

std::vector<Point> vecPoint{10};

Unfortunately this didn't create a std::vector of 10 points. Instead it was interpreted as an initializer_list with one element, since the Point class could be constructed from a single argument. The code was somewhat similar to this one (but templated):

struct Point
{
   Point (int x = 0, int y = 0);
};

The Point class was adjusted, but part of the problem lies with std::vector. The ambiguity could easily be solved by introducing a helper class for the size argument. Example:

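struct Size
{
   explicit Size(size_t n) : m_n{n} {}

   size_t m_n;
};

// literal operators for integers must take unsigned long long
Size operator ""_sz(unsigned long long n)
{
   return Size{static_cast<size_t>(n)};
}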

template<class T, class Allocator = std::allocator<T>>
class vector
{
public:
    vector(Size n);
};

If std::vector had such a constructor, there would be no ambiguity:

void f()
{
  std::vector<Point> vec1{10_sz};  // initialized with 10 default constructed points
  std::vector<Point> vec2{10};     // initialized with one point
}
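Until something like that exists, the difference between braces and parentheses can be verified with today's std::vector. A minimal sketch; the members m_x and m_y and the function CompareInit are assumptions added for the check:

```cpp
#include <cassert>
#include <vector>

struct Point
{
   Point(int x = 0, int y = 0) : m_x{x}, m_y{y} {}

   int m_x;
   int m_y;
};

void CompareInit()
{
   std::vector<Point> vecBraces{10};  // initializer_list: one Point(10, 0)
   std::vector<Point> vecParens(10);  // size constructor: 10 default constructed points

   assert(vecBraces.size() == 1);
   assert(vecBraces[0].m_x == 10);
   assert(vecParens.size() == 10);
}
```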

Saturday, August 28, 2021

Domain objects vs primitive types

Domain objects

 In code bases I regularly stumble upon use of primitive types with a particular interpretation. Examples are:

  • int as seconds
  • int x, int y as point location
  • int w, int h as size
  • string for file- and directory names

Using primitive types has some drawbacks:

  • no implicit documentation value
  • error prone, since any integer fits the argument
  • possibility to swap arguments, e.g. width and height

In this situation it's better to use domain objects, which are compile-time safe and have intrinsic documentation value. For the cases above it is preferable to use:

  • std::chrono::seconds
  • Point
  • Size
  • std::filesystem::path

A complete example for time then becomes:


#include <chrono>

using namespace std::chrono_literals;

void SetDurationAlarm(int seconds);
void SetDuration(std::chrono::seconds s);

// client:
SetDurationAlarm(100);	// 100 what?
SetDuration(100s);	// 100 seconds no doubt
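That a plain integer is rejected, while other time units are converted, can be checked at compile time. A small sketch; ToSeconds is a hypothetical helper that is not part of the example above:

```cpp
#include <chrono>
#include <type_traits>

using namespace std::chrono_literals;

// Hypothetical helper: returns the duration in whole seconds.
constexpr long long ToSeconds(std::chrono::seconds s)
{
   return s.count();
}

// A plain int does not convert implicitly, so ToSeconds(100) does not compile.
static_assert(!std::is_convertible<int, std::chrono::seconds>::value, "");

// Coarser units convert implicitly and exactly.
static_assert(ToSeconds(100s) == 100, "");
static_assert(ToSeconds(2min) == 120, "");
```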

Example of primitive type arguments:


// correct but error prone

void SetRect(int l, int t, int r, int b);

// client
constexpr int x = 0;
constexpr int y = 0;
constexpr int w = 100;
constexpr int h = 200;

SetRect(x, y, w, h);	      // compiles but wrong
SetRect(x, y, x + w, y + h);  // ok

 It's better to use domain objects:


struct Point
{
   int  m_x;
   int  m_y;
};

struct Size
{
   int  m_cx;
   int  m_cy;
};

void SetRect(const Point& rpt, const Size& rsz);

constexpr Point pt{0, 0};
constexpr Size  size{100, 200};

SetRect(pt, size);

A careful reader may argue that primitive types are still used in the definition of Point and Size. Indeed, there is a location where the objects are created and where you have to take care, but from there on the argument passing is compile-time safe when using domain objects such as 'Point' and 'Size'.
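The compile-time safety can be sketched as follows. Rect and MakeRect are hypothetical additions so that the result can be checked; the original SetRect returns nothing.

```cpp
#include <type_traits>

struct Point
{
   int  m_x;
   int  m_y;
};

struct Size
{
   int  m_cx;
   int  m_cy;
};

struct Rect
{
   int  m_l, m_t, m_r, m_b;
};

// Hypothetical variant of SetRect that returns the rectangle.
// The right and bottom edges are computed in exactly one place.
constexpr Rect MakeRect(const Point& rpt, const Size& rsz)
{
   return Rect{rpt.m_x, rpt.m_y, rpt.m_x + rsz.m_cx, rpt.m_y + rsz.m_cy};
}

// Point and Size are unrelated types: swapping the arguments is a compile error.
static_assert(!std::is_convertible<Size, Point>::value, "");
static_assert(!std::is_convertible<Point, Size>::value, "");

constexpr Rect rc = MakeRect(Point{0, 0}, Size{100, 200});
static_assert(rc.m_r == 100 && rc.m_b == 200, "");
```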

 Example of argument passing of primitive types:


// correct but error prone

struct Image
{
   Image(int x, int y, int w, int h);   //watch out
};

// client
void FooImage(int x, int y, int w, int h)
{  
   Image img{x, y, w, h};
   
   SetRect(x, y, x + w, y + h);   
   SetLocation(x, y);
   SetSize(w, h);
}

 Example of using domain objects in argument passing:


struct Image
{
   Image(const Point& rpt, const Size& rsz);
};

void FooImage(const Point& rpt, const Size& rsz)
{  
   Image img{rpt, rsz};
   
   SetRect(rpt, rsz);   
   SetLocation(rpt);
   SetSize(rsz);
}

C++ standard

Unfortunately C++ does not define many frequently used basic types like points, sizes and rects. <chrono> was only added in C++11 and <filesystem> in C++17; before that there was no duration type and one had to use strings for files and directories.

Not sure if things will become better in the future, since the committee is more interested in adding yet another obscure language feature than in adding much needed library support for everyday use (e.g. networking, graphics, persistence, cryptography).

So we ended up with many libraries defining their own point, size and rect. This is of course unfortunate: it invites conversion errors and one has to learn a dozen point classes, each with its own subtle semantics:

Library          Point type
Windows          POINT
MFC              CPoint
Direct2D         D2D1_POINT_2U
OpenCV           cv::Point2i
Boost.Geometry   boost::geometry::model::d2::point_xy
glm              vec4

Note: Boost.Geometry is actually flexible, and in 3D code vectors are used for points as well.

External links

  •  https://isocpp.github.io/CppCoreGuidelines/CppCoreGuidelines#Rp-direct

Wednesday, August 25, 2021

Unicode characters and ADO Jet.OleDB

Problem

The other day we had the problem that databases with Chinese characters in the file name couldn't be opened with ADO. I am pretty sure this worked years ago, but that was probably on Windows 7.

Example code


#include "tappch.hpp"
#include <cassert>
#include <comutil.h>
#include <filesystem>
#include <iostream>
#include <string>
#include <tchar.h>

#import "msado15.dll"  rename("EOF", "EndOfFile")

namespace
{
   const _bstr_t     g_bstrEmpty;
   constexpr wchar_t g_szDbTest[]   = _T("WL11-旷场.evxt.mdb");
   constexpr wchar_t g_szProvider[] = _T("Provider=Microsoft.Jet.OLEDB.4.0;Data Source='");
  
   std::wstring MakeConnectionStringOleDb(const std::filesystem::path& rpthDatabase)
   {
      const std::wstring strConnection = g_szProvider 
                                       + rpthDatabase.wstring()
                                       + _T("';Persist Security Info=False");
                                       
      return strConnection;
   }
}


int main()
{
   HRESULT hr = ::CoInitialize(nullptr);
      
   try
   {
      ADODB::_ConnectionPtr ptrConnection;   
      hr = ptrConnection.CreateInstance(__uuidof(ADODB::Connection));
         
      const _bstr_t bstrConnection = MakeConnectionStringOleDb(g_szDbTest).c_str();
      hr = ptrConnection->Open(bstrConnection, g_bstrEmpty, g_bstrEmpty, ADODB::adConnectUnspecified);
         
      const long n = ptrConnection->State;
         
      hr = ptrConnection->Close();
   }
   catch (const _com_error& re)
   {
      std::cout << re.Description() << std::endl;
   }
      
   ::CoUninitialize();
      
   return 0;
}

I tried to track down the bug in the Visual Studio debugger by digging through the assembly code of the Windows DLLs of ADO and OLE DB. Visual Studio, though, suddenly crashed and took all my carefully created breakpoint locations with it. So I gave up and assume it is a bug.

 

Sunday, August 15, 2021

Mapping enums to value

Mapping enums

The standard library has std::map and std::unordered_map to map enums to values. For example:

#include <string>
#include <unordered_map>

enum E
{
   e0,
   e1,
};

struct Foo
{
   Foo()
   {
      m_umap.emplace(e0, "Test1");
      m_umap.emplace(e1, "Test2");
   }

   std::string Find(E e) const
   {
      auto it = m_umap.find(e);
      
      return it != m_umap.cend() ? it->second : std::string{};
   }
   
   std::unordered_map<E, std::string>  m_umap;
};

std::unordered_map is fast, with on average O(1) lookup (plus the cost of the hash function). Still, it can be made faster by using an array with the enum as index:

#include <array>
#include <string>

enum E
{
   e0 = 0,
   e1,
   eEnd,
};

struct Foo
{
   Foo()
   {
      m_a[e0] = "Test1";
      m_a[e1] = "Test2";
   }

   std::string Find(E e) const
   {
      return m_a[e];
   }
   
   std::array<std::string, eEnd>  m_a;
};

This is the fastest form: with arrays the values are in contiguous memory and the lookup is O(1) without the need to calculate a hash first.

Other alternatives are a switch-case or a linear lookup in a std::vector.
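For comparison, the switch-case alternative could look like the sketch below. It is written as a free function for brevity and is an assumption about the form; it is not necessarily the code that was measured.

```cpp
#include <string>

enum E
{
   e0 = 0,
   e1,
   eEnd,
};

// Switch-case variant of Find; unknown values yield an empty string.
std::string Find(E e)
{
   switch (e)
   {
   case e0:   return "Test1";
   case e1:   return "Test2";
   case eEnd: break;
   }

   return std::string{};
}
```

Unlike the array version, this variant handles out-of-range values gracefully, at the cost of compare and jump instructions.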

 Some performance measurements with a 10 value enum: 


Method          Time (s)
array           0.32
map             2.40
switch-case     1.11
unordered_map   1.60

Friday, August 6, 2021

Explicit member function template specialisation on Visual Studio 2019

Template member function templates

Classes can have member function templates. These member templates can be explicitly specialized. For example:

struct Foo
{
   template <typename T>
   void f()
   {
   }
};

template <>
inline void Foo::f<int>()
{
   // full specialization
}

It seems that Visual Studio 2019 also accepts what appears to be illegal syntax, with the full specialization declared inside the class itself:

struct Foo
{
   template <typename T>
   void f();
   
   // illegal?
   template <>
   void Foo::f<int>();
};

template <>
void Foo::f<int>()
{
}

In that case Visual Studio also doesn't require the inline specifier; leaving it out does not cause duplicate symbol errors.

Links

C++ horrible aspects
