A thorough understanding of implicit type conversion in C++

Time:2021-2-22

Implicit type conversion is an old friend: nearly all the code we write relies on C++ implicit type conversion to some degree.

Unfortunately, implicit type conversion is also one of the big pitfalls of C++. Without care, it is easy to write all kinds of surprising bugs.

This article therefore sorts out how implicit type conversion works in C++, so that others can avoid stepping into the same pits.

Article index

  • What is implicit type conversion
  • Basic review
    • Direct initialization
    • Copy initialization
    • Implicit conversion in type construction
  • How does implicit conversion work
    • Standard conversion
    • User-defined conversion
    • Implicit conversion sequence
  • Problems caused by implicit conversion
    • Reference binding
    • Array decay
    • Two-step conversion
  • Summary
    • Reference material

What is implicit type conversion

To borrow from the standard: when you have a value of type T1 but the current expression needs a value of type T2, and T1 is converted to T2 automatically, that is an implicit type conversion.

If that feels too abstract, consider two examples. The first is the most common case, mixing arithmetic types:

int a = 0;
long b = a + 1; // int is converted to long

if (a == b) {
    // The built-in operator== needs both operands to have the same type, so a is converted here too
}

Converting int to long is a widening conversion and is usually harmless, whereas converting long to int can lose data wherever long is wider than int, so the latter should be avoided as far as possible.
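As a small sketch of the difference between widening and narrowing: the example below uses fixed-width unsigned types instead of int and long, so that the result is well defined on every platform. The function names are made up for illustration.

```cpp
#include <cassert>
#include <cstdint>

// Widening: every 32-bit value fits into 64 bits, so nothing is lost.
std::uint64_t widen(std::uint32_t value) {
    std::uint64_t result = value; // implicit conversion
    return result;
}

// Narrowing: the upper 32 bits are silently dropped.
std::uint32_t narrow(std::uint64_t value) {
    std::uint32_t result = value; // implicit conversion, compiles without an error
    return result;
}

void check_widen_and_narrow() {
    assert(widen(4000000000u) == 4000000000ull);
    assert(narrow(0x100000001ull) == 1u); // 2^32 + 1 becomes 1
}
```

The narrowing case is the dangerous one precisely because nothing in the source code marks it.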

The second example is a conversion from a user-defined type to a scalar type:

std::shared_ptr<int> ptr = func(); // assuming func() returns std::shared_ptr<int>
if (ptr) { // conversion from shared_ptr to bool
    // process the data
}

Because shared_ptr provides a user-defined conversion to bool, we can easily test whether the smart pointer is empty. The if condition needs a bool, so ptr is converted to bool; this is called a contextual conversion.
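The contextual conversion can be checked with standard type traits. This is only a sketch; count_non_empty is a hypothetical helper, not part of any library.

```cpp
#include <cassert>
#include <memory>
#include <type_traits>

// A condition direct-initializes a bool, so shared_ptr's explicit operator bool is usable there.
static_assert(std::is_constructible<bool, std::shared_ptr<int>>::value,
              "bool b(ptr) and if (ptr) are allowed");

// An ordinary implicit conversion (bool b = ptr;) is rejected.
static_assert(!std::is_convertible<std::shared_ptr<int>, bool>::value,
              "bool b = ptr is not allowed");

// Hypothetical helper: counts how many of the two pointers are non-empty.
int count_non_empty(const std::shared_ptr<int>& a, const std::shared_ptr<int>& b) {
    int count = 0;
    if (a) ++count; // contextual conversion to bool
    if (b) ++count;
    return count;
}

void check_count_non_empty() {
    assert(count_non_empty(std::make_shared<int>(1), nullptr) == 1);
    assert(count_non_empty(nullptr, nullptr) == 0);
}
```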

Now that we know what implicit type conversion is, let's look at a language that does not allow it, such as Go:

var a int32 = 0;
var b int64 = 1;

fmt.Println(a + b) // error!
fmt.Println(int64(a) + b)

The compiler will tell you that values of different types cannot be combined in one operation. A more painful example:

sleepDuration := 2.5
time.Sleep(time.Duration(float64(time.Millisecond) * sleepDuration)) // sleep for 2.5ms

The code is very simple, but the nested type conversions add noise and seriously hurt readability.

This form of type conversion is called explicit type conversion. In C++, it looks like this:

A a{1};
B b = static_cast<B>(a);

static_cast converts a value to a related type, and the user must name the target type. Together with const_cast and the other cast operators, it is responsible for explicit type conversion in C++.

As you can see, implicit type conversion simplifies code. But the simplification is not free; let's go through the details.

Basic review

Before introducing implicit type conversion, let’s review the basics and relax.

Direct initialization

The first is direct initialization of a class.

As the name suggests, it means explicitly calling a constructor of the type to initialize an object. For example:

struct A {
    A() = default;
    A(const A&) = default;
    A(int) {}
};

// Default initialization would be: A a; note the distinction

A a1{}; // list initialization (C++11)
// A a2(); cannot be written, because it would be parsed as a function declaration
A a2(1);
A a3(a2); // yes, explicitly calling the copy constructor is also direct initialization

auto a4 = static_cast<A>(1);

Note that a4 uses static_cast: converting to type T with static_cast is also a direct initialization.

What is this form of initialization good for? Direct initialization considers all constructors, including those marked explicit.

Explicitly calling a constructor for direct initialization is in fact a kind of explicit type conversion.
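To illustrate that direct initialization sees explicit constructors while copy initialization does not, here is a minimal sketch. Meters is a hypothetical type invented for this example.

```cpp
#include <cassert>
#include <type_traits>

// Hypothetical type whose only int constructor is marked explicit.
struct Meters {
    explicit Meters(int v) : value(v) {}
    int value;
};

// Direct initialization sees the explicit constructor...
static_assert(std::is_constructible<Meters, int>::value, "Meters m(5); compiles");
// ...but copy initialization (Meters m = 5;) does not.
static_assert(!std::is_convertible<int, Meters>::value, "Meters m = 5; is rejected");

int direct_init_value() {
    Meters m(5);                     // direct initialization
    auto n = static_cast<Meters>(7); // static_cast is a direct initialization too
    return m.value + n.value;
}

void check_direct_init() {
    assert(direct_init_value() == 12);
}
```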

Copy initialization

Apart from default initialization and direct initialization, most other initializations that copy a value are copy initialization. Typical examples:

A func() {
    return A{}; // the return value is copy-initialized
}

A a5 = 1; // implicit conversion first, then copy initialization

void func2(A a) {} // passing an argument to a non-reference parameter is also copy initialization

However, something like A a6 = {1} is not copy initialization but copy-list-initialization: a suitable non-explicit constructor is selected directly to initialize the object, and no temporary is created and then copied.

What is copy initialization for?

The first thing that comes to mind is creating a copy of an object. True, but it has a more important role:

If a value of type T1 is to be implicitly converted to T2, the declaration T2 v2 = value must be well-formed, where value has type T1.

A declaration of this form is exactly a copy initialization. The reasons are covered in the next section.
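The standard library exposes this rule directly: std::is_convertible<T1, T2> reports, roughly, whether T2 v2 = value compiles for a value of type T1. A short sketch, where length_of is a made-up function used only for illustration:

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <type_traits>

static_assert(std::is_convertible<int, long>::value, "long v = some_int;");
static_assert(std::is_convertible<const char*, std::string>::value,
              "std::string v = some_c_string;");
static_assert(!std::is_convertible<std::string, const char*>::value,
              "const char* v = some_string; is rejected");

// The same rule lets a function taking std::string accept a string literal.
std::size_t length_of(std::string s) { return s.size(); }

void check_length_of() {
    assert(length_of("hello") == 5);
}
```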

Implicit conversion in type construction

Before entering this section, let’s look at a classic interview question:

std::string s = "hello c++";

How many strings have been created? If you blurt out "one", the interviewer will probably give you a sly smile and ask you to go home and wait for the result.

So what is the answer? One or two. Wait, is that a joke?

Don't worry; let's take it case by case. First, before C++11.

Before C++11, the statement above actually leads to the following steps:

  1. First, "hello c++" has type const char[N], but it decays to const char*
  2. Then, because s is being declared and initialized in the same statement, only a constructor can be used, not the overloaded operator=
  3. So the right-hand side of the equals sign must also be of type string
  4. A conversion from const char* to string happens to exist, so the literal is converted to the appropriate type
  5. The conversion produces a new temporary string, which is passed as the argument to the copy constructor
  6. Once the copy constructor has run, s is created

Here we ignore tricks such as copy-on-write strings. The whole process creates s and one temporary: two strings in total.

Soon C++11 arrived and brought move semantics, but the result did not change:

  1. The earlier steps are the same: the string literal is implicitly converted to string, creating a temporary
  2. The temporary is an rvalue, so it binds to an rvalue reference and the move constructor is selected
  3. The data in the temporary is moved into s, and s is created

Move semantics avoids an unnecessary copy of the internal data, but the temporary is still created.

Readers who like to poke at their compilers may object that the compiler does not actually generate this temporary. Indeed: the compiler optimizes the code with copy elision.

Yes, copy elision was already permitted in C++11, but it was optional and compilers were not required to support it. What you observe on GCC and Clang may not represent all C++ compilers, so we still deduce the theoretical behavior from the standard.

So far the answer is 2, but soon something interesting happened: in C++17, copy elision became mandatory in this situation.

In C++17, the temporary is not created unless it is needed. (It is now called the result object of a prvalue: a prvalue creates a temporary only when one is actually required, a process called temporary materialization, and the temporary created is the prvalue's result object.) In other words, T obj = expr calls the appropriate constructor directly with expr instead of creating a temporary and then calling the copy constructor. Before C++17, the copy or move constructor still had to be accessible even when the copy was elided, so that an optional optimization could not make otherwise invalid code compile. Since C++17 no copy or move takes place at all in this case, so that requirement no longer applies.

So now the process is like this:

  1. The compiler sees that the statement is a copy initialization of a string
  2. The right-hand side is implicitly converted to a prvalue of type string, which initializes s of the same type
  3. Since the initializer is a prvalue of the same type as s, no temporary is materialized and no copy or move constructor is involved
  4. The compiler looks for a string constructor that accepts the literal
  5. It finds string::string(const char *) and calls it to construct s in place
  6. Initialization of s is complete

Therefore, C++17 creates only one string object, which is even more efficient than move semantics. That is why the answer to the question can be either 1 or 2.
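The count can be observed with a small instrumented type. This is a sketch: Tracked is a hypothetical stand-in for std::string. Under C++17 the number of copies and moves is guaranteed to be zero; under older standards it is zero only if the compiler chooses to elide the copy, which GCC and Clang do by default.

```cpp
#include <cassert>

// Hypothetical type that records how often it is constructed, copied and moved.
struct Tracked {
    static int constructed, copied, moved;
    Tracked(const char*) { ++constructed; } // converting constructor
    Tracked(const Tracked&) { ++copied; }
    Tracked(Tracked&&) { ++moved; }
};
int Tracked::constructed = 0;
int Tracked::copied = 0;
int Tracked::moved = 0;

// Copy-initializes one object from a literal and reports the copies plus moves that ran.
int copies_and_moves_for_one_object() {
    Tracked::constructed = Tracked::copied = Tracked::moved = 0;
    Tracked t = "hello c++"; // same shape as std::string s = "hello c++";
    (void)t;
    return Tracked::copied + Tracked::moved;
}

void check_elision() {
    assert(copies_and_moves_for_one_object() == 0);
    assert(Tracked::constructed == 1);
}
```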

We also see that the type conversion during copy initialization takes place whether or not the copy is elided; only its form differs. This is what we discuss next.

How does implicit conversion work

After reviewing the basic knowledge, we can get to the point.

Implicit conversions fall into two groups: conversions defined by the standard and user-defined conversions. Let's first see what each of them is.

Standard conversion

Standard conversions are the conversion rules built into the compiler: array-to-pointer decay, function-to-pointer conversion, conversions required by a specific context (an if condition needs a bool), integral promotion, numeric conversion, conversion from an object pointer to void*, and conversion from nullptr_t to a pointer type.

Low-level const and volatile can also be converted, but they can only be added, never removed: T* converts to const T*, but not the other way around.

These conversions apply essentially to scalar types and arrays, that is, to built-in types.
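Several of these standard conversions can be verified at compile time. The following is a minimal sketch; twice and standard_conversions_demo are names invented for the example.

```cpp
#include <cassert>
#include <cstddef>
#include <type_traits>

int twice(int x) { return 2 * x; }

// Integral promotion: short operands are promoted to int before the addition.
static_assert(std::is_same<decltype(short{1} + short{2}), int>::value, "promotion");
// Qualification conversion: const may be added, never removed.
static_assert(std::is_convertible<int*, const int*>::value, "adding const is fine");
static_assert(!std::is_convertible<const int*, int*>::value, "removing const is not");
// Any object pointer converts to void*, and nullptr_t converts to any pointer type.
static_assert(std::is_convertible<int*, void*>::value, "int* -> void*");
static_assert(std::is_convertible<std::nullptr_t, int*>::value, "nullptr -> int*");

int standard_conversions_demo() {
    int values[3] = {1, 2, 3};
    int* first = values;     // array-to-pointer conversion
    int (*fn)(int) = twice;  // function-to-pointer conversion
    double d = first[2];     // numeric conversion: int -> double
    return fn(first[1]) + static_cast<int>(d);
}

void check_standard_conversions() {
    assert(standard_conversions_demo() == 7); // twice(2) + 3
}
```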

To specify conversion rules for a user-defined type, you have to write the user-defined conversion interface yourself.

User-defined conversion

Having said so much, it's time to look at user-defined conversions.

There are only two user-defined conversion interfaces: converting constructors and user-defined conversion functions.

A converting constructor is a constructor such as T(T2): it takes a parameter of a definite type T2, and through it a value of type T2 can be converted to T.

A user-defined conversion function is a member function such as operator T2(); note that no return type is written. It converts the class type T1 to T2. Conversion functions to T1 itself (possibly cv-qualified, or a reference to it), to a base class of T1 (or a reference to it), and to void may be declared, but they are never used in an implicit conversion sequence, except in some cases through virtual dispatch.

For example:

struct A {};

struct B {
    // converting constructors
    B(int);
    B(const A&);

    // user-defined conversion functions: no return type is written
    operator A();
    operator int();
};

B above defines conversion rules in both directions: int and A can be converted to B, and B can be converted to int and A.

It is not difficult to see the rule:

other type -> T: converting constructor
T -> other type: user-defined conversion function

The converting constructors meant here are those not marked explicit; a constructor marked explicit cannot be used for implicit type conversion.

Since C++11, explicit can also be applied to user-defined conversion functions, for example:

template <typename T>
struct SmartPointer {
    //...
    T *ptr = nullptr;
    // convenient for testing whether the pointer is empty
    explicit operator bool() {
        return ptr != nullptr;
    }
};

SmartPointer<int> p = func(); // assuming func() returns SmartPointer<int>
if (p) {
    p < 1; // this is not allowed
}

Such a conversion function can be used only for explicit initialization and for conversions required by a specific context (for example, the condition of an if needs a bool, which is a kind of implicit conversion), so it prevents the semantic error shown in the comment. It cannot take part in any other implicit conversion.

A conversion function can also name a template parameter as its target type, and since C++14 it can use auto to have the target type deduced:

template <typename T>
struct SmartPointer {
    //...
    T *ptr = nullptr;
    explicit operator bool() {
        return ptr != nullptr;
    }

    // the target type uses the template parameter
    operator T*() {
        return ptr;
    }

    /* deduced target type (C++14), equivalent to the previous one
    operator auto() {
        return ptr;
    }
    */
};

SmartPointer<int> p = func(); // assuming func() returns SmartPointer<int>
int *p1 = p;

Finally, a user-defined conversion function can also be virtual, but the version implemented by the derived class is called only through dispatch from a reference or pointer to the base class:

struct D;
struct B {
    virtual operator D() = 0;
};
struct D : B
{
    operator D() override { return D(); }
};

int main()
{
    D obj;
    D obj2 = obj; // does not call D::operator D()
    B& br = obj;
    D obj3 = br; // calls D::operator D() through virtual dispatch
}

User defined conversion functions cannot be static member functions of a class.

Implicit conversion sequence

Now that we know the built-in standard conversion rules and the user-defined conversion rules, it's time to look at how implicit conversion actually works.

In a context that requires an implicit conversion, the compiler builds an implicit conversion sequence:

  1. Zero or one standard conversion sequence made up of standard conversions, called the initial standard conversion sequence
  2. Zero or one user-defined conversion
  3. Zero or one standard conversion sequence made up of standard conversions, called the second standard conversion sequence

When the argument of a constructor or of a user-defined conversion function is being considered, only a standard conversion sequence is allowed; otherwise user-defined conversions could effectively be chained.

The initial standard conversion sequence is easy to understand: before the user-defined conversion is called, it puts the value into a suitable form, for example by adding cv-qualifiers:

struct A {};
struct B {
    operator A() const;
};

B b; // a non-const object
const A &a = b;

The initial standard conversion sequence first converts the value into a form that the user-defined conversion can use. operator A() const expects this to have type const B*, but taking the address of b directly yields only B*. The standard conversions happen to include a rule for adding low-level const, so it applies here.

If the value already has the right type and needs no preprocessing, the initial standard conversion sequence does nothing.

If the first step does not produce the required type, the user-defined conversion comes into play.

If the object is direct-initialized, only converting constructors are considered. If it is copy-initialized or a reference is being bound, converting constructors and user-defined conversion functions both take part in overload resolution. When both are viable, a conversion function that is not const-qualified, such as operator A();, is preferred; otherwise the call is ambiguous and an error is reported. (In my tests, GCC 10.2 selects the converting constructor in the ambiguous case, while clang++ 11.0 matches the standard.) This is why we reviewed the differences between the kinds of initialization: the result can differ depending on how the object is constructed.

Once a rule has been chosen, we can move on to the next step.

If the user-defined conversion was a converting constructor, the result already has the target class type and the conversion is usually finished here. Otherwise a third step may be needed.

The third part tidies up the value produced by the user-defined conversion. Only standard conversions are allowed here, and likewise only standard conversions are allowed on the argument of a constructor or conversion function, which prevents user-defined conversions from being chained together.

For example:

struct A
{
    operator int() const;
};

A a;
bool b = a;

Here A can only be converted to int, but to save typing we let a be implicitly converted to bool. The initial standard conversion sequence converts A* to const A* for the implicit this parameter of the member function, and the user-defined conversion then produces an int from it. But int and bool are different types. What now?

This is where the second standard conversion sequence comes in: here it is a numeric conversion (specifically a boolean conversion) from int to bool.
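A runnable variant of this chain might look as follows. Level is a hypothetical type with a single conversion function, to int only; every other target type is reached through the second standard conversion sequence.

```cpp
#include <cassert>

// Hypothetical type that can be converted to int and nothing else.
struct Level {
    int value;
    operator int() const { return value; }
};

// Level -> int (user-defined conversion) -> bool (standard conversion)
bool as_bool(const Level& l) { return l; }

// Level -> int (user-defined conversion) -> double (standard conversion)
double as_double(const Level& l) { return l; }

void check_second_standard_conversion() {
    assert(as_bool(Level{0}) == false);
    assert(as_bool(Level{5}) == true);
    assert(as_double(Level{3}) == 3.0);
}
```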

However, the above is only an illustration. Please don't write real code like this, because it causes problems:

template <typename T>
struct SmartPointer {
    //...
    T *ptr = nullptr;
    operator bool() {
        return ptr != nullptr;
    }

    T& operator*() {
        return *ptr;
    }
};

auto ptr = get_smart_pointer();
if (ptr) {
    // ptr wraps an int*; we want the value it points to
    int value = ptr;
    // ...
}

The above code compiles without any error, yet it contains a serious bug that only shows up at run time.

Why? As the comment says, we want the value the pointer points to, but we forgot to dereference! To convert to int, the implicit conversion sequence is in fact:

  1. Initial standard conversion sequence: the current type already satisfies the requirements of the user-defined conversion, so nothing is done
  2. User-defined conversion: the closest type with a conversion path to int is bool, so operator bool is called
  3. Second standard conversion sequence: we now have a bool, the target is int, and a standard conversion between them exists

So value can only ever be 0 or 1. This is the first big pit dug by implicit conversion, and the problem shown by the code above is known as the "safe bool" problem.
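The pit is easy to reproduce. In this sketch UnsafePointer is a simplified, hypothetical stand-in for the SmartPointer above, fixed to int for brevity.

```cpp
#include <cassert>

// Simplified stand-in for the SmartPointer above; operator bool is not explicit.
struct UnsafePointer {
    int* ptr = nullptr;
    operator bool() const { return ptr != nullptr; }
    int& operator*() const { return *ptr; }
};

int value_seen_without_dereference() {
    int n = 42;
    UnsafePointer p;
    p.ptr = &n;
    int value = p; // compiles: UnsafePointer -> bool -> int
    return value;
}

int value_seen_with_dereference() {
    int n = 42;
    UnsafePointer p;
    p.ptr = &n;
    return *p;
}

void check_safe_bool_problem() {
    assert(value_seen_without_dereference() == 1); // not 42
    assert(value_seen_with_dereference() == 42);
}
```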

Fortunately, we can use explicit to kick it out of the conversion sequence:

template <typename T>
struct SmartPointer {
    //...
    T *ptr = nullptr;
    explicit operator bool() {
        return ptr != nullptr;
    }
};

Now the faulty int value = ... initialization is caught by the compiler, which reports an error right away.
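The effect of explicit can be confirmed with type traits. Again a sketch with a hypothetical, non-template SafePointer:

```cpp
#include <cassert>
#include <type_traits>

struct SafePointer {
    int* ptr = nullptr;
    explicit operator bool() const { return ptr != nullptr; }
};

static_assert(!std::is_convertible<SafePointer, int>::value,
              "int value = p; no longer compiles");
static_assert(!std::is_convertible<SafePointer, bool>::value,
              "bool b = p; is rejected too");
static_assert(std::is_constructible<bool, SafePointer>::value,
              "if (p) and bool b(p) still work");

bool is_set(const SafePointer& p) {
    return p ? true : false; // the condition of ?: is a contextual conversion
}

void check_safe_pointer() {
    assert(!is_set(SafePointer{}));
}
```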

The second standard conversion sequence was meant to help us tidy up afterwards, since class authors can hardly cover every case, but it also brings pitfalls.

Another point to note: the standard stipulates that if the user-defined conversion yields an lvalue (for example an lvalue reference) and the final target is an rvalue reference, the lvalue-to-rvalue rule among the standard conversions is not available, and the program does not compile. For example:

struct A
{
    operator int&();
};

int&& b = A();

Compile the above code and g++ will reward you with: cannot bind rvalue reference of type ‘int&&’ to lvalue of type ‘int’

What if no viable conversion exists in the implicit conversion sequence? Sadly, compilation simply fails with an error.

Problems caused by implicit conversion

Now we know how implicit conversion works, and we have already seen how it can get us into trouble.

This section covers the problems that implicit type conversion causes and how to prevent them.

It's time to see how all this collides with real-world code.

Reference binding

The first problem is related to references. However, it is less the fault of implicit conversion than of reference binding itself.

We know that for a type T there are several reference types:

  • T&, a reference to T, can bind only to an lvalue of type T
  • const T&, a reference to const T, can bind to lvalues and rvalues of T, and to lvalues and rvalues of const T
  • T&&, an rvalue reference to T, can bind only to an rvalue of type T
  • const T&&, which you rarely see, but which you get when you apply std::move to a const T&

The reference must be initialized at the same time of declaration, so the following code should be familiar to everyone:

int num = 0;
const int &a = num;
int &b = num;
int &&c = 100;
const int &d = 100;

New problems appear. Consider the running results of the following code:

int a = 10;
long &b = a;
std::cout << b << std::endl;

Isn't it 10? Not quite:

c.cpp: In function ‘int main()’:
c.cpp:6:11: error: cannot bind non-const lvalue reference of type ‘long int&’ to an rvalue of type ‘long int’
    6 | long &b = a;
      |

The message is clear: an ordinary lvalue reference cannot bind to an rvalue. Because a is an int and b is a long&, a must first be implicitly converted to long before it can initialize b.

Unless an implicit conversion yields a reference type, its result is generally an rvalue, hence the error. The fix is simple:

long b1 = a;
const long &b2 = a;

Either construct a new variable of type long by copying, since a variable of value type can be initialized from an rvalue, or use a const lvalue reference, since it can bind to an rvalue.
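One consequence is worth a quick sketch: a const long& initialized from an int does not alias the int. It refers to a temporary long created by the conversion, so later changes to the int are not visible through the reference. The function names below are invented for illustration.

```cpp
#include <cassert>

// Returns what a const long& bound to an int sees after the int is modified.
long value_seen_through_const_long_ref() {
    int a = 10;
    const long& ref = a; // binds to a temporary long holding 10, not to a itself
    a = 20;              // the temporary is unaffected
    return ref;
}

// For comparison: a reference of the matching type really aliases the variable.
int value_seen_through_const_int_ref() {
    int a = 10;
    const int& ref = a;
    a = 20;
    return ref;
}

void check_reference_binding() {
    assert(value_seen_through_const_long_ref() == 10);
    assert(value_seen_through_const_int_ref() == 20);
}
```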

The same applies to passing function arguments:

void func(unsigned int &)
{
    std::cout << "lvalue reference" << std::endl;
}

void func(const unsigned int &)
{
    std::cout << "const lvalue reference" << std::endl;
}

int main()
{
    int a = 1;
    func(a);
}

The output is "const lvalue reference". This is why many tutorials tell you to prefer const lvalue references for parameters: besides its own type T, such a function also accepts any value that can be implicitly converted to T, and for an argument that already has type T it avoids creating and copy-initializing a new object. Of course, an rvalue prefers to bind to an rvalue reference, so if an overload such as void func(unsigned int &&) is visible, that overload is called instead.
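A sketch with all three overloads side by side shows the preference. The overload set which and the two callers are hypothetical names; the functions return a label instead of printing so the result can be checked.

```cpp
#include <cassert>
#include <string>

std::string which(unsigned int&)       { return "lvalue reference"; }
std::string which(const unsigned int&) { return "const lvalue reference"; }
std::string which(unsigned int&&)      { return "rvalue reference"; }

std::string call_with_int() {
    int a = 1;
    return which(a); // a is converted to a temporary unsigned int, an rvalue
}

std::string call_with_unsigned() {
    unsigned int u = 1;
    return which(u); // exact match on a non-const lvalue
}

void check_overloads() {
    assert(call_with_int() == "rvalue reference");
    assert(call_with_unsigned() == "lvalue reference");
}
```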

The most typical applications are as follows:

template <typename... Args>
void format_and_print(const std::string &s, Args&&... args)
{
    //Format and print the result
}

std::string info = "%d + %d = %d\n";
format_and_print(info, 2, 2, 4);
format_and_print("%d * %d = %d\n", 2, 2, 4);

As long as the argument can be implicitly converted to string, we can call our function directly.

Most importantly, implicit type conversions usually produce rvalues. (The same goes for explicit conversions, of course, but it is easier to forget with implicit ones.)

Array decay

This is another classic problem caused by implicit conversion: arrays decay to pointers when they are evaluated.

Can you give the output of the following code?

void func(int arr[])
{
    std::cout << (sizeof arr) << std::endl;
}

int main()
{
    int a[100] = {0};
    std::cout << (sizeof a) << std::endl;
    func(a);
}

On my AMD64 Linux machine, compiled with GCC 10.2, the results are 400 and 8; the latter is the size of int* on this system. sizeof does not evaluate its operand, whereas passing a function argument does, so the array decays to a pointer. (In fact a parameter declared as int arr[] is itself adjusted to int*.)

What is the downside of this implicit conversion? The length of the array is lost. If you are unaware of this and still use sizeof inside the function to compute the size of the array, you are in for trouble.

There are many solutions, such as the simplest one, a template:

template <std::size_t N>
void func(int (&arr)[N])
{
    std::cout << (sizeof arr) << std::endl; // 400
    std::cout << N << std::endl; // 100
}

Now N is 100 and sizeof yields 400, because sizeof applied to a reference yields the size of the referenced type, which is int [100].

A simpler and more modern approach, and the one C++ advocates, is to give up raw arrays as heavy historical baggage and use std::array and std::span (added in C++20) instead. These modern replacements do better than raw arrays and do not suffer from problems such as implicit conversion to a pointer.
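A minimal sketch of the std::array approach follows (std::span is left out so that the example does not require C++20). The function names are made up for illustration.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// The length is part of the type, so it survives the function call.
template <std::size_t N>
std::size_t element_count(const std::array<int, N>& arr) {
    return arr.size();
}

template <std::size_t N>
int sum(const std::array<int, N>& arr) {
    int total = 0;
    for (int v : arr) total += v;
    return total;
}

std::size_t count_of_hundred() {
    std::array<int, 100> values{};
    return element_count(values);
}

int sum_of_three() {
    std::array<int, 3> values = {{1, 2, 3}};
    return sum(values);
}

void check_std_array() {
    assert(count_of_hundred() == 100);
    assert(sum_of_three() == 6);
}
```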

Two-step conversion

Many tutorials will tell you that an implicit conversion may not convert more than once. I used to call this kind of problem "two-step conversion".

Why that name? Suppose we have three types A, B and C, where A can be converted to B and B can be converted to C; each of these is a single-step conversion. If we need to convert A to C and A cannot be converted to C directly, we must convert A to B first, forming the conversion chain A -> B -> C. It contains two type conversions, which I call a two-step conversion.

Here is a typical "two-step conversion":

struct A{
    A(const std::string &s): _s{s} {}
    std::string _s;
};

void func(const A &s)
{
    std::cout << s._s << std::endl;
}

int main()
{
    func("two-steps-implicit-conversion");
}

We know that const char* can be implicitly converted to string, and string can be implicitly converted to A: const char* -> string -> A. The function parameter is a const lvalue reference, which should be able to bind to the rvalue produced by the implicit conversion. However, compiling the code with g++ gives:

test.cpp: In function 'int main()':
test.cpp:15:10: error: invalid initialization of reference of type 'const A&' from expression of type 'const char [30]'
   15 |     func("two-steps-implicit-conversion");
      |          ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
test.cpp:8:20: note: in passing argument 1 of 'void func(const A&)'
    8 | void func(const A &s)
      |           ~~~~~~~~~^

Sure enough, an error. But is it really caused by the two-step conversion? Let's change the code a little:

struct A{
    A(bool b)
    {
        _s = b ? "received true" : "received false";
    }
    std::string _s;
};

void func(const A &s)
{
    std::cout << s._s << std::endl;
}

int main()
{
    int num = 0;
    func(num); // received false
    unsigned long num2 = 100;
    func(num2); // received true
}

This time the code not only compiles; even with -Wall -Wextra there is no warning, and the output is as expected.

That's strange. The two calls here go through int -> bool -> A and unsigned long -> bool -> A. How can such obvious two-step conversions be legal, normal code?

In fact, the answer was already given in the section on the implicit conversion sequence:

An implicit conversion sequence consists of an initial standard conversion sequence, a user-defined conversion, and a second standard conversion sequence.

In other words, there is no "two-step conversion" problem at all. A conversion sequence can involve anywhere from zero to three stages, so two conversions are of course fine.

The only thing that triggers a problem is two user-defined conversions, because an implicit conversion sequence allows only one. The language standard says so explicitly:

At most one user-defined conversion (constructor or conversion function) is implicitly applied to a single value. — 12.3 Conversions [class.conv]

So the chain const char* -> string -> A is the problematic one, because both literal to string and string to A are user-defined conversions.

In int -> bool -> A and unsigned long -> bool -> A, the first conversion is done by the initial standard conversion sequence and the second is the user-defined conversion, so the whole process is legitimate.

From this point of view, the tutorials are only half right. The crux of "two-step conversion" is that two user-defined conversions cannot occur in one implicit conversion; the problem would be better named "two-step user-defined conversion".

User-defined conversions can appear only on user-defined types, which include those of the standard library. In other words, when an implicit conversion chain A -> B -> C involves two user-defined conversions, the implicit conversion is ill-formed.

The straightforward solution is to turn the first user-defined conversion into an explicit type conversion:

struct A{
    A(const std::string &s): _s{s} {}
    std::string _s;
};

void func(const A &s)
{
    std::cout << s._s << std::endl;
}

int main()
{
    func(std::string{"two-steps-implicit-conversion"});
}

Now that only one user-defined conversion remains in the implicit conversion sequence, the problem no longer occurs.
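The rule can also be checked with type traits, and the sketch below shows a second way out: constructing A explicitly, which leaves only the const char* -> string conversion for the constructor argument. Here func returns the length of the string instead of printing it, purely so the result can be checked.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <type_traits>

struct A {
    A(const std::string& s) : _s{s} {}
    std::string _s;
};

std::size_t func(const A& a) { return a._s.size(); }

// Implicitly, const char* -> A would need two user-defined conversions.
static_assert(!std::is_convertible<const char*, A>::value, "func(\"text\") is rejected");
static_assert(std::is_convertible<std::string, A>::value, "func(std::string{...}) is fine");
// Direct initialization of A only needs const char* -> std::string for the argument.
static_assert(std::is_constructible<A, const char*>::value, "A{\"text\"} is fine");

std::size_t via_explicit_string() { return func(std::string{"two-steps"}); }
std::size_t via_explicit_a()      { return func(A{"two-steps"}); }

void check_two_step() {
    assert(via_explicit_string() == 9);
    assert(via_explicit_a() == 9);
}
```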

Summary

I believe you now have a thorough understanding of implicit type conversion in C++ and should be able to steer clear of the common pits.

Still, I have to remind you: try not to rely on implicit type conversion. Make more use of explicit and of explicit conversions, and take less for granted.

Keep It Simple and Stupid.

Reference material

https://zh.cppreference.com/w/cpp/language/copy_elision

http://www.cplusplus.com/doc/tutorial/typecasting/

https://en.cppreference.com/w/cpp/language/implicit_conversion

https://stackoverflow.com/questions/26954276/second-standard-conversion-sequence-of-user-defined-conversion

https://stackoverflow.com/questions/48576011/why-does-const-allow-implicit-conversion-of-references-in-arguments/48576055

https://zh.cppreference.com/w/cpp/language/cast_operator

https://www.nextptr.com/tutorial/ta1211389378/beware-of-using-stdmove-on-a-const-lvalue

https://en.cppreference.com/w/cpp/language/reference_initialization

https://stackoverflow.com/questions/12847272/multiple-implicit-conversions-on-custom-types-not-allowed
