Iterator Invalidation in Modern C++

This classic problem never went away and it may now be harder to detect.

Light Cone
4 min readSep 16, 2019

Veteran C++ programmers know the pitfalls of iterator invalidation. Iterators are often implemented as just wrapped pointers to elements in a container. However, some operations on a container that modify the internal data structure can invalidate currently existing iterators; in other words, make existing “pointers” now point to wrong things.

Consider std::vector. Its underlying data structure is just a contiguous array allocated on the heap. The push_back() and ( emplace_back()) can cause the array to be reallocated if the operation adds an item beyond its original capacity. This means that any previously existing iterators would be pointing to memory that has been deallocated! Using insert() would invalidate any iterators that are pointing to elements past the insert position by being off by one at best and pointing to deallocated memory at worst. The function resize() is an obvious one to watch out for, but so are functions like reserve() and shrink_to_fit().

This is not just a C++ problem. Other languages that have iterators have this issue as well. Consider this Python code:

The programmer may have intended for all items of value 1 to be removed from the list, but he would be surprised to find that only one of them has been removed. Similar issues exist in C# when working with Enumerators and modifying the data structure while "enumerating", i.e. iterating.

The difference in C++ is that, often, you don’t just get unexpected results like the one Python example above or get an exception thrown as in the C# case, but your program could go into undefined behavior land causing things like memory access violations or, worse yet, random memory corruption problems.

So, veteran programmers are careful to write code like this:

Okay, I was kidding. While the above is correct, real veteran programmers would write this instead:

When we see iterators and calls to any modifying methods on the container being iterated, a red flag goes up and we are watchful of potential iterator invalidation problems.

Unique Challenges in Modern C++

The interesting challenge that we face with modern C++ is that a quick scan of our code may not raise red flags. Consider the following example which is based on code I found in production:

Things look pretty good on the surface. No obvious iterators invalidation going on or is there?

What would you expect the result to be? The intent seems to be that all nodes with the value of “apple” be moved from v1 to v2. When compiled using gcc, we get the following unexpected result.

v1 size = 2
v1[0].val = apple
v2[1].val = orange

Nodes with the value “orange” should not have moved over to v2 and yet there it is.

It’s useful to note that the range-for expression for (const auto &n : v1) is really just syntactic shorthand for writing:

for (auto iter = v1.begin(); iter != v1.end(); ++iter) {
const auto &n = *iter;
...
}

And because, in the loop, we are calling transfer_by_id() which actually manipulates v1 (and incidentally also v2), we have container modification in an iterator loop, hence iterator invalidation.

Consider another example found in production, assuming the same struct node definition from above:

When I saw this, I was pretty flummoxed by the creative use of std::reduce. This algorithm is meant to reduce the entire range into a single value. But the programmer uses std::reduce to iterate over an iterator span returned by std::equal_range! Then during the loop, again, he makes modifications on the container with the call to remove_by_id() causing iterator invalidation.

The intent of the program is to remove all nodes with the value “apple” from v1. However, running the program above results in the following:

v1 size = 2
v1[0].val = apple
count = 2

Two elements were indeed removed, but a node named “apple” still remains!

Iterator invalidation as a problem did not go away when modern language additions such as range-for and shiny new algorithms in <algorithm> were introduced. Just because modern C++ hides some of the ugly scaffolding of doing business, it doesn't mean that you can be blissfully ignorant of the potential landmines that still lurk beneath the surface.

When new C++ language extensions and libraries were added, I had hoped that it would make the language more approachable by novice programmers. Instead, when novice programmers make classic mistakes such as iterator invalidation using new C++ syntax, it is more difficult for veteran programmers to detect the mistake! The real-life counterparts to the code I posted as samples passed the rigors of pull requests and code reviews by senior developers. And explaining the iterator invalidation problem to a new C++ programmer when the code he wrote seemingly do not even have iterators in them is an interesting challenge!

Of course, this sort of problem is not unique to modern C++. I had similar problems explaining to novice programmers that with managed languages like C# and Java, you do have to worry about memory allocation and usage. If you don’t, you can get performance problems like stuttering due to garbage collection happening at inopportune times or “memory leaks” by having references to objects that are never used and never collected. The shiny new coat of abstraction did not make the underlying problem go away. You can only use the shiny new syntax effectively if you know in-depth what is actually happening under the hood!

--

--

Light Cone
Light Cone

Written by Light Cone

There is no such thing as absolute time, but there is the concept of a light cone and, so far, it is absolute.

No responses yet