Concurrency Bugs
Errors Observed in Real-world Usage
In “Problems With Concurrency” I mentioned that I see concurrency issues a lot. Let’s look at something I recently found in production:
Try it on the Go Playground.
This code is quite similar to the one in “Problems With Concurrency” before it was modified in the blog. It makes nearly the same mistake and deadlocks on error conditions. Obviously [1], the fix is not to make this function more complex and use errs := make(chan error, 3).
What Are the Issues?
This code very nicely demonstrates what you get when you start with concurrent code without thinking about the design.
Let’s first fix it, then analyze its problems. The task is divided into three independent calculations, where failure in any one of them results in the failure of the entire task. The synchronous version would be:
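A minimal sketch of such a synchronous version might look like the following. The Result type and the calcA, calcB, and calcC helpers are hypothetical placeholders for the three independent calculations; only the control flow is the point.

```go
package main

import (
	"errors"
	"fmt"
)

// Result holds the three independent outputs (hypothetical type).
type Result struct {
	A, B, C int
}

// calcA, calcB, and calcC are placeholder calculations.
func calcA(in int) (int, error) { return in + 1, nil }
func calcB(in int) (int, error) { return in * 2, nil }
func calcC(in int) (int, error) {
	if in < 0 {
		return 0, errors.New("calcC: negative input")
	}
	return in * in, nil
}

// compute runs the three steps one after another and fails the whole
// task as soon as any step fails.
func compute(in int) (Result, error) {
	var (
		r   Result
		err error
	)
	if r.A, err = calcA(in); err != nil {
		return Result{}, err
	}
	if r.B, err = calcB(in); err != nil {
		return Result{}, err
	}
	if r.C, err = calcC(in); err != nil {
		return Result{}, err
	}
	return r, nil
}

func main() {
	fmt.Println(compute(3))  // {4 6 9} <nil>
	fmt.Println(compute(-1)) // {0 0 0} calcC: negative input
}
```

Each step writes to its own field of the result, so nothing is shared between the steps.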
Here we see immediately that the mutex is unnecessary, since the results don’t overlap. Transforming this into a parallel version gives us:
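One possible parallel version, sketched with only sync.WaitGroup from the standard library, is shown here. As before, Result and the calcA, calcB, and calcC helpers are hypothetical placeholders, and this is one way to do it under those assumptions rather than the definitive fix.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

// Result holds the three independent outputs (hypothetical type).
type Result struct {
	A, B, C int
}

// calcA, calcB, and calcC are placeholder calculations.
func calcA(in int) (int, error) { return in + 1, nil }
func calcB(in int) (int, error) { return in * 2, nil }
func calcC(in int) (int, error) {
	if in < 0 {
		return 0, errors.New("calcC: negative input")
	}
	return in * in, nil
}

// computeParallel runs the three calculations concurrently. Every
// goroutine writes only its own result field and its own error slot.
func computeParallel(in int) (Result, error) {
	var (
		r    Result
		errs [3]error
		wg   sync.WaitGroup
	)
	wg.Add(3)
	go func() { defer wg.Done(); r.A, errs[0] = calcA(in) }()
	go func() { defer wg.Done(); r.B, errs[1] = calcB(in) }()
	go func() { defer wg.Done(); r.C, errs[2] = calcC(in) }()
	wg.Wait() // the only synchronization point

	// Failure of any calculation fails the entire task.
	for _, err := range errs {
		if err != nil {
			return Result{}, err
		}
	}
	return r, nil
}

func main() {
	fmt.Println(computeParallel(3))
	fmt.Println(computeParallel(-1))
}
```

Because the goroutines never touch the same memory, no mutex and no error channel are required, and the error path cannot block.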
Given that this task appears straightforward, why are there bugs in the production code?
The comments provide valuable insights, beginning with the directive to “calculate … in parallel”, but without considering the task’s purpose and communication methods. Consequently, it establishes … as synchronization points, along with … and …
The code wasn’t initially designed with correctness as the primary concern. Instead, it began as an asynchronous version, employing the go statement as a substitute for subroutine calls, with fixes being implemented reactively as problems arose.
Regrettably, this pattern is frequently observed among junior Go developers.
Summary
When designing asynchronous programs, it is often better to begin with a synchronous, correct version. This version might even suffice in terms of performance, with parallelism introduced by the calling function, for example when it runs as part of one of many concurrent web requests. It is also worth noting that concurrency doesn’t always need to be extremely fine-grained, especially considering that the number of CPUs in a machine is limited.
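As a rough illustration of parallelism introduced by the caller, the sketch below keeps the task itself synchronous and lets a caller run one goroutine per input, much as a web server runs one per request. The names process and serveAll are hypothetical.

```go
package main

import (
	"fmt"
	"sync"
)

// process is a plain synchronous task: no goroutines, no channels.
func process(n int) (int, error) {
	if n < 0 {
		return 0, fmt.Errorf("process: negative input %d", n)
	}
	return n * n, nil
}

// serveAll plays the role of the caller. It runs one goroutine per
// input, and each goroutine writes only to its own slice index.
func serveAll(inputs []int) ([]int, []error) {
	results := make([]int, len(inputs))
	errs := make([]error, len(inputs))
	var wg sync.WaitGroup
	for i, in := range inputs {
		wg.Add(1)
		go func(i, in int) {
			defer wg.Done()
			results[i], errs[i] = process(in)
		}(i, in)
	}
	wg.Wait()
	return results, errs
}

func main() {
	results, errs := serveAll([]int{1, 2, -3, 4})
	for i := range results {
		fmt.Println(results[i], errs[i])
	}
}
```

The concurrency lives in one place, at the coarse granularity of a whole task, and process stays easy to test on its own.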
Moreover, channels and synchronization points should be introduced on purpose, as part of the design, not bolted on out of necessity. The need for error checking shouldn’t be addressed after the fact with a channel and slapped-on synchronization primitives.
[1] As mentioned before, the addition of magic numbers reduces maintainability.