Goroutines Are Cheap, but Not Free
Concurrency Is Not Parallelism1 #
Recently two interesting observations in “A Study of Real-World Data Races in Golang”2 caught my eye:
Observation 1. Developers using Go employ significantly more concurrency and synchronization constructs than in Java.
Observation 2. Developers using Go for programming microservices expose significantly more runtime concurrency than other languages such as Java, Python, and NodeJS used for the same purpose.
These observations agree with my experiences, and while easy concurrency is great it comes with its problems.3
If you are familiar with the intricacies of the Go scheduler jump to part four, otherwise let’s just take a step back:
Building Blocks #
Concurrency in Go builds upon
- Goroutines
- Channels
- Cancelation (in package
context
) - Function literals (closures, anonymous functions)
- Synchronization primitives (in package
sync
) - Result and error handling (or more generally communication between goroutines)
And while it can be argued that some of these points are not exactly part of Go’s program execution model, all of these elements are identifiable in code dealing with concurrency.
A Simple Example #
To start our journey, let us take a simple example: We want to calculate the 27. Fibonacci number (196,418), using a simple approach:
|
|
And call this function 1,000 times:
|
|
This is an easily understandable stand-in for a CPU-bound algorithm, so please do not send me better implementations - it is meant to burn CPU cycles.
Running this on a N5105 CPU gives us:
> go run fillmore-labs.com/blog/goroutines/cmd/try1
*** Finished 1000 runs in 1.47s - avg 1.47ms, stddev 18.4µs
So, our whole program takes 1.47 seconds on a single core. This is okay, but since the N5105 has four cores we can do better.
Let’s parallelize (yes, I know) this:
|
|
Ok, great. Off we go:
> go run fillmore-labs.com/blog/goroutines/cmd/try2
*** Finished 1000 runs in 5m25s - avg 325ms, stddev 6.49ms
This is pretty terrible. It takes more than 200 times as long as before while using all available cores.
One problem is easy to spot: We are creating two new goroutines and use the original one just to wait rather than do any meaningful work. Let us fix that:
|
|
Another try, then:
> go run fillmore-labs.com/blog/goroutines/cmd/try3
*** Finished 1000 runs in 1m51s - avg 111ms, stddev 3.10ms
Ok, that was an easy three times speed up. Much better, but we are still much slower than the single-core solution.
No One Writes Code Like That #
Oh? containerd
has
a ‘concurrent’ garbage collector that spawns
exponentially many goroutines:
|
|
This is some clever piece of code. The access to seen
is serialized through the grays
channel and every Node
posted to grays
increments the wait group and spawns a new goroutine.
Simply constructing a complete four-ary tree of height nine and comparing with a simple non-concurrent solution gives us the following result:
> go run fillmore-labs.com/blog/goroutines/cmd/gc
Concurrent: Found 349525 reachable nodes in 462ms
Non-Concurrent: Found 349525 reachable nodes in 166ms
Should containerd change its solution? No. This code is seven years old and seems to be unused. The constructed test tree is pathologic and the results will be different in practice due to many already seen nodes. Also calculating children might be CPU intensive.
The point I am making is that the presented kind of code exists in practice and while the go scheduler is good and forgives many mistakes, it is still often not used properly.
Summary #
Use of goroutines can make execution slower.
Especially interesting is the comparison of Parallel1
and Parallel2
. While both are bad solutions, I have often seen
the construct of having a goroutine exclusively for waiting instead of doing actual work, and it makes a dramatic
difference in this case.
… continued in part two.
-
Rob Pike. 2012. Concurrency is not Parallelism. Talk at Heroku Waza conference — January 2012 — <vimeo.com/49718712> — <go.dev/talks/2012/waza.slide> ↩︎
-
Milind Chabbi and Murali Krishna Ramanathan. 2022. A Study of Real-World Data Races in Golang. In Proceedings of the 43rd ACM SIGPLAN International Conference on Programming Language Design and Implementation — June 2022 — Pages 474–489 — <doi.org/10.1145/3519939.3523720> ↩︎
-
Tengfei Tu, Xiaoyu Liu, Linhai Song, Yiying Zhang. 2019. Understanding Real-World Concurrency Bugs in Go. In Proceedings of the Twenty-Fourth International Conference on Architectural Support for Programming Languages and Operating Systems — April 2019 — Pages 865–878 — <doi.org/10.1145/3297858.3304069> ↩︎