Java Virtual Threads
After reading my articles about Go concurrency, a friend asked me whether one could do something similar in Java.
Project Loom
Since the release of JDK 21, Java has virtual threads[1]:
Thread.startVirtualThread(() -> {
    System.out.println("Hello, world");
});
As an equivalent to Go’s goroutines:
go func() {
	fmt.Println("Hello, world")
}()
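Neither snippet waits for the task to finish. As a minimal sketch (assuming JDK 21 or later; the class `HelloVirtual` and the method `runAndWait` are names made up for this illustration), `Thread.startVirtualThread` returns an ordinary `Thread` object that can be joined:

```java
final class HelloVirtual {
    // Starts the task on a virtual thread, waits for it to finish,
    // and reports whether the thread really was a virtual one.
    static boolean runAndWait(Runnable task) throws InterruptedException {
        Thread thread = Thread.startVirtualThread(task);
        thread.join(); // block the caller until the task is done
        return thread.isVirtual();
    }

    public static void main(String[] args) throws InterruptedException {
        runAndWait(() -> System.out.println("Hello, world"));
    }
}
```

Without the `join()`, a short program could exit before the message is printed, since virtual threads are daemon threads.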
A Simple Example
As in our experiments in Go, we implement[2] a simple recursive calculation of the Fibonacci sequence:
package com.fillmore_labs.blog.jvt;

public final class Slow {
    public static int fibonacci(int n) {
        if (n < 2) {
            return n;
        }

        var fn1 = fibonacci(n - 1);
        var fn2 = fibonacci(n - 2);

        return fn1 + fn2;
    }
}
Then call it 1,000 times:
import com.fillmore_labs.blog.jvt.Slow;

void main() {
    for (int i = 0; i < 1_000; i++) {
        // var queryStart = Instant.now();
        Slow.fibonacci(27);
        // var duration = Duration.between(queryStart, Instant.now());
    }
}
Running this on our good old N5105 CPU gives us:
> bazel run //:try1
INFO: Running command line: bazel-bin/try1
*** Finished 1000 runs in 1.219s - avg 1.214ms, stddev 48.555µs
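The commented-out `Instant`/`Duration` lines hint at how the per-call timings are collected. The following is only a sketch of how the average and standard deviation in the output could be computed; `RunStats` and its methods are hypothetical names, and using the population standard deviation is an assumption (the published code may do this differently):

```java
import java.time.Duration;
import java.time.Instant;

final class RunStats {
    // Times each of `runs` invocations of the task, in nanoseconds.
    static long[] measure(int runs, Runnable task) {
        long[] samples = new long[runs];
        for (int i = 0; i < runs; i++) {
            Instant start = Instant.now();
            task.run();
            samples[i] = Duration.between(start, Instant.now()).toNanos();
        }
        return samples;
    }

    // Arithmetic mean of the samples.
    static double mean(long[] samples) {
        double sum = 0;
        for (long s : samples) {
            sum += s;
        }
        return sum / samples.length;
    }

    // Population standard deviation (divides by N, not N - 1).
    static double stddev(long[] samples) {
        double m = mean(samples);
        double squares = 0;
        for (long s : samples) {
            squares += (s - m) * (s - m);
        }
        return Math.sqrt(squares / samples.length);
    }
}
```

A call such as `RunStats.measure(1_000, () -> Slow.fibonacci(27))` would then yield the samples for both statistics.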
Which is even a little faster[3] than our Go version. Nice.
So, let’s try a naïve approach to parallelize things:
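package com.fillmore_labs.blog.jvt;

import java.util.concurrent.ExecutionException;
import java.util.concurrent.FutureTask;

public final class Parallel1 {
    public static int fibonacci(int n) throws InterruptedException, ExecutionException {
        if (n < 2) {
            return n;
        }

        // Start one virtual thread per recursive branch.
        var ff1 = new FutureTask<>(() -> fibonacci(n - 1));
        Thread.startVirtualThread(ff1);
        var ff2 = new FutureTask<>(() -> fibonacci(n - 2));
        Thread.startVirtualThread(ff2);

        // Block until both branches have delivered their results.
        return ff1.get() + ff2.get();
    }
}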
Resulting in:
> bazel run //:try2
INFO: Running command line: bazel-bin/try2
*** Finished 1000 runs in 279.364s - avg 279.346ms, stddev 54.647ms
4 minutes and 39 seconds is a little better than what Go did, but still much slower than our single-threaded solution.
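To get a feeling for why this is so slow, it helps to count how many virtual threads a single top-level call starts. The sketch below (the class `SpawnCount` is made up for this illustration) mirrors the shape of the recursion without starting any threads: every call with `n >= 2` starts two of them.

```java
final class SpawnCount {
    // Number of virtual threads a Parallel1-style recursion would start
    // for a single top-level call with argument n.
    static long threadsStarted(int n) {
        if (n < 2) {
            return 0; // base case: returns directly, starts nothing
        }
        // Two threads for this call, plus whatever the branches start.
        return 2 + threadsStarted(n - 1) + threadsStarted(n - 2);
    }

    public static void main(String[] args) {
        System.out.println(threadsStarted(27)); // prints 635620
    }
}
```

So each of the 1,000 runs creates more than 600,000 threads, each of which does little more than one addition.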
Analyzing Flame Graphs
If we look at the flame graph of the single-threaded run:

> bazel run //:bench1 -- -prof "async:output=flamegraph;direction=forward"
Iteration 1: 1220.789 ms/op
Benchmark Mode Cnt Score Error Units
Bench1.measure ss 1220.789 ms/op
We see a little time spent interpreting/compiling the program and mostly working on our Fibonacci implementation. Our naïve implementation looks like this:

We spend a lot of time blocked on a mutex in the JVM Tool Interface, maybe the global JvmtiThreadState_lock?
Other Approaches
Anyway, we are not here to debug the JVM, so let’s try some other approaches.
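package com.fillmore_labs.blog.jvt;

import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;

public record Parallel3(ExecutorService e) {
    public int fibonacci(int n) throws InterruptedException, ExecutionException {
        if (n < 2) {
            return n;
        }

        // Hand one branch to the shared executor ...
        var ff1 = e.submit(() -> fibonacci(n - 1));
        // ... and compute the other one on the current thread.
        var fn2 = fibonacci(n - 2);

        return ff1.get() + fn2;
    }
}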
Sharing an ExecutorService and using the ‘original’ thread to do some work improves things:
> bazel run //:try3
INFO: Running command line: bazel-bin/try3
*** Finished 1000 runs in 179.452s - avg 179.426ms, stddev 41.363ms

Run 3 is faster (interestingly enough, we lose to Go here) - but still slower than the single-threaded version.
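The listing does not show how the record is wired up. A possible driver could look like the following sketch, which assumes a virtual-thread-per-task executor (the article does not say which `ExecutorService` was used) and repeats the recursion in a nested record `Fib` so that the example is self-contained; `SharedExecutorDemo` and `compute` are hypothetical names:

```java
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

final class SharedExecutorDemo {
    // Same shape as Parallel3: one branch is submitted to the executor,
    // the other is computed on the calling thread.
    record Fib(ExecutorService e) {
        int fibonacci(int n) throws InterruptedException, ExecutionException {
            if (n < 2) {
                return n;
            }
            var ff1 = e.submit(() -> fibonacci(n - 1));
            var fn2 = fibonacci(n - 2);
            return ff1.get() + fn2;
        }
    }

    // Creates one executor, shares it across the whole recursion,
    // and closes it once the result is available.
    static int compute(int n) throws InterruptedException, ExecutionException {
        try (var executor = Executors.newVirtualThreadPerTaskExecutor()) {
            return new Fib(executor).fibonacci(n);
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(compute(20));
    }
}
```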
So, let’s move parallelization to the calling function:
import com.fillmore_labs.blog.jvt.Slow;
import java.util.concurrent.Executors;

void main() {
    try (var executor = Executors.newVirtualThreadPerTaskExecutor()) {
        for (int i = 0; i < 1_000; i++) {
            // var queryStart = Instant.now();
            executor.execute(() -> {
                Slow.fibonacci(27);
                // var duration = Duration.between(queryStart, Instant.now());
            });
        }
    }
}
> bazel run //:try4
INFO: Running command line: bazel-bin/try4
*** Finished 1000 runs in 349.151ms - avg 164.952ms, stddev 88.675ms

This has a flame graph similar to that of the single-threaded version and is approximately 3.5 times faster.
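The benchmark throws the results away. If they were needed, one option would be `submit` instead of `execute`, keeping the returned futures. This is a sketch under that assumption, not what the benchmark does; `FanOut` and `fibonacciAll` are invented names, and the sequential `fibonacci` is repeated here only to keep the example self-contained:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

final class FanOut {
    // Plain sequential implementation, as in Slow.
    static int fibonacci(int n) {
        return n < 2 ? n : fibonacci(n - 1) + fibonacci(n - 2);
    }

    // Runs `calls` independent computations, one virtual thread each,
    // and returns their results in submission order.
    static List<Integer> fibonacciAll(int n, int calls)
            throws InterruptedException, ExecutionException {
        List<Future<Integer>> futures = new ArrayList<>();
        try (var executor = Executors.newVirtualThreadPerTaskExecutor()) {
            for (int i = 0; i < calls; i++) {
                futures.add(executor.submit(() -> fibonacci(n)));
            }
        } // close() waits until all submitted tasks have completed

        List<Integer> results = new ArrayList<>();
        for (var future : futures) {
            results.add(future.get()); // already done, returns immediately
        }
        return results;
    }
}
```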
Improve Latency
Now let us limit the number of queued calls:
import com.fillmore_labs.blog.jvt.Slow;
import java.util.concurrent.Executors;
import java.util.concurrent.Semaphore;

void main() throws InterruptedException {
    try (var executor = Executors.newVirtualThreadPerTaskExecutor()) {
        var numCPU = Runtime.getRuntime().availableProcessors();
        var pool = new Semaphore(numCPU);
        for (int i = 0; i < 1_000; i++) {
            // var queryStart = Instant.now();
            pool.acquire();
            executor.execute(
                () -> {
                    Slow.fibonacci(27);
                    // var duration = Duration.between(queryStart, Instant.now());
                    pool.release();
                });
        }
    }
}
> bazel run //:try5
INFO: Running command line: bazel-bin/try5
*** Finished 1000 runs in 359.420ms - avg 1.697ms, stddev 665.871µs

Which improves our latency from 165ms to 1.7ms.
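In the listing, the permit is only released when the task completes normally, which is fine for our Fibonacci function. For tasks that can throw, a more defensive variant would release in a `finally` block. The sketch below does that and additionally records the highest number of tasks seen running at once, to show that the semaphore really bounds concurrency; `BoundedRunner` and `run` are hypothetical names, not part of the published code:

```java
import java.util.concurrent.Executors;
import java.util.concurrent.Semaphore;
import java.util.concurrent.atomic.AtomicInteger;

final class BoundedRunner {
    // Runs `calls` copies of the task on virtual threads, at most `limit`
    // at a time, and returns the highest concurrency that was observed.
    static int run(int calls, int limit, Runnable task) throws InterruptedException {
        var pool = new Semaphore(limit);
        var running = new AtomicInteger();
        var peak = new AtomicInteger();
        try (var executor = Executors.newVirtualThreadPerTaskExecutor()) {
            for (int i = 0; i < calls; i++) {
                pool.acquire(); // the submitting thread waits for a free slot
                executor.execute(() -> {
                    try {
                        peak.accumulateAndGet(running.incrementAndGet(), Math::max);
                        task.run();
                    } finally {
                        running.decrementAndGet();
                        pool.release(); // hand the slot back even if the task throws
                    }
                });
            }
        }
        return peak.get();
    }
}
```

Since a task only starts after its permit has been acquired and gives it back as its last action, the returned peak can never exceed `limit`.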
Summary
Exercises on how many threads can be started on a certain machine are mostly boring - this metric primarily showcases the small initial stack size of virtual threads.
Seeing Java adopt virtual threads is exciting. However, it’s unlikely that Java code will resemble Go or Erlang soon. Developing correct, efficient concurrent code is much more than just replacing one threading model with another[4], and there are fundamental differences in the existing (standard) libraries.
… continued in part two.
[1] Ron Pressler, Alan Bateman. 2023. Virtual Threads. JDK Enhancement Proposals, JEP 444, March 2023. <openjdk.org/jeps/444>
[2] The code is available on GitHub at github.com/fillmore-labs/blog-javavirtualthreads.
[3] This isn’t a comparison of Go and Java, at least not in terms of performance. Java excels in benchmarks and repetitive tasks.
[4] Alan Bateman. 2023. The Challenges of Introducing Virtual Threads to the Java Platform - Project Loom. JVM Language Summit 2023, August 2023. <youtu.be/WsCJYQDPrrE?t=667>