Java 21 Virtual Threads – Dude, Where's My Lock?



Getting real with virtual threads


By Vadim Filanovsky, Mike Huang, Danny Thomas and Martin Chalupa

Netflix has an extensive history of using Java as our primary programming language across our vast fleet of microservices. As we pick up newer versions of Java, our JVM Ecosystem team seeks out new language features that can improve the ergonomics and performance of our systems. In a recent article, we detailed how our workloads benefited from switching to generational ZGC as our default garbage collector when we migrated to Java 21. Virtual threads is another feature we are excited to adopt as part of this migration.

For those new to virtual threads, they are described as “lightweight threads that dramatically reduce the effort of writing, maintaining, and observing high-throughput concurrent applications.” Their power comes from their ability to be suspended and resumed automatically via continuations when blocking operations occur, thus freeing the underlying operating system threads to be reused for other operations. Leveraging virtual threads can unlock higher performance when applied in the appropriate context.
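
As a minimal illustration (a generic sketch, not code from the services discussed below), Java 21 lets you start a virtual thread directly from the Thread API; when the body blocks, the JDK suspends the continuation and frees the carrier thread:

import java.time.Duration;

public class VirtualThreadHello {
    public static void main(String[] args) throws InterruptedException {
        Thread vt = Thread.ofVirtual().name("hello-vt").start(() -> {
            try {
                // Blocking suspends the virtual thread via a continuation;
                // the carrier OS thread is freed to run other virtual threads.
                Thread.sleep(Duration.ofMillis(100));
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
            System.out.println("ran on " + Thread.currentThread());
        });
        vt.join();
    }
}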

In this article we discuss one of the peculiar cases that we encountered along our path to deploying virtual threads on Java 21.

Netflix engineers raised several independent reports of intermittent timeouts and hung instances to the Performance Engineering and JVM Ecosystem teams. Upon closer examination, we noticed a set of common traits and symptoms. In all cases, the apps affected ran on Java 21 with Spring Boot 3 and embedded Tomcat serving traffic on REST endpoints. The instances that experienced the issue simply stopped serving traffic even though the JVM on those instances remained up and running. One clear symptom characterizing the onset of this issue is a persistent increase in the number of sockets in closeWait state, as illustrated by the graph below:

Sockets remaining in closeWait state indicate that the remote peer closed the socket, but it was never closed on the local instance, presumably because the application failed to do so. This can often indicate that the application is hanging in an abnormal state, in which case application thread dumps may reveal additional insight.

In order to troubleshoot this issue, we first leveraged our alerts system to catch an instance in this state. Since we periodically collect and persist thread dumps for all JVM workloads, we can often retroactively piece together the behavior by examining these thread dumps from an instance. However, we were surprised to find that all our thread dumps showed a perfectly idle JVM with no clear activity. Reviewing recent changes revealed that these impacted services enabled virtual threads, and we knew that virtual thread call stacks do not show up in jstack-generated thread dumps. To obtain a more complete thread dump containing the state of the virtual threads, we used the “jcmd Thread.dump_to_file” command instead. As a last-ditch effort to introspect the state of the JVM, we also collected a heap dump from the instance.
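
For reference, the full invocation looks like the following, where <pid> is the target JVM's process id and the output path is arbitrary (JSON output is optional; the default is plain text):

jcmd <pid> Thread.dump_to_file -format=json /tmp/threads.json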

Thread dumps revealed thousands of “blank” virtual threads:

#119821 "" virtual

#119820 "" virtual

#119823 "" virtual

#120847 "" virtual

#119822 "" virtual
...

These are the VTs (virtual threads) for which a thread object is created, but has not started running, and as such, has no stack trace. In fact, there were approximately the same number of blank VTs as the number of sockets in closeWait state. To make sense of what we were seeing, we need to first understand how VTs operate.
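
This “blank” state is easy to reproduce in isolation (a hypothetical snippet, not from the affected services): a virtual thread that has been created but never scheduled reports no stack frames at all:

import java.util.Arrays;

public class BlankVirtualThread {
    public static void main(String[] args) {
        // Created but never started: the same state as the "blank" VTs above.
        Thread vt = Thread.ofVirtual().unstarted(() -> {});
        System.out.println(vt.getState());                        // NEW
        System.out.println(Arrays.toString(vt.getStackTrace()));  // []
    }
}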

A virtual thread is not mapped 1:1 to a dedicated OS-level thread. Rather, we can think of it as a task that is scheduled to a fork-join thread pool. When a virtual thread enters a blocking call, like waiting for a Future, it relinquishes the OS thread it occupies and simply remains in memory until it is ready to resume. In the meantime, the OS thread can be reassigned to execute other VTs within the same fork-join pool. This allows us to multiplex a lot of VTs to just a handful of underlying OS threads. In JVM terminology, the underlying OS thread is referred to as the “carrier thread” to which a virtual thread can be “mounted” while it executes and “unmounted” while it waits. A great in-depth description of virtual threads is available in JEP 444.
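
To make the multiplexing concrete, here is a sketch under default scheduler settings (the parallelism property mentioned in the comment is a JDK tuning knob, not something we set): thousands of blocking tasks run over a carrier pool sized to the CPU count:

import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class CarrierDemo {
    public static void main(String[] args) {
        // The default carrier pool is a ForkJoinPool with one thread per CPU
        // (tunable via -Djdk.virtualThreadScheduler.parallelism).
        try (ExecutorService executor = Executors.newVirtualThreadPerTaskExecutor()) {
            for (int i = 0; i < 10_000; i++) {
                executor.submit(() -> {
                    try {
                        Thread.sleep(100); // unmounts from its carrier while parked
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                    }
                });
            }
        } // close() waits for all 10,000 tasks to finish
        System.out.println("Ran 10,000 blocking tasks on "
                + Runtime.getRuntime().availableProcessors() + " carrier threads");
    }
}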

In our environment, we utilize a blocking model for Tomcat, which in effect holds a worker thread for the lifespan of a request. By enabling virtual threads, Tomcat switches to virtual execution. Each incoming request creates a new virtual thread that is simply scheduled as a task on a Virtual Thread Executor. We can see Tomcat creates a VirtualThreadExecutor here.
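
For reference, in open-source Spring Boot (3.2 and later) this switch is exposed as a single configuration property; a minimal application.properties entry looks like this (shown as the generic equivalent of what our services enable):

# Run Tomcat request processing on virtual threads (Spring Boot 3.2+)
spring.threads.virtual.enabled=true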

Tying this information back to our problem, the symptoms correspond to a state where Tomcat keeps creating a new web worker VT for each incoming request, but there are no available OS threads to mount them onto.

What happened to our OS threads, and what are they busy with? As described here, a VT will be pinned to the underlying OS thread if it performs a blocking operation while inside a synchronized block or method. This is exactly what is happening here. Here is a relevant snippet from a thread dump obtained from the stuck instance:

#119515 "" virtual
java.base/jdk.internal.misc.Unsafe.park(Native Method)
java.base/java.lang.VirtualThread.parkOnCarrierThread(VirtualThread.java:661)
java.base/java.lang.VirtualThread.park(VirtualThread.java:593)
java.base/java.lang.System$2.parkVirtualThread(System.java:2643)
java.base/jdk.internal.misc.VirtualThreads.park(VirtualThreads.java:54)
java.base/java.util.concurrent.locks.LockSupport.park(LockSupport.java:219)
java.base/java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire(AbstractQueuedSynchronizer.java:754)
java.base/java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire(AbstractQueuedSynchronizer.java:990)
java.base/java.util.concurrent.locks.ReentrantLock$Sync.lock(ReentrantLock.java:153)
java.base/java.util.concurrent.locks.ReentrantLock.lock(ReentrantLock.java:322)
zipkin2.reporter.internal.CountBoundedQueue.offer(CountBoundedQueue.java:54)
zipkin2.reporter.internal.AsyncReporter$BoundedAsyncReporter.report(AsyncReporter.java:230)
zipkin2.reporter.brave.AsyncZipkinSpanHandler.end(AsyncZipkinSpanHandler.java:214)
brave.internal.handler.NoopAwareSpanHandler$CompositeSpanHandler.end(NoopAwareSpanHandler.java:98)
brave.internal.handler.NoopAwareSpanHandler.end(NoopAwareSpanHandler.java:48)
brave.internal.recorder.PendingSpans.finish(PendingSpans.java:116)
brave.RealSpan.finish(RealSpan.java:134)
brave.RealSpan.finish(RealSpan.java:129)
io.micrometer.tracing.brave.bridge.BraveSpan.end(BraveSpan.java:117)
io.micrometer.tracing.annotation.AbstractMethodInvocationProcessor.after(AbstractMethodInvocationProcessor.java:67)
io.micrometer.tracing.annotation.ImperativeMethodInvocationProcessor.proceedUnderSynchronousSpan(ImperativeMethodInvocationProcessor.java:98)
io.micrometer.tracing.annotation.ImperativeMethodInvocationProcessor.process(ImperativeMethodInvocationProcessor.java:73)
io.micrometer.tracing.annotation.SpanAspect.newSpanMethod(SpanAspect.java:59)
java.base/jdk.internal.reflect.DirectMethodHandleAccessor.invoke(DirectMethodHandleAccessor.java:103)
java.base/java.lang.reflect.Method.invoke(Method.java:580)
org.springframework.aop.aspectj.AbstractAspectJAdvice.invokeAdviceMethodWithGivenArgs(AbstractAspectJAdvice.java:637)
...

In this stack trace, we enter the synchronization in brave.RealSpan.finish(RealSpan.java:134). This virtual thread is effectively pinned: it is mounted to an actual OS thread even while it waits to acquire a reentrant lock. There are 3 VTs in this exact state and another VT identified as “<redacted> @DefaultExecutor - 46542” that also follows the same code path. These 4 virtual threads are pinned while waiting to acquire a lock. Because the app is deployed on an instance with 4 vCPUs, the fork-join pool that underpins VT execution also contains 4 OS threads. Now that we have exhausted all of them, no other virtual thread can make any progress. This explains why Tomcat stopped processing the requests and why the number of sockets in closeWait state keeps climbing. Indeed, Tomcat accepts a connection on a socket, creates a request along with a virtual thread, and passes this request/thread to the executor for processing. However, the newly created VT cannot be scheduled because all of the OS threads in the fork-join pool are pinned and never released. So these newly created VTs are stuck in the queue, while still holding the socket.
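
The pinning pitfall is easy to reproduce in isolation. Below is a hypothetical sketch (not the Zipkin/Brave code from the stack trace above): any blocking operation inside a synchronized block keeps the virtual thread mounted, and running with the JDK 21 diagnostic flag -Djdk.tracePinnedThreads=full logs the offending frames:

public class PinningDemo {
    private static final Object MONITOR = new Object();

    public static void main(String[] args) throws InterruptedException {
        // Run with -Djdk.tracePinnedThreads=full to log the pinned frames.
        Thread vt = Thread.ofVirtual().start(() -> {
            synchronized (MONITOR) {      // holding a monitor...
                try {
                    Thread.sleep(1_000);  // ...while blocking: the VT cannot
                                          // unmount, so it pins its carrier.
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            }
        });
        vt.join();
    }
}

In our case, the monitor is the synchronized RealSpan.finish method and the blocking operation is parking on the ReentrantLock inside CountBoundedQueue.offer; replacing such synchronized sections with a ReentrantLock is the commonly recommended mitigation while pinning remains a JDK limitation.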

Now that we know VTs are waiting to acquire a lock, the next question is: who holds the lock? Answering this question is key to understanding what triggered this condition in the first place. Usually a thread dump indicates who holds the lock with either “- locked <0x…> (at …)” or “Locked ownable synchronizers,” but neither of these shows up in our thread dumps. As a matter of fact, no locking/parking/waiting information is included in the jcmd-generated thread dumps. This is a limitation in Java 21 and will be addressed in future releases. Carefully combing through the thread dump reveals that there are a total of 6 threads contending for the same ReentrantLock and associated Condition. Four of these six threads are detailed in the previous section. Here is another thread:

#119516 "" virtual
java.base/java.lang.VirtualThread.park(VirtualThread.java:582)
java.base/java.lang.System$2.parkVirtualThread(System.java:2643)
java.base/jdk.internal.misc.VirtualThreads.park(VirtualThreads.java:54)
java.base/java.util.concurrent.locks.LockSupport.park(LockSupport.java:219)
java.base/java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire(AbstractQueuedSynchronizer.java:754)
java.base/java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire(AbstractQueuedSynchronizer.java:990)
java.base/java.util.concurrent.locks.ReentrantLock$Sync.lock(ReentrantLock.java:153)
java.base/java.util.concurrent.locks.ReentrantLock.lock(ReentrantLock.java:322)
zipkin2.reporter.internal.CountBoundedQueue.offer(CountBoundedQueue.java:54)
zipkin2.reporter.internal.AsyncReporter$BoundedAsyncReporter.report(AsyncReporter.java:230)
zipkin2.reporter.brave.AsyncZipkinSpanHandler.end(AsyncZipkinSpanHandler.java:214)
brave.internal.handler.NoopAwareSpanHandler$CompositeSpanHandler.end(NoopAwareSpanHandler.java:98)
brave.internal.handler.NoopAwareSpanHandler.end(NoopAwareSpanHandler.java:48)
brave.internal.recorder.PendingSpans.finish(PendingSpans.java:116)
brave.RealScopedSpan.finish(RealScopedSpan.java:64)
...

Note that while this thread seemingly goes through the same code path for finishing a span, it does not go through a synchronized block. Finally, here is the sixth thread:

#107 "AsyncReporter <redacted>"
java.base/jdk.internal.misc.Unsafe.park(Native Method)
java.base/java.util.concurrent.locks.LockSupport.park(LockSupport.java:221)
java.base/java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire(AbstractQueuedSynchronizer.java:754)
java.base/java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:1761)
zipkin2.reporter.internal.CountBoundedQueue.drainTo(CountBoundedQueue.java:81)
zipkin2.reporter.internal.AsyncReporter$BoundedAsyncReporter.flush(AsyncReporter.java:241)
zipkin2.reporter.internal.AsyncReporter$Flusher.run(AsyncReporter.java:352)
java.base/java.lang.Thread.run(Thread.java:1583)

This is actually a regular platform thread, not a virtual thread. Paying particular attention to the line numbers in this stack trace, it is peculiar that the thread appears to be blocked within the internal acquire() method after completing the wait. In other words, this calling thread owned the lock upon entering awaitNanos(). We know the lock was explicitly acquired here. However, by the time the wait completed, it could not reacquire the lock.
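
To see why a thread can be parked inside acquire() after its wait completes, recall the contract of Condition.awaitNanos: it releases the lock while parked, but must reacquire the lock before it returns, even when the timeout has already elapsed. Here is a minimal sketch mirroring the producer/consumer shape of the reporter (hypothetical code, not Zipkin's actual source):

import java.util.concurrent.locks.Condition;
import java.util.concurrent.locks.ReentrantLock;

public class AwaitSketch {
    private final ReentrantLock lock = new ReentrantLock();
    private final Condition available = lock.newCondition();

    void waitForWork(long timeoutNanos) throws InterruptedException {
        lock.lock();           // a Condition may only be awaited under its lock
        try {
            long nanos = timeoutNanos;
            while (nanos > 0) {
                // awaitNanos releases the lock, parks, then REACQUIRES the lock
                // before returning, even if the timeout has expired. That
                // reacquisition is where thread #107 above is parked: its stack
                // shows acquire() invoked from within awaitNanos().
                nanos = available.awaitNanos(nanos);
            }
        } finally {
            lock.unlock();
        }
    }
}

Summarizing our thread dump analysis: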

There are 5 virtual threads and 1 regular thread waiting for the lock. Out of those 5 VTs, 4 of them are pinned to the OS threads in the fork-join pool. There is still no information on who owns the lock. As there is nothing more we can glean from the thread dump, our next logical step is to peek into the heap dump and introspect the state of the lock.

Finding the lock in the heap dump was relatively straightforward. Using the excellent Eclipse MAT tool, we examined the objects on the stack of the AsyncReporter non-virtual thread to identify the lock object. Reasoning about the current state of the lock was perhaps the trickiest part of our investigation. Most of the relevant code can be found in AbstractQueuedSynchronizer.java. While we do not claim to fully understand its inner workings, we reverse-engineered enough of it to match against what we see in the heap dump. This diagram illustrates our findings:
