[GR-60772] Java applications crashing with error SIGSEGV #10386

rudde0 · 2024-12-28T20:26:46Z

Describe the issue

Hello,
I'm experiencing a SIGSEGV crash issue on my Minecraft server, which is running OpenJDK Runtime Environment GraalVM CE 23.0.1+11.

This issue also occurs with Temurin 21/23 and the default OpenJDK versions. While this might not be a GraalVM-specific issue (since other JVMs exhibit the same behavior), I've been trying to resolve it for months. I wanted to open an issue with my favorite JVM distribution in hopes of receiving insights or potential solutions.

My Java servers crash at unpredictable intervals, always with SIGSEGV. The crashing frame and fault address reported in each hs_err log change every time, but the signal, SIGSEGV (segmentation fault), is always the same.

Steps to reproduce the issue

Unfortunately, I don't have a clear way to reproduce the issue. It happens randomly—sometimes after an hour or a day of runtime, other times within minutes. While there must be an underlying cause, I haven't been able to identify it.

Describe GraalVM and your environment:

GraalVM Version: OpenJDK Runtime Environment GraalVM CE 23.0.1+11
JDK Major Version: 23
OS: Ubuntu 22.04.5 (kernel 5.15.0-128-generic) (also tested with 5.15.0-127-generic)
Architecture: AMD64
Hardware:
- CPUs: Ryzen 7 5800X, Ryzen 9 9950X
- Storage: Samsung 980, Samsung 990 PRO
- Motherboards: B550M-PRO, B650M
- RAM: Patriot 4x32GB 3600MHz, Kingston EXPO 6400MHz

More details

I've tried numerous troubleshooting steps to resolve the issue, including:

- Enabling and disabling huge pages
- Allocating more RAM / reducing RAM capacity
- Increasing swap / disabling swap

I can't experiment with Ubuntu 24 because kernel 6.8 causes boot issues on my system. I previously opened a thread about it on Reddit.

Due to the size of the crash logs, I've uploaded them as attachments instead of including them directly in this report.

hs_err_pid2496996.log
_usr_lib_jvm_jdk-23.0.1+11_bin_java.0.txt
hs_err_pid105125.log

oubidar-Abderrahim · 2024-12-30T15:27:20Z

Thank you for sharing this. We'll look into it.

oubidar-Abderrahim · 2025-01-03T10:03:22Z

Based on our team's analysis, this is an issue with G1GC. We'll continue investigating this to get to the root cause.
In the meantime, a suitable workaround for you would be to select a different GC.
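If it helps when trying the workaround, here is a minimal sketch for confirming from inside the JVM which collector is actually running after the switch (for example after starting with `-XX:+UseParallelGC`). `ActiveGcCheck` is a hypothetical helper, not part of GraalVM or the server, and the `"G1 "` name prefix is an assumption based on how HotSpot usually names its G1 collector beans.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: reports which collectors the running JVM has registered.
final class ActiveGcCheck {

    // Names of the garbage collector MXBeans exposed by this JVM.
    static List<String> collectorNames() {
        List<String> names = new ArrayList<>();
        for (GarbageCollectorMXBean bean : ManagementFactory.getGarbageCollectorMXBeans()) {
            names.add(bean.getName());
        }
        return names;
    }

    // Assumption: G1 beans are named with a "G1 " prefix, e.g. "G1 Young Generation".
    static boolean looksLikeG1(List<String> names) {
        for (String name : names) {
            if (name.startsWith("G1 ")) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        List<String> names = collectorNames();
        System.out.println("Collectors: " + names);
        System.out.println("G1 in use: " + looksLikeG1(names));
    }
}
```

Running this with the same JVM flags as the server should show whether G1 is still active, which is worth checking before concluding that a different GC did or did not help.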

rudde0 · 2025-01-03T11:00:00Z

Two days ago, I decreased the RAM speed from 4600 MHz to 3600 MHz and disabled AMD's EXPO functionality to test whether the issue was caused by the AMD memory profile. After that, I ran Memtest86+ for 12 passes over all memory (the test took 42 hours). It logged no errors. However, today my JVM application generated another hs_err file.

edit: The hs_err below is taken on Debian 12.8 (kernel 6.11)

hs_err_pid71604.log

I'll try changing the GC now. Thank you.

tkrodriguez · 2025-01-06T19:33:56Z

Your command line includes quite a few G1 specific configuration options. Have you tried leaving those out? It's possible those options are accidentally exposing you to some bug that would otherwise be rare.

I think setting max and min heap to the same value is probably not great in general so maybe drop the -Xms part?
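To make that audit concrete, here is a rough sketch that lists the G1-specific options the JVM was started with and checks whether `-Xms` and `-Xmx` carry the same value. `G1FlagAudit` is a hypothetical helper, and the `-XX:G1...` name pattern is only a heuristic: it deliberately skips `-XX:+UseG1GC` (which selects G1 rather than tuning it) and will miss G1-related options whose names don't start with `G1`.

```java
import java.lang.management.ManagementFactory;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: inspects the JVM's own startup arguments.
final class G1FlagAudit {

    // Heuristic: keep options of the form -XX:G1..., -XX:+G1... or -XX:-G1...
    static List<String> g1Specific(List<String> jvmArgs) {
        List<String> hits = new ArrayList<>();
        for (String arg : jvmArgs) {
            if (arg.matches("-XX:[+-]?G1.*")) {
                hits.add(arg);
            }
        }
        return hits;
    }

    // True when both -Xms and -Xmx are present with the same textual value.
    // Note: "16g" and "16384m" are equal sizes but will not compare equal here.
    static boolean minEqualsMax(List<String> jvmArgs) {
        String xms = null;
        String xmx = null;
        for (String arg : jvmArgs) {
            if (arg.startsWith("-Xms")) {
                xms = arg.substring(4);
            }
            if (arg.startsWith("-Xmx")) {
                xmx = arg.substring(4);
            }
        }
        return xms != null && xms.equalsIgnoreCase(xmx);
    }

    public static void main(String[] args) {
        List<String> jvmArgs = ManagementFactory.getRuntimeMXBean().getInputArguments();
        System.out.println("G1-specific options: " + g1Specific(jvmArgs));
        System.out.println("-Xms equals -Xmx: " + minEqualsMax(jvmArgs));
    }
}
```

Dropping the listed options one group at a time, rather than all at once, would make it easier to tell which of them (if any) affects the crash rate.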

Are you using virtual threads?

Problems like this are exceedingly hard to track down. One thing I'd suggest is to keep all your hs_err_pid files and look for some commonality between them. Crashes in the depths of the collector itself are often inscrutable, so I'd focus on crashes in normal execution. The hs_err_pid2496996.log crash is dying while performing a virtual dispatch on RSI, which holds a heap-like address that doesn't seem to be a valid object. From the crash it looks like it's being called from CraftAsyncTask.run:

stack at sp + 0 slots: 0x00007f31c365f83c is at entry_point+2410 in (nmethod*)0x00007f31c365eb08
Compiled method (JVMCI) 52209173 160696   !   4       org.bukkit.craftbukkit.scheduler.CraftAsyncTask::run (807 bytes)

Keep an eye on the functions you are crashing in, but also on the values in the "Register to memory mapping:" section. This is where HotSpot tries to decode the contents of registers into things it knows about. Commonality between the oops, classes and nmethods can provide a clue about where things might be going wrong.
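As one way to look for that commonality, here is a small sketch that pulls the line after `# Problematic frame:` out of each hs_err file and counts how often each frame recurs. `HsErrSummary` is a hypothetical helper; masking hex values so that the same frame at different addresses counts as one entry is my own simplification, and it only covers the problematic frame, not the register mapping section.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper: summarises the crashing frame across several hs_err logs.
final class HsErrSummary {

    // Returns the line after "# Problematic frame:" with the leading "#" removed
    // and hex values masked, or "" when the log has no such section.
    static String problematicFrame(List<String> lines) {
        for (int i = 0; i + 1 < lines.size(); i++) {
            if (lines.get(i).trim().equals("# Problematic frame:")) {
                String frame = lines.get(i + 1).replaceFirst("^#\\s*", "").trim();
                return frame.replaceAll("0x[0-9a-fA-F]+", "0x?");
            }
        }
        return "";
    }

    // Counts how many logs share each (masked) problematic frame.
    static Map<String, Integer> tally(List<List<String>> logs) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (List<String> log : logs) {
            counts.merge(problematicFrame(log), 1, Integer::sum);
        }
        return counts;
    }

    // Usage: java HsErrSummary hs_err_pid*.log
    public static void main(String[] args) throws IOException {
        List<List<String>> logs = new ArrayList<>();
        for (String file : args) {
            // ISO-8859-1 decodes any byte, so odd characters in a log cannot abort the read.
            logs.add(Files.readAllLines(Paths.get(file), StandardCharsets.ISO_8859_1));
        }
        tally(logs).forEach((frame, count) -> System.out.println(count + "  " + frame));
    }
}
```

A frame that shows up in several logs is a better place to start than a one-off crash deep inside the collector.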

Could this be a problem with usage of JNA? Are there any debug options you can use with it?

Is there some JDK release where you don't have this problem?

rudde0 added the bug label Dec 28, 2024

oubidar-Abderrahim self-assigned this Dec 30, 2024

oubidar-Abderrahim changed the title ~~Java applications crashing with error SIGSEGV~~ [GR-60772] Java applications crashing with error SIGSEGV Jan 3, 2025

oubidar-Abderrahim assigned tkrodriguez and unassigned oubidar-Abderrahim Jan 7, 2025
