[GR-60772] Java applications crashing with error SIGSEGV #10386

Open
rudde0 opened this issue Dec 28, 2024 · 4 comments

rudde0 commented Dec 28, 2024

Describe the issue

Hello,
I'm experiencing a SIGSEGV crash issue on my Minecraft server, which is running OpenJDK Runtime Environment GraalVM CE 23.0.1+11.

This issue also occurs with Temurin 21/23 and the default OpenJDK versions. While this might not be a GraalVM-specific issue (since other JVMs exhibit the same behavior), I've been trying to resolve it for months. I wanted to open an issue with my favorite JVM distribution in hopes of receiving insights or potential solutions.

My Java servers crash repeatedly, at unpredictable times, with SIGSEGV errors. The crash address and the event stack change each time, but the signal is always SIGSEGV (segmentation fault).


Steps to reproduce the issue

Unfortunately, I don't have a clear way to reproduce the issue. It happens randomly—sometimes after an hour or a day of runtime, other times within minutes. While there must be an underlying cause, I haven't been able to identify it.


Describe GraalVM and your environment:

  • GraalVM Version: OpenJDK Runtime Environment GraalVM CE 23.0.1+11
  • JDK Major Version: 23
  • OS: Ubuntu 22.04.5 (kernel 5.15.0-128-generic) (also tested with 5.15.0-127-generic)
  • Architecture: AMD64
  • Hardware:
    • CPUs: Ryzen 7 5800X, Ryzen 9 9950X
    • Storage: Samsung 980, Samsung 990 PRO
    • Motherboards: B550M-PRO, B650M
    • RAM: Patriot 4x32GB 3600MHz, Kingston EXPO 6400MHz

More details

I've tried numerous troubleshooting steps to resolve the issue, including:

  • Enabling and disabling huge pages
  • Allocating more RAM / reducing RAM capacity
  • Increasing swap / disabling swap

I can't experiment with Ubuntu 24 because kernel 6.8 causes boot issues on my system. I previously opened a thread about it on Reddit.

Due to the size of the crash logs, I've uploaded them as attachments instead of including them directly in this report.

hs_err_pid2496996.log
_usr_lib_jvm_jdk-23.0.1+11_bin_java.0.txt
hs_err_pid105125.log

@rudde0 rudde0 added the bug label Dec 28, 2024
@oubidar-Abderrahim oubidar-Abderrahim self-assigned this Dec 30, 2024
@oubidar-Abderrahim
Member

Thank you for sharing this. We'll take a look into it.

@oubidar-Abderrahim
Member

Based on our team's analysis, this is an issue with G1GC. We'll continue investigating this to get to the root cause.
In the meantime, a suitable workaround for you would be to select a different GC.
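
One way to confirm that a GC switch actually took effect is to ask the running JVM which collectors it registered. The following is a minimal sketch and was not posted in this thread; the class name `ActiveGc` is made up for illustration. It relies only on the standard `java.lang.management` API. A different collector is normally selected with a HotSpot flag such as `-XX:+UseParallelGC` or `-XX:+UseZGC` on the `java` command line.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical helper: reports the garbage collectors registered in this JVM.
public class ActiveGc {

    // Returns the names of all collector MXBeans exposed by the running JVM.
    static List<String> collectorNames() {
        return ManagementFactory.getGarbageCollectorMXBeans().stream()
                .map(GarbageCollectorMXBean::getName)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println("Active collectors: " + collectorNames());
    }
}
```

Run it with the same GC flags as the server. With G1 active, the reported names typically start with "G1"; after switching collectors they should not.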

@oubidar-Abderrahim oubidar-Abderrahim changed the title Java applications crashing with error SIGSEGV [GR-60772] Java applications crashing with error SIGSEGV Jan 3, 2025
@rudde0
Author

rudde0 commented Jan 3, 2025

Two days ago, I decreased the RAM speed from 4600 MHz to 3600 MHz and disabled AMD's EXPO functionality to test whether the issue was caused by the memory configuration. After that, I ran Memtest86+ for 12 passes over all memory (the test took 42 hours). No errors were logged. However, today my JVM application generated another hs_err file.

edit: The hs_err below is taken on Debian 12.8 (kernel 6.11)

hs_err_pid71604.log

I'll try to change GC now. Thank you

@tkrodriguez
Member

Your command line includes quite a few G1 specific configuration options. Have you tried leaving those out? It's possible those options are accidentally exposing you to some bug that would otherwise be rare.

I think setting max and min heap to the same value is probably not great in general so maybe drop the -Xms part?
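
To see which of these options a running JVM was actually started with, the arguments can be read back at runtime. This is a rough sketch under stated assumptions: the class `G1FlagAudit` and its methods are hypothetical, "G1-specific" is approximated as any `-XX:` option whose name contains `G1`, and the sample flags in the usage note are illustrative rather than the reporter's actual command line.

```java
import java.lang.management.ManagementFactory;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical helper: lists G1-related options and checks whether -Xms equals -Xmx.
public class G1FlagAudit {

    // Keeps only -XX: options that mention G1 (a heuristic, not an official list).
    static List<String> g1Flags(List<String> jvmArgs) {
        return jvmArgs.stream()
                .filter(a -> a.startsWith("-XX:") && a.contains("G1"))
                .collect(Collectors.toList());
    }

    // True when both -Xms and -Xmx are present and carry the same textual value.
    static boolean heapPinned(List<String> jvmArgs) {
        String min = null;
        String max = null;
        for (String a : jvmArgs) {
            if (a.startsWith("-Xms")) {
                min = a.substring(4);
            } else if (a.startsWith("-Xmx")) {
                max = a.substring(4);
            }
        }
        return min != null && min.equalsIgnoreCase(max);
    }

    public static void main(String[] args) {
        // Arguments the current JVM was launched with (excludes the main class and its args).
        List<String> jvmArgs = ManagementFactory.getRuntimeMXBean().getInputArguments();
        System.out.println("G1-related options: " + g1Flags(jvmArgs));
        System.out.println("-Xms equals -Xmx:   " + heapPinned(jvmArgs));
    }
}
```

For example, given the illustrative list `-Xms10G -Xmx10G -XX:+UseG1GC -XX:G1NewSizePercent=30 -XX:MaxGCPauseMillis=200`, `g1Flags` keeps `-XX:+UseG1GC` and `-XX:G1NewSizePercent=30`, and `heapPinned` returns true. The comparison is purely textual, so `10G` and `10240M` would not be treated as equal.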

Are you using virtual threads?

Problems like this are exceedingly hard to track down. One thing I'd suggest is to keep all your hs_err_pid files and look for some commonality between them. Crashes in the depths of the collector itself are often inscrutable, so I'd focus on crashes in normal execution. The hs_err_pid2496996.log crash is dying while performing a virtual dispatch on RSI, which holds a heap-like address that doesn't seem to be a valid object. From the crash it looks like it's being called from CraftAsyncTask.run:

stack at sp + 0 slots: 0x00007f31c365f83c is at entry_point+2410 in (nmethod*)0x00007f31c365eb08
Compiled method (JVMCI) 52209173 160696   !   4       org.bukkit.craftbukkit.scheduler.CraftAsyncTask::run (807 bytes)

Keep an eye on the functions you are crashing in, but also on the values in the "Register to memory mapping:" section. This is where HotSpot tries to decode the contents of registers into things it knows about. Commonality between the oops, classes and nmethods can provide a clue about where things might be going wrong.
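
As a starting point for comparing crash logs, the line that follows the `# Problematic frame:` header in each hs_err file can be tallied across a directory. This is only a sketch: the class `HsErrTally` is hypothetical, and it groups frames by their exact text, so frames that differ only in an address are counted separately. It does not decode the register mapping section.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper: counts how often each problematic frame appears across hs_err logs.
public class HsErrTally {

    // Returns the line after "# Problematic frame:" without its leading "# ", or null if absent.
    static String problematicFrame(List<String> lines) {
        for (int i = 0; i + 1 < lines.size(); i++) {
            if (lines.get(i).startsWith("# Problematic frame:")) {
                return lines.get(i + 1).replaceFirst("^#\\s*", "").trim();
            }
        }
        return null;
    }

    public static void main(String[] args) throws IOException {
        // Directory holding the crash logs; defaults to the current directory.
        Path dir = Paths.get(args.length > 0 ? args[0] : ".");
        Map<String, Integer> counts = new TreeMap<>();
        try (DirectoryStream<Path> logs = Files.newDirectoryStream(dir, "hs_err_pid*.log")) {
            for (Path log : logs) {
                String frame = problematicFrame(Files.readAllLines(log));
                counts.merge(frame == null ? "(no problematic frame found)" : frame, 1, Integer::sum);
            }
        }
        counts.forEach((frame, n) -> System.out.println(n + "  " + frame));
    }
}
```

Running `java HsErrTally.java /path/to/logs` prints one line per distinct frame with its count, which makes it easier to separate crashes inside the collector from crashes in compiled application code.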

Could this be a problem with usage of JNA? Are there any debug options you can use with it?

Is there some JDK release where you don't have this problem?
