405B-FP16-Decode-tp8 : Segmentation fault #19574
Comments
Are you running out of memory here? (I just fixed an issue w.r.t. propagating errors.) If it works in some configurations, you might just be running out of memory on the system.
Decode with both 128 and 2048 tokens has these failures, and it doesn't seem to be an OOM issue, but I need to debug further to give the exact root cause.
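One way to confirm or rule out OOM is to watch per-GPU VRAM while the decode run is in flight; a small sketch assuming rocm-smi is available on the MI300X host:

    # Poll VRAM usage across all GPUs once per second during the run.
    watch -n 1 'rocm-smi --showmeminfo vram'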
Running module with
Testing on SharkMi300x-4, the module hits 99% memory usage on GPU-0 (the other 7 GPUs hit 95%) and then crashes. These are the GDB logs I am seeing for 405B decode with this patch right before the crash (I set a breakpoint in gdb): https://gist.github.com/aviator19941/78b4c0e8afd72a55a9d7355a4e84dd7b
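For anyone trying to reproduce a backtrace like the one in that gist, a minimal sketch of capturing it under gdb; <repro flags> below is a placeholder for the actual iree-run-module arguments, which are not listed in this issue:

    # Run the failing invocation under gdb; <repro flags> stands in for the
    # real iree-run-module arguments from the repro command.
    gdb --args iree-run-module <repro flags>
    (gdb) run
    # gdb stops when SIGSEGV is raised; dump backtraces for every thread.
    (gdb) bt
    (gdb) thread apply all bt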
So the new async allocator has a higher watermark than the previous one (the previous implementation would essentially stall the program entirely until the free could happen). You can try the caching allocator to see if that fixes your problem, as it more closely mirrors the previous allocation strategy.
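A hedged sketch of what switching to the caching allocator might look like on the tools side; --device_allocator=caching is my assumption from the IREE tooling flags, so verify the exact spelling with iree-run-module --help on this build, and <repro flags> again stands in for the original command:

    # Assumed allocator flag; confirm with `iree-run-module --help`.
    iree-run-module --device_allocator=caching <repro flags>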
What happened?
405B-FP16-Decode-tp8 is getting a segmentation fault for both iree-run-module and iree-benchmark-module when the token size is 2048. For a token size of 128, iree-run-module works but iree-benchmark-module still gets a seg fault.
command:
build: a43d893
Run the above commands on the Shark MI300X machine.
Steps to reproduce your issue
What component(s) does this issue relate to?
Runtime
Version information
No response
Additional context
No response