
405B-FP16-Decode-tp8 : Segmentation fault #19574

Open
pdhirajkumarprasad opened this issue Dec 31, 2024 · 6 comments
Labels
bug 🐞 Something isn't working

Comments

@pdhirajkumarprasad

What happened?

405B-FP16-Decode-tp8 is hitting a segmentation fault in both iree-run-module and iree-benchmark-module when the token size is 2048. With a token size of 128, iree-run-module works, but iree-benchmark-module still segfaults.

Commands:

python3 -m sharktank.examples.export_paged_llm_v1 \
  --bs=4 \
  --irpa-file=/data/llama3.1/weights/405b/fp16/tp8/llama3.1_405b_fp16_tp8_parameters.irpa \
  --output-mlir=405b_f16_decode_tp8_nondecomposed.mlir \
  --output-config=405b_f16_decode_tp8_nondecomposed.json

iree-compile 405b_f16_decode_tp8_nondecomposed.mlir \
  --iree-hip-target=gfx942 \
  -o=405b_decode_sharded.vmfb \
  --iree-hal-target-device="hip[0]" --iree-hal-target-device="hip[1]" \
  --iree-hal-target-device="hip[2]" --iree-hal-target-device="hip[3]" \
  --iree-hal-target-device="hip[4]" --iree-hal-target-device="hip[5]" \
  --iree-hal-target-device="hip[6]" --iree-hal-target-device="hip[7]" \
  --iree-dispatch-creation-enable-aggressive-fusion=true \
  --iree-global-opt-propagate-transposes=true \
  --iree-opt-aggressively-propagate-transposes=true \
  --iree-opt-data-tiling=false \
  --iree-preprocessing-pass-pipeline='builtin.module(util.func(iree-preprocessing-generalize-linalg-matmul-experimental))' \
  --iree-hal-indirect-command-buffers=true \
  --iree-stream-resource-memory-model=discrete \
  --iree-hip-legacy-sync=false \
  --iree-hal-memoization=true \
  --iree-opt-strip-assertions


iree-run-module \
  --hip_use_streams=true \
  --module=405b_decode_sharded.vmfb \
  --parameters=model=/data/llama3.1/weights/405b/fp16/tp8/llama3.1_405b_fp16_tp8_parameters.irpa \
  --parameters=model=/data/llama3.1/weights/405b/fp16/tp8/llama3.1_405b_fp16_tp8_parameters.rank0.irpa \
  --parameters=model=/data/llama3.1/weights/405b/fp16/tp8/llama3.1_405b_fp16_tp8_parameters.rank1.irpa \
  --parameters=model=/data/llama3.1/weights/405b/fp16/tp8/llama3.1_405b_fp16_tp8_parameters.rank2.irpa \
  --parameters=model=/data/llama3.1/weights/405b/fp16/tp8/llama3.1_405b_fp16_tp8_parameters.rank3.irpa \
  --parameters=model=/data/llama3.1/weights/405b/fp16/tp8/llama3.1_405b_fp16_tp8_parameters.rank4.irpa \
  --parameters=model=/data/llama3.1/weights/405b/fp16/tp8/llama3.1_405b_fp16_tp8_parameters.rank5.irpa \
  --parameters=model=/data/llama3.1/weights/405b/fp16/tp8/llama3.1_405b_fp16_tp8_parameters.rank6.irpa \
  --parameters=model=/data/llama3.1/weights/405b/fp16/tp8/llama3.1_405b_fp16_tp8_parameters.rank7.irpa \
  --device=hip://0 --device=hip://1 --device=hip://2 --device=hip://3 \
  --device=hip://4 --device=hip://5 --device=hip://6 --device=hip://7 \
  --function=decode_bs4 \
  --input=@/shark-dev/405b/decode_args_bs4_2048_stride_32_tp8/next_tokens.npy \
  --input=@/shark-dev/405b/decode_args_bs4_2048_stride_32_tp8/seq_lens.npy \
  --input=@/shark-dev/405b/decode_args_bs4_2048_stride_32_tp8/start_positions.npy \
  --input=@/shark-dev/405b/decode_args_bs4_2048_stride_32_tp8/seq_block_ids.npy \
  --input=@/shark-dev/405b/decode_args_bs4_2048_stride_32_tp8/cs_f16_shard_0.npy \
  --input=@/shark-dev/405b/decode_args_bs4_2048_stride_32_tp8/cs_f16_shard_1.npy \
  --input=@/shark-dev/405b/decode_args_bs4_2048_stride_32_tp8/cs_f16_shard_2.npy \
  --input=@/shark-dev/405b/decode_args_bs4_2048_stride_32_tp8/cs_f16_shard_3.npy \
  --input=@/shark-dev/405b/decode_args_bs4_2048_stride_32_tp8/cs_f16_shard_4.npy \
  --input=@/shark-dev/405b/decode_args_bs4_2048_stride_32_tp8/cs_f16_shard_5.npy \
  --input=@/shark-dev/405b/decode_args_bs4_2048_stride_32_tp8/cs_f16_shard_6.npy \
  --input=@/shark-dev/405b/decode_args_bs4_2048_stride_32_tp8/cs_f16_shard_7.npy

build: a43d893

Run the above commands on the SharkMI300X machine.

Steps to reproduce your issue

See the commands listed under "What happened?" above.

What component(s) does this issue relate to?

Runtime

Version information

No response

Additional context

No response

@pdhirajkumarprasad pdhirajkumarprasad added the bug 🐞 Something isn't working label Dec 31, 2024
@AWoloszyn
Contributor

Are you running out of memory here? (I just fixed an issue w.r.t. propagating errors.) If it works in some configurations, you might just be running out of memory on the system.
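
(Not from the thread, but a quick way to check this on a ROCm system: rocm-smi is the standard AMD monitoring tool, so a sketch would be to watch per-GPU VRAM usage in a second terminal while the module runs.)

# Refresh per-GPU VRAM usage every second during the run.
watch -n 1 'rocm-smi --showmeminfo vram'

If any GPU climbs toward 100% VRAM right before the segfault, OOM is the likely culprit.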

@pdhirajkumarprasad
Author

Decode with token sizes 128/2048 shows these failures, and it doesn't look like an OOM issue, but I need to debug further to find the exact root cause.

@aviator19941
Contributor

aviator19941 commented Jan 3, 2025

Running module with --trace_execution:

wget https://sharkpublic.blob.core.windows.net/sharkpublic/halo-models/llm-dev/llama3_405b/405b_decode_trace_execution.txt
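
(For reference, --trace_execution is an iree-run-module flag that logs each VM instruction as it executes. The exact invocation isn't shown above; a sketch, assuming the same run command and capturing both output streams to a file:)

iree-run-module --trace_execution --hip_use_streams=true --module=405b_decode_sharded.vmfb \
  ...(remaining --parameters/--device/--function/--input flags as in the run command above) \
  > 405b_decode_trace_execution.txt 2>&1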

@AWoloszyn
Contributor

#19583

@aviator19941
Contributor

aviator19941 commented Jan 3, 2025

Testing on SharkMi300x-4, the module hits 99% memory usage on GPU-0 (the other 7 GPUs hit 95%) and then crashes. These are the GDB logs I am seeing for 405b decode with this patch right before the crash:

I set breakpoints in gdb at runtime/src/iree/hal/drivers/hip/hip_allocator.c:670 and runtime/src/iree/hal/drivers/hip/hip_allocator.c:679. It seems like the threads allocated in decode keep being used over and over again, because decode is still calling iree_hal_hip_allocator_alloc_async many more times than prefill does.
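
(A reconstruction of that GDB session, assuming iree-run-module was built with debug info; the exact invocation isn't shown in the thread:)

gdb --args iree-run-module --hip_use_streams=true --module=405b_decode_sharded.vmfb \
  ...(remaining flags as in the run command above)
(gdb) break runtime/src/iree/hal/drivers/hip/hip_allocator.c:670
(gdb) break runtime/src/iree/hal/drivers/hip/hip_allocator.c:679
(gdb) run

Hitting the first breakpoint far more often during decode than during prefill would confirm the repeated iree_hal_hip_allocator_alloc_async calls.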

https://gist.github.com/aviator19941/78b4c0e8afd72a55a9d7355a4e84dd7b

@AWoloszyn
Contributor

So the new async allocator has a higher memory high-water mark than the previous one (the previous implementation would essentially stall the program entirely until the free could happen). You can try the caching allocator to see whether that fixes your problem, as it more closely mirrors the previous allocation strategy.
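
(The thread doesn't show the invocation, but IREE's runtime tools expose allocator wrappers through the --device_allocator flag; a sketch of trying the caching allocator with the same run command:)

iree-run-module --device_allocator=caching --hip_use_streams=true --module=405b_decode_sharded.vmfb \
  ...(remaining flags as in the run command above)

The caching allocator retains freed allocations for reuse instead of issuing fresh async allocations, which should keep the high-water mark closer to the old behavior.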
