
405B prefill tp8 sharded: iree/runtime/src/iree/hal/drivers/hip/hip_buffer.c:33: iree_hal_hip_buffer_t *iree_hal_hip_buffer_cast(iree_hal_buffer_t *): Assertion `!!(iree_hal_resource_is(base_value, &iree_hal_hip_buffer_vtable))' failed #19573

Open
pdhirajkumarprasad opened this issue Dec 31, 2024 · 3 comments
Labels
bug 🐞 Something isn't working

Comments

@pdhirajkumarprasad

pdhirajkumarprasad commented Dec 31, 2024

What happened?

Getting the following error during iree-run-module/iree-benchmark-module for 405B-FP16-prefill-tp8-sharded:

iree-run-module: iree/runtime/src/iree/hal/drivers/hip/hip_buffer.c:33: iree_hal_hip_buffer_t *iree_hal_hip_buffer_cast(iree_hal_buffer_t *): Assertion `!!(iree_hal_resource_is(base_value, &iree_hal_hip_buffer_vtable))' failed.
Abort (core dumped)

build: a43d893

commands:

python3 -m sharktank.examples.export_paged_llm_v1 \
  --bs=4 \
  --irpa-file=/data/llama3.1/weights/405b/fp16/tp8/llama3.1_405b_fp16_tp8_parameters.irpa \
  --output-mlir=405b_f16_prefill_tp8_nondecomposed.mlir \
  --output-config=405b_f16_prefill_tp8_nondecomposed.json \
  --skip-decode

iree-compile 405b_f16_prefill_tp8_nondecomposed.mlir \
  --iree-hip-target=gfx942 \
  -o=405b_prefill_sharded.vmfb \
  --iree-hal-target-device="hip[0]" --iree-hal-target-device="hip[1]" \
  --iree-hal-target-device="hip[2]" --iree-hal-target-device="hip[3]" \
  --iree-hal-target-device="hip[4]" --iree-hal-target-device="hip[5]" \
  --iree-hal-target-device="hip[6]" --iree-hal-target-device="hip[7]" \
  --iree-dispatch-creation-enable-aggressive-fusion=true \
  --iree-global-opt-propagate-transposes=true \
  --iree-opt-aggressively-propagate-transposes=true \
  --iree-opt-data-tiling=false \
  --iree-preprocessing-pass-pipeline='builtin.module(util.func(iree-preprocessing-generalize-linalg-matmul-experimental))' \
  --iree-hal-indirect-command-buffers=true \
  --iree-stream-resource-memory-model=discrete \
  --iree-hip-legacy-sync=false \
  --iree-hal-memoization=true \
  --iree-opt-strip-assertions

iree-benchmark-module \
  --hip_use_streams=true \
  --module=405b_prefill_sharded.vmfb \
  --parameters=model=/home/sai/temp_dir_by_dhiraj/gitRepo/shark-ai/rerun_31st_dec/instruct/405b_fp16_tp8.irpa \
  --parameters=model=/home/sai/temp_dir_by_dhiraj/gitRepo/shark-ai/rerun_31st_dec/instruct/405b_fp16_tp8.rank0.irpa \
  --parameters=model=/home/sai/temp_dir_by_dhiraj/gitRepo/shark-ai/rerun_31st_dec/instruct/405b_fp16_tp8.rank1.irpa \
  --parameters=model=/home/sai/temp_dir_by_dhiraj/gitRepo/shark-ai/rerun_31st_dec/instruct/405b_fp16_tp8.rank2.irpa \
  --parameters=model=/home/sai/temp_dir_by_dhiraj/gitRepo/shark-ai/rerun_31st_dec/instruct/405b_fp16_tp8.rank3.irpa \
  --parameters=model=/home/sai/temp_dir_by_dhiraj/gitRepo/shark-ai/rerun_31st_dec/instruct/405b_fp16_tp8.rank4.irpa \
  --parameters=model=/home/sai/temp_dir_by_dhiraj/gitRepo/shark-ai/rerun_31st_dec/instruct/405b_fp16_tp8.rank5.irpa \
  --parameters=model=/home/sai/temp_dir_by_dhiraj/gitRepo/shark-ai/rerun_31st_dec/instruct/405b_fp16_tp8.rank6.irpa \
  --parameters=model=/home/sai/temp_dir_by_dhiraj/gitRepo/shark-ai/rerun_31st_dec/instruct/405b_fp16_tp8.rank7.irpa \
  --device=hip://0 --device=hip://1 --device=hip://2 --device=hip://3 \
  --device=hip://4 --device=hip://5 --device=hip://6 --device=hip://7 \
  --function=prefill_bs4 \
--input=@/shark-dev/405b/prefill_args_bs4_2048_stride_32_tp8/tokens.npy \
  --input=@/shark-dev/405b/prefill_args_bs4_2048_stride_32_tp8/seq_lens.npy \
  --input=@/shark-dev/405b/prefill_args_bs4_2048_stride_32_tp8/seq_block_ids.npy \
  --input=@/shark-dev/405b/prefill_args_bs4_2048_stride_32_tp8/cs_f16_shard_0.npy \
  --input=@/shark-dev/405b/prefill_args_bs4_2048_stride_32_tp8/cs_f16_shard_1.npy \
  --input=@/shark-dev/405b/prefill_args_bs4_2048_stride_32_tp8/cs_f16_shard_2.npy \
  --input=@/shark-dev/405b/prefill_args_bs4_2048_stride_32_tp8/cs_f16_shard_3.npy \
  --input=@/shark-dev/405b/prefill_args_bs4_2048_stride_32_tp8/cs_f16_shard_4.npy \
  --input=@/shark-dev/405b/prefill_args_bs4_2048_stride_32_tp8/cs_f16_shard_5.npy \
  --input=@/shark-dev/405b/prefill_args_bs4_2048_stride_32_tp8/cs_f16_shard_6.npy \
  --input=@/shark-dev/405b/prefill_args_bs4_2048_stride_32_tp8/cs_f16_shard_7.npy --benchmark_repetitions=8

Try the above commands on Shark MI300X.

Steps to reproduce your issue

See the export, compile, and benchmark commands listed above.

What component(s) does this issue relate to?

Runtime

Version information

No response

Additional context

No response

pdhirajkumarprasad added the bug 🐞 Something isn't working label on Dec 31, 2024
@aviator19941
Contributor

Seems like 405B instruct prefill tp8 works with --benchmark_repetitions=3. Trying with 8 benchmark repetitions now to see if I can reproduce the error.

@aviator19941
Contributor

aviator19941 commented Jan 2, 2025

OK, after increasing benchmark_repetitions to 8, I see a segfault (core dumped). Will try to debug with ASan and the --trace_execution flag on iree-run-module to get a console dump of all the instructions that get executed on the host side.
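
A rough sketch of that debug run (untested; the build options and directory names are illustrative, and the placeholder at the end stands for the real flags from the repro above):

# Build the runtime tools with ASan enabled, from an IREE source checkout.
cmake -G Ninja -S . -B ../iree-build-asan \
  -DCMAKE_BUILD_TYPE=RelWithDebInfo \
  -DIREE_ENABLE_ASAN=ON
cmake --build ../iree-build-asan

# Re-run the failing prefill with VM execution tracing on the host.
../iree-build-asan/tools/iree-run-module \
  --trace_execution \
  --device=hip://0 --device=hip://1 --device=hip://2 --device=hip://3 \
  --device=hip://4 --device=hip://5 --device=hip://6 --device=hip://7 \
  --module=405b_prefill_sharded.vmfb \
  --function=prefill_bs4 \
  <same --parameters=... and --input=... flags as the benchmark command above>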

@aviator19941
Contributor

aviator19941 commented Jan 3, 2025

The GDB backtrace suggests the crash might be happening in another thread that isn't visible here, so it might not be too useful:

(gdb) bt
#0  0x00007f5c902e5630 in ?? ()
#1  0x00005555555b54ee in iree_allocator_free ()
#2  0x00005555555b5f40 in iree_status_ignore ()
#3  0x00005555555dd280 in iree_hal_hip_dispatch_thread_main ()
#4  0x000055555560f75b in iree_thread_start_routine ()
#5  0x00007ffff7894ac3 in start_thread (arg=<optimized out>) at ./nptl/pthread_create.c:442
#6  0x00007ffff7926850 in clone3 () at ../sysdeps/unix/sysv/linux/x86_64/clone3.S:81
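
One way to get more visibility (untested sketch) is to rerun under gdb, ideally on a build with debug info so the iree_hal_hip_* frames resolve to source lines, and dump backtraces from every thread rather than just the one gdb stops in:

gdb -q --args iree-run-module <same flags as the repro above>
(gdb) run
(gdb) thread apply all bt

thread apply all bt walks all threads in the process, which should show what the HIP dispatch thread and the main VM thread were doing at the time of the abort.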
