Bump IREE to 3.1.0rc20250107 #773

Closed
monorimet wants to merge 3 commits.


monorimet (Contributor)

No description provided.

monorimet requested a review from ScottTodd on January 7, 2025 at 17:49.
monorimet (Contributor, Author) commented on Jan 7, 2025:

@AWoloszyn On this IREE version, the SD server segfaults at load time for some tiny scheduler modules. The SDXL CI shows only an opaque error, but running locally with AMD_LOG_LEVEL=3 at least gives more information:

:3:hip_module.cpp           :74  : 436640535928 us: [pid:2560090 tid:0x7f75a8e99740]  hipModuleGetFunction ( 0x7ffdccb2faf0, 0x56132c3946d0, run_step$async_dispatch_0_elementwise_65536_f16 ) 
:3:hip_module.cpp           :88  : 436640535931 us: [pid:2560090 tid:0x7f75a8e99740] hipModuleGetFunction: Returned hipSuccess : 
:3:hip_module.cpp           :181 : 436640535934 us: [pid:2560090 tid:0x7f75a8e99740]  hipFuncSetAttribute ( 0x56132c452450, 8, 0 ) 
:3:hip_module.cpp           :185 : 436640535937 us: [pid:2560090 tid:0x7f75a8e99740] hipFuncSetAttribute: Returned hipSuccess : 
:3:hip_context.cpp          :237 : 436640535940 us: [pid:2560090 tid:0x7f75a8e99740]  hipCtxPopCurrent ( char array:<null> ) 
:3:hip_context.cpp          :250 : 436640535943 us: [pid:2560090 tid:0x7f75a8e99740] hipCtxPopCurrent: Returned hipSuccess : 
:3:hip_context.cpp          :254 : 436640535955 us: [pid:2560090 tid:0x7f75a8e99740]  hipCtxPushCurrent ( context:0x561325ab7660 ) 
:3:hip_context.cpp          :264 : 436640535959 us: [pid:2560090 tid:0x7f75a8e99740] hipCtxPushCurrent: Returned hipSuccess : 
:3:hip_context.cpp          :254 : 436640535963 us: [pid:2560090 tid:0x7f75a8e99740]  hipCtxPushCurrent ( context:0x561325ab7660 ) 
:3:hip_context.cpp          :264 : 436640535966 us: [pid:2560090 tid:0x7f75a8e99740] hipCtxPushCurrent: Returned hipSuccess : 
:3:hip_memory.cpp           :615 : 436640535971 us: [pid:2560090 tid:0x7f75a8e99740]  hipMalloc ( 0x7ffdccb31a70, 80064 ) 
:3:rocdevice.cpp            :2418: 436640535992 us: [pid:2560090 tid:0x7f75a8e99740] Device=0x561325aa63e0, freeMem_ = 0x24847278b0
:3:hip_memory.cpp           :617 : 436640536002 us: [pid:2560090 tid:0x7f75a8e99740] hipMalloc: Returned hipSuccess : 0x7f28e11d8000: duration: 31 us
:3:hip_context.cpp          :237 : 436640536008 us: [pid:2560090 tid:0x7f75a8e99740]  hipCtxPopCurrent ( char array:<null> ) 
:3:hip_context.cpp          :250 : 436640536011 us: [pid:2560090 tid:0x7f75a8e99740] hipCtxPopCurrent: Returned hipSuccess : 
:3:hip_context.cpp          :254 : 436640536017 us: [pid:2560090 tid:0x7f75a8e99740]  hipCtxPushCurrent ( context:0x561325ab7660 ) 
:3:hip_context.cpp          :264 : 436640536020 us: [pid:2560090 tid:0x7f75a8e99740] hipCtxPushCurrent: Returned hipSuccess : 
:3:hip_memory.cpp           :1257: 436640536025 us: [pid:2560090 tid:0x7f75a8e99740]  hipHostRegister ( 0x7f75a285bc00, 80064, 2 ) 
:3:hip_memory.cpp           :1259: 436640539731 us: [pid:2560090 tid:0x7f75a8e99740] hipHostRegister: Returned hipSuccess : 
:3:hip_memory.cpp           :3579: 436640539738 us: [pid:2560090 tid:0x7f75a8e99740]  hipHostGetDevicePointer ( 0x7ffdccb317a8, 0x7f75a285bc00, 0 ) 
:3:hip_memory.cpp           :3593: 436640539743 us: [pid:2560090 tid:0x7f75a8e99740] hipHostGetDevicePointer: Returned hipSuccess : 
:3:hip_context.cpp          :237 : 436640539748 us: [pid:2560090 tid:0x7f75a8e99740]  hipCtxPopCurrent ( char array:<null> ) 
:3:hip_context.cpp          :250 : 436640539753 us: [pid:2560090 tid:0x7f75a8e99740] hipCtxPopCurrent: Returned hipSuccess : 
:3:hip_context.cpp          :254 : 436640539768 us: [pid:2560090 tid:0x7f75a8e99740]  hipCtxPushCurrent ( context:0x561325ab7660 ) 
:3:hip_context.cpp          :264 : 436640539772 us: [pid:2560090 tid:0x7f75a8e99740] hipCtxPushCurrent: Returned hipSuccess : 
:3:hip_memory.cpp           :615 : 436640539774 us: [pid:2560090 tid:0x7f75a8e99740]  hipMalloc ( 0x7ffdccb318e0, 24 ) 
:3:rocdevice.cpp            :2418: 436640539782 us: [pid:2560090 tid:0x7f75a8e99740] Device=0x561325aa63e0, freeMem_ = 0x2484727898
:3:hip_memory.cpp           :617 : 436640539788 us: [pid:2560090 tid:0x7f75a8e99740] hipMalloc: Returned hipSuccess : 0x7f28e15fd000: duration: 14 us
:3:hip_context.cpp          :237 : 436640539790 us: [pid:2560090 tid:0x7f75a8e99740]  hipCtxPopCurrent ( char array:<null> ) 
:3:hip_context.cpp          :250 : 436640539794 us: [pid:2560090 tid:0x7f75a8e99740] hipCtxPopCurrent: Returned hipSuccess : 
:3:hip_context.cpp          :254 : 436640539802 us: [pid:2560090 tid:0x7f75a8e99740]  hipCtxPushCurrent ( context:0x561325ab7660 ) 
:3:hip_context.cpp          :264 : 436640539806 us: [pid:2560090 tid:0x7f75a8e99740] hipCtxPushCurrent: Returned hipSuccess : 
:3:hip_memory.cpp           :615 : 436640539808 us: [pid:2560090 tid:0x7f75a8e99740]  hipMalloc ( 0x7ffdccb318e0, 64 ) 
:3:rocdevice.cpp            :2418: 436640539812 us: [pid:2560090 tid:0x7f75a8e99740] Device=0x561325aa63e0, freeMem_ = 0x2484727858
:3:hip_memory.cpp           :617 : 436640539817 us: [pid:2560090 tid:0x7f75a8e99740] hipMalloc: Returned hipSuccess : 0x7f28e15fe000: duration: 9 us
:3:hip_context.cpp          :237 : 436640539821 us: [pid:2560090 tid:0x7f75a8e99740]  hipCtxPopCurrent ( char array:<null> ) 
:3:hip_context.cpp          :250 : 436640539827 us: [pid:2560090 tid:0x7f75a8e99740] hipCtxPopCurrent: Returned hipSuccess : 
:3:hip_context.cpp          :254 : 436640539925 us: [pid:2560090 tid:0x7f758f7f5640]  hipCtxPushCurrent ( context:0x561325ab7660 ) 
:3:hip_context.cpp          :264 : 436640539974 us: [pid:2560090 tid:0x7f758f7f5640] hipCtxPushCurrent: Returned hipSuccess : 
Segmentation fault (core dumped)

ScottTodd (Member) commented on Jan 7, 2025:

Looks like some shortfin unit tests on CPU are hanging, as they did on #747.

Edit: for example, https://github.com/nod-ai/shark-ai/actions/runs/12656553538/job/35269366099?pr=773

ScottTodd (Member):

Fixed the hanging unit tests with #777 and #778.

AWoloszyn (Contributor) commented on Jan 9, 2025:

iree-org/iree#19645 should fix this, unless there is also another problem.
With the fix, I was able to start up SD without issues.

monorimet closed this on Jan 9, 2025.
monorimet (Contributor, Author):

Closing in favor of #802

ScottTodd deleted the iree-bump-3.1.0rc20250107 branch on January 9, 2025 at 18:34.
ScottTodd added a commit that referenced this pull request Jan 10, 2025

4 participants