-
Notifications
You must be signed in to change notification settings - Fork 291
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fix: Resolve S3 crash issue #2120
base: master
Are you sure you want to change the base?
fix: Resolve S3 crash issue #2120
Conversation
…sdk to prevent exception
Can you review this PR? I closed #2119 and re-opened this. |
Sorry for the frequent updates. I just updated AWSLogSystem's destructor to call ShutdownAWSLogging(). Since current aws-sdk-cpp still uses static variables for logger and calling ShutdownAWSLogging also resolved issue. |
@mihaimaruseac |
I can't do much, sadly. I left Google's TF team years ago, only helping with reviews here and there (mostly OSS builds and security related). TF IO was never maintained by Google, so this is even harder to land. |
@yongtang |
Related issue: #1912
From v0.35.0, there appears to be an issue with build toolchain changes, resulting in the problem described in #1912 (pure virtual method called exception). After investigation, I believe I've identified the root cause.
While using S3 filesystem, I captured the program's backtrace (shown below in the toggle).
Click to toggle
Upon inspection of
CurlHandleContainer.cpp:27
, I found that it uses a logging macro defined here. This macro depends on static_variables which could lead to unsafe behavior during program termination. Since the pure virtual method called exception occurs during program exit, it's likely caused by the destruction order of static variables (logging-related static variables being destroyed before the destructor is called).As a temporary fix, I've removed the logging macros in CurlHandleContainer's destructor using bazel's patch_cmds. While this resolves the immediate issue, it may not be the optimal long-term solution. I'd appreciate review of this approach, considering our dependency on tensorflow==2.16 and S3 filesystem functionality.
Regarding the build.Dockerfile modification: I noticed that the tensorflow version should align with what's specified in tensorflow_io/python/ops/version_ops.py. The original script was installing the latest tensorflow version, so I've modified it to install the specific version defined in the version_ops.py file.