[C++][Python] Add a version byte to tables #45277
Comments
Hi @MaxiBoether and thanks for taking the time to write this up. While passing a Table from Python to a C/C++ extension can work (as you've found), it comes with the downside you ran into here. The preferred way to share Arrow structures is by using the C Data Interface, which is ABI-stable. See #36274 for more discussion on this topic too. Could that work for your use case?
Hey! I am using it for a research project and will take a deeper look at the C Data Interface after I have finished the paper using this. Taking a very quick look, I am not sure whether this will work in my setup, since I use pybind11. I am forced to pass the table from Python and it comes in as a pybind11 [...]. It might definitely be a solution, but it was not super clear to me from reading the documentation that I should not interact between Python and C++ like this. I guess my main suggestion here is to add this layout version check to make the error message clearer, because I kept segfaulting. Showing a version mismatch error might help others who face the same issue: even if I switch to the C Data Interface, the current behavior might still cause trouble for other devs who, like me, go the other route. In any case, thank you so much for your swift reply! I appreciate it and will try to refactor after my paper deadline :)
Good luck on the paper! To help make what I was saying above more clear, I put together an example that passes a Table from Python to a C++ extension (with pybind11): https://github.com/amoeba/arrow-pybind11-example/. Hopefully that helps.
I think this is totally fair feedback. I'm curious: if there had been a note somewhere in our docs, where would have been a good spot? That is, which docs did you read during the process, and which docs did you try to use when you saw segfaults?
Describe the enhancement requested
I installed `pyarrow` via pip without specifying a concrete version (which was my fault). Another requirement forced pip to download pyarrow 17. At the same time, I installed libarrow 18 via apt in my build container. I have a custom C++ Python extension which, via CMake, got compiled against the system libarrow 18. In Python, I read a pyarrow table, passed it to the C++ extension, and then on `GetColumnByName` my application segfaulted. It took me a bit to realize that I had a version mismatch between Python and C++. I presume the in-memory layouts of the tables differ slightly between the two versions, which probably caused the segfault. After aligning the versions, it is all working fine again.
I wonder whether there should be something like a magic version byte that gets updated when the in-memory layout changes. This way, I could have avoided debugging this and gotten a better error message instead. While this might not be a common problem, it could help avoid issues like the segfault I encountered.
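To illustrate the kind of check described above, here is a minimal sketch of a guard that an extension's Python wrapper could run before handing a table to C++. It is not an existing Arrow feature: `major_version` and `check_arrow_versions` are hypothetical helper names, and treating a differing major version as incompatible is an assumption made for this example.

```python
import re


def major_version(version: str) -> int:
    """Return the leading major number of a version string such as '17.0.0'."""
    match = re.match(r"\s*(\d+)", version)
    if match is None:
        raise ValueError(f"cannot parse version string: {version!r}")
    return int(match.group(1))


def check_arrow_versions(runtime_version: str, compiled_version: str) -> None:
    """Raise ImportError if the two Arrow major versions differ.

    Hypothetical helper: `runtime_version` is the Arrow version used by
    Python, `compiled_version` is the Arrow version the C++ extension was
    built against. Comparing only major versions is an assumption.
    """
    runtime_major = major_version(runtime_version)
    compiled_major = major_version(compiled_version)
    if runtime_major != compiled_major:
        raise ImportError(
            "Arrow version mismatch: Python side uses "
            f"{runtime_version}, extension was compiled against "
            f"{compiled_version}; passing C++ objects between them is unsafe."
        )


# Matching major versions pass silently.
check_arrow_versions("18.1.0", "18.0.0")

# The combination from this report fails with a readable error
# instead of a segfault.
try:
    check_arrow_versions("17.0.0", "18.0.0")
except ImportError as exc:
    print(exc)
```

In a real setup, the first argument could come from `pyarrow.__version__` and the second from a string the extension exposes at build time, for example one derived from the `ARROW_VERSION_STRING` macro; how the extension exposes that string is left open here.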
Component(s)
C++, Python