Segmentation Fault When Using PyJulia Inside of PyTorch Custom Autograd Function #518
https://pypi.org/project/juliacall/

Why are you using Julia 1.7.1?
0.6.0
This is the only version I have tried. Is there a particular version you would suggest?
I tried after this suggestion and also obtained a segmentation fault.
My actual function was written for Julia 1.7.1, so for ease of reproducibility I was sticking with that. I have tried the sum example using 1.8.3 and 1.6.7 LTS, and the error persists.
Does the issue occur with pyjulia v0.5.7? My guess is yes, but I just wanted to double-check since there were a few major changes in 0.6.0.
I'm not sure if this is related, but there were also reports of a segmentation fault with PySR; that appears to occur mainly on Windows, however, and is difficult to reproduce. I don't suppose you are able to run a debugger in your environment, are you?
It does.
I can get something rudimentary using pdb. It's a bit hard to interpret, given that the segmentation fault doesn't occur in the Python code, but I can find the last line of Python called before the segmentation fault. I think this might be a big hint as to what is going wrong, but I'm not sure how to interpret it. Here is the debugging output.
The first call to
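Because the crash happens below the Python layer, one option besides stepping with pdb (not something used in this thread) is the standard-library `faulthandler` module, which prints the Python traceback of every thread when the interpreter receives a fatal signal such as SIGSEGV. A minimal sketch, assuming it is placed at the top of the script that reproduces the crash:

```python
import faulthandler
import sys

# On SIGSEGV, SIGFPE, SIGABRT, SIGBUS or SIGILL, write the Python-level
# traceback of all threads to the process's original stderr.
faulthandler.enable(file=sys.__stderr__, all_threads=True)

# ... run the failing PyTorch/PyJulia code after this point, e.g. loss.backward() ...
```

The same effect can be had without editing the script by running `python -X faulthandler script.py` or by setting `PYTHONFAULTHANDLER=1` in the environment.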
I don't think the linked issues are related, as I am running on Linux and have no trouble using PyJulia outside of the
I am also facing this issue when using Julia in
Also, juliacall seems to mostly work, even on nontrivial calls, when there are no uses of the Julia runtime. @THargreaves I believe your code can work with juliacall if the vectors are pre-allocated on the Python side. Here's my MWE, it

```python
import torch
from julia.api import Julia

jl = Julia(compiled_modules=False)

class MyLoss(torch.autograd.Function):
    @staticmethod
    def forward(ctx, dat):
        return torch.full((1, 1), 1., device=dat.device)

    @staticmethod
    def backward(ctx, grad_output):
        jl.eval('1')  # without this line no deadlock happens
        # jl.seval('println(1)')  # in juliacall seval('1') runs without problem, but println hangs the program (no segfault)
        return None

device = "cuda:0"  # on cpu works fine
# device = "cpu"
dat = torch.full((5, 1), 1.).to(device)
model = torch.nn.Linear(1, 1).to(device).train()
output = model(dat)
loss = MyLoss.apply(output)
loss.backward()  # segfaults here
```

I'm using Ubuntu, torch 2.0.0.post200, Julia 1.9.0, pyjulia '0.6.1'
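The pre-allocation suggestion above can be sketched in plain Python, with no Julia or PyTorch involved. `hypothetical_julia_fill_ones` is a made-up stand-in for a Julia routine (for example one built on Julia's `fill!`) that writes into a buffer owned by the caller instead of allocating a new array; whether juliacall hands such a buffer to Julia without copying is an assumption not verified here. The gradient logic follows the `sum()` example in the original report, where every partial derivative is 1.

```python
from array import array


def hypothetical_julia_fill_ones(buf):
    """Hypothetical stand-in for a Julia call that fills a caller-owned buffer
    in place (e.g. something like `fill!(x, 1.0)`), instead of returning a
    freshly allocated Julia array."""
    for i in range(len(buf)):
        buf[i] = 1.0


def sum_backward(grad_output: float, n: int) -> array:
    # Allocate the gradient buffer on the Python side...
    grad_input = array("d", [0.0]) * n
    # ...let the (hypothetical) foreign routine fill it in place...
    hypothetical_julia_fill_ones(grad_input)
    # ...then apply the chain rule: d(sum)/dx_i = 1, so scale by grad_output.
    for i in range(n):
        grad_input[i] *= grad_output
    return grad_input


print(list(sum_backward(2.0, 3)))  # [2.0, 2.0, 2.0]
```

The point of the design is that the memory lives on the caller's side for its whole lifetime, so the foreign code only ever writes into storage it did not allocate.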
I have a (vector-to-scalar) function and a corresponding derivative function written in Julia that I am unable to translate to Python. I would like to use these within PyTorch by defining a custom autograd function. As a simple, reproducible example, let's say the function is `sum()`:

Calling the `forward` method works fine, as does running the code contained in the `backward` method from the global scope. However, when I call the `backward` method, I receive:

The exact line causing the issue is `Main.ones(len(x))`. Replacing this with `Main.ones(3)` still causes a segmentation fault, so it appears to be an issue with PyJulia accessing memory that has been deallocated.

Also note that when I replace the two calls to Julia with the corresponding NumPy commands (left commented out), the `backward` method works fine. The code also works when all tensors are on the CPU, but my application requires GPU acceleration.

What is causing this segmentation fault, and how can I alter my code to avoid it whilst keeping PyTorch tensors on the GPU?
I've included a Dockerfile that matches my environment to make reproducing this issue as simple as possible. For reference, I am using an RTX 3060.
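For the `sum()` example, the derivative that the `backward` method has to reproduce is simple, which is presumably why `Main.ones(len(x))` appears in it. Writing $x \in \mathbb{R}^n$ for the input, $L$ for the downstream loss, and $g = \partial L / \partial f$ for the incoming `grad_output` (a scalar, since $f$ is vector-to-scalar; this notation is ours, not the report's), the chain rule gives:

$$
f(x) = \sum_{i=1}^{n} x_i, \qquad \frac{\partial f}{\partial x_i} = 1 \;\Rightarrow\; \nabla_x f = \mathbf{1}_n,
$$

$$
\frac{\partial L}{\partial x} = \frac{\partial L}{\partial f}\,\nabla_x f = g\,\mathbf{1}_n .
$$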