Issue running 'PhoenixOS-Remoting' standalone with Stable Diffusion (PyTorch version) as the AI app #17
Comments
What's your current commit? I previously communicated with the authors and they urgently pushed a temporary version, so the version control became a total mess... Its remoting framework is not compatible with PhOS now, so you may need to roll back some commits.
Thanks for your reply. The commit of … Moreover, I noticed in #9 that you had uploaded some Dockerfiles, so I used … However, when I built PhoenixOS following your document in …
Server side:
When I interrupted the daemon process and tried to re-run it, it threw the following error message. I was running the proxy process inside the container; could that be the cause?
That's not me, I'm not the author... By the way, I meant the commit number of
Sorry, I misunderstood. I will try this commit ID later.
Commit
I first built the environment under commit
Have you removed
I tried this just now, but the error message on the client side turned out to be
I don't know either; I have not run into this issue qwq.
That's fine, thanks. Did you succeed in running
As far as I know, this is their external release of the code, and many mysterious things remain in it.
Sorry to hear that; I hope you get well soon.
1 Problem description
I came across CUDA error 209 when running the Stable Diffusion app (args: batch=2, iter=2, `inference.py`): the program cannot find a kernel image, possibly because external `.so` files are not supported.
![Client error, PhoenixOS without optimization](https://private-user-images.githubusercontent.com/130744273/400136614-a34b1686-e1d4-4ffa-9d7d-deda81485983.PNG?jwt=eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJpc3MiOiJnaXRodWIuY29tIiwiYXVkIjoicmF3LmdpdGh1YnVzZXJjb250ZW50LmNvbSIsImtleSI6ImtleTUiLCJleHAiOjE3Mzk2ODIzMzEsIm5iZiI6MTczOTY4MjAzMSwicGF0aCI6Ii8xMzA3NDQyNzMvNDAwMTM2NjE0LWEzNGIxNjg2LWUxZDQtNGZmYS05ZDdkLWRlZGE4MTQ4NTk4My5QTkc_WC1BbXotQWxnb3JpdGhtPUFXUzQtSE1BQy1TSEEyNTYmWC1BbXotQ3JlZGVudGlhbD1BS0lBVkNPRFlMU0E1M1BRSzRaQSUyRjIwMjUwMjE2JTJGdXMtZWFzdC0xJTJGczMlMkZhd3M0X3JlcXVlc3QmWC1BbXotRGF0ZT0yMDI1MDIxNlQwNTAwMzFaJlgtQW16LUV4cGlyZXM9MzAwJlgtQW16LVNpZ25hdHVyZT0yNGU1ZjYzY2QxZGZhNTY2M2Y3ZTdiZTFlYmM3NmFiYjUzYWUzODBiZGZmYTcyZWY4ZTc1N2MxNGE5NGRkNDFjJlgtQW16LVNpZ25lZEhlYWRlcnM9aG9zdCJ9.YJGL6JDD7rCqxDVWZ4XYpmrInIAKCxUph5q3qivqOLw)
I came across the same issue when running the original Cricket, even though your paper Characterizing Network Requirements for GPU API Remoting in AI Applications states that running SD in its PyTorch version is supported. Here are my compile arguments:
VERSION=NO_OPTIMIZATION
Either this setting or a version that includes async, cache, and handler results in the same error.

2 Environment setup
My machine runs Ubuntu 22.04 with one NVIDIA A4500 (sm=80), driver version 535.183.06, and CUDA 11.8. I tried several solutions, but none of them succeeded.
Then I pulled the NVIDIA Docker image `nvidia/cuda:11.1.1-cudnn8-devel-rockylinux8` and tried to build the environment on top of it following your Dockerfile. This produced the same problem as in section 1.
I ran the PyTorch SD app under Miniconda; here is my environment:
3 Other modifications
Because of its dependency on the main PhoenixOS, I disabled the following code in `cpu/proxy/svc.cpp`, which is not wrapped in `POS_ENABLE`:
![svc](https://private-user-images.githubusercontent.com/130744273/400138286-3da23a6c-8609-4cf7-a9ec-1e6d4e89edf9.PNG?jwt=eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJpc3MiOiJnaXRodWIuY29tIiwiYXVkIjoicmF3LmdpdGh1YnVzZXJjb250ZW50LmNvbSIsImtleSI6ImtleTUiLCJleHAiOjE3Mzk2ODIzMzEsIm5iZiI6MTczOTY4MjAzMSwicGF0aCI6Ii8xMzA3NDQyNzMvNDAwMTM4Mjg2LTNkYTIzYTZjLTg2MDktNGNmNy1hOWVjLTFlNmQ0ZTg5ZWRmOS5QTkc_WC1BbXotQWxnb3JpdGhtPUFXUzQtSE1BQy1TSEEyNTYmWC1BbXotQ3JlZGVudGlhbD1BS0lBVkNPRFlMU0E1M1BRSzRaQSUyRjIwMjUwMjE2JTJGdXMtZWFzdC0xJTJGczMlMkZhd3M0X3JlcXVlc3QmWC1BbXotRGF0ZT0yMDI1MDIxNlQwNTAwMzFaJlgtQW16LUV4cGlyZXM9MzAwJlgtQW16LVNpZ25hdHVyZT1jNDExNDQwZTYyNmE2ZjlhMWYzMzc5YTQ5MTIyYWIyNTQzNWEzMmMyZTMxNDVkNjQyYTFlMDg5OGFmZTQ4YjhjJlgtQW16LVNpZ25lZEhlYWRlcnM9aG9zdCJ9.KLMHzndULpugwszAJdF6-UzHLrJ4XQvrt3n4UFBlcRY)
I have also manually disabled compilation of `tests` and `bin/tests` in the main Makefile.

I sincerely hope you can spot any steps I have missed, let me know what other traceback information I can provide, or share a working configuration, Dockerfile, or Docker image.
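One possible extra diagnostic for error 209 (`cudaErrorNoKernelImageForDevice`, i.e. no kernel image suitable for the device) is to check whether the PyTorch build ships an image the GPU can use at all. The sketch below is a minimal, hypothetical helper (`parse_arch` and `has_usable_image` are illustrative names, not part of PhoenixOS or Cricket). It assumes the compatibility rules from NVIDIA's CUDA documentation: an `sm_XY` cubin runs on devices with the same major version and an equal or higher minor version, while `compute_XY` PTX can be JIT-compiled for any device of capability X.Y or higher. On a real machine the inputs could come from `torch.cuda.get_device_capability()` and `torch.cuda.get_arch_list()`. It only inspects PyTorch's own build, so it would not catch problems in how a remoting layer forwards kernel images to the server.

```python
from typing import Iterable, Tuple


def parse_arch(entry: str) -> Tuple[str, Tuple[int, int]]:
    """Split an arch string such as 'sm_86' or 'compute_80' into (kind, (major, minor)).

    Assumes the plain '<kind>_<digits>' form: the last digit is the minor
    version and the rest is the major version ('86' -> 8.6, '100' -> 10.0).
    Suffixed variants such as 'sm_90a' are rejected rather than guessed at.
    """
    kind, sep, digits = entry.partition("_")
    if not sep or len(digits) < 2 or not digits.isdigit():
        raise ValueError(f"unexpected arch string: {entry!r}")
    return kind, (int(digits[:-1]), int(digits[-1]))


def has_usable_image(capability: Tuple[int, int], arch_list: Iterable[str]) -> bool:
    """Return True if some entry in arch_list could run on a device of this capability."""
    major, minor = capability
    for entry in arch_list:
        kind, (a_major, a_minor) = parse_arch(entry)
        # SASS (cubin): same major version, and device minor >= compiled minor.
        if kind == "sm" and a_major == major and a_minor <= minor:
            return True
        # PTX: the driver can JIT-compile it for any equal or newer capability.
        if kind == "compute" and (a_major, a_minor) <= (major, minor):
            return True
    return False


if __name__ == "__main__":
    # Hypothetical example values; on a real machine one could instead use:
    #   import torch
    #   cap = torch.cuda.get_device_capability()
    #   archs = torch.cuda.get_arch_list()
    cap = (8, 6)
    archs = ["sm_50", "sm_60", "sm_70", "sm_75", "sm_80"]
    print(has_usable_image(cap, archs))  # True: an sm_80 cubin is binary-compatible with 8.6
```

If this reports False for the real values, the PyTorch build itself may lack a suitable image; if it reports True, the cause may lie elsewhere, for example in the remoting path.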
Thanks a lot