Wanted to run LLMs in parallel by locking the chips while inferencing #105

Open
VenkateshPasumarti opened this issue Jan 31, 2025 · 1 comment

Comments

@VenkateshPasumarti

I want to run LLMs in parallel by locking the chips: specifying the tp degree and allocating that number of chips to a certain task.
For example, if I want to perform 2 tasks, is there an argument that specifies running task 1 on chips 0,1,2,3 and task 2 on chips 4,5,6,7?

@aws-rishyraj

Hi @VenkateshPasumarti,

Yes, you can set the following environment variables to fit your use case: NEURON_RT_VISIBLE_CORES and NEURON_RT_NUM_CORES. The documentation for these variables can be found here.
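As a rough sketch (the script name, core ranges, and model-loading step below are illustrative assumptions, not taken from this thread), each task can run in its own process with NEURON_RT_VISIBLE_CORES set to a disjoint core range before anything Neuron-related is loaded:

```python
# task1.py -- illustrative script name for task 1 (hypothetical)
import os

# NEURON_RT_VISIBLE_CORES pins this process to a specific range of NeuronCores.
# It must be set before the Neuron runtime initializes, i.e. before any
# Neuron-backed model is loaded in this process.
os.environ["NEURON_RT_VISIBLE_CORES"] = "0-3"  # cores reserved for task 1

# ... load and run the model for task 1 here (e.g. with torch-neuronx or
# transformers-neuronx), compiled with a matching tensor-parallel degree.
```

A second process for task 2 would set NEURON_RT_VISIBLE_CORES="4-7" before loading its model. Note that this variable indexes NeuronCores rather than chips, so the ranges should be scaled by the number of cores per chip on your instance.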

Please let us know if this is what you were looking for.
