Wanted to run LLMs in parallel by locking the chips while inferencing #105

Open
VenkateshPasumarti opened this issue Jan 31, 2025 · 1 comment

Comments

@VenkateshPasumarti

I want to run LLMs in parallel by locking the chips: specifying the tp degree and allocating that number of chips to a certain task.
For example, if I want to perform 2 tasks, is there an argument that specifies running task 1 on chips 0,1,2,3 and task 2 on chips 4,5,6,7?

@aws-rishyraj

Hi @VenkateshPasumarti,

Yes, you can set the following environment variables to fit your use case: NEURON_RT_VISIBLE_CORES and NEURON_RT_NUM_CORES. The documentation for these variables can be found here.
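As a rough sketch (the script name, core ranges, and model-loading step below are illustrative assumptions, not taken from this thread), each task can run in its own process with NEURON_RT_VISIBLE_CORES set to a disjoint core range before anything Neuron-related is loaded:

```python
# task1.py -- illustrative script name for task 1 (hypothetical)
import os

# NEURON_RT_VISIBLE_CORES pins this process to a specific range of NeuronCores.
# It must be set before the Neuron runtime initializes, i.e. before any
# Neuron-backed model is loaded in this process.
os.environ["NEURON_RT_VISIBLE_CORES"] = "0-3"  # cores reserved for task 1

# ... load and run the model for task 1 here (e.g. with torch-neuronx or
# transformers-neuronx), compiled with a matching tensor-parallel degree.
```

A second process for task 2 would set NEURON_RT_VISIBLE_CORES="4-7" before loading its model. Note that this variable indexes NeuronCores rather than chips, so the ranges should be scaled by the number of cores per chip on your instance.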

Please let us know if this is what you were looking for.
