What problem or use case are you trying to solve?
Add the OSWorld benchmark to the OpenHands evaluation harness (evaluation/benchmarks).
Describe the UX of the solution you'd like
Do you have thoughts on the technical implementation?
The primary challenge is emulating an OS inside the Docker-based runtime. Fortunately, the OSWorld authors have already figured out a way to do this by running QEMU inside Docker:
https://github.com/xlang-ai/OSWorld?tab=readme-ov-file#docker-server-with-kvm-support-for-the-better
Supporting this will likely require major work on our runtime, but once it is working, the OpenHands runtime will be much more capable, since we can optionally have access to a full OS (with GUI) for both agents and humans to interact with.
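To make the idea concrete, here is a rough sketch of how the runtime might assemble the `docker run` command for a QEMU-in-Docker container. This is only an illustration, not OSWorld's or OpenHands' actual code: the function name `build_qemu_container_cmd`, the image name, the in-container mount point, and the port are all hypothetical. The only part I'm fairly confident about is that KVM acceleration requires passing the host's `/dev/kvm` device into the container.

```python
import os
from typing import List


def build_qemu_container_cmd(
    image: str,
    vm_disk_path: str,
    kvm_device: str = "/dev/kvm",
    gui_port: int = 8006,  # hypothetical port for a VNC/web GUI
) -> List[str]:
    """Build a `docker run` argv for a container that runs QEMU inside it."""
    cmd = ["docker", "run", "--rm", "-d"]

    # Pass the KVM device through only if the host exposes it.
    # Without it, the image would have to fall back to (much slower)
    # software emulation, assuming it supports that at all.
    if os.path.exists(kvm_device):
        cmd += ["--device", kvm_device]

    # Mount the VM disk image read-only; the in-container path is an assumption.
    cmd += ["-v", f"{vm_disk_path}:/vm/disk.qcow2:ro"]

    # Expose the GUI port so both agents and humans can reach the desktop.
    cmd += ["-p", f"{gui_port}:{gui_port}"]

    cmd.append(image)
    return cmd


if __name__ == "__main__":
    # Placeholder image name and disk path, for illustration only.
    print(" ".join(build_qemu_container_cmd("example/qemu-desktop:latest", "/data/ubuntu.qcow2")))
```

Returning the argv as a list (rather than a shell string) would let the runtime hand it straight to `subprocess.run` without quoting issues.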
Describe alternatives you've considered
Additional context