However, we are currently looking for self-hosted, easy-to-set-up microVM solutions for the agent's GUI mode. The idea is to let agents perform their GUI operations, such as browsing the web or launching and using an app, in an isolated environment.
If anyone has experience with microVMs, feel free to let me know in the comments. Many thanks!
The microVM route is more secure, but it's more complicated and the ops are tricky. You can do some cool things to optimize startup time, though. When I was working on a function-as-a-service platform, to reduce TTFB I trapped the `listen()` call and sent a VSOCK message to the VMM, which froze the VM, snapshotted it, and saved the snapshot as a "template". Then, for every request, the snapshot was cloned (with some file system tricks like CoW) and resumed to handle the request. It "just" worked, but the orchestration was kludgy.
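For anyone wanting to try the template/restore flow described above, here is a minimal sketch of the host side. The comment does not say which VMM was used, so this assumes Firecracker and its HTTP API over a Unix socket; the function names and file paths are made up for illustration, and the request fields follow recent Firecracker releases, so check them against the version you run.

```python
import http.client
import json
import socket


class UnixHTTPConnection(http.client.HTTPConnection):
    """HTTP over a Unix domain socket, which is how Firecracker exposes its API."""

    def __init__(self, socket_path):
        super().__init__("localhost")
        self.socket_path = socket_path

    def connect(self):
        sock = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
        sock.connect(self.socket_path)
        self.sock = sock


def template_requests(snapshot_path, mem_path):
    """Requests that freeze a warmed-up VM and save it as a template."""
    return [
        ("PATCH", "/vm", {"state": "Paused"}),
        ("PUT", "/snapshot/create", {
            "snapshot_type": "Full",
            "snapshot_path": snapshot_path,
            "mem_file_path": mem_path,
        }),
    ]


def restore_requests(snapshot_path, mem_path):
    """Request that loads a (cloned) template into a fresh VMM and resumes it."""
    return [
        ("PUT", "/snapshot/load", {
            "snapshot_path": snapshot_path,
            "mem_backend": {"backend_type": "File", "backend_path": mem_path},
            "resume_vm": True,
        }),
    ]


def send_all(api_socket, requests):
    """Send each request in order and stop at the first failure."""
    for method, path, body in requests:
        conn = UnixHTTPConnection(api_socket)
        conn.request(method, path, body=json.dumps(body),
                     headers={"Content-Type": "application/json"})
        response = conn.getresponse()
        response.read()
        conn.close()
        if response.status >= 300:
            raise RuntimeError(f"{method} {path} failed with HTTP {response.status}")
```

The host would call `send_all(sock, template_requests(...))` when the guest's VSOCK message arrives, then `send_all(new_sock, restore_requests(...))` against a fresh VMM process per request. Cloning the disk image (for example with a CoW copy) is left out here.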
In the second incarnation of this, I decided to use Linux containers with the gVisor sandbox. You can take a look at my project https://github.com/ammmir/sandboxer which uses Podman and gVisor underneath; it's good enough for a prototype. Later on, you can swap it out for Firecracker microVMs, if necessary. In fact, I'm thinking of adding microVM support to sandboxer itself. If you want to do it yourself, swap out ContainerEngine() with a new implementation that calls out to Firecracker. You'll need some way to do disk volume management (grow, clone, shared, cross-machine? good luck!), snapshots, etc.
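A rough sketch of what a swappable backend could look like. The `Engine` interface, its method name, and the `/run/sandboxes` directory are hypothetical and not taken from sandboxer's code; the real ContainerEngine() may be shaped quite differently. It also assumes `runsc` is installed where Podman can find it.

```python
from __future__ import annotations

from abc import ABC, abstractmethod


class Engine(ABC):
    """Hypothetical backend interface; sandboxer's real ContainerEngine may differ."""

    @abstractmethod
    def start_command(self, sandbox_id: str, spec: str) -> list[str]:
        """Return the argv that launches one sandbox."""


class PodmanGvisorEngine(Engine):
    def start_command(self, sandbox_id: str, spec: str) -> list[str]:
        # Here `spec` is a container image reference; runsc is gVisor's OCI runtime.
        return ["podman", "--runtime", "runsc", "run",
                "--detach", "--name", sandbox_id, spec]


class FirecrackerEngine(Engine):
    def __init__(self, run_dir: str = "/run/sandboxes"):
        self.run_dir = run_dir

    def start_command(self, sandbox_id: str, spec: str) -> list[str]:
        # Here `spec` is the path to a Firecracker JSON VM config file.
        api_sock = f"{self.run_dir}/{sandbox_id}.sock"
        return ["firecracker", "--api-sock", api_sock, "--config-file", spec]


def launch_argv(engine: Engine, sandbox_id: str, spec: str) -> list[str]:
    """Callers depend only on Engine, so the backend can be swapped freely."""
    return engine.start_command(sandbox_id, spec)
```

The returned argv can be handed to `subprocess.run`. This only covers launching; the volume management and snapshot parts mentioned above are the hard bits and are not shown.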
Also, that's an interesting project you've got there. If you're interested, could we invite you to our project's Discord? We'd love to hear more about your experience.
Containers are more flexible, especially in cloud environments. You can run containers on a cloud VM or in a managed cloud cluster. MicroVMs typically can't be used that way, since they need KVM, which most cloud VMs don't expose unless nested virtualization is available.
(I work at a SaaS that relies heavily on this model.)
Any chance we can talk about this in detail?