Those who are hosting on bare metal: What is stopping you from using containers or VMs? What are you self-hosting?

kiol@lemmy.world · edit-2 15 hours ago

Those who are hosting on bare metal: What is stopping you from using containers or VMs? What are you self-hosting?

brucethemoose@lemmy.world · edit-2 10 hours ago

In my case it’s performance and sheer RAM need.

GLM 4.5 needs around 112GB of RAM plus every last megabyte of VRAM on the GPU, at least if I don’t want to quantize it so aggressively that it becomes unusable. I’m already swapping a tiny bit and simply cannot afford the overhead.

I think containers may slow down CPU<->GPU transfers slightly, but don’t quote me on that.
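One way to stop guessing is to time the copies in both environments. Below is a minimal sketch, assuming PyTorch with a CUDA-capable GPU is available on the host and inside the container; the function names `bandwidth_gb_per_s` and `measure_host_to_device` and the 1 GiB buffer size are made up for illustration, not taken from any existing tool.

```python
import time


def bandwidth_gb_per_s(num_bytes: int, seconds: float) -> float:
    """Convert a byte count and an elapsed time into GB/s (1 GB = 1e9 bytes)."""
    return num_bytes / seconds / 1e9


def measure_host_to_device(num_bytes: int = 1 << 30, repeats: int = 10) -> float:
    """Time pinned-host -> GPU copies and return the best observed GB/s.

    Assumption: PyTorch with CUDA support is installed. It is imported
    lazily so the helper above still works without it.
    """
    import torch

    # Pinned (page-locked) host memory is what fast host-to-device copies use.
    host = torch.empty(num_bytes, dtype=torch.uint8, pin_memory=True)
    device = torch.empty(num_bytes, dtype=torch.uint8, device="cuda")

    device.copy_(host)  # warm-up copy so one-time setup cost is not measured
    torch.cuda.synchronize()

    best = 0.0
    for _ in range(repeats):
        start = time.perf_counter()
        device.copy_(host, non_blocking=True)
        torch.cuda.synchronize()  # wait until the copy has actually finished
        elapsed = time.perf_counter() - start
        best = max(best, bandwidth_gb_per_s(num_bytes, elapsed))
    return best


if __name__ == "__main__":
    try:
        print(f"host -> device: {measure_host_to_device():.1f} GB/s")
    except (ImportError, RuntimeError) as exc:
        # No PyTorch, or no usable CUDA device: nothing to measure here.
        print(f"could not measure: {exc}")
```

Run the same script on bare metal and inside the container on the same machine. If the two numbers agree within run-to-run noise, the container isn't costing anything measurable for this kind of transfer.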

kiol@lemmy.world · 11 hours ago

Can anyone confirm whether containers would actually impact CPU-to-GPU transfers?

brucethemoose@lemmy.world · edit-2 11 hours ago

To be clear, VMs absolutely have overhead; the open question is Docker/Podman, where it might be negligible.

And this is a particularly weird scenario (since prompt processing literally has to shuffle ~112GB over the PCIe bus for each batch). Most GPGPU apps aren’t so sensitive to transfer speed/latency.
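For a rough sense of scale, here is a back-of-envelope sketch of the minimum time that shuffle takes. It uses the ~112GB figure from above and the theoretical one-direction bandwidth of an x16 link; which PCIe generation the card actually sits on is an assumption, and real transfers will come in below these peaks. `min_transfer_seconds` is just an illustrative helper name.

```python
# Approximate theoretical one-direction bandwidth of a PCIe x16 link in GB/s
# (per-lane rate of 8/16/32 GT/s with 128b/130b encoding, times 16 lanes).
PCIE_X16_GB_PER_S = {"3.0": 15.75, "4.0": 31.5, "5.0": 63.0}


def min_transfer_seconds(gigabytes: float, pcie_gen: str) -> float:
    """Lower bound on the time to move `gigabytes` across an x16 link."""
    return gigabytes / PCIE_X16_GB_PER_S[pcie_gen]


if __name__ == "__main__":
    weights_gb = 112  # figure quoted in the comment above
    for gen in PCIE_X16_GB_PER_S:
        seconds = min_transfer_seconds(weights_gb, gen)
        print(f"PCIe {gen} x16: at least {seconds:.2f} s per batch")
```

Even in the best case that is seconds per batch spent purely on the bus (about 3.6 s on PCIe 4.0 x16), so a small percentage of transfer overhead would be visible here in a way it wouldn't be for most GPU workloads.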