I am planning to build a multipurpose home server. It will be a NAS, virtualization host, and have the typical selfhosted services. I want all of these services to have high uptime and be protected from power surges/balckouts, so I will put my server on a UPS.
I also want to run an LLM server on this machine, so I plan to add one or more GPUs and pass them through to a VM. I do not care about high uptime on the LLM server. However, this of course means that I will need a more powerful UPS, which I do not have the space for.
My plan is to get a second power supply to power only the GPUs. I do not want to put this PSU on the UPS. I will turn on the second PSU via an Add2PSU.
In the event of a blackout, this means that the base system will get full power and the GPUs will get power via the PCIe slot, but they will lose the power from the dedicated power plug.
Obviously this will slow down or kill the LLM server, but will this have an effect on the rest of the system?
The proper way of doing this is to have two separate systems in a cluster such as proxmox. The system with GPUs runs certain workloads and the non GPU system runs other workloads.
Each system can be connected (or not) to a ups and shut down with a power outage and then boot back up when power is back.
Don’t try hot-plugging a gpu, it will never be reliable.
Run a proxmox cluster or kubernetes cluster, it is designed for this type of application but will add a fair amount of complexity.