I want to host some LLM’s locally and use more advanced models. Since new hardware is out of the question, I think I should be able to pull something off buying some yesteryear equipment on ebay etc. Did anybody attempt such a project? Does it scale horizontally? (I.e. can I connext two boxes to overcome single box slowness?)


The trend I see are the Mac Minis with a lot of unified memory. These are typically very well off people though. Prices for even old GPUs like 3090s are ridiculous now. I don’t think connecting 2 machines over Ethernet would work well, but putting 2 GPUs in a single machine does.
I bought a used 3090 two years ago, and back then they were usually listed for €800-1000 in my country. I thought I was lucky to find one for €700 after searching for a few months, and I don’t think they’ve ever been cheaper than this here. There are definitely fewer of them available now, but you can still buy one for €950 (and possibly even lower if you’re patient). So prices have gone up, but IMO not by ridiculous amounts like RAM.
I’ve just checked the Mac Studio on the site and lmao, they first ran out of 512gb uram and then of 256gb uram, now selling only 96gb version.
Which is quite impressive to be honest, these machines were fucking expensive, and yet sold out completely
512 gb would future proof you to run any local LLMs for quite a while. The speed at which they did it wasn’t exactly bad too afaik due to uram being so fast. Dunno what other setup would compete here for the price.