Channel: Raspberry Pi Forums

Other projects • Re: Deepseek cluster?

I ran the quantised 70-billion-parameter distillation of DeepSeek-R1 on a first-generation 64-core dual-socket Epyc server. The result is not fast; however, individually those 32-core CPUs are selling for 45 US$ on eBay, which would be very cost-effective if not for the price of a compatible motherboard.
Hi,
I posted a link to Jeff's benchmark repo in my previous post.
Could you please show the obench result for the model you mentioned on the EPYC CPU?

The cheapest Vega56 on German ebay.de right now is 80€ with free shipping, as a buy-it-now offer.
Before Trump went crazy, that was 80 USD, now it is 91 USD.
I would like to compare the Vega56 against it, using obench with the model you mentioned.


TIL that in a system with multiple Radeon GPUs, setting ROCR_VISIBLE_DEVICES to the GPU(s) to use restricts all ROCm applications, and definitely ollama, to those GPUs.

On my 7600X system, ollama was running as a service.
So I first had to stop the service in order to start ollama with a selected GPU:

Code:

systemctl stop ollama.service
Next I queried the system for the UUIDs of the installed GPUs and started ollama on the 2nd MI50 in the 7600X PC (that PC now has a 2000W PSU to be able to power the 5 GPUs):

Code:

hermann@7600x:~$ unset ROCR_VISIBLE_DEVICES
hermann@7600x:~$ rocminfo | grep -i uuid
  Uuid:                    CPU-XX
  Uuid:                    GPU-d64a58a17330f0ed
  Uuid:                    GPU-6e56508172dc76b6
  Uuid:                    GPU-021511f8bce82084
  Uuid:                    GPU-02151dfe505629a4
  Uuid:                    GPU-021521a4d1243124
  Uuid:                    GPU-XX
hermann@7600x:~$ export ROCR_VISIBLE_DEVICES="GPU-6e56508172dc76b6"
hermann@7600x:~$ ollama serve
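The manual copy and paste of a UUID could probably be scripted. Here is a minimal sketch, assuming the UUID lines always look like the ones in the listing above (GPU- followed by 16 hex digits) and that the order printed by rocminfo is the order you want to count in. The function name nth_gpu_uuid is made up for illustration and is not part of ROCm or ollama.

```shell
# Hypothetical helper (not part of ROCm or ollama): print the Nth (1-based)
# GPU UUID found in rocminfo-style text on stdin.
# Assumption: real UUIDs look like "GPU-<16 hex digits>", so the
# placeholder entries "CPU-XX" and "GPU-XX" are skipped.
nth_gpu_uuid() {
    grep -oE 'GPU-[0-9a-f]{16}' | sed -n "${1}p"
}

# Usage (needs rocminfo installed; stop the ollama service first):
#   export ROCR_VISIBLE_DEVICES="$(rocminfo | nth_gpu_uuid 2)"
#   ollama serve
```

With the listing above as input, index 2 selects GPU-6e56508172dc76b6, the same card exported by hand.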
This benchmark is the first to show a difference depending on how the GPUs are connected. For the PRP prime-testing software I use, it does not matter whether a GPU is connected via PCIe 4.0 x16 or via PCIe 3.0 x1 with a riser card. While the first Instinct MI50 is connected to the mainboard's PCIe 4.0 x16 slot (top in photo), the 2nd one just chosen is connected, like the other 3 GPUs, via a "PCI-E PCIe Express Riser card Adapter x1 4-Port x16 USB 3.0 Mining GPU Extender" (very back position in the mining rig):
IMG_20250411_144735.10pc.jpg
With ollama, the eval rate drops from 150 to 100 tokens/s (roughly a third) for an Instinct MI50 when it is not connected via PCIe 4.0 x16:

Code:

hermann@7600x:~/ollama-benchmark$ ./obench.sh -m deepseek-r1:1.5b -c 3
Running benchmark 3 times using model: deepseek-r1:1.5b
eval rate:            98.77 tokens/s
eval rate:            100.98 tokens/s
eval rate:            101.57 tokens/s
Average Eval Rate: 100.44 tokens/second
hermann@7600x:~/ollama-benchmark$
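To cross-check the reported average, or to average runs collected from several invocations, the eval rate lines can be summed with awk. This is only a sketch: avg_eval_rate is a made-up name, and it assumes the lines have exactly the "eval rate: <number> tokens/s" shape shown above and arrive on stdin.

```shell
# Hypothetical helper: average the "eval rate:" lines read from stdin.
# Assumption: each such line looks like "eval rate:   98.77 tokens/s",
# so the number is the third whitespace-separated field.
avg_eval_rate() {
    LC_ALL=C awk '/^eval rate:/ { sum += $3; n++ }
                  END { if (n) printf "%.2f\n", sum / n }'
}

# Usage, assuming obench.sh writes its results to stdout:
#   ./obench.sh -m deepseek-r1:1.5b -c 3 | avg_eval_rate
```

For the three rates above, (98.77 + 100.98 + 101.57) / 3 gives 100.44, which matches the average printed by obench.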

The PSU's mainboard cables were the shortest, so I moved the 2000W PSU inside the 7600X PC. To let the 60cm USB cables to the riser cards and the 80cm 2x8-pin PCIe power cables reach the GPUs on the open mining rig, the PC case got a new hole in its side, made with nipper pliers ;-)
IMG_20250413_024935.part.20pc.jpg

Statistics: Posted by HermannSW — Sun Apr 13, 2025 1:05 am


