As artificial intelligence technology continues to advance, on-device deep learning inference has become an essential part of building intelligent applications. This article walks you through deploying the DeepSeek-R1 model on the Neardi PI 3 (based on the RK3576 chip platform) and running efficient inference on the RKNPU (Rockchip NPU).
System Environment
- Neardi PI 3 (RK3576) hardware configuration: 8GB RAM + 64GB storage
1. Local Deployment of Ollama DeepSeek-R1
1.1 Installing Ollama
- Method 1: Install via terminal
curl -fsSL https://ollama.com/install.sh | sh
- Method 2: Manually download and install the package
If you cannot install via Method 1, you can directly visit the following link to download the arm64 version and extract the installation package:
Download Ollama Installation Package
1.2 Running Ollama Service
ollama serve
This starts the Ollama service; leave it running. (Note that the subcommand is `serve`, not `start`. The model itself is downloaded and loaded in the next step, when it is first run.)
1.3 Installing DeepSeek-R1:8B Model
ollama run deepseek-r1:8b
(If you are using the manually extracted binary from Method 2 rather than the script install, invoke it as `./ollama run deepseek-r1:8b`.)
The download will take some time, and Ollama's download speed tends to drop over a long session. If it falls to a few hundred KB/s, interrupt the command and re-run it; the download resumes where it left off at full speed (this has been verified to work).
1.4 Installation Successful
Once the model is downloaded and running, the terminal shows a >>> prompt and you can start interacting with the model.
1.5 Running DeepSeek Inference
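Besides the interactive >>> prompt, the running Ollama service also exposes a local HTTP API (port 11434 by default), which is convenient for scripting inference from other programs on the board. A minimal sketch using only the Python standard library, assuming the service from step 1.2 is running; the request body follows Ollama's /api/generate endpoint:

```python
import json
from urllib import request

OLLAMA_URL = "http://localhost:11434/api/generate"  # default Ollama endpoint

def build_generate_request(model: str, prompt: str) -> bytes:
    """Build the JSON body for Ollama's /api/generate endpoint.

    stream=False asks for one complete JSON reply instead of a token stream."""
    payload = {"model": model, "prompt": prompt, "stream": False}
    return json.dumps(payload).encode("utf-8")

def ask(model: str, prompt: str) -> str:
    """Send one prompt to the local Ollama service and return the reply text.

    Requires `ollama serve` to be running on this machine."""
    req = request.Request(
        OLLAMA_URL,
        data=build_generate_request(model, prompt),
        headers={"Content-Type": "application/json"},
    )
    with request.urlopen(req) as resp:
        return json.loads(resp.read())["response"]

# Example (only works while the service is running):
# print(ask("deepseek-r1:8b", "What is 2 + 2?"))
```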
2. Deploying DeepSeek-R1 with RKNPU
2.1 Model Download
It is recommended to download the pre-converted model provided by Rockchip: DeepSeek-R1-Distill-Qwen-1.5B has already been converted to the RKLLM format that the RK3576 can run.
RKLLM Model Download, extraction code: rkllm
Download the DeepSeek-R1-Distill-Qwen-1.5B_W4A16_RK3576.rkllm model here.
If you need to convert the model yourself, refer to the Model Conversion section in the DeepSeek-R1-Distill-Qwen-1.5B_Demo.
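The W4A16 suffix in the model filename conventionally denotes 4-bit quantized weights with 16-bit activations (an assumption based on common quantization naming, not stated in the article). A rough back-of-envelope estimate of the weight footprint, which explains why a 1.5B-parameter model fits comfortably in the PI 3's 8GB of RAM:

```python
# Rough weight-size estimate for a 1.5B-parameter model quantized to 4-bit
# weights ("W4" in W4A16, assumed naming). This ignores embeddings or layers
# that may be kept at higher precision, so treat it as a lower bound.
params = 1.5e9          # parameter count of DeepSeek-R1-Distill-Qwen-1.5B
bits_per_weight = 4     # W4 quantization
size_gb = params * bits_per_weight / 8 / 1e9  # bits -> bytes -> GB
print(f"~{size_gb:.2f} GB of weights")  # ~0.75 GB
```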
2.2 Compile llm_demo (Optional)
git clone https://github.com/airockchip/rknn-llm
cd rknn-llm/examples/DeepSeek-R1-Distill-Qwen-1.5B_Demo/deploy
# In build-linux.sh, set GCC_COMPILER_PATH to aarch64-linux-gnu; otherwise the compilation will fail.
./build-linux.sh
2.3 Running DeepSeek Inference
Use the following command to run the DeepSeek model inference on the PI 3 (RK3576):
export LD_LIBRARY_PATH=/home/neardi/rknn-llm/rkllm-runtime/Linux/librkllm_api/aarch64:$LD_LIBRARY_PATH
taskset f0 ./rknn-llm/examples/DeepSeek-R1-Distill-Qwen-1.5B_Demo/deploy/install/demo_Linux_aarch64/llm_demo DeepSeek-R1-Distill-Qwen-1.5B_W4A16_RK3576.rkllm 2048 4096
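The f0 argument to taskset is a hexadecimal CPU affinity mask: 0xf0 has bits 4-7 set, pinning the demo to CPUs 4-7, which on the RK3576 are typically the big Cortex-A72 cores (worth confirming against /proc/cpuinfo on your board). The mask decoding can be sketched as:

```python
# Decode a taskset-style hexadecimal affinity mask into a list of CPU indices.
def decode_affinity_mask(mask_hex: str) -> list[int]:
    mask = int(mask_hex, 16)
    # Each set bit i in the mask allows the process to run on CPU i.
    return [cpu for cpu in range(mask.bit_length()) if mask & (1 << cpu)]

print(decode_affinity_mask("f0"))  # [4, 5, 6, 7]
```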
After running, you will see output similar to the following:
rkllm init start
I rkllm: rkllm-runtime version: 1.1.4, rknpu driver version: 0.9.8, platform: RK3576
rkllm init success
********************** You can input the following questions corresponding to the number or custom input ********************
[0] A bag contains 5 red balls and 3 blue balls. What is the probability of drawing a blue ball?
[1] The number sequence: 1, 4, 9, 16, 25, …, what is the next number?
[2] Write a program to determine if a number is odd or even.
*************************************************************************
At this point, you can enter questions and receive inference results from the model.
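The built-in sample questions are handy for sanity-checking the model, because each has a deterministic answer you can verify independently. For reference, all three can be checked in a few lines:

```python
from fractions import Fraction

# [0] Probability of drawing a blue ball from 5 red + 3 blue balls.
p_blue = Fraction(3, 5 + 3)
print(p_blue)  # 3/8

# [1] The sequence 1, 4, 9, 16, 25, ... is n^2, so the next term is 6^2.
next_term = 6 ** 2
print(next_term)  # 36

# [2] Determine whether a number is odd or even.
def parity(n: int) -> str:
    return "even" if n % 2 == 0 else "odd"

print(parity(7))  # odd
```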