DeepSeek-R1-0528, the latest iteration of DeepSeek's R1 reasoning model, requires 715GB of disk space in full precision, making it one of the largest open-source models available. Thanks to Unsloth's dynamic quantization techniques, the model can be shrunk to just 162GB, roughly an 80% reduction. This lets users run the model on far more modest hardware while retaining most of its capability, albeit with some loss in quality.
In this tutorial, we will:
- Set up Ollama and Open Web UI for running the DeepSeek-R1-0528 model locally.
- Download and configure the model's 162GB dynamically quantized version (TQ1_0).
- Run the model using both GPU+CPU and CPU-only configurations.
Step 0: Prerequisites
To run the TQ1_0 quantized variant, your system must meet the following specifications:
- GPU Requirements: A single 24GB GPU (e.g., NVIDIA RTX 4090 or A6000) plus 128GB of system RAM. With this configuration, you can expect a generation speed of roughly 5 tokens per second.
- RAM Requirements: At least 64GB of RAM to run the model without a GPU, although throughput will be limited to about 1 token per second.
- Optimal Setup: For the best performance (over 5 tokens per second), you need at least 180GB of unified memory or a combined 180GB of RAM + VRAM.
- Storage: At least 200GB of free disk space for the model and its dependencies.
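Before downloading anything, it is worth confirming that your machine actually meets these numbers. A quick sanity check using standard Linux utilities (this assumes an NVIDIA GPU with drivers already installed):

```bash
# Report GPU model and total VRAM
nvidia-smi --query-gpu=name,memory.total --format=csv

# Report total and available system RAM
free -h

# Report free disk space on the current filesystem
df -h .
```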
Step 1: Install Dependencies and Ollama
Update your system and install the required tools. Ollama is a lightweight server for running large language models locally. Use the following commands to install it on an Ubuntu distribution:
```bash
apt-get update
apt-get install pciutils -y
curl -fsSL https://ollama.com/install.sh | sh
```
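If the install script completes successfully, the `ollama` binary should be on your PATH; a quick check:

```bash
ollama --version
```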
Step 2: Download and Run the Model
Use the following commands to start the Ollama server and run the TQ1_0 dynamic quant of the DeepSeek-R1-0528 model:
```bash
ollama serve &
ollama run hf.co/unsloth/DeepSeek-R1-0528-GGUF:TQ1_0
```
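The first run downloads roughly 162GB of weights, so expect a long wait. Once the model is loaded, you can also query it through Ollama's REST API, which listens on port 11434 by default (the prompt below is just a placeholder):

```bash
curl http://localhost:11434/api/generate -d '{
  "model": "hf.co/unsloth/DeepSeek-R1-0528-GGUF:TQ1_0",
  "prompt": "Why is the sky blue?",
  "stream": false
}'
```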
Step 3: Set Up and Run Open Web UI
Pull the Open Web UI Docker image with CUDA support. Launch the container with GPU acceleration and Ollama integration enabled.
This command will:
- Start the Open Web UI server on container port 8080, mapped to port 9783 on the host
- Enable GPU acceleration with the `--gpus all` flag
- Mount a persistent data volume (`-v open-webui:/app/backend/data`)

```bash
docker pull ghcr.io/open-webui/open-webui:cuda
docker run -d -p 9783:8080 --gpus all -v open-webui:/app/backend/data --name open-webui ghcr.io/open-webui/open-webui:cuda
```
After the container is up and running, access the Open Web UI interface in your browser at http://localhost:9783/ (the host port mapped to the container's port 8080).
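If Open Web UI cannot see the Ollama model, the container may be unable to reach the Ollama server running on the host. A common fix, following the Open Web UI documentation, is to recreate the container with the host gateway mapped and `OLLAMA_BASE_URL` pointed at it:

```bash
# Remove the old container, then relaunch with host networking hints
docker rm -f open-webui
docker run -d -p 9783:8080 --gpus all \
  --add-host=host.docker.internal:host-gateway \
  -e OLLAMA_BASE_URL=http://host.docker.internal:11434 \
  -v open-webui:/app/backend/data --name open-webui \
  ghcr.io/open-webui/open-webui:cuda
```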
Step 4: Run DeepSeek-R1-0528 in Open Web UI
Select the `hf.co/unsloth/DeepSeek-R1-0528-GGUF:TQ1_0` model from the model menu.
If the Ollama server does not properly use the GPU, you can switch to CPU execution. Although this drastically reduces performance (around 1 token per second), it ensures that the model remains operational.
```bash
# Kill any existing Ollama processes
pkill ollama

# Check which processes are still holding GPU memory
sudo fuser -v /dev/nvidia*

# Restart Ollama in CPU-only mode by hiding the GPU from it
CUDA_VISIBLE_DEVICES="" ollama serve &
```
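To confirm which processor the model actually landed on, `ollama ps` lists the loaded models along with whether they are running on CPU or GPU:

```bash
ollama ps
```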