DeepSeek is an open-source, open-weight family of pre-trained models. You can use them for inference through the DeepSeek web app at chat.deepseek.com (after agreeing to the terms of service and privacy policy), or download the weights to run and train them yourself.

As with OpenAI, the privacy policy makes clear that anything you share through the hosted app can be collected and used to retrain their models, so be careful what you send it. Because DeepSeek is open source, though, you can download the models and run them locally, fine-tuning the weights to build smaller, focused models for specific tasks. Check out DeepSeek's open models on GitHub at github.com/deepseek-ai.
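
If you want to poke at a model locally before committing to a bigger setup, the distilled R1 checkpoints on Hugging Face are the easiest entry point. Here's a minimal sketch using the transformers library; the model ID and generation settings are just illustrative choices, not an official recipe:

```python
# Minimal sketch: load a distilled DeepSeek-R1 checkpoint locally and ask one question.
# Assumes a GPU with enough memory and the transformers, torch, and accelerate packages installed.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "deepseek-ai/DeepSeek-R1-Distill-Qwen-7B"  # example distilled checkpoint

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # half precision to fit on consumer GPUs
    device_map="auto",
)

messages = [{"role": "user", "content": "What is 17 * 24? Think step by step."}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(input_ids, max_new_tokens=512, do_sample=True, temperature=0.6)
print(tokenizer.decode(outputs[0][input_ids.shape[-1]:], skip_special_tokens=True))
```

If you'd rather not write code, tools like Ollama and LM Studio also package the distilled checkpoints for local use.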

Mike Le Galloudec at Oakland shares a great 6-minute review and will follow up with how to get R1 running locally on private systems.

Are you doing something similar? Share how you’re getting DeepSeek models running locally!

TL;DR of the DeepSeek R1 paper - DeepSeek-R1/DeepSeek_R1.pdf at main · deepseek-ai/DeepSeek-R1 · GitHub

  • Aha moments emerged naturally in RL: Self-correction behaviors like "Wait, let’s reevaluate..." arose without SFT.
  • Cold-start SFT fixed readability: ~1k structured examples resolved language mixing.
  • GRPO cut RL costs by 30%: Group-wise reward normalization outperformed PPO (a rough sketch of the normalization follows this list).
  • RL increased CoT length autonomously: Reasoning steps grew from 100→1k tokens without penalties.
  • Distillation beat direct RL in small models: SFT on R1 data outperformed RL-trained base models.
  • Process rewards failed; outcome rewards worked better: Rule-based final-answer checks stabilized training (a toy version of such a check is sketched below the list).
  • XML tags reduced hallucinations 15%: Structured <think>/<answer> improved reward clarity.
  • Language mixing fixed via consistency rewards: Penalized code-switching in multilingual outputs.
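
On the GRPO bullet: the idea is to sample a group of completions for each prompt and normalize each completion's reward against the group's mean and standard deviation, which replaces the learned value network PPO relies on. A minimal sketch of that advantage computation (the function name and group size are my own choices, not the paper's code):

```python
import numpy as np

def group_relative_advantages(rewards, eps=1e-8):
    """Normalize per-completion rewards within one prompt's sample group.

    rewards: scalar rewards, one per sampled completion for the same prompt
             (e.g. 16 samples). Returns an advantage per completion.
    """
    rewards = np.asarray(rewards, dtype=np.float64)
    return (rewards - rewards.mean()) / (rewards.std() + eps)

# Example: 4 samples for one prompt, only the last two answered correctly.
print(group_relative_advantages([0.0, 0.0, 1.0, 1.0]))  # ≈ [-1, -1, 1, 1]
```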
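
On the outcome-reward and <think>/<answer> bullets: a rule-based outcome reward just checks that the output follows the expected tag structure and that the extracted final answer matches the reference, with no learned reward model in the loop. The regex, scoring values, and function below are illustrative assumptions, not the paper's actual implementation:

```python
import re

THINK_ANSWER = re.compile(r"<think>(.*?)</think>\s*<answer>(.*?)</answer>", re.DOTALL)

def outcome_reward(completion: str, reference_answer: str) -> float:
    """Rule-based reward: format check plus exact-match answer check."""
    match = THINK_ANSWER.search(completion)
    if match is None:
        return 0.0                    # malformed output: no reward at all
    answer = match.group(2).strip()
    reward = 0.1                      # small bonus for correct <think>/<answer> structure
    if answer == reference_answer.strip():
        reward += 1.0                 # main reward: final answer matches the reference
    return reward

sample = "<think>17 * 24 = 408</think> <answer>408</answer>"
print(outcome_reward(sample, "408"))  # -> 1.1
```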
