*Posted on March 3, 2025 in #Thoughts #Reflection*
Since May last year, I've been recapping my experiences monthly. When 2025 began, I skipped January's reflection, but I realize now how important it is to document this period. This post reflects on the past two months and captures my thoughts about life and the rapidly evolving AI landscape.
## The DeepSeek Revolution
The first two months of 2025 marked a pivotal moment in AI history, particularly with the release of `DeepSeek-R1`. This groundbreaking model has fundamentally transformed the open-source AI community in several key ways:
1. **Widespread Impact**: Its influence spans both industry and academia, revolutionizing how researchers and practitioners approach AI development.
2. **Mainstream Recognition**: The model's reach extends far beyond technical circles: during my recent visit to Melbourne, even my elderly uncle, who has limited technical background, was eager to discuss `DeepSeek`.
3. **Technical Excellence**: For a deeper understanding of the model's capabilities and technical specifications, you can refer to my detailed notes here: [[Notes on DeepSeek-R1]]
What I find most valuable about `DeepSeek-R1` is the team's decision to make the model's thinking process public. This transparency enables extensive data-distillation work, which they've already begun exploring in their technical report. This approach stands in stark contrast to `OpenAI`, which has deliberately chosen not to release their models' thinking processes. Interestingly, `Google` initially provided access to the thinking process in their `gemini-2.0-flash-thinking-exp-01-21` API, but later disabled this parameter, following `OpenAI`'s lead.
## The Open-Source Response
Following `DeepSeek-R1`'s release, numerous projects emerged in the open-source community:
- Some attempted to replicate `R1`'s development journey
- Others focused on data distillation, applying `DeepSeek-R1`'s capabilities to smaller models to enhance their reasoning
- Many explored using GRPO (Group Relative Policy Optimization, the reinforcement learning algorithm behind `DeepSeek-R1`) on smaller models to achieve similar "aha" moments
Some of these efforts have successfully demonstrated that smaller models can achieve reasoning performance comparable to `o1` and `R1`. However, most work has concentrated on mathematics rather than other domains. This limitation likely stems from how easy it is to design reward rules in mathematics compared to real-world scenarios, where questions are often open-ended without absolute answers. I'm eager to see this research extend into more diverse domains beyond mathematics.
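Why is math so much easier to reward? Because correctness can be checked with a few string rules, with no learned reward model at all. A rough illustration (my own sketch, not `DeepSeek`'s actual implementation; the `<think>` tags and `\boxed{}` convention are assumptions borrowed from common practice):

```python
import re

def math_reward(completion: str, gold_answer: str) -> float:
    """Rule-based reward for a math completion.

    Illustrative only: a small format bonus for exposing reasoning in
    <think>...</think> tags, plus an accuracy reward for a final
    \\boxed{} answer that exactly matches the known gold answer.
    """
    reward = 0.0
    # Format reward: reasoning should appear inside <think>...</think>.
    if re.search(r"<think>.*?</think>", completion, re.DOTALL):
        reward += 0.1
    # Accuracy reward: extract the boxed final answer and compare it
    # to the gold answer by exact string match.
    match = re.search(r"\\boxed\{([^}]*)\}", completion)
    if match and match.group(1).strip() == gold_answer.strip():
        reward += 1.0
    return reward
```

An open-ended question ("write a persuasive essay") admits no such exact-match rule, which is exactly the gap the paragraph above describes.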
## The Broader AI Landscape
The past two months brought numerous other significant developments beyond `DeepSeek`:
- `OpenAI` released `o3-mini`, a new reasoning model, in January and followed with `gpt-4.5-preview` in February. I've tested `gpt-4.5-preview` and documented my findings here: [[Feb 28, Notes on GPT-4.5]].
- `Anthropic` launched `claude-3-7-sonnet-20250219`, which lets users toggle extended thinking on or off within a single model, effectively providing a unified model. My tests revealed impressive performance, detailed here: [[Feb 25, Notes on Claude-3-7-Sonnet & Qwen2-5-Max]].
- `Alibaba` continued its steady progress, releasing `Qwen2.5-Max` and its reasoning-focused variant `QwQ-Preview`. Although `DeepSeek`'s prominence has overshadowed some of `Alibaba`'s contributions, it's worth noting that many data-distillation projects still choose `Qwen` as their base model.
## My Recent Work
I've dedicated significant time to learning reinforcement learning algorithms. Inspired by `DeepSeek`'s success, I experimented with GRPO using the Unsloth framework, training models to observe the "aha" moment—when a model demonstrates reflection, verification, and reasoning. While this approach yields fascinating results, it's extremely computationally intensive: even attempting to train a `3B` model on an H100 GPU resulted in out-of-memory errors.
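The core idea of GRPO that drew me in is simple: instead of PPO's learned value network, it samples a group of completions per prompt and normalizes each completion's reward against the group's own statistics. A minimal sketch of that group-relative advantage (my own simplification, not Unsloth's or `DeepSeek`'s code):

```python
from statistics import mean, stdev

def group_advantages(rewards: list[float]) -> list[float]:
    """Group-relative advantages in the style of GRPO.

    Each completion's reward is normalized against the mean and
    standard deviation of its sampled group, so no separate value
    (critic) network is needed. Memory costs remain high in practice
    because every group member must still be generated and
    back-propagated through.
    """
    mu = mean(rewards)
    sigma = stdev(rewards) if len(rewards) > 1 else 0.0
    if sigma == 0.0:
        # All rewards identical: the group carries no learning signal.
        return [0.0 for _ in rewards]
    return [(r - mu) / sigma for r in rewards]
```

Completions that beat their group average get positive advantages and are reinforced; the rest are pushed down, with no critic to train or store.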
Given my limited computational resources, I've pivoted to Supervised Fine-Tuning (SFT). I'm currently training `Qwen2.5-32B-Instruct` with distillation techniques based on `DeepSeek-R1`. The model's performance still has room for improvement, which I hope to share in next month's update.
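In practice, this kind of distillation-based SFT mostly means turning `DeepSeek-R1`'s visible thinking traces into ordinary chat-format training pairs for the student model. A sketch of how one such example might be assembled (the field names and `<think>` tags follow common convention; the function itself is illustrative, not any framework's API):

```python
def build_sft_example(question: str, thinking: str, answer: str) -> dict:
    """Assemble one distillation training example.

    The teacher's (e.g. DeepSeek-R1's) visible thinking trace is kept
    inside <think> tags ahead of the final answer, so the student model
    learns to produce reasoning before answering. The schema here is
    illustrative, not a specific library's format.
    """
    target = f"<think>\n{thinking}\n</think>\n{answer}"
    return {
        "messages": [
            {"role": "user", "content": question},
            {"role": "assistant", "content": target},
        ]
    }
```

A corpus of such examples is then fed to a standard SFT trainer on the student model, `Qwen2.5-32B-Instruct` in my case.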
## Why I Write
I'm often asked, "Why do you love writing things down?" even though my audience is limited and many topics seem trivial. My reasons are threefold:
1. **Personal Documentation**: Writing is primarily for myself, not others. I document my life, thoughts, and learning to preserve experiences that might otherwise fade from memory.
2. **Clarity of Thought**: Writing reveals my thinking process. When I struggle to express something clearly, it signals that my understanding is incomplete or my thoughts are disorganized.
3. **Building a Second Brain**: Writing forms the foundation of my second brain—a system for organizing thoughts and knowledge. By writing and linking ideas, I discover connections between concepts and build a more structured knowledge base. For example, writing about `DeepSeek-R1` allows me to connect it with previous notes on AI models, revealing patterns in AI development I might otherwise miss.
## Looking Forward
These reflections capture my experiences, feelings, and thoughts from the beginning of 2025. I commit to continuing these monthly reflections, documenting my journey through this remarkable era of AI advancement.