Thursday, April 24, 2025

Synced

Can GRPO be 10x Efficient? Kwai AI’s SRPO Suggests Yes

Kwai AI has introduced SRPO (two-Staged history-Resampling Policy Optimization), a reinforcement learning framework that substantially reduces the compute required for large language model (LLM) post-training. The approach cuts the number of reinforcement learning post-training steps by 90%, roughly a tenfold reduction, while maintaining performance comparable to DeepSeek-R1, a leading model on mathematics and coding tasks.
