Synced
Can GRPO Be 10x More Efficient? Kwai AI’s SRPO Suggests Yes
Kwai AI has introduced SRPO (Staged Reinforcement Policy Optimization), a new framework that significantly reduces the computational resources required for large language model (LLM) training. The approach achieves a 90% reduction in reinforcement learning post-training steps while maintaining performance levels comparable to DeepSeek-R1, a leading model in mathematics and coding tasks.