[Paper Review] Reinforcement Learning with Verifiable Rewards Incentivizes Correct Reasoning in Base LLMs Author: 김민경 | 2026, Feb 05
[Paper Review] Direct Preference Optimization: Your Language Model is Secretly a Reward Model Author: 김민경 | 2025, Nov 13
[Paper Review] Dual RL: Unification and New Methods for Reinforcement and Imitation Learning Author: 김동민 | 2025, Oct 23