[Paper Review] Direct Preference Optimization: Your Language Model is Secretly a Reward Model Author: 김민경 | 2025, Nov 13
[Paper Review] Dual RL: Unification and New Methods for Reinforcement and Imitation Learning Author: 김동민 | 2025, Oct 23
[Paper Review] FLaRe: Achieving Masterful and Adaptive Robot Policies with Large-Scale Reinforcement Learning Fine-Tuning Author: 김재훈 | 2025, Sep 18