[Paper Review] Direct Preference-based Policy Optimization without Reward Modeling
Written by: 김민경
2025, Sep 04
Paper Information
Title: Direct Preference-based Policy Optimization without Reward Modeling
Authors: Gaon An, Junhyeok Lee, Xingdong Zuo, Norio Kosaka, Kyung-Min Kim, Hyun Oh Song (SNU & NAVER)
Venue: NeurIPS 2023