[Paper Review] Direct Preference-based Policy Optimization without Reward Modeling

Written by: 김민경

Sep 04, 2025

Paper Information

Title: Direct Preference-based Policy Optimization without Reward Modeling

Authors: Gaon An, Junhyeok Lee, Xingdong Zuo, Norio Kosaka, Kyung-Min Kim, Hyun Oh Song (SNU & NAVER)

Venue: NeurIPS 2023

Link: https://arxiv.org/abs/2301.12842


Presentation Materials