[Paper Review] Direct Preference Optimization: Your Language Model is Secretly a Reward Model
Written by: 김민경
2025, Nov 13
Paper Information
Title: Direct Preference Optimization: Your Language Model is Secretly a Reward Model
Authors: Rafael Rafailov, Archit Sharma, Eric Mitchell, Stefano Ermon, Christopher D. Manning, Chelsea Finn (Stanford University)
Venue: NeurIPS 2023