[Paper Review] Direct Preference Optimization: Your Language Model is Secretly a Reward Model

Written by: 김민경

Nov 13, 2025

Paper Information

Title: Direct Preference Optimization: Your Language Model is Secretly a Reward Model

Authors: Rafael Rafailov, Archit Sharma, Eric Mitchell, Stefano Ermon, Christopher D. Manning, Chelsea Finn (Stanford University)

Venue: NeurIPS 2023

Link: https://arxiv.org/abs/2305.18290


Presentation Materials