Paper Information

Title: Direct Preference Optimization: Your Language Model is Secretly a Reward Model

Authors: Rafael Rafailov, Archit Sharma, Eric Mitchell, Stefano Ermon, Christopher D. Manning, Chelsea Finn (Stanford University)

Venue: NeurIPS 2023

Link: https://arxiv.org/abs/2305.18290

Presentation Materials