ZEPHYR: DIRECT DISTILLATION OF LM ALIGNMENT Paper Notes
Using dSFT (Distilled Supervised Finetuning) improves model accuracy. However, dSFT models lack alignment: they do not respond well to user intents. The paper demonstrates that applying dDPO (distilled Direct Preference Optimization) on top of dSFT, using AI-feedback preference data, yields a chat model with significantly improved intent alignment.
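The dSFT step can be sketched as plain imitation learning: the student is fine-tuned to maximize the likelihood of teacher-written responses, i.e. to minimize per-token cross-entropy over response tokens. A minimal sketch of the objective (my own simplification, not the paper's code; the log-probs are hypothetical values a student model would assign):

```python
def dsft_loss(student_logprobs):
    """Mean negative log-likelihood of the teacher's response tokens
    under the student model (lower = student imitates teacher better)."""
    return -sum(student_logprobs) / len(student_logprobs)

# Hypothetical per-token log-probs the student assigns to one teacher response
print(round(dsft_loss([-0.5, -1.2, -0.3]), 4))  # 0.6667
```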
RLHF is used to align language models with human preferences, but it is a complex and often unstable procedure. The proposed alternative, DPO (Direct Preference Optimization), is stable, performant, and computationally lightweight.
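The DPO objective for a single preference pair can be sketched as follows (a minimal sketch in plain Python, assuming the summed log-probabilities of each completion under the policy and the frozen reference model are already computed; `beta` is DPO's temperature parameter):

```python
import math

def dpo_loss(logp_chosen, logp_rejected, ref_logp_chosen, ref_logp_rejected, beta=0.1):
    """DPO loss for one preference pair.

    logp_* are summed log-probs of the chosen/rejected completions under
    the policy being trained; ref_logp_* are the same quantities under the
    frozen reference (e.g. the dSFT) model.
    """
    # Implicit reward margin: beta * (policy log-ratio minus reference log-ratio)
    margin = beta * ((logp_chosen - ref_logp_chosen) - (logp_rejected - ref_logp_rejected))
    # -log sigmoid(margin): minimized when the policy prefers the chosen answer
    return -math.log(1.0 / (1.0 + math.exp(-margin)))

# When the policy matches the reference, the margin is 0 and the loss is log 2
print(round(dpo_loss(-10.0, -12.0, -10.0, -12.0), 4))  # 0.6931
```

No reward model or sampling loop is needed, which is what makes DPO lighter than RLHF: the loss is computed directly from log-probabilities of the preference pair.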
Contributions:
As of November 2023, JupyterLab still does not support working with virtual environments directly. Using the global environment causes all sorts of trouble. It is possible to work around this by registering a virtual environment's interpreter as a separate Jupyter kernel.
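A common workaround is to install `ipykernel` inside the venv and register it as a named kernel that JupyterLab can then select (a sketch; `myproject` is a placeholder name):

```shell
python3 -m venv .venv                  # create a project-local venv
. .venv/bin/activate                   # activate it
pip install ipykernel                  # Jupyter kernel shim
# Register the venv as a selectable kernel in JupyterLab's launcher;
# "myproject" is a placeholder name
python -m ipykernel install --user --name myproject \
  --display-name "Python (myproject)"
```

After this, the kernel appears in JupyterLab's launcher even when JupyterLab itself runs from the global environment.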
The paper provides details about the data used in the fine-tuning phases (the dialogue and preference datasets for dSFT and dDPO).