habanoz’s tech posts

ZEPHYR: DIRECT DISTILLATION OF LM ALIGNMENT Paper Notes

3 minute read

Using dSFT (Distilled Supervised Finetunning) improves model accuracy. However, dSFT models lack alignment. The paper demonstrates that the application of dD...

Direct Preference Optimization: Your Language Model is Secretly a Reward Mode

4 minute read

RLHF is used for aligning language models with human preferences. However, RLHF is a complex and often unstable procedure. Proposed algorithm DPO is stable,...

QLoRA: Efficient Finetuning of Quantized LLMs

5 minute read

Contributions:

Running Jupyter Notebooks in Virtual Environments

1 minute read

As of 2023 November, jupyterlab still does not support working with virtual environments. Using global environment causes all sort of troubles. It is possib...

Granite Foundation Models

4 minute read

Paper provides details about data used in pre-training phase.

Huseyin ABANOZ

Recent posts

ZEPHYR: DIRECT DISTILLATION OF LM ALIGNMENT Paper Notes

Direct Preference Optimization: Your Language Model is Secretly a Reward Mode

QLoRA: Efficient Finetuning of Quantized LLMs

Running Jupyter Notebooks in Virtual Environments

Granite Foundation Models