Recent posts

Haber GPT-2

4 minute read

HaberGPT2 is a decoder-only language model with 100M parameters, trained from scratch on Turkish news. This post shares details about the training process...

Transformer Pre-training Notes

32 minute read

In this post, I have compiled the architectural and training details of prominent language models. All information is based on the published papers.

Complex Numbers

less than 1 minute read

Standard Form \(z = x + iy\)

LLM Scaling Laws

4 minute read

Training language models is an expensive business, and it is important to plan carefully before training begins. This post will briefly touch on studies of scaling laws...