Haber GPT-2
HaberGPT2 is a decoder-only language model with 100M parameters, trained from scratch on Turkish news. This post shares details about the training process...
In this post, I have compiled the architectural and training details of prominent language models. All information is based on the published papers.
Standard Form \(z = x + iy\)
Backpropagation is a crucial part of training deep learning models. It allows us to compute gradients and update model parameters. This post is not ...
Training language models is an expensive business, and it is important to plan carefully ahead of training. This post will briefly touch on studies of scaling laws...