Huseyin ABANOZ
Software/AI Engineer.
You may also enjoy
LLM Training vs Serving in Terms of Memory Access Patterns
7 minute read
This article investigates the differences between LLM training and serving in terms of memory access patterns.
New Turkish Pre-training Datasets
5 minute read
Language model pre-training requires massive amounts of high-quality text. For Turkish-language pre-training, there are not many options available.
Haber GPT-2
4 minute read
HaberGPT2 is a decoder-only language model with 100M parameters, trained from scratch on Turkish news. This post shares details about the training proce...
Transformer Pre-training Notes
38 minute read
In this post, I have compiled architectural and training details of prominent language models. All information is drawn from the published papers.