Huseyin ABANOZ
Software/AI Engineer.
You may also enjoy
LLM Training vs Serving in Terms of Memory Access Patterns
7 minute read
This article investigates the differences between LLM training and serving in terms of memory access patterns.
New Turkish Pre-training Datasets
5 minute read
Language model pre-training requires massive amounts of high-quality text. For Turkish-language pre-training, there are not many options available.
Haber GPT-2
4 minute read
HaberGPT2 is a decoder-only language model with 100M parameters, trained from scratch on Turkish news. This post shares details about the training proce...
Transformer Pre-training Notes
38 minute read
In this post, I have compiled architectural and training details of prominent language models. All information is drawn from the published papers.