LLM Scaling Laws
Training language models is an expensive business and it is important to plan carefully ahead of training. This post will briefly touch studies on scaling la...
Training language models is an expensive business and it is important to plan carefully ahead of training. This post will briefly touch studies on scaling la...
2 Months ago, I trained a tiny language model with 10M parameters on turkish news datasets [1]. However, 10M parameters turned out to be not enough to genera...
Minhash algorithm can be used to detect near duplicate documents. Minhash algorithm works by calculating multiple hashes for different sections of a document...
MinHash algorithm is used to identify near-duplicate documents in a training corpus.
HaberGPT is a simple framework influenced greatly by the works of Andrej Karpathy, namely mingpt and nanogpt.