habanoz’s tech posts

Demystifying Neural Network Activations in PyTorch

8 minute read

Backpropagation is a crucial part of the deep learning training process. It allows us to compute gradients and update our model parameters. This post is not ...

LLM Scaling Laws

4 minute read

Training language models is an expensive business, and it is important to plan carefully ahead of training. This post will briefly touch on studies on scaling la...

Collecting 1.8M news documents from Common Crawl

6 minute read

Two months ago, I trained a tiny language model with 10M parameters on Turkish news datasets [1]. However, 10M parameters turned out not to be enough to genera...

LSH with Jaccard Index

3 minute read

The MinHash algorithm can be used to detect near-duplicate documents. It works by calculating multiple hashes for different sections of a document...

Scalable MinHash Implementation in Python

7 minute read

The MinHash algorithm is used to identify near-duplicate documents in a training corpus.

Huseyin ABANOZ
