Counts vs Gradients - Developing A Bi-gram Language Model - A Case Study
I was inspired by the makemore lecture of Andrej Karpathy. This post aims to showcase similarities between count-based and gradient-based methods to genera...
TinyStories: How Small Can Language Models Be and Still Speak Coherent English? (April 2023)
During a conversation in a Telegram group, one of my friends used an unpleasant word. Although I didn’t appreciate the choice of language, I also didn’t want...
The much-anticipated Virtual Threads feature is finally included as a standard feature in Java 21 LTS, promising to revolutionize the Java ecosystem. This post...
Using dSFT (Distilled Supervised Fine-Tuning) improves model accuracy. However, dSFT models lack alignment. The paper demonstrates that the application of dD...