Training a Tiny LLaMA From Scratch
January 10, 2024

Archived from an original LinkedIn post by Brian Greenforest.

Original Post

Bizarre fun training a 500k-parameter LLaMA-2 FROM SCRATCH on the TinyStories dataset (trained overnight on CPU in PyTorch: ModelArgs(dim=103, n_layers=5, n_heads=8, n_kv_heads=1, vocab_size=103, multiple_of=4, max_seq_len=240, dropout=0.2))

"Once upon a time, there was a very cold! The road had been to be thoughtful and see the animals remembered to eat her new friends. One day, while playing, Tom took a big pot in his friend when he decided to see the sun to let Lucy be dangerous of happy plants. Tom jumped on..."
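
As a rough sanity check on the "500k parameters" figure, the ModelArgs values quoted in the post can be turned into a back-of-the-envelope parameter count. The post does not say which LLaMA-2 implementation was used, so everything structural below is an assumption: bias-free linear layers, a per-head width of dim // n_heads, a three-matrix SwiGLU feed-forward block whose hidden size is 2/3 of 4*dim rounded up to multiple_of, two RMSNorm weight vectors per layer plus a final one, and an output projection that shares weights with the token embedding. The helper names ffn_hidden_dim and count_params are hypothetical, introduced only for this sketch.

```python
from dataclasses import dataclass


@dataclass
class ModelArgs:
    # Values copied from the post's ModelArgs(...) call.
    dim: int = 103
    n_layers: int = 5
    n_heads: int = 8
    n_kv_heads: int = 1
    vocab_size: int = 103
    multiple_of: int = 4
    max_seq_len: int = 240  # does not affect the count (assumes no learned positions)
    dropout: float = 0.2    # does not affect the count


def ffn_hidden_dim(dim: int, multiple_of: int) -> int:
    # Assumed sizing rule: 2/3 of 4*dim, rounded up to a multiple of `multiple_of`.
    hidden = int(2 * (4 * dim) / 3)
    return multiple_of * ((hidden + multiple_of - 1) // multiple_of)


def count_params(args: ModelArgs) -> dict:
    # Assumption: per-head width uses integer division, so 103 // 8 = 12.
    head_dim = args.dim // args.n_heads
    q_width = args.n_heads * head_dim      # query (and attention output) width
    kv_width = args.n_kv_heads * head_dim  # shared key/value width

    # Query, key, value, and output projections, all without biases.
    attention = (
        args.dim * q_width
        + 2 * args.dim * kv_width
        + q_width * args.dim
    )
    # Three bias-free matrices in the assumed SwiGLU feed-forward block.
    feed_forward = 3 * args.dim * ffn_hidden_dim(args.dim, args.multiple_of)
    # One RMSNorm weight vector before attention and one before the feed-forward.
    norms = 2 * args.dim

    per_layer = attention + feed_forward + norms
    embedding = args.vocab_size * args.dim  # assumed tied with the output layer
    final_norm = args.dim
    total = embedding + args.n_layers * per_layer + final_norm

    return {
        "embedding": embedding,
        "per_layer": per_layer,
        "final_norm": final_norm,
        "total": total,
    }


if __name__ == "__main__":
    for name, value in count_params(ModelArgs()).items():
        print(f"{name}: {value:,}")
```

Under these assumptions the total comes to roughly 0.55 million parameters, which is in line with the "500k" quoted in the post. A different implementation (untied output weights, a different feed-forward sizing rule) would shift the number somewhat.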