The Artifact
The public artifact is a GitHub pull request that adds a single training log/script file. It records the model shape, training command, inference command, sample output, and enough loss-log history to show the run moving from initialization into a working tiny-story generator.
Model Shape
| Property | Value |
|---|---|
| Layers | 4 |
| Attention heads | 16 |
| Key/value heads | 16 |
| Embedding dimension | 128 |
| Context length | 128 tokens |
| Vocabulary | 361 tokens |
| Dropout | 0.15 |
| Logged parameters | 834,644 total (832,128 weight-decayed + 2,516 non-decayed) |
| Training device | CPU |
| Tokens per iteration | 8,192 |
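
The logged parameter totals can be cross-checked with a little arithmetic. The sketch below is a reconstruction, not code from the PR. It assumes bias-free linear layers, a LLaMA-style gated feed-forward block with three weight matrices, a feed-forward hidden size of 341 (the integer part of 2/3 × 4 × 128), and an unembedding that stores no weights of its own. Under those assumptions the decayed count matches the logged figure, which is consistent with that layout but does not confirm it. The function and variable names are illustrative.

```python
# Cross-check of the logged parameter counts.
# ASSUMPTIONS (not stated in the artifact): no bias terms, three feed-forward
# matrices per layer, feed-forward hidden size 341, no stored unembedding weights.

def decayed_param_count(n_layers, dim, vocab_size, ffn_hidden):
    """Count the 2-D weight matrices, which are the ones normally weight-decayed."""
    embedding = vocab_size * dim
    # Query, key, value, and output projections. With 16 key/value heads and
    # 16 attention heads, the key and value projections are full dim x dim.
    attention = 4 * dim * dim
    feed_forward = 3 * dim * ffn_hidden
    return embedding + n_layers * (attention + feed_forward)

dim = 128
ffn_hidden = int(2 * (4 * dim) / 3)  # 341 under the assumed sizing rule

decayed = decayed_param_count(n_layers=4, dim=dim, vocab_size=361, ffn_hidden=ffn_hidden)
non_decayed = 2_516  # taken directly from the log; not broken down here
total = decayed + non_decayed

print(f"decayed={decayed:,} non_decayed={non_decayed:,} total={total:,}")
```

The non-decayed figure is used as logged because the artifact summary does not say which tensors it covers.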
Training Run
The training command initializes a new model from scratch with a custom 361-token vocabulary, batch size 64, 128-token sequence length, learning rate 3e-4, weight decay 0.1, warmup over 2,500 iterations, and a target maximum of 100,000 iterations.
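As a rough sketch, the hyperparameters above can be collected into one config object to check how they relate. The class, field, and function names here are hypothetical and do not come from the PR. The gradient-accumulation value of 1 and the linear shape of the warmup are assumptions. The source states only the warmup length, so any learning-rate decay after warmup is left out.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class TrainConfig:
    # Values stated in the training command.
    vocab_size: int = 361
    batch_size: int = 64
    seq_len: int = 128
    learning_rate: float = 3e-4
    weight_decay: float = 0.1
    warmup_iters: int = 2_500
    max_iters: int = 100_000
    # ASSUMPTION: not stated in the source; 1 is the value that is
    # consistent with the logged 8,192 tokens per iteration.
    grad_accum_steps: int = 1

def tokens_per_iter(cfg: TrainConfig) -> int:
    """Tokens consumed by one optimizer step."""
    return cfg.batch_size * cfg.seq_len * cfg.grad_accum_steps

def warmup_lr(cfg: TrainConfig, it: int) -> float:
    """ASSUMED linear warmup to the peak rate, then held constant."""
    if it < cfg.warmup_iters:
        return cfg.learning_rate * it / cfg.warmup_iters
    return cfg.learning_rate

cfg = TrainConfig()
print(tokens_per_iter(cfg))               # batch size times sequence length
print(tokens_per_iter(cfg) * cfg.max_iters)  # token budget if all iterations run
```

With a batch of 64 sequences of 128 tokens each, one step covers 8,192 tokens, which agrees with the tokens-per-iteration figure in the model-shape table.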
The log starts at a training loss of 13.0864 and a validation loss of 13.0707. The artifact includes excerpts through step 53,130, by which point the logged loss has fallen to roughly 0.75 to 0.86.
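Two quick calculations help put these numbers in context. The sketch below assumes the logged loss is a cross-entropy measured in nats, and that every iteration consumed the full 8,192 tokens. Neither is stated in the artifact. It compares the starting loss with the loss of a uniform guess over the 361-token vocabulary, and estimates how many tokens had been processed by the last logged step.

```python
import math

vocab_size = 361
# Cross-entropy of a uniform prediction over the vocabulary, in nats.
uniform_loss = math.log(vocab_size)

initial_train_loss = 13.0864   # from the log
tokens_per_iter = 8_192        # from the model-shape table
last_logged_step = 53_130      # last step in the excerpts

# ASSUMPTION: every step processed a full batch of tokens.
tokens_processed = tokens_per_iter * last_logged_step

print(f"uniform baseline: {uniform_loss:.4f} nats")
print(f"initial loss minus baseline: {initial_train_loss - uniform_loss:.4f} nats")
print(f"tokens processed by step {last_logged_step:,}: {tokens_processed:,}")
```

If the loss is in nats, the starting value sits well above the uniform baseline of about 5.89, so the untrained model begins with confidently wrong predictions, not near-uniform ones. The artifact does not say why.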
The Tweak
The PR body notes one tweak to the LLaMA design: the unembedding is computed as the pseudo-inverse of the embedding matrix, and gradients are backpropagated through that unembedding. That note should be treated as an implementation detail of the artifact. A later article can explain the linear-algebra and training implications if the supporting notes are published.
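For readers unfamiliar with the term, here is a toy illustration of what a pseudo-inverse unembedding means. It is not the PR's code. The PR does not say how the pseudo-inverse is computed, and the sketch does not cover the backpropagation part. It assumes a made-up 3-token vocabulary with 2-dimensional embeddings and uses the closed form (EᵀE)⁻¹Eᵀ, which is valid only when the embedding matrix has full column rank. All names and values are hypothetical.

```python
# Toy Moore-Penrose pseudo-inverse used as an unembedding.
# ASSUMPTIONS: 3-token vocabulary, 2-dim embeddings, full column rank.
# This does not reproduce the artifact's implementation or its gradients.

def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def transpose(m):
    return [list(col) for col in zip(*m)]

def inverse_2x2(m):
    (a, b), (c, d) = m
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]

# Embedding matrix E: one row per token (vocab x dim).
E = [[1.0, 0.0],
     [0.0, 1.0],
     [1.0, 1.0]]

E_t = transpose(E)
# Left pseudo-inverse (dim x vocab): E_pinv = (E^T E)^-1 E^T.
E_pinv = matmul(inverse_2x2(matmul(E_t, E)), E_t)

# Defining property for full column rank: E_pinv @ E is the identity.
identity = matmul(E_pinv, E)

# Unembedding a hidden state equal to token 2's embedding.
hidden = [[1.0, 1.0]]
logits = matmul(hidden, E_pinv)
best_token = max(range(len(logits[0])), key=lambda i: logits[0][i])

print(identity)
print(logits, best_token)
```

In this toy case, a hidden state equal to a token's embedding scores that token highest. A real implementation would compute the pseudo-inverse inside an automatic-differentiation framework so that gradients can flow through it.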
Generated Text
The sample output is not clean prose. It repeats names, loses references, and breaks grammar. But it also has recognizable story structure: characters, actions, dialogue, simple causal turns, and moral-like endings. That is why this tiny run is worth preserving as a public artifact. It is small enough to inspect, yet large enough to show emergent generative behavior.
Tradeoff
A run this small is no substitute for a production language model. Its value is visibility: the architecture, tokenizer size, context length, parameter tensors, training command, loss log, and sample behavior all fit into one inspectable record.