TinyStoriesGPT 5M
A 5-million-parameter, character-level Bigram Transformer implemented in PyTorch and CUDA, trained on the TinyStories dataset with a context length of 128 and a batch size of 2048, and optimized with cross-entropy loss for coherent text generation.
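
Below is a minimal sketch of what a training setup like this could look like in PyTorch. Only the context length (128), batch size (2048), character-level tokenization, and cross-entropy objective come from the description above; the layer sizes, dataset path (`tinystories.txt`), learning rate, and step count are illustrative assumptions, not this repository's actual code.

```python
# Hypothetical sketch of a character-level Transformer trained on TinyStories.
# Hyperparameters marked "assumed" are illustrative, not the repo's real config.
import torch
import torch.nn as nn
import torch.nn.functional as F

block_size = 128                       # context length (from the README)
batch_size = 2048                      # batch size (from the README)
n_embd, n_head, n_layer = 256, 8, 6    # assumed sizes landing near ~5M params

device = "cuda" if torch.cuda.is_available() else "cpu"

# Character-level tokenization: every distinct character is one token.
text = open("tinystories.txt").read()  # assumed path to the training text
chars = sorted(set(text))
vocab_size = len(chars)
stoi = {c: i for i, c in enumerate(chars)}
data = torch.tensor([stoi[c] for c in text], dtype=torch.long)

def get_batch():
    # Sample random contiguous windows; targets are inputs shifted by one char.
    ix = torch.randint(len(data) - block_size - 1, (batch_size,))
    x = torch.stack([data[i:i + block_size] for i in ix])
    y = torch.stack([data[i + 1:i + block_size + 1] for i in ix])
    return x.to(device), y.to(device)

class CharTransformer(nn.Module):
    def __init__(self):
        super().__init__()
        self.tok_emb = nn.Embedding(vocab_size, n_embd)
        self.pos_emb = nn.Embedding(block_size, n_embd)
        layer = nn.TransformerEncoderLayer(
            n_embd, n_head, 4 * n_embd, batch_first=True, norm_first=True)
        self.blocks = nn.TransformerEncoder(layer, n_layer)
        self.lm_head = nn.Linear(n_embd, vocab_size)

    def forward(self, idx):
        B, T = idx.shape
        pos = torch.arange(T, device=idx.device)
        x = self.tok_emb(idx) + self.pos_emb(pos)
        # Causal mask: each position attends only to earlier characters.
        mask = nn.Transformer.generate_square_subsequent_mask(T).to(idx.device)
        x = self.blocks(x, mask=mask)
        return self.lm_head(x)

model = CharTransformer().to(device)
opt = torch.optim.AdamW(model.parameters(), lr=3e-4)  # assumed learning rate

for step in range(1000):                               # assumed step count
    xb, yb = get_batch()
    logits = model(xb)                                 # (B, T, vocab_size)
    loss = F.cross_entropy(logits.view(-1, vocab_size), yb.view(-1))
    opt.zero_grad(set_to_none=True)
    loss.backward()
    opt.step()
```

The sketch uses `nn.TransformerEncoder` with a causal mask as a stand-in for the model's decoder-only blocks; the actual repository may implement attention by hand or in custom CUDA kernels.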