# Phase 1: The Atomic Era (Primitives)
Goal: Master numerical stability and gradient flow.
These primitives are the building blocks of every neural network:
## Modules
| Module | Status | Description |
|---|---|---|
| Linear Layer | 🔲 | Weight initialization, forward pass |
| Activations | 🔲 | ReLU, GELU, SiLU, SwiGLU |
| Loss Functions | 🔲 | MSE, CrossEntropy, LogSumExp trick |
| Normalization | 🔲 | BatchNorm, LayerNorm, RMSNorm, GroupNorm |
| Regularization | 🔲 | Dropout, L1/L2 penalty |
| Optimizers | 🔲 | SGD, Momentum, Adam, AdamW |
| LR Schedulers | 🔲 | Warmup, StepLR, CosineAnnealing |
| Gradient Clipping | 🔲 | Norm and value clipping |
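As a taste of the numerical-stability theme, here is a minimal sketch of the LogSumExp trick from the Loss Functions row, written in numpy rather than your eventual implementation: subtracting the row maximum before exponentiating keeps softmax finite even for huge logits.

```python
import numpy as np

def log_softmax(logits):
    # LogSumExp trick: shift by the row max so exp() never overflows.
    # log_softmax(x) = x - max(x) - log(sum(exp(x - max(x))))
    shifted = logits - logits.max(axis=-1, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=-1, keepdims=True))

def cross_entropy(logits, targets):
    # Mean negative log-likelihood of the target classes.
    logp = log_softmax(logits)
    return -logp[np.arange(len(targets)), targets].mean()

# A naive softmax would overflow here (exp(1000) == inf);
# the shifted version stays finite.
logits = np.array([[1000.0, 1001.0, 1002.0]])
print(cross_entropy(logits, np.array([2])))  # ≈ 0.4076, not nan
```

The same shift-by-max idea recurs in attention softmax later, so it is worth internalizing in this phase.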
## 🎯 Capstone
Train an MLP on MNIST using only your implementations. Compare numerical accuracy with torch.nn.
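Before reaching for torch.nn as a reference, a finite-difference gradient check is one way to convince yourself your backward passes are right. This is a hedged sketch, not the capstone itself: a toy linear layer with a made-up scalar loss, checked against central differences.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy linear layer y = x @ W + b (names are illustrative, not a fixed API).
W = rng.normal(0, 0.1, size=(4, 3))
b = np.zeros(3)
x = rng.normal(size=(2, 4))

def loss(W):
    # Arbitrary scalar objective for the check: sum of squared outputs.
    y = x @ W + b
    return (y ** 2).sum()

# Analytic gradient: dL/dW = x.T @ (2 * y)
grad_analytic = x.T @ (2 * (x @ W + b))

# Central finite differences, one entry of W at a time.
eps = 1e-6
grad_numeric = np.zeros_like(W)
for i in range(W.shape[0]):
    for j in range(W.shape[1]):
        Wp, Wm = W.copy(), W.copy()
        Wp[i, j] += eps
        Wm[i, j] -= eps
        grad_numeric[i, j] = (loss(Wp) - loss(Wm)) / (2 * eps)

print(np.max(np.abs(grad_analytic - grad_numeric)))  # should be tiny
```

The same check generalizes to every module in the table above; once it passes, comparing forward outputs elementwise against the torch.nn counterpart closes the loop.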