# Phase 1: The Atomic Era (Primitives)
Goal: Master numerical stability and gradient flow.
These primitives are the building blocks of every neural network:
## Modules
| Module | Status | Description |
|---|---|---|
| Linear Layer | 🔲 | Weight initialization, forward pass |
| Activations | 🔲 | ReLU, GELU, SiLU, SwiGLU |
| Loss Functions | 🔲 | MSE, CrossEntropy, LogSumExp trick |
| Normalization | 🔲 | BatchNorm, LayerNorm, RMSNorm, GroupNorm |
| Regularization | 🔲 | Dropout, L1/L2 penalty |
| Optimizers | 🔲 | SGD, Momentum, Adam, AdamW |
| LR Schedulers | 🔲 | Warmup, StepLR, CosineAnnealing |
| Gradient Clipping | 🔲 | Norm and value clipping |
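As a taste of the numerical-stability theme, here is a minimal sketch of the LogSumExp trick from the Loss Functions row, written in numpy rather than your eventual implementation: subtracting the row maximum before exponentiating keeps softmax finite even for huge logits.

```python
import numpy as np

def log_softmax(logits):
    # LogSumExp trick: shift by the row max so exp() never overflows.
    # log_softmax(x) = x - max(x) - log(sum(exp(x - max(x))))
    shifted = logits - logits.max(axis=-1, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=-1, keepdims=True))

def cross_entropy(logits, targets):
    # Mean negative log-likelihood of the target classes.
    logp = log_softmax(logits)
    return -logp[np.arange(len(targets)), targets].mean()

# A naive softmax would overflow here (exp(1000) == inf);
# the shifted version stays finite.
logits = np.array([[1000.0, 1001.0, 1002.0]])
print(cross_entropy(logits, np.array([2])))  # ≈ 0.4076, not nan
```

The same shift-by-max idea recurs in attention softmax later, so it is worth internalizing in this phase.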
## 🎯 Capstone
Train an MLP on MNIST using only your implementations. Compare numerical accuracy with torch.nn.
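Before reaching for torch.nn as a reference, a finite-difference gradient check is one way to convince yourself your backward passes are right. This is a hedged sketch, not the capstone itself: a toy linear layer with a made-up scalar loss, checked against central differences.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy linear layer y = x @ W + b (names are illustrative, not a fixed API).
W = rng.normal(0, 0.1, size=(4, 3))
b = np.zeros(3)
x = rng.normal(size=(2, 4))

def loss(W):
    # Arbitrary scalar objective for the check: sum of squared outputs.
    y = x @ W + b
    return (y ** 2).sum()

# Analytic gradient: dL/dW = x.T @ (2 * y)
grad_analytic = x.T @ (2 * (x @ W + b))

# Central finite differences, one entry of W at a time.
eps = 1e-6
grad_numeric = np.zeros_like(W)
for i in range(W.shape[0]):
    for j in range(W.shape[1]):
        Wp, Wm = W.copy(), W.copy()
        Wp[i, j] += eps
        Wm[i, j] -= eps
        grad_numeric[i, j] = (loss(Wp) - loss(Wm)) / (2 * eps)

print(np.max(np.abs(grad_analytic - grad_numeric)))  # should be tiny
```

The same check generalizes to every module in the table above; once it passes, comparing forward outputs elementwise against the torch.nn counterpart closes the loop.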