Library
Find your next learning path
Start here
A great place to begin. You can always switch later.
🗡️Learn the Hero's Journey Story Structure
Walk through Campbell's hero's journey one stage at a time, anchored to scenes from Star Wars and The Matrix, then draft a one-page outline for your own protagonist.
Or try one of these
☸️Learn the Four Noble Truths
Walk through the Four Noble Truths as a philosophical diagnosis — dukkha, its cause, its cessation, the eightfold path — with translation notes and comparative context, then apply the structure to a real problem of your own.
🎯Use AI to Build Slides and Decks
Stop asking AI to 'make a deck on X' and getting bullet-point sludge that looks like every other AI deck. Learn the outline-first workflow that drives AI from a thought-through argument, not a topic, and ship a 7-slide deck for a real talk while tracking the time you save.
Or tell us what you want
Describe what you want to learn or improve, and we’ll find a path for you.
Browse more paths
🧊Understand ZeRO and Its Three Stages
Pencil-and-paper your way through ZeRO stages 1, 2, and 3 — sharding optimizer state, then gradients, then params — until you can pick a stage for a 13B model on 8 A100s and justify it from memory math, not vibes.
🧮Understand vLLM PagedAttention and KV Cache Memory
Re-use the virtual-memory analogy you already know to demystify vLLM: by the end you can sketch a block table, explain prefix sharing, and estimate how many 8k-context sequences fit on your GPU.
🧮Understand Tensor Cores and Mixed Precision
Stop hand-waving about '100x faster than CUDA cores.' You'll trace one 4x4 tile through a tensor core's registers, multipliers, and FP32 accumulator, then estimate the real FLOPS uplift from switching one layer of your favorite model to mixed precision.
🌀Understand RoPE and Why It Beat Sinusoidal
Stop treating RoPE as a black-box position trick and start seeing it as 2D rotations on pairs of dimensions — by the end you'll predict how it fails past training context and explain on a napkin why position interpolation rescues it.
🎯Understand Reward Hacking and Goodhart's Law in RLHF
Spot reward hacking in real model outputs — length bias, sycophancy, refusal escalation, sophistication bias — and pick the right mitigation (KL penalty, reward model ensembling, or process-based reward) for each failure mode.
🧭Understand MoE Routing and Load Balancing
Open the MoE router black box piece by piece — softmax gate, top-k, auxiliary loss, capacity factor, token dropping — until you can predict how capacity factor 1.0 versus 1.25 changes wasted compute and dropped tokens, then verify with an ablation.
🧮Understand Gradient Checkpointing
Stop guessing why gradient checkpointing tanks your throughput by 30% — learn to read the activation tape, pick the right granularity, and predict the compute overhead before you launch a single training run.
🧠Understand GPU vs TPU vs NPU vs ASIC
Tell GPUs, TPUs, NPUs, and ASICs apart by the workload each was built for — then defend your accelerator pick for a new AI product with one paragraph of architectural reasoning, not vendor branding.
🔪Understand FSDP Sharding Strategies
Walk every FSDP sharding strategy across the same toy transformer until all-gather and reduce-scatter become numbers, not folklore. By the end you can pick FULL_SHARD vs SHARD_GRAD_OP vs HYBRID_SHARD for a 7B model on 16 GPUs and defend it.
⚡Understand FlashAttention and Tiling
Stop treating FlashAttention as a mystery flag — understand the tiling, online softmax, and HBM-vs-SRAM tradeoff that turn the same attention math into 2-4× speedups. By the end you can estimate FA's win for any sequence length on graph paper, before touching CUDA.
🎯Understand DPO and Why It Replaced PPO for Alignment
Trace DPO from the Bradley-Terry preference equation to the closed-form policy and the log-prob loss so it stops feeling like 'just another trainer' and starts feeling inevitable. By the end, you'll predict on three preference pairs which way DPO will push chosen vs rejected log-probs — then check against a real training run.
🧮Understand Data, Tensor, and Pipeline Parallelism
Walk one toy 4-layer model through every parallelism axis — DP, TP, PP — until the geometry sticks. By drop 14 you can pick a (DP, TP, PP) tuple for a 70B model on 64 GPUs and defend it from a cost model.
📉Understand Chinchilla Scaling Laws and Compute-Optimal Training
Stop repeating '20 tokens per parameter' like a mantra and start picking N and D the way the Llama 3 team does. By the end, you'll defend a compute budget split that ignores Chinchilla on purpose.
🔬Understand bf16, fp16, and Loss Scaling
Stop flipping the precision flag and praying. You'll read a float as sign-exponent-mantissa, see exactly why fp16 overflows into NaNs where bf16 usually doesn't, and prescribe the right fix (loss scaling, bf16, or a mixed policy) for any training run.
🧪Understand Benchmark Saturation and Contamination
MMLU plateaued. HumanEval is in the training set. You'll separate saturation from contamination, run n-gram and perplexity checks on real test items, and design a holdout that's structurally hard to leak — defensible enough to put in front of a buyer.
⚡Compare LLM Serving Frameworks: vLLM, TensorRT-LLM, SGLang, llama.cpp
Stop picking vLLM because Twitter said so. You'll learn to read a deployment's shape — concurrency, prefix overlap, hardware, lifetime — and narrow the four frameworks to one defensible choice in four questions.
🧠Compare GQA, MQA, and Multi-Head Attention
GQA isn't a new mechanism — it's a single knob (G) that trades KV-cache memory for quality on top of plain attention. You'll learn to pick G for a real serving budget by walking the cache-size math and the quality argument side by side.
⚖️Compare DPO, IPO, KTO, ORPO, and SimPO
Map each post-DPO algorithm — IPO, KTO, ORPO, SimPO — to the exact failure mode it fixes, so picking one stops being a coin flip. By the end, you'll match three real datasets to the right algorithm and justify each call in a paragraph.
🧮Choose a Quantization Format: GPTQ vs AWQ vs EXL2 vs GGUF
Stop picking quantization formats from Reddit threads. You'll separate algorithm, file format, and runtime kernel into three clean decisions — then justify any pick for Ollama, vLLM, or a single 4090.
🐍Build Intuition for State Space Models and Mamba
Stop reading 'Mamba is linear-time attention' as marketing and start seeing the SSM as a controllable filter — A forgets, B absorbs, C reads out, Δ sets the clock. By the end you can predict whether Mamba or a transformer wins on a 1M-token retrieval task and justify it from the architecture.
🎯Use AI to Build Slides and Decks
Stop asking AI to 'make a deck on X' and getting bullet-point sludge that looks like every other AI deck. Learn the outline-first workflow that drives AI from a thought-through argument, not a topic, and ship a 7-slide deck for a real talk while tracking the time you save.
🎙️Understand Voice Cloning and Its Ethics
Few-shot voice cloning needs 3-30 seconds of audio — the technical story and the ethical one are different. Walk through a consented cloning flow, see why provenance beats 'is it AI?' for fraud, and sketch a consent-and-watermark policy for a feature that clones a customer's own voice.
🎨Understand Image Style Transfer and Aesthetics
Separate the three knobs of image style transfer — content preservation, style intensity, structural guidance — so you can pick img2img, ControlNet, IP-Adapter, or a LoRA deliberately, then plan a brand-illustration workflow that stays consistent across products.
©️Understand Copyright in AI Training Data
The public web is not automatically 'fair to train on,' and not every scrape is theft. Walk the four real threads (what copyright covers, how fair use is being argued, what licensing actually looks like, and which opt-out signals matter), then outline a sourcing policy you'd defend.
Showing 24 of 327