Library

Find your next learning path

Start here

A great place to begin. You can always switch later.

Good first path

🗡️ Learn the Hero's Journey Story Structure

Walk through Campbell's hero's journey one stage at a time, anchored to scenes from Star Wars and The Matrix, then draft a one-page outline for your own protagonist.

Foundations · ~2-week path · 5-8 min/day

Or tell us what you want

Describe what you want to learn or improve, and we’ll find a path for you.

Browse more paths

🧊 Understand ZeRO and Its Three Stages

Pencil-and-paper your way through ZeRO stages 1, 2, and 3 — sharding optimizer state, then gradients, then params — until you can pick a stage for a 13B model on 8 A100s and justify it from memory math, not vibes.

Applied · ~2-week path · 5-8 min/day

🧮 Understand vLLM PagedAttention and KV Cache Memory

Re-use the virtual-memory analogy you already know to demystify vLLM: by the end you can sketch a block table, explain prefix sharing, and estimate how many 8k-context sequences fit on your GPU.

Applied · ~2-week path · 5-8 min/day

🧮 Understand Tensor Cores and Mixed Precision

Stop hand-waving about '100x faster than CUDA cores.' You'll trace one 4x4 tile through a tensor core's registers, multipliers, and FP32 accumulator, then estimate the real FLOPS uplift from switching one layer of your favorite model to mixed precision.

Applied · ~2-week path · 5-8 min/day

🌀 Understand RoPE and Why It Beat Sinusoidal

Stop treating RoPE as a black-box position trick and start seeing it as 2D rotations on pairs of dimensions — by the end you'll predict how it fails past training context and explain on a napkin why position interpolation rescues it.

Applied · ~2-week path · 5-8 min/day

🎯 Understand Reward Hacking and Goodhart's Law in RLHF

Spot reward hacking in real model outputs — length bias, sycophancy, refusal escalation, sophistication bias — and pick the right mitigation (KL penalty, reward model ensembling, or process-based reward) for each failure mode.

Applied · ~2-week path · 5-8 min/day

🧭 Understand MoE Routing and Load Balancing

Open the MoE router black box piece by piece — softmax gate, top-k, auxiliary loss, capacity factor, token dropping — until you can predict how capacity factor 1.0 versus 1.25 changes wasted compute and dropped tokens, then verify with an ablation.

Advanced · ~2-week path · 5-8 min/day

🧮 Understand Gradient Checkpointing

Stop guessing why gradient checkpointing tanks your throughput by 30% — learn to read the activation tape, pick the right granularity, and predict the compute overhead before you launch a single training run.

Applied · ~2-week path · 5-8 min/day

🧠 Understand GPU vs TPU vs NPU vs ASIC

Tell GPUs, TPUs, NPUs, and ASICs apart by the workload each was built for — then defend your accelerator pick for a new AI product with one paragraph of architectural reasoning, not vendor branding.

Applied · ~2-week path · 5-8 min/day

🔪 Understand FSDP Sharding Strategies

Walk every FSDP sharding strategy across the same toy transformer until all-gather and reduce-scatter become numbers, not folklore. By the end you can pick FULL_SHARD vs SHARD_GRAD_OP vs HYBRID_SHARD for a 7B model on 16 GPUs and defend it.

Applied · ~2-week path · 5-8 min/day

Understand FlashAttention and Tiling

Stop treating FlashAttention as a mystery flag — understand the tiling, online softmax, and HBM-vs-SRAM tradeoff that turn the same attention math into 2-4× speedups. By the end you can estimate FA's win for any sequence length on graph paper, before touching CUDA.

Applied · ~2-week path · 5-8 min/day

🎯 Understand DPO and Why It Replaced PPO for Alignment

Trace DPO from the Bradley-Terry preference equation to the closed-form policy and the log-prob loss so it stops feeling like 'just another trainer' and starts feeling inevitable. By the end, you'll predict on three preference pairs which way DPO will push chosen vs rejected log-probs — then check against a real training run.

Applied · ~2-week path · 5-8 min/day

🧮 Understand Data, Tensor, and Pipeline Parallelism

Walk one toy 4-layer model through every parallelism axis — DP, TP, PP — until the geometry sticks. By drop 14 you can pick a (DP, TP, PP) tuple for a 70B model on 64 GPUs and defend it from a cost model.

Applied · ~2-week path · 5-8 min/day

📉 Understand Chinchilla Scaling Laws and Compute-Optimal Training

Stop repeating '20 tokens per parameter' like a mantra and start picking N and D the way the Llama 3 team does. By the end, you'll defend a compute budget split that ignores Chinchilla on purpose.

Applied · ~2-week path · 5-8 min/day

🔬 Understand bf16, fp16, and Loss Scaling

Stop flipping the precision flag and praying. You'll read a float as sign-exponent-mantissa, see exactly why fp16 NaNs and bf16 doesn't, and prescribe the right fix — loss scaling, bf16, or a mixed policy — for any training run.

Applied · ~2-week path · 5-8 min/day

🧪 Understand Benchmark Saturation and Contamination

MMLU plateaued. HumanEval is in the training set. You'll separate saturation from contamination, run n-gram and perplexity checks on real test items, and design a holdout that's structurally hard to leak — defensible enough to put in front of a buyer.

Applied · ~2-week path · 5-8 min/day

Compare LLM Serving Frameworks: vLLM, TensorRT-LLM, SGLang, llama.cpp

Stop picking vLLM because Twitter said so. You'll learn to read a deployment's shape — concurrency, prefix overlap, hardware, lifetime — and narrow the four frameworks to one defensible choice in four questions.

Applied · ~2-week path · 5-8 min/day

🧠 Compare GQA, MQA, and Multi-Head Attention

GQA isn't a new mechanism — it's a single knob (G) that trades KV-cache memory for quality on top of plain attention. You'll learn to pick G for a real serving budget by walking the cache-size math and the quality argument side by side.

Applied · ~2-week path · 5-8 min/day

⚖️ Compare DPO, IPO, KTO, ORPO, and SimPO

Map each post-DPO algorithm — IPO, KTO, ORPO, SimPO — to the exact failure mode it fixes, so picking one stops being a coin flip. By the end, you'll match three real datasets to the right algorithm and justify each call in a paragraph.

Applied · ~2-week path · 5-8 min/day

🧮 Choose a Quantization Format: GPTQ vs AWQ vs EXL2 vs GGUF

Stop picking quantization formats from Reddit threads. You'll separate algorithm, file format, and runtime kernel into three clean decisions — then justify any pick for Ollama, vLLM, or a single 4090.

Applied · ~2-week path · 5-8 min/day

🐍 Build Intuition for State Space Models and Mamba

Stop reading 'Mamba is linear-time attention' as marketing and start seeing the SSM as a controllable filter — A forgets, B absorbs, C reads out, Δ sets the clock. By the end you can predict whether Mamba or a transformer wins on a 1M-token retrieval task and justify it from the architecture.

Applied · ~2-week path · 5-8 min/day

🎯 Use AI to Build Slides and Decks

Stop asking AI to 'make a deck on X' and getting bullet-point sludge that looks like every other AI deck. Learn the outline-first workflow that drives AI from an argument, not a topic, then ship a 7-slide deck for a real talk and track the time you saved.

Foundations · ~2-week path · 5-8 min/day

🎙️ Understand Voice Cloning and Its Ethics

Few-shot voice cloning needs 3-30 seconds of audio — the technical story and the ethical one are different. Walk through a consented cloning flow, see why provenance beats 'is it AI?' for fraud, and sketch a consent-and-watermark policy for a feature that clones a customer's own voice.

Foundations · ~2-week path · 5-8 min/day

🎨 Understand Image Style Transfer and Aesthetics

Separate the three knobs of image style transfer — content preservation, style intensity, structural guidance — so you can pick img2img, ControlNet, IP-Adapter, or a LoRA deliberately, then plan a brand-illustration workflow that stays consistent across products.

Applied · ~2-week path · 5-8 min/day

©️ Understand Copyright in AI Training Data

The public web is not automatically 'fair to train on,' and not every scrape is theft. Walk the four real threads (what copyright covers, how fair use is being argued, what licensing actually looks like, and which opt-out signals matter), then outline a sourcing policy you'd defend.

Foundations · ~2-week path · 5-8 min/day

Showing 24 of 327