Most of the post-training stack (RLHF, preference learning, instruction tuning) is developed inside closed labs and rarely described publicly. Lambert maintains an open counterweight: Tülu is the open post-training pipeline with published recipes; the RLHF Book is the only systematic treatment of the subject available without an NDA; and his Interconnects newsletter is read across the industry as the primary public source on what actually goes on inside model labs during post-training. For ai100 this matters because post-training determines what the model says: pre-training fixes the corpus of knowledge, while post-training decides which parts of it surface in any given answer.

Worth following when
you want to understand why the same base model behaves very differently depending on which company shipped it — and where that behavior is decided.
Topics
Open documentation of RLHF and preference learning; post-training as the layer where model behavior is actually determined; open-model alignment pipelines (Tülu).
Key works
Tülu post-training framework (2023, lead); RLHF Book (2024, ongoing); Interconnects newsletter as longer-form public analysis (2022, ongoing).