Writing

Thoughts on AI, research, and building things that matter. These are older posts, but more are on the way.

An Intuitive Introduction to Solving Scalable Oversight using Iterated Amplification

February 27, 2024 · 5 min read

How do you supervise an AI that's smarter than you? Iterated Amplification offers a surprisingly elegant answer: break the problem down, align the pieces, and build up from there.

NLP · Adversarial Training · Deep Learning

Everything You Need to Know About Adversarial Training in NLP

January 4, 2021 · 13 min read

Adversarial examples expose fundamental limitations of deep neural networks. This post covers what they are, how adversarial training works, and why the trade-off between robustness and generalisation is harder to solve than it looks.