‘Show Your Working’: ChatGPT Performance Doubled w/ Process Rewards (+Synthetic Data Event Horizon)

I have not only read the 'Let's Verify Step by Step' paper, released less than 24 hours ago, but also combed through the release notes and appendix, read most of the linked papers and done my own tests. It's true: performance is massively boosted, and not just for mathematics but for science and other domains too. I'll show you comparisons with GPT 3 and PaLM 2, and demonstrate that new records are coming soon.

I will also cover the 'synthetic data event horizon' and what might have gone into GPT 4's training. I'll show you how process-supervised reward models (PRM) work vs outcome-supervised reward models (ORM), and why finetuning is still relevant. Plus I'll cover reactions from Jan Leike, Ilya Sutskever, Sam Altman and more. I will also feature the highly relevant paper 'Language Models Don’t Always Say What They Think: Unfaithful Explanations in Chain-of-Thought Prompting', and share a glimpse from Rob Miles of just how weirdly GPT 4 might think.

Verify Paper: https://cdn.openai.com/improving-mathematical-reasoning-with-process-supervision/Lets_Verify_Step_by_Step.pdf
Release Page: https://openai.com/research/improving-mathematical-reasoning-with-process-supervision#samples
Altman tweet: https://twitter.com/sama/status/1664018190840614912
Language Models Don’t Always Say What They Think: https://arxiv.org/pdf/2305.04388.pdf
Sparks of AGI MATH Comparison: https://arxiv.org/pdf/2303.12712v5.pdf
PaLM 2 Comparison: https://ai.google/static/documents/palm2techreport.pdf
AP Chemistry Calc: https://www.albert.io/blog/ap-chemistry-score-calculator/
Altman Synthetic Data (min 4): https://www.youtube.com/watch?v=1egAKCKPKCk&t=764s
Anthropic View: https://www.anthropic.com/index/core-views-on-ai-safety
Jan Leike Tweet: https://twitter.com/janleike/status/1663977494058520576
Rob Miles' Tweet: https://twitter.com/robertskmiles/status/1663534255249453056

https://www.patreon.com/AIExplained