Hi all,

The theme of this newsletter is juxtaposition: training ever bigger models (GPT-2 8B) vs. making models smaller (via distillation or compression); powerful models (see Tools ⚒) vs. dumb models à la Clever Hans, i.e. models that only appear to be able to perform complex tasks (see Articles and Blog Posts 📰). Beyond these themes, there are as always many other interesting tools, blog posts, and papers.

Contributions 💪 If you have written or come across something that would be relevant to the community, hit reply on this issue so that it can be shared more widely.

I really appreciate your feedback, so let me know what you love ❤️ and hate 💔 about this edition. Simply hit reply on this issue.

If you were referred by a friend, click here to subscribe. If you enjoyed this issue, give it a tweet 🐦.
Bigger vs. smaller models, powerful vs. dumb…