11 Comments
Haodong

Welcome back, brother!

Sebastian Ruder

Thanks! :)

Sebastian Raschka, PhD

1) Congrats on becoming a father!! 🥳

2) My favorite NLP newsletter is back!! 😊

Sebastian Ruder

Thank you!! 🥰 Thanks for recommending NLP News!

Oliver Adams

Thank you for returning to writing these newsletters. They are such a fantastic contribution to society. I always come away with a better view of some topic, and the effects flow on to the work I do.

Sebastian Ruder

Thank you! Glad to be back! :)

Joe

Great insights! Thank you.

Sebastian Ruder

Thanks! :)

Abhinav Upadhyay

Congratulations on the baby, Sebastian!

I loved your insights on the gzip + kNN paper. I wrote about it on my blog from the point of view of compression algorithms, to work out what exactly might be working well. Many people initially claimed that it is information theory at play. However, not all compression algorithms achieve the same performance. For example, in my article I show that LZ77 performs just as well as gzip, but character-level Huffman coding doesn't work. Indeed, LZ77 is the basis of most compression techniques, including gzip, LZMA, and zstd. And LZ77 itself takes advantage of the repetition of words/phrases between two pieces of text: the more words two texts have in common, the higher the compression ratio of their concatenation. This is comparable to bag-of-words, which also performs better when two texts share many words.

Full article here: https://codeconfessions.substack.com/p/lz77-is-all-you-need
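
To make this concrete, here is a minimal sketch of the compression-based distance and kNN classifier described above, using only Python's standard gzip module; the toy training texts, labels, and function names are hypothetical, not taken from the paper or my article:

```python
import gzip

def compressed_len(text: str) -> int:
    # Length of the gzip-compressed byte string, used as a proxy for C(x).
    return len(gzip.compress(text.encode("utf-8")))

def ncd(x: str, y: str) -> float:
    # Normalized compression distance: shared words/phrases let LZ77
    # reuse back-references, so C(x + y) grows less when x and y overlap.
    cx, cy = compressed_len(x), compressed_len(y)
    cxy = compressed_len(x + " " + y)
    return (cxy - min(cx, cy)) / max(cx, cy)

# Hypothetical toy training set of (label, text) pairs.
train = [
    ("sports", "the team won the championship game last night"),
    ("tech", "the new chip doubles inference speed for language models"),
]

def classify(query: str, k: int = 1) -> str:
    # k-nearest neighbours under NCD, with a simple majority vote.
    neighbours = sorted(train, key=lambda example: ncd(query, example[1]))
    labels = [label for label, _ in neighbours[:k]]
    return max(set(labels), key=labels.count)

print(classify("the players celebrated their victory on the field"))
```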

Sebastian Ruder

Thank you and thanks for the insights, Abhinav! I added a reference to your article to the post.

Abhinav Upadhyay

Thank you :)
