This newsletter discusses attention mechanisms that enable modeling long sequences, as well as simple but surprisingly competitive classifiers, such as those based on gzip compression.
Welcome back, brother!
Thanks! :)
1) Congrats on becoming a father!! 🥳
2) My favorite NLP newsletter is back!! 😊
Thank you!! 🥰 Thanks for recommending NLP News!
Thank you for returning to making these newsletters. They are such a fantastic contribution to society. I always come away with a better view of a topic, and the effects of this flow on to the work I do.
Thank you! Glad to be back! :)
Great insights! Thank you.
Thanks! :)
Congratulations on the baby, Sebastian!
I loved your insights on the gzip + KNN paper. I wrote about it on my blog from the perspective of compression algorithms, trying to work out what exactly might be working well. Many people initially claimed that it is information theory at play. However, not all compression algorithms achieve the same performance: in my article I show that LZ77 performs just as well as gzip, but character-level Huffman coding doesn't work. Indeed, LZ77 is the basis of most compression techniques, including gzip, LZMA, and zstd. LZ77 itself exploits the repetition of words/phrases between two pieces of text: the more words two texts share, the higher the compression ratio of their concatenation. This is comparable to bag-of-words, which also performs better when texts have many words in common.
Full article here: https://codeconfessions.substack.com/p/lz77-is-all-you-need
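For anyone curious what this looks like in practice, here is a minimal sketch of the gzip + KNN idea using normalized compression distance; the helper names and the toy texts/labels are made up for illustration, not taken from the paper or the article:

```python
import gzip
import numpy as np

def gzip_size(text: str) -> int:
    """Length of the gzip-compressed UTF-8 encoding of a string."""
    return len(gzip.compress(text.encode("utf-8")))

def ncd(a: str, b: str) -> float:
    """Normalized compression distance between two strings.
    Shared words/phrases let LZ77 reuse back-references, so the
    concatenation compresses better and the distance shrinks."""
    ca, cb = gzip_size(a), gzip_size(b)
    cab = gzip_size(a + " " + b)
    return (cab - min(ca, cb)) / max(ca, cb)

def knn_predict(query: str, train_texts, train_labels, k: int = 3):
    """Predict a label for `query` by majority vote over its k nearest
    training texts under NCD."""
    distances = [ncd(query, t) for t in train_texts]
    nearest = np.argsort(distances)[:k]
    votes = [train_labels[i] for i in nearest]
    return max(set(votes), key=votes.count)

# Toy illustration with made-up data
train_texts = ["the match ended in a draw", "stocks rallied after earnings",
               "the striker scored twice", "the central bank raised rates"]
train_labels = ["sports", "finance", "sports", "finance"]
print(knn_predict("the goalkeeper saved a penalty", train_texts, train_labels, k=1))
```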
Thank you and thanks for the insights, Abhinav! I added a reference to your article to the post.
Thank you :)