11 Comments

Welcome back, brother!


1) Congrats on becoming a father!! πŸ₯³

2) My favorite NLP newsletter is back!! 😊


Thank you for returning to making these newsletters. They are such a fantastic contribution to society. I always come away with a better view of some topic, and the effects flow on into the work I do.


Great insights! Thank you.


Congratulations on the baby, Sebastian!

I loved your insights on the gzip + KNN paper. I wrote about it on my blog from the point of view of compression algorithms, to work out what exactly might be making it work so well. Many people initially claimed that it is information theory at play; however, not all compression algorithms achieve the same performance. For example, in my article I show that LZ77 performs just as well as gzip, but character-level Huffman coding doesn't work. Indeed, LZ77 is the basis of most compression techniques, including gzip, LZMA, and zstd. And LZ77 itself takes advantage of the repetition of words/phrases between two pieces of text: the more words two texts have in common, the higher the compression ratio of their concatenation. This is comparable to bag-of-words, which also performs better when the texts have many common words.

Full article here: https://codeconfessions.substack.com/p/lz77-is-all-you-need
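
For anyone who wants to play with the idea, here is a rough sketch (not the paper's or the article's exact code) of gzip-based kNN classification using normalized compression distance; the tiny texts and labels below are placeholders, just to make it runnable.

```python
import gzip

def clen(s: str) -> int:
    # Length of the gzip-compressed byte string.
    return len(gzip.compress(s.encode("utf-8")))

def ncd(a: str, b: str) -> float:
    # Normalized compression distance: small when a and b share many
    # repeated substrings (LZ77 can reuse matches across the two texts).
    ca, cb, cab = clen(a), clen(b), clen(a + " " + b)
    return (cab - min(ca, cb)) / max(ca, cb)

def knn_predict(query: str, train: list[tuple[str, str]], k: int = 3) -> str:
    # Plain k-nearest-neighbors over NCD: majority label among the
    # k training texts closest to the query.
    dists = sorted((ncd(query, text), label) for text, label in train)
    top = [label for _, label in dists[:k]]
    return max(set(top), key=top.count)

# Placeholder training data (not from the paper).
train = [
    ("the striker scored a late goal in the final", "sports"),
    ("the team won the championship after overtime", "sports"),
    ("the central bank raised interest rates again", "finance"),
    ("stock markets fell after the earnings report", "finance"),
]

print(knn_predict("the goalkeeper saved a penalty in the match", train, k=3))
```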
