
> What would be some use cases for this?

It can do image recognition, speech recognition, or text classification (e.g., is this positive or negative sentiment?) using the same model architecture, trained on different data each time.

It's very competitive with the existing state of the art in each of those fields. That's interesting because the models used for each of those fields are usually quite different.

> How does it work?

It's trained by passing in sequences of data (pixels in order, words in order, speech as an audio waveform) with part of each sequence masked out. The model has to learn to correctly guess what is in that masked area.
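To make that concrete, here's a toy sketch of how a masked training example can be built: hide a span of the sequence and keep the hidden values as the prediction target. The function name and mask value are my own for illustration, not from any particular codebase.

```python
import numpy as np

def mask_sequence(seq, start, length, mask_value=0.0):
    """Return (masked input, target values, boolean mask)."""
    seq = np.asarray(seq, dtype=float)
    mask = np.zeros(len(seq), dtype=bool)
    mask[start:start + length] = True
    masked = seq.copy()
    masked[mask] = mask_value  # the model never sees these values
    return masked, seq[mask], mask

tokens = [0.1, 0.4, 0.9, 0.3, 0.7, 0.2]
masked, target, mask = mask_sequence(tokens, start=2, length=2)
print(masked.tolist())  # [0.1, 0.4, 0.0, 0.0, 0.7, 0.2]
print(target.tolist())  # [0.9, 0.3]
```

The same recipe applies whether the sequence elements are pixels, word tokens, or audio frames, which is part of why one architecture can handle all three.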

The innovation is that instead of predicting the masked area directly, it tries to predict the neural network's own internal representation of that masked area. That's extremely unusual (I don't think I've seen it before), and I'll need to study it more to completely understand why it explains the better performance.
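A minimal sketch of one way such latent targets can be produced, a teacher-student setup where the teacher (an exponential moving average of the student) encodes the unmasked input and the student must regress onto those latents from the masked input. All names here are mine; this is an illustration of the idea, not the paper's actual code.

```python
import numpy as np

rng = np.random.default_rng(0)

def encode(x, W):
    # one-layer "network" standing in for a transformer encoder
    return np.tanh(x @ W)

dim, hidden = 4, 8
W_student = rng.normal(size=(dim, hidden)) * 0.1
W_teacher = W_student.copy()       # teacher starts as a copy of the student

x = rng.normal(size=(6, dim))      # a sequence of 6 input vectors
mask = np.zeros(6, dtype=bool)
mask[2:4] = True

# Target: the teacher's latent representation of the *unmasked* input
target = encode(x, W_teacher)[mask]

# The student sees the masked input and must predict those latents
x_masked = x.copy()
x_masked[mask] = 0.0
pred = encode(x_masked, W_student)[mask]

loss = np.mean((pred - target) ** 2)   # regression in latent space

# After each student update, the teacher tracks it via EMA
tau = 0.999
W_teacher = tau * W_teacher + (1 - tau) * W_student
```

The point is that the loss is computed in representation space, not input space, so the model is never asked to reconstruct raw pixels or waveform samples.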

It's also very fast to train.



HuBERT does something similar: it predicts masked spectrogram regions during its first phase, and then masked embeddings later in training. It's a bit of a pain to get the complicated training regimen to work well, though.
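The two-phase regimen described above can be caricatured as swapping the prediction target mid-training: phase 1 targets come from the raw spectrogram, phase 2 targets from a learned embedding of it. This is a deliberately simplified sketch of that target switch (function names and shapes are mine), not HuBERT's actual clustering pipeline.

```python
import numpy as np

rng = np.random.default_rng(1)
spectrogram = rng.normal(size=(10, 5))  # 10 frames, 5 frequency bins

def embed(frames, W):
    # stand-in for a learned embedding of the audio frames
    return np.tanh(frames @ W)

W = rng.normal(size=(5, 3)) * 0.1
mask = np.zeros(10, dtype=bool)
mask[4:6] = True

def targets(phase):
    if phase == 1:
        return spectrogram[mask]         # phase 1: raw spectrogram frames
    return embed(spectrogram, W)[mask]   # phase 2: learned embeddings

print(targets(1).shape)  # (2, 5)
print(targets(2).shape)  # (2, 3)
```

Part of what makes such regimens fiddly is exactly this: the target distribution changes between phases, so hyperparameters tuned for one phase may not suit the other.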


What is meant by "the representation of that masked area" vs. "the masked area directly"?



