
> What would be some use cases for this?

It can do image recognition, speech recognition, or text classification (e.g., is this positive or negative sentiment?) using the same model architecture, trained on different data each time.

It's very competitive with the existing state of the art in each of those fields. That's interesting because the models used for each of those fields are usually quite different.

> How does it work?

It's trained by passing in sequences of data (pixels in order, words in order, speech as an audio waveform) with part of each sequence masked out. The model has to learn to correctly guess what is in that masked area.
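To make that concrete, here's a toy sketch of how a masked training example can be built: hide a span of the sequence and keep the hidden values as the prediction target. The function name and mask value are my own for illustration, not from any particular codebase.

```python
import numpy as np

def mask_sequence(seq, start, length, mask_value=0.0):
    """Return (masked input, target values, boolean mask)."""
    seq = np.asarray(seq, dtype=float)
    mask = np.zeros(len(seq), dtype=bool)
    mask[start:start + length] = True
    masked = seq.copy()
    masked[mask] = mask_value  # the model never sees these values
    return masked, seq[mask], mask

tokens = [0.1, 0.4, 0.9, 0.3, 0.7, 0.2]
masked, target, mask = mask_sequence(tokens, start=2, length=2)
print(masked.tolist())  # [0.1, 0.4, 0.0, 0.0, 0.7, 0.2]
print(target.tolist())  # [0.9, 0.3]
```

The same recipe applies whether the sequence elements are pixels, word tokens, or audio frames, which is part of why one architecture can handle all three.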

The innovation is that instead of predicting the masked area directly, it tries to predict the neural network's own internal representation of that masked area. That's extremely unusual (I don't think I've seen it before), and I'll need to study it more to completely understand why it explains the better performance.
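A minimal sketch of one way such latent targets can be produced, a teacher-student setup where the teacher (an exponential moving average of the student) encodes the unmasked input and the student must regress onto those latents from the masked input. All names here are mine; this is an illustration of the idea, not the paper's actual code.

```python
import numpy as np

rng = np.random.default_rng(0)

def encode(x, W):
    # one-layer "network" standing in for a transformer encoder
    return np.tanh(x @ W)

dim, hidden = 4, 8
W_student = rng.normal(size=(dim, hidden)) * 0.1
W_teacher = W_student.copy()       # teacher starts as a copy of the student

x = rng.normal(size=(6, dim))      # a sequence of 6 input vectors
mask = np.zeros(6, dtype=bool)
mask[2:4] = True

# Target: the teacher's latent representation of the *unmasked* input
target = encode(x, W_teacher)[mask]

# The student sees the masked input and must predict those latents
x_masked = x.copy()
x_masked[mask] = 0.0
pred = encode(x_masked, W_student)[mask]

loss = np.mean((pred - target) ** 2)   # regression in latent space

# After each student update, the teacher tracks it via EMA
tau = 0.999
W_teacher = tau * W_teacher + (1 - tau) * W_student
```

The point is that the loss is computed in representation space, not input space, so the model is never asked to reconstruct raw pixels or waveform samples.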

It's also very fast to train.



HuBERT does something similar: it predicts masked spectrogram regions during its first phase, and then masked embeddings later in training. It's a bit of a pain to get the complicated training regimen to work well, though.
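The two-phase regimen described above can be caricatured as swapping the prediction target mid-training: phase 1 targets come from the raw spectrogram, phase 2 targets from a learned embedding of it. This is a deliberately simplified sketch of that target switch (function names and shapes are mine), not HuBERT's actual clustering pipeline.

```python
import numpy as np

rng = np.random.default_rng(1)
spectrogram = rng.normal(size=(10, 5))  # 10 frames, 5 frequency bins

def embed(frames, W):
    # stand-in for a learned embedding of the audio frames
    return np.tanh(frames @ W)

W = rng.normal(size=(5, 3)) * 0.1
mask = np.zeros(10, dtype=bool)
mask[4:6] = True

def targets(phase):
    if phase == 1:
        return spectrogram[mask]         # phase 1: raw spectrogram frames
    return embed(spectrogram, W)[mask]   # phase 2: learned embeddings

print(targets(1).shape)  # (2, 5)
print(targets(2).shape)  # (2, 3)
```

Part of what makes such regimens fiddly is exactly this: the target distribution changes between phases, so hyperparameters tuned for one phase may not suit the other.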


What is meant by "the representation of that masked area" vs. "the masked area directly"?



