This is what a truly revolutionary idea looks like. There are so many details in the paper. Also, we know that transformers can scale. Pretty sure this idea will be used by a lot of companies to train general 3D asset creation pipelines. This is just too great.
"We first learn a vocabulary of latent quantized embeddings, using graph convolutions, which inform these embeddings of the local mesh geometry and topology. These embeddings are sequenced and decoded into triangles by a decoder, ensuring that they can effectively reconstruct the mesh."
This idea is simply beautiful and so obvious in hindsight.
"To define the tokens to generate, we consider a practical approach to represent a mesh M for autoregressive generation: a sequence of triangles."
It's cool, but it's also par for the course in 3D reconstruction today. I wouldn't describe this paper as particularly innovative or exceptional.
What do I think is really compelling in this field (given that it's my profession)?
This has me star-struck lately -- 3D meshing from a single image, a very large 3D reconstruction model trained on millions of all kinds of 3D models... https://yiconghong.me/LRM/
Another thing to note here: this looks to be around seven total days of training on at most 4 A100s. Not all really cutting-edge work requires a data-center-sized cluster.
NNs are typically continuous/differentiable so you can do gradient-based learning on them. We often want to use some of the structure the NN has learned to represent data efficiently. E.g., we might take a pre-trained GPT-type model, and put a passage of text through it, and instead of getting the next-token prediction probability (which GPT was trained on), we just get a snapshot of some of the activations at some intermediate layer of the network. The idea is that these activations will encode semantically useful information about the input text. Then we might e.g. store a bunch of these activations and use them to do semantic search/lookup to find similar passages of text, or whatever.
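To make that concrete, here's a rough sketch of pulling intermediate activations out of a pretrained model and using them as passage embeddings (Hugging Face's GPT-2 is just a stand-in here; the layer index and mean-pooling are arbitrary choices, not anything canonical):

    import torch
    from transformers import AutoTokenizer, AutoModel

    tok = AutoTokenizer.from_pretrained("gpt2")
    model = AutoModel.from_pretrained("gpt2", output_hidden_states=True).eval()

    def embed(text, layer=6):
        # run a forward pass and keep the activations at an intermediate layer
        inputs = tok(text, return_tensors="pt")
        with torch.no_grad():
            out = model(**inputs)
        # hidden_states[layer]: (1, seq_len, hidden_dim); mean-pool over tokens
        return out.hidden_states[layer].mean(dim=1).squeeze(0)

    # crude semantic lookup: cosine similarity against a few stored passages
    db = {p: embed(p) for p in ["the cat sat on the mat",
                                "stock prices fell sharply today"]}
    query = embed("a kitten resting on a rug")
    best = max(db, key=lambda p: torch.cosine_similarity(query, db[p], dim=0))
    print(best)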
Quantized embeddings are just that, but you introduce some discrete structure into the NN, such that the representations there are not continuous. A typical way to do this these days is to learn a codebook VQ-VAE style. Basically, we take some intermediate continuous representation learned in the normal way, and replace it in the forward pass with the nearest "quantized" code from our codebook. It biases the learning since we can't differentiate through it, and we just pretend like we didn't take the quantization step, but it seems to work well. There's a lot more that can be said about why one might want to do this, the value of discrete vs continuous representations, efficiency, modularity, etc...
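A minimal PyTorch sketch of that codebook lookup plus the straight-through trick (the names and the 0.25 commitment weight are just conventional VQ-VAE defaults, not anything from this paper):

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class VectorQuantizer(nn.Module):
        # snap each continuous vector to its nearest codebook entry
        def __init__(self, num_codes=512, dim=64, beta=0.25):
            super().__init__()
            self.codebook = nn.Embedding(num_codes, dim)
            self.beta = beta

        def forward(self, z):                                # z: (batch, dim) encoder output
            dists = torch.cdist(z, self.codebook.weight)     # (batch, num_codes)
            idx = dists.argmin(dim=1)
            z_q = self.codebook(idx)                         # quantized vectors
            # move the codes toward the encoder outputs, and keep the encoder committed
            loss = F.mse_loss(z_q, z.detach()) + self.beta * F.mse_loss(z, z_q.detach())
            # straight-through: forward uses z_q, backward pretends quantization didn't happen
            z_q = z + (z_q - z).detach()
            return z_q, idx, loss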
If you’re willing, I’d love your insight on the “why one might want to do this”.
Conceptually I understand embedding quantization, and I have some hint of why it works for things like WAV2VEC - human phonemes are (somewhat) finite, so forcing the representation to be finite makes sense - but I feel like there's a level of detail I'm missing about what's really going on and when quantization helps/harms that I haven't been able to glean from papers.
Quantization also works as regularization; it stops the neural network from being able to use arbitrarily complex internal rules.
But really it's only useful if you absolutely need a discrete embedding space for some sort of downstream usage. VQ-VAEs can be difficult to get to converge, and they have problems stemming from the gradient approximation, like codebook collapse.
Maybe it helps to point out that the first version of Dall-E (of 'baby daikon radish in a tutu walking a dog' fame) used the same trick, but they quantized the image patches.
I mean, I don't see a strong reason to turn away from attention either, but I also don't think anyone's thrown a billion-parameter MLP or conv model at a problem. We've put a lot of work into attention, transformers, and scaling them. Thousands of papers each year! We definitely don't see that for other architectures. The ResNet Strikes Back paper is great partly because it should remind us all not to get lost in the hype and that our advancements are coupled. We've learned a lot of training techniques since the original ResNet days, and applying those to ResNets also makes them a lot better and really closes the gap. At least in vision (where I do research). It's easy to get railroaded in research when we have publish-or-perish incentives and hype-driven reviewing.
No... a graph convolution is just a convolution (over a graph, like all convolutions).
The difference from a "normal" convolution is that you can consider arbitrary connectivity of the graph (rather than the usual connectivity induced by a regular Euclidean grid), but the underlying idea is the same: to calculate the result of the operation at any single place (i.e., node), you perform a linear operation over that place (i.e., node) and its neighbourhood (i.e., connected nodes), the same way that, e.g., in a convolutional neural network, you compute a pixel's output from its value and that of its neighbours when performing a convolution.
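In code, one common flavour of that (a mean-over-neighbours message pass with a dense adjacency matrix; real libraries like PyTorch Geometric use sparse ops and other normalisations) looks something like:

    import torch
    import torch.nn as nn

    class GraphConv(nn.Module):
        # one graph-convolution step: mix each node's feature with its neighbours'
        def __init__(self, in_dim, out_dim):
            super().__init__()
            self.lin_self = nn.Linear(in_dim, out_dim)
            self.lin_neigh = nn.Linear(in_dim, out_dim)

        def forward(self, x, adj):
            # x: (num_nodes, in_dim) node features, adj: (num_nodes, num_nodes) 0/1 adjacency
            deg = adj.sum(dim=1, keepdim=True).clamp(min=1)
            neigh_mean = (adj @ x) / deg          # average over connected nodes
            return torch.relu(self.lin_self(x) + self.lin_neigh(neigh_mean))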
"We first learn a vocabulary of latent quantized embeddings, using graph convolutions, which inform these embeddings of the local mesh geometry and topology. These embeddings are sequenced and decoded into triangles by a decoder, ensuring that they can effectively reconstruct the mesh."
This idea is simply beautiful and so obvious in hindsight.
"To define the tokens to generate, we consider a practical approach to represent a mesh M for autoregressive generation: a sequence of triangles."
More from paper. Just so cool!