Some embedding models are explicitly trained on cosine similarity. Otherwise, if...

extasia · on Sept 6, 2024

This is not quite right; you are actually losing information about each of the dimensions and your mental model of reducing the dimensionality by one is misleading.

Consider [1,0] and [x,x] Normalised we get [1,0] and [sqrt(.5),sqrt(.5)] — clearly something has changed because the first vector is now larger in dimension zero than the second, despite starting off as an arbitrary value, x, which could have been smaller than 1. As such we have lost information about x’s magnitude which we cannot recover from just the normalized vector.

Scene_Cast2 · on Sept 6, 2024

Well, depends. For some models (especially two tower style models that use a dot product), you're definitely right and it makes a huge difference. In my very limited experience with LLM embeddings, it doesn't seem to make a difference.

extasia · on Sept 6, 2024

Interesting, I hadn’t heard of two tower modes before!

Yes, I guess it’s curious that the information lost doesn’t seem very significant (this also matches my experience!)

Scene_Cast2 · on Sept 7, 2024

Two tower models (and various variants thereof) are popular for early stages of recommendation system pipelines and search engine pipelines.

atorodius · on Sept 6, 2024

That‘s exactly the point no? We lost 1 dim (magnitude). Not so nice in 2d but no biggie in 512d

extasia · on Sept 6, 2024

Magnitude is not a dimension, it’s information about each value that is lost when you normalize it. To prove this normalize any vector and then try to de-normalize it again.

kccqzy · on Sept 7, 2024

Magnitude is a dimension. Any 2-dimensional vector can be explicitly transformed into the polar (r, theta) coordinate system where one of the dimensions is magnitude. Any 3-dimensional vector can be transformed into the spherical (r, theta, phi) coordinate where one of the dimensions is magnitude. This is high school mathematics. (Okay I concede that maybe the spherical coordinate system isn't exactly high school material, then just think about longitude, latitude, and distance from the center.)

immibis · on Sept 6, 2024

Impossible because... you lost a dimension.

extasia · on Sept 7, 2024

That’s not mathematically accurate though, is it? We haven’t reduced the dimension of the vector by one.

Pray tell, which dimension do we lose when we normalize, say a 2D vector?

creata · on Sept 7, 2024

Mathematically, it's fine to say that you've lost the magnitude dimension.

Before normalization, the vector lies in R^n, which is an n-dimensional manifold.

After normalization, the vector lies in the unit sphere in R^n, which is an (n-1)-dimensional manifold.

thaumasiotes · on Sept 7, 2024

Magnitude, obviously.

>>> Magnitude is not a dimension [...] To prove this normalize any vector and then try to de-normalize it again.

Say you have the vector (18, -5) in a normal Euclidean x, y plane.

Now project that vector onto the y-axis.

Now try to un-project it again.

What do you think you just proved?

langcss · on Sept 7, 2024

A circle circumference is a line, is 1D?

ekianjo · on Sept 7, 2024

you dont lose anything when you normalize things. not sure what you are tallking about.

renewiltord · on Sept 7, 2024

There's something wrong with the picture here but I can't put my finger on it because my mathematical background here is too old. The space of k dimension vectors all normalized isn't a vector space itself. It's well-behaved in many ways but you lose the 0 vector (may not be relevant). Addition isn't defined anymore, and if you try to keep it inside by normalization post addition, distribution becomes weird. I have no idea what this transformation means for word2vec and friends.

But the intuitive notion is that if you take all 3D and flatten it / expand it to be just the surface of the 3D sphere, then paste yourself onto it Flatland style, it's not the same as if you were to Flatland yourself into the 2D plane. The obvious thing is that triangles won't sum to 180, but also parallel lines will intersect, and all sorts of differing strange things will happen.

I mean, it might still work in practice, but it's obviously different from some method of dimensionality reduction because you're changing the curvature of the space.

kccqzy · on Sept 7, 2024

The space of all normalized k-dimensional vector is just a unit k-sphere. You can deal with it directly, or you can use the standard inverse stereographic projection to map every point (except for one) onto a plane.

> triangles won't sum to 180

Exactly. Spherical triangles have the sum of their interior angles exceed 180 degrees.

> parallel lines will intersect

Yes because parallel "lines" are really great circles on the sphere.

renewiltord · on Sept 7, 2024

So is it actually the case that normalizing down and then mapping to the k-1 plane yields a useful (for this purpose) k-1 space? Something feels wrong about the whole thing but I must just have broken intuition.

kccqzy · on Sept 7, 2024

I do not understand the purpose that you are referring to in this comment or the earlier comment. But it is useful for some purposes.