I've always thought the use of "Tensor" in the "TensorFlow" library is a misnomer. I'm not too familiar with ML/theory, is there a deeper geometric meaning to the multi-dimensional array of numbers we are multiplying or is "MatrixFlow" a more appropriate name?
Since the beginning of computer technology, "array" has been the term for any multi-dimensional collection of values, with "vectors" and "matrices" being special kinds of arrays. An exception was COBOL, which used a completely different terminology from the other programming languages of its time. Among the long list of differences were, e.g., "class" instead of "type" and "table" instead of "array". Some of the COBOL terminology was inherited by languages like SQL or Simula 67 (hence the use of "class" in OOP languages).
A "tensor", as used in mathematics and physics, is not just any array: it is a special kind of array that is associated with a certain coordinate system and that transforms by specific rules whenever the coordinate system is changed.
The "tensor" in TensorFlow is a fancy name for what should be called just "array". When an array is two-dimensional, "matrix" is an appropriate name for it.
I agree. Just like NumPy's Einsum. "Multi-Array Flow" doesn't sound sexy and associating your project with a renowned physicist's name gives your project that "we solve big science problems" vibe by association. Very pretentious, very predictable, and very cringe.
The joke I learned in a Physics course is "a vector is something that transforms like a vector," and "a tensor is something that transforms like a tensor." It's true, though.
The physicist's tensor is a matrix of functions of coordinates that transform in a prescribed way when the coordinates are transformed. It's a particular application of the chain rule from calculus.
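For concreteness, here's a tiny NumPy sketch of that chain-rule behaviour under a simple linear change of coordinates (the names and the scaling matrix are just my illustration, not standard notation):

```python
import numpy as np

# Change of coordinates x' = S x, with S a simple scaling. For a linear
# map the Jacobian dx'/dx is just S, so the chain rule is easy to see.
S = np.diag([2.0, 0.5])

v = np.array([1.0, 1.0])        # contravariant components (a vector)
g = np.array([3.0, 4.0])        # covariant components (e.g. a gradient)

v_new = S @ v                   # contravariant: multiply by the Jacobian
g_new = np.linalg.inv(S).T @ g  # covariant: multiply by the inverse transpose

# The pairing g . v is a scalar, so it must not depend on coordinates
assert np.isclose(g @ v, g_new @ v_new)
```

The point of the exercise is the invariance of the last line: the components change, but contractions of matching covariant and contravariant indices do not.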
I don't know why the word "tensor" is used in other contexts. Google says that the etymology of the word is:
> early 18th century: modern Latin, from Latin tendere ‘to stretch’.
So maybe the different senses of the word share the analogy of scaling matrices.
The mathematical definition is 99% equivalent to the physical one. I find that the physical one helps to motivate the mathematical one by illustrating the numerical difference between the basis-change transformation for (1,0)- and (0,1)-tensors. The mathematical one is then simpler and more conceptual once you've understood that motivation. The concept of a tensor really belongs to linear algebra, but occurs mostly in differential geometry.
There is still a "1% difference" in meaning though. This difference allows a physicist to say "the Christoffel symbols are not a tensor", while a mathematician would say this is a conflation of terms.
TensorFlow's terminology is based on the rule of thumb that a "vector" is really a 1D array (think column vector), a "matrix" is really a 2D array, and a "tensor" is then an nD array. That's it. This is offensive to physicists especially, but ¯\_(ツ)_/¯
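That rank-counting convention is easy to see in code. I'll use NumPy's .ndim for illustration, since TensorFlow counts ranks the same way:

```python
import numpy as np

vector = np.zeros(3)          # 1-D array: "vector"
matrix = np.zeros((3, 4))     # 2-D array: "matrix"
tensor = np.zeros((3, 4, 5))  # n-D array: what TensorFlow calls a "tensor"

assert vector.ndim == 1
assert matrix.ndim == 2
assert tensor.ndim == 3  # "rank 3" in TensorFlow's sense: just 3 axes
```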
The problem with the physicist's definition is that the larger the N, the less the geometrical interpretation makes sense. For 1-, 2-, and even 3-dimensional tensors there is some connection to geometry, but eventually it loses all meaning. The physicist has to give up and "admit" that an N-dimensional tensor is really just a collection of (N-1)-dimensional tensors.
The tensors in tensorflow are often higher dimensional. Is a 3d block of numbers (say 1920x1080x3) still a matrix? I would argue it's not. Are there transformation rules for matrices?
You're totally correct that the tensors in tensorflow drop the geometric meaning, but there's precedent there in how CS and math folks already use "vector" differently.
Matrices are strictly two-dimensional arrays (together with some other properties, but for a computer scientist that's it). Tensors are the generalization to higher dimensional arrays.
I could stop right here, since that's already a counterexample to x being a matrix: there's no matrix product defined on such an x (P.S. try tf.matmul(x, x)--it will fail; there's no .transpose method either). But that's only technically correct :)
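You can see the same failure in NumPy, whose matmul has the same batch semantics as tf.matmul (I'm using a small shape here in place of 1920x1080x3):

```python
import numpy as np

x = np.zeros((4, 5, 3))  # a small 3-D "image-like" block of numbers

# matmul treats the leading axis as a batch of 5x3 matrices, and a
# 5x3 @ 5x3 product is undefined (inner dimensions 3 and 5 differ)
try:
    np.matmul(x, x)
    failed = False
except ValueError:
    failed = True

assert failed  # a 3-D block of numbers does not behave like a matrix
```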
So let's look at tensorflow some more:
The tensorflow tensors should transform the way tensors do under a change of coordinate system.
In order to see that, let's do a change of coordinate system. To summarize the stuff below: if L1 and W12 are indeed tensors, it should be true that (W12 A^-1)(A L1) = W12 L1.
Try it (in tensorflow) and see whether the new tensor obeys the tensor laws after the transformation. Interpret the changes to the nodes as covariant and the changes to the weights as contravariant:
import tensorflow as tf
# Initial outputs of one layer of nodes in your neural network
L1 = tf.constant([2.5, 4, 1.2], dtype=tf.float32)
# Our evil transformation matrix (coordinate system change)
A = tf.constant([[2, 0, 0], [0, 1, 0], [0, 0, 0.2]], dtype=tf.float32)
# Weights (no particular values; "random")
W12 = tf.constant(
[[-1, 0.4, 1.5],
[0.8, 0.5, 0.75],
[0.2, -0.3, 1]], dtype=tf.float32
)
# Covariant tensor nature; varying with the nodes
L1_covariant = tf.matmul(A, tf.reshape(L1, [3, 1]))
A_inverse = tf.linalg.inv(A)
# Contravariant tensor nature; varying against the nodes
W12_contravariant = tf.matmul(W12, A_inverse)
# Now derive the inputs for the next layer using the transformed node outputs and weights
L2 = tf.matmul(W12_contravariant, L1_covariant)
# Compare to the direct way
L2s = tf.matmul(W12, tf.reshape(L1, [3, 1]))
# The two results should agree up to floating-point rounding
tf.debugging.assert_near(L2, L2s)
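For anyone without TensorFlow installed, here is the same check written in plain NumPy:

```python
import numpy as np

L1 = np.array([2.5, 4.0, 1.2]).reshape(3, 1)  # node outputs (column vector)
A = np.diag([2.0, 1.0, 0.2])                  # change of coordinate system
W12 = np.array([[-1.0, 0.4, 1.5],
                [0.8, 0.5, 0.75],
                [0.2, -0.3, 1.0]])            # weights

# Covariant transformation of the nodes, contravariant of the weights
L1_cov = A @ L1
W12_contra = W12 @ np.linalg.inv(A)

# The next layer's inputs are unchanged by the coordinate change,
# because (W12 A^-1)(A L1) = W12 L1
assert np.allclose(W12_contra @ L1_cov, W12 @ L1)
```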
A tensor (like a vector) is actually a very low-level object from the standpoint of linear algebra. It's not hard at all to make something a tensor. Think of it like geometric "assembly language".
In comparison, a matrix is rank 2 (and not all matrices represent tensors). That's it. No rank 3, rank 4, or rank 1 (!!). So how does a matrix help you, really?
If you mean that the operations in tensorflow (and numpy before it) aren't beautiful or natural, I agree. It still works, though. If you want to stick to ascii and have no indices on names, you can't do much better (otherwise, use Cadabra[1]--which is great). For example, it was really difficult to write the stuff above without using indices and it's really not beautiful this way :(
See also http://singhal.info/ieee2001.pdf (including its references) for a primer on information science and the vector spaces with an inner product that are usually used in ML. The latter are definitely geometry.
[1] https://cadabra.science/ (also in mogan or texmacs) - Einstein field equations also work there and are beautiful
In TensorFlow, the tf.matmul function or the @ operator performs matrix multiplication. Element-wise multiplication ends up being useful for a lot of parallelizable computation, but it should not be confused with matrix multiplication.
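The same distinction shown in NumPy, which TensorFlow follows (* / tf.multiply is element-wise, @ / tf.matmul is the matrix product):

```python
import numpy as np

a = np.array([[1.0, 2.0],
              [3.0, 4.0]])
b = np.array([[5.0, 6.0],
              [7.0, 8.0]])

elementwise = a * b  # Hadamard product: matching entries multiplied
matmul = a @ b       # matrix product: rows of a dotted with columns of b

assert np.array_equal(elementwise, np.array([[5.0, 12.0], [21.0, 32.0]]))
assert np.array_equal(matmul, np.array([[19.0, 22.0], [43.0, 50.0]]))
```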