Here are three reasons you want to be able to calculate the volume change for arbitrary parallelepipeds:
- If det M = 0, then M is not invertible. Knowing this is useful for all kinds of reasons. It means you cannot solve an equation like Mx = b by taking the inverse ("dividing") on both sides, x = M \ b. It also means you can find the eigenvalues of a matrix: rearranging Mx = λx <--> (M - λI)x = 0, which has a nonzero solution exactly when det(M - λI) = 0, and that is a polynomial equation in λ. (A short NumPy sketch of this follows the list.)
- Rotations preserve lengths and volume, so the rotation group can be expressed as the orthogonal matrices with det M = 1 (the component of the orthogonal group connected to the identity; the det M = -1 component contains the reflections). This is useful for theoretical physics, where people work with such groups constantly and need representations they can compute with. (See the second sketch after the list.)
- In information theory, the differential entropy (roughly, the average number of bits it takes to describe a point drawn from a continuous probability distribution) increases if you spread the distribution out and decreases if you squeeze it together, by exactly log |det M| for a linear transformation M. A nonlinear transformation can be handled locally through its Jacobian. This is useful for image compression (and thus generation) with normalizing-flow neural networks. (The third sketch below illustrates this with Gaussians.)
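
A minimal NumPy sketch of the first point (the matrices here are made-up examples, not taken from anything above): it checks that a singular matrix has determinant zero, solves Mx = b for an invertible one, and recovers the eigenvalues as roots of the characteristic polynomial det(M - λI).

```python
import numpy as np

# A singular matrix: the second row is twice the first, so det M = 0.
M_singular = np.array([[1.0, 2.0],
                       [2.0, 4.0]])
print(np.linalg.det(M_singular))      # ~0.0: Mx = b has no unique solution

# An invertible matrix: det M != 0, so Mx = b can be solved directly.
M = np.array([[2.0, 1.0],
              [1.0, 3.0]])
b = np.array([1.0, 2.0])
x = np.linalg.solve(M, b)             # the "x = M \ b" step
print(np.allclose(M @ x, b))          # True

# Eigenvalues as roots of the characteristic polynomial det(M - λI) = 0.
char_poly = np.poly(M)                # coefficients of det(M - λI)
roots = np.sort(np.roots(char_poly))
eigs = np.sort(np.linalg.eigvals(M))
print(np.allclose(roots, eigs))       # True
```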
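
A sketch of the second point, again with arbitrary example values: a 2-D rotation matrix has determinant exactly 1, while a reflection (also length-preserving, hence orthogonal) has determinant -1 and lives in the other connected component of the orthogonal group.

```python
import numpy as np

def rotation_2d(theta):
    """2-D rotation by angle theta (standard construction, used here as an example)."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s],
                     [s,  c]])

R = rotation_2d(0.7)
print(np.isclose(np.linalg.det(R), 1.0))   # True: rotations preserve volume and orientation

# A reflection is also orthogonal but has det = -1, so it is not a rotation.
F = np.array([[1.0,  0.0],
              [0.0, -1.0]])
print(np.linalg.det(F))                    # -1.0
```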
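
And the third point, illustrated with Gaussians because their differential entropy has a closed form (the transformation M below is just an arbitrary stretch-and-shear example): applying a linear map M shifts the entropy by exactly log |det M|.

```python
import numpy as np

def gaussian_entropy(cov):
    """Differential entropy (in nats) of a multivariate Gaussian with covariance cov."""
    n = cov.shape[0]
    return 0.5 * np.log((2 * np.pi * np.e) ** n * np.linalg.det(cov))

# Start from a standard 2-D Gaussian and transform it with a linear map M.
M = np.array([[3.0, 1.0],
              [0.0, 0.5]])
cov_before = np.eye(2)
cov_after = M @ cov_before @ M.T          # covariance of the transformed variable MX

h_before = gaussian_entropy(cov_before)
h_after = gaussian_entropy(cov_after)

# The entropy changes by exactly log|det M|.
print(np.isclose(h_after - h_before, np.log(abs(np.linalg.det(M)))))   # True
```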