This is the second chapter in an in-progress book on linear algebra. The table of contents so far:
- Chapter-1: The basics
- Chapter-2: Measure of a map (current)
Stay tuned for future chapters.
Linear algebra is the tool of many dimensions. No matter what you might be doing, as soon as you scale to \( n \) dimensions, linear algebra comes into the picture.
In the previous chapter, we described abstract linear maps. In this one, we roll up our sleeves and start to deal with matrices, and practical considerations like numerical stability and efficient algorithms will start to come into play.
Note: all images in this article, unless otherwise stated, are by the author.
I) How to quantify a linear map
Determinants are one of the most ancient concepts in linear algebra. The subject has its roots in solving systems of linear equations, and determinants would “determine” whether there even was a solution worth looking for. In the cases where the system does have a solution, the determinant provides further useful information. In the modern framework of linear maps, the determinant quantifies a linear map in a single number.
We discussed in the previous chapter the concept of vector spaces (basically n-dimensional collections of numbers, or more generally of elements of a field) and linear maps between two such vector spaces, taking vectors in one to vectors in the other.
As an example of these kinds of maps, one vector space could be the surface of the planet you’re sitting on and the other could be the surface of the table you might be sitting at. Literal maps of the world are also maps in this sense, since they “map” every point on the surface of the Earth to a point on a piece of paper or the surface of a table. They aren’t linear maps, though, since they don’t preserve relative areas (in some projections, Greenland appears much larger than it really is).
Once we pick a basis for the vector space (a collection of n “independent” vectors in the space; there are infinitely many choices in general), every linear map on that vector space gets a unique matrix assigned to it.
For the time being, let’s restrict our attention to maps that take vectors from an 𝑛-dimensional space back to the same 𝑛-dimensional space (we’ll generalize later). The matrices corresponding to these linear maps are 𝑛×𝑛 (see section III of chapter 1). It might be useful to “quantify” such a linear map: to express its effect on the vector space ℝⁿ in a single number. The kind of map we’re dealing with effectively takes vectors from ℝⁿ and “distorts” them into other vectors in the same space. Both the original vector 𝑣 and the vector 𝑢 that the map converts it into have lengths (say |𝑣| and |𝑢|). We can think about how much the map changes the length of the vector, |𝑢|∕|𝑣|. Maybe that can quantify the impact of the map? How much it “stretches” vectors?
This approach has a fatal flaw. The ratio depends not just on the linear map, but also on the vector 𝑣 it acts on. It is therefore not strictly a property of the linear map itself.
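To see the flaw concretely, here is a quick numerical sketch (assuming numpy is available; the specific matrix and vectors are arbitrary). The stretch factor |𝑢|∕|𝑣| comes out different for different input vectors.

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((3, 3))  # an arbitrary linear map on R^3

# The length ratio |Av| / |v| changes with the input vector v,
# so it can't be a property of the map alone.
for _ in range(3):
    v = rng.standard_normal(3)
    print(np.linalg.norm(A @ v) / np.linalg.norm(v))  # three different numbers
```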
What if we now take two vectors, 𝑣₁ and 𝑣₂, which are converted by the linear map into the vectors 𝑢₁ and 𝑢₂? Just as the measure of the single vector 𝑣 was its length, the measure of two vectors is the area of the parallelogram contained between them.

Just as we considered the amount by which the length of 𝑣 changed, we can now talk in terms of the amount by which the area between 𝑣₁ and 𝑣₂ changes once they pass through the linear map and become 𝑢₁ and 𝑢₂. Alas, this again depends not just on the linear map, but also on the vectors chosen.
Next, we can go to three vectors and consider the change in volume of the parallelepiped between them and run into the same problem of the initial vectors having a say.

But now consider an n-dimensional region in the original vector space. This region will have some “n-dimensional measure”. To unpack this: a two-dimensional measure is an area (measured, say, in square kilometers), and a three-dimensional measure is a volume (like the liters used for measuring water). A four-dimensional measure has no counterpart in the physical world we’re used to, but is just as mathematically sound: the amount of four-dimensional space enclosed within a parallelepiped formed by four 4-d vectors. And so on.

The 𝑛 original vectors (𝑣₁, 𝑣₂, …, 𝑣ₙ) form a parallelepiped which is transformed by the linear map into 𝑛 new vectors, 𝑢₁, 𝑢₂, …, 𝑢ₙ which form their own parallelepiped. We can then ask about the 𝑛-dimensional measure of the new region in relation to the original one. And this ratio, it turns out, is indeed a function only of the linear map. Regardless of what the original region looked like, where it was placed and so on, the ratio of its measure once the linear map acted on it to its measure before will be the same — a function purely of the linear map. This ratio of 𝑛-dimensional measures (after to before) then is what we’ve been looking for: an exclusive property of the linear map that quantifies its effect in one number.
This ratio by which the measure of any 𝑛-dimensional patch of space is changed by the linear map is a good way to quantify the effect it has on the space it acts on. It is called the determinant of the linear map (the reason for that name will become apparent in section V).
For now, we’ve simply stated without proof the fact that the amount by which a linear map from ℝⁿ to ℝⁿ “stretches” any patch of 𝑛-dimensional space depends only on the map, since the purpose here was motivation. We’ll cover a proof later (section VI), once we arm ourselves with some weapons.
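We can at least check the claim numerically before proving it. The sketch below (numpy again) leans on a fact developed over the next sections: that the 𝑛-dimensional measure between the columns of a square matrix 𝑉 is |det(𝑉)|, computable with np.linalg.det.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 4
A = rng.standard_normal((n, n))  # the linear map

# The measure of the parallelepiped spanned by the columns of a
# square matrix V is |det(V)| (developed in the next sections).
for _ in range(3):
    V = rng.standard_normal((n, n))  # a random parallelepiped
    print(abs(np.linalg.det(A @ V)) / abs(np.linalg.det(V)))  # same number each time
print(abs(np.linalg.det(A)))  # ...which is |det(A)|
```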
II) Calculating determinants
Now, how do we find this determinant given a linear map from the vector space ℝⁿ back to ℝⁿ? We can take any 𝑛 vectors, find the measure of the parallelepiped between them and the measure of the new parallelepiped once the linear map has acted on all of them. Finally, divide the latter by the former.
We need to make these steps more concrete. First, let’s start playing around in this ℝⁿ vector space.
A vector in ℝⁿ is just a collection of 𝑛 real numbers. The simplest vector is 𝑛 zeros: [0, 0, …, 0]. This is the zero vector. If we multiply it by a scalar, we just get the zero vector back. Not interesting. For the next simplest vector, we can replace the first 0 with a 1. This leads to the vector 𝑒₁ = [1, 0, 0, …, 0]. Now, multiplying by a scalar 𝑐 gives us a different vector for each choice of 𝑐.
$$c \cdot [1, 0, 0, \dots, 0] = [c, 0, 0, \dots, 0]$$
We can “span” an infinite number of vectors with 𝑒₁ depending on the scalar 𝑐 we choose.
If 𝑒₁ is the vector with just the first element being 1 and the rest being 0, then what is 𝑒₂? The second element being 1 and the rest being 0 seems like a logical choice.
$$e_2 = [0, 1, 0, 0, \dots, 0]$$
Taking this to its logical conclusion, we get a collection of n vectors:
$$e_1 = [1, 0, 0, \dots, 0],\quad e_2 = [0, 1, 0, \dots, 0],\quad \dots,\quad e_n = [0, 0, 0, \dots, 1]$$
These vectors form a basis of the vector space that is ℝⁿ. What does this mean? Any vector 𝑣 in ℝⁿ can be expressed as a linear combination of these 𝑛 vectors. Which means that for some scalars 𝑐₁, 𝑐₂, …, 𝑐ₙ:
$$v = c_1 e_1 + c_2 e_2 + \dots + c_n e_n$$
All vectors, 𝑣 are “spanned” by the set of vectors 𝑒₁, 𝑒₂, …, 𝑒ₙ.
This particular collection of vectors isn’t the only basis. Almost any set of 𝑛 vectors works; the only caveat is that none of the 𝑛 vectors should be “spanned” by the rest. In other words, the 𝑛 vectors should be linearly independent. If we choose 𝑛 random numbers from most continuous distributions and repeat the process 𝑛 times to create 𝑛 vectors, we will get a set of linearly independent vectors with 100% probability (“almost surely” in probability terms). It’s just vanishingly unlikely that a random vector happens to be “spanned” by some other 𝑘 < 𝑛 random vectors.
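As a sanity check of the “almost surely” claim, here is a small numpy experiment: a matrix of random rows essentially always has full rank, meaning its rows are linearly independent.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 5
# n random vectors, stacked as the rows of a matrix.
V = rng.standard_normal((n, n))
# Full rank means no vector is spanned by the others.
print(np.linalg.matrix_rank(V) == n)  # True (with probability 1)
```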
Going back to our recipe at the beginning of this section to find the determinant of a linear map, we now have a basis to express our vectors in. Fixing the basis also means our linear map can be expressed as a matrix (see section III of chapter 1). Since this linear map is taking vectors from ℝⁿ back to ℝⁿ, the corresponding matrix is 𝑛 × 𝑛.
Next, we needed 𝑛 vectors to form our parallelepiped. Why not take the standard basis 𝑒₁, 𝑒₂, …, 𝑒ₙ we defined before? The measure of the patch of space contained between these vectors is 1, by definition. The picture below for ℝ³ will hopefully make this clear.

If we collect these vectors from the standard basis into a matrix (rows or columns), we get the identity matrix (1’s on the main diagonal, 0’s everywhere else):
$$I = \begin{pmatrix} 1 & 0 & \dots & 0 \\ 0 & 1 & \dots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \dots & 1 \end{pmatrix}$$
When we said we could apply our linear transform to any n-dimensional patch of space, we might as well apply it to this “standard” patch.
But it’s easy to show that multiplying any matrix by the identity matrix gives back the same matrix. So the vectors resulting from applying the linear map are just the columns of the matrix representing the map itself, and the amount by which the map changed the volume of the “standard patch” is the 𝑛-dimensional measure of the parallelepiped between the column vectors of that matrix.
To recap: we started by motivating the determinant as the ratio by which a linear map changes the measure of an n-dimensional patch of space. And we’ve now shown that this ratio is itself an n-dimensional measure: the measure contained between the column vectors of any matrix representing the linear map.
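A two-line check of this chain of reasoning (numpy; the matrix is an arbitrary example):

```python
import numpy as np

M = np.array([[2., 1., 0.],
              [0., 3., 1.],
              [1., 0., 1.]])  # an arbitrary matrix representing the map

# Applying the map to the standard patch returns M's own columns...
assert np.allclose(M @ np.eye(3), M)
# ...so the stretch factor of the standard patch is the measure
# between M's columns: the determinant.
print(np.linalg.det(M))
```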
III) Motivating the basic properties
We described in the previous section how a determinant of a linear map should simply be the measure contained between the vectors of any of its matrix representations. In this section, we use two dimensional space (where measures are areas) to motivate some fundamental properties a determinant must have.
The first property is multi-linearity. A determinant is a function that takes a bunch of vectors (collected in a matrix) and maps them to a single scalar. Since we’re restricting to two-dimensional space, we’ll consider two vectors, both two dimensional. Our determinant (since we’ve motivated it to be the area of the parallelogram between the vectors) can be expressed as:
$$\det = A(v_1, v_2)$$
How should this function behave if we add a vector to one of the two vectors? The multi-linearity property requires:
$$A(v_1+v_3, v_2) = A(v_1,v_2)+A(v_3,v_2)\tag{1}$$
This is apparent from the moving picture below (note the new area getting added).

And this visualization can also be used to see (by scaling one of the vectors instead of adding another vector to it):
$$A(c.v_1, v_2) = c.A(v_1, v_2) \tag{2}$$
This second property has an important implication. What if we plug a negative c into the equation?
The area 𝐴(𝑐·𝑣₁, 𝑣₂) should then have the opposite sign to 𝐴(𝑣₁, 𝑣₂).
Which means we need to introduce the notion of negative area and a negative determinant.
This makes a lot of sense if we’re okay with the concept of negative lengths. If lengths — measures in 1-D space — can be positive or negative, then it stands to reason that areas — measures in 2-D space — should also be allowed to be negative. And so, measures in space of any dimensionality should as well.
Together, equations (1) and (2) are the multi-linearity property.
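Both halves of multi-linearity are easy to verify numerically in 2-D, where the signed area 𝐴(𝑣₁, 𝑣₂) is the determinant of the 2 × 2 matrix with 𝑣₁ and 𝑣₂ as columns (a sketch using numpy; the vectors are arbitrary):

```python
import numpy as np

def area(v1, v2):
    # Signed area of the parallelogram between v1 and v2.
    return np.linalg.det(np.column_stack([v1, v2]))

v1, v2, v3 = np.array([2., 1.]), np.array([1., 3.]), np.array([-1., 4.])
c = 2.5

# Equation (1): additivity in one argument.
assert np.isclose(area(v1 + v3, v2), area(v1, v2) + area(v3, v2))
# Equation (2): scaling one argument scales the area.
assert np.isclose(area(c * v1, v2), c * area(v1, v2))
```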
Another important property that has to do with the sign of the determinant is the alternating property. It requires:
$$A(v_1, v_2) = -A(v_2, v_1)$$
Swapping the order of two vectors negates the sign of the determinant (or of the measure between them). If you’ve learned about the cross product of 3-D vectors, this property will feel very natural. To motivate it, think first of the one-dimensional signed distance between two position vectors, 𝑑(𝑣₁, 𝑣₂). It’s clear that 𝑑(𝑣₁, 𝑣₂) = −𝑑(𝑣₂, 𝑣₁), since going from 𝑣₂ to 𝑣₁ means traveling in the opposite direction to going from 𝑣₁ to 𝑣₂. Similarly, if the area spanned between vectors 𝑣₁ and 𝑣₂ is positive, then that between 𝑣₂ and 𝑣₁ must be negative. This property holds in 𝑛-dimensional space as well: if in 𝐴(𝑣₁, 𝑣₂, …, 𝑣ₙ) we swap two of the vectors, the sign switches.
The alternating property also implies that if one of the vectors is simply a scalar multiple of the other, the determinant must be 0. This is because swapping the two vectors should negate the determinant:
$$\begin{align}
A(v_1, v_1) &= -A(v_1, v_1) \\
\implies 2\,A(v_1, v_1) &= 0 \\
\implies A(v_1, v_1) &= 0
\end{align}$$
We also have by multi-linearity (equation 2):
$$A(v_1, c \cdot v_1) = c\,A(v_1, v_1) = 0$$
This makes sense geometrically since if two vectors are parallel to each other, the area between them is \( 0 \).
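The same 2-D setup verifies the alternating property and its parallel-vectors consequence:

```python
import numpy as np

def area(v1, v2):
    # Signed area of the parallelogram between v1 and v2.
    return np.linalg.det(np.column_stack([v1, v2]))

v1, v2 = np.array([2., 1.]), np.array([1., 3.])

assert np.isclose(area(v1, v2), -area(v2, v1))  # swapping flips the sign
assert np.isclose(area(v1, 3.0 * v1), 0.0)      # parallel vectors span no area
```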
The video [6] covers the geometric motivation of these properties with really good visualizations and video [4] visualizes the alternating property quite well.
IV) Getting algebraic: Deriving the Leibniz formula
In this section, we move away from geometric intuition and approach the topic of determinants from an alternate route — that of cold, algebraic calculations.
See, the multi-linearity and alternating properties which we motivated in the last section with geometry are (remarkably) enough to give us a very specific algebraic formula for the determinant, called the Leibniz formula.
That formula helps us see properties of the determinant that would be really, really hard to observe from the geometric approach or with other algebraic formulas.
The Leibniz formula can then be reduced to the Laplace expansion, involving going along a row or column and calculating cofactors — which many people see in high school.
Let’s derive the Leibniz formula. We need a function that takes the 𝑛 column vectors 𝑎₁, 𝑎₂, …, 𝑎ₙ of the matrix as input and converts them into a scalar 𝑐.
$$c = f(\vec{a_1}, \vec{a_2}, \dots, \vec{a_n})$$
We can express each column vector in terms of the standard basis of the space (here 𝑎_{𝑗,𝑖} denotes the 𝑖-th component of the vector 𝑎ⱼ):
$$\vec{a_j} = a_{j,1} e_1 + a_{j,2} e_2 + \dots + a_{j,n} e_n = \sum_{i=1}^{n} a_{j,i}\, e_i$$
Now we can apply the property of multi-linearity, starting with the first column, 𝑎₁:
$$f(\vec{a_1}, \vec{a_2}, \dots, \vec{a_n}) = \sum_{j_1=1}^{n} a_{1,j_1}\, f(e_{j_1}, \vec{a_2}, \dots, \vec{a_n})$$
We can do the same for the second column. Let’s take just the first term from the summation above and take a look at the resulting terms.
$$a_{1,1}\, f(e_1, \vec{a_2}, \dots, \vec{a_n}) = a_{1,1} a_{2,1}\, f(e_1, e_1, \vec{a_3}, \dots, \vec{a_n}) + a_{1,1} a_{2,2}\, f(e_1, e_2, \vec{a_3}, \dots, \vec{a_n}) + \dots + a_{1,1} a_{2,n}\, f(e_1, e_n, \vec{a_3}, \dots, \vec{a_n})$$
Note that in the first term, we get the vector 𝑒₁ appearing twice. And by the alternating property, the function 𝑓 for that term becomes 0.
In order for two 𝑒₁’s to appear, the second indices of the two 𝑎’s in the product must each become 1.
So, once we do this for all the columns, the terms that won’t become zero by the alternating property will be the ones where the second indices of the 𝑎’s don’t have any repetition — so all distinct numbers from 1 to 𝑛. In other words, we’re looking for permutations of 1 to 𝑛 to appear in the second indices of the 𝑎’s.
What about the first indices of the 𝑎’s? Those are simply the numbers 1 to 𝑛 in order since we pull out the 𝑎₁ₓ’s first, then the 𝑎₂ₓ’s, and so on. In more compact algebraic notation,
$$f(\vec{a_1}, \vec{a_2}, \dots, \vec{a_n}) = \sum_{j_1=1}^{n} \sum_{j_2=1}^{n} \cdots \sum_{j_n=1}^{n} a_{1,j_1} a_{2,j_2} \cdots a_{n,j_n}\; f(e_{j_1}, e_{j_2}, \dots, e_{j_n})$$
In the expression on the right, the areas 𝑓(𝑒_{𝑗₁}, 𝑒_{𝑗₂}, …, 𝑒_{𝑗ₙ}) can be +1, −1, or 0, since the 𝑒ⱼ’s are all unit vectors orthogonal to each other. We already established that any term with a repeated 𝑒ⱼ becomes 0, leaving us with just the permutations (no repetition). Among those permutations, we sometimes get +1 and sometimes −1.
Permutations carry signs with them: a permutation reachable from [1, 2, …, 𝑛] by an even number of swaps has sign +1, and one reachable by an odd number has sign −1. By the alternating property, each swap of two vectors inside 𝑓 also flips its sign, so the sign of 𝑓(𝑒_{𝑗₁}, …, 𝑒_{𝑗ₙ}) is exactly the sign of the corresponding permutation. If we denote by 𝑆ₙ the set of all permutations of [1, 2, …, 𝑛], then we get the Leibniz formula of the determinant:
$$\det([\vec{a_1}, \vec{a_2}, \dots \vec{a_n}]) = |A| = \sum\limits_{\sigma \in S_n} sgn(\sigma) \prod \limits_{i=1}^n a_{i,\sigma(i)} \tag{3}$$
This formula is also described in detail in the mathexchange post [3]. And to make things concrete, here is a minimal Python implementation along with a test case (the helper names are mine; it’s for exposition, not for real use):
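```python
import numpy as np
from itertools import permutations

def perm_sign(sigma):
    # Sign of a permutation: (-1) raised to the number of inversions.
    inversions = sum(
        1
        for i in range(len(sigma))
        for j in range(i + 1, len(sigma))
        if sigma[i] > sigma[j]
    )
    return -1 if inversions % 2 else 1

def leibniz_det(a):
    # Equation (3): sum over all permutations of sign times product.
    n = len(a)
    return sum(
        perm_sign(sigma) * np.prod([a[i][sigma[i]] for i in range(n)])
        for sigma in permutations(range(n))
    )

# Test case: agree with numpy on a random matrix.
rng = np.random.default_rng(3)
m = rng.standard_normal((4, 4))
assert np.isclose(leibniz_det(m), np.linalg.det(m))
```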
One shouldn’t actually use this formula to calculate the determinant of a matrix (unless it’s just for fun or exposition). It works, but it is comically inefficient: the sum runs over all 𝑛! permutations, which grows super-exponentially.
However, many theoretical properties of the determinant become trivial to see with the Leibniz formula when they would be very hard to decipher or prove if we started from another of its forms. For example:
- Proposition-1: With this formula it becomes apparent that a matrix and its transpose have the same determinant: |𝐴| = |𝐴ᵀ|. Transposing replaces each permutation σ in the sum with its inverse σ⁻¹, which has the same sign, so the sum is unchanged.
- Proposition-2: A very similar derivation to the above can be used to show that for two matrices 𝐴 and 𝐵, |𝐴𝐵| = |𝐴| ⋅ |𝐵|. See this answer in the mathexchange post, [7]. This is a very convenient property since matrix multiplication comes up all the time in various decompositions of matrices, and reasoning about the determinants of those decompositions can be a powerful tool.
- Proposition-3: With the Leibniz formula, we can easily see that if the matrix is upper triangular or lower triangular (lower triangular means every element above the diagonal is zero), the determinant is simply the product of the entries on the diagonal. This is because every permutation except the identity (which contributes 𝑎₁₁ ⋅ 𝑎₂₂ ⋯ 𝑎ₙₙ, the main diagonal) picks up at least one zero factor, making its term in the summation 0. All three propositions are spot-checked numerically in the sketch below.
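Not a proof, just a sanity check with numpy:

```python
import numpy as np

rng = np.random.default_rng(4)
A = rng.standard_normal((4, 4))
B = rng.standard_normal((4, 4))

# Proposition-1: a matrix and its transpose share a determinant.
assert np.isclose(np.linalg.det(A), np.linalg.det(A.T))
# Proposition-2: the determinant is multiplicative.
assert np.isclose(np.linalg.det(A @ B), np.linalg.det(A) * np.linalg.det(B))
# Proposition-3: triangular determinant = product of the diagonal.
U = np.triu(A)  # zero out everything below the diagonal
assert np.isclose(np.linalg.det(U), np.prod(np.diag(U)))
```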

Proposition-3 actually leads to the efficient algorithm for calculating determinants that most linear algebra libraries use. A matrix can be decomposed efficiently into lower and upper triangular factors (the LU decomposition, which we’ll cover in the next chapter). After this decomposition, Proposition-3 gives the determinant of each triangular factor as the product of its diagonal, and Proposition-2 multiplies those determinants together to recover the determinant of the original matrix.
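Here is a sketch of that algorithm using scipy’s LU routine (det_via_lu is a hypothetical helper name; real libraries fold this logic into their det implementations):

```python
import numpy as np
from scipy.linalg import lu_factor

def det_via_lu(A):
    # lu_factor computes PA = LU with partial pivoting, returning the
    # combined L\U factors (L has an implicit unit diagonal) and pivots.
    lu, piv = lu_factor(A)
    # Each pivot that moved a row contributes one swap, flipping the sign.
    swaps = np.sum(piv != np.arange(len(piv)))
    # det(A) = (-1)^swaps * det(L) * det(U) = (-1)^swaps * prod(diag(U)).
    return (-1.0) ** swaps * np.prod(np.diag(lu))

rng = np.random.default_rng(5)
A = rng.standard_normal((5, 5))
assert np.isclose(det_via_lu(A), np.linalg.det(A))
```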
A lot of people, when first exposed to the determinant in high school or university, learn the Laplace expansion, which involves expanding along a row or column, finding co-factors for each element, and summing. It can be derived from the Leibniz formula above by collecting similar terms. See this answer to the mathexchange post, [2].
V) Historic motivation
The determinant was first discovered in the context of linear systems of equations. Say we have 𝑛 equations in 𝑛 variables (𝑥₁, 𝑥₂, …, 𝑥ₙ):
$$\begin{align}
a_{11} x_1 + a_{12} x_2 + \dots + a_{1n} x_n &= b_1 \\
a_{21} x_1 + a_{22} x_2 + \dots + a_{2n} x_n &= b_2 \\
&\;\;\vdots \\
a_{n1} x_1 + a_{n2} x_2 + \dots + a_{nn} x_n &= b_n
\end{align}$$
This system can be expressed in matrix form:
$$\begin{pmatrix} a_{11} & a_{12} & \dots & a_{1n} \\ a_{21} & a_{22} & \dots & a_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ a_{n1} & a_{n2} & \dots & a_{nn} \end{pmatrix} \begin{pmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{pmatrix} = \begin{pmatrix} b_1 \\ b_2 \\ \vdots \\ b_n \end{pmatrix}$$
And more compactly:
$$A \cdot x = b$$
An important question is whether or not the system above has a unique solution, x. And the determinant is a function that “determines” this. There is a unique solution if and only if the determinant of A is non-zero.
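For instance (a small numpy sketch), a 2 × 2 system whose second equation is a multiple of the first has determinant 0, and no unique solution exists:

```python
import numpy as np

A = np.array([[1., 2.],
              [2., 4.]])  # second row is twice the first
b = np.array([3., 5.])

print(np.linalg.det(A))   # 0.0: no unique solution to Ax = b
# np.linalg.solve(A, b) would raise LinAlgError("Singular matrix").
```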
This historically inspired approach motivates the determinant as a polynomial that arises when we try to solve a linear system of equations associated with the linear map. We will cover this in more depth in chapter 5.
For more on this, see the excellent answer in the mathexchange post, [8].
VI) Proof of the property we motivated with
We started this chapter by motivating the determinant as the amount by which an ℝⁿ → ℝⁿ linear map changes the measure of an n-dimensional patch of space. We also said that this doesn’t work for 1-, 2-, …, (n − 1)-dimensional measures. Below is a proof that uses some of the properties we encountered in the other sections.
Suppose the map 𝐴 acts on 𝑘 vectors 𝑣₁, …, 𝑣ₖ in ℝⁿ, producing 𝑢ᵢ = 𝐴𝑣ᵢ. Define 𝑉 and 𝑈 as the 𝑛 × 𝑘 matrices whose columns are these vectors:
$$ V = (v_1, v_2, \dots, v_k), \qquad U = (u_1, u_2, \dots, u_k) = AV $$
By definition, the 𝑘-dimensional measure of the parallelepiped between 𝑘 vectors in ℝⁿ is given by the Gram determinant:
$$|v_1, v_2, \dots, v_k| = \sqrt{\det(V^t V)} $$ and so
$$ |u_1, u_2, \dots, u_k| = \sqrt{\det(U^t U)} = \sqrt{\det((AV)^t (AV))} = \sqrt{\det(V^t A^t A V)} $$
Only when 𝑘 = 𝑛 is 𝑉 a square matrix, and only then can we split the determinant of the product (using Proposition-2 and |𝐴ᵀ| = |𝐴| from section IV):
$$|u_1, u_2, \dots, u_k| = \sqrt{\det(V^t A^t A V)} = \sqrt{\det(V^t) \det(A^t) \det(A) \det(V)}$$
$$= \sqrt{\det(A)^2 \det(V^t V)} = |\det(A)| \sqrt{\det(V^t V)} = |\det(A)|\, |v_1, v_2, \dots, v_k| $$
So when 𝑘 = 𝑛, the ratio of measures (after to before) is |det(𝐴)|, independent of the chosen vectors (the sign of det(𝐴) separately records whether the map flips orientation). When 𝑘 < 𝑛, 𝑉 isn’t square, the factorization above isn’t available, and the ratio genuinely depends on the chosen vectors.
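The whole argument can be replayed numerically (numpy; `measure` is my helper for the Gram-determinant formula): for 𝑘 < 𝑛 the stretch ratio varies with the chosen vectors, while for 𝑘 = 𝑛 it is always |det(𝐴)|.

```python
import numpy as np

rng = np.random.default_rng(6)
n, k = 4, 2
A = rng.standard_normal((n, n))

def measure(V):
    # k-dimensional measure of the parallelepiped between V's columns.
    return np.sqrt(np.linalg.det(V.T @ V))

# k < n: the ratio depends on the chosen vectors.
for _ in range(3):
    V = rng.standard_normal((n, k))
    print(measure(A @ V) / measure(V))  # three different numbers

# k = n: the ratio is |det(A)| for every choice.
for _ in range(3):
    V = rng.standard_normal((n, n))
    print(measure(A @ V) / measure(V))  # the same number each time
print(abs(np.linalg.det(A)))
```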
References
[1] Mathexchange post: The determinant of a linear map doesn’t depend on the basis: https://math.stackexchange.com/questions/962382/determinant-of-linear-transformation
[2] Mathexchange post: The Laplace expansion (the high-school formula) from the Leibniz formula: https://math.stackexchange.com/a/4225580/155881
[3] Mathexchange post: Understanding the Leibniz formula for determinants: https://math.stackexchange.com/questions/319321/understanding-the-leibniz-formula-for-determinants
[4] YouTube video: 3Blue1Brown on determinants: https://www.youtube.com/watch?v=Ip3X9LOh2dk&t=295s
[5] Mathexchange post: Connecting the Leibniz formula with geometry: https://math.stackexchange.com/questions/593222/leibniz-formula-and-determinants
[6] YouTube video: The Leibniz formula as area: https://www.youtube.com/watch?v=9IswLDsEWFk
[7] Mathexchange post: The product of determinants is the determinant of the product: https://math.stackexchange.com/questions/60284/how-to-show-that-detab-deta-detb
[8] Mathexchange post: Historic context motivating the determinant: https://math.stackexchange.com/a/4782557/155881