of the in-progress book on linear algebra, “A bird’s-eye view of linear algebra”. This book will put a special emphasis on AI applications and how they leverage linear algebra.
Linear algebra is a fundamental discipline underlying anything one can do with mathematics. Physics, machine learning, probability theory (e.g., Markov chains), you name it. No matter what you’re doing, linear algebra is always lurking under the covers, ready to spring at you as soon as things go multi-dimensional. In my experience (and I’ve heard this from others), this was the source of a big shock in the transition from high school to university. In high school (in India), I was exposed to some very basic linear algebra (mainly determinants and matrix multiplication). Then, in university-level engineering education, every subject suddenly seemed to assume proficiency in concepts like eigenvalues, Jacobians and so on, as if you were supposed to be born with the knowledge.
This chapter is meant to provide a high-level overview of the concepts in this discipline that are important to know, along with their most visible applications.
The AI revolution
Almost any information can be embedded in a vector space. Images, video, language, speech, biometric information and whatever else you can imagine. And all the applications of machine learning and artificial intelligence (like the recent chatbots, text-to-image models, etc.) work on top of these vector embeddings. Since linear algebra is the science of dealing with high-dimensional vector spaces, it is an indispensable building block.
A lot of the techniques involve taking input vectors from one space and mapping them to vectors in some other space.
But why the focus on “linear” when most interesting functions are non-linear? It’s because the problem of making our models high-dimensional and that of making them non-linear (general enough to capture all kinds of complex relationships) turn out to be orthogonal to each other. Many neural network architectures work by stacking linear layers with simple one-dimensional non-linearities in between them. And the universal approximation theorem says that this kind of architecture can approximate essentially any (continuous) function to arbitrary accuracy.
Since the way we manipulate high dimensional vectors is primarily matrix multiplication, it isn’t a stretch to say it is the bedrock of the modern AI revolution.
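As a concrete (if toy) illustration of the “linear layers with simple non-linearities in between” idea, here is a minimal NumPy sketch. The layer sizes, random weights and the name tiny_network are made up purely for illustration; this is not any specific library’s API or any real architecture:

```python
import numpy as np

rng = np.random.default_rng(0)

W1 = rng.standard_normal((16, 4))   # first linear map: R^4 -> R^16
W2 = rng.standard_normal((3, 16))   # second linear map: R^16 -> R^3

def tiny_network(x):
    h = W1 @ x               # linear layer (just a matrix-vector product)
    h = np.maximum(h, 0.0)   # simple one-dimensional non-linearity (ReLU)
    return W2 @ h            # another linear layer

x = rng.standard_normal(4)   # a 4-dimensional input "embedding"
print(tiny_network(x))       # a 3-dimensional output vector
```

All the heavy lifting in this sketch is matrix multiplication; the non-linearity is applied one coordinate at a time.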
I) Vector spaces
As mentioned in the previous section, linear algebra inevitably crops up when things go multi-dimensional. We start off with a scalar, which is just a number of some sort. For this article, we’ll be considering real and complex numbers for these scalars. In general, a scalar can be any object for which the basic operations of addition, subtraction, multiplication and division are defined (abstracted as a “field”). Now, we want a framework to describe collections of such numbers (to add dimensions). These collections are called “vector spaces”. We’ll be considering vector spaces whose scalars are either real or complex numbers (the former being a special case of the latter). The resulting spaces are called “real vector spaces” and “complex vector spaces” respectively.
The ideas in linear algebra are applicable to these “vector spaces”. The most common example is your floor, table or the computer screen you’re reading this on. These are all two-dimensional vector spaces since every point on your table can be specified by two numbers (the x and y coordinates as shown below). This space is denoted by R² since two real numbers specify it.
We can generalize R² in different ways. First, we can add dimensions. The space we live in is 3-dimensional (R³). Or, we can curve it. The surface of a sphere like the Earth, for example (denoted S²), is still two-dimensional, but unlike R² (which is flat), it is curved (strictly speaking, a curved surface like S² is no longer a vector space, but it shows how R² can be generalized). So far, these spaces have all basically been arrays of numbers. But the idea of a vector space is more general. It is a collection of objects for which the following ideas should be well defined:
- Addition of any two of the objects.
- Multiplication of the objects by a scalar (a real number).
Not only that, but the objects should be “closed” under these operations. This means that if you apply these two operations to the objects of the vector space, you should get objects of the same type (you shouldn’t leave the vector space). For example, the set of integers isn’t a vector space because multiplication by a scalar (real number) can give us something that isn’t an integer (3*2.5 = 7.5 which isn’t an integer).
One of the ways to express the objects of a vector space is with vectors. Vectors require an arbitrary “basis”. An example of a basis is the compass system of directions: North, South, East and West. Any direction (like “southwest”) can be expressed in terms of these (in fact, just North and East suffice, since South and West are their negatives). These are “direction vectors”, but we can also have “position vectors”, for which we need an origin and a coordinate system intersecting at that origin. The latitude and longitude system for referencing every place on the surface of the Earth is an example. A latitude and longitude pair is one way to identify your house, but there are infinitely many other ways. Another culture might draw the latitude and longitude lines at a slightly different angle from the standard ones. And so, they’ll come up with different numbers for your house. But that doesn’t change the physical location of the house itself. The house exists as an object in the vector space, and these different ways to express that location are called “bases”. Choosing one basis allows you to assign a pair of numbers to the house, and choosing another one allows you to assign a different pair of numbers that is equally valid.
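To make the “same house, different numbers” idea concrete, here is a small NumPy sketch in two dimensions. The particular alternative basis (the standard axes rotated by 20 degrees) and the sample coordinates are arbitrary choices for the example:

```python
import numpy as np

# The "house" as a point in R^2, written in the standard basis.
house = np.array([3.0, 4.0])

# Another culture's basis: the same plane, with axes drawn at a different angle.
# The columns of B are the new basis vectors, expressed in the standard basis.
theta = np.deg2rad(20)
B = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])

# Coordinates of the house in the new basis: solve B @ coords_new = house.
coords_new = np.linalg.solve(B, house)

print(coords_new)        # different numbers...
print(B @ coords_new)    # ...but the same physical point: [3. 4.]
```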

Vector spaces can also be infinite dimensional. For instance, in Miniature 12 of [2], the entire set of real numbers is treated as an infinite-dimensional vector space (over the field of rational numbers).
II) Linear maps
Now that we know what a vector space is, let’s take it to the next level and talk about two vector spaces. Since vector spaces are simply collections of objects, we can think of a mapping that takes an object from one of the spaces and maps it to an object from the other. An example of this is recent AI programs like Midjourney where you enter a text prompt and they return an image matching it. The text you enter is first converted to a vector. Then, that vector is converted to another vector in the image space via such a “mapping”.
Let V and W be vector spaces (either both real or both complex vector spaces). A function f: V -> W is said to be a ‘linear map’ if for any two vectors u, v ∈ V and any scalar c (a real or complex number, depending on whether we’re working with real or complex vector spaces), the following two conditions are satisfied:
$$f(u+v) = f(u) + f(v) \tag{1}$$
$$f(c \cdot v) = c \cdot f(v) \tag{2}$$
Combining the above two properties, we can get the following result about a linear combination of n vectors.
$$f(c_1 \cdot u_1 + c_2 \cdot u_2 + \dots + c_n \cdot u_n) = c_1 \cdot f(u_1) + c_2 \cdot f(u_2) + \dots + c_n \cdot f(u_n)$$
And now we can see where the name “linear map” comes from. If we pass a linear combination of n vectors to the linear map f (the left-hand side of the equation above), the result is the same linear combination of the images, f(u_i), of the individual vectors (the right-hand side). We can apply the linear map first and then take the linear combination, or take the linear combination first and then apply the linear map. The two are equivalent.
In high school, we learn about linear equations. In two-dimensional space, such an equation is represented by f(x)=m.x+c. Here, m and c are the parameters of the equation. Note that this function isn’t a linear map: when c ≠ 0 it fails both conditions above (for instance, f(u+v) = m.(u+v)+c, whereas f(u)+f(v) = m.(u+v)+2c). Functions of this form are called affine. If we set f(x)=m.x instead, then we do get a linear map, since it satisfies both equations.
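To make this concrete, here is a quick numerical check of the two conditions using NumPy and some arbitrary test values; the names f_affine and f_linear are just for this illustration:

```python
import numpy as np

m, c = 2.0, 5.0
f_affine = lambda x: m * x + c   # the high-school "linear" equation
f_linear = lambda x: m * x       # an actual linear map

u, v, s = 3.0, -1.5, 4.0

# Condition (1): f(u + v) == f(u) + f(v)
print(np.isclose(f_linear(u + v), f_linear(u) + f_linear(v)))   # True
print(np.isclose(f_affine(u + v), f_affine(u) + f_affine(v)))   # False (off by c)

# Condition (2): f(s * v) == s * f(v)
print(np.isclose(f_linear(s * v), s * f_linear(v)))             # True
print(np.isclose(f_affine(s * v), s * f_affine(v)))             # False
```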

III) Matrices
In Section I, we introduced the concept of a basis for a vector space. Given bases for the two vector spaces, V and W, every linear map between them can be expressed as a matrix (for details, see [1]). A matrix is just a collection of vectors. These vectors can be arranged in columns, giving us a 2-d grid of numbers as shown below.

Matrices are the objects people first think of in the context of linear algebra. And for good reason. Most of the time spent practicing linear algebra is spent dealing with matrices. But it is important to remember that there are (in general) an infinite number of matrices that can represent a linear map, depending on the bases we choose for the two spaces. The linear map is hence a more general concept than the matrix one happens to be using to represent it.
How does a matrix actually perform the linear map it represents (taking one vector to another)? By being multiplied with the first vector. The result of this matrix-vector multiplication is the second vector, and the mapping is complete (from first to second).
In detail, we take the dot product (sum-product) of the first vector, v_1, with the first row of the matrix; this yields the first entry of the resulting vector, v_2. Then we take the dot product of v_1 with the second row of the matrix to get the second entry of v_2, and so on. This process is demonstrated below for a matrix with 2 rows and 3 columns. The first vector, v_1, is three-dimensional and the second vector, v_2, is two-dimensional.

Note that the underlying linear map behind a matrix with this dimensionality (2x3) will always take a three-dimensional vector, v_1, and map it to a two-dimensional vector, v_2.

In general, an (n x m) matrix (n rows and m columns) will map an m-dimensional vector to an n-dimensional one.
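Here is the same kind of (2 x 3) example spelled out in NumPy, computing the result both with explicit row-by-row dot products and with the built-in matrix-vector product; the particular numbers are arbitrary:

```python
import numpy as np

A = np.array([[1.0, 2.0, 3.0],
              [4.0, 5.0, 6.0]])      # a (2 x 3) matrix
v1 = np.array([7.0, 8.0, 9.0])       # a 3-dimensional input vector

# Each entry of the result is the dot product of v1 with one row of A.
v2_manual = np.array([np.dot(A[0], v1), np.dot(A[1], v1)])

# The same thing, via matrix-vector multiplication.
v2 = A @ v1

print(v2_manual)   # [ 50. 122.]
print(v2)          # [ 50. 122.]  -- a 2-dimensional output vector
```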
III-A) Properties of matrices
Let’s cover some properties of matrices that’ll allow us to identify properties of the linear maps they represent.
Rank
An important property of matrices and their corresponding linear maps is the rank. We can talk about this in terms of a collection of vectors, since that’s all a matrix is. Say we have a vector, v1=[1,0,0]. The first element of the vector is the coordinate along the x-axis, the second is the coordinate along the y-axis and the third along the z-axis. These three axes form a basis (one of many) of the 3-dimensional space, R³, meaning that any vector in this space can be expressed as a linear combination of the three unit vectors along those axes.

We can multiply this vector by a scalar, s. This gives us s.[1,0,0] = [s,0,0]. As we vary the value of s, we can get any point along the x-axis. But that’s about it. Say we add another vector to our collection, v2=[3.5,0,0]. Now, what are the vectors we can make with linear combinations of those two vectors? We get to multiply the first one with any scalar, s_1 and the second one with any scalar, s_2. This gives us:
$$s_1 \cdot [1,0,0] + s_2 \cdot [3.5,0,0] = [s_1 + 3.5\, s_2,\ 0,\ 0] = [s',\ 0,\ 0]$$
Here, s' is just another scalar. So, we can still reach points only on the x-axis, even with linear combinations of both these vectors. The second vector didn’t “expand our reach” at all. The set of points we can reach with linear combinations of the two is exactly the same as the set we can reach with the first alone. So even though we have two vectors, the rank of this collection of vectors is 1, since the space they span is one-dimensional. If, on the other hand, the second vector were v2=[0,1,0], then you could get any point on the x-y plane with these two vectors. So, the space spanned would be two-dimensional and the rank of this collection would be 2. If the second vector were v2=[2.1,1.5,0.8], we could still span a two-dimensional space with v1 and v2 (though that space would no longer be the x-y plane, it would be some other 2-d plane). And the two vectors would still have a rank of 2. If the rank of a collection of vectors is the same as the number of vectors (meaning they can together span a space of dimensionality as high as the number of vectors), then they are called “linearly independent”.
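These three cases can be checked directly with NumPy’s matrix_rank, stacking each pair of vectors as the rows of a matrix:

```python
import numpy as np

v1 = [1.0, 0.0, 0.0]

# Second vector on the same line as v1: the pair only spans the x-axis.
print(np.linalg.matrix_rank(np.array([v1, [3.5, 0.0, 0.0]])))   # 1

# Second vector along the y-axis: the pair spans the x-y plane.
print(np.linalg.matrix_rank(np.array([v1, [0.0, 1.0, 0.0]])))   # 2

# A generic second vector: still spans some 2-d plane (just not the x-y plane).
print(np.linalg.matrix_rank(np.array([v1, [2.1, 1.5, 0.8]])))   # 2
```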
If the vectors that make up the matrix can span an m dimensional space, then the rank of the matrix is m. But a matrix can be thought of as a collection of vectors in two ways. Since it’s a simple two dimensional grid of numbers, we can either consider all the columns as the group of vectors or consider all the rows as the group as shown below. Here, we have a (3x4) matrix (three rows and 4 columns). It can be thought of either as a collection of 4 column vectors (each 3-dimensional) or 3 row vectors (each 4 dimensional).

Full row rank means all the row vectors are linearly independent. Full column rank means all the column vectors are linearly independent.
It turns out that for any matrix, square or not, the row rank and column rank are always the same. This isn’t obvious at all, and a proof is given in the Math StackExchange post [3]. This means we can talk simply in terms of “the rank” of a matrix and don’t have to specify “row rank” or “column rank”.
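This is easy to check numerically: the rank computed from a matrix’s rows matches the rank computed from its columns (i.e., the rank of its transpose). A small sketch, using an arbitrary rank-deficient example:

```python
import numpy as np

rng = np.random.default_rng(1)

# A rank-deficient (3 x 4) matrix: its third row is a combination of the first two.
A = rng.standard_normal((3, 4))
A[2] = 2 * A[0] - A[1]

print(np.linalg.matrix_rank(A))     # 2  (row rank)
print(np.linalg.matrix_rank(A.T))   # 2  (column rank -- always the same)
```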
The linear transformation corresponding to a (3 x 3) matrix that has a rank of 2 will map everything in the 3-d space to a lower, 2-d space, much like the (2 x 3) matrix we encountered in the last section.

Notions closely related to the rank of square matrices are the determinant and invertibility.
Determinants
The determinant of a square matrix is its “measure” in a sense. Let me explain by going back to thinking of a matrix as a collection of vectors. Let’s start with just one vector. The way to “measure” it is obvious: its length. And since we’re dealing only with square matrices, the only way to have one vector is for it to be one-dimensional, which is basically just a scalar. Things get interesting when we go from one dimension to two. Now we’re in two-dimensional space, so the notion of “measure” is no longer length but area. And with two vectors in that two-dimensional space, it is the area of the parallelogram they form. If the two vectors are parallel to each other (e.g., both lie on the x-axis), in other words they are not linearly independent, then the area of the parallelogram between them becomes zero. The determinant of the matrix formed by them will be zero, and the matrix will no longer have full rank (its rank drops to 1).

Taking it one dimension higher, we get 3-dimensional space. To construct a square matrix (3x3), we now need three vectors. And since the notion of “measure” in three-dimensional space is volume, the determinant of a (3x3) matrix becomes the volume of the parallelepiped formed by the vectors that make it up.

And this can be extended to space of any dimensionality.
Notice that we spoke about the area or the volume contained between the vectors. We didn’t specify whether these were the vectors composing the rows of the square matrix or the ones composing its columns. And the somewhat surprising thing is that we don’t need to specify this, because it doesn’t matter either way. Whether we take the vectors forming the rows and measure the volume between them, or the vectors forming the columns, we get the same answer. This is proven in the Math StackExchange post [4].
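Here is a short NumPy check of both claims: the determinant as the area of a parallelogram (zero when the vectors are parallel), and the determinant being unchanged by transposition. The example vectors are arbitrary:

```python
import numpy as np

# Two vectors in the plane, stacked as the rows of a (2 x 2) matrix.
A = np.array([[3.0, 0.0],
              [1.0, 2.0]])
print(np.linalg.det(A))              # 6.0 -- the area of the parallelogram they form

# Two parallel vectors: zero area, zero determinant, and the rank drops to 1.
P = np.array([[1.0, 0.0],
              [3.5, 0.0]])
print(np.linalg.det(P))              # 0.0
print(np.linalg.matrix_rank(P))      # 1

# Rows or columns -- it doesn't matter: det(A) equals det(A transpose).
print(np.isclose(np.linalg.det(A), np.linalg.det(A.T)))   # True
```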
There are a host of other properties of linear maps and their corresponding matrices which are invaluable in understanding them and extracting value out of them. We’ll be delving into invertibility, eigenvalues, diagonalizability and the different transformations one can do in the coming articles (check back here for links).
If you liked this story, buy me a coffee 🙂 https://www.buymeacoffee.com/w045tn0iqw
References
[1] Linear map: https://en.wikipedia.org/wiki/Linear_map
[2] Matousek’s miniatures: https://kam.mff.cuni.cz/~matousek/stml-53-matousek-1.pdf
[3] Math StackExchange post proving row rank and column rank are the same: https://math.stackexchange.com/questions/332908/looking-for-an-intuitive-explanation-why-the-row-rank-is-equal-to-the-column-ran
[4] Math StackExchange post proving the determinants of a matrix and its transpose are the same: https://math.stackexchange.com/a/636198/155881