Why can matrix multiplication find the most similar vectors?

Matrix multiplication can find the most similar vectors because, in vector space, the dot product measures the similarity between two vectors. In particular, when the vectors are normalized, the dot product equals the cosine similarity, a common measure of how closely two vectors point in the same direction. Let's explore in detail why matrix multiplication can find the most similar vectors.

Significance of Vector Dot Product

In vector operations, the dot product of two vectors $\mathbf{a}$ and $\mathbf{b}$ is defined as:

$$\mathbf{a} \cdot \mathbf{b} = \sum_{i=1}^{n} a_i b_i$$

Geometrically, the dot product is related to the lengths of these two vectors and the angle between them:

$$\mathbf{a} \cdot \mathbf{b} = |\mathbf{a}|\,|\mathbf{b}| \cos(\theta)$$

Where:

  • $|\mathbf{a}|$ and $|\mathbf{b}|$ are the lengths (or magnitudes) of vectors $\mathbf{a}$ and $\mathbf{b}$, respectively.
  • $\theta$ is the angle between them.
  • $\cos(\theta)$ represents the directional similarity between the vectors; when $\theta = 0$, $\cos(0) = 1$, indicating that the two vectors point in the same direction.
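
As a quick sanity check, here is a minimal NumPy sketch (the two vectors are arbitrary, chosen only for illustration) showing that the algebraic definition of the dot product and the geometric form $|\mathbf{a}|\,|\mathbf{b}|\cos(\theta)$ agree:

```python
import numpy as np

# Two arbitrary 3-dimensional vectors, used only for illustration.
a = np.array([1.0, 2.0, 3.0])
b = np.array([4.0, 5.0, 6.0])

# Algebraic definition: sum of element-wise products.
dot_algebraic = np.sum(a * b)  # equivalent to np.dot(a, b)

# Geometric form: |a| * |b| * cos(theta).
cos_theta = np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))
dot_geometric = np.linalg.norm(a) * np.linalg.norm(b) * cos_theta

# Both evaluate to 32 (up to floating-point rounding).
print(dot_algebraic, dot_geometric)
```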

If we normalize all vectors to unit vectors (i.e., magnitudes of 1), then their dot product depends solely on the cosine of the angle between them, becoming the cosine similarity:

$$\cos(\theta) = \frac{\mathbf{a} \cdot \mathbf{b}}{|\mathbf{a}|\,|\mathbf{b}|}$$

Cosine Similarity and Vector Similarity

Cosine similarity is a commonly used measure of similarity, especially in high-dimensional spaces (such as text embedding vectors). It measures the directional similarity between two vectors rather than their distance or size differences. Its value ranges from -1 to 1:

  • When $\cos(\theta) = 1$, the two vectors point in exactly the same direction and are maximally similar.
  • When $\cos(\theta) = 0$, the two vectors are orthogonal and share no directional similarity.
  • When $\cos(\theta) = -1$, the two vectors point in opposite directions.
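
These three cases can be seen directly in code. Below is a minimal sketch; the cosine_similarity helper is a hypothetical name defined here for illustration, not a function from any particular library:

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine of the angle between a and b: dot product over magnitudes."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

a = np.array([1.0, 0.0])

print(cosine_similarity(a, np.array([3.0, 0.0])))   #  1.0 -> same direction
print(cosine_similarity(a, np.array([0.0, 2.0])))   #  0.0 -> orthogonal
print(cosine_similarity(a, np.array([-1.0, 0.0])))  # -1.0 -> opposite direction
```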

Therefore, calculating the cosine similarity between vectors through the dot product (i.e., matrix multiplication) can find the most similar vectors. For normalized vectors, the larger the dot product, the more similar the directions of the two vectors, and the vector achieving the maximum dot product is the most similar one.

In this example, the similarity between the query vector and multiple document vectors is calculated using matrix multiplication. Let the query vector be $\mathbf{q}$, and stack all document embedding vectors into a matrix $\mathbf{D}$ (where each row is the vector of one document). We can calculate the dot product between the query vector and all document vectors through the following matrix multiplication:

$$\mathbf{q} \cdot \mathbf{D}^T$$

Where:

  • $\mathbf{q}$ is the embedding vector of the query, with shape $1 \times d$.
  • $\mathbf{D}^T$ is the transpose of the document vector matrix, with shape $d \times n$, where $d$ is the dimension of the embedding vectors and $n$ is the number of documents.

Through this matrix multiplication, we obtain a $1 \times n$ vector, where each element represents the dot product (i.e., similarity score) between the query vector and the corresponding document vector. By comparing these scores, we can find the documents most similar to the query vector.
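
As a concrete sketch of this retrieval step, here is one possible NumPy implementation. The random embeddings and the dimensions d and n are placeholders standing in for whatever an embedding model would actually produce:

```python
import numpy as np

rng = np.random.default_rng(0)

d, n = 4, 5                  # embedding dimension, number of documents (placeholders)
q = rng.normal(size=(1, d))  # query embedding, shape (1, d)
D = rng.normal(size=(n, d))  # document embeddings, one row per document

# Normalize every row to unit length so dot products equal cosine similarities.
q = q / np.linalg.norm(q, axis=1, keepdims=True)
D = D / np.linalg.norm(D, axis=1, keepdims=True)

# A single matrix multiplication scores the query against all documents at once:
# (1, d) @ (d, n) -> (1, n) row of similarity scores.
scores = q @ D.T

best = int(np.argmax(scores))  # index of the highest-scoring document
print(scores, "-> most similar document index:", best)
```

The same $1 \times n$ result could be obtained by looping over the documents and computing one dot product per pair; the matrix product simply performs all $n$ dot products in a single, highly optimized operation.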

Summary: Why Matrix Multiplication Can Find Similar Vectors

  • Dot Product Measures Similarity: the size of the dot product is directly related to the angle between the vectors; if the vectors are normalized, the dot product equals the cosine similarity, with values closer to 1 indicating greater similarity.
  • Efficient Calculation: a single matrix multiplication computes the similarity between the query vector and all document vectors at once, without needing to compute them one by one.
  • Selection of the Most Similar Vectors: after calculating the similarity scores, the document whose vector receives the highest score is the one most similar to the query.

Thus, matrix multiplication is an efficient method for calculating similarity, enabling the rapid identification of other vectors most similar to the query vector in vector space.
