is a subspace.
3. For any n ≥ 0, the set of polynomials f(X) ∈ R[X] of degree at most n is a subspace
of R[X].
4. The set of upper triangular n × n matrices is a subspace of the space of n × n matrices.
Proposition 2.5. Given any vector space E, if S is any nonempty subset of E, then the
smallest subspace S (or Span(S)) of E containing S is the set of all (finite) linear combi-
nations of elements from S.
Proof. We prove that the set Span(S) of all linear combinations of elements of S is a subspace
of E, leaving as an exercise the verification that every subspace containing S also contains
Span(S).
First, Span(S) is nonempty since it contains S (which is nonempty). If u =
λ
i∈I
iui
and v =
µ
j∈J
j vj are any two linear combinations in Span(S), for any two scalars λ, µ ∈ R,
λu + µv = λ
λiui + µ
µjvj
i∈I
j∈J
=
λλiui +
µµjvj
i∈I
j∈J
=
λλiui +
(λλi + µµi)ui +
µµjvj,
i∈I−J
i∈I∩J
j∈J−I
which is a linear combination with index set I ∪ J, and thus λu + µv ∈ Span(S), which
proves that Span(S) is a subspace.
2.4. BASES OF A VECTOR SPACE
29
One might wonder what happens if we add extra conditions to the coefficients involved
in forming linear combinations. Here are three natural restrictions which turn out to be
important (as usual, we assume that our index sets are finite):
(1) Consider combinations
λ
i∈I
iui for which
λi = 1.
i∈I
These are called affine combinations. One should realize that every linear combination
λ
i∈I
iui can be viewed as an affine combination. For example, if k is an index not
in I, if we let J = I ∪ {k}, uk = 0, and λk = 1 −
λ
λ
i∈I
i, then
j∈J
j uj is an affine
combination and
λiui =
λjuj.
i∈I
j∈J
However, we get new spaces. For example, in
3
R , the set of all affine combinations of
the three vectors e1 = (1, 0, 0), e2 = (0, 1, 0), and e3 = (0, 0, 1), is the plane passing
through these three points. Since it does not contain 0 = (0, 0, 0), it is not a linear
subspace.
(2) Consider combinations
λ
i∈I
iui for which
λi ≥ 0, for all i ∈ I.
These are called positive (or conic) combinations It turns out that positive combina-
tions of families of vectors are cones. They show naturally in convex optimization.
(3) Consider combinations
λ
i∈I
iui for which we require (1) and (2), that is
λi = 1,
and λi ≥ 0 for all i ∈ I.
i∈I
These are called convex combinations. Given any finite family of vectors, the set of all
convex combinations of these vectors is a convex polyhedron. Convex polyhedra play a
very important role in convex optimization.
2.4
Bases of a Vector Space
Given a vector space E, given a family (vi)i∈I, the subset V of E consisting of the null vector 0
and of all linear combinations of (vi)i∈I is easily seen to be a subspace of E. Subspaces having
such a “generating family” play an important role, and motivate the following definition.
30
CHAPTER 2. VECTOR SPACES, BASES, LINEAR MAPS
Definition 2.12. Given a vector space E and a subspace V of E, a family (vi)i∈I of vectors
vi ∈ V spans V or generates V if for every v ∈ V , there is some family (λi)i∈I of scalars in
K such that
v =
λivi.
i∈I
We also say that the elements of (vi)i∈I are generators of V and that V is spanned by (vi)i∈I,
or generated by (vi)i∈I. If a subspace V of E is generated by a finite family (vi)i∈I, we say
that V is finitely generated . A family (ui)i∈I that spans V and is linearly independent is
called a basis of V .
Example 2.9.
1. In
3
R , the vectors (1, 0, 0), (0, 1, 0), and (0, 0, 1) form a basis.
2. The vectors (1, 1, 1, 1), (1, 1, −1, −1), (1, −1, 0, 0), (0, 0, 1, −1) form a basis of 4
R known
as the Haar basis. This basis and its generalization to dimension 2n are crucial in
wavelet theory.
3. In the subspace of polynomials in R[X] of degree at most n, the polynomials 1, X, X2,
. . . , Xn form a basis.
n
4. The Bernstein polynomials
(1 − X)kXn−k for k = 0, . . . , n, also form a basis of
k
that space. These polynomials play a major role in the theory of spline curves.
It is a standard result of linear algebra that every vector space E has a basis, and that
for any two bases (ui)i∈I and (vj)j∈J, I and J have the same cardinality. In particular, if E
has a finite basis of n elements, every basis of E has n elements, and the integer n is called
the dimension of the vector space E. We begin with a crucial lemma.
Lemma 2.6. Given a linearly independent family (ui)i∈I of elements of a vector space E, if
v ∈ E is not a linear combination of (ui)i∈I, then the family (ui)i∈I ∪k (v) obtained by adding
v to the family (ui)i∈I is linearly independent (where k /
∈ I).
Proof. Assume that µv +
λ
i∈I
iui = 0, for any family (λi)i∈I of scalars in K. If µ = 0, then
µ has an inverse (because K is a field), and thus we have v = −
(µ−1λ
i∈I
i)ui, showing
that v is a linear combination of (ui)i∈I and contradicting the hypothesis. Thus, µ = 0. But
then, we have
λ
i∈I
iui = 0, and since the family (ui)i∈I is linearly independent, we have
λi = 0 for all i ∈ I.
The next theorem holds in general, but the proof is more sophisticated for vector spaces
that do not have a finite set of generators. Thus, in this chapter, we only prove the theorem
for finitely generated vector spaces.
2.4. BASES OF A VECTOR SPACE
31
Theorem 2.7. Given any finite family S = (ui)i∈I generating a vector space E and any
linearly independent subfamily L = (uj)j∈J of S (where J ⊆ I), there is a basis B of E such
that L ⊆ B ⊆ S.
Proof. Consider the set of linearly independent families B such that L ⊆ B ⊆ S. Since this
set is nonempty and finite, it has some maximal element, say B = (uh)h∈H. We claim that
B generates E. Indeed, if B does not generate E, then there is some up ∈ S that is not a
linear combination of vectors in B (since S generates E), with p /
∈ H. Then, by Lemma
2.6, the family B = (uh)h∈H∪{p} is linearly independent, and since L ⊆ B ⊂ B ⊆ S, this
contradicts the maximality of B. Thus, B is a basis of E such that L ⊆ B ⊆ S.
Remark: Theorem 2.7 also holds for vector spaces that are not finitely generated. In this
case, the problem is to guarantee the existence of a maximal linearly independent family B
such that L ⊆ B ⊆ S. The existence of such a maximal family can be shown using Zorn’s
lemma, see Appendix 31 and the references given there.
The following proposition giving useful properties characterizing a basis is an immediate
consequence of Theorem 2.7.
Proposition 2.8. Given a vector space E, for any family B = (vi)i∈I of vectors of E, the
following properties are equivalent:
(1) B is a basis of E.
(2) B is a maximal linearly independent family of E.
(3) B is a minimal generating family of E.
The following replacement lemma due to Steinitz shows the relationship between finite
linearly independent families and finite families of generators of a vector space.
Proposition 2.9. (Replacement lemma) Given a vector space E, let (ui)i∈I be any finite
linearly independent family in E, where |I| = m, and let (vj)j∈J be any finite family such
that every ui is a linear combination of (vj)j∈J, where |J| = n. Then, there exists a set L and
an injection ρ : L → J such that L ∩ I = ∅, |L| = n − m, and the families (ui)i∈I ∪ (vρ(l))l∈L
and (vj)j∈J generate the same subspace of E. In particular, m ≤ n.
Proof. We proceed by induction on |I| = m. When m = 0, the family (ui)i∈I is empty, and
the proposition holds trivially with L = J (ρ is the identity). Assume |I| = m + 1. Consider
the linearly independent family (ui)i∈(I−{p}), where p is any member of I. By the induction
hypothesis, there exists a set L and an injection ρ : L → J such that L ∩ (I − {p}) = ∅,
|L| = n − m, and the families (ui)i∈(I−{p}) ∪ (vρ(l))l∈L and (vj)j∈J generate the same subspace
of E. If p ∈ L, we can replace L by (L − {p}) ∪ {p } where p does not belong to I ∪ L, and
replace ρ by the injection ρ which agrees with ρ on L − {p} and such that ρ (p ) = ρ(p).
Thus, we can always assume that L ∩ I = ∅. Since up is a linear combination of (vj)j∈J
32
CHAPTER 2. VECTOR SPACES, BASES, LINEAR MAPS
and the families (ui)i∈(I−{p}) ∪ (vρ(l))l∈L and (vj)j∈J generate the same subspace of E, up is
a linear combination of (ui)i∈(I−{p}) ∪ (vρ(l))l∈L. Let
up =
λiui +
λlvρ(l).
(1)
i∈(I−{p})
l∈L
If λl = 0 for all l ∈ L, we have
λiui − up = 0,
i∈(I−{p})
contradicting the fact that (ui)i∈I is linearly independent. Thus, λl = 0 for some l ∈ L, say
l = q. Since λq = 0, we have
vρ(q) =
(−λ−1
q λi)ui + λ−1
q up +
(−λ−1
q λl)vρ(l).
(2)
i∈(I−{p})
l∈(L−{q})
We claim that the families (ui)i∈(I−{p}) ∪ (vρ(l))l∈L and (ui)i∈I ∪ (vρ(l))l∈(L−{q}) generate the
same subset of E. Indeed, the second family is obtained from the first by replacing vρ(q) by up,
and vice-versa, and up is a linear combination of (ui)i∈(I−{p}) ∪ (vρ(l))l∈L, by (1), and vρ(q) is a
linear combination of (ui)i∈I ∪(vρ(l))l∈(L−{q}), by (2). Thus, the families (ui)i∈I ∪(vρ(l))l∈(L−{q})
and (vj)j∈J generate the same subspace of E, and the proposition holds for L − {q} and the
restriction of the injection ρ : L → J to L − {q}, since L ∩ I = ∅ and |L| = n − m imply that
(L − {q}) ∩ I = ∅ and |L − {q}| = n − (m + 1).
The idea is that m of the vectors vj can be replaced by the linearly independent ui’s in
such a way that the same subspace is still generated. The purpose of the function ρ : L → J
is to pick n − m elements j1, . . . , jn−m of J and to relabel them l1, . . . , ln−m in such a way
that these new indices do not clash with the indices in I; this way, the vectors vj , . . . , v
1
jn−m
who “survive” (i.e. are not replaced) are relabeled vl , . . . , v
, and the other m vectors v
1
ln−m
j
with j ∈ J − {j1, . . . , jn−m} are replaced by the ui. The index set of this new family is I ∪ L.
Actually, one can prove that Proposition 2.9 implies Theorem 2.7 when the vector space
is finitely generated. Putting Theorem 2.7 and Proposition 2.9 together, we obtain the
following fundamental theorem.
Theorem 2.10. Let E be a finitely generated vector space. Any family (ui)i∈I generating E
contains a subfamily (uj)j∈J which is a basis of E. Furthermore, for every two bases (ui)i∈I
and (vj)j∈J of E, we have |I| = |J| = n for some fixed integer n ≥ 0.
Proof. The first part follows immediately by applying Theorem 2.7 with L = ∅ and S =
(ui)i∈I. Assume that (ui)i∈I and (vj)j∈J are bases of E. Since (ui)i∈I is linearly independent
and (vj)j∈J spans E, proposition 2.9 implies that |I| ≤ |J|. A symmetric argument yields
|J| ≤ |I|.
2.4. BASES OF A VECTOR SPACE
33
Remark: Theorem 2.10 also holds for vector spaces that are not finitely generated. This
can be shown as follows. Let (ui)i∈I be a basis of E, let (vj)j∈J be a generating family of E,
and assume that I is infinite. For every j ∈ J, let Lj ⊆ I be the finite set
Lj = {i ∈ I | vj =
λiui, λi = 0}.
i∈I
Let L =
L
j∈J
j . By definition L ⊆ I , and since (ui)i∈I is a basis of E, we must have I = L,
since otherwise (ui)i∈L would be another basis of E, and this would contradict the fact that
(ui)i∈I is linearly independent. Furthermore, J must be infinite, since otherwise, because
the Lj are finite, I would be finite. But then, since I =
L
j∈J
j with J infinite and the Lj
finite, by a standard result of set theory, |I| ≤ |J|. If (vj)j∈J is also a basis, by a symmetric
argument, we obtain |J| ≤ |I|, and thus, |I| = |J| for any two bases (ui)i∈I and (vj)j∈J of E.
When E is not finitely generated, we say that E is of infinite dimension. The dimension
of a vector space E is the common cardinality of all of its bases and is denoted by dim(E).
Clearly, if the field K itself is viewed as a vector space, then every family (a) where a ∈ K
and a = 0 is a basis. Thus dim(K) = 1. Note that dim({0}) = 0.
If E is a vector space, for any subspace U of E, if dim(U ) = 1, then U is called a line; if
dim(U ) = 2, then U is called a plane. If dim(U ) = k, then U is sometimes called a k-plane.
Let (ui)i∈I be a basis of a vector space E. For any vector v ∈ E, since the family (ui)i∈I
generates E, there is a family (λi)i∈I of scalars in K, such that
v =
λiui.
i∈I
A very important fact is that the family (λi)i∈I is unique.
Proposition 2.11. Given a vector space E, let (ui)i∈I be a family of vectors in E. Let v ∈ E,
and assume that v =
λ
λ
i∈I
iui. Then, the family (λi)i∈I of scalars such that v =
i∈I
iui
is unique iff (ui)i∈I is linearly independent.
Proof. First, assume that (ui)i∈I is linearly independent. If (µi)i∈I is another family of scalars
in K such that v =
µ
i∈I
iui, then we have
(λi − µi)ui = 0,
i∈I
and since (ui)i∈I is linearly independent, we must have λi−µi = 0 for all i ∈ I, that is, λi = µi
for all i ∈ I. The converse is shown by contradiction. If (ui)i∈I was linearly dependent, there
would be a family (µi)i∈I of scalars not all null such that
µiui = 0
i∈I
34
CHAPTER 2. VECTOR SPACES, BASES, LINEAR MAPS
and µj = 0 for some j ∈ I. But then,
v =
λiui + 0 =
λiui +
µiui =
(λi + µi)ui,
i∈I
i∈I
i∈I
i∈I
with λj = λj +µj since µj = 0, contradicting the assumption that (λi)i∈I is the unique family
such that v =
λ
i∈I
iui.
If (ui)i∈I is a basis of a vector space E, for any vector v ∈ E, if (xi)i∈I is the unique
family of scalars in K such that
v =
xiui,
i∈I
each xi is called the component (or coordinate) of index i of v with respect to the basis (ui)i∈I.
Given a field K and any (nonempty) set I, we can form a vector space K(I) which, in
some sense, is the standard vector space of dimension |I|.
Definition 2.13. Given a field K and any (nonempty) set I, let K(I) be the subset of the
cartesian product KI consisting of all families (λi)i∈I with finite support of scalars in K.4
We define addition and multiplication by a scalar as follows:
(λi)i∈I + (µi)i∈I = (λi + µi)i∈I,
and
λ · (µi)i∈I = (λµi)i∈I.
It is immediately verified that addition and multiplication by a scalar are well defined.
Thus, K(I) is a vector space. Furthermore, because families with finite support are consid-
ered, the family (ei)i∈I of vectors ei, defined such that (ei)j = 0 if j = i and (ei)i = 1, is
clearly a basis of the vector space K(I). When I = {1, . . . , n}, we denote K(I) by Kn. The
function ι : I → K(I), such that ι(i) = ei for every i ∈ I, is clearly an injection.
When I is a finite set, K(I) = KI, but this is false when I is infinite. In fact, dim(K(I)) =
|I|, but dim(KI) is strictly greater when I is infinite.
Many interesting mathematical structures are vector spaces. A very important example
is the set of linear maps between two vector spaces to be defined in the next section. Here
is an example that will prepare us for the vector space of linear maps.
Example 2.10. Let X be any nonempty set and let E be a vector space. The set of all
functions f : X → E can be made into a vector space as follows: Given any two functions
f : X → E and g : X → E, let (f + g): X → E be defined such that
(f + g)(x) = f (x) + g(x)
4Where KI denotes the set of all functions from I to K.
2.5. LINEAR MAPS
35
for all x ∈ X, and for every λ ∈ K, let λf : X → E be defined such that
(λf )(x) = λf (x)
for all x ∈ X. The axioms of a vector space are easily verified. Now, let E = K, and let I
be the set of all nonempty subsets of X. For every S ∈ I, let fS : X → E be the function
such that fS(x) = 1 iff x ∈ S, and fS(x) = 0 iff x /
∈ S. We leave as an exercise to show that
(fS)S∈I is linearly independent.
2.5
Linear Maps
A function between two vector spaces that preserves the vector space structure is called
a homomorphism of vector spaces, or linear map. Linear maps formalize the concept of
linearity of a function. In the rest of this section, we assume that all vector spaces are over
a given field K (say R).
Definition 2.14. Given two vector spaces E and F , a linear map between E and F is a
function f : E → F satisfying the following two conditions:
f (x + y) = f (x) + f (y)
for all x, y ∈ E;
f (λx) = λf (x)
for all λ ∈ K, x ∈ E.
Setting x = y = 0 in the first identity, we get f (0) = 0. The basic property of linear
maps is that they transform linear combinations into linear combinations. Given a family
(ui)i∈I of vectors in E, given any family (λi)i∈I of scalars in K, we have
f (
λiui) =
λif (ui).
i∈I
i∈I
The above identity is shown by induction on the size of the support of the family (λiui)i∈I,
using the properties of Definition 2.14.
Example 2.11.
1. The map f :
2
2
R → R defined such that
x
= x − y
y
= x + y
is a linear map. The reader should check that it is the composition of a rotation by
√
π/4 with a magnification of ratio
2.
2. For any vector space E, the identity map id : E → E given by
id(u) = u for all u ∈ E
is a linear map. When we want to be more precise, we write idE instead of id.
36
CHAPTER 2. VECTOR SPACES, BASES, LINEAR MAPS
3. The map D : R[X] → R[X] defined such that
D(f (X)) = f (X),
where f (X) is the derivative of the polynomial f (X), is a linear map.
4. The map Φ : C([a, b]) → R given by
b
Φ(f ) =
f (t)dt,
a
where C([a, b]) is the set of continuous functions defined on the interval [a, b], is a linear
map.
5. The function −, − : C([a, b]) × C([a, b]) → R given by
b
f, g =
f (t)g(t)dt,
a
is linear in each of the variable f , g. It also satisfies the properties f, g = g, f and
f, f = 0 iff f = 0. It is an example of an inner product.
Definition 2.15. Given a linear map f : E → F , we define its image (or range) Im f = f(E),
as the set
Im f = {y ∈ F | (∃x ∈ E)(y = f(x))},
and its Kernel (or nullspace) Ker f = f −1(0), as the set
Ker f = {x ∈ E | f(x) = 0}.
Proposition 2.12. Given a linear map f : E → F , the set Im f is a subspace of F and the
set Ker f is a subspace of E. The linear map f : E → F is injective iff Ker f = 0 (where 0
is the trivial subspace {0}).
Proof. Given any x, y ∈ Im f, there are some u, v ∈ E such that x = f(u) and y = f(v),
and for all λ, µ ∈ K, we have
f (λu + µv) = λf (u) + µf (v) = λx + µy,
and thus, λx + µy ∈ Im f, showing that Im f is a subspace of F .
Given any x, y ∈ Ker f, we have f(x) = 0 and f(y) = 0, and thus,
f (λx + µy) = λf (x) + µf (y) = 0,
that is, λx + µy ∈ Ker f, showing that Ker f is a subspace of E.
First, assume that Ker f = 0. We need to prove that f (x) = f (y) implies that x = y.
However, if f (x) = f (y), then f (x) − f(y) = 0, and by linearity of f we get f(x − y) = 0.
Because Ker f = 0, we must have x − y = 0, that is x = y, so f is injective. Conversely,
assume that f is injective. If x ∈ Ker f, that is f(x) = 0, since f(0) = 0 we have f(x) =
f (0), and by injectivity, x = 0, which proves that Ker f = 0. Therefore, f is injective iff
Ker f = 0.
2.5. LINEAR MAPS
37
Since by Proposition 2.12, the image Im f of a linear map f is a subspace of F , we can
define the rank rk(f ) of f as the dimension of Im f .
A fundamental property of bases in a vector space is that they allow the definition of
linear maps as unique homomorphic extensions, as shown in the following proposition.
Proposition 2.13. Given any two vector spaces E and F , given any basis (ui)i∈I of E,
given any other family of vectors (vi)i∈I in F , there is a unique linear map f : E → F such
that f (ui) = vi for all i ∈ I. Furthermore, f is injective iff (vi)i∈I is linearly independent,
and f is surjective iff (vi)i∈I generates F .
Proof. If such a linear map f : E → F exists, since (ui)i∈I is a basis of E, every vector x ∈ E
can written uniquely as a linear combination
x =
xiui,
i∈I
and by linearity, we must have
f (x) =
xif (ui) =
xivi.
i∈I
i∈I
Define the function f : E → F , by letting
f (x) =
xivi
i∈I
for every x =
x
i∈I
iui.
It is easy to verify that f is indeed linear, it is unique by the
previous reasoning, and obviously, f (ui) = vi.
Now, assume that f is injective. Let (λi)i∈I be any family of scalars, and assume that
λivi = 0.
i∈I
Since vi = f (ui) for every i ∈ I, we have
f (
λiui) =
λif (ui) =
λivi = 0.
i∈I
i∈I
i∈I
Since f is injective iff Ker f = 0, we have
λiui = 0,
i∈I
and since (ui)i∈I is a basis, we have λi = 0 for all i ∈ I, which shows that (vi)i∈I is linearly
independent. Conversely, assume that (vi)i∈I is linearly independent. Since (ui)i∈I is a basis
of E, every vector x ∈ E is a linear combination x =
λ
i∈I
iui of (ui)i∈I . If
f (x) = f (
λiui) = 0,
i∈I
38
CHAPTER 2. VECTOR SPACES, BASES, LINEAR MAPS
then
λivi =
λif (ui) = f (
λiui) = 0,
i∈I
i∈I
i∈I
and λi = 0 for all i ∈ I