$$\sum_{m=1}^{M} W_m = 1; \qquad (2.4)$$
just remember that it is true only if each experiment definitely yields one of these M outcomes.
Second, if we have an additive function of the results,
$$f = \frac{1}{N}\sum_{m=1}^{M} N_m f_m, \qquad (2.5)$$
where fm are some definite (deterministic) coefficients, the statistical average (also called the expectation value) of the function is naturally defined as
2 The most popular counter-example is an energy-conserving system. Consider, for example, a system of particles placed in a potential that is a quadratic form of its coordinates. The theory of oscillations tells us (see, e.g., CM Sec. 6.2) that this system is equivalent to a set of non-interacting harmonic oscillators. Each of these oscillators conserves its own initial energy Ej forever, so that the statistics of N measurements of one such system may differ from that of N different systems with a random distribution of Ej, even if the total energy of the system, E = Σj Ej, is the same. Such non-ergodicity, however, is a rather feeble phenomenon and is readily destroyed by any of many mechanisms, such as weak interaction with the environment (leading, in particular, to oscillation damping), potential anharmonicity (see, e.g., CM Chapter 5), and chaos (CM Chapter 9), all of them strongly enhanced by increasing the number of particles in the system, i.e. the number of its degrees of freedom. This is why an overwhelming part of real-life systems are ergodic; for the readers interested in non-ergodic exotics, I can recommend the monograph by V. Arnold and A. Avez, Ergodic Problems of Classical Mechanics, Addison-Wesley, 1989.
3 Here, and everywhere in this series, angle brackets ⟨…⟩ mean averaging over a statistical ensemble, which is generally different from averaging over time – as will be the case in quite a few examples considered below.
$$\langle f \rangle \equiv \lim_{N\to\infty} \frac{1}{N}\sum_{m=1}^{M} N_m f_m, \qquad (2.6)$$
so that using Eq. (3) we get
Expectation value via probabilities:
$$\langle f \rangle = \sum_{m=1}^{M} W_m f_m. \qquad (2.7)$$
Notice that Eq. (4) may be considered as a particular form of this general result, with all fm = 1.
Next, the spectrum of possible experimental outcomes is frequently continuous for all practical purposes. (Think, for example, about the set of positions of the marks left by bullets fired into a target from afar.) The above formulas may be readily generalized to this case; let us start from the simplest situation when all different outcomes may be described by just one continuous scalar variable q – which replaces the discrete index m in Eqs. (1)-(7). The basic relation for this case is the self-evident fact that the probability dW of having an outcome within a small interval dq near some point q is proportional to the magnitude of that interval:
$$dW = w(q)\,dq, \qquad (2.8)$$
where w(q) is some function of q, which does not depend on dq. This function is called the probability density. Now all the above formulas may be recast by replacing the probabilities Wm with the products (8), and the summation over m, with the integration over q. In particular, instead of Eq. (4), the normalization condition now becomes
$$\int w(q)\,dq = 1, \qquad (2.9)$$
where the integration should be extended over the whole range of possible values of q. Similarly, instead of the discrete values fm participating in Eq. (5), it is natural to consider a function f( q). Then instead of Eq. (7), the expectation value of the function may be calculated as
Expectation value via probability density:
$$\langle f \rangle = \int w(q)\, f(q)\, dq. \qquad (2.10)$$
It is also straightforward to generalize these formulas to the case of more variables. For example, the state of a classical particle with three degrees of freedom may be fully described by the probability density w defined in the 6D space of its generalized radius-vector q and momentum p. As a result, the expectation value of a function of these variables may be expressed as a 6D integral
$$\langle f \rangle = \int w(\mathbf{q}, \mathbf{p})\, f(\mathbf{q}, \mathbf{p})\, d^3q\, d^3p. \qquad (2.11)$$
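To make Eqs. (9)-(11) concrete, here is a minimal numerical sketch (my illustration, not part of the formal argument), which takes a Gaussian probability density w(q) – an arbitrary choice – checks its normalization per Eq. (9), and computes the average of the sample function f(q) = q² per Eq. (10), assuming only NumPy:

```python
import numpy as np

# Sample probability density: a Gaussian with zero mean and unit variance
# (an arbitrary illustrative choice).
q = np.linspace(-10.0, 10.0, 200_001)
w = np.exp(-q**2 / 2) / np.sqrt(2 * np.pi)

# Normalization condition, Eq. (2.9): the integral of w(q) equals 1.
print(f"normalization = {np.trapz(w, q):.6f}")   # ~1.000000

# Expectation value, Eq. (2.10), for the sample function f(q) = q^2.
print(f"<f> = {np.trapz(w * q**2, q):.6f}")      # ~1.0 for the unit Gaussian
```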
Some systems considered in this course consist of components whose quantum properties cannot be ignored, so let us discuss how ⟨f⟩ should be calculated in this case. If by fm we mean measurement results, then Eq. (7) (and its generalizations) remains valid, but since these numbers themselves may be affected by the intrinsic quantum-mechanical uncertainty, it may make sense to have a bit deeper look into this situation. Quantum mechanics tells us4 that the most general expression for the expectation value of an observable f in a certain ensemble of macroscopically similar systems is

$$\langle f \rangle = \sum_{m, m'} W_{mm'}\, f_{m'm} \equiv \mathrm{Tr}(Wf). \qquad (2.12)$$
4 See, e.g., QM Sec. 7.1.
Here f_{mm'} are the matrix elements of the quantum-mechanical operator f̂ corresponding to the observable f, in a full basis of orthonormal states m:

$$f_{mm'} = \langle m|\hat{f}|m'\rangle, \qquad (2.13)$$
while the coefficients W_{mm'} are the elements of the so-called density matrix W, which represents, in the same basis, the density operator Ŵ describing properties of this ensemble. Eq. (12) is evidently more general than Eq. (7), and is reduced to it only if the density matrix is diagonal:

$$W_{mm'} = W_m\,\delta_{mm'} \qquad (2.14)$$

(where δ_{mm'} is the Kronecker delta symbol), when the diagonal elements Wm play the role of probabilities of the corresponding states.
Thus formally, the largest difference between the quantum and classical description is the presence, in Eq. (12), of the off-diagonal elements of the density matrix. They have the largest values in the pure (also called "coherent") ensemble, in which the state of the system may be described with state vectors, e.g., the ket-vector

$$|\alpha\rangle = \sum_m \alpha_m |m\rangle, \qquad (2.15)$$
where α_m are some (generally, complex) coefficients. In this case, the density matrix elements are merely

$$W_{mm'} = \alpha_m \alpha_{m'}^*, \qquad (2.16)$$
so that the off-diagonal elements are of the same order as the diagonal elements. For example, in the very important particular case of a two-level system, the pure-state density matrix is

$$W = \begin{pmatrix} \alpha_1\alpha_1^* & \alpha_1\alpha_2^* \\ \alpha_2\alpha_1^* & \alpha_2\alpha_2^* \end{pmatrix}, \qquad (2.17)$$
so that the product of its off-diagonal components is as large as that of the diagonal components.
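For the reader who prefers numbers, here is a small sketch (my illustration; the amplitudes and the observable are arbitrary choices) that builds the pure-state density matrix (17) and evaluates Eq. (12) for a two-level system:

```python
import numpy as np

# Pure state of a two-level system: |alpha> = a1|1> + a2|2>
# (an arbitrary normalized example).
a = np.array([np.sqrt(0.7), np.sqrt(0.3)], dtype=complex)

# Pure-state density matrix, Eqs. (2.16)-(2.17): W_mm' = a_m * conj(a_m').
W = np.outer(a, a.conj())
print(np.round(W.real, 3))   # off-diagonal elements comparable to diagonal ones

# A sample observable in the same basis (the Pauli matrix sigma_x,
# again an arbitrary choice), averaged via Eq. (2.12): <f> = Tr(W f).
f = np.array([[0, 1], [1, 0]], dtype=complex)
print(f"<f> = {np.trace(W @ f).real:.3f}")   # 2*sqrt(0.21) ~ 0.917
```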
In the most important basis of stationary states, i.e. the eigenstates of the system's time-independent Hamiltonian, the coefficients α_m oscillate in time as5

$$\alpha_m(t) = \alpha_m(0)\exp\left\{-i\frac{E_m}{\hbar}t\right\} \equiv |\alpha_m|\exp\left\{-i\frac{E_m}{\hbar}t + i\varphi_m\right\}, \qquad (2.18)$$
where Em are the corresponding eigenenergies, and φ_m are constant phase shifts. This means that while the diagonal terms of the density matrix (16) remain constant, its off-diagonal components are oscillating functions of time:

$$W_{mm'} = \alpha_m\alpha_{m'}^* = |\alpha_m||\alpha_{m'}|\exp\left\{-i\frac{E_m - E_{m'}}{\hbar}t\right\}\exp\{i(\varphi_m - \varphi_{m'})\}. \qquad (2.19)$$
5 Here I use the Schrödinger picture of quantum dynamics, in which the matrix elements f_{nn'} representing quantum-mechanical operators do not evolve in time. The final results of this discussion do not depend on the particular picture – see, e.g., QM Sec. 4.6.
Due to the extreme smallness of the Planck constant (on the human scale of things), minuscule random perturbations of eigenenergies are equivalent to substantial random changes of the phase multipliers, so that the time average of any off-diagonal matrix element tends to zero. Moreover, even if our statistical ensemble consists of systems with exactly the same Em, but different values of φ_m (which are typically hard to control at the initial preparation of the system), the average values of all W_{mm'} (with m ≠ m') vanish again.
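This dephasing argument is easy to model numerically. The following sketch (my illustration; the magnitudes are an arbitrary choice) averages the pure-state density matrices (16) over an ensemble of two-level systems with fixed |α_m| but random, uncontrolled phases:

```python
import numpy as np

rng = np.random.default_rng(0)
n_systems = 100_000

# Fixed magnitudes |alpha_1|, |alpha_2| (arbitrary normalized example),
# but random phases - one independent pair per ensemble member.
mags = np.array([np.sqrt(0.7), np.sqrt(0.3)])
phases = rng.uniform(0, 2 * np.pi, size=(n_systems, 2))
alphas = mags * np.exp(1j * phases)          # shape (n_systems, 2)

# Average the pure-state density matrices (16) over the ensemble.
W_avg = np.einsum('sm,sn->mn', alphas, alphas.conj()) / n_systems

print(np.round(W_avg, 3))
# Diagonal elements -> |alpha_m|^2 = 0.7, 0.3 (the probabilities W_m);
# off-diagonal elements -> 0: the "classical mixture" limit.
```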
This is why, besides some very special cases, typical statistical ensembles of quantum particles are far from being pure, and in most cases (certainly including the thermodynamic equilibrium), a good approximation for their description is given by the opposite limit of the so-called classical mixture, in which all off-diagonal matrix elements of the density matrix equal zero, and its diagonal elements Wmm are merely the probabilities Wm of the corresponding eigenstates. In this case, for the observables compatible with energy, Eq. (12) is reduced to Eq. (7), with fm being the eigenvalues of the variable f, so that we may base our further discussion on this key relation and its continuous extensions (10)-(11).
2.2. Microcanonical ensemble and distribution
Now we move to the now-standard approach to statistical mechanics, based on the three statistical ensembles introduced in the 1870s by Josiah Willard Gibbs.6 The most basic of them is the so-called microcanonical statistical ensemble,7 defined as a set of macroscopically similar closed (isolated) systems with virtually the same total energy E. Since in quantum mechanics the energy of a closed system is quantized, in order to make the forthcoming discussion suitable for quantum systems as well, it is convenient to include in the ensemble all systems with energies Em within a relatively narrow interval ΔE << E (see Fig. 1) that is nevertheless much larger than the average distance δE between the energy levels, so that the number M of different quantum states within the interval ΔE is large, M >> 1.
Such a choice of ΔE is only possible if δE << E; however, the reader should not worry too much about this condition, because the most important applications of the microcanonical ensemble are for very large systems (and/or very high energies) when the energy spectrum is very dense.8
Fig. 2.1. A very schematic image of the microcanonical ensemble. (Actually, the ensemble deals with quantum states rather than energy levels. An energy level may be degenerate, i.e. correspond to several states.)
This ensemble serves as the basis for the formulation of the postulate which is most frequently called the microcanonical distribution (or, more adequately, "the main statistical postulate" or "the main statistical hypothesis"): in the thermodynamic equilibrium of a microcanonical ensemble, all its states have equal probabilities,
6 Personally, I believe that the genius of J. Gibbs, praised by Albert Einstein as the "greatest mind in the American history", is still insufficiently recognized, and agree with R. Millikan that Gibbs "did for statistical mechanics and thermodynamics what […] Maxwell did for electrodynamics".
7 The terms "microcanonical", as well as "canonical" (see Sec. 4 below), are apparently due to Gibbs, and I was unable to find out his motivation for the former name. ("Canonical" in the sense of "standard" or "common" is quite appropriate, but why "micro"? Perhaps to reflect the smallness of ΔE?)
8 Formally, the main result of this section, Eq. (20), is valid for any M (including M = 1); it is just less informative for small M – and trivial for M = 1.
Microcanonical distribution:
$$W_m = \frac{1}{M} = \text{const}. \qquad (2.20)$$
Though in some constructs of statistical mechanics this equality is derived from other axioms, which look more plausible to their authors, I believe that Eq. (20) may be taken as the starting point of statistical physics, supported "just" by the compliance of all its corollaries with experimental observations.
Note that the postulate (20) is closely related to the macroscopic irreversibility of the systems that are microscopically virtually reversible (closed): if such a system was initially in a certain state, its time evolution with just minuscule interactions with the environment (which is necessary for reaching the thermodynamic equilibrium) eventually leads to the uniform distribution of its probability among all states with essentially the same energy. Each of these states is not “better” than the initial one; rather, in a macroscopic system, there are just so many of these states that the chance to find the system in the initial state is practically nil – again, think about the ink drop diffusion into a glass of water.9
Now let us find a suitable definition of the entropy S of a microcanonical ensemble’s member –
for now, in the thermodynamic equilibrium only. This was done in 1877 by another giant of statistical physics, Ludwig Eduard Boltzmann – on the basis of the prior work by James Clerk Maxwell on the kinetic theory of gases – see Sec. 3.1 below. In the present-day terminology, since S is a measure of disorder, it should be related to the amount of information10 lost when the system went irreversibly from the full order to the full disorder, i.e. from one definite state to the microcanonical distribution (20). In an even more convenient formulation, this is the amount of information necessary to find the exact state of your system in a microcanonical ensemble.
In the information theory, the amount of information necessary to make a definite choice between two options with equal probabilities (Fig. 2a) is defined as
$$I(2) \equiv \log_2 2 = 1. \qquad (2.21)$$
This unit of information is called a bit.
Fig. 2.2. "Logarithmic trees" of binary decisions for choosing between (a) M = 2, and (b) M = 4 opportunities with equal probabilities.
9 Though I have to move on, let me note that the microcanonical distribution (20) is a very nontrivial postulate, and my advice to the reader is to find some time to give additional thought to this keystone of the whole building of statistical mechanics.
10 I will rely on the reader's common sense and intuitive understanding of what information is, because even in the formal information theory, this notion is essentially postulated – see, e.g., the wonderfully clear text by J. Pierce, An Introduction to Information Theory, Dover, 1980.
Now, if we need to make a choice between four equally probable opportunities, it can be made in two similar steps (Fig. 2b), each requiring one bit of information, so that the total amount of information necessary for the choice is
$$I(4) = 2I(2) = 2 \equiv \log_2 4. \qquad (2.22)$$
An obvious extension of this process to the choice between M = 2^m states gives

$$I(M) = mI(2) = m \equiv \log_2 M. \qquad (2.23)$$
This measure, if extended naturally to any integer M, is quite suitable for the definition of entropy at equilibrium, with the only difference that, following tradition, the binary logarithm is replaced with the natural one:11
$$S = \ln M. \qquad (2.24a)$$
Using Eq. (20), we may recast this definition in its most frequently used form
Entropy in equilibrium:
$$S = \ln\frac{1}{W_m} \equiv -\ln W_m. \qquad (2.24b)$$
(Again, please note that Eq. (24) is valid in thermodynamic equilibrium only!)
Note that Eq. (24) satisfies the major properties of the entropy discussed in thermodynamics.
First, it is a unique characteristic of the disorder. Indeed, according to Eq. (20), M (at fixed E) is the only possible measure characterizing the microcanonical distribution, and so is its unique function ln M.
This function also satisfies another thermodynamic requirement to the entropy, of being an extensive variable. Indeed, for several independent systems, the joint probability of a certain state is just a product of the partial probabilities, and hence, according to Eq. (24), their entropies just add up.
Now let us see whether Eqs. (20) and (24) are compatible with the 2nd law of thermodynamics.
For that, we need to generalize Eq. (24) for S to an arbitrary state of the system (generally, out of thermodynamic equilibrium), with an arbitrary set of state probabilities Wm. Let us first recognize that M
in Eq. (24) is just the number of possible ways to commit a particular system to a certain state m (m = 1, 2,…, M), in a statistical ensemble where each state is equally probable. Now let us consider a more general ensemble, still consisting of a large number N >> 1 of similar systems, but with a certain number Nm = WmN >> 1 of systems in each of M states, with the factors Wm not necessarily equal. In this case, the evident generalization of Eq. (24) is that the entropy SN of the whole ensemble is

$$S_N \equiv \ln M(N_1, N_2, \ldots), \qquad (2.25)$$
where M(N₁, N₂,…) is the number of ways to commit a particular system to a certain state m while keeping all numbers Nm fixed. This number M(N₁, N₂,…) is clearly equal to the number of ways to distribute N distinct balls between M different boxes, with the fixed number Nm of balls in each box, but in no particular order within it.
11 This is of course just a change of a constant factor: S(M) = ln M = ln2 · log₂M = ln2 · I(M) ≈ 0.693 I(M). A review of Chapter 1 shows that nothing in thermodynamics prevents us from choosing such a constant coefficient arbitrarily, with the corresponding change of the temperature scale – see Eq. (1.9). In particular, in the SI units, where Eq. (24b) becomes S = –kB ln Wm, one bit of information corresponds to the entropy change ΔS = kB ln2 ≈ 0.693 kB ≈ 0.965×10⁻²³ J/K. By the way, the formula "S = k log W" is engraved on L. Boltzmann's tombstone in Vienna.
Comparing this description with the definition of the so-called multinomial coefficients,12 we get
$$M(N_1, N_2, \ldots) = {}^N C_{N_1, N_2, \ldots, N_M} \equiv \frac{N!}{N_1!\,N_2!\cdots N_M!}, \qquad \text{with } N = \sum_{m=1}^{M} N_m. \qquad (2.26)$$
To simplify the resulting expression for S_N, we can use the famous Stirling formula, in its crudest, de Moivre's form,13 whose accuracy is suitable for most purposes of statistical physics:

$$\ln(N!) \approx N(\ln N - 1). \qquad (2.27)$$
When applied to our current problem, the Stirling formula gives the following average entropy per system,14

$$S \equiv \frac{S_N}{N} = \frac{1}{N}\left[\ln(N!) - \sum_{m=1}^{M}\ln(N_m!)\right] \approx \frac{1}{N}\left[N(\ln N - 1) - \sum_{m=1}^{M} N_m(\ln N_m - 1)\right] = -\sum_{m=1}^{M}\frac{N_m}{N}\ln\frac{N_m}{N}, \qquad (2.28)$$
and since this result is only valid in the limit Nm → ∞ anyway, we may use Eq. (2) to represent it as

Entropy out of equilibrium:
$$S = -\sum_{m=1}^{M} W_m \ln W_m = \sum_{m=1}^{M} W_m \ln\frac{1}{W_m}. \qquad (2.29)$$
This extremely important result15 may be interpreted as the average of the entropy values given by Eq. (24), weighted with the specific probabilities Wm, per the general formula (7).16
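A tiny numerical illustration of Eq. (29) (my addition): computing S for a few distributions over M = 4 states shows the pattern analyzed below – any deviation from uniformity lowers the entropy:

```python
import numpy as np

def entropy(W):
    """Statistical entropy, Eq. (2.29): S = -sum W_m ln W_m (with 0 ln 0 := 0)."""
    W = np.asarray(W, dtype=float)
    nz = W > 0
    return -np.sum(W[nz] * np.log(W[nz]))

M = 4
print(entropy([0.25, 0.25, 0.25, 0.25]))  # uniform: ln 4 ~ 1.386 (the maximum)
print(entropy([0.5, 0.5, 0.0, 0.0]))      # M' = 2 subgroup: ln 2 ~ 0.693
print(entropy([1.0, 0.0, 0.0, 0.0]))      # one definite state: S = 0 (full order)
print(np.log(M))                          # the upper bound ln M
```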
Now let us find what distribution of probabilities Wm provides the largest value of the entropy (29). The answer is almost evident from a good glance at Eq. (29). For example, if for a subgroup of M' ≤ M states the coefficients Wm are constant and equal to 1/M', while Wm = 0 for all other states, all M' non-zero terms in the sum (29) are equal to each other, so that
$$S = M'\cdot\frac{1}{M'}\ln M' = \ln M', \qquad (2.30)$$
and the closer M' is to its maximum value M, the larger S. Hence, the maximum of S is reached at the uniform distribution, whose entropy is given by Eq. (24).
12 See, e.g., MA Eq. (2.3). Despite the intimidating name, Eq. (26) may be very simply derived. Indeed, N! is just the number of all possible permutations of N balls, i.e. the ways to place them in certain positions – say, inside M boxes. Now, to take into account that the particular order of the balls in each box is not important, that number should be divided by all the numbers Nm! of possible permutations of balls within each box – that's it.
13 See, e.g., MA Eq. (2.10).
14 Strictly speaking, I should use the notation ⟨S⟩ here. However, following the style accepted in thermodynamics, I will drop the averaging signs until we really need them, to avoid confusion. Again, this shorthand is not too bad because the relative fluctuations of entropy (as those of any macroscopic variable) are very small at N >> 1.
15 With the replacement of ln Wm with log2 Wm (i.e. division of both sides by ln2), Eq. (29) becomes the famous Shannon (or “Boltzmann-Shannon”) formula for the average information I per symbol in a long communication string using M different symbols, with probability Wm each.
16 In some textbooks, this interpretation is even accepted as the derivation of Eq. (29); however, it is evidently less strict than the one outlined above.
In order to prove this important fact more strictly, let us find the maximum of the function given by Eq. (29). If its arguments W₁, W₂, …, WM were completely independent, this could be done by finding the point (in the M-dimensional space of the coefficients Wm) where all partial derivatives ∂S/∂Wm equal zero. However, since the probabilities are constrained by the condition (4), the differentiation has to be carried out more carefully, taking into account this interdependence:
$$\left[\frac{\partial S(W_1, W_2, \ldots)}{\partial W_m}\right]_{\rm cond} = \frac{\partial S}{\partial W_m} + \sum_{m'\neq m}\frac{\partial S}{\partial W_{m'}}\frac{\partial W_{m'}}{\partial W_m}. \qquad (2.31)$$
At the maximum of the function S, all such expressions should be equal to zero simultaneously. This condition yields ∂S/∂Wm = λ, where the so-called Lagrange multiplier λ is independent of m. Indeed, at such a point Eq. (31) becomes
$$\left[\frac{\partial S(W_1, W_2, \ldots)}{\partial W_m}\right]_{\rm cond} = \lambda\left(\frac{\partial W_m}{\partial W_m} + \sum_{m'\neq m}\frac{\partial W_{m'}}{\partial W_m}\right) = \lambda\,\frac{\partial}{\partial W_m}\sum_{m'} W_{m'} = \lambda\,\frac{\partial(1)}{\partial W_m} = 0. \qquad (2.32)$$
For our particular expression (29), the condition ∂S/∂Wm = λ yields

$$\frac{\partial S}{\partial W_m} \equiv -\frac{d}{dW_m}\left(W_m\ln W_m\right) = -\ln W_m - 1 = \lambda. \qquad (2.33)$$
The last equality holds for all m (and hence the entropy reaches its maximum value) only if Wm is independent of m. Thus the entropy (29) indeed reaches its maximum value (24) at equilibrium.
To summarize, we see that the statistical definition (24) of entropy does fit all the requirements imposed on this variable by thermodynamics. In particular, we have been able to prove the 2nd law of thermodynamics using that definition together with the fundamental postulate (20).
Now let me discuss one possible point of discomfort with that definition: the values of M, and hence Wm, depend on the accepted energy interval ΔE of the microcanonical ensemble, for whose choice no exact guidance is offered. However, if the interval ΔE contains many states, M >> 1, as was assumed before, then with a very small relative error (vanishing in the limit M → ∞), M may be represented as

$$M = g(E)\,\Delta E, \qquad (2.34)$$
where g(E) is the density of states of the system:

$$g(E) \equiv \frac{d\Sigma(E)}{dE}, \qquad (2.35)$$
Σ(E) being the total number of states with energies below E. (Note that the average interval δE between energy levels, mentioned at the beginning of this section, is just ΔE/M = 1/g(E).) Plugging Eq. (34) into Eq. (24), we get
$$S = \ln M = \ln g(E) + \ln\Delta E, \qquad (2.36)$$
so that the only effect of a particular choice of ΔE is an offset of the entropy by a constant, and in Chapter 1 we have seen that such a constant shift does not affect any measurable quantity. Of course, Eq. (34), and hence Eq. (36), are only precise in the limit when the density of states g(E) is so large that the range available for the appropriate choice of ΔE:
$$\frac{1}{g(E)} << \Delta E << E, \qquad (2.37)$$

is sufficiently broad: g(E)·ΔE = ΔE/δE >> 1.
In order to get some feeling of the functions g(E) and S(E) and the feasibility of the condition (37), and also to see whether the microcanonical distribution may be directly used for calculations of thermodynamic variables in particular systems, let us apply it to a microcanonical ensemble of many sets of N >> 1 independent, similar harmonic oscillators with frequency ω. (Please note that the requirement of a virtually fixed energy is applied, in this case, to the total energy EN of each set of oscillators, rather than to the energy E of a single oscillator – which may be virtually arbitrary, though certainly much smaller than EN ~ NE >> E.) Basic quantum mechanics tells us17 that the eigenenergies of such an oscillator form a discrete, equidistant spectrum:
$$E_m = \hbar\omega\left(m + \frac{1}{2}\right), \qquad \text{where } m = 0, 1, 2, \ldots \qquad (2.38)$$
If ω is kept constant, the ground-state energy ħω/2 does not contribute to any thermodynamic properties of the system,18 so that for the sake of simplicity we may take that point as the energy origin, and replace Eq. (38) with Em = mħω. Let us carry out an approximate analysis of the system for the case when its average energy per oscillator,
$$E \equiv \frac{E_N}{N}, \qquad (2.39)$$
is much larger than the energy quantum ħω.
For one oscillator, the number of states with energy ε₁ below a certain value E₁ >> ħω is evidently Σ(E₁) ≈ E₁/ħω ≡ (E₁/ħω)/1! (Fig. 3a). For two oscillators, all possible values of the total energy (ε₁ + ε₂) below some level E₂ correspond to the points of a 2D square grid within the right triangle shown in Fig. 3b, giving Σ(E₂) ≈ (1/2)(E₂/ħω)² ≡ (E₂/ħω)²/2!. For three oscillators, the possible values of the total energy (ε₁ + ε₂ + ε₃) correspond to those points of the 3D cubic grid that fit inside the right pyramid shown in Fig. 3c, giving Σ(E₃) ≈ (1/3)[(1/2)(E₃/ħω)³] ≡ (E₃/ħω)³/3!, etc.
Fig. 2.3. Calculating the functions Σ(E_N) for systems of (a) one, (b) two, and (c) three harmonic oscillators.
17 See, e.g., QM Secs. 2.9 and 5.4.
18 Let me hope that the reader knows that the ground-state energy is experimentally measurable – for example, using the famous Casimir effect – see, e.g., QM Sec. 9.1. (In Sec. 5.5 below I will briefly discuss another method of experimental observation of that energy.)
An evident generalization of these formulas to arbitrary N gives the number of states19

$$\Sigma(E_N) = \frac{1}{N!}\left(\frac{E_N}{\hbar\omega}\right)^N. \qquad (2.40)$$
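The counting behind Eq. (40) may be verified by brute force (an added numerical aside; the parameter values are arbitrary): enumerate all sets {m₁, …, m_N} with m₁ + … + m_N ≤ E_N/ħω, and compare the count with (E_N/ħω)^N/N!:

```python
import math
from itertools import product

def sigma_exact(n_osc, e_max):
    """Count the states of n_osc oscillators with total energy m1+...+mN <= e_max
    (energies measured in units of the quantum h_bar*omega)."""
    return sum(1 for ms in product(range(e_max + 1), repeat=n_osc)
               if sum(ms) <= e_max)

for n_osc, e_max in [(1, 50), (2, 50), (3, 30)]:
    approx = e_max**n_osc / math.factorial(n_osc)   # Eq. (2.40)
    print(f"N = {n_osc}, E/hw = {e_max}: exact = {sigma_exact(n_osc, e_max)}, "
          f"Eq. (40) gives {approx:.0f}")
# The relative difference scales as ~N(N+1)/(2E/hw), i.e. vanishes in the
# limit E >> N*h_bar*omega assumed in the text.
```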
Differentiating Eq. (40) over the energy, we get

$$g(E_N) \equiv \frac{d\Sigma(E_N)}{dE_N} = \frac{1}{(N-1)!}\frac{E_N^{N-1}}{(\hbar\omega)^N}, \qquad (2.41)$$
so that

$$S_N(E_N) \equiv \ln g(E_N) + \text{const} = -\ln[(N-1)!] + (N-1)\ln E_N - N\ln(\hbar\omega) + \text{const}. \qquad (2.42)$$
For N >> 1 we can ignore the difference between N and (N – 1) in both instances, and use the Stirling formula (27) to simplify this result as

$$S_N(E) \approx \text{const} + N\left(\ln\frac{E_N}{N\hbar\omega} + 1\right) \approx N\ln\frac{E_N}{N\hbar\omega} \equiv N\ln\frac{E}{\hbar\omega}. \qquad (2.43)$$
(The second, approximate step is only valid at very high E/ħω ratios, when the logarithm in Eq. (43) is substantially larger than 1.) Returning for a second to the density of states, we see that in the limit N → ∞, it is exponentially large:

$$g(E_N) \approx e^{S_N} = \left(\frac{E}{\hbar\omega}\right)^N, \qquad (2.44)$$
so that the conditions (37) may be indeed satisfied within a very broad range of Δ E.
Now we can use Eq. (43) to find all thermodynamic properties of the system, though only in the limit E >> ħω. Indeed, according to thermodynamics, if the system's volume and the number of particles in it are fixed, the derivative dS/dE is nothing else than the reciprocal temperature in thermal equilibrium – see Eq. (1.9). In our current case, we imply that the harmonic oscillators are distinct, for example by their spatial positions. Hence, even if we can speak of some volume of the system, it is certainly fixed.20 Differentiating Eq. (43) over the energy, we get
Classical oscillator: average energy:
$$\frac{1}{T} \equiv \frac{dS_N}{dE_N} = \frac{N}{E_N} \equiv \frac{1}{E}. \qquad (2.45)$$
Reading this result backward, we see that the average energy E of a harmonic oscillator equals T (i.e. kB·TK in the SI units). At this point, the first-time student of thermodynamics should be very much relieved to see that the counter-intuitive thermodynamic definition (1.9) of temperature does indeed correspond to what we all have known about this notion from our kindergarten physics courses.
The result (45) may be readily generalized. Indeed, in quantum mechanics, a harmonic oscillator with eigenfrequency ω may be described by the Hamiltonian operator
19 The coefficient 1/ N! in this formula has the geometrical meaning of the (hyper)volume of the N-dimensional right pyramid with unit sides.
20 For the same reason, the notion of pressure P in such a system is not clearly defined, and neither are any thermodynamic potentials but E and F.
$$\hat{H} = \frac{\hat{p}^2}{2m} + \frac{\kappa\hat{q}^2}{2}, \qquad (2.46)$$
where q is some generalized coordinate, p is the corresponding generalized momentum, m is the oscillator's mass,21 and κ is the spring constant, so that ω = (κ/m)^{1/2}. Since in the thermodynamic equilibrium the density matrix is always diagonal in the basis of stationary states m (see Sec. 1 above), the quantum-mechanical averages of the kinetic and potential energies may be found from Eq. (7):
$$\left\langle\frac{p^2}{2m}\right\rangle = \sum_{m=0}^{\infty} W_m \langle m|\frac{\hat{p}^2}{2m}|m\rangle, \qquad \left\langle\frac{\kappa q^2}{2}\right\rangle = \sum_{m=0}^{\infty} W_m \langle m|\frac{\kappa\hat{q}^2}{2}|m\rangle, \qquad (2.47)$$
where Wm is the probability to occupy the mth energy level, and the bra- and ket-vectors describe the stationary state corresponding to that level.22 However, both classical and quantum mechanics teach us that for any m, the bra-ket expressions under the sums in Eqs. (47), which represent the average kinetic and potential energies of the oscillator on its mth energy level, are equal to each other, and hence each of them is equal to Em/2. Hence, even though we do not know the probability distribution Wm yet (it will be calculated in Sec. 5 below), we may conclude that in the "classical limit" T >> ħω,

Equipartition theorem:
$$\left\langle\frac{p^2}{2m}\right\rangle = \left\langle\frac{\kappa q^2}{2}\right\rangle = \frac{T}{2}. \qquad (2.48)$$
Now let us consider a system with an arbitrary number of degrees of freedom, described by a more general Hamiltonian:23
$$\hat{H} = \sum_j \hat{H}_j, \qquad \text{with } \hat{H}_j = \frac{\hat{p}_j^2}{2m_j} + \frac{\kappa_j\hat{q}_j^2}{2}, \qquad (2.49)$$
with (generally, different) frequencies ωj = (κj/mj)^{1/2}. Since the "modes" (effective harmonic oscillators) contributing to this Hamiltonian are independent, the result (48) is valid for each of the modes. This is the famous equipartition theorem: at thermal equilibrium with T >> ħωj, the average energy of each so-called half-degree of freedom (which is defined as any variable, either pj or qj, giving a quadratic contribution to the system's Hamiltonian) is equal to T/2.24 In particular, for each of the three Cartesian component contributions to the kinetic energy of a free-moving particle, this theorem is valid for any temperature, because such components may be considered as 1D harmonic oscillators with vanishing potential energy, i.e. ωj = 0, so that the condition T >> ħωj is fulfilled at any temperature.
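The equipartition theorem is also easy to test numerically (my illustration, not part of the argument; it uses the Gibbs weight exp{–H/T}, to be derived in Sec. 4 below, and arbitrary parameter values): Metropolis sampling of a single oscillator's phase space gives T/2 for each quadratic term of the Hamiltonian.

```python
import numpy as np

rng = np.random.default_rng(1)
T, mass, kappa = 1.3, 2.0, 5.0               # arbitrary illustrative values

def H(p, q):
    return p**2 / (2 * mass) + kappa * q**2 / 2

# Metropolis sampling of the equilibrium weight exp{-H/T}.
p, q = 0.0, 0.0
kin, pot = [], []
for step in range(200_000):
    p_new = p + rng.normal(0, 1.0)
    q_new = q + rng.normal(0, 1.0)
    if rng.random() < np.exp(-(H(p_new, q_new) - H(p, q)) / T):
        p, q = p_new, q_new
    if step > 10_000:                         # discard the equilibration stage
        kin.append(p**2 / (2 * mass))
        pot.append(kappa * q**2 / 2)

print(f"<p^2/2m> = {np.mean(kin):.3f}")       # both ~ T/2 = 0.65
print(f"<kq^2/2> = {np.mean(pot):.3f}")
print(f"T/2      = {T / 2:.3f}")
```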
21 I am using this fancy font for the mass to avoid any chance of its confusion with the state number.
22 Note again that while we have committed the energy EN of N oscillators to be fixed (to apply Eq. (36), valid only for a microcanonical ensemble at thermodynamic equilibrium), the single oscillator's energy E in our analysis may be arbitrary – within the limits ħω << E << EN ~ NT.
23 As a reminder, the Hamiltonian of any system whose classical Lagrangian function is an arbitrary quadratic form of its generalized coordinates and the corresponding generalized velocities, may be brought to the form (49) by an appropriate choice of “normal coordinates” qj which are certain linear combinations of the original coordinates – see, e.g., CM Sec. 6.2.
24 This also means that in the classical limit, the heat capacity of a system is equal to one-half of the number of its half-degrees of freedom (in the SI units, multiplied by k B).
I believe that this case study of harmonic oscillator systems was a fair illustration of both the strengths and the weaknesses of the microcanonical ensemble approach.25 On one hand, we could readily calculate virtually everything we wanted in the classical limit T >> ħω, but calculations for an arbitrary T ~ ħω, though possible, are rather unpleasant, because for that, all the vertical steps of the function Σ(E_N) have to be carefully counted. In Sec. 4, we will see that other statistical ensembles are much more convenient for such calculations.
Let me conclude this section with a short notice on deterministic classical systems with just a few degrees of freedom (and even simpler mathematical objects called "maps") that may exhibit essentially disordered behavior, called the deterministic chaos.26 Such a chaotic system may be approximately characterized by an entropy defined similarly to Eq. (29), where Wm are the probabilities to find it in different small regions of the phase space, at well-separated small time intervals. On the other hand, one can use an expression slightly more general than Eq. (29) to define the so-called Kolmogorov (or "Kolmogorov-Sinai") entropy K that characterizes the speed of loss of the information about the initial state of the system, and hence what is called the "chaos depth". In the definition of K, the sum over m is replaced with the summation over all possible permutations {m} = m₀, m₁, …, m_{N–1} of small space regions, and Wm is replaced with W{m}, the probability of finding the system in the corresponding regions m at the time moments tm = mτ, in the limit τ → 0, with Nτ = const. For chaos in the simplest objects, 1D maps, K is equal to the Lyapunov exponent λ > 0.27 For systems of higher dimensionality, which are characterized by several Lyapunov exponents λ, the Kolmogorov entropy is equal to the phase-space average of the sum of all positive λ. These facts provide a much more practicable way of (typically, numerical) calculation of the Kolmogorov entropy than the direct use of its definition.28
2.3. Maxwell’s Demon, information, and computation
Before proceeding to other statistical distributions, I would like to make a detour to address one more popular concern about Eq. (24) – the direct relation between entropy and information. Some physicists are still uneasy with entropy being nothing else than the (deficit of) information, though to the best of my knowledge, nobody has yet been able to suggest any experimentally verifiable difference between these two notions. Let me give one example of their direct relation.29 Consider a cylinder containing just one molecule (considered as a point particle), and separated into two halves by a movable partition with a door that may be opened and closed at will, at no energy cost – see Fig. 4a. If the door is open and the system is in thermodynamic equilibrium, we do not know on which side of the partition the molecule is. Here the disorder, i.e. the entropy has the largest value, and there is no way to get, from a large ensemble of such systems in equilibrium, any useful mechanical energy.
25 The reader is strongly urged to solve Problem 2, whose task is to do a similar calculation for another key (“two-level”) physical system, and compare the results.
26 See, e.g., CM Chapter 9 and literature therein.
27 For the definition of , see, e.g., CM Eq. (9.9).
28 For more discussion, see, e.g., either Sec. 6.2 of the monograph H. G. Schuster and W. Just, Deterministic Chaos, 4th ed., Wiley-VHS, 2005, or the monograph by Arnold and Avez, cited in Sec. 1.
29 This system is frequently called the Szilard engine, after L. Szilard who published its detailed theoretical discussion in 1929, but it is essentially a straightforward extension of the thought experiment suggested by J. Maxwell as early as 1867.
Fig. 2.4. The Szilard engine: a cylinder with a single molecule and a movable partition: (a) before and (b) after closing the door, and (c) after opening the door at the end of the expansion stage.
Now, let us consider that we know (as instructed by, in Lord Kelvin's formulation, an omniscient Maxwell's Demon) on which side of the partition the molecule is currently located. Then we may close the door, trapping the molecule, so that its repeated impacts on the partition create, on average, a pressure force F directed toward the empty part of the volume (in Fig. 4b, the right one). Now we can get from the molecule some mechanical work, say by allowing the force F to move the partition to the right, and picking up the resulting mechanical energy by some deterministic (zero-entropy) external mechanism. After the partition has been moved to the right end of the volume, we can open the door again (Fig. 4c), equalizing the molecule's average pressure on both sides of the partition, and then slowly move the partition back to the middle of the volume – without its resistance, i.e. without doing any substantial work. With the continuing help of Maxwell's Demon, we can repeat the cycle again and again, and hence make the system perform unlimited mechanical work, fed "only" by the molecule's thermal motion, and the information about its position – thus implementing the perpetual motion machine of the 2nd kind – see Sec. 1.6. The fact that such heat engines do not exist means that getting any new information, at non-zero temperature (i.e. at a substantial thermal agitation of particles), has a non-zero energy cost.
In order to evaluate this cost, let us calculate the maximum work per cycle that can be made by the Szilard engine (Fig. 4), assuming that it is constantly in thermal equilibrium with a heat bath of temperature T. Eq. (21) tells us that the information supplied by the demon (on which exactly half of the volume contains the molecule) is exactly one bit, I(2) = 1. According to Eq. (24), this means that by getting this information we are changing the entropy of our system by

$$\Delta S_I = -\ln 2. \qquad (2.50)$$
Now, it would be a mistake to plug this (negative) entropy change into Eq. (1.19). First, that relation is only valid for slow, reversible processes. Moreover (and more importantly), this equation, as well as its irreversible version (1.41), is only valid for a fixed statistical ensemble. The change ΔS_I does not belong to this category and may be formally described by the change of the statistical ensemble – from the one consisting of all similar systems (experiments) with an unknown location of the molecule, to a new ensemble consisting of the systems with the molecule in its certain (in Fig. 4, left) half.30
30 This procedure of the statistical ensemble re-definition is the central point of the connection between physics and information theory, and is crucial in particular for any (or rather any meaningful :-) discussion of measurements in quantum mechanics – see, e.g., QM Secs. 2.5 and 10.1.
Now let us consider a slow expansion of the "gas" after the door had been closed. At this stage, we do not need the Demon's help any longer (i.e. the statistical ensemble may be fixed), and can indeed use the relation (1.19). At the assumed isothermal conditions (T = const), this relation may be integrated
over the whole expansion process, getting Q = TΔS. At the final position shown in Fig. 4c, the system's entropy should be the same as initially, i.e. before the door had been opened, because we again do not know where in the volume the molecule is. This means that the entropy was replenished, during the reversible expansion, from the heat bath, by ΔS = –ΔS_I = +ln2, so that Q = TΔS = T ln2. Since by the end of the whole cycle the internal energy E of the system is the same as before, all this heat could have gone into the mechanical energy obtained during the expansion. Thus the maximum work obtained per cycle (i.e. per each obtained information bit) is T ln2 (kB·TK·ln2 in the SI units), about 3×10⁻²¹ Joule at room temperature. This is exactly the energy cost of getting one bit of new information about a system at temperature T. The smallness of that amount on the everyday human scale has left the Szilard engine an academic theoretical exercise for almost a century. However, recently several such devices, of various physical nature, were implemented experimentally (with the Demon's role played by an instrument measuring the position of the particle without a substantial effect on its motion), and the relation Q = T ln2 was proved, with gradually increasing precision.31
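The number T ln2 is easy to reproduce explicitly (a small added illustration, not part of the original text): for the single-molecule "gas", the average pressure is P = T/V, so the work extracted in the slow isothermal expansion from V/2 to V is ∫P dV = T ln2. The sketch below evaluates this integral numerically at room temperature:

```python
import numpy as np

k_B = 1.380649e-23      # Boltzmann constant, J/K
T_K = 300.0             # room temperature, K

# Single-molecule ideal "gas": average pressure P(V') = k_B * T_K / V'.
# Expansion from V/2 to V; the volume unit is arbitrary (it cancels out).
V = np.linspace(0.5, 1.0, 100_001)
P = k_B * T_K / V
work = np.trapz(P, V)

print(f"work per cycle = {work:.3e} J")                   # ~2.87e-21 J
print(f"k_B T ln2      = {k_B * T_K * np.log(2):.3e} J")  # the same number
```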
Actually, the discussion of another issue closely related to Maxwell's Demon, namely of energy consumption at numerical calculations, was started earlier, in the 1960s. It was motivated by the exponential (Moore's-law) progress of digital integrated circuits, which has led, in particular, to a fast reduction of the energy E "spent" (turned into heat) per one binary logic operation. In the recent generations of semiconductor digital integrated circuits, the typical E is still above 10⁻¹⁷ J, i.e. still exceeds the room-temperature value of T ln2 ≈ 3×10⁻²¹ J by several orders of magnitude. Still, some engineers believe that thermodynamics imposes this important lower limit on E and hence presents an insurmountable obstacle to the future progress of computation. Unfortunately, in the 2000s this delusion resulted in a substantial and unjustified shift of electron device research resources toward using "non-charge degrees of freedom" such as spin (as if they do not obey the general laws of statistical physics!), so that the issue deserves at least a brief discussion.
Let me believe that the reader of these notes understands that, in contrast to naïve popular talk, computers do not create any new information; all they can do is reshaping (“processing”) the input information, losing most of it on the go. Indeed, any digital computation algorithm may be decomposed into simple, binary logical operations, each of them performed by a circuit called the logic gate. Some of these gates (e.g., the logical NOT performed by inverters, as well as memory READ and WRITE
operations) do not change the amount of information in the computer. On the other hand, such information-irreversible logic gates as the two-input NAND (or NOR, or XOR, etc.) erase one bit at each operation, because they turn two input bits into one output bit – see Fig. 5a.
In 1961, Rolf Landauer argued that each logic operation should turn into heat at least the energy

Irreversible computation: energy cost:
$$E_{\min} = T\ln 2 \equiv k_B T_K \ln 2. \qquad (2.51)$$
This result may be illustrated with the Szilard engine (Fig. 4), operated in a reversed cycle. At the first stage, with the door closed, it uses the external mechanical work E = T ln2 to reduce the volume in which the molecule is confined, from V to V/2, pumping heat Q = E into the heat bath. To model a logically irreversible logic gate, let us now open the door in the partition, and thus lose one bit of information about the molecule's position. Then we will never get the work T ln2 back, because moving the partition back to the right, with the door open, takes place at zero average pressure. Hence, Eq. (51) gives a fundamental limit for energy loss (per bit) at the logically irreversible computation.
31 See, for example, A. Bérut et al., Nature 483, 187 (2012); J. Koski et al., PNAS USA 111, 13786 (2014); Y. Jun et al., Phys. Rev. Lett. 113, 190601 (2014); J. Peterson et al., Proc. Roy. Soc. A 472, 20150813 (2016).
Fig. 2.5. Simple examples of (a) irreversible and (b) potentially reversible logic circuits. Each rectangle denotes a circuit storing one bit of information.
However, in 1973 Charles Bennett came up with convincing arguments that it is possible to avoid such energy loss by using only operations that are reversible not only physically, but also logically.32 For that, one has to avoid any loss of information, i.e. any erasure of intermediate results, for example in the way shown in Fig. 5b.33 At the end of all calculations, after the result has been copied into memory, the intermediate results may be “rolled back” through reversible gates to be eventually merged into a copy of input data, again without erasing a single bit. The minimal energy dissipation at such reversible calculation tends to zero as the operation speed is decreased, so that the average energy loss per bit may be less than the perceived “fundamental thermodynamic limit” (51). The price to pay for this ultralow dissipation is a very high complexity of the hardware necessary for the storage of all intermediate results. However, using irreversible gates sparsely, it may be possible to reduce the complexity dramatically, so that in the future such mostly reversible computation may be able to reduce energy consumption in practical digital electronics.34
Before we leave Maxwell's Demon behind, let me use it to revisit, for one more time, the relation between the reversibility of the classical and quantum mechanics of Hamiltonian systems and the irreversibility possible in thermodynamics and statistical physics. In the gedanken experiment shown in Fig. 4, the laws of mechanics governing the motion of the molecule are reversible at all times. Still, at the partition's motion to the right, driven by molecular impacts, the entropy grows, because the molecule picks up the heat Q > 0, and hence the entropy ΔS = Q/T > 0, from the heat bath. The physical mechanism of this irreversible entropy (read: disorder) growth is the interaction of the molecule with uncontrollable components of the heat bath, and the resulting loss of information about the motion of the molecule. Philosophically, such emergence of irreversibility in large systems is a strong argument against reductionism – a naïve belief that knowing the exact laws of Nature at the lowest, most fundamental level of its complexity, we can readily understand all phenomena on the higher levels of its organization.
32 C. Bennett, IBM J. Res. Devel. 17, 525 (1973); see also C. Bennett, Int. J. Theor. Phys. 21, 905 (1982).
33 For that, all gates have to be physically reversible, with no static power consumption. Such logic devices do exist, though they are still not very practicable – see, e.g., K. Likharev, Int. J. Theor. Phys. 21, 311 (1982).
(Another reason for citing, rather reluctantly, my own paper is that it also gave constructive proof that reversible computation may also beat the perceived "fundamental quantum limit", EΔt > ħ, where Δt is the time of the binary logic operation.)
34 Many currently explored schemes of quantum computing are also reversible – see, e.g., QM Sec. 8.5 and references therein.
In reality, the macroscopic irreversibility of large systems is a good example35 of a new law (in this case, the 2nd law of thermodynamics) that becomes relevant on a substantially new, higher level of complexity – without defying the lower-level laws. Without such new laws, very little of the higher-level organization of Nature may be understood.
2.4. Canonical ensemble and the Gibbs distribution
As was shown in Sec. 2 (see also a few problems of the list given at the end of this chapter), the microcanonical distribution may be directly used for solving some simple problems. However, its further development, also due to J. Gibbs, turns out to be much more convenient for calculations.
Let us consider a statistical ensemble of macroscopically similar systems, each in thermal equilibrium with a heat bath of the same temperature T (Fig. 6a). Such an ensemble is called canonical.
Fig. 2.6. (a) A system in a heat bath (i.e. a canonical ensemble's member) and (b) the energy spectrum of the composite system (including the heat bath).
It is intuitively evident that if the heat bath is sufficiently large, any thermodynamic variables characterizing the system under study should not depend on the heat bath's environment. In particular, we may assume that the heat bath is thermally insulated, so that the total energy E_Σ of the composite system, consisting of the system of our interest plus the heat bath, does not change in time. For example, if the system of our interest is in a certain (say, the mth) quantum state, then the sum

$$E_\Sigma = E_m + E_{\rm HB} \qquad (2.52)$$
is time-independent. Now let us partition the considered canonical ensemble of such systems into much smaller sub-ensembles, each being a microcanonical ensemble of composite systems whose total, time-independent energies E_Σ are the same – as was discussed in Sec. 2, within a certain small energy interval ΔE << E_Σ – see Fig. 6b. Due to the very large size of each heat bath in comparison with that of the system under study, the heat bath's density of states g_HB is very high, and ΔE may be selected so that

$$\frac{1}{g_{\rm HB}} << \Delta E << |E_m - E_{m'}|, \qquad (2.53)$$
where m and m’ are any states of the system of our interest.
35 Another famous example is Charles Darwin’s theory of biological evolution.
According to the microcanonical distribution, the probabilities to find the composite system, within each of these microcanonical sub-ensembles, in any state are equal. Still, the heat bath energies E_HB = E_Σ – Em (Fig. 6b) of the members of this sub-ensemble may be different – due to the difference in Em. The probability W(Em) to find the system of our interest (within the selected sub-ensemble) in a state with energy Em is proportional to the number ΔM of the corresponding heat baths in the sub-ensemble. As Fig. 6b shows, in this case we may write ΔM = g_HB(E_HB)ΔE. As a result, within the microcanonical sub-ensemble with the total energy E_Σ,
$$W_m \propto \Delta M = g_{\rm HB}(E_{\rm HB})\,\Delta E = g_{\rm HB}(E_\Sigma - E_m)\,\Delta E. \qquad (2.54)$$
Let us simplify this expression further, using the Taylor expansion with respect to the relatively small Em << E_Σ. However, here we should be careful. As we have seen in Sec. 2, the density of states of a large system is an extremely fast growing function of energy, so that if we applied the Taylor expansion directly to Eq. (54), the Taylor series would converge for very small Em only. A much broader applicability range may be obtained by taking logarithms of both parts of Eq. (54) first:

$$\ln W_m = \text{const} + \ln g_{\rm HB}(E_\Sigma - E_m) = \text{const} + S_{\rm HB}(E_\Sigma - E_m), \qquad (2.55)$$
where the last equality results from the application of Eq. (36) to the heat bath, and ln ΔE has been incorporated into the (inconsequential) constant. Now, we can Taylor-expand the (much smoother) function of energy on the right-hand side, and limit ourselves to the two leading terms of the series:

$$\ln W_m \approx \text{const} + S_{\rm HB}\big|_{E_m = 0} - \left.\frac{dS_{\rm HB}}{dE_{\rm HB}}\right|_{E_m = 0} E_m. \qquad (2.56)$$
But according to Eq. (1.9), the derivative participating in this expression is nothing else than the reciprocal temperature of the heat bath, which (due to the large bath size) does not depend on whether Em is equal to zero or not. Since our system of interest is in the thermal equilibrium with the bath, this is also the temperature T of the system – see Eq. (1.8). Hence Eq. (56) is merely

$$\ln W_m = \text{const} - \frac{E_m}{T}. \qquad (2.57)$$
This equality describes a substantial decrease of Wm as Em is increased by ~ T, and hence our linear approximation (56) is virtually exact as soon as E HB is much larger than T – the condition that is rather easy to satisfy, because as we have seen in Sec. 2, the average energy per one degree of freedom of the system of the heat bath is also of the order of T, so that its total energy is much larger because of its much larger size.
Now we should be careful again because so far Eq. (57) was only derived for a sub-ensemble with a certain fixed E. However, since the second term on the right-hand side of Eq. (57) includes only Em and T, which are independent of E, this relation, perhaps with different constant terms, is valid for all sub-ensembles of the canonical ensemble, and hence for that ensemble as the whole. Hence for the total probability to find our system of interest in a state with energy Em, in the canonical ensemble with temperature T, we can write
Gibbs distribution:
$$W_m = \text{const}\cdot\exp\left\{-\frac{E_m}{T}\right\} \equiv \frac{1}{Z}\exp\left\{-\frac{E_m}{T}\right\}. \qquad (2.58)$$
This is the famous Gibbs distribution,36 sometimes called the “canonical distribution”, which is arguably the summit of statistical physics,37 because it may be used for a straightforward (or at least conceptually straightforward :-) calculation of all statistical and thermodynamic variables of a vast range of systems.
Before illustrating this, let us first calculate the coefficient Z participating in Eq. (58) for the general case. Requiring, per Eq. (4), the sum of all Wm to be equal to 1, we get

Statistical sum:
$$Z = \sum_m \exp\left\{-\frac{E_m}{T}\right\}, \qquad (2.59)$$
where the summation is formally extended to all quantum states of the system, though in practical calculations, the sum may be truncated to include only the states that are noticeably occupied. The apparently humble normalization coefficient Z turns out to be so important for applications that it has a special name – or actually, two names: either the statistical sum or the partition function of the system.
To appreciate the importance of Z, let us use the general expression (29) for the entropy to calculate it for the particular case of the canonical ensemble, i.e. the Gibbs distribution (58) of the probabilities Wm:

$$S = -\sum_m W_m \ln W_m = \frac{\ln Z}{Z}\sum_m \exp\left\{-\frac{E_m}{T}\right\} + \frac{1}{ZT}\sum_m E_m \exp\left\{-\frac{E_m}{T}\right\}. \qquad (2.60)$$
On the other hand, according to the general rule (7), the thermodynamic (i.e. ensemble-averaged) value E of the internal energy of the system is

$$E = \sum_m W_m E_m = \frac{1}{Z}\sum_m E_m \exp\left\{-\frac{E_m}{T}\right\}, \qquad (2.61a)$$
so that the second term on the right-hand side of Eq. (60) is just E/T, while the first term equals ln Z, due to Eq. (59). (By the way, using the notion of the reciprocal temperature β ≡ 1/T, with the account of Eq. (59), Eq. (61a) may be also rewritten as

E from Z:
$$E = -\frac{\partial(\ln Z)}{\partial\beta}. \qquad (2.61b)$$
This formula is very convenient for calculations if our prime interest is the average internal energy E rather than F or Wm.) With these substitutions, Eq. (60) yields a very simple relation between the statistical sum and the entropy of the system:

$$S = \frac{E}{T} + \ln Z. \qquad (2.62)$$
Now using Eq. (1.33), we see that Eq. (62) gives a straightforward way to calculate the free energy F of the system from nothing other than its statistical sum (and temperature):
36 The temperature dependence of the type exp{–const/T}, especially when showing up in rates of certain events, e.g., chemical reactions, is also frequently called the Arrhenius law – after the chemist S. Arrhenius who noticed this law in numerous experimental data. In all cases I am aware of, the Gibbs distribution is the underlying reason of the Arrhenius law. (We will see several examples of that later in this course.)
37 This is the opinion of many physicists, including Richard Feynman – who climbs on this "summit" already on the first page of his brilliant book Statistical Mechanics, CRC Press, 1998. (This is a collection of lectures on a few diverse, mostly advanced topics of statistical physics, rather than its systematic course, so that it can hardly be used as the first textbook on the subject. However, I can highly recommend its first chapter to all my readers.)
F from Z:
$$F \equiv E - TS = -T\ln Z. \qquad (2.63)$$
The relations (61b) and (63) play the key role in the connection of statistics to thermodynamics, because they enable the calculation, from Z alone, of the thermodynamic potentials of the system in equilibrium, and hence of all other variables of interest, using the general thermodynamic relations – see especially the circular diagram shown in Fig. 1.6, and its discussion in Sec. 1.4. Let me only note that to calculate the pressure P, e.g., from the second of Eqs. (1.35), we would need to know the explicit dependence of F, and hence of the statistical sum Z on the system’s volume V. This would require the calculation, by appropriate methods of either classical or quantum mechanics, of the dependence of the eigenenergies Em on the volume. Numerous examples of such calculations will be given later in the course.
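To make this Z-centered workflow concrete, here is a small numerical sketch (my addition, for illustration only) that computes Z, and from it E, F, and S, for an arbitrary toy spectrum, and checks the identity S = (E – F)/T implied by Eqs. (62)-(63); the three-level spectrum is an arbitrary choice:

```python
import numpy as np

def thermodynamics(E_levels, T):
    """Canonical-ensemble statistics at temperature T
    (energy units such that k_B = 1, as in the text)."""
    boltz = np.exp(-np.asarray(E_levels) / T)
    Z = boltz.sum()                      # statistical sum, Eq. (2.59)
    W = boltz / Z                        # Gibbs distribution, Eq. (2.58)
    E = (W * E_levels).sum()             # average energy, Eq. (2.61a)
    F = -T * np.log(Z)                   # free energy, Eq. (2.63)
    S = -(W * np.log(W)).sum()           # entropy, Eq. (2.29)
    return Z, W, E, F, S

# A toy three-level spectrum (arbitrary illustrative values).
levels = [0.0, 1.0, 2.5]
Z, W, E, F, S = thermodynamics(levels, T=1.0)

print(f"Z = {Z:.4f}, E = {E:.4f}, F = {F:.4f}, S = {S:.4f}")
print(f"(E - F)/T = {(E - F) / 1.0:.4f}")   # equals S, per Eq. (2.62)
```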
Before proceeding to the first such examples, let us notice that Eqs. (59) and (63) may be readily combined to give an elegant equality,

$$\exp\left\{-\frac{F}{T}\right\} = \sum_m \exp\left\{-\frac{E_m}{T}\right\}. \qquad (2.64)$$
This equality, together with Eq. (59), enables us to rewrite the Gibbs distribution (58) in another form:
$$W_m = \exp\left\{\frac{F - E_m}{T}\right\}, \qquad (2.65)$$
more convenient for some applications. In particular, this expression shows that since all probabilities Wm are below 1, F is always lower than the lowest energy level. Also, Eq. (65) clearly shows that the probabilities Wm do not depend on the energy reference, i. e. on an arbitrary constant added to all Em –
and hence to E and F.
2.5. Harmonic oscillator statistics
The last property may be immediately used in our first example of the Gibbs distribution's application to a particular but very important system – the harmonic oscillator, for a much more general case than was done in Sec. 2, namely for an arbitrary relation between T and ħω.38 Let us consider a canonical ensemble of similar oscillators, each in contact with a heat bath of temperature T. Selecting the ground-state energy ħω/2 for the origin of E, the oscillator eigenenergies (38) become Em = mħω (with m = 0, 1,…), so that the Gibbs distribution (58) for the probabilities of these states is

$$W_m = \frac{1}{Z}\exp\left\{-\frac{E_m}{T}\right\} = \frac{1}{Z}\exp\left\{-\frac{m\hbar\omega}{T}\right\}, \qquad (2.66)$$
with the following statistical sum:
$$Z = \sum_{m=0}^{\infty}\exp\left\{-\frac{m\hbar\omega}{T}\right\} \equiv \sum_{m=0}^{\infty}\lambda^m, \qquad \text{where } \lambda \equiv \exp\left\{-\frac{\hbar\omega}{T}\right\} \le 1. \qquad (2.67)$$
This is just the well-known infinite geometric progression (the "geometric series"),39 with the sum
38 The task of making a similar (and even simpler) calculation for another key quantum-mechanical object, the two-level system, is left for the reader's exercise.
39 See, e.g., MA Eq. (2.8b).
$$Z = \frac{1}{1-\lambda} \equiv \frac{1}{1 - e^{-\hbar\omega/T}}, \qquad (2.68)$$

so that Eq. (66) yields

Quantum oscillator: statistics:
$$W_m = \left(1 - e^{-\hbar\omega/T}\right)e^{-m\hbar\omega/T}. \qquad (2.69)$$
Figure 7a shows Wm for several lower energy levels, as functions of temperature, or rather of the T/ħω ratio. The plots show that the probability to find the oscillator in each particular state (except for the ground one, with m = 0) vanishes in both the low- and high-temperature limits, and reaches its maximum value Wm ~ 0.3/m at T ~ mħω, so that the contribution mħω·Wm of each excited level to the average oscillator energy E is always smaller than ħω.
Fig. 2.7. Statistical and thermodynamic parameters of a harmonic oscillator, as functions of temperature: (a) the probabilities W₀…W₃ of the lowest energy levels, and (b) the average energy E, entropy S, free energy F, and heat capacity C, all vs. T/ħω.
This average energy may be calculated in either of two ways: either using Eq. (61a) directly:

$$E = \sum_{m=0}^{\infty} E_m W_m = \left(1 - e^{-\hbar\omega/T}\right)\hbar\omega\sum_{m=0}^{\infty} m\, e^{-m\hbar\omega/T}, \qquad (2.70)$$
or (simpler) using Eq. (61b), as

$$E = -\frac{\partial}{\partial\beta}\ln Z = \frac{\partial}{\partial\beta}\ln\left(1 - e^{-\beta\hbar\omega}\right), \qquad \text{where } \beta \equiv \frac{1}{T}. \qquad (2.71)$$
Both methods give (of course) the same result,40
40 It was first obtained in 1924 by S. Bose and is sometimes called the Bose distribution – a particular case of the Bose-Einstein distribution to be discussed in Sec. 8 below.
Quantum oscillator: average energy
$$ \langle E\rangle = E(\omega, T) = \frac{\hbar\omega}{e^{\hbar\omega/T} - 1}, \tag{2.72} $$
which is valid for arbitrary temperature and plays a key role in many fundamental problems of physics.
The red line in Fig. 7b shows this result as a function of the normalized temperature. At relatively low temperatures, T << ℏω, the oscillator is predominantly in its lowest (ground) state, and its energy (on top of the constant zero-point energy ℏω/2, which was used in our calculation as the reference) is exponentially small: ⟨E⟩ ≈ ℏω exp{–ℏω/T} << T, ℏω. On the other hand, in the high-temperature limit, the energy tends to T. This is exactly the result (a particular case of the equipartition theorem) that was obtained in Sec. 2 from the microcanonical distribution. Please note how much simpler the calculation using the Gibbs distribution is, even for an arbitrary ratio T/ℏω.
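Both asymptotes follow from Eq. (72) in one line each:
$$ \langle E\rangle = \frac{\hbar\omega}{e^{\hbar\omega/T}-1} \approx \begin{cases} \hbar\omega\,e^{-\hbar\omega/T}, & \text{for } T \ll \hbar\omega \ \left(e^{\hbar\omega/T} \gg 1\right),\\[4pt] \hbar\omega\left[\left(1 + \dfrac{\hbar\omega}{T} + \dots\right) - 1\right]^{-1} \approx T, & \text{for } T \gg \hbar\omega. \end{cases} $$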
To complete the discussion of the thermodynamic properties of the harmonic oscillator, we can calculate its free energy using Eq. (63):
$$ F = T\,\ln\frac{1}{Z} = T\,\ln\left(1 - e^{-\hbar\omega/T}\right). \tag{2.73} $$
Now the entropy may be found from thermodynamics: either from the first of Eqs. (1.35), S = –(∂F/∂T)_V, or (even more easily) from Eq. (1.33): S = (E – F)/T. Both relations give, of course, the same result:
$$ S = \frac{\hbar\omega}{T}\,\frac{1}{e^{\hbar\omega/T} - 1} - \ln\left(1 - e^{-\hbar\omega/T}\right). \tag{2.74} $$
Finally, since in the general case the dependence of the oscillator properties (essentially, of ω) on the volume V is not specified, such variables as P, μ, G, W, and Ω are not defined, and what remains is to calculate the average heat capacity C per oscillator:
$$ C \equiv \frac{\partial \langle E\rangle}{\partial T} = \left(\frac{\hbar\omega}{T}\right)^2 \frac{e^{\hbar\omega/T}}{\left(e^{\hbar\omega/T} - 1\right)^2} \equiv \left[\frac{\hbar\omega/2T}{\sinh(\hbar\omega/2T)}\right]^2. \tag{2.75} $$
The calculated thermodynamic variables are plotted in Fig. 7b. In the low-temperature limit (T << ℏω), they all tend to zero. On the other hand, in the high-temperature limit (T >> ℏω), F → –T ln(T/ℏω) → –∞, S → ln(T/ℏω) → +∞, and C → 1 (in SI units, C → k_B). Note that the last limit is a direct corollary of the equipartition theorem: each of the two "half-degrees of freedom" of the oscillator gives, in the classical limit, the same contribution C = ½ to its heat capacity.
Now let us use Eq. (69) to discuss the statistics of the quantum oscillator described by Hamiltonian (46), in the coordinate representation. Again using the density matrix's diagonality in thermodynamic equilibrium, we may use a relation similar to Eqs. (47) to calculate the probability density to find the oscillator at coordinate q:
$$ w(q) = \sum_{m=0}^{\infty} W_m w_m(q) = \sum_{m=0}^{\infty} W_m \left|\psi_m(q)\right|^2 = \left(1 - e^{-\hbar\omega/T}\right) \sum_{m=0}^{\infty} e^{-m\hbar\omega/T} \left|\psi_m(q)\right|^2, \tag{2.76} $$
where ψ_m(q) is the normalized eigenfunction of the m-th stationary state of the oscillator. Since each ψ_m(q) is proportional to a Hermite polynomial41 that requires at least m elementary functions for its representation, working out the sum in Eq. (76) is a bit tricky,42 but the final result is rather simple: w(q) is just a normalized Gaussian distribution (the "bell curve"),
41 See, e.g., QM Sec. 2.10.
$$ w(q) = \frac{1}{(2\pi)^{1/2}\,\delta q}\,\exp\left\{-\frac{q^2}{2(\delta q)^2}\right\}, \tag{2.77} $$
with ⟨q⟩ = 0 and
$$ \langle q^2\rangle \equiv (\delta q)^2 = \frac{\hbar}{2m\omega}\,\coth\frac{\hbar\omega}{2T}. \tag{2.78} $$
Since the function coth ξ tends to 1 at ξ → ∞ and diverges as 1/ξ at ξ → 0, Eq. (78) shows that the width δq of the coordinate distribution is nearly constant (and equal to that, (ℏ/2mω)^{1/2}, of the ground-state wavefunction ψ_0) at T << ℏω, and grows as (T/mω²)^{1/2} ∝ T^{1/2} at T/ℏω → ∞.
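The Gaussian form (77)-(78) may also be verified by brute force; the following sketch (Python, in the units ℏ = m = ω = 1, with an arbitrarily chosen T = 2 and a 100-term truncation of the series) compares the sum (76) with the Gaussian (77):

    import numpy as np
    from numpy.polynomial.hermite import hermval
    from math import factorial, pi, sqrt

    def psi2(m, q):
        # |psi_m(q)|^2 of the m-th oscillator eigenstate (squared Hermite function)
        c = np.zeros(m + 1); c[m] = 1.0
        return hermval(q, c)**2 * np.exp(-q**2) / (2**m * factorial(m) * sqrt(pi))

    T = 2.0
    lam = np.exp(-1.0 / T)                  # lambda of Eq. (67)
    q = np.linspace(-4.0, 4.0, 9)
    w_sum = (1 - lam) * sum(lam**m * psi2(m, q) for m in range(100))   # Eq. (76)
    var = 0.5 / np.tanh(0.5 / T)            # <q^2> per Eq. (78)
    w_gauss = np.exp(-q**2 / (2 * var)) / sqrt(2 * pi * var)           # Eq. (77)
    print(np.max(np.abs(w_sum - w_gauss)))  # tiny: the thermal mixture is Gaussian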
As a sanity check, we may use Eq. (78) to write the following expression,
$$ U \equiv \frac{m\omega^2\langle q^2\rangle}{2} = \frac{\hbar\omega}{4}\,\coth\frac{\hbar\omega}{2T} \to \begin{cases} \hbar\omega/4, & \text{for } T \ll \hbar\omega,\\ T/2, & \text{for } T \gg \hbar\omega, \end{cases} \tag{2.79} $$
for the average potential energy of the oscillator. To comprehend this result, let us recall that Eq. (72) for the average full energy E was obtained by counting it from the ground-state energy ℏω/2 of the oscillator. If we add this reference energy to that result, we get
Quantum oscillator: total average energy
$$ E = \frac{\hbar\omega}{e^{\hbar\omega/T} - 1} + \frac{\hbar\omega}{2} = \frac{\hbar\omega}{2}\,\coth\frac{\hbar\omega}{2T}. \tag{2.80} $$
We see that for arbitrary temperature, U = E/2, as was already discussed in Sec. 2. This means that the average kinetic energy, equal to E – U, is also the same:43
$$ \frac{\langle p^2\rangle}{2m} = \frac{m\omega^2\langle q^2\rangle}{2} = \frac{E}{2} = \frac{\hbar\omega}{4}\,\coth\frac{\hbar\omega}{2T}. \tag{2.81} $$
In the classical limit T >> ℏω, both energies equal T/2, reproducing the equipartition theorem result (48).
2.6. Two important applications
The results of the previous section, especially Eq. (72), have innumerable applications in physics and related disciplines, but here I have time for a brief discussion of only two of them.
(i) Blackbody radiation. Let us consider a free-space volume V limited by non-absorbing (i.e. ideally reflecting) walls. Electrodynamics tells us44 that the electromagnetic field in such a "cavity" may be represented as a sum of "modes" with the time evolution similar to that of the usual harmonic oscillator. If the volume V is large enough,45 the number of these modes within a small range dk of the wavevector magnitude k is
42 The calculation may be found, e.g., in QM Sec. 7.2.
43 As a reminder: the equality of these two averages, at arbitrary temperature, was proved already in Sec. 2.
44 See, e.g., EM Sec. 7.8.
$$ dN = \frac{gV}{(2\pi)^3}\,d^3k = \frac{gV}{(2\pi)^3}\,4\pi k^2\,dk, \tag{2.82} $$
where for electromagnetic waves, the degeneracy factor g is equal to 2, due to the two independent (e.g., linear) polarizations of waves with the same wave vector k. With the linear, isotropic dispersion relation for waves in vacuum, k = ω/c, Eq. (82) yields
$$ dN = \frac{2V}{(2\pi)^3}\,4\pi\,\frac{\omega^2}{c^2}\,\frac{d\omega}{c} = V\,\frac{\omega^2}{\pi^2c^3}\,d\omega. \tag{2.83} $$
On the other hand, quantum mechanics says46 that the energy of such a “field oscillator” is quantized per Eq. (38), so that at thermal equilibrium its average energy is described by Eq. (72).
Plugging that result into Eq. (83), we see that the spectral density of the electromagnetic field’s energy, per unit volume, is
Planck's radiation law:
$$ u(\omega) \equiv \frac{E(\omega,T)}{V}\,\frac{dN}{d\omega} = \frac{\hbar\omega^3}{\pi^2c^3}\,\frac{1}{e^{\hbar\omega/T} - 1}. \tag{2.84} $$
This is the famous Planck's blackbody radiation law.47 To understand why its common name mentions radiation, let us consider a small planar part, of area dA, of a surface that completely absorbs electromagnetic waves incident from any direction. (Such a "perfect black body" approximation may be closely approached using special experimental structures, especially in limited frequency intervals.) Figure 8 shows that if the arriving wave was planar, with the incidence angle θ, then the power dP(θ) absorbed by the surface of small area dA, within a small frequency interval dω, i.e. the energy incident at that area in unit time, would be equal to the radiation energy within the same frequency interval, contained inside an imaginary cylinder (shaded in Fig. 8) of height c, base area dA cos θ, and hence volume dV = c dA cos θ:
$$ d\mathcal{P}(\theta) = u(\omega)\,d\omega\,dV = u(\omega)\,d\omega\;c\,dA\cos\theta. \tag{2.85} $$
[Fig. 2.8. Calculating the relation between dP(θ) and u(ω)dω; the shaded cylinder has height c and base area dA cos θ.]
45 In our current context, the volume should be much larger than (ℏc/T)³, where c ≈ 3×10⁸ m/s is the speed of light. For room temperature (T ≡ k_B × 300 K ≈ 4×10⁻²¹ J), this lower bound is of the order of 10⁻¹⁶ m³.
46 See, e.g., QM Sec. 9.1.
47 Let me hope the reader knows that this law was first suggested in 1900 by Max Planck as an empirical fit for the experimental data on blackbody radiation, and this was the historic point at which the Planck constant ℏ (or rather h ≡ 2πℏ) was introduced – see, e.g., QM Sec. 1.1.
Since the thermally-induced field is isotropic, i.e. propagates equally in all directions, this result should be averaged over all solid angles within the polar angle interval 0 ≤ θ ≤ π/2:
$$ \frac{d\mathcal{P}(\omega)}{dA\,d\omega} = \frac{1}{4\pi}\oint_{4\pi}\frac{d\mathcal{P}(\theta)}{dA\,d\omega}\,d\Omega = \frac{1}{4\pi}\,c\,u(\omega)\int_0^{\pi/2}\sin\theta\,d\theta\int_0^{2\pi}d\varphi\,\cos\theta = \frac{c}{4}\,u(\omega). \tag{2.86} $$
Hence the Planck’s expression (84), multiplied by c/4, gives the power absorbed by such a “blackbody”
surface. But at thermal equilibrium, this absorption has to be exactly balanced by the surface’s own radiation, due to its non-zero temperature T.
I hope the reader is familiar with the main features of the Planck law (84), including its general shape (Fig. 9), with the low-frequency asymptote u(ω) ∝ ω² (which, due to its historic significance, bears the special name of the Rayleigh-Jeans law), the exponential drop at high frequencies (the Wien law), and the resulting maximum of the function u(ω), reached at the frequency ω_max with
$$ \hbar\omega_{\max} \approx 2.82\,T, \tag{2.87} $$
i.e. at the wavelength λ_max = 2π/k_max = 2πc/ω_max ≈ 2.22 ℏc/T.
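The constant in Eq. (87) comes from maximizing u(ω): with x ≡ ℏω/T, the condition du/dx = 0 reduces to the transcendental equation x = 3(1 – e^–x), whose root may be found, e.g., by simple iteration (a minimal sketch in Python):

    import math

    x = 3.0                               # initial guess
    for _ in range(50):                   # fixed-point iteration x -> 3(1 - e^-x)
        x = 3.0 * (1.0 - math.exp(-x))
    print(x)                              # 2.8214..., the constant of Eq. (87)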
[Fig. 2.9. The frequency dependence of the blackbody radiation density u(ω), normalized by u₀ ≡ T³/π²ℏ²c³, as a function of ℏω/T: the Planck law (red line) and the Rayleigh-Jeans law (blue line).]
Still, I cannot help mentioning a few important particular values: one corresponding to the visible light (λ_max ~ 500 nm) for the Sun's effective surface temperature T_K ≈ 6,000 K, and another one corresponding to the mid-infrared range (λ_max ~ 10 μm) for the Earth's surface temperature T_K ≈ 300 K.
The balance of these two radiations, absorbed and emitted by the Earth, determines its surface temperature and hence has key importance for all life on our planet. This is why it is at front and center of the current climate change discussions. As one more example, the cosmic microwave background (CMB) radiation, closely following the Planck law with T_K = 2.725 K (and hence having the maximum density at λ_max ≈ 1.9 mm), and in particular its (very small) anisotropy, is a major source of data for modern cosmology.
Now let us calculate the total energy E of the blackbody radiation inside some volume V. It may be found from Eq. (84) by its integration over all frequencies: 48,49
48 The last step in Eq. (88) uses a table integral, equal to Γ(4)ζ(4) = 3!·(π⁴/90) = π⁴/15 – see, e.g., MA Eq. (6.8b) with s = 4, and then MA Eqs. (6.7e) and (2.7b).
$$ E = V\int_0^\infty u(\omega)\,d\omega = \frac{V\hbar}{\pi^2c^3}\int_0^\infty \frac{\omega^3\,d\omega}{e^{\hbar\omega/T} - 1} = \frac{VT^4}{\pi^2\hbar^3c^3}\int_0^\infty \frac{\xi^3\,d\xi}{e^{\xi} - 1} = \frac{\pi^2}{15\hbar^3c^3}\,VT^4. \tag{2.88} $$
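The table integral quoted in the footnote is also easy to confirm numerically; a one-line sketch, assuming SciPy is available:

    import math
    from scipy.integrate import quad

    value, _ = quad(lambda x: x**3 / math.expm1(x), 0, math.inf)
    print(value, math.pi**4 / 15)         # both ~ 6.49394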
Using Eq. (86) to recast Eq. (88) into the total power radiated by a blackbody surface, we get the well-known Stefan (or "Stefan-Boltzmann") law50
$$ \frac{d\mathcal{P}}{dA} = \frac{c}{4}\,\frac{E}{V} = \frac{\pi^2}{60\,\hbar^3c^2}\,T^4 \equiv \sigma T^4, \tag{2.89} $$