When we model the probability of a variable $x$ by $p(x) \propto e^{-F(x)}$, $F(x)$ is often referred to as the free energy.
The name exists for historical reasons. The Gibbs-Boltzmann distribution for a configuration with energy $E$ is proportional to $e^{-E/(kT)}$. The closest explanation I found is from here, where $F$ is the Helmholtz free energy. Unfortunately, I doubt the above explanation is actually correct, since it completely ignores the partition function. In my old notes on exponential families, I also mentioned free energy and the Legendre transformation. I am also aware that the Legendre transformation is just the convex conjugate. But I have forgotten all of this by now.
Update:
I reread my notes and MacKay’s textbook. I now think the mention of free energy in LeCun’s slides is indeed correct. It seems the free energy $F = -\frac{1}{\beta}\log Z$ is the right quantity. The sign is a bit tricky though, as shown below.
With the Boltzmann distribution,

$$p(x) = \frac{e^{-\beta E(x)}}{\sum_{x'} e^{-\beta E(x')}},$$

where $\beta$ is the inverse temperature. As usual, we denote the denominator as the partition function $Z(\beta)$ and write $A(\beta) = \log Z(\beta)$ for the log-partition function.
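Note that $A(\beta)$ has a nice cumulant-generating property. It can be readily verified that

$$\frac{dA}{d\beta} = -\mathbb{E}[E(X)] \quad\text{and}\quad \frac{d^2A}{d\beta^2} = \operatorname{Var}[E(X)].$$

Since $\operatorname{Var}[E(X)] \ge 0$, $A(\beta)$ is a convex function. So the convex conjugate of $A$’s convex conjugate is $A$ itself.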
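The cumulant-generating property can be checked numerically on a toy system. The following is only a minimal sketch: the energy levels in `ENERGIES` and the value of `beta` are made-up illustrative numbers, and the derivatives of the log-partition function are approximated by central finite differences.

```python
import math

# Hypothetical energy levels of a small discrete system (illustrative only).
ENERGIES = [0.0, 1.0, 2.5, 4.0]


def log_partition(beta, energies=ENERGIES):
    """A(beta) = log sum_x exp(-beta * E(x)), computed with the log-sum-exp trick."""
    m = max(-beta * e for e in energies)
    return m + math.log(sum(math.exp(-beta * e - m) for e in energies))


def boltzmann(beta, energies=ENERGIES):
    """Boltzmann probabilities p(x) = exp(-beta * E(x)) / Z(beta)."""
    a = log_partition(beta, energies)
    return [math.exp(-beta * e - a) for e in energies]


def energy_mean_var(beta, energies=ENERGIES):
    """Mean and variance of the energy under the Boltzmann distribution."""
    p = boltzmann(beta, energies)
    mean = sum(pi * e for pi, e in zip(p, energies))
    var = sum(pi * (e - mean) ** 2 for pi, e in zip(p, energies))
    return mean, var


beta, h = 0.7, 1e-3  # assumed inverse temperature and finite-difference step

# Central finite differences for dA/dbeta and d^2A/dbeta^2.
first = (log_partition(beta + h) - log_partition(beta - h)) / (2 * h)
second = (log_partition(beta + h) - 2 * log_partition(beta)
          + log_partition(beta - h)) / h ** 2

mean, var = energy_mean_var(beta)
print("dA/dbeta     =", first, " vs  -E[E] =", -mean)
print("d2A/dbeta2   =", second, " vs  Var[E] =", var)
```

The two printed pairs should agree up to finite-difference error, and the non-negative second derivative is what makes the log-partition function convex.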
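Consider the convex conjugate of $A$, evaluated at $-U$,

$$A^*(-U) = \max_{\beta}\,\bigl[-\beta U - A(\beta)\bigr].$$

The optimal $\beta$ satisfies $U = -\frac{dA}{d\beta} = \mathbb{E}[E(X)]$. In other words, given a $U$, the inverse temperature $\beta$ should be chosen such that the average energy is equal to $U$. Let’s write $\beta^*$ and $U^*$ for a $\beta$ and $U$ that satisfy the condition $U = \mathbb{E}[E(X)]$.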
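And what the heck is $A^*(-U^*)$? It turns out that it is simply the negative of the entropy, $-H(X)$, because

$$H(X) = -\sum_x p(x)\log p(x) = \sum_x p(x)\bigl[\beta^* E(x) + \log Z(\beta^*)\bigr] = \beta^* U^* + A(\beta^*) = -A^*(-U^*).$$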
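So we also have

$$A(\beta) = \max_{U}\,\bigl[H(U) - \beta U\bigr],$$

where $H(U) = -A^*(-U)$ is the entropy of the Boltzmann distribution whose average energy is $U$.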
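Let the free energy be $F = -\frac{1}{\beta^*}\log Z(\beta^*) = -\frac{A(\beta^*)}{\beta^*}$. Then we have

$$F = U^* - \frac{1}{\beta^*} H(X) = U^* - T H(X),$$

where $T = 1/\beta^*$ is the temperature.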
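The chain from the convex conjugate to the free-energy identity can also be sanity-checked numerically. This is a rough sketch under assumptions: the energy levels, the chosen inverse temperature `beta_star`, and the grid used to approximate the maximization over the inverse temperature are all hypothetical choices made for illustration.

```python
import math

# Hypothetical energy levels of a small discrete system (illustrative only).
ENERGIES = [0.0, 1.0, 2.5, 4.0]


def log_partition(beta):
    """A(beta) = log sum_x exp(-beta * E(x)), computed with the log-sum-exp trick."""
    m = max(-beta * e for e in ENERGIES)
    return m + math.log(sum(math.exp(-beta * e - m) for e in ENERGIES))


def boltzmann(beta):
    """Boltzmann probabilities p(x) = exp(-beta * E(x)) / Z(beta)."""
    a = log_partition(beta)
    return [math.exp(-beta * e - a) for e in ENERGIES]


beta_star = 0.7                        # assumed inverse temperature
T = 1.0 / beta_star                    # corresponding temperature
p = boltzmann(beta_star)
U = sum(pi * e for pi, e in zip(p, ENERGIES))   # average energy at beta_star
H = -sum(pi * math.log(pi) for pi in p)         # entropy at beta_star
A = log_partition(beta_star)

# Approximate the conjugate A*(-U) = max_beta [-beta * U - A(beta)] on a grid.
grid = [i / 1000 for i in range(1, 3001)]
conj, beta_hat = max((-b * U - log_partition(b), b) for b in grid)

F = -A / beta_star                     # free energy F = -(1/beta) log Z

print("argmax beta      =", beta_hat, "(expected near", beta_star, ")")
print("A*(-U)           =", conj, " vs  -H =", -H)
print("F                =", F, " vs  U - T*H =", U - T * H)
```

The grid maximizer lands on the inverse temperature that produced the average energy, the conjugate value matches the negative entropy, and the free energy matches the average energy minus temperature times entropy.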
Remarks
I’m still not sure why free energy is called “free”. I guess the best explanation is that $U - TH$ is traditionally the Helmholtz free energy. While it seems to be related to the energy available for work in thermodynamics, I doubt the same interpretation can be applied in this simple model. For one thing, note that two things happen when we increase the temperature of the system. The system becomes more likely to hop to higher energy states. At the same time, since it can spread over more states, the entropy increases. Since entropy is non-negative, the “free” energy decreases as temperature increases, since $\frac{\partial F}{\partial T} = -H$.
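The temperature dependence in this remark can be illustrated with a small numerical experiment. Again this is just a sketch on a made-up discrete system: the energy levels, the temperature `T0`, and the list of temperatures are arbitrary assumptions, and the derivative is approximated by a central finite difference.

```python
import math

# Hypothetical energy levels of a small discrete system (illustrative only).
ENERGIES = [0.0, 1.0, 2.5, 4.0]


def log_partition(beta):
    """A(beta) = log sum_x exp(-beta * E(x)), computed with the log-sum-exp trick."""
    m = max(-beta * e for e in ENERGIES)
    return m + math.log(sum(math.exp(-beta * e - m) for e in ENERGIES))


def free_energy(T):
    """F(T) = -T * log Z(1/T)."""
    return -T * log_partition(1.0 / T)


def entropy(T):
    """Entropy of the Boltzmann distribution at temperature T."""
    beta = 1.0 / T
    a = log_partition(beta)
    probs = [math.exp(-beta * e - a) for e in ENERGIES]
    return -sum(p * math.log(p) for p in probs)


T0, h = 1.5, 1e-4  # assumed temperature and finite-difference step
dF_dT = (free_energy(T0 + h) - free_energy(T0 - h)) / (2 * h)
print("dF/dT =", dF_dT, " vs  -H =", -entropy(T0))

# Free energy and entropy along an increasing list of temperatures.
temperatures = [0.5, 1.0, 2.0, 4.0]
free_energies = [free_energy(T) for T in temperatures]
entropies = [entropy(T) for T in temperatures]
for T, F, H in zip(temperatures, free_energies, entropies):
    print(f"T={T:4.1f}  F={F: .4f}  H={H:.4f}")
```

On this toy system the finite-difference slope agrees with the negative entropy, the free energy falls as the temperature rises, and the entropy rises.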
LeCun’s definition in his lecture is very similar but not identical. $F$ there is a function of some configuration ($x$ and $y$). The definition here is essentially the weighted sum of the $F$ in LeCun’s lecture, weighted by the probabilities of the states.