% ***********************************************************************
% Copyright (c) Erik G. Learned-Miller, 2004.
% ***********************************************************************
% vasicekExact(v,m) Estimate the entropy of a distribution from a sample.
%
% h=vasicekExact(v,m)
%
% Inputs:
% v is a vector of scalars.
% m is the "spacings" count, which must be an integer between 1 and
% length(v)-1. It can be thought of as a "smoothing" parameter:
% larger values give lower variance but higher bias. A reasonable
% default is floor(sqrt(length(v))).
%
% Output:
% h is an estimate of the entropy of the distribution from which
% the sample v was taken.
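%
% With n = length(v) and the sorted sample X(1) <= X(2) <= ... <= X(n),
% the estimate computed below is
%
%   h = 1/(n-m) * sum_{i=1}^{n-m} log(X(i+m) - X(i)) + log(n+1) - psi(m)
%
% where psi is the digamma function. The simpler form (commented out in
% the code) uses log((n+1)/m), i.e., log(m) in place of psi(m).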
%
% The function returns a somewhat downward-biased estimate of the
% entropy IN NATS (divide by log(2) to convert to bits). It is fast and
% reasonably accurate. The psi(m) term in the code gives a partial bias
% correction; accuracy could be increased further by a more complete
% bias adjustment term. It uses the natural log rather than the possibly
% more intuitive log2 because the natural log is faster.
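%
% Example (a sketch; the sample size and the choice of m are only
% illustrative):
%   v = randn(10000,1);                          % standard normal sample
%   h = vasicekExact(v, floor(sqrt(length(v)))); % estimate in nats
%   hBits = h/log(2);                            % same estimate in bits
%   % For comparison, the true entropy of N(0,1) is
%   % 0.5*log(2*pi*exp(1)), about 1.4189 nats.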
%
% For more information on this estimator, see the following
% publications:
%
% Beirlant, J., Dudewicz, E. J., Gyorfi, L., and van der Meulen,
% E. C. "Nonparametric entropy estimation: An overview",
% International Journal of Mathematical and Statistical
% Sciences, 6, 17-39, 2001.
%
% Vasicek, Oldrich. "A test for normality based on sample entropy."
% Journal of the Royal Statistical Society, Series
% B. 38(1):54-59, 1976.
%
% Learned-Miller, Erik G. and Fisher, John W. "ICA using spacings
% estimates of entropy." Journal of Machine Learning Research, 4,
% 1271-1295, 2003.
%
function h=vasicekExact(v,m)
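len=length(v);
% m must leave at least one spacing; m == len would divide by zero below.
if m<1 || m>=len || m~=floor(m)
    error('vasicekExact: m must be an integer between 1 and length(v)-1.');
end
orderStats=sort(v);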
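% Compute the m-spacings X(i+m)-X(i), i=1..len-m. Note that the
% intervals overlap for this estimator. Repeated values in v produce
% zero spacings, for which log returns -Inf.
intvals=orderStats(m+1:len)-orderStats(1:len-m);
hvec=log(intvals);
h=sum(hvec);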
%h=h/(len-m)+log((len+1)/m); % Simpler version. See Learned-Miller et al.
h=h/(len-m)+log(len+1)-psi(m); % Partial bias correction: psi(m) replaces log(m). See Beirlant et al.