
Check white noise generation #89

Closed · pausz opened this issue Feb 20, 2017 · 14 comments

Labels: core, optimize, refactor, sci (identifies features that are of scientific interest)

pausz commented Feb 20, 2017

The current implementation of White noise assumes we're working with a square grid in order to compute deltax. If the grid is rectangular, the current scaling of the noise will result in an incorrect value of deltax.

https://github.com/BrainDynamicsUSYD/nftsim/blob/master/src/timeseries.cpp#L330

Related issues #90 and #151.

Current behaviour: if the grid has 450 nodes and Longside is 30 nodes, then (see the sketch below)

  • deltax = 0.0166667 m (according to Wave)
  • deltax = 0.0235702 m (according to TimeSeries-White and WhiteCoherent)
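For concreteness, a minimal sketch of where the two values appear to come from (assuming Length = 0.5 m as in the config below; the variable names are illustrative, not the ones used in timeseries.cpp):

```matlab
% Illustrative reconstruction of the discrepancy (not nftsim code).
nodes    = 450;  % total grid nodes
longside = 30;   % nodes along the long side
len      = 0.5;  % physical side length in metres (assumed)

deltax_wave  = len / longside     % 0.0166667 m, uses the actual side length
deltax_white = len / sqrt(nodes)  % 0.0235702 m, implicit square-grid assumption
```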
Test configuration and analysis used for the plots below:

```
Time: 8 Deltat: 9.765625e-04
Nodes: 64

    Connection matrix:
From:  1
To 1:  0


Population 1: Stimulation
Length: 0.5
    Stimulus: White - Onset: 0 Mean: 0 PSD: 1e-05

Output: Node: All Start: 0
Population: 1
Dendrite:
Propagator:
Coupling:
```

The PSD is then estimated in MATLAB with:

```matlab
x = out.data{1}; % out is the nftsim struct
[Pxx, F] = pwelch(x, [], [], 512, 1/9.765625e-04, 'onesided');
```

[Figure: PSD of the White stimulus with Deltat = 9.765625e-04 s (screenshot from 2018-02-15 15-51-17)]

Changing the time step to 4.8828e-04 s and repeating the analysis, we obtain:

[Figure: PSD of the White stimulus with Deltat = 4.8828e-04 s (screenshot from 2018-02-15 15-58-20)]

Bottom line: the scaling of the variance of the underlying normal distribution used to generate a sequence of random numbers really depends on which property of the resulting signal one needs to keep constant (e.g., PSD or average power). This scaling might be at odds with interpretations of what that sequence of random numbers represents (e.g., increments of a Wiener process, or just an input signal with a very broad band).
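For example (standard definitions, not taken from nftsim's docs), the two readings imply opposite scalings of the variance with the time step:

```latex
\Delta W_n \sim \mathcal{N}(0,\,\Delta t)
  \quad\text{(Wiener increments: variance } \propto \Delta t\text{)},
\qquad
\sigma^2 = \int_{-f_s/2}^{f_s/2} \phi^2\,df = \frac{\phi^2}{\Delta t}
  \quad\text{(fixed two-sided PSD } \phi^2\text{: variance } \propto 1/\Delta t\text{).}
```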

pausz added the bug label and self-assigned this on Feb 20, 2017
pausz commented Feb 20, 2017

A quick and dirty hack would be to read longside from the config file. However, longside should be available to a range of classes that require the value of deltax. See #90 and #151.

pausz modified the milestone: Mark II on Mar 9, 2017
pausz changed the title from "White noise assumes square grid" to "Check white noise generation" on May 22, 2017
pausz commented May 22, 2017

The current formula to 'normalize' or 'scale' the standard deviation of the Gaussian distribution used to generate noise seems to have inconsistent units. Also, from the derivation in the paper I gather that there was a disagreement/misunderstanding about what sigma**2 represents (it is a power spectral density, with units of coordinate_units**2 / freq_units), and about whether it should be kept constant when the discretization (e.g., delta t, delta x, delta y) changes.

PSD plot of Gaussian white noise (GWN) with mean=0 and var=4. The curve represents the average over 2048 trials of 4 seconds each. The Nyquist frequency (fmax) is 512 Hz (fs = 1/dt and fmax = fs/2).

[Figure: 2017-05-16_gwn_var-4_fmax-512_t-4_area]

PSD plot of Gaussian white noise with mean=0 and var=4. The curve represents the average over 2048 trials of 4 seconds each. The Nyquist frequency (fmax) is 128 Hz (fs = 1/dt and fmax = fs/2).

[Figure: 2017-05-16_gwn_var-4_fmax-128_t-4_area]

The shaded area represents the power between 0 and 100 Hz. This value is constant regardless of the value of delta t. However, if sigma**2 (the PSD) of GWN is kept constant and the bandwidth increases, then the total power of the signal increases. An increase in total power is expected, since in the limit of infinite bandwidth the total power of white noise is infinite.

Now, if we wish to keep the total power constant when the bandwidth changes (the shaded area would then cover from -fmax to fmax), the PSD (sigma**2) has to be adjusted, and it will decrease in proportion to delta t, delta x, and delta y. In crude terms, PSD * bandwidth = constant. Currently, sigma**2 instead varies inversely with delta t, delta x, and delta y.
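A minimal numerical sketch of this trade-off (illustrative only, not the nftsim code):

```matlab
% With fixed variance the PSD drops as the bandwidth grows; scaling the
% variance with the bandwidth keeps the PSD constant instead.
T = 4; trials = 2048; phi2 = 4/128;  % target two-sided PSD (au^2/Hz), arbitrary
for fs = [128 512]
    N  = T*fs;
    x  = 2*randn(trials, N);                      % fixed variance = 4
    y  = sqrt(phi2*fs)*randn(trials, N);          % variance = phi2*fs
    Px = mean(abs(fft(x,[],2)).^2, 1) / (N*fs);   % two-sided periodogram PSD
    Py = mean(abs(fft(y,[],2)).^2, 1) / (N*fs);
    fprintf('fs=%3d Hz: PSD fixed-var = %.4f, PSD scaled-var = %.4f au^2/Hz\n', ...
            fs, mean(Px), mean(Py));
end
% Fixed variance:  PSD = 4/fs (0.0312, then 0.0078); total power = 4 in both.
% Scaled variance: PSD = phi2 in both; total power = phi2*fs, grows with fs.
```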

RomeshA commented May 22, 2017

How do you calculate the PSD/Hz? I get a different result in Matlab with pwelch:

```matlab
% Average pwelch PSD estimates over 2048 trials at two sampling rates,
% using the same noise variance (std = 2) in both cases.
P128 = [];
P512 = [];
for j = 2048:-1:1
    [P128(:,j),f128] = pwelch(2*randn(4*128,1),[],[],[],128);
    [P512(:,j),f512] = pwelch(2*randn(4*512,1),[],[],[],512);
end
plot(f128,mean(P128,2))
hold on
plot(f512,mean(P512,2))
xlabel('Frequency')
ylabel('Power/Hz')
```

[Figure: averaged pwelch spectra at fs = 128 Hz and fs = 512 Hz (gwn)]

Here it's the total power (integrated over all frequencies) that is the same.

pausz commented May 23, 2017

Thank you for the example. Indeed, the two plots above indicate that the power of the input signal is the same in the two cases. Unfortunately, the example does not provide further justification for the way sigma is currently scaled in NF. I think several topics (PSD, Parseval's theorem, continuous white noise, discrete white noise, Gaussian white noise, stationary random processes, etc.) have been merged into one big blob.

Also, it is sometimes difficult to know whether people are talking about power or about power spectral density, since most of the time either one is given in arbitrary units.

If the variance of the Gaussian distribution is the same in the two cases, then the PSD (measured in au^2/Hz), which is the height of the plots, should be the same in the two cases. That is a result that follows from the Wiener-Khinchin theorem. The total power should be larger in the second case, because it has a wider bandwidth. However, here it seems that the power (area) has a value similar to the variance of the signal. So, I'm really confused about where that last equivalence is coming from. It's definitely not Parseval's theorem, which is basically power preservation between the time and spectral domains. The power of white noise is infinite, or at least it diverges to infinity. I know that here we're working with a finite signal, and thus the power will be finite, but it should definitely blow up as the bandwidth increases.

Is Parseval's theorem satisfied in the example above?

Here is what I did:

```matlab
rng(42)
mu = 0;     % mean of each Gaussian process
L  = 2048;  % trials
T  = 4;     % s
dt = 2^-8;  % s
t  = dt:dt:T;     % s
N  = length(t);   % samples
dt = t(2) - t(1); % s
fs = 1 / dt;      % Hz

sigma = 2;   % standard deviation (au/sqrt(Hz))
df = 1 / T;  % Hz

if L == 1
    z = mu + sigma*randn(1,N);
else
    MU  = mu*ones(1,N);              % mean for all realizations
    Cxx = (sigma^2)*diag(ones(N,1)); % covariance matrix
    R   = chol(Cxx);

    % signals
    z = repmat(MU,L,1) + randn(L,N)*R;
end

%%
Z = fft(z,[],2) / sqrt(N); % scaling by sqrt(N)
if size(Z, 1) == 1
    Pzavg = Z.*conj(Z);
else
    Pzavg = mean(Z.*conj(Z)); % mean power spectral density
end

% Check Parseval's theorem
P = norm(Z(1, :)).^2; % frequency domain
p = norm(z(1, :)).^2; % time domain

%% Rearrange
f = (-N/2:N/2-1) * df;
Pzavg = fftshift(Pzavg); % shift zero-frequency component to centre of spectrum

%% Plot lines
subplot(1,1,1)
plot(f,Pzavg,'r');
axis([-fs/2 fs/2 0 10]); grid on;
ylabel('PSD (a.u.^{2}/Hz)');
xlabel('f [Hz]');
title({['PSD of Gaussian white noise \mu=0, \sigma^2=' num2str(sigma^2) ','], ...
       ['f_{max}=' num2str(fs/2) '[Hz], \Delta f=' num2str(df) '[Hz],'], ...
       ['\Delta t=' num2str(dt) '[s], T=' num2str(T) '[s]']});
```

I'm aware that the way power spectral density is computed for a given signal is another matter, and depends on the language and implementation of fft and ifft, and on whether one scales by 1/N, by 1/(N*fs), or by 1/sqrt(N). The way I've done it here is the simplest way to approximate the continuous Fourier transform. I'm not trying to compute discrete Fourier coefficients, which is normally achieved (in MATLAB) by a scaling of 1/N, where N is the length of the input signal.

In particular, pwelch with its default values, as used in this example, divides the input signal into segments, applies a window, and computes the one-sided power spectrum. I'm not sure exactly what pwelch does, but from the documentation I got that "The periodogram is not a consistent estimator of the true power spectral density of a wide-sense stationary process. Welch's technique to reduce the variance of the periodogram breaks the time series into segments, usually overlapping."
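As a sanity check of the normalization (illustrative; this only tests the default 'psd' scaling, not pwelch's internals):

```matlab
% With the default 'psd' scaling, the pwelch estimate should integrate
% (approximately) to the signal's mean square, i.e. its variance here.
fs = 512;
x  = 2*randn(4*fs, 1);                  % zero-mean GWN, var = 4
[Pxx, F] = pwelch(x, [], [], [], fs);   % one-sided PSD estimate
df = F(2) - F(1);
sum(Pxx)*df   % ~4, matches var(x)
var(x)
```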

The main issue is that it is difficult to properly justify the scaling of sigma by dt, dx, dy as it stands in the code, and to write a step-by-step derivation of the scaling expression.

RomeshA commented May 23, 2017

Yes, Parseval's theorem is satisfied, at least as far as the definitions in Chris's notes go. I just used pwelch() because it's a built-in, but the normalization is the same as rfft, which is a single FFT with no window/taper, and the (discrete) Parseval's theorem is satisfied:

```matlab
>> x = 2*randn(4*512,1);
>> [f,~,P] = utils.rfft(x,512);
>> sum(abs(x).^2)/length(x)
ans =
    4.0440
>> sum(P).*(f(2)-f(1))
ans =
    4.0440
```

with the (f(2)-f(1)) corresponding to integration over frequency. Since the numerical FFT is both discrete and periodic, if we use Chris's Eq. 9, the LHS is the variance of the noise and thus doesn't depend on the number of samples, whereas the RHS does (writing the summation as N*C, where C is the constant magnitude of the discrete Fourier coefficients). Eq. 9 then shows that as you increase N, the magnitude of the coefficients decreases for fixed variance. The normalization derived there seems correct for well-defined signals, e.g. sines, so I'm not sure why it wouldn't be correct to use here, at least in the discrete case.
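In symbols, my reading of that argument (Eq. 9 itself isn't reproduced here):

```latex
\sigma^2 = \sum_{k=1}^{N} |c_k|^2 = N C
  \quad\Longrightarrow\quad
  C = \frac{\sigma^2}{N},
```

so for fixed variance the squared magnitude C of the coefficients must shrink as N grows.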

I'm admittedly not so comfortable in the continuous case - but there are at least a couple of indications that the variance of the process is expected to blow up as you increase the bandwidth. That's the result derived here. There's also a bit of discussion about the notation (second response, by Matt L). And see also the second post here.

The other thing I'm not sure about is where you're using the norm to compute the energy - from here they associate the L2 norm with the energy of the signal, but both of those expressions are integrals, so there's a dt or a df in there too. In particular, in Eq. 2.1 we could write v(t)^2 = c, where c is the variance of the white noise (as that's the expectation value for v(t)^2 and it's the same at all times). The integral over a fixed period of time is then the same irrespective of the sampling rate, i.e. the energy depends on the variance and the duration of the signal, but not the sampling rate. But it's equally likely I'm missing something :)

The other question is about the 1/sqrt(N) vs 1/N scaling - is there a derivation of this somewhere, to see what the corresponding continuous equations are? Should it be used everywhere we use the FFT to compute power spectral density...? Why/why not?

pausz commented May 30, 2017

Thank you for discussing this with me, and sorry for the delay in replying (I was sick last week). I'm in a similar situation to you: it's very likely I'm misinterpreting and/or missing something in what I'm doing :/ ... or reading. It seems we've been reading similar posts, but we have understood different things.

And yes, I read Chris's notes, and I see that they fit with your results; the examples are clear. However, upon reading this post I understood that sigma^2 is the power spectral density per unit frequency (so in au^2/Hz). If that is the case, then the height of a plot of PSD vs f should be sigma^2. Also, the PSD wouldn't change with the number of samples in the random process, and it wouldn't change even if you increased the bandwidth. What should change if you increase the bandwidth is the total power (integrated over all frequencies), while the variance (or alternatively the PSD) remains constant.

About using the L2 norm: the square of the L2 norm gives the power of the signal, and it should be the same in the time domain and the frequency domain.

About v(t)^2 = c: this is something that wasn't very clear to me, because I've read conflicting material. For instance, here they say: "The average value of the square of the signal (which is equal to the variance if the signal has zero mean) is equal to the integral of the power spectral density." OK. From that I understand the variance is the total power (PSD integrated over frequencies). Now, if you look at the first row of Figure 3.1 you can see the timeseries, the power spectrum (PSD, I'm guessing) and the autocorrelation. From those plots, I estimate that the variance used in the example is 1 (look at the autocorrelation at lag 0). But when I look at the height of the power spectrum, which is 1, I have trouble believing that the power of the signal (PSD integrated over all frequencies) is equal to the variance. It's clearly not. I'm not sure what to make of that.

> The integral over a fixed period of time is then the same irrespective of the sampling rate i.e. the energy depends on the variance and the duration of the signal, but not the sampling rate.

An increase in the signal sampling rate effectively increases the bandwidth. In the case of white noise, because it has a constant power spectral density, this means that you now have power from additional frequency components. That reasoning is only valid assuming that sigma^2 represents the average PSD.

The factor of 1/sqrt(N) is used to have symmetry between the forward and inverse Fourier transforms and to have an orthonormal basis. I've also read some threads, like this one and this one, where people discuss the different scalings.

RomeshA commented May 30, 2017

No worries, hope you're feeling better! :) I think we're closing in on it. IMO the critical question boils down to

> What should change if you increase the bandwidth is the total power (integrated over all frequencies), while the variance (or alternatively the PSD) remains constant.

That's the part I can't reconcile with the discrete FFT normalisation - I think we agree that as the bandwidth increases, if the PSD (au^2/Hz) is kept the same, then the total power (integrated over frequency) increases. But my understanding is that, in the time domain, this corresponds to the variance increasing. What your calculation is saying is that if you sample the white noise at a higher sampling rate, the energy in the time domain increases. This is then related to the computation of the L2 norm:

> the square of the L2 norm gives the power of the signal, and it should be the same in the time domain and the frequency domain.

This is another crucial part - my understanding (from sources like these) is that this is an integral. The other reason I think this is the case: suppose we have a deterministic signal like a sine, and suppose it has a duration of 1 s and a frequency of 1 Hz (so we are sampling 1 cycle). If we sample it at 100 Hz, there'll be 100 data points, and we could compute the power based on the signal norm. Now suppose it's sampled at 200 Hz - now there are 200 data points, and we could compute the power based on the signal norm again. Shouldn't the energy in this signal be the same in both cases? This is only true if the norm is computed with an integral.

In the time domain, if the L2 norm is computed with an integral, then for white noise the power/energy won't go up if the sampling rate is increased. But without the integral, the vector norm will increase - I suspect this is the origin of our different results. Wolfram draws a distinction between the l2-norm (summation) and the L2-norm (integration), and I think it's the latter that refers to signal energy.
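A quick numerical version of the sine argument (illustrative):

```matlab
% Energy of one cycle of a 1 Hz sine at two sampling rates.
for fs = [100 200]
    t = (0:1/fs:1-1/fs)';
    s = sin(2*pi*t);
    fprintf('fs=%3d: l2 sum = %6.1f, dt-weighted L2 integral = %.3f\n', ...
            fs, sum(s.^2), sum(s.^2)/fs);
end
% The plain summation doubles with the sampling rate (50 -> 100), while the
% dt-weighted version stays at 0.5, the continuous energy of the signal.
```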

I think once we agree on this, then we'll automatically agree on everything else!

> An increase in the signal sampling rate effectively increases the bandwidth. In the case of white noise, because it has a constant power spectral density, this means that you now have power from additional frequency components.

So I agree with this, but I would ask: if we start in the frequency domain and increase the bandwidth, what happens to the autocorrelation? The autocorrelation is only nonzero at zero lag, so the only thing that can change is the value of the autocorrelation at tau=0. So if we increase the bandwidth with constant PSD, then I think the autocorrelation at tau=0 increases. And I would say that the autocorrelation function for a Gaussian process corresponds directly to the variance, so this implies the variance has increased. I think where we differ, then, is that you would say the autocorrelation of the Gaussian process requires an additional term that accounts for the sampling rate...?
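In equation form (the standard Wiener-Khinchin relation, as I read the argument above):

```latex
\operatorname{Var}[x] = R(0) = \int_{-f_{\max}}^{f_{\max}} S(f)\,df = 2 f_{\max}\,\phi^2,
```

so with the PSD \phi^2 held constant, the variance grows linearly with the bandwidth.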

Another way I think about it is to consider synthesising a spectrally white time series from Fourier components, by setting all the components to the same amplitude and randomising their phases. Something like

x(t) = \sum_{n=1}^{N} A_n \sin(\omega_n t + \phi_n), with the phases \phi_n random in [0, 2\pi)

Then at a single point in time, the signal value is given by adding together N sines, each with a random phase. What are the statistics of this summation, i.e. the statistics of the signal in the time domain? The mean should be zero, and the variance will depend on A_n (the PSD - the same for all components) and will increase with N. Increasing the bandwidth corresponds to having more terms, so this also says that as the bandwidth increases, so does the variance. Do you know what I'm missing here?
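A quick numerical version of this synthesis (illustrative; the amplitude and frequency grid are arbitrary choices):

```matlab
% Variance of a random-phase sum of equal-amplitude sines grows with N.
A = 0.1; T = 4; fs = 512; t = (0:1/fs:T-1/fs);
for N = [64 128]                     % number of Fourier components
    phases = 2*pi*rand(N,1);
    freqs  = (1:N)';                 % Hz, all below Nyquist
    x = A * sum(sin(2*pi*freqs*t + phases), 1);
    fprintf('N=%3d: var(x) = %.3f (expected N*A^2/2 = %.3f)\n', ...
            N, var(x), N*A^2/2);
end
```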

Regarding that Figure 3.1: I don't trust my ability to interpret that figure, because it's not clear whether they are plotting power spectral density or not. Also, without units on the time axis, we can't tell how the bandwidth for the power spectrum came about, so it's possible that the spectrum was truncated. The frequency-spectrum x-axis doesn't have units either. Maybe it's just meant to be a qualitative figure.

jamesmaclaurin commented May 31, 2017 via email

RomeshA commented Jun 1, 2017

Thanks James, I was hoping you'd join in :)

> It seemed that in your notes, you are trying to keep the standard deviation constant, and asymptote the bandwidth to infinity and the power spectral density to zero. If you do this, then you should obtain a deterministic system in the limit of infinite bandwidth - i.e. as if there is no noise at all! I doubt in fact that this is what you really want from your model.

Yes, that's indeed not what I meant. The issue is that for the linear analytic spectrum, the PSD at, say, 10 Hz is fixed, and of course you can evaluate the spectrum/transfer function at whatever set of frequencies you like, because they are all independent. But when we integrate numerically, how do we ensure that the PSD of the noise in the bin spanning 10 Hz has the same value that went into the theoretical calculation? So the aim is to keep the PSD constant as the bandwidth (in both frequency and wavenumber) changes, by adjusting the variance accordingly. The current scaling in NeuroField does what you've just said, whereby as the time step approaches 0 and the bandwidth thus approaches infinity, the variance also approaches infinity.
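A sketch of that aim (illustrative only; the target PSD and the frequency band are arbitrary, and this is my reading rather than the NeuroField code):

```matlab
% Scale the noise variance with the bandwidth (1/dt) so that the PSD in a
% fixed band, e.g. around 10 Hz, stays at the target value phi2 for any dt.
phi2 = 1e-5;                      % target two-sided PSD (au^2/Hz)
for dt = [9.765625e-4 4.8828e-4]
    fs = 1/dt;
    x  = sqrt(phi2*fs) * randn(round(8/dt), 1);   % variance = phi2*fs
    [Pxx, F] = pwelch(x, [], [], [], fs, 'twosided');
    band = F > 5 & F < 15;
    fprintf('dt=%.4e: mean PSD in 5-15 Hz = %.3e au^2/Hz\n', dt, mean(Pxx(band)));
end
```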

So at the very least, the clarity of the derivation needs to be improved!

jamesmaclaurin commented Jun 2, 2017 via email

RomeshA commented Jun 2, 2017

I don't think anything has changed since this document - is that the one you're talking about? There, the equation at the top of page 4 has the variance being linearly proportional to the bandwidth (via the sampling rate) when the PSD (\phi^2_n(\omega)) is held constant. [Edit: note that this is the PSD that goes into the analytic calculation, which is selected independently in advance.]

How should the magnitude of the solution be measured - in the time domain or in the frequency domain? I would expect the power in a specific frequency range, e.g. 1-100 Hz, to remain the same. But if the PSD has a fixed value and the bandwidth is increased, then the total power in the frequency domain is higher, and thus the power in the time domain should be higher as well, i.e. the magnitude of the solution should grow with the variance, in the absence of any filtering, which reflects the definition of the white noise...?

jamesmaclaurin commented Jun 2, 2017 via email

RomeshA commented Jun 2, 2017

Yes, I think the top of page 3 is incorrect, because there I was quoting the textbook formulae that have the notational issues discussed in the second response here. I knew there was a problem, but at the time I attributed it to the result corresponding to an ideal continuous case, rather than to an issue with the notation. So those formulae aren't actually used in the derivation of the result that's in NeuroField :) The scaling originates from the third equation on that page, where the expectation value of the white noise process (i.e. its variance) is equated to the total power in the spectrum integrated over frequency (i.e. accounting for bandwidth).

jamesmaclaurin commented Jun 3, 2017 via email

@pausz pausz added optimize refactor core sci Identifies features that are of scientific interest. labels Apr 5, 2018