Eprop evidence_accumulation discussion #3

Open

tnowotny (Collaborator) opened this issue Nov 23, 2020 · 0 comments
In this recording of delta g updates:

"$(DeltaG) += (eFiltered * $(E_post)) + (($(FAvg) - $(FTargetTimestep)) * $(CReg) * e);\n"

Are you missing a factor of 1/(T_trial * n_trial), or in other words 1/t_total, where t_total is the total time since the last weight update (see eqn (7) in the supplement)?
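For reference, a minimal sketch of where such a scaling could go when the accumulated gradient is consumed at the end of a learning period; tTrialMs, numTrials, deltaG and learnRate are hypothetical host-side names here, not code from the repository:

const double tTotalMs = tTrialMs * numTrials;   // total time since the last weight update
const double gradient = deltaG / tTotalMs;      // the 1/t_total factor from eqn (7)
g -= learnRate * gradient;                      // plain SGD shown for brevity; the repo uses Adam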

The other thing I don't understand, basing my knowledge on the Nature Comms paper and its supplement, is the additional decay of e_ij through eFiltered here:

"eFiltered = (eFiltered * $(Alpha)) + e;\n"

Equation (23) in the main paper seems to suggest that the only decay happens through ZFilter.
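To make the comparison concrete, the LIF eligibility trace as I read it from the paper, in my own LaTeX notation (so treat the exact form as my reading, not a quote):

e_{ji}^t = \psi_j^t \, \bar{z}_i^{t-1}, \qquad \bar{z}_i^t = \alpha \, \bar{z}_i^{t-1} + z_i^t

i.e. the only exponential decay is in the filtered presynaptic spike train \bar{z}_i (the ZFilter), while the pseudo-derivative \psi_j^t enters unfiltered, which is why the extra eFiltered stage surprised me.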

I am not sure I understand your moving average of the firing rate here:

SET_POST_SPIKE_CODE("$(FAvg) += (1.0 - $(AlphaFAv));\n");

... why the (1 - alpha) rather than 1? (It probably makes no difference, though.)
More generally, it would probably be safer to just compute the full average over the previous learning period and keep it constant during each period, as originally suggested in supplementary note 2.
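To spell out the two alternatives I'm contrasting, a plain C++ sketch with hypothetical names, not code from the repository:

// (a) Exponential moving average (assuming there is also a per-timestep decay
//     fAvg *= alphaFAv elsewhere in the code):
//       every timestep:  fAvg *= alphaFAv;
//       on each spike:   fAvg += (1.0 - alphaFAv);
//     With the (1 - alphaFAv) increment the fixed point of fAvg is the firing
//     probability per timestep (comparable to FTargetTimestep); an increment of 1
//     would give the same quantity scaled by 1/(1 - alphaFAv).

// (b) Full average over the previous learning period, held constant while learning,
//     as suggested in supplementary note 2:
const double fAvg = static_cast<double>(spikesInLastPeriod) / static_cast<double>(timestepsPerPeriod);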

I have the same questions for the ALIF eprop implementation regarding regularisation and additional filtering of e_ij here:

"eFiltered = (eFiltered * $(Alpha)) + e;\n"
"// Apply weight update\n"
"$(DeltaG) += (eFiltered * $(E_post)) + (($(FAvg) - $(FTargetTimestep)) * $(CReg) * e);\n"

If you do a moving average of spike rate for the regulariser, I wonder whether 500 ms here:

500.0); // Firing rate averaging time constant [ms]

is much too short, as it would resolve the presumably strong rate differences between the different phases of the experiment. It should probably be on a scale that encompasses at least one, if not several, trials (of some 2000 to 3000 ms each), so probably more like >> 10000 ms?
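For orientation, the relation between the averaging time constant and the per-timestep decay factor, assuming a 1 ms simulation timestep (my own calculation, not taken from the code):

#include <cmath>

const double dtMs  = 1.0;                         // assumed simulation timestep
const double tauMs = 500.0;                       // firing rate averaging time constant
const double alphaFAv = std::exp(-dtMs / tauMs);  // ~0.998 per timestep
// With tau = 500 ms, contributions older than one trial (2000-3000 ms) are already
// damped by a factor of e^-4 to e^-6, so the average mostly reflects the current phase;
// tau >> 10000 ms would smooth across several trials.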

I was unable to find information about the initial weight matrices. You seem to use G(0, weight0/sqrt(n)) for the input-RSNN connections, G(0, weight0/sqrt(2 n_RSNN)) and G(0, weight0/sqrt(2 n_RSNN)) ... where do those come from (or are they guessed)?
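For reference, the initialisation I'm describing would correspond to something like this (plain C++ with <random>; weight0, numInput and numRSNN are my guesses at the relevant quantities, not names from the repository):

#include <cmath>
#include <random>

std::mt19937 rng(1234);
// Input -> RSNN: Gaussian with sd = weight0 / sqrt(number of inputs)
std::normal_distribution<double> inputDist(0.0, weight0 / std::sqrt(static_cast<double>(numInput)));
// Other connections: sd = weight0 / sqrt(2 * n_RSNN)
std::normal_distribution<double> recurrentDist(0.0, weight0 / std::sqrt(2.0 * numRSNN));
const double w = inputDist(rng);   // one sampled input weight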

I think the max delay here should be 1500 ms according to the paper?

constexpr double maxDelayMs = 1000.0;

I think in the paper there is an initial 50 ms settling time before the first cue comes in, whereas you seem to start with the first cue at timestep 0 according to this:

for(unsigned int timestep = 0; timestep < trialTimesteps; timestep++) {
    // Cue
    if(timestep < cueTimesteps) {
        // Figure out what timestep within the cue we're in
        const unsigned int cueTimestep = timestep % (Parameters::cuePresentTimesteps + Parameters::cueDelayTimesteps);

(not sure whether it matters)
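A minimal sketch of how an initial settling window could be added to that loop (settleTimesteps is a hypothetical constant, not from the repository):

// Hypothetical: no cue input during an initial settling window
if(timestep >= settleTimesteps && timestep < (settleTimesteps + cueTimesteps)) {
    // Figure out what timestep within the cue we're in, offset by the settling window
    const unsigned int cueTimestep = (timestep - settleTimesteps)
        % (Parameters::cuePresentTimesteps + Parameters::cueDelayTimesteps);
    // cue handling as before
}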

How confident are you that the Adam optimizer works exactly as it should? I was struggling a little to decipher it and, while it now seems structurally understandable to me, I don't know enough about how it's supposed to work, including the "standard parameters". If that wasn't working just right, it could (to state the obvious) have fatal consequences.
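For cross-checking, the textbook Adam update with its usual default parameters (beta1 = 0.9, beta2 = 0.999, epsilon = 1e-8), written as a plain C++ reference rather than as the repository's implementation:

#include <cmath>

// One Adam step for a single parameter w; m and v are its running moment estimates
// and t is the (1-based) update count.
void adamStep(double &w, double grad, double &m, double &v, unsigned int t,
              double lr = 0.001, double beta1 = 0.9, double beta2 = 0.999, double eps = 1.0e-8)
{
    m = (beta1 * m) + ((1.0 - beta1) * grad);             // first moment (mean of gradients)
    v = (beta2 * v) + ((1.0 - beta2) * grad * grad);      // second moment (uncentred variance)
    const double mHat = m / (1.0 - std::pow(beta1, t));   // bias correction
    const double vHat = v / (1.0 - std::pow(beta2, t));
    w -= (lr * mHat) / (std::sqrt(vHat) + eps);
}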
