Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Draft: Enabling training on MCPE pulses #29

Open
wants to merge 29 commits into
base: master
Choose a base branch
from

Conversation

fschlueter
Copy link
Contributor

@fschlueter fschlueter commented Jun 4, 2024

With this PR one can use event-generator to train models on MCPE pulses and hence use the generative models for simulations.

In addition to the new feature, this PR introduces a small number of small improvements. For example:

  • Specification of the Label key in the config file
  • Specification of the source hypothesis parameter names in the config
  • Calculating pdf, cdf, and probability quantiles per dom

fschlueter and others added 29 commits April 9, 2024 08:50
@fschlueter
Copy link
Contributor Author

@mhuen I would be interested to know what is missing for potentially merging this. There are probably a few shortcoming in the current implementation. For example: I allow to read MCPE pulses from hdf5 files, the function which read data from i3 files (i.e. frames) are not compatible yet.

@@ -876,7 +868,7 @@ def train(
# --------------------------
# evaluate on validation set
# --------------------------
if step % opt_config["validation_frequency"] == 0:
if step % opt_config["validation_frequency"] == 0 and step: # not validate on the fist iteration
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I actually find this first point very interesting and important, since it provides the baseline of the untrained model

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have, by the way, also tweaked the data input pipeline a bit in one of the branches feeding into the CollectBreakingChanges. Some of the work is now distributed further to the worker nodes, such that it is more easily parallelized, which will speed up the data pipeline. If the reason for you to make this change was to reduce the setup time until training starts, you could use this newer branch and reduce the number of additional files that you require per event pool in the validation data iterator or increase the number of workers to match the number of files. That way you only need one iteration of file loading until the queue is populated

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great! Okay I will revert this than.

file_or_frame, *args, **kwargs
)
file_or_frame, *args,
label_key=self.label_module.configuration.config["label_key"],
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this needs to remain in the kwargs. Modules are not forced to specify/use a label_key. But I also have a slightly different implementation in the CollectBreakingChanges branch.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not quite sure that I fully understand why it hace to remain in kwargs.

@mhuen
Copy link
Collaborator

mhuen commented Jun 6, 2024

I am also in the process of some larger changes in event-generator, including some necessary breaking changes. This is currently being collected in the branch CollectBreakingChanges. The reason I mention this here is because some of these features are also added there, such as the variable label key name. The MCPE stuff may have some further reaching implications, as the pulse type may differ when applying the model. So one would have to think that one through and see if it would work down the line and/or if there are additional places in the code that need adjustment. I do see that you added it as a mutable setting, which is good, but there may be other necessary modifications.

What I have done so far when training on MCPEs, was simply to save the MCPE as reco pulses in hdf5 format. Which is also the workaround I chose for other datatypes that didn't have an icetray hdf converter setup yet. In any case, I am in the process of implementing a number of updates. And since these will necessarily break compatibility to older models, I am also cleaning up the code a little bit, and removing old parts that were added for compatibility. Perhaps it's best to merge these changes to that branch? Then one can also clean up some of the stuff, when not having to worry about backwards compatibility

@fschlueter
Copy link
Contributor Author

Just a note, we do successful train models on MCPEs. If you meant with changes "down the line" might be necessary to train them, this is working already (the question if the training can be as good as possible might be a different one).

@fschlueter
Copy link
Contributor Author

In the context of larger changes. So far it seems that the asymmetric gaussians are hardcoded in many places. I would like to test a triple pandel function instead. I think the time pdf model could be made flexible but it is a more inversive change I think. Were you ever planing of doing this?

@mhuen
Copy link
Collaborator

mhuen commented Jun 6, 2024

It's more about once the model is exported and applied later on. Say you want to test running it on reco pulses or so. Does that work?

@mhuen
Copy link
Collaborator

mhuen commented Jun 6, 2024

In the context of larger changes. So far it seems that the asymmetric gaussians are hardcoded in many places. I would like to test a triple pandel function instead. I think the time pdf model could be made flexible but it is a more inversive change I think. Were you ever planing of doing this?

I was thinking about testing this. But yeah, this would involve a bit of restructuring of the models. I think the rest of the code should be modular enough that it doesn't matter for them. But the current base source model and derived classes assume the asymmetric Gaussians

@mhuen
Copy link
Collaborator

mhuen commented Jun 6, 2024

I think the base class is actually fine. It's just the utility functions (pdf, cdf) that assume it. But these are only needed for debugging purposes. Training and application of the model itself uses the get_tensors method and all this requires is a result tensor named pulse_pdf to which the evaluated PDF for each pulse/pe is written to. So one should be able to test this fairly easily, by simply adding a derived class that utilizes a different mixture basis

@fschlueter
Copy link
Contributor Author

It's more about once the model is exported and applied later on. Say you want to test running it on reco pulses or so. Does that work?

What do you mean with running it on reco pulses? We have not done much with the models yet. Only started evaluating the prediction of the total charge against MC and photonics.

@fschlueter
Copy link
Contributor Author

I think the base class is actually fine. It's just the utility functions (pdf, cdf) that assume it. But these are only needed for debugging purposes. Training and application of the model itself uses the get_tensors method and all this requires is a result tensor named pulse_pdf to which the evaluated PDF for each pulse/pe is written to. So one should be able to test this fairly easily, by simply adding a derived class that utilizes a different mixture basis

I will leave in a week to Greenland. I probably wont be able to do something in this direction before but I can have a look afterwards.

@mhuen
Copy link
Collaborator

mhuen commented Jun 6, 2024

Do you remember if the triple pandel function had an analytic CDF? Or at least if it's easy to evaluate with existing tensorflow functions? (We need to have the gradients)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants