Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Feat] Meas class for gaussian error propagation. #245

Open
wants to merge 4 commits into
base: develop
Choose a base branch
from

Conversation

fjosw
Copy link
Owner

@fjosw fjosw commented Nov 17, 2024

I sometimes want to do Gaussian error propagation and find the syntax for creating a scalar cov_Obs a bit inconvenient. For this reason I propose a Meas class that simplifies this.

import pyerrors as pe
value = 1.1
standard_error = 0.23
meas = pe.Meas(value, standard_error)
covo = pe.cov_Obs(value, standard_error ** 2, meas.names[0])
assert covo == meas

Standard errors do not need to be squared and if no name is provided, a uuid is generated instead. I also propose to automatically evaluate the gamma method when an Obs exclusively consists of cov_Obs / Meas as the error estimation should be unambigous and the performance oberhead should be minimal.

In the process I also found a small bug in the cov_Obs implementation.

@fjosw fjosw requested a review from s-kuberski November 17, 2024 11:28
raise TypeError(f"dvalue has to be a float or int, not {type(dvalue)}")
super().__init__(samples=[], names=[], means=[])
if name is None:
name = uuid.uuid4().hex
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey @fjosw,
the class looks good to me. Thanks for the idea and addition. I just wanted to ask if this is something that we want. I do not mind it, and if there is a specific use case for you, we can, of course, leave it in.

@s-kuberski
Copy link
Collaborator

Hi,

thanks for the suggestion! What was the bug in the existing implementation? The missing 'float()'?

In general, I think that it is handy to have a wrapper that makes ones life easier. However, I ask myself if assigning automatically chosen random names to observables is something that one should do. I know that we already do it for priors in fits and there it would be very hard to circumvent it. Here, there could be the possibility that people introduce their external observable with the new class and destroy their error propagation because they add the corresponding uncertainty multiple times. What exactly is the use case you were aiming at?

It is certainly a very good idea to perform the error propagation automatically if no MC data is present!

@fjosw
Copy link
Owner Author

fjosw commented Nov 18, 2024

Hi @jkuhl-uni & @s-kuberski ,

What was the bug in the existing implementation? The missing 'float()'?

Yes, we currently compute the gradients for __pow__ via autograd which cannot cope with int, so I had to add the float cast to make it work. We could actually provide the gradients for pow directly which should be faster. I can prepare a PR for that.

if this is something that we want
I ask myself if assigning automatically chosen random names to observables is something that one should do.

My underlying assumption for the random uuid is that, unlike Monte Carlo data, constants with errors are usually statistically independent of all other sources of error. If that is the case it might be easier to automatically assign a random string instead of having the user come up with unique identifiers. The option to assign a dedicated name of course still exists.

By adding the Meas class we could slightly increase the scope of pyerrors as a tool for Gaussian error propagation which feels a bit cumbersome with the current UI. I was considering adding another example to detail this feature but if you feel like it is too dangerous to blindly assign a random name, I'm happy to remove it or change the default.

@s-kuberski
Copy link
Collaborator

We could actually provide the gradients for pow directly which should be faster. I can prepare a PR for that.

That sounds good!

By adding the Meas class we could slightly increase the scope of pyerrors as a tool for Gaussian error propagation which feels a bit cumbersome with the current UI.

I agree that the current UI is not perfectly suited for quick error propagation, because I chose to ask for the covariance and because we require a name. If one would like to allow users to easily perform similar tasks as they would do in the 'uncertainties' package, the proposed change would be the perfect solution, I think.

My underlying assumption for the random uuid is that, unlike Monte Carlo data, constants with errors are usually statistically independent of all other sources of error. If that is the case it might be easier to automatically assign a random string instead of having the user come up with unique identifiers. The option to assign a dedicated name of course still exists.

As soon as someone runs a script twice and saves the output somewhere to process it afterwards, I would think that a dedicated naming scheme is very important to ensure proper error estimation. If someone is not aware of this, it could easily happen that they input, say, the physical pion mass using the new class and end up with uncorrelated contributions from the same observable. I'd think that names really matter as soon as one does a proper analysis.

So probably it boils down to anticipating what users do with the package that would still provide all the means to do everything properly and the question if not asking for a name per default might make it easier to do mistakes. I'm not really sure about how to proceed.

@fjosw
Copy link
Owner Author

fjosw commented Nov 20, 2024

As soon as someone runs a script twice and saves the output somewhere to process it afterwards, I would think that a dedicated naming scheme is very important to ensure proper error estimation.

If your main concern is about saving and reusing objects with automatically generated names, we could simply prevent such Obs to be exported. I added an explicit check to the json module which throws a ValueError in these cases. Do you feel like this is safe enough or are there still other silent failure modes?

@s-kuberski
Copy link
Collaborator

I'm still not fully convinced, but I don't want to block it, if you'd like to have the feature in. I still fear that people create the same object twice with different uuid in their analysis, when they are not as good organized as you ;). Basically, I think that it is somewhat bad style if the final object that comes out of the analysis contains a contribution that is not clearly identifiable. It goes a bit against one of the basic ideas... So I would think that the feature would only be good for some back of the envelope error propagation, not for real world analyses.

In case we add the class, I would think that it would be good to expand the docsctring, clearly stating the restrictions of the class and pointing to the covobs class for "proper" error propagation. Also, I was thinking about the name. Why did you choose "scalar measurement"? Somehow, I would relate a measurement to something that comes out of a MC chain. I don't have a good suggestion for an alternative, though.

@s-kuberski
Copy link
Collaborator

Maybe we could make the name non-optional and still use an uuid if None or an empty string is passed? This would at least prompt users to think about names. Of course, it again adds a bit more luggage to the easy creation of these objects. At least, one would not have to come up with names, if one just wants to do a bit of error propagation where everything can be assumed to be uncorrelated.

@jkuhl-uni
Copy link
Collaborator

I also thought about runtime warnings like
Warning: No name given, using UUID.
However, I think those would be pretty ugly and not helpful since a) the class is used as intended and a warning would still be issued and b) they could vanish in too many notifications.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants