-
Notifications
You must be signed in to change notification settings - Fork 24
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Particle.pdgid should return int #263
Comments
Thanks for bringing this up. I would be curious to see how you are using What you ask is not a totally trivial change. The immediate issue is that powerful searches such as
would no longer be possible with such a simple one-line selection function, and that's really bad IMO as that's the kind of area where the package shines. Is that really such a trouble to do @henryiii, what's your immediate reaction? |
My immediate reaction is a) it's easier to convert to int than it is to find and convert to PDGID, b) PDGID provides extra functionality, and duck types like an int (causing non-Numba users to be only inconvenienced by the change), and c) why not just teach Numba that a PDGID is an LLVM int? We could probably even add that to Particle pretty easily. |
Also, |
Yes, that's right. |
Context: In my Numba combined function, I need to look up the mass of particles based on their PID and I need to look up how many strange quarks it has. My workaround is to generate some Numba It is a weird design decision to make It is weird that Perhaps you heard somewhere that composition is better than inheritance, but these statements are never generally true. You used composition here, but that choice is not natural. If |
To stress this again, if |
More surprises which are a consequence of |
This was partially a historical result, rather than a conscious design. @eduardo-rodrigues mostly wrote PDGID, and I mostly wrote Particle. PDGID did not exist when Particle was designed, so it would be quite odd for Particle to inherit from it. A 1.0 redesign could possibly include larger changes, and possibly include moving the particle search functionality out of Particle and into a Particles/ParticleTable class (have not thought about it at this point, maybe @eduardo-rodrigues has). Another option would be to reduce the correspondence between PDGID's and Particles, and instead treat
The defense of the current design would be that the intent of Again, the real problem is that Numba is picky, and doesn't treat subclasses of ints as ints like normal Python does. But we could probably easily teach it otherwise (maybe even teaching it a few details about PDGIDs, in fact) if there's a strong enough desire for it. PS: @eduardo-rodrigues , if you are interested in enabling all PDGID's methods on particle, but without turning Particles also into ints and causing |
By they way, snippets or pointers to Numba code would be very interesting; it might help us decide how to support Numba more natively. |
PS: Second reminder: you can also make Particles that do not have a valid PDGID by specifying all the properties. |
Ok, I get why Particle does not derive from PDGID, but then I find it weird to allow |
Some pseudo-code of what would be useful to do in Numba (my actual code is more complex): @nb.njit
def get_mass(array_of_pids):
masses = np.empty(len(array_of_pids))
for i, pid in enumerate(arrays_of_pids):
m = Particle.from_pdgid(pid).mass
masses[i] = m if m is not None else np.nan
return masses |
Thanks for the reminder and apologies for forgetting to follow-up this thread. I will do so this week. Yes, agreed that it's weird. I can't remember why that was introduced but I would very much like to remove |
As you say @HDembinski , A lot has been said already about inheritance of I can see that inheritance would give us some functionality for free, such as the Agreed that You mentioned inheritance and composition. I see your point. But the design was to have also a One could also put forward to actually have Now, this discussion is raising the fact that methods such as Yes @henryiii, the idea of a particle table class came to my mind already, and precisely because of the behaviour discussed above. This could come in the future, sure, and I think we should give the idea a proper thought ... |
Yes, I think being able to do |
Agree. Am about to commit my PR to remove |
@HDembinski, on your numba example below: for practical purposes would you not get a nice speedup using a case such as from cachetools import cached, LFUCache @cacher This because you won't in practice be dealing with more than 64 (maybe even less?) particle IDs the bulk of the time. |
If you are Python3 only, then that tool is built in to Python, actually: import functools
@functools.lrucache
def pdgid_to_mass(pidid):
... https://docs.python.org/3/library/functools.html#functools.lru_cache Anyway, Edit: I'm not sure about that, though, since we moved to a list of objects system. Maybe it would be better to add "soon" to the final sentence. |
I find it strange that something like |
@eduardo-rodrigues Can I make another push to make PDGID a subclass of int? For all purposes, it is an int with extra methods and context. When two classes have an "is-a" relationship, then one should use inheritance and not composition (Meyers, "Effective C++"). Right now, I cannot pass a PDGID to a numba jitted function, because it is not recognized by numba as an int. If you make PDGID a subclass of int, this works. import numba as nb
class Foo(int): pass
@nb.njit
def f(x):
return x
f(1) # ok
f(Foo(1)) # ok |
This would be solved by making Particle a subclass of PDGID, like I suggested. I still maintain that this is the most natural design. A particle is uniquely defined by its PDGID, therefore they have an "is-a" relationship. |
Does that return int or Foo for the second case? |
It returns an int, but I don't care about returning PDGIDs. I use the pdgid inside the jitted function to make comparisons with input arrays that contain pdg ids. I select those elements of the input arrays that match the pdgid. My workaround right now is something like this: kaon_pid = int(particle.literals.k_plus.pdgid) # awkwardly complex
@nb.njit
def select(px, py, pz, pid, mypid):
r = []
for pxi, pyi, pzi, pidi in zip(px, py, pz, pid):
if pidi == mypid:
# do stuff with pxi, pyi, pzi; fill r
return np.array(r)
px, py, pz = # some arrays from uproot
r = select(px, py, pz, pid, kaon_pid) With both proposed changes (Particle inherits from PDGID, PDGID inherits from int), the awkward line If you don't follow the suggestion that Particle inherits from PDGID, the code still becomes simpler because I don't have to do the int conversion, which is something that my students always stumble over. I really wish we would have had this discussion during the design phase of these classes. It is hard to change design later, you have to get it right at the beginning. |
Thanks for the comment. Yes and no, in fact. Yes charge and alike are a property of a particle, but they are also totally a property of a PDG ID since the latter encode with a precise system/convention charge and a bunch of other internal quantum numbers. It's been on my to-do list to dynamically populate the Particle class with properties such as the ones you mention. I will try and get to this asap but time is often not too much on my side (you may have seen several activities from me here and in DecayLanguage recently, though). |
That's fair and important to have such feedback, @HDembinski. I need to think this over towards a 1.0 version. I also welcome Henry's input. As just said, I will try and deal with this and outstanding issues in the near future, as it's dragging. |
Yes, this is why the class Particle should inherit from the class PDGID. If you "dynamically populate the Particle class with properties" from PDGID, you just make things more complicated. What is the problem with inheriting from PDGID? |
To be fair, there was a method doing exactly this in the original particle; it was called The point of Please do not add anything dynamically. This breaks static type checkers and IDEs; a method that's not statically available is nearly useless, IMO. Having good static typing is more important that docs for some users like me, and Particle doesn't really have any docs. ;) If you do want to combine the methods, you can use a mixin, but this also gives particle a really large number of methods, while it's a bit better organized if everything computable from the ID system lives on PDGID rather than in two places. |
These were developed separately (I wrote Particle, Eduardo wrote PDGID). Though, I do see it as a separation of concerns. PDGID understands how to interpret the PDGID numbering scheme. Particle holds particle data, which includes a PDGID number (usually) along with other values.
Currently we always have a PDGID, but with other schemes available, I'm not sure that's truly a requirement - I know we wanted to be open to other schemes. Do we have to always have a PDGID? If it inherits, it's a hard requirement. Also, that makes Particle an int, forcing things (like Maybe that's okay, but it's much less controlled and compartmentalized than what we have now. We could also add just a collection of values with a mixin. We pick methods we want to share and mixin both PDGID and Particle. I'd make the point of sharing |
Is a Particle an int? I think no, it is not an int. There is an int associated with it, yes, but it is not an int. You can have two Particles with the same PDGID. It is composed with an int (and other things). I think we were even keeping things open for a Particle created without a PDGID. Is a PDGID an int? Yes, it is. It has some extra methods, but it is an int, converting to and from int is lossless. You cannot have two PDGIDs with the same int value. So, IMO, composition is the exactly the correct design for Particle, and inheritance for PDGID. |
Thanks, Henry, for the comments. I gave this some more thought and I will backpaddle on the point that PDGID should inherit from int. The main issue is that PDGID would inherit a lot of arithmetic operators that make little sense ( I still think that Particle should inherit from PDGID and moreover, Particle.pdgid should return an int and not the class PDGID. We only use this awkward construction right now because Particle does not inherit from PDGID. It makes sense that Particle has all the methods that PDGID has, because a Particle is essentially a PDGID with extra information from a different database. I still maintain there is an "is-a" relationship between them. Supporting other codes does not change this, there always has to be a unique mapping between any other system and the PDG system. I cannot follow the argument that they should not inherit because of separation of concern. The two classes already have well defined separate roles, inheritance does not change that. Particle simply offers a strict superset of the features of PDGID and therefore can inherit the methods from PDGID. A PDGID should be explicitly convertible to an int, which it is, since I am also glad that you now agree with me that static typing is good, a few years ago your opinion was still different. |
Nitpick: a class that is losslessly transformable to https://docs.python.org/3/reference/datamodel.html#object.__index__ |
Ok fair point, the difference between On second thought, I am not sure we want a PDGID to work in a slice, |
Hello Hans, Henry, Time to finalise on some long-standing issues. This is one of them, that talks about design choices. We had several batches of exchanges. I have the impression that one has to move forward in steps, and for example do not try and solve what is here and #277 at the same time. It also seems to me that we have divergent opinions on what the Particle class should inherit from, etc. For example, I still cannot see why Particle would effectively inherit from PDGID at which point I could do I had suggested at some point that a pragmatic way forward would be to have Particle.pdgid -> int This actually does not seem crazy and would solve the issue you raise; I do note it does not solve the design disagreements. There's quite some few bits of class-internal changes to make for this, but all pretty trivial I believe. If you both find that an improvement I will get it out soon. With such a change above there are consequences since for example
would no longer work unless one puts PDGID instead of pdgid. Now, if I make p.has_bottom a valid method - users have been asking about it anyway, also for p.is_lepton instead of p.pdgid.is_lepton, etc. - then we're good and one will simply be able to do
All in all another improvement. This will imply in the end a relatively minor change of API for the concrete type returned by Let me know how you feel about this. I have the impression this could be "the best of both worlds", at least pragmatically. Many thanks. Henry, you had say that
We can talk "offline" about that, or directly via the MR. |
To make a class which derives from int but prevents arithmetic, you have to overload all unwanted operators. Ugh. class MyInt(int):
def __new__(cls, *args):
return int.__new__(cls, *args)
def __add__(self, other):
raise RuntimeError
MyInt(1) + MyInt(1)
# RuntimeError Maybe it is better to not derive from From a purist point of view, arithmetic should not be allowed for PDGID, but in practice I find it hard to imagine where this would lead to a bug. I think Particle should inherit from PDGID, because the ID system of the particle data group is the established standard, and since each Particle has an associated unique PDGID (vice versa it is not the case, but that's ok, Particle inherits from PDGID and not vice versa). Particle.pdgid -> int
Particle.PDGID -> PDGID This is very unpythonic in my view and very surprising. Please don't do this. It goes against "There should be one - and preferably only one - obvious way to do it".
The motivation for letting Particle inherit from PDGID is to precisely make Is your main opposition that inheritance makes |
I find this inheritance structure surprising. If we tightly couple Maybe let's ask the question what the benefit is of having a |
Originally, PDGID was developed by Henry and Particle by Eduardo. The idea of PDGID was to provide all information that can be extracted directly from the numerical value of the PDGID, without a lookup into a database. You cannot get the particle mass or name from PDGID, but you can get other info like I think this makes sense and is a sensible split. Of course you can make a mega-class by just concatenating the contents of PDGID and Particle, but having them separate is better for maintenance, as it allows for clear separation of concern. My preferred solution is for Particle to inherit PDGID and to have a property |
IMO, Particle is not a PDGID, and having a massive number of methods on a class is daunting. The current organization, We now support more ID systems than PDGID like Geant3ID, PythiaID, and Corsika7, it's possible we might want to be able to create particles without a PDGID. This flexibility was part of the original design and would be destroyed by inheriting from PDGID. A subclass of int fulfills the role of int, the only reason you can see a difference is due to Numba, which is their problem, not ours. This is a tenant of subclassing - the subclass behaves like the parent. So the issue here "Particle.pdgid should return an int" is already fulfilled. It does. It happens to be a subclass, but it absolutely is an int. My preferred solution would be to teach numba that PDGID is an int and then close this issue. |
Indeed having the PDGID class is absolutely necessary and a no-brainer. It's the thing that encodes lots of particle quantum number information. And it is also necessary as we provide other ID systems as said above, together with converters between them. We agreed at least on not having PDGID inherit from int and I have a MR essentially ready for that. I had to implement some functions by hand, to be able to make some relevant and useful comparisons, but that's trivial. I will create the MR later today ... |
I reject that argument. Having a massive number of methods is fine and warranted, if those methods make sense to have on the class. It makes sense to have
That's a detail that is not relevant to end users.
That's bad and not a solution to my issues with the current API.
All these should be mappable to a PDGID. If you want to create particles without a PDGID (for what purpose, actually?), you can set it to 0, which is an invalid value. I don't see that as an obstacle. Note that we need the ID mapping anyway to support methods like
We should definitely do that, but closing the discussion is premature, because it has evolved a bit away from that particular issue. We are discussing other issues here which are not solved by teaching numba that PDGID is an int. |
One more point on the "Particle is an int" part, it seems – I didn't know this before – pdgids are not unique, e.g. the proton seems to have two representations:
But:
Am I missing something? Are these indeed different particles? |
You got it exactly right, the PDGIDs are non-unique for the proton and the neutron (I think these are the only special cases?). We have 2212 for the Proton as a bag of quarks, but the proton is also the nucleus of the hydrogen atom. For nuclei, we have this second naming scheme, which uses the number of nucleons and the charge. The neutron is not a nucleus, but it can also be represented with a nucleus code and some programs use it, as far as I remember. I think p1 and p2 should compare equal in your example, since both particles are logically the same. We should fix this in the PDGID class. Such an overload of the equality operator can also be achieved also for a class that derives from int. |
I think the same is true for the neutron:
and their anti-particles. |
Is that also the reason why |
Hello. trying to find some bandwidth the reply to this thread and other ones ... This and the other issues you posted, such as #620, are all linked. Indeed PDG IDs are unique, as they should, but we then got the feature when I added nuclei info, that the proton and the neutron have their standard "particle" number and "nuclei" number as well, as Hans says. These are the only such cases.
Yep, If anyone is happy to help (I'm particularly busy for the next 2 weeks), that would be great.
Yep. But there is a little bit more to that one. I will try and reply on the other issues very soon. Thank you for stress-testing :-). |
Context: I use Numba a lot, so it would be great if
particle
would work with Numba directly, but I guess that this is probably a more elaborate project. In the meantime as a workaround, I extract information from particle and store that in Python lists and dicts, which I then pass to the numba-compiled functions.The attribute
Particle.pdgid
is of typeparticle.pdgid.pdgid.PDGID
instead ofint
, which leads to unexpected issues. I argue it would be better to return a plainint
. The typeparticle.pdgid.pdgid.PDGID
emulates a plain int, so you won't notice the difference much in pure Python, but it is a different type nonetheless and Numba does not understand this type. If I prepare a list or dict of PDG IDs to pass to Numba, I need to callint(Particle.pdgid)
explicitly, which is doable but irritating.I argue that there is no benefit of
Particle.pdgid
beingparticle.pdgid.pdgid.PDGID
here, because the extra information that can be queried fromparticle.pdgid.pdgid.PDGID
is can already be queried fromParticle
.The text was updated successfully, but these errors were encountered: