Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Particle.pdgid should return int #263

Open
HDembinski opened this issue Aug 27, 2020 · 44 comments
Open

Particle.pdgid should return int #263

HDembinski opened this issue Aug 27, 2020 · 44 comments
Labels
⚙️ enhancement New feature or request

Comments

@HDembinski
Copy link
Member

HDembinski commented Aug 27, 2020

Context: I use Numba a lot, so it would be great if particle would work with Numba directly, but I guess that this is probably a more elaborate project. In the meantime as a workaround, I extract information from particle and store that in Python lists and dicts, which I then pass to the numba-compiled functions.

The attribute Particle.pdgid is of type particle.pdgid.pdgid.PDGID instead of int, which leads to unexpected issues. I argue it would be better to return a plain int. The type particle.pdgid.pdgid.PDGID emulates a plain int, so you won't notice the difference much in pure Python, but it is a different type nonetheless and Numba does not understand this type. If I prepare a list or dict of PDG IDs to pass to Numba, I need to call int(Particle.pdgid) explicitly, which is doable but irritating.

I argue that there is no benefit of Particle.pdgid being particle.pdgid.pdgid.PDGID here, because the extra information that can be queried from particle.pdgid.pdgid.PDGID is can already be queried from Particle.

@eduardo-rodrigues
Copy link
Member

Thanks for bringing this up. I would be curious to see how you are using Particle with Numba, and what are issues part from this PDGID to ìnt` conversion, at least for my own education.

What you ask is not a totally trivial change. The immediate issue is that powerful searches such as

>>> # Find all neutral beauty hadrons
>>> Particle.findall(lambda p: p.pdgid.has_bottom and p.charge==0)

would no longer be possible with such a simple one-line selection function, and that's really bad IMO as that's the kind of area where the package shines.

Is that really such a trouble to do [int(p.pdgid) for p in Particle.findall(...)] at the top of a chunk of work?

@henryiii, what's your immediate reaction?

@henryiii
Copy link
Member

My immediate reaction is a) it's easier to convert to int than it is to find and convert to PDGID, b) PDGID provides extra functionality, and duck types like an int (causing non-Numba users to be only inconvenienced by the change), and c) why not just teach Numba that a PDGID is an LLVM int? We could probably even add that to Particle pretty easily.

@henryiii
Copy link
Member

Also, int(p) is a good way to get an integer particle number, IIRC, and is one less character than p.pdgid.

@eduardo-rodrigues
Copy link
Member

Also, int(p) is a good way to get an integer particle number, IIRC, and is one less character than p.pdgid.

Yes, that's right.

@HDembinski
Copy link
Member Author

HDembinski commented Aug 28, 2020

Context: In my Numba combined function, I need to look up the mass of particles based on their PID and I need to look up how many strange quarks it has. My workaround is to generate some Numba Dicts outside the Numba function and fill those with the information, then I pass those Dicts to Numba functions. Sure, one can do this, but I noticed then and there, that particle is not designed to process large arrays of PIDs, like "take an array of PIDs and give me a new array with all the masses".

It is a weird design decision to make Particle not derive from PDGID, but allow int(Particle(...)), effectively treating Particle like a PDGID. If Particle is effectively a PDGID, it should offer the full superset of the interface of PDGID, in other words, it should derive from PDGID. If there is an obvious "is-a" relationship between Particle and PDGID (I think there is), then inheritance is the right tool. See e.g. Meyers Effective C++.

It is weird that .pdgid is not returning a plain int, because a PID is fundamentally a plain int. When I call a data attribute of a class that is to be used in numerical code, I expect it to return plain data types unless there is a good reason not to.

Perhaps you heard somewhere that composition is better than inheritance, but these statements are never generally true. You used composition here, but that choice is not natural. If Particle did derive from PDGID, it wouldn't have to offer access to PDGID via an attribute and we would have this discussion.

@HDembinski
Copy link
Member Author

To stress this again, if Particle does not derive from PDGID, then it is very unnatural that int(Particle(...)) works, since Particle according to your design "is not" a PDGID.

@HDembinski
Copy link
Member Author

More surprises which are a consequence of Particle not inheriting from PDGID: abs(Particle(...)) fails, but abs(PDGID(...)) works.

@henryiii
Copy link
Member

henryiii commented Aug 29, 2020

Perhaps you heard somewhere that composition is better than inheritance

This was partially a historical result, rather than a conscious design. @eduardo-rodrigues mostly wrote PDGID, and I mostly wrote Particle. PDGID did not exist when Particle was designed, so it would be quite odd for Particle to inherit from it.

A 1.0 redesign could possibly include larger changes, and possibly include moving the particle search functionality out of Particle and into a Particles/ParticleTable class (have not thought about it at this point, maybe @eduardo-rodrigues has). Another option would be to reduce the correspondence between PDGID's and Particles, and instead treat pdgid as one of many possible representations, removing __int__().

PS: I really don't like APIs like "make a singleton-like class then use it" for a library supporting beginners, but we might be able to provide beginner friendly options and a "default" particle table.

The defense of the current design would be that the intent of int(Particle(...)) is clear - you are asking for a int representation of the current particle, and that by far most likely to be a PDG ID int (see option two, though, which might remove this). But a particle is not an int; it just has a explicit conversion to an int. abs(Particle(...)) fails (or should return a Particle, possibly?). Particle(...) + 1 has no meaning, since not all particles exist. PDGID() + 1 does, since it's just an int representing the ID of a particle. (PDGID(...) + 1).has_charm is a perfectly valid question. PDGID is valid for all possible ints, while Particle is a lookup-based object; the really do not have anything in common. Particle is not supposed to duck type as an int, while PDGID does (since it is an int). While PDGID is an int, but also has further information beyond an int.

Again, the real problem is that Numba is picky, and doesn't treat subclasses of ints as ints like normal Python does. But we could probably easily teach it otherwise (maybe even teaching it a few details about PDGIDs, in fact) if there's a strong enough desire for it.

PS: @eduardo-rodrigues , if you are interested in enabling all PDGID's methods on particle, but without turning Particles also into ints and causing Particle() + 1 to "work" but make an invalid particle, we can explore this; I have a simple idea for how this could be done.

@henryiii
Copy link
Member

By they way, snippets or pointers to Numba code would be very interesting; it might help us decide how to support Numba more natively.

@henryiii
Copy link
Member

henryiii commented Aug 29, 2020

PS: int(p) produces an int, and PDGID(p) produces a PDGID. p.pdgid is just a shortcut for PDGID(p).

Second reminder: you can also make Particles that do not have a valid PDGID by specifying all the properties. int(p) may fail.

@HDembinski
Copy link
Member Author

Ok, I get why Particle does not derive from PDGID, but then I find it weird to allow int(Particle(...)). You want to teach people to use int(Particle(...).pdgid) then. Allowing int conversion for both PDGID and Particles smears this important distinction between the two.

@HDembinski
Copy link
Member Author

Some pseudo-code of what would be useful to do in Numba (my actual code is more complex):

@nb.njit
def get_mass(array_of_pids):
    masses = np.empty(len(array_of_pids))
    for i, pid in enumerate(arrays_of_pids):
          m = Particle.from_pdgid(pid).mass
          masses[i] = m if m is not None else np.nan
    return masses

@eduardo-rodrigues
Copy link
Member

Ok, I get why Particle does not derive from PDGID, but then I find it weird to allow int(Particle(...)). You want to teach people to use int(Particle(...).pdgid) then. Allowing int conversion for both PDGID and Particles smears this important distinction between the two.

Thanks for the reminder and apologies for forgetting to follow-up this thread. I will do so this week.

Yes, agreed that it's weird. I can't remember why that was introduced but I would very much like to remove Particle.__init__ asap. I'm sure I will give more reasoning when properly replying.

@eduardo-rodrigues
Copy link
Member

As you say @HDembinski ,
"""
... particle is not designed to process large arrays of PIDs ...",like "take an array of PIDs and give me a new array with all the masses"
""".
But the important bit here is what you mean by large. If you are going to make this ID-mass match for tens of thousands or millions of entries in some ROOT file or alike, then the for-loop is going to be "slow". This being said, you would rather wrap things in a very simple cache function to avoid checking pion ID-mass matches and other very commonly found particles, for the sake of example.
Unless I'm misunderstanding your comment.

A lot has been said already about inheritance of Particle from PDGID. Yes, a particle is uniquely identified in particular in MC programs by its PDG ID. But a particle is not an int number that cleverly encodes various properties for the particle it relates to. A particle is quite a different beast - not talking here in terms of code.

I can see that inheritance would give us some functionality for free, such as the is_XXX properties of a PDG ID. Still, I very much prefer to keep the classes uninherited.

Agreed that Particle.__init__ can look rather awkward and I will just now create a PR to remove this. I don't think it will have an impact on users.

You mentioned inheritance and composition. I see your point. But the design was to have also a PDGID class separate from Particle, precisely because these can be different things. @henryiii has already dwell into that.

One could also put forward to actually have
Particle.pdgid -> int
Particle.to_pdgid -> PDGID
Would that be crazy or rather useful?

Now, this discussion is raising the fact that methods such as Particle.findall(...) return a list of Particle instances
whereas many other properties are simply for a Particle instance. In that respect the Particle class does two things.
Almost any user is happy with the functionality but we could create a ParticleTable (or alike) class in the future, as @henryiii suggests. This is a separate discussion, though.

Yes @henryiii, the idea of a particle table class came to my mind already, and precisely because of the behaviour discussed above. This could come in the future, sure, and I think we should give the idea a proper thought ...
BTW, I would actually like to release version 1.0.0 very soon, as the API has been stable to > 95% C.L. for a while.
What do you think?

@eduardo-rodrigues
Copy link
Member

PS: @eduardo-rodrigues , if you are interested in enabling all PDGID's methods on particle, but without turning Particles also into ints and causing Particle() + 1 to "work" but make an invalid particle, we can explore this; I have a simple idea for how this could be done.

Yes, I think being able to do Particle.is_meson makes sense. Let's discuss that implementation in a separate thread/issue/PR, OK?

@eduardo-rodrigues
Copy link
Member

Ok, I get why Particle does not derive from PDGID, but then I find it weird to allow int(Particle(...)). You want to teach people to use int(Particle(...).pdgid) then. Allowing int conversion for both PDGID and Particles smears this important distinction between the two.

Agree. Am about to commit my PR to remove Particle.__int__.

@eduardo-rodrigues
Copy link
Member

@HDembinski, on your numba example below: for practical purposes would you not get a nice speedup using a case such as

from cachetools import cached, LFUCache
cacher = cached(cache=LFUCache(maxsize=64))

@cacher
def pdgid_to_mass(pdgid):
# ...
return mass

This because you won't in practice be dealing with more than 64 (maybe even less?) particle IDs the bulk of the time.

@henryiii
Copy link
Member

henryiii commented Sep 29, 2020

If you are Python3 only, then that tool is built in to Python, actually:

import functools

@functools.lrucache
def pdgid_to_mass(pidid):
    ...

https://docs.python.org/3/library/functools.html#functools.lru_cache

Anyway, Particle.from_pdgid will be tricky to do inside of Numba, since it is doing a table lookup. For now, I assume the best method would be to make a dict of pdgid:mass and give that to the function, since you can pass a dict in. In the (not too soon, not too distant) future, we should be able to teach Numba more about PDGIDs, but converting to mass (that is, using Particle in Numba) is unlikely to happen.

Edit: I'm not sure about that, though, since we moved to a list of objects system. Maybe it would be better to add "soon" to the final sentence.

@maxnoe
Copy link
Contributor

maxnoe commented Nov 2, 2022

>>> # Find all neutral beauty hadrons
>>> Particle.findall(lambda p: p.pdgid.has_bottom and p.charge==0)

I find it strange that something like has_bottom or charge lives on Particle.pdgid and not on Particle itself. These are properties of the particle, not of the ID system.

@HDembinski
Copy link
Member Author

HDembinski commented Nov 3, 2022

@eduardo-rodrigues Can I make another push to make PDGID a subclass of int? For all purposes, it is an int with extra methods and context. When two classes have an "is-a" relationship, then one should use inheritance and not composition (Meyers, "Effective C++"). Right now, I cannot pass a PDGID to a numba jitted function, because it is not recognized by numba as an int. If you make PDGID a subclass of int, this works.

import numba as nb

class Foo(int): pass

@nb.njit
def f(x):
     return x

f(1) # ok
f(Foo(1)) # ok

@HDembinski
Copy link
Member Author

HDembinski commented Nov 3, 2022

I find it strange that something like has_bottom or charge lives on Particle.pdgid and not on Particle itself. These are properties of the particle, not of the ID system.

This would be solved by making Particle a subclass of PDGID, like I suggested. I still maintain that this is the most natural design. A particle is uniquely defined by its PDGID, therefore they have an "is-a" relationship.

@maxnoe
Copy link
Contributor

maxnoe commented Nov 3, 2022

Does that return int or Foo for the second case?

@HDembinski
Copy link
Member Author

HDembinski commented Nov 3, 2022

Does that return int or Foo for the second case?

It returns an int, but I don't care about returning PDGIDs. I use the pdgid inside the jitted function to make comparisons with input arrays that contain pdg ids. I select those elements of the input arrays that match the pdgid.

My workaround right now is something like this:

kaon_pid = int(particle.literals.k_plus.pdgid)  # awkwardly complex

@nb.njit
def select(px, py, pz, pid, mypid):
    r = []
    for pxi, pyi, pzi, pidi in zip(px, py, pz, pid):
          if pidi == mypid:
               # do stuff with pxi, pyi, pzi; fill r
    return np.array(r)

px, py, pz = # some arrays from uproot
r = select(px, py, pz, pid, kaon_pid)

With both proposed changes (Particle inherits from PDGID, PDGID inherits from int), the awkward line
kaon_pid = int(particle.literals.k_plus.pdgid)
simplifies to
kaon_pid = particle.literals.k_plus
which makes this so short that I can just pass it directly to select
r = select(px, py, pz, particle.literals.k_plus)
which makes this code very readable.

If you don't follow the suggestion that Particle inherits from PDGID, the code still becomes simpler because I don't have to do the int conversion, which is something that my students always stumble over.
kaon_pid = particle.literals.k_plus.pdgid

I really wish we would have had this discussion during the design phase of these classes. It is hard to change design later, you have to get it right at the beginning.

@eduardo-rodrigues
Copy link
Member

>>> # Find all neutral beauty hadrons
>>> Particle.findall(lambda p: p.pdgid.has_bottom and p.charge==0)

I find it strange that something like has_bottom or charge lives on Particle.pdgid and not on Particle itself. These are properties of the particle, not of the ID system.

Thanks for the comment. Yes and no, in fact. Yes charge and alike are a property of a particle, but they are also totally a property of a PDG ID since the latter encode with a precise system/convention charge and a bunch of other internal quantum numbers.

It's been on my to-do list to dynamically populate the Particle class with properties such as the ones you mention. I will try and get to this asap but time is often not too much on my side (you may have seen several activities from me here and in DecayLanguage recently, though).

@eduardo-rodrigues
Copy link
Member

That's fair and important to have such feedback, @HDembinski. I need to think this over towards a 1.0 version. I also welcome Henry's input. As just said, I will try and deal with this and outstanding issues in the near future, as it's dragging.

@HDembinski
Copy link
Member Author

Thanks for the comment. Yes and no, in fact. Yes charge and alike are a property of a particle, but they are also totally a property of a PDG ID since the latter encode with a precise system/convention charge and a bunch of other internal quantum numbers.

Yes, this is why the class Particle should inherit from the class PDGID. If you "dynamically populate the Particle class with properties" from PDGID, you just make things more complicated. What is the problem with inheriting from PDGID?

@henryiii
Copy link
Member

To be fair, there was a method doing exactly this in the original particle; it was called __int__. This method would have solved the numba issue, but it was removed, partially because it looked like __init__, and partially because the difference between __int__ (lossy conversion to int) and __index__ (lossless conversion to int) was unclear (to be fair, it was only fully clarified from the CPython side in Python 3.8+).

The point of pdgid is to return a PDGID so that users can query it, etc. The point of __int__ was to get the PDGID as an int. For any use other than Numba, if we provide a PDGID class (that's a subclass of int) and .pdgid doesn't return that class, that would be terrible. Besides also not being backward compatible with the correct usage today.

Please do not add anything dynamically. This breaks static type checkers and IDEs; a method that's not statically available is nearly useless, IMO. Having good static typing is more important that docs for some users like me, and Particle doesn't really have any docs. ;) If you do want to combine the methods, you can use a mixin, but this also gives particle a really large number of methods, while it's a bit better organized if everything computable from the ID system lives on PDGID rather than in two places.

@henryiii
Copy link
Member

What is the problem with inheriting from PDGID?

These were developed separately (I wrote Particle, Eduardo wrote PDGID).

Though, I do see it as a separation of concerns. PDGID understands how to interpret the PDGID numbering scheme. Particle holds particle data, which includes a PDGID number (usually) along with other values.

Particle should inherit from the class PDGID

Currently we always have a PDGID, but with other schemes available, I'm not sure that's truly a requirement - I know we wanted to be open to other schemes. Do we have to always have a PDGID? If it inherits, it's a hard requirement. Also, that makes Particle an int, forcing things (like __index__) on Particle. Every PDGID method is then a Particle method.

Maybe that's okay, but it's much less controlled and compartmentalized than what we have now. We could also add just a collection of values with a mixin. We pick methods we want to share and mixin both PDGID and Particle. I'd make the point of sharing __int__, but you could pick something private instead if you wanted to. That way we have control, and don't have to add int's methods to Particle. You will still be unable to bit_count a Particle. ;)

@henryiii
Copy link
Member

henryiii commented Nov 10, 2022

Is a Particle an int? I think no, it is not an int. There is an int associated with it, yes, but it is not an int. You can have two Particles with the same PDGID. It is composed with an int (and other things). I think we were even keeping things open for a Particle created without a PDGID.

Is a PDGID an int? Yes, it is. It has some extra methods, but it is an int, converting to and from int is lossless. You cannot have two PDGIDs with the same int value.

So, IMO, composition is the exactly the correct design for Particle, and inheritance for PDGID.

@HDembinski
Copy link
Member Author

HDembinski commented Nov 15, 2022

Thanks, Henry, for the comments. I gave this some more thought and I will backpaddle on the point that PDGID should inherit from int. The main issue is that PDGID would inherit a lot of arithmetic operators that make little sense (pid = PDGID(22); pid += 1 # bad) and overriding all those defeats the purpose of inheritance.

I still think that Particle should inherit from PDGID and moreover, Particle.pdgid should return an int and not the class PDGID. We only use this awkward construction right now because Particle does not inherit from PDGID. It makes sense that Particle has all the methods that PDGID has, because a Particle is essentially a PDGID with extra information from a different database. I still maintain there is an "is-a" relationship between them. Supporting other codes does not change this, there always has to be a unique mapping between any other system and the PDG system. I cannot follow the argument that they should not inherit because of separation of concern. The two classes already have well defined separate roles, inheritance does not change that. Particle simply offers a strict superset of the features of PDGID and therefore can inherit the methods from PDGID.

A PDGID should be explicitly convertible to an int, which it is, since __int__ is implemented for PDGID. If Particle inherits from PDGID, it also inherits this conversion, which is good. Having __int__ is enough to make it work with numba, so that solution would make me happy. I cannot following the argument that __int__ looks too much like __init__. Both are methods that users should not call directly, so it does not matter.

I am also glad that you now agree with me that static typing is good, a few years ago your opinion was still different.

@maxnoe
Copy link
Contributor

maxnoe commented Nov 15, 2022

A PDGID should be explicitly convertible to an int, which it is, since int is implemented for PDGID.

Nitpick: a class that is losslessly transformable to int should implement __index__, not __int__.

https://docs.python.org/3/reference/datamodel.html#object.__index__

@HDembinski
Copy link
Member Author

HDembinski commented Nov 15, 2022

Ok fair point, the difference between __int__ and __index__ was not clear to me. Feel free to substitute __int__ with __index__, I don't care either way, I only want it to work with Numba.

On second thought, I am not sure we want a PDGID to work in a slice, __index__ allows that. As far as I understand, __index__ is closer to an implicit conversion in C++, while __int__ is more like an explicit conversion in C++.

@eduardo-rodrigues eduardo-rodrigues added the ⚙️ enhancement New feature or request label Dec 15, 2022
@eduardo-rodrigues
Copy link
Member

Hello Hans, Henry,

Time to finalise on some long-standing issues. This is one of them, that talks about design choices. We had several batches of exchanges. I have the impression that one has to move forward in steps, and for example do not try and solve what is here and #277 at the same time. It also seems to me that we have divergent opinions on what the Particle class should inherit from, etc. For example, I still cannot see why Particle would effectively inherit from PDGID at which point I could do Particle.from_pdgid(211)+Particle.from_pdgid(2212). This being said, I am not claiming that every single operation that does not make sense with Particle or PDGID is disabled (checking a few things I noticed that some operations are for the moment possible, unfortunately, such as PDGID(211)+PDGID(1), and that ought to be fixed as well).

I had suggested at some point that a pragmatic way forward would be to have

Particle.pdgid -> int
Particle.PDGID -> PDGID

This actually does not seem crazy and would solve the issue you raise; I do note it does not solve the design disagreements.

There's quite some few bits of class-internal changes to make for this, but all pretty trivial I believe. If you both find that an improvement I will get it out soon.

With such a change above there are consequences since for example

>>> # Find all neutral beauty hadrons
>>> Particle.findall(lambda p: p.pdgid.has_bottom and p.charge==0)

would no longer work unless one puts PDGID instead of pdgid. Now, if I make p.has_bottom a valid method - users have been asking about it anyway, also for p.is_lepton instead of p.pdgid.is_lepton, etc. - then we're good and one will simply be able to do

>>> Particle.findall(lambda p: p.has_bottom and p.charge==0)

All in all another improvement.

This will imply in the end a relatively minor change of API for the concrete type returned by Particle.pdgid. I've been thinking about a version 1.0 anyway, so ...

Let me know how you feel about this. I have the impression this could be "the best of both worlds", at least pragmatically.

Many thanks.

Henry, you had say that

PS: @eduardo-rodrigues , if you are interested in enabling all PDGID's methods on particle,
but without turning Particles also into ints and causing Particle() + 1 to "work" but make an invalid particle, we can explore this;
I have a simple idea for how this could be done.

We can talk "offline" about that, or directly via the MR.

@HDembinski
Copy link
Member Author

HDembinski commented Apr 6, 2023

I noticed that some operations are for the moment possible, unfortunately, such as PDGID(211)+PDGID(1), and that ought to be fixed as well)

To make a class which derives from int but prevents arithmetic, you have to overload all unwanted operators. Ugh.

class MyInt(int):
     def __new__(cls, *args):
         return int.__new__(cls, *args)
     def __add__(self, other):
         raise RuntimeError

MyInt(1) + MyInt(1)
# RuntimeError

Maybe it is better to not derive from int then.

From a purist point of view, arithmetic should not be allowed for PDGID, but in practice I find it hard to imagine where this would lead to a bug.

I think Particle should inherit from PDGID, because the ID system of the particle data group is the established standard, and since each Particle has an associated unique PDGID (vice versa it is not the case, but that's ok, Particle inherits from PDGID and not vice versa).

Particle.pdgid -> int
Particle.PDGID -> PDGID

This is very unpythonic in my view and very surprising. Please don't do this. It goes against "There should be one - and preferably only one - obvious way to do it".

Now, if I make p.has_bottom a valid method - users have been asking about it anyway, also for p.is_lepton instead of p.pdgid.is_lepton, etc. - then we're good and one will simply be able to do [...]

The motivation for letting Particle inherit from PDGID is to precisely make p.has_bottom etc. happen. It is intuitive that Particle has such methods and via inheritance you get them automatically without having to write a single line of code. This is not only neat but also great for maintenance and a clear powerful design. Scott Meyers says "Inheritance should model an 'is-a' relationship". A Particle is [uniquely defined by] a PDGID (see argument above), so they have that relationship.

Is your main opposition that inheritance makesParticle.from_pdgid(211)+Particle.from_pdgid(2212) possible? Then I propose to fix that in PDGID and still let Particle inherit from PDGID.

@maxnoe
Copy link
Contributor

maxnoe commented Apr 6, 2023

I find this inheritance structure surprising.

If we tightly couple Particle with the pdgid system, I'd suggest to not have two classes doing the roughly the same but have Particle have a pdgid that is a plain integer and all methods that provide semantic operations like has_bottom etc. be implemented on Particle, essentially getting rid of the PDGID class.

Maybe let's ask the question what the benefit is of having a Particle and a PDGID class if the intention is to tightly couple Particle to the PDGID system anyway.

@HDembinski
Copy link
Member Author

HDembinski commented Apr 11, 2023

Originally, PDGID was developed by Henry and Particle by Eduardo. The idea of PDGID was to provide all information that can be extracted directly from the numerical value of the PDGID, without a lookup into a database. You cannot get the particle mass or name from PDGID, but you can get other info like is_lepton.

I think this makes sense and is a sensible split. Of course you can make a mega-class by just concatenating the contents of PDGID and Particle, but having them separate is better for maintenance, as it allows for clear separation of concern.

My preferred solution is for Particle to inherit PDGID and to have a property pdgid that returns just an int. This would not be backward compatible, of course. As a workaround, one can add the property pid which returns an int and deprecate pdgid which returns PDGID.

@henryiii
Copy link
Member

henryiii commented Apr 11, 2023

IMO, Particle is not a PDGID, and having a massive number of methods on a class is daunting. The current organization, p.pdgid.<property> lets the reader know that that's a computed property of a PDGID, and not something stored in Particle. Particle is a collection of data from the PDG. It's a bit more structured, making it a little harder to write, but it's easier to read. We could have a __getattr__ that looks up the PDGID method if the Particle one is not available, and then produces a nice error message that tells you to write .pdgid.<method> instead - I think that would help quite a bit.

We now support more ID systems than PDGID like Geant3ID, PythiaID, and Corsika7, it's possible we might want to be able to create particles without a PDGID. This flexibility was part of the original design and would be destroyed by inheriting from PDGID.

A subclass of int fulfills the role of int, the only reason you can see a difference is due to Numba, which is their problem, not ours. This is a tenant of subclassing - the subclass behaves like the parent. So the issue here "Particle.pdgid should return an int" is already fulfilled. It does. It happens to be a subclass, but it absolutely is an int.

My preferred solution would be to teach numba that PDGID is an int and then close this issue.

@eduardo-rodrigues
Copy link
Member

eduardo-rodrigues commented Apr 12, 2023

Indeed having the PDGID class is absolutely necessary and a no-brainer. It's the thing that encodes lots of particle quantum number information. And it is also necessary as we provide other ID systems as said above, together with converters between them.

We agreed at least on not having PDGID inherit from int and I have a MR essentially ready for that. I had to implement some functions by hand, to be able to make some relevant and useful comparisons, but that's trivial. I will create the MR later today ...

@HDembinski
Copy link
Member Author

HDembinski commented Apr 12, 2023

IMO, Particle is not a PDGID, and having a massive number of methods on a class is daunting.

I reject that argument. Having a massive number of methods is fine and warranted, if those methods make sense to have on the class. It makes sense to have is_lepton etc on Particle. On the contrary, naively, no one would expect such a method to be on an attribute called Particle.pdgid. It only makes sense to you, Henry, because you are the developer and you designed it this way. I repeat myself: when we design APIs we must priorize the needs of our users and take their perspective into account. You need to have some empathy.

The current organization, p.pdgid. lets the reader know that that's a computed property of a PDGID, and not something stored in Particle.

That's a detail that is not relevant to end users. p.pdgid.<property> is unintuitive and it requires extra typing. I use particle a lot and also in intractive contexts. I really want to type less, especially since I expect the attributes of PDGID on the Particle anyway. I don't know why you are resisting this. This should be a no-brainer.

Particle is a collection of data from the PDG. It's a bit more structured, making it a little harder to write, but it's easier to read. We could have a getattr that looks up the PDGID method if the Particle one is not available, and then produces a nice error message that tells you to write .pdgid. instead - I think that would help quite a bit.

That's bad and not a solution to my issues with the current API.

We now support more ID systems than PDGID like Geant3ID, PythiaID, and Corsika7, it's possible we might want to be able to create particles without a PDGID. This flexibility was part of the original design and would be destroyed by inheriting from PDGID.

All these should be mappable to a PDGID. If you want to create particles without a PDGID (for what purpose, actually?), you can set it to 0, which is an invalid value. I don't see that as an obstacle. Note that we need the ID mapping anyway to support methods like is_lepton. Without a mapping, one has to implement such methods for Geant3ID, PythiaID, etc.

My preferred solution would be to teach numba that PDGID is an int and then close this issue.

We should definitely do that, but closing the discussion is premature, because it has evolved a bit away from that particular issue. We are discussing other issues here which are not solved by teaching numba that PDGID is an int.

@maxnoe
Copy link
Contributor

maxnoe commented Sep 3, 2024

One more point on the "Particle is an int" part, it seems – I didn't know this before – pdgids are not unique, e.g. the proton seems to have two representations:

In [20]: p1 = Particle.from_pdgid(2212); p2 = Particle.from_pdgid(1000010010)

In [21]: print(p1.describe())
Name: p              ID: 2212         Latex: $p$
Mass  = 938.2720882 ± 0.0000003 MeV
Width = 0.0 MeV
Q (charge)        = 1.0     J (total angular) = 0.5      P (space parity) = +
C (charge parity) = None    I (isospin)       = 0.5      G (G-parity)     = None
    SpinType: 5
    Quarks: uud
    Antiparticle name: p~ (antiparticle status: Barred)

In [22]: print(p2.describe())
Name: p              ID: 1000010010   Latex: $p$
Mass  = 938.2720882 ± 0.0000003 MeV
Width = 0.0 MeV
Q (charge)        = 1.0     J (total angular) = 0.5      P (space parity) = +
C (charge parity) = None    I (isospin)       = 0.5      G (G-parity)     = None
    SpinType: 5
    Quarks: uud
    Antiparticle name: p~ (antiparticle status: Barred)

But:

In [23]: p1 == p2
Out[23]: False

Am I missing something? Are these indeed different particles?

@HDembinski
Copy link
Member Author

HDembinski commented Sep 4, 2024

You got it exactly right, the PDGIDs are non-unique for the proton and the neutron (I think these are the only special cases?). We have 2212 for the Proton as a bag of quarks, but the proton is also the nucleus of the hydrogen atom. For nuclei, we have this second naming scheme, which uses the number of nucleons and the charge. The neutron is not a nucleus, but it can also be represented with a nucleus code and some programs use it, as far as I remember.

I think p1 and p2 should compare equal in your example, since both particles are logically the same. We should fix this in the PDGID class. Such an overload of the equality operator can also be achieved also for a class that derives from int.

@maxnoe
Copy link
Contributor

maxnoe commented Sep 4, 2024

You got it exactly right, the PDGIDs are non-unique for the proton (I think it is the only special case?).

I think the same is true for the neutron:

In [10]: Particle.findall(name="n")
Out[10]: 
[<Particle: name="n", pdgid=2112, mass=939.5654205 ± 0.0000005 MeV>,
 <Particle: name="n", pdgid=1000000010, mass=939.5654205 ± 0.0000005 MeV>]

and their anti-particles.

@maxnoe
Copy link
Contributor

maxnoe commented Sep 4, 2024

Is that also the reason why Particle.from_name("n") and Particle.from_name("p") do not work?

@eduardo-rodrigues
Copy link
Member

Hello. trying to find some bandwidth the reply to this thread and other ones ...

This and the other issues you posted, such as #620, are all linked. Indeed PDG IDs are unique, as they should, but we then got the feature when I added nuclei info, that the proton and the neutron have their standard "particle" number and "nuclei" number as well, as Hans says. These are the only such cases.

I think p1 and p2 should compare equal in your example, since both particles are logically the same. We should fix this in the PDGID class. Such an overload of the equality operator can also be achieved also for a class that derives from int.

Yep, If anyone is happy to help (I'm particularly busy for the next 2 weeks), that would be great.

Is that also the reason why Particle.from_name("n") and Particle.from_name("p") do not work?

Yep. But there is a little bit more to that one. I will try and reply on the other issues very soon.

Thank you for stress-testing :-).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
⚙️ enhancement New feature or request
Projects
None yet
Development

No branches or pull requests

4 participants