Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Classification of atoms using categories #689

Closed
pfebrer opened this issue Feb 23, 2024 · 14 comments
Closed

Classification of atoms using categories #689

pfebrer opened this issue Feb 23, 2024 · 14 comments

Comments

@pfebrer
Copy link
Contributor

pfebrer commented Feb 23, 2024

Following discussion in #688

I think it would be nice that categories facilitated classification. Imagine you have three categories cat1, cat2 and cat3. There could be a nice way of grouping atoms based on whether they belong to those categories or not. I.e. if I have a geometry, I could do something like:

geometry.classify((cat1, cat2, cat3))

And I would get an array saying to which group each atom belongs:

[0, 0, 0, 2, 1, 2, ..., -1, 0]

This I think is basically what the "Or" category does, but returning indices instead of the categories themselves.

Another thing that would be useful for classification is to provide some property and ask for the creation of groups based on that property:

geometry.classify(AtomNneighbors)

and you get maybe a tuple with the group indices and the meaning of each group:

[0, 0, 0, 1, 0, 2, 0, 1, 2], (AtomNneighbors(0), AtomNneighbors(1), AtomNneighbors(2) )
@zerothi
Copy link
Owner

zerothi commented Feb 23, 2024

the question, I guess, is more whether we want a new classify vs categorize.

Also, what should happen when classify can inadvertently categorize as multiple versions, same problem as Category.

I.e.

catA = ...
catB = ...
geometry.classify((catA, catB, catA | catB))

Consider more the case when users creates the last catC by accident?

It seems to me the functionality could be simple enough:

result = geometry.categorize(catA | catB | catC)
catA in result[0]

wouldn't this work equivalently?

@zerothi
Copy link
Owner

zerothi commented Feb 23, 2024

Or perhaps catA.indices(result) would give back the indices that are the same?

@ialcon
Copy link

ialcon commented Feb 23, 2024

From what I see in the example by Pol, I understand that the implementation you are thinking about would require each atom to be uniquely associated to one category, right?

@zerothi
Copy link
Owner

zerothi commented Feb 23, 2024

From what I see in the example by Pol, I understand that the implementation you are thinking about would require each atom to be uniquely associated to one category, right?

Yes, and I think that can be surprising to end users when that isn't the case. Then I guess it would be better to return the category that matches any of them.

@pfebrer
Copy link
Contributor Author

pfebrer commented Feb 23, 2024

From what I see in the example by Pol, I understand that the implementation you are thinking about would require each atom to be uniquely associated to one category, right?

If the goal is to classify, yes :)

Maybe it would be best to have a separate thing for classifying. So Classifier would be separate from Category. I just saw some sinergies between the two things. But I guess what both things share is to compute some atom-wise property, and then they perform different operations with it.

So perhaps it could be worth it to implement AtomProperty and then make categories and classifiers potentially use them.

@zerothi
Copy link
Owner

zerothi commented Feb 23, 2024

From what I see in the example by Pol, I understand that the implementation you are thinking about would require each atom to be uniquely associated to one category, right?

If the goal is to classify, yes :)

Maybe it would be best to have a separate thing for classifying. So Classifier would be separate from Category. I just saw some sinergies between the two things. But I guess what both things share is to compute some atom-wise property, and then they perform different operations with it.

So perhaps it could be worth it to implement AtomProperty and then make categories and classifiers potentially use them.

But can we do without? The problem is that distinguishing between Classifier and Category will not be obvious, and if 90% cases can be handled via one, and the rest 10% requires some tweaking, then I think readability has higher priority.

@pfebrer
Copy link
Contributor Author

pfebrer commented Feb 23, 2024

Why isn't it obvious? A category just determines whether an atom fulfills or not a certain criteria, and a classifier groups atoms based on properties. The grouping might even have thresholds to tweak how many groups you want to create, for example, which are not so easy to determine by hand.

An example that is impossible (or I wouldn't know how to do it) to implement with categories: I once had a box of water molecules and I wanted to determine which atoms belonged to the same molecule, so that I could plot the PDOS per water molecule. I used some sklearn clustering algorithm (I don't remember which) and then I got an array containing to which water molecule (cluster) each atom belonged. This is a classification based on some property (distance between atoms) and some grouping algorithm. I don't know how to acheive this with categories, and I believe this kind of things will be useful to many people that don't even know how to start implementing them themselves.

@zerothi
Copy link
Owner

zerothi commented Feb 23, 2024

This I would argue aims at us changing what the categories can do. Meaning, that possibly a category should be able to contain more content, and not just index like.

I am primarily worried about the implications on documenting Category vs Classifier for end-users. From a first glance they look very similar, and I bet Category would just be a subset of Classifier, so hence we might as well change everything to do what you want the Classifier to do, no?

@pfebrer
Copy link
Contributor Author

pfebrer commented Feb 23, 2024

Hmm yeah that's true, a category is kind of a classifier with just two groups.

As I said in the other issue, I'm too busy to think about anything useful now, so I'll continue this discussion next week 😅

@ialcon
Copy link

ialcon commented Feb 23, 2024

Just one question for me to follow... What is the difference between classifier and categorize?

@zerothi
Copy link
Owner

zerothi commented Feb 23, 2024

Just one question for me to follow... What is the difference between classifier and categorize?

currently we don't have a Classifier @pfebrer is advocating on adding this. I am trying to figure out if we can have a single way to do both (i.e. only have a Category, or name it Classifier).
So an open discussion on that for now. :)

@ialcon
Copy link

ialcon commented Feb 23, 2024

Ok, thanks Nick. I will try to read your various issues commenting on this topic early next week - so that I have a better overall picture. For me it is hard to see how classifier/category could be used to classify atoms in different hexagons within, say, a GNR - specially because such hexagons would have common (sharing) atoms.. But there might be the way to do this.

@zerothi
Copy link
Owner

zerothi commented Feb 28, 2024

Ok, thanks Nick. I will try to read your various issues commenting on this topic early next week - so that I have a better overall picture. For me it is hard to see how classifier/category could be used to classify atoms in different hexagons within, say, a GNR - specially because such hexagons would have common (sharing) atoms.. But there might be the way to do this.

The idea here, would be to fundamentally restructure the Category class. So that it can re-act to the surroundings (other atoms than it-self).

@zerothi
Copy link
Owner

zerothi commented Apr 15, 2024

We have too many categories issues, see #687 which will be the main topic now.

@zerothi zerothi closed this as completed Apr 15, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants