Feature vector - "feature names" #68

materialsguy · 2021-08-13T09:26:13Z

Hello,

I'm currently analysing somebody else's machine learning model, which was trained on SOAP feature vectors.
The code that generates the feature vectors looks something like this:

soap = SOAP(species=species, periodic=True, rcut=2.5, nmax=8, lmax=8, average="inner", sparse=False)
feature_vectors = soap.create(atoms, n_jobs=1)

Here species is a set holding the element names, and atoms is a list of Atoms objects such as: Atoms(symbols='O18Al12', pbc=True, cell=[[4.76, 0.0, 0.0], [-2.379999999999999, 4.122280922013928, 0.0], [0.0, 0.0, 12.993]], spacegroup_kinds=...). The feature_vectors are then converted into a rather large pandas DataFrame with 1109304 columns.
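As a sanity check on where the 1109304 columns come from, here is a minimal sketch of the feature-count arithmetic. It assumes the SOAP power spectrum keeps every unordered pair of (species, radial index) channels for each l from 0 to lmax; that formula is an assumption here, not something stated in this thread, and the helper names are made up for illustration.

```python
def soap_feature_count(n_species: int, n_max: int, l_max: int) -> int:
    """Assumed size of one SOAP vector: unordered pairs of
    (species, radial index) channels, times the number of l values."""
    n_radial = n_species * n_max
    return n_radial * (n_radial + 1) // 2 * (l_max + 1)


def block_sizes(n_max: int, l_max: int) -> tuple:
    """Assumed block sizes for a same-species pair and a mixed pair."""
    same = n_max * (n_max + 1) // 2 * (l_max + 1)
    cross = n_max * n_max * (l_max + 1)
    return same, cross


n_max, l_max = 8, 8  # values from the SOAP(...) call above

# Search for the species count that reproduces the reported column count.
n_species = next(
    s for s in range(1, 200)
    if soap_feature_count(s, n_max, l_max) == 1109304
)
same, cross = block_sizes(n_max, l_max)
print(n_species, same, cross)
```

Under that assumption, the column count corresponds to a species set of 62 elements, with one block of columns per species pair.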

Is there a way to find out the feature names (the physical meaning) of the individual values in a feature vector? For me it is currently "just" a row in a DataFrame, without any column descriptions, on which the model is based. My analysis produces a kind of feature importance for each column, so it would be interesting to know what each column represents physically.

Thank you very much.

Best regards,

Claus

lauri-codes · 2021-08-13T10:07:21Z

Hi @materialsguy!

This is an excellent topic. Some time ago I saw something similar in matminer, where you can call feature_labels() to get some information about the features. I do have this as one of the TODOs in our kanban, but as of now it is not directly possible.

In practice, implementing it should be fairly straightforward, but I cannot give any timeline for this. It is possible to reverse-engineer some of the label information with the get_location() method, which gives the slice for a given species pair. However, it does not currently support getting the location of specific (l, n) values.
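The reverse-engineering idea can be sketched roughly as follows. The helper pair_labels is hypothetical (it is not part of the library), and it only assumes that the locator callable returns a slice for a species pair, as get_location() is described above. The demo uses a made-up locator with an invented layout so that the snippet runs on its own.

```python
from itertools import combinations_with_replacement
from typing import Callable, Iterable, List, Tuple


def pair_labels(
    species: Iterable[str],
    n_features: int,
    locate: Callable[[Tuple[str, str]], slice],
) -> List[str]:
    """Label every column with the species pair whose slice contains it.

    `locate` is any callable mapping a species pair to a slice,
    e.g. a descriptor's get_location method (assumed behaviour).
    """
    labels = [""] * n_features
    for a, b in combinations_with_replacement(sorted(species), 2):
        loc = locate((a, b))
        for i in range(*loc.indices(n_features)):
            labels[i] = f"{a}-{b}"
    return labels


# Stand-in locator with an invented layout, for demonstration only.
fake_slices = {
    ("Al", "Al"): slice(0, 6),
    ("Al", "O"): slice(6, 14),
    ("O", "O"): slice(14, 20),
}
labels = pair_labels({"O", "Al"}, 20, fake_slices.__getitem__)
print(labels[0], labels[6], labels[19])
```

With a real descriptor, one would presumably pass soap.get_location as the locator and the number of DataFrame columns as n_features, then assign the result to the DataFrame's column names. This only gives species-pair resolution; the (l, n) values within each slice remain unlabelled.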

materialsguy · 2021-08-13T11:21:38Z

Thank you for the quick reply. I also think such an implementation would really help with feature engineering and feature analysis in machine learning, especially when the analysis is done by somebody who does not fully understand the physical meaning of the feature vectors. Please let me know once you have implemented it.

I will have a look at the get_location() method.

Thanks.
