This is an excellent topic. Some time ago I saw something similar in matminer, where you can call `feature_labels()` to get some information about the features. I do have this as one of the TODOs on our kanban board, but as of now it is not directly possible.
Implementing it should be fairly straightforward in practice, but I cannot give any timeline for this. It is possible to reverse-engineer some of the label information with the `get_location()` method, which returns the slice of the feature vector that belongs to a given species pair. It does not currently support getting the location of specific (l, n) values, however.
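The reverse-engineering idea above can be sketched as follows. This is only an illustration under stated assumptions: `all_pairs` and `label_columns_by_pair` are hypothetical helper names, and the example slices are made up. With the real descriptor object, the slices would come from calling `get_location()` once per species pair, as noted in the comments.

```python
from itertools import combinations_with_replacement


def all_pairs(species):
    """Enumerate every unordered species pair, including same-species pairs."""
    return list(combinations_with_replacement(sorted(species), 2))


def label_columns_by_pair(pair_slices, n_features):
    """Return one label per column naming the species pair whose slice covers it.

    pair_slices maps (species_a, species_b) -> slice into the feature vector.
    """
    labels = ["unassigned"] * n_features
    for (a, b), sl in pair_slices.items():
        for i in range(*sl.indices(n_features)):
            labels[i] = f"{a}-{b}"
    return labels


# Made-up slices for illustration only. With the descriptor from the question
# they would be built roughly like this (assumption, not run here):
#   pair_slices = {pair: soap.get_location(pair) for pair in all_pairs(species)}
example_slices = {
    ("Al", "Al"): slice(0, 3),
    ("Al", "O"): slice(3, 7),
    ("O", "O"): slice(7, 10),
}
labels = label_columns_by_pair(example_slices, 10)
print(all_pairs({"O", "Al"}))
print(labels)
```

The resulting labels only identify the species pair of each column, not the radial or angular index, which matches the limitation described above. Because many columns share one label, they are better kept as a separate lookup list than used as DataFrame column names.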
Thank you for the quick reply. I also think such an implementation would really help from a feature engineering and feature analysis perspective, especially when the analysis is done by somebody who does not have full knowledge of the physical meaning of the feature vectors. Please let me know when you have implemented it.
Hello,

I'm currently analysing someone else's machine learning model, which was trained on SOAP feature vectors.
The code generating the feature vectors looks something like this:

```python
from dscribe.descriptors import SOAP  # import added for completeness

# species: set of element names; atoms: list of Atoms objects
soap = SOAP(species=species, periodic=True, rcut=2.5, nmax=8, lmax=8, average="inner", sparse=False)
feature_vectors = soap.create(atoms, n_jobs=1)
```

Here `species` is a set that holds the different element names, and `atoms` is a list of `Atoms` objects such as `Atoms(symbols='O18Al12', pbc=True, cell=[[4.76, 0.0, 0.0], [-2.379999999999999, 4.122280922013928, 0.0], [0.0, 0.0, 12.993]], spacegroup_kinds=...)`. The `feature_vectors` are then transformed into a rather large pandas DataFrame that contains 1109304 columns.

Is there a way to find out the feature names (the physical meaning) of the individual values in a feature vector? For me it is currently "just" a row in a DataFrame, without any column descriptions, on which the model is based. For my analysis it would be interesting to know what each column represents physically, since my analysis produces a kind of feature importance for each column.
Thank you very much.
Best regards,
Claus
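As a rough sanity check on the column count in the question, the sketch below assumes that the SOAP vector length is (S·nmax)(S·nmax + 1)/2 · (lmax + 1), where S is the number of species. That formula is an assumption about the descriptor layout and is not stated anywhere in this thread. The function names are hypothetical.

```python
def soap_feature_count(n_species, n_max, l_max):
    """Assumed SOAP length: symmetric pairs of (species, n) channels times (l_max + 1)."""
    m = n_species * n_max  # number of (species, radial basis) channels
    return m * (m + 1) // 2 * (l_max + 1)


def infer_species_count(n_features, n_max, l_max, limit=200):
    """Find the species count that reproduces n_features, or None if there is none."""
    for n_species in range(1, limit + 1):
        if soap_feature_count(n_species, n_max, l_max) == n_features:
            return n_species
    return None


# nmax=8, lmax=8 and 1109304 columns are the values given in the question.
n_species = infer_species_count(1109304, n_max=8, l_max=8)
print(n_species)  # 62
```

Under that assumption, 1109304 columns correspond to 62 species, so most columns describe pairs of elements other than O and Al. Grouping importances by species pair is therefore likely a more manageable first step than inspecting single columns.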