For fun I also borrowed some other data from This Link to see how personality and test performance can be condensed into a dimensionally reduced model: personality_score.csv
Question 1: What is the proper way of selecting a sufficient number of dimensions to preserve the data while avoiding noise? Kaiser–Meyer–Olkin, Levene, and others all seem to be better descriptors than the "eigenvalue > 1" rule.
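One commonly recommended alternative to the "eigenvalue > 1" rule is Horn's parallel analysis: keep only the components whose eigenvalue exceeds what random data of the same shape would produce. A minimal sketch, assuming the scaled matrix `X_transformed` from the code below (the function name, the 100 shuffles, and the 95th-percentile threshold are arbitrary choices):

```python
import numpy as np
from sklearn.decomposition import PCA

def parallel_analysis(X, n_shuffles=100, percentile=95, seed=0):
    """Count components whose eigenvalue beats the same-rank eigenvalue of shuffled data."""
    rng = np.random.default_rng(seed)
    real_ev = PCA().fit(X).explained_variance_
    null_ev = np.empty((n_shuffles, len(real_ev)))
    for i in range(n_shuffles):
        # Shuffle each column independently: keeps the marginals, breaks the correlations
        X_null = np.column_stack([rng.permutation(col) for col in X.T])
        null_ev[i] = PCA().fit(X_null).explained_variance_
    threshold = np.percentile(null_ev, percentile, axis=0)
    return int(np.sum(real_ev > threshold))

# n_keep = parallel_analysis(X_transformed)
```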
Question 2: Can PCA be combined with something else so that it behaves like PCR or Lasso regression? (i.e., reducing the number of unnecessary columns before trying to fit accurately)
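One way to get PCR-like behaviour is to chain PCA and a linear model in a scikit-learn Pipeline and cross-validate the number of components; Lasso can be cross-validated separately with LassoCV, which performs column selection directly by shrinking coefficients to zero. A rough sketch, assuming `X_transformed`, `X`, and `y` from the code below (the component grid and fold count are arbitrary):

```python
from sklearn.pipeline import Pipeline
from sklearn.decomposition import PCA
from sklearn.linear_model import LinearRegression, LassoCV
from sklearn.model_selection import GridSearchCV

# PCR: PCA followed by ordinary least squares, with n_components chosen by cross-validation
pcr = Pipeline([('pca', PCA()), ('ols', LinearRegression())])
search = GridSearchCV(pcr, {'pca__n_components': [2, 4, 6, 10, 20]}, cv=5)
# search.fit(X_transformed, y.values.ravel())
# print(search.best_params_, search.best_score_)

# Lasso keeps whole original columns rather than components
lasso = LassoCV(cv=5)
# lasso.fit(X_transformed, y.values.ravel())
# kept_columns = X.columns[lasso.coef_ != 0]
```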
Question 3: Can ICA be used to discover significant columns? It is often presented as a way to isolate independent components after using PCA to assess the proper dimension count.
Current finding: with RobustScaler, six components pass the eigenvalue > 1 cutoff ([9.136, 2.683, 2.078, 1.420, 1.328, 1.090]), similar to the count without any scaling. MaxAbsScaler yields only one weak component, whilst StandardScaler yields 159 components above the cutoff, the first six being [52.051, 12.815, 9.561, 8.205, 6.741, 5.902]. It seems that normalization does not help with clearing noise in some cases.
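One caveat on that comparison: the Kaiser "eigenvalue > 1" rule is usually stated for a correlation matrix, i.e. standardized variables where each column contributes exactly one unit of variance, so the cutoff of 1 is not directly comparable across scalers. The counts per scaler can be reproduced in one loop; a sketch assuming the unscaled `X` from the code below:

```python
from sklearn.preprocessing import MaxAbsScaler, RobustScaler, StandardScaler
from sklearn.decomposition import PCA

for scaler in (None, MaxAbsScaler(), RobustScaler(), StandardScaler()):
    X_s = X if scaler is None else scaler.fit_transform(X)
    ev = PCA().fit(X_s).explained_variance_
    name = 'no scaling' if scaler is None else type(scaler).__name__
    print(f"{name}: {sum(ev > 1)} components with eigenvalue > 1, first six {ev[:6].round(3)}")
```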
Question 4: How can one check the significance of an ICA component?
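There is no single agreed-upon significance test for ICA components. One common heuristic is to rank the recovered sources by non-Gaussianity (FastICA's own objective), for instance by the magnitude of excess kurtosis; a more involved option is to re-run ICA on resampled data and keep only components that reappear consistently (the ICASSO idea). A sketch of the kurtosis version, assuming `X_transformed_ica` from the code below; the helper name is made up:

```python
import numpy as np
from scipy.stats import kurtosis

def rank_ica_sources(S):
    """Rank ICA sources by |excess kurtosis|; near-zero values look Gaussian (noise-like)."""
    kurt = kurtosis(S, axis=0, fisher=True)   # one value per recovered source
    order = np.argsort(-np.abs(kurt))
    return [(int(i), float(kurt[i])) for i in order]

# for idx, k in rank_ica_sources(X_transformed_ica):
#     print(f"component {idx}: excess kurtosis {k:+.3f}")
```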
Question 5: If one were to use all 159 components, what is the strategy for determining which original columns are the most useful within each component?
from pandas import read_csv
from sklearn.preprocessing import MaxAbsScaler, RobustScaler, StandardScaler

df = read_csv('https://files.catbox.moe/4nztka.csv')
df = df.drop(columns=df.columns[0])                   # drop the index column
X, y = df.drop(columns=['AFQT']), df[['AFQT']]

# X_transformed = MaxAbsScaler().fit_transform(X)     # in case of Yes/No questions
X_transformed = RobustScaler().fit_transform(X)       # in case of Likert scale
# X_transformed = StandardScaler().fit_transform(X)   # in case of aggregates

from numpy import mean, dot
from sklearn.decomposition import PCA, FastICA
from pandas import DataFrame

pca = PCA()
X_transformed_pca = pca.fit_transform(X_transformed)
suff_len = len([i for i in pca.explained_variance_ if i > 1])   # "eigenvalue > 1" count
print(pca.explained_variance_[:suff_len])

ica = FastICA(n_components=suff_len)
X_transformed_ica = ica.fit_transform(X_transformed)
df_comp = DataFrame(ica.components_, columns=X.columns)         # unmixing weights per original column
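Regarding Questions 3 and 5: once `df_comp` holds the unmixing weights over the original column names, the columns that dominate each component can be read off by sorting absolute weights. Note that `ica.components_` is the unmixing matrix; `ica.mixing_` describes how each original column is expressed in terms of the sources and may be the more natural matrix to inspect. A minimal sketch (the top-10 cutoff is arbitrary):

```python
# For each ICA component, list the original columns with the largest absolute weight.
top_n = 10
for comp_idx, row in df_comp.iterrows():
    top_cols = row.abs().sort_values(ascending=False).head(top_n)
    listing = ", ".join(f"{col} ({row[col]:+.2f})" for col in top_cols.index)
    print(f"Component {comp_idx}: {listing}")
```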