Principal Component Analysis (PCA) is a dimensionality reduction technique used to extract important features from high-dimensional datasets. PCA works by identifying the principal components of the data, which are linear combinations of the original features that capture the most variation in the data.
Linear Discriminant Analysis (LDA) is a dimensionality reduction technique used to reduce the number of features in a dataset while maintaining class separability. LDA is a supervised technique, meaning that it uses the class labels to perform the dimensionality reduction, and it is widely used in pattern recognition and machine learning.
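A quick toy illustration of the difference, using scikit-learn only for this example (the project implements both techniques from scratch below); the random data and shapes here are illustrative:

    # Toy illustration only: PCA is unsupervised (ignores y), LDA is supervised (uses y).
    import numpy as np
    from sklearn.decomposition import PCA
    from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

    rng = np.random.default_rng(0)
    X = rng.normal(size=(100, 5))      # 100 samples, 5 features
    y = rng.integers(0, 2, size=100)   # 2 synthetic class labels

    X_pca = PCA(n_components=2).fit_transform(X)                            # directions of max variance
    X_lda = LinearDiscriminantAnalysis(n_components=1).fit_transform(X, y)  # max class separability
    print(X_pca.shape, X_lda.shape)    # (100, 2) (100, 1)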
- The dataset for this project is the AT&T Face Database
- The dataset is open-source and can be downloaded from Kaggle.
- The dataset contains 400 images of 40 people, 10 images per person.
- The images are 92x112 pixels in grayscale.
- The images are stored in the `datasets/faces` folder.
- The data matrix is 400 x 10304, where each image is flattened into a vector and stored as a row of the matrix.
- The 10 images of each person share the same label, from 1 to 40.
import os
import numpy as np
from PIL import Image

# One folder per subject: datasets/s1 ... datasets/s40
paths = ["datasets/s" + str(i) for i in range(1, 41)]

Data = np.zeros((400, 10304))
labels = np.zeros((400, 1))

# Assign the same label (1..40) to each person's 10 images
for i in range(40):
    labels[i * 10 : (i + 1) * 10] = i + 1

# Read every image, flatten it, and store it as a row of the data matrix
cnt = 0
for path in paths:
    files = os.listdir(path)
    for file in files:
        img = Image.open(path + "/" + file)
        np_img = np.array(img)
        np_img = np_img.flatten()
        Data[cnt] = np_img
        cnt += 1
- Alternate rows are used for the split: rows at even indices (0, 2, 4, ...) are kept for training and rows at odd indices (1, 3, 5, ...) for testing, i.e. 5 images per person in each set.
X_train = Data[0::2]
X_test = Data[1::2]
y_train = labels[0::2]
y_test = labels[1::2]
def get_PCA(X_train, alpha):
    # Compute the mean of the training data
    mean_face = np.mean(X_train, axis=0)
    # subtract the mean from the training data
    X_train_centralized = X_train - mean_face
    # compute the smaller covariance matrix Z @ Z.T instead of Z.T @ Z (see the trick below)
    cov_matrix = X_train_centralized @ X_train_centralized.T
    # compute the eigenvalues and eigenvectors
    eigenvalues, eigenvectors = np.linalg.eig(cov_matrix)
    # sort the eigenvectors descendingly by eigenvalues
    idx = np.argsort(eigenvalues)[::-1]
    eigenvalues = eigenvalues[idx]
    eigenvectors = eigenvectors[:, idx]
    # restore the original (10304-dimensional) eigenvectors
    eigenvectors_converted = X_train_centralized.T @ eigenvectors
    # normalize the restored eigenvectors
    eigenfaces = eigenvectors_converted / np.linalg.norm(eigenvectors_converted, axis=0)
    # keep the smallest number of components whose explained variance ratio reaches alpha
    explained_variance = 0
    no_components = 0
    for i in range(len(eigenvalues)):
        explained_variance += eigenvalues[i]
        no_components += 1
        if explained_variance / np.sum(eigenvalues) >= alpha:
            break
    # return the mean face and the kept eigenfaces (projection basis)
    return mean_face, eigenfaces[:, :no_components]
Trick to get the eigenvalues/eigenvectors from the covariance matrix
- The covariance matrix is normally computed as Z.T @ Z (10304 x 10304), so getting the eigenvalues/eigenvectors from this matrix takes too much time.
- Instead, we compute the much smaller matrix Z @ Z.T (200 x 200). By linear algebra, its eigenvalues are exactly the non-zero eigenvalues of the original matrix, so we only get the first 200 of them (which cover over 99% of the total variance).
- The original eigenvectors are restored by the formula u_i = Z.T @ v_i, where u_i is the original eigenvector (10304 x 1) and v_i is the smaller one (200 x 1), followed by normalization.
- This gives the same results in much less time.
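A minimal sketch verifying the trick on a small random matrix (the shapes here are illustrative, not the project's 200 x 10304 data matrix):

    import numpy as np

    rng = np.random.default_rng(0)
    Z = rng.normal(size=(20, 500))          # n samples x d features, with n << d

    big = Z.T @ Z                           # d x d (expensive in the real setting)
    small = Z @ Z.T                         # n x n (cheap)

    vals_big = np.sort(np.linalg.eigvalsh(big))[::-1][:20]
    vals_small = np.sort(np.linalg.eigvalsh(small))[::-1]
    print(np.allclose(vals_big, vals_small))    # True: the non-zero eigenvalues match

    # restore a full-size eigenvector: u_i = Z.T @ v_i (then normalize)
    vals, vecs = np.linalg.eigh(small)
    u = Z.T @ vecs[:, -1]
    u /= np.linalg.norm(u)
    print(np.allclose(big @ u, vals[-1] * u))   # u is an eigenvector of Z.T @ Z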
def PCA_Projected_data(mean_face, eigenfaces):
    # uses the global X_train / X_test defined above
    X_train_centered = X_train - mean_face
    X_train_projected = X_train_centered @ eigenfaces
    X_test_centered = X_test - mean_face
    X_test_projected = X_test_centered @ eigenfaces
    return X_train_projected, X_test_projected
- The classifier is trained on the projected training data using knn.fit().
- The classifier then predicts labels for the projected test data; the predictions are stored in y_pred.
- y_pred is compared with y_test (the actual labels) to compute the accuracy.
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import accuracy_score

def Test_PCA(alpha, k):
    mean_face, eigenfaces = get_PCA(X_train, alpha)
    X_train_pca, X_test_pca = PCA_Projected_data(mean_face, eigenfaces)
    knn = KNeighborsClassifier(k, weights="distance")
    knn.fit(X_train_pca, y_train.ravel())
    y_pred = knn.predict(X_test_pca)
    accuracy = accuracy_score(y_test, y_pred.ravel())
    return accuracy
Alpha | Accuracy
---|---
0.80 | 0.945
0.85 | 0.94
0.90 | 0.94
0.95 | 0.93
- As alpha increases, the classification accuracy decreases, since keeping more components makes the model more prone to overfitting and more sensitive to noise.
- Introducing our second trick
- Regularization step: before computing the inverse of the S_W matrix, we add a value of 1e-7 to its diagonal.
- This is common practice when inverting matrices in the context of Linear Discriminant Analysis (LDA), or covariance matrices in general. It ensures numerical stability, especially when the matrices involved are close to singular.
S_W += 1e-7 * np.identity(X_train.shape[1])
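A minimal sketch of why this helps, using a deliberately singular matrix (purely illustrative, not the project's S_W):

    import numpy as np

    # A rank-deficient "scatter" matrix: plain inversion is ill-defined / unstable
    A = np.array([[1.0, 2.0],
                  [2.0, 4.0]])          # rank 1, determinant 0
    print(np.linalg.cond(A))            # huge condition number (effectively infinite)

    # Adding a small value to the diagonal makes it well-conditioned and invertible
    A_reg = A + 1e-7 * np.identity(2)
    A_inv = np.linalg.inv(A_reg)        # now succeeds
    print(A_inv @ A_reg)                # approximately the identity matrix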
def get_LDA(X_train, y_train):
    y_train = np.squeeze(y_train)
    class_means = np.array([np.mean(X_train[y_train == i], axis=0) for i in range(1, 41)])
    class_sizes = np.array([np.sum(y_train == i) for i in range(1, 41)])
    # Compute overall mean
    overall_mean = np.mean(X_train, axis=0)
    # Compute within-class scatter matrix
    S_W = np.zeros((X_train.shape[1], X_train.shape[1]))
    for i in range(1, 41):
        # Use boolean indexing to select the rows of class i from X_train
        class_data = X_train[y_train == i]
        centered_data = class_data - class_means[i - 1]
        S_W += np.dot(centered_data.T, centered_data)
    # Regularize S_W before inversion
    S_W += 1e-7 * np.identity(X_train.shape[1])
    # Compute between-class scatter matrix
    S_B = np.zeros((X_train.shape[1], X_train.shape[1]))
    for i in range(1, 41):
        class_diff = class_means[i - 1] - overall_mean
        S_B += class_sizes[i - 1] * np.outer(class_diff, class_diff)
    # Solve generalized eigenvalue problem
    eigenvalues, eigenvectors = np.linalg.eig(np.dot(np.linalg.inv(S_W), S_B))
    # Sort eigenvalues and eigenvectors in descending order
    idx = np.argsort(eigenvalues)[::-1]
    sorted_eigenvectors = eigenvectors[:, idx]
    # Take only the 39 dominant eigenvectors (number of classes - 1)
    projection_matrix = sorted_eigenvectors[:, :39]
    return np.real(projection_matrix)
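Test_LDA below relies on a globally computed projection matrix, which is presumably obtained once like this:

    LDA_projection_matrix = get_LDA(X_train, y_train)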
def LDA_projected_data(projection_matrix):
    # uses the global X_train / X_test defined above
    projected_X_train = np.dot(X_train, projection_matrix)
    projected_X_test = np.dot(X_test, projection_matrix)
    return projected_X_train, projected_X_test
- As with PCA, the classifier is trained on the LDA-projected training data using knn.fit().
- The classifier then predicts labels for the projected test data; the predictions are stored in y_pred.
- y_pred is compared with y_test (the actual labels) to compute the accuracy.
def Test_LDA(k):
    projected_X_train, projected_X_test = LDA_projected_data(LDA_projection_matrix)
    knn = KNeighborsClassifier(n_neighbors=k)
    knn.fit(projected_X_train, y_train.ravel())
    y_pred = knn.predict(projected_X_test)
    accuracy = accuracy_score(y_test, y_pred.ravel())
    return accuracy
- LDA Accuracy for k = 1 is 95%
- PCA
Alpha | k = 1 | k = 3 | k = 5 | k = 7 | k = 9
---|---|---|---|---|---
0.80 | 0.945 | 0.9 | 0.895 | 0.88 | 0.835
0.85 | 0.94 | 0.9 | 0.895 | 0.86 | 0.83
0.90 | 0.94 | 0.905 | 0.895 | 0.85 | 0.815
0.95 | 0.93 | 0.9 | 0.865 | 0.83 | 0.8
- LDA
k | Accuracy
---|---
1 | 0.950
3 | 0.915
5 | 0.890
7 | 0.875
9 | 0.860
- Kernel PCA is a non-linear dimensionality reduction technique that uses a kernel function to implicitly map the data into a feature space in which linear PCA is then applied. This allows it to capture non-linear relationships between variables that linear PCA cannot.
- The time complexity of normal PCA is O(d^3), where d is the number of dimensions, while the time complexity of kernel PCA is O(n^3), where n is the number of data points. The computation of the kernel matrix is the most computationally expensive step in kernel PCA.
- Kernel PCA may be more accurate than normal PCA for datasets with non-linear relationships between variables, since it can capture these relationships. However, kernel PCA is more prone to overfitting than normal PCA, and the choice of kernel function can greatly affect its performance.
from sklearn.decomposition import KernelPCA

def Test_Polynomial_Kernel_PCA(no_of_components, k):
    kpca = KernelPCA(n_components=no_of_components, kernel="poly")
    X_train_kpca = kpca.fit_transform(X_train)
    X_test_kpca = kpca.transform(X_test)
    knn = KNeighborsClassifier(k, weights="distance")
    knn.fit(X_train_kpca, y_train.ravel())
    y_pred = knn.predict(X_test_kpca)
    accuracy = accuracy_score(y_test, y_pred.ravel())
    return accuracy
- Comparison Between Normal PCA and Kernel PCA
- Shrinkage LDA (Linear Discriminant Analysis) is a variant of the standard LDA method used for classification and dimensionality reduction. The key difference between shrinkage LDA and normal LDA is that the former incorporates a regularization term that shrinks the sample covariance matrix towards a diagonal matrix.
- This regularization is particularly useful when dealing with high-dimensional data, as it helps to overcome the small-sample-size problem by stabilizing the covariance estimates. Shrinkage LDA has been shown to outperform traditional LDA in terms of classification accuracy, especially when the number of features is much larger than the number of observations.
- Another advantage of shrinkage LDA is that it can handle multicollinearity between the predictor variables, which can be a problem in standard LDA when the predictors are highly correlated. In summary, shrinkage LDA is a powerful tool for classification and dimensionality reduction that can improve the accuracy of LDA in high-dimensional and small-sample settings.
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

def Test_Shrinkage_LDA(k):
    lda = LinearDiscriminantAnalysis(solver="eigen", shrinkage="auto")
    lda.fit(X_train, y_train.ravel())
    X_train_projection = lda.transform(X_train)
    X_test_projection = lda.transform(X_test)
    knn = KNeighborsClassifier(n_neighbors=k)
    knn.fit(X_train_projection, y_train.ravel())
    y_pred = knn.predict(X_test_projection)
    accuracy = accuracy_score(y_test, y_pred.ravel())
    return accuracy
- Comparison Between Normal LDA & Shrinkage LDA
- The split function
def split_data(data, labels):
    bonus_x_train = np.zeros((280, 10304))
    bonus_x_test = np.zeros((120, 10304))
    bonus_y_train = np.zeros((280, 1))
    bonus_y_test = np.zeros((120, 1))
    # split each person's data into 7 training images and 3 testing images
    for i in range(40):
        bonus_x_train[i * 7 : (i + 1) * 7] = data[i * 10 : i * 10 + 7]
        bonus_x_test[i * 3 : (i + 1) * 3] = data[i * 10 + 7 : i * 10 + 10]
        bonus_y_train[i * 7 : (i + 1) * 7] = labels[i * 10 : i * 10 + 7]
        bonus_y_test[i * 3 : (i + 1) * 3] = labels[i * 10 + 7 : i * 10 + 10]
    # shuffle the training data
    indices = np.arange(280)
    np.random.shuffle(indices)
    bonus_x_train = bonus_x_train[indices]
    bonus_y_train = bonus_y_train[indices]
    return bonus_x_train, bonus_x_test, bonus_y_train, bonus_y_test
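The 70/30 results below were presumably obtained by swapping the bonus split into the same pipeline, roughly as follows (reassigning the global variables is an assumption about how the experiment was run):

    # replace the 50/50 split with the 70/30 bonus split, then reuse Test_PCA / Test_LDA
    X_train, X_test, y_train, y_test = split_data(Data, labels)
    for alpha in [0.80, 0.85, 0.90, 0.95]:
        print(alpha, Test_PCA(alpha, 1))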
Alpha | Original Split | Bonus Split
---|---|---
0.80 | 0.945 | 0.966667
0.85 | 0.94 | 0.958333
0.90 | 0.94 | 0.95
0.95 | 0.93 | 0.941667
- It is clear that after splitting the dataset 70/30, the accuracy improves for all values of alpha.
Method | Original Split | Bonus Split
---|---|---
LDA | 0.95 | 0.975
This documentation outlines the implementation of Principal Component Analysis (PCA) and Linear Discriminant Analysis (LDA) for the task of classifying face and non-face images. The goal is to explore the effectiveness of dimensionality reduction techniques, assess the model's performance, and compare how the number of non-face images affects the accuracy of the model.
- NumPy
- Scikit-learn
- Images
- Downloaded a moderately large sample of non-face images (~550).
- Loaded the non-face images from the specified file paths (a loading sketch is shown after the `split_data` code below).
- Converted the images to grayscale.
- Resized each image to 92x112 pixels.
- Flattened each image to a 1D array.
- Loaded the face images from the specified file paths.
- Resized each image to 92x112 pixels.
- Flattened each image to a 1D array.
- Created binary labels (1 for faces, 0 for non-faces).
- Shuffled the face images and their labels.
- Shuffled the non-face images and their labels.
- Split the data into training and testing sets using `split_data`.
- Combined the face and non-face data.
- Re-shuffled the combined dataset.
def split_data(faces, faces_labels, non_faces, non_faces_labels, non_faces_count, alpha, non_face_precentage_in_train=1):
    if alpha == 0.5:
        # 50/50 split: alternate rows, optionally limiting the non-face share used for training
        faces_train = faces[::2]
        faces_train_labels = faces_labels[::2]
        faces_test = faces[1::2]
        faces_test_labels = faces_labels[1::2]
        non_faces_train = non_faces[: int(non_faces_count * non_face_precentage_in_train) : 2]
        non_faces_train_labels = non_faces_labels[: int(non_faces_count * non_face_precentage_in_train) : 2]
        non_faces_test = non_faces[1:non_faces_count:2]
        non_faces_test_labels = non_faces_labels[1:non_faces_count:2]
    else:
        # random split with a training fraction of alpha
        n = len(faces)
        n_train = int(n * alpha)
        idx = np.random.permutation(n)
        train_idx = idx[:n_train]
        test_idx = idx[n_train:]
        faces_train = faces[train_idx]
        faces_train_labels = faces_labels[train_idx]
        faces_test = faces[test_idx]
        faces_test_labels = faces_labels[test_idx]
        n = non_faces_count
        n_train = int(n * alpha)
        idx = np.random.permutation(n)
        train_idx = idx[:n_train]
        test_idx = idx[n_train:]
        non_faces_train = non_faces[train_idx]
        non_faces_train_labels = non_faces_labels[train_idx]
        non_faces_test = non_faces[test_idx]
        non_faces_test_labels = non_faces_labels[test_idx]
    return (
        np.append(faces_train, non_faces_train, axis=0),
        np.append(faces_train_labels, non_faces_train_labels, axis=0),
        np.append(faces_test, non_faces_test, axis=0),
        np.append(faces_test_labels, non_faces_test_labels, axis=0),
    )
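A minimal sketch of the loading and preprocessing steps listed above (the directory names, the use of Pillow, and the helper name `load_images` are assumptions, not the project's exact code):

    import os
    import numpy as np
    from PIL import Image

    def load_images(folder, label):
        # read every image in `folder`, convert to grayscale, resize to 92x112, and flatten
        images, labels = [], []
        for file in sorted(os.listdir(folder)):
            img = Image.open(os.path.join(folder, file)).convert("L").resize((92, 112))
            images.append(np.array(img).flatten())
            labels.append(label)
        return np.array(images), np.array(labels).reshape(-1, 1)

    faces, faces_labels = load_images("datasets/faces_all", 1)          # hypothetical path
    non_faces, non_faces_labels = load_images("datasets/non_faces", 0)  # hypothetical path

    # shuffle each class independently before splitting
    idx = np.random.permutation(len(faces))
    faces, faces_labels = faces[idx], faces_labels[idx]
    idx = np.random.permutation(len(non_faces))
    non_faces, non_faces_labels = non_faces[idx], non_faces_labels[idx]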
- Applied PCA for dimensionality reduction.
- Explored the variance explained by different components.
- Transformed both training and testing data using the selected number of components.
- Used the most efficient alpha value from the first part, `0.85`.

def PCA(train_data, alpha=0.85):
    mean = np.mean(train_data, axis=0)
    centered_data = train_data - mean
    # smaller covariance matrix trick (Z @ Z.T), as in the first part
    cov_matrix = np.dot(centered_data, centered_data.T)
    eig_values, eig_vectors = np.linalg.eigh(cov_matrix)
    idx = np.argsort(eig_values)[::-1]
    eig_values = eig_values[idx]
    eig_vectors = eig_vectors[:, idx]
    # restore and normalize the full-size eigenvectors
    eig_vectors = np.dot(centered_data.T, eig_vectors)
    for i in range(eig_vectors.shape[1]):
        eig_vectors[:, i] = eig_vectors[:, i] / np.linalg.norm(eig_vectors[:, i])
    # keep enough components to reach the target explained variance
    total = np.sum(eig_values)
    k = 0
    var = 0
    while var / total < alpha:
        var += eig_values[k]
        k += 1
    return eig_vectors[:, :k], mean

def project_data(data, eigenvectors, mean):
    return np.dot(data - mean, eigenvectors)
- Applied LDA for dimensionality reduction.
- Used only one dominant eigenvector, as we have only two classes.
- Transformed both training and testing data using the selected number of components.

def LDA(train_data, train_labels, k=1):
    # class means for faces (label 1) and non-faces (label 0)
    mean1 = np.mean(train_data[train_labels.ravel() == 1], axis=0)
    mean0 = np.mean(train_data[train_labels.ravel() == 0], axis=0)
    # within-class and between-class scatter matrices
    Sw = (np.dot((train_data[train_labels.ravel() == 1] - mean1).T, (train_data[train_labels.ravel() == 1] - mean1))
          + np.dot((train_data[train_labels.ravel() == 0] - mean0).T, (train_data[train_labels.ravel() == 0] - mean0)))
    Sb = np.dot((mean1 - mean0).reshape(-1, 1), (mean1 - mean0).reshape(-1, 1).T)
    eig_values, eig_vectors = np.linalg.eigh(np.dot(np.linalg.inv(Sw), Sb))
    eig_values = np.real(eig_values)
    eig_vectors = np.real(eig_vectors)
    idx = np.argsort(eig_values)[::-1]
    eig_values = eig_values[idx]
    eig_vectors = eig_vectors[:, idx]
    return eig_vectors[:, :k]
- Used KNN for training and evaluation, sticking to `k=1` and `weights='distance'`, so it weights points by the inverse of their distance.
- Trained the model using the transformed data.
- Evaluated the model using accuracy.

def knn_classifier(train_data, train_labels, test_data, test_labels, k=1):
    knn = KNeighborsClassifier(n_neighbors=1, weights="distance")
    knn.fit(train_data, train_labels.ravel())
    y_pred = knn.predict(test_data).ravel()
    return accuracy_score(test_labels, y_pred), y_pred
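A rough sketch of how these pieces presumably fit together for the face/non-face experiment (the variable names come from the loading sketch above, and the non-face count is an illustrative value):

    # 50/50 split with a given number of non-face images, then PCA -> KNN
    x_train, train_labels, x_test, test_labels = split_data(
        faces, faces_labels, non_faces, non_faces_labels, non_faces_count=200, alpha=0.5
    )
    eigenvectors, mean = PCA(x_train, alpha=0.85)
    acc, _ = knn_classifier(
        project_data(x_train, eigenvectors, mean), train_labels,
        project_data(x_test, eigenvectors, mean), test_labels,
    )
    print(acc)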
- On average it takes 41 components to retain 85.0% of the variance.
- On average the accuracy of the model is about 94.5%.
- The maximum accuracy of the model is
- Only one dominant eigenvector is used.
- On average the accuracy of the model is about 80%.
- For PCA
- For LDA
As the number of non-face images increases, the accuracy of the model decreases, because it becomes biased towards classifying face images as non-face images. In addition, as the data grows, the number of points increases and the gap between the classes shrinks, which introduces noise and confusion for the classifier.
- For PCA
- For LDA
As the number of non-face images in the training set increases (while the number of non-face images in the test set stays fixed), the accuracy of the model increases, since the model would otherwise be trained on relatively few non-face images compared to those in the test set.