Principal Component Analysis (PCA) is a dimensionality reduction technique used to extract important features from high-dimensional datasets. PCA works by identifying the principal components of the data, which are linear combinations of the original features that capture the most variation in the data.
Linear Discriminant Analysis (LDA) is a dimensionality reduction technique used to reduce the number of features in a dataset while maintaining class separability. LDA is a supervised technique, meaning that it uses the class labels to perform the dimensionality reduction, and it is widely used in pattern recognition and machine learning.
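A quick toy illustration of the difference, using scikit-learn only for this example (the project implements both techniques from scratch below); the random data and shapes here are illustrative:

    # Toy illustration only: PCA is unsupervised (ignores y), LDA is supervised (uses y).
    import numpy as np
    from sklearn.decomposition import PCA
    from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

    rng = np.random.default_rng(0)
    X = rng.normal(size=(100, 5))      # 100 samples, 5 features
    y = rng.integers(0, 2, size=100)   # 2 synthetic class labels

    X_pca = PCA(n_components=2).fit_transform(X)                            # directions of max variance
    X_lda = LinearDiscriminantAnalysis(n_components=1).fit_transform(X, y)  # max class separability
    print(X_pca.shape, X_lda.shape)    # (100, 2) (100, 1)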
- The dataset for this project is the AT&T Face Database
- The dataset is open-source and can be downloaded from Kaggle.
- The dataset contains 400 images of 40 people, 10 images per person.
- The images are 92x112 pixels in grayscale.
- The images are stored in the `datasets/faces` folder.
- The data matrix is 400 x 10304, where each image is flattened into a vector and stored as a row of the matrix.
- The 10 images of each person share the same label, from 1 to 40.
import os
import numpy as np
from PIL import Image

# One folder per subject: datasets/s1 ... datasets/s40
paths = ["datasets/s" + str(i) for i in range(1, 41)]

Data = np.zeros((400, 10304))
labels = np.zeros((400, 1))

# Assign the same label (1..40) to each person's 10 images
for i in range(40):
    labels[i * 10 : (i + 1) * 10] = i + 1

# Read every image, flatten it, and store it as a row of the data matrix
cnt = 0
for path in paths:
    files = os.listdir(path)
    for file in files:
        img = Image.open(path + "/" + file)
        np_img = np.array(img)
        np_img = np_img.flatten()
        Data[cnt] = np_img
        cnt += 1
- Alternate rows are used for the split: rows at even indices (0, 2, 4, ...) are kept for training and rows at odd indices (1, 3, 5, ...) for testing, i.e. 5 images per person in each set.
X_train = Data[0::2]
X_test = Data[1::2]
y_train = labels[0::2]
y_test = labels[1::2]
def get_PCA(X_train, alpha):
    # Compute the mean of the training data
    mean_face = np.mean(X_train, axis=0)
    # subtract the mean from the training data
    X_train_centralized = X_train - mean_face
    # compute the smaller covariance matrix Z @ Z.T instead of Z.T @ Z (see the trick below)
    cov_matrix = X_train_centralized @ X_train_centralized.T
    # compute the eigenvalues and eigenvectors
    eigenvalues, eigenvectors = np.linalg.eig(cov_matrix)
    # sort the eigenvectors descendingly by eigenvalues
    idx = np.argsort(eigenvalues)[::-1]
    eigenvalues = eigenvalues[idx]
    eigenvectors = eigenvectors[:, idx]
    # restore the original (10304-dimensional) eigenvectors
    eigenvectors_converted = X_train_centralized.T @ eigenvectors
    # normalize the restored eigenvectors
    eigenfaces = eigenvectors_converted / np.linalg.norm(eigenvectors_converted, axis=0)
    # keep the smallest number of components whose explained variance ratio reaches alpha
    explained_variance = 0
    no_components = 0
    for i in range(len(eigenvalues)):
        explained_variance += eigenvalues[i]
        no_components += 1
        if explained_variance / np.sum(eigenvalues) >= alpha:
            break
    # return the mean face and the kept eigenfaces (projection basis)
    return mean_face, eigenfaces[:, :no_components]
Trick to get the eigenvalues/eigenvectors from the covariance matrix
- The covariance matrix is normally computed as Z.T @ Z (10304 x 10304), so getting the eigenvalues/eigenvectors from this matrix takes too much time.
- Instead, we compute the much smaller matrix Z @ Z.T (200 x 200). By linear algebra, its eigenvalues are exactly the non-zero eigenvalues of the original matrix, so we only get the first 200 of them (which cover over 99% of the total variance).
- The original eigenvectors are restored by the formula u_i = Z.T @ v_i, where u_i is the original eigenvector (10304 x 1) and v_i is the smaller one (200 x 1), followed by normalization.
- This gives the same results in much less time.
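A minimal sketch verifying the trick on a small random matrix (the shapes here are illustrative, not the project's 200 x 10304 data matrix):

    import numpy as np

    rng = np.random.default_rng(0)
    Z = rng.normal(size=(20, 500))          # n samples x d features, with n << d

    big = Z.T @ Z                           # d x d (expensive in the real setting)
    small = Z @ Z.T                         # n x n (cheap)

    vals_big = np.sort(np.linalg.eigvalsh(big))[::-1][:20]
    vals_small = np.sort(np.linalg.eigvalsh(small))[::-1]
    print(np.allclose(vals_big, vals_small))    # True: the non-zero eigenvalues match

    # restore a full-size eigenvector: u_i = Z.T @ v_i (then normalize)
    vals, vecs = np.linalg.eigh(small)
    u = Z.T @ vecs[:, -1]
    u /= np.linalg.norm(u)
    print(np.allclose(big @ u, vals[-1] * u))   # u is an eigenvector of Z.T @ Z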
def PCA_Projected_data(mean_face, eigenfaces):
    # uses the global X_train / X_test defined above
    X_train_centered = X_train - mean_face
    X_train_projected = X_train_centered @ eigenfaces
    X_test_centered = X_test - mean_face
    X_test_projected = X_test_centered @ eigenfaces
    return X_train_projected, X_test_projected
- The classifier is trained on the projected training data using knn.fit().
- The classifier then predicts labels for the projected test data; the predictions are stored in y_pred.
- y_pred is compared with y_test (the actual labels) to compute the accuracy.
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import accuracy_score

def Test_PCA(alpha, k):
    mean_face, eigenfaces = get_PCA(X_train, alpha)
    X_train_pca, X_test_pca = PCA_Projected_data(mean_face, eigenfaces)
    knn = KNeighborsClassifier(k, weights="distance")
    knn.fit(X_train_pca, y_train.ravel())
    y_pred = knn.predict(X_test_pca)
    accuracy = accuracy_score(y_test, y_pred.ravel())
    return accuracy
Alpha | Accuracy
---|---
0.80 | 0.945
0.85 | 0.94
0.90 | 0.94
0.95 | 0.93
- As alpha increases, the classification accuracy decreases, since keeping more components makes the model more prone to overfitting and more sensitive to noise.
- Introducing our second trick
- Regularization step: before computing the inverse of the S_W matrix, we add a value of 1e-7 to its diagonal.
- This is common practice when inverting matrices in the context of Linear Discriminant Analysis (LDA), or covariance matrices in general. It ensures numerical stability, especially when the matrices involved are close to singular.
S_W += 1e-7 * np.identity(X_train.shape[1])
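A minimal sketch of why this helps, using a deliberately singular matrix (purely illustrative, not the project's S_W):

    import numpy as np

    # A rank-deficient "scatter" matrix: plain inversion is ill-defined / unstable
    A = np.array([[1.0, 2.0],
                  [2.0, 4.0]])          # rank 1, determinant 0
    print(np.linalg.cond(A))            # huge condition number (effectively infinite)

    # Adding a small value to the diagonal makes it well-conditioned and invertible
    A_reg = A + 1e-7 * np.identity(2)
    A_inv = np.linalg.inv(A_reg)        # now succeeds
    print(A_inv @ A_reg)                # approximately the identity matrix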
def get_LDA(X_train, y_train):
    y_train = np.squeeze(y_train)
    class_means = np.array([np.mean(X_train[y_train == i], axis=0) for i in range(1, 41)])
    class_sizes = np.array([np.sum(y_train == i) for i in range(1, 41)])
    # Compute overall mean
    overall_mean = np.mean(X_train, axis=0)
    # Compute within-class scatter matrix
    S_W = np.zeros((X_train.shape[1], X_train.shape[1]))
    for i in range(1, 41):
        # Use boolean indexing to select the rows of class i from X_train
        class_data = X_train[y_train == i]
        centered_data = class_data - class_means[i - 1]
        S_W += np.dot(centered_data.T, centered_data)
    # Regularize S_W before inversion
    S_W += 1e-7 * np.identity(X_train.shape[1])
    # Compute between-class scatter matrix
    S_B = np.zeros((X_train.shape[1], X_train.shape[1]))
    for i in range(1, 41):
        class_diff = class_means[i - 1] - overall_mean
        S_B += class_sizes[i - 1] * np.outer(class_diff, class_diff)
    # Solve generalized eigenvalue problem
    eigenvalues, eigenvectors = np.linalg.eig(np.dot(np.linalg.inv(S_W), S_B))
    # Sort eigenvalues and eigenvectors in descending order
    idx = np.argsort(eigenvalues)[::-1]
    sorted_eigenvectors = eigenvectors[:, idx]
    # Take only the 39 dominant eigenvectors (number of classes - 1)
    projection_matrix = sorted_eigenvectors[:, :39]
    return np.real(projection_matrix)
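Test_LDA below relies on a globally computed projection matrix, which is presumably obtained once like this:

    LDA_projection_matrix = get_LDA(X_train, y_train)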
def LDA_projected_data(projection_matrix):
    # uses the global X_train / X_test defined above
    projected_X_train = np.dot(X_train, projection_matrix)
    projected_X_test = np.dot(X_test, projection_matrix)
    return projected_X_train, projected_X_test
- As with PCA, the classifier is trained on the LDA-projected training data using knn.fit().
- The classifier then predicts labels for the projected test data; the predictions are stored in y_pred.
- y_pred is compared with y_test (the actual labels) to compute the accuracy.
def Test_LDA(k):
    projected_X_train, projected_X_test = LDA_projected_data(LDA_projection_matrix)
    knn = KNeighborsClassifier(n_neighbors=k)
    knn.fit(projected_X_train, y_train.ravel())
    y_pred = knn.predict(projected_X_test)
    accuracy = accuracy_score(y_test, y_pred.ravel())
    return accuracy
- LDA Accuracy for k = 1 is 95%
- PCA
Alpha | k = 1 | k = 3 | k = 5 | k = 7 | k = 9
---|---|---|---|---|---
0.80 | 0.945 | 0.9 | 0.895 | 0.88 | 0.835
0.85 | 0.94 | 0.9 | 0.895 | 0.86 | 0.83
0.90 | 0.94 | 0.905 | 0.895 | 0.85 | 0.815
0.95 | 0.93 | 0.9 | 0.865 | 0.83 | 0.8
- LDA
k | Accuracy
---|---
1 | 0.950
3 | 0.915
5 | 0.890
7 | 0.875
9 | 0.860
- Kernel PCA is a non-linear dimensionality reduction technique that uses a kernel function to implicitly map the data into a feature space in which linear PCA is then applied. This allows it to capture non-linear relationships between variables that linear PCA cannot.
- The time complexity of normal PCA is O(d^3), where d is the number of dimensions, while the time complexity of kernel PCA is O(n^3), where n is the number of data points. The computation of the kernel matrix is the most computationally expensive step in kernel PCA.
- Kernel PCA may be more accurate than normal PCA for datasets with non-linear relationships between variables, since it can capture these relationships. However, kernel PCA is more prone to overfitting than normal PCA, and the choice of kernel function can greatly affect its performance.
from sklearn.decomposition import KernelPCA

def Test_Polynomial_Kernel_PCA(no_of_components, k):
    kpca = KernelPCA(n_components=no_of_components, kernel="poly")
    X_train_kpca = kpca.fit_transform(X_train)
    X_test_kpca = kpca.transform(X_test)
    knn = KNeighborsClassifier(k, weights="distance")
    knn.fit(X_train_kpca, y_train.ravel())
    y_pred = knn.predict(X_test_kpca)
    accuracy = accuracy_score(y_test, y_pred.ravel())
    return accuracy
- Comparison Between Normal PCA and Kernel PCA
- Shrinkage LDA (Linear Discriminant Analysis) is a variant of the standard LDA method used for classification and dimensionality reduction. The key difference between shrinkage LDA and normal LDA is that the former incorporates a regularization term that shrinks the sample covariance matrix towards a diagonal matrix.
- This regularization is particularly useful when dealing with high-dimensional data, as it helps to overcome the small-sample-size problem by stabilizing the covariance estimates. Shrinkage LDA has been shown to outperform traditional LDA in terms of classification accuracy, especially when the number of features is much larger than the number of observations.
- Another advantage of shrinkage LDA is that it can handle multicollinearity between the predictor variables, which can be a problem in standard LDA when the predictors are highly correlated. In summary, shrinkage LDA is a powerful tool for classification and dimensionality reduction that can improve the accuracy of LDA in high-dimensional and small-sample settings.
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

def Test_Shrinkage_LDA(k):
    lda = LinearDiscriminantAnalysis(solver="eigen", shrinkage="auto")
    lda.fit(X_train, y_train.ravel())
    X_train_projection = lda.transform(X_train)
    X_test_projection = lda.transform(X_test)
    knn = KNeighborsClassifier(n_neighbors=k)
    knn.fit(X_train_projection, y_train.ravel())
    y_pred = knn.predict(X_test_projection)
    accuracy = accuracy_score(y_test, y_pred.ravel())
    return accuracy
- Comparison Between Normal LDA & Shrinkage LDA
- The split function
def split_data(data, labels):
    bonus_x_train = np.zeros((280, 10304))
    bonus_x_test = np.zeros((120, 10304))
    bonus_y_train = np.zeros((280, 1))
    bonus_y_test = np.zeros((120, 1))
    # split each person's data into 7 training images and 3 testing images
    for i in range(40):
        bonus_x_train[i * 7 : (i + 1) * 7] = data[i * 10 : i * 10 + 7]
        bonus_x_test[i * 3 : (i + 1) * 3] = data[i * 10 + 7 : i * 10 + 10]
        bonus_y_train[i * 7 : (i + 1) * 7] = labels[i * 10 : i * 10 + 7]
        bonus_y_test[i * 3 : (i + 1) * 3] = labels[i * 10 + 7 : i * 10 + 10]
    # shuffle the training data
    indices = np.arange(280)
    np.random.shuffle(indices)
    bonus_x_train = bonus_x_train[indices]
    bonus_y_train = bonus_y_train[indices]
    return bonus_x_train, bonus_x_test, bonus_y_train, bonus_y_test
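The 70/30 results below were presumably obtained by swapping the bonus split into the same pipeline, roughly as follows (reassigning the global variables is an assumption about how the experiment was run):

    # replace the 50/50 split with the 70/30 bonus split, then reuse Test_PCA / Test_LDA
    X_train, X_test, y_train, y_test = split_data(Data, labels)
    for alpha in [0.80, 0.85, 0.90, 0.95]:
        print(alpha, Test_PCA(alpha, 1))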
Alpha | Original Split | Bonus Split
---|---|---
0.80 | 0.945 | 0.966667
0.85 | 0.94 | 0.958333
0.90 | 0.94 | 0.95
0.95 | 0.93 | 0.941667
- It is clear that after splitting the dataset 70/30, the accuracy improves for all values of alpha.
Method | Original Split | Bonus Split
---|---|---
LDA | 0.95 | 0.975
This documentation outlines the implementation of Principal Component Analysis (PCA) and Linear Discriminant Analysis (LDA) for the task of classifying face and non-face images. The goal is to explore the effectiveness of dimensionality reduction techniques, assess the model's performance, and compare how the number of non-face images affects the accuracy of the model.
- NumPy
- Scikit-learn
- Images
- Downloaded a moderately large sample of non-face images (~550).
- Loaded the non-face images from the specified file paths (a loading sketch is shown after the `split_data` code below).
- Converted the images to grayscale.
- Resized each image to 92x112 pixels.
- Flattened each image to a 1D array.
- Loaded the face images from the specified file paths.
- Resized each image to 92x112 pixels.
- Flattened each image to a 1D array.
- Created binary labels (1 for faces, 0 for non-faces).
- Shuffled the face images and their labels.
- Shuffled the non-face images and their labels.
- Split the data into training and testing sets using `split_data`.
- Combined the face and non-face data.
- Re-shuffled the combined dataset.
def split_data(faces, faces_labels, non_faces, non_faces_labels, non_faces_count, alpha, non_face_precentage_in_train=1):
    if alpha == 0.5:
        # 50/50 split: alternate rows, optionally limiting the non-face share used for training
        faces_train = faces[::2]
        faces_train_labels = faces_labels[::2]
        faces_test = faces[1::2]
        faces_test_labels = faces_labels[1::2]
        non_faces_train = non_faces[: int(non_faces_count * non_face_precentage_in_train) : 2]
        non_faces_train_labels = non_faces_labels[: int(non_faces_count * non_face_precentage_in_train) : 2]
        non_faces_test = non_faces[1:non_faces_count:2]
        non_faces_test_labels = non_faces_labels[1:non_faces_count:2]
    else:
        # random split with a training fraction of alpha
        n = len(faces)
        n_train = int(n * alpha)
        idx = np.random.permutation(n)
        train_idx = idx[:n_train]
        test_idx = idx[n_train:]
        faces_train = faces[train_idx]
        faces_train_labels = faces_labels[train_idx]
        faces_test = faces[test_idx]
        faces_test_labels = faces_labels[test_idx]
        n = non_faces_count
        n_train = int(n * alpha)
        idx = np.random.permutation(n)
        train_idx = idx[:n_train]
        test_idx = idx[n_train:]
        non_faces_train = non_faces[train_idx]
        non_faces_train_labels = non_faces_labels[train_idx]
        non_faces_test = non_faces[test_idx]
        non_faces_test_labels = non_faces_labels[test_idx]
    return (
        np.append(faces_train, non_faces_train, axis=0),
        np.append(faces_train_labels, non_faces_train_labels, axis=0),
        np.append(faces_test, non_faces_test, axis=0),
        np.append(faces_test_labels, non_faces_test_labels, axis=0),
    )
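A minimal sketch of the loading and preprocessing steps listed above (the directory names, the use of Pillow, and the helper name `load_images` are assumptions, not the project's exact code):

    import os
    import numpy as np
    from PIL import Image

    def load_images(folder, label):
        # read every image in `folder`, convert to grayscale, resize to 92x112, and flatten
        images, labels = [], []
        for file in sorted(os.listdir(folder)):
            img = Image.open(os.path.join(folder, file)).convert("L").resize((92, 112))
            images.append(np.array(img).flatten())
            labels.append(label)
        return np.array(images), np.array(labels).reshape(-1, 1)

    faces, faces_labels = load_images("datasets/faces_all", 1)          # hypothetical path
    non_faces, non_faces_labels = load_images("datasets/non_faces", 0)  # hypothetical path

    # shuffle each class independently before splitting
    idx = np.random.permutation(len(faces))
    faces, faces_labels = faces[idx], faces_labels[idx]
    idx = np.random.permutation(len(non_faces))
    non_faces, non_faces_labels = non_faces[idx], non_faces_labels[idx]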
- Applied PCA for dimensionality reduction.
- Explored the variance explained by different components.
- Transformed both training and testing data using the selected number of components.
- Used the most efficient alpha value from the first part, `0.85`.

def PCA(train_data, alpha=0.85):
    mean = np.mean(train_data, axis=0)
    centered_data = train_data - mean
    # smaller covariance matrix trick (Z @ Z.T), as in the first part
    cov_matrix = np.dot(centered_data, centered_data.T)
    eig_values, eig_vectors = np.linalg.eigh(cov_matrix)
    idx = np.argsort(eig_values)[::-1]
    eig_values = eig_values[idx]
    eig_vectors = eig_vectors[:, idx]
    # restore and normalize the full-size eigenvectors
    eig_vectors = np.dot(centered_data.T, eig_vectors)
    for i in range(eig_vectors.shape[1]):
        eig_vectors[:, i] = eig_vectors[:, i] / np.linalg.norm(eig_vectors[:, i])
    # keep enough components to reach the target explained variance
    total = np.sum(eig_values)
    k = 0
    var = 0
    while var / total < alpha:
        var += eig_values[k]
        k += 1
    return eig_vectors[:, :k], mean

def project_data(data, eigenvectors, mean):
    return np.dot(data - mean, eigenvectors)
- Applied LDA for dimensionality reduction.
- Used only one dominant eigenvector, as we have only two classes.
- Transformed both training and testing data using the selected number of components.

def LDA(train_data, train_labels, k=1):
    # class means for faces (label 1) and non-faces (label 0)
    mean1 = np.mean(train_data[train_labels.ravel() == 1], axis=0)
    mean0 = np.mean(train_data[train_labels.ravel() == 0], axis=0)
    # within-class and between-class scatter matrices
    Sw = (np.dot((train_data[train_labels.ravel() == 1] - mean1).T, (train_data[train_labels.ravel() == 1] - mean1))
          + np.dot((train_data[train_labels.ravel() == 0] - mean0).T, (train_data[train_labels.ravel() == 0] - mean0)))
    Sb = np.dot((mean1 - mean0).reshape(-1, 1), (mean1 - mean0).reshape(-1, 1).T)
    eig_values, eig_vectors = np.linalg.eigh(np.dot(np.linalg.inv(Sw), Sb))
    eig_values = np.real(eig_values)
    eig_vectors = np.real(eig_vectors)
    idx = np.argsort(eig_values)[::-1]
    eig_values = eig_values[idx]
    eig_vectors = eig_vectors[:, idx]
    return eig_vectors[:, :k]
- Used KNN for training and evaluation, sticking to `k=1` and `weights='distance'`, so it weights points by the inverse of their distance.
- Trained the model using the transformed data.
- Evaluated the model using accuracy.

def knn_classifier(train_data, train_labels, test_data, test_labels, k=1):
    knn = KNeighborsClassifier(n_neighbors=1, weights="distance")
    knn.fit(train_data, train_labels.ravel())
    y_pred = knn.predict(test_data).ravel()
    return accuracy_score(test_labels, y_pred), y_pred
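A rough sketch of how these pieces presumably fit together for the face/non-face experiment (the variable names come from the loading sketch above, and the non-face count is an illustrative value):

    # 50/50 split with a given number of non-face images, then PCA -> KNN
    x_train, train_labels, x_test, test_labels = split_data(
        faces, faces_labels, non_faces, non_faces_labels, non_faces_count=200, alpha=0.5
    )
    eigenvectors, mean = PCA(x_train, alpha=0.85)
    acc, _ = knn_classifier(
        project_data(x_train, eigenvectors, mean), train_labels,
        project_data(x_test, eigenvectors, mean), test_labels,
    )
    print(acc)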
- On average it takes 41 components to retain 85.0% of the variance.
- On average the accuracy of the model is about 94.5%.
- The maximum accuracy of the model is
- Only one dominant eigenvector is used.
- On average the accuracy of the model is about 80%.
- For PCA
- For LDA
As the number of non-face images increases, the accuracy of the model decreases, because it becomes biased towards classifying face images as non-face images. In addition, as the data grows, the number of points increases and the gap between the classes shrinks, which introduces noise and confusion for the classifier.
- For PCA
- For LDA
As the number of non-face images in the training set increases (while the number of non-face images in the test set stays fixed), the accuracy of the model increases, since the model would otherwise be trained on relatively few non-face images compared to those in the test set.