conclusion.tex

\chapter{Conclusion}
\label{chap:conclusion}

The study investigated on how deep convolutional neural networks can be applied to solve the multi-label image classification problem for real-world datasets. Results give insights on what challenges can arise in building such classification system, what level of image classification performance can be achieved, and which factors can influence the end results.

Challenges and issues revealed during analysis of the NTB dataset and implementation of the classification system include:
\begin{itemize}
    \item A unique set of categories which requires either to train a new model or to reuse a pre-trained neural network with adjustments in the architecture to solve classification problems.
    \item Difficulties in category tree transformation connected with the decision of which categories include, exclude and merge.
    \item Categories which are unsuitable for automatic training purposes including contextual, abstract and ambiguous labels.
    \item Contextual images which can not be classified only by visual features and require additional information.
    \item Issues connected with the classification of objects and people located in the background of the pictures.
    \item Duplicates of images in the dataset which can cause issues for both training and testing processes.
\end{itemize}

Each of the discovered challenges can potentially influence the final classification performance. Solutions for some of the issues can be automatized to a particular level. Results from the study also suggest that there is a potential in the automatic method of filtering contextual images described in the research.

Results from the experiments show a big potential of using pre-trained convolutional neural networks in solving the problem of multi-label image classification on a real-world dataset. The real accuracy of the system for a particular dataset will depend on its size and the extent of issues present including those discovered in the study. However, the results chapter can give an indication of which classification system performance level can be achieved when training on a dataset similar to the one used in the research.

Results suggest that employing more modern network architecture do not necessarily improve end classification performance. However, results also indicate that different neural networks can perform better in different conditions. The two neural network architectures tested in the research showed different classification performance for various sizes of the training sample. A decision on which network configuration to use depend on the requirements of the particular task.

The main limitation of the study is that all experiments were performed on a single dataset. Therefore the generalizability of results, findings, and insights is in question. Results are considered likely to be more generalizable to datasets similar to the one used in the study. Experiments were designed to maximize reproducibility and internal validity. However, further studies should be done to validate obtained results.


% % in rw datasets you deal with what you have. For instance, labels in parent categories should be used
% % in the standard dt they take images, then desing categories and then manually or semimanually label them

% % TODO: While which network configuration to use should be considered in each particular case, the study gives some insights on how different decisions can influence both training performance and final system classification precision. Two different network architectures were tested The choice of the solver method can influence 
% % Dataset splitting can be a challenge for multi-label classification case. Random sampling approach used in the main experiments of the study showed the best results in terms of the closest average ratio to the desired one. %The limitation on minimal number of images in one category can depend on how tightly connected are categories in the dataset and amout of images in them.  % not onlu real-world

% The NTB dataset analysis together with the tree transformation performed revealed possible challenges and issues of building multi-label classification system based on real-world datasets including:
% \begin{itemize}
%     \item Currently, modern neural networks do not incorporate hierarchical structure of the categories tree, therefore it has to be transformed to a flat structure. Depending on a specific dataset, different approaches can be used to perform this transformation. Characteristics of the dataset that should be considered include: if both parent and child categories are used to label images, if the relationship type between categories is consistent across the tree, and if the manual labeling rules are consistent. Depending on the dataset this process can be automatized to a certain level.
%     \item There can be contextual, combined, too abstract, and ambiguous categories. Such categories should be explicitly separated from other ones before the training process in order to acheive better system performance. % more categoires -> less weights/neurons can be used for others
%     %\item categories with different purpose?
%     \item One of the main challenges is to deal with contextual images, which can be classified only with additional information. While manual separation of such images could give the best outcome, the study suggest approach that uses known connections between categories in order to filter out some part of such images automatically.
%     \item Not consistent understanding and use of particular category between different manual labelers can result in reduced classification performance. However, results from the study suggest that in some cases it is still possible for a system to generalize on the category. This system can be further used to improve consistency in the initial dataset. % sign of triumph
    
%     % \item duplicates
% \end{itemize}

% Results from the experiments show a big potential of fine-tuning pretrained convolutional neural networks in solving the problem of multi-label image classification on a real-world dataset. The actual level of classification for a particular dataset performance will depend on its quality and size. However, the results chapter can give indication of which classification system performance level can be achieved when training on a dataset similar to the one used in the research.
% % However, it is expected that correlations and insights will likely to be applicabe to other real-world datasets as well.
% Further investigation of the trained networks shows that due to existing errors in the original dataset, the end classification system performance can be even better than suggested by the results.

% Results suggest that improvement in the network architecture do not necessarily improve end classification performance. However, results also indicate that different neural networks can perform better in different conditions. For example, more modern GoogleNet architecture compared to older CaffeNet showed better results for categories with larger sample size, but had opposite effect on the categories with small number of pictures.

% % The manual work on the category tree is most likely required, but not necessarily on the image level. Results suggest that it is possible to further improve system performance by improving the dataset using available connections between categories (removing portrait and press-conference pics from sports categories) .. The next level would be to train network, use it to improve dataset and retrain it on it again.

% % An additional discovery This fact also implies that small imperfections of the dataset do not influence the end performance on a big scale. %But in depends on the size of sample.


% The main limitation of study is that all experiments were performed on the single available dataset, therefore the generalizability of results and insights is in question. Results are considered likely to be more generalizable to datasets similar to the one used in the study. Experiments were designed to maximize reproducablility and internal validity. However, further studies should be done to validate obtained results.