This is the final project for CS252 at Macau University of Science and Technology.
Copyright @ 2022, Chen Sun, JinchengTian, Xinyun Chen, Yunpeng Zhou. All rights reserved.
Please click here for our personal homepage to contact us.
-
This project is based on macOS; we have not yet developed a Windows version.
-
This tool can currently handle only .png and .jpg files. Please make sure that the images you upload are in one of these two formats.
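As a rough illustration of the format restriction above, a file name could be checked before it is passed to the converter. This is only a sketch: the function name `is_supported_image` and the constant `SUPPORTED_EXTENSIONS` are hypothetical and are not taken from this project's code.

```python
from pathlib import Path

# Extensions named in this README. ".jpeg" is left out on purpose because
# the text above lists only .png and .jpg (hypothetical constant name).
SUPPORTED_EXTENSIONS = {".png", ".jpg"}


def is_supported_image(path: str) -> bool:
    """Return True if the file name ends in a supported extension.

    The comparison is case-insensitive, so "photo.PNG" is accepted.
    Only the name is inspected; the file contents are not validated.
    """
    return Path(path).suffix.lower() in SUPPORTED_EXTENSIONS


if __name__ == "__main__":
    for name in ["photo.PNG", "portrait.jpg", "scan.tiff"]:
        print(name, is_supported_image(name))
```

Checking only the extension is a deliberate simplification; a stricter check would also inspect the file header.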
This project is divided into two parts: the Cartoon Converter and the image resource sharing platform. You can access the platform through the Cartoon Converter.
Cartoon Converter is an image processing tool that runs locally. It converts pictures of any style into cartoon style, and currently offers four styles to choose from. If you want to know more about it, please click here.
We provide a Demo here:
If you want to see the detailed user manual, please click here.
The picture resource sharing platform is another part of this project.
This platform is used in conjunction with the converter. It gives artists a place to upload and download pictures, and offers creative inspiration and picture resources for art practitioners.
Anyone can upload or download pictures from this platform. The converter also provides an interface for entering the platform and uploading converted cartoon-style pictures to it.
1. Clone the repository
git clone https://github.com/MeditatorE/Cartoon-Converter-Platform.git
cd Cartoon-Converter-Platform
2. Install required modules
pip install -r requirements.txt
3. Run the program
python main.py
We uploaded our development plan to Jira. Our development process is divided into three stages:
Sprint I develops the local cartoon converter,
Sprint II develops the online resource sharing platform, and
Sprint III integrates the online and offline parts and optimizes the entire project.
You can go to our development page on Jira to learn more.
At this stage, we mainly need to train the model and implement a GUI through which users can access it. We use a model called CartoonGAN, which was presented at CVPR 2018 in the paper "CartoonGAN: Generative Adversarial Networks for Photo Cartoonization". Its structure is shown below:
CartoonGAN is a Generative Adversarial Network (GAN) framework specialized for cartoon stylization.
Existing methods do not produce satisfactory cartoonization results, because
(1) cartoon style has unique characteristics, is highly simplified and abstract, and
(2) cartoon images tend to have sharp edges, smooth color shading and relatively simple textures.
To address this, the model proposes two losses suited to cartoonization:
(1) a semantic content loss, formulated as a sparse regularization in the high-level feature maps of the VGG network, to cope with the large stylistic variation between photos and cartoons, and
(2) an edge-promoting adversarial loss for preserving sharp edges. The authors further introduce an initialization phase to improve the convergence of the network to the target manifold.
The result is that the model is able to generate high-quality cartoon images from real-world photos.
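For reference, the two losses described above can be written out as follows. This is a sketch that follows the notation of the CartoonGAN paper as we understand it, so please check the paper for the exact definitions. Here p denotes photos, c cartoon images, e cartoon images whose edges have been smoothed, VGG_l the feature maps of a chosen VGG layer, and ω the weight that balances the two terms.

```latex
\begin{aligned}
\mathcal{L}(G, D) &= \mathcal{L}_{adv}(G, D) + \omega\, \mathcal{L}_{con}(G, D) \\
\mathcal{L}_{con}(G, D) &= \mathbb{E}_{p_i \sim S_{data}(p)}
  \left[ \lVert VGG_l(G(p_i)) - VGG_l(p_i) \rVert_1 \right] \\
\mathcal{L}_{adv}(G, D) &= \mathbb{E}_{c_i \sim S_{data}(c)} [\log D(c_i)]
  + \mathbb{E}_{e_j \sim S_{data}(e)} [\log(1 - D(e_j))]
  + \mathbb{E}_{p_k \sim S_{data}(p)} [\log(1 - D(G(p_k)))]
\end{aligned}
```

The L1 norm in the content loss is the sparse regularization mentioned in (1), and the middle term of the adversarial loss, which teaches the discriminator to reject cartoons with blurred edges, is what makes it edge-promoting.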
In the implementation, to avoid the high cost of collecting datasets for different styles, we decided to download a pre-trained model and the related code from a GitHub project.
For the GUI, we use Tkinter. All APIs for accessing the model or web pages are embedded into the GUI, which looks as shown in the following figure:
We originally planned to design a login page so that the converter could be used only after logging in. To simplify the workflow and make the converter more convenient to use, we finally decided to remove this login page.
Click here to learn more about the login page.
In the second stage, we develop the online resource sharing platform. We use the Chevereto framework to build our web page, which is hosted on a server. Unfortunately, our link is temporary, so we provide a tutorial to help you quickly rebuild this page.
Anyone can upload and download images from the database on our resource sharing platform. We provide a Demo here:
In this stage, we integrate the online and offline parts and optimize the functions of our converter. We also developed our personal web page and online user manual at this stage to help users quickly understand how to use our project.
Here is the Demo:
To evaluate the quality of the output pictures, we use two indicators: PSNR and SSIM.
PSNR stands for "Peak Signal-to-Noise Ratio". It is an objective metric for evaluating images, widely used in engineering to express the ratio between the maximum signal and the background noise, although it has limitations: it does not always agree with perceived visual quality. It is ten times the base-10 logarithm of the ratio of (2^n-1)^2 (the square of the maximum signal value, where n is the number of bits per sample) to the mean square error (MSE) between the original image and the processed image, that is, PSNR = 10 * log10((2^n-1)^2 / MSE). Its unit is dB.
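The PSNR definition can be sketched in a few lines of Python. This is a minimal illustration and not the evaluation code we uploaded: the function name `psnr`, the use of flat sequences of pixel values as input, and the default of 8 bits per sample are assumptions made for the example.

```python
import math


def psnr(original, processed, bits_per_sample=8):
    """Peak signal-to-noise ratio in dB between two pixel sequences.

    `original` and `processed` are flat, equal-length sequences of pixel
    values (an assumption of this sketch, chosen to avoid dependencies).
    """
    if len(original) != len(processed) or len(original) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    # Mean square error between the two images.
    mse = sum((a - b) ** 2 for a, b in zip(original, processed)) / len(original)
    if mse == 0:
        return math.inf  # identical images: no noise at all
    # Square of the maximum signal value, (2^n - 1)^2.
    peak_squared = (2 ** bits_per_sample - 1) ** 2
    return 10 * math.log10(peak_squared / mse)


if __name__ == "__main__":
    print(psnr([10, 20, 30], [11, 21, 31]))
```

A higher value means the processed image is closer to the original.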
SSIM (Structural Similarity) is an indicator that measures the similarity of two images. From the perspective of image composition, the structural similarity index defines structural information as the attributes that reflect the structure of objects in the scene, independent of brightness and contrast, and models distortion as a combination of three factors: brightness, contrast, and structure. The mean is used as an estimate of brightness, the standard deviation as an estimate of contrast, and the covariance as a measure of structural similarity.
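The roles of the mean, standard deviation and covariance can be illustrated with a simplified SSIM that treats the whole image as a single window. This is only a sketch: SSIM is normally computed over local windows and then averaged, and this is not necessarily how our evaluation code computes it. The function name `global_ssim` and the flat-sequence input are assumptions; the stabilizing constants use the commonly cited values K1 = 0.01 and K2 = 0.03.

```python
def global_ssim(x, y, bits_per_sample=8):
    """Single-window SSIM between two flat, equal-length pixel sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("inputs must have equal length of at least 2")
    dynamic_range = 2 ** bits_per_sample - 1
    c1 = (0.01 * dynamic_range) ** 2  # stabilizes the brightness term
    c2 = (0.03 * dynamic_range) ** 2  # stabilizes the contrast/structure term

    # Means estimate brightness.
    mu_x = sum(x) / n
    mu_y = sum(y) / n
    # Variances (squared standard deviations) estimate contrast.
    var_x = sum((a - mu_x) ** 2 for a in x) / (n - 1)
    var_y = sum((b - mu_y) ** 2 for b in y) / (n - 1)
    # Covariance measures how similarly the two images vary (structure).
    cov_xy = sum((a - mu_x) * (b - mu_y) for a, b in zip(x, y)) / (n - 1)

    numerator = (2 * mu_x * mu_y + c1) * (2 * cov_xy + c2)
    denominator = (mu_x ** 2 + mu_y ** 2 + c1) * (var_x + var_y + c2)
    return numerator / denominator


if __name__ == "__main__":
    image = [12, 40, 200, 90]
    print(global_ssim(image, image))
```

Identical images score 1, and the score falls as brightness, contrast or structure diverge.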
We have also uploaded all images and related code for the evaluation. If you are interested, please click here.
Based on the SSIM and PSNR analysis, we conclude that our generated images are of good quality, without noticeable distortion or structural confusion, and that our image processing tool performs well.
We randomly selected 20 images, calculated the two indicators for each, and then averaged them. The results are shown below:
Indicator / Style | Hayao | Hosoda | Paprika | Shinkai |
---|---|---|---|---|
SSIM | 0.649 | 0.692 | 0.632 | 0.743 |
PSNR (dB) | 28.139 | 28.288 | 28.078 | 28.431 |
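The averaging procedure behind the table can be sketched as follows. This is an illustration under stated assumptions, not our uploaded evaluation code: `average_metric`, its parameters, and the representation of the data as (photo, cartoon) pairs are all hypothetical names chosen for the example.

```python
import random
from statistics import mean


def average_metric(pairs, metric, sample_size=20, seed=None):
    """Average `metric(photo, cartoon)` over a random sample of image pairs.

    `pairs` is a list of (photo, cartoon) tuples and `metric` is any
    function that scores one pair, for example a PSNR or SSIM function.
    If fewer than `sample_size` pairs exist, all of them are used.
    """
    rng = random.Random(seed)  # a fixed seed makes the sample reproducible
    chosen = rng.sample(pairs, min(sample_size, len(pairs)))
    return mean(metric(photo, cartoon) for photo, cartoon in chosen)


if __name__ == "__main__":
    # Toy data: each "image" is a single number and the metric is the gap.
    toy_pairs = [(i, i + 2) for i in range(30)]
    print(average_metric(toy_pairs, lambda p, c: c - p, seed=0))
```

Running this once per style and per indicator would produce one cell of the table.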
All planned functions of the project have been completed. The evaluation results show that our image processing tool performs well, and in functional testing every expected function worked as intended, with no cases contrary to the expected results.
In summary, the development process and the subsequent testing phases have been completed. In future work, we plan to use richer data to improve the quality of the generated images and further optimize the performance of this project.
In addition, we may develop a Windows version of this project and make it compatible with more image formats.