- We propose a One-shot Diffusion Mimicker (One-DM) for stylized handwritten text generation, which only requires a single reference sample as style input, and imitates its writing style to generate handwritten text with arbitrary content.
- Previous state-of-the-art methods struggle to accurately extract a user's handwriting style from a single sample due to their limited ability to learn styles. To address this issue, we introduce the high-frequency components of the reference sample to enhance the extraction of handwriting style. The proposed style-enhanced module can effectively capture the writing style patterns and suppress the interference of background noise.
- Extensive experiments on handwriting datasets in English, Chinese, and Japanese demonstrate that our approach, using a single style reference, outperforms even previous methods that use 15x more references.
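The high-frequency idea in the second point can be illustrated with a small sketch. It assumes a 3x3 discrete Laplacian as the high-pass operator, which is one common choice and is not stated here to be the exact operator used in One-DM. The function name `high_frequency` is hypothetical and is not part of the released code.

```python
import numpy as np

# 3x3 discrete Laplacian kernel: a simple high-pass filter (an assumed choice).
LAPLACIAN = np.array([[0, 1, 0],
                      [1, -4, 1],
                      [0, 1, 0]], dtype=np.float32)


def high_frequency(image: np.ndarray) -> np.ndarray:
    """Return the high-frequency map of a 2D grayscale image.

    The image is edge-padded so the output has the same shape as the input.
    """
    image = image.astype(np.float32)
    padded = np.pad(image, 1, mode="edge")
    h, w = image.shape
    out = np.zeros_like(image)
    # Accumulate the weighted, shifted copies of the image (cross-correlation;
    # identical to convolution here because the kernel is symmetric).
    for dy in range(3):
        for dx in range(3):
            out += LAPLACIAN[dy, dx] * padded[dy:dy + h, dx:dx + w]
    return out


if __name__ == "__main__":
    # Flat background with a single vertical "stroke" in column 3.
    sample = np.full((5, 7), 255.0)
    sample[:, 3] = 0.0
    hf = high_frequency(sample)
    # Flat regions map to 0; only pixels on or next to the stroke respond.
    print(hf[2])
```

The flat background is suppressed while stroke edges are kept, which is the intuition behind using high-frequency components to emphasize writing style over background noise.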
Overview of the proposed One-DM
- [2024/10/24] We have provided a well-trained One-DM checkpoint on Google Drive and Baidu Drive :)
- [2024/09/16] This work is reported by Synced (机器之心).
- [2024/09/07]🔥🔥🔥 We open-source the first version of One-DM, which can generate handwritten words. (Later versions supporting Chinese and Japanese will be released soon.)
conda create -n One-DM python=3.8 -y
conda activate One-DM
# install all dependencies
conda env create -f environment.yml
We provide English datasets on Google Drive | Baidu Netdisk | ShiZhi AI. Please download these datasets, unzip them, and move the extracted files to /data.
Model | Google Drive | Baidu Netdisk | ShiZhi AI |
---|---|---|---|
Pretrained One-DM | Google Drive | Baidu Netdisk | ShiZhi AI |
Pretrained OCR model | Google Drive | Baidu Netdisk | ShiZhi AI |
Pretrained Resnet18 | Google Drive | Baidu Netdisk | ShiZhi AI |
Note: Please download these weights, and move them to /model_zoo. (If you cannot access the pre-trained VAE model available on Hugging Face, please refer to the pinned issue for guidance.)
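Before launching training, it can help to confirm that the weights are in place. The sketch below is a hypothetical helper, not part of the repository. It checks only the two file names that appear in the training commands in this README, and it assumes `model_zoo` sits in the directory you run it from.

```python
from pathlib import Path
from typing import Iterable, List

# File names taken from the training commands in this README; the One-DM
# checkpoint is omitted because its file name is not stated here.
EXPECTED_WEIGHTS = ("RN18_class_10400.pth", "vae_HTR138.pth")


def missing_weights(model_zoo: str = "model_zoo",
                    names: Iterable[str] = EXPECTED_WEIGHTS) -> List[str]:
    """Return the expected weight files that are not present in model_zoo."""
    root = Path(model_zoo)
    return [name for name in names if not (root / name).is_file()]


if __name__ == "__main__":
    missing = missing_weights()
    if missing:
        print("Missing weights:", ", ".join(missing))
    else:
        print("All expected weights found.")
```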
- training on English dataset
CUDA_VISIBLE_DEVICES=0,1,2,3 torchrun --nproc_per_node=4 train.py \
--feat_model model_zoo/RN18_class_10400.pth \
--log English
- finetune on English dataset
CUDA_VISIBLE_DEVICES=0,1,2,3 torchrun --nproc_per_node=4 train_finetune.py \
--one_dm ./Saved/IAM64_scratch/English-timestamp/model/epoch-ckpt.pt \
--ocr_model ./model_zoo/vae_HTR138.pth --log English
Note: Please modify timestamp and epoch according to your own path.
- test on English dataset
CUDA_VISIBLE_DEVICES=0,1,2,3 torchrun --nproc_per_node=4 test.py \
--one_dm ./Saved/IAM64_finetune/English-timestamp/model/epoch-ckpt.pt \
--generate_type oov_u --dir ./Generated/English
Note: Please modify timestamp and epoch according to your own path.
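To find the actual values for timestamp and epoch, you can list the checkpoints saved on disk. The sketch below is a hypothetical helper, not part of the repository. It assumes the layout implied by the example paths above (`<stage_dir>/<log>-<timestamp>/model/<epoch>-ckpt.pt`) and that the epoch prefix is usually an integer.

```python
from pathlib import Path
from typing import List


def list_checkpoints(stage_dir: str, log_name: str = "English") -> List[Path]:
    """List checkpoints matching <stage_dir>/<log_name>-*/model/*-ckpt.pt.

    Results are ordered by run directory name, then by epoch (numerically when
    the epoch prefix is an integer, which is an assumption).
    """
    def sort_key(path: Path):
        epoch = path.name[: -len("-ckpt.pt")]
        number = int(epoch) if epoch.isdigit() else -1
        return (path.parent.parent.name, number, epoch)

    pattern = f"{log_name}-*/model/*-ckpt.pt"
    return sorted(Path(stage_dir).glob(pattern), key=sort_key)


if __name__ == "__main__":
    for ckpt in list_checkpoints("./Saved/IAM64_finetune"):
        print(ckpt)  # pass one of these paths to --one_dm
```

Use `./Saved/IAM64_scratch` as `stage_dir` when picking a checkpoint for finetuning, and `./Saved/IAM64_finetune` when picking one for testing.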
- Comparisons with industrial image generation methods on handwritten text generation
- Comparisons with industrial image generation methods on Chinese handwriting generation
- English handwritten text generation
- Chinese and Japanese handwriting generation
If you find our work inspiring or use our codebase in your research, please cite our work:
@inproceedings{one-dm2024,
title={One-Shot Diffusion Mimicker for Handwritten Text Generation},
author={Dai, Gang and Zhang, Yifan and Ke, Quhui and Guo, Qiangya and Huang, Shuangping},
booktitle={European Conference on Computer Vision},
year={2024}
}