Using PDF-Extract-Kit on Windows

Overview

This project was originally developed for Linux servers, so running it directly on a Windows machine can be challenging. This guide collects the problems we have run into on Windows, together with the fixes we found. Because Windows environments vary widely, not every solution here will necessarily apply to your setup. If you have questions, please open an issue.

Preparation

To run the project smoothly on Windows, perform the following preparations:

Using a CPU Environment

1. Create a Virtual Environment

Use either venv or conda. Python 3.10 is recommended; the precompiled detectron2 wheel used in the next step is built for CPython 3.10 (cp310).
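As a quick sanity check before installing anything, the sketch below reports whether the active interpreter is Python 3.10 and whether an environment appears to be active. The helper name `check_interpreter` is hypothetical and not part of PDF-Extract-Kit. It assumes that a venv can be detected by `sys.prefix` differing from `sys.base_prefix`, and that an activated conda environment sets the `CONDA_PREFIX` variable.

```python
import os
import sys

RECOMMENDED = (3, 10)  # Python version recommended by this guide


def check_interpreter(version=None, prefix=None, base_prefix=None, conda_prefix=None):
    """Return a list of warnings; an empty list means nothing looked wrong.

    All arguments default to the values of the running interpreter and
    exist only so the function can be exercised with fake inputs.
    """
    version = tuple(version if version is not None else sys.version_info[:2])
    prefix = prefix if prefix is not None else sys.prefix
    base_prefix = base_prefix if base_prefix is not None else sys.base_prefix
    if conda_prefix is None:
        conda_prefix = os.environ.get("CONDA_PREFIX", "")

    warnings = []
    if version[:2] != RECOMMENDED:
        found = ".".join(str(part) for part in version[:2])
        warnings.append(f"Python {found} is active; 3.10 is recommended.")

    in_venv = prefix != base_prefix   # true inside a venv
    in_conda = bool(conda_prefix)     # assumption: set by `conda activate`
    if not (in_venv or in_conda):
        warnings.append("No active venv or conda environment detected.")
    return warnings


if __name__ == "__main__":
    for line in check_interpreter() or ["Interpreter looks fine."]:
        print(line)
```

Run it with the interpreter you intend to use for the project; any warning it prints is a hint, not a hard failure.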

2. Install Dependencies

pip install -r requirements+cpu.txt

# For detectron2, compile it yourself as per https://github.com/facebookresearch/detectron2/issues/5114
# Or use our precompiled wheel
pip install https://github.com/opendatalab/PDF-Extract-Kit/raw/main/assets/whl/detectron2-0.6-cp310-cp310-win_amd64.whl
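The precompiled wheel only installs on an interpreter whose tags match its filename. The sketch below, which uses hypothetical helper names, splits a wheel filename into its trailing python, ABI, and platform tags and compares them with the running interpreter. It assumes CPython (for the `cp` prefix) and that `sysconfig.get_platform()` maps onto the wheel platform tag by replacing `-` and `.` with `_`, which holds for `win-amd64`.

```python
import sys
import sysconfig

WHEEL = "detectron2-0.6-cp310-cp310-win_amd64.whl"  # filename from this guide


def parse_wheel_tags(filename):
    """Return (python_tag, abi_tag, platform_tag) from a wheel filename or URL."""
    stem = filename.rsplit("/", 1)[-1]
    if not stem.endswith(".whl"):
        raise ValueError(f"not a wheel filename: {filename!r}")
    parts = stem[: -len(".whl")].split("-")
    if len(parts) < 5:
        raise ValueError(f"unexpected wheel filename: {filename!r}")
    # The last three dash-separated fields are always the compatibility tags.
    return tuple(parts[-3:])


def wheel_fits(filename, python_tag, platform_tag):
    """True if the wheel's python and platform tags equal the given ones."""
    wheel_python, _abi, wheel_platform = parse_wheel_tags(filename)
    return wheel_python == python_tag and wheel_platform == platform_tag


def current_tags():
    """Tags of the running interpreter (assumes CPython)."""
    python_tag = f"cp{sys.version_info.major}{sys.version_info.minor}"
    platform_tag = sysconfig.get_platform().replace("-", "_").replace(".", "_")
    return python_tag, platform_tag


if __name__ == "__main__":
    print("wheel fits this interpreter:", wheel_fits(WHEEL, *current_tags()))
```

If the check prints `False`, building detectron2 yourself (see the issue linked above) is the remaining option.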

3. Modify Configurations for CPU Inference

PDF-Extract-Kit/configs/model_configs.yaml:2

device: cpu

PDF-Extract-Kit/modules/layoutlmv3/layoutlmv3_base_inference.yaml:72

DEVICE: cpu
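If you switch between CPU and GPU often, the two edits above can be scripted. The following is a minimal sketch with hypothetical helper names, assuming each file holds its setting on a single `key: value` line as shown above. It rewrites every line whose key matches (case-sensitively), so check the result if a file contains more than one such key.

```python
import re
from pathlib import Path

# (file, key) pairs taken from this guide; paths are relative to the repository root.
TARGETS = [
    ("configs/model_configs.yaml", "device"),
    ("modules/layoutlmv3/layoutlmv3_base_inference.yaml", "DEVICE"),
]


def set_device(yaml_text, key, value):
    """Replace the value on each `key: ...` line, keeping indentation and comments."""
    pattern = re.compile(
        rf"^([ \t]*{re.escape(key)}[ \t]*:[ \t]*)([^#\r\n]*?)([ \t]*(?:#[^\r\n]*)?\r?)$",
        re.MULTILINE,
    )
    new_text, count = pattern.subn(
        lambda m: f"{m.group(1)}{value}{m.group(3)}", yaml_text
    )
    if count == 0:
        raise KeyError(f"no '{key}:' line found")
    return new_text


def patch_repo(repo_root, value):
    """Apply set_device to both config files under repo_root."""
    for relative_path, key in TARGETS:
        path = Path(repo_root) / relative_path
        text = path.read_text(encoding="utf-8")
        path.write_text(set_device(text, key, value), encoding="utf-8")


if __name__ == "__main__":
    patch_repo(".", "cpu")  # use "cuda" for GPU inference
```

A plain text substitution is used instead of a YAML round trip so that comments and formatting in the config files are left untouched.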

4. Run the Application

python pdf_extract.py --pdf demo/demo1.pdf

Using a GPU Environment

1. Verify CUDA and GPU Memory
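One way to inspect the GPU before installing anything is to query `nvidia-smi`, which ships with the NVIDIA driver. The sketch below is an illustration with hypothetical helper names: it reports the GPU name, total memory, and driver version, and returns an empty list when `nvidia-smi` is not on `PATH`. It does not check the CUDA toolkit itself.

```python
import shutil
import subprocess

QUERY = [
    "nvidia-smi",
    "--query-gpu=name,memory.total,driver_version",
    "--format=csv,noheader,nounits",
]


def parse_gpu_lines(output):
    """Turn nvidia-smi CSV output into a list of dicts, one per GPU."""
    gpus = []
    for line in output.splitlines():
        if not line.strip():
            continue
        # Split from the right so a comma inside the GPU name cannot break parsing.
        name, memory, driver = [field.strip() for field in line.rsplit(",", 2)]
        try:
            memory_mib = int(memory)
        except ValueError:
            memory_mib = None  # some devices report a non-numeric value
        gpus.append({"name": name, "memory_mib": memory_mib, "driver": driver})
    return gpus


def query_gpus():
    """Run nvidia-smi if it is available; return [] otherwise."""
    if shutil.which("nvidia-smi") is None:
        return []
    result = subprocess.run(QUERY, capture_output=True, text=True, check=False)
    return parse_gpu_lines(result.stdout) if result.returncode == 0 else []


if __name__ == "__main__":
    for gpu in query_gpus() or [{"name": "no NVIDIA GPU detected"}]:
        print(gpu)
```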

2. Create a Virtual Environment

Use either venv or conda. Python 3.10 is recommended; the precompiled detectron2 wheel used in the next step is built for CPython 3.10 (cp310).

3. Install Dependencies

pip install -r requirements+cpu.txt

# For detectron2, compile it yourself as per https://github.com/facebookresearch/detectron2/issues/5114
# Or use our precompiled wheel
pip install https://github.com/opendatalab/PDF-Extract-Kit/raw/main/assets/whl/detectron2-0.6-cp310-cp310-win_amd64.whl

# For GPU usage, ensure PyTorch is installed with CUDA support.
pip install --force-reinstall torch==2.3.1 torchvision==0.18.1 --index-url https://download.pytorch.org/whl/cu118
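After reinstalling PyTorch, it is worth confirming that the CUDA build is the one actually in use. The sketch below relies only on standard PyTorch calls (`torch.version.cuda`, `torch.cuda.is_available()`, `torch.cuda.get_device_properties()`); the function name `cuda_report` is hypothetical.

```python
def cuda_report():
    """Summarize the installed PyTorch build and the first visible GPU."""
    try:
        import torch
    except (ImportError, OSError):
        # OSError covers DLL loading failures, which can occur on Windows.
        return {"torch_installed": False}

    report = {
        "torch_installed": True,
        "torch_version": torch.__version__,
        "built_with_cuda": torch.version.cuda,  # None for CPU-only builds
        "cuda_available": torch.cuda.is_available(),
    }
    if report["cuda_available"]:
        props = torch.cuda.get_device_properties(0)
        report["device_name"] = props.name
        report["total_memory_gib"] = round(props.total_memory / 2**30, 1)
    return report


if __name__ == "__main__":
    for key, value in cuda_report().items():
        print(f"{key}: {value}")
```

If `built_with_cuda` is `None`, the CPU-only wheel is still installed and the `--force-reinstall` command above did not take effect in this environment.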

4. Modify Configurations for CUDA Inference

PDF-Extract-Kit/configs/model_configs.yaml:2

device: cuda

PDF-Extract-Kit/modules/layoutlmv3/layoutlmv3_base_inference.yaml:72

DEVICE: cuda
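Both files have to name the same device, and it is easy to update only one of them. A small sketch, using hypothetical helper names and assuming each setting sits on a single `key: value` line as shown above, reads the two values back and compares them:

```python
import re
from pathlib import Path


def read_device(yaml_text, key):
    """Return the value of the first `key: value` line, or None if absent."""
    match = re.search(
        rf"^[ \t]*{re.escape(key)}[ \t]*:[ \t]*([^#\r\n]*)", yaml_text, re.MULTILINE
    )
    if match is None:
        return None
    return match.group(1).strip().strip("'\"")


def devices_agree(model_cfg_text, layout_cfg_text):
    """True if both configs set a device and the two values are equal."""
    first = read_device(model_cfg_text, "device")
    second = read_device(layout_cfg_text, "DEVICE")
    return first is not None and first == second


if __name__ == "__main__":
    # Paths from this guide, relative to the repository root.
    model_cfg = Path("configs/model_configs.yaml").read_text(encoding="utf-8")
    layout_cfg = Path(
        "modules/layoutlmv3/layoutlmv3_base_inference.yaml"
    ).read_text(encoding="utf-8")
    print("configs agree:", devices_agree(model_cfg, layout_cfg))
```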

5. Run the Application

python pdf_extract.py --pdf demo/demo1.pdf
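To process several PDFs in a row, the command above can be wrapped in a short script. This is a sketch under stated assumptions: `build_command` and `run_all` are hypothetical helpers, the script is run from the repository root, and only the `--pdf` option shown in this guide is used. Calling `sys.executable` keeps the run inside the same virtual environment as the script itself.

```python
import subprocess
import sys
from pathlib import Path


def build_command(pdf_path, script="pdf_extract.py"):
    """Build the argument list for one run of the extraction script."""
    return [sys.executable, script, "--pdf", str(pdf_path)]


def run_all(pdf_dir):
    """Run the script once per PDF in pdf_dir; return {filename: exit code}."""
    results = {}
    for pdf in sorted(Path(pdf_dir).glob("*.pdf")):
        completed = subprocess.run(build_command(pdf), check=False)
        results[pdf.name] = completed.returncode
    return results


if __name__ == "__main__":
    for name, code in run_all("demo").items():
        status = "ok" if code == 0 else f"failed (exit code {code})"
        print(f"{name}: {status}")
```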