refactor(web_api): adapt to new pipeline API changes #1412

Open: wants to merge 1 commit into base: master
projects/web_api/Dockerfile: 71 changes (24 additions, 47 deletions)
@@ -1,32 +1,19 @@
 # Use the official Ubuntu base image
-FROM ubuntu:latest
-
-# ENV http_proxy http://127.0.0.1:7890
-# ENV https_proxy http://127.0.0.1:7890
+FROM ubuntu:22.04
 
 # Set environment variables to non-interactive to avoid prompts during installation
 ENV DEBIAN_FRONTEND=noninteractive
 ENV LANG C.UTF-8
 
-# ADD sources.list /etc/apt
-# RUN apt-get clean
-
-
-
 # Update the package list and install necessary packages
-RUN apt-get -q update \
-&& apt-get -q install -y --no-install-recommends \
-apt-utils \
-bats \
-build-essential
-RUN apt-get update && apt-get install -y vim net-tools procps lsof curl wget iputils-ping telnet lrzsz git
-
-RUN apt-get update && \
-apt-get install -y \
-software-properties-common && \
-add-apt-repository ppa:deadsnakes/ppa && \
-apt-get update && \
-apt-get install -y \
+RUN apt-get -q update && \
+apt-get -q install -y --no-install-recommends \
+build-essential \
+software-properties-common \
+# gpg \
+# && add-apt-repository ppa:deadsnakes/ppa \
+&& apt-get update \
+&& apt-get install -y \
 python3.10 \
 python3.10-venv \
 python3.10-distutils \
@@ -35,41 +22,31 @@ RUN apt-get update && \
 git \
 libgl1 \
 libglib2.0-0 \
-&& rm -rf /var/lib/apt/lists/*
-
-# RUN unset http_proxy && unset https_proxy
+&& apt-get clean \
+&& rm -rf /var/lib/apt/lists/*
 
 # Set Python 3.10 as the default python3
 RUN update-alternatives --install /usr/bin/python3 python3 /usr/bin/python3.10 1
 
-# Create a virtual environment for MinerU
-RUN python3 -m venv /opt/mineru_venv
-RUN pip config set global.index-url https://mirrors.aliyun.com/pypi/simple
-# Activate the virtual environment and install necessary Python packages
-RUN /bin/bash -c "source /opt/mineru_venv/bin/activate && \
+# Create a virtual environment for MinerU and install packages
+RUN python3 -m venv /opt/mineru_venv && \
+pip config set global.index-url https://mirrors.aliyun.com/pypi/simple && \
+/bin/bash -c "source /opt/mineru_venv/bin/activate && \
 pip install --upgrade pip && \
-pip install magic-pdf[full] --extra-index-url https://myhloli.github.io/wheels/ --no-cache-dir"
-
-
-RUN /bin/bash -c "source /opt/mineru_venv/bin/activate && \
-pip install fastapi uvicorn python-multipart --no-cache-dir"
-
-RUN /bin/bash -c "source /opt/mineru_venv/bin/activate && \
-pip uninstall paddlepaddle -y"
-
-RUN /bin/bash -c "source /opt/mineru_venv/bin/activate && \
-python -m pip install paddlepaddle-gpu==3.0.0b1 -i https://www.paddlepaddle.org.cn/packages/stable/cu118/ --no-cache-dir"
+pip install magic-pdf[full] --extra-index-url https://myhloli.github.io/wheels/ --no-cache-dir && \
+pip install fastapi uvicorn python-multipart --no-cache-dir && \
+pip uninstall paddlepaddle -y && \
+pip install paddlepaddle-gpu==3.0.0b1 -i https://www.paddlepaddle.org.cn/packages/stable/cu118/ --no-cache-dir"
 
 # Copy the configuration file template and set up the model directory
-COPY magic-pdf.template.json /root/magic-pdf.json
-ADD models /opt/models
-ADD .paddleocr /root/.paddleocr
-ADD app.py /root/app.py
+COPY models/models /opt/models
+COPY layoutreader /opt/layoutreader
+COPY .paddleocr /root/.paddleocr
+COPY app.py /root/app.py
+COPY magic-pdf.json /root/magic-pdf.json
 
 WORKDIR /root
 
-# Set the models directory in the configuration file (adjust the path as needed)
-RUN sed -i 's|/tmp/models|/opt/models|g' /root/magic-pdf.json
 
 # Create the models directory
 # RUN mkdir -p /opt/models
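The updated Dockerfile copies a prepared `magic-pdf.json` into `/root/` (rather than `magic-pdf.template.json`) and places the models under `/opt/models` and `/opt/layoutreader`, so the paths recorded in that JSON file have to match those in-image locations. A minimal pre-build sketch that checks this follows; the key names `models-dir` and `layoutreader-model-dir` are assumed from MinerU's config template of this period and may differ in other versions, and the script itself is hypothetical, not part of the repository.

```python
"""Hypothetical pre-build check for projects/web_api/magic-pdf.json.

ASSUMPTION: the config uses the keys "models-dir" and "layoutreader-model-dir"
(as in MinerU's magic-pdf.template.json); adjust EXPECTED if your version differs.
"""
from __future__ import annotations

import json
import sys
from pathlib import Path

# In-image paths targeted by the COPY instructions in the Dockerfile.
EXPECTED = {
    "models-dir": "/opt/models",
    "layoutreader-model-dir": "/opt/layoutreader",
}


def check_config(config: dict) -> list[str]:
    """Return human-readable problems; an empty list means the paths match."""
    problems = []
    for key, want in EXPECTED.items():
        got = config.get(key)  # None if the key is missing
        if got != want:
            problems.append(f"{key!r} is {got!r}, expected {want!r}")
    return problems


def main(path: str = "magic-pdf.json") -> int:
    """Load the JSON file at `path`, report problems on stderr, return an exit code."""
    config = json.loads(Path(path).read_text(encoding="utf-8"))
    problems = check_config(config)
    for problem in problems:
        print(f"{path}: {problem}", file=sys.stderr)
    return 1 if problems else 0


# Only runs when a path is passed, e.g. `python check_config.py magic-pdf.json`.
if __name__ == "__main__" and len(sys.argv) > 1:
    sys.exit(main(sys.argv[1]))
```

Run it from `projects/web_api` before `docker build` (the file name `check_config.py` is illustrative); a non-zero exit status means at least one path still points somewhere other than where the image puts the models.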
projects/web_api/README.md: 4 changes (4 additions, 0 deletions)
@@ -42,3 +42,7 @@
 
 > Docker Hub address: docker pull quincyqiang/mineru:0.1-models
 
+## How to build:
+
+1. Copy the `hantian/layoutreader`, `opendatalab/PDF-Extract-Kit-1.0`, and `paddleocr` models into the current directory.
+2. `docker build --build-arg http_proxy=http://127.0.0.1:7890 --build-arg https_proxy=http://127.0.0.1:7890 -t mineru-api .`
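Step 1 only works if the copied directories end up at the exact paths the Dockerfile's `COPY` instructions read from: `models/models`, `layoutreader`, and `.paddleocr`, alongside `app.py` and `magic-pdf.json`. A hypothetical pre-build check (a sketch, not part of the repository) can list whichever of these are missing before `docker build` fails on them:

```python
"""Hypothetical pre-build check for the projects/web_api build context.

The required paths mirror the COPY sources in the Dockerfile; this is a
sketch, not a script shipped with MinerU.
"""
from __future__ import annotations

import sys
from pathlib import Path

# (path relative to the build context, must it be a directory?)
# One entry per COPY source in the Dockerfile.
REQUIRED = [
    ("models/models", True),
    ("layoutreader", True),
    (".paddleocr", True),
    ("app.py", False),
    ("magic-pdf.json", False),
]


def missing_paths(context: Path) -> list[str]:
    """Return COPY sources that are absent, or of the wrong kind, under `context`."""
    missing = []
    for rel, is_dir in REQUIRED:
        path = context / rel
        ok = path.is_dir() if is_dir else path.is_file()
        if not ok:
            missing.append(rel)
    return missing


# Only runs when a directory is passed, e.g. `python check_context.py .`
if __name__ == "__main__" and len(sys.argv) > 1:
    gone = missing_paths(Path(sys.argv[1]))
    for rel in gone:
        print(f"missing: {rel}", file=sys.stderr)
    sys.exit(1 if gone else 0)
```

For example, running `python check_context.py .` inside `projects/web_api` (the script name is illustrative) prints each missing path and exits with status 1, or exits quietly with status 0 once step 1 is complete.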