
Updates for vllm 0.6.2 #12338

Merged: 16 commits into main, Nov 12, 2024
Conversation

@gc-fu (Contributor) commented Nov 5, 2024

Description

Updates to use vLLM 0.6.2.

We need to change the following:

  • Initial Dockerfile
  • vLLM-related updates
  • Update benchmark_latency.py
  • Update benchmark_throughput.py
  • Examples in ipex-llm/python/llm/example/GPU/vLLM-Serving
  • vLLM worker
  • Update the final Dockerfile before merge; this is for changing the build branches, check the TODO in the code.
  • Merge "Fix awq and gptq error on vllm 0.6.2" (analytics-zoo/vllm#47)
  • Test image functionality... done by Wang, Jun
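For context, the two benchmark scripts named above are the stock vLLM benchmarks. A typical invocation looks roughly like the sketch below; the model name, prompt count, and lengths are illustrative placeholders, not values taken from this PR.

```shell
# Hypothetical benchmark run against the vLLM backend; all argument
# values are placeholders and depend on your model and hardware.
python benchmark_throughput.py \
    --backend vllm \
    --model meta-llama/Llama-2-7b-hf \
    --num-prompts 100 \
    --input-len 128 \
    --output-len 128
```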

1. Why the change?

2. User API changes

3. Summary of the change

4. How to test?

  • N/A
  • Unit test: Please manually trigger the PR Validation by entering the PR number (e.g., 1234), and paste the action link here once it has finished successfully.
  • Application test
  • Document test
  • ...

5. Known issues

  • Sometimes this fails on initial startup with a timeout error...
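Since the timeout reportedly happens only on initial startup, one generic mitigation (a sketch of common practice, not something from this PR) is to poll for readiness with a deadline before sending the first request, e.g. by probing the server's health endpoint:

```python
import time

def wait_until_ready(check, timeout=300.0, interval=5.0, sleep=time.sleep):
    """Poll check() until it returns True or `timeout` seconds elapse.

    check: zero-argument callable returning True once the server is up,
    e.g. an HTTP GET against a health endpoint. It is injected as a
    parameter here (an assumption for illustration) so the helper is
    testable without a running server.
    Returns True if the server became ready, False on timeout.
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if check():
            return True
        sleep(interval)  # back off between probes
    return False
```

With a long enough timeout, a slow first-time startup no longer surfaces as a hard failure to the client.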

@gc-fu (Contributor, Author) commented Nov 6, 2024

    @@ -17,7 +17,7 @@ In this example, we will run Llama2-7b model using Arc A770 and provide `OpenAI-

     ### 0. Environment

    -To use Intel GPUs for deep-learning tasks, you should install the XPU driver and the oneAPI Base Toolkit 2024.0. Please check the requirements at [here](https://github.com/intel-analytics/ipex-llm/tree/main/python/llm/example/GPU#requirements).
    +To use Intel GPUs for deep-learning tasks, you should install the XPU driver and the oneAPI Base Toolkit 2024.1. Please check the requirements at [here](https://github.com/intel-analytics/ipex-llm/tree/main/python/llm/example/GPU#requirements).
Contributor:

Does the link correspond to oneAPI 2024.1? ipex-llm is mainly using 2024.0 on Arc, isn't it?

Contributor (Author):

    VLLM_BUILD_XPU_OPS=1 pip install --no-build-isolation -v -e .
    pip install outlines==0.0.34 --no-deps
    pip install interegular cloudpickle diskcache joblib lark nest-asyncio numba scipy
    VLLM_TARGET_DEVICE=xpu pip install --no-build-isolation -v . && \
@xiangyuT (Contributor) commented Nov 12, 2024:

remove `&& \`

Contributor (Author):

Fixed
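For reference, the corrected line with the trailing `&& \` removed (as requested in the review above) would read:

```shell
# Build and install vLLM for Intel XPU; no trailing continuation needed
# when this is the last command in the Dockerfile RUN step.
VLLM_TARGET_DEVICE=xpu pip install --no-build-isolation -v .
```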

@gc-fu (Contributor, Author) commented Nov 12, 2024

@xiangyuT (Contributor) left a comment:

lgtm

@gc-fu merged commit 0ee54fc into main on Nov 12, 2024. 1 check passed.