DeepEval is a simple-to-use, open-source LLM evaluation framework for evaluating large-language-model systems. It is similar to Pytest but specialized for unit testing LLM outputs. DeepEval incorporates the latest research to evaluate LLM outputs on metrics such as G-Eval, hallucination, answer relevancy, and RAGAS; these metrics use LLMs and various other NLP models that run locally on your machine.

We customize the models so that additional local LLM services can be used to evaluate metrics such as hallucination and answer relevancy.

# 🚀 QuickStart
## Installation

```
pip install -r ../../../requirements.txt
```

## Launch Service of LLM-as-a-Judge

To set up an LLM, you can use [tgi-gaudi](https://github.com/huggingface/tgi-gaudi) to launch a service. For example, the following command sets up the [mistralai/Mixtral-8x7B-Instruct-v0.1](https://huggingface.co/mistralai/Mixtral-8x7B-Instruct-v0.1) model on 2 Gaudi2 cards:

```
# please set your llm_port and hf_token
docker run -p {your_llm_port}:80 --runtime=habana -e HABANA_VISIBLE_DEVICES=all -e PT_HPU_ENABLE_LAZY_COLLECTIVES=true -e OMPI_MCA_btl_vader_single_copy_mechanism=none -e HF_TOKEN={your_hf_token} -e PREFILL_BATCH_BUCKET_SIZE=1 -e BATCH_BUCKET_SIZE=8 --cap-add=sys_nice --ipc=host ghcr.io/huggingface/tgi-gaudi:2.0.5 --model-id mistralai/Mixtral-8x7B-Instruct-v0.1 --max-input-tokens 2048 --max-total-tokens 4096 --sharded true --num-shard 2 --max-batch-total-tokens 65536 --max-batch-prefill-tokens 2048
```
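Once the container is up, you can sanity-check the endpoint with a raw request to TGI's `/generate` route before wiring it into DeepEval. This is a minimal sketch using only the Python standard library; the helper names (`build_generate_payload`, `query_tgi`) are ours for illustration, and the URL must use the port you mapped via `{your_llm_port}`:

```python
import json
import urllib.request

def build_generate_payload(prompt, max_new_tokens=64):
    # JSON body expected by TGI's /generate route
    return {"inputs": prompt, "parameters": {"max_new_tokens": max_new_tokens}}

def query_tgi(url, prompt, max_new_tokens=64):
    # POST the prompt and return the generated text from the JSON response
    body = json.dumps(build_generate_payload(prompt, max_new_tokens)).encode("utf-8")
    req = urllib.request.Request(url, data=body, headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["generated_text"]

# With the container from above running, e.g.:
#   query_tgi("http://localhost:{your_llm_port}/generate", "What is an LLM-as-a-judge?")
```

If the call returns a completion, the service is healthy; a connection error here usually means the container is still loading model shards.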

## Writing your first test case

```python
import pytest
from deepeval import assert_test
from deepeval.metrics import AnswerRelevancyMetric
from deepeval.test_case import LLMTestCase

def test_case():
    from evals.evaluation.deepeval.models.endpoint_models import TGIEndpointModel
    endpoint = TGIEndpointModel(model="http://localhost:{your_llm_port}/generate")
    answer_relevancy_metric = AnswerRelevancyMetric(threshold=0.5, model=endpoint)
    test_case = LLMTestCase(
        input="What if these shoes don't fit?",
        # Replace this with the actual output from your LLM application
        actual_output="We offer a 30-day full refund at no extra costs.",
        retrieval_context=["All customers are eligible for a 30 day full refund at no extra costs."]
    )
    assert_test(test_case, [answer_relevancy_metric])
```

## Acknowledgements

The evaluation inherits from the [deepeval](https://github.com/confident-ai/deepeval) repo. Thanks to the founders of Confident AI.