
docs(evals): Updated docs for evals #1512

Open · wants to merge 6 commits into main
Conversation

@ssbushi (Contributor) commented Dec 13, 2024

Checklist (if applicable):

@ssbushi marked this pull request as ready for review on December 13, 2024 at 04:25
docs/evaluation.md (outdated, resolved)

You can see the details of your evaluation run on this page, including the original input, extracted context, and metrics (if any).

<!-- TODO, more convincing conclusion here? -->
ssbushi (Contributor Author):

@mjchristy thoughts?

You can also provide custom extractors to be used by the `eval:extractData` and
`eval:flow` commands. Custom extractors let you override the default
extraction logic, giving you more control over creating datasets and evaluating them.
<!-- TODO: Any caveats on where this approach does not work (ES5 or something?) -->
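For illustration only (not part of this PR's diff), here is a rough sketch of what such a custom extractor configuration might look like in a `genkit-tools.conf.js` file. The flow name, step names, and exact schema below are assumptions made for this example:

```js
// genkit-tools.conf.js — hypothetical sketch; the flow name, step names, and
// extractor schema here are assumptions for illustration, not taken from this PR.
module.exports = {
  evaluators: [
    {
      // The flow whose evaluation data these extractors should apply to.
      actionRef: '/flow/myFlow',
      extractors: {
        // Use the output of a specific trace step as the evaluation `context`.
        context: { outputOf: 'retrieve-documents' },
        // Use a named step's output directly as the evaluation `output`.
        output: 'generate-answer',
      },
    },
  ],
};
```

With something like this in place, `eval:extractData` and `eval:flow` would pick up the overridden fields instead of falling back to the default extraction logic.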
ssbushi (Contributor Author):

@pavelgj thoughts?

@mjchristy (Contributor) left a comment:

Overall, this LGTM. I wonder if we need to give them a RAG flow to evaluate, esp. given that we chose 'maliciousness'.

// ...
});
```
1. **Inference-based evaluation**: In this type of evaluation, a system is run on a collection of pre-determined inputs and the corresponding outputs are assessed for quality.
Contributor:

"A system is run on" sounds strange to me. Maybe we just say, "This type of evaluation is run against a collection of..." or similar

Contributor:

s/1./*

Numbered lists are really more for sequences (i.e., procedures, etc.).

```posix-terminal
npm install @genkit-ai/evaluator @genkit-ai/vertexai
```
2. **Raw evaluation**: This type of evaluation directly assesses the quality of inputs without any inference. This approach is typically used with automated evaluation using metrics. All required fields for evaluation (`context`, `output`) must be present in the input dataset. This is useful when you have data coming from an external source (e.g., collected from your production traces) and you simply want an objective measurement of the quality of the collected data.
Contributor:

Do you mean quality of "outputs" here?

ssbushi (Contributor Author):

Hmm, no? Because the inputs contain the outputs ;)

I can see how this can be confusing though.
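To make the distinction between the two quoted evaluation modes concrete, here is a minimal sketch of a raw-evaluation dataset, where `output` and `context` are already present so no inference is needed. The file names, dataset shape, and exact CLI invocations below are assumptions for illustration, not taken from this PR:

```json
[
  {
    "input": "What is the capital of France?",
    "output": "Paris is the capital of France.",
    "context": ["France is a country in Western Europe. Its capital is Paris."]
  }
]
```

```posix-terminal
# Raw evaluation: score the already-collected data directly, with no inference.
genkit eval:run collected_dataset.json

# Inference-based evaluation: run the flow on the inputs first, then score the outputs.
genkit eval:flow myFlow --input inputs.json
```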

@ssbushi (Contributor Author) commented Dec 18, 2024

> Overall, this LGTM. I wonder if we need to give them a RAG flow to evaluate, esp. given that we chose 'maliciousness'.

Maliciousness is not RAG specific.
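As background on the metric under discussion, here is a rough sketch of how the evaluator plugin installed above might be configured with a maliciousness metric. The import names, judge model, and option names are assumptions based on typical Genkit usage, not details taken from this PR:

```ts
// Hypothetical configuration sketch; names and options are assumptions.
import { genkit } from 'genkit';
import { genkitEval, GenkitMetric } from '@genkit-ai/evaluator';
import { vertexAI, gemini15Pro } from '@genkit-ai/vertexai';

const ai = genkit({
  plugins: [
    vertexAI(),
    // Register the maliciousness evaluator, using a Vertex AI model as the judge.
    genkitEval({
      judge: gemini15Pro,
      metrics: [GenkitMetric.MALICIOUSNESS],
    }),
  ],
});
```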

2. **Raw evaluation**: This type of evaluation directly assesses the quality of inputs without any inference. This approach is typically used with automated evaluation using metrics. All required fields for evaluation (`context`, `output`) must be present in the input dataset. This is useful when you have data coming from an external source (e.g., collected from your production traces) and you simply want an objective measurement of the quality of the collected data.
Contributor:
s/2./*
(see comment on L17)
