fixes;
ssbushi committed Dec 13, 2024
1 parent d26761f commit d2bebb8
Showing 2 changed files with 4 additions and 11 deletions.
2 changes: 1 addition & 1 deletion docs/evaluation.md
@@ -18,7 +18,7 @@ Genkit faithfulness and answer relevancy metrics:
```ts
import { genkit } from 'genkit';
import { genkitEval, GenkitMetric } from '@genkit-ai/evaluator';
-import { vertexAI, textEmbedding004, gemini15Pro } from '@genkit-ai/vertexai';
+import { vertexAI, textEmbedding004, gemini15Flash, gemini15Pro } from '@genkit-ai/vertexai';

const ai = genkit({
plugins: [
13 changes: 3 additions & 10 deletions docs/plugin-authoring-evaluator.md
@@ -90,7 +90,7 @@ export async function deliciousnessScore<
throw new Error('Output is required for Deliciousness detection');
}

-//Hydrate the prompt and generate an evaluation result
+// Hydrate the prompt and generate an evaluation result
const deliciousnessPrompt = getDeliciousnessPrompt(ai);
const response = await deliciousnessPrompt(
{
@@ -152,13 +152,6 @@ export function createDeliciousnessEvaluator<
}
```

-<!-- TODO: Test out the deliciousness evaluator
-export const deliciousness = createDeliciousnessEvaluator(ai, gemini15Pro);
--->
-
The `defineEvaluator` method is similar to other Genkit constructors like `defineFlow` and `defineRetriever`. The user provides an `EvaluatorFn` as the callback to `defineEvaluator`. The `EvaluatorFn` accepts a `BaseEvalDataPoint`, which corresponds to a single entry in a dataset under evaluation, along with an optional custom options parameter if specified. The function should process the datapoint and return an `EvalResponse` object.

Here are the Zod schemas for `BaseEvalDataPoint` and `EvalResponse`:
@@ -268,7 +261,7 @@ export function createUSPhoneRegexEvaluator(ai: Genkit): EvaluatorAction {
}
```

-## Configuration
+## Putting it together

### Plugin definition

@@ -352,4 +345,4 @@ genkit eval:run deliciousness_dataset.json

Navigate to `localhost:4000/evaluate` to view your results in the Genkit UI.

-It is important to note that confidence in custom evaluators will increase as you benchmark them with standard datasets or approaches. Iterate on the results of such benchmarks to improve your evaluators' performance till it reaches the desired quality.
+It is important to note that confidence in custom evaluators will increase as you benchmark them with standard datasets or approaches. Iterate on the results of such benchmarks to improve your evaluators' performance till it reaches the desired quality.
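The `defineEvaluator` paragraph in the second hunk describes the `EvaluatorFn` contract: it receives one `BaseEvalDataPoint` and resolves to an `EvalResponse`. The sketch below models that contract without depending on Genkit, so it runs on its own. The interfaces `EvalDataPoint`, `Score`, and `EvalResult` are simplified, hypothetical stand-ins for Genkit's Zod-derived types (the field names follow the Genkit evaluator docs as best understood; check the real schemas before relying on them), and `usPhoneRegexEvaluator` is a hypothetical evaluator in the spirit of the `createUSPhoneRegexEvaluator` example named in the third hunk header.

```typescript
// Simplified, hypothetical stand-ins for Genkit's BaseEvalDataPoint and
// EvalResponse. The real types come from Zod schemas and carry more fields;
// only the ones used in this sketch are modeled.
interface EvalDataPoint {
  testCaseId: string;
  input: unknown;
  output?: unknown;
  context?: unknown[];
  reference?: unknown;
}

interface Score {
  score?: number | string | boolean;
  details?: { reasoning?: string };
}

interface EvalResult {
  testCaseId: string;
  evaluation: Score;
}

// Shape of the callback handed to defineEvaluator (simplified: the real
// EvaluatorFn can also receive a custom options argument).
type EvaluatorFn = (datapoint: EvalDataPoint) => Promise<EvalResult>;

// Hypothetical rule: a US-style phone number such as 555-123-4567.
// No `g` flag, so RegExp.test() keeps no state between calls.
const US_PHONE = /\b\d{3}-\d{3}-\d{4}\b/;

const usPhoneRegexEvaluator: EvaluatorFn = async (datapoint) => {
  // Mirror the docs' pattern of rejecting datapoints without usable output.
  if (typeof datapoint.output !== 'string') {
    throw new Error('String output is required for phone number detection');
  }
  const matched = US_PHONE.test(datapoint.output);
  // Echo the testCaseId so results can be joined back to the dataset.
  return {
    testCaseId: datapoint.testCaseId,
    evaluation: {
      score: matched,
      details: {
        reasoning: matched
          ? 'Output contains a US phone number'
          : 'No US phone number found in output',
      },
    },
  };
};

// Usage: score two datapoints and print each verdict.
async function main(): Promise<void> {
  const results = await Promise.all([
    usPhoneRegexEvaluator({ testCaseId: 'a', input: 'q', output: 'Call 555-123-4567' }),
    usPhoneRegexEvaluator({ testCaseId: 'b', input: 'q', output: 'No number here' }),
  ]);
  for (const r of results) {
    console.log(r.testCaseId, r.evaluation.score);
  }
}

void main();
```

Keeping the scoring logic in a plain async function like this makes it testable in isolation; wiring it into Genkit would then mean handing it to `defineEvaluator` as the callback described in the second hunk.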

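The closing paragraph recommends benchmarking custom evaluators against standard datasets. One common way to do that, assumed here rather than prescribed by the Genkit docs, is to run the evaluator over cases that already carry a human pass/fail label and measure how often the two agree. `LabeledVerdict`, `AgreementReport`, and `compareToReference` are hypothetical names used only for illustration.

```typescript
// Hypothetical benchmark row: the evaluator's verdict next to a reference label.
interface LabeledVerdict {
  testCaseId: string;
  evaluatorPass: boolean; // what the custom evaluator decided
  humanPass: boolean; // reference judgment from a benchmark dataset
}

interface AgreementReport {
  agreement: number; // fraction of cases where evaluator and reference agree
  falsePositives: string[]; // evaluator passed, reference failed
  falseNegatives: string[]; // evaluator failed, reference passed
}

function compareToReference(verdicts: LabeledVerdict[]): AgreementReport {
  if (verdicts.length === 0) {
    throw new Error('At least one labeled verdict is required');
  }
  let agree = 0;
  const falsePositives: string[] = [];
  const falseNegatives: string[] = [];
  for (const v of verdicts) {
    if (v.evaluatorPass === v.humanPass) {
      agree += 1;
    } else if (v.evaluatorPass) {
      falsePositives.push(v.testCaseId);
    } else {
      falseNegatives.push(v.testCaseId);
    }
  }
  return { agreement: agree / verdicts.length, falsePositives, falseNegatives };
}

// Usage: four made-up rows, two agreements and one disagreement of each kind.
const report = compareToReference([
  { testCaseId: 'a', evaluatorPass: true, humanPass: true },
  { testCaseId: 'b', evaluatorPass: true, humanPass: false },
  { testCaseId: 'c', evaluatorPass: false, humanPass: false },
  { testCaseId: 'd', evaluatorPass: false, humanPass: true },
]);
console.log(report);
```

Listing disagreements by direction (false positives vs. false negatives) shows whether the evaluator's prompt or rule is too lenient or too strict, which is the kind of signal the closing paragraph suggests iterating on.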