diff --git a/docs/evaluation.md b/docs/evaluation.md
index b4bb19721..1c303af3f 100644
--- a/docs/evaluation.md
+++ b/docs/evaluation.md
@@ -18,7 +18,7 @@ Genkit faithfulness and answer relevancy metrics:
 ```ts
 import { genkit } from 'genkit';
 import { genkitEval, GenkitMetric } from '@genkit-ai/evaluator';
-import { vertexAI, textEmbedding004, gemini15Pro } from '@genkit-ai/vertexai';
+import { vertexAI, textEmbedding004, gemini15Flash, gemini15Pro } from '@genkit-ai/vertexai';
 
 const ai = genkit({
   plugins: [
diff --git a/docs/plugin-authoring-evaluator.md b/docs/plugin-authoring-evaluator.md
index b0338c838..d9ec72b26 100644
--- a/docs/plugin-authoring-evaluator.md
+++ b/docs/plugin-authoring-evaluator.md
@@ -90,7 +90,7 @@ export async function deliciousnessScore<
     throw new Error('Output is required for Deliciousness detection');
   }
 
-  //Hydrate the prompt and generate an evaluation result
+  // Hydrate the prompt and generate an evaluation result
   const deliciousnessPrompt = getDeliciousnessPrompt(ai);
   const response = await deliciousnessPrompt(
     {
@@ -152,13 +152,6 @@ export function createDeliciousnessEvaluator<
 }
 ```
 
-
-
 The `defineEvaluator` method is similar to other Genkit constructors like `defineFlow`, `defineRetriever`, etc. The user should provide an `EvaluatorFn` to the `defineEvaluator` callback. The `EvaluatorFn` accepts a `BaseEvalDataPoint`, which corresponds to a single entry in a dataset under evaluation, along with an optional custom options parameter if specified. The function should process the datapoint and return an `EvalResponse` object.
 
 Here are the Zod Schemas for `BaseEvalDataPoint` and `EvalResponse`:
@@ -268,7 +261,7 @@ export function createUSPhoneRegexEvaluator(ai: Genkit): EvaluatorAction {
 }
 ```
 
-## Configuration
+## Putting it together
 
 ### Plugin definition
 
@@ -352,4 +345,4 @@ genkit eval:run deliciousness_dataset.json
 
 Navigate to `localhost:4000/evaluate` to view your results in the Genkit UI.
 
-It is important to note that confidence in custom evaluators will increase as you benchmark them with standard datasets or approaches. Iterate on the results of such benchmarks to improve your evaluators' performance till it reaches the desired quality.
\ No newline at end of file
+Confidence in custom evaluators increases as you benchmark them against standard datasets or approaches. Iterate on the results of such benchmarks to improve your evaluators' performance until it reaches the desired quality.
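For context on the first hunk: `gemini15Flash` is presumably added to the import list so the page can use it as the judge model for the faithfulness and answer relevancy metrics named in the hunk header. A minimal sketch of how those imports plausibly come together, assuming the judge and embedder wiring implied by the page's subject (the plugin config body itself is not part of this diff):

```ts
import { genkit } from 'genkit';
import { genkitEval, GenkitMetric } from '@genkit-ai/evaluator';
import { vertexAI, textEmbedding004, gemini15Flash } from '@genkit-ai/vertexai';

const ai = genkit({
  plugins: [
    vertexAI(),
    genkitEval({
      // Assumption: the newly imported gemini15Flash acts as the judge model.
      judge: gemini15Flash,
      metrics: [GenkitMetric.FAITHFULNESS, GenkitMetric.ANSWER_RELEVANCY],
      // The answer relevancy metric needs an embedder for similarity scoring.
      embedder: textEmbedding004,
    }),
  ],
});
```

`gemini15Pro` remains in the import list, which suggests the page still references it elsewhere; only the Flash model appears in this sketch.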
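The retained paragraph in the second file describes the `defineEvaluator` contract: an `EvaluatorFn` takes one `BaseEvalDataPoint` and returns an `EvalResponse`. A minimal sketch of a custom evaluator consistent with that contract; the `myPlugin/nonEmptyOutput` name and the scoring rule are invented for illustration, and the `ai` instance is assumed to come from `genkit()`:

```ts
import { genkit } from 'genkit';
import { BaseEvalDataPoint } from 'genkit/evaluator';

const ai = genkit({ plugins: [] });

// EvaluatorFn shape: one BaseEvalDataPoint in, one EvalResponse out.
ai.defineEvaluator(
  {
    name: 'myPlugin/nonEmptyOutput', // hypothetical namespace and metric
    displayName: 'Non-empty output',
    definition: 'Checks whether the model produced any output at all',
    isBilled: false, // purely programmatic, no judge LLM calls to bill
  },
  async (datapoint: BaseEvalDataPoint) => {
    const hasOutput =
      datapoint.output !== undefined &&
      String(datapoint.output).trim().length > 0;
    // EvalResponse echoes the testCaseId and carries a Score with reasoning.
    return {
      testCaseId: datapoint.testCaseId,
      evaluation: {
        score: hasOutput ? 1 : 0,
        details: {
          reasoning: hasOutput ? 'Output is present.' : 'Output is missing.',
        },
      },
    };
  }
);
```

An evaluator registered this way can then be exercised with `genkit eval:run` and inspected at `localhost:4000/evaluate`, as described at the end of the diff.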