Prompt reorg for caching + usage collection #743

Merged
merged 9 commits on Oct 1, 2024
5 changes: 5 additions & 0 deletions docs/genaisrc/genaiscript.d.ts

Some generated files are not rendered by default.

11 changes: 10 additions & 1 deletion docs/src/content/docs/reference/scripts/context.md
@@ -106,7 +106,7 @@
### Referencing

The `def` function returns a variable name that can be used in the prompt.
The name might be formatted diferently to accommodate the model's preference.
The name might be formatted differently to accommodate the model's preference.

```js "const f = "
const f = def("FILE", file)
@@ -182,6 +182,15 @@
def("FILE", env.files, { sliceSample: 100 })
```

### Prompt Caching

You can specify `ephemeral: true` to turn on prompt caching optimizations. In particular, a `def` with `ephemeral` will be rendered at the end of the prompt
to persist the [cache prefix](https://openai.com/index/api-prompt-caching/). Keeping volatile content at the end of the prompt leaves the stable beginning unchanged across calls, which is what allows the provider to reuse its cached prefix.

```js
def("FILE", env.files, { ephemeral: true })
```


## Data definition (`defData`)

The `defData` function offers additional formatting options for converting a data object into a textual representation. It supports rendering objects as YAML, JSON, or CSV (formatted as a markdown table).
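
For illustration, a minimal sketch of how `defData` might be called; the `format` option name and its values are assumptions based on the description above, not confirmed by this diff:

```js
// hypothetical sketch: option names are assumed for illustration only
defData(
    "ROWS",
    [
        { name: "a", value: 1 },
        { name: "b", value: 2 },
    ],
    { format: "csv" } // rendered as a markdown table in the prompt
)
```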
5 changes: 5 additions & 0 deletions genaisrc/genaiscript.d.ts

Some generated files are not rendered by default.

5 changes: 5 additions & 0 deletions packages/auto/genaiscript.d.ts

Some generated files are not rendered by default.

87 changes: 59 additions & 28 deletions packages/core/src/chat.ts
@@ -28,6 +28,8 @@ import {
ChatCompletionResponse,
ChatCompletionsOptions,
ChatCompletionTool,
ChatCompletionUsage,
ChatCompletionUsages,
ChatCompletionUserMessageParam,
CreateChatCompletionRequest,
} from "./chattypes"
@@ -364,6 +366,7 @@ function structurifyChatSession(
schemas: Record<string, JSONSchema>,
genVars: Record<string, string>,
options: GenerationOptions,
usages: ChatCompletionUsages,
others?: {
resp?: ChatCompletionResponse
err?: any
@@ -426,17 +429,20 @@
error,
genVars,
schemas,
usages,
}
}

async function processChatMessage(
req: CreateChatCompletionRequest,
resp: ChatCompletionResponse,
messages: ChatCompletionMessageParam[],
tools: ToolCallback[],
chatParticipants: ChatParticipant[],
schemas: Record<string, JSONSchema>,
genVars: Record<string, string>,
options: GenerationOptions
options: GenerationOptions,
usages: ChatCompletionUsages
): Promise<RunPromptResult> {
const {
stats,
@@ -445,6 +451,8 @@ async function processChatMessage(
cancellationToken,
} = options

accumulateChatUsage(usages, req.model, resp.usage)

if (resp.text)
messages.push({
role: "assistant",
@@ -534,11 +542,29 @@ export function mergeGenerationOptions(
}
}

function accumulateChatUsage(
usages: ChatCompletionUsages,
model: string,
usage: ChatCompletionUsage
) {
if (!usage) return

const u =
usages[model] ??
(usages[model] = <ChatCompletionUsage>{
completion_tokens: 0,
prompt_tokens: 0,
total_tokens: 0,
})
    // accumulate this response's token counts into the per-model totals
    u.completion_tokens += usage.completion_tokens
    u.prompt_tokens += usage.prompt_tokens
    u.total_tokens += usage.total_tokens
}

export async function executeChatSession(
connectionToken: LanguageModelConfiguration,
cancellationToken: CancellationToken,
messages: ChatCompletionMessageParam[],
vars: Partial<ExpansionVariables>,
toolDefinitions: ToolCallback[],
schemas: Record<string, JSONSchema>,
completer: ChatCompletionHandler,
@@ -567,6 +593,7 @@ export async function executeChatSession(
: undefined
trace.startDetails(`🧠 llm chat`)
if (tools?.length) trace.detailsFenced(`🛠️ tools`, tools, "yaml")
const usages: ChatCompletionUsages = {}
try {
let genVars: Record<string, string>
while (true) {
@@ -585,34 +612,35 @@
let resp: ChatCompletionResponse
try {
checkCancelled(cancellationToken)
const req: CreateChatCompletionRequest = {
model,
temperature: temperature,
top_p: topP,
max_tokens: maxTokens,
seed,
stream: true,
messages,
tools,
response_format:
responseType === "json_object"
? { type: responseType }
: responseType === "json_schema"
? {
type: "json_schema",
json_schema: {
name: "result",
schema: toStrictJSONSchema(
responseSchema
),
strict: true,
},
}
: undefined,
}
try {
trace.startDetails(`📤 llm request`)
resp = await completer(
{
model,
temperature: temperature,
top_p: topP,
max_tokens: maxTokens,
seed,
stream: true,
messages,
tools,
response_format:
responseType === "json_object"
? { type: responseType }
: responseType === "json_schema"
? {
type: "json_schema",
json_schema: {
name: "result",
schema: toStrictJSONSchema(
responseSchema
),
strict: true,
},
}
: undefined,
},
req,
connectionToken,
genOptions,
trace
@@ -625,13 +653,15 @@
}

const output = await processChatMessage(
req,
resp,
messages,
toolDefinitions,
chatParticipants,
schemas,
genVars,
genOptions
genOptions,
usages
)
if (output) return output
} catch (err) {
@@ -640,6 +670,7 @@
schemas,
genVars,
genOptions,
usages,
{ resp, err }
)
}
10 changes: 10 additions & 0 deletions packages/core/src/chattypes.ts
@@ -18,6 +18,15 @@ export interface AICIRequest {
}

// Aliases for OpenAI chat completion types
export type ChatCompletionUsage = Omit<
OpenAI.Completions.CompletionUsage,
"completion_tokens_details"
>

/**
* Per model storage of chat completion usages.
*/
export type ChatCompletionUsages = Record<string, ChatCompletionUsage>

// Text content part of a chat completion
export type ChatCompletionContentPartText =
@@ -99,6 +108,7 @@ export interface ChatCompletionResponse {
toolCalls?: ChatCompletionToolCall[] // List of tool calls made during the response
finishReason?: // Reason why the chat completion finished
"stop" | "length" | "tool_calls" | "content_filter" | "cancel" | "fail"
usage?: ChatCompletionUsage // Usage information for the completion
}

// Alias for OpenAI's API error type
5 changes: 5 additions & 0 deletions packages/core/src/genaisrc/genaiscript.d.ts

Some generated files are not rendered by default.

25 changes: 19 additions & 6 deletions packages/core/src/openai.ts
@@ -1,4 +1,4 @@
import { normalizeInt, trimTrailingSlash } from "./util"
import { logVerbose, normalizeInt, trimTrailingSlash } from "./util"
import { LanguageModelConfiguration, host } from "./host"
import {
AZURE_OPENAI_API_VERSION,
@@ -19,6 +19,7 @@ import {
ChatCompletionToolCall,
ChatCompletionResponse,
ChatCompletionChunk,
ChatCompletionUsage,
} from "./chattypes"
import { resolveTokenEncoder } from "./encoders"
import { toSignal } from "./cancellation"
@@ -93,17 +94,20 @@ export const OpenAIChatCompletion: ChatCompletionHandler = async (
return { text: cached, finishReason: cachedFinishReason, cached: true }
}

const r2 = { ...req, model }
const r2 = {
...req,
stream: true,
stream_options: { include_usage: true },
model,
}
let postReq: any = r2

let url = ""
const toolCalls: ChatCompletionToolCall[] = []

if (cfg.type === "openai" || cfg.type === "localai") {
r2.stream = true
url = trimTrailingSlash(cfg.base) + "/chat/completions"
} else if (cfg.type === "azure") {
r2.stream = true
delete r2.model
url =
trimTrailingSlash(cfg.base) +
@@ -175,6 +179,7 @@ export const OpenAIChatCompletion: ChatCompletionHandler = async (
let finishReason: ChatCompletionResponse["finishReason"] = undefined
let chatResp = ""
let pref = ""
let usage: ChatCompletionUsage

const decoder = host.createUTF8Decoder()
if (r.body.getReader) {
@@ -193,15 +198,22 @@ export const OpenAIChatCompletion: ChatCompletionHandler = async (
if (cancellationToken?.isCancellationRequested) finishReason = "cancel"

trace.appendContent("\n\n")
trace.itemValue(`finish reason`, finishReason)
trace.itemValue(`🏁 finish reason`, finishReason)
if (usage) {
trace.itemValue(
`🪙 tokens`,
`${usage.total_tokens} total, ${usage.prompt_tokens} prompt, ${usage.completion_tokens} completion`
)
}

if (done && finishReason === "stop")
await cacheStore.set(
cachedKey,
{ text: chatResp, finishReason },
{ trace }
)

return { text: chatResp, toolCalls, finishReason }
return { text: chatResp, toolCalls, finishReason, usage }

function doChunk(value: Uint8Array) {
// Massage and parse the chunk of data
Expand All @@ -216,6 +228,7 @@ export const OpenAIChatCompletion: ChatCompletionHandler = async (
}
try {
const obj: ChatCompletionChunk = JSON.parse(json)
if (obj.usage) usage = obj.usage
if (!obj.choices?.length) return ""
else if (obj.choices?.length != 1)
throw new Error("too many choices in response")
2 changes: 1 addition & 1 deletion packages/core/src/promptcontext.ts
@@ -234,7 +234,7 @@ export async function createPromptContext(
})

// Freeze project options to prevent modification
const projectOptions = Object.freeze({ prj, vars, env })
const projectOptions = Object.freeze({ prj, env })
const ctx: PromptContext & RunPromptContextNode = {
...createChatGenerationContext(options, trace, projectOptions),
script: () => {},