Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add prompty parser #687

Merged
merged 24 commits into from
Sep 4, 2024
Merged

Add prompty parser #687

merged 24 commits into from
Sep 4, 2024

Conversation

pelikhan
Copy link
Member

@pelikhan pelikhan commented Sep 4, 2024

This pull request adds a prompty parser and documentation for git hooks. The prompty parser allows for parsing markdown strings with frontmatter and content split into roles. The git hooks documentation explains how to attach the commit-msg git hook to run the message generation on demand using the husky framework.

  • 🖊️ The markdown file docs/src/content/docs/guides/auto-git-commit-message.mdx underwent several modifications. Emojis were removed from section headers and there was an expansion in the "Running the Script" section, including changes to the Genaiscript command and a detailed guide on "Using git hooks".

  • 👀 The regular expressions in the packages/core/src/mustache.ts file received an update, which now includes 'assistant' in its search criteria. This appears to be a change in how the code detects and handles roles within the system.

  • 📝 The tests available in packages/core/src/prompty.test.ts were improved by adding new test cases for the promptyParse function. These tests cover scenarios such as parsing an empty markdown string, a markdown string without frontmatter, one with valid frontmatter, with content split into roles, and content without roles. These new tests ensure that the function works as expected in various cases.

  • 🌟 A new source file packages/core/src/prompty.ts was introduced, which provides an implementation of the promptyParse function. This function parses the text input into a structure that includes frontmatter, content, and messages with specific roles. This addition likely enhances how the system processes and interprets text data.

generated by pr-describe


You can also attach to the [commit-msg](https://git-scm.com/docs/githooks#_commit_msg) git hook to run the message generation on demand.
Using the [huksy](https://typicode.github.io/husky/) framework, we can register the execution
of genaiscript in the `.husky/commit-msg` file.
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Incorrect tool name 'huksy'; the correct name is 'Husky'.

generated by pr-docs-review-commit incorrect_tool_name

so that it gets populated in the `env.files` variable.

```bash title=".husky/commit-msg"
npx --yes genaiscript run commit-msg "$1"
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Incorrect usage of 'npx --yes'; it should be removed as it is not necessary when running local npm scripts or dependencies.

generated by pr-docs-review-commit incorrect_usage


...

await host.writeText(msg.filename, message)
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Incorrect code comment 'otherwize'; the correct spelling is 'otherwise'.

generated by pr-docs-review-commit incorrect_code_comment


...

await host.writeText(msg.filename, message)
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The code comment 'cancel("commit message already exists")' suggests using a function 'cancel' which is not defined or explained in the provided code snippet.

generated by pr-docs-review-commit incorrect_code_comment

@@ -18,7 +18,7 @@ export async function interpolateVariables(

// remove prompty roles
// https://github.com/microsoft/prompty/blob/main/runtime/prompty/prompty/parsers.py#L113C21-L113C77
content = content.replace(/^\s*(system|user):\s*$/gim, "\n")
content = content.replace(/^\s*(system|user|assistant)\s*:\s*$/gim, "\n")
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The regular expression used to replace roles in the content has been updated to include 'assistant' along with 'system' and 'user'. This change might affect the existing functionality if the 'assistant' role was not intended to be replaced in the content.

generated by pr-review-commit regex_update

@@ -0,0 +1,75 @@
import { promptyParse } from "./prompty"
import { describe, test, beforeEach } from "node:test"
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The imported modules 'node:test' and 'node:assert/strict' might not exist. Please ensure that these modules are available in the node environment.

generated by pr-review-commit import_error

Copy link

github-actions bot commented Sep 4, 2024

The changes in the GIT_DIFF include:

  1. In the file mustache.ts, the parser is updated to recognize the 'assistant' role in addition to 'user' and 'system'. This means the assistant's responses can now be parsed correctly.

  2. Two new files are added: prompty.test.ts and prompty.ts.

    a. prompty.test.ts implements a variety of tests for the promptyParse function.

    b. prompty.ts implements the promptyParse function. This function parses chat transcripts, splitting them into 'system', 'user', and 'assistant' roles. It also parses markdown frontmatter into a structured format.

The changes appear to be well-written and well-tested.

However, I have a few suggestions:

  1. The promptyParse function in prompty.ts could be refactored for readability and efficiency, as the current implementation requires multiple passes through the content.

  2. The parsing of roles in the promptyParse function assumes that every line starting with 'system', 'user', or 'assistant' will be followed by a colon, but there is no error handling or fallback for cases where this is not true.

Overall, these changes look good, but some small improvements could be made to enhance readability and robust handling of unexpected input.

So my response is "LGTM 🚀, but with a few possible improvements to consider."

generated by pr-review

...

await host.writeText(msg.filename, message)
```
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The section "Using git hooks" introduces the use of a third-party tool called "huksy" which is likely a typo and should be corrected to "husky".

generated by pr-docs-review-commit documentation_content

@@ -309,6 +310,7 @@ Commands:
query
tokens [options] <files...> Count tokens in a set of files
jsonl2json Converts JSONL files to a JSON file
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The command prompty is mentioned but it seems to be a duplicate of the parse prompty command which is explained in detail later in the document. This could lead to confusion and should be clarified or removed if it's indeed a duplicate.

generated by pr-docs-review-commit documentation_content

file input JSONL files

Options:
-h, --help display help for command
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The section parse prompty is introduced but the description "Converts .prompty files to genaiscript" is not accurate since .prompty is not a recognized file extension. This should be corrected to reflect the actual functionality of the command.

generated by pr-docs-review-commit documentation_content

.command("prompty")
.description("Converts .prompty files to genaiscript")
.argument("<file...>", "input JSONL files")
.action(prompty2genaiscript)
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The argument description for the "prompty" command is incorrect. It should not be "input JSONL files" but rather "input .prompty files".

generated by pr-review-commit incorrect_argument_description

const script = promptyToGenAIScript(doc)
await writeText(gf, script)
}
}
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is no error handling for the async operations within the "prompty2genaiscript" function. If an error occurs during file reading or writing, it will not be caught and handled.

generated by pr-review-commit missing_error_handling

.join("\n")

return src
}
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The function "promptyToGenAIScript" does not validate its input. It assumes that the input "doc" is always in the correct format, which may not always be the case.

generated by pr-review-commit missing_input_validation

const fs = await expandFiles(files)
for (const f of fs) {
console.log(f)
const gf = replaceExt(f, ".genai.mts")
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is no error handling for the promptyParse function. If the parsing fails, the error will not be caught and the application may crash.

generated by pr-review-commit missing_error_handling

.join("\n")

return src
}
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The 'promptyToGenAIScript' function is quite complex and long. Consider breaking it down into smaller, more manageable functions.

generated by pr-review-commit complex_function

const script = promptyToGenAIScript(doc)
await writeText(gf, script)
}
}
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The functions 'expandFiles', 'replaceExt', 'readText', and 'writeText' are used but not imported or defined in this file.

generated by pr-review-commit missing_dependency

npx --yes genaiscript run commit-msg "$1"
```

In the script, we check if the content of the file already has a user message, otherwize generate a new message.
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Typo found, change "otherwize" to "otherwise".

generated by pr-docs-review-commit typo

@@ -108,7 +110,35 @@ Then you can run the script using:
npm run gcm
```

## Using git hooks

You can also attach to the [commit-msg](https://git-scm.com/docs/githooks#_commit_msg) git hook to run the message generation on demand.
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Typo found, change "huksy" to "husky".

generated by pr-docs-review-commit typo

@@ -309,6 +309,7 @@ Commands:
query
tokens [options] <files...> Count tokens in a set of files
jsonl2json Converts JSONL files to a JSON file
prompty [options] <file...> Converts .prompty files to genaiscript
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Incorrect command name, change "prompty" to "parse prompty".

generated by pr-docs-review-commit command_name

}

export function promptyToGenAIScript(doc: PromptyDocument) {
const { messages, meta } = doc
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Missing escape character in regular expression

generated by pr-review-commit missing_escape

if (PROMPTY_REGEX.test(filename)) {
const doc = await promptyParse(content)
content = await promptyToGenAIScript(doc)
}
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Async function without await expression

generated by pr-review-commit async_without_await

@@ -49,6 +49,11 @@ export function parseBoolean(s: string) {
: undefined
}

export function deleteUndefinedValues<T extends Record<string, any>>(o: T): T {
for (const k in o) if (o[k] === undefined) delete o[k]
return o
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Function mutates its input, which can lead to unexpected side effects

generated by pr-review-commit mutate_input

export function deleteUndefinedValues<T extends Record<string, any>>(o: T): T {
for (const k in o) if (o[k] === undefined) delete o[k]
return o
}
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Avoid mutating function parameters. It can lead to unexpected behavior.

generated by pr-review-commit mutation

@@ -389,6 +390,21 @@ Options:
-h, --help display help for command
```

### `parse prompty`
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Incorrect heading format, remove the emoji to match MDX standards.

generated by pr-docs-review-commit mdx_heading_format

Extra fields that genaiscript use:

- `files` to specify one or many files to populate `env.files`
- `tests` to specify one or many tests
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

New documentation file for .prompty files needs review for accuracy and completeness.

generated by pr-docs-review-commit mdx_new_file

?.split(/\n/g)
.filter((l) => l && !/^#/.test(l)) // filter out comments
.join("\n")
if (msgContent) cancel("commit message already exists")
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ensure the logic for checking if a commit message already exists is correct and handles edge cases properly.

generated by pr-docs-review-commit script_logic


...

await host.writeText(msg.filename, message)
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Verify that the script correctly writes the commit message to the file specified by the git hook.

generated by pr-docs-review-commit script_logic

@pelikhan pelikhan changed the title Add prompty parser and git hooks documentation Add prompty parser Sep 4, 2024
packages/core/src/prompty.ts Dismissed Show dismissed Hide dismissed
@pelikhan pelikhan merged commit cbc6b27 into main Sep 4, 2024
11 checks passed
@pelikhan pelikhan deleted the promptyrun branch September 4, 2024 22:04
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

1 participant