Learnings from building AI-based code generator #13719
Your site preview for commit 9f3f943 is ready! 🎉 http://www-testing-pulumi-docs-origin-pr-13719-9f3f943c.s3-website.us-west-2.amazonaws.com.
Fascinating article! Well done.
I think it needs a stronger conclusion (let me know if you want help), and I left a few comments, but I learned some things, and the cooking angle was fun.
Great job.
- The **programming language**. This information can again come directly from the organizational preferences, or from the user's prior conversations.
- The information about the **type** (or types) that must be created - its name and schema, the package it is in, and the capabilities it supports.

While all three of the above can be conceptually called the RAG, only the last information is actually stored in the Registry.
Suggested change:
- While all three of the above can be conceptually called the RAG, only the last information is actually stored in the Registry.
+ While all three of the above can be conceptually part of the RAG prompting process, only the last is actually retrieved from the registry.
I found using "The RAG" confusing. Not sure this is better but I think some way to be more explicit about what step of RAG we are talking about would be helpful.
Yeah, I need to tighten up the use of RAG - I use it as a noun (database) and a process description. I'll think about it.
I tightened up the language a bit: RAG is a technique, not a database. LMK if it reads better now.
### Assessing search quality using recall and precision

To assess how good our RAG is, we first need to understand two fundamental concepts used in information retrieval systems: _recall_ and _precision_. Imagine that you're looking for apple pie recipes in one of Jamie Oliver's cookbooks. The book has a recipe for a classic American apple pie, a Dutch apple pie and a modern take on a French apple tart. Due to the book's narrative approach, with the recipes woven into stories and context, you've managed to retrieve only the first two recipes but missed the French apple tart. Having retrieved 2 out of 3 relevant documents, you have achieved a **67% recall**.
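The apple pie example above can be worked through in a few lines. This is only a minimal sketch: the recipe identifiers and the `recall` and `precision` helper names are made up for illustration, and precision is computed with its standard information-retrieval definition (the share of retrieved documents that are relevant), since only the recall figure appears in the paragraph above.

```python
def recall(retrieved: set[str], relevant: set[str]) -> float:
    """Fraction of the relevant documents that were actually retrieved."""
    if not relevant:
        return 0.0
    return len(retrieved & relevant) / len(relevant)


def precision(retrieved: set[str], relevant: set[str]) -> float:
    """Fraction of the retrieved documents that are actually relevant."""
    if not retrieved:
        return 0.0
    return len(retrieved & relevant) / len(retrieved)


# Hypothetical document ids for the three apple recipes in the book.
relevant = {"american-apple-pie", "dutch-apple-pie", "french-apple-tart"}
# The search found the first two and missed the French apple tart.
retrieved = {"american-apple-pie", "dutch-apple-pie"}

print(f"recall = {recall(retrieved, relevant):.0%}")        # recall = 67%
print(f"precision = {precision(retrieved, relevant):.0%}")  # precision = 100%
```

In this scenario nothing irrelevant was retrieved, so precision is perfect even though recall is not; retrieving an unrelated recipe alongside the two pies would lower precision without changing recall.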
Great example and keeping with the theme. Love it!
3. **User feedback**: Every "thumbs down" report gets analyzed to identify patterns and potential improvements. This direct user feedback has been invaluable in refining our system.

This multi-layered approach helps us maintain high quality through continuous monitoring and improvement. It's not unlike a professional kitchen, where chefs talk about 'mise en place' - having everything in its place before service begins. But the real craft is in the constant tasting, adjusting, and refining during service. Running a code generator in production follows the same rhythm: careful preparation paired with continuous observation and adjustment. Every output - be it a soufflé or a storage bucket - must meet the mark.
It feels like we need a more explicit conclusion. It would be nice to just summarize some key points at the end and perhaps tell them to try it out with a link.
See if you like it better now. The main points I try to articulate:
- This space is moving fast and we need to keep up with the latest tech.
- It only matters if we see hard data that shows improvements.
The last paragraph is great.
I think the thing that confuses me is that the section starting with "Wrapping up: the lessons for building reliable code generator" seems to both introduce new information and serve as a conclusion, and that combination is a little confusing.
For example, "Deterministic generation" and "Local evaluation pipeline" don't seem to underline what was in the article; they add new information. That's fine, but then maybe they should be their own section, followed by a conclusion.
What about something like:
Wrapping up
Building an effective AI-powered code generator requires carefully balancing multiple competing concerns: the raw capabilities of LLMs with retrieved contextual knowledge, semantic search with traditional text matching, and thorough testing with real-world performance.
Our experience has taught us that success lies in treating code generation as a spicy stew (or something: ...) that requires continuous monitoring and refinement. The key ingredients are:
- a robust RAG system with well-tuned recall and precision,
- hybrid search capabilities that combine multiple approaches,
- a comprehensive testing and monitoring strategy that spans from development through production.
As we continue to evolve Pulumi's code generation capabilities, we're excited about expanding our self-debugging features and further refining our RAG implementation.
We invite you to try these capabilities in your own projects and share your experiences. Your feedback helps us continue improving and advancing the state of AI-assisted infrastructure development.
Great feedback, thanks. Here is what I did:
- Added a new section on testing - I agree it introduces new content and is not a summary of what was said before
- Added a wrap-up section using your suggested text.
- souffle instead of spicy stew
- "hybrid search capabilities" is part of RAG, so the first point covers it.
LMK how you feel about the culinary theme. I toned it down a bit where it felt somewhat forced, but still not completely happy with it.
I like the theme. I think we should find a title that pulls it together without being too much.
Ideas:
Mise en Place for Code Generation: A RAG-Based Approach
A Recipe for a Better AI-based code generator
or something like that.
I like the conclusion as well.
Monitoring these typechecking errors in production can also provide valuable insight into the quality of the RAG and even suggest specific solutions. For example, failure to typecheck a member-access expression is a likely indicator of a missing type schema (a recall problem) or a "wrong" schema brought in by an irrelevant document (a precision problem).

Self-debugging can also be extended to include the `pulumi preview` command, which is a "dry run" operation before the actual deployment and can detect many real or potential problems such as destructive actions, incorrect configurations that cannot be detected at compile time, dependency conflicts, and policy violations.
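The self-debugging idea described above can be sketched as a small retry loop. This is a hedged illustration, not the implementation discussed in the post: `self_debug`, `generate`, and `check` are hypothetical names, `generate` stands in for an LLM call, and `check` stands in for whatever validation is available (a typechecker, or a wrapper around `pulumi preview`) that returns a list of error messages.

```python
from typing import Callable


def self_debug(
    prompt: str,
    generate: Callable[[str], str],     # hypothetical: wraps the LLM call
    check: Callable[[str], list[str]],  # hypothetical: typechecker or dry run
    max_attempts: int = 3,
) -> tuple[str, list[str]]:
    """Generate code, validate it, and feed any errors back to the generator.

    Returns the last generated program and the errors it still has
    (an empty list means the checker accepted it).
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")

    current_prompt = prompt
    code, errors = "", []
    for _ in range(max_attempts):
        code = generate(current_prompt)
        errors = check(code)
        if not errors:
            break
        # Give the generator its own output plus the concrete error messages.
        current_prompt = (
            f"{prompt}\n\nThe previous attempt was:\n{code}\n\n"
            "It failed with these errors:\n" + "\n".join(errors)
        )
    return code, errors


# Toy stand-ins: the "model" fixes its typo once it is shown the errors.
def fake_generate(p: str) -> str:
    return "bucket.id" if "errors" in p else "bucket.idd"


def fake_check(code: str) -> list[str]:
    return [] if code == "bucket.id" else ["Property 'idd' does not exist"]


code, errors = self_debug("Create a storage bucket", fake_generate, fake_check)
print(code, errors)
```

Passing concrete error messages back, rather than a generic "this program has issues", keeps the retry grounded in what the checker actually reported. The errors returned after the final attempt are also the kind of signal that could be logged for the production monitoring mentioned above.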
tangent: I worked on a similar system in the past, and it had a cheat self-debugging stage where I just said something like 'this program has issues and should be improved', without doing the actual type checking. It would often correct errors but occasionally break perfectly valid code.
Yeah! I based this on my own experience with LLMs when working with code, where I often have to iterate to get the code to compile, but apparently this is quite common. I think we will see this approach formalized with next-gen LLMs and wrappers like LangChain, where the LLM at various points calls you back and asks for help/confirmation.
Today's approach of giving an LLM a big prompt and hoping it can handle everything in one fell swoop does not scale.
Co-authored-by: Adam Gordon Bell <[email protected]>
1. Tighten up the use of RAG: it's a technique, not a database
2. Strengthen the conclusion
Indeed! Very engaging this is, well done. 💯 .
Great feedback gang. Mise en place. ⭐ . Recipe is nice, subtle. LGTM!
Blog post: learnings from building the code gen