Learnings from building AI-based code generator #13719

I think it needs a stronger conclusion (let me know if you want help), and I left a few comments, but I learned some things, and the cooking angle was fun.

Great job.

content/blog/codegen-learnings/index.md

adamgordonbell · 2025-01-06T15:59:36Z

content/blog/codegen-learnings/index.md

+- The **programming language**. This information can again come directly from the organizational preferences, or from the user's prior conversations.
+- The information about the **type** (or types) that must be created - its name and schema, the package it is in, and the capabilities it supports.
+
+While all three of the above can be conceptually called the RAG, only the last information is actually stored in the Registry.


Suggested change

-While all three of the above can be conceptually called the RAG, only the last information is actually stored in the Registry.
+While all three of the above can be conceptually part of the RAG prompting process, only the last is actually retrieved from the registry.

I found using "The RAG" confusing. Not sure this is better but I think some way to be more explicit about what step of RAG we are talking about would be helpful.

Yeah, I need to tighten up the use of RAG - I use it as a noun (database) and a process description. I'll think about it.

I tightened up the language a bit: RAG is a technique, not a database. LMK if it reads better now.

adamgordonbell · 2025-01-06T16:02:53Z

content/blog/codegen-learnings/index.md

+
+### Assessing search quality using recall and precision
+
+To assess how good our RAG is, we need to first understand the two fundamental concepts used in information retrieval systems: _recall_ and _precision_. Imagine that you're looking for apple pie recipes in one of Jamie Oliver's cookbooks. The book has a recipe for a classic American apple pie, a Dutch apple pie and a modern take on a French apple tart. Due to the book's narrative approach with the recipes woven into the stories and context, you've managed to retrieve only the first two recipes but missed the French apple tart. Having retrieved 2 out of 3 relevant documents, you have achieved a **67% recall**.


Great example and keeping with the theme. Love it!
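
For readers who want to check the arithmetic in the excerpt above, here is a minimal sketch of both metrics. The excerpt only gives the recall side (2 of 3 relevant recipes found). The two unrelated recipes in the retrieved set are an assumption added here so that precision has something to measure, and the function names are illustrative rather than anything from the post.

```python
def recall(retrieved: set[str], relevant: set[str]) -> float:
    """Fraction of the relevant documents that were retrieved."""
    if not relevant:
        return 0.0
    return len(retrieved & relevant) / len(relevant)


def precision(retrieved: set[str], relevant: set[str]) -> float:
    """Fraction of the retrieved documents that are relevant."""
    if not retrieved:
        return 0.0
    return len(retrieved & relevant) / len(retrieved)


# The three relevant recipes named in the excerpt.
relevant = {"american apple pie", "dutch apple pie", "french apple tart"}

# The two pies that were found, plus two unrelated recipes.
# ASSUMPTION: the unrelated recipes are invented here for illustration.
retrieved = {
    "american apple pie",
    "dutch apple pie",
    "beef wellington",
    "pear crumble",
}

print(f"recall    = {recall(retrieved, relevant):.0%}")
print(f"precision = {precision(retrieved, relevant):.0%}")
```

Under these assumed inputs, recall stays at 2 of 3 while precision drops to 2 of 4, which shows that the two numbers move independently.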

adamgordonbell · 2025-01-06T16:09:54Z

content/blog/codegen-learnings/index.md

+
+3. **User feedback**: Every "thumbs down" report gets analyzed to identify patterns and potential improvements. This direct user feedback has been invaluable in refining our system.
+
+This multi-layered approach helps us maintain high quality through continuous monitoring and improvement. It's not unlike a professional kitchen, where chefs talk about 'mise en place' - having everything in its place before service begins. But the real craft is in the constant tasting, adjusting, and refining during service. Running a code generator in production follows the same rhythm: careful preparation paired with continuous observation and adjustment. Every output - be it a soufflé or a storage bucket - must meet the mark.


It feels like we need a more explicit conclusion. It would be nice to just summarize some key points at the end and perhaps tell them to try it out with a link.

See if you like it better now. The main points I try to articulate:

This space is moving fast and we need to keep up with the latest tech.

It only matters if we see hard data that shows improvements.

The last paragraph is great.

What confuses me is that the section starting with "Wrapping up: the lessons for building reliable code generator" seems to both introduce new information and serve as a conclusion, and that combination is a little confusing.

For example, "Deterministic generation" and "Local evaluation pipeline" don't seem to underline what was in the article; they add new information. That's fine, but then maybe they should be their own section, followed by a conclusion.

What about something like:

Wrapping up

Building an effective AI-powered code generator requires carefully balancing multiple competing concerns: the raw capabilities of LLMs with retrieved contextual knowledge, semantic search with traditional text matching, and thorough testing with real-world performance.

Our experience has taught us that success lies in treating code generation as a spicy stew (or something: ...) that requires continuous monitoring and refinement. The key ingredients are:

- a robust RAG system with well-tuned recall and precision,
- hybrid search capabilities that combine multiple approaches,
- a comprehensive testing and monitoring strategy that spans from development through production.

As we continue to evolve Pulumi's code generation capabilities, we're excited about expanding our self-debugging features and further refining our RAG implementation.

We invite you to try these capabilities in your own projects and share your experiences. Your feedback helps us continue improving and advancing the state of AI-assisted infrastructure development.

Great feedback, thanks. Here is what I did:

Added a new section on testing - I agree it introduces new content and is not a summary of what was said before

Added a wrap-up section using your suggested text.

souffle instead of spicy stew

"hybrid search capabilities" is part of RAG, so the first point covers it.

LMK how you feel about the culinary theme. I toned it down a bit where it felt somewhat forced, but still not completely happy with it.

I like the theme. I think we should find a title that pulls it together without being too much.

Ideas:
Mise en Place for Code Generation: A RAG-Based Approach
A Recipe for a Better AI-based code generator

or something like that.

I like the conclusion as well.

adamgordonbell · 2025-01-06T16:13:12Z

content/blog/codegen-learnings/index.md

+
+Monitoring these typechecking errors in production can also provide valuable insight into the quality of the RAG and even suggest specific solutions. For example, failure to typecheck a member-access expression is a likely indicator of a missing type schema (a recall problem) or a "wrong" schema brought in by an irrelevant document (a precision problem).
+
+Self-debugging can also be extended to include the `pulumi preview` command, which is a "dry run" operation before the actual deployment and can detect many real or potential problems such as destructive actions, incorrect configurations that cannot be detected at compile time, dependency conflicts, and policy violations.
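
One way to read the recall/precision triage described in the excerpt above is as a simple set check. The sketch below is only an assumption about how such a heuristic might look, not the implementation behind the post. The names `RetrievalIssue`, `triage_member_access_error`, and the `storage.*` type names are all hypothetical, and `needed_types` would realistically be known only in offline evaluation or after manual review.

```python
from enum import Enum


class RetrievalIssue(Enum):
    RECALL = "recall"        # the needed schema was never retrieved
    PRECISION = "precision"  # an irrelevant schema was retrieved and used
    UNKNOWN = "unknown"      # retrieval looks fine; the cause is elsewhere


def triage_member_access_error(
    failing_type: str,
    retrieved_types: set[str],
    needed_types: set[str],
) -> RetrievalIssue:
    """Guess which retrieval problem a failed member access points to.

    failing_type: type on which the member lookup failed.
    retrieved_types: types whose schemas were placed in the prompt.
    needed_types: types the request actually required (hypothetical
        ground truth, e.g. from a labeled evaluation set).
    """
    if failing_type in needed_types and failing_type not in retrieved_types:
        # The model had to guess the members of a type it never saw.
        return RetrievalIssue.RECALL
    if failing_type in retrieved_types and failing_type not in needed_types:
        # The model used a schema that should not have been in the prompt.
        return RetrievalIssue.PRECISION
    return RetrievalIssue.UNKNOWN


# Hypothetical type names, for illustration only.
print(triage_member_access_error(
    "storage.Bucket", {"storage.Queue"}, {"storage.Bucket"}))
```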


tangent: I worked on a similar system in the past, and it had a cheat self-debugging stage where I just said something like 'this program has issues and should be improved', without doing the actual type checking. It would often correct errors but occasionally break perfectly valid code.

Yeah! I based this on my own experience with LLMs when working with code, where I often have to iterate to get the code to compile, but apparently this is quite common. I think we will see this approach formalized with next-gen LLMs and wrappers like LangChain, where the LLM at various points calls you back and asks for help/confirmation.
Today's approach of giving an LLM a big prompt and hoping it can handle everything in one fell swoop does not scale.
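
To make the iterate-until-it-compiles idea from this thread concrete, here is a minimal sketch of a self-debugging loop. It is an illustration under stated assumptions, not the code behind the post: `self_debug`, `generate`, and `check` are hypothetical names, with `generate` standing in for an LLM call and `check` for a type checker (or a dry run such as `pulumi preview`) that returns a list of error messages.

```python
from typing import Callable


def self_debug(
    prompt: str,
    generate: Callable[[str], str],
    check: Callable[[str], list[str]],
    max_rounds: int = 3,
) -> tuple[str, list[str]]:
    """Generate code, then feed checker errors back until clean or out of rounds.

    generate: hypothetical wrapper around an LLM call (prompt in, code out).
    check: hypothetical wrapper around a checker; returns error messages,
        or an empty list when the code is clean.
    """
    code = generate(prompt)
    errors = check(code)
    for _ in range(max_rounds):
        if not errors:
            break  # clean code is never sent back for "improvement"
        feedback = "\n".join(errors)
        code = generate(
            f"{prompt}\n\nThe previous attempt was:\n{code}\n\n"
            f"It failed these checks:\n{feedback}\n\nFix only these errors."
        )
        errors = check(code)
    return code, errors


# Stub demo: the fake model fixes its typo once it sees the error.
def fake_generate(p: str) -> str:
    return "bucket.name" if "failed these checks" in p else "bucket.nmae"


def fake_check(code: str) -> list[str]:
    return [] if code == "bucket.name" else ["unknown member 'nmae'"]


code, errors = self_debug("Create a bucket", fake_generate, fake_check)
print(code, errors)
```

Because the model is re-prompted only when the checker reports concrete errors, code that already passes is left alone, which avoids the failure mode of the unchecked "this program has issues" stage mentioned above.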

Co-authored-by: Adam Gordon Bell <[email protected]>

pulumi-bot · 2025-01-06T17:17:55Z

Your site preview for commit d352528 is ready! 🎉

http://www-testing-pulumi-docs-origin-pr-13719-d352528b.s3-website.us-west-2.amazonaws.com.

pulumi-bot · 2025-01-06T17:19:50Z

Your site preview for commit 8e7fd24 is ready! 🎉

http://www-testing-pulumi-docs-origin-pr-13719-8e7fd246.s3-website.us-west-2.amazonaws.com.

pulumi-bot · 2025-01-06T18:02:40Z

Your site preview for commit ef51933 is ready! 🎉

http://www-testing-pulumi-docs-origin-pr-13719-ef519337.s3-website.us-west-2.amazonaws.com.

pulumi-bot · 2025-01-06T18:14:36Z

Your site preview for commit 93d44fa is ready! 🎉

http://www-testing-pulumi-docs-origin-pr-13719-93d44faa.s3-website.us-west-2.amazonaws.com.

pulumi-bot · 2025-01-06T21:07:20Z

Your site preview for commit 356a9bc is ready! 🎉

http://www-testing-pulumi-docs-origin-pr-13719-356a9bc4.s3-website.us-west-2.amazonaws.com.

pulumi-bot · 2025-01-06T22:14:13Z

Your site preview for commit 4aaceea is ready! 🎉

http://www-testing-pulumi-docs-origin-pr-13719-4aaceeaa.s3-website.us-west-2.amazonaws.com.

foot

Indeed! Very engaging this is, well done. 💯 .

Great feedback gang. Mise en place. ⭐ . Recipe is nice, subtle. LGTM!

Codegen post - template file

943014b

arturl had a problem deploying to testing December 24, 2024 19:52 — with GitHub Actions Failure

Checkpoint - intro, precision and recall

8cfbce3

arturl had a problem deploying to testing December 26, 2024 07:10 — with GitHub Actions Failure

arturl changed the title ~~Codegen post - template file~~ Learnings from building AI-based code generator Dec 26, 2024

Create meta.png

9f3f943

arturl had a problem deploying to testing December 26, 2024 07:35 — with GitHub Actions Failure

Install KaTeX

b467d22

arturl had a problem deploying to testing December 27, 2024 01:51 — with GitHub Actions Failure

Try to fix the formulas

1df9502

arturl had a problem deploying to testing December 27, 2024 02:13 — with GitHub Actions Failure

arturl had a problem deploying to testing December 27, 2024 03:01 — with GitHub Actions Failure

Fix formula, add flow diagram

5268b2c

arturl force-pushed the arturl/codegen-blog branch from b097b50 to 5268b2c Compare December 27, 2024 03:26

arturl had a problem deploying to testing December 27, 2024 03:26 — with GitHub Actions Failure

Code complete

51eab42

arturl had a problem deploying to testing December 27, 2024 10:05 — with GitHub Actions Failure

Wording

a195091

arturl had a problem deploying to testing December 27, 2024 10:46 — with GitHub Actions Failure

More wording changes

6c9d65b

arturl had a problem deploying to testing December 27, 2024 11:34 — with GitHub Actions Failure

More word-smithing

db00eae

adamgordonbell requested changes Jan 6, 2025

Update content/blog/codegen-learnings/index.md

d352528

Co-authored-by: Adam Gordon Bell <[email protected]>

arturl temporarily deployed to testing January 6, 2025 17:11 — with GitHub Actions Inactive

Update content/blog/codegen-learnings/index.md

8e7fd24

Co-authored-by: Adam Gordon Bell <[email protected]>

arturl temporarily deployed to testing January 6, 2025 17:12 — with GitHub Actions Inactive

Code review feedback

ef51933

1. Tighten up the use of RAG: it's a technique, not database 2. Strengthen the conclusion

arturl temporarily deployed to testing January 6, 2025 17:55 — with GitHub Actions Inactive

arturl requested a review from adamgordonbell January 6, 2025 17:59

Update publish date

93d44fa

arturl temporarily deployed to testing January 6, 2025 18:07 — with GitHub Actions Inactive

Improved "Wrapping up" section

356a9bc

arturl temporarily deployed to testing January 6, 2025 20:59 — with GitHub Actions Inactive

adamgordonbell approved these changes Jan 6, 2025

Update title

4aaceea

arturl temporarily deployed to testing January 6, 2025 22:07 — with GitHub Actions Inactive

foot approved these changes Jan 7, 2025

arturl merged commit fa2e371 into master Jan 7, 2025
8 checks passed

arturl temporarily deployed to testing January 7, 2025 15:08 — with GitHub Actions Inactive

arturl deleted the arturl/codegen-blog branch January 7, 2025 15:08
