Why Code Reviews Should Include Prompts in the Age of AI
If Prompting for Code is “Engineering”, We Should Review the Prompts
Given the propensity of LLMs to hallucinate and their eagerness to please, it is critical to review all LLM-generated code with a very skeptical eye. In my experience leading teams and reviewing thousands of code changes, the average code review misses many of the "why" questions. Most reviewers focus too much on syntax and other superficial issues and not enough on intent, suitability to requirements, risk, or maintainability.
LLM-generated code amplifies that problem. Today, the sheer volume of code that a developer using an LLM Copilot can generate, and the speed at which it is produced, puts extra pressure on reviewers, which is certainly not going to improve review quality. So the question is:
How can we increase our overall trust in LLM-generated code in this kind of development environment?
Why Prompt Review Matters
A while back, I wrote about how “prompting” is “coding” by another name. I also suggested that LLM-generated code should be clearly marked with metadata for the benefit of reviewers. If you agree with these premises, then perhaps we should go even further: Instead of just reviewing the code, we should also review the prompts that led to its generation.
There are several benefits to be gained from such a system. As engineering leaders, we can’t afford to treat LLMs as magical black boxes. We need practices and culture that treat prompt engineering and AI outputs as first-class citizens in the SDLC. Let’s take a look.
Benefits of Code + Prompt Review
“Code+prompt reviewers” can gain insight into the reasoning behind LLM-generated code. They can see the developer’s intent and confirm it aligns with the business requirements. They can also see how the AI model interpreted the prompt, helping them detect subtle misalignments that might not be obvious from the code alone.
Seeing not just the code but also the prompts during review can also highlight code that deserves extra scrutiny. For example, if a prompt is vague or misleading, we might expect the generated code to have lower quality and more bugs. We can also more easily flag security-critical code, which is especially risky to blindly rely on when it comes from an LLM that may be trained on faulty data (perhaps even intentionally, in a kind of supply chain attack, but I’ll address that in a future post).
We have long known that one of the key benefits of code review is that developers can learn from each other and improve as engineers. Now that we are moving toward more reliance on prompts, developers who are better at prompt engineering can learn from others in a code+prompt review.
Having a record of prompts would also be valuable later, during debugging and maintenance, because it lets you see why certain code was written the way it was.
What a Prompt-Aware Review System Could Look Like
Here’s how such a system might work:
For every LLM Copilot coding session, the system would record user prompts, model responses, and various metadata. When code is committed, the LLM conversation is associated with it, perhaps as a UUID link to an external system included in a git note.
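The git-note linkage described above could be sketched with standard git commands. This is a minimal, hypothetical sketch: the `llm-sessions` notes namespace and the session UUID are assumptions for illustration, and in practice the UUID would resolve to a conversation record in an external prompt-logging system.

```shell
# Minimal sketch: link a commit to an LLM session via a git note.
# Assumptions: a hypothetical "llm-sessions" notes namespace, and a
# UUID pointing at an external system that stores the conversation.
set -e

# Set up a throwaway repo with one commit for demonstration.
tmp=$(mktemp -d)
cd "$tmp"
git init -q
git config user.email "dev@example.com"
git config user.name "Dev"
echo "print('hello')" > app.py
git add app.py
git commit -q -m "Add feature generated with Copilot assistance"

# Hypothetical session ID issued by the prompt-logging system.
SESSION_ID="123e4567-e89b-12d3-a456-426614174000"

# Attach the session reference without touching the commit itself.
git notes --ref=llm-sessions add -m "llm-session: $SESSION_ID" HEAD

# A reviewer (or review tool) can later read the link back.
git notes --ref=llm-sessions show HEAD
```

Using notes rather than the commit message keeps the metadata out of the history developers read day to day, while still letting a review tool fetch and display the linked conversation. Commit trailers (via `git interpret-trailers`) would be a simpler alternative if you prefer the link to travel with the commit message.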
In the code review interface, there would be a lot of room for experimentation with different visualizations and user experiences. Generally, you could imagine a toggle for the prompt and response history behind a proposed code change, allowing reviewers to see the intent in the prompts, compare it to the code, and perhaps see alternate responses that the developer considered and rejected.
If I still conducted software tools research (or if I just had more time), I might take a run at something like this. Maybe if you’ve read this far, you might like to as well. Go for it! Let’s make sure we are using new AI tools as safely and effectively as possible.
Work With Me
If you're building with AI and need help defining review workflows, coaching teams on prompt engineering, or making sure your org’s practices are both fast and safe, let’s talk. I take on short-term advisory work and longer-term leadership and architecture consulting.