vibe-coding
While AI rapidly speeds up code creation (e.g. "vibe coding"), it dangerously lowers the bar for introducing subtle bugs, security flaws, and unmaintainable complexity.
Yes, AI-assisted development is transforming how we build software, but it’s not a free pass to abandon rigor, review, or craftsmanship.
Steering Files and Context Integration:
- Steering Files: Kiro allows you to provide persistent project knowledge through "steering files" (e.g., product.md, structure.md, tech.md) in a .kiro/steering/ directory. These files document your product's purpose, technology stack, project structure, naming conventions, coding standards, and more. This helps Kiro understand your project's unique context and adhere to your team's best practices.
It's completely understandable that "vibe coding" with LLM-generated code can lead to large PRs and a feeling of overwhelm when it comes to reviewing them meticulously. The sheer volume and the often "black box" nature of AI-generated code make traditional review methods less effective.
speed means nothing if the wheels fall off down the road
Strategy for Meticulous Review of LLM-Generated Code
1. Shift Your Mindset: From Line-by-Line to Outcome-Oriented
- Traditional Code Review: Focuses heavily on syntax, style, minor optimizations, and adherence to established patterns.
- LLM Code Review: Still considers the above, but adds a crucial layer of "Is this actually doing what I intended, and is it doing it safely and efficiently?" The LLM's goal is often to complete a task, not necessarily to create the best or most secure code.
2. Leverage Tools for the "Grunt Work"
Don't manually check what a machine can do faster and better.
- Static Analysis (Linters, SonarQube, ESLint/typescript-eslint):
- Action: Ensure your CI/CD pipeline runs these rigorously before you even open the PR. If the LLM generates code that breaks lint rules, it should fail immediately.
- Focus: Auto-fixable issues should be fixed automatically. Non-auto-fixable issues should be flagged. This frees you from checking basic syntax, style, and common anti-patterns.
- Automated Testing (Unit, Integration, E2E):
- Action: This is your most critical line of defense. If the LLM generated new functionality, there must be new tests covering it. If it modified existing functionality, existing tests must still pass.
- Focus:
- For new code: Did the LLM generate tests too? If not, write them yourself first, then verify the LLM's code against them. If it did, are they actually meaningful rather than superficial? (See the test sketch after this list.)
- For existing code: Run the existing test suite. A green pipeline is non-negotiable.
- Security Scanners (SAST/DAST):
- Action: Integrate tools like Snyk, OWASP ZAP, or commercial SAST/DAST solutions into your pipeline.
- Focus: These are vital for detecting common vulnerabilities (e.g., SQL injection, XSS, insecure deserialization) that LLMs can inadvertently introduce or fail to protect against.
- Complexity Metrics:
- Action: Many linters and static analysis tools provide metrics like Cyclomatic Complexity.
- Focus: Keep an eye on new functions or heavily modified sections that suddenly spike in complexity. LLMs can sometimes generate overly convoluted solutions.
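For illustration, here is a minimal sketch (Jasmine-style TypeScript) of the difference between a superficial test and a meaningful one; formatPrice is a hypothetical utility used only for this example:

```typescript
// Hypothetical utility under test -- not from any real PR, just for illustration.
function formatPrice(cents: number, currency: string = 'USD'): string {
  if (!Number.isFinite(cents) || cents < 0) {
    throw new Error('cents must be a non-negative finite number');
  }
  return new Intl.NumberFormat('en-US', { style: 'currency', currency }).format(cents / 100);
}

describe('formatPrice', () => {
  // Superficial (what LLMs often produce): only proves the function returns *something*.
  it('returns a string', () => {
    expect(typeof formatPrice(100)).toBe('string');
  });

  // Meaningful: pins down actual behaviour, including edge cases.
  it('formats whole and fractional amounts', () => {
    expect(formatPrice(100)).toBe('$1.00');
    expect(formatPrice(1999)).toBe('$19.99');
  });

  it('rejects invalid input', () => {
    expect(() => formatPrice(-1)).toThrowError();
    expect(() => formatPrice(NaN)).toThrowError();
  });
});
```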
3. Adopt a Tiered Review Approach
Instead of one monolithic review, break it down.
- Tier 1: High-Level Sanity Check (Initial Pass)
- Goal: Quickly determine if the PR is generally on the right track or fundamentally flawed.
- What to look for:
- Does the PR description clearly explain what the LLM was prompted to do and why?
- Are there large chunks of unrelated code? (LLMs can sometimes hallucinate or include extra stuff).
- Are there any obvious security red flags (e.g., eval(), hardcoded secrets, bypassing sanitization)?
- Does it introduce new, unexpected dependencies?
- Run the app: Do a quick manual smoke test of the affected feature. Does it seem to work?
- Outcome: If it passes this, move to Tier 2. If not, reject with high-level feedback.
- Tier 2: Functional & Architectural Compliance (Deeper Dive)
- Goal: Verify that the code correctly implements the desired functionality and fits into your existing architecture.
- What to look for:
- Behavioral Correctness: Does it actually solve the problem described in the prompt? Test edge cases.
- API Usage: Is the LLM using your internal APIs correctly (e.g., your data access layer, utility functions)? LLMs don't "know" your internal conventions unless you explicitly fine-tune them or prompt them very well.
- Data Flow: Trace critical data through the new or modified code. Is it being handled correctly at each step (validation, transformation, storage)?
- Error Handling: Is error handling robust? What happens on failures? Are errors logged appropriately?
- Concurrency/Asynchrony: If there's async code (like promises, observables in Angular), is it handled correctly to prevent race conditions or memory leaks? (This is a common LLM weak spot).
- Resource Management: Are resources (e.g., file handles, network connections, subscriptions) properly opened and closed/unsubscribed?
- Adherence to Established Patterns: Does it follow your team's established design patterns (e.g., service layers, component structure)? LLMs might propose generic patterns.
- Tier 3: Optimization & Readability (Refinement)
- Goal: Ensure the code is maintainable, readable, and reasonably performant. This is where you apply more traditional review rigor after the functional correctness is confirmed.
- What to look for:
- Clarity & Readability: Are variable names clear? Is the code easy to understand? Can it be refactored for simplicity? (LLMs can sometimes write verbose or overly clever code).
- Performance: Any obvious N+1 queries? Inefficient loops? Unnecessary computations?
- Comments & Documentation: Are critical or complex sections commented? Does public API have JSDoc/TSDoc? (LLMs are decent at generating comments, but they might be generic).
- Code Duplication: Has the LLM introduced new redundant code?
- Angular Specifics: Correct use of change detection, RxJS operators, component lifecycle, template syntax best practices (e.g., *ngFor with trackBy), and preventing memory leaks from subscriptions (see the sketch below).
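To make the Tier 2 and Tier 3 concerns concrete, here is a minimal sketch, assuming a hypothetical OrderService and RxJS 7-style imports, of the pattern a review at these tiers typically pushes toward: the subscription gets explicit error handling and is torn down on destroy instead of being left to leak.

```typescript
import { Component, OnDestroy, OnInit } from '@angular/core';
import { Subject, takeUntil } from 'rxjs';
// Hypothetical service and model, standing in for your real data access layer.
import { Order, OrderService } from './order.service';

@Component({
  selector: 'app-order-list',
  template: `<p>{{ orders.length }} orders loaded</p>`,
})
export class OrderListComponent implements OnInit, OnDestroy {
  orders: Order[] = [];
  private readonly destroy$ = new Subject<void>();

  constructor(private readonly orderService: OrderService) {}

  ngOnInit(): void {
    this.orderService
      .getOrders()
      .pipe(takeUntil(this.destroy$)) // without this, the subscription outlives the component
      .subscribe({
        next: orders => (this.orders = orders),
        error: err => console.error('Failed to load orders', err), // Tier 2: explicit error handling
      });
  }

  ngOnDestroy(): void {
    // Tier 2/3: release the subscription so the component can be garbage-collected
    this.destroy$.next();
    this.destroy$.complete();
  }
}
```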
4. Improve Your Prompting (Preventative Measure)
The quality of your review is directly impacted by the quality of the generated code, which comes from your prompts.
- Be Specific: Instead of "write a component," say "write an Angular standalone component for a user profile, with an input for userId and an output profileUpdated, adhering to our UserProfile interface and using HttpClient to fetch data from /api/users/{userId}. Include basic error handling and a loading state." (A sketch of such a component follows this list.)
- Provide Context: Include relevant existing code snippets, interface definitions, or architectural patterns.
- Define Constraints: "Do not use the any type," "Ensure all subscriptions are unsubscribed on ngOnDestroy," "Use ChangeDetectionStrategy.OnPush."
- Iterate on Prompts: If the first generation isn't great, refine your prompt and try again. Treat it like debugging.
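For illustration, a rough sketch of what that "Be Specific" prompt is asking for. The UserProfile shape and the /api/users/{userId} endpoint come straight from the example prompt and would need to match your real interfaces and API.

```typescript
import { CommonModule } from '@angular/common';
import { HttpClient } from '@angular/common/http';
import { Component, EventEmitter, Input, OnInit, Output } from '@angular/core';

// Assumed shape -- in a real project this would be your existing UserProfile interface.
export interface UserProfile {
  id: string;
  name: string;
  email: string;
}

@Component({
  selector: 'app-user-profile',
  standalone: true,
  imports: [CommonModule],
  template: `
    <p *ngIf="loading">Loading…</p>
    <p *ngIf="error">{{ error }}</p>
    <div *ngIf="profile as p">{{ p.name }} ({{ p.email }})</div>
  `,
})
export class UserProfileComponent implements OnInit {
  @Input() userId!: string;
  @Output() profileUpdated = new EventEmitter<UserProfile>();

  profile: UserProfile | null = null;
  loading = false;
  error: string | null = null;

  constructor(private readonly http: HttpClient) {}

  ngOnInit(): void {
    this.loading = true;
    // HttpClient observables complete after one emission, so no manual unsubscribe is needed here.
    this.http.get<UserProfile>(`/api/users/${this.userId}`).subscribe({
      next: profile => {
        this.profile = profile;
        this.loading = false;
        this.profileUpdated.emit(profile);
      },
      error: () => {
        this.error = 'Could not load profile.';
        this.loading = false;
      },
    });
  }
}
```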
5. Tools and Techniques for Reviewing Large Diffs
- IDE Features (e.g., VS Code):
- Outline/Structure View: Quickly grasp the new functions/classes introduced.
- Git Blame: Useful for understanding who originally authored a line (before the LLM touched it), if you need context.
- Search: Search for specific keywords (TODO, FIXME, common anti-patterns, security-sensitive functions).
- Git/GitHub/GitLab Review Tools:
- Focus on Changed Files: Start with files that have fewer changes or are critical.
- Hide Whitespace Changes: Crucial for large diffs to reduce noise.
- "Viewed" Checkboxes: Mark files as reviewed as you go to track progress.
- Filtering Diffs: Some tools allow filtering by author, type of change, etc.
- git diff --word-diff: Can sometimes make line changes more readable than standard line-based diffs.
Example Walkthrough (Angular Context)
You get a PR with an LLM-generated Angular component.
- Automated Checks First: CI/CD pipeline runs. Did tests pass? Did linting pass? Did security scans find anything? If any fail, PR is rejected immediately.
- High-Level Scan:
- Is the new component name reasonable?
- Does it look like a typical Angular component structure?
- Any bizarre imports or massive unrelated code blocks?
- Manually click around the affected area in a dev build.
- Functional & Architectural:
- Data Flow: If it's a form, is input validated? How is data sent to the backend? What's the response handling?
- Component Interaction: How does it communicate with parent/child components or services? Are @Input/@Output used correctly?
- State Management: If it's managing complex state, is it doing so efficiently (e.g., immutable updates, RxJS)?
- Routing: If it has routerLinks or programmatic navigation, are they correct? (Your current issue would be caught here, or earlier by the automated tests if the ExpressionChangedAfterItHasBeenCheckedError was severe enough to break CI.)
- Refinement:
- Readability: Can you easily understand what each method does?
- Performance: Any *ngFor without trackBy on large lists? Unnecessarily complex template expressions?
- Angular Specifics: Are RxJS subscriptions being cleaned up? Is the async pipe used where appropriate? (See the sketch below.)
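As a quick illustration of those last two points, here is a minimal sketch, assuming a hypothetical ProductService and Product model, that drives the list through the async pipe (so there is no manual subscription to clean up) and gives *ngFor a trackBy function:

```typescript
import { CommonModule } from '@angular/common';
import { ChangeDetectionStrategy, Component } from '@angular/core';
import { Observable } from 'rxjs';
// Hypothetical service and model for illustration only.
import { Product, ProductService } from './product.service';

@Component({
  selector: 'app-product-list',
  standalone: true,
  imports: [CommonModule],
  changeDetection: ChangeDetectionStrategy.OnPush,
  template: `
    <ul>
      <!-- async pipe subscribes and unsubscribes for us; trackBy avoids re-rendering every row -->
      <li *ngFor="let product of products$ | async; trackBy: trackById">
        {{ product.name }}
      </li>
    </ul>
  `,
})
export class ProductListComponent {
  readonly products$: Observable<Product[]>;

  constructor(productService: ProductService) {
    this.products$ = productService.getProducts();
  }

  trackById(_index: number, product: Product): string {
    return product.id;
  }
}
```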
Finally, this gets to the heart of how development practices change as a project scales: it is widely accepted among software developers that "vibe coding" becomes less useful as a codebase grows larger.