Automate Your Dev Workflow with LLM Agents: Commits, PRs, and Code Review
Every developer knows the feeling: you've just finished a solid piece of work — a bug fix, a refactor, a new feature — and now you have to sit down and write about it. A commit message. A PR description. A code review for a colleague. All valuable things. None of them what you were hired to do.
LLMs are genuinely good at this category of work. They understand code, they understand context, and they can produce structured prose from a diff in under a second. This tutorial walks through building real automation for each of these tasks — code you can actually use in your workflow today.
What We're Automating
By the end of this guide, you'll have three working tools:
- `git-msg` — generates a commit message from `git diff --cached`
- `pr-gen` — drafts a full PR description from a branch diff against main
- `review-diff` — produces an actionable code review with severity ratings
All three are built on the same pattern: feed the LLM the relevant code context, give it a structured output format, parse the result.
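That shared pattern can be sketched as a single helper. This is a shape, not a definitive implementation: `complete` is a stand-in for whichever API client you use, and the names here are illustrative.

```python
from typing import Callable


def run_llm_task(context: str, instructions: str,
                 complete: Callable[[str], str],
                 parse: Callable[[str], str] = lambda s: s.strip()) -> str:
    """The shared shape of all three tools:
    1. gather code context (a diff, a commit log),
    2. wrap it in a structured prompt,
    3. send it to the model,
    4. parse the structured reply."""
    prompt = f"{instructions}\n\n<context>\n{context}\n</context>"
    return parse(complete(prompt))


def fake_model(prompt: str) -> str:
    # stand-in for a real API call, so the sketch runs offline
    return "  feat: add login endpoint  "


msg = run_llm_task("diff --git a/app.py ...", "Write a one-line commit subject.", fake_model)
print(msg)  # feat: add login endpoint
```

Each of the three tools below is this function with a different `context`, a different prompt, and a different parser.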
Prerequisites
```shell
pip install anthropic openai gitpython rich
```
You'll need an API key from Anthropic (ANTHROPIC_API_KEY) or OpenAI (OPENAI_API_KEY). The examples use Claude Sonnet 4.6 — swap in gpt-4o if you prefer.
Tool 1: Automated Commit Messages
The Problem
A good commit message answers: What changed, and why? Writing one requires you to re-examine the diff you just created, summarize the intent, and phrase it correctly. An LLM has already read more commit messages than any human ever will.
Implementation
```python
#!/usr/bin/env python3
# git-msg: generate a commit message from staged changes
import subprocess

import anthropic


def get_staged_diff() -> str:
    result = subprocess.run(
        ["git", "diff", "--cached", "--stat", "--patch"],
        capture_output=True, text=True
    )
    return result.stdout


def generate_commit_message(diff: str) -> str:
    if not diff.strip():
        return "No staged changes found."
    client = anthropic.Anthropic()
    message = client.messages.create(
        model="claude-sonnet-4-6",
        max_tokens=256,
        messages=[{
            "role": "user",
            "content": f"""Generate a git commit message for the following diff.

Rules:
- First line: imperative mood, max 72 chars (e.g. "fix(auth): handle expired token refresh")
- Blank line after subject
- Body: 2-4 bullet points explaining what changed and why
- Use conventional commit prefixes: feat, fix, refactor, docs, test, chore, perf
- Be specific — mention function names, file names where relevant
- Do NOT include a trailing period on the subject line

<diff>
{diff[:8000]}
</diff>

Output the commit message only, no commentary."""
        }]
    )
    return message.content[0].text


if __name__ == "__main__":
    diff = get_staged_diff()
    msg = generate_commit_message(diff)
    print(msg)
```
Run it with:

```shell
git add -p            # stage your changes normally
python git-msg.py     # review the suggested message
git commit -m "$(python git-msg.py)"   # note: this runs the script again, so the message may differ slightly from the preview
```
Making It Better
The quality improves significantly when you include recent commit history — the LLM can match your team's existing style:
```python
def get_recent_commits(n: int = 10) -> str:
    result = subprocess.run(
        ["git", "log", f"--max-count={n}", "--pretty=format:%s"],
        capture_output=True, text=True
    )
    return result.stdout


# Add to the prompt:
recent = get_recent_commits()
prompt = f"""Recent commit messages for style reference:
{recent}

Now generate a commit message for this diff:
{diff[:7000]}"""
```
With 10 recent commits as examples, the generated messages closely match your repository's conventions.
Tool 2: PR Description Generator
The Problem
A good PR description saves reviewers from having to reverse-engineer what you did. It should include: what changed, why it changed, how to test it, and any risks or dependencies. Writing this from scratch takes 10-20 minutes. An LLM can produce a first draft in 3 seconds that you edit for 2 minutes.
Implementation
```python
#!/usr/bin/env python3
# pr-gen: generate a PR description from a branch diff
import subprocess
import sys

import anthropic


def get_branch_diff(base: str = "main") -> tuple[str, list[str]]:
    # Get commit log
    log = subprocess.run(
        ["git", "log", f"{base}..HEAD", "--pretty=format:%s"],
        capture_output=True, text=True
    ).stdout
    # Get full diff
    diff = subprocess.run(
        ["git", "diff", f"{base}...HEAD", "--stat", "--patch"],
        capture_output=True, text=True
    ).stdout
    commits = [line for line in log.splitlines() if line.strip()]
    return diff, commits


def generate_pr_description(diff: str, commits: list[str]) -> str:
    client = anthropic.Anthropic()
    commit_list = "\n".join(f"- {c}" for c in commits)
    message = client.messages.create(
        model="claude-sonnet-4-6",
        max_tokens=1024,
        messages=[{
            "role": "user",
            "content": f"""Generate a GitHub pull request description in Markdown.

Commits included:
{commit_list}

<diff>
{diff[:12000]}
</diff>

Use this exact structure:

## Summary
[2-4 bullet points: what this PR does and why]

## Changes
[Grouped by area: list specific files/functions changed]

## Test Plan
[Concrete steps to verify the changes work]

## Notes
[Breaking changes, migration steps, performance impact, or "None"]

Write clearly for a code reviewer who hasn't seen this work before.
Output the Markdown only."""
        }]
    )
    return message.content[0].text


if __name__ == "__main__":
    base = sys.argv[1] if len(sys.argv) > 1 else "main"
    diff, commits = get_branch_diff(base)
    description = generate_pr_description(diff, commits)
    print(description)
```
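One gap worth closing before relying on this: if the branch has no commits against the base, the script sends an empty diff to the model and gets back an invented PR description. A small guard avoids that — this is a sketch of a standalone check, with an illustrative name, not part of the script above:

```python
import sys


def ensure_nonempty(diff: str, commits: list[str]) -> None:
    """Exit early instead of asking the model to describe nothing."""
    if not commits or not diff.strip():
        sys.exit("No commits found against the base branch; nothing to describe.")


# In pr-gen's __main__, call it right after get_branch_diff:
#   diff, commits = get_branch_diff(base)
#   ensure_nonempty(diff, commits)
```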
Integration with GitHub CLI
Pipe it directly into a PR:
```shell
gh pr create \
  --title "$(git log --max-count=1 --pretty=format:%s)" \
  --body "$(python pr-gen.py main)"
```
Or write to clipboard and edit before submitting:
```shell
python pr-gen.py main | xclip -selection clipboard
```
Tool 3: Automated Code Review
The Problem
Code review is valuable but expensive — it requires context, expertise, and time. An LLM can act as a first-pass reviewer that catches the obvious issues before a human has to spend cycles on them. The goal isn't to replace human review; it's to raise the floor so human reviewers can focus on architectural concerns rather than naming conventions.
Implementation
```python
#!/usr/bin/env python3
# review-diff: generate a structured code review
import json
import subprocess
import sys

import anthropic

REVIEW_SCHEMA = {
    "issues": [
        {
            "severity": "critical|high|medium|low",
            "category": "bug|security|performance|style|maintainability",
            "file": "filename.py",
            "line": "approximate line or range",
            "description": "what the issue is",
            "suggestion": "concrete fix or improvement"
        }
    ],
    "summary": "2-3 sentence overall assessment",
    "approve": True
}


def get_diff_for_review(base: str = "main") -> str:
    return subprocess.run(
        ["git", "diff", f"{base}...HEAD", "--patch"],
        capture_output=True, text=True
    ).stdout


def review_diff(diff: str) -> dict:
    client = anthropic.Anthropic()
    message = client.messages.create(
        model="claude-sonnet-4-6",
        max_tokens=2048,
        messages=[{
            "role": "user",
            "content": f"""You are a senior software engineer performing a code review.
Analyze this diff carefully and identify issues.

Severity definitions:
- critical: will cause bugs, data loss, or security vulnerabilities
- high: likely to cause problems in production
- medium: code quality or maintainability concern worth fixing
- low: style or minor improvement suggestion

<diff>
{diff[:15000]}
</diff>

Return ONLY valid JSON matching this schema:
{json.dumps(REVIEW_SCHEMA, indent=2)}

Focus on real issues. Do not invent problems. If the code is clean, say so."""
        }]
    )
    text = message.content[0].text
    # Extract JSON if wrapped in markdown code blocks
    if "```json" in text:
        text = text.split("```json")[1].split("```")[0].strip()
    elif "```" in text:
        text = text.split("```")[1].split("```")[0].strip()
    return json.loads(text)


def format_review(review: dict) -> None:
    from rich.console import Console
    from rich.table import Table

    console = Console()
    severity_colors = {
        "critical": "red",
        "high": "orange3",
        "medium": "yellow",
        "low": "dim"
    }
    table = Table(title="Code Review", show_lines=True)
    table.add_column("Severity", style="bold", width=10)
    table.add_column("Category", width=16)
    table.add_column("Location", width=24)
    table.add_column("Issue", width=50)
    table.add_column("Suggestion", width=50)
    for issue in review.get("issues", []):
        sev = issue["severity"]
        color = severity_colors.get(sev, "white")
        table.add_row(
            f"[{color}]{sev.upper()}[/{color}]",
            issue.get("category", ""),
            f"{issue.get('file', '')}:{issue.get('line', '')}",
            issue.get("description", ""),
            issue.get("suggestion", "")
        )
    console.print(table)
    console.print(f"\n[bold]Summary:[/bold] {review.get('summary', '')}")
    status = "[green]✓ APPROVE[/green]" if review.get("approve") else "[red]✗ REQUEST CHANGES[/red]"
    console.print(f"[bold]Decision:[/bold] {status}\n")


if __name__ == "__main__":
    base = sys.argv[1] if len(sys.argv) > 1 else "main"
    diff = get_diff_for_review(base)
    review = review_diff(diff)
    format_review(review)
```
Sample output:
```
┌──────────┬─────────────────┬──────────────────────────┬───────────────────────────────────┐
│ Severity │ Category        │ Location                 │ Issue                             │
├──────────┼─────────────────┼──────────────────────────┼───────────────────────────────────┤
│ HIGH     │ security        │ auth/handler.go:47       │ Token compared with == not        │
│          │                 │                          │ constant-time equality            │
├──────────┼─────────────────┼──────────────────────────┼───────────────────────────────────┤
│ MEDIUM   │ performance     │ repository/users.go:120  │ N+1 query: loading roles          │
│          │                 │                          │ individually in a loop            │
└──────────┴─────────────────┴──────────────────────────┴───────────────────────────────────┘

Summary: Mostly clean refactor with two issues worth addressing before merge.
Decision: ✗ REQUEST CHANGES
```
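The fence-stripping in review_diff handles the common cases, but models occasionally wrap the JSON in prose on both sides. A more defensive approach is to scan for the outermost braces — a sketch under the assumption that the reply contains exactly one JSON object:

```python
import json


def extract_json(text: str) -> dict:
    """Parse the text between the first '{' and the last '}'.
    Handles bare JSON, ```json fences, and JSON surrounded by commentary."""
    start = text.find("{")
    end = text.rfind("}")
    if start == -1 or end == -1 or end < start:
        raise ValueError(f"No JSON object found in model output: {text[:80]!r}")
    return json.loads(text[start:end + 1])


raw = 'Here is the review:\n```json\n{"approve": true, "issues": []}\n```'
print(extract_json(raw))  # {'approve': True, 'issues': []}
```

If parsing still fails, retrying the API call once with the error message appended to the prompt usually resolves it.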
Handling Large Diffs
Real diffs can exceed LLM context windows. A practical chunking strategy:
```python
def chunk_diff_by_file(diff: str) -> list[str]:
    """Split a diff into per-file chunks."""
    chunks = []
    current_chunk = []
    for line in diff.splitlines(keepends=True):
        if line.startswith("diff --git") and current_chunk:
            chunks.append("".join(current_chunk))
            current_chunk = []
        current_chunk.append(line)
    if current_chunk:
        chunks.append("".join(current_chunk))
    return chunks


def review_large_diff(diff: str) -> dict:
    chunks = chunk_diff_by_file(diff)
    all_issues = []
    for chunk in chunks:
        if len(chunk) < 100:  # skip trivial files
            continue
        result = review_diff(chunk)
        all_issues.extend(result.get("issues", []))
    # Sort by severity, most serious first
    severity_order = {"critical": 0, "high": 1, "medium": 2, "low": 3}
    all_issues.sort(key=lambda x: severity_order.get(x["severity"], 99))
    return {"issues": all_issues, "approve": not any(
        i["severity"] in ("critical", "high") for i in all_issues
    )}
```
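To sanity-check the chunker without a live repo, feed it a synthetic two-file diff. The function here is a copy of chunk_diff_by_file so the example runs on its own:

```python
def chunk_diff_by_file(diff: str) -> list[str]:
    """Split a unified diff into per-file chunks (copy of the version above)."""
    chunks, current_chunk = [], []
    for line in diff.splitlines(keepends=True):
        if line.startswith("diff --git") and current_chunk:
            chunks.append("".join(current_chunk))
            current_chunk = []
        current_chunk.append(line)
    if current_chunk:
        chunks.append("".join(current_chunk))
    return chunks


# A minimal fake diff touching two files
sample = (
    "diff --git a/app.py b/app.py\n"
    "+print('hello')\n"
    "diff --git a/util.py b/util.py\n"
    "-x = 1\n"
)
chunks = chunk_diff_by_file(sample)
print(len(chunks))  # 2
```

Each chunk starts at a `diff --git` header, so every chunk the model sees is a complete, self-describing per-file patch.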
Putting It Together: A Pre-Push Hook
Install all three as a git pre-push hook to enforce review before anything reaches the remote:
```bash
#!/bin/bash
# .git/hooks/pre-push
set -e

echo "Running AI pre-push review..."
REVIEW=$(python review-diff.py main 2>&1)

if echo "$REVIEW" | grep -q "REQUEST CHANGES"; then
    echo "$REVIEW"
    echo ""
    echo "AI review found high-severity issues. Fix them or run 'git push --no-verify' to bypass."
    exit 1
fi

echo "AI review passed."
exit 0
```

```shell
chmod +x .git/hooks/pre-push
```
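One caveat: `.git/hooks` isn't tracked by git, so every clone needs the hook installed by hand. A common workaround — a convention, not something the scripts above require — is a version-controlled hooks directory that each contributor points git at once:

```shell
mkdir -p .githooks
cp .git/hooks/pre-push .githooks/pre-push
git config core.hooksPath .githooks   # each contributor runs this once after cloning
```

Now the hook lives in the repository and travels with it, at the cost of one setup command per clone.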
Cost and Latency
Using Claude Sonnet 4.6 at current pricing:
| Tool | Typical Input Tokens | Typical Output Tokens | Cost per Run |
|------|--------------------|-----------------------|-------------|
| git-msg | ~800 | ~100 | ~$0.001 |
| pr-gen | ~4,000 | ~500 | ~$0.006 |
| review-diff | ~8,000 | ~800 | ~$0.012 |
These are essentially free at individual scale: even running all three tools 50 times a day costs under a dollar per day, roughly $20-30 a month.
Latency is typically 1-3 seconds — fast enough for interactive use in a terminal.
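The arithmetic is easy to check against the table. The figures below are the table's rough per-run estimates, not exact API pricing:

```python
# Approximate cost per run from the table above (USD)
costs = {"git-msg": 0.001, "pr-gen": 0.006, "review-diff": 0.012}

daily = 50 * sum(costs.values())   # all three tools, 50 runs each per day
monthly = daily * 30
print(f"~${daily:.2f} per day, ~${monthly:.0f} per month")
```

If that's still too much, the commit-message tool alone at 50 runs a day is about $0.05 per day.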
What LLMs Are Good and Bad At in Code Review
Good at:
- Spotting null/nil pointer dereferences, off-by-one errors, missing error checks
- Identifying SQL injection, XSS, hardcoded credentials
- Catching missing test coverage for obvious cases
- Naming and readability feedback
- Detecting N+1 query patterns
Not good at:
- Understanding your specific business domain ("is this the right behavior?")
- Knowing your team's architectural decisions
- Catching subtle concurrency bugs that require deep state reasoning
- Identifying issues that require running the code
Use these tools to raise the floor, not replace human judgment.
Accessing AI APIs from Restricted Regions
If you're in a region where OpenAI or Anthropic APIs are restricted, FastSox Smart Mode routes API traffic through your chosen gateway automatically — no per-tool configuration required. Your local git commands and other traffic continue direct; only AI API calls take the proxy path.
Next Steps
- Add a `--style` flag to match different commit conventions (Angular, conventional commits, emoji-based)
- Build a VS Code extension that runs review on save
- Wire `pr-gen` into your CI pipeline to auto-populate PR descriptions on branch push
- Add a vector DB to store past code review feedback and improve consistency over time