Automate Your Dev Workflow with LLM Agents: Commits, PRs, and Code Review
Every developer knows the feeling: you've just finished a solid piece of work — a bug fix, a refactor, a new feature — and now you have to sit down and write about it. A commit message. A PR description. A code review for a colleague. All valuable things. None of them what you were hired to do.
LLMs are genuinely good at this category of work. They understand code, they understand context, and they can produce structured prose from a diff in under a second. This tutorial walks through building real automation for each of these tasks — code you can actually use in your workflow today.
What We're Automating
By the end of this guide, you'll have three working tools:
- `git-msg` — generates a commit message from `git diff --cached`
- `pr-gen` — drafts a full PR description from a branch diff against main
- `review-diff` — produces an actionable code review with severity ratings
All three are built on the same pattern: feed the LLM the relevant code context, give it a structured output format, parse the result.
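That shared pattern can be sketched as a single helper. This is a shape, not a definitive implementation: `complete` is a stand-in for whichever API client you use, and the names here are illustrative.

```python
from typing import Callable


def run_llm_task(context: str, instructions: str,
                 complete: Callable[[str], str],
                 parse: Callable[[str], str] = lambda s: s.strip()) -> str:
    """The shared shape of all three tools:
    1. gather code context (a diff, a commit log),
    2. wrap it in a structured prompt,
    3. send it to the model,
    4. parse the structured reply."""
    prompt = f"{instructions}\n\n<context>\n{context}\n</context>"
    return parse(complete(prompt))


def fake_model(prompt: str) -> str:
    # stand-in for a real API call, so the sketch runs offline
    return "  feat: add login endpoint  "


msg = run_llm_task("diff --git a/app.py ...", "Write a one-line commit subject.", fake_model)
print(msg)  # feat: add login endpoint
```

Each of the three tools below is this function with a different `context`, a different prompt, and a different parser.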
Prerequisites
```shell
pip install anthropic openai gitpython rich
```
You'll need an API key from Anthropic (ANTHROPIC_API_KEY) or OpenAI (OPENAI_API_KEY). The examples use Claude Sonnet 4.6 — swap in gpt-4o if you prefer.
Tool 1: Automated Commit Messages
The Problem
A good commit message answers: What changed, and why? Writing one requires you to re-examine the diff you just created, summarize the intent, and phrase it correctly. An LLM has already read more commit messages than any human ever will.
Implementation
```python
#!/usr/bin/env python3
# git-msg: generate a commit message from staged changes
import subprocess

import anthropic


def get_staged_diff() -> str:
    result = subprocess.run(
        ["git", "diff", "--cached", "--stat", "--patch"],
        capture_output=True, text=True
    )
    return result.stdout


def generate_commit_message(diff: str) -> str:
    if not diff.strip():
        return "No staged changes found."
    client = anthropic.Anthropic()
    message = client.messages.create(
        model="claude-sonnet-4-6",
        max_tokens=256,
        messages=[{
            "role": "user",
            "content": f"""Generate a git commit message for the following diff.

Rules:
- First line: imperative mood, max 72 chars (e.g. "fix(auth): handle expired token refresh")
- Blank line after subject
- Body: 2-4 bullet points explaining what changed and why
- Use conventional commit prefixes: feat, fix, refactor, docs, test, chore, perf
- Be specific — mention function names, file names where relevant
- Do NOT include a trailing period on the subject line

<diff>
{diff[:8000]}
</diff>

Output the commit message only, no commentary."""
        }]
    )
    return message.content[0].text


if __name__ == "__main__":
    diff = get_staged_diff()
    msg = generate_commit_message(diff)
    print(msg)
```
Run it with:

```shell
git add -p            # stage your changes normally
python git-msg.py     # review the suggested message
git commit -m "$(python git-msg.py)"   # note: this runs the script again, so the message may differ slightly from the preview
```
Making It Better
The quality improves significantly when you include recent commit history — the LLM can match your team's existing style:
```python
def get_recent_commits(n: int = 10) -> str:
    result = subprocess.run(
        ["git", "log", f"--max-count={n}", "--pretty=format:%s"],
        capture_output=True, text=True
    )
    return result.stdout


# Add to the prompt:
recent = get_recent_commits()
prompt = f"""Recent commit messages for style reference:
{recent}

Now generate a commit message for this diff:
{diff[:7000]}"""
```
With 10 recent commits as examples, the generated messages closely match your repository's conventions.
Tool 2: PR Description Generator
The Problem
A good PR description saves reviewers from having to reverse-engineer what you did. It should include: what changed, why it changed, how to test it, and any risks or dependencies. Writing this from scratch takes 10-20 minutes. An LLM can produce a first draft in 3 seconds that you edit for 2 minutes.
Implementation
```python
#!/usr/bin/env python3
# pr-gen: generate a PR description from a branch diff
import subprocess
import sys

import anthropic


def get_branch_diff(base: str = "main") -> tuple[str, list[str]]:
    # Get commit log
    log = subprocess.run(
        ["git", "log", f"{base}..HEAD", "--pretty=format:%s"],
        capture_output=True, text=True
    ).stdout
    # Get full diff
    diff = subprocess.run(
        ["git", "diff", f"{base}...HEAD", "--stat", "--patch"],
        capture_output=True, text=True
    ).stdout
    commits = [line for line in log.splitlines() if line.strip()]
    return diff, commits


def generate_pr_description(diff: str, commits: list[str]) -> str:
    client = anthropic.Anthropic()
    commit_list = "\n".join(f"- {c}" for c in commits)
    message = client.messages.create(
        model="claude-sonnet-4-6",
        max_tokens=1024,
        messages=[{
            "role": "user",
            "content": f"""Generate a GitHub pull request description in Markdown.

Commits included:
{commit_list}

<diff>
{diff[:12000]}
</diff>

Use this exact structure:

## Summary
[2-4 bullet points: what this PR does and why]

## Changes
[Grouped by area: list specific files/functions changed]

## Test Plan
[Concrete steps to verify the changes work]

## Notes
[Breaking changes, migration steps, performance impact, or "None"]

Write clearly for a code reviewer who hasn't seen this work before.
Output the Markdown only."""
        }]
    )
    return message.content[0].text


if __name__ == "__main__":
    base = sys.argv[1] if len(sys.argv) > 1 else "main"
    diff, commits = get_branch_diff(base)
    description = generate_pr_description(diff, commits)
    print(description)
```
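One gap worth closing before relying on this: if the branch has no commits against the base, the script sends an empty diff to the model and gets back an invented PR description. A small guard avoids that — this is a sketch of a standalone check, with an illustrative name, not part of the script above:

```python
import sys


def ensure_nonempty(diff: str, commits: list[str]) -> None:
    """Exit early instead of asking the model to describe nothing."""
    if not commits or not diff.strip():
        sys.exit("No commits found against the base branch; nothing to describe.")


# In pr-gen's __main__, call it right after get_branch_diff:
#   diff, commits = get_branch_diff(base)
#   ensure_nonempty(diff, commits)
```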
Integration with GitHub CLI
Pipe it directly into a PR:
```shell
gh pr create \
  --title "$(git log --max-count=1 --pretty=format:%s)" \
  --body "$(python pr-gen.py main)"
```
Or write to clipboard and edit before submitting:
```shell
python pr-gen.py main | xclip -selection clipboard
```
Tool 3: Automated Code Review
The Problem
Code review is valuable but expensive — it requires context, expertise, and time. An LLM can act as a first-pass reviewer that catches the obvious issues before a human has to spend cycles on them. The goal isn't to replace human review; it's to raise the floor so human reviewers can focus on architectural concerns rather than naming conventions.
Implementation
```python
#!/usr/bin/env python3
# review-diff: generate a structured code review
import json
import subprocess
import sys

import anthropic

REVIEW_SCHEMA = {
    "issues": [
        {
            "severity": "critical|high|medium|low",
            "category": "bug|security|performance|style|maintainability",
            "file": "filename.py",
            "line": "approximate line or range",
            "description": "what the issue is",
            "suggestion": "concrete fix or improvement"
        }
    ],
    "summary": "2-3 sentence overall assessment",
    "approve": True
}


def get_diff_for_review(base: str = "main") -> str:
    return subprocess.run(
        ["git", "diff", f"{base}...HEAD", "--patch"],
        capture_output=True, text=True
    ).stdout


def review_diff(diff: str) -> dict:
    client = anthropic.Anthropic()
    message = client.messages.create(
        model="claude-sonnet-4-6",
        max_tokens=2048,
        messages=[{
            "role": "user",
            "content": f"""You are a senior software engineer performing a code review.
Analyze this diff carefully and identify issues.

Severity definitions:
- critical: will cause bugs, data loss, or security vulnerabilities
- high: likely to cause problems in production
- medium: code quality or maintainability concern worth fixing
- low: style or minor improvement suggestion

<diff>
{diff[:15000]}
</diff>

Return ONLY valid JSON matching this schema:
{json.dumps(REVIEW_SCHEMA, indent=2)}

Focus on real issues. Do not invent problems. If the code is clean, say so."""
        }]
    )
    text = message.content[0].text
    # Extract JSON if wrapped in markdown code blocks
    if "```json" in text:
        text = text.split("```json")[1].split("```")[0].strip()
    elif "```" in text:
        text = text.split("```")[1].split("```")[0].strip()
    return json.loads(text)


def format_review(review: dict) -> None:
    from rich.console import Console
    from rich.table import Table

    console = Console()
    severity_colors = {
        "critical": "red",
        "high": "orange3",
        "medium": "yellow",
        "low": "dim"
    }
    table = Table(title="Code Review", show_lines=True)
    table.add_column("Severity", style="bold", width=10)
    table.add_column("Category", width=16)
    table.add_column("Location", width=24)
    table.add_column("Issue", width=50)
    table.add_column("Suggestion", width=50)
    for issue in review.get("issues", []):
        sev = issue["severity"]
        color = severity_colors.get(sev, "white")
        table.add_row(
            f"[{color}]{sev.upper()}[/{color}]",
            issue.get("category", ""),
            f"{issue.get('file', '')}:{issue.get('line', '')}",
            issue.get("description", ""),
            issue.get("suggestion", "")
        )
    console.print(table)
    console.print(f"\n[bold]Summary:[/bold] {review.get('summary', '')}")
    status = "[green]✓ APPROVE[/green]" if review.get("approve") else "[red]✗ REQUEST CHANGES[/red]"
    console.print(f"[bold]Decision:[/bold] {status}\n")


if __name__ == "__main__":
    base = sys.argv[1] if len(sys.argv) > 1 else "main"
    diff = get_diff_for_review(base)
    review = review_diff(diff)
    format_review(review)
```
Sample output:
```
┌──────────┬─────────────────┬──────────────────────────┬───────────────────────────────────┐
│ Severity │ Category        │ Location                 │ Issue                             │
├──────────┼─────────────────┼──────────────────────────┼───────────────────────────────────┤
│ HIGH     │ security        │ auth/handler.go:47       │ Token compared with == not        │
│          │                 │                          │ constant-time equality            │
├──────────┼─────────────────┼──────────────────────────┼───────────────────────────────────┤
│ MEDIUM   │ performance     │ repository/users.go:120  │ N+1 query: loading roles          │
│          │                 │                          │ individually in a loop            │
└──────────┴─────────────────┴──────────────────────────┴───────────────────────────────────┘

Summary: Mostly clean refactor with two issues worth addressing before merge.
Decision: ✗ REQUEST CHANGES
```
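The fence-stripping in review_diff handles the common cases, but models occasionally wrap the JSON in prose on both sides. A more defensive approach is to scan for the outermost braces — a sketch under the assumption that the reply contains exactly one JSON object:

```python
import json


def extract_json(text: str) -> dict:
    """Parse the text between the first '{' and the last '}'.
    Handles bare JSON, ```json fences, and JSON surrounded by commentary."""
    start = text.find("{")
    end = text.rfind("}")
    if start == -1 or end == -1 or end < start:
        raise ValueError(f"No JSON object found in model output: {text[:80]!r}")
    return json.loads(text[start:end + 1])


raw = 'Here is the review:\n```json\n{"approve": true, "issues": []}\n```'
print(extract_json(raw))  # {'approve': True, 'issues': []}
```

If parsing still fails, retrying the API call once with the error message appended to the prompt usually resolves it.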
Handling Large Diffs
Real diffs can exceed LLM context windows. A practical chunking strategy:
```python
def chunk_diff_by_file(diff: str) -> list[str]:
    """Split a diff into per-file chunks."""
    chunks = []
    current_chunk = []
    for line in diff.splitlines(keepends=True):
        if line.startswith("diff --git") and current_chunk:
            chunks.append("".join(current_chunk))
            current_chunk = []
        current_chunk.append(line)
    if current_chunk:
        chunks.append("".join(current_chunk))
    return chunks


def review_large_diff(diff: str) -> dict:
    chunks = chunk_diff_by_file(diff)
    all_issues = []
    for chunk in chunks:
        if len(chunk) < 100:  # skip trivial files
            continue
        result = review_diff(chunk)
        all_issues.extend(result.get("issues", []))
    # Sort by severity, most serious first
    severity_order = {"critical": 0, "high": 1, "medium": 2, "low": 3}
    all_issues.sort(key=lambda x: severity_order.get(x["severity"], 99))
    return {"issues": all_issues, "approve": not any(
        i["severity"] in ("critical", "high") for i in all_issues
    )}
```
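To sanity-check the chunker without a live repo, feed it a synthetic two-file diff. The function here is a copy of chunk_diff_by_file so the example runs on its own:

```python
def chunk_diff_by_file(diff: str) -> list[str]:
    """Split a unified diff into per-file chunks (copy of the version above)."""
    chunks, current_chunk = [], []
    for line in diff.splitlines(keepends=True):
        if line.startswith("diff --git") and current_chunk:
            chunks.append("".join(current_chunk))
            current_chunk = []
        current_chunk.append(line)
    if current_chunk:
        chunks.append("".join(current_chunk))
    return chunks


# A minimal fake diff touching two files
sample = (
    "diff --git a/app.py b/app.py\n"
    "+print('hello')\n"
    "diff --git a/util.py b/util.py\n"
    "-x = 1\n"
)
chunks = chunk_diff_by_file(sample)
print(len(chunks))  # 2
```

Each chunk starts at a `diff --git` header, so every chunk the model sees is a complete, self-describing per-file patch.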
Putting It Together: A Pre-Push Hook
Install all three as a git pre-push hook to enforce review before anything reaches the remote:
```bash
#!/bin/bash
# .git/hooks/pre-push
set -e

echo "Running AI pre-push review..."
REVIEW=$(python review-diff.py main 2>&1)

if echo "$REVIEW" | grep -q "REQUEST CHANGES"; then
    echo "$REVIEW"
    echo ""
    echo "AI review found high-severity issues. Fix them or run 'git push --no-verify' to bypass."
    exit 1
fi

echo "AI review passed."
exit 0
```

```shell
chmod +x .git/hooks/pre-push
```
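One caveat: `.git/hooks` isn't tracked by git, so every clone needs the hook installed by hand. A common workaround — a convention, not something the scripts above require — is a version-controlled hooks directory that each contributor points git at once:

```shell
mkdir -p .githooks
cp .git/hooks/pre-push .githooks/pre-push
git config core.hooksPath .githooks   # each contributor runs this once after cloning
```

Now the hook lives in the repository and travels with it, at the cost of one setup command per clone.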
Cost and Latency
Using Claude Sonnet 4.6 at current pricing:
| Tool | Typical Input Tokens | Typical Output Tokens | Cost per Run |
|------|--------------------|-----------------------|-------------|
| git-msg | ~800 | ~100 | ~$0.001 |
| pr-gen | ~4,000 | ~500 | ~$0.006 |
| review-diff | ~8,000 | ~800 | ~$0.012 |
These are essentially free at individual scale: even running all three tools 50 times a day costs under a dollar per day, roughly $20-30 a month.
Latency is typically 1-3 seconds — fast enough for interactive use in a terminal.
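The arithmetic is easy to check against the table. The figures below are the table's rough per-run estimates, not exact API pricing:

```python
# Approximate cost per run from the table above (USD)
costs = {"git-msg": 0.001, "pr-gen": 0.006, "review-diff": 0.012}

daily = 50 * sum(costs.values())   # all three tools, 50 runs each per day
monthly = daily * 30
print(f"~${daily:.2f} per day, ~${monthly:.0f} per month")
```

If that's still too much, the commit-message tool alone at 50 runs a day is about $0.05 per day.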
What LLMs Are Good and Bad At in Code Review
Good at:
- Spotting null/nil pointer dereferences, off-by-one errors, missing error checks
- Identifying SQL injection, XSS, hardcoded credentials
- Catching missing test coverage for obvious cases
- Naming and readability feedback
- Detecting N+1 query patterns
Not good at:
- Understanding your specific business domain ("is this the right behavior?")
- Knowing your team's architectural decisions
- Catching subtle concurrency bugs that require deep state reasoning
- Identifying issues that require running the code
Use these tools to raise the floor, not replace human judgment.
Accessing AI APIs from Restricted Regions
If you're in a region where OpenAI or Anthropic APIs are restricted, FastSox Smart Mode routes API traffic through your chosen gateway automatically — no per-tool configuration required. Your local git commands and other traffic continue direct; only AI API calls take the proxy path.
Next Steps
- Add a `--style` flag to match different commit conventions (Angular, conventional commits, emoji-based)
- Build a VS Code extension that runs review on save
- Wire `pr-gen` into your CI pipeline to auto-populate PR descriptions on branch push
- Add a vector DB to store past code review feedback and improve consistency over time