
Malicious Repo Intake Script for AI Code Assistants

When a repository triggers code execution before you've reviewed a single line, your development environment becomes an attack surface. The TrustFall vulnerability in Claude Code highlights this risk: malicious repositories executing code with minimal user interaction.

You need a systematic intake process for any repository your AI coding assistants will access. Here's a validation script you can run before connecting AI tools to external code.

What This Script Does

This bash script performs automated security checks on repositories before AI code assistants analyze them. It flags common malicious patterns that exploit AI tool behavior: hidden executable files, suspicious automation hooks, and obfuscated scripts that trigger during repository operations.

The script targets the specific attack vector where malicious actors craft repositories to exploit AI assistants during code analysis. While it won't catch everything, it stops obvious supply chain attacks that rely on automated execution.

Prerequisites

Before running this script:

  • Unix-like environment (Linux, macOS, or WSL on Windows)
  • Git 2.30+ installed and accessible in your PATH
  • jq for JSON parsing (apt-get install jq or brew install jq)
  • Read access to the target repository (public or authenticated)
  • Quarantine directory where you'll clone suspicious repos for inspection

Define your organization's risk tolerance. The script flags potential issues — your team decides what constitutes a blocker versus a warning.
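A quick preflight for these dependencies can save a failed run later (a minimal sketch; adjust to your platform's package manager):

command -v git >/dev/null 2>&1 || { echo "git is required"; exit 1; }
command -v jq >/dev/null 2>&1 || { echo "jq is required (apt-get install jq / brew install jq)"; exit 1; }
git --version  # confirm 2.30 or later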

The Repository Intake Validation Script

#!/bin/bash
# repo-intake-validator.sh
# Validates external repositories before AI assistant access

set -euo pipefail

REPO_URL="${1:-}"
QUARANTINE_DIR="${QUARANTINE_DIR:-/tmp/repo-quarantine}"
# Absolute path so the report can still be updated after cd'ing into the clone
REPORT_FILE="$PWD/validation-report-$(date +%s).json"

if [[ -z "$REPO_URL" ]]; then
    echo "Usage: $0 <repository-url>"
    exit 1
fi

# Initialize report structure
cat > "$REPORT_FILE" <<EOF
{
  "repository": "$REPO_URL",
  "scan_timestamp": "$(date -u +%Y-%m-%dT%H:%M:%SZ)",
  "findings": [],
  "risk_score": 0
}
EOF

# Clone to quarantine
REPO_NAME=$(basename "$REPO_URL" .git)
CLONE_PATH="$QUARANTINE_DIR/$REPO_NAME"

echo "[1/6] Cloning to quarantine directory..."
git clone --depth 1 "$REPO_URL" "$CLONE_PATH" 2>/dev/null || {
    echo "ERROR: Clone failed. Check repository access."
    exit 1
}

cd "$CLONE_PATH"

# Check 1: Hidden executable files
echo "[2/6] Scanning for hidden executables..."
# -perm -u+x is portable to BSD/macOS find, which lacks GNU's -executable
HIDDEN_EXEC=$(find . -type f -name ".*" -perm -u+x | wc -l)
if [[ $HIDDEN_EXEC -gt 0 ]]; then
    jq '.findings += [{"check": "hidden_executables", "severity": "high", "count": '$HIDDEN_EXEC'}]' "$REPORT_FILE" > tmp.$$.json && mv tmp.$$.json "$REPORT_FILE"
    jq '.risk_score += 25' "$REPORT_FILE" > tmp.$$.json && mv tmp.$$.json "$REPORT_FILE"
fi

# Check 2: Hook scripts shipped with the repository
# A fresh clone never receives the remote's .git/hooks, so also scan for
# committed hook directories (e.g. .husky, .githooks) that tooling can activate
echo "[3/6] Checking for git hooks..."
ACTIVE_HOOKS=0
for HOOK_DIR in .git/hooks .husky .githooks; do
    [[ -d "$HOOK_DIR" ]] || continue
    ACTIVE_HOOKS=$((ACTIVE_HOOKS + $(find "$HOOK_DIR" -type f ! -name "*.sample" | wc -l)))
done
if [[ $ACTIVE_HOOKS -gt 0 ]]; then
    jq '.findings += [{"check": "git_hooks", "severity": "critical", "count": '$ACTIVE_HOOKS'}]' "$REPORT_FILE" > tmp.$$.json && mv tmp.$$.json "$REPORT_FILE"
    jq '.risk_score += 40' "$REPORT_FILE" > tmp.$$.json && mv tmp.$$.json "$REPORT_FILE"
fi

# Check 3: Suspicious automation files
echo "[4/6] Scanning automation configurations..."
SUSPICIOUS_FILES=0
for pattern in ".github/workflows/*.yml" ".github/workflows/*.yaml" ".gitlab-ci.yml" "Jenkinsfile" ".circleci/config.yml"; do
    if compgen -G "$pattern" > /dev/null; then
        # Check for curl/wget to external domains in CI configs
        if grep -rE "(curl|wget).*http" $pattern 2>/dev/null | grep -v "github.com\|gitlab.com" > /dev/null; then
            # Avoid ((var++)): it returns status 1 when the count is 0, which trips set -e
            SUSPICIOUS_FILES=$((SUSPICIOUS_FILES + 1))
        fi
    fi
done

if [[ $SUSPICIOUS_FILES -gt 0 ]]; then
    jq '.findings += [{"check": "suspicious_automation", "severity": "high", "count": '$SUSPICIOUS_FILES'}]' "$REPORT_FILE" > tmp.$$.json && mv tmp.$$.json "$REPORT_FILE"
    jq '.risk_score += 30' "$REPORT_FILE" > tmp.$$.json && mv tmp.$$.json "$REPORT_FILE"
fi

# Check 4: Obfuscated or encoded content
echo "[5/6] Detecting obfuscation patterns..."
OBFUSCATED=$(find . -type f \( -name "*.sh" -o -name "*.py" -o -name "*.js" \) -exec grep -l "eval\|exec\|base64" {} \; | wc -l)
if [[ $OBFUSCATED -gt 0 ]]; then
    jq '.findings += [{"check": "obfuscation_detected", "severity": "medium", "count": '$OBFUSCATED'}]' "$REPORT_FILE" > tmp.$$.json && mv tmp.$$.json "$REPORT_FILE"
    jq '.risk_score += 15' "$REPORT_FILE" > tmp.$$.json && mv tmp.$$.json "$REPORT_FILE"
fi

# Check 5: Package manager hooks
echo "[6/6] Checking package installation hooks..."
INSTALL_SCRIPTS=0
[[ -f "package.json" ]] && grep -q "postinstall\|preinstall" package.json && ((INSTALL_SCRIPTS++))
[[ -f "setup.py" ]] && grep -q "cmdclass" setup.py && ((INSTALL_SCRIPTS++))

if [[ $INSTALL_SCRIPTS -gt 0 ]]; then
    jq '.findings += [{"check": "install_hooks", "severity": "high", "count": '$INSTALL_SCRIPTS'}]' "$REPORT_FILE" > tmp.$$.json && mv tmp.$$.json "$REPORT_FILE"
    jq '.risk_score += 35' "$REPORT_FILE" > tmp.$$.json && mv tmp.$$.json "$REPORT_FILE"
fi

# Generate summary
FINAL_SCORE=$(jq -r '.risk_score' "$REPORT_FILE")
echo ""
echo "========================================="
echo "Repository: $REPO_NAME"
echo "Risk Score: $FINAL_SCORE/100"
echo "========================================="
jq -r '.findings[] | "[\(.severity | ascii_upcase)] \(.check): \(.count) instances"' "$REPORT_FILE"
echo ""

if [[ $FINAL_SCORE -ge 50 ]]; then
    echo "RECOMMENDATION: BLOCK - High risk repository"
    exit 2
elif [[ $FINAL_SCORE -ge 25 ]]; then
    echo "RECOMMENDATION: MANUAL REVIEW required"
    exit 1
else
    echo "RECOMMENDATION: PROCEED with caution"
    exit 0
fi

How to Customize It

Adjust risk scoring: The script assigns point values to each finding (25 for hidden executables, 40 for git hooks, etc.). Calibrate these based on your threat model. If your team frequently uses pre-commit hooks legitimately, reduce the git hooks penalty but add compensating checks.
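One low-friction way to calibrate is to hoist the weights into variables near the top of the script and reference them in the jq calls. The variable names below are illustrative, not part of the published script:

# Tunable risk weights; calibrate these to your threat model
SCORE_HIDDEN_EXEC=25
SCORE_GIT_HOOKS=40
SCORE_SUSPICIOUS_CI=30
SCORE_OBFUSCATION=15
SCORE_INSTALL_HOOKS=35

# Then, inside a check:
jq ".risk_score += $SCORE_GIT_HOOKS" "$REPORT_FILE" > tmp.$$.json && mv tmp.$$.json "$REPORT_FILE"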

Add domain allowlists: The suspicious automation check flags external curl/wget commands. Modify the grep exclusion to allowlist your internal artifact repositories:

grep -v "github.com\|gitlab.com\|your-artifactory.company.com"

Expand obfuscation detection: The current pattern catches eval, exec, and base64. Add language-specific patterns:

# For PowerShell obfuscation
-o -name "*.ps1" \) -exec grep -l "IEX\|Invoke-Expression\|-enc" {} \;

# For JavaScript packing (use grep -lE so {2} is an interval and \\x matches a literal \x escape)
-exec grep -lE '\\x[0-9a-f]{2}|unescape' {} \;

Integrate with your workflow: Wrap this script in your repository approval process. For SOC 2 Type II controls around change management, this becomes evidence of systematic third-party code review.
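For example, a minimal sketch of retaining each report as review evidence (EVIDENCE_DIR is a hypothetical location; use wherever your audit artifacts live):

EVIDENCE_DIR="${EVIDENCE_DIR:-/srv/security-evidence/repo-intake}"  # hypothetical path
mkdir -p "$EVIDENCE_DIR"
cp validation-report-*.json "$EVIDENCE_DIR/"
echo "$(date -u +%Y-%m-%dT%H:%M:%SZ) $REPO_URL reviewed-by=$USER" >> "$EVIDENCE_DIR/decisions.log"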

Set organizational thresholds: The exit codes map to recommendations (0 = proceed, 1 = review, 2 = block). Connect these to your ticketing system:

./repo-intake-validator.sh "$REPO_URL" || {
    RESULT=$?
    if [[ $RESULT -eq 2 ]]; then
        # Auto-create security review ticket
        curl -X POST -H "Content-Type: application/json" "$JIRA_API/issue" \
            -d '{"fields":{"project":{"key":"SEC"},"summary":"High-risk repo flagged: '"$REPO_URL"'"}}'
    fi
}

Validation Steps

After customizing the script, test it against known-good and known-bad repositories:

1. Baseline with your internal repos: Run against three of your team's repositories. You should see risk scores under 25. If not, you've miscalibrated — your legitimate practices are triggering false positives.
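For example (the repository URLs are placeholders for your own):

for repo in git@github.com:your-org/service-a.git \
            git@github.com:your-org/service-b.git \
            git@github.com:your-org/service-c.git; do
    ./repo-intake-validator.sh "$repo" || echo "Calibration check needed: $repo"
done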

2. Test with a deliberately risky repo: Create a test repository with a committed hook script (for example .husky/post-checkout) and a package.json with a postinstall hook. Hooks placed directly in .git/hooks never survive a clone, so commit the hook directory itself. The risk score should exceed 50 and the script should return exit code 2.
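One way to build that fixture locally (the validator accepts a file:// URL, so no hosting is needed; the hook body is a harmless placeholder):

mkdir risky-test-repo && cd risky-test-repo && git init -q
mkdir .husky
printf '#!/bin/sh\necho "hook executed"\n' > .husky/post-checkout
chmod +x .husky/post-checkout
printf '{"name":"risky","version":"1.0.0","scripts":{"postinstall":"echo pwned"}}\n' > package.json
git add -A && git commit -qm "risky test fixture"
cd .. && ./repo-intake-validator.sh "file://$PWD/risky-test-repo"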

3. Verify JSON output: Check that validation-report-*.json files are valid JSON and contain all expected fields:

jq empty validation-report-*.json && echo "Valid JSON" || echo "Parse error"
jq -e '.repository and .scan_timestamp and (.risk_score | type == "number")' validation-report-*.json > /dev/null && echo "Expected fields present"

4. Confirm quarantine isolation: Ensure $QUARANTINE_DIR is on a separate filesystem or has restricted execution permissions. On Linux:

mount | grep "$(df "$QUARANTINE_DIR" | tail -1 | awk '{print $1}')"
# Output should include the 'noexec' mount option
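If the quarantine path isn't on a noexec mount, one option is a dedicated tmpfs (Linux; the size here is arbitrary):

sudo mkdir -p /tmp/repo-quarantine
sudo mount -t tmpfs -o noexec,nosuid,nodev,size=512m tmpfs /tmp/repo-quarantine
export QUARANTINE_DIR=/tmp/repo-quarantine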

5. Integration test: Verify that blocked repositories never reach the assistant's workspace. The script should fail before any AI tool initialization.
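A minimal gate for that, assuming the assistant is launched from the same shell (AI_ASSISTANT_CMD and ~/workspace are stand-ins for your own launch command and working directory):

if ./repo-intake-validator.sh "$REPO_URL"; then
    git clone "$REPO_URL" ~/workspace/"$(basename "$REPO_URL" .git)"  # repo reaches the workspace only after passing
    $AI_ASSISTANT_CMD  # placeholder for whatever command starts your assistant
else
    echo "Repository blocked or flagged for review; assistant not started."
fi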

This script doesn't replace human judgment. When vulnerabilities like TrustFall emerge, they exploit the gap between automated tooling and human review. This validation process narrows that gap by forcing every external repository through a security checkpoint before your AI assistants touch it.
