Troubleshooting Guide

This guide helps diagnose and resolve common issues with the RPi Generator Control system.

Quick Diagnostics

System Health Check

Run these commands to quickly assess system status:

# GenMaster health
curl -s https://your-genmaster/api/health | jq .

# GenSlave health (via GenMaster)
curl -s https://your-genmaster/api/genslave/health \
  -H "Authorization: Bearer YOUR_TOKEN" | jq .

# Container status
docker compose ps

# Recent logs
docker compose logs --tail=50

Expected Healthy Response

{
  "status": "healthy",
  "generator_running": false,
  "slave_connected": true,
  "slave_armed": true,
  "victron_signal": "inactive",
  "database": "connected",
  "redis": "connected"
}
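
The comparison against the expected response can be automated. The following is a minimal sketch, assuming the health endpoint returns exactly the fields shown above; the function name find_problems is made up for illustration. It deliberately ignores generator_running and victron_signal, since those change during normal operation.

```python
# Values the guide lists for a healthy system. "generator_running" and
# "victron_signal" are left out because they vary in normal operation.
EXPECTED = {
    "status": "healthy",
    "slave_connected": True,
    "slave_armed": True,
    "database": "connected",
    "redis": "connected",
}


def find_problems(health: dict) -> list:
    """Return one message per field that differs from its healthy value."""
    problems = []
    for key, expected in EXPECTED.items():
        actual = health.get(key)
        if actual != expected:
            problems.append(f"{key}: expected {expected!r}, got {actual!r}")
    return problems


# Example with a hand-written response in which Redis is down.
sample = {
    "status": "healthy",
    "slave_connected": True,
    "slave_armed": True,
    "database": "connected",
    "redis": "disconnected",
}
for problem in find_problems(sample):
    print(problem)
```

In practice the dictionary would come from parsing the output of the curl command above with json.loads.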

GenMaster Issues

Container Won't Start

Symptoms: Container exits immediately or keeps restarting.

Check logs:

docker compose logs genmaster

Common causes:

  1. Database not ready:

    Error: Connection refused to postgres:5432
    
    Fix: Ensure postgres container is healthy:
    docker compose ps postgres
    docker compose logs postgres
    

  2. Missing environment variables:

    Error: GENSLAVE_API_SECRET is required
    
    Fix: Check .env file has all required variables.

  3. Port already in use:

    Error: Address already in use :8000
    
    Fix: Stop conflicting service or change port.
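
For the port conflict, one quick way to confirm that something is already listening is to attempt a TCP connection. This is a rough sketch, not part of the project; port_in_use is a hypothetical helper, and it only detects listeners that accept connections on the given host.

```python
import socket


def port_in_use(port: int, host: str = "127.0.0.1", timeout: float = 1.0) -> bool:
    """Return True if something accepts TCP connections on host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(timeout)
        # connect_ex returns 0 on success and an errno value otherwise.
        return sock.connect_ex((host, port)) == 0


# Port 8000 is the one named in the error message above.
print("port 8000 in use:", port_in_use(8000))
```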

Database Migrations Failed

Symptoms: App crashes with database schema errors.

Fix:

# Run migrations manually
docker compose exec genmaster alembic upgrade head

# Check migration status
docker compose exec genmaster alembic current

API Returns 500 Errors

Check application logs:

docker compose logs genmaster | grep -i error

Common causes:

  - Database connection lost
  - Redis connection lost
  - GenSlave unreachable (for proxy endpoints)

High Memory Usage

Check container stats:

docker stats genmaster

Fix:

  - Restart the container: docker compose restart genmaster
  - Check the logs for signs of a memory leak
  - Increase the container memory limit if needed


GenSlave Issues

Container Won't Start

Check logs:

docker compose logs genslave

Common causes:

  1. GPIO access denied:

    Error: Unable to determine board revision
    
    Fix: Ensure privileged: true in docker-compose.yaml.

  2. Automation Hat not detected:

    Warning: automationhat library loaded but HAT not responding
    
    Fix:

    - Check HAT is properly seated
    - Enable SPI: sudo raspi-config → Interface Options → SPI
    - Reboot Pi

Mock Mode When HAT is Present

Symptoms: Logs show "Mock HAT mode" even with hardware.

Check SPI:

ls /dev/spidev*
# Should show /dev/spidev0.0, /dev/spidev0.1

Enable SPI:

sudo raspi-config
# Interface Options → SPI → Enable
sudo reboot

Relay Not Clicking

  1. Check armed status:

    curl http://localhost:8001/api/relay/state \
      -H "X-API-Key: YOUR_SECRET"
    
    Must be "armed": true.

  2. Check power supply:

    - Pi Zero needs stable 5V 2.5A
    - Relay may not click with insufficient power

  3. Test relay directly:

    import time
    import automationhat

    automationhat.relay.one.on()   # Should click
    time.sleep(1)                  # Pause so the two clicks are distinct
    automationhat.relay.one.off()  # Should click again
    


Communication Issues

GenSlave Not Reachable

From GenMaster, test connection:

# Via Tailscale hostname
ping genslave

# Test API
curl http://genslave:8001/api/health \
  -H "X-API-Key: YOUR_SECRET"

Common causes:

  1. Tailscale not connected:

    tailscale status
    
    Both devices should show as connected.

  2. Wrong IP/hostname in config: Check GENSLAVE_HOST in GenMaster's .env.

  3. Firewall blocking:

    # On GenSlave
    sudo ufw status
    sudo ufw allow 8001
    

  4. GenSlave container not running:

    # On GenSlave Pi
    docker compose ps
    

Heartbeat Failures

Symptoms: GenSlave shows "Failsafe triggered" or frequent disconnections.

Check heartbeat status:

curl http://genslave:8001/api/failsafe \
  -H "X-API-Key: YOUR_SECRET"

Common causes:

  1. Network latency:

    - Heartbeat timeout too short
    - Increase FAILSAFE_TIMEOUT_SECONDS

  2. GenMaster overloaded:

    - Check GenMaster CPU/memory
    - Check database performance

  3. Intermittent network:

    - Check WiFi signal strength
    - Consider wired connection
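
To judge whether the timeout is too short for the network, it can help to measure the largest gap between heartbeats that actually arrived. The sketch below assumes you have already extracted heartbeat timestamps from the logs; the function names and the 2x safety margin are illustrative choices, not project settings.

```python
from datetime import datetime


def max_heartbeat_gap(timestamps) -> float:
    """Return the largest gap in seconds between consecutive heartbeats."""
    ordered = sorted(timestamps)
    gaps = [(b - a).total_seconds() for a, b in zip(ordered, ordered[1:])]
    return max(gaps, default=0.0)


def timeout_is_too_tight(timestamps, timeout_seconds: float, margin: float = 2.0) -> bool:
    """True if the worst observed gap, scaled by margin, exceeds the timeout."""
    return max_heartbeat_gap(timestamps) * margin > timeout_seconds


# Hypothetical arrival times; the third beat follows a 31-second stall.
beats = [
    datetime(2026, 5, 1, 12, 0, 0),
    datetime(2026, 5, 1, 12, 0, 10),
    datetime(2026, 5, 1, 12, 0, 41),
    datetime(2026, 5, 1, 12, 0, 50),
]
print("worst gap (s):", max_heartbeat_gap(beats))
print("timeout of 30 s too tight:", timeout_is_too_tight(beats, 30))
```

If the check reports a tight timeout, raising FAILSAFE_TIMEOUT_SECONDS is the adjustment the list above suggests.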

API Authentication Errors

Symptoms: 401 Unauthorized responses.

Check:

  1. API secret matches:

    - GenMaster: GENSLAVE_API_SECRET
    - GenSlave: GENSLAVE_API_SECRET
    - Must be identical

  2. Header format (GenSlave API):

    # Correct
    -H "X-API-Key: your-secret"
    
    # Wrong
    -H "Authorization: your-secret"
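
A mismatch between the two secrets is often caused by stray quotes or trailing whitespace. The following sketch compares the value from two .env files passed in as text; parse_env and secrets_match are hypothetical helpers, and the parser only handles plain KEY=VALUE lines.

```python
import hmac


def parse_env(text: str) -> dict:
    """Parse simple KEY=VALUE lines, skipping blanks and # comments."""
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop surrounding whitespace and quote characters.
        values[key.strip()] = value.strip().strip("'\"")
    return values


def secrets_match(master_env: str, slave_env: str,
                  key: str = "GENSLAVE_API_SECRET") -> bool:
    """True only if both files define key with the same non-empty value."""
    a = parse_env(master_env).get(key, "")
    b = parse_env(slave_env).get(key, "")
    return bool(a) and hmac.compare_digest(a.encode(), b.encode())


# Placeholder contents standing in for the two .env files.
print(secrets_match("GENSLAVE_API_SECRET=abc123\n", 'GENSLAVE_API_SECRET="abc123"\n'))
```

The function never prints the secret itself, so its result is safe to paste into an issue report.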
    

Generator Control Issues

Generator Won't Start

Run through this checklist:

  1. Is relay armed?

    curl https://your-genmaster/api/generator/state \
      -H "Authorization: Bearer YOUR_TOKEN" | jq .armed
    
    Must be true.

  2. Is there an active override?

    curl https://your-genmaster/api/override/status \
      -H "Authorization: Bearer YOUR_TOKEN"
    
    force_stop blocks automatic starts.

  3. Is runtime lockout active?

    curl https://your-genmaster/api/generator/runtime-limits \
      -H "Authorization: Bearer YOUR_TOKEN" | jq .lockout_active
    

  4. Is GenSlave connected? Check slave_connected in health endpoint.

  5. Is GenSlave armed? GenSlave must be armed to execute relay commands.
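
The checklist above can be condensed into a single function once the four API responses have been fetched and parsed. This is only a sketch: armed, lockout_active, slave_connected and slave_armed appear in this guide, but the "override" key used for the override status is an assumed field name and may differ in the real response.

```python
def start_blockers(health: dict, state: dict, override: dict, limits: dict) -> list:
    """Return the reasons, in checklist order, why a start would be refused."""
    blockers = []
    if not state.get("armed"):
        blockers.append("relay is not armed")
    # "override" is a hypothetical key for the override status response.
    if override.get("override") == "force_stop":
        blockers.append("force_stop override is active")
    if limits.get("lockout_active"):
        blockers.append("runtime lockout is active")
    if not health.get("slave_connected"):
        blockers.append("GenSlave is not connected")
    if not health.get("slave_armed"):
        blockers.append("GenSlave is not armed")
    return blockers


# Example: everything is fine except an active runtime lockout.
print(start_blockers(
    health={"slave_connected": True, "slave_armed": True},
    state={"armed": True},
    override={},
    limits={"lockout_active": True},
))
```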

Generator Won't Stop

  1. Check if force_run override is active:

    curl https://your-genmaster/api/override/status \
      -H "Authorization: Bearer YOUR_TOKEN"
    

  2. Try force stop via GenSlave:

    curl -X POST http://genslave:8001/api/relay/off \
      -H "X-API-Key: YOUR_SECRET" \
      -H "Content-Type: application/json" \
      -d '{"force": true}'
    

State Mismatch Between Master and Slave

Symptoms: GenMaster shows running, GenSlave shows stopped (or vice versa).

This should self-heal via heartbeat. If not:

  1. Check heartbeat is working:

    docker compose logs genmaster | grep heartbeat
    

  2. Force reconciliation: Restart GenMaster to trigger startup reconciliation.

  3. Manual sync:

    # Set GenSlave to match desired state
    curl -X POST http://genslave:8001/api/relay/off \
      -H "X-API-Key: YOUR_SECRET" \
      -H "Content-Type: application/json" \
      -d '{"force": true}'
    


Victron Integration Issues

Signal Not Detected

Check GPIO status:

docker compose logs genmaster | grep -i victron
docker compose logs genmaster | grep -i gpio

Common causes:

  1. GPIO not accessible:

    - Pi 5 needs privileged: true and user: root
    - Check device mappings for gpiochip

  2. Wiring issue:

    - Verify connection to GPIO17 and GND
    - Test with multimeter

  3. Mock mode enabled:

    - Check MOCK_GPIO environment variable

Signal Stuck Active/Inactive

  1. Check Cerbo relay:

    - Look for relay LED indicator
    - Listen for relay click

  2. Test GPIO manually:

    from gpiozero import Button
    btn = Button(17, pull_up=True)
    print(btn.is_pressed)  # True = signal active
    

  3. Check for shorts:

    - Disconnect wire and test again

Notification Issues

Notifications Not Sending

  1. Check channel is enabled:

    curl https://your-genmaster/api/notifications/channels \
      -H "Authorization: Bearer YOUR_TOKEN" | jq '.[].enabled'
    

  2. Check event configuration:

    curl https://your-genmaster/api/system-notifications \
      -H "Authorization: Bearer YOUR_TOKEN"
    

  3. Test channel:

    curl -X POST https://your-genmaster/api/notifications/channels/1/test \
      -H "Authorization: Bearer YOUR_TOKEN"
    

  4. Check logs:

    docker compose logs genmaster | grep -i notification
    

GenSlave Failsafe Not Notifying

  1. Check Apprise URLs configured:

    curl https://your-genmaster/api/genslave/notifications \
      -H "Authorization: Bearer YOUR_TOKEN"
    

  2. Check notifications enabled: Verify enabled: true in response.

  3. Check cooldown:

    - Recent notification may have set cooldown
    - Clear cooldown to test again

Database Issues

PostgreSQL Won't Start

Check logs:

docker compose logs postgres

Common causes:

  1. Disk full:

    df -h
    

  2. Corrupt data:

    - Restore from backup
    - Or delete volume (loses data):

    docker compose down -v
    docker compose up -d
    

  3. Permission issues:

    docker compose exec postgres ls -la /var/lib/postgresql/data
    

Connection Pool Exhausted

Symptoms: "Too many connections" errors.

Fix:

# Restart to reset connections
docker compose restart genmaster

# Long-term: increase pool size in config

Redis Connection Issues

Check Redis:

docker compose exec redis redis-cli ping
# Should return: PONG

Fix:

docker compose restart redis


Docker Issues

Container Keeps Restarting

Check exit code:

docker compose ps -a
# Look for exit code

Common exit codes:

  - 0: Clean exit
  - 1: Application error
  - 137: OOM killed (out of memory)
  - 139: Segfault
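
Exit codes above 128 follow the shell convention of 128 plus the signal number, which is why 137 maps to SIGKILL (9) and 139 to SIGSEGV (11). The sketch below decodes a code on that basis; describe_exit_code is an illustrative helper. Note that 137 only says the process received SIGKILL, which is usually but not always the OOM killer.

```python
import signal


def describe_exit_code(code: int) -> str:
    """Translate a container exit code into a short description."""
    if code == 0:
        return "clean exit"
    if code > 128:
        # Codes above 128 encode the terminating signal as 128 + signum.
        try:
            name = signal.Signals(code - 128).name
        except ValueError:
            return f"killed by unknown signal {code - 128}"
        return f"killed by {name}"
    return f"application error (exit code {code})"


for code in (0, 1, 137, 139):
    print(code, "->", describe_exit_code(code))
```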

Out of Disk Space

# Check disk usage
df -h

# Remove stopped containers, unused networks, and all unused images
docker system prune -a

# Remove unused volumes (permanently deletes their data)
docker volume prune

Permission Denied

If you see "permission denied while trying to connect to the Docker daemon socket" (or "Got permission denied while trying to connect to the Docker daemon"), choose ONE of the following based on your situation:

For one-off commands (recommended for occasional admin):

sudo docker compose ...    # or `sudo docker ...`

For daily use on a trusted workstation (lets you run docker without sudo):

sudo usermod -aG docker $USER
# Log out and log back in for the group change to take effect.

Do NOT chmod 666 /var/run/docker.sock

Making the Docker socket world-writable lets any local user (including any compromised low-privilege process) control Docker, which is effectively root on the host. This is a common piece of bad advice on Stack Overflow; ignore it.

Why is docker group membership 'effectively root'?

Anyone who can talk to the Docker daemon can spin up a privileged container that mounts the host's / and gives them a root shell. Only add yourself (or a service user) to the docker group on machines where you'd already be trusted as root.


Log Analysis

Viewing Logs

# All containers
docker compose logs

# Specific container
docker compose logs genmaster

# Follow logs
docker compose logs -f genmaster

# Last N lines
docker compose logs --tail=100 genmaster

# With timestamps
docker compose logs -t genmaster

Filtering Logs

# Errors only
docker compose logs genmaster 2>&1 | grep -i error

# Specific component
docker compose logs genmaster | grep -i heartbeat

# Time range (requires timestamps)
docker compose logs -t genmaster | grep "2026-05"
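
A grep on a date prefix cannot express an arbitrary start and end time. As a sketch, the filter below keeps only lines whose timestamp falls inside a window. It assumes each line carries an ISO-style timestamp such as 2026-05-03T14:05:12, which is what the -t flag is expected to add; the sample lines are invented.

```python
import re
from datetime import datetime

# Matches the date and time portion, ignoring fractional seconds and zone.
TIMESTAMP = re.compile(r"\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}")


def lines_between(lines, start: datetime, end: datetime):
    """Yield log lines whose first timestamp falls within [start, end]."""
    for line in lines:
        match = TIMESTAMP.search(line)
        if not match:
            continue  # Skip lines without a recognizable timestamp.
        stamp = datetime.strptime(match.group(), "%Y-%m-%dT%H:%M:%S")
        if start <= stamp <= end:
            yield line


# Invented sample lines in the assumed format.
sample_lines = [
    "genmaster-1  | 2026-05-03T14:05:12.123456789Z Heartbeat sent to GenSlave",
    "genmaster-1  | 2026-05-03T16:30:00.000000000Z GenSlave connection timeout",
]
window = (datetime(2026, 5, 3, 14, 0, 0), datetime(2026, 5, 3, 15, 0, 0))
for line in lines_between(sample_lines, *window):
    print(line)
```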

Common Log Patterns

Healthy patterns:

Heartbeat sent to GenSlave
Generator started - trigger: victron
GPIO monitor started on pin 17

Warning patterns:

Relay ON requested but relay not armed
Victron signal active but relay not armed
GenSlave connection timeout

Error patterns:

Failed to connect to GenSlave
Database connection lost
Failed to send notification
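
The patterns above can be used to sort a saved log into healthy, warning, and error lines. This is a minimal sketch that matches on substrings taken from the lists above; the exact wording of real log lines may differ, so treat the pattern table as a starting point.

```python
from collections import Counter

# Substrings taken from the pattern lists in this guide.
PATTERNS = {
    "error": [
        "Failed to connect to GenSlave",
        "Database connection lost",
        "Failed to send notification",
    ],
    "warning": [
        "relay not armed",
        "GenSlave connection timeout",
    ],
    "healthy": [
        "Heartbeat sent to GenSlave",
        "Generator started",
        "GPIO monitor started",
    ],
}


def classify(line: str) -> str:
    """Return 'error', 'warning', 'healthy', or 'unknown' for a log line."""
    lowered = line.lower()
    # Check the most severe level first so it wins if several match.
    for level in ("error", "warning", "healthy"):
        if any(pattern.lower() in lowered for pattern in PATTERNS[level]):
            return level
    return "unknown"


def summarize(lines) -> Counter:
    """Count how many lines fall into each level."""
    return Counter(classify(line) for line in lines)


print(summarize([
    "Heartbeat sent to GenSlave",
    "Relay ON requested but relay not armed",
    "Failed to send notification",
]))
```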


Getting Help

If you can't resolve an issue:

  1. Collect diagnostics:

    # Save to file
    docker compose logs > logs.txt
    docker compose ps >> logs.txt
    docker stats --no-stream >> logs.txt
    

  2. Check GitHub Issues: github.com/rjsears/pizero_generator_control/issues

  3. Open a new issue with:

    - Description of problem
    - Steps to reproduce
    - Relevant log excerpts
    - System information (Pi model, OS version)
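
The diagnostics commands from step 1 can also be gathered by a small script that keeps going when one command fails, which is useful if Docker itself is misbehaving. This is a sketch under that assumption; collect is a hypothetical helper, and the command list simply mirrors step 1.

```python
import subprocess

# The same three commands as in step 1.
COMMANDS = [
    ["docker", "compose", "logs"],
    ["docker", "compose", "ps"],
    ["docker", "stats", "--no-stream"],
]


def collect(commands, out_path: str) -> None:
    """Run each command and write its stdout and stderr to out_path."""
    with open(out_path, "w", encoding="utf-8") as out:
        for cmd in commands:
            out.write(f"$ {' '.join(cmd)}\n")
            try:
                result = subprocess.run(
                    cmd, capture_output=True, text=True, timeout=120
                )
                out.write(result.stdout + result.stderr)
            except (OSError, subprocess.TimeoutExpired) as exc:
                # Record the failure and continue with the next command.
                out.write(f"[could not run: {exc}]\n")
            out.write("\n")


# To gather everything into one file:
# collect(COMMANDS, "logs.txt")
```

Review the resulting file for secrets such as API keys before attaching it to an issue.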

Recovery Procedures

Lost Network After WiFi Change

A bad static IP, gateway, or subnet on a saved WiFi profile can leave a device unreachable over the network. Recovery options (local console, nmcli, SD card edit, Ethernet fallback) are documented separately:

→ See Network Recovery.

Full System Reset

If all else fails:

# Stop everything
docker compose down

# Remove all data (WARNING: loses all history)
docker compose down -v

# Pull fresh images
docker compose pull

# Start fresh
docker compose up -d

# Run migrations
docker compose exec genmaster alembic upgrade head

Restore from Backup

# Stop services
docker compose stop genmaster

# Restore database
docker compose exec -T postgres pg_restore \
  -U postgres -d genmaster < backup.dump

# Start services
docker compose start genmaster

Emergency Generator Stop

If automation isn't working, stop the generator manually:

# Direct to GenSlave (bypasses GenMaster)
curl -X POST http://genslave:8001/api/relay/off \
  -H "X-API-Key: YOUR_SECRET" \
  -H "Content-Type: application/json" \
  -d '{"force": true}'

Or physically disconnect power to the Automation Hat relay.