# PiKVM Auto-Restart Monitor

Automatic monitoring and recovery agent for PiKVM that watches a connected host and performs a hard reset if the system becomes unresponsive due to thermal throttling or other issues.

## Features

- **Continuous Monitoring**: Pings the host every 3 minutes (configurable)
- **Fallback Detection**: Pings the host IP first; if that fails, pings the gateway as a network-level connectivity check
- **Auto Recovery**: Performs a hard reset after 15 minutes of downtime (configurable)
- **GPIO Control**: Simulates power button presses: a long press to force power-off, followed by a short press to power back on
- **Cool-Down Period**: Waits 90 seconds between power-off and power-on to let the system cool (configurable)
- **Docker Native**: Runs as a Docker Compose service on PiKVM
- **Comprehensive Logging**: Tracks all events and reboot history

## Hardware Requirements

- PiKVM with Raspberry Pi (4 or better recommended)
- GPIO pins configured for power button control
  - Typically BCM GPIO 17 for power button
  - Check your PiKVM documentation to confirm
- Network access to the host and gateway

## Quick Start

### 1. Clone or Copy to PiKVM

```bash
git clone <this-repo> /home/pikvm/plex-restart
cd /home/pikvm/plex-restart
```

### 2. Configure Environment

Copy the example config and update it with your values:

```bash
cp .env.example .env
```

Edit `.env` with your host details:

```env
HOST_IP=192.168.1.10          # Your host's IP address
GATEWAY_IP=192.168.1.1        # Your network gateway IP
POWER_BUTTON_GPIO=17          # GPIO pin (confirm with PiKVM docs)
```

### 3. Deploy with Docker Compose

```bash
docker-compose up -d
```

Verify it's running:

```bash
docker-compose logs -f pikvm-monitor
```

## Configuration Options

All settings can be configured via environment variables in `.env`:

| Variable | Default | Description |
|----------|---------|-------------|
| `HOST_IP` | `192.168.1.10` | IP address of host to monitor |
| `GATEWAY_IP` | `192.168.1.1` | Fallback gateway for connectivity check |
| `PING_INTERVAL` | `180` | Seconds between pings (3 min) |
| `DOWNTIME_THRESHOLD` | `15` | Minutes of downtime before reset |
| `POWER_BUTTON_GPIO` | `17` | BCM GPIO pin for power button |
| `LONG_PRESS_DURATION` | `5` | Seconds to hold for power down |
| `SHORT_PRESS_DURATION` | `1` | Seconds to hold for power on |
| `WAIT_BEFORE_REBOOT` | `90` | Seconds to wait between power down/up |

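As a rough illustration of how these variables might be read, the sketch below builds a typed config object from the environment, falling back to the defaults in the table. The names `MonitorConfig` and `load_config` are hypothetical and are not taken from `monitor.py`; the actual loader may differ.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class MonitorConfig:
    """Hypothetical container for the settings listed in the table."""
    host_ip: str
    gateway_ip: str
    ping_interval: int           # seconds
    downtime_threshold: int      # minutes
    power_button_gpio: int       # BCM pin number
    long_press_duration: float   # seconds
    short_press_duration: float  # seconds
    wait_before_reboot: int      # seconds


def load_config(env: Optional[Mapping[str, str]] = None) -> MonitorConfig:
    """Read settings from `env` (default: os.environ), using the table's defaults."""
    env = os.environ if env is None else env
    return MonitorConfig(
        host_ip=env.get("HOST_IP", "192.168.1.10"),
        gateway_ip=env.get("GATEWAY_IP", "192.168.1.1"),
        ping_interval=int(env.get("PING_INTERVAL", "180")),
        downtime_threshold=int(env.get("DOWNTIME_THRESHOLD", "15")),
        power_button_gpio=int(env.get("POWER_BUTTON_GPIO", "17")),
        long_press_duration=float(env.get("LONG_PRESS_DURATION", "5")),
        short_press_duration=float(env.get("SHORT_PRESS_DURATION", "1")),
        wait_before_reboot=int(env.get("WAIT_BEFORE_REBOOT", "90")),
    )
```

Passing a plain dictionary, for example `load_config({"DOWNTIME_THRESHOLD": "9"})`, overrides only that setting and keeps every other default, which is convenient for checking a `.env` change before deploying it.
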
### Example: Faster Recovery

To recover in 9 minutes instead of 15:

```env
PING_INTERVAL=180             # 3 minutes
DOWNTIME_THRESHOLD=9          # 9 minutes
```

This triggers a reset after 3 consecutive failed pings (3 × 180 s = 9 minutes).

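The arithmetic can be checked for any pair of settings. The sketch below assumes the trigger rule from the Architecture section, with `DOWNTIME_THRESHOLD` converted from minutes to seconds before it is compared against `counter × PING_INTERVAL`. The function name is hypothetical.

```python
def failed_pings_before_reset(ping_interval_s: int, downtime_threshold_min: int) -> int:
    """Smallest number of consecutive failed pings that reaches the threshold.

    Assumes a reset fires once counter * ping_interval_s >= threshold in seconds.
    """
    if ping_interval_s <= 0:
        raise ValueError("ping_interval_s must be positive")
    threshold_s = downtime_threshold_min * 60
    # Integer ceiling division avoids floating-point rounding.
    needed = -(-threshold_s // ping_interval_s)
    return max(1, needed)
```

With the values above, `failed_pings_before_reset(180, 9)` gives 3, and the defaults (`180`, `15`) give 5. A threshold that is not a multiple of the interval rounds up, so `(180, 10)` needs 4 failed pings, which is 12 minutes rather than 10.
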
## Monitoring & Logs

### View Live Logs

```bash
docker-compose logs -f pikvm-monitor
```

### Inside Container

```bash
docker exec pikvm-monitor tail -f /var/log/pikvm-monitor.log
```

### Reset History

```bash
docker exec pikvm-monitor cat /var/lib/pikvm-monitor/state.txt
```

## Manual Control

### Stop Monitor

```bash
docker-compose down
```

### Restart Monitor

```bash
docker-compose restart pikvm-monitor
```

### View Status

```bash
docker-compose ps
```

## Troubleshooting

### Monitor Not Starting

Check logs:
```bash
docker-compose logs pikvm-monitor
```

Common issues:
- GPIO pins in use by another service
- Incorrect GPIO pin number
- Network connectivity issues

### Not Detecting Host Down

Verify connectivity manually from the PiKVM:
```bash
ping <HOST_IP>
ping <GATEWAY_IP>
```

Check:
- Host IP is correct in `.env`
- Both IPs are reachable from the PiKVM
- PiKVM has network access

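The same check can be scripted. The sketch below follows the order described under Features (host first, gateway only if the host fails) and prints a hint for each outcome. The function names are hypothetical, the `ping` flags assume the Linux `ping` command, and the interpretation of each outcome is a diagnostic suggestion rather than the monitor's own decision logic.

```python
import subprocess
from typing import Callable, Optional, Tuple


def ping_once(ip: str, timeout_s: int = 2) -> bool:
    """Send a single ICMP echo with the system `ping` (Linux-style flags)."""
    result = subprocess.run(
        ["ping", "-c", "1", "-W", str(timeout_s), ip],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode == 0


def probe(
    host_ip: str,
    gateway_ip: str,
    ping: Callable[[str], bool] = ping_once,
) -> Tuple[bool, Optional[bool]]:
    """Ping the host; ping the gateway only when the host does not answer."""
    host_ok = ping(host_ip)
    gateway_ok = None if host_ok else ping(gateway_ip)
    return host_ok, gateway_ok


def describe(host_ok: bool, gateway_ok: Optional[bool]) -> str:
    """Turn a probe result into a troubleshooting hint."""
    if host_ok:
        return "host reachable"
    if gateway_ok:
        return "host unreachable, gateway reachable: host is down or HOST_IP is wrong"
    return "host and gateway unreachable: check the PiKVM's own network connection"
```

Running `print(describe(*probe("192.168.1.10", "192.168.1.1")))` on the PiKVM, with your own addresses substituted, reports which of the three cases applies.
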
### Power Button Not Working

1. Verify GPIO pin number in PiKVM documentation
2. Update `POWER_BUTTON_GPIO` in `.env`
3. Test GPIO access (replace `17` with your pin number):
   ```bash
   docker exec pikvm-monitor python3 -c "from gpiozero import Button; b = Button(17); print('GPIO working')"
   ```

## Architecture

The monitor runs as a single long-running process:

```
Startup
  ↓
Load Configuration
  ↓
Every PING_INTERVAL seconds (default 180):
  ├─ Ping HOST_IP
  │   └─ If fails, ping GATEWAY_IP (fallback)
  ├─ If alive: Reset counter
  └─ If down: Increment counter
      └─ If counter × PING_INTERVAL ≥ DOWNTIME_THRESHOLD × 60:
          ├─ Long press power button (5 sec)
          ├─ Wait 90 seconds
          ├─ Short press power button (1 sec)
          └─ Reset counter
  ↓
Repeat
```

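The counter logic in the diagram can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the code in `monitor.py`: `DowntimeTracker` and `run` are hypothetical names, the threshold is assumed to be converted from minutes to seconds before the comparison, and the connectivity check and reset action are passed in as plain callables so the loop can be exercised without a network or GPIO.

```python
import time
from typing import Callable, Optional


class DowntimeTracker:
    """Counts consecutive failed checks and decides when a reset is due."""

    def __init__(self, ping_interval_s: int = 180, downtime_threshold_min: int = 15):
        self.ping_interval_s = ping_interval_s
        self.threshold_s = downtime_threshold_min * 60
        self.failures = 0

    def record(self, alive: bool) -> bool:
        """Record one check; return True when a reset should be performed."""
        if alive:
            self.failures = 0
            return False
        self.failures += 1
        if self.failures * self.ping_interval_s >= self.threshold_s:
            self.failures = 0  # start counting again after the reset
            return True
        return False


def run(
    check_alive: Callable[[], bool],
    hard_reset: Callable[[], None],
    tracker: DowntimeTracker,
    sleep: Callable[[float], None] = time.sleep,
    max_cycles: Optional[int] = None,
) -> None:
    """Main loop: check, maybe reset, sleep. `max_cycles=None` runs forever."""
    cycles = 0
    while max_cycles is None or cycles < max_cycles:
        if tracker.record(check_alive()):
            hard_reset()
        sleep(tracker.ping_interval_s)
        cycles += 1
```

Keeping the counter in its own class separates the timing decision from the side effects, so the threshold behaviour can be verified by feeding it a sequence of `True`/`False` results.
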
## Performance Considerations

- **CPU**: Minimal (~5-10% during checks)
- **Memory**: ~50-80 MB
- **Network**: Single ICMP ping every 3 minutes
- **GPIO**: Brief pulses only during reset

Safe to run alongside other PiKVM services.

## Development

### Local Testing (without GPIO)

```bash
# Mock GPIO by catching exceptions during testing
python3 monitor.py
```

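One way to exercise the reset sequence on a machine without GPIO is to substitute a stand-in object for the button. The sketch below is a hypothetical example and does not reflect how `monitor.py` is structured: `FakePowerButton`, its `press` method, and `hard_reset` are illustrative names. The default durations are taken from the Configuration Options table.

```python
import time
from typing import Callable, List, Tuple


class FakePowerButton:
    """Stands in for the GPIO-driven power button and records each press."""

    def __init__(self) -> None:
        self.events: List[Tuple[str, float]] = []

    def press(self, duration_s: float) -> None:
        # A real implementation would hold the GPIO line for duration_s here.
        self.events.append(("press", duration_s))


def hard_reset(
    button,
    long_press_s: float = 5,
    short_press_s: float = 1,
    wait_s: float = 90,
    sleep: Callable[[float], None] = time.sleep,
) -> None:
    """Long press to power down, cool-down wait, short press to power on."""
    button.press(long_press_s)
    sleep(wait_s)
    button.press(short_press_s)
```

Passing a no-op `sleep`, for example `hard_reset(FakePowerButton(), sleep=lambda s: None)`, lets the sequence run instantly while the recorded `events` list shows the order and duration of each press.
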
### Building Custom Image

```bash
docker build -t pikvm-monitor:latest .
```

## License

MIT

## Support

For issues with PiKVM GPIO access:
- [PiKVM Documentation](https://docs.pikvm.org/)
- [gpiozero Library](https://gpiozero.readthedocs.io/)

For issues with this monitor:
- Check logs: `docker-compose logs`
- Verify `.env` configuration
- Test GPIO pin access manually