Troubleshooting
This guide covers common issues and their solutions when working with Site Availability Monitoring.
Common Issues
Backend Issues
Backend Won't Start
Symptoms:
- Application exits immediately
- Port binding errors
- Configuration errors
Solutions:
- Check port availability:
# Check if port 8080 is in use
lsof -i :8080
netstat -tulpn | grep 8080
# Stop the process using the port (add -9 only if it ignores the default SIGTERM)
kill <PID>
- Validate configuration:
# Check YAML syntax
yamllint config.yaml
# Run with debug logging
SA_LOG_LEVEL=debug ./site-availability
- Check permissions:
# Ensure config file is readable
chmod 644 config.yaml
# Check directory permissions
ls -la config.yaml
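The permission checks above can be folded into a small preflight helper that runs before the backend starts. This is a minimal sketch, not part of the project: the function name `check_config` is hypothetical, and it only tests that the file exists, is readable, and is non-empty. It does not validate the YAML itself.

```bash
# check_config FILE: report whether FILE exists, is readable, and is non-empty.
# Prints one status line; returns 0 only when all three checks pass.
check_config() {
  file="$1"
  if [ ! -e "$file" ]; then
    echo "missing: $file"
    return 1
  fi
  if [ ! -r "$file" ]; then
    echo "not readable: $file"
    return 1
  fi
  if [ ! -s "$file" ]; then
    echo "empty: $file"
    return 1
  fi
  echo "ok: $file"
}
```

A typical call would be `check_config config.yaml && ./site-availability`, so the backend only starts when the file passes.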
Prometheus Connection Issues
Symptoms:
- No data showing in frontend
- Prometheus timeout errors
- Connection refused errors
Solutions:
- Verify Prometheus connectivity:
# Test Prometheus URL
curl 'http://prometheus:9090/api/v1/query?query=up'
# Check network connectivity
ping prometheus
# Test from inside container
docker-compose exec backend wget -qO- http://prometheus:9090/metrics
- Check Prometheus configuration:
# Verify Prometheus targets
curl http://prometheus:9090/api/v1/targets
# Check Prometheus logs
docker-compose logs prometheus
- Authentication issues:
# Confirm the HMAC secret is set (without printing it to the terminal)
[ -n "$SA_AUTHENTICATION_HMAC_SECRET" ] && echo "HMAC secret is set"
# Verify authentication headers
curl -H "Authorization: HMAC-SHA256 <signature>" http://localhost:8080/api/apps
Frontend Issues
Frontend Won't Load
Symptoms:
- Blank page or error messages
- Build failures
- API connection errors
Solutions:
- Check Node.js setup:
# Verify Node.js version
node --version # Should be 18.0+
# Clear npm cache
npm cache clean --force
# Reinstall dependencies
rm -rf node_modules package-lock.json
npm install
- Check API connectivity:
# Test backend API
curl http://localhost:8080/health
# Check CORS settings
curl -H "Origin: http://localhost:3000" http://localhost:8080/api/apps
- Check frontend configuration:
// src/config.js
const config = {
  apiUrl: "http://localhost:8080", // Verify this matches your backend
  updateInterval: 30000,
};
Map Not Displaying
Symptoms:
- Empty map area
- JavaScript errors in console
- Data not loading
Solutions:
- Check browser console:
// Open browser developer tools (F12)
// Look for JavaScript errors in Console tab
// Check Network tab for failed API requests
- Verify data format:
# Check API response format
curl http://localhost:8080/api/apps | jq
# Verify locations data
curl http://localhost:8080/api/locations | jq
- Browser compatibility:
# Test in different browsers
# Clear browser cache and cookies
# Disable browser extensions temporarily
Docker Issues
Container Build Failures
Symptoms:
- Docker build errors
- Missing dependencies
- Permission denied errors
Solutions:
- Check Dockerfile:
# Rebuild without the layer cache (with BuildKit, add --progress=plain for full step output)
docker build --no-cache -t site-availability-backend backend/
# Check base image
docker pull golang:1.21-alpine
# Verify file permissions
ls -la backend/
- Disk space issues:
# Check available space
df -h
# Clean up Docker (prune -a removes ALL unused images; volume prune deletes unused volumes and their data)
docker system prune -a
docker volume prune
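To spot a nearly full filesystem before a build fails, the `df` output can be filtered by usage. The helper below is a sketch under stated assumptions: `disk_over_threshold` is a hypothetical name, it expects the POSIX column layout produced by `df -P`, and it assumes mount points contain no spaces.

```bash
# disk_over_threshold PERCENT: read `df -P` output on stdin and print
# "<mount point> <use%>" for every filesystem at or above PERCENT.
# Assumes mount points contain no spaces.
disk_over_threshold() {
  awk -v limit="$1" '
    NR > 1 {
      used = $5
      sub(/%/, "", used)
      if (used + 0 >= limit + 0) print $6, $5
    }'
}
```

Run it as `df -P | disk_over_threshold 90` and prune Docker data only if the filesystem that holds it appears in the output.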
Container Runtime Issues
Symptoms:
- Containers keep restarting
- Out of memory errors
- Network connectivity issues
Solutions:
- Check container logs:
# View logs
docker-compose logs backend
docker-compose logs frontend
# Follow logs in real-time
docker-compose logs -f backend
- Resource limits:
# Check resource usage
docker stats
# Monitor memory usage
docker exec -it <container> free -h
- Network issues:
# Check network configuration
docker network ls
docker network inspect site-availability_default
# Test inter-container connectivity
docker-compose exec frontend ping backend
Kubernetes/Helm Issues
Deployment Failures
Symptoms:
- Pods not starting
- Image pull errors
- Configuration errors
Solutions:
- Check pod status:
# View pods
kubectl get pods -n site-availability
# Describe failing pod
kubectl describe pod <pod-name> -n site-availability
# Check logs
kubectl logs <pod-name> -n site-availability
- Configuration issues:
# Validate Helm chart
helm lint chart/
# Check values
helm template site-availability chart/ --values chart/values.yaml
# Debug installation
helm install site-availability chart/ --dry-run --debug
- Resource constraints:
# Check node resources
kubectl top nodes
# Check resource quotas
kubectl describe quota -n site-availability
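When a namespace holds many pods, it helps to list only the ones that need attention before running `kubectl describe`. The filter below is a sketch: `unhealthy_pods` is a hypothetical helper that assumes the default column order of `kubectl get pods --no-headers` (NAME, READY, STATUS, ...). It treats a pod as unhealthy if its status is neither `Running` nor `Completed`, or if it is `Running` with fewer ready containers than expected.

```bash
# unhealthy_pods: read `kubectl get pods --no-headers` output on stdin and
# print "<name> <status> <ready>" for pods that are not fully ready.
# Completed pods (finished jobs) are skipped.
unhealthy_pods() {
  awk '{
    split($2, ready, "/")
    if ($3 == "Completed") next
    if ($3 != "Running" || ready[1] != ready[2]) print $1, $3, $2
  }'
}
```

For example: `kubectl get pods -n site-availability --no-headers | unhealthy_pods`.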
Data Issues
No Metrics Data
Symptoms:
- Empty dashboards
- Zero values everywhere
- Missing applications
Solutions:
- Verify metrics endpoints:
# Check if applications expose metrics
curl http://your-app:8080/metrics
# Verify Prometheus scraping
curl 'http://prometheus:9090/api/v1/query?query=up'
- Check scraping configuration:
# Prometheus configuration
scrape_configs:
  - job_name: "your-app"
    static_configs:
      - targets: ["your-app:8080"]
- Validate metric queries:
# Test metric queries directly in Prometheus
# Go to http://prometheus:9090
# Run query: up{instance="your-app:8080"}
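The `up` query can also be checked from the command line. The sketch below assumes `jq` is installed and relies on the standard shape of a Prometheus instant-query response, where each series carries a `metric` label set and a `value` pair of timestamp and string sample. The helper name `down_targets` is hypothetical.

```bash
# down_targets: read the JSON body of /api/v1/query?query=up on stdin and
# print "<job> <instance>" for every series whose sample value is "0",
# meaning Prometheus failed to scrape that target.
down_targets() {
  jq -r '.data.result[] | select(.value[1] == "0") | "\(.metric.job) \(.metric.instance)"'
}
```

Usage: `curl -s 'http://prometheus:9090/api/v1/query?query=up' | down_targets`. Empty output means every target that Prometheus knows about was scraped successfully; a target missing from the scrape configuration will not appear at all.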
Incorrect Location Display
Symptoms:
- Applications in wrong locations
- Missing location markers
- Incorrect coordinates
Solutions:
- Verify location configuration:
locations:
  - name: "New York"
    latitude: 40.712776 # Check these coordinates
    longitude: -74.005974
- Check data mapping:
# Verify app-to-location mapping
curl http://localhost:8080/api/apps | jq '.[] | {name, location}'
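A quick range check catches some coordinate mistakes before they reach the map. This is a minimal sketch with a hypothetical helper name, `valid_coords`. It only verifies that both values are plain decimal numbers, that latitude lies within [-90, 90], and that longitude lies within [-180, 180]. It cannot detect a swapped pair when both values happen to fall inside the latitude range.

```bash
# valid_coords LAT LON: succeed only if both arguments are decimal numbers,
# LAT is within [-90, 90], and LON is within [-180, 180].
valid_coords() {
  awk -v lat="$1" -v lon="$2" 'BEGIN {
    ok = (lat ~ /^-?[0-9]+(\.[0-9]*)?$/ && lon ~ /^-?[0-9]+(\.[0-9]*)?$/ &&
          lat + 0 >= -90 && lat + 0 <= 90 &&
          lon + 0 >= -180 && lon + 0 <= 180)
    exit (ok ? 0 : 1)
  }'
}
```

For example, `valid_coords 40.712776 -74.005974` succeeds, while `valid_coords 139.69 35.68` fails because 139.69 is not a valid latitude.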
Debugging Tools
Log Analysis
# Backend logs with debug level
SA_LOG_LEVEL=debug ./site-availability
# Follow logs in real-time
tail -f /var/log/site-availability.log
# Search for specific errors
grep -i "error" /var/log/site-availability.log
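For a rough overview of a log file, the `grep` search above can be repeated per log level. The helper below is only a sketch: `log_level_counts` is a hypothetical name, and because the backend's log format is not specified here, it simply counts lines that contain each keyword, case-insensitively. A line that mentions two keywords is counted under both.

```bash
# log_level_counts FILE: print "<level> <count>" for each level keyword,
# where count is the number of lines in FILE containing that keyword.
log_level_counts() {
  for level in error warn info debug; do
    printf '%s %s\n' "$level" "$(grep -ci "$level" "$1")"
  done
}
```

Usage: `log_level_counts /var/log/site-availability.log`.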
API Testing
# Health check
curl http://localhost:8080/health
# Get all applications
curl http://localhost:8080/api/apps | jq
# Get locations
curl http://localhost:8080/api/locations | jq
# Test metrics endpoint
curl http://localhost:8080/metrics
Network Debugging
# Test connectivity
telnet prometheus 9090
# Check DNS resolution
nslookup prometheus
# Trace network path
traceroute prometheus
# Check firewall rules (requires root; -n skips slow reverse DNS lookups)
sudo iptables -L -n
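On minimal hosts or containers where `telnet` is not installed, a TCP connection test can be done with bash alone. The sketch below assumes `bash` (built with `/dev/tcp` support, which is the usual default) and the coreutils `timeout` command are available. The helper name `port_open` is hypothetical.

```bash
# port_open HOST PORT: succeed if a TCP connection to HOST:PORT can be opened
# within 3 seconds. Relies on bash's /dev/tcp redirection, so it is run
# through `bash -c` even when the calling shell is plain sh.
port_open() {
  timeout 3 bash -c 'exec 3<>"/dev/tcp/$1/$2"' _ "$1" "$2" 2>/dev/null
}
```

Usage: `port_open prometheus 9090 && echo reachable || echo unreachable`.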
Performance Issues
High Memory Usage
Solutions:
- Increase scrape intervals
- Reduce number of monitored metrics
- Implement metric filtering
- Use Prometheus recording rules
Slow Response Times
Solutions:
- Optimize Prometheus queries
- Add caching layers
- Use Prometheus federation
- Scale horizontally
High CPU Usage
Solutions:
- Profile the application
- Optimize metric processing
- Reduce scraping frequency
- Use more efficient queries
Getting Help
Before Asking for Help
- Check logs for error messages
- Search existing issues on GitHub
- Try minimal configuration to isolate the problem
- Document reproduction steps clearly
When Reporting Issues
Include the following information:
# System information
uname -a
docker --version
go version
node --version
# Application logs
SA_LOG_LEVEL=debug ./site-availability 2>&1
# Configuration (remove sensitive data)
cat config.yaml
# Container status (if using Docker)
docker-compose ps
docker-compose logs
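Before pasting configuration into a public issue, sensitive values should be masked. The filter below is a sketch, not a complete scrubber: `redact_config` is a hypothetical helper that masks the value on any `key: value` line whose key contains `secret`, `password`, or `token`. Those keywords are assumptions about how the keys are named, and multi-line values are not handled, so review the output before posting.

```bash
# redact_config: read YAML-like text on stdin and replace the value of any
# "key: value" line whose key contains secret, password, or token.
# Lines without a value after the colon (parent keys) are left unchanged.
redact_config() {
  awk '{
    idx = index($0, ":")
    if (idx > 0) {
      key = tolower(substr($0, 1, idx - 1))
      value = substr($0, idx + 1)
      if (key ~ /secret|password|token/ && value ~ /[^ \t]/) {
        print substr($0, 1, idx) " <redacted>"
        next
      }
    }
    print
  }'
}
```

Usage: `redact_config < config.yaml`.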
Community Resources
- 🐛 GitHub Issues: Report bugs
- 💬 Discussions: Ask questions
- 📖 Documentation: You're reading it!
Prevention
Monitoring Your Monitoring
Set up monitoring for the Site Availability Monitoring system itself:
- Health checks for backend services
- Alerting on service failures
- Log monitoring for errors
- Resource monitoring for containers
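A health check that tolerates brief restarts is more useful than a single probe. The following is a minimal sketch of a retry wrapper that can sit in a cron job or container health check. The name `retry` and its argument order are illustrative, not part of the project.

```bash
# retry ATTEMPTS DELAY COMMAND...: run COMMAND until it succeeds, sleeping
# DELAY seconds between tries. Returns 1 after ATTEMPTS failed runs.
retry() {
  attempts="$1"
  delay="$2"
  shift 2
  n=1
  while ! "$@"; do
    if [ "$n" -ge "$attempts" ]; then
      return 1
    fi
    n=$((n + 1))
    sleep "$delay"
  done
  return 0
}
```

For example, `retry 5 2 curl -fsS http://localhost:8080/health` probes the backend up to five times, two seconds apart, and exits non-zero only if every attempt fails, which is a convenient signal for an alert.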
Best Practices
- Use configuration validation in CI/CD
- Test in staging environments first
- Monitor resource usage regularly
- Keep documentation up to date
- Back up configuration and data regularly
Still having issues? Open an issue on GitHub with detailed information about your problem.