Operational Procedures

This page documents the standardized procedures for managing the system: bootstrap initialization, service management, configuration updates, and routine maintenance tasks.

Bootstrap System Initialization

Repository Setup

The infrastructure uses a standardized bootstrap system that must be initialized before any operations:

# Clone repository
git clone <repository_url>
cd root.ermnvldmr.com

# Initialize repository-wide settings
./init.sh

# This sets up:
# - Git hooks directory (.githooks)
# - Repository-wide configurations
# - Development environment preparation
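
The hook setup step can be sketched as follows. This is a minimal illustration, not the contents of init.sh: the function name configure_git_hooks is hypothetical, and the use of Git's core.hooksPath setting is an assumption about how the .githooks directory is wired in.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of the hook setup step; the real init.sh may differ.

# Point Git at the repository's tracked hooks directory.
configure_git_hooks() {
    local repo_root=$1
    local hooks_dir="$repo_root/.githooks"

    mkdir -p "$hooks_dir"
    # A relative core.hooksPath keeps the setting valid if the clone is moved.
    git -C "$repo_root" config core.hooksPath .githooks
    # Hooks must be executable for Git to run them.
    find "$hooks_dir" -type f -exec chmod +x {} +
}
```

Calling configure_git_hooks "$(git rev-parse --show-toplevel)" from inside the clone applies the setting to that repository only.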

Host Initialization

Each host follows a standardized initialization pattern using the declarative service framework:

# Generic host initialization pattern
source "$(git -C "$(dirname "$0")" rev-parse --show-toplevel)/.scripts/bootstrap.sh"
navigate_to_repo_root "hostname"

# Node-specific initialization
init_service "service_name" "required_packages" "custom_setup_function"

Host-Specific Initialization Scripts

Each host maintains initialization scripts in the .host/.scripts/ directory with numbered priority:

daedalus (.host/.scripts/):

  • 00-base - Base system setup and essential packages
  • 10-ufw - Firewall configuration and Docker integration
  • 20-crowd - Additional service initialization
  • 30-cron - Scheduled job configuration

helios (.host/.scripts/):

  • 10-ufw - Firewall rules for local network integration
  • 20-cron - Network monitoring and maintenance jobs

icarus (.host/.scripts/):

  • 10-ufw - Firewall configuration for content services
  • 20-cron - Backup and maintenance scheduling
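
The numbered prefixes imply an execution order. A minimal sketch of a runner that honors it is shown below; run_host_scripts is a hypothetical name, and the assumption is that scripts are executable files run in lexical order, which matches the numeric order only while the prefixes stay zero-padded to the same width.

```shell
#!/usr/bin/env bash
# Hypothetical runner; the repository's own bootstrap may order scripts differently.

# Run every executable file in a directory in lexical order, so the
# numeric prefixes (00-, 10-, 20-, ...) decide the sequence.
run_host_scripts() {
    local scripts_dir=$1 script
    for script in "$scripts_dir"/*; do
        # Skip directories and non-executable files. This also covers an
        # empty directory, where the glob stays unexpanded.
        [ -f "$script" ] && [ -x "$script" ] || continue
        echo "Running ${script##*/}"
        "$script" || { echo "Failed: ${script##*/}" >&2; return 1; }
    done
}
```

For example, run_host_scripts daedalus/.host/.scripts would run 00-base before 10-ufw and stop at the first script that exits non-zero.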

Service Initialization Pattern

All services follow a consistent initialization pattern:

#!/usr/bin/env bash
# Service initialization script template

source "$(git -C "$(dirname "$0")" rev-parse --show-toplevel)/.scripts/bootstrap.sh"
SCRIPT_NAME="init/hostname/service"

# Repository navigation with node context
navigate_to_repo_root "hostname"

# Custom service configuration
setup_service_config() {
    local config_dir="$REPO_ROOT/$NODE/.host/service"
    if [[ -d "$config_dir" ]]; then
        log_info "Applying service configuration..."
        # Configuration application logic
    else
        log_warn "No configuration found at $config_dir"
    fi
}

# Declarative service initialization
init_service "service_name" "required_packages" "setup_service_config"

Configuration Management Procedures

Environment Variable Management

Configuration uses environment variable substitution with standardized variable patterns:

Standard Environment Variables:

# Repository root (defined first because the paths below depend on it)
export REPO_ROOT="$(git rev-parse --show-toplevel)"

# Node identification
export NODE="hostname"              # Current node name

# Tier paths for data management
export TIER1="$REPO_ROOT/$NODE/.host/@tier1"
export TIER2="$REPO_ROOT/$NODE/.host/@tier2"
export TIER3="$REPO_ROOT/$NODE/.host/@tier3"

# Framework paths
export SCRIPTS="$REPO_ROOT/.scripts"

Service-Specific Variables:

# Domain and networking
export DOMAIN="ermnvldmr.com"
export HOST_IP="192.168.1.10"

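# Security credentials (loaded from @tier1); the shell aborts with the message if a variable is unset or empty
export TELEGRAM_BOT_URL="${TELEGRAM_BOT_URL:?TELEGRAM_BOT_URL is not set}"
export API_KEY="${API_KEY:?API_KEY is not set}"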

Template Processing

Configuration templates use envsubst for variable substitution:

Single File Processing

# Apply environment variables to single template
.scripts/ops/envsubst-crontab ./node/.host/cron/crontab.template

# Results in rendered crontab with variables replaced

Directory Processing

# Process entire directory of templates
export SERVICE_CONFIG=production
.scripts/ops/envsubst-directory ./templates ./output

# Processes all .template files recursively
# Preserves directory structure
# Backs up existing files
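
The three behaviors listed above (recursive processing, preserved structure, backups) can be sketched as follows. This is an illustration under assumptions, not the envsubst-directory script itself: render_templates is a hypothetical name, and the .bak backup suffix is chosen only for the example.

```shell
#!/usr/bin/env bash
# Hypothetical sketch; the real envsubst-directory script may behave differently.

# Render every *.template under src_dir into dst_dir, keeping the relative
# path and dropping the .template suffix. An existing output file is copied
# to <name>.bak before it is overwritten.
render_templates() {
    local src_dir=${1%/} dst_dir=${2%/} template rel out
    find "$src_dir" -type f -name '*.template' | while IFS= read -r template; do
        rel=${template#"$src_dir"/}
        out="$dst_dir/${rel%.template}"
        mkdir -p "$(dirname "$out")"
        if [ -e "$out" ]; then
            cp -p "$out" "$out.bak"
        fi
        # envsubst replaces $VAR and ${VAR} with exported values;
        # unset variables are replaced with an empty string.
        envsubst < "$template" > "$out"
    done
}
```

Because unset variables render as empty strings, it is worth validating the required variables before rendering.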

Firewall Rule Processing

# Apply UFW rules with variable substitution
.scripts/ops/envsubst-ufw ./node/.host/ufw/rules ./node/.host/ufw/host.rules

# Processes Docker service rules and host-level rules
# Applies idempotent firewall configurations

Data Tier Management Procedures

Tier Structure Setup

Initialize complete data tier infrastructure for a node:

# Setup tier structure with storage paths
.scripts/ops/setup-tiers icarus /mnt/ssd/tier1 /mnt/hdd/tier2 /mnt/bulk/tier3

# Creates:
# - Symbolic links for all projects (@tier1, @tier2, @tier3)
# - Host-level .host/@tier links
# - Shared directory structure
# - Environment file links (.env -> @tier1/.env)

Interactive Setup Process:

  1. Confirmation Prompts: Asks before removing existing symlinks
  2. Safety Checks: Validates paths and permissions
  3. Structure Creation: Creates complete directory hierarchy
  4. Link Verification: Confirms all symbolic links are correct
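
The safety check and link verification steps can be illustrated for a single link. This is a minimal sketch, not the setup-tiers script: link_tier is a hypothetical helper, and it skips the interactive confirmation by refusing to replace anything that is not already a symlink.

```shell
#!/usr/bin/env bash
# Hypothetical sketch; setup-tiers itself is interactive and does more.

# Create <project_dir>/@<tier> pointing at storage_path, then verify it.
# Refuses to touch an existing path that is not a symlink.
link_tier() {
    local project_dir=$1 tier=$2 storage_path=$3
    local link="$project_dir/@$tier"

    [ -d "$storage_path" ] || { echo "Missing storage path: $storage_path" >&2; return 1; }
    if [ -e "$link" ] && [ ! -L "$link" ]; then
        echo "Refusing to replace non-symlink: $link" >&2
        return 1
    fi
    # -n replaces an existing symlink instead of descending into its target.
    ln -sfn "$storage_path" "$link"
    # Verification: the link must resolve to exactly the requested path.
    [ "$(readlink "$link")" = "$storage_path" ]
}
```

For example, link_tier icarus/root tier1 /mnt/ssd/tier1 would create icarus/root/@tier1, where icarus/root is an assumed project path.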

Data Synchronization

Automated backup and synchronization procedures:

# Manual tier synchronization (quote the URL so characters such as & and ? are not interpreted by the shell)
.scripts/ops/sync-tiers icarus "$TELEGRAM_BOT_URL"

# Automated synchronization via cron
# Defined in node/.host/cron/crontab templates
0 2 * * * $SCRIPTS/ops/sync-tiers icarus "$TELEGRAM_BOT_URL"

Synchronization Process:

  1. Validation: Checks node directory and Docker Compose configuration
  2. Tier 1 Sync: Critical configuration and secrets
  3. Tier 2 Sync: Application data and databases
  4. Error Handling: Automatic rollback and Telegram alerts
  5. Success Reporting: Confirmation notifications
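
The ordering of these steps can be sketched as a small step runner. This is only an illustration of the control flow: run_sync and notify are hypothetical names, notify prints instead of calling Telegram, and the rollback mentioned above is not modeled.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of the step ordering; sync-tiers itself is not shown
# in this document and may be structured differently.

# Stand-in for the Telegram helpers: print the message instead of sending it.
notify() { echo "[$1] $2"; }

# Run the named step functions in order and stop at the first failure.
# Every step receives the node name as its only argument.
run_sync() {
    local node=$1 step
    shift
    for step in "$@"; do
        if ! "$step" "$node"; then
            notify "$node" "sync failed at step: $step"
            return 1
        fi
    done
    notify "$node" "sync completed"
}
```

With step functions named validate_node, sync_tier1, and sync_tier2 (all assumed), the call run_sync icarus validate_node sync_tier1 sync_tier2 would skip the Tier 2 sync whenever the Tier 1 sync fails.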

Service Management Procedures

Docker Compose Operations

Standard Docker Compose management across all services:

# Navigate to service directory
cd daedalus/root

# Standard service operations
docker compose up -d                    # Start services
docker compose down                     # Stop services
docker compose restart service-name     # Restart specific service
docker compose logs -f service-name     # View service logs
docker compose pull                     # Update container images

# Health check and status
docker compose ps                       # Service status
docker compose top                      # Process information

Service Health Monitoring

Each service implements health checks for monitoring:

# Check service health status (docker ps supports the health filter; docker compose ps filters by status only)
docker ps --filter "health=healthy"
docker ps --filter "health=unhealthy"

# Manual health check validation
docker compose exec service-name healthcheck-command

# Example: Database readiness check (pg_isready tests whether the server accepts connections, not data integrity)
docker compose exec postgres pg_isready -d database_name
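
After a restart it is often useful to block until a container reports healthy. The sketch below polls docker inspect for that; wait_for_healthy is a hypothetical helper rather than part of the framework, and it assumes the container defines a health check.

```shell
#!/usr/bin/env bash
# Sketch only: wait_for_healthy is a hypothetical helper, not part of the framework.

# Poll a container's health status until it reports "healthy" or the
# timeout (in seconds) expires. Containers without a health check never
# report "healthy", so they always time out.
wait_for_healthy() {
    local container=$1 timeout=${2:-60} interval=${3:-2} elapsed=0 status=unknown
    while [ "$elapsed" -lt "$timeout" ]; do
        status=$(docker inspect --format '{{.State.Health.Status}}' "$container" 2>/dev/null) || status=unknown
        if [ "$status" = "healthy" ]; then
            return 0
        fi
        sleep "$interval"
        elapsed=$((elapsed + interval))
    done
    echo "$container not healthy after ${timeout}s (last status: $status)" >&2
    return 1
}
```

The container name is the one shown by docker compose ps, for example wait_for_healthy root-postgres-1 120, where root-postgres-1 is an assumed name.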

Log Management

Centralized logging procedures for troubleshooting:

# Service-specific logs
docker compose logs service-name -f --tail 100

# System-wide log analysis
journalctl -u docker -f                # Docker daemon logs
tail -f /var/log/syslog | grep docker  # System Docker events

# Application-specific logs (stored in @tier3)
tail -f ./tier3/logs/application.log

Security Management Procedures

Firewall Management

UFW firewall configuration using the automation framework:

# Apply firewall rules for a node
cd node/.host
../../.scripts/ops/envsubst-ufw ./ufw/rules ./ufw/host.rules

# Manual UFW operations (when needed)
sudo ufw status verbose              # Check current rules
sudo ufw allow from 192.168.1.0/24  # Allow local network
sudo ufw deny 22/tcp                # Deny SSH (be careful!)
sudo ufw reload                     # Apply rule changes

UFW Rule Structure:

# Docker service rules (processed by ufw-docker)
# File: node/.host/ufw/rules/service.rules
allow service-name 80/tcp
allow service-name 443/tcp

# Host-level rules (processed by standard UFW)
# File: node/.host/ufw/host.rules
allow from 192.168.1.0/24 to any port 22
deny from any to any port 23
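
Host-level rules in this format map directly onto ufw command lines, so they can be previewed before anything is applied. The sketch below prints the commands without running them; preview_host_rules is a hypothetical helper, and the real envsubst-ufw script may parse and apply rules differently.

```shell
#!/usr/bin/env bash
# Hypothetical sketch; envsubst-ufw may parse and apply rules differently.

# Print the ufw command for each rule in a host.rules file without running it.
# Blank lines and lines starting with # are ignored.
preview_host_rules() {
    local rules_file=$1 line
    # The "|| [ -n ... ]" clause keeps a final line that lacks a trailing newline.
    while IFS= read -r line || [ -n "$line" ]; do
        case $line in
            ''|'#'*) continue ;;
        esac
        printf 'ufw %s\n' "$line"
    done < "$rules_file"
}
```

Reviewing the printed commands first is a cheap safeguard against locking yourself out with a bad SSH rule.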

Certificate Management

SSL/TLS certificate management procedures:

# Check certificate status (assumes Traefik v2 or later, where acme.json nests certificates under each resolver name)
docker compose exec -T traefik cat /letsencrypt/acme.json | jq '.[].Certificates'

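# Restart Traefik so it re-checks certificates (renewal still happens only when a certificate is close to expiry)
docker compose restart traefik

# Manual certificate verification (stdin is closed so s_client exits after the handshake)
openssl s_client -connect domain.com:443 -servername domain.com </dev/null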

Secret Management

Secure handling of sensitive configuration:

# Secrets stored in @tier1 with restricted access
ls -la ./tier1/service/secrets/

# Docker secrets integration: list the secrets defined in the resolved configuration
docker compose config --format json | jq '.secrets'

# Environment variable validation
env | grep -E "(PASSWORD|TOKEN|KEY|SECRET)" | wc -l

Maintenance Procedures

System Updates

Automated system maintenance and update procedures:

# Check for available package updates
.scripts/ops/check-package-updates icarus "$TELEGRAM_BOT_URL"

# Manual system updates (when required)
sudo apt update && sudo apt upgrade -y
sudo apt autoremove -y
sudo systemctl reboot

Container Updates

Docker container and image maintenance:

# Update container images
docker compose pull                 # Pull latest images
docker compose up -d               # Recreate containers with new images

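# Clean up unused resources
docker system prune -f             # Remove stopped containers, unused networks, dangling images, and build cache
docker image prune -f              # Remove dangling images (add -a to remove all unused images)
docker volume prune -f             # Remove unused volumes (be careful: this deletes data!)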

Backup Verification

Regular backup verification procedures:

# Verify backup integrity
rclone check source remote: --one-way

# Test restore capability
rclone copy remote:backup/test /tmp/restore-test

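# Database backup verification (-T disables the pseudo-TTY so the piped dump stays clean;
# add -U <role> if the container's default user has no database role)
docker compose exec -T postgres pg_dumpall | gzip > backup-test.sql.gz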
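
A dump that was written without error can still be empty or truncated, so it helps to check the file itself. The sketch below is one way to do that; verify_dump is a hypothetical helper, and it only proves the archive is readable and non-empty, not that the SQL restores cleanly.

```shell
#!/usr/bin/env bash
# Hypothetical helper for checking a compressed SQL dump before trusting it.

# Succeeds only if the file exists, is non-empty, passes gzip's integrity
# test, and decompresses to at least one byte of content.
verify_dump() {
    local dump=$1
    [ -s "$dump" ] || { echo "Missing or empty: $dump" >&2; return 1; }
    gzip -t "$dump" 2>/dev/null || { echo "Corrupt archive: $dump" >&2; return 1; }
    [ "$(gzip -dc "$dump" | head -c 1 | wc -c)" -gt 0 ] || { echo "Empty dump: $dump" >&2; return 1; }
}
```

Running verify_dump backup-test.sql.gz right after the dump catches the common case where pg_dumpall failed and gzip compressed empty input.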

Monitoring and Alerting Procedures

Health Check Automation

Automated health monitoring with Telegram integration:

# Manual health check execution
source .scripts/bootstrap.sh
setup_error_trap "Health Check" "$TELEGRAM_BOT_URL" "$NODE"

# Service-specific health validation
health_check_service "docker"
health_check_service "nginx"
validate_network_connectivity "google.com" 443 5

Alert Management

Telegram notification management:

# Test notification system
telegram_info "Test" "$TELEGRAM_BOT_URL" "Testing notification system" "$NODE"

# Error notification with automatic context
setup_error_trap "Operation Name" "$TELEGRAM_BOT_URL" "$NODE"
# Any script failure will automatically send alert with:
# - Script name and line number
# - Failed command
# - Timestamp and node information
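
The mechanism behind this kind of alert is bash's ERR trap. The sketch below shows how the script name, line number, failed command, and timestamp can be captured; it is not the setup_error_trap implementation from bootstrap.sh, and send_alert, report_failure, and install_error_trap are hypothetical names. The alert is printed to stderr instead of being sent to Telegram.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of an ERR-trap reporter; setup_error_trap in
# bootstrap.sh is not shown here and may work differently.

# Stand-in for the Telegram helper: print the alert to stderr instead.
send_alert() { printf 'ALERT [%s] %s\n' "$1" "$2" >&2; }

# Format the failure context: UTC timestamp, script name, line, command, exit code.
report_failure() {
    local exit_code=$1 line=$2 failed_cmd=$3
    send_alert "${NODE:-unknown}" \
        "$(date -u +%Y-%m-%dT%H:%M:%SZ) ${0##*/}:${line}: '${failed_cmd}' exited with ${exit_code}"
}

install_error_trap() {
    set -E   # make functions and subshells inherit the ERR trap
    # Single quotes delay expansion until the trap fires, so $?, $LINENO,
    # and $BASH_COMMAND describe the command that failed.
    trap 'report_failure "$?" "$LINENO" "$BASH_COMMAND"' ERR
}
```

The trap does not fire for commands whose failure is already handled, such as those in an if condition or on the left side of && or ||.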

Log Analysis

Systematic log analysis procedures:

# Service error analysis
docker compose logs service-name 2>&1 | grep -i error

# System resource monitoring
docker stats --no-stream
df -h                              # Disk usage
free -h                           # Memory usage
systemctl status docker           # Docker daemon status

Recovery Procedures

Service Recovery

Standard service recovery procedures:

# Service restart sequence
docker compose down service-name
docker compose up -d service-name

# Configuration recovery
git checkout HEAD -- node/service/config/
docker compose restart service-name

# Complete service rebuild
docker compose down service-name
docker compose pull service-name
docker compose up -d service-name

Data Recovery

Data restoration from backups:

# Tier-specific recovery
rclone sync remote:backup/tier1 ./tier1/
rclone sync remote:backup/tier2 ./tier2/

# Database recovery (plain SQL dumps are replayed with psql; pg_restore only reads archive-format dumps)
docker compose exec -T postgres psql -d database < backup.sql

Network Recovery

Network connectivity restoration:

# Docker network recreation
docker network prune -f
docker compose down && docker compose up -d

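# Firewall rule reset (ufw reset also disables the firewall, so reapply the rules right away)
sudo ufw --force reset
.scripts/ops/envsubst-ufw ./node/.host/ufw/rules ./node/.host/ufw/host.rules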

Git Repository Management

Configuration Synchronization

Repository synchronization procedures:

# Sync node changes to remote repository
.scripts/ops/sync-remote icarus "$TELEGRAM_BOT_URL"

# Manual git operations
git add node/
git commit -m "Node configuration update"
git push origin sync/node

Configuration Versioning

Version control for infrastructure changes:

# Track configuration changes
git status                         # Check modified files
git diff node/                    # Review configuration changes
git add node/ && git commit -m "Update node configuration"

# Tag releases
git tag -a v1.2.0 -m "Infrastructure release v1.2.0"
git push --tags

Development and Testing Procedures

Docker Compose Validation

Configuration validation and testing:

# Lint Docker Compose files
.scripts/ops/lint-docker-compose            # All files
.scripts/ops/lint-docker-compose --hook     # Pre-commit mode

# Validate configuration
docker compose config                       # Parse, validate, and print the resolved configuration
docker compose config --quiet               # Validate only; errors are reported via the exit status

Environment Testing

Test environment validation:

# Validate required environment variables
validate_env_vars "NODE" "TELEGRAM_BOT_URL" "DOMAIN"

# Test network connectivity
validate_network_connectivity "domain.com" 443
validate_network_connectivity "database" 5432

# Test service dependencies
docker compose exec service nc -z dependency-service 5432
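
A required-variable check of this kind can be written in a few lines of bash. The sketch below is a hypothetical stand-in named require_env_vars, not the validate_env_vars function from bootstrap.sh, whose behavior is not shown in this document.

```shell
#!/usr/bin/env bash
# Hypothetical stand-in; validate_env_vars in bootstrap.sh may differ.

# Return non-zero and list every missing name if any variable is unset or empty.
require_env_vars() {
    local name missing=""
    for name in "$@"; do
        # ${!name} is bash indirect expansion: the value of the variable named by $name.
        [ -n "${!name:-}" ] || missing="$missing $name"
    done
    if [ -n "$missing" ]; then
        echo "Missing required variables:$missing" >&2
        return 1
    fi
}
```

Reporting all missing names in one pass avoids the fix-one-rerun-repeat cycle that a fail-fast check produces.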

Together, these procedures keep operations consistent across nodes while covering security, monitoring, and recovery.
