---
name: proxmox-ha
description: High Availability configuration for a Proxmox cluster -- HA Manager, fencing devices (STONITH), and automatic failover for critical VMs.
---

# Proxmox HA

Configure High Availability (HA) on a Proxmox cluster with HA Manager, fencing devices, and automatic failover for critical VMs.

## When to Use

- Configure HA after cluster formation (/proxmox-cluster)
- Protect critical VMs with automatic failover
- Configure fencing devices (STONITH)
- Define HA groups by criticality
- Test failover procedures

## Syntax

```bash
/proxmox-ha configure --critical-vms [--fencing watchdog|ipmi] [--max-relocate 2]
```

## Knowledge Sources

```bash
mcp__notebooklm__notebook_query \
  notebook_id:"276ccdde-6b95-42a3-ad96-4e64d64c8d52" \
  query:"proxmox ha high availability fencing stonith failover"
```

---

## Prerequisites

**1. Cluster formed:**

```bash
pvecm status
# Expected: Quorate: Yes, 2+ nodes online
# Note: a 2-node cluster loses quorum when one node fails;
# automatic recovery needs a third vote (a third node or a QDevice).
```

**2. Shared storage or replication:**

- **Shared storage** (NFS, Ceph): ideal for HA (failover <30s)
- **No shared storage**: ZFS replication or boot-time failover (~2-5min)

**3. Fencing device configured:** without fencing there is a risk of split-brain.

---

## Complete Workflow

### Phase 1: Fencing Configuration

Full details of the 3 options (Watchdog, IPMI, Network) are in: `references/fencing-configuration.md`

**Summary:** Use Watchdog to get started and IPMI for production; avoid network fencing.

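The prerequisites and the fencing requirement above can be checked with a small pre-flight script before touching HA Manager. The following is a minimal sketch, not part of Proxmox: the helper names `is_quorate` and `has_watchdog` are hypothetical, the match on a `Quorate: Yes` line assumes the usual layout of `pvecm status` output, and `/dev/watchdog` is assumed as the default device path.

```bash
#!/usr/bin/env bash
# Pre-flight sketch. The function names are hypothetical helpers, not Proxmox tools.

# Succeeds when `pvecm status` output on stdin reports a quorate cluster.
# Assumption: the output contains a line of the form "Quorate:   Yes".
is_quorate() {
  grep -Eq '^Quorate:[[:space:]]+Yes'
}

# Succeeds when a watchdog device node exists.
# Assumption: /dev/watchdog is the default path; pass another path to override.
has_watchdog() {
  [ -e "${1:-/dev/watchdog}" ]
}

# Run the checks only on a host that actually has the Proxmox cluster tools.
if command -v pvecm >/dev/null 2>&1; then
  if pvecm status | is_quorate; then
    echo "quorum: OK"
  else
    echo "quorum: NOT quorate"
  fi
  if has_watchdog; then
    echo "watchdog: present"
  else
    echo "watchdog: missing"
  fi
fi
```

On a machine without `pvecm` the script prints nothing, so it is safe to source the helpers elsewhere and call them individually.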
### Phase 2: HA Manager Configuration

```bash
# Check status
ha-manager status
# Expected: quorum: OK, master: (elected), lrm: active
```

**Create HA groups by criticality:**

```bash
# Critical (priority 100)
ha-manager groupadd critical \
  --nodes "server.descomplicar.pt:100,cluster.descomplicar.pt:100"

# Medium (priority 50)
ha-manager groupadd medium \
  --nodes "server.descomplicar.pt:50,cluster.descomplicar.pt:50"

# Low (priority 10)
ha-manager groupadd low \
  --nodes "server.descomplicar.pt:10,cluster.descomplicar.pt:10"
```

Note that node priorities only rank nodes within a single group. With equal values on every node, as here, the three groups behave identically for placement and differ only by name.

### Phase 3: Add VMs to HA

```bash
# VM 200 (EasyPanel Docker)
ha-manager add vm:200 \
  --group critical \
  --max_restart 3 \
  --max_relocate 2 \
  --state started

# VM 300 (CWP)
ha-manager add vm:300 \
  --group critical \
  --max_restart 3 \
  --max_relocate 2 \
  --state started
```

**Parameters:**

- `max_restart`: number of restart attempts on the same node before relocating
- `max_relocate`: maximum number of relocations between nodes
- `state started`: HA Manager keeps the VM in the started state

```bash
# Verify
ha-manager status
```

### Phase 4: Failover Testing

Detailed test procedures (clean shutdown, simulated node crash, live migration) and policy tuning are in: `references/failover-testing.md`

### Phase 5: Production Rollout

Phased approach (low -> medium -> critical) with 30 days of monitoring.

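The phased order above can be rehearsed as a dry run that prints the `ha-manager add` commands for one phase at a time instead of executing them. This is only a sketch: the helpers `ha_add_cmd` and `rollout_phase` and the VM IDs 900 and 901 are hypothetical, and the flags are copied from the `ha-manager add` examples earlier in this document.

```bash
#!/usr/bin/env bash
# Phased-rollout sketch. Helper names and VM IDs are illustrative only.

# Print the ha-manager command for one VM, using the same flags as the
# `ha-manager add` examples in this document.
ha_add_cmd() {
  local vmid="$1" group="$2"
  printf 'ha-manager add vm:%s --group %s --max_restart 3 --max_relocate 2 --state started\n' \
    "$vmid" "$group"
}

# Print the commands for one rollout phase.
# First argument is the HA group; the remaining arguments are VM IDs.
rollout_phase() {
  local group="$1"
  shift
  local vmid
  for vmid in "$@"; do
    ha_add_cmd "$vmid" "$group"
  done
}

# Dry run for the first phase: prints the commands, executes nothing.
rollout_phase low 900 901
```

Once the printed commands look right, they can be run on a cluster node one phase at a time, moving to the next group only after the monitoring window has passed.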
Document the runbook in: `06-Operacoes/Procedimentos/D7-Tecnologia/PROC-HA-Failover.md`

---

## Best Practices

**Do:**

- Test failover on test VMs BEFORE production
- Configure fencing (watchdog at minimum, IPMI ideally)
- Monitor quorum 24/7
- Document failover runbooks
- Take backups BEFORE enabling HA

**Don't:**

- Run HA without fencing (risk of split-brain)
- Set max_relocate too high (the VM keeps "bouncing" between nodes)
- Assume instant failover without shared storage
- Test failover in production without a plan

---

## Troubleshooting

### VM does not fail over

```bash
ha-manager status | grep vm:ID
pvecm status
journalctl -u pve-ha-crm -f
```

### Split-brain detected

```bash
# Shut down one node completely
systemctl poweroff

# On the remaining node:
pvecm expected 1  # Force quorum with 1 node

# Fix networking, then rejoin the node that was shut down
```

### Failover loop (VM keeps restarting)

```bash
# Temporarily pause HA
ha-manager set vm:ID --state disabled

# Fix the VM issue

# Re-enable HA
ha-manager set vm:ID --state started
```

---

## References

- `references/fencing-configuration.md` - Details on Watchdog, IPMI, and Network fencing
- `references/failover-testing.md` - Tests, policies, monitoring, alerts, production rollout
- **NotebookLM:** 276ccdde-6b95-42a3-ad96-4e64d64c8d52
- **HA Manager Docs:** https://pve.proxmox.com/pve-docs/ha-manager.1.html
- **Fencing:** https://pve.proxmox.com/wiki/Fencing