---
name: proxmox-cluster
description: Form a Proxmox cluster of 2+ nodes with Corosync and quorum. Use when user mentions "create cluster", "proxmox cluster", "pvecm", "join node", "cluster formation".
author: Descomplicar® Crescimento Digital
version: 1.0.0
quality_score: 75
user_invocable: true
desk_task: 1712
allowed-tools: Task, Read, Bash
dependencies:
  - ssh-unified
  - notebooklm
  - proxmox-setup
---

# Proxmox Cluster

Form a Proxmox cluster of 2+ nodes with Corosync, quorum, and preparation for High Availability.

## When to Use

- Form a 2-node cluster after migration is complete
- Add a node to an existing cluster
- Configure quorum and fencing
- Prepare for HA (skill /proxmox-ha)

## Syntax

```bash
/proxmox-cluster create --node-a <ip-hostname> --node-b <ip-hostname> [--cluster-name]
/proxmox-cluster join --node <ip> --cluster <existing-cluster-ip>
```

## Examples

```bash
# Create a 2-node cluster
/proxmox-cluster create --node-a server.descomplicar.pt --node-b cluster.descomplicar.pt --cluster-name descomplicar

# Add a 3rd node
/proxmox-cluster join --node pve-node3.descomplicar.pt --cluster cluster.descomplicar.pt
```

## Knowledge Sources

### NotebookLM

```bash
mcp__notebooklm__notebook_query \
  notebook_id:"276ccdde-6b95-42a3-ad96-4e64d64c8d52" \
  query:"proxmox cluster corosync quorum pvecm ha"
```

## Complete Workflow

### Pre-Requisites

**1. Verify Nodes Are Ready**

Both nodes must have:

- Proxmox VE 8.x installed (/proxmox-setup)
- Networking configured (NAT or vSwitch)
- PBS configured (/pbs-config)
- The same PVE version
- Unique hostnames
- IP connectivity between nodes

**2. Validate Connectivity**

```bash
# From Node A → Node B
ping -c 3 <node-b-ip>
ssh root@<node-b-ip> pveversion

# From Node B → Node A
ping -c 3 <node-a-ip>
ssh root@<node-a-ip> pveversion
```

**3. Synchronize Time (CRITICAL)**

```bash
# Both nodes must have NTP configured
timedatectl status

# Install chrony if needed (on Debian/PVE the service unit is "chrony")
apt install chrony
systemctl enable --now chrony
```

**4. Pre-Cluster Backup**

```bash
# Back up configs from both nodes
tar -czf /tmp/pre-cluster-backup.tar.gz /etc/pve /etc/network

# Transfer to PBS
```

### Phase 1: Cluster Creation (Node B)

**1.1 Create the Cluster on Node B (First Node)**

```bash
# SSH to Node B (cluster.descomplicar.pt)
ssh root@<node-b-ip>

# Create the cluster
pvecm create descomplicar

# Verify
pvecm status

# Expected output:
# Cluster information
# Name: descomplicar
# Nodes: 1
# Expected votes: 1
```

**1.2 Get Cluster Join Info**

```bash
# Get join information (for Node A)
pvecm nodes

# Note the cluster IP and name
```

### Phase 2: Join Node A to the Cluster

**2.1 Join Node A**

```bash
# SSH to Node A (server.descomplicar.pt)
ssh root@<node-a-ip>

# Join the cluster (provide Node B's IP)
pvecm add <node-b-ip>

# During the process it:
# - Prompts for Node B's root password
# - Transfers the cluster configuration
# - Copies /etc/pve/
# - Restarts cluster services

# WAIT ~2-5 min
```

**2.2 Verify the Join Succeeded**

```bash
# On Node A:
pvecm status

# Expected output:
# Nodes: 2
# Expected votes: 2
# Quorum: 2 (Active)

# List nodes
pvecm nodes

# Should show both nodes
```

**2.3 Verify /etc/pve/ Replication**

```bash
# On Node A:
ls -lah /etc/pve/

# Should see:
# - nodes/ (both nodes)
# - qemu-server/ (VMs)
# - lxc/ (containers)
# - storage.cfg (shared)

# Test: create a VM on Node A via the Web UI
# Verify it appears on Node B as well
```

### Phase 3: Quorum Configuration

**3.1 Check Quorum Votes**

```bash
pvecm status | grep "Expected votes"

# 2-node cluster:
# Expected votes: 2
# Quorum: 2

# CRITICAL: with 2 nodes, losing 1 node means losing quorum
```
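The quorum fields can also be checked by a script rather than by eye. A minimal sketch, assuming the field names shown below ("Expected votes", "Total votes", "Quorate") match what your `pvecm status` prints; it parses a canned sample here so the parsing logic itself can be verified on any machine — on a real node, replace the sample with `status="$(pvecm status)"`:

```shell
# Parse pvecm-status-style output and report quorum health.
# The sample block below stands in for real `pvecm status` output.
status="$(cat <<'EOF'
Quorum information
------------------
Expected votes:   2
Total votes:      2
Quorate:          Yes
EOF
)"

# Extract the fields (split on ":" plus any following spaces)
expected=$(awk -F': *' '/^Expected votes:/ {print $2}' <<<"$status")
total=$(awk -F': *' '/^Total votes:/ {print $2}' <<<"$status")
quorate=$(awk -F': *' '/^Quorate:/ {print $2}' <<<"$status")

echo "expected=$expected total=$total quorate=$quorate"
if [ "$quorate" = "Yes" ]; then
  echo "quorum OK"
else
  echo "quorum LOST" >&2
fi
```

This is handy as a cron-driven health check on both nodes once the cluster is formed.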

**3.2 Configure a QDevice (Optional - 2-node clusters)**

**2-node cluster problem:** if 1 node fails, the cluster loses quorum (no changes can be made).

**Solution:** add an external QDevice (3rd vote)

```bash
# On a small external VPS (or Raspberry Pi):
apt install corosync-qnetd

# On both PVE nodes:
apt install corosync-qdevice

# Configure the QDevice (on Node A or B):
pvecm qdevice setup <qdevice-ip>

# Verify
pvecm status
# Expected votes: 3 (2 nodes + 1 qdevice)
```

**Recommendation for the Descomplicar cluster:**

- Start without a QDevice (accept the 2-node limitation)
- Add a QDevice later if needed
- Or add a 3rd physical node

### Phase 4: Storage Configuration

**4.1 Configure Shared Storage (Optional)**

**Options:**

- NFS share
- Ceph (minimum 3 nodes)
- ZFS replication (not shared, but synced)

**For 2 nodes without shared storage:**

- VMs live on each node's local storage
- Live migration copies the disk (slower, but works)
- HA uses storage replication or accepts boot downtime

**4.2 Configure PBS as Shared Storage**

```bash
# PBS already configured (/pbs-config)
# Add PBS storage on both nodes via the Web UI

# Datacenter → Storage → Add → Proxmox Backup Server
# ID: pbs-main
# Server: <pbs-ip>
# Datastore: main-store
# Content: VZDump backup files
# Nodes: ALL
```

### Phase 5: Networking Validation

**5.1 Verify the Cluster Network**

```bash
# Verify Corosync uses the correct network
cat /etc/pve/corosync.conf

# Should use the management IP (not the vSwitch)
# bindnetaddr: <management-subnet>
```

**5.2 Test Latency Between Nodes**

```bash
# From Node A → Node B
ping -c 100 <node-b-ip> | tail -5

# Expected: <5ms latency (same datacenter)
# CRITICAL: >10ms can cause cluster issues
```
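The latency target can be checked programmatically too. A sketch that parses the `rtt min/avg/max/mdev` summary line printed by iputils ping; a canned sample line is used here so the parsing can be verified offline — on a real node, set `summary="$(ping -c 100 <node-b-ip> | tail -1)"`:

```shell
# Sample ping summary line (stands in for real ping output)
summary="rtt min/avg/max/mdev = 0.341/0.398/0.512/0.044 ms"

# Average RTT is the 2nd slash-separated value after "= "
avg=$(echo "$summary" | awk -F'= ' '{print $2}' | cut -d/ -f2)
echo "avg=${avg} ms"

# Compare with awk, since bash arithmetic is integer-only
if awk -v a="$avg" 'BEGIN { exit !(a < 5) }'; then
  echo "latency OK for clustering"
else
  echo "WARNING: average >5 ms, corosync may be unstable"
fi
```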

**5.3 Configure Cluster Network Redundancy (Optional)**

```bash
# If multiple networks are available, add a second Corosync link
# (requires 2 NICs or separate VLANs).
# On an existing cluster this is done by editing the corosync config:
# add a ring1_addr to each node entry plus a second link, and bump
# config_version before saving.
nano /etc/pve/corosync.conf
```

### Phase 6: Cluster Firewall

**6.1 Required Ports (open between nodes)**

```bash
# Corosync (knet, PVE 6+):
UDP 5405-5412

# PVE cluster:
TCP 22 (SSH)
TCP 8006 (Web UI)
TCP 3128 (SPICE proxy)
TCP 85 (pvedaemon, localhost only)

# Verify the firewall allows them
iptables -L -n -v | grep 5405
```
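As a sketch, the ports above could be opened with plain iptables before the Proxmox firewall is enabled. `<peer-ip>` is a placeholder for the other node's address, and this rule set is an illustrative assumption, not a tested policy:

```bash
# Allow cluster traffic only from the peer node (placeholder: <peer-ip>)
iptables -A INPUT -s <peer-ip> -p udp --dport 5405:5412 -j ACCEPT  # corosync (knet)
iptables -A INPUT -s <peer-ip> -p tcp --dport 22 -j ACCEPT         # SSH
iptables -A INPUT -s <peer-ip> -p tcp --dport 8006 -j ACCEPT       # Web UI
iptables -A INPUT -s <peer-ip> -p tcp --dport 3128 -j ACCEPT       # SPICE proxy
```

Remember these rules are not persistent by default; persist them with your distribution's mechanism (e.g. iptables-persistent) once validated.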

**6.2 Proxmox Firewall (Web UI)**

```bash
# Datacenter → Firewall → Options
# Enable firewall: NO (initially; configure later)

# If enabling:
# - Add rules for cluster communication
# - Test connectivity before applying
```

### Phase 7: Validation Tests

**7.1 Cluster Status**

```bash
# On both nodes:
pvecm status

# Expected:
# Quorum: Active
# Nodes: 2
# Total votes: 2
# Nodes online: 2
```

**7.2 Create a Test VM**

```bash
# Node A: create VM 999
qm create 999 --name cluster-test --memory 512 --cores 1

# Node B: verify the VM appears
qm list | grep 999

# Should appear on both (shared /etc/pve/)
```

**7.3 Migrate the VM Between Nodes (Offline)**

```bash
# Offline migration (without shared storage)
qm migrate 999 <node-b-name>

# Wait for the transfer to complete
# Verify the VM migrated
```

**7.4 Simulate a Node Failure (CAUTION)**

```bash
# In a test environment only:
# Stop cluster services on Node B
systemctl stop pve-cluster corosync

# Node A should remain functional
# But quorum is lost (2-node limitation)

# Reactivate Node B
systemctl start corosync pve-cluster

# Quorum restores automatically
```

## Output Summary

```
✅ Proxmox cluster formed: descomplicar

🖥️ Nodes:
- Node A: server.descomplicar.pt (138.201.X.X)
- Node B: cluster.descomplicar.pt (138.201.X.X)
- Total: 2 nodes

🗳️ Quorum:
- Expected votes: 2
- Active votes: 2
- Status: Active ✓

📁 Shared Config:
- /etc/pve/ replicated
- VMs visible on both nodes
- Storage config synced

💾 Storage:
- Local: ZFS rpool on each node
- Backup: PBS shared (pbs-main)
- [Future] Shared storage: Ceph or NFS

🔄 Migration:
- Offline migration: Enabled ✓
- Live migration: Enabled (no shared storage = slow)
- HA: Ready (configure with /proxmox-ha)

⚠️ Limitations (2-node cluster):
- Losing 1 node = losing quorum
- Solution: QDevice or a 3rd node
- HA needs reliable fencing

📋 Next Steps:
1. Configure HA groups (/proxmox-ha)
2. Configure fencing devices
3. Test automatic failover
4. Migrate production VMs to the cluster
5. Monitor cluster health

⏱️ Formation time: ~15min
```

## 2-Node Cluster Considerations

### Quorum Issue

**Problem:** losing 1 node means losing quorum (cluster becomes read-only).

**Mitigations:**

1. **External QDevice** (3rd vote on a small VPS)
2. **expected_votes override** (emergency only - dangerous)
3. **Add a 3rd node** (ideal)
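The arithmetic behind these mitigations is simple majority voting: corosync keeps the cluster quorate only while a strict majority of expected votes is present. A tiny local illustration of that math:

```shell
# Votes needed for quorum: floor(n/2) + 1 (strict majority of n votes)
majority() { echo $(( $1 / 2 + 1 )); }

echo "2 votes -> need $(majority 2)"  # 2: any single node loss kills quorum
echo "3 votes -> need $(majority 3)"  # 2: one failure survivable (QDevice = 3rd vote)
```

In an emergency on the sole surviving node, `pvecm expected 1` lowers the expected vote count to force quorum — dangerous, because it opens the door to split-brain; use only when the other node is confirmed down.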

### Fencing Is CRITICAL

**Problem:** split-brain (both nodes believe they are primary).

**Solution:** fencing is mandatory for HA.

- STONITH (Shoot The Other Node In The Head)
- Power fencing via IPMI/iLO
- Network fencing (less reliable)

### No Shared Storage

**Implications:**

- Live migration is slower (copies the disk)
- HA requires storage replication or accepts downtime
- VMs are "pinned" to the node where the disk lives

**Alternatives:**

- Ceph (minimum 3 nodes)
- External NFS share
- ZFS replication (pvesr)
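For the pvesr option, a sketch of setting up replication for one VM. The job id format `<vmid>-<num>` and the 15-minute schedule are illustrative, and `<target-node>` is a placeholder; ZFS is required on both sides:

```bash
# Replicate VM 999's disks to the peer node every 15 minutes
pvesr create-local-job 999-0 <target-node> --schedule "*/15"

# Inspect replication state
pvesr status
```

With replication in place, HA failover restarts the VM on the peer from the last replicated snapshot, so up to one schedule interval of data can be lost.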

## Troubleshooting

### Node join fails

```bash
# Verify connectivity
ping <other-node-ip>
ssh root@<other-node-ip>

# Verify versions match
pveversion

# Verify /etc/hosts
cat /etc/hosts
# Must contain an entry for both nodes

# Logs
journalctl -u pve-cluster -f
journalctl -u corosync -f
```

### Quorum lost

```bash
# Check status
pvecm status

# Nodes online but quorum lost:
# - Check time sync (ntpd/chrony)
# - Check network latency
# - Restart corosync

systemctl restart corosync pve-cluster
```

### Split-brain

```bash
# CRITICAL: both nodes believe they are primary

# Identify:
pvecm status  # Run on both nodes; the status differs

# Resolve:
# 1. Shut down 1 node completely
# 2. Fix networking/corosync on the online node
# 3. Rejoin the shut-down node
```

## References

- **NotebookLM:** 276ccdde-6b95-42a3-ad96-4e64d64c8d52
- **Proxmox Cluster Docs:** https://pve.proxmox.com/pve-docs/chapter-pvecm.html
- **Corosync:** https://corosync.github.io/corosync/

---

**Version:** 1.0.0 | **Author:** Descomplicar® | **Date:** 2026-02-14