---
name: gateway-check
description: Quick health check of the MCPs on gateway.descomplicar.pt. Reports service state (systemd + pm2), ports, memory/CPU and recent errors, and outputs a summary table.
context: fork
---

# /gateway-check v1.0

Quick health check of the MCPs running on the gateway server (mcp-hub.descomplicar.pt).

**Reference:** PROC-MCP-Desenvolvimento.md | Memory: `mcp-gateway.md`, `infra.md`

---

## Gateway MCP Inventory

### pm2 (Node.js, /opt/mcp-gateway/)

| pm2 ID | Name | Port | Priority |
|--------|------|------|----------|
| 0 | mcp-desk-crm | 3150 | P1 |
| 1 | mcp-memory | 3151 | P2 |
| 2 | mcp-wikijs | 3152 | P3 |
| 4 | mcp-moloni | 3158 | P2 |

### systemd (24 services)

| Service | Port | Type | Priority |
|---------|------|------|----------|
| mcp-time | 3163 | Node | P1 |
| google-workspace-mcp | 3164 | Python/FastMCP | P1 |
| n8n-mcp | 3157 | Node | P2 |
| gitea-mcp | 3162 | Go | P2 |
| gsc-mcp | 3153 | Python/FastMCP | P2 |
| google-analytics-mcp | 3156 | Python/FastMCP | P2 |
| imap-enterprise | 3155 | Node | P2 |
| context7-mcp | 3159 | Node | P3 |
| cwp-mcp | 3183 | Node/supergateway | P3 |
| cloudflare-dns-mcp | 3171 | Node/supergateway | P3 |
| mcp-youtube | 3187 | Python/FastMCP | P3 |
| youtube-research | 3184 | Node | P3 |
| magic-mcp | 3172 | Node/supergateway | P3 |
| mcp-echarts-mcp | 3173 | Node/supergateway | P3 |
| mcp-mermaid-mcp | 3174 | Node/supergateway | P3 |
| metabase-mcp | 3175 | Node/supergateway | P3 |
| pixabay-mcp | 3176 | Node/supergateway | P3 |
| replicate-mcp | 3177 | Node/supergateway | P3 |
| outline-api-mcp | 3178 | Node/supergateway | P3 |
| pexels-mcp | 3179 | Node/supergateway | P3 |
| penpot-mcp | 3180 | Node/supergateway | P3 |
| vimeo-mcp | 3181 | Node/supergateway | P3 |
| presenton-mcp | 3182 | Node/supergateway | P3 |
| mcp-reonic | 3160 | Node | P3 |

**Priorities:** P1 = critical (blocks work) | P2 = important (degrades the workflow) | P3 = useful
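
These port numbers are also typed by hand into the port loops of step 2 and Call 2, and the "X/28" total in the output depends on them. The copies can drift apart when an MCP is added. The sketch below keeps them in one place. It is a minimal sketch under that assumption: `MCP_INVENTORY`, `inventory_ports` and `inventory_by_priority` are hypothetical names and are not part of the current setup. The records are copied from the two tables above.

```bash
# Hypothetical single source of truth: one "name port manager priority" record per line,
# copied from the pm2 and systemd tables.
MCP_INVENTORY='
mcp-desk-crm 3150 pm2 P1
mcp-memory 3151 pm2 P2
mcp-wikijs 3152 pm2 P3
mcp-moloni 3158 pm2 P2
mcp-time 3163 systemd P1
google-workspace-mcp 3164 systemd P1
n8n-mcp 3157 systemd P2
gitea-mcp 3162 systemd P2
gsc-mcp 3153 systemd P2
google-analytics-mcp 3156 systemd P2
imap-enterprise 3155 systemd P2
context7-mcp 3159 systemd P3
cwp-mcp 3183 systemd P3
cloudflare-dns-mcp 3171 systemd P3
mcp-youtube 3187 systemd P3
youtube-research 3184 systemd P3
magic-mcp 3172 systemd P3
mcp-echarts-mcp 3173 systemd P3
mcp-mermaid-mcp 3174 systemd P3
metabase-mcp 3175 systemd P3
pixabay-mcp 3176 systemd P3
replicate-mcp 3177 systemd P3
outline-api-mcp 3178 systemd P3
pexels-mcp 3179 systemd P3
penpot-mcp 3180 systemd P3
vimeo-mcp 3181 systemd P3
presenton-mcp 3182 systemd P3
mcp-reonic 3160 systemd P3
'

# Print the inventory ports in ascending order, optionally only for one manager (pm2 or systemd).
inventory_ports() {
  local manager=${1:-}
  awk -v m="$manager" 'NF == 4 && (m == "" || $3 == m) { print $2 }' <<<"$MCP_INVENTORY" | sort -n
}

# Print the names of the MCPs with a given priority (P1, P2 or P3), in inventory order.
inventory_by_priority() {
  awk -v p="$1" 'NF == 4 && $4 == p { print $1 }' <<<"$MCP_INVENTORY"
}
```

For example, `for port in $(inventory_ports); do ...; done` would replace the hard-coded port list. `inventory_by_priority P1` lists the MCPs whose DOWN state must never be ignored.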

---

## Execution Protocol

### 1. Service state

```bash
# Run via mcp__ssh-unified__ssh_execute(server="gateway")

# pm2: status, CPU, memory, restarts and uptime per process
pm2 jlist 2>/dev/null | python3 -c "
import sys,json
for p in json.load(sys.stdin):
    print(f\"{p['name']:20s} {p['pm2_env']['status']:10s} cpu={p['monit']['cpu']}% mem={p['monit']['memory']//1024//1024}MB restarts={p['pm2_env']['restart_time']} uptime={round(($(date +%s)*1000-p['pm2_env']['pm_uptime'])/3600000,1)}h\")
"

# systemd: MCP units that are running or failed, then the failed ones on their own
systemctl list-units --type=service --state=running,failed --no-pager | grep -i mcp
systemctl list-units --type=service --state=failed --no-pager | grep -i mcp
```
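
`systemctl list-units` shows whether a unit is running but not how much memory it uses. The sketch below reads both values for each unit, using `systemctl is-active` and `systemctl show -p MemoryCurrent --value`. It assumes the unit names match the Service column of the systemd table. `fmt_mem_mb` and `systemd_state_mem` are hypothetical helper names. `MemoryCurrent` is only meaningful when memory accounting is enabled for the unit, so the helper prints `n/a` otherwise.

```bash
# Convert a raw MemoryCurrent value (bytes) to whole MiB.
# Non-numeric values such as "[not set]", and the 2^64-1 sentinel that some older systemd
# versions print, mean that memory accounting is off for the unit.
fmt_mem_mb() {
  local v=$1
  if [[ -z $v || $v == *[!0-9]* || ${#v} -ge 19 ]]; then
    echo "n/a"
  else
    echo "$(( 10#$v / 1024 / 1024 ))MB"
  fi
}

# Print "unit  state  memory" for each systemd unit passed as an argument.
systemd_state_mem() {
  local svc state mem
  for svc in "$@"; do
    state=$(systemctl is-active "$svc" 2>/dev/null)
    mem=$(systemctl show -p MemoryCurrent --value "$svc" 2>/dev/null)
    printf '%-24s %-10s %s\n' "$svc" "${state:-unknown}" "$(fmt_mem_mb "$mem")"
  done
}
```

`systemd_state_mem mcp-time google-workspace-mcp n8n-mcp` prints one aligned line per unit. Each line maps onto the Status and RAM columns of the output table.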

### 2. Check active ports

```bash
# Confirm that every expected port is listening
for port in 3150 3151 3152 3153 3155 3156 3157 3158 3159 3160 3162 3163 3164 3171 3172 3173 3174 3175 3176 3177 3178 3179 3180 3181 3182 3183 3184 3187; do
  if ss -tln | grep -q ":${port} "; then
    echo "OK :${port}"
  else
    echo "DOWN :${port}"
  fi
done
```
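
The loop above runs `ss` once per port, 28 times in total. The sketch below takes a single `ss -tln` snapshot and checks every port against it. `ports_status` is a hypothetical name. The only assumption about `ss` is that column 4 holds `Local Address:Port`.

```bash
# Read `ss -tln` output on stdin and print OK/DOWN for every port given as an argument.
# Column 4 is "Local Address:Port"; keeping only the text after the last ":" handles
# IPv4 (0.0.0.0:3150), IPv6 ([::]:3150) and wildcard (*:3150) addresses alike.
ports_status() {
  local listening port
  listening=$(awk '{ n = split($4, a, ":"); print a[n] }' | sort -u)
  for port in "$@"; do
    if grep -qx "$port" <<<"$listening"; then
      echo "OK :${port}"
    else
      echo "DOWN :${port}"
    fi
  done
}
```

`ss -tln | ports_status 3150 3151 3152` prints the same `OK :port` / `DOWN :port` lines as the loop above. The `ss` header line does no harm, because its fourth field (`Local`) never equals a port number.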

### 3. Memory and CPU per MCP

```bash
# Top memory consumers among MCP-related processes
ps aux --sort=-%mem | head -20 | grep -E 'node|python|supergateway|mcp'

# Total MCP RAM (RSS, column 6, is in KiB); grep -v grep stops the grep itself being counted
ps aux | grep -E 'mcp|supergateway' | grep -v grep | awk '{sum+=$6} END {printf "Total MCP RAM: %.0f MB\n", sum/1024}'

# Server load and memory
uptime
free -h
```
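
The status criteria mark an MCP as WARN above 250MB, but the commands above only give a top-20 list and a total. The sketch below lists the MCP-related processes over a threshold from one `ps -eo pid=,rss=,args=` snapshot (RSS is in KiB). `heavy_procs` is a hypothetical name, and the `mcp|supergateway` pattern is the same heuristic used above.

```bash
# Read `ps -eo pid=,rss=,args=` on stdin and print the MCP-related processes whose
# resident memory exceeds the threshold in MB (RSS is reported in KiB).
heavy_procs() {
  local threshold_mb=${1:-250}
  awk -v t="$threshold_mb" '
    /mcp|supergateway/ {
      mb = $2 / 1024
      if (mb > t) {
        # Rebuild the full command line from field 3 onwards.
        cmd = $3
        for (i = 4; i <= NF; i++) cmd = cmd " " $i
        printf "%-8s %6.0f MB  %s\n", $1, mb, cmd
      }
    }'
}
```

Run it as `ps -eo pid=,rss=,args= | heavy_procs 250`. The `awk` process itself can match the pattern, because its program text contains it, but its memory use stays far below any useful threshold.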

### 4. Recent errors (last 30 min)

```bash
# pm2 error log: last 5 lines (pm2 logs selects by line count, not by time)
pm2 logs --err --lines 5 --nostream 2>/dev/null

# systemd MCP services with errors in the last 30 minutes
for svc in $(systemctl list-units --type=service --state=running | grep -i mcp | awk '{print $1}'); do
  errs=$(journalctl -u "$svc" --since "30 min ago" -p err --no-pager -q 2>/dev/null | wc -l)
  if [ "$errs" -gt 0 ]; then
    echo "=== $svc ($errs errors) ==="
    journalctl -u "$svc" --since "30 min ago" -p err --no-pager -q 2>/dev/null | tail -3
  fi
done
```
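
`pm2 logs --err` shows what went wrong but not how often a process crashed, and the WARN criterion also counts more than 5 restarts. The sketch below reads the `restart_time` counter from `pm2 jlist` (the same field step 1 prints) and flags the processes above a limit. `pm2_restart_warn` is a hypothetical name. Like step 1, it uses `python3` to parse the JSON. The counter is cumulative rather than per time window: it keeps growing until it is reset (`pm2 reset <name>`).

```bash
# Read `pm2 jlist` JSON on stdin and print one WARN line per process whose restart
# counter (pm2_env.restart_time) is above the limit (default 5, as in the WARN criterion).
pm2_restart_warn() {
  python3 -c '
import json
import sys

limit = int(sys.argv[1])
for proc in json.load(sys.stdin):
    name = proc["name"]
    restarts = proc["pm2_env"]["restart_time"]
    if restarts > limit:
        print(f"WARN {name} restarts={restarts}")
' "${1:-5}"
}
```

`pm2 jlist | pm2_restart_warn 5` prints nothing when every process is within the limit.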

### 5. Gateway nginx health

```bash
# Check that nginx is active
systemctl is-active nginx

# Probe the health endpoint, if one exists.
# curl prints 000 when it gets no HTTP response; a 404 usually means the endpoint is not configured.
curl -s -o /dev/null -w "%{http_code}\n" --max-time 5 http://localhost/health
```
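
The sketch below maps the HTTP code to the skill's OK/WARN/DOWN vocabulary. This way `000` (no HTTP response at all) is reported separately from an HTTP error such as 404. `classify_http` and `probe` are hypothetical names. Treating any 4xx/5xx as WARN (nginx answered, but not with success) is an assumption, not an existing rule.

```bash
# Map an HTTP status code from `curl -w '%{http_code}'` to a coarse state.
# 000 (or an empty value if curl is missing) means no HTTP response at all.
classify_http() {
  case $1 in
    2??|3??) echo "OK" ;;
    000|'') echo "DOWN" ;;
    *) echo "WARN" ;;
  esac
}

# Probe a URL with a bounded timeout (default 5 s) and print "<code> <state>".
probe() {
  local url=$1 code
  code=$(curl -s -o /dev/null -w '%{http_code}' --max-time "${2:-5}" "$url" 2>/dev/null)
  echo "${code:-000} $(classify_http "$code")"
}
```

If nginx answers but no health location is configured, `probe http://localhost/health` prints, for example, `404 WARN`. Before reloading nginx after a configuration change, run `nginx -t` to check the configuration syntax.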

---

## Practical Execution

Run the 5 steps via `mcp__ssh-unified__ssh_execute(server="gateway")`. Group the commands to keep the number of SSH calls down (2-3 calls at most). Separate the commands with `;` rather than `&&`. `grep` exits non-zero when it finds nothing (for example, when no MCP unit has failed), and with `&&` that would silently skip every section after it.

**Call 1: general state**

```bash
echo "=== PM2 ==="; pm2 list 2>/dev/null; echo "=== SYSTEMD ==="; systemctl list-units --type=service --state=running --no-pager | grep -i mcp; echo "=== FAILED ==="; systemctl list-units --type=service --state=failed --no-pager | grep -i mcp; echo "=== LOAD ==="; uptime; free -h
```

**Call 2: ports + memory + errors**

```bash
echo "=== PORTS ==="; for port in 3150 3151 3152 3153 3155 3156 3157 3158 3159 3160 3162 3163 3164 3171 3172 3173 3174 3175 3176 3177 3178 3179 3180 3181 3182 3183 3184 3187; do if ss -tln | grep -q ":${port} "; then echo "OK :${port}"; else echo "DOWN :${port}"; fi; done; echo "=== MCP RAM ==="; ps aux | grep -E 'mcp|supergateway' | grep -v grep | awk '{sum+=$6} END {printf "Total: %.0f MB\n", sum/1024}'; echo "=== PM2 ERRORS ==="; pm2 logs --err --lines 3 --nostream 2>/dev/null; echo "=== SYSTEMD ERRORS (30 min) ==="; for svc in $(systemctl list-units --type=service --state=running | grep -i mcp | awk '{print $1}'); do errs=$(journalctl -u "$svc" --since "30 min ago" -p err --no-pager -q 2>/dev/null | wc -l); if [ "$errs" -gt 0 ]; then echo "--- $svc ($errs errors) ---"; journalctl -u "$svc" --since "30 min ago" -p err --no-pager -q 2>/dev/null | tail -2; fi; done
```
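
When everything runs in one SSH call, a single hung command (for example `pm2` waiting on an unresponsive daemon) stalls the whole check. The sketch below is a hypothetical `section` helper that prints the header and runs one command under coreutils `timeout`, reporting exit status 124 as a timeout. The 15-second default and the `SECTION_TIMEOUT` variable are assumptions.

```bash
# Print a section header, then run one command under `timeout` so a hung tool
# cannot stall the whole SSH call. Exit status 124 from `timeout` means the limit was hit.
section() {
  local title=$1 limit=${SECTION_TIMEOUT:-15} rc
  shift
  echo "=== ${title} ==="
  timeout "$limit" "$@"
  rc=$?
  if [ "$rc" -eq 124 ]; then
    echo "(timed out after ${limit}s)"
  fi
  # Always succeed so the next section still runs, whatever the command returned.
  return 0
}
```

`timeout` runs an executable, not a shell pipeline, so pipelines go through `bash -c`, as in `section FAILED bash -c 'systemctl list-units --type=service --state=failed --no-pager | grep -i mcp'`.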

---

## Output

Present the result as a summary table:

```markdown
## Gateway Health Check: [date via mcp-time]

**Server:** mcp-hub.descomplicar.pt | **Load:** X.XX | **RAM:** X.XG/XG | **MCP RAM:** XXXMB

### MCP status (X/28 operational)

| # | MCP | Port | Manager | Status | RAM | Notes |
|---|-----|------|---------|--------|-----|-------|
| 1 | mcp-desk-crm | 3150 | pm2 | OK/DOWN/WARN | XXMB | restarts, errors |
| ... | ... | ... | ... | ... | ... | ... |

### Alerts

- [P1] MCP X is DOWN: suggested action
- [WARN] MCP Y has N restarts in the last X h
- [WARN] Total MCP RAM > 2GB (recommended limit)

### Recent Errors

[list of errors, if any, grouped by MCP]
```
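
Once each MCP has a status, the template's "X/28 operational" header and its DOWN alerts can be derived mechanically. The sketch below assumes one `name priority status` line per MCP as input. `summarize_status` is a hypothetical name. Counting WARN as operational (only DOWN is excluded) is an assumption about what "operational" means here.

```bash
# Read "name priority status" lines (status = OK, WARN or DOWN) on stdin and print the
# status header plus one alert line per DOWN MCP.
# Assumption: WARN still counts as operational; only DOWN is excluded from the count.
summarize_status() {
  awk '
    NF == 3 {
      total++
      if ($3 == "DOWN") {
        alerts = alerts sprintf("- [%s] MCP %s is DOWN\n", $2, $1)
      } else {
        up++
      }
    }
    END {
      printf "### MCP status (%d/%d operational)\n", up, total
      printf "%s", alerts
    }'
}
```

Given the 28 inventory lines, it prints the header plus one `- [P1] MCP ... is DOWN` style line per failed MCP, ready to go under Alerts.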

**Status criteria:**

- **OK**: service running + port listening + no recent errors
- **WARN**: running, but with recent errors OR >5 restarts OR memory >250MB
- **DOWN**: service stopped OR port not listening
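
The criteria above are checked in order. DOWN takes precedence over WARN, and the thresholds are strict (>5 restarts, >250MB). The sketch below writes that decision as a hypothetical `mcp_status` function. Its inputs (yes/no flags and integer counts) are an assumed interface, to be filled from steps 1 to 4.

```bash
# Apply the OK / WARN / DOWN criteria to one MCP.
# Arguments: running (yes|no), listening (yes|no), recent error count, restart count, memory in MB.
mcp_status() {
  local running=$1 listening=$2 errors=$3 restarts=$4 mem_mb=$5
  if [ "$running" != yes ] || [ "$listening" != yes ]; then
    echo DOWN
  elif [ "$errors" -gt 0 ] || [ "$restarts" -gt 5 ] || [ "$mem_mb" -gt 250 ]; then
    echo WARN
  else
    echo OK
  fi
}
```

`mcp_status yes yes 0 7 120` prints `WARN` (too many restarts), and `mcp_status yes no 0 0 0` prints `DOWN`.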

---

## Automatic Troubleshooting

```
If an MCP is DOWN:
1. Check the service: systemctl status <name>
2. Read the logs: journalctl -u <name> --since "1h ago" --no-pager | tail -20
3. If it runs under supergateway: check the catch-errors.mjs preload (mcp-gateway.md)
4. Try a restart: systemctl restart <name>
5. Re-check the port

If total RAM > 2GB:
1. Identify the top consumers
2. Check for orphan processes: ps aux | grep -c '[s]upergateway'
3. If orphans > 28: clean up with pkill and do a staggered restart (infra.md)

If pm2 shows many restarts:
1. pm2 logs <name> --err --lines 20
2. Check whether it is the known supergateway bug (mcp-gateway.md)
```
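
Steps 4 and 5 of the DOWN procedure (restart, then re-check the port) are easy to get wrong by checking the port before the service has finished starting. The sketch below uses hypothetical helpers that restart one unit and poll its port once per second until it accepts a connection. It uses bash's `/dev/tcp` redirection, so no extra tools are needed. It assumes the service listens on 127.0.0.1 and that 30 seconds is a reasonable start-up window.

```bash
# Poll a TCP port once per second until it accepts a connection or the timeout expires.
# Bash opens a TCP connection when redirecting to /dev/tcp/HOST/PORT; intended for localhost.
wait_for_port() {
  local host=$1 port=$2 timeout=${3:-30} i
  for ((i = 0; i < timeout; i++)); do
    if (exec 3<>"/dev/tcp/${host}/${port}") 2>/dev/null; then
      return 0
    fi
    sleep 1
  done
  return 1
}

# Restart ONE systemd unit and confirm its port is back before moving on.
restart_and_verify() {
  local svc=$1 port=$2
  systemctl restart "$svc" || return 1
  if wait_for_port 127.0.0.1 "$port" 30; then
    echo "OK $svc :$port"
  else
    echo "DOWN $svc :$port (see: journalctl -u $svc -n 20)"
    return 1
  fi
}
```

To follow the "never mass restart" rule, restart one unit at a time and stop at the first failure: `while read -r svc port; do restart_and_verify "$svc" "$port" || break; sleep 5; done`, fed with `name port` pairs.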

---

## Anti-Patterns

- **Never** do a mass restart without checking first (it can cause downtime)
- **Never** ignore a P1 MCP in DOWN state
- **Always** report the status, even when everything is OK (it confirms the check ran)
- **Always** include a timestamp from mcp-time in the output

---

## Integration

- **/today** can invoke `/gateway-check` as part of the daily checkup
- **/infra-check** does a broader check (including expenses); `/gateway-check` focuses only on the gateway MCPs
- The result can be posted to discussion #31 (Logs) of project #65

---

*Skill v1.0.0 | 12-03-2026 | Descomplicar*