fix(project-manager): remove Dify KB from descriptions, add TODO note

Dify was removed on 06-03-2026. The brainstorm/discover skills still reference it
in their body. Bump to v1.2 + top-of-file note. Workflow rewrite deferred to the next session.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-07 04:52:03 +01:00
parent 6285be6c2e
commit faef9b47dc
185 changed files with 9238 additions and 589 deletions
+164 -163
@@ -1,220 +1,221 @@
---
name: gateway-check
description: Quick health check of the MCPs on gateway.descomplicar.pt, covering service state (systemd+pm2), ports, memory/CPU, and recent errors. Outputs a summary table.
description: Full management of the MCPs on gateway.descomplicar.pt, covering health check, restart, troubleshooting, port map, and adding MCPs. Use when an MCP fails, for a health check, or for gateway management.
context: fork
---
# /gateway-check v1.0
# /gateway-check v2.0
Quick health check of the MCPs on the gateway server (mcp-hub.descomplicar.pt).
Management and health check of the MCPs on the gateway server (VM 103, gateway.descomplicar.pt).
**Reference:** PROC-MCP-Desenvolvimento.md | Memory: `mcp-gateway.md`, `infra.md`
**Reference:** Memory `mcp-gateway.md` | PROC-MCP-Desenvolvimento.md
---
## Gateway MCP Inventory
## Access
- **VM:** 103 on Proxmox (QEMU)
- **IP:** 5.9.90.69
- **SSH:** `mcp__ssh-unified__ssh_execute(server="gateway")`
- **HTTPS:** `https://gateway.descomplicar.pt/v1/<nome>/mcp`
- **Nginx whitelist:** 188.251.199.30 (NOS static IP). On a 403, check the client IP with `curl -4 ifconfig.me`
- **Nginx config:** `/etc/nginx/sites-enabled/` on the gateway
- **Do NOT confuse with:** server (VM 100, 5.9.90.105), easy (VM 101, 5.9.90.70), dev (LXC 102)
---
## MCP Map (30 services, updated 28-03-2026)
### pm2 (Node.js — /opt/mcp-gateway/)
| pm2 ID | Name | Port | Priority |
|--------|------|-------|------------|
| 0 | mcp-desk-crm | 3150 | P1 |
| 1 | mcp-memory | 3151 | P2 |
| 2 | mcp-wikijs | 3152 | P3 |
| 4 | mcp-moloni | 3158 | P2 |
| pm2 ID | Name | Port | nginx path | Priority |
|--------|------|-------|------------|------------|
| 0 | mcp-desk-crm | 3150 | /v1/desk-crm/mcp | P1 |
| 1 | mcp-memory | 3151 | /v1/memory/mcp | P1 |
| 2 | mcp-wikijs | 3152 | /v1/wikijs/mcp | P3 |
| 5 | mcp-moloni | 3158 | /v1/moloni/mcp | P2 |
| 6 | mcp-youtube-research | 3157 | /v1/youtube-research/mcp | P3 |
| 7 | mcp-youtube | 3187 | /v1/youtube/mcp | P3 |
### systemd (24 services)
| Service | Port | Type | Priority |
|---------|-------|------|------------|
| mcp-time | 3163 | Node | P1 |
| google-workspace-mcp | 3164 | Python/FastMCP | P1 |
| n8n-mcp | 3157 | Node | P2 |
| gitea-mcp | 3162 | Go | P2 |
| gsc-mcp | 3153 | Python/FastMCP | P2 |
| google-analytics-mcp | 3156 | Python/FastMCP | P2 |
| imap-enterprise | 3155 | Node | P2 |
| context7-mcp | 3159 | Node | P3 |
| cwp-mcp | 3183 | Node/supergateway | P3 |
| cloudflare-dns-mcp | 3171 | Node/supergateway | P3 |
| mcp-youtube | 3187 | Python/FastMCP | P3 |
| youtube-research | 3184 | Node | P3 |
| magic-mcp | 3172 | Node/supergateway | P3 |
| mcp-echarts-mcp | 3173 | Node/supergateway | P3 |
| mcp-mermaid-mcp | 3174 | Node/supergateway | P3 |
| metabase-mcp | 3175 | Node/supergateway | P3 |
| pixabay-mcp | 3176 | Node/supergateway | P3 |
| replicate-mcp | 3177 | Node/supergateway | P3 |
| outline-api-mcp | 3178 | Node/supergateway | P3 |
| pexels-mcp | 3179 | Node/supergateway | P3 |
| penpot-mcp | 3180 | Node/supergateway | P3 |
| vimeo-mcp | 3181 | Node/supergateway | P3 |
| presenton-mcp | 3182 | Node/supergateway | P3 |
| mcp-reonic | 3160 | Node | P3 |
| Service | Port | nginx path | Type | Priority |
|---------|-------|------------|------|------------|
| google-workspace-mcp | 3164 | /v1/google-workspace/mcp | Python/FastMCP | P1 |
| mcp-time | 3163 | /v1/mcp-time/mcp | Node | P1 |
| imap-enterprise | 3160 | /v1/imap/mcp | Node | P2 |
| gitea-mcp | 3162 | /v1/gitea/mcp | Go | P2 |
| n8n-mcp | 3161 | /v1/n8n/mcp | Node | P2 |
| gsc-mcp | 3153 | /v1/gsc/mcp | Python/FastMCP | P2 |
| google-analytics-mcp | 3156 | /v1/google-analytics/mcp | Python/FastMCP | P2 |
| context7-mcp | 3159 | /v1/context7/mcp | Node | P2 |
| mcp-reonic | 3170 | /v1/reonic/mcp | Node | P3 |
| cloudflare-dns-mcp | 3171 | /v1/cloudflare-dns/mcp | supergateway | P3 |
| magic-mcp | 3172 | /v1/magic/mcp | supergateway | P3 |
| mcp-echarts-mcp | 3173 | /v1/mcp-echarts/mcp | supergateway | P3 |
| mcp-mermaid-mcp | 3174 | /v1/mcp-mermaid/mcp | supergateway | P3 |
| metabase-mcp | 3175 | /v1/metabase/mcp | supergateway | P3 |
| pixabay-mcp | 3176 | /v1/pixabay/mcp | supergateway | P3 |
| replicate-mcp | 3177 | /v1/replicate/mcp | supergateway | P3 |
| outline-api-mcp | 3178 | /v1/outline-api/mcp | supergateway | P3 |
| pexels-mcp | 3179 | /v1/pexels/mcp | supergateway | P3 |
| penpot-mcp | 3180 | /v1/penpot/mcp | supergateway | P3 |
| vimeo-mcp | 3181 | /v1/vimeo/mcp | supergateway | P3 |
| presenton-mcp | 3182 | /v1/presenton/mcp | supergateway | P3 |
| cwp-mcp | 3183 | /v1/cwp/mcp | supergateway | P3 |
| design-engine-mcp | 3184 | /v1/design-engine/mcp | supergateway | P3 |
**Priorities:** P1=critical (blocks work) | P2=important (degrades workflow) | P3=useful
**Next free port:** 3188
---
## Execution Protocol
## Health Check Protocol
### 1. Service state
Run via `mcp__ssh-unified__ssh_execute(server="gateway")` in 2 calls:
```bash
# Run via mcp__ssh-unified__ssh_execute(server="gateway")
### Call 1: general state
# pm2
pm2 jlist 2>/dev/null | python3 -c "
import sys,json
for p in json.load(sys.stdin):
    print(f\"{p['name']:20s} {p['pm2_env']['status']:10s} cpu={p['monit']['cpu']}% mem={p['monit']['memory']//1024//1024}MB restarts={p['pm2_env']['restart_time']} uptime={round(($(date +%s)*1000-p['pm2_env']['pm_uptime'])/3600000,1)}h\")
"
# systemd: state + memory
systemctl list-units --type=service --state=running,failed --no-pager | grep -i mcp
systemctl list-units --type=service --state=failed --no-pager | grep -i mcp
```
### 2. Check active ports
```bash
# Confirm that all expected ports are listening
for port in 3150 3151 3152 3153 3155 3156 3157 3158 3159 3160 3162 3163 3164 3171 3172 3173 3174 3175 3176 3177 3178 3179 3180 3181 3182 3183 3184 3187; do
if ss -tln | grep -q ":${port} "; then
echo "OK :${port}"
else
echo "DOWN :${port}"
fi
done
```
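The loop above calls `ss -tln` once per port. As a minimal sketch (the function name `check_ports` is hypothetical, not part of the skill), the same check can read a single `ss -tln` listing from stdin and report each expected port, which also makes it easy to try against captured output:

```shell
# check_ports PORT...
# Reads an `ss -tln` style listing on stdin and prints OK/DOWN for each port.
# Hypothetical helper: the name and interface are assumptions for illustration.
check_ports() {
  local listing port
  listing="$(cat)"
  for port in "$@"; do
    # A listening socket shows ":<port>" followed by whitespace in the local address column.
    if printf '%s\n' "$listing" | grep -Eq ":${port}[[:space:]]"; then
      echo "OK :${port}"
    else
      echo "DOWN :${port}"
    fi
  done
}
```

On the gateway it would be used as `ss -tln | check_ports 3150 3151 3152`, with the port list taken from the map above.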
### 3. Memory and CPU per MCP
```bash
# Top memory consumers
ps aux --sort=-%mem | head -20 | grep -E 'node|python|supergateway|mcp'
# Total MCP memory (grep -v grep keeps the grep process itself out of the sum)
ps aux | grep -E 'mcp|supergateway' | grep -v grep | awk '{sum+=$6} END {printf "Total MCP RAM: %.0f MB\n", sum/1024}'
# Server load
uptime
free -h
```
### 4. Recent errors (last 30 min)
```bash
# pm2 logs with errors
pm2 logs --err --lines 5 --nostream 2>/dev/null
# systemd services with recent errors
for svc in $(systemctl list-units --type=service --state=running | grep -i mcp | awk '{print $1}'); do
errs=$(journalctl -u $svc --since "30 min ago" -p err --no-pager -q 2>/dev/null | wc -l)
if [ "$errs" -gt 0 ]; then
echo "=== $svc ($errs erros) ==="
journalctl -u $svc --since "30 min ago" -p err --no-pager -q 2>/dev/null | tail -3
fi
done
```
### 5. Gateway nginx health
```bash
# Check that nginx is active
systemctl is-active nginx
# Test the health endpoint (if it exists)
curl -s -o /dev/null -w "%{http_code}" http://localhost/health 2>/dev/null || echo "no-health-endpoint"
```
---
## Practical Execution
Run the 5 steps via `mcp__ssh-unified__ssh_execute(server="gateway")`. Group commands to minimize SSH calls (at most 2-3 calls).
**Call 1 (general state):**
```bash
# ';' separators: an empty grep (e.g. no failed units) must not abort the rest of the chain
echo "=== PM2 ==="; pm2 list 2>/dev/null; echo "=== SYSTEMD ==="; systemctl list-units --type=service --state=running --no-pager | grep -i mcp; echo "=== FAILED ==="; systemctl list-units --type=service --state=failed --no-pager | grep -i mcp; echo "=== LOAD ==="; uptime; free -h
```
**Call 2 (ports + memory + errors):**
### Call 2: ports + errors
```bash
echo "=== PORTAS ===" && for port in 3150 3151 3152 3153 3155 3156 3157 3158 3159 3160 3162 3163 3164 3171 3172 3173 3174 3175 3176 3177 3178 3179 3180 3181 3182 3183 3184 3187; do if ss -tln | grep -q ":${port} "; then echo "OK :${port}"; else echo "DOWN :${port}"; fi; done && echo "=== RAM MCPs ===" && ps aux | grep -E 'mcp|supergateway' | grep -v grep | awk '{sum+=$6} END {printf "Total: %.0f MB\n", sum/1024}' && echo "=== PM2 ERROS ===" && pm2 logs --err --lines 3 --nostream 2>/dev/null && echo "=== SYSTEMD ERROS (30min) ===" && for svc in $(systemctl list-units --type=service --state=running | grep -i mcp | awk '{print $1}'); do errs=$(journalctl -u $svc --since "30 min ago" -p err --no-pager -q 2>/dev/null | wc -l); if [ "$errs" -gt 0 ]; then echo "--- $svc ($errs erros) ---"; journalctl -u $svc --since "30 min ago" -p err --no-pager -q 2>/dev/null | tail -2; fi; done
echo "=== PORTAS ===" && for port in 3150 3151 3152 3153 3156 3157 3158 3159 3160 3161 3162 3163 3164 3170 3171 3172 3173 3174 3175 3176 3177 3178 3179 3180 3181 3182 3183 3184 3187; do if ss -tln | grep -q ":${port} "; then echo "OK :${port}"; else echo "DOWN :${port}"; fi; done && echo "=== RAM MCPs ===" && ps aux | grep -E 'mcp|supergateway' | grep -v grep | awk '{sum+=$6} END {printf "Total: %.0f MB\n", sum/1024}' && echo "=== ERROS (30min) ===" && for svc in $(systemctl list-units --type=service --state=running | grep -i mcp | awk '{print $1}'); do errs=$(journalctl -u $svc --since "30 min ago" -p err --no-pager -q 2>/dev/null | wc -l); if [ "$errs" -gt 0 ]; then echo "--- $svc ($errs) ---"; journalctl -u $svc --since "30 min ago" -p err --no-pager -q 2>/dev/null | tail -2; fi; done && echo "=== PM2 ERROS ===" && pm2 logs --err --lines 3 --nostream 2>/dev/null
```
### Expected output
Present as a summary table, with the date obtained via mcp-time:
```
## Gateway Health - [date]
Server: gateway.descomplicar.pt | Load: X.XX | RAM: X.XG/XG | MCPs RAM: XXXMB
X/30 operational | Alerts: N
```
**Criteria:** OK=running + port listening | WARN=running with errors or >5 restarts | DOWN=stopped or port closed
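The criteria above can be expressed as a small decision function. This is only a sketch: the name `classify_mcp` and its four-argument interface are assumptions for illustration, and the caller is expected to gather the inputs (service state, port check, error count, restart count) from the earlier steps.

```shell
# classify_mcp RUNNING PORT_OPEN RECENT_ERRORS RESTARTS
#   RUNNING, PORT_OPEN: 1 (yes) or 0 (no)
#   RECENT_ERRORS, RESTARTS: non-negative integers
# Hypothetical helper that mirrors the OK/WARN/DOWN criteria stated above.
classify_mcp() {
  local running="$1" port_open="$2" errors="$3" restarts="$4"
  if [ "$running" -ne 1 ] || [ "$port_open" -ne 1 ]; then
    echo "DOWN"   # stopped, or port closed
  elif [ "$errors" -gt 0 ] || [ "$restarts" -gt 5 ]; then
    echo "WARN"   # running, but with errors or more than 5 restarts
  else
    echo "OK"     # running and port listening
  fi
}
```

DOWN is checked first so that a stopped service with logged errors is never reported as a mere warning.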
---
## Operations
### Restart an MCP
```bash
# pm2
pm2 restart <nome>
# systemd
systemctl restart <nome>.service
```
### View the logs of an MCP
```bash
# pm2
pm2 logs <nome> --lines 30 --nostream
# systemd
journalctl -u <nome>.service --since "1h ago" --no-pager | tail -30
```
### Test a specific endpoint
```bash
# Internally, on the gateway
curl -s http://127.0.0.1:<porta>/mcp -X POST \
-H "Content-Type: application/json" \
-H "Accept: application/json, text/event-stream" \
-d '{"jsonrpc":"2.0","id":1,"method":"initialize","params":{"protocolVersion":"2024-11-05","capabilities":{},"clientInfo":{"name":"test","version":"0.1"}}}'
# Externally
curl -s https://gateway.descomplicar.pt/v1/<nome>/mcp -X POST \
-H "Content-Type: application/json" \
-H "Accept: application/json, text/event-stream" \
-d '{"jsonrpc":"2.0","id":1,"method":"initialize","params":{"protocolVersion":"2024-11-05","capabilities":{},"clientInfo":{"name":"test","version":"0.1"}}}'
```
---
## Output
## Troubleshooting
Present the result as a summary table:
### MCP DOWN
```markdown
## Gateway Health Check - [date via mcp-time]
1. Check the service: `systemctl status <nome>` or `pm2 show <nome>`
2. View logs: `journalctl -u <nome> --since "1h ago" --no-pager | tail -30`
3. If supergateway: check the `catch-errors.mjs` preload (see below)
4. Try a restart: `systemctl restart <nome>`
5. Re-check the port: `ss -tln | grep :<porta>`
**Server:** mcp-hub.descomplicar.pt | **Load:** X.XX | **RAM:** X.XG/XG | **MCPs RAM:** XXXMB
### Supergateway crash (known bug)
### MCP State (X/28 operational)
- **Error:** `No connection established for request ID: 0`
- **Fix:** preload script at `/opt/mcp-gateway/supergateway-catch-errors.mjs`
- **Activation:** `Environment="NODE_OPTIONS=--import /opt/mcp-gateway/supergateway-catch-errors.mjs"` in the unit file
- **15 services patched:** all the supergateway services in the table above
- **When adding a new supergateway:** adding this line to the unit file is MANDATORY
| # | MCP | Port | Manager | State | RAM | Notes |
|---|-----|-------|--------|--------|-----|-------|
| 1 | mcp-desk-crm | 3150 | pm2 | OK/DOWN/WARN | XXmb | restarts, errors |
| ... | ... | ... | ... | ... | ... | ... |
### FastMCP Python + nginx (DNS rebinding)
### Alerts
- [P1] MCP X is DOWN: suggested action
- [WARN] MCP Y has N restarts in the last Xh
- [WARN] Total MCP RAM > 2GB (recommended limit)
### Recent Errors
[list of errors, if any, grouped by MCP]
FastMCP 1.26+ blocks external Host headers. In nginx:
```nginx
proxy_set_header Host "127.0.0.1:<PORTA>"; # CORRECT
# proxy_set_header Host $host; # WRONG: FastMCP blocks it
```
**Detect:** If the MCP returns `Invalid Host header` over HTTPS but works with `curl localhost`, this is the cause.
**State criteria:**
- **OK**: service running + port listening + no recent errors
- **WARN**: running but with recent errors OR >5 restarts OR memory >250MB
- **DOWN**: service stopped OR port not listening
### 403 Forbidden
The IP is not in the nginx whitelist. Check with: `curl -4 ifconfig.me`
Authorized IP: `188.251.199.30`. Update it in `/etc/nginx/sites-enabled/` if it has changed.
### Total RAM > 2GB
1. Identify the top consumers: `ps aux --sort=-%mem | head -20 | grep -E 'node|python|supergateway'`
2. Orphan processes: `ps aux | grep -c supergateway`
3. If orphans > 24: `pkill -f supergateway` followed by a staggered restart
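The staggered restart in step 3 is not spelled out here. One possible shape, sketched as a dry run that only prints the commands so they can be reviewed before anything is restarted (the function name `staggered_restart_plan` and the delay value are assumptions, not part of the skill):

```shell
# staggered_restart_plan DELAY_SECONDS SERVICE...
# Prints one restart command per service, separated by a pause.
# Hypothetical helper: nothing is executed, the output is a plan to review.
staggered_restart_plan() {
  local delay="$1" svc
  shift
  for svc in "$@"; do
    echo "systemctl restart ${svc}.service"
    echo "sleep ${delay}"
  done
}
```

For example, `staggered_restart_plan 5 magic-mcp pexels-mcp` prints two restart commands with a 5-second pause after each. Printing rather than executing keeps the operator in the loop, in line with the rule against mass restarts without checking first.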
---
## Automatic Troubleshooting
## Adding a new MCP to the gateway
```
If an MCP is DOWN:
1. Check the service: systemctl status <nome>
2. View logs: journalctl -u <nome> --since "1h ago" --no-pager | tail -20
3. If supergateway: check the catch-errors.mjs preload (mcp-gateway.md)
4. Try a restart: systemctl restart <nome>
5. Re-check the port
If total RAM > 2GB:
1. Identify the top consumers
2. Check for orphan processes: ps aux | grep -c supergateway
3. If orphans > 28: clean up with pkill and do a staggered restart (infra.md)
If pm2 shows many restarts:
1. pm2 logs <nome> --err --lines 20
2. Check whether it is the known supergateway bug (mcp-gateway.md)
```
1. Install in `/opt/mcp-gateway/<nome>/` (Node) or `/opt/mcp-<nome>/` (Python)
2. Port: the next free one (currently **3188**)
3. Create the systemd unit file (if supergateway: include the catch-errors preload)
4. Create the nginx block (if FastMCP Python: the Host header fix is mandatory)
5. `systemctl daemon-reload && systemctl enable --now <nome>.service`
6. `nginx -t && systemctl reload nginx`
7. Test internally and externally (see the curl commands above)
8. Add to `~/.claude.json`: `{"type":"http","url":"https://gateway.descomplicar.pt/v1/<nome>/mcp"}`
9. Update this skill (port map + next free port)
10. Update memory `mcp-gateway.md`
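Steps 2 and 8 of the list above are mechanical and can be sketched as two small helpers. Both names (`next_free_port`, `claude_json_entry`) are hypothetical, and "next free port" is assumed here to mean "highest port in use plus one", which matches the 3188 quoted above but does not reuse gaps left by removed MCPs.

```shell
# next_free_port: reads one port number per line on stdin, prints the highest + 1.
# Assumption: ports are allocated upwards and gaps are not reused.
next_free_port() {
  sort -n | tail -n 1 | awk '{ print $1 + 1 }'
}

# claude_json_entry NAME: prints the ~/.claude.json entry for an MCP behind the gateway.
# The URL pattern is the one given in the list above.
claude_json_entry() {
  printf '{"type":"http","url":"https://gateway.descomplicar.pt/v1/%s/mcp"}\n' "$1"
}
```

Feeding the ports from the map into `next_free_port` (for example `printf '3150\n3184\n3187\n' | next_free_port`) yields the port to assign in step 2.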
---
## Anti-Patterns
- **Never** do a mass restart without checking first (it can cause downtime)
- **Never** do a mass restart without checking first
- **Never** ignore a P1 MCP in the DOWN state
- **Always** report the state even if everything is OK (it confirms the check ran)
- **Always** include a timestamp via mcp-time in the output
- **Never** confuse the gateway (VM 103) with dev/server/easy
- **Always** report the state even if everything is OK
- **Always** test the endpoint after a restart
- **Always** update the port map when adding/removing MCPs
---
## Integration
- **/today** can invoke `/gateway-check` as part of the daily checkup
- **/infra-check** performs a broader check (it includes expenses); `/gateway-check` focuses only on the gateway MCPs
- The result can be published in discussion #31 (Logs) of project #65
*Skill v2.0.0 | 28-03-2026 | Descomplicar*
---
*Skill v1.0.0 | 12-03-2026 | Descomplicar*
## Healing Log
Log of known errors and how to avoid them. Read automatically before execution.
```jsonl
{"date":"","issue":"","fix":"","source":"user|auto"}
```
*Add a new line after each fixed error.*