# Sync Workflow - /kb-sync

Documentation for the automatic synchronization of the knowledge index.

---

## Overview

```
┌─────────────────┐
│   /kb-sync      │
└────────┬────────┘
         │
         ▼
┌─────────────────┐
│   List Dify     │ ← mcp__dify-kb__dify_kb_list_datasets
│    datasets     │
└────────┬────────┘
         │
         ▼
┌─────────────────┐
│  Read sources   │ ← config/sources.json
│     .json       │
└────────┬────────┘
         │
         ▼
┌─────────────────┐
│  Compare and    │
│     detect      │
│    changes      │
└────────┬────────┘
         │
         ▼
┌─────────────────┐
│     Update      │
│  sources.json   │
└────────┬────────┘
         │
         ▼
┌─────────────────┐
│    Report       │
│    changes      │
└─────────────────┘
```

---

## Detailed Process

### 1. Fetch Current Datasets

```
MCP: mcp__dify-kb__dify_kb_list_datasets
Params:
  - limit: 100
  - page: 1 (increment while has_more is true)
```

**Expected response:**
```json
{
  "success": true,
  "datasets": [
    {
      "id": "uuid",
      "name": "Dataset name",
      "description": "...",
      "document_count": 123,
      "word_count": 456789,
      "created_at": "2025-01-25T..."
    }
  ],
  "pagination": {
    "total": 73,
    "has_more": false
  }
}
```

### 2. Read the Current Index

Read `config/sources.json` and extract:
- `dify_datasets` - current mapping (slug to dataset entry)
- `last_sync` - timestamp of the last synchronization
- `total_datasets` - dataset count from the previous sync
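
A minimal sketch of this read step, assuming `config/sources.json` is a single JSON object with those three keys at the top level. The function name `load_index` and the fallback defaults are illustrative choices, not part of the workflow.

```python
import json
from pathlib import Path


def load_index(path: str = "config/sources.json") -> dict:
    """Read the index file and pull out the fields the sync needs."""
    data = json.loads(Path(path).read_text(encoding="utf-8"))
    return {
        # Defaults are an assumption for a first run with an empty index.
        "dify_datasets": data.get("dify_datasets", {}),
        "last_sync": data.get("last_sync"),
        "total_datasets": data.get("total_datasets", 0),
    }
```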

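### 3. Compare and Detect

#### New Datasets
```python
# `dify_response` is the parsed JSON from step 1; `sources` is the parsed index.
# A dataset is new when its slug is not yet in the index.
novos = []
for dataset in dify_response["datasets"]:
    slug = slugify(dataset["name"])
    if slug not in sources["dify_datasets"]:
        novos.append({
            "slug": slug,
            "id": dataset["id"],
            "name": dataset["name"],
            "docs": dataset["document_count"],
        })
```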

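#### Removed Datasets
```python
# An indexed dataset is removed when its ID no longer appears in the Dify response.
removidos = []
dify_ids = {d["id"] for d in dify_response["datasets"]}
for slug, info in sources["dify_datasets"].items():
    if info["id"] not in dify_ids:
        removidos.append(slug)
```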

#### Changed Datasets
```python
# A dataset is changed when its document count differs from the indexed one.
alterados = []
for dataset in dify_response["datasets"]:
    slug = slugify(dataset["name"])
    if slug in sources["dify_datasets"]:
        old = sources["dify_datasets"][slug]
        if old["docs"] != dataset["document_count"]:
            alterados.append({
                "slug": slug,
                "old_docs": old["docs"],
                "new_docs": dataset["document_count"],
            })
```
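
Because the comparisons above match new and changed datasets by slug but removed ones by ID, a renamed dataset shows up as new while its old slug is never flagged as removed. One possible variant, which is not part of the workflow as documented, is to match everything on `id`, which also surfaces renames. The function name `diff_by_id` and the return shape are assumptions of this sketch.

```python
def diff_by_id(current: list[dict], indexed: dict[str, dict]) -> dict[str, list]:
    """Compare the Dify listing with the index, matching on dataset ID.

    current: dataset dicts from the listing (id, name, document_count).
    indexed: slug -> {"id", "name", "docs"} as stored in the index.
    """
    indexed_by_id = {info["id"]: (slug, info) for slug, info in indexed.items()}
    current_ids = {d["id"] for d in current}

    new = [d["name"] for d in current if d["id"] not in indexed_by_id]
    removed = [slug for slug, info in indexed.items() if info["id"] not in current_ids]

    renamed, changed = [], []
    for d in current:
        if d["id"] not in indexed_by_id:
            continue
        slug, info = indexed_by_id[d["id"]]
        if info["name"] != d["name"]:
            renamed.append((info["name"], d["name"]))
        if info["docs"] != d["document_count"]:
            changed.append((slug, info["docs"], d["document_count"]))

    return {"new": new, "removed": removed, "renamed": renamed, "changed": changed}
```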

### 4. Update sources.json

```python
# `sources` is the parsed config/sources.json; `dify_response` is the JSON from step 1.
# Update metadata, using one timestamp so `updated` and `last_sync` agree
timestamp = now()
sources["version"] = increment_minor(sources["version"])
sources["updated"] = timestamp
sources["last_sync"] = timestamp
sources["total_datasets"] = len(dify_response["datasets"])

# Drop datasets that no longer exist in Dify
for slug in removidos:
    del sources["dify_datasets"][slug]

# Refresh existing entries and add new ones
for dataset in dify_response["datasets"]:
    slug = slugify(dataset["name"])
    sources["dify_datasets"][slug] = {
        "id": dataset["id"],
        "name": dataset["name"],
        "docs": dataset["document_count"],
    }

# Write the file
write_json("config/sources.json", sources)
```
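
The helpers `increment_minor`, `now`, and `write_json` are used above without being defined. The following is one way they might look, under two assumptions that the source does not confirm: the version is a dot-separated string whose second component is the minor number, and timestamps are stored as ISO 8601 strings in UTC. Writing through a temporary file and `os.replace` is a design choice of this sketch so that an interrupted write cannot leave a half-written index.

```python
import json
import os
import tempfile
from datetime import datetime, timezone


def increment_minor(version: str) -> str:
    """Bump the second dot-separated component, e.g. "1.4" -> "1.5"."""
    parts = version.split(".")
    parts[1] = str(int(parts[1]) + 1)
    # Reset any components after the minor number (assumed convention).
    parts[2:] = ["0"] * len(parts[2:])
    return ".".join(parts)


def now() -> str:
    """Current UTC time as an ISO 8601 string (assumed timestamp format)."""
    return datetime.now(timezone.utc).isoformat(timespec="seconds")


def write_json(path: str, data: dict) -> None:
    """Write JSON to a temp file in the same directory, then swap it in."""
    directory = os.path.dirname(path) or "."
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    with os.fdopen(fd, "w", encoding="utf-8") as handle:
        json.dump(data, handle, ensure_ascii=False, indent=2)
        handle.write("\n")
    os.replace(tmp_path, path)
```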

### 5. Generate Report

```markdown
## Sync Complete - 2026-01-25 15:30

| Metric | Value |
|--------|-------|
| **Total Datasets** | 73 |
| **Previous** | 74 |
| **New** | 0 |
| **Removed** | 1 |
| **Updated** | 5 |

### Removed Datasets
- Demo Teste Agosto 2025

### Updated Datasets
| Dataset | Docs Before | Docs After |
|---------|-------------|------------|
| marketing-digital | 81 | 85 |
| crocoblock-kb | 403 | 410 |

### New Datasets
(none)
```
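
A report like the one above could be assembled from the three lists produced in step 3. This is a sketch only: the function name `render_report` is hypothetical, and removed datasets are listed by slug here because the `removidos` list from step 3 holds slugs rather than display names.

```python
def render_report(timestamp, total, previous, novos, removidos, alterados):
    """Build the Markdown sync report from the step 3 comparison results."""
    lines = [
        f"## Sync Complete - {timestamp}",
        "",
        "| Metric | Value |",
        "|--------|-------|",
        f"| **Total Datasets** | {total} |",
        f"| **Previous** | {previous} |",
        f"| **New** | {len(novos)} |",
        f"| **Removed** | {len(removidos)} |",
        f"| **Updated** | {len(alterados)} |",
        "",
        "### Removed Datasets",
    ]
    lines += [f"- {slug}" for slug in removidos] or ["(none)"]

    lines += ["", "### Updated Datasets"]
    if alterados:
        lines += [
            "| Dataset | Docs Before | Docs After |",
            "|---------|-------------|------------|",
        ]
        lines += [
            f"| {a['slug']} | {a['old_docs']} | {a['new_docs']} |" for a in alterados
        ]
    else:
        lines.append("(none)")

    lines += ["", "### New Datasets"]
    lines += [f"- {n['name']}" for n in novos] or ["(none)"]
    return "\n".join(lines)
```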

---

## Slugify

Function that converts a dataset name into a consistent slug:

```javascript
function slugify(name) {
  return name
    .toLowerCase()
    .normalize('NFD')
    .replace(/[\u0300-\u036f]/g, '')  // Strip combining accent marks
    .replace(/[^a-z0-9]+/g, '-')       // Replace non-alphanumeric runs with a hyphen
    .replace(/^-+|-+$/g, '');          // Trim leading and trailing hyphens
}
```

**Examples:**
- "Marketing Digital" → "marketing-digital"
- "SEO (Search Engine Optimization)" → "seo-search-engine-optimization"
- "TI (Tecnologia da Informação)" → "ti-tecnologia-da-informacao"
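
Since the comparison snippets in this document are written in Python and call `slugify`, a Python port of the same steps may be useful. This is a sketch intended to mirror the JavaScript version; the two should agree on the examples above, though lowercasing of unusual Unicode characters can differ slightly between the two languages.

```python
import re
import unicodedata


def slugify(name: str) -> str:
    """Lowercase, strip accents, and hyphenate a dataset name."""
    # NFD splits accented letters into a base letter plus combining marks.
    text = unicodedata.normalize("NFD", name.lower())
    text = re.sub(r"[\u0300-\u036f]", "", text)   # drop the combining marks
    text = re.sub(r"[^a-z0-9]+", "-", text)       # non-alphanumeric runs -> "-"
    return text.strip("-")                        # trim leading/trailing hyphens
```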

---

## Recommended Frequency

| Scenario | Frequency |
|----------|-----------|
| Normal use | Weekly |
| After creating datasets | Immediately |
| After removing datasets | Immediately |
| Debugging | As needed |

---

## Common Errors

| Error | Cause | Fix |
|-------|-------|-----|
| MCP timeout | Too many datasets | Paginate requests |
| Invalid JSON | File corruption | Restore from backup |
| ID not found | Dataset removed | Run /kb-sync |
| Duplicates | Different names that produce the same slug | Check slugs manually |
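
The manual slug check for duplicates can be partly automated by grouping datasets by slug before writing the index. A minimal sketch, in which `find_slug_collisions` is a hypothetical helper and the slug function is passed in as a parameter:

```python
from collections import defaultdict
from typing import Callable


def find_slug_collisions(
    datasets: list[dict], slugify: Callable[[str], str]
) -> dict[str, list[str]]:
    """Return slug -> dataset IDs for every slug shared by more than one dataset."""
    ids_by_slug: dict[str, list[str]] = defaultdict(list)
    for dataset in datasets:
        ids_by_slug[slugify(dataset["name"])].append(dataset["id"])
    return {slug: ids for slug, ids in ids_by_slug.items() if len(ids) > 1}
```

An empty result means every dataset maps to a unique slug; otherwise the colliding entries would overwrite each other in `dify_datasets`.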

---

## Automatic Backup

Before each sync, create a backup:
```
config/sources.json.bak
```

Restore it if needed:
```bash
cp config/sources.json.bak config/sources.json
```

---

*Workflow v1.0 | 2026-01-25*