scripts/scraper/test_playwright.py
ealmeida 8e0dbbeca0 feat(bizin): final scraper with Cloudflare bypass + auto-restart monitor
- bizin_scraper_final.py: hybrid curl_cffi + undetected-chromedriver scraper
  with support for districts and categories, safe writes (fsync) and external enrichment
- monitor_scraper.sh: watchdog that restarts the process automatically after a crash
- IMPLEMENTADO.md + README.md: updated to reflect the state as of April 2026
- GEMINI.md: technical automation instructions
- test_curl.py, test_curl_clean.py, test_playwright.py: test/diagnostic scripts

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-28 17:16:48 +01:00

44 lines
1.6 KiB
Python

import asyncio
import os

from playwright.async_api import async_playwright
from playwright_stealth import Stealth


async def test_bizin():
    async with async_playwright() as p:
        # Try the system-installed Chrome first; fall back to the bundled Chromium.
        try:
            browser = await p.chromium.launch(headless=True, channel="chrome")
        except Exception:
            browser = await p.chromium.launch(headless=True)

        context = await browser.new_context(
            user_agent="Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36"
        )
        await Stealth().apply_stealth_async(context)
        page = await context.new_page()

        print("Accessing https://pt.bizin.eu/por/ ...")
        try:
            await page.goto("https://pt.bizin.eu/por/", wait_until="domcontentloaded", timeout=30000)
        except Exception as e:
            # Keep going: the page may still finish loading during the wait below.
            print(f"Timeout or error during initial load: {e}")

        # Give any Cloudflare challenge time to resolve.
        print("Waiting 45 seconds for possible challenges...")
        await asyncio.sleep(45)

        content = await page.content()
        # "Um momento…" is the Portuguese title of the challenge page.
        if "Just a moment..." in content or "Um momento…" in content:
            print("Blocked by Cloudflare.")
        else:
            print("Success! Page loaded.")
            print(f"Title: {await page.title()}")
            # Save the page so the result can be checked by hand.
            os.makedirs("logs", exist_ok=True)
            with open("logs/success_playwright.html", "w", encoding="utf-8") as f:
                f.write(content)

        await browser.close()


if __name__ == "__main__":
    asyncio.run(test_bizin())
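
The script above always sleeps for the full 45 seconds before checking for the challenge page. One possible refinement, sketched here and not part of the commit, is to poll the page content and stop as soon as the challenge markers disappear. The names `is_challenge_page` and `wait_until_clear` are hypothetical helpers introduced for illustration. The marker strings are the two used in `test_bizin()`; other challenge page titles may exist and are not covered.

```python
import asyncio
from typing import Awaitable, Callable

# Marker strings copied from the check in test_bizin(). This list is an
# assumption about what a challenge page contains, not an exhaustive set.
CHALLENGE_MARKERS = ("Just a moment...", "Um momento…")


def is_challenge_page(html: str) -> bool:
    """Return True if the HTML contains one of the known challenge markers."""
    return any(marker in html for marker in CHALLENGE_MARKERS)


async def wait_until_clear(
    get_html: Callable[[], Awaitable[str]],
    timeout: float = 45.0,
    interval: float = 1.0,
) -> bool:
    """Poll get_html() until no challenge marker is present.

    Returns True once the page looks clear, or False if the markers are
    still present when the timeout expires.
    """
    loop = asyncio.get_running_loop()
    deadline = loop.time() + timeout
    while True:
        if not is_challenge_page(await get_html()):
            return True
        if loop.time() >= deadline:
            return False
        await asyncio.sleep(interval)
```

Inside `test_bizin()` this could stand in for the fixed sleep, for example `cleared = await wait_until_clear(page.content)`, with the `Blocked by Cloudflare.` branch taken when `cleared` is false. The 45-second default only mirrors the original wait and is not a tuned value.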