3.0.41 • Published 3d ago

@skrillex1224/playwright-toolkit

Licence: ISC | Version: 3.0.41 | Deps: 12 | Size: 2.6 MB | Vulns: 0 | Weekly downloads: 906
Install scripts: this package runs scripts during installation (preinstall/install/postinstall).

Playwright Toolkit

A utility library for Apify/Crawlee Actor developers, providing anti-detection, human-like interaction, live screenshots, and more.

Installation

npm install @skrillex1224/playwright-toolkit

# Dependencies required for anti-detection
npm install playwright ghost-cursor-playwright

Quick Start

import { Actor } from 'apify';
import { PlaywrightCrawler } from 'crawlee';
import { chromium } from 'playwright';

import { usePlaywrightToolKit } from '@skrillex1224/playwright-toolkit';

await Actor.init();

// Initialize the toolkit
const { ApifyKit: KitHook, Launch, AntiCheat, DeviceInput, Humanize, Captcha, LiveView, Constants } = usePlaywrightToolKit();

// ⚠️ ApifyKit requires asynchronous initialization
const ApifyKit = await KitHook.useApifyKit();

// LiveView
const { startLiveViewServer, takeLiveScreenshot } = LiveView.useLiveView();

const crawlerOptions = Launch.getPlaywrightCrawlerOptions({
    runInHeadfulMode: false,
    isRunningAtHome: ApifyKit.isAtHome(),
    launcher: chromium,
});

const crawler = new PlaywrightCrawler({
    ...crawlerOptions,
    preNavigationHooks: [
        ...crawlerOptions.preNavigationHooks,
        async ({ page }) => {
            // CAPTCHA monitoring
            Captcha.useCaptchaMonitor(page, {
                domSelector: '#captcha_container',
                onDetected: async () => { /* handle the CAPTCHA */ }
            });
        }
    ],
    requestHandler: async ({ page }) => {
        // Initialize the cursor
        await Humanize.initializeCursor(page);

        // Warm up the page (simulates human browsing)
        await Humanize.warmUpBrowsing(page, 3000);

        // Run a step (on failure, takes a screenshot and calls Actor.fail)
        await ApifyKit.runStep('Enter search query', page, async () => {
            await Humanize.humanType(page, 'input', 'search text');
            await Humanize.humanClick(page, '#submit-btn');
        });

        // Push success data
        await ApifyKit.pushSuccess({ result: 'data' });
    }
});

await startLiveViewServer();
await crawler.run(['https://example.com']);
await Actor.exit();

Anti-Detection Features

Desktop / Mobile Device

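Constants.ActorInfo[actor].device is the source of truth for visitor's device. Only desktop and mobile are allowed, and the value defaults to desktop when unset. visitor still passes state through RuntimeEnv.parseInput(input, actor.key) and Launch.getPlaywrightCrawlerOptions({ runtimeState }). Once an actor's device is changed to mobile, the toolkit generates an Android Chrome mobile fingerprint, sets up a mobile viewport/touch context, and switches Humanize to its mobile behavior.

An existing browser_profile.core records its device. When the device stored in an old core differs from the current ActorInfo, Launch rebuilds the core, so that a configuration that looks mobile does not silently keep reusing a desktop fingerprint.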
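
The device rules above can be sketched in a few lines. This is a minimal illustration, not the toolkit's implementation: the names resolveDevice and shouldRebuildCore are hypothetical, and treating a stored core without a device field as desktop is an assumption.

```javascript
// Hypothetical sketch; not the toolkit's actual implementation.
const ALLOWED_DEVICES = ['desktop', 'mobile'];

// Resolve the device for an actor, falling back to 'desktop' when unset.
function resolveDevice(actorInfo, actorKey) {
    const device = actorInfo[actorKey]?.device ?? 'desktop';
    if (!ALLOWED_DEVICES.includes(device)) {
        throw new Error(`Unsupported device "${device}" for actor "${actorKey}"`);
    }
    return device;
}

// A stored core is reusable only if it was generated for the same device.
function shouldRebuildCore(storedCore, currentDevice) {
    if (!storedCore) return true;
    // Assumption: a core saved without a device field was built for desktop.
    return (storedCore.device ?? 'desktop') !== currentDevice;
}

const actorInfo = { search: { device: 'mobile' }, feed: {} };
console.log(resolveDevice(actorInfo, 'search')); // mobile
console.log(resolveDevice(actorInfo, 'feed')); // desktop
console.log(shouldRebuildCore({ device: 'desktop' }, 'mobile')); // true
```

Rejecting unknown device values early keeps a typo in the configuration from silently producing a desktop session.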

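Architecture

| Layer | Problem | Solution |
| --- | --- | --- |
| Fingerprint | UA / screen / language / time zone consistency | Crawlee useFingerprints + AntiCheat |
| Behavior | Mechanical typing / clicking / scrolling | ghost-cursor-playwright on desktop, touch Humanize on mobile |
| Page | CAPTCHA / risk-control detection | Captcha monitor |

API Overview

| Module | Method | Description |
| --- | --- | --- |
| Launch | getPlaywrightCrawlerOptions(options) | Returns, in one call, the common configuration PlaywrightCrawler needs (timeouts / fingerprints / proxy / navigation hooks) |
| AntiCheat | applyPage(page, options?) | Applies time zone / language / permissions / viewport |
| AntiCheat | applyContext(context, options?) | Applies Context-level settings only |
| AntiCheat | syncViewportWithScreen(page) | Syncs the viewport with the screen |
| AntiCheat | getTlsFingerprintOptions(userAgent?) | TLS fingerprint options for got-scraping |
| Humanize | initializeCursor(page) | Initializes the cursor (must be called first) |
| Humanize | jitterMs(base, jitterPercent?) | Generates a jittered millisecond value (synchronous, returns number) |
| Humanize | humanType(page, selector, text, options?) | Human-like typing (baseDelay=180ms ±40%) |
| Humanize | humanPress(page, target?, key, options?) | Human-like key press; acts on the current focus, or focuses the target first |
| Humanize | humanClick(page, selector, options?) | Human-like click (reactionDelay=250ms ±40%) |
| Humanize | warmUpBrowsing(page, baseDuration?) | Page warm-up (3500ms ±40%) |
| Humanize | simulateGaze(page, baseDurationMs?) | Simulated gaze (2500ms ±40%) |
| Humanize | randomSleep(baseMs, jitterPercent?) | Random delay (±30% jitter) |
| DeviceInput | click/clickPoint/move/focus/fill/type/press/drag(...) | Mechanical input adapter; picks desktop mouse/click or mobile touch/tap automatically based on the page's runtime device |
| Captcha | useCaptchaMonitor(page, options) | CAPTCHA monitoring |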
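
Several Humanize methods describe their timing as a base value plus or minus a percentage. A minimal sketch of that idea follows. It is a re-implementation for illustration only: expressing jitterPercent as a fraction (0.4 for ±40%), the uniform distribution, and the injectable random parameter are all assumptions, not documented behavior of the toolkit.

```javascript
// Hypothetical re-implementation for illustration only.
// Returns an integer number of milliseconds within base * (1 ± jitterPercent).
function jitterMs(base, jitterPercent = 0.3, random = Math.random) {
    const offset = (random() * 2 - 1) * jitterPercent; // uniform in [-p, +p)
    return Math.max(0, Math.round(base * (1 + offset)));
}

// Promise-based sleep built on the jittered duration.
function randomSleep(baseMs, jitterPercent = 0.3) {
    const ms = jitterMs(baseMs, jitterPercent);
    return new Promise((resolve) => setTimeout(resolve, ms));
}

// For base 180 and ±40%, the result is an integer between 108 and 252.
console.log(jitterMs(180, 0.4));
```

Passing the random source as a parameter makes the function deterministic in tests while defaulting to Math.random in normal use.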

Module Details

DeviceInput

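DeviceInput is the single entry point for non-humanized, mechanical operations. When visitor or the toolkit's internals need to perform a mechanical click, coordinate click, focus, fill, type, press, or drag, they should call DeviceInput rather than Playwright's native input APIs such as page.mouse, page.touchscreen, or locator.click().

It reads the device state that RuntimeEnv.applyToPage() writes onto the page: on desktop it uses mouse/click semantics, while on mobile, clicks and drags prefer touch/tap semantics. Operations that Playwright already handles consistently across devices, such as focus/fill/type/press, keep their native semantics.

See DEVICE_INPUT.md for more rules.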
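
The device-based dispatch described above might look roughly like the sketch below. createPointInput and createStubPage are hypothetical names, and the device is passed in explicitly here rather than read from the page; only page.mouse.click(x, y) and page.touchscreen.tap(x, y) are real Playwright calls. The stub page exists so the sketch runs without a browser.

```javascript
// Hypothetical adapter sketch; the toolkit's DeviceInput may differ.
function createPointInput(page, device) {
    return {
        // Click or tap at viewport coordinates depending on the device.
        async clickPoint(x, y) {
            if (device === 'mobile') {
                await page.touchscreen.tap(x, y);
            } else {
                await page.mouse.click(x, y);
            }
        },
    };
}

// Stub page that records calls instead of driving a real browser.
function createStubPage() {
    const calls = [];
    return {
        calls,
        mouse: { click: async (x, y) => { calls.push(['mouse.click', x, y]); } },
        touchscreen: { tap: async (x, y) => { calls.push(['touchscreen.tap', x, y]); } },
    };
}

const stubPage = createStubPage();
createPointInput(stubPage, 'mobile')
    .clickPoint(10, 20)
    .then(() => console.log(stubPage.calls)); // logs the recorded touchscreen.tap call
```

Keeping the branch in one adapter means call sites never need to know which device the page is emulating.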

ApifyKit

Requires asynchronous initialization:

const { ApifyKit: KitHook } = usePlaywrightToolKit();
const ApifyKit = await KitHook.useApifyKit();

// Run a step (on failure: takes a screenshot, pushes to the Dataset, and calls Actor.fail)
await ApifyKit.runStep('Step name', page, async () => {
    // Your logic
});

// Loose variant (on failure, only throws; does not call Actor.fail)
await ApifyKit.runStepLoose('Step name', page, async () => {
    // Your logic
});

// Push success data (the payload is wrapped in the data field)
await ApifyKit.pushSuccess({ key: 'value' });
// Output: { code: 0, status: 'SUCCESS', timestamp: '...', data: { key: 'value' } }
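
The output shape shown for pushSuccess can be reproduced with a small wrapper. This is a sketch under assumptions: wrapSuccess is a hypothetical helper, and the ISO 8601 timestamp format is a guess, since the source only shows '...' for that field.

```javascript
// Hypothetical sketch of the success envelope; not the toolkit's code.
const StatusCode = { Success: 0, Failed: -1 };
const Status = { Success: 'SUCCESS', Failed: 'FAILED' };

// Wrap a payload in the envelope. `now` is injectable for testing.
function wrapSuccess(data, now = new Date()) {
    return {
        code: StatusCode.Success,
        status: Status.Success,
        timestamp: now.toISOString(), // assumption: ISO 8601 string
        data,
    };
}

console.log(wrapSuccess({ key: 'value' }));
```

A fixed envelope lets downstream consumers check code or status first and read data only for successful items.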

CrawlerError

A custom error class that can carry a code and a context, both of which pushFailed parses automatically:

const { Errors, Constants } = usePlaywrightToolKit();
const { CrawlerError } = Errors;
const { ErrorKeygen } = Constants;

// Simple usage (message only)
throw new CrawlerError('Feed API response was not captured');

// Full usage (with code and context)
throw new CrawlerError({
    message: 'Login failed',
    code: ErrorKeygen.NotLogin,  // becomes the code field in pushFailed
    context: { url: currentUrl, userId: '123' }
});

// Convert from a plain Error
throw CrawlerError.from(originalError, {
    code: ErrorKeygen.Chaptcha,
    context: { step: 'CAPTCHA detection' }
});

// pushFailed output:
// { code: 30000001, status: 'FAILED', error: {...}, context: {...}, meta: {...}, ... }
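
To make the code/context flow concrete, here is a minimal sketch of an error class with the same two constructor forms and a from helper. CrawlerErrorSketch and toFailurePayload are hypothetical names, the -1 default code mirrors StatusCode.Failed, and the payload omits the meta field because its contents are not described in the source.

```javascript
// Hypothetical sketch; the real CrawlerError in the toolkit may differ.
class CrawlerErrorSketch extends Error {
    // Accepts either a message string or { message, code, context }.
    constructor(input) {
        const options = typeof input === 'string' ? { message: input } : input;
        super(options.message);
        this.name = 'CrawlerErrorSketch';
        this.code = options.code;
        this.context = options.context ?? {};
    }

    // Wrap an ordinary Error, keeping its message and recording it as `cause`.
    static from(error, { code, context } = {}) {
        const wrapped = new CrawlerErrorSketch({ message: error.message, code, context });
        wrapped.cause = error;
        return wrapped;
    }
}

// Flatten any error into the fields a failure record would need.
function toFailurePayload(error, defaultCode = -1) {
    return {
        code: error.code ?? defaultCode,
        status: 'FAILED',
        error: { name: error.name, message: error.message },
        context: error.context ?? {},
    };
}

const failure = CrawlerErrorSketch.from(new Error('timeout'), {
    code: 30000002,
    context: { step: 'CAPTCHA detection' },
});
console.log(toFailurePayload(failure));
```

Because plain Error objects have no code or context, the payload builder falls back to defaults instead of failing on them.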

LiveView

const { LiveView } = usePlaywrightToolKit();
const { startLiveViewServer, takeLiveScreenshot } = LiveView.useLiveView();

await startLiveViewServer();
await takeLiveScreenshot(page, 'Current state');

Captcha

const { Captcha } = usePlaywrightToolKit();

// DOM monitoring mode
Captcha.useCaptchaMonitor(page, {
    domSelector: '#captcha_container',
    onDetected: async () => { await Actor.fail('CAPTCHA detected'); }
});

// URL monitoring mode
Captcha.useCaptchaMonitor(page, {
    urlPattern: '/captcha',
    onDetected: async () => { await Actor.fail('CAPTCHA detected'); }
});
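
One way a URL-mode monitor could work is to listen for Playwright's framenavigated event and test each frame URL against the pattern. The sketch below assumes exactly that; watchCaptchaUrl, the returned stop() handle, passing the URL to onDetected, and createStubPage are all hypothetical and not taken from the toolkit. Only page.on, page.off, and frame.url() are real Playwright APIs.

```javascript
// Hypothetical URL-mode monitor; the toolkit's implementation may differ.
function watchCaptchaUrl(page, { urlPattern, onDetected }) {
    const handler = (frame) => {
        const url = frame.url();
        if (!url.includes(urlPattern)) return;
        // The executor runs synchronously; a sync throw or an async rejection
        // from onDetected both end up in the catch below.
        new Promise((resolve) => resolve(onDetected(url))).catch((err) => console.error(err));
    };
    page.on('framenavigated', handler);
    return { stop: () => page.off('framenavigated', handler) };
}

// Stub page so the sketch runs without a browser.
function createStubPage() {
    const listeners = new Map();
    return {
        on: (event, fn) => listeners.set(event, fn),
        off: (event) => listeners.delete(event),
        navigate: (url) => listeners.get(event_name)?.({ url: () => url }),
    };
}
const event_name = 'framenavigated';

const demoPage = createStubPage();
watchCaptchaUrl(demoPage, {
    urlPattern: '/captcha',
    onDetected: (url) => console.log('CAPTCHA detected at', url),
});
demoPage.navigate('https://example.com/captcha?from=search');
```

Returning a stop handle makes it possible to detach the listener once the page is no longer of interest.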

Constants

const { Constants } = usePlaywrightToolKit();
const { ErrorKeygen, Status, StatusCode } = Constants;

// ErrorKeygen: { NotLogin: 30000001, Chaptcha: 30000002 }
// Status: { Success: 'SUCCESS', Failed: 'FAILED' }
// StatusCode: { Success: 0, Failed: -1 }

Utils

// Assumption: Share (used by the captureScreen examples) is exposed alongside Utils
const { Utils, Share } = usePlaywrightToolKit();

// Parse SSE stream text
const events = Utils.parseSseStream(sseText);

// Parse a cookie string
const cookies = Utils.parseCookies('key=value; key2=value2', '.example.com');
await page.context().addCookies(cookies);

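// Full-page scrolling screenshot (auto-detects every scrollable element, force-expands them, then captures; watermarkify runs by default)
// By default the returned base64 is compressed to under 5 MiB so a single Apify/Crawlee dataset item does not exceed the size limit
const base64Image = await Share.captureScreen(page);

// Screenshots only use the page's current runtime viewport; for mobile, switch via ActorInfo.device rather than overriding it through screenshot options
// restore: true only restores the page height and the expanded scroll containers after the capture completes
const image2 = await Share.captureScreen(page, { restore: true });

// Configure watermarkify explicitly: a faint full-page watermark plus a thin one-line strip at the bottom
const image3 = await Share.captureScreen(page, {
  watermarkify: {
    query: 'Hello',
    timezoneOffsetHours: 8,
    // By default, https://myip.ipip.net/json is requested in the same browser context to fill in IP / Loc; the default timeout is 10000ms
  },
});

// Optional: disable the built-in IP lookup, or override / supplement it with your own resolver
const image4 = await Share.captureScreen(page, {
  watermarkify: {
    query: 'Hello',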
    ipLookup: false,
    resolverTimeoutMs: 180,
    resolver: async ({ signal }) => {
      const resp = await fetch('https://example.com/ip', { signal });
      const data = await resp.json();
      return { ip: data.ip, location: data.city };
    },
    watermark: { opacity: 0.07, rotateDeg: -18 },
  },
});

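// Default fallback chain for the Prompt field: query -> prompt -> page title
// So passing only query in the runtime input is enough; the Prompt shown at the bottom is the actual query sent to the AI

// To get the raw screenshot, disable watermarkify explicitly
const image5 = await Share.captureScreen(page, {
  watermarkify: false,
});
// Request a smaller return size. Re-encoding uses Jimp internally: JPEG quality is lowered first, and the image is scaled down proportionally only if it is still over the limit.
const image6 = await Share.captureScreen(page, {
  maxBytes: 4 * 1024 * 1024,
  quality: 60,
  minQuality: 35,
  outputType: 'jpeg',
});
// Returns a base64-encoded image; by default, output that exceeds the limit is converted to JPEG and compressed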
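
For readers curious what the cookie-parsing step shown earlier involves, here is a minimal sketch that turns a Cookie header string into objects shaped for Playwright's context.addCookies(), which needs a name, a value, and either a url or a domain/path pair. parseCookieString is a hypothetical stand-in, not Utils.parseCookies itself, and the default path of '/' is an assumption.

```javascript
// Hypothetical sketch; Utils.parseCookies may behave differently.
function parseCookieString(cookieString, domain, path = '/') {
    return cookieString
        .split(';')
        .map((pair) => pair.trim())
        .filter(Boolean) // drop empty segments such as a trailing ';'
        .map((pair) => {
            // Split on the first '=' only, since values may contain '='.
            const eq = pair.indexOf('=');
            const name = eq === -1 ? pair : pair.slice(0, eq);
            const value = eq === -1 ? '' : pair.slice(eq + 1);
            return { name: name.trim(), value: value.trim(), domain, path };
        });
}

console.log(parseCookieString('key=value; token=a=b;', '.example.com'));
```

Splitting on the first '=' matters for base64 or signed tokens, whose values often end in '=' padding.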

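Adding Log Templates

Log templates are maintained in one place: LOG_DEFINITIONS in playwright-toolkit/src/logger.js.

  1. Add an entry to LOG_DEFINITIONS containing at least:
  • key: a globally unique identifier (used for diagnostic matching)
  • method: the method name exposed on Logger.useTemplate()
  • label/group/step/status/level/attention
  • optional buildDetails / throttleMs / throttleKey
  2. Add the method signature of the same name to LoggerTemplate in playwright-toolkit/index.d.ts.

  3. Usage example: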

import { usePlaywrightToolKit } from '@skrillex1224/playwright-toolkit';

const { Logger } = usePlaywrightToolKit();
const templateLogger = Logger.useTemplate();
templateLogger.domChunk(120, 'preview text', true);

Notes:

  • Templates automatically append a [#log:<key>] tag for log diagnostics.
  • After adding a template, the toolkit must be rebuilt and republished.
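
A definition-driven logger of this kind could be built roughly as follows. Everything here is illustrative: the dom_chunk key, the field values, the output format, the meaning of the three domChunk arguments, and the throttling behavior are assumptions, and this standalone useTemplate function is not the toolkit's Logger.useTemplate.

```javascript
// Hypothetical definitions; field names follow the list above, values are made up.
const LOG_DEFINITIONS = [
    {
        key: 'dom_chunk',
        method: 'domChunk',
        label: 'DOM chunk',
        level: 'info',
        throttleMs: 1000,
        // Assumed argument meaning: (length, preview, done).
        buildDetails: (length, preview, done) => `len=${length} done=${done} preview=${preview}`,
    },
];

// Build an object with one method per definition.
function useTemplate(definitions, { write = console.log, now = Date.now } = {}) {
    const lastEmitted = new Map();
    const logger = {};
    for (const def of definitions) {
        logger[def.method] = (...args) => {
            const t = now();
            const last = lastEmitted.get(def.key);
            // Drop the message if the same key fired within its throttle window.
            if (def.throttleMs && last !== undefined && t - last < def.throttleMs) return false;
            lastEmitted.set(def.key, t);
            const details = def.buildDetails ? ` ${def.buildDetails(...args)}` : '';
            write(`[${def.level}] ${def.label}${details} [#log:${def.key}]`);
            return true;
        };
    }
    return logger;
}

const templateLogger = useTemplate(LOG_DEFINITIONS);
templateLogger.domChunk(120, 'preview text', true);
```

Generating methods from a single definitions table keeps the key, the method name, and the diagnostic tag from drifting apart.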

License

ISC
