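Improve dynamic scaling of external generation workers

Add an external generation controller process role and systemd service

Fill in the queue statistics procedure and its spacetime-client bindings

Update the production deployment scripts, health checks, and server provisioning to the worker/controller conventions

Add a container worker smoke script and sync the ops docs and team memory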
2026-06-12 15:21:35 +08:00
parent 69815d918a
commit 4a6c126366
30 changed files with 2030 additions and 28 deletions

@@ -97,6 +97,64 @@ npm run container:up -- --scale external-generation-worker=1 external-generation
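Dynamic scaling verification requires `GENARRATIVE_EXTERNAL_GENERATION_MODE=queue`. In `inline` mode, generation requests are executed synchronously by `api-server` and are never consumed by these worker instances.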
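A minimal pre-flight check, assuming the mode is set as a plain `KEY=VALUE` line in the env file the containers load (by default `deploy/container/api-server.env`) and is not overridden in the Compose `environment:` section. `external_generation_mode` and `require_queue_mode` are hypothetical helpers, not part of the repository scripts.

```bash
#!/usr/bin/env bash
# Sketch (assumption): the mode is a plain KEY=VALUE line in a Docker env file.
# Overrides from the Compose `environment:` section are NOT taken into account.

# Print the value of the last GENARRATIVE_EXTERNAL_GENERATION_MODE assignment.
external_generation_mode() {
  local env_file="$1"
  # Keep the last matching line, then drop everything up to the first "=".
  grep -E '^GENARRATIVE_EXTERNAL_GENERATION_MODE=' "$env_file" \
    | tail -n 1 \
    | cut -d= -f2-
}

# Return non-zero (and explain on stderr) unless the mode is exactly "queue".
require_queue_mode() {
  local mode
  mode="$(external_generation_mode "$1")"
  if [[ "$mode" != "queue" ]]; then
    echo "expected GENARRATIVE_EXTERNAL_GENERATION_MODE=queue, got '${mode}'" >&2
    return 1
  fi
}
```

For example, run `require_queue_mode deploy/container/api-server.env` before scaling workers. The sketch does not strip quotes, so a quoted `"queue"` value is reported as a mismatch.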
### External Generation Worker Isolated Smoke Test
To verify worker mode in isolation on the local machine without reusing `deploy/container/api-server.env`, use the dedicated script:
```bash
npm run container:worker-smoke -- smoke
```
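The script generates a gitignored `deploy/container/worker-smoke/api-server.env` and port state, and uses a separate compose project, a separate SpacetimeDB data volume, and separate host ports. It runs `build -> up-spacetime -> publish -> up -> enqueue -> api-update -> enqueue`.
The test job uses the `worker_smoke_unsupported` type and does not call the real VectorEngine, LLM, or OSS. The expected result is that the worker claims the queued job and takes the "unsupported job type" failure branch, which verifies queue claim, lease, failure write-back, and process isolation between the API and the worker. Because `external_generation_job` is a private table, the script confirms consumption from the job_id and the unsupported record in the worker logs instead of bypassing permissions with CLI SQL. By default `smoke` starts only `api-server` and `external-generation-worker` to avoid building unrelated frontend / Nginx images; to verify Nginx as well, run `up --with-nginx` as a separate step.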
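The log-based consumption check can be sketched as a filter over the worker logs. This assumes a single worker log line contains both the job_id and the word `unsupported`; the real log format is not shown here, and `job_consumed_as_unsupported` is a hypothetical helper, not what the smoke script actually runs.

```bash
#!/usr/bin/env bash
# Sketch (assumption): one worker log line carries both the job_id and the
# word "unsupported" when the worker_smoke_unsupported failure branch runs.
# Reads log text on stdin; returns 0 if such a line is found.
job_consumed_as_unsupported() {
  local job_id="$1"
  # Keep only lines mentioning this job_id (fixed-string match),
  # then look for "unsupported" case-insensitively.
  grep -F -- "$job_id" | grep -qi -- 'unsupported'
}
```

For example, pipe the output of `npm run container:worker-smoke -- logs external-generation-worker` into `job_consumed_as_unsupported <job_id>`, assuming that subcommand prints the logs and exits rather than following them.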
For step-by-step troubleshooting, run:
```bash
npm run container:worker-smoke -- init --force
npm run container:worker-smoke -- build
npm run container:worker-smoke -- up-spacetime
npm run container:worker-smoke -- publish
npm run container:worker-smoke -- up
npm run container:worker-smoke -- enqueue before-update
npm run container:worker-smoke -- api-update
npm run container:worker-smoke -- enqueue after-update
npm run container:worker-smoke -- status
```
To reset the isolated ports or database data:
```bash
npm run container:worker-smoke -- smoke --force
```
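By default, `container:worker-smoke` packages the host's `spacetime` 2.4.1 CLI into a lightweight SpacetimeDB image, so the first smoke run does not have to pull the large official image; regular `npm run container:*` load tests still use `clockworklabs/spacetime:v2.4.1` by default.
If pulling crates.io dependencies inside the container during the Docker build is unreliable, Cargo inside the container can reuse the host's Cargo cache to build the current binary, which is then packaged into a temporary smoke image. This mode uses `rust:1.93-bookworm` as the builder by default and a Debian bookworm smoke runtime to host the build output. Set `GENARRATIVE_WORKER_SMOKE_CARGO_IMAGE` to change the builder image and `GENARRATIVE_WORKER_SMOKE_LOCAL_BASE_IMAGE` to change the runtime base image. To enable this mode, run: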
```bash
npm run container:worker-smoke -- smoke --local-binary
```
`api-update` only runs `--force-recreate api-server` and checks that the `external-generation-worker` container ID does not change. To rebuild the API image as well, use:
```bash
npm run container:worker-smoke -- api-update --build
```
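The container ID check that `api-update` performs can be approximated with plain `docker compose` commands. This is a sketch, not the script's implementation: the helper names are hypothetical, the isolated project's `-p`/`-f` flags must be supplied by the caller because the generated project name is not shown here, and `--no-deps` is an addition so that only `api-server` is recreated.

```bash
#!/usr/bin/env bash
# Sketch: verify that recreating api-server leaves worker containers untouched.
# "$@" carries extra global `docker compose` flags, e.g. -p <project> -f <file>
# (hypothetical values; the smoke script's real project name is not shown).

# Print the sorted IDs of running external-generation-worker containers.
worker_container_ids() {
  docker compose "$@" ps -q external-generation-worker | sort
}

# Succeed only if both snapshots are non-empty and identical.
same_container_ids() {
  local before="$1" after="$2"
  [[ -n "$before" && "$before" == "$after" ]]
}

# Snapshot worker IDs, recreate only api-server, then compare snapshots.
recreate_api_keep_workers() {
  local before after
  before="$(worker_container_ids "$@")"
  docker compose "$@" up -d --no-deps --force-recreate api-server || return 1
  after="$(worker_container_ids "$@")"
  if ! same_container_ids "$before" "$after"; then
    echo "external-generation-worker container IDs changed" >&2
    return 1
  fi
}
```

For example, `recreate_api_keep_workers -p <project> -f <compose-file>`. Comparing sorted ID lists also works when several worker replicas are running.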
To verify dynamic worker scaling:
```bash
npm run container:worker-smoke -- scale 3
npm run container:worker-smoke -- ps
npm run container:worker-smoke -- enqueue scaled-workers
npm run container:worker-smoke -- scale 1
```
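To confirm that `scale N` actually produced N running workers, the container IDs that `docker compose ps -q` prints for the service can be counted. The `-p`/`-f` flags for the isolated project are again caller-supplied assumptions, and the helper names are hypothetical.

```bash
#!/usr/bin/env bash
# Sketch: check the number of running external-generation-worker containers.
# "$@" after the expected count carries extra global `docker compose` flags
# (e.g. -p <project> -f <file>), which are assumptions about the smoke setup.

# Count non-empty lines on stdin (one container ID per line); prints 0 if none.
count_ids() {
  grep -c . || true
}

# Return non-zero unless exactly $1 worker containers are running.
expect_worker_replicas() {
  local want="$1"; shift
  local got
  got="$(docker compose "$@" ps -q external-generation-worker | count_ids)"
  if [[ "$got" != "$want" ]]; then
    echo "expected ${want} workers, found ${got}" >&2
    return 1
  fi
}
```

For example, after `npm run container:worker-smoke -- scale 3`, run `expect_worker_replicas 3 -p <project> -f <compose-file>`. `ps -q` lists only running containers, so replicas that have exited are not counted.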
To inspect or clean up the isolated environment:
```bash
npm run container:worker-smoke -- logs external-generation-worker
npm run container:worker-smoke -- down -v
```
To stop:
```bash

@@ -2,7 +2,7 @@ name: genarrative-container-loadtest
 services:
   spacetimedb:
-    image: clockworklabs/spacetime:v2.4.1
+    image: ${GENARRATIVE_CONTAINER_SPACETIME_IMAGE:-clockworklabs/spacetime:v2.4.1}
     user: root
     command:
       [
@@ -44,7 +44,7 @@ services:
     cpus: "2.0"
     mem_limit: 1g
     env_file:
-      - ./api-server.env
+      - ${GENARRATIVE_CONTAINER_API_ENV_FILE:-./api-server.env}
     environment:
       GENARRATIVE_API_HOST: 0.0.0.0
       GENARRATIVE_API_PORT: 8082
@@ -77,7 +77,7 @@ services:
     cpus: "2.0"
     mem_limit: 1g
     env_file:
-      - ./api-server.env
+      - ${GENARRATIVE_CONTAINER_API_ENV_FILE:-./api-server.env}
     environment:
       GENARRATIVE_PROCESS_ROLE: external-generation-worker
       GENARRATIVE_TRACKING_OUTBOX_DIR: /var/lib/genarrative/tracking-outbox-worker
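With these defaults, the shared Compose file can be pointed at a different env file or SpacetimeDB image through environment variables. The sketch below resolves the same `${VAR:-default}` values in bash, which, like Compose interpolation, falls back to the default when the variable is unset or empty. It assumes the Compose project directory is `deploy/container` (the default `./api-server.env` corresponds to `deploy/container/api-server.env`); the helper names are hypothetical.

```bash
#!/usr/bin/env bash
# Sketch: mirror the Compose defaults above. Relative env_file paths are
# resolved against the Compose project directory, passed in as $1
# (assumed to be deploy/container).

# Print the env file path that the api-server and worker services would load.
api_env_file() {
  local project_dir="$1"
  local path="${GENARRATIVE_CONTAINER_API_ENV_FILE:-./api-server.env}"
  if [[ "$path" == /* ]]; then
    # Absolute paths are used as-is.
    echo "$path"
  else
    # Join with the project directory, dropping a trailing "/" and leading "./".
    echo "${project_dir%/}/${path#./}"
  fi
}

# Print the effective image for the spacetimedb service.
spacetime_image() {
  echo "${GENARRATIVE_CONTAINER_SPACETIME_IMAGE:-clockworklabs/spacetime:v2.4.1}"
}
```

For example, `GENARRATIVE_CONTAINER_API_ENV_FILE=./worker-smoke/api-server.env` would presumably make both the API and worker services load the smoke env file; whether the smoke script sets it exactly this way is not shown in this diff.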