补充 release SpacetimeDB 健康检查与巡检防回退
增加 SpacetimeDB 阶段化健康检查与 /readyz 阶段输出 记录 procedure/reducer/read 失败的阶段和耗时 补充 release 健康巡检 systemd timer 与生产 ops 预检 同步 API 构建部署、provision 脚本和运维文档
This commit is contained in:
@@ -15,6 +15,22 @@
|
|||||||
- 关联:相关文件、文档、提交或 Issue
|
- 关联:相关文件、文档、提交或 Issue
|
||||||
```
|
```
|
||||||
|
|
||||||
|
## 生产冷备份后 API 不能只依赖 SpacetimeDB 自恢复
|
||||||
|
|
||||||
|
- 现象:release 机器 `03:20` 冷备份后,`spacetimedb.service` 已恢复,但作品列表、创作入口配置或公开 gallery 继续超时 / 502 / 504,`genarrative-api.service` 保持 stopped。
|
||||||
|
- 原因:`genarrative-api.service` 配置了 `Requires=spacetimedb.service`,冷备份停止 `spacetimedb.service` 时 API 会被 systemd 依赖关系一并停止;如果 `genarrative-database-backup.service` 只传 `--stop-service spacetimedb.service` 而漏掉 `--restart-service-after genarrative-api.service`,备份脚本只会恢复数据库,不会再拉起 API。
|
||||||
|
- 处理:生产冷备份 unit 和发布脚本必须带 `--restart-service-after genarrative-api.service`;仓库用 `npm run check:production-ops` 检查 systemd 模板、API build/deploy 归档和健康巡检链路。现场修复后执行 `systemctl daemon-reload`,但不要为了验证而手动触发冷备份。
|
||||||
|
- 验证:`systemctl cat genarrative-database-backup.service` 应包含该参数;`systemctl is-active spacetimedb.service genarrative-api.service nginx.service` 全为 `active`;`curl -fsS http://127.0.0.1:3101/v1/ping`、`/healthz`、`/readyz` 和代表性 `/api/runtime/puzzle/gallery` 均成功。
|
||||||
|
- 关联:`deploy/systemd/genarrative-database-backup.service`、`scripts/database-backup-to-oss.mjs`、`scripts/ops/production-health-patrol.mjs`、`docs/【开发运维】本地开发验证与生产运维-2026-05-15.md`。
|
||||||
|
|
||||||
|
## SpacetimeDB 45 秒超时要看 api-server 记录的阶段
|
||||||
|
|
||||||
|
- 现象:release 上 Nginx 能立刻连到 `api-server`,但 `/api/runtime/*/gallery`、`/api/creation-entry/config` 等请求在约 `GENARRATIVE_SPACETIME_PROCEDURE_TIMEOUT_SECONDS` 后返回 `502` / `504`。
|
||||||
|
- 原因:旧日志只能看到 HTTP 总耗时和最终状态,无法区分卡在连接池、SDK 建连、等待 `on_connect`、订阅 read model、等待 procedure / reducer 回调还是本地订阅 cache 读取。
|
||||||
|
- 处理:`spacetime-client` 内置阶段化健康检查和失败日志;`/readyz` 用 `GENARRATIVE_SPACETIME_HEALTH_CHECK_TIMEOUT_SECONDS` 短窗口检查 SpacetimeDB 连接租约,业务失败日志包含 `operation_kind`、`operation_name`、`spacetime_stage`、`elapsed_ms`。
|
||||||
|
- 验证:`/readyz` 失败时看 `details.spacetime.stage`;业务请求超时时查 `journalctl -u genarrative-api.service` 中同一时间窗口的 `SpacetimeDB client operation failed`,优先按 `pool_acquire`、`connect_build`、`connect_handshake`、`read_model_subscribe`、`procedure_result`、`reducer_result`、`read_cache` 分阶段处理。
|
||||||
|
- 关联:`server-rs/crates/spacetime-client/src/lib.rs`、`server-rs/crates/api-server/src/health.rs`、`docs/【开发运维】本地开发验证与生产运维-2026-05-15.md`。
|
||||||
|
|
||||||
## 新建草稿扣费不能和入口卡泥点配置分离
|
## 新建草稿扣费不能和入口卡泥点配置分离
|
||||||
|
|
||||||
- 现象:后台修改创作入口的 `mudPointCost` 后,入口卡和前置余额提示可能显示新数值,但用户真实钱包流水仍按代码常量扣除。
|
- 现象:后台修改创作入口的 `mudPointCost` 后,入口卡和前置余额提示可能显示新数值,但用户真实钱包流水仍按代码常量扣除。
|
||||||
|
|||||||
20
deploy/systemd/genarrative-health-patrol.service
Normal file
20
deploy/systemd/genarrative-health-patrol.service
Normal file
@@ -0,0 +1,20 @@
|
|||||||
|
[Unit]
|
||||||
|
Description=Genarrative Production Health Patrol
|
||||||
|
After=network-online.target genarrative-api.service spacetimedb.service nginx.service
|
||||||
|
Wants=network-online.target
|
||||||
|
ConditionPathExists=/opt/genarrative/current/scripts/ops/production-health-patrol.mjs
|
||||||
|
|
||||||
|
[Service]
|
||||||
|
Type=oneshot
|
||||||
|
User=root
|
||||||
|
Group=root
|
||||||
|
WorkingDirectory=/opt/genarrative/current
|
||||||
|
EnvironmentFile=-/etc/genarrative/health-patrol.env
|
||||||
|
ExecStart=/usr/bin/node /opt/genarrative/current/scripts/ops/production-health-patrol.mjs --status-file /var/lib/genarrative/health-patrol/status.json
|
||||||
|
TimeoutStartSec=30
|
||||||
|
|
||||||
|
# 巡检只读 systemd、HTTP 和 journal;只允许写入自己的最近一次状态文件。
|
||||||
|
NoNewPrivileges=true
|
||||||
|
PrivateTmp=true
|
||||||
|
ProtectSystem=full
|
||||||
|
ReadWritePaths=/var/lib/genarrative/health-patrol
|
||||||
13
deploy/systemd/genarrative-health-patrol.timer
Normal file
13
deploy/systemd/genarrative-health-patrol.timer
Normal file
@@ -0,0 +1,13 @@
|
|||||||
|
[Unit]
|
||||||
|
Description=Run Genarrative Production Health Patrol
|
||||||
|
|
||||||
|
[Timer]
|
||||||
|
OnBootSec=2min
|
||||||
|
OnCalendar=*-*-* *:0/5:00
|
||||||
|
Persistent=true
|
||||||
|
RandomizedDelaySec=30
|
||||||
|
AccuracySec=30s
|
||||||
|
Unit=genarrative-health-patrol.service
|
||||||
|
|
||||||
|
[Install]
|
||||||
|
WantedBy=timers.target
|
||||||
@@ -96,6 +96,7 @@ npm run admin-web:typecheck
|
|||||||
```bash
|
```bash
|
||||||
npm run check:encoding
|
npm run check:encoding
|
||||||
npm run check:spacetime-schema
|
npm run check:spacetime-schema
|
||||||
|
npm run check:production-ops
|
||||||
npm run check:server-rs-ddd
|
npm run check:server-rs-ddd
|
||||||
npm run lint:eslint
|
npm run lint:eslint
|
||||||
npm run typecheck
|
npm run typecheck
|
||||||
@@ -217,7 +218,7 @@ UI 相关修改要重点验证:
|
|||||||
npm run database:backup:oss -- --data-dir /stdb --stop-service spacetimedb.service --restart-service-after genarrative-api.service
|
npm run database:backup:oss -- --data-dir /stdb --stop-service spacetimedb.service --restart-service-after genarrative-api.service
|
||||||
```
|
```
|
||||||
|
|
||||||
脚本会将数据目录打包成 `tar.gz`,上传到 `oss://<bucket>/<prefix>/<database>/<database>-<UTC时间>.tar.gz`。生产建议做冷备份:传入 `--stop-service spacetimedb.service`,脚本会在打包前停止服务、打包后恢复服务,再上传 OSS;因 `genarrative-api.service` 依赖 `spacetimedb.service`,生产定时冷备份还必须传入 `--restart-service-after genarrative-api.service`,确保备份后 API 随数据库一起恢复。由于 OSS 上传可能受服务器带宽限制,`Genarrative-Stdb-Module-Publish` 默认使用 `DATABASE_BACKUP_MODE=async`:先在 publish 前用 `--defer-upload` 生成本地冷备份和 `.manifest.json`,随后继续执行 publish;发布脚本退出前会用后台 `node ... --upload-archive <tar.gz>` 上传同一份发布前备份,不等待上传完成。发布脚本在校验 wasm 后、执行 `spacetime publish` 前会等待显式 `SPACETIME_SERVER_URL` 的 `/v1/ping` 就绪,默认最多等待 `60` 秒;如生产机器冷备份恢复 `spacetimedb.service` 较慢,可临时设置 `GENARRATIVE_STDB_PUBLISH_READY_TIMEOUT_SECONDS` 调整等待时间。需要强一致发布闸门时改用 `DATABASE_BACKUP_MODE=sync`(等价脚本参数 `--backup-mode sync`),备份会在 publish 前同步打包并上传,失败会阻断 publish;确认已有其他备份窗口时才使用 `DATABASE_BACKUP_MODE=skip`(兼容脚本参数 `--skip-backup`)。若业务不能接受停机窗口,应先规划 SpacetimeDB 原生快照或主备策略,不要直接在写入中的数据目录上做热拷贝并当作强一致备份。
|
脚本会将数据目录打包成 `tar.gz`,上传到 `oss://<bucket>/<prefix>/<database>/<database>-<UTC时间>.tar.gz`。生产建议做冷备份:传入 `--stop-service spacetimedb.service`,脚本会在打包前停止服务、打包后恢复服务,再上传 OSS;因 `genarrative-api.service` 依赖 `spacetimedb.service`,生产定时冷备份还必须传入 `--restart-service-after genarrative-api.service`,确保备份后 API 随数据库一起恢复。`2026-06-10` release 故障就是现场 unit 漏掉该参数,`03:20` 冷备份停止 SpacetimeDB 后 API 被依赖关系一并停止,备份脚本只恢复了 SpacetimeDB,API 直到人工重启前都不可用;后续现场变更、provision 模板和 Jenkins 归档都必须通过 `npm run check:production-ops` 防止回退。由于 OSS 上传可能受服务器带宽限制,`Genarrative-Stdb-Module-Publish` 默认使用 `DATABASE_BACKUP_MODE=async`:先在 publish 前用 `--defer-upload` 生成本地冷备份和 `.manifest.json`,随后继续执行 publish;发布脚本退出前会用后台 `node ... --upload-archive <tar.gz>` 上传同一份发布前备份,不等待上传完成。发布脚本在校验 wasm 后、执行 `spacetime publish` 前会等待显式 `SPACETIME_SERVER_URL` 的 `/v1/ping` 就绪,默认最多等待 `60` 秒;如生产机器冷备份恢复 `spacetimedb.service` 较慢,可临时设置 `GENARRATIVE_STDB_PUBLISH_READY_TIMEOUT_SECONDS` 调整等待时间。需要强一致发布闸门时改用 `DATABASE_BACKUP_MODE=sync`(等价脚本参数 `--backup-mode sync`),备份会在 publish 前同步打包并上传,失败会阻断 publish;确认已有其他备份窗口时才使用 `DATABASE_BACKUP_MODE=skip`(兼容脚本参数 `--skip-backup`)。若业务不能接受停机窗口,应先规划 SpacetimeDB 原生快照或主备策略,不要直接在写入中的数据目录上做热拷贝并当作强一致备份。
|
||||||
|
|
||||||
生产环境变量模板在 `deploy/env/api-server.env.example`:
|
生产环境变量模板在 `deploy/env/api-server.env.example`:
|
||||||
|
|
||||||
@@ -234,6 +235,17 @@ GENARRATIVE_DATABASE_BACKUP_OSS_ACCESS_KEY_SECRET=
|
|||||||
|
|
||||||
`GENARRATIVE_DATABASE_BACKUP_OSS_BUCKET` 为空时会回退 `ALIYUN_OSS_BUCKET`;AccessKey 默认复用 `ALIYUN_OSS_ACCESS_KEY_ID` / `ALIYUN_OSS_ACCESS_KEY_SECRET`,也可用 `GENARRATIVE_DATABASE_BACKUP_OSS_ACCESS_KEY_ID` / `GENARRATIVE_DATABASE_BACKUP_OSS_ACCESS_KEY_SECRET` 为备份 bucket 单独配置最小权限账号。`Genarrative-Server-Provision` 会创建 `/var/lib/genarrative/database-backups` 并归属 `genarrative:genarrative`,同时安装并启用 `genarrative-database-backup.timer`。手动检查定时器:`systemctl list-timers genarrative-database-backup.timer`;手动触发一次:`systemctl start genarrative-database-backup.service`。
|
`GENARRATIVE_DATABASE_BACKUP_OSS_BUCKET` 为空时会回退 `ALIYUN_OSS_BUCKET`;AccessKey 默认复用 `ALIYUN_OSS_ACCESS_KEY_ID` / `ALIYUN_OSS_ACCESS_KEY_SECRET`,也可用 `GENARRATIVE_DATABASE_BACKUP_OSS_ACCESS_KEY_ID` / `GENARRATIVE_DATABASE_BACKUP_OSS_ACCESS_KEY_SECRET` 为备份 bucket 单独配置最小权限账号。`Genarrative-Server-Provision` 会创建 `/var/lib/genarrative/database-backups` 并归属 `genarrative:genarrative`,同时安装并启用 `genarrative-database-backup.timer`。手动检查定时器:`systemctl list-timers genarrative-database-backup.timer`;手动触发一次:`systemctl start genarrative-database-backup.service`。
|
||||||
|
|
||||||
|
冷备份后必须做一次只读验收,不要只看 `genarrative-database-backup.service` 是否成功退出:
|
||||||
|
|
||||||
|
```bash
|
||||||
|
systemctl is-active spacetimedb.service genarrative-api.service nginx.service
|
||||||
|
curl -fsS --max-time 5 http://127.0.0.1:3101/v1/ping
|
||||||
|
curl -fsS --max-time 5 http://127.0.0.1:8082/healthz
|
||||||
|
curl -fsS --max-time 5 http://127.0.0.1:8082/readyz
|
||||||
|
curl -fsS --max-time 5 http://127.0.0.1/api/creation-entry/config >/dev/null
|
||||||
|
curl -fsS --max-time 5 http://127.0.0.1/api/runtime/puzzle/gallery >/dev/null
|
||||||
|
```
|
||||||
|
|
||||||
## 生产运维
|
## 生产运维
|
||||||
|
|
||||||
生产部署当前口径:
|
生产部署当前口径:
|
||||||
@@ -244,11 +256,32 @@ Nginx 负责站点和反向代理
|
|||||||
Jenkins 按 web / api / Spacetime module / build / deploy / publish 拆分
|
Jenkins 按 web / api / Spacetime module / build / deploy / publish 拆分
|
||||||
```
|
```
|
||||||
|
|
||||||
|
### 生产健康巡检
|
||||||
|
|
||||||
|
`Genarrative-Server-Provision` 会安装并启用 `genarrative-health-patrol.timer`,默认每 5 分钟运行一次 `genarrative-health-patrol.service`。巡检脚本随 API release 归档到 `/opt/genarrative/current/scripts/ops/production-health-patrol.mjs`,只读检查:
|
||||||
|
|
||||||
|
- `genarrative-api.service`、`spacetimedb.service`、`nginx.service` 是否 active。
|
||||||
|
- API 直连 `/healthz`、`/readyz`。
|
||||||
|
- SpacetimeDB 直连 `/v1/ping`。
|
||||||
|
- 默认直连 API 端口检查 `/api/creation-entry/config`、`/api/runtime/puzzle/gallery`、`/api/runtime/custom-world-gallery`;如需走 Nginx / 公网域名,在 `/etc/genarrative/health-patrol.env` 配置 `GENARRATIVE_HEALTH_PATROL_PUBLIC_BASE_URL=https://<域名>`。
|
||||||
|
- 最近 15 分钟 `genarrative-api.service`、`spacetimedb.service`、`nginx.service` 的 `err..alert` 日志。
|
||||||
|
|
||||||
|
巡检输出总状态 `OK / WARNING / CRITICAL`;只有 `CRITICAL` 默认让 systemd service 失败,`WARNING` 只写日志和状态文件,避免历史日志噪声把 timer 长期打成失败。最近一次结果写入 `/var/lib/genarrative/health-patrol/status.json`。手动执行:
|
||||||
|
|
||||||
|
```bash
|
||||||
|
systemctl start genarrative-health-patrol.service
|
||||||
|
systemctl status genarrative-health-patrol.service --no-pager
|
||||||
|
journalctl -u genarrative-health-patrol.service -n 80 --no-pager
|
||||||
|
cat /var/lib/genarrative/health-patrol/status.json
|
||||||
|
```
|
||||||
|
|
||||||
|
如需接外部告警,可在 `/etc/genarrative/health-patrol.env` 配置 `GENARRATIVE_HEALTH_PATROL_WEBHOOK_URL`;脚本只会在 `WARNING` 或 `CRITICAL` 时向该 webhook 发送 JSON。未配置 webhook 时,告警来源是 systemd 失败状态、journal 和状态文件。
|
||||||
|
|
||||||
`Genarrative-Web-Build` 的主站构建失败若出现 Rollup 报错 `"xxx" is not exported by "src/services/publicWorkCode.ts"`,优先按前端公开作品号工具缺失处理,而不是排查 Jenkins 节点环境。修复时要让 `publicWorkCode.ts` 的 `build<Play>PublicWorkCode` 与 `isSame<Play>PublicWorkCode` 成对导出,并补 `src/services/publicWorkCode.test.ts` 覆盖对应玩法前缀;随后用 `npm run build:production-release -- --component web --name <临时名>` 复现 Jenkins web 构建路径。
|
`Genarrative-Web-Build` 的主站构建失败若出现 Rollup 报错 `"xxx" is not exported by "src/services/publicWorkCode.ts"`,优先按前端公开作品号工具缺失处理,而不是排查 Jenkins 节点环境。修复时要让 `publicWorkCode.ts` 的 `build<Play>PublicWorkCode` 与 `isSame<Play>PublicWorkCode` 成对导出,并补 `src/services/publicWorkCode.test.ts` 覆盖对应玩法前缀;随后用 `npm run build:production-release -- --component web --name <临时名>` 复现 Jenkins web 构建路径。
|
||||||
|
|
||||||
`Genarrative-Web-Build` 会把 `build/<version>/web.tar.gz`、`web.tar.gz.sha256`、`release-manifest.json` 和 `scripts/deploy/production-web-deploy.sh` 直接归档为 Jenkins 构建产物;`Genarrative-Web-Deploy` 只通过 `copyArtifacts` 从指定上游构建复制这些产物和部署脚本,不再在目标机器 checkout Git,再执行随构建归档的 `scripts/deploy/production-web-deploy.sh`。Web 发布不再读取构建机本地缓存目录,也不再通过 release agent `rsync` 回构建机拉取大包;如果 deploy 找不到 `web.tar.gz`,应先检查上游 Web Build 是否按同一 `BUILD_VERSION` 成功归档产物。
|
`Genarrative-Web-Build` 会把 `build/<version>/web.tar.gz`、`web.tar.gz.sha256`、`release-manifest.json` 和 `scripts/deploy/production-web-deploy.sh` 直接归档为 Jenkins 构建产物;`Genarrative-Web-Deploy` 只通过 `copyArtifacts` 从指定上游构建复制这些产物和部署脚本,不再在目标机器 checkout Git,再执行随构建归档的 `scripts/deploy/production-web-deploy.sh`。Web 发布不再读取构建机本地缓存目录,也不再通过 release agent `rsync` 回构建机拉取大包;如果 deploy 找不到 `web.tar.gz`,应先检查上游 Web Build 是否按同一 `BUILD_VERSION` 成功归档产物。
|
||||||
|
|
||||||
`Genarrative-Api-Build` 的 Jenkins 归档产物必须包含 `build/<version>/api-server`、`api-server.sha256`、`release-manifest.json`、`scripts/database-backup-to-oss.mjs`、`scripts/deploy/production-api-deploy.sh`、`scripts/deploy/maintenance-on.sh` 和 `scripts/deploy/maintenance-off.sh`。`deploy/systemd/genarrative-database-backup.service` 从 `/opt/genarrative/current/scripts/database-backup-to-oss.mjs` 执行冷备份,`Genarrative-Api-Deploy` 会从上游 API 构建产物复制部署脚本和备份脚本,不再在目标机器 checkout Git;如果 API 发布后 current release 中缺少该脚本,应先检查 `Genarrative-Api-Build` 的 `archiveArtifacts` 和 `Genarrative-Api-Deploy` 的 `copyArtifacts` 过滤器是否仍包含 `build/<version>/scripts/database-backup-to-oss.mjs`,不要只在部署机工作区手工补文件。
|
`Genarrative-Api-Build` 的 Jenkins 归档产物必须包含 `build/<version>/api-server`、`api-server.sha256`、`release-manifest.json`、`scripts/database-backup-to-oss.mjs`、`scripts/ops/production-health-patrol.mjs`、`scripts/deploy/production-api-deploy.sh`、`scripts/deploy/maintenance-on.sh` 和 `scripts/deploy/maintenance-off.sh`。`deploy/systemd/genarrative-database-backup.service` 从 `/opt/genarrative/current/scripts/database-backup-to-oss.mjs` 执行冷备份,`deploy/systemd/genarrative-health-patrol.service` 从 `/opt/genarrative/current/scripts/ops/production-health-patrol.mjs` 执行巡检;`Genarrative-Api-Deploy` 会从上游 API 构建产物复制部署脚本、备份脚本和巡检脚本,不再在目标机器 checkout Git。如果 API 发布后 current release 中缺少这些脚本,应先检查 `Genarrative-Api-Build` 的 `archiveArtifacts` 和 `Genarrative-Api-Deploy` 的 `copyArtifacts` 过滤器是否仍包含 `build/<version>/scripts/database-backup-to-oss.mjs` 与 `build/<version>/scripts/ops/production-health-patrol.mjs`,不要只在部署机工作区手工补文件。
|
||||||
|
|
||||||
`Genarrative-Stdb-Module-Build` 的 Jenkins 归档产物必须包含 `build/<version>/spacetime_module.wasm`、`spacetime_module.wasm.sha256`、`release-manifest.json`、`scripts/deploy/production-stdb-publish.sh`、`scripts/deploy/maintenance-on.sh`、`scripts/deploy/maintenance-off.sh` 和 `scripts/database-backup-to-oss.mjs`。`Genarrative-Stdb-Module-Publish` 只通过 `copyArtifacts` 复制这些产物和发布脚本,不再在目标机器 checkout Git;如果 publish 前备份脚本缺失,应先检查 Stdb Build 的归档列表和 Stdb Publish 的复制过滤器。
|
`Genarrative-Stdb-Module-Build` 的 Jenkins 归档产物必须包含 `build/<version>/spacetime_module.wasm`、`spacetime_module.wasm.sha256`、`release-manifest.json`、`scripts/deploy/production-stdb-publish.sh`、`scripts/deploy/maintenance-on.sh`、`scripts/deploy/maintenance-off.sh` 和 `scripts/database-backup-to-oss.mjs`。`Genarrative-Stdb-Module-Publish` 只通过 `copyArtifacts` 复制这些产物和发布脚本,不再在目标机器 checkout Git;如果 publish 前备份脚本缺失,应先检查 Stdb Build 的归档列表和 Stdb Publish 的复制过滤器。
|
||||||
|
|
||||||
@@ -275,7 +308,8 @@ dev 服务器上的 Gitea 内网入口固定为 `http://10.2.0.10/GenarrativeAI/
|
|||||||
|
|
||||||
- `api-server` 生产模板默认 `GENARRATIVE_API_LISTEN_BACKLOG=1024`、`GENARRATIVE_API_WORKER_THREADS=4`;本地未设置 worker threads 时继续使用 Tokio 默认值。
|
- `api-server` 生产模板默认 `GENARRATIVE_API_LISTEN_BACKLOG=1024`、`GENARRATIVE_API_WORKER_THREADS=4`;本地未设置 worker threads 时继续使用 Tokio 默认值。
|
||||||
- `GENARRATIVE_API_MAX_CONCURRENT_REQUESTS=512` 开启应用内 HTTP 并发背压;`GENARRATIVE_API_GALLERY_MAX_CONCURRENT_REQUESTS=320`、`GENARRATIVE_API_DETAIL_MAX_CONCURRENT_REQUESTS=64`、`GENARRATIVE_API_ADMIN_MAX_CONCURRENT_REQUESTS=16` 分别限制公开列表、公开详情和后台 API 热路径。超过许可时直接返回 `429 Too Many Requests` 和 `Retry-After: 1`,`/healthz` 与 `/readyz` 不受该限制。这些值不是 RPS 限速;如果压测中 429 上升但内存和 p95 收敛,说明背压正在保护进程。直连 `api-server` 的极高 RPS 压测若出现 `connection refused`,通常已经打到 TCP 监听 / accept 层,应同时检查 backlog、Nginx upstream keepalive 和前置限流。
|
- `GENARRATIVE_API_MAX_CONCURRENT_REQUESTS=512` 开启应用内 HTTP 并发背压;`GENARRATIVE_API_GALLERY_MAX_CONCURRENT_REQUESTS=320`、`GENARRATIVE_API_DETAIL_MAX_CONCURRENT_REQUESTS=64`、`GENARRATIVE_API_ADMIN_MAX_CONCURRENT_REQUESTS=16` 分别限制公开列表、公开详情和后台 API 热路径。超过许可时直接返回 `429 Too Many Requests` 和 `Retry-After: 1`,`/healthz` 与 `/readyz` 不受该限制。这些值不是 RPS 限速;如果压测中 429 上升但内存和 p95 收敛,说明背压正在保护进程。直连 `api-server` 的极高 RPS 压测若出现 `connection refused`,通常已经打到 TCP 监听 / accept 层,应同时检查 backlog、Nginx upstream keepalive 和前置限流。
|
||||||
- `api-server` 正常运行时 `/healthz` 返回进程存活状态,`/readyz` 返回是否仍接收新流量;收到 `SIGINT` / `SIGTERM` 后会先把 readiness 标记为不可用,再让 Axum 停止接新连接并等待已有 HTTP 请求排空。systemd 仍以 `KillSignal=SIGINT` 停服务,`TimeoutStopSec=90` 作为长请求排空上限。
|
- `api-server` 正常运行时 `/healthz` 只返回进程存活状态,`/readyz` 会同时检查进程是否仍接收新流量和 SpacetimeDB 连接租约是否健康;收到 `SIGINT` / `SIGTERM` 后会先把 readiness 标记为不可用,再让 Axum 停止接新连接并等待已有 HTTP 请求排空。systemd 仍以 `KillSignal=SIGINT` 停服务,`TimeoutStopSec=90` 作为长请求排空上限。
|
||||||
|
- SpacetimeDB 健康检查默认使用 `GENARRATIVE_SPACETIME_HEALTH_CHECK_TIMEOUT_SECONDS=2` 的短等待窗口,和业务 procedure 的 `GENARRATIVE_SPACETIME_PROCEDURE_TIMEOUT_SECONDS` 分开。`/readyz` 失败时 `details.spacetime.stage` 会标出当前卡住阶段:`pool_acquire`、`connect_build`、`connect_handshake`、`read_model_subscribe`、`procedure_result`、`reducer_result` 或 `read_cache`;`elapsedMs` / `timeoutMs` 用于确认是否命中健康检查窗口。业务请求日志也会写入 `operation_kind`、`operation_name`、`spacetime_stage` 和 `elapsed_ms`,后续 45 秒超时不再只靠 Nginx `request_time=45s` 推断。
|
||||||
- `genarrative-api.service` 设置 `LimitNOFILE=65535`、`TasksMax=2048`;上线后用 `systemctl show genarrative-api.service -p LimitNOFILE -p TasksMax -p TimeoutStopUSec` 和 `cat /proc/$(pidof api-server)/limits` 核对。
|
- `genarrative-api.service` 设置 `LimitNOFILE=65535`、`TasksMax=2048`;上线后用 `systemctl show genarrative-api.service -p LimitNOFILE -p TasksMax -p TimeoutStopUSec` 和 `cat /proc/$(pidof api-server)/limits` 核对。
|
||||||
- Server provision 不再通过 Windows helper 下载,也不再通过 Linux build 节点中转工具包。`Prepare Provision Tools` 在目标 dev / release agent 工作区内先检查 `/usr/local/bin/otelcol-contrib` 与 `${SPACETIME_ROOT}/bin/current`:版本已满足时直接复用目标机现有文件生成 `provision-tools/`,只有缺失或版本不匹配时才使用 `PROVISION_DOWNLOADS_DIR` 里的本地包或从配置的下载源准备 SpacetimeDB `2.4.1` / `otelcol-contrib 0.151.0`;如果目标服务器下载需要代理,在 `PROVISION_DOWNLOAD_PROXY` 配置目标机可访问的 HTTP 代理。
|
- Server provision 不再通过 Windows helper 下载,也不再通过 Linux build 节点中转工具包。`Prepare Provision Tools` 在目标 dev / release agent 工作区内先检查 `/usr/local/bin/otelcol-contrib` 与 `${SPACETIME_ROOT}/bin/current`:版本已满足时直接复用目标机现有文件生成 `provision-tools/`,只有缺失或版本不匹配时才使用 `PROVISION_DOWNLOADS_DIR` 里的本地包或从配置的下载源准备 SpacetimeDB `2.4.1` / `otelcol-contrib 0.151.0`;如果目标服务器下载需要代理,在 `PROVISION_DOWNLOAD_PROXY` 配置目标机可访问的 HTTP 代理。
|
||||||
- 除 `Genarrative-Server-Provision` 外,`Genarrative-Stdb-Module-Build`、`Genarrative-Web-Build`、`Genarrative-Api-Build`、`Genarrative-*Deploy`、`Genarrative-Database-Import/Export`、`Genarrative-Full-Build-And-Deploy` 和 `Genarrative-Notify-Email` 的生产流水线现都以 Linux agent 为主,仍按各自 Jenkinsfile 的 checkout 口径执行。Server provision 不使用公网备用 Git 源。
|
- 除 `Genarrative-Server-Provision` 外,`Genarrative-Stdb-Module-Build`、`Genarrative-Web-Build`、`Genarrative-Api-Build`、`Genarrative-*Deploy`、`Genarrative-Database-Import/Export`、`Genarrative-Full-Build-And-Deploy` 和 `Genarrative-Notify-Email` 的生产流水线现都以 Linux agent 为主,仍按各自 Jenkinsfile 的 checkout 口径执行。Server provision 不使用公网备用 Git 源。
|
||||||
|
|||||||
@@ -104,7 +104,7 @@ pipeline {
|
|||||||
|
|
||||||
stage('Archive') {
|
stage('Archive') {
|
||||||
steps {
|
steps {
|
||||||
archiveArtifacts artifacts: "build/${env.EFFECTIVE_BUILD_VERSION}/api-server,build/${env.EFFECTIVE_BUILD_VERSION}/api-server.sha256,build/${env.EFFECTIVE_BUILD_VERSION}/release-manifest.json,build/${env.EFFECTIVE_BUILD_VERSION}/scripts/database-backup-to-oss.mjs,scripts/deploy/production-api-deploy.sh,scripts/deploy/maintenance-on.sh,scripts/deploy/maintenance-off.sh", fingerprint: true
|
archiveArtifacts artifacts: "build/${env.EFFECTIVE_BUILD_VERSION}/api-server,build/${env.EFFECTIVE_BUILD_VERSION}/api-server.sha256,build/${env.EFFECTIVE_BUILD_VERSION}/release-manifest.json,build/${env.EFFECTIVE_BUILD_VERSION}/scripts/database-backup-to-oss.mjs,build/${env.EFFECTIVE_BUILD_VERSION}/scripts/ops/production-health-patrol.mjs,scripts/deploy/production-api-deploy.sh,scripts/deploy/maintenance-on.sh,scripts/deploy/maintenance-off.sh", fingerprint: true
|
||||||
}
|
}
|
||||||
}
|
}
|
||||||
|
|
||||||
|
|||||||
@@ -65,7 +65,7 @@ pipeline {
|
|||||||
copyArtifacts(
|
copyArtifacts(
|
||||||
projectName: params.BUILD_JOB_NAME,
|
projectName: params.BUILD_JOB_NAME,
|
||||||
selector: specific(params.BUILD_NUMBER_TO_DEPLOY),
|
selector: specific(params.BUILD_NUMBER_TO_DEPLOY),
|
||||||
filter: "build/${params.BUILD_VERSION}/api-server,build/${params.BUILD_VERSION}/api-server.sha256,build/${params.BUILD_VERSION}/release-manifest.json,build/${params.BUILD_VERSION}/scripts/database-backup-to-oss.mjs,scripts/deploy/production-api-deploy.sh,scripts/deploy/maintenance-on.sh,scripts/deploy/maintenance-off.sh",
|
filter: "build/${params.BUILD_VERSION}/api-server,build/${params.BUILD_VERSION}/api-server.sha256,build/${params.BUILD_VERSION}/release-manifest.json,build/${params.BUILD_VERSION}/scripts/database-backup-to-oss.mjs,build/${params.BUILD_VERSION}/scripts/ops/production-health-patrol.mjs,scripts/deploy/production-api-deploy.sh,scripts/deploy/maintenance-on.sh,scripts/deploy/maintenance-off.sh",
|
||||||
target: '.',
|
target: '.',
|
||||||
fingerprintArtifacts: true
|
fingerprintArtifacts: true
|
||||||
)
|
)
|
||||||
|
|||||||
@@ -27,6 +27,7 @@
|
|||||||
"clean": "node -e \"require('fs').rmSync('dist', { recursive: true, force: true })\"",
|
"clean": "node -e \"require('fs').rmSync('dist', { recursive: true, force: true })\"",
|
||||||
"check:encoding": "node scripts/check-encoding.mjs",
|
"check:encoding": "node scripts/check-encoding.mjs",
|
||||||
"check:spacetime-schema": "node scripts/check-spacetime-schema-guard.mjs",
|
"check:spacetime-schema": "node scripts/check-spacetime-schema-guard.mjs",
|
||||||
|
"check:production-ops": "node scripts/check-production-ops-guardrails.mjs",
|
||||||
"assets:child-motion-demo": "node scripts/generate-child-motion-demo-assets.mjs",
|
"assets:child-motion-demo": "node scripts/generate-child-motion-demo-assets.mjs",
|
||||||
"assets:match3d-style-references": "node scripts/generate-match3d-style-references.mjs",
|
"assets:match3d-style-references": "node scripts/generate-match3d-style-references.mjs",
|
||||||
"check:visual-novel-vn11": "node scripts/check-visual-novel-vn11-negative-scan.mjs",
|
"check:visual-novel-vn11": "node scripts/check-visual-novel-vn11-negative-scan.mjs",
|
||||||
@@ -37,7 +38,7 @@
|
|||||||
"lint:guardrails": "npm run lint:eslint",
|
"lint:guardrails": "npm run lint:eslint",
|
||||||
"typecheck": "tsc -p tsconfig.typecheck-guardrails.json --noEmit",
|
"typecheck": "tsc -p tsconfig.typecheck-guardrails.json --noEmit",
|
||||||
"typecheck:guardrails": "npm run typecheck",
|
"typecheck:guardrails": "npm run typecheck",
|
||||||
"lint": "npm run check:encoding && npm run check:spacetime-schema && npm run lint:eslint && npm run typecheck",
|
"lint": "npm run check:encoding && npm run check:spacetime-schema && npm run check:production-ops && npm run lint:eslint && npm run typecheck",
|
||||||
"lint:fix": "eslint . --ext .ts,.tsx,.js,.mjs,.cjs --fix && prettier --write .",
|
"lint:fix": "eslint . --ext .ts,.tsx,.js,.mjs,.cjs --fix && prettier --write .",
|
||||||
"format": "prettier --write .",
|
"format": "prettier --write .",
|
||||||
"format:check": "prettier --check .",
|
"format:check": "prettier --check .",
|
||||||
|
|||||||
@@ -445,7 +445,7 @@ if [[ "${BUILD_SPACETIME}" -eq 1 ]]; then
|
|||||||
write_migration_bootstrap_secret_file
|
write_migration_bootstrap_secret_file
|
||||||
fi
|
fi
|
||||||
|
|
||||||
mkdir -p "${TARGET_DIR}/scripts" "${TARGET_DIR}/deploy"
|
mkdir -p "${TARGET_DIR}/scripts" "${TARGET_DIR}/scripts/ops" "${TARGET_DIR}/deploy"
|
||||||
cp "${SCRIPT_DIR}/deploy/maintenance-on.sh" "${TARGET_DIR}/scripts/maintenance-on.sh"
|
cp "${SCRIPT_DIR}/deploy/maintenance-on.sh" "${TARGET_DIR}/scripts/maintenance-on.sh"
|
||||||
cp "${SCRIPT_DIR}/deploy/maintenance-off.sh" "${TARGET_DIR}/scripts/maintenance-off.sh"
|
cp "${SCRIPT_DIR}/deploy/maintenance-off.sh" "${TARGET_DIR}/scripts/maintenance-off.sh"
|
||||||
cp "${SCRIPT_DIR}/deploy/maintenance-status.sh" "${TARGET_DIR}/scripts/maintenance-status.sh"
|
cp "${SCRIPT_DIR}/deploy/maintenance-status.sh" "${TARGET_DIR}/scripts/maintenance-status.sh"
|
||||||
@@ -466,6 +466,7 @@ copy_required_file "${SCRIPT_DIR}/spacetime-migration-common.mjs" "${TARGET_DIR}
|
|||||||
copy_required_file "${SCRIPT_DIR}/spacetime-authorize-migration-operator.mjs" "${TARGET_DIR}/scripts/spacetime-authorize-migration-operator.mjs" "数据库迁移授权脚本"
|
copy_required_file "${SCRIPT_DIR}/spacetime-authorize-migration-operator.mjs" "${TARGET_DIR}/scripts/spacetime-authorize-migration-operator.mjs" "数据库迁移授权脚本"
|
||||||
copy_required_file "${SCRIPT_DIR}/spacetime-revoke-migration-operator.mjs" "${TARGET_DIR}/scripts/spacetime-revoke-migration-operator.mjs" "数据库迁移撤权脚本"
|
copy_required_file "${SCRIPT_DIR}/spacetime-revoke-migration-operator.mjs" "${TARGET_DIR}/scripts/spacetime-revoke-migration-operator.mjs" "数据库迁移撤权脚本"
|
||||||
copy_required_file "${SCRIPT_DIR}/database-backup-to-oss.mjs" "${TARGET_DIR}/scripts/database-backup-to-oss.mjs" "数据库 OSS 备份脚本"
|
copy_required_file "${SCRIPT_DIR}/database-backup-to-oss.mjs" "${TARGET_DIR}/scripts/database-backup-to-oss.mjs" "数据库 OSS 备份脚本"
|
||||||
|
copy_required_file "${SCRIPT_DIR}/ops/production-health-patrol.mjs" "${TARGET_DIR}/scripts/ops/production-health-patrol.mjs" "生产健康巡检脚本"
|
||||||
|
|
||||||
copy_required_dir "${REPO_ROOT}/deploy/systemd" "${TARGET_DIR}/deploy/systemd" "systemd 配置"
|
copy_required_dir "${REPO_ROOT}/deploy/systemd" "${TARGET_DIR}/deploy/systemd" "systemd 配置"
|
||||||
copy_required_dir "${REPO_ROOT}/deploy/nginx" "${TARGET_DIR}/deploy/nginx" "Nginx 配置"
|
copy_required_dir "${REPO_ROOT}/deploy/nginx" "${TARGET_DIR}/deploy/nginx" "Nginx 配置"
|
||||||
@@ -485,7 +486,7 @@ cat >"${TARGET_DIR}/README.md" <<EOF
|
|||||||
- \`migration-bootstrap-secret.txt\`:构建 \`spacetime_module.wasm\` 时注入的迁移引导密钥,仅用于创建首个迁移操作员;请作为敏感文件保存到 Jenkins Secret Text,授权完成后不要长期留在公开归档中。
|
- \`migration-bootstrap-secret.txt\`:构建 \`spacetime_module.wasm\` 时注入的迁移引导密钥,仅用于创建首个迁移操作员;请作为敏感文件保存到 Jenkins Secret Text,授权完成后不要长期留在公开归档中。
|
||||||
- \`*.sha256\`:发布产物 checksum,用于部署前校验。
|
- \`*.sha256\`:发布产物 checksum,用于部署前校验。
|
||||||
- \`release-manifest.json\`:发布版本、源码 commit 与产物清单。
|
- \`release-manifest.json\`:发布版本、源码 commit 与产物清单。
|
||||||
- \`scripts/\`:维护模式脚本、数据库导入导出脚本、数据库 OSS 备份脚本、迁移授权脚本和 Jenkins inbound agent systemd 安装脚本。
|
- \`scripts/\`:维护模式脚本、数据库导入导出脚本、数据库 OSS 备份脚本、生产健康巡检脚本、迁移授权脚本和 Jenkins inbound agent systemd 安装脚本。
|
||||||
- \`deploy/\`:systemd、Nginx 和生产环境变量示例;\`deploy/nginx/genarrative-dev-http.conf\` 仅供无域名开发服初始化使用。
|
- \`deploy/\`:systemd、Nginx 和生产环境变量示例;\`deploy/nginx/genarrative-dev-http.conf\` 仅供无域名开发服初始化使用。
|
||||||
|
|
||||||
## 生产部署口径
|
## 生产部署口径
|
||||||
|
|||||||
64
scripts/check-production-ops-guardrails.mjs
Normal file
64
scripts/check-production-ops-guardrails.mjs
Normal file
@@ -0,0 +1,64 @@
|
|||||||
|
#!/usr/bin/env node
|
||||||
|
|
||||||
|
import {readFileSync} from 'node:fs';
|
||||||
|
|
||||||
|
const checks = [
|
||||||
|
{
|
||||||
|
file: 'deploy/systemd/genarrative-database-backup.service',
|
||||||
|
includes: '--restart-service-after genarrative-api.service',
|
||||||
|
reason: '生产冷备份恢复 SpacetimeDB 后必须显式拉起依赖它的 API 服务。',
|
||||||
|
},
|
||||||
|
{
|
||||||
|
file: 'deploy/systemd/genarrative-health-patrol.service',
|
||||||
|
includes: 'scripts/ops/production-health-patrol.mjs',
|
||||||
|
reason: '健康巡检 systemd service 必须调用随 API release 发布的巡检脚本。',
|
||||||
|
},
|
||||||
|
{
|
||||||
|
file: 'deploy/systemd/genarrative-health-patrol.timer',
|
||||||
|
includes: 'genarrative-health-patrol.service',
|
||||||
|
reason: '健康巡检 timer 必须绑定巡检 service。',
|
||||||
|
},
|
||||||
|
{
|
||||||
|
file: 'scripts/jenkins-server-provision.sh',
|
||||||
|
includes: 'genarrative-health-patrol.timer',
|
||||||
|
reason: 'Server-Provision 必须安装并启用健康巡检 timer。',
|
||||||
|
},
|
||||||
|
{
|
||||||
|
file: 'scripts/build-production-release.sh',
|
||||||
|
includes: 'production-health-patrol.mjs',
|
||||||
|
reason: '生产 API release 必须携带健康巡检脚本。',
|
||||||
|
},
|
||||||
|
{
|
||||||
|
file: 'scripts/deploy/production-api-deploy.sh',
|
||||||
|
includes: 'production-health-patrol.mjs',
|
||||||
|
reason: 'API deploy 必须把健康巡检脚本复制到 current release。',
|
||||||
|
},
|
||||||
|
{
|
||||||
|
file: 'jenkins/Jenkinsfile.production-api-build',
|
||||||
|
includes: 'scripts/ops/production-health-patrol.mjs',
|
||||||
|
reason: 'API Build 归档必须包含健康巡检脚本。',
|
||||||
|
},
|
||||||
|
{
|
||||||
|
file: 'jenkins/Jenkinsfile.production-api-deploy',
|
||||||
|
includes: 'scripts/ops/production-health-patrol.mjs',
|
||||||
|
reason: 'API Deploy 复制上游产物时必须包含健康巡检脚本。',
|
||||||
|
},
|
||||||
|
];
|
||||||
|
|
||||||
|
let failed = false;
|
||||||
|
|
||||||
|
for (const check of checks) {
|
||||||
|
const content = readFileSync(check.file, 'utf8');
|
||||||
|
if (!content.includes(check.includes)) {
|
||||||
|
failed = true;
|
||||||
|
console.error(
|
||||||
|
`[check:production-ops] ${check.file} 缺少 ${check.includes}。${check.reason}`,
|
||||||
|
);
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
if (failed) {
|
||||||
|
process.exit(1);
|
||||||
|
}
|
||||||
|
|
||||||
|
console.log('[check:production-ops] OK');
|
||||||
@@ -334,7 +334,9 @@ chmod +x "${RELEASE_DIR}/api-server"
|
|||||||
|
|
||||||
BACKUP_SCRIPT_SOURCE="${SOURCE_DIR}/scripts/database-backup-to-oss.mjs"
|
BACKUP_SCRIPT_SOURCE="${SOURCE_DIR}/scripts/database-backup-to-oss.mjs"
|
||||||
WORKSPACE_BACKUP_SCRIPT_SOURCE="$(cd "${SCRIPT_DIR}/../.." && pwd)/scripts/database-backup-to-oss.mjs"
|
WORKSPACE_BACKUP_SCRIPT_SOURCE="$(cd "${SCRIPT_DIR}/../.." && pwd)/scripts/database-backup-to-oss.mjs"
|
||||||
mkdir -p "${RELEASE_DIR}/scripts"
|
HEALTH_PATROL_SCRIPT_SOURCE="${SOURCE_DIR}/scripts/ops/production-health-patrol.mjs"
|
||||||
|
WORKSPACE_HEALTH_PATROL_SCRIPT_SOURCE="$(cd "${SCRIPT_DIR}/../.." && pwd)/scripts/ops/production-health-patrol.mjs"
|
||||||
|
mkdir -p "${RELEASE_DIR}/scripts" "${RELEASE_DIR}/scripts/ops"
|
||||||
if [[ ! -f "${BACKUP_SCRIPT_SOURCE}" ]]; then
|
if [[ ! -f "${BACKUP_SCRIPT_SOURCE}" ]]; then
|
||||||
if [[ -f "${WORKSPACE_BACKUP_SCRIPT_SOURCE}" ]]; then
|
if [[ -f "${WORKSPACE_BACKUP_SCRIPT_SOURCE}" ]]; then
|
||||||
echo "[production-api-deploy] 发布产物缺少 scripts/database-backup-to-oss.mjs,回退使用部署工作区脚本;请重新触发包含该脚本的 API 构建。" >&2
|
echo "[production-api-deploy] 发布产物缺少 scripts/database-backup-to-oss.mjs,回退使用部署工作区脚本;请重新触发包含该脚本的 API 构建。" >&2
|
||||||
@@ -346,6 +348,19 @@ if [[ ! -f "${BACKUP_SCRIPT_SOURCE}" ]]; then
|
|||||||
fi
|
fi
|
||||||
cp "${BACKUP_SCRIPT_SOURCE}" "${RELEASE_DIR}/scripts/database-backup-to-oss.mjs"
|
cp "${BACKUP_SCRIPT_SOURCE}" "${RELEASE_DIR}/scripts/database-backup-to-oss.mjs"
|
||||||
chmod 0644 "${RELEASE_DIR}/scripts/database-backup-to-oss.mjs"
|
chmod 0644 "${RELEASE_DIR}/scripts/database-backup-to-oss.mjs"
|
||||||
|
if [[ ! -f "${HEALTH_PATROL_SCRIPT_SOURCE}" ]]; then
|
||||||
|
if [[ -f "${WORKSPACE_HEALTH_PATROL_SCRIPT_SOURCE}" ]]; then
|
||||||
|
echo "[production-api-deploy] 发布产物缺少 scripts/ops/production-health-patrol.mjs,回退使用部署工作区脚本;请重新触发包含该脚本的 API 构建。" >&2
|
||||||
|
HEALTH_PATROL_SCRIPT_SOURCE="${WORKSPACE_HEALTH_PATROL_SCRIPT_SOURCE}"
|
||||||
|
else
|
||||||
|
echo "[production-api-deploy] 未找到生产健康巡检脚本,跳过复制;genarrative-health-patrol.service 会因脚本缺失而跳过执行。" >&2
|
||||||
|
HEALTH_PATROL_SCRIPT_SOURCE=""
|
||||||
|
fi
|
||||||
|
fi
|
||||||
|
if [[ -n "${HEALTH_PATROL_SCRIPT_SOURCE}" ]]; then
|
||||||
|
cp "${HEALTH_PATROL_SCRIPT_SOURCE}" "${RELEASE_DIR}/scripts/ops/production-health-patrol.mjs"
|
||||||
|
chmod 0644 "${RELEASE_DIR}/scripts/ops/production-health-patrol.mjs"
|
||||||
|
fi
|
||||||
|
|
||||||
if [[ -f "${SOURCE_DIR}/release-manifest.json" ]]; then
|
if [[ -f "${SOURCE_DIR}/release-manifest.json" ]]; then
|
||||||
cp "${SOURCE_DIR}/release-manifest.json" "${RELEASE_DIR}/release-manifest.api-server.json"
|
cp "${SOURCE_DIR}/release-manifest.json" "${RELEASE_DIR}/release-manifest.api-server.json"
|
||||||
|
|||||||
@@ -732,10 +732,20 @@ render_database_backup_service() {
|
|||||||
deploy/systemd/genarrative-database-backup.service
|
deploy/systemd/genarrative-database-backup.service
|
||||||
}
|
}
|
||||||
|
|
||||||
|
render_health_patrol_service() {
|
||||||
|
local current_escaped
|
||||||
|
current_escaped="$(escape_sed_replacement "${CURRENT_LINK}")"
|
||||||
|
sed \
|
||||||
|
-e "s|/opt/genarrative/current|${current_escaped}|g" \
|
||||||
|
deploy/systemd/genarrative-health-patrol.service
|
||||||
|
}
|
||||||
|
|
||||||
require_path deploy/systemd/spacetimedb.service
|
require_path deploy/systemd/spacetimedb.service
|
||||||
require_path deploy/systemd/genarrative-api.service
|
require_path deploy/systemd/genarrative-api.service
|
||||||
require_path deploy/systemd/genarrative-database-backup.service
|
require_path deploy/systemd/genarrative-database-backup.service
|
||||||
require_path deploy/systemd/genarrative-database-backup.timer
|
require_path deploy/systemd/genarrative-database-backup.timer
|
||||||
|
require_path deploy/systemd/genarrative-health-patrol.service
|
||||||
|
require_path deploy/systemd/genarrative-health-patrol.timer
|
||||||
require_path deploy/systemd/otelcol-contrib.service
|
require_path deploy/systemd/otelcol-contrib.service
|
||||||
require_path deploy/otelcol/genarrative-debug.yaml
|
require_path deploy/otelcol/genarrative-debug.yaml
|
||||||
require_path deploy/nginx/genarrative.conf
|
require_path deploy/nginx/genarrative.conf
|
||||||
@@ -754,7 +764,7 @@ echo "[server-provision] target=${DEPLOY_TARGET}, dry_run=${DRY_RUN}, nginx_conf
|
|||||||
run_cmd id
|
run_cmd id
|
||||||
require_root_for_real_provision
|
require_root_for_real_provision
|
||||||
install_nginx_brotli_modules
|
install_nginx_brotli_modules
|
||||||
run_cmd mkdir -p "${SPACETIME_ROOT}" "${RELEASE_ROOT}" "$(dirname "${CURRENT_LINK}")" "$(dirname "${WEB_LINK}")" /etc/genarrative /var/lib/genarrative/maintenance /var/lib/genarrative/auth /var/lib/genarrative/tracking-outbox /var/lib/genarrative/database-backups
|
run_cmd mkdir -p "${SPACETIME_ROOT}" "${RELEASE_ROOT}" "$(dirname "${CURRENT_LINK}")" "$(dirname "${WEB_LINK}")" /etc/genarrative /var/lib/genarrative/maintenance /var/lib/genarrative/auth /var/lib/genarrative/tracking-outbox /var/lib/genarrative/database-backups /var/lib/genarrative/health-patrol
|
||||||
|
|
||||||
if ! id spacetimedb >/dev/null 2>&1; then
|
if ! id spacetimedb >/dev/null 2>&1; then
|
||||||
run_cmd useradd --system --home-dir "${SPACETIME_ROOT}" --shell /usr/sbin/nologin spacetimedb
|
run_cmd useradd --system --home-dir "${SPACETIME_ROOT}" --shell /usr/sbin/nologin spacetimedb
|
||||||
@@ -786,14 +796,18 @@ sync_spacetime_install "${SPACETIME_ROOT}"
|
|||||||
spacetimedb_service="$(mktemp)"
|
spacetimedb_service="$(mktemp)"
|
||||||
api_service="$(mktemp)"
|
api_service="$(mktemp)"
|
||||||
database_backup_service="$(mktemp)"
|
database_backup_service="$(mktemp)"
|
||||||
|
health_patrol_service="$(mktemp)"
|
||||||
render_spacetimedb_service >"${spacetimedb_service}"
|
render_spacetimedb_service >"${spacetimedb_service}"
|
||||||
render_api_service >"${api_service}"
|
render_api_service >"${api_service}"
|
||||||
render_database_backup_service >"${database_backup_service}"
|
render_database_backup_service >"${database_backup_service}"
|
||||||
|
render_health_patrol_service >"${health_patrol_service}"
|
||||||
install_file "${spacetimedb_service}" /etc/systemd/system/spacetimedb.service 0644
|
install_file "${spacetimedb_service}" /etc/systemd/system/spacetimedb.service 0644
|
||||||
install_file "${api_service}" /etc/systemd/system/genarrative-api.service 0644
|
install_file "${api_service}" /etc/systemd/system/genarrative-api.service 0644
|
||||||
install_file "${database_backup_service}" /etc/systemd/system/genarrative-database-backup.service 0644
|
install_file "${database_backup_service}" /etc/systemd/system/genarrative-database-backup.service 0644
|
||||||
install_file deploy/systemd/genarrative-database-backup.timer /etc/systemd/system/genarrative-database-backup.timer 0644
|
install_file deploy/systemd/genarrative-database-backup.timer /etc/systemd/system/genarrative-database-backup.timer 0644
|
||||||
rm -f "${spacetimedb_service}" "${api_service}" "${database_backup_service}"
|
install_file "${health_patrol_service}" /etc/systemd/system/genarrative-health-patrol.service 0644
|
||||||
|
install_file deploy/systemd/genarrative-health-patrol.timer /etc/systemd/system/genarrative-health-patrol.timer 0644
|
||||||
|
rm -f "${spacetimedb_service}" "${api_service}" "${database_backup_service}" "${health_patrol_service}"
|
||||||
|
|
||||||
if [[ ! -f "${API_ENV_FILE}" ]]; then
|
if [[ ! -f "${API_ENV_FILE}" ]]; then
|
||||||
echo "+ create ${API_ENV_FILE} from example"
|
echo "+ create ${API_ENV_FILE} from example"
|
||||||
@@ -828,7 +842,7 @@ if [[ "${ENABLE_SERVICES}" == "true" ]]; then
|
|||||||
if [[ "${ENABLE_OTELCOL:-true}" == "true" ]]; then
|
if [[ "${ENABLE_OTELCOL:-true}" == "true" ]]; then
|
||||||
run_cmd systemctl enable otelcol-contrib.service
|
run_cmd systemctl enable otelcol-contrib.service
|
||||||
fi
|
fi
|
||||||
run_cmd systemctl enable spacetimedb.service genarrative-api.service genarrative-database-backup.timer
|
run_cmd systemctl enable spacetimedb.service genarrative-api.service genarrative-database-backup.timer genarrative-health-patrol.timer
|
||||||
if [[ "${ENABLE_OTELCOL:-true}" == "true" ]]; then
|
if [[ "${ENABLE_OTELCOL:-true}" == "true" ]]; then
|
||||||
run_cmd systemctl restart otelcol-contrib.service
|
run_cmd systemctl restart otelcol-contrib.service
|
||||||
fi
|
fi
|
||||||
|
|||||||
477
scripts/ops/production-health-patrol.mjs
Normal file
477
scripts/ops/production-health-patrol.mjs
Normal file
@@ -0,0 +1,477 @@
|
|||||||
|
#!/usr/bin/env node
|
||||||
|
|
||||||
|
import {execFile} from 'node:child_process';
|
||||||
|
import http from 'node:http';
|
||||||
|
import https from 'node:https';
|
||||||
|
import {mkdir, writeFile} from 'node:fs/promises';
|
||||||
|
import {dirname} from 'node:path';
|
||||||
|
|
||||||
|
const STATUS_RANK = {
|
||||||
|
OK: 0,
|
||||||
|
WARNING: 1,
|
||||||
|
CRITICAL: 2,
|
||||||
|
};
|
||||||
|
|
||||||
|
const DEFAULT_PUBLIC_PATHS = [
|
||||||
|
'/api/creation-entry/config',
|
||||||
|
'/api/runtime/puzzle/gallery',
|
||||||
|
'/api/runtime/custom-world-gallery',
|
||||||
|
];
|
||||||
|
|
||||||
|
const DEFAULT_SERVICES = [
|
||||||
|
'genarrative-api.service',
|
||||||
|
'spacetimedb.service',
|
||||||
|
'nginx.service',
|
||||||
|
];
|
||||||
|
|
||||||
|
function usage() {
|
||||||
|
console.log(`Usage:
|
||||||
|
node scripts/ops/production-health-patrol.mjs [options]
|
||||||
|
|
||||||
|
Options:
|
||||||
|
--api-base-url <url> API direct base URL, default http://127.0.0.1:8082
|
||||||
|
--spacetime-base-url <url> SpacetimeDB base URL, default http://127.0.0.1:3101
|
||||||
|
--public-base-url <url> Nginx/public base URL, default http://127.0.0.1
|
||||||
|
--public-path <path> Public API path to probe; repeatable
|
||||||
|
--status-file <path> Write the last patrol result as JSON
|
||||||
|
--timeout-ms <ms> HTTP/command timeout, default 5000
|
||||||
|
--slow-ms <ms> Mark successful probes slower than this as WARNING, default 3000
|
||||||
|
--fail-on-warning Exit 1 when the total status is WARNING
|
||||||
|
--skip-journal Skip recent journal error scan
|
||||||
|
--json Print JSON instead of text
|
||||||
|
`);
|
||||||
|
}
|
||||||
|
|
||||||
|
function readBoolEnv(name, fallback = false) {
|
||||||
|
const value = process.env[name];
|
||||||
|
if (!value) {
|
||||||
|
return fallback;
|
||||||
|
}
|
||||||
|
return ['1', 'true', 'yes', 'on'].includes(value.trim().toLowerCase());
|
||||||
|
}
|
||||||
|
|
||||||
|
function parsePositiveInt(raw, fallback) {
|
||||||
|
const value = Number.parseInt(String(raw ?? ''), 10);
|
||||||
|
return Number.isFinite(value) && value > 0 ? value : fallback;
|
||||||
|
}
|
||||||
|
|
||||||
|
function parseArgs(argv) {
|
||||||
|
const config = {
|
||||||
|
apiBaseUrl:
|
||||||
|
process.env.GENARRATIVE_HEALTH_PATROL_API_BASE_URL ||
|
||||||
|
'http://127.0.0.1:8082',
|
||||||
|
spacetimeBaseUrl:
|
||||||
|
process.env.GENARRATIVE_HEALTH_PATROL_SPACETIME_BASE_URL ||
|
||||||
|
'http://127.0.0.1:3101',
|
||||||
|
publicBaseUrl:
|
||||||
|
process.env.GENARRATIVE_HEALTH_PATROL_PUBLIC_BASE_URL ||
|
||||||
|
process.env.GENARRATIVE_HEALTH_PATROL_API_BASE_URL ||
|
||||||
|
'http://127.0.0.1:8082',
|
||||||
|
publicPaths: [],
|
||||||
|
statusFile: process.env.GENARRATIVE_HEALTH_PATROL_STATUS_FILE || '',
|
||||||
|
timeoutMs: parsePositiveInt(
|
||||||
|
process.env.GENARRATIVE_HEALTH_PATROL_TIMEOUT_MS,
|
||||||
|
5000,
|
||||||
|
),
|
||||||
|
slowMs: parsePositiveInt(
|
||||||
|
process.env.GENARRATIVE_HEALTH_PATROL_SLOW_MS,
|
||||||
|
3000,
|
||||||
|
),
|
||||||
|
failOnWarning: readBoolEnv('GENARRATIVE_HEALTH_PATROL_FAIL_ON_WARNING'),
|
||||||
|
skipJournal: readBoolEnv('GENARRATIVE_HEALTH_PATROL_SKIP_JOURNAL'),
|
||||||
|
json: false,
|
||||||
|
webhookUrl: process.env.GENARRATIVE_HEALTH_PATROL_WEBHOOK_URL || '',
|
||||||
|
};
|
||||||
|
|
||||||
|
for (let index = 0; index < argv.length; index += 1) {
|
||||||
|
const arg = argv[index];
|
||||||
|
switch (arg) {
|
||||||
|
case '-h':
|
||||||
|
case '--help':
|
||||||
|
usage();
|
||||||
|
process.exit(0);
|
||||||
|
break;
|
||||||
|
case '--api-base-url':
|
||||||
|
config.apiBaseUrl = requireValue(argv, ++index, arg);
|
||||||
|
break;
|
||||||
|
case '--spacetime-base-url':
|
||||||
|
config.spacetimeBaseUrl = requireValue(argv, ++index, arg);
|
||||||
|
break;
|
||||||
|
case '--public-base-url':
|
||||||
|
config.publicBaseUrl = requireValue(argv, ++index, arg);
|
||||||
|
break;
|
||||||
|
case '--public-path':
|
||||||
|
config.publicPaths.push(requireValue(argv, ++index, arg));
|
||||||
|
break;
|
||||||
|
case '--status-file':
|
||||||
|
config.statusFile = requireValue(argv, ++index, arg);
|
||||||
|
break;
|
||||||
|
case '--timeout-ms':
|
||||||
|
config.timeoutMs = parsePositiveInt(requireValue(argv, ++index, arg), 5000);
|
||||||
|
break;
|
||||||
|
case '--slow-ms':
|
||||||
|
config.slowMs = parsePositiveInt(requireValue(argv, ++index, arg), 3000);
|
||||||
|
break;
|
||||||
|
case '--fail-on-warning':
|
||||||
|
config.failOnWarning = true;
|
||||||
|
break;
|
||||||
|
case '--skip-journal':
|
||||||
|
config.skipJournal = true;
|
||||||
|
break;
|
||||||
|
case '--json':
|
||||||
|
config.json = true;
|
||||||
|
break;
|
||||||
|
default:
|
||||||
|
throw new Error(`未知参数: ${arg}`);
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
if (config.publicPaths.length === 0) {
|
||||||
|
config.publicPaths = DEFAULT_PUBLIC_PATHS;
|
||||||
|
}
|
||||||
|
|
||||||
|
return config;
|
||||||
|
}
|
||||||
|
|
||||||
|
function requireValue(argv, index, flag) {
|
||||||
|
const value = argv[index];
|
||||||
|
if (!value || value.startsWith('--')) {
|
||||||
|
throw new Error(`${flag} 缺少参数值`);
|
||||||
|
}
|
||||||
|
return value;
|
||||||
|
}
|
||||||
|
|
||||||
|
function joinUrl(baseUrl, path) {
|
||||||
|
const base = baseUrl.endsWith('/') ? baseUrl.slice(0, -1) : baseUrl;
|
||||||
|
const suffix = path.startsWith('/') ? path : `/${path}`;
|
||||||
|
return `${base}${suffix}`;
|
||||||
|
}
|
||||||
|
|
||||||
|
function maxStatus(checks) {
|
||||||
|
return checks.reduce((current, check) => {
|
||||||
|
return STATUS_RANK[check.status] > STATUS_RANK[current] ? check.status : current;
|
||||||
|
}, 'OK');
|
||||||
|
}
|
||||||
|
|
||||||
|
function checkResult(name, status, summary, details = {}) {
|
||||||
|
return {
|
||||||
|
name,
|
||||||
|
status,
|
||||||
|
summary,
|
||||||
|
...details,
|
||||||
|
};
|
||||||
|
}
|
||||||
|
|
||||||
|
function runCommand(command, args, timeoutMs) {
|
||||||
|
return new Promise((resolve) => {
|
||||||
|
execFile(
|
||||||
|
command,
|
||||||
|
args,
|
||||||
|
{
|
||||||
|
timeout: timeoutMs,
|
||||||
|
windowsHide: true,
|
||||||
|
maxBuffer: 256 * 1024,
|
||||||
|
},
|
||||||
|
(error, stdout, stderr) => {
|
||||||
|
resolve({
|
||||||
|
command: [command, ...args].join(' '),
|
||||||
|
code:
|
||||||
|
typeof error?.code === 'number'
|
||||||
|
? error.code
|
||||||
|
: error
|
||||||
|
? 1
|
||||||
|
: 0,
|
||||||
|
signal: error?.signal || '',
|
||||||
|
stdout: String(stdout || ''),
|
||||||
|
stderr: String(stderr || ''),
|
||||||
|
timedOut: Boolean(error?.killed),
|
||||||
|
error: error ? error.message : '',
|
||||||
|
});
|
||||||
|
},
|
||||||
|
);
|
||||||
|
});
|
||||||
|
}
|
||||||
|
|
||||||
|
async function checkService(serviceName, timeoutMs) {
|
||||||
|
const result = await runCommand(
|
||||||
|
'systemctl',
|
||||||
|
['is-active', serviceName],
|
||||||
|
timeoutMs,
|
||||||
|
);
|
||||||
|
const state = result.stdout.trim() || result.stderr.trim() || result.error;
|
||||||
|
if (result.code === 0 && state === 'active') {
|
||||||
|
return checkResult(`service:${serviceName}`, 'OK', 'active', {
|
||||||
|
command: result.command,
|
||||||
|
});
|
||||||
|
}
|
||||||
|
|
||||||
|
return checkResult(
|
||||||
|
`service:${serviceName}`,
|
||||||
|
'CRITICAL',
|
||||||
|
`服务状态异常: ${state || `exit ${result.code}`}`,
|
||||||
|
{
|
||||||
|
command: result.command,
|
||||||
|
stderr: result.stderr.trim(),
|
||||||
|
},
|
||||||
|
);
|
||||||
|
}
|
||||||
|
|
||||||
|
function requestUrl(url, timeoutMs) {
|
||||||
|
return new Promise((resolve) => {
|
||||||
|
const startedAt = Date.now();
|
||||||
|
const parsed = new URL(url);
|
||||||
|
const client = parsed.protocol === 'https:' ? https : http;
|
||||||
|
const request = client.request(
|
||||||
|
parsed,
|
||||||
|
{
|
||||||
|
method: 'GET',
|
||||||
|
timeout: timeoutMs,
|
||||||
|
headers: {
|
||||||
|
'User-Agent': 'genarrative-health-patrol/1.0',
|
||||||
|
Accept: 'application/json,text/plain,*/*',
|
||||||
|
},
|
||||||
|
},
|
||||||
|
(response) => {
|
||||||
|
let body = '';
|
||||||
|
response.setEncoding('utf8');
|
||||||
|
response.on('data', (chunk) => {
|
||||||
|
if (body.length < 2048) {
|
||||||
|
body += chunk;
|
||||||
|
}
|
||||||
|
});
|
||||||
|
response.on('end', () => {
|
||||||
|
resolve({
|
||||||
|
elapsedMs: Date.now() - startedAt,
|
||||||
|
statusCode: response.statusCode || 0,
|
||||||
|
body: body.slice(0, 2048),
|
||||||
|
});
|
||||||
|
});
|
||||||
|
},
|
||||||
|
);
|
||||||
|
|
||||||
|
request.on('timeout', () => {
|
||||||
|
request.destroy(new Error(`timeout after ${timeoutMs}ms`));
|
||||||
|
});
|
||||||
|
request.on('error', (error) => {
|
||||||
|
resolve({
|
||||||
|
elapsedMs: Date.now() - startedAt,
|
||||||
|
error: error.message,
|
||||||
|
});
|
||||||
|
});
|
||||||
|
request.end();
|
||||||
|
});
|
||||||
|
}
|
||||||
|
|
||||||
|
async function checkHttp(name, url, config) {
|
||||||
|
const result = await requestUrl(url, config.timeoutMs);
|
||||||
|
const curlCommand = `curl -fsS --max-time ${Math.ceil(config.timeoutMs / 1000)} ${url}`;
|
||||||
|
|
||||||
|
if (result.error) {
|
||||||
|
return checkResult(name, 'CRITICAL', `请求失败: ${result.error}`, {
|
||||||
|
command: curlCommand,
|
||||||
|
elapsedMs: result.elapsedMs,
|
||||||
|
});
|
||||||
|
}
|
||||||
|
|
||||||
|
const ok = result.statusCode >= 200 && result.statusCode < 300;
|
||||||
|
if (!ok) {
|
||||||
|
return checkResult(
|
||||||
|
name,
|
||||||
|
'CRITICAL',
|
||||||
|
`HTTP ${result.statusCode},耗时 ${result.elapsedMs}ms`,
|
||||||
|
{
|
||||||
|
command: curlCommand,
|
||||||
|
elapsedMs: result.elapsedMs,
|
||||||
|
body: result.body.trim(),
|
||||||
|
},
|
||||||
|
);
|
||||||
|
}
|
||||||
|
|
||||||
|
if (result.elapsedMs > config.slowMs) {
|
||||||
|
return checkResult(
|
||||||
|
name,
|
||||||
|
'WARNING',
|
||||||
|
`HTTP ${result.statusCode} 但耗时偏高: ${result.elapsedMs}ms`,
|
||||||
|
{
|
||||||
|
command: curlCommand,
|
||||||
|
elapsedMs: result.elapsedMs,
|
||||||
|
},
|
||||||
|
);
|
||||||
|
}
|
||||||
|
|
||||||
|
return checkResult(name, 'OK', `HTTP ${result.statusCode} ${result.elapsedMs}ms`, {
|
||||||
|
command: curlCommand,
|
||||||
|
elapsedMs: result.elapsedMs,
|
||||||
|
});
|
||||||
|
}
|
||||||
|
|
||||||
|
async function checkRecentJournal(config) {
|
||||||
|
const args = [
|
||||||
|
'-u',
|
||||||
|
'genarrative-api.service',
|
||||||
|
'-u',
|
||||||
|
'spacetimedb.service',
|
||||||
|
'-u',
|
||||||
|
'nginx.service',
|
||||||
|
'--since',
|
||||||
|
'15 minutes ago',
|
||||||
|
'-p',
|
||||||
|
'err..alert',
|
||||||
|
'--no-pager',
|
||||||
|
'-o',
|
||||||
|
'short-iso',
|
||||||
|
'-n',
|
||||||
|
'20',
|
||||||
|
];
|
||||||
|
const result = await runCommand('journalctl', args, config.timeoutMs);
|
||||||
|
|
||||||
|
if (result.code !== 0) {
|
||||||
|
return checkResult('journal:recent-errors', 'WARNING', '无法读取最近错误日志', {
|
||||||
|
command: result.command,
|
||||||
|
stderr: result.stderr.trim() || result.error,
|
||||||
|
});
|
||||||
|
}
|
||||||
|
|
||||||
|
const lines = result.stdout
|
||||||
|
.split('\n')
|
||||||
|
.map((line) => line.trim())
|
||||||
|
.filter((line) => line && line !== '-- No entries --');
|
||||||
|
|
||||||
|
if (lines.length === 0) {
|
||||||
|
return checkResult('journal:recent-errors', 'OK', '最近 15 分钟无 err..alert 日志', {
|
||||||
|
command: result.command,
|
||||||
|
});
|
||||||
|
}
|
||||||
|
|
||||||
|
return checkResult(
|
||||||
|
'journal:recent-errors',
|
||||||
|
'WARNING',
|
||||||
|
`最近 15 分钟有 ${lines.length} 条 err..alert 日志`,
|
||||||
|
{
|
||||||
|
command: result.command,
|
||||||
|
lines,
|
||||||
|
},
|
||||||
|
);
|
||||||
|
}
|
||||||
|
|
||||||
|
async function writeStatusFile(statusFile, payload) {
|
||||||
|
if (!statusFile) {
|
||||||
|
return;
|
||||||
|
}
|
||||||
|
await mkdir(dirname(statusFile), {recursive: true});
|
||||||
|
await writeFile(statusFile, `${JSON.stringify(payload, null, 2)}\n`, 'utf8');
|
||||||
|
}
|
||||||
|
|
||||||
|
async function notifyWebhook(config, payload) {
|
||||||
|
if (!config.webhookUrl || payload.status === 'OK') {
|
||||||
|
return;
|
||||||
|
}
|
||||||
|
|
||||||
|
const body = JSON.stringify(payload);
|
||||||
|
const parsed = new URL(config.webhookUrl);
|
||||||
|
const client = parsed.protocol === 'https:' ? https : http;
|
||||||
|
|
||||||
|
await new Promise((resolve) => {
|
||||||
|
const request = client.request(
|
||||||
|
parsed,
|
||||||
|
{
|
||||||
|
method: 'POST',
|
||||||
|
timeout: config.timeoutMs,
|
||||||
|
headers: {
|
||||||
|
'Content-Type': 'application/json',
|
||||||
|
'Content-Length': Buffer.byteLength(body),
|
||||||
|
},
|
||||||
|
},
|
||||||
|
(response) => {
|
||||||
|
response.resume();
|
||||||
|
response.on('end', resolve);
|
||||||
|
},
|
||||||
|
);
|
||||||
|
request.on('timeout', () => {
|
||||||
|
request.destroy(new Error(`timeout after ${config.timeoutMs}ms`));
|
||||||
|
});
|
||||||
|
request.on('error', (error) => {
|
||||||
|
console.error(`[health-patrol] webhook notify failed: ${error.message}`);
|
||||||
|
resolve();
|
||||||
|
});
|
||||||
|
request.end(body);
|
||||||
|
});
|
||||||
|
}
|
||||||
|
|
||||||
|
function printText(payload) {
|
||||||
|
console.log(`[health-patrol] ${payload.status} ${payload.checkedAt}`);
|
||||||
|
for (const check of payload.checks) {
|
||||||
|
console.log(`[${check.status}] ${check.name}: ${check.summary}`);
|
||||||
|
if (check.command && check.status !== 'OK') {
|
||||||
|
console.log(` command: ${check.command}`);
|
||||||
|
}
|
||||||
|
if (check.stderr) {
|
||||||
|
console.log(` stderr: ${check.stderr}`);
|
||||||
|
}
|
||||||
|
if (check.body) {
|
||||||
|
console.log(` body: ${check.body}`);
|
||||||
|
}
|
||||||
|
if (Array.isArray(check.lines) && check.lines.length > 0) {
|
||||||
|
for (const line of check.lines) {
|
||||||
|
console.log(` ${line}`);
|
||||||
|
}
|
||||||
|
}
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
async function main() {
|
||||||
|
const config = parseArgs(process.argv.slice(2));
|
||||||
|
const checks = [];
|
||||||
|
|
||||||
|
for (const serviceName of DEFAULT_SERVICES) {
|
||||||
|
checks.push(await checkService(serviceName, config.timeoutMs));
|
||||||
|
}
|
||||||
|
|
||||||
|
checks.push(await checkHttp('api:/healthz', joinUrl(config.apiBaseUrl, '/healthz'), config));
|
||||||
|
checks.push(await checkHttp('api:/readyz', joinUrl(config.apiBaseUrl, '/readyz'), config));
|
||||||
|
checks.push(
|
||||||
|
await checkHttp(
|
||||||
|
'spacetimedb:/v1/ping',
|
||||||
|
joinUrl(config.spacetimeBaseUrl, '/v1/ping'),
|
||||||
|
config,
|
||||||
|
),
|
||||||
|
);
|
||||||
|
|
||||||
|
for (const path of config.publicPaths) {
|
||||||
|
checks.push(
|
||||||
|
await checkHttp(`public:${path}`, joinUrl(config.publicBaseUrl, path), config),
|
||||||
|
);
|
||||||
|
}
|
||||||
|
|
||||||
|
if (!config.skipJournal) {
|
||||||
|
checks.push(await checkRecentJournal(config));
|
||||||
|
}
|
||||||
|
|
||||||
|
const payload = {
|
||||||
|
status: maxStatus(checks),
|
||||||
|
checkedAt: new Date().toISOString(),
|
||||||
|
host: process.env.HOSTNAME || '',
|
||||||
|
checks,
|
||||||
|
};
|
||||||
|
|
||||||
|
await writeStatusFile(config.statusFile, payload);
|
||||||
|
await notifyWebhook(config, payload);
|
||||||
|
|
||||||
|
if (config.json) {
|
||||||
|
console.log(JSON.stringify(payload, null, 2));
|
||||||
|
} else {
|
||||||
|
printText(payload);
|
||||||
|
}
|
||||||
|
|
||||||
|
if (payload.status === 'CRITICAL') {
|
||||||
|
process.exit(2);
|
||||||
|
}
|
||||||
|
if (payload.status === 'WARNING' && config.failOnWarning) {
|
||||||
|
process.exit(1);
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
main().catch((error) => {
|
||||||
|
console.error(`[health-patrol] CRITICAL ${error instanceof Error ? error.message : String(error)}`);
|
||||||
|
process.exit(2);
|
||||||
|
});
|
||||||
@@ -269,6 +269,7 @@ mod tests {
|
|||||||
};
|
};
|
||||||
use reqwest::Client;
|
use reqwest::Client;
|
||||||
use serde_json::Value;
|
use serde_json::Value;
|
||||||
|
use spacetime_client::{SpacetimeClientHealthSnapshot, SpacetimeClientStage};
|
||||||
use time::OffsetDateTime;
|
use time::OffsetDateTime;
|
||||||
use tokio::net::TcpListener;
|
use tokio::net::TcpListener;
|
||||||
use tower::ServiceExt;
|
use tower::ServiceExt;
|
||||||
@@ -724,6 +725,45 @@ mod tests {
|
|||||||
);
|
);
|
||||||
}
|
}
|
||||||
|
|
||||||
|
#[tokio::test]
|
||||||
|
async fn readyz_reports_spacetime_health_stage() {
|
||||||
|
let state = AppState::new(AppConfig::default()).expect("state should build");
|
||||||
|
state.set_test_spacetime_health(SpacetimeClientHealthSnapshot {
|
||||||
|
ok: false,
|
||||||
|
stage: SpacetimeClientStage::ProcedureResult,
|
||||||
|
checked_at_micros: 1_713_680_000_000_000,
|
||||||
|
elapsed_ms: 2_000,
|
||||||
|
timeout_ms: 2_000,
|
||||||
|
error: Some("SpacetimeDB procedure 调用超时".to_string()),
|
||||||
|
last_success_at_micros: Some(1_713_679_999_000_000),
|
||||||
|
last_error: Some("SpacetimeDB procedure 调用超时".to_string()),
|
||||||
|
});
|
||||||
|
let app = build_router(state);
|
||||||
|
|
||||||
|
let response = app
|
||||||
|
.oneshot(
|
||||||
|
Request::builder()
|
||||||
|
.uri("/readyz")
|
||||||
|
.header("x-request-id", "req-ready-spacetime")
|
||||||
|
.body(Body::empty())
|
||||||
|
.expect("readyz request should build"),
|
||||||
|
)
|
||||||
|
.await
|
||||||
|
.expect("readyz request should succeed");
|
||||||
|
|
||||||
|
assert_eq!(response.status(), StatusCode::SERVICE_UNAVAILABLE);
|
||||||
|
let body = read_json_response(response).await;
|
||||||
|
assert_eq!(body["error"]["details"]["reason"], "spacetime_unhealthy");
|
||||||
|
assert_eq!(
|
||||||
|
body["error"]["details"]["spacetime"]["stage"],
|
||||||
|
"procedure_result"
|
||||||
|
);
|
||||||
|
assert_eq!(
|
||||||
|
body["error"]["details"]["spacetime"]["timeoutMs"],
|
||||||
|
Value::from(2_000)
|
||||||
|
);
|
||||||
|
}
|
||||||
|
|
||||||
#[tokio::test]
|
#[tokio::test]
|
||||||
async fn creative_agent_draft_edit_rejects_unconfirmed_template_session() {
|
async fn creative_agent_draft_edit_rejects_unconfirmed_template_session() {
|
||||||
let app = build_internal_creative_agent_app();
|
let app = build_internal_creative_agent_app();
|
||||||
|
|||||||
@@ -12,6 +12,7 @@ use platform_speech::{
|
|||||||
|
|
||||||
const DEFAULT_INTERNAL_API_SECRET: &str = "genarrative-dev-internal-bridge";
|
const DEFAULT_INTERNAL_API_SECRET: &str = "genarrative-dev-internal-bridge";
|
||||||
const SPACETIME_LOCAL_CONFIG_FILE: &str = "spacetime.local.json";
|
const SPACETIME_LOCAL_CONFIG_FILE: &str = "spacetime.local.json";
|
||||||
|
const DEFAULT_SPACETIME_HEALTH_CHECK_TIMEOUT_SECONDS: u64 = 2;
|
||||||
pub(crate) const DEFAULT_VECTOR_ENGINE_IMAGE_REQUEST_TIMEOUT_MS: u64 = 1_000_000;
|
pub(crate) const DEFAULT_VECTOR_ENGINE_IMAGE_REQUEST_TIMEOUT_MS: u64 = 1_000_000;
|
||||||
|
|
||||||
// 集中管理 api-server 的启动配置,避免入口层直接散落环境变量解析逻辑。
|
// 集中管理 api-server 的启动配置,避免入口层直接散落环境变量解析逻辑。
|
||||||
@@ -118,6 +119,7 @@ pub struct AppConfig {
|
|||||||
pub spacetime_token: Option<String>,
|
pub spacetime_token: Option<String>,
|
||||||
pub spacetime_pool_size: u32,
|
pub spacetime_pool_size: u32,
|
||||||
pub spacetime_procedure_timeout: Duration,
|
pub spacetime_procedure_timeout: Duration,
|
||||||
|
pub spacetime_health_check_timeout: Duration,
|
||||||
pub llm_provider: LlmProvider,
|
pub llm_provider: LlmProvider,
|
||||||
pub llm_base_url: String,
|
pub llm_base_url: String,
|
||||||
pub llm_api_key: Option<String>,
|
pub llm_api_key: Option<String>,
|
||||||
@@ -276,6 +278,9 @@ impl Default for AppConfig {
|
|||||||
spacetime_token: None,
|
spacetime_token: None,
|
||||||
spacetime_pool_size: 4,
|
spacetime_pool_size: 4,
|
||||||
spacetime_procedure_timeout: Duration::from_secs(30),
|
spacetime_procedure_timeout: Duration::from_secs(30),
|
||||||
|
spacetime_health_check_timeout: Duration::from_secs(
|
||||||
|
DEFAULT_SPACETIME_HEALTH_CHECK_TIMEOUT_SECONDS,
|
||||||
|
),
|
||||||
llm_provider: LlmProvider::Ark,
|
llm_provider: LlmProvider::Ark,
|
||||||
llm_base_url: String::new(),
|
llm_base_url: String::new(),
|
||||||
llm_api_key: None,
|
llm_api_key: None,
|
||||||
@@ -704,6 +709,12 @@ impl AppConfig {
|
|||||||
config.spacetime_procedure_timeout =
|
config.spacetime_procedure_timeout =
|
||||||
Duration::from_secs(spacetime_procedure_timeout_seconds);
|
Duration::from_secs(spacetime_procedure_timeout_seconds);
|
||||||
}
|
}
|
||||||
|
if let Some(spacetime_health_check_timeout_seconds) =
|
||||||
|
read_first_duration_seconds_env(&["GENARRATIVE_SPACETIME_HEALTH_CHECK_TIMEOUT_SECONDS"])
|
||||||
|
{
|
||||||
|
config.spacetime_health_check_timeout =
|
||||||
|
Duration::from_secs(spacetime_health_check_timeout_seconds);
|
||||||
|
}
|
||||||
|
|
||||||
if let Some(llm_provider) =
|
if let Some(llm_provider) =
|
||||||
read_first_llm_provider_env(&["GENARRATIVE_LLM_PROVIDER", "LLM_PROVIDER"])
|
read_first_llm_provider_env(&["GENARRATIVE_LLM_PROVIDER", "LLM_PROVIDER"])
|
||||||
@@ -1610,6 +1621,26 @@ mod tests {
|
|||||||
}
|
}
|
||||||
}
|
}
|
||||||
|
|
||||||
|
#[test]
|
||||||
|
fn from_env_reads_spacetime_health_check_timeout() {
|
||||||
|
let _guard = ENV_LOCK
|
||||||
|
.get_or_init(|| Mutex::new(()))
|
||||||
|
.lock()
|
||||||
|
.expect("env lock should not poison");
|
||||||
|
|
||||||
|
unsafe {
|
||||||
|
std::env::remove_var("GENARRATIVE_SPACETIME_HEALTH_CHECK_TIMEOUT_SECONDS");
|
||||||
|
std::env::set_var("GENARRATIVE_SPACETIME_HEALTH_CHECK_TIMEOUT_SECONDS", "3");
|
||||||
|
}
|
||||||
|
|
||||||
|
let config = AppConfig::from_env();
|
||||||
|
assert_eq!(config.spacetime_health_check_timeout.as_secs(), 3);
|
||||||
|
|
||||||
|
unsafe {
|
||||||
|
std::env::remove_var("GENARRATIVE_SPACETIME_HEALTH_CHECK_TIMEOUT_SECONDS");
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
#[test]
|
#[test]
|
||||||
fn default_keeps_structured_llm_web_search_disabled() {
|
fn default_keeps_structured_llm_web_search_disabled() {
|
||||||
let config = AppConfig::default();
|
let config = AppConfig::default();
|
||||||
|
|||||||
@@ -10,6 +10,7 @@ use crate::{
|
|||||||
api_response::json_success_body, http_error::AppError, request_context::RequestContext,
|
api_response::json_success_body, http_error::AppError, request_context::RequestContext,
|
||||||
state::AppState,
|
state::AppState,
|
||||||
};
|
};
|
||||||
|
use spacetime_client::SpacetimeClientHealthSnapshot;
|
||||||
|
|
||||||
pub async fn health_check(Extension(request_context): Extension<RequestContext>) -> Json<Value> {
|
pub async fn health_check(Extension(request_context): Extension<RequestContext>) -> Json<Value> {
|
||||||
json_success_body(
|
json_success_body(
|
||||||
@@ -25,23 +26,49 @@ pub async fn readiness_check(
|
|||||||
State(state): State<AppState>,
|
State(state): State<AppState>,
|
||||||
Extension(request_context): Extension<RequestContext>,
|
Extension(request_context): Extension<RequestContext>,
|
||||||
) -> Response {
|
) -> Response {
|
||||||
if state.is_ready() {
|
if !state.is_ready() {
|
||||||
|
return AppError::from_status(StatusCode::SERVICE_UNAVAILABLE)
|
||||||
|
.with_message("api-server 正在退出,不再接收新流量")
|
||||||
|
.with_details(json!({
|
||||||
|
"reason": "api_server_draining",
|
||||||
|
"ready": false,
|
||||||
|
}))
|
||||||
|
.into_response_with_context(Some(&request_context));
|
||||||
|
}
|
||||||
|
|
||||||
|
let spacetime_health = state.spacetime_health_check().await;
|
||||||
|
if spacetime_health.ok {
|
||||||
return json_success_body(
|
return json_success_body(
|
||||||
Some(&request_context),
|
Some(&request_context),
|
||||||
json!({
|
json!({
|
||||||
"ok": true,
|
"ok": true,
|
||||||
"ready": true,
|
"ready": true,
|
||||||
"service": "genarrative-api-server",
|
"service": "genarrative-api-server",
|
||||||
|
"spacetime": spacetime_health_to_json(&spacetime_health),
|
||||||
}),
|
}),
|
||||||
)
|
)
|
||||||
.into_response();
|
.into_response();
|
||||||
}
|
}
|
||||||
|
|
||||||
AppError::from_status(StatusCode::SERVICE_UNAVAILABLE)
|
AppError::from_status(StatusCode::SERVICE_UNAVAILABLE)
|
||||||
.with_message("api-server 正在退出,不再接收新流量")
|
.with_message("SpacetimeDB 连接健康检查失败,api-server 暂不接收新流量")
|
||||||
.with_details(json!({
|
.with_details(json!({
|
||||||
"reason": "api_server_draining",
|
"reason": "spacetime_unhealthy",
|
||||||
"ready": false,
|
"ready": false,
|
||||||
|
"spacetime": spacetime_health_to_json(&spacetime_health),
|
||||||
}))
|
}))
|
||||||
.into_response_with_context(Some(&request_context))
|
.into_response_with_context(Some(&request_context))
|
||||||
}
|
}
|
||||||
|
|
||||||
|
fn spacetime_health_to_json(snapshot: &SpacetimeClientHealthSnapshot) -> Value {
|
||||||
|
json!({
|
||||||
|
"ok": snapshot.ok,
|
||||||
|
"stage": snapshot.stage.as_str(),
|
||||||
|
"checkedAtMicros": snapshot.checked_at_micros,
|
||||||
|
"elapsedMs": snapshot.elapsed_ms,
|
||||||
|
"timeoutMs": snapshot.timeout_ms,
|
||||||
|
"error": snapshot.error,
|
||||||
|
"lastSuccessAtMicros": snapshot.last_success_at_micros,
|
||||||
|
"lastError": snapshot.last_error,
|
||||||
|
})
|
||||||
|
}
|
||||||
|
|||||||
@@ -31,7 +31,9 @@ use platform_wechat::{WechatClient, WechatConfig, pay::WechatPayClient};
|
|||||||
use serde_json::Value;
|
use serde_json::Value;
|
||||||
use shared_contracts::creation_entry_config::CreationEntryConfigResponse;
|
use shared_contracts::creation_entry_config::CreationEntryConfigResponse;
|
||||||
use shared_contracts::creative_agent::CreativeAgentSessionSnapshot;
|
use shared_contracts::creative_agent::CreativeAgentSessionSnapshot;
|
||||||
use spacetime_client::{SpacetimeClient, SpacetimeClientConfig, SpacetimeClientError};
|
use spacetime_client::{
|
||||||
|
SpacetimeClient, SpacetimeClientConfig, SpacetimeClientError, SpacetimeClientHealthSnapshot,
|
||||||
|
};
|
||||||
use time::OffsetDateTime;
|
use time::OffsetDateTime;
|
||||||
use tokio::sync::{Semaphore, broadcast};
|
use tokio::sync::{Semaphore, broadcast};
|
||||||
use tracing::{info, warn};
|
use tracing::{info, warn};
|
||||||
@@ -242,6 +244,8 @@ pub struct AppStateInner {
|
|||||||
refresh_cookie_config: RefreshCookieConfig,
|
refresh_cookie_config: RefreshCookieConfig,
|
||||||
#[cfg(test)]
|
#[cfg(test)]
|
||||||
test_creation_entry_config: Arc<Mutex<Option<CreationEntryConfigResponse>>>,
|
test_creation_entry_config: Arc<Mutex<Option<CreationEntryConfigResponse>>>,
|
||||||
|
#[cfg(test)]
|
||||||
|
test_spacetime_health: Arc<Mutex<Option<SpacetimeClientHealthSnapshot>>>,
|
||||||
oss_client: Option<OssClient>,
|
oss_client: Option<OssClient>,
|
||||||
#[cfg_attr(test, allow(dead_code))]
|
#[cfg_attr(test, allow(dead_code))]
|
||||||
auth_store: InMemoryAuthStore,
|
auth_store: InMemoryAuthStore,
|
||||||
@@ -418,6 +422,10 @@ impl AppState {
|
|||||||
test_creation_entry_config: Arc::new(Mutex::new(Some(
|
test_creation_entry_config: Arc::new(Mutex::new(Some(
|
||||||
crate::creation_entry_config::test_creation_entry_config_response(),
|
crate::creation_entry_config::test_creation_entry_config_response(),
|
||||||
))),
|
))),
|
||||||
|
#[cfg(test)]
|
||||||
|
test_spacetime_health: Arc::new(Mutex::new(Some(
|
||||||
|
SpacetimeClientHealthSnapshot::healthy_for_test(),
|
||||||
|
))),
|
||||||
oss_client,
|
oss_client,
|
||||||
auth_store,
|
auth_store,
|
||||||
password_entry_service,
|
password_entry_service,
|
||||||
@@ -467,6 +475,30 @@ impl AppState {
|
|||||||
self.ready.store(false, Ordering::Release);
|
self.ready.store(false, Ordering::Release);
|
||||||
}
|
}
|
||||||
|
|
||||||
|
pub async fn spacetime_health_check(&self) -> SpacetimeClientHealthSnapshot {
|
||||||
|
#[cfg(test)]
|
||||||
|
if let Some(snapshot) = self
|
||||||
|
.test_spacetime_health
|
||||||
|
.lock()
|
||||||
|
.expect("test spacetime health should lock")
|
||||||
|
.clone()
|
||||||
|
{
|
||||||
|
return snapshot;
|
||||||
|
}
|
||||||
|
|
||||||
|
self.spacetime_client
|
||||||
|
.health_check(self.config.spacetime_health_check_timeout)
|
||||||
|
.await
|
||||||
|
}
|
||||||
|
|
||||||
|
#[cfg(test)]
|
||||||
|
pub(crate) fn set_test_spacetime_health(&self, snapshot: SpacetimeClientHealthSnapshot) {
|
||||||
|
*self
|
||||||
|
.test_spacetime_health
|
||||||
|
.lock()
|
||||||
|
.expect("test spacetime health should lock") = Some(snapshot);
|
||||||
|
}
|
||||||
|
|
||||||
pub async fn upsert_creation_entry_type_config(
|
pub async fn upsert_creation_entry_type_config(
|
||||||
&self,
|
&self,
|
||||||
input: module_runtime::CreationEntryTypeAdminUpsertInput,
|
input: module_runtime::CreationEntryTypeAdminUpsertInput,
|
||||||
|
|||||||
@@ -105,8 +105,8 @@ pub mod auth;
|
|||||||
pub mod bark_battle;
|
pub mod bark_battle;
|
||||||
pub use bark_battle::{
|
pub use bark_battle::{
|
||||||
BarkBattleDraftConfigUpsertRecordInput, BarkBattleDraftCreateRecordInput,
|
BarkBattleDraftConfigUpsertRecordInput, BarkBattleDraftCreateRecordInput,
|
||||||
BarkBattleRunFinishRecordInput, BarkBattleRunStartRecordInput,
|
BarkBattleRunFinishRecordInput, BarkBattleRunStartRecordInput, BarkBattleWorkDeleteRecordInput,
|
||||||
BarkBattleWorkDeleteRecordInput, BarkBattleWorkPublishRecordInput,
|
BarkBattleWorkPublishRecordInput,
|
||||||
};
|
};
|
||||||
pub mod big_fish;
|
pub mod big_fish;
|
||||||
pub mod combat;
|
pub mod combat;
|
||||||
@@ -132,7 +132,7 @@ use std::{
|
|||||||
sync::atomic::{AtomicBool, Ordering},
|
sync::atomic::{AtomicBool, Ordering},
|
||||||
sync::{Arc, Mutex},
|
sync::{Arc, Mutex},
|
||||||
thread::JoinHandle,
|
thread::JoinHandle,
|
||||||
time::Duration,
|
time::{Duration, Instant},
|
||||||
};
|
};
|
||||||
|
|
||||||
use module_ai::{
|
use module_ai::{
|
||||||
@@ -241,6 +241,7 @@ use tokio::{
|
|||||||
sync::{OwnedSemaphorePermit, RwLock, Semaphore, oneshot},
|
sync::{OwnedSemaphorePermit, RwLock, Semaphore, oneshot},
|
||||||
time::timeout,
|
time::timeout,
|
||||||
};
|
};
|
||||||
|
use tracing::warn;
|
||||||
|
|
||||||
use crate::module_bindings::*;
|
use crate::module_bindings::*;
|
||||||
|
|
||||||
@@ -253,6 +254,60 @@ pub struct SpacetimeClientConfig {
|
|||||||
pub procedure_timeout: Duration,
|
pub procedure_timeout: Duration,
|
||||||
}
|
}
|
||||||
|
|
||||||
|
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
|
||||||
|
pub enum SpacetimeClientStage {
|
||||||
|
Ready,
|
||||||
|
PoolAcquire,
|
||||||
|
ConnectBuild,
|
||||||
|
ConnectHandshake,
|
||||||
|
ReadModelSubscribe,
|
||||||
|
ProcedureResult,
|
||||||
|
ReducerResult,
|
||||||
|
ReadCache,
|
||||||
|
}
|
||||||
|
|
||||||
|
impl SpacetimeClientStage {
|
||||||
|
pub fn as_str(self) -> &'static str {
|
||||||
|
match self {
|
||||||
|
Self::Ready => "ready",
|
||||||
|
Self::PoolAcquire => "pool_acquire",
|
||||||
|
Self::ConnectBuild => "connect_build",
|
||||||
|
Self::ConnectHandshake => "connect_handshake",
|
||||||
|
Self::ReadModelSubscribe => "read_model_subscribe",
|
||||||
|
Self::ProcedureResult => "procedure_result",
|
||||||
|
Self::ReducerResult => "reducer_result",
|
||||||
|
Self::ReadCache => "read_cache",
|
||||||
|
}
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
#[derive(Clone, Debug, PartialEq, Eq)]
|
||||||
|
pub struct SpacetimeClientHealthSnapshot {
|
||||||
|
pub ok: bool,
|
||||||
|
pub stage: SpacetimeClientStage,
|
||||||
|
pub checked_at_micros: i64,
|
||||||
|
pub elapsed_ms: u64,
|
||||||
|
pub timeout_ms: u64,
|
||||||
|
pub error: Option<String>,
|
||||||
|
pub last_success_at_micros: Option<i64>,
|
||||||
|
pub last_error: Option<String>,
|
||||||
|
}
|
||||||
|
|
||||||
|
impl SpacetimeClientHealthSnapshot {
|
||||||
|
pub fn healthy_for_test() -> Self {
|
||||||
|
Self {
|
||||||
|
ok: true,
|
||||||
|
stage: SpacetimeClientStage::Ready,
|
||||||
|
checked_at_micros: current_unix_micros(),
|
||||||
|
elapsed_ms: 0,
|
||||||
|
timeout_ms: 0,
|
||||||
|
error: None,
|
||||||
|
last_success_at_micros: Some(current_unix_micros()),
|
||||||
|
last_error: None,
|
||||||
|
}
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
#[derive(Clone, Debug, PartialEq, Eq)]
|
#[derive(Clone, Debug, PartialEq, Eq)]
|
||||||
pub struct AuthStoreSnapshotRecord {
|
pub struct AuthStoreSnapshotRecord {
|
||||||
pub snapshot_json: Option<String>,
|
pub snapshot_json: Option<String>,
|
||||||
@@ -270,6 +325,7 @@ pub struct AuthStoreSnapshotImportRecord {
|
|||||||
pub struct SpacetimeClient {
|
pub struct SpacetimeClient {
|
||||||
config: SpacetimeClientConfig,
|
config: SpacetimeClientConfig,
|
||||||
pool: Arc<SpacetimeConnectionPool>,
|
pool: Arc<SpacetimeConnectionPool>,
|
||||||
|
health_state: Arc<RwLock<SpacetimeClientHealthState>>,
|
||||||
creation_entry_config_cache: Arc<RwLock<Option<CreationEntryConfigRecord>>>,
|
creation_entry_config_cache: Arc<RwLock<Option<CreationEntryConfigRecord>>>,
|
||||||
custom_world_gallery_legacy_sync_attempted: Arc<AtomicBool>,
|
custom_world_gallery_legacy_sync_attempted: Arc<AtomicBool>,
|
||||||
}
|
}
|
||||||
@@ -296,6 +352,24 @@ struct SpacetimeConnectionPool {
|
|||||||
permits: Arc<Semaphore>,
|
permits: Arc<Semaphore>,
|
||||||
}
|
}
|
||||||
|
|
||||||
|
#[derive(Debug, Default)]
|
||||||
|
struct SpacetimeClientHealthState {
|
||||||
|
last_success_at_micros: Option<i64>,
|
||||||
|
last_error: Option<String>,
|
||||||
|
}
|
||||||
|
|
||||||
|
#[derive(Debug)]
|
||||||
|
struct SpacetimeStageError {
|
||||||
|
stage: SpacetimeClientStage,
|
||||||
|
error: SpacetimeClientError,
|
||||||
|
}
|
||||||
|
|
||||||
|
impl SpacetimeStageError {
|
||||||
|
fn new(stage: SpacetimeClientStage, error: SpacetimeClientError) -> Self {
|
||||||
|
Self { stage, error }
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
struct PooledConnectionSlot {
|
struct PooledConnectionSlot {
|
||||||
connection: Option<PooledConnection>,
|
connection: Option<PooledConnection>,
|
||||||
in_use: bool,
|
in_use: bool,
|
||||||
@@ -341,6 +415,7 @@ impl SpacetimeClient {
|
|||||||
Self {
|
Self {
|
||||||
config,
|
config,
|
||||||
pool,
|
pool,
|
||||||
|
health_state: Arc::new(RwLock::new(SpacetimeClientHealthState::default())),
|
||||||
creation_entry_config_cache: Arc::new(RwLock::new(None)),
|
creation_entry_config_cache: Arc::new(RwLock::new(None)),
|
||||||
custom_world_gallery_legacy_sync_attempted: Arc::new(AtomicBool::new(false)),
|
custom_world_gallery_legacy_sync_attempted: Arc::new(AtomicBool::new(false)),
|
||||||
}
|
}
|
||||||
@@ -354,29 +429,58 @@ impl SpacetimeClient {
|
|||||||
where
|
where
|
||||||
T: Send + 'static,
|
T: Send + 'static,
|
||||||
{
|
{
|
||||||
|
let started_at = Instant::now();
|
||||||
let metrics_guard = telemetry::begin_procedure(procedure);
|
let metrics_guard = telemetry::begin_procedure(procedure);
|
||||||
let (sender, receiver) = oneshot::channel();
|
let (sender, receiver) = oneshot::channel();
|
||||||
let result_sender = Arc::new(Mutex::new(Some(sender)));
|
let result_sender = Arc::new(Mutex::new(Some(sender)));
|
||||||
let final_result = match self.acquire_connection().await {
|
let final_result = match self
|
||||||
|
.acquire_connection_with_timeout(self.config.procedure_timeout)
|
||||||
|
.await
|
||||||
|
{
|
||||||
Ok(lease) => {
|
Ok(lease) => {
|
||||||
let result = if let Some(connection) = lease.connection.as_ref() {
|
let (result, failed_stage) = if let Some(connection) = lease.connection.as_ref() {
|
||||||
call(&connection.connection, result_sender.clone());
|
call(&connection.connection, result_sender.clone());
|
||||||
match timeout(self.config.procedure_timeout, receiver).await {
|
let stage = SpacetimeClientStage::ProcedureResult;
|
||||||
Ok(inner) => match inner {
|
(
|
||||||
Ok(value) => value,
|
match timeout(self.config.procedure_timeout, receiver).await {
|
||||||
Err(_) => Err(SpacetimeClientError::ConnectDropped),
|
Ok(inner) => match inner {
|
||||||
|
Ok(value) => value,
|
||||||
|
Err(_) => Err(SpacetimeClientError::ConnectDropped),
|
||||||
|
},
|
||||||
|
Err(_) => Err(Self::resolve_timeout_error(Some(connection), stage)),
|
||||||
},
|
},
|
||||||
Err(_) => Err(Self::resolve_timeout_error(Some(connection))),
|
stage,
|
||||||
}
|
)
|
||||||
} else {
|
} else {
|
||||||
Err(SpacetimeClientError::Runtime(
|
(
|
||||||
"SpacetimeDB 连接租约缺少连接".to_string(),
|
Err(SpacetimeClientError::Runtime(
|
||||||
))
|
"SpacetimeDB 连接租约缺少连接".to_string(),
|
||||||
|
)),
|
||||||
|
SpacetimeClientStage::ProcedureResult,
|
||||||
|
)
|
||||||
};
|
};
|
||||||
self.release_connection(lease).await;
|
self.release_connection(lease).await;
|
||||||
|
if let Err(error) = &result {
|
||||||
|
log_spacetime_client_failure(
|
||||||
|
"procedure",
|
||||||
|
procedure,
|
||||||
|
failed_stage,
|
||||||
|
started_at,
|
||||||
|
error,
|
||||||
|
);
|
||||||
|
}
|
||||||
result
|
result
|
||||||
}
|
}
|
||||||
Err(error) => Err(error),
|
Err(error) => {
|
||||||
|
log_spacetime_client_failure(
|
||||||
|
"procedure",
|
||||||
|
procedure,
|
||||||
|
error.stage,
|
||||||
|
started_at,
|
||||||
|
&error.error,
|
||||||
|
);
|
||||||
|
Err(error.error)
|
||||||
|
}
|
||||||
};
|
};
|
||||||
|
|
||||||
metrics_guard.finish(&final_result);
|
metrics_guard.finish(&final_result);
|
||||||
@@ -388,29 +492,58 @@ impl SpacetimeClient {
|
|||||||
procedure: &'static str,
|
procedure: &'static str,
|
||||||
call: impl FnOnce(&DbConnection, ReducerResultSender) + Send + 'static,
|
call: impl FnOnce(&DbConnection, ReducerResultSender) + Send + 'static,
|
||||||
) -> Result<(), SpacetimeClientError> {
|
) -> Result<(), SpacetimeClientError> {
|
||||||
|
let started_at = Instant::now();
|
||||||
let metrics_guard = telemetry::begin_procedure(procedure);
|
let metrics_guard = telemetry::begin_procedure(procedure);
|
||||||
let (sender, receiver) = oneshot::channel();
|
let (sender, receiver) = oneshot::channel();
|
||||||
let result_sender = Arc::new(Mutex::new(Some(sender)));
|
let result_sender = Arc::new(Mutex::new(Some(sender)));
|
||||||
let final_result = match self.acquire_connection().await {
|
let final_result = match self
|
||||||
|
.acquire_connection_with_timeout(self.config.procedure_timeout)
|
||||||
|
.await
|
||||||
|
{
|
||||||
Ok(lease) => {
|
Ok(lease) => {
|
||||||
let result = if let Some(connection) = lease.connection.as_ref() {
|
let (result, failed_stage) = if let Some(connection) = lease.connection.as_ref() {
|
||||||
call(&connection.connection, result_sender.clone());
|
call(&connection.connection, result_sender.clone());
|
||||||
match timeout(self.config.procedure_timeout, receiver).await {
|
let stage = SpacetimeClientStage::ReducerResult;
|
||||||
Ok(inner) => match inner {
|
(
|
||||||
Ok(value) => value,
|
match timeout(self.config.procedure_timeout, receiver).await {
|
||||||
Err(_) => Err(SpacetimeClientError::ConnectDropped),
|
Ok(inner) => match inner {
|
||||||
|
Ok(value) => value,
|
||||||
|
Err(_) => Err(SpacetimeClientError::ConnectDropped),
|
||||||
|
},
|
||||||
|
Err(_) => Err(Self::resolve_timeout_error(Some(connection), stage)),
|
||||||
},
|
},
|
||||||
Err(_) => Err(Self::resolve_timeout_error(Some(connection))),
|
stage,
|
||||||
}
|
)
|
||||||
} else {
|
} else {
|
||||||
Err(SpacetimeClientError::Runtime(
|
(
|
||||||
"SpacetimeDB 连接租约缺少连接".to_string(),
|
Err(SpacetimeClientError::Runtime(
|
||||||
))
|
"SpacetimeDB 连接租约缺少连接".to_string(),
|
||||||
|
)),
|
||||||
|
SpacetimeClientStage::ReducerResult,
|
||||||
|
)
|
||||||
};
|
};
|
||||||
self.release_connection(lease).await;
|
self.release_connection(lease).await;
|
||||||
|
if let Err(error) = &result {
|
||||||
|
log_spacetime_client_failure(
|
||||||
|
"reducer",
|
||||||
|
procedure,
|
||||||
|
failed_stage,
|
||||||
|
started_at,
|
||||||
|
error,
|
||||||
|
);
|
||||||
|
}
|
||||||
result
|
result
|
||||||
}
|
}
|
||||||
Err(error) => Err(error),
|
Err(error) => {
|
||||||
|
log_spacetime_client_failure(
|
||||||
|
"reducer",
|
||||||
|
procedure,
|
||||||
|
error.stage,
|
||||||
|
started_at,
|
||||||
|
&error.error,
|
||||||
|
);
|
||||||
|
Err(error.error)
|
||||||
|
}
|
||||||
};
|
};
|
||||||
|
|
||||||
metrics_guard.finish(&final_result);
|
metrics_guard.finish(&final_result);
|
||||||
@@ -425,11 +558,22 @@ impl SpacetimeClient {
|
|||||||
where
|
where
|
||||||
T: Send + 'static,
|
T: Send + 'static,
|
||||||
{
|
{
|
||||||
|
let started_at = Instant::now();
|
||||||
let metrics_guard = telemetry::begin_read(read_name);
|
let metrics_guard = telemetry::begin_read(read_name);
|
||||||
let lease = match self.acquire_connection().await {
|
let lease = match self
|
||||||
|
.acquire_connection_with_timeout(self.config.procedure_timeout)
|
||||||
|
.await
|
||||||
|
{
|
||||||
Ok(lease) => lease,
|
Ok(lease) => lease,
|
||||||
Err(error) => {
|
Err(error) => {
|
||||||
let final_result = Err(error);
|
log_spacetime_client_failure(
|
||||||
|
"read",
|
||||||
|
read_name,
|
||||||
|
error.stage,
|
||||||
|
started_at,
|
||||||
|
&error.error,
|
||||||
|
);
|
||||||
|
let final_result = Err(error.error);
|
||||||
metrics_guard.finish(&final_result);
|
metrics_guard.finish(&final_result);
|
||||||
return final_result;
|
return final_result;
|
||||||
}
|
}
|
||||||
@@ -443,6 +587,15 @@ impl SpacetimeClient {
|
|||||||
};
|
};
|
||||||
self.release_connection(lease).await;
|
self.release_connection(lease).await;
|
||||||
|
|
||||||
|
if let Err(error) = &final_result {
|
||||||
|
log_spacetime_client_failure(
|
||||||
|
"read",
|
||||||
|
read_name,
|
||||||
|
SpacetimeClientStage::ReadCache,
|
||||||
|
started_at,
|
||||||
|
error,
|
||||||
|
);
|
||||||
|
}
|
||||||
metrics_guard.finish(&final_result);
|
metrics_guard.finish(&final_result);
|
||||||
final_result
|
final_result
|
||||||
}
|
}
|
||||||
@@ -455,14 +608,75 @@ impl SpacetimeClient {
|
|||||||
self.creation_entry_config_cache.read().await.clone()
|
self.creation_entry_config_cache.read().await.clone()
|
||||||
}
|
}
|
||||||
|
|
||||||
async fn acquire_connection(&self) -> Result<PooledConnectionLease, SpacetimeClientError> {
|
pub async fn health_check(&self, probe_timeout: Duration) -> SpacetimeClientHealthSnapshot {
|
||||||
let permit = timeout(
|
let timeout = if probe_timeout.is_zero() {
|
||||||
self.config.procedure_timeout,
|
DEFAULT_PROCEDURE_TIMEOUT
|
||||||
self.pool.permits.clone().acquire_owned(),
|
} else {
|
||||||
)
|
probe_timeout
|
||||||
.await
|
};
|
||||||
.map_err(|_| SpacetimeClientError::Timeout)?
|
let started_at = Instant::now();
|
||||||
.map_err(|error| SpacetimeClientError::Runtime(error.to_string()))?;
|
let checked_at_micros = current_unix_micros();
|
||||||
|
let result = self.acquire_connection_with_timeout(timeout).await;
|
||||||
|
match result {
|
||||||
|
Ok(lease) => {
|
||||||
|
self.release_connection(lease).await;
|
||||||
|
let mut health_state = self.health_state.write().await;
|
||||||
|
health_state.last_success_at_micros = Some(checked_at_micros);
|
||||||
|
health_state.last_error = None;
|
||||||
|
SpacetimeClientHealthSnapshot {
|
||||||
|
ok: true,
|
||||||
|
stage: SpacetimeClientStage::Ready,
|
||||||
|
checked_at_micros,
|
||||||
|
elapsed_ms: duration_millis_u64(started_at.elapsed()),
|
||||||
|
timeout_ms: duration_millis_u64(timeout),
|
||||||
|
error: None,
|
||||||
|
last_success_at_micros: health_state.last_success_at_micros,
|
||||||
|
last_error: health_state.last_error.clone(),
|
||||||
|
}
|
||||||
|
}
|
||||||
|
Err(error) => {
|
||||||
|
log_spacetime_client_failure(
|
||||||
|
"health_check",
|
||||||
|
"spacetime_connection",
|
||||||
|
error.stage,
|
||||||
|
started_at,
|
||||||
|
&error.error,
|
||||||
|
);
|
||||||
|
let mut health_state = self.health_state.write().await;
|
||||||
|
let error_message = error.error.to_string();
|
||||||
|
health_state.last_error = Some(error_message.clone());
|
||||||
|
SpacetimeClientHealthSnapshot {
|
||||||
|
ok: false,
|
||||||
|
stage: error.stage,
|
||||||
|
checked_at_micros,
|
||||||
|
elapsed_ms: duration_millis_u64(started_at.elapsed()),
|
||||||
|
timeout_ms: duration_millis_u64(timeout),
|
||||||
|
error: Some(error_message),
|
||||||
|
last_success_at_micros: health_state.last_success_at_micros,
|
||||||
|
last_error: health_state.last_error.clone(),
|
||||||
|
}
|
||||||
|
}
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
async fn acquire_connection_with_timeout(
|
||||||
|
&self,
|
||||||
|
operation_timeout: Duration,
|
||||||
|
) -> Result<PooledConnectionLease, SpacetimeStageError> {
|
||||||
|
let permit = timeout(operation_timeout, self.pool.permits.clone().acquire_owned())
|
||||||
|
.await
|
||||||
|
.map_err(|_| {
|
||||||
|
SpacetimeStageError::new(
|
||||||
|
SpacetimeClientStage::PoolAcquire,
|
||||||
|
SpacetimeClientError::Timeout,
|
||||||
|
)
|
||||||
|
})?
|
||||||
|
.map_err(|error| {
|
||||||
|
SpacetimeStageError::new(
|
||||||
|
SpacetimeClientStage::PoolAcquire,
|
||||||
|
SpacetimeClientError::Runtime(error.to_string()),
|
||||||
|
)
|
||||||
|
})?;
|
||||||
|
|
||||||
loop {
|
loop {
|
||||||
for (slot_index, slot) in self.pool.slots.iter().enumerate() {
|
for (slot_index, slot) in self.pool.slots.iter().enumerate() {
|
||||||
@@ -480,7 +694,7 @@ impl SpacetimeClient {
|
|||||||
let connection = if let Some(connection) = reusable_connection {
|
let connection = if let Some(connection) = reusable_connection {
|
||||||
connection
|
connection
|
||||||
} else {
|
} else {
|
||||||
match self.build_pooled_connection().await {
|
match self.build_pooled_connection(operation_timeout).await {
|
||||||
Ok(connection) => connection,
|
Ok(connection) => connection,
|
||||||
Err(error) => {
|
Err(error) => {
|
||||||
let mut slot_guard = self.pool.slots[slot_index].lock().await;
|
let mut slot_guard = self.pool.slots[slot_index].lock().await;
|
||||||
@@ -502,7 +716,10 @@ impl SpacetimeClient {
|
|||||||
}
|
}
|
||||||
}
|
}
|
||||||
|
|
||||||
async fn build_pooled_connection(&self) -> Result<PooledConnection, SpacetimeClientError> {
|
async fn build_pooled_connection(
|
||||||
|
&self,
|
||||||
|
operation_timeout: Duration,
|
||||||
|
) -> Result<PooledConnection, SpacetimeStageError> {
|
||||||
let config = self.config.clone();
|
let config = self.config.clone();
|
||||||
let broken = Arc::new(AtomicBool::new(false));
|
let broken = Arc::new(AtomicBool::new(false));
|
||||||
let (sender, receiver) = oneshot::channel::<Result<(), SpacetimeClientError>>();
|
let (sender, receiver) = oneshot::channel::<Result<(), SpacetimeClientError>>();
|
||||||
@@ -510,7 +727,7 @@ impl SpacetimeClient {
|
|||||||
let broken_flag = broken.clone();
|
let broken_flag = broken.clone();
|
||||||
let disconnect_sender = connect_sender.clone();
|
let disconnect_sender = connect_sender.clone();
|
||||||
let connection = timeout(
|
let connection = timeout(
|
||||||
self.config.procedure_timeout,
|
operation_timeout,
|
||||||
tokio::task::spawn_blocking(move || {
|
tokio::task::spawn_blocking(move || {
|
||||||
DbConnection::builder()
|
DbConnection::builder()
|
||||||
.with_uri(config.server_url)
|
.with_uri(config.server_url)
|
||||||
@@ -534,17 +751,41 @@ impl SpacetimeClient {
|
|||||||
}),
|
}),
|
||||||
)
|
)
|
||||||
.await
|
.await
|
||||||
.map_err(|_| SpacetimeClientError::Timeout)?
|
.map_err(|_| {
|
||||||
.map_err(|error| SpacetimeClientError::Runtime(error.to_string()))??;
|
SpacetimeStageError::new(
|
||||||
|
SpacetimeClientStage::ConnectBuild,
|
||||||
|
SpacetimeClientError::Timeout,
|
||||||
|
)
|
||||||
|
})?
|
||||||
|
.map_err(|error| {
|
||||||
|
SpacetimeStageError::new(
|
||||||
|
SpacetimeClientStage::ConnectBuild,
|
||||||
|
SpacetimeClientError::Runtime(error.to_string()),
|
||||||
|
)
|
||||||
|
})?
|
||||||
|
.map_err(|error| SpacetimeStageError::new(SpacetimeClientStage::ConnectBuild, error))?;
|
||||||
|
|
||||||
let runner = connection.run_threaded();
|
let runner = connection.run_threaded();
|
||||||
timeout(self.config.procedure_timeout, receiver)
|
timeout(operation_timeout, receiver)
|
||||||
.await
|
.await
|
||||||
.map_err(|_| SpacetimeClientError::Timeout)?
|
.map_err(|_| {
|
||||||
.map_err(|_| SpacetimeClientError::ConnectDropped)??;
|
SpacetimeStageError::new(
|
||||||
|
SpacetimeClientStage::ConnectHandshake,
|
||||||
|
SpacetimeClientError::Timeout,
|
||||||
|
)
|
||||||
|
})?
|
||||||
|
.map_err(|_| {
|
||||||
|
SpacetimeStageError::new(
|
||||||
|
SpacetimeClientStage::ConnectHandshake,
|
||||||
|
SpacetimeClientError::ConnectDropped,
|
||||||
|
)
|
||||||
|
})?
|
||||||
|
.map_err(|error| {
|
||||||
|
SpacetimeStageError::new(SpacetimeClientStage::ConnectHandshake, error)
|
||||||
|
})?;
|
||||||
|
|
||||||
let read_model_subscriptions = self
|
let read_model_subscriptions = self
|
||||||
.subscribe_cached_read_models(&connection, broken.clone())
|
.subscribe_cached_read_models(&connection, broken.clone(), operation_timeout)
|
||||||
.await?;
|
.await?;
|
||||||
|
|
||||||
Ok(PooledConnection {
|
Ok(PooledConnection {
|
||||||
@@ -559,7 +800,8 @@ impl SpacetimeClient {
|
|||||||
&self,
|
&self,
|
||||||
connection: &DbConnection,
|
connection: &DbConnection,
|
||||||
broken: Arc<AtomicBool>,
|
broken: Arc<AtomicBool>,
|
||||||
) -> Result<Vec<SubscriptionHandle>, SpacetimeClientError> {
|
operation_timeout: Duration,
|
||||||
|
) -> Result<Vec<SubscriptionHandle>, SpacetimeStageError> {
|
||||||
let mut subscriptions = Vec::new();
|
let mut subscriptions = Vec::new();
|
||||||
for query in [
|
for query in [
|
||||||
"SELECT * FROM public_work_gallery_entry",
|
"SELECT * FROM public_work_gallery_entry",
|
||||||
@@ -576,7 +818,13 @@ impl SpacetimeClient {
|
|||||||
"SELECT * FROM big_fish_gallery_view",
|
"SELECT * FROM big_fish_gallery_view",
|
||||||
] {
|
] {
|
||||||
let subscription = self
|
let subscription = self
|
||||||
.subscribe_cached_read_model_query(connection, broken.clone(), query, true)
|
.subscribe_cached_read_model_query(
|
||||||
|
connection,
|
||||||
|
broken.clone(),
|
||||||
|
query,
|
||||||
|
true,
|
||||||
|
operation_timeout,
|
||||||
|
)
|
||||||
.await?;
|
.await?;
|
||||||
subscriptions.push(subscription);
|
subscriptions.push(subscription);
|
||||||
}
|
}
|
||||||
@@ -597,7 +845,13 @@ impl SpacetimeClient {
|
|||||||
"SELECT * FROM asset_object",
|
"SELECT * FROM asset_object",
|
||||||
] {
|
] {
|
||||||
if let Ok(subscription) = self
|
if let Ok(subscription) = self
|
||||||
.subscribe_cached_read_model_query(connection, broken.clone(), query, false)
|
.subscribe_cached_read_model_query(
|
||||||
|
connection,
|
||||||
|
broken.clone(),
|
||||||
|
query,
|
||||||
|
false,
|
||||||
|
operation_timeout,
|
||||||
|
)
|
||||||
.await
|
.await
|
||||||
{
|
{
|
||||||
subscriptions.push(subscription);
|
subscriptions.push(subscription);
|
||||||
@@ -613,7 +867,8 @@ impl SpacetimeClient {
|
|||||||
broken: Arc<AtomicBool>,
|
broken: Arc<AtomicBool>,
|
||||||
query: &'static str,
|
query: &'static str,
|
||||||
mark_broken_on_error: bool,
|
mark_broken_on_error: bool,
|
||||||
) -> Result<SubscriptionHandle, SpacetimeClientError> {
|
operation_timeout: Duration,
|
||||||
|
) -> Result<SubscriptionHandle, SpacetimeStageError> {
|
||||||
let (sender, receiver) = oneshot::channel::<Result<(), SpacetimeClientError>>();
|
let (sender, receiver) = oneshot::channel::<Result<(), SpacetimeClientError>>();
|
||||||
let applied_sender = Arc::new(Mutex::new(Some(sender)));
|
let applied_sender = Arc::new(Mutex::new(Some(sender)));
|
||||||
let on_applied_sender = applied_sender.clone();
|
let on_applied_sender = applied_sender.clone();
|
||||||
@@ -635,10 +890,23 @@ impl SpacetimeClient {
|
|||||||
})
|
})
|
||||||
.subscribe(query);
|
.subscribe(query);
|
||||||
|
|
||||||
timeout(self.config.procedure_timeout, receiver)
|
timeout(operation_timeout, receiver)
|
||||||
.await
|
.await
|
||||||
.map_err(|_| SpacetimeClientError::Timeout)?
|
.map_err(|_| {
|
||||||
.map_err(|_| SpacetimeClientError::ConnectDropped)??;
|
SpacetimeStageError::new(
|
||||||
|
SpacetimeClientStage::ReadModelSubscribe,
|
||||||
|
SpacetimeClientError::Timeout,
|
||||||
|
)
|
||||||
|
})?
|
||||||
|
.map_err(|_| {
|
||||||
|
SpacetimeStageError::new(
|
||||||
|
SpacetimeClientStage::ReadModelSubscribe,
|
||||||
|
SpacetimeClientError::ConnectDropped,
|
||||||
|
)
|
||||||
|
})?
|
||||||
|
.map_err(|error| {
|
||||||
|
SpacetimeStageError::new(SpacetimeClientStage::ReadModelSubscribe, error)
|
||||||
|
})?;
|
||||||
|
|
||||||
Ok(subscription)
|
Ok(subscription)
|
||||||
}
|
}
|
||||||
@@ -658,7 +926,10 @@ impl SpacetimeClient {
|
|||||||
}
|
}
|
||||||
|
|
||||||
// 超时后必须统一归还租约;若连接已先一步断开则回传断线,否则标记坏连接并回传超时。
|
// 超时后必须统一归还租约;若连接已先一步断开则回传断线,否则标记坏连接并回传超时。
|
||||||
fn resolve_timeout_error(connection: Option<&PooledConnection>) -> SpacetimeClientError {
|
fn resolve_timeout_error(
|
||||||
|
connection: Option<&PooledConnection>,
|
||||||
|
_stage: SpacetimeClientStage,
|
||||||
|
) -> SpacetimeClientError {
|
||||||
if let Some(connection) = connection {
|
if let Some(connection) = connection {
|
||||||
if connection.is_broken() {
|
if connection.is_broken() {
|
||||||
return SpacetimeClientError::ConnectDropped;
|
return SpacetimeClientError::ConnectDropped;
|
||||||
@@ -681,6 +952,27 @@ fn current_public_work_day() -> i64 {
|
|||||||
current_unix_micros().div_euclid(PUBLIC_WORK_PLAY_DAY_MICROS)
|
current_unix_micros().div_euclid(PUBLIC_WORK_PLAY_DAY_MICROS)
|
||||||
}
|
}
|
||||||
|
|
||||||
|
fn duration_millis_u64(duration: Duration) -> u64 {
|
||||||
|
duration.as_millis().min(u64::MAX as u128) as u64
|
||||||
|
}
|
||||||
|
|
||||||
|
fn log_spacetime_client_failure(
|
||||||
|
operation_kind: &'static str,
|
||||||
|
operation_name: &'static str,
|
||||||
|
stage: SpacetimeClientStage,
|
||||||
|
started_at: Instant,
|
||||||
|
error: &SpacetimeClientError,
|
||||||
|
) {
|
||||||
|
warn!(
|
||||||
|
operation_kind,
|
||||||
|
operation_name,
|
||||||
|
spacetime_stage = stage.as_str(),
|
||||||
|
elapsed_ms = duration_millis_u64(started_at.elapsed()),
|
||||||
|
error = %error,
|
||||||
|
"SpacetimeDB client operation failed"
|
||||||
|
);
|
||||||
|
}
|
||||||
|
|
||||||
fn public_work_recent_play_counts(
|
fn public_work_recent_play_counts(
|
||||||
connection: &DbConnection,
|
connection: &DbConnection,
|
||||||
source_type: &str,
|
source_type: &str,
|
||||||
|
|||||||
Reference in New Issue
Block a user