From 0baad9e02267a51379fcb6fd83255e6a9546cf28 Mon Sep 17 00:00:00 2001
From: kdletters
Date: Wed, 10 Jun 2026 10:42:33 +0800
Subject: [PATCH 1/9] Record the dev Gitea intranet access entry point
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Add the dev Gitea intranet Git URL and verification command to the ops doc
Sync the intranet entry point and access restrictions into the Hermes decision log
---
 .hermes/shared-memory/decision-log.md | 8 ++++++++
 docs/【开发运维】本地开发验证与生产运维-2026-05-15.md | 2 ++
 2 files changed, 10 insertions(+)

diff --git a/.hermes/shared-memory/decision-log.md b/.hermes/shared-memory/decision-log.md
index f8799da4..1f77557c 100644
--- a/.hermes/shared-memory/decision-log.md
+++ b/.hermes/shared-memory/decision-log.md
@@ -16,6 +16,14 @@
 
 ---
 
+## 2026-06-10 dev Gitea 提供内网 HTTP 入口
+
+- 背景:release / dev 目标 agent 需要从 dev 自托管 Gitea 拉取仓库;继续走 `https://git.genarrative.world/...` 会绕公网链路,`10.2.0.10:3000` 又受云侧端口策略影响不能作为稳定入口。
+- 决策:dev 上 Gitea 进程保持 `HTTP_ADDR = 127.0.0.1`、`HTTP_PORT = 3000`,公网 `ROOT_URL = https://git.genarrative.world/` 不变;新增 Nginx 内网 vhost `/etc/nginx/conf.d/gitea-internal.conf`,只允许 `10.2.0.0/16` 与本机访问,并把 `http://10.2.0.10/` 反代到本机 Gitea。内网 agent 统一使用 `http://10.2.0.10/GenarrativeAI/Genarrative.git` 作为可直连 Git 源。
+- 影响范围:dev Gitea / Nginx 运维配置、Jenkins `SOURCE_GIT_REMOTE_URL`、release / dev 目标 agent checkout 口径。
+- 验证方式:从 release 执行 `git ls-remote http://10.2.0.10/GenarrativeAI/Genarrative.git HEAD` 应返回 HEAD;公网来源伪造 `Host: 10.2.0.10` 访问 dev 公网 80 应返回 `403`;`https://git.genarrative.world/` 原入口应保持 `200`。
+- 关联文档:`docs/【开发运维】本地开发验证与生产运维-2026-05-15.md`。
+
 ## 2026-06-08 微信能力按领域收口
 
 - 背景:微信登录、订阅消息、普通微信支付和小程序虚拟支付能力曾分散在 `api-server` 根模块、`platform-auth` 与 `platform-wechat`,支付协议细节和业务 handler 边界不够清晰。
diff --git a/docs/【开发运维】本地开发验证与生产运维-2026-05-15.md b/docs/【开发运维】本地开发验证与生产运维-2026-05-15.md
index c45c23ec..400ae33a 100644
--- a/docs/【开发运维】本地开发验证与生产运维-2026-05-15.md
+++ b/docs/【开发运维】本地开发验证与生产运维-2026-05-15.md
@@ -256,6 +256,8 @@ Jenkins 按 web / api / 
Spacetime module / build / deploy / publish 拆分
 
 生产 Jenkins 的 `Pipeline script from SCM` 由 Jenkins controller 读取 Jenkinsfile。`Genarrative-Server-Provision` 是服务器初始化流水线,Job 配置里的 SCM URL 必须使用 controller 本机可访问的仓库路径或内网 Gitea 地址,不能使用 `https://git.genarrative.world/...`;否则日志一开始的 `Checking out git ... to read jenkins/Jenkinsfile.production-server-provision` 就会先从公网拉 Jenkinsfile。构建类流水线仍按各自 Jenkinsfile 的 checkout 口径执行;所有 `GitSCM checkout` 都必须保留单分支 refspec、`shallow=true`、`depth=1`、`noTags=true` 与 `honorRefspec=true`。API / Web / Stdb 发布类流水线不在目标机器 checkout Git,统一执行上游构建归档里的部署脚本,避免产物 commit 与部署脚本 commit 漂移。
 
+dev 服务器上的 Gitea 内网入口固定为 `http://10.2.0.10/GenarrativeAI/Genarrative.git`,用于 release / dev 等内网 agent 直接拉取仓库,避免绕公网 `git.genarrative.world`。该入口由 dev Nginx `/etc/nginx/conf.d/gitea-internal.conf` 暴露,只允许 `10.2.0.0/16` 和本机访问;Gitea 进程自身仍只监听 `127.0.0.1:3000`,公网域名 `https://git.genarrative.world/` 继续走原有 TLS 反代。验证时从 release 执行 `git ls-remote http://10.2.0.10/GenarrativeAI/Genarrative.git HEAD`,应能直接返回 HEAD。
+
 `scripts/jenkins-checkout-source.sh` 是生产 Jenkinsfile 内部二次确认源码的统一入口。构建流水线和服务器初始化流水线传入 `COMMIT_HASH` 时,脚本必须先保持 `depth=1` 浅拉,若上游 commit 已在浅历史内则直接校验并 checkout;只有浅历史无法证明 commit 属于目标分支时,才按 `GENARRATIVE_JENKINS_CHECKOUT_DEEPEN_STEPS`(默认 `50 200 1000 5000`)逐步加深,最后才尝试展开完整历史。`Genarrative-Api-Deploy`、`Genarrative-Web-Deploy` 和 `Genarrative-Stdb-Module-Publish` 仍保留上游构建传入的 `COMMIT_HASH` 作为通知和追溯字段,但不再用它在目标机器重新 checkout 部署脚本。
 
From 7aafb37f040d53bb0b040145a8a4a21869088190 Mon Sep 17 00:00:00 2001
From: kdletters <61648117+kdletters@users.noreply.github.com>
Date: Wed, 10 Jun 2026 11:10:16 +0800
Subject: [PATCH 2/9] Fix Server-Provision to prepare the tool bundle based on target state
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Add preparation logic that reuses the local installation when the target machine already has SpacetimeDB and otelcol-contrib
Pass SPACETIME_ROOT into Prepare Provision Tools so a non-default path does not cause the wrong directory to be checked
Add a Server-Provision tool-preparation regression check script to prevent downloads from being triggered when the tools already exist
Update the dev/ops doc and Hermes shared memory to record the convention of checking target machine state before preparing files
---
 .hermes/shared-memory/decision-log.md | 1 +
 .hermes/shared-memory/pitfalls.md | 8 ++
 ...发运维】本地开发验证与生产运维-2026-05-15.md | 2 +-
 .../Jenkinsfile.production-server-provision | 1 +
 scripts/check-server-provision-tools.sh | 86 +++++++++++++++++++
 scripts/prepare-server-provision-tools.sh | 82 +++++++++++++++++-
 6 files changed, 176 insertions(+), 4 deletions(-)
 create mode 100755 scripts/check-server-provision-tools.sh

diff --git a/.hermes/shared-memory/decision-log.md b/.hermes/shared-memory/decision-log.md
index 1f77557c..11bd0e53 100644
--- a/.hermes/shared-memory/decision-log.md
+++ b/.hermes/shared-memory/decision-log.md
@@ -109,6 +109,7 @@
 
 - 背景:`Genarrative-Server-Provision` 的 `DEPLOY_TARGET=development` 语义是部署到 dev 服务器,不是构建机 dry-run。旧流水线把 development 映射到 `linux && genarrative-build`,还先在 build 节点准备 `provision-tools/` 再 stash 给后续阶段,导致真实 dev 初始化可能跑到 Jenkins controller / build 节点;脚本还安装 clang / lld / pkg-config / OpenSSL headers / sccache 等构建链依赖,超出了服务器初始化职责。
 - 决策:Server-Provision 只做服务器初始化,全程运行在目标部署 agent:development 使用 `linux && genarrative-dev-deploy`,release 使用 `linux && genarrative-release-deploy`。`Prepare Provision Tools` 与 `Provision Server` 在同一个目标 agent workspace 顺序执行,不再切到 `linux && genarrative-build`,不再 `stash/unstash` 工具包。`scripts/jenkins-server-provision.sh` 不再安装 clang / lld / pkg-config / libssl-dev / sccache;非 dry-run 仍要求目标 dev / release agent 具备 root 权限,因为 provision 会写 systemd、Nginx、`/etc` 和系统用户。Job 的 `Pipeline script from SCM` 与 Jenkinsfile 参数 `SOURCE_GIT_REMOTE_URL` 都必须使用本机路径或目标 agent 可访问的内网 Git 源,不允许公网 Git fallback。
+- 追加决策(2026-06-10):`Prepare Provision Tools` 必须先读取目标机现状,再准备需要的文件。目标机 `/usr/local/bin/otelcol-contrib` 版本匹配 `OTELCOL_VERSION` 时直接复用;`${SPACETIME_ROOT}/bin/current/spacetimedb-cli` 和 `spacetimedb-standalone` 存在且 CLI 版本匹配 `SPACETIME_EXPECTED_VERSION` 或 `SPACETIME_DOWNLOAD_ROOT` 中的版本时,直接复用当前安装生成 `provision-tools/`。只有目标机缺失、不可执行或版本不匹配时,才消费 `PROVISION_DOWNLOADS_DIR` 中的本地包或进入下载分支。
 - 
影响范围:`jenkins/Jenkinsfile.production-server-provision`、`scripts/jenkins-server-provision.sh`、生产运维文档、Server-Provision 排障口径。 - 验证方式:Jenkins 日志中 Server-Provision 的 `Prepare`、`Checkout Provision Files`、`Prepare Provision Tools` 和 `Provision Server` 都在目标 dev / release agent 上执行;日志不出现 `Running on Jenkins`、`linux && genarrative-build`、`stash 'server-provision-tools'`、`Git 主地址拉取失败...改用备用地址`、`https://git.genarrative.world/GenarrativeAI/Genarrative.git` 或构建依赖 / sccache 安装步骤;`bash -n scripts/jenkins-server-provision.sh` 和编码检查通过。 - 关联文档:`docs/【开发运维】本地开发验证与生产运维-2026-05-15.md`。 diff --git a/.hermes/shared-memory/pitfalls.md b/.hermes/shared-memory/pitfalls.md index 2972bbf6..1b904428 100644 --- a/.hermes/shared-memory/pitfalls.md +++ b/.hermes/shared-memory/pitfalls.md @@ -1314,6 +1314,14 @@ - 验证:Jenkins 日志中 `Provision Target` 下的 `Prepare`、`Checkout Provision Files`、`Prepare Provision Tools` 和 `Provision Server` 都应运行在目标 dev / release agent;日志不应出现 `stash 'server-provision-tools'`、目标阶段 `unstash`、`Git 主地址拉取失败...改用备用地址` 或 `https://git.genarrative.world/GenarrativeAI/Genarrative.git`。 - 关联:`jenkins/Jenkinsfile.production-server-provision`、`scripts/prepare-server-provision-tools.sh`、`docs/【开发运维】本地开发验证与生产运维-2026-05-15.md`。 +## Server-Provision 不要无条件下载工具包 + +- 现象:目标 dev / release 机器已经安装正确版本的 SpacetimeDB 或 `otelcol-contrib`,但 `Prepare Provision Tools` 仍每次下载 release tarball,网络慢或 GitHub 不稳时会把服务器初始化卡在准备阶段。 +- 原因:工具准备阶段如果只按“生成交付包”理解,会忽略它已经运行在目标部署 agent 上这一事实;此时目标机本地的 `/usr/local/bin/otelcol-contrib` 与 `${SPACETIME_ROOT}/bin/current` 就是可信状态源。 +- 处理:`scripts/prepare-server-provision-tools.sh` 必须先检查目标机状态:`otelcol-contrib --version` 命中 `OTELCOL_VERSION` 时复制现有二进制;`spacetimedb-cli --version` 命中 `SPACETIME_EXPECTED_VERSION` 或 `SPACETIME_DOWNLOAD_ROOT` 推导出的版本且 standalone 同时存在时,复制 `${SPACETIME_ROOT}/bin` 并生成 wrapper。只有缺失、不可执行或版本不匹配时,才查 `PROVISION_DOWNLOADS_DIR` 或下载源。 +- 验证:运行 `bash scripts/check-server-provision-tools.sh`;Jenkins 日志应先出现“检查目标机 ...”,已有版本命中时出现“复用目标机已有 ...”,且不出现“下载 ...”。 +- 
关联:`scripts/prepare-server-provision-tools.sh`、`jenkins/Jenkinsfile.production-server-provision`、`docs/【开发运维】本地开发验证与生产运维-2026-05-15.md`。 + ## 个人任务 scope 不得扩成 work/site/module - 现象:个人任务配置为 `work` / `site` / `module` 后进度串桶或静默按 0 处理。 diff --git a/docs/【开发运维】本地开发验证与生产运维-2026-05-15.md b/docs/【开发运维】本地开发验证与生产运维-2026-05-15.md index 400ae33a..4797b4d4 100644 --- a/docs/【开发运维】本地开发验证与生产运维-2026-05-15.md +++ b/docs/【开发运维】本地开发验证与生产运维-2026-05-15.md @@ -277,7 +277,7 @@ dev 服务器上的 Gitea 内网入口固定为 `http://10.2.0.10/GenarrativeAI/ - `GENARRATIVE_API_MAX_CONCURRENT_REQUESTS=512` 开启应用内 HTTP 并发背压;`GENARRATIVE_API_GALLERY_MAX_CONCURRENT_REQUESTS=320`、`GENARRATIVE_API_DETAIL_MAX_CONCURRENT_REQUESTS=64`、`GENARRATIVE_API_ADMIN_MAX_CONCURRENT_REQUESTS=16` 分别限制公开列表、公开详情和后台 API 热路径。超过许可时直接返回 `429 Too Many Requests` 和 `Retry-After: 1`,`/healthz` 与 `/readyz` 不受该限制。这些值不是 RPS 限速;如果压测中 429 上升但内存和 p95 收敛,说明背压正在保护进程。直连 `api-server` 的极高 RPS 压测若出现 `connection refused`,通常已经打到 TCP 监听 / accept 层,应同时检查 backlog、Nginx upstream keepalive 和前置限流。 - `api-server` 正常运行时 `/healthz` 返回进程存活状态,`/readyz` 返回是否仍接收新流量;收到 `SIGINT` / `SIGTERM` 后会先把 readiness 标记为不可用,再让 Axum 停止接新连接并等待已有 HTTP 请求排空。systemd 仍以 `KillSignal=SIGINT` 停服务,`TimeoutStopSec=90` 作为长请求排空上限。 - `genarrative-api.service` 设置 `LimitNOFILE=65535`、`TasksMax=2048`;上线后用 `systemctl show genarrative-api.service -p LimitNOFILE -p TasksMax -p TimeoutStopUSec` 和 `cat /proc/$(pidof api-server)/limits` 核对。 -- Server provision 不再通过 Windows helper 下载,也不再通过 Linux build 节点中转工具包。`Prepare Provision Tools` 在目标 dev / release agent 工作区内准备 SpacetimeDB `2.4.1` 的 `spacetime-x86_64-unknown-linux-gnu.tar.gz` 和 `otelcol-contrib_0.151.0_linux_amd64.tar.gz` 并生成 `provision-tools/`;如果目标服务器下载需要代理,在 `PROVISION_DOWNLOAD_PROXY` 配置目标机可访问的 HTTP 代理。 +- Server provision 不再通过 Windows helper 下载,也不再通过 Linux build 节点中转工具包。`Prepare Provision Tools` 在目标 dev / release agent 工作区内先检查 `/usr/local/bin/otelcol-contrib` 与 `${SPACETIME_ROOT}/bin/current`:版本已满足时直接复用目标机现有文件生成 `provision-tools/`,只有缺失或版本不匹配时才使用 
`PROVISION_DOWNLOADS_DIR` 里的本地包或从配置的下载源准备 SpacetimeDB `2.4.1` / `otelcol-contrib 0.151.0`;如果目标服务器下载需要代理,在 `PROVISION_DOWNLOAD_PROXY` 配置目标机可访问的 HTTP 代理。 - 除 `Genarrative-Server-Provision` 外,`Genarrative-Stdb-Module-Build`、`Genarrative-Web-Build`、`Genarrative-Api-Build`、`Genarrative-*Deploy`、`Genarrative-Database-Import/Export`、`Genarrative-Full-Build-And-Deploy` 和 `Genarrative-Notify-Email` 的生产流水线现都以 Linux agent 为主,仍按各自 Jenkinsfile 的 checkout 口径执行。Server provision 不使用公网备用 Git 源。 - `otelcol-contrib.service` 作为可选系统服务加入 provision,默认监听 `127.0.0.1:4317/4318` 并使用 `deploy/otelcol/genarrative-debug.yaml`。api-server 是否发送 OTLP 仍由 `GENARRATIVE_OTEL_ENABLED` 控制,服务 unit 见 `deploy/systemd/otelcol-contrib.service`。 - Nginx `/api/` 与 `/admin/api/` 通过 `genarrative_api` upstream 代理到 `127.0.0.1:8082`,upstream keepalive 为 64;`limit_conn` 负责连接 / 并发保护,`limit_req` 负责入口 RPS 快拒绝。当前模板把公开 gallery list 单独放到 `genarrative_gallery_rps`,默认 `rate=5000r/s`、`burst=4096`、`limit_conn=320`;公开详情和普通 API 放到 `genarrative_api_rps`,后台 API 放到 `genarrative_admin_rps`。通用 `/api` location 设置 `client_max_body_size 64m` 是反代兜底,防止拼图入口页 / 新增关卡本地参考图 Data URL 或旧兼容请求在到达 `api-server` 前被默认 1 MiB 上限拦截;拼图本地参考图前后端统一限制 6MB,历史图片仍提交 `referenceImageAssetObjectId(s)`。若线上出现 `413 Request Entity Too Large` 且 access log 中 `request_time=0.000`、`upstream_status=-`,说明请求在 Nginx 层被拦截,先用 `nginx -T | grep client_max_body_size` 检查 release 模板是否已渲染并 reload,同时检查前端是否超出 6MB 或错误提交了未压缩大图。`limit_conn_status 429` 和 `limit_req_status 429` 必须在 HTTP 与 HTTPS server 中同时生效;若线上压测看到 `limiting connections by zone "genarrative_api_conn"` 却返回 503,优先检查 `nginx -T` 里 HTTPS server 是否缺少这些状态码,以及 `/api/runtime/puzzle/gallery` 是否误落到通用 `location ~ ^/api` 的 `limit_conn=64`。压测时看 `/var/log/nginx/genarrative.access.log` 中的 `request_time`、`upstream_connect_time`、`upstream_header_time`、`upstream_response_time`、`upstream_status`、`request_id`。 diff --git a/jenkins/Jenkinsfile.production-server-provision b/jenkins/Jenkinsfile.production-server-provision index 9e6da12d..eb5ceb77 
100644
--- a/jenkins/Jenkinsfile.production-server-provision
+++ b/jenkins/Jenkinsfile.production-server-provision
@@ -164,6 +164,7 @@ BASH
     PROVISION_DOWNLOAD_PROXY="${PROVISION_DOWNLOAD_PROXY:-}" \
     SPACETIME_DOWNLOAD_ROOT="${SPACETIME_DOWNLOAD_ROOT:-https://github.com/clockworklabs/SpacetimeDB/releases/download/v2.4.1}" \
     SPACETIME_TARGET_HOST="${SPACETIME_TARGET_HOST:-x86_64-unknown-linux-gnu}" \
+    SPACETIME_ROOT="${SPACETIME_ROOT:-/stdb}" \
     scripts/prepare-server-provision-tools.sh
   '
 '''
diff --git a/scripts/check-server-provision-tools.sh b/scripts/check-server-provision-tools.sh
new file mode 100755
index 00000000..0401529b
--- /dev/null
+++ b/scripts/check-server-provision-tools.sh
@@ -0,0 +1,86 @@
+#!/usr/bin/env bash
+set -euo pipefail
+
+REPO_ROOT="$(cd -- "$(dirname -- "${BASH_SOURCE[0]}")/.." && pwd)"
+TMP_ROOT="$(mktemp -d)"
+trap 'rm -rf "${TMP_ROOT}"' EXIT
+
+WORK_DIR="${TMP_ROOT}/workspace"
+FAKE_BIN_DIR="${TMP_ROOT}/fake-bin"
+TARGET_BIN_DIR="${TMP_ROOT}/target-bin"
+SPACETIME_ROOT_DIR="${TMP_ROOT}/stdb"
+OUTPUT_LOG="${TMP_ROOT}/prepare.log"
+
+mkdir -p \
+  "${WORK_DIR}" \
+  "${FAKE_BIN_DIR}" \
+  "${TARGET_BIN_DIR}" \
+  "${SPACETIME_ROOT_DIR}/bin/current"
+
+cat >"${FAKE_BIN_DIR}/curl" <<'EOF'
+#!/usr/bin/env bash
+echo "curl should not be called when target tools are already ready" >&2
+exit 97
+EOF
+cat >"${FAKE_BIN_DIR}/wget" <<'EOF'
+#!/usr/bin/env bash
+echo "wget should not be called when target tools are already ready" >&2
+exit 97
+EOF
+chmod +x "${FAKE_BIN_DIR}/curl" "${FAKE_BIN_DIR}/wget"
+
+cat >"${TARGET_BIN_DIR}/otelcol-contrib" <<'EOF'
+#!/usr/bin/env bash
+echo "otelcol-contrib version 0.151.0"
+EOF
+chmod +x "${TARGET_BIN_DIR}/otelcol-contrib"
+
+cat >"${SPACETIME_ROOT_DIR}/bin/current/spacetimedb-cli" <<'EOF'
+#!/usr/bin/env bash
+echo "spacetimedb-cli 2.4.1"
+EOF
+cat >"${SPACETIME_ROOT_DIR}/bin/current/spacetimedb-standalone" <<'EOF'
+#!/usr/bin/env bash
+echo "spacetimedb-standalone 2.4.1"
+EOF
+chmod +x \
+  "${SPACETIME_ROOT_DIR}/bin/current/spacetimedb-cli" \
+  "${SPACETIME_ROOT_DIR}/bin/current/spacetimedb-standalone"
+
+if ! (
+  cd "${WORK_DIR}"
+  PATH="${FAKE_BIN_DIR}:${PATH}" \
+    WORKSPACE="${WORK_DIR}" \
+    PROVISION_TOOLS_DIR="provision-tools" \
+    PROVISION_DOWNLOADS_DIR="downloads" \
+    PROVISION_TOOLS_TMP_PARENT="${WORK_DIR}/.tmp/server-provision-tools" \
+    PROVISION_REQUIRE_LOCAL_DOWNLOADS="true" \
+    OTELCOL_TARGET_BIN="${TARGET_BIN_DIR}/otelcol-contrib" \
+    OTELCOL_VERSION="0.151.0" \
+    SPACETIME_ROOT="${SPACETIME_ROOT_DIR}" \
+    SPACETIME_EXPECTED_VERSION="2.4.1" \
+    "${REPO_ROOT}/scripts/prepare-server-provision-tools.sh" \
+    >"${OUTPUT_LOG}" 2>&1
+); then
+  echo "[check-server-provision-tools] prepare-server-provision-tools.sh 执行失败。" >&2
+  cat "${OUTPUT_LOG}" >&2
+  exit 1
+fi
+
+grep -q "复用目标机已有 otelcol-contrib" "${OUTPUT_LOG}"
+grep -q "复用目标机已有 SpacetimeDB 安装" "${OUTPUT_LOG}"
+grep -q "otelcol-contrib 0.151.0 target existing" "${WORK_DIR}/provision-tools/MANIFEST.txt"
+grep -q "spacetime target existing" "${WORK_DIR}/provision-tools/MANIFEST.txt"
+
+test -x "${WORK_DIR}/provision-tools/otelcol-contrib"
+test -x "${WORK_DIR}/provision-tools/spacetime/spacetime"
+test -x "${WORK_DIR}/provision-tools/spacetime/bin/current/spacetimedb-cli"
+test -x "${WORK_DIR}/provision-tools/spacetime/bin/current/spacetimedb-standalone"
+
+if grep -q "下载 " "${OUTPUT_LOG}"; then
+  echo "[check-server-provision-tools] 已有目标机工具时不应进入下载分支。" >&2
+  cat "${OUTPUT_LOG}" >&2
+  exit 1
+fi
+
+echo "[check-server-provision-tools] OK"
diff --git a/scripts/prepare-server-provision-tools.sh b/scripts/prepare-server-provision-tools.sh
index 8bbd8266..edb44a12 100755
--- a/scripts/prepare-server-provision-tools.sh
+++ b/scripts/prepare-server-provision-tools.sh
@@ -7,9 +7,12 @@ OTELCOL_VERSION="${OTELCOL_VERSION:-0.151.0}"
 PREPARE_OTELCOL="${PREPARE_OTELCOL:-${ENABLE_OTELCOL:-true}}"
 OTELCOL_DOWNLOAD_ROOT="${OTELCOL_DOWNLOAD_ROOT:-https://github.com/open-telemetry/opentelemetry-collector-releases/releases/download}"
 OTELCOL_ARCHIVE_PATH="${OTELCOL_ARCHIVE_PATH:-}"
+OTELCOL_TARGET_BIN="${OTELCOL_TARGET_BIN:-/usr/local/bin/otelcol-contrib}"
 SPACETIME_INSTALLER_URL="${SPACETIME_INSTALLER_URL:-https://install.spacetimedb.com}"
 SPACETIME_DOWNLOAD_ROOT="${SPACETIME_DOWNLOAD_ROOT:-https://github.com/clockworklabs/SpacetimeDB/releases/download/v2.4.1}"
 SPACETIME_TARGET_HOST="${SPACETIME_TARGET_HOST:-x86_64-unknown-linux-gnu}"
+SPACETIME_ROOT="${SPACETIME_ROOT:-/stdb}"
+SPACETIME_EXPECTED_VERSION="${SPACETIME_EXPECTED_VERSION:-}"
 SPACETIME_ARCHIVE_PATH="${SPACETIME_ARCHIVE_PATH:-}"
 SPACETIME_INSTALLER_PATH="${SPACETIME_INSTALLER_PATH:-}"
 SPACETIME_UPDATE_INSTALLER_PATH="${SPACETIME_UPDATE_INSTALLER_PATH:-}"
@@ -65,6 +68,60 @@ download_file() {
   fi
 }
 
+resolve_spacetime_expected_version() {
+  local download_root="${SPACETIME_DOWNLOAD_ROOT%/}"
+
+  if [[ -n "${SPACETIME_EXPECTED_VERSION}" ]]; then
+    printf "%s" "${SPACETIME_EXPECTED_VERSION}"
+    return
+  fi
+
+  if [[ "${download_root}" =~ /v([0-9]+(\.[0-9]+){1,2})$ ]]; then
+    printf "%s" "${BASH_REMATCH[1]}"
+  fi
+}
+
+target_otelcol_ready() {
+  local version_output
+
+  echo "[prepare-provision-tools] 检查目标机 otelcol-contrib: ${OTELCOL_TARGET_BIN}"
+  if [[ ! -x "${OTELCOL_TARGET_BIN}" ]]; then
+    echo "[prepare-provision-tools] 目标机 otelcol-contrib 不存在或不可执行,将准备交付文件。"
+    return 1
+  fi
+
+  version_output="$("${OTELCOL_TARGET_BIN}" --version 2>/dev/null || true)"
+  if [[ -n "${OTELCOL_VERSION}" && "${version_output}" != *"${OTELCOL_VERSION}"* ]]; then
+    echo "[prepare-provision-tools] 目标机 otelcol-contrib 版本不匹配,期望 ${OTELCOL_VERSION},当前: ${version_output:-unknown}"
+    return 1
+  fi
+
+  echo "[prepare-provision-tools] 目标机 otelcol-contrib 已满足要求: ${version_output:-version unknown}"
+  return 0
+}
+
+target_spacetime_ready() {
+  local target_cli="${SPACETIME_ROOT}/bin/current/spacetimedb-cli"
+  local target_standalone="${SPACETIME_ROOT}/bin/current/spacetimedb-standalone"
+  local expected_version version_output
+
+  echo "[prepare-provision-tools] 检查目标机 SpacetimeDB: ${SPACETIME_ROOT}/bin/current"
+  if [[ ! -x "${target_cli}" || ! -x "${target_standalone}" ]]; then
+    echo "[prepare-provision-tools] 目标机 SpacetimeDB current 目录不完整,将准备交付文件。"
+    return 1
+  fi
+
+  expected_version="$(resolve_spacetime_expected_version)"
+  version_output="$("${target_cli}" --version 2>/dev/null || true)"
+  if [[ -n "${expected_version}" && "${version_output}" != *"${expected_version}"* ]]; then
+    echo "[prepare-provision-tools] 目标机 SpacetimeDB 版本不匹配,期望 ${expected_version},当前: ${version_output:-unknown}"
+    return 1
+  fi
+
+  echo "[prepare-provision-tools] 目标机 SpacetimeDB 已满足要求: ${version_output:-version unknown}"
+  return 0
+}
+
 validate_relative_dir() {
   local label="$1"
   local path="$2"
@@ -101,13 +158,21 @@ prepare_otelcol() {
 
   require_cmd tar
 
+  if target_otelcol_ready; then
+    echo "[prepare-provision-tools] 复用目标机已有 otelcol-contrib: ${OTELCOL_TARGET_BIN}"
+    install -m 0755 "${OTELCOL_TARGET_BIN}" "${target}"
+    "${target}" --version >/dev/null
+    OTELCOL_SOURCE_DESCRIPTION="target existing ${OTELCOL_TARGET_BIN}"
+    return
+  fi
+
   if [[ -n "${OTELCOL_ARCHIVE_PATH}" && -f "${OTELCOL_ARCHIVE_PATH}" ]]; then
     source_archive="${OTELCOL_ARCHIVE_PATH}"
   elif [[ -n "${PROVISION_DOWNLOADS_DIR}" && -f "${downloaded_archive}" ]]; then
     source_archive="${downloaded_archive}"
   fi
   if [[ "${PROVISION_REQUIRE_LOCAL_DOWNLOADS}" == "true" && -z "${source_archive}" ]]; then
-    echo "[prepare-provision-tools] 要求使用 Windows 已下载的 otelcol-contrib 包,但未找到: ${downloaded_archive}" >&2
+    echo "[prepare-provision-tools] 要求使用本地已有的 otelcol-contrib 来源,但目标机未满足且未找到下载包: ${downloaded_archive}" >&2
     exit 1
   fi
 
@@ -146,6 +211,17 @@ prepare_spacetime() {
   local downloaded_installer="${PROVISION_DOWNLOADS_DIR}/spacetime-install.sh"
   local source_installer=""
 
+  if target_spacetime_ready; then
+    echo "[prepare-provision-tools] 复用目标机已有 SpacetimeDB 安装: ${SPACETIME_ROOT}/bin/current"
+    mkdir -p "${target_dir}"
+    cp -a "${SPACETIME_ROOT}/bin" "${target_dir}/bin"
+    chmod 0755 "${target_dir}/bin/current/spacetimedb-cli" "${target_dir}/bin/current/spacetimedb-standalone"
+    make_spacetime_wrapper "${target_dir}/spacetime"
+    "${target_dir}/spacetime" --version >/dev/null
+    SPACETIME_SOURCE_DESCRIPTION="target existing ${SPACETIME_ROOT}/bin/current"
+    return
+  fi
+
   mkdir -p "${install_root}"
   if [[ -n "${SPACETIME_ARCHIVE_PATH}" && -f "${SPACETIME_ARCHIVE_PATH}" ]]; then
     source_archive="${SPACETIME_ARCHIVE_PATH}"
@@ -165,7 +241,7 @@ prepare_spacetime() {
     source_update="${downloaded_update}"
   fi
   if [[ "${PROVISION_REQUIRE_LOCAL_DOWNLOADS}" == "true" && -z "${source_archive}" ]]; then
-    echo "[prepare-provision-tools] 要求使用 Windows 已下载的 SpacetimeDB release tarball,但未找到: ${downloaded_archive}" >&2
+    echo "[prepare-provision-tools] 要求使用本地已有的 SpacetimeDB release tarball,但目标机未满足且未找到下载包: ${downloaded_archive}" >&2
     exit 1
   fi
 
@@ -185,7 +261,7 @@ prepare_spacetime() {
   fi
 
   if [[ "${PROVISION_REQUIRE_LOCAL_DOWNLOADS}" == "true" && -z "${source_installer}" ]]; then
-    echo "[prepare-provision-tools] 要求使用 Windows 已下载的 SpacetimeDB 官方安装器脚本,但未找到: ${downloaded_installer}" >&2
+    echo "[prepare-provision-tools] 要求使用本地已有的 SpacetimeDB 官方安装器脚本,但未找到: ${downloaded_installer}" >&2
     exit 1
   elif [[ -n "${source_installer}" ]]; then
     echo "[prepare-provision-tools] 使用已下载的 SpacetimeDB 官方安装器脚本: ${source_installer}"
From 9db467d23f524dc34e25049e22d0edeeba26c34e Mon Sep 17 00:00:00 2001
From: kdletters <61648117+kdletters@users.noreply.github.com>
Date: Wed, 10 Jun 2026 11:35:39 +0800
Subject: [PATCH 3/9] Add release SpacetimeDB health checks and patrol guards against regression
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Add staged SpacetimeDB health checks and stage output in /readyz
Log the stage and elapsed time of procedure/reducer/read failures
Add the release health patrol systemd timer and the production ops pre-check
Sync the API build/deploy scripts, the provision script, and the ops doc
---
 .hermes/shared-memory/pitfalls.md | 16 +
 .../systemd/genarrative-health-patrol.service | 20 +
 .../systemd/genarrative-health-patrol.timer | 13 +
 ...发运维】本地开发验证与生产运维-2026-05-15.md | 40 +-
 jenkins/Jenkinsfile.production-api-build | 2 +-
 jenkins/Jenkinsfile.production-api-deploy | 2 +-
 package.json | 3 +-
 scripts/build-production-release.sh | 5 +-
 scripts/check-production-ops-guardrails.mjs | 64 +++
 scripts/deploy/production-api-deploy.sh | 17 +-
 scripts/jenkins-server-provision.sh | 20 +-
 scripts/ops/production-health-patrol.mjs | 477 ++++++++++++++++++
 server-rs/crates/api-server/src/app.rs | 40 ++
 server-rs/crates/api-server/src/config.rs | 31 ++
 server-rs/crates/api-server/src/health.rs | 33 +-
 server-rs/crates/api-server/src/state.rs | 34 +-
 server-rs/crates/spacetime-client/src/lib.rs | 400 +++++++++++++--
 17 files changed, 1147 insertions(+), 70 deletions(-)
 create mode 100644 deploy/systemd/genarrative-health-patrol.service
 create mode 100644 deploy/systemd/genarrative-health-patrol.timer
 create mode 100644 scripts/check-production-ops-guardrails.mjs
 create mode 100644 scripts/ops/production-health-patrol.mjs

diff --git a/.hermes/shared-memory/pitfalls.md b/.hermes/shared-memory/pitfalls.md
index 1b904428..2f0d3fd5 100644
--- 
a/.hermes/shared-memory/pitfalls.md +++ b/.hermes/shared-memory/pitfalls.md @@ -15,6 +15,22 @@ - 关联:相关文件、文档、提交或 Issue ``` +## 生产冷备份后 API 不能只依赖 SpacetimeDB 自恢复 + +- 现象:release 机器 `03:20` 冷备份后,`spacetimedb.service` 已恢复,但作品列表、创作入口配置或公开 gallery 继续超时 / 502 / 504,`genarrative-api.service` 保持 stopped。 +- 原因:`genarrative-api.service` 配置了 `Requires=spacetimedb.service`,冷备份停止 `spacetimedb.service` 时 API 会被 systemd 依赖关系一并停止;如果 `genarrative-database-backup.service` 只传 `--stop-service spacetimedb.service` 而漏掉 `--restart-service-after genarrative-api.service`,备份脚本只会恢复数据库,不会再拉起 API。 +- 处理:生产冷备份 unit 和发布脚本必须带 `--restart-service-after genarrative-api.service`;仓库用 `npm run check:production-ops` 检查 systemd 模板、API build/deploy 归档和健康巡检链路。现场修复后执行 `systemctl daemon-reload`,但不要为了验证而手动触发冷备份。 +- 验证:`systemctl cat genarrative-database-backup.service` 应包含该参数;`systemctl is-active spacetimedb.service genarrative-api.service nginx.service` 全为 `active`;`curl -fsS http://127.0.0.1:3101/v1/ping`、`/healthz`、`/readyz` 和代表性 `/api/runtime/puzzle/gallery` 均成功。 +- 关联:`deploy/systemd/genarrative-database-backup.service`、`scripts/database-backup-to-oss.mjs`、`scripts/ops/production-health-patrol.mjs`、`docs/【开发运维】本地开发验证与生产运维-2026-05-15.md`。 + +## SpacetimeDB 45 秒超时要看 api-server 记录的阶段 + +- 现象:release 上 Nginx 能立刻连到 `api-server`,但 `/api/runtime/*/gallery`、`/api/creation-entry/config` 等请求在约 `GENARRATIVE_SPACETIME_PROCEDURE_TIMEOUT_SECONDS` 后返回 `502` / `504`。 +- 原因:旧日志只能看到 HTTP 总耗时和最终状态,无法区分卡在连接池、SDK 建连、等待 `on_connect`、订阅 read model、等待 procedure / reducer 回调还是本地订阅 cache 读取。 +- 处理:`spacetime-client` 内置阶段化健康检查和失败日志;`/readyz` 用 `GENARRATIVE_SPACETIME_HEALTH_CHECK_TIMEOUT_SECONDS` 短窗口检查 SpacetimeDB 连接租约,业务失败日志包含 `operation_kind`、`operation_name`、`spacetime_stage`、`elapsed_ms`。 +- 验证:`/readyz` 失败时看 `details.spacetime.stage`;业务请求超时时查 `journalctl -u genarrative-api.service` 中同一时间窗口的 `SpacetimeDB client operation failed`,优先按 
`pool_acquire`、`connect_build`、`connect_handshake`、`read_model_subscribe`、`procedure_result`、`reducer_result`、`read_cache` 分阶段处理。
+- 关联:`server-rs/crates/spacetime-client/src/lib.rs`、`server-rs/crates/api-server/src/health.rs`、`docs/【开发运维】本地开发验证与生产运维-2026-05-15.md`。
+
 ## 新建草稿扣费不能和入口卡泥点配置分离
 
 - 现象:后台修改创作入口的 `mudPointCost` 后,入口卡和前置余额提示可能显示新数值,但用户真实钱包流水仍按代码常量扣除。
diff --git a/deploy/systemd/genarrative-health-patrol.service b/deploy/systemd/genarrative-health-patrol.service
new file mode 100644
index 00000000..2ef7d26c
--- /dev/null
+++ b/deploy/systemd/genarrative-health-patrol.service
@@ -0,0 +1,20 @@
+[Unit]
+Description=Genarrative Production Health Patrol
+After=network-online.target genarrative-api.service spacetimedb.service nginx.service
+Wants=network-online.target
+ConditionPathExists=/opt/genarrative/current/scripts/ops/production-health-patrol.mjs
+
+[Service]
+Type=oneshot
+User=root
+Group=root
+WorkingDirectory=/opt/genarrative/current
+EnvironmentFile=-/etc/genarrative/health-patrol.env
+ExecStart=/usr/bin/node /opt/genarrative/current/scripts/ops/production-health-patrol.mjs --status-file /var/lib/genarrative/health-patrol/status.json
+TimeoutStartSec=30
+
+# 巡检只读 systemd、HTTP 和 journal;只允许写入自己的最近一次状态文件。
+NoNewPrivileges=true
+PrivateTmp=true
+ProtectSystem=full
+ReadWritePaths=/var/lib/genarrative/health-patrol
diff --git a/deploy/systemd/genarrative-health-patrol.timer b/deploy/systemd/genarrative-health-patrol.timer
new file mode 100644
index 00000000..e29bfdc2
--- /dev/null
+++ b/deploy/systemd/genarrative-health-patrol.timer
@@ -0,0 +1,13 @@
+[Unit]
+Description=Run Genarrative Production Health Patrol
+
+[Timer]
+OnBootSec=2min
+OnCalendar=*-*-* *:0/5:00
+Persistent=true
+RandomizedDelaySec=30
+AccuracySec=30s
+Unit=genarrative-health-patrol.service
+
+[Install]
+WantedBy=timers.target
diff --git a/docs/【开发运维】本地开发验证与生产运维-2026-05-15.md b/docs/【开发运维】本地开发验证与生产运维-2026-05-15.md
index 4797b4d4..d2b630ac 100644
--- a/docs/【开发运维】本地开发验证与生产运维-2026-05-15.md
+++ b/docs/【开发运维】本地开发验证与生产运维-2026-05-15.md @@ -96,6 +96,7 @@ npm run admin-web:typecheck ```bash npm run check:encoding npm run check:spacetime-schema +npm run check:production-ops npm run check:server-rs-ddd npm run lint:eslint npm run typecheck @@ -217,7 +218,7 @@ UI 相关修改要重点验证: npm run database:backup:oss -- --data-dir /stdb --stop-service spacetimedb.service --restart-service-after genarrative-api.service ``` -脚本会将数据目录打包成 `tar.gz`,上传到 `oss://///-.tar.gz`。生产建议做冷备份:传入 `--stop-service spacetimedb.service`,脚本会在打包前停止服务、打包后恢复服务,再上传 OSS;因 `genarrative-api.service` 依赖 `spacetimedb.service`,生产定时冷备份还必须传入 `--restart-service-after genarrative-api.service`,确保备份后 API 随数据库一起恢复。由于 OSS 上传可能受服务器带宽限制,`Genarrative-Stdb-Module-Publish` 默认使用 `DATABASE_BACKUP_MODE=async`:先在 publish 前用 `--defer-upload` 生成本地冷备份和 `.manifest.json`,随后继续执行 publish;发布脚本退出前会用后台 `node ... --upload-archive ` 上传同一份发布前备份,不等待上传完成。发布脚本在校验 wasm 后、执行 `spacetime publish` 前会等待显式 `SPACETIME_SERVER_URL` 的 `/v1/ping` 就绪,默认最多等待 `60` 秒;如生产机器冷备份恢复 `spacetimedb.service` 较慢,可临时设置 `GENARRATIVE_STDB_PUBLISH_READY_TIMEOUT_SECONDS` 调整等待时间。需要强一致发布闸门时改用 `DATABASE_BACKUP_MODE=sync`(等价脚本参数 `--backup-mode sync`),备份会在 publish 前同步打包并上传,失败会阻断 publish;确认已有其他备份窗口时才使用 `DATABASE_BACKUP_MODE=skip`(兼容脚本参数 `--skip-backup`)。若业务不能接受停机窗口,应先规划 SpacetimeDB 原生快照或主备策略,不要直接在写入中的数据目录上做热拷贝并当作强一致备份。 +脚本会将数据目录打包成 `tar.gz`,上传到 `oss://///-.tar.gz`。生产建议做冷备份:传入 `--stop-service spacetimedb.service`,脚本会在打包前停止服务、打包后恢复服务,再上传 OSS;因 `genarrative-api.service` 依赖 `spacetimedb.service`,生产定时冷备份还必须传入 `--restart-service-after genarrative-api.service`,确保备份后 API 随数据库一起恢复。`2026-06-10` release 故障就是现场 unit 漏掉该参数,`03:20` 冷备份停止 SpacetimeDB 后 API 被依赖关系一并停止,备份脚本只恢复了 SpacetimeDB,API 直到人工重启前都不可用;后续现场变更、provision 模板和 Jenkins 归档都必须通过 `npm run check:production-ops` 防止回退。由于 OSS 上传可能受服务器带宽限制,`Genarrative-Stdb-Module-Publish` 默认使用 `DATABASE_BACKUP_MODE=async`:先在 publish 前用 `--defer-upload` 生成本地冷备份和 `.manifest.json`,随后继续执行 publish;发布脚本退出前会用后台 `node ... 
--upload-archive ` 上传同一份发布前备份,不等待上传完成。发布脚本在校验 wasm 后、执行 `spacetime publish` 前会等待显式 `SPACETIME_SERVER_URL` 的 `/v1/ping` 就绪,默认最多等待 `60` 秒;如生产机器冷备份恢复 `spacetimedb.service` 较慢,可临时设置 `GENARRATIVE_STDB_PUBLISH_READY_TIMEOUT_SECONDS` 调整等待时间。需要强一致发布闸门时改用 `DATABASE_BACKUP_MODE=sync`(等价脚本参数 `--backup-mode sync`),备份会在 publish 前同步打包并上传,失败会阻断 publish;确认已有其他备份窗口时才使用 `DATABASE_BACKUP_MODE=skip`(兼容脚本参数 `--skip-backup`)。若业务不能接受停机窗口,应先规划 SpacetimeDB 原生快照或主备策略,不要直接在写入中的数据目录上做热拷贝并当作强一致备份。 生产环境变量模板在 `deploy/env/api-server.env.example`: @@ -234,6 +235,17 @@ GENARRATIVE_DATABASE_BACKUP_OSS_ACCESS_KEY_SECRET= `GENARRATIVE_DATABASE_BACKUP_OSS_BUCKET` 为空时会回退 `ALIYUN_OSS_BUCKET`;AccessKey 默认复用 `ALIYUN_OSS_ACCESS_KEY_ID` / `ALIYUN_OSS_ACCESS_KEY_SECRET`,也可用 `GENARRATIVE_DATABASE_BACKUP_OSS_ACCESS_KEY_ID` / `GENARRATIVE_DATABASE_BACKUP_OSS_ACCESS_KEY_SECRET` 为备份 bucket 单独配置最小权限账号。`Genarrative-Server-Provision` 会创建 `/var/lib/genarrative/database-backups` 并归属 `genarrative:genarrative`,同时安装并启用 `genarrative-database-backup.timer`。手动检查定时器:`systemctl list-timers genarrative-database-backup.timer`;手动触发一次:`systemctl start genarrative-database-backup.service`。 +冷备份后必须做一次只读验收,不要只看 `genarrative-database-backup.service` 是否成功退出: + +```bash +systemctl is-active spacetimedb.service genarrative-api.service nginx.service +curl -fsS --max-time 5 http://127.0.0.1:3101/v1/ping +curl -fsS --max-time 5 http://127.0.0.1:8082/healthz +curl -fsS --max-time 5 http://127.0.0.1:8082/readyz +curl -fsS --max-time 5 http://127.0.0.1/api/creation-entry/config >/dev/null +curl -fsS --max-time 5 http://127.0.0.1/api/runtime/puzzle/gallery >/dev/null +``` + ## 生产运维 生产部署当前口径: @@ -244,11 +256,32 @@ Nginx 负责站点和反向代理 Jenkins 按 web / api / Spacetime module / build / deploy / publish 拆分 ``` +### 生产健康巡检 + +`Genarrative-Server-Provision` 会安装并启用 `genarrative-health-patrol.timer`,默认每 5 分钟运行一次 `genarrative-health-patrol.service`。巡检脚本随 API release 归档到 `/opt/genarrative/current/scripts/ops/production-health-patrol.mjs`,只读检查: + +- 
`genarrative-api.service`、`spacetimedb.service`、`nginx.service` 是否 active。 +- API 直连 `/healthz`、`/readyz`。 +- SpacetimeDB 直连 `/v1/ping`。 +- 默认直连 API 端口检查 `/api/creation-entry/config`、`/api/runtime/puzzle/gallery`、`/api/runtime/custom-world-gallery`;如需走 Nginx / 公网域名,在 `/etc/genarrative/health-patrol.env` 配置 `GENARRATIVE_HEALTH_PATROL_PUBLIC_BASE_URL=https://<域名>`。 +- 最近 15 分钟 `genarrative-api.service`、`spacetimedb.service`、`nginx.service` 的 `err..alert` 日志。 + +巡检输出总状态 `OK / WARNING / CRITICAL`;只有 `CRITICAL` 默认让 systemd service 失败,`WARNING` 只写日志和状态文件,避免历史日志噪声把 timer 长期打成失败。最近一次结果写入 `/var/lib/genarrative/health-patrol/status.json`。手动执行: + +```bash +systemctl start genarrative-health-patrol.service +systemctl status genarrative-health-patrol.service --no-pager +journalctl -u genarrative-health-patrol.service -n 80 --no-pager +cat /var/lib/genarrative/health-patrol/status.json +``` + +如需接外部告警,可在 `/etc/genarrative/health-patrol.env` 配置 `GENARRATIVE_HEALTH_PATROL_WEBHOOK_URL`;脚本只会在 `WARNING` 或 `CRITICAL` 时向该 webhook 发送 JSON。未配置 webhook 时,告警来源是 systemd 失败状态、journal 和状态文件。 + `Genarrative-Web-Build` 的主站构建失败若出现 Rollup 报错 `"xxx" is not exported by "src/services/publicWorkCode.ts"`,优先按前端公开作品号工具缺失处理,而不是排查 Jenkins 节点环境。修复时要让 `publicWorkCode.ts` 的 `buildPublicWorkCode` 与 `isSamePublicWorkCode` 成对导出,并补 `src/services/publicWorkCode.test.ts` 覆盖对应玩法前缀;随后用 `npm run build:production-release -- --component web --name <临时名>` 复现 Jenkins web 构建路径。 `Genarrative-Web-Build` 会把 `build//web.tar.gz`、`web.tar.gz.sha256`、`release-manifest.json` 和 `scripts/deploy/production-web-deploy.sh` 直接归档为 Jenkins 构建产物;`Genarrative-Web-Deploy` 只通过 `copyArtifacts` 从指定上游构建复制这些产物和部署脚本,不再在目标机器 checkout Git,再执行随构建归档的 `scripts/deploy/production-web-deploy.sh`。Web 发布不再读取构建机本地缓存目录,也不再通过 release agent `rsync` 回构建机拉取大包;如果 deploy 找不到 `web.tar.gz`,应先检查上游 Web Build 是否按同一 `BUILD_VERSION` 成功归档产物。 -`Genarrative-Api-Build` 的 Jenkins 归档产物必须包含 
`build//api-server`、`api-server.sha256`、`release-manifest.json`、`scripts/database-backup-to-oss.mjs`、`scripts/deploy/production-api-deploy.sh`、`scripts/deploy/maintenance-on.sh` 和 `scripts/deploy/maintenance-off.sh`。`deploy/systemd/genarrative-database-backup.service` 从 `/opt/genarrative/current/scripts/database-backup-to-oss.mjs` 执行冷备份,`Genarrative-Api-Deploy` 会从上游 API 构建产物复制部署脚本和备份脚本,不再在目标机器 checkout Git;如果 API 发布后 current release 中缺少该脚本,应先检查 `Genarrative-Api-Build` 的 `archiveArtifacts` 和 `Genarrative-Api-Deploy` 的 `copyArtifacts` 过滤器是否仍包含 `build//scripts/database-backup-to-oss.mjs`,不要只在部署机工作区手工补文件。 +`Genarrative-Api-Build` 的 Jenkins 归档产物必须包含 `build//api-server`、`api-server.sha256`、`release-manifest.json`、`scripts/database-backup-to-oss.mjs`、`scripts/ops/production-health-patrol.mjs`、`scripts/deploy/production-api-deploy.sh`、`scripts/deploy/maintenance-on.sh` 和 `scripts/deploy/maintenance-off.sh`。`deploy/systemd/genarrative-database-backup.service` 从 `/opt/genarrative/current/scripts/database-backup-to-oss.mjs` 执行冷备份,`deploy/systemd/genarrative-health-patrol.service` 从 `/opt/genarrative/current/scripts/ops/production-health-patrol.mjs` 执行巡检;`Genarrative-Api-Deploy` 会从上游 API 构建产物复制部署脚本、备份脚本和巡检脚本,不再在目标机器 checkout Git。如果 API 发布后 current release 中缺少这些脚本,应先检查 `Genarrative-Api-Build` 的 `archiveArtifacts` 和 `Genarrative-Api-Deploy` 的 `copyArtifacts` 过滤器是否仍包含 `build//scripts/database-backup-to-oss.mjs` 与 `build//scripts/ops/production-health-patrol.mjs`,不要只在部署机工作区手工补文件。 `Genarrative-Stdb-Module-Build` 的 Jenkins 归档产物必须包含 `build//spacetime_module.wasm`、`spacetime_module.wasm.sha256`、`release-manifest.json`、`scripts/deploy/production-stdb-publish.sh`、`scripts/deploy/maintenance-on.sh`、`scripts/deploy/maintenance-off.sh` 和 `scripts/database-backup-to-oss.mjs`。`Genarrative-Stdb-Module-Publish` 只通过 `copyArtifacts` 复制这些产物和发布脚本,不再在目标机器 checkout Git;如果 publish 前备份脚本缺失,应先检查 Stdb Build 的归档列表和 Stdb Publish 的复制过滤器。 @@ -275,7 +308,8 @@ dev 服务器上的 Gitea 内网入口固定为 
`http://10.2.0.10/GenarrativeAI/ - `api-server` 生产模板默认 `GENARRATIVE_API_LISTEN_BACKLOG=1024`、`GENARRATIVE_API_WORKER_THREADS=4`;本地未设置 worker threads 时继续使用 Tokio 默认值。 - `GENARRATIVE_API_MAX_CONCURRENT_REQUESTS=512` 开启应用内 HTTP 并发背压;`GENARRATIVE_API_GALLERY_MAX_CONCURRENT_REQUESTS=320`、`GENARRATIVE_API_DETAIL_MAX_CONCURRENT_REQUESTS=64`、`GENARRATIVE_API_ADMIN_MAX_CONCURRENT_REQUESTS=16` 分别限制公开列表、公开详情和后台 API 热路径。超过许可时直接返回 `429 Too Many Requests` 和 `Retry-After: 1`,`/healthz` 与 `/readyz` 不受该限制。这些值不是 RPS 限速;如果压测中 429 上升但内存和 p95 收敛,说明背压正在保护进程。直连 `api-server` 的极高 RPS 压测若出现 `connection refused`,通常已经打到 TCP 监听 / accept 层,应同时检查 backlog、Nginx upstream keepalive 和前置限流。 -- `api-server` 正常运行时 `/healthz` 返回进程存活状态,`/readyz` 返回是否仍接收新流量;收到 `SIGINT` / `SIGTERM` 后会先把 readiness 标记为不可用,再让 Axum 停止接新连接并等待已有 HTTP 请求排空。systemd 仍以 `KillSignal=SIGINT` 停服务,`TimeoutStopSec=90` 作为长请求排空上限。 +- `api-server` 正常运行时 `/healthz` 只返回进程存活状态,`/readyz` 会同时检查进程是否仍接收新流量和 SpacetimeDB 连接租约是否健康;收到 `SIGINT` / `SIGTERM` 后会先把 readiness 标记为不可用,再让 Axum 停止接新连接并等待已有 HTTP 请求排空。systemd 仍以 `KillSignal=SIGINT` 停服务,`TimeoutStopSec=90` 作为长请求排空上限。 +- SpacetimeDB 健康检查默认使用 `GENARRATIVE_SPACETIME_HEALTH_CHECK_TIMEOUT_SECONDS=2` 的短等待窗口,和业务 procedure 的 `GENARRATIVE_SPACETIME_PROCEDURE_TIMEOUT_SECONDS` 分开。`/readyz` 失败时 `details.spacetime.stage` 会标出当前卡住阶段:`pool_acquire`、`connect_build`、`connect_handshake`、`read_model_subscribe`、`procedure_result`、`reducer_result` 或 `read_cache`;`elapsedMs` / `timeoutMs` 用于确认是否命中健康检查窗口。业务请求日志也会写入 `operation_kind`、`operation_name`、`spacetime_stage` 和 `elapsed_ms`,后续 45 秒超时不再只靠 Nginx `request_time=45s` 推断。 - `genarrative-api.service` 设置 `LimitNOFILE=65535`、`TasksMax=2048`;上线后用 `systemctl show genarrative-api.service -p LimitNOFILE -p TasksMax -p TimeoutStopUSec` 和 `cat /proc/$(pidof api-server)/limits` 核对。 - Server provision 不再通过 Windows helper 下载,也不再通过 Linux build 节点中转工具包。`Prepare Provision Tools` 在目标 dev / release agent 工作区内先检查 `/usr/local/bin/otelcol-contrib` 与 
`${SPACETIME_ROOT}/bin/current`:版本已满足时直接复用目标机现有文件生成 `provision-tools/`,只有缺失或版本不匹配时才使用 `PROVISION_DOWNLOADS_DIR` 里的本地包或从配置的下载源准备 SpacetimeDB `2.4.1` / `otelcol-contrib 0.151.0`;如果目标服务器下载需要代理,在 `PROVISION_DOWNLOAD_PROXY` 配置目标机可访问的 HTTP 代理。 - 除 `Genarrative-Server-Provision` 外,`Genarrative-Stdb-Module-Build`、`Genarrative-Web-Build`、`Genarrative-Api-Build`、`Genarrative-*Deploy`、`Genarrative-Database-Import/Export`、`Genarrative-Full-Build-And-Deploy` 和 `Genarrative-Notify-Email` 的生产流水线现都以 Linux agent 为主,仍按各自 Jenkinsfile 的 checkout 口径执行。Server provision 不使用公网备用 Git 源。 diff --git a/jenkins/Jenkinsfile.production-api-build b/jenkins/Jenkinsfile.production-api-build index d399ed2f..91aa966b 100644 --- a/jenkins/Jenkinsfile.production-api-build +++ b/jenkins/Jenkinsfile.production-api-build @@ -104,7 +104,7 @@ pipeline { stage('Archive') { steps { - archiveArtifacts artifacts: "build/${env.EFFECTIVE_BUILD_VERSION}/api-server,build/${env.EFFECTIVE_BUILD_VERSION}/api-server.sha256,build/${env.EFFECTIVE_BUILD_VERSION}/release-manifest.json,build/${env.EFFECTIVE_BUILD_VERSION}/scripts/database-backup-to-oss.mjs,scripts/deploy/production-api-deploy.sh,scripts/deploy/maintenance-on.sh,scripts/deploy/maintenance-off.sh", fingerprint: true + archiveArtifacts artifacts: "build/${env.EFFECTIVE_BUILD_VERSION}/api-server,build/${env.EFFECTIVE_BUILD_VERSION}/api-server.sha256,build/${env.EFFECTIVE_BUILD_VERSION}/release-manifest.json,build/${env.EFFECTIVE_BUILD_VERSION}/scripts/database-backup-to-oss.mjs,build/${env.EFFECTIVE_BUILD_VERSION}/scripts/ops/production-health-patrol.mjs,scripts/deploy/production-api-deploy.sh,scripts/deploy/maintenance-on.sh,scripts/deploy/maintenance-off.sh", fingerprint: true } } diff --git a/jenkins/Jenkinsfile.production-api-deploy b/jenkins/Jenkinsfile.production-api-deploy index db7809ce..152c181f 100644 --- a/jenkins/Jenkinsfile.production-api-deploy +++ b/jenkins/Jenkinsfile.production-api-deploy @@ -65,7 +65,7 @@ pipeline { copyArtifacts( projectName: 
params.BUILD_JOB_NAME, selector: specific(params.BUILD_NUMBER_TO_DEPLOY), - filter: "build/${params.BUILD_VERSION}/api-server,build/${params.BUILD_VERSION}/api-server.sha256,build/${params.BUILD_VERSION}/release-manifest.json,build/${params.BUILD_VERSION}/scripts/database-backup-to-oss.mjs,scripts/deploy/production-api-deploy.sh,scripts/deploy/maintenance-on.sh,scripts/deploy/maintenance-off.sh", + filter: "build/${params.BUILD_VERSION}/api-server,build/${params.BUILD_VERSION}/api-server.sha256,build/${params.BUILD_VERSION}/release-manifest.json,build/${params.BUILD_VERSION}/scripts/database-backup-to-oss.mjs,build/${params.BUILD_VERSION}/scripts/ops/production-health-patrol.mjs,scripts/deploy/production-api-deploy.sh,scripts/deploy/maintenance-on.sh,scripts/deploy/maintenance-off.sh", target: '.', fingerprintArtifacts: true ) diff --git a/package.json b/package.json index 82883e4f..b9055ebf 100644 --- a/package.json +++ b/package.json @@ -27,6 +27,7 @@ "clean": "node -e \"require('fs').rmSync('dist', { recursive: true, force: true })\"", "check:encoding": "node scripts/check-encoding.mjs", "check:spacetime-schema": "node scripts/check-spacetime-schema-guard.mjs", + "check:production-ops": "node scripts/check-production-ops-guardrails.mjs", "assets:child-motion-demo": "node scripts/generate-child-motion-demo-assets.mjs", "assets:match3d-style-references": "node scripts/generate-match3d-style-references.mjs", "check:visual-novel-vn11": "node scripts/check-visual-novel-vn11-negative-scan.mjs", @@ -37,7 +38,7 @@ "lint:guardrails": "npm run lint:eslint", "typecheck": "tsc -p tsconfig.typecheck-guardrails.json --noEmit", "typecheck:guardrails": "npm run typecheck", - "lint": "npm run check:encoding && npm run check:spacetime-schema && npm run lint:eslint && npm run typecheck", + "lint": "npm run check:encoding && npm run check:spacetime-schema && npm run check:production-ops && npm run lint:eslint && npm run typecheck", "lint:fix": "eslint . 
--ext .ts,.tsx,.js,.mjs,.cjs --fix && prettier --write .", "format": "prettier --write .", "format:check": "prettier --check .", diff --git a/scripts/build-production-release.sh b/scripts/build-production-release.sh index 80068924..19924c54 100644 --- a/scripts/build-production-release.sh +++ b/scripts/build-production-release.sh @@ -445,7 +445,7 @@ if [[ "${BUILD_SPACETIME}" -eq 1 ]]; then write_migration_bootstrap_secret_file fi -mkdir -p "${TARGET_DIR}/scripts" "${TARGET_DIR}/deploy" +mkdir -p "${TARGET_DIR}/scripts" "${TARGET_DIR}/scripts/ops" "${TARGET_DIR}/deploy" cp "${SCRIPT_DIR}/deploy/maintenance-on.sh" "${TARGET_DIR}/scripts/maintenance-on.sh" cp "${SCRIPT_DIR}/deploy/maintenance-off.sh" "${TARGET_DIR}/scripts/maintenance-off.sh" cp "${SCRIPT_DIR}/deploy/maintenance-status.sh" "${TARGET_DIR}/scripts/maintenance-status.sh" @@ -466,6 +466,7 @@ copy_required_file "${SCRIPT_DIR}/spacetime-migration-common.mjs" "${TARGET_DIR} copy_required_file "${SCRIPT_DIR}/spacetime-authorize-migration-operator.mjs" "${TARGET_DIR}/scripts/spacetime-authorize-migration-operator.mjs" "数据库迁移授权脚本" copy_required_file "${SCRIPT_DIR}/spacetime-revoke-migration-operator.mjs" "${TARGET_DIR}/scripts/spacetime-revoke-migration-operator.mjs" "数据库迁移撤权脚本" copy_required_file "${SCRIPT_DIR}/database-backup-to-oss.mjs" "${TARGET_DIR}/scripts/database-backup-to-oss.mjs" "数据库 OSS 备份脚本" +copy_required_file "${SCRIPT_DIR}/ops/production-health-patrol.mjs" "${TARGET_DIR}/scripts/ops/production-health-patrol.mjs" "生产健康巡检脚本" copy_required_dir "${REPO_ROOT}/deploy/systemd" "${TARGET_DIR}/deploy/systemd" "systemd 配置" copy_required_dir "${REPO_ROOT}/deploy/nginx" "${TARGET_DIR}/deploy/nginx" "Nginx 配置" @@ -485,7 +486,7 @@ cat >"${TARGET_DIR}/README.md" <&2 @@ -346,6 +348,19 @@ if [[ ! -f "${BACKUP_SCRIPT_SOURCE}" ]]; then fi cp "${BACKUP_SCRIPT_SOURCE}" "${RELEASE_DIR}/scripts/database-backup-to-oss.mjs" chmod 0644 "${RELEASE_DIR}/scripts/database-backup-to-oss.mjs" +if [[ ! 
-f "${HEALTH_PATROL_SCRIPT_SOURCE}" ]]; then + if [[ -f "${WORKSPACE_HEALTH_PATROL_SCRIPT_SOURCE}" ]]; then + echo "[production-api-deploy] 发布产物缺少 scripts/ops/production-health-patrol.mjs,回退使用部署工作区脚本;请重新触发包含该脚本的 API 构建。" >&2 + HEALTH_PATROL_SCRIPT_SOURCE="${WORKSPACE_HEALTH_PATROL_SCRIPT_SOURCE}" + else + echo "[production-api-deploy] 未找到生产健康巡检脚本,跳过复制;genarrative-health-patrol.service 会因脚本缺失而跳过执行。" >&2 + HEALTH_PATROL_SCRIPT_SOURCE="" + fi +fi +if [[ -n "${HEALTH_PATROL_SCRIPT_SOURCE}" ]]; then + cp "${HEALTH_PATROL_SCRIPT_SOURCE}" "${RELEASE_DIR}/scripts/ops/production-health-patrol.mjs" + chmod 0644 "${RELEASE_DIR}/scripts/ops/production-health-patrol.mjs" +fi if [[ -f "${SOURCE_DIR}/release-manifest.json" ]]; then cp "${SOURCE_DIR}/release-manifest.json" "${RELEASE_DIR}/release-manifest.api-server.json" diff --git a/scripts/jenkins-server-provision.sh b/scripts/jenkins-server-provision.sh index 5d3535ed..9f399e84 100755 --- a/scripts/jenkins-server-provision.sh +++ b/scripts/jenkins-server-provision.sh @@ -732,10 +732,20 @@ render_database_backup_service() { deploy/systemd/genarrative-database-backup.service } +render_health_patrol_service() { + local current_escaped + current_escaped="$(escape_sed_replacement "${CURRENT_LINK}")" + sed \ + -e "s|/opt/genarrative/current|${current_escaped}|g" \ + deploy/systemd/genarrative-health-patrol.service +} + require_path deploy/systemd/spacetimedb.service require_path deploy/systemd/genarrative-api.service require_path deploy/systemd/genarrative-database-backup.service require_path deploy/systemd/genarrative-database-backup.timer +require_path deploy/systemd/genarrative-health-patrol.service +require_path deploy/systemd/genarrative-health-patrol.timer require_path deploy/systemd/otelcol-contrib.service require_path deploy/otelcol/genarrative-debug.yaml require_path deploy/nginx/genarrative.conf @@ -754,7 +764,7 @@ echo "[server-provision] target=${DEPLOY_TARGET}, dry_run=${DRY_RUN}, nginx_conf run_cmd id 
require_root_for_real_provision install_nginx_brotli_modules -run_cmd mkdir -p "${SPACETIME_ROOT}" "${RELEASE_ROOT}" "$(dirname "${CURRENT_LINK}")" "$(dirname "${WEB_LINK}")" /etc/genarrative /var/lib/genarrative/maintenance /var/lib/genarrative/auth /var/lib/genarrative/tracking-outbox /var/lib/genarrative/database-backups +run_cmd mkdir -p "${SPACETIME_ROOT}" "${RELEASE_ROOT}" "$(dirname "${CURRENT_LINK}")" "$(dirname "${WEB_LINK}")" /etc/genarrative /var/lib/genarrative/maintenance /var/lib/genarrative/auth /var/lib/genarrative/tracking-outbox /var/lib/genarrative/database-backups /var/lib/genarrative/health-patrol if ! id spacetimedb >/dev/null 2>&1; then run_cmd useradd --system --home-dir "${SPACETIME_ROOT}" --shell /usr/sbin/nologin spacetimedb @@ -786,14 +796,18 @@ sync_spacetime_install "${SPACETIME_ROOT}" spacetimedb_service="$(mktemp)" api_service="$(mktemp)" database_backup_service="$(mktemp)" +health_patrol_service="$(mktemp)" render_spacetimedb_service >"${spacetimedb_service}" render_api_service >"${api_service}" render_database_backup_service >"${database_backup_service}" +render_health_patrol_service >"${health_patrol_service}" install_file "${spacetimedb_service}" /etc/systemd/system/spacetimedb.service 0644 install_file "${api_service}" /etc/systemd/system/genarrative-api.service 0644 install_file "${database_backup_service}" /etc/systemd/system/genarrative-database-backup.service 0644 install_file deploy/systemd/genarrative-database-backup.timer /etc/systemd/system/genarrative-database-backup.timer 0644 -rm -f "${spacetimedb_service}" "${api_service}" "${database_backup_service}" +install_file "${health_patrol_service}" /etc/systemd/system/genarrative-health-patrol.service 0644 +install_file deploy/systemd/genarrative-health-patrol.timer /etc/systemd/system/genarrative-health-patrol.timer 0644 +rm -f "${spacetimedb_service}" "${api_service}" "${database_backup_service}" "${health_patrol_service}" if [[ ! 
-f "${API_ENV_FILE}" ]]; then echo "+ create ${API_ENV_FILE} from example" @@ -828,7 +842,7 @@ if [[ "${ENABLE_SERVICES}" == "true" ]]; then if [[ "${ENABLE_OTELCOL:-true}" == "true" ]]; then run_cmd systemctl enable otelcol-contrib.service fi - run_cmd systemctl enable spacetimedb.service genarrative-api.service genarrative-database-backup.timer + run_cmd systemctl enable spacetimedb.service genarrative-api.service genarrative-database-backup.timer genarrative-health-patrol.timer if [[ "${ENABLE_OTELCOL:-true}" == "true" ]]; then run_cmd systemctl restart otelcol-contrib.service fi diff --git a/scripts/ops/production-health-patrol.mjs b/scripts/ops/production-health-patrol.mjs new file mode 100644 index 00000000..219d8e29 --- /dev/null +++ b/scripts/ops/production-health-patrol.mjs @@ -0,0 +1,477 @@ +#!/usr/bin/env node + +import {execFile} from 'node:child_process'; +import http from 'node:http'; +import https from 'node:https'; +import {mkdir, writeFile} from 'node:fs/promises'; +import {dirname} from 'node:path'; + +const STATUS_RANK = { + OK: 0, + WARNING: 1, + CRITICAL: 2, +}; + +const DEFAULT_PUBLIC_PATHS = [ + '/api/creation-entry/config', + '/api/runtime/puzzle/gallery', + '/api/runtime/custom-world-gallery', +]; + +const DEFAULT_SERVICES = [ + 'genarrative-api.service', + 'spacetimedb.service', + 'nginx.service', +]; + +function usage() { + console.log(`Usage: + node scripts/ops/production-health-patrol.mjs [options] + +Options: + --api-base-url API direct base URL, default http://127.0.0.1:8082 + --spacetime-base-url SpacetimeDB base URL, default http://127.0.0.1:3101 + --public-base-url Base URL for public path probes (API direct or Nginx/public), default GENARRATIVE_HEALTH_PATROL_API_BASE_URL or http://127.0.0.1:8082 + --public-path Public API path to probe; repeatable + --status-file Write the last patrol result as JSON + --timeout-ms HTTP/command timeout, default 5000 + --slow-ms Mark successful probes slower than this as WARNING, default 3000 + --fail-on-warning Exit 1 when the total status is WARNING + --skip-journal Skip recent 
journal error scan + --json Print JSON instead of text +`); +} + +function readBoolEnv(name, fallback = false) { + const value = process.env[name]; + if (!value) { + return fallback; + } + return ['1', 'true', 'yes', 'on'].includes(value.trim().toLowerCase()); +} + +function parsePositiveInt(raw, fallback) { + const value = Number.parseInt(String(raw ?? ''), 10); + return Number.isFinite(value) && value > 0 ? value : fallback; +} + +function parseArgs(argv) { + const config = { + apiBaseUrl: + process.env.GENARRATIVE_HEALTH_PATROL_API_BASE_URL || + 'http://127.0.0.1:8082', + spacetimeBaseUrl: + process.env.GENARRATIVE_HEALTH_PATROL_SPACETIME_BASE_URL || + 'http://127.0.0.1:3101', + publicBaseUrl: + process.env.GENARRATIVE_HEALTH_PATROL_PUBLIC_BASE_URL || + process.env.GENARRATIVE_HEALTH_PATROL_API_BASE_URL || + 'http://127.0.0.1:8082', + publicPaths: [], + statusFile: process.env.GENARRATIVE_HEALTH_PATROL_STATUS_FILE || '', + timeoutMs: parsePositiveInt( + process.env.GENARRATIVE_HEALTH_PATROL_TIMEOUT_MS, + 5000, + ), + slowMs: parsePositiveInt( + process.env.GENARRATIVE_HEALTH_PATROL_SLOW_MS, + 3000, + ), + failOnWarning: readBoolEnv('GENARRATIVE_HEALTH_PATROL_FAIL_ON_WARNING'), + skipJournal: readBoolEnv('GENARRATIVE_HEALTH_PATROL_SKIP_JOURNAL'), + json: false, + webhookUrl: process.env.GENARRATIVE_HEALTH_PATROL_WEBHOOK_URL || '', + }; + + for (let index = 0; index < argv.length; index += 1) { + const arg = argv[index]; + switch (arg) { + case '-h': + case '--help': + usage(); + process.exit(0); + break; + case '--api-base-url': + config.apiBaseUrl = requireValue(argv, ++index, arg); + break; + case '--spacetime-base-url': + config.spacetimeBaseUrl = requireValue(argv, ++index, arg); + break; + case '--public-base-url': + config.publicBaseUrl = requireValue(argv, ++index, arg); + break; + case '--public-path': + config.publicPaths.push(requireValue(argv, ++index, arg)); + break; + case '--status-file': + config.statusFile = requireValue(argv, ++index, arg); + 
break; + case '--timeout-ms': + config.timeoutMs = parsePositiveInt(requireValue(argv, ++index, arg), 5000); + break; + case '--slow-ms': + config.slowMs = parsePositiveInt(requireValue(argv, ++index, arg), 3000); + break; + case '--fail-on-warning': + config.failOnWarning = true; + break; + case '--skip-journal': + config.skipJournal = true; + break; + case '--json': + config.json = true; + break; + default: + throw new Error(`未知参数: ${arg}`); + } + } + + if (config.publicPaths.length === 0) { + config.publicPaths = DEFAULT_PUBLIC_PATHS; + } + + return config; +} + +function requireValue(argv, index, flag) { + const value = argv[index]; + if (!value || value.startsWith('--')) { + throw new Error(`${flag} 缺少参数值`); + } + return value; +} + +function joinUrl(baseUrl, path) { + const base = baseUrl.endsWith('/') ? baseUrl.slice(0, -1) : baseUrl; + const suffix = path.startsWith('/') ? path : `/${path}`; + return `${base}${suffix}`; +} + +function maxStatus(checks) { + return checks.reduce((current, check) => { + return STATUS_RANK[check.status] > STATUS_RANK[current] ? check.status : current; + }, 'OK'); +} + +function checkResult(name, status, summary, details = {}) { + return { + name, + status, + summary, + ...details, + }; +} + +function runCommand(command, args, timeoutMs) { + return new Promise((resolve) => { + execFile( + command, + args, + { + timeout: timeoutMs, + windowsHide: true, + maxBuffer: 256 * 1024, + }, + (error, stdout, stderr) => { + resolve({ + command: [command, ...args].join(' '), + code: + typeof error?.code === 'number' + ? error.code + : error + ? 1 + : 0, + signal: error?.signal || '', + stdout: String(stdout || ''), + stderr: String(stderr || ''), + timedOut: Boolean(error?.killed), + error: error ? 
error.message : '', + }); + }, + ); + }); +} + +async function checkService(serviceName, timeoutMs) { + const result = await runCommand( + 'systemctl', + ['is-active', serviceName], + timeoutMs, + ); + const state = result.stdout.trim() || result.stderr.trim() || result.error; + if (result.code === 0 && state === 'active') { + return checkResult(`service:${serviceName}`, 'OK', 'active', { + command: result.command, + }); + } + + return checkResult( + `service:${serviceName}`, + 'CRITICAL', + `服务状态异常: ${state || `exit ${result.code}`}`, + { + command: result.command, + stderr: result.stderr.trim(), + }, + ); +} + +function requestUrl(url, timeoutMs) { + return new Promise((resolve) => { + const startedAt = Date.now(); + const parsed = new URL(url); + const client = parsed.protocol === 'https:' ? https : http; + const request = client.request( + parsed, + { + method: 'GET', + timeout: timeoutMs, + headers: { + 'User-Agent': 'genarrative-health-patrol/1.0', + Accept: 'application/json,text/plain,*/*', + }, + }, + (response) => { + let body = ''; + response.setEncoding('utf8'); + response.on('data', (chunk) => { + if (body.length < 2048) { + body += chunk; + } + }); + response.on('end', () => { + resolve({ + elapsedMs: Date.now() - startedAt, + statusCode: response.statusCode || 0, + body: body.slice(0, 2048), + }); + }); + }, + ); + + request.on('timeout', () => { + request.destroy(new Error(`timeout after ${timeoutMs}ms`)); + }); + request.on('error', (error) => { + resolve({ + elapsedMs: Date.now() - startedAt, + error: error.message, + }); + }); + request.end(); + }); +} + +async function checkHttp(name, url, config) { + const result = await requestUrl(url, config.timeoutMs); + const curlCommand = `curl -fsS --max-time ${Math.ceil(config.timeoutMs / 1000)} ${url}`; + + if (result.error) { + return checkResult(name, 'CRITICAL', `请求失败: ${result.error}`, { + command: curlCommand, + elapsedMs: result.elapsedMs, + }); + } + + const ok = result.statusCode >= 200 && 
result.statusCode < 300; + if (!ok) { + return checkResult( + name, + 'CRITICAL', + `HTTP ${result.statusCode},耗时 ${result.elapsedMs}ms`, + { + command: curlCommand, + elapsedMs: result.elapsedMs, + body: result.body.trim(), + }, + ); + } + + if (result.elapsedMs > config.slowMs) { + return checkResult( + name, + 'WARNING', + `HTTP ${result.statusCode} 但耗时偏高: ${result.elapsedMs}ms`, + { + command: curlCommand, + elapsedMs: result.elapsedMs, + }, + ); + } + + return checkResult(name, 'OK', `HTTP ${result.statusCode} ${result.elapsedMs}ms`, { + command: curlCommand, + elapsedMs: result.elapsedMs, + }); +} + +async function checkRecentJournal(config) { + const args = [ + '-u', + 'genarrative-api.service', + '-u', + 'spacetimedb.service', + '-u', + 'nginx.service', + '--since', + '15 minutes ago', + '-p', + 'err..alert', + '--no-pager', + '-o', + 'short-iso', + '-n', + '20', + ]; + const result = await runCommand('journalctl', args, config.timeoutMs); + + if (result.code !== 0) { + return checkResult('journal:recent-errors', 'WARNING', '无法读取最近错误日志', { + command: result.command, + stderr: result.stderr.trim() || result.error, + }); + } + + const lines = result.stdout + .split('\n') + .map((line) => line.trim()) + .filter((line) => line && line !== '-- No entries --'); + + if (lines.length === 0) { + return checkResult('journal:recent-errors', 'OK', '最近 15 分钟无 err..alert 日志', { + command: result.command, + }); + } + + return checkResult( + 'journal:recent-errors', + 'WARNING', + `最近 15 分钟有 ${lines.length} 条 err..alert 日志`, + { + command: result.command, + lines, + }, + ); +} + +async function writeStatusFile(statusFile, payload) { + if (!statusFile) { + return; + } + await mkdir(dirname(statusFile), {recursive: true}); + await writeFile(statusFile, `${JSON.stringify(payload, null, 2)}\n`, 'utf8'); +} + +async function notifyWebhook(config, payload) { + if (!config.webhookUrl || payload.status === 'OK') { + return; + } + + const body = JSON.stringify(payload); + const 
parsed = new URL(config.webhookUrl); + const client = parsed.protocol === 'https:' ? https : http; + + await new Promise((resolve) => { + const request = client.request( + parsed, + { + method: 'POST', + timeout: config.timeoutMs, + headers: { + 'Content-Type': 'application/json', + 'Content-Length': Buffer.byteLength(body), + }, + }, + (response) => { + response.resume(); + response.on('end', resolve); + }, + ); + request.on('timeout', () => { + request.destroy(new Error(`timeout after ${config.timeoutMs}ms`)); + }); + request.on('error', (error) => { + console.error(`[health-patrol] webhook notify failed: ${error.message}`); + resolve(); + }); + request.end(body); + }); +} + +function printText(payload) { + console.log(`[health-patrol] ${payload.status} ${payload.checkedAt}`); + for (const check of payload.checks) { + console.log(`[${check.status}] ${check.name}: ${check.summary}`); + if (check.command && check.status !== 'OK') { + console.log(` command: ${check.command}`); + } + if (check.stderr) { + console.log(` stderr: ${check.stderr}`); + } + if (check.body) { + console.log(` body: ${check.body}`); + } + if (Array.isArray(check.lines) && check.lines.length > 0) { + for (const line of check.lines) { + console.log(` ${line}`); + } + } + } +} + +async function main() { + const config = parseArgs(process.argv.slice(2)); + const checks = []; + + for (const serviceName of DEFAULT_SERVICES) { + checks.push(await checkService(serviceName, config.timeoutMs)); + } + + checks.push(await checkHttp('api:/healthz', joinUrl(config.apiBaseUrl, '/healthz'), config)); + checks.push(await checkHttp('api:/readyz', joinUrl(config.apiBaseUrl, '/readyz'), config)); + checks.push( + await checkHttp( + 'spacetimedb:/v1/ping', + joinUrl(config.spacetimeBaseUrl, '/v1/ping'), + config, + ), + ); + + for (const path of config.publicPaths) { + checks.push( + await checkHttp(`public:${path}`, joinUrl(config.publicBaseUrl, path), config), + ); + } + + if (!config.skipJournal) { + 
checks.push(await checkRecentJournal(config)); + } + + const payload = { + status: maxStatus(checks), + checkedAt: new Date().toISOString(), + host: process.env.HOSTNAME || '', + checks, + }; + + await writeStatusFile(config.statusFile, payload); + await notifyWebhook(config, payload); + + if (config.json) { + console.log(JSON.stringify(payload, null, 2)); + } else { + printText(payload); + } + + if (payload.status === 'CRITICAL') { + process.exit(2); + } + if (payload.status === 'WARNING' && config.failOnWarning) { + process.exit(1); + } +} + +main().catch((error) => { + console.error(`[health-patrol] CRITICAL ${error instanceof Error ? error.message : String(error)}`); + process.exit(2); +}); diff --git a/server-rs/crates/api-server/src/app.rs b/server-rs/crates/api-server/src/app.rs index a68f6db5..d9b4b0e3 100644 --- a/server-rs/crates/api-server/src/app.rs +++ b/server-rs/crates/api-server/src/app.rs @@ -269,6 +269,7 @@ mod tests { }; use reqwest::Client; use serde_json::Value; + use spacetime_client::{SpacetimeClientHealthSnapshot, SpacetimeClientStage}; use time::OffsetDateTime; use tokio::net::TcpListener; use tower::ServiceExt; @@ -724,6 +725,45 @@ mod tests { ); } + #[tokio::test] + async fn readyz_reports_spacetime_health_stage() { + let state = AppState::new(AppConfig::default()).expect("state should build"); + state.set_test_spacetime_health(SpacetimeClientHealthSnapshot { + ok: false, + stage: SpacetimeClientStage::ProcedureResult, + checked_at_micros: 1_713_680_000_000_000, + elapsed_ms: 2_000, + timeout_ms: 2_000, + error: Some("SpacetimeDB procedure 调用超时".to_string()), + last_success_at_micros: Some(1_713_679_999_000_000), + last_error: Some("SpacetimeDB procedure 调用超时".to_string()), + }); + let app = build_router(state); + + let response = app + .oneshot( + Request::builder() + .uri("/readyz") + .header("x-request-id", "req-ready-spacetime") + .body(Body::empty()) + .expect("readyz request should build"), + ) + .await + .expect("readyz request 
should succeed"); + + assert_eq!(response.status(), StatusCode::SERVICE_UNAVAILABLE); + let body = read_json_response(response).await; + assert_eq!(body["error"]["details"]["reason"], "spacetime_unhealthy"); + assert_eq!( + body["error"]["details"]["spacetime"]["stage"], + "procedure_result" + ); + assert_eq!( + body["error"]["details"]["spacetime"]["timeoutMs"], + Value::from(2_000) + ); + } + #[tokio::test] async fn creative_agent_draft_edit_rejects_unconfirmed_template_session() { let app = build_internal_creative_agent_app(); diff --git a/server-rs/crates/api-server/src/config.rs b/server-rs/crates/api-server/src/config.rs index 46a5f9f0..3ca4a2a6 100644 --- a/server-rs/crates/api-server/src/config.rs +++ b/server-rs/crates/api-server/src/config.rs @@ -12,6 +12,7 @@ use platform_speech::{ const DEFAULT_INTERNAL_API_SECRET: &str = "genarrative-dev-internal-bridge"; const SPACETIME_LOCAL_CONFIG_FILE: &str = "spacetime.local.json"; +const DEFAULT_SPACETIME_HEALTH_CHECK_TIMEOUT_SECONDS: u64 = 2; pub(crate) const DEFAULT_VECTOR_ENGINE_IMAGE_REQUEST_TIMEOUT_MS: u64 = 1_000_000; // 集中管理 api-server 的启动配置,避免入口层直接散落环境变量解析逻辑。 @@ -118,6 +119,7 @@ pub struct AppConfig { pub spacetime_token: Option, pub spacetime_pool_size: u32, pub spacetime_procedure_timeout: Duration, + pub spacetime_health_check_timeout: Duration, pub llm_provider: LlmProvider, pub llm_base_url: String, pub llm_api_key: Option, @@ -276,6 +278,9 @@ impl Default for AppConfig { spacetime_token: None, spacetime_pool_size: 4, spacetime_procedure_timeout: Duration::from_secs(30), + spacetime_health_check_timeout: Duration::from_secs( + DEFAULT_SPACETIME_HEALTH_CHECK_TIMEOUT_SECONDS, + ), llm_provider: LlmProvider::Ark, llm_base_url: String::new(), llm_api_key: None, @@ -704,6 +709,12 @@ impl AppConfig { config.spacetime_procedure_timeout = Duration::from_secs(spacetime_procedure_timeout_seconds); } + if let Some(spacetime_health_check_timeout_seconds) = + 
read_first_duration_seconds_env(&["GENARRATIVE_SPACETIME_HEALTH_CHECK_TIMEOUT_SECONDS"]) + { + config.spacetime_health_check_timeout = + Duration::from_secs(spacetime_health_check_timeout_seconds); + } if let Some(llm_provider) = read_first_llm_provider_env(&["GENARRATIVE_LLM_PROVIDER", "LLM_PROVIDER"]) @@ -1610,6 +1621,26 @@ mod tests { } } + #[test] + fn from_env_reads_spacetime_health_check_timeout() { + let _guard = ENV_LOCK + .get_or_init(|| Mutex::new(())) + .lock() + .expect("env lock should not poison"); + + unsafe { + std::env::remove_var("GENARRATIVE_SPACETIME_HEALTH_CHECK_TIMEOUT_SECONDS"); + std::env::set_var("GENARRATIVE_SPACETIME_HEALTH_CHECK_TIMEOUT_SECONDS", "3"); + } + + let config = AppConfig::from_env(); + assert_eq!(config.spacetime_health_check_timeout.as_secs(), 3); + + unsafe { + std::env::remove_var("GENARRATIVE_SPACETIME_HEALTH_CHECK_TIMEOUT_SECONDS"); + } + } + #[test] fn default_keeps_structured_llm_web_search_disabled() { let config = AppConfig::default(); diff --git a/server-rs/crates/api-server/src/health.rs b/server-rs/crates/api-server/src/health.rs index ee83a012..c6df5026 100644 --- a/server-rs/crates/api-server/src/health.rs +++ b/server-rs/crates/api-server/src/health.rs @@ -10,6 +10,7 @@ use crate::{ api_response::json_success_body, http_error::AppError, request_context::RequestContext, state::AppState, }; +use spacetime_client::SpacetimeClientHealthSnapshot; pub async fn health_check(Extension(request_context): Extension) -> Json { json_success_body( @@ -25,23 +26,49 @@ pub async fn readiness_check( State(state): State, Extension(request_context): Extension, ) -> Response { - if state.is_ready() { + if !state.is_ready() { + return AppError::from_status(StatusCode::SERVICE_UNAVAILABLE) + .with_message("api-server 正在退出,不再接收新流量") + .with_details(json!({ + "reason": "api_server_draining", + "ready": false, + })) + .into_response_with_context(Some(&request_context)); + } + + let spacetime_health = 
state.spacetime_health_check().await; + if spacetime_health.ok { return json_success_body( Some(&request_context), json!({ "ok": true, "ready": true, "service": "genarrative-api-server", + "spacetime": spacetime_health_to_json(&spacetime_health), }), ) .into_response(); } AppError::from_status(StatusCode::SERVICE_UNAVAILABLE) - .with_message("api-server 正在退出,不再接收新流量") + .with_message("SpacetimeDB 连接健康检查失败,api-server 暂不接收新流量") .with_details(json!({ - "reason": "api_server_draining", + "reason": "spacetime_unhealthy", "ready": false, + "spacetime": spacetime_health_to_json(&spacetime_health), })) .into_response_with_context(Some(&request_context)) } + +fn spacetime_health_to_json(snapshot: &SpacetimeClientHealthSnapshot) -> Value { + json!({ + "ok": snapshot.ok, + "stage": snapshot.stage.as_str(), + "checkedAtMicros": snapshot.checked_at_micros, + "elapsedMs": snapshot.elapsed_ms, + "timeoutMs": snapshot.timeout_ms, + "error": snapshot.error, + "lastSuccessAtMicros": snapshot.last_success_at_micros, + "lastError": snapshot.last_error, + }) +} diff --git a/server-rs/crates/api-server/src/state.rs b/server-rs/crates/api-server/src/state.rs index 49d0e381..3858914a 100644 --- a/server-rs/crates/api-server/src/state.rs +++ b/server-rs/crates/api-server/src/state.rs @@ -31,7 +31,9 @@ use platform_wechat::{WechatClient, WechatConfig, pay::WechatPayClient}; use serde_json::Value; use shared_contracts::creation_entry_config::CreationEntryConfigResponse; use shared_contracts::creative_agent::CreativeAgentSessionSnapshot; -use spacetime_client::{SpacetimeClient, SpacetimeClientConfig, SpacetimeClientError}; +use spacetime_client::{ + SpacetimeClient, SpacetimeClientConfig, SpacetimeClientError, SpacetimeClientHealthSnapshot, +}; use time::OffsetDateTime; use tokio::sync::{Semaphore, broadcast}; use tracing::{info, warn}; @@ -242,6 +244,8 @@ pub struct AppStateInner { refresh_cookie_config: RefreshCookieConfig, #[cfg(test)] test_creation_entry_config: Arc>>, + #[cfg(test)] + 
test_spacetime_health: Arc>>, oss_client: Option, #[cfg_attr(test, allow(dead_code))] auth_store: InMemoryAuthStore, @@ -418,6 +422,10 @@ impl AppState { test_creation_entry_config: Arc::new(Mutex::new(Some( crate::creation_entry_config::test_creation_entry_config_response(), ))), + #[cfg(test)] + test_spacetime_health: Arc::new(Mutex::new(Some( + SpacetimeClientHealthSnapshot::healthy_for_test(), + ))), oss_client, auth_store, password_entry_service, @@ -467,6 +475,30 @@ impl AppState { self.ready.store(false, Ordering::Release); } + pub async fn spacetime_health_check(&self) -> SpacetimeClientHealthSnapshot { + #[cfg(test)] + if let Some(snapshot) = self + .test_spacetime_health + .lock() + .expect("test spacetime health should lock") + .clone() + { + return snapshot; + } + + self.spacetime_client + .health_check(self.config.spacetime_health_check_timeout) + .await + } + + #[cfg(test)] + pub(crate) fn set_test_spacetime_health(&self, snapshot: SpacetimeClientHealthSnapshot) { + *self + .test_spacetime_health + .lock() + .expect("test spacetime health should lock") = Some(snapshot); + } + pub async fn upsert_creation_entry_type_config( &self, input: module_runtime::CreationEntryTypeAdminUpsertInput, diff --git a/server-rs/crates/spacetime-client/src/lib.rs b/server-rs/crates/spacetime-client/src/lib.rs index 20361ba8..6b43db21 100644 --- a/server-rs/crates/spacetime-client/src/lib.rs +++ b/server-rs/crates/spacetime-client/src/lib.rs @@ -105,8 +105,8 @@ pub mod auth; pub mod bark_battle; pub use bark_battle::{ BarkBattleDraftConfigUpsertRecordInput, BarkBattleDraftCreateRecordInput, - BarkBattleRunFinishRecordInput, BarkBattleRunStartRecordInput, - BarkBattleWorkDeleteRecordInput, BarkBattleWorkPublishRecordInput, + BarkBattleRunFinishRecordInput, BarkBattleRunStartRecordInput, BarkBattleWorkDeleteRecordInput, + BarkBattleWorkPublishRecordInput, }; pub mod big_fish; pub mod combat; @@ -132,7 +132,7 @@ use std::{ sync::atomic::{AtomicBool, Ordering}, sync::{Arc, 
Mutex}, thread::JoinHandle, - time::Duration, + time::{Duration, Instant}, }; use module_ai::{ @@ -241,6 +241,7 @@ use tokio::{ sync::{OwnedSemaphorePermit, RwLock, Semaphore, oneshot}, time::timeout, }; +use tracing::warn; use crate::module_bindings::*; @@ -253,6 +254,60 @@ pub struct SpacetimeClientConfig { pub procedure_timeout: Duration, } +#[derive(Clone, Copy, Debug, PartialEq, Eq)] +pub enum SpacetimeClientStage { + Ready, + PoolAcquire, + ConnectBuild, + ConnectHandshake, + ReadModelSubscribe, + ProcedureResult, + ReducerResult, + ReadCache, +} + +impl SpacetimeClientStage { + pub fn as_str(self) -> &'static str { + match self { + Self::Ready => "ready", + Self::PoolAcquire => "pool_acquire", + Self::ConnectBuild => "connect_build", + Self::ConnectHandshake => "connect_handshake", + Self::ReadModelSubscribe => "read_model_subscribe", + Self::ProcedureResult => "procedure_result", + Self::ReducerResult => "reducer_result", + Self::ReadCache => "read_cache", + } + } +} + +#[derive(Clone, Debug, PartialEq, Eq)] +pub struct SpacetimeClientHealthSnapshot { + pub ok: bool, + pub stage: SpacetimeClientStage, + pub checked_at_micros: i64, + pub elapsed_ms: u64, + pub timeout_ms: u64, + pub error: Option, + pub last_success_at_micros: Option, + pub last_error: Option, +} + +impl SpacetimeClientHealthSnapshot { + pub fn healthy_for_test() -> Self { + Self { + ok: true, + stage: SpacetimeClientStage::Ready, + checked_at_micros: current_unix_micros(), + elapsed_ms: 0, + timeout_ms: 0, + error: None, + last_success_at_micros: Some(current_unix_micros()), + last_error: None, + } + } +} + #[derive(Clone, Debug, PartialEq, Eq)] pub struct AuthStoreSnapshotRecord { pub snapshot_json: Option, @@ -270,6 +325,7 @@ pub struct AuthStoreSnapshotImportRecord { pub struct SpacetimeClient { config: SpacetimeClientConfig, pool: Arc, + health_state: Arc>, creation_entry_config_cache: Arc>>, custom_world_gallery_legacy_sync_attempted: Arc, } @@ -296,6 +352,24 @@ struct 
SpacetimeConnectionPool { permits: Arc, } +#[derive(Debug, Default)] +struct SpacetimeClientHealthState { + last_success_at_micros: Option, + last_error: Option, +} + +#[derive(Debug)] +struct SpacetimeStageError { + stage: SpacetimeClientStage, + error: SpacetimeClientError, +} + +impl SpacetimeStageError { + fn new(stage: SpacetimeClientStage, error: SpacetimeClientError) -> Self { + Self { stage, error } + } +} + struct PooledConnectionSlot { connection: Option, in_use: bool, @@ -341,6 +415,7 @@ impl SpacetimeClient { Self { config, pool, + health_state: Arc::new(RwLock::new(SpacetimeClientHealthState::default())), creation_entry_config_cache: Arc::new(RwLock::new(None)), custom_world_gallery_legacy_sync_attempted: Arc::new(AtomicBool::new(false)), } @@ -354,29 +429,58 @@ impl SpacetimeClient { where T: Send + 'static, { + let started_at = Instant::now(); let metrics_guard = telemetry::begin_procedure(procedure); let (sender, receiver) = oneshot::channel(); let result_sender = Arc::new(Mutex::new(Some(sender))); - let final_result = match self.acquire_connection().await { + let final_result = match self + .acquire_connection_with_timeout(self.config.procedure_timeout) + .await + { Ok(lease) => { - let result = if let Some(connection) = lease.connection.as_ref() { + let (result, failed_stage) = if let Some(connection) = lease.connection.as_ref() { call(&connection.connection, result_sender.clone()); - match timeout(self.config.procedure_timeout, receiver).await { - Ok(inner) => match inner { - Ok(value) => value, - Err(_) => Err(SpacetimeClientError::ConnectDropped), + let stage = SpacetimeClientStage::ProcedureResult; + ( + match timeout(self.config.procedure_timeout, receiver).await { + Ok(inner) => match inner { + Ok(value) => value, + Err(_) => Err(SpacetimeClientError::ConnectDropped), + }, + Err(_) => Err(Self::resolve_timeout_error(Some(connection), stage)), }, - Err(_) => Err(Self::resolve_timeout_error(Some(connection))), - } + stage, + ) } else { - 
Err(SpacetimeClientError::Runtime( - "SpacetimeDB 连接租约缺少连接".to_string(), - )) + ( + Err(SpacetimeClientError::Runtime( + "SpacetimeDB 连接租约缺少连接".to_string(), + )), + SpacetimeClientStage::ProcedureResult, + ) }; self.release_connection(lease).await; + if let Err(error) = &result { + log_spacetime_client_failure( + "procedure", + procedure, + failed_stage, + started_at, + error, + ); + } result } - Err(error) => Err(error), + Err(error) => { + log_spacetime_client_failure( + "procedure", + procedure, + error.stage, + started_at, + &error.error, + ); + Err(error.error) + } }; metrics_guard.finish(&final_result); @@ -388,29 +492,58 @@ impl SpacetimeClient { procedure: &'static str, call: impl FnOnce(&DbConnection, ReducerResultSender) + Send + 'static, ) -> Result<(), SpacetimeClientError> { + let started_at = Instant::now(); let metrics_guard = telemetry::begin_procedure(procedure); let (sender, receiver) = oneshot::channel(); let result_sender = Arc::new(Mutex::new(Some(sender))); - let final_result = match self.acquire_connection().await { + let final_result = match self + .acquire_connection_with_timeout(self.config.procedure_timeout) + .await + { Ok(lease) => { - let result = if let Some(connection) = lease.connection.as_ref() { + let (result, failed_stage) = if let Some(connection) = lease.connection.as_ref() { call(&connection.connection, result_sender.clone()); - match timeout(self.config.procedure_timeout, receiver).await { - Ok(inner) => match inner { - Ok(value) => value, - Err(_) => Err(SpacetimeClientError::ConnectDropped), + let stage = SpacetimeClientStage::ReducerResult; + ( + match timeout(self.config.procedure_timeout, receiver).await { + Ok(inner) => match inner { + Ok(value) => value, + Err(_) => Err(SpacetimeClientError::ConnectDropped), + }, + Err(_) => Err(Self::resolve_timeout_error(Some(connection), stage)), }, - Err(_) => Err(Self::resolve_timeout_error(Some(connection))), - } + stage, + ) } else { - Err(SpacetimeClientError::Runtime( - 
"SpacetimeDB 连接租约缺少连接".to_string(), - )) + ( + Err(SpacetimeClientError::Runtime( + "SpacetimeDB 连接租约缺少连接".to_string(), + )), + SpacetimeClientStage::ReducerResult, + ) }; self.release_connection(lease).await; + if let Err(error) = &result { + log_spacetime_client_failure( + "reducer", + procedure, + failed_stage, + started_at, + error, + ); + } result } - Err(error) => Err(error), + Err(error) => { + log_spacetime_client_failure( + "reducer", + procedure, + error.stage, + started_at, + &error.error, + ); + Err(error.error) + } }; metrics_guard.finish(&final_result); @@ -425,11 +558,22 @@ impl SpacetimeClient { where T: Send + 'static, { + let started_at = Instant::now(); let metrics_guard = telemetry::begin_read(read_name); - let lease = match self.acquire_connection().await { + let lease = match self + .acquire_connection_with_timeout(self.config.procedure_timeout) + .await + { Ok(lease) => lease, Err(error) => { - let final_result = Err(error); + log_spacetime_client_failure( + "read", + read_name, + error.stage, + started_at, + &error.error, + ); + let final_result = Err(error.error); metrics_guard.finish(&final_result); return final_result; } @@ -443,6 +587,15 @@ impl SpacetimeClient { }; self.release_connection(lease).await; + if let Err(error) = &final_result { + log_spacetime_client_failure( + "read", + read_name, + SpacetimeClientStage::ReadCache, + started_at, + error, + ); + } metrics_guard.finish(&final_result); final_result } @@ -455,14 +608,75 @@ impl SpacetimeClient { self.creation_entry_config_cache.read().await.clone() } - async fn acquire_connection(&self) -> Result { - let permit = timeout( - self.config.procedure_timeout, - self.pool.permits.clone().acquire_owned(), - ) - .await - .map_err(|_| SpacetimeClientError::Timeout)? 
- .map_err(|error| SpacetimeClientError::Runtime(error.to_string()))?; + pub async fn health_check(&self, probe_timeout: Duration) -> SpacetimeClientHealthSnapshot { + let timeout = if probe_timeout.is_zero() { + DEFAULT_PROCEDURE_TIMEOUT + } else { + probe_timeout + }; + let started_at = Instant::now(); + let checked_at_micros = current_unix_micros(); + let result = self.acquire_connection_with_timeout(timeout).await; + match result { + Ok(lease) => { + self.release_connection(lease).await; + let mut health_state = self.health_state.write().await; + health_state.last_success_at_micros = Some(checked_at_micros); + health_state.last_error = None; + SpacetimeClientHealthSnapshot { + ok: true, + stage: SpacetimeClientStage::Ready, + checked_at_micros, + elapsed_ms: duration_millis_u64(started_at.elapsed()), + timeout_ms: duration_millis_u64(timeout), + error: None, + last_success_at_micros: health_state.last_success_at_micros, + last_error: health_state.last_error.clone(), + } + } + Err(error) => { + log_spacetime_client_failure( + "health_check", + "spacetime_connection", + error.stage, + started_at, + &error.error, + ); + let mut health_state = self.health_state.write().await; + let error_message = error.error.to_string(); + health_state.last_error = Some(error_message.clone()); + SpacetimeClientHealthSnapshot { + ok: false, + stage: error.stage, + checked_at_micros, + elapsed_ms: duration_millis_u64(started_at.elapsed()), + timeout_ms: duration_millis_u64(timeout), + error: Some(error_message), + last_success_at_micros: health_state.last_success_at_micros, + last_error: health_state.last_error.clone(), + } + } + } + } + + async fn acquire_connection_with_timeout( + &self, + operation_timeout: Duration, + ) -> Result { + let permit = timeout(operation_timeout, self.pool.permits.clone().acquire_owned()) + .await + .map_err(|_| { + SpacetimeStageError::new( + SpacetimeClientStage::PoolAcquire, + SpacetimeClientError::Timeout, + ) + })? 
+ .map_err(|error| { + SpacetimeStageError::new( + SpacetimeClientStage::PoolAcquire, + SpacetimeClientError::Runtime(error.to_string()), + ) + })?; loop { for (slot_index, slot) in self.pool.slots.iter().enumerate() { @@ -480,7 +694,7 @@ impl SpacetimeClient { let connection = if let Some(connection) = reusable_connection { connection } else { - match self.build_pooled_connection().await { + match self.build_pooled_connection(operation_timeout).await { Ok(connection) => connection, Err(error) => { let mut slot_guard = self.pool.slots[slot_index].lock().await; @@ -502,7 +716,10 @@ impl SpacetimeClient { } } - async fn build_pooled_connection(&self) -> Result { + async fn build_pooled_connection( + &self, + operation_timeout: Duration, + ) -> Result { let config = self.config.clone(); let broken = Arc::new(AtomicBool::new(false)); let (sender, receiver) = oneshot::channel::>(); @@ -510,7 +727,7 @@ impl SpacetimeClient { let broken_flag = broken.clone(); let disconnect_sender = connect_sender.clone(); let connection = timeout( - self.config.procedure_timeout, + operation_timeout, tokio::task::spawn_blocking(move || { DbConnection::builder() .with_uri(config.server_url) @@ -534,17 +751,41 @@ impl SpacetimeClient { }), ) .await - .map_err(|_| SpacetimeClientError::Timeout)? - .map_err(|error| SpacetimeClientError::Runtime(error.to_string()))??; + .map_err(|_| { + SpacetimeStageError::new( + SpacetimeClientStage::ConnectBuild, + SpacetimeClientError::Timeout, + ) + })? + .map_err(|error| { + SpacetimeStageError::new( + SpacetimeClientStage::ConnectBuild, + SpacetimeClientError::Runtime(error.to_string()), + ) + })? + .map_err(|error| SpacetimeStageError::new(SpacetimeClientStage::ConnectBuild, error))?; let runner = connection.run_threaded(); - timeout(self.config.procedure_timeout, receiver) + timeout(operation_timeout, receiver) .await - .map_err(|_| SpacetimeClientError::Timeout)? 
- .map_err(|_| SpacetimeClientError::ConnectDropped)??; + .map_err(|_| { + SpacetimeStageError::new( + SpacetimeClientStage::ConnectHandshake, + SpacetimeClientError::Timeout, + ) + })? + .map_err(|_| { + SpacetimeStageError::new( + SpacetimeClientStage::ConnectHandshake, + SpacetimeClientError::ConnectDropped, + ) + })? + .map_err(|error| { + SpacetimeStageError::new(SpacetimeClientStage::ConnectHandshake, error) + })?; let read_model_subscriptions = self - .subscribe_cached_read_models(&connection, broken.clone()) + .subscribe_cached_read_models(&connection, broken.clone(), operation_timeout) .await?; Ok(PooledConnection { @@ -559,7 +800,8 @@ impl SpacetimeClient { &self, connection: &DbConnection, broken: Arc, - ) -> Result, SpacetimeClientError> { + operation_timeout: Duration, + ) -> Result, SpacetimeStageError> { let mut subscriptions = Vec::new(); for query in [ "SELECT * FROM public_work_gallery_entry", @@ -576,7 +818,13 @@ impl SpacetimeClient { "SELECT * FROM big_fish_gallery_view", ] { let subscription = self - .subscribe_cached_read_model_query(connection, broken.clone(), query, true) + .subscribe_cached_read_model_query( + connection, + broken.clone(), + query, + true, + operation_timeout, + ) .await?; subscriptions.push(subscription); } @@ -597,7 +845,13 @@ impl SpacetimeClient { "SELECT * FROM asset_object", ] { if let Ok(subscription) = self - .subscribe_cached_read_model_query(connection, broken.clone(), query, false) + .subscribe_cached_read_model_query( + connection, + broken.clone(), + query, + false, + operation_timeout, + ) .await { subscriptions.push(subscription); @@ -613,7 +867,8 @@ impl SpacetimeClient { broken: Arc, query: &'static str, mark_broken_on_error: bool, - ) -> Result { + operation_timeout: Duration, + ) -> Result { let (sender, receiver) = oneshot::channel::>(); let applied_sender = Arc::new(Mutex::new(Some(sender))); let on_applied_sender = applied_sender.clone(); @@ -635,10 +890,23 @@ impl SpacetimeClient { }) 
.subscribe(query); - timeout(self.config.procedure_timeout, receiver) + timeout(operation_timeout, receiver) .await - .map_err(|_| SpacetimeClientError::Timeout)? - .map_err(|_| SpacetimeClientError::ConnectDropped)??; + .map_err(|_| { + SpacetimeStageError::new( + SpacetimeClientStage::ReadModelSubscribe, + SpacetimeClientError::Timeout, + ) + })? + .map_err(|_| { + SpacetimeStageError::new( + SpacetimeClientStage::ReadModelSubscribe, + SpacetimeClientError::ConnectDropped, + ) + })? + .map_err(|error| { + SpacetimeStageError::new(SpacetimeClientStage::ReadModelSubscribe, error) + })?; Ok(subscription) } @@ -658,7 +926,10 @@ impl SpacetimeClient { } // 超时后必须统一归还租约;若连接已先一步断开则回传断线,否则标记坏连接并回传超时。 - fn resolve_timeout_error(connection: Option<&PooledConnection>) -> SpacetimeClientError { + fn resolve_timeout_error( + connection: Option<&PooledConnection>, + _stage: SpacetimeClientStage, + ) -> SpacetimeClientError { if let Some(connection) = connection { if connection.is_broken() { return SpacetimeClientError::ConnectDropped; @@ -681,6 +952,27 @@ fn current_public_work_day() -> i64 { current_unix_micros().div_euclid(PUBLIC_WORK_PLAY_DAY_MICROS) } +fn duration_millis_u64(duration: Duration) -> u64 { + duration.as_millis().min(u64::MAX as u128) as u64 +} + +fn log_spacetime_client_failure( + operation_kind: &'static str, + operation_name: &'static str, + stage: SpacetimeClientStage, + started_at: Instant, + error: &SpacetimeClientError, +) { + warn!( + operation_kind, + operation_name, + spacetime_stage = stage.as_str(), + elapsed_ms = duration_millis_u64(started_at.elapsed()), + error = %error, + "SpacetimeDB client operation failed" + ); +} + fn public_work_recent_play_counts( connection: &DbConnection, source_type: &str, From e29992cf0158686777d089da7c8da68607b4a6f4 Mon Sep 17 00:00:00 2001 From: kdletters Date: Wed, 10 Jun 2026 14:36:56 +0800 Subject: [PATCH 4/9] =?UTF-8?q?=E7=82=B9=E8=B5=9E=E5=92=8C=E6=94=B9?= 
=?UTF-8?q?=E9=80=A0=E5=BC=80=E5=85=B3=E5=8A=A0=E5=85=A5=E5=90=8E=E5=8F=B0?= =?UTF-8?q?=E9=85=8D=E7=BD=AE?= MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit --- .hermes/shared-memory/decision-log.md | 8 + .hermes/shared-memory/pitfalls.md | 2 + apps/admin-web/src/api/adminApiClient.ts | 47 +- apps/admin-web/src/api/adminApiTypes.ts | 23 +- .../AdminCreationEntrySwitchPage.test.tsx | 117 +++- .../pages/AdminCreationEntrySwitchPage.tsx | 602 ++++++++++++------ ...】server-rs与SpacetimeDB数据契约-2026-05-15.md | 14 +- ...发运维】本地开发验证与生产运维-2026-05-15.md | 2 + ...玩法创作】平台入口与玩法链路-2026-05-15.md | 6 +- packages/shared/src/contracts/auth.ts | 3 +- server-rs/crates/api-server/src/admin.rs | 82 ++- server-rs/crates/api-server/src/app.rs | 95 ++- .../api-server/src/creation_entry_config.rs | 102 +++ .../crates/api-server/src/modules/admin.rs | 8 +- server-rs/crates/api-server/src/state.rs | 85 +++ .../crates/module-runtime/src/application.rs | 223 ++++++- server-rs/crates/module-runtime/src/domain.rs | 23 + server-rs/crates/module-runtime/src/lib.rs | 32 + .../crates/shared-contracts/src/admin.rs | 14 +- .../src/creation_entry_config.rs | 23 + .../spacetime-client/src/mapper/runtime.rs | 15 + .../spacetime-client/src/module_bindings.rs | 4 + .../creation_entry_config_snapshot_type.rs | 1 + .../creation_entry_config_type.rs | 7 + .../crates/spacetime-client/src/runtime.rs | 28 + .../crates/spacetime-module/src/migration.rs | 3 + .../src/runtime/creation_entry_config.rs | 54 ++ .../PlatformEntryFlowShellImpl.tsx | 178 +++--- .../platformPublicWorkDetailFlow.test.ts | 50 +- .../platformPublicWorkDetailFlow.ts | 69 +- src/services/apiClient.test.ts | 67 +- src/services/apiClient.ts | 26 +- src/services/creationEntryConfigService.ts | 11 + 33 files changed, 1644 insertions(+), 380 deletions(-) diff --git a/.hermes/shared-memory/decision-log.md b/.hermes/shared-memory/decision-log.md index 11bd0e53..3c36b390 100644 --- 
a/.hermes/shared-memory/decision-log.md +++ b/.hermes/shared-memory/decision-log.md @@ -16,6 +16,14 @@ --- +## 2026-06-10 Public-work interaction capabilities move into the global admin config + +- Background: The like and remix capabilities on the work detail page were previously decided by a hard-coded capability matrix in the frontend and in each gameplay handler. The admin console could not temporarily turn off the interaction entry points for a given kind of public work, and disabling the creation entry directly would also break reading and playing existing works. +- Decision: Public-work like / remix capabilities are stored as a global matrix in `creation_entry_config.public_work_interactions_json` and do not go into individual `creation_entry_type_config` entries. `GET /api/creation-entry/config` delivers `publicWorkInteractions`; the admin console saves the like switch, the remix switch, and the disabled-state messages per `sourceType` through `/admin/api/creation-entry/config/interactions`; api-server applies circuit breaking from this same config only to the like / remix routes that already have backend actions wired up (RPG / custom-world, Big Fish Eats Small Fish, and puzzle). Public listings, detail reads, launching published works, and runtime requests are not affected. +- Scope of impact: `CreationEntryConfigResponse`, `AdminCreationEntryConfigResponse`, the `module-runtime` default matrix, `spacetime-module` table fields and procedures, `spacetime-client` bindings, the admin entry-switch page, and the platform work-detail like / remix intent resolution. +- Verification: `npm run spacetime:generate`, `npm run check:spacetime-schema`, `cargo test -p module-runtime public_work_interaction_config_defaults_and_overrides --manifest-path server-rs/Cargo.toml`, `cargo test -p api-server public_work_interactions --manifest-path server-rs/Cargo.toml`, and the frontend tests covering work-detail interactions in both the admin console and the user-facing app. +- Related docs: `docs/【后端架构】server-rs与SpacetimeDB数据契约-2026-05-15.md`, `docs/【玩法创作】平台入口与玩法链路-2026-05-15.md`. + ## 2026-06-10 dev Gitea provides an intranet HTTP entry point - Background: release / dev target agents need to pull the repository from the self-hosted Gitea on dev; continuing to use `https://git.genarrative.world/...` routes traffic over the public internet, and `10.2.0.10:3000` is subject to cloud-side port policy and cannot serve as a stable entry point. diff --git a/.hermes/shared-memory/pitfalls.md b/.hermes/shared-memory/pitfalls.md index 2f0d3fd5..cef625e0 100644 --- a/.hermes/shared-memory/pitfalls.md +++ b/.hermes/shared-memory/pitfalls.md @@ -1135,6 +1135,8 @@ - Symptom: After refreshing the page, the user falls back to a logged-out state even though a local access token exists. - Cause: `AuthGate` hydrate used to force a call to `refreshStoredAccessToken()` first; when the refresh cookie was temporarily invalid, the proxy was misconfigured, or the backend returned `401`, that method cleared the local access token first, after which `/api/auth/me` could only restore a logged-out state. - Fix: `refreshStoredAccessToken()` gains a `clearOnFailure` option; when a local access token already exists, `AuthGate` first confirms the user via `/api/auth/me`, and only after that succeeds does it refresh in the background to renew the session and record the daily-login tracking event. A failed background refresh does not clear the token.
+- Follow-up fix: Only an explicit `401` / `403` from `/api/auth/refresh` means the login state is authoritatively invalid; in that case the local access token may be cleared and a global auth change triggered. Server restarts, Nginx 502/503/504, a browser `Failed to fetch`, or a refresh response that violates the contract all count as temporary unavailability and must not clear an existing local token; otherwise a restart window would kick every open page into a logged-out state. +- Contract: A successful `/api/auth/refresh` response is parsed per the shared contract `RefreshSessionResponse { token }`; test mocks should not add an extra `{ ok: true, token }` wrapper that masks the real recovery path. - Verification: `npm run test -- src/services/apiClient.test.ts src/components/auth/AuthGate.test.tsx -t "explicit refresh opts out|auth gate keeps a valid local token login"`. - Related: `src/services/apiClient.ts`, `src/components/auth/AuthGate.tsx`, `docs/technical/AUTH_RESTORE_AND_RECOMMEND_LOADING_FIX_2026-05-09.md`. diff --git a/apps/admin-web/src/api/adminApiClient.ts b/apps/admin-web/src/api/adminApiClient.ts index ef176285..0a3a1792 100644 --- a/apps/admin-web/src/api/adminApiClient.ts +++ b/apps/admin-web/src/api/adminApiClient.ts @@ -20,6 +20,7 @@ import type { AdminUpsertProfileRechargeProductRequest, AdminUpsertProfileRedeemCodeRequest, AdminUpsertProfileTaskConfigRequest, + AdminUpsertPublicWorkInteractionConfigRequest, AdminWorkVisibilityListResponse, ApiErrorEnvelope, ApiMeta, @@ -129,16 +130,16 @@ export async function request( export function loginAdmin(username: string, password: string) { return request('/admin/api/login', { method: 'POST', - body: {username, password}, + body: { username, password }, }); } export function getAdminMe(token: string) { - return request('/admin/api/me', {token}); + return request('/admin/api/me', { token }); } export function getAdminOverview(token: string) { - return request('/admin/api/overview', {token}); + return request('/admin/api/overview', { token }); } export function getAdminDatabaseTables(token: string) { @@ -154,7 +155,7 @@ export function getAdminDatabaseTableRows( ) { return request( `/admin/api/database/tables/${encodeURIComponent(tableName)}/rows${buildDatabaseTableRowsQuery(query)}`, - {token}, + { token }, ); } @@ -172,15 +173,14 @@ export function listAdminTrackingEvents( ) { return request( `/admin/api/tracking/events${buildQueryString(query)}`, - {token}, + { token }, ); } - 
export function getAdminCreationEntryConfig(token: string) { return request( '/admin/api/creation-entry/config', - {token}, + { token }, ); } @@ -213,10 +213,25 @@ export function upsertAdminCreationEntryBanners( ); } +/** 保存公开作品详情页点赞 / 改造能力配置。 */ +export function upsertAdminPublicWorkInteractions( + token: string, + payload: AdminUpsertPublicWorkInteractionConfigRequest, +) { + return request( + '/admin/api/creation-entry/config/interactions', + { + method: 'POST', + token, + body: payload, + }, + ); +} + export function listAdminWorkVisibility(token: string) { return request( '/admin/api/works/visibility', - {token}, + { token }, ); } @@ -237,7 +252,7 @@ export function updateAdminWorkVisibility( export function listProfileRedeemCodes(token: string) { return request( '/admin/api/profile/redeem-codes', - {token}, + { token }, ); } @@ -258,7 +273,7 @@ export function upsertProfileRedeemCode( export function listProfileInviteCodes(token: string) { return request( '/admin/api/profile/invite-codes', - {token}, + { token }, ); } @@ -293,7 +308,7 @@ export function disableProfileRedeemCode( export function listProfileTaskConfigs(token: string) { return request( '/admin/api/profile/tasks', - {token}, + { token }, ); } @@ -325,7 +340,7 @@ export function disableProfileTaskConfig( export function listProfileRechargeProducts(token: string) { return request( '/admin/api/profile/recharge-products', - {token}, + { token }, ); } @@ -414,13 +429,13 @@ function buildAdminApiError( ) { const envelope = isRecord(payload) ? (payload as ApiErrorEnvelope) : null; const errorPayload = envelope?.error; - const details = isRecord(errorPayload?.details) - ? errorPayload.details - : null; + const details = isRecord(errorPayload?.details) ? errorPayload.details : null; const detailsMessage = typeof details?.message === 'string' ? details.message.trim() : ''; const payloadMessage = - typeof errorPayload?.message === 'string' ? 
errorPayload.message.trim() : ''; + typeof errorPayload?.message === 'string' + ? errorPayload.message.trim() + : ''; const topLevelMessage = typeof envelope?.message === 'string' ? envelope.message.trim() : ''; const message = diff --git a/apps/admin-web/src/api/adminApiTypes.ts b/apps/admin-web/src/api/adminApiTypes.ts index 2eb58477..d3cb8ebe 100644 --- a/apps/admin-web/src/api/adminApiTypes.ts +++ b/apps/admin-web/src/api/adminApiTypes.ts @@ -107,12 +107,7 @@ export interface AdminDebugHeaderInput { value: string; } -export type AdminDebugHttpMethod = - | 'GET' - | 'POST' - | 'PUT' - | 'PATCH' - | 'DELETE'; +export type AdminDebugHttpMethod = 'GET' | 'POST' | 'PUT' | 'PATCH' | 'DELETE'; export interface AdminDebugHttpRequest { method: AdminDebugHttpMethod; @@ -143,11 +138,11 @@ export interface AdminTrackingEventListQuery { limit?: number; } - /** 后台创作入口配置响应,同时包含模板入口和独立公告配置。 */ export interface AdminCreationEntryConfigResponse { entries: AdminCreationEntryTypeConfigPayload[]; eventBanners: AdminCreationEntryEventBannerPayload[]; + publicWorkInteractions: PublicWorkInteractionConfigPayload[]; } /** 后台创作入口公告位配置项;旧结构化 banner 字段仅保留兼容。 */ @@ -201,6 +196,20 @@ export interface AdminUpsertCreationEntryEventBannersRequest { eventBannersJson: string; } +/** 后台公开作品详情页互动能力配置项。 */ +export interface PublicWorkInteractionConfigPayload { + sourceType: string; + likeEnabled: boolean; + remixEnabled: boolean; + likeDisabledMessage: string; + remixDisabledMessage: string; +} + +/** 后台保存公开作品点赞 / 改造能力配置请求体。 */ +export interface AdminUpsertPublicWorkInteractionConfigRequest { + publicWorkInteractions: PublicWorkInteractionConfigPayload[]; +} + /** 后台统一创作工作台契约表单的传输结构。 */ export interface UnifiedCreationSpecPayload { playId: string; diff --git a/apps/admin-web/src/pages/AdminCreationEntrySwitchPage.test.tsx b/apps/admin-web/src/pages/AdminCreationEntrySwitchPage.test.tsx index 4400b5da..0022c263 100644 --- a/apps/admin-web/src/pages/AdminCreationEntrySwitchPage.test.tsx +++ 
b/apps/admin-web/src/pages/AdminCreationEntrySwitchPage.test.tsx @@ -1,19 +1,26 @@ /* @vitest-environment jsdom */ -import {fireEvent, render, screen, waitFor, within} from '@testing-library/react'; +import { + fireEvent, + render, + screen, + waitFor, + within, +} from '@testing-library/react'; import userEvent from '@testing-library/user-event'; -import {beforeEach, expect, test, vi} from 'vitest'; +import { beforeEach, expect, test, vi } from 'vitest'; import { getAdminCreationEntryConfig, upsertAdminCreationEntryBanners, upsertAdminCreationEntryConfig, + upsertAdminPublicWorkInteractions, } from '../api/adminApiClient'; import type { AdminCreationEntryConfigResponse, UnifiedCreationSpecPayload, } from '../api/adminApiTypes'; -import {AdminCreationEntrySwitchPage} from './AdminCreationEntrySwitchPage'; +import { AdminCreationEntrySwitchPage } from './AdminCreationEntrySwitchPage'; vi.mock('../api/adminApiClient', () => ({ formatAdminApiError: vi.fn((error: unknown) => @@ -23,6 +30,7 @@ vi.mock('../api/adminApiClient', () => ({ isAdminApiError: vi.fn(() => false), upsertAdminCreationEntryBanners: vi.fn(), upsertAdminCreationEntryConfig: vi.fn(), + upsertAdminPublicWorkInteractions: vi.fn(), })); const puzzleSpec: UnifiedCreationSpecPayload = { @@ -55,6 +63,15 @@ const configResponse: AdminCreationEntryConfigResponse = { htmlCode: '
后台公告
', }, ], + publicWorkInteractions: [ + { + sourceType: 'puzzle', + likeEnabled: true, + remixEnabled: true, + likeDisabledMessage: '拼图点赞暂不可用。', + remixDisabledMessage: '拼图作品改造暂不可用。', + }, + ], entries: [ { id: 'puzzle', @@ -79,16 +96,24 @@ beforeEach(() => { vi.mocked(getAdminCreationEntryConfig).mockResolvedValue(configResponse); vi.mocked(upsertAdminCreationEntryBanners).mockResolvedValue(configResponse); vi.mocked(upsertAdminCreationEntryConfig).mockResolvedValue(configResponse); + vi.mocked(upsertAdminPublicWorkInteractions).mockResolvedValue( + configResponse, + ); }); test('创作入口后台展示并保存统一创作契约', async () => { const user = userEvent.setup(); - const {container} = render( - , + const { container } = render( + , ); await screen.findByText('pictureDescription'); - expect(container.querySelector('.admin-subsection .admin-info-list')).not.toBeNull(); + expect( + container.querySelector('.admin-subsection .admin-info-list'), + ).not.toBeNull(); expect( container.querySelector('.admin-subsection .admin-info-list')?.textContent, ).toContain('拼图'); @@ -97,22 +122,22 @@ test('创作入口后台展示并保存统一创作契约', async () => { expect(screen.queryByLabelText('契约 JSON')).toBeNull(); expect(screen.queryByText('puzzle-generating')).toBeNull(); - await user.click(screen.getByRole('button', {name: '修改契约'})); - const dialog = screen.getByRole('dialog', {name: '统一创作契约'}); + await user.click(screen.getByRole('button', { name: '修改契约' })); + const dialog = screen.getByRole('dialog', { name: '统一创作契约' }); expect(within(dialog).queryByLabelText('玩法 ID')).toBeNull(); expect(within(dialog).queryByLabelText('工作台阶段')).toBeNull(); expect(within(dialog).queryByLabelText('生成阶段')).toBeNull(); expect(within(dialog).queryByLabelText('结果阶段')).toBeNull(); fireEvent.change(within(dialog).getByLabelText('泥点消耗'), { - target: {value: '12'}, + target: { value: '12' }, }); - await user.click(within(dialog).getByRole('button', {name: '应用修改'})); + await user.click(within(dialog).getByRole('button', { name: '应用修改' })); - 
expect(screen.queryByRole('dialog', {name: '统一创作契约'})).toBeNull(); + expect(screen.queryByRole('dialog', { name: '统一创作契约' })).toBeNull(); expect(screen.getByText('12泥点数')).toBeTruthy(); - await user.click(screen.getByRole('button', {name: '保存入库'})); - await user.click(screen.getByRole('button', {name: '确认'})); + await user.click(screen.getByRole('button', { name: '保存入库' })); + await user.click(screen.getByRole('button', { name: '确认' })); await waitFor(() => { expect(upsertAdminCreationEntryConfig).toHaveBeenCalledWith( @@ -150,9 +175,11 @@ test('创作入口后台拒绝 playId 不一致的统一创作契约', async () ); await screen.findByText('pictureDescription'); - await user.click(screen.getByRole('button', {name: '保存入库'})); + await user.click(screen.getByRole('button', { name: '保存入库' })); - expect(await screen.findByText('统一创作契约 playId 必须与入口 ID 一致')).toBeTruthy(); + expect( + await screen.findByText('统一创作契约 playId 必须与入口 ID 一致'), + ).toBeTruthy(); expect(upsertAdminCreationEntryConfig).not.toHaveBeenCalled(); }); @@ -166,23 +193,25 @@ test('创作入口后台用表单保存公告配置', async () => { />, ); - expect(await screen.findAllByRole('heading', {name: '创作入口公告'})).toHaveLength(2); + expect( + await screen.findAllByRole('heading', { name: '创作入口公告' }), + ).toHaveLength(2); expect(screen.queryByLabelText('公告代码 JSON')).toBeNull(); fireEvent.change(await screen.findByLabelText('公告 1 标题'), { - target: {value: '周末创作赛'}, + target: { value: '周末创作赛' }, }); fireEvent.change(screen.getByLabelText('公告 1 HTML'), { - target: {value: '
新的入口公告
'}, + target: { value: '
新的入口公告
' }, }); - await user.click(screen.getByRole('button', {name: '新增公告'})); + await user.click(screen.getByRole('button', { name: '新增公告' })); fireEvent.change(screen.getByLabelText('公告 2 标题'), { - target: {value: '第二条公告'}, + target: { value: '第二条公告' }, }); fireEvent.change(screen.getByLabelText('公告 2 HTML'), { - target: {value: '
轮播第二条
'}, + target: { value: '
轮播第二条
' }, }); - await user.click(screen.getByRole('button', {name: '保存公告'})); - await user.click(screen.getByRole('button', {name: '确认'})); + await user.click(screen.getByRole('button', { name: '保存公告' })); + await user.click(screen.getByRole('button', { name: '确认' })); await waitFor(() => { expect(upsertAdminCreationEntryBanners).toHaveBeenCalled(); @@ -206,6 +235,42 @@ test('创作入口后台用表单保存公告配置', async () => { ); }); +test('创作入口后台用表单保存作品互动配置', async () => { + const user = userEvent.setup(); + render( + , + ); + + await screen.findByText('作品互动'); + const likeToggle = screen.getAllByRole('checkbox')[0]!; + await user.click(likeToggle); + fireEvent.change(screen.getByLabelText('拼图 / puzzle 点赞关闭提示'), { + target: { value: '拼图点赞维护中。' }, + }); + await user.click(screen.getByRole('button', { name: '保存作品互动' })); + await user.click(screen.getByRole('button', { name: '确认' })); + + await waitFor(() => { + expect(upsertAdminPublicWorkInteractions).toHaveBeenCalledWith( + 'admin-token', + { + publicWorkInteractions: [ + { + sourceType: 'puzzle', + likeEnabled: false, + remixEnabled: true, + likeDisabledMessage: '拼图点赞维护中。', + remixDisabledMessage: '拼图作品改造暂不可用。', + }, + ], + }, + ); + }); +}); + test('创作入口后台把旧结构化公告回显成 HTML 表单', async () => { vi.mocked(getAdminCreationEntryConfig).mockResolvedValueOnce({ ...configResponse, @@ -251,12 +316,12 @@ test('创作入口后台拒绝空公告表单', async () => { ); fireEvent.change(await screen.findByLabelText('公告 1 标题'), { - target: {value: ''}, + target: { value: '' }, }); fireEvent.change(screen.getByLabelText('公告 1 HTML'), { - target: {value: ''}, + target: { value: '' }, }); - await user.click(screen.getByRole('button', {name: '保存公告'})); + await user.click(screen.getByRole('button', { name: '保存公告' })); expect(await screen.findByText('公告 1 标题和 HTML 都不能为空')).toBeTruthy(); expect(upsertAdminCreationEntryBanners).not.toHaveBeenCalled(); diff --git a/apps/admin-web/src/pages/AdminCreationEntrySwitchPage.tsx b/apps/admin-web/src/pages/AdminCreationEntrySwitchPage.tsx index 
e00c848b..8a674650 100644 --- a/apps/admin-web/src/pages/AdminCreationEntrySwitchPage.tsx +++ b/apps/admin-web/src/pages/AdminCreationEntrySwitchPage.tsx @@ -5,10 +5,12 @@ import { getAdminCreationEntryConfig, upsertAdminCreationEntryBanners, upsertAdminCreationEntryConfig, + upsertAdminPublicWorkInteractions, } from '../api/adminApiClient'; import type { AdminCreationEntryEventBannerPayload, AdminCreationEntryTypeConfigPayload, + PublicWorkInteractionConfigPayload, UnifiedCreationFieldPayload, UnifiedCreationSpecPayload, } from '../api/adminApiTypes'; @@ -129,14 +131,28 @@ const DEFAULT_UNIFIED_CREATION_STAGE_MAP: Record< let announcementFormItemSequence = 0; let unifiedCreationSpecFieldSequence = 0; +const PUBLIC_WORK_SOURCE_LABELS: Record = { + 'custom-world': 'RPG', + 'big-fish': '摸鱼', + puzzle: '拼图', + 'puzzle-clear': '拼消消', + 'jump-hop': '跳一跳', + 'wooden-fish': '敲木鱼', + match3d: '抓大鹅', + 'square-hole': '方洞挑战', + 'visual-novel': '视觉小说', + 'bark-battle': '汪汪声浪', + edutainment: '宝贝识物', +}; + export function AdminCreationEntrySwitchPage({ token, onUnauthorized, mode = 'switches', }: AdminCreationEntrySwitchPageProps) { - const [entries, setEntries] = useState< - AdminCreationEntryTypeConfigPayload[] - >([]); + const [entries, setEntries] = useState( + [], + ); const [selectedId, setSelectedId] = useState('puzzle'); const [title, setTitle] = useState(''); const [subtitle, setSubtitle] = useState(''); @@ -157,12 +173,17 @@ export function AdminCreationEntrySwitchPage({ const [announcementItems, setAnnouncementItems] = useState< AnnouncementFormItem[] >([]); + const [publicWorkInteractions, setPublicWorkInteractions] = useState< + PublicWorkInteractionConfigPayload[] + >([]); const [isLoading, setIsLoading] = useState(false); const [isSaving, setIsSaving] = useState(false); const [isSavingBanners, setIsSavingBanners] = useState(false); + const [isSavingInteractions, setIsSavingInteractions] = useState(false); const [listErrorMessage, setListErrorMessage] = 
useState(''); const [errorMessage, setErrorMessage] = useState(''); const [bannerErrorMessage, setBannerErrorMessage] = useState(''); + const [interactionErrorMessage, setInteractionErrorMessage] = useState(''); const { confirmWrite, confirmDialog } = useAdminWriteConfirm(); const isAnnouncementMode = mode === 'announcements'; @@ -179,6 +200,7 @@ export function AdminCreationEntrySwitchPage({ const nextEntries = sortEntries(response.entries); setEntries(nextEntries); setAnnouncementItems(formatEventBannersFormItems(response.eventBanners)); + setPublicWorkInteractions(response.publicWorkInteractions ?? []); fillForm( nextEntries.find((entry) => entry.id === selectedId) ?? nextEntries[0] ?? @@ -234,6 +256,7 @@ export function AdminCreationEntrySwitchPage({ const nextEntries = sortEntries(response.entries); setEntries(nextEntries); setAnnouncementItems(formatEventBannersFormItems(response.eventBanners)); + setPublicWorkInteractions(response.publicWorkInteractions ?? []); fillForm(nextEntries.find((entry) => entry.id === targetId) ?? null); } catch (error: unknown) { handlePageError(error, onUnauthorized, setErrorMessage); @@ -269,6 +292,7 @@ export function AdminCreationEntrySwitchPage({ }); setEntries(sortEntries(response.entries)); setAnnouncementItems(formatEventBannersFormItems(response.eventBanners)); + setPublicWorkInteractions(response.publicWorkInteractions ?? 
 []); } catch (error: unknown) { handlePageError(error, onUnauthorized, setBannerErrorMessage); } finally { @@ -276,6 +300,41 @@ export function AdminCreationEntrySwitchPage({ } } + /** Save the like / remix capability switches for the public work detail page. */ + async function handleSavePublicWorkInteractions() { + if (isSavingInteractions) { + return; + } + + setInteractionErrorMessage(''); + const confirmed = await confirmWrite({ + action: '保存作品互动配置', + target: 'public-work-interactions', + }); + if (!confirmed) { + return; + } + + setIsSavingInteractions(true); + try { + const response = await upsertAdminPublicWorkInteractions(token, { + publicWorkInteractions: publicWorkInteractions.map((item) => ({ + ...item, + sourceType: item.sourceType.trim(), + likeDisabledMessage: item.likeDisabledMessage.trim(), + remixDisabledMessage: item.remixDisabledMessage.trim(), + })), + }); + setEntries(sortEntries(response.entries)); + setAnnouncementItems(formatEventBannersFormItems(response.eventBanners)); + setPublicWorkInteractions(response.publicWorkInteractions ?? []); + } catch (error: unknown) { + handlePageError(error, onUnauthorized, setInteractionErrorMessage); + } finally { + setIsSavingInteractions(false); + } + } + function fillForm(entry: AdminCreationEntryTypeConfigPayload | null) { if (!entry) { return; @@ -361,7 +420,10 @@ export function AdminCreationEntrySwitchPage({ currentForm ? { ...currentForm, - fields: [...currentForm.fields, createUnifiedCreationSpecFieldFormItem()], + fields: [ + ...currentForm.fields, + createUnifiedCreationSpecFieldFormItem(), + ], } : currentForm, ); @@ -377,7 +439,10 @@ export function AdminCreationEntrySwitchPage({ ); return { ...currentForm, - fields: fields.length > 0 ? fields : [createUnifiedCreationSpecFieldFormItem()], + fields: + fields.length > 0 + ? 
fields + : [createUnifiedCreationSpecFieldFormItem()], }; }); } @@ -414,6 +479,26 @@ export function AdminCreationEntrySwitchPage({ }); } + /** Update a single public-work interaction config entry. */ + function updatePublicWorkInteraction( + index: number, + patch: Partial< + Pick< + PublicWorkInteractionConfigPayload, + | 'likeEnabled' + | 'remixEnabled' + | 'likeDisabledMessage' + | 'remixDisabledMessage' + > + >, + ) { + setPublicWorkInteractions((currentItems) => + currentItems.map((item, itemIndex) => + itemIndex === index ? { ...item, ...patch } : item, + ), + ); + } + return (
@@ -515,191 +600,288 @@ export function AdminCreationEntrySwitchPage({ ) : null} {!isAnnouncementMode ? ( -
-
-
- - - + <> +
+
+

作品互动

+ {`${publicWorkInteractions.length} 类`}
- -
- - -
- - - - - - - -
- - -
- - - -
-
- 统一创作契约 - - {unifiedCreationSpec ? '已配置' : '未配置'} - -
-
- - {unifiedCreationSpec ? ( - - ) : null} -
- {unifiedCreationSpec ? ( - - ) : ( -
未配置统一创作页契约
- )} -
- - {errorMessage ? ( -
- {errorMessage} -
- ) : null} - -
- -
- - -
- - - - - - + + + + + - {entries.map((entry) => ( - + {publicWorkInteractions.map((item, index) => ( + + + + + - - - - - ))}
入口展示开放统一契约分类排序作品类型点赞点赞关闭提示改造改造关闭提示
{formatPublicWorkSourceLabel(item.sourceType)} - + + + + updatePublicWorkInteraction(index, { + likeDisabledMessage: event.target.value, + }) + } + /> + + + + + updatePublicWorkInteraction(index, { + remixDisabledMessage: event.target.value, + }) + } + /> {entry.visible ? '是' : '否'}{entry.open ? '是' : '否'}{entry.unifiedCreationSpec ? '是' : '否'}{entry.categoryLabel || entry.categoryId}{entry.sortOrder}
+ {interactionErrorMessage ? ( +
+ {interactionErrorMessage} +
+ ) : null} +
+ +
-
+ +
+
+
+ + + +
+ +
+ + +
+ + + + + + + +
+ + +
+ + + +
+
+ 统一创作契约 + {unifiedCreationSpec ? '已配置' : '未配置'} +
+
+ + {unifiedCreationSpec ? ( + + ) : null} +
+ {unifiedCreationSpec ? ( + + ) : ( +
未配置统一创作页契约
+ )} +
+ + {errorMessage ? ( +
+ {errorMessage} +
+ ) : null} + +
+ +
+
+ +
+
+ + + + + + + + + + + + + {entries.map((entry) => ( + + + + + + + + + ))} + +
入口展示开放统一契约分类排序
+ + {entry.visible ? '是' : '否'}{entry.open ? '是' : '否'}{entry.unifiedCreationSpec ? '是' : '否'}{entry.categoryLabel || entry.categoryId}{entry.sortOrder}
+
+
+
+ ) : null} {confirmDialog} @@ -776,7 +958,10 @@ export function AdminCreationEntrySwitchPage({
{unifiedCreationSpecForm.fields.map((field, index) => ( -
+
{`字段 ${index + 1}`}