Prune stale docs and update .hermes content
Delete a large set of outdated documentation (many files under docs/ and .hermes/plans/, including audits, design, prd, technical, planning, assets, and todos). Update and consolidate .hermes content: refresh shared-memory pages (decision-log, development-workflow, document-map, pitfalls, project-overview, team-conventions) and several skills/references under .hermes/skills. Also modify AGENTS.md, README.md, UI_CODING_STANDARD.md, docs/README.md and .encoding-check-ignore. Purpose: clean up stale planning/audit material and keep current hermes documentation and related top-level docs in sync.
@@ -1,77 +0,0 @@
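# start.sh: Hardened Diagnostics for SpacetimeDB Exiting Before Ready

Date: `2026-04-27`

Status: historical incident record. This document covers diagnostic improvements to the `start.sh` of the old release package. The `start.sh` runtime details described here are outdated and no longer serve as a basis for current releases or manual troubleshooting. Manual SpacetimeDB commands must not use `--root-dir`; the only exception is controlled internal use inside CI/CD scripts.
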
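## 1. Problem

Running the `start.sh` bundled in the release package may print only:

```text
[start] 启动 spacetimedb
[start] SpacetimeDB 进程在就绪前退出。
```

These two lines ("starting spacetimedb", then "the SpacetimeDB process exited before becoming ready") only show that the `spacetime start` child process exited before `server ping` reported readiness; they do not identify the root cause. The real error has usually already been written to `logs/spacetimedb.log` under the release directory.
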
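## 2. Common root causes

1. The port given by `GENARRATIVE_SPACETIME_PORT` is already in use by another process.
2. The `.spacetimedb/` runtime directory has incorrect permissions, so the current user cannot write to the data or log directory.
3. The `spacetime` installation on the target machine is incomplete, so the release package cannot sync an executable `bin/current/spacetimedb-cli`.
4. The `spacetime` version on the target machine is incompatible with the startup arguments that the script passes.
5. An old SpacetimeDB process still holds the same runtime directory or data lock, but the port that the current `GENARRATIVE_SPACETIME_SERVER_URL` points to is not ready.
6. `.spacetimedb/bin/current/` contains only `spacetimedb-cli` and is missing `spacetimedb-standalone`; the log then shows `exec failed for .../spacetimedb-standalone`.
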
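Causes 1, 2, 3 and 6 above can be checked before SpacetimeDB is started at all. The following is a minimal sketch of such a pre-flight check, not code taken from `start.sh`: the function names (`port_in_use`, `dir_writable`, `binaries_complete`, `preflight`) and the messages are hypothetical, and the sketch assumes `ss` and `awk` are available on the target machine. Causes 4 and 5 are not covered.

```shell
#!/usr/bin/env bash
# Hypothetical pre-flight helpers; the names are illustrative, not part of start.sh.

# Return 0 if something is already listening on the given TCP port.
port_in_use() {
  local port="$1"
  # Column 4 of `ss -ltn` is the local address:port.
  ss -ltn 2>/dev/null | awk -v p=":${port}" '$4 ~ p"$" {found=1} END {exit !found}'
}

# Return 0 if the directory exists and the current user can write to it.
dir_writable() {
  [ -d "$1" ] && [ -w "$1" ]
}

# Return 0 only if both binaries are present and executable.
binaries_complete() {
  local bin_dir="$1"
  [ -x "${bin_dir}/spacetimedb-cli" ] && [ -x "${bin_dir}/spacetimedb-standalone" ]
}

# preflight RELEASE_DIR PORT: report every failed check, return 1 if any failed.
preflight() {
  local root="$1" port="$2" rc=0
  port_in_use "$port" && { echo "port ${port} is already in use"; rc=1; }
  dir_writable "${root}/.spacetimedb" || { echo ".spacetimedb/ is not writable"; rc=1; }
  binaries_complete "${root}/.spacetimedb/bin/current" || { echo "bin/current is incomplete"; rc=1; }
  return "$rc"
}
```

From the release directory this could be called as `preflight . "${GENARRATIVE_SPACETIME_PORT:-3101}"`, where the `3101` fallback mirrors the default URL used in the troubleshooting commands of this document.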
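
## 3. Implemented fix

The `start.sh` generated for the release package was changed as follows:

1. The wait loop runs `server ping` first and only then checks whether the launcher process has exited. This avoids a false failure report when the target service is already ready but the launcher wrapper process has exited.
2. `sync_ubuntu_spacetime_install` no longer checks for or copies only `spacetimedb-cli`. It requires both `bin/current/spacetimedb-cli` and `bin/current/spacetimedb-standalone` to exist, and it copies the complete version directory when syncing from the Ubuntu user-level install directory.
3. When the SpacetimeDB process exits early or the wait times out, the script automatically prints:
   - The target address given by `GENARRATIVE_SPACETIME_SERVER_URL`.
   - The listen address given by `GENARRATIVE_SPACETIME_HOST:GENARRATIVE_SPACETIME_PORT`.
   - The current `.spacetimedb/` runtime directory.
   - The last 120 lines of `logs/spacetimedb.log`.
   - The raw output of `spacetime server ping`.
   - The listening state of the current port as reported by `ss` or `netstat`.
   - Any SpacetimeDB processes still running against the same runtime directory.
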
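The ordering in item 1 (ping first, process check second) can be sketched as below. This is an illustration under assumptions, not the actual loop from `start.sh`: the function names `wait_for_ready` and `print_diagnostics`, the return codes, and the one-second poll interval are all hypothetical, and only the log tail from item 3 is reproduced.

```shell
#!/usr/bin/env bash
# Sketch of a readiness wait loop; names and return codes are hypothetical.

# wait_for_ready PID TIMEOUT_SECONDS PING_CMD...
# Returns 0 once PING_CMD succeeds, 1 if PID exits first, 2 on timeout.
wait_for_ready() {
  local pid="$1" timeout="$2" waited=0
  shift 2
  while [ "$waited" -lt "$timeout" ]; do
    # Readiness is checked first, so a ready server wins over an exited wrapper.
    if "$@" >/dev/null 2>&1; then
      return 0
    fi
    # Only when the ping failed do we look at the launcher process.
    if ! kill -0 "$pid" 2>/dev/null; then
      return 1
    fi
    sleep 1
    waited=$((waited + 1))
  done
  return 2
}

# print_diagnostics LOG_FILE: dump the tail of the log after a failure.
print_diagnostics() {
  local log_file="$1"
  echo "last 120 lines of ${log_file}:"
  tail -n 120 "$log_file" 2>/dev/null || echo "log file not found: ${log_file}"
}
```

A caller might use it as `wait_for_ready "$pid" 30 spacetime server ping "$GENARRATIVE_SPACETIME_SERVER_URL" || print_diagnostics logs/spacetimedb.log`, where `$pid` is the background `spacetime start` process and the 30-second timeout is an assumed value.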
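
## 4. On-site troubleshooting

Run the following in the release directory:

```bash
# Show the most recent SpacetimeDB log output; the real error is usually here.
tail -n 120 logs/spacetimedb.log
# Ping the configured server, falling back to the default local address.
spacetime server ping "${GENARRATIVE_SPACETIME_SERVER_URL:-http://127.0.0.1:3101}"
# List any listener on port 3101 together with its owning process.
ss -ltnp | grep ':3101' || true
```
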
If the log contains:

```text
exec failed for /var/lib/jenkins/deploy/Genarrative/.spacetimedb/bin/current/spacetimedb-standalone
No such file or directory (os error 2)
```

then the SpacetimeDB runtime directory inside the release directory received the CLI but not the standalone binary. This section applies only to old release packages that still keep the controlled, CI/CD-internal `--root-dir`; do not introduce `--root-dir` in new manual troubleshooting. On site, start by running:

```bash
cd /var/lib/jenkins/deploy/Genarrative
# Pick the newest installed SpacetimeDB version directory.
SPACETIME_VERSION_DIR="$(find /usr/.local/share/spacetime/bin "$HOME/.local/share/spacetime/bin" -mindepth 1 -maxdepth 1 -type d 2>/dev/null | sort -V | tail -n 1)"
# Both binaries must be executable. Stop here if either check fails or the
# variable is empty; otherwise the cp below would copy from "/.".
test -x "${SPACETIME_VERSION_DIR}/spacetimedb-cli"
test -x "${SPACETIME_VERSION_DIR}/spacetimedb-standalone"
# Replace bin/current with a full copy of that version directory.
rm -rf .spacetimedb/bin/current
mkdir -p .spacetimedb/bin/current
cp -a "${SPACETIME_VERSION_DIR}/." .spacetimedb/bin/current/
chmod +x .spacetimedb/bin/current/spacetimedb-cli .spacetimedb/bin/current/spacetimedb-standalone
./start.sh
```

If the log shows that the port is in use, first confirm whether the occupant is the old SpacetimeDB. To reuse it, change `GENARRATIVE_SPACETIME_PORT` or `GENARRATIVE_SPACETIME_SERVER_URL` to the actual port; to restart it, prefer running `./stop.sh` in the same directory.

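To confirm which process owns the port, it helps to derive the port from the configured URL instead of hard-coding `3101`. The helpers below are a hedged sketch: `port_from_url` and `show_port_owner` are hypothetical names, and the URL parsing assumes a plain `scheme://host:port[/path]` form without IPv6 literals.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for the port-in-use case; not part of start.sh.

# port_from_url URL: print the explicit port of the URL, or nothing if absent.
port_from_url() {
  local hostport="${1#*://}"   # drop the scheme
  hostport="${hostport%%/*}"   # drop any path
  case "$hostport" in
    *:*) printf '%s\n' "${hostport##*:}" ;;
  esac
}

# show_port_owner PORT: list the listening socket and its owning process.
show_port_owner() {
  ss -ltnp | grep -E ":${1}[[:space:]]" || true
}
```

For example, `show_port_owner "$(port_from_url "${GENARRATIVE_SPACETIME_SERVER_URL:-http://127.0.0.1:3101}")"` prints the listener line; the process name in that line shows whether the occupant is an old SpacetimeDB instance.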
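
If the log shows a permission problem, first fix the ownership of the release directory, `.spacetimedb/`, and `logs/`; do not work around the problem by deleting `.spacetimedb/`. Deleting `.spacetimedb/` affects both the local SpacetimeDB data and the CLI identity, so do it only after confirming that the local database is disposable.
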
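Before changing ownership, it can be useful to see exactly which paths the current user cannot write. The sketch below is an assumption-laden illustration: `list_unwritable` is a hypothetical helper, and it relies on the `-writable` test of GNU `find`, which is present on Ubuntu but not on every platform.

```shell
#!/usr/bin/env bash
# Hypothetical helper for the permission case; requires GNU find (-writable).

# list_unwritable DIR...: print paths under each DIR that the current user
# cannot write, and report directories that do not exist.
list_unwritable() {
  local dir
  for dir in "$@"; do
    [ -e "$dir" ] || { echo "missing: $dir"; continue; }
    find "$dir" ! -writable -print 2>/dev/null
  done
}
```

Run from the release directory as `list_unwritable . .spacetimedb logs`. Empty output means every existing path is writable for the current user; note that when run as root the output is always empty, so run it as the user that executes `start.sh`.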

If the log shows an identity error or `403 Forbidden`, continue with `SPACETIMEDB_START_SH_PUBLISH_403_IDENTITY_FIX_2026-04-26.md`. This class of error occurs during the publish stage and is not a case of the startup process exiting early.