Prune stale docs and update .hermes content
Delete a large set of outdated documentation (many files under docs/ and .hermes/plans/, including audits, design, prd, technical, planning, assets, and todos). Update and consolidate .hermes content: refresh shared-memory pages (decision-log, development-workflow, document-map, pitfalls, project-overview, team-conventions) and several skills/references under .hermes/skills. Also modify AGENTS.md, README.md, UI_CODING_STANDARD.md, docs/README.md and .encoding-check-ignore. Purpose: clean up stale planning/audit material and keep current hermes documentation and related top-level docs in sync.
@@ -1,113 +0,0 @@
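# Observability Hardening for the Phone Verification Code Sending Path

Date: `2026-04-23`

## 1. Purpose

This document freezes the new logging conventions for the phone verification code sending path across the Rust `api-server + module-auth + platform-auth` crates. It addresses the current high cost of troubleshooting:

1. After a user clicks "Get verification code" in the frontend, the only visible signal is that `/api/auth/phone/send-code` returned `500`.
2. The server already emits some failure logs, but key context is incomplete, so the failing layer still has to be inferred by hand.
3. When Aliyun actually returns errors such as `UNKNOWN` or `check frequency failed`, there are not enough fields to tell whether the cause is the local cooldown, upstream rate limiting, a configuration problem, or a response-parsing problem.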

## 2. Scope

This change only hardens logging on the SMS sending path. It does not change the frontend UI and does not spread to unrelated modules:

1. `server-rs/crates/api-server`
2. `server-rs/crates/module-auth`
3. `server-rs/crates/platform-auth`
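
## 3. Logging Goals

With the new logs in place, the logs alone should answer the following when troubleshooting a `send-code` failure:

1. Whether the request actually reached `api-server`
2. Whether the current provider is `mock` or `aliyun`
3. Whether the phone number and scene have been normalized by the server
4. Whether the failure occurred before the local cooldown check, before the Aliyun request was sent, after Aliyun responded, or during the snapshot write
5. Whether Aliyun's HTTP status, `Code`, `Message`, and `RequestId` are visible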
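
One way to make the failure stage in item 4 directly readable from a log line is to give each stage a stable name. The following is a minimal sketch using only the Rust standard library; `SendCodeStage`, `failure_line`, and the `stage` key are hypothetical names chosen for illustration and are not taken from the codebase, which may well use a logging crate instead of plain strings.

```rust
use std::fmt;

/// Hypothetical stages of the send-code flow, mirroring item 4 above.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum SendCodeStage {
    BeforeCooldownCheck,
    BeforeProviderRequest,
    AfterProviderResponse,
    SnapshotWrite,
}

impl SendCodeStage {
    /// Stable, grep-friendly name for the stage.
    pub fn as_str(self) -> &'static str {
        match self {
            SendCodeStage::BeforeCooldownCheck => "before_cooldown_check",
            SendCodeStage::BeforeProviderRequest => "before_provider_request",
            SendCodeStage::AfterProviderResponse => "after_provider_response",
            SendCodeStage::SnapshotWrite => "snapshot_write",
        }
    }
}

impl fmt::Display for SendCodeStage {
    fn fmt(&self, f: &mut fmt::Formatter) -> fmt::Result {
        f.write_str(self.as_str())
    }
}

/// Render a failure line as space-separated key=value pairs (assumed format).
pub fn failure_line(request_id: &str, provider: &str, stage: SendCodeStage) -> String {
    format!(
        "send_code_failed request_id={} provider={} stage={}",
        request_id, provider, stage
    )
}
```

With names like these, a single search for `stage=after_provider_response` would separate failures that happened after Aliyun responded from those that never left the process.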

## 4. New Log Locations

### 4.1 `api-server`

Inside the `POST /api/auth/phone/send-code` handler, add a pre-send log that outputs at least:

1. `request_id`
2. `operation`
3. `scene`
4. `provider`
5. `phone_input_masked`

Keep the existing failure logs, and add the following fields to them:

1. `provider`
2. `phone_input_masked`

### 4.2 `module-auth`

Inside `PhoneAuthService::send_code`, add:

1. A debug log after the phone number is normalized and before the provider is called
2. A log when the local cooldown is hit
3. A log after the provider succeeds and before the local verification code snapshot is written

The fields include at least:

1. `scene`
2. `phone_e164_masked`
3. `phone_national_masked`
4. `provider`
5. `cooldown_seconds`
6. `expires_in_seconds`
7. `provider_request_id`
8. `provider_out_id`

### 4.3 `platform-auth`

Inside the Aliyun provider, add:

1. A log before issuing `SendSmsVerifyCode`
2. A log after the Aliyun HTTP response is received
3. A log before error classification

The fields include at least:

1. `provider=aliyun`
2. `scene`
3. `phone_masked`
4. `endpoint`
5. `http_status`
6. `provider_code`
7. `provider_message`
8. `provider_request_id`
9. `provider_out_id`
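
The three log points above share one field set, but some fields are only known after the response arrives. A minimal sketch of how that could be modeled, using only the standard library: the struct `AliyunSmsLogFields`, the method `to_log_line`, and the event names are hypothetical, and the key=value output format is an assumption rather than the project's actual log format.

```rust
/// Hypothetical container for the fields listed above.
/// Fields that are unknown before the response arrives are `Option`s.
#[derive(Debug, Default, Clone)]
pub struct AliyunSmsLogFields {
    pub scene: String,
    pub phone_masked: String,
    pub endpoint: String,
    pub http_status: Option<u16>,
    pub provider_code: Option<String>,
    pub provider_message: Option<String>,
    pub provider_request_id: Option<String>,
    pub provider_out_id: Option<String>,
}

impl AliyunSmsLogFields {
    /// Render one log line; absent optional fields are omitted entirely.
    pub fn to_log_line(&self, event: &str) -> String {
        let mut parts = vec![
            event.to_string(),
            "provider=aliyun".to_string(),
            format!("scene={}", self.scene),
            format!("phone_masked={}", self.phone_masked),
            format!("endpoint={}", self.endpoint),
        ];
        if let Some(status) = self.http_status {
            parts.push(format!("http_status={}", status));
        }
        let optional = [
            ("provider_code", &self.provider_code),
            ("provider_message", &self.provider_message),
            ("provider_request_id", &self.provider_request_id),
            ("provider_out_id", &self.provider_out_id),
        ];
        for &(key, value) in optional.iter() {
            if let Some(text) = value {
                // Debug formatting quotes the value, so messages with spaces stay one token.
                parts.push(format!("{}={:?}", key, text));
            }
        }
        parts.join(" ")
    }
}
```

The same value can be logged before the request with only the first few fields set, then updated and logged again once the HTTP response has been parsed.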
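
## 5. Constraints

To keep the logs from becoming a new source of risk, this change must:

1. Not print the full phone number
2. Not print the verification code in plaintext
3. Not print the `AccessKeySecret`
4. Not print the full signature string
5. Print only the minimum fields needed to troubleshoot the sending path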
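
Constraint 1 implies a masking helper behind fields such as `phone_input_masked`. The document does not specify the masking rule, so the sketch below assumes one: keep the first 3 and last 4 characters and replace the rest with `*`, masking short inputs entirely. The function name `mask_phone` is hypothetical.

```rust
/// Number of leading and trailing characters left visible (assumed policy).
const KEEP_HEAD: usize = 3;
const KEEP_TAIL: usize = 4;

/// Mask a phone number for logging.
/// Inputs too short to hide anything meaningful are masked completely.
pub fn mask_phone(input: &str) -> String {
    let chars: Vec<char> = input.chars().collect();
    if chars.len() <= KEEP_HEAD + KEEP_TAIL {
        return "*".repeat(chars.len());
    }
    let head: String = chars[..KEEP_HEAD].iter().collect();
    let tail: String = chars[chars.len() - KEEP_TAIL..].iter().collect();
    let hidden = chars.len() - KEEP_HEAD - KEEP_TAIL;
    format!("{}{}{}", head, "*".repeat(hidden), tail)
}
```

Under this assumed rule, `mask_phone("13812345678")` yields `138****5678`. Working on `char`s rather than bytes avoids panics on malformed input containing multi-byte characters.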
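
## 6. Acceptance Criteria

After the hardening, every request to `/api/auth/phone/send-code`, whether it succeeds or fails, should satisfy:

1. `api-server` shows a request-entry log
2. `module-auth` shows the normalized result before the provider call
3. `platform-auth` shows the status and key fields returned by Aliyun
4. On failure, the logs directly distinguish between:
   - Local cooldown
   - Missing Aliyun configuration
   - Aliyun upstream failure
   - Aliyun rate limiting or business rejection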
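
The four failure categories in item 4 can be sketched as an enum with a small classifier. Everything here is an illustrative assumption: the names `SendCodeFailure`, `FailureContext`, and `classify` are hypothetical, and the rule that "a non-empty provider code means Aliyun answered and rejected the request, while no code means an upstream failure" is a simplification, not the project's actual classification logic.

```rust
/// Failure categories from the acceptance criteria above.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum SendCodeFailure {
    LocalCooldown,
    ProviderConfigMissing,
    ProviderUpstreamFailure,
    ProviderRejected,
}

/// Hypothetical facts gathered while handling a failed send-code request.
pub struct FailureContext {
    pub local_cooldown_hit: bool,
    pub provider_configured: bool,
    /// Business code parsed from the provider response, if one was received.
    pub provider_code: Option<String>,
}

/// Classify a failure, checking local causes before provider-side ones.
pub fn classify(ctx: &FailureContext) -> SendCodeFailure {
    if ctx.local_cooldown_hit {
        return SendCodeFailure::LocalCooldown;
    }
    if !ctx.provider_configured {
        return SendCodeFailure::ProviderConfigMissing;
    }
    match &ctx.provider_code {
        // Assumption: a parsed business code means the provider answered and refused.
        Some(code) if !code.is_empty() => SendCodeFailure::ProviderRejected,
        // No usable code: transport error, timeout, or unparsable response.
        _ => SendCodeFailure::ProviderUpstreamFailure,
    }
}
```

Logging the resulting category alongside the raw `provider_code` and `provider_message` keeps the classification auditable if the assumed rule turns out to be too coarse.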

## 7. Related Documents

1. [PHONE_AUTH_AXUM_REAL_SMS_PROVIDER_DESIGN_2026-04-22.md](./PHONE_AUTH_AXUM_REAL_SMS_PROVIDER_DESIGN_2026-04-22.md)
2. [PHONE_SMS_REAL_PROVIDER_MANUAL_VERIFICATION_RUNBOOK_2026-04-23.md](./PHONE_SMS_REAL_PROVIDER_MANUAL_VERIFICATION_RUNBOOK_2026-04-23.md)