chore: add loadtest observability setup
@@ -113,6 +113,17 @@ $env:WORKS_DATA="data/works-list.local.json"
npm run loadtest:k6:works -- --summary-trend-stats="avg,min,med,p(90),p(95),p(99),max"
```

## Basis for 50 HTTP req/s

By default, each iteration of `k6-works-list.js` requests two public list endpoints in sequence: `/api/runtime/puzzle/gallery` and `/api/runtime/custom-world-gallery`. To reach a target of about 50 HTTP req/s, set `PEAK_RPS` for the `ramping-arrival-rate` executor to `25`, because that value counts iterations per second rather than HTTP requests. If you pass `AUTH_TOKEN` or set `DETAIL_RATIO` above 0, each iteration issues more requests and the rate must be recalculated.

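The conversion from an HTTP request rate to an iteration rate is a single division. A minimal sketch follows; `iterationRateFor` and its parameters are hypothetical names, not part of `k6-works-list.js`, and the requests-per-iteration figure for non-default settings has to be supplied by the caller.

```javascript
// Hypothetical helper: convert a target HTTP request rate into the
// iteration rate that ramping-arrival-rate expects.
function iterationRateFor(targetHttpRps, requestsPerIteration) {
  if (!(requestsPerIteration > 0)) {
    throw new RangeError('requestsPerIteration must be positive');
  }
  return targetHttpRps / requestsPerIteration;
}

// Default script behavior: two list requests per iteration.
const peakRps = iterationRateFor(50, 2);
console.log(`PEAK_RPS=${peakRps}`); // PEAK_RPS=25
```

If extra requests raise the average to, say, 2.5 requests per iteration, the same call gives 20 iterations per second.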
Acceptance targets:

- `http_req_failed < 1%`
- `http_req_duration p95 < 2000ms`
- `dropped_iterations = 0`
- No new 502 responses from Nginx during the load test window

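The k6-side targets can also be checked programmatically after a run. The sketch below assumes the summary object has the shape k6 passes to `handleSummary()` (`metrics.<name>.values`); `checkAcceptance` is a hypothetical helper, and the Nginx 502 target still has to be verified from the server logs.

```javascript
// Sketch: evaluate the k6-side acceptance targets from an end-of-test summary.
function checkAcceptance(summary) {
  const metrics = summary.metrics || {};
  const value = (name, key) =>
    metrics[name] && metrics[name].values ? metrics[name].values[key] : undefined;

  const failures = [];
  const failedRate = value('http_req_failed', 'rate');
  if (!(failedRate < 0.01)) failures.push(`http_req_failed rate=${failedRate}`);

  const p95 = value('http_req_duration', 'p(95)');
  if (!(p95 < 2000)) failures.push(`http_req_duration p(95)=${p95}`);

  // Assumption: a missing dropped_iterations metric means nothing was dropped.
  const dropped = value('dropped_iterations', 'count') || 0;
  if (dropped !== 0) failures.push(`dropped_iterations count=${dropped}`);

  return failures;
}

// Made-up sample values for illustration only.
const sampleSummary = {
  metrics: {
    http_req_failed: { values: { rate: 0.002 } },
    http_req_duration: { values: { 'p(95)': 850 } },
  },
};
console.log(checkAcceptance(sampleSummary)); // []
```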
## Smoke

```bash
@@ -151,17 +162,38 @@ BASE_URL=http://127.0.0.1:8787 \
WORKS_DATA=data/works-list.local.json \
SCENARIO=spike \
START_RPS=5 \
PEAK_RPS=100 \
HOLD=2m \
PEAK_RPS=25 \
HOLD=60s \
DETAIL_RATIO=0 \
npm run loadtest:k6:works
```

Default thresholds:

- `http_req_failed < 5%`
- `http_req_failed < 1%`
- `http_req_duration p95 < 2000ms`
- `works_list_shape_error_rate < 5%`
- `dropped_iterations = 0`
- `works_list_shape_error_rate < 1%`

PowerShell:

```powershell
$env:BASE_URL="https://genarrative.world"
$env:WORKS_DATA="data/works-list.local.json"
$env:SCENARIO="spike"
$env:START_RPS="5"
$env:PEAK_RPS="25"
$env:HOLD="60s"
$env:END_RPS="5"
$env:DETAIL_RATIO="0"
npm run loadtest:k6:works -- --summary-trend-stats="avg,min,med,p(90),p(95),p(99),max"
```

The same set of environment variables can be used for release regression runs against production:

```bash
SCENARIO=spike START_RPS=5 PEAK_RPS=25 HOLD=60s END_RPS=5 DETAIL_RATIO=0 npm run loadtest:k6:works
```

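For planning, it can help to estimate how many iterations and HTTP requests such a run generates. The sketch below assumes the spike scenario ramps linearly from `START_RPS` to `PEAK_RPS` over `RAMP_UP`, holds for `HOLD`, then ramps down to `END_RPS` over `RAMP_DOWN`, with both ramps left at a 30s default. `parseSeconds` and `expectedIterations` are hypothetical helpers.

```javascript
// Parse simple k6-style durations such as '30s', '2m' or '1m30s' into seconds.
function parseSeconds(text) {
  const units = { s: 1, m: 60, h: 3600 };
  let total = 0;
  let matched = '';
  for (const [part, amount, unit] of text.matchAll(/(\d+)([smh])/g)) {
    total += Number(amount) * units[unit];
    matched += part;
  }
  if (matched !== text) throw new Error(`unsupported duration: ${text}`);
  return total;
}

// Each linearly ramped stage contributes the average of its start and end
// rate multiplied by its duration.
function expectedIterations(startRate, stages) {
  let rate = startRate;
  let total = 0;
  for (const stage of stages) {
    total += ((rate + stage.target) / 2) * parseSeconds(stage.duration);
    rate = stage.target;
  }
  return total;
}

const iterations = expectedIterations(5, [
  { target: 25, duration: '30s' }, // RAMP_UP (assumed default)
  { target: 25, duration: '60s' }, // HOLD
  { target: 5, duration: '30s' },  // RAMP_DOWN (assumed default)
]);
// Two HTTP requests per iteration with the default script settings.
console.log(iterations, iterations * 2); // 2400 4800
```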
## Load testing the personal works list with a logged-in session

@@ -197,6 +229,96 @@ npm run loadtest:k6:works
- If the personal works list returns 401, confirm that `AUTH_TOKEN` is an access token the current api-server recognizes.
- If every detail request returns 404, confirm that data matching `WORKS_DATA` has been imported into the target environment.

## Collecting data during the load test window

Nginx upstream timing:

```bash
sudo tail -f /var/log/nginx/genarrative.access.log
sudo tail -f /var/log/nginx/genarrative.error.log
```

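To check for 502 responses in the access log after a run, the captured lines can be counted by status code. This is a sketch only: it assumes the status code directly follows the quoted request line, as in Nginx's default `combined` format, so a custom `log_format` would need a different pattern. `countStatus` and the sample lines are made up for illustration.

```javascript
// Sketch: count responses with a given status code in access-log lines.
function countStatus(lines, status) {
  const pattern = /"[^"]*" (\d{3}) /;
  let count = 0;
  for (const line of lines) {
    const match = pattern.exec(line);
    if (match && Number(match[1]) === status) count += 1;
  }
  return count;
}

// Illustrative log lines, not real output.
const sampleLines = [
  '127.0.0.1 - - [01/Jan/2025:00:00:00 +0000] "GET /api/runtime/puzzle/gallery HTTP/1.1" 200 512 "-" "k6"',
  '127.0.0.1 - - [01/Jan/2025:00:00:01 +0000] "GET /api/runtime/custom-world-gallery HTTP/1.1" 502 157 "-" "k6"',
];
console.log(countStatus(sampleLines, 502)); // 1
```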
api-server and SpacetimeDB logs:

```bash
sudo journalctl -u genarrative-api.service -f
sudo journalctl -u spacetimedb.service -f
```

OpenTelemetry is disabled by default in api-server. To verify OTLP traces / metrics / logs, first start an `otelcol-contrib` debug exporter on the server itself, listening only on `127.0.0.1`:

```bash
npm run otel:debug
```

To forward local data to the Rider OpenTelemetry panel, first enable a fixed OTLP server port in Rider's OpenTelemetry settings, for example `17011`, then run:

```bash
RIDER_OTLP_GRPC_ENDPOINT=127.0.0.1:17011 npm run otel:rider
```

The script generates a temporary collector config under `.codex-temp/otelcol/` and, by default, receives the OTLP HTTP data that api-server sends to `http://127.0.0.1:4318`. To change ports, set:

- `OTELCOL_OTLP_HTTP_ENDPOINT`, default `127.0.0.1:4318`
- `OTELCOL_OTLP_GRPC_ENDPOINT`, default `127.0.0.1:4317`
- `RIDER_OTLP_GRPC_ENDPOINT`, default `127.0.0.1:17011`
- `OTELCOL_BIN`, default `otelcol-contrib`

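The documented defaults can be read as a simple fallback table. The sketch below shows one way such a lookup could work; `resolveCollectorSettings` is a hypothetical helper, and the real script may resolve these variables differently.

```javascript
// Documented defaults for the collector helper scripts.
const DEFAULTS = {
  OTELCOL_OTLP_HTTP_ENDPOINT: '127.0.0.1:4318',
  OTELCOL_OTLP_GRPC_ENDPOINT: '127.0.0.1:4317',
  RIDER_OTLP_GRPC_ENDPOINT: '127.0.0.1:17011',
  OTELCOL_BIN: 'otelcol-contrib',
};

// Use the environment value when set and non-empty, else the default.
function resolveCollectorSettings(env) {
  const settings = {};
  for (const [name, fallback] of Object.entries(DEFAULTS)) {
    settings[name] = env[name] || fallback;
  }
  return settings;
}

// Example: override only the HTTP receiver port (value chosen arbitrarily).
console.log(resolveCollectorSettings({ OTELCOL_OTLP_HTTP_ENDPOINT: '127.0.0.1:14318' }));
```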
The equivalent debug collector config is:

```yaml
receivers:
  otlp:
    protocols:
      grpc:
        endpoint: 127.0.0.1:4317
      http:
        endpoint: 127.0.0.1:4318

exporters:
  debug:
    verbosity: detailed

service:
  pipelines:
    traces:
      receivers: [otlp]
      exporters: [debug]
    metrics:
      receivers: [otlp]
      exporters: [debug]
    logs:
      receivers: [otlp]
      exporters: [debug]
```

```bash
otelcol-contrib --config /etc/otelcol-contrib/genarrative-debug.yaml
```

Then enable the following in `/etc/genarrative/api-server.env`:

```env
GENARRATIVE_OTEL_ENABLED=true
OTEL_SERVICE_NAME=genarrative-api
OTEL_EXPORTER_OTLP_ENDPOINT=http://127.0.0.1:4318
```

Note that `api-server` currently uses the OTLP HTTP exporter, so `OTEL_EXPORTER_OTLP_ENDPOINT` must point to the Collector's HTTP base endpoint, `http://127.0.0.1:4318`. Do not change it to the Collector's gRPC port `4317`, and do not point it directly at Rider's gRPC port. Data reaches Rider only through the Collector started by `npm run otel:rider`, which forwards it to `RIDER_OTLP_GRPC_ENDPOINT`.

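A quick sanity check of the endpoint value can catch the two mistakes described above. This is a sketch under stated assumptions: `checkOtlpHttpEndpoint` and `signalUrls` are hypothetical helpers, the port numbers are the defaults from this document, and the per-signal paths follow the OTLP exporter specification, which appends `/v1/traces`, `/v1/metrics` and `/v1/logs` to the base endpoint.

```javascript
// Sketch: sanity-check OTEL_EXPORTER_OTLP_ENDPOINT for an OTLP HTTP exporter.
// Returns null when no problem is detected, otherwise a short description.
function checkOtlpHttpEndpoint(endpoint, { grpcPort = '4317', riderPort = '17011' } = {}) {
  let url;
  try {
    url = new URL(endpoint);
  } catch {
    return 'not a valid URL (expected something like http://127.0.0.1:4318)';
  }
  if (url.protocol !== 'http:' && url.protocol !== 'https:') {
    return `unexpected scheme ${url.protocol}`;
  }
  if (url.port === grpcPort) return 'points at the Collector gRPC port';
  if (url.port === riderPort) return 'points at the Rider gRPC port';
  return null;
}

// Per-signal URLs derived from the base endpoint.
function signalUrls(base) {
  const root = base.replace(/\/+$/, '');
  return ['traces', 'metrics', 'logs'].map((signal) => `${root}/v1/${signal}`);
}

console.log(checkOtlpHttpEndpoint('http://127.0.0.1:4318')); // null
console.log(signalUrls('http://127.0.0.1:4318')[0]); // http://127.0.0.1:4318/v1/traces
```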
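OTLP logs are an addition for remote observability and do not replace local logs: api-server logs are still read from `journalctl` / `logs/api-server/`, and Nginx logs are still read from files. Log levels are still controlled by `GENARRATIVE_API_LOG` / `RUST_LOG`, for example `info,tower_http=info,spacetime_client=info`.

Rider's Logs panel shows the fields of the OTLP log event itself and does not automatically flatten all attributes of the parent span onto every log record. The request-completion log directly carries `request_id`, `http.request.method`, `http.route`, `url.scheme`, `url.path`, `http.response.status_code`, `status_class`, `latency_ms` and `slow_request`. The fuller request chain is still viewed in the Traces panel, correlated by the same trace/span.
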
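Because these fields sit directly on each request-completion log event, exported records can be summarized without joining against spans. The sketch below assumes a hypothetical export of one plain object per log event, keyed by the attribute names above, with `slow_request` as a boolean; the actual value types are not confirmed by this document.

```javascript
// Sketch: count slow requests per route from request-completion log records.
function slowRoutes(records) {
  const counts = new Map();
  for (const record of records) {
    if (record.slow_request !== true) continue;
    const route = record['http.route'];
    counts.set(route, (counts.get(route) || 0) + 1);
  }
  return counts;
}

// Made-up records for illustration only.
const sampleRecords = [
  { 'http.route': '/api/runtime/puzzle/gallery', latency_ms: 2300, slow_request: true },
  { 'http.route': '/api/runtime/puzzle/gallery', latency_ms: 40, slow_request: false },
  { 'http.route': '/api/runtime/custom-world-gallery', latency_ms: 2100, slow_request: true },
];
console.log(slowRoutes(sampleRecords).get('/api/runtime/puzzle/gallery')); // 1
```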
Helper commands for production regression:

```bash
systemctl show genarrative-api.service -p LimitNOFILE -p TasksMax
cat /proc/$(pidof api-server)/limits
ss -ltnp | grep 8082
curl -sS http://127.0.0.1:8082/healthz
```

## Verification commands

```bash
@@ -56,20 +56,22 @@ const scenarioOptions = {
    scenarios: {
      spike: {
        executor: 'ramping-arrival-rate',
        startRate: Number(__ENV.START_RPS || 5),
        preAllocatedVUs: Number(__ENV.PREALLOCATED_VUS || 50),
        maxVUs: Number(__ENV.MAX_VUS || 200),
        timeUnit: '1s',
        stages: [
          { target: Number(__ENV.START_RPS || 5), duration: __ENV.RAMP_UP || '30s' },
          { target: Number(__ENV.PEAK_RPS || 100), duration: __ENV.HOLD || '2m' },
          { target: Number(__ENV.PEAK_RPS || 25), duration: __ENV.RAMP_UP || '30s' },
          { target: Number(__ENV.PEAK_RPS || 25), duration: __ENV.HOLD || '2m' },
          { target: Number(__ENV.END_RPS || 5), duration: __ENV.RAMP_DOWN || '30s' },
        ],
      },
    },
    thresholds: {
      http_req_failed: ['rate<0.05'],
      http_req_failed: ['rate<0.01'],
      http_req_duration: ['p(95)<2000'],
      works_list_shape_error_rate: ['rate<0.05'],
      dropped_iterations: ['count==0'],
      works_list_shape_error_rate: ['rate<0.01'],
    },
  },
};
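Whether `dropped_iterations` stays at 0 depends on having enough VUs for the arrival rate. A rough estimate follows Little's law: concurrent VUs ≈ iteration rate × iteration duration. The sketch below uses the p95 limit as a pessimistic per-request latency, which is an illustrative assumption rather than a measurement, and `requiredVUs` is a hypothetical helper.

```javascript
// Little's law estimate of the VUs an arrival-rate executor needs when the
// requests within one iteration run sequentially.
function requiredVUs(iterationsPerSecond, requestsPerIteration, secondsPerRequest) {
  return Math.ceil(iterationsPerSecond * requestsPerIteration * secondsPerRequest);
}

// Pessimistic case: 25 iterations/s, 2 sequential requests, 2 s each.
const neededVUs = requiredVUs(25, 2, 2);
console.log(neededVUs);        // 100
console.log(neededVUs <= 200); // true: within the default MAX_VUS of 200
```

Under this pessimistic estimate the run would need more than the default `PREALLOCATED_VUS` of 50, so raising it may be worth considering if iterations are dropped.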