chore: add loadtest observability setup

kdletters
2026-05-16 22:44:30 +08:00
parent 7f16e88e57
commit 0305b79440
55 changed files with 2867 additions and 1622 deletions


@@ -113,6 +113,17 @@ $env:WORKS_DATA="data/works-list.local.json"
npm run loadtest:k6:works -- --summary-trend-stats="avg,min,med,p(90),p(95),p(99),max"
```
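## How 50 HTTP req/s is counted
By default, each iteration of `k6-works-list.js` requests two public list endpoints in sequence: `/api/runtime/puzzle/gallery` and `/api/runtime/custom-world-gallery`. To target roughly 50 HTTP req/s, set `PEAK_RPS` for the `ramping-arrival-rate` executor to `25`. If you pass `AUTH_TOKEN` or set `DETAIL_RATIO` above 0, each iteration issues more requests, so `PEAK_RPS` must be recalculated.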
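The conversion described above can be sketched as a small helper. This is a minimal illustration and not part of `k6-works-list.js`: the function name `peakRpsFor` and its parameters are hypothetical, and `requestsPerIteration` is assumed to be the average number of HTTP requests one iteration issues (2 in the default configuration).

```javascript
// Hypothetical helper (not in the repository): converts a target HTTP
// request rate into the iteration rate expected by PEAK_RPS.
function peakRpsFor(targetHttpRps, requestsPerIteration) {
  if (!Number.isFinite(targetHttpRps) || targetHttpRps <= 0) {
    throw new RangeError('targetHttpRps must be a positive number');
  }
  if (!Number.isFinite(requestsPerIteration) || requestsPerIteration <= 0) {
    throw new RangeError('requestsPerIteration must be a positive number');
  }
  // Arrival-rate executors count iterations, not HTTP requests, so the
  // HTTP target is divided by the requests issued per iteration.
  return Math.round(targetHttpRps / requestsPerIteration);
}

console.log(peakRpsFor(50, 2)); // 25
```

Because the result is rounded to a whole iteration rate, the effective HTTP rate can differ slightly from the target when the division is not exact.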
Acceptance targets:
- `http_req_failed < 1%`
- `http_req_duration p95 < 2000ms`
- `dropped_iterations = 0`
- No new Nginx 502 responses during the load test window
## Smoke
```bash
@@ -151,17 +162,38 @@ BASE_URL=http://127.0.0.1:8787 \
WORKS_DATA=data/works-list.local.json \
SCENARIO=spike \
START_RPS=5 \
PEAK_RPS=100 \
HOLD=2m \
PEAK_RPS=25 \
HOLD=60s \
DETAIL_RATIO=0 \
npm run loadtest:k6:works
```
Default thresholds:
- `http_req_failed < 5%`
- `http_req_failed < 1%`
- `http_req_duration p95 < 2000ms`
- `works_list_shape_error_rate < 5%`
- `dropped_iterations = 0`
- `works_list_shape_error_rate < 1%`
PowerShell:
```powershell
$env:BASE_URL="https://genarrative.world"
$env:WORKS_DATA="data/works-list.local.json"
$env:SCENARIO="spike"
$env:START_RPS="5"
$env:PEAK_RPS="25"
$env:HOLD="60s"
$env:END_RPS="5"
$env:DETAIL_RATIO="0"
npm run loadtest:k6:works -- --summary-trend-stats="avg,min,med,p(90),p(95),p(99),max"
```
Production release regression runs can use the same set of environment variables:
```bash
SCENARIO=spike START_RPS=5 PEAK_RPS=25 HOLD=60s END_RPS=5 DETAIL_RATIO=0 npm run loadtest:k6:works
```
## Load testing the personal works list with a logged-in session
@@ -197,6 +229,96 @@ npm run loadtest:k6:works
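- If the personal works list returns 401, confirm that `AUTH_TOKEN` is an access token the current api-server recognizes.
- If every detail request returns 404, confirm that data matching `WORKS_DATA` has been imported into the target environment.
## Data collection during the load test window
Nginx upstream timing: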
```bash
sudo tail -f /var/log/nginx/genarrative.access.log
sudo tail -f /var/log/nginx/genarrative.error.log
```
api-server and SpacetimeDB logs:
```bash
sudo journalctl -u genarrative-api.service -f
sudo journalctl -u spacetimedb.service -f
```
OpenTelemetry is disabled by default in api-server. To verify OTLP traces / metrics / logs, first start an `otelcol-contrib` debug exporter on the server itself, listening only on `127.0.0.1`:
```bash
npm run otel:debug
```
To forward local data to the Rider OpenTelemetry panel, first enable a fixed OTLP server port in Rider's OpenTelemetry settings, for example `17011`, then run:
```bash
RIDER_OTLP_GRPC_ENDPOINT=127.0.0.1:17011 npm run otel:rider
```
The script generates a temporary Collector config under `.codex-temp/otelcol/` and, by default, receives the OTLP HTTP data that api-server sends to `http://127.0.0.1:4318`. To change the ports, set:
- `OTELCOL_OTLP_HTTP_ENDPOINT`, default `127.0.0.1:4318`
- `OTELCOL_OTLP_GRPC_ENDPOINT`, default `127.0.0.1:4317`
- `RIDER_OTLP_GRPC_ENDPOINT`, default `127.0.0.1:17011`
- `OTELCOL_BIN`, default `otelcol-contrib`
The equivalent debug Collector config is:
```yaml
receivers:
  otlp:
    protocols:
      grpc:
        endpoint: 127.0.0.1:4317
      http:
        endpoint: 127.0.0.1:4318
exporters:
  debug:
    verbosity: detailed
service:
  pipelines:
    traces:
      receivers: [otlp]
      exporters: [debug]
    metrics:
      receivers: [otlp]
      exporters: [debug]
    logs:
      receivers: [otlp]
      exporters: [debug]
```
```bash
otelcol-contrib --config /etc/otelcol-contrib/genarrative-debug.yaml
```
Then enable the following in `/etc/genarrative/api-server.env`:
```env
GENARRATIVE_OTEL_ENABLED=true
OTEL_SERVICE_NAME=genarrative-api
OTEL_EXPORTER_OTLP_ENDPOINT=http://127.0.0.1:4318
```
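Note that `api-server` currently uses the OTLP HTTP exporter, so `OTEL_EXPORTER_OTLP_ENDPOINT` must point to the Collector's HTTP base endpoint, `http://127.0.0.1:4318`. Do not change it to the Collector gRPC port `4317`, and do not point it directly at Rider's gRPC port. Data reaches Rider only through the Collector started by `npm run otel:rider`, which forwards it to `RIDER_OTLP_GRPC_ENDPOINT`.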
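The endpoint rule above can be checked before restarting the service. The following is a minimal sketch under stated assumptions: `checkOtlpHttpEndpoint` is a hypothetical helper that does not exist in the repository, and the Rider port defaults to `17011` only because that is the example port used in this document.

```javascript
// Hypothetical pre-flight check (not part of the repository) for the value
// of OTEL_EXPORTER_OTLP_ENDPOINT. Returns a list of problems; an empty list
// means no known misconfiguration was detected.
function checkOtlpHttpEndpoint(value, riderGrpcPort = '17011') {
  let url;
  try {
    url = new URL(value);
  } catch {
    return [`"${value}" is not an absolute URL; expected a value like http://127.0.0.1:4318`];
  }
  const problems = [];
  if (url.protocol !== 'http:' && url.protocol !== 'https:') {
    problems.push(`scheme "${url.protocol}" is not HTTP; the exporter uses OTLP over HTTP`);
  }
  if (url.port === '4317') {
    problems.push('port 4317 is the Collector gRPC port; use the HTTP port (4318 by default)');
  }
  if (url.port === riderGrpcPort) {
    problems.push(`port ${riderGrpcPort} is assumed to be Rider's gRPC port; send data to the Collector instead`);
  }
  return problems;
}

console.log(checkOtlpHttpEndpoint('http://127.0.0.1:4318')); // []
console.log(checkOtlpHttpEndpoint('http://127.0.0.1:4317').length); // 1
```

The check only looks at the scheme and port, so it cannot confirm that a Collector is actually listening on the endpoint.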
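OTLP logs are an additional remote observability channel and do not replace local logs. For api-server logs, still use `journalctl` / `logs/api-server/`; for Nginx logs, still read the log files. Log levels are still controlled by `GENARRATIVE_API_LOG` / `RUST_LOG`, for example `info,tower_http=info,spacetime_client=info`.
Rider's Logs panel shows the fields of the OTLP log event itself; it does not automatically flatten all of the parent span's attributes onto every log record. Request-completion logs directly carry `request_id`, `http.request.method`, `http.route`, `url.scheme`, `url.path`, `http.response.status_code`, `status_class`, `latency_ms`, and `slow_request`. The fuller request chain is still viewed in the Traces panel, correlated by the same trace/span.
Helper commands for production regression: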
```bash
systemctl show genarrative-api.service -p LimitNOFILE -p TasksMax
cat /proc/$(pidof api-server)/limits
ss -ltnp | grep 8082
curl -sS http://127.0.0.1:8082/healthz
```
## Verification commands
```bash


@@ -56,20 +56,22 @@ const scenarioOptions = {
    scenarios: {
      spike: {
        executor: 'ramping-arrival-rate',
        startRate: Number(__ENV.START_RPS || 5),
        preAllocatedVUs: Number(__ENV.PREALLOCATED_VUS || 50),
        maxVUs: Number(__ENV.MAX_VUS || 200),
        timeUnit: '1s',
        stages: [
          { target: Number(__ENV.START_RPS || 5), duration: __ENV.RAMP_UP || '30s' },
          { target: Number(__ENV.PEAK_RPS || 100), duration: __ENV.HOLD || '2m' },
          { target: Number(__ENV.PEAK_RPS || 25), duration: __ENV.RAMP_UP || '30s' },
          { target: Number(__ENV.PEAK_RPS || 25), duration: __ENV.HOLD || '2m' },
          { target: Number(__ENV.END_RPS || 5), duration: __ENV.RAMP_DOWN || '30s' },
        ],
      },
    },
    thresholds: {
      http_req_failed: ['rate<0.05'],
      http_req_failed: ['rate<0.01'],
      http_req_duration: ['p(95)<2000'],
      works_list_shape_error_rate: ['rate<0.05'],
      dropped_iterations: ['count==0'],
      works_list_shape_error_rate: ['rate<0.01'],
    },
  },
};

scripts/run-otelcol.mjs Normal file

@@ -0,0 +1,119 @@
import {spawn} from 'node:child_process';
import {mkdirSync, writeFileSync} from 'node:fs';
import path from 'node:path';

const [, , rawMode = 'debug', ...args] = process.argv;
const mode = rawMode.trim();
const printConfigOnly = args.includes('--print-config');
const supportedModes = new Set(['debug', 'rider']);

if (!supportedModes.has(mode)) {
  console.error('[otelcol] mode must be one of: debug, rider');
  process.exit(1);
}

const otlpHttpEndpoint = readEnv('OTELCOL_OTLP_HTTP_ENDPOINT', '127.0.0.1:4318');
const otlpGrpcEndpoint = readEnv('OTELCOL_OTLP_GRPC_ENDPOINT', '127.0.0.1:4317');
const riderEndpoint = readEnv('RIDER_OTLP_GRPC_ENDPOINT', '127.0.0.1:17011');
const debugVerbosity = readEnv('OTELCOL_DEBUG_VERBOSITY', 'detailed');
const otelcolBin = readEnv('OTELCOL_BIN', 'otelcol-contrib');

const configText = buildConfig(mode);
const configDir = path.resolve('.codex-temp', 'otelcol');
const configPath = path.join(configDir, `genarrative-${mode}.yaml`);

mkdirSync(configDir, {recursive: true});
writeFileSync(configPath, configText, 'utf8');

console.log(`[otelcol] wrote ${configPath}`);
console.log(`[otelcol] receiving OTLP HTTP at http://${otlpHttpEndpoint}`);
console.log(`[otelcol] receiving OTLP gRPC at ${otlpGrpcEndpoint}`);
if (mode === 'rider') {
  console.log(`[otelcol] forwarding traces/metrics/logs to Rider OTLP gRPC at ${riderEndpoint}`);
}
console.log(
  '[otelcol] api-server env: GENARRATIVE_OTEL_ENABLED=true OTEL_EXPORTER_OTLP_ENDPOINT=http://127.0.0.1:4318'
);

if (printConfigOnly) {
  console.log(configText);
  process.exit(0);
}

const child = spawn(otelcolBin, ['--config', configPath], {
  cwd: process.cwd(),
  env: process.env,
  stdio: 'inherit',
});

const stopChild = () => {
  if (!child.killed) {
    child.kill();
  }
};

for (const signal of ['SIGINT', 'SIGTERM', 'SIGHUP']) {
  process.on(signal, () => {
    stopChild();
    process.exit(130);
  });
}

process.on('exit', stopChild);

child.on('error', (error) => {
  console.error(`[otelcol] failed to start ${otelcolBin}: ${error.message}`);
  console.error('[otelcol] install otelcol-contrib and make sure it is on PATH, or set OTELCOL_BIN.');
  process.exit(1);
});

child.on('exit', (code, signal) => {
  if (signal) {
    console.error(`[otelcol] exited by signal: ${signal}`);
    process.exit(1);
  }
  process.exit(code ?? 0);
});

function readEnv(key, fallback) {
  const value = process.env[key]?.trim();
  return value ? value : fallback;
}

function buildConfig(selectedMode) {
  const exporters =
    selectedMode === 'rider'
      ? `  otlp/rider:
    endpoint: ${riderEndpoint}
    tls:
      insecure: true
  debug:
    verbosity: ${debugVerbosity}`
      : `  debug:
    verbosity: ${debugVerbosity}`;
  const pipelineExporters = selectedMode === 'rider' ? '[otlp/rider, debug]' : '[debug]';

  return `receivers:
  otlp:
    protocols:
      grpc:
        endpoint: ${otlpGrpcEndpoint}
      http:
        endpoint: ${otlpHttpEndpoint}
exporters:
${exporters}
service:
  pipelines:
    traces:
      receivers: [otlp]
      exporters: ${pipelineExporters}
    metrics:
      receivers: [otlp]
      exporters: ${pipelineExporters}
    logs:
      receivers: [otlp]
      exporters: ${pipelineExporters}
`;
}