Measurement reference
Full reference for shim’s measurement surface. The README has the overview; this is the per-field detail.
/v1/metrics (JSON)
GET /v1/metrics returns a snapshot of what shim has done since startup.
curl -s http://127.0.0.1:8082/v1/metrics | python3 -m json.tool
{
  "latency": {
    "/v1/messages": {"p50": 0.316637, "p95": 0.980266, "p99": 1.585670, "n": 14},
    "/v1/messages/count_tokens": {"p50": 0.046325, "p95": 0.054247, "p99": 0.054951, "n": 3}
  },
  "token_delta": {
    "/v1/messages": {"shim_total": 86, "upstream_prompt_total": 336, "upstream_completion_total": 168, "n": 14}
  },
  "rewrites": {"model": 14, "stop_sequences": 2},
  "requests_seen": {"/v1/messages": 14, "/v1/messages/count_tokens": 3},
  "upstream_errors": {
    "/v1/messages": {"total": 1, "class_4xx": 1, "class_5xx": 0, "by_status": {"400": 1}}
  },
  "panics_total": 0
}
- latency.<path>.{p50,p95,p99}: milliseconds, from the per-endpoint reservoir. n is total observations since startup (the reservoir caps at 1024 samples for percentile compute; n keeps counting past that).
- token_delta.<path>.shim_total is shim’s cl100k_base BPE count of every prompt’s input. upstream_prompt_total is what the upstream reported in usage.prompt_tokens. The gap is the drift: the shim-side number is reproducible under cl100k, the upstream may use a different (unpublished) tokenizer, so a wide gap means the two disagree on this traffic shape, not that one is wrong. If the upstream omits usage, shim skips the observation rather than recording zeros. For anthropic-passthrough the upstream is Anthropic, so the delta compares cl100k against Anthropic’s (also unpublished) tokenizer: still a drift signal, not a verification of shim.
- rewrites.model counts how often shim replaced the requested model name (the translating presets rewrite every claude-* request, so this tracks /v1/messages n). rewrites.stop_sequences counts over-cap truncations.
- requests_seen.<path> counts every handler entry: the denominator for any per-request ratio. It increments before parsing or validation, so it counts all attempts. Only the client API endpoints (/v1/messages, /v1/messages/count_tokens) record; the probe/observability endpoints (/health, /healthz, /readyz, /metrics, /v1/metrics) and the /v1/messages/explain dry-run are excluded so probes and scrapes don’t pollute the signal.
- upstream_errors.<path> counts non-2xx responses from the upstream. total is all of them; class_4xx + class_5xx bucket by HTTP class (3xx and oddities contribute to total and by_status only). by_status is the per-code breakdown. The companion diagnostic, the upstream body, is on the upstream error log line (see Errors and debugging).
- panics_total counts handler panics the server recovered: a panic becomes an Anthropic-shaped 500, a stack-bearing handler panic recovered log line, and this counter, instead of a silently dropped connection. A non-zero value is always a bug to investigate.
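The split between a capped sample and an uncapped n can be sketched as follows. This is a minimal illustration, not shim’s code: the Reservoir type, the Algorithm R replacement policy, and the nearest-rank percentile are all assumptions, since the reference above only states the 1024-sample cap and that n keeps counting.

```go
package main

import (
	"fmt"
	"math/rand"
	"sort"
)

// reservoirCap mirrors the 1024-sample cap described above.
const reservoirCap = 1024

// Reservoir (hypothetical type) keeps a bounded sample for percentile
// computation while n counts every observation.
type Reservoir struct {
	samples []float64
	n       uint64
	rng     *rand.Rand
}

func NewReservoir(seed int64) *Reservoir {
	return &Reservoir{rng: rand.New(rand.NewSource(seed))}
}

// Observe records one value. Assumption: Algorithm R replacement; the
// reference does not say which replacement policy shim uses.
func (r *Reservoir) Observe(v float64) {
	r.n++
	if len(r.samples) < reservoirCap {
		r.samples = append(r.samples, v)
		return
	}
	// Keep the new value with probability reservoirCap/n.
	if j := r.rng.Int63n(int64(r.n)); j < reservoirCap {
		r.samples[j] = v
	}
}

// Percentile returns the nearest-rank percentile (1..100) of the current
// sample, or 0 when nothing has been observed.
func (r *Reservoir) Percentile(p int) float64 {
	if len(r.samples) == 0 {
		return 0
	}
	sorted := append([]float64(nil), r.samples...)
	sort.Float64s(sorted)
	rank := (p*len(sorted) + 99) / 100 // integer ceil(p/100 * len)
	if rank < 1 {
		rank = 1
	}
	if rank > len(sorted) {
		rank = len(sorted)
	}
	return sorted[rank-1]
}

func main() {
	r := NewReservoir(1)
	for i := 1; i <= 3000; i++ {
		r.Observe(float64(i))
	}
	fmt.Printf("n=%d kept=%d p50=%.0f\n", r.n, len(r.samples), r.Percentile(50))
}
```

Once n exceeds the cap, the percentiles are estimates over a sample, which is why n is the right denominator for counts and the wrong one for judging percentile precision.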
State is in-memory and resets on restart. The endpoint is loopback-only by default (no auth, matches /health). The JSON shape is part of shim’s stable contract; a breaking change ships in a major version bump with a CHANGELOG.md note.
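The upstream_errors bucketing rules listed above can be sketched as a small counter type. The struct and method names here are hypothetical; only the JSON field names and the rule that 3xx and other oddities land in total and by_status alone come from this reference.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strconv"
)

// UpstreamErrors (hypothetical type) mirrors the per-endpoint JSON shape.
type UpstreamErrors struct {
	Total    int            `json:"total"`
	Class4xx int            `json:"class_4xx"`
	Class5xx int            `json:"class_5xx"`
	ByStatus map[string]int `json:"by_status"`
}

// Record counts one upstream response. 2xx responses are ignored; every
// other status increments total and by_status, and only 4xx/5xx also
// increment a class bucket.
func (e *UpstreamErrors) Record(status int) {
	if status >= 200 && status < 300 {
		return
	}
	e.Total++
	switch {
	case status >= 400 && status < 500:
		e.Class4xx++
	case status >= 500 && status < 600:
		e.Class5xx++
	}
	if e.ByStatus == nil {
		e.ByStatus = make(map[string]int)
	}
	e.ByStatus[strconv.Itoa(status)]++
}

func main() {
	var e UpstreamErrors
	for _, s := range []int{200, 400, 502, 302} {
		e.Record(s)
	}
	out, err := json.Marshal(e)
	if err != nil {
		panic(err)
	}
	fmt.Println(string(out))
}
```

Because of the 3xx rule, total can exceed class_4xx + class_5xx; a dashboard that sums the two classes will undercount in that case.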
/metrics (Prometheus)
GET /metrics exposes the same aggregates in Prometheus text-exposition format, so shim can sit in front of Grafana/Prometheus rather than replace them. Hand-rolled — no client_golang dependency, so the binary stays a single static file.
# HELP shim_rewrites_total Inbound-traffic mutations shim applied, by kind ...
# TYPE shim_rewrites_total counter
shim_rewrites_total{kind="model"} 14
shim_upstream_errors_total{endpoint="/v1/messages",status="400"} 1
shim_latency_seconds{endpoint="/v1/messages",quantile="0.95"} 0.980266
| Metric | Type | Labels | Meaning |
|---|---|---|---|
| shim_requests_seen_total | counter | endpoint | client requests seen |
| shim_rewrites_total | counter | kind | inbound mutations (model, stop_sequences) |
| shim_upstream_errors_total | counter | endpoint, status | upstream non-2xx |
| shim_tokens_shim_total | counter | endpoint | shim’s cl100k prompt-token count |
| shim_tokens_upstream_prompt_total | counter | endpoint | upstream-reported prompt tokens |
| shim_tokens_upstream_completion_total | counter | endpoint | upstream-reported completion tokens |
| shim_token_observations_total | counter | endpoint | responses with usage recorded |
| shim_latency_seconds | gauge | endpoint, quantile | latency percentile (reservoir estimate, seconds) |
| shim_latency_observations_total | counter | endpoint | latency observations |
| shim_panics_total | counter | (none) | handler panics recovered (emitted only when non-zero) |
Latency is a gauge with a quantile label, not a summary: the reservoir yields point-in-time percentiles, not histogram buckets, so a gauge is the honest representation of what shim has. Units are seconds (Prometheus base-unit convention); the JSON /v1/metrics reports milliseconds.
Token counting
The count_tokens endpoint and the token_delta.shim_total field use cl100k_base — OpenAI’s GPT-3.5/GPT-4 BPE tokenizer, loaded via pkoukk/tiktoken-go with offline-embedded tables. shim calls EncodeOrdinary (special tokens like <|endoftext|> are not processed specially), so the count is reproducible byte-for-byte for any given input.
DeepSeek (and most non-OpenAI upstreams) don’t publish their tokenizer, so cl100k is an approximation across tokenizers — close enough for in-session sanity checks and the /v1/metrics drift signal, not a substitute for the upstream’s own count when reconciling a bill.
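One way to read the drift signal is as a ratio of the two prompt totals in a /v1/metrics snapshot. The sketch below is a client-side illustration under that assumption: driftRatio is a hypothetical helper and the ratio is not a metric shim itself reports. The sample values are the ones from the snapshot earlier in this document.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// TokenDelta mirrors one token_delta.<path> entry from /v1/metrics.
type TokenDelta struct {
	ShimTotal               int64 `json:"shim_total"`
	UpstreamPromptTotal     int64 `json:"upstream_prompt_total"`
	UpstreamCompletionTotal int64 `json:"upstream_completion_total"`
	N                       int64 `json:"n"`
}

// driftRatio (hypothetical helper) returns upstream prompt tokens per
// shim-counted token. ok is false when shim has counted nothing, so the
// caller never divides by zero.
func driftRatio(d TokenDelta) (ratio float64, ok bool) {
	if d.ShimTotal == 0 {
		return 0, false
	}
	return float64(d.UpstreamPromptTotal) / float64(d.ShimTotal), true
}

const sample = `{"token_delta": {"/v1/messages": {"shim_total": 86,
  "upstream_prompt_total": 336, "upstream_completion_total": 168, "n": 14}}}`

func main() {
	var snap struct {
		TokenDelta map[string]TokenDelta `json:"token_delta"`
	}
	if err := json.Unmarshal([]byte(sample), &snap); err != nil {
		panic(err)
	}
	for path, d := range snap.TokenDelta {
		if r, ok := driftRatio(d); ok {
			fmt.Printf("%s drift ratio %.2f over %d observations\n", path, r, d.N)
		}
	}
}
```

A ratio far from 1 says the two tokenizers disagree on this traffic, which is the drift reading described above; it does not say which count is correct.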
Response usage (Anthropic Messages contract) comes straight from the upstream’s usage.prompt_tokens / usage.completion_tokens, not from shim’s cl100k count:
{"usage": {"input_tokens": 123, "output_tokens": 45}}
Errors and debugging
On an upstream non-2xx, shim emits a single upstream error log line at error level before writing the Anthropic-shaped error to the client:
{
  "level": "ERROR",
  "msg": "upstream error",
  "endpoint": "/v1/messages",
  "adapter": "deepseek",
  "upstream_status": 400,
  "resolved_model": "deepseek-v4-pro",
  "body_preview": "{\"error\":{\"type\":\"context_length_exceeded\",\"message\":\"...\"}}"
}
The same event increments upstream_errors[/v1/messages].by_status[400] in /v1/metrics. The counter is the histogram; this line is the per-request diagnostic.
- upstream_status: the HTTP code the upstream returned (separate from shim’s response status, the Anthropic-shaped translation).
- resolved_model: the model after Adapter.MapModel, i.e. what shim sent upstream. Joinable to the prior model rewritten line.
- body_preview: the first 1024 bytes of the upstream response body, verbatim (cap at upstreamBodyLogBytes in internal/server/handlers.go).
body_preview is NOT routed through shim’s key-based redactor: it is an operator-facing diagnostic, which is the only reason the field exists. Some upstreams echo a fragment of the offending request in their error body (e.g. a prompt snippet that exceeded the context window), so body_preview will carry it. This is the deliberate trade-off for honesty at the boundary: an opaque “upstream status 400” tells you nothing about what to fix. The upstream body is never echoed to the client, only logged. If your logs ship somewhere that upstream-echoed prompt content matters, run a downstream redactor against body_preview at the sink.
Thinking-block signatures
Anthropic’s extended-thinking blocks carry a signature field for multi-turn continuity — clients pass it back unchanged, and Anthropic’s API verifies it server-side (HMAC-shaped, keyed by an internal secret clients can’t reproduce).
shim attaches a constant signature (shim-passthrough-v1) to every emitted thinking block and does not verify what clients send back:
- The loopback threat model (default bind 127.0.0.1:8082) makes tamper-evidence unnecessary: the only caller is the same user’s Claude Code.
- DeepSeek’s reasoning_content field has no signature concept; the field is discarded on outbound translation regardless of value.
- Anthropic clients treat the signature as opaque (they can’t verify locally; only the API server has the key), so any string round-trips through them.
This is deliberate, not an oversight. A reader seeing the constant string + missing verification should NOT add HMAC back as “fix the gap” — it would be verification theater for a property no caller in the deployment model requires. If shim ever runs exposed beyond loopback, revisit with a real threat model.