Prometheus metrics

Endpoint

GET /metrics

Returns metrics in the Prometheus text exposition format, scoped to the authenticated organization. Authentication uses the same Bearer token as the rest of the API; pass it in your Prometheus scrape config.

What's exposed

Metric	Type	Labels
`llm_tokens_total`	counter	`provider`, `model`, `org_id`
`llm_cost_usd_total`	counter	`provider`, `model`, `org_id`
`llm_requests_total`	counter	`provider`, `model`, `org_id`
`llm_cost_usd_rolling_24h`	gauge	`org_id`

`llm_tokens_total` is the sum of `input_tokens` and `output_tokens` across all `api_call` events for the org. `llm_cost_usd_rolling_24h` is a gauge over a trailing 24-hour window, so it can be compared against a threshold directly, which makes it well suited as an alerting input.
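The difference between the cumulative counter and the rolling gauge can be sketched in a few lines of Python. This is an illustration of the definitions above, not the service's implementation: the event field names `type`, `cost_usd`, and `timestamp`, the exact window boundaries, and the restriction of cost to `api_call` events are all assumptions made for the example.

```python
from datetime import datetime, timedelta, timezone


def tokens_total(events):
    """Cumulative input + output tokens over every api_call event."""
    return sum(
        e["input_tokens"] + e["output_tokens"]
        for e in events
        if e["type"] == "api_call"
    )


def cost_rolling_24h(events, now):
    """Sum of cost over the 24 hours ending at `now` (boundary handling is assumed)."""
    cutoff = now - timedelta(hours=24)
    return sum(
        e["cost_usd"]
        for e in events
        if e["type"] == "api_call" and cutoff < e["timestamp"] <= now
    )


# Hypothetical events: one 30 hours old, one 1 hour old.
now = datetime(2024, 1, 2, 12, 0, tzinfo=timezone.utc)
events = [
    {"type": "api_call", "input_tokens": 100, "output_tokens": 50,
     "cost_usd": 0.5, "timestamp": now - timedelta(hours=30)},
    {"type": "api_call", "input_tokens": 10, "output_tokens": 5,
     "cost_usd": 0.25, "timestamp": now - timedelta(hours=1)},
]
print(tokens_total(events))           # counts both events
print(cost_rolling_24h(events, now))  # counts only the recent event
```

The counter only ever grows, so it is normally queried through a rate or an increase. The gauge already describes a bounded window and can be read as is.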

Example scrape config

scrape_configs:
  - job_name: tokenid
    metrics_path: /metrics
    scheme: https
    static_configs:
      - targets: ['token-api.audit.id']
    authorization:
      type: Bearer
      credentials: 'eyJhbGciOi…'   # your TokenID API token
    scrape_interval: 60s

Endpoint details

GET /metrics

Property	Value
Auth	`Authorization: Bearer td_live_xxxx` — same token as the rest of the API
Scope	Per-organization — the response only contains metrics for the calling org
Content type	`text/plain; version=0.0.4; charset=utf-8` (Prometheus text exposition)
Suggested scrape interval	60s (values are computed on demand from event-log aggregates)
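To check the endpoint by hand before wiring it into Prometheus, a minimal standard-library Python sketch along these lines may help. The function names `build_metrics_request` and `fetch_metrics` are hypothetical, and the host and token are placeholders to replace with your own values.

```python
import urllib.request


def build_metrics_request(host: str, token: str) -> urllib.request.Request:
    """Build a GET request for /metrics carrying the Bearer token."""
    return urllib.request.Request(
        f"https://{host}/metrics",
        headers={"Authorization": f"Bearer {token}"},
        method="GET",
    )


def fetch_metrics(host: str, token: str, timeout: float = 10.0) -> str:
    """Return the exposition text; urlopen raises HTTPError on 4xx/5xx."""
    request = build_metrics_request(host, token)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read().decode("utf-8")
```

Calling `fetch_metrics(host, "td_live_xxxx")`, with `host` set to the target from your scrape config, should return the same text that Prometheus would scrape for your organization.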

Metric reference

Metric	Type	Labels	Meaning
`llm_requests_total`	counter	`provider`, `model`, `org_id`	Cumulative number of `api_call` events
`llm_tokens_total`	counter	`provider`, `model`, `org_id`	Cumulative `input_tokens + output_tokens`
`llm_cost_usd_total`	counter	`provider`, `model`, `org_id`	Cumulative cost in USD
`llm_cost_usd_rolling_24h`	gauge	`org_id`	Sum of cost over the trailing 24 hours

All labels are emitted in lowercase. Events with an unknown value (for example, an event without a model) are dropped from the per-(provider, model) counters, so they never show up as `model="unknown"` and inflate your cardinality.
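As a rough illustration of how the labelled counters can be consumed outside Prometheus, the sketch below parses exposition text and sums `llm_cost_usd_total` per provider. The parser is deliberately simplified and is not a full implementation of the exposition format. The sample payload, including its provider names, model names, org id, and values, is made up for the example.

```python
import re
from collections import defaultdict

# Matches sample lines such as: name{label="value",...} 12.5
SAMPLE_RE = re.compile(r'^([a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{(.*)\})?\s+(\S+)')
LABEL_RE = re.compile(r'([a-zA-Z_][a-zA-Z0-9_]*)="((?:[^"\\]|\\.)*)"')


def parse_samples(text):
    """Yield (metric_name, labels_dict, value) for each sample line."""
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):  # skip HELP/TYPE comments
            continue
        match = SAMPLE_RE.match(line)
        if match is None:
            continue
        name, raw_labels, value = match.groups()
        yield name, dict(LABEL_RE.findall(raw_labels or "")), float(value)


def cost_by_provider(text):
    """Sum llm_cost_usd_total across models for each provider."""
    totals = defaultdict(float)
    for name, labels, value in parse_samples(text):
        if name == "llm_cost_usd_total":
            totals[labels["provider"]] += value
    return dict(totals)


# Made-up payload, for illustration only.
SAMPLE = """\
# TYPE llm_cost_usd_total counter
llm_cost_usd_total{provider="provider_a",model="model_x",org_id="org_123"} 12.5
llm_cost_usd_total{provider="provider_a",model="model_y",org_id="org_123"} 0.75
llm_cost_usd_total{provider="provider_b",model="model_z",org_id="org_123"} 4.0
# TYPE llm_cost_usd_rolling_24h gauge
llm_cost_usd_rolling_24h{org_id="org_123"} 3.25
"""

print(cost_by_provider(SAMPLE))
```

Because events without a model are dropped rather than bucketed, a per-provider sum built this way can be lower than the org-wide total when some events lack a model.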

Sample `prometheus.yml`

scrape_configs:
  - job_name: tokenid
    metrics_path: /metrics
    scheme: https
    static_configs:
      - targets: ['token.audit.id']
    authorization:
      type: Bearer
      credentials: td_live_xxxx   # bearer_token is deprecated in favor of authorization
    scrape_interval: 60s

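For Kubernetes-hosted Prometheus, store the token in a Secret, mount it into the Prometheus pod, and point `credentials_file` at the mounted path instead of inlining the token (this replaces the deprecated `bearer_token_file`):

    authorization:
      type: Bearer
      credentials_file: /etc/prometheus/secrets/tokenid/api-key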

Alerting on burn rate

groups:
  - name: tokenid
    rules:
      - alert: TokenIDDailyBurnHigh
        expr: llm_cost_usd_rolling_24h > 500
        for: 5m
        annotations:
          summary: "TokenID 24h spend exceeded $500"

This complements the in-product Spend Alerts feature. Prometheus alerting is the right channel for ops teams that already run a Grafana / Alertmanager stack and want LLM cost on the same surface.