# Monitor Model Usage

> Track requests, tokens, latency, and spend for a deployed model, and inspect individual request logs.

Once a model is serving traffic, the **Usage** tab on its model card is where you monitor performance and API usage. It summarizes requests, token counts, latency, and spend over a time window, and lets you drill into any individual request to see its full prompt, response, and metrics.

<Info>
  The Usage tab is available on any deployed model. If you don't have one yet, see [Deploy an inference model](/eval/run-an-inference-model).
</Info>

***

## Steps

### 1. Open the Usage tab

Open a deployed model's card and select the **Usage** tab. Use the **Last 7 Days** dropdown to change the time window, **Refresh** to pull the latest data on demand, or **Live** to auto-refresh the tab as new requests arrive.

<img src="https://mintcdn.com/benchgen-8fc81371/FddC5uLEIMRz8cT0/images/eval/usage/01-usage-overview.jpg?fit=max&auto=format&n=FddC5uLEIMRz8cT0&q=85&s=291b724f8fe9d80ae7567d9a7f5f4671" alt="The Usage tab with summary metric cards and the request log" width="1478" height="941" data-path="images/eval/usage/01-usage-overview.jpg" />

### 2. Read the summary metrics

Four cards summarize activity for the selected window:

| Card            | What it shows                                                    |
| --------------- | ---------------------------------------------------------------- |
| **Requests**    | Total requests, with the success rate below it.                  |
| **Tokens**      | Total tokens, split into input (prompt) and output (completion). |
| **Spend**       | Total cost across all requests.                                  |
| **Avg Latency** | Mean response time, with the p95 latency below it.               |
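To make the card definitions concrete, here is a minimal Python sketch of how these four values could be derived from a list of request records. The `RequestRecord` shape and the nearest-rank percentile method are assumptions for illustration; this page does not specify how the dashboard computes p95 internally.

```python
import math
from dataclasses import dataclass


@dataclass
class RequestRecord:
    """Hypothetical shape of one request-log row (not an official schema)."""
    status: str              # "success" or "error"
    prompt_tokens: int       # input tokens
    completion_tokens: int   # output tokens
    latency_ms: float        # response time in milliseconds
    spend: float             # cost of the request


def percentile_nearest_rank(values, pct):
    """Nearest-rank percentile: the smallest value with at least pct% of values at or below it."""
    ordered = sorted(values)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def summarize(records):
    """Aggregate records into the values shown on the four summary cards."""
    if not records:
        raise ValueError("no records in the selected window")
    latencies = [r.latency_ms for r in records]
    prompt = sum(r.prompt_tokens for r in records)
    completion = sum(r.completion_tokens for r in records)
    return {
        "requests": len(records),
        "success_rate": sum(r.status == "success" for r in records) / len(records),
        "prompt_tokens": prompt,
        "completion_tokens": completion,
        "total_tokens": prompt + completion,
        "spend": sum(r.spend for r in records),
        "avg_latency_ms": sum(latencies) / len(latencies),
        "p95_latency_ms": percentile_nearest_rank(latencies, 95),
    }
```

Comparing the mean with the p95 is the point of showing both: a few slow requests can leave the mean looking healthy while the p95 reveals the tail.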

### 3. Browse the request log

The **Requests** table lists every call to the endpoint, newest first. It's paginated, so use the arrows to page through history.

| Column         | Meaning                                    |
| -------------- | ------------------------------------------ |
| **Time**       | When the request was made.                 |
| **Status**     | `success` or `error`.                      |
| **Prompt**     | Input (prompt) tokens.                     |
| **Completion** | Output (completion) tokens.                |
| **Total**      | Total tokens for the request.              |
| **Latency**    | Response time in milliseconds.             |
| **Spend**      | Cost of the request.                       |
| **Request**    | The request ID (for example `chatcmpl-…`). |
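The newest-first, paginated ordering of the table can be sketched in a few lines of Python. This is an illustration only: the row dictionaries, the `time` key, and the page size of 25 are assumptions, since this page does not state how many rows the table shows per page.

```python
from datetime import datetime


def page_of_requests(rows, page, page_size=25):
    """Return one page of rows, newest first. `page` is 1-based."""
    if page < 1 or page_size < 1:
        raise ValueError("page and page_size must be >= 1")
    # Sort by request time, most recent first, as the Requests table does.
    ordered = sorted(rows, key=lambda row: row["time"], reverse=True)
    start = (page - 1) * page_size
    return ordered[start:start + page_size]


# Hypothetical rows; the status values mirror the Status column.
rows = [
    {"time": datetime(2025, 1, 1, 12, 0, 0), "status": "success"},
    {"time": datetime(2025, 1, 1, 12, 5, 0), "status": "error"},
    {"time": datetime(2025, 1, 1, 12, 2, 0), "status": "success"},
]
first_page = page_of_requests(rows, page=1, page_size=2)
```

Paging past the last row returns an empty list rather than raising, which matches how a table behaves when there is no more history to show.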

### 4. Inspect a single request

Click any row to open its detail drawer. This is where you debug a specific call.

<img src="https://mintcdn.com/benchgen-8fc81371/FddC5uLEIMRz8cT0/images/eval/usage/02-request-details.jpg?fit=max&auto=format&n=FddC5uLEIMRz8cT0&q=85&s=357476cde0b2396583efdb63af98b1b5" alt="A single request's details, tags, and metrics" width="1478" height="941" data-path="images/eval/usage/02-request-details.jpg" />

| Section                | What's inside                                                                                                     |
| ---------------------- | ----------------------------------------------------------------------------------------------------------------- |
| **Request Details**    | Model, provider, call type, model ID, API base, and (redacted) IP address.                                        |
| **Metrics**            | Token split, cost, duration, time to first token, cache hit, LiteLLM overhead, retries, and start/end times.      |
| **Cost Breakdown**     | The cost components that add up to the request total.                                                             |
| **Request & Response** | The full exchange, with a **Pretty** / **JSON** toggle. Shows the system and user input and the assistant output. |
| **Metadata**           | Additional metadata captured for the request.                                                                     |

<img src="https://mintcdn.com/benchgen-8fc81371/FddC5uLEIMRz8cT0/images/eval/usage/03-request-response.jpg?fit=max&auto=format&n=FddC5uLEIMRz8cT0&q=85&s=5534b88ab71f6462307b38cab4f2ba16" alt="The request and response payloads for a single call" width="1478" height="941" data-path="images/eval/usage/03-request-response.jpg" />

### 5. Watch live traffic

Toggle **Live** to auto-refresh the tab every 15 seconds. A banner confirms live mode is on; click **Stop** to pause it. This is useful while load testing or while watching a benchmark run generate requests.

<img src="https://mintcdn.com/benchgen-8fc81371/FddC5uLEIMRz8cT0/images/eval/usage/04-live-mode.jpg?fit=max&auto=format&n=FddC5uLEIMRz8cT0&q=85&s=f5be81bb570f70d59b18120e001b4257" alt="Live mode auto-refreshing the Usage tab" width="1478" height="941" data-path="images/eval/usage/04-live-mode.jpg" />
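Conceptually, live mode is a fixed-interval polling loop. The Python sketch below shows that pattern under stated assumptions: `fetch`, `on_update`, and `should_stop` are hypothetical callables you would supply yourself, and nothing here represents an actual Benchgen API for reading usage data.

```python
import time


def watch_live(fetch, on_update, interval_s=15.0,
               should_stop=lambda: False, sleep=time.sleep):
    """Call `fetch` every `interval_s` seconds until `should_stop()` returns True.

    `fetch` is a hypothetical callable that returns the latest usage snapshot.
    `on_update` receives each snapshot, for example to redraw a view.
    `sleep` is injectable so the loop can be exercised without waiting.
    Returns the number of polls performed.
    """
    polls = 0
    while not should_stop():
        on_update(fetch())
        polls += 1
        sleep(interval_s)
    return polls
```

The `should_stop` check plays the role of the **Stop** button: the loop finishes its current refresh and then exits instead of being interrupted mid-update.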

***

## Next Steps

<Note>
  **Usage data comes from real traffic.** Deploy a model to start serving requests, or run an evaluation against it to generate activity you can watch here in real time.
</Note>

<CardGroup cols={2}>
  <Card title="Deploy an inference model" icon="rocket" href="/eval/run-an-inference-model">
    Spin up a live, OpenAI-compatible endpoint that records usage.
  </Card>

  <Card title="Evaluate an inference model" icon="gauge-high" href="/eval/evaluate-a-running-model">
    Benchmark a running model and watch the requests stream in.
  </Card>
</CardGroup>
