Small Model Forensics
selection
  sweep: all
  context: all
  provider: all
  prompt: all
  is_test: all | test only | production only
axes
  x · independent: input_tokens | prompt_bytes | request_body_bytes
  y · dependent: first_content_delta_ms (TTFT) | first_response_byte_ms | first_stream_event_ms | total_ms | decode_ms_per_token (post-TTFT) | ttft_ms_per_input_token | output_tokens | reasoning_tokens
presentation
  scale: log-log | log x only | linear
  aggregate: scatter all points | median ± P10–P90 | median ± min–max | median ± Q1–Q3 (IQR)
fig. 1: latency vs. context, all observations
fig. 2: medians per model, log-x bucketed (~10 bins per decade)
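
The log-x bucketing named in the fig. 2 caption could be computed roughly as follows. This is a minimal sketch, not the viewer's actual code: it assumes bins of equal width in log10(x) at 10 per decade, and the function names and the (model, x, y) row shape are hypothetical.

```python
import math
from collections import defaultdict
from statistics import median

# Assumption: taken from the "~10 bins per decade" caption.
BINS_PER_DECADE = 10


def log_bin_index(x, bins_per_decade=BINS_PER_DECADE):
    """Index of the log10-spaced bin that contains x (x must be > 0)."""
    return math.floor(math.log10(x) * bins_per_decade)


def bin_center(index, bins_per_decade=BINS_PER_DECADE):
    """Geometric center of a bin, i.e. its midpoint on a log axis."""
    return 10 ** ((index + 0.5) / bins_per_decade)


def bucketed_medians(rows, bins_per_decade=BINS_PER_DECADE):
    """Group (model, x, y) rows by model and log-x bin.

    Returns {model: [(x_center, median_y, n_points), ...]} with each
    model's bins sorted by increasing x.
    """
    groups = defaultdict(list)
    for model, x, y in rows:
        if x <= 0:
            continue  # log10 is undefined for non-positive x; skip the row
        groups[(model, log_bin_index(x, bins_per_decade))].append(y)

    out = defaultdict(list)
    # Keys are unique (model, bin index) pairs, so sorting orders by model, then bin.
    for (model, idx), ys in sorted(groups.items()):
        out[model].append((bin_center(idx, bins_per_decade), median(ys), len(ys)))
    return dict(out)
```

Here x would be one of the independent axes (for example input_tokens) and y one of the dependent ones (for example first_content_delta_ms). Placing each median at the geometric center of its bin keeps the points evenly spaced on a log-scaled x axis.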