Lapdog: The Missing Middle in LLM Observability

There's a strange gap when you start running coding agents. At one end, you're firing off Claude Code or Codex commands and hoping for the best, with no real visibility into what they're up to between your prompt and the diff that lands in your editor. At the other end, you're standing up a full LLM Observability pipeline (ddtrace, env vars, dashboards, the lot) to monitor a production AI feature.

Most of us live somewhere in the middle. We want to see what our agent is actually doing without committing to a full instrumentation project.

Lapdog is Datadog's answer to that middle ground. It's free and open source, and you don't need a Datadog account to use it.

What it does

Lapdog wraps the Datadog APM test agent with a small CLI. You start it once, then run your coding agent through it. Every prompt, every tool call, every model response, and every cost figure shows up in a browser dashboard at https://lapdog.datadoghq.com/.

That's it. No account creation, no API keys, no SaaS endpoint to point at. The data stays on your machine.

It supports Claude Code, Codex, and Pi out of the box, and you can wire up your own agents if you've built something custom.

Getting it running

I'm on macOS, so I went with Homebrew:

brew install datadog/lapdog/lapdog

If you'd rather use pip, pipx install ddapm-test-agent keeps it isolated from the rest of your Python environment. There's also a Docker image if you'd prefer that route.

To run Claude Code through Lapdog:

lapdog claude

That's the whole setup. The first time you run it, Lapdog auto-installs a Claude Code plugin under ~/.claude/plugins/ so it can intercept tool calls, prompts, sessions, and permission requests. It doesn't touch your core Claude settings, and you can opt out with --no-plugin-install if you'd rather.

Open https://lapdog.datadoghq.com/ in your browser and you'll see your session appear in real time.

What you actually see

The dashboard shows you a live timeline of spans for your current agent session. Each model call is a span. Each tool call is a span. Each permission prompt is recorded. Click into any of them and you get the full prompt, the full response, and the token cost.

For someone who runs a lot of Claude Code sessions during demo prep, this is properly useful. I've caught a few things I'd never have spotted otherwise:

A subagent burning through tokens on a task I'd assumed was lightweight
A tool call that was retrying silently because of a permission issue I hadn't approved properly
A prompt that had ballooned to 40k tokens because of context I'd forgotten was loaded

None of this is new observability data. It's the same telemetry you'd get from full LLM Observability. The point is that Lapdog gives it to you in thirty seconds, with no setup overhead and no commitment.

The bridge to full LLM Observability

Here's where it stops being a toy.

If you do have a Datadog account and you want the events to flow into proper LLM Observability, add the --forward flag:

lapdog start --forward

Set DD_API_KEY and DD_SITE in your environment, and Lapdog will mirror everything to your Datadog org while still showing it locally. The local dashboard keeps working; you just get the data in two places.
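
If you script the forwarding step, a quick preflight check saves a confusing start. Here's a minimal sketch that assumes nothing beyond the two variable names above. check_forward_env is my own hypothetical helper, not something Lapdog ships:

```shell
# Hypothetical helper (not part of Lapdog): confirm both variables
# needed for forwarding are set and non-empty before launching.
check_forward_env() {
  missing=""
  [ -n "${DD_API_KEY:-}" ] || missing="$missing DD_API_KEY"
  [ -n "${DD_SITE:-}" ] || missing="$missing DD_SITE"
  if [ -n "$missing" ]; then
    # Report which variables are absent, without printing any values.
    echo "missing environment variables:$missing" >&2
    return 1
  fi
  return 0
}
```

With that in place, check_forward_env && lapdog start --forward only launches once both values are present.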

This is the bit I keep coming back to. The same tool that gets you started with no account is the tool that hands you off to a production-grade observability platform when you're ready. No rewrite, no migration, no second instrumentation effort.

For SEs, that's also a handy demo lever. You can show a prospect what local visibility looks like, then flip the flag and show them the same data flowing into LLM Observability dashboards. The story tells itself.

What it isn't

Lapdog isn't a replacement for full LLM Observability if you're running production AI workloads. It's deliberately scoped to local development. There's no alerting, no long-term storage, no team sharing. Close your browser and the dashboard goes with it.

It's also not a code-quality tool. It tells you what your agent did, not whether what it did was sensible. You still have to read the diff.

Where it fits in my workflow

I now run lapdog start as part of my morning setup, alongside the usual terminal apps. Every Claude Code session I run during the day shows up in the dashboard, and if something behaves oddly, I've got a full trace to look at.
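
If you'd like that morning step in your shell profile, something like this would do. It's only a sketch: start_lapdog is a hypothetical name of mine, and it relies on nothing beyond the lapdog start command already shown.

```shell
# Hypothetical wrapper (not part of Lapdog): run `lapdog start` if the
# CLI is on PATH, otherwise fail with a hint instead of a bare error.
start_lapdog() {
  if command -v lapdog >/dev/null 2>&1; then
    # Pass any extra arguments straight through, e.g. --forward.
    lapdog start "$@"
  else
    echo "lapdog not found on PATH; install it first" >&2
    return 127
  fi
}
```

Calling start_lapdog --forward passes the flag through unchanged, so the same wrapper covers both the local-only and forwarding cases.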

I didn't realise I wanted this until I started using it. The barrier to entry is so low that there's almost no reason not to run it locally, and the moment you want production-grade observability, you're one flag away.

If you spend any meaningful time running coding agents, give it a go.

The source and full docs live in Datadog's dd-apm-test-agent repository on GitHub.