Observability for a 24/7 trading desk

Context

We founded a market-making firm, which meant everything had to be built from the ground up: the engine, the trading algorithms and the infrastructure around them. As co-founder and CTO I designed and wrote the whole system, and it now runs across more than thirty clients in production.

The platform is large. The diagram below shows the whole system and the people who operate it; this case study zooms in on the observability stack, the metrics database and Grafana highlighted below. The core engine and the AI copilot have their own case studies.

The full platform and its actors. This page focuses on the highlighted observability stack: the metrics database and Grafana.

The problem

A market-making desk runs 24/7 with capital permanently at risk across many venues and clients. Without real-time visibility, the team is flying blind.

You cannot tell, at a glance, whether a strategy is healthy: is the spread on target, is the inventory balanced, are quotes fresh?
Problems are expensive and time-sensitive: a stale feed, a spread drifting out of band or a service going quiet can cost money within minutes.
The same questions repeat across more than thirty clients, each on different venues, symbols and accounts.
Raw logs scattered across services are useless for spotting any of this in time.

Objectives

Real-time visibility into every client, venue, symbol and account.
Track the metrics that actually matter for market making: spread, order-book depth, inventory, open orders and own liquidity, fills and PnL.
Detect anomalies and page the team before they turn into losses.
One coherent pipeline, from the services to storage to dashboards and alerts.

My approach

Every Fortuna service emits structured metrics; a pipeline ships them into a time-series database; Grafana sits on top for dashboards and alerting. The result is a single source of truth the whole desk watches.

From services to insight: structured metrics land in a time-series database, and Grafana drives both dashboards and alerts.

The technical solution

The dashboards are built as a single, fully templated Grafana board rather than one per client. Variables for client, exchange, symbol, account, tag, reference currency, asset and interval mean the same board covers every one of the thirty-plus mandates: pick a client and a venue, and every panel reflects it. It is organised into sections, each answering a market-making question:

General: ticker, daily volume, PnL and last spread at a glance, with a candlestick chart.
Inventory: balances and how each asset is being used, over time.
Depths: our own order-book depth from 0.5% to 20% off mid, current and average, split by bid and ask side.
Orders: open orders and their freshness, our buy and sell liquidity, and the live ladder we are posting.
Spread: spread over time against target, plus best bid, best ask and mid price.
Trades: buys versus sells, realized PnL and filled volume.

Grafana dashboard, General section: ticker, daily volume, PnL, last spread and a candlestick chart, filtered by client, exchange and symbol. — General view: headline metrics and price action, filtered by client, venue, symbol and account.

Grafana dashboard, Inventory section: total balances and asset usage split between free and used, with usage over time. — Inventory: balances and how each asset is split between free and used, over time.

Grafana dashboard, Depths section: current and average order-book depth from 0.5% to 20% off mid, split by bid and ask side. — Depth: how much liquidity we provide at each distance from mid, by side.

Grafana dashboard, Orders section: open orders count and freshness, own buy and sell liquidity, and the live order ladder. — Orders: live open orders, freshness and the ladder we are quoting.

Grafana dashboard, Spread and Trades sections: spread over time against target, best bid and ask, realized PnL and buys versus sells. — Spread and trades: spread against target, best bid/ask, PnL and fills.

Alerting

Dashboards are for watching; alerts are for not having to. Alerts watch the signals that cost money if they drift, and page the desk in Slack and on-call, day or night:

Data freshness: how old the last update is, so a stale feed or a stuck service is caught immediately.
Spread out of band: the quoted spread leaving its target range.
Inventory drift: balances moving away from where they should be.
Silence: a service that stops reporting at all.

Results

One templated dashboard gives real-time visibility across all thirty-plus clients and their venues.
The desk sees spread, depth, inventory, orders and PnL at a glance, per client and per venue.
Problems that used to be invisible (stale feeds, drifting spread, silent services) now page the team within seconds.
The same metrics database feeds the AI copilot, so questions and reports draw on the same source of truth.

Key takeaways

For a 24/7 desk, observability is not a nice-to-have, it is part of the risk system. Sending structured metrics into a time-series database and driving both dashboards and alerts from it turned a fleet of services into something the team can actually watch and trust, and gave every other tool, including the AI copilot, a single source of truth to build on.