AI & ML Practical Magic

Microservice diagnostics become more accurate when 99% of the data is deleted and replaced with a simple bag-of-edges representation.

April 23, 2026

Original Paper

Gleaner: A Semantically-Rich and Efficient Online Sampler for Microservice Diagnostics

arXiv · 2604.16810

The Takeaway

Massive trace datasets often bury the real cause of a system failure in noise. This online sampler treats each trace as a bag of edges rather than a full graph, which saves time and memory. Root-cause accuracy actually goes up compared with using the entire original dataset. The result suggests that more data is not always better for debugging distributed systems. The approach makes real-time monitoring of large cloud infrastructures feasible without the storage overhead of keeping every trace. Engineers can find bugs faster when the sampler discards almost everything the system records.
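
To make the bag-of-edges idea concrete, here is a minimal Python sketch. It is an illustration only, not Gleaner's implementation: the span fields (`span_id`, `parent_id`, `service`) and the helper name `bag_of_edges` are assumptions made for this example. It shows how a trace can be flattened into a multiset of caller-to-callee pairs, so that no graph structure has to be stored or traversed.

```python
from collections import Counter

def bag_of_edges(spans):
    """Flatten a trace into a multiset of (caller, callee) service pairs.

    `spans` is a list of dicts with hypothetical keys
    'span_id', 'parent_id', and 'service'.
    """
    service_by_id = {s["span_id"]: s["service"] for s in spans}
    edges = Counter()
    for s in spans:
        parent = s.get("parent_id")
        # Root spans (or spans whose parent is missing) contribute no edge.
        if parent in service_by_id:
            edges[(service_by_id[parent], s["service"])] += 1
    return edges

# A toy trace: the gateway calls cart twice, and one cart span calls db.
trace = [
    {"span_id": "a", "parent_id": None, "service": "gateway"},
    {"span_id": "b", "parent_id": "a", "service": "cart"},
    {"span_id": "c", "parent_id": "a", "service": "cart"},
    {"span_id": "d", "parent_id": "b", "service": "db"},
]
edges = bag_of_edges(trace)
```

Building the bag takes two linear passes over the spans and produces a plain counter, which is cheap to compare or hash. That is the kind of saving the takeaway refers to, although how the paper scores and samples traces from such a representation is not shown here.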

From the abstract

Distributed tracing in microservices is critical for diagnostics but generates overwhelming data volumes, necessitating intelligent sampling. To maximize fidelity, state-of-the-art (SOTA) tail-based samplers analyze complete (or even log-enriched) traces by modeling them as graphs. However, this reliance on computationally expensive graph analysis creates a performance bottleneck that prohibits their use in online settings. To this end, we propose Gleaner, an online tail-sampling framework that b